Tokyo Tyrant: The magic little database

I’ve been playing around with Tokyo Tyrant master-master replication and I have got to say that it is a magical little database. Here’s some sample code to configure two Tokyo Tyrant instances in master-master replication mode, as well as a script to show off some of Tyrant’s capabilities, including memcached protocol support.

#!/bin/bash
 
HOST=127.0.0.1
PORT=1978
SID=1
UPDATE_LOG_DIR=ulog/$SID
MASTER_HOST=127.0.0.1
MASTER_PORT=1979
 
mkdir -p $UPDATE_LOG_DIR
 
ttserver -host $HOST \
         -port $PORT \
         -ulog $UPDATE_LOG_DIR \
         -sid $SID \
         -mhost $MASTER_HOST \
         -mport $MASTER_PORT

#!/bin/bash
 
HOST=127.0.0.1
PORT=1979
SID=2
UPDATE_LOG_DIR=ulog/$SID
MASTER_HOST=127.0.0.1
MASTER_PORT=1978
 
mkdir -p $UPDATE_LOG_DIR
 
ttserver -host $HOST \
         -port $PORT \
         -ulog $UPDATE_LOG_DIR \
         -sid $SID \
         -mhost $MASTER_HOST \
         -mport $MASTER_PORT

import pytyrant
import time
 
t1 = pytyrant.PyTyrant.open('127.0.0.1', 1978)
t2 = pytyrant.PyTyrant.open('127.0.0.1', 1979)
 
print t2.get('foo', 'nothing here') # 'nothing here'
 
# Writing to t1 replicates to t2
t1['foo'] = 'bar'
print t2.get('foo', 'nothing here') # 'bar'
 
 
# Tokyo Tyrant also speaks the memcached protocol 
 
import memcache
mc = memcache.Client(['127.0.0.1:1978', '127.0.0.1:1979'])
 
print mc.get('foo') # 'bar'
mc.set('yo', 'sup') # Assigns ('yo', 'sup') pair to one of the servers in the pool 
 
# Both Tyrant servers should have ('yo', 'sup') because of master-master replication 
print t1.get('yo', 'nothing in t1') # 'sup'
print t2.get('yo', 'nothing in t2') # 'sup'

For more on Tokyo Tyrant and Tokyo Cabinet, check out this slideshow:

Erliki – a wiki written in Erlang

I published Erliki, my Erlang wiki server, on GitHub a while back. Erliki is a self-contained wiki server that uses BeepBeep, a framework that provides a few niceties like templating via ErlyDTL and session handling on top of the very solid Erlang HTTP server, MochiWeb. BeepBeep doesn’t provide a backend, so I decided to go native and use Mnesia, Erlang’s native distributed DBMS.

I didn’t go very far in implementing wiki syntax, so Erliki only supports [[Wiki links]]. However, I feel that I can easily extend it to support any syntax because I created a compiler using Erlang’s yecc module and some custom code. It’s pretty sweet: yecc will generate the parsing code for you if you just supply it with a grammar. Here’s wiki.yrl:

Header "%% Erliki".
Nonterminals phrase wikilink word.
Terminals string '[[' ']]' '<' '>'.
Rootsymbol phrase.
 
phrase -> word : '$1'.
phrase -> word phrase : ['$1', '$2'].
wikilink -> '[[' string ']]' : {'wikilink', '$2'}.
word -> wikilink : '$1'.
word -> string : '$1'.
word -> '<' : '$1'.
word -> '>' : '$1'.

Passing a string into the generated parsing code returns a list of tuples representing the parse tree, which I then store directly in Mnesia. From there, I can convert the wiki links in the parse tree into <a> tags or [[wiki syntax]] depending on whether you’re reading or editing the page. Storing the native parse tree in Mnesia also makes it easy to prevent HTML injection because you’re not re-parsing the input when storing or reading from the database.
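To make the read/edit conversion concrete, here is a rough sketch in Python rather than Erlang. The tuple shapes ('string' and 'wikilink' tags), the function names to_html and to_wiki, and the /wiki/ URL prefix are all assumptions for illustration, not Erliki’s actual code.

```python
from html import escape
from urllib.parse import quote

def to_html(tree):
    """Render a parse tree for reading: links become <a> tags, text is escaped."""
    parts = []
    for kind, text in tree:
        if kind == 'wikilink':
            # Hypothetical URL scheme; the real routes may differ.
            parts.append('<a href="/wiki/%s">%s</a>' % (quote(text), escape(text)))
        else:
            # Plain text is escaped on output, so stored markup is never trusted.
            parts.append(escape(text))
    return ''.join(parts)

def to_wiki(tree):
    """Render a parse tree for editing: links go back to [[wiki syntax]]."""
    return ''.join('[[%s]]' % text if kind == 'wikilink' else text
                   for kind, text in tree)

# A hypothetical parse tree, shaped like a flat list of tagged tuples.
tree = [('string', 'See '), ('wikilink', 'Home Page'), ('string', ' <b>now</b>')]
html = to_html(tree)
wiki = to_wiki(tree)
```

Because the tree keeps links and plain text apart, the escaping decision is made once per node at render time instead of by re-parsing the raw input.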

I created it mostly to learn about MochiWeb and Mnesia, so I doubt I’ll be maintaining Erliki, but feel free to check it out for yourself. There are no external dependencies, just add Erlang. :)

Golden Age of Gaming?

Are we in a golden age of gaming? Check out the production value on these games:

The production value on these games has a Hollywood-esque sense of epicness. Whether they turn out to be good games remains to be seen, but the quantity of epic releases these days is kind of staggering. In previous gaming generations, there might have been two epic games a year, staggered throughout the year.

Now, there are so many that it’s hard to keep up. I’m still playing Left 4 Dead, Prototype, Fallout 3 and Gears of War 2, which are all relatively old, but still super good. I haven’t even played Prince of Persia, Force Unleashed, Mirror’s Edge or Red Faction: Guerrilla, just because reviews said they were only ok. Now there’s a whole slew of new games coming out again! Oh to be in high school again so that I could properly enjoy this golden age.

Ordered dicts in Python 3.1!

Hooray, at long last, they’ve added an ordered dict, appearing in Python 3.1 (and probably eventually backported to 2.7).
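Here’s a quick taste of what collections.OrderedDict gives you, written for Python 3. The variable names are just for illustration. (Since Python 3.7, plain dicts keep insertion order too, but OrderedDict still has extras like popitem(last=False).)

```python
from collections import OrderedDict

scores = OrderedDict()
scores['banana'] = 3
scores['apple'] = 1
scores['cherry'] = 2

# Iteration follows insertion order, not key order.
keys_in_order = list(scores.keys())

# Updating an existing key keeps its original position.
scores['banana'] = 10
keys_after_update = list(scores.keys())

# popitem() removes from the end by default, or from the front with last=False.
newest = scores.popitem()
oldest = scores.popitem(last=False)
```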

New 10 gallon tank

I started a new 10 gallon tank.

Isn’t it pretty??

Grotto
Here’s a close-up of the grotto area.

I created the grotto to provide places for fish to hide while maintaining a natural look and feel. I’m not sure of the names of the plants I bought, but in general, I found it very difficult to find healthy aquatic plants at the various tropical fish shops I visited. Petco had some decent plants, but they didn’t even look like they were aquatic plants. Seemed like they just immersed some household plants in water. Cheaters.

So while the plants cycle my aquarium, fostering beneficial bacteria to start the nitrogen cycle, I’ve got some time to think about what fish to put in my new tank. I’m liking cichlids; I’ve got two firemouth cichlids in a 5 gallon tank. I might try to get a more varied community in this tank because it’s a bit bigger. More pictures once I decide.

Tell me something interesting, Facebook

I would like someone on the internets to create a Facebook app that will tell me which of my friends have the following characteristics:

  • most photographed (camera ho)
  • most photographed with you (shutterbuddy)
  • most commented on (viral)
  • most activity on your profile (bff)
  • most friends (mr/ms popular)
  • most friends in common (good taste in friends)
  • least friends (social pariah)
  • most status updates (no one cares about you)
  • most videos posted (lifetime achievement award)
  • most frequently changing profile pic (look at me look at me look at me)
  • most stale profile picture (i think it’s about time for a change)
  • most obscure interests (uh, what?)
  • most cliched tastes (like, omg)
  • most jobs (journeyman/woman)
  • longest description (most self-absorbed)
  • most active (get off the internet!)
  • least active (get on the internet!)
  • most “boxes” (stop sending me drinks/saplings/zombies)

Once the app has that data, it can aggregate an overall top list of the people who have these traits, and those people would get special badges. And there will be much rejoicing.

Determining image similarity

When I saw this question on Stack Overflow asking how to determine whether two images are identical, it reminded me of my favorite class at JHU, Computer Vision. One of the things I remember is that if you want to compute how similar two images are, you treat their pixels as vectors, normalize them, then take their dot product. The result is a float between 0 and 1 that indicates how similar the two images are. This process is called normalized cross correlation. Once you have that number, it’s a matter of setting a threshold for what you want to accept as similar. For fun, I whipped up a naive implementation of normalized cross correlation in Python using PIL and numpy:

import Image
from numpy import average, linalg, dot
import sys
 
images = sys.argv[1:3]
vectors = []
norms = []
 
for image in images:
  vector = []
 
  for pixel_tuple in Image.open(image).getdata():
    vector.append(average(pixel_tuple))
 
  vectors.append(vector)
  norms.append(linalg.norm(vector, 2))
 
a, b = vectors
a_norm, b_norm = norms
 
print dot(a / a_norm, b / b_norm)

It’s pretty slow, taking about a minute to process two 400k jpegs on my MacBook Pro, but I bet there’s a nice way to parallelize it (maybe using Python 2.6’s sweet new multiprocessing module?). 
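To see what the score actually measures, here is a minimal standard-library sketch of the same normalized dot product, written for Python 3 and working on flat lists of pixel intensities instead of image files. The function name normalized_dot and the tiny sample "images" are made up for illustration.

```python
import math

def normalized_dot(a, b):
    """Dot product of two pixel vectors after scaling each to unit length."""
    if len(a) != len(b):
        raise ValueError('images must have the same number of pixels')
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('an all-black image cannot be normalized')
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Three-pixel stand-ins for real images.
same = normalized_dot([10, 20, 30], [10, 20, 30])
brighter = normalized_dot([10, 20, 30], [20, 40, 60])
different = normalized_dot([255, 0, 0], [0, 255, 0])
```

Note that a uniformly brighter copy of an image scores the same as an exact copy, because normalization throws away overall intensity. That is worth keeping in mind when picking a threshold.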

Pythonic lazy instantiation (singleton pattern)

Sometimes it’s useful to have global variables, like for config or database connections. However, you don’t want to introduce side effects when you import the module (with certain exceptions).

Normally to avoid this, you would wrap your global variables in functions, maybe memoizing the return value. For example:

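def get_db():
    # Connect only on first use. A getattr() default would be evaluated
    # on every call, opening a new connection each time.
    if not hasattr(get_db, 'db'):
        get_db.db = db_connection()
    return get_db.db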
 
def func1():
    db = get_db()
    db.execute('SELECT * FROM things')    
 
def func2():
    db = get_db()
    db.execute('SELECT * FROM other_things')

However, it gets kind of annoying having to call that function all the time when you just want to have a global variable. With Python metaclass magic, you can have that nice global variable feel without the bad side effects on import:

class Lazy(type):
    def __init__(cls, name, bases, dict):
        super(Lazy, cls).__init__(name, bases, dict)    
        cls.instance = None
 
    def check_instance(cls):
        if cls.instance is None:
            if hasattr(cls, 'instantiate'):
                setattr(cls, 'instance', getattr(cls, 'instantiate')())
            else:
                raise Exception('Must implement the instantiate class method!')
 
    def __getattr__(cls, name):
        cls.check_instance()                
        return getattr(cls.instance, name)
 
    def __getitem__(cls, key):
        cls.check_instance()
        return cls.instance[key]

    def __iter__(cls):
        cls.check_instance()
        return iter(cls.instance)

    def __contains__(cls, item):
        cls.check_instance()
        return item in cls.instance

The Lazy class is a metaclass that implements the singleton design pattern. It delegates all read access to a special class variable called instance, calling the instantiate() class method upon first access. The db class uses this metaclass and implements the instantiate() method.
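As a rough sketch of how such a db class could look, here is a trimmed-down version written for Python 3 (which spells the hook as metaclass=Lazy instead of Python 2’s __metaclass__). FakeConnection is a hypothetical stand-in for a real database connection, and only the __getattr__ delegation is reproduced.

```python
class Lazy(type):
    """Metaclass that forwards attribute reads to a lazily built instance."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.instance = None

    def check_instance(cls):
        # Build the real object the first time anything is read from the class.
        if cls.instance is None:
            cls.instance = cls.instantiate()

    def __getattr__(cls, name):
        cls.check_instance()
        return getattr(cls.instance, name)


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    created = 0

    def __init__(self):
        FakeConnection.created += 1

    def execute(self, query):
        return 'ran: ' + query


class db(metaclass=Lazy):
    @classmethod
    def instantiate(cls):
        return FakeConnection()


# Defining db connects to nothing; the first attribute read does.
created_before = FakeConnection.created
first = db.execute('SELECT * FROM things')
second = db.execute('SELECT * FROM other_things')
created_after = FakeConnection.created
```

Callers just write db.execute(...) as if db were a module-level connection, and only one FakeConnection is ever created.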

This little bit of magic helps you keep your code clean without introducing import side effects. For more info on Python metaclasses, see Guido’s tutorial.

Data Visualization is the new Modern Art

Earlier this year, I went to the Museum of Modern Art to check out Jonathan Harris’s data visualization artwork piece, I Want You To Want Me. With I Want You To Want Me, Harris mines Craigslist’s personal ads and slices up the data by gender, age, match preference and self-description. To visualize the data, Harris presents to the viewer an open sky that gets flooded with balloons representing each person’s ad. You can touch the screen to interact with the balloons for more details or change filters, and the balloons react realistically. It’s a very beautiful work, and it warms my heart that a programmer’s work can be considered art.

There’s hope for me to get into the MoMA yet!

Harris’s other well-known work, We Feel Fine, is similar to I Want You To Want Me, but instead of personal ads, he mines blogs for the phrase “I feel” and analyzes the text to figure out what feeling the blog entry is expressing. He presents the data as little blobs that you can interact with. He even provides an API for you to use the data he collected.

Some other cool data visualization links:

mysql-proxy-cache: a protocol-level mysql cache

Recently I’ve been playing around with MySQL Proxy, a network proxy for MySQL. One cool thing you can do with MySQL Proxy is to specify a Lua script that implements special hooks that expose various parts of the MySQL network protocol. For example, implementing a read_query() function will let you manipulate queries that MySQL has received but hasn’t processed yet. You can do fun things with it like log, manipulate or discard the query, all without having to modify your client applications. 

For fun, I’ve created a mysql-proxy-cache project that returns a cached result for any SELECT query that has already been executed. I store cached results in a memcached instance whose keys are MD5 hashes of the queries that generated them.
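The caching idea is easy to sketch outside of Lua. Below is a minimal Python 3 illustration, not the project’s actual code: QueryCache, fake_backend and the in-memory dict standing in for memcached are all assumptions made for the example.

```python
import hashlib

class QueryCache:
    """Serve repeated SELECTs from a cache keyed by the MD5 of the query text."""

    def __init__(self, run_query):
        self._run_query = run_query   # callable that really executes a query
        self._store = {}              # stands in for memcached in this sketch

    @staticmethod
    def key_for(query):
        return hashlib.md5(query.encode('utf-8')).hexdigest()

    def execute(self, query):
        # Only SELECTs are cached; everything else goes straight through.
        if not query.lstrip().upper().startswith('SELECT'):
            return self._run_query(query)
        key = self.key_for(query)
        if key not in self._store:
            self._store[key] = self._run_query(query)
        return self._store[key]

calls = []

def fake_backend(query):
    """Hypothetical backend that records which queries reached the server."""
    calls.append(query)
    return [('row for', query)]

cache = QueryCache(fake_backend)
first = cache.execute('SELECT * FROM things')
second = cache.execute('SELECT * FROM things')
cache.execute('UPDATE things SET x = 1')
```

The second SELECT never reaches the backend. Hashing the query text keeps keys short and uniform, at the cost that queries differing only in whitespace or letter case get separate entries.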

It was pretty fun working on this because it let me learn Lua as well, further adding to my arsenal of programming languages. However, the project is totally alpha and shouldn’t be used in a production environment, mostly because there’s no way to expire cached items.

To support cache expiration, I’d need to intercept UPDATE/INSERT/DELETE queries and clear the cache if they touch any rows that are in the cache. An easy way out would be to just clear cached items if the queries’ source table(s) were modified, not necessarily their rows, but that’s exactly the behavior of MySQL’s built-in query cache, so it wouldn’t be very useful.
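For what it’s worth, the "easy way out" can be sketched in a few lines of Python 3. This is only an illustration: the regex is a naive guess at table names rather than a real SQL parser, and TableInvalidatingCache and its methods are hypothetical names.

```python
import re

# Naive table-name matcher; a real SQL parser would be needed for robustness.
TABLE_RE = re.compile(r'\b(?:FROM|JOIN|UPDATE|INTO)\s+`?(\w+)`?', re.IGNORECASE)

def tables_in(query):
    """Return the lower-cased table names a query appears to touch."""
    return set(name.lower() for name in TABLE_RE.findall(query))

class TableInvalidatingCache:
    def __init__(self):
        self._results = {}    # query text -> cached result
        self._by_table = {}   # table name -> set of cached query texts

    def put(self, query, result):
        self._results[query] = result
        for table in tables_in(query):
            self._by_table.setdefault(table, set()).add(query)

    def get(self, query):
        return self._results.get(query)

    def note_write(self, query):
        # Drop every cached SELECT that read from a table this write touches.
        for table in tables_in(query):
            for cached in self._by_table.pop(table, set()):
                self._results.pop(cached, None)

cache = TableInvalidatingCache()
cache.put('SELECT * FROM things', ['a'])
cache.put('SELECT * FROM other_things', ['b'])
cache.note_write('UPDATE things SET x = 1')
things_after = cache.get('SELECT * FROM things')
others_after = cache.get('SELECT * FROM other_things')
```

A write to things evicts only the queries that read from things, which is coarser than row-level tracking but simple to reason about.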