Hooray, at long last, they’ve added an ordered dict, appearing in Python 3.1 (and probably eventually backported to 2.7).
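For a quick taste, here is a minimal sketch using `collections.OrderedDict`, which remembers the order in which keys were inserted. The fish names and counts are made-up sample data.

```python
from collections import OrderedDict

# Made-up sample data: keys come back in the order they were inserted.
tank = OrderedDict()
tank["zebra danio"] = 3
tank["firemouth cichlid"] = 2
tank["ghost shrimp"] = 5

keys_in_order = list(tank.keys())

# Deleting a key and adding it again moves it to the end.
del tank["zebra danio"]
tank["zebra danio"] = 4
keys_after_readd = list(tank.keys())
```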
Wednesday, April 29, 2009
I started a new 10 gallon tank.

Isn’t it pretty??

Here’s a close-up of the grotto area.
I created the grotto to provide places for fish to hide while maintaining a natural look and feel. I’m not sure what the names of the plants I bought are, but in general, I found it was very difficult to find healthy aquatic plants at the various tropical fish shops I visited. Petco had some decent plants, but they didn’t even look like they were aquatic plants. Seemed like they just immersed some household plants in water. Cheaters.
So while the plants cycle my aquarium, fostering beneficial bacteria to start the nitrogen cycle, I’ve got some time to think about what fish to put in my new tank. I’m liking cichlids; I’ve got two firemouth cichlids in a 5 gallon tank. I might try to get a more varied community in this tank because it’s a bit bigger. More pictures once I decide.
I would like someone on the internets to create a Facebook app that will tell me which of my friends have the following characteristics:
- most photographed (camera ho)
- most photographed with you (shutterbuddy)
- most commented on (viral)
- most activity on your profile (bff)
- most friends (mr/ms popular)
- most friends in common (good taste in friends)
- least friends (social pariah)
- most status updates (no one cares about you)
- most videos posted (lifetime achievement award)
- most frequently changing profile pic (look at me look at me look at me)
- most stale profile picture (i think it’s about time for a change)
- most obscure interests (uh, what?)
- most cliched tastes (like, omg)
- most jobs (journeyman/woman)
- longest description (most self-absorbed)
- most active (get off the internet!)
- least active (get on the internet!)
- most “boxes” (stop sending me drinks/saplings/zombies)
Once the app has that data, then it can aggregate an overall toplist for the people who have these traits and those people would get special badges. And there will be much rejoicing.
Wednesday, December 3, 2008
When I saw this question on stackoverflow asking how to determine if two images are identical, it reminded me of my favorite class at JHU, Computer Vision. One of the things that I remember is that if you want to compute how similar two images are, you treat their pixels as vectors, normalize them, then take their dot product. The result is a float between 0 and 1 that indicates how similar the two images are. This process is called normalized cross correlation. After you get that number, it’s a matter of setting a threshold for what you want to accept as similar. For fun, I whipped up a naive implementation of normalized cross correlation in Python using PIL and numpy:
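The original listing isn’t reproduced here, so what follows is only a minimal sketch of the idea as described above, not the code from the post. To stay dependency-free it works on plain lists of pixel values instead of PIL images and numpy arrays, and loading the pixels is left out. The `similarity` name is my own.

```python
import math

def similarity(pixels_a, pixels_b):
    """Dot product of two pixel vectors after scaling each to unit length."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    norm_a = math.sqrt(sum(p * p for p in pixels_a))
    norm_b = math.sqrt(sum(p * p for p in pixels_b))
    if norm_a == 0 or norm_b == 0:
        # An all-black image has no direction; call two of them identical.
        return 1.0 if norm_a == norm_b else 0.0
    dot = sum(a * b for a, b in zip(pixels_a, pixels_b))
    return dot / (norm_a * norm_b)

# A uniformly brighter copy of the same image still scores (almost exactly) 1.
score = similarity([10, 20, 30], [20, 40, 60])
```

Note that textbook definitions of normalized cross correlation usually subtract each image’s mean first, which makes the score range from -1 to 1. This sketch follows the simpler description above, so for non-negative pixel values it stays between 0 and 1.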
It’s pretty slow, taking about a minute to process two 400k jpegs on my MacBook Pro, but I bet there’s a nice way to parallelize it (maybe using Python 2.6’s sweet new multiprocessing module?).
Tuesday, December 2, 2008
Sometimes it’s useful to have global variables, like for config or database connections. However, you don’t want to introduce side effects when you import the module (with certain exceptions).
Normally to avoid this, you would wrap your global variables in functions, maybe memoizing the return value. For example:
However, it gets kind of annoying having to call that function all the time when you just want to have a global variable. With Python metaclass magic, you can have that nice global variable feel without the bad side effects on import:
The Lazy class is a metaclass that implements the singleton design pattern. It delegates all read access to a special class variable called instance, calling the instantiate() class method upon first access. The db class uses this metaclass and implements the instantiate() method.
This little bit of magic helps you keep your code clean without introducing import side effects. For more info on Python metaclasses, see Guido’s tutorial.
Earlier this year, I went to the Museum of Modern Art to check out Jonathan Harris’s data visualization artwork piece, I Want You To Want Me. With I Want You To Want Me, Harris mines Craigslist’s personal ads and slices up the data by gender, age, match preference and self-description. To visualize the data, Harris presents to the viewer an open sky that gets flooded with balloons representing each person’s ad. You can touch the screen to interact with the balloons for more details or change filters, and the balloons react realistically. It’s a very beautiful work, and it warms my heart that a programmer’s work can be considered art.
There’s hope for me to get into the MoMA yet!
Harris’s other well-known work, We Feel Fine, is similar to I Want You To Want Me, but instead of personal ads, he mines blogs for the phrase “I feel” and analyzes the text to figure out what feeling the blog entry is expressing. He presents the data as little blobs that you can interact with. He even provides an API for you to use the data he collected.
Some other cool data visualization links:
Monday, November 24, 2008
Recently I’ve been playing around with MySQL Proxy, a network proxy for MySQL. One cool thing you can do with MySQL Proxy is to specify a Lua script that implements special hooks that expose various parts of the MySQL network protocol. For example, implementing a read_query() function will let you manipulate queries that the proxy has received from the client but hasn’t yet forwarded to the server. You can do fun things with it like log, manipulate or discard the query, all without having to modify your client applications.
For fun, I’ve created a mysql-proxy-cache project that will return a cached version of any SELECT queries, if they’ve been executed already. I store cached results in a memcache instance whose keys are md5 hashes of the queries that generated them.
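The caching idea can be sketched in a few lines of Python rather than Lua. This illustrates the approach described above and is not the project’s code: a plain dict stands in for memcache, and `run_query` is a hypothetical callback that executes the SQL against the real server.

```python
import hashlib

cache = {}  # stand-in for a memcache instance

def cache_key(sql):
    """md5 hex digest of the query text, used as the cache key."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()

def cached_query(sql, run_query):
    """Return cached rows for repeated SELECTs; pass everything else through."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return run_query(sql)
    key = cache_key(sql)
    if key not in cache:
        cache[key] = run_query(sql)
    return cache[key]
```

Hashing the query text keeps keys short and free of characters memcache doesn’t allow, at the cost of treating queries that differ only in whitespace or case as different entries.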
It was pretty fun working on this because it let me learn Lua as well, further adding to my arsenal of programming languages. However, the project is totally alpha and shouldn’t be used in a production environment, mostly because there’s no way to expire cached items.
To support cache expiration, I’d need to intercept UPDATE/INSERT/DELETE queries and clear the cache if they touch any rows that are in the cache. An easy way out would be to just clear cached items if the queries’ source table(s) were modified, not necessarily their rows, but then that’s exactly the behavior of MySQL’s built-in query cache so it wouldn’t be very useful.
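For what it’s worth, the coarse table-level “easy way out” could look something like this Python sketch. Everything here is hypothetical: the helper names are my own, a dict stands in for memcache, and the regular expression is only a rough guess at table names (real SQL would need a proper parser).

```python
import re
from collections import defaultdict

keys_by_table = defaultdict(set)  # table name -> cache keys that read from it

# Rough heuristic: the word following FROM, JOIN, INTO or UPDATE is a table.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?", re.IGNORECASE)

def tables_in(sql):
    """Very rough table-name extraction from a single SQL statement."""
    return {name.lower() for name in TABLE_RE.findall(sql)}

def remember(sql, key):
    """Record which tables a cached SELECT depends on."""
    for table in tables_in(sql):
        keys_by_table[table].add(key)

def invalidate(sql, cache):
    """On a write, drop every cached result that read from a touched table."""
    for table in tables_in(sql):
        for key in keys_by_table.pop(table, set()):
            cache.pop(key, None)
```

Row-level invalidation would need to know which rows each cached result covers, which is a much harder problem than tracking table names.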
Yesterday on a whim, I decided to buy some new fish for my aquarium. I’ve had zebra danios, ghost shrimp, platys and pencilfish at some point in time in my aquarium, but all but the zebras have died. I wanted to try a new fish, so I got some Jack Dempseys. They’re pretty tough, chasing away other fish from their territory. Also, apparently they grow to be like a foot long. I wish I had known that at the fish store. I think there’s an opportunity for an iPhone app in there somewhere.
Let’s see, what would the workflow be?
- Go to fish store
- See a fish that looks cute
- Take a picture with iPhone
- Upload picture to fish servers for identity matching
- Match the identity and serve up a page of detailed information on the fish, including ideal tank mates, preferred diet, turn ons, and horoscope.
- Repeat
Hey, it could happen, if I muster up enough non-laziness.
Right now I’m trying my hand at culturing fruit flies so that my fish can have some live food from time to time. I’ve had a fruit fly problem for a few weeks now, and have been trying to create traps to exterminate them, but I’ve recently realized that hey, mayhaps my fish would like to eat them. And they do!
So I’ve been luring them into a jar with some vinegar at the bottom (vinegar DOES attract more flies than honey) and then when there’s a bunch in there, covering the top with a plastic baggie and sealing it off once the flies are all in there. After that, I throw the bag in the freezer for 10 seconds to stun the flies, then dump them into the fish tank. Mmmm!
I’d like to have a more repeatable way of doing this, so I’m gonna try letting them breed for a while in the jar. We’ll see how that goes.
Tuesday, September 2, 2008
I recently published my pygame artificial life simulation: Emergent. Check it out. I’m not quite sure where to head with it so any suggestions are welcome.
Fareed Zakaria, in The Post-American World, talks about the difference in 19th century productivity between China and Europe:
Throwing more manpower at a problem is not the path to innovation. The historian Philip Huang makes a fascinating comparison between the farmers of the Yangtze Delta and those of England, the richest regions of China and Europe respectively in 1800. He points out that, by some measures, the two areas might seem to have been at equivalent economic levels. But in fact, Britain was far ahead in the key measure of growth – labor productivity. The Chinese were able to make their land highly productive, but they did so by putting more and more people to work on a given acre – what Huang calls “output without development.” The English, on the other hand, kept searching for ways to make labor more productive so that each farmer was producing more crops. They discovered new labor-saving devices, using animals and inventing machines . . . Ultimately, the result was that a small number of Britons were able to farm huge swaths of land. By the eighteenth century, the average farm size in southern England was 150 acres; in the Yangtze delta, it was about 1 acre.
I feel like a lot of software companies are repeating the same mistake by throwing many warm bodies at software projects without investing enough resources into trying to improve the productivity of each programmer, with software best practices and processes instead of with plows and animals.