Category Archives: Programming

CouchDB: the last RESTful JSON store you’ll ever need

Today I gave a talk about CouchDB at the company I work for. I titled the talk “CouchDB: the last RESTful JSON store you’ll ever need”, which is of course an exaggeration, but our shop is heavy into Oracle and JSON services, so I geared the talk around presenting CouchDB as a viable alternative. I [...]

Announcing Couch Crawler, a CouchDB search engine/crawler

Hi! So, for fun, I made couch-crawler, a search engine and crawler on top of the very excellent couchdb-lucene. I wanted to create a hackable search engine for my work intranet using modern tools. Lucene is great, but the Nutch search engine/crawler was kind of annoying to work with. I couldn’t figure out how to [...]

Javascript sucks at sorting integers

In general, I heart Javascript. It’s one of the most misunderstood languages, but it has definitely made a comeback in a big way, not just with sweet client-side frameworks like Prototype and jQuery, but also on the server-side with CouchDB, MongoDB and Node.js.
But in its current form (Javascript 1.5), it sucks at sorting integers:
js> [...]

Merlke: a native Erlang build tool

After seriously diving into my pet Erlang project, I found myself clinging to Rake to do my builds. It was kind of unsatisfying though, especially once my builds became more complicated (hooray for code-generated parsers, leex/yecc). Why can’t we have Erlangs all the way down? Furthermore, OTP provides most of the build steps you’d wanna [...]

Tokyo Tyrant: The magic little database

I’ve been playing around with Tokyo Tyrant master-master replication and I have got to say that it is a magical little database. Here’s some sample code to configure two Tokyo Tyrant instances in master-master replication mode, as well as a script to show off some of Tyrant’s capabilities, including memcached protocol support.

#!/bin/bash
 
HOST=127.0.0.1
PORT=1978
SID=1
UPDATE_LOG_DIR=ulog/$SID
MASTER_HOST=127.0.0.1
MASTER_PORT=1979
 
mkdir -p $UPDATE_LOG_DIR
 
ttserver -host [...]

Erliki – a wiki written in Erlang

So, I published Erliki, my Erlang wiki server, on GitHub a while back. Erliki is a self-contained wiki server that uses BeepBeep, a framework that provides a few niceties like templating via ErlyDTL and session handling on top of the very solid Erlang HTTP server, MochiWeb. BeepBeep doesn’t provide a backend, so I decided [...]

Ordered dicts in Python 3.1!

Hooray, at long last, they’ve added an ordered dict, appearing in Python 3.1 (and probably eventually backported to 2.7).
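
Here’s a minimal sketch of what the ordered dict (`collections.OrderedDict`) gives you. The keys and values are made up for illustration:

```python
from collections import OrderedDict

scores = OrderedDict()
scores["carol"] = 3
scores["alice"] = 1
scores["bob"] = 2

# Iteration follows insertion order, not key order.
names = list(scores.keys())

# Re-assigning an existing key keeps its original position.
scores["carol"] = 30
names_after_update = list(scores)

# popitem() removes the most recently inserted pair by default.
last = scores.popitem()
```

Sorting is a separate concern: if you want the keys in sorted order, sort the items first and insert them in that order.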

Determining image similarity

When I saw this question on Stack Overflow asking how to determine whether two images are identical, it reminded me of my favorite class at JHU, Computer Vision. One of the things that I remember is that if you wanted to compute how similar two images are, you’d treat their pixels as vectors, normalize them, [...]

Pythonic lazy instantiation (singleton pattern)

Sometimes it’s useful to have global variables, like for config or database connections. However, you don’t want to introduce side effects when you import the module (with certain exceptions).
Normally to avoid this, you would wrap your global variables in functions, maybe memoizing the return value. For example:

def get_db():
    db = getattr(get_db, 'db', [...]
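
As a rough sketch of the “wrap it in a function and memoize” idea described above: the value is cached as an attribute on the function itself, so nothing is created at import time. The `connect()` factory here is a hypothetical stand-in for a real database connection, not part of any library:

```python
def connect():
    """Hypothetical stand-in for an expensive connection factory."""
    return object()


def get_db():
    # Cache the connection on the function object itself, so nothing
    # is created at import time and later calls reuse the same value.
    db = getattr(get_db, "db", None)
    if db is None:
        db = get_db.db = connect()
    return db
```

Every caller that goes through `get_db()` gets the same object, and the first call is the only one that pays for `connect()`.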

Data Visualization is the new Modern Art

Earlier this year, I went to the Museum of Modern Art to check out Jonathan Harris’s data visualization piece, I Want You To Want Me. With I Want You To Want Me, Harris mines Craigslist’s personal ads and slices up the data by gender, age, match preference and self-description. To visualize the data, Harris [...]