Wednesday, June 11, 2008

Python-powered webcam for MIT construction site--with time-lapse video

My office on the 4th floor of the Broad Institute looks down on the construction site for MIT's new cancer research building. Soon after they tore down the beautiful landscaping that used to be outside the Stata Center, I set up a webcam on my windowsill and started idly capturing shots of the construction while I worked.

This week I finally automated the process with a Python script using VideoCapture and pyMedia. Once a day it compiles all the shots, now taken at 30-minute intervals from 7am to 5pm, into a 12 fps time-lapse video:

Large MPEG (1280x960, 16.6MB and growing)
Small MPEG (640x480)
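
For a rough sense of how slowly the video grows, here is a minimal sketch of the schedule arithmetic. It is not the actual capture script: the function name capture_times is made up for illustration, and it assumes that both the 7am and the 5pm shots are taken, which the schedule above does not spell out. It is written for Python 3.

```python
from datetime import datetime, timedelta

def capture_times(day, start_hour=7, end_hour=17, interval_minutes=30):
    """Return the datetimes at which a shot would be taken on `day`.

    Assumption: both endpoints (start_hour and end_hour) are included.
    """
    t = datetime(day.year, day.month, day.day, start_hour)
    end = datetime(day.year, day.month, day.day, end_hour)
    times = []
    while t <= end:
        times.append(t)
        t += timedelta(minutes=interval_minutes)
    return times

shots = capture_times(datetime(2008, 6, 11))
frames_per_day = len(shots)               # one frame per shot
seconds_per_day = frames_per_day / 12.0   # playback time added per day at 12 fps
print(frames_per_day, seconds_per_day)
```

Under those assumptions each day of construction adds 21 frames, or 1.75 seconds of video.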

A live shot is also taken every 5 minutes: [Update 3/7/09: I stopped recording when they covered the whole construction site with a big sheet.]



I used a Logitech Quickcam Pro 9000, which has a glass lens and nice resolution. It deals surprisingly well with the bright sky and dark, shadowed construction site.

Construction is expected to finish in December 2010, so I'll up the frame-rate when the file starts getting huge.

Saturday, March 15, 2008

stored_results(): automatic serialization of expensive calculations

A very common situation I encounter: "Do [expensive calculation] and pickle the result; unless it's been done before, in which case unpickle the stored result." This stored_results() utility function makes that very easy, and provides some convenient options for scripting and memory management.


def stored_results(pkl_filename, function, overwrite=0, verbose=1,
                   lazy=False, args=[], kwargs={}):
    """Returns contents of pickled file 'pkl_filename'. If file does not exist
    or is malformed, returns output of function (with args & kwargs) and pickles same
    result to pkl_filename.
    If overwrite>0, runs function regardless of presence of 'pkl_filename'.
    If lazy=True, instead outputs a generator that evaluates the rest of the stored_results()
    call only when its next() method is called."""
    import cPickle
    if lazy:
        # Create a generator that will eventually retrieve stored result
        def result_gen():
            result = stored_results(pkl_filename, function, overwrite, verbose,
                                    False, args, kwargs)
            while 1:
                yield result
        return result_gen()
    else:
        if not overwrite>0:
            try:
                inf = file(pkl_filename, 'rb')
                result = cPickle.load(inf)
                inf.close()
                if verbose: print pkl_filename, "successfully unpickled"
                return result
            except (cPickle.UnpicklingError, IOError, EOFError):
                if verbose: print "Result not stored. Running function", function.func_name
        result = function(*args, **kwargs)
        outf = file(pkl_filename, 'wb')
        cPickle.dump(result, outf, -1)
        outf.close()
        if verbose: print "Result pickled to", pkl_filename
        return result


Typical use:
result = stored_results('saved_result.pkl', expensive_function, args=(23, 'sweet'))

Advanced use:
To perform the calculation/load the result in a just-in-time fashion:
result_gen = stored_results('saved_result.pkl', expensive_function, args=(23, 'sweet'), lazy=True)
[memory-sensitive tasks that may or may not need the result]
result = result_gen.next()
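
The code above is Python 2 (cPickle, file(), print statements, generator .next()). For anyone on Python 3, here is a minimal sketch of the same load-or-compute idea. It is an adaptation under stated assumptions, not a drop-in replacement: the name stored_result_py3 is made up, the verbose option is left out, and expensive_function is a hypothetical stand-in that records how often it is called.

```python
import os
import pickle
import tempfile

def stored_result_py3(pkl_filename, function, overwrite=False,
                      lazy=False, args=(), kwargs=None):
    """Return the pickled result in pkl_filename, or compute and pickle it.

    If lazy is True, return a generator that does the work only on the
    first next() call and yields the same result on every later call.
    """
    kwargs = kwargs or {}
    if lazy:
        def result_gen():
            result = stored_result_py3(pkl_filename, function, overwrite,
                                       False, args, kwargs)
            while True:
                yield result
        return result_gen()
    if not overwrite:
        try:
            with open(pkl_filename, 'rb') as inf:
                return pickle.load(inf)
        except (OSError, EOFError, pickle.UnpicklingError):
            pass  # missing or unreadable file: fall through and recompute
    result = function(*args, **kwargs)
    with open(pkl_filename, 'wb') as outf:
        pickle.dump(result, outf, pickle.HIGHEST_PROTOCOL)
    return result

# Hypothetical stand-in for an expensive calculation; it logs each call.
calls = []
def expensive_function(n, label):
    calls.append(n)
    return {label: n * n}

path = os.path.join(tempfile.mkdtemp(), 'saved_result.pkl')
first = stored_result_py3(path, expensive_function, args=(23, 'sweet'))
second = stored_result_py3(path, expensive_function, args=(23, 'sweet'))
print(first == second, len(calls))
```

The second call reads the pickle instead of rerunning the function, so expensive_function runs only once. In Python 3 the lazy form is consumed with next(result_gen) rather than result_gen.next().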

Victory is mine!

I defended and graduated! PhD in Physics from MIT. Now it gets good.

I crossed the street to the Broad Institute to do cancer research. Now, I don't interact with patients, or even deal with tumor samples. My job is to manage and analyze the huge flow of data coming from the latest microarray experiments. We look at sequence, expression, methylation, copy-number variation, and microRNA data simultaneously to tease out the subtle cellular changes that result in cancer.

When motivation strikes, I'll write about some of the programming and other issues I deal with at work.