A very common situation I encounter: "Do [expensive calculation] and pickle the result; unless it's been done before, in which case unpickle the stored result." The stored_results() utility function makes that very easy and provides some convenient options for scripting and memory management.
import pickle


def stored_results(pkl_filename, function, overwrite=0, verbose=1,
                   lazy=False, args=(), kwargs=None):
    """Return the contents of the pickle file 'pkl_filename'.

    If the file does not exist or is malformed, return the output of
    function(*args, **kwargs) and pickle that result to pkl_filename.
    If overwrite > 0, run function regardless of whether 'pkl_filename' exists.
    If lazy=True, return a generator instead; the rest of the
    stored_results() call is evaluated only when next() is first called on it.
    """
    # Avoid a shared mutable default for kwargs.
    if kwargs is None:
        kwargs = {}
    if lazy:
        # Create a generator that retrieves the stored result on first use.
        def result_gen():
            result = stored_results(pkl_filename, function, overwrite, verbose,
                                    False, args, kwargs)
            while True:
                yield result
        return result_gen()

    if not overwrite > 0:
        try:
            with open(pkl_filename, 'rb') as inf:
                result = pickle.load(inf)
            if verbose:
                print(pkl_filename, "successfully unpickled")
            return result
        except (pickle.UnpicklingError, EOFError, OSError):
            # Missing, unreadable, or malformed file: fall through and recompute.
            if verbose:
                print("Result not stored. Running function", function.__name__)
    result = function(*args, **kwargs)
    with open(pkl_filename, 'wb') as outf:
        pickle.dump(result, outf, pickle.HIGHEST_PROTOCOL)
    if verbose:
        print("Result pickled to", pkl_filename)
    return result
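The "does not exist or is malformed" case from the docstring can be probed directly. The sketch below assumes Python 3 and uses a hypothetical helper, load_error(), that is not part of the recipe; it reports which exception pickle.load raises for a missing file, an empty file, and a file that starts with an invalid opcode byte. It illustrates a few failure modes and is not an exhaustive list.

```python
import os
import pickle
import tempfile


def load_error(path):
    """Hypothetical helper: name of the exception raised when unpickling path."""
    try:
        with open(path, 'rb') as inf:
            pickle.load(inf)
    except Exception as exc:
        return type(exc).__name__
    return None


tmpdir = tempfile.mkdtemp()
missing = os.path.join(tmpdir, 'missing.pkl')   # never created
empty = os.path.join(tmpdir, 'empty.pkl')
garbage = os.path.join(tmpdir, 'garbage.pkl')

open(empty, 'wb').close()                       # zero-byte file
with open(garbage, 'wb') as outf:
    outf.write(b'\x00')                         # not a valid pickle opcode

errors = {name: load_error(path) for name, path in
          [('missing', missing), ('empty', empty), ('garbage', garbage)]}
print(errors)
```

An empty file surfaces as EOFError rather than UnpicklingError, so a caller that wants to treat every malformed file as "not stored yet" has to catch that exception as well.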
Typical use:
result = stored_results('saved_result.pkl', expensive_function, args=(23, 'sweet'))
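To see the caching effect end to end, here is a minimal sketch, assuming Python 3. So that it runs on its own, it uses a condensed stand-in for stored_results() called load_or_compute() (a hypothetical name), and the body of expensive_function is made up for illustration.

```python
import os
import pickle
import tempfile

calls = []  # records every real invocation of expensive_function


def expensive_function(n, label):
    """Made-up stand-in for a slow calculation."""
    calls.append((n, label))
    return {'n_squared': n * n, 'label': label}


def load_or_compute(pkl_filename, function, args=()):
    """Condensed stand-in for stored_results(): unpickle if possible, else compute."""
    try:
        with open(pkl_filename, 'rb') as inf:
            return pickle.load(inf)
    except (pickle.UnpicklingError, EOFError, OSError):
        result = function(*args)
        with open(pkl_filename, 'wb') as outf:
            pickle.dump(result, outf, pickle.HIGHEST_PROTOCOL)
        return result


path = os.path.join(tempfile.mkdtemp(), 'saved_result.pkl')
first = load_or_compute(path, expensive_function, args=(23, 'sweet'))   # computes, then pickles
second = load_or_compute(path, expensive_function, args=(23, 'sweet'))  # unpickles from disk
print(first == second, len(calls))
```

The function body runs only once; the second call is served from the file. Note that the filename is the only cache key: calling again with different args but the same filename returns the previously stored result unless overwrite is set.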
Advanced use:
To perform the calculation/load the result in a just-in-time fashion:
result_gen = stored_results('saved_result.pkl', expensive_function, args=(23, 'sweet'), lazy=True)
[memory-sensitive tasks that may or may not need the result]
result = next(result_gen)
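The deferral can be checked with a small sketch, assuming Python 3. The generator lazy_result() is a hypothetical stand-in that mirrors the inner result_gen() without the pickling, and expensive_function is again made up.

```python
calls = []  # records every real invocation of expensive_function


def expensive_function(n, label):
    """Made-up stand-in for a slow calculation."""
    calls.append((n, label))
    return [label] * n


def lazy_result(function, args=()):
    """Mirror of result_gen(): nothing runs until the first next() call."""
    result = function(*args)
    while True:
        yield result


result_gen = lazy_result(expensive_function, args=(3, 'sweet'))
ran_before_next = len(calls)   # generator body has not started yet
result = next(result_gen)      # triggers the calculation
again = next(result_gen)       # same object, no recomputation
print(ran_before_next, len(calls), result is again)
```

Creating the generator costs nothing; the work happens on the first next() call, and later calls hand back the same object. Because the generator holds its own reference to the result, memory is released only once both result and result_gen are dropped.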