Tuesday, April 11, 2006

Bootstrap statistics in Python

bootstrap.py version 0.1, 3/08/06

Python didn't seem to have a straightforward published bootstrap statistics module, so here is my effort.

Suppose you had 40 measurements

>>> dist
array([ 9.95587335, 11.14893547, 12.07186714, 9.06794986,
10.66503852, 10.7564387 , 11.42936871, 9.72575509, …
…11.41262324, 9.28556988, 8.01210309, 9.98832624])

with standard deviation 0.912. What is the uncertainty in your measurement of std=.912? Might the underlying distribution have sigma=1? (There is an exact formula in this case for Gaussian samples, but that won't always be so; and besides, it's a pain.)
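For reference, here is a minimal sketch of the Gaussian formula alluded to above. It assumes the sample standard deviation s uses the n-1 denominator (the post does not say which convention produced 0.912, and at n=40 the difference is small). The function names are hypothetical and are not part of bootstrap.py.

```python
import math

def c4(n):
    """Bias factor for Gaussian samples: E[s] = c4(n) * sigma (s uses the n-1 denominator)."""
    # lgamma keeps the ratio of gamma functions stable for large n
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def std_error_exact(sigma, n):
    """Standard deviation of s over repeated draws of n Gaussian samples with true sigma."""
    return sigma * math.sqrt(1.0 - c4(n) ** 2)

def std_error_approx(s, n):
    """Common large-n approximation, plugging the measured s in for sigma."""
    return s / math.sqrt(2.0 * (n - 1))

print(std_error_exact(1.0, 40))     # uncertainty of s if the true sigma were 1
print(std_error_approx(0.912, 40))  # plug-in estimate from the measured s
```

Both numbers come out near 0.1 for n=40. The catch is that this only holds for Gaussian data, which is exactly the limitation that motivates the bootstrap.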

Bootstrap statistics treats our 40 data points as a complete description of the underlying distribution: a "non-parametric" distribution. This approximation gets better the more points we have, and the method generally has good statistical properties even for modest sample sizes.

If the 40 points fully characterize our distribution, we can essentially do our experiment over and over again by resampling from those 40 points with replacement. bootstrap.py does this very quickly using SciPy. Given a distribution dist and an estimation function that takes a distribution and returns a number, such as scipy.std(), bootstrap.py resamples dist many times, applies the estimator to each resample, and returns the mean and standard deviation of the resulting estimates.
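To make the resampling step concrete, here is a minimal sketch of the idea in modern NumPy. This is not the source of bootstrap.py: the name bootstrap_sketch, its parameters, and the default of 2000 resamples are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_sketch(estimator, data, n_resamples=2000, seed=None):
    """Return (mean, std) of `estimator` applied to resamples of `data`.

    Hypothetical helper for illustration only, not bootstrap.fast_bootstrap.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Draw every resample at once: each row holds n indices chosen with replacement
    idx = rng.integers(0, n, size=(n_resamples, n))
    resamples = data[idx]
    # Apply the estimator to each resample (each row)
    estimates = np.array([estimator(row) for row in resamples])
    return estimates.mean(), estimates.std()

# Stand-in data: 40 Gaussian draws with mean 10 and sigma 1 (an assumption)
rng = np.random.default_rng(0)
sample = rng.normal(10.0, 1.0, size=40)
mean_est, se_est = bootstrap_sketch(np.std, sample, seed=1)
print(mean_est, se_est)
```

Building the whole index array in one call keeps the random draws inside NumPy, which is presumably the kind of trick that makes a "fast" bootstrap fast. The estimator itself is still called once per resample so that any function of a 1-D array can be plugged in.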

>>> import bootstrap
>>> print scipy.std(dist)
0.911973669727
>>> print bootstrap.fast_bootstrap(scipy.std, dist)
(0.89697434867588222, 0.10347317195461153)

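What does this mean? The first number is the mean of the bootstrap estimates and the second is their standard deviation, which serves as the uncertainty. If we had 1,000,000 data points, rather than just 40, then their standard deviation would most likely be in the interval 0.897 ± 0.103. This is consistent with sigma=1, which, you might have guessed, I used to generate the data.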
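One way to sanity-check an uncertainty like this is to actually repeat the experiment many times, which is only possible when the generating distribution is known. The sketch below assumes a Gaussian with mean 10 and sigma 1. The post confirms sigma=1, but the mean of 10 is a guess read off the sample values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_experiments = 40, 5000

# Each row is one fresh "experiment" of 40 measurements
# (assumed generator: Gaussian, mean 10, sigma 1)
experiments = rng.normal(10.0, 1.0, size=(n_experiments, n_points))

# Standard deviation of each experiment, same default convention as np.std
stds = experiments.std(axis=1)

mean_of_stds = stds.mean()    # typical measured std across experiments
spread_of_stds = stds.std()   # how much that measurement scatters
print(mean_of_stds, spread_of_stds)
```

The scatter across experiments should come out close to 0.1, in line with what the bootstrap reported from a single set of 40 points.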

  • Bootstrap.py also lets you set error tolerance and a few other parameters.
  • Are you enticed? Let me know if this was useful to you, or failed spectacularly.
  • Thanks to gumuz' devlog for making me want to share.

6 comments:

gumuz said...

thanx for the headsup! I really need to revive my blog someday soon

cheers,

gumuz
http://gumuz.looze.net/

Michael J.T. O'Kelly said...

Do that!
If you ever want to post more Python Challenge solutions, I could use a little help with Level 30...

Nicholas Crawford said...

MJTO,
Thanks for the bootstrapping example! I've incorporated a modified portion of it in a webapp I wrote (http://www.ngcrawford.com/django/jost/). I made sure to include you in the acknowledgements of the associated paper. I hope you don't mind.

- Nick

Constantine Evans said...

It seems your module is no longer available at the link you provide here. Do you have a copy of it you could post somewhere?
