A few months ago Siva Balakrishnan wrote a nice post on Larry Wasserman's blog about the Chaudhuri-Dasgupta algorithm for estimating a density cluster tree. The method, which is described in a NIPS 2010 paper "Rates of convergence for the cluster
The world hardly needs yet another tutorial on statistical computing with Python, but I made one for a live demo and I might as well post it. Also, the IPython Notebook makes me happy. http://nbviewer.ipython.org/urls/raw.github.com/papayawarrior/public_talks/master/statBytespython.ipynb In a very loose sense,
tapply is a super convenient function in R for computing statistics on a "ragged array". It lets you separate a dataset into groups based on a categorical variable, then compute any function you want on each group. Suppose we have
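
As a rough illustration of the split-then-apply idea behind tapply, here is a minimal pure-Python sketch. The helper name `tapply`, its argument order, and the toy `heights` and `sex` data are all made up for this example; it is not R's implementation and not necessarily how the post itself does it.

```python
from collections import defaultdict
from statistics import mean


def tapply(values, groups, func):
    """Hypothetical helper: apply func to the values that share each group label."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    # Split: collect the values belonging to each label of the categorical variable.
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # Apply: compute the statistic on each (possibly different-sized) group.
    return {group: func(members) for group, members in buckets.items()}


# Made-up data: one measurement and one categorical label per observation.
heights = [150, 160, 170, 180, 190]
sex = ["f", "f", "m", "m", "m"]

group_means = tapply(heights, sex, mean)
group_maxes = tapply(heights, sex, max)
print(group_means)
print(group_maxes)
```

The groups here have different sizes (two values for "f", three for "m"), which is the "ragged" part. In pandas, `groupby` followed by an aggregation covers the same pattern.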
Thoughts on Statistics and Machine Learning