Implementing the Chaudhuri-Dasgupta density cluster tree

A few months ago Siva Balakrishnan wrote a nice post on Larry Wasserman's blog about the Chaudhuri-Dasgupta algorithm for estimating a density cluster tree. The method, which is described in a NIPS 2010 paper "Rates of convergence for the cluster

Getting started with statistical computing in Python

The world hardly needs yet another tutorial on statistical computing with Python, but I made one for a live demo and I might as well post it. Also, the IPython Notebook makes me happy. In a very loose sense,

tapply in Python

tapply is a super convenient function in R for computing statistics on a "ragged array". It lets you separate a dataset into groups based on a categorical variable, then compute any function you want on each group. Suppose we have
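Here is a minimal pure-Python sketch of a tapply-style helper that shows the idea concretely. The function name `tapply` and its `(values, groups, func)` signature are hypothetical, chosen to mirror R's `tapply(X, INDEX, FUN)`. This is one possible approach, not necessarily the one the post takes. It assumes `values` and `groups` are parallel sequences and that the group labels can be sorted.

```python
from collections import defaultdict
from statistics import mean


def tapply(values, groups, func):
    """Apply func to the values in each group, in the spirit of R's tapply.

    Hypothetical helper: values and groups are parallel sequences,
    and groups holds the categorical label for each value.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    # Build the "ragged array": one list of values per group label.
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # Apply the summary function to each group. Keys are sorted to mimic
    # R's default (alphabetical) factor levels; labels must be comparable.
    return {group: func(buckets[group]) for group in sorted(buckets)}


heights = [150, 160, 170, 180, 175]
sex = ["f", "f", "m", "m", "m"]
print(tapply(heights, sex, mean))
```

Any function that accepts a list works as `func` (`len`, `sum`, `max`, and so on). If the data already lives in a pandas DataFrame, the same split-apply pattern is available through `groupby`, e.g. `df.groupby("sex")["height"].agg(mean)`.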

