k-means is an iterative, simple and fast clustering algorithm. It groups a set of n data points into k clusters. The number of clusters being a prameter of the algorithm.
First, k centroids (or means) are randomly generated. The algorithm then associates every data point to the nearest centroid. The centroids of the newly-built clusters are calculated and this process is repeated until some convergence criterion is met.
This application uses some of the clustering datasets from the Speech and Image Processing Unit at the University of Eastern Finland. I have used the Euclidean (L2) distance in this implementation. The algorithm stops when the centroids do not change between two successive iterations. It is possible to restart the algorithm and/or to switch to a new dataset at any time. The application halts for 2 seconds between successive stept to allow users inspect the outcome of each step. Small squares represent the data points and the larger squares are the centroids.
As can be seen the algorithm converges in very few steps to a local optimum. Also, the algorithm performs better when the clusters are spherical.
Future releases of the K-Means clustering animator will allow users to run the algorithm on randomly-generated datasets as well as on manually-entered data.