I am looking for Python implementation of k-means algorithm with examples to cluster and cache my database of coordinates.

Scipy's clustering implementations work well, and they include a k-means implementation.

There's also scipy-cluster, which does agglomerative clustering; ths has the advantage that you don't need to decide on the number of clusters ahead of time.

SciPy's kmeans2() has some numerical problems: others have reported error messages such as "Matrix is not positive definite - Cholesky decomposition cannot be computed" in version 0.6.0, and I just encountered the same in version 0.7.1.

For now, I would recommend using PyCluster instead. Example usage:

```
>>> import numpy
>>> import Pycluster
>>> points = numpy.vstack([numpy.random.multivariate_normal(mean,
0.03 * numpy.diag([1,1]),
20)
for mean in [(1, 1), (2, 4), (3, 2)]])
>>> labels, error, nfound = Pycluster.kcluster(points, 3)
>>> labels # Cluster number for each point
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> error # The within-cluster sum of distances for the solution
1.7721661785401261
>>> nfound # Number of times this solution was found
1
```

Licensed under: CC-BY-SA with attribution

Not affiliated with: Stack Overflow