Python k-means algorithm


Question

I am looking for Python implementation of k-means algorithm with examples to cluster and cache my database of coordinates.

1
46
10/9/2009 7:16:13 PM

Accepted Answer

Scipy's clustering implementations work well, and they include a k-means implementation.

There's also scipy-cluster, which does agglomerative clustering; ths has the advantage that you don't need to decide on the number of clusters ahead of time.

54
8/4/2015 10:50:27 PM

SciPy's kmeans2() has some numerical problems: others have reported error messages such as "Matrix is not positive definite - Cholesky decomposition cannot be computed" in version 0.6.0, and I just encountered the same in version 0.7.1.

For now, I would recommend using PyCluster instead. Example usage:

>>> import numpy
>>> import Pycluster
>>> points = numpy.vstack([numpy.random.multivariate_normal(mean, 
                                                            0.03 * numpy.diag([1,1]),
                                                            20) 
                           for mean in [(1, 1), (2, 4), (3, 2)]])
>>> labels, error, nfound = Pycluster.kcluster(points, 3)
>>> labels  # Cluster number for each point
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> error   # The within-cluster sum of distances for the solution
1.7721661785401261
>>> nfound  # Number of times this solution was found
1

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Icon