Stability of k-Means Clustering

Ben-David, Shai; Pál, Dávid; Simon, Hans Ulrich

doi:10.1007/978-3-540-72927-3_4

Shai Ben-David¹, Dávid Pál¹ & Hans Ulrich Simon²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4539)

Included in the following conference series:

International Conference on Computational Learning Theory

Abstract

We consider the stability of k-means clustering problems. Clustering stability is a common heuristic used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal solutions to the clustering optimization problem. Our results complement earlier work of Ben-David, von Luxburg and Pál by settling the main problem left open there. Our analysis shows that, for probability distributions with finite support, the stability of k-means clusterings depends solely on the number of optimal solutions to the underlying optimization problem for the data distribution. These results challenge the common belief and practice that view stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.
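
The quantity the abstract refers to, the number of optimal solutions of the k-means problem for a finite-support distribution, can be computed by brute force on tiny examples. The following is a minimal illustrative sketch and is not taken from the paper: the function names `kmeans_cost` and `optimal_partitions`, the two toy distributions, and the choice to identify a solution with the partition of the support it induces are all assumptions made for illustration. Exact rational arithmetic is used so that ties between partitions are detected without floating-point tolerance.

```python
from fractions import Fraction
from itertools import product


def kmeans_cost(points, weights, labels, k):
    """Weighted k-means cost of a fixed assignment (1-D points).

    Each non-empty cluster uses its weighted mean as center, which is
    the cost-minimizing center for that cluster.
    """
    total = Fraction(0)
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        mass = sum(weights[i] for i in members)
        mean = sum(weights[i] * points[i] for i in members) / mass
        total += sum(weights[i] * (points[i] - mean) ** 2 for i in members)
    return total


def optimal_partitions(points, weights, k):
    """Enumerate all assignments of support points to k labels and
    return (optimal cost, set of distinct optimal partitions)."""
    points = [Fraction(p) for p in points]
    weights = [Fraction(w) for w in weights]
    best_cost, best = None, set()
    for labels in product(range(k), repeat=len(points)):
        cost = kmeans_cost(points, weights, labels, k)
        # Canonical, label-independent form: a set of index sets.
        partition = frozenset(
            frozenset(i for i, lab in enumerate(labels) if lab == c)
            for c in range(k)
        ) - {frozenset()}
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, {partition}
        elif cost == best_cost:
            best.add(partition)
    return best_cost, best


# Hypothetical toy distributions on the support {-1, 0, 1}.
support = [-1, 0, 1]

# Uniform weights: the middle point can join either side at equal cost.
sym_cost, sym_opt = optimal_partitions(support, [Fraction(1, 3)] * 3, k=2)
print(sym_cost, len(sym_opt))    # 1/6 2

# Heavier left point: the tie is broken and one partition is optimal.
asym_cost, asym_opt = optimal_partitions(
    support, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)], k=2
)
print(asym_cost, len(asym_opt))  # 1/8 1
```

The enumeration is exponential in the support size, so this is only meant for hand-checkable cases. Under the characterization described in the abstract, it is this count of optimal solutions, rather than how natural the clusters look, that governs stability.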

References

Extended version of this paper. Available at http://www.cs.uwaterloo.ca/~dpal/papers/stability/stability.pdf or at http://www.cs.uwaterloo.ca/~shai/publications/stability.pdf
Ben-David, S.: A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In: Proceedings of the Conference on Computational Learning Theory, pp. 415–426 (2004)
Ben-David, S., von Luxburg, U., Pál, D.: A sober look at clustering stability. In: Proceedings of the Conference on Computational Learning Theory, pp. 5–19 (2006)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing 7, 6–17 (2002)
Dudoit, S., Fridlyand, J.: A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3(7) (2002)
Lange, T., Braun, M.L., Roth, V., Buhmann, J.: Stability-based model selection. Advances in Neural Information Processing Systems 15, 617–624 (2003)
Levine, E., Domany, E.: Resampling method for unsupervised estimation of cluster validity. Neural Computation 13(11), 2573–2593 (2001)
Meila, M.: Comparing clusterings. In: Proceedings of the Conference on Computational Learning Theory, pp. 173–187 (2003)
Pollard, D.: Strong consistency of k-means clustering. The Annals of Statistics 9(1), 135–140 (1981)
Rakhlin, A., Caponnetto, A.: Stability of k-means clustering. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, MA (2007)
von Luxburg, U., Ben-David, S.: Towards a statistical theory of clustering. In: PASCAL Workshop on Statistics and Optimization of Clustering (2005)

Author information

Authors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
Shai Ben-David & Dávid Pál
Ruhr-Universität Bochum, Germany
Hans Ulrich Simon

Editor information

Nader H. Bshouty, Claudio Gentile

Cite this paper

Ben-David, S., Pál, D., Simon, H.U. (2007). Stability of k-Means Clustering. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_4

DOI: https://doi.org/10.1007/978-3-540-72927-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3
eBook Packages: Computer Science, Computer Science (R0)
