A Bad Instance for k-Means++

Brunsch, Tobias; Röglin, Heiko

doi:10.1007/978-3-642-20877-5_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6648)

Included in the following conference series:

International Conference on Theory and Applications of Models of Computation

Abstract

k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(log k), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(log k), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even with constant probability. We settle this question and present instances on which k-means++ achieves an approximation ratio of (2/3 − ε)·log k only with exponentially small probability.
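
For readers unfamiliar with the seeding step the abstract refers to, the following is a minimal sketch of the standard k-means++ rule (D² sampling) as it is commonly described: the first center is chosen uniformly at random, and each further center is chosen with probability proportional to its squared distance to the nearest center chosen so far. The function names `sq_dist`, `kmeanspp_seed`, and `kmeans_cost` are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's bad instance.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Sample proportionally to squared distance to the nearest center.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(old, sq_dist(p, new_center)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

The approximation ratio discussed in the abstract compares `kmeans_cost` for the sampled centers against the cost of an optimal set of k centers; computing that optimum is not attempted here.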

A part of this work was done at Maastricht University and was supported by a Veni grant from the Netherlands Organisation for Scientific Research.

Author information

Authors and Affiliations

Department of Computer Science, University of Bonn, Germany
Tobias Brunsch & Heiko Röglin

Editor information

Editors and Affiliations

Department of Computer Science, University of Miami, 1365 Memorial Drive, 33146, Coral Gables, FL, USA
Mitsunori Ogihara
Department of Information and Communication Engineering, University of Electro-Communications, Chofugaoka 1-5-1, Chofu, 182-8585, Tokyo, Japan
Jun Tarui

Cite this paper

Brunsch, T., Röglin, H. (2011). A Bad Instance for k-Means++. In: Ogihara, M., Tarui, J. (eds) Theory and Applications of Models of Computation. TAMC 2011. Lecture Notes in Computer Science, vol 6648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20877-5_34

DOI: https://doi.org/10.1007/978-3-642-20877-5_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20876-8
Online ISBN: 978-3-642-20877-5
eBook Packages: Computer Science, Computer Science (R0)
