Abstract
k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(logk), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(logk), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even with constant probability. We settle this question and present instances on which k-means++ achieves an approximation ratio of (2/3 − ε)·logk only with exponentially small probability.
A part of this work was done at Maastricht University and was supported by a Veni grant from the Netherlands Organisation for Scientific Research.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)
Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Machine Learning 75(2), 245–248 (2009)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proc. of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 405–414 (2009)
Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035. SIAM, Philadelphia (2007)
Berkhin, P.: Survey of Clustering Data Mining Techniques. Technical report, Accrue Software (2002)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30 (1963)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–136 (1982)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proc. of the 25th ACM Symposium on Computational Geometry (SoCG), pp. 324–332. ACM, New York (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brunsch, T., Röglin, H. (2011). A Bad Instance for k-Means++. In: Ogihara, M., Tarui, J. (eds) Theory and Applications of Models of Computation. TAMC 2011. Lecture Notes in Computer Science, vol 6648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20877-5_34
Download citation
DOI: https://doi.org/10.1007/978-3-642-20877-5_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20876-8
Online ISBN: 978-3-642-20877-5
eBook Packages: Computer ScienceComputer Science (R0)