Abstract
We study the problem of determining an optimal bipartition {A,B} of a set X of n points in ℝ² that minimizes the sum of the sample variances of A and B, under the size constraints |A| = k and |B| = n − k. We present two algorithms for this problem. The first computes the solution in \(O(n\sqrt[3]{k}\log^2 n)\) time by using known results on convex hulls and k-sets. The second, for an input X ⊂ ℝ² of size n, solves the problem for all \(k=1,2,\ldots,\lfloor n/2\rfloor\) and runs in \(O(n^2 \log n)\) time.
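To make the problem statement concrete, the following is a minimal brute-force sketch of the objective. It is not either of the two algorithms announced in the abstract: it simply enumerates every k-subset and therefore takes exponential time. It assumes that the "sample variance" of a cluster means the unnormalized sum of squared Euclidean distances from its points to their centroid, an interpretation the abstract does not spell out. The function names `sum_sq_to_centroid` and `brute_force_bipartition` are hypothetical.

```python
from itertools import combinations


def sum_sq_to_centroid(points):
    """Sum of squared Euclidean distances from the points to their centroid.

    Assumption: this unnormalized quantity stands in for the 'sample
    variance' of a cluster; the abstract does not fix the normalization.
    """
    m = len(points)
    cx = sum(x for x, _ in points) / m
    cy = sum(y for _, y in points) / m
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)


def brute_force_bipartition(points, k):
    """Try every k-subset A of points and return (cost, A, B) of minimum cost.

    Exhaustive reference check for small inputs only; ties keep the
    first subset encountered.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    best = None
    for idx in combinations(range(n), k):
        chosen = set(idx)
        a = [points[i] for i in idx]
        b = [points[i] for i in range(n) if i not in chosen]
        cost = sum_sq_to_centroid(a) + sum_sq_to_centroid(b)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best


if __name__ == "__main__":
    # Two well-separated pairs of points, split into clusters of size 2.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(brute_force_bipartition(pts, 2))
```

Such an exhaustive search is only practical for very small n, but it can serve as a reference against which a faster implementation is checked.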
This research has been supported by project PRIN #H41J12000190001 “Automata and formal languages: mathematical and applicative aspects”.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bertoni, A., Goldwurm, M., Lin, J. (2015). Exact Algorithms for 2-Clustering with Size Constraints in the Euclidean Plane. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, JJ., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_11
DOI: https://doi.org/10.1007/978-3-662-46078-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46077-1
Online ISBN: 978-3-662-46078-8
eBook Packages: Computer Science (R0)