Abstract
Size-Constrained clustering tries to solve the problem that how to classify dataset into groups based on each document’s similarity with additional requirement which each group size is within a fixed range. By far, adding constraints to assignment step in K-Means clustering is a main approach. But the performance of the algorithm also depends highly on the initial cluster centers like standard K-Means. We propose an initial points selection method by recursively discovering the point with large density around it. Root Mean Square Error and convergence speed (iteration times) are the two most important evaluation standards for clustering using an iterative procedure. Our experiments are conducted on about ten thousand research proposals of National Natural Science Foundation of China and the results show that our method can reduce the iteration times by over 50% and get smaller Root Mean Square Error. The method is scalable and can be coupled with a scalable size-constrained clustering algorithm to address the large-scale clustering problem in data mining.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Dubes, R.C., Jain, A.K.: Algorithms for Clustering Data. Prentice Hall (1988)
Bradley, P.S., Bennett, K.P., Demiriz, A.: Constrained K-Means Clustering. MSR-RT-2000-65, Microsoft Research (2000)
Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)
Kelly, D.J., O’Neill, G.M.: The Minimum Cost Flow Problem and The Network Simplex Solution Method (September 1991)
Zhao, J.: Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms (2006)
Zhu, S., Wang, D., Li, T.: Data clustering with size constraints. Knowledge-Based Systems 23(8), 883–889 (2010)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley and Sons, NY (1973)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)
Bradley, P.S., Fayyad, U.M.: Refing initial points for K-Means clustering. In: 15th Internat. Conf. on Machine Learning (1998)
Deelers, S., Auwatanamongkol, S.: Enhancing K-Means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance. Internat. J. Comput. Sci. 2, 247–252 (2007)
Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for K-Means clustering. Pattern Recognition Letters 25(11), 1293–1302 (2004)
Steinley, D., Brusco, J.M.: Initialization K-Means Batch Clustering: A Critical Evaluation of Several Techniques. Journal of Classification 24(1), 99–121 (2007)
Lozano, J.A., Pena, J.M., Larranaga, P.: An empirical comparison of four initialization methods for the K-Means algorithm. Pattern Recognition Lett. 20, 1027–1040 (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lei, K., Wang, S., Song, W., Li, Q. (2013). Size-Constrained Clustering Using an Initial Points Selection Method. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science(), vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_16
Download citation
DOI: https://doi.org/10.1007/978-3-642-39787-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39786-8
Online ISBN: 978-3-642-39787-5
eBook Packages: Computer ScienceComputer Science (R0)