Size-Constrained Clustering Using an Initial Points Selection Method

Lei, Kai; Wang, Sibo; Song, Weiwei; Li, Qilin

doi:10.1007/978-3-642-39787-5_16

Kai Lei²⁰,
Sibo Wang²⁰,
Weiwei Song²⁰ &
…
Qilin Li²⁰

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8041))

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

2301 Accesses
2 Citations

Abstract

Size-Constrained clustering tries to solve the problem that how to classify dataset into groups based on each document’s similarity with additional requirement which each group size is within a fixed range. By far, adding constraints to assignment step in K-Means clustering is a main approach. But the performance of the algorithm also depends highly on the initial cluster centers like standard K-Means. We propose an initial points selection method by recursively discovering the point with large density around it. Root Mean Square Error and convergence speed (iteration times) are the two most important evaluation standards for clustering using an iterative procedure. Our experiments are conducted on about ten thousand research proposals of National Natural Science Foundation of China and the results show that our method can reduce the iteration times by over 50% and get smaller Root Mean Square Error. The method is scalable and can be coupled with a scalable size-constrained clustering algorithm to address the large-scale clustering problem in data mining.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

Big data: an optimized approach for cluster initialization

Article Open access 20 July 2023

Initial Centroid Selection Method for an Enhanced K-means Clustering Algorithm

On Various Types of Even-Sized Clustering Based on Optimization

Keywords

References

Dubes, R.C., Jain, A.K.: Algorithms for Clustering Data. Prentice Hall (1988)
Google Scholar
Bradley, P.S., Bennett, K.P., Demiriz, A.: Constrained K-Means Clustering. MSR-RT-2000-65, Microsoft Research (2000)
Google Scholar
Bertsekas, D.P.: Linear Network Optimization. MIT Press, Cambridge (1991)
MATH Google Scholar
Kelly, D.J., O’Neill, G.M.: The Minimum Cost Flow Problem and The Network Simplex Solution Method (September 1991)
Google Scholar
Zhao, J.: Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms (2006)
Google Scholar
Zhu, S., Wang, D., Li, T.: Data clustering with size constraints. Knowledge-Based Systems 23(8), 883–889 (2010)
Article MathSciNet Google Scholar
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley and Sons, NY (1973)
MATH Google Scholar
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)
MATH Google Scholar
Bradley, P.S., Fayyad, U.M.: Refing initial points for K-Means clustering. In: 15th Internat. Conf. on Machine Learning (1998)
Google Scholar
Deelers, S., Auwatanamongkol, S.: Enhancing K-Means algorithm with initial cluster centers derived from data partitioning along the data axis with the highest variance. Internat. J. Comput. Sci. 2, 247–252 (2007)
Google Scholar
Khan, S.S., Ahmad, A.: Cluster center initialization algorithm for K-Means clustering. Pattern Recognition Letters 25(11), 1293–1302 (2004)
Article Google Scholar
Steinley, D., Brusco, J.M.: Initialization K-Means Batch Clustering: A Critical Evaluation of Several Techniques. Journal of Classification 24(1), 99–121 (2007)
Article MathSciNet MATH Google Scholar
Lozano, J.A., Pena, J.M., Larranaga, P.: An empirical comparison of four initialization methods for the K-Means algorithm. Pattern Recognition Lett. 20, 1027–1040 (1999)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Shenzhen Key Lab for Cloud Computing Technology & Applications (SPCCTA), School of Electronics and Computer Engineering, Peking University, Shenzhen, P.R. China
Kai Lei, Sibo Wang, Weiwei Song & Qilin Li

Authors

Kai Lei
View author publications
You can also search for this author in PubMed Google Scholar
Sibo Wang
View author publications
You can also search for this author in PubMed Google Scholar
Weiwei Song
View author publications
You can also search for this author in PubMed Google Scholar
Qilin Li
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Faculty of Management and Economics, Dalian University of Technology, 116023, Dalian, China
Mingzheng Wang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lei, K., Wang, S., Song, W., Li, Q. (2013). Size-Constrained Clustering Using an Initial Points Selection Method. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science(), vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_16

Download citation

DOI: https://doi.org/10.1007/978-3-642-39787-5_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39786-8
Online ISBN: 978-3-642-39787-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Size-Constrained Clustering Using an Initial Points Selection Method

Abstract

Chapter PDF

Similar content being viewed by others

Big data: an optimized approach for cluster initialization

Initial Centroid Selection Method for an Enhanced K-means Clustering Algorithm

On Various Types of Even-Sized Clustering Based on Optimization

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Size-Constrained Clustering Using an Initial Points Selection Method

Abstract

Chapter PDF

Similar content being viewed by others

Big data: an optimized approach for cluster initialization

Initial Centroid Selection Method for an Enhanced K-means Clustering Algorithm

On Various Types of Even-Sized Clustering Based on Optimization

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation