Abstract
The k-means algorithm is one of the most widely used clustering algorithms, but its running time becomes prohibitive on large-scale and high-dimensional data sets. In this paper, we propose an efficient heuristic algorithm whose main idea is to narrow the search space of candidate centers for each sample point and to reduce the number of sample points reassigned in each iteration. Experimental results show that our algorithm achieves excellent time performance on most data sets, with clustering quality nearly identical to, and sometimes better than, that of the exact k-means algorithm.
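The two ideas named in the abstract — restricting each point to a small set of candidate centers, and reassigning only points whose candidate centers have moved — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' published algorithm: the function `heuristic_kmeans`, the `n_candidates` parameter, and the center-movement test are all choices made here for demonstration.

```python
import numpy as np

def heuristic_kmeans(X, k, n_candidates=3, n_iter=20, init=None, seed=0):
    """Lloyd-style k-means with two illustrative pruning heuristics:
    (1) after one full pass, each point keeps only its nearest few centers
        as candidates, shrinking the per-point search space;
    (2) a point is re-examined only when one of its candidate centers has
        moved, reducing the number of reassignments per iteration."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), k, replace=False)].astype(float).copy()
    else:
        centers = np.asarray(init, dtype=float).copy()
    # One full pass: distances from every point to every center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cand = np.argsort(d, axis=1)[:, :n_candidates]  # candidate centers per point
    labels = cand[:, 0].copy()                      # start at the nearest center
    for _ in range(n_iter):
        old = centers.copy()
        for j in range(k):                          # centroid update step
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
        moved = np.linalg.norm(centers - old, axis=1) > 1e-9
        # Only points with at least one moved candidate need re-examination.
        active = moved[cand].any(axis=1)
        if not active.any():                        # converged
            break
        ca = cand[active]                           # (m, n_candidates)
        dc = np.linalg.norm(X[active][:, None, :] - centers[ca], axis=2)
        labels[active] = ca[np.arange(len(ca)), np.argmin(dc, axis=1)]
    return labels, centers
```

Both heuristics trade exactness for speed: a point whose true nearest center falls outside its candidate list is assigned suboptimally, which is why the abstract compares clustering quality against the exact k-means baseline.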
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China (2019QY(Y)0301), the National Natural Science Foundation of China under Grants 62176033 and 61936001, and the Natural Science Foundation of Chongqing under Grant cstc2019jcyj-cxttX0002.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, J., Wen, Q., Chen, Z. (2022). A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information. In: Jain, L.C., Kountchev, R., Tai, Y., Kountcheva, R. (eds) 3D Imaging—Multidimensional Signal Processing and Deep Learning. Smart Innovation, Systems and Technologies, vol 297. Springer, Singapore. https://doi.org/10.1007/978-981-19-2448-4_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2447-7
Online ISBN: 978-981-19-2448-4
eBook Packages: Intelligent Technologies and Robotics (R0)