Abstract
The k-means algorithm is one of the most widely used clustering algorithms, but its running time becomes prohibitive on large-scale and high-dimensional data sets. In this paper, we propose an efficient heuristic algorithm whose main idea is to narrow the search space of candidate centers for each sample point and to reduce the number of sample points reassigned in each iteration. Experimental results show that our algorithm achieves excellent time performance on most data sets, with clustering quality nearly identical to, and sometimes better than, that of the exact k-means algorithm.
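The two ideas named in the abstract — restricting each point to a small set of candidate centers, and reassigning only points whose candidate centers have moved — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' published algorithm: the function `heuristic_kmeans`, the `n_candidates` parameter, and the center-movement test are all choices made here for demonstration.

```python
import numpy as np

def heuristic_kmeans(X, k, n_candidates=3, n_iter=20, init=None, seed=0):
    """Lloyd-style k-means with two illustrative pruning heuristics:
    (1) after one full pass, each point keeps only its nearest few centers
        as candidates, shrinking the per-point search space;
    (2) a point is re-examined only when one of its candidate centers has
        moved, reducing the number of reassignments per iteration."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), k, replace=False)].astype(float).copy()
    else:
        centers = np.asarray(init, dtype=float).copy()
    # One full pass: distances from every point to every center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cand = np.argsort(d, axis=1)[:, :n_candidates]  # candidate centers per point
    labels = cand[:, 0].copy()                      # start at the nearest center
    for _ in range(n_iter):
        old = centers.copy()
        for j in range(k):                          # centroid update step
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
        moved = np.linalg.norm(centers - old, axis=1) > 1e-9
        # Only points with at least one moved candidate need re-examination.
        active = moved[cand].any(axis=1)
        if not active.any():                        # converged
            break
        ca = cand[active]                           # (m, n_candidates)
        dc = np.linalg.norm(X[active][:, None, :] - centers[ca], axis=2)
        labels[active] = ca[np.arange(len(ca)), np.argmin(dc, axis=1)]
    return labels, centers
```

Both heuristics trade exactness for speed: a point whose true nearest center falls outside its candidate list is assigned suboptimally, which is why the abstract compares clustering quality against the exact k-means baseline.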
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China (2019QY(Y)0301), the National Natural Science Foundation of China under Grants 62176033 and 61936001, and the Natural Science Foundation of Chongqing under Grant cstc2019jcyj-cxttX0002.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, J., Wen, Q., Chen, Z. (2022). A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information. In: Jain, L.C., Kountchev, R., Tai, Y., Kountcheva, R. (eds) 3D Imaging—Multidimensional Signal Processing and Deep Learning. Smart Innovation, Systems and Technologies, vol 297. Springer, Singapore. https://doi.org/10.1007/978-981-19-2448-4_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2447-7
Online ISBN: 978-981-19-2448-4
eBook Packages: Intelligent Technologies and Robotics (R0)