A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information

Conference paper in: 3D Imaging—Multidimensional Signal Processing and Deep Learning

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 297)

Abstract

The k-means algorithm is a widely used clustering algorithm, but its time overhead is high on large-scale and high-dimensional data sets. In this paper, we propose an efficient heuristic algorithm whose main idea is to narrow the search space of each sample point and to reduce the number of sample points reallocated during each iteration. Experimental results show that our algorithm achieves excellent time performance on most data sets, with clustering quality comparable to, and sometimes better than, that of the exact k-means algorithm.
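The two ideas named in the abstract, narrowing each point's search space and skipping points that do not need to be reallocated, can be illustrated with a short sketch. The snippet below is not the authors' algorithm: the parameters n_candidates and move_tol, and the choice of using each point's nearest initial centers as its fixed candidate set, are illustrative assumptions made only to show how such a heuristic can be wired together.

```python
import numpy as np

def heuristic_kmeans(X, k, n_candidates=3, move_tol=1e-3, max_iter=100, seed=0):
    """Heuristic k-means sketch: restrict each point to a few candidate
    centers and skip points whose candidates barely moved (illustrative)."""
    rng = np.random.default_rng(seed)
    n_candidates = min(n_candidates, k)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()

    # One full pass to build each point's candidate set (its n_candidates
    # nearest initial centers) and its initial label.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    candidates = np.argsort(d, axis=1)[:, :n_candidates]
    labels = candidates[:, 0]

    for _ in range(max_iter):
        old_centers = centers.copy()
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)

        shift = np.linalg.norm(centers - old_centers, axis=1)
        if shift.max() < move_tol:
            break

        # Assignment step restricted to each point's candidate centers;
        # points whose candidates all stayed (nearly) fixed keep their
        # current label, reducing the number of reallocated points.
        active = shift[candidates].max(axis=1) >= move_tol
        idx = np.nonzero(active)[0]
        if idx.size:
            dc = np.linalg.norm(X[idx, None, :] - centers[candidates[idx]], axis=2)
            labels[idx] = candidates[idx, np.argmin(dc, axis=1)]
    return labels, centers
```

In this sketch the candidate sets are fixed after the first pass, trading a little assignment accuracy for far fewer distance computations per iteration; the paper's actual heuristic uses nearest neighbor information to decide which points and centers must be re-examined.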



Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (2019QY(Y)0301), the National Natural Science Foundation of China under Grants 62176033 and 61936001, and the Natural Science Foundation of Chongqing under Grant cstc2019jcyj-cxttX0002.

Author information

Corresponding author

Correspondence to Zizhong Chen.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, J., Wen, Q., Chen, Z. (2022). A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information. In: Jain, L.C., Kountchev, R., Tai, Y., Kountcheva, R. (eds) 3D Imaging—Multidimensional Signal Processing and Deep Learning. Smart Innovation, Systems and Technologies, vol 297. Springer, Singapore. https://doi.org/10.1007/978-981-19-2448-4_11
