An Improved K-means Clustering Algorithm Based on Hadoop Platform

Hou, Xiangru

doi:10.1007/978-3-030-15235-2_146

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 928)

Included in the following conference series:

The International Conference on Cyber Security Intelligence and Analytics

Abstract

The K-means algorithm clusters poorly when applied to massive high-dimensional data on the Hadoop platform, and existing improved algorithms are not conducive to parallelization. To address both problems, an improved K-means algorithm based on hashing is proposed for the Hadoop platform. First, the massive high-dimensional data is mapped to a compressed identification space, the clustering relationships in that space are mined, and the initial cluster centers are selected from them. This avoids the sensitivity of the traditional K-means algorithm to randomly selected initial cluster centers and reduces the number of iterations. Second, the whole algorithm is parallelized in the MapReduce framework, and the degree of parallelism and execution efficiency are enhanced through the partition and combine mechanisms. Finally, experiments show that the algorithm not only improves the accuracy and stability of clustering but also achieves good processing speed.
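The map, combine, and reduce roles mentioned in the abstract can be illustrated with a minimal single-process sketch of one K-means iteration. This is not the chapter's implementation: it uses plain Python rather than the Hadoop API, the function names (`map_phase`, `combine_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and the initial centers are taken as given rather than derived by the hash-based selection step, whose details the abstract does not state.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centers):
    """Map: for each point in one input split, emit (nearest center index, (point, 1))."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield idx, (p, 1)


def combine_phase(mapped):
    """Combine: pre-aggregate one split's map output into per-cluster (vector sum, count)."""
    partial = {}
    for idx, (vec, n) in mapped:
        if idx in partial:
            s, c = partial[idx]
            partial[idx] = (tuple(a + b for a, b in zip(s, vec)), c + n)
        else:
            partial[idx] = (tuple(vec), n)
    return list(partial.items())


def reduce_phase(combined, centers):
    """Reduce: merge the partial sums from all splits and compute the updated centers."""
    grouped = defaultdict(list)
    for idx, pair in combined:
        grouped[idx].append(pair)
    new_centers = list(centers)  # a cluster that received no points keeps its old center
    for idx, pairs in grouped.items():
        count = sum(n for _, n in pairs)
        dim = len(pairs[0][0])
        new_centers[idx] = tuple(sum(s[d] for s, _ in pairs) / count for d in range(dim))
    return new_centers


def kmeans_iteration(splits, centers):
    """Run one iteration over all splits (sequentially here; in parallel on a cluster)."""
    combined = []
    for split in splits:
        combined.extend(combine_phase(map_phase(split, centers)))
    return reduce_phase(combined, centers)


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
    print(kmeans_iteration(splits, [(1.0, 1.0), (9.0, 9.0)]))
```

Because the combine step forwards only one (sum, count) pair per cluster per split, the data passed to the reduce step grows with the number of clusters and splits rather than with the number of points, which is the usual reason a combiner improves execution efficiency.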

Author information

Authors and Affiliations

Department of Information Engineering, Heilongjiang International University, Harbin, 150025, China
Xiangru Hou

Corresponding author

Correspondence to Xiangru Hou.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Zheng Xu
University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo
University of Guelph, Guelph, ON, Canada
Ali Dehghantanha
Kennesaw State University, Marietta, GA, USA
Reza Parizi
Manchester Metropolitan University, Stockport, UK
Mohammad Hammoudeh

About this paper

Cite this paper

Hou, X. (2020). An Improved K-means Clustering Algorithm Based on Hadoop Platform. In: Xu, Z., Choo, KK., Dehghantanha, A., Parizi, R., Hammoudeh, M. (eds) Cyber Security Intelligence and Analytics. CSIA 2019. Advances in Intelligent Systems and Computing, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-030-15235-2_146

DOI: https://doi.org/10.1007/978-3-030-15235-2_146
Published: 25 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15234-5
Online ISBN: 978-3-030-15235-2
eBook Packages: Intelligent Technologies and Robotics (R0)
