Abstract
Cloud computing is valued for its high data reliability, low cost, and nearly unlimited storage. The MapReduce distributed computing model is prevalent in cloud computing projects and is organized around two functions, Map and Reduce. The Map function, acting as the mapper, divides a job (such as an uploaded file) into many small tasks that are executed separately; the Reduce function, acting as the reducer, summarizes the results of those tasks. MapReduce is a scalable and fault-tolerant data processing tool that can process very large volumes of data in parallel on many low-end computing nodes. This paper implements a wordcount program on the MapReduce framework and tests it with different dividing methods and data sizes. Faults commonly faced by the MapReduce framework also emerged during the experiments. The paper proposes schemes to improve the efficiency of the framework, in particular building an index or using a machine learning model to alleviate data skew. It also recommends that an application system be a hybrid system whose modules handle different kinds of tasks.
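The division of labor between Map and Reduce described above can be illustrated with a minimal single-process sketch of word counting. This is not the authors' implementation and does not use any MapReduce framework; the function names (`split_input`, `map_words`, `shuffle`, `reduce_counts`, `word_count`) and the line-based splitting strategy are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(text: str, num_splits: int) -> List[str]:
    """Divide the input into chunks of whole lines (hypothetical splitter)."""
    lines = text.splitlines()
    # Ceiling division so that at most num_splits chunks are produced.
    size = max(1, -(-len(lines) // num_splits))
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: summarize all partial counts for one word."""
    return word, sum(counts)


def word_count(text: str, num_splits: int = 2) -> Dict[str, int]:
    """Run split, map, shuffle, and reduce sequentially in one process."""
    chunks = split_input(text, num_splits)
    pairs = (pair for chunk in chunks for pair in map_words(chunk))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count("a b a\nb c\na", num_splits=2))
```

In a real cluster each chunk would be mapped on a separate node and the grouped keys would be partitioned across reducers; a word that occurs far more often than the others then concentrates work on one reducer, which is the data skew the abstract refers to.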
Rongpei Han and Yiting Wang contributed equally.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Han, R., Wang, Y. (2023). The Advance and Performance Analysis of MapReduce. In: Yadav, S., Kumar, H., Kankar, P.K., Dai, W., Huang, F. (eds) Proceedings of 2nd International Conference on Artificial Intelligence, Robotics, and Communication. ICAIRC 2022. Lecture Notes in Electrical Engineering, vol 1063. Springer, Singapore. https://doi.org/10.1007/978-981-99-4554-2_20
DOI: https://doi.org/10.1007/978-981-99-4554-2_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4553-5
Online ISBN: 978-981-99-4554-2
eBook Packages: Intelligent Technologies and Robotics (R0)