Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1063)

Abstract

Cloud computing is valued for its high data reliability, low cost, and nearly unlimited storage. In cloud computing projects, the MapReduce distributed computing model is prevalent. The model is divided into two functions, Map and Reduce. The Map function, acting as the mapper, divides a task (such as an uploaded file) into multiple small tasks that are executed separately; the Reduce function, acting as the reducer, summarizes the results of those decomposed tasks. MapReduce is a scalable and fault-tolerant data processing tool that can process very large volumes of data in parallel on many low-end computing nodes. This paper implements a wordcount program based on the MapReduce framework and tests it with different dividing methods and data sizes. Common faults of the MapReduce framework also emerged during the experiments. This paper proposes schemes to improve the efficiency of the MapReduce framework; in particular, building an index or using a machine learning model to alleviate data skew is proposed to improve program efficiency. It is also recommended that the application system be a hybrid system with different modules to process different kinds of tasks.
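
The Map and Reduce roles described in the abstract can be illustrated with a minimal, single-process word count sketch. This is not the authors' implementation and does not use a real MapReduce framework; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and the grouping step that a framework would perform across nodes is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (simulated in memory; an assumption)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: summarize all partial counts for one word."""
    return word, sum(counts)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially over the input chunks."""
    intermediate = [pair for chunk in chunks for pair in map_words(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one split of an uploaded file.
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

In this sketch each chunk is mapped independently, which is the property that lets a real framework spread the Map work across many nodes. If one word dominates the input, its list in `shuffle` becomes much longer than the others, which is a small-scale picture of the data skew the abstract mentions.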

Rongpei Han and Yiting Wang contributed equally.

Author information

Corresponding author

Correspondence to Rongpei Han.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Han, R., Wang, Y. (2023). The Advance and Performance Analysis of MapReduce. In: Yadav, S., Kumar, H., Kankar, P.K., Dai, W., Huang, F. (eds) Proceedings of 2nd International Conference on Artificial Intelligence, Robotics, and Communication. ICAIRC 2022. Lecture Notes in Electrical Engineering, vol 1063. Springer, Singapore. https://doi.org/10.1007/978-981-99-4554-2_20
