Performance Improvement of Heterogeneous Cluster of Big Data Using Query Optimization and MapReduce

  • Conference paper
Information Management and Machine Intelligence (ICIMMI 2019)

Abstract

Data volumes are exploding: more data has been created in the past couple of years than in the entire previous history of the human race. Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every person on Earth. We therefore need a platform, in both hardware and software, to process this large volume of complex information. The MapReduce framework has gained wide popularity as a scalable distributed computing environment for efficient processing of large-scale data on the order of terabytes or more. Hadoop, an open-source implementation of MapReduce combined with the Hadoop Distributed File System (HDFS), is widely used to support cluster computing jobs that require low response time. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous in nature. In this paper, we propose a new algorithm that addresses these issues for both commercial and non-commercial uses, to the benefit of the community. Our experiments establish that, when a procedure is defined to handle the different use cases, one can reduce the overall cost of computing and benefit from relying on distributed systems for fast execution.
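To illustrate why the homogeneity assumption mentioned above matters, the following is a minimal sketch and not the algorithm proposed in the paper. It compares an even split of the input across nodes with a split proportional to node speed. The node speeds, the input size, and the function names are hypothetical values chosen only for illustration.

```python
def makespan(shares, speeds):
    """Time until the slowest node finishes: max over nodes of work / speed."""
    return max(share / speed for share, speed in zip(shares, speeds))


def even_split(total, speeds):
    """Give every node the same amount of data (assumes homogeneous nodes)."""
    return [total / len(speeds)] * len(speeds)


def proportional_split(total, speeds):
    """Give every node data in proportion to its processing speed."""
    total_speed = sum(speeds)
    return [total * speed / total_speed for speed in speeds]


# Hypothetical cluster: two slow nodes and one node that is four times faster.
speeds = [1.0, 1.0, 4.0]
total = 600.0  # hypothetical input size, in arbitrary units

even_time = makespan(even_split(total, speeds), speeds)
prop_time = makespan(proportional_split(total, speeds), speeds)

print("even split makespan:", even_time)
print("proportional split makespan:", prop_time)
```

With these assumed numbers, the even split is held back by the slow nodes and finishes in 200 time units, while the speed-aware split finishes in 100. A real Hadoop cluster adds effects this sketch ignores, such as data locality, network transfer, and speculative execution.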

Corresponding author

Correspondence to Pankaj Dadheech.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dadheech, P., Goyal, D., Srivastava, S., Kumar, A., Bhardwaj, M. (2021). Performance Improvement of Heterogeneous Cluster of Big Data Using Query Optimization and MapReduce. In: Goyal, D., Bălaş, V.E., Mukherjee, A., Hugo C. de Albuquerque, V., Gupta, A.K. (eds) Information Management and Machine Intelligence. ICIMMI 2019. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-4936-6_9
