Performance Improvement of Heterogeneous Cluster of Big Data Using Query Optimization and MapReduce

Dadheech, Pankaj; Goyal, Dinesh; Srivastava, Sumit; Kumar, Ankit; Bhardwaj, Manish

doi:10.1007/978-981-15-4936-6_9

Part of the book series: Algorithms for Intelligent Systems (AIS)

Included in the following conference series:

International Conference on Information Management & Machine Intelligence

Abstract

Data volumes are exploding: more data has been created in the past couple of years than in all prior human history. Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every person on Earth. We therefore need a platform, spanning both hardware and software, that can process this large volume of complex information. The MapReduce framework has gained wide popularity because it provides a scalable distributed environment for efficiently processing large-scale data, on the order of terabytes or more. Hadoop, an open-source implementation of MapReduce combined with the Hadoop Distributed File System (HDFS), is widely used to support cluster computing jobs that require low response times. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous. In this paper, we propose a new algorithm that addresses this limitation for both commercial and non-commercial uses, to the benefit of the community. Our experiments indicate that, when a procedure is defined to handle the different use cases, one can reduce the overall cost of computing and benefit from relying on distributed systems for fast execution.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur, Rajasthan, India
Pankaj Dadheech & Ankit Kumar
Poornima Institute of Engineering & Technology, Jaipur, Rajasthan, India
Dinesh Goyal & Manish Bhardwaj
Manipal University Jaipur, Jaipur, Rajasthan, India
Sumit Srivastava

Corresponding author

Correspondence to Pankaj Dadheech.

Editor information

Editors and Affiliations

Poornima Institute of Engineering and Technology, Jaipur, Rajasthan, India
Dinesh Goyal
Department of Automatics and Applied Informatics, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Bălaş
CISCO Technologies, Milpitas, CA, USA
Abhishek Mukherjee
Universidade de Fortaleza, Fortaleza, Brazil
Victor Hugo C. de Albuquerque
Poornima Institute of Engineering and Technology, Jaipur, Rajasthan, India
Amit Kumar Gupta

Cite this paper

Dadheech, P., Goyal, D., Srivastava, S., Kumar, A., Bhardwaj, M. (2021). Performance Improvement of Heterogeneous Cluster of Big Data Using Query Optimization and MapReduce. In: Goyal, D., Bălaş, V.E., Mukherjee, A., Hugo C. de Albuquerque, V., Gupta, A.K. (eds) Information Management and Machine Intelligence. ICIMMI 2019. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-4936-6_9

DOI: https://doi.org/10.1007/978-981-15-4936-6_9
Published: 17 September 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4935-9
Online ISBN: 978-981-15-4936-6
eBook Packages: Intelligent Technologies and Robotics (R0)
