Abstract
Data volumes are exploding: more data has been created in the past couple of years than in all of prior human history. Data is growing faster than ever before, and by the year 2020 about 1.7 megabytes of new information will be created every second for every person on Earth. Processing such large volumes of complex information therefore requires a platform that spans both hardware and software. The MapReduce framework has become very popular because it offers a scalable, distributed environment for efficiently processing large-scale data on the order of terabytes or more. Hadoop, an open-source implementation of MapReduce combined with the Hadoop Distributed File System (HDFS), is widely used to support cluster computing jobs that require low response time. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous. In this paper, we propose a new algorithm that addresses this issue for both commercial and non-commercial uses, to the benefit of the community. We conducted an experiment to show that, when a procedure is defined to handle the different use-case situations, the overall cost of computing can be reduced and users can rely on distributed systems for fast execution.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dadheech, P., Goyal, D., Srivastava, S., Kumar, A., Bhardwaj, M. (2021). Performance Improvement of Heterogeneous Cluster of Big Data Using Query Optimization and MapReduce. In: Goyal, D., Bălaş, V.E., Mukherjee, A., Hugo C. de Albuquerque, V., Gupta, A.K. (eds) Information Management and Machine Intelligence. ICIMMI 2019. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-4936-6_9
DOI: https://doi.org/10.1007/978-981-15-4936-6_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4935-9
Online ISBN: 978-981-15-4936-6
eBook Packages: Intelligent Technologies and Robotics (R0)