Abstract
As social media platforms become ever more embedded in our lives, the amount of data exchanged across them has risen sharply. Data from social network sites can be immensely useful for companies seeking to identify customer trends, increase operational efficiency, and gain a competitive edge. At the same time, traditional decision support systems cannot meet the growing need of the modern enterprise to integrate and analyze the wide variety of data generated by social network platforms. This surge of data calls for new data management techniques and storage architectures that can locate information quickly within large volumes of data. In this context, the data lake has emerged as one of the most recent storage concepts introduced to address this challenge. A data lake is a large repository that stores and manages all of a company's data in raw form before it is integrated into the data warehouse. In this paper, we propose a new approach to designing a NoSQL data warehouse from a data lake. More precisely, we first review recent literature on NoSQL data warehouse design approaches. Then, we describe the main concepts of a NoSQL data lake that stores the big data collected from social networks such as Facebook, Twitter, and YouTube. Finally, we define a set of mapping rules to integrate social media data from the data lake into the NoSQL data warehouse based on two NoSQL logical models: column-oriented and document-oriented.
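To make the two target logical models concrete, here is a minimal sketch that maps one raw social media post from the data lake into a document-oriented record and a column-oriented row. It is an illustration under stated assumptions, not the paper's mapping rules. The raw field names (`id`, `platform`, `created_at`, `likes`, `shares`, `comments`, `user`), the fact/dimension split (post measures plus user, time, and source dimensions), and the helper names `to_document` and `to_columns` are all hypothetical. Only the Python standard library is used, and no particular NoSQL database API is assumed.

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical raw record as it might sit in a data lake (JSON-like, schema-on-read).
RawPost = Dict[str, Any]

# Hypothetical star-schema split: one fact group plus three dimensions.
FAMILIES = ("measures", "user", "time", "source")


def to_document(raw: RawPost) -> Dict[str, Any]:
    """Map a raw post to a document-oriented fact record.

    Illustrative rule only: measures are grouped in one sub-document and
    each dimension is nested as its own sub-document.
    """
    ts = datetime.fromisoformat(raw["created_at"])
    return {
        "_id": f'{raw["platform"]}:{raw["id"]}',
        "measures": {
            # Missing counters in the raw data default to 0.
            "likes": int(raw.get("likes", 0)),
            "shares": int(raw.get("shares", 0)),
            "comments": int(raw.get("comments", 0)),
        },
        "user": {"id": raw["user"]["id"], "name": raw["user"].get("name")},
        "time": {"year": ts.year, "month": ts.month, "day": ts.day},
        "source": {"platform": raw["platform"]},
    }


def to_columns(raw: RawPost) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Map the same post to a column-oriented row.

    Illustrative rule only: one row key, one column family per fact or
    dimension group, and each attribute stored as a flat column qualifier.
    """
    doc = to_document(raw)
    row = {family: dict(doc[family]) for family in FAMILIES}
    return {doc["_id"]: row}
```

Nesting dimensions as sub-documents mirrors how a document store such as MongoDB groups related attributes in one record, while grouping them under column families mirrors the HBase-style column-oriented model. A real column store would typically serialize values to bytes, which this sketch omits.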
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dabbèchi, H., Haddar, N.Z., Elghazel, H., Haddar, K. (2021). Social Media Data Integration: From Data Lake to NoSQL Data Warehouse. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_64
DOI: https://doi.org/10.1007/978-3-030-71187-0_64
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)