Abstract
This paper proposes best storage practices for Spatial Big Data on the Data LakeHouse. Handling Spatial Big Data has exposed the limits of current approaches to storing massive spatial data, whether traditional ones such as geographic information systems or newer ones such as augmented extensions of Big Data approaches. The paper is divided into four parts. In the first part, we give a brief background on the data management systems landscape. In the second part, we present the Data LakeHouse and how it addresses the problems of storing, processing and exploiting big data while ensuring the consistency and efficiency found in data warehouses. In the third part, we recall the constraints posed by the management of Spatial Big Data. We end the paper with an experimental study of storage practices for Spatial Big Data on the Data LakeHouse. Our experiment shows that partitioning Spatial Big Data by Geohash index is an optimal storage solution.
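To make the idea of Geohash-based partitioning concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It encodes a latitude/longitude pair with the standard Geohash algorithm and groups points by a Geohash prefix, the kind of value that could serve as a partition key in a lakehouse table. The function names `geohash_encode` and `partition_by_geohash`, the default precisions, and the in-memory dictionary standing in for table partitions are all assumptions made for illustration.

```python
from collections import defaultdict

# Standard Geohash base-32 alphabet (digits and letters, omitting a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a (lat, lon) pair as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    refine_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits map to one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def partition_by_geohash(points, precision=4):
    """Group (lat, lon) points by Geohash prefix (hypothetical helper).

    The returned dict stands in for table partitions: each key is the value
    a partition column would hold, each value the rows stored under it.
    """
    partitions = defaultdict(list)
    for lat, lon in points:
        partitions[geohash_encode(lat, lon, precision)].append((lat, lon))
    return dict(partitions)
```

Because nearby points share a Geohash prefix, a query restricted to a small area only needs to read the partitions whose keys match the prefixes covering that area. A shorter prefix yields fewer, larger partitions; a longer one yields more, smaller partitions.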
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Errami, S.A., Hajji, H., Kadi, K.A.E., Badir, H. (2023). Managing Spatial Big Data on the Data LakeHouse. In: Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D. (eds) Emerging Trends in Intelligent Systems & Network Security. NISS 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 147. Springer, Cham. https://doi.org/10.1007/978-3-031-15191-0_31
DOI: https://doi.org/10.1007/978-3-031-15191-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15190-3
Online ISBN: 978-3-031-15191-0
eBook Packages: Intelligent Technologies and Robotics (R0)