Data Profiling over Big Data Area

Elbaghazaoui, Bahaa Eddine; Amnai, Mohamed; Semmouri, Abdellatif

doi:10.1007/978-3-030-72588-4_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1344)

Abstract

Before consuming a dataset in any application, we need to understand the dataset at hand and its metadata. The process of discovering this metadata is known as data profiling. Data profiling focuses on examining datasets and collecting metadata, such as statistics or informative summaries, about the data. In this chapter, we discuss the importance of data profiling and shed light on data profiling in the big data area. In addition, we detail data profiling use cases and review state-of-the-art data profiling systems and techniques. Finally, we conclude with directions and challenges for future research in data profiling.
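
To make the idea of "collecting metadata such as statistics or informative summaries" concrete, the following is a minimal sketch of single-column profiling in Python. It is an illustration only, not a system or technique taken from the chapter. The function names `profile_column` and `profile_table`, the choice of statistics, and the sample rows are all assumptions made for this example.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]]) -> Dict[str, Any]:
    """Collect simple metadata for one column (hypothetical helper).

    Assumes the non-null values of a column are mutually comparable,
    otherwise min() and max() would raise a TypeError.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),                   # total number of values
        "nulls": len(values) - len(non_null),  # missing values
        "distinct": len(counts),               # cardinality
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def profile_table(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Profile every column of a table given as a list of row dictionaries."""
    columns = sorted({name for row in rows for name in row})
    return {
        name: profile_column([row.get(name) for row in rows])
        for name in columns
    }


if __name__ == "__main__":
    # Made-up sample data, used only to exercise the sketch.
    sample = [
        {"id": 1, "city": "Kenitra"},
        {"id": 2, "city": None},
        {"id": 3, "city": "Kenitra"},
    ]
    for column, stats in profile_table(sample).items():
        print(column, stats)
```

A sketch like this loads the whole table into memory, so it only illustrates what kind of metadata is gathered. It does not address the scale concerns that profiling big data raises.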

Author information

Authors and Affiliations

Laboratory of Computer Sciences, Faculty of Sciences Kenitra, Ibn Tofail University, Kenitra, Morocco
Bahaa Eddine Elbaghazaoui & Mohamed Amnai
Faculty of Sciences and Techniques, Lab. TIAD, Sultan Moulay Slimane University, Beni Mellal, Morocco
Abdellatif Semmouri

Corresponding author

Correspondence to Bahaa Eddine Elbaghazaoui.

Editor information

Editors and Affiliations

ENSA, Sultan Moulay Slimane University, Khouribga, Morocco
Noreddine Gherabi
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Elbaghazaoui, B.E., Amnai, M., Semmouri, A. (2021). Data Profiling over Big Data Area. In: Gherabi, N., Kacprzyk, J. (eds) Intelligent Systems in Big Data, Semantic Web and Machine Learning. Advances in Intelligent Systems and Computing, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-72588-4_8

DOI: https://doi.org/10.1007/978-3-030-72588-4_8
Published: 29 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72587-7
Online ISBN: 978-3-030-72588-4
eBook Packages: Intelligent Technologies and Robotics (R0)
