Abstract
Data quality is a primary concern for most organizations, often because databases are poorly maintained. Data obtained from various sources are frequently dirty, which reduces the accuracy of predicted results. Handling Big Data is especially challenging because it requires well-defined and precise measurement processes. The challenges stem from the characteristics of Big Data itself, where the V’s play an important role in measuring and determining data quality. Although the issue has been discussed for over 20 years, no guideline has been proposed for identifying the important dimensions of data quality in the context of Big Data. The purpose of this systematic review is therefore to review the literature on the issues, challenges, and dimensions of data quality in the era of Big Data using thematic review. The review covers journal and conference proceedings papers from the ACM Digital Library, Scopus, and ScienceDirect published between 2016 and 2020. After the inclusion and exclusion processes, 21 articles remained for the final review, which focuses on the issues, challenges, and dimensions of data quality. The results can benefit future studies on the development of data quality dimensions and can serve as a guideline for researchers designing a data quality assessment framework.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ridzuan, F., Wan Zainon, W.M.N., Zairul, M. (2022). A Thematic Review on Data Quality Challenges and Dimension in the Era of Big Data. In: Isa, K., et al. Proceedings of the 12th National Technical Seminar on Unmanned System Technology 2020. Lecture Notes in Electrical Engineering, vol 770. Springer, Singapore. https://doi.org/10.1007/978-981-16-2406-3_56
DOI: https://doi.org/10.1007/978-981-16-2406-3_56
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2405-6
Online ISBN: 978-981-16-2406-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)