Abstract
Currently, Big Data is gaining wide adoption in the digital world as a new technology able to manage and support the explosive growth of data. Data is growing at an ever-increasing rate owing to the variety of data-generating devices in use. Beyond their volume, the generated data are usually unstructured, inaccurate, and incomplete, which makes processing them even more difficult. However, analyzing such data can provide significant benefits to businesses if their quality is improved. Since value can only be extracted from high-quality data, companies that rely on data for business management increasingly focus on the quality of the data they gather. Consequently, Big Data quality has received considerable attention in the literature, and many researchers have attempted to address Big Data quality issues by proposing novel approaches to assess and improve it. These works motivate us to review the most relevant findings and outcomes reported in this area. Although some review papers have already been published for the same purpose, we believe researchers always need an up-to-date view. Moreover, each of the published review papers focuses on a specific area of Big Data quality. Therefore, this paper aims to review all the Big Data quality aspects discussed in the literature, including Big Data characteristics, the Big Data value chain, and Big Data quality dimensions and metrics. We also discuss how the quality aspect can be employed in the different application domains of Big Data. This review thus provides a global view of the current state of the art on the various aspects of Big Data quality and can be used to support future research.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Elouataoui, W., Alaoui, I.E., Gahi, Y. (2022). Data Quality in the Era of Big Data: A Global Review. In: Baddi, Y., Gahi, Y., Maleh, Y., Alazab, M., Tawalbeh, L. (eds) Big Data Intelligence for Smart Applications. Studies in Computational Intelligence, vol 994. Springer, Cham. https://doi.org/10.1007/978-3-030-87954-9_1
DOI: https://doi.org/10.1007/978-3-030-87954-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87953-2
Online ISBN: 978-3-030-87954-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)