Abstract
Poor-quality data can be detected and corrected through quality assurance activities that rely on techniques of differing efficacy and cost. In this paper, we propose a quantitative approach for measuring and comparing the effectiveness of these data quality (DQ) techniques. Our definitions of effectiveness are inspired by measures proposed in Information Retrieval. We show how the effectiveness of a DQ technique can be estimated mathematically in general cases, using formal techniques based on probabilistic assumptions. We then show how the resulting effectiveness formulas can be used to evaluate, compare, and choose among DQ techniques.
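The abstract does not state the effectiveness definitions themselves. As an illustration only, the following sketch assumes that an error-detection technique is scored in the style of the Information Retrieval measures precision and recall (plus their harmonic mean, F1), computed over the set of values the technique flags and the set of values that are actually erroneous. The function name `detection_effectiveness`, its parameters, and the record ids are hypothetical and are not taken from the paper.

```python
def detection_effectiveness(flagged, erroneous):
    """IR-style precision, recall and F1 for an error-detection technique.

    flagged:   ids of values the technique reports as erroneous (assumed input)
    erroneous: ids of values that are truly erroneous (assumed ground truth)
    """
    flagged, erroneous = set(flagged), set(erroneous)
    true_positives = len(flagged & erroneous)

    # Precision: share of flagged values that are really errors.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of real errors that the technique flagged.
    recall = true_positives / len(erroneous) if erroneous else 0.0
    # F1: harmonic mean of the two, 0.0 when both are zero.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical comparison of two techniques on the same data.
truly_erroneous = {2, 3, 5}
technique_a = detection_effectiveness(flagged={1, 2, 3, 4}, erroneous=truly_erroneous)
technique_b = detection_effectiveness(flagged={2, 5}, erroneous=truly_erroneous)
```

Under these assumed inputs, both techniques find the same number of real errors, but the second raises no false alarms, so it would rank higher on precision while tying on recall. The paper's own formulas are estimated from probabilistic assumptions rather than counted against ground truth, so this counting version should be read as a simplified analogue.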
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, L., Barone, D., Borgida, A., Mylopoulos, J. (2009). Measuring and Comparing Effectiveness of Data Quality Techniques. In: van Eck, P., Gordijn, J., Wieringa, R. (eds) Advanced Information Systems Engineering. CAiSE 2009. Lecture Notes in Computer Science, vol 5565. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02144-2_17
DOI: https://doi.org/10.1007/978-3-642-02144-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02143-5
Online ISBN: 978-3-642-02144-2
eBook Packages: Computer Science (R0)