Abstract
Data quality (DQ) may degrade over time because changes in real-world entities or behaviors are not reflected correctly in the datasets that describe them. This study presents a continuous-time Markov-chain model that treats DQ as a dynamic process. The model may help assess and predict accuracy degradation over time. Taking cost-benefit tradeoffs into account, it can also be used to recommend an economically optimal point in time at which data values should be evaluated and possibly reacquired. The model addresses data-acquisition scenarios that reflect real-world processes with a finite number of states, each described by certain data-attribute values. It takes into account state-transition probabilities, the distribution of time spent in each state, the damage associated with incorrect data that fail to reflect the real-world state, and the cost of data reacquisition. Given the current state and the time passed since the last transition, the model estimates the expected damage of a data record and recommends whether to correct it by comparing the potential benefit of correction (elimination of potential damage) with the reacquisition cost.
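The correction decision described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes exponentially distributed holding times and considers only the first transition out of the recorded state, and all names, states, rates, damages, and costs are hypothetical values chosen for illustration.

```python
import math

def prob_left_state(rate: float, elapsed: float) -> float:
    """Probability that the real-world entity has left the recorded state
    after `elapsed` time units, ASSUMING an exponential holding time."""
    return 1.0 - math.exp(-rate * elapsed)

def expected_damage(state, elapsed, rates, transitions, damages):
    """Expected damage of a record that still shows `state`.

    Simplification (assumption): only the first transition out of the
    recorded state is considered; later transitions are ignored.
    """
    p_left = prob_left_state(rates[state], elapsed)
    # Average damage over the possible next states, weighted by the
    # state-transition probabilities.
    damage_if_left = sum(
        p * damages[(state, nxt)] for nxt, p in transitions[state].items()
    )
    return p_left * damage_if_left

def should_reacquire(state, elapsed, rates, transitions, damages, cost):
    """Recommend reacquisition when the expected damage avoided exceeds its cost."""
    return expected_damage(state, elapsed, rates, transitions, damages) > cost

# Hypothetical parameters, for illustration only.
rates = {"open": 0.1, "in_review": 0.05}  # exit rate per day
transitions = {
    "open": {"in_review": 0.7, "closed": 0.3},
    "in_review": {"closed": 1.0},
}
damages = {  # (recorded state, actual state) -> monetary damage
    ("open", "in_review"): 100.0,
    ("open", "closed"): 500.0,
    ("in_review", "closed"): 300.0,
}
reacquisition_cost = 50.0

for days in (1, 10):
    decision = should_reacquire(
        "open", days, rates, transitions, damages, reacquisition_cost
    )
    print(days, decision)
```

Under these assumptions, the expected damage grows with the time elapsed, so a record is left untouched while it is fresh and flagged for reacquisition once the expected damage passes the reacquisition cost.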
Following common design science research guidelines, the applicability and potential contribution of the model are demonstrated with a real-world dataset that reflects a process of handling insurance claims. Insurants’ status must be kept up to date to avoid potential monetary damages; however, contacting an insurant for a status update is costly and time-consuming. Currently, the contact decision is guided by heuristics based on employees’ experience. The evaluation shows that applying the model has major cost-saving potential compared to the current practice.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zak, Y., Even, A. (2015). A Continuous Markov-Chain Model of Data Quality Transition: Application in Insurance-Claim Handling. In: Donnellan, B., Helfert, M., Kenneally, J., VanderMeer, D., Rothenberger, M., Winter, R. (eds) New Horizons in Design Science: Broadening the Research Agenda. DESRIST 2015. Lecture Notes in Computer Science, vol 9073. Springer, Cham. https://doi.org/10.1007/978-3-319-18714-3_13
DOI: https://doi.org/10.1007/978-3-319-18714-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18713-6
Online ISBN: 978-3-319-18714-3
eBook Packages: Computer Science (R0)