Missing Values: Proposition of a Typology and Characterization with an Association Rule-Based Model

Ben Othman, Leila; Rioult, François; Ben Yahia, Sadok; Crémilleux, Bruno

doi:10.1007/978-3-642-03730-6_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5691)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Handling missing values in real-world datasets is a major challenge that has attracted the interest of many scientific communities. Many works propose completion methods or implement new data mining techniques that tolerate missing values, yet these tasks remain very hard. In this paper, we propose a new typology that characterizes missing values according to relationships within the data. These relationships are discovered automatically by data mining techniques using generic bases of association rules, and from them we define four types of missing values. The characterization is made for each missing value individually. It thus differs from well-known statistical methods, which apply the same treatment to all missing values of a given attribute. We claim that such a local characterization enables more discerning techniques for dealing with missing values: the way a missing value is handled should depend on its origin (e.g., an attribute that is meaningless with respect to other attributes, a missing value that depends on other data, or a value missing by accident). Experiments on a real-world medical dataset highlight the benefits of such a characterization.
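
To make the idea of a per-value (rather than per-attribute) characterization concrete, here is a minimal sketch. It is not the authors' method: it does not mine generic bases and does not reproduce the four types of the typology, which the abstract does not define. The records, the attribute names, the hand-written rule, and the function `explain_missing` are all hypothetical and serve only to show how two missing values of the same attribute can receive different explanations.

```python
# Hypothetical toy records; None marks a missing value.
rows = [
    {"sex": "M", "pregnant": None, "smoker": "no"},
    {"sex": "F", "pregnant": None, "smoker": "yes"},
    {"sex": "F", "pregnant": "no", "smoker": None},
]

# Hypothetical rules of the form (premise, attribute that tends to be missing).
# The paper discovers such relationships automatically; these are written by hand.
rules = [
    ({"sex": "M"}, "pregnant"),
]


def explain_missing(row, rules):
    """Map each missing attribute of `row` to the premises of the rules it matches."""
    explanations = {}
    for attr, value in row.items():
        if value is not None:
            continue  # only missing cells are characterized
        explanations[attr] = [
            premise
            for premise, target in rules
            if target == attr
            and all(row.get(k) == v for k, v in premise.items())
        ]
    return explanations


# One explanation per missing cell, not one per attribute.
report = [explain_missing(row, rules) for row in rows]
```

In this toy setting the first two records both lack `pregnant`, but only the first one matches a rule; the second has no matching rule and would need a different treatment.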

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Sciences of Tunis, Tunisia
Leila Ben Othman & Sadok Ben Yahia
GREYC - CNRS UMR 6072, University of Caen Basse-Normandie, France
Leila Ben Othman, François Rioult & Bruno Crémilleux

Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Selma Lagerlöfsvej 300, 9220, Aalborg Ø, Denmark
Torben Bach Pedersen
IBM India Research Lab, Plot No. 4, Block C, Institutional Area, Vasant Kunj, 110 070, New Delhi, India
Mukesh K. Mohania
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Wien, Austria
A Min Tjoa

About this paper

Cite this paper

Ben Othman, L., Rioult, F., Ben Yahia, S., Crémilleux, B. (2009). Missing Values: Proposition of a Typology and Characterization with an Association Rule-Based Model. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_35

DOI: https://doi.org/10.1007/978-3-642-03730-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer Science, Computer Science (R0)
