Abstract
Missing values in databases are an important problem because they bias the information provided by standard data mining methods. In this paper, we search for patterns whose properties remain correct in the presence of missing values, meaning that these patterns must satisfy the same properties in the corresponding complete database. We focus on k-free patterns. Using a new definition of k-freeness that is suitable for incomplete data and compatible with the usual one, we guarantee that the k-free patterns extracted from an incomplete database are also k-free in the corresponding complete database. Moreover, this approach provides a criterion that is anti-monotone with respect to pattern inclusion, which lets us design an efficient level-wise algorithm that extracts correct k-free patterns in the presence of missing values.
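To make the role of anti-monotonicity concrete, here is a minimal sketch of a level-wise search driven by an anti-monotone predicate. It is not the algorithm of the paper: it runs on a complete toy database and uses the classical notion of a free pattern (no proper subset has the same support) as a stand-in for the incomplete-data definition of k-freeness. The names `support`, `is_free`, `levelwise` and the database `DB` are illustrative assumptions.

```python
def support(pattern, db):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in db if pattern <= t)


def is_free(pattern, db):
    """Classical freeness on complete data (assumed stand-in predicate):
    removing any single item must strictly increase the support."""
    s = support(pattern, db)
    return all(support(pattern - {a}, db) > s for a in pattern)


def levelwise(db, predicate):
    """Generic level-wise search for an anti-monotone predicate.

    A candidate of size n+1 is generated only if all of its subsets of
    size n were kept at the previous level; this pruning is sound
    because the predicate is anti-monotone w.r.t. pattern inclusion.
    The empty pattern is not reported.
    """
    items = sorted(set().union(*db))
    level = [frozenset([i]) for i in items if predicate(frozenset([i]), db)]
    result = []
    while level:
        result.extend(level)
        kept = set(level)
        candidates = set()
        for a in level:
            for b in level:
                u = a | b
                # Join two patterns that differ by exactly one item,
                # then check that every immediate subset was kept.
                if len(u) == len(a) + 1 and all(u - {x} in kept for x in u):
                    candidates.add(u)
        level = [c for c in candidates if predicate(c, db)]
    return result


# Toy complete database (hypothetical data, for illustration only).
DB = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("b"),
]

free_patterns = levelwise(DB, is_free)
```

On this toy database, the pattern {a, c} is rejected because it has the same support as {c}, and {a, b, c} is therefore never generated as a candidate. Handling missing values would require replacing `is_free` with a predicate defined on incomplete data, which this sketch does not attempt.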
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rioult, F., Crémilleux, B. (2007). Mining Correct Properties in Incomplete Databases. In: Džeroski, S., Struyf, J. (eds) Knowledge Discovery in Inductive Databases. KDID 2006. Lecture Notes in Computer Science, vol 4747. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75549-4_13
DOI: https://doi.org/10.1007/978-3-540-75549-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75548-7
Online ISBN: 978-3-540-75549-4
eBook Packages: Computer Science, Computer Science (R0)