Abstract
This article introduces the class of Most Informative Patterns (MIPs) for characterizing a given dataset. MIPs form a reduced subset of non redundant closed patterns that are extracted from data thanks to a scoring function depending on domain knowledge. Accordingly, MIPs are designed for providing experts good insights on the content of datasets during data analysis. The article presents the model of MIPs and their formal properties wrt other kinds of patterns. Then, two algorithms for extracting MIPs are detailed: the first directly searches for MIPs in a dataset while the second screens MIPs from frequent patterns. The efficiencies of both algorithms are compared when applied to reference datasets. Finally the application of MIPs to labelled graphs, here molecular graphs, is discussed.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Buneman, P., Jajodia, S. (eds.) Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., pp. 207–216. ACM Press, New York (1993)
Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, pp. 647–652. ACM Press, New York (2004)
Soulet, A., Crémilleux, B.: Extraction des top-k motifs par approximer-et-pousser. In: Noirhomme-Fraiture, M., Venturini, G. (eds.) Extraction et gestion des connaissances (EGC 2007), Actes des cinquièmes journées Extraction et Gestion des Connaissances, Namur, Belgique. Volume RNTI-E-9 of Revue des Nouvelles Technologies de l’Information, Cépaduès-Éditions, January 23-26, vol. 2, pp. 271–282 (2007)
Siebes, A., Vreeken, J., van Leeuwen, M.: Item sets that compress. In: Ghosh, J., Lambert, D., Skillicorn, D.B., Srivastava, J. (eds.) SDM. SIAM, Philadelphia (2006)
Cook, D.J., Holder, L.B.: Substructure discovery using minimum description length and background knowledge. J. of Art. Intell. Res. 1, 231–255 (1994)
Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: Li, Y., Liu, B., Sarawagi, S. (eds.) KDD, pp. 204–212. ACM, New York (2008)
Pennerath, F., Napoli, A.: La famille des motifs les plus informatifs. application à l’extraction de graphes en chimie organique. Revue I3 8(2), 252 (2008)
McKay, B.D.: Practical graph isomorphism. Congr. Numer. 30, 45–87 (1981)
Zaki, M.J.: Scalable algorithms for association mining. IEEE T. Knowl. Data. En. 12(3), 372–390 (2000)
Borgelt, C., Berthold, M.R.: Mining molecular fragments: Finding relevant substructures of molecules. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Maebashi City, Japan, vol. 51. IEEE Computer Society, Los Alamitos (2002)
Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput. 1(2), 146–160 (1972)
Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Discov. 8(1), 53–87 (2004)
Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: A case study. In: KDD, pp. 254–260 (1999)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. International Journal of Information Systems 24(1), 25–46 (1999)
Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Min. Knowl. Discov. 7(1), 5–22 (2003)
Soulet, A., Crémilleux, B.: Adequate condensed representations of patterns. Data Min. Knowl. Discov. 17(1), 94–110 (2008)
Yan, X., Cheng, H., Han, J., Xin, D.: Summarizing itemset patterns: a profile-based approach. In: Grossman, R., Bayardo, R.J., Bennett, K.P. (eds.) KDD, pp. 314–323. ACM, New York (2005)
Chen, C., Lin, C.X., Yan, X., Han, J.: On effective presentation of graph patterns: a structural representative approach. In: Shanahan, J.G., Amer-Yahia, S., Manolescu, I., Zhang, Y., Evans, D.A., Kolcz, A., Choi, K.S., Chowdhury, A. (eds.) CIKM, pp. 299–308. ACM, New York (2008)
Gallo, A., Bie, T.D., Cristianini, N.: Mini: Mining informative non-redundant itemsets. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) PKDD 2007. LNCS (LNAI), vol. 4702, pp. 438–445. Springer, Heidelberg (2007)
Hasan, M.A., Chaoji, V., Salem, S., Besson, J., Zaki, M.J.: Origami: Mining representative orthogonal graph patterns. In: Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), pp. 153–162. IEEE Computer Society, Los Alamitos (2007)
Bringmann, B., Zimmermann, A.: One in a million: picking the right patterns. Knowledge and Information Systems (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pennerath, F., Napoli, A. (2009). The Model of Most Informative Patterns and Its Application to Knowledge Extraction from Graph Databases. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-04174-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7
eBook Packages: Computer ScienceComputer Science (R0)