The iZi Project: Easy Prototyping of Interesting Pattern Mining Algorithms

Flouvat, Frédéric; De Marchi, Fabien; Petit, Jean-Marc

doi:10.1007/978-3-642-14640-4_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5669)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

In the last decade, many data mining tools have been developed. They address most of the classical data mining problems such as classification, clustering or pattern mining. However, providing classical solutions for classical problems is not always sufficient.

This is especially true for pattern mining problems known to be “representable as sets”, an important class of problems with many applications in data mining, databases, artificial intelligence, and software engineering. A common assumption is that solutions devised so far for classical pattern mining problems, such as frequent itemset mining, should be useful for these tasks. Unfortunately, it seems rather optimistic to expect most publicly available tools to apply even to closely related problems.

In this context, the main contribution of this paper is a modular and efficient tool in which users can easily adapt and control several pattern mining algorithms. From a theoretical point of view, this work takes advantage of the common theoretical background of pattern mining problems isomorphic to boolean lattices. This tool, a C++ library called iZi, has been devised and applied to several problems such as itemset mining, constraint mining in relational databases, and query rewriting in data integration systems. According to our first results, the programs obtained using the library perform very well given how simple they are to develop. The library is open source and freely available on the Web.
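To make the idea of a shared algorithm with a problem-specific plug-in concrete, the following is a minimal sketch of a levelwise (Apriori-style) search over a boolean lattice. It is not the iZi API: the names `Itemset`, `Predicate`, `levelwise`, and `minSupport` are hypothetical and chosen only for illustration, and the sketch assumes the predicate is anti-monotone (if it fails on a set, it fails on every superset).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// A set over items 0..n-1, encoded as a bitmask (bit i set <=> item i present).
// Hypothetical type, not part of iZi.
using Itemset = std::uint32_t;

// Problem-specific plug-in: must be anti-monotone for the pruning to be sound.
using Predicate = std::function<bool(Itemset)>;

// Generic levelwise search: returns every non-empty interesting set,
// level by level (all sets of size 1, then size 2, and so on).
std::vector<Itemset> levelwise(int numItems, const Predicate& interesting) {
    assert(numItems >= 0 && numItems < 32);  // must fit in the bitmask
    std::vector<Itemset> result;
    std::set<Itemset> level;

    // Level 1: interesting singletons.
    for (int i = 0; i < numItems; ++i) {
        Itemset s = Itemset{1} << i;
        if (interesting(s)) level.insert(s);
    }

    while (!level.empty()) {
        result.insert(result.end(), level.begin(), level.end());
        std::set<Itemset> next;
        for (Itemset s : level) {
            for (int i = 0; i < numItems; ++i) {
                Itemset bit = Itemset{1} << i;
                if (s & bit) continue;            // item already in s
                Itemset cand = s | bit;
                if (next.count(cand)) continue;   // already accepted
                // Prune: every subset obtained by dropping one item
                // must itself be interesting (i.e. in the current level).
                bool allSubsetsOk = true;
                for (int j = 0; j < numItems && allSubsetsOk; ++j) {
                    Itemset b = Itemset{1} << j;
                    if ((cand & b) && !level.count(cand & ~b)) allSubsetsOk = false;
                }
                if (allSubsetsOk && interesting(cand)) next.insert(cand);
            }
        }
        level.swap(next);
    }
    return result;
}

// Example plug-in: frequent itemset mining with an absolute support threshold.
Predicate minSupport(std::vector<Itemset> transactions, std::size_t threshold) {
    return [transactions = std::move(transactions), threshold](Itemset s) {
        std::size_t count = 0;
        for (Itemset t : transactions)
            if ((t & s) == s) ++count;            // t contains every item of s
        return count >= threshold;
    };
}
```

Under these assumptions, a different problem in the same class would be handled by supplying a different `Predicate` (for example, one that tests a candidate dependency against a relation) while `levelwise` stays unchanged. For instance, `levelwise(3, minSupport({7, 3, 5, 2}, 2))` enumerates the itemsets over three items that occur in at least two of the four transactions.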

Author information

Authors and Affiliations

PPME, University of New Caledonia, F-98851, Noumea, New Caledonia
Frédéric Flouvat
Université de Lyon, CNRS, Université Lyon 1, LIRIS, UMR5205, F-69621, France
Fabien De Marchi
CNRS, INSA-Lyon, LIRIS, UMR5205, Université de Lyon, F-69621, France
Jean-Marc Petit

Editor information

Editors and Affiliations

Thammasat University, Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Road, Bangkadi, 12000, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Department of Architecture for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, 567-0047, Osaka, Japan
Cholwich Nattee
Center for Informatics, Federal University of Pernambuco, Brazil
Paulo J. L. Adeodato
Computer Science and Engineering Department, University of Notre Dame, 353 Fitzpatrick Hall, 46556, Notre Dame, IN, USA
Nitesh Chawla
Department of Computer Science, The Australian National University, Australia
Peter Christen
TELECOM Bretagne, Lab-STICC, Institut TELECOM, Brest, France
Philippe Lenca
School of Information Technologies, University of Sydney, P.O. Box, Australia
Josiah Poon
Australian Taxation Office, Australia
Graham Williams

About this paper

Cite this paper

Flouvat, F., De Marchi, F., Petit, JM. (2010). The iZi Project: Easy Prototyping of Interesting Pattern Mining Algorithms. In: Theeramunkong, T., et al. New Frontiers in Applied Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14640-4_1

DOI: https://doi.org/10.1007/978-3-642-14640-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14639-8
Online ISBN: 978-3-642-14640-4
eBook Packages: Computer Science, Computer Science (R0)
