Generic Pattern Mining Via Data Mining Template Library

Zaki, Mohammed J.; De, Nilanjana; Gao, Feng; Palmerini, Paolo; Parimi, Nagender; Pathuri, Jeevan; Phoophakdee, Benjarath; Urban, Joe

doi:10.1007/11615576_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

Frequent Pattern Mining (FPM) is a powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper, we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for data mining, together with persistence and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks, such as itemset, sequence, tree, and graph mining. DMTL is extensible and scalable, and it is designed for high performance so that it can respond rapidly on massive datasets. A detailed set of experiments shows that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
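The abstract describes DMTL's generic approach only at a high level. The C++ sketch below illustrates the underlying idea and does not reproduce DMTL's actual interface. It writes a single depth-first frequent-pattern search once and reuses it for two pattern types through a policy class. Several things here are assumptions made for brevity: the names `ItemsetPolicy`, `SequencePolicy`, `GenericMiner` and `appended`, the integer item encoding, and the naive support counting, which rescans the whole database for every candidate. Tree and graph patterns would need their own policies with canonical-form (isomorphism) checks, and those are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: returns a copy of pattern p with symbol a appended.
inline std::vector<int> appended(std::vector<int> p, int a) {
    p.push_back(a);
    return p;
}

// Hypothetical policy: itemsets of int items; patterns and transactions are sorted ascending.
struct ItemsetPolicy {
    using Pattern = std::vector<int>;
    using Record = std::vector<int>;
    // An itemset occurs in a transaction if every one of its items is present.
    static bool occurs(const Pattern& p, const Record& r) {
        return std::includes(r.begin(), r.end(), p.begin(), p.end());
    }
    // Append only items larger than the last one, so each itemset is generated once.
    static std::vector<Pattern> extend(const Pattern& p, const std::vector<int>& alphabet) {
        std::vector<Pattern> out;
        for (int a : alphabet) {
            if (p.empty() || a > p.back()) out.push_back(appended(p, a));
        }
        return out;
    }
};

// Hypothetical policy: event sequences, matched as (not necessarily contiguous) subsequences.
struct SequencePolicy {
    using Pattern = std::vector<int>;
    using Record = std::vector<int>;
    // Greedy left-to-right scan: true if p is a subsequence of r.
    static bool occurs(const Pattern& p, const Record& r) {
        std::size_t i = 0;
        for (int x : r) {
            if (i < p.size() && x == p[i]) ++i;
        }
        return i == p.size();
    }
    // Order matters, so every symbol may be appended without generating duplicates.
    static std::vector<Pattern> extend(const Pattern& p, const std::vector<int>& alphabet) {
        std::vector<Pattern> out;
        for (int a : alphabet) out.push_back(appended(p, a));
        return out;
    }
};

// Generic depth-first miner: the search is written once and reused for any Policy
// supplying Pattern, Record (a sequence of int symbols), occurs() and extend().
template <typename Policy>
class GenericMiner {
public:
    using Pattern = typename Policy::Pattern;
    using Record = typename Policy::Record;
    using Results = std::vector<std::pair<Pattern, std::size_t>>;

    GenericMiner(std::vector<Record> db, std::size_t minsup)
        : db_(std::move(db)), minsup_(minsup) {
        // minsup must be at least 1; with 0, SequencePolicy extensions would never stop.
        assert(minsup_ >= 1);
        std::set<int> symbols;
        for (const auto& r : db_) symbols.insert(r.begin(), r.end());
        alphabet_.assign(symbols.begin(), symbols.end());
    }
    // Returns every pattern with support >= minsup, in depth-first order.
    Results mine() {
        results_.clear();
        grow(Pattern{});
        return results_;
    }

private:
    // Support = number of database records that contain the pattern.
    std::size_t support(const Pattern& p) const {
        std::size_t n = 0;
        for (const auto& r : db_) {
            if (Policy::occurs(p, r)) ++n;
        }
        return n;
    }
    // Support is anti-monotone, so extensions of an infrequent pattern are never explored.
    void grow(const Pattern& p) {
        for (const auto& q : Policy::extend(p, alphabet_)) {
            const std::size_t s = support(q);
            if (s < minsup_) continue;
            results_.emplace_back(q, s);
            grow(q);
        }
    }
    std::vector<Record> db_;
    std::size_t minsup_;
    std::vector<int> alphabet_;
    Results results_;
};
```

Consider the four transactions {1, 2, 3}, {1, 2}, {2, 3} and {1, 3} with a minimum support of 2. Mining them through `GenericMiner<ItemsetPolicy>` yields the six itemsets {1}, {1, 2}, {1, 3}, {2}, {2, 3} and {3}. Switching to sequence mining means swapping only the policy argument; the search code stays unchanged. That is the kind of reuse a generic library aims for. The brute-force support counting keeps the sketch short and says nothing about how DMTL itself achieves its performance.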

This work was supported by NSF Grant EIA-0103708 under the KD-D program, NSF CAREER Award IIS-0092978, and DOE Early Career PI Award DE-FG02-02ER25538.

Author information

Authors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Mohammed J. Zaki, Nilanjana De, Feng Gao, Paolo Palmerini, Nagender Parimi, Jeevan Pathuri, Benjarath Phoophakdee & Joe Urban

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt
HIIT, Helsinki University of Technology and University of Helsinki, Finland
Heikki Mannila

About this paper

Cite this paper

Zaki, M.J. et al. (2006). Generic Pattern Mining Via Data Mining Template Library. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_17

DOI: https://doi.org/10.1007/11615576_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science (R0)
