Abstract
Frequent Pattern Mining (FPM) is a powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for data mining, together with persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks, such as itemset, sequence, tree and graph mining. It is designed to be extensible, scalable, and fast enough for rapid response on massive datasets. A detailed set of experiments shows that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
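To make the idea of generic algorithms over several pattern types concrete, the following is a minimal C++ sketch, not DMTL's actual interface. All names (`support`, `frequent`, `subset_of`, `subsequence_of`, `Items`) are hypothetical and introduced here for illustration only. It assumes that a pattern type is fully characterized, for counting purposes, by a containment predicate, so that one support-counting routine can serve both itemsets and sequences.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is an illustration, not DMTL's API.

// Count how many database records contain `pattern`, for any pattern type
// that comes with a containment predicate contains(pattern, record).
template <typename Pattern, typename Record, typename Contains>
std::size_t support(const Pattern& pattern,
                    const std::vector<Record>& db,
                    Contains contains) {
    return static_cast<std::size_t>(std::count_if(
        db.begin(), db.end(),
        [&](const Record& r) { return contains(pattern, r); }));
}

// Keep the candidates whose support reaches min_support.
template <typename Pattern, typename Record, typename Contains>
std::vector<Pattern> frequent(const std::vector<Pattern>& candidates,
                              const std::vector<Record>& db,
                              std::size_t min_support,
                              Contains contains) {
    assert(min_support >= 1);
    std::vector<Pattern> out;
    for (const Pattern& p : candidates)
        if (support(p, db, contains) >= min_support) out.push_back(p);
    return out;
}

using Items = std::vector<int>;

// Itemset containment: both vectors must be sorted in ascending order.
inline bool subset_of(const Items& p, const Items& r) {
    return std::includes(r.begin(), r.end(), p.begin(), p.end());
}

// Sequence containment: p occurs in r in the same order, gaps allowed.
inline bool subsequence_of(const Items& p, const Items& r) {
    auto it = r.begin();
    for (int x : p) {
        it = std::find(it, r.end(), x);
        if (it == r.end()) return false;
        ++it;  // the next element must appear strictly later in r
    }
    return true;
}
```

Calling `frequent(candidates, db, 2, subset_of)` treats the records as itemsets, while passing `subsequence_of` instead reuses the same routine for sequences. Tree and graph patterns would need their own, considerably more expensive, containment tests. This sketch also omits candidate generation and the persistency layer entirely.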
This work was supported by NSF Grant EIA-0103708 under the KD-D program, NSF CAREER Award IIS-0092978, and DOE Early Career PI Award DE-FG02-02ER25538.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zaki, M.J. et al. (2006). Generic Pattern Mining Via Data Mining Template Library. In: Boulicaut, JF., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_17
DOI: https://doi.org/10.1007/11615576_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31331-1
Online ISBN: 978-3-540-31351-9
eBook Packages: Computer Science (R0)