Frequent Set Mining

Goethals, Bart

doi:10.1007/0-387-25465-X_17

Bart Goethals²

21k Accesses
29 Citations

Abstract

Frequent sets lie at the basis of many Data Mining algorithms. As a result, hundreds of algorithms have been proposed in order to solve the frequent set mining problem. In this chapter, we attempt to survey the most successful algorithms and techniques that try to solve this problem efficiently.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

Keywords

References

Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining association rules between sets of items in large databases. In Buneman, P. and Jajodia, S., editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, volume 22(2) of SIGMOD Record, pages 207–216. ACM Press.
Google Scholar
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. (1996). Fast discovery of association rules. In Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. MIT Press.
Google Scholar
Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. In Bocca, J., Jarke, M., and Zaniolo, C, editors, Proceedings 20th International Conference on Very Large Data Bases, pages 487–499. Morgan Kaufmann.
Google Scholar
Amir, A., Feldman, R., and Kashi, R. (1997). A new and versatile method for association generation. Information Systems, 2:333–347.
Article Google Scholar
Bayardo, Jr., R. (1998). Efficiently mining long patterns from databases. In (Haas and Tiwary, 1998), pages 85–93.
Google Scholar
Bonchi, F., Giannotti, F, Mazzanti, A., and Pedreschi, D. (2003). Exante: Anticipated data reduction in constrained pattern mining. In (Lavrac et al., 2003).
Google Scholar
Borgelt, C. and Kruse, R. (2002). Induction of association rules: Apriori implementation. In Hardle, W. and Ronz, B., editors, Proceedings of the 15th Conference on Computational Statistics, pages 395–400. Physica-Verlag.
Google Scholar
Boulicaut, J.-E, Bykowski, A., and Rigotti, C. (2003). Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1):5–22.
Article MathSciNet Google Scholar
Brin, S., Motwani, R., Ullman, J., and Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, volume 26(2) of SIGMOD Record, pages 255–264. ACM Press.
Article Google Scholar
Burdick, D., Calimlim, M., and Gehrke, J. (2001). MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proceedings of the 17th International Conference on Data Engineering, pages 443–452. IEEE Computer Society.
Google Scholar
Bykowski, A. and Rigotti, C. (2001). A condensed representation to find frequent patterns. In Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 267–273. ACM Press.
Google Scholar
Calders, T. (2004). Computational complexity of itemset frequency satisfiability. In Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 143–154. ACM Press.
Google Scholar
Calders, T. and Goethals, B. (2002). Mining all non-derivable frequent item-sets. In Elomaa, T., Mannila, H., and Toivonen, H., editors, Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery, volume 2431 of Lecture Notes in Computer Science, pages 74–85. Springer.
Google Scholar
Calders, T. and Goethals, B. (2003). Minimal k-free representations of frequent sets. In (Lavrac et al., 2003), pages 71–82.
Google Scholar
Cercone, N., Lin, T., and Wu, X., editors (2001). Proceedings of the 2001 IEEE International Conference on Data Mining. IEEE Computer Society.
Google Scholar
Dayal, U., Gray, P., and Nishio, S., editors (1995). Proceedings 21th International Conference on Very Large Data Bases. Morgan Kaufmann.
Google Scholar
G. Grahne, J. Z. (2003). Efficiently using prefix-trees in mining frequent item-set. In (Goethals and Zaki, 2003).
Google Scholar
G. Ramesh, W. Maniatty, M. Z. (2003). Feasible itemset distributions in Data Mining: theory and application. In Proceedings of the Twenty-second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 284–295. ACM Press.
Google Scholar
Geerts, E, Goethals, B., and den Bussche, J. V. (2001). A tight upper bound on the number of candidate patterns. In (Cercone et al., 2001), pages 155–162.
Google Scholar
Getoor, L., Senator, T., Domingos, P., and Faloutsos, C, editors (2003). Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
Google Scholar
Goethals, B. (2004). Memory issues in frequent itemset mining. In Haddad, H., Omicini, A., Wainwright, R., and Liebrock, L., editors, Proceedings of the 2004 ACM symposium on Applied computing, pages 530–534. ACM Press.
Google Scholar
Goethals, B. and den Bussche, J. V. (2000). On supporting interactive association rule mining. In Kambayashi, Y., Mohania, M., and Tjoa, A., editors, Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery, volume 1874 of Lecture Notes in Computer Science, pages 307–316. Springer.
Google Scholar
Goethals, B. and Zaki, M., editors (2003). Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, volume 90 of CEUR Workshop Proceedings.
Google Scholar
Gouda, K. and Zaki, M. (2001). Efficiently mining maximal frequent itemset. In (Cercone et al., 2001), pages 163–170.
Google Scholar
Gunopulos, D., Khardon, R., Mannila, H., Saluja, S., Toivonen, H., and Sharma, R. (2003). Discovering all most specific sentences. ACM Transactions on Database Systems, 28(2): 140–174.
Article Google Scholar
Haas, L. and Tiwary, A., editors (1998). Proceedings of the 1998 ACM SIG-MOD International Conference on Management of Data, volume 27(2) of SIGMOD Record. ACM Press.
Google Scholar
Google Scholar
Han, J., Pei, J., Yin, Y., and Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1):53–87.
Article MathSciNet Google Scholar
Holsheimer, M., Kersten, M., Mannila, H., and Toivonen, H. (1995). A perspective on databases and Data Mining. In Fayyad, U. and Uthurusamy, R., editors, Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 150–155. AAAI Press.
Google Scholar
Lavrac, N., Gamberger, D., Blocked, H., and Todorovski, L., editors (2003). Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 2838 of Lecture Notes in Computer Science. Springer.
Google Scholar
Liu, G., Lu, H., Yu, J., Wei, W., and Xiao, X. (2003). AFOPT: An efficient implementation of pattern growth approach. In (Goethals and Zaki, 2003).
Google Scholar
Mannila, H. (1997). Inductive databases and condensed representations for Data Mining. In Maluszynski, J., editor, Proceedings of the 1997 International Symposium on Logic Programming, pages 21–30. MIT Press.
Google Scholar
Mannila, H. (2002). Local and global methods in Data Mining: Basic techniques and open problems. In Widmayer, P., Ruiz, F., Morales, R., Hennessy, M., Eidenbenz, S., and Conejo, R., editors, Proceedings of the 29th International Colloquium on Automata, Languages and Programming, volume 2380 of Lecture Notes in Computer Science, pages 57–68. Springer.
Google Scholar
Mannila, H., Toivonen, H., and Verkamo, A. (1994). Efficient algorithms for discovering association rules. In Fayyad, U. and Uthurusamy, R., editors, Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, pages 181–192. AAAI Press.
Google Scholar
Mielikäinen, T. (2003). On inverse frequent set mining. In Du, W. and Clifton, C, editors, 2nd Workshop on Privacy Preserving Data Mining, pages 18–23.
Google Scholar
Ng, R., Lakshmanan, L., Han, J., and Pang, A. (1998). Exploratory mining and pruning optimizations of constrained association rules. In (Haas and Tiwary, 1998), pages 13–24.
Google Scholar
Orlando, S., Palmerini, P., Perego, R., and Silvestri, F. (2002). Adaptive and resource-aware mining of frequent sets. In Kumar, V., Tsumoto, S., Yu, P., and N. Zhong, editors, Proceedings of the 2002 IEEE International Conference on Data Mining. IEEE Computer Society. To appear.
Google Scholar
Pan, F., Cong, G., and A.K.H. Tung, J. Yang, M. Z. (2003). Carpenter: finding closed patterns in long biological datasets. In (Getoor et al., 2003), pages 637–642.
Google Scholar
Park, J., Chen, M.-S., and Yu, P. (1995). An effective hash based algorithm for mining association rules. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, volume 24(2) of SIGMOD Record, pages 175–186. ACM Press.
Article Google Scholar
Pasquier, N., Bastide, Y, Taouil, R., and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In Beeri, C. and Buneman, P., editors, Proceedings of the 7th International Conference on Database Theory, volume 1540 of Lecture Notes in Computer Science, pages 398–416. Springer.
Google Scholar
Rioult, K, Boulicaut, J.-R, and B. Cremilleux, J. B. (2003). Using transposition for pattern discovery from microarray data. In Zaki, M. and Aggarwal, C, editors, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 73–79. ACM Press.
Google Scholar
Savasere, A., Omiecinski, E., and Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. In (Dayal et al., 1995), pages 432–444.
Google Scholar
Srikant, R. (1996). Fast algorithms for mining association rules and sequential patterns. PhD thesis, University of Wisconsin, Madison.
Google Scholar
Srikant, R. and Agrawal, R. (1995). Mining generalized association rules. In (Dayal et al., 1995), pages 407–419.
Google Scholar
Srikant, R., Vu, Q., and Agrawal, R. (1997). Mining association rules with item constraints. In Heckerman, D., Mannila, H., and Pregibon, D., editors, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pages 66–73. AAAI Press.
Google Scholar
Toivonen, H. (1996). Sampling large databases for association rules. In Vija-yaraman, T., Buchmann, A., Mohan, C, and Sarda, N., editors, Proceedings 22nd International Conference on Very Large Data Bases, pages 134–145. Morgan Kaufmann.
Google Scholar
Uno, T. and Satoh, K. (2003). Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization. In (Goethals and Zaki, 2003).
Google Scholar
Vaidya, J. and Clifton, C. (2002). Privacy preserving association rule mining in vertically partitioned data. In Hand, D., Keim, D., and Ng, R., editors, Proceedings of the Eight ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 639–644. ACM Press.
Google Scholar
Wang, J., Han, J., and Pei, J. (2003). CLOSET+: searching for the best strategies for mining frequent closed itemsets. In (Getoor et al., 2003), pages 236–245.
Google Scholar
Yang, G. (2004). The complexity of mining maximal frequent itemsets and maximal frequent patterns. In DuMouchel, W., Gehrke, J., Ghosh, J., and Kohavi, R., editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
Google Scholar
Zaki, M. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3):372–390.
Article MathSciNet Google Scholar
Zaki, M. and Gouda, K. (2003). Fast vertical mining using diffsets. In (Getoor et al., 2003), pages 326–335.
Google Scholar
Zaki, M. and Hsiao, C.-J. (2002). CHARM: An efficient algorithm for closed itemset mining. In Grossman, R., Han, J., Kumar, V., Mannila, H., and Mot-wani, R., editors, Proceedings of the Second SIAM International Conference on Data Mining.
Google Scholar

Download references

Author information

Authors and Affiliations

Departement of Mathematils and Computer Science, University of Antwerp, Belgium
Bart Goethals

Authors

Bart Goethals
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Dept. of Industrial Engineering, Tel-Aviv University, 69978, Ramat-Aviv, Israel
Oded Maimon & Lior Rokach &

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Goethals, B. (2005). Frequent Set Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_17

Download citation

DOI: https://doi.org/10.1007/0-387-25465-X_17
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics