Abstract
Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this is essentially a querying process. It is enabled by a query language which can deal either with raw data or patterns which hold in the data. Mining patterns turns to be the so-called inductive query evaluation process for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declara-tively the desired constraints and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints and it points out the current directions of research as well.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
References
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press, 1996.
H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular expressions: a highly adaptative strategy for pushing constraints. In Proc. SIAMDM’03, pages 316–320, 2003.
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proc. CL 2000, volume 1861 of LNCS, pages 972–986. Springer-Verlag, 2000.
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2):66–75, 2000.
R. J. Bayardo. Efficiently mining long patterns from databases. In Proc. ACM SIGMOD’98, pages 85–93, 1998.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Adaptive constraint pushing in frequent pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 47–58. Springer-Verlag, 2003A.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Examiner: Optimized level-wise frequent pattern mining with monotone constraints. In Proc. IEEEICDMW, pages 11–18, 2003B.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Exante: Anticipated data reduction in constrained pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 59–70. Springer-Verlag, 2003C.
F. Bonchi and C. Lucchese. On closed constrained frequent pattern mining. In Proc. IEEE ICDM’04 (In Press), 2004.
J.-F. Boulicaut. Inductive databases and multiple uses of frequent itemsets: the clnQ approach. In Database Technologies for Data Mining-Discovering Knowledge with Inductive Queries, volume 2682 of LNCS, pages 1–23. Springer-Verlag, 2004.
J.-F. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary Data Mining. In Proc. PAKDDW, volume 1805 of LNAI, pages 62–73. Springer-Verlag, 2000.
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queries by mean of free-sets. In Proc. PKDD’00, volume 1910 of LNAI, pages 75–85. Springer-Verlag, 2000.
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, 7(1):5–22, 2003.
J.-F. Boulicaut and B. Jeudy. Using constraint for itemset mining: should we prune or not? In Proc. BDA’00, pages 221–237, 2000.
J.-F. Boulicaut and B. Jeudy. Mining free-sets under constraints. In Proc. IEEE IDEAS’01, pages 322–329, 2001.
C. Bucila, J. E. Gehrke, D. Kifer, and W. White. Dualminer: A dual-pruning algorithm for itemsets with constraints. Data Mining and Knowledge Discovery, 7(4):241–272, 2003.
D. Burdickj M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. IEEE ICDE’01, pages 443–452, 2001.
A. Bykowski and C. Rigotti. DBC: a condensed representation of frequent patterns for efficient mining. Information Systems, 28(8):949–977, 2003.
T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In Proc. PKDD’02, volume 2431 of LNAI, pages 74–85. Springer-Verlag, 2002.
B. Crémilleux and J.-F. Boulicaut. Simplest rules characterizing classes generated by delta-free sets. In Proc. ES 2002, pages 33–46. Springer-Verlag, 2002.
L. De Raedt. A perspective on inductive databases. SIGKDD Explorations, 4(2):69–77, 2003.
L. De Raedt, M. Jaeger, S. Lee, and H. Mannila. A theory of inductive query answering. In Proc. IEEE ICDM’02, pages 123–130, 2002.
L. De Raedt and S. Kramer. The levelwise version space algorithm and its application to molecular fragment finding. In Proc. IJCA’01, pages 853–862, 2001.
M. M. Garofalakis and R. Rastogi. Scalable Data Mining with model constraints. SIGKDD Explorations, 2(2):39–48, 2000.
M. M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. VLDB’99, pages 223–234, 1999.
B. Goethals and M. J. Zaki, editors. Proc. of the IEEE ICDM 2003 Workshop on Frequent Itemset Mining Implementations, volume 90 of CEUR Workshop Proceedings, 2003.
D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharm. Discovering all most specific sentences. ACM Transactions on Database Systems, 28(2): 140–174, 2003.
T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39(ll):58–64, 1996.
B. Jeudy and J.-F. Boulicaut. Optimization of association rule mining queries. Intelligent Data Analysis, 64:341–357, 2002.
D. Kifer, J. E. Gehrke, C. Bucila, and W. White. How to quickly find a witness. In Proc. ACMPODS’03, pages 272–283, 2003.
S. Kramer, L. De Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. ACMSIGKDD’O1, pages 136–143, 2001.
L. V. Lakshmanan, R. Ng, J. Han, and A. Pang. Optimization of constrained frequent set queries with 2-variable constraints. In Proc. ACM SIGMOD’99, pages 157–168, 1999.
D.-I. Lin and Z. M. Kedem. Pincer search: An efficient algorithm for discovering the maximum frequent sets. IEEE Transactions on Knowledge and Data Engineering, 14(3):553–566, 2002.
H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In Proc. KDD’96, pages 189–194. AAAI Press, 1996.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(3): 241–258, 1997.
C. Mellish. The description identification problem. Artificial Intelligence, 52(2): 151–168, 1992.
R. Meo. Optimization of a language for Data Mining. In Proc. ACM SAC’03-Data Mining Track, pages 437–444, 2003.
R. Meo, G. Psaila, and S. Ceri. An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, 2(2): 195–224, 1998.
T. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1980.
R. Ng, L. V. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. ACM SIGMOD’98, pages 13–24, 1998.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1):25–26, 1999.
J. Pei, G. Dong, W. Zou, and J. Han. On computing condensed frequent pattern bases. In Proc. IEEE ICDM’02, pages 378–385, 2002.
J. Pei, J. Han, and L. V. S. Lakshmanan. Mining frequent itemsets with convertible constraints. In Proc. IEEEICDE’O1, pages 433–442, 2001.
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. itc. ACMSIGKDD’97, pages 67–73, 1997.
M. J. Zaki. Sequence mining in categorical domains: incorporating constraints. In Proc. ACM CIKM’00, pages 422–429, 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Boulicaut, JF., Jeudy, B. (2005). Constraint-Based Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_18
Download citation
DOI: https://doi.org/10.1007/0-387-25465-X_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24435-8
Online ISBN: 978-0-387-25465-4
eBook Packages: Computer ScienceComputer Science (R0)