Abstract
In this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deeply into the pattern mining process, and allows us to improve the state-of-the-art of optimal decision tree induction.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Bocca JB, Jarke M, Zaniolo C (eds) VLDB’94, proceedings of 20th international conference on very large data bases. Morgan Kaufmann, pp 487–499
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. AAAI/MIT Press, pp 307–328
Angelopoulos N, Cussens J (2005) Exploiting informative priors for Bayesian clas-sification and regression trees. In: Kaelbling LP, Saffiotti A (eds) IJCAI-05, proceedings of the 19th international joint conference on artificial intelligence. Professional Book Center, pp 641–646
Asuncion A, Newman D (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html
Baixeries J, Szathmary L, Valtchev P, Godin R (2009) Yet a faster algorithm for building the Hasse diagram of a concept lattice. In: Ferre S, Rudolph S (eds) ICFCA’09, proceedings of the 7th international conference on formal concept analysis. Springer, pp 162–177
Bayardo R, Goethals B, Zaki MJ (eds) (2004) FIMI ’04, proceedings ofthe IEEE ICDM workshop on frequent itemset mining implementations. CEUR-WS.org
Blanchard G, Schafer C, Rozenholc Y, Müller KR (2007) Optimal dyadic decision trees. Mach Learn 66(2–3): 209–241
Bonchi F, Lucchese C (2007) Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng 60(2): 377–399
Boulicaut J-F, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Zighed DA, Komorowski HJ, Zytkow JM (eds) PKDD’00, proceedings ofthe 4th European conference on principles of data mining and knowledge discovery. Springer, pp 75–85
Boulicaut J-F, Bykowski A, Rigotti C (2003) Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining Knowl Discov 7(1): 5–22
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Publishing Company, Belmont, California, USA
Bringmann B, Nijssen S, Zimmermann A (2009) Pattern-based classification: a unifying perspective. In: Knobbe A, Furnkranz J (eds) LeGo ’09, proceedings of the ECML PKDD 2009 workshop ‘From Local Patterns to Global Models’
Bucila C, Gehrke J, Kifer D, White WM (2003) DualMiner: a dual-pruning algorithm for itemsets with constraints. Data Mining Knowl Discov 7(3): 241–272
Buntine W (1992) Learning classification trees. Stat Comput 2: 63–73
Chipman HA, George EI, McCulloch RE (1998) Bayesian CART model search. J Am Stat Assoc 93(443): 935–947
De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Parthasarathy S, Liu B (eds) SDM’07, proceedings of the seventh SIAM international conference on data mining. SIAM Press, pp 1–12
Dong G, Zhang X, Wong L, Li J (1999) CAEP: classification by aggregating emerging patterns. In: Arikawa S, Furukawa K (eds) DS’99, proceedings of the second international conference on discovery science. Springer, pp 30–42
Esmeir S, Markovitch S (2007a) Anytime induction of cost-sensitive trees. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) NIPS’07, proceedings of the 21st conference on neural information processing systems. MIT Press, pp 425–132
Esmeir S, Markovitch S (2007b) Anytime learning of decision trees. J Machine Learn Res 8: 891–933
Friedman A, Schuster A, Wolff R (2006) k-anonymous decision tree induction. In: Furnkranz J, Scheffer T, Spiliopoulou M (eds) PKDD’06, proceedings ofthe 10th European conference on principles and practice of knowledge discovery in databases. Springer, pp 151–162
Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer, Berlin
Garey MR (1972) Optimal binary identification procedures. SIAM J Appl Math 23(2): 173–186
Garofalakis MN, Hyun D, Rastogi R, Shim K (2003) Building decision trees with constraints. Data Min Knowl Discov 7(2): 187–214
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Chen W, Naughton J, Bernstein PA (eds) SIGMOD’00, proceedings ofthe 2000 ACM SIGMOD international conference on management of Data. ACM Press, pp 1–12
Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1): 15–17
Imielinski T, Mannila H (1996) A database perspective on knowledge discovery. Commun ACM 39: 58–64
Knobbe A, Ho E (2006) Maximally informative k-itemsets and their efficient discovery. In: Eliassi-Rad T, Ungar LH, Craven M, Gunopulos D (eds) KDD’06, proceeding ofthe 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, pp 237–244
Knobbe A, Cremilleux B, Furnkranz J, Scholz M (2008) From local patterns to global models: the LeGo approach to data mining. In: Furnkranz J, Knobbe A (eds) LeGo’08, proceedings ofthe ECML PKDD 2008 workshop ‘From Local Pat-terns to Global Models’, pp 1–16
Lew A (1978) Optimal conversion of extended-entry decision tables with general cost criteria. Commun ACM 21(4): 269–279
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Cercone N, Lin TY, Wu X (eds) ICDM’01, proceedings of the 2001 IEEE international conference on data mining. IEEE Computer Society, pp 369–376
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G (eds) KDD’98, proceedings ofthe 4th international conference on knowledge discovery and data mining. AAAI Press, pp 80–86
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) l-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): 3
Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: Fayyad UM, Uthurusamy R (eds) KDD’94, proceedings ofthe AAAI workshop on knowledge discovery in databases. AAAI Press, pp 181–192
Meisel WS, Michalopoulos D (1973) A partitioning algorithm with application in pattern classification and the optimization of decision tree. IEEE Trans Comput C-22:93–103
Moore A, Lee MS (1998) Cached sufficient statistics for efficient machine learning with large datasets. J Artif Intell Res 8: 67–91
Murphy PM, Pazzani MJ (1997) Exploring the decision forest: an empirical in-vestigation of Occam’s razor in decision tree induction. In: Greiner R, Petsche T, Jose S (eds) Computational learning theory and natural learning systems vol IV: making learning systems practical. MIT Press, pp 171–187
Nadeau C, Bengio Y (2003) Inference for the generalization error. Machine Learn 52(3): 239–281
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1): 25–46
Payne HJ, Meisel WS (1977) An algorithm for constructing optimal binary decision trees. IEEE Trans Comput 26(9): 905–916
Pei J, Han J, Lakshmanan LVS (2001) Mining frequent item sets with convertible constraints. In: ICDE’01, proceedings of the IEEE international conference on data engineering. IEEE Computer Society, pp 433–142
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Schumacher H, Sevcik KC (1976) The synthetic approach to decision table conversion. Commun ACM 19(6): 343–351
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertainty, Fuzziness Knowledge-Based Syst 10(5): 557–570
Turney P (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2: 369–409
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Bayardo R, Goethals B, Zaki MJ (eds) FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations. CEUR-WS.org
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Grossman R, Bayardo RJ, Bennett KP (eds) KDD’05, proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, pp 314–323
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997a) New algorithms for fast discovery of association rules. In: Heckerman D, Mannila H, Pregibon D (eds) KDD’97, proceedings of the 3rd international conference on knowledge discovery and data mining. AAAI Press, pp 283–286
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997b) Parallel algorithms for discovery of association rules. Data Mining Knowl Discov 1(4): 343–373
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Johannes Fürnkranz and Arno Knobbe.
Rights and permissions
About this article
Cite this article
Nijssen, S., Fromont, E. Optimal constraint-based decision tree induction from itemset lattices. Data Min Knowl Disc 21, 9–51 (2010). https://doi.org/10.1007/s10618-010-0174-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-010-0174-x