Abstract
In many data mining tools that support regression tasks, training data are stored in a single table containing both the target field (dependent variable) and the attributes (independent variables). Generally, such tools find only intra-tuple relationships between the attributes and the target field: inter-tuple relationships are not considered, and inter-table relationships between tuples of distinct tables cannot even be explored. Disregarding inter-table relationships can be a severe limitation in many real-world applications that involve predicting numerical values from data naturally organized in a relational model with several tables (multi-relational model). In this paper, we present a new data mining algorithm, named Mr-SMOTI, which induces model trees from a multi-relational model. A model tree is a tree-structured prediction model whose leaves are associated with multiple linear regression models. The distinguishing feature of Mr-SMOTI is that internal nodes of the induced model tree can be of two types: regression nodes, which add a variable to one or more multiple linear models according to a stepwise strategy, and split nodes, which perform tests on attributes or on the join condition and partition the training set accordingly. The induced model tree is a multi-relational pattern that can be represented by means of selection graphs, which can be translated into SQL or, equivalently, into first-order logic expressions.
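To make the idea of a regression node that "adds a variable according to a stepwise strategy" more concrete, here is a minimal single-table sketch. It is not Mr-SMOTI itself and ignores joins and selection graphs entirely. It assumes the classical stepwise construction: once x1 has entered the model, a new variable x2 is added by removing the linear effect of x1 from both the target and x2, then regressing one residual on the other. The function names (`fit_line`, `residuals`, `stepwise_two_vars`) and the toy data are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for ys ~ a + b*xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes xs is not constant
    return my - b * mx, b


def residuals(xs: List[float], ys: List[float], a: float, b: float) -> List[float]:
    """Part of ys not explained by the line a + b*xs."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def stepwise_two_vars(
    x1: List[float], x2: List[float], y: List[float]
) -> Tuple[float, float, float]:
    """Build y ~ c0 + c1*x1 + c2*x2 by adding one variable at a time.

    Step 1 (hypothetical regression node on x1): fit y on x1 alone.
    Step 2 (adding x2 further down the tree): remove the linear effect of x1
    from both y and x2, then regress the y-residuals on the x2-residuals.
    """
    a1, b1 = fit_line(x1, y)            # y  ~ a1 + b1*x1
    y_res = residuals(x1, y, a1, b1)
    c, d = fit_line(x1, x2)             # x2 ~ c + d*x1
    x2_res = residuals(x1, x2, c, d)
    a2, b2 = fit_line(x2_res, y_res)    # y_res ~ a2 + b2*x2_res
    # Substitute x2_res = x2 - c - d*x1 to express the model in x1 and x2.
    return a1 + a2 - b2 * c, b1 - b2 * d, b2


if __name__ == "__main__":
    # Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (illustrative only).
    x1 = [0, 1, 2, 3, 4]
    x2 = [1, 0, 3, 1, 2]
    y = [4, 3, 14, 10, 15]
    print(fit_line(x1, y))               # model after the first regression node
    print(stepwise_two_vars(x1, x2, y))  # approximately (1.0, 2.0, 3.0)
```

On this data the first step alone gives a slope of about 2.9 for x1, because x1 and x2 are correlated. The residual step corrects for this, and the final coefficients match those of a direct multiple regression on x1 and x2 (the Frisch–Waugh–Lovell result). This is why a variable introduced by a regression node high in the tree can keep a "global" effect while later nodes refine the model locally.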
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Appice, A., Ceci, M., Malerba, D. (2003). Mining Model Trees: A Multi-relational Approach. In: Horváth, T., Yamamoto, A. (eds) Inductive Logic Programming. ILP 2003. Lecture Notes in Computer Science, vol 2835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39917-9_3
DOI: https://doi.org/10.1007/978-3-540-39917-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20144-1
Online ISBN: 978-3-540-39917-9
eBook Packages: Springer Book Archive