Abstract
This paper studies splitting criteria for decision trees from three original points of view. First, we propose a unified formalization of association measures based on entropies of type β; this formalization includes popular measures such as the Gini index and Shannon entropy. Second, we generate artificial data from M-of-N concepts whose complexity and class distribution are controlled. Third, our experiments allow us to study the behavior of the measures on datasets of growing complexity. The results show that the differences in performance between measures, which are significant when there is no noise in the data, disappear as the level of noise increases.
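As a minimal sketch of the two ingredients the abstract names, the snippet below assumes the Daróczy entropy of type β (for which β = 2 recovers a scaled Gini index and the limit β → 1 gives Shannon entropy) and a standard M-of-N labeling rule; the function names and the exact normalization are illustrative assumptions, not taken from the paper itself.

```python
import math
import random


def entropy_beta(probs, beta):
    """Daróczy entropy of type beta (assumed form).

    H_beta(P) = (sum(p_i^beta) - 1) / (2^(1 - beta) - 1) for beta != 1;
    the limit beta -> 1 gives Shannon entropy in bits.
    """
    if beta == 1.0:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return (sum(p ** beta for p in probs) - 1) / (2 ** (1 - beta) - 1)


def m_of_n_label(x, relevant, m):
    """Label an instance 1 iff at least m of the n relevant boolean attributes are true."""
    return int(sum(x[i] for i in relevant) >= m)


# beta = 2 yields twice the Gini index: 2 * (1 - sum(p^2)).
p = [0.5, 0.5]
print(entropy_beta(p, 2))   # 1.0, i.e. 2 * Gini for a balanced binary split
print(entropy_beta(p, 1))   # 1.0 bit, Shannon entropy of a fair coin

# A hypothetical 3-of-5 concept over 10 boolean attributes, as in the
# controlled artificial data the abstract describes.
random.seed(0)
n_attrs, relevant, m = 10, [0, 1, 2, 3, 4], 3
data = [[random.randint(0, 1) for _ in range(n_attrs)] for _ in range(1000)]
labels = [m_of_n_label(x, relevant, m) for x in data]
```

Varying `m`, the number of relevant attributes, and the noise injected into `labels` is what lets complexity and class distribution be controlled in such a setup.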
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Rakotomalala, R., Lallich, S., Di Palma, S. (1999). Studying the Behavior of Generalized Entropy in Induction Trees Using a M-of-N Concept. In: Żytkow, J.M., Rauch, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1999. Lecture Notes in Computer Science(), vol 1704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48247-5_66