Abstract
In the last few decades a computational approach to machine learning has emerged, based on paradigms from recursion theory and the theory of computation. These include learning in the limit, learning by enumeration, and probably approximately correct (pac) learning. Such models are usually not suitable in practical situations. In contrast, statistics-based inference methods have enjoyed a long and distinguished career. Currently, Bayesian reasoning in various forms, minimum message length (MML), and minimum description length (MDL) are widely applied approaches. They are the tools of choice in machine learning praxis such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These statistical inference methods select the hypothesis that minimizes the sum of the length of the description of the hypothesis (also called the 'model') and the length of the description of the data relative to the hypothesis. It appears to us that the future of computational machine learning will combine the approaches above with guarantees on the time and memory resources used. Computational learning theory will move closer to practice, and the application of principles such as MDL requires further justification. Here we survey some of the actors in this dichotomy between theory and praxis, justify MDL via the Bayesian approach, and compare pac learning with MDL learning of decision trees.
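The selection rule stated above, minimize L(H) + L(D|H), can be made concrete on a toy problem. The Python sketch below is not from the chapter: it assumes a hypothetical hypothesis class of just two constant-bit predictors and a simple (slightly redundant) exception code, purely to instantiate the two-part principle.

```python
import math

def l_exceptions(n, m):
    """Bits to describe which m of the n predicted bits are exceptions:
    log2(n + 1) bits for the count m, then log2(n) bits per position.
    (A simple, slightly redundant prefix-free scheme, chosen for clarity.)"""
    return math.log2(n + 1) + m * math.log2(n)

def two_part_length(data, b):
    """Two-part code length L(H) + L(D|H) for the hypothesis
    'every bit equals b': 1 bit names the hypothesis (L(H)), and the
    data is then coded as its list of exceptions (L(D|H))."""
    n = len(data)
    m = sum(1 for bit in data if bit != b)
    return 1 + l_exceptions(n, m)

def mdl_select(data):
    """Pick the constant-bit hypothesis with minimum total code length."""
    return min((0, 1), key=lambda b: two_part_length(data, b))
```

On data that is mostly zeros, `mdl_select` prefers the all-zeros hypothesis because the few exceptions are cheap to list; a richer model class would trade a longer L(H) against a shorter L(D|H) in exactly the same way.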
The first author was supported in part by NSERC operating grant OGP-046506, ITRC, and a CGAT grant. The second author was supported by NSERC through International Scientific Exchange Award ISE0125663, and by the European Union through NeuroCOLT ESPRIT Working Group Nr. 8556, and by NWO through NFI Project ALADDIN under Contract number NF 62-376.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
Cite this chapter
Li, M., Vitányi, P. (1995). Computational machine learning in theory and praxis. In: van Leeuwen, J. (eds) Computer Science Today. Lecture Notes in Computer Science, vol 1000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015264
Print ISBN: 978-3-540-60105-0
Online ISBN: 978-3-540-49435-5