Abstract
We show that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent. This means there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Barron, A.R.: Information-theoretic characterization of Bayes performance and the choice of priors in parametric and nonparametric problems. In: Bayesian Statistics, vol. 6, pp. 27–52. Oxford University Press, Oxford (1998)
Barron, A.R., Rissanen, J., Yu, B.: The MDL Principle in coding and modeling. IEEE Trans. Inform. Theory 44(6), 2743–2760 (1998)
Barron, A.R.: Complexity regularization with application to artificial neural networks. In: Nonparametric Functional Estimation and Related Topics, pp. 561–576. Kluwer Academic Publishers, Dordrecht (1990)
Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Trans. Inform. Theory 37(4), 1034–1054 (1991)
Bernardo, J.M., Smith, A.F.M.: Bayesian theory. John Wiley, Chichester (1994)
Blackwell, D., Dubins, L.: Merging of opinions with increasing information. The Annals of Mathematical Statistics 33, 882–886 (1962)
Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.: Occam’s razor. Information Processing Letters 24, 377–380 (1987)
Bunke, O., Milhaud, X.: Asymptotic behaviour of Bayes estimates under possibly incorrect models. The Annals of Statistics 26, 617–644 (1998)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
Diaconis, P., Freedman, D.: On the consistency of Bayes estimates. The Annals of Statitics 14(1), 1–26 (1986)
Grünwald, P.D.: MDL tutorial. In: Grünwald, P.D., Myung, I.J., Pitt, M.A. (eds.) Minimum Description Length: recent developments in theory and practice, ch.1, MIT Press, Cambridge (2004) (to appear)
Grünwald, P.D.: The Minimum Description Length Principle and Reasoning under Uncertainty. PhD thesis, University of Amsterdam, The Netherlands (1998)
Heckerman, D., Chickering, D.M., Meek, C., Rounthwaite, R., Kadie, C.: Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research 1, 49–75 (2000)
Jordan, M.I.: Why the logistic funtion? a tutorial discussion on probabilities and neural networks. Computational Cognitive Science Tech. Rep. 9503, MIT (1995)
Kearns, M., Mansour, Y., Ng, A.Y., Ron, D.: An experimental and theoretical comparison of model selection methods. Machine Learning 27, 7–50 (1997)
Kleijn, B., van der Vaart, A.: Misspecification in infinite-dimensional Bayesian statistics. (2004) (submitted)
Li, J.K.: Estimation of Mixture Models. PhD thesis, Yale University, Department of Statistics (1997)
McAllester, D.: PAC-Bayesian model averaging. In: Proceedings COLT 1999 (1999)
Meir, R., Merhav, N.: On the stochastic complexity of learning realizable and unrealizable rules. Machine Learning 19, 241–261 (1995)
Quinlan, J., Rivest, R.: Inferring decision trees using the minimum description length principle. Information and Computation 80, 227–248 (1989)
Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore (1989)
Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244 (2001)
Viswanathan, M., Wallace, C.S., Dowe, D.L., Korb, K.B.: Finding cutpoints in noisy binary sequences - a revised empirical evaluation. In: Foo, N.Y. (ed.) AI 1999. LNCS, vol. 1747, pp. 405–416. Springer, Heidelberg (1999)
Yamanishi, K.: A decision-theoretic extension of stochastic complexity and its applications to learning. IEEE Trans. Inform. Theory 44(4), 1424–1439 (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grünwald, P., Langford, J. (2004). Suboptimal Behavior of Bayes and MDL in Classification Under Misspecification. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science(), vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_23
Download citation
DOI: https://doi.org/10.1007/978-3-540-27819-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1
eBook Packages: Springer Book Archive