Abstract
In this paper I describe the general principles of learning as data compression. I introduce two-part code optimization and analyze the theoretical background in terms of Kolmogorov complexity. The good news is that the optimal compression theoretically represents the optimal interpretation of the data; the bad news is that such an optimal compression cannot be computed and that an increase in compression does not necessarily imply a better theory. I discuss the application of these insights to the induction of deterministic finite automata (DFA).
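As an illustration of what two-part code optimization means, the following minimal sketch picks the model M that minimizes the total code length L(M) + L(D|M) for a binary string D. It is not taken from the paper: the model class (Bernoulli parameters quantized to k bits), the choice to charge exactly k bits for the model, and the names `data_bits` and `best_two_part_code` are assumptions made only for this example.

```python
import math


def data_bits(data: str, p_one: float) -> float:
    """L(D|M): code length in bits of `data` under a Bernoulli(p_one) model."""
    ones = data.count("1")
    zeros = len(data) - ones
    return -(ones * math.log2(p_one) + zeros * math.log2(1.0 - p_one))


def best_two_part_code(data: str, max_precision: int = 8):
    """Return (total_bits, precision, p_one) minimizing L(M) + L(D|M).

    Assumption: a model is a parameter quantized to `precision` bits and
    costs exactly `precision` bits to describe; the cost of encoding the
    precision itself is ignored.
    """
    best = None
    for k in range(1, max_precision + 1):
        # The 2**k candidates are cell midpoints, so 0 < p < 1 always holds.
        for j in range(2 ** k):
            p = (j + 0.5) / 2 ** k
            total = k + data_bits(data, p)  # L(M) + L(D|M)
            if best is None or total < best[0]:
                best = (total, k, p)
    return best


if __name__ == "__main__":
    skewed = "1" * 90 + "0" * 10
    print(best_two_part_code(skewed))     # beats the 100-bit literal encoding
    print(best_two_part_code("01" * 50))  # no Bernoulli model beats 100 bits
```

A finer parameter shortens the data part but lengthens the model part, so the minimum sits at a trade-off. The second call hints at the caveats in the abstract: the string `"01" * 50` is highly regular, yet this restricted model class cannot exploit the regularity, and the true optimum over all computable models (the Kolmogorov complexity) is not computable.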
This project is supported by a BSIK grant from the Dutch Ministry of Education, Culture and Science (OC&W) and is part of the ICT innovation program of the Ministry of Economic Affairs (EZ).
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adriaans, P. (2007). Learning as Data Compression. In: Cooper, S.B., Löwe, B., Sorbi, A. (eds) Computation and Logic in the Real World. CiE 2007. Lecture Notes in Computer Science, vol 4497. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73001-9_2
DOI: https://doi.org/10.1007/978-3-540-73001-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73000-2
Online ISBN: 978-3-540-73001-9
eBook Packages: Computer Science, Computer Science (R0)