Abstract
We consider the problem of PAC-learning distributions over strings, represented by probabilistic deterministic finite automata (PDFAs). PDFAs are a probabilistic model for the generation of strings of symbols, that have been used in the context of speech and handwriting recognition, and bioinformatics. Recent work on learning PDFAs from random examples has used the KL-divergence as the error measure; here we use the variation distance. We build on recent work by Clark and Thollard, and show that the use of the variation distance allows simplifications to be made to the algorithms, and also a strengthening of the results; in particular that using the variation distance, we obtain polynomial sample size bounds that are independent of the expected length of strings.
This work was supported by EPSRC Grant GR/R86188/01. This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Abe, N., Takeuchi, J., Warmuth, M.: Polynomial Learnability of Stochastic Rules with respect to the KL-divergence and Quadratic Distance. IEICE Trans. Inf. and Syst. E84-D(3), 299–315 (2001)
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Clark, A., Thollard, F.: PAC-learnability of Probabilistic Deterministic Finite State Automata. Journal of Machine Learning Research 5, 473–497 (2004)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, Chichester (1991)
Cryan, M., Goldberg, L.A., Goldberg, P.W.: Evolutionary Trees can be Learnt in Polynomial Time in the Two-State General Markov Model. SIAM Journal on Computing 31(2), 375–397 (2001)
Goldberg, P.W.: When can two unsupervised learners achieve PAC separation? In: Helmbold, D.P., Williamson, B. (eds.) COLT 2001 and EuroCOLT 2001. LNCS (LNAI), vol. 2111, pp. 303–319. Springer, Heidelberg (2001)
de la Higuera, C., Oncina, J.: Learning Probabilistic Finite Automata. tech. rept. EURISE, Université de Saint-Etienne and Departamento de Lenguajes y Sistemas Informaticos (2002)
Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R.E., Sellie, L.: On the Learnability of Discrete Distributions. In: Procs. of STOC, pp. 273–282 (1994)
Palmer, N., Goldberg, P.W.: PAC Classification via PAC Estimates of Label Class Distributions. Tech rept. 411, Dept. of Computer Science, University of Warwick (2004)
Ron, D., Singer, Y., Tishby, N.: On the Learnability and Usage of Acyclic Probabilistic Finite Automata. Journal of Computer and System Sciences 56(2), 133–152 (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Palmer, N., Goldberg, P.W. (2005). PAC-Learnability of Probabilistic Deterministic Finite State Automata in Terms of Variation Distance. In: Jain, S., Simon, H.U., Tomita, E. (eds) Algorithmic Learning Theory. ALT 2005. Lecture Notes in Computer Science(), vol 3734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564089_14
Download citation
DOI: https://doi.org/10.1007/11564089_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29242-5
Online ISBN: 978-3-540-31696-1
eBook Packages: Computer ScienceComputer Science (R0)