Abstract
This paper presents PCFG-BCL, an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires the rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus. Our analysis shows that the procedure is greedy: each set of rules added to the grammar is the one that yields the largest increase in the posterior of the grammar given the training corpus. Experiments on several benchmark datasets show that PCFG-BCL is competitive with existing methods for unsupervised CFG learning.
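To make the idea of biclustering bigrams concrete, the following is a minimal sketch and not the authors' implementation. It counts bigrams in a toy corpus, scores a candidate bicluster (a set of left symbols paired with a set of right symbols), and rewrites the corpus by replacing matching bigrams with a new nonterminal. The function names, the toy corpus, and the `coherence_gap` score (deviation from a rank-one count table) are all assumptions made for illustration; the posterior-gain criterion described in the abstract is not reproduced here.

```python
from collections import Counter
from itertools import product


def bigram_table(corpus):
    """Count adjacent symbol pairs (bigrams) over all sentences."""
    counts = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            counts[(left, right)] += 1
    return counts


def coherence_gap(counts, rows, cols):
    """Hypothetical score: normalized deviation of the bicluster's counts
    from the rank-one table implied by its row and column sums.
    0.0 means the rows and columns combine independently."""
    total = sum(counts[(r, c)] for r, c in product(rows, cols))
    if total == 0:
        return float("inf")
    row_sums = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_sums = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    gap = 0.0
    for r, c in product(rows, cols):
        expected = row_sums[r] * col_sums[c] / total
        gap += abs(counts[(r, c)] - expected)
    return gap / total


def reduce_corpus(corpus, rows, cols, new_symbol):
    """Replace each bigram covered by the bicluster with new_symbol,
    scanning every sentence left to right."""
    reduced = []
    for sentence in corpus:
        out, i = [], 0
        while i < len(sentence):
            if (i + 1 < len(sentence)
                    and sentence[i] in rows
                    and sentence[i + 1] in cols):
                out.append(new_symbol)
                i += 2
            else:
                out.append(sentence[i])
                i += 1
        reduced.append(out)
    return reduced


# Toy corpus (illustrative only).
corpus = [["the", "cat"], ["the", "dog"], ["a", "cat"], ["a", "dog"]]
counts = bigram_table(corpus)
gap = coherence_gap(counts, ["the", "a"], ["cat", "dog"])
reduced = reduce_corpus(corpus, {"the", "a"}, {"cat", "dog"}, "N1")
```

In an iterative scheme of this kind, the reduced corpus would be fed back into `bigram_table`, so that later biclusters can pair the new nonterminal with other symbols.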
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Tu, K., Honavar, V. (2008). Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering. In: Clark, A., Coste, F., Miclet, L. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2008. Lecture Notes in Computer Science, vol 5278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88009-7_18
DOI: https://doi.org/10.1007/978-3-540-88009-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88008-0
Online ISBN: 978-3-540-88009-7
eBook Packages: Computer Science (R0)