Learning deterministic context free grammars: The Omphalos competition

Clark, Alexander

doi:10.1007/s10994-006-9592-9

Learning deterministic context free grammars: The Omphalos competition

Published: 04 August 2006

Volume 66, pages 93–110, (2007)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

Learning deterministic context free grammars: The Omphalos competition

Download PDF

Alexander Clark¹

597 Accesses
13 Citations
3 Altmetric
Explore all metrics

Abstract

This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

Abe, N., & Warmuth, M. K. (1992). On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9, 205–260.
MATH Google Scholar
Adriaans, P., Trautwein, M., & Vervoort, M. (2000). Towards high speed grammar induction on large text corpora. In V. Hlavac, K. G. Jeffery, & J. Wiedermann (Eds.), SOFSEM 2000: Theory and practice of informatics (pp. 173–186). Springer Verlag.
Boasson, L., & Senizergues, S. (1985). NTS languages are deterministic and congruential. J. Comput. Syst. Sci., 31(3), 332–342.
Article MATH MathSciNet Google Scholar
Book, R., & Otto, F. (1993). String rewriting systems. Springer Verlag.
Book, R. V. (1981). NTS grammars and Church-Rosser systems. Information Processing Letters, 13(2), 73–76.
Article MATH MathSciNet Google Scholar
Carrasco, R. C., & Oncina, J. (1999). Learning deterministic regular grammars from stochastic samples in polynomial time. Theoretical Informatics and Applications, 33(1), 1–20.
Article MATH MathSciNet Google Scholar
Clark, A. (2001). Unsupervised induction of stochastic context free grammars with distributional clustering. In Proc. of Conference on Computational Natural Language Learning (pp. 105–112). Toulouse, France.
Clark, A., & Thollard, F. (2004a). PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5, 473–497.
Google Scholar
Clark, A., & Thollard, F. (2004b). Partially distribution-free learning of regular languages from positive samples. In Proceedings of COLING. Geneva, Switzerland.
Eyraud, R., de la Higuera, C., & Janodet, J.-C. (2004). Representing languages by learnable rewriting systems. In International Colloquium on Grammatical Inference, vol. 3264 of LNAI (pp. 139–150). Athens, Greece.
Gazdar, G., Klein, E., Pullum, G., & Sag, I. (1985). Generalised phrase structure grammar. Basil Blackwell.
Gusfield, D. (1997). Algorithms onstrings, trees and sequences: computer science and computational biology. Cambridge University Press.
Harris, Z. (1954). Distributional structure. In J. A. Fodor & J. J. Katz (Eds.), The structure of language (pp. 33–49). Prentice-Hall.
Harrison, M. A. (1978). Introduction to formal language theory. Addison Wesley.
Kearns, M., & Valiant, G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. JACM, 41(1), 67–95.
Article MATH MathSciNet Google Scholar
Keller, B., & Lutz, R. (2005). Evolutionary induction of stochastic context free grammars. Pattern Recognition. to appear.
Klein, D., & Manning, C. (2001). Distributional phrase structure induction. In Proceedings of CoNLL 2001 (pp. 113–121).
Lamb, S. M. (1961). On the mechanisation of syntactic analysis. In 1961 Conference on Machine Translation of Languages and Applied Language Analysis, vol. 2 of National Physical Laboratory Symposium No. 13 (pp. 674–685). London: Her Majesty’s Stationery Office.
Lari, K., & Young, S. J. (1990). The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4, 35–56.
Article Google Scholar
McNaughton, R., Narendran, P., & Otto, F. (1988). Church-Rosser Thue systems and formal languages. J. ACM, 35(2), 324–344.
Article MATH MathSciNet Google Scholar
Nevill-Manning, C. G., & Witten, I. H. (1997). Identifying hierarchical structure in sequences. Journal of Artificial Intelligence Research, 7, 67–82.
MATH Google Scholar
Ron, D., Singer, Y., & Tishby, N. (1995). On the learnability and usage of acyclic probabilistic finite automata. In COLT 1995 (pp. 31–40). Santa Cruz CA USA.
Sakakibara, Y. (1992). Efficient learning of context free grammars from positive structural samples. Inf Computing, 97(1), 23–60.
Article MATH MathSciNet Google Scholar
Solan, Z., Horn, D., Ruppin, E., & Edelman, S. (2003). Unsupervised context sensitive language acquisition from a large corpus. In Proceedings of NIPS-2003.
Starkie, B., Coste, F., & van Zaanen, M. (2004). The Omphalos Context-Free Grammar learning competition. In International Colloquium on Grammatical Inference, vol. 3264 of LNAI (pp. 16–27). Athens, Greece.
Stolz, W. (1965). A probabilistic procedure for grouping words into phrases. Language and Speech, 8(4), 219–235.
Google Scholar
Thatcher, J. W. (1967). Characterizing derivation trees of context free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences, 1(4), 317–322.
MATH MathSciNet Google Scholar
van Zaanen, M. (2000). ABL: Alignment-based learning. In COLING 2000—Proceedings of the 18th International Conference on Computational Linguistics.
Yuval, G. (1979). How to swindle Rabin. Cryptologia, 3, 187–189.
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, Royal Holloway University of London, Egham, Surrey, TW20 0EX
Alexander Clark

Authors

Alexander Clark
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Alexander Clark.

Additional information

Editor: Georgios Paliouras and Yasubumi Sakakibara

Rights and permissions

Reprints and permissions

About this article

Cite this article

Clark, A. Learning deterministic context free grammars: The Omphalos competition. Mach Learn 66, 93–110 (2007). https://doi.org/10.1007/s10994-006-9592-9

Download citation

Received: 15 June 2005
Revised: 13 April 2006
Accepted: 22 June 2006
Published: 04 August 2006
Issue Date: January 2007
DOI: https://doi.org/10.1007/s10994-006-9592-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Learning deterministic context free grammars: The Omphalos competition

Abstract

Article PDF

Similar content being viewed by others

Distributional Learning of Context-Free and Multiple Context-Free Grammars

Learning Context Free Grammars with the Finite Context Property: A Correction of A. Clark’s Algorithm

Learning Domain-Specific Grammars from a Small Number of Examples

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Learning deterministic context free grammars: The Omphalos competition

Abstract

Article PDF

Similar content being viewed by others

Distributional Learning of Context-Free and Multiple Context-Free Grammars

Learning Context Free Grammars with the Finite Context Property: A Correction of A. Clark’s Algorithm

Learning Domain-Specific Grammars from a Small Number of Examples

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation