Abstract
This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Abe, N., & Warmuth, M. K. (1992). On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9, 205–260.
Adriaans, P., Trautwein, M., & Vervoort, M. (2000). Towards high speed grammar induction on large text corpora. In V. Hlavac, K. G. Jeffery, & J. Wiedermann (Eds.), SOFSEM 2000: Theory and practice of informatics (pp. 173–186). Springer Verlag.
Boasson, L., & Senizergues, S. (1985). NTS languages are deterministic and congruential. J. Comput. Syst. Sci., 31(3), 332–342.
Book, R., & Otto, F. (1993). String rewriting systems. Springer Verlag.
Book, R. V. (1981). NTS grammars and Church-Rosser systems. Information Processing Letters, 13(2), 73–76.
Carrasco, R. C., & Oncina, J. (1999). Learning deterministic regular grammars from stochastic samples in polynomial time. Theoretical Informatics and Applications, 33(1), 1–20.
Clark, A. (2001). Unsupervised induction of stochastic context free grammars with distributional clustering. In Proc. of Conference on Computational Natural Language Learning (pp. 105–112). Toulouse, France.
Clark, A., & Thollard, F. (2004a). PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5, 473–497.
Clark, A., & Thollard, F. (2004b). Partially distribution-free learning of regular languages from positive samples. In Proceedings of COLING. Geneva, Switzerland.
Eyraud, R., de la Higuera, C., & Janodet, J.-C. (2004). Representing languages by learnable rewriting systems. In International Colloquium on Grammatical Inference, vol. 3264 of LNAI (pp. 139–150). Athens, Greece.
Gazdar, G., Klein, E., Pullum, G., & Sag, I. (1985). Generalised phrase structure grammar. Basil Blackwell.
Gusfield, D. (1997). Algorithms onstrings, trees and sequences: computer science and computational biology. Cambridge University Press.
Harris, Z. (1954). Distributional structure. In J. A. Fodor & J. J. Katz (Eds.), The structure of language (pp. 33–49). Prentice-Hall.
Harrison, M. A. (1978). Introduction to formal language theory. Addison Wesley.
Kearns, M., & Valiant, G. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. JACM, 41(1), 67–95.
Keller, B., & Lutz, R. (2005). Evolutionary induction of stochastic context free grammars. Pattern Recognition. to appear.
Klein, D., & Manning, C. (2001). Distributional phrase structure induction. In Proceedings of CoNLL 2001 (pp. 113–121).
Lamb, S. M. (1961). On the mechanisation of syntactic analysis. In 1961 Conference on Machine Translation of Languages and Applied Language Analysis, vol. 2 of National Physical Laboratory Symposium No. 13 (pp. 674–685). London: Her Majesty’s Stationery Office.
Lari, K., & Young, S. J. (1990). The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4, 35–56.
McNaughton, R., Narendran, P., & Otto, F. (1988). Church-Rosser Thue systems and formal languages. J. ACM, 35(2), 324–344.
Nevill-Manning, C. G., & Witten, I. H. (1997). Identifying hierarchical structure in sequences. Journal of Artificial Intelligence Research, 7, 67–82.
Ron, D., Singer, Y., & Tishby, N. (1995). On the learnability and usage of acyclic probabilistic finite automata. In COLT 1995 (pp. 31–40). Santa Cruz CA USA.
Sakakibara, Y. (1992). Efficient learning of context free grammars from positive structural samples. Inf Computing, 97(1), 23–60.
Solan, Z., Horn, D., Ruppin, E., & Edelman, S. (2003). Unsupervised context sensitive language acquisition from a large corpus. In Proceedings of NIPS-2003.
Starkie, B., Coste, F., & van Zaanen, M. (2004). The Omphalos Context-Free Grammar learning competition. In International Colloquium on Grammatical Inference, vol. 3264 of LNAI (pp. 16–27). Athens, Greece.
Stolz, W. (1965). A probabilistic procedure for grouping words into phrases. Language and Speech, 8(4), 219–235.
Thatcher, J. W. (1967). Characterizing derivation trees of context free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences, 1(4), 317–322.
van Zaanen, M. (2000). ABL: Alignment-based learning. In COLING 2000—Proceedings of the 18th International Conference on Computational Linguistics.
Yuval, G. (1979). How to swindle Rabin. Cryptologia, 3, 187–189.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editor: Georgios Paliouras and Yasubumi Sakakibara
Rights and permissions
About this article
Cite this article
Clark, A. Learning deterministic context free grammars: The Omphalos competition. Mach Learn 66, 93–110 (2007). https://doi.org/10.1007/s10994-006-9592-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-006-9592-9