Abstract
We explain why weighted automata are an attractive knowledge representation for natural language problems. We first trace the close historical ties between automata theory and natural language processing, then present two complex real-world problems: transliteration and translation. These problems can usefully be decomposed into a pipeline of weighted transducers, whose weights can be set with standard algorithms to maximize the likelihood of a training corpus. We also describe how language models, a critical data source in natural language processing, are represented as weighted automata. Finally, we outline the wide range of work in natural language processing that makes use of weighted string and tree automata, and describe current work and challenges.
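Two of the abstract's technical claims can be illustrated in a few lines: a language model can be represented as a weighted automaton, and a problem can be decomposed into a pipeline of weighted transducers joined by composition. The Python below is a minimal sketch, not code from the chapter or from any toolkit. The class `WFST`, the functions `compose` and `weight`, the toy bigram probabilities, and the one-state "channel" model are all hypothetical. It assumes epsilon-free transducers over the real (probability) semiring, so weights multiply along a path and add across paths.

```python
from collections import defaultdict
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical arc format: (source, input symbol, output symbol, weight, destination)
Arc = Tuple[Hashable, str, str, float, Hashable]


@dataclass
class WFST:
    """Epsilon-free weighted transducer over the real (probability) semiring."""
    start: Hashable
    finals: Dict[Hashable, float]  # state -> final (stopping) weight
    arcs: List[Arc]


def compose(a: WFST, b: WFST) -> WFST:
    """Pipeline a then b: a's output symbol must match b's input symbol.

    Composed states are pairs (qa, qb); arc and final weights multiply.
    Only pairs reachable from the start pair are built.
    """
    start = (a.start, b.start)
    arcs: List[Arc] = []
    finals: Dict[Hashable, float] = {}
    stack, seen = [start], {start}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            finals[(qa, qb)] = a.finals[qa] * b.finals[qb]
        for sa, ia, oa, wa, da in a.arcs:
            if sa != qa:
                continue
            for sb, ib, ob, wb, db in b.arcs:
                if sb == qb and ib == oa:
                    dst = (da, db)
                    arcs.append(((qa, qb), ia, ob, wa * wb, dst))
                    if dst not in seen:
                        seen.add(dst)
                        stack.append(dst)
    return WFST(start, finals, arcs)


def weight(t: WFST, inp: Sequence[str], out: Sequence[str]) -> float:
    """Sum of all path weights that map inp to out (forward algorithm)."""
    if len(inp) != len(out):
        return 0.0  # epsilon-free: each arc reads one symbol and writes one
    alpha: Dict[Hashable, float] = {t.start: 1.0}
    for x, y in zip(inp, out):
        nxt: Dict[Hashable, float] = defaultdict(float)
        for s, i, o, w, d in t.arcs:
            if s in alpha and i == x and o == y:
                nxt[d] += alpha[s] * w
        alpha = nxt
    return sum(p * t.finals.get(q, 0.0) for q, p in alpha.items())


# Toy bigram language model as a weighted acceptor (input == output).
# State = previous word; final weight = probability of ending the sentence.
lm = WFST(
    start="<s>",
    finals={"a": 0.2, "b": 0.4},
    arcs=[
        ("<s>", "a", "a", 0.6, "a"), ("<s>", "b", "b", 0.4, "b"),
        ("a", "a", "a", 0.3, "a"), ("a", "b", "b", 0.5, "b"),
        ("b", "a", "a", 0.4, "a"), ("b", "b", "b", 0.2, "b"),
    ],
)

# Toy one-state channel model: how each word may be observed.
channel = WFST(
    start=0,
    finals={0: 1.0},
    arcs=[
        (0, "a", "x", 0.9, 0), (0, "a", "y", 0.1, 0),
        (0, "b", "y", 0.8, 0), (0, "b", "x", 0.2, 0),
    ],
)

pipeline = compose(lm, channel)
print(weight(lm, ["a", "b"], ["a", "b"]))        # P(a b) under the bigram model
print(weight(pipeline, ["a", "b"], ["x", "y"]))  # joint weight of "a b" observed as "x y"
```

Composing the bigram acceptor with the channel transducer yields a single machine that scores a word sequence and its observed form jointly. Here P(a b) = 0.6 * 0.5 * 0.4 = 0.12, and the joint weight of "a b" rendered as "x y" is 0.6 * 0.9 * 0.5 * 0.8 * 0.4 = 0.0864. Practical systems also need epsilon transitions, which this sketch omits, and would estimate the weights from a training corpus (typically with EM) rather than setting them by hand.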
Keywords
- Language Model
- Natural Language Processing
- Machine Translation
- Statistical Machine Translation
- Computational Linguistics
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Knight, K., May, J. (2009). Applications of Weighted Automata in Natural Language Processing. In: Droste, M., Kuich, W., Vogler, H. (eds) Handbook of Weighted Automata. Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01492-5_14
DOI: https://doi.org/10.1007/978-3-642-01492-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01491-8
Online ISBN: 978-3-642-01492-5
eBook Packages: Computer Science, Computer Science (R0)