Abstract
This paper discusses theoretical and practical problems that arise in developing a system that combines the structured but incomplete information in machine-readable dictionaries (MRDs) with the unstructured but more complete information available in corpora in order to create a bilingual lexical database. It presents a methodology for integrating information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) links entries in the Collins English-French and French-English bilingual dictionary with a large English-French and French-English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on lexical correspondences specific to this verb class between languages (Klavans and Tzoukermann, 1989; Klavans and Tzoukermann, 1990a; Klavans and Tzoukermann, 1990b). We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Atkins, Duval, and Milne, 1978) bilingual dictionary, and then analyze the behavior of some of these verbs in a large bilingual corpus. We incorporate the results of linguistic research on the theory of verb types to motivate corpus analysis coupled with data from MRDs, with the aim of establishing lexical correspondences that carry the full range of associated translations and have statistical data attached to the relevant nodes.
References
Atkins, Beryl T. 1987. Semantic ID tags: Corpus evidence for dictionary senses. In Proceedings of the Third Conference of the University of Waterloo. Centre for the New Oxford English Dictionary and Text Research: Electronic Text Research.
Atkins, Beryl T., Alain Duval, and Rosemary C. Milne. 1978. Collins Robert French Dictionary: French-English. English-French. Collins Publishers, London.
Atkins, Beryl T., Judith Kegl, and Beth Levin. 1988. Anatomy of a verb entry: from linguistic theory to lexicographic practice. International Journal of Lexicography, 1:84–126.
Boguraev, Branimir. 1991. Building a lexicon: The contribution of computers. International Journal of Lexicography, 4(3).
Boguraev, Branimir, Roy Byrd, Judith Klavans, and Mary Neff. 1989. From structural analysis of lexical resources to semantics in a lexical knowledge base. In First International Lexical Acquisition Workshop, Detroit, Michigan. International Joint Conference on Artificial Intelligence.
Brent, Michael R. 1993. From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19(2):243–262.
Brill, Eric. 1992. A simple rule-based part of speech tagger. In Third Conference on Applied Computational Linguistics, Trento, Italy.
Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, R. Mercer, and P. Roossin. 1988. A statistical approach to language translation. In Proceedings of the Twelfth International Conference on Computational Linguistics, Budapest, Hungary.
Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, R. Mercer, and P. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85.
Brown, P., J. Lai, and R. Mercer. 1991. Aligning sentences in parallel corpora. In Proceedings of the Twenty-ninth Annual Meeting of the Association for Computational Linguistics, pages 169–176, Berkeley, California.
Brown, P., S. Della Pietra, V. Della Pietra, M. Goldsmith, J. Hajic, R. Mercer, and S. Mohanty. 1993. But dictionaries are data too. In Proceedings of DARPA, Princeton, New Jersey.
Byrd, Roy, Nicoletta Calzolari, Martin Chodorow, Judith Klavans, Mary Neff, and Omneya Rizk. 1987. Tools and methods for computational lexicology. Computational Linguistics, 13(3):219–240.
Calzolari, Nicoletta and Remo Bindi. 1990. Acquisition of lexical information from a large textual Italian corpus. In Proceedings of the Thirteenth International Conference on Computational Linguistics, Helsinki, Finland.
Carter, Richard. 1988. On movement (written in 1984). In Beth Levin and Carol Tenny, editors, On Linking: Papers by Richard Carter, volume 25 of Lexicon Project Working Papers. MIT Press, pages 231–252.
Catizone, Robert, Graham Russell, and Susan Warwick. 1989. Deriving Translation Data from Bilingual Text. ISSCO, Geneva, Switzerland, unpublished manuscript.
Chodorow, Martin S., Roy J. Byrd, and George E. Heidorn. 1985. Extracting semantic hierarchies from a large on-line dictionary. In Proceedings of the Twenty-third Annual Meeting of the Association for Computational Linguistics, pages 299–304.
Church, Kenneth W. 1989. A stochastic parts program noun phrase parser for unrestricted text. In IEEE Proceedings of the ICASSP, pages 695–698, Glasgow.
Church, Kenneth W. 1993. Char_align: A program for aligning parallel texts at the character level. In Proceedings of the Thirty-first Annual Meeting of the Association for Computational Linguistics, pages 1–8.
Church, Kenneth W. and Patrick Hanks. 1990. Word association norms, mutual information and lexicography. Computational Linguistics, 16(1):22–29.
Church, Kenneth W., Patrick Hanks, D. Hindle, and W. Gale. 1991. Using statistics in lexical analysis. In Uri Zernik, editor, Lexical Acquisition: Using on-line Resources to Build a Lexicon. Lawrence Erlbaum.
Cruse, D. Alan. 1986. Lexical Semantics. Cambridge University Press, Cambridge, England.
DeRose, Stephen. 1988. Grammatical category disambiguation by statistical optimization. Computational Linguistics, 14(1):31–39.
Dorr, Bonnie J. 1992. The use of lexical semantics in interlingual machine translation. Machine Translation, 7(3):135–193.
Dowty, David. 1979. Word Meaning and Montague Grammar. Reidel, Dordrecht.
Gove, Philip B., editor. 1963. Webster's Seventh New Collegiate Dictionary. G. & C. Merriam Company, Springfield, Mass.
Grishman, Ralph and Richard Kittredge, editors. 1986. Analyzing language in restricted domains: Sublanguage description and processing. Lawrence Erlbaum.
Gruber, J. S. 1965. Studies in Lexical Relations. Ph.D. thesis, The Massachusetts Institute of Technology, Department of Linguistics, Cambridge, Massachusetts. Published in 1976 as Lexical Structures in Syntax and Semantics, North-Holland, Amsterdam.
Hale, Kenneth and Jay Keyser. 1986. Some Transitivity Alternations in English. Center for Cognitive Science, The Massachusetts Institute of Technology.
Jackendoff, Ray S. 1987. The status of thematic relations in linguistic theory. Linguistic Inquiry, 18(3):369–411.
Jackendoff, Ray S. 1990. Semantic Structures. MIT Press, Cambridge, MA.
Kay, Martin and Martin Röscheisen. 1993. Text translation alignment. Computational Linguistics, 19(1):75–102.
Klavans, Judith L. 1988. Complex: A computational lexicon for natural language systems. In Proceedings of the Twelfth International Conference on Computational Linguistics, Budapest, Hungary.
Klavans, Judith L., Martin Chodorow, and Nina Wacholder. 1990. From dictionary to knowledge base via taxonomy. In Proceedings of the Sixth Conference of the University of Waterloo. Centre for the New Oxford English Dictionary and Text Research: Electronic Text Research.
Klavans, Judith L. and Philip Resnik, editors. 1996. The Balancing Act: Combining Symbolic and Statistical Approaches to Language. MIT Press, Cambridge, Mass.
Klavans, Judith L. and Evelyne Tzoukermann. 1989. Corpus-based lexical acquisition for translation systems. In Proceedings of the Sixth Israeli Conference of Artificial Intelligence and Computer Vision, Tel Aviv, Israel.
Klavans, Judith L. and Evelyne Tzoukermann. 1990a. Linking bilingual corpora and machine readable dictionaries with the BICORD system. In Proceedings of the Sixth Conference of the University of Waterloo. Centre for the New Oxford English Dictionary and Text Research: Electronic Text Research.
Klavans, Judith L. and Evelyne Tzoukermann. 1990b. The BICORD system: Combining lexical information from bilingual corpora and machine readable dictionaries. In Proceedings of the Thirteenth International Conference on Computational Linguistics, Helsinki, Finland.
Kupiec, Julian. 1989. Augmenting a hidden Markov model for phrase-dependent word tagging. In Proceedings of the 1989 DARPA Speech and Natural Language Workshop, pages 92–98, San Mateo, California. Morgan Kaufmann.
Leech, Geoffrey, Roger Garside, and Erik Atwell. 1983. Automatic grammatical tagging of the LOB corpus. ICAME News, 7:13–33.
Levin, Beth and Malka Rappaport. 1988. On the nature of unaccusativity. In Proceedings of New England Linguistic Society.
Merialdo, Bernard. 1994. Tagging English text with a probabilistic model. Computational Linguistics, 20(2):155–172.
Neff, Mary and Bran Boguraev. 1989. Dictionaries, dictionary grammars and dictionary entry parsing. In Proceedings of the Twenty-seventh Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada.
Neff, Mary S., Roy J. Byrd, and Omneya A. Rizk. 1988. Creating and querying hierarchical lexical data bases. In Proceedings of the Second Applied Association for Computational Linguistics Conference, pages 84–92, Austin, Texas.
Pustejovsky, James, Sabine Bergler, and Peter Anick. 1993. Lexical semantic techniques for corpus analysis. Computational Linguistics, 19(2):331–358.
Rizk, Omneya. 1989. Sense Disambiguation of Word Translation in Bilingual Dictionaries: Trying to Solve the Mapping Problem Automatically. Unpublished M.A. thesis. Courant Institute of Mathematical Sciences, New York University, New York.
Sadler, Victor. 1989. The Bilingual Knowledge Bank: A New Conceptual Basis for MT. BSO/Research, unpublished manuscript, Utrecht.
Smadja, Frank, Kathleen McKeown, and Vasileios Hatzivassiloglou. In press. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics.
Talmy, Leonard. 1975. Semantics and syntax of motion. In J.P. Kimball, editor, Syntax and Semantics, volume 4. Academic Press, New York, NY, pages 181–238.
Talmy, Leonard. 1985. Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen, editor, Language Typology and Syntactic Description: Grammatical categories and the Lexicon. Cambridge University Press, Cambridge UK.
Tenny, Carol L. 1992. How Motion Verbs are Special. University of Pittsburgh, Department of Linguistics, unpublished manuscript.
Tenny, Carol L. 1994. Aspectual Roles and the Syntax-Semantics Interface. Kluwer Academic Publishers, Dordrecht.
Tzoukermann, Evelyne and Bernard Merialdo. 1989. Some Statistical Approaches for Tagging Unrestricted Text. IBM, T. J. Watson Research Center, Yorktown Heights, New York, unpublished manuscript.
Cite this article
Klavans, J., Tzoukermann, E. Combining corpus and machine-readable dictionary data for building bilingual lexicons. Mach Translat 10, 185–218 (1995). https://doi.org/10.1007/BF00981486