Abstract
Multiword expressions are a key problem for the development of large-scale, linguistically sound natural language processing technology. This paper surveys the problem and some currently available analytic techniques. The various kinds of multiword expressions should be analyzed in distinct ways, including listing “words with spaces”, hierarchically organized lexicons, restricted combinatoric rules, lexical selection, “idiomatic constructions” and simple statistical affinity. An adequate comprehensive analysis of multiword expressions must employ both symbolic and statistical techniques.
The research reported here was conducted in part under the auspices of the LinGO project, an international collaboration centered around the LKB system and related resources (see http://lingo.stanford.edu). This research was supported in part by the Research Collaboration between NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation and CSLI, Stanford University. We would like to thank Emily Bender and Tom Wasow for their contributions to our thinking. However, we alone are responsible for any errors that remain.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2