Abstract
Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, not all triples appear even in very large corpora. Several techniques have been used to address this sparsity. We evaluate two backoff methods for Spanish, one based on WordNet and the other on a distributional (automatically created) thesaurus. The thesaurus is built from the dependency triples found in the same corpus that is used to count the frequency of unambiguous triples. Both methods are trained on the same corpus, an encyclopaedia. The method based on the distributional thesaurus has higher coverage but lower precision than the WordNet method.
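The general idea of backing off from an unseen triple to triples with similar words can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper: the toy counts, the thesaurus entries, the function names `backoff_count` and `attach`, and the tie-handling rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy counts of unambiguous (head, preposition, noun) triples.
TRIPLE_COUNTS = Counter({
    ("comer", "con", "cuchara"): 3,
    ("sopa", "de", "pollo"): 4,
})

# Hypothetical thesaurus: word -> similar words, most similar first.
SIMILAR = {
    "tenedor": ["cuchara", "cuchillo"],
}


def backoff_count(head, prep, noun, counts, similar):
    """Count of (head, prep, noun); if unseen, try similar nouns in order."""
    direct = counts[(head, prep, noun)]
    if direct > 0:
        return direct
    for candidate in similar.get(noun, []):
        backed_off = counts[(head, prep, candidate)]
        if backed_off > 0:
            return backed_off
    return 0


def attach(verb, noun1, prep, noun2, counts, similar):
    """Return 'verb', 'noun', or None when the counts give no decision."""
    verb_score = backoff_count(verb, prep, noun2, counts, similar)
    noun_score = backoff_count(noun1, prep, noun2, counts, similar)
    if verb_score == noun_score:
        return None  # undecided cases reduce coverage
    return "verb" if verb_score > noun_score else "noun"


if __name__ == "__main__":
    # "comer sopa con tenedor": the triple with "tenedor" is unseen,
    # so the sketch backs off to the similar word "cuchara".
    print(attach("comer", "sopa", "con", "tenedor", TRIPLE_COUNTS, SIMILAR))
```

In this sketch the thesaurus lookup is the only place where the two backoff resources would differ: a WordNet-based variant would supply the candidate list from the hierarchy, while a distributional variant would supply automatically ranked neighbours.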
Work done under partial support of the Mexican Government (CONACyT, SNI, PIFI-IPN, CGEPI-IPN) and RITOS-2.
Keywords
- Natural Language Processing
- Similar Word
- Computational Linguistics
- Prepositional Phrase
- Backoff Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Calvo, H., Gelbukh, A., Kilgarriff, A. (2005). Distributional Thesaurus Versus WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_17
DOI: https://doi.org/10.1007/978-3-540-30586-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)