
Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities

  • Conference paper
Text, Speech and Dialogue (TSD 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)


Abstract

Previous approaches to the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theoretical perspective of contextual lexical semantics suggests that larger segments of text, specifically non-compositional multiwords, are more appropriate for this role. We tested the applicability of this notion experimentally, applying automatic collocation extraction to identify and merge such multiwords prior to the similarity estimation process. Using an automatic comparative evaluation scheme, we confirm an improvement in the extracted lexico-semantic knowledge.
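The preprocessing step described in the abstract (find collocations, then merge them into single tokens before estimating similarity) can be illustrated with a minimal sketch. The abstract does not say which collocation metric, frequency cutoff, or threshold was used, so everything here is an assumption: pointwise mutual information stands in as the scoring measure, the toy corpus and the values `min_count=3` and `3.0` are arbitrary, and the function names `score_bigrams` and `merge_multiwords` are hypothetical.

```python
import math
from collections import Counter


def score_bigrams(sentences, min_count=3):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI is an illustrative stand-in; the abstract does not name a metric.
    Pairs seen fewer than min_count times are skipped, since PMI is
    unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def merge_multiwords(sentence, multiwords, sep="_"):
    """Greedily merge adjacent pairs found in `multiwords`, left to right."""
    merged = []
    i = 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in multiwords:
            merged.append(sentence[i] + sep + sentence[i + 1])
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged


# Toy corpus, invented for illustration only.
corpus = [
    "the stock market fell in new york".split(),
    "new york is a large city".split(),
    "the market in the city is large".split(),
    "prices fell in new york".split(),
]

scores = score_bigrams(corpus, min_count=3)
# Arbitrary threshold; a real system would tune this on held-out data.
multiwords = {pair for pair, score in scores.items() if score > 3.0}
merged_corpus = [merge_multiwords(sent, multiwords) for sent in corpus]
print(merged_corpus[0])
```

After merging, a downstream similarity estimator would see `new_york` as one unit with its own context distribution, rather than attributing those contexts to `new` and `york` separately.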

The presented work is supported by the EC project GEMINI (IST-2001-32343).




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thanopoulos, A., Fakotakis, N., Kokkinakis, G. (2003). Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_14


  • DOI: https://doi.org/10.1007/978-3-540-39398-6_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20024-6

  • Online ISBN: 978-3-540-39398-6

  • eBook Packages: Springer Book Archive
