Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities

Thanopoulos, Aristomenis; Fakotakis, Nikos; Kokkinakis, George

doi:10.1007/978-3-540-39398-6_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Previous approaches to the automatic extraction of lexical similarities have treated the word as the semantic unit of text. However, the theoretical perspective of contextual lexical semantics suggests that larger text segments, specifically non-compositional multiwords, are better suited to this role. We tested this notion experimentally by applying automatic collocation extraction to identify and merge such multiwords before the similarity estimation process. Using an automatic comparative evaluation scheme, we observe an improvement in the extracted lexico-semantic knowledge.
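
The preprocessing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the association metric, the thresholds, or the tokenization, so pointwise mutual information (PMI), the `min_count` cutoff, the underscore joiner, and the function names `score_bigrams` and `merge_multiwords` are all assumptions chosen for illustration. The sketch scores adjacent word pairs and then rewrites each sentence so that selected pairs become single tokens, which a later similarity estimator would treat as one unit.

```python
import math
from collections import Counter


def score_bigrams(sentences, min_count=2):
    """Score adjacent word pairs with PMI (an assumed metric, not the paper's)."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # skip rare pairs, whose PMI is unreliable
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def merge_multiwords(sentence, multiwords, joiner="_"):
    """Greedily merge selected bigrams, left to right, into single tokens."""
    merged, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in multiwords:
            merged.append(sentence[i] + joiner + sentence[i + 1])
            i += 2  # both words were consumed by the merge
        else:
            merged.append(sentence[i])
            i += 1
    return merged


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["he", "lives", "in", "new", "york"],
    ["new", "york", "is", "large"],
    ["she", "bought", "a", "new", "car"],
]
scores = score_bigrams(corpus, min_count=2)
multiwords = {pair for pair, s in scores.items() if s > 0}
merged_corpus = [merge_multiwords(s, multiwords) for s in corpus]
```

On this toy corpus only the pair ("new", "york") occurs often enough to be scored, so it is the only candidate for merging. A real system would need a much larger corpus and a principled cutoff, and context-based similarity would then be estimated over the merged tokens rather than the individual words.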

The presented work is supported by the EC project GEMINI (IST-2001-32343).

Author information

Authors and Affiliations

Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500, Rion, Patras, Greece
Aristomenis Thanopoulos, Nikos Fakotakis & George Kokkinakis

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

About this paper

Cite this paper

Thanopoulos, A., Fakotakis, N., Kokkinakis, G. (2003). Identification of Multiwords as Preprocessing for Automatic Extraction of Lexical Similarities. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_14

DOI: https://doi.org/10.1007/978-3-540-39398-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive
