Automatic Discovery of Similar Words

Senellart, Pierre P.; Blondel, Vincent D.

doi:10.1007/978-1-4757-4305-0_2

Abstract

We address the automatic discovery of similar words (synonyms and near-synonyms) from different kinds of sources: large corpora of documents, the Web, and monolingual dictionaries. We present in detail three algorithms that extract similar words from a large corpus of documents and consider the specific case of the World Wide Web. We then describe a recent method for automatic synonym extraction from a monolingual dictionary. The method is based on an algorithm that computes similarity measures between vertices in graphs. We use the 1913 Webster’s Dictionary and apply the method to four synonym queries. The results are analyzed and compared with those obtained by two other methods.
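
The abstract describes the dictionary method only as a computation of similarity between graph vertices and gives no formulas. As a rough illustration, the sketch below assumes one formulation of such a measure: a dictionary graph with an edge from each headword to every word in its definition, and a "central score" obtained by power iteration on M Mᵀ + Mᵀ M, where M is the adjacency matrix. Words that share both parents and children then score alike. The function name `central_scores` and the toy entries are hypothetical and are not taken from the chapter.

```python
import math


def central_scores(definitions, iterations=100):
    """Approximate the dominant eigenvector of M M^T + M^T M by power iteration.

    `definitions` maps a headword to the words used in its definition,
    so the graph has an edge headword -> word (an assumed encoding).
    """
    words = sorted(set(definitions) | {w for body in definitions.values() for w in body})
    children = {w: set(definitions.get(w, ())) for w in words}
    parents = {w: set() for w in words}
    for head, body in children.items():
        for w in body:
            parents[w].add(head)

    x = {w: 1.0 for w in words}
    for _ in range(iterations):
        # down = M x: each word collects the scores of the words it points to.
        down = {w: sum(x[c] for c in children[w]) for w in words}
        # up = M^T x: each word collects the scores of the words pointing to it.
        up = {w: sum(x[p] for p in parents[w]) for w in words}
        # New score = M up + M^T down, i.e. (M M^T + M^T M) x.
        new = {
            w: sum(up[c] for c in children[w]) + sum(down[p] for p in parents[w])
            for w in words
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        if norm == 0:  # graph without edges: nothing to rank
            break
        x = {w: v / norm for w, v in new.items()}
    return x


# Hypothetical toy entries, invented for illustration only.
TOY_DEFINITIONS = {
    "plausible": ["probable", "likely"],
    "credible": ["probable", "likely"],
    "probable": ["truth", "belief"],
    "likely": ["truth", "belief"],
}

if __name__ == "__main__":
    scores = central_scores(TOY_DEFINITIONS)
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word:10s} {score:.3f}")
```

In this toy graph "probable" and "likely" have the same parents and the same children, so they end up with equal scores near 0.707 while the other words fall toward zero. A real application would presumably restrict the graph to the neighborhood of the query word before ranking; that step is omitted here.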

Editor information

Editors and Affiliations

Michael W. Berry, Department of Computer Science, University of Tennessee, 203 Claxton Complex, Knoxville, TN 37996-3450, USA

About this chapter

Cite this chapter

Senellart, P.P., Blondel, V.D. (2004). Automatic Discovery of Similar Words. In: Berry, M.W. (ed.) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_2

DOI: https://doi.org/10.1007/978-1-4757-4305-0_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-3057-6
Online ISBN: 978-1-4757-4305-0
eBook Packages: Springer Book Archive
