Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification

Mavroeidis, Dimitrios; Tsatsaronis, George; Vazirgiannis, Michalis; Theobald, Martin; Weikum, Gerhard

doi:10.1007/11564126_21


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3721)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery


Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving the performance of text classification by extending the traditional “bag of words” representation to incorporate syntactic and semantic relationships among words. In this paper, we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision our WSD algorithm exhibits on various human-disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of Generalized Vector Space Model (GVSM) kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora and achieve a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
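The abstract compresses two mechanisms into a sentence each: disambiguation driven by proximity in the thesaurus graph, and a GVSM-style semantic kernel. The Python sketch below illustrates both under simplifying assumptions; it is not the authors' algorithm. A tiny invented hierarchy (`PARENT`) and lexicon (`SENSES`) stand in for a thesaurus such as WordNet. "Proximity" is assumed to mean the sum of pairwise tree distances among the chosen senses, minimized by brute force over a short word window (the chapter may use a different compactness measure). For the kernel, each sense is assumed to map to the 0/1 indicator vector of its ancestors, so that K(d1, d2) = (Dᵀd1)·(Dᵀd2) is positive semidefinite by construction. All names (`PARENT`, `SENSES`, `disambiguate`, `semantic_kernel`) are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy hierarchy (child -> parent) standing in for a thesaurus
# such as WordNet's noun hypernym tree. All node names are invented.
PARENT = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "mammal": "animal",
    "bat.animal": "mammal",
    "dog.animal": "mammal",
    "artifact": "entity",
    "implement": "artifact",
    "bat.club": "implement",
    "club.stick": "implement",
    "organization": "entity",
    "club.society": "organization",
}

# Hypothetical lexicon: candidate senses for each surface word.
SENSES = {
    "bat": ["bat.animal", "bat.club"],
    "dog": ["dog.animal"],
    "club": ["club.stick", "club.society"],
}


def ancestors(node):
    """Path from node up to the root, node itself included."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Edge count between a and b through their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for j, n in enumerate(ancestors(b)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("nodes do not share a root")


def disambiguate(window):
    """Choose one sense per word so that the chosen senses are as close as
    possible in the hierarchy (minimum sum of pairwise tree distances).
    Exhaustive search, so only practical for short windows (an assumption)."""
    best, best_cost = None, float("inf")
    for combo in product(*(SENSES[w] for w in window)):
        cost = sum(tree_distance(a, b)
                   for i, a in enumerate(combo) for b in combo[i + 1:])
        if cost < best_cost:
            best, best_cost = combo, cost
    return dict(zip(window, best))


def expand(sense_counts):
    """GVSM-style mapping phi(d) = D^T d: row s of D is the 0/1 indicator
    of the ancestors of sense s, so every sense votes for its ancestors."""
    phi = Counter()
    for sense, count in sense_counts.items():
        for node in ancestors(sense):
            phi[node] += count
    return phi


def semantic_kernel(d1, d2):
    """K(d1, d2) = phi(d1) . phi(d2), positive semidefinite by construction."""
    p1, p2 = expand(d1), expand(d2)
    return sum(v * p2[k] for k, v in p1.items())


if __name__ == "__main__":
    print(disambiguate(["bat", "dog"]))   # {'bat': 'bat.animal', 'dog': 'dog.animal'}
    print(disambiguate(["bat", "club"]))  # {'bat': 'bat.club', 'club': 'club.stick'}
    # Disjoint senses under a plain bag-of-senses dot product would score 0.
    print(semantic_kernel({"bat.animal": 1}, {"dog.animal": 1}))    # 4
    print(semantic_kernel({"bat.animal": 1}, {"club.society": 1}))  # 1
```

The context word decides the sense of "bat": next to "dog" it resolves to the animal, next to "club" to the implement. In the kernel, `bat.animal` and `dog.animal` share four ancestors and so score 4, even though a plain bag-of-senses dot product would give 0. To train an SVM on such a kernel, one option is to fill a Gram matrix with `semantic_kernel` values and pass it to a classifier that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.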


Similar content being viewed by others

Kernel methods for word sense disambiguation

Article 30 December 2015

WSD-TIC: Word Sense Disambiguation Using Taxonomic Information Content

WordNet and Wiktionary-Based Approach for Word Sense Disambiguation


Author information

Authors and Affiliations

Department of Informatics, Athens University of Economics and Business, Greece
Dimitrios Mavroeidis, George Tsatsaronis & Michalis Vazirgiannis
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Martin Theobald & Gerhard Weikum


Editor information

Editors and Affiliations

LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6, 4050-190, Porto, Portugal
Luís Torgo
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel Brazdil
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
Faculty of Economics of the University of Porto, Portugal
João Gama

About this paper

Cite this paper

Mavroeidis, D., Tsatsaronis, G., Vazirgiannis, M., Theobald, M., Weikum, G. (2005). Word Sense Disambiguation for Exploiting Hierarchical Thesauri in Text Classification. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_21

DOI: https://doi.org/10.1007/11564126_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science, Computer Science (R0)
