An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval

Sorg, Philipp; Cimiano, Philipp

doi:10.1007/978-3-642-12550-8_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5723)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Explicit Semantic Analysis (ESA) was recently proposed as an approach to computing semantic relatedness between words (and, indirectly, between texts). It therefore has a natural application in information retrieval, where it has the potential to alleviate the vocabulary mismatch problem inherent in standard bag-of-words models. ESA has also recently been extended to cross-lingual retrieval settings, which can be considered an extreme case of the vocabulary mismatch problem. ESA in fact represents a class of approaches and allows for various instantiations. As our first contribution, we generalize ESA in order to show clearly the degrees of freedom it provides. Second, we propose variants of ESA along different dimensions and test their impact on performance in a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). These results matter because a systematic investigation has been missing so far and because the differences between basic design choices turn out to be significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated before, but that they can still be improved significantly by tuning the right parameters: the relative improvement on the cross-lingual mate retrieval task over the original ESA model ranges from 62% (Multext) to 237% (JRC-ACQUIS).
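To make the setting concrete, the following is a minimal sketch of ESA-style cross-lingual mate retrieval. It is not the authors' implementation. The toy concept lists, the tf-idf-style association score, the cosine ranking, and all names (`CONCEPTS_EN`, `CONCEPTS_DE`, `esa_vector`, `mate_retrieval_top1`) are assumptions chosen for illustration. The choice of association score and of similarity measure are examples of the kind of design decision that an ESA instantiation has to fix.

```python
import math
from collections import Counter

# Hypothetical stand-ins for aligned concept articles (e.g. Wikipedia pages
# linked across languages). Index i describes the same concept in both lists.
CONCEPTS_EN = [
    "fish water sea boat net",
    "court law judge trial",
    "bank money loan interest",
]
CONCEPTS_DE = [
    "fisch wasser meer boot netz",
    "gericht gesetz richter prozess",
    "bank geld kredit zins",
]

# Toy parallel documents: DOCS_DE[i] is the "mate" (translation) of DOCS_EN[i].
DOCS_EN = [
    "the judge opened the trial",
    "the boat went out to sea",
    "the loan carries interest",
]
DOCS_DE = [
    "der richter eröffnete den prozess",
    "das boot fuhr auf das meer",
    "der kredit hat einen zins",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def esa_vector(text, concepts):
    """Map a text to one association score per concept.

    The score is a simple tf-idf-style sum; this weighting is an assumption
    of the sketch, and other association functions could be swapped in.
    """
    concept_counts = [Counter(tokenize(c)) for c in concepts]
    n = len(concepts)
    vector = []
    for counts in concept_counts:
        score = 0.0
        for token in tokenize(text):
            tf = counts[token]  # Counter returns 0 for unseen tokens
            if tf:
                df = sum(1 for c in concept_counts if c[token])
                score += tf * math.log(1 + n / df)
        vector.append(score)
    return vector


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mate_retrieval_top1(docs_src, docs_tgt, concepts_src, concepts_tgt):
    """Fraction of source documents whose best-ranked target is their mate."""
    tgt_vectors = [esa_vector(d, concepts_tgt) for d in docs_tgt]
    hits = 0
    for i, doc in enumerate(docs_src):
        query = esa_vector(doc, concepts_src)
        best = max(range(len(tgt_vectors)),
                   key=lambda j: cosine(query, tgt_vectors[j]))
        hits += (best == i)
    return hits / len(docs_src)
```

Because both languages are mapped into the same concept space, documents can be compared without translation. Calling `mate_retrieval_top1(DOCS_EN, DOCS_DE, CONCEPTS_EN, CONCEPTS_DE)` scores how often each English document ranks its own German counterpart first.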

Author information

Authors and Affiliations

Philipp Sorg: Institute AIFB, University of Karlsruhe
Philipp Cimiano: Web Information Systems Group, Delft University of Technology

Editor information

Editors and Affiliations

Institut für Computertechnologie, Technische Universität Wien, A-1040, Wien, Austria
Helmut Horacek
CNAM- Laboratoire Cédric, 292 Rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Campus de San Vicente del Raspeig, Apdo 99, 03080, Alicante, Spain
Rafael Muñoz
Dept. of Computational Linguistics, Saarland University, Germany
Magdalena Wolska

About this paper

Cite this paper

Sorg, P., Cimiano, P. (2010). An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval. In: Horacek, H., Métais, E., Muñoz, R., Wolska, M. (eds) Natural Language Processing and Information Systems. NLDB 2009. Lecture Notes in Computer Science, vol 5723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12550-8_4

DOI: https://doi.org/10.1007/978-3-642-12550-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12549-2
Online ISBN: 978-3-642-12550-8
eBook Packages: Computer Science, Computer Science (R0)
