Cross-Lingual Information Retrieval System for Indian Languages

Jagarlamudi, Jagadeesh; Kumaran, A.

doi:10.1007/978-3-540-85760-0_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5152)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

This paper describes our attempt to build a Cross-Lingual Information Retrieval (CLIR) system as part of the Indian language sub-task of the main Adhoc monolingual and bilingual track in the CLEF competition. In this track, the task required retrieval of relevant documents from an English corpus in response to a query expressed in one of several Indian languages, including Hindi, Tamil, Telugu, Bengali and Marathi. Groups participating in this track were required to submit an English-to-English monolingual run and a Hindi-to-English bilingual run, with optional runs in the rest of the languages. Our submission consisted of a monolingual English run and a Hindi-to-English cross-lingual run.

We used a word alignment table, learnt by a Statistical Machine Translation (SMT) system trained on aligned parallel sentences, to map a query in the source language into an equivalent query in the language of the document collection. The relevant documents are then retrieved using a language-modeling-based retrieval algorithm. On the CLEF 2007 data set, our official cross-lingual performance was 54.4% of the monolingual performance, and in post-submission experiments we found that it can be improved significantly, up to 76.3%.
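
The two steps described above (query mapping through a word alignment table, followed by language-model retrieval) can be illustrated with a minimal sketch. It is not the authors' implementation: the alignment probabilities, the romanized Hindi keys, the `top_k` cutoff, the renormalisation step, and the choice of Jelinek-Mercer smoothing with `lam=0.5` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical alignment table P(english_word | hindi_word).
# Keys are romanized for readability; all probabilities are made up.
ALIGNMENT_TABLE = {
    "bharat": {"india": 0.8, "indian": 0.2},
    "chunav": {"election": 0.7, "poll": 0.2, "vote": 0.1},
}


def translate_query(source_terms, table, top_k=2):
    """Map each source term to its top_k target words, weighted by alignment probability."""
    weighted = Counter()
    for term in source_terms:
        candidates = table.get(term, {})
        best = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        total = sum(p for _, p in best)
        for word, p in best:
            # Renormalise over the translations that were kept (an assumption).
            weighted[word] += p / total
    return dict(weighted)


def query_log_likelihood(weighted_query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """Weighted query log-likelihood under a Jelinek-Mercer smoothed unigram model."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word, weight in weighted_query.items():
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_coll = coll_counts[word] / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p > 0:
            score += weight * math.log(p)
        # Words absent from the whole collection are skipped for every document.
    return score


def rank(source_terms, docs, table, top_k=2):
    """Translate the query, then rank documents by smoothed query likelihood."""
    query = translate_query(source_terms, table, top_k)
    coll_counts = Counter(tok for toks in docs.values() for tok in toks)
    coll_len = sum(coll_counts.values())
    scored = [
        (doc_id, query_log_likelihood(query, toks, coll_counts, coll_len))
        for doc_id, toks in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    toy_docs = {
        "d1": ["india", "election", "results"],
        "d2": ["cricket", "match", "india"],
        "d3": ["weather", "report", "today"],
    }
    for doc_id, score in rank(["bharat", "chunav"], toy_docs, ALIGNMENT_TABLE):
        print(doc_id, round(score, 3))
```

Keeping several weighted translations per source word, rather than only the single best one, is one common way to hedge against alignment errors; whether the submitted system did this is not stated in the abstract.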

Author information

Authors and Affiliations

Multilingual Systems Research, Microsoft Research India, Bangalore, India
Jagadeesh Jagarlamudi & A. Kumaran

Editor information

Carol Peters, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, Diana Santos

About this paper

Cite this paper

Jagarlamudi, J., Kumaran, A. (2008). Cross-Lingual Information Retrieval System for Indian Languages. In: Peters, C., et al. Advances in Multilingual and Multimodal Information Retrieval. CLEF 2007. Lecture Notes in Computer Science, vol 5152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85760-0_10

DOI: https://doi.org/10.1007/978-3-540-85760-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85759-4
Online ISBN: 978-3-540-85760-0
eBook Packages: Computer Science, Computer Science (R0)
