Abstract
This paper describes our participation in the cross-lingual information retrieval track of the FIRE 2010 evaluation campaign. We describe how cross-lingual information retrieval can be performed effectively between Tamil, a highly agglutinative language, and English, an isolating language. Agglutination is a morphological process in which affixes are added to a word base. These combinations can be noun-noun, adjective-noun, noun-case, and so on. This property of the language causes serious problems for translating, transliterating and expanding a query into another language. To overcome these problems, we use a morphological analyzer that returns the root word, or word base, of each query term. The word base is then used for translation, transliteration and query expansion. Query translation uses a bilingual dictionary, transliteration uses a statistical method, and query expansion uses an ontology and WordNet.
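The flow described above (word base first, then translation or transliteration, then expansion) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the system used in the paper: every function name and table here is hypothetical, the romanized entries are toy data chosen for illustration, the lookup table stands in for a real morphological analyzer, and the `transliterate` placeholder does not implement the statistical method mentioned in the abstract.

```python
from typing import Dict, List

# Toy stand-in for a morphological analyzer: inflected form -> word base.
# A real analyzer would segment affixes; this table is an assumption.
TOY_ANALYSES: Dict[str, str] = {
    "viitukal": "viitu",      # plural form -> word base
    "chennaiyil": "chennai",  # locative form -> word base
}

# Toy bilingual dictionary and synonym table (stand-ins for the
# bilingual dictionary and the ontology/WordNet resources).
TOY_DICTIONARY: Dict[str, List[str]] = {"viitu": ["house"]}
TOY_SYNONYMS: Dict[str, List[str]] = {"house": ["home", "residence"]}


def word_base(token: str) -> str:
    """Return the word base, falling back to the token itself."""
    return TOY_ANALYSES.get(token, token)


def transliterate(base: str) -> str:
    """Placeholder transliteration: capitalize the romanized base."""
    return base.capitalize()


def translate_or_transliterate(base: str) -> List[str]:
    """Use the dictionary when it has an entry, else transliterate."""
    return TOY_DICTIONARY.get(base) or [transliterate(base)]


def expand(terms: List[str]) -> List[str]:
    """Append synonyms after each term, keeping order, without duplicates."""
    expanded: List[str] = []
    for term in terms:
        for candidate in [term] + TOY_SYNONYMS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_english_query(tamil_tokens: List[str]) -> List[str]:
    """Reduce each token to its word base, map it to English, then expand."""
    english: List[str] = []
    for token in tamil_tokens:
        english.extend(translate_or_transliterate(word_base(token)))
    return expand(english)


if __name__ == "__main__":
    print(build_english_query(["chennaiyil", "viitukal"]))
```

The point of the sketch is the ordering: because dictionary lookup and transliteration operate on the word base rather than the inflected surface form, an inflected query term can still match a dictionary entry or be transliterated as a name.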
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rao, T.P.R.K., Lalitha Devi, S. (2013). Tamil English Cross Lingual Information Retrieval. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_26
DOI: https://doi.org/10.1007/978-3-642-40087-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40086-5
Online ISBN: 978-3-642-40087-2
eBook Packages: Computer Science, Computer Science (R0)