Abstract
This chapter describes the lessons learnt from the ad hoc track at CLEF in the years 2000 to 2009. This contribution focuses on Information Retrieval (IR) for languages other than English (monolingual IR), as well as bilingual IR (also termed “cross-lingual”; the request is written in one language and the searched collection in another), and multilingual IR (the information items are written in many different languages). During these years the ad hoc track mainly used newspaper test collections, covering more than 15 languages. We have designed, implemented and evaluated IR tools for all these languages during those CLEF campaigns. Based on our own experience and the lessons reported by other participants in these years, we are able to describe the most important challenges when designing an IR system for a new language. When dealing with bilingual IR, our experiments indicate that the critical point is the translation process. However, current online translation systems tend to offer rather effective translation from one language to another, especially when one of these languages is English. To address multilingual IR, different IR architectures are possible. For the simplest approach, based on query translation for individual language pairs, the crucial component is the merging of the intermediate bilingual results. When both document and query translation are considered, the complexity of the whole system is clearly a main issue.
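To make the merging step concrete, the following is a minimal sketch of one common way to combine per-language result lists: each list is min-max normalized so that its scores fall in [0, 1], and the lists are then interleaved by normalized score. This is an illustration under stated assumptions, not necessarily the strategy evaluated in the chapter; the names `min_max_normalize` and `merge_runs`, the language codes and the sample scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# A ranked list for one language: (document id, retrieval score), best first.
Ranked = List[Tuple[str, float]]


def min_max_normalize(run: Ranked) -> Ranked:
    """Rescale the scores of one result list to the range [0, 1]."""
    if not run:
        return []
    scores = [score for _, score in run]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal: treat every document as equally good.
        return [(doc_id, 1.0) for doc_id, _ in run]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in run]


def merge_runs(runs: Dict[str, Ranked], k: int = 10) -> List[Tuple[str, str, float]]:
    """Merge per-language result lists into a single ranked list.

    Returns up to k tuples (language, document id, normalized score),
    sorted by decreasing normalized score. Ties are broken by language
    code and then document id so that the output is deterministic.
    """
    merged: List[Tuple[str, str, float]] = []
    for lang, run in runs.items():
        for doc_id, score in min_max_normalize(run):
            merged.append((lang, doc_id, score))
    merged.sort(key=lambda item: (-item[2], item[0], item[1]))
    return merged[:k]


# Hypothetical bilingual results for one query translated into two languages.
runs = {
    "de": [("d1", 12.0), ("d2", 6.0), ("d3", 0.0)],
    "fr": [("f1", 0.9), ("f2", 0.5)],
}
top = merge_runs(runs, k=3)
```

Raw scores from different collections are generally not comparable, which is why some form of normalization is applied before interleaving. Min-max scaling is only one option among several, and it is sensitive to outlying scores at either end of a list.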
Acknowledgement
The authors would like to thank the CLEF organizers for their efforts in developing the CLEF test collections.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Savoy, J., Braschler, M. (2019). Lessons Learnt from Experiments on the Ad Hoc Multilingual Test Collections at CLEF. In: Ferro, N., Peters, C. (eds) Information Retrieval Evaluation in a Changing World. The Information Retrieval Series, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-22948-1_7
DOI: https://doi.org/10.1007/978-3-030-22948-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22947-4
Online ISBN: 978-3-030-22948-1
eBook Packages: Computer Science, Computer Science (R0)