Abstract
The translation stage in cross language information retrieval (CLIR) acts as the main enabling stage to cross the language barrier between documents and queries. In recent years machine translation (MT) systems have become the dominant approach to translation in CLIR. However, unlike information retrieval (IR), MT focuses on the morphological and syntactical quality of the sentence. This requires large training resources and high computational power for training and translation. We present a novel technique for MT designed specifically for CLIR. In this method IR text pre-processing in the form of stop word removal and stemming are applied to the MT training corpus prior to the training phase. Applying this pre-processing step is found to significantly speed up the translation process without affecting the retrieval quality.
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (CNGL) project at Dublin City University.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E.: Moses: Open Source Toolkit for Statistical Machine Translation. In: ACL 2007 (2007)
Magdy, W., Jones, G.J.F.: PRES: a score metric for evaluating recall-oriented information retrieval applications. In: SIGIR 2010 (2010)
Magdy, W., Jones, G.J.F.: Applying the KISS Principle for the CLEF-IP 2010 Prior Art Candidate Patent Search Task. In: CLEF 2010 (2010)
Oard, D.W., Diekema, A.R.: Cross-Language Information Retrieval. In: Williams, M. (ed.) ARIST (1998)
Piroi, F.: CLEF-IP 2010: Retrieval Experiments in the Intellectual Property Domain. In: CLEF 2010 (2010)
Strohman, T., Metzler, D., Turtle, H., Croft, W.B.: Indri: A language model-based search engine for complex queries. In: ICIA (2004)
Stroppa, N., Way, A.: MaTrEx: DCU Machine Translation System for IWSLT 2006. In: IWSLT (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Magdy, W., Jones, G.J.F. (2011). Should MT Systems Be Used as Black Boxes in CLIR?. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_70
Download citation
DOI: https://doi.org/10.1007/978-3-642-20161-5_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer ScienceComputer Science (R0)