Abstract
Cross-language text retrieval systems that rely on bilingual dictionaries to bridge the gap between the source query language and the target document language require good dictionary coverage. For terms with missing translations, most systems expand their existing translation dictionaries. In this paper, instead of lexicon expansion, we explore whether the context of the unknown terms can help mitigate the loss of meaning caused by missing translations. Our approach consists of two steps: (1) identify terms that are closely associated with the unknown source-language terms and collect them as context vectors, and (2) use the translations of the associated terms in the context vectors as surrogate translations of the unknown terms. We describe a query-independent version and a query-dependent version of the method, both using such monolingual context vectors. These methods are evaluated in Japanese-to-English retrieval using the NTCIR-3 topics and data sets. Empirical results show that both methods improved cross-language information retrieval (CLIR) performance for short and medium-length queries and that the query-dependent context vectors performed better than the query-independent version.
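The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the association measure, the vector size, or how the query-dependent version is defined, so the sketch assumes raw document-level co-occurrence counts, a hypothetical `top_k` cutoff, and one possible reading of query dependence (counting only documents that also contain another query term). The function names, the `dictionary` mapping, and the toy tokens are all hypothetical.

```python
from collections import Counter


def context_vector(term, corpus, top_k=3, query_terms=None):
    """Step 1 (assumed form): rank the terms that co-occur with `term`.

    corpus      -- list of tokenized source-language documents (lists of str)
    query_terms -- if given, only documents that also contain at least one of
                   these terms are counted (an assumed query-dependent variant)
    Returns a list of (associated_term, count) pairs, strongest first.
    """
    counts = Counter()
    for doc in corpus:
        tokens = set(doc)  # count each document at most once per term
        if term not in tokens:
            continue
        if query_terms is not None and not (tokens & set(query_terms)):
            continue
        counts.update(t for t in tokens if t != term)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def surrogate_translations(term, corpus, dictionary, top_k=3, query_terms=None):
    """Step 2 (assumed form): translate the associated terms instead of `term`.

    dictionary -- hypothetical bilingual lexicon: source term -> list of
                  target-language translations
    Returns a dict mapping each target term to an accumulated weight.
    """
    weights = {}
    for assoc, count in context_vector(term, corpus, top_k, query_terms):
        for target in dictionary.get(assoc, []):  # skip terms with no entry
            weights[target] = weights.get(target, 0) + count
    return weights


if __name__ == "__main__":
    # Toy data: "X" stands for a source term that is missing from the lexicon.
    corpus = [["X", "a", "b"], ["X", "a", "c"], ["a", "d"]]
    dictionary = {"a": ["alpha"], "b": ["beta"]}
    print(surrogate_translations("X", corpus, dictionary, top_k=2))
    print(surrogate_translations("X", corpus, dictionary, top_k=2,
                                 query_terms=["c"]))
```

In this sketch the weighted target terms returned by `surrogate_translations` would stand in for the untranslatable term in the target-language query; how the paper actually weights or combines them is not stated in the abstract.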
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qu, Y., Grefenstette, G., Evans, D.A. (2005). The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_3
DOI: https://doi.org/10.1007/11562214_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)