N-Grams for Translation and Retrieval in CL-SDR

McNamee, Paul; Mayfield, James

doi:10.1007/978-3-540-30222-3_63

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3237)

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

We report on a first attempt to perform cross-language spoken document retrieval. Lacking prior experience in monolingual speech retrieval, we applied the same general approach we use for bilingual retrieval, which is typified by overlapping character n-grams for tokenization and a statistical language model of retrieval. We adopted an innovative approach to coping with out-of-vocabulary words and misspelled or mistranscribed words: direct translation of individual n-grams was the sole mechanism for translating source-language queries into target-language terms. Though this approach shows promise, especially for non-speech retrieval, our performance appears to lag that of other teams participating in this novel evaluation.
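
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The n-gram length of 4, the padding with spaces, the dictionary-style translation table, and the linearly interpolated unigram language model with weight alpha are all assumptions made for illustration, since the abstract does not give these details. The function names char_ngrams, translate_ngrams, and lm_score are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Return overlapping character n-grams of `text` (n=4 is an assumed default)."""
    # Lowercase, collapse whitespace, and pad with one space on each side so
    # that word-initial and word-final n-grams are produced. N-grams may span
    # word boundaries.
    normalized = " " + " ".join(text.lower().split()) + " "
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def translate_ngrams(ngrams, table):
    """Translate source n-grams one at a time with a lookup table.

    `table` is a hypothetical source-to-target n-gram mapping; n-grams that
    are missing from the table are dropped.
    """
    return [table[g] for g in ngrams if g in table]


def lm_score(query_terms, doc_terms, collection_counts, collection_size, alpha=0.5):
    """Log-probability of the query under a smoothed unigram document model.

    Assumed form: P(t | d) = alpha * P_doc(t) + (1 - alpha) * P_collection(t).
    Terms that never occur in the collection are skipped.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts.get(term, 0) / collection_size
        p = alpha * p_doc + (1 - alpha) * p_coll
        if p > 0:
            score += math.log(p)
    return score


if __name__ == "__main__":
    # Two toy "documents", tokenized into character 4-grams.
    docs = {"d1": char_ngrams("the cat sat"), "d2": char_ngrams("a dog ran")}
    collection = Counter(g for terms in docs.values() for g in terms)
    size = sum(collection.values())
    query = char_ngrams("cat")
    for name, terms in docs.items():
        print(name, round(lm_score(query, terms, collection, size), 3))
```

Because every term is a short character sequence, a mistranscribed or out-of-vocabulary word can still share some n-grams with the intended word, which is the property the abstract relies on for spoken documents.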

Author information

Authors and Affiliations

Applied Physics Laboratory, The Johns Hopkins University, 11100 Johns Hopkins Road, Laurel, MD, 20723-6099, USA
Paul McNamee & James Mayfield

Editor information

Editors and Affiliations

ISTI-CNR, Area di Ricerca, Pisa, Italy
Carol Peters
No affiliation listed
Julio Gonzalo & Martin Braschler
German Institute for International and Security Affairs, Stiftung Wissenschaft und Politik (SWP), Ludwigkirchplatz 3-4, 10719, Berlin, Germany
Michael Kluck

Cite this paper

McNamee, P., Mayfield, J. (2004). N-Grams for Translation and Retrieval in CL-SDR. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_63

DOI: https://doi.org/10.1007/978-3-540-30222-3_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive
