Abstract
Machine translation systems are not reliable enough to be used “as is”: except for the simplest tasks, they can only be used to grasp the general meaning of a text or to assist human translators. The purpose of confidence measures is to detect erroneous words or sentences produced by a machine translation system. In this article, after reviewing the mathematical foundations of confidence estimation, we compare several state-of-the-art confidence measures, predictive parameters and classifiers. We also propose two original confidence measures based on Mutual Information and a method for automatically generating data for training and testing classifiers. We applied these techniques to data from the 2008 WMT campaign and found that the best individual confidence measures yielded an Equal Error Rate of 36.3% at word level and 34.2% at sentence level, and that combining different measures reduced these rates to 35.0% and 29.0%, respectively. We also present the results of an experiment aimed at determining how helpful confidence measures are in a post-editing task. Preliminary results suggest that our system is not yet ready to help post-editors efficiently, but we now have both software and a protocol that we can apply to further experiments, and user feedback has indicated which aspects must be improved to make confidence measures more helpful.
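The Equal Error Rate figures quoted above can be read as the operating point at which a confidence threshold wrongly accepts erroneous items as often as it wrongly rejects correct ones. The following is a minimal sketch of that usual definition, not the article's own evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are illustrative assumptions, and the article's exact definition may differ in detail.

```python
from typing import List, Tuple


def equal_error_rate(scores: List[float], is_correct: List[bool]) -> Tuple[float, float]:
    """Return (eer, threshold) for a list of confidence scores.

    Assumption: an item (word or sentence) is accepted as correct
    when its confidence score is >= threshold.
    """
    n_correct = sum(is_correct)
    n_wrong = len(is_correct) - n_correct
    if n_correct == 0 or n_wrong == 0:
        raise ValueError("need at least one correct and one erroneous item")

    # Candidate thresholds: every observed score, plus one above the
    # maximum so that "reject everything" is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]

    best = None  # (gap, eer, threshold)
    for t in candidates:
        # Erroneous items accepted as correct.
        false_accept = sum(1 for s, c in zip(scores, is_correct) if s >= t and not c)
        # Correct items rejected as erroneous.
        false_reject = sum(1 for s, c in zip(scores, is_correct) if s < t and c)
        far = false_accept / n_wrong
        frr = false_reject / n_correct
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest;
        # report their mean as the EER at that point.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy data (hypothetical): eight items with scores and gold labels.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [True, True, False, True, True, False, False, False]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real data the two error rates rarely coincide exactly at any observed threshold, which is why the sketch reports their mean at the closest point; interpolating between neighbouring thresholds is a common alternative.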
Cite this article
Raybaud, S., Langlois, D. & Smaïli, K. “This sentence is wrong.” Detecting errors in machine-translated sentences. Machine Translation 25, 1–34 (2011). https://doi.org/10.1007/s10590-011-9094-9
DOI: https://doi.org/10.1007/s10590-011-9094-9