Word Confidence Estimation and Its Integration in Sentence Quality Estimation for Machine Translation

Luong, Ngoc-Quang; Besacier, Laurent; Lecouteux, Benjamin

doi:10.1007/978-3-319-02741-8_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 244)

Abstract

This paper proposes several ideas for building an effective estimator that predicts the quality of individual words in Machine Translation (MT) output. We add features of various types (system-based, lexical, syntactic and semantic) to the conventional feature set to train our baseline classifier. After experimenting with all features, we apply a “Feature Selection” strategy to retain the best-performing ones. We then combine multiple “weak” classifiers into a strong “composite” classifier that exploits their complementarity, which yields a better F-score. Finally, we use the word confidence scores to improve quality estimation at the sentence level.
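
The abstract reports classifier performance in terms of F-score. As an illustration only, the following minimal sketch shows how an F-score could be computed for one word-level quality label. The label names "G" (good) and "B" (bad), the function name `f_score`, and the toy data are assumptions made for this example; they are not taken from the chapter and do not reproduce the authors' evaluation setup.

```python
from typing import Sequence


def f_score(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Return the F1 score of `label` for parallel gold and predicted word labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives and false negatives for `label`.
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)

    # With no true positives, precision and recall are zero (or undefined).
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a five-word MT output: "G" = good, "B" = bad.
gold = ["G", "G", "B", "G", "B"]
pred = ["G", "B", "B", "G", "G"]

score_bad = f_score(gold, pred, "B")
score_good = f_score(gold, pred, "G")
```

Scoring each label separately, as done here, keeps the result for the rarer label from being masked by the more frequent one.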

Author information

Authors and Affiliations

Laboratoire d’Informatique de Grenoble, Campus de Grenoble, 41, Rue des Mathématiques, BP53, F-38041, Grenoble Cedex 9, France
Ngoc-Quang Luong, Laurent Besacier & Benjamin Lecouteux

Corresponding author

Correspondence to Ngoc-Quang Luong.

Editor information

Editors and Affiliations

School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Van Nam Huynh
UMR CNRS 7253 Heudiasyc, Universite de Technologie de Compiegne, Compiegne Cedex, France
Thierry Denoeux
Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
Dang Hung Tran
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Anh Cuong Le
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Son Bao Pham

About this paper

Cite this paper

Luong, NQ., Besacier, L., Lecouteux, B. (2014). Word Confidence Estimation and Its Integration in Sentence Quality Estimation for Machine Translation. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_9

DOI: https://doi.org/10.1007/978-3-319-02741-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02740-1
Online ISBN: 978-3-319-02741-8
eBook Packages: Engineering, Engineering (R0)
