Abstract
This paper proposes ELEXR, a novel metric for evaluating machine translation (MT). The method extracts lexical co-occurrence relationships from a given reference translation (Ref) and its corresponding hypothesis sentence using a Hyperspace Analogue to Language (HAL) matrix. For each term appearing in these two sentences, the co-occurrence information is converted into a conditional probability distribution. Finally, the hypothesis is scored by comparing, via Kullback-Leibler divergence, the conditional probability distributions of the words that Ref and the candidate sentence (Cand) hold in common. ELEXR evaluates MT using only a single Ref per Cand, without any semantically annotated resources such as WordNet. Experiments on eight language pairs from the WMT 2011 submissions show that ELEXR outperforms the TER and BLEU baselines on average in system-level correlation with human judgments: it achieves an average Spearman's rho of about 0.78, Kendall's tau of about 0.66, and Pearson's correlation of about 0.84, improvements of about 0.04, 0.07, and 0.06, respectively, over BLEU, the strongest baseline.
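The pipeline sketched in the abstract can be illustrated in a few dozen lines. This is a minimal, simplified sketch of the idea only, not the paper's exact formulation: the window size, the smoothing constant, and the `1 / (1 + divergence)` mapping from averaged KL divergence to a similarity score are illustrative assumptions, and the HAL weighting-by-distance details are omitted.

```python
import math
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """HAL-style co-occurrence counts within a sliding window (unweighted)."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[word][tokens[j]] += 1.0
    return counts

def conditional(counts_for_word):
    """Turn one word's co-occurrence counts into P(context | word)."""
    total = sum(counts_for_word.values())
    return {c: n / total for c, n in counts_for_word.items()}

def kl_divergence(p, q, eps=1e-9):
    """Smoothed KL divergence D(p || q) over the union of contexts."""
    contexts = set(p) | set(q)
    return sum(p.get(c, eps) * math.log(p.get(c, eps) / q.get(c, eps))
               for c in contexts)

def elexr_like_score(ref, cand, window=2):
    """Average KL divergence over shared words, mapped to (0, 1]."""
    ref_tokens, cand_tokens = ref.split(), cand.split()
    ref_co, cand_co = cooccurrence(ref_tokens, window), cooccurrence(cand_tokens, window)
    shared = set(ref_tokens) & set(cand_tokens)
    if not shared:
        return 0.0
    avg_div = sum(kl_divergence(conditional(ref_co[w]), conditional(cand_co[w]))
                  for w in shared) / len(shared)
    return 1.0 / (1.0 + avg_div)
```

An identical hypothesis yields zero divergence for every shared word and hence a score of 1.0; a hypothesis that uses shared words in different contexts is penalized proportionally to how much its conditional distributions diverge from the reference's.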
References
Lund, K., Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods 28, 203–208 (1996)
Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22, 79–86 (1951)
Callison-Burch, C., Koehn, P., Monz, C., Peterson, K., Przybocki, M., Zaidan, O.F.: Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and Metrics MATR, pp. 17–53. Association for Computational Linguistics (2010)
Callison-Burch, C., Koehn, P., Monz, C., Zaidan, O.F.: Findings of the 2011 workshop on statistical machine translation. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 22–64. Association for Computational Linguistics (2011)
Callison-Burch, C., Koehn, P., Monz, C., Post, M., Soricut, R., Specia, L.: Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the Seventh Workshop on Statistical Machine Translation. Association for Computational Linguistics (2012)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
Agarwal, A., Lavie, A.: METEOR, M-BLEU and M-TER: Evaluation metrics for high-correlation with human rankings of machine translation output. In: Proceedings of the Third Workshop on Statistical Machine Translation, pp. 115–118. Association for Computational Linguistics (2008)
Chen, B., Kuhn, R.: AMBER: A modified BLEU, enhanced ranking metric. In: Proceedings of the 6th Workshop on Statistical Machine Translation, pp. 71–77 (2011)
Chen, B., Kuhn, R., Foster, G.: Improving AMBER, an MT evaluation metric. In: NAACL 2012 Workshop on Statistical Machine Translation (WMT 2012), pp. 59–63 (2012)
Chen, B., Kuhn, R., Larkin, S.: PORT: a precision-order-recall MT evaluation metric for tuning. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL 2012) (2012)
Banerjee, S., Lavie, A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
Lavie, A., Denkowski, M.J.: The METEOR metric for automatic evaluation of machine translation. Machine Translation 23, 105–115 (2009)
Dahlmeier, D., Liu, C., Ng, H.T.: TESLA at WMT 2011: Translation evaluation and tunable metric. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 78–84. Association for Computational Linguistics (2011)
Song, X., Cohn, T.: Regression and ranking based optimisation for sentence level machine translation evaluation. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 123–129. Association for Computational Linguistics (2011)
Popović, M.: Morphemes and POS tags for n-gram based evaluation metrics. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 104–107. Association for Computational Linguistics (2011)
Rios, M., Aziz, W., Specia, L.: TINE: A metric to assess MT adequacy. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 116–122. Association for Computational Linguistics (2011)
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of Association for Machine Translation in the Americas, pp. 223–231 (2006)
Snover, M., Madnani, N., Dorr, B., Schwartz, R.: Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In: Proceedings of the Fourth Workshop on Statistical Machine Translation, vol. 30, pp. 259–268. Association for Computational Linguistics (2009)
Nießen, S., Och, F.J., Leusch, G., Ney, H.: An evaluation tool for machine translation: Fast evaluation for MT research. In: Proceedings of the 2nd International Conference on Language Resources and Evaluation, pp. 39–45 (2000)
Leusch, G., Ueffing, N., Ney, H.: CDER: Efficient MT evaluation using block movements. In: Proceedings of the Thirteenth Conference of the European Chapter of the Association for Computational Linguistics, pp. 241–248 (2006)
Tillmann, C., Vogel, S., Ney, H., Zubiaga, A., Sawaf, H.: Accelerated DP based search for statistical translation. In: European Conf. on Speech Communication and Technology, pp. 2667–2670 (1997)
Wang, M., Manning, C.D.: SPEDE: Probabilistic edit distance metrics for MT evaluation. In: Proceedings of WMT (2012)
Kahn, J.G., Snover, M., Ostendorf, M.: Expected dependency pair match: predicting translation quality with expected syntactic structure. Machine Translation 23, 169–179 (2009)
Wong, B., Kit, C.: ATEC: automatic evaluation of machine translation via word choice and word order. Machine Translation 23, 141–155 (2009)
Popović, M., Vilar, D., Avramidis, E., Burchardt, A.: Evaluation without references: IBM1 scores as evaluation metrics. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 99–103. Association for Computational Linguistics (2011)
Parton, K., Tetreault, J., Madnani, N., Chodorow, M.: E-rating machine translation. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 108–115. Association for Computational Linguistics (2011)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Mahmoudi, A., Faili, H., Dehghan, M.H., Maleki, J. (2013). ELEXR: Automatic Evaluation of Machine Translation Using Lexical Relationships. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_32
DOI: https://doi.org/10.1007/978-3-642-45114-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0