Abstract
Recent efforts to develop new machine translation evaluation methods have tried to account for allowable wording differences in terms of either syntactic structure or synonyms and paraphrases. This paper primarily considers syntactic structure: using a statistical parser, it combines scores from partial syntactic dependency matches with standard local n-gram matches and takes advantage of N-best parse probabilities. The new scoring metric, expected dependency pair match (EDPM), is shown to outperform BLEU and TER both in correlation with human judgments and as a predictor of HTER. Further, we combine the syntactic features of EDPM with the alternative wording features of TERp, showing that accounting for syntactic structure yields a benefit on top of semantic equivalence features.
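The idea of weighting dependency matches by N-best parse probabilities and combining them with local n-gram matches can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scoring formula, so the F-score over expected counts, the unlabeled (dependent, head) pairs, the choice of unigrams and bigrams, and all function names (`expected_counts`, `f_score`, `ngrams`) are assumptions made for illustration. The parser output is hand-written here rather than produced by a real statistical parser.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def expected_counts(tokens, nbest):
    """Expected feature counts for one sentence (illustrative assumption).

    nbest is a list of (probability, dependency_pairs) entries, one per
    parse; each dependency pair is a hypothetical (dependent, head) tuple.
    """
    total = sum(p for p, _ in nbest)
    counts = Counter()
    for p, pairs in nbest:
        weight = p / total  # renormalize over the N-best list
        for pair in pairs:
            counts[("dep",) + tuple(pair)] += weight
    # Local n-gram features do not depend on the parse, so they count once.
    for n in (1, 2):
        for gram, c in ngrams(tokens, n).items():
            counts[("ng",) + gram] += c
    return counts


def f_score(hyp, ref):
    """Harmonic mean of precision and recall over (fractional) counts."""
    overlap = sum(min(v, ref[k]) for k, v in hyp.items())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hand-written toy parses; a real system would take these from a parser.
ref_tokens = ["the", "cat", "sat"]
ref_nbest = [
    (0.6, [("the", "cat"), ("cat", "sat")]),
    (0.4, [("the", "sat"), ("cat", "sat")]),
]
hyp_tokens = ["the", "cat", "slept"]
hyp_nbest = [(1.0, [("the", "cat"), ("cat", "slept")])]

score = f_score(expected_counts(hyp_tokens, hyp_nbest),
                expected_counts(ref_tokens, ref_nbest))
print(score)
```

Using expected counts rather than only the single best parse means that a dependency appearing in a lower-ranked parse still contributes partial credit, which is one plausible reading of "taking advantage of N-best parse probabilities".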
Cite this article
Kahn, J.G., Snover, M. & Ostendorf, M. Expected dependency pair match: predicting translation quality with expected syntactic structure. Machine Translation 23, 169–179 (2009). https://doi.org/10.1007/s10590-009-9057-6
DOI: https://doi.org/10.1007/s10590-009-9057-6