Abstract
This paper describes NICT’s neural machine translation systems for the Chinese\(\leftrightarrow \)English directions in the CCMT-2019 shared news translation task. We used the provided parallel data, augmented with a large quantity of back-translated monolingual data, to train state-of-the-art NMT systems. We then employed techniques that have proven effective, such as fine-tuning and model ensembling, to generate the primary submissions for the Chinese\(\leftrightarrow \)English translation tasks.
1 Introduction
This paper presents the neural machine translation (NMT) systems built for the participation of the National Institute of Information and Communications Technology (NICT) in the CCMT-2019 shared news translation task for the Chinese\(\leftrightarrow \)English directions. Specifically, we used the Transformer architecture to build our translation systems. We then employed techniques that have proven effective, such as back-translation, fine-tuning, and model ensembling, to generate the primary submissions for the Chinese\(\leftrightarrow \)English translation tasks. All of our systems are constrained, i.e., we used only the parallel and monolingual data provided by the organizers to train and tune our systems. These systems are also part of our submission to WMT19 [1] (see Note 1).
The remainder of this paper is organized as follows. In Sect. 2, we present the data and its pre-processing. In Sect. 3, we introduce the details of our NMT systems. Empirical results obtained with our systems are analyzed in Sect. 4, and we conclude this paper in Sect. 5.
2 Datasets
2.1 Data
To train our systems, we used all the provided parallel data for both targeted translation directions. In addition to the parallel data, the training data for the Chinese\(\leftrightarrow \)English (ZH\(\leftrightarrow \)EN) translation tasks includes two parts: (1) the first 10 million lines of the News Crawl 2018 English corpus, selected according to the findings of [6, 11], and (2) the corresponding synthetic data generated through back-translation [5, 8].
2.2 Pre-processing
We applied the Moses [4] tokenizer and truecaser to the English sentences. For Chinese, we used Jieba for tokenization but did not perform truecasing. For cleaning, we filtered out sentences longer than 80 tokens from the training data using the Moses script clean-corpus-n.perl, and replaced characters forbidden by Moses. Tables 1 and 2 present the statistics of the parallel and monolingual data, respectively, after pre-processing.
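The length-based cleaning step can be illustrated with a minimal Python sketch. It is not the Moses script itself: it only mimics the 80-token limit on already tokenized sentence pairs, and the function name, the whitespace token counting, and the minimum length of one token are assumptions made for illustration. The Moses script performs additional checks that are omitted here.

```python
from typing import Iterable, Iterator, Tuple

MAX_TOKENS = 80  # length limit stated in the text


def filter_by_length(
    pairs: Iterable[Tuple[str, str]],
    max_tokens: int = MAX_TOKENS,
) -> Iterator[Tuple[str, str]]:
    """Yield only the sentence pairs whose two sides each contain
    between 1 and max_tokens whitespace-separated tokens.

    Hypothetical helper: the lower bound of 1 token is an assumption.
    """
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if 1 <= n_src <= max_tokens and 1 <= n_tgt <= max_tokens:
            yield src, tgt


if __name__ == "__main__":
    toy = [
        ("a short sentence", "une phrase courte"),
        (" ".join(["w"] * 81), "too long on the source side"),
    ]
    for src, tgt in filter_by_length(toy):
        print(src, "|||", tgt)
```

A pair is dropped as soon as either side violates the limit, so the source and target files stay aligned line by line.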
3 MT Systems
3.1 NMT
We used the Marian toolkit [2] to build competitive NMT systems based on the Transformer [10] architecture. We used the byte pair encoding (BPE) algorithm [9] to obtain the sub-word vocabulary, whose size was set to 50,000. The number of dimensions of all input and output layers was set to 512, and that of the inner feed-forward neural network layer was set to 2048. The number of attention heads in each encoder and decoder layer was set to eight. During training, the value of label smoothing was set to 0.1, and the attention dropout and residual dropout were set to 0.1. The Adam optimizer [3] was used to tune the parameters of the model. The learning rate was varied under a warm-up strategy with 16,000 warm-up steps. We validated the model every 5,000 batches on the development set and selected the best model according to its BLEU [7] score on the development set. All our NMT systems were consistently trained on 4 GPUs (see Note 4), with the Marian parameters listed in Table 3.
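The warm-up strategy mentioned above can be sketched in Python as follows. The text gives only the number of warm-up steps (16,000); the linear shape of the warm-up, the inverse square-root decay after it, and the peak learning rate are assumptions chosen for illustration, not settings reported for these systems.

```python
import math

WARMUP_STEPS = 16_000  # from the text
PEAK_LR = 3e-4         # hypothetical value, not reported in the text


def learning_rate(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
) -> float:
    """Learning rate at a given update step (1-based).

    Assumed schedule: linear increase up to peak_lr during warm-up,
    then decay proportional to the inverse square root of the step.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * math.sqrt(warmup_steps / step)


if __name__ == "__main__":
    for s in (1, 8_000, 16_000, 64_000):
        print(s, learning_rate(s))
```

Under these assumptions the rate reaches its peak exactly at step 16,000 and has halved again by step 64,000.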
3.2 Back-Translation of Monolingual Data
The so-called “back-translation” of monolingual data has been shown to be one of the most efficient ways to exploit monolingual data for NMT [8]. It simply consists of translating target-language monolingual data into the source language, using a pre-trained target-to-source NMT model, in order to produce new synthetic parallel data that can be used to train NMT models. We concatenated the resulting synthetic parallel data with the original parallel data to train better NMT models. For EN\(\rightarrow \)ZH, we back-translated the entire XMU Chinese monolingual corpus, containing 5.4M sentences, to produce synthetic English data. For ZH\(\rightarrow \)EN, we empirically compared the impact of back-translating different sizes of English monolingual data, using the first 10M lines of the concatenation of the News Crawl-2016 and News Crawl-2017 English corpora to produce synthetic Chinese data.
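The data flow of back-translation described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the submissions: the target-to-source translator is passed in as a plain function, and all names (back_translate, build_training_data, the toy translator) are hypothetical.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_mono: List[str],
    translate_tgt_to_src: Callable[[List[str]], List[str]],
) -> List[Pair]:
    """Pair each genuine target sentence with its machine-translated
    (synthetic) source sentence."""
    synthetic_sources = translate_tgt_to_src(target_mono)
    if len(synthetic_sources) != len(target_mono):
        raise ValueError("translator must return one output per input")
    return list(zip(synthetic_sources, target_mono))


def build_training_data(
    parallel: List[Pair],
    target_mono: List[str],
    translate_tgt_to_src: Callable[[List[str]], List[str]],
) -> List[Pair]:
    """Concatenate the original parallel data with the synthetic pairs."""
    return list(parallel) + back_translate(target_mono, translate_tgt_to_src)


if __name__ == "__main__":
    # Toy stand-in for a pre-trained target-to-source NMT model.
    def toy_translator(sentences: List[str]) -> List[str]:
        return ["<synthetic> " + s for s in sentences]

    data = build_training_data(
        parallel=[("src one", "tgt one")],
        target_mono=["tgt two", "tgt three"],
        translate_tgt_to_src=toy_translator,
    )
    for src, tgt in data:
        print(src, "|||", tgt)
```

Only the source side is synthetic; the target side of every added pair is genuine text, which is why the monolingual data must be in the target language of the system being trained.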
3.3 Fine-Tuning and Ensemble of NMT Models
After the back-translation, we performed five independent training runs on the mixture of the original parallel data and the pseudo-parallel data, thereby obtaining five translation models. Each model was further fine-tuned on the ccmt2018_newstest set for 20 epochs. Finally, we decoded the ccmt2019_newstest set with an ensemble of the five fine-tuned models to generate the primary submissions for the ZH\(\leftrightarrow \)EN tasks.
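To make the ensembling step concrete, the sketch below combines the next-token distributions of several models at a single decoding step. It assumes an arithmetic mean of the probabilities and a greedy choice; toolkits may instead combine log-probabilities, apply model weights, and use beam search, and the text does not specify these details. All names and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def ensemble_step(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average the next-token probability distributions of several
    models that share the same vocabulary."""
    if not distributions:
        raise ValueError("need at least one model distribution")
    vocab_size = len(distributions[0])
    if any(len(d) != vocab_size for d in distributions):
        raise ValueError("all models must share the same vocabulary")
    n_models = len(distributions)
    return [
        sum(d[i] for d in distributions) / n_models
        for i in range(vocab_size)
    ]


def greedy_pick(distributions: Sequence[Sequence[float]]) -> int:
    """Index of the most probable token under the averaged distribution."""
    averaged = ensemble_step(distributions)
    return max(range(len(averaged)), key=averaged.__getitem__)


if __name__ == "__main__":
    # Three toy models over a three-token vocabulary.
    step_probs = [
        [0.6, 0.3, 0.1],
        [0.2, 0.5, 0.3],
        [0.1, 0.6, 0.3],
    ]
    print(ensemble_step(step_probs))
    print(greedy_pick(step_probs))
```

In the toy example the first model alone would prefer token 0, but the averaged distribution favors token 1, which illustrates how an ensemble can override the preference of an individual model.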
4 Results
Our systems were evaluated on the WMT2019NewsTest test set for the ZH\(\leftrightarrow \)EN tasks, and the results are shown in Table 4. For EN\(\rightarrow \)ZH, BLEU scores were computed on the basis of character-based segmentation. “w/backtr” and “w/o backtr” indicate with and without back-translation, respectively. “w/ft” indicates that the single model was fine-tuned on the ccmt2018_newstest set. “ensemble” indicates that five fine-tuned single models were ensembled at decoding time.
Our observations from Table 4 are as follows. Back-translation, fine-tuning, and ensembling were all effective for the ZH\(\leftrightarrow \)EN tasks. In particular, relative to the “Single model+back-translation+fine-tuning” model, the ensemble yielded a larger improvement on the ZH\(\rightarrow \)EN task than on the EN\(\rightarrow \)ZH task.
5 Conclusion
In this paper, we presented NICT’s participation in the CCMT-2019 shared Chinese\(\leftrightarrow \)English news translation task. Our primary submissions to the tasks were the results of a simple combination of back-translation, fine-tuning, and ensemble methods. Our results confirmed that these three methods incrementally improve the translation performance of Transformer NMT.
Notes
1. The Chinese-English task is jointly held by CCMT-2019 and WMT19. Therefore, parts of the two system description papers overlap.
4. NVIDIA® Tesla® P100 16 GB.
References
1. Dabre, R., et al.: NICT’s supervised neural machine translation systems for the WMT19 news translation task. In: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), Association for Computational Linguistics, Florence, Italy, pp. 168–174, August 2019. https://www.aclweb.org/anthology/W19-5313
2. Junczys-Dowmunt, M., et al.: Marian: fast neural machine translation in C++. In: Proceedings of ACL 2018, System Demonstrations, Melbourne, Australia, pp. 116–121 (2018). http://aclweb.org/anthology/P18-4020
3. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980
4. Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, Prague, Czech Republic, pp. 177–180 (2007). http://aclweb.org/anthology/P07-2045
5. Marie, B., et al.: NICT’s unsupervised neural and statistical machine translation systems for the WMT19 news translation task. In: Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), Association for Computational Linguistics, Florence, Italy, pp. 294–301, August 2019. https://www.aclweb.org/anthology/W19-5330
6. Marie, B., Wang, R., Fujita, A., Utiyama, M., Sumita, E.: NICT’s neural and statistical machine translation systems for the WMT18 news translation task. In: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Belgium, Brussels, pp. 449–455, October 2018. https://www.aclweb.org/anthology/W18-6419
7. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318, July 2002. https://doi.org/10.3115/1073083.1073135, http://www.aclweb.org/anthology/P02-1040
8. Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 86–96 (2016). http://aclweb.org/anthology/P16-1009
9. Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, pp. 1715–1725 (2016). http://aclweb.org/anthology/P16-1162
10. Vaswani, A., et al.: Attention is all you need. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30 (2017). https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
11. Wang, R., Marie, B., Utiyama, M., Sumita, E.: NICT’s corpus filtering systems for the WMT18 parallel corpus filtering task. In: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Association for Computational Linguistics, Belgium, Brussels, pp. 963–967, October 2018. https://doi.org/10.18653/v1/W18-6489, https://www.aclweb.org/anthology/W18-6489
Acknowledgments
We are grateful to the anonymous reviewers and the area chair for their insightful comments and suggestions. Rui Wang was partially supported by JSPS grant-in-aid for early-career scientists (19K20354): “Unsupervised Neural Machine Translation in Universal Scenarios” and NICT tenure-track researcher startup fund “Toward Intelligent Machine Translation”.
© 2019 Springer Nature Singapore Pte Ltd.
Cite this paper
Chen, K., Wang, R., Utiyama, M., Sumita, E. (2019). NICT’s Machine Translation Systems for CCMT-2019 Translation Task. In: Huang, S., Knight, K. (eds) Machine Translation. CCMT 2019. Communications in Computer and Information Science, vol 1104. Springer, Singapore. https://doi.org/10.1007/978-981-15-1721-1_8
DOI: https://doi.org/10.1007/978-981-15-1721-1_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1720-4
Online ISBN: 978-981-15-1721-1