
1 Introduction

This paper presents the neural machine translation (NMT) systems built for the National Institute of Information and Communications Technology (NICT)’s participation in the CCMT-19 shared News Translation Task for the Chinese\(\leftrightarrow \)English directions. Specifically, we used the Transformer architecture to build our translation systems. We then employed techniques that have proven to be highly effective, such as back-translation, fine-tuning, and model ensembling, to generate the primary submissions for the Chinese\(\leftrightarrow \)English translation tasks. All of our systems are constrained, i.e., we used only the parallel and monolingual data provided by the organizers to train and tune our systems. This system is also part of our system for WMT19 [1].

The remainder of this paper is organized as follows. In Sect. 2, we present the data preprocessing. In Sect. 3, we introduce the details of our NMT systems. Empirical results obtained with our systems are analyzed in Sect. 4 and we conclude this paper in Sect. 5.

2 Datasets

2.1 Data

To train our systems, we used all the provided parallel data for both of our targeted translation directions. In addition, the training data for the Chinese\(\leftrightarrow \)English (ZH\(\leftrightarrow \)EN) translation tasks includes synthetic parallel data: following the findings of [6, 11], we selected the first 10 million lines of the News Crawl 2018 English monolingual corpus and generated the corresponding synthetic data through back-translation [5, 8].
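For concreteness, the following is a minimal sketch of how the first 10 million lines of a monolingual corpus can be extracted; the file names are placeholders, not the exact paths used in our experiments.

```python
# Keep only the first 10M lines of a monolingual corpus
# (file names are illustrative placeholders).
from itertools import islice

N_LINES = 10_000_000

with open("news.2018.en.shuffled", encoding="utf-8") as src, \
        open("news.2018.en.first10M", "w", encoding="utf-8") as out:
    out.writelines(islice(src, N_LINES))
```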

2.2 Pre-processing

We applied the Moses [4] tokenizer and truecaser to the English sentences. For Chinese, we used Jieba for tokenization but did not perform truecasing. For cleaning, we filtered out sentences longer than 80 tokens from the training data using the Moses script clean-corpus-n.perl and replaced characters forbidden by Moses. Tables 1 and 2 present the statistics of the parallel and monolingual data, respectively, after pre-processing.

Table 1. Statistics of our pre-processed parallel data
Table 2. Statistics of our pre-processed monolingual data
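The pipeline below is a minimal sketch of this pre-processing, assuming a standard Moses checkout and the Python Jieba package; the script paths, truecasing model, and corpus file names are illustrative assumptions rather than our exact setup.

```python
# Sketch of the pre-processing pipeline: Moses tokenization/truecasing for
# English, Jieba segmentation for Chinese, and Moses length filtering.
import subprocess
import jieba  # pip install jieba

MOSES = "mosesdecoder/scripts"  # assumed path to a Moses checkout

def preprocess_english(inp, out):
    """Tokenize and truecase an English file with the Moses scripts."""
    with open(inp, encoding="utf-8") as fin, open(out, "w", encoding="utf-8") as fout:
        tok = subprocess.Popen(
            ["perl", f"{MOSES}/tokenizer/tokenizer.perl", "-l", "en"],
            stdin=fin, stdout=subprocess.PIPE)
        subprocess.run(
            ["perl", f"{MOSES}/recaser/truecase.perl", "--model", "truecase-model.en"],
            stdin=tok.stdout, stdout=fout, check=True)
        tok.stdout.close()
        tok.wait()

def preprocess_chinese(inp, out):
    """Segment a Chinese file with Jieba; no truecasing is applied."""
    with open(inp, encoding="utf-8") as fin, open(out, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(" ".join(jieba.cut(line.strip())) + "\n")

def clean_corpus(prefix, clean_prefix, max_len=80):
    """Drop sentence pairs longer than max_len tokens with the Moses script."""
    subprocess.run(
        ["perl", f"{MOSES}/training/clean-corpus-n.perl",
         prefix, "zh", "en", clean_prefix, "1", str(max_len)],
        check=True)

preprocess_english("corpus.raw.en", "corpus.tok.en")
preprocess_chinese("corpus.raw.zh", "corpus.tok.zh")
clean_corpus("corpus.tok", "corpus.clean", max_len=80)
```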

3 MT Systems

3.1 NMT

We used the Marian toolkit [2] to build competitive NMT systems based on the Transformer [10] architecture. We used the byte pair encoding (BPE) algorithm [9] to obtain a sub-word vocabulary of 50,000 tokens. The number of dimensions of all input and output layers was set to 512, and that of the inner feed-forward neural network layer was set to 2048. The number of attention heads in each encoder and decoder layer was set to eight. During training, the value of label smoothing was set to 0.1, and the attention dropout and residual dropout were set to 0.1. The Adam optimizer [3] was used to tune the parameters of the model. The learning rate was varied under a warm-up strategy with 16,000 warm-up steps. We validated the model every 5,000 batches on the development set and selected the best model according to its BLEU [7] score on the development set. All our NMT systems were consistently trained on four GPUs with the parameters for Marian listed in Table 3:

Table 3. Parameters for training Marian.
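Table 3 gives the exact parameters we passed to Marian. As an illustrative sketch only, the invocation below reconstructs a training command from the hyper-parameters stated above; the remaining flag values, vocabulary files, and corpus paths are assumptions and may differ from Table 3.

```python
# Illustrative Marian training invocation for a Transformer model; file names
# and any flags not mentioned in the text are placeholders/assumptions.
import subprocess

subprocess.run([
    "marian",
    "--type", "transformer",
    "--train-sets", "corpus.bpe.zh", "corpus.bpe.en",
    "--vocabs", "vocab.zh.yml", "vocab.en.yml",
    "--dim-emb", "512",                        # input/output layer size
    "--transformer-dim-ffn", "2048",           # inner feed-forward layer size
    "--transformer-heads", "8",                # attention heads per layer
    "--label-smoothing", "0.1",
    "--transformer-dropout", "0.1",            # residual dropout
    "--transformer-dropout-attention", "0.1",  # attention dropout
    "--optimizer", "adam",
    "--lr-warmup", "16000",                    # warm-up steps
    "--valid-freq", "5000",                    # validate every 5,000 batches
    "--valid-metrics", "bleu",
    "--valid-sets", "dev.bpe.zh", "dev.bpe.en",
    "--devices", "0", "1", "2", "3",           # four GPUs
    "--model", "model/model.npz",
], check=True)
```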

3.2 Back-Translation of Monolingual Data

The so-called “back-translation” of monolingual data has been shown to be one of the most effective ways to exploit monolingual data for NMT [8]. The idea is simply to translate target-language monolingual data into the source language with a pre-trained target-to-source NMT model, producing new synthetic parallel data that can be used to train NMT models. We concatenated the resulting synthetic parallel data with the original parallel data to train better NMT models. For EN\(\rightarrow \)ZH, we back-translated the entire XMU Chinese monolingual corpus (5.4M sentences) to produce synthetic English source data. For ZH\(\rightarrow \)EN, we empirically compared the impact of back-translating different amounts of English monolingual data, using the first 10M lines of the concatenation of the News Crawl 2016 and News Crawl 2017 English corpora to produce synthetic Chinese data.
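As a minimal sketch of this step for the ZH\(\rightarrow \)EN direction, the snippet below translates English monolingual data into Chinese with a pre-trained EN\(\rightarrow \)ZH Marian model and appends the result to the genuine parallel data; all model and file names are assumptions for illustration.

```python
# Back-translation sketch: produce synthetic Chinese sources for English
# monolingual sentences, then mix synthetic and genuine parallel data.
import shutil
import subprocess

# 1. Translate the English monolingual data with a pre-trained EN->ZH model.
with open("mono.bpe.en", encoding="utf-8") as fin, \
        open("synthetic.bpe.zh", "w", encoding="utf-8") as fout:
    subprocess.run(
        ["marian-decoder",
         "--models", "model.en-zh.npz",
         "--vocabs", "vocab.en.yml", "vocab.zh.yml",
         "--devices", "0", "1", "2", "3"],
        stdin=fin, stdout=fout, check=True)

# 2. Concatenate the synthetic pairs with the original parallel data.
for lang, synthetic in (("zh", "synthetic.bpe.zh"), ("en", "mono.bpe.en")):
    with open(f"train.mixed.bpe.{lang}", "w", encoding="utf-8") as out:
        for part in (f"corpus.bpe.{lang}", synthetic):
            with open(part, encoding="utf-8") as f:
                shutil.copyfileobj(f, out)
```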

3.3 Fine-Tuning and Ensemble of NMT Models

After back-translation, we ran training independently five times on the mixture of the original parallel data and the synthetic parallel data, thus obtaining five translation models. Each model was further fine-tuned on the ccmt2018_newstest set for 20 epochs. Finally, we decoded the ccmt2019_newstest set with an ensemble of the five fine-tuned models to generate the primary submissions for the ZH\(\leftrightarrow \)EN tasks.
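Ensembling in Marian amounts to passing all checkpoints to the decoder at once. The sketch below is only illustrative: the checkpoint and test-set file names are assumptions, and the beam size is not reported in this paper.

```python
# Ensemble decoding sketch: the five fine-tuned models are passed together to
# marian-decoder (file names and beam size are assumptions).
import subprocess

models = [f"model.ft.run{i}.npz" for i in range(1, 6)]

with open("ccmt2019_newstest.bpe.zh", encoding="utf-8") as fin, \
        open("primary.submission.en", "w", encoding="utf-8") as fout:
    subprocess.run(
        ["marian-decoder",
         "--models", *models,            # ensemble of the five checkpoints
         "--vocabs", "vocab.zh.yml", "vocab.en.yml",
         "--beam-size", "12",            # assumed value, not reported
         "--devices", "0", "1", "2", "3"],
        stdin=fin, stdout=fout, check=True)
```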

4 Results

Our systems were evaluated on the WMT2019NewsTest test set for the ZH\(\leftrightarrow \)EN tasks, and the results are shown in Table 4. For EN\(\rightarrow \)ZH, BLEU scores were computed on the basis of character-based segmentation. “w/backtr” and “w/o backtr” indicate with and without back-translation, respectively. “w/ft” indicates that the single model was fine-tuned on the ccmt2018_newstest set. “ensemble” indicates that five fine-tuned single models were ensembled at decoding time.

Table 4. Results (BLEU-cased) of our MT systems on the ccmt2018_newstest test set.

Our observations from Table 4 are as follows: back-translation, fine-tuning, and ensembling are all clearly effective for the ZH\(\leftrightarrow \)EN tasks. In particular, ensembling brought a larger improvement over the “Single model+back-translation+fine-tuning” system for ZH\(\rightarrow \)EN than for EN\(\rightarrow \)ZH.

5 Conclusion

In this paper, we presented NICT’s participation in the CCMT-2019 shared Chinese\(\leftrightarrow \)English news translation task. Our primary submissions to the tasks were produced by a simple combination of back-translation, fine-tuning, and ensembling. Our results confirmed that these three methods can incrementally improve the translation performance of Transformer-based NMT.