
1 Introduction

A translation memory (TM) provides the source-target sentence pairs most similar to the source sentence to be translated, and it yields more reliable translations, particularly for segments matched between the TM and the source sentence [9]. Therefore, TMs have been widely used in machine translation systems. For example, various studies have been devoted to integrating a TM into statistical machine translation (SMT) [4, 6, 12]. With the shift from SMT to neural machine translation (NMT), there is increasing interest in employing TM information to improve NMT results.

Li et al. and Farajian et al. [2, 5] proposed a fine-tuning approach that trains a sentence-wise local neural model on top of a retrieved TM, which is then used to translate that particular sentence. Despite its appealing performance, fine-tuning for each test sentence leads to high latency in decoding. In contrast, in [3] and [13], the standard NMT model was augmented by additionally encoding a TM for each test sentence, and the proposed model was trained once for all source sentences. Although these approaches [3, 13] are capable of capturing global context from a TM, encoding a TM with neural networks requires intensive computation and considerable memory, because a TM typically contains many more words than those encoded by a standard NMT model.

Fortunately, a simple approach was proposed in [14] that is efficient in both computation and memory. Rather than employing neural networks for TM encoding, it represents the TM for each sentence as a collection of translation pieces, i.e., weighted n-grams from the TM, whose weights are added to the NMT probabilities as rewards. Unfortunately, because translation pieces capture only very local context in a TM, this approach cannot generate good translations even when a TM is very similar to the test sentence: in particular, the translation quality is far from perfect even if the reference translation of the source sentence is included in the training set, as argued by [13].

To address this issue, this paper proposes a word-position-aware TM approach that captures more contextual information in a TM while maintaining efficiency similar to [14]. Our intuition is as follows: when translating a source sentence, if a word y appears at position i of a target sentence in the TM and y should appear in the output, then the position of y in the output should not be far from i.

To put this intuition into practice, we design two types of position rewards based on the normal distribution and integrate them into NMT together with translation pieces. We apply our approach to Transformer, a strong NMT system [11]. Extensive experiments on seven translation tasks demonstrate that the proposed method delivers substantial BLEU improvements over Transformer and consistently and significantly outperforms the approach in [14] by over 1 BLEU point on average, while running at almost the same speed as [14].

2 Background

2.1 NMT

In this paper, we use the state-of-the-art NMT model, Transformer [11], as our baseline. Suppose \(\mathbf x =\left\langle x_1,\dots , x_{|\mathbf x |}\right\rangle \) is a source sentence with length \(|\mathbf x |\) and \(\mathbf y =\left\langle y_1,\dots ,y_{|\mathbf y |}\right\rangle \) is the corresponding target sentence of \(\mathbf x \) with length \(|\mathbf y |\). Generally, for a given \(\mathbf x \), Transformer aims to generate a translation \(\mathbf y \) according to the conditional probability \(P(\mathbf y |\mathbf x )\) defined by neural networks:

$$\begin{aligned} P(\mathbf y |\mathbf x )=\prod ^{|\mathbf y |}_{i=1}P(y_i|\mathbf y _{<i},\mathbf x ) \end{aligned}$$
(1)

where \(\mathbf y _{<i} = \left\langle y_1,\dots ,y_{i-1}\right\rangle \) denotes the prefix of \(\mathbf y \) with length \(i-1\). To model each factor \(P(y_i|\mathbf y _{<i},\mathbf x )\), Transformer is based on the encoder-decoder framework, similar to the standard sequence-to-sequence learning in [1].

More specifically, the encoder of Transformer is composed of L layers of neural networks, and the decoder is likewise composed of L layers, as described in [11]. The factor \(P(y_i|\mathbf y _{<i},\mathbf x )\) is defined as follows:

$$\begin{aligned} P(y_i|\mathbf y _{<i},\mathbf x ) = \text {softmax} \left( \phi (h_i^{D,L}) \right) \end{aligned}$$
(2)

where \(h_i^{D,L}\) denotes the \(i\)-th hidden state at the \(L\)-th decoder layer, and \(\phi \) is a linear projection that maps the hidden state to a vector whose dimension equals the target vocabulary size.

Fig. 1. An example of translation pieces in a translation memory. The red part is employed to extract translation pieces, such as “gets”, “object”, “object that”, “object that is”, “object that is associated”, and “that”. (Color figure online)

The standard decoding algorithm for NMT is beam search. Namely, at each time step i, we keep the n-best hypotheses. The log-probability of a complete hypothesis is computed as follows:

$$\begin{aligned} \log P(\mathbf y |\mathbf x )=\sum _{i=1}^{|\mathbf y |} \log P(y_i|\mathbf y _{<i},\mathbf x ) \end{aligned}$$
(3)
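
For concreteness, the following minimal Python sketch illustrates beam search with the hypothesis score of Eq. 3. The function `next_log_probs(x, prefix)` and the token `EOS` are hypothetical placeholders for the Transformer decoder's next-token log-probabilities and the end-of-sentence symbol; this is an illustration, not the implementation used in our experiments.

```python
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sentence token

def beam_search(x: List[str],
                next_log_probs: Callable[[List[str], List[str]], Dict[str, float]],
                beam_size: int = 4,
                max_len: int = 100) -> Tuple[List[str], float]:
    """Return the hypothesis with the highest score under Eq. 3 (sum of log-probs)."""
    beam = [([], 0.0)]                      # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beam:
            if tokens and tokens[-1] == EOS:
                candidates.append((tokens, score))   # complete: keep unchanged
                continue
            # next_log_probs returns {token: log P(token | prefix, x)}
            for tok, logp in next_log_probs(x, tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep only the beam_size best hypotheses at this time step.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(t and t[-1] == EOS for t, _ in beam):
            break
    return max(beam, key=lambda c: c[1])
```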

2.2 Translation Pieces

Fig. 2. Adding word position rewards into the NMT output layer. v refers to a word in the target vocabulary, and \(i'\) refers to the expected position of word \(v_3\) according to the TM. Therefore, the position reward at time \(i'\) is larger than that at time i.

For a source sentence \(\mathbf x \) to be translated, we use an off-the-shelf search engine to retrieve a set of source sentences along with their translations from the translation memory (TM), obtaining the TM list \(\left\{ (\mathbf x ^m, \mathbf y ^m) \mid m \in [1,M] \right\} \). We then calculate the similarity between \(\mathbf x \) and \(\mathbf x ^m\) as follows [3]:

$$\begin{aligned} \text {sim}(\mathbf x , \mathbf x ^m) = 1 - \frac{dist(\mathbf x , \mathbf x ^m)}{\max (|\mathbf x |, |\mathbf x ^m|)} \end{aligned}$$
(4)

where \(dist(\cdot )\) denotes the edit-distance and \(|\mathbf x |\) denotes the word-based length of x.
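
For illustration, the following Python sketch computes Eq. 4 with a word-level Levenshtein distance; the function names are ours and are not part of [3] or [14].

```python
from typing import List

def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def sim(x: List[str], x_m: List[str]) -> float:
    """Fuzzy matching score of Eq. 4."""
    return 1.0 - edit_distance(x, x_m) / max(len(x), len(x_m))

# Example: sim("gets the object".split(), "gets an object".split()) = 1 - 1/3 ≈ 0.67
```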

Following [14], we first collect translation pieces from the TM list. Specifically, translation pieces (up to 4-grams) are collected from the retrieved target sentences \(\mathbf y ^m\) as possible translation pieces \(G^m_\mathbf x \) for \(\mathbf x \), using word-level alignments to select n-grams that are related to \(\mathbf x \) and to discard the others. For example, in Fig. 1, the red part of the retrieved TM target sentence is used to extract translation pieces for the source sentence, such as “gets”, “object”, and “object that”, while the black part of the TM target sentence is unmatched and is not collected. Formally, the translation pieces \(G_\mathbf x \) from the TM are represented as:

$$\begin{aligned} G_\mathbf x = \cup _{m=1}^M G^m_\mathbf x \end{aligned}$$
(5)

where \(G^m_\mathbf x \) denotes all weighted n-grams from \(\langle \mathbf x ^m, \mathbf y ^m \rangle \) with n up to 4.
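
The sketch below illustrates this collection step under simplifying assumptions: the word alignment is summarized as the set of target positions aligned to source words shared by \(\mathbf x \) and \(\mathbf x ^m\) (the red part in Fig. 1), and every n-gram (n up to 4) whose positions all lie in that set is kept. The helper names are illustrative, not taken from [14].

```python
from typing import List, Set, Tuple

def collect_pieces(y_m: List[str],
                   matched_tgt_positions: Set[int],
                   max_n: int = 4) -> Set[Tuple[str, ...]]:
    """G^m_x: n-grams (n <= max_n) of y^m whose target positions are all aligned
    to source words shared between x and x^m (the red part in Fig. 1)."""
    pieces = set()
    for start in range(len(y_m)):
        for n in range(1, max_n + 1):
            end = start + n
            if end > len(y_m):
                break
            if all(pos in matched_tgt_positions for pos in range(start, end)):
                pieces.add(tuple(y_m[start:end]))
    return pieces

def union_pieces(pieces_per_tm: List[Set[Tuple[str, ...]]]) -> Set[Tuple[str, ...]]:
    """G_x as the union over the M retrieved sentence pairs (Eq. 5)."""
    g_x: Set[Tuple[str, ...]] = set()
    for g_m in pieces_per_tm:
        g_x |= g_m
    return g_x
```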

Secondly, we calculate a score for each \(u \in G_\mathbf x \). The weighted score of u measures how likely it is to be a correct translation piece for \(\mathbf x \), based on the sentence similarity between the retrieved source sentences \(\left\{ \mathbf x ^m \mid m \in [1,M] \right\} \) and the input sentence \(\mathbf x \):

$$\begin{aligned} s_p(\mathbf x ,u) = \max _{1\le m \le M \wedge u \in G_\mathbf x ^m} \text {sim}(\mathbf x , \mathbf x ^m) \end{aligned}$$
(6)

Then, as shown in Fig. 2(a)(b), an additional translation piece reward for the collected translation pieces is added to the NMT output layer according to:

$$\begin{aligned} R_{p}(y_i|\mathbf y _{<i},\mathbf x ) = \lambda \sum _{n=1}^{4} \delta \big ( y_{i-n+1}^i \in G_\mathbf x , s_p(\mathbf x , y_{i-n+1}^i)\big ) \end{aligned}$$
(7)

where \(\lambda \) can be tuned on the development set and \(\delta (cond, val)\) is computed as:

$$\begin{aligned} \delta (cond,val) = \left\{ \begin{array}{lr} 0 &{} \text {if}~cond~\text {is}~false \\ val &{} \text {if}~cond~\text {is}~true \end{array} \right. \end{aligned}$$
(8)

Finally, based on Eqs. 2 and 7, the updated probability \(P'(y_i|\mathbf y _{<i},\mathbf x )\) for the word \(y_i\) is calculated by:

$$\begin{aligned} P'(y_i|\mathbf y _{<i},\mathbf x ) = P(y_i|\mathbf y _{<i},\mathbf x ) \times e^{R_p(y_i|\mathbf y _{<i},\mathbf x )} \end{aligned}$$
(9)
Fig. 3. An example of the word position relationship between the translation memory and the decoding steps. Position i refers to the decoding step and \(i^*\) refers to the global position information according to the TM. Position numbers of the same color (except gray) represent the position relationship between the translation memory and each decoding step in the NMT output layer. For example, at decoding step 4, the positions of the output word “object” are 3 and 7 in the TM, as shown in red. (Color figure online)

In this section, we provide a brief summary of how to use retrieved translation pieces in TM for NMT. For more details, we refer readers to [14].
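
To make Eqs. 6-9 concrete, the following sketch computes the piece reward \(R_p\) and the updated probability \(P'\) for one decoding step, assuming a dictionary `weighted_pieces` that maps each collected n-gram to its best sentence similarity (Eq. 6). All names are illustrative and are not taken from the implementation of [14].

```python
import math
from typing import Dict, List, Tuple

def piece_reward(prefix: List[str], y_i: str,
                 weighted_pieces: Dict[Tuple[str, ...], float],
                 lam: float = 1.0, max_n: int = 4) -> float:
    """R_p of Eq. 7: add the score of every collected n-gram ending in y_i."""
    history = prefix + [y_i]
    reward = 0.0
    for n in range(1, max_n + 1):
        if n > len(history):
            break
        u = tuple(history[-n:])        # the n-gram y_{i-n+1}^i
        if u in weighted_pieces:       # delta(cond, val) of Eq. 8
            reward += weighted_pieces[u]
    return lam * reward

def rescored_probs(probs: Dict[str, float], prefix: List[str],
                   weighted_pieces: Dict[Tuple[str, ...], float],
                   lam: float = 1.0) -> Dict[str, float]:
    """P' of Eq. 9: multiply each NMT probability by exp(R_p)."""
    return {v: p * math.exp(piece_reward(prefix, v, weighted_pieces, lam))
            for v, p in probs.items()}
```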

3 Word Positions Aware TM

To greatly improve translation quality, we hope the NMT output largely follows the target sentences in the TM. Although translation pieces are very useful for word selection, it is hard for them to capture contextual information beyond 4-grams in a TM, which limits translation performance: in particular, it is hard for translation pieces alone to guide the NMT model to generate a reliable translation, even if the reference of the source sentence is included in the TM.

Then, inspired by our intuition stated in Sect. 1, we study the position of word y in the collected translation pieces, and find that:

  • If the similarity between the TM source sentence and the input sentence is low, the positions of word y in the translation pieces are of little help in guiding the decoding process.

  • If the similarity is medium, the positions of word y in the translation pieces are helpful in guiding the decoding process.

  • If the similarity is high, the positions of word y in the translation pieces are very helpful in guiding the decoding process.

In general, word positions may supply more contextual information or long-distance knowledge, and their usefulness depends on the similarity between the source sentence and the TM source sentences. As shown in Fig. 3, if the TM source is highly similar to the source, the word position \(i'\) in the TM target should not be far from the word position i in the decoding process. For example, at decoding step 4, the positions of the output word “object” are 3 and 7 in the TM, as shown in red.

Therefore, if we consider the global position of a word in a TM, it is possible to further improve NMT with translation pieces. We experimented with several ways to model the position distribution, such as the linear, normal, and multinomial distributions, and finally selected the normal distribution. As shown in Fig. 2(a)(c), v refers to a word in the target vocabulary, and \(i'\) refers to the expected position of word \(v_3\) according to the TM. We add word position rewards to the NMT output layer according to normal distributions; therefore, the position reward at time \(i'\) is larger than that at time i.

In this paper, we design two types of position rewards, namely sentence-level rewards and piece-level rewards, for a given target word v from the retrieved TM, based on normal distributions as follows.

3.1 Sentence Level Position

To capture contextual information and long-distance knowledge, we use the normal distribution to model the relationship between positions, and we adopt the top-1 TM instance \((\mathbf x ^m, \mathbf y ^m)\) to set the parameters of the distribution over word positions at the sentence level: the mean of the normal distribution is \(i'\) and the standard deviation is \(2\cdot \text{sim}(\mathbf x ,\mathbf x ^m)\). Specifically, for the target word \(y_i\) and the decoding position i, the corresponding sentence-level position score \(s_{ps}\) is calculated as follows:

$$\begin{aligned} s_{ps}(\mathbf x , y_i, i)=\frac{e^{-\frac{1}{2} \cdot \big ( \frac{i-i'}{2\cdot \text {sim}(\mathbf x ,\mathbf x ^m)}\big )^2 }}{2\sqrt{2\pi }\cdot \text {sim}(\mathbf x ,\mathbf x ^m)} \end{aligned}$$
(10)

where \(i'\) refers to the position of the word \(y_i\) in \(\mathbf y ^m\).

Then, an additional sentence-level position reward is calculated as follows:

$$\begin{aligned} R_{ps}(y_i|i,\mathbf y _{<i},\mathbf x ) = \delta \Big ( y_i \in \mathbf y ^m , s_{ps}(\mathbf x , y_i, i)\Big ) \end{aligned}$$
(11)

In this way, the NMT results capture sentence level patterns as we expected, overcoming the limitation of translation pieces and the presence of mismatched source words.
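
As an illustration, the sketch below computes the sentence-level score of Eq. 10 and the reward of Eq. 11, assuming the top-1 TM target \(\mathbf y ^m\) and its similarity to \(\mathbf x \) are given; when \(y_i\) occurs at several positions in \(\mathbf y ^m\), the sketch simply uses the first occurrence, which is our simplifying assumption rather than a detail specified above.

```python
import math
from typing import List

def sentence_position_score(i: int, i_prime: int, similarity: float) -> float:
    """s_ps of Eq. 10: normal density with mean i' and std 2*sim(x, x^m)."""
    if similarity <= 0.0:
        return 0.0
    sigma = 2.0 * similarity
    return math.exp(-0.5 * ((i - i_prime) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def sentence_position_reward(y_i: str, i: int,
                             y_m: List[str], similarity: float) -> float:
    """R_ps of Eq. 11: non-zero only if y_i occurs in the top-1 TM target y^m."""
    if y_i not in y_m:
        return 0.0
    i_prime = y_m.index(y_i)  # first occurrence of y_i in y^m (our simplification)
    return sentence_position_score(i, i_prime, similarity)
```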

3.2 Piece Level Position

Piece-level positions help the underlying NMT system to further capture local patterns. Similar to the sentence-level position above, the score of the piece-level position n (\(0 \le n \le 3\)) of the word \(y_i\) in a collected translation piece u is based on the standard normal distribution, with mean 0 and standard deviation 1:

$$\begin{aligned} s_{pp}(\mathbf x , y_i, n)=\frac{e^{-\frac{(n+1)^2}{2}}}{\sqrt{2\pi }} \end{aligned}$$
(12)

where n refers to the relative position of the word \(y_i\) in the piece u. For example, as shown in Fig. 3, translation pieces are collected using the method described in Sect. 2.2, such as “associated”, “is associated”, “that is associated”, and “object that is associated”. At time step 7, when decoding the word “associated” in the NMT output layer, the values of n for these four pieces are 0, 1, 2, and 3, respectively.

As a result, an additional piece level position reward can be added according to:

$$\begin{aligned} R_{pp}(y_i|i,\mathbf y _{<i},\mathbf x ) = \lambda \sum _{n=0}^{3} \delta \big (y^i_{i-n}\in G_\mathbf x , s_{pp}(\mathbf x , y_i, n)\big ) \end{aligned}$$
(13)
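
The following sketch computes the piece-level score of Eq. 12 and the reward of Eq. 13 for one decoding step; the set `pieces` is the collected \(G_\mathbf x \) from Sect. 2.2, and the names are illustrative.

```python
import math
from typing import List, Set, Tuple

def piece_position_score(n: int) -> float:
    """s_pp of Eq. 12: standard normal density evaluated at n + 1."""
    return math.exp(-0.5 * (n + 1) ** 2) / math.sqrt(2.0 * math.pi)

def piece_position_reward(prefix: List[str], y_i: str,
                          pieces: Set[Tuple[str, ...]], lam: float = 1.0) -> float:
    """R_pp of Eq. 13: sum position scores of collected pieces ending in y_i."""
    history = prefix + [y_i]
    reward = 0.0
    for n in range(0, 4):                  # n = relative position of y_i in the piece
        if n + 1 > len(history):
            break
        u = tuple(history[-(n + 1):])      # the (n+1)-gram ending at y_i
        if u in pieces:
            reward += piece_position_score(n)
    return lam * reward
```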

In summary, at each time step i, we update the probabilities over the output vocabulary and increase the probabilities of those that match the expected positions according to:

$$\begin{aligned} P'(y_i|\mathbf y _{<i},\mathbf x ) = P(y_i|\mathbf y _{<i},\mathbf x ) \times e^{R_p(y_i|\mathbf y _{<i},\mathbf x )} \times e^{R_{ps}(y_i|i,\mathbf y _{<i},\mathbf x )} \times e^{R_{pp}(y_i|i,\mathbf y _{<i},\mathbf x )} \end{aligned}$$
(14)
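
Putting the pieces together, a minimal sketch of Eq. 14 multiplies the NMT probabilities by the three exponentiated rewards; it reuses `piece_reward`, `sentence_position_reward`, and `piece_position_reward` from the earlier sketches and is meant only to show how the rewards combine.

```python
import math
from typing import Dict, List, Set, Tuple

def update_probs(probs: Dict[str, float], prefix: List[str], i: int,
                 weighted_pieces: Dict[Tuple[str, ...], float],
                 pieces: Set[Tuple[str, ...]],
                 y_m: List[str], similarity: float, lam: float) -> Dict[str, float]:
    """P' of Eq. 14: combine R_p (Eq. 7), R_ps (Eq. 11) and R_pp (Eq. 13)."""
    updated = {}
    for v, p in probs.items():
        r_p = piece_reward(prefix, v, weighted_pieces, lam)        # Sect. 2.2 sketch
        r_ps = sentence_position_reward(v, i, y_m, similarity)     # Sect. 3.1 sketch
        r_pp = piece_position_reward(prefix, v, pieces, lam)       # Sect. 3.2 sketch
        updated[v] = p * math.exp(r_p) * math.exp(r_ps) * math.exp(r_pp)
    return updated
```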

4 Experiments

In this section, we demonstrate by experiments the advantages of the proposed model: it yields better translations than [14] with the help of word positions from the translation memory, and it keeps the decoding latency low, mainly because of the lightweight position formulation based on normal distributions.

Fig. 4. An example of translation results generated by other methods and our model. TM Source denotes the sentence that is most similar to the input. TM Target denotes the target sentence of the TM source. The blue parts in the TFM-* outputs are the translation pieces extracted from the TM target according to word alignments. The under-translated part of the input and its counterpart in the reference are shown in red. (Color figure online)

4.1 Settings

To fully explore the effectiveness of the proposed model, we conduct translation experiments on 7 language pairs, namely zh-en, fr-en, en-fr, es-en, en-es, de-en, and en-de, and use case-insensitive BLEU with a single reference as the automatic metric [7] for translation quality. We collect about 2 million news sentences from several online news websites for the zh-en experiments and obtain the pre-processed JRC-Acquis corpus from [3] for the other language pairs; the highly related text in this corpus makes it suitable for our evaluation. For each language pair, we randomly select 2000 samples each to form a development set and a test set, and the remaining pairs are used as the training set. In addition, we apply Byte Pair Encoding [8] to these datasets and maintain a source/target vocabulary of 35k tokens for each language pair.

The proposed method is built directly upon the Transformer architecture [11], which is referred to as TFM in this paper. Following [14], we implement a translation-piece-based system on top of Transformer for a fair comparison, denoted by TFM-P. The systems implementing the proposed word position integration are denoted by TFM-PS and TFM-PSP for sentence-level positions and sentence + piece-level positions, respectively.

For each sentence, we retrieve 100 translation pairs from the training set using Apache Lucene, score them with the fuzzy matching score, and finally select the top \(N=5\) translation sentence pairs as the TM for the sentence \(\mathbf x \) to be translated.
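
A minimal sketch of this retrieval step is given below. The wrapper `search_candidates(x, k)` stands in for the off-the-shelf search engine (here, Apache Lucene) and is a hypothetical interface; the re-ranking with the fuzzy matching score `sim` of Eq. 4 (defined in the earlier sketch) follows the description above.

```python
from typing import Callable, List, Tuple

Pair = Tuple[List[str], List[str]]   # (source tokens, target tokens)

def retrieve_tm(x: List[str],
                search_candidates: Callable[[List[str], int], List[Pair]],
                k: int = 100, top_n: int = 5) -> List[Tuple[Pair, float]]:
    """Retrieve k candidate pairs from the search engine, re-rank them with the
    fuzzy matching score sim() of Eq. 4, and keep the top_n pairs as the TM."""
    candidates = search_candidates(x, k)             # hypothetical Lucene wrapper
    scored = [((x_m, y_m), sim(x, x_m)) for (x_m, y_m) in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```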

Furthermore, since the hyper-parameter \(\lambda \) in TFM-PSP (and likewise in TFM-P and TFM-PS) is sensitive to the specific translation task, we tune it carefully on the development set for each translation task.

4.2 Results and Analysis

Some translation examples are given in Fig. 4. As shown there, TFM and TFM-P suffer from under-translation, where some source words are left untranslated, while TFM-PS and TFM-PSP do not. With the help of word positions from the translation memory, our methods make full use of the fragment information in the TM target and produce translations that are highly similar to the TM target.

Table 1. Translation accuracy in terms of BLEU on 7 translation tasks. Best results are highlighted.
Table 2. Similarity Analysis - Translation quality (BLEU score) on zh-en task for the divided subsets according to similarity. Best results are highlighted.

Translation Accuracy. Table 1 shows the main experimental results. Overall, our methods outperform the baseline TFM-P system by 0.1-2.2 BLEU points depending on the task. The zh-en task obtains the largest gain from the word position integration, while the fr-en task shows no immediate benefit, as the bold numbers in Table 1 indicate. The main reason is that the fr-en baseline is extraordinarily strong (fr-en: 70.95 vs. zh-en: 46.65), and this result is consistent with the findings reported in [14].

Influence of Similarity. To dig deeper into the influence of the similarity, we report the translation quality on the zh-en task for subsets divided according to similarity, in terms of BLEU and TER [10], as shown in Tables 2 and 3, respectively.

The low-similarity subset, in the range [0.0, 0.4), benefits little from the proposed rewards. The medium-similarity subset [0.4, 0.7) gains about 1 BLEU point. The high-similarity subset, in the range [0.7, 1.0], obtains significant improvements with the help of word position rewards, up to 9 BLEU points and a reduction of 9.16 TER points (lower TER is better) on the test set, in line with our expectation and with [13].

Table 3. Similarity Analysis - Translation quality (TER score) on zh-en task for the divided subsets according to similarity. Best results are highlighted.
Table 4. Composition of dev and test sets based on similarity score on 7 translation tasks.

Table 4 shows the statistics of each dev and test set on the seven translation tasks, where sentences are grouped by their similarity scores. In addition, the sentence-level word positions are the main contributors to the quality improvement. We can thus conclude that the word positions extracted from the TM are effective in improving the final translation results in most cases, especially for source sentences that are very similar to the TM.

Running Time. We exclude the retrieval time and directly compare the running time of the neural models, as shown in Table 5. We observe that our approach keeps the low latency of the baseline TFM-P employing translation pieces, while our system TFM-PSP achieves better translation performance with sentence- and piece-level positions.

Hyper-parameter Robustness. Finally, we verify the robustness of the hyper-parameter \(\lambda \) across translation tasks; Table 6 shows the search process on the zh-en task. As shown in Table 6, there is a wide range of \(\lambda \) values with only small fluctuations in translation quality. In general, a good value of \(\lambda \) can be found in the range [1.0, 1.3] for the other translation tasks.

In summary, the extensive experimental results show that the proposed approach achieves better translations than [14] with the help of word positions from the TM, especially for source sentences that are very similar to the TM, while keeping the latency low in terms of running time.

5 Related Work

In the SMT paradigm, much research has been devoted to integrating a translation memory into SMT [4, 6, 12]. For example, [4] extracted bilingual segments from a TM that matched the source sentence to be translated and used SMT to decode only the unmatched parts of the source sentence.

Table 5. Running time in terms of seconds/sentence on zh-en task. The average lengths of sentences in Dev and Test are 31.34 and 31.17 words/sentence, respectively.
Table 6. Translation quality (BLEU score) among various values of \(\lambda \) on zh-en task.

Recently, TM-based NMT has witnessed increasing interest. As NMT does not explicitly rely on translation rules as SMT does, many works resort to different approaches. For example, Li et al. and Farajian et al. [2, 5] proposed a fine-tuning approach to train a sentence-wise local neural model on top of a retrieved TM, which was then used to translate that particular sentence. In [3] and [13], the standard NMT model was augmented by additionally encoding a TM for each test sentence, and the proposed global models were trained once for all source sentences. However, both kinds of approaches require intensive computation and considerable memory.

Considering the complexity in computation and memory, a simple and effective method that retrieves translation pieces to guide NMT for narrow domains was proposed in [14]. Their method is effective and simple; however, it only captures local information in a hard manner while ignoring the global information in the TM. Hence, to keep the complexity low and capture both global and local context, in this work we study the distribution of word positions in the translation pieces collected from the TM and employ the word position information as additional rewards to guide NMT decoding.

6 Conclusion

To capture sufficient contextual information in the translation pieces extracted from a translation memory, we have proposed a novel method that integrates sentence- and piece-level positions from the translation memory into neural machine translation. Extensive experimental results on 7 translation tasks demonstrate that the proposed method further improves translation quality on top of integrating translation pieces, especially for source sentences that are very similar to those retrieved from the translation memory. Moreover, the approach keeps the latency and memory consumption low and the system architecture simple.