1 Introduction

Every human language has a vocabulary of thousands of words, built up primarily from several dozen speech sounds. Remarkably, every typically developing child learns essentially the whole system (the mother tongue) just by hearing others use it. Languages other than the mother tongue, however, are generally learnt through a more systematic process. Moreover, in all languages, many words have multiple meanings, and different grammatical structures may express the same meaning [30]. This ambiguity makes translation between a pair of languages immensely difficult, which poses a major challenge in artificial intelligence. The difficulty peaks when the source language is under-explored, i.e., lacks a substantially large parallel corpus [6]. Bengali is an example of such a low-resource source language. Therefore, performing the semantic analysis required to properly recognize a sentence of such a language remains a great challenge.

Machine translation driven by artificial intelligence is progressing rapidly for languages such as English, Spanish, and even Hindi. While translation software and allied technologies have advanced, the primary language of the ubiquitous and all-influential World Wide Web remains English.Footnote 1 Millions of immigrants from non-English-speaking countries face the necessity of learning English every year, since proficiency in the language is essential to enter and ultimately succeed in mainstream English-speaking societies. Such learning succeeds only when it covers reading, writing, speaking, and listening, which in turn underpins translation across a diverse set of applications. However, as in many other non-English-speaking countries, a large group of Bengali-speaking people in Bangladesh and India lacks proficiency in English [33]. The problem has grown over time, as no well-developed Bengali-to-English translator exists to date. Moreover, popular translators perform well for languages with text corpora of millions of sentences, but poorly for low-resource languages such as Bengali. Therefore, an efficient, artificially intelligent system for translating from low-resource languages such as Bengali to English is of noteworthy importance.

In this context, we study machine translation for a low-resource language pair (Bengali to English) by exploring rule-based translation and neural (as well as statistical) machine translation, both in isolation and in combination through different blending approaches. We also discuss the implications of our blending approaches for several other low-resource languages with concrete examples. Our main contributions in this article are as follows:

  • We integrate a rule-based translator with existing NMT (and SMT) using different possible approaches. To do so, we first implement the classical NMT and adopt an existing Bengali-to-English rule-based translator. Next, we blend the rule-based translator and NMT in three different ways to identify the best-possible blending approach. Afterwards, we implement SMT in a similar fashion and blend the rule-based translator with it to verify that finding.

  • Additionally, designing a parallel corpus of Bengali–English sentence pairs for training NMT or SMT is one of the toughest challenges we face, since Bengali is an extremely low-resource language. Hence, we develop three Bengali–English parallel corpora of reasonable sizes, which can broaden future research opportunities in this arena.

  • Finally, we evaluate the rule-based translator, the classical NMT, and their integrated solutions using three standard metrics: BLEU, METEOR, and TER. We present results for the rule-based translator and NMT both in isolation and in combination, and compare all proposed approaches both statistically and graphically. We also report performance scores for SMT and its integrated solutions with the rule-based translator as an extension of our experimental results.

2 Background and related work

Bengali, despite being among the top ten languages worldwide,Footnote 2 lags behind in crucial areas of natural language processing (NLP) research such as parts-of-speech (POS) tagging [36], machine translation, text categorization and contextualization [11], syntax and semantic checking, speech-to-text conversion [12], etc. The most noteworthy previous studies on machine translation include Example-based Machine Translation (EBMT) [33], phrase-based machine translation [9, 15], and the use of syntactic chunks as translation units [17]. However, these studies lack semantic processing of Bengali words. Besides, although significant research exists on English-to-Bengali translation [4, 7, 32, 40], very little work has been done on translating from Bengali to English [29,30,31]. Popular translators such as Google, Bing, and Yahoo Babel Fish often perform very poorly when translating from Bengali to other languages. Google Translator, the most popular among them, currently uses a neural machine translation (NMT) approach based on RNNs [19, 41].

NMT (Bahdanau et al. [3]) has recently emerged as the most promising machine translation technology, exhibiting superior performance on different public benchmarks. It is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional translation systems. In spite of its recent success on standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs such as Bengali–English [20]. Consequently, NMT performs reasonably well when translating among the most popular languages, but it often makes elementary mistakes when translating languages less known to the system, such as Bengali, as shown in Fig. 1. Focusing on rule-based translation for such low-resource languages might be a solution. Moreover, blending NMT (and SMT) with such a rule-based translator is an aspect yet to be investigated.

Fig. 1: Faulty translations of Google Translator (correct translations are appended as reference)

2.1 Corpus-based machine translation

Wu et al. [41] presented GNMT, Google’s Neural Machine Translation system, with the objectives of reducing computational cost in both training and translation inference, and increasing parallelism and robustness in translation. However, this approach relies solely on the availability of a significantly large parallel corpus and makes elementary mistakes while translating low-resource languages [16]. Sennrich et al. [37] proposed an approach for translating low-resource languages by pairing monolingual training data with automatic back-translation and treating it as additional parallel training data. Besides, Gu et al. [10] proposed a new universal machine translation approach focusing on extremely low-resource languages. Furthermore, Artetxe et al. [2] removed the need for parallel data and proposed a novel method to train an NMT system relying on monolingual corpora only while profiting from small parallel corpora. However, this promising approach still falls far behind the performance level of classical NMT. Saha et al. [35] reported an EBMT system for translating news headlines from English to Bengali; however, that work was a specialized methodology for newspaper headlines only.

Gangadharaiah et al. [9] converted CNF parse trees to normal parse trees using a bilingual dictionary, with the objective of generating templates for aligning and extracting phrase pairs for clustering. Kim et al. [17] used syntactic chunks as translation units to properly deal with systematic insertion or deletion of words between two distant languages. However, these approaches also rely on the availability of a significantly large parallel corpus.

2.2 Rule-based and hybrid machine translation

Additionally, several research studies focus on Bengali language processing. For example, Bal et al. [4] proposed a parse-tree-based solution, Naskar et al. [25] handled prepositions, and Dasgupta et al. [7] proposed another parse-tree-based approach. However, these techniques consider the English-to-Bengali direction only, not Bengali-to-English. Rahman et al. [30] explored a statistical approach for Bengali-to-English translation, and Rahman et al. [31] explored a basic rule-based approach for the same. However, these techniques either depend on a large corpus or omit some basic grammatical features. In addition, [1, 26] proposed hybrid translation techniques, which do not offer substantial improvement over other existing techniques.

None of the existing studies focuses on integrating a rule-based translator with a corpus-based machine translator (NMT or SMT) for any language pair. Moreover, an effectively large parallel corpus for Bengali-to-English machine translation is yet to become available. In this article, we focus on integrating a rule-based translator with corpus-based machine translators (NMT and SMT) specifically for Bengali-to-English translation.

3 Architecture of NMT

Earlier phrase-based translation systems accomplished translation by splitting source sentences into several phrases and then translating phrase by phrase. However, these approaches fail to capture the semantics of the whole source sentence before generating a translation and thus fall short in accuracy and fluency. With the advent of NMT, such limitations are significantly curtailed. NMT is an end-to-end learning approach for automated translation with the potential to generate cogent translations by capturing long-range dependencies (e.g., subject-verb agreement, gender agreement, and semantics) in source sentences [21]. NMT models generally differ in their architectures. In this article, we employ the encoder–decoder architecture to generate neural machine translations. In this architecture (Fig. 2), an encoder converts the source sentence into a sequence of numbers representing the meaning of the sentence (a ‘thought’ vector), and a decoder generates a translation from that vector. Usually, both the encoder and the decoder use a Recurrent Neural Network (RNN). RNN models vary in several respects, such as the number of layers (single or multi), directionality (unidirectional or bidirectional), and category (vanilla RNN, long short-term memory (LSTM), or gated recurrent unit (GRU)). We use a multi-layered (2-layer) bidirectional RNN with LSTM cells for both the encoder and the decoder. Figure 3 shows the architecture of the NMT model we employ for performing neural machine translations.

Fig. 2: Neural machine translation (\(English\rightarrow Bengali\)) using encoder–decoder architecture

Fig. 3: Neural machine translation example of a deep recurrent architecture [21]

We discuss the different components of our adopted NMT architecture in relevant detail below.

3.1 Embedding layer

The NMT system is trained with a suitable parallel corpus, as the system must fetch the corresponding word embeddings using source and target embeddings learned during training. To accomplish this, the NMT model first selects a vocabulary of size V for both the source and the target language by considering only the V most frequent words. All remaining words are mapped to an “unknown” (\(< \text{unk}>\)) token with a shared embedding. Our NMT model uses the ‘Word2Vec’ embedding technique with the Skip-Gram model [23].
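As an illustration, the following minimal Python sketch (our own assumption, not the authors' implementation) selects a vocabulary of the V most frequent tokens and maps all remaining tokens to the unknown token; the toy corpus, the value of V, and the helper names build_vocab and replace_rare are hypothetical.

from collections import Counter

def build_vocab(tokenized_sentences, V):
    # Keep only the V most frequent tokens; everything else maps to <unk>.
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {"<unk>", "<s>", "</s>"}
    vocab.update(tok for tok, _ in counts.most_common(V))
    return vocab

def replace_rare(tokenized_sentences, vocab):
    # Replace out-of-vocabulary tokens with the shared <unk> token.
    return [[tok if tok in vocab else "<unk>" for tok in sent]
            for sent in tokenized_sentences]

corpus = [["she", "was", "finishing", "her", "work"],
          ["he", "reads", "a", "rare", "book"]]
vocabulary = build_vocab(corpus, V=5)
print(replace_rare(corpus, vocabulary))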

3.2 Encoder

NMT models can use one or more LSTM layers to implement the encoder. The encoder outputs a fixed-size vector representing the internal portrayal (semantics) of the input sequence, and the number of memory cells in each layer defines the length of this vector. Here, we use a dynamic RNN to allow processing of variable-length sequences. The encoder RNN states are initialized to zero vectors.
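A minimal sketch of such an encoder in TensorFlow/Keras is given below; it mirrors the 2-layer bidirectional LSTM setup described here, but the vocabulary size, embedding dimension, and unit count are illustrative assumptions rather than the exact training graph used in our experiments.

import tensorflow as tf

V_SRC, EMB_DIM, UNITS = 32000, 512, 512

# Variable-length sequences of source token ids (padding is masked).
encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(V_SRC, EMB_DIM, mask_zero=True)(encoder_inputs)
x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(UNITS, return_sequences=True))(x)          # layer 1
x, fh, fc, bh, bc = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(UNITS, return_sequences=True,
                             return_state=True))(x)                     # layer 2
state_h = tf.keras.layers.Concatenate()([fh, bh])   # final hidden state (forward + backward)
state_c = tf.keras.layers.Concatenate()([fc, bc])   # final cell state
encoder = tf.keras.Model(encoder_inputs, [x, state_h, state_c])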

3.3 Decoder

The decoder converts the learned semantics of a source sequence into a target sequence. Like the encoder, the decoder can be implemented with one or more LSTM layers. Since the decoder needs information about the input sequence from the encoder, it is simply initialized with the final hidden state of the encoder. Hence, as shown in Fig. 3, the hidden state at the last input word ‘student’ is transferred to the decoder side. In our model, we use a beam search decoder with a beam width of 10 for translation.

3.4 Projection layer and loss

The projection layer is a dense matrix that turns the top hidden states into logitFootnote 3 vectors of dimension equal to the vocabulary size. Next, the loss between the predicted translation and the reference translation is calculated so that it can be propagated backwards to update the weights of both the encoder and the decoder. Our NMT model uses the cross-entropy loss function for this purpose.
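The following minimal sketch (with illustrative shapes, not our actual configuration) shows how a dense projection layer turns decoder hidden states into vocabulary-sized logits and how the cross-entropy loss is computed from them.

import tensorflow as tf

V_TGT = 32000
projection = tf.keras.layers.Dense(V_TGT)            # top hidden states -> logits over the vocabulary

decoder_hidden = tf.random.normal([8, 20, 512])       # (batch, time, hidden) from the decoder RNN
logits = projection(decoder_hidden)                   # (batch, time, V_TGT)

targets = tf.random.uniform([8, 20], maxval=V_TGT, dtype=tf.int32)   # reference token ids
loss = tf.keras.losses.sparse_categorical_crossentropy(
    targets, logits, from_logits=True)                # per-token cross entropy
loss = tf.reduce_mean(loss)                           # scalar loss to back-propagate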

3.5 Gradient optimization

After calculating the loss, its derivative (i.e., the gradient) is computed to update the weights (controlled by a learning rate) during backward propagation through the neural network. To avoid ‘exploding gradients’,Footnote 4 we clip the gradients by their global norm.

Next, the NMT model chooses an optimizer to control the attributes of its neural network (e.g., learning rate and weights) and thereby diminish the overall loss [34]. Based on the benchmarks achieved in [21], we choose the stochastic gradient descent (SGD) optimizer [34] with a decreasing learning-rate schedule over the widely used Adam optimizer [18].Footnote 5
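A minimal sketch of a training step combining gradient clipping by global norm with an SGD optimizer under a decaying learning-rate schedule is shown below; the clip norm, the decay parameters, and the function names are illustrative assumptions.

import tensorflow as tf

# Decreasing learning-rate schedule (illustrative decay values).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1.0, decay_steps=1000, decay_rate=0.5, staircase=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

def train_step(model, loss_fn, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)   # avoid exploding gradients
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss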

3.6 Inference with attention: generating translations

Once the NMT model has finished training, it can generate translations of source sentences it has not yet seen. This step is termed ‘inference’. Inference differs from training in that the model has access only to the input sentence. During inference, NMT first encodes the input sentence using the encoder (as in training). It then initiates beam search decoding upon receiving the special start symbol (\(<\text{s}>\)), as shown in Fig. 3. At each decoder time step, NMT computes the attention to produce the RNN’s output as a logit vector.

The attention mechanism operates as an interface between the encoder and the decoder, transferring relevant information from the encoder hidden states to the decoder. Computing attention involves several successive steps: derivation of the attention weights (\(a_{ts}\)), calculation of the context vector (\(c_{t}\)), and computation of the final attention vector (\(a_{t}\)), which is then fed into the next time step. We incorporate Luong’s attention mechanismFootnote 6 [22] to compute \(a_{ts}\), \(c_{t}\), and \(a_{t}\) using the following three equations, respectively.

$$\begin{aligned} a_{ts}&= \frac{exp(score(h_{t}, \overline{h_{s}}))}{\sum _{s'=1}^{S} exp(score(h_{t}, \overline{h_{s'}}))} \end{aligned}$$
(1)
$$\begin{aligned} c_{t}&= \sum _{s}^{} a_{ts}\overline{h_{s}} \end{aligned}$$
(2)
$$\begin{aligned} a_{t}&= f(c_{t}, h_{t})= tanh(W_{c}[c_{t}; h_{t}]) \end{aligned}$$
(3)

Here, at time step t, the ‘score’ function [22] compares the decoder hidden state \(h_{t}\) with each encoder hidden state \({\overline{h}}_s\), and the scores are normalized to produce the weights (\(a_{ts}\)). Afterwards, the computed attention vector \(a_t\) is used to obtain the logit vector and the loss through Softmax.
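For illustration, the following NumPy sketch computes Eqs. (1)–(3) using the simple dot-product variant of the ‘score’ function; the shapes, the random inputs, and the weight matrix \(W_{c}\) are illustrative assumptions.

import numpy as np

def luong_attention(h_t, H_s, W_c):
    # h_t: (d,) decoder state; H_s: (S, d) encoder states; W_c: (d, 2d) weight matrix.
    scores = H_s @ h_t                                   # dot-product score(h_t, h_s) for every s
    a_ts = np.exp(scores - scores.max())
    a_ts /= a_ts.sum()                                   # Eq. (1): attention weights
    c_t = a_ts @ H_s                                     # Eq. (2): context vector
    a_t = np.tanh(W_c @ np.concatenate([c_t, h_t]))      # Eq. (3): attention vector
    return a_ts, c_t, a_t

d, S = 4, 6
rng = np.random.default_rng(0)
a_ts, c_t, a_t = luong_attention(rng.normal(size=d),
                                 rng.normal(size=(S, d)),
                                 rng.normal(size=(d, 2 * d)))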

Using the logit vector, our NMT model applies beam search decoding with width 10 (i.e., retaining the 10 highest-scoring candidates based on the logit values) at each decoder time step. The process terminates when the decoder outputs the special end marker (\(< /\text{s}>\)), as shown in Fig. 3. We limit translation lengths by decoding up to twice the source sentence length.

Although NMT models strive to achieve the accuracy of human translators in high-resource settings, they still suffer from several major weaknesses. Three such inherent weaknesses are: (1) inefficacy in handling atypical or unknown words, (2) slow training and inference, and (3) occasional failure to translate every word in the input sentence [6, 19]. Thus, to enhance the translation performance of NMT, particularly in low-resource settings, we investigate the integration of a rule-based approach with NMT in the subsequent sections.

4 Proposed methodology

Our work first adopts a rule-based translator for Bengali-to-English translation. To the best of our knowledge, building a reasonably working rule-based \(Bengali\rightarrow English\) translator has been studied only in [13] and [14]. Next, we explore and implement the classical NMTFootnote 7 [41] as discussed in the previous section. To do so, we collect and build datasets (Bengali–English parallel corpora) of different sizes from different sources. Subsequently, after implementing both the rule-based translator and the classical NMT in isolation, we integrate the two translators using different approaches to investigate the best possible translation performance. We present our proposed mechanisms and algorithms in detail next.

4.1 Blending rule-based translator with corpus-based translators

The scope of a rule-based translator expands as we keep adding more rules. However, it is nearly impossible to implement the unlimited and ever-changing grammatical rules of any language. It is also hard to deal with rule interactions in large systems, grammatical ambiguities,Footnote 8 and idiomatic expressions.Footnote 9 This is where the potential of corpus-based machine translation (NMT and SMT) comes to light. Recently, NMT has emerged as the most popular machine translation technology. However, both NMT and SMT have a major limitation of their own: they struggle to generate accurate translations for low-resource languages, as shown earlier in Fig. 1. Thus, the rule-based translator and the corpus-based translators each exhibit advantages and limitations relative to the other, which motivates our investigation into blending rule-based and corpus-based translators.

To do so, we implement the classical NMT in our system based on an open-source resource\(^{7}\). Then, we integrate a Bengali-to-English rule-based translator [13, 14] with the classical NMT to investigate whether such an integration can achieve better translation performance. To be precise, we explore the blending in three different ways:

  • NMT followed by rule-based translation,

  • Rule-based translation followed by NMT, and

  • Either NMT or rule-based translation

Figure 4 illustrates how we implement the possible blending approaches in our system. Note that we also implement similar approaches using SMT in place of NMT. Besides, we present our blending techniques in Algorithms 1, 2, and 3. We discuss each of these three techniques next.

4.1.1 NMT followed by rule-based translation (NMT+RB)

Classical NMT first requires training with a parallel corpus (sentence pairs of the source and target languages). In our case, we develop and adopt parallel corpora of different sizes containing Bengali–English sentence pairs for training the NMT. After training, we feed the intended input sentences to the NMT and generate the translated output sentences using the classical NMT approach. Our blending approach then applies grammatical rules to the NMT-generated translation to further improve it (Fig. 4a). In our experimentation, we use a deep multi-layer recurrent neural network (RNN) that is bidirectional and uses LSTM as the recurrent unit.

Algorithm 1 shows the skeleton of our blending approaches. Using the token-tagging (parts-of-speech) information from the rule-based translator [13], our blending system substitutes some of the words or phrases in the NMT-generated translation with the corresponding translated words or phrases obtained from the rule-based translator. More specifically, the rule-based translator further refines the skeleton of the translation that NMT has already built, as shown in Algorithm 2.

Algorithm 2 treats the NMT-generated translation as ‘sentence1’ and the rule-based translation as ‘sentence2’ for the ‘NMT followed by rule-based’ blending approach. If our blending system finds a pair of unmatched words (tokens) with the same parts-of-speech tag (PoS_tag) between these two sentences, it replaces the NMT word with the corresponding rule-based word. In this way, our system checks each word of the NMT-generated translation against each word of the rule-based translation for replacement.
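To make the replacement step concrete, the following minimal Python sketch (a simplified assumption, not the authors' exact Algorithm 2) substitutes unmatched NMT tokens with rule-based tokens carrying the same parts-of-speech tag; the tokens reproduce the example of Fig. 5, while the tags and the replacement policy are illustrative.

def blend(sentence1, sentence2, pos_tags1, pos_tags2):
    # sentence1: NMT output tokens; sentence2: rule-based output tokens.
    # Replace an unmatched sentence1 token with a sentence2 token of the same POS tag.
    out = list(sentence1)
    used = set()
    for i, (w1, t1) in enumerate(zip(sentence1, pos_tags1)):
        for j, (w2, t2) in enumerate(zip(sentence2, pos_tags2)):
            if j not in used and w1 != w2 and t1 == t2 and w2 not in sentence1:
                out[i] = w2          # take the rule-based word for this slot
                used.add(j)
                break
    return out

nmt  = ["Ishii", "had", "finish", "his", "work"]
rule = ["Oishee", "was", "finishing", "her", "work"]
tags_nmt  = ["NNP", "VBD", "VB", "PRP$", "NN"]   # illustrative POS tags
tags_rule = ["NNP", "VBD", "VB", "PRP$", "NN"]
print(blend(nmt, rule, tags_nmt, tags_rule))      # ['Oishee', 'was', 'finishing', 'her', 'work']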

Fig. 4: Different blending techniques between rule-based translation and NMT


Figure 5 shows an example of how this blending technique works. Apart from generating the NMT translation of the source sentence, we also generate its rule-based translation. Our blending system then matches the two translations word by word using the parts-of-speech tagging information of the rule-based translator.

Fig. 5: An example to demonstrate NMT followed by rule-based translation

In the figure, the input Bengali sentence is pronounced “Oisheeo tar kajti shesh kortechilo”, and its reference translation is “Oishee also was finishing her work”.Footnote 10 Here, NMT translates the Bengali name “Oishee” to “Ishii”, which is tagged as a noun. However, “Oishee” gets the same PoS_tag in the rule-based translation. Therefore, this blending technique first replaces “Ishii” with “Oishee” in the final translation. Similarly, it also replaces “had”, “finish”, and “his” with “was”, “finishing”, and “her”, respectively, keeping the words in other positions intact. All of these substitutions contribute to improving translation performance.

This technique proves to be the best blending technique (as evaluated later) because it takes the skeleton of the translation from NMT and the token-level attributes (person, number, tense, etc.) from the rule-based translation. These two forms of information best match the respective strengths of the two translation approaches.

4.1.2 Rule-based translation followed by NMT (RB+NMT)

Our next blending technique reverses the sequence of the previous one: the rule-based translation is modified by NMT, as shown in Fig. 4b. As in the earlier case, Algorithm 2 also describes this blending technique; this time, our system treats the rule-based translation as ‘sentence1’ and the NMT-generated translation as ‘sentence2’.


The major limitation of this technique stems from the fact that NMT can generate completely wrong words during translation, since it always predicts the next word in the sequence based on its training data. The rule-based translator, on the other hand, cannot pick wrong words at this level, since it only looks up the vocabulary for a particular word and emits the translated word if found. Therefore, if the blending system modifies the rule-based translation with NMT output, translation performance can degrade in many cases. The only favourable scenario for this approach is when the rule-based translator cannot recognize the source sentence due to a lack of appropriate rules, which leaves some space for NMT to contribute plausibly.

Figure 6 presents an example of how this technique translates the same source sentence as in Fig. 5. Initially, the two unmatched words, “Oishee” in the rule-based translation and “Ishii” in the NMT translation, hold the same PoS_tag (noun). Therefore, this blending approach first replaces “Oishee” with “Ishii”. It then also replaces “was”, “finishing”, and “her” with “had”, “finish”, and “his”, respectively, as shown in Fig. 6. Unfortunately, in this example all of the replacements are incorrect, thus degrading the translation performance.

Fig. 6: An example to demonstrate rule-based translation followed by NMT

4.1.3 Either NMT or rule-based translator (NMT or RB)

This blending technique is much simpler than the earlier ones. It simply chooses one of the two translations generated separately by the rule-based translator and NMT, as shown in Fig. 4c. However, the blending system needs to make the choice based on some criterion so that it picks the better of the two.

We find that the rule-based translator performs better for shorter sentences (of no more than 6 words); a quantitative analysis supporting this observation is presented later in Sect. 7.1.4. Therefore, this blending approach chooses the rule-based translation if the source sentence is shorter than 7 words, and the NMT-generated translation otherwise. We present this blending approach in Algorithm 3.
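A minimal sketch of this selection rule, under the stated 6-word threshold, is given below; the function name and the handling of sentences the rule-based translator cannot recognize are our own illustrative assumptions.

MAX_RULE_BASED_LENGTH = 6   # rule-based translation is preferred for sentences of up to 6 words

def select_translation(source_tokens, rule_based_translation, nmt_translation):
    # Short sentence and a usable rule-based output: trust the rule-based translator.
    if len(source_tokens) <= MAX_RULE_BASED_LENGTH and rule_based_translation is not None:
        return rule_based_translation
    # Long sentence (or unrecognized by the rules): trust NMT.
    return nmt_translation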


Figure 7 shows a working example of this blending technique. In the figure, the source sentence (from Fig. 5) is identified as a small sentence with only five words. Since our blending system considers sentences of fewer than 7 words to be small, it selects the translation generated by the rule-based translator as the final translation and ignores NMT this time. Note that the selection criterion (sentence type) can be updated according to the scope of the rule-based translator: the more rules we add, the more types of sentences (of different lengths) we can translate with the rule-based translator. Therefore, the selection criterion can be made more flexible and refined depending on performance analysis after incorporating more rules.

Fig. 7: An example of choosing either NMT or rule-based translation

5 Experimental settings

We perform a rigorous performance evaluation of our different approaches on the basis of different types of metrics. Such experimentation is resource-hungry and time-consuming in general, so we employ considerable resources for it. To perform our NMT experiments, we use the Python language, the PyCharm IDE, the TensorFlow library, and a 64-bit Linux operating system. Here, we consider an encoder–decoder model with a deep multi-layer RNN that uses LSTM as the recurrent unit.

To start the experimentation, we design datasets for training NMT and testing its performance. We use the following hyper-parameters for training NMT on our datasets: (1) 2-layer LSTMs of 512-dim hidden units with a bidirectional encoder, (2) 12k-100k training steps, (3) a 20% dropout rate, (4) Luong’s attention (scale=True), (5) an embedding dimension of 512, and (6) the SGD optimizer with an initial learning rate of 1.00.Footnote 11 Besides, we use sigmoid (\(\sigma \)) and hyperbolic tangent (Tanh) as the activation functions [8] in the LSTM cells,Footnote 12 and apply softmax in the output layer. We choose these hyper-parameters and activation functions based on the benchmarks achieved for English-Vietnamese and German-English translation in [21]. Moreover, after experimenting with multifarious parameter settings, we find the best results for \(Bengali\rightarrow English\) neural machine translation using this setup.

Afterwards, to experiment with SMT, we use the Moses toolkit (with GIZA, SRILM, and IRSTLM)Footnote 13 written in C, C++, and Perl. Besides, we use the Java language, the NetBeans IDE, an SQLite database, and OpenNLP tools both to implement the rule-based translation system and to perform the blending between rule-based and corpus-based translators.

6 Datasets and evaluation metrics

Designing and developing datasets has been one of the most challenging and time-intensive tasks in our experimentation. To train the NMT reasonably, we require a large parallel corpus covering both the source and the target language; in our case, a corpus of Bengali–English sentence pairs. However, we find very few sources available for constructing a reasonably sized dataset of such pairs. Hence, one of the major contributions of this study is building three novel parallel corpora of Bengali–English sentence pairs, which will enhance future research opportunities for the Bengali language.

6.1 Demography of datasets

We develop our Bengali–English parallel corpus from well-established content such as the Al-Quran,Footnote 14 newspapers,Footnote 15 movie subtitles,Footnote 16 and university websites.Footnote 17 Besides, we translate individual example Bengali sentences into English and add them to the dataset. In fact, we create most of the corpus ourselves by translating Bengali sentences to English one by one. Figure 8a illustrates the demography of our full dataset.

Fig. 8: Demography of our datasets

Initially, we experiment with only the literature-based portion (Al-Quran) of our full dataset, since it is large enough to be considered a separate dataset relative to the size of the full dataset. Afterwards, we also experiment with the full dataset to generate results from a fairly diversified dataset. The full dataset therefore also includes another dataset (the custom dataset) as a subset, apart from the literature-based dataset. We do not use this custom dataset independently in our experimentation, since it is too small to train an NMT system reasonably. We present the demography of our custom dataset (a subset of the full dataset) in Fig. 8b. Besides, both the Bengali and the English sentences in our full dataset vary in length; Fig. 9 shows the percentages (%) of sentences of different sizes or lengths in the full dataset.

Fig. 9: Percentages of sizes of sentences in the full dataset

There is another dataset containing more than 1 million Bengali–English parallel sentences, primarily collected from a website.Footnote 18 Similar to our full dataset, we show the percentages (%) of sentences of different sizes or lengths in this dataset in Fig. 10. However, the sentences in this dataset contain numerous unknown characters and words (even from other languages such as Arabic, Chinese, and German), which need to be cleaned before use in experimentation; we carefully remove such unknown characters. Besides, some English sentences in this dataset are not proper translations of the corresponding Bengali sentences. Therefore, this dataset requires further rigorous manual checking and translation alignment for each sentence pair, which we leave as immediate future work.

Fig. 10: Percentages of sizes of sentences in the GlobalVoices dataset

Table 1 summarizes the different datasets. For experimentation, each dataset is split into training (around 80%), development, and test data by choosing sentences randomly, where the training and test data are mutually exclusive and the development and test data are independent of each other. Besides, all sentences are tokenized and segmented into subword symbols using byte-pair encoding (BPE) [38] with 32,000 merge operations.
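The random split can be sketched as follows; the exact development and test shares and the fixed random seed are illustrative assumptions, as the article fixes only the training share at around 80%.

import random

def split_corpus(sentence_pairs, train_ratio=0.8, dev_ratio=0.1, seed=42):
    # Shuffle once with a fixed seed so that train, dev, and test stay mutually exclusive.
    pairs = list(sentence_pairs)
    random.Random(seed).shuffle(pairs)
    n_train = int(train_ratio * len(pairs))
    n_dev = int(dev_ratio * len(pairs))
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test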

Table 1 Summary of the different datasets

6.2 Representativeness in our datasets

We analyze the representativeness of our datasets using Zipf’s law [28]. Zipf’s law pertains to the frequency distribution of words in a language (or in a dataset of the language that is large enough to be representative of it). To check whether Zipf’s law holds in our dataset, we compute freq(r), which involves computing the frequency and rank r of each word, and then verify whether \(r \times freq(r)\) remains approximately constant in all cases. The simplest way to demonstrate this is to plot the computed values and check whether the slope is proportionately downward; instead of plotting freq(r) against rank, it is preferable to plot log(r) on the X axis against log(freq(r)) on the Y axis. Accordingly, we plot the computed values for the Bengali corpus and the English corpus separately in two graphs.
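The computation behind these plots can be sketched as follows; the corpus file name and the use of matplotlib are illustrative assumptions.

from collections import Counter
import math
import matplotlib.pyplot as plt

def zipf_points(tokens):
    # Sort word frequencies in decreasing order so that index r gives freq(r).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = range(1, len(freqs) + 1)
    products = [r * f for r, f in zip(ranks, freqs)]   # r * freq(r) should be roughly constant
    return ranks, freqs, products

tokens = open("corpus.en", encoding="utf-8").read().split()   # hypothetical corpus file
ranks, freqs, _ = zipf_points(tokens)
plt.plot([math.log(r) for r in ranks], [math.log(f) for f in freqs])
plt.xlabel("log(rank)")
plt.ylabel("log(freq)")
plt.show()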

Fig. 11: Representativeness in our literature-based dataset according to Zipf’s law

We present the graphs for our first dataset (the literature-based dataset) in Fig. 11a, b, respectively. Figure 11a shows that our Bengali corpus exhibits a slight deviation from Zipf’s law, whereas our English corpus follows Zipf’s law closely. Similarly, we present the graphs for our second dataset (the full dataset) in Fig. 12a, b, respectively. Here, Fig. 12a shows that the Bengali corpus of the full dataset deviates less from Zipf’s law than the Bengali corpus of the literature-based dataset, owing to the combination of the literature-based dataset with the custom dataset.

Fig. 12: Representativeness in our full dataset according to Zipf’s law

6.3 Performance evaluation metrics

For the performance evaluation of our system, we adopt three metrics widely used for evaluating machine translation: BLEU [27], METEOR [5], and TER [39]. Based on these metrics, we evaluate the performance of our proposed blending approaches through rigorous experimentation, and we present the results and findings using different datasets next. We report case-sensitive BLEU scores in all cases.
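As an illustration of corpus-level scoring, the following minimal sketch computes BLEU with NLTK; the example sentences are illustrative, and METEOR and TER would be computed with their own standard tools.

from nltk.translate.bleu_score import corpus_bleu

# One list of reference translations per hypothesis (here, a single reference each).
references = [[["oishee", "also", "was", "finishing", "her", "work"]]]
hypotheses = [["oishee", "was", "finishing", "her", "work"]]

score = corpus_bleu(references, hypotheses)   # default: 4-gram BLEU with uniform weights
print(round(100 * score, 2))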

7 Experimental results from our different blending approaches

After implementing both the rule-based translator and the classical NMT, we blend the two approaches using the three techniques discussed earlier (Sect. 4.1). We analyze the performance of each approach with the three standard metrics, BLEU, METEOR, and TER, presented above, over three datasets of different sizes. As already mentioned, we adopt a literature-based dataset (from the Al-Quran) and create another dataset from different non-literature sources. The latter, our custom dataset, is relatively small (around 3,500 parallel sentence pairs), which is too small to train an NMT system reasonably. Therefore, we combine it with our literature-based dataset to form another dataset (the full dataset) for experimentation. Finally, we also show results in a high-resource context using our GlobalVoices dataset.

7.1 Results using literature-based dataset

First, we present the results (performance-metric scores) obtained from translation over our literature-based dataset in Table 2. We use around 12,000 training steps for this dataset, and the average length of the test sentences is 10 words. Table 2 compares all the approaches (in isolation and in combination) using the standard performance metrics; the higher the METEOR and BLEU scores, and the lower the TER score, the better the performance. From Table 2, we notice that the ‘NMT followed by rule-based’ (NMT+rule-based) blending technique exhibits significant improvement over the classical NMT. More specifically, it emerges as the best blending technique, which is reflected in the performance scores of all three metrics. Therefore, our blending approaches can significantly improve the performance of NMT-generated translations, and the best way of blending appears to be applying grammatical rules after translating with NMT.

The main reason behind this finding is that NMT discerns the basic skeleton of the translation first. However, NMT often outputs sequences of incorrect words based on prediction. This is where the strength of a rule-based translator lies: it either translates words (or phrases) from its vocabulary or does not translate at all. In short, a rule-based translator generally does not output incorrect word-by-word translations, since it does not predict. As a result, as discussed in Sect. 4.1.1, modifying the NMT output with the rule-based translation replaces incorrect words (output by NMT) with correct words (output by the rule-based translator) in most cases, which leads to the improved performance scores presented in Table 2.

Table 2 Comparison among different translation approaches

Another blending technique, ‘either NMT or rule-based’ (NMT or rule-based), also shows a slight improvement over the classical NMT. We anticipate this, since this technique carefully chooses the better of the rule-based and NMT translations based on the types (lengths) of sentences. However, performance scores decline for the ‘rule-based followed by NMT’ (rule-based+NMT) blending approach, which points to the inability of NMT to further improve translations produced by the rule-based translator, as this technique tends to replace correct words with incorrect ones (i.e., the opposite of the ‘NMT+rule-based’ approach). Moreover, Table 2 shows that the performance of both NMT and the rule-based translator in isolation is worse than that of two of our blending approaches (‘NMT+rule-based’ and ‘NMT or rule-based’), which justifies exploring blending between these two translators. Specifically, note that the performance of the \(Bengali\rightarrow English\) rule-based translator in isolation is quite poor (BLEU = 1.28), as its implemented rules are not sufficient to plausibly recognize and translate the test sentences in our corpus, which also significantly limits the improvement achieved by our ‘NMT+rule-based’ blending technique.

Table 3 takes a closer look at the BLEU scores of all the approaches for different n-grams (n = 1, 2, 3, and 4). Conventionally, the BLEU score is reported for the n-gram model with \(n=4\). In all cases, including \(n=4\), the ‘NMT followed by rule-based’ blending outperforms all other alternatives. Besides, the n-gram scores decrease from \(n=1\) to \(n=4\), as larger chunks of contiguous words need to be matched (i.e., from matching one word to matching a sequence of four words).

Table 3 Comparison as per BLEU scores

Next, we graphically compare the classical NMT with the other approaches, portraying individual performance scores for several randomly chosen test sentences. We show comparisons using METEOR and TER scores, where light red lines indicate the NMT score and deep blue lines indicate each of the other approaches, one at a time. We adopt NMT as the benchmark (baseline) approach in all the graphs, as it is the approach commonly adopted by the widely used Google Translator. Note that we do not show any sentence-level comparison in terms of BLEU, since BLEU is generally calculated over the entire test corpus and its characteristics are similar to METEOR’s.

7.1.1 Comparison between NMT and only rule-based approach

Firstly, Figs. 13 and 14 compare NMT with the rule-based approach alone (deep blue lines) in terms of METEOR and TER scores, respectively. In Fig. 13, the red lines (NMT) exceed the blue lines (rule-based) in most cases for METEOR scores, and vice versa in Fig. 14 for TER scores. Thus, these two figures reflect that the overall performance of the rule-based approach alone is worse than that of NMT in isolation for the literature-based dataset.

Fig. 13: NMT versus only rule-based METEOR score

Fig. 14: NMT versus only rule-based TER score

7.1.2 Comparison between NMT and ‘NMT followed by rule-based’ approach

Next, we show the performance of one of our blending techniques, ‘NMT followed by rule-based’ (NMT+rule-based), in terms of METEOR and TER scores in Figs. 15 and 16, respectively. Here, the deep blue lines indicate the scores obtained using this blending approach. In Fig. 15, the blue lines (NMT+rule-based) exceed the red lines (NMT) in most cases for METEOR scores, and vice versa in Fig. 16 for TER scores. Thus, these figures show a significant improvement over the classical NMT, reflecting the performance of our best blending technique in terms of METEOR and TER scores.

Fig. 15: NMT versus NMT followed by rule-based METEOR score

Fig. 16: NMT versus NMT followed by rule-based TER score

7.1.3 Comparison between NMT and ‘rule-based followed by NMT’ approach

We then present the results of ‘rule-based followed by NMT’ (rule-based+NMT) in terms of METEOR and TER scores in Figs. 17 and 18, respectively, where the deep blue lines indicate the scores obtained using this blending approach. In Fig. 17, the red lines (NMT) exceed the blue lines (rule-based+NMT) in most cases for METEOR scores, and vice versa in Fig. 18 for TER scores. Hence, these figures show that the ‘rule-based followed by NMT’ approach performs poorly compared to the classical NMT; in fact, this blending technique proves to be the worst performer among all the approaches.

Fig. 17: NMT versus rule-based followed by NMT METEOR score

Fig. 18: NMT versus rule-based followed by NMT TER score

7.1.4 Comparison between NMT and ‘either NMT or rule-based’ approach

Finally, we present the results of the ‘either NMT or rule-based’ blending technique in Figs. 19 and 20, where the deep blue lines indicate the scores obtained using this approach. In both figures, the red lines (NMT) are at the same level as the blue lines (NMT or rule-based) in most cases for METEOR and TER scores. Thus, this approach performs on par with the classical NMT. The main reason is that most test sentences in this dataset are lengthy (more than 6 words). We arrive at this definition of a lengthy sentence through an experimental analysis of the rule-based translator over test datasets aggregated by length category, as shown in Table 4.

Fig. 19: NMT versus NMT or rule-based METEOR score

Fig. 20: NMT versus NMT or rule-based TER score

Table 4 Performance of rule-based translator with different number of implemented rules over different length categories of test sentences in terms of BLEU score

Table 4 shows that the performance of the rule-based translator drops drastically (e.g., from BLEU = 35.29 to BLEU = 2.69 with 120 implemented rules) once the average sentence length exceeds 6 words, which justifies our criterion for defining a lengthy sentence in this blending approach. Since this approach chooses the NMT-generated translation for long sentences, it mostly selects NMT translations on this dataset. Nevertheless, it performs at least as well as the classical NMT.

To recapitulate, the light red lines exceed the deep blue lines for most sentences in Figs. 13 and 17; that is, both the rule-based approach alone and the ‘rule-based followed by NMT’ approach perform worse than NMT in isolation. In contrast, the deep blue lines exceed the light red lines in Fig. 15 for almost all sentences, which reflects the clear advantage of our ‘NMT followed by rule-based’ approach over NMT in isolation. In addition, the light red and deep blue lines are mostly at the same level in Fig. 19, reflecting the on-par performance of our ‘either NMT or rule-based’ approach, as discussed above.

7.1.5 Analysis of the sensitivity of our operational parameter

The performance of our adopted rule-based translator changes as we add more rules. However, adding rules is a seemingly never-ending process. Therefore, we analyze how implementing different numbers of rules impacts the performance scores of our different approaches.

As shown in Fig. 21, the BLEU score increases as the number of implemented rules in our system increases. In this figure, we show the performance of three approaches with respect to the number of added rules: the rule-based approach alone, NMT, and the ‘NMT followed by rule-based’ approach. The curves of the rule-based approach and the ‘NMT followed by rule-based’ approach show a gradual increase (initially sharp) in BLEU score as the number of implemented rules grows, and they tend to flatten after around 90–100 rules. The shape depends on the order in which rules are added: we implement the more basic and important grammatical rules, such as basic sentence structures, verb identification, and tenses, first. That is why the curve rises sharply over the first 3–10 implemented rules and then rises consistently until 70–80 rules are added. In our system, the most important rules, those that improve translation performance significantly, are added within roughly the first 50 rules. Afterwards, additional rules barely change the performance score, since rules such as the detection of the subject’s gender, punctuation, etc., contribute less than the previously added (first 50–60) rules.

Fig. 21: Variation of BLEU scores with an increase in the number of implemented rules

In contrast, the curve for NMT remains flat (parallel to the X axis), since the performance of NMT does not depend on the number of implemented rules. Moreover, although the curve of the ‘NMT followed by rule-based’ approach behaves similarly to that of the rule-based approach, it does not derive from the rule-based curve by any mathematical formula. Rather, when the translation generated by the rule-based approach improves, our blending (‘NMT followed by rule-based’) also improves to some extent, since our system blends the improved rule-based translation with the translation already generated by NMT. This is why the two curves look similar.

Similarly, we illustrate the variation of METEOR scores with respect to the number of added rules. Figure 22 presents the results for the rule-based approach alone, NMT, and the ‘NMT followed by rule-based’ approach in terms of METEOR score.

Fig. 22: Variation of METEOR scores with an increase in the number of implemented rules

The trends in the METEOR curves of these three approaches are similar to those just presented for BLEU. Here, we discuss the variation of the ‘NMT followed by rule-based’ approach in particular, as it leads all other approaches. Note that its curve does not start from a zero score, since NMT already sets a positive score that our blending system further increases by applying rules.

Next, we show the variation of TER scores with an increasing number of rules in Fig. 23 for the rule-based approach alone, NMT, and the ‘NMT followed by rule-based’ approach. Expectedly, apart from the NMT curve, the behaviour of the remaining two TER curves is exactly opposite to that of the previous curves, since TER scores decrease as the number of rules increases. Here, a lower score means better performance, as TER is essentially an error rate. As in the previous cases, the curves become almost flat after the first 70 rules.

Fig. 23: Variation of TER scores with an increase in the number of implemented rules

Finally, we present a combined graph (Fig. 24) containing the normalized values of all the metrics for both the rule-based approach and the ‘NMT followed by rule-based’ approach. For each metric, we normalize the values with respect to the maximum value we obtained. The combined presentation of all the normalized values in Fig. 24 demonstrates the efficacy of our proposed best blending approach, as it improves the performance scores in all cases.

Fig. 24: Comparison of normalized performance scores with an increase in the number of implemented rules for the literature-based dataset

This concludes the experimentation on performance scores using our literature-based dataset. However, we also perform similar experimentation using another dataset (the full dataset), since scores obtained from only one dataset may not be enough to draw a convincing conclusion on translation performance.

7.2 Results using full dataset

Next, we perform experimentation using our combined (literature-based and custom), i.e., full, dataset. We again use around 12,000 training steps for this dataset, and the average length of the test sentences is 15 words. Table 5 summarizes the results obtained using this dataset.

Table 5 Comparison among different translation approaches for full (combined) dataset

Table 5 strongly supports the results obtained earlier (Table 2) using our literature-based dataset, with similar reasoning as discussed in Sect. 7.1. Here, the ‘NMT followed by rule-based’ blending approach again outperforms all other approaches, while the ‘either NMT or rule-based’ approach remains our second-best approach.

Afterwards, as for the literature-based dataset, we present another combined graph (Fig. 25) containing the normalized values of all the metrics for both the rule-based approach and the ‘NMT followed by rule-based’ approach using our full dataset. Figure 25 shows that the graph for the full dataset behaves similarly to that of the previous (literature-based) dataset. We have thus cross-checked, using our combined dataset, the observations on the performance scores of the different approaches discussed earlier for the literature-based dataset.

Fig. 25: Comparison of normalized performance scores with an increase in the number of implemented rules for the full dataset

8 Overall experimental findings

At this stage, we present our overall experimental findings in terms of the average percentage (%) improvement of our different blending approaches on the metrics BLEU, METEOR, and TER in Tables 6, 7, 8, and 9. These results are based on the performance scores presented in Table 2 (discussed in Sect. 7.1) and Table 5 (discussed in Sect. 7.2) for the literature-based dataset and the full dataset, respectively. Tables 6 and 7 report the average percentage (%) improvement with respect to NMT for the literature-based dataset and the full dataset, respectively; i.e., these improvements keep NMT as the baseline.

Table 6 Overall percentage (%) improvement over different parameters with respect to NMT for literature-based dataset
Table 7 Overall percentage (%) improvement over different parameters with respect to NMT for full dataset
Table 8 Overall percentage (%) improvement over different parameters with respect to rule-based approach for literature-based dataset

Similarly, Tables 8 and 9 report the average percentage (%) improvement with respect to the rule-based approach alone for the literature-based dataset and the full dataset, respectively. Here, the % improvements of our different blending approaches with respect to the rule-based approach are much higher than those with respect to NMT, because our adopted NMT model performs substantially better than the \(Bengali\rightarrow English\) rule-based translator in isolation.

Table 9 Overall percentage (%) improvement over different parameters with respect to rule-based approach for full dataset

9 Extension of our experimental results with statistical machine translation

Machine translation has been practised for a long time in different forms, such as Example-based Machine Translation, Phrase-based Machine Translation, Statistical Machine Translation (SMT), and Neural Machine Translation (NMT). NMT is the most recent machine translation technology and outperforms the other translation approaches. This is why we keep NMT as our prime focus and adopt it in our system. Furthermore, to justify the efficacy of our blending approaches, we extend our experimentation to another popular corpus-based machine translation technology, SMT, which the popular Google Translator used just before adopting NMT only a few years ago.

Besides, Mumin et al. [24] reported a phrase-based statistical machine translation system between English and Bengali in both directions, claiming a promising BLEU score of 17.43 for Bengali-to-English translation. We therefore adopt their baseline SMT system and follow their mechanism to investigate the performance of SMT using our dataset. To do so, we first set up the popular SMT toolkit Moses\(^{13}\) and configure the system following their configuration process. Next, we train the SMT system with our full (literature-based and custom) dataset. Finally, we evaluate the performance of SMT using our dataset.

We achieve a BLEU score of 12.31 with the baseline SMT. In addition, we investigate the performance of our different blending approaches, this time blending our rule-based translator with SMT. We present the performance scores of the different approaches in Table 10.

Table 10 Comparison among different translation approaches considering SMT as baseline system

Table 10 shows that our ‘SMT (or NMT) followed by rule-based’ approach still remains the best translation approach. Interestingly, the ‘SMT followed by rule-based’ approach (BLEU = 16.43) performs even better than the ‘NMT followed by rule-based’ approach (BLEU = 12.26), since in this case SMT (BLEU = 12.31) outperforms NMT (BLEU = 9.28) in isolation. This happens because our dataset is not large enough to train an NMT system efficiently, and SMT takes advantage of this to outperform NMT on this dataset. Besides, our best approach (BLEU = 16.43) lags behind the approach of [24] (BLEU = 17.43) in terms of overall performance score because of our insufficient training data; they trained their system with a much larger dataset than our current (full) dataset, and their training dataset is not publicly available.

Nonetheless, the baseline SMT scores 16.91 using their dataset [24], whereas it scores 12.31 using our full dataset. Their approachFootnote 19 achieves a BLEU score of 17.43 over the baseline SMT score of 16.91, an improvement of about 3% over baseline SMT. In contrast, our best translation (blending) approach achieves a BLEU score of 16.43 over an SMT score of 12.31, an improvement of about 34% over baseline SMT. Therefore, we expect to achieve a higher BLEU score once we can plausibly match their dataset (discussed in the next section). Furthermore, this extended experimentation leads to an important finding: any corpus-based machine translation (NMT or SMT) can be significantly improved by blending it with rule-based translation.

10 Extending our study to a high-resource context

The performance of corpus-based translators (NMT or SMT) largely depends on the availability of a significant amount of training data. However, the largest dataset used in our experimentation so far consists of up to 11,500 parallel Bengali–English sentences. Only 11,500 sentences may not really satisfy the need for a significant amount of training data for an NMT system, mimicking the context of an extremely low-resource language.

We are yet to show what would happen if we took our approach to a high-resource context. Therefore, we extend our study to such a context by developing a larger Bengali–English parallel corpus containing more than one million sentence pairs\(^{18}\). Here, the average length of sentences in the test dataset is 11 words. We summarize the performance scores of our different approaches on this dataset in Table 11 for NMT-based approaches and in Table 12 for SMT-based approaches. The results in Tables 11 and 12 clearly establish that our ‘NMT/SMT followed by rule-based’ approach performs best among all alternative approaches. We also notice that the BLEU scores of both corpus-based translators increase in the high-resource context compared to their scores on our literature-based and full datasets, owing to the larger training data. This boosts the performance scores of our blending approaches, achieving a new benchmark (BLEU = 18.73) for \(Bengali\rightarrow English\) translation.

Table 11 Comparison among different NMT-based translation approaches for a high-resource context
Table 12 Comparison among different SMT-based translation approaches for a high-resource context

Next, Fig. 26 presents the improvement in performance scores of all the approaches with respect to an increase in dataset size, along with a comparison between NMT and the ‘NMT followed by rule-based’ approach in terms of BLEU scores across our different datasets. The figure shows that performance improves as the dataset grows. Note that we obtain the best performance score (BLEU = 18.73) after extending our experimentation to the high-resource context (one million sentence pairs), which is substantially higher than our previous best score (BLEU = 12.26) for the low-resource context (11,500 sentence pairs).

Fig. 26 Comparison between NMT and ‘NMT followed by rule-based’ approach in terms of BLEU scores with different datasets

Afterwards, Fig. 27 presents the improvement in performance scores of all the approaches with respect to an increase in the number of training steps in the high-resource context. The figure shows that performance improves as we increase the number of training steps (up to 100k) for the corpus-based translators. We therefore train for up to 100,000 steps on this dataset, where our best approach achieves its maximum score. Beyond 100k steps, NMT tends to underperform on this dataset, e.g., BLEU = 18.20 at 110k steps and BLEU = 17.14 at 120k steps; a simple checkpoint-selection sketch reflecting this observation follows Fig. 27.

Fig. 27 Comparison between NMT and ‘NMT followed by rule-based’ approach in terms of BLEU scores with respect to an increase in the number of training steps
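One way to operationalise this observation is to checkpoint the model periodically and keep the checkpoint with the highest BLEU on a held-out set. The sketch below illustrates the idea with entirely hypothetical scores; it is not the exact procedure followed in our experiments.

```python
def best_checkpoint(bleu_by_step: dict) -> int:
    """Return the training step whose checkpoint achieves the highest BLEU on a held-out set."""
    return max(bleu_by_step, key=bleu_by_step.get)

# Entirely hypothetical held-out scores, following the trend described above:
dev_bleu = {90_000: 17.9, 100_000: 18.4, 110_000: 18.1, 120_000: 17.2}
print(best_checkpoint(dev_bleu))  # 100000
```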

11 Comparative analysis with benchmark translation approaches for low-resource languages

Finally, we show a comparative analysis of our proposed blending approaches with several recent works on low-resource language translation (Table 13). Sennrich et al. [37] proposed an approach for translating in a low-resource setting (\(Turkish\rightarrow English\)) by pairing monolingual training data with automatic back-translation (BT) and treating the result as additional parallel training data. They generated synthetic source (Turkish) sentences by back-translating target (English)Footnote 20 sentences, producing a synthetic parallel training set (\(Gigaword_{synth}\)), which they coupled with the available parallel corpus of only 320k Turkish–English sentence pairs (mimicking a low-resource setting) to improve over state-of-the-art approaches for \(Turkish\rightarrow English\) translation. Therefore, we follow their training details (for \(Turkish\rightarrow English\) translation) to adopt and test their best-performing approach (training with \(Gigaword_{synth}\)) for \(Bengali\rightarrow English\) translation over our different datasets; the core data-augmentation step is sketched below.
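The following minimal sketch illustrates the back-translation idea; parallel, monolingual_target, and reverse_translate are illustrative names, with reverse_translate standing in for a target-to-source model (e.g., English to Bengali) trained on the available parallel data.

```python
from typing import Callable, List, Tuple

def build_backtranslated_corpus(
    parallel: List[Tuple[str, str]],          # genuine (source, target) sentence pairs
    monolingual_target: List[str],            # additional target-side (e.g., English) sentences
    reverse_translate: Callable[[str], str],  # target-to-source model trained on `parallel`
) -> List[Tuple[str, str]]:
    """Pair every monolingual target sentence with a synthetic, back-translated source
    sentence, then merge the synthetic pairs with the genuine parallel data."""
    synthetic = [(reverse_translate(tgt), tgt) for tgt in monolingual_target]
    return parallel + synthetic

# The forward (source-to-target) model is then retrained on the merged corpus.
```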

In addition, Table 13 summarizes comparisons of our approaches with the NMT-based approach of Wu et al. [41] and the SMT-based approaches (baseline SMT and shu-torjoma) of Mumin et al. [24]. Here, we report all results in terms of BLEU scores only. Table 13 shows that our best blending approach, ‘NMT (or SMT) followed by rule-based’, outperforms all other approaches in the \(Bengali\rightarrow English\) translation context. We also note that NMT overtakes SMT this time, mainly because of the larger training data. However, this training data contains numerous misaligned Bengali–English translation pairs, which reduces the translation performance of both NMT and SMT, with SMT suffering more than NMT.

Furthermore, we strongly anticipate that the BLEU score for \(Bengali\rightarrow English\) translation can be further improved by blending these corpus-based machine translation approaches (presented in [37] and [24]) with a rule-based translator, as we believe that any corpus-based translation generated by a machine can be significantly improved after blending with rule-based translation. For now, we leave this investigation as future work.

Table 13 Comparison among our proposed blending approaches and four state-of-the-art approaches in terms of BLEU scores

12 Discussions

Although we consider Bengali as the pilot source language for translation into English in this article, the proposed blending approaches are equally applicable to any other language, given an appropriate rule-based translator (for rule-based translation) and a plausible parallel corpus comprising that language (for corpus-based translation). Like Bengali, languages such as Hindi, Arabic, and Nepali are grammatically rich and complex low-resource languages. However, these languages differ from each other in terms of inflections, accentuation marks, etc., which makes their interpretation and translation fairly distinct. Hence, the translation of each language primarily depends on either the availability of good-quality training data or the development of an appropriate rule-based translator that encompasses such language-specific features. Irrespective of the translation quality of the corpus-based (e.g., NMT) and rule-based translations of any source sentence in these languages, our proposed approaches can blend them using the PoS tagging information of the translated sentences, as discussed in Sect. 4.1, and exhibit improvements over NMT and/or rule-based translation. Table 14 shows two concrete examples of applying our proposed blending approaches to translate the following Hindi source sentences into English:

figure d: the two Hindi source sentences (Source 1 and Source 2)

In Table 14, we obtain the corpus-based (NMT) translations from Google Translate (collected on November 14, 2020), and assume the rule-based translations by presuming a rule-based \(Hindi\rightarrow English\) translator that encompasses basic rules for translating simple Hindi sentences (i.e., exhibiting a performance level at least similar to that of the basic \(Bengali\rightarrow English\) rule-based translator used in this article). Next, we show the PoS tagging of these two translations (NMT and rule-based, in isolation) for one of the source sentences (Source 2) as follows:

NMT [(‘Oise’, ‘NNP’), (‘was’, ‘VBD’), (‘also’, ‘RB’), (‘doing’, ‘VBG’), (‘his’, ‘PRP$’), (‘work’, ‘NN’)]

RB [(‘Oishee’, ‘NNP’), (‘also’, ‘RB’), (‘was’, ‘VBD’), (‘completing’, ‘VBG’), (‘her’, ‘PRP$’), (‘work’, ‘NN’)]

Then, as discussed in Sect. 4.1, in the ‘NMT followed by rule-based (NMT+RB)’ blending approach, words from the NMT-generated translation are replaced with words of the rule-based translation that carry the same PoS tags (e.g., \(Oise\rightarrow Oishee\) (NNP), \(doing\rightarrow completing\) (VBG), and \(his\rightarrow her\) (PRP$)), whereas the reverse happens in the ‘RB+NMT’ approach; a simplified sketch of this replacement is given below. Besides, in the ‘NMT or RB’ approach, the NMT-generated translation is selected for Source 2 (source length > 6) and the rule-based translation for Source 1 (source length \(\le\) 6), as shown in Table 14. Note that the selection criterion of this approach (i.e., the length of the source sentence) needs to be fine-tuned depending on the nature of the source language and its rule-based translator. Hence, analogous to our findings in the \(Bengali\rightarrow English\) translation context, our best blending approach, ‘NMT+RB’, improves the translation quality of the neural machine translations of both source sentences (i.e., in the \(Hindi\rightarrow English\) translation context).
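To illustrate how such a PoS-driven replacement can be realised, the sketch below uses NLTK's tokeniser and PoS tagger. It is a simplified reading of the NMT+RB blending step (greedy matching of identical PoS tags), not the exact implementation described in Sect. 4.1, and the tags actually assigned by the tagger may differ slightly from those listed above.

```python
import nltk
# First run only: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def blend_nmt_rb(nmt_sentence: str, rb_sentence: str) -> str:
    """Replace each word of the NMT output with a not-yet-used word of the
    rule-based output that carries the same PoS tag, keeping the NMT word order."""
    nmt_tagged = nltk.pos_tag(nltk.word_tokenize(nmt_sentence))
    rb_tagged = nltk.pos_tag(nltk.word_tokenize(rb_sentence))
    unused = list(rb_tagged)
    blended = []
    for word, tag in nmt_tagged:
        match = next(((w, t) for (w, t) in unused if t == tag), None)
        if match is not None:
            blended.append(match[0])
            unused.remove(match)
        else:
            blended.append(word)
    return " ".join(blended)

# Expected to yield something close to "Oishee was also completing her work".
print(blend_nmt_rb("Oise was also doing his work",
                   "Oishee also was completing her work"))
```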

Table 14 Translation of two example Hindi sentences using NMT-based blending approaches

Apart from this, in Table 15 we provide examples of translating source sentences from Arabic and Nepali (which has high similarity with Hindi) to further demonstrate the implications of our blending approaches for other languages. Here, ‘NMT+RB’ again outperforms all other blending approaches.

Table 15 Application of the proposed blending approaches in translating a Nepali sentence and an Arabic sentence

In this section, we have considered very basic source sentences from three complex and grammatically sensitive languages (Hindi, Nepali, and Arabic). Such simple sentences can be plausibly handled and translated by both corpus-based and rule-based translators.Footnote 21 However, for larger and more practical sentences chosen at random (e.g., from a corpus), both translators tend to deviate more from the accurate translations (references). As a result, the impact of applying our different blending approaches (e.g., the improvement achieved by the ‘NMT+RB’ approach) becomes more clearly perceptible. In summary, the proposed blending approaches can be directly applied to improve translation performance between different language pairs, compared to NMT and/or rule-based translation in isolation.

13 Conclusion and future work

Millions of immigrants strive for working knowledge of popular non-native languages such as English, as this creates many opportunities in international communities. Machine translators can offer great help in accomplishing such a laborious task using artificial intelligence. However, corpus-based systems perform poorly when translating low-resource languages such as Bengali and Arabic. Therefore, the importance of an efficient translator for such languages is noteworthy. In this article, we make our contribution from two perspectives. First, we adopt a Bengali–English rule-based translator and separately incorporate two popular corpus-based machine translation approaches (NMT and SMT). Next, we explore different possible approaches for blending these two translation schemes (rule-based translation and corpus-based machine translation). Besides, we contribute novel parallel corpora containing Bengali–English sentence pairs. We also evaluate the performance of each of the blending approaches in terms of standard machine translation performance metrics.

A number of critical issues make natural language processing and translation tasks complex. For a rule-based translator, there remain many exceptions that violate the standard rules of grammar, which are quite difficult to handle regardless of how many rules are implemented. Hence, the efficiency of a rule-based translator in translating languages with complex grammatical structures is very low. On the other side of the coin, translations generated by corpus-based machine translators can sometimes be unreliable, offensively wrong, or utterly unintelligible [19]. Besides, such machine translation systems have a steep learning curve with respect to the amount of training data, resulting in worse quality in low-resource settings. Thus, the performance of a rule-based translator is constrained by the number of incorporated rules, whereas the performance of a corpus-based translator is constrained by the amount of data fed to it for learning or training. In reality, it is very difficult to ensure sufficiency in terms of either the number of rules or the amount of data. Accordingly, neither of these two types of approaches suffices on its own.

Considering these aspects, in this article we explore different approaches for blending a rule-based translator with corpus-based machine translators (NMT and SMT), applying artificial intelligence to investigate whether and how a synergy between these translators can be attained. Our study leads to some promising outcomes, as two of our blending approaches outperform both NMT and SMT in isolation, and one of them achieves new state-of-the-art performance scores for Bengali to English translation.

While conducting our study, we have found it extremely difficult to obtain a large and effective parallel corpus for Bengali to English translation. Accordingly, we plan to advance our work on building such a corpus in the future. Besides, we will also focus on improving the neural network architecture used in NMT, considering specific aspects of translating low-resource languages. In addition, we plan to explore other possible modes of blending, such as phrase-based blending and trained blending. Finally, exploring and evaluating our proposed blending approaches for other language pairs remains another potential direction for future work.