1 Introduction

Machine translation (MT) has become an important research area as people of different nations, countries and cultures increasingly communicate via social networks. MT translates a source-language sentence into a target-language sentence [1]. Thus, the application of MT across various languages has become part of our daily life. Basically, an MT system analyzes the input sentence and builds its grammatical structure in order to generate the translation in the target-language structure [2,3,4]. Google Translate is the most popular translation website and supports over 100 human languages. According to the website, as of May 2017 more than 500 million people used Google Translate daily [5]. The technology behind this translator is statistical MT. It changed the world by allowing people to communicate even when they do not share a language. Although it is a popular MT system, it does not provide efficient results for every sentence. The aim of this research is to build an efficient MT system that overcomes some of the limitations of previous MT systems. One of the major limitations in translating an SOV-structured sentence into an SVO-structured language is semantic ambiguity, that is, one word having different meanings. Because of it, translators produce logically meaningless output for such sentences. Moreover, language experts are required to add a new language or update the existing language rules of a translator. Our system overcomes these problems.

There are many subject-object-verb (SOV) structured languages in the world, such as Bengali, Hindi, Punjabi, Marathi, and Japanese. Although our proposed MT system is demonstrated with Bengali as the SOV-structured language and English as the SVO-structured language, the process is similar for other languages with the same structures. Bengali follows the subject-object-verb (SOV) structure, whereas English follows the subject-verb-object (SVO) structure. The general population of Bangladesh and of two states in India (West Bengal and Tripura) use Bengali as their first language. According to a recent survey, around one-sixth of the world's population speaks Bengali [6]. Although 230 million people speak Bengali, only a few resources and tools are available for translating this language. Moreover, semantic ambiguity is one of the major problems in the Bengali language [7]. Much research has been conducted on this problem over the past years; some of it provides good results for particular parts of a sentence, but none works efficiently for every part. There are various approaches to machine translation, such as word-by-word MT, rule-based MT, and statistical MT [8]. Word-by-word MT translates a sentence in word-by-word sequence. It does not provide efficient results for language pairs such as English to Bengali, English to Hindi, or English to Japanese [9]. A rule-based MT system contains a set of rules with which it translates a sentence efficiently [10,11,12]. But it has limitations, because the rules of human language are not fixed; they change continuously, with new expressions added and existing rules updated day by day. Statistical MT overcomes these limitations of word-by-word and rule-based MT. Basically, statistical MT analyzes existing human translations in order to translate a sentence [13]. This approach is popular because it generates more than one candidate translation for each sentence and shows the most suitable one as the result [14,15,16]. Furthermore, its accuracy is much better [17, 18]. But this technique also has disadvantages: it requires a huge amount of training data, and building and maintaining such a translation system is complicated compared to the others. Moreover, new language experts are required for each new pair of languages [19,20,21].

In this study we have established an efficient system that can analyze, understand, and generate languages that humans use naturally. Our proposed MT system contains four main parts: (1) analysis, (2) encoding, (3) decoding, and (4) generation. In the analysis part, the source sentence is divided into meaningful words and every word is checked lexically. Two deep learning techniques, recurrent neural networks (RNNs) and encodings, are used in the encoding part. One word at a time is fed into the RNN, which generates a set of numbers using the encoding algorithm. In the decoding part, another RNN is trained to generate the words of the target language from the source language. In the generation part, those words are classified into syntactic categories such as verb, adverb, noun, and pronoun. These words can then be swapped in position and combined with other words to create meaningful sentences in the target language. So, if we are able to generate the set of numbers from source-language sentences using the RNN and encoding algorithm, it becomes easier to translate the sentence into the target language.

The rest of the paper is organized as follows. We start with a discussion of previous work along with various machine translation techniques, such as word-by-word translation, rule-based translation, and statistical machine translation, in Sect. 2. The overall discussion of the proposed model and the deep learning algorithms (RNN and encodings) is given in Sect. 3. Section 4 focuses on dataset collection and experimental result analysis, whereas Sect. 5 gives the overall discussion of the system. Finally, we summarize the paper with some concluding remarks and future directions in Sect. 6.

2 Previous works

Many researchers have contributed various methods for translating sentences from one language to another, most often using deep learning, machine learning, and artificial intelligence. In [22], the contributors proposed a recurrent neural network (RNN) based statistical machine translation model. They improved the quality of the results by comparing the phrase table of statistical machine translation with the phrase table of their proposed method, and used the Moses toolkit to develop the machine translation (MT) model. In [23], the authors presented a tense- and phrase-based English to Bengali transfer architecture using a fuzzy rule-based approach. They categorized each sentence based on its attributes and organized the sentences into patterns. Their proposed system separates meaningful words from each sentence and arranges them according to the rules; the system then uses morphological analysis to reconstruct the sentence in the target language. They also demonstrated the efficiency of their work by comparing their experimental results with Google Translate. In [24], the authors proposed a Bengali DeConverter for translating Universal Networking Language (UNL) to the Bengali language. They tested their DeConverter on UNL expressions of 300 Bengali sentences using a Russian and an English language server, and found that their system generates 90% syntactically and semantically correct Bengali sentences with a Bilingual Evaluation Understudy (BLEU) score of 0.76.

In [25], another phrase-based statistical machine translation model for English to Bengali sentences was discussed. Since not much training data is available for the Bengali language, the authors used a transliteration module to handle this problem. The effectiveness of the system was shown by BLEU, NIST and TER scores; they claimed that the overall BLEU score of their proposed model is 11.7, and 23.3 for short sentences. In [26], the authors built syntactic structures of Bengali and English sentences using context-free grammars for English assertive sentences. In this system, they used a bilingual dictionary for the contextual information and morphological properties of English to Bengali words. Transferring the English language structure to the corresponding Bengali language structure with lexical meaning is required to build this system. They also developed a prototype of their system, tested it on a large number of English assertive sentences, and found that it shows efficient results compared to other systems.

As discussed above, some research has been conducted on the automatic translation of Bengali noun-based compound sentences into universal networking language (UNL) documents, including the development of automatic software for it. In [27], the authors presented a method for morphological analysis of Bengali words into UNL. They discussed techniques to develop a mediator language from the Bengali language which can easily be converted into various languages and vice versa, and explained the morphological rules. Based on tense, subject, preposition, etc., their morphological rules can modify the parts of speech of a sentence. In [28], the authors attempted to create machine-translation Bengali dictionaries that address the organization, content and details of the data. In [29], the contributors developed a low-cost English to Bengali (E2B) system, ANUBAD, which translates English text into Bengali text using both rule-based and transformation-based MT schemes along with three levels of parsing. Another effort [30] was to build a statistical Bengali to English translation engine using only simple Bengali sentences that contain a subject, an object and a verb.

Various MT systems have been proposed by various researchers, including word-by-word MT, rule-based MT, and statistical MT. All of these systems perform well in particular areas, but none of them provides efficient results in every translation. Here, we discuss these machine translation techniques along with figures and examples.

2.1 Word-by-word translation

This is the easiest and simplest translation process. The main concept is to replace each word of a sentence with the translated word in the target language [31]. Figure 1 shows an example of translating from Bengali to English using the word-by-word technique. This technique is easy to implement, because it requires only a dictionary for word translation. But its accuracy is poor, because it ignores the grammatical rules and word order of the target language, as the sketch after Fig. 1 illustrates.

Fig. 1
figure 1

Word-by-Word translation system
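
To make the idea concrete, here is a minimal sketch of word-by-word translation, assuming a toy romanized Bengali-English dictionary (the entries are illustrative, not from a real lexicon):

```python
# A minimal sketch of word-by-word translation with a toy dictionary.
# The entries are illustrative assumptions (romanized for readability).

toy_dictionary = {
    "ami": "I",
    "bhat": "rice",
    "khai": "eat",
}

def word_by_word_translate(sentence: str) -> str:
    """Replace each source word with its dictionary entry.

    Unknown words pass through unchanged; word order is NOT
    corrected, which is exactly the weakness of this method.
    """
    return " ".join(toy_dictionary.get(w, w) for w in sentence.split())

# The SOV input "ami bhat khai" ("I rice eat") stays in SOV order:
print(word_by_word_translate("ami bhat khai"))  # -> "I rice eat"
```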

2.2 Rule-based translation

To make the previous system more accurate, language-specific grammars and rules can be added. For example, the order of verbs and nouns might be swapped, because in Bengali verbs usually come after the noun, following the subject-object-verb (SOV) structure, unlike English, which follows the SVO structure. And sometimes two words might be translated into one single word. Figure 2 shows an example of translating from Bengali to English using grammars and context. The more grammars and rules are added, the more efficient the system becomes [32, 33]; a simplified reordering sketch follows Fig. 2.

Fig. 2
figure 2

Rule based translation system
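
As an illustration, the following sketch applies a single SOV-to-SVO reordering rule to POS-tagged output of the word-by-word step; the tags and the one rule are simplified assumptions, not the rule set of a production system:

```python
# A minimal sketch of one rule-based reordering step, assuming the
# input has already been POS-tagged with simplified tags.

def reorder_sov_to_svo(tagged):
    """Move the verb of an SOV clause before the object.

    `tagged` is a list of (word, tag) pairs; the assumed tags are
    'S' (subject), 'O' (object), 'V' (verb).
    """
    subjects = [w for w, t in tagged if t == "S"]
    objs = [w for w, t in tagged if t == "O"]
    verbs = [w for w, t in tagged if t == "V"]
    return subjects + verbs + objs  # SVO order

tagged = [("I", "S"), ("rice", "O"), ("eat", "V")]  # word-by-word output
print(" ".join(reorder_sov_to_svo(tagged)))         # -> "I eat rice"
```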

Some rules for the universal networking language (UNL) Bengali word dictionary are shown below:

The mappings of Bengali words to UWs and their corresponding lexical-semantic attributes are stored in the UNL Bengali word dictionary. The collection of entries in the UNL Bengali word dictionary is known as the lexicon. Each entry is composed of three kinds of elements: Universal Word (UW), Headword (HW) and Grammatical Attribute (GA). An HW is a notation for a word of a native language appearing in the input sentence; it is used as a trigger for obtaining equivalent UWs from a word dictionary during the process of deconversion. A UW expresses the meaning of a word and is used in building the UNL networks (i.e., UNL expressions) of the output. GAs are information on how words behave in a sentence and are used in the deconversion rules. Every dictionary entry has the following format, associated with a native-language word [10].

figure a

In the dictionary entries, the attribute N denotes a noun, COMN a common noun, ANI an animate object, CEND a consonant-ended word, VEND a vowel-ended word, FLWR a flower, PLNT a plant, PRON a pronoun, 3SG the third person singular, and HPRON a human pronoun.

This machine translation approach provides good accuracy on written and plainly structured documents, such as simple articles and weather reports [34]. However, it cannot work efficiently on real-world documents. The main reason is that human language does not follow a fixed set of rules [35]. Human languages are full of regional variations and special cases; new rules continuously enter the language and old rules continuously change [35]. So, a supervised automated system that can resolve these problems is required for efficient translation.

2.3 Statistical machine translation

Since rule-based translation systems have many limitations, new translation systems were developed. Instead of the grammars and rules of a sentence, the new systems use statistics and probability to translate a sentence. A statistical machine translation system requires a huge amount of training data [36]. The interesting thing about statistical machine translation systems is that, unlike the previous two translation systems, they do not generate only one translation per sentence. Instead, they generate all possible translations and rank them [36]. Finally, the best-ranked, i.e. most probable, translated sentence is output as the result. The whole process is completed in three steps [13].

Step 1: Divide the sentence into meaningful words. In the first step, the original sentence is divided into simple, meaningful words that can be translated easily. Figure 3 shows how the original sentence is divided into simple meaningful words; each underlined word is a meaningful word.

Fig. 3
figure 3

Sentence is divided into meaningful words

Step 2: Translate each meaningful word into the target language in all possible ways. In the second step, find out all the ways humans have translated those words. This involves not only looking up the words in a simple translation dictionary, but also examining how actual humans have translated these same words. This process helps to find all the possible ways of translating each word. Figure 4 shows all possible translations of each word of a sentence.

Fig. 4
figure 4

Meaningful word translation into targeted language

Since each word has many translated words in the target language, some translations are more meaningful and frequent than others. A score is given to each translated word based on how frequently that translation appears in the training data. For example, it is commonly understood that “puro” means “the whole” rather than “the every”. Based on how frequently “puro” translates to “the whole” in the training data, the system assigns a score to each candidate, as sketched below.
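
```python
# A minimal sketch of frequency-based scoring, using hypothetical counts
# of how human translators rendered the Bengali word "puro" in a
# word-aligned training corpus (the numbers are made up).
from collections import Counter

counts = Counter({"the whole": 870, "the entire": 94,
                  "the full": 34, "the every": 2})

total = sum(counts.values())
scores = {translation: n / total for translation, n in counts.items()}

for translation, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{translation}: {score:.3f}")  # "the whole" scores highest
```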

Step 3: Make all possible sentences and take the most suitable one. Finally, the third step uses all possible combinations of the translated words to find the possible sentences. More than 2000 different sentences can be generated from the translated words shown in Fig. 4 alone. Some examples are shown below:

I | would like | to visit | the entire | universe

I | want | to travel | the whole | world

I | quest | to see | the full | earth

I | desire | to look over | the whole | universe

But in the real world, people do not always use the same order to express a sentence. They may use a different order of words for the same sentence. Therefore, even more possible sentences might arise. Some examples are shown below:

I | wish | to travel | the entire | earth

I | would like | to sally out | the total | world

I | want | to visit | the entire | terra

I | hope | to travel | the whole | globe

Now, the main challenge is to find the sentence that is most human-like. To do this, the system compares every translated sentence with millions of real-world sentences drawn from books, magazines, newspapers, and so on. Based on this comparison, the system generates a probability score for each translated sentence. For example, consider this possible translation:

I | quest | to see | the full | earth

It seems that nobody has ever written a sentence like this, so nothing similar will be found in the dataset. Therefore, this sentence is given a low probability score. But look at this possible translation:

I | want | to travel | the whole | world

Since this is a much more meaningful sentence, many similar sentences will be found in the datasets, so this sentence is given a high probability score. Finally, the system outputs the translated sentence with the highest probability score; a minimal scoring sketch follows. If enough training data is given to this system, it performs more accurately than the other two translation systems. In the early 2000s, Franz Josef Och used these ideas to build Google Translate [5], and Google Translate is still performing well today.
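
```python
# A toy illustration: rank candidate translations with an add-one-smoothed
# bigram language model over a tiny monolingual "corpus". A real system
# would use millions of sentences and a far stronger model.
from collections import Counter
import math

corpus = [
    "i want to travel the whole world",
    "i want to see the whole world",
    "i would like to visit the world",
]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    words = ["<s>"] + line.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def log_prob(sentence):
    """Smoothed bigram log-probability of a candidate sentence."""
    words = ["<s>"] + sentence.split()
    vocab = len(unigrams)
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
               for a, b in zip(words, words[1:]))

candidates = ["i want to travel the whole world",
              "i quest to see the full earth"]
print(max(candidates, key=log_prob))  # the fluent candidate wins
```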

Although statistical machine translation systems perform well, they have limitations too. Building and maintaining a statistical machine translation system is complicated compared to the others. New language experts are required to build the multi-step translation pipeline whenever you want to translate a new pair of languages, and building a new pipeline involves a lot of work. For example, if you want to translate Sinhala to Bosnian with Google Translate, Sinhala is first translated into English as an intermediate language, and then English to Bosnian translation takes place, because not enough translated text exists between the Sinhala-Bosnian language pair. To translate directly from Sinhala to Bosnian, new language experts and a new multi-step translation pipeline would be required.

3 Proposed method

In this paper, we propose a deep learning-based translation system that can translate between two languages without human intervention. Our proposed translation system uses RNN and encoding algorithms. This section presents the proposed model along with an overview of the RNN and encoding algorithms.

3.1 Proposed model

Our proposed system is categorized into three stages.

  1. We use the encoding technique to generate a series of unique numbers from a given sentence. The input sentence to the encoder is in subject-object-verb (SOV) structured language form.

  2. We add an RNN to the encodings. In addition to the encoding algorithm, we use an RNN so that unique numbers are generated for one word at a time. The final result (a series of unique numbers) for a given sentence is produced after the last word has been processed. In this process, we do not need to know the meaning of each encoded number, because unique numbers are generated for each unique sentence.

  3. Lastly, we add another RNN to the previous stage, so that two RNNs are used to translate a sentence. The first RNN, embedded with the encodings, generates a series of measurement numbers when the input sentences are SOV-structured, while the second RNN decodes those generated numbers and generates the translated sentence. Encoding a sentence and then decoding it back into the same language would not be useful; therefore, we train our second RNN to decode the sentence into the SVO-structured language (English) instead of the SOV-structured one (Bengali). Figure 5 shows the overall description of our proposed model; a minimal code sketch of the two-RNN pipeline follows the figure.

Fig. 5
figure 5

Description of our proposed model
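
The following PyTorch sketch shows one plausible realization of the two-RNN pipeline in Fig. 5; the vocabulary sizes, dimensions and layer choices are illustrative assumptions, not the exact configuration of our system:

```python
# A minimal encoder-decoder sketch: one RNN encodes an SOV (e.g., Bengali)
# sentence into a fixed-length vector, a second RNN decodes that vector
# into SVO (English) words. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, HIDDEN = 5000, 4000, 128

class EncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(SRC_VOCAB, HIDDEN)
        self.encoder = nn.RNN(HIDDEN, HIDDEN, batch_first=True)
        self.tgt_embed = nn.Embedding(TGT_VOCAB, HIDDEN)
        self.decoder = nn.RNN(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, TGT_VOCAB)

    def forward(self, src_ids, tgt_ids):
        # Encoder: the final hidden state is the "series of numbers"
        # (a 128-dimensional summary) for the whole source sentence.
        _, summary = self.encoder(self.src_embed(src_ids))
        # Decoder: generate target-language words conditioned on it.
        dec_states, _ = self.decoder(self.tgt_embed(tgt_ids), summary)
        return self.out(dec_states)  # scores over the target vocabulary

model = EncoderDecoder()
src = torch.randint(0, SRC_VOCAB, (1, 6))  # one 6-word SOV sentence
tgt = torch.randint(0, TGT_VOCAB, (1, 7))  # shifted target words
print(model(src, tgt).shape)               # -> torch.Size([1, 7, 4000])
```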

3.2 Recurrent neural network (RNN)

The main idea of RNNs is to use sequential information in the computation [37]. In a plain neural network, all inputs and outputs are independent of each other, but this is not suitable for many systems. Suppose we want to build a system that can predict the next word we are going to write. In that case, we need to know the current word; otherwise, it is difficult for a system to predict the next one. In such applications, plain neural networks cannot perform well. To solve these problems, a system is required that can use the result of previous states in the calculation of the next state. An RNN is a slightly modified version of a neural network in which the result of the previous state is one of the inputs for the next state; unlike in plain neural networks, the next output depends on the previous output. Figure 6 shows the block diagram of an RNN and Fig. 7 shows the structure of an RNN being unfolded into a network.

Fig. 6
figure 6

Block diagram of RNN

Fig. 7
figure 7

Structure of a RNN being unfolded into the network [38]

Unfolding means writing out the complete sequence of the network. Suppose we have a sentence of four words; the network will be unfolded into a 4-layer neural network, one layer per word. The working procedure of Fig. 7 is described as follows.

At time step $t$, $x_t$ denotes the input, $s_t$ the hidden state, and $o_t$ the output. The current input and the previous hidden state are required to calculate the next hidden state:

$$s_{t} = f\left( U x_{t} + W s_{t-1} \right).$$
(1)

Here, $f$ is a nonlinear function and $s_{t-1}$ is the previous hidden state. $s_t$ is also known as the memory of the network: it stores information about the states up to time step $t$. Note that $s_t$ cannot store all previous state information; it can retain only a limited amount of it. Although a normal deep neural network uses different parameters on each layer, an RNN uses the same parameters (here $U$, $V$, $W$) across all layers [39,40,41,42].
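
A minimal numpy sketch of Eq. (1), emphasizing that the same parameters $U$ and $W$ are reused at every unfolded time step (dimensions and random inputs are illustrative assumptions):

```python
# One recurrent step per word; only the state s changes across steps.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 50, 128
U = rng.normal(size=(hidden_dim, input_dim)) * 0.01   # input weights
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.01  # recurrent weights

def step(x_t, s_prev):
    """One unfolded layer: s_t = f(U x_t + W s_{t-1}), with f = tanh."""
    return np.tanh(U @ x_t + W @ s_prev)

s = np.zeros(hidden_dim)                   # initial hidden state
for x in rng.normal(size=(4, input_dim)):  # a 4-word sentence, unfolded
    s = step(x, s)                         # s now summarizes the prefix
```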

An RNN can learn the probability distribution of a sequence by being trained to predict the next symbol. In that case, the output at each time step is the conditional distribution:

$$p\left( x_t \mid x_{t-1}, \ldots, x_1 \right).$$
(2)

The probability of the sequence $x$ can be computed by combining these probabilities:

$$p(x) = \prod_{t=1}^{T} p\left( x_t \mid x_{t-1}, \ldots, x_1 \right).$$
(3)

A new sequence can then be predicted easily by sampling one symbol at each time step.

3.3 Encodings

Encoding is a technique for representing something complicated in a simple way [43]. The most popular application of encodings is face recognition. For example, suppose a system is required that can compare two different people's faces. Various measurements from each face are needed to build this system; these might include the spacing between the eyes and the sizes of the nose, lips and ears. A neural network can be used to generate this measurement data, and the system compares the data of two faces to decide whether they belong to the same person. This idea of collecting a list of measurements from a face is an example of encoding.

We can use the same concept to generate measurement values from a sentence. Figure 8 shows the list of measurement data F(X) for an input sentence X. This approach is efficient because the system generates only 128 numbers for each sentence, regardless of its length; a minimal sketch follows Fig. 8.

Fig. 8
figure 8

List of measurements from Input Sentence
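
```python
# A minimal numpy sketch of producing the fixed-length encoding F(X) of
# Fig. 8: one recurrent update per word, keeping the final hidden state.
# Dimensions and random word vectors are illustrative assumptions; a
# trained network would supply real embeddings and weights.
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 50, 128
U = rng.normal(size=(hidden_dim, input_dim)) * 0.01
W = rng.normal(size=(hidden_dim, hidden_dim)) * 0.01

def encode_sentence(word_vectors):
    """Return 128 measurement numbers regardless of sentence length."""
    s = np.zeros(hidden_dim)
    for x in word_vectors:
        s = np.tanh(U @ x + W @ s)  # the update of Eq. (1)
    return s

short = encode_sentence(rng.normal(size=(3, input_dim)))  # 3-word sentence
long_ = encode_sentence(rng.normal(size=(9, input_dim)))  # 9-word sentence
print(short.shape, long_.shape)  # both (128,), so directly comparable
```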

As the RNN reads each symbol, its hidden state changes according to Eq. (1). After reading the whole sequence, the hidden state of the RNN contains a summary $c$ of the input sequence. In the proposed model, the decoder is another RNN that generates the output by predicting the next symbol $o_t$ given the hidden state $s_t$:

$$s_{t} = f\left( s_{t-1}, o_{t-1}, c \right).$$
(4)

Similarly, the conditional distribution of the next symbol, given the hidden state $s_t$ and a nonlinear function $g$, is:

$$P\left( o_{t} \mid o_{t-1}, o_{t-2}, \ldots, o_{1}, c \right) = g\left( s_{t}, o_{t-1}, c \right).$$
(5)
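
A minimal numpy sketch of one decoder step implementing Eqs. (4) and (5); the weight matrices and the tanh/softmax choices are illustrative assumptions:

```python
# The new state mixes the previous state, the previous output's embedding
# and the summary c; a softmax turns the state into a next-word distribution.
import numpy as np

rng = np.random.default_rng(3)
hidden, vocab = 128, 4000
Ws, Wo, Wc = (rng.normal(size=(hidden, hidden)) * 0.01 for _ in range(3))
V = rng.normal(size=(vocab, hidden)) * 0.01

def decoder_step(s_prev, o_prev_vec, c):
    s_t = np.tanh(Ws @ s_prev + Wo @ o_prev_vec + Wc @ c)  # Eq. (4)
    logits = V @ s_t
    p = np.exp(logits - logits.max())
    return s_t, p / p.sum()                                # Eq. (5)

s0, o0 = np.zeros(hidden), np.zeros(hidden)  # start state and start symbol
c = rng.normal(size=hidden)                  # encoder summary vector
s1, p1 = decoder_step(s0, o0, c)
print(p1.sum())                              # ~1.0, over 4000 target words
```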

The two components of the proposed model, encoder and decoder, are jointly trained to maximize the conditional log-likelihood

$$\max_{\Theta} \frac{1}{N}\sum\limits_{n = 1}^{N} \log \, p_{\Theta}\left( o_{n} \mid x_{n} \right),$$
(6)

where $\Theta$ denotes the set of model parameters and $(x_n, o_n)$ denotes an input-output sequence pair from the training set. Once the RNN encoder-decoder is trained, it can be used to score a given pair of input and output sequences; the score is the probability $p_{\Theta}(o \mid x)$ from Eqs. (2) and (3). With a trained RNN encoder-decoder we can thus assign a new score to each pair of sentences, and these scores can be inserted into existing tuning algorithms with minimal additional computational overhead. A minimal sketch of the training objective follows.
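
```python
# A sketch of Eq. (6) as commonly implemented: maximizing the conditional
# log-likelihood equals minimizing cross-entropy over the decoder's
# vocabulary scores. `model`, `src` and `tgt` refer to the illustrative
# encoder-decoder sketch after Fig. 5, not our exact code.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def neg_log_likelihood(model, src, tgt_in, tgt_out):
    """-(1/N) sum_n log p(o_n | x_n), averaged over target positions."""
    scores = model(src, tgt_in)              # (batch, length, vocab)
    return criterion(scores.flatten(0, 1), tgt_out.flatten())

# Typical usage with teacher forcing (targets shifted by one position):
# loss = neg_log_likelihood(model, src, tgt[:, :-1], tgt[:, 1:])
# loss.backward(); optimizer.step()
```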

4 Experiments and results analysis

In this research, we used two datasets for training and testing. One is the Enabling Minority Language Engineering (EMILLE) corpus [44] and the other is the Prothom-Alo corpus [45]. A corpus is a collection of written texts. EMILLE was developed through joint research by the Central Institute of Indian Languages (CIIL), India, and Lancaster University, UK. Although the EMILLE corpus contains three components (monolingual, parallel and annotated), only two of them (monolingual and parallel) contain Bengali language data. The EMILLE monolingual corpus contains 1,867,452 words and the EMILLE parallel corpus 189,495 words of Bengali text. The Prothom-Alo corpus was developed by BRAC University, Bangladesh, and contains 19,496,884 words of Bengali text. We also used the Facebook Graph API to collect social network data ourselves; this tool can extract data from any page or group, such as those about restaurants or movies. We collected over 1800 sentences using this tool.

Before training the system, we converted the corpus files to UTF-8 encoding and extracted all the sentences from XML to plain text. We used an automatic sentence aligner to align those sentences. Since a single evaluation tool is not enough to evaluate an MT system, we used three: BLEU (bilingual evaluation understudy) [46], NIST [47], and TER (translation edit rate) [48].
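
For example, BLEU can be computed with NLTK as follows; the reference and hypothesis here are toy examples, not drawn from the evaluation data:

```python
# A toy example of BLEU scoring with NLTK (one of several possible tools).
from nltk.translate.bleu_score import corpus_bleu

references = [[["i", "want", "to", "travel", "the", "whole", "world"]]]
hypotheses = [["i", "want", "to", "travel", "the", "whole", "earth"]]

# corpus_bleu takes one list of reference sets and one list of hypotheses.
print(round(corpus_bleu(references, hypotheses), 3))  # higher is better
```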

The major problem in translating a sentence from one language to another is semantic ambiguity, i.e., one word having different meanings. For example, the Bengali word “khay” (খায়) has various meanings. The word “khay” (খায়) is a verb, and in a Bengali sentence a verb comes after the noun; “khay” (খায়) changes its meaning based on the noun of the sentence. The same holds for words such as “dekhar” (দেখার), “purbe” (পূর্বে), and “suni” (শুনি). Table 1 shows some examples of semantic ambiguity. Although a large amount of research has been done on English to Bengali translation, only a little has been done on Bengali to English, and none of it addresses this problem; even the world-renowned Google Translate cannot solve it. Our proposed system handles this problem efficiently. Table 2 shows some Bengali sentences and their corresponding translated English sentences using various translation techniques. The evaluation results describe the efficiency of our system: it achieves a 0.742 BLEU score, a 4.11 NIST score, and a 0.18 TER score on the overall combined dataset. Table 3 shows the evaluation results. We compared our evaluation results with Islam et al. [25] to establish the efficiency of our proposed system (Fig. 9).

Table 1 Examples of semantic ambiguity
Table 2 Bengali to English translation in various translation techniques
Table 3 Evaluation of proposed method
Fig. 9
figure 9

Result comparison on EMILLE dataset of proposed method with Islam et al. [25]

Our proposed system is robust and generic for any SOV to SVO language translation. It provides good results for languages with an SOV structure similar to Bengali, e.g., Hindi, Sanskrit, Assamese, Oriya, Marathi, Punjabi, Nepali, and Japanese. Table 4 shows some sentences in various similarly structured languages and their corresponding translated English sentences using various translation techniques.

Table 4 Various Language to English translation in various translation techniques

All we need to do is train the system with new language datasets. To demonstrate this, we used Punjabi (SOV) and English (SVO) as another case study. We trained our system on the Punjabi Monolingual Text Corpus ILCI-II dataset [49]. ILCI-II was initiated by the Ministry of Electronics & Information Technology (MeitY), Government of India, and collected at Jawaharlal Nehru University, New Delhi. This corpus contains 30,000 sentences from the general domain. We found that our system achieved a 0.75 BLEU score (Fig. 10).

Fig. 10
figure 10

Result comparison on Punjabi Monolingual Text Corpus ILCI-II dataset of proposed method with Kumar and Sharma [50]

In [24], the authors proposed a DeConverter for the Bengali language, using 300 Bengali sentences to check the efficiency of their system, and reported a BLEU score of 0.76. To test the effectiveness of our system, we trained it with the exact same dataset and found that it achieved a BLEU score of 0.79 (Fig. 11).

Fig. 11
figure 11

Result comparison of proposed method with Ali et al. [24] based on same dataset

5 Discussion

We evaluated the proposed machine translation system against only one other machine translation system because few MT systems have been proposed for the Bengali language so far. Moreover, the authors of [25] used the EMILLE dataset to evaluate their proposed system, and comparing two different systems on the same dataset makes it easier to identify the better one.

After comparing our work with [25], we found that our work performs far better. A higher BLEU score means better results [46]. In Fig. 9 we can see that our proposed system obtained a 0.74 BLEU score, whereas [25] obtained 0.057. The lower the TER score, the better the result [48]: our system obtained a 0.17 TER score, whereas [25] obtained 0.83. The NIST score ranges from 0 to 10 and is divided into three bands: Low (0.0-3.9), Medium (4.0-6.9) and High (7.0-10.0) [47]. The proposed system obtained a score in the Medium range, whereas [25] scored in the Low range. Therefore, the efficiency of our proposed system is much better than that of the other system.

We also trained our system with the Punjabi Monolingual Text Corpus ILCI-II [49]. To assess its efficiency, we compared the result with a Punjabi DeConverter [50] and found that our work performs far better: in Fig. 10, our proposed system obtained a 0.75 BLEU score whereas the Punjabi DeConverter [50] obtained 0.72. Moreover, we trained our system on the same dataset used in [24] and compared the results: our proposed system achieves a 0.79 BLEU score whereas [24] obtained 0.76. Figure 11 shows that our system outperforms [24].

Our proposed system does not need to know any rules of human language; it figures out those rules itself. Language experts are not required to tune each step of the translation pipeline, as the system does this automatically. One thing to remember is that this system requires real-world data for training. Since rare words do occur in sentences from time to time, our system sometimes does not translate rare words correctly.

6 Conclusion

In this paper, we have presented a deep learning-based sequence-to-sequence statistical machine translation system that can translate any SOV language into an SVO language. The paper also provides an overall description of other MT systems. Two popular deep learning techniques, the recurrent neural network (RNN) and encodings, are used in this system. Two case studies are presented: Bengali (SOV) to English (SVO) as case study 1, and Punjabi (SOV) to English (SVO) as case study 2. The EMILLE corpus and the Prothom-Alo corpus were used for case study 1. Very little research has been reported on Bengali to English MT, and none of it solves the semantic ambiguity problem of Bengali sentences efficiently. Since only a few MT systems are available for Bengali to English, the efficiency of this system is shown by comparing some translated sentences with other state-of-the-art translation systems. On average, we obtained satisfactory scores on the combined datasets (BLEU score: 0.742, NIST score: 4.11, TER score: 0.18). The Punjabi Monolingual Text Corpus ILCI-II dataset was used for case study 2, where the system achieved a BLEU score of 0.75. Our immediate plan is to improve the accuracy of this system, and we will also extend this work to other languages.