1 Introduction

Word alignment is the process of identifying the corresponding words between two parallel corpora, that is, the translation relationships among the words of two or more parallel sentences. In some cases a word is translated by a single word and in others by multiple words, a phenomenon called word divergence. Given a pair of parallel sentences, the main task of word alignment is to find the correspondences among the words of the source and target sentences, which may be one-to-one, one-to-many, or many-to-many. Aligning source-language phrases with the corresponding target-language phrases or groups of words is the approach taken in phrase-based translation. If a word or phrase of the source sentence has no suitable translation in the target language, it is simply assigned null. Word alignment also accounts for moving translated words from their position in the source sentence to the appropriate position in the target sentence. In bilingual machine translation, word reordering may be necessary, and word alignment helps to achieve it. Several factors contribute to word alignment: named entities, transliteration similarity, local word grouping, nearest aligned neighbors, and dictionary lookup. The challenges of word alignment include ambiguity, word order, word sense, idioms, and pronoun resolution, which can be addressed by mathematical operations together with linguistic concepts. Handling word divergence (also called lexical divergence) is the main and most challenging issue in word alignment. Few algorithms solve it so far; it is normally addressed at the phrase level using a bilingual dictionary or lexical database, and the result is then examined experimentally and tested mathematically.

For word alignment, several techniques have been used, including a hybrid approach that performs local word grouping on Hindi sentences and uses other methods such as dictionary lookup, transliteration similarity, expected English words, and nearest aligned neighbors. The probability values between small and large pairs of sentences are discussed thoroughly [1]. The issues and challenges are described briefly, and different types of approaches are described thoroughly, in [2]. Most of the challenges are addressed using the Expectation Maximization algorithm and statistical techniques, and the whole concept is described with good accuracy [3]. One-to-one and many-to-one mapping techniques are applied to the Bangla–Odia lexical divergence problem in [4]. A series of five statistical models of the translation process is described in [5], together with algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of each other. English–Hindi parallel words of sentences are mapped using a word dictionary [6]. Automatic word alignment has been done using different approaches such as boundary detection, a minimum distance function, and dictionary lookup [7]. Compound word splitting is an important part of machine translation, which breaks a whole word into its meaningful parts. Different approaches, their advantages and disadvantages, and the challenges faced during translation from one language to another are discussed systematically [8]. A new probabilistic model for word alignment is presented in which word alignments are associated with linguistically motivated alignment types. A novel task of joint prediction of word alignment and alignment types is proposed, and a novel semi-supervised learning algorithm is applied to this task [9]. The algorithm is illustrated with examples: pooling data from multiple noisy sources and fitting a mixture density [10]. A series of five statistical models of the translation process is described, algorithms are given for estimating the parameters of these models from a set of pairs of sentences that are translations of each other, and a concept of word-by-word alignment between such pairs of sentences is defined [11]. The book in [12] provides a comprehensive and clear introduction to the most prominent techniques employed in the field of statistical machine translation. Semantic relationships can be used to improve word alignment, in addition to the lexical and syntactic features that are typically used [13].

2 Estimation of Maximum Likelihood

Maximum likelihood estimation (MLE) is a method that finds values for the parameters of a model. The parameter values are chosen such that they maximize the likelihood that the process described by the model produced the data that were actually observed.

Maximum likelihood estimation calculates the parameters of a probability distribution by maximizing the likelihood function, using the argmax function, so that under the assumed statistical model the observed data are most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible: it selects the maximum among all likelihood values. As such, the method has become a dominant means of estimation.
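As a brief illustration of this definition, the estimate can be written in its standard textbook form. The symbols \(x_n\) (independent observations) and \(c_k\) (the number of times outcome k is observed) are introduced here only for illustration and are not part of the notation used in the rest of the paper.

$$\widehat{\theta }=\underset{\theta }{\mathrm{argmax}}\prod_{n=1}^{N}P_{\theta }\left(x_{n}\right)=\underset{\theta }{\mathrm{argmax}}\sum_{n=1}^{N}\log P_{\theta }\left(x_{n}\right)$$

For a categorical distribution over outcomes k, this maximization has the closed-form solution

$$\widehat{\theta }_{k}=\frac{c_{k}}{\sum_{k'}c_{k'}}$$

that is, the relative frequency of each outcome. This is the form in which MLE is applied to word pairs in Sect. 4.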

3 Word Alignment with Methodology

This paper learns and implements a conditional probability model between Bangla and Odia sentences, denoted \(P_{\theta}(B|O)\). If the alignments of the sentences are observed beforehand, it only remains to estimate \(P(B|O)\), that is, to find the MLE value, which is illustrated by taking some sentence pairs as examples. The subscript θ represents the set of parameters, estimated from a dataset D of N sentence pairs, \(D=\{(B^{(1)},O^{(1)}),(B^{(2)},O^{(2)}),(B^{(3)},O^{(3)}),\dots ,(B^{(N)},O^{(N)})\}\), where the superscript indexes the sentence pair: \((B^{(1)},O^{(1)})\) is one pair, \((B^{(2)},O^{(2)})\) is another pair, and so on. The model is trained so that it can predict word alignments that are missing. There are many ways to define \(P(B|O)\). Suppose a Bangla sentence B is represented by an array of I words, \((B_1,B_2,B_3,\dots ,B_I)\), and an Odia sentence O is represented by an array of J words, \((O_1,O_2,O_3,\dots ,O_J)\). The Bangla–Odia word alignment can be represented as an array a of length I, \((a_1,a_2,a_3,\dots ,a_I)\), where \(a_1,a_2,a_3,\dots ,a_I\) are the alignment variables, one per Bangla word. An alignment variable \(a_i\) takes a value in the range \([0,J]\): \(a_i=j\) means that \(B_i\) is aligned to \(O_j\), and \(a_i=0\) means that \(B_i\) is not aligned to any Odia word, which is called a null alignment. Consider the following Bangla–Odia sentence pair as an example. No null alignment occurs in this particular example, although it may arise in other sentence pairs of the corpus.

Bangla sentence

figure a

Transliteration. Rabibar Mayacha grame krushak sangharsh samite panchayate kobe 25 Octobar theke nirman karjya band korar sidhant niechhe.

Odia sentence

figure b

Transliteration. Rabibar dino mayacha gramare krushaka sangharsha samiti panchayata basai 25 Octobarru nirmana karjya band karaibaku nispatti neichhi.

The Bangla sentence has length 17 and the Odia sentence also has length 17. The Bangla sentence length is denoted I and the Odia sentence length J. The words of the two sentences are indexed as \(B_1,B_2,B_3,\dots ,B_I\) and \(O_1,O_2,O_3,\dots ,O_J\). The value of the alignment array a is \(\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17\}\); these are all j values. It is assumed that the probabilistic model generates the Bangla sentence from the Odia sentence by a simple procedure. First, the length I of the Bangla sentence is chosen according to the probability distribution \(P(I|J)\), here \(P(17|17)\). Over the word positions, \(P(I|J)\) can be written as \(P(1,2,3,\dots ,I|1,2,3,\dots ,J)\), i.e., the probability of the source-language length given the target-language length. Then each Bangla word position is aligned to an Odia word (or null), following the valid sentence alignment of the standard corpus (ILCI), with probability \(P(a_i=j|J)\). Finally, each Bangla word \(B_i\) is translated according to the probability distribution on the aligned Odia word, \(P(B_i|O_{a_i})\). For this alignment, all the probability values are multiplied: P(Rabibar|Rabibar dino), P(Mayacha|mayacha), P(grame|gramare), and so on. The joint probability of the Bangla sentence and its alignment, conditioned on the Odia sentence, is simply the product of all these probabilities [15].

$$\mathrm{P}\left(B,a|O\right)=\mathrm{P}\left(I|J\right)\prod_{i=1}^{I}\mathrm{P}\left({a}_{i}|J\right)\,\mathrm{P}\left({B}_{i}|{O}_{{a}_{i}}\right)$$
(1)

The model basically has two sets of parameters: \(P(I|J)\), for all pairs of sentence lengths I and J, and \(P(B|O)\), for all pairs of co-occurring Bangla and Odia words B and O. The word translation probabilities satisfy the following constraints.

$$\forall_{O,B}\; P\left(B|O\right)\in \left[0,1\right]$$
(2)
$$\forall_{O}\; \sum_{B}P\left(B|O\right)=1$$
(3)
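The product in Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `joint_probability`, the table layouts, the uniform alignment probability 1/(J + 1), and all numeric values are assumptions made for the example and are not taken from the corpus.

```python
from math import prod


def joint_probability(bangla, odia, alignment, p_length, p_align, p_trans):
    """Compute P(B, a | O) as the product in Eq. (1).

    bangla, odia : lists of words (lengths I and J)
    alignment    : list of length I; alignment[i] is in 0..J, 0 means null
    p_length     : dict mapping (I, J) -> P(I | J)
    p_align      : function (j, J) -> P(a_i = j | J)
    p_trans      : dict mapping (bangla_word, odia_word) -> P(B_i | O_{a_i})
    """
    I, J = len(bangla), len(odia)
    odia_with_null = ["<null>"] + list(odia)  # position 0 is the null word
    factors = [
        p_align(a_i, J) * p_trans.get((b_i, odia_with_null[a_i]), 0.0)
        for b_i, a_i in zip(bangla, alignment)
    ]
    return p_length.get((I, J), 0.0) * prod(factors)


# Toy data: every probability below is a made-up value for illustration.
bangla = ["grame", "krushak"]
odia = ["gramare", "krushaka"]
alignment = [1, 2]  # grame -> gramare, krushak -> krushaka
p_length = {(2, 2): 0.5}
p_trans = {("grame", "gramare"): 0.8, ("krushak", "krushaka"): 0.9}


def uniform_alignment(j, J):
    # Assumption: each of the J positions plus null is equally likely.
    return 1.0 / (J + 1)


p = joint_probability(bangla, odia, alignment, p_length, uniform_alignment, p_trans)
print(p)
```

Unseen word pairs receive probability 0 in this sketch, so a single unseen pair makes the whole product 0; a practical system would need some form of smoothing.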

4 Use of Maximum Likelihood Estimation

Since the alignments are observed, only \(P(B|O)\) needs attention, and its value is estimated through maximum likelihood estimation (MLE). The sentence-level alignment has been discussed above; word alignment between Bangla and Odia is now carried out. Bangla–Odia does not show divergence to the same extent as French-to-English translation, where a French word is aligned with different English words many times, but the situation does also arise in Bangla–Odia sentence pairs. To make this concrete, an MLE function is introduced here to calculate the probability of the given parameters. The following example shows how \(P(B|O)\) is calculated:

$$\begin{gathered} \theta_1 = \mathrm{P}(\text{krushaka}|\text{krushakder}) \\ = \frac{\mathrm{count}(\text{krushaka},\text{krushakder})}{\mathrm{count}(\text{krushaka},\text{krushakder}) + \mathrm{count}(\text{krushakamananka},\text{krushakder}) + \mathrm{count}(\text{krushamanankara},\text{krushakder})} \\ = \frac{1}{1+2+1} = \frac{1}{4} = 0.25 \end{gathered}$$
(4)
$$\begin{gathered} \theta_2 = \mathrm{P}(\text{krushakamananka}|\text{krushakder}) \\ = \frac{\mathrm{count}(\text{krushakamananka},\text{krushakder})}{\mathrm{count}(\text{krushaka},\text{krushakder}) + \mathrm{count}(\text{krushakamananka},\text{krushakder}) + \mathrm{count}(\text{krushamanankara},\text{krushakder})} \\ = \frac{2}{1+2+1} = \frac{2}{4} = \frac{1}{2} = 0.5 \end{gathered}$$
(5)
$$\begin{gathered} \theta_3 = \mathrm{P}(\text{krushamanankara}|\text{krushakder}) \\ = \frac{\mathrm{count}(\text{krushamanankara},\text{krushakder})}{\mathrm{count}(\text{krushaka},\text{krushakder}) + \mathrm{count}(\text{krushakamananka},\text{krushakder}) + \mathrm{count}(\text{krushamanankara},\text{krushakder})} \\ = \frac{1}{1+2+1} = \frac{1}{4} = 0.25 \end{gathered}$$
(6)

From these three equations [14], the Bangla word “krushakder” is aligned with different Odia words many times, with different probability values. The question is which value should be chosen. Often the choice depends on the highest probability value found by MLE. However, the three parameters θ1, θ2, and θ3 take different values for different alignments. If the highest value is chosen, i.e., θ2 = 0.5 in Eq. (5), for the alignment of “krushakder” with “krushakamananka”, the choice is not correct in all cases; it holds only for that particular semantic sense of the sentence. So, MLE alone does not always find the right alignment.
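The arithmetic of Eqs. (4) to (6) can be reproduced with a short Python sketch. The counts 1, 2, and 1 are the ones used in those equations; the variable names are chosen here only for illustration.

```python
# Alignment counts for the Bangla word "krushakder", as used in Eqs. (4)-(6).
counts = {
    "krushaka": 1,
    "krushakamananka": 2,
    "krushamanankara": 1,
}

# Relative-frequency (maximum likelihood) estimate for each Odia word.
total = sum(counts.values())
theta = {odia_word: c / total for odia_word, c in counts.items()}

# The argmax picks the alignment with the highest estimated probability.
best = max(theta, key=theta.get)

for odia_word, value in theta.items():
    print(f"P({odia_word}|krushakder) = {value}")
print("highest:", best)
```

The argmax always returns the same Odia word regardless of the sentence, which is exactly the limitation noted above: the most frequent alignment is not necessarily the right one for a particular sense.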

$$\prod_{n=1}^{N}{P}_{\theta }\left({B}^{\left(n\right)},{a}^{\left(n\right)}|{O}^{\left(n\right)}\right)=\prod_{n=1}^{N}P\left({I}^{\left(n\right)}|{J}^{\left(n\right)}\right)\prod_{i=1}^{{I}^{(n)}}P\left({a}_{i}^{\left(n\right)}|{J}^{\left(n\right)}\right)P\left({B}_{i}^{\left(n\right)}|{O}_{{a}_{i}}^{(n)}\right)$$
(7)

Here, N is the number of sentence pairs, I is the length of the source-language (Bangla) sentence, J is the length of the target-language (Odia) sentence, i is the index of a Bangla word position, and \(a_i\) is its alignment.

Now that the data are observed and the parameters are to be estimated, an objective function is needed: the parameter values are chosen so that the observed data are most probable under the model.

$$\widehat{\theta }=\underset{\theta }{\mathrm{argmax}}\prod_{n=1}^{N}{P}_{\theta }\left({B}^{\left(n\right)},{a}^{\left(n\right)}|{O}^{\left(n\right)}\right)$$
(8)

In Eq. (8), \(\widehat{\theta }\) is found by using the argmax function to search for the highest probability value of the word alignment for every word in a sentence. In machine translation this is basically a search problem over an infinite number of possible sentences. Only one sentence is selected from the different possible sentences after translation, in agreement with the corpus. In this case, however, the search problem is trivial, because \(\widehat{\theta }\) can be computed directly when the data described by the model are fully observed. The algorithm that learns \(\theta\) from our hypothetical aligned data follows directly from this strategy. The data are scanned, the alignments are observed, and they are counted for each Bangla–Odia word pair. To obtain the probability values for the aligned Bangla–Odia word pairs, all counts are normalized by the number of times the corresponding Odia word is observed participating in any alignment. This implies the algorithm described below.

Algorithm

Step 1. Initialize all counts to 0

Step 2. For each n from 1 to N

Step 3. For each i from 1 to I

Step 4. For each j from 1 to J

Step 5. If ai = j in sentence pair n, then

Step 6. Count[(Bi, Oj)]++

Step 7. Count[Oj]++

Step 8. For each (Bi, Oj) in Count do

Step 9. P(Bi|Oj) = Count[(Bi, Oj)]/Count[Oj]

This algorithm loops over all pairs of words in each sentence pair to collect counts, a computation that is quadratic in sentence length. This is not strictly necessary: it could have just looped over the alignment variables to collect the counts, which is linear. However, thinking of the algorithm as one that examines all pairs of words will be useful when moving to the case of unobserved alignments, which turns out to be an extension of this algorithm. Here, two formulae are used to calculate alignment probabilities after some iterations.
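The counting algorithm and its linear-time variant can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the corpus layout (a list of word lists with a 1-based alignment array, 0 meaning null), the `<null>` token, and the toy sentence pairs are all hypothetical and serve only to illustrate the steps.

```python
from collections import defaultdict


def estimate_quadratic(corpus):
    """Steps 1-9: examine every (i, j) pair and count those with a_i = j."""
    pair_count = defaultdict(int)
    odia_count = defaultdict(int)
    for bangla, odia, alignment in corpus:
        odia_with_null = ["<null>"] + list(odia)  # index 0 is the null word
        for i, b in enumerate(bangla):
            for j, o in enumerate(odia_with_null):
                if alignment[i] == j:
                    pair_count[(b, o)] += 1
                    odia_count[o] += 1
    # Normalize each pair count by the count of its Odia word.
    return {(b, o): c / odia_count[o] for (b, o), c in pair_count.items()}


def estimate_linear(corpus):
    """Same counts, collected by looping over the alignment variables only."""
    pair_count = defaultdict(int)
    odia_count = defaultdict(int)
    for bangla, odia, alignment in corpus:
        odia_with_null = ["<null>"] + list(odia)
        for b, a_i in zip(bangla, alignment):
            o = odia_with_null[a_i]
            pair_count[(b, o)] += 1
            odia_count[o] += 1
    return {(b, o): c / odia_count[o] for (b, o), c in pair_count.items()}


# Toy aligned corpus (hypothetical): (Bangla words, Odia words, alignment).
corpus = [
    (["krushak", "grame"], ["krushaka", "gramare"], [1, 2]),
    (["krushakder", "grame"], ["krushaka", "gramare"], [1, 2]),
]

table = estimate_quadratic(corpus)
for (b, o), p in sorted(table.items()):
    print(f"P({b}|{o}) = {p}")
```

Both functions return the same table; the quadratic form is kept because it is the one that generalizes when alignments are unobserved and every (i, j) pair has to be considered.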

Let a Bangla sentence B = b1, b2, b3, …, bI be translated into an Odia sentence O = o1, o2, o3, …, oJ. Among all possible Odia sentences, the one with the highest probability \(\mathrm{P}(\mathrm{O}|\mathrm{B})\) is sought. Using Bayes’ rule, this may be written as follows:

$$\mathrm{P}(\mathrm{O}|\mathrm{B}) =\mathrm{ P}(\mathrm{O})\mathrm{P}(\mathrm{B}|\mathrm{O})/\mathrm{P}(\mathrm{B})$$
(9)

As the denominator is independent of O, finding the most probable translation e* leads to the noisy channel model for statistical machine translation:

$${\mathrm{e}}^{*}=\underset{\mathrm{O}}{\mathrm{argmax}}\;\mathrm{P}(\mathrm{O}|\mathrm{B})$$
(10)
$$=\underset{\mathrm{O}}{\mathrm{argmax}}\;\mathrm{P}(\mathrm{O})\,\mathrm{P}(\mathrm{B}|\mathrm{O})$$
(11)

where P(B|O) is the translation model and P(O) is the language model. In most cases, many-to-one and one-to-many word alignment relies on phrase-based translation; no other approach is available when word divergence is seen in word alignment. A bilingual Bangla–Odia lexicon based on the agriculture domain of the corpus has been developed for mapping the words, covering one-to-one, one-to-many, and many-to-many correspondences.
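The noisy channel decision rule of Eqs. (10) and (11) can be sketched as a search over a finite list of candidate translations. This is an illustrative sketch only: the function `best_translation`, the candidate list, and every probability value are hypothetical, and a real decoder searches a far larger space than an explicit list.

```python
import math


def best_translation(bangla, candidates, language_model, translation_model):
    """Return the candidate O that maximizes P(O) * P(B | O).

    Scores are compared in log space to avoid underflow on long sentences.
    """
    def score(odia):
        p_o = language_model(odia)
        p_b_given_o = translation_model(bangla, odia)
        if p_o <= 0.0 or p_b_given_o <= 0.0:
            return -math.inf
        return math.log(p_o) + math.log(p_b_given_o)

    return max(candidates, key=score)


# Hypothetical one-word example; all numbers are made up for illustration.
bangla = ("krushakder",)
candidates = [("krushaka",), ("krushakamananka",)]
lm = {("krushaka",): 0.6, ("krushakamananka",): 0.4}  # P(O)
tm = {
    (("krushakder",), ("krushaka",)): 0.25,           # P(B | O)
    (("krushakder",), ("krushakamananka",)): 0.5,
}

best = best_translation(
    bangla,
    candidates,
    language_model=lambda o: lm.get(o, 0.0),
    translation_model=lambda b, o: tm.get((b, o), 0.0),
)
print(best)
```

In this toy case the language model prefers the first candidate, but the product P(O) P(B|O) is larger for the second (0.20 against 0.15), so the second is selected.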

5 Result and Discussion

The bilingual dictionary is based on the agriculture domain (corpus collected from TDIL, Govt. of India). It holds a small set of sentences (approximately five thousand), about fifty thousand words, stored in a well-formatted and systematic manner for easy access, with observed alignments. The model is trained on all observed alignments, which produces a good estimate of θ as given in Eq. (8); the more data are available, the better the estimates. The data contain one-to-one, one-to-many, and many-to-one word correspondences. Initially, all connections (as one-to-one mappings) are equally likely. After one iteration, the model learns that connections are made between the most similar words of two parallel sentences, by finding a probability value between 0 and 1. After another iteration, it becomes clear that a connection between previously similar words is more likely, as reflected in the probability value of the current word. Bigram and trigram models are therefore the preferred method for finding the probability of the sentence along with the alignment among the words. All probability values are calculated using a bigram model with MLE and the argmax function, in the form of a table (matrix). However, the probability values calculated by MLE with the argmax function are not sufficient for finding the exact alignment of two parallel Bangla–Odia sentences: in experiments on thousands of parallel sentences, the accuracy is not satisfactory. The system will therefore be tested further with the Expectation Maximization (EM) algorithm, which is expected to give a better probability distribution and to improve the accuracy of the proposed system in the near future. Here, the accuracy is calculated manually using the precision, recall, and F-score measures, and reaches a value of a little more than 80%, near the threshold.
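The precision, recall, and F-score measures mentioned above can be computed for word alignment as in the following sketch. It assumes the usual set-based definitions over alignment links, represented as (Bangla position, Odia position) pairs; the paper does not spell out its exact procedure, and the gold and predicted links below are invented for illustration, not experimental results.

```python
def alignment_scores(predicted, gold):
    """Return (precision, recall, F-score) for two sets of alignment links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links present in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical links: (Bangla word position, Odia word position).
gold = {(1, 1), (2, 3), (3, 4), (4, 5)}
predicted = {(1, 1), (2, 3), (3, 4), (4, 6), (5, 7)}

precision, recall, f_score = alignment_scores(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

Here three of the five predicted links are correct and three of the four gold links are found, so precision is 3/5 and recall is 3/4.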

6 Conclusion and Future Work

When translating from one language to another, if the parallel corpus is first properly aligned at the sentence level, word-by-word alignment is easily done by machine. Most of the problems that arise, such as one-to-many and many-to-one alignment, are solved by a bilingual dictionary and phrase-based translation. A bilingual dictionary with one-to-one, one-to-many, and many-to-one correspondences between the two languages (Bangla–Odia) has been created. Sometimes phrase-level translation is a more appropriate solution for occurrences of word divergence. The MLE function is used to find the most suitable word pair between the two languages, by taking the highest probability value. It also helps to translate word by word and phrase by phrase, and to find the appropriate position of each word in the target language, with good accuracy. Time complexity is a major factor in word alignment, as in machine translation, when the data are huge, so optimizing it is a challenging task that requires care to obtain better results. Space complexity cannot be reduced because the corpus is huge; more memory must instead be provided, otherwise research work in NLP or data science will remain superficial.