Abstract
This paper presents word alignment between Bangla and Odia using the expectation–maximization (EM) algorithm, which yields high-accuracy output. The full mathematical calculation is worked through on a set of example Bangla–Odia sentences. The EM algorithm estimates maximum likelihood probability values and works together with an ‘argmax function’ that determines the mapping between words of the source- and target-language sentences. The computed values reveal the lexical relationships between two parallel sentences: they indicate which word of the target language is aligned with which word of the source language. Because EM is an iterative procedure, these word relationships are found by repeatedly re-estimating the probability values under maximum likelihood estimation (MLE). EM is used to find the MLE or maximum a posteriori (MAP) estimate of the parameters of a probability model that depends on unobserved latent variables. Lexical alignment for translation has long been one of the toughest challenges, because the process involves several machine learning algorithms and mathematical modeling. With these issues in mind, we describe the lexical problems that arise when analyzing bilingual translated texts between Bangla (the source language) and Odia (the target language). In word alignment, handling ‘word divergence’ or ‘lexical divergence’ is the main issue and a challenging task. The EM algorithm does not solve it; it can be handled only through a bilingual dictionary, also called a lexical database, which is examined experimentally and tested only mathematically. Problems of word divergence are normally addressed at the phrase level using bilingual dictionaries or lexical databases.
The basic challenge lies in identifying single-word units of the source text that are converted into multiword units in the target text.
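The iterative estimation and argmax step described in the abstract can be illustrated with a minimal sketch. It assumes an IBM Model 1 style lexical translation model, which the abstract does not name, so this is not necessarily the authors' exact formulation. The function names (`train_em`, `align`) and the tokens `s1`, `s2`, `t1`, `t2` are hypothetical placeholders, not real Bangla or Odia words.

```python
from collections import defaultdict


def train_em(pairs, iterations=10):
    """Estimate t[(tgt_word, src_word)], an approximation of P(tgt_word | src_word).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Assumption: IBM Model 1 style model, uniform initialization, no NULL word.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word

        # E-step: distribute each target word over the source words
        # of the same sentence pair, in proportion to the current t.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: re-estimate probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]

    return t


def align(src, tgt, t):
    """For each target word, pick the source word with the highest probability (argmax)."""
    return [(tw, max(src, key=lambda sw: t[(tw, sw)])) for tw in tgt]


# Placeholder parallel data: "s*" are source tokens, "t*" are target tokens.
pairs = [(["s1", "s2"], ["t1", "t2"]), (["s1"], ["t1"])]
t = train_em(pairs, iterations=10)
alignment = align(["s1", "s2"], ["t1", "t2"], t)
print(alignment)
```

On this toy input, the second sentence pair ties `t1` to `s1`, and over the iterations the remaining probability mass shifts toward pairing `t2` with `s2`. A one-to-one argmax of this kind cannot map a single source word to a multiword target unit, which is consistent with the abstract's remark that lexical divergence needs a bilingual dictionary rather than EM alone.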
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Das, B.R., Maringanti, H.B., Dash, N.S. (2022). Application of Expectation–Maximization Algorithm to Solve Lexical Divergence in Bangla–Odia Machine Translation. In: Dehuri, S., Prasad Mishra, B.S., Mallick, P.K., Cho, SB. (eds) Biologically Inspired Techniques in Many Criteria Decision Making. Smart Innovation, Systems and Technologies, vol 271. Springer, Singapore. https://doi.org/10.1007/978-981-16-8739-6_39
DOI: https://doi.org/10.1007/978-981-16-8739-6_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8738-9
Online ISBN: 978-981-16-8739-6
eBook Packages: Intelligent Technologies and Robotics (R0)