Application of Expectation–Maximization Algorithm to Solve Lexical Divergence in Bangla–Odia Machine Translation

  • Conference paper
Biologically Inspired Techniques in Many Criteria Decision Making

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 271)

Abstract

This paper demonstrates word alignment between Bangla and Odia using the expectation–maximization (EM) algorithm, with high-accuracy output. The full mathematical calculation is worked through on a set of example Bangla–Odia sentences. Together with the ‘argmax function’, the EM algorithm finds the maximum-likelihood probability values that determine the mapping between the words of source and target language sentences. These values reveal the lexical relationship between the words of two parallel sentences: they indicate which word of the target language is aligned with which word of the source language. Because EM is an iterative procedure, the word relationships between the source and target languages are found by repeatedly recomputing probability values in terms of maximum likelihood estimation (MLE). EM is used to find the MLE or maximum a posteriori (MAP) estimate of the parameters of a probability model that depends on unobserved latent variables. For years, lexical alignment for translation has been one of the toughest challenges, because the process involves several machine learning algorithms and mathematical modeling. With these issues in mind, we describe the nature of the lexical problems that arise when analyzing bilingual translated texts between Bangla (the source language) and Odia (the target language). In word alignment, handling ‘word divergence’ or ‘lexical divergence’ is the main issue and a challenging task. It is not solved by the EM algorithm; it can be handled only with a bilingual dictionary, also called a lexical database, which is examined experimentally and tested only mathematically. Problems of word divergence are normally addressed at the phrase level using bilingual dictionaries or lexical databases. The basic challenge lies in identifying the single-word units of the source text that are converted into multiword units in the target text.
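
The iterative procedure the abstract describes can be illustrated with a minimal sketch. It assumes an IBM Model 1 style lexical translation model, which the abstract does not confirm as the authors' exact formulation. The tokens `bn1`…`bn4` and `od1`…`od4` are placeholders standing in for Bangla and Odia words, not real vocabulary, and the function names `train_em` and `align` are hypothetical.

```python
from collections import defaultdict


def train_em(corpus, iterations=10):
    """Estimate t[(tgt_word, src_word)] = P(tgt_word | src_word) by EM.

    corpus is a list of (source_tokens, target_tokens) sentence pairs.
    Assumption: an IBM Model 1 style model with no NULL source word.
    """
    src_vocab = {s for src, _ in corpus for s in src}
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform translation table.
    t = {(w, s): 1.0 / len(tgt_vocab) for w in tgt_vocab for s in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tgt, src) links
        total = defaultdict(float)  # expected count of links per src word

        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in corpus:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac

        # M-step: re-estimate probabilities from the expected counts.
        for (w, s) in t:
            t[(w, s)] = count[(w, s)] / total[s] if total[s] else 0.0
    return t


def align(src, tgt, t):
    """For each target word, pick the argmax source word under t."""
    return [(w, max(src, key=lambda s: t[(w, s)])) for w in tgt]


# Placeholder parallel corpus (Bangla side first, Odia side second).
corpus = [
    (["bn1", "bn2"], ["od1", "od2"]),
    (["bn1", "bn3"], ["od1", "od3"]),
    (["bn4", "bn3"], ["od4", "od3"]),
]
t = train_em(corpus)
alignment = align(["bn1", "bn2"], ["od1", "od2"], t)
print(alignment)
```

Because every target word independently picks one source word, several target words may select the same source word. That is the one-to-many pattern the abstract identifies as the core of lexical divergence, although the sketch makes no attempt to verify such links against a bilingual dictionary.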

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Das, B.R., Maringanti, H.B., Dash, N.S. (2022). Application of Expectation–Maximization Algorithm to Solve Lexical Divergence in Bangla–Odia Machine Translation. In: Dehuri, S., Prasad Mishra, B.S., Mallick, P.K., Cho, SB. (eds) Biologically Inspired Techniques in Many Criteria Decision Making. Smart Innovation, Systems and Technologies, vol 271. Springer, Singapore. https://doi.org/10.1007/978-981-16-8739-6_39
