Abstract
Although machine translation has historically relied on large amounts of parallel corpora, recent research has managed to train both neural and statistical machine translation (SMT) systems using monolingual corpora only. Despite the potential of this approach for low-resource settings, existing systems remain far behind their supervised counterparts, which limits their practical interest. In this paper, Sect. 1 addresses several deficiencies of existing unsupervised SMT approaches by exploiting subword information. Section 2 presents an alternative approach based on phrase-based statistical machine translation that significantly closes the gap with supervised systems. Section 3 covers principled unsupervised statistical machine translation, Sect. 4 presents results and discussion, and Sect. 5 concludes.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tarakeswara Rao, B., Patibandla, R.S.M.L., Murty, M.R. (2020). A Comparative Study on Effective Approaches for Unsupervised Statistical Machine Translation. In: Bhateja, V., Satapathy, S., Satori, H. (eds) Embedded Systems and Artificial Intelligence. Advances in Intelligent Systems and Computing, vol 1076. Springer, Singapore. https://doi.org/10.1007/978-981-15-0947-6_85
DOI: https://doi.org/10.1007/978-981-15-0947-6_85
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0946-9
Online ISBN: 978-981-15-0947-6
eBook Packages: Intelligent Technologies and Robotics (R0)