Abstract
Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore interesting and challenging. We discuss similarities and differences between Hebrew and Arabic, the benefits and challenges that they induce, respectively, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based and linguistically informed (but still statistical, and hence scalable) approach. We report preliminary results of the two systems we are currently developing, for translation in both directions.
Cite this article
Shilon, R., Habash, N., Lavie, A. et al. Machine translation between Hebrew and Arabic. Machine Translation 26, 177–195 (2012). https://doi.org/10.1007/s10590-011-9103-z
DOI: https://doi.org/10.1007/s10590-011-9103-z