Abstract
Arabic has a very rich morphology characterized by a combination of templatic and affixational morphemes, complex morphological rules, and a rich feature system. This complexity makes working with Arabic as a source or target language in machine translation (MT) a challenge for two reasons. First, it is not clear what the right representation is for Arabic words given a specific MT approach or system. Second, there are many MT-relevant resources for Arabic morphology, lexicography and syntax (e.g., morphological analyzers, dictionaries and treebanks) that adopt various representations that are not necessarily compatible with each other. As a result, MT researchers need to experiment with the multiple representations used by different resources or components and to relate them to each other within a single system. In this chapter, we describe different Arabic morphological representations used by MT-relevant natural language processing resources and tools, and we discuss their usability in different MT approaches. We also present a common framework for relating different levels of representation to each other.
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Habash, N. (2007). Arabic Morphological Representations for Machine Translation. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_14
DOI: https://doi.org/10.1007/978-1-4020-6046-5_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6045-8
Online ISBN: 978-1-4020-6046-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)