Improving Arabic Tokenization and POS Tagging Using Morphological Analyzer

Nawar, Michael N.

doi:10.1007/978-3-319-13461-1_6


Part of the book series: Communications in Computer and Information Science (CCIS, volume 488)

Conference series: International Conference on Advanced Machine Learning Technologies and Applications (AMLTA)


Abstract

This paper presents a new technique for tokenization and part-of-speech (POS) tagging of Arabic text. The technique uses an Arabic morphological analyzer to extract new features that improve stemming and POS tagging. Under standard evaluation metrics, the proposed tokenizer achieves an F-score (β = 1) of 99.99, and the POS tagger achieves an accuracy of 98.05%.
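The abstract reports results under standard metrics. The Python sketch below shows how an F-score with β = 1 (F_β = (1 + β²)PR / (β²P + R), which for β = 1 reduces to 2PR / (P + R)) and per-token tagging accuracy are conventionally computed. It is not the author's evaluation code: the page does not state the evaluation protocol, so the sketch assumes, hypothetically, that tokenization is scored by matching each word's gold and predicted segments as a multiset and summing counts over the corpus (micro-averaging). The function names, the transliterated segments, and the tag labels are illustrative only.

```python
from collections import Counter


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta = 1 gives F1."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


def tokenization_prf(gold_words: list[list[str]],
                     predicted_words: list[list[str]],
                     beta: float = 1.0) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over segmented words.

    Hypothetical protocol: each word is a list of segments (clitics and
    stem); a word's gold and predicted segments are matched as multisets,
    and the matched/predicted/gold counts are summed over the corpus.
    """
    if len(gold_words) != len(predicted_words):
        raise ValueError("gold and predicted must cover the same words")
    matched = n_pred = n_gold = 0
    for gold, pred in zip(gold_words, predicted_words):
        # Counter & Counter keeps the minimum count of each shared segment.
        matched += sum((Counter(gold) & Counter(pred)).values())
        n_pred += len(pred)
        n_gold += len(gold)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    return precision, recall, f_beta(precision, recall, beta)


def tagging_accuracy(gold_tags: list[str], predicted_tags: list[str]) -> float:
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not gold_tags:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


if __name__ == "__main__":
    # Illustrative Buckwalter-style segments: "wktbhA" -> w+ ktb +hA.
    gold = [["w+", "ktb", "+hA"], ["Alwld"]]
    pred = [["w+", "ktbhA"], ["Alwld"]]  # the tokenizer missed one split
    p, r, f = tokenization_prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571

    # Hypothetical tag labels, one per gold segment.
    acc = tagging_accuracy(["CONJ", "VERB", "PRON"], ["CONJ", "VERB", "NOUN"])
    print(f"accuracy={acc:.3f}")  # accuracy=0.667
```

In the toy run, missing the split of "ktbhA" costs both precision (one wrong predicted segment) and recall (two gold segments not produced), which is why a tokenizer score as high as 99.99 implies almost no missed or spurious splits. Published scores such as 99.99 and 98.05% are these fractions multiplied by 100.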


Similar content being viewed by others

A Light Arabic POS Tagger Using a Hybrid Approach

A hybrid Arabic POS tagging for simple and compound morphosyntactic tags (article, 08 October 2015)

Rule Based Part of Speech Tagger for Arabic Question Answering System



Author information

Authors and Affiliations

Department of Computer Engineering, Cairo University, Cairo, Egypt
Michael N. Nawar


Editor information

Editors and Affiliations

Aboul Ella Hassanien, Cairo University, Egypt
Mohamed F. Tolba, Ain Shams University, Cairo, Egypt
Ahmad Taher Azar, Benha University, Benha, Egypt


About this paper

Cite this paper

Nawar, M.N. (2014). Improving Arabic Tokenization and POS Tagging Using Morphological Analyzer. In: Hassanien, A.E., Tolba, M.F., Taher Azar, A. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2014. Communications in Computer and Information Science, vol 488. Springer, Cham. https://doi.org/10.1007/978-3-319-13461-1_6


DOI: https://doi.org/10.1007/978-3-319-13461-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13460-4
Online ISBN: 978-3-319-13461-1
eBook Packages: Computer Science, Computer Science (R0)
