Abstract
Part-of-speech (POS) tagging is the task of computationally determining which POS a word takes in a particular context. It is a useful preprocessing step in many natural language processing (NLP) applications. In this paper, we present a new Arabic POS tagger that combines two main modules: a first-order Markov model and a decision tree model. Together, these modules improve on existing POS taggers by making it possible to tag unknown words. The tagger uses an elementary tag set of four tags {noun, verb, particle, punctuation}, which is sufficient for some NLP applications and greatly helps to increase accuracy. The POS tagger was trained on the NEMLAR corpus. The experimental results demonstrate its effectiveness, with an overall accuracy of 98% for the full system.
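To make the combination described above concrete, the following is a minimal sketch of a first-order Markov (bigram) tagger decoded with Viterbi over the four-tag set, with a fallback for unknown words. It is an illustration under stated assumptions, not the authors' implementation: the toy training sentences are invented Buckwalter-style tokens rather than NEMLAR data, add-one smoothing is an arbitrary choice, and `guess_unknown` is a hand-written rule function standing in for the decision tree module, whose actual features are not given here.

```python
import math
from collections import Counter, defaultdict

TAGS = ["noun", "verb", "particle", "punctuation"]

# Hypothetical toy corpus (Buckwalter-style tokens); NOT taken from NEMLAR.
TRAIN = [
    [("ktb", "verb"), ("Alwld", "noun"), ("Aldrs", "noun"), (".", "punctuation")],
    [("*hb", "verb"), ("Alwld", "noun"), ("<lY", "particle"),
     ("Albyt", "noun"), (".", "punctuation")],
    [("fy", "particle"), ("Albyt", "noun"), ("ktAb", "noun"), (".", "punctuation")],
]


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing scheme)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def guess_unknown(word):
    """Hand-written stand-in for the decision tree (illustrative rules only)."""
    if not any(ch.isalnum() for ch in word):
        return "punctuation"
    if word.startswith("Al"):  # definite article prefix in Buckwalter
        return "noun"
    if len(word) <= 2:
        return "particle"
    return "verb"


def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram model."""
    vocab = {w for counts in emit.values() for w in counts}

    def emission(tag, word):
        if word in vocab:
            return log_prob(emit[tag], word, len(vocab))
        # Unknown word: put most of the mass on the fallback's guess.
        hit = guess_unknown(word) == tag
        return math.log(0.9 if hit else 0.1 / (len(TAGS) - 1))

    # Each column maps tag -> (best log score, backpointer to previous tag).
    best = [{t: (log_prob(trans["<s>"], t, len(TAGS)) + emission(t, words[0]), None)
             for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, len(TAGS)), p) for p in TAGS
            )
            column[t] = (score + emission(t, word), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


trans, emit = train(TRAIN)
# "Almdrs" does not occur in the toy corpus, so the fallback handles it.
print(viterbi(["ktb", "Almdrs", "Aldrs", "."], trans, emit))
```

In this sketch the unknown-word guess enters the decoder as a soft emission score rather than a hard decision, so the transition context can still override it. Whether the paper couples its two modules this way is not stated in the abstract.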
Notes
- 3. Transliterated using Buckwalter [6].
- 4. Affixes comprise prefixes, infixes, and suffixes: substrings that occur at the beginning, in the middle, and at the end of a word, respectively.
- 5. Corpus available at https://github.com/kdarwish/Farasa.
References
Al Shamsi FG (2006) A hidden Markov model-based POS tagger for Arabic. In: Proceeding of the 8th international conference on the statistical analysis of textual data, France, pp 31–42
Albared MO (2009) Arabic part of speech disambiguation, pp 517–532
Attia MM (2005) Specifications of the Arabic Written Corpus produced within the NEMLAR project
Atwell ES (2008) Development of tag sets for part-of-speech tagging
Atwell MS (2013) A standard tag set expounding traditional morphological features for Arabic language part-of-speech tagging. Edinburgh University Press
Buckwalter Arabic Transliteration (n.d.) https://www.qamus.org/transliteration.htm. Accessed 20 Oct 2020
Darwish K, Mubarak H, Abdelali A, Eldesouki M (2017) Arabic POS tagging: don’t abandon feature engineering just yet. In: Proceedings of the third Arabic natural language processing workshop, pp 130–137. https://doi.org/10.18653/v1/W17-1316
Diab M, Hacioglu K, Jurafsky D (2004) Automatic tagging of Arabic text: from raw text to base phrase chunks. In: Proceedings of HLT-NAACL 2004: short papers. Association for Computational Linguistics, pp 149–152
Dinçer BT, Karaoğlan B (2004) The effect of part-of-speech tagging on IR performance for Turkish. In: Aykanat C, Dayar T, Körpeoğlu İ (eds) Computer and Information Sciences—ISCIS 2004, Springer, pp 771–778. https://doi.org/10.1007/978-3-540-30182-0_77
Habash NF (2009) Syntactic annotation in Columbia Arabic Treebank. In: 2nd International Conference on Arabic Language Resources & Tools MEDAR. Cairo
Hammo B, Abu-Salem H, Lytinen SL, Evens M (2002) QARAB: a question answering system to support the Arabic language. In: Proceedings of the ACL-02 workshop on computational approaches to Semitic languages, July 2002
Imad Zeroual AL (2017) Towards a standard Part of Speech tagset for the Arabic language. J King Saud Univ Comput Inf Sci 171–178
Albared M, T-M O-S-A (2005) Probabilistic Arabic part of speech tagger with unknown words handling. J Theor Appl Inf Technol
Maamouri MA (2004) Developing an Arabic treebank: methods, guidelines, procedures, and tools. In: Proceedings of the 20th international conference on computational linguistics
Rabiner L, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16
Salameh S (2018) A review of part of speech tagger for Arabic Language. International Journal of Computation and Applied Sciences IJOCAAS, 4, 4–5, June 2018
Darwish K, Mubarak H (n.d.) Farasa: A New Fast and Accurate Arabic Word Segmenter. 5
Jaafar Y, Bouzoubaa K (2015) Arabic natural language processing from software engineering to complex pipelines. In: CICLing 2015, Cairo, Egypt, April 2015
Zouaghi A, Merhbene L, Zrigui M (2012) Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38:257–269. https://doi.org/10.1007/s10462-011-9249-3
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tnaji, K., Bouzoubaa, K., Aouragh, S.L. (2021). A Light Arabic POS Tagger Using a Hybrid Approach. In: Motahhir, S., Bossoufi, B. (eds) Digital Technologies and Applications. ICDTA 2021. Lecture Notes in Networks and Systems, vol 211. Springer, Cham. https://doi.org/10.1007/978-3-030-73882-2_19
DOI: https://doi.org/10.1007/978-3-030-73882-2_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73881-5
Online ISBN: 978-3-030-73882-2
eBook Packages: Intelligent Technologies and Robotics (R0)