
Neural Network for Arabic Text Diacritization on a New Dataset

  • Conference paper
Proceedings of the 6th International Conference on Big Data and Internet of Things (BDIoT 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 625)


Abstract

Arabic is one of the most widely spoken languages in the world. It is the official language of many countries and the fourth most used language on the internet. Arabic text is usually written without diacritic marks, even though these marks are important for clarifying the meaning of words. Automatic diacritization is the task of assigning diacritics to letters, and it is an important problem in Arabic Natural Language Processing (ANLP). In this work, we study the effect of enlarging the training dataset on the diacritization error rate (DER) by building a new dataset and concatenating it with the Tashkeela dataset. We trained a deep learning model based on bidirectional long short-term memory (BLSTM) that takes an undiacritized sequence of Arabic letters and produces a fully diacritized output sequence of the same length. Our model shows significant results on the new dataset in terms of DER and validation loss.
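Because the model maps an undiacritized letter sequence to an output of the same length, diacritization can be treated as per-character tagging: each base character receives one diacritic label, and DER is the fraction of letters whose predicted label is wrong. The sketch below illustrates that framing. It is not the authors' code. The helper names `split_diacritics` and `der` are hypothetical. The following are all assumptions: the Unicode ranges used (diacritics U+064B to U+0652, base letters U+0621 to U+064A), the sorting of combined marks so that shadda+fatha equals fatha+shadda, and the choice to score only Arabic letters while still counting letters with no diacritic. Published DER definitions differ on such details.

```python
"""Sketch: per-letter diacritic labels and a simple DER (assumptions noted inline)."""

# Assumption: the eight common Arabic diacritics U+064B..U+0652
# (three tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def is_arabic_letter(ch: str) -> bool:
    # Assumption: base Arabic letters occupy U+0621..U+064A.
    return "\u0621" <= ch <= "\u064A"


def split_diacritics(text: str) -> tuple[str, list[str]]:
    """Return the undiacritized text and one diacritic label per remaining character.

    The string and the label list have the same length, which is what lets a
    sequence tagger (such as a BLSTM) predict one label per input character.
    """
    letters: list[str] = []
    labels: list[str] = []
    for ch in text:
        if ch in DIACRITICS:
            if letters:  # attach the mark to the preceding base character
                # Sort so that shadda+fatha and fatha+shadda compare equal.
                labels[-1] = "".join(sorted(labels[-1] + ch))
            continue  # a mark with no preceding character is dropped
        letters.append(ch)
        labels.append("")  # empty label means "no diacritic"
    return "".join(letters), labels


def der(reference: str, predicted: str) -> float:
    """Fraction of Arabic letters whose diacritic label differs from the reference."""
    ref_letters, ref_labels = split_diacritics(reference)
    pred_letters, pred_labels = split_diacritics(predicted)
    if ref_letters != pred_letters:
        raise ValueError("reference and prediction must share the same base letters")
    total = errors = 0
    for ch, ref_label, pred_label in zip(ref_letters, ref_labels, pred_labels):
        if not is_arabic_letter(ch):
            continue  # spaces, digits and punctuation are not scored
        total += 1
        errors += ref_label != pred_label
    return errors / total if total else 0.0
```

For example, comparing the reference "\u0643\u064E\u062A\u064E\u0628\u064E" (kataba) with a prediction that puts a kasra on the second letter gives one wrong label out of three letters, so `der` returns 1/3. Keeping the base letters unchanged and scoring only the labels mirrors the same-length input/output formulation described in the abstract.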



Author information


Correspondence to Zubeiri Iman.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Iman, Z., Adnan, S., Eddine, E.M.B. (2023). Neural Network for Arabic Text Diacritization on a New Dataset. In: Lazaar, M., En-Naimi, E.M., Zouhair, A., Al Achhab, M., Mahboub, O. (eds) Proceedings of the 6th International Conference on Big Data and Internet of Things. BDIoT 2022. Lecture Notes in Networks and Systems, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-031-28387-1_17
