
Natural Language Processing Based Approach to Overcome Arabizi and Code Switching in Social Media Moroccan Dialect

  • Conference paper
  • Advances in Information, Communication and Cybersecurity (ICI2C 2021)

Abstract

Over the past decade, social networks have come into widespread use and have given people a convenient place to share information and express opinions. The massive volume of data they generate is an opportunity to extract valuable knowledge about people's needs and behaviours, and Sentiment Analysis techniques are widely used for this purpose. These techniques achieve high accuracy on widely used languages such as English, Spanish and French, but they are still at an early stage of development for Modern Standard Arabic (MSA) and the dialects derived from it. For the Moroccan Dialect used in Social Media, the first main challenge is Code Switching, in which two or more languages (Arabic, Tamazight, French, English or Spanish) appear in the same sentence. The second is Arabizi, in which words are written in Latin script combined with numerals instead of Arabic characters. As a consequence, preprocessing has become one of the most important steps of data analysis. This paper proposes a new method based on Natural Language Processing (NLP) to address the challenges of preprocessing text that contains Arabizi and Code Switching. We aim to build a multilingual corpus that includes linguistic features and reflects the structure of text written in Social Media Moroccan Dialect.
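To make the two challenges concrete, here is a minimal heuristic sketch that labels each token of a post by script (Arabic, Arabizi, Latin) and flags posts that mix scripts. It is not the method proposed in the paper. The digit-to-letter table (2 → ء, 3 → ع, 5 → خ, 7 → ح, 9 → ق) is an assumption based on a commonly used Arabizi convention that varies between writers, and the function names are hypothetical.

```python
import re
import unicodedata

# Assumed Arabizi convention: digits standing in for Arabic letters.
# Illustrative only; real usage varies between writers and regions.
ARABIZI_DIGITS = {
    "2": "\u0621",  # hamza
    "3": "\u0639",  # ain
    "5": "\u062e",  # kha
    "7": "\u062d",  # ha
    "9": "\u0642",  # qaf
}


def token_script(token: str) -> str:
    """Heuristically label a token 'arabic', 'arabizi', 'latin' or 'other'."""
    has_arabic = has_latin = False
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            has_arabic = has_arabic or name.startswith("ARABIC")
            has_latin = has_latin or name.startswith("LATIN")
    if has_arabic:
        return "arabic"
    if has_latin:
        # Latin letters combined with digits that stand for Arabic letters.
        if any(ch in ARABIZI_DIGITS for ch in token):
            return "arabizi"
        return "latin"
    # No letters at all (e.g. a year or a price).
    return "other"


def tag_tokens(text: str) -> list[tuple[str, str]]:
    """Split the text into word tokens and label each one by script."""
    return [(tok, token_script(tok)) for tok in re.findall(r"\w+", text)]


def is_code_switched(text: str) -> bool:
    """True if the text mixes at least two script classes (ignoring 'other')."""
    labels = {label for _, label in tag_tokens(text)} - {"other"}
    return len(labels) > 1


def arabizi_digits_to_arabic(token: str) -> str:
    """Replace only the Arabizi digits; Latin letters are left for a later step."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in token)


if __name__ == "__main__":
    post = "Salam 3likom, le film كان مزيان bzaf"
    for token, label in tag_tokens(post):
        print(token, label)
    print("code switched:", is_code_switched(post))
```

Because the labels are script classes rather than languages, a Latin-script Darija word written without digits (such as bzaf) is labelled `latin` exactly like the French le or film. Separating those would require a token-level language identifier on top of this step. Tokens such as 2021 contain no letters and are labelled `other`, so plain numerals are not mistaken for Arabizi.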

Author information

Corresponding author: Soufiane Hajbi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hajbi, S., Chihab, Y., Ed-Dali, R., Korchiyne, R. (2022). Natural Language Processing Based Approach to Overcome Arabizi and Code Switching in Social Media Moroccan Dialect. In: Maleh, Y., Alazab, M., Gherabi, N., Tawalbeh, L., Abd El-Latif, A.A. (eds) Advances in Information, Communication and Cybersecurity. ICI2C 2021. Lecture Notes in Networks and Systems, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-91738-8_6