Natural Language Processing Based Approach to Overcome Arabizi and Code Switching in Social Media Moroccan Dialect

Hajbi, Soufiane; Chihab, Younes; Ed-Dali, Rachid; Korchiyne, Redouan

doi:10.1007/978-3-030-91738-8_6

Soufiane Hajbi (ORCID: orcid.org/0000-0002-4504-5363), Younes Chihab, Rachid Ed-Dali & Redouan Korchiyne

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 357)

Included in the following conference series:

The International Conference on Information, Communication & Cybersecurity

Abstract

Over the past decade, social networks have become a common and convenient place for people to share information and express opinions. The massive volume of data they generate offers a good opportunity to extract valuable knowledge about people’s needs and behaviours. Sentiment Analysis techniques are widely used for this purpose. They give very accurate results when applied to widely used languages such as English, Spanish or French, but they are still at the development stage for Modern Standard Arabic (MSA) and its derived dialects. For the Moroccan Dialect used in Social Media, the first challenge is the phenomenon of Code Switching, in which two or more languages (Arabic, Tamazight, French, English or Spanish) appear in the same sentence. The second is Arabizi, the writing of words in Latin script combined with numbers instead of Arabic characters. As a consequence, preprocessing has become one of the important steps of data analysis. This paper proposes a new method based on Natural Language Processing (NLP) to address the challenges of preprocessing text that contains Arabizi and Code Switching forms. We aim to build a multilingual corpus that includes linguistic features and reflects the structure of text written in Social Media Moroccan Dialect.
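
To make the two challenges concrete, the following is a minimal Python sketch of a script-based token tagger. It is not the method proposed in the paper, whose details are not given in the abstract. The digit-to-letter table reflects conventions often seen in Arabizi writing and is an assumption, as are the function names `tag_token`, `tag_sentence` and `expand_digits` and the tag labels.

```python
import re

# Assumed digit-to-letter conventions often seen in Arabizi text.
# This table is illustrative and is not taken from the paper.
ARABIZI_DIGITS = {"2": "ء", "3": "ع", "5": "خ", "7": "ح", "9": "ق"}

ARABIC_RE = re.compile(r"[\u0600-\u06FF]")  # basic Arabic Unicode block
LATIN_RE = re.compile(r"[A-Za-z]")          # unaccented Latin letters


def tag_token(token):
    """Return a coarse script tag for one whitespace-delimited token."""
    if ARABIC_RE.search(token):
        return "arabic_script"
    has_latin = bool(LATIN_RE.search(token))
    has_digit = any(ch in ARABIZI_DIGITS for ch in token)
    if has_latin and has_digit:
        # Latin letters mixed with digits: a candidate Arabizi word.
        return "arabizi_candidate"
    if has_latin:
        # Could be French, English, Spanish, or digit-free Arabizi.
        return "latin_script"
    return "other"


def tag_sentence(text):
    """Tag every token, exposing script changes inside one sentence."""
    return [(tok, tag_token(tok)) for tok in text.split()]


def expand_digits(token):
    """Replace Arabizi digits with the assumed Arabic letters."""
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in token)


if __name__ == "__main__":
    for tok, tag in tag_sentence("salam 3lik merci"):
        print(tok, tag)
```

A tagger like this only separates scripts. Deciding whether a Latin-script token is French, English, Spanish or digit-free Arabizi would need a language identification step, which this sketch leaves out.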

Notes

1. Arabic transliteration according to the Buckwalter system. Retrieved June 07, 2021, from http://www.qamus.org/transliteration.htm.
2. Alexa Ranking: Top social media sites in Morocco, http://www.alexa.com/topsites/countries/MA, visited May 07, 2021.

Author information

Authors and Affiliations

Laboratory of Computer Sciences, Ibn Tofail University, Kenitra, Morocco
Soufiane Hajbi, Younes Chihab & Redouan Korchiyne
Faculty of Letters and Humanities, Cadi Ayyad University, Marrakesh, Morocco
Rachid Ed-Dali

Corresponding author

Correspondence to Soufiane Hajbi.

Editor information

Editors and Affiliations

Sultan Moulay Slimane University, Beni Mellal, Morocco
Yassine Maleh
Charles Darwin University, Darwin, NT, Australia
Mamoun Alazab
Sultan Moulay Slimane University, Béni Mellal, Morocco
Noreddine Gherabi
San Antonio One University Way, Texas A&M University, San Antonio, TX, USA
Lo’ai Tawalbeh
Gamal Abd El-Nasir, Menoufia University, Menofia Governorate, Egypt
Ahmed A. Abd El-Latif

About this paper

Cite this paper

Hajbi, S., Chihab, Y., Ed-Dali, R., Korchiyne, R. (2022). Natural Language Processing Based Approach to Overcome Arabizi and Code Switching in Social Media Moroccan Dialect. In: Maleh, Y., Alazab, M., Gherabi, N., Tawalbeh, L., Abd El-Latif, A.A. (eds) Advances in Information, Communication and Cybersecurity. ICI2C 2021. Lecture Notes in Networks and Systems, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-91738-8_6

DOI: https://doi.org/10.1007/978-3-030-91738-8_6
Published: 12 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91737-1
Online ISBN: 978-3-030-91738-8
eBook Packages: Intelligent Technologies and Robotics (R0)
