Machine Learning for Hate Speech Detection in Arabic Social Media

Chapter in: Computational Intelligence in Recent Communication Networks

Abstract

(WARNING: This paper may contain some offensive words)

Over the past few years, abusive language and cyberbullying have increased markedly on social media. This phenomenon has encouraged efforts to propose solutions able to detect and prohibit such behavior. Most of these solutions are dedicated to English; those that can handle Arabic are, to the best of our knowledge, rare. Several reasons lie behind this situation, including the informality and ambiguity of the Arabic dialects, as well as the use of Arabic/Arabizi combinations. In this paper, we use a collection of Arabic YouTube comments annotated as either “hateful” or “inoffensive” to compare how well five machine learning algorithms classify hateful Arabic comments. The algorithms are Logistic Regression, Naïve Bayes, Random Forests, Support Vector Machines, and Long Short-Term Memory. The performance metrics are Accuracy, F1-Score, Precision, and Recall.
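As an illustration of the four metrics named above, the following minimal sketch computes Accuracy, Precision, Recall, and F1-Score for a binary “hateful”/“inoffensive” labeling, treating “hateful” as the positive class. The function name `evaluate`, the `BinaryMetrics` container, and the toy labels are assumptions made for this example; they are not taken from the chapter, and the numbers are not the chapter's results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive="hateful"):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # hateful comment correctly flagged
        elif pred == positive:
            fp += 1  # inoffensive comment wrongly flagged
        elif truth == positive:
            fn += 1  # hateful comment missed
        else:
            tn += 1  # inoffensive comment correctly passed

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (assumed, not from the chapter's dataset).
y_true = ["hateful"] * 3 + ["inoffensive"] * 5
y_pred = ["hateful", "hateful", "inoffensive",
          "hateful", "inoffensive", "inoffensive",
          "inoffensive", "inoffensive"]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a comparison like the one described, the same function could be applied to the predictions of each of the five classifiers on a shared held-out set, so that all models are scored on identical comments.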

Notes

  1. [Accessed: 28-Feb-2021]

  2. https://scikit-learn.org/stable/ [Accessed: 28-Feb-2021]

  3. https://www.tensorflow.org/ [Accessed: 28-Feb-2021]

  4. https://colab.research.google.com/ [Accessed: 28-Feb-2021]

Author information

Corresponding author

Correspondence to Zakaria Boulouard.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Boulouard, Z., Ouaissa, M., Ouaissa, M. (2022). Machine Learning for Hate Speech Detection in Arabic Social Media. In: Ouaissa, M., Boulouard, Z., Ouaissa, M., Guermah, B. (eds) Computational Intelligence in Recent Communication Networks. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-77185-0_10
