Abstract
(WARNING: This paper may contain some offensive words)
Over the past few years, abusive language and cyberbullying have increased sharply on social media. This phenomenon has encouraged efforts to propose solutions that can detect and prevent such behavior. Most of these solutions target English; those that can handle Arabic are, to the best of our knowledge, rare. Several reasons lie behind this situation, including the informality and ambiguity of Arabic dialects and the use of combinations of Arabic and Arabizi. In this paper, we use a collection of Arabic YouTube comments annotated as either “hateful” or “inoffensive” to compare how well five machine learning algorithms classify hateful Arabic comments. The algorithms are Logistic Regression, Naïve Bayes, Random Forests, Support Vector Machines, and Long Short-Term Memory. The performance metrics are Accuracy, F1-Score, Precision, and Recall.
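The four performance metrics named above can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch, assuming “hateful” is treated as the positive class; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the chapter, which may well rely on a library implementation instead.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "hateful") -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # hateful, flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # inoffensive, flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # hateful, missed
    tn = len(pairs) - tp - fp - fn                               # inoffensive, passed

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels invented for illustration only (assumption, not data from the chapter).
gold = ["hateful", "hateful", "inoffensive", "inoffensive", "hateful", "inoffensive"]
pred = ["hateful", "inoffensive", "inoffensive", "hateful", "hateful", "inoffensive"]
scores = binary_metrics(gold, pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters when the two classes are imbalanced, since a classifier that rarely predicts “hateful” can still score a high accuracy.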
Notes
- 1. [Accessed: 28-Feb-2021]
- 2. https://scikit-learn.org/stable/ [Accessed: 28-Feb-2021]
- 3. https://www.tensorflow.org/ [Accessed: 28-Feb-2021]
- 4. https://colab.research.google.com/ [Accessed: 28-Feb-2021]
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Boulouard, Z., Ouaissa, M., Ouaissa, M. (2022). Machine Learning for Hate Speech Detection in Arabic Social Media. In: Ouaissa, M., Boulouard, Z., Ouaissa, M., Guermah, B. (eds) Computational Intelligence in Recent Communication Networks. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-77185-0_10
DOI: https://doi.org/10.1007/978-3-030-77185-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77184-3
Online ISBN: 978-3-030-77185-0
eBook Packages: Intelligent Technologies and Robotics (R0)