Abstract
Social media is rapidly growing in popularity and brings both advantages and disadvantages. Users who post daily updates and opinions on social media may inadvertently hurt the feelings of others. Detecting hate speech and harmful content on social media is therefore critical, lest it lead to calamity. In this research, machine learning classifiers (Naïve Bayes, support vector machines, and logistic regression) and the pre-trained models BERT and RoBERTa, developed by Google and Facebook, respectively, are used to detect hate speech and offensive content in a newly created dataset comprising tweets and articles/blogs. Sentiments were obtained using the VADER sentiment analyzer. The results show that the pre-trained models outperformed the machine learning classifiers used in this study. BERT and RoBERTa achieved accuracies of 96% and 93%, respectively, on the tweet dataset, and 97% and 98%, respectively, on the articles/blogs dataset. The results also indicate that neutral content is shared more in articles/blogs, hate content is shared about equally in tweets and articles/blogs, and offensive content is shared more in tweets than in articles/blogs.
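As a rough illustration of the classical baseline family named in the abstract, the sketch below implements a small multinomial Naïve Bayes text classifier with add-one smoothing over three classes (hate, offensive, neutral). It is not the authors' implementation: the names `tokenize`, `NaiveBayes`, and `accuracy` are hypothetical, the whitespace tokenizer is an assumption, and the six training sentences are invented toy examples rather than data from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not given here.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    # Fraction of texts whose predicted label matches the given label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "those people are vermin and should leave",
    "that group does not deserve rights",
    "you are a stupid idiot",
    "shut up you clown",
    "the weather is nice today",
    "the new library opens on monday",
]
train_labels = ["hate", "hate", "offensive", "offensive", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("you stupid clown"))
print(accuracy(model, train_texts, train_labels))
```

Accuracy here is the same metric the abstract reports, but on six invented sentences it says nothing about real performance. A meaningful figure needs a held-out test split of a labelled dataset.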
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shah, S.M.A., Singh, S. (2023). Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers. In: Saini, H.S., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. ICICSE 2022. Lecture Notes in Networks and Systems, vol 565. Springer, Singapore. https://doi.org/10.1007/978-981-19-7455-7_17
DOI: https://doi.org/10.1007/978-981-19-7455-7_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7454-0
Online ISBN: 978-981-19-7455-7
eBook Packages: Intelligent Technologies and Robotics (R0)