Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers

Shah, Seyed Muzaffar Ahmad; Singh, Satwinder

doi:10.1007/978-981-19-7455-7_17

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 565)

Included in the following conference series:

International Conference on Innovations in Computer Science and Engineering

Abstract

Social media is rapidly growing in popularity and has both advantages and disadvantages. Users posting their daily updates and opinions may inadvertently hurt the feelings of others, so detecting hate speech and other harmful content on social media has become critical, lest it lead to calamity. In this research, machine learning classifiers (Naïve Bayes, support vector machines, and logistic regression) and the pre-trained models BERT and RoBERTa, developed by Google and Facebook, respectively, are used to detect hate speech and offensive content in a newly created dataset comprising tweets and articles/blogs. Sentiments were obtained using the VADER sentiment analyzer. The results show that the pre-trained models outperformed the conventional machine learning classifiers used in this study. On the tweet dataset, BERT and RoBERTa achieved accuracies of 96% and 93%, respectively; on the articles/blogs dataset, they achieved 97% and 98%, respectively, outperforming the other classifiers used in this work. The results further indicate that neutral content is shared more in articles/blogs, hate content is shared roughly equally in tweets and articles/blogs, and offensive content is shared more in tweets than in articles/blogs.
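To make the conventional baseline concrete, the following is a minimal, self-contained sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, plus an accuracy metric, in plain Python. It is not the authors' implementation: the tokenizer, the class name `NaiveBayesTextClassifier`, the three labels (hate, offensive, neutral), and the toy training sentences are all hypothetical, and a practical pipeline would more likely rely on a library such as scikit-learn with TF-IDF features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)        # documents per class, used for priors
        self.word_counts = defaultdict(Counter)  # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens): log prior plus smoothed log likelihoods."""
        score = math.log(self.doc_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # skip tokens never seen during training
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict_one(self, text):
        tokens = tokenize(text)
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data; "group x" stands in for a targeted group.
TRAIN = [
    ("great match today well played", "neutral"),
    ("lovely weather for a walk today", "neutral"),
    ("you are a stupid idiot", "offensive"),
    ("shut up you stupid clown", "offensive"),
    ("group x are vermin and must go", "hate"),
    ("get group x out they are vermin", "hate"),
]

if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    clf = NaiveBayesTextClassifier().fit(texts, labels)
    test_texts = ["what a lovely day for a walk", "you stupid clown", "group x are vermin"]
    test_labels = ["neutral", "offensive", "hate"]
    preds = clf.predict(test_texts)
    print(preds, accuracy(test_labels, preds))
```

Scoring in log space avoids numerical underflow when many small probabilities are multiplied, and add-one smoothing keeps a word unseen in one class from forcing that class's probability to zero. Accuracy on a toy set like this says nothing about the figures reported in the abstract.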

Author information

Authors and Affiliations

Department of Computer Science and Technology, Central University of Punjab, Bathinda, India
Seyed Muzaffar Ahmad Shah & Satwinder Singh

Corresponding author

Correspondence to Satwinder Singh.

Editor information

Editors and Affiliations

Guru Nanak Institutions, Ibrahimpatnam, Telangana, India
H. S. Saini
Guru Nanak Institutions, Ibrahimpatnam, India
Rishi Sayal
Jawaharlal Nehru Technological University, Hyderabad, Telangana, India
A. Govardhan
Cloud Computing, University of Melbourne, Melbourne, VIC, Australia
Rajkumar Buyya

About this paper

Cite this paper

Shah, S.M.A., Singh, S. (2023). Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers. In: Saini, H.S., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. ICICSE 2022. Lecture Notes in Networks and Systems, vol 565. Springer, Singapore. https://doi.org/10.1007/978-981-19-7455-7_17

DOI: https://doi.org/10.1007/978-981-19-7455-7_17
Published: 04 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7454-0
Online ISBN: 978-981-19-7455-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
