Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers

  • Conference paper
Innovations in Computer Science and Engineering (ICICSE 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 565)

Abstract

Social media is rapidly growing in popularity and has both advantages and disadvantages. Users who post daily updates and opinions on social media may inadvertently hurt the feelings of others. Detecting hate speech and harmful content on social media is therefore critical, lest it lead to calamity. In this research, the machine learning classifiers Naïve Bayes, support vector machines, and logistic regression, together with the pre-trained models BERT and RoBERTa, developed by Google and Facebook, respectively, are used to detect hate speech and offensive content in a newly created dataset comprising tweets and articles/blogs. The sentiments were obtained using the VADER sentiment analyzer. The results show that the pre-trained models outperformed the machine learning classifiers used in this study. BERT and RoBERTa achieved accuracies of 96% and 93%, respectively, on the tweet dataset, and 97% and 98%, respectively, on the articles/blogs dataset. The results further indicate that neutral content is shared more in articles/blogs, hate content is shared about equally in tweets and articles/blogs, and offensive content is shared more in tweets than in articles/blogs.
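As a rough illustration of the kind of classical baseline the abstract names, the sketch below implements a small multinomial Naïve Bayes text classifier over the three classes discussed (hate, offensive, neutral). It is not the authors' pipeline: the training sentences, the `GROUP_X` placeholder, the whitespace tokenizer, and the class and function names are all assumptions made for illustration, and a real study would use a proper dataset, feature extraction, and evaluation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for c in self.token_counts.values() for tok in c}
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:              # ignore unseen tokens
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy examples; GROUP_X is a placeholder, not real data.
TRAIN = [
    ("people from GROUP_X should be banned they are vermin", "hate"),
    ("GROUP_X are vermin and do not belong here", "hate"),
    ("you are a total idiot shut up", "offensive"),
    ("shut up you stupid idiot", "offensive"),
    ("the weather is lovely today", "neutral"),
    ("reading a good book about the weather", "neutral"),
]

texts, labels = zip(*TRAIN)
model = MultinomialNB(alpha=1.0).fit(texts, labels)
print(model.predict("you idiot shut up"))
```

With only six training sentences the model does little more than memorize vocabulary, so the sketch shows the mechanics of the classifier and says nothing about the accuracies reported above.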

Corresponding author

Correspondence to Satwinder Singh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Shah, S.M.A., Singh, S. (2023). Hate Speech and Offensive Language Detection in Twitter Data Using Machine Learning Classifiers. In: Saini, H.S., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. ICICSE 2022. Lecture Notes in Networks and Systems, vol 565. Springer, Singapore. https://doi.org/10.1007/978-981-19-7455-7_17
