1 Introduction

India is one of the largest and fastest-growing markets for digital consumers. This rapid expansion of the digital economy was driven mostly by urban consumers. However, with the government's push for Digital India, rural India has also begun to embrace the digital economy. As the number of Internet users from rural India has grown, it has become common for people to communicate in their regional language or to blend it with English during conversation and information exchange. Furthermore, because of lower literacy levels and a lack of cyber awareness, these users are easily induced to spread toxic content such as Hate speech and fake news through social media platforms. Moreover, India has 22 official languages in which people communicate, so research into controlling the transmission of hateful content in these code-mixed regional languages is a major step for multilingual and multicultural countries like India. It also provides an ideal test bed for the research community to investigate the spread of harmful content through social media platforms. Furthermore, the government lacks media policies to prevent the spread of such harmful content. Though the government has recently attempted to impose certain regulations, enforcing these laws in a country with such a huge and diverse population is extremely difficult. There is thus a definite need to develop automated solutions to tackle the spread of hateful content across regional and code-mixed languages. Bokamba (1989) defines code-mixing as the mixing of words, phrases, and sentences from two grammatical subsystems within the same utterance.

Some recent events caused by the spread of hateful news through social media give a glimpse of the gravity of the problem. The 2020 Delhi riots, which resulted in the deaths of 53 innocent people, are living proof of how hateful news can be exploited to destabilize a region or ruin social harmony. In addition, a new trending hashtag appeared on Twitter at the end of March 2020, blaming a religious group for the spread of Covid-19 in India. It had been seen more than 300,000 times on Twitter by the beginning of April, with a potential audience of 165 million, demonstrating how quickly Hate speech can spread on social media. Hate-mongering on social media also reached its peak during the recent elections in West Bengal, India, which ended in violence between workers of two opposing parties and the deaths of several innocent people. More recently, inhabitants of northeast India suffered racial discrimination during the rise of Covid-19 due to hateful propaganda published against them on social media. These events have emphasized the need to prevent the transmission of hateful news on social media, a problem that is gaining greater attention among researchers and academics.

Because of the widespread availability of resources, previous research on Hate speech identification has mainly focused on high-resource monolingual languages such as English Khan et al. (2020); Mossie and Wang (2020); Senarath and Purohit (2020). Among these, neural network-based models have obtained state-of-the-art results on various natural language processing tasks. Several neural architectures, including RNN Bisht et al. (2020), CNN Khan et al. (2020), and transformer Banerjee et al. (2021); Biradar et al. (2021); Biradar and Saumya (2022) models, have been explored for Hate speech identification. However, code-mixing has recently become popular on social media, especially in multilingual countries like India, where more than 350 million individuals speak Hinglish. Due to a lack of grammar Lal et al. (2019) and informal transliteration Singh et al. (2018), identifying language cues and establishing a contextually robust representation for code-mixed texts remain fundamental difficulties in NLP. The following are some examples of code-mixed texts.

T1: ‘neeraj ka nam humesha yaad rahega because he won the first gold medal for India in athletics!!!’ (the Hindi part translates to ‘Neeraj’s name will always be remembered’).

T2: ‘muje apane manager se bahut nafarat hai, I want to kill him.’ (the Hindi part translates to ‘I hate my manager very much’).

Of the preceding examples, T1 contains normal speech, whereas T2 is an instance of Hate speech.

Several off-the-shelf tools, such as IndicNLP Kunchukuttan et al. (2020), iNLTK Arora (2020), and Stanza Qi et al. (2020), have been developed in recent times to handle regional languages. However, they work mainly on monolingual data such as Hindi, Bengali, and Marathi, and cannot handle code-mixed bilingual text. Finding Hate speech in code-mixed data is more difficult because of shortened word forms and spelling variations. More recently, a few researchers have tried to develop models for handling code-mixed text, such as CS-ELMo, a neural architecture for transfer learning from an ELMo model pre-trained on English to code-mixed texts Aguilar and Solorio (2020). A bilingual word embedding model was also proposed by Pratapa et al. (2018) to handle code-mixed data. These models handle different aspects of NLP tasks; however, no comprehensive work has been done to identify hateful content in code-mixed data. Hence, to explore the challenges of code-mixed scenarios, in this work we experiment with various language models as well as transformer models; we also propose a Transformer-based Interpreter and Feature extraction model on a Deep Neural Network (TIF-DNN), as explained in Sect. 3.

The main contributions of the paper include:

1. TIF-DNN, a Transformer-based Interpreter and Feature extraction model on a Deep Neural Network for Hate speech identification in code-mixed Hinglish, has been developed.

2. The efficiency of the proposed model is demonstrated by comparing it with existing methods.

3. The performance of different off-the-shelf language and transformer models is demonstrated for Hate speech identification on code-mixed data.

The remainder of the article is structured as follows: Sect. 2 gives a summary of the background literature. Sect. 3 contains specifics of the suggested approach and the data set. Sect. 4 discusses the experimental setup and results. Finally, Sect. 5 presents a discussion and the limitations of our model, and Sect. 6 concludes the article.

2 Literature review

Most prior work on sentiment analysis has been done primarily on high-resource languages such as English. However, code-mixed languages have received little attention due to their non-standard writing style and a shortage of data sets on which to train models. As a result, researchers have only recently begun to investigate code-mixed data. The following are some of the approaches used to handle code-mixed data.

2.1 Using handcrafted linguistic features

The first such attempt was made by Bohra et al. (2018), who provided an annotated corpus of Hindi–English code-mixed text comprising tweet ids and the accompanying annotations. They also demonstrated a supervised method for detecting Hate speech in code-mixed text, employing character n-grams, word n-grams, punctuation, negation words, and hate lexicons as classification features. Furthermore, some researchers used Logistic Regression and multinomial Naïve Bayes to analyze statistical features such as character n-grams, word uni-grams, and word bi-grams. The experimental investigations revealed that character n-grams and word uni-grams performed better when classified using Logistic Regression on a Hindi data set. In addition, the authors used pre-trained Word2Vec embeddings for the English data set Samghabadi et al. (2018).

Ghosh et al. performed sentiment identification on code-mixed text data derived from social media. Their experiment used two code-mixed data sets, English–Bengali and English–Hindi, and classified the data according to the polarity expressed in the statement, as positive, negative, or neutral. SentiWordNet, an opinion lexicon, and Part-of-speech (POS) tags were employed, and a multilayer perceptron model was used to classify the polarity, achieving 68.5% accuracy Ghosh et al. (2017). Si et al. used statistical features such as TF-IDF and linguistic features such as emojis, part of speech, and emotion scores to evaluate the performance of machine learning classifiers such as the XGBoost Classifier, Gradient Boosting Classifier (GBM), and Support Vector Machine (SVM) on three different data sets: English, Hindi, and Hinglish code-mixed. They obtained F1-scores of 68.13%, 54.82%, and 55.31% for the English, Hindi, and code-mixed data sets, respectively Si et al. (2019).

2.2 Using deep learning models

Recently, deep learning-based models have improved upon the performance of handcrafted-feature models, and a substantial amount of work has been done using deep learning models to detect hateful and inflammatory content. Mathur et al. used a CNN-based transfer learning approach to detect abusive tweets and also introduced the HEOT dataset and the Profanity Lexicon Set Mathur et al. (2018). In addition, Mathur et al. (2018) classified Hate speech in Hinglish using a Multi-Input Multi-Channel transfer learning architecture based on a CNN-LSTM network. Kamble and Joshi (2018) and Kumar et al. (2020) built domain-specific word embeddings to detect Hate speech in Hindi code-mixed data, applied CNN, LSTM, and BiLSTM classifiers, and found that word-level features contribute the most to detecting Hate speech.

Santosh et al. worked with existing code-mixed data sets for Hate speech identification using two architectures: a sub-word-level LSTM model and a hierarchical LSTM model with attention based on phonemic sub-words (Santosh and Aravind 2019). Chopra et al. demonstrated how targeted hate embeddings combined with social network-based features outperform existing state-of-the-art models, both quantitatively and qualitatively (Chopra et al. 2020). Chakravarthi et al. (2020), Kumar et al. (2020), and Saumya et al. (2021) presented code-mixed data sets for the Malayalam–English language pair obtained from offensive comments on YouTube and Twitter, and achieved a baseline F1 score of 75% using the BERT transformer model.

2.3 Using transformer models

Transformers are deep learning models developed with the attention mechanism in mind. The transformer model relies heavily on self-attention and feed-forward neural networks, employing a series of self-attention layers, feed-forward networks, and layer normalizations to encode the provided input text Vaswani et al. (2017). Transformers have achieved state-of-the-art results on various natural language processing tasks.
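At the core of each transformer block is scaled dot-product self-attention, which Vaswani et al. (2017) define as

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_{k}}}\right)V \]

where \(Q\), \(K\), and \(V\) are the query, key, and value matrices derived from the input tokens, and \(d_{k}\) is the dimensionality of the keys, used to scale the dot products before the softmax.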

Sengupta et al. (2021) used the transformer model BERT and a hierarchical attention network to investigate the relationship between five offensive features, anger, hatred, sarcasm, humor, and stance, in Hindi–English code-mixed social media content; they also used pseudo-labeling techniques to create a combined annotated data set. Farooqi et al. (2021) used transformer-based models to identify hateful material in the Hate Speech and Offensive Content Identification (HASOC 2021) Hinglish code-mixed data set. They approached the problem in three ways: by using neural networks, by utilizing the transformer's cross-lingual embeddings, and finally by fine-tuning the transformers using transliterated Hindi text. The experimental results showed that the ensembled setup of IndicBERT, XLM-RoBERTa, and mBERT performed best, with a weighted F1 score of 72%.

Ananya et al. classified hateful content in code-mixed data using context-based embeddings from ELMo, FLAIR, and the transformer BERT models [31]. Saha et al. (2021) gave a thorough examination of various transformer models and used a genetic algorithm to ensemble the results of the individual models, employing the genetic algorithm to obtain optimal weights during the ensembling process. We found some limitations in the existing work, which are stated in Table 1.

Table 1 Limitations of the existing models

Most of the models listed in Table 1 tried to detect Hate speech in code-mixed text by using a few statistically constructed features that are extremely difficult to identify without domain expertise. Furthermore, a few models tried to use transfer learning from models trained on monolingual text. These models, however, failed to detect hateful features in code-mixed text because it does not follow the same grammatical norms and syntax as monolingual text. To overcome the difficulties caused by code-mixed text, our proposed model first converts the code-mixed text to monolingual text through a series of translation and transliteration processes before applying classification, since, owing to the availability of pre-trained models trained on larger corpora, Hate speech detection can be performed better on monolingual data.

3 Methodology

This section discusses how the various models for evaluating Hate speech data are developed and tested using different approaches. We first describe the data set used in the study; its preprocessing and the different Hate speech classifiers are then explained in detail.

3.1 Problem definition

Let S = \(\{s_{1}, s_{2}, s_{3}, \ldots, s_{n}\}\) be the set of input tweets and L = \(\{l_{1}, l_{2}, l_{3}, \ldots, l_{n}\}\) be the corresponding n labels for input S, where each \(l_{i} \in\) {Hate, Non-hate} denotes the presence or absence of Hate speech, respectively. The goal of the proposed model is to predict the conditional label ‘l’ for a given input ‘s’, i.e., P(l|s).

3.2 Data set description

The data set used to validate the proposed model was obtained from Bohra et al. (2018). It includes both normal and Hate speech: 4575 code-mixed tweets in total, of which 1661 contain Hate speech and the remaining 2914 consist of Non-Hate speech. All of these tweets were scraped from Twitter using the Twitter Python API. The data set is slightly unbalanced and consists of two fields: Text and Label. Table 2 provides some sample sentences from the data set.

Table 2 Samples from dataset

3.3 Data preprocessing

As we all know, social media data contain noise, so several preprocessing steps were carried out on the text fields to eliminate it from the data set. The textual corpus contained URLs, hyperlinks, emojis, stop words, and capital characters. Various preprocessing steps were carried out to simplify the text, such as replacing punctuation with white space and removing URLs and Twitter account names, which cannot be used to identify hateful content. Texts were also lower-cased to avoid duplication problems. Further, lemmatization was carried out to reduce the words in the tweets to their base form, using the WordNet lemmatizer from NLTK.
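A minimal sketch of such a cleaning pipeline is shown below; the exact regular expressions and their ordering are illustrative assumptions rather than the precise rules used in our experiments.

```python
import re
from nltk.stem import WordNetLemmatizer  # requires nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

def preprocess(tweet: str) -> str:
    """Lower-case, strip URLs and handles, drop punctuation/emojis, lemmatize."""
    tweet = tweet.lower()                                  # avoid case-induced duplicates
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)   # remove URLs and hyperlinks
    tweet = re.sub(r"@\w+", " ", tweet)                    # remove Twitter account names
    tweet = re.sub(r"[^\w\s]", " ", tweet)                 # punctuation/emojis -> white space
    return " ".join(lemmatizer.lemmatize(tok) for tok in tweet.split())

print(preprocess("Neeraj ka nam humesha yaad rahega @fan https://t.co/xyz !!!"))
```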

3.4 Baseline transformer model without translation

We performed experiments with two different transformer models trained on Indian languages. On top of these models, we applied traditional machine learning classifiers such as Logistic Regression (LR) and Support Vector Machine (SVM) for classification. The input from the preprocessing step is tokenized into several tokens before being passed to the padding layer, which turns sentences of uneven length into sequences of equal length. These padded tokens are then passed through a transformer model for feature extraction. For the preliminary trials, we used multilingual BERT (mBERT), a bidirectional model built on the Transformer architecture (Devlin et al. 2018), which in turn relies on the attention mechanism (Vaswani et al. 2017). The mBERT model employed in the proposed setup for feature extraction was trained on Wikipedia articles in 104 distinct languages. Like BERT, mBERT has 12 attention heads and 12 transformer blocks. On top of mBERT, we apply the aforementioned conventional algorithms.

In addition, we experimented with IndicBERT, a multilingual ALBERT model trained on 12 Indian languages, including Assamese, Bengali, English, Gujarati, Hindi, Kannada, and Marathi; IndicBERT also supports Oriya, Punjabi, Tamil, and Telugu. IndicBERT has fewer parameters than other BERT variants but still provides state-of-the-art performance Kakwani et al. (2020), and its architecture is similar to that of the other BERT variants. Again, on top of IndicBERT, we add a conventional machine learning classifier, which takes its input from the CLS token of dimension 768. To improve performance, we also experimented with an ensembled setup: the LR and SVM classifiers act as weak learners on top of the transformer models, and their outputs are routed through a hard-voting classifier, which takes the prediction of each weak learner and outputs the class with the maximum votes. For example, if the predictions from the classifiers are (‘A’, ‘A’, ‘B’), most of the classifiers have predicted ‘A’, so ‘A’ will be the final prediction. The detailed architecture of the model is illustrated in Fig. 1.
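The ensemble step can be sketched with scikit-learn as follows; X_train and y_train are assumed to hold the 768-dimensional CLS embeddings (extraction is sketched in Sect. 3.8) and the binary labels, and the hyperparameters shown are illustrative.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

# Weak learners on top of the transformer embeddings.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC())],
    voting="hard",  # each learner votes; the majority class is the final prediction
)
ensemble.fit(X_train, y_train)        # X_train: (n_samples, 768) CLS embeddings
y_pred = ensemble.predict(X_test)
```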

Fig. 1 Transformer-based model without translation

3.5 Pre-trained language models

We also use state-of-the-art language model classifiers, namely BERT and ULMFiT, built with the simple classification APIs supplied by their developers. Our underlying base models for BERT and ULMFiT are ‘bert-base-uncased’ and the ASGD Weight-Dropped LSTM (AWD-LSTM), respectively. AWD-LSTM is a state-of-the-art language model built from ordinary LSTMs without any attention. The ULMFiT process is divided into three stages: LM pre-training, in which a language model is trained on the large wikitext-103 corpus to capture general language properties; LM fine-tuning, in which the model is fine-tuned on the target task data; and classifier fine-tuning, using the BPTT for Text Classification (BPT3C) language model. BPT3C language models are trained with backpropagation through time (BPTT) to enable gradient propagation for long input sequences Howard and Ruder (2018). The Hugging Face library in Python is used to build the BERT model, while fastai is used to build ULMFiT.
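A minimal fastai sketch of the ULMFiT stages is shown below, assuming a hypothetical DataFrame with ‘text’ and ‘label’ columns; the file name and epoch counts are illustrative, and stage 1 (LM pre-training on wikitext-103) is already baked into the pretrained AWD-LSTM weights that fastai downloads.

```python
import pandas as pd
from fastai.text.all import *

df = pd.read_csv("tweets.csv")  # hypothetical file with 'text' and 'label' columns

# Stage 2: adapt the wikitext-103-pretrained AWD-LSTM language model to the tweets.
dls_lm = TextDataLoaders.from_df(df, text_col="text", is_lm=True)
lm_learn = language_model_learner(dls_lm, AWD_LSTM)
lm_learn.fine_tune(3)
lm_learn.save_encoder("ft_encoder")

# Stage 3: fine-tune the classifier on top of the adapted encoder.
dls_clf = TextDataLoaders.from_df(df, text_col="text", label_col="label",
                                  text_vocab=dls_lm.vocab)
clf_learn = text_classifier_learner(dls_clf, AWD_LSTM)
clf_learn.load_encoder("ft_encoder")
clf_learn.fine_tune(5)
```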

3.6 Proposed TIF-DNN model

The proposed model (TIF-DNN) is built on a three-layer architecture: the interpretation layer, the feature extraction layer, and the classification layer. Figure 2 illustrates the complete pipeline of the architecture.

Fig. 2 Pipeline architecture of the proposed model

3.7 Interpretation layer

The interpretation layer forms the first layer of our proposed model; cleaned and lemmatized tweets are its input. The interpretation is performed in the following four steps (a sketch of the pipeline follows the list):

1. Each tweet is first split into individual words using Python’s split() function.

2. The Microsoft LID tool is used to annotate each word with its matching language id (Lang-id); words are tagged with language labels such as English or Hindi.

3. Each annotated word is checked against its language id: if the Lang-id is English, the word is translated into the matching Devanagari term using Python’s Englishtohindi module; if the Lang-id is Hindi, the word is transliterated into the Devanagari script using the Indic-transliteration function from the Indic-nlp-library.

4. All transliterated and translated words are concatenated to produce the original phrase, which is used as input to the feature extraction layer.
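The following sketch shows the flow of these four steps; detect_lang, translate_en_to_hi, and transliterate_hi are hypothetical wrappers standing in for the Microsoft LID tool, the Englishtohindi module, and the Indic-nlp-library transliterator, respectively.

```python
def interpret(tweet: str) -> str:
    """Convert a code-mixed Hinglish tweet into monolingual Devanagari text."""
    devanagari_words = []
    for word in tweet.split():                        # step 1: word-level split
        lang = detect_lang(word)                      # step 2: per-word language id
        if lang == "en":
            devanagari_words.append(translate_en_to_hi(word))  # step 3: translate English
        else:
            devanagari_words.append(transliterate_hi(word))    # step 3: transliterate Hindi
    return " ".join(devanagari_words)                 # step 4: reassemble the phrase
```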

3.8 Feature extraction layer

The feature extraction layer receives a monolingual tweet in the Devanagari script from the previous layer. The Devanagari tweet is then given to the tokenizer, which turns each tweet into several tokens, with each word in the tweet considered a separate token. Padding and masking for variable-length phrases were also performed in conjunction with tokenization. The proposed model uses the transformer’s mBERT tokenizer. The padded tokens are then passed through the mBERT transformer model for feature extraction. We draw the embedding only from the CLS token, which gives a full-sentence embedding of 768 dimensions. These embeddings are then passed through the classification layer for stance detection.
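A minimal sketch of this step with the Hugging Face transformers library is given below; the checkpoint name and batching details are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

def extract_cls(devanagari_tweets):
    """Return one 768-dimensional [CLS] embedding per tweet."""
    batch = tokenizer(devanagari_tweets, padding=True, truncation=True,
                      return_tensors="pt")        # tokenization + padding + attention masks
    with torch.no_grad():
        outputs = model(**batch)
    return outputs.last_hidden_state[:, 0, :]     # [CLS] is at position 0; shape (n, 768)
```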

3.9 Classification layer

We used two kinds of classifiers on the Hate speech data in our suggested model. First, we tested conventional machine learning classifiers, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and K Nearest Neighbors (KNN), on the translated and transliterated Devanagari script using mBERT embeddings. Later, we experimented with a Deep Neural Network (DNN) model, which acts as the second classifier in our suggested approach. The DNN model comprises multiple dense (fully connected) layers, which aim to shape and compress the input in a meaningful fashion. A dropout layer follows each dense layer to avoid over-fitting. We also used batch normalization layers to normalize the activation values; the normalization layer calculates new activation values as follows:

\(h_{ij}^{norm} = (h_{ij} - \mu_{j}) / \sigma_{j}\)

\(h_{ij}^{final} = \gamma_{j} \cdot h_{ij}^{norm} + \beta_{j}\)

where \(\mu_{j}\) is the mean and \(\sigma_{j}\) the standard deviation of the activations of unit j over the batch, and \(\gamma_{j}\) and \(\beta_{j}\) are learnable scale and shift parameters. The detailed architecture of the proposed model is illustrated in Fig. 3.
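A minimal Keras sketch of this classification head is shown below; the layer sizes (1000, 500, 100, 50), the 0.4 dropout, and the sigmoid output follow Sect. 4.1, while the ReLU activations, optimizer, and loss are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dnn(input_dim: int = 768) -> keras.Model:
    model = keras.Sequential([layers.Input(shape=(input_dim,))])
    for units in (1000, 500, 100, 50):
        model.add(layers.Dense(units, activation="relu"))  # activation assumed
        model.add(layers.BatchNormalization())             # normalize activations per batch
        model.add(layers.Dropout(0.4))                     # reduce over-fitting
    model.add(layers.Dense(1, activation="sigmoid"))       # Hate vs. Non-hate
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])                    # loss/optimizer assumed
    return model
```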

Fig. 3 Proposed TIF-DNN-based architecture

4 Results and implementation

The experiments began with the transformer models; on top of the transformers, we implemented several traditional machine learning classifiers, namely LR, SVM, RF, NB, and KNN. The experimental trials found that the mBERT model with traditional machine learning classifiers performed better than IndicBERT for hateful content detection in the code-mixed Hinglish Twitter data set. IndicBERT failed to achieve a better result because it was trained on Devanagari script and hence could not extract contextual information from the Romanized script. The parameters used during the training of the aforementioned classifiers are indicated in Table 3; parameters such as the learning rate, loss function, and optimizer were selected through experimental trials, while the others were selected through grid search. These algorithms were implemented using the Scikit-learn library with a train-test ratio of 70:30, and their outcomes provide the baseline results for the proposed model. According to the results shown in Table 4, LR achieved the better result for IndicBERT, with 65% accuracy, while SVM performed better on mBERT embeddings, with 67% accuracy. We also experimented with language models such as pre-trained BERT and ULMFiT, the results of which are shown in Table 5: BERT outperformed ULMFiT, with an accuracy of 71%. However, even the language models failed to produce improved results, since they were primarily trained on high-resource languages such as English.
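The evaluation setup described above can be sketched as follows; X is assumed to hold the transformer embeddings, y the labels, and the SVM parameter grid is illustrative.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# 70:30 train-test split, as used in our experiments.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Grid search over a hypothetical SVM parameter grid.
grid = GridSearchCV(SVC(),
                    param_grid={"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```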

Table 3 Classifier’s parameters
Table 4 Baseline transformer model results

4.1 Proposed model results

The proposed model uses the interpretation layer to transform the multilingual input data into a monolingual form; from the literature, we found that Hate speech identification achieves better accuracy on monolingual data sets. The translated data are subsequently processed by mBERT for feature extraction. First, classical machine learning models are used on top of the mBERT model for classification; among these, SVM achieves the highest accuracy of 72%. Later, we experimented with the Deep Neural Network-based model to improve upon the performance of the existing techniques. In the DNN model, the mBERT input is initially passed through dense layers of sizes 1000, 500, 100, and 50; batch normalization and a dropout of 0.4 are added to avoid over-fitting, the dropout value of 0.4 having been decided through experimental trials. Finally, the output from the last dense layer is passed through a sigmoid layer for stance detection. Table 5 compares the outcomes of our proposed method with the baseline classifiers: the proposed model outperformed the top-performing baseline models with an accuracy of 73%.

Table 5 Comparative study of proposed model with baseline results
Table 6 Sample test cases

5 Discussion

To examine the behavior of the individual models, we selected sample phrases from the test data, passed them through the best-performing models for stance identification, and report the results in Table 6. According to the table, most models correctly recognized non-hate sentences, but only BERT and the proposed model correctly classified Hate speech. The reason for our model’s better performance is that we first transformed the code-mixed data into monolingual form and then extracted features using transformer models trained on monolingual text. The other models failed to capture the hateful features in the data set because they were not trained on code-mixed text.

Fig. 4 Performance of ML models with and without translation

To evaluate the efficacy of our suggested translation and transliteration model, we compared our benchmark findings on both translated and untranslated data; the results are summarized in Fig. 4. The observations from this comparative study indicate that the translation process improves model performance significantly: the main conclusion that can be drawn from our experimental data is that translating code-mixed text into monolingual text before classification increases model performance. Another interesting observation from our experiments is that SVM outperforms all other machine learning models on both the translated and the original text, with an accuracy of 72%. A further finding of the experimental trials is that, among the pre-trained transformer-based embeddings, mBERT outperforms IndicBERT for Hate speech identification in code-mixed text. We also compared the outcomes of the best-performing models; the results, summarized in Fig. 5, show that our suggested TIF-DNN outperforms all baseline models by considerably improving classification accuracy. Furthermore, to validate our claims, we compared the proposed work with existing models; the results are shown in Table 7. On Twitter data, the proposed model outperformed the existing methods for Hate speech recognition.

Fig. 5 Comparison of best performing models

5.1 Limitations of our model

To understand the limitations of our model, we examined its outcomes, which inspire new research directions. Some of these limitations are as follows.

1. As shown in Table 5, the proposed model exhibits higher accuracy for Non-Hate speech identification than for Hate speech. The unbalanced data used during the training process might be one of the reasons.

2. The performance of the proposed model can be improved further if the performance of the translator model used during interpretation is improved. Our translation model struggles to find the exact Devanagari translation for some of the English words in a code-mixed tweet; as a result, there are significant mistranslations in our translated tweets.

Table 7 Comparison with existing work

6 Conclusion and future enhancements

The proposed approach investigates Hate speech identification in a Hindi–English code-mixed Twitter data set. In this article, we proposed the TIF-DNN model for Hate speech identification and demonstrated its efficacy by comparing its results with baseline classifiers and past work. The findings revealed that the proposed translator-based model outperforms several baseline classifiers and existing work. Better results may be obtained if a more powerful translation model is incorporated in future studies. Furthermore, the experiments detailed in this study can be repeated on other regional languages as part of future research, which is essential because India is a multilingual society with numerous local languages.