1 Introduction

The rapid growth of information technology and communication networks has brought changes to industries, societies, and many other sectors. In 2022, 94 zettabytes of internet data were produced [1] and consumed by 5.07 billion internet users, who share their feelings, emotions, and thoughts through social media [2]. Social networks such as Facebook, Twitter, Reddit, and MySpace are popular platforms that play a significant role for users. Users spend ample time on social media, where they interact with others, update their status, share personal experiences, and sometimes defame other members [3]. The bright side of social media comes with negative consequences such as work-life conflicts, anxiety, revenge, depression, aggression, and hate crimes [4]. Various online activities and exposures pose threats to ordinary users. This situation motivates the consideration of techno-regulation approaches that safeguard society and individuals by supporting legal norms through technological tools or devices [5]. Two ways to support cyber-security norms are nudges and techno-regulation: nudges push users to safeguard their future actions, while techno-regulation forcibly removes risky content that harms users [6]. Techno-regulation spans several social media research areas such as privacy, child protection, cybercrime, cyber-security, and revenge detection. Revenge is also a root cause of cyberstalking, cybercrime, and sexting [7].

Revenge is an act of human aggression that harms a person or group of persons in response to a perceived provocation or grievance. Cyber revenge via social media includes various forms of aggression (cyber defamation, cyberbullying, cyber trolling/stalking, revenge porn, and cyber dating abuse) and occurs through email, tweets, social media posts, and text messages [8]. Social media revenge is a relatively new research direction in which some authors have addressed revenge porn detection [9, 10]. Interpersonal, context-based social media revenge is a newer research path that has yet to receive due attention. One recent attempt [11] extracted a dataset from Reddit to show the difference between active and passive revenge. Another study on vengeful or revenge content detection considered social media text along with terrorist activity datasets based on religious extremism and school shooters [12]. Revenge posts are increasing because social media makes it easier for users to take revenge. Social media revenge can have serious consequences such as job loss, suicide, and lawsuits [8].

Reddit is a popular social media platform with 223 million users in the United States [13]. It is a controversial social site that hosts various communities called subreddits. Numerous subreddits have been banned due to abusive and controversial content [14]. In 2020, most data-removal requests to Reddit came from the Russian and South Korean governments, at 89% and 60% respectively [15]. Various subreddits present malicious posts as passive revenge and active revenge. Active revenge consists of petty and pro revenge: petty revenge is not a criminal offense, whereas pro revenge is a more grievous offense than petty revenge.

In Fig. 1, all five posts present revenge stories: two are pro revenge posts, two are malicious compliance posts, and one is petty revenge content. The first malicious compliance post shows an employee's reaction against his boss over 14-hour working days; the user wants to defame his boss and share his demands through social media. The other malicious compliance post shows an employee's reaction against a manager for her behavior. The malicious compliance posts present passive revenge stories. The petty revenge post shares a personal experience with one woman's behavior, against whom the user takes active revenge; this revenge is not a criminal offense. The two pro revenge posts indicate bullying or aggressive behavior against some person. As per the recent literature, cyberbullying is considered an act of active revenge [16]. All posts are long and contain many sentences. Contextual analysis of paragraph-like content and the intensity of aggressive behavior are significant features of revenge posts. None of the posts includes explicit offensive terms, yet all express malicious or vengeful intent. Revenge is part of complex human behavior and emotion analysis. Several authors have classified emotional [17], abusive [18], aggressive [19], hateful [20, 21], and cyberbullying [22] content with NLP models, and revenge detection relates to all these fields. The scarcity of datasets is the primary concern for this field. It is also observed that contextual analysis for implicit revenge content detection requires more NLP research.

Fig. 1 Sample revenge posts from Reddit, drawn from the ProRevenge, PettyRevenge, and MaliciousCompliance subreddits (https://www.reddit.com/)

Considering these limitations of revenge text research, this study contributes a novel revenge detection model (CatRevenge) for binary and multiclass classification. The name "CatRevenge" combines "Revenge", for the feature vectors of revenge posts, and "Cat", for the categorical-feature-based CatBoost classifier. This research develops and analyzes a novel revenge classification model on an English Reddit social media dataset. It uses paragraph embedding for contextual semantic analysis of paragraph-like content and POS tag-based impact weight analysis to capture the intensity of aggressive behavior. The contributions of this research are, in concise form:

  • This is the first work on revenge post detection that combines syntactic, lexical, and semantic features: POS tag-based impact weight, TF-IDF, and paragraph embedding, respectively.

  • It is the only revenge text detection research in which a paragraph embedding model is used for contextual semantic analysis of English revenge stories.

  • This is the first revenge text detection research that computes the impact weight of each POS tag for each revenge post over the total corpus.

  • This is the first revenge text detection research that preprocesses all posts with Slangzy, an internet slang meaning dictionary.

  • It includes an efficient gradient boosting classification model, the CatBoost classifier, which handles the categorical features of revenge text.

The remainder of the paper is organized as follows: Section 2 reviews existing social media research and text-based applications of gradient boosting classifiers. The CatRevenge model architecture and methodology, with all features and classifiers, are presented in Section 3. Section 4 describes the experimental setup together with results and findings. Finally, the last section concludes with future research directions for revenge post detection.

2 Related research

This related research section covers two topics: social media text analysis, and text classification with gradient boosting classifiers. Among the many social media text classification studies, revenge text detection is a nearly new research direction, so this section first describes existing social media text research related to revenge text analysis. It then reviews gradient boosting-based text classification research to portray the existing research landscape for the reader.

Several terminologies are used in NLP research to detect negative and harmful text on social media. Active revenge text expresses complex human behavior and relates to various NLP research areas such as emotion detection [17], cyber aggression detection [19, 23], abusive language detection [18], hate content detection [21, 24], and cyberbullying and stalking detection [22]. Classification of abusive content is also related to malicious content and active revenge. Revenge detection is a fresh NLP research domain; one study implemented a revenge content detection model with POS tagging, Word2Vec embedding, and AdaBoost and KNN classifiers [12]. That model was applied to three datasets, but it considers short, sentence-based text and cannot capture the importance of individual tokens. Neither of the above models can handle slang or offensive words in text. This section reviews existing research works in the above areas.

Various machine learning and deep learning models have been applied in the above studies, in supervised, unsupervised, and semi-supervised settings. Most studies employ classification models such as SVM [18], Logistic Regression [21], Random Forest [22], MLP [23], BERT [19, 25], LSTM [17, 26], and CatBoost [24]. One recent emotion detection study [17] applied GloVe embeddings with an LSTM classifier to detect feelings from text. Another recent emotion detection study detects negative emotional text from patients suffering from mental health issues; it applied GloVe embeddings with a bi-directional LSTM and a CNN, mainly detecting negative emotions such as stress, anxiety, depression, and addiction, and achieved reasonable performance on the WebMD and HealthTap datasets [26]. Aggression detection research has employed various BERT models to tackle cyber aggression [19]. One aggression detection study employed an MLP classifier and a deep neural network that achieved 92% accuracy, and also established that aggression and hate are related to cyber harassment and bullying [23]. Beyond emotion and aggression, one study detected cyberbullying [22] with a Random Forest classifier that achieved an F1 of 0.90. Cyberbullying detection has also been explored with sentiment and emotion features on a code-switched corpus, where two embedding techniques based on BERT and VecMap outperformed the cyberbullying baselines [27]. Emotion-informed hate text detection has explored multitask and multi-target approaches, applying sentic computing models, hate speech lexicons, and BERT to achieve remarkable performance over baseline models [25]. Hate speech detection research has applied multinomial logistic regression, achieving 87.68% accuracy [21]. A CatBoost classifier with various features for multiclass hate text classification has also achieved the best performance compared with other machine learning and deep learning classifiers [24]. Abusive language detection is another NLP research domain related to revenge detection, employing a linear SVM classifier with polarized and generic embeddings [18]. Abusive text detection with variations of CNN and LSTM models has also achieved remarkable performance for native and code-switched languages [28]. Spam message detection with sentiment analysis has employed hybrid machine learning classifiers (SVM and a hybrid KNN algorithm), together with an optimization approach that raised spam detection accuracy to 99.82% on three benchmark datasets [29]. Another sentiment analysis study combined efficient feature selection using a meta-heuristic genetic algorithm for online customer reviews with machine learning classifiers such as AdaBoost, XGBoost, Gradient Boosting, and Random Forest, achieving 77% to 78% accuracy on benchmark sentiment classification datasets [30].

The above social media research shows that revenge text detection is a fresh research path that needs more findings and analysis. Distinguishing active from passive revenge, and pro from petty revenge, is a difficult task that requires contextual semantic analysis. This study therefore uses contextual text analysis for more accurate classification of revenge posts. The proposed work adopts a paragraph embedding model for revenge text detection, a model that has been applied in various NLP applications. In information retrieval, the PV-DBOW model was observed to improve language estimation performance through paragraph embedding [31]. Paragraph embedding has also improved performance in social network computation, efficiently approximating the closeness centrality measure [32]. It has shown effective performance in sentiment analysis as well, where both PV-DBOW and PV-DM achieved more than 75% accuracy for sentiment text classification [33]. Beyond paragraph embedding, many NLP applications combine lexical and semantic features, which has proven effective in the present state of the art. In short text classification, the top n distinctive words per category were used as lexical features and a weighted word map as semantic features; combining both helped identify the right topic [34]. A question classification application combined semantic, syntactic, and lexical features, using hypernyms and question categories as semantic features, question patterns and headwords as syntactic features, and word n-grams and word shapes as lexical features; combining all of them improved question classification performance [35]. Automatic short answer grading is another NLP application where a combination of lexical and semantic features showed improved performance [36]. The performance of these applications with combined features motivates this research to adopt combined features.

Boosting algorithms combine weak learners and classifiers to achieve higher classification accuracy [37]. The most popular boosting model is Adaptive Boosting (AdaBoost), which emphasizes misclassified samples and assigns higher weights to more accurate weak classifiers [38]. In recent research, three ensemble gradient boosting methods [39] show competitive state-of-the-art classification results: XGBoost (extreme gradient boosting) [40], LightGBM (light gradient boosting) [41], and CatBoost (categorical-feature-based boosting) [42].

This section also reviews gradient boosting-based text classification research to show the relevance of the CatBoost classifier to this work. One recent study applied a LightGBM classifier to a large recommendation dataset of 160 M tweets, modeling various tweet engagements with a BERT training model; the LightGBM text classification model proved well suited and fast for the large tweet dataset [43]. Research on online sexism and harassment detection employed XGBoost and CatBoost classifiers with an LSTM model and showed improved performance with both classifiers for social media text classification [44]. A LightGBM classifier has also shown reasonable performance for short and natural sentiment text classification on domain-free data, where slang words were considered as a feature [45]. Sentiment-based prediction of cryptocurrency price fluctuations used the tree ensemble XGBoost classifier and achieved good performance with ten-fold cross-validation [46]. In addition, a social media aggression detection study employed a CatBoost classifier to detect aggression and misogyny in social media data using TF-IDF and bag-of-words features [47].

CatBoost is a recently developed gradient boosting classifier, but it has already demonstrated its efficiency in textual as well as other applications. Most gradient boosting classifiers perform well, but CatRevenge adopts the categorical-feature-based CatBoost because it reduces overfitting on various text datasets.

3 CatRevenge methodology

The proposed CatRevenge model consists of five main tasks, beginning with preprocessing of the raw revenge text. After preprocessing, the next step is feature extraction. The model uses three feature extraction approaches to enrich its efficiency: impact weight analysis as a syntactic feature, a TF-IDF vector as a lexical feature, and paragraph embedding as a semantic feature. The POS tag impact weights are converted into a compressed sparse row matrix and concatenated with the TF-IDF vectors, and the paragraph embedding vectors are then stacked horizontally with this feature matrix. These feature extraction methods analyze the text in depth, whereas feature selection methods generally look for a subset of existing features. Analyzing the text is essential to recover its actual context; feature selection can remove important words that carry the contextual meaning of long texts and thereby hurt model performance. This research therefore relies on three feature extraction approaches that capture the relative importance of words and the semantic similarities of the text. The final feature matrix is then classified with the CatBoost classifier.

Both binary and multiclass classification follow the same model structure. Figure 2 describes the model architecture for revenge text classification, and all the steps to detect revenge text are illustrated below. Algorithm 1 presents the complete flow of the CatRevenge model with its five steps: cleaning and preprocessing, POS tag-based impact weight, TF-IDF, paragraph embedding, and the CatBoost classifier.

Fig. 2 Revenge text detection framework

3.1 Cleaning and preprocessing

This study applies several cleaning and preprocessing steps to the social media revenge posts. The analysis first removes author information columns, symbols, and punctuation, together with irrelevant columns and null rows from the dataset. Stopwords are removed using the NLTK Python library [48]; eliminating stopwords removes noise and speeds up processing. This research also looks up slang word meanings in the fuzzy-logic-based Slangzy dictionary to add significant information about revenge text tokens [49]. Preprocessing also merges two columns for more accurate detection of revenge posts: the title column and post column are combined into a single post column.
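A minimal preprocessing sketch is given below, assuming the raw title and post text are available as strings. The SLANG_MAP dictionary is only a stand-in for the Slangzy lookup, whose actual API may differ, and the two example entries are hypothetical.

import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))

# Stand-in for the Slangzy dictionary; the real library resolves slang via fuzzy matching.
SLANG_MAP = {"imo": "in my opinion", "smh": "shaking my head"}

def preprocess(title, body):
    """Merge title and post, strip symbols/punctuation, expand slang, drop stopwords."""
    text = f"{title} {body}".lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove symbols and punctuation
    expanded = " ".join(SLANG_MAP.get(tok, tok) for tok in text.split())
    return " ".join(tok for tok in expanded.split() if tok not in STOPWORDS)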

Algorithm 1 Revenge text classification algorithm

3.2 Impact Weight Analysis with syntactic feature

This research implements syntactic-feature-based impact weight analysis built on POS tagging. A part-of-speech (POS) tagging system reveals the role of a term in a particular context. The English language has eight parts of speech: verb, noun, adjective, adverb, pronoun, preposition, interjection, and conjunction. POS tagging is a significant text analysis technique that helps expose sentence and word relations; grammar checking, text-to-speech, and word sense disambiguation are important applications of POS tags in text analysis. The NLTK Python library determines POS tags with a supervised learning approach [50] and uses a tagset of 35 POS tags.

A traditional POS tag does not by itself provide a numerical impact weight for each tag. This study takes the POS tag as a syntactic feature and computes the impact weight of each tag in a post, assigning noun, verb, adjective, and adverb tags for impact weight analysis. The weight expresses the significance of a tag in the corpus. Syntactic features such as POS tags can reveal hidden information in textual data and extract the canonical form of a term, which helps analyze syntactic information in a post [50].

$${POS-TAG}_{ij}= \frac{|{tag}_{ij}|}{|{tag}_{j}|}$$
(1)

where \(|{tag}_{ij}|\) denotes the total frequency of POS tag i in tagged corpus j, \(|{tag}_{j}|\) denotes the total frequency of all POS tags in tagged corpus j, and \({POS-TAG}_{ij}\) is the set of POS tag weights for each tag in tagged corpus j. The \({POS-TAG}_{ij}\) vector is finally converted into a compressed sparse row matrix that serves as the syntactic feature.
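A sketch of Eq. (1) in Python follows, using the NLTK tagger. Restricting the numerator to the noun, verb, adjective, and adverb tag families while the denominator counts all tags is an assumption consistent with the description above.

from collections import Counter
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # noun, verb, adjective, adverb families

def pos_impact_weights(post):
    """Eq. (1): frequency of each content tag divided by the total tag count of the post."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(post))]
    total = len(tags) or 1
    counts = Counter(t for t in tags if t.startswith(CONTENT_TAG_PREFIXES))
    return {tag: freq / total for tag, freq in counts.items()}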

3.3 Lexical feature with TF-IDF

Lexical feature analysis is an important NLP process that captures the importance of keywords, their meanings and context, and the relations between terms. Keywords and terms represent significant content in documents and sentences, which enriches the classification task.

TF-IDF is a statistics-based keyword extraction approach that effectively extracts keywords and terms from documents. It consists of two components: term frequency (TF) and inverse document frequency (IDF). TF computes how often a keyword or term occurs in a particular document. Because variation in document length directly changes keyword frequency, TF is measured as the ratio of term frequency to document length.

$${TF}_{ij}= \frac{{t}_{ij}}{\sum_{k}{t}_{kj}}$$
(2)

where \({t}_{ij}\) is the frequency of term i in document j. IDF measures the relevance of keywords, because TF treats all terms as equally important and their importance must therefore be weighted separately. IDF applies a logarithm to reduce the weight of less important keywords.

$${{\text{IDF}}}_{i}= {\text{log}}\frac{|D|}{1+|{D}_{i}|}$$
(3)

where \(|D|\) denotes the total number of documents and \(|{D}_{i}|\) the number of documents containing term i. TF-IDF is thus the combination of the term frequency and inverse document frequency measures.

$${\text{TF}}-\mathrm{IDF }= {TF}_{ij}* {{\text{IDF}}}_{i}= \frac{{t}_{ij}}{\sum_{k}{t}_{kj}}* {\text{log}}\frac{|D|}{1+|{D}_{i}|}$$
(4)

The revenge post categorization research applies the TF-IDF feature to capture the relevance and occurrence of each word in context. TF-IDF provides a numeric representation of word occurrence in the corpus, normalizing it by document size as well as by the contribution of each word to the corpus. TF-IDF vectors empower NLP models and have numerous applications in the NLP domain [51].
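A lexical feature sketch with scikit-learn is shown below. It assumes a list named posts holding the cleaned posts from Section 3.1; the vectorizer settings are library defaults, since the paper does not list its exact configuration.

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()             # default settings; the exact configuration is not reported
X_tfidf = tfidf.fit_transform(posts)  # sparse matrix: one row per post, one column per term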

3.4 Semantic Feature with Paragraph Embedding

Along with syntactic and lexical features, this research requires a suitable semantic feature to extract similarities from revenge text. Syntactic and lexical features alone could have been used for classification, but for further refinement a semantic feature is added to compute the comparative distance between terms in the revenge context. Paragraph embedding learns vector representations that place each term so that numerical distances between terms can be computed based on context. Together with the lexical and syntactic features, the revenge detection model can thus efficiently extract the semantic meaning of active and passive revenge text and represent the context of each term.

In existing research, various semantic embedding approaches have been applied to text analysis. Word2Vec and paragraph embedding represent each word or document as a low-dimensional vector: the Word2Vec model [52] captures semantic similarities of terms in a continuous space, whereas the paragraph embedding model [53] uses a distributed memory model to represent terms and paragraphs as low-dimensional vectors. This research also implemented the Word2Vec model as a semantic embedding feature, but the paragraph embedding model produced better results. BERT is likewise an efficient embedding model for text analysis; NLP research on hate speech detection [25], cyberbullying detection [27], and short answer grading [36] has utilized BERT embeddings and shown improved contextual analysis compared with other embedding models. This research adopts the paragraph embedding model because the texts are long and each post is shared as a paragraph of connected sentences. In future research, long revenge stories will be evaluated with other embedding models, including BERT, to explore the contextual analysis further.

The revenge detection framework employs the paragraph embedding vector model, which can learn semantic relatedness from revenge posts as described in Fig. 3. The paragraph embedding model aims to analyze semantic similarities between terms in the revenge context. It represents the text data as low-dimensional vectors and learns token vectors with distributed memory, using the Distributed Memory vector (PV-DM) to recall missing words in a related context. Compared with the distributed bag-of-words model (PV-DBOW), the PV-DM model performs better in the revenge text classification context. It maps each post and each word to a unique vector.

Fig. 3 Paragraph embedding model with revenge text

Beyond these details of paragraph embedding, this research analyzes the vector dimension and window size specifically for revenge text classification. A window size of 6 suits broader content, whereas 2 suits smaller, more focused content; k = 2 considers the context words w-2, w-1, w+1, and w+2 for a target term w. The smaller window size performed better on the focused revenge text. This research experimented with window sizes k = 2, 3, 5, and 6 and vector dimensions of 50, 100, 200, 300, 600, and 800; the best performing values are k = 2 and dimension 300.
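A paragraph embedding sketch with Gensim's Doc2Vec is given below, using the best-performing settings reported here (PV-DM, window 2, 300 dimensions); the epoch count and minimum word frequency are assumptions, and posts again refers to the cleaned posts.

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tagged = [TaggedDocument(words=post.split(), tags=[i]) for i, post in enumerate(posts)]
# dm=1 selects the PV-DM variant; window=2 and vector_size=300 performed best in this study.
pv_dm = Doc2Vec(tagged, dm=1, vector_size=300, window=2, min_count=2, epochs=40, workers=4)
doc_vectors = [pv_dm.infer_vector(post.split()) for post in posts]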

The revenge detection model combines the three features produced by the paragraph embedding, POS tag impact weight, and TF-IDF lexical models. Figure 4 shows the flow of a sample revenge post through all feature extraction steps, with the POS tag impact weights concatenated with the TF-IDF vectorization and then stacked with the paragraph embedding vectors, as sketched below. The concatenation ("concat") step joins the data frames of TF-IDF vectors and POS tag impact weight values along the column axis, and the combined features are then stacked horizontally with the paragraph embedding vectors to form a single array.
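A sketch of this combination step follows, assuming pos_impact_weights, X_tfidf, and doc_vectors come from the earlier sketches; the exact ordering of the concatenated blocks is illustrative.

import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction import DictVectorizer

# POS impact weights (one dict per post) -> compressed sparse row matrix
pos_weight_matrix = DictVectorizer().fit_transform([pos_impact_weights(p) for p in posts])

lexical_syntactic = hstack([pos_weight_matrix, X_tfidf])    # concatenate along the column axis
final_features = np.hstack([lexical_syntactic.toarray(),    # stack with paragraph vectors
                            np.asarray(doc_vectors)])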

Fig. 4 CatRevenge model workflow with a sample example

3.5 CATBoost classifier

The proposed study uses a gradient boosting machine learning classification algorithm to detect categories of revenge text. Gradient boosting models can efficiently handle learning problems with noisy data and heterogeneous features in NLP research. Gradient boosting generally uses decision trees as base predictors. Numerical features are convenient for decision trees, but many datasets include categorical features that can improve prediction. Categorical features are discrete value sets that are not directly comparable to each other and are generally converted into numeric form before training. Gradient boosting effectively accelerates classification tasks and also reduces memory consumption.

The CatBoost classifier [54] is a recent non-linear, tree-based gradient boosting algorithm that can effectively handle categorical features [55]. Boosting algorithms generally build new trees to fit the model gradients; CatBoost enhances this scheme by reducing overfitting. The CatBoost algorithm has several advantages: it supports categorical features, uses a new schema to reduce model overfitting, predicts quickly, and performs well on heterogeneous data. The CatBoost classifier achieves competitive performance compared with the LightGBM and XGBoost algorithms across various applications [39].

Categorical boosting (CatBoost) aims to reduce the prediction shift that arises in the training phase because gradient boosting uses the same instances both to estimate gradients and to fit the model that minimizes them. CatBoost addresses this shift by estimating gradients with a sequence of base models, each excluding the instance in question from its training set. The CatBoost model exposes several hyperparameters for classification tasks: the learning rate, tree depth, number of iterations for leaf estimation, and regularization coefficient. This study uses the recent optimization tool Optuna to tune and improve the CatBoost model.

Preliminary revenge text research has implemented only a few machine learning classifiers, whereas this research experiments with various deep learning and gradient boosting algorithms. Gradient boosting improves training efficiency and classification accuracy for revenge text. This study considers two classes for binary classification and three classes for multiclass classification. The binary classes are defined as y ∈ {0, 1}, with y = 0 for active revenge and y = 1 for passive revenge. The multiclass classes are defined as y ∈ {0, 1, 2}, with y = 0 for malicious compliance, y = 1 for pro revenge, and y = 2 for petty revenge. The training and testing data are \(\left\{\left({X}_{1}, {y}_{1}\right), \left({X}_{2}, {y}_{2}\right), \dots , \left({X}_{i}, {y}_{i}\right), \dots , \left({X}_{n}, {y}_{n}\right)\right\}\), where \({X}_{i}\) is the final feature vector and \({y}_{i}\) is the target class.
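A training sketch is shown below. It assumes a hypothetical labels array holding the subreddit class index of each post; the 25% stratified split matches Section 4, the loss function simply follows the number of classes, and the classifier uses CatBoost defaults before the Optuna tuning described later.

from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    final_features, labels, test_size=0.25, stratify=labels, random_state=42)

# "MultiClass" covers the three-class task; "Logloss" would be used for the binary task.
clf = CatBoostClassifier(loss_function="MultiClass", verbose=False)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test).ravel()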

4 Experiments, results, and findings

This section first presents the experimental setup, including dataset details, the baseline model, and evaluation metrics. The following parts present results and findings from several experiments: impact analysis of the various features, performance with paragraph embedding, comparison with other classifiers, and comparison with the baseline model.

4.1 Experimentation setup

This section first presents the dataset details and then briefly describes the state-of-the-art model and the evaluation metrics used for comparison.

4.1.1 Dataset

This study uses one revenge dataset to analyze complex human behavior. The Reddit revenge dataset is used with subreddits (classes) of active revenge and passive revenge for the binary task, and malicious compliance, petty revenge, and pro revenge for the multiclass task. The dataset was extracted from a specified location (Footnote 1) in CSV format with all dataset details. It covers the three subreddits (topics) and pulls the most recent 550 days of data. After preprocessing, the dataset contains 11,189 English posts; task A is the binary classification and task B is the multiclass classification, and the two tasks are related to each other. The Reddit dataset is already labeled, and to resolve any discrepancies this research appointed two subject matter experts to verify the labeling.

4.1.2 State-of-the-art model

Revenge text classification is a fresh research direction, so this work could consider only one state-of-the-art model.

Vengeful Text

This study used POS tags and Word2Vec embeddings with an AdaBoost classifier; a KNN classifier with the same features is also considered as a baseline model [12].

4.1.3 Evaluation Metrics

This revenge detection study applies four important evaluation metrics: accuracy, weighted F1 (F1), precision (P), and recall (R). Accuracy and weighted F1 are the main metrics used to compare the classification models and the baseline implementation.

$$P= \frac{{T}_{P}}{{T}_{P}+{F}_{P}}$$
(5)
$$R= \frac{{T}_{P}}{{T}_{P}+{F}_{N}}$$
(6)
$$F1=2* \frac{R*P}{R+P}$$
(7)
$$Accuracy= \frac{{T}_{P}+ {T}_{N}}{{T}_{P}+ {F}_{P}+ {T}_{N}+ {F}_{N}}$$
(8)

where \({T}_{P}\), \({F}_{P}\), \({T}_{N}\), and \({F}_{N}\) denote true positives, false positives, true negatives, and false negatives respectively.
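These metrics can be computed with scikit-learn as sketched below, continuing from the classifier sketch in Section 3.5; the weighted averaging corresponds to the weighted F1 reported in this paper.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

print("Accuracy   :", accuracy_score(y_test, y_pred))
print("Precision  :", precision_score(y_test, y_pred, average="weighted"))
print("Recall     :", recall_score(y_test, y_pred, average="weighted"))
print("Weighted F1:", f1_score(y_test, y_pred, average="weighted"))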

The revenge detection model is implemented with various Python libraries. The Reddit dataset was collected from the GitHub location specified in the dataset details section. The experiments hold out 25% of the data for testing in a stratified fashion. The results are divided into two tasks: the first shows the performance of the active versus passive revenge classification model, and the second analyzes the performance of the malicious compliance, petty, and pro revenge classification model.

4.2 Results and Findings

This section analyzes the performance of the CatRevenge model. It first studies the impact of the various features with the CatBoost classifier, then compares various machine learning and deep learning models using the CatRevenge feature set, and finally reports the performance improvement over the baseline model.

4.2.1 Impact Analysis for Features

Feature extraction from social media text is an important part of textual analysis. The CatRevenge model combines several features, and this section analyzes the impact of the individual features and feature sets for both the binary and multiclass classification models. The comparison covers bag of words, Word2Vec, paragraph embedding, count vector + TF-IDF, POS tag impact weight + paragraph embedding, TF-IDF + paragraph embedding, and the proposed CatRevenge feature set of POS tag impact weight + TF-IDF + paragraph embedding. All features and feature combinations are evaluated with the CatBoost classification algorithm, and Table 1 presents the comparison using the P, R, and F1 evaluation metrics for binary and multiclass classification.

Table 1 Impact analysis of various features and feature set

Table 1 shows that the CatRevenge feature set outperforms every individual feature and every other feature combination. Apart from the CatRevenge feature set, the POS tag impact weight + paragraph embedding combination yields a better F1 score than the remaining features and feature sets. POS tag-based analysis helps capture the importance of relevant words in the text, and semantic analysis with the paragraph embedding model achieves a better F1 score than the Word2Vec model in both cases. In the Reddit revenge dataset each post contains long sentences and paragraph-like content, so the analysis across sentences and words is more effective with the paragraph embedding model. The combination of POS tag impact weight analysis and contextual semantic analysis is therefore effective for long, paragraph-like posts and yields a better F1 score. On top of these, CatRevenge adds the TF-IDF feature to further improve the efficiency of the model.

4.2.2 Performance with paragraph embedding

The performance of the paragraph embedding vectors with various dimensions is shown in Fig. 5, and Table 2 compares the two paragraph embedding variants, PV-DM and PV-DBOW. Changing the dimension affects the CatRevenge model's efficiency, and the PV-DM model performs better than the PV-DBOW model on the revenge text dataset because PV-DM analyzes the content contextually and is able to infer missing terms. This unsupervised approach can efficiently analyze the text and the context of each word. The CatRevenge model therefore adopts PV-DM, which outperformed the alternative with the same feature set.

Table 2 Comparison with both Paragraph Embedding Models

Across the tested dimensions, Fig. 5 presents a clear comparison for binary and multiclass classification, where a vector dimension of 300 performs best for long text. The CatRevenge model therefore uses a dimension of 300.

Fig. 5 Paragraph embedding vectors with various dimensions

4.2.3 Comparison with Other classification models

This work compares the CatRevenge model with seven machine learning and deep learning models: SVM, Random Forest, MLP, Logistic Regression, XGBoost, Naïve Bayes, and LSTM. These models use the combination of the three feature sets so that the proposed model's performance can be compared fairly. The LSTM model uses 5 epochs, a batch size of 64, a cross-entropy loss function, and the Adam optimizer; the MLP classifier uses the ReLU activation function, the Adam optimizer, and 300 epochs; the XGBoost classifier uses a learning rate of 0.05 and a maximum tree depth of 5. All other classifiers use default parameter values. These machine learning and deep learning models serve to evaluate the performance of the proposed CatRevenge model.

Table 3 presents this comparison of the CatRevenge feature set with the various classifiers. The gradient boosting algorithms (XGBoost and CatBoost) perform better than the other models. P, R, and F1 are used as evaluation measures, and Table 3 shows the comparison for binary and multiclass classification. The CatRevenge model with the CatBoost algorithm outperforms all the machine learning and deep learning classifiers. Figure 6 plots the evaluation of the CatRevenge model against all the other classifiers. The CatBoost classifier achieves an F1 score of 0.87 for binary revenge classification and 0.80 for multiclass revenge text classification.

Table 3 Comparison with various Machine Learning and Deep Learning classifiers
Fig. 6 Accuracy comparison with all other machine learning and deep learning classifiers

In addition to the results tables and bar plots for binary and multiclass revenge post classification, two heatmap classification reports are plotted. Figure 7(a) shows that the test data contain almost equal numbers of active and passive revenge posts; the F1 score for active revenge is 0.88, and its precision and recall are also strong. Figure 7(b) shows that the test data contain more malicious compliance posts than pro and petty revenge posts; the F1 score for malicious compliance is 0.87 and its recall is 0.91. Both the binary and multiclass revenge post classification achieve solid performance with the CatBoost classifier.

Fig. 7 Classification reports for the CatRevenge model with (a) binary classification and (b) multiclass classification

Figures 8 and 9 present ROC curves for the binary and multiclass classifiers; an ROC curve plots the true positive rate against the false positive rate. Figure 9 shows the ROC curves for the three classes, where class 0 denotes pro revenge, class 1 denotes petty revenge, and class 2 denotes malicious compliance. The AUC values for both binary and multiclass classification confirm the solid performance of the CatRevenge model.

Fig. 8 ROC curve for binary classification

Fig. 9 ROC curve for multiclass classification

4.2.4 Performance analysis of CatRevenge models with sample text

The analysis of some example posts illustrates the performance of the proposed CatRevenge model for binary and multiclass classification. Each post in the Reddit dataset is very long, so we consider three examples in Table 4 together with their annotated labels. Table 4 also shows the labels predicted for these posts. Some samples are misclassified by the proposed model; in Table 4, the second post is wrongly classified, and some posts are misclassified in both the binary and multiclass settings. Long texts contain varied contextual information, and this may affect model performance.

Table 4 Performance analysis of CatRevenge model for binary and multiclass classification with sample text

4.2.5 Comparison with machine learning models and baseline model

The CatRevenge model is compared with other models and with the state-of-the-art model in Table 5 using the revenge text dataset. Because this is a fresh research area, no standard dataset is available for evaluation. For comparison, this research also includes two machine learning models, Naïve Bayes and Random Forest, using the count vector and TF-IDF feature set.

This research also considers one baseline model, vengeful text identification [12], which uses POS tagging, Word2Vec embedding, and KNN and AdaBoost classifiers. Table 5 reports this comparison on the revenge text dataset, evaluating the machine learning models and the baseline model.

Table 5 Experiment results for Revenge detection. Binary Revenge classification considers active and passive revenge and Multiclass Revenge classification considers Malicious Compliance, Petty Revenge and Pro Revenge

Taking weighted F1 as the main metric, the CatRevenge model improves weighted F1 by 6 to 10% over the baseline model for binary classification. Both binary and multiclass classification of revenge posts perform well with gradient boosting classifiers using all three features, and the CatRevenge model improves weighted F1 by 2.5 to 10% over the baseline model for multiclass classification.

To evaluate the statistical significance of the proposed CatRevenge model, this research uses McNemar's test [56] to analyze paired observations, comparing the per-post classification results of the proposed model and the baseline methods. The test checks for differences in error rates between two models; it is based on the \({\chi }^{2}\) statistic, is well suited to error rate analysis, and is chosen here because it shows a low type 1 error. In experiment 1, the proposed model is compared with the baseline using the AdaBoost classifier [8], giving \({\chi }^{2}\) = 32.33 and p < 0.001. In experiment 2, the proposed model is compared with the baseline using the KNN classifier [12], giving \({\chi }^{2}\) = 30.13 and p < 0.001. Both experiments show a statistically significant difference in error rates between the methods.
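A sketch of the test with statsmodels is shown below; the contingency counts are placeholders, not the paper's values.

from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of agreement on the same test posts:
# rows = CatRevenge correct / wrong, columns = baseline correct / wrong (placeholder counts).
table = [[2100, 260],
         [140,  297]]
result = mcnemar(table, exact=False, correction=True)  # chi-square form of McNemar's test
print(result.statistic, result.pvalue)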

Along with the statistical test, hyperparameter optimization is applied to the proposed CatRevenge model. This research uses the recent hyperparameter optimization tool Optuna to tune and improve the CatBoost model [57]. Optuna optimizes a tree-structured search over the hyperparameters with its TPESampler method and finds the best hyperparameter values. The total number of finished trials is set to 100, and the experiments report the best parameter values for the CatBoost classifier.
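An Optuna tuning sketch follows. The TPE sampler and 100 trials follow the description above, while the search ranges and the use of the held-out test split as the tuning objective are illustrative assumptions.

import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "depth": trial.suggest_int("depth", 4, 10),
        "iterations": trial.suggest_int("iterations", 200, 1000),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
    }
    model = CatBoostClassifier(loss_function="MultiClass", verbose=False, **params)
    model.fit(X_train, y_train)
    # A separate validation split would normally be preferred to avoid tuning on the test set.
    return f1_score(y_test, model.predict(X_test).ravel(), average="weighted")

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=100)  # 100 finished trials, as reported
print(study.best_params)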

Figure 10 shows the values obtained from hyperparameter optimization; these hyperparameter values are used for the k-fold cross-validation.

Fig. 10 Values obtained with hyperparameter optimization

K-fold cross-validation is used to assess the performance of the proposed model, particularly for the machine learning classifiers [58]. This research ran 2-, 5-, and 10-fold cross-validation for binary and multiclass classification; Table 6 presents the detailed results. For binary classification, 10-fold cross-validation performs best, achieving 0.859 accuracy, compared with the 2- and 5-fold settings. Likewise, for multiclass classification, 10-fold cross-validation performs best with 0.809 accuracy. After all validation and testing, the proposed model shows strong performance compared with the baseline models.
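A cross-validation sketch is given below, assuming the tuned parameters and the full feature matrix from the earlier sketches; since CatBoostClassifier follows the scikit-learn estimator interface, cross_val_score can drive it directly.

from sklearn.model_selection import StratifiedKFold, cross_val_score
from catboost import CatBoostClassifier

tuned_clf = CatBoostClassifier(verbose=False, **study.best_params)
for k in (2, 5, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    scores = cross_val_score(tuned_clf, final_features, labels, cv=cv, scoring="accuracy")
    print(f"{k}-fold mean accuracy: {scores.mean():.3f}")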

Table 6 K-fold cross validation with revenge text dataset

An automatic revenge text detection model is beneficial for society and the social structure. Evaluation of the proposed CatRevenge framework exhibits improved performance for revenge text detection compared with the other state-of-the-art strategies. Efficient classification of active and passive revenge can help block strong revenge content on social media. The proposed framework is also able to detect grievances and revenge stories from paragraphs, although some texts are still misclassified. Revenge text consists of long sentences and long paragraphs, and revenge categorization of long text is challenging because a complete paragraph can express mixed categories and ambiguous meaning. Some revenge texts also contain emojis that carry hidden information about the revenge; future research will consider emojis for more accurate classification.

5 Conclusion and Future Scope

The experiments with the CatRevenge model showed that binary and multiclass revenge text detection achieve satisfactory performance. This research successfully investigated a revenge post detection approach using contextual semantics and the impact weight of each POS tag for the English language. The CatRevenge model uses the gradient boosting classifier CatBoost and was evaluated with lexical, syntactic, and contextual semantic feature sets.

The results revealed that contextual semantic analysis with paragraph embedding is significant for categorizing active revenge in long text, and that the distributed memory vector (PV-DM) critically influenced the performance of the CatRevenge model. The results also indicated that impact analysis of each POS tag enhanced the classification accuracy and weighted F1 for long revenge text. In NLP-based analysis of revenge posts, extracting contextual semantic similarity with paragraph vectors is significant for producing meaningful classification results; this work therefore used a paragraph embedding model that creates a vector for each paragraph to reflect the context of revenge terms. Finally, this work achieved a 6% improvement for binary classification and a 2.5% improvement for multiclass revenge text classification in weighted F1 score compared with the baseline model.

Research with revenge datasets is scarce, so the scope for future work is vast. This study is restricted to a single English-language revenge dataset, but the CatRevenge model can be applied to diverse textual datasets from different sources and languages. Future work may introduce negative emotion boundaries to detect implicit active revenge more efficiently: emotion analysis can distinguish positive and negative emotion as well as the degree of negative emotion in text, and analyzing emotion levels may help categorize revenge text in more depth. Some long revenge stories contain humorous content that leads to misclassification, so sarcasm or humor detection for long text may reduce the misclassification error. In future research, long revenge stories will be evaluated with various other embedding models, including BERT, to explore the contextual analysis further. An emoji is a symbol that can carry hidden information and hidden expression about the whole text; analyzing the hidden emotions, sentiments, and information in emojis can also enhance classification performance for revenge text detection. Revenge text on social media may also contain code-mixed and other low-resource languages, so analyzing and processing code-mixed revenge text may further improve the detection model. Minimizing misclassification errors can also enhance model accuracy. Furthermore, the CatRevenge model may improve the efficiency of various aggression, hate, and cyberbullying detection research.