1 Introduction

The rapid growth of microblogging websites such as Twitter [1] has provided a rich foundation for understanding the likes and dislikes of the public. Such sites are widely used for exchanging views and opinions on different topics, ranging from the trivial to the socially critical. Furthermore, the public response to particular incidents can also be gauged through these microblogging websites. In particular, people share reports of natural disasters [2], learn about the impact of newly released movies, and post opinions on recently launched products [3]. This growth of microblogs and the variety of public reviews gave rise to the area of opinion mining [4], which has been shown to be an essential tool for understanding the mindset of the public and recent trends.

Sarcasm is a rhetorical manner of uttering negative emotions or dislike through an exaggerated language construct. It is a form of false and mocking politeness used to intensify hostility without doing so overtly. In face-to-face conversation, sarcasm can be detected easily via gestures, the tone of the speaker, and facial expressions. However, detecting sarcasm in textual communication is not simple, since none of these cues is available. With the explosion of Internet use, detecting sarcasm in online communication from e-commerce websites, social media platforms, and discussion forums has become critical for identifying cyberbullying and online trolling, and for opinion mining and sentiment analysis. The concept of sarcasm has gained much attention from neuropsychology [5] to linguistics; however, computational methods for the automated detection of sarcasm remain at an early stage. Previous studies on sarcasm detection in text rely on pragmatic (context) and lexical (content) cues [6], such as sentiment shifts, interjections, and punctuation, as the key indicators of sarcasm. In these works, the features are handcrafted and cannot be generalized in the presence of figurative slang and informal language, which are broadly employed in online conversation.

Gibbs [7] examined recordings of conversations between friends and found that sarcastic expressions accounted for 8% of every recording. Because sarcastic expression is so common in product reviews, daily communication, and social media, sarcasm detection has recently attracted growing interest from Natural language processing (NLP) researchers, who aim to automatically detect sarcastic expression in text. Owing to the ambiguity and complexity of sarcasm, sarcasm detection has become a challenging NLP task and is extensively used in human-machine dialog, sentiment analysis, and other NLP applications. With the development of machine learning approaches, recent studies [8] leverage neural networks for learning contextual and lexical features, removing the need for handcrafted features. In these works, word embeddings are used to train deep convolutional, recurrent, or attention-based neural networks, achieving state-of-the-art results on multiple large-scale datasets.

While ML-based approaches achieve remarkable results, they lack interpretability. Almost all prior sarcasm detection algorithms are based on automatically created sentiment features [9]. Various studies extracted features from sentiment data for detecting sarcasm with conventional ML methods. Because feature extraction involves considerable manual labor, several researchers have recently attempted to use ML models to avoid this burden. The work in [10] proposed that contrastive situations (or sentiments) are common in sarcastic language. However, it considers all pairs of words in a sentence without taking into account the significance of sentiment semantics. Sentiment information should therefore be examined more closely for precise sarcasm detection [25,26,27,28].

This paper presents an Intelligent ML-based sarcasm detection and classification (IMLB-SDC) technique for social media. The IMLB-SDC model involves different stages of operation, namely preprocessing, feature engineering, Feature selection (FS), classification, and parameter tuning. The feature engineering process is carried out using Term frequency-inverse document frequency (TF-IDF). Moreover, two FS approaches are used, namely chi-square and information gain. Furthermore, the IMLB-SDC model employs the Support vector machine (SVM) as the classification model, whose penalty factor \(C\) is optimally tuned by the Particle swarm optimization (PSO) algorithm. An extensive simulation analysis is carried out to demonstrate the improved performance of the IMLB-SDC technique.

The rest of the paper is organized as follows. Section 2 offers the related works, Sect. 3 discusses the proposed model, and Sect. 4 provides the experimental validation. Lastly, Sect. 5 draws the conclusion.

2 Literature review

This section reviews state-of-the-art sarcasm detection and classification approaches. Razali et al. [11] focused on identifying sarcasm in tweets through deep learning (DL)-extracted features combined with contextual handcrafted features. Feature sets are first extracted from a CNN framework and then carefully integrated with a handcrafted feature set generated from the corresponding contextual information. All feature sets are designed specifically for the task of sarcasm detection, with the aim of discovering the optimal features. Banerjee et al. [12] proposed synthetic minority oversampling-based models for mitigating the class imbalance problem, which can seriously affect classifier efficiency in social network sarcasm detection. In that work, five distinct variants of the synthetic minority oversampling technique were applied to two datasets of differing sizes. Reliability was determined by training and testing six well-known classifiers and measuring their performance with confusion matrix-based metrics on the test data.

Ren et al. [13] proposed a multilevel memory network based on sentiment semantics for capturing the features of sarcastic expression. In this method, a first-level memory network captures sentiment semantics, and a second-level memory network captures the contrasts between the sentiment semantics and the situation in each sentence. Furthermore, an enhanced CNN compensates for the memory network's lack of local information. Porwal et al. [14] adopted an RNN approach for detecting sarcasm, since it removes the need to manually extract the features required by conventional ML models. In addition to the RNN, this method also utilizes an LSTM cell implemented in TensorFlow for capturing the semantic and syntactic information of Twitter posts.

Abulaish and Kamal [15] proposed a new self-deprecating sarcasm detection method that amalgamates rule-based and ML methods. The rule-based technique aims to detect candidate self-deprecating tweets, whereas the ML methods are employed for feature extraction and classification. A total of eleven features, comprising five hyperbolic and six self-deprecating features, are used for training three distinct classifiers: bagging, DT, and NB. Kumar et al. [16] proposed a DL approach named sAtt-BLSTM convNet, a hybrid of sAtt-BLSTM and convNet that employs GloVe word representations to build semantic word embeddings. The features produced by the sAtt-BLSTM, along with the feature map produced from punctuation-based auxiliary features, are fused and fed to the convNet.

Chia et al. [17] examined irony and sarcasm on Twitter through feature engineering and ML methods. Initially, they clarified the definitions of sarcasm and irony by reviewing several studies concentrating on these terms. Initial experiments then compared several kinds of classification approaches, including common classifiers for text classification tasks. In subsequent experiments, distinct data preprocessing approaches were analyzed and compared. Shrivastava and Kumar [18] investigated critical problems of sarcasm recognition in text-based communication. To address these problems, a new method was introduced based on Google BERT, which is capable of handling the veracity, volume, and velocity of the data. The efficiency of the model was compared with other contemporary and classical methods reported for this task, such as SVM, LR, LSTM, CNN, BiLSTM, and attention-based methods.

Kumar and Harish [19] proposed a new algorithm for classifying sarcastic text through a content-based FS approach. The presented method consists of a two-stage FS approach for selecting highly characteristic features. Initially, traditional FS techniques such as Mutual information (MI), chi-square, and Information gain (IG) are employed for selecting appropriate feature subsets. The selected feature subset is then further refined in the next stage, where the k-means clustering algorithm is applied to select the most characteristic features. The selected features are finally classified by SVM and RF classifiers. Bharti et al. [20] presented a Hadoop-based architecture that captures real-time tweets and processes them with a group of methods that efficiently finds sarcastic sentiment. Additionally, they observe that the elapsed time for processing and analysis under the Hadoop-based architecture considerably outperforms traditional algorithms, making it highly suitable for real-time tweet streams.

3 The proposed IMLB-SDC technique

In this study, a new IMLB-SDC technique is derived to detect and classify sarcasm in social networking data. The IMLB-SDC technique encompasses different stages, namely preprocessing, TF-IDF-based feature engineering, FS (chi-square, information gain), SVM-based classification, and PSO-based parameter tuning. Figure 1 illustrates the overall process of the IMLB-SDC model. The detailed working of these processes is discussed in the succeeding sections.

Fig. 1
figure 1

Overall process of IMLB-SDC model

3.1 Data preprocessing

At the initial stage, the actual input data are preprocessed into a compatible format by the use of diverse subprocesses, namely tokenization, stemming and lemmatization, and POS tagging.

3.1.1 Tokenization

A token is a series of characters that is treated as a group. Decomposing a text into tokens allows the creation of token counts that can be used as features. A token can be a sentence, a paragraph, and so on; however, individual words are most frequently chosen as tokens in text categorization.
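As a concrete illustration (not the exact pipeline of this work), the following Python sketch tokenizes a sample tweet into word-level tokens with NLTK; the tokenizer choice and the sample text are assumptions for demonstration only.

```python
# Illustrative sketch: word-level tokenization with NLTK.
# Assumes nltk is installed and the "punkt" tokenizer models are downloaded.
from nltk.tokenize import word_tokenize

tweet = "Oh great, another Monday. I just LOVE waiting in traffic!"
tokens = word_tokenize(tweet)
print(tokens)
# ['Oh', 'great', ',', 'another', 'Monday', '.', 'I', 'just', 'LOVE', ...]
```

Counting how often each token occurs then yields the token-count features described above.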

3.1.2 Stemming and lemmatization

Stemming and lemmatization are widespread techniques in NLP. Their intention is to reduce the number of inflected word forms by stripping suffixes to retrieve a "base form" of the word. Stemming is based on the idea that words with identical stems are close in meaning, and that NLP can be enhanced when such different words are combined into a single term; as a result, the feature count is reduced when stems are used rather than the actual words. For instance, given argue, argued, argues, and arguing, a stemming technique will detect suffixes such as "e, ed, es, ing" and strip the words to the stem "argu." This example shows that the stem need not be a complete word. In linguistics, a lemma is the canonical form of a word. For instance, the verb "to walk" can appear as "walking," "walked," and "walks." The canonical form "walk" is the lemma, and the process of converting words to their corresponding lemma is called lemmatization. Compared to stemming, lemmatization is considered more difficult, as lemmatization techniques must first parse the text to identify the part of speech of each word; for this reason, lemmatization is not considered further in this report.
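A minimal sketch of the stemming step, applying NLTK's PorterStemmer to the "argue" example above; the choice of the Porter algorithm is an assumption, as no specific stemmer is named here.

```python
# Illustrative sketch: Porter stemming collapses inflected forms to one stem.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["argue", "argued", "argues", "arguing"]:
    print(word, "->", stemmer.stem(word))
# All four forms map to the stem "argu", so they count as a single feature.
```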

3.1.3 Part-of-speech (POS) tagging

POS tagging can also be applied to extract features. It is a well-established model in stylometry that can be employed in NLP. It allocates every token to its word class, i.e., tokens are tagged with their POS. One advantage is that it can differentiate homonyms, as shown below.

  • Put/VB it/PRP back/RB.

  • I/PRP hurt/VBP my/PRP$ back/NN.

Here, back is tagged as an adverb (RB) in the first sentence and as a noun (NN) in the second. The POS tagger employed here is a simple rule-based tagger that operates in two steps. In the first step, every word is assigned the tag determined for it from a large manually tagged corpus. When new words occur that are not present in the tagged corpus, they are tagged based on the following considerations: if the initial letter is capitalized, the word is assigned a noun tag; otherwise, it receives the tag most common for words ending in the same three letters. In the second step, the tags are revised by a collection of rules that examine the order of the tags (Brill, 1992).
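The sketch below reproduces the homonym example with NLTK's pre-trained POS tagger; note this is a statistical tagger standing in for the rule-based Brill-style tagger described above.

```python
# Illustrative sketch: POS tagging disambiguates the homonym "back".
# Assumes nltk plus the "punkt" and "averaged_perceptron_tagger" resources.
import nltk

print(nltk.pos_tag(nltk.word_tokenize("Put it back")))
# [('Put', 'VB'), ('it', 'PRP'), ('back', 'RB')]
print(nltk.pos_tag(nltk.word_tokenize("I hurt my back")))
# [('I', 'PRP'), ('hurt', 'VBP'), ('my', 'PRP$'), ('back', 'NN')]
```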

3.2 Feature engineering using TF-IDF technique

Once the input data are preprocessed, the next stage is the extraction of features using the TF-IDF technique. TF-IDF is a statistical value that determines the importance of a word in a document: words that appear many times in a document can offer more information about that document, but words that are spread across many documents provide little or no discriminating information. TF-IDF therefore increases the weight of words that occur many times within a document and reduces the weight of words that appear in many different documents.

$$tf{\text{-}}idf\left(t,d,D\right)=tf\left(t,d\right)\times idf\left(t,D\right)$$
(1)

A simple option for the term frequency \(tf(t, d)\) is the raw frequency \(f(t, d),\) the number of occurrences of term \(t\) in a document \(d\). If document lengths vary, it is advisable to normalize the term frequency, as longer documents contain more occurrences of words. Equation (2) is an example of what \(tf(t, d)\) might look like.

$$tf\left(t,d\right)=0.5+\frac{0.5\times f\left(t,d\right)}{\max \left\{f\left(w,d\right):w\in d\right\}} $$
(2)

\(Idf\) is a metric that indicates the importance of a word. The term frequency \(tf(t, d)\) alone treats all words as equally important, yet common words like "and" and "to" have high frequency but provide little information about the class of a document. Hence, \(idf(t, D),\) Eq. (3), captures the importance of a word by dividing the total number of documents \(N\) by the number of documents in the corpus \(D\) that contain the word \(t\), and taking the logarithm.

$$idf\left(t,D\right)=log \frac{N }{\left|\left\{d\in D:t\in d\right\}\right|} $$
(3)
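The following sketch shows TF-IDF feature engineering with scikit-learn; note that TfidfVectorizer uses a smoothed variant of Eq. (3), idf = ln((1+N)/(1+df)) + 1, rather than the plain form, and the toy documents are assumptions for illustration.

```python
# Illustrative sketch: TF-IDF feature matrix from a toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "oh sure that went perfectly",
    "the weather is nice today",
    "sure the meeting went well",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # sparse matrix: docs x vocabulary
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))                 # one TF-IDF weight per (doc, word)
```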

3.3 Feature selection

The main objective of FS is to choose a subset of features for use in the classification process. Text data contain features that are redundant and irrelevant; redundant features make no contribution to separating the classes from each other. FS reduces the amount of data to be processed and saves time in the classification operation. A further merit, for some classification techniques, is a reduced risk of overfitting the data.

3.3.1 Chi-square

The chi-squared statistic is a metric used to investigate the divergence between stochastic variables. If the variables have similar distributions, it produces a low value; it produces a high value when the variables differ strongly. Definition: if \({x}_{i}\), \(1\le i\le \nu \), are independent variables, each normally distributed with mean \({\mu }_{i}\) and variance \({\sigma }_{i}^{2}\), then the chi-squared statistic is described as:

$${X}^{2}=\frac{{\left({x}_{1}-{\mu }_{1}\right)}^{2}}{{\sigma }_{1}^{2}}+\frac{{\left({x}_{2}-{\mu }_{2}\right)}^{2}}{{\sigma }_{2}^{2}}+\dots +\frac{{\left({x}_{\nu }-{\mu }_{\nu }\right)}^{2}}{{\sigma }_{\nu }^{2}}=\sum_{i=1}^{\nu }\frac{{\left({x}_{i}-{\mu }_{i}\right)}^{2}}{{\sigma }_{i}^{2}} $$
(4)

Usually, the chi-squared statistic is applied to estimate how well two distributions fit each other. In FS, the chi-squared strategy is employed conversely: words whose distribution is the same across all classes are identified and removed from the text, since they are irrelevant to the classification process.
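A brief sketch of chi-square FS over TF-IDF features using scikit-learn; the toy labels and the number of retained features are illustrative assumptions.

```python
# Illustrative sketch: keep the k terms with the highest chi-squared score
# against the sarcasm labels (TF-IDF values are nonnegative, as chi2 requires).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["oh great another monday", "i love this product",
        "yeah right that will totally work", "the food was tasty"]
y = [1, 0, 1, 0]                            # 1 = sarcastic, 0 = non-sarcastic

X = TfidfVectorizer().fit_transform(docs)
selector = SelectKBest(score_func=chi2, k=5)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                     # (4 documents, 5 kept features)
```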

3.3.2 Information gain

To understand information gain, entropy must first be explained. Entropy is a statistical measure of the (im)purity of the corpus and is described as:

$$H\left(p\right)=-\sum_{x\in X}p\left(x\right)\mathrm{log}p\left(x\right) $$
(5)

Information gain can be applied to select consecutive attributes that narrow down the class of an instance along a shorter path. In terms of entropy, information gain is defined as follows:

$$I\left(X,Y\right)=H\left(X\right)-H\left(X|Y\right)=H\left(Y\right)-H(Y|X) $$
(6)

By integrating Eqs. (5) and (6), information gain is obtained as:

$$IG\left(X;Y\right)=\sum_{x\in X}\sum_{y\in Y}p\left(x,y\right)\mathrm{log} \frac{p(x,y)}{p\left(x\right)p(y)} $$
(7)

Rather than comparing two random attributes as in Eq. (7), information gain can be specialized to estimate the reduction in entropy obtained by comparing a single attribute against a group of attributes. This form is applicable to text categorization, where information gain is defined as:

$$IG\left({t}_{i}\right)=\sum_{c\in \left\{{c}_{k},\neg {c}_{k}\right\}}\sum_{t\in \left\{{t}_{i},\neg {t}_{i}\right\}}p\left(t,c\right)\mathrm{log}\frac{p(t,c)}{p\left(t\right)p\left(c\right)}$$
(8)
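As a sketch of the information-gain criterion, the mutual information between each binary term-presence feature and the class label (which is what Eq. (8) computes) can be estimated with scikit-learn; the toy matrix below is an assumption for illustration.

```python
# Illustrative sketch: information gain as mutual information between
# term presence and the class label; a higher score = more informative term.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# toy binary term-presence matrix (rows = documents, columns = terms)
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])
y = np.array([1, 0, 1, 0])                  # 1 = sarcastic

scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(scores)
```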

3.4 Data classification

At the classification stage, the selected feature subsets are fed into the SVM model to detect and classify the existence of sarcasm. SVM is a commonly employed ML technique for solving classification problems. In the SVM model, every data instance is plotted as a point in n-dimensional space, with the value of each feature being the value of a specific coordinate. Let \(\left({x}_{i},{y}_{i}\right)\), \(1\le i\le N\), denote the data comprising \(N\) training samples [21]. Figure 2 depicts the hyperplane of the SVM. Every instance satisfies \({x}_{i}\in {R}^{d}\), where \(d\) represents the number of dimensions of the input data, and \({y}_{i}\in \left\{-1, 1\right\}\) denotes the class of the respective sample \({x}_{i}\). The separating hyperplane is given as:

Fig. 2
figure 2

Hyperplane SVM model

$$w\bullet {x}_{i}+b=0, 1 \le \mathrm{i}\le N. $$
(9)

If such a hyperplane exists, a linear partition is obtained. The samples nearest to the separating hyperplane are identified as the support vectors. On the margins (i.e., at the support vectors), Eq. (9) becomes:

$$w\bullet {x}_{i}+b=\pm 1. $$
(10)

Applying Eq. (10) to all instances, Eq. (11) is obtained as:

$${y}_{i}\bullet \left(w\bullet {x}_{i}+b\right)\ge 1.$$
(11)

The problem is therefore to find \(w\) and \(b\). Among the many hyperplanes that can partition the two-class data, the SVM selects the optimal hyperplane: the one with the largest distance to the support vectors. The margin of a separating hyperplane is \(2/\| w\| \); therefore, the optimal hyperplane minimizes \(\| w\| \). To ease this problem, \(\| w\| \) is replaced by \(\left(1/2\right)\| w{\| }^{2}\), yielding an optimization problem that minimizes \(\left(1/2\right)\| w{\| }^{2}\) subject to Eq. (11). For nonlinearly separable problems, nonnegative slack variables \({\zeta }_{i}\) are introduced. The problem then becomes:

$$\mathrm{Min}\ \frac{1}{2}\| w{\| }^{2}+C\bullet \sum_{i=1}^{N}{\zeta }_{i}$$
$$\mathrm{s.t.}\quad {y}_{i}\bullet \left(w\bullet {x}_{i}+b\right)\ge 1-{\zeta }_{i}, $$
(12)
$${\zeta }_{i}\ge 0,\quad 1\le i\le N,$$

where \(C\) denotes the penalty factor. It controls the trade-off between margin maximization and error minimization. This problem is solved employing Lagrange multipliers, and the classification decision function becomes:

$$F\left(\mathrm{x}\right)=sign\left(\sum_{i=1}^{N}{\alpha }_{i}\bullet {y}_{i}\bullet K\left({x}_{i},{x}_{j}\right)+b\right), $$
(13)

where \({\alpha }_{i}\) denotes the Lagrange multipliers and \(K\left({x}_{i},{x}_{j}\right)=\phi \left({x}_{i}\right)\bullet \phi \left({x}_{j}\right)\) is a kernel function for some mapping function \(\phi \left(x\right)\). A QP problem solver is employed for determining \({\alpha }_{i}\), after which \(w\) and \(b\) are obtained as:

$$w=\sum_{i=1}^{N}{\alpha }_{i}\bullet {y}_{i}\bullet \phi \left({x}_{i}\right), $$
(14)
$$b=\frac{1}{{N}_{\mathrm{SV}}}\sum_{i}\left({y}_{i}-\sum_{j}{\alpha }_{j}\bullet {y}_{j}\bullet K\left({x}_{j},x\right)\right). $$
(15)

where \({N}_{\mathrm{SV}}\) denotes the number of support vectors and \(x\) refers to an unknown input instance. Some common kernel functions are:

linear: \(k\left(x,y\right)=x\bullet y+1,\)

polynomial: \(k\left(x,y\right)={\left(x\bullet y+1\right)}^{\sigma },\)

RBF: \(k\left(x,y\right)=\mathrm{exp}\left(-\| x-y{\| }^{2}/\left(2\bullet {\sigma }^{2}\right)\right),\)

quadratic: \(k\left(x,y\right)=1-\| x-y{\| }^{2}/\left(\| x-y{\| }^{2}+\sigma \right),\)

where \(\sigma \) must be optimally tuned, as must \(C.\)
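A minimal sketch of the classification stage with scikit-learn's SVC; the synthetic data stand in for the selected TF-IDF features, and the C and gamma values are placeholders that the PSO stage of Sect. 3.5 would tune.

```python
# Illustrative sketch: RBF-kernel SVM on stand-in feature vectors.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # C to be tuned by PSO
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```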

3.5 Parameter optimization

To determine the optimal value of the parameter \(C\), trial-and-error methods could be applied. However, such techniques incur a performance overhead and give no assurance of finding the best solution. In this study, PSO is utilized to optimize the parameter instead. PSO is a global optimization model derived from the study of fish schooling and bird flocking; it is simple and executes rapidly. Additionally, a Cross-validation (CV) approach is introduced for constructing the fitness function employed by the PSO algorithm.

PSO is a population-based search approach that derives from the study of the movement of organisms in bird flocks or fish schools [22]. The technique is simple to implement and has few parameters to adjust. PSO performs the search using a population (called a swarm) of individuals (called particles) that are updated at every iteration. To discover the optimum result, each particle moves in the direction of its previous best position (pbest) and the global best position (gbest). The velocity and position of the particles are updated using Eqs. (16) and (17):

$${V}_{ij}\left(t+1\right)=w\bullet {V}_{ij}\left(t\right)+{c}_{1}\bullet {r}_{1}\bullet \left({X}_{pbest,ij}\left(t\right)-{X}_{ij}\left(t\right)\right)+{c}_{2}\bullet {r}_{2}\bullet \left({X}_{gbest,j}\left(t\right)-{X}_{ij}\left(t\right)\right)$$
(16)
$${\mathrm{X}}_{\mathrm{ij}}\left(\mathrm{t}+1\right)={\mathrm{X}}_{\mathrm{ij}}\left(\mathrm{t}\right)+{\mathrm{V}}_{\mathrm{ij}}\left(\mathrm{t}+1\right) $$
(17)

where \(t\) refers to the iteration counter; \({V}_{ij}\) implies the velocity of particle \(i\) on the \(j\)th dimension, whose value is restricted to the range \([{V}_{\mathrm{min}},{V}_{\mathrm{max}}]\); \({X}_{ij}\) represents the position of particle \(i\) on the \(j\)th dimension, whose value is restricted to the range \(\left[{X}_{\mathrm{min}},{X}_{\mathrm{max}}\right]\); \({X}_{pbest,ij}\) defines the pbest position of particle \(i\) on the \(j\)th dimension, and \({X}_{gbest,j}\) indicates the gbest position of the swarm on the \(j\)th dimension. The inertia weight \(w\) is utilized for balancing global exploration and local exploitation, \({r}_{1}\) and \({r}_{2}\) are random numbers ranging from 0 to 1, and a constriction factor (generally set to 1) can additionally be used to control the velocity weight. The positive constants \({c}_{1}\) and \({c}_{2}\) are the personal and social learning factors, whose values are generally set to 2. Figure 3 demonstrates the flowchart of the PSO-SVM model. The important steps of the PSO-based parameter optimization procedure are summarized as follows:

Fig. 3
figure 3

Flowchart of PSO-SVM

Step 1: Initialization.

In this step, the parameters of PSO are initialized, along with a population of particles with random positions and velocities.

Step 2: Train the SVM method and estimate the Fitness function (FF).

The SVM technique is trained with the parameter \(C\) contained in the current particle. A tenfold CV approach is implemented for evaluating the FF. In tenfold CV, the training dataset is randomly separated into ten mutually exclusive subsets of approximately equal size; nine subsets are used for training and the remaining subset for testing. This procedure is repeated ten times so that every subset is used once for testing. The FF is determined as \(1 - CA_{validation}\) of the tenfold CV on the training dataset, as given in Eqs. (18) and (19). Consequently, a solution with a higher \(CA_{validation}\) has a lower fitness value.

$$Fitness = 1-{\mathrm{CA}}_{\mathrm{validation}} $$
(18)
$${\mathrm{CA}}_{\mathrm{validation}}=\frac{1}{10}{\sum }_{i=1}^{10}\left(\frac{{y}_{c}}{{y}_{c}+{y}_{f}}\right)\times 100 $$
(19)

where \({y}_{c}\) and \({y}_{f}\) refer to the number of correctly and falsely classified instances, respectively.

Step 3: Upgrade the global and personal optimum positions.

In this step, the global optimum and personal optimum positions of the particles are updated based on their FF values.

Step 4: Upgrade the velocity and position.

The position and velocity of all particles are updated using Eqs. (16) and (17), yielding new particle positions for subsequent iterations.

Step 5: Termination condition.

Steps 2-4 are repeated until the termination condition is fulfilled.
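The sketch below condenses Steps 1-5 into a bare-bones PSO loop that tunes the SVM penalty factor C with fitness = 1 - CA_validation from tenfold CV (Eqs. 18 and 19); the swarm size, iteration count, search bounds, and synthetic data are illustrative assumptions, not the paper's exact settings.

```python
# Illustrative sketch: PSO tuning of the SVM penalty factor C.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

def fitness(C):
    # Eq. (18): fitness = 1 - tenfold cross-validation accuracy
    return 1.0 - cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=10).mean()

rng = np.random.default_rng(0)
n_particles, n_iters, c1, c2, w = 10, 20, 2.0, 2.0, 0.7
lo, hi = 0.01, 100.0                               # search range for C

pos = rng.uniform(lo, hi, n_particles)             # Step 1: initialization
vel = np.zeros(n_particles)
pbest = pos.copy()
pbest_fit = np.array([fitness(c) for c in pos])    # Step 2: evaluate FF
gbest = pbest[pbest_fit.argmin()]

for _ in range(n_iters):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)  # Eq. (16)
    pos = np.clip(pos + vel, lo, hi)                                   # Eq. (17)
    fit = np.array([fitness(c) for c in pos])      # Step 2: re-evaluate FF
    better = fit < pbest_fit                       # Step 3: update pbest/gbest
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmin()]              # Steps 4-5: loop until done

print("best C:", round(gbest, 3), "CV accuracy:", round(1 - pbest_fit.min(), 3))
```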

4 Performance validation

This section investigates the sarcasm detection performance of the proposed IMLB-SDC technique on a dataset from the Kaggle repository [23]. The dataset includes a total of 14,949 non-sarcastic instances and 13,552 sarcastic instances.

Figure 4 portrays the confusion matrices of the IMLB-SDC technique under five distinct folds. Under fold-1, the IMLB-SDC technique classified 14,224 instances as sarcastic and 12,707 instances as non-sarcastic. Under fold-2, it classified 14,206 instances as sarcastic and 12,725 instances as non-sarcastic. Under fold-3, it classified 14,237 instances as sarcastic and 12,742 instances as non-sarcastic. Under fold-4, it classified 14,234 instances as sarcastic and 12,787 instances as non-sarcastic. Finally, under fold-5, it classified 14,259 instances as sarcastic and 12,802 instances as non-sarcastic.

Fig. 4
figure 4

Confusion matrix of the IMLB-SDC model

Table 1 and Figs. 5 and 6 examine the sarcasm classification outcomes of the IMLB-SDC technique under five distinct folds. The results demonstrate that the IMLB-SDC technique gained maximum performance in all the applied folds. For instance, with fold-1, the IMLB-SDC method accomplished a precision of 0.944, recall of 0.952, accuracy of 0.945, F-score of 0.948, and kappa of 0.307. With fold-2, it obtained a precision of 0.945, recall of 0.950, accuracy of 0.945, F-score of 0.948, and kappa of 0.307. With fold-3, it accomplished a precision of 0.946, recall of 0.952, accuracy of 0.947, F-score of 0.949, and kappa of 0.308. With fold-4, it obtained a precision of 0.949, recall of 0.952, accuracy of 0.948, F-score of 0.951, and kappa of 0.309. Finally, with fold-5, it accomplished a precision of 0.950, recall of 0.954, accuracy of 0.950, F-score of 0.952, and kappa of 0.310.

Table 1 Results analysis of proposed IMLB-SDC model on different folds under various measures
Fig. 5
figure 5

Result analysis of the IMLB-SDC technique with distinct measures

Fig. 6
figure 6

F-score and kappa analysis of IMLB-SDC model

Figure 7 presents the average results of the IMLB-SDC technique over the five folds. The figure denotes that the IMLB-SDC technique resulted in an average precision of 0.947, recall of 0.952, accuracy of 0.947, F-score of 0.949, and kappa of 0.308.

Fig. 7
figure 7

Average analysis of the IMLB-SDC technique with distinct measures

Figure 8 illustrates the ROC analysis of the IMLB-SDC technique obtained under five runs of execution. The ROC value of the IMLB-SDC technique is found to be maximal in all five applied runs. For instance, with run-1, the IMLB-SDC technique attained a ROC of 98.1987. With run-2, it gained an increased ROC of 98.9169. With run-3, it reached a superior ROC of 99.2714. With run-4, it attained a ROC of 99.2929. Lastly, with run-5, it obtained an improved ROC of 99.5657.

Fig. 8
figure 8

ROC analysis of the IMLB-SDC technique under distinct runs

To showcase the superior performance of the IMLB-SDC technique, a comprehensive comparison study is provided in Table 2 [24]. A comparative precision analysis of the IMLB-SDC technique is shown in Fig. 9. The figure demonstrates that the G-RNN model has the poorest outcome, offering a precision of 0.622. At the same time, the NBOW, CNN-LSTM, V-LSTM, and V-CNN techniques have shown moderately closer outcomes with precisions of 0.66, 0.661, 0.683, and 0.684, respectively. Moreover, the A-LSTM, SIARN, MIARN, and ELM-BiLSTM techniques have gained somewhat higher precisions of 0.7, 0.721, 0.729, and 0.748, respectively. Although the IMH-SA technique has accomplished a competitive precision of 0.774, the proposed IMLB-SDC technique has reached a maximum precision of 0.947.

Table 2 Results analysis of existing with proposed IMLB-SDC model under various measures
Fig. 9
figure 9

Comparative analysis of IMLB-SDC model in terms of precision

A detailed recall analysis of the IMLB-SDC method is shown in Fig. 10. The figure portrays that the G-RNN method exhibited the worst result, offering a recall of 0.618. Likewise, the V-LSTM, NBOW, CNN-LSTM, and V-CNN approaches demonstrated moderately closer outcomes with recalls of 0.639, 0.660, 0.667, and 0.681, respectively. Next, the A-LSTM, SIARN, MIARN, and ELM-BiLSTM techniques achieved slightly higher recalls of 0.699, 0.718, 0.729, and 0.747, respectively. Eventually, the IMH-SA technique accomplished a competitive recall of 0.772, while the projected IMLB-SDC methodology attained an improved recall of 0.952.

Fig. 10
figure 10

Comparative analysis of IMLB-SDC model in terms of recall

A brief F-score analysis of the IMLB-SDC approach is shown in Fig. 11. The figure shows that the V-LSTM method yields the poorest result, offering an F-score of 0.607. Simultaneously, the G-RNN, CNN-LSTM, NBOW, and V-CNN techniques illustrated moderately closer outcomes with F-scores of 0.612, 0.657, 0.660, and 0.682, respectively. In addition, the A-LSTM, SIARN, MIARN, and ELM-BiLSTM methods reached somewhat higher F-scores of 0.696, 0.718, 0.727, and 0.747, respectively. Although the IMH-SA algorithm achieved a competitive F-score of 0.772, the presented IMLB-SDC method achieved a maximal F-score of 0.949.

Fig. 11
figure 11

Comparative analysis of IMLB-SDC model in terms of F-score

Looking into the detailed results analysis, it is demonstrated that the IMLB-SDC technique has accomplished better sarcasm detection outcomes than the other techniques. The enhanced outcome is due to the utilization of feature selection and parameter tuning.

5 Conclusions

In this study, a new IMLB-SDC technique is derived to detect and classify sarcasm in social networking data. The IMLB-SDC technique encompasses different stages, namely preprocessing, TF-IDF-based feature engineering, FS (chi-square, information gain), SVM-based classification, and PSO-based parameter tuning. At the initial stage, the actual input data are preprocessed into a compatible format by the use of diverse subprocesses, namely tokenization, stemming and lemmatization, and POS tagging. Besides, the use of the PSO algorithm for the optimal selection of the SVM parameter results in improved sarcasm detection and classification performance. An extensive simulation analysis is carried out to demonstrate the improved performance of the IMLB-SDC technique. The experimental outcomes point out the promising efficiency of the IMLB-SDC technique over recent state-of-the-art techniques. As a future extension, the IMLB-SDC technique can be improved by deep learning approaches.