1 Introduction

Social media is a powerful medium for sharing sentiments and opinions about particular topics, centred on content sharing, communication, interaction, and collaboration. Sentiment indicates an individual's feelings regarding a post or product viewed on social media, while opinion indicates the thoughts expressed by an individual about the post or item [11, 25]. Through numerous social media sites such as Facebook, Twitter, and Flickr, people communicate with family, friends, and communities around the world through text, photographs, and video [5, 8]. The emergence of the Internet allows users to post sentiment-related content about products [10], people, books, research, hotels [12], events and much more. With the proliferation of the Internet, massive amounts of unstructured data are generated online, which complicates information extraction procedures and makes decision-making a great challenge. Unstructured data prevents classification methodologies from accurately processing the input textual content, which often contains slang that conveys additional emotion [3]. Thus, automatic analysis of sentiments and emotions has become a widely researched domain because of the enormous availability of data on the social web.

In Natural Language Processing (NLP), sentiment analysis (SA) and emotion recognition (ER) are two critical research areas [13, 17]. Although the two terms are often used interchangeably, they differ in important respects. In sentiment analysis, the central concern is polarity, which aims to assess whether the data is positive (P), negative (N) or neutral (Ne) [6, 24]. Emotion identification [4, 26], by contrast, aims to recognize distinct human emotions such as happiness, sadness, anger, fear, disgust and surprise. People readily express their opinions, arguments, and feelings on social media across a wide range of topics, and this active feedback helps marketers better understand user perspectives and improve their business strategies.

Sentiment and emotion analysis offers a wide variety of applications, such as monitoring conversations on social media, evaluating customer satisfaction and feedback, recognizing facial emotions [1, 15, 22], identifying physiological behaviours, supporting better decision-making and other Human–Computer Interaction (HCI) applications [18]. Methodologies such as machine learning (ML) [2, 27], lexicon-based approaches [23], and deep learning (DL) [14, 21] can be used to perform sentiment and emotion analysis. Recently, DL has become a popular technique that learns features or representations across multiple layers and predicts the outcome. With the advances of deep learning in many applications, DL is increasingly being applied to sentiment and emotion detection. The DL model Convolutional Neural Network (CNN) was originally adopted for visual data processing but has recently achieved improved results on textual inputs. Deep learning can also be used for data mining and text analysis, enhancing the accuracy of sentiment analysis and emotion detection.

Motivation, contribution and organization of this research paper

Social networking platforms such as Twitter, Facebook, LinkedIn and Instagram have become an important source of communication and of conveying emotions globally, owing to the rapid proliferation of the Internet. People use textual content, audio, pictures, and video data to express their viewpoints or feelings. In web-based networking, text communication is particularly problematic: every second, a huge amount of unstructured data is generated through social media platforms. With technical advances, this data can be generated rapidly and processed to understand human psychology using sentiment analysis, which identifies the polarity of text. Sentiment-based data provides valuable information for various applications, yet classifying sentiments from the provided input data is a challenging task. Despite the availability of multiple techniques for sentiment analysis, researchers find it difficult to accurately identify the emotion behind a statement due to the conveyance of multiple emotions, slang, ridicule, and lexical and syntactical ambiguity. Recently, several existing works have used automated deep learning techniques for emotion classification on text data. These deep learning mechanisms provide improved classification results compared to other machine learning techniques and are more scalable. Moreover, deep learning techniques can learn high-level features from the given input data, which helps the system achieve higher performance. Thus, the authors adopt a robust deep learning model for sentiment-based emotion classification. The general hierarchy of existing and proposed sentiment classification frameworks is presented in Fig. 1.

Fig. 1 a Hierarchy of the existing sentiment classification frameworks. b Hierarchy of the proposed sentiment classification framework

The figure shows that the proposed framework is augmented with a feature selection stage to optimize the feature set and enhance the overall classification accuracy. Moreover, the classification is performed with a deep learning-based approach, which is more efficient than traditional techniques. The major contributions of this work are as follows:

  • This work introduces an automated sentiment and emotion classification framework based on deep learning to analyze the emotional tone behind a series of words.

  • The proposed framework extracts the most crucial features from the input textual content using Latent Semantic Analysis (LSA) and Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) to enhance the overall classification accuracy.

  • To reduce the classifier’s dimensionality and computational complexity issues, the proposed framework selects the most optimal set of features using the chaotic artificial hummingbird algorithm (CAHA).

  • Further, the selected features are provided to the dual-stage deep Convolutional Gated Recurrent Unit (CGARU) model to perform multi-class emotion prediction based on sentiment polarity.

  • The performance is evaluated on various datasets and compared with baseline classifier models to demonstrate the efficacy of the proposed network.

The rest of this paper is organized as follows. Section 2 presents a literature review of recent works on sentiment classification using different techniques. Section 3 elaborates on the proposed sentiment-based emotion classification model. Section 4 covers the simulation results and analysis of the proposed study. Finally, Section 5 concludes the paper and outlines future directions.

2 Related work

Some recent works related to this research are reviewed below.

Karthik and Sethukarasi [16] presented a deep learning framework based on Centered Convolutional Restricted Boltzmann Machines (CCRBM) to analyze user behaviour sentiments. A Deep Belief Network (DBN) was utilized to classify the sentiments accurately, since this network performs dimensionality reduction and extracts in-depth sentimental features. The parameters of the DBN architecture were optimized using Hybrid Atom Search Arithmetic Optimization (HASAO), which resolved issues of instability and randomness. Nine datasets were used to test the effectiveness of the CCRBM method. The accuracy attained for the polarity classes was positive (96%), neutral (90%) and negative (97%). The limitations were lower classification accuracy and longer training time.

Chen et al. [7] introduced deep learning-based sentiment analysis to explore social media data. For sentiment analysis, the data were collected from the largest online forum, the PTT board Military Life. Chinese words were segmented by combining sentiment dictionaries and the Jieba system. Deep learning models such as Long Short Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) were trained after Word2vector (W2V) conversion. The activation function used was the Tanh function, which effectively enhances sentiment classification. The accuracy obtained was 92.68%, whereas the existing models MILSentic and a self-developed senti-dictionary obtained 82.41% and 84.08%, respectively. The limitation was that the sentiment analysis focused only on Military Life PTT board data rather than military-related data from social media at large.

Singh et al. [20] developed a deep learning LSTM-Recurrent Neural Network (LSTM-RNN) framework for analyzing the sentiments of human emotions in COVID-19 Twitter reviews. Text pre-processing, including hashtag removal, Internet slang removal, and white-space removal, was performed to eliminate noisy or irrelevant features. Feature extraction was carried out with the TF-IDF (Term Frequency-Inverse Document Frequency) method. The introduction of an attention mechanism assigns efficient weight values to the features. The LSTM-RNN network classifies the data labels into 4 classes: fear, sad, joy and anger. The performance was analyzed using accuracy (84.56%), recall (82.12%), precision (82.34%) and F1-score (81.23%). The approach's main drawback was the limited number of training instances used to train the model.

Du et al. [9] presented a new capsule network-based Hybrid Neural Network (HNN) for classifying sentiments. The deep learning capsule-HNN was formulated to capture implicit semantic information efficiently. The pipeline comprised a Semantic Representation (SR) module, a Word Attention (WA) module, a Capsule module, a Feature Extraction (FE) module and a Classification module. The word embedding procedure represents each word as a multi-dimensional vector. The semantic relationships among words in the same sentence were captured using a self-attention mechanism. Feature extraction was performed with a CNN, and feature dimensions were reduced with K-max pooling. Sentiment classification was executed by combining the capsule module with a Bidirectional Gated Recurrent Unit (BGRU). The approach was evaluated using the Movie Review and NLPCC data. Its main limitations were high time complexity and a lower accuracy rate.

Pathak et al. [19] introduced topic-based sentiment analysis using deep learning on social media data. This topic-level sentiment analysis aims to detect topics at the sentence level via online Latent Semantic (LS) indexing. An LSTM framework with an attention mechanism was introduced to detect the topics. Three different datasets with the hashtags #bitcoin, #facebook and #ethereum were gathered from Twitter. The topic-level sentiment analysis focuses on categorizing the positive and negative classes. The estimated accuracy values were 0.846 for #ethereum, 0.794 for #facebook and 0.824 for #bitcoin. The main limitation of the model was that the sentence streaming procedure detected only a single topic. Table 1 summarizes some existing studies and methods used for sentiment classification in recent years.

Table 1 Existing studies and their limitations

Problem statement

In recent decades, the sharing of comments on social media has increased rapidly. The emotions expressed in such comments support many essential downstream applications. Hence, it is important to establish a powerful emotion classification method to automatically detect emotions from online posts or reviews. Various existing works have tried to develop robust sentiment and emotion classification techniques but have failed to deliver improved results. Some real-world machine learning techniques face data imbalance issues, which directly affect the performance of the classification process. Moreover, extracting and selecting the optimal features from the given text data is essential to obtain higher results in the classification stage. However, some techniques cannot extract the required features from the text data and encounter many problems while selecting suitable features for accurate classification. This creates higher computational complexity and degrades the system's efficiency. Thus, the proposed study presents an effective automated sentiment-based emotion classification model using a deep learning method to overcome the issues noted in the existing studies.

3 Proposed methodology

Sentiment analysis is an efficient model for opinion mining, involving identification and classification from unstructured text data. Text reviews on social media carry a large degree of colloquialism, low sentence integrity and non-standard grammatical structures. Hence, it is difficult to accurately categorize sentiment polarity and emotions from text data. This motivates the proposal of an Artificial Intelligence (AI) guided intelligent approach based on the bio-inspired CAHA with CGARU for the sentiment classification process.

The proposed approach to sentiment analysis comprises four main phases: 1) Text Pre-processing (TP), 2) Textual Features Extraction (TFE), 3) Bio-inspired Feature Selection (BFS), and 4) Sentiment Polarity based Emotion Classification (SPEC). Pre-processing steps such as stop words removal, Parts-Of-Speech (POS) tagging, and duplicate removal are carried out on the raw tweet data. The TFE process is then performed on the pre-processed data using the LSA and LTF-MICF models. This work employs the CAHA technique for the feature selection process to detect the optimal subset of informative features from the n textual features. Finally, the output of CAHA is fed to the dual-stage deep model named CGARU. The first stage (convolutional network) classifies the sentiment as positive, negative, or neutral. Then, based on the sentiment, the second stage (GARU) identifies the multi-class emotions as Positive (Happiness, Love, Surprise), Negative (Stress, Regret, Disgust) and Neutral. The proposed research framework is illustrated in Fig. 2.

Fig. 2 Workflow of the proposed sentiment and emotion classification model

3.1 Text pre-processing for quality enhancement

The raw input data contain various forms of noise that may degrade the text's quality and reduce system performance. To overcome this issue, the proposed classification model initially performs a pre-processing stage, which enhances the quality of the text data and aids in classifying the sentiments accurately. The proposed work uses the following pre-processing steps: stop words removal, POS tagging and duplicate removal.

Stop words removal

The removal of stop words is a crucial step in the pre-processing stage. This step removes words that occur commonly across the entire corpus of documents. Articles, pronouns, prepositions, conjunctions and determiners in the provided data are categorized as stop words; these words are included in the stop word list and eliminated from the text data. An example of stop word removal is shown in Table 2.

Table 2 Removal of stop words: an example

Thus, by eliminating the stop words from the given text data, low-information content is discarded and the essential information receives greater focus, helping the system attain higher performance. After removing the stop words, POS tagging is performed.

POS tagging

POS tagging is a major part of the pre-processing stage and a well-known process in NLP. Each word in a text is assigned a part of speech based on the word's definition and its context, i.e., its position in the sentence. Generally, the POS tagger allocates grammatical information to every word in the sentence; example tags are CC, EX, NNP, CD, PRP$, MD, etc. POS tagging identifies each word as, for example, a "noun", "adjective", "adverb" or "verb" and allocates a POS tag to it.

Duplicate removal

This step reduces the retrieval of duplicate information. Removing duplicated text is necessary because duplicates create several issues during classification and degrade the system; removing them enables the proposed model to obtain enhanced results.
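As a concrete illustration of these three steps, the following minimal Python sketch uses NLTK; the function name `preprocess` and the toy tweets are hypothetical stand-ins, not the authors' exact implementation, and the NLTK resource names may vary slightly across library versions.

```python
# Hedged sketch of the pre-processing stage: stop-word removal, POS tagging,
# and duplicate removal on a small list of example tweets.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

STOP_WORDS = set(stopwords.words("english"))

def preprocess(tweets):
    seen, cleaned = set(), []
    for text in tweets:
        if text in seen:                       # duplicate removal
            continue
        seen.add(text)
        tokens = [t for t in word_tokenize(text.lower()) if t.isalpha()]
        tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
        cleaned.append(nltk.pos_tag(tokens))   # POS tagging, e.g. ('great', 'JJ')
    return cleaned

print(preprocess(["This is a great phone!", "This is a great phone!"]))
# [[('great', 'JJ'), ('phone', 'NN')]] -- the duplicate tweet is dropped
```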

After the completion of pre-processing, the quality of text data is improved, easing the classification process. These pre-processed data are moved into the feature extraction stage for further processing.

3.2 Textual feature extraction

Feature extraction converts raw text data into numerical features that can be processed while preserving the information in the original data. The proposed feature extraction stage extracts the essential features from the pre-processed data; the features that provide the most information for sentiment-based emotion classification should be extracted. This stage plays a major role in improving classification accuracy. Here, the proposed work chooses LSA and LTF-MICF for textual feature extraction.

3.2.1 Latent Semantic Analysis (LSA)

LSA is also termed Latent Semantic Indexing (LSI). LSA utilizes the Bag of Words (BoW) representation, which affords a document-term matrix whose entries are term counts. LSA learns the latent or hidden topics by applying a matrix decomposition, Singular Value Decomposition (SVD), to the document-term matrix. The detailed procedure of LSA is as follows:

  • Consider \(p\) documents and \(m\) words in the given dataset and construct a \(p\times m\) matrix \(X\) in which each row represents a document and each column represents a word.

  • Each entry is initially determined as the raw count of the number of times the \({i}^{th}\) word appears in the \({j}^{th}\) document.

  • The LSA model replaces the raw counts in the document-term matrix with a Term Frequency Inverse Document Frequency (TF-IDF) score, which assigns a weight to each term \(i\) in document \(j\). The TF-IDF score is computed as,

    $$\delta_{j,i}=tf_{j,i}\times\log\frac{M}{df_i}$$
    (1)

where \({\delta }_{j,i}\) represents the TF-IDF score, \({tf}_{j,i}\) is the number of times term \(i\) occurs in document \(j\), \(M\) is the total number of documents and \({df}_{i}\) is the number of documents containing term \(i\). A term receives a high weight when it appears frequently within a document but rarely across the corpus. The document-term matrix \(X\) is very large, noisy and sparse across many dimensions. Dimensionality reduction on \(X\) is achieved through SVD, which determines a set of latent topics capturing the relations between documents and words. SVD is a linear-algebra technique that factorizes the matrix \(X\) into the product of three matrices, \(X=UVW^{T}\), where \(V\) is the diagonal matrix of singular values of \(X\). The truncated SVD reduces dimensionality by choosing only the \(l\) largest singular values and keeping only the first \(l\) columns of \(U\) and \(W\). Here, \(l\) is a hyperparameter chosen and tuned to reflect the number of topics to be identified.

$$X\approx {U}_{l}{V}_{l}{W}_{l}{}^{T}$$
(2)

Here, the document-topic matrix is \(U\in {H}^{p\times l}\) and the term-topic matrix is \(W\in {H}^{m\times l}\). Each column of \(U\) and \(W\) corresponds to one of the \(l\) topics. Each row of \(U\) is a document vector expressed in terms of topics, while each row of \(W\) is a term vector expressed in terms of topics. With these term and document vectors, one can easily compute the cosine similarity to measure the similarity of different documents, the relatedness of different words, and the relations between terms and documents. In the proposed feature extraction process, the TF-IDF vectorizer is used to refine the word counts and penalize words that occur many times across the documents. Based on the TF-IDF scores, the highest-scoring words are extracted to enhance the classification performance.
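As a hedged sketch of this LSA pipeline (not the authors' exact code), the following example builds the TF-IDF document-term matrix of Eq. (1) and applies the truncated SVD of Eq. (2) with scikit-learn; the toy corpus and the choice of l = 2 topics are illustrative assumptions.

```python
# LSA sketch: TF-IDF weighting followed by truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the phone camera is great",
        "battery life is poor",
        "great battery and a great phone"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)      # p x m matrix of TF-IDF scores (Eq. 1)

l = 2                                   # number of latent topics to keep
svd = TruncatedSVD(n_components=l, random_state=0)
U = svd.fit_transform(X)                # p x l document-topic matrix
W = svd.components_.T                   # m x l term-topic matrix (Eq. 2)

# The highest-scoring terms per topic can then be taken as extracted features.
terms = vectorizer.get_feature_names_out()
for k in range(l):
    top = W[:, k].argsort()[::-1][:3]
    print(f"topic {k}:", [terms[i] for i in top])
```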

3.2.2 Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) model

The feature extraction method LTF-MICF combines two weighting schemes. The term frequency \(TF\) measures how often a term appears within a review document's contents. However, frequent occurrence of a term in a document inflates its weight, so computing TF alone is insufficient. Supervised weighting methods have gained attention because they exploit the review class information; hence, the proposed work combines \(TF\) with MICF. The inverse class frequency \(icf\) is the inverse ratio of the number of classes in which the term appears in the training reviews to the total number of classes. First, the log term frequency (LTF) is computed by measuring the \(TF\) of each term in the pre-processed data and applying log normalization. Next, the MICF, an advanced version of \(icf\), is evaluated for all terms. For each term, the different class-specific scores should contribute unequally to the overall term score, which is why the \(icf\) is modified: distinct weights are allocated to the individual class-specific scores, and the weighted sum over the class-specific scores gives the overall term score. The LTF-MICF weight is represented as,

$$LTF\text{-}MICF({t}_{q})=LTF({t}_{q})\times\sum_{v=1}^{m}{w}_{qv}\cdot icf({t}_{q})$$
(3)

where \({w}_{qv}\) represents the weighting factor of the term \({t}_{q}\) for class \({c}_{v}\) and is given as,

(4)

The weighting factor is used to determine the weight of a term within the provided input text data. Here, \({k}_{i}\) denotes the number of documents in class \({c}_{v}\) that contain the term \({t}_{q}\); \({k}_{i}{t}^{\prime}\) is the number of documents in the other classes that contain the term; \({k}_{i}\hat{t}\) is the number of documents in class \({c}_{v}\) that do not contain the term \({t}_{q}\); and \({k}_{i}\widetilde{t}\) is the number of documents in the other classes that do not contain the term \({t}_{q}\). To avoid negative weights, a fixed value of '1' is employed in the proposed LTF-MICF approach: when \({k}_{i}{t}^{\prime}=0\) or \({k}_{i}\hat{t}=0\), the denominator is set to '1' to avoid the zero-denominator issue in this critical case. The advanced term weighting LTF-MICF\(\left({t}_{q}\right)\) is generated based on MICF\(\left({t}_{q}\right)\). The terms \(LTF\left({t}_{q}\right)\) and \(icf\left({t}_{q}\right)\) are computed as,

$$LTF\left(t_q\right)=\log\left(1+TF\left(t_q,k_i\right)\right)$$
(5)

where \(TF\left({t}_{q},{k}_{i}\right)\) denotes the raw count of term \({t}_{q}\) in document \({k}_{i}\). The inverse class frequency of \({t}_{q}\) is computed as,

$$icf(t_q)=\log\left(1+\frac{v}{c(t_q)}\right)$$
(6)

where \(icf\left({t}_{q}\right)\) denotes the inverse class frequency of \({t}_{q}\), \(v\) is the total number of classes in the provided dataset and \(c\left({t}_{q}\right)\) is the number of classes containing the term \({t}_{q}\). The extracted features from the pre-processed data form the feature set \(H=\left\{{H}_{1},{H}_{2},\dots,{H}_{m}\right\}\). The algorithm of the proposed LSA with LTF-MICF is illustrated in Table 3.

Table 3 Pseudocode of proposed LTF-MICF-based feature extraction
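Complementing the pseudocode in Table 3, the following hedged sketch computes the LTF and icf terms of Eqs. (5) and (6); since the class-specific weighting factor w_qv of Eq. (4) is not reproduced here, uniform weights are assumed purely for illustration.

```python
# Sketch of Eqs. (5)-(6): log term frequency and inverse class frequency.
import math
from collections import Counter

def ltf(term, doc_tokens):
    # Eq. (5): log-normalized frequency of the term in one document
    return math.log(1 + Counter(doc_tokens)[term])

def icf(term, class_docs):
    # Eq. (6): v = total classes, c(t_q) = classes containing the term
    v = len(class_docs)
    c = sum(any(term in doc for doc in docs) for docs in class_docs.values())
    return math.log(1 + v / max(c, 1))   # denominator guarded with 1

class_docs = {
    "positive": [["great", "phone"], ["love", "it"]],
    "negative": [["poor", "battery"]],
    "neutral":  [["phone", "arrived"]],
}
doc = ["great", "great", "phone"]
# LTF-MICF score of 'great' under the uniform-weight assumption:
print(ltf("great", doc) * icf("great", class_docs))
```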

The most discriminative features are extracted in the proposed feature extraction step. Because all of the discriminative features are extracted, dimensionality issues may arise in the classification stage, which can affect system performance by increasing its complexity. Thus, selecting useful features from the extracted set is necessary to improve classification accuracy.

3.3 Chaotic Artificial hummingbird algorithm (CAHA) based feature selection

The feature selection process is essential for choosing the optimal feature set from the extracted features. This stage selects the features that afford the highest classification performance. The preceding stage produces many features, which can lead to high computational complexity and affect the efficiency of the proposed classification. Thus, the proposed study selects the optimal feature set using the CAHA approach. The Artificial Hummingbird Algorithm (AHA) is a bio-inspired optimization method mainly utilized to solve optimization problems. It simulates the flight skills and strategies involved in the intelligent foraging of hummingbirds. The foraging behaviour of hummingbirds involves three flight skills, namely axial, diagonal and omnidirectional flight, together with the strategies of guided foraging, territorial foraging and migration foraging. Using the strategies of the AHA approach, the proposed study selects the optimal features according to the fitness function \(\left({F}_{f}\right)\), given as,

$$F_f=\beta\ast\psi+(1-\beta)\ast\frac{\vert N_e\vert}{N_d}$$
(7)

where \(\beta\) is a weighting parameter that balances the two terms of the fitness, \(\psi\) is the classification error, \({N}_{e}\) denotes the number of features obtained in the feature extraction stage and \({N}_{d}\) the total number of features in the given dataset. Initially, the search agents are randomly initialized as,

$$z_i=L_B+g\cdot(U_B-L_B),\qquad i=1,\dots,m$$
(8)

where \({L}_{B}\) and \({U}_{B}\) represent the lower and upper boundaries of an \(l\)-dimensional problem, \(g\) is a random vector with components in the range 0 to 1, and \({z}_{i}\) is the position of the \({i}^{th}\) feature source as a solution of the problem.

Initialization

The initialization of features is represented as,

$$I_F=\begin{cases}0 & \text{if}\ i\neq j\\ \text{null} & \text{if}\ i=j\end{cases}\qquad i=1,\dots,m;\ j=1,\dots,m$$
(9)

where \({I}_{F}={\text{null}}\) indicates that a search agent is taking a feature as its own feature source, and \({I}_{F}=0\) indicates that the \({j}^{th}\) feature source has been visited by the \({i}^{th}\) search agent in the present iteration.

Guided foraging

Each search agent tends to seek a feature source with a high nectar volume, meaning an optimal feature source requires a high nectar-refilling rate and a long unvisited time by that search agent. In the AHA approach, during guided foraging a search agent examines the feature sources with the highest visit level and selects, from among them, the one with the highest nectar-refilling rate as its optimal feature source. After identifying the optimal features, the search agent explores them to select features. Three flight skills, namely axial, diagonal and omnidirectional flight, are modelled in the search process using a direction switch vector, which limits the directions available in the \(l\)-dimensional space. The flight patterns can be extended to an \(l\)-dimensional space, and the axial flight is formulated as,

$$L^{(i)}=\left\{\begin{array}{l}1\;\;\;if\;\;\;i=randi(\lbrack1,l\rbrack)\\0\;\;\;else\end{array}\right.\;\;\;\;\;\;i=1,...,l$$
(10)

The computation of diagonal flight is represented as,

$$L^{(i)}=\begin{cases}1 & \text{if}\ i=K(j),\ j\in[1,k],\ K=randperm(f),\ f\in[2,\lceil g_1\cdot(l-2)\rceil+1]\\0 & \text{else}\end{cases}\qquad i=1,\dots,l$$
(11)

The omnidirectional flight is computed as,

$$L^{(i)}=1,\qquad i=1,\dots,l$$
(12)

where \({randi}\left(\left[1,l\right]\right)\) denotes a random integer between 1 and \(l\), \(randperm\left(f\right)\) denotes a random permutation of the integers from 1 to \(f\), and \({g}_{1}\) is a random number between 0 and 1. The guided foraging behaviour of a search agent toward a feature source is formulated as,

$${u}_{i}(t+1)={z}_{i,opt}(t)+\delta .L.({z}_{i}(t)-{z}_{i,opt}(t))$$
(13)
$$\delta \sim M\left(0,1\right)$$
(14)

where \({z}_{i}\left(t\right)\) is the position of the \({i}^{th}\) feature source at time \(t\), \({z}_{i,opt}\left(t\right)\) is the position of the optimal feature source found by the search agent, and \(\delta\) is the guided factor, drawn from a normal distribution \(M\left(0,1\right)\) with mean 0 and standard deviation 1. Eq. (13) lets each feature source update its location toward the neighbourhood of the optimal feature source, with the search agent's guided foraging modelled through the different flight patterns. The position update of the \({i}^{th}\) feature source is,

$$z_i(t+1)=\left\{\begin{aligned}&z_i(t)&h(z_i(t))\leq h(u_i(t+1))\\ &u_i(t+1)&h(z_i(t))>h(u_i(t+1))\end{aligned}\right.$$
(15)

where \(h\left(.\right)\) denotes the fitness function. Eq. (15) states that if the nectar-refilling rate of the candidate feature source is superior to the present one, the search agent abandons the present feature set and remains at the candidate resulting from Eq. (13) for exploration. The visit table is an essential component of the AHA approach, preserving the visit information of the feature sources: it records how long each feature set has gone unvisited by a given search agent, with a long unvisited time corresponding to a high visit level. Based on the visit table, each search agent identifies its optimal feature set to visit in each iteration and performs the guided foraging of Eq. (13) on it. The visit levels of the other feature sets for this search agent are incremented by one, and the visit level of the optimal feature set just visited is reset to zero. After guided foraging, the search agent keeps its current feature set if no better nectar-refilling rate is found; otherwise, the present source is replaced by the new one, the search agent remains at the new feature set, and its visit level is set to the maximum level among the other feature sets plus one. The visit table is updated in every iteration of the AHA approach.
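To make the update rules concrete, the following hedged sketch implements the axial-flight variant of guided foraging from Eqs. (10), (13), (14) and (15); the toy fitness function is an assumption for demonstration only.

```python
# Guided foraging sketch: move toward the optimal source along an axial
# flight vector and keep whichever position has the better (lower) fitness.
import numpy as np

rng = np.random.default_rng(0)

def guided_forage(z_i, z_opt, fit, l):
    L = np.zeros(l)
    L[rng.integers(l)] = 1                 # axial flight vector, Eq. (10)
    delta = rng.standard_normal()          # guided factor ~ M(0, 1), Eq. (14)
    u = z_opt + delta * L * (z_i - z_opt)  # candidate position, Eq. (13)
    return u if fit(u) < fit(z_i) else z_i # greedy selection, Eq. (15)

fit = lambda z: np.sum(z ** 2)             # toy fitness standing in for Eq. (7)
z_new = guided_forage(rng.random(5), rng.random(5), fit, l=5)
print(z_new)
```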

Territorial foraging

After the optimal features have been exploited, the search agent explores new feature sets instead of revisiting previous ones. A search agent can therefore move to a neighbouring region within its own territory, where a new feature source may be identified as a candidate solution that may be superior to the present one. The local search of a search agent in territorial foraging, yielding a candidate feature set, is formulated as,

$${u}_{i}\left(t+1\right)={z}_{i}\left(t\right)+\eta .L.{z}_{i}\left(t\right)$$
(16)
$$\eta\;\sim M\left(0,1\right)$$
(17)

where \(\eta\) denotes the territorial factor, drawn from a normal distribution \(M\left(0,1\right)\) with mean zero and standard deviation one. Eq. (16) permits the search agent to easily identify new feature sets in its local neighbourhood, based on its own location, through the flight skills.

Migration foraging

When the region chosen by a search agent offers only a scarce supply of features, the search agent moves to a more distant feature source for exploration. A migration coefficient is defined in the AHA approach: when the pre-determined value of the migration coefficient is reached, the search agent with the lowest nectar-refilling rate migrates to a new feature source generated randomly in the search space. The search agent then leaves the previous source, remains at the new feature source for exploration, and the visit table is updated. The migration behaviour is represented as,

$${z}_{wor}\left(t+1\right)={L}_{B}+\mathrm{g}.\left({U}_{B}-{L}_{B}\right)$$
(18)

where \({z}_{wor}\) is the feature source with the worst nectar-refilling rate. At the end of each iteration, after the guided and territorial foraging steps, each search agent's visits are recorded in the visit table. The migration coefficient, defined in terms of the population size \(m\), is given as,

$$N=2m$$
(19)

3.3.1 Chaotic maps

Several chaotic maps are available for generating chaotic sequences. To enhance the convergence rate of the AHA approach, the proposed study employs a chaotic map, which improves the overall efficiency of the algorithm. The proposed CAHA approach is established by applying chaos within the AHA mechanism. Owing to their dynamic behaviour, chaotic maps are widely used in optimization, where they help search the exploration space more globally and vigorously. The proposed work chooses the iterative chaotic map to enhance AHA's convergence rate. It is given as,

$$z_{i}=L_{B}+g\cdot(U_{B}-L_{B})\times y_{q+1},\qquad i=1,\dots,m$$
(20)
$$y_{q+1}=\left|\sin\left(\frac{S}{z_{q}}\right)\right|$$
(21)

where \(S\) is an adjustable parameter. Thus, the proposed CAHA-based feature selection process selects the most informative features for sentiment and emotion classification. This stage diminishes the overall computational complexity of the system and helps the classifier obtain improved results. The pseudocode of the proposed CAHA approach is illustrated in Table 4.

Table 4 Proposed CAHA algorithm
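Complementing the pseudocode in Table 4, the following hedged sketch shows the chaotic initialization of Eqs. (20)-(21) and the feature-subset fitness of Eq. (7); the values of S, beta, and the error estimate are illustrative assumptions, not the authors' tuned settings.

```python
# CAHA sketch: chaotic population initialization and fitness evaluation.
import numpy as np

rng = np.random.default_rng(0)

def chaotic_init(m, l, lb, ub, S=0.7):
    z = lb + rng.random((m, l)) * (ub - lb)            # random seed positions, Eq. (8)
    y = np.abs(np.sin(S / np.maximum(z, 1e-9)))        # iterative chaotic map, Eq. (21)
    return lb + rng.random((m, l)) * (ub - lb) * y     # chaotic positions, Eq. (20)

def fitness(mask, error, beta=0.99):
    # Eq. (7): trade-off between classification error and kept-feature ratio
    return beta * error + (1 - beta) * mask.sum() / mask.size

pop = chaotic_init(m=10, l=50, lb=0.0, ub=1.0)
mask = pop[0] > 0.5            # threshold one agent's position into a feature mask
print(fitness(mask, error=0.12))
```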

3.4 Dual-stage Deep CGARU model for Sentiment-based Emotion Classification

The features chosen by the proposed CAHA approach are used as input to the final classification stage. For classification, the proposed study uses a dual-stage deep CGARU model, whose modules are a channel-wise attention module, a convolutional module, a Gated Recurrent Unit (GRU) module and a self-attention module. An attention mechanism is first applied to find the channels carrying information relevant to classification. Then, three one-dimensional CNN layers and four GRU layers are used to obtain temporal information. A self-attention mechanism then encodes the temporal dependencies extracted by the preceding one-dimensional CNN and GRU layers. Finally, fully connected and softmax layers classify the sentiments and emotions from the provided text data. Table 5 provides the hyperparameter settings of the proposed CGARU method, and Fig. 3 depicts its architecture.

Table 5 Hyperparameters of the CGARU model
Fig. 3 Structure of the proposed dual-stage CGARU model

3.4.1 Channel-wise attention module

This module obtains suitable features by allocating different weights to the channels, so that each channel's significance is assessed. First, average pooling over time is applied to the input data, producing a channel-wise vector of mean values. The attention matrix is then obtained by passing this vector through a fully connected layer with a non-linear activation function. It is given as,

$$X_{att}=soft\;\max\left(\omega_0.\left(\frac1m\sum\nolimits_1^mZ\right)^T+a_0\right)$$
(22)

In Eq. (22), the average pooling of the epochs \(Z\in {F}^{m\times b}\) is taken as input, and the output of the attention mechanism is the attention matrix \({X}_{att}\in {F}^{1\times b}\), where \(m\) is the number of time steps and \(b\) the number of channels. During training, the bias and weight matrix \(\left({a}_{0}\in {F}^{1\times b}\;{\text{and}}\;{\omega }_{0}\in {F}^{b\times b}\right)\) of the fully connected layer are updated. The softmax function is employed as the activation function,

$$soft\;\mathrm{max}({s}_{i})=\frac{\mathrm{exp}({s}_{i})}{{\sum }_{j=1}^{B}\mathrm{exp}({s}_{j})}$$
(23)

where \(s=\left[{s}_{1},{s}_{2},\dots,{s}_{b}\right]\) denotes the output of the fully connected layer. The output feature \({Z}_{att}\in {F}^{m\times b}\) of the channel-wise attention mechanism is then obtained by multiplying the original data \(Z\) with the attention matrix \({X}_{att}\),

$${Z}_{att}=Z\otimes {X}_{att}$$
(24)

The symbol \(\otimes\) signifies element-wise multiplication, which takes the product of corresponding elements of two matrices of matching dimensions.
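The following NumPy sketch traces Eqs. (22)-(24) end to end: average pooling over time, a fully connected layer with softmax, and the element-wise rescaling of the input; the random weights are stand-ins for trained parameters.

```python
# Channel-wise attention sketch for an m x b input (time steps x channels).
import numpy as np

rng = np.random.default_rng(0)
m, b = 8, 4
Z = rng.standard_normal((m, b))          # input feature map

w0 = rng.standard_normal((b, b))         # trainable FC weights
a0 = np.zeros((1, b))                    # trainable FC bias

pooled = Z.mean(axis=0, keepdims=True)   # 1 x b channel-wise average (Eq. 22)
s = pooled @ w0 + a0
X_att = np.exp(s) / np.exp(s).sum()      # softmax attention matrix (Eq. 23)
Z_att = Z * X_att                        # broadcast element-wise product (Eq. 24)
print(Z_att.shape)                       # (8, 4): same shape as Z
```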

3.4.2 Convolutional neural network module

Two-dimensional CNNs are widely used in computer vision owing to their ability to adaptively learn spatial features. The proposed study instead uses three 1D CNNs to extract temporal features from the recalibrated input data, whose informative channels were identified by the channel-wise attention mechanism. The first convolutional layer performs convolution along the \(m\) time steps. With a kernel size of \(p\), \(g\) output filters, a stride length of \(l\), and 'same' padding to preserve the input size, the output \({X}_{conv}\in {F}^{(m+2k-p)/l\times g}\) is computed as,

$${X}_{conv}={\cup }_{i=1}^{g}ELU(BN({\omega }_{i}*{Z}_{att}+{a}_{i}))$$
(25)

where \({\omega }_{i}\in {F}^{p\times b}\) and \({a}_{i}\in {F}^{\left(m+2k-p\right)/l\times 1}\) denote the convolutional kernel and bias of filter \(i\), respectively. The symbol * denotes the convolution operation along the temporal dimension, and the batch normalization function is employed to stabilize training. Batch normalization is given as,

$$B_N(y)=\psi\,\frac{y-\mu_A}{\sqrt{\sigma_A^{2}+\lambda}}+\eta$$
(26)

where \({\mu }_{A}\) and \({\sigma }_{A}\) denote the mean and standard deviation of each batch, respectively, and \(\psi\) and \(\eta\) are learnable hyperparameters. The \(ELU\) term specifies the exponential linear unit used as the activation function. The output \({X}_{conv}\) is obtained by concatenating, denoted by the symbol \(\cup\), the outcomes of all filters. The remaining two convolutional layers perform the same operation as in Eqs. (25) and (26), with the same parameters as the first convolutional layer.
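A single output filter of Eq. (25), together with the batch normalization of Eq. (26) and the ELU activation, can be sketched in NumPy as follows; the kernel, bias, and epsilon value are illustrative stand-ins rather than trained parameters.

```python
# One 1-D convolution filter with 'same' padding, BN, and ELU.
import numpy as np

rng = np.random.default_rng(0)
m, b, p = 10, 4, 3                        # time steps, channels, kernel size
Z_att = rng.standard_normal((m, b))       # recalibrated input from the attention module
w = rng.standard_normal((p, b))           # convolutional kernel of one filter
a = 0.0                                   # bias

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

pad = p // 2                              # 'same' padding along the time axis
Zp = np.pad(Z_att, ((pad, pad), (0, 0)))
conv = np.array([np.sum(Zp[t:t + p] * w) for t in range(m)]) + a   # Eq. (25)

bn = (conv - conv.mean()) / np.sqrt(conv.var() + 1e-5)             # Eq. (26)
print(elu(bn).shape)                      # (10,): one filter's feature map
```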

3.4.3 Gated recurrent unit (GRU) module

Using a gating concept, the GRU is an advanced version of the conventional RNN designed to resolve long-term dependency issues. This module further extracts temporal features from the output samples of the convolutional module. The GRU employs a reset gate and an update gate to maintain a single hidden state, making it faster and simpler than the traditional LSTM, which uses input, output and forget gates to maintain both a cell state and a hidden state. The update gate \({Ug}_{t}\in {F}^{1\times mh}\) combines the roles of the input and forget gates of the LSTM and determines the ratio between previous and current information, as given in the following equation,

$$U{g}_{t}=sigmoid({h}_{t-1}{\omega }_{uh}+{x}_{t}{\omega }_{ux}+{a}_{u})$$
(27)

where \({h}_{t-1}\in {F}^{1\times mh}\) represents the previous hidden state and \({x}_{t}\in {F}^{1\times g}\) the present input vector. The weight matrices \({\omega }_{ux}\in {F}^{g\times mh}\) and \({\omega }_{uh}\in {F}^{mh\times mh}\) and the bias \({a}_{u}\in {F}^{1\times mh}\) belong to the update gate, whose size is set by the number of hidden units \(mh\). In the first GRU layer, the input vector \({x}_{t}\) is the current output of the convolutional layer \({X}_{\text{conv}}\); hence \(g\) equals the number of output filters of the CNN. These values are kept consistent across all GRU layers. The reset gate decides whether to discard or retain the preceding information and is obtained as,

$${f}_{t}=sigmoid({h}_{t-1}{\omega }_{fh}+{x}_{t}{\omega }_{fx}+{a}_{f})$$
(28)

where \({\omega }_{fx}\in {F}^{g\times mh}\) and \({\omega }_{fh}\in {F}^{mh\times mh}\) are the weight matrices and \({a}_{f}\in {F}^{1\times mh}\) the bias. The sigmoid function is used as the activation function of both the reset and update gates. Next, the candidate hidden state \(\widetilde{H}_{t}\) is obtained via element-wise multiplication between the preceding hidden state and the reset gate, as given in the following equation,

$$\widetilde{H}_{t}=\mathrm{tanh}({H}_{t-1}{\omega }_{HH}\otimes {f}_{t}+{x}_{t}{\omega }_{xh})$$
(29)

where \({\omega }_{HH}\in {F}^{mh\times mh}\) and \({\omega }_{xh}\in {F}^{g\times mh}\) denote the weight matrices and tanh the hyperbolic tangent function. Hence, the preceding hidden state is completely suppressed if \({f}_{t}=0\). The new hidden state \({H}_{t}\) is then computed with the update gate \({Ug}_{t}\), combining the preceding hidden state \({H}_{t-1}\) and the present candidate state \(\widetilde{H}_{t}\) through element-wise multiplication. The new hidden state is computed as,

$${H}_{t}=(1-U{g}_{t})\otimes {H}_{t-1}+U{g}_{t}\otimes \widetilde{H}_{t}$$
(30)

Here, the four GRU layers extract the temporal features of the CNN layers. All GRU layers have the same input and output dimensions, and all four layers share the same parameter settings.
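The gate equations above can be traced with a minimal NumPy sketch of one GRU step; the random weight matrices are stand-ins for trained parameters, and the candidate-state multiplication follows Eq. (29) as written.

```python
# One GRU step following Eqs. (27)-(30).
import numpy as np

rng = np.random.default_rng(0)
g, mh = 6, 5                              # input size, hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W_ux, W_uh = rng.standard_normal((g, mh)), rng.standard_normal((mh, mh))
W_fx, W_fh = rng.standard_normal((g, mh)), rng.standard_normal((mh, mh))
W_xh, W_HH = rng.standard_normal((g, mh)), rng.standard_normal((mh, mh))

def gru_step(x_t, h_prev):
    u = sigmoid(h_prev @ W_uh + x_t @ W_ux)              # update gate, Eq. (27)
    f = sigmoid(h_prev @ W_fh + x_t @ W_fx)              # reset gate, Eq. (28)
    h_cand = np.tanh((h_prev @ W_HH) * f + x_t @ W_xh)   # candidate state, Eq. (29)
    return (1 - u) * h_prev + u * h_cand                 # new hidden state, Eq. (30)

h = gru_step(rng.standard_normal((1, g)), np.zeros((1, mh)))
print(h.shape)                            # (1, 5)
```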

3.4.4 Self-attention module

The temporal features extracted by the CNN and GRU are time-varying. The self-attention module therefore differentiates the importance of the individual features and yields a more appropriate prediction. The self-attention concept was originally introduced in machine translation to capture long-range dependencies. The proposed study uses multiplicative attention, termed Luong attention, to compute the alignment function \({A}_{t}\), represented as,

$${A}_{t}=\frac{\mathrm{exp}({H}_{t}{}^{T}{H}_{l})}{{\sum }_{t}\mathrm{exp}({H}_{t}{}^{T}{H}_{l})}$$
(31)

where \({H}_{t}\) and \({H}_{l}\) denote the query and key, respectively. The proposed classification stage sets both the query and key to the output states of the previous GRU layer. The new feature set \({X}_{A}\), the context of the feature set \({H}_{t}\), is then computed using the attentive alignment scores \({A}_{t}\),

$${X}_{A}=\sum_{t}{A}_{t}{H}_{t}$$
(32)
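A minimal NumPy sketch of Eqs. (31)-(32) follows; taking the final GRU hidden state as the key H_l is an assumption made here for illustration.

```python
# Luong-style attention: alignment over hidden states, then a context vector.
import numpy as np

rng = np.random.default_rng(0)
T, mh = 7, 5
H = rng.standard_normal((T, mh))              # hidden states H_t from the GRU

H_l = H[-1]                                   # key: last hidden state (assumption)
scores = H @ H_l                              # H_t^T H_l for every t
A = np.exp(scores) / np.exp(scores).sum()     # alignment weights, Eq. (31)
X_A = A @ H                                   # attention-weighted context, Eq. (32)
print(X_A.shape)                              # (5,)
```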

Finally, a fully connected layer with a softmax function performs the classification of sentiments and emotions. The classifier's output probability is computed as,

$$Q=soft\;\mathrm{max}(\omega {X}_{A}+\varepsilon )$$
(33)

where \(\omega\) represents the weight and \(\varepsilon\) the bias. The model is trained by minimizing the cross-entropy loss between the estimated probability and the ground-truth label. The binary cross-entropy loss function is defined as,

$$Loss=-{\sum }_{t}({X}_{t}\mathrm{log}({Q}_{t})+(1-{X}_{t})\mathrm{log}(1-{Q}_{t}))$$
(34)

where \({X}_{t}\) and \({Q}_{t}\) define the target label and estimated probability at the \({t}^{th}\) time, respectively. The loss function is minimized using the Adam optimizer, making the classification stage more robust. Because the samples differ, the attention layer assigns different attention weights to each input; these weights can also be used to reduce the number of channels. Thus, the proposed CGARU classifier effectively classifies the sentiments and emotions from the given input: the convolutional module first classifies positive, negative and neutral sentiments, and the GARU module then categorizes the emotions behind those sentiments. The working process of the proposed framework is presented in Algorithm 3.

Algorithm 3 Working principle of the proposed framework

4 Simulation results and analysis

This section describes the simulation results and discussion of the proposed dual-stage deep learning model. The simulation is performed on the Python platform to illustrate the robustness of the proposed techniques. The datasets are described first, followed by the training approach and the hyperparameters employed. The performance of the proposed study is measured by evaluating several metrics, and the results are compared with other deep learning techniques to determine the efficacy of the proposed model. Table 6 presents the system configuration used for the simulation.

Table 6 System configuration

4.1 Dataset description

The proposed study utilizes three different datasets to classify sentiment-based emotion from text. The proposed classification model is analyzed on the following datasets.

The Twitter sentiment dataset is obtained from the Kaggle repository and is mainly used for sentiment classification; it contains 162,981 tweets. The IMDB Movie Reviews dataset is also collected from the Kaggle repository and holds 50,001 reviews. The third dataset, the Yelp Reviews dataset, contains 10,001 customer reviews with positive and negative comments.

4.2 Performance metrics

The performance of the proposed dual-stage CGARU model is evaluated with the metrics accuracy, precision, recall, specificity, F-measure and kappa coefficient. The results obtained with the proposed technique are compared with conventional deep learning techniques such as MLP (Multi-Layer Perceptron), LSTM (Long-Short Term Memory), Bi-LSTM (Bi-directional Long-Short Term Memory), CNN (Convolutional Neural Network) and DNN (Deep Neural Network). The comparison analysis demonstrates the efficiency of the proposed model.

Accuracy

Accuracy is an essential measure that determines the proportion of correctly classified data out of all the data in the given dataset. When the classifier outputs a sentiment, it is compared with the actual sentiment of the sample, from which the true and false positives and negatives are counted. This metric plays a major role in analyzing the performance of the proposed dual-stage CGARU-based classification. Accuracy is evaluated as,

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(35)

Precision

The precision metric also exhibits the proposed classifier's efficacy: it is the proportion of correctly classified sentiments among all samples predicted as positive, i.e., the number of true positives divided by the total number of positive predictions. Precision is given as,

$$\mathrm{Pr}ecision=\frac{TP}{TP+FP}$$
(36)

Recall

The recall metric is the ratio of the total number of true positives to the total number of true positives and false negatives. Recall is also known as sensitivity and can reveal the classifier's error rate. It is mathematically represented as,

$$\mathrm{Re}call=\frac{TP}{TP+FN}$$
(37)

F1-score

The F1-score is the harmonic mean of precision and recall. A perfect score of one is achieved only when both precision and recall are perfect; if either precision or recall is zero, the F1-score is also zero. The F1-score is computed as,

$$F-measure=2\times \frac{\mathrm{Pr}ecision\times \mathrm{Re}call}{\mathrm{Pr}ecision+\mathrm{Re}call}$$
(38)

Specificity

Specificity determines the proportion of actual negatives that are correctly classified as negative by the proposed model, thereby assessing the correctly rejected sentiments and emotions. It is evaluated as,

$$Specificity=\frac{TN}{TN+FP}$$
(39)

Kappa coefficient

The kappa coefficient summarizes the classifier's performance rate, i.e., it measures how much better the classifier performs than agreement by chance. The kappa measure also provides the inter- and intra-rater reliability of the classified samples.

$$K=\frac{{P}_{observed}-{P}_{chance}}{1-{P}_{chance}}$$
(40)

where \(TP\) denotes true positives, \(TN\) true negatives, \(FN\) false negatives and \(FP\) false positives; \({P}_{observed}\) is the observed agreement and \({P}_{chance}\) the probability of chance agreement.
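For reference, the following sketch computes all six reported metrics from a binary confusion matrix; in the multi-class setting of this study the same quantities would be obtained per class and averaged, and the example counts are arbitrary.

```python
# Metrics of Eqs. (35)-(40) from binary confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total                          # Eq. (35)
    precision = tp / (tp + fp)                            # Eq. (36)
    recall = tp / (tp + fn)                               # Eq. (37)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (38)
    specificity = tn / (tn + fp)                          # Eq. (39)
    p_obs = accuracy                                      # observed agreement
    p_chance = ((tp + fp) * (tp + fn) +
                (tn + fn) * (tn + fp)) / total ** 2       # chance agreement
    kappa = (p_obs - p_chance) / (1 - p_chance)           # Eq. (40)
    return accuracy, precision, recall, f1, specificity, kappa

print(metrics(tp=95, tn=90, fp=5, fn=10))
```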

4.3 Performance evaluation of the proposed dual-stage deep CGARU model

This section presents the performance evaluation results of the proposed sentiment and emotion classification model. The efficiency of the proposed study is demonstrated through a comparative analysis with different deep learning techniques, namely MLP, LSTM, Bi-LSTM, CNN and DNN. The performance of the proposed model is determined on three datasets, and the accuracy and loss obtained on them are shown in Fig. 4.

Fig. 4 Accuracy and loss comparison with different datasets

The accuracy and loss of the proposed classification stage are analyzed by varying the number of epochs, and are determined for the three datasets during training and testing. On the Twitter sentiment dataset, the testing accuracy is lower than the training accuracy over epochs 0 to 40; by epoch 150 the testing accuracy increases and the loss decreases. On the IMDB Movie Reviews dataset, the training and testing accuracies coincide between epochs 150 and 200, after which the testing accuracy improves, while the loss stays level between epochs 200 and 300. On the Yelp Reviews dataset, the accuracies attained in the training and testing phases are nearly identical, and the testing loss is minimized around the 150th epoch. This indicates that the proposed method attains good performance in classifying the sentiments and emotions from the provided data. Figure 5 represents the confusion matrices for the three datasets.

Fig. 5 Confusion matrix for the different datasets

The confusion matrices show the robustness of the proposed classifier. On the Twitter sentiment dataset, only a few samples are misclassified: in the positive class, 503 samples are correctly classified as positive and only one is misclassified, and likewise only one sample each is misclassified for the negative and neutral classes. On the IMDB Movie Reviews dataset, 666 samples are accurately classified as positive and 6 are misclassified; for the negative class, only one sample out of 8 is misclassified; and for the neutral class, 1316 samples are correctly classified with the remaining 4 wrongly predicted. On the Yelp Reviews dataset, 173 positive-sentiment samples are accurately classified as positive, and only a few samples from the negative and neutral classes are misclassified. This shows that the classifier categorizes sentiments with high efficiency on every dataset. The accuracy comparison between the proposed and other deep learning techniques is shown in Fig. 6.

Fig. 6 Graphical comparison of the accuracy of different datasets for the proposed and the existing methods

The graphical representation above compares the accuracy of the proposed and existing techniques, determined on the three datasets. The figure shows that the accuracy of the proposed model is better than that of the existing techniques, which cannot classify sentiment-based emotion accurately for several reasons. The existing MLP provides the lowest accuracy of all compared techniques, since its growing number of parameters limits its classification accuracy. Similarly, LSTM and Bi-LSTM consume more time to train on the input data and easily fall into overfitting. The conventional CNN method automatically extracts the needed features but suffers from an increased class imbalance problem during emotion categorization, diminishing its accuracy. In the proposed study, the effective features selected through the optimization technique help the classifier attain enhanced classification accuracy; the proposed model also maintains the neural network weights so as to resolve vanishing gradient issues, further raising accuracy. Thus, the proposed technique accurately classifies the sentiments and emotions in the provided datasets. On the Twitter sentiment dataset, the accuracy of CGARU is 99.80%, MLP is 95.70%, LSTM is 96.70%, Bi-LSTM is 97.35%, CNN is 98.15%, and DNN is 98.75%. On the IMDB Movie Reviews dataset, the obtained accuracy of CGARU is 99.75%, MLP is 95.10%, LSTM is 95.95%, Bi-LSTM is 97%, CNN is 97.55%, and DNN is 98.95%. On the Yelp Reviews dataset, the attained accuracy of CGARU is 98.83%, MLP is 94.45%, LSTM is 96.15%, Bi-LSTM is 96.70%, CNN is 97.30%, and DNN is 98.35%. The result analysis proves that the proposed model attains higher classification accuracy than the other techniques on each dataset. The comparative analysis of precision performance is illustrated in Fig. 7.

Fig. 7 Graphical comparison of the precision of different datasets for the proposed and the existing methods

The comparison of the precision measure shows that the proposed CGARU-based classification model provides higher precision than the other techniques, which fail to deliver enhanced precision performance for various reasons. Computational complexity is the major issue of the existing techniques, degrading the overall system performance, whereas the proposed study uses highly effective sentiment-based emotion classification techniques that minimize computational complexity and enhance classification performance. On the Twitter sentiment dataset, the attained precision of the proposed CGARU is 99.64%, MLP is 90.67%, LSTM is 95.17%, Bi-LSTM is 97.04%, CNN is 97.95%, and DNN is 98.69%. On the IMDB Movie Reviews dataset, the obtained precision of CGARU is 99.75%, MLP is 91.73%, LSTM is 93.53%, Bi-LSTM is 96.46%, CNN is 96.95%, and DNN is 98.37%. Finally, on the Yelp Reviews dataset, the precision of CGARU is 98.83%, MLP is 94.45%, LSTM is 96.15%, Bi-LSTM is 96.70%, CNN is 97.30%, and DNN is 98.35%. The attained values show that the proposed model's precision is superior to that of the other methods. The performance comparison of the proposed and existing techniques in terms of recall is shown in Fig. 8.

Fig. 8 Graphical comparison of the recall of different datasets for the proposed and the existing methods

The graphical representation above exhibits the effectiveness of the proposed model: for each dataset, its recall is higher than that of the compared techniques, proving its ability to identify the correct positive predictions among all actual positives. On the Twitter sentiment dataset, the recall of the proposed CGARU is 99.70%, MLP is 95.54%, LSTM is 96.06%, Bi-LSTM is 96.71%, CNN is 98.10%, and DNN is 98.73%. On the IMDB Movie Reviews dataset, the obtained recall of CGARU is 99.38%, MLP is 93.01%, LSTM is 93.92%, Bi-LSTM is 95.90%, CNN is 98.50%, and DNN is 98.90%. On the Yelp Reviews dataset, the recall of CGARU is 99.60%, MLP is 89.44%, LSTM is 90.95%, Bi-LSTM is 95.50%, CNN is 96.19%, and DNN is 96.82%. The simulation results state that the proposed classification model is highly suitable for sentiment and emotion classification. The performance comparison in terms of specificity is depicted in Fig. 9.

Fig. 9 Graphical comparison of the specificity of different datasets for the proposed and the existing methods

The specificity of the proposed dual-stage deep CGARU model is compared with that of the other deep learning techniques on each dataset. Among the compared techniques, the MLP attains the lowest specificity in sentiment and emotion classification. On the Twitter sentiment dataset, the specificity of the proposed CGARU is 99.85%, MLP is 95.54%, LSTM is 96.71%, Bi-LSTM is 97.36%, CNN is 98.10%, and DNN is 98.73%. On the IMDB Movie Reviews dataset, the obtained specificity of CGARU is 99.75%, MLP is 94.60%, LSTM is 96.92%, Bi-LSTM is 98.07%, CNN is 98.50%, and DNN is 99%. Finally, on the Yelp Reviews dataset, the specificity of CGARU is 99.83%, MLP is 96.05%, LSTM is 96.72%, Bi-LSTM is 97.07%, CNN is 97.91%, and DNN is 98.44%. This analysis proves the efficacy of the proposed model. The performance comparison in terms of the F1-score is shown in Fig. 10.

Fig. 10 Graphical comparison of the F1-score on the different datasets for the proposed and the existing methods

The above graphical representation presents the F1-score comparison of the proposed and conventional deep learning techniques. The graph shows that the proposed dual-stage deep CGARU model obtains the highest F1-score on all datasets. On the Twitter Sentiment dataset, the F1-score is 99.67% for the proposed CGARU, 92% for MLP, 96% for LSTM, 96.63% for Bi-LSTM, 98.08% for CNN, and 98.71% for DNN. On the IMDB Movie Review dataset, the obtained F1-score is 99.47% for CGARU, 90.49% for MLP, 93.01% for LSTM, 95.66% for Bi-LSTM, 97.01% for CNN, and 97.85% for DNN. On the Yelp Reviews dataset, the F1-score is 99.66% for CGARU, 92.14% for MLP, 93.52% for LSTM, 95.41% for Bi-LSTM, 97.59% for CNN, and 98.97% for DNN. These results show that the proposed model clearly outperforms the others in terms of the F1-score. The kappa performance comparison of the proposed and existing techniques is shown in Fig. 11.
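As a reminder, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), computed per class; the macro average then weights every emotion class equally. A minimal scikit-learn sketch on placeholder labels:

```python
from sklearn.metrics import f1_score

# Placeholder labels, not the paper's data
y_true = [0, 1, 2, 0, 1, 2, 0, 1]
y_pred = [0, 1, 2, 0, 2, 2, 1, 1]

# Per-class F1 = 2 * P * R / (P + R); the macro score is their mean
print("Per-class F1:", f1_score(y_true, y_pred, average=None))
print(f"Macro F1: {f1_score(y_true, y_pred, average='macro'):.4f}")
```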

Fig. 11 Graphical comparison of the kappa coefficient on the different datasets for the proposed and the existing methods

The kappa analysis of the proposed and existing techniques shows that the proposed sentiment-based emotion classification model outperforms the other techniques: the proposed method improves the kappa performance on every dataset, confirming that it is well suited for sentiment and emotion classification. In the simulation on the Twitter Sentiment dataset, the kappa coefficient is 99.52% for the proposed CGARU, 93.40% for MLP, 95.92% for LSTM, 96.84% for Bi-LSTM, 98.13% for CNN, and 99.07% for DNN. On the IMDB Movie Review dataset, the obtained kappa coefficient is 98.99% for CGARU, 94.01% for MLP, 95.02% for LSTM, 95.90% for Bi-LSTM, 96.91% for CNN, and 97.57% for DNN. On the Yelp Reviews dataset, the kappa coefficient is 99.32% for CGARU, 94.44% for MLP, 95% for LSTM, 95.50% for Bi-LSTM, 96.25% for CNN, and 98.57% for DNN. The comparative analysis of the kappa coefficient shows that the proposed framework achieves optimal performance relative to the other classifiers on all three datasets, owing to the model improvements that yield more accurate classification. The proposed study also analyses the classifier's performance by varying the epoch size, as depicted in Fig. 12.
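For context, Cohen's kappa corrects the raw agreement between predictions and labels for agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). A minimal sketch on placeholder labels:

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder labels, not the paper's data
y_true = [0, 1, 2, 0, 1, 2, 1, 0]
y_pred = [0, 1, 2, 0, 2, 2, 1, 0]

# kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
# and p_e is the agreement expected from the label marginals alone
print(f"Cohen's kappa: {cohen_kappa_score(y_true, y_pred):.4f}")
```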

Fig. 12 Overall performance analysis by varying epochs

The performance of the proposed classification model is analyzed by varying the epoch size from 1 to 20, since the number of training epochs directly influences the attained performance. The proposed study also analyses the performance for each classified emotion; Table 7 shows the performance attained for each emotion in the proposed study.
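Before turning to the per-emotion results, the sketch below illustrates the kind of epoch sweep behind Fig. 12. The toy data and two-layer network are stand-ins of our own choosing, not the paper's CGARU; the point is only that Keras records one metric value per epoch, from which an epochs-versus-performance curve can be plotted.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in data: 200 samples of 50 selected features, 6 emotion classes
x = np.random.rand(200, 50).astype("float32")
y = np.random.randint(0, 6, size=200)

# Toy stand-in classifier (not the paper's dual-stage CGARU)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train for 20 epochs; history.history stores one value per epoch
history = model.fit(x, y, validation_split=0.2, epochs=20, verbose=0)
for epoch, acc in enumerate(history.history["val_accuracy"], start=1):
    print(f"epoch {epoch:2d}: val_accuracy = {acc:.4f}")
```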

Table 7 Attained performance for varied classes of emotions utilizing three datasets

The accuracy, precision, recall, specificity, F1-score and kappa coefficient are analyzed for the various emotion classes across the three datasets. Initially, the developed dual-stage deep CGARU model classifies the sentiments in the input data; the GARU module then classifies emotions within the positive, negative and neutral sentiments. The performance varies for each emotion class according to its nature. The results reveal that the proposed model is highly effective at classifying emotions from the input text data.
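A per-class breakdown of the kind summarized in Table 7 can be assembled with scikit-learn's classification report, as the hedged sketch below shows; the emotion names and labels are illustrative placeholders.

```python
from sklearn.metrics import classification_report

# Placeholder emotion classes and labels, not the paper's data
emotions = ["happy", "sad", "angry", "fear"]
y_true = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
y_pred = [0, 1, 2, 3, 0, 2, 2, 3, 1, 1]

# Prints precision, recall and F1 per emotion class plus averages
print(classification_report(y_true, y_pred, target_names=emotions))
```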

4.4 Discussion

The overall simulation results prove that the proposed model can accurately predict the emotional tone behind the input text. One significant advantage of the proposed method is its ability to handle unstructured textual data, a real-world problem for most existing sentiment analysis frameworks. Unstructured data presents complexities such as slang, ambiguity about the original emotion behind the words, the number of emotions expressed, and the sequential processing of words. Owing to these complexities, unstructured data is poorly addressed in most existing sentiment analysis mechanisms. The proposed framework, however, efficiently classifies the unstructured data and provides accurate emotion labels. The analyses in the previous section also demonstrated the performance efficacy of the proposed method relative to the other algorithms. The existing state-of-the-art works have been evaluated, and the results are presented in Table 8.

Table 8 Performance comparison of the proposed and existing state-of-the-art works

The comparison shows that the proposed approach attains state-of-the-art results relative to the other techniques, outperforming them on all the considered metrics. Most of the compared approaches employ an LSTM model to preserve temporal features; the proposed model instead uses a GRU to preserve the temporal features, which further enhances classification quality. Moreover, the training of the proposed model is strengthened by a convolutional module that extracts and learns only the crucial features relevant to classification. Among the compared methods, the method of Chen et al. [7] yields the best baseline accuracy of 92.68%, surpassing the other considered techniques, and the method of Pathak et al. [19] attains 88.9% accuracy, which is also strong for sentiment classification. The F1-score of Chen et al. [7] is likewise high, with an average of 88.41%, followed by Pathak et al. [19] with an average F1-score of 87.9%. In addition, the recall attained by Pathak et al. [19] is high, indicating effective performance. Nevertheless, the proposed approach yields higher precision, recall, F1-score and accuracy; the analysis therefore confirms that the proposed approach performs sentiment classification more accurately than the existing state-of-the-art techniques.
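To make this architectural argument concrete, the sketch below stacks a convolutional feature extractor in front of a GRU, in the spirit of the proposed design. It is our illustrative assumption, not the authors' exact CGARU; the vocabulary size, sequence length and layer widths are arbitrary.

```python
import tensorflow as tf

VOCAB, MAXLEN, N_EMOTIONS = 10000, 100, 6  # assumed, illustrative values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAXLEN,), dtype="int32"),
    tf.keras.layers.Embedding(VOCAB, 128),             # token embeddings
    tf.keras.layers.Conv1D(64, 5, activation="relu"),  # salient local features
    tf.keras.layers.MaxPooling1D(2),                   # keep strongest responses
    tf.keras.layers.GRU(64),                           # temporal features, in place of an LSTM
    tf.keras.layers.Dense(N_EMOTIONS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Swapping the GRU layer for tf.keras.layers.LSTM(64) reproduces the LSTM-based configuration the compared works rely on, which makes the GRU-for-LSTM substitution easy to ablate.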

Though the proposed model is optimal and stable in efficiently classifying sentiments from textual data, several challenges remain that could be addressed to achieve even higher performance. Pre-trained models are widely used in many domains to achieve strong results; they could also be adopted here, with the pre-training process improved at any chosen granularity (i.e., character, word or sub-word). Another possible research direction is deep multi-task learning, a growing area in NLP, in which multiple tasks are learned simultaneously to mitigate overfitting in the network. Since existing DL models are largely non-interpretable, explainable DL models could also be pursued. The recent focus on common-sense knowledge could likewise be combined with the classifier to attain more satisfactory results. A further direction arises from the availability of only poorly labelled data resources; low-resource methods could be adopted to improve classification in such settings. Finally, deeper network models with changes in the architectural design could also help developers attain better results in the future.

5 Conclusion

This paper proposes an effective sentiment polarity-based emotion classification framework built on a dual-stage deep learning model. The proposed model comprises four phases: pre-processing, textual feature extraction, feature selection and classification. Initially, noise in the input data is eliminated in the pre-processing stage through effective mechanisms. After pre-processing, features are extracted using the LSA and LTF-MICF models in the feature extraction stage. The extracted features are of high dimensionality, which can increase computational complexity; therefore, the proposed work selects an optimal feature set in the feature selection phase using the CAHA approach. Next, the selected features are fed into the deep CGARU model for sentiment and emotion classification: the initial convolutional stage classifies positive, negative and neutral sentiments, and the CGARU stage then classifies the multi-class emotions in the given text data. The experimental setup uses three datasets, and the performance is analyzed on each of them. The results show that the proposed model attains increased classification accuracy of 99.80% on the Twitter Sentiment dataset, 99.75% on IMDB Movie Reviews, and 99.83% on the Yelp Reviews dataset. In future studies, different features will be extracted using other approaches to improve classification performance, and the proposed work will be extended to determine emotions in big data.