1 Introduction

Advances in computational linguistics and Natural Language Processing (NLP) have created a growing need for emotion recognition from many sources. Emotions can manifest themselves in a variety of ways. The most common and widely used emotion classification is Ekman’s model, which divides emotion into six categories (Happy, Anger, Sad, Disgust, Fear, Surprise). Following this research, many new areas were introduced in affective computing and sentiment analysis [1]. Human-computer interaction (HCI) focuses on detecting human emotion from nonverbal data. However, there are still many sources of confusion and obstacles to achieving better accuracy in emotion recognition from text [2].

According to psychological research, there are several theories of how emotions are expressed, but two are the most essential and widely employed in existing sentiment analysis techniques: emotional categories and emotional dimensions [1]. The categorical model assumes that emotions fall into separate groups. It includes Ekman’s [1] fundamental emotion model, which identifies six primary emotions: ANGER, DISGUST, FEAR, HAPPINESS, SADNESS, and SURPRISE. Plutchik identifies eight fundamental bipolar emotions, a superset of Ekman’s with two additional emotions: TRUST and ANTICIPATION. These eight emotions are grouped into four bipolar pairs: anger versus fear, joy versus sadness, surprise versus anticipation, and trust versus disgust. Dimensional models, by contrast, represent emotional affect in a three-dimensional space (Fig. 1).
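The two categorical label sets above can be written down directly as data. The following is a minimal sketch, not part of any surveyed system: the names `PLUTCHIK_PAIRS`, `OPPOSITE`, and `extra_emotions` are illustrative, and Ekman’s HAPPINESS is written as "joy" purely as an assumption so that the two sets can be compared.

```python
# Plutchik's four bipolar pairs, exactly as listed in the text.
PLUTCHIK_PAIRS = [
    ("anger", "fear"),
    ("joy", "sadness"),
    ("surprise", "anticipation"),
    ("trust", "disgust"),
]

# Build a symmetric lookup so that each emotion maps to its opposite.
OPPOSITE = {}
for first, second in PLUTCHIK_PAIRS:
    OPPOSITE[first] = second
    OPPOSITE[second] = first

# Ekman's six primary emotions; "happiness" is written as "joy" here
# (an assumption made only so the two label sets are comparable).
EKMAN = {"anger", "disgust", "fear", "joy", "sadness", "surprise"}
PLUTCHIK = set(OPPOSITE)


def extra_emotions():
    """Return the emotions Plutchik adds on top of Ekman's six."""
    return sorted(PLUTCHIK - EKMAN)


print(OPPOSITE["joy"])   # sadness
print(extra_emotions())  # ['anticipation', 'trust']
```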

Fig. 1 Emotion model. A circular model of emotion showing the dimensions of activation, pleasant, deactivation, and unpleasant.

In this model, every emotion has its own proximity to the others [3]. Among all the models, the most distinctive and popular is Russell’s. According to Russell’s [4] Circumplex Model of Affect, emotions are organized in a two-dimensional circular space defined by an arousal dimension and a valence dimension. The valence dimension distinguishes UNPLEASANT from PLEASANT feelings, and the arousal dimension ranges between DEACTIVATION and ACTIVATION (Fig. 2).
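To make the two dimensions concrete, the sketch below maps a (valence, arousal) point to one of the four quadrants of the circumplex. It is only an illustration: the [-1, 1] scale, the function name `circumplex_quadrant`, and the sample coordinates are assumptions, not values taken from Russell’s model.

```python
def circumplex_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to a quadrant label.

    Both coordinates are assumed to lie in [-1, 1]; this scale is a
    convention chosen for the example, not part of the cited model.
    """
    if not (-1.0 <= valence <= 1.0 and -1.0 <= arousal <= 1.0):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    pleasantness = "PLEASANT" if valence >= 0 else "UNPLEASANT"
    activation = "ACTIVATION" if arousal >= 0 else "DEACTIVATION"
    return f"{pleasantness}/{activation}"


# Hypothetical coordinates, chosen only to show each kind of output.
print(circumplex_quadrant(0.8, 0.6))    # PLEASANT/ACTIVATION
print(circumplex_quadrant(-0.5, -0.4))  # UNPLEASANT/DEACTIVATION
```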

Fig. 2 Different types of approaches. A diagram of the rule-based, machine learning-based, and hybrid approaches.

This paper presents a systematic survey of the current status, databases, and future directions of text-based emotion recognition. We group existing work into three main categories: (1) rule-based approaches, (2) machine learning-based approaches, and (3) hybrid approaches. Sect. 5 discusses the datasets needed to deploy a successful model, and the paper concludes with analysis, comparison, and some points for improving research in this area of affective computing.

2 Rule-Based Approach

Rule-based categorization assigns emotion categories to user evaluations through a set of “if–then” rules. The “if” clause is known as the “rule antecedent”, and the “then” clause is known as the “rule consequent” [5]. Individual rules are easy to create; however, building a complete rule set is a tedious and time-consuming process. The rule-based approach mostly consists of keyword recognition and lexicons [6].
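The antecedent/consequent structure can be sketched in a few lines. The rules below are hypothetical and exist only to show the shape of an “if–then” rule set; `Rule`, `RULES`, and `apply_rules` are illustrative names rather than anything defined in the cited works.

```python
from typing import Callable, List, NamedTuple, Optional


class Rule(NamedTuple):
    antecedent: Callable[[List[str]], bool]  # the "if" part, tested on tokens
    consequent: str                          # the "then" part, an emotion label


# Hypothetical rules; the first matching rule wins, so more specific
# rules are listed before more general ones.
RULES = [
    Rule(lambda toks: "not" in toks and "happy" in toks, "sadness"),
    Rule(lambda toks: "happy" in toks, "joy"),
    Rule(lambda toks: "afraid" in toks or "scared" in toks, "fear"),
]


def apply_rules(sentence: str) -> Optional[str]:
    """Return the consequent of the first rule whose antecedent holds."""
    tokens = sentence.lower().split()
    for rule in RULES:
        if rule.antecedent(tokens):
            return rule.consequent
    return None  # no rule fired


print(apply_rules("I am happy"))      # joy
print(apply_rules("I am not happy"))  # sadness
```

Even this toy set shows why rule writing becomes tedious: every exception, such as negation, needs its own rule and a deliberate ordering.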

2.1 Keyword Based Approach

In this approach, emotion is predicted under the assumption of keyword independence, which mostly excludes the possibility of expressing complex emotions through several kinds of keywords at the same time [7]. To check the meaning of a word, this model relies on numerous dictionaries and lexicons, such as WordNet-Affect [8]. The authors of [8] devised a simple algorithm for detecting emotive terms in a sentence and computing a score that reflects the frequency of words from the text’s subjective lexicon.

In the first step, the text is tokenized, and pronouns and prepositions are filtered out because they do not directly contribute to the emotion [9]. After the lexicon words are found, a label is assigned to the sentence according to the most frequent matching emotion. Because of term complexity and the limited availability of linguistic data, inadequate keywords can significantly reduce the approach’s efficiency [6].
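The steps described above (tokenize, filter function words, look up lexicon entries, label by frequency) can be sketched as follows. This is a minimal illustration under stated assumptions: `FUNCTION_WORDS` and `LEXICON` are tiny hypothetical stand-ins for a real stop list and a resource such as WordNet-Affect, and `detect_emotion` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical pronoun/preposition list and emotion lexicon.
FUNCTION_WORDS = {"i", "me", "my", "he", "she", "it", "we", "they", "you",
                  "in", "on", "at", "of", "to", "by", "with", "for"}
LEXICON = {
    "happy": "joy", "delighted": "joy",
    "afraid": "fear", "scared": "fear",
    "furious": "anger",
    "sad": "sadness", "crying": "sadness",
}


def detect_emotion(sentence):
    """Label a sentence with its most frequent lexicon emotion, or None."""
    # Step 1: tokenize and drop pronouns and prepositions.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    # Step 2: count the emotions of the tokens found in the lexicon.
    counts = Counter(LEXICON[t] for t in content if t in LEXICON)
    if not counts:
        return None  # no emotion keyword present
    # Step 3: the most frequent emotion becomes the sentence label.
    return counts.most_common(1)[0][0]


print(detect_emotion("I was scared and afraid in the dark, but happy at home"))  # fear
print(detect_emotion("The report is on the table"))                              # None
```

The second call illustrates the weakness noted in the text: a sentence without any lexicon keyword receives no label at all.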

2.2 Lexical Affinity Method

Though the keyword technique is easy and straightforward to implement, it has some limitations that the lexical affinity method can address, since many words are used differently, and carry different emotions, depending on the sentence [7].

Aside from emotional keywords, there are a few other things to note. The lexical affinity method extends the keyword detection approach: it assigns arbitrary words a probabilistic “affinity” for a specific emotion [10]. These probabilities are learned from linguistic corpora. The method has two main flaws: the assigned probabilities are biased toward the content type of the corpus, and the method cannot identify sentiment carried by words that do not appear in the corpora on which it relies [11]. For instance, because the word “accident” has a high likelihood of suggesting a negative feeling, sentences like “I averted an accident” or “I met my lover by accident” would not contribute appropriately to the emotional evaluation.
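The “accident” example can be reproduced with a small sketch. The affinity table below is entirely hypothetical (the probabilities are invented for illustration, not taken from any corpus), and `affinity_scores` and `affinity_label` are illustrative names.

```python
import re
from collections import defaultdict

# Hypothetical word-level affinities, P(emotion | word); invented values.
AFFINITY = {
    "accident": {"sadness": 0.75, "fear": 0.20, "joy": 0.05},
    "lover": {"joy": 0.80, "sadness": 0.20},
}


def affinity_scores(sentence):
    """Sum the per-word affinities for every emotion in the sentence."""
    totals = defaultdict(float)
    for token in re.findall(r"[a-z]+", sentence.lower()):
        for emotion, probability in AFFINITY.get(token, {}).items():
            totals[emotion] += probability
    return dict(totals)


def affinity_label(sentence):
    """Return the emotion with the highest summed affinity, or None."""
    scores = affinity_scores(sentence)
    return max(scores, key=scores.get) if scores else None


# Both sentences are labelled negatively because "accident" dominates,
# although neither describes a negative event.
print(affinity_label("I averted an accident"))       # sadness
print(affinity_label("I met my lover by accident"))  # sadness
```

Because the scores are attached to words rather than to the sentence, the negative affinity of “accident” outweighs the context in both cases.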

This lexicon-based approach draws mainly on three resources to build the relevant meanings of words: (1) the emotion vocabulary from DUTIR; (2) a collection of slang phrases; and (3) a collection of emoticons gathered from a microblog website to expand the vocabulary [11]. Lexicon-based methods built on a sentiment lexicon are of two types: dictionary-based and corpus-based techniques.

In general, a dictionary records the words of a language in a systematic way, whereas a corpus records text in a language at random [12].

2.3 Statistical Approach

Latent Semantic Analysis (LSA) is a statistical technique that evaluates the relationships between a set of documents and the terms they contain, and that produces a set of relevant features in most knowledge-based works [13, 14]. The Hyperspace Analogue to Language (HAL) has been used to determine the contextual information between words and sentiment phrases efficiently. In 2013, Wang suggested a new approach that uses an improved, more efficient LSA for emotion classification of text on the ISEAR dataset [15]. This method is also regarded as a lexicon-based method.

3 Machine Learning Based Approach

Machine learning-based approaches address the problem in a different way. With rule-based approaches, the challenge is determining emotions directly from the provided text; with machine learning, the task becomes classifying incoming text into emotion categories. Machine learning techniques recognize emotions with a trained classifier, using algorithms such as SVM, KNN, naive Bayes (NB), and CRF, to determine which emotion category (anger, sadness, joy, fear, disgust, trust, or surprise) applies.

In these techniques, emotion is recognized through classification methods that rely on previously labelled training samples. Hence, machine learning-based categorization is also known as supervised learning, as the model is guided by pre-classified data [16]. Rather than following solely explicitly coded instructions, such algorithms build a model from different data sources and then use that model to make judgements [17].

According to Calvo [3], categorical approaches are the most common in emotion detection. One of the earliest works was by Alm [18], and Roth conducted an empirical study of supervised machine learning using the SNoW learning architecture [4] described in that work. They employed a corpus annotated with an expanded set of Ekman’s fundamental emotions. In one of the experiments in their paper, Strapparava [8] used a Naive Bayes classifier trained on LiveJournal.com blog postings that carried Ekman’s emotions as reference labels (Fig. 3).

Fig. 3 Machine learning based approach. A flow diagram of how the input and the training data are predicted and evaluated using the machine learning algorithm.
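The supervised workflow in Fig. 3 (train on labelled text, then predict on new input) can be sketched with a small multinomial Naive Bayes classifier. This is a generic textbook formulation with add-one smoothing, not the configuration used in [8]; the class name `NaiveBayesEmotionClassifier` and the six training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEmotionClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)     # documents per emotion
        self.word_counts = defaultdict(Counter)  # word counts per emotion
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical pre-classified training data.
texts = ["i am so happy today", "what a wonderful happy day",
         "i am scared of the dark", "that noise scared me",
         "i am so angry at him", "this makes me angry"]
labels = ["joy", "joy", "fear", "fear", "anger", "anger"]

model = NaiveBayesEmotionClassifier().fit(texts, labels)
print(model.predict("a happy day"))               # joy
print(model.predict("the dark noise scared me"))  # fear
```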

Supervised learning algorithms can be used to build emotion models with both categorical and dimensional approaches. To detect the writer’s emotion class, Balabantaray [19] provided a classification model based on a multi-class SVM that makes decisions based on Ekman’s [1] basic emotions. Roberts [20] adds LOVE to Ekman’s six fundamental emotions; their method employs a series of binary SVM classifiers, one for each of the seven emotions. Suttles [21] categorizes emotions using the eight fundamental bipolar emotions proposed by Plutchik in earlier categorical emotion modelling work. As a result, emotion recognition can be treated as a binary problem for each opposing pair rather than as a multi-class problem. Strapparava [8] devised a framework that employs several variants of Latent Semantic Analysis to identify emotions when there are no affective words in a text. However, their technique is inaccurate since it is not context-sensitive and fails to capture a conceptual analysis of the sentence.

Burget [22] proposed a system that relies heavily on pre-processing the incoming data (Czech newspaper headlines) and labelling it with a classifier. Pre-processing was done at both the sentence and word levels, using POS tagging, tokenization, and stop-word removal. Term Frequency-Inverse Document Frequency (TF-IDF) was used to compute the relevance of each term to each emotion class. They achieved a prediction performance of 80% on 1000 Czech news headlines using SVM and tenfold cross-validation. Their approach, however, was not validated on an English dataset. Also, because it considers only emotional keywords as features, it is not context-sensitive.
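As an illustration of how a term’s relevance to an emotion class can be scored, the sketch below computes TF-IDF with the tokens of each class treated as one document. It uses one common variant (length-normalized term frequency and an unsmoothed natural-log IDF), which is an assumption and not necessarily the weighting used in [22]; the headlines and the name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(class_docs):
    """Compute TF-IDF weights per emotion class.

    class_docs maps an emotion label to the (non-empty) list of tokens
    of all headlines carrying that label.
    """
    n_classes = len(class_docs)
    # Document frequency: in how many classes does each term occur?
    doc_freq = Counter()
    for tokens in class_docs.values():
        doc_freq.update(set(tokens))
    weights = {}
    for label, tokens in class_docs.items():
        term_freq = Counter(tokens)
        weights[label] = {
            term: (count / len(tokens)) * math.log(n_classes / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights


# Hypothetical English headlines, one token list per emotion class.
class_docs = {
    "joy": "team wins the cup the fans celebrate".split(),
    "fear": "storm hits the coast residents flee".split(),
    "anger": "fans protest the ticket prices".split(),
}
weights = tf_idf(class_docs)
print(weights["joy"]["the"])           # 0.0, since "the" occurs in every class
print(weights["fear"]["storm"] > 0.0)  # True, since "storm" is specific to fear
```

Terms shared by all classes receive zero weight, so only class-specific vocabulary is passed on to the classifier as discriminative features.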

Dung [23] based his work on the idea that emotions are connected to cognitive processes triggered by emotional events. This implies that when a specific event occurs, the human mind starts in one mental state and then transitions into another. They implemented this notion with a Hidden Markov Model (HMM), in which each sentence is made up of several sub-ideas, each regarded as an event that causes a state change. The algorithm achieved an F-score of 35% on the ISEAR dataset [24], with a best precision of 47%. The system’s poor accuracy was due to its lack of semantic and syntactic analysis of the sentence, which makes it non-context-sensitive.
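The idea of a sentence as a sequence of events that move the mind between states can be illustrated with standard Viterbi decoding of an HMM. This is a generic sketch and not the model of [23]: the states, the event types ("routine", "gain", "loss"), and all probabilities are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            layer[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(layer)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path


# Hypothetical model: mental states and the event types of each sub-idea.
states = ["neutral", "joy", "sadness"]
start_p = {"neutral": 0.8, "joy": 0.1, "sadness": 0.1}
trans_p = {
    "neutral": {"neutral": 0.6, "joy": 0.2, "sadness": 0.2},
    "joy": {"neutral": 0.3, "joy": 0.6, "sadness": 0.1},
    "sadness": {"neutral": 0.3, "joy": 0.1, "sadness": 0.6},
}
emit_p = {
    "neutral": {"routine": 0.8, "gain": 0.1, "loss": 0.1},
    "joy": {"routine": 0.2, "gain": 0.7, "loss": 0.1},
    "sadness": {"routine": 0.2, "gain": 0.1, "loss": 0.7},
}

events = ["routine", "loss", "loss"]
print(viterbi(events, states, start_p, trans_p, emit_p))
# ['neutral', 'sadness', 'sadness']
```

In this reading, the final state of the decoded path serves as the emotion assigned to the sentence.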

4 Hybrid Approach

Hybrid approaches detect emotions by combining emotion keywords and machine learning features learned from training datasets with knowledge from other fields, such as human psychology [25]. Few studies have addressed the difficulty of extracting emotions from text that contains no emotion keywords [25,26,27,28].

Wu and others [25] suggested a technique for sentence-level emotion mining that identifies predefined conceptual tags and sentence characteristics and then classifies just one emotion, joy, in relation to biological patterns of human emotions. The method becomes ambiguous when one emotion generation rule (EGR) includes multiple emotions.

By constructing a common action histogram between two entities, Cheng-Yu and others [26] accomplished event-level textual emotion detection. Each column of the histogram represents the degree to which the two entities shared an action (verb). They achieved an F-score of 75% when tested on four emotions. However, their technique ignores the content of the sentence and relies largely on the structure of the training data, such as the grammatical type of the sentences and the frequency of emotions for a given subject. Furthermore, only four of the six Ekman emotions are used in the classification.

Chaumartin [27] created UPAR7, a linguistic rule-based framework built on the lexical resources SentiWordNet [29], WordNet-Affect [17], and WordNet [30]. The system uses the dependency graph produced by the Stanford POS tagger [31], with the root node of the derived graph serving as the main subject. Each word in the sentence is rated separately for each emotion. Because the main subject (major word) is more important than the other words in the sentence, it receives a higher rating. This method’s best accuracy was 30% for the six emotions of the Ekman model. In addition to its low accuracy, the method is not context-sensitive and lacks a global comprehension of the language.

For text-based emotion recognition, BERT is the most studied transformer-based model. The research suggests that BERT variants be investigated for detecting emotions in textual data [13]. A bi-directional LSTM with word embeddings and an annotated corpus was proposed in [32]. The first stage applies a preprocessing technique to the input data, which removes excess spaces and invalid characters, resolves character encoding, and corrects spelling.
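The preprocessing stage can be sketched with the standard library alone. The function below is a hedged illustration, not the pipeline of [32]: the name `preprocess` and the choice of NFKC normalization are assumptions, and spelling correction is deliberately left out because it needs an external dictionary.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Clean raw input text before it is passed to a model."""
    # Resolve encoding variants: NFKC maps compatibility characters
    # (non-breaking spaces, full-width letters) to their plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Remove invalid characters such as control codes, keeping whitespace.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("  I   am\tso\u00a0happy\x00!  "))  # I am so happy!
```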

Yang [33] suggested a machine-learning-based emotion classification system that combines CRF-based (conditional random field) emotion trigger identification, phrase-based detection, and SVM, Naive Bayes, and Maximum Entropy-based emotion classification. The system performed well on a dataset of suicide notes, with an F-score of 61%, precision of 58%, and recall of 64%. This strategy produced reasonable results, but neither the classifier nor the dataset has been published.
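The three reported figures are mutually consistent if the F-score is taken to be the balanced F1 measure, the harmonic mean of precision and recall. That reading is an assumption, since the text does not state which F-measure was used; `f_score` is an illustrative name.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 58% and recall 64%, as reported for the suicide-note dataset.
print(round(f_score(0.58, 0.64), 2))  # 0.61
```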

5 Dataset

After settling on the model used to represent emotions, the next essential stage in recognizing emotions from text is gathering data relevant to the task. A few structured, annotated datasets for emotion detection are freely available for research purposes. This section describes the most important publicly accessible datasets; Table 1 lists the datasets, their characteristics, and the emotion models they reflect.

Table 1 Characteristics of dataset

5.1 ISEAR Dataset

The International Survey on Emotion Antecedents and Reactions (ISEAR) project, headed by Harald Wallbott and Klaus R. Scherer, gathered data through a large group of psychologists around the globe, using seven emotion labels (joy, sadness, fear, anger, guilt, disgust, and shame). The data were obtained through a questionnaire from 3000 respondents in 37 countries on 5 continents [34] (Table 2).

Table 2 Characteristics of ISEAR database

5.2 EmoBank

EmoBank is a corpus of over 10,000 sentences from various genres, labelled with multidimensional emotional metadata in the Valence-Arousal-Dominance (VAD) format. EmoBank stands out because it is not only bi-representational but also bi-perspectival in design [35]: it captures both the reader’s and the writer’s emotions. The automatic mapping between categorical and dimensional representations makes the dataset practical and efficient to use with machine learning techniques.

5.3 SemEval-2017 Task 4

The Semantic Evaluation (SemEval) dataset contains Arabic and English tweets from different sources. The task benefits the sentiment analysis community by providing a large, publicly accessible benchmark dataset of over 70,000 tweets in two languages, which researchers can use to evaluate their approaches and compare them with the state of the art [36].

5.4 WASSA-2017 Emotion Intensities (EmoInt)

To estimate emotion intensities in tweets, researchers used data from the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Training and test data were provided for four emotions (fear, joy, anger, and sadness). For instance, the intensity of the user’s anger is given as a value between 0 and 1, and each value is paired with the corresponding tweet expressing that anger [37].

5.5 Emotion-Stimulus Dataset

The Emotion-Stimulus dataset was created by Ghazi et al. in 2015; both the emotion and the stimulus were annotated using FrameNet’s emotion-directed frame. It covers seven categories: happiness, sadness, anger, fear, surprise, disgust, and shame. There are 820 sentences with both a cause and an emotion tag and 1594 sentences with only an emotion tag, for a total of 2414 items in an XML file. The dataset was constructed carefully and is well structured.

6 Comparison

Each of these approaches is appropriate in certain settings. Most experiments and papers demonstrate that hybrid methods perform comparatively better than learning-based or rule-based methods alone. Building on these approaches, Poonam [34] proposed a method that seems to perform better than previous ones. Their results are shown below, with each row taken from their results (Table 3).

Table 3 Comparison of different approaches

Based on the research findings of the experiment, a graphical comparison of all significant approaches to emotion recognition is shown below (Fig. 4).

Fig. 4 Comparison of different approaches for detecting emotions. A vertical bar graph plots the different approaches against the number of samples. The emotions positive, negative, and neutral are depicted.

7 Limitations and Future Scope

Both dimensional and discrete models are used in current text-based emotion recognition (TER) tasks. However, universal annotation standards should be established, as their absence causes incompatibilities among existing techniques. For example, SemEval [36] is annotated with Ekman’s six basic emotions, whereas WASSA-2017 (EmoInt) is annotated with four emotions: fear, joy, anger, and sadness [37]. The variety of emotion labels that have been proposed affects the compatibility of cross-corpus resources. This limitation (incompatible annotation) effectively results in a lack of data for training and testing [14] (Table 4).
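The incompatibility can be made concrete by comparing label sets. The sketch below uses the labels listed in this survey for Ekman’s model and ISEAR, and writes the four WASSA-2017 emotions with their common names (fear, joy, anger, sadness). The alias table `ALIASES` is a hypothetical harmonization step, not an established standard.

```python
# Label sets as listed in this survey.
LABELS = {
    "Ekman": {"anger", "disgust", "fear", "happiness", "sadness", "surprise"},
    "ISEAR": {"joy", "sadness", "fear", "anger", "guilt", "disgust", "shame"},
    "WASSA-2017": {"fear", "joy", "anger", "sadness"},
}

# Hypothetical alias table that maps near-synonymous labels to one name.
ALIASES = {"happiness": "joy"}


def normalize(labels):
    """Rename labels through the alias table."""
    return {ALIASES.get(label, label) for label in labels}


def shared_labels(first, second, harmonize=False):
    """Return the labels two schemes have in common."""
    a, b = LABELS[first], LABELS[second]
    if harmonize:
        a, b = normalize(a), normalize(b)
    return sorted(a & b)


print(shared_labels("Ekman", "WASSA-2017"))
# ['anger', 'fear', 'sadness']
print(shared_labels("Ekman", "WASSA-2017", harmonize=True))
# ['anger', 'fear', 'joy', 'sadness']
```

Without harmonization, examples labelled "happiness" in one corpus and "joy" in another cannot be pooled, which is one way incompatible annotation reduces the data available for training and testing.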

Table 4 Limitation of current works in text-based emotion detection

In future work, resolving the incompatibility between the models used for labelling emotions will improve the integration of existing corpus resources. We strongly encourage future researchers to design substantial large-scale datasets that also include fine-grained emotions, which can help train models effectively. In addition, efficient generative deep learning models are likely to be more effective solutions when combined with large datasets.

8 Conclusion

In this survey, we discussed three main approaches to classifying or recognizing emotion from text data at a fine-grained level. We revisited the techniques and concluded that, among the presented approaches, the hybrid method seems the logical choice when the community or team has enough training data as well as a strong keyword set for the rule-based methodology. Among the learning approaches, several researchers have observed that Support Vector Machine (SVM), Naïve Bayes, and KNN classifiers produce impressive results. It is also noticeable that a classifier becomes context-sensitive through syntactic and semantic analysis of the sentence, and that ConceptNet and WordNet help the algorithm characterize the training dataset, resulting in improved coverage of emotion rules. We have also considered a wide range of datasets, which can be helpful when training one’s own model or performing analysis over the data. Overall, the survey has identified that, rather than assigning an individual emotional score to each word, examining the relationships between the terms of a sentence could lead to greater accuracy.