Abstract
Text processing is a method for comprehending, analyzing, and cleaning text, and for performing actions on the resulting data. The technique is used to extract meaningful information from text. Text is a written form of communication through which emotions are expressed. Happiness, neutrality, fear, sadness, surprise, disgust, and anger are the most common emotional expressions. In the social media era, identifying emotions from text is therefore especially important. This paper surveys operational methods and approaches for identifying emotion from textual data. It focuses primarily on existing datasets and on methodologies that follow lexical keyword-based, machine learning-based, and hybrid approaches.
1 Introduction
Advances in computational linguistics and Natural Language Processing (NLP) have driven a growing need for emotion recognition from many sources. Emotions can manifest themselves in a variety of ways. The most common and widely used emotion classification is Ekman’s model, which divides emotion into six categories (Happy, Anger, Sad, Disgust, Fear, Surprise). Following this research, many new areas were introduced in affective computing and sentiment analysis [1]. Human-computer interaction (HCI) research focuses on detecting human emotion from nonverbal data. However, considerable confusion and many obstacles still stand in the way of better accuracy in emotion recognition from text [2].
According to psychological research, there are several theories of how emotions are expressed, but two are the most essential and widely employed in existing sentiment analysis techniques: emotional categories and emotional dimensions [1]. The categorical model assumes that emotions fall into separate groups. It incorporates Ekman’s [1] fundamental emotion model, which identifies six primary emotions: ANGER, DISGUST, FEAR, HAPPINESS, SADNESS, and SURPRISE. Plutchik identifies eight fundamental bipolar emotions, a superset of Ekman’s with two additional emotions: TRUST and ANTICIPATION. These eight emotions are arranged in four bipolar pairs: anger versus fear, joy versus sadness, surprise versus anticipation, and trust versus disgust. Dimensional models, by contrast, represent emotional affect in a three-dimensional space (Fig. 1).
In this model, each emotion occupies its own position relative to the others [3]. Among the dimensional models, the most distinctive and popular is Russell’s. According to Russell’s [4] Circumplex Model of Affect, emotions are organized in a two-dimensional circular region defined by an arousal dimension and a valence dimension. The valence dimension distinguishes UNPLEASANT from PLEASANT feelings. The arousal dimension ranges from DEACTIVATION to ACTIVATION (Fig. 2).
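As a minimal sketch of how the two circumplex dimensions can be read, the following function maps a (valence, arousal) pair to its quadrant. The numeric range [-1, 1], the neutral point at 0, and the function name are assumptions made for illustration; they are not taken from Russell’s model or from any dataset discussed here.

```python
def circumplex_quadrant(valence, arousal):
    """Return the circumplex quadrant of a point.

    Assumption: both values lie in [-1, 1], with 0 as the neutral point.
    """
    valence_label = "PLEASANT" if valence >= 0 else "UNPLEASANT"
    arousal_label = "ACTIVATION" if arousal >= 0 else "DEACTIVATION"
    return valence_label, arousal_label


# A pleasant, highly aroused state (for example, excitement).
print(circumplex_quadrant(0.8, 0.6))
# An unpleasant, low-arousal state (for example, gloom).
print(circumplex_quadrant(-0.7, -0.5))
```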
This paper presents a systematic survey of the current status, databases, and future directions of emotion recognition from text. We group the approaches into three main categories: (1) rule-based, (2) machine learning-based, and (3) hybrid. Section 5 discusses the datasets needed to deploy a successful model, and the paper concludes with analysis, comparison, and suggestions for improving research in this area of affective computing.
2 Rule-Based Approach
Rule-based categorization assigns emotion categories to user evaluations through a set of “if–then” rules. The “if” clause is the rule antecedent, and the “then” clause is the rule consequent [5]. Individual rules are easy to write; however, building a complete rule set is a tedious and time-consuming process. Rule-based approaches mostly consist of keyword recognition and lexicons [6].
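The antecedent and consequent structure can be sketched as a list of (predicate, label) pairs that are tried in order. The three rules, the labels, and the NEUTRAL fallback below are hypothetical and chosen only to illustrate the mechanism; they do not come from [5] or [6].

```python
# Hypothetical rules: each pairs an antecedent (a predicate over the tokens)
# with a consequent (an emotion label).
RULES = [
    (lambda tokens: "afraid" in tokens or "scared" in tokens, "FEAR"),
    (lambda tokens: "furious" in tokens or "angry" in tokens, "ANGER"),
    (lambda tokens: "happy" in tokens and "not" not in tokens, "HAPPINESS"),
]


def classify(text, default="NEUTRAL"):
    """Return the consequent of the first rule whose antecedent holds."""
    tokens = text.lower().split()
    for antecedent, consequent in RULES:
        if antecedent(tokens):
            return consequent
    return default


print(classify("I am so happy today"))  # HAPPINESS
print(classify("I am not happy"))       # NEUTRAL: the negation blocks the rule
```

Even this small example shows why rule sets grow quickly: handling negation already requires an extra condition in the antecedent.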
2.1 Keyword Based Approach
In this approach, the emotion is predicted under the assumption of keyword independence, which mostly excludes the possibility of expressing complex emotions through several kinds of keywords at the same time [7]. To check the meaning of a word, the model has to consult numerous dictionaries and lexicons, such as WordNet-Affect [8]. Strapparava [8] devised a simple algorithm that detects emotive terms in a sentence and computes a score reflecting the frequency of words from an affective lexicon in the text.
In the first step, the text is tokenized, and pronouns and prepositions are excluded from further processing because they do not directly contribute to the emotion [9]. After the lexicon words are found, the sentence is labelled with the emotion whose keywords occur most frequently. Because of term complexity and the unavailability of linguistic data, inadequate keywords can significantly reduce the approach’s effectiveness [6].
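The steps above (tokenize, filter function words, look up lexicon entries, label by frequency) can be sketched as follows. The stop-word list and the tiny emotion lexicon are hypothetical stand-ins for resources such as WordNet-Affect, and the regular-expression tokenizer is an assumption made to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical resources; a real system would load a full lexicon.
STOP_WORDS = {"i", "he", "she", "it", "we", "they", "you", "my", "in", "on",
              "at", "of", "to", "with", "the", "a", "an", "and", "was", "is"}
EMOTION_LEXICON = {
    "happy": "HAPPINESS", "delighted": "HAPPINESS",
    "sad": "SADNESS", "miserable": "SADNESS",
    "angry": "ANGER", "furious": "ANGER",
    "afraid": "FEAR", "terrified": "FEAR",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def label_sentence(sentence):
    """Label a sentence with its most frequent lexicon emotion, or None."""
    tokens = [t for t in tokenize(sentence) if t not in STOP_WORDS]
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    if not counts:
        return None  # no emotion keyword found
    return counts.most_common(1)[0][0]


print(label_sentence("I was angry and furious, but she was happy"))  # ANGER
```

A sentence without any lexicon word returns None, which reflects the keyword coverage problem noted in [6].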
2.2 Lexical Affinity Method
Though the keyword technique is easy and straightforward to implement, it has limitations that the lexical affinity method can address, since many words are used differently, and carry different emotions, depending on the sentence [7].
The Lexical Affinity method extends the keyword detection approach beyond emotional keywords: it assigns arbitrary words a probabilistic “affinity” for a specific emotion [10]. These probabilities are estimated from linguistic corpora. The method has two main flaws: the assigned probabilities are biased toward the content type of the corpus, and it cannot identify emotions conveyed by words that are absent from the corpora on which it relies [11]. It also operates at the word level. For instance, because the word “accident” has a high likelihood of suggesting a negative feeling, sentences like “I averted an accident” or “I met my lover by accident” would not be evaluated appropriately.
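The word-level weakness can be made concrete with a small sketch. The affinity values below are invented for illustration, as if they had been estimated from a corpus; they are not taken from [10] or [11]. Summing per-word affinities is one simple scoring choice among several.

```python
# Hypothetical affinities, as if estimated from a corpus: P(emotion | word).
AFFINITY = {
    "accident": {"SADNESS": 0.75, "FEAR": 0.20},
    "lover": {"HAPPINESS": 0.60},
    "averted": {"HAPPINESS": 0.30},
}


def score(sentence):
    """Sum the affinity of every known word for each emotion."""
    totals = {}
    for word in sentence.lower().split():
        for emotion, probability in AFFINITY.get(word, {}).items():
            totals[emotion] = totals.get(emotion, 0.0) + probability
    return totals


def predict(sentence):
    """Return the emotion with the highest total affinity, or None."""
    totals = score(sentence)
    return max(totals, key=totals.get) if totals else None


# Both sentences are pulled toward a negative label by the word "accident",
# although neither describes a negative experience.
print(predict("I averted an accident"))
print(predict("I met my lover by accident"))
```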
This lexicon-based approach draws on three main resources to establish the relevant meaning of words: (1) the emotion vocabulary from DUTIR; (2) a collection of slang phrases; and (3) a collection of emoticons gathered from a microblog website to expand the vocabulary [11]. Lexicon-based methods built on a sentiment lexicon are of two types: dictionary-based and corpus-based.
In general, a dictionary keeps track of words in a language in a systematic way, whereas a corpus keeps track of text in a language at random [12].
2.3 Statistical Approach
Latent Semantic Analysis (LSA) is a statistical technique for evaluating the relationships between a set of documents and the terms they contain, and for creating a set of relevant features; it is used in most knowledge-based works [13, 14]. The Hyperspace Analogue to Language (HAL) has been used to capture contextual information between words and sentiment phrases efficiently. In 2013, Wang suggested an approach that uses an improved LSA for emotion classification of text on the ISEAR dataset [15]. This method is also regarded as a lexical-based method.
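To illustrate the kind of contextual statistic HAL collects, the sketch below builds a weighted co-occurrence table with a sliding window, giving closer preceding words a larger weight. This follows the weighting scheme commonly described for HAL (weight = window size minus distance plus one); the window size, the function name, and the example sentence are assumptions, and the code does not reproduce the method of [15].

```python
from collections import defaultdict


def hal_matrix(tokens, window=3):
    """Weighted co-occurrence counts, indexed as cooc[word][preceding_word]."""
    cooc = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran past the start of the text
            # Closer neighbours receive a larger weight.
            cooc[word][tokens[j]] += window - distance + 1
    return cooc


tokens = "the storm left me terrified".split()
matrix = hal_matrix(tokens, window=3)
print(dict(matrix["terrified"]))  # {'me': 3, 'left': 2, 'storm': 1}
```

Rows of such a table can serve as context vectors for words, which is how a sentiment phrase can be related to the words around it.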
3 Machine Learning Based Approach
Machine learning-based approaches address the problem in a different way. With rule-based approaches, the challenge was to determine emotions from the provided text; here, the task is to classify incoming text into emotion categories. Machine learning techniques recognize emotions with a learned classifier, trained with algorithms such as SVM, KNN, naive Bayes (NB), and CRF, that determines which emotion category (anger, sadness, joy, fear, disgust, trust, or surprise) applies.
In these techniques, emotion is recognized through classification, which relies on previously labelled training samples. Machine learning-based categorization of this kind is therefore known as supervised learning, as the model is guided by pre-classified data [16]. Rather than following only explicitly coded instructions, such algorithms build a model from data and then use that model to make judgements [17].
In emotion detection, categorical approaches are the most common, as noted by Calvo [3]. One of the earliest works is that of Alm [18], an empirical study of supervised machine learning using the SNoW learning architecture [4]. They employed an annotated corpus with an expanded set of Ekman’s fundamental emotions. In one of the experiments in their paper, Strapparava [8] used a Naive Bayes classifier trained on LiveJournal.com blog postings that had been labelled with Ekman’s emotions (Fig. 3).
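As an illustration of supervised classification with Naive Bayes, the following is a minimal multinomial classifier with add-one (Laplace) smoothing. It is a sketch under stated assumptions, not the setup of [8]: the four training sentences are invented, tokenization is whitespace splitting, and words unseen during training are ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (text, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, count in label_counts.items():
        logp = math.log(count / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Invented training data, for illustration only.
model = train_nb([
    ("i feel so happy today", "HAPPINESS"),
    ("what a wonderful happy day", "HAPPINESS"),
    ("i am sad and lonely", "SADNESS"),
    ("this is a sad gloomy day", "SADNESS"),
])
print(predict_nb(model, "happy day"))       # HAPPINESS
print(predict_nb(model, "sad and lonely"))  # SADNESS
```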
Studies that use supervised learning algorithms can build emotion models with both categorical and dimensional approaches. To detect the writer’s emotion class, Balabantaray [19] provided a classification model based on a multi-class SVM that makes decisions over Ekman’s [1] basic emotions. Roberts [20] adds LOVE to Ekman’s six fundamental emotions and employs a series of binary SVM classifiers to recognize each of the seven emotions. Suttles [21] categorizes emotions according to the eight fundamental bipolar emotions given by Plutchik, which allows emotion recognition to be treated as a set of binary problems rather than a single multi-class problem. Strapparava [8] devised a framework that employs several variants of Latent Semantic Analysis to identify emotions when a text contains no affective words. However, the technique is inaccurate because it is not context-sensitive and does not analyze the meaning of the sentence.
Burget [22] proposed a system that relies heavily on pre-processing the incoming data (Czech newspaper headlines) and labelling it with a classifier. Pre-processing was done at both the sentence and word levels, using POS tagging, tokenization, and stop-word removal. Term frequency-inverse document frequency (TF-IDF) was used to compute the importance of each term for each emotion class. They achieved a prediction performance of 80% on 1000 Czech news headlines using SVM and tenfold cross-validation. Their approach, however, was not validated on an English dataset. Also, because it considers only emotional keywords as features, it is not context-sensitive.
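A term-by-class TF-IDF weighting can be sketched by treating all tokens of one emotion class as a single document. This is one plausible reading, not necessarily the exact formulation used in [22]; the unsmoothed logarithmic IDF, the function name, and the toy tokens are assumptions.

```python
import math
from collections import Counter


def tf_idf(class_docs):
    """class_docs: {emotion: list of tokens}. Returns {emotion: {term: weight}}."""
    n_classes = len(class_docs)
    # Document frequency: in how many classes does each term occur?
    df = Counter()
    for tokens in class_docs.values():
        df.update(set(tokens))
    weights = {}
    for emotion, tokens in class_docs.items():
        tf = Counter(tokens)
        weights[emotion] = {
            term: (count / len(tokens)) * math.log(n_classes / df[term])
            for term, count in tf.items()
        }
    return weights


# Toy tokens, for illustration only.
weights = tf_idf({
    "JOY": ["win", "win", "team"],
    "FEAR": ["storm", "team", "storm"],
})
print(weights["JOY"])
```

A term that occurs in every class, such as "team" here, receives weight zero, while a term concentrated in one class receives a positive weight.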
Dung [23] based his approach on the idea that emotions are connected to cognitive processes triggered by emotional events. That is, when a specific event occurs, the human mind moves from one mental state to another. They implemented this notion with a Hidden Markov Model (HMM), in which each sentence is made up of several sub-ideas, each regarded as an event that causes a state change. The algorithm achieved an F-score of 35% on the ISEAR dataset [24], with a best precision of 47%. The system’s poor accuracy was due to its lack of semantic and syntactic analysis of the sentence, which makes it non-context-sensitive.
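The idea of events driving transitions between mental states can be illustrated with standard Viterbi decoding on a first-order HMM. Note that [23] uses a high-order HMM, so this is a simplified sketch of the general technique rather than their model; the two states, the three events, and all probabilities below are invented.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(row)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical model: events in a sentence move the mind between two states.
states = ["CALM", "FEAR"]
start_p = {"CALM": 0.8, "FEAR": 0.2}
trans_p = {"CALM": {"CALM": 0.7, "FEAR": 0.3},
           "FEAR": {"CALM": 0.4, "FEAR": 0.6}}
emit_p = {"CALM": {"walk": 0.7, "noise": 0.2, "scream": 0.1},
          "FEAR": {"walk": 0.1, "noise": 0.4, "scream": 0.5}}

print(viterbi(["walk", "noise", "scream"], states, start_p, trans_p, emit_p))
# ['CALM', 'FEAR', 'FEAR']
```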
4 Hybrid Approach
Hybrid approaches detect emotions by combining emotion keywords and machine learning features learned from training datasets with knowledge from other fields, such as human psychology [25]. Little research has addressed the difficulty of extracting emotions from text that lacks emotion keywords [25,26,27,28].
Wu and others [25] suggested a technique for sentence-level emotion mining based on identifying predefined semantic labels and sentence attributes, and then classifying a single emotion, happiness, related to biological patterns of human emotions. The method becomes ambiguous when one EGR includes multiple emotions.
By establishing a mutual action histogram between two entities, Cheng-Yu and others [26] accomplished event-level textual emotion detection. Each column of the histogram represents the degree to which the two entities share an action (verb). They achieved an F-score of 75% when tested on four emotions. However, their technique ignores the content of the sentence and relies largely on the structure of the training data, such as the grammatical type of the sentences and the frequency of emotions for a given subject. Furthermore, only four of the six Ekman emotions are used in the classification.
Chaumartin [27] created UPAR7, a framework of linguistic rules built on the lexical resources SentiWordNet [29], WordNet-Affect [17], and WordNet [30]. The system uses a dependency graph derived with the Stanford POS tagger [31], with the root node of the graph serving as the main subject. Each word in the sentence is rated separately for each emotion. Because the main subject (the major word) is more important than the other words in the sentence, it receives a higher weight. The method’s best accuracy was 30% for the six emotions of the Ekman model. In addition to its low accuracy, the method is not context-sensitive and lacks a global understanding of the sentence.
For text-based emotion recognition, BERT is the most studied transformer-based model, and the research suggests that BERT variants be investigated further for detecting emotions in textual data [13]. A bidirectional LSTM with word embeddings and an annotated corpus was proposed by [32]. Its first stage applies preprocessing to the input data: removing excessive spaces and incorrect characters, resolving character encoding, and correcting spelling.
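A preprocessing stage of this kind can be sketched with the Python standard library. The choice of NFKC normalization and the definition of "incorrect characters" as non-printable characters are assumptions made for illustration; spelling correction is omitted because it requires an external dictionary. This is not the pipeline of [32].

```python
import re
import unicodedata


def preprocess(text):
    """Normalize encoding, drop non-printable characters, collapse spaces."""
    # Normalize Unicode so that equivalent characters share one encoding.
    text = unicodedata.normalize("NFKC", text)
    # Drop control and other non-printable characters, but keep whitespace.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("  I   am\tso happy\x00  today "))  # I am so happy today
```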
Yang [33] suggested a machine-learning-based emotion classification system that combines CRF-based (conditional random field) emotion trigger identification, phrase-based detection, and SVM, Naive Bayes, and Maximum Entropy classification. On a dataset of suicide notes, the system achieved an F-score of 61%, precision of 58%, and recall of 64%. The strategy produced reasonable results, but neither the classifier nor the dataset has been published.
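Assuming the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall), the three figures above are mutually consistent, as the following short check shows. The function name and the beta parameter are illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision 58% and recall 64% give an F1 of about 61%.
print(round(f_score(0.58, 0.64), 2))  # 0.61
```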
5 Dataset
After settling on the model used to represent emotions, the next essential stage in recognizing emotions from text is gathering data relevant to the task. A few structured, annotated datasets for emotion detection are freely available for research purposes. This section describes the most important publicly accessible datasets; Table 1 lists them together with their characteristics and the emotion models they reflect.
5.1 ISEAR Dataset
The International Survey on Emotion Antecedents and Reactions (ISEAR) project, headed by Harald Wallbott and Klaus R. Scherer, gathered data through a large group of psychologists around the globe, using seven emotion labels (joy, sadness, fear, anger, guilt, disgust, and shame). The data were obtained through a questionnaire answered by 3000 respondents from 37 countries on 5 continents [34] (Table 2).
5.2 Emobank
EmoBank is a corpus of over 10,000 sentences from various genres, labelled with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) scheme. EmoBank stands out because it is not only bi-representational but also bi-perspectival in design [35]: it captures both the reader’s and the writer’s emotions. The automatic mapping between categorical and dimensional representations makes the dataset practical and efficient to use with machine learning techniques.
5.3 SemEval-2017 Task 4
The Semantic Evaluations (SemEval) database contains Arabic and English news headlines from different sources. The task benefits the sentiment analysis community by making available a large, publicly accessible benchmark dataset, with over 70,000 tweets in two languages, that researchers can use to examine their approaches and compare them with existing ones [36].
5.4 WASSA-2017 Emotion Intensities (EmoInt)
To estimate emotion intensities in tweets, researchers used data from the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Training and test data were provided for four emotions (fear, joy, anger, and sadness). For instance, each tweet expressing anger is paired with a value between 0 and 1 that describes the intensity of the user’s anger [37].
5.5 Emotion-Stimulus Dataset
The Emotion-Stimulus dataset was created by Ghazi et al. in 2015; both the emotion and the stimulus were annotated using FrameNet’s emotion-directed frames. It has categories for happiness, sadness, anger, fear, surprise, disgust, and shame. The XML file contains 2414 sentences: 820 with both a cause and an emotion tag, and 1594 with only an emotion tag. The dataset was carefully built and is well formed.
6 Comparison
Each of these approaches is appropriate in certain settings. Most experiments and papers have shown that hybrid methods perform comparatively better than learning-based or rule-based methods alone. Beyond these approaches, Poonam [34] suggested a method that seems to perform better than previous ones. Their results are given below, with each row taken from their results (Table 3).
Based on the research findings of the experiment, a graphical comparison of all significant approaches to emotion recognition is shown below (Fig. 4).
7 Limitations and Future Scope
Both dimensional and discrete models are used in current text-based emotion recognition (TER) tasks. However, universal annotation standards should be established, as their absence causes incompatibilities among existing techniques. For example, SemEval [36] is annotated with Ekman’s six basic emotions, whereas WASSA-2017 (EmoInt) is annotated with four emotions: fear, joy, anger, and sadness [37]. The variety of proposed emotion labels hinders the compatibility of cross-corpus resources. This limitation (incompatible annotation) contributes to a lack of data for training and testing [14] (Table 4).
In future work, resolving this incompatibility between emotion labelling models will improve the integration of existing corpus resources. We strongly encourage future researchers to design substantial large-scale datasets that also include fine-grained emotions, which can help train models effectively. In addition, efficient generative deep learning models are likely to be more effective solutions when large datasets are available.
8 Conclusion
In this survey, we discussed three main approaches to classifying or recognizing emotion from text data at a fine-grained level. We reviewed the techniques and concluded that, among the presented approaches, the hybrid method is a sound choice when the community or team has enough training data as well as strong keyword resources for the rule-based component. Among the learning approaches, several researchers have observed that support vector machines (SVM), Naïve Bayes, and KNN deliver impressive results. Overall, a classifier becomes context-sensitive through syntactic and semantic analysis of the sentence, and the use of ConceptNet and WordNet helps the algorithm characterize the training dataset, resulting in improved coverage of emotion rules. We also considered a wide range of datasets, which can be helpful when training one’s own model or analyzing the data. The survey indicates that, rather than assigning an individual emotional score to each word, examining the relationships between the terms of a sentence can lead to greater accuracy.
References
Ekman P (1999) Basic emotions. In: Handbook of cognition and emotion, pp 45–60; Francisco V, Gervás P (2013) EmoTag: an approach to automated mark-up of emotions in texts. Comput Intell 29(4):680–721.
Sebe N, Cohen I, Gevers T, Huang TS (2005) Multimodal approaches for emotion recognition: a survey. Proceedings of SPIE—the international society for optical engineering, vol 5670, 08, pp 56–67. https://doi.org/10.1117/12.600746
Calvo RA, Kim SM (2013) Emotions in text: dimensional and categorical models. Comput Intell 29(3)
Roth D, Cumby C, Carlson A, Rosen J (1999) The SNoW learning architecture. Technical report, UIUC Computer Science Department; Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178
Asghar MZ, Khan A, Ahmad S, Qasim M, Khan IA (2017) Lexicon-enhanced sentiment analysis framework using rule-based classification scheme. PLoS ONE 12(2):e0171649. https://doi.org/10.1371/journal.pone.0171649
Acheampong FA, Wenyu C, Nunoo-Mensah H (2020) Text-based emotion detection: advances, challenges, and opportunities. Eng Rep 2:e12189. https://doi.org/10.1002/eng2.12189
Kao E, Liu CC, Yang T-H, Hsieh C-T, Soo V-W (2009) Towards text-based emotion detection: a survey and possible improvements. Proceedings—2009 international conference on information management and engineering, ICIME 2009, pp 70–74. https://doi.org/10.1109/ICIME.2009.113
Strapparava C, Mihalcea R (2008) Learning to identify emotions in text. In: Proceedings of the ACM symposium on applied computing, pp 1556–1560. https://doi.org/10.1145/1363686.1364052
Seal D, Roy UK, Basak R (2020) Sentence-level emotion detection from text based on semantic rules. In: Tuba M, Akashe S, Joshi A (eds) Information and communication technology for sustainable development. Advances in intelligent systems and computing, vol 933. Springer, Singapore. https://doi.org/10.1007/978-981-13-7166-0_42
Shivhare SN, Khethawat S (2012) Emotion detection from text. Comput Sci Inf Technol 2. https://doi.org/10.5121/csit.2012.2237
Chopade R (June 2015) Text based emotion recognition: a survey. Int J Sci Res (IJSR) 4(6):409–414. https://www.ijsr.net/search_index_results_paperid.php?id=SUB155271
Nandwani P, Verma R (2021) A review on sentiment analysis and emotion detection from text. Soc Netw Anal Min 11:81. https://doi.org/10.1007/s13278-021-00776-6
Acheampong FA, Nunoo-Mensah H, Chen W (2021) Transformer models for text-based emotion detection: a review of BERT-based approaches. Artif Intell Rev 54:5789–5829. https://doi.org/10.1007/s10462-021-09958-2
Deng J, Ren F. A survey of textual emotion recognition and its challenges. In: IEEE transactions on affective computing. https://doi.org/10.1109/TAFFC.2021.3053275
Wang X, Zheng Q (2013) Text emotion classification research based on improved latent semantic analysis algorithm. https://doi.org/10.2991/iccsee.2013.55
Acheampong FA, Wenyu C, Nunoo-Mensah H (28 May 2020) Text-based emotion detection: advances, challenges, and opportunities. Wiley Online Library
Bishop CM (2006) Pattern recognition and machine learning. Springer; Bradley MM, Lang PJ (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report, The Center for Research in Psychophysiology, University of Florida
Alm CO, Roth D, Sproat R (2005) Emotions from text: machine learning for text-based emotion prediction. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, pp 579–586
Balabantaray RC, Mohammad M, Sharma N (2012) Multi-class Twitter emotion classification: a new approach. Int J Appl Inf Syst (IJAIS) 4(1):48–53
Roberts K, Roach MA, Johnson J, Guthrie J, Harabagiu SM (2012) EmpaTweet: annotating and detecting emotions on Twitter. In: Calzolari N (Conference Chair), Choukri K, Declerck T, Doğan MU, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12). European Language Resources Association (ELRA)
Suttles J, Ide N (2013) Distant supervision for emotion classification with discrete binary values. In: Gelbukh A (ed) Computational linguistics and intelligent text processing. Lecture notes in computer science, vol 7817. Springer, Berlin Heidelberg, pp 121–136
Burget R, Karasek J, Smekal Z (2011) Recognition of emotions in Czech newspaper headlines. Radioengineering 20(1):39–47
Ho DT, Cao TH (2012) A high-order hidden markov model for emotion detection from textual data. In: Richards D, Kang BH (eds) Knowledge management and acquisition for intelligent systems. PKAW 2012. Lecture notes in computer science, vol 7457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32541-0_8
Scherer KR, Wallbott HG (Feb 1994) Evidence for universality and cultural variation of differential emotion response patterning. J Pers Soc Psychol 66(2):310–28. https://doi.org/10.1037//0022-3514.66.2.310; (Jul 1994) Erratum in: J Pers Soc Psychol 67(1):55. PMID: 8195988
Chung-Hsien W, Chuang Z-J, Lin Y-C (2006) Emotion recognition from text using semantic labels and separable mixture models. ACM Trans Asian Lang Inf Process (TALIP) 5(2):165–183
Cheng-Yu L et al (2010) Automatic event-level textual emotion sensing using mutual action histogram between entities. Expert Syst Appl 37(2):1643–1653
Chaumartin F-R (2007) UPAR7: a knowledge-based system for headline sentiment tagging. Proceedings of the 4th international workshop on semantic evaluations. Association for computational Linguistics
Suhasini M, Srinivasu B (2020) Emotion detection framework for Twitter data using supervised classifiers. New York, NY, Springer, pp 565–576
Esuli A, Sebastiani F (2006) Sentiwordnet: a publicly available lexical resource for opinion mining. Proceedings of LREC, vol 6
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Toutanova K, Klein D, Manning C, Singer Y (2003) Stanford POS tagger [Online]. Available: http://nlp.stanford.edu/software/tagger.shtml
Rashid U, Iqbal MW, Skiandar MA, Raiz MQ, Naqvi MR, Shahzad SK (2020) Emotion detection of contextual text using deep learning. 2020 4th International symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–5. https://doi.org/10.1109/ISMSIT50672.2020.9255279
Yang H et al (2012) A hybrid model for automatic emotion recognition in suicide notes. Biomed Inf Insights 5(Suppl 1):17
Arya P, Jain S (May–June 2018) Text-based emotion detection. IJCET 9(9)
Scherer KR, Wallbott HG (1994) Evidence for universality and cultural variation of differential emotion response patterning. J Pers Soc Psychol 66(2):310
Buechel S, Hahn U (2017) Readers vs. writers vs. texts: coping with different perspectives of text understanding in emotion annotation. In: Proceedings of the 11th linguistic annotation workshop, pp 1–12
Rosenthal S, Farra N, Nakov P (2019) SemEval-2017 task 4: sentiment analysis in Twitter. arXiv preprint arXiv:1912.00741
Mohammad SM, Bravo-Marquez F (2017) WASSA-2017 shared task on emotion intensity. arXiv preprint arXiv:1708.03700
Ahmad Z, Jindal R, Ekbal A, Bhattachharyya P (2020) Borrow from rich cousin: transfer learning for emotion detection using cross lingual embedding. Expert Syst Appl 139:112851
Huang C, Trabelsi A, Zaïane OR (2019) ANA at SemEval-2019 Task 3: contextual emotion detection in conversations through hierarchical LSTMs and BERT. arXiv preprint arXiv:1904.00132
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shah, A., Chopade, M., Patel, P., Patel, P. (2022). Survey: Emotion Recognition from Text Using Different Approaches. In: Singh, P.K., Wierzchoń, S.T., Chhabra, J.K., Tanwar, S. (eds) Futuristic Trends in Networks and Computing Technologies . Lecture Notes in Electrical Engineering, vol 936. Springer, Singapore. https://doi.org/10.1007/978-981-19-5037-7_31
DOI: https://doi.org/10.1007/978-981-19-5037-7_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-5036-0
Online ISBN: 978-981-19-5037-7
eBook Packages: Computer Science, Computer Science (R0)