1 Introduction

With the rise of opinion-rich content in social media, such as Twitter, Sina Microblog, Facebook and WeChat, there is an increasing desire for individuals to express their feelings, mood, and emotions rather than just browsing and receiving information. These emotions, expressed in sentences, contain useful information for understanding public opinion, opinion mining, business decisions, mood management, and information forecasting [5]. Unlike mature sentiment lexica, which are labeled with sentiment polarity, an emotion lexicon should contain multiple emotions with different intensities [21].

Analyzing and summarizing the emotions in texts has become an area of focus for many studies. An emotion lexicon, which reflects the unstructured characteristics of texts, plays an important role in emotion analysis and provides the advantage of high speed and easy understanding. In addition, it is also a fundamental task for affective modification and mood management, which are essential for research institutions, information consulting organizations, and government decision-making departments.

Much work has been done on emotion lexicon construction [15, 16, 17, 20, 21, 23]. Xu et al. [21] manually labeled a large Chinese lexicon with part-of-speech (POS), emotion, and intensity. In addition, Staiano et al. [17] and Song et al. [16] used a crowd-based method for emotion lexicon construction. Song et al. [15] used seed words and emoticons together to build an emotion lexicon. However, these lexica all focus on primary emotions (i.e., happiness, sadness, like, anger, disgust, fear, and surprise), which are not enough to represent the emotions of our daily lives.

The emotions of humans are complex, and many emotions are derived from two or more primary emotions; these are called compound emotions in the field of psychology. Compound emotions express the complex relationship between people and objective things and are widely used in our daily lives. Consider, as an example, a Chinese microblog regarding the 2014 Malaysia Airlines event (which translates to: “Do you suffer from Malaysia airlines anxiety disorders? Yes, I’m keeping track of Malaysia airlines and eager to know about the result”). Both fear and anticipation are expressed in this microblog, and thus anxiety, a compound emotion, can be directly used to make a detailed analysis. Learning about compound emotions can allow us to accurately understand the tendency of hot social issues. In another context, compound emotions can also be useful in testing for depression. There are many different types of depression, and each type has its own emotional characteristics. Anxiety is a major symptom of menopausal depression [14]; other kinds of depression, such as postpartum depression (guilt and inferiority) [2] and major depressive disorder (sadness, anxiety, guilt, sentimentality, pessimism, and in the worst case, insomnia or even suicide attempts), each have their own characteristic compound emotions. Effectively identifying compound emotions can provide a foundation for detecting and predicting depression. Therefore, it is necessary to build a compound emotion lexicon, which can be used as a fundamental tool for sophisticated applications of emotion analysis.

In this paper, we construct a novel compound emotion lexicon, called EmoMix, for complex emotion analysis. Specifically, the candidate words are mapped into an emotion space that is built based on Plutchik’s emotion wheel [11]. Then a cascade clustering algorithm is used to tag the candidate words with compound emotion labels. The major contributions of this work are as follows:

  • We propose a novel method for building an emotion lexicon to meet the demands of compound emotion analysis. To the best of our knowledge, EmoMix is the first method for creating a lexicon based on the psychological theory of compound emotion.

  • We propose a unified emotion labeling scheme based on a cascade clustering algorithm to address the requirements of compound emotions. Our method uses an emotion space to help find the most relevant emotion words. Although we built a Chinese lexicon as an example, the building method is language independent.

  • We conduct experiments on lexicon quality, using state-of-the-art emotion lexica for evaluation, and show that EmoMix is competent in the emotion classification task. Additionally, the results of case studies demonstrate that EmoMix outperforms conventional emotion lexica in compound emotion analysis.

The rest of this paper is organized as follows: In Sect. 2, we review the related work. Section 3 presents some preliminaries. Section 4 describes the details of our proposed EmoMix lexicon. Section 5 presents performance evaluation results. Finally, Sect. 6 concludes the paper.

2 Related Work

This paper involves two research focuses: emotion models and emotion lexicon construction.

Emotion Models. In general, psychologists classify emotion models into two categories, discrete [4, 18] or dimensional models [11, 13]. Discrete models introduce the basic cognition of human emotions from psychologists. In previous work, Tomkins [18] classified human emotions into eight primary emotions: surprise, interest, joy, rage, fear, disgust, shame, and anguish. Subsequently, Ekman [4] put forward another primary emotion model with six emotions, namely, anger, disgust, fear, happiness, sadness, and surprise. However, it is difficult for these models to describe complex emotions within the constraints of the number of emotions and their independence characteristic. Human emotions are considered to lie in two or three dimensions in the dimensional models. Plutchik offered a 3-dimensional model, called the wheel of emotions [11], which contains primary emotions, intensity dimensions, and compound emotions. Among them, compound emotions are acquired by adding two primary emotions together.

It is widely accepted that human expressions cannot be attributed to only primary emotions. We built a compound emotion lexicon based on Plutchik’s wheel of emotions, which is more suitable for the variety of human expressions.

Emotion Lexicon Construction. An emotion lexicon plays an important role in opinion mining and emotion analysis. Unlike a sentiment lexicon, an emotion lexicon has more granular classifications. Existing emotion lexicon construction methods can be roughly classified into manual and automated labeling. Xu et al. [21] constructed an affective lexicon ontology (ALO) in Chinese, which leveraged sentiment lexica and WordNet to obtain candidate words. They manually labeled the candidate words with emotional categories and intensity, and divided them into seven primary categories and 21 subclasses. Then, Staiano et al. [17] presented a crowd-sourced approach, DepecheMood (DPM), for emotion lexicon generation by combining the document-frequency distributions of words and the emotion distributions over documents. This method used the score in each emotion dimension to represent the strength of the emotion. As an extension to their work, Song et al. [16] analyzed the emotions of Rappler news articles at the topic level. By developing a non-negative matrix factorization model, a fine-grained emotion lexicon was built that associated words with emotions based on the hidden topics obtained from the factorization process. An approach based on manual labeling is labor-intensive, slow, and costly; therefore, researchers have focused more attention on automatic lexicon construction.

Xu et al. [20] proposed an automatic method for building a Chinese emotion lexicon by using WordNet-Affect. Their method translated WordNet-Affect into Chinese and filtered all non-emotion words. Then, synonym words were expanded to obtain the Chinese emotion lexicon. To address domain dependence, Yang et al. [23] proposed an Emotion-aware LDA (EaLDA) model to automatically build a domain-specific lexicon with six primary emotions. To solve the data scarcity problem, Song et al. [15] adopted a multi-label random walk algorithm based on a three-layer heterogeneous graph to build the emotion lexicon. They combined the effects of seed words and emoticons co-occurring with candidate words to capture the fine-grained emotion of candidate words. However, these lexica ignore the compound emotion at the word level, which results in a loss of valuable information for emotion analysis in complex situations.

In this paper, we introduce a cascade clustering algorithm into lexicon construction, and generate a compound lexicon with advantages as shown in Table 1.

Table 1. A comparison of emotion lexicons

3 Preliminaries

3.1 Plutchik’s Wheel of Emotions and Compound Emotion

The psychologist Robert Plutchik created the wheel of emotions [11], which is widely accepted for illustrating the various relationships among the emotions. As shown in Fig. 1, there are three important concepts in his theory: (1) Primary Emotions. There are eight primary emotions that are indecomposable (joy, trust, fear, surprise, sadness, anticipation, anger, and disgust). (2) Emotion Intensity. Each primary emotion has different degrees of intensity, which range from very light to very intense. (3) Compound Emotions. All emotions other than primary emotions occur as a result of a combination of the primary emotions. Recently, some work in natural language processing (NLP) has started to pay attention to this theory; however, most of the focus is on primary emotions [3] and emotional intensity [1], which do not fully express the complex feelings of users. If primary emotion lexica were used to analyze the 2014 Malaysia Airlines event in Sect. 1, one might only obtain the anticipation component of the compound emotion “anxiety” and miss the fear component.

Fig. 1. Plutchik’s emotion wheel

Table 2. The extension of compound emotions
Table 3. Chinese microblog examples

Different from existing work, to meet the demand of complex emotion analysis, this study focuses on the third concept, “Compound Emotions.” To build such a compound emotion lexicon, the combinations of two adjacent primary emotions are not enough. Following the definition of “primary dyad” in Plutchik’s theory (the combinations of two adjacent primary emotions), Turner, another psychologist, extended the theory to the rest of the compound emotions [19]. He defined “secondary dyad” emotions (mixed emotions that are one step apart on the wheel) and “tertiary dyad” emotions (emotions that are two steps apart on the wheel). Additionally, he pointed out that the emotions on opposite sides of the wheel are in conflict and cannot generate compound emotions. As shown in Table 2, the extended compound emotions also appear as pairs of opposites.
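The dyad definitions above can be illustrated with a minimal sketch. It assumes the usual circular ordering of the eight primary emotions on Plutchik’s wheel (the list `WHEEL` below); the function names are hypothetical and are not part of the EmoMix method. Under this ordering, fear and anticipation (the two components of “anxiety” in the example of Sect. 1) come out as a tertiary dyad.

```python
# Circular order of the eight primary emotions on the wheel (an assumption
# of this sketch; opposite emotions sit four positions apart).
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]


def wheel_distance(a, b):
    """Number of positions between two primary emotions along the wheel (0..4)."""
    diff = abs(WHEEL.index(a) - WHEEL.index(b))
    return min(diff, len(WHEEL) - diff)


def dyad_type(a, b):
    """Classify a pair of primary emotions by their separation on the wheel."""
    names = {
        0: "same",
        1: "primary",    # adjacent emotions
        2: "secondary",  # one emotion in between
        3: "tertiary",   # two emotions in between
        4: "opposite",   # conflicting pair, no compound emotion
    }
    return names[wheel_distance(a, b)]


print(dyad_type("fear", "anticipation"))  # tertiary
print(dyad_type("joy", "sadness"))        # opposite
```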

The above-mentioned research has provided the theoretical foundation for building a compound emotion lexicon. In this work, the rules of compound emotions and opposite emotions function as theoretical guides for emotion lexicon construction.

3.2 Motivation and Basic Idea

This work is motivated by a set of observations from the microblogs of users who suffered from depression. We collected over 1,600 microblogs from 57 users who committed suicide due to depression, as shown in Table 3. For example, consider a Chinese microblog (which translates to: “I am suffering from depression, thus I try to die. There is no important reason. Don’t care about my departure. Bye.”), which expresses pessimism and despair. It is very difficult to express these feelings by any single primary emotion. As bloggers’ delicate emotions move far beyond primary classes and are much more complicated, a fundamental tool for compound emotion analysis is urgently required.

It is a common phenomenon in our daily lives that our expressions are often complex and overlapping. More than one primary emotion can even be expressed in a single word. After statistical analysis of these microblogs, we find that there are over 1,800 compound emotion words, accounting for 63.7% of the total emotional words. This implies that a primary classification gives only limited information on the actual intentions of users’ expressions. However, most existing Chinese lexica classify each emotional word into a single class out of six to eight primary emotion categories. A few works [21] can be found where emotional words are labeled with multiple primary emotions; however, such words are only a tiny proportion (14.3%) of the total number of words in the lexicon. Based on these observations from microblogs, the idea behind EmoMix is to explore new methods for building a compound lexicon that can provide a foundational resource in emotion applications. Two primary emotions can evolve into a new emotion, and the new emotion category is no longer a part of the primary categories. Each emotion, whether primary or compound, can combine with negation words to convert it to the opposite emotion. With such a compound emotion lexicon, many personalized applications could be provided that are difficult to accomplish using only a primary emotion lexicon. In addition to emotion management and depression detection, a compound emotion lexicon could be applied to recommend products, track hotspots, and forecast trends.

Above all, our goal is to build a compound emotion lexicon with the following characteristics: (1) The lexicon should be in accordance with modern psychological theory. (2) The lexicon should have a computational model of emotions. (3) The lexicon should directly reflect the compound emotion category of each word. To meet these design principles, we follow the steps described below to build the lexicon. First, we select a domain-independent and widely applicable corpus to train word vectors. Word embeddings trained on different corpora are compared, and we choose the best fit for our work. Second, we construct an emotion space based on Plutchik’s wheel of emotions and map the emotional words into the emotion space. The emotion space is built to refine the candidate words so that they are closer to both semantically and emotionally similar words and further away from emotionally dissimilar words. Third, similar words are grouped through cluster analysis. According to the position of each word in the emotion vector space, the emotional words are divided into different compound emotion categories. In this way, a lexicon is obtained that has compound emotions at the word level.

4 Emotion Lexicon Construction

4.1 Dataset

To build a compound emotion lexicon, we gather different kinds of resources, such as dictionaries (Chinese synonym dictionary, Contemporary Chinese Dictionary, and NTUSD) and semantic networks (HowNet and WordNet). From these resources, candidate words are selected that are related to emotion, such as psychology, feeling, affection, character, and attitude. The unlabeled corpora of different domains (Microblogs, News, Literature, Encyclopedias, etc.) are compared to train the word embeddings, and Baidubaike (an online Chinese encyclopedia) is chosen for our work due to its wide coverage and accordance with the rules of NLP. We use SGNS (skip-gram model with negative sampling) [9] to train word embeddings on Baidubaike [8], which includes 0.745G word tokens. We train 300-dimensional word vectors with n-gram features from the corpus.

4.2 The Emotion Space

Generally, existing word embeddings can capture semantic and syntactic information from an unlabeled corpus, but they cannot acquire sufficient emotion information, which makes it difficult to apply the word embeddings directly to emotion analysis. Therefore, we propose an Emotion Space to refine the word embeddings. The emotion space transfers semantic similarity into emotion similarity. As discussed in Sect. 3.1, one compound word can be composed of multiple primary emotions. Each of these primary emotions and its opposite on Plutchik’s wheel form an emotion pair. These emotion pairs can be regarded as axes to build a 4-dimensional emotion space. To better represent the compound emotion character of words, the cosine similarity between candidate words and primary emotions is computed, and then these words are mapped into the emotion space.

Formally, the set of candidate words is denoted as C, and the set of seed words of the primary emotions, which are used to predict the emotion similarity, is denoted as S. The i-th primary emotion pair is written as \(\langle s^+_i, s^-_i\rangle \in S\), where \(i \in \{1,2,3,4\}\) is the index of the primary emotion pair. For example, \(\langle s^+_1, s^-_1\rangle \) represents the positive and negative ends of one axis of the emotion space. For each candidate word \(c^* \in C\), 3COSMUL [7] is used to calculate the similarity between the word embedding of the candidate word and the seed words of each of the four emotion pairs. The cosine similarity is defined as \(\cos (u,v)= \frac{u\cdot v}{\Vert u\Vert \Vert v\Vert }\). Thus, we obtain the emotion similarity \(e_i\), which represents how close the word is to a primary emotion and how far it is from its opposite.

$$\begin{aligned} e_i = \underset{c^* \in C}{\arg \max } \frac{\prod \limits _{j = 1}^m \cos (c^*, s^+_{i,j})}{\prod \limits _{k = 1}^n \cos (c^*, s^-_{i,k}) + \varepsilon }, \end{aligned}$$
(1)

where \(\varepsilon = 0.001\) is used to prevent division by zero, m is the number of seed words in \(s^+_i\), and n is the number of seed words in \(s^-_i\).

After calculating the four emotion similarities with each emotion pair, we can express the emotion vector of the candidate word \(c^* \in C\) as:

$$\begin{aligned} E_{c^*} = (e_1,e_2,e_3,e_4), \end{aligned}$$
(2)

When the candidate word is more similar to the opposite emotion, the value of the emotion similarity \(e_i\) will be below zero. If each opposite emotion pair is regarded as the two extremes of a single axis, then all candidate words can be mapped into the emotion space by using \(E_{c^*}\) as the coordinate values. As shown in Fig. 2, there are four quadrants representing different compound emotions in the hyperplane composed of every two emotion axes.
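As a rough illustration of Eqs. (1) and (2), the following sketch evaluates the ratio in Eq. (1) for a single candidate word and stacks the four values into an emotion vector. It is a minimal sketch under stated assumptions, not the authors’ implementation: the ratio is computed per candidate word (the \(\arg \max \) is not applied), raw cosines are used exactly as written in Eq. (1), and the function names and the two-dimensional toy vectors are hypothetical.

```python
import math


def cos(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def emotion_similarity(word, pos_seeds, neg_seeds, eps=0.001):
    """Ratio of Eq. (1): product of cosines to the positive seed words over
    the product of cosines to the negative seed words plus eps."""
    numerator = math.prod(cos(word, s) for s in pos_seeds)
    denominator = math.prod(cos(word, s) for s in neg_seeds) + eps
    return numerator / denominator


def emotion_vector(word, seed_pairs):
    """Eq. (2): one emotion similarity per emotion pair, in axis order."""
    return tuple(emotion_similarity(word, pos, neg) for pos, neg in seed_pairs)


# Toy data: one positive and one negative seed vector per axis (hypothetical).
word = [1.0, 0.0]
pair = ([[1.0, 0.0]], [[0.0, 1.0]])  # (positive seeds, negative seeds)
print(emotion_vector(word, [pair] * 4))
```

In practice the vectors would be the 300-dimensional embeddings of Sect. 4.1, and each axis would use its own seed words.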

Fig. 2. The hyperplane of the emotion axes (a): Joy-sadness and trust-disgust; (b): Joy-sadness and anticipation-surprise

4.3 Clustering Algorithm for Building the Lexicon

In this section, a cascade clustering algorithm is used to build the compound emotion lexicon. Since the emotion similarities are calculated from different seed words, the values on different axes are not on the same scale. Therefore, we cannot simply rank the four axis values and select the top two to mix into a compound emotion. To obtain more accurate results, a density-based clustering algorithm is leveraged to group candidate words into subclasses naturally. Then, all subclasses are divided into primary or compound emotions with the help of a modified k-means method, as shown in Fig. 3.

Specifically, let \(C = \{ E_{c_1}, E_{c_2}, \cdots , E_{c_n} \}\) be the set of pre-processed emotion vectors corresponding to the candidate words in the emotion space. The measure proposed in [12] is used to find the density peaks of the candidate words. The local density \(\rho _{c_i}\) of the candidate word \(c_i\) is defined as

$$\begin{aligned} \rho _{c_i} = \sum _{j} \chi (d_{c_i,c_j}-d_{cut-off}), \end{aligned}$$
(3)

where \(\chi (x)=1\) if \(x < 0\); otherwise, \(\chi (x)=0\); and \(d_{cut-off}\) is a cutoff distance. A parameter t is used to adjust the size of \(d_{cut-off}\). The minimum distance between the candidate word \(c_i\) and any other word with a higher density can be denoted as follows:

$$\begin{aligned} \delta _{c_i} = \min _{j:\rho _{c_j}>\rho _{c_i}} (d_{c_i,c_j}), \end{aligned}$$
(4)

where \(\delta _{c_i} = \max _{j} (d_{c_i,c_j})\) for the highest density word. By comparing the quantity \(\gamma _{c_i} = \rho _{c_i}\delta _{c_i}\), the density-peak words are identified, each of which is the center of an emotion subclass in \(Sub = \{sub_1, sub_2, \cdots , sub_n\}\), where n is the number of subclasses.
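The quantities \(\rho \), \(\delta \), and \(\gamma \) can be computed directly from pairwise distances. The sketch below is a minimal illustration under assumptions of our own: Euclidean distance between emotion vectors, a cutoff distance passed in directly (rather than derived from the parameter t), and a word not counted as its own neighbor. The function name is hypothetical.

```python
import math


def density_peaks(points, d_cutoff):
    """Return (rho, delta, gamma) for each point.

    rho[i]   : number of other points closer than d_cutoff (local density).
    delta[i] : distance to the nearest point with strictly higher density,
               or the largest distance from i if no such point exists.
    gamma[i] : rho[i] * delta[i]; large values indicate candidate centers.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    gamma = [r * dl for r, dl in zip(rho, delta)]
    return rho, delta, gamma


# Toy 2-D points forming two loose groups (hypothetical data).
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta, gamma = density_peaks(toy, d_cutoff=1.5)
print(rho)  # [2, 2, 2, 1, 1]
```

Ties in density, as in this toy data, leave several points without a higher-density neighbor; a real implementation would need a tie-breaking rule.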

The idea of k-means is used in the next step. Eight typical subclasses (\(k=8\)) are selected as the initial primary emotion classes, denoted as \(Pri = \{pri_1, pri_2, \cdots , pri_8\}\). Then, the similarity between each remaining subclass \(sub_j\), where \(j \in \{1, 2,\cdots , n-8\}\), and each primary emotion class \(pri_i\), where \(i \in \{1, 2, \cdots , 8\}\), is calculated. The similarity function can be presented as follows:

$$\begin{aligned} similarity^{(i)}:=\arg \min _{j}{\Vert x^{(i)}-\mu _j\Vert }^2. \end{aligned}$$
(5)

Specifically, another constraint is added to restrict the k-means method to consolidated emotion subclasses. The distribution of the eight similarities is normalized into the range [0, 1]; the similarities are then ranked, and the ratio of the top two emotions is calculated. Thus, the constraint on the similarity ratio can be defined as:

$$\begin{aligned} \small Constraint= {\left\{ \begin{array}{ll} Consolidated &{} \text{ if } \ ratio_{x,y} \ge \lambda ,\\ Not \ Consolidated &{} \text{ otherwise }. \end{array}\right. } \end{aligned}$$
(6)

If the similarity ratio is at least the threshold \(\lambda \), then the subclass is classified into the nearest primary emotion; otherwise, it retains the labels of its top two primary emotions, which are regarded as a compound emotion. Then, the center of each updated primary emotion class is relocated, and all similarities are recalculated. The consolidation process is repeated until a globally stable state is achieved and all the primary and compound emotions remain unchanged.
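The labeling rule of Eq. (6) for a single subclass can be sketched as follows. This is only an illustration under assumptions that the text does not fix: the similarities are scores where larger means closer, they are normalized by dividing by their sum, and the ratio is the top score divided by the second score. The function name and the toy scores are hypothetical, and the iterative relocation of class centers is omitted.

```python
def label_subclass(similarities, lam):
    """Return one primary label if the top emotion dominates, else two labels.

    similarities : dict mapping primary emotion name -> non-negative score.
    lam          : threshold on the ratio of the top two normalized scores.
    """
    total = sum(similarities.values())
    normalized = {e: s / total for e, s in similarities.items()}
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    (top1, s1), (top2, s2) = ranked[0], ranked[1]
    ratio = s1 / s2 if s2 > 0 else float("inf")
    if ratio >= lam:
        return (top1,)        # consolidated into a primary emotion
    return (top1, top2)       # kept as a compound of two primary emotions


print(label_subclass({"joy": 0.8, "trust": 0.1, "fear": 0.1}, lam=7))
print(label_subclass({"fear": 0.4, "anticipation": 0.35, "joy": 0.25}, lam=7))
```

With the threshold \(\lambda = 7\), the first toy subclass is consolidated into “joy”, while the second keeps both “fear” and “anticipation”.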

Fig. 3. Emotion classification based on clustering

5 Experiments

In this section, we describe the empirical evaluations of the proposed EmoMix. Since there is no existing metric for the quality of compound lexica, its performance is verified in the following ways: (1) we use a standard task of primary emotion analysis to compare EmoMix with traditional lexica on the word and sentence levels; (2) we conduct several case studies of compound emotion analysis to show the potential of EmoMix in different applications.

5.1 Quality of Lexicon on the Word Level

Baseline: To the best of our knowledge, this method is the first method for creating compound emotion lexica. Thus, we can only compare Precision, Recall, and F-measure of the lexica generated through state-of-the-art methods with the lexicon ALO [21]. We choose ALO as a baseline because it is a large scale manually crafted lexicon and it is widely regarded as a standard in Chinese emotion analysis. The Precision P and Recall R are defined as follows:

$$\begin{aligned} \textstyle P=\frac{\sum _{e\in E}|W_{ALO}(e)\cap W_{TestLex}(e)|}{\sum _{e\in E}|W_{TestLex}(e)|}, \end{aligned}$$
(7)
$$\begin{aligned} \textstyle R=\frac{\sum _{e\in E}|W_{ALO}(e)\cap W_{TestLex}(e)|}{\sum _{e\in E}|W_{ALO}(e)|}, \end{aligned}$$
(8)

where E is the set of seven emotions, \(W_{ALO}(e)\) is the word set in emotion e of ALO, \(W_{TestLex}(e)\) is the word set with emotion label e in the testing lexicon, and the F-measure F is defined as \(F=\frac{2\cdot P\cdot R}{P+R}\).
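Equations (7) and (8) and the F-measure translate directly into set operations. The sketch below assumes that each lexicon is given as a mapping from emotion label to a set of words; the function name and the toy lexica are hypothetical.

```python
def lexicon_prf(reference, test, emotions):
    """Precision, Recall, and F-measure of a test lexicon against a reference.

    reference, test : dict mapping emotion label -> set of words.
    emotions        : iterable of emotion labels to evaluate (the set E).
    """
    emotions = list(emotions)
    overlap = sum(len(reference.get(e, set()) & test.get(e, set()))
                  for e in emotions)
    test_total = sum(len(test.get(e, set())) for e in emotions)
    ref_total = sum(len(reference.get(e, set())) for e in emotions)
    p = overlap / test_total if test_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy lexica with two emotion classes (hypothetical words).
alo = {"happiness": {"a", "b", "c"}, "fear": {"d"}}
candidate = {"happiness": {"a", "b"}, "fear": {"d", "e"}}
print(lexicon_prf(alo, candidate, ["happiness", "fear"]))  # (0.75, 0.75, 0.75)
```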

The emotion categories present in ALO do not exactly match the ones provided by Plutchik’s wheel (there are seven main classes in ALO, while there are eight in Plutchik’s). To align with the ALO baseline, EmoMix is adjusted by merging “trust” and “anticipation” into the emotion “like,” and mapping “joy” to “happiness,” as shown in Table 4. For a better comparison, we also use the same seed words in the different lexicon construction methods.

Tuning the Parameter: There are two adjustable parameters in our method: the parameter t, which controls the size of the emotion subclasses, and the threshold \(\lambda \) on the ratio of the top-2 primary emotions. We investigate how the parameter t and threshold \(\lambda \) affect the final result of clustering. Generally, to obtain a sufficient number of emotion subclasses, a smaller parameter t is better; however, if each word is regarded as its own subclass in the extreme case, then the natural distribution of emotion classes is lost. Finally, we set parameter \(t = 0.02\). \(\lambda \) is also an important parameter for determining whether a subclass is a primary or compound emotion, and we chose a threshold \(\lambda = 7\). Therefore, the words in primary emotion classes have one clearly dominant emotion in their emotion distribution.

Lexicon Evaluation: Experiments are conducted on the following emotion lexicon generation methods for comparison: (1) The DPM lexicon [17] is built from a document-by-emotion matrix \(M_{DE}\) and a word-by-emotion matrix \(M_{WE}\). \(M_{DE}\) is built from crowd-sourced emotion data, and \(M_{WE}\) is obtained by applying a compositional semantics method to \(M_{DE}\). (2) The PMI lexicon [10] uses pointwise mutual information (PMI)-based scores and starts from a small set of seed words for emotion lexicon construction. (3) The Lex_yang lexicon [22] is built using a variation of PMI to calculate the collocation strength between a candidate word and an emoticon. Every word entry of the lexicon contains several emotion senses ordered by the collocation strength. (4) The Lex_song lexicon [15] is obtained from a multi-label random walk algorithm that combines the seed words and emoticons to construct the emotion lexicon. (5) The EmoMix_s lexicon is generated as a special case of our method that uses only the emotion space to label words. (6) The EmoMix lexicon is built using the full configuration of our method, which combines the emotion space and the cascade clustering algorithm.

Table 4. Mapping of EmoMix on ALO and Lex_song
Table 5. The quality of lexica on emotions
Fig. 4. Emotion performance of lexica (a): Precision; (b): Recall; (c): F-measure

The results are shown in Table 5. EmoMix has clear advantages over DPM and Lex_yang. Although the Precision of EmoMix is only slightly higher than that of Lex_song, its Recall and F-measure are higher, which means that it can detect and identify more emotion words. The advantage of EmoMix comes from its emotion expression and the natural distribution of emotion classes.

In addition, we further study the performance of each class in the three best lexica, as shown in Fig. 4. The “like” and “disgust” emotions are higher than other emotions, and “fear” is much lower. In fact, the taxonomy of ALO is not very appropriate for emotion analysis. The reason for this is that their classes of “like” and “disgust” are more similar to “positive” and “negative” in sentiment lexica. For example, the subclasses “trust” and “wish” in “like” are not as fine-grained as “like,” and “suspicion” in “disgust” is the same. We expect that EmoMix can offer more elaborate categories in both emotion classes and subclasses for emotion analysis applications.

5.2 Emotion Classification on the Sentence Level

Sentence level experiments are conducted on the public dataset Emotion Analysis in Chinese Weibo Texts (EACWT). This dataset, which was provided for the emotion analysis shared task in NLP&CC2013, has been widely used to verify primary emotion classification. The emotional sentences in EACWT are annotated with emotion labels from anger, disgust, fear, happiness, like, sadness, or surprise (the same as in ALO). In this experiment, we compare EmoMix with the best-performing existing lexicon on the sentence level. Furthermore, our results are compared with those obtained by a supervised learning method, TextCNN [6]. Because the emotion classes are unevenly distributed, class-by-class classification results are provided as well as the overall performance of the experiments.

A straightforward voting-based algorithm is used for each lexicon to assign emotion labels to sentences, in the same manner as in [15]. Note that our main idea is to compare the quality of lexicon building methods rather than to improve the performance of emotion classification on the sentence level; therefore, we only consider negation and degree words in this task. For the same reason, we use a basic CNN algorithm with raw training data. The results for single-label classification are summarized in Table 6. As shown, EmoMix not only outperforms the state-of-the-art lexicon, Lex_song, but also provides a competitive performance with the supervised learning method, TextCNN. TextCNN performs poorly when the data is scarce or imbalanced because the results of supervised learning methods depend closely on the training dataset. It is worth stressing that Lex_song is a domain-specific lexicon of Weibo, while EmoMix is not; therefore, we expect EmoMix to have general applicability in other domains.
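To make the idea of lexicon-based voting concrete, the following sketch shows one simple way such a scheme could work. It is not the algorithm of [15], whose details are not given here; the handling of negation and degree words (only the immediately preceding token is inspected), the weights, and all names and toy entries are assumptions made for illustration.

```python
from collections import Counter


def vote_emotion(tokens, lexicon, negations, degrees, opposites):
    """Assign one emotion label to a tokenized sentence by weighted voting.

    lexicon   : dict word -> emotion label.
    negations : set of negation words (flip the next emotion word).
    degrees   : dict degree word -> weight multiplier for the next emotion word.
    opposites : dict emotion -> opposite emotion, used when a word is negated.
    """
    votes = Counter()
    for i, tok in enumerate(tokens):
        emotion = lexicon.get(tok)
        if emotion is None:
            continue  # not an emotion word, so it casts no vote
        weight = 1.0
        prev = tokens[i - 1] if i > 0 else None
        if prev in degrees:
            weight *= degrees[prev]
        elif prev in negations:
            emotion = opposites.get(emotion, emotion)
        votes[emotion] += weight
    return votes.most_common(1)[0][0] if votes else None


# Toy English example (hypothetical lexicon entries).
lexicon = {"happy": "joy", "sad": "sadness"}
opposites = {"joy": "sadness", "sadness": "joy"}
print(vote_emotion(["not", "happy", "very", "sad"], lexicon,
                   {"not"}, {"very": 2.0}, opposites))  # sadness
```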

Table 6. Sentence level emotion classification

5.3 Case Studies on the Paragraph Level

In this paper, the research question is raised as to whether primary emotion classes can be used to reliably evaluate complex emotions in multiple kinds of applications. Appropriate emotion categories can help reduce classification errors, improve the quality of analysis, and increase the consistency of results. For this reason, we analyze four different kinds of media as case studies: headlines (news), microblogs, blogs, and fairy tales (literature). As shown in Table 7, the different emotions expressed in several types of texts are analyzed.

Table 7. Case studies

Headlines. Headlines are an important kind of text used to attract readers, and they directly affect the subsequent output of news media. Moreover, headlines are not just about the topics; they are also about the emotion expressed regarding the topic. Readers often decide whether to read on according to the emotional orientation of a news headline, and hence, fine-grained emotion analysis cannot be ignored. As an example, the emotion “admiration” is more distinctive than “like,” and applying this study to headlines would help build more personalized recommendation services.

Microblogs. A traditional microblog is limited to 140 characters and is therefore called a short text. Users often express their feelings with one or a few sentences. However, they might not realize that the emotions in microblogs could affect their relationships with friends. Mood management, a burgeoning psychological assistant application, could help users improve their control over emotions. In particular, some special psychological assistant applications, such as the detection of autism and depression, could effectively help users and their families monitor moods and avoid tragedies. During this process, fine-grained emotions can help doctors identify different psychological characteristics and provide more options for treatment or prevention.

Blogs. A blog is another type of writing for self-expression that can be a paragraph or an article. Bloggers have enough space to express their feelings, and thus the emotions in blogs are usually subtle, complex, and multiple. A blogger may have buried an inexpressible feeling or sentimentality at the bottom of her heart; traditional primary emotions are limited by the number of categories, and they would be insufficient for expressing such complex feelings.

Fairy Tales. Literature is a much larger category of text that can be as long as a chapter or a whole document. Authors like to write many foreshadowing plots to highlight protagonists’ emotions. For example, in The Ugly Duckling, after so much suffering, the duckling’s feeling of despair is very difficult to express with only a primary emotion. It is worth mentioning that the emotional analysis of literature has business value in Text-to-Speech (TTS), where it could make machine voices more emotional.

Our findings indicate that compound emotions are indeed valuable in the applications in question. These results pave the way to the practical analysis of emotions in real life with simple and comprehensible analysis benchmarks.

6 Conclusion and Future Work

In this paper, we presented EmoMix, a compound emotion lexicon, which was built using a novel construction method. Based on Plutchik’s wheel of emotions, we first constructed the emotion space by using four pairs of primary emotions and mapped the word embeddings into it. Then, to group the words naturally, a cascade clustering algorithm was applied, and the candidate words were classified into primary and compound emotion classes. Experimental results showed that EmoMix was competitive with the current state-of-the-art lexica on primary emotion classification. Additionally, it showed promising performance on compound emotion analysis compared with other lexica.

In future work, we would like to incorporate emotion intensity and extend the method to other languages. In addition, we will further enhance the quality and capacity of our compound emotion lexicon, EmoMix.