1 Introduction

The rapid growth of social media platforms has led to an increasing number of people expressing their emotions through short messages [20]. To extract value from this type of data for social emotion mining and monitoring, it is necessary to perform topic discovery and emotion detection to identify the topics and emotions embedded in short texts [11].

Topic discovery aims to model topics from documents based on their content, and emotion detection identifies emotions from documents at the word, sentence, or document level. In the scenario of emotion mining, public emotions vary from one topic to another, and topics trigger public emotions. Topic discovery and emotion detection are therefore closely related, so jointly modeling topics and emotions is an appropriate way to conduct these tasks [34]. Furthermore, we may want to learn not only the emotions of a single document or user, but also aggregate results over groups of individuals sharing similar interests. For example, editors of magazines often want to identify common interests among readers to ensure that all of the major interests are covered in each issue. The editors are also interested in their readers’ characteristics (e.g., sex, age, and education level) to keep the magazine’s content appropriate. Thus, there are practical reasons for jointly modeling topics, emotions, and user groups [33].

However, there are challenges to effectively detecting topics and emotions in short texts. First, each short message includes only a few words, resulting in a lack of meaningful context [36]. Models applied directly to such texts often suffer from the feature sparsity problem, leading to undesirable results. Second, there is the question of which information should be used when discovering representative groups of users. Third, there is the question of how to model content, emotions, and user information within groups so as to capture the relationships between topics, emotions, and users.

Conventional emotion-aware topic models only present results at the word, sentence, or document level. The Emotion Topic Model (ETM) [2] assumes every word is selected according to specific topics and emotions. The Multi-label Supervised Topic Model (MSTM) and Sentiment Latent Topic Model (SLTM) [24] first discover topics within each document and then analyze emotions towards those topics at the document level. These methods cannot produce high-level results for groups of users, which we denote as the user group level in this paper. They also suffer from the sparsity problem in short texts. Time-User Sentiment/Topic Latent Dirichlet Allocation (TUS-LDA) [32] aggregates short texts from a single user or a single time interval into lengthy pseudo-documents to tackle this problem when detecting burst topics and social sentiment feedback. TUS-LDA can work at the user level when topics belong to a user’s static interests, or at the global level when topics relate to current social issues. However, TUS-LDA can discover neither groups of users nor the differences in topical interests and emotions between groups. Moreover, TUS-LDA needs pre-built sentiment lexicons, which may be unavailable when dealing with a new emotion label.

Regarding the issue of how to divide users into groups, we observe that the more similar people are, the more likely they are to share similar interests. For example, in the 45th US presidential election, the Washington Post, an authoritative newspaper, used several sets of data to illustrate the characteristics of Donald Trump’s supporters: the proportion of male supporters was 19% higher than that of female supporters, and 50% of those with annual incomes below $50,000 supported Trump, versus 32% of those with higher incomes. Such data broadly support the theory of homophily, which relates similarity of interests to similarity of emotions [16]. Based on this phenomenon, this paper exploits user characteristics, content, and emotions to carry out topic discovery and emotion detection at the user group level.

We propose a method for emotion detection and topic discovery aided by user characteristics, called the User Group based Topic Emotion (UGTE) model. Our main contributions are summarized as follows. Firstly, UGTE models user characteristics, emotions, and content jointly to improve the effectiveness of both emotion detection and topic discovery. By influencing the process of topic generation, user characteristics help to identify semantic group structures. As mentioned above, individuals with similar characteristics are more likely to share similar emotions. Therefore, when analyzing user group level results, UGTE considers not only the topic and emotion information of individuals, but also the effects of user groups, as characterized by gender, income, education, and other attributes. Secondly, UGTE aggregates short texts into lengthy pseudo-documents and jointly models topics and emotions within each group to address the feature sparsity problem. Finally, unlike existing methods, UGTE not only captures the relationship between topics and emotions for every group, presented as distributions over words, but also produces portraits of these groups, presented as distributions over characteristics.

The rest of this paper is organized as follows. Section 2 reviews related work on topic models for short texts, joint topic-emotion modeling, and community based sentiment/emotion detection. Section 3 presents the proposed model and the inference of model parameters. Section 4 presents our experiments and discussions. Section 5 concludes the paper.

2 Related work

2.1 Short text topic models

Topic models provide a solution for implicit semantic mining and understanding. Probabilistic Latent Semantic Analysis (PLSA) [10] is one of the first latent semantic models, using the expectation-maximization (EM) algorithm for parameter inference. Since PLSA suffers from overfitting, Latent Dirichlet Allocation (LDA) [3] introduces the Dirichlet distribution as the conjugate prior of topics. In recent years, LDA has achieved great success in information retrieval [21] and topic modeling [6, 15]. However, both LDA and PLSA perform well only when mining topics from lengthy documents. Nowadays, texts from the Internet are typically short and lack context, so the feature sparsity problem arises when LDA and PLSA are applied to short texts [36].

To overcome this limitation, external document embedding was first introduced to enrich the contextual information of short texts [14, 19, 28]. This method is effective, but the enriching documents are not always consistent with the original messages, so it may have no effect or even a negative effect on the results. In addition, finding the auxiliary data is expensive and time-consuming. Besides external document embedding, the Biterm Topic Model (BTM) [7] is an alternative, built on the idea that two words are more likely to belong to the same topic if they co-occur more frequently. Such methods lengthen short texts by converting documents into biterm sets. However, biterm-based methods bring in little additional word co-occurrence information and therefore still face the feature sparsity problem [38]. Another approach is to aggregate short texts into lengthy pseudo-documents, which addresses the feature sparsity problem without carefully selecting external documents [39]. Twitter LDA [37] aggregates posts from a single user into a pseudo-document to identify topics from the words. TimeUserLDA [8] aggregates posts by user or timestamp to detect “breakout” topics, which fall into two categories: personal static topics and temporal dynamic topics. Similar to TimeUserLDA, the model that incorporates temporal, personal and extraction factors (TUK-TTM) [35] aggregates posts by time slice or user to produce personalized time-aware tag recommendations. However, these models cannot be applied to emotion detection. To mine burst topics on social media, TUS-LDA [32] introduces a sentiment variable for every post aggregated in pseudo-documents. Taking advantage of the aggregation method, our proposed model also uses this idea to address the feature sparsity problem. UGTE differs, however, by aggregating short messages from a user group into a pseudo-document to produce group level results.

2.2 Jointly modeling topics and emotions

Data from the Internet contain users’ opinions and emotions. In recent years, to jointly model topics and emotions, several researchers have extended topic models to perform emotion detection on user-generated text, such as product and movie reviews [34]. ETM [2] uses emotion labels to implement a supervised emotion topic model for social emotion mining. Different from ETM, which was developed from the writer’s perspective, MSTM and SLTM [24] model topics and sentiment labels from the perspective of readers; experiments show that they are more suitable for public voting articles when mining social emotions. The Contextual Sentiment Topic Model (CSTM) [23] classifies reader emotions by explicitly distinguishing context-independent topics from nondiscriminative information, such as very common words, and from a contextual theme that characterizes context-dependent information across different collections. However, the models mentioned above are applied to regular documents rather than short texts. The Weighted Labeled Topic Model (WLTM) [25], built on BTM, jointly models multiple emotion labels and biterms for short-text emotion detection. Beyond LDA-based methods, neural topic models have recently emerged for topic discovery and supervised learning. The Supervised Neural Topic Model (sNTM) [4] extracts topics with a neural network that follows the document-topic distribution of topic models; however, the observed labels have little effect on its topic discovery process. The Neural Siamese Labeled Topic Model (nSLTM) [12] incorporates the supervision of labels into topic modeling and can be applied to both classification and regression.

Previous joint topic-emotion models only model topics and emotions at the word, sentence, or document level. They capture neither the emotions and topics within groups of users, nor which users would be interested in specific topics. Our UGTE approach integrates user characteristics with topics and emotions so that model performance can be enhanced by exploiting the relationships among topics, emotions, and users.

2.3 Community based sentiment/emotion detection

LDA-based topic models are widely applied to community detection, which has been studied from the perspectives of network structural communities and semantic communities. Since most methods for detecting network structural communities use graph partitioning algorithms that consider only users’ relationships or interactions [18], we do not discuss them here due to their lack of relevance. Different from network structural community detection, semantic community detection takes both network structure and users’ semantic attributes into consideration. For example, the Group-Topic (GT) model uses entity relationships and textual attributes to simultaneously discover topics for events and communities among the entities [31]. The Topic User Community Model (TUCM) uses social links, interaction types, and context information to detect communities [27]. However, these methods do not take sentiments or emotions into consideration. To conduct sentiment analysis, the Sentiment-Topic model for Community discovery (STC) aggregates topics, sentiments, and interactions among users to detect sentiment-topic level communities [33]. The work in [30] detects sentiment communities using social relationships between users, context, and sentiment labels, but it is unable to discover topics or opinions across communities. The People Opinion Topic (POT) model introduces opinion based community detection to discover hot topics and analyze sentiment along with detecting social communities [5].

The methods mentioned thus far integrate sentiment analysis and community detection to improve model performance on both tasks. However, there are several differences between our work and these studies. First, existing community detection models mostly aim to discover the best structural community by examining users with more interactions; even when they take sentiments or user context into consideration, they fail to extract the topics or sentiments of different communities. Instead, our model attempts to discover topics and emotions at the group level, which not only models topics and emotions but also identifies people sharing similar interests within the same group. Second, most community detection models depend on information from users’ social relationships or online interactions. None of the existing community detection models employ users’ characteristics tags to analyze the relationships between users’ interests and their profiles. However, in many cases, a decision maker may want to know what different groups of users think of an event, and which characteristics have the largest influence. Our model achieves such high-level results where other models fail.

3 User group based topic emotion model

In this section, we introduce our UGTE model and present its structure. After defining the problem and the relevant terms and notation, we describe our model in detail and present our method for learning its parameters.

3.1 Problem definition

Given a set of documents D = {d1, d2,..., d|D|} with |D| elements, the vocabulary of D is W = {w1, w2,..., w|W|} of size |W|, the set of globally distinct emotion labels is E = {e1, e2,..., e|E|} with |E| elements, and the set of users is U = {u1, u2,..., u|U|} of size |U|. Suppose every document in D is generated by one user and labeled with one of the above emotions. Each document di can then be denoted as \(d_{i}^{r, k}\), meaning that document di is generated by user ur and labeled with emotion ek. The words in document di are denoted as \(W_{d_{i}}=\{w_{i,1}, w_{i,2}, ..., w_{i, N_{i}}\}\), where Ni is the total number of words in document di.

To discover user group based topics and emotions, we need to exploit user characteristics such as age, gender, and country. For the J types of characteristics collected, we denote the set of characteristics tags of the j-th type as \(F_{j} = \{ f_{j,1}, f_{j,2}, ..., f_{j,|F_{j}|}\}\) with |Fj| elements. We denote the characteristics tags of each user ur as \(F^{u_{r}}=\{{f_{1}^{r}}, {f_{2}^{r}}, ..., {f_{J}^{r}}\}\), where \({f_{j}^{r}} \in F_{j}\) is user ur’s tag of the j-th characteristic type. For example, assume that three users u1, u2, and u3 have characteristics tags \(F^{u_{1}}=\{\) ‘Male’, ‘22’, ‘America’ }, \(F^{u_{2} }=\{\) ‘Female’, ‘23’, ‘America’ } and \(F^{u_{3}}=\{\) ‘Male’, ‘24’, ‘America’ }, respectively. There are J = 3 types of characteristics tags: gender, age, and country. From the tag values, we determine that \(F_{1}=\{\) ‘Male’, ‘Female’ } with |F1| = 2, F2 = { ‘22’, ‘23’, ‘24’ } with |F2| = 3, and F3 = { ‘America’ } with |F3| = 1.
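Concretely, this notation might be held in simple containers like the following sketch (the class and field names are ours, purely for illustration; the paper releases no code):

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical containers mirroring the notation of Section 3.1.
@dataclass
class User:
    user_id: str
    tags: Dict[str, str]   # F^{u_r}: one tag value per characteristic type j

@dataclass
class Document:
    words: List[str]       # W_{d_i} = {w_{i,1}, ..., w_{i,N_i}}
    user_id: str           # u_r: the author of the document
    emotion: str           # e_k: the observed emotion label

# The three example users u1, u2, u3 from the text.
users = {
    "u1": User("u1", {"gender": "Male",   "age": "22", "country": "America"}),
    "u2": User("u2", {"gender": "Female", "age": "23", "country": "America"}),
    "u3": User("u3", {"gender": "Male",   "age": "24", "country": "America"}),
}

# F_j: the distinct tag values observed for each characteristic type j.
tag_vocab = {
    j: sorted({u.tags[j] for u in users.values()})
    for j in ["gender", "age", "country"]
}
# tag_vocab == {'gender': ['Female', 'Male'], 'age': ['22', '23', '24'],
#               'country': ['America']}, i.e. |F_1|=2, |F_2|=3, |F_3|=1.
```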

Our primary task is to jointly discover the topics \(Z=\{z_{1}, z_{2}, ..., z_{|Z|} \}\) of size |Z| and the emotions of the given documents at the user group level. In other words, we must detect the different user groups \(G = \{g_{1}, g_{2}, ..., g_{|G|}\}\) with |G| elements, infer the topic distribution 𝜃g of each group, and simultaneously analyze the emotion distribution ϕg, z of each topic within each group. In our UGTE model, the user groups are latent variables, as are the topics. The number of user groups is a predefined parameter, whose impact is examined in Section 4.2. Table 1 provides a summary of the notations used in our presentation.

Table 1 Notations used in UGTE

3.2 Generative process

Conventional joint topic-emotion models focus on the association between emotions and topics at the level of documents or users. To model topics and emotions jointly at the user group level, we propose the UGTE model, which adds a user group layer to the generation module of topics and emotions. Figure 1 shows the structure of UGTE. In UGTE, every user ur is related to a global group distribution π. Every group g is associated with its own characteristics tag distributions ψg, j, topic distribution 𝜃g, and emotion distributions ϕg, z. For a document \(d_{i}^{r,k}\), the user group \(g_{d_{i}}\) of this document is sampled according to π. After the group assignment is determined, each of the user’s characteristics tags \({f_{j}^{r}}\) is generated from the characteristic distribution ψg, j. UGTE identifies topics and emotions according to each group’s parameters by assuming that the documents of a group follow the same topic distribution, and by introducing emotions to topics in each group separately. When a user writes each word wi, n in document \(d_{i}^{r,k}\), s/he first chooses a topic zi, n from the group’s topic distribution 𝜃g. Then, emotion ei, n is drawn from the emotion distribution ϕg, z. According to the chosen topic and emotion, the user draws word wi, n from the word distribution φz, e.

Figure 1 The graphical model of UGTE

With respect to group membership, it is reasonable and natural for UGTE to assume that one document belongs to one group while a user with several documents can belong to multiple groups with different probabilities. Although a short message often expresses one central idea, a user may write several messages on different topics with different attitudes. For example, if a person is fond of comics but has little interest in political news, s/he may regularly post about comics but far less often about political news. Such a person would be strongly related to a group whose users are fond of comics and weakly related to another group with heated political discussion. Besides, different from conventional joint topic-emotion models, one of the contributions of UGTE is its use of each user’s characteristics tags for group discovery. UGTE identifies groups according to the documents’ topics, emotion labels, and the characteristics tags of the corresponding users. The basis for this idea is that people sharing similar characteristics are more likely to share similar emotions on specific topics, and can thus be treated as a group. For example, individuals from different regions or social classes and of different ages are often interested in different topics; even for a given topic, different groups of people may hold different emotions. The size of the group set G is a predetermined parameter, like that of the topic set Z, enabling UGTE to mine group based emotions at different granularities.

Formally, the generative process for each document is as follows (a code sketch of this process appears after the list):

  1. For each group g and each characteristic type j, draw \(\psi _{g,j} \sim Dirichlet(\lambda )\);

  2. Draw the distribution over groups \(\pi \sim Dirichlet(\gamma )\);

  3. For each group g, draw the distribution over topics \(\theta _{g} \sim Dirichlet(\alpha )\);

  4. For each topic z of each group g, draw the distribution over emotions \(\phi _{g,z} \sim Dirichlet(\mu )\);

  5. For each topic z and emotion e, draw the distribution over words \(\varphi _{z,e} \sim Dirichlet(\upbeta )\);

  6. For each document \(d_{i}^{r,k}\):

     (a) Draw group \(g_{r} \sim Multinomial(\pi )\);

     (b) Draw each characteristics tag \({f_{j}^{r}} \sim Multinomial(\psi _{g_{r},j})\);

     (c) For each word \(w_{i,n}\) in document \(d_{i}^{r,k}\):

         (i) Draw topic \(z_{i,n} \sim Multinomial(\theta _{g_{r}})\);

         (ii) Draw emotion \(e_{i,n} \sim Multinomial(\phi _{g_{r}, z_{i,n}})\);

         (iii) Draw word \(w_{i,n} \sim Multinomial(\varphi _{e_{i,n}, z_{i,n}})\).
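To make the process concrete, the following is a minimal NumPy sketch of steps 1-6 under assumed sizes (all names, sizes, and the use of NumPy are our own choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; in the paper |G| and |Z| are predefined parameters.
G, Z, E, W, J = 10, 25, 7, 5000, 3
F_sizes = [2, 3, 1]                      # |F_j| per characteristic type
gamma, lam, alpha, mu, beta = 0.1, 0.1, 50.0 / Z, 0.1, 0.1  # alpha = 50/|Z|

# Steps 1-5: draw all Dirichlet-distributed parameters.
psi = [rng.dirichlet(np.full(F_sizes[j], lam), size=G)     # psi_{g,j}: tags per group
       for j in range(J)]
pi = rng.dirichlet(np.full(G, gamma))                      # global group distribution
theta = rng.dirichlet(np.full(Z, alpha), size=G)           # theta_g: topics per group
phi = rng.dirichlet(np.full(E, mu), size=(G, Z))           # phi_{g,z}: emotions per topic
varphi = rng.dirichlet(np.full(W, beta), size=(Z, E))      # varphi_{z,e}: words

# Step 6: generate one document of n_words words.
def generate_document(n_words: int):
    g = rng.choice(G, p=pi)                                # 6(a) sample the group
    tags = [rng.choice(F_sizes[j], p=psi[j][g]) for j in range(J)]  # 6(b) tags
    words, topics, emotions = [], [], []
    for _ in range(n_words):                               # 6(c) per-word draws
        z = rng.choice(Z, p=theta[g])                      # (i)   topic
        e = rng.choice(E, p=phi[g, z])                     # (ii)  emotion
        w = rng.choice(W, p=varphi[z, e])                  # (iii) word
        words.append(w); topics.append(z); emotions.append(e)
    return g, tags, words, topics, emotions
```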

3.3 Parameter inference

As in other joint topic-emotion models, exact inference of the latent variables in our model is intractable. To address this, Gibbs sampling [9] or variational inference [3] is often employed. Gibbs sampling is a special case of Markov Chain Monte Carlo [13] and can draw accurate samples from the posterior distribution for parameter inference, whereas variational inference provides only an analytic approximation. Furthermore, it is mathematically arduous to derive the variational approximation when the model structure is complex. Thus, following previous work [2, 7, 32], we use Gibbs sampling to discover groups and to model topics and emotions. According to the generative process, the joint probability of all the random variables for a document collection is as follows:

$$ \begin{array}{@{}rcl@{}} & & p(z, w, e, g, f, \psi, \pi, \theta, \phi, \varphi, \alpha, \upbeta, \mu, \lambda, \gamma) \\ & = & p(\pi; \gamma) p(\psi; \lambda) p(\theta; \alpha) p(\varphi; \upbeta) p(\phi; \mu) \\ & & p(g | \pi) p(f | g, \psi) p(z | g, \theta) p(e |g, z, \phi) p(w | z, e, \varphi) . \end{array} $$
(1)

During group discovery, the posterior probability for inferring the group gr of a user ur can be derived by marginalizing the above joint probability. The posterior probability depends on the user’s characteristics tags \(F^{u_{r}}\), the topics, and the emotion label of document di, as follows:

$$ \begin{array}{@{}rcl@{}} & & p(g_{r} = g \mid F^{u_{r}}, { d_{i}^{r,k}},{g}_{-r}, \alpha, \upbeta, \mu, \lambda, \gamma ) \\ & \propto & p(g_{r} = g \mid{g}_{-r}, \gamma) p(F^{u_{r}} \mid{g}_{-r}, \lambda) p(d_{i}\mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) \\ & \propto & p(g_{r} = g \mid{g}_{-r}, \gamma) \prod\limits_{j=1}^{J} p({f_{j}^{r}} = f \mid{g}_{-r}, \lambda) \prod\limits_{n=1}^{N_{i}} p(w_{i, n} \mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) . \end{array} $$
(2)

In particular, the emotion label di, e is an observable variable, so the posterior probability that a group generates a word with a specific emotion can be derived by marginalizing out the topic variable according to (1), as follows:

$$ \begin{array}{@{}rcl@{}} & & p(w_{i, n} \mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) \\ & = & \sum\limits_{z=1}^{|Z|} p(z_{i,n} = z \mid{g}_{-r}, \alpha ) p(e_{i, n} = e_{k} \mid z,{g}_{-r}, \mu) p(w_{i, n} \mid z, e_{i,n} = e_{k},{g}_{-r}, \upbeta). \end{array} $$
(3)

According to the detailed derivation, we can estimate the posterior probability by (4):

$$ \begin{array}{@{}rcl@{}} & & p(g_{r}= g \mid F^{u_{r}}, d_{i}, {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu, \lambda, \gamma ) \\ & \propto & \frac{N_{-r}^{g}+\gamma}{{\sum}_{g'=1}^{|G|}(N_{-r}^{g'}+ \gamma)} \times \prod\limits_{j=1}^{J} \frac{N_{f,j}^{g,-r}+\lambda}{{\sum}_{f'=1}^{|F_{j}|} N_{f',j}^{g, -r}+ |F_{j}| \lambda} \\ & & \times \prod\limits_{n=1}^{N_{i}} \sum\limits_{z=1}^{|Z|} \frac{ N_{z}^{-i, g}+\alpha }{ {\sum}_{z'=1}^{|Z|} (N_{z'}^{-i,g} + \alpha)} \times \frac{ N_{{e_{k}}}^{-i, g,z}+\mu}{ {\sum}_{e'=1}^{|E|} (N_{e'}^{-i,g,z} + \mu) } \times \frac{ N_{w_{i,n}}^{-i,z,e}+\upbeta }{ {\sum}_{w'=1}^{|W|} (N_{w'}^{-i,z,e} + \upbeta) }, \end{array} $$
(4)

where \(N_{-r}^{g}\) is the number of users assigned to group g excluding user ur, and \(N_{f,j}^{g, -r}\) is the number of characteristics tags fj assigned to group g excluding tag \({f_{j}^{r}}\). Furthermore, \(N_{z}^{-i, g}\) is the number of words assigned to topic z in group g excluding words of document di, \( N_{e}^{-i, g,z}\) is the number of words assigned to emotion e of topic z in group g excluding words of document di, and \(N_{w}^{-i,z,e}\) is the number of words w assigned to emotion e of topic z excluding words of document di.
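As an illustration, the group-sampling step of (4) could be implemented over such count arrays roughly as follows; this is a sketch under our own array layout (not the authors' code), with the product over words accumulated in log space to avoid underflow:

```python
import numpy as np

def sample_group(doc_words, doc_emotion, user_tags, counts, priors, rng):
    """Sample g_r for one document per Eq. (4). The count arrays are assumed
    to already exclude the current user/document (the -r and -i terms)."""
    N_g, N_fj_g, N_z_g, N_e_gz, N_w_ze = counts   # shapes: (G,), [(G,|F_j|)], (G,Z), (G,Z,E), (Z,E,W)
    gamma, lam, alpha, mu, beta = priors
    G, Z = N_z_g.shape

    log_p = np.log(N_g + gamma)                              # group prior term
    for j, f in enumerate(user_tags):                        # characteristic-tag terms
        Fj = N_fj_g[j].shape[1]
        log_p += np.log((N_fj_g[j][:, f] + lam) /
                        (N_fj_g[j].sum(axis=1) + Fj * lam))

    # Per-word term: sum over topics of p(z|g) p(e_k|g,z) p(w|z,e_k).
    E = N_e_gz.shape[2]
    W = N_w_ze.shape[2]
    p_z = (N_z_g + alpha) / (N_z_g.sum(axis=1, keepdims=True) + Z * alpha)   # (G,Z)
    p_e = (N_e_gz[:, :, doc_emotion] + mu) / (N_e_gz.sum(axis=2) + E * mu)   # (G,Z)
    denom_w = N_w_ze[:, doc_emotion, :].sum(axis=1) + W * beta               # (Z,)
    for w in doc_words:
        p_w = (N_w_ze[:, doc_emotion, w] + beta) / denom_w                   # (Z,)
        log_p += np.log((p_z * p_e * p_w[None, :]).sum(axis=1))

    p = np.exp(log_p - log_p.max())                          # stable normalization
    return rng.choice(G, p=p / p.sum())
```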

After sampling the group of document di, the assignments of topics and emotions to words can be inferred from the parameters characterizing the group. Differing from conventional unsupervised LDA-based models, UGTE is a supervised joint topic-emotion model that utilizes the emotion labels of documents when performing Gibbs sampling. Inspired by Labeled LDA [22], we incorporate supervision by constraining the emotion assignments of words to be the same as the emotion labels of the corresponding documents. We formulate this process as follows:

$$ \begin{array}{@{}rcl@{}} & & p(z_{i,n}=z,e_{i,n}={e_{k}} \mid {d_{i,e}=e_{k}},{z_{-i,n}},{e_{-i,n}}, w_{i,n}, g, \theta, \phi, \varphi, \alpha, \upbeta, \mu) \\ & \propto & p(w_{i,n}=w \mid z_{i,n}=z, e_{i,n}={e_{k}}, {d_{i,e}=e_{k}},{z_{-i,n}},{e_{-i,n}}, \theta, \phi, \varphi, \alpha, \upbeta, \mu) \\ & \propto & \frac{N_{z}^{g,-w_{i,n}}+\alpha}{{\sum}_{z'=1}^{|Z|} (N_{z'}^{g,-w_{i,n}}+\alpha)} \cdot \frac{N_{{e_{k}}}^{g,z,-w_{i,n}}+\mu}{{\sum}_{e'=1}^{|E|}(N_{e'}^{g,z,-w_{i,n}}+\mu)} \cdot \frac{N_{w}^{z,{e_{k}},-w_{i,n}}+\upbeta }{ {\sum}_{w'=1}^{|W|} (N_{w'}^{z,{e_{k}}, -w_{i,n}}+\upbeta)}. \end{array} $$
(5)
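The per-word step of (5) then reduces to sampling only the topic, since the emotion is clamped to the document label ek; continuing the sketch above (same assumed array layout):

```python
def sample_topic(w, e_k, g, counts, priors, rng):
    """Sample z_{i,n} per Eq. (5) with e_{i,n} fixed to the document label e_k.
    Count arrays are assumed to exclude the current word (the -w_{i,n} terms)."""
    _, _, N_z_g, N_e_gz, N_w_ze = counts
    _, _, alpha, mu, beta = priors
    Z, E, W = N_w_ze.shape
    p = ((N_z_g[g] + alpha) / (N_z_g[g].sum() + Z * alpha)                  # p(z | g)
         * (N_e_gz[g, :, e_k] + mu) / (N_e_gz[g].sum(axis=1) + E * mu)      # p(e_k | g, z)
         * (N_w_ze[:, e_k, w] + beta) / (N_w_ze[:, e_k, :].sum(axis=1) + W * beta))
    return rng.choice(Z, p=p / p.sum())
```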

After the sampling process converges according to (4) and (5), the distributions π, ψg, j, 𝜃g, ϕg, z and φz, e can be estimated according to (6)-(10), as follows:

$$ \begin{array}{@{}rcl@{}} \pi_{g} & = & \frac{N_{g}+\gamma}{{\sum}_{g'=1}^{|G|} (N_{g'}+ \gamma)}, \end{array} $$
(6)
$$ \begin{array}{@{}rcl@{}} \psi_{g,j,f} & = & \frac{N_{f,j}^{g}+\lambda}{{\sum}_{f'=1}^{|F_{j}|}(N_{f',j}^{g}+\lambda)}, \end{array} $$
(7)
$$ \begin{array}{@{}rcl@{}} \theta_{g, z} & = & \frac{{N_{z}^{g}}+\alpha}{{\sum}_{z'=1}^{|Z|} (N_{z'}^{g} + \alpha)}, \end{array} $$
(8)
$$ \begin{array}{@{}rcl@{}} \phi_{g, z, e} & = & \frac{N_{e}^{g,z}+\mu}{{\sum}_{e'=1}^{|E|} (N_{e'}^{g,z}+\mu)}, \end{array} $$
(9)
$$ \begin{array}{@{}rcl@{}} \varphi_{z, e, w} & = & \frac{N_{w}^{z,e}+\upbeta}{{\sum}_{w'=1}^{|W|} (N_{w'}^{z,e}+\upbeta)}. \end{array} $$
(10)
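Under the same assumed count arrays as in the sketches above, the point estimates in (6)-(10) are simple normalizations:

```python
def estimate_parameters(counts, priors):
    """Point estimates of pi, psi, theta, phi and varphi per Eqs. (6)-(10)."""
    N_g, N_fj_g, N_z_g, N_e_gz, N_w_ze = counts
    gamma, lam, alpha, mu, beta = priors
    pi = (N_g + gamma) / (N_g + gamma).sum()                                  # Eq. (6)
    psi = [(N + lam) / (N + lam).sum(axis=1, keepdims=True) for N in N_fj_g]  # Eq. (7)
    theta = (N_z_g + alpha) / (N_z_g + alpha).sum(axis=1, keepdims=True)      # Eq. (8)
    phi = (N_e_gz + mu) / (N_e_gz + mu).sum(axis=2, keepdims=True)            # Eq. (9)
    varphi = (N_w_ze + beta) / (N_w_ze + beta).sum(axis=2, keepdims=True)     # Eq. (10)
    return pi, psi, theta, phi, varphi
```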

With all the parameters derived above, we can further infer the emotion of an unlabeled document dtest as follows:

$$ \begin{array}{@{}rcl@{}} p(e_{test}=e \mid d_{test}, F^{u_{test}}) & = & \prod\limits_{n=1}^{N_{test}} \sum\limits_{g=1}^{|G|} \sum\limits_{z=1}^{|Z|} \tilde{\pi}_{g} \cdot \prod\limits_{j=1}^{J} \tilde{\psi}_{g,j,{f_{j}^{u_{test}}}} \cdot \tilde{\theta}_{g,z} \cdot \tilde{\phi}_{g,z,e} \cdot \tilde{\varphi}_{z,e,w_{test,n}} , \end{array} $$
(11)

where \(\tilde {\pi }_{g}\), \(\tilde {\psi }_{g,j,f^{u_{test}}}\), \(\tilde {\theta }_{g,z}\), \(\tilde {\phi }_{g,z,e}\), and \(\tilde {\varphi }_{z,e,w}\) are estimated according to (6)-(10), with the corresponding counts computed over the union of the documents in Dtrain and the document dtest.
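For intuition, (11) can be evaluated directly from the estimated parameters; a sketch under our naming (log-space accumulation would be safer for long documents, but the texts here are short):

```python
import numpy as np

def predict_emotion(doc_words, user_tags, params):
    """Score every emotion e for an unlabeled document per Eq. (11)
    and return the argmax. `params` are the estimates from Eqs. (6)-(10)."""
    pi, psi, theta, phi, varphi = params
    E = phi.shape[2]
    group_w = pi.copy()                  # pi~_g * prod_j psi~_{g,j,f_j}
    for j, f in enumerate(user_tags):
        group_w = group_w * psi[j][:, f]
    scores = np.ones(E)
    for w in doc_words:                  # product over words of the sums over g, z
        scores *= np.einsum('g,gz,gze,ze->e',
                            group_w, theta, phi, varphi[:, :, w])
    return int(np.argmax(scores))
```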

4 Experiments

To evaluate our proposed method, we perform topic discovery and emotion classification, and compare our method with other state-of-the-art models.

4.1 Experimental setup

4.1.1 Dataset

We use a real-world dataset to verify the effectiveness of our model. ISEAR is a typical dataset for emotion detection, containing 7,666 sentences/short texts annotated by 1,096 users with different cultural backgrounds. The data were collected in the form of a questionnaire covering the respondents’ personal information, experiences, and expressions of seven emotions, i.e., anger, disgust, fear, joy, sadness, shame, and guilt. Each sample contains 42 attributes, including discrete fields and free-text descriptions. We use 11 discrete attributes (ID, CITY, COUNTRY, SEX, AGE, RELI, PRAC, FOCC, MOCC, FIEL, and EMOT) together with the text content (SIT) for experiments. After pre-processing by removing stop words and filtering punctuation marks, 7,652 samples remain for the experiments. By default, we use 80% of the data (6,122 samples) as the training set and the remaining 20% (1,530 samples) as the testing set. To further explore how effectively the model addresses the feature sparsity problem brought by extremely short texts, we divide the samples into two groups based on content length: texts longer than 10 words form the “short text” subset, while the rest form the “extremely short text” subset. For each subset, 80% of the samples are used for training and the remaining 20% for testing. Specifically, there are 891 training samples and 223 testing samples in the “short text” subset, and 5,230 training samples and 1,308 testing samples in the “extremely short text” subset. Details of the attributes of ISEAR are shown in Table 2.

Table 2 Selected attributes of ISEAR
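A sketch of the preprocessing and splits described above (the CSV file name, the stop-word list, and the use of pandas and scikit-learn are all our assumptions):

```python
import re
import pandas as pd
from sklearn.model_selection import train_test_split

STOP_WORDS = set(open("stopwords.txt").read().split())   # assumed stop-word list

def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", " ", text.lower())         # filter punctuation marks
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

df = pd.read_csv("isear.csv")                            # hypothetical ISEAR export
df["SIT"] = df["SIT"].astype(str).map(clean)             # SIT holds the text content
df = df[df["SIT"].str.len() > 0]                         # 7,652 samples remain (paper)

# Default 80/20 split (6,122 train / 1,530 test in the paper).
train, test = train_test_split(df, test_size=0.2, random_state=0)

# Length-based subsets for the sparsity study (threshold of 10 words).
n_words = df["SIT"].str.split().str.len()
short_text = df[n_words > 10]                            # "short text" subset
extremely_short = df[n_words <= 10]                      # "extremely short text" subset
```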

4.1.2 Baselines

To evaluate the effectiveness of UGTE, we employ several representative algorithms that jointly model topics and emotions/sentiments as baselines: the Author-Topic model (AT) [26], Multi-label Supervised Topic Model (MSTM) and Sentiment Latent Topic Model (SLTM) [24], Contextual Sentiment Topic Model (CSTM) [23], supervised Neural Topic Model (sNTM) [4], and neural Siamese Labeled Topic Model (nSLTM) [12]. AT extends LDA to include authorship information by jointly modeling users and topics. MSTM and SLTM are topic models for social emotion mining from the perspective of readers. CSTM classifies reader emotions across different contexts by distinguishing context-independent topics from both a background theme and a contextual theme. sNTM is in essence a neural network that follows the document-topic distribution of topic models. nSLTM is a supervised topic model based on the Siamese network, which trades off label-specific word distributions against document-specific label distributions in a uniform framework.

4.1.3 Metrics

Topic coherence [17] is an effective measure of the quality of the topics discovered by a model. For any two of the top-n words of a topic, the more frequently the words co-occur within documents, the better the generated topic. Coherence@n measures a model’s topic discovery performance as the average coherence value over all topics, calculated as follows:

$$ \begin{array}{@{}rcl@{}} Coherence@n= \frac{1}{|Z|} \sum\limits_{z=1}^{|Z|} C(z,n), \end{array} $$
(12)
$$ \begin{array}{@{}rcl@{}} C(z,n)=\sum\limits_{i=2}^{n} \sum\limits_{j=1}^{i-1} \log \frac{D(w_{z,j}, w_{z,i} )+1}{D(w_{z,i} )} , \end{array} $$
(13)

where C(z, n) is the coherence value of topic z based on its top-n words, wz, i is the i-th most probable word of topic z, D(wz, i) is the number of documents in the dataset in which word wz, i appears, and D(wz, j, wz, i) is the number of documents in which wz, j and wz, i co-occur. For the task of emotion classification, accuracy and Cohen’s kappa score [1] are used as the evaluation metrics.
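A direct implementation of (12)-(13) over document-level frequencies might look as follows (names are ours; `docs` is assumed to be a list of token sets):

```python
import numpy as np

def coherence_at_n(top_words, docs, n=10):
    """Average topic coherence per Eqs. (12)-(13). top_words[z] lists the
    words of topic z in descending probability; docs are token sets."""
    def D(*ws):  # number of documents containing all the given words
        return sum(1 for d in docs if all(w in d for w in ws))
    scores = []
    for words in top_words:
        # i runs over the 2nd..n-th top words, j over all earlier words.
        c = sum(np.log((D(words[j], words[i]) + 1) / D(words[i]))
                for i in range(1, n) for j in range(i))
        scores.append(c)
    return float(np.mean(scores))
```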

4.1.4 Parameter setting

We verify the effectiveness of our proposed model by conducting topic discovery and emotion classification, comparing UGTE with the baselines. For topic discovery, we run all models with different topic numbers |Z|∈{25,30,35,40,45,50,100,150,200,250,300}. We select symmetric Dirichlet priors following other studies [2, 23, 24, 26]: α = 50/|Z|, β = 0.1, γ = 0.1, λ = 0.1, μ = 0.1. We set the number of user groups |G| to 10 based on a preliminary study. For completeness, we also evaluate the influence of the number of user groups on our model in Section 4.2, by setting |Z|∈{25,50,100} and |G|∈{1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100}. Topic discovery and emotion classification in this part are conducted on UGTE_ID (UGTE that exploits only the ID, EMOT, and SIT of users). Since LDA-based models are insensitive to the values of the Dirichlet hyper-parameters [29], we set the parameters of the baselines according to the corresponding papers. In addition to the baselines AT, MSTM, SLTM, CSTM, sNTM, and nSLTM, the proposed UGTE_ALL (UGTE that uses all attributes of users) is also compared with UGTE_ID. We use the training set to estimate model parameters, and then infer parameters and evaluate Coherence@10, Coherence@20, and Coherence@30 on the testing set. For emotion classification, we use a similar process to set parameters and evaluate accuracy and Cohen’s kappa on the testing set. MSTM, SLTM, CSTM, sNTM, nSLTM, UGTE_ALL, and UGTE_ID are adopted for comparison, since AT cannot be applied to emotion classification directly.
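For reference, the settings above amount to a configuration like the following summary sketch (our own notation, not released code):

```python
config = {
    "Z_values": [25, 30, 35, 40, 45, 50, 100, 150, 200, 250, 300],
    "G": 10,                      # number of user groups (see Section 4.2)
    "alpha": lambda Z: 50.0 / Z,  # symmetric Dirichlet prior scaled by |Z|
    "beta": 0.1, "gamma": 0.1, "lambda": 0.1, "mu": 0.1,
    "iterations": 3000,           # Gibbs sampling iterations per run
    "runs": 10,                   # repetitions to report mean and variance
}
```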

Then, to explore the impact of different user characteristics tags on UGTE, topic discovery and emotion classification are conducted on 12 variant models of UGTE: UGTE_NULL, UGTE_ID, UGTE_CITY, UGTE_COUN, UGTE_SEX, UGTE_AGE, UGTE_RELI, UGTE_PRAC, UGTE_FOCC, UGTE_MOCC, UGTE_FIEL, and UGTE_ALL, which refer to models that use no characteristics tags, ID, CITY, COUN, SEX, AGE, RELI, PRAC, FOCC, MOCC, FIEL, and all characteristics tags, respectively. These experiments follow the same procedure as above.

Finally, a case study is conducted to demonstrate how user characteristics help improve the performance of topic discovery and emotion detection. User portraits are illustrated to show how UGTE discovers the relationship between topics and emotions at the group level. The number of iterations is set to 3,000 for all experiments. We run each model 10 times to reduce noise and randomness, and report both the mean and the variance.

4.2 Influence of user group numbers

To investigate the relationship between the number of user groups and model performance, we conduct topic discovery and emotion classification with different numbers of user groups under a fixed number of topics. Results are shown in Figures 2 and 3. From Figure 2 we observe that when |Z| = 25 and |Z| = 50, the coherence score first fluctuates and then decreases as the number of user groups increases, indicating that UGTE_ID performs better with a small number of user groups on this dataset. The optimal number of user groups lies in |G| ∈ {1,2,...,10}, since UGTE_ID performs well for 1 ≤ |G| ≤ 10. However, when |Z| = 100, the performance of UGTE_ID is more stable under all three coherence metrics, with smaller variances, indicating that the number of user groups has little influence on UGTE when the number of topics is large. Although the variances of UGTE_ID with small numbers of topics and user groups are larger, it achieves more competitive results on average. For the task of emotion classification, Figure 3 shows that as the number of user groups increases, UGTE_ID performs more stably under |Z| = 100 than under |Z| = 50 and |Z| = 25. Furthermore, UGTE_ID achieves higher accuracy and Kappa scores when |Z| = 100, indicating that UGTE_ID with a large number of topics performs better in emotion classification.

Figure 2 Topic coherence of UGTE_ID with different numbers of user groups

Figure 3 Emotion classification of UGTE_ID with different numbers of user groups

4.3 Comparison with baselines

4.3.1 Topic discovery

The coherence of topics for our models and the baselines on ISEAR is illustrated in Figure 4. Under Coherence@10, UGTE_ID performs best when |Z|≤ 50, but achieves worse results as the number of topics increases, indicating that UGTE_ID is more suitable for discovering a small number of topics. On the other hand, the baseline CSTM performs better than the other models when |Z|≥ 100. Though UGTE_ALL and UGTE_ID do not achieve competitive results under Coherence@10, they both perform better and more steadily under Coherence@20 and Coherence@30. Under Coherence@30, UGTE_ID achieves the best performance when |Z|≤ 150. The neural network based model nSLTM achieves higher coherence values as the number of topics increases, which indicates that nSLTM is suitable for mining a large number of topics. sNTM does not perform as well, achieving lower coherence values than nSLTM as |Z| increases.

Figure 4 Topic coherence of UGTE_ID and baselines with different topic numbers when |G| = 10

To statistically evaluate the differences between these models, we also perform two kinds of statistical test on paired models: the first evaluates the stability of performance in terms of variances, and the second evaluates average performance in terms of means. The p-values are estimated for both tests at the conventional significance level of 0.05, so that a p-value below 0.05 rejects the null hypothesis at the 95% confidence level and the difference between the paired models is deemed statistically significant. Firstly, an analysis of variance in terms of the F-test is employed to test the underlying assumption of homoscedasticity. F-tests are conducted on UGTE_ID, UGTE_ALL, AT, CSTM, MSTM, SLTM, sNTM, and nSLTM. Results are shown in Table 3, where the significant values are highlighted in boldface. UGTE_ID is statistically significantly different from AT, CSTM, MSTM, SLTM, and sNTM under Coherence@10, Coherence@20, and Coherence@30, indicating that UGTE_ID is statistically more stable than these models over different topic numbers. UGTE_ID differs significantly from nSLTM under Coherence@20 and Coherence@30, indicating that UGTE_ID performs more stably than nSLTM when larger numbers of top words are used in the topic coherence metric.

Table 3 P-values of F-test between UGTE_ID and other models

Secondly, t-tests are conducted to test the null hypothesis that the difference in performance between paired models has a mean of zero (i.e., that the models perform identically). The results are shown in Table 4, where the significant values are highlighted in boldface. We observe that UGTE_ID significantly outperforms the CSTM and SLTM baselines under all three coherence metrics, and is statistically significantly different from AT, MSTM, and sNTM under Coherence@20 and Coherence@30.

Table 4 P-values of T-test between UGTE_ID and other models
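Both tests are straightforward to reproduce with SciPy; a sketch assuming `a` and `b` hold matched per-setting scores of two models (the numbers below are made up purely for illustration):

```python
import numpy as np
from scipy import stats

def f_test(a, b):
    """Two-sided F-test for equality of variances (homoscedasticity)."""
    a, b = np.asarray(a), np.asarray(b)
    F = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    p = 2 * min(stats.f.cdf(F, dfn, dfd), stats.f.sf(F, dfn, dfd))
    return F, p

a = [0.46, 0.45, 0.47, 0.44, 0.46]   # e.g., scores of one model (illustrative)
b = [0.41, 0.38, 0.43, 0.36, 0.40]   # e.g., scores of another model (illustrative)

F, p_var = f_test(a, b)              # variance comparison (as in Table 3)
# Paired t-test over matched settings; use stats.ttest_ind if runs are independent.
t, p_mean = stats.ttest_rel(a, b)    # mean comparison (as in Table 4)
print(f"F-test: p={p_var:.4f}; t-test: p={p_mean:.4f}")
```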

4.3.2 Emotion detection

For the task of emotion classification, we first estimate parameters on the training set and then predict the emotions of the unlabeled documents in the testing set. Figure 5a and b present the accuracy and Cohen’s Kappa score of emotion classification on ISEAR. nSLTM achieves the highest accuracy and Kappa score when |Z|≥ 100, showing the effectiveness of neural algorithms when mining large numbers of topics. UGTE_ALL, UGTE_ID, and nSLTM perform better than CSTM, MSTM, SLTM, and sNTM. UGTE_ID performs better than UGTE_ALL when |Z|≤ 100 and |Z| = 200, indicating that using many characteristics may have negative effects on emotion detection. In this task, UGTE_ID does not outperform nSLTM. However, nSLTM can neither detect groups of individuals sharing similar interests nor provide group based topic and emotion analysis.

Figure 5 Emotion classification results of UGTE_ID and baselines with different topic numbers when |G| = 10

Statistical tests are performed on the results, as shown in Tables 3 and 4. The variance of UGTE_ID is statistically different from those of MSTM, SLTM, and nSLTM on both accuracy and Kappa score, indicating that UGTE_ID performs more stably than MSTM, SLTM, and nSLTM. The mean of UGTE_ID is statistically different from those of MSTM, SLTM, CSTM, sNTM, and nSLTM, showing that UGTE_ID outperforms MSTM, SLTM, CSTM, and sNTM.

4.4 Impact of user characteristics

Our proposed model aggregates characteristics tags to conduct group based topic and emotion analysis. To explore the impact of different characteristics tags on UGTE, topic discovery and emotion classification are performed over 12 variant models of UGTE, each with the same range of topic numbers as in the above experiments. Average results are shown in Table 5. Not all characteristics help UGTE obtain good results. Compared with UGTE_NULL, some characteristics, such as ID, CITY, COUNT, SEX, AGE, RELI, PRAC, and FOCC, have positive effects on average under Coherence@10. Under Coherence@20, only ID, CITY, SEX, AGE, PRAC, FOCC, and MOCC have positive effects on average, while the others have negative effects with lower coherence values. In the task of emotion classification, the results show that AGE and SEX can be helpful. UGTE_ALL performs worse than the other variant models of UGTE on both topic discovery and emotion classification, which verifies that not all characteristics tags help to discover user groups. Statistical tests are performed on the results of UGTE_NULL versus the other variant models, and the values are shown in Tables 6 and 7. Different characteristics tags do not yield statistically significant differences in terms of variances or means under Coherence@20, Coherence@30, and Accuracy.

Table 5 The mean and variance over different characteristics tags where the best results are highlighted in boldface
Table 6 P-values of F-test over UGTE_NULL and different characteristics tags
Table 7 P-values of T-test over UGTE_NULL and different characteristics tags

4.5 Performance on Extremely Short Text

To further explore the performance of joint topic-emotion models on extremely short texts, we run UGTE_ID and MSTM on the “short text” and “extremely short text” subsets and present the results in Figures 6 and 7. The results indicate that both UGTE_ID and MSTM achieve higher coherence values on “short text” and perform somewhat unstably on “extremely short text”, suggesting that extremely short texts pose a more serious feature sparsity problem to joint topic-emotion models. However, by exploiting user characteristics, UGTE_ID consistently achieves higher coherence values than MSTM. According to Figure 6a and b, UGTE_ID achieves the best results on both “short text” and “extremely short text”. In Figure 6c, UGTE_ID performs better than MSTM when |Z|≤ 200, since UGTE may be more suitable for discovering a small number of topics; moreover, UGTE_ID is always much more stable than MSTM. In the task of emotion detection, UGTE_ID achieves much higher accuracy and Kappa scores than MSTM. On the “extremely short text” subset, UGTE_ID achieves a best accuracy of 0.4594 and a best Kappa score of 0.3693, while MSTM only achieves a best accuracy of 0.1849 and a best Kappa score of 0.0452. These results show that user characteristics can improve model performance on both topic discovery and emotion detection by addressing the feature sparsity problem of short texts.

Figure 6 Topic coherence of UGTE_ID and MSTM over “short text” and “extremely short text” subsets

Figure 7 Emotion detection results of UGTE_ID and MSTM over “short text” and “extremely short text” subsets

4.6 Case study

To verify the effect of user characteristics (e.g., “Age”) on topic discovery and emotion detection, we compare UGTE_AGE and UGTE_NULL under |Z| = 100 and |G| = 10. Table 8 presents 10 representative words of topics with the joy and sadness emotions, respectively. By checking these top words manually, we conclude that the topics are related to “Intimate relationship”. The results indicate that the words under the emotion of “joy” generated by UGTE_AGE are more about “Love” and “Wedding”. On the other hand, “period”, “helped”, “smoking”, and “job” discovered by UGTE_NULL seem less associated with “Love” or “Wedding”, making the topic less coherent. Similarly, the topic under the emotion of “sadness” discovered by UGTE_AGE is more about “injury” and “death”. By contrast, many words from UGTE_NULL are incoherent, such as “announced” and “handing”.

Table 8 Top 10 words of selected topic “Intimate relationship”

Different from conventional joint topic-emotion models, our method can produce portraits of the discovered groups, presented as distributions over characteristics. The user portraits in Figure 8 show that the age distribution of group 1 peaks in the 11 to 20 range, while users in group 2 are mostly between 41 and 50. Besides, group 1 is slightly less concerned with the topic than group 2. As shown in Figure 8c, the emotions of users in group 1 are more about “fear” and “joy”, which coincides with the mentality of youth. In contrast, users in group 2 feel more “sadness” and “guilt” about the “Intimate relationship” topic, which can be understood from the top words of the topic with the “sadness” emotion in Table 8. As shown in Figure 8d, UGTE_AGE achieves higher accuracy than UGTE_NULL, which validates that the user characteristic “Age” can help to improve the performance of emotion detection.

Figure 8 Portraits of selected groups generated by UGTE_AGE

5 Conclusion and future work

We proposed UGTE, a model that jointly captures topics, emotions, and user characteristics, and that explores the relationships among them across different user groups. Short messages, which are popular online, pose feature sparsity challenges to traditional joint topic-emotion models; by introducing a user group layer into topic-based emotion detection, UGTE efficiently aggregates short texts into lengthy pseudo-documents to address this problem. Experiments on the real-world ISEAR dataset showed that UGTE is not only effective for emotion detection but can also mine the significant topics of concern to each user group. With the development of neural network technologies, we plan to combine our model with neural networks to improve its capacity for modeling topics and emotions at the user group level. Considering the generality of the model, we also plan to develop a general framework for topic discovery that integrates other information, such as word position, context relevance, and so forth.