1 Introduction

The rapid growth of social media platforms has led to an increasing number of people expressing their emotions through short messages [20]. To extract value from this type of data for social emotion mining and monitoring, it is necessary to perform topic discovery and emotion detection to identify the topics and emotions embedded in short texts [11].

Topic discovery aims to model topics from documents based on their content, and emotion detection identifies emotions from documents at the word, sentence, or document level. In the scenario of emotion mining, public emotions vary from one topic to another, and topics trigger public emotions. Topic discovery and emotion detection are therefore closely related, so jointly modeling topics and emotions is an appropriate way to conduct these tasks [34]. Furthermore, we may want to learn not only the emotions of a single document or user, but also aggregate results over groups of individuals sharing similar interests. For example, editors of magazines often want to identify common interests among readers to ensure that all of the major interests are covered in each issue. The editors are also interested in their readers’ characteristics (e.g., sex, age, and education level) to keep the magazine’s content appropriate. Thus, there are practical reasons for jointly modeling topics, emotions, and user groups [33].

However, there are challenges to effectively detecting topics and emotions in short texts. First, each short message includes only a few words, resulting in a lack of meaningful context [36]. Models applied directly to such texts often suffer from the feature sparsity problem, leading to undesirable results. Second, there is the question of which information should be used when discovering representative groups of users. Third, there is the question of how to model content, emotions, and user information within groups so as to capture the relationships between topics, emotions, and users.

Conventional emotion-aware topic models only present results at the word, sentence, or document level. The Emotion Topic Model (ETM) [2] assumes every word is selected according to specific topics and emotions. The Multi-label Supervised Topic Model (MSTM) and Sentiment Latent Topic Model (SLTM) [24] first discover topics within each document and then analyze emotions towards those topics at the document level. These methods cannot produce high-level results for groups of users, which we denote as the user group level in this paper. They also suffer from the sparsity problem in short texts. Time-User Sentiment/Topic Latent Dirichlet Allocation (TUS-LDA) [32] aggregates short texts from a single user or a single time interval into lengthy pseudo-documents to tackle this problem when detecting burst topics and social sentiment feedback. TUS-LDA can work at the user level when topics belong to a user’s static interests, or at the global level when topics relate to current social issues. However, TUS-LDA can discover neither groups of users nor the differences in topical interests and emotions between groups. Moreover, TUS-LDA needs pre-built sentiment lexicons, which may be unavailable when dealing with a new emotion label.

Regarding the issue of how to divide users into groups, we observe that the more similar people are, the more likely they are to share similar interests. For example, in the 45th US presidential election, the Washington Post, an authoritative newspaper, used several sets of data to illustrate the characteristics of Donald Trump’s supporters: the proportion of male supporters was 19% higher than that of female supporters, and 50% of those with annual incomes below $50,000 supported Trump, versus 32% of those with higher incomes. Such data broadly support the theory of homophily, which relates similarity of interests to similarity of emotions [16]. Based on this phenomenon, this paper exploits user characteristics, content, and emotions to carry out topic discovery and emotion detection at the user group level.

We propose a method for emotion detection and topic discovery aided by user characteristics, called the User Group based Topic Emotion (UGTE) model. Our main contributions are summarized as follows. Firstly, UGTE models user characteristics, emotions, and content jointly to improve the effectiveness of both emotion detection and topic discovery. By influencing the process of topic generation, user characteristics help to identify semantic group structures. As mentioned above, individuals with similar characteristics are more likely to share similar emotions. Therefore, when analyzing user group level results, UGTE considers not only the topic and emotion information of individuals, but also the effects of user groups, as characterized by gender, income, education, and other attributes. Secondly, UGTE aggregates short texts into lengthy pseudo-documents and jointly models topics and emotions within each group to address the feature sparsity problem. Finally, unlike existing methods, UGTE not only captures the relationship between topics and emotions for every group, presented as distributions over words, but also produces portraits of these groups, presented as distributions over characteristics.

The rest of this paper is organized as follows. Section 2 reviews related work on topic models for short texts, joint topic-emotion modeling, and community based sentiment/emotion detection. Section 3 presents the proposed model and the inference of model parameters. Section 4 presents our experiments and discussions. Section 5 concludes the paper.

2 Related work

2.1 Short text topic models

Topic models provide a solution for implicit semantic mining and understanding. Probabilistic Latent Semantic Analysis (PLSA) [10] is one of the first latent semantic models, using the expectation-maximization (EM) algorithm for parameter inference. Since PLSA suffers from overfitting, Latent Dirichlet Allocation (LDA) [3] introduces the Dirichlet distribution as the conjugate prior of topics. In recent years, LDA has achieved great success in information retrieval [21] and topic modeling [6, 15]. However, both LDA and PLSA perform well only when mining topics from lengthy documents. Nowadays, texts from the Internet are typically short and lack context, so the feature sparsity problem arises when LDA and PLSA are applied to short texts [36].

To overcome this limitation, external document embedding was first introduced to enrich the contextual information of short texts [14, 19, 28]. This method is effective, but the enriching documents are not always consistent with the original messages, so it may have no effect or even a negative effect on the results. In addition, finding the auxiliary data is expensive and time-consuming. Besides external document embedding, the Biterm Topic Model (BTM) [7] is an alternative, built on the idea that two words are more likely to belong to the same topic if they co-occur more frequently. Such methods lengthen short texts by converting documents into biterm sets. However, biterm-based methods bring in little additional word co-occurrence information and therefore still face the feature sparsity problem [38]. Another approach is to aggregate short texts into lengthy pseudo-documents, which addresses the feature sparsity problem without carefully selecting external documents [39]. Twitter LDA [37] aggregates posts from a single user into a pseudo-document to identify topics from the words. TimeUserLDA [8] aggregates posts by user or timestamp to detect “breakout” topics, which fall into two categories: personal static topics and temporal dynamic topics. Similar to TimeUserLDA, the model that incorporates temporal, personal and extraction factors (TUK-TTM) [35] aggregates posts by time slice or user to produce personalized time-aware tag recommendations. However, these models cannot be applied to emotion detection. To mine burst topics on social media, TUS-LDA [32] introduces a sentiment variable for every post aggregated in pseudo-documents. Taking advantage of the aggregation method, our proposed model also uses this idea to address the feature sparsity problem. UGTE differs, however, by aggregating short messages from a user group into a pseudo-document to produce group level results.

2.2 Jointly modeling topics and emotions

Data from the Internet contain users’ opinions and emotions. In recent years, to jointly model topics and emotions, several researchers have extended topic models to perform emotion detection on user-generated text, such as product and movie reviews [34]. ETM [2] uses emotion labels to implement a supervised emotion topic model for social emotion mining. Different from ETM, which was developed from the writer’s perspective, MSTM and SLTM [24] model topics and sentiment labels from the perspective of readers; experiments show that they are more suitable for public voting articles when mining social emotions. The Contextual Sentiment Topic Model (CSTM) [23] classifies reader emotions by explicitly distinguishing context-independent topics from nondiscriminative information, such as very common words, and from a contextual theme that characterizes context-dependent information across different collections. However, the models mentioned above are applied to regular documents rather than short texts. The Weighted Labeled Topic Model (WLTM) [25], built on BTM, jointly models multiple emotion labels and biterms for short-text emotion detection. Beyond LDA-based methods, neural topic models have recently emerged for topic discovery and supervised learning. The Supervised Neural Topic Model (sNTM) [4] extracts topics with a neural network that follows the document-topic distribution of topic models; however, the observed labels have little effect on its topic discovery process. The Neural Siamese Labeled Topic Model (nSLTM) [12] incorporates the supervision of labels into topic modeling and can be applied to both classification and regression.

Previous joint topic-emotion models only model topics and emotions at the word, sentence, or document level. They capture neither the emotions and topics within groups of users, nor which users would be interested in specific topics. Our UGTE approach integrates user characteristics with topics and emotions so that model performance can be enhanced by exploiting the relationships among topics, emotions, and users.

2.3 Community based sentiment/emotion detection

LDA-based topic models are widely applied to community detection, which has been studied from the perspectives of network structural communities and semantic communities. Since most methods for detecting network structural communities use graph partitioning algorithms that consider only users’ relationships or interactions [18], we do not discuss them here due to their lack of relevance. Different from network structural community detection, semantic community detection takes both network structure and users’ semantic attributes into consideration. For example, the Group-Topic (GT) model uses entity relationships and textual attributes to simultaneously discover topics for events and communities among the entities [31]. The Topic User Community Model (TUCM) uses social links, interaction types, and context information to detect communities [27]. However, these methods do not take sentiments or emotions into consideration. To conduct sentiment analysis, the Sentiment-Topic model for Community discovery (STC) aggregates topics, sentiments, and interactions among users to detect sentiment-topic level communities [33]. The work in [30] detects sentiment communities using social relationships between users, context, and sentiment labels, but it is unable to discover topics or opinions across communities. The People Opinion Topic (POT) model introduces opinion based community detection to discover hot topics and analyze sentiment along with detecting social communities [5].

The methods mentioned thus far integrate sentiment analysis and community detection to improve model performance on both tasks. However, there are several differences between our work and these studies. First, existing community detection models mostly aim to discover the best structural community by examining users with more interactions; even when they take sentiments or user context into consideration, they fail to extract the topics or sentiments of different communities. Instead, our model attempts to discover topics and emotions at the group level, which not only models topics and emotions but also identifies people sharing similar interests within the same group. Second, most community detection models depend on information from users’ social relationships or online interactions. None of the existing community detection models employ users’ characteristics tags to analyze the relationships between users’ interests and their profiles. However, in many cases, a decision maker may want to know what different groups of users think of an event, and which characteristics have the largest influence. Our model achieves such high-level results where other models fail.

3 User group based topic emotion model

In this section, we introduce our UGTE model and present its structure. After defining the problem and the relevant terms and notation, we describe our model in detail and present our method for learning its parameters.

3.1 Problem definition

Given a set of documents D = {d1, d2,..., d|D|} with |D| elements, the vocabulary of D is W = {w1, w2,..., w|W|} of size |W|, the set of globally distinct emotion labels is E = {e1, e2,..., e|E|} with |E| elements, and the set of users is U = {u1, u2,..., u|U|} of size |U|. Suppose every document in D is generated by one user and labeled with one of the above emotions. Each document di can then be denoted as \(d_{i}^{r, k}\), meaning that document di is generated by user ur and labeled with emotion ek. The words in document di are denoted as \(W_{d_{i}}=\{w_{i,1}, w_{i,2}, ..., w_{i, N_{i}}\}\), where Ni is the total number of words in document di.

To discover user group based topics and emotions, we need to exploit user characteristics such as age, gender, and country. For the J types of characteristics collected, we denote the set of characteristics tags of the j-th type as \(F_{j} = \{ f_{j,1}, f_{j,2}, ..., f_{j,|F_{j}|}\}\) with |Fj| elements. We denote the characteristics tags of each user ur as \(F^{u_{r}}=\{{f_{1}^{r}}, {f_{2}^{r}}, ..., {f_{J}^{r}}\}\), where \({f_{j}^{r}} \in F_{j}\) is user ur’s tag of the j-th characteristic type. For example, assume that three users u1, u2, and u3 have characteristics tags \(F^{u_{1}}=\{\) ‘Male’, ‘22’, ‘America’ }, \(F^{u_{2} }=\{\) ‘Female’, ‘23’, ‘America’ } and \(F^{u_{3}}=\{\) ‘Male’, ‘24’, ‘America’ }, respectively. There are J = 3 types of characteristics tags: gender, age, and country. From the tag values, we determine that \(F_{1}=\{\) ‘Male’, ‘Female’ } with |F1| = 2, F2 = { ‘22’, ‘23’, ‘24’ } with |F2| = 3, and F3 = { ‘America’ } with |F3| = 1.
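Concretely, this notation might be held in simple containers like the following sketch (the class and field names are ours, purely for illustration; the paper releases no code):

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical containers mirroring the notation of Section 3.1.
@dataclass
class User:
    user_id: str
    tags: Dict[str, str]   # F^{u_r}: one tag value per characteristic type j

@dataclass
class Document:
    words: List[str]       # W_{d_i} = {w_{i,1}, ..., w_{i,N_i}}
    user_id: str           # u_r: the author of the document
    emotion: str           # e_k: the observed emotion label

# The three example users u1, u2, u3 from the text.
users = {
    "u1": User("u1", {"gender": "Male",   "age": "22", "country": "America"}),
    "u2": User("u2", {"gender": "Female", "age": "23", "country": "America"}),
    "u3": User("u3", {"gender": "Male",   "age": "24", "country": "America"}),
}

# F_j: the distinct tag values observed for each characteristic type j.
tag_vocab = {
    j: sorted({u.tags[j] for u in users.values()})
    for j in ["gender", "age", "country"]
}
# tag_vocab == {'gender': ['Female', 'Male'], 'age': ['22', '23', '24'],
#               'country': ['America']}, i.e. |F_1|=2, |F_2|=3, |F_3|=1.
```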

Our primary task is to jointly discover the topics \(Z=\{z_{1}, z_{2}, ..., z_{|Z|} \}\) of size |Z| and the emotions of the given documents at the user group level. In other words, we must detect the different user groups \(G = \{g_{1}, g_{2}, ..., g_{|G|}\}\) with |G| elements, infer the topic distribution 𝜃g of each group, and simultaneously analyze the emotion distribution ϕg, z of each topic within each group. In our UGTE model, the user groups are latent variables, as are the topics. The number of user groups is a predefined parameter, whose impact is examined in Section 4.2. Table 1 provides a summary of the notations used in our presentation.

Table 1 Notations used in UGTE

3.2 Generative process

Conventional joint topic-emotion models focus on the association between emotions and topics at the level of documents or users. To model topics and emotions jointly at the user group level, we propose the UGTE model, which adds a user group layer to the generation module of topics and emotions. Figure 1 shows the structure of UGTE. In UGTE, every user ur is related to a global group distribution π. Every group g is associated with its own characteristics tag distributions ψg, j, topic distribution 𝜃g, and emotion distributions ϕg, z. For a document \(d_{i}^{r,k}\), the user group \(g_{d_{i}}\) of this document is sampled according to π. After the group assignment is determined, each of the user’s characteristics tags \({f_{j}^{r}}\) is generated from the characteristic distribution ψg, j. UGTE identifies topics and emotions according to each group’s parameters by assuming that the documents of a group follow the same topic distribution, and by introducing emotions to topics in each group separately. When a user writes each word wi, n in document \(d_{i}^{r,k}\), s/he first chooses a topic zi, n from the group’s topic distribution 𝜃g. Then, emotion ei, n is drawn from the emotion distribution ϕg, z. According to the chosen topic and emotion, the user draws word wi, n from the word distribution φz, e.

Figure 1 The graphical model of UGTE

With respect to group membership, it is reasonable and natural for UGTE to assume that one document belongs to one group while a user with several documents can belong to multiple groups with different probabilities. Although a short message often expresses one central idea, a user may write several messages on different topics with different attitudes. For example, if a person is fond of comics but has little interest in political news, s/he may regularly post about comics but far less often about political news. Such a person would be strongly related to a group whose users are fond of comics and weakly related to another group with heated political discussion. Besides, different from conventional joint topic-emotion models, one of the contributions of UGTE is its use of each user’s characteristics tags for group discovery. UGTE identifies groups according to the documents’ topics, emotion labels, and the characteristics tags of the corresponding users. The basis for this idea is that people sharing similar characteristics are more likely to share similar emotions on specific topics, and can thus be treated as a group. For example, individuals from different regions or social classes and of different ages are often interested in different topics; even for a given topic, different groups of people may hold different emotions. The size of the group set G is a predetermined parameter, like that of the topic set Z, enabling UGTE to mine group based emotions at different granularities.

Formally, the generative process for each document is as follows (a code sketch of this process appears after the list):

  1. For each group g and each characteristic type j, draw \(\psi _{g,j} \sim Dirichlet(\lambda )\);

  2. Draw the distribution over groups \(\pi \sim Dirichlet(\gamma )\);

  3. For each group g, draw the distribution over topics \(\theta _{g} \sim Dirichlet(\alpha )\);

  4. For each topic z of each group g, draw the distribution over emotions \(\phi _{g,z} \sim Dirichlet(\mu )\);

  5. For each topic z and emotion e, draw the distribution over words \(\varphi _{z,e} \sim Dirichlet(\upbeta )\);

  6. For each document \(d_{i}^{r,k}\):

     (a) Draw group \(g_{r} \sim Multinomial(\pi )\);

     (b) Draw each characteristics tag \({f_{j}^{r}} \sim Multinomial(\psi _{g_{r},j})\);

     (c) For each word \(w_{i,n}\) in document \(d_{i}^{r,k}\):

         (i) Draw topic \(z_{i,n} \sim Multinomial(\theta _{g_{r}})\);

         (ii) Draw emotion \(e_{i,n} \sim Multinomial(\phi _{g_{r}, z_{i,n}})\);

         (iii) Draw word \(w_{i,n} \sim Multinomial(\varphi _{e_{i,n}, z_{i,n}})\).
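To make the process concrete, the following is a minimal NumPy sketch of steps 1-6 under assumed sizes (all names, sizes, and the use of NumPy are our own choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; in the paper |G| and |Z| are predefined parameters.
G, Z, E, W, J = 10, 25, 7, 5000, 3
F_sizes = [2, 3, 1]                      # |F_j| per characteristic type
gamma, lam, alpha, mu, beta = 0.1, 0.1, 50.0 / Z, 0.1, 0.1  # alpha = 50/|Z|

# Steps 1-5: draw all Dirichlet-distributed parameters.
psi = [rng.dirichlet(np.full(F_sizes[j], lam), size=G)     # psi_{g,j}: tags per group
       for j in range(J)]
pi = rng.dirichlet(np.full(G, gamma))                      # global group distribution
theta = rng.dirichlet(np.full(Z, alpha), size=G)           # theta_g: topics per group
phi = rng.dirichlet(np.full(E, mu), size=(G, Z))           # phi_{g,z}: emotions per topic
varphi = rng.dirichlet(np.full(W, beta), size=(Z, E))      # varphi_{z,e}: words

# Step 6: generate one document of n_words words.
def generate_document(n_words: int):
    g = rng.choice(G, p=pi)                                # 6(a) sample the group
    tags = [rng.choice(F_sizes[j], p=psi[j][g]) for j in range(J)]  # 6(b) tags
    words, topics, emotions = [], [], []
    for _ in range(n_words):                               # 6(c) per-word draws
        z = rng.choice(Z, p=theta[g])                      # (i)   topic
        e = rng.choice(E, p=phi[g, z])                     # (ii)  emotion
        w = rng.choice(W, p=varphi[z, e])                  # (iii) word
        words.append(w); topics.append(z); emotions.append(e)
    return g, tags, words, topics, emotions
```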

3.3 Parameter inference

As in other joint topic-emotion models, exact inference of the latent variables in our model is intractable. To address this, Gibbs sampling [9] or variational inference [3] is often employed. Gibbs sampling is a special case of Markov Chain Monte Carlo [13] and can draw accurate samples from the posterior distribution for parameter inference, whereas variational inference provides only an analytic approximation. Furthermore, it is mathematically arduous to derive the variational approximation when the model structure is complex. Thus, following previous work [2, 7, 32], we use Gibbs sampling to discover groups and to model topics and emotions. According to the generative process, the joint probability of all the random variables for a document collection is as follows:

$$ \begin{array}{@{}rcl@{}} & & p(z, w, e, g, f, \psi, \pi, \theta, \phi, \varphi, \alpha, \upbeta, \mu, \lambda, \gamma) \\ & = & p(\pi; \gamma) p(\psi; \lambda) p(\theta; \alpha) p(\varphi; \upbeta) p(\phi; \mu) \\ & & p(g | \pi) p(f | g, \psi) p(z | g, \theta) p(e |g, z, \phi) p(w | z, e, \varphi) . \end{array} $$
(1)

During group discovery, the posterior probability for inferring the group gr of a user ur can be derived by marginalizing the above joint probability. The posterior probability depends on the user’s characteristics tags \(F^{u_{r}}\), the topics, and the emotion label of document di, as follows:

$$ \begin{array}{@{}rcl@{}} & & p(g_{r} = g \mid F^{u_{r}}, { d_{i}^{r,k}},{g}_{-r}, \alpha, \upbeta, \mu, \lambda, \gamma ) \\ & \propto & p(g_{r} = g \mid{g}_{-r}, \gamma) p(F^{u_{r}} \mid{g}_{-r}, \lambda) p(d_{i}\mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) \\ & \propto & p(g_{r} = g \mid{g}_{-r}, \gamma) \prod\limits_{j=1}^{J} p({f_{j}^{r}} = f \mid{g}_{-r}, \lambda) \prod\limits_{n=1}^{N_{i}} p(w_{i, n} \mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) . \end{array} $$
(2)

In particular, the emotion label di, e is an observable variable, so the posterior probability that a group generates a word with a specific emotion can be derived by marginalizing out the topic variable according to (1), as follows:

$$ \begin{array}{@{}rcl@{}} & & p(w_{i, n} \mid {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu) \\ & = & \sum\limits_{z=1}^{|Z|} p(z_{i,n} = z \mid{g}_{-r}, \alpha ) p(e_{i, n} = e_{k} \mid z,{g}_{-r}, \mu) p(w_{i, n} \mid z, e_{i,n} = e_{k},{g}_{-r}, \upbeta). \end{array} $$
(3)

According to the detailed derivation, we can estimate the posterior probability by (4):

$$ \begin{array}{@{}rcl@{}} & & p(g_{r}= g \mid F^{u_{r}}, d_{i}, {d_{i,e}=e_{k}},{g}_{-r}, \alpha, \upbeta, \mu, \lambda, \gamma ) \\ & \propto & \frac{N_{-r}^{g}+\gamma}{{\sum}_{g'=1}^{|G|}(N_{-r}^{g'}+ \gamma)} \times \prod\limits_{j=1}^{J} \frac{N_{f,j}^{g,-r}+\lambda}{{\sum}_{f'=1}^{|F_{j}|} N_{f',j}^{g, -r}+ |F_{j}| \lambda} \\ & & \times \prod\limits_{n=1}^{N_{i}} \sum\limits_{z=1}^{|Z|} \frac{ N_{z}^{-i, g}+\alpha }{ {\sum}_{z'=1}^{|Z|} (N_{z'}^{-i,g} + \alpha)} \times \frac{ N_{{e_{k}}}^{-i, g,z}+\mu}{ {\sum}_{e'=1}^{|E|} (N_{e'}^{-i,g,z} + \mu) } \times \frac{ N_{w_{i,n}}^{-i,z,e}+\upbeta }{ {\sum}_{w'=1}^{|W|} (N_{w'}^{-i,z,e} + \upbeta) }, \end{array} $$
(4)

where \(N_{-r}^{g}\) is the number of users assigned to group g excluding user ur, and \(N_{f,j}^{g, -r}\) is the number of characteristics tags fj assigned to group g excluding tag \({f_{j}^{r}}\). Furthermore, \(N_{z}^{-i, g}\) is the number of words assigned to topic z in group g excluding words of document di, \( N_{e}^{-i, g,z}\) is the number of words assigned to emotion e of topic z in group g excluding words of document di, and \(N_{w}^{-i,z,e}\) is the number of words w assigned to emotion e of topic z excluding words of document di.
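As an illustration, the group-sampling step of (4) could be implemented over such count arrays roughly as follows; this is a sketch under our own array layout (not the authors' code), with the product over words accumulated in log space to avoid underflow:

```python
import numpy as np

def sample_group(doc_words, doc_emotion, user_tags, counts, priors, rng):
    """Sample g_r for one document per Eq. (4). The count arrays are assumed
    to already exclude the current user/document (the -r and -i terms)."""
    N_g, N_fj_g, N_z_g, N_e_gz, N_w_ze = counts   # shapes: (G,), [(G,|F_j|)], (G,Z), (G,Z,E), (Z,E,W)
    gamma, lam, alpha, mu, beta = priors
    G, Z = N_z_g.shape

    log_p = np.log(N_g + gamma)                              # group prior term
    for j, f in enumerate(user_tags):                        # characteristic-tag terms
        Fj = N_fj_g[j].shape[1]
        log_p += np.log((N_fj_g[j][:, f] + lam) /
                        (N_fj_g[j].sum(axis=1) + Fj * lam))

    # Per-word term: sum over topics of p(z|g) p(e_k|g,z) p(w|z,e_k).
    E = N_e_gz.shape[2]
    W = N_w_ze.shape[2]
    p_z = (N_z_g + alpha) / (N_z_g.sum(axis=1, keepdims=True) + Z * alpha)   # (G,Z)
    p_e = (N_e_gz[:, :, doc_emotion] + mu) / (N_e_gz.sum(axis=2) + E * mu)   # (G,Z)
    denom_w = N_w_ze[:, doc_emotion, :].sum(axis=1) + W * beta               # (Z,)
    for w in doc_words:
        p_w = (N_w_ze[:, doc_emotion, w] + beta) / denom_w                   # (Z,)
        log_p += np.log((p_z * p_e * p_w[None, :]).sum(axis=1))

    p = np.exp(log_p - log_p.max())                          # stable normalization
    return rng.choice(G, p=p / p.sum())
```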

After sampling the group of document di, the assignments of topics and emotions to words can be inferred from the parameters characterizing the group. Differing from conventional unsupervised LDA-based models, UGTE is a supervised joint topic-emotion model that utilizes the emotion labels of documents when performing Gibbs sampling. Inspired by Labeled LDA [22], we incorporate supervision by constraining the emotion assignments of words to be the same as the emotion labels of the corresponding documents. We formulate this process as follows:

$$ \begin{array}{@{}rcl@{}} & & p(z_{i,n}=z,e_{i,n}={e_{k}} \mid {d_{i,e}=e_{k}},{z_{-i,n}},{e_{-i,n}}, w_{i,n}, g, \theta, \phi, \varphi, \alpha, \upbeta, \mu) \\ & \propto & p(w_{i,n}=w \mid z_{i,n}=z, e_{i,n}={e_{k}}, {d_{i,e}=e_{k}},{z_{-i,n}},{e_{-i,n}}, \theta, \phi, \varphi, \alpha, \upbeta, \mu) \\ & \propto & \frac{N_{z}^{g,-w_{i,n}}+\alpha}{{\sum}_{z'=1}^{|Z|} (N_{z'}^{g,-w_{i,n}}+\alpha)} \cdot \frac{N_{{e_{k}}}^{g,z,-w_{i,n}}+\mu}{{\sum}_{e'=1}^{|E|}(N_{e'}^{g,z,-w_{i,n}}+\mu)} \cdot \frac{N_{w}^{z,{e_{k}},-w_{i,n}}+\upbeta }{ {\sum}_{w'=1}^{|W|} (N_{w'}^{z,{e_{k}}, -w_{i,n}}+\upbeta)}. \end{array} $$
(5)
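The per-word step of (5) then reduces to sampling only the topic, since the emotion is clamped to the document label ek; continuing the sketch above (same assumed array layout):

```python
def sample_topic(w, e_k, g, counts, priors, rng):
    """Sample z_{i,n} per Eq. (5) with e_{i,n} fixed to the document label e_k.
    Count arrays are assumed to exclude the current word (the -w_{i,n} terms)."""
    _, _, N_z_g, N_e_gz, N_w_ze = counts
    _, _, alpha, mu, beta = priors
    Z, E, W = N_w_ze.shape
    p = ((N_z_g[g] + alpha) / (N_z_g[g].sum() + Z * alpha)                  # p(z | g)
         * (N_e_gz[g, :, e_k] + mu) / (N_e_gz[g].sum(axis=1) + E * mu)      # p(e_k | g, z)
         * (N_w_ze[:, e_k, w] + beta) / (N_w_ze[:, e_k, :].sum(axis=1) + W * beta))
    return rng.choice(Z, p=p / p.sum())
```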

After the sampling process converges according to (4) and (5), the distributions π, ψg, j, 𝜃g, ϕg, z and φz, e can be estimated according to (6)-(10), as follows:

$$ \begin{array}{@{}rcl@{}} \pi_{g} & = & \frac{N_{g}+\gamma}{{\sum}_{g'=1}^{|G|} (N_{g'}+ \gamma)}, \end{array} $$
(6)
$$ \begin{array}{@{}rcl@{}} \psi_{g,j,f} & = & \frac{N_{f,j}^{g}+\lambda}{{\sum}_{f'=1}^{|F_{j}|}(N_{f',j}^{g}+\lambda)}, \end{array} $$
(7)
$$ \begin{array}{@{}rcl@{}} \theta_{g, z} & = & \frac{{N_{z}^{g}}+\alpha}{{\sum}_{z'=1}^{|Z|} (N_{z'}^{g} + \alpha)}, \end{array} $$
(8)
$$ \begin{array}{@{}rcl@{}} \phi_{g, z, e} & = & \frac{N_{e}^{g,z}+\mu}{{\sum}_{e'=1}^{|E|} (N_{e'}^{g,z}+\mu)}, \end{array} $$
(9)
$$ \begin{array}{@{}rcl@{}} \varphi_{z, e, w} & = & \frac{N_{w}^{z,e}+\upbeta}{{\sum}_{w'=1}^{|W|} (N_{w'}^{z,e}+\upbeta)}. \end{array} $$
(10)
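Under the same assumed count arrays as in the sketches above, the point estimates in (6)-(10) are simple normalizations:

```python
def estimate_parameters(counts, priors):
    """Point estimates of pi, psi, theta, phi and varphi per Eqs. (6)-(10)."""
    N_g, N_fj_g, N_z_g, N_e_gz, N_w_ze = counts
    gamma, lam, alpha, mu, beta = priors
    pi = (N_g + gamma) / (N_g + gamma).sum()                                  # Eq. (6)
    psi = [(N + lam) / (N + lam).sum(axis=1, keepdims=True) for N in N_fj_g]  # Eq. (7)
    theta = (N_z_g + alpha) / (N_z_g + alpha).sum(axis=1, keepdims=True)      # Eq. (8)
    phi = (N_e_gz + mu) / (N_e_gz + mu).sum(axis=2, keepdims=True)            # Eq. (9)
    varphi = (N_w_ze + beta) / (N_w_ze + beta).sum(axis=2, keepdims=True)     # Eq. (10)
    return pi, psi, theta, phi, varphi
```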

With all the parameters derived above, we can further infer the emotion of an unlabeled document dtest as follows:

$$ \begin{array}{@{}rcl@{}} p(e_{test}=e \mid d_{test}, F^{u_{test}}) & = & \prod\limits_{n=1}^{N_{test}} \sum\limits_{g=1}^{|G|} \sum\limits_{z=1}^{|Z|} \tilde{\pi}_{g} \cdot \prod\limits_{j=1}^{J} \tilde{\psi}_{g,j,{f_{j}^{u_{test}}}} \cdot \tilde{\theta}_{g,z} \cdot \tilde{\phi}_{g,z,e} \cdot \tilde{\varphi}_{z,e,w_{test,n}} , \end{array} $$
(11)

where \(\tilde {\pi }_{g}\), \(\tilde {\psi }_{g,j,f^{u_{test}}}\), \(\tilde {\theta }_{g,z}\), \(\tilde {\phi }_{g,z,e}\), and \(\tilde {\varphi }_{z,e,w}\) are estimated according to (6)-(10), with the corresponding counts computed over the union of the documents in Dtrain and the document dtest.
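For intuition, (11) can be evaluated directly from the estimated parameters; a sketch under our naming (log-space accumulation would be safer for long documents, but the texts here are short):

```python
import numpy as np

def predict_emotion(doc_words, user_tags, params):
    """Score every emotion e for an unlabeled document per Eq. (11)
    and return the argmax. `params` are the estimates from Eqs. (6)-(10)."""
    pi, psi, theta, phi, varphi = params
    E = phi.shape[2]
    group_w = pi.copy()                  # pi~_g * prod_j psi~_{g,j,f_j}
    for j, f in enumerate(user_tags):
        group_w = group_w * psi[j][:, f]
    scores = np.ones(E)
    for w in doc_words:                  # product over words of the sums over g, z
        scores *= np.einsum('g,gz,gze,ze->e',
                            group_w, theta, phi, varphi[:, :, w])
    return int(np.argmax(scores))
```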

4 Experiments

To evaluate our proposed method, we perform topic discovery and emotion classification, and compare our method with other state-of-the-art models.

4.1 Experimental setup

4.1.1 Dataset

We use a real-world dataset to verify the effectiveness of our model. ISEAR is a typical dataset for emotion detection, containing 7,666 sentences/short texts annotated by 1,096 users with different cultural backgrounds. The data were collected in the form of a questionnaire covering the respondents’ personal information, experiences, and expressions of seven emotions, i.e., anger, disgust, fear, joy, sadness, shame, and guilt. Each sample contains 42 attributes, including discrete fields and free-text descriptions. We use 11 discrete attributes (ID, CITY, COUNTRY, SEX, AGE, RELI, PRAC, FOCC, MOCC, FIEL, and EMOT) together with the text content (SIT) for experiments. After pre-processing by removing stop words and filtering punctuation marks, 7,652 samples remain for the experiments. By default, we use 80% of the data (6,122 samples) as the training set and the remaining 20% (1,530 samples) as the testing set. To further explore how effectively the model addresses the feature sparsity problem brought by extremely short texts, we divide the samples into two groups based on content length: texts longer than 10 words form the “short text” subset, while the rest form the “extremely short text” subset. For each subset, 80% of the samples are used for training and the remaining 20% for testing. Specifically, there are 891 training samples and 223 testing samples in the “short text” subset, and 5,230 training samples and 1,308 testing samples in the “extremely short text” subset. Details of the attributes of ISEAR are shown in Table 2.

Table 2 Selected attributes of ISEAR
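A sketch of the preprocessing and splits described above (the CSV file name, the stop-word list, and the use of pandas and scikit-learn are all our assumptions):

```python
import re
import pandas as pd
from sklearn.model_selection import train_test_split

STOP_WORDS = set(open("stopwords.txt").read().split())   # assumed stop-word list

def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", " ", text.lower())         # filter punctuation marks
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

df = pd.read_csv("isear.csv")                            # hypothetical ISEAR export
df["SIT"] = df["SIT"].astype(str).map(clean)             # SIT holds the text content
df = df[df["SIT"].str.len() > 0]                         # 7,652 samples remain (paper)

# Default 80/20 split (6,122 train / 1,530 test in the paper).
train, test = train_test_split(df, test_size=0.2, random_state=0)

# Length-based subsets for the sparsity study (threshold of 10 words).
n_words = df["SIT"].str.split().str.len()
short_text = df[n_words > 10]                            # "short text" subset
extremely_short = df[n_words <= 10]                      # "extremely short text" subset
```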

4.1.2 Baselines

To evaluate the effectiveness of UGTE, we employ several representative algorithms that jointly model topics and emotions/sentiments as baselines: the Author-Topic model (AT) [26], Multi-label Supervised Topic Model (MSTM) and Sentiment Latent Topic Model (SLTM) [24], Contextual Sentiment Topic Model (CSTM) [23], supervised Neural Topic Model (sNTM) [4], and neural Siamese Labeled Topic Model (nSLTM) [12]. AT extends LDA to include authorship information by jointly modeling users and topics. MSTM and SLTM are topic models for social emotion mining from the perspective of readers. CSTM classifies reader emotions across different contexts by distinguishing context-independent topics from both a background theme and a contextual theme. sNTM is in essence a neural network that follows the document-topic distribution of topic models. nSLTM is a supervised topic model based on the Siamese network, which trades off label-specific word distributions against document-specific label distributions in a uniform framework.

4.1.3 Metrics

Topic coherence [17] is an effective measure of the quality of the topics discovered by a model. For any two of the top-n words of a topic, the more frequently the words co-occur within documents, the better the generated topic. Coherence@n measures a model’s topic discovery performance as the average coherence value over all topics, calculated as follows:

$$ \begin{array}{@{}rcl@{}} Coherence@n= \frac{1}{|Z|} \sum\limits_{z=1}^{|Z|} C(z,n), \end{array} $$
(12)
$$ \begin{array}{@{}rcl@{}} C(z,n)=\sum\limits_{i=2}^{n} \sum\limits_{j=1}^{i-1} \log \frac{D(w_{z,j}, w_{z,i} )+1}{D(w_{z,i} )} , \end{array} $$
(13)

where C(z, n) is the coherence value of topic z based on its top-n words, wz, i is the i-th most probable word of topic z, D(wz, i) is the number of documents in the dataset in which word wz, i appears, and D(wz, j, wz, i) is the number of documents in which wz, j and wz, i co-occur. For the task of emotion classification, accuracy and Cohen’s kappa score [1] are used as the evaluation metrics.
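A direct implementation of (12)-(13) over document-level frequencies might look as follows (names are ours; `docs` is assumed to be a list of token sets):

```python
import numpy as np

def coherence_at_n(top_words, docs, n=10):
    """Average topic coherence per Eqs. (12)-(13). top_words[z] lists the
    words of topic z in descending probability; docs are token sets."""
    def D(*ws):  # number of documents containing all the given words
        return sum(1 for d in docs if all(w in d for w in ws))
    scores = []
    for words in top_words:
        # i runs over the 2nd..n-th top words, j over all earlier words.
        c = sum(np.log((D(words[j], words[i]) + 1) / D(words[i]))
                for i in range(1, n) for j in range(i))
        scores.append(c)
    return float(np.mean(scores))
```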

4.1.4 Parameter setting

We verify the effectiveness of our proposed model by conducting topic discovery and emotion classification, comparing UGTE with the baselines. For topic discovery, we run all models with different topic numbers |Z|∈{25,30,35,40,45,50,100,150,200,250,300}. We select symmetric Dirichlet priors following other studies [2, 23, 24, 26]: α = 50/|Z|, β = 0.1, γ = 0.1, λ = 0.1, μ = 0.1. We set the number of user groups |G| to 10 based on a preliminary study. For completeness, we also evaluate the influence of the number of user groups on our model in Section 4.2, by setting |Z|∈{25,50,100} and |G|∈{1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100}. Topic discovery and emotion classification in this part are conducted on UGTE_ID (UGTE that exploits only the ID, EMOT, and SIT of users). Since LDA-based models are insensitive to the values of the Dirichlet hyper-parameters [29], we set the parameters of the baselines according to the corresponding papers. In addition to the baselines AT, MSTM, SLTM, CSTM, sNTM, and nSLTM, the proposed UGTE_ALL (UGTE that uses all attributes of users) is also compared with UGTE_ID. We use the training set to estimate model parameters, and then infer parameters and evaluate Coherence@10, Coherence@20, and Coherence@30 on the testing set. For emotion classification, we use a similar process to set parameters and evaluate accuracy and Cohen’s kappa on the testing set. MSTM, SLTM, CSTM, sNTM, nSLTM, UGTE_ALL, and UGTE_ID are adopted for comparison, since AT cannot be applied to emotion classification directly.
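For reference, the settings above amount to a configuration like the following summary sketch (our own notation, not released code):

```python
config = {
    "Z_values": [25, 30, 35, 40, 45, 50, 100, 150, 200, 250, 300],
    "G": 10,                      # number of user groups (see Section 4.2)
    "alpha": lambda Z: 50.0 / Z,  # symmetric Dirichlet prior scaled by |Z|
    "beta": 0.1, "gamma": 0.1, "lambda": 0.1, "mu": 0.1,
    "iterations": 3000,           # Gibbs sampling iterations per run
    "runs": 10,                   # repetitions to report mean and variance
}
```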

Then, to explore the impact of different user characteristics tags on UGTE, topic discovery and emotion classification are conducted on 12 variant models of UGTE: UGTE_NULL, UGTE_ID, UGTE_CITY, UGTE_COUN, UGTE_SEX, UGTE_AGE, UGTE_RELI, UGTE_PRAC, UGTE_FOCC, UGTE_MOCC, UGTE_FIEL, and UGTE_ALL, which refer to models that use no characteristics tags, ID, CITY, COUN, SEX, AGE, RELI, PRAC, FOCC, MOCC, FIEL, and all characteristics tags, respectively. These experiments follow the same procedure as above.

Finally, a case study is conducted to demonstrate how user characteristics help improve the performance of topic discovery and emotion detection. User portraits are illustrated to show how UGTE discovers the relationship between topics and emotions at the group level. The number of iterations is set to 3,000 for all experiments. We run each model 10 times to reduce noise and randomness, and report both the mean and the variance.

4.2 Influence of user group numbers

To investigate the relationship between the number of user groups and model performance, we conduct topic discovery and emotion classification with different numbers of user groups under a fixed number of topics. Results are shown in Figures 2 and 3. From Figure 2 we observe that when |Z| = 25 and |Z| = 50, the coherence score first fluctuates and then decreases as the number of user groups increases, indicating that UGTE_ID performs better with a small number of user groups on this dataset. The optimal number of user groups lies in |G| ∈ {1,2,...,10}, since UGTE_ID performs well for 1 ≤ |G| ≤ 10. However, when |Z| = 100, the performance of UGTE_ID is more stable under all three coherence metrics, with smaller variances, indicating that the number of user groups has little influence on UGTE when the number of topics is large. Although the variances of UGTE_ID with small numbers of topics and user groups are larger, it achieves more competitive results on average. For the task of emotion classification, Figure 3 shows that as the number of user groups increases, UGTE_ID performs more stably under |Z| = 100 than under |Z| = 50 and |Z| = 25. Furthermore, UGTE_ID achieves higher accuracy and Kappa scores when |Z| = 100, indicating that UGTE_ID with a large number of topics performs better in emotion classification.

Figure 2 Topic coherence of UGTE_ID with different numbers of user groups

Figure 3 Emotion classification of UGTE_ID with different numbers of user groups

4.3 Comparison with baselines

4.3.1 Topic discovery

The coherence of topics for our models and the baselines on ISEAR is illustrated in Figure 4. Under Coherence@10, UGTE_ID performs best when |Z|≤ 50, but achieves worse results as the number of topics increases, indicating that UGTE_ID is more suitable for discovering a small number of topics. On the other hand, the baseline CSTM performs better than the other models when |Z|≥ 100. Though UGTE_ALL and UGTE_ID do not achieve competitive results under Coherence@10, they both perform better and more steadily under Coherence@20 and Coherence@30. Under Coherence@30, UGTE_ID achieves the best performance when |Z|≤ 150. The neural network based model nSLTM achieves higher coherence values as the number of topics increases, which indicates that nSLTM is suitable for mining a large number of topics. sNTM does not perform as well, achieving lower coherence values than nSLTM as |Z| increases.

Figure 4 Topic coherence of UGTE_ID and baselines with different topic numbers when |G| = 10

To statistically evaluate the differences between these models, we also perform two kinds of statistical test on paired models: the first evaluates the stability of performance in terms of variances, and the second evaluates average performance in terms of means. The p-values are estimated for both tests at the conventional significance level of 0.05, so that a p-value below 0.05 rejects the null hypothesis at the 95% confidence level and the difference between the paired models is deemed statistically significant. Firstly, an analysis of variance in terms of the F-test is employed to test the underlying assumption of homoscedasticity. F-tests are conducted on UGTE_ID, UGTE_ALL, AT, CSTM, MSTM, SLTM, sNTM, and nSLTM. Results are shown in Table 3, where the significant values are highlighted in boldface. UGTE_ID is statistically significantly different from AT, CSTM, MSTM, SLTM, and sNTM under Coherence@10, Coherence@20, and Coherence@30, indicating that UGTE_ID is statistically more stable than these models over different topic numbers. UGTE_ID differs significantly from nSLTM under Coherence@20 and Coherence@30, indicating that UGTE_ID performs more stably than nSLTM when larger numbers of top words are used in the topic coherence metric.

Table 3 P-values of F-test between UGTE_ID and other models

Secondly, t-tests are conducted to test the null hypothesis that the difference in performance between paired models has a mean of zero (i.e., that the models perform identically). The results are shown in Table 4, where the significant values are highlighted in boldface. We observe that UGTE_ID significantly outperforms the CSTM and SLTM baselines under all three coherence metrics, and is statistically significantly different from AT, MSTM, and sNTM under Coherence@20 and Coherence@30.

Table 4 P-values of T-test between UGTE_ID and other models
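Both tests are straightforward to reproduce with SciPy; a sketch assuming `a` and `b` hold matched per-setting scores of two models (the numbers below are made up purely for illustration):

```python
import numpy as np
from scipy import stats

def f_test(a, b):
    """Two-sided F-test for equality of variances (homoscedasticity)."""
    a, b = np.asarray(a), np.asarray(b)
    F = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    p = 2 * min(stats.f.cdf(F, dfn, dfd), stats.f.sf(F, dfn, dfd))
    return F, p

a = [0.46, 0.45, 0.47, 0.44, 0.46]   # e.g., scores of one model (illustrative)
b = [0.41, 0.38, 0.43, 0.36, 0.40]   # e.g., scores of another model (illustrative)

F, p_var = f_test(a, b)              # variance comparison (as in Table 3)
# Paired t-test over matched settings; use stats.ttest_ind if runs are independent.
t, p_mean = stats.ttest_rel(a, b)    # mean comparison (as in Table 4)
print(f"F-test: p={p_var:.4f}; t-test: p={p_mean:.4f}")
```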

4.3.2 Emotion detection

For the task of emotion classification, we first estimate parameters on the training set and then predict the emotions of the unlabeled documents in the testing set. Figure 5a and b present the accuracy and Cohen’s Kappa score of emotion classification on ISEAR. nSLTM achieves the highest accuracy and Kappa score when |Z|≥ 100, showing the effectiveness of neural algorithms when mining large numbers of topics. UGTE_ALL, UGTE_ID, and nSLTM perform better than CSTM, MSTM, SLTM, and sNTM. UGTE_ID performs better than UGTE_ALL when |Z|≤ 100 and |Z| = 200, indicating that using many characteristics may have negative effects on emotion detection. In this task, UGTE_ID does not outperform nSLTM. However, nSLTM can neither detect groups of individuals sharing similar interests nor provide group based topic and emotion analysis.

Figure 5 Emotion classification results of UGTE_ID and baselines with different topic numbers when |G| = 10

Statistical tests are performed on the results, as shown in Tables 3 and 4. The variance of UGTE_ID is statistically different from those of MSTM, SLTM, and nSLTM on both accuracy and Kappa score, indicating that UGTE_ID performs more stably than MSTM, SLTM, and nSLTM. The mean of UGTE_ID is statistically different from those of MSTM, SLTM, CSTM, sNTM, and nSLTM, showing that UGTE_ID outperforms MSTM, SLTM, CSTM, and sNTM.

4.4 Impact of user characteristics

Our proposed model aggregates characteristics tags to conduct group based topic and emotion analysis. To explore the impact of different characteristics tags on UGTE, topic discovery and emotion classification are performed over 12 variant models of UGTE, each with the same range of topic numbers as in the above experiments. Average results are shown in Table 5. Not all characteristics help UGTE obtain good results. Compared with UGTE_NULL, some characteristics, such as ID, CITY, COUNT, SEX, AGE, RELI, PRAC, and FOCC, have positive effects on average under Coherence@10. Under Coherence@20, only ID, CITY, SEX, AGE, PRAC, FOCC, and MOCC have positive effects on average, while the others have negative effects with lower coherence values. In the task of emotion classification, the results show that AGE and SEX can be helpful. UGTE_ALL performs worse than the other variant models of UGTE on both topic discovery and emotion classification, which verifies that not all characteristics tags help to discover user groups. Statistical tests are performed on the results of UGTE_NULL versus the other variant models, and the values are shown in Tables 6 and 7. Different characteristics tags do not yield statistically significant differences in terms of variances or means under Coherence@20, Coherence@30, and Accuracy.

Table 5 The mean and variance over different characteristics tags where the best results are highlighted in boldface
Table 6 P-values of F-test over UGTE_NULL and different characteristics tags
Table 7 P-values of T-test over UGTE_NULL and different characteristics tags

4.5 Performance on Extremely Short Text

To further explore the performance of joint topic-emotion models on extremely short texts, we run UGTE_ID and MSTM on the “short text” and “extremely short text” subsets and present the results in Figures 6 and 7. The results indicate that both UGTE_ID and MSTM achieve higher coherence values on “short text” and perform somewhat unstably on “extremely short text”, suggesting that extremely short texts pose a more serious feature sparsity problem to joint topic-emotion models. However, by exploiting user characteristics, UGTE_ID consistently achieves higher coherence values than MSTM. According to Figure 6a and b, UGTE_ID achieves the best results on both “short text” and “extremely short text”. In Figure 6c, UGTE_ID performs better than MSTM when |Z|≤ 200, since UGTE may be more suitable for discovering a small number of topics; moreover, UGTE_ID is always much more stable than MSTM. In the task of emotion detection, UGTE_ID achieves much higher accuracy and Kappa scores than MSTM. On the “extremely short text” subset, UGTE_ID achieves a best accuracy of 0.4594 and a best Kappa score of 0.3693, while MSTM only achieves a best accuracy of 0.1849 and a best Kappa score of 0.0452. These results show that user characteristics can improve model performance on both topic discovery and emotion detection by addressing the feature sparsity problem of short texts.

Figure 6 Topic coherence of UGTE_ID and MSTM over “short text” and “extremely short text” subsets

Figure 7 Emotion detection results of UGTE_ID and MSTM over “short text” and “extremely short text” subsets

4.6 Case study

To verify the effect of user characteristics (e.g., “Age”) on topic discovery and emotion detection, we compare UGTE_AGE and UGTE_NULL under |Z| = 100 and |G| = 10. Table 8 presents 10 representative words of topics with the joy and sadness emotions, respectively. By checking these top words manually, we conclude that the topics are related to “Intimate relationship”. The results indicate that the words under the emotion of “joy” generated by UGTE_AGE are more about “Love” and “Wedding”. On the other hand, “period”, “helped”, “smoking”, and “job” discovered by UGTE_NULL seem less associated with “Love” or “Wedding”, making the topic less coherent. Similarly, the topic under the emotion of “sadness” discovered by UGTE_AGE is more about “injury” and “death”. By contrast, many words from UGTE_NULL are incoherent, such as “announced” and “handing”.

Table 8 Top 10 words of selected topic “Intimate relationship”

Different from conventional joint topic-emotion models, our method can produce portraits of the discovered groups, presented as distributions over characteristics. The user portraits in Figure 8 show that the age distribution of group 1 peaks in the 11 to 20 range, while users in group 2 are mostly between 41 and 50. Besides, group 1 is slightly less concerned with the topic than group 2. As shown in Figure 8c, the emotions of users in group 1 are more about “fear” and “joy”, which coincides with the mentality of youth. In contrast, users in group 2 feel more “sadness” and “guilt” about the “Intimate relationship” topic, which can be understood from the top words of the topic with the “sadness” emotion in Table 8. As shown in Figure 8d, UGTE_AGE achieves higher accuracy than UGTE_NULL, which validates that the user characteristic “Age” can help to improve the performance of emotion detection.

Figure 8 Portraits of selected groups generated by UGTE_AGE

5 Conclusion and future work

We proposed UGTE, a model that jointly captures topics, emotions, and user characteristics, and that explores the relationships among them across different user groups. Short messages, which are popular online, pose feature sparsity challenges to traditional joint topic-emotion models; by introducing a user group layer into topic-based emotion detection, UGTE efficiently aggregates short texts into lengthy pseudo-documents to address this problem. Experiments on the real-world ISEAR dataset showed that UGTE is not only effective for emotion detection but can also mine the significant topics of concern to each user group. With the development of neural network technologies, we plan to combine our model with neural networks to improve its capacity for modeling topics and emotions at the user group level. Considering the generality of the model, we also plan to develop a general framework for topic discovery that integrates other information, such as word position, context relevance, and so forth.