1 Introduction

Sentiment analysis (SA) is a natural language processing (NLP) task that detects the sentiment polarity a writer expresses in a piece of text. Within SA, fine-grained emotion classification aims to detect exact types of feelings such as happiness, sadness, like, anger, disgust, fear, and surprise. Nowadays, more and more Chinese people share their daily feelings on social networks. For example, Sina microblog, one of the most popular Chinese microblog platforms, had grown to 431 million monthly active users by August 2018 [1]. SA for social networks has become a hot topic in recent years, and its results can be widely used in public opinion analysis, psychological research, social event analysis, and even political elections [40].

Sentiment analysis (or emotion detection) for English tweets achieved quite satisfactory results in SemEval-2017, in which 48 teams participated [26]. However, there is much less research on Chinese. One of the key issues in emotion analysis for Chinese tweets is how to understand text with varied and complex syntactic structures. When a person reads a sentence, she reads the words one by one and uses the grammatical structures in mind to comprehend the entire sentence. Guided by this reading behaviour, the long short-term memory (LSTM) network [7] was proposed to deal with NLP problems. However, although LSTM keeps a long memory of essential words, it does not fully exploit the syntactic information of text, and thus cannot see the sentence as a whole the way a person does.

In fact, syntactic structure is important for a model to identify the emotion of a text, especially for microblogs that contain the same set of words but different syntax. For example, the meanings of the two sentences “Don’t you know that?” and “You don’t know that!” are different: the former expresses surprise while the latter is probably more negative. Motivated by this, we consider that the dependency parse tree, which reveals the syntactic dependency relations among language components, could be of help. Moreover, according to existing research, the dependency parse tree has helped improve the performance of other NLP tasks such as event detection [24] and semantic role labeling [21]. Specifically, syntactic dependency parsing identifies subject-verb, adverbial, and other grammatical structures, and then analyzes the relationships between the different elements of a sentence. Figure 1 shows the dependency analysis result of an example sentence (“I am very unhappy today” in English). If there is an edge between two words, the words are related [15, 22].

Fig. 1

An example of dependency parse tree

The dependency parse tree is a kind of non-Euclidean structure, which cannot be processed by convolutional neural networks (CNN) [16], a widely used deep learning architecture. To apply deep learning to non-Euclidean domains, the graph convolution network (GCN) [14] was proposed to extend convolution to graph-structured domains. Therefore, in order to integrate grammatical information and better simulate the human reading process, we propose a syntax-based GCN model to detect emotion polarities. First, a bidirectional long short-term memory network (Bi-LSTM) [8] is adopted to extract preliminary features for each word. Then these features, together with the dependency parse tree, are fed into the GCN. The dependency parse tree of the text is used to build the convolution graph for the GCN: each word is regarded as a vertex, and each syntactic dependency relation between two words is regarded as an edge. In the GCN operation, each word feature given by the Bi-LSTM, regarded as the feature of a vertex, is convolved according to the graph. The GCN produces a final representation vector of each microblog for classification. Convolving over the dependency parse graph focuses on relevant words and promotes the ability to capture complex linguistic phenomena.

In this paper, we tackle the challenge of Chinese fine-grained emotion classification over seven types of basic emotions: happiness, sadness, like, anger, disgust, fear, and surprise. To the best of our knowledge, this is the first work to integrate the dependency parse tree into a graph convolution network model for emotion detection. The main contributions of this paper are as follows:

  • We propose a novel syntax-based graph convolution network (GCN) model for emotion detection in Chinese microblogs. After preprocessing, the embedding vectors representing a microblog are fed into a Bi-LSTM neural network; the GCN then convolves the preliminary word features from the Bi-LSTM according to the syntactic tree, extracting comprehensive features of the microblog.

  • To improve the performance of the model, we propose percentile pooling, which enhances the robustness of the model since percentiles are not sensitive to outliers.

  • Experimental results show that the proposed model fully exploits the syntactic relations of a sentence and outperforms state-of-the-art algorithms. In addition, we collected and labeled 15,664 microblogs to construct a high-quality Chinese microblog dataset that is open to other researchers. Our experiment code and datasets are freely available on GitHub.Footnote 1

The rest of the paper is organized as follows: Section 2 introduces the background and related work; Section 3 presents our model and novel algorithms; Section 4 discusses the experimental results and evaluations; finally, Section 5 gives the conclusions and future work.

2 Related work

2.1 Sentiment classification

Sentiment analysis (SA) tasks have attracted much attention and fall into two main types: coarse-grained sentiment polarity classification and fine-grained emotion classification. Our work belongs to the latter.

In recent years, deep learning methods such as the convolutional neural network (CNN) and the recurrent neural network (RNN) have been applied to text sentiment analysis. Kim [13] used a CNN trained on top of pre-trained word vectors for sentence-level classification tasks. After that, much research has been done to improve both CNN models and word representations. Kalchbrenner et al. [12] described a dynamic convolutional neural network with dynamic k-max pooling to model the semantics of sentences. Chen et al. [5] built a character-embedding, dual-channel convolutional neural network to comprehend the sentiment of short Chinese comments on Sina Weibo; their model reached a relatively high accuracy of 88.35% on the NLP&CC2012 dataset for binary classification. Zhao et al. [10] introduced a word embedding method combining n-gram features and word sentiment polarity score features, and integrated the feature set into a deep convolutional neural network to train the sentiment classifier. Xue & Li [33] proposed a CNN model with gating mechanisms that achieved higher accuracy and efficiency. Unfortunately, these CNN-based works have not made full use of the syntactic information of text.

Other researchers used extended recurrent neural networks (RNN) to classify sentiment polarity. Socher et al. [28] introduced a sentiment treebank and a recursive neural tensor network that outperformed all previous methods on several metrics; their work exploited syntactic information through the fully sentiment-labeled parse trees of the Stanford Sentiment Treebank corpus. To build a document-level sentiment classification model, Tang et al. [29] employed an LSTM to extract sentence representations and then used a gated recurrent neural network to encode the document representation. Baziotis et al. [3] used a deep LSTM with context-aware attention mechanisms for topic-based sentiment polarity analysis on Twitter. Similarly, Ma et al. [20] integrated commonsense knowledge into an LSTM model extended with target-level and sentence-level attention. Although RNN is a suitable network for NLP, when used alone its learning ability becomes limited as the task grows complex: during operation it discards information that may be useful in some situations and fails to give an overall view. Instead of applying an RNN alone, we combine an LSTM with a graph convolution network, which complements the shortcomings of the LSTM while keeping its advantages.

Instead of using CNN or RNN alone, some researchers combined the two networks to obtain better performance. Wang et al. [30] proposed a CNN-LSTM model comprising a regional CNN and an LSTM to predict the valence-arousal ratings of texts from local and overall information. Our idea is similar in that we stack two different neural networks to improve the performance of the deep learning model. However, none of the existing works combine an LSTM with a syntactic graph convolution network for sentiment analysis as we do in this paper.

Besides improving network structures, some research tried to improve models with linguistic information. Qian et al. [25] modeled the semantic roles of sentiment lexicons, negation words, and intensity words in sentiment expression, representing these linguistic factors with mathematical operations parameterized by shifting distribution vectors or transformation matrices. Similarly, Lei et al. [18] integrated three kinds of sentiment linguistic knowledge (sentiment lexicons, negation words, and intensity words) into a deep neural network via an attention mechanism, proposing a multi-sentiment-resource enhanced attention network to better exploit such knowledge. Although these works also used linguistic content to train models, some microblogs contain no sentiment lexicon information at all, and none of these works used the dependency parse tree to train a deep learning network. Since dependency parsing is a mature technique that has been successfully used in language understanding, it can reveal the syntactic structure of sentences more accurately and clearly.

There are more research efforts focusing on fine-grained emotion detection. Some works are based on traditional machine learning methods, rules, emotion lexicons, or emoticons. Li & Xu [19] designed a support vector regression method along with rule-based emotion-cause extraction for six-category emotion classification on Sina Weibo. Wen & Wan [32] introduced an approach based on sequential class rules for emotion classification of microblogs, in which each tweet is regarded as a sequence; they used emotion lexicons and machine learning to obtain potential emotion labels in tweets. Zheng & Purver [38] described a method for detecting emotions in Chinese microblogs using distant supervision with conventional labels (emoticons and smileys). Unfortunately, generalizing and designing linguistic patterns for posts with complicated linguistic phenomena is labor-intensive and time-consuming. Different from the rule-based methods above, we design an automatic model without human-generated rules.

Recently, deep neural networks have been employed for emotion detection. Wang et al. [31] leveraged the skip-gram language model to learn distributed word representations as input features and used a CNN-based method for multi-label emotion classification of Chinese microblog sentences. Muhammad & Lyle [2] built an enormous, automatically curated dataset for emotion detection using distant supervision and modeled it with gated recurrent neural networks, achieving an average accuracy of 95.68% on 8-class classification of English text. He et al. [6] proposed an emotion-semantics enhanced multi-channel convolutional neural network that uses emoticons to construct an emotion space and combines different embedding methods to extract features for the multi-channel network. He & Xia [7] proposed a joint Bi-LSTM network for multi-label emotion detection on a Chinese long-blog corpus. Different from the works above, we integrate the dependency parse tree, which can be obtained with existing algorithms, to make our model more suitable for the fine-grained task, aiming at a deeper understanding of text by introducing grammatical information into the model.

Although sentiment analysis has been developed for several years, emotion classification accuracy for Chinese microblogs is still unsatisfactory due to the complexity of Chinese. In this paper, we focus on this issue and combine a Bi-LSTM network with a graph convolution network to enhance emotion understanding of microblogs.

2.2 Pooling method

The pooling layer plays an important role in image processing and deep learning since it keeps the values we need and discards the others. In the image quality assessment field, percentile pooling has been introduced to detect poor-quality regions [27, 35]. Following the common experience that a deep neural network is expected to produce a higher value when the input is close to the learned pattern, max pooling is widely used to extract the maximum value for further analysis. Max pooling, a special case of percentile pooling, performs well in neural networks, but there is still room for improvement.

Building on max pooling, Kalchbrenner et al. designed k-max pooling, which keeps the top k values in order to retain more information and performs better in some cases [11].

Another popular global pooling method is average pooling [4], which keeps the mean value of a group.

Different from the works above, we introduce percentile pooling into neural networks, where the pooled value is not restricted to the maximum since the specific percentile is adjustable.

2.3 Graph convolution network

Much data in the world has a graph structure, such as social networks, Web page link networks, and paper citation networks. In these applications, every node not only has its own features but also its connections to other nodes. For example, in a social network each person has features such as age, hobbies, and job, and is friends with different people; to classify these people, the connection information is necessary. However, CNN can only take the feature information. To exploit the connection information as well, Kipf & Welling [14] proposed a spectral graph convolution network (GCN) that uses a localized first-order approximation to encode both the local graph structure and the node features, which speeds up training and works well in semi-supervised classification. Since then, GCN has drawn increasing attention and has been employed in a variety of applications such as Web-scale recommender systems [36], skeleton-based action recognition [34], and traffic forecasting [37]. In NLP, GCN also performs well in event detection [24] and semantic role labeling [21]. Considering the significance of syntactic structure in sentiment analysis and the graph structure of the dependency parse tree, we propose to use a GCN to integrate the grammar tree in order to better extract emotions from complex Chinese text. To the best of our knowledge, this is the first work to employ a GCN for Chinese emotion detection.

3 Syntax-based graph convolution network model

Most microblogs carry a variety of emotional tendencies, while some texts express no sentiment at all. In practical applications such as psychological research or user emotion profiling, emotion analysis for social networks is necessary. Following the NLP&CC2013 task, in this paper we focus on classifying Chinese microblogs that carry sentiment into seven categories: happiness, sadness, like, anger, disgust, fear, and surprise.

The model is mainly composed of three parts: (i) a Bi-LSTM network extracts preliminary word features from the given text; (ii) the preliminary word features and the dependency parse tree built for each microblog are fed into a single-layer graph convolution network (GCN) to extract the emotion feature of the microblog; (iii) the probability distribution is obtained with a pooling or fully connected layer. As Figure 2 shows, our model has the following steps (a minimal end-to-end sketch follows the list):

  • Preprocessing

  • Word embedding

  • Bi-LSTM layer

  • GCN layer

  • Pooling or fully connected layer

  • Softmax classification
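For concreteness, the following is a minimal PyTorch sketch of this pipeline. The module and variable names are ours (not from the released code), the dimensions follow Section 4.3, and pooling is simplified to max pooling here; percentile pooling is described in Section 3.4.

```python
import torch
import torch.nn as nn

class SyntaxGCN(nn.Module):
    """Sketch of the pipeline in Figure 2 (names and details illustrative)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=180, num_classes=7):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # random init, updated in training
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.theta = nn.Parameter(torch.empty(2 * hidden, num_classes))
        nn.init.orthogonal_(self.theta)                  # orthogonal init (Section 3.5)

    def forward(self, tokens, norm_adj):
        # tokens: [batch, n] word ids
        # norm_adj: [batch, n, n] normalized adjacency from Section 3.3
        L, _ = self.bilstm(self.embed(tokens))           # Bi-LSTM features, [batch, n, 2*hidden]
        Z = torch.relu(norm_adj @ L @ self.theta)        # graph convolution, Eq. (1)
        pooled = Z.max(dim=1).values                     # pooling over words (cf. Section 3.4)
        return torch.log_softmax(pooled, dim=-1)         # softmax classification
```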

Fig. 2

Syntax-based GCN model structure

3.1 Preprocessing

Raw microblog data usually contain redundancy and noise such as URLs, ’@’ symbols, and useless stop words. We first clean up all the unnecessary content of the text. Next, the jiebaFootnote 2 Python package is adopted for Chinese word segmentation, separating the sentences into words. Then we use the ltpFootnote 3 Python package to obtain the dependency parse tree of each microblog. For word embeddings, we represent each word by a randomly initialized vector, which is updated during network training.
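As an illustration, the cleaning and parsing steps might look as follows. The regular-expression cleanup rules and the model paths are our assumptions, and we use the pyltp binding of LTP here; the exact API depends on the installed version.

```python
import re
import jieba
from pyltp import Postagger, Parser   # one LTP binding; APIs differ across versions

def clean_and_segment(text):
    """Strip URLs and @-mentions, then segment into words (cleanup rules illustrative)."""
    text = re.sub(r"https?://\S+", "", text)   # remove URLs
    text = re.sub(r"@\S+", "", text)           # remove '@' mentions
    return jieba.lcut(text.strip())            # Chinese word segmentation

postagger, parser = Postagger(), Parser()
postagger.load("ltp_data/pos.model")           # hypothetical model paths
parser.load("ltp_data/parser.model")

words = clean_and_segment("...")               # a raw microblog string
arcs = parser.parse(words, postagger.postag(words))
heads = [arc.head for arc in arcs]             # 1-based head index per word; 0 = ROOT
```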

We denote each microblog after preprocessing as \(X=\left \{ x_{1},x_{2},\ldots ,x_{n}\right \}\), in which \( x_{i}\in \mathbb {R}^{300} \left (i=1,\ldots , n\right )\) is the embedding vector representing a word. X is then fed into a Bi-LSTM neural network.

3.2 Bidirectional LSTM neural network

A Bi-LSTM is used to obtain the rudimentary representations of microblogs. Applying the Bi-LSTM to the word embedding sequence \(X=\left \{ x_{1},x_{2},\ldots ,x_{n}\right \}\) of a microblog yields both forward and backward vector sequences, denoted by \(L_{1}=\left \{ l_{11},l_{12},\ldots ,l_{1n}\right \}\) and \(L_{2}=\left \{ l_{21},l_{22},\ldots ,l_{2n}\right \}\) respectively. We concatenate the two sequences to get \(L=\left [\begin {array}{l} L_{1} \\ L_{2} \end {array}\right ]=\left \{ l_{1},l_{2},\ldots ,l_{n}\right \}\), where \(l_{i}=\left [l_{1i},l_{2i}\right ]^{\mathrm {T}}, i=1, \ldots ,n\). L is the elementary feature of X, and each word in a sentence has its own word feature \(l_{i}\). Next, the output vectors \(l_{i}\) from the Bi-LSTM, together with the syntactic tree, become the input of the GCN network.
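In PyTorch, a bidirectional nn.LSTM returns the forward and backward sequences already concatenated along the feature dimension, so L can be read off directly (a toy-sized sketch; dimensions illustrative):

```python
import torch
import torch.nn as nn

n, emb_dim, hidden = 12, 300, 180                # toy sentence length and sizes
X = torch.randn(1, n, emb_dim)                   # embeddings of one microblog
bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
L, _ = bilstm(X)                                 # [1, n, 2*hidden]
# L[0, i] is l_i = [l_{1i}; l_{2i}], the concatenated forward/backward feature of word i
```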

3.3 Graph convolution network

For each microblog, a graph \(G=\left (V,E\right )\) is built, where V is the vertex set consisting of all words of the microblog, and E is the edge set containing all dependency relations between words. Following Kipf & Welling [14] and Marcheggiani & Titov [21], we add self-loops and opposite edges to the edge set, which improves the generalization ability of the GCN. The labels ‘0’, ‘1’, ‘1’, ‘1’ mark the four relation types of no relation, self-loop, head-to-dependent, and dependent-to-head, respectively. Based on these rules, the sparse adjacency matrix of the dependency parse tree, denoted by A, is created for each microblog. Figure 3 shows the adjacency matrix for the example sentence (“I am very unhappy today”) according to its dependency parse tree. Words from different sentences in the same microblog are regarded as unrelated, so the edges between them are all labeled ‘0’. Since every microblog is no longer than 140 words, we set every adjacency matrix A to size \(\left [ 140\times 140\right ]\) and pad it with ‘0’.
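A sketch of this construction, assuming the parser returns one 1-based head index per word with 0 denoting the ROOT arc (which has no corresponding edge):

```python
import torch

MAX_LEN = 140   # microblogs are at most 140 words

def build_adjacency(heads):
    """Adjacency matrix with self-loops and opposite edges; '0' marks no relation."""
    A = torch.zeros(MAX_LEN, MAX_LEN)
    for dep, head in enumerate(heads):        # heads: 1-based head index per word
        A[dep, dep] = 1.0                     # self-loop
        if head > 0:                          # skip the ROOT arc
            A[head - 1, dep] = 1.0            # head-to-dependent
            A[dep, head - 1] = 1.0            # dependent-to-head (opposite edge)
    return A
```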

Fig. 3

The adjacency matrix of an example sentence and its dependency relations

There is a one-to-one correspondence between the vertices and the word feature vectors \(\left \{ l_{1},l_{2},\ldots ,l_{n}\right \}\) given by the Bi-LSTM: each \(l_{i}, \left (i=1,\ldots , n\right ) \) is the feature of one vertex. The GCN convolves the vertex features according to the graph G, which is represented by the adjacency matrix A.

According to Kipf & Welling [14], the GCN layer is computed as:

$$ Z=ReLU\left( \widetilde{D}^{-\frac {1}{2}}A\widetilde{D}^{-\frac {1}{2}}L{\varTheta}\right), $$
(1)

where ReLU is the rectified linear unit activation function; \(\widetilde {D}\) is the degree matrix of the dependency tree, given by \( \widetilde {D}_{ii}={\sum }_{j}A_{ij}\); A is the sparse adjacency matrix of the dependency parse tree; \(L=\left \{ l_{1},l_{2},\ldots ,l_{n}\right \}\) is the matrix of vectors given by the Bi-LSTM; and Θ is the weight matrix learned during network training.
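Equation (1) can be implemented directly; the only subtlety is guarding the all-zero padding rows of A when computing \(\widetilde {D}^{-\frac {1}{2}}\) (a sketch):

```python
import torch

def gcn_layer(A, L, theta):
    """ReLU(D^{-1/2} A D^{-1/2} L Theta), cf. Eq. (1). A: [n, n], L: [n, d], theta: [d, k]."""
    deg = A.sum(dim=1)                                        # D_ii = sum_j A_ij
    d_inv_sqrt = deg.clamp(min=1e-12).pow(-0.5) * (deg > 0)   # zero for padding rows
    A_norm = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)
    return torch.relu(A_norm @ L @ theta)
```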

3.4 Percentile pooling

The aim of the pooling layer is to improve the invariance and efficiency of the neural network model. In general, max pooling performs better than average pooling on NLP problems [39]. However, max pooling is not always robust: uncontrollable, unexpected noise in the model can make the maximum of a group of numbers extremely high. To solve this problem, we propose a novel pooling method based on percentiles, named p-th percentile pooling. The p-th percentile is the value below which p% of a set lies when the elements are sorted in ascending order [23]. A percentile such as the 50th percentile (the median) is more robust than the average since it is not sensitive to outliers. Percentile pooling also has an advantage over max pooling, of which it is a generalization: the maximum is not always the best choice. If a group of numbers contains a very large outlier, max pooling always outputs the outlier; in this case, 90th percentile pooling better represents the top 10% of values while excluding the outlier, as shown in Figure 4. We denote the p-th percentile of a vector Z as a function fp(Z), where p ranges from 0 to 100; for example, f100(Z) is the maximum and f50(Z) is the median of Z. We use fp(Z) as the pooling function for p-th percentile pooling. Here p is a hyper-parameter that can be tuned until the best result is obtained, giving researchers a more flexible choice beyond max pooling.
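A minimal implementation of fp over the word dimension, under one simple convention for the percentile rank (percentile definitions vary slightly in the literature); p = 100 recovers max pooling and p = 50 the median:

```python
import torch

def percentile_pool(Z, p):
    """f_p(Z): p-th percentile of each feature over n words. Z: [n, d], p in (0, 100]."""
    n = Z.size(0)
    k = max(1, min(n, round(p / 100.0 * n)))   # rank of the p-th percentile
    return Z.kthvalue(k, dim=0).values         # k-th smallest value per feature column

Z = torch.randn(7, 4)
assert torch.equal(percentile_pool(Z, 100), Z.max(dim=0).values)   # max pooling as special case
```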

Fig. 4

The advantage of percentile pooling compared to max pooling

3.5 Orthogonalization constraint

To mitigate the problem of vanishing and exploding gradients, we employ an orthogonalization constraint during training. A convenient way to achieve an approximate orthogonalization constraint is to add a regularization term to the loss function as follows,

$$ Loss=loss\left( y,f_{w}\left( x\right) \right)+\lambda \sum\limits_{i}\left\| {W^{T}_{i}}W_{i}-I\right\|^{2}, $$
(2)

where loss() is the original loss function, y is the label, \(f_{w}\left (x\right )\) is the prediction, λ is the penalty coefficient, \(W_{i}\) is a weight matrix, and I is the identity matrix. Besides, the weight matrices in the Bi-LSTM and the GCN are initialized as orthogonal matrices: we apply singular value decomposition (SVD) to a randomly initialized matrix M, obtaining M = USV^T, where U and V are orthogonal matrices and S is a diagonal spectral matrix; either U or V can then be used to initialize a weight matrix W, i.e., W := U.
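A sketch of the penalty term and the SVD-based initialization; the default coefficient below is the best-performing value in Table 8, and for non-square matrices PyTorch’s nn.init.orthogonal_ serves the same purpose:

```python
import torch

def orth_penalty(weights, lam=1e-8):
    """lam * sum_i ||W_i^T W_i - I||^2, the regularizer of Eq. (2)."""
    total = 0.0
    for W in weights:
        I = torch.eye(W.size(1), device=W.device)
        total = total + ((W.t() @ W - I) ** 2).sum()
    return lam * total

def svd_orthogonal(dim):
    """Orthogonal initialization via M = U S V^T, W := U (square case for simplicity)."""
    U, _, _ = torch.svd(torch.randn(dim, dim))
    return U

# During training (sketch): loss = criterion(pred, y) + orth_penalty(weight_matrices)
```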

4 Experiments and results analysis

All code is implemented in PyTorch 0.4.0 running on a Linux CUDA platform. Training a syntax-based graph convolution network (GCN) model takes around 2 hours on a machine with a 1080Ti GPU.

4.1 Datasets

In our experiments, we use the NLP&CC2013 dataset,Footnote 4 which was built for the sentiment classification task of the International Conference on Natural Language Processing and Chinese Computing (NLP&CC). To improve the generalization ability of our model, we randomly crawled 15,664 microblogs from Sina Weibo and had them labeled by three human judges; inconsistent labels were resolved by voting. The NLP&CC2013 test set is used for testing, and all the remaining data are used for training. Table 1 shows the distribution of our datasets.

Table 1 Datasets

4.2 Performance measure

Since the test dataset originates from the NLP&CC Emotion Analysis in Chinese Weibo Text task,Footnote 5 we use the same metrics as the task for ease of comparison. The evaluation metrics are the macro and micro averages of precision, recall, and F-measure, defined as follows:

$$ Macro_{Precision}=\frac{1}{7}\sum\limits_{i}\frac{\#system\_correct\left( emotion=i\right)}{\#system\_proposed\left( emotion=i\right)}, $$
(3)
$$ Macro_{Recall}=\frac{1}{7}\sum\limits_{i}\frac{\#system\_correct\left( emotion=i\right)}{\#gold\left( emotion=i\right)}, $$
(4)
$$ Macro_{F\_measure}=\frac{2\times Macro_{Precision}\times Macro_{Recall}}{Macro_{Precision}+Macro_{Recall}}, $$
(5)
$$ Micro_{Precision}=\frac{{\sum}_{i}\#system\_correct\left( emotion=i\right)}{{\sum}_{i}\#system\_proposed\left( emotion=i\right)}, $$
(6)
$$ Micro_{Recall}=\frac{{\sum}_{i}\#system\_correct\left( emotion=i\right)}{{\sum}_{i}\#gold\left( emotion=i\right)}, $$
(7)
$$ Micro_{F\_measure}=\frac{2\times Micro_{Precision}\times Micro_{Recall}}{Micro_{Precision}+Micro_{Recall}}, $$
(8)

where #gold is the number of manually annotated labels of emotion i in the test set, #system_proposed is the number of labels of emotion i assigned by our system on the test set, #system_correct is the number of microblogs of emotion i that are correctly classified, and i ranges over the seven emotion types happiness, sadness, like, anger, disgust, fear, and surprise.
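For concreteness, all six metrics can be computed from per-emotion counts as follows (a sketch; since every microblog receives exactly one label, micro precision and micro recall coincide here):

```python
from collections import Counter

EMOTIONS = ["happiness", "sadness", "like", "anger", "disgust", "fear", "surprise"]

def macro_micro(gold, pred):
    """Eqs. (3)-(8): gold and pred are equal-length lists of emotion labels."""
    correct = Counter(g for g, p in zip(gold, pred) if g == p)  # #system_correct
    proposed, gold_cnt = Counter(pred), Counter(gold)           # #system_proposed, #gold
    P = [correct[e] / proposed[e] if proposed[e] else 0.0 for e in EMOTIONS]
    R = [correct[e] / gold_cnt[e] if gold_cnt[e] else 0.0 for e in EMOTIONS]
    f1 = lambda p, r: 2 * p * r / (p + r) if p + r else 0.0
    macro_p, macro_r = sum(P) / 7, sum(R) / 7
    micro_p = sum(correct.values()) / len(pred)
    micro_r = sum(correct.values()) / len(gold)
    return (macro_p, macro_r, f1(macro_p, macro_r),
            micro_p, micro_r, f1(micro_p, micro_r))
```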

4.3 Hyper-parameters and training

The hyper-parameter settings of the syntax-based graph convolution network (GCN) model are shown in Table 2.

Table 2 Hyper-parameters setting

Figures 5 and 6 show how the word embedding size and the number of hidden neurons in the LSTM influence the F-measure of the model. In these experiments, the F-measure of our model stays above 80% across different word embedding sizes and numbers of hidden neurons. We fix the embedding size at 300 and the number of hidden neurons at 180, since these settings are likely to give the best performance. We use a single GCN layer: according to Marcheggiani and Titov [21], when a GCN sits on top of an LSTM in NLP tasks, a single-layer GCN collaborates best with the LSTM.

Fig. 5

Experiment on the choice of word embedding size

Fig. 6

Experiment on the choice of the number of hidden neurons

4.4 Comparison of fine-grained emotion classification among different models

We chose several sentiment classification algorithms as baselines, including both conventional machine learning methods and state-of-the-art neural network architectures. Results of our model against the best team in NLP&CC2013 and other baselines are listed in Table 3.

Table 3 Comparison of different models on the NLP&CC2013 testing dataset

Benchmarks

  • CNN:

    In this approach, a widely used single-layer CNN with max pooling is adopted; the other parts of the model are the same as in the other approaches.

  • LSTM:

    LSTM is also a powerful model for NLP tasks. A two-layer LSTM is employed to extract text features word by word.

  • LSTM+CNN:

    Different from our model, this approach replaces the GCN with a CNN, which is similar to the GCN but has no graph. This comparison reveals the combined contribution of the GCN and the syntax.

  • LSTM+GCN (without syntax information):

    To show the contribution of the syntactic graph, we remove the dependency parse tree in this experiment by setting every element of the adjacency matrix to ‘1’, so that every word links directly to every other word.

According to Table 3, our syntax-based GCN model outperformed the other models by 10.04% in macro F-measure. Compared with the plain LSTM and CNN models, the LSTM with a GCN attached captures the contextual emotion information of a microblog more effectively while retaining the syntactic information of the sentence. The dependency parser is trained on a vast corpus with deep learning methods, which brings in much information about language structure.

In general, as shown in Table 3, the syntax-based GCN with the dependency parse tree increases both the macro and micro F-measure scores by around 5% compared with LSTM+GCN without syntax information. Tables 4 and 5 show the detailed per-category performance of our model with and without syntax information, respectively. According to Tables 4 and 5, the syntax-based GCN network can make good use of the sophisticated techniques of dependency parsing, especially for the emotion ‘surprise’, on which the syntactic information increases the F1-score by 11%.

Table 4 Detailed result of syntax-based GCN emotion detection
Table 5 Detailed result of LSTM+GCN(without syntax information)

4.5 Comparison of different p-th percentile pooling methods

To test the validity of percentile pooling, we compare different p-th percentile pooling settings with other widely used pooling methods. The comparison results are shown in Table 6.

Table 6 Comparison of different p-th percentile pooling methods

According to Table 6, although the widely used max pooling works well, our model with 50th percentile pooling performs best, which indicates that percentile pooling fits the syntax-based GCN model well. Experiments show that percentile pooling improves the micro F-measure of our model by 3.55%. Besides, we tested 50th percentile pooling in the weaker CNN and LSTM models, as shown in Table 7; there, 50th percentile pooling also outperformed max pooling, by 1.63% and 1.65% respectively.

Table 7 Micro F-measure of 50th percentile pooling and max pooling methods

4.6 Comparison of different orthogonalization constraints

We apply the orthogonalization constraint penalty to both the Bi-LSTM and GCN weight matrices to learn long-term dependence. The following experiments show how the penalty coefficient influences performance; the results for different coefficients are shown in Table 8. Setting the penalty coefficient to 1 × 10^-8 improves our model by 3.39% and 5.54% in micro and macro F-measure, respectively.

Table 8 Comparison of different penalty coefficients

4.7 Comparison of sentiment binary classification models

Our model can be transformed into a polarity classifier of positive and negative sentiment. In this case, microblogs with the ambiguous emotion type surprise are removed, and we regard happiness and like as positive, while anger, disgust, sadness, and fear are negative. To fit the binary classification problem, we set the output dimension of the graph convolution network to 2; the other parts of the model remain unchanged. The comparison results of the different models are shown in Table 9.
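The label mapping is straightforward (a sketch with our own names):

```python
POSITIVE = {"happiness", "like"}
NEGATIVE = {"anger", "disgust", "sadness", "fear"}

def to_polarity(emotion):
    """Map a fine-grained emotion label to a binary polarity label."""
    if emotion in POSITIVE:
        return "positive"
    if emotion in NEGATIVE:
        return "negative"
    return None   # 'surprise': removed in the binary setting
```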

Table 9 Comparison of polarity classification result

According to Table 9, the syntax-based graph convolution network (GCN) model also performs well in binary sentiment classification, with higher accuracy and precision. Experiments show that the syntax-based GCN improves performance only slightly in the binary task. One possible reason is that binary classification is relatively simple and its accuracy is already high, so every 1% improvement is challenging; the fine-grained task is more complex, and there the extra information contributed by the GCN with the dependency parse tree is more useful.

5 Conclusions and future work

In this paper, we introduced a syntax-based graph convolution network (GCN) model for fine-grained emotion detection. To the best of our knowledge, this is the first work to integrate syntactic information into a GCN for emotion detection. Experiments show that our model fully exploits the syntactic relations of a sentence and outperforms state-of-the-art algorithms. Besides, we proposed a new feature pooling method for neural networks that promotes the robustness of the model. Finally, we contribute a new annotated dataset of Chinese microblogs for public research.

That syntactic information boosts the performance of our model is an interpretable phenomenon. However, the method we use to construct the sentence graph from word-wise dependency parsing is still primitive, which suggests a broader prospect for our model. In this paper, the exact types of syntactic dependency relations are ignored, and choosing p for percentile pooling is laborious. Future work can incorporate more information about the types of syntactic dependency relations. Besides, the p-th percentile pooling could be made self-adaptive during training, or extended to a k-percentile pooling analogous to k-max pooling. Last but not least, this paper focuses only on Chinese microblogs; when we apply our model to an English corpus (the Stanford Sentiment Treebank), it does not show an advantage over the state of the art. Future work can explore how to adapt the model to other languages such as English.