
1 Introduction

The COVID-19 pandemic is spreading all over the world, and fighting it is a protracted battle. An important part of this effort is to continuously analyze the COVID-19 related data generated on social media and to quickly grasp the public opinion that the pandemic may trigger. Such public opinion has an important impact on the decision-making of the government and relevant departments, and automatic emotion classification of COVID-19 related social media data helps to assess public opinion risk. Common social media platforms at home and abroad, such as Weibo, Facebook, and Twitter, are important channels for netizens to express their opinions and emotions [1].

As one of the largest social media platforms in China, Sina Weibo has generated massive amounts of COVID-19 microblog data. Through analysis, we found that COVID-19 Chinese microblogs are short, lack context, and contain nonstandard expressions. For example, consider the two posts ‘ (Come on Wuhan, China will win)’ and ‘ ’. Existing emotion classification methods generally model only the features of the microblog itself and do not adequately consider the semantics of the emotion categories. Therefore, they cannot analyze the emotion of nonstandard COVID-19 Chinese microblog text well. To solve this problem, we present an emotion classification method based on the emotion category description. Based on the idea of question answering, the semantic information of the categories is fused to help the model understand the emotion of microblogs. In the next paragraph, we analyze the impact of category semantic information on the emotion classification of COVID-19 Chinese microblogs.

Emotion classification requires identifying specific emotions in the text, such as happiness, anger, sadness, and fear [2, 3]. Traditional supervised emotion classification models generally transform the categories into numeric labels, which serve as the supervised signal that guides the learning process of the model. For example, traditional models use ‘1’ for happiness and ‘2’ for anger. Normally, the numeric label is represented as a one-hot vector and used to calculate the training loss. The backpropagation algorithm is then used to minimize the objective function and train the model. Traditional models do not adequately consider the semantic information of emotion categories, which means that they do not learn the meaning of the emotion categories well and cannot accurately classify microblogs into the corresponding categories. When judging the emotions expressed in a text, humans usually have some prior knowledge. For example, consider the microblog ‘ ’. When humans judge the emotion of this microblog, their prior knowledge includes the specific meaning of anger, namely that someone is agitated because of extreme dissatisfaction, so it is easy for them to judge the emotion of this microblog correctly. However, the models do not have this prior knowledge, so they are not ideal for learning the emotion of microblogs. In other words, if the models can grasp the prior knowledge that humans have, they will better understand the emotion of the text.

By asking what the emotion of a certain microblog expresses and then giving an answer, this process of judging the emotions of the text is similar to the question-answering task. Inspired by this, we introduce the idea of question answering into the emotion classification task of COVID-19 Chinese microblogs. The main contributions are summarized as follows:

  1) We propose an emotion classification model of COVID-19 Chinese microblogs based on the emotion category description. Firstly, all emotion categories of the microblogs to be classified are expanded into formalized category descriptions, which form a candidate answer set. Secondly, we construct a question for each microblog in the form of ‘What is the emotion expressed in the text X?’. Then, the question and all category descriptions are combined into a question-and-answer pair as the input of the pre-trained BERT model. Finally, by fusing rich contextual and category semantic information, the model completes the emotion classification of COVID-19 Chinese microblogs.

  2) We present three emotion category description strategies, which use words, extended words, and emotion definitions to describe category information at three different granularities, respectively.

  3) Experimental results show that our approach outperforms many existing emotion classification methods on the COVID-19 Chinese microblog dataset.

2 Related Work

Emotion classification of COVID-19 Chinese microblogs is essentially a sentiment classification task. Recently, with the rise of deep learning, existing sentiment classification studies are usually based on deep learning methods. Neural network models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and Transformers, have been proven effective in many sentiment classification tasks.

Tang et al. [4] first used a CNN or LSTM to encode each sentence and then used a gated RNN to encode the internal relations and semantic connections between sentences, obtaining a document representation for sentiment classification. Wang et al. [5] proposed a context-aware bidirectional LSTM model, which uses forward and backward LSTMs to jointly encode the context information of the text and achieved good results in the emotion classification of Chinese microblogs. Kim [6] was the first to use convolutional neural networks to extract text sequence features for sentence-level text classification, and completed sentiment classification on movie review datasets. Since then, a series of CNN-based sentiment classification methods have been proposed [7, 8]. Johnson et al. [9] proposed a word-level DPCNN, which extracts long-distance text dependencies by continuously deepening the network. They performed sentiment classification on review datasets such as Amazon and achieved the best results at the time. He et al. [10] enhanced emotional semantics by mapping the vector representations of commonly used emoji and the word vector representations of the text into the same emotional space, and then used a multi-channel convolutional neural network to classify the emotion of Chinese microblogs. Among these studies, RNN-based models can effectively process sequential text, but they suffer from sequence dependence and cannot be computed in parallel. CNN-based models can be computed in parallel, but their ability to capture long-distance features is weak because they extract text features through sliding convolution windows. Besides, CNN-based models generally use a pooling layer to integrate text features, which loses the position information of the text.

To address these problems, Vaswani et al. [11] proposed the Transformer model, which takes the self-attention mechanism as its basic structure and abandons the recurrent structure of RNNs and the convolutional structure of CNNs. The Transformer retains the advantages of RNNs and CNNs while solving the sequence dependence of RNNs and the weak ability of CNNs to capture long-distance features. Devlin et al. [12] proposed the BERT model based on the Transformer, which opened the era of pre-trained language models and set new records on a series of NLP tasks, including sentiment classification. Since then, a series of Transformer-based models have been proposed, which can be collectively referred to as Transformers [13, 14].

To sum up, current sentiment classification methods mainly focus on neural network models such as RNNs, CNNs, and Transformers. These methods only model the text semantics and fail to utilize the semantic information of the classification categories. Studies have shown that the semantic information of categories is effective for classification problems. For example, Rios and Kavuluru [15] integrated the semantics of the category into the model in the form of word embeddings, which improved the performance of the text classification task. Chai et al. [16] guided the learning process of the model by using each category description as a question and the text to be classified as the answer, thereby enhancing the performance of the text classification task. Their method requires the model to ask about the emotion of the text N times, where N is the number of categories. This process is not intuitive: when humans judge the category of a text, they usually first understand the semantics of the categories and then combine this understanding with their own knowledge to make a judgment.

Different from [16], we construct a single question for the input microblog text and then use all category descriptions as the candidate answer set, classifying the emotion of the text with a question answering (QA) based method. By constructing a question-and-answer pair that combines each microblog with the category descriptions, the model can focus on both the category information related to the microblog and the microblog information related to the categories. We also introduce an attention mechanism to focus on the important information in the candidate answer set. Emotion classification requires understanding not only the semantics of the text but also the emotions it contains. Therefore, how to accurately abstract and integrate the semantic information of emotion categories to help the model better understand the emotions of the text is the key issue that we focus on.

3 Methods

In this section, we present our emotion classification method for COVID-19 Chinese microblogs, which contains two parts: (1) the definition and strategy of emotion category description (Sect. 3.1), and (2) the emotion classification fine-tuning based on a question answering (QA) method (Sect. 3.2).

3.1 Definition and Strategy of Emotion Category Description

The definition of emotion category description is to extend emotion categories into formalized descriptions according to a certain strategy. We use three strategies to construct descriptions.

Keyword-Based Category Description. We use six keywords as the description of six emotion categories of happiness, anger, sadness, fear, surprise, and neutral. The construction examples are shown in Table 1.

Table 1. Construction examples of keyword-based category description.

Keyword Expansion-Based Category Description. Using the affective lexicon ontology of Dalian University of Technology [17], we look up the synonyms of the five Chinese category keywords ‘ (happiness)’, ‘ (anger)’, ‘ (sadness)’, ‘ (fear)’, and ‘ (surprise)’, and use the keywords together with their synonyms as the emotion category descriptions. For the neutral category, the Chinese keywords ‘ (neutral)’ and ‘ (no emotion)’ are spliced together as the emotion category description. There are two versions of the keyword expansion-based category description, which are shown in Table 2 and Table 3.

Table 2. Construction examples of keyword expansion-based category description (Version 1).
Table 3. Construction examples of keyword expansion-based category description (Version 2).

Emotion Definition-Based Category Description. We determined the specific definition of each emotion category through Baidu Encyclopedia and then adapted it to the category descriptions. The construction examples are shown in Table 4.

Table 4. Construction examples of emotion definition-based category description.

3.2 Emotion Classification Fine-Tuning Based on a Question Answering (QA) Method

Fig. 1. Structure of the emotion classification model based on the emotion category description.

We use the Chinese pre-trained BERT model (BERT-Base, Chinese) released by Google as the basic model. There are two input forms of pre-trained BERT to fine-tune downstream classification tasks: one is the single sentence input, and the other is the sentence pair input. We adopt the second form, by constructing a question for the microblog and using the category descriptions of all emotion categories as a candidate answer set to construct a question-and-answer pair as the input of the pre-trained BERT model. The structure of our model is shown in Fig. 1.

First, we introduce the method of constructing the question-and-answer pair. We are given a microblog and all emotion categories \(\{Y_{c}|X\}=\{Y_{c}|x_{1},x_{2},\dots ,x_{n}\},c=1,2,\dots ,N\), where \(Y_{c}\) represents an emotion category and \(X=\{x_{1},x_{2},\dots ,x_{n}\}\) represents a microblog. Based on the idea of question answering, the microblog X is used to construct a question that asks the model what emotion is expressed in the text X, and all emotion category descriptions are used as a set of candidate answers. Then, \(\{Y_{c}|X\}=\{Y_{c}|x_{1},x_{2},\dots ,x_{n}\},c=1,2,\dots ,N\) can be represented as a question-and-answer pair: \(\{Y_{c}|X\}=\) ‘[CLS]What is the emotion expressed in the text X?\([SEP-1]\)Category description of \(Y_{1}[SEP-2]\)Category description of \(Y_{2}\dots [SEP-N]\)Category description of \(Y_{N}[SEP]\)’. Here, ‘[CLS]’ is a special classification token whose hidden state can be used to represent the semantics of the text for classification tasks. ‘\([SEP-1]\)’, ‘\([SEP-2]\)’, etc. are separator tokens, which are used to separate the category descriptions in the answer set. A construction example of the question-and-answer pair is shown in Fig. 2.

Fig. 2. A construction example of the question-and-answer pair.
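The construction of the question-and-answer pair described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_qa_pair`, the English keyword list, and the plain string concatenation are assumptions made here, and the paper does not state how the separator tokens are mapped to the BERT vocabulary.

```python
def build_qa_pair(microblog, category_descriptions):
    """Assemble the question-and-answer input string for one microblog.

    The layout follows the template in the text:
    [CLS] question [SEP-1] description 1 ... [SEP-N] description N [SEP]
    """
    question = f"What is the emotion expressed in the text {microblog}?"
    parts = ["[CLS]", question]
    for index, description in enumerate(category_descriptions, start=1):
        # Each candidate answer is preceded by its own separator marker.
        parts.append(f"[SEP-{index}]")
        parts.append(description)
    parts.append("[SEP]")
    return "".join(parts)


# Hypothetical English stand-ins for the keyword-based descriptions;
# the actual descriptions in the paper are Chinese.
KEYWORD_DESCRIPTIONS = ["happiness", "anger", "sadness", "fear", "surprise", "neutral"]

qa_input = build_qa_pair("Come on Wuhan, China will win", KEYWORD_DESCRIPTIONS)
print(qa_input)
```

Swapping `KEYWORD_DESCRIPTIONS` for longer descriptions (expanded keywords or definitions) changes only the candidate answer set, which is what distinguishes the three description strategies at the input level.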

The basic structure of BERT is the Transformer. We omit the specific description of the model and instead focus on how to fine-tune a pre-trained BERT model for the downstream emotion classification task based on the category description. To fine-tune, we first initialize the BERT model with the pre-trained parameters and then input the constructed question-and-answer pairs into the model, as shown in Fig. 1. During training, the model is continuously fine-tuned on the labeled input data until it is adapted to the emotion classification task. After the question-and-answer pair is encoded by the BERT model, we take the hidden states of the special tokens as the contextual representations, denoted as \(h_{[CLS]}\in {R^{768\times 1}}\) and \(\{h_{[SEP-1]},h_{[SEP-2]},\dots ,h_{[SEP-N]}\}\), where each \(h_{[SEP-n]}\in {R^{768\times 1}}\) is the contextual hidden representation of one answer (category description). For the current question of ‘What is the emotion expressed in the text X?’, the contextual hidden representation of the category description corresponding to the real label should be more important. Therefore, we use the attention mechanism to process \(h_{[SEP-n]}\), which is given as follows:

$$a_{n}=h_{[SEP-n]}^{T} q \tag{1}$$
$$\alpha _{n}=\frac{\exp (a_{n})}{\sum _{t=1}^{N}\exp (a_{t})} \tag{2}$$
$$h_{[SEP]}^{att}=\sum _{n=1}^{N}\alpha _{n}h_{[SEP-n]} \tag{3}$$

where \(h_{[SEP-n]}^{T}\) is the transpose of \(h_{[SEP-n]}\), \(q\in {R^{768\times 1}}\) is the randomly initialized attention query vector, and \(\alpha _{n}\) is the attention distribution.
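As a small worked illustration of Eqs. (1)-(3), the following pure-Python sketch computes the attention weights and the pooled representation for toy 4-dimensional vectors instead of the 768-dimensional hidden states. The function name `attention_pool` and the toy values are assumptions for illustration only. Subtracting the maximum score before exponentiation is a standard numerical-stability step that leaves the result of Eq. (2) unchanged.

```python
import math


def attention_pool(sep_states, query):
    """Attention over the separator hidden states.

    sep_states: list of N vectors h_[SEP-n]; query: vector q of the same size.
    Returns the attention weights alpha_n and the pooled vector h_[SEP]^att.
    """
    # Eq. (1): a_n = h_[SEP-n]^T q (dot product with the query vector)
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in sep_states]
    # Eq. (2): softmax over the N scores (shifted by the max for stability)
    max_score = max(scores)
    exps = [math.exp(a - max_score) for a in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Eq. (3): weighted sum of the separator hidden states
    dim = len(query)
    pooled = [sum(w * h[d] for w, h in zip(weights, sep_states)) for d in range(dim)]
    return weights, pooled


# Toy example with N = 3 category descriptions and hidden size 4.
sep_states = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query = [2.0, 0.0, 0.0, 0.0]
weights, pooled = attention_pool(sep_states, query)
print(weights, pooled)
```

In this toy case the first description has the largest dot product with the query, so it receives the largest weight and dominates the pooled vector.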

Then we calculate the fused semantic representation by the formula \(h_{add}=h_{[CLS]}+h_{[SEP]}^{att}\) and input it into a fully connected layer to obtain the emotion category score vector \(s\in {R^{N\times 1}}\). Furthermore, we use the Softmax function to normalize s to obtain the conditional probability distribution \(P_{i}(s)\). The formulas are as follows:

$$s=W_{1}h_{add}+b_{1} \tag{4}$$
$$P_{i}(s)=\frac{\exp (s_{i})}{\sum _{j=1}^{N}\exp (s_{j})} \tag{5}$$

where \(W_{1}\in {R^{N\times 768}}\) is the weight matrix, \(b_{1}\in {R^{N\times 1}}\) is the bias vector, and N is the number of emotion categories. The cross-entropy loss function is used to train the model, and its parameters are updated through the backpropagation algorithm. The loss is defined as follows:

$$loss=-\sum _{x\in {T}}\sum _{i=1}^{N}P_{i}^{t}(x)\log _{2}\left( P_{i}^{p}(x)\right) \tag{6}$$

where T is the training set, x is one of the samples in the training set. \(P_{i}^{t}(x)\) is the ground truth probability distribution of the emotion category of x, and \(P_{i}^{p}(x)\) is the predicted probability distribution of the emotion category of x.
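To make the classification head concrete, the sketch below applies the fusion \(h_{add}=h_{[CLS]}+h_{[SEP]}^{att}\), the fully connected layer of Eq. (4), the Softmax of Eq. (5), and the per-sample term of the loss in Eq. (6) to toy values. The helper names, the 2-dimensional hidden size, and the hand-picked weights are assumptions for illustration, not the trained parameters of the model.

```python
import math


def softmax(scores):
    """Eq. (5): normalize the score vector into a probability distribution."""
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_cls, h_att, weight, bias):
    """Fuse the two representations and score each emotion category."""
    # h_add = h_[CLS] + h_[SEP]^att (element-wise sum)
    h_add = [c + a for c, a in zip(h_cls, h_att)]
    # Eq. (4): s = W_1 h_add + b_1, one score per category
    scores = [sum(w * h for w, h in zip(row, h_add)) + b for row, b in zip(weight, bias)]
    return softmax(scores)


def cross_entropy(true_dist, pred_dist):
    """Per-sample term of Eq. (6), using log base 2 as written in the text."""
    return -sum(t * math.log2(p) for t, p in zip(true_dist, pred_dist) if t > 0)


# Toy example: hidden size 2, N = 3 categories.
h_cls = [1.0, 0.0]
h_att = [0.0, 1.0]
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # shape N x hidden size
bias = [0.0, 0.0, 0.0]

probs = classify(h_cls, h_att, weight, bias)
loss = cross_entropy([1.0, 0.0, 0.0], probs)  # one-hot ground truth: category 1
print(probs, loss)
```

Summing `cross_entropy` over all training samples gives the loss of Eq. (6); with a one-hot ground truth, only the predicted probability of the true category contributes.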

We propose several models based on the above three strategies, named BERT-KCD, BERT-KECD, and BERT-EDCD. Among them, BERT-KCD (BERT with Keyword-based Category Description) represents the integration of keyword-based category descriptions into the BERT model. BERT-KECD (BERT with Keyword Expansion-based Category Description) represents the integration of keyword expansion-based category descriptions into the BERT model. There are two versions of BERT-KECD, corresponding to the two types of extended keywords in Table 2 and Table 3, named BERT-KECD-v1 and BERT-KECD-v2, respectively. BERT-EDCD (BERT with Emotion Definition-based Category Description) represents the integration of emotion definition-based category descriptions into the BERT model.

4 Experiments

4.1 Experimental Dataset

The experimental dataset comes from ‘The Evaluation of Weibo Emotion Classification Technology, SMP2020-EWECT’ at ‘The Ninth China National Conference on Social Media Processing’. Each microblog is manually labeled with one of six categories: happiness, anger, sadness, fear, surprise, and neutral. Table 5 shows the statistical information of the experimental dataset.

Table 5. Statistical information of the COVID-19 microblog dataset.

4.2 Baseline Models

We compare our model with the following seven baseline models:

MNB (Multinomial Naïve Bayes) [18]: It achieves excellent performance in many sentiment classification tasks. The smoothing factor alpha of MNB is set to 1.0.

SVM (Support Vector Machines) [19]: It is widely used in sentiment classification tasks and has achieved excellent results. The regularization constant C of SVM is set to 1.0, and the kernel function is linear.

BLSTM (Bidirectional Long Short-Term Memory) [20]: The model extracts context-related text features for sentiment classification through bidirectional LSTM. It uses a single-layer bidirectional LSTM network with 256 hidden layer units.

CNN (Convolutional Neural Networks) [6]: The classic convolutional neural network proposed by Kim, which uses CNN to extract deep semantic features for text sentiment classification. The convolution kernel sizes of the model are 3, 4, and 5. There are 100 convolution kernels of each size.

DPCNN (Deep Pyramid Convolutional Neural Networks) [9]: The model performs deep convolution operations at the word level and extracts long-distance text features for sentiment classification. It achieved the best results at the time on multiple review datasets such as Amazon. The hyper-parameters are consistent with [9].

HAN (Hierarchical Attention Networks) [21]: The model extracts word-level and sentence-level features through hierarchical bidirectional GRUs and attention mechanisms, and obtains the semantic representation of the entire text for sentiment classification. It obtained the best results in many sentiment classification tasks, and the parameter settings of the model are consistent with [21].

BERT (Bidirectional Encoder Representations from Transformers) [12]: The Chinese pre-trained BERT model (BERT-Base, Chinese) released by Google, which set new records on a series of NLP tasks, including sentiment classification. Our models fine-tune this model for the emotion classification task.

4.3 Implementation Details

All experimental code is based on Python 3.6.5 and TensorFlow 1.15.0 and runs on the Linux CUDA platform. For the baseline models, the learning rate of BLSTM, CNN, DPCNN, and HAN is 0.001, and the batch size is 64. We use the pre-trained word vectors released in [22] for the neural network models. The dimension of each word vector is 300. All neural network models use the Adam optimizer. Furthermore, the hyper-parameter settings of the BERT series models are shown in Table 6.

Table 6. Hyper-parameters setting.

We counted the sequence lengths of the COVID-19 microblog data, of the question, and of the category descriptions under each emotion category description strategy. Based on these statistics, the max sequence lengths of the BERT, BERT-KCD, BERT-KECD-v1, BERT-KECD-v2, and BERT-EDCD models are set to 128, 160, 180, 210, and 240, respectively.

4.4 Experimental Results

We use Precision, Recall, F1, Macro_Precision, Macro_Recall, Macro_F1, and Micro_F1 as the evaluation metrics.
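The text does not spell out how these metrics are computed, so the following sketch uses the common definitions as an assumption: Macro_F1 is the unweighted mean of the per-category F1 scores, and Micro_F1 is the F1 computed from the true positives, false positives, and false negatives pooled over all categories. The function names and the toy labels are hypothetical.

```python
def per_class_counts(y_true, y_pred, labels):
    """Count true positives, false positives, and false negatives per category."""
    counts = {label: {"tp": 0, "fp": 0, "fn": 0} for label in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            counts[truth]["tp"] += 1
        else:
            counts[pred]["fp"] += 1   # predicted category gets a false positive
            counts[truth]["fn"] += 1  # true category gets a false negative
    return counts


def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(y_true, y_pred, labels):
    counts = per_class_counts(y_true, y_pred, labels)
    # Macro: average the per-category F1 scores with equal weight.
    macro = sum(f1_score(**c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all categories, then compute a single F1.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return macro, f1_score(tp, fp, fn)


# Toy example with three categories and four microblogs.
labels = ["happiness", "anger", "fear"]
y_true = ["happiness", "happiness", "anger", "fear"]
y_pred = ["happiness", "anger", "anger", "happiness"]
macro_f1, micro_f1 = macro_micro_f1(y_true, y_pred, labels)
print(macro_f1, micro_f1)
```

Under these definitions a rare category counts as much as a frequent one in Macro_F1, which is why the macro metrics are more sensitive than Micro_F1 to performance on categories with few samples.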

Emotion Classification of COVID-19 Chinese Microblogs. To verify the effectiveness of our model, we compare it with some existing mainstream emotion classification models. Among them, BERT-KECD-v2 is our model. The experimental results are shown in Table 7.

Table 7. Experimental results of COVID-19 microblog emotion classification.

Table 7 shows that, compared with the deep learning models, the traditional machine learning models MNB and SVM perform poorly. The Micro_F1 scores of MNB and SVM are only 68.53% and 68.47%, respectively, whereas the Micro_F1 scores of BLSTM, CNN, DPCNN, HAN, and BERT are 72.97%, 74.97%, 74.67%, 75.20%, and 79.17%, respectively. The performance of these deep learning-based models is clearly better than that of MNB and SVM.

Besides, the Micro_F1 of BERT-KECD-v2 is 79.83%, which is higher than that of all five deep learning models above. Compared with BLSTM, the Macro_Precision, Macro_Recall, Macro_F1, and Micro_F1 of BERT-KECD-v2 increased by 17.81, 16.55, 18.20, and 6.86 percentage points, respectively. Compared with DPCNN, the four metrics of BERT-KECD-v2 increased by 4.47, 12.09, 12.53, and 5.16 percentage points, respectively. Compared with BERT, the four metrics of BERT-KECD-v2 increased by 3.46, 0.79, 2.10, and 0.66 percentage points, respectively. The experimental results demonstrate the effectiveness of our model and show its advantages in COVID-19 microblog emotion classification.
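The Micro_F1 gains quoted above are absolute differences between the reported scores, which can be checked with a short calculation. The dictionary below only restates the Micro_F1 values given in the text; the variable names are chosen here for illustration.

```python
# Micro_F1 scores (in %) of the deep learning baselines as reported in the text.
BASELINE_MICRO_F1 = {
    "BLSTM": 72.97,
    "CNN": 74.97,
    "DPCNN": 74.67,
    "HAN": 75.20,
    "BERT": 79.17,
}
BERT_KECD_V2_MICRO_F1 = 79.83

# Absolute gain of BERT-KECD-v2 over each baseline, in percentage points.
gains = {
    name: round(BERT_KECD_V2_MICRO_F1 - score, 2)
    for name, score in BASELINE_MICRO_F1.items()
}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

The differences for BLSTM, DPCNN, and BERT reproduce the 6.86, 5.16, and 0.66 reported in the text, confirming that the gains are differences in score rather than relative improvements.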

Validation of Category Description Strategy. We compare the five models of BERT, BERT-KCD, BERT-KECD-v1, BERT-KECD-v2, and BERT-EDCD to verify the effectiveness of the proposed category description strategies. The experimental results are shown in Table 8.

Table 8. Validation results of category description strategy.

Table 8 shows that all three category description strategies improve the performance of the BERT model. Compared with the BERT model, the Macro_F1 and Micro_F1 of BERT-KCD increased by 1.40 and 0.26 percentage points, respectively. The Macro_F1 and Micro_F1 of BERT-KECD-v1 increased by 1.88 and 0.76 percentage points, respectively. The Macro_F1 and Micro_F1 of BERT-KECD-v2 increased by 2.10 and 0.66 percentage points, respectively. In our analysis, this is because the two keyword-based category descriptions represent part of the semantic information of the category, which can help the model understand the emotion of the text. Moreover, richer keywords tend to bring a larger improvement in Macro_F1, although the Micro_F1 gain of BERT-KECD-v2 is slightly lower than that of BERT-KECD-v1.

Besides, compared with BERT, the Macro_F1 and Micro_F1 of BERT-EDCD increased by 1.19 and 0.33 percentage points, respectively. We hypothesized that BERT-EDCD would perform best, because the emotion definition-based description carries richer category information. However, the experimental results show that the keyword-based models achieve better results than BERT-EDCD. In our analysis, there are two possible reasons why BERT-EDCD did not work as expected. One is that the definitions of the emotions are not precise enough. In comparison, keywords may reflect the semantics of the emotion category more intuitively and are easily accessible. The other is that the structure of BERT is not conducive to handling long sequences. The max sequence length of BERT-EDCD is 240, which is the largest among all BERT series models.

Overall, the proposed models based on the category description have improved performance compared to the basic model BERT. The experimental results show that the three description strategies proposed in this paper are effective for the COVID-19 microblog emotion classification task.

Effectiveness of Our Model on Each Emotion Category. To verify the effectiveness of our model on each emotion category, we compare the Precision, Recall, and F1 of the BERT and BERT-KECD-v2 models. The experimental results are shown in Figs. 3, 4, and 5.

Fig. 3. Comparison results of Precision on each emotion category.

Fig. 4. Comparison results of Recall on each emotion category.

Fig. 5. Comparison results of F1 on each emotion category.

From Figs. 3, 4, and 5, it can be seen that BERT-KECD-v2 performs better overall than the basic model BERT. Compared with the BERT model, the Precision of BERT-KECD-v2 on the anger and surprise categories increased the most, by 12.20% and 17.02%, respectively. In addition, both BERT-KECD-v2 and BERT perform poorly on the three categories of sadness, fear, and surprise. We attribute this to the small number of training samples in these three categories, which are only 649, 555, and 197, respectively. In contrast, there are 4423 training samples in the happiness category, which allows the model to be fully trained and thus to achieve its best classification performance on this category. The experimental results show that our model effectively improves the performance of the COVID-19 microblog emotion classification.

5 Conclusion

This paper proposes an emotion classification method based on the emotion category description for COVID-19 Chinese microblog data. By extending each emotion category into a formalized category description, the semantic information of the category is integrated to guide the model in classifying emotions. Experimental results show that our method can effectively model nonstandard COVID-19 microblog text, and that the introduced category description semantics help the model understand the semantics and emotions of irregular text. The results also indicate that introducing the idea of question answering into the BERT model can improve the performance of COVID-19 microblog emotion classification. Moreover, category imbalance in emotion classification remains a challenge for existing studies. In the future, we will further investigate the category imbalance of COVID-19 microblogs.