
1 Introduction

Sentiment analysis refers to extracting, analyzing, understanding, and generating subjective information in natural language. It is widely applied in public opinion analysis, recommendation systems, review analysis and generation, and business decision-making. External knowledge is often introduced to help classify sentiment polarity according to sentiment words such as happy, sad, and fear. However, sentiment lexicons only capture the relationship between words and sentiments and fail to model the relationship between words and phrases. Chinese in particular is built from basic character units and, owing to the characteristics of the language itself, exhibits complex grammatical structures, diverse semantics, and varied expressions. Function words, a vital means of expression in Chinese, help convey the semantic relations between content words; among them, adverbs and conjunctions significantly influence the sentiment and polarity of sentiment words.

Adverbs, especially degree and negative adverbs, are widely considered as influencing factors and judgment conditions in existing studies on sentiment classification [1,2,3,4,5,6,7,8]. When they modify sentiment words, the orientation and intensity of the sentiment polarity change to a certain extent. For example, a phrase in which a negative adverb modifies a positive adjective takes on a derogatory meaning, and the same sentiment word modified by different degree adverbs expresses different intensities. Besides, conjunctions are function words with a connecting function: they can link words, phrases, clauses, or sentences and indicate causal, inferential, hypothetical, conditional, and other linguistic relations [13, 14, 16].

Therefore, this paper proposes a framework, Function words-guided Sentiment-aware Attention (FuncSA), for Chinese sentiment analysis. FuncSA focuses on the sentiment expressed in the input text and obtains the internally correlated features of sentiment words, adverbs, and conjunctions. In particular, FuncSA calculates the polarity values of sentiment words as modified by adverbs and then assigns different weights according to the influence of conjunctions between clauses. Sentiment-aware attention then adjusts the contribution of each clause's sentiment. These sentiment semantic representations are effectively utilized to guide the model in Chinese sentiment analysis.

2 Related Work

2.1 Sentiment Analysis

Function Words-Irrelevant Methods. Neural networks with attention mechanisms can increase interpretability and sentiment analysis performance [9,10,11,12]. Cheng [10] uses a neural network with a hierarchical attention network to learn more precise and richer information. Li [11] uses multi-head self-attention to fully extract context representations. Xie [12] uses attention to capture the sentiment information contained in sentiment words.

Function Words-Relevant Methods. In early research based on rules or traditional machine learning approaches, adverbs and conjunctions were used to calculate sentiment [3, 4, 13] or to assist classifiers in predicting sentiment polarity [5,6,7,8]. With the help of deep learning, Qian [1] uses features from sentiment lexicons, negation words, and degree adverbs and achieves good results with an LSTM. Liang [14] uses conjunctions to segment phrases and constructs graph structures to encode contextual information.

Although these studies have tried to utilize function words in sentiment analysis, the adverb and conjunction resources they use are mainly selected empirically. Therefore, combining existing sentiment lexical resources, this paper introduces the Chinese Function word usage Knowledge Base (CFKB) [15,16,17,18,19] into sentiment analysis. In CFKB, negative adverbs can reverse the polarity of the sentiment words they modify, degree adverbs can strengthen or weaken the tendency, and conjunctions can change or even reverse the sentiment orientation of clauses to varying degrees.

2.2 Chinese Function Word Usage Knowledge Base

Function words are a primary means of expressing grammatical meaning in Chinese [20]. Unlike content words, which can act independently as sentence components, function words carry no independent semantic meaning; they must attach to content words or phrases to express grammatical meaning, tone, or sentiment. CFKB classifies function words into six categories: adverb, preposition, conjunction, auxiliary word, modality, and locality. It comprehensively describes function words in terms of attributes such as POS, definitions, example sentences, and usage descriptions.

CFKB provides reliable lexical resources for Chinese language processing and semantic understanding and is widely used in syntactic analysis [21, 22], information extraction [23], sentiment analysis [2], and other natural language processing tasks. Li [2] introduced the degree adverbs, negative adverbs, and conjunctions in CFKB into sentiment analysis. Their rule-based method outperforms traditional machine learning methods, showing that function words have an essential effect on sentiment word recognition and analysis. However, it has yet to be combined with deep learning methods.

3 Methodology

Based on the context representation obtained by ERNIE, FuncSA uses the sentiment knowledge obtained from lexical resources and CFKB to extract the sentiment-aware semantic information for sentiment analysis. The overall structure is shown in Fig. 1.

Fig. 1. Overview of FuncSA

3.1 ERNIE Encoder

Pre-trained language models such as BERT [24] have achieved substantial progress in sentiment analysis. Like BERT, ERNIE [25] consists of multiple Transformer encoder layers, and it follows BERT in prepending the special token [CLS] and appending [SEP] to the input text. [CLS] is the first token and captures the contextual information representation of the input, which is used for downstream tasks; [SEP] marks the end of the sentence.

However, ERNIE introduces phrase and entity knowledge in the pre-training stage. Specifically, ERNIE's masking strategy works at three levels: first, like BERT, it masks and learns the representation of individual units in the input; second, it randomly masks phrases to exploit word-level information; finally, entities in the input sentences are masked and predicted during pre-training. Through this masking strategy, ERNIE learns entities, phrases, and other external knowledge from massive corpora and obtains a more reliable language representation. Therefore, we use ERNIE as the context encoder for Chinese sentiment analysis.
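As a rough illustration (not ERNIE's released implementation), the three granularities can be viewed as masking spans of different sizes. The span boundaries below are hypothetical; real pre-training derives them from a word segmenter and an entity linker over large corpora.

```python
import random

def mask_spans(tokens, spans, mask_token="[MASK]", rate=0.15):
    """Mask whole spans (half-open index ranges) with probability `rate` each."""
    out = list(tokens)
    for start, end in spans:
        if random.random() < rate:
            out[start:end] = [mask_token] * (end - start)
    return out

tokens = list("哈尔滨是黑龙江的省会")                      # illustrative sentence
char_spans   = [(i, i + 1) for i in range(len(tokens))]  # BERT-style, per character
phrase_spans = [(8, 10)]                                 # e.g., the phrase 省会
entity_spans = [(0, 3), (4, 7)]                          # e.g., 哈尔滨 and 黑龙江
```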

3.2 Function Words-Guided Sentiment Representation

Function words-guided sentiment representation is obtained in three steps: 1) split the input sentence into clauses by conjunctions; 2) calculate the base sentiment score of each clause; 3) weight the sentiment scores by conjunctions.

Separate the Input Sentence into Clauses by Conjunctions. Conjunctions have complex and diverse functions and usages and can express various logical relations by connecting phrases, sentences, and texts. This paper selects four kinds of conjunctions in CFKB that express transition, progression, selection, and coordination. The clauses connected by transition conjunctions usually have opposite sentiment orientations, and the clause after the conjunction mainly determines the overall sentiment. Selective, progressive, and coordinating conjunctions tend to connect elements with the same sentiment polarity, while progressive conjunctions tend to be followed by clauses with a more intense sentiment. The input sentence \(X=\{x_1^a,...,x_m^a,x_1^c,...,x_\ell ^c,x_1^b,...,x_n^b\}\) is divided into two clauses \(X_a=\{x_1^a,...,x_m^a\}\) and \(X_{c+b}=\{x_1^c,...,x_\ell ^c,x_1^b,...,x_n^b\}\) by the conjunction \(X_c=\{x_1^c,...,x_\ell ^c\}\), with lengths m and \(\ell +n\), respectively (see the sketch below).
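A minimal sketch of this splitting step, assuming a flat conjunction inventory; the entries and type labels below are illustrative examples, whereas the paper draws its actual lists from CFKB.

```python
# Hypothetical step-1 sketch: locate the first conjunction and split the
# sentence into X_a and X_{c+b}. The inventory is illustrative, not CFKB.
CONJUNCTIONS = {
    "但是": "transition", "却": "transition",
    "而且": "progression", "甚至": "progression",
    "或者": "selection",
    "并且": "coordination",
}

def split_by_conjunction(sentence: str):
    """Return (X_a, X_c, X_{c+b}, type), or (sentence, None, None, None)."""
    for conj, conj_type in CONJUNCTIONS.items():
        idx = sentence.find(conj)
        if idx > 0:  # the conjunction should not open the sentence
            return sentence[:idx], conj, sentence[idx:], conj_type
    return sentence, None, None, None
```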

Calculate the Base Sentiment Score. We combine HowNet and the National Taiwan University Sentiment Dictionary (NTUSD), remove the stop words from the collection, and form FuncSA's dictionary for calculating sentiment scores. In addition to sentiment words, the context of sentiment words, especially degree and negative adverbs, is also essential to sentiment classification.

Degree adverbs significantly influence the strength of the sentiment orientation in sentences. The degree adverbs used in this paper come from HowNet, which provides 219 Chinese degree adverbs divided into six categories by degree level: "extreme/most, very, more, -ish, insufficiently, and over". Each category is assigned an empirically chosen weight for calculating sentence sentiment scores. Examples of degree adverbs and the weights assigned to each category are shown in Table 1.

Table 1. Examples of degree adverbs and the weights

As for the negative adverbs, we screened all 50 negative adverbs in CFKB. Chinese allows double and multiple negation: when negation occurs an even number of times, the text logically has an affirmative meaning, and when it occurs an odd number of times, the text indicates a negative meaning. We therefore count the negative adverbs preceding each sentiment word: an odd count negates the sentiment value, while an even count leaves it unchanged.

Based on the above steps, the sentiment scores of clauses \(X_a\) and \(X_{c+b}\) under the influence of degree adverbs and negative adverbs are \(S_a\) and \(S_{c+b}\), respectively. A minimal sketch of this scoring step follows.
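In the sketch below, the toy lexicon, degree weights, and negator list stand in for the HowNet/NTUSD and CFKB entries, and the weight values are placeholders for the empirically chosen ones in Table 1; the input is assumed to be a word-segmented clause.

```python
SENTIMENT = {"好": 1.0, "差": -1.0}               # toy stand-in for HowNet/NTUSD
DEGREE = {"非常": 2.0, "比较": 1.5, "稍微": 0.5}  # toy stand-in for Table 1 weights
NEGATORS = {"不", "没", "未"}                     # toy stand-in for CFKB negators

def clause_score(tokens):
    """Base sentiment score of one word-segmented clause (S_a or S_{c+b})."""
    score, weight, neg = 0.0, 1.0, 0
    for tok in tokens:
        if tok in DEGREE:
            weight *= DEGREE[tok]                 # degree adverbs scale intensity
        elif tok in NEGATORS:
            neg += 1                              # count negators before the word
        elif tok in SENTIMENT:
            sign = -1.0 if neg % 2 else 1.0       # odd negation flips polarity
            score += sign * weight * SENTIMENT[tok]
            weight, neg = 1.0, 0                  # reset after each sentiment word
    return score
```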

Weight the Sentiment Score by Conjunctions. According to the conjunction \(X_c\), we assign weights \(w_1\) and \(w_2\) to \(S_a\) and \(S_{c+b}\), respectively, so the weighted sentiment scores of clauses \(X_a\) and \(X_{c+b}\) are \(S_a \times w_1\) and \(S_{c+b} \times w_2\). The weighted scores are assigned to each character in the corresponding clause and extended to the matrices \(A_1\in \mathbb {R}^{m\times m}\) and \(A_2\in \mathbb {R}^{(\ell +n)\times (\ell +n)}\):

$$\begin{aligned} \begin{aligned}&A_1=w_1 \times S_a \times \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} w_1S_a & \cdots & w_1S_a \\ \vdots & & \vdots \\ w_1S_a & \cdots & w_1S_a \end{bmatrix} \\&A_2=w_2 \times S_{c+b} \times \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} w_2S_{c+b} & \cdots & w_2S_{c+b} \\ \vdots & & \vdots \\ w_2S_{c+b} & \cdots & w_2S_{c+b} \end{bmatrix} \end{aligned} \end{aligned}$$
(1)

Thus, the sentiment-aware representation M guided by function words can be expressed as:

$$\begin{aligned} M= \begin{bmatrix} A_1 & O \\ O & A_2 \end{bmatrix} \in \mathbb {R}^{(m+\ell +n)\times (m+\ell +n)} \end{aligned}$$
(2)
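Under these definitions, Eqs. (1) and (2) amount to broadcasting each clause's weighted score over an all-ones block and placing the two blocks on the diagonal. A small NumPy sketch (shapes only; values hypothetical):

```python
import numpy as np

def build_sentiment_bias(s_a, s_cb, w1, w2, m, ln):
    """Assemble M (Eq. 2) from the clause blocks A_1 and A_2 (Eq. 1).
    m and ln are the clause lengths (m and l+n in the paper's notation)."""
    a1 = w1 * s_a * np.ones((m, m))      # A_1: constant block for clause X_a
    a2 = w2 * s_cb * np.ones((ln, ln))   # A_2: constant block for clause X_{c+b}
    M = np.zeros((m + ln, m + ln))       # off-diagonal blocks are the zero matrix O
    M[:m, :m] = a1
    M[m:, m:] = a2
    return M
```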

3.3 Guide Module

FuncSA adds one more Transformer encoder layer above ERNIE to integrate the function words-guided sentiment representation. A Transformer encoder layer mainly comprises multi-head attention, a feed-forward network, and layer normalization. Self-attention learns contextual semantic representations by modeling the interactions between words, and multi-head self-attention expands the feature space by computing self-attention in different subspaces, enriching the learned representation.

We integrate the sentiment-aware representation M into multi-head self-attention, which constitutes the sentiment-aware attention, to capture the interaction of the sentiment words and the context information. The guide module can learn implicit information in combination with sentiment knowledge in context representation obtained from the pre-trained model.

Specifically, for the context representation H, we add the function words-guided sentiment representation M to \(QK^T\) in self-attention:

$$\begin{aligned} \begin{aligned} Attention(Q,K,V)&=Softmax(\frac{QK^T}{\sqrt{d_K}}+ M)V \\ where \quad Q=HW^Q&, K=HW^K, V=HW^V \end{aligned} \end{aligned}$$
(3)

where \(QK^T\) computes the internal correlation between words in the clause. M is added to the attention scores scaled by \(\sqrt{d_K}\), and the weighted output is obtained by multiplying the Softmax-normalized scores by V. Sentiment information can thus influence self-attention. The attention outputs from multiple subspaces are then concatenated according to Eq. 4.

$$\begin{aligned} \begin{aligned} MultiHead(Q,K,V)&=Concat(head_1,...,head_h)W^O \\ where \quad head_i&=Attention(Q_i,K_i,V_i) \end{aligned} \end{aligned}$$
(4)

According to Eq. 2, M is a block-diagonal matrix, so attention is limited within each clause: each character attends only to characters in the same clause, because the sentiment polarity may change across clauses under the influence of conjunctions. A sketch of Eqs. (3) and (4) follows.
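Below is a single-head, unbatched NumPy sketch of Eqs. (3) and (4); the projection matrices are hypothetical learned parameters, and the real model operates on batched ERNIE hidden states.

```python
import numpy as np

def sentiment_aware_attention(H, M, W_q, W_k, W_v):
    """Eq. (3): scaled dot-product attention with the sentiment bias M added."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1]) + M   # bias the scores with M
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head(H, M, heads, W_o):
    """Eq. (4): run each head's attention and concatenate; W_o is the output
    projection. `heads` is a list of (W_q, W_k, W_v) parameter triples."""
    outs = [sentiment_aware_attention(H, M, Wq, Wk, Wv) for Wq, Wk, Wv in heads]
    return np.concatenate(outs, axis=-1) @ W_o
```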

Furthermore, layer normalization and a feed-forward network are used to accelerate convergence and enhance the model's analysis and prediction.

4 Experimental Settings

4.1 Datasets

In this paper, ChnSentiCorp [26], COAE2013, and NLPCC2014 are selected to verify the performance of FuncSA.

ChnSentiCorp is an online shopping review dataset containing hotels, laptops, and books. To conduct a fair experimental comparison, this article follows the division of datasets in previous studies [25].

COAE2013 is from the Fifth Chinese Opinion Analysis Evaluation. The annotated data contain 1004 positive reviews and 834 negative reviews. The dataset is divided into training and test sets in a 9:1 ratio.

NLPCC2014 is from the sentiment classification with deep learning technology task at the 3rd CCF Conference on Natural Language Processing & Chinese Computing, using data from Chinese product review websites, including book, DVD, and electronics reviews.

4.2 Baselines

We compare our model with the following baseline methods on all three datasets.

  1) RNNs or CNNs baselines: BiLSTM [27], BiLSTM+Att [27], TextCNN [28], DPCNN [29].

  2) Vanilla pre-trained models: BERT, BERT-WWM [30], RoBERTa [31], ERNIE.

  3) ERNIE-based models: ERNIE+BiLSTM, ERNIE+BiGRU, and ERNIE+Att.

For the RNNs or CNNs baselines, we used word vectors pre-trained on the Sogou News corpus, training for at most 100 epochs with a batch size of 128. The Adam optimizer was adopted with a learning rate of 1e-4. For BiLSTM and BiLSTM+Att, the dimension of the hidden layer was set to 128. For the CNN-based TextCNN and DPCNN, the number of convolution kernels was 256.

For the pre-trained model baselines, we followed the default settings, i.e., 12 layers of multi-head attention with the hidden layer dimension set to 768. The batch size was 16, and the learning rate was 5e-5. Adam was used to optimize the cross-entropy loss.
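For reference, the reported hyperparameters can be summarized as follows; this is a plain restatement of the settings in the text, not the authors' released code, and any unstated settings are omitted.

```python
# Hyperparameters reported in Sect. 4.2 (restated; not the authors' code).
rnn_cnn_config = dict(
    embeddings="Sogou News pre-trained word vectors",
    max_epochs=100, batch_size=128, optimizer="Adam", lr=1e-4,
    lstm_hidden=128,      # BiLSTM / BiLSTM+Att
    conv_kernels=256,     # TextCNN / DPCNN
)
pretrained_config = dict(
    layers=12, hidden=768, batch_size=16, optimizer="Adam", lr=5e-5,
    loss="cross-entropy",
)
```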

5 Experimental Results

5.1 Main Results

The results are shown in Table 2, from which several observations can be made. First, the results of the vanilla pre-trained models on all datasets are better than the best results of the RNNs or CNNs baselines: pre-trained models capture long-distance features better than RNNs and CNNs. Second, comparing the vanilla pre-trained models, ERNIE reaches the best accuracy and F1 on all datasets; it adapts better to Chinese and learns a more appropriate global representation.

Table 2. Performance comparison of models on four datasets.

Last but not least, among the ERNIE-based models, ERNIE+Att improves over ERNIE+BiLSTM and ERNIE+BiGRU because attention can adjust the global representation learned by ERNIE to focus on critical areas of the context. FuncSA adds sentiment knowledge guided by function words and integrates it with context information through one additional Transformer encoder layer. Compared with ERNIE+Att, FuncSA achieves better accuracy and F1, which demonstrates the effectiveness of the model.

5.2 Ablation Study

In order to explore the guiding role of different classes of function words and lexical information on sentiment-aware attention, we examine the performance of FuncSA with or without the function words-guided sentiment representation M and with different function word combinations. The results are shown in Table 3.

Table 3. Ablation study.

Although the accuracy and F1 of FuncSA decrease by 0.16% and 0.17% compared with w/o M on ChnSenticorp-Dev, which may result from noise introduced by the sentiment-aware representation, FuncSA is better on most datasets, illustrating that function words-guided knowledge can effectively guide attention to sentiment-specific areas.

Comparing w/o conj. & adv. with w/o conj., we find that on the ChnSenticorp-Test and NLPCC2014 datasets, the accuracy and F1 of w/o conj. are higher than those of w/o conj. & adv., indicating that adverbs are indispensable.

Comparing w/o conj. & adv. with w/o adv., both of which consider sentiment words but not adverbs, w/o adv. achieves better accuracy on all datasets, indicating that there is indeed an association between conjunctions and sentiment words in Chinese sentiment analysis.

Furthermore, comparing w/o M with the other three variants, the variants outperform w/o M on ChnSenticorp-Test, COAE2013, and NLPCC2014. FuncSA makes full use of the sentiment information in lexicons and the modification by conjunctions and adverbs, thus improving classification accuracy.

5.3 Visualization

Fig. 2. Attention detail view of ERNIE and FuncSA: (a) ERNIE; (b) FuncSA

In order to intuitively compare the semantic expression ability of the baseline ERNIE with that of FuncSA on Chinese text, we use BertViz [32], an open-source attention visualization tool for pre-trained models. An example review serves as the case for the attention view shown in Fig. 2. Since different layers exhibit different attention patterns, this paper selects the view of all attention subspaces at the last layer; the color blocks at the top of Fig. 2 distinguish the attention representations of the 12 subspaces. The intensity of the attention distribution in each subspace determines the color saturation of the attention view on the right side of each row.

The leftmost column of the attention head view contains the model input: the two special tokens [CLS] and [SEP] and the review text. Because [CLS] carries the aggregate hidden-state representation of the input, we track the attention distribution from [CLS] to the other positions in the sequence. FuncSA's subspaces pay more attention to the segment of the text containing negation, making it easier to produce the correct sentiment polarity classification than ERNIE.

6 Conclusion

This paper introduces FuncSA, a sentiment analysis model built on function words-guided sentiment-aware attention. FuncSA combines discrete sentiment lexical information with the degree adverbs, negative adverbs, and conjunctions in CFKB, and then introduces sentiment-aware attention to assist the pre-trained model in sentiment classification. Experimental results show that the proposed FuncSA improves Chinese sentiment analysis. This study reveals the validity of sentiment lexical information as influenced by function words and verifies the role of adverbs and conjunctions in guiding sentiment changes.