Hierarchical Sentiment Estimation Model for Potential Topics of Individual Tweets

Ji, Qian; Dai, Yilin; Ma, Yinghua; Liu, Gongshen; Zhang, Quanhai; Lin, Xiang

doi:10.1007/978-3-030-63820-7_75

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Twitter has gradually become a valuable source of people’s opinions and sentiments. Although tremendous progress has been made in sentiment analysis, mainstream methods rarely leverage user information. Moreover, most methods rely strongly on sentiment lexicons in tweets and thus ignore non-sentiment words that carry rich topic information. This paper aims to predict individuals’ sentiment towards potential topics on a two-point scale: positive or negative. The prediction is based on their past tweets and is intended to support precise topic recommendation. We propose a hierarchical model of individuals’ tweets (HMIT) to explore the relationship between individual sentiments and different topics. HMIT extracts token representations from fine-tuned Bidirectional Encoder Representations from Transformers (BERT). It then incorporates topic information into the context-aware token representations through a topic-level attention mechanism. A Convolutional Neural Network (CNN) serves as the final binary classifier. Unlike conventional sentiment classification on Twitter, HMIT extracts topic phrases through Single-Pass and feeds tweets without sentiment words into the whole model. We build six user models from one benchmark and our collected datasets. Experimental results demonstrate the superior performance of the proposed method against multiple baselines on both classification and quantification tasks.

1 Introduction

Nowadays, the information that can be extracted from tweets is abundant and useful, and it has received great attention from researchers. The task in this paper is to predict individuals’ sentiment towards potential topics on a two-point scale (positive or negative) based on their past tweets. Generally, a user’s attitudes towards different topics are closely related and do not change dramatically in a short time, so building models of individuals’ tweets and estimating sentiment polarities towards potential topics is beneficial for precise topic recommendation for individuals, including related topics, advertisements and social circles. Earlier researchers [5] use a Support Vector Machine with part-of-speech features to categorize tweets. An adaptive recursive neural network for target-dependent classification is proposed in [4], which propagates sentiment signals from sentiment-bearing words to specific targets on a dependency tree.

All the methods mentioned above rely heavily on sentiment lexicons and ignore potential sentiment relations within individuals’ tweets. Besides, these models focus only on the classification task, while more practical applications require a combination of extraction and classification. In this work, we propose a hierarchical model of individuals’ tweets, which extracts topics with the Single-Pass algorithm and models the relationship between individual sentiments and different topics. The main contributions of this paper are three-fold:

Models are built on individuals’ tweets, and the topic phrase of each tweet is obtained through Single-Pass. Individuals’ tweets without sentiment words, along with the extracted topics and gold labels, are the inputs of HMIT. This approach makes it possible to provide precise topic recommendations for individuals.
We propose a novel topic-dependent hierarchical model, which extracts features from fine-tuned BERT and incorporates topic information through topic-level attention. A CNN categorizes sentence representations as positive or negative.
We build models for six users separately, from one Twitter benchmark dataset and a dataset we collected ourselves. We also create a new test dataset by collecting neutral sentences on three general topics. In our experiments, the proposed method outperforms multiple baselines on both datasets in terms of classification and quantification.

2 Related Work

Target-based sentiment analysis aims to judge the sentiment polarity expressed towards each target being discussed. To capture semantic relations flexibly, a target-dependent Long Short-Term Memory (TD-LSTM) is proposed in [10]. As the attention mechanism has been successfully applied to many tasks, a variety of attention-based RNN models have proven effective [12]. To our knowledge, we are the first to exploit the target-individual relation for target-based sentiment analysis. Hierarchical models have been used predominantly for representing sentences. A hierarchical ConvNet is employed in [2] to extract salient sentences from reviews. Along with the wide use of pre-trained language models, there is a recent trend of incorporating extra knowledge into pre-trained language models as a different kind of hierarchical model [1]. BERT is difficult to apply to downstream tasks that need to emphasize several specific words. We therefore propose a hierarchical model that extracts overall information from fine-tuned BERT and then incorporates topic information. A CNN categorizes the whole sentence representation as positive or negative.

3 Proposed Method

The HMIT architecture is shown in Fig. 1. We describe each component in detail, along with how it is used in learning and inference. For one user, $\{s_{1},s_{2},\ldots ,s_{m}\}$ is the collection of his/her tweets, containing m tweets on various topics. A tweet $s_{i}$ composed of n words is denoted as $s_{i}=\{x_{1}^{(i)},x_{2}^{(i)},\ldots ,x_{n}^{(i)}\}$ with a gold sentiment label $y_{i}\in \{POSITIVE,NEGATIVE\}$.

3.1 Topic Phrases Extraction

To extract a topic phrase for each tweet, we employ the Single-Pass algorithm. The core idea is to process the input texts one at a time and determine the degree of matching between each input text and the existing clusters. A text whose maximum similarity to a cluster core is greater than the given threshold $p_{0}$ is assigned to that cluster; otherwise it starts a new cluster. After all texts are clustered, we take the most frequently occurring bi-gram in each cluster as the topic phrase of that cluster, so that every tweet is associated with a topic phrase $\{TP_{1},TP_{2}\}$.
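The clustering and bi-gram selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the similarity measure (cosine over bag-of-words counts), the definition of the cluster core (running sum of member vectors), the whitespace tokenization and the toy tweets are all assumptions made for the example.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words counts for a lower-cased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(texts, p0=0.4):
    """Assign each text to its most similar cluster, or open a new cluster."""
    clusters = []  # each cluster: {"core": Counter, "members": [text, ...]}
    for text in texts:
        vec = bow(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["core"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > p0:
            best["members"].append(text)
            best["core"].update(vec)  # assumption: core = sum of member vectors
        else:
            clusters.append({"core": Counter(vec), "members": [text]})
    return clusters


def topic_phrase(members):
    """Most frequent bi-gram (TP_1, TP_2) among the texts of one cluster."""
    bigrams = Counter()
    for text in members:
        tokens = text.lower().split()
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams.most_common(1)[0][0] if bigrams else None


# Toy tweets, invented for illustration only.
tweets = [
    "climate change is real",
    "climate change needs action",
    "health care costs rising",
    "health care reform now",
]
clusters = single_pass(tweets, p0=0.4)
phrases = [topic_phrase(c["members"]) for c in clusters]
print(phrases)
```

Because each text is compared only with the clusters that exist when it arrives, the result depends on input order, which is the usual trade-off of a single-pass scheme.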

3.2 Fine-Tuned BERT with Non-sentiment Words

All sentiment words are first removed from tweets according to sentiment lexicons [6]. $\{x_{1}^{(i)},x_{2}^{(i)},\ldots ,x_{n^{\prime }}^{(i)}\}$ represents a tweet without sentiment words and is then passed to the BERT tokenizer. Each sentence is tokenized and padded to length N by inserting padding tokens. The embedding layer of BERT integrates word, position and token type embeddings, where $E_j\in {\mathbb {R}}^K$ is the K-dimensional vector of the j-th word in the tweet. BERT is a multi-layer bidirectional Transformer encoder. For text classification, the output head applies first-token pooling followed by a fully connected layer with softmax activation, returning a probability distribution over the two categories. After fine-tuning BERT on our own dataset, we extract one layer of latent vectors from the encoder of the fine-tuned BERT.
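The preprocessing step can be illustrated with a small sketch. It assumes whitespace tokens, a tiny hypothetical lexicon and a literal `[PAD]` string; the actual pipeline uses the opinion lexicon of [6] and BERT's own tokenizer, whose subword splitting and special tokens are not reproduced here.

```python
def remove_sentiment_words(tokens, lexicon):
    """Drop every token that appears in the sentiment lexicon (case-insensitive)."""
    return [t for t in tokens if t.lower() not in lexicon]


def pad(tokens, n, pad_token="[PAD]"):
    """Pad with pad_token, or truncate, so the result has exactly n tokens."""
    return (tokens + [pad_token] * n)[:n]


# Hypothetical four-word lexicon, standing in for the opinion lexicon of [6].
toy_lexicon = {"love", "great", "terrible", "hate"}

tweet = "I love the new health care plan".split()
stripped = remove_sentiment_words(tweet, toy_lexicon)
padded = pad(stripped, 8)
print(padded)
```

Here `n` plays the role of the maximum sequence length N, which is far larger than 8 in the experiments.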

3.3 Topic-Level Attention

We use a topic-level attention mechanism over a topic phrase to produce a single representation. Since different tokens in a topic phrase may contribute to its semantics differently, we calculate an attention vector for the topic phrase. The hidden outputs corresponding to $\{TP_1^{(i)},TP_2^{(i)}\}$ are denoted as $H^{(i)} = \{h_{TP1}^{(i)}, h_{TP2}^{(i)}\}$. We compute the aggregated representation of a topic phrase as

$$\begin{aligned} H^{(i)\prime }={\alpha }^{(i)\mathsf {T}}H^{(i)}=\sum \limits _{o\in \{1,2\}}{\alpha }_{o}^{(i)}{h_{TPo}^{(i)}} \end{aligned}$$

(1)

where the topic attention vector ${\alpha }^{(i)} = \{{\alpha }_{1}^{(i)},{\alpha }_{2}^{(i)}\}$ is distributed over the topic phrase $H^{(i)}$. The attention vector ${\alpha }^{(i)}$ is a self-attention vector: the hidden outputs of the topic phrase are fed into a two-layer perceptron to produce the attention weights. We concatenate each token representation with the aggregated topic representation $H^{(i)\prime }$ to obtain the final context-aware representation for each word, which therefore has dimension 2K.
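A minimal sketch of this aggregation is given below. The exact form of the perceptron is not specified in the text, so the tanh hidden layer, the scalar output layer, the softmax normalization and the weight names `W1` and `w2` are assumptions chosen for illustration; the vectors are toy values with K = 3.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_attention(H, W1, w2):
    """Return (alpha, aggregated topic vector) for topic-token vectors H."""
    scores = []
    for h in H:
        hidden = [math.tanh(z) for z in matvec(W1, h)]         # first layer (assumed tanh)
        scores.append(sum(a * b for a, b in zip(w2, hidden)))  # second layer -> scalar score
    alpha = softmax(scores)
    dim = len(H[0])
    agg = [sum(alpha[o] * H[o][k] for o in range(len(H))) for k in range(dim)]
    return alpha, agg


def add_topic(token_vectors, agg):
    """Concatenate the aggregated topic vector to every token vector (K -> 2K)."""
    return [h + agg for h in token_vectors]


# Toy values with K = 3; in the model these would be BERT hidden outputs.
H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # h_TP1, h_TP2
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # hypothetical first-layer weights
w2 = [1.0, 1.0]                           # hypothetical second-layer weights
alpha, agg = topic_attention(H, W1, w2)
context = add_topic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], agg)
print(alpha, agg, len(context[0]))
```

With these symmetric toy weights both topic tokens receive the same score, so the aggregated vector is simply their average.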

3.4 CNN Classification

CNNs have attracted increasing attention in text classification tasks owing to their strong ability to capture local contextual dependencies. We therefore apply a CNN as the final classification layer. As shown in Fig. 1, the convolution operation involves kernels of three different sizes. Suppose $w \in {{\mathbb { R}}^{q \times 2K}}$ is a filter spanning q tokens; a feature $c_j$ is generated by:

$$\begin{aligned} {c_j} = f(w \circ h_{j:j + q - 1}^{(i)\prime } + b) \end{aligned}$$

(2)

Here $\circ $ denotes convolution, $b\in \mathbb {R}$ is a bias term and f is the ReLU activation function. The filter is applied to every possible window of q tokens in the sentence to produce a feature map:

$$\begin{aligned} c = [{c_1},{c_2},...,{c_{N - q + 1}}] \in {{\mathbb { R}}^{N - q + 1}} \end{aligned}$$

(3)

The max-pooling layer takes the maximum value $\hat{c} = \max \{ c\} $ of c as the feature corresponding to filter w. ${{\hat{y}}_i}$ denotes the predicted label for the i-th tweet.
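The filter, feature map and pooling step can be traced on a toy input. The sketch below assumes that the convolution is computed, as is common in text CNNs, as the sum of element-wise products between the filter and each window of q token vectors; the numbers are invented, with N = 4 tokens of dimension 2 instead of 2K.

```python
def relu(x):
    """ReLU activation f."""
    return max(0.0, x)


def conv_feature_map(h, w, b=0.0):
    """Slide a q-token filter w over token vectors h; returns N - q + 1 features."""
    n, q = len(h), len(w)
    features = []
    for j in range(n - q + 1):
        s = b
        for r in range(q):
            # element-wise product of filter row r with token j + r, summed
            s += sum(wv * hv for wv, hv in zip(w[r], h[j + r]))
        features.append(relu(s))
    return features


# Toy input: N = 4 token vectors of dimension 2, one filter with q = 2.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
w = [[1.0, 1.0], [1.0, -1.0]]
c = conv_feature_map(h, w, b=0.0)   # feature map of length N - q + 1 = 3
c_hat = max(c)                      # max-pooling over the feature map
print(c, c_hat)
```

Each filter thus contributes one pooled value; a classifier would use the pooled values of all filters together.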

3.5 Inference and Learning

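The objective for training the topic-level attention and the CNN classifier is to minimize the sum of the cross-entropy losses of the predictions over the user's m tweets, with the gold label ${y_i}$ encoded as 1 (POSITIVE) or 0 (NEGATIVE) and ${{\hat{y}}_i}$ taken as the predicted probability of the positive class:

$$\begin{aligned} {{\mathcal {L}}_s} = - \sum \limits _{i = 1}^m \left[ {y_i} \log ({{\hat{y}}_i}) + (1 - {y_i})\log (1 - {{\hat{y}}_i}) \right] \end{aligned}$$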

(4)
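As a worked illustration of the summed cross-entropy objective, the sketch below uses the usual binary form, assuming gold labels encoded as 1 (positive) or 0 (negative) and predictions given as probabilities of the positive class. The clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Summed binary cross-entropy over a user's tweets."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Two tweets, each predicted with probability 0.5: the loss is 2 * ln 2.
loss = cross_entropy([1, 0], [0.5, 0.5])
print(loss)
```

Confident correct predictions drive the loss towards zero, while confident wrong ones are penalized heavily.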

For inference, each test news item is first passed to the fine-tuned BERT to obtain its hidden vectors. Through the topic-level attention mechanism, the context-aware representations are combined with topic information and then fed to the CNN classifier, which outputs the prediction.

4 Experiments

4.1 Experimental Settings

Datasets. Table 1 shows the statistics of the datasets. We select three users from Sentiment140 [9] and build a model for each of them separately, denoted $\mathbb {D}_{s1}$, $\mathbb {D}_{s2}$ and $\mathbb {D}_{s3}$. We also collect tweets from three talkative users and label them manually, denoted $\mathbb {D}_{t1}$, $\mathbb {D}_{t2}$ and $\mathbb {D}_{t3}$. To verify the feasibility of the method in practical applications, we collect 100 news items for each of three topics (health care, climate change and social security), denoted $\mathbb {T}_{h}$, $\mathbb {T}_{c}$ and $\mathbb {T}_{s}$.

Table 1. Dataset statistics

Network Details. For Single-Pass, we set the threshold $p_0$ to 0.4. We fine-tune the pre-trained BERT-base uncased model, which has hidden size K = 768, 12 hidden layers and 12 attention heads. The maximum sequence length N, batch size and learning rate are set to 128, 32 and $5\times 10^{-5}$ respectively. For the CNN classifier, we adopt three filter sizes: 2, 3 and 4. We use 64 filters for each filter size, and the pooling size is set to 4 for all three. We fine-tune BERT for 3 epochs and train the CNN classifier for 20 epochs.
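For reference, the reported settings can be collected in one configuration object. The class and field names below are hypothetical; only the values come from the text. The helper method simply evaluates N - q + 1 for each filter size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HMITConfig:
    """Hyperparameters as reported above; all names are illustrative."""
    single_pass_threshold: float = 0.4   # p_0
    hidden_size: int = 768               # K
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    max_seq_length: int = 128            # N
    batch_size: int = 32
    learning_rate: float = 5e-5
    filter_sizes: tuple = (2, 3, 4)      # q
    filters_per_size: int = 64
    pool_size: int = 4
    bert_epochs: int = 3
    cnn_epochs: int = 20

    def feature_map_lengths(self):
        """Length N - q + 1 of the feature map for each filter size q."""
        return [self.max_seq_length - q + 1 for q in self.filter_sizes]


cfg = HMITConfig()
print(cfg.feature_map_lengths())
```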

Evaluation Metrics. We employ accuracy and F1 score as evaluation metrics for classification. We treat the evaluation on the test sets as a quantification task, which estimates the distribution of tweets across the two classes. We adopt Mean Absolute Error (MAE), computed from a predicted distribution ${\hat{p}}$, the true distribution p and the set $\mathcal {C}$ of classes. It is computed separately for each topic, and the results are averaged across the three topics to yield the final score.
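The formula for the metric is not given in the text. A common definition in quantification work, assumed here, is $MAE = \frac{1}{|\mathcal {C}|}\sum _{c\in \mathcal {C}}|{\hat{p}}(c)-p(c)|$. The sketch below computes it per topic and averages over topics; the distributions are invented for illustration.

```python
def mean_absolute_error(p_true, p_pred):
    """MAE between two class distributions given as {class: proportion} dicts."""
    classes = p_true.keys()
    return sum(abs(p_pred[c] - p_true[c]) for c in classes) / len(classes)


def average_over_topics(pairs):
    """Average the per-topic MAE over a list of (true, predicted) pairs."""
    return sum(mean_absolute_error(p, q) for p, q in pairs) / len(pairs)


# Invented distributions for one topic.
true_dist = {"positive": 0.6, "negative": 0.4}
pred_dist = {"positive": 0.5, "negative": 0.5}
mae = mean_absolute_error(true_dist, pred_dist)
final = average_over_topics([(true_dist, pred_dist), (true_dist, true_dist)])
print(mae, final)
```

Lower values are better, and a perfect estimate of the class proportions gives 0.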

4.2 Models Under Comparison

We compare our proposed method with the methods that have been proposed for sentiment analysis (SA) and target-based sentiment analysis (TBSA).

BERT [3]: BERT achieves state-of-the-art results in sentence classification, including sentiment classification.

mem_absa [11]: Mem_absa adopts a multi-hop attention mechanism over an external memory to capture the importance of each context word with respect to the given target.
IAN [8]: IAN applies attention mechanisms to both the target and the full context. It uses two attention-based LSTMs to interactively capture the keywords of the target and its context.
Cabasc [7]: Cabasc takes into account the correlation between the given target and each context word. It is composed of a sentence-level content attention mechanism and a context attention mechanism.

BERT$^{-s}$, mem_absa$^{-s}$, IAN$^{-s}$ and Cabasc$^{-s}$ are variant models of BERT, mem_absa, IAN and Cabasc respectively, removing sentiment words in training and testing.

4.3 Results and Analysis

Main Results. From Table 2, we observe that HMIT significantly outperforms the other baselines in both classification and quantification tasks on our own dataset, which suggests that the proposed method is effective at capturing the relationship between individual sentiments and different topics and succeeds in estimating sentiment towards potential topics. We also find that BERT performs reasonably well on the validation sets, which confirms its strong ability to represent a whole sentence and its suitability as the first layer of our model. We compare HMIT with one SA model, three TBSA models and their variants. Table 2 shows that TBSA models display little advantage over SA models, which implies that current sentiment classification is mostly decided by sentiment lexicons or opinion words around the target rather than by the target itself. Furthermore, the superior performance of the variants on both datasets indicates that removing sentiment words from tweets enables models to pay more attention to the topic of a tweet and thus to capture the relationship between topics and individual sentiments.

Table 2. Comparison results

Extract Features from BERT. We investigate which encoder layer of BERT is the most appropriate for further modification and classification. We extract features from the third-to-last, penultimate and last encoder layers (-3, -2 and -1) and simply add a CNN classifier on top. In Fig. 2, we report the accuracy and F1 score of cross-validation on $\mathbb {D}_{t2}$. It turns out that the penultimate layer is the most appropriate for making changes or incorporating external information. The last layer is too close to the training target, and earlier layers may not yet have fully captured the semantics. Therefore, we extract the penultimate layer in our method.

5 Conclusion

We have proposed a hierarchical model to estimate individual sentiment towards potential topics. The approach extracts topics automatically and models the relationship between individual sentiments and different topics. It takes tweets without sentiment words as input, first extracts features from fine-tuned BERT and then incorporates topic information into the context-aware token representations through the topic-level attention mechanism. A CNN then classifies the representation as positive or negative. The proposed architecture can potentially be applied to precise individual recommendation or to group sentiment estimation towards a topic.

References

1. Bao, X., Qiao, Q.: Transfer learning from pre-trained BERT for pronoun resolution. In: Proceedings of the First Workshop on Gender Bias in Natural Language Processing, pp. 82–88 (2019)
2. Denil, M., Demiraj, A., De Freitas, N.: Extraction of salient sentences from labelled documents. arXiv preprint arXiv:1412.6815 (2014)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
4. Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., Xu, K.: Adaptive recursive neural network for target-dependent Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 49–54 (2014)
5. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project report, Stanford 1(12), 2009 (2009)
6. Liu, B., Hu, M.: Opinion lexicon (or sentiment lexicon) (2004). https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
7. Liu, Q., Zhang, H., Zeng, Y., Huang, Z., Wu, Z.: Content attention model for aspect based sentiment analysis. In: Proceedings of the 2018 World Wide Web Conference, pp. 1023–1032. International World Wide Web Conferences Steering Committee (2018)
8. Ma, D., Li, S., Zhang, X., Wang, H.: Interactive attention networks for aspect-level sentiment classification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 4068–4074. AAAI Press (2017)
9. Mohammad, S., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of Tweets. In: Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), Atlanta, Georgia, USA, June 2013
10. Tang, D., Qin, B., Feng, X., Liu, T.: Effective LSTMs for target-dependent sentiment classification. In: Proceedings of COLING 2016, The 26th International Conference on Computational Linguistics: Technical Papers, pp. 3298–3307 (2016)
11. Tang, D., Qin, B., Liu, T.: Aspect level sentiment classification with deep memory network. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 214–224 (2016)
12. Wang, J., et al.: Aspect sentiment classification with both word-level and clause-level attention networks. In: IJCAI, pp. 4439–4445 (2018)

Acknowledgment

This research work has been funded by the National Natural Science Foundation of China (Grant No. 61772337) and the National Key Research and Development Program of China (Grant No. 2016QY03D0604).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan RD, Shanghai, 200240, China
Qian Ji, Yilin Dai, Yinghua Ma, Gongshen Liu, Quanhai Zhang & Xiang Lin

Corresponding authors

Correspondence to Gongshen Liu or Quanhai Zhang.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

Cite this paper

Ji, Q., Dai, Y., Ma, Y., Liu, G., Zhang, Q., Lin, X. (2020). Hierarchical Sentiment Estimation Model for Potential Topics of Individual Tweets. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_75

DOI: https://doi.org/10.1007/978-3-030-63820-7_75
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science (R0)
