
1 Introduction

With the evolution of the Internet, opinion mining (OM) has become one of the most vigorous research areas in the NLP field. An aspect is a concept about which an opinion is expressed in a given text [1]. The aspect-specific OM task can be divided into four main subtasks: aspect extraction, opinion extraction, sentiment analysis, and opinion summarization [2]. This paper focuses on the second subtask: opinion extraction. We propose a hierarchical model based on a stacked Bi-LSTM that uses both semantic and syntactic information as input to extract aspect-specific opinions.

OM on Internet reviews can be carried out at three levels: document-level OM [3], sentence-level OM, and aspect-level OM. Aspect-level OM extracts both the aspects and the corresponding opinion expressions in sentences [4]. Extracting the opinion expression for its corresponding aspect is a core task in aspect-level OM. In recent years, neural networks have achieved remarkable results in NLP. Pang et al. [5] conducted a survey of the deep models used to handle text sequence problems. Socher et al. [6] proposed the recursive neural tensor network and represented phrases by distributed vectors. RNNs [7] and their variants such as LSTM [8] and GRU [9] stand out among the various deep learning methods. Huang et al. [10] proposed a bidirectional LSTM-CRF model for sequence labeling, and on this basis, Ma et al. [11] added CNNs to the model to encode character-level information of a word into its character-level representation. Du et al. [12] proposed an attention-based RNN model containing two bidirectional LSTM layers to label sequences and thus extract opinion phrases. Nevertheless, the performance of neural networks drops rapidly when the models depend solely on neural embeddings as input [11].

2 SBLSTM Model

2.1 SBLSTM Model Structure

We model aspect-specific opinion extraction as a sequence labeling task. The input of the model includes word embedding vectors, POS tags, and dependency relations; the output is the label sequence corresponding to the input text sequence. We place a stacked Bi-LSTM between the input layer and the output layer. Opinion expression extraction has often been treated as a sequence labeling task, and this kind of method usually uses the conventional B-I-O tagging scheme, illustrated below.
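As a concrete illustration of the B-I-O scheme, the short Python snippet below tags a hypothetical English review sentence; the sentence, the target aspect, and the marked opinion expression are invented for illustration only.

```python
# Hypothetical B-I-O tagging example for aspect-specific opinion extraction.
# Target aspect: "acting"; opinion expression: "surprisingly natural".
sentence = ["The", "acting", "was", "surprisingly", "natural", "."]
tags     = ["O",   "O",      "O",   "B",            "I",       "O"]

# B = beginning of an opinion expression, I = inside one, O = outside.
for token, tag in zip(sentence, tags):
    print("{:<14s}{}".format(token, tag))
```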

The basic idea of the bidirectional LSTM is to present each sequence forwards and backwards to two separate hidden states that capture past and future information, respectively. The two hidden states are then concatenated to form the final output. The hidden state update of one bidirectional unit at time step t is as follows:

$$ \overrightarrow{h_t} = \overrightarrow{g}\left(\overrightarrow{h_{t-1}}, x_t\right), \qquad \overrightarrow{h_0} = 0 $$
(1)
$$ \overleftarrow{h_t} = \overleftarrow{g}\left(\overleftarrow{h_{t+1}}, x_t\right), \qquad \overleftarrow{h_T} = 0 $$
(2)

\( h_t = \left[\overrightarrow{h_t}, \overleftarrow{h_t}\right] \) can be regarded as an intermediate representation containing the information from both directions, used to predict the label of the current input \( x_t \).

A stacked RNN is built by stacking k (k ≥ 2) RNN layers. The first RNN receives the word embedding sequence as its input, and the last RNN forms the abstract vector representation of the input sequence that is used to predict the final labels. Let \( h_t^j \) be the output of the j-th RNN at time step t; the stacked RNN can then be formulated as follows:

$$ h_t^j = \begin{cases} g\left(h_{t-1}^{j}, x_t\right) & j = 0 \\ g\left(h_{t-1}^{j}, h_t^{j-1}\right) & \text{otherwise} \end{cases} $$
(3)

The function g in (3) can be replaced by any RNN transition function. Since we expect the model to focus on the important opinion elements, we choose a 2-layer stacked Bi-LSTM network as the basic model and add an attention mechanism to it. In this attention model, the second Bi-LSTM's input \( i_t^2 \) at time t can be expressed as:

$$ i_t^2 = \sum\nolimits_{s=1}^{T} \alpha_{ts}\, h_s^1 $$
(4)

where \( h_s^1 \) is the output vector of the first Bi-LSTM at time s and \( \alpha_{ts} \) is the attention weight over the output vector sequence \( [h_1^1, h_2^1, h_3^1, \ldots, h_T^1] \); their weighted sum is the input of the second Bi-LSTM at time t. The weight \( \alpha_{ts} \) is calculated as follows:

$$ e_{ts} = \tanh\left(W^1 h_s^1 + W^2 h_{t-1}^2 + b\right) $$
(5)
$$ \alpha_{ts} = \frac{\exp\left(e_{ts}^{T} e\right)}{\sum\nolimits_{k=1}^{T} \exp\left(e_{tk}^{T} e\right)} $$
(6)

where \( W^1 \) and \( W^2 \) are parameter matrices updated during model training and b is the bias vector. The vector e has the same dimension as \( e_{ts} \) and is also updated together with the above parameters during training.
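To make Eqs. (4)-(6) concrete, the following is a minimal NumPy sketch of the attention step that forms the second Bi-LSTM's input from the first layer's outputs; the dimensions and random parameters are illustrative assumptions, not the configuration used in our experiments.

```python
import numpy as np

def attention_input(H1, h2_prev, W1, W2, b, e):
    """Compute the second Bi-LSTM's input i_t^2 following Eqs. (4)-(6).

    H1      : (T, d1)  outputs of the first Bi-LSTM, [h_1^1, ..., h_T^1]
    h2_prev : (d2,)    previous hidden state of the second Bi-LSTM, h_{t-1}^2
    W1, W2, b, e       trainable parameters (W1: (d_a, d1), W2: (d_a, d2),
                       b and e: (d_a,)); e has the same dimension as e_ts
    """
    # Eq. (5): e_ts = tanh(W1 h_s^1 + W2 h_{t-1}^2 + b), computed for every s
    E = np.tanh(np.dot(H1, W1.T) + np.dot(W2, h2_prev) + b)   # (T, d_a)
    # Eq. (6): softmax over the scores e_ts^T e
    scores = np.dot(E, e)                                      # (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                       # weights a_ts
    # Eq. (4): attention-weighted sum of the first layer's outputs
    return np.dot(alpha, H1)                                   # (d1,)

# Toy usage with random values (shapes only, no trained parameters)
T, d1, d2, d_a = 5, 8, 8, 6
rng = np.random.RandomState(0)
i_t2 = attention_input(rng.randn(T, d1), rng.randn(d2),
                       rng.randn(d_a, d1), rng.randn(d_a, d2),
                       rng.randn(d_a), rng.randn(d_a))
print(i_t2.shape)  # (8,)
```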

Figure 1 shows a stacked Bi-LSTM model consisting of two Bi-LSTMs with an attention layer. The input is the sequence of distributed word vectors of the text, while the output is the series of B-I-O tags predicted by the network. To keep the stacked RNN easy to extend, we use a stacked bidirectional LSTM with a depth of 2 as our basic model in this paper.

Fig. 1. A stacked bidirectional LSTM network
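For reference, the following is a minimal Keras sketch of the basic 2-layer stacked Bi-LSTM tagger (the attention connection between the two layers, which needs a custom layer, is omitted); the vocabulary size, tag set, and hidden sizes are placeholders rather than our exact configuration.

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

MAX_LEN, VOCAB_SIZE, EMB_DIM, N_TAGS = 60, 20000, 100, 3   # placeholder sizes

model = Sequential()
# Map token indices to distributed word vectors
model.add(Embedding(VOCAB_SIZE, EMB_DIM, input_length=MAX_LEN, mask_zero=True))
# First Bi-LSTM: returns the full output sequence [h_1^1, ..., h_T^1]
model.add(Bidirectional(LSTM(64, return_sequences=True)))
# Second Bi-LSTM stacked on top of the first layer's outputs
model.add(Bidirectional(LSTM(64, return_sequences=True)))
# Per-token softmax over the B-I-O tag set
model.add(TimeDistributed(Dense(N_TAGS, activation='softmax')))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```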

2.2 Features in SBLSTM Model

The features used in the SBLSTM model are as follows:

  • Word embeddings. A word embedding is a distributed vector that encodes semantic information.

  • POS tags. We use the Stanford Tagger to obtain the POS tags.

  • Syntactic tree. Here we specifically use the dependency tree as the syntactic information in the model. Figure 2 displays the dependency tree of a movie review.

    Fig. 2. Dependency tree of an example context

The syntactic representation of one word is defined by its m (m ≥ 0) children in the dependency tree, where m is a window size that limits the number of dependency relations per word fed to the learning models. Introducing the window size prevents excessive GPU memory usage.

Finally, the three types of features are concatenated into the input vector and fed to the SBLSTM model. Figure 3 shows the final feature composition of one word, and a small sketch of this composition follows the figure.

Fig. 3. Input composition of one word
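As a rough sketch of this composition, the snippet below builds the input vector for a single word by concatenating its word embedding, a one-hot POS vector, and the embeddings of at most m dependency children, padding with zero vectors when a word has fewer children; the names and dimensions are illustrative assumptions, not our exact setup.

```python
import numpy as np

EMB_DIM, N_POS, WINDOW = 100, 4, 4   # illustrative sizes; WINDOW plays the role of m

def word_input_vector(word_emb, pos_id, child_embs):
    """Concatenate word embedding + one-hot POS tag + up to WINDOW children."""
    pos_onehot = np.zeros(N_POS)
    pos_onehot[pos_id] = 1.0
    # Keep at most WINDOW dependency children; pad the rest with zero vectors
    children = list(child_embs)[:WINDOW]
    children += [np.zeros(EMB_DIM)] * (WINDOW - len(children))
    return np.concatenate([word_emb, pos_onehot] + children)

# Toy usage: a word with two dependency children
rng = np.random.RandomState(1)
vec = word_input_vector(rng.randn(EMB_DIM), pos_id=2,
                        child_embs=[rng.randn(EMB_DIM), rng.randn(EMB_DIM)])
print(vec.shape)   # (100 + 4 + 4 * 100,) = (504,)
```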

3 Experiments Design and Analysis

3.1 Datasets

So far, there are no publicly available benchmark datasets that mark the phrase boundaries of aspect-specific opinion expressions. Therefore, two manually constructed datasets are used in our experiments. Mukherjee constructed an annotated English corpus of 1-, 2-, and 3-star product reviews from Amazon. The other dataset consists of online reviews of three Chinese movies, Mr. Six, The Witness, and Chongqing Hotpot, collected from Douban, Mtime, and Sina Weibo. The movie review dataset was manually annotated.

The statistics of these two datasets are shown in Table 1. Figures 4 and 5 display the sentence length distributions of the two datasets.

Table 1. Statistics of the datasets
Fig. 4. Sentence length distribution of the movie dataset

Fig. 5. Sentence length distribution of the product dataset

3.2 Experimental Setting

In the experiments, we use the Stanford parser to obtain the syntactic information. The SBLSTM model is implemented in Python 2.7, and we use the Keras framework to construct the deep neural networks. The input length of the LSTM models is limited to 60, so the number of LSTM input units is 60, while the number of output units is 64. The word embedding dimension is set to 100. The window size for extracting dependency features is initially set to 4. During training, the ratio between the training set and the validation set is 4:1. The output activation function is the softmax function and the batch size used to train the model is 256.
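A minimal sketch of the corresponding data preparation and training call is shown below, assuming the stacked Bi-LSTM `model` from the sketch in Sect. 2.1; the toy token and tag sequences are placeholders for the encoded review corpora.

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

MAX_LEN, N_TAGS, BATCH_SIZE, EPOCHS = 60, 3, 256, 30

# Placeholder data: in practice these come from the annotated review corpora
token_ids = [[12, 5, 87, 3], [44, 9, 2, 61, 7]]        # word-index sequences
tag_ids   = [[0, 0, 1, 2],   [0, 1, 2, 0, 0]]          # B/I/O label indices

# Pad (or truncate) every sequence to the fixed input length of 60
X = pad_sequences(token_ids, maxlen=MAX_LEN, padding='post')
y = to_categorical(pad_sequences(tag_ids, maxlen=MAX_LEN, padding='post'),
                   num_classes=N_TAGS)                  # one-hot tag targets

# 4:1 train/validation split, batch size 256, 30 epochs
# (`model` is the stacked Bi-LSTM sketched in Sect. 2.1)
# model.fit(X, y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.2)
```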

3.3 Quantitative Analysis

Evaluation Metrics.

Precision, recall, and F1 score are commonly used to evaluate the performance of OM models. In the OM task, the boundaries of opinion expressions are hard to define exactly, so we use proportional overlap as a soft measure to evaluate performance, as sketched below.
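Since the exact formulation is not spelled out above, the sketch below shows one common way to compute proportional-overlap (soft) precision, recall, and F1 over token spans; treat it as an illustrative assumption rather than the definitive metric implementation.

```python
def _coverage(span, others):
    """Fraction of `span`'s tokens that fall inside any span in `others`."""
    start, end = span                      # token indices, end exclusive
    covered = sum(1 for t in range(start, end)
                  if any(s <= t < e for s, e in others))
    return covered / float(end - start)

def proportional_overlap(pred_spans, gold_spans):
    """Soft precision / recall / F1 over opinion-expression spans."""
    precision = (sum(_coverage(p, gold_spans) for p in pred_spans)
                 / len(pred_spans)) if pred_spans else 0.0
    recall = (sum(_coverage(g, pred_spans) for g in gold_spans)
              / len(gold_spans)) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: the prediction covers 2 of the 3 gold tokens
print(proportional_overlap(pred_spans=[(3, 5)], gold_spans=[(3, 6)]))
# (1.0, 0.666..., 0.8)
```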

Model Comparative Analysis.

To illustrate the performance gain of our SBLSTM model, we first compare it with several baseline methods on both datasets. Since we use a stacked bidirectional LSTM with a depth of 2 as the core model, we choose the LSTM network and the bidirectional LSTM as baselines. Furthermore, we also compare our model with a CRF model and a rule-based method that relies on the dependency tree.

As shown in Table 2, we report the accuracy, precision, recall, and F1-score across all single runs for each approach. The proposed SBLSTM model outperforms the baseline methods in terms of accuracy, recall, and F1. Bi-LSTM achieves the best precision on the movie dataset, and compared with the CRF model, which achieves the highest precision on the product dataset, our proposed model still provides comparable precision. Another observation is that, for both datasets, Bi-LSTM outperforms the plain LSTM model with absolute gains of 4.73% and 4.87% in F1 score.

Table 2. Results of our proposed model against baseline methods

Feature Comparative Analysis.

In the training process, the batch size is set to 256 and the number of epochs is set to 30. Table 3 shows the comparison of experimental results using different feature sets.

Table 3. Comparison of model performance using different features

Our proposed method, which uses all three types of features, performs best in terms of accuracy, recall, and F1 score. Comparing the third and fourth lines of Table 3, adding word embeddings to the feature set improves the performance of the model in a similar way. This indicates that both word embeddings and POS tags help in extracting the aspect-specific opinion expressions. In particular, the recall and the F1-score improve by 20% and 10%, respectively, when the dependency relations are added to the features, providing evidence that syntactic information does play an important role in extracting the opinions.

Window Size Analysis.

We conduct a series of experiments with different window sizes on the movie dataset to analyze the impact of the number of dependency-tree children on model performance. The batch size in training is set to 256 and each trial is run for 300 epochs. Table 4 compares the predictive performance of the proposed stacked Bi-LSTM models using both the semantic and the syntactic features.

Table 4. Performance of different window sizes

From Table 4, we find that the F1-score generally increases with the window size and tends to be stable when the window size is greater than 4.

3.4 Qualitative Analysis

To explore the contribution of this paper, we conducted a qualitative analysis on five Chinese movie comments, with Feng Xiaogang as the target aspect.

The experiment uses the rule-based model, the CRF model, the LSTM network, and the Bi-LSTM network as baseline methods. The aspect-specific opinion extraction results of the different methods are shown in Table 5. Green words are the annotated opinion expressions that we want the models to extract, red words are annotated words that the model failed to extract, and blue words are extracted words that are not in the annotation.

Table 5. Aspect specific opinion extraction results of different methods

The dependency-rule-based method is more effective when the sentence is short and simple. For a complex sentence, however, it cannot recover further information when the comment contains a demonstrative pronoun. Most importantly, regardless of sentence length, our model extracts the opinion information well.

4 Conclusions

In this paper, we proposed a method to embed syntactic information into deep neural models. Experimental results on datasets from two domains and in different languages showed that the proposed stacked bidirectional LSTM model outperforms all the baseline methods, demonstrating that syntactic information plays a significant role in correctly locating the aspect-specific opinion expressions.