Abstract
Opinion Mining (OM) of Internet reviews is one of the key issues in Natural Language Processing (NLP) field. This paper proposes a stacked Bi-LSTM aspect opinion extraction model in which semantic and syntactic features are both integrated. The model takes embedded vector which is composed by word embedding, POS tags and dependency relations as its input while taking label sequence as its output. The experimental results show the effectiveness of this structural features embedded stacked Bi-LSTM model on cross-domain and cross-language datasets, and indicate that this model outperforms the state-of-the-art methods.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
With the evolution of the Internet, OM has become one of the most vigorous research areas in NLP field. An aspect is a concept in which the opinion is expressed in the given text [1]. Aspect specific OM task can be divided into four main subtasks: aspect extraction, opinion extraction, sentiment analysis and opinion summarization [2]. This paper focuses on the second subtasks: opinion extraction. In this paper, we propose a hierarchical model based on stacked Bi-LSTM using both semantic information and syntactic information as input to extract aspect specific opinions.
Internet reviews OM can be carried out from three directions: document-level OM [3], sentence-level OM and aspect-level OM. Aspect-level OM is to extract both the aspects and the corresponding opinion expressions in sentences [4]. The extraction of opinion towards its corresponding aspect is a core task in Aspect-level OM. In recent years, the neural network has reached remarkable effect in NLP. Pang et al. [5] committed a survey of the current deep models used to handle text sequence issues. Socher et al. [6] proposed the recursive neural tensor network and represent phrases by distributed vectors. RNN [7] and its variants such as LSTM [8] and GRU [9] stood out from various deep learning methods. Huang et al. [10] proposed a bidirectional LSTM-CRF model for sequence labeling, and on this basis, Ma et al. [11] joined the CNNs in the model to encode character-level information of a word into its character-level representation. Du et al. [12] proposed an attention mechanism based RNN model which contains two bidirectional LSTM layers to label sequences so that to extract opinion phrases. Nevertheless, the neural networks’ performance drops rapidly when the models solely depend on neural embedding as input [11].
2 SBLSTM Model
2.1 SBLSTM Model Structure
We model aspect opinion extraction as a sequence labeling. The input of the model includes embedded vector, POS tags and dependency relations. The output is the corresponding label sequence of the input text sequence. We use a stacked Bi-LSTM between the input layer and output layer. Opinion expressions extraction has often been treated as a sequence labeling task. This kind of method usually uses the conventional B-I-O tagging scheme.
The basic idea of LSTM is to present each sequence forwards and backwards to two separate hidden states to capture past and future information. Then the two hidden states are concatenated to form the final output. The bidirectional variant of one unit’s hidden state’s update at time step t is as following.
\( h_{t} = \left[ {\overrightarrow {{h_{t} }} ,\overleftarrow {{h_{t} }} } \right] \) can be regarded as an intermediate representation containing the information from both directions to predict the label of the current input \( {\text{x}}_{\text{t}} \).
Stacked RNNs is stacked by k(k ≥ 2) RNN networks. The first RNN receives the word embedding sequences as its input and the last RNN forms the abstract vector representation of the input sequence which is used to predict the final labels. Suppose the output of \( {\mathbf{j}}^{{{\mathbf{th}}}} \) RNN on time-step t is \( {\mathbf{h}}_{{\mathbf{t}}}^{{\mathbf{j}}} \), the stacked RNNs can be formulated as following.
The function g in (3) can be replaced by any RNN transition functions. We expect to capture the important opinion elements. Therefore, we choose the 2 layer Stacked-BiLSTM network as the basic model, and adds attention mechanism to it. In this attention model, the second BLSTM’s input \( {\text{i}}_{t}^{2} \) at time t can be expressed as:
where \( h_{s}^{1} \) is the output vector of the first BLSTM at time s, \( \alpha_{ts}^{ } \) is the weight of the output vector sequence \( [h_{1}^{1} ,h_{2}^{1} ,h_{3}^{1} , \ldots ,h_{T}^{1} ] \), the product of which is the input of the second BLSTM at the time t. The weight \( \alpha_{ts}^{ } \) is calculated as follows:
where \( {\text{W}}^{1} \) and \( {\text{W}}^{2} \) are the parameter matrixs that update in the model training process. b is the bias vector. e and \( e_{ts}^{ } \) has the same dimension and also update with the above adjustable parameters in the model training process.
Figure 1 demonstrates a stacked Bi-LSTM model consisting two Bi-LSTMs with an attention layer. The input is distributed word vectors of texts while the output is a series of B-I-O tags predicted from the network. In order to make the stacked RNNs to be extended easily, we use stacked bidirectional LSTMs with depth of 2 as our basic model in this paper.
2.2 Features in SBLSTM Model
In SBLSTM model, the features used is as following
-
Word embeddings. The word embedding is a kind of distributed vector which contains the semantic information.
-
POS tags. We use Stanford Tagger to obtain the POS tags.
-
Syntactic tree. Here we particularly apply the syntactic information, dependency tree in the model. Figure 2 displays the dependency tree for a movie review.
The syntactic representation of one word is defined as its m (m ≥ 0) children in a dependency tree, where m denotes the window size to limit the amount of the dependency relations of one word for the learning models. Introducing the window size could prevent excessive usage on VRAM.
Finally, the three type’s features will be concatenated as the input vector and fed to the SBLSTM model. Figure 3 shows the final features composition of one word.
3 Experiments Design and Analysis
3.1 Datasets
For now, there are no available benchmark datasets that mark phrase boundaries of aspect specific expressions. Therefore, two manually constructed datasets are used in our experiment. Mukherjee constructed an annotated corpus which considers 1, 2, and 3-star product reviews from Amazon in English. The other dataset consists of online reviews of three Chinese movies Mr. Six, The Witness and Chongqing Hotpot collected from Douban, Mtime and Sina microblog. The movie reviews dataset is manually annotated.
The statistics information of these two datasets are showed in Table 1. Figures 4 and 5 display the sentences length distributions of the two datasets.
3.2 Experimental Setting
In experiments, we use Stanford parser to obtain the syntactic information. The SBLSM models is implemented in python 2.7 and we use the Keras framework to construct the deep neural networks. The input length is limited to be 60 in LSTMs’ models and the amount of LSTM input units is 60 while the number of the output units is 64. The word embeddings dimension is set to be 100. The window size in extracting dependency features is set to be 4 initially. In training process, the ratio between training set and validation set is 4:1. The activation function chosen is softmax function and the batch size to train the model is 256.
3.3 Quantitative Analysis
Evaluation Metrics.
Precision, recall and F1 score are commonly used to evaluate the performance of OM models. In OM task, the boundaries of opinion expressions are hard to define. Therefore, we use proportional overlap as a soft measure to evaluate the performance.
Model Comparative Analysis.
To illustrate the performance boost of our SBLSTM model, we firstly compare our model with some baseline methods on both two datasets. Since we use stacked bidirectional LSTM with depth of 2 as the core model, we choose the LSTM network and the bidirectional LSTM as the baselines. Furthermore, we also compare our model with the CRF model and rules based method which depends on the dependency tree.
As shown in Table 2, we reported the accuracy, the precision, the recall and the F1-score across all single runs for each approach. We could found that the proposed SBLSTM model outperforms the baseline methods in terms of accuracy, recall and F1. Bi-LSTM outperforms all of the others in terms of precision in the movie dataset, and compared with the CRF model, which achieves the highest precision in the product dataset, our proposed model also provides a comparable precision. Another observation is for both datasets, Bi-LSTM outperforms the normal LSTM model with absolute gains of 4.73% and 4.87% in terms of F1 score.
Feature Comparative Analysis.
In training process, the batch size is set to be 256 and the epoch number is set to be 30. Table 3 shows the comparison of experimental results using different feature sets.
Our proposed methods which introduces all of the three types feature performs best in term of accuracy, recall and F1 score. Refer to the third line and the fourth line of Table 3, adding word embeddings into the feature set makes the performance of the model improved in a similar way. This indicates that both word embedding and POS tags have some help in extracting the aspect specific opinion expressions. Particularly, we can observe that the recall measure and the F1-score are improved by 20% and 10% respectively when the dependency relations have been added into features, providing the evidence that syntactic information does play an important role in extracting the opinions.
Window Size Analysis.
We conduct a series of experiments with different window sizes to compare and analyze the impact of the children amount in dependency tree on model performance on movie dataset. The batch size in training process is set to be 256 and each trial was carried out in 300 epochs. Table 4 shows the comparison of the predictive performance of the proposed stacked Bi-LSTM models with both the semantic features and the syntactic features.
From Table 4, we found that F1-score increase with the growth of the window size in general, tending to be stable when the window length is greater than 4.
3.4 Qualitative Analysis
To explore the contribution of this paper, we conducted a qualitative analysis experiment on five chinese movie comments below and the aim aspect is FengXiaogang.
The experiment uses rules-based model, CRF model, LSTM network and Bi-LSTM network as baseline methods. The aspect specific opinion extraction results of different methods are shown in Table 5. The green words are annotated opinion expression which we want models to extract, and the red words refer to those words haven’t been extracted by the model while these blue ones are those words not in annotation.
The dependency rule based method is more effective when the sentence is short and simple. When here comes a complex sentence, it is impossible to obtain more information when the comment contains a demonstrative pronoun. Most importantly, no matter the length of the sentence, our model can extract the opinion information well.
4 Conclusions
In this paper, we proposed a method to embed syntactic information into the deep neural models. Experimental results on two domains and different languages data sets showed that the proposed stacked bidirectional LSTM model outperform all of the baseline methods, proofing that the syntactic information did play a significant role in correctly locating the aspect-specific opinion expressions.
References
Poria, S., Cambria, E., Gelbukh, A.: Aspect extraction for opinion mining with a deep convolutional neural network. Knowl.-Based Syst. 108, 42–49 (2016)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), pp. 168–177. ACM, New York (2004)
Valakunde, N.D., Patwardhan, M.S.: Multi-aspect and multi-class based document sentiment analysis of educational data catering accreditation process. In: International Conference on Cloud & Ubiquitous Computing & Emerging Technologies (CUBE 2013), pp. 188–192. IEEE Computer Society, Washington (2013)
Singh, V.K., Piryani, R., Uddin, A., et al.: Sentiment analysis of movie reviews: a new feature-based heuristic for aspect-level sentiment classification. In: International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4 s 2013), pp. 712–717. IEEE Computer Society, Washington (2013)
Pang, L., Lan, Y.Y., Xu, J., et al.: A survey on deep text matching. Chin. J. Comput. 40(04), 985–1003 (2017). (in Chinese with English abstract)
Socher, R., Perelygin, A., Wu, J.Y., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pp. 1631–1642. ACL, Stroudsburg, PA (2013)
Goller, C., Kuchler, A.: Learning task-dependent distributed representations by backpropagation through structure. In: IEEE International Conference on Neural Networks, vol. 1, pp. 347–352. IEEE (2002)
Hochreiter, S., Jurgen, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
Cho, K., Merrienboer, B.V., Bahdana, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111 (2014)
Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. Computing Research Repository, abs/1508.01991 (2015)
Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF (2016)
Du, J., Gui, L., Xu, R.: Extracting opinion expression with neural attention. In: Li, Y., Xiang, G., Lin, H., Wang, M. (eds.) SMP 2016. CCIS, vol. 669, pp. 151–161. Springer, Singapore (2016). https://doi.org/10.1007/978-981-10-2993-6_13
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Han, Z., Jiang, X., Li, M., Zhang, M., Duan, D. (2018). An Integrated Semantic-Syntactic SBLSTM Model for Aspect Specific Opinion Extraction. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds) Web Information Systems and Applications. WISA 2018. Lecture Notes in Computer Science(), vol 11242. Springer, Cham. https://doi.org/10.1007/978-3-030-02934-0_18
Download citation
DOI: https://doi.org/10.1007/978-3-030-02934-0_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02933-3
Online ISBN: 978-3-030-02934-0
eBook Packages: Computer ScienceComputer Science (R0)