Abstract
Sentiment analysis is a crucial step in social media data analysis. The majority of research on sentiment analysis focuses on sentiment polarity detection, which identifies whether an input text is positive, negative or neutral. In this paper, we implement a stacked ensemble approach to sentiment polarity detection in Bengali tweets. The basic idea of stacked generalization is to fuse the outputs of first-level base classifiers using a second-level meta-classifier. In our ensemble method, we use two types of base classifiers, multinomial Naïve Bayes and SVM, which make use of a diverse set of features. Our proposed approach shows an improvement over some existing Bengali sentiment analysis approaches reported in the literature.
1 Introduction
Nowadays, a huge amount of social media and user-generated content is available on the Web. A major portion of social media texts such as blog posts, tweets and comments contains opinion-related information. Internet users give opinions in various domains such as politics, cricket, sports, movies and music. This vast amount of online social media text can be collected and mined to derive intelligence useful in many applications across domains such as marketing, politics/political science, policy making, sociology and psychology. A social media user expresses sentiment in the form of an opinion, a subjective impression, or a thought or judgment prompted by feelings [1]; it also includes emotions. Sentiment analysis for detecting sentiment polarity (positive, negative or neutral) in social media texts has been recognized as one of the major research topics, though it is hard to draw a concrete boundary between the two research areas of sentiment analysis and opinion mining.
So, to derive knowledge from the vast amount of social media data, there is a need for an efficient and accurate system which can perform analysis and detect sentiment polarity of the opinions coming from various heterogeneous sources.
The most common approaches to sentiment analysis use various machine learning techniques [2,3,4,5,6,7,8,9], though earlier approaches relied on natural language processing (NLP) and computational linguistics techniques [10,11,12,13] that require in-depth linguistic knowledge. Research on sentiment polarity detection has already been carried out in different genres: blogs [14], discussion boards or forums [15], user reviews [16] and expert reviews [17].
In contrast to the machine learning based approach, the lexicon based approach [18] relies solely on a background knowledge base or sentiment lexicon, which is either a manually constructed lexicon of polarity (positive and negative) terms [19] or a lexicon built by some automatic process [20,21,22,23,24,25]. One such sentiment lexicon [26] is created through quantitative analysis of the glosses associated with synsets retrieved from WordNet [27]. Though a sentiment lexicon plays a crucial role in most sentiment analysis tasks, the main problem with the lexicon based approach is that it is difficult to build and maintain a universal sentiment lexicon.
The main advantage of the machine learning based approach is that it can be ported to any domain quickly by changing the training dataset. Hence, most researchers prefer the supervised machine learning approach to sentiment analysis. A supervised method for sentiment polarity detection trains machine learning algorithms on a corpus of social media texts labeled with sentiment polarity, where each text is turned into a feature vector. The most commonly used features are word n-grams, surrounding words and even punctuation. The machine learning algorithms most commonly used for the sentiment analysis task are Naïve Bayes, SVM, k-nearest neighbors, decision trees and artificial neural networks [18, 28,29,30,31,32,33,34,35,36].
Most previous research on sentiment polarity detection involves analysis of English texts. But due to the multilingual nature of Indian social media texts, there is also a need for systems that can perform sentiment analysis of Indian language texts. Along this line, a shared task on Sentiment Analysis in Indian Languages (SAIL) Tweets was held in conjunction with the MIKE 2015 conference at IIIT Hyderabad, India [37]. Bengali was included in this shared task as one of the major Indian languages; Bengali (Bangla) is also one of the most spoken languages in the world. In recent years, some researchers have attempted to develop sentiment analysis systems for Bengali [35, 38,39,40,41].
In this paper, we present a stacked ensemble approach to sentiment polarity detection in Bengali tweets. The approach first constructs three base models, each making use of a subset of the input features representing the tweets. The tweets are represented by word n-gram, character n-gram and SentiWordNet features; SentiWordNet (Sentiment-WordNet) is an external knowledge base containing a collection of polarity words (discussed in the next section).
Features are grouped into three subsets: (1) word n-gram features plus Sentiment-WordNet features, (2) character n-gram features plus Sentiment-WordNet features, and (3) unigram features plus Sentiment-WordNet features. The first base model is a multinomial Naïve Bayes classifier using subset (1), the second is a multinomial Naïve Bayes classifier using subset (2), and the third is a support vector machine using subset (3). The base-level classifiers’ predictions are combined using a meta-classifier; this process is popularly known as stacking.
2 Proposed Methodology
The proposed system uses a stacked ensemble model for sentiment polarity detection in Bengali tweets. The proposed model has three main steps: (1) data cleaning, (2) feature extraction and base classifiers, and (3) model development and sentiment polarity classification.
2.1 Data Cleaning
At the preprocessing step, the entire data collection is processed to remove irrelevant characters from the data. This is important because tweet data is noisy.
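The paper does not enumerate which characters are removed, so the sketch below assumes a common tweet-cleaning recipe (URLs, @-mentions, the RT marker and stray punctuation); the function name and rules are illustrative, not the authors' exact procedure.

```python
import re

def clean_tweet(text):
    """Remove characters typically irrelevant for polarity detection.

    Assumed rules (the paper does not specify them): drop URLs, @-mentions
    and the RT marker, strip punctuation (keeping # for hashtags), and
    collapse repeated whitespace.  \\w matches Unicode letters in Python 3,
    so Bengali script survives the punctuation filter.
    """
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # @-mentions
    text = re.sub(r"\bRT\b", " ", text)         # retweet marker
    text = re.sub(r"[^\w\s#]", " ", text)       # punctuation, keep hashtags
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace
```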
2.2 Features and Base Classifiers
Our stacked ensemble combines the outputs of base classifiers, each of which makes use of a subset of the features representing the tweets. As mentioned earlier, a tweet is represented by word n-gram, character n-gram and SentiWordNet features. Word n-grams and character n-grams that do not occur at least 3 times in the training data are removed from the tweets as noise.
We have developed three base classifiers at the first level: (1) multinomial naïve Bayes with word n-gram features and Sentiment-WordNet features, (2) multinomial naïve Bayes with character n-gram features and Sentiment-WordNet features, and (3) a linear-kernel support vector machine with unigram (1-gram) features and Sentiment-WordNet features. At the second level, we have used an MLP classifier as the meta-classifier. The overall architecture of our proposed model is shown in Fig. 1.
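This two-level architecture can be sketched with scikit-learn's `StackingClassifier`. The n-gram ranges and the minimum-frequency pruning (`min_df=3`) come from the paper; scikit-learn's MLP has no softplus activation, so the default `relu` stands in here, and all other hyperparameters are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import StackingClassifier

# Base 1: multinomial NB over word 1-3-grams; min_df=3 mirrors the paper's
# "occurs at least 3 times" noise removal.
nb_word = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), min_df=3), MultinomialNB())

# Base 2: multinomial NB over character 2-5-grams.
nb_char = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 5), min_df=3),
    MultinomialNB())

# Base 3: linear-kernel SVM over word unigrams.
svm_uni = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LinearSVC())

# Meta level: a small MLP (the paper uses 2 hidden units with softplus;
# scikit-learn offers no softplus, so relu is a stand-in).
stack = StackingClassifier(
    estimators=[("nb_word", nb_word), ("nb_char", nb_char), ("svm", svm_uni)],
    final_estimator=MLPClassifier(hidden_layer_sizes=(2,), max_iter=2000,
                                  random_state=0),
    cv=5)
```

Calling `stack.fit(tweets, labels)` trains the base classifiers, produces their cross-validated predictions, and fits the meta-classifier on those predictions.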
Base Classifiers
Multinomial Naïve Bayes with word n-gram features and Sentiment-WordNet features. As mentioned earlier, the first base model in our proposed stacked ensemble uses multinomial naïve Bayes [38, 40], and its feature set consists of word n-gram features and Sentiment-WordNet features. For this model, we have taken unigrams, bigrams and trigrams as features (i.e., n = 1, 2, 3). Word n-grams which do not occur at least 3 times in the training data are removed as noise. With word n-grams as features, the sentiment class of a tweet T is determined by the posterior probability of a sentiment class given the sequence of word n-grams in the tweet:

P(C | T) ∝ P(C) · ∏_{i=1}^{m} P(t_i | C)        (1)

where:

C is a sentiment class,

P(C) is the prior probability of class C,

T is a tweet represented as the sequence of word n-grams in the tweet, T = (t1, t2, …, tm),

ti is the i-th word n-gram, and

m is the number of word n-grams (n = 1 to 3) in the tweet, counted with repetition (a word n-gram may occur more than once in a tweet).
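As a concrete illustration, the multinomial Naïve Bayes class score is usually computed in log space. The sketch below assumes Laplace (add-one) smoothing, which the paper does not specify; the function and variable names are illustrative.

```python
import math
from collections import Counter

def mnb_log_posterior(tweet_ngrams, class_ngram_counts, class_prior):
    """Log-space class score: log P(C) plus the sum, over the tweet's m
    word n-grams (repetitions included), of log P(t_i | C).

    class_ngram_counts: Counter of n-gram frequencies observed for class C.
    Laplace (add-one) smoothing is assumed here; the paper does not state
    which smoothing it uses.
    """
    total = sum(class_ngram_counts.values())
    vocab_size = len(class_ngram_counts)
    score = math.log(class_prior)
    for t in tweet_ngrams:  # m terms, repetitions included
        score += math.log((class_ngram_counts[t] + 1)
                          / (total + vocab_size + 1))
    return score
```

The predicted class of a tweet is the one that maximizes this score over the three polarity classes.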
The details of how multinomial Naïve Bayes is applied to the sentiment analysis task can be found in [38]. In addition to word n-gram features, we have also incorporated an external knowledge base called Sentiment-WordNet, from which polarity information is retrieved for the tweet words. Though the polarity of a word does not always depend on its literal meaning, many words are usually used with positive polarity (for example, the word “good”), and the same holds for negative polarity words. Such information can be useful for sentiment polarity detection in tweets. For our work, SentiWordNet for Indian Languages [42] (retrieved from http://amitavadas.com/sentiwordnet.php) has been used. This is a collection of positive, negative and neutral words along with their broad part-of-speech categories. To incorporate Sentiment-WordNet, each word in a tweet of the corpus is augmented with the special word “#P” if the tweet word is found in the list of positive polarity words, “#N” if it is found in the negative list and “#NU” if it is found in the neutral list. For example, a Bengali tweet meaning “This is a very good food” is augmented as “This is a very #P good #P food”, since its words for “very” and “good” occur in the positive list. With this augmentation, the formula for the posterior probability is modified as follows:

P(C | T) ∝ P(C) · ∏_{i=1}^{m} P(t_i | C) · P(#P | C)^{m1} · P(#N | C)^{m2} · P(#NU | C)^{m3}        (2)

where:

m is the number of word n-grams (n = 1 to 3) in the tweet (including repetition),

m1 is the number of tweet words found in the positive word list of Sentiment-WordNet,

m2 is the number of tweet words found in the negative word list, and

m3 is the number of tweet words found in the neutral word list.

From Eq. 2, it is evident that the posterior probability for a tweet is boosted according to how many polarity words it contains. For example, if a tweet contains more positive polarity words than words of the other two types, the overall polarity of the tweet is pushed in the positive direction.
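The augmentation step described above can be sketched as follows. The word lists passed in are illustrative stand-ins for lookups in the Bengali Sentiment-WordNet resource.

```python
def augment_with_polarity(tweet, pos_words, neg_words, neu_words):
    """Append #P/#N/#NU after each tweet word found in the corresponding
    polarity list, producing the augmented tweet used for Eq. 2.
    The word lists are stand-ins for the Sentiment-WordNet lookup.
    """
    out = []
    for w in tweet.split():
        out.append(w)
        if w in pos_words:
            out.append("#P")
        elif w in neg_words:
            out.append("#N")
        elif w in neu_words:
            out.append("#NU")
    return " ".join(out)
```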
Multinomial Naïve Bayes with Character n-gram Features and Sentiment-WordNet Features.
This base classifier is based on the same principle described in the preceding sub-section. The only difference is that it uses a different subset of the input features: character n-grams and Sentiment-WordNet features. Examples of word n-grams and character n-grams are given below:
Example Input text: “khub bhalo cinema” (very good movie).
Word n-grams (for n = 1, 2) are: “khub”, “bhalo”, “cinema”, “khub bhalo”, “bhalo cinema”.
Character n-grams for n = 4 are (with spaces shown as “_”): “khub”, “hub_”, “ub_b”, “b_bh”, “_bha”, “bhal”, “halo”, “alo_”, “lo_c”, “o_ci”, “_cin”, “cine”, “inem”, “nema”.
It is very common that a word occurring in a test tweet is absent from the training data; this is known as the out-of-vocabulary problem. Character n-gram features are useful for dealing with this problem. Character n-grams with n varying from 2 to 5 are used for developing this base model; those that do not occur at least 3 times in the training data are removed as noise.
For this base model, the set of character n-gram features and the Sentiment-WordNet features are used, and the posterior probability for a tweet is calculated using Eq. 2, with the only difference that the variables t1, t2, …, tm in Eq. 2 now refer to the distinct character n-grams in the tweet; that is, the probability of a character n-gram enters the product only once even if the n-gram occurs several times in the tweet.
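The character n-gram extraction and the frequency-based pruning can be sketched as below; function names are illustrative, and n-grams are taken over the raw tweet string, spaces included, matching the example above.

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=5):
    """All character n-grams of text for n in [n_min, n_max], spaces
    included, matching the worked example for "khub bhalo cinema"."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams

def prune_rare(tweets, min_count=3, n_min=2, n_max=5):
    """Keep only n-grams occurring at least min_count times across the
    training tweets, mirroring the paper's noise removal."""
    counts = Counter(g for t in tweets for g in char_ngrams(t, n_min, n_max))
    return {g for g, c in counts.items() if c >= min_count}
```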
Support Vector Machines with Unigram and Sentiment-WordNet Features.
It is well established that support vector machines (SVM) [43] with a linear kernel are useful for text classification due to their inherent capability of dealing with high-dimensional data. So, for the third base classifier, an SVM with a linear kernel has been used. This base model uses yet another subset of tweet features: unigram (word 1-gram) features and Sentiment-WordNet features. Since the tweet words found in Sentiment-WordNet are augmented with one of the pseudo-words “#P”, “#N” and “#NU”, the Sentiment-WordNet features are automatically taken into account when the unigram feature set is computed.
For developing this base model, we did not take all unigrams as features; a subset is used because we observed that increasing the number of unigram features hampers the individual performance of this base model. The 1000 most frequent unigrams per class are therefore taken as features. Thus, in the bag-of-unigrams model, each tweet is represented by a feature vector of length 3000 (1000 per class × 3 classes), where each component is the frequency of the corresponding unigram in the tweet, and each vector is labeled with the label of the corresponding training tweet.
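The per-class frequency-based feature selection and the resulting bag-of-unigrams vector can be sketched as below (k = 1000 in the paper; a small k is used in the test for illustration). The concatenation order across classes is an assumption.

```python
from collections import Counter

def top_unigrams_per_class(tweets, labels, k):
    """Concatenate the k most frequent unigrams of each class, in a fixed
    (sorted-by-label) order, giving at most k * num_classes features."""
    per_class = {}
    for tweet, label in zip(tweets, labels):
        per_class.setdefault(label, Counter()).update(tweet.split())
    feats = []
    for label in sorted(per_class):
        feats.extend(w for w, _ in per_class[label].most_common(k))
    return feats

def to_vector(tweet, feats):
    """Frequency of each selected unigram in the tweet (bag-of-unigrams)."""
    counts = Counter(tweet.split())
    return [counts[f] for f in feats]
```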
2.3 Model Development and Sentiment Classification
As shown in Fig. 1, three base classifiers are used for model development, and their predictions are combined using a meta-classifier; we have used a multilayer perceptron (MLP) neural network as the meta-classifier at the second level. From the training data provided to the model, it learns to classify a tweet into one of three sentiment polarity classes: positive, negative and neutral. The MLP classifier has one hidden layer of 2 nodes with the softplus activation function.
During the testing phase, an unlabeled tweet is presented to the trained model, and the label assigned by the model is taken as the sentiment label of the tweet.
3 Evaluation and Experimental Results
We have used Bengali datasets released for a shared task on Sentiment Analysis in Indian Languages (SAIL) Tweets, held at IIIT Hyderabad, India [37]. The training set consists of 1000 tweets and the test set consists of 500 tweets.
3.1 Experiments and Results
We combined the SAIL training and test data to form a dataset of 1500 tweets, performed 10-fold cross-validation, and computed the average accuracy over the 10 folds for each model presented in this paper; these average accuracies are the results reported below.
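This evaluation protocol can be reproduced with scikit-learn's cross-validation utilities. The default model below is a simple stand-in classifier, not the full stacked ensemble, and the stratification and shuffling choices are assumptions.

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def average_cv_accuracy(tweets, labels, model=None, folds=10):
    """Mean accuracy over stratified k-fold cross-validation, as in the
    paper's protocol.  The default model is a simple unigram NB stand-in,
    not the full stacked ensemble."""
    model = model or make_pipeline(CountVectorizer(), MultinomialNB())
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    scores = cross_val_score(model, tweets, labels, cv=cv, scoring="accuracy")
    return scores.mean()
```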
We have compared our proposed stacked ensemble model with some existing Bengali tweet sentiment analysis systems published in the literature. For a meaningful comparison, we implemented the existing systems that previously used the SAIL 2015 datasets for system development. Brief descriptions of these systems are given below.
- A deep learning model for Bengali tweet sentiment analysis is presented in [41]. It uses a recurrent neural network (LSTM) for model development. This LSTM-based model takes into account the entire sequence of tokens in a tweet while detecting its sentiment polarity. The same tweet augmentation strategy used in our proposed model is also used in this model.
- The sentiment polarity detection model in [38] uses multinomial Naïve Bayes with word unigram, bigram and Sentiment-WordNet features; details can be found in [38].
- The sentiment polarity detection model reported in [38] uses SVM with word unigram and Sentiment-WordNet features; details can be found in [38].
- The sentiment polarity detection model presented in [40] uses character n-gram features and Sentiment-WordNet features; details can be found in [40].
We have compared the results obtained by our proposed stacked ensemble model with the four existing sentiment polarity detection models described above. The comparison is shown in Table 1. It is evident from Table 1 that our proposed stacked ensemble model performs better than the other existing models it is compared to. Since each of the existing models in Table 1 uses a single machine learning algorithm with either word n-gram and Sentiment-WordNet features or character n-gram and Sentiment-WordNet features for Bengali tweet sentiment classification, the results show that combining classifiers with stacking improves performance over the individual classifiers for sentiment polarity detection in Bengali tweets. As Table 1 also shows, our proposed model outperforms the LSTM-based deep learning model presented in [41].
4 Conclusion and Future Work
In this paper, we have described a stacked ensemble model for Bengali tweet sentiment classification. Two multinomial Naïve Bayes models using different subsets of features and a linear-kernel SVM model have been combined in a stacked ensemble with an MLP meta-classifier. We experimented to choose an appropriate meta-classifier, and our experiments reveal that an MLP classifier with softplus activation in the hidden units performs best among the alternatives we considered.
The insufficiency of training data is one of the major problems in developing systems for Bengali tweet sentiment analysis. We also observe that the SAIL 2015 data is not error-free: some tweets were wrongly labeled by the human annotators. However, for a meaningful comparison of systems, we left those errors uncorrected. We expect that system performance can be improved with a larger amount of properly annotated training data, and that our proposed system can be easily ported to other Indian languages such as Hindi and Tamil.
Choosing more appropriate base classifiers and meta-classifiers is another possible way of improving system performance.
References
Bowker, J.: The Oxford Dictionary of World Religions. Oxford University Press, Oxford (1997)
Zhao, J., Liu, K., Wang, G.: Adding redundant features for CRFs-based sentence sentiment classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 117–126. Association for Computational Linguistics (2008)
Joachims, T.: Making large scale SVM learning practical. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (eds.) Advances in Kernel Methods-Support Vector Learning. MIT Press, Cambridge (1999)
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International Conference on World Wide Web, pp. 519–528. ACM (2003)
Mullen, T., Collier, N.: Sentiment analysis using support vector machines with diverse information sources. In: EMNLP, vol. 4, pp. 412–418 (2004)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2(1–2), 1–135 (2008)
Goldberg, A.B., Zhu, X.: Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, pp. 45–52. Association for Computational Linguistics (2006)
Miao, Q., Li, Q., Zeng, D.: Fine grained opinion mining by integrating multiple review sources. J. Am. Soc. Inform. Sci. Technol. 61(11), 2288–2299 (2010)
Riloff, E., Wiebe, J.: Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 105–112. Association for Computational Linguistics (2003)
Prabowo, R., Thelwall, M.: Sentiment analysis: a combined approach. J. Inf. 3(2), 143–157 (2009)
Narayanan, R., Liu, B., Choudhary, A.: Sentiment analysis of conditional sentences. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 180–189. Association for Computational Linguistics (2009)
Wiegand, M., Balahur, A., Roth, B., Klakow, D., Montoyo, A.: A survey on the role of negation in sentiment analysis. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 60–68. Association for Computational Linguistics (2010)
Ku, L.-W., Liang, Y.T., Chen, H-H.: Opinion extraction, summarization and tracking in news and blog corpora. In: AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (2006)
Kim, J., Chern, G., Feng, D., Shaw, E., Hovy, E.: Mining and assessing discussions on the web through speech act analysis. In: Proceedings of the Workshop on Web Content Mining with Human Language Technologies at the 5th International Semantic Web Conference (2006)
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271 (2004)
Zhu, F., Zhang, X.: Impact of online consumer reviews on sales: the moderating role of product and consumer characteristics. J. Mark. 74(2), 133–148 (2010)
Melville, P., Gryc, W., Lawrence, R.D.: Sentiment analysis of blogs by combining lexical knowledge with text classification. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1275–1284. ACM (2009)
Ramakrishnan, G., Jadhav, A., Joshi, A., Chakrabarti, S., Bhattacharyya, P.: Question answering via Bayesian inference on lexical relations. In: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering, vol. 12, pp. 1–10. Association for Computational Linguistics (2003)
Jiao, J., Zhou, Y.: Sentiment Polarity Analysis based multi-dictionary. Phys. Procedia 22, 590–596 (2011)
Macdonald, C., Ounis, I.: The TREC Blogs06 collection: creating and analysing a blog test collection. Department of Computer Science, University of Glasgow, Technical Report TR-2006-224 (2006)
Hatzivassiloglou, V., McKeown, K.R.: Predicting the semantic orientation of adjectives. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, pp. 174–181. Association for Computational Linguistics, July 1997
Wiebe, J.: Learning subjective adjectives from corpora. In: AAAI/IAAI, pp. 735–740, July 2000
Yu, H., Hatzivassiloglou, V.: Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 129–136. Association for Computational Linguistics, July 2003
Riloff, E., Wiebe, J.: Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pp. 105–112. Association for Computational Linguistics, July 2003
Esuli, A., Sebastiani, F.: SENTIWORDNET: a publicly available lexical resource for opinion mining. In: Proceedings of LREC, vol. 6, pp. 417–422, May 2006
Fellbaum, C.: WordNet. Blackwell Publishing Ltd., Hoboken (1999)
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL (2004)
Chen, C.C., Tseng, Y.D.: Quality evaluation of product reviews using an information quality framework. Decis. Support Syst. 50(4), 755–768 (2011)
Kang, H., Yoo, S.J., Han, D.: Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Syst. Appl. 39(5), 6000–6010 (2012)
Clarke, D., Lane, P., Hender, P.: Developing robust models for favourability analysis. In: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, pp. 44–52. Association for Computational Linguistics (2011)
Reyes, A., Rosso, P.: Making objective decisions from subjective data: detecting irony in customer reviews. Decis. Support Syst. 53(4), 754–760 (2012)
Moraes, R., Valiati, J.F., Neto, W.P.G.: Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst. Appl. 40(2), 621–633 (2013)
Martín-Valdivia, M.T., Martínez-Cámara, E., Perea-Ortega, J.M., Ureña-López, L.A.: Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Syst. Appl. 40(10), 3934–3942 (2013)
Sarkar, K., Chakraborty, S.: A sentiment analysis system for Indian Language Tweets. In: Prasath, R., Vuppala, A.K., Kathirvalavakumar, T. (eds.) MIKE 2015. LNCS (LNAI), vol. 9468, pp. 694–702. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26832-3_66
Li, Y.M., Li, T.Y.: Deriving market intelligence from microblogs. Decis. Support Syst. 55(1), 206–217 (2013)
Patra, B.G., Das, D., Das, A., Prasath, R.: Shared task on Sentiment Analysis in Indian Languages (SAIL) tweets - an overview. In: Prasath, R., Vuppala, A.K., Kathirvalavakumar, T. (eds.) MIKE 2015. LNCS (LNAI), vol. 9468, pp. 650–655. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26832-3_61
Sarkar, K., Bhowmik, M.: Sentiment polarity detection in bengali tweets using multinomial Naïve Bayes and support vector machines. In: CALCON 2017, Kolkata. IEEE (2017)
Sarkar, K.: Sentiment polarity detection in Bengali tweets using deep convolutional neural networks. J. Intell. Syst. 28(3), 377–386 (2018). https://doi.org/10.1515/jisys-2017-0418. Accessed 7 July 2019
Sarkar, K.: Using character N gram features and multinomial Naïve Bayes for sentiment polarity detection in Bengali tweets. In: Proceedings of Fifth International Conference on Emerging Applications of Information Technology (EAIT), Kolkata. IEEE (2018)
Sarkar, K.: Sentiment polarity detection in Bengali tweets using LSTM recurrent neural networks. In: Proceedings of Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Sikkim, India, 25–28 February 2019. IEEE (2019)
Das, A., Bandyopadhyay, S.: SentiWordNet for Indian languages. In: Proceedings of 8th Workshop on Asian Language Resources (COLING 2010), Beijing, China, pp. 56–63 (2010)
Vapnik, V.: Estimation of Dependences Based on Empirical Data, vol. 40. Springer-Verlag, New York (1982). https://doi.org/10.1007/0-387-34239-7
Acknowledgments
This research work has received support from the project entitled “Indian Social Media Sensor: an Indian Social Media Text Mining System for Topic Detection, Topic Sentiment Analysis and Opinion Summarization” funded by the Department of Science and Technology, Government of India under the SERB scheme.
© 2020 Springer Nature Switzerland AG
Cite this paper
Sarkar, K. (2020). A Stacked Ensemble Approach to Bengali Sentiment Analysis. In: Tiwary, U., Chaudhury, S. (eds) Intelligent Human Computer Interaction. IHCI 2019. Lecture Notes in Computer Science(), vol 11886. Springer, Cham. https://doi.org/10.1007/978-3-030-44689-5_10
Print ISBN: 978-3-030-44688-8
Online ISBN: 978-3-030-44689-5