Hybrid sentiment classification on twitter aspect-based sentiment analysis

Zainuddin, Nurulhuda; Selamat, Ali; Ibrahim, Roliana

doi:10.1007/s10489-017-1098-6

Published: 13 December 2017

Volume 48, pages 1218–1232, (2018)

Applied Intelligence

Abstract

Social media sites and applications, including Facebook, YouTube, Twitter and blogs, have become major attractions today. The huge amount of information from these media has become an attractive resource for organisations to monitor the opinions of users, and it has therefore received a lot of attention in the field of sentiment analysis. Early work on sentiment analysis approached this problem at the document level, where the overall sentiment was identified rather than the details of the sentiment. This research applies aspect-based sentiment analysis to Twitter in order to perform a finer-grained analysis. A new hybrid sentiment classification approach for Twitter is proposed that embeds a feature selection method. A comparison of the classification accuracy obtained with the principal component analysis (PCA), latent semantic analysis (LSA), and random projection (RP) feature selection methods is presented in this paper. Furthermore, the hybrid sentiment classification was validated using Twitter datasets representing different domains, and the evaluation with different classification algorithms also demonstrated that the new hybrid approach produced meaningful results. The implementations showed that the new hybrid sentiment classification was able to improve the accuracy performance over the existing baseline sentiment classification methods by 76.55%, 71.62% and 74.24%, respectively.

1 Introduction

The evolution of the social media web and the opportunity to access the opinions of different people on various business, political, health and social issues have motivated the development of sentiment analysis as a dynamic and important research field [11]. According to [10], document-level sentiment analysis is the simplest form of sentiment analysis, in which the document is assumed to contain an opinion on one main object expressed by its author. A document may nevertheless contain multiple opinions about the same entity. Sentence-level sentiment analysis, on the other hand, is a more detailed form of sentiment analysis that aims to obtain a finer view of the different opinions expressed in the document about the entities. Before the polarity of a sentence can be analysed, however, the sentence must be classified as subjective or objective; only the subjective sentences are then analysed further to determine whether they are positive, negative or neutral [10].

Sentiment analysis (SA) of texts at the document or sentence level is often insufficient for applications because it does not identify a specific opinion target; in other words, the sentiment is not attached to a target entity. An entity normally has many aspects, and people have different opinions about each of them. For instance, when people talk about a product (entity), they may consider many of its aspects, such as its price, colour and weight. Thus, if each document is assumed to represent a single entity, a positive opinion in a document about the entity does not mean that people have positive opinions about all aspects of the entity. Likewise, a negative opinion in the document does not mean that people are negative about all aspects of the entity [21]. A complete analysis must therefore determine the possible aspects and identify whether the sentiment about each aspect is positive or negative. A full aspect-based sentiment analysis (ABSA) model is needed to extract sentiments in detail, especially for reviews of products such as cameras and smartphones, and of specific brands such as Apple, Samsung and Google. Discussion forums, where people post reviews and share their experiences of using products, are another relevant source.

Aspect-based sentiment analysis (ABSA) performs a finer-grained analysis; it is defined as the research problem of identifying the sentiment expressions towards aspects of the target within a given document [10]. More specifically, ABSA aims to identify the aspects of entities in the document and, for each identified aspect, to estimate the sentiment polarity using a specific approach [7]. Most significantly, sentiment analysis at the document and sentence levels does not determine exactly what people liked or did not like.
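
The difference between document-level and aspect-level output can be illustrated with a minimal sketch. It assumes a hypothetical toy opinion lexicon, aspects given as single lower-case tokens, and a simple window rule (an opinion word counts towards an aspect if it lies within `window` tokens of it); none of these is the method used in this paper.

```python
# Hypothetical toy opinion lexicon (+1 positive, -1 negative); a real system
# would draw on a resource such as SentiWordNet.
OPINION_LEXICON = {"great": 1, "good": 1, "love": 1,
                   "bad": -1, "poor": -1, "terrible": -1}

def aspect_polarities(tokens, aspects, window=2):
    """Assign a polarity to each aspect from opinion words that occur within
    `window` tokens of any occurrence of that aspect."""
    lowered = [t.lower() for t in tokens]
    result = {}
    for aspect in aspects:
        score = 0
        for i, tok in enumerate(lowered):
            if tok != aspect:
                continue
            lo, hi = max(0, i - window), min(len(lowered), i + window + 1)
            score += sum(OPINION_LEXICON.get(w, 0) for w in lowered[lo:hi])
        if score > 0:
            result[aspect] = "positive"
        elif score < 0:
            result[aspect] = "negative"
        else:
            result[aspect] = "neutral"
    return result

tweet = "The camera is great but the battery is terrible".split()
print(aspect_polarities(tweet, ["camera", "battery"]))
# {'camera': 'positive', 'battery': 'negative'}
```

A single document-level score for this tweet would combine the two opinions, whereas the aspect-level output keeps them apart. Widening the window to three tokens lets both `great` and `terrible` reach `battery`, which then comes out neutral; this sensitivity is one motivation for using dependency relations rather than fixed windows.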

Aspect-based sentiment analysis (ABSA) consists of two main tasks: aspect-based feature extraction and aspect-based sentiment classification [22]. Aspect-based feature extraction is the task of identifying the main aspects of entities in a specific domain. Most research on aspect-based feature extraction has concentrated on nouns and noun phrases/groups [17–19, 23]. Other approaches have used a phrase dependency parser to take into account the relationship between aspects and opinions, or machine learning techniques, such as conditional random fields (CRF), to find explicit aspects.
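
The noun-based strategy can be sketched as grouping runs of consecutive noun tokens into single- and multi-word aspect candidates. The sketch assumes the tweet has already been POS-tagged with Penn Treebank tags (for example by an external tagger); the hand-tagged input below is illustrative only.

```python
def noun_phrase_candidates(tagged):
    """Group maximal runs of noun tokens (tags NN, NNS, NNP, NNPS) into
    single- or multi-word aspect candidates, lower-cased."""
    candidates, current = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word.lower())
        elif current:
            candidates.append(" ".join(current))
            current = []
    if current:  # flush a noun run at the end of the tweet
        candidates.append(" ".join(current))
    return candidates

# Illustrative, hand-tagged tweet.
tagged = [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("of", "IN"),
          ("this", "DT"), ("phone", "NN"), ("is", "VBZ"), ("poor", "JJ")]
print(noun_phrase_candidates(tagged))  # ['battery life', 'phone']
```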

Other researchers proposed an unsupervised model consisting of methods for learning multi-word aspects from product reviews [6]. The model considers the influence of an opinion word on detecting the aspect by employing a set of heuristic rules. Furthermore, the researchers proposed a new measure based on mutual information and aspect frequency to score aspects with a recently developed iterative bootstrapping algorithm. Not all detected aspects are useful, and some are incorrect; the model therefore uses aspect pruning to remove these incorrect aspects. Another study [20] proposed a pattern-based bootstrapping algorithm to extract candidate product features, followed by feature clustering to group the features into aspects. However, it did not handle other types of features, such as adjectives and verbs, and did not consider implicit features. In 2013, the authors of [25] proposed two novel models, APSM and ME-APSM, to extract aspects and aspect-specific polarity-aware sentiments from online reviews. However, their results showed that the models still needed improvement in aspect-level sentiment classification.
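
The general idea of scoring candidate aspects with mutual information and frequency, then pruning low scorers, can be sketched as follows. The exact measure of [6] is not given here, so the score below (frequency multiplied by the pointwise mutual information of a two-word candidate) and the pruning threshold are assumptions for illustration; the bootstrapping step is omitted.

```python
import math
from collections import Counter

def score_aspects(docs, candidates, min_score=1.0):
    """Score two-word candidates as freq * PMI(w1, w2) and prune those below
    min_score. (Hypothetical scoring rule, not the measure from [6].)"""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    kept = {}
    for cand in candidates:
        w1, w2 = cand.split()
        freq = bigrams[(w1, w2)]
        if freq == 0:
            continue  # candidate never occurs as a contiguous bigram
        p_joint = freq / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        score = freq * math.log2(p_joint / p_indep)  # frequency-weighted PMI
        if score >= min_score:  # pruning step
            kept[cand] = score
    return kept

docs = [["battery", "life", "is", "short"],
        ["battery", "life", "rocks"],
        ["the", "life", "is", "good"]]
print(score_aspects(docs, ["battery life", "the life"], min_score=3.0))
```

Here `battery life` (frequency 2) survives the threshold while `the life` (frequency 1, same PMI) is pruned, which shows how frequency and association strength combine.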

The aim of our research is to propose a new hybrid sentiment classification approach that uses Twitter attributes as features to improve the performance of Twitter aspect-based sentiment analysis. The hybrid model was validated using three Twitter datasets, namely, the HCTS dataset [36], the STS dataset [12], and the Sanders Twitter Corpus (STC) dataset [29]. Our hybrid sentiment classification model combines a rule-based method with feature selection methods for Twitter sentiment classification. The results were compared with a baseline classification method, namely the support vector machine (SVM). The main contributions of our research are as follows:

(i)
We examined whether association rule mining (ARM) augmented with heuristic combinations of POS patterns is beneficial for detecting single and multi-word aspects in Twitter aspect-based sentiment analysis, a problem that has been studied extensively.
(ii)
We proposed a new hybrid sentiment classification approach for Twitter aspect-based sentiment analysis that combines a rule-based method with feature selection methods; principal component analysis (PCA), latent semantic analysis (LSA), and random projection (RP) were evaluated during the experiments.
(iii)
The proposed dataset, named Hate Crime Twitter Sentiment (HCTS), was released for evaluation by other researchers.
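
Contribution (i) relies on association rule mining. Its core step, finding itemsets that occur in at least a minimum fraction of transactions, can be sketched with a level-wise Apriori search. Treating each tweet's noun candidates as one transaction and the support threshold in the example are assumptions; the heuristic POS-pattern combinations are not reproduced here.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {frozenset(items): support} for every itemset
    whose support (fraction of transactions containing it) >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    result, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        k += 1
        # Candidate k-itemsets: unions of frequent (k-1)-itemsets, kept only
        # if every (k-1)-subset is frequent (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        current = list(candidates)
    return result

# Hypothetical noun candidates extracted from four tweets.
tweets = [["battery", "screen"], ["battery", "screen", "price"],
          ["battery"], ["screen", "price"]]
for itemset, support in frequent_itemsets(tweets, min_support=0.5).items():
    print(sorted(itemset), support)
```

Frequent single items correspond to single-word aspects, and frequent itemsets of two or more nouns are candidates for multi-word aspects, which would still need to be checked for word order and adjacency.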

The remainder of the paper is organised as follows: Section 2 describes related work on sentiment analysis, Section 3 explains the proposed hybrid sentiment classification framework, and Section 4 presents the experimental results and discussion. Finally, the last section presents the conclusions and future work.

2 Related works on sentiment analysis

In recent years, a large number of techniques and enhancements have been proposed for the problem of sentiment analysis (SA) in different fields and for different tasks. Three types of techniques are used to classify opinions in SA, namely, lexicon-based approaches, machine learning approaches and hybrid approaches.

Lexicon-based approaches do not require any training data; they use sentiment lexicons such as WordNet [24] and SentiWordNet [5] for classification. These approaches assign sentiment scores ranging from −1 to 1, but they do not handle context-dependent opinion words appropriately. There are also hybrid approaches that combine machine learning and lexicon-based approaches [4].
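
A lexicon-based score in the range −1 to 1 can be obtained, for instance, by averaging the prior scores of the opinion words found in a text. The lexicon below is a hypothetical toy stand-in for a resource such as SentiWordNet, and averaging is only one of several possible aggregation rules.

```python
# Hypothetical prior polarity scores in [-1, 1].
TOY_LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.7, "awful": -0.9}

def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Mean prior score of the opinion words present; a mean of values in
    [-1, 1] is itself in [-1, 1]. Returns 0.0 when no opinion word is found."""
    hits = [lexicon[t.lower()] for t in tokens if t.lower() in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

print(lexicon_score("great phone but bad battery".split()))  # about 0.1
```

Because each word carries one fixed score regardless of context, the sketch exhibits the limitation noted above: a context-dependent opinion word would need different scores in different domains, which a static lexicon cannot express.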

Machine learning approaches, in contrast, can be grouped into two main categories: supervised and unsupervised techniques. Support vector machine (SVM) and naive Bayes (NB) classifiers are examples of supervised sentiment analysis techniques that have achieved considerable success in text classification [26, 27, 33]. A primary concern with supervised approaches is that they depend on large training datasets, which can be time-consuming to collect for each domain. Supervised methods also depend on selecting and extracting an appropriate set of features for detecting sentiment. For instance, unigrams, bigrams and part-of-speech tags are used as features [26, 31]. Feature selection enhances sentiment classification by combining syntactic features with semantic information from sources such as SentiWordNet [1]. Feature selection methods, such as principal component analysis (PCA) and random projection (RP), aim to eliminate irrelevant and redundant features, which improves the classification accuracy of machine learning techniques and reduces data dimensionality [28]. Various feature selection approaches, such as information gain and the chi-square test, have been employed to achieve higher accuracy in sentiment analysis [35], but many of these studies focused mainly on document-level sentiment analysis.
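
Random projection reduces dimensionality by multiplying the feature matrix by a random matrix. A minimal sketch using the sparse construction of Achlioptas follows; the vocabulary size, number of output dimensions, seed and toy vectors are illustrative assumptions, and scikit-learn's `SparseRandomProjection` provides a library implementation.

```python
import math
import random

def achlioptas_matrix(n_features, n_components, seed=0):
    """Sparse random matrix with entries sqrt(3/k) * {+1, 0, -1} drawn with
    probabilities {1/6, 2/3, 1/6}, where k = n_components."""
    rng = random.Random(seed)
    s = math.sqrt(3.0 / n_components)
    values, weights = (s, 0.0, -s), (1, 4, 1)
    return [[rng.choices(values, weights)[0] for _ in range(n_components)]
            for _ in range(n_features)]

def project(rows, matrix):
    """Multiply each d-dimensional row by the d x k matrix, giving k values."""
    k = len(matrix[0])
    return [[sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(k)]
            for row in rows]

# Toy bag-of-words vectors for three tweets over a 12-term vocabulary.
X = [[1, 0, 2, 0, 0, 1, 0, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 0, 1],
     [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]]
R = achlioptas_matrix(n_features=12, n_components=4)
Z = project(X, R)  # 3 tweets x 4 projected features
```

Unlike PCA, the projection matrix does not depend on the data, so it is cheap to build; about two thirds of its entries are zero, which also makes the multiplication cheap.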

The authors of [3] presented an efficient method of feature selection and ensemble learning for aspect-based sentiment analysis. The algorithm is based on single-objective particle swarm optimisation (PSO) and three base learning algorithms, namely CRF, SVM and maximum entropy (ME). Using the SVM+PCA method, an accuracy of 74.51% was obtained on the laptop and restaurant review datasets. The aim of the present work is to investigate the use of hybrid sentiment classification to enhance the performance of Twitter aspect-based sentiment classification.

3 Proposed hybrid sentiment classification framework

Figure 1 shows the Twitter aspect-based hybrid sentiment classification framework, which is designed to obtain highly effective results for hybrid sentiment classification. The framework performs sentiment classification in four main phases: (i) data collection using Twitter datasets; (ii) Twitter pre-processing, which includes filtering of unique Twitter attributes; (iii) aspect-based feature extraction, which identifies single and multi-word explicit and implicit aspects using association rule mining (ARM) with heuristic combinations of POS patterns and the Stanford dependency parser (SDP); and (iv) aspect-based hybrid sentiment classification, which consists of a rule-based approach for sentiment word detection, principal component analysis (PCA) for sentiment word feature selection and, finally, sentiment classification using a support vector machine (SVM).
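
The framework applies PCA to the sentiment-word features before SVM classification. A dependency-free sketch of PCA via power iteration with deflation is shown below; it assumes rows are tweets and columns are sentiment-word counts (a hypothetical representation), and in practice a library routine such as scikit-learn's `PCA` would normally be used instead.

```python
import math
import random

def pca(X, n_components=1, iters=500, seed=0):
    """Top principal components of the rows of X by power iteration with
    deflation. Returns (components, projected_rows)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # centre columns
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]  # positive random start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next one can be found.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in Xc]
    return components, projected

# Hypothetical sentiment-word counts: 4 tweets x 2 features.
comps, Z = pca([[2, 0], [-2, 0], [0, 1], [0, -1]], n_components=2)
```

The projected rows `Z` would then be passed to the classifier in place of the original counts; keeping fewer components than features is what reduces the dimensionality.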

3.1 Data collection

Table 1 shows the Hate Crime Twitter Sentiment (HCTS) dataset [36]. This dataset covers different categories of hate crime, including racial, religion, sexual, feminist, disability and nationality. Table 2 shows the Stanford Twitter Sentiment (STS) dataset [12]; we consider only 353 positive and negative tweets from its different categories. Table 3 shows the Sanders Twitter Corpus (STC) dataset [29], which can be downloaded from http://www.sananalytics.com/lab/twittersentiment/. This dataset consists of four categories (Apple, Google, Microsoft and Twitter) and has 5513 manually classified positive, negative, neutral and irrelevant tweets. In these experiments, we used only the 519 positive and 572 negative tweets; the neutral and irrelevant tweets, which were not needed for the classification, were ignored [37]. These datasets were chosen because they have been used by other researchers as evaluation datasets for Twitter sentiment analysis [29]. Using datasets from these different domains demonstrates that our proposed approach is domain-independent in detecting explicit and implicit aspects for Twitter sentiment analysis and attains highly effective results.
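
The label filtering described for the STC dataset can be sketched as below. The record layout (a `label` field with the values `positive`, `negative`, `neutral` and `irrelevant`) and the example tweets are hypothetical; the real corpus files may use a different format.

```python
from collections import Counter

POLAR_LABELS = {"positive", "negative"}

def keep_polar(records):
    """Drop neutral and irrelevant tweets, keeping only polar ones."""
    return [r for r in records if r["label"] in POLAR_LABELS]

# Hypothetical toy records, one per STC topic.
toy_corpus = [
    {"topic": "apple", "label": "positive", "text": "love the new screen"},
    {"topic": "google", "label": "neutral", "text": "event starts at 10"},
    {"topic": "microsoft", "label": "negative", "text": "update broke my laptop"},
    {"topic": "twitter", "label": "irrelevant", "text": "buenos dias"},
]
polar = keep_polar(toy_corpus)
print(Counter(r["label"] for r in polar))
```

Applied to the full STC corpus, this step corresponds to keeping the 519 positive and 572 negative tweets reported above.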

Table 1 Hate Crime Twitter Sentiment (HCTS) Dataset