1 Introduction

The evolution of the social media web and the opportunity to access the opinions of different people on business, political, health and social issues have made sentiment analysis a dynamic and important research field [11]. According to [10], document-level sentiment analysis is the simplest form of sentiment analysis: it assumes that the document expresses its author's opinion on one main object. A document may even contain multiple opinions about the same entity. Sentence-level sentiment analysis, on the other hand, is a finer-grained form that aims to give a more detailed view of the different opinions expressed in the document about the entities. Before the polarity of the sentences can be analysed, however, it is necessary to determine whether each sentence is subjective or objective. Only the subjective sentences are then analysed further to determine whether they are positive, negative or neutral [10].

Sentiment analysis (SA) of texts at the document or sentence level is often insufficient for applications because it does not identify an opinion target; in other words, the sentiment is not attached to a target entity. An entity normally has many aspects, and people may hold different opinions about each of them. For instance, when people talk about a product (entity), they may consider many of its aspects, such as its price, colour and weight. Thus, even if each document is assumed to concern a single entity, a positive opinion about the entity does not imply that people are positive about all of its aspects; likewise, a negative opinion does not imply that they are negative about all of its aspects [21]. A complete analysis must therefore identify the possible aspects and determine whether the sentiment towards each aspect is positive or negative. A full aspect-based sentiment analysis (ABSA) model is needed to extract sentiments in this detail, especially from reviews of products such as cameras and smartphones, or of specific brands such as Apple, Samsung and Google. Discussion forums, where people review products and share their experiences of using them, are another relevant source.

Aspect-based sentiment analysis (ABSA) performs a finer-grained analysis; it is defined as the research problem of identifying the sentiment expressed towards aspects of the target within a given document [10]. ABSA identifies the aspects of the entities in a document and, for each identified aspect, estimates the sentiment polarity using a specific approach [7]. Most significantly, document-level and sentence-level sentiment analyses do not determine exactly what people liked or disliked.

An aspect-based sentiment analysis (ABSA) consists of two main tasks: aspect-based feature extraction and aspect-based sentiment classification [22]. Aspect-based feature extraction is the task of identifying the main aspects of entities in a specific domain. Most research on aspect-based feature extraction has concentrated on nouns and noun phrases/groups [17, 18, 19, 23]. Other approaches have used a phrase dependency parser to take into account the relationship between aspects and opinions, or machine learning techniques, such as conditional random fields (CRF), to find explicit aspects.

Other researchers proposed an unsupervised model with methods for learning multi-word aspects from product reviews [6]. The model uses a set of heuristic rules to take into account the influence of an opinion word on aspect detection. The researchers also proposed a new measure, based on mutual information and aspect frequency, to score aspects within a recently developed iterative bootstrapping algorithm. Because not all detected aspects are useful, and some are incorrect, the model applies aspect pruning to remove the incorrect ones. Another study [20] proposed a pattern-based bootstrapping algorithm to extract candidate product features, followed by feature clustering to group the features into aspects. However, it did not handle other types of features, such as adjectives and verbs, and did not consider implicit features. In 2013, the authors of [25] proposed two novel models, APSM and ME-APSM, to extract aspects and aspect-specific polarity-aware sentiments from online reviews; however, their results showed that aspect-level sentiment classification still needed improvement.

The aim of our research is to propose a new hybrid sentiment classification approach that uses Twitter attributes as features to improve the performance of Twitter aspect-based sentiment analysis. The hybrid model was validated on three Twitter datasets, namely the HCTS dataset [36], the STS dataset [12], and the Sanders Twitter Corpus (STC) dataset [29]. Our hybrid model combines rule-based and feature selection methods for Twitter sentiment classification. The results were compared with a baseline classification method, i.e. the support vector machine. The main contributions of our research are as follows:

  (i) We examined whether association rule mining (ARM) augmented with heuristic combination POS patterns is beneficial for detecting single and multi-word aspects in Twitter aspect-based sentiment analysis, which has been studied extensively.

  (ii) We proposed a new hybrid sentiment classification approach for Twitter aspect-based sentiment analysis that combines rule-based and feature selection methods; principal component analysis (PCA), latent semantic analysis (LSA) and random projection (RP) were evaluated as feature selection methods in the experiments.

  (iii) We released the proposed dataset, named Hate Crime Twitter Sentiment (HCTS), for evaluation by other researchers.

The remainder of the paper is organised as follows: Section 2 describes related work on sentiment analysis, Section 3 explains the proposed hybrid sentiment classification framework, and Section 4 presents the experimental results and discussion. Finally, Section 5 presents the conclusions and future work.

2 Related works on sentiment analysis

In recent years, a large number of techniques and enhancements have been proposed for the problem of sentiment analysis (SA) in different fields and for different tasks. Three types of techniques are used to classify opinions in SA, namely, lexicon-based approaches, machine learning approaches and hybrid approaches.

Lexicon-based approaches do not require a training dataset; they use sentiment lexicons such as WordNet [24] and SentiWordNet [5] for classification. These approaches assign sentiment scores ranging from −1 to 1, but they do not handle context-dependent opinion words appropriately. There are also hybrid approaches that combine machine learning and lexicon-based approaches [4].
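To make the scoring idea concrete, the sketch below averages word polarities taken from a lexicon, so the score stays within [−1, 1] and no training data is needed. The lexicon `TOY_LEXICON`, its values and the helper names are invented for illustration; they are not SentiWordNet scores or part of any cited system. Because the average ignores context, "not good" still scores as positive, which illustrates the limitation noted above.

```python
# Hypothetical toy lexicon: word -> polarity in [-1, 1] (illustrative values only).
TOY_LEXICON = {
    "good": 0.7, "love": 0.8, "blessed": 0.6,
    "hate": -0.8, "pathetic": -0.7, "disgrace": -0.9,
}


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Mean polarity of the tokens found in the lexicon; 0.0 if none match.

    Every entry lies in [-1, 1], so the mean also lies in [-1, 1].
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def lexicon_label(tokens, lexicon=TOY_LEXICON):
    """Map the sign of the score to a polarity label."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_score(["so", "many", "pathetic", "people"]))  # -0.7
print(lexicon_label(["not", "good"]))  # positive: the negation is ignored
```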

Machine learning approaches, in contrast, can be grouped into two main categories: supervised and unsupervised techniques. Support Vector Machines (SVM) and Naive Bayes (NB) classifiers are examples of supervised sentiment analysis techniques that have been highly successful in text classification [26, 27, 33]. A primary concern with supervised approaches is their dependence on large training datasets, which can be time-consuming to collect for each domain. Supervised methods also depend on selecting and extracting an appropriate set of features for detecting sentiment; for instance, unigrams, bigrams and part-of-speech tags are used as features [26, 31]. Feature selection can enhance sentiment classification by combining syntactic features with semantic information from sources such as SentiWordNet [1]. Feature selection methods, such as Principal Component Analysis (PCA) and Random Projection (RP), aim to eliminate irrelevant and redundant features, improving the classification accuracy of machine learning techniques while also reducing data dimensionality [28]. Various feature selection approaches, such as information gain and the chi-square test, have been employed to obtain higher accuracy in sentiment analysis [35], but many of these studies focused mainly on document-level sentiment analysis.
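As a minimal sketch of the unigram, bigram and part-of-speech features mentioned above, the function below builds a bag-of-features count vector for one tokenised tweet. The function name, the `_` joiner for bigrams and the `POS=` prefix for tag features are illustrative choices, and the tags are assumed to come from an external POS tagger.

```python
from collections import Counter


def ngram_features(tokens, tags=None, n_values=(1, 2)):
    """Count unigram/bigram features and, optionally, POS-tag features.

    tokens: list of words; tags: optional list of POS tags aligned with tokens.
    """
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats["_".join(tokens[i:i + n])] += 1  # e.g. "pathetic_people"
    for tag in tags or []:
        feats["POS=" + tag] += 1  # e.g. "POS=JJ"
    return feats


feats = ngram_features(["so", "many", "pathetic", "people"],
                       tags=["RB", "JJ", "JJ", "NNS"])
print(feats["pathetic_people"], feats["POS=JJ"])  # 1 2
```

A feature selection step would then operate on vectors built from these counts.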

The authors of [3] presented an efficient method of feature selection and ensemble learning for aspect-based sentiment analysis. The algorithm is based on single-objective particle swarm optimisation (PSO) and three base learning algorithms, namely CRF, SVM and maximum entropy (ME). Using the SVM+PCA method, an accuracy of 74.51% was obtained on the laptop and restaurant review datasets. The aim of our work is to investigate the use of hybrid sentiment classification to enhance the performance of Twitter aspect-based sentiment classification.

3 Proposed hybrid sentiment classification framework

Figure 1 shows the Twitter aspect-based hybrid sentiment classification framework. The framework performs sentiment classification in four main phases: (1) data collection from Twitter datasets; (2) Twitter pre-processing, which includes filtering out the unique Twitter attributes; (3) aspect-based feature extraction, which identifies single and multi-word explicit and implicit aspects using association rule mining (ARM) with heuristic combination POS patterns and the Stanford dependency parser (SDP); and (4) aspect-based hybrid sentiment classification, which consists of a rule-based approach for sentiment word detection, principal component analysis (PCA) for sentiment word feature selection and, lastly, sentiment classification using a support vector machine (SVM).

Fig. 1

Twitter aspect-based hybrid sentiment classification framework

3.1 Data collection

Table 1 shows the Hate Crime Twitter Sentiment (HCTS) dataset [36]. This dataset covers different categories of hate crime, including racial, religion, sexual, feminist, disability and nationality. Table 2 shows the Stanford Twitter Sentiment (STS) dataset [12]; we considered only its 353 negative and positive tweets from the different categories. Table 3 shows the Sanders Twitter Corpus (STC) dataset [29], which can be downloaded from http://www.sananalytics.com/lab/twittersentiment/. This dataset covers four categories (Apple, Google, Microsoft and Twitter) and contains 5513 manually classified positive, negative, neutral and irrelevant tweets. In our experiments, we used only 519 positive and 572 negative tweets; the neutral and irrelevant tweets, which were not needed for the classification, were ignored [37]. These datasets were chosen because other researchers have used them as evaluation datasets for Twitter sentiment analysis [29]. Because they come from different domains, using them allows us to show that the proposed approach detects explicit and implicit aspects in Twitter sentiment analysis in a domain-independent way.

Table 1 Hate Crime Twitter Sentiment (HCTS) Dataset
Table 2 Stanford Twitter Sentiment (STS) Dataset
Table 3 Sanders Twitter Corpus (STC) Dataset

3.2 Text pre-processing

The tweets had to be pre-processed prior to classification because the language of Twitter has some unique attributes, such as usernames, links and hashtags, that may not be relevant to the classification process. In addition, filtering removed new lines, opposite emoticons, repeated letters, laughter and punctuation marks. Finally, tokenization, stop-word removal, lowercase conversion and stemming were carried out to complete the Twitter pre-processing.
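A minimal sketch of these steps follows, using only regular expressions. It is not the authors' exact pipeline: the stop-word list is a tiny illustrative subset, hashtags are dropped entirely, repeated letters are cut to two, and emoticon filtering and stemming are omitted (a stemmer such as Porter's would normally be applied last).

```python
import re

# Tiny illustrative stop-word list (assumption; real systems use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "me", "my", "you"}


def preprocess(tweet):
    """Filter Twitter-specific attributes, then tokenise and drop stop words."""
    text = tweet.replace("\n", " ")                          # new lines
    text = re.sub(r"@\w+", " ", text)                        # usernames
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)       # links
    text = re.sub(r"#\w+", " ", text)                        # hashtags
    text = text.lower()                                      # lowercase conversion
    text = re.sub(r"\b(?:ha){2,}h?\b|\blo+l\b", " ", text)   # laughter: haha, lol
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)           # "sooooo" -> "soo"
    text = re.sub(r"[^\w\s]", " ", text)                     # punctuation
    tokens = text.split()                                    # tokenisation
    return [t for t in tokens if t not in STOP_WORDS]        # stop-word removal


print(preprocess("@bob Sooooo happy!!! http://t.co/x #blessed hahaha"))
# ['soo', 'happy']
```

Restricting the repeated-letter rule to `[a-z]` keeps numbers such as "1000" intact.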

3.3 Aspect-based feature extraction

This task finds the explicit single and multi-word aspects using association rule mining (ARM) with heuristic combination POS patterns. In addition, the Stanford Dependency Parser (SDP) method, which takes into account the relationship between opinions and aspects, was employed to extract implicit aspects. These elements are discussed in the following subsections.

3.3.1 Single and multi-word explicit aspect extraction using Association Rule Mining (ARM)

Association rule mining is used to find the important aspects of a given target. Association rules are created by analysing the data for frequent ‘if/then’ patterns; the support and confidence criteria are then used to identify the most important relationships. Support indicates how frequently the items appear in the database, whilst confidence indicates how often the ‘if/then’ statement has been found to be true.

The association rule mining is stated as follows:

Let $I = \{i_1, \ldots, i_n\}$ be a set of items, and let $D$ be a set of transactions (the dataset), where each transaction is a subset of the items in $I$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule has support $s$ in $D$ if $s\%$ of the transactions in $D$ contain $X \cup Y$. The problem of association rule mining is to generate all association rules in $D$ whose support and confidence are greater than a user-specified minimum support and minimum confidence.
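The two definitions can be checked on a toy example. The sketch below expresses support as a fraction rather than a percentage, and computes confidence as supp(X ∪ Y)/supp(X), which is equivalent to the definition above. The transactions (one set of candidate aspect words per tweet) are invented for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y: share of transactions containing X that also contain Y."""
    sx = support(x, transactions)
    return support(set(x) | set(y), transactions) / sx if sx else 0.0


# Hypothetical transactions: candidate aspect words per tweet.
tweets = [{"anti", "muslim", "hate"}, {"anti", "muslim"}, {"anti", "white"}, {"feminist"}]
print(support({"anti", "muslim"}, tweets))  # 0.5
# 2 of the 3 tweets containing "anti" also contain "muslim":
print(confidence({"anti"}, {"muslim"}, tweets))
```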

We used association rule mining (ARM) to find the important single and multi-word aspects for the given target entities [13]. In our case, an aspect was defined as important if it appeared in more than 1% (the minimum support) of the sentences. The association rule mining in this experiment is based on the Apriori algorithm, which finds the frequent (important) aspects in a set of transactions that satisfy a user-specified minimum support.

Different values of the minimum support and minimum confidence were tried in the experiments; the most suitable values were 0.1 for the minimum support and 0.5 for the minimum confidence.
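A compact sketch of Apriori with these thresholds (minimum support 0.1, minimum confidence 0.5) is given below. It is a textbook level-wise implementation written for illustration, not the tool used in the experiments; an itemset is kept here when its support is at least the minimum (a boundary choice of this sketch), and the toy transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support=0.1):
    """Level-wise Apriori: return {frozenset(itemset): support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    current = {frozenset([i]) for t in transactions for i in t}  # 1-itemsets
    frequent, k = {}, 1
    while current:
        level = {c: supp(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence=0.5):
    """Yield (X, Y, support, confidence) for rules X => Y meeting min_confidence."""
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, s, conf


tweets = [{"anti", "muslim"}, {"anti", "muslim", "hate"}, {"anti", "white"}, {"feminist"}]
freq = apriori(tweets, min_support=0.5)
for x, y, s, c in association_rules(freq, min_confidence=0.5):
    print(sorted(x), "=>", sorted(y), round(s, 2), round(c, 2))
```

The prune step is what makes Apriori efficient: a candidate is never counted if any of its subsets already failed the support threshold.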

The first experiments on single and multi-word explicit aspect extraction using association rule mining (ARM) produced a list of single aspects from nouns and noun phrases. In this phase, the heuristic combination POS patterns were applied to identify the multi-word aspects in the tweets; previous research has established that some of the aspects people talk about consist of more than one word, especially in review sentences [6]. Table 4 lists, in alphabetical order, the Penn Treebank Project part-of-speech (POS) tags used for the ARM method, whilst Table 5 shows the heuristic combination POS patterns used for aspect generation. In this experiment, for instance, the aspects “feminism”, “racism” and “blackness” were extracted with the pattern “NN NNS NNP NNPS”. The phrase “white racist” was extracted with the “NN JJ” pattern, “anti bigotry” with the “NN VBG” pattern, “anti-muslim” with the “NN VBZ” pattern, and “anti racist” with the “NN RBR” pattern.

Table 4 Alphabetical list of part-of-speech (POS) tags used in the Penn Treebank Project for ARM Method
Table 5 Heuristics combination in POS patterns for multi-word aspects generation

Below are sample multi-word aspects generated with the heuristic combination POS patterns for the HCTS, STS and STC datasets:

  • (NN—JJ), e.g. aspects from the HCTS dataset: anti white, anti blackness, anti feminist, anti america, anti muslim; aspects from the STS dataset: night museum, malcolm gladwell, amp t, time warner; aspects from the STC dataset: ice cream sandwich.

  • (NN—VB) e.g. anti white supremacy.

  • (DT—JJ) e.g. anti muslim prejudice.
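The pattern step can be sketched as a scan over adjacent word pairs of a POS-tagged tweet. This sketch assumes that a pattern is an order-insensitive combination of tags (so "white racist", tagged JJ NN, matches the NN JJ pattern) and that all noun tags (NN, NNS, NNP, NNPS) collapse to NN. The pattern set is taken from the examples in the text rather than the full Table 5, and the tags in the usage example are supplied by hand.

```python
# Tag combinations from the examples in the text (not the full Table 5); each is
# stored as a sorted pair so that matching ignores word order (assumption).
PATTERNS = {("JJ", "NN"), ("NN", "VBG"), ("NN", "VBZ"), ("NN", "RBR")}


def coarse(tag):
    """Collapse NN, NNS, NNP and NNPS to NN."""
    return "NN" if tag.startswith("NN") else tag


def multiword_aspects(tagged, patterns=PATTERNS):
    """Return adjacent two-word phrases whose tag combination is in `patterns`."""
    found = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if tuple(sorted((coarse(t1), coarse(t2)))) in patterns:
            found.append(f"{w1} {w2}")
    return found


# Hand-tagged example (tags are illustrative, not tagger output).
print(multiword_aspects([("white", "JJ"), ("racist", "NN"), ("again", "RB")]))
# ['white racist']
```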

3.3.2 Implicit aspects extraction using the Stanford Dependency Parser (SDP) method

Few studies have investigated implicit aspect extraction for aspect-based sentiment analysis. In this work, we attempt to show that the relationship between aspects and opinions can help to determine implicit aspects, by using dependency parsers to capture grammatical relations. Different types of dependencies were used to find the relationships that help to discover implicit aspects. Table 6 shows a sample description of the types of dependencies used during implicit aspect extraction [8, 9].

Table 6 Sample description of typed dependency

Consider, for instance, the following tweet:

Example 1

“Govt removing Motability cars from disabled ppl who can only walk 50m and will then demonise those that have to give up work because of it.”

The SDP method yielded the following results:

root(ROOT-0, Govt-1) acl(Govt-1, removing-2) compound(cars-4, Motability-3) dobj(removing-2, cars-4) case(ppl-7, from-5) amod(ppl-7, disabled-6) nmod(removing-2, ppl-7) nsubj(walk-11, who-8) aux(walk-11, can-9) advmod(walk-11, only-10) acl:relcl(ppl-7, walk-11) dobj(walk-11, 50m-12) cc(walk-11, and-13) aux(demonise-16, will-14) advmod(demonise-16, then-15) conj(walk-11, demonise-16) dobj(demonise-16, those-17) nsubj(have-19, that-18) acl:relcl(those-17, have-19) mark(give-21, to-20) xcomp(have-19, give-21) compound:prt(give-21, up-22) dobj(give-21, work-23) case(it-26, because-24) mwe(because-24, of-25) nmod(work-23, it-26).

Example 2

“People that openly laugh at handicapped people in public are a disgrace. So many pathetic people on this planet”

The SDP method yielded the following results:

nsubj(disgrace-12, People-1) nsubj(laugh-4, that-2) advmod(laugh-4, openly-3) acl:relcl(People-1, laugh-4) case(people-7, at-5) amod(people-7, handicapped-6) nmod(laugh-4, people-7) case(public-9, in-8) nmod(people-7, public-9) cop(disgrace-12, are-10) det(disgrace-12, a-11) root(ROOT-0, disgrace-12) root(ROOT-0, So-1) amod(pathetic-3, many-2) amod(people-4, pathetic-3) nmod(So-1, people-4) case(people-4, on-5) det(planet-7, this-6) dep(people-4, planet-7).

The process used direct dependencies and transitive dependencies (within a distance of one dependency relation). In Example 1, the implicit aspect ‘disabled people’ was identified from the relation amod(ppl-7, disabled-6). Similarly, the implicit aspect ‘handicapped people’ was identified in Example 2 from the relation amod(people-7, handicapped-6). Both ‘disabled people’ and ‘handicapped people’ are implicit aspects that refer to the ‘disability hate crime’ category. In addition, when a negation modifier relation was found, the sentiment of the tweet was reversed.
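The amod-based step can be sketched by parsing typed-dependency strings in the `rel(governor-i, dependent-j)` format of Examples 1 and 2. Treating every amod pair as a candidate (which would also return, for example, "pathetic people" from Example 2) and flagging `neg` relations are simplifications of this sketch, not the authors' full rule set.

```python
import re

# Matches "rel(governor-i, dependent-j)"; relation names may contain ':' (acl:relcl).
DEP = re.compile(r"([\w:]+)\s*\(([^,]+)-(\d+),\s*([^)]+)-(\d+)\)")


def parse_dependencies(text):
    """Return (relation, governor, gov_index, dependent, dep_index) tuples."""
    return [(rel, gov, int(gi), dep, int(di))
            for rel, gov, gi, dep, di in DEP.findall(text)]


def implicit_aspect_candidates(deps, relations=("amod",)):
    """Modifier + head phrases, e.g. amod(ppl-7, disabled-6) -> 'disabled ppl'."""
    return [f"{dep} {gov}" for rel, gov, _, dep, _ in deps if rel in relations]


def has_negation(deps):
    """True if a negation modifier relation is present (polarity should flip)."""
    return any(rel == "neg" for rel, *_ in deps)


deps = parse_dependencies(
    "case(ppl-7, from-5) amod(ppl-7, disabled-6) nmod(removing-2, ppl-7)")
print(implicit_aspect_candidates(deps))  # ['disabled ppl']
```

Because the governor and dependent groups backtrack to the last `-index`, hyphenated words such as "anti-muslim-3" are also parsed correctly.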

3.4 Aspect-based hybrid sentiment classification

The hybrid sentiment classification combines a rule-based method and a feature selection (FS) method for identifying the sentiment words with a Support Vector Machine (SVM) for sentiment classification.

3.4.1 Rule-based method for sentiment word detection

For the purpose of sentiment word detection, the rule-based method was employed. It contains two main rules, as follows:

[Figure: the two main rules of the rule-based method for sentiment word detection]

For example, in this tweet: “If I can help myself I don’t need you, I’m still blessed to have my limbs so I will use them. I hate when people make me feel disabled!”

The SDP method extracted the aspect “feel disabled” from this tweet through the relation xcomp (open clausal complement of a verb or an adjective). However, not all of the aspects identified in the first process are significant. The role of the rule-based method is to identify the significant aspects, based on the positions of the aspects and sentiment words in the tweet and on the polarity value of every tagged sentiment word.
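The two rules themselves are not given in the text, so the following is only a hypothetical illustration of a position-and-polarity rule of the kind described: an aspect is kept as significant if a lexicon sentiment word occurs within a fixed window of tokens, and it takes the polarity of the nearest such word. The window size, the lexicon values and the function name are all assumptions.

```python
def significant_aspects(tokens, aspects, lexicon, window=3):
    """Hypothetical rule: keep an aspect if a sentiment word lies within `window`
    tokens of it, and give it the polarity of the nearest such word."""
    result = {}
    for i, tok in enumerate(tokens):
        if tok not in aspects:
            continue
        # Positions of sentiment words close enough to this aspect occurrence.
        near = [j for j, w in enumerate(tokens)
                if j != i and w in lexicon and abs(i - j) <= window]
        if near:
            nearest = min(near, key=lambda j: abs(i - j))  # ties: earliest position
            result[tok] = lexicon[tokens[nearest]]
    return result


tokens = ["hate", "when", "people", "make", "feel", "disabled"]
lexicon = {"hate": -0.8, "blessed": 0.6}  # illustrative polarity values
print(significant_aspects(tokens, {"disabled"}, lexicon, window=3))  # {}
print(significant_aspects(tokens, {"disabled"}, lexicon, window=5))  # {'disabled': -0.8}
```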

3.4.2 Sentiment word feature selection

The Principal Component Analysis (PCA) algorithm can be used as a feature selection method; previous studies have reported promising results for PCA in the feature selection process [30, 32].

PCA-based dimensionality reduction follows these steps: 1) convert the training and test datasets into numerical form; 2) compute the covariance matrix of the dataset; 3) calculate the eigenvalues and eigenvectors of the covariance matrix; 4) sort the eigenvectors by non-increasing eigenvalue; 5) keep the top k eigenvectors; and 6) train, test and evaluate on the reduced datasets.
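The six steps can be sketched in plain Python as follows. The eigenvectors are found by power iteration with deflation, which returns them in non-increasing eigenvalue order (steps 3 and 4); this is an illustrative substitute for a library eigen-solver, not the implementation used in the experiments, and it assumes the data are already numeric (step 1). Test rows are projected with the training mean and components, so no test information leaks into the reduction.

```python
import math
import random


def pca_fit(X, k, iters=500, seed=0):
    """Return (means, components, eigenvalues) for the top-k principal components."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Step 2: sample covariance matrix (d x d).
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):  # Steps 3-5: top-k eigenpairs, largest eigenvalue first
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):  # power iteration: v <- Cv / |Cv|
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: remove this component so the next pass finds the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return means, components, eigenvalues


def pca_transform(X, means, components):
    """Step 6: project (training or test) rows onto the fitted components."""
    return [[sum((row[j] - means[j]) * comp[j] for j in range(len(means)))
             for comp in components] for row in X]


train = [[1, 0], [-1, 0], [3, 0], [-3, 0], [0, 1], [0, -1]]
means, comps, eigs = pca_fit(train, k=1)
print([round(e, 6) for e in eigs])  # [4.0]
reduced_test = pca_transform([[2, 5]], means, comps)
```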

3.4.3 Baseline classification methods

A variety of studies [2, 16, 26] have established that the Support Vector Machine (SVM) works well for various classification and text categorisation tasks. The SVM handles large feature sets well and is robust when the set of examples is sparse, which is particularly useful when only a small number of tweets are used in the experiments.
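To show what the baseline classifier optimises, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on the L2-regularised hinge loss. It is an illustrative stand-in, not the SVM implementation used in the experiments. Labels are assumed to be −1/+1, the bias is handled by appending a constant feature (so it is also regularised), and `lam` and `epochs` are arbitrary choices.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    y holds labels in {-1, +1}; a constant 1.0 feature is appended as the bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regulariser gradient step
            if margin < 1.0:  # hinge loss active: move towards the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]  # toy, linearly separable
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In practice, the feature vectors would be the (reduced) tweet representations rather than 2-D toy points.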

4 Experimental results and discussion

Three datasets were used in the experiments: the STS dataset, which covers companies, events, locations, miscellaneous, movies, persons and products; the HCTS dataset, which has six hate crime categories, namely racial, religion, sexual, feminist, disability and nationality; and the STC dataset, which covers four categories, namely Google, Microsoft, Twitter and Apple.

Table 7 shows the list of single and multi-word aspects obtained for the HCTS dataset using association rule mining with heuristic combination POS patterns. Each tweet was annotated with a list of aspects relevant to the dataset. The process successfully extracted aspects relevant to the targets of this research, including “anti”, “muslim”, “feminist”, “jewish”, “black” and “white”. The results also show the multi-word aspects extracted with the heuristic combination part-of-speech (POS) patterns.

Table 7 Some experimental results of aspects detection for HCTS dataset

Tables 8 and 9 present a summary of the aspects for the HCTS and STS datasets, respectively, based on the dependency parser grammatical relation.

Table 8 Implicit Aspects from Hate Crime Twitter Sentiment (HCTS) Dataset
Table 9 Implicit aspects from Stanford Twitter Sentiment (STS) dataset

4.1 Evaluation measures

Accuracy was used to evaluate the overall classification performance on the binary classes (positive and negative). For the positive and negative sentiments on the entities, the standard evaluation measures of precision and recall were also employed.

These measures were computed from the four counts of the confusion matrix: True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN).

  • Precision(P) = TP/(TP+FP)

  • Recall(R) = TP/(TP+FN)

  • Accuracy(A) = (TP+TN)/(TP + TN + FP + FN)

  • F-Measure (micro-averaging) = 2·P·R/(P+R)
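The four formulas can be computed directly from the confusion-matrix counts, as in the sketch below; the F-measure is the harmonic mean of precision and recall, which equals 2TP/(2TP + FP + FN). The sketch scores the positive class only, whereas the figures in the tables may be averaged over the two classes. The counts in the usage line are made up.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluation_measures(tp, fp, tn, fn):
    """Precision, recall, accuracy and F-measure as defined above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"P": precision, "R": recall, "A": accuracy, "F": f_measure}


m = evaluation_measures(tp=40, fp=10, tn=35, fn=15)  # made-up counts
print(m["P"], m["A"])  # 0.8 0.75
```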

4.2 Aspect-based sentiment feature extraction analysis results

As can be seen from Table 10, the STS dataset obtained higher positive than negative polarity scores. The percentages of positive labels were 21.6% for companies, 87.5% for events, 17.6% for locations, 28.8% for miscellaneous, 78.9% for movies, 65.6% for persons and 54% for products. The percentages of negative labels were 61.2% for companies, 0% for events, 47.1% for locations, 42.4% for miscellaneous, 5.3% for movies, 18.8% for persons and 20.6% for products. The method also classified tweets as neutral: 17.2% for companies, 12.5% for events, 35.3% for locations, 28.8% for miscellaneous, 15.8% for movies, 15.6% for persons and 25.4% for products.

Table 10 STS polarity scores and total percentage of positive, negative and neutral labels according to category

The results of the feature extraction analysis for the HCTS dataset are presented in Table 11, where tweets were classified as positive, negative or neutral. The disability, racial, feminist and religious categories obtained higher negative than positive polarity scores, whereas the sexual category obtained a higher positive polarity score (27%) than negative polarity score (25%). The proposed method labelled as negative 27.9% of the tweets in the disability category, 57.9% in the feminist category, 55.3% in the racial category and 49.5% in the religious category, whereas 21.3%, 16.7%, 16.5% and 13.4% of the tweets in these categories, respectively, were labelled as positive.

Table 11 HCTS polarity scores and total percentage of positive, negative and neutral labels according to category

4.3 Twitter aspect-based hybrid sentiment classification analysis results

Tables 12, 13 and 14 present the performance of the aspect-based classifier with different feature selection methods on the STS, HCTS and STC datasets, respectively. Only the subjective tweets were considered. Performance was measured using accuracy, precision, recall and F-measure. Each combination of methods was evaluated with the SVM classifier, using three feature selection methods: principal component analysis (PCA), latent semantic analysis (LSA) and random projection (RP).

Table 12 Comparison of STS aspect-based sentiment classification results with different feature selection methods for 10-fold cross validation
Table 13 Comparison of HCTS aspect-based sentiment classification results with different feature selection methods for 10-fold cross validation
Table 14 Comparison of STC aspect-based sentiment classification results with different feature selection methods for 10-fold cross validation

Table 12 presents the experimental results for the STS dataset. The highest accuracy, 76.5517%, was achieved with the ABSA+SentiWordNet+PCA method; with POS Tags features, the precision was 0.779, the recall 0.766 and the F-measure 0.76. In contrast, ABSA+SentiWordNet alone achieved an accuracy of only 53.4483%, the same as when the LSA method was applied. With the random projection (RP) method, the highest accuracy was only 63.7931%, obtained with a combination of POS Tags and bigram features. Fig. 2 shows the STS classification results for the different features and feature selection methods.

Fig. 2

STS classification results between different features and feature selection method

The experimental results for the HCTS dataset, presented in Table 13, showed that the proposed ABSA+SentiWordNet+PCA method with a combination of POS Tags and unigram features gave the highest accuracy, 71.6243%, with a precision of 0.708, a recall of 0.716 and an F-measure of 0.647. ABSA+SentiWordNet with POS Tags features alone gave an accuracy of only 69.0802%, compared with 71.2329% for ABSA+SentiWordNet+PCA with POS Tags features. ABSA+SentiWordNet+RP with a combination of POS Tags and unigram features achieved an accuracy of 70.8415%. The PCA method was also compared with the latent semantic analysis (LSA) feature selection method; the results indicated that the accuracy did not change when the same features were used. Figure 3 shows the detailed HCTS classification results for the different features and feature selection methods.

Fig. 3

HCTS classification results between different features and feature selection method

Table 14 shows the experimental results for the STC dataset. Again, the highest accuracy, 74.2438%, was achieved with the ABSA+SentiWordNet+PCA method and a combination of POS Tags and unigram features, with a precision of 0.751, a recall of 0.742 and an F-measure of 0.738. In contrast, ABSA+SentiWordNet alone achieved an accuracy of only 52.429%, the same as when the LSA method was applied. Feature selection with the random projection (RP) method produced an accuracy of only 67.0944% with POS Tags features. Fig. 4 shows the detailed results for the STC dataset with the different features and feature selection methods. Figs. 2, 3 and 4 show that part-of-speech (POS) tags were the best features for the feature selection methods.

Fig. 4

STC classification results between different features and feature selection method

4.4 Discussion I

The results of this study were then compared with those of different feature selection and classification algorithms, as shown in Tables 15, 16 and 17. For the STS dataset, the hybrid ABSA+SentiWordNet+PCA+SVM approach obtained an F-measure of 0.76, followed by 0.729 with the random forest (RF) classifier. For the HCTS dataset, the highest F-measure, 0.647, was also obtained with the hybrid ABSA+SentiWordNet+PCA+SVM approach.

Table 15 Evaluation of STS twitter aspect-based sentiment classification results with different classification algorithms
Table 16 Evaluation of HCTS twitter aspect-based sentiment classification results with different classification algorithms
Table 17 Evaluation of STC twitter aspect-based sentiment classification results with different classification algorithms

For the STC dataset, the same approach obtained an F-measure of 0.738. Surprisingly, as can be seen in Table 17, the hybrid ABSA+SentiWordNet+PCA+RF method achieved an F-measure of 0.74, close to the result with the SVM. However, training the random forest (RF) classifier took longer than training the other classification algorithms.

The classification accuracies on the STS, HCTS and STC datasets were also higher with the ABSA+SentiWordNet+PCA+SVM approach than with the other classifiers: 76.5517% on STS, 71.6243% on HCTS and 74.2438% on STC. In summary, these results show that the Support Vector Machine (SVM) worked well across the different features used during classification, and that part-of-speech (POS) tags were the best features for representing the tweets. Overall, the results also indicate that the proposed hybrid approach improved the classification accuracy.

4.5 Discussion II

The performance of the extreme learning machine (ELM) was tested on the STS, HCTS and STC datasets for binary classification, which has two training samples in each class. An analysis by Huang [14] found the ELM to be efficient, accurate and easy to implement in various applications. The aim of this experiment was to verify whether the ELM can handle small training datasets, especially Twitter datasets. The experiments were run on a laptop with a sigmoidal hidden-layer activation function. The number of hidden nodes, L, was varied from 20 to 500 in steps of 10 [34], and fifty trials were conducted for each problem [15]. Table 18 compares the performance of the ELM on the binary-class Twitter datasets, including the average testing and training accuracy and the corresponding standard deviation (Std Dev).
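The ELM idea (random, untrained hidden-layer weights with a sigmoid activation; only the output weights are learned, in closed form) can be sketched as below. For numerical stability, this sketch solves the ridge-regularised least-squares system (HᵀH + I/C)β = Hᵀt, a known ELM variant, instead of applying the plain Moore-Penrose pseudoinverse. The constant `C`, the hidden-layer size and the −1/+1 target coding are assumptions, and this is not the configuration used for Table 18.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, clipped to avoid math.exp overflow."""
    z = max(-50.0, min(50.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def hidden_layer(W, bias, x):
    """Sigmoid activations of the random hidden nodes for one input vector."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(Wi, x)) + bi)
            for Wi, bi in zip(W, bias)]


def train_elm(X, y, hidden=20, C=1e3, seed=0):
    """Random input weights/biases; output weights by ridge least squares.

    y holds targets in {-1, +1}; C is the (assumed) regularisation constant.
    """
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    bias = [rng.uniform(-1, 1) for _ in range(hidden)]
    H = [hidden_layer(W, bias, x) for x in X]  # n x hidden matrix
    # (H^T H + I/C) beta = H^T y
    A = [[sum(h[p] * h[q] for h in H) + (1.0 / C if p == q else 0.0)
          for q in range(hidden)] for p in range(hidden)]
    rhs = [sum(h[p] * t for h, t in zip(H, y)) for p in range(hidden)]
    return W, bias, solve_linear(A, rhs)


def predict_elm(model, x):
    W, bias, beta = model
    score = sum(hk * bk for hk, bk in zip(hidden_layer(W, bias, x), beta))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]  # toy data
y = [1, 1, 1, -1, -1, -1]
model = train_elm(X, y, hidden=10)
print([predict_elm(model, x) for x in X])
```

Varying `hidden` over a range and repeating with different seeds mirrors the kind of sweep described above.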

Table 18 Performance comparison in a benchmark dataset for Twitter aspect-based sentiment classification by using ELM

In particular, Table 18 compares the testing and training accuracies for numbers of hidden nodes in the range [20, 500]. For the STS dataset, the best training performance of the hybrid method was produced by ABSA+SentiWordNet+ELM+Unigram, with a training accuracy of 0.95249. ABSA+SentiWordNet+ELM+Unigram also gave the best training accuracy for the HCTS dataset (0.92623) and for the STC dataset (0.84126).

The testing results were compared with the training results. The best performances of the hybrid method were produced on the HCTS and STC datasets. The number of samples in the testing and training datasets appeared to play a significant role in producing accurate results. A detailed analysis of the ELM for aspect-based sentiment classification will be conducted in future work.

4.6 Statistical significance tests

In this section, statistical tests are used to examine the significance of the differences in the mean classification performance. Table 19 shows the results of the paired-sample t-test for the classification performance measurements.

Table 19 Paired sample T-test for classification performance measurements

The statistical test results in Table 19 show that several of the differences between the hybrid methods were significant at an alpha level of 0.05. The difference between hybrid PCA+SVM and hybrid PCA+NB was statistically significant (p = 0.004), as was the difference between hybrid PCA+SVM and hybrid SVM (p = 0.023). However, the difference between hybrid PCA+SVM and hybrid PCA+RF was not statistically significant (p = 0.264), and the difference between hybrid PCA+SVM and hybrid ELM was not statistically significant at the 0.05 level (p = 0.058), although it was significant at an alpha level of 0.1.
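The paired-sample t statistic behind such a test is the mean of the per-pair differences divided by its standard error, with n − 1 degrees of freedom. The sketch below computes only the statistic; turning it into a p-value requires the Student t distribution (for example, `scipy.stats.ttest_rel`). The two score lists are invented numbers, not the paper's measurements.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for matched scores a, b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev uses n - 1
    return t, n - 1


t, df = paired_t([5, 6, 7, 8], [4, 4, 6, 5])  # invented scores
print(round(t, 3), df)  # 3.656 3
```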

5 Conclusions

In this paper, we proposed a new hybrid approach for Twitter aspect-based sentiment analysis that performs a finer-grained analysis. This research examined association rule mining (ARM) augmented with a heuristic combination of part-of-speech (POS) patterns for detecting explicit single and multi-word aspects, because the interrelations captured by these patterns between noun phrases and words such as adjectives, adverbs, verbs and determiners are beneficial for detecting relevant explicit aspects. In addition, the grammatical relations of the Stanford dependency parser (SDP) method are crucial for detecting implicit aspects. Our system also combines a rule-based method with feature selection to identify the sentiment words. The evaluation with different classification algorithms demonstrated that the new hybrid sentiment classification produced meaningful results on Twitter datasets from different domains. The new hybrid sentiment classification, which incorporates the results of the aspect-based sentiment classifier, improved on the existing baseline sentiment classification methods, achieving accuracies of 76.55%, 71.62% and 74.24% on the STS, HCTS and STC datasets, respectively. In future work, we plan to apply the proposed hybrid sentiment classification approach to other social media data, such as YouTube and Facebook, in order to identify people's sentiment towards certain issues.