
1 Introduction

The continuous growth of textual content on social media platforms has led to research on automatic ways of processing this information. Analysing the sentiments of social media posts allows us to understand public opinion on crucial topics [1]. Sentiment analysis involves the classification of tweets as negative, positive or neutral. In this paper, we focus exclusively on negative sentiment analysis because, in recent times, an increase in the number of online posts that promote hatred and discord in society has been observed.

There are two types of learning methodologies employed for sentiment analysis, namely supervised and unsupervised. Examples of supervised machine learning methods are naïve Bayes, logistic regression, support vector machines (SVM) and long short-term memory (LSTM) [2,3,4,5]. Examples of unsupervised methods are fuzzy logic-based approaches that interpret the SentiWordNet scores of tweets to make a decision [6, 7], and aspect-based analysis [8,9,10,11], which is the approach undertaken in this paper. A brief discussion of aspect-based works in the literature is included in Sect. 2 prior to introducing the proposed model, which is based on deriving phrase patterns and keywords from tweets and scoring them with the help of the SentiWordNet lexicon. This approach generally involves determining the head aspect or issue subject of opinion phrases in a sentence. The organization of this paper is as follows. The proposed work is introduced in Sect. 2, the experimentation and the results are discussed in Sect. 3, and the conclusions are given in Sect. 4.

2 Proposed Work

Aspect-based analysis aims to determine the subject issue or head aspect in a given text to understand the topic of discussion. One significant work in this regard uses canonical conditional random fields (CRF) [12] to locate an opinion phrase in every sentence in which the head aspect occurs. However, not all opinion phrases detected in this manner contribute useful information. Other related works include [13], which extracts subjective verb expressions, and [14], which extracts keyphrases in a supervised manner. Mukherjee [11] tried to work around these shortcomings by adopting a feature-based approach for representing the head aspect: the semantic features derived in a generative phase are fed into a discriminative phase for issue-based sentence classification, and a labelled dataset of aspect-based opinion phrases was generated for the purpose. In [8], the sentiment score of a sentence was obtained by averaging the individual sentiment scores of each aspect at the clause level, under the assumption that a single sentence may express different sentiments towards different aspects of a movie. The focus in [9] was to determine user sentiments about different aspects of a product from tweets. A method that deserves special mention is the unsupervised aspect analysis for sentiment detection in [10], since its theme is closely related to ours. It used a specially labelled dataset with annotations of both aspect and sentiment; the top-ranked words for each aspect were shortlisted based on the probabilities generated by a latent Dirichlet allocation (LDA) model, and a polarity score was assigned to each noun–adjective pair to compute the overall sentiment. In our aspect-based approach, a sentiment-scoring function for tweets is used, which is explained in more detail below.

2.1 Process Flow for the Negative Sentiment Analysis Task

The block diagram in Fig. 1 shows an overview of the generic negative sentiment analysis task for the SemEval 2013 twitter dataset. As noted from Fig. 1, text preprocessing, feature extraction and sentiment computation are the major stages of the task. The text preprocessing shown, prior to the feature extraction, is common to our task. Synsets from the SentiWordNet lexicon [15] provide positive, negative and neutral scores for each word in a tweet. The maximum of these three scores is taken for each word, under the assumption that the maximum score reflects the real nature and context of the word. The labels of tweets are taken as ‘negative’ and ‘non-negative’, with the ‘neutral’ category treated as ‘negative’. The first phase is the preprocessing phase, and its first step is the removal of slang and abbreviations from each sentence. A list of abbreviations along with their full forms is maintained. This list was taken from the slang and abbreviation list of Webopedia and can be accessed online at [16].

Fig. 1 Process flow of the negative sentiment analysis task

Each tweet is scanned for words in the abbreviation list, which are replaced by their full forms. Once all the abbreviations are replaced, each word is converted to lower case and the text is tokenized. The next step is spelling correction: the SymSpell checker with its built-in dictionary [17] and a maximum edit distance of 2 is used to correct spellings. The words are then tagged with their part-of-speech (POS) tags using the NLTK library [18]. POS tags that do not give any sentiment information are identified and removed from our text corpus. These tags are [‘EX’, ‘FW’, ‘LS’, ‘NNP’, ‘WP’, ‘WP$’, ‘NNPS’, ‘POS’, ‘PRP’, ‘PRP$’, ‘TO’, ‘WDT’, ‘WRB’], identified according to [19]. The sentences are scored based on the remaining POS tags. An aspect is a subject that is spoken about in the sentence. For example, in the sentence ‘The food was very good but the ambience was not very nice’, the aspects are ‘food’ and ‘ambience’, and the words describing them are ‘very good’ and ‘not very nice’, respectively. Since a single sentence may contain both positive and negative sentiments, a prioritized scoring system is devised by identifying 14 types of phrases containing combinations of the POS tags ADJECTIVES, VERBS, NOUNS, MODALS and ADVERBS, as shown in Table 1.

Table 1 Our fourteen POS tag categories for detecting sentiment phrases
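As a concrete illustration, the following minimal Python sketch reproduces the preprocessing steps described above using NLTK for tokenization and POS tagging. The abbreviation dictionary shown is a small illustrative sample rather than the full Webopedia list, and the SymSpell spelling-correction step is omitted for brevity.

```python
# Minimal preprocessing sketch: abbreviation expansion, lower-casing,
# tokenization, POS tagging and removal of non-sentiment POS tags.
# Requires the NLTK 'punkt' and 'averaged_perceptron_tagger' resources.
import nltk

# Small illustrative sample of the abbreviation list (not the full
# Webopedia list used in the paper).
ABBREVIATIONS = {"u": "you", "gr8": "great", "idk": "i do not know"}

# POS tags that carry no sentiment information, as listed above (see [19]).
NON_SENTIMENT_TAGS = {"EX", "FW", "LS", "NNP", "WP", "WP$", "NNPS",
                      "POS", "PRP", "PRP$", "TO", "WDT", "WRB"}

def preprocess(tweet):
    # Replace abbreviations with their full forms.
    words = [ABBREVIATIONS.get(w.lower(), w) for w in tweet.split()]
    # Lower-case and tokenize, then POS-tag with NLTK.
    tokens = nltk.word_tokenize(" ".join(words).lower())
    tagged = nltk.pos_tag(tokens)
    # Drop tokens whose tags carry no sentiment information.
    return [(tok, tag) for tok, tag in tagged if tag not in NON_SENTIMENT_TAGS]

print(preprocess("The food was gr8 but the ambience was not very nice"))
```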

The idea is to capture the above phrases in a sentence and then score it. A sliding text window is iterated over the tokenized sentence until the last token is reached. Multiple tag patterns may be associated with the same keyphrase in a given sentence. If more than one pattern from Table 1 matches, the longest one is chosen, e.g. the phrase ‘not very nice’ in the context of the ‘ambience’ keyword in the example cited above. Preference is given in order from bottom to top in Table 1. If the length and preference order of two patterns are the same, the first pattern is chosen.

For example, consider the sentence ‘She was not very good at basketball’. The phrase patterns derived are (i) very good → ADVERB + ADJECTIVE and (ii) not very good → ADVERB + ADVERB + ADJECTIVE. Since both (i) and (ii) are valid tag patterns present in Table 1, our system selects the longest possible tag pattern in the sentence, i.e. ADVERB + ADVERB + ADJECTIVE (not very good), as the phrase pattern to be scored. The SentiWordNet scores of the terms (not, very, good) are multiplied to get a sentiment score for the phrase pattern, which comes out to be a negative value due to the presence of ‘not’.
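The sketch below illustrates this longest-match, priority-ordered selection. The PATTERNS list contains only two of the fourteen Table 1 categories as an illustrative subset, and the coarse tags (ADVERB, ADJECTIVE, …) are assumed to have already been mapped from the Penn Treebank tags.

```python
# Sketch of phrase-pattern selection over a coarsely tagged sentence.
# PATTERNS is an illustrative subset of the fourteen Table 1 categories,
# listed from lowest to highest preference (bottom of the table = highest).
PATTERNS = [
    ("ADVERB", "ADJECTIVE"),
    ("ADVERB", "ADVERB", "ADJECTIVE"),
]

def select_phrase(coarse_tagged):
    """coarse_tagged: list of (token, coarse_tag) pairs."""
    best = None  # (length, preference_index, phrase_tokens)
    for start in range(len(coarse_tagged)):          # sliding text window
        for pref, pattern in enumerate(PATTERNS):
            window = coarse_tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                cand = (len(pattern), pref, [tok for tok, _ in window])
                # Prefer longer matches, then patterns lower in the table;
                # on a full tie, keep the first match found.
                if best is None or cand[:2] > best[:2]:
                    best = cand
    return best[2] if best else []

tagged = [("she", "PRONOUN"), ("was", "VERB"), ("not", "ADVERB"),
          ("very", "ADVERB"), ("good", "ADJECTIVE"), ("at", "PREP"),
          ("basketball", "NOUN")]
print(select_phrase(tagged))   # -> ['not', 'very', 'good']
```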

A corpus of verbs, adverbs, modals, conjunctions, nouns and adjectives was compiled using WordNet [20]. An ordered dictionary was kept for each of these categories so that searching can be performed faster. The category of a word is identified by examining the allowed POS tags in that category; the POS tags follow the English Penn Treebank tagset [21]. The allowed POS tags are compiled in Table 2 for the verb category and in Table 3 for the adjective and adverb categories, respectively.
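One possible way to compile such category dictionaries, sketched below under the assumption that NLTK's WordNet interface is used, is to enumerate all synsets of each open-class POS. Modals and conjunctions are closed-class words not covered by WordNet and would have to be added from a hand-made list.

```python
# Sketch: compiling category lexicons from WordNet via NLTK and keeping
# them in ordered dictionaries for fast membership lookup.
from collections import OrderedDict
from nltk.corpus import wordnet as wn

def build_lexicon(pos):
    words = OrderedDict()
    for synset in wn.all_synsets(pos=pos):
        for lemma in synset.lemma_names():
            words.setdefault(lemma.lower().replace("_", " "), True)
    return words

VERBS      = build_lexicon(wn.VERB)
NOUNS      = build_lexicon(wn.NOUN)
ADJECTIVES = build_lexicon(wn.ADJ)
ADVERBS    = build_lexicon(wn.ADV)

print("good" in ADJECTIVES, "run" in VERBS)   # True True
```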

Table 2 Acceptable POS tags for the verb category
Table 3 Acceptable POS tags for the adverb and adjective categories

The dependency parser [22] has been used to obtain the syntactic dependencies between the tokens in each statement. The resulting aspects must belong to the class of dependency parsing tags shown in Table 4. For further reading, readers are referred to [23], which lists examples of these dependency parsing tags. Finding links between tokens and tagging their relationships gives a more meaningful classification than single keyword-based classification systems [24] or bag-of-words features, where the order in which words occur is lost [25]. A sentence processed by the dependency parser may yield zero or more of these dependency tags.

Table 4 Acceptable syntactic dependency tags

The preference for selecting these tags as the keyword is from top to bottom in Table 4. Further sub-categories, or extended POS tags, for each of these dependency tags are kept as our accept states, as shown in Table 5. A keyword that matches a category in both Tables 4 and 5 qualifies as the aspect information and is included for sentiment-scoring along with the detected phrase.

Table 5 Results on the SemEval 2013 dataset
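As an illustration of the keyword (aspect) selection, the sketch below assumes spaCy as the dependency parser and uses an illustrative subset of dependency tags; the parser actually used in [22] and the full tag lists of Tables 4 and 5 are not reproduced here.

```python
# Sketch of aspect-keyword selection via dependency parsing (spaCy assumed).
# ACCEPTED_DEPS is an illustrative subset, listed in decreasing preference.
import spacy

nlp = spacy.load("en_core_web_sm")
ACCEPTED_DEPS = ["nsubj", "dobj", "pobj"]   # illustrative, not Table 4 itself

def select_keyword(sentence):
    doc = nlp(sentence)
    for dep in ACCEPTED_DEPS:               # top-to-bottom preference
        for token in doc:
            if token.dep_ == dep:
                return token.text, dep
    return None

print(select_keyword("The ambience was not very nice"))
# e.g. ('ambience', 'nsubj')
```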

2.2 Sentiment-Scoring of Tweets

Once the phrase pattern from Table 1 and the keyword from Table 5 have been selected, the phrase + keyword combination is used for sentiment-scoring and classification. Synsets from the SentiWordNet lexicon provide the positive, negative and neutral scores for each word in the phrase + keyword combination. The maximum of these three scores is taken for each word, under the assumption that the maximum score reflects the real nature and context of the word. The polarity is (+) for positive scores and (−) for negative scores. The labels are taken as ‘negative’ and ‘non-negative’, with all non-positive labels, including ‘neutral’, labelled as ‘negative’. The procedure of sentiment-scoring for keywords is the same as for the phrases. The software implementation of our code is available for reference at https://github.com/mkaii/bug-/tree/gh-pages. All valid scores derived from the sentences in a twitter post are added up to form the variable x, which is nonlinearly transformed into a decision variable f(x) as explained below.

Each word of the phrase + keyword combination is replaced by its polarized SentiWordNet score. The sum of all these scores, denoted by x, is fed as input into the modified sigmoid function shown in Eq. (1).
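A minimal sketch of this word-level scoring and summation, using NLTK's SentiWordNet interface, is given below. Taking the first synset of a word and letting objective-dominant words contribute zero are assumptions made for the sketch, and the within-phrase multiplication mentioned in Sect. 2.1 is not reproduced here.

```python
# Sketch of the polarized word scoring and the sentence score x, using
# NLTK's SentiWordNet interface (requires the 'sentiwordnet' and 'wordnet'
# corpora to be downloaded).
from nltk.corpus import sentiwordnet as swn

def polarized_score(word):
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return 0.0
    s = synsets[0]                               # first sense (assumption)
    scores = {"pos": s.pos_score(), "neg": s.neg_score(), "obj": s.obj_score()}
    top = max(scores, key=scores.get)            # maximum of the three scores
    if top == "pos":
        return scores["pos"]                     # (+) polarity
    if top == "neg":
        return -scores["neg"]                    # (-) polarity
    return 0.0                                   # objective-dominant word

def sentence_score(phrase_and_keyword):
    # x is the sum of the polarized scores of the phrase + keyword words.
    return sum(polarized_score(w) for w in phrase_and_keyword)

x = sentence_score(["not", "very", "good", "ambience"])
print(x)
```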

$$f(x) = \frac{4}{1 + e^{-x}} - 2$$
(1)

The sigmoid function in Eq. (1) is plotted in Fig. 2; its range is from −2 to 2. When the sentiment score x is 0, f(x) = 0. If the score x is greater than 0, then f(x) > 0 and the overall sentiment is determined to be positive. Likewise, if the score x is less than or equal to 0, then f(x) ≤ 0 and the overall sentiment is determined to be negative. The final sentiment score of any phrase + keyword combination therefore lies between −2 (for negative sentiment) and 2 (for non-negative sentiment).
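Equation (1) and the resulting decision rule can be written directly as follows.

```python
# Decision function from Eq. (1): f(x) = 4 / (1 + e^(-x)) - 2, with range
# (-2, 2). Combinations with f(x) <= 0 are labelled negative, otherwise
# non-negative.
import math

def f(x):
    return 4.0 / (1.0 + math.exp(-x)) - 2.0

def classify(x):
    return -1 if f(x) <= 0 else 1     # -1: negative, 1: non-negative

print(f(0.0), classify(-0.4), classify(0.7))   # 0.0 -1 1
```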

Fig. 2 Sigmoid function f(x) versus x

3 Experimentation and Results

The dataset used in this work is SemEval 2013 [26], a twitter-based dataset with 1650 samples and three labels: ‘1’ for positive, ‘0’ for neutral and ‘−1’ for negative sentiment. For our application of negative sentiment analysis, the three labels are converted to two by treating neutral tweets as negative: label ‘1’ represents non-negative (positive) sentiment and label ‘−1’ represents negative sentiment. The experiments were performed in Python 3 on an Intel i5 processor with a 2.6 GHz clock, and our code took less than one minute to execute. The software implementation of our code is available online at [27].
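The three-to-two label conversion amounts to the following mapping (a trivial sketch):

```python
# Neutral (0) and negative (-1) labels both map to -1; positive (1) stays 1.
def to_binary(label):
    return 1 if label == 1 else -1

print([to_binary(y) for y in [1, 0, -1]])   # [1, -1, -1]
```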

Our aspect-based method for negative sentiment analysis is unsupervised, so there is no need to train a model. Preprocessing includes text normalization, stemming, removal of stopwords, spelling correction and the replacement of slang and abbreviations with their full forms. After the preprocessing step, the words are tagged with their POS tags using the NLTK library. POS tags that do not give any sentiment information are identified and removed from the text corpus, and the sentences are scored based on the remaining tags. Since a single sentence may contain both positive and negative sentiments, a prioritized scoring system is used that identifies the aspect as well as the phrases containing the acceptable POS tags, in the priority order selected by us. k-fold cross-validation has also been performed for all our experiments, with k = 10: the input is divided into k subsets, the model is trained on k − 1 subsets and evaluated on the remaining subset, the experiment is repeated k times with a different held-out subset each time, and the final results are averaged. Apart from accuracy (in %), the precision, recall and F-score metrics are also computed. Table 5 shows the comparison between the accuracy, precision, recall and F-score values of different classification models on the SemEval 2013 dataset. As observed from the results summarized in Table 5, the highest accuracy and F-score values are obtained by our method.
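A sketch of the 10-fold evaluation and the reported metrics, assuming scikit-learn, is given below. Here predict() stands in for the scorer described in Sect. 2 (or for a baseline fitted on the k − 1 training folds), and computing precision, recall and F-score with the negative class as the positive label is an assumption of the sketch.

```python
# Sketch of 10-fold evaluation with accuracy (%), precision, recall, F-score.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(tweets, labels, predict):
    labels = np.asarray(labels)
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = []
    for _, test_idx in kf.split(tweets):
        y_true = labels[test_idx]
        y_pred = np.array([predict(tweets[i]) for i in test_idx])
        fold_scores.append([
            accuracy_score(y_true, y_pred) * 100,           # accuracy in %
            precision_score(y_true, y_pred, pos_label=-1),  # negative class
            recall_score(y_true, y_pred, pos_label=-1),
            f1_score(y_true, y_pred, pos_label=-1),
        ])
    # Average the four metrics over the 10 folds.
    return np.mean(fold_scores, axis=0)
```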

For comparison, several supervised baseline approaches for sentiment analysis and detection are also implemented. The least mean square (LMS) loss function is used in the training phase of all the supervised techniques. Grid search was used to determine the support vector machine (SVM) parameter settings. Using n-grams as features delivered better results than bag-of-words (BOW) features. The long short-term memory (LSTM) model also performed well; it was trained with the Adam optimizer, a batch size of 50 and 100 epochs, using GloVe word embeddings. The best result, however, was delivered by our unsupervised aspect-based method for detecting negative sentiment tweets.
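For reference, a sketch of such an SVM baseline with n-gram features and grid search, assuming scikit-learn, is shown below; the parameter grid and n-gram range are illustrative rather than the exact settings used in our experiments.

```python
# Sketch of the supervised SVM baseline with n-gram features and grid search.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

pipeline = Pipeline([
    ("ngrams", TfidfVectorizer(ngram_range=(1, 2))),   # unigrams + bigrams
    ("svm", SVC()),
])
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="f1_macro")
# search.fit(train_tweets, train_labels)   # fit on the k-1 training folds
```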

4 Conclusions

In recent times, hatred in society has spread to social media platforms, which now need to be monitored continuously to detect negative online posts that may trigger events such as riots. In this work, twitter posts have been classified based on their negative content using an aspect-based approach that relies on the sentiment scoring of a phrase and a keyword, detected separately. Various text preprocessing techniques are first applied, such as normalization, stemming, removal of stop words, spelling correction and the replacement of abbreviations with their full forms. Phrases and keywords are then selected from each sentence using prioritized lists involving POS tags. The SentiWordNet lexicon provides the sentiment scores of the phrases and keywords, which are summed for each sentence of the tweet and nonlinearly transformed into a decision score for detecting the sentiment of the tweet; a negative decision score indicates a negative tweet. The higher classification accuracies on the benchmark SemEval 2013 dataset demonstrate the superiority of our approach compared to the state of the art. The inclusion of multilingual slang for detecting negative content in tweets is the future scope of our work.