1 Introduction

The advent of the internet has opened up new possibilities on a global scale. People from all around the world are now able to communicate and share their experiences. With the rapid increase in internet usage, people prefer to explore online reviews before purchasing a product. Various websites such as amazon.com, ebay.com, neweggs.com and alibaba.com provide a platform for users to share their opinions about different products. According to [22], 81% of users search online before purchasing products. These reviews are usually expressed in natural language or as a star rating, and they help users judge the reliability of the intended product and make an informed decision. All of the popular e-shopping platforms (amazon, ebay, neweggs, alibaba, etc.) provide users with complete product specifications, including brand, price, and features or characteristics. Furthermore, such platforms allow users to share their experiences with purchased products in the form of feedback. This feedback is usually available in two forms.

  1. Users can provide their feedback in the form of a star rating, where customers summarize their experience on a scale of 1 (Unsatisfactory) to 5 (Excellent).

  2. Users can also provide their feedback in the form of detailed reviews, where they describe their experience with the product in their own words.

These reviews can be an influential factor in the decision to purchase a particular product or to abandon the purchase. Anyone who intends to buy a product can analyze these reviews and decide accordingly. Therefore, online customer feedback is considered a significant source of information, useful for both prospective customers and product manufacturers. The rapid increase in the number of users also increases the number of online reviews. Going through such a large number of reviews requires considerable cognitive effort, so users end up reading only a few of them.

To overcome this issue, data scientists use tools like natural language processing and text analysis to extract the essence of all the available product reviews. The term sentiment analysis encompasses the development process behind these tools [30].

Various studies in the literature have addressed sentiment classification. Its core ingredient is the exploitation of polarity-bearing words present in the reviews, e.g. adjectives, verbs and adverbs. Parts of speech (POS) are commonly used for the evaluation of sentiments [13]. Sentiment analysis approaches can be classified into two categories: symbolic and sub-symbolic [8]. Our work belongs to the first category, as we use SentiWordNet (a lexical resource for opinion mining). We have performed a comprehensive analysis of the role of the aforementioned POS and identified an important research question: how do the various forms of adverbs affect sentiment classification? This study exploits all ten distinct forms of adverbs: general adverbs, general superlative adverbs, general comparative adverbs, general-wh adverbs, degree adverbs, degree superlative adverbs, degree comparative adverbs, degree-wh adverbs, time adverbs and locative adverbs. Furthermore, a comprehensive dataset is acquired from Amazon, which contains 51,005 reviews of two products: office products and musical DVDs.

Along with the reviews expressed in natural language, we have also extracted the corresponding star ratings, which serve as the benchmark in this study. The results show that general superlative adverbs and degree-wh adverbs outperformed the other forms of adverbs, attaining F-measures of 0.86 (positive class) and 0.80 (negative class), respectively. The study will benefit sentiment classification systems that exploit adverbs as polarity-bearing features.

2 Literature review

Millions of product reviews are available online. According to a survey conducted in the USA, 81% of Internet users do online research before purchasing products online [22]. Therefore, many approaches and techniques are used to extract useful information from social reviews, and sentiment analysis is one of them.

Sentiments are feelings generally expressed as opinions, attitudes, emotions, etc., i.e. subjective impressions rather than facts [33]. Social media is regarded as a platform for sharing these feelings. Since each user has a unique way of expressing an opinion, the opinions expressed in these reviews generally conflict, such as bad and good, or better and worst [18].

Hence, the sentiments expressed in reviews can be classified as positive and negative, or into an n-point scale, e.g., very good, good, satisfactory, bad, very bad. Similar to our study, most of the studies have considered the star ratings as benchmark to evaluate research efforts [12, 17, 26].

Many researchers have contributed to the analysis of sentiments using different approaches. The CLAWS tagger is used to classify features as noun, verb, adjective, adverb, etc. Identifying these polarity-bearing words is important because they are useful in classifying sentiments [1]. Furthermore, sentiment lexicons such as SentiWordNet are used to find the polarity of the tagged words. Subsequently, the polarity features (verbs, adjectives, adverbs, etc.) are used to classify sentiments. Some researchers have evaluated these features individually, while others have formed different combinations of them.

2.1 Role of adjective

Classifying reviews as negative or positive based on a single feature is a difficult task. Nevertheless, Boiy et al. conducted experiments on 550 negative and 222 positive sentences and extracted adjectives as the major part of opinions [5]. Researchers have also noted that the first step is to determine which feature can be used as an opinionated feature. Boiy et al. (2007) identified adjectives as the subjective part of a document and reported a result of 74.8%.

Dray et al. (2009) extracted adjectives from movie reviews and calculated precision and recall. They concluded that adjectives can be used to classify a review as positive or negative [10], and recorded an F-score of 0.71 for the positive class and 0.62 for the negative class. Furthermore, Moghaddam and Popowich conducted experiments on extracting adjectives along with their comparative and superlative forms from reviews [21]. Based on the extracted features, the reviews were classified as positive or negative. Moghaddam and Popowich reported a result of 73% for their experiments.

After the forms of adjectives had been considered, some researchers examined the linguistic forms of adjectives and their specific role in reviews. Kumar and Suresha [16] classified reviews using data taken from [36] and [19] and found adjectives to be the main carriers of opinions. They collected adjectives (JJ), superlative adjectives (JJS) and comparative adjectives (JJR) for different experiments and compared their results with those reported in 2002 and 2005, respectively. Overall, the analysis by Kumar and Suresha was 68.8% better than the others.

In addition, Rill et al. state that adjectives (JJ), along with their two forms, superlative adjectives (JJS) and comparative adjectives (JJR), are helpful in obtaining opinions from reviews [29]. This approach achieved a result of 0.78 using the SentiWordNet library. These researchers also observed that reviews are typically “J-shaped”, i.e. asymmetric, with the title containing more positive words than the rest of the review. Similarly, Das and Balabantaray conducted a study and found that adjectives along with their different forms produce 76.6% better results [7]. Padmaja et al. extended the POS tagging work and extracted features such as nouns, adjectives, verbs and adverbs. They exploited the combination of adjectives and nouns, from which they obtained a result of 72.5% [25].

The papers reviewed in this section are summarized in Table 1. It is evident from this table that most of the researchers used SentiWordNet as a lexical resource.

Table 1 Literature review for adjectives

2.2 Role of verbs

The literature shows that adjectives are regarded as opinion phrases that carry subjective meaning, and that their polarity may be negative or positive. This raises the question of whether adjectives are the only carriers of opinions. To answer this question, further literature was consulted. Chesley et al. state that there are two classes of verbs, (1) subjective verbs and (2) objective verbs, in which verbs are classified as negative or positive [6]. These verb classes are therefore responsible for classifying a review as positive or negative. Their approach achieved a result of 67.8%.

The role of verbs is further discussed by Neviarouskaya et al. in their approach to classifying opinions at the level of individual sentences. They extracted 1947 verbs, which were annotated for the experiment [24]. The researchers state that a sentence must contain a verb because it is the part of speech on which the action depends. Therefore, a sentence without a verb might be impossible to classify as positive or negative. Neviarouskaya et al. reported an F-score of 0.71.

The papers reviewed in this section are summarized in Table 2. It is evident from this table that most of the researchers used SentiWordNet as a lexical resource. Different datasets were used, and only a few research efforts handled negation.

Table 2 Literature review for verbs

2.3 Role of adverbs

The objective of sentiment analysis is to find the opinion words and sentences that best express the voice of the customer, as stated by Zhang et al. [40]. They proposed determining the customer's voice by means of a graph that captures the relation between opinion-bearing words and other aspects. The graph is built on the degree superlative (RBS) and degree comparative (RBR) forms of adverbs. Their approach achieved a result of 72.8%.

Having studied the forms of adverbs in the same way as those of adjectives, Vinodhini & Chandrasekaran concluded in a survey that the negative impact of a sentence in a document can be determined by general (WH-RBQ) and degree (WH-RRQ) interrogative adverbs [37]. Therefore, if interrogative sentences are part of opinions, a negative opinion can be extracted whenever general (WH-RBQ) and degree (WH-RRQ) adverbs occur. Their study achieved an F-measure of 0.74. Further, Wang et al. state that degree adverbs strengthen the sentiments of reviews and blogs and help classify them as positive or negative [38]. They conducted experiments on 1375 reviews from three domains (electronic devices, hotels and e-journals) and extracted degree adverbs, achieving an F-measure of 0.80.

Similarly, Dragut & Fellbaum state that adjectives are the opinion words of a review or document from which sentiment can be evaluated, while adverbs support those opinions and present a clearer picture or voice [9]. They grouped adverbs into classes such as strong positive adverbs, weak adverbs, strong negative adverbs and doubtful adverbs. They also stated that a general adverb like “awfully” scores negative, but when negation occurs its class changes from negative to positive and the sentiment of the review is reversed as well. Their research achieved an F-measure of 0.76.

The papers reviewed in this section are summarized in Table 3.

Table 3 Literature review for adverbs

2.4 Hybrid approaches

As the name suggests, hybrid approaches combine various parts of speech and analyze their joint impact on sentiment analysis. For instance, Benamara et al. and Khan and Baharudin [3, 14] have shown that combining adverbs with adjectives provides better results in locating opinions.

In a similar fashion, verbs and adverbs have been combined to analyze their impact on the sentiment analysis task. For example, Bjorkelun et al. [4] and Mudinas et al. [23] have shown that combining modifier features, such as comparative adverbs and verbs, is helpful in the analysis of opinions.

Various other studies in the literature have combined more than two features for sentiment analysis. Kaushik et al. [32] found that adjectives, adverbs and verbs are more important than other POS because opinions are mostly extracted from them. Similarly, Patel and Soni [34] showed that merging verbs as a feature with adjectives and adverbs, i.e. JJRB (adjective + adverb) and RBVB (adverb + verb), yields better results using SentiWordNet in unsupervised learning.

In this study, we have analyzed various state-of-the-art approaches and the role of each polarity-bearing feature. Here we summarize, for all of these papers, which type of polarity-bearing feature was used and what success rate was achieved. Table 4 shows that whenever adjectives alone were used, researchers achieved an F-measure of 0.79 or less. Papers that did not report an F-measure have been excluded. The table indicates that adjectives and adverbs have individually obtained a maximum F-measure of 0.79. However, when they were combined with other polarity-bearing features such as nouns, adjectives, adverbs and verbs, no significant improvement was reported.

Table 4 A comparative study of all existing approaches

This highlights two important findings from the literature:

  1. Adjectives and adverbs hold more potential than other polarity-bearing features to classify sentiments with high accuracy.

  2. Combining other features with adjectives or adverbs does not enhance the value of the F-measure.

From this discussion, it is evident that adjectives and adverbs are the most important polarity-bearing features. Moreover, different types of adjectives have already been studied [29, 31]. Two types of adverbs, general adverbs [9] and degree adverbs [38], have also been used. These studies concluded that adverbs hold sufficient potential to measure the intensity of sentiments, and identified as future work the analysis of the role of adverbs on a larger dataset and the exploitation of other types of adverbs. Therefore, this research comprehensively investigates the role of all forms of adverbs on a larger dataset of more than 50,000 product reviews.

This research is useful not only for researchers but also for developers. Developers can focus on the best-performing adverb types to build accurate sentiment classification tools. Similarly, researchers can exploit these adverb types in the future when combining them with other polarity features for better accuracy.

3 Proposed methodology

As discussed above, different researchers have used different features to mine sentiments, such as nouns, adjectives, verbs, adverbs and their combinations. Various studies have shown that the role of adverbs is significant for the classification of sentiments. To the best of our knowledge, only a few studies exist, and they have evaluated just one or two types of adverbs. Since different forms of adverbs can contribute to identifying a sentiment, all of them should be taken into account in sentiment classification. In this regard, this paper investigates the various forms of adverbs to measure their impact on sentiment classification.

To analyze the impact of adverbs and their different forms, and to identify which forms or combinations of forms play a significant role in extracting sentiments, we propose a comprehensive methodology.

The architecture of the proposed methodology is shown in Figure 1.

Figure 1 Architecture diagram for the proposed methodology

To explore the impact of adverbs and their forms, a comprehensive dataset is collected from Amazon. The reviews in this dataset are pre-processed and then POS tagged. From the tagged parts of speech, the adverbs and their respective forms are extracted. There are ten distinct forms, classified according to linguistics [35], and their tags are extracted using the CLAWS C7 tagger [28]. After the forms of adverbs have been acquired, different combinations are processed to obtain their scores using the SentiWordNet library [2]. The reviews are then classified into two classes, positive and negative, according to their scores. Along with the review text written in natural language, the star rating assigned by the user or customer is extracted for each review. Finally, the resulting classification is compared with this benchmark for evaluation.

3.1 Dataset collection

The dataset used in this research study is crawled from Amazon using a .NET crawler. The crawler fetches the reviews of two products that are distinct in nature. The extracted data consist of product reviews and star ratings; later on (after POS tagging), the forms of adverbs were added.

The developed crawler is based on XPath expressions. XPath is recommended by the World Wide Web Consortium (W3C) for locating elements and attributes in an XML document. It is based on a tree representation of the XML document and provides the ability to navigate around the tree, selecting nodes by a variety of criteria. In our case, as the product reviews are in HTML pages, we had to load each page as XML before extracting the required product information (Table 5).
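As a minimal sketch of XPath-based extraction (not the .NET crawler used in this study), the following Python snippet relies on the standard-library `xml.etree.ElementTree` module, which supports a limited subset of XPath. The page fragment, the class names (`review`, `rating`, `text`) and the helper `extract_reviews` are hypothetical and do not reflect Amazon's actual page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; real HTML pages usually need an
# HTML-tolerant parser before they can be treated as XML.
PAGE = """
<html><body>
  <div class="review"><span class="rating">5</span><p class="text">Works very well.</p></div>
  <div class="review"><span class="rating">2</span><p class="text">Crashes too often.</p></div>
</body></html>
"""

def extract_reviews(xml_text):
    """Return (star_rating, review_text) pairs selected with XPath expressions."""
    root = ET.fromstring(xml_text)
    reviews = []
    # Select every review container anywhere in the tree.
    for node in root.findall(".//div[@class='review']"):
        rating = int(node.find("./span[@class='rating']").text)
        text = node.find("./p[@class='text']").text.strip()
        reviews.append((rating, text))
    return reviews

reviews = extract_reviews(PAGE)
```

Each review is returned together with its star rating, so the text and the benchmark label stay paired for the later evaluation.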

Table 5 Dataset details of Microsoft products

The extracted dataset contains reviews of two products. The first is office products, which include Microsoft Word, Microsoft PowerPoint, Microsoft Excel and Microsoft Access Database. The other is musical DVDs, which contain two main albums: pop tracks and slow tracks. The details of the musical DVD dataset are shown in Table 6.

Table 6 Dataset details of Musical DVDs

These two products were selected because (1) their total number of reviews is larger than that of other products, (2) the reviews come from diverse locations, and (3) since the reviews come from different locations, they also differ in language use.

3.2 Data pre-processing

In the pre-processing step, we first verified the sentence boundaries and then tokenized the text. Stop words, extra white spaces, HTML tags, new lines, redundant characters, emoticons and special symbols were removed.
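The pre-processing steps can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the stop-word list is a tiny hypothetical one (deliberately free of adverbs so that they survive for later tagging), the sentence splitter is naive, and the helper name `preprocess` is invented for this example.

```python
import re

# Tiny illustrative stop-word list (an assumption); it contains no adverbs.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "it"}

def preprocess(review):
    """Split a raw review into sentences of lower-cased tokens without stop words."""
    text = re.sub(r"<[^>]+>", " ", review)          # drop HTML tags
    text = re.sub(r"\s+", " ", text).strip()        # collapse new lines and extra spaces
    sentences = re.split(r"(?<=[.!?])\s+", text)    # naive sentence boundary detection
    result = []
    for sentence in sentences:
        # Keeping only letters, digits and apostrophes drops symbols and emoticons.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            result.append(tokens)
    return result

cleaned = preprocess("<p>The new style is very good.</p>\nIt works   well :)")
```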

3.3 Part-of-speech tagging (POS tagging)

The reviews comprise different parts of speech such as nouns, adjectives, verbs and adverbs. These are tagged using the CLAWS C7 tagset. Since our study focuses on adverbs, all forms of adverbs are extracted from the reviews using the Constituent Likelihood Automatic Word-tagging System (CLAWS) C7 tagger [28]. CLAWS has been continuously developed since the early 1980s and has consistently achieved 96% to 97% accuracy. Several tagsets have been used in CLAWS; 132 basic tags were used in its initial version. The current standard tagset is C7, which consists of 160 tags. In our research, we have used the free CLAWS WWW tagger service to tag our dataset. CLAWS C7 tagged the following forms of adverbs:

  • General adverbs (RR): This type of adverb modifies a verb, e.g. “carelessly” and “easily”.

  • General WH adverbs (WH-RR): This type of adverb modifies a verb in an interrogative construction, e.g. “when, where and what”.

  • General comparative adverbs (RRQ): This type of adverb is used to make a comparison, e.g. “better, longer and easier”.

  • General superlative adverbs (RRT): This is the superlative form of a general adverb, e.g. “best, longest and easiest”.

  • Adverbs of time (RT): This type of adverb tells when an action happened, for how long and how often, e.g. “now, yesterday, tomorrow, all day, rarely, seldom”.

  • Degree adverbs (RG): This type of adverb expresses the degree of an action, an adjective or another adverb, e.g. “so, very and much”.

  • Degree WH adverbs (RG-WH): This type of adverb covers a set of wh-words, e.g. “how, however, whatever, when, why”.

  • Degree comparative adverbs (RGT): This type of adverb modifies a verb or another adverb to express comparison, e.g. “more, less and few”.

  • Degree superlative adverbs (RGQ): This type of adverb modifies a verb or another adverb in the superlative form, e.g. “most, least and worst”.

  • Locative adverbs (RL): This type of adverb describes the location or direction of an action, e.g. “alongside, forward and middle”.
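Extracting the adverb forms listed above from tagger output can be sketched as follows. The sketch assumes that the tagged text is available in a `word_TAG` format and that the labels match the list above (the labels follow this paper's naming); the tagged sentence and the helper `extract_adverbs` are hypothetical.

```python
from collections import defaultdict

# Adverb labels as named in the list above.
ADVERB_TAGS = {"RR", "WH-RR", "RRQ", "RRT", "RT", "RG", "RG-WH", "RGT", "RGQ", "RL"}

def extract_adverbs(tagged_sentence):
    """Group the adverbs of a 'word_TAG word_TAG ...' string by their tag."""
    found = defaultdict(list)
    for token in tagged_sentence.split():
        # Split on the last underscore so words containing '_' stay intact.
        word, _, tag = token.rpartition("_")
        if tag in ADVERB_TAGS:
            found[tag].append(word.lower())
    return dict(found)

# Hypothetical tagger output for a short review sentence.
adverbs = extract_adverbs("The_AT office_NN1 is_VBZ very_RG good_JJ especially_RR now_RT")
```

Grouping by tag keeps each adverb form separate, which is what the later single-feature and combined-feature experiments operate on.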

All forms of adverbs identified in both datasets (office products and musical DVDs) after POS tagging are listed in Table 7.

Table 7 List of all forms of adverb identified in both the datasets

Let us consider an example that shows how adverbs can be part of a review and how they combine within the sentences of a review or document to tell the user's story.

In Figure 2, the adverbs that appear in a review are highlighted. The problem is to understand how these adverbs convey the user's story and how the review will be classified as positive or negative. Among the forms of adverbs in Figure 2, “annually, only, as-well, already, professionally” are general adverbs (RR), “around” is a locative adverb (RL), “most” is a degree superlative adverb (RGQ), and “soon, as” are degree adverbs (RG). This raises the following questions:

  • What is the contribution of these adverbs or forms of adverbs to the sentiment classification of a review?

  • How do these adverbs and their different forms affect sentiment classification?

Figure 2 A review with adverb forms

For this reason, the proposed methodology comprises several experiments to understand the contribution and impact of these adverbs.

3.4 Scoring features

SentiWordNet 3.0 is a lexical resource explicitly devised to support sentiment classification and opinion mining applications. It is an improved version of SentiWordNet 1.0 and is publicly available for research purposes. It is currently licensed to more than 300 research groups and used in a variety of research projects worldwide. SentiWordNet assigns sentiment scores (positivity, negativity and objectivity) to each synset of WordNet, so it serves as a knowledge base for assigning scores. The total number of positive words present in WordNet is 3,076,708 and that of negative words is 151,044. Every feature present in a document, review or text is assigned a positive and a negative score. In this study, all adverbs are scored with SentiWordNet to calculate the total score of a review. The positive score of a word is calculated as the average of the positive scores of all the synonyms of that word present in the dictionary. Similarly, the negative score is calculated as the average of the negative scores of all its synonyms. Two levels of scoring are used in this research: (1) sentence-level and (2) review-level scoring.
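The averaging of scores per word can be sketched as follows. The lexicon excerpt and its values are hypothetical and are not taken from SentiWordNet; the sketch also assumes that the "synonyms" of a word correspond to the lexicon entries (synsets) in which the word occurs, and that unknown words contribute no polarity.

```python
# Hypothetical excerpt: each word maps to the (positive, negative) scores
# of the lexicon entries in which it occurs.
LEXICON = {
    "very": [(0.25, 0.0), (0.5, 0.0)],
    "badly": [(0.0, 0.75), (0.125, 0.5), (0.0, 0.25)],
}

def word_scores(word, lexicon=LEXICON):
    """Average the positive and the negative scores over all entries of a word."""
    entries = lexicon.get(word.lower())
    if not entries:
        return 0.0, 0.0  # assumption: unknown words are treated as neutral
    positive = sum(p for p, _ in entries) / len(entries)
    negative = sum(n for _, n in entries) / len(entries)
    return positive, negative

very_pos, very_neg = word_scores("very")
```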

3.4.1 Sentence scoring

The sentence score is calculated as the average of the scores of the words present in the sentence.

$$ senScore(S)= \frac{1}{n}\sum\limits_{i=1}^{n}(P_{i}) $$
(1)

Where:

$$ \begin{array}{@{}rcl@{}} senScore(S) &=& \text{ score of a sentence in a document or review}.\\ n &=& \text{ total number of words present in the sentence}. \\ P_{i} &=& \text{ polarity score of the } i\text{-th word in the sentence}. \end{array} $$

Let us consider an example for calculating the sentence level scores.

Sentence 1: “The Microsoft version 2013 office is very good and many things are enhanced especially the new style.”

Explanation: The word “very” is a degree adverb and the word “especially” is a general adverb. These two distinct adverbs obtain their scores from the SentiWordNet library, and the average is calculated for this sentence.

The sentence score is positive because both adverbs have a positive polarity score in the polarity lexicon SentiWordNet. Let us consider another example in which negation occurs.

Sentence 2: “The Access is not that good as compared to SQL but others like Excel, Word is much better than before”.

Explanation: This sentence contains “not”, a general adverb; “as” and “much”, degree adverbs; and “better”, a general comparative adverb. These four adverbs obtain their scores. To find the polarity of a sentence in which negation occurs, the negativity is first calculated as follows:

$$ NegScore = 1 - (positiveScores + negativeScores ) $$
(2)

Afterwards, the total is computed to determine the sentiment of the sentence. In this way, all sentences are scored and finally averaged to score the review of a product.
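Equation (2) can be evaluated as in the following sketch. The input values are hypothetical, and since the text does not spell out how the resulting value is combined with the other word scores, the sketch stops at the formula itself.

```python
def neg_score(positive_score, negative_score):
    """Equation (2): NegScore = 1 - (positiveScores + negativeScores)."""
    return 1.0 - (positive_score + negative_score)

# Hypothetical scores of a negated word.
example = neg_score(0.625, 0.125)
```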

3.4.2 Review scoring

The review score is calculated as the average of the scores of the sentences present in the review.

$$ revScore(R)= \frac{1}{n}\sum\limits_{i=1}^{n}(S_{i}) $$
(3)

Where:

$$ \begin{array}{@{}rcl@{}} revScore(R) &=& \text{ score of a document or review}. \\ n &=& \text{ total number of sentences in the review}. \\ S_{i} &=& \text{ score of the } i\text{-th sentence in the review}. \end{array} $$
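The two scoring levels of (1) and (3) can be sketched together as follows. The polarity values are hypothetical, each word is assumed to contribute a single polarity number, and treating an empty sentence or review as neutral is an assumption of this sketch.

```python
def mean(values):
    """Arithmetic mean; an empty list is treated as neutral (assumption)."""
    return sum(values) / len(values) if values else 0.0

def rev_score(review):
    """Average word scores per sentence (1), then average the sentence scores (3)."""
    sentence_scores = [mean(word_polarities) for word_polarities in review]
    return mean(sentence_scores)

# Hypothetical review: two sentences with the polarity scores of their adverbs.
score = rev_score([[0.5, 0.25], [0.75]])
```

Here the sentence scores are 0.375 and 0.75, which gives a review score of 0.5625.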

Let us consider the example shown in Figure 3. In this example, the user has rated the review 5 stars, which is positive. To classify the review using adverbs and their different forms, the review is tagged and the different forms of adverbs are extracted. These forms are then combined to obtain their scores from SentiWordNet. The scores are assigned first at sentence level and then at review level. The final score of the review is thus obtained, and the review is classified as either positive or negative.

Figure 3 An example review that a user has rated 5 stars, with various adverbs highlighted

Review Class: The result for this review is positive, as its score falls in the 5-star range, i.e. from 0.51 to 1.0.

3.5 Star rating

Every review has a star rating assigned by the user on the basis of her experience with a particular product; Amazon likewise records a star rating whenever a customer shares an opinion. To evaluate the 5-star rating of a review, the first step is to determine the ranges from the highest to the lowest rating. Star ratings are mapped to the range 0 to 1, following researchers such as Pappas and Popescu-Belis [27], Lak and Turetken [17], Jang Jong [12] and Lee and Pang [26], where 1.0 indicates highly positive and 0.1 highly negative. Table 8 shows the star ratings along with their polarity values and classification, as taken from the literature [15, 20, 40].

Table 8 Star ratings and polarity values

The distribution of star ratings differed between the two datasets. For Microsoft products, around 63% of the reviews were rated 4 or 5 stars, around 9% were rated 3 stars, and around 28% were rated 1 or 2 stars. For the musical DVDs, around 75% of the reviews were rated 4 or 5 stars, around 8% were rated 3 stars, and around 18% were rated 1 or 2 stars. In total, the two datasets contain around 33,220 positive, 4,475 neutral and 13,310 negative reviews.
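The grouping of star ratings into benchmark classes described above (4 or 5 stars as positive, 3 stars as neutral, 1 or 2 stars as negative) can be sketched as follows. The ratings in the example are hypothetical and the helper name `star_to_class` is invented for illustration.

```python
from collections import Counter

def star_to_class(stars):
    """Map a 1-5 star rating to the benchmark class used for evaluation."""
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

# Hypothetical ratings; the real benchmark uses the crawled star ratings.
counts = Counter(star_to_class(s) for s in [5, 4, 3, 1, 2, 5])
```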

3.6 Impact of adverbs and its evaluation

The impact of the following ten forms of adverbs is studied in this research to understand their behavior.

  • general adverb

  • general superlative adverb

  • general comparative adverb

  • general-Wh adverbs

  • degree adverb

  • degree superlative adverb

  • degree comparative adverb

  • degree Wh-adverb

  • time adverb

  • locative adverb

To evaluate the proposed methodology for classifying reviews as positive or negative, the standard precision and recall formulas are used.

$$ Precision= \frac{True Positive}{True Positive + False Positive } $$
(4)

Precision, as shown in (4), is the fraction of reviews assigned to a class that actually belong to it. Recall is defined as

$$ Recall = \frac{True Positive}{True Positive + False Negative } $$
(5)

Recall, as shown in (5), is the fraction of reviews of a class that are correctly assigned to it.

Furthermore, the F-measure is calculated for the respective classes to summarize the results, as shown in (6).

$$ f\text{-}measure = \frac{2 \cdot (precision \cdot recall )}{precision + recall } $$
(6)
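The evaluation measures can be computed per class as in the following sketch, which uses the standard definitions: precision divides the true positives by all reviews predicted as the class, and recall divides them by all reviews that actually belong to the class (true positives plus false negatives). The label lists are hypothetical, with `actual` standing in for the star-rating benchmark.

```python
def precision_recall_f(predicted, actual, target="positive"):
    """Return (precision, recall, F-measure) for one target class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == target and a == target for p, a in pairs)  # correctly assigned
    fp = sum(p == target and a != target for p, a in pairs)  # wrongly assigned
    fn = sum(p != target and a == target for p, a in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical predictions against the benchmark labels.
predicted = ["positive", "positive", "negative", "positive"]
actual = ["positive", "negative", "positive", "positive"]
p, r, f = precision_recall_f(predicted, actual)
```

Calling the function with `target="negative"` gives the corresponding values for the negative class.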

The final step was to measure the scores using feature combinations, from single features up to all ten distinct features.
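Enumerating feature combinations from single features up to all ten can be sketched as follows. Whether every possible subset was evaluated in this study is not stated, so exhaustive enumeration is an assumption of this sketch; the labels follow this paper's naming of the adverb forms.

```python
from itertools import combinations

ADVERB_FORMS = ["RR", "WH-RR", "RRQ", "RRT", "RT", "RG", "RG-WH", "RGT", "RGQ", "RL"]

def feature_combinations(forms):
    """Yield every non-empty subset of adverb forms, smallest subsets first."""
    for size in range(1, len(forms) + 1):
        yield from combinations(forms, size)

all_combos = list(feature_combinations(ADVERB_FORMS))
```

With ten forms this yields 1,023 non-empty combinations, of which 10 are single features and 45 are pairs.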

4 Result analysis

4.1 Single feature analysis

This section discusses the results for the various forms of adverbs, which are shown in Figure 4.

Figure 4 Single feature with positive scores

One Feature: Positive Analysis

Figure 4 shows how each feature (General adverbs (RR), General WH adverbs (WH-RR), General Comparative adverbs (RRQ), General Superlative adverbs (RRT), Adverb of time (RT), Degree adverbs (RG), Degree WH adverbs (RG-WH), Degree Comparative adverbs (RGT), Degree Superlative adverbs (RGQ), Locative adverbs (RL)) performed in classifying reviews into the positive class. The precision (blue bar), recall (brown bar) and F-measure (green bar) values computed for each form of adverb are shown in Figure 4. For example, the General adverbs (RR) feature classified reviews into the positive class with a precision, recall and F-measure of 0.89, 0.84 and 0.86, respectively.

Looking at the F-measure closely, RRT (general superlative adverbs) performed best, with an F-measure of 0.86. The following forms achieved an F-measure of at least 0.75: (1) RR (general adverbs) and (2) RGQ (degree superlative adverbs). However, RR-Wh (general-wh adverbs) and RG-Wh (degree-wh adverbs) obtained the lowest F-measure of 0.59.

Findings: When a single feature is used, RRT (general superlative adverbs) obtains the best result, with an F-measure of 0.86.

One Feature: Negative Analysis

Figure 5 shows how each of the evaluated features (General adverbs (RR), General WH adverbs (WH-RR), General Comparative adverbs (RRQ), General Superlative adverbs (RRT), Adverb of time (RT), Degree adverbs (RG), Degree WH adverbs (RG-WH), Degree Comparative adverbs (RGT), Degree Superlative adverbs (RGQ), Locative adverbs (RL)) performed in classifying reviews into the negative class. As in the previous results, precision, recall and F-measure are shown as blue, brown and green bars, respectively. These values are computed independently for each distinct form of adverb.

Figure 5 Single feature with negative scores

If we observe the F-measure closely, RG-Wh (degree-wh adverbs) performed the best by securing the F-measure of 0.80. Similarly, the following forms were able to achieve the F-measure of more than or equal to 0.75: (1) RR-Wh (general-wh adverbs), (2) RRT (general superlative adverbs), (3) RT (time adverbs) and (4) RL (locative adverbs). However, the RRQ (general comparative adverbs), and RGT (degree superlative adverbs) obtained the lowest F-measure of 0.69.

Findings: When a single feature is used, RG-Wh (degree-wh adverbs) obtains the best result, with an F-measure of 0.80.

Explanation of Results:

When each of the ten forms of adverbs was evaluated individually for sentiment classification, the general superlative (RRT) adverb class outperformed the other classes for the positive class, while the degree-wh (RG-Wh) adverb class performed best for the negative class. This section explains why these two classes performed better than the others. An in-depth analysis indicates that most of the words associated with these adverb types (RRT and RG-Wh) are strongly polarity-bearing words, which users can employ on their own to state polarity.

A few such words are presented in Table 9, which lists some words from the RRT, RR-Wh, RR, and RG classes. The words associated with RRT and RG-Wh (easiest, hardest, whether, whereas) have a clear meaning from which the sentiment class can be predicted, whereas the words associated with RG and RR are ambiguous. This is why RRT (general superlative adverbs) and RG-Wh (degree-wh adverbs) performed better than the other eight types. Note, however, that negation can be used with RRT and RG-Wh words to change their meaning, for example, “not easiest”. We have therefore also handled negation in this study; in the absence of negation, words such as “easiest” and “hardest” indicate a clear positive or negative sentiment class.
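The effect of negation on a polarity-bearing adverb can be illustrated with a small sketch. The paper does not describe its negation-handling procedure in this section, so everything below is an assumption made for illustration: the lexicon `ADVERB_POLARITY`, the cue list `NEGATION_CUES`, the two-token look-back window, and the rule of simply flipping the sign.

```python
from typing import Dict, List

# Hypothetical polarity lexicon and negation cues, for illustration only.
ADVERB_POLARITY: Dict[str, int] = {"easiest": 1, "hardest": -1}
NEGATION_CUES = {"not", "no", "never"}


def adverb_polarity(tokens: List[str], window: int = 2) -> int:
    """Sum adverb polarities, flipping the sign of any adverb that has a
    negation cue within `window` tokens before it."""
    score = 0
    lowered = [t.lower() for t in tokens]
    for i, token in enumerate(lowered):
        polarity = ADVERB_POLARITY.get(token)
        if polarity is None:
            continue  # not a polarity-bearing adverb in the toy lexicon
        preceding = lowered[max(0, i - window):i]
        if any(cue in NEGATION_CUES for cue in preceding):
            polarity = -polarity  # assumed rule: negation inverts polarity
        score += polarity
    return score


print(adverb_polarity("this is the easiest setup".split()))      # 1
print(adverb_polarity("this is not the easiest setup".split()))  # -1
```

A fixed look-back window is the simplest possible choice; a dependency parse or a scope-detection model would be needed to handle negation that is further from the adverb.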

Table 9 Examples of various forms of adverbs and their impact

In Figure 6, the “Positive” score results of the proposed work are compared with those of Haider et al. [11] and Zafar et al. [39]. For this comparison, the five best-performing adverbs were selected from the studies of Haider et al. and Zafar et al.

Figure 6 Precision of positive score results compared with the methodology of [11]

It is evident from the results that the proposed work outperformed the previous studies in this domain. There are several reasons for the better performance: (1) the proposed model was trained on a large corpus of 9555 + 41,450 reviews, as compared to 5513 tweets; (2) an individual review is far longer than a tweet, which is limited to 140 characters, and a reasonably sized review has more descriptive power; (3) the gold standard dataset of Haider et al. is fairly small compared to the proposed dataset; and (4) the adverb types “RT” and “RL” were not identified by Haider et al., whereas in the proposed model sufficient evidence was found for these two types, giving an edge to the proposed model. In Figure 7, the “Negative” score results are compared with those of Haider et al. [11] and Zafar et al. [39]. The results show that adverbs play a significant role in determining polarity scores for sentiment analysis. Together, Figures 6 and 7 show that using adverbs to classify the sentiment of a larger text is viable and can be adopted by future research studies.

Figure 7 Precision of negative score results compared with the methodology of [11]

5 Conclusions

In this paper, we have performed sentiment classification of product reviews extracted from Amazon. We critically analyzed the state of the art in the domain and observed that contemporary studies have not exploited the impact of adverbs in all their different forms for sentiment classification of product reviews. To address this, we exploited 10 different types of adverbs: general adverbs (RR), general superlative adverbs (RRT), general comparative adverbs (RRQ), general-wh adverbs (RR-Wh), degree adverbs (RG), degree superlative adverbs (RGQ), degree comparative adverbs (RGT), degree-wh adverbs (RG-Wh), time adverbs (RT), and locative adverbs (RL). To conduct this study, a diversified dataset comprising 51,005 reviews from two product categories, office products and musical DVDs, was extracted from Amazon. To evaluate the results, we compared the classification results (polarity scores) with a benchmark dataset containing the star ratings of the same reviews. The outcomes of the study revealed that general superlative adverbs secured the highest F-measure, 0.86, for the positive class. In future work, the best polarity-bearing adverb features can be combined with verbs, adjectives, and their forms for sentiment classification. The approach can also be applied in domains other than product reviews, e.g., short text messages (tweets), scientific documents, blogs, and news articles.