1 Introduction and related work

Sentiment analysis uses machine learning and natural language processing tools to determine whether a given document is positive or negative in polarity. In this work we use sentiment analysis to determine whether a movie review is positively or negatively oriented. Sentiment analysis operates at two levels: the document level and the aspect level (Singh et al. 2013; Parkhe and Biswas 2014). The first level uses lexicon-based methods or machine learning approaches for document classification. Pang et al. (2002) suggested a method of sentiment classification using Naive Bayes, SVM and Maximum Entropy classifiers. They experimented with different features, such as unigrams, unigrams plus bigrams, adjectives and top unigrams, and compared the results. Kang et al. (2012) proposed a method for mitigating the error caused when the accuracies of the positive and negative classes are expressed as average values; for this they proposed an improved Naive Bayes algorithm that reduced the accuracy gap. The accuracy obtained by machine learning algorithms alone is sometimes low; to address this problem, Basari et al. (2012) coupled Support Vector Machines with Particle Swarm Optimization, raising the accuracy from 71.87 to 77 %.

The second level deals with each individual aspect of the movie. A movie has many different aspects, such as direction, screenplay, acting and story, and a reviewer tends to express an opinion on each of them. Better analysis of a review is possible if the individual aspect polarities are taken into consideration: reviewers often hold different opinions about different movie aspects, so detailed analysis of a review calls for aspect-based methods. Many researchers have worked on aspect-based sentiment analysis. Thet et al. (2010) proposed a method for fine-grained analysis of the sentiment orientation and sentiment strength of the reviewer towards the various aspects of the movie. It uses domain-specific and generic opinion lexicons to score words and, with the help of a dependency tree, identifies inter-word dependencies so that word scores propagate over the entire document. Singh et al. (2013) gave a new feature-based heuristic for aspect-level sentiment analysis. In their scheme they analyse the review text and assign a sentiment label to each aspect of the review. Each aspect text is scored using SentiWordNet (2015) with feature selection comprising adjectives, adverbs, verbs and n-gram features, and the overall document is then scored as the aggregate of the aspect scores. Yu et al. (2011) proposed a method for identifying important aspects from online consumer reviews, based on the observations that such aspects are commented on most frequently and that consumer opinion on them greatly influences the overall product opinion. Their algorithm models the aspect value distribution as a multivariate Gaussian distribution. In this paper, we aim to find the movie aspects that most strongly dictate the polarity of a review. For this we assign different weights, called driving factors, to the individual movie aspects; the overall score is the sum of the individual aspect scores weighted by their driving factors. The approach of Yu et al. (2011) differs from ours in how the aspect values are assigned: they use a multivariate Gaussian distribution, whereas we use a randomized approach to assign values to the driving factors and choose the driving factors that give the maximum accuracy as the best ones. The rest of the paper is organized as follows: Sect. 2 describes the proposed method; Sect. 3 gives the dataset, experimental results and performance; Sect. 4 gives the conclusion and future work; and the last section gives the Compliance with Ethical Standards and references.

Fig. 1 Diagram for the proposed method

Table 1 Lexicon used for the aspect-based text separator

2 Proposed method

This section describes our technique for aspect-based sentiment analysis of movie reviews (Parkhe and Biswas 2014); Fig. 1 shows the method flow. The first step is pre-processing: reviews are collected from different sources and pre-processed to make them suitable for use in the method. Pre-processing includes formatting the reviews so that they are aligned in the required format; for this, HTML and other tags were removed and the reviews were converted to plain text. The next step is to separate the review text by aspect, which is done by the Aspect Based Text Separator (ABTS). The movie aspects we used are screenplay, music, acting, plot, movie and direction. An aspect-specific lexicon was used to separate the review aspect-wise; Table 1 shows some of the words used to separate the sentences (Thet et al. 2010). Each word in the lexicon is associated with its part of speech. While searching a sentence for a lexicon word, we first tag the sentence with the Stanford Part-of-Speech Tagger (2015) and then match the lexicon word within the sentence only when the parts of speech agree, as sketched below.
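A minimal sketch of the ABTS step, assuming NLTK's tagger as a stand-in for the Stanford Part-of-Speech Tagger and illustrative lexicon entries rather than the actual lexicon of Table 1:

```python
# Sketch of the Aspect Based Text Separator (ABTS).
# Requires the NLTK data packages 'punkt' and 'averaged_perceptron_tagger'.
from collections import defaultdict
import nltk

# Hypothetical (word, POS-tag) lexicon entries; see Table 1 for the real ones.
ASPECT_LEXICON = {
    "acting":    {("actor", "NN"), ("performance", "NN"), ("cast", "NN")},
    "plot":      {("plot", "NN"), ("story", "NN")},
    "direction": {("director", "NN"), ("direction", "NN")},
    "music":     {("music", "NN"), ("soundtrack", "NN")},
}

def separate_by_aspect(review_text):
    """Assign each sentence to the aspects whose lexicon words occur in it
    with a matching part-of-speech tag."""
    aspect_text = defaultdict(list)
    for sentence in nltk.sent_tokenize(review_text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tagged = {(word.lower(), tag) for word, tag in tagged}
        for aspect, lexicon in ASPECT_LEXICON.items():
            if tagged & lexicon:  # word and tag must both match
                aspect_text[aspect].append(sentence)
    return {aspect: " ".join(s) for aspect, s in aspect_text.items()}
```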

In the next step, the separated aspect texts are forwarded to aspect-specific classifiers. A Naive Bayes classifier (Pang et al. 2002) was used for this purpose: it calculates the probability that a word, or indeed an entire sentence, belongs to the positive or the negative class of reviews. The outputs were obtained using the traditional training and testing method and were either \(-1\) or 1, denoting that the input text was negatively or positively oriented, respectively. Instead of Naive Bayes, any classifier able to clearly separate the text into two classes, such as an SVM, could be used, provided the input data are processed to meet that classifier's data format requirements. Each aspect-based classifier output is then multiplied by the driving factor of the corresponding aspect, which encodes that aspect's weight within the movie.
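A minimal sketch of one aspect-specific classifier, assuming scikit-learn's multinomial Naive Bayes over bag-of-words counts; the function names are illustrative rather than the authors' implementation:

```python
# Sketch of an aspect-specific Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def train_aspect_classifier(aspect_texts, labels):
    """aspect_texts: ABTS output for one aspect across the training reviews;
    labels: +1 (positive) or -1 (negative) review orientation."""
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(aspect_texts, labels)
    return clf

def classify_aspect(clf, aspect_text):
    """Return +1 or -1 for non-empty aspect text, and 0 if the aspect is
    absent from the review (cf. Sect. 3)."""
    if not aspect_text:
        return 0
    return int(clf.predict([aspect_text])[0])
```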

Table 2 Performance measures
Table 3 Performance measures (DF \(=\) driving factor)

The higher the value of the driving factor of an aspect, the greater its importance in the review. The driving factors satisfy the relationship

$$\begin{aligned} \sum \alpha _i = 1, \end{aligned}$$

where \(\alpha _i\) is the \(i\)th driving factor. The net output is the sum of all the classifier outputs, each multiplied by its respective driving factor:

$$\begin{aligned} \omega (d) = \sum _i \alpha _i X_i,\quad X_i \in \{-1,1\}, \end{aligned}$$

where \(\alpha _i\) is the driving factor of the \(i\)th aspect, \(X_i\) is the output of the \(i\)th classifier and \(d\) is the document under consideration. Now if

$$\begin{aligned} \omega (d)&\le 0 \rightarrow \text {negative classification of review } d\\ \omega (d)&> 0 \rightarrow \text {positive classification of review } d \end{aligned}$$

Thus a threshold score of zero is used for the classification of the document, as sketched below.
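A minimal sketch of this combination step; the aspect names follow the paper, while the function and variable names are illustrative:

```python
# Sketch of the weighted combination of aspect classifier outputs.
ASPECTS = ["screenplay", "music", "acting", "plot", "movie", "direction"]

def score_document(classifier_outputs, driving_factors):
    """classifier_outputs: dict aspect -> -1/0/+1;
    driving_factors: dict aspect -> alpha_i, with sum(alpha_i) == 1.
    Returns the classification of the document."""
    omega = sum(driving_factors[a] * classifier_outputs.get(a, 0)
                for a in ASPECTS)
    return "positive" if omega > 0 else "negative"
```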

3 Dataset, experimental results and performance

The dataset was acquired from the Large Movie Review Dataset site of the Stanford AI Lab (Maas et al. 2011; Large Movie Review Dataset 2015; Parkhe and Biswas 2014). It consists of 25,000 positive and 25,000 negative reviews collected from IMDB. Although there is no specific time span for the review collection, it was ensured that no more than 30 reviews from any single movie were included in the final dataset. Because the numbers of positive and negative reviews are equal, the chance-level accuracy of the experiment is 50 %. The dataset contains only highly positive and highly negative reviews: its authors included a review as negative only if it scored at most 4 out of 10, and as positive only if it scored at least 7 out of 10, on a benchmark set by them (Maas et al. 2011); neutral reviews were omitted. ABTS separated each review into aspects with an unequal text distribution, because each reviewer commented on each aspect in an unequal number of sentences; in some reviews not all aspects were commented on at all, and the score of such absent aspects was set to 0. As mentioned in the previous section, a Naive Bayes classifier was used for classifying the separated aspect-based text. The individual classifiers received the aspect-based text in a 70:30 ratio for training and testing, respectively. The experiment ran for 1000 iterations, and during each iteration random values between 0 and 1 were assigned to the driving factors. For the dataset under consideration, the driving factors giving the highest accuracy were chosen as the best driving factors (Table 2). The results of the experiment are depicted in Table 3, and the search procedure is sketched below.
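A minimal sketch of the randomized search, assuming the per-aspect classifier outputs (values \(-1\), 0 or \(+1\)) and the true labels have been precomputed for the test reviews; normalizing the random values so that they sum to 1 is our assumption:

```python
# Sketch of the 1000-iteration randomized search for the best driving factors.
import random

def search_driving_factors(X, y, aspects, iterations=1000, seed=0):
    """X: list of dicts, one per test review, mapping aspect -> -1/0/+1;
    y: true labels in {-1, +1}. Returns the best factors and their accuracy."""
    rng = random.Random(seed)
    best_acc, best_factors = 0.0, None
    for _ in range(iterations):
        raw = [rng.random() for _ in aspects]            # values in (0, 1)
        total = sum(raw)
        factors = {a: r / total for a, r in zip(aspects, raw)}  # sum to 1
        preds = [1 if sum(factors[a] * x.get(a, 0) for a in aspects) > 0
                 else -1 for x in X]
        acc = sum(p == t for p, t in zip(preds, y)) / len(y)
        if acc > best_acc:
            best_acc, best_factors = acc, factors
    return best_factors, best_acc
```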

Table 4 Experimental results for action genre
Table 5 Experimental results for adventure genre
Table 6 Experimental results for animation genre

The results in Table 3 depict the relationship between accuracy and the driving factors used. The highest accuracy obtained was 0.79372, i.e. 79.372 %, with the corresponding driving factors Screenplay 0.07877, Music 0.11756, Acting 0.28147, Plot 0.16390, Movie 0.31225 and Direction 0.108133.

Thus by using the mentioned driving factors we obtain an accuracy of 79.372 %, the highest obtained using this method. It is also worth noting that giving equal importance to all factors, i.e. a value of 0.165 each, resulted in a lower accuracy of 78.268 %. The effect of changing the driving factors can thus be seen in the accuracy of the overall classification. In the case of 79.372 % accuracy, the most importance was given to the movie, acting and plot aspects. We can therefore interpret from the results that the reviewers in this dataset gave more importance to these factors while writing their reviews. It also means that if a reviewer gives a positive opinion towards these aspects, then, due to their high importance, the overall review will tend to be positive even if he/she gives a negative opinion towards the other aspects.

Giving more importance to certain factors has an added advantage: it tends to suppress the user opinion about the other factors. Suppose we have a review X containing user opinions about two factors F1 and F2, and the overall orientation of the review is positive. The user has given a positive opinion about F1 and a negative one about F2, and the amount of text for aspect F1 is smaller than for aspect F2. With any non-aspect-based sentiment analysis method, since the text for F2 is larger and negative in orientation, the overall document score will tend to be skewed towards negativity. On the other hand, if driving factors are used and F1 is given more importance, the document score will better reflect the positivity of the review.

Since each aspect of a movie is analysed separately in this method, we can track the effect each aspect has on the overall score of the document. This individual aspect-based tracking could be used in a fine-grained aspect-based recommendation system, which recommends movies based on their various aspects instead of the overall rating of the movie. The method can also be applied to a product review dataset, revealing the opinion each user holds on the various aspects of the product and thus helping in the development of a proper product placement strategy. Such in-depth knowledge is very difficult to acquire from a dataset using non-aspect-based methods.

Table 7 Experimental results for comedy genre
Table 8 Experimental results for crime genre
Table 9 Experimental results for documentary genre
Table 10 Experimental results for drama genre

We also wanted to see how the above method performs on reviews of specific movie genres. We therefore applied it to movie reviews of the action, adventure, animation, comedy, crime, documentary, drama and horror genres; the results are shown in Tables 4, 5, 6, 7, 8, 9 and 10. For experimental simplification, the sum of the driving factors is taken to be 2 instead of 1 as defined previously. As the tables show, we obtained an accuracy of 63.8 % for the action genre, 63.33 % for adventure, 81.48 % for animation, 77 % for comedy, 87.3 % for crime, 84.82 % for documentary, 76.64 % for drama and 83.33 % for horror.

The various performance measures used were (Singh et al. 2013)

$$\begin{aligned}&\text {Accuracy} = \frac{\text {Total correctly classified documents}}{\text {Total number of documents}} \\&\text {Precision} = \frac{tp}{tp + fp} \\&\text {Specificity} = \frac{tn}{tn + fp} \\&\text {Recall} = \frac{tp}{tp + fn}, \end{aligned}$$

where \(tp\), \(fp\), \(tn\) and \(fn\) are the true positives, false positives, true negatives and false negatives obtained during the classification; \(tn + fp\) and \(tp + fn\) are the totals of negatively and positively oriented documents, respectively. The results obtained by applying the various performance measures can be seen in the given tables.
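A minimal sketch of these measures, computed from predicted and true labels in \(\{-1, +1\}\):

```python
# Sketch of the performance measures used above.
def performance_measures(preds, truths):
    pairs = list(zip(preds, truths))
    tp = sum(p == 1 and t == 1 for p, t in pairs)    # true positives
    fp = sum(p == 1 and t == -1 for p, t in pairs)   # false positives
    tn = sum(p == -1 and t == -1 for p, t in pairs)  # true negatives
    fn = sum(p == -1 and t == 1 for p, t in pairs)   # false negatives
    return {
        "accuracy":    (tp + tn) / len(pairs),
        "precision":   tp / (tp + fp),
        "specificity": tn / (tn + fp),  # tn + fp: all negatively oriented docs
        "recall":      tp / (tp + fn),  # tp + fn: all positively oriented docs
    }
```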

As can be seen from Fig. 2, the most important driving factors were: direction, plot and screenplay for the action genre; direction, acting and screenplay for adventure; direction, screenplay and acting for animation; direction, music and movie for comedy; movie, screenplay and plot for crime; music, screenplay and direction for documentary; movie, music and acting for drama; and acting, movie and direction for horror. Only the highest accuracy within each genre was considered for obtaining these results. The graph shows the percentage distribution of the driving factors across the genres; the total value of the factors is 2, as stated previously. This shows that each genre has its own uniquely important driving factors, and if the reviewer comments positively on these aspects, the overall accuracy of the classification increases (Table 11).

Fig. 2 Bar graph showing the distribution of driving factors across all genres

Table 11 Experimental results for horror genre

4 Conclusion and future work

The experiment was conducted to find, using driving factors, which movie aspects most influence the orientation of a review. It concluded with the movie, acting and plot aspects receiving the highest driving factors overall, resulting in an accuracy of 79.372 % on the dataset under consideration. The relative importance of these aspects may or may not change for other data, but since the experiments were conducted on a large dataset, this is quite unlikely.

As the results for genre-specific reviews show, the method gave high accuracy for some genres and lower accuracy for others. It is thus evident that a method tuned for mixed review classification does not work as well for reviews of certain genres. A new approach for genre-specific classification of reviews therefore needs to be developed, since reviews of different genres tend to incorporate genre-specific words or sentences that can take on different meanings depending on the context in which they are used. For instance, the word "funny" is used in a good sense for a comedy movie but may be used in a bad sense for a genre like horror. Such context-specific words and sentences resulted in the uneven accuracy depicted in the results.

The current method classifies text with a Naive Bayes classifier, which uses a bag-of-words approach. This approach considers neither inter-word dependencies nor the context (genre) in which a word is used. To address the latter, we intend to develop a scoring method using a context-specific lexicon, in which each word has different positive and negative scores depending on the context (genre) in which it is used. To incorporate inter-word dependencies, we intend to use clause-based scoring of sentences, in which each clause of a sentence is scored individually and the overall sentence score is the sum of the individual clause scores. By coupling this improved scoring method with genre-specific driving factors, we expect to obtain more refined scores for movie reviews.