Keywords

1 Introduction

In the past few years, Natural Language Processing (NLP) along with Internet technologies has seen a tremendous growth. This has managed to change the way the users now look and perceive the information on the net. In this era of immense social networking, the opinion of users over the net is now playing a central role in our decision-making process. A lot of research is being carried out to find out innovative ways of managing and analyzing this content generated by the user.

A lot of changes are also observed in the behavior of people in these social networking sites. They are now open to collaborate, share their views and opinions without any hesitation. Various fields such as finance, healthcare, online shopping, etc., are making use of this vast intelligence (in the form of opinions) collected over the net to make good business decisions. But the task is difficult to handle, due to the noisy and unstructured characteristics of the online data. This is the reason that many researchers are working in this direction to develop certain automated techniques that can mine and analyze the content generated by the user. The subdomain of sentiment analysis which tries to give a solution to the above problem is opinion summarization. In this research domain, various methods and techniques are being researched to help users to collect, analyze, and draw conclusions from the huge collection of unstructured data. The growth in the various social networking sites such as Facebook, Ecommerce review websites, Twitter, etc., have also led to enormous wealth creation in terms of data. Analysis of this data can enable timely predictions and also help in planning better business strategies.

Toward this research direction, the paper proposes a novel technique for opinion summary generation. Unlike other approaches, the proposed technique is unsupervised as well as domain-independent and generates summaries at four different levels. The technique proposed was experimented on nine data sets that were from different domains.

The organization of the paper is as follows. Related research in the area of opinion summarization is discussed in Sect.Β 2. The proposed model of opinion summary generation is outlined in Sect.Β 3. SectionΒ 4 discusses the data sets used and experiments carried out. The future plan is outlined in Sect.Β 5.

2 Literature Survey

In recent years, many researchers have been trying to find probable optimized solutions to the different challenging problems that have emerged in the research domain of opinion summarization. One of these is the task of presenting the summaries to the user. It’s a complicated task, as a summary that is informative to one set of users may not hold any value to that of the others. So, many researchers have been working on various techniques that display the summary of opinions in a way that the user can digest.

Many researchers have shown that it is always better to aggregate opinions from many sources or persons instead of just presenting the individual view of a person. Liu et al. [1] proposed one such type of summary which was structured and was called aspect-based summaries. These displayed positive and negative opinions about the set of features that frequently occurred in the reviews.

Cheng et al. [2] further worked on the same lines and came up with summaries that compared features of one product with that of the other product. These summaries proved to be useful to users who wanted to compare products across multiple feature dimensions. Other interesting types of summaries are the ones which try to show the perception of the user about a product over time. These trend tracking summaries helped the manufacturers to draw their team’s attention toward products that were losing their popularity among users.

Many researchers [3,4,5] have even presented their summaries in a traditional fashion. By taking inputs that belong to an individual user or from a set of user reviews, textual summaries that are short and meaningful were generated. These summaries did not gain much popularity as any type of quantitative analysis could not be done and they were just suitable for the purpose of human reading.

Some researchers like Wei et al. [4] worked on approaches to determine the ways to select important features that would make their place in the final summary presented. The term frequencies of aspects/features were found out and further an appropriate threshold was used to filter out some of them. They presented a summary at a brief and a detailed level. A brief level had just the headline featuring the sentence selected, and the detailed ones carried the sentiments of the previously selected sentences. Researchers [6] also presented summaries that highlighted the pros and cons of various features of a product. One more interesting summary was developed by Blair et al. [7]. In their approach, sentences were first rated based on the importance of features that they had. Then in the subsequent step, few of these highly rated sentences were selected for the final summary based on their length.

Some researchers [8] even used different methods based on graphs for generating opinion summaries. Several clusters based on aspects/features were first formed and within each of these clusters, few sentences were selected as leaders taking their measure of informativeness and usefulness into consideration. Few have even worked on generating summaries at different granular levels [9]. They start with a word-level summary, by ranking features as per their importance. Next is the phrase level of summaries that just displays short phrases that represent individual clusters. The final level is that of displaying highly ranked sentences where usefulness and informativeness measures are used for sentence ranking.

3 Proposed Approach to Generate Opinion Summaries

The paper proposes a novel way to generate opinion summaries at four distinct levels. In the research domain of opinion summarization, this phase of generating relevant opinion summaries is considered to be a challenging step by many researchers. It is mainly due to the fact that concise, precise, and visually effective summaries are easily understood and effectively used by the consumers for the process of decision-making. Many of the researchers, who are working in this direction, have used various graphical structures to display the summary results rather than a tabulated display. Also, many have used different sentence ranking methods to select sentences that would be part of the final summary. The challenge in these techniques is that they are domain-dependent as well as they require a good amount of training data before the model could be built.

Fig.Β 1 shows the approach for generating opinion summary. The process begins with the already extracted single and multiword features applying the automated rule-based algorithm [10, 11]. Then, by using the technique proposed [12], relevant feature opinion pairs were extracted. These features and their corresponding opinion pairs are used in the proposed domain-independent model to generate opinion summaries at various levels, which differentiates this method from the various other techniques for a summary generation.

Fig.Β 1.
figure 1

Proposed technique for summary generation

As shown in Fig.Β 1, the initial step is to define a method to filter out few features that were extracted in the earlier phase of feature extraction [11]. It was found during experimentation that a larger number of relevant features were extracted. But, for the purpose of generating opinion summary, it would be better to filter out a few of them as our aim is to display only those review sentences that are important from the user’s perspective. We go ahead with this, by the assumption that popular features are usually the ones that are mentioned frequently in the reviews. So, the first step is to find their frequency of occurrence and retain the top few features using an appropriate threshold.

The next step is to extract sentences having the selected features. The orientation of these sentences is then determined using a SentiWordNet dictionary [13]. This dictionary is used, as it has a huge collection of adjectives along with their orientation score. The orientation scores that are assigned to the words are based on the senses in which the words are used. As in the phase of feature extraction [10, 11], a dependency parser was used to parse the review documents, the context of adjective bearing words is easily available to determine the orientation of sentences.

So, now every sentence that is selected previously using the appropriate threshold is placed into a class having a positive, a negative, or a neutral polarity. With these polarities assigned to sentences, the summary is now generated from a coarse level to a more granular one which can be used as per user’s need.

At the first level, the summary reveals details regarding the popular features mentioned in the reviews. The other information presented in this overall summary is the polarity of these features. Summary at the next level is termed as feature wise as it reveals more details about the set of features selected previously. This summary shows the opinion, i.e., positive or negative, of the author toward every individual feature. The third level is called average polarity wise summary. During experimentation, it was observed that the opinions for a few of the features that were selected were very strong even though their number remained small. To prevent losing even such granular detail which may generate better summaries, the average positive and negative polarity scores of every individual feature with its corresponding opinion was found out. This computed value was used in the display of selected features as per their average polarity score.

In the last level, the sentences bearing the features that were selected based on their average polarity score in the previous level was displayed. The above-discussed technique of generating opinion summary was experimented with nine other data sets that belong to the various domains like Mobile phones, Automobiles, Hotel Industry, and Softwares. The characteristic of the data set used along with the results of experimentation is discussed in the next section.

4 Experiments Conducted

The data sets that were used to evaluate the proposed technique were from various review sites such as Amazon, CNET, TripAdvisor.com, Carswale.com to name a few. Some of them were the golden data sets used by many researchers [1, 14] to showcase their contribution in the research field of sentiment analysis and few were manually crawled ones. Each of the reviews was around 8–10 sentences long, having an average of eight tokens per every individual sentence.

4.1 Results

As was discussed in the earlier section, the process of opinion summary generation using the proposed technique starts with the feature set and their corresponding opinion pairs that were extracted previously [10,11,12]. During experimentation, it was observed that the number of features in the category of relevant were too many. So, there was a need to limit their number. A literature survey conducted revealed that sentence ranking [2, 4, 6, 7] was one of the popular techniques used by many researchers to tackle the problem of dealing with the large feature vector. So, an initial step of filtering features was carried out to generate concise opinion summaries.

As discussed in the earlier section, the frequency of occurrence of features selected previously was found out. This was based on the assumption that popular features had more mentions in the reviews. For the automobile data set, there were around 45 features that got extracted in the feature extraction phase. Out of these, the ones that were having a good frequency of occurrence were the eight features that happened to be the top 20% of the sorted list. The review sentences corresponding to these selected features were then obtained. The selected top feature list of the automobile data set with frequency of occurrence is shown in TableΒ 1. The table clearly indicates that features like Seat Comfort, Leg space, Service Centre, and Sensor systems are the most popular ones.

TableΒ 1. Features of automobile data set selected for summarization

The next step as discussed earlier, was to extract sentences bearing these top-rated features and determine their polarity using the dictionary of SentiWordNet [13]. The orientation of few sample sentences having the features selected earlier is as shown in Fig.Β 2.

Fig.Β 2.
figure 2

Sample sentences of automobile data set with orientation

These sentences are then used in a subsequent step to generate summaries that start from a coarse level and extend up to a fine-grained level. Level 1 summary is at a higher level and gives details regarding the top-rated features and the number of opinions that are positive, negative, or neutral about the feature. FigureΒ 3 shows the overall summary of the automobile data set.

Fig.Β 3.
figure 3

Overall summary of the automobile data set

The figure clearly indicates that there are around 30 review sentences that bear positive opinion, 31 bearing a negative orientation, and 23 being neutral for a set of 8 top-rated features selected earlier.

The next level of summary for the automobile data set, termed as feature wise is shown in Fig.Β 4. The details regarding the number of positive, negative, and neutral opinions for each one of the features selected previously are shown here.

Fig.Β 4.
figure 4

Feature wise summary of the automobile data set

The figure is indicative of the fact that the features of Leg space, Safety, Price tag, and Convenience have a negative orientation, whereas that of Service Centre remains neutral and it is positive for Seat comfort, Sensory, and Brake systems.

The next level is termed as average polarity wise summary that presents much more detail than the previous two levels. Here the information regarding the average polarity scores that were computed for each of the feature opinion pairs that was previously selected is presented.

The importance of this granular level of the summary is in situations when for certain features even though the number of positive/negative opinions may be high, but their individual opinion strength may not be too great. For example, Fig.Β 4 indicates that feature of Safety and Leg space have more number of opinions that are with negative orientation. But the result displayed in TableΒ 2 and Fig.Β 5 indicates that even though the number of negative opinions toward these features is high, they are still classified as positive. This is because of the fact that the opinion strength of the word bearing positive sentiment is more than that of the word with a negative sentiment.

TableΒ 2. Average polarity wise summary of the automobile data set
Fig.Β 5.
figure 5

Average polarity strength wise summary of the automobile data set

Tackling such feature opinion pairs further increases the usefulness measure of the summary generated. The summary at the last level displays the actual review statements and is more granular. At this level, for the set of features selected, the average strength of its corresponding opinion that was computed in the previous level of the summary is used. The review sentences that are selected for the final display are the ones that have the score of polarity strength to be greater than the average score computed earlier. The level 4 summary of the Automobile data set is shown in Fig.Β 6.

Fig.Β 6.
figure 6

Actual review statements summary of the automobile data set

Using the proposed technique, a similar set of summaries was also generated for the eight remaining data sets. The summary so obtained was not comparable with other similar approaches due to the following reasons that were evident during the conduction of literature survey.

  • Researchers evaluated the summaries that they generated based on the measure of informativeness, readability as well as non-redundant content of the summary.

  • The researchers compared their summary generated with those that are stored in manually extracted files.

  • Some researchers even based their evaluation according to the way in which polarity of every feature was distributed.

  • Few of the researchers even argued that there can be no goodness measure of an opinion summary as it’s very subjective. A summary that is found informative to one user may not be of any value to the others.

5 Conclusion and Future Work

As the various techniques used for opinion summary generation are domain-dependent and supervised methods, the paper proposes a novel unsupervised domain-independent approach to this summary generation. The technique proposed generated summaries at four different levels revealing details about individual features from a basic level to that of a more granular one.

Level 1 summary termed as overall just displayed the total number of feature opinion pairs along with their polarity for individual feature selected. Level 2 summary had detail about every feature and number of positive and/or negative opinions that it carried. The level 3 summary had the features displayed based on the score of average polarity. The last level displayed the statements as per the filtering done in the previous steps.

The proposed technique was validated by running it through nine data sets that belonged to different domains. The summaries generated were tested for their readability and informativeness against files that had sentences that were manually selected for the final summary. It was observed that there was around 70%–75% matching of the summaries generated using the proposed method with that of manual ones.

Our future research direction is to come up with a way in which we can customize summaries as per the user needs. This can be made possible if we can think of a method to integrate the opinion summarization system with feedback from the user and use this information effectively in building an intelligent model.