
1 Introduction

Widespread access to the internet is changing the way modern consumers make their purchase decisions. This change is facilitated by e-commerce platforms like Amazon and online review websites like Yelp, which host customer reviews of various products and services. Online customer reviews, also known as electronic word-of-mouth (eWOM), can be defined as “any positive or negative statement made by potential, actual, or former customers about a product or company, which is made available to a multitude of people and institutions via the Internet” [1]. According to a survey, 90% of customers report that their buying decisions are influenced by online reviews. Academic studies, too, have confirmed the importance of online reviews in customers’ purchase decisions [2]. However, consumers find some reviews more helpful than others. This could be because of the characteristics of the review content (review length, review polarity, content and style) [3, 4] and/or the characteristics of the reviewer.

It has been found that, apart from information quality, source credibility is an important aspect of information seeking and adoption [5, 6]. In the context of online reviews, the ‘source’ is the reviewer; hence, reviewer credibility should impact review adoption. An accumulation of reviewer credibility over time should lead to more customers following a reviewer, and thus to greater reviewer popularity. The popularity of reviewers should therefore be associated with the impact of their reviews on customers. The importance of word-of-mouth from influential reviewers on consumers’ purchase decisions is well established in previous research [7, 8]. However, little attention has been given to the factors that make reviewers popular, and hence more influential. These factors could be used to predict reviewer popularity, which could be useful for businesses. Hence, in this research-in-progress, we attempt to identify popular reviewers based on their online profile characteristics. We use data from the Yelp website, which has an extensive reviewer community and detailed reviewer attributes. One such attribute is the number of followers of a reviewer, which we use as a proxy for popularity. Other reviewer information includes the number of reviews written, the number of friends a reviewer has, the average rating the reviewer provides, years of experience in writing reviews for the website, etc.

We used five different machine learning techniques to classify reviewers as high or low on popularity based on their profile characteristics, and compared their performance. We also identified the factors that were most impactful in predicting reviewer popularity.

Insights shared in this study might help businesses in targeting popular reviewers to write reviews about their offerings. Further, by predicting the popularity of a reviewer, review websites might prioritize the display of a new review that is yet to receive a helpfulness vote. The study also contributes to the growing research on online customer reviews and, to the best of our knowledge, is a novel attempt at using predictive analytics in the context of reviewer popularity.

2 Literature Review

Past literature on eWOM has primarily focused on identifying the factors related to the helpfulness of reviews [9, 10]. A study by [11] found that the perceived value of a review is influenced by the reviewer’s expertise and reputation. [10] used average helpfulness votes received per review and personal information disclosure to examine their impact on review helpfulness. Other research has found that reviewer characteristics, such as the number of reviews posted by a reviewer and the total number of helpful votes the reviewer has received, impact the helpfulness votes of a review [12]. Another study found that reviews written by a self-described expert are more helpful than those that are not [13]. Some reviewer characteristics, such as reviewer quality and reviewer exposure, have been found to impact sales by reducing the perceived uncertainty of buyers [14]. However, to the best of our knowledge, there has been no research identifying the dominant factors responsible for making a reviewer popular.

In our study we attempt to differentiate relatively more popular and less popular reviewers using a predictive analytics approach. The determining factors are selected based on support from extant literature and the availability of data. Since we did not find much literature directly examining the factors related to reviewer popularity, we use the nearest available proxy, review popularity, to justify our variables. Review popularity is based on the perceived helpfulness of reviews. Review helpfulness has been found to be influenced by reviewer characteristics [13, 15, 16], which justifies its use as the proxy. Hence, for predicting reviewer popularity, we identify factors from the literature that influence the helpfulness of reviews.

Using TripAdvisor data, [17] studied the effect of review polarity on the helpfulness of a review. They found that reviewers who posted more positive reviews were more likely to receive helpful votes than those who stressed negative aspects. Star rating is an indicator of a reviewer’s polarity; hence, we include average review rating as a factor of reviewer popularity.

Another important factor found to influence review helpfulness is review length [4, 18, 19]. Studies have found that review length provides important cues regarding reviewer characteristics [20]. Prior literature has also established that the number of reviews is associated with review helpfulness [11, 12]. Drawing on these findings, we expect that the more reviews a user writes on a forum, the more popularity she gains.

Also, it could be argued that with increased experience, a reviewer writes more useful reviews and hence becomes more popular. Similarly, the helpfulness votes a reviewer receives should be associated with her popularity, since they validate her credibility. The research by [10] also identifies the average helpfulness votes received by a reviewer as one of the reviewer characteristics that might affect review helpfulness.

Being a well-reputed reviewer with certification from the website (‘Elite’ in the case of Yelp) also establishes reviewer credibility and in turn should influence popularity. Finally, the more friends a reviewer has, the more popular she is expected to be. [11] used the number of friends as a measure of reviewer reputation; hence, we use the number of friends as a factor of reviewer popularity.

3 Data and Methodology

A large dataset with 552,339 records was collected from Yelp.com, made public as part of the Yelp Dataset Challenge 2016. After processing the data and removing outliers, we retained 69,612 records for analysis. Yelp hosts customer reviews of local businesses. The website was selected because it provides information about reviewers and their followers. User attributes such as the number of followers, number of friends, average review rating, number of reviews written, total helpfulness votes, years of experience, years of reputation, and average review length for each reviewer were provided in the data. The number of followers was used as a proxy for reviewer popularity. Descriptive statistics for the data are shown in Table 1.

Table 1. Descriptive statistics of the data
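To make the preprocessing step concrete, the following is a minimal sketch in Python/pandas of how reviewer attributes of this kind could be derived from the Yelp Dataset Challenge user file. The field names (‘fans’, ‘review_count’, ‘friends’, ‘average_stars’, ‘elite’, ‘yelping_since’, ‘useful’) follow the public Yelp dump and are our assumptions; the paper does not describe its extraction code, and average review length (which requires joining the review file) is omitted here.

```python
# Hedged sketch: derive reviewer-level attributes from the Yelp user file.
# Field names are assumptions based on the public Yelp Dataset Challenge dump.
import pandas as pd

users = pd.read_json("yelp_academic_dataset_user.json", lines=True)

df = pd.DataFrame({
    "followers":        users["fans"],                     # proxy for popularity
    "num_reviews":      users["review_count"],
    "num_friends":      users["friends"].apply(len),       # adjust if 'friends' is a string, not a list
    "avg_rating":       users["average_stars"],
    "helpful_votes":    users["useful"],                   # total helpfulness ('useful') votes
    "years_elite":      users["elite"].apply(len),         # years of 'Elite' reputation
    "years_experience": 2016 - pd.to_datetime(users["yelping_since"]).dt.year,
})

# Remove outliers, e.g. records beyond 3 standard deviations on any attribute
z = (df - df.mean()) / df.std()
df = df[(z.abs() <= 3).all(axis=1)]
print(len(df), "records retained for analysis")
```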

We used a two-stage clustering technique to decide on the number of segments appropriate for classification. The results showed two distinct clusters. On inspecting the clustered data, we found the mean number of followers of a reviewer to be the demarcation value. Reviewers with more followers than the mean are labeled high on popularity, and the rest low. The outcome variable is binary, with 1 representing high and 0 representing low popularity.
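A minimal sketch of this segmentation step is shown below, reusing the DataFrame df from the preprocessing sketch above. We substitute scikit-learn’s KMeans for SPSS’s two-stage clustering purely as an illustration; the binarization at the mean follower count follows the description above.

```python
# Hedged sketch: check that two segments are reasonable, then binarize popularity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(df)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("Cluster sizes:", np.bincount(clusters))

# Demarcation at the mean number of followers: 1 = high popularity, 0 = low
df["popular"] = (df["followers"] > df["followers"].mean()).astype(int)
```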

The data was partitioned in a 70:30 ratio for training and testing. Five different models were used for classification: C5, neural network, Bayesian network, CHAID, and logistic regression. We used IBM SPSS Modeler as the analytical tool. The models were compared based on overall accuracy, lift, and costs. The agreement among the models was checked to ensure their comparability.
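The sketch below illustrates this modeling setup with scikit-learn stand-ins rather than IBM SPSS Modeler: a CART decision tree stands in for C5 and CHAID, and Gaussian naive Bayes for the Bayesian network, so the numbers it produces should not be expected to match the reported results.

```python
# Hedged sketch: 70:30 split and five classifiers, compared on test accuracy.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

features = ["num_reviews", "helpful_votes", "num_friends",
            "avg_rating", "years_elite", "years_experience"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["popular"], train_size=0.7,
    stratify=df["popular"], random_state=42)

models = {
    "C5-like tree":    DecisionTreeClassifier(max_depth=5, random_state=0),
    "Neural network":  MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "Naive Bayes":     GaussianNB(),
    "CHAID-like tree": DecisionTreeClassifier(min_samples_leaf=50, random_state=0),
    "Logistic regr.":  LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {accuracy_score(y_test, model.predict(X_test)):.3f}")
```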

4 Results and Analysis

The models differed slightly in accuracy, but not by much; overall accuracy was around 83%–84%. There was 83.8% agreement among the classification techniques. Table 2 summarizes the results.

Table 2. Summary of results for various predictive models
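The inter-model agreement figure can be computed as the share of test records on which all five classifiers make the same prediction; a small sketch follows, assuming the fitted models and test split from the modeling sketch above.

```python
# Hedged sketch: percentage of test records on which all models agree.
import numpy as np

preds = np.column_stack([m.predict(X_test) for m in models.values()])
agreement = (preds == preds[:, [0]]).all(axis=1).mean()
print(f"All-model agreement: {agreement:.1%}")
```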

All the models show nearly the same level of accuracy, with the neural network giving the best value. We found that the number of reviews and the average helpfulness votes received by a reviewer were the two most important predictors, followed by the number of friends and the average review rating. The least important factors turned out to be average review length and years of experience. Figure 1 depicts the predictor importance. Table 3 shows the confusion matrix for the neural network: 85.9% of reviewers who are low on popularity are predicted correctly, whereas 70.1% of reviewers high on popularity are predicted correctly. Prediction accuracy is thus higher for the less popular class. Businesses would want to minimize the number of less popular reviewers being predicted as more popular ones, since that would incur the cost of investing time and resources in uninfluential reviewers. In our model, this error was just 14.1%, which is on the lower side.

Fig. 1. Predictor importance graph

Table 3. Confusion matrix for neural network

All models except CHAID used all of the inputs to predict the outcome variable. The CHAID model discarded years of ‘Elite’ status and average review length as predictors.
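As an illustration of how per-class accuracies like those in Table 3 and predictor importances like those in Fig. 1 could be obtained outside SPSS Modeler, the following sketch uses a row-normalized confusion matrix and permutation importance for the neural-network stand-in from the earlier sketch. Permutation importance is our assumption; the paper does not state which importance measure SPSS Modeler reports.

```python
# Hedged sketch: per-class recall and permutation-based predictor importance.
from sklearn.metrics import confusion_matrix
from sklearn.inspection import permutation_importance

nn = models["Neural network"]
cm = confusion_matrix(y_test, nn.predict(X_test), normalize="true")
print("Per-class recall (low, high):", cm.diagonal().round(3))

imp = permutation_importance(nn, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```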

5 Conclusion

In this research-in-progress paper, we used predictive analytics to classify online reviewers into two distinct classes based on their popularity. We compared five different machine learning techniques: C5, neural network, Bayesian network, CHAID, and logistic regression. Among these, the neural network model turned out to be the best, with 84.2% accuracy. The number of reviews written was found to be the most important factor.

In future work we plan to incorporate a few more factors, such as review content characteristics, and try to improve the prediction accuracy. For example, review subjectivity, polarity, topic relevance, spelling and grammar, etc. could be additional variables to consider. Additionally, we want to create ensemble models combining more than one predictive model (such as a neural network for accuracy and a decision tree for rules) and analyze the results.
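As a sketch of this planned ensemble direction (not a result of this paper), a soft-voting ensemble combining a neural network with a decision tree could look as follows, reusing the training split and stand-in models from the earlier modeling sketch.

```python
# Hedged sketch: soft-voting ensemble of a neural network and a decision tree.
from sklearn.ensemble import VotingClassifier

ensemble = VotingClassifier(
    estimators=[
        ("nn",   MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    voting="soft")
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```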

Owing to the significant impact of online reviews on customers’ purchase decisions, it is important for businesses to manage the reviews received for their products and services. In order to get more impactful reviews, it is important for businesses to identify influential and popular reviewers. If reviewer popularity can be predicted from characteristics or cues about the reviewer, businesses could leverage this information to target those reviewers and encourage them to write reviews about their products or services. If necessary, they might also incentivize the most popular reviewers. It is also advisable for businesses to keep track of the issues raised by popular reviewers and address them proactively. Businesses could further extract ideas from these reviews to enhance their offerings where needed.

For e-commerce and online review sites, the insights from this study could be helpful in several ways. They can develop recommender systems based on different characteristics of a reviewer, predict their popularity, and display their reviews as top reviews on their sites. This would be particularly useful for websites where social interaction (such as following) among reviewers and other consumers is not possible.