Introduction

Online reviews are consumer opinions shared via the Internet about various entities such as products, services, persons, or companies (Flanagin et al., 2011; Koh et al., 2010; Kunst & Vatrapu, 2019; Lin et al., 2017). These reviews are often considered more credible than commercial advertising or public relations (Furner & Zinko, 2017), and hence potential consumers rely heavily on them in their purchase decisions (Filieri et al., 2018). According to a study (Fullerton, 2017), 93% of consumers reported that online reviews affect their purchase decisions, and 68% of consumers are willing to pay 15% more for the same product or service if online reviews assure them of its quality. Therefore, online reviews have a substantial impact on the success or failure of entities in the market (Chu & Kim, 2011; Cui et al., 2012; Vallurupalli & Bose, 2020; Zhu & Zhang, 2010). The impact of online reviews on sales has been illustrated in diverse business domains, such as ecommerce (Hong et al., 2017), local business (e.g., restaurants, hospitals, hotels) (Luca, 2016), B2B business (McCabe, 2018), and the film business (Lindbergh & Arthur, 2020).

Although online reviews are “shared” opinions, they do not necessarily represent the opinions of the majority of consumers. In business, there is a phenomenon called the Pareto principle, also known as the 80/20 rule: a few popular products (20%) create most market sales (80%) (Brynjolfsson et al., 2011). This phenomenon is also found in online review communities, where a small number of highly motivated reviewers contribute a large number of online reviews. The presence of these experienced reviewers is well evidenced at Yelp.com, an online review website, where approximately 20% of the reviewers write 80% of the reviews (IESE Business School, 2015). Their presence is also evidenced on Amazon.com, where a large portion of the online reviews is created by a small minority while more than 80% of the reviewers leave only one review on the website (Peddibhotla & Subramani, 2007; Woolf, 2014).

Given the importance of experienced reviewers, prior studies investigated their behaviors, which can affect the success or failure of businesses. However, their results were mixed. Some studies investigated their motivations to contribute a large number of reviews, arguing that they submit reviews to increase or decrease the reputation (e.g., ratings) of products (Pinch & Kesler, 2011). Others argued that they submit reviews to gain compensation from retailers or manufacturers (Reichelt et al., 2014; Wu, 2019). For those reasons, some prior studies postulated that experienced reviewers tend to produce extremely positive or negative reviews (Pinch & Kesler, 2011). In contrast, other studies reported that experienced reviewers were likely to contribute balanced reviews, neither extremely positive nor extremely negative (Banerjee & Chua, 2018; Wang, 2010), and therefore have higher credibility in their communities (Banerjee et al., 2017). These conflicting findings cause confusion for business practitioners in various domains, who likely encounter experienced reviewers in their business. One possible reason for the mixed results is an unverified assumption about the selection of review subjects and the demographics of reviewers. The prior studies assumed that experienced reviewers select the same products or services as novice reviewers and directly compared average ratings between the two groups. This assumption is problematic because the two groups may focus on different products in their reviews; for example, the novice group may prefer popular products, while the experienced group may prefer products that satisfy their unique taste. In addition, some studies used selective samples with inadequate definitions of experienced and novice reviewers (Anderson & Simester, 2014; Banerjee & Chua, 2018). For example, Banerjee and Chua (2018) defined the novice group as reviewers submitting one to ten reviews, the intermediate group 45 to 54, and the experienced group 91 to 100 reviews, ignoring reviewers between the groups (e.g., those submitting 55 to 90 reviews) as well as those submitting more than 100 reviews, who are highly experienced. Most notably, little prior research considered reviewers’ demographics, such as age and gender, in the comparison between experienced and novice reviewers, although these could affect review behavior (Leung & Yang, 2020; Mather et al., 2004). In particular, it is important to understand gender differences because some studies have shown that females are likely to contribute more reviews (Dunivin et al., 2020; Punj, 2013) and to take online reviews more seriously in their purchase decisions (Abubakar et al., 2016; Freddie, 2018). Few studies have investigated how experienced reviewers differ by gender in terms of rating score and extremity. This gap provides an opportunity to extend the literature on online reviews and offer useful guidance for business practitioners who closely collaborate with experienced reviewers.

This research fills the aforementioned gaps in the extant experienced reviewer literature in the context of online movie review communities. The online movie community is selected because experienced reviewers have been shown to make a substantial impact on the market performance of movies (Chintagunta et al., 2010; Ma et al., 2019; Zhang et al., 2020). In addition, online movie communities have been criticized for the prevalence of unreliable reviews (Wilkinson, 2019), possibly contributed by experienced reviewers with strategic motivations. This is because online movie communities offer an environment better suited to such reviewers than ecommerce platforms, which require purchase verification (e.g., user-created pictures, verified purchase) to submit reviews. Adopting empirical data comprising 211,197 reviews by 7642 users at Yahoo! Movies, this research addresses the following questions: (1) do experienced reviewers select specific products for their reviews (e.g., products with sophistication)? (2) if so, do they differ from novice reviewers in their rating propensity (e.g., average rating, extremity in rating)? and (3) does their rating propensity differ by demographics (e.g., gender)? The findings of this study are expected to provide unique contributions to both academia and practitioners concerning experienced online reviewers. The rest of this paper is organized as follows. Section 2 discusses studies relevant to this research, particularly on experienced online reviewers. Section 3 introduces the major hypotheses of this study based on the extant literature. Section 4 discusses the dataset, the research models to test the hypotheses, and the results of the statistical analysis. Lastly, Section 5 contains the conclusion of this study, its contributions to both academia and industry, and limitations to be considered by future research.

Literature review

Experienced reviewers in online review platform

In the context of online review communities, experienced reviewers are highly important in that they are a small group of the population that contributes a significantly larger number of reviews than novice reviewers (Peddibhotla & Subramani, 2007). They tend to provide not only a significant amount of information (i.e., heavy users) but also quality knowledge and information (i.e., experts), and do so earlier than novice members (i.e., early adopters) (Peddibhotla & Subramani, 2007). Due to the importance of experienced reviewers (Aakash & Jaiswal, 2020), many prior studies have examined them in online review communities. Although most of these studies focused on the relationship between reviewer experience and the perceived usefulness or helpfulness of reviews (Choi & Leon, 2020; Fang et al., 2016; Liu & Park, 2015; Mudambi & Schuff, 2010; Racherla & Friske, 2012; Zhu et al., 2014), some studies examined the behavioral differences of experienced reviewers, which are the focus of this study.

One of the primary topics in experienced reviewer research is their motivations (Table 1). Past research investigated the motivations of experienced reviewers based on the words in their text reviews and on their rating propensity (e.g., review valence, ratio of extreme ratings) compared to novice reviewers, which is a focus of this study. The motivations reported in the literature vary but can be summarized in two large categories: other-oriented and self-oriented motivations (Peddibhotla & Subramani, 2007). According to Bhattacharyya et al. (2020), reviewers with other-oriented motivations, such as social affiliation (Peddibhotla & Subramani, 2007) and altruism (Mathwick & Mosteller, 2017; Peddibhotla & Subramani, 2007), are the most dominant reviewer group. They contribute to online review communities for others who have similar interests, providing neutral ratings and balanced information. This distinguishes them from general reviewers, who rate primarily to reward or punish the subjects (e.g., ecommerce sellers) with extreme rating scores (Lafky, 2014). Self-oriented motivations, which are inferred from rating propensity and text reviews, include self-expression (Mathwick & Mosteller, 2017; Peddibhotla & Subramani, 2007), improved understanding of topics (Peddibhotla & Subramani, 2007), social image (Mathwick & Mosteller, 2017; Wang, 2010), publication of their work (McIntyre et al., 2016), and status seeking (Lampel & Bhalla, 2007; McIntyre et al., 2016; Wu, 2019). The self-oriented perspective supports the idea that experienced reviewers participate in the review process for their own satisfaction and enjoyment (Wu, 2019), although Lampel and Bhalla (2007) view status seeking as a social passion that generates continued participation. In addition to other-oriented and self-oriented motivations, some recent studies introduced strategic motivations. These studies argued that some experienced reviewers may have a strategic goal in generating a large number of online reviews, such as encouraging (or discouraging) sales of specific products or services with extreme ratings and attaining compensation for their review activities (Reichelt et al., 2014; Wu, 2019). The situation becomes complicated when experienced reviewers receive incentives, as the buying public assumes that the reviewers are regular customers (Owen, 2011). When manufacturers send top reviewers (e.g., influencers) products free of charge to solicit their reviews, a bias can be introduced, undermining the credibility of the reviewers and the integrity of the process (Chow, 2013; Dvorak, 2011; Lee, 2020; Pinch & Kesler, 2011).

Table 1 Summary of relevant literature

The presence of experienced reviewers with strategic motivations has brought researchers’ attention to their review behaviors. However, they reached mixed conclusions. Anderson and Simester (2014) found that customers who reviewed a product without purchasing it were more experienced in terms of the number of contributed reviews, and their reviews were significantly more negative than those of reviewers with confirmed transactions. Pinch and Kesler (2011) reported that some Amazon customers, generally stakeholders of a certain product, contributed numerous reviews giving extremely positive ratings to their own products but extremely negative ratings to their rivals’. In contrast, other studies reported that experienced reviewers tended to be less extreme in their rating propensity than novice reviewers (Wang, 2010). A recent study by Banerjee and Chua (2018) investigated how the online reviews of members in the IMDb online community change as they gain experience. They reported that reviewers in the novice stage, who have contributed ten or fewer reviews, tend to give higher ratings to movies than reviewers who have contributed more than ten. Similarly, Costa et al. (2019) revealed that experienced reviewers, defined as incentivized reviewers in that study, tend to use more positive sentiments in their text reviews. Therefore, it remains inconclusive whether experienced reviewers are more negative or positive in their reviews than novice reviewers.

Summary of literature review

Although prior studies examined experienced reviewers, they primarily centered on whether these reviewers produce helpful reviews on ecommerce review platforms. There are some empirical studies on the behavioral characteristics of experienced reviewers in online communities, but they have the following limitations that this study attempts to address. First, the findings on the relationship between review experience and review propensity are mixed. Some studies revealed that experienced reviewers are more likely to be negative (Anderson & Simester, 2014; Banerjee & Chua, 2018) and extreme (Pinch & Kesler, 2011), while others reported that they are less likely to be negative (Costa et al., 2019) and extreme (Wang, 2010). Second, the empirical comparison between experienced and novice reviewers was not adequate in some studies, which can be one of the reasons for the inconclusive findings. For example, Banerjee and Chua (2018) used selective samples to compare the average ratings of different reviewer groups, adopting a simple t-test without including other influential factors. They defined online reviewers who submitted one to ten reviews as being at the novice stage, 45 to 54 at the intermediate stage, and 91 to 100 at the expert stage, omitting reviewers between the stages (e.g., those who submitted 11 to 44 or 55 to 90 reviews) as well as reviewers who submitted more than 100 reviews, who should have made substantial contributions to the review community. Third, the empirical studies simply assumed that experienced reviewers evaluated the same products as the novice group and compared average ratings between the two groups. However, this comparison is invalid if the groups rated different products. For instance, if experienced reviewers mainly rated products in niche markets while novice reviewers mainly rated popular products (Park & Yoo, 2018), the different rating propensity might have been caused by the product difference, not by the reviewers themselves. Lastly, the empirical studies omitted important factors related to rating scores, such as the age and gender of reviewers (Leung & Yang, 2020; Mather et al., 2004), product brand newness (Blythe, 1999; Choi et al., 2018), perceived product quality (i.e., prior rating) (Lee et al., 2015), and the market popularity of the product (Wang, 2010). Without considering these factors, the direct comparison between the two groups can lead to misunderstanding of experienced reviewers.

Conceptual framework

This research proposes that experienced reviewers in online review communities are differentiated from novice reviewers by what they review, how they review, and why they review. Similar to professional critics, experienced reviewers are proposed to hold products to a higher standard and rate accordingly, a tendency induced by their expertise and knowledge. In terms of review topic selection, the experienced and novice groups may show a clear difference in diverse product domains such as movies (Sedgwick & Pokorny, 2014), music (Elvers et al., 2015), and video games (Santos et al., 2019). For example, while experienced reviewers prefer sophisticated products corresponding to their intellectual tastes, novice reviewers prefer products with market popularity, which may impose less risk on their decision (Sedgwick & Pokorny, 2014). Therefore, we empirically examine how experienced reviewers behave compared to novice reviewers, in terms of how they rate (e.g., average rating and extremity) and what they rate (e.g., sophisticated products vs. popular products), in the context of an online movie review community. Finally, we infer their motivations from how and what they review, as prior studies did. In addition, political science literature has stressed that a minority group, particularly one defined by gender, can make a substantial difference (Dahlerup, 2006); female leader groups have shown different voting behavior. This implies that female experienced reviewers may differ in their review behavior (e.g., rating propensity). Therefore, we propose that gender differences affect the relationship between experience and rating propensity, which has not been discussed in the extant literature on experienced reviewers.

Rating propensity

In online review communities, one of the most popular rating methods is the star rating measurement, where reviewers choose scores from one to five to evaluate products or services (Mudambi & Schuff, 2010). However, rating systems vary by platform. For example, Fandango uses the traditional five-star single-dimensional system with adjunct user text comments (Fandango, 2018), while Rotten Tomatoes, Metacritic, and IMDb have two rating systems for different groups or purposes. Yahoo! Movies, the source of the dataset for this study, uses a single-dimensional rating scale similar to academic grading in the U.S., taking the form of five letters with ± variations. Given their unique characteristics, experienced reviewers are expected to exhibit different behavioral patterns in terms of rating scores and extremity in the ratings.

Rating propensity for rating score

Individuals can extend their understanding and knowledge of a certain area by contributing content, including reviews, to an online community (Lampel & Bhalla, 2007). Therefore, experienced reviewers are expected to have more knowledge or expertise about their topics. Major online review communities use the number of reviews as a measure of reviewer expertise. For instance, Amazon.com identifies top reviewers based on the number of review entries and their overall helpfulness as rated by the readers of their reviews (Chen et al., 2008). Extant academic literature has also employed this number as one of the primary variables to estimate the expertise of online reviewers (Jennings et al., 2015; Liu et al., 2008), suggesting a positive relationship between the number of reviews submitted by reviewers and their expertise.

The level of expertise of online reviewers is expected to influence their rating propensity, as reviewers with a higher level of expertise are typically more fastidious in their evaluations. Marketing literature concerning customer knowledge and market demandingness commonly indicates that as consumers become more informed and knowledgeable about products or services, they have higher expectations and demands (House et al., 2005; Prahalad & Ramaswamy, 2004). In the movie industry, for example, frequent moviegoers tend to give lower ratings than infrequent moviegoers because their experience and knowledge impose greater demands on the movies (Chakravarty et al., 2010). Similarly, movie critics with expertise tend to give lower ratings than non-critics (Plucker et al., 2009). In summary, since experienced reviewers have more expertise, they are likely to give lower ratings, introducing Hypothesis 1:

H1 Experienced reviewers are more likely to select a lower rating.

Rating propensity for review extremity

Extreme ratings, representing very satisfactory and very dissatisfactory opinions respectively, have been empirically examined and found to be positively or negatively related to sales growth; the phenomenon is therefore important for understanding the behavior of experienced reviewers (Clemons et al., 2006). Extremity in rating propensity would distinguish experienced reviewers from novice reviewers. While one-time reviewers, who write only one review in an online review community, tend to be extreme because their motivation is to reward or punish the sellers of products or services (Feng et al., 2012), experienced reviewers would be less likely to do so because of their different motivations involving social affiliation and altruism (Peddibhotla & Subramani, 2007). They would be less likely to share extreme reviews because consumers perceive extreme ratings as less credible and helpful than moderate ratings (Hunt & Smith, 1987; Mudambi & Schuff, 2010; Schmidt & Eisend, 2015). Extremity also does not correspond to self-oriented motivations such as social image (Wang, 2010) and self-expression (Peddibhotla & Subramani, 2007). Because experienced reviewers share personal thoughts in the online community to establish themselves as intelligent, fair, and good individuals, they would be less likely to choose drastic ratings, which may harm the desired social image. In addition, more knowledgeable consumers tend to be less extreme in their evaluations because their expertise allows them to evaluate more diverse dimensions of products from an objective viewpoint (Sanbonmatsu et al., 1992; Sujan, 1985). Therefore, the following hypotheses are proposed:

H2a Experienced reviewers are less likely to select an extremely positive rating.

H2b Experienced reviewers are less likely to select an extremely negative rating.

Effect of gender on the relationship between experience and rating propensity

Gender differences have been widely discussed in the extant online review studies. These studies commonly reported that female consumers are more influenced by online reviews (Abubakar et al., 2016; Awad & Ragowsky, 2008; Bae & Lee, 2011), negative reviews (Bae & Lee, 2011; Sotiriadis & Van Zyl, 2013), and inconsistent reviews (Zhang et al., 2014). As online review contributors, females tend to post more online reviews than males (Dunivin et al., 2020; Punj, 2013). However, little research has investigated how gender affects the rating propensity of online reviewers, in particular that of female experienced reviewers.

In studies of the relationship between gender and generosity in behavior, females were found to be more generous than males when their generosity caused little cost (Cox & Deck, 2006) or when the risk in their response was low (Eckel & Grossman, 2008). One explanation for this disparity concerns social expectations. Rand et al. (2016) showed that females have more altruistic tendencies and receive more negative feedback from society than males when they fail to be generous. Similarly, in everyday social life, Mehl and Pennebaker (2003) found that females use significantly fewer negative words in their daily language, implying that females tend to be less aggressive and more generous. In online review communities, furthermore, Smith and Mangold (2012) reported that female consumers are less likely than male consumers to write negative reviews on the Internet when they are disappointed with their purchase.

In online reviews, the rating propensity of experienced reviewers should differ with the gender-related generosity of the reviewers. Even with the same level of disappointment with a review subject, such as a product or service, more generous reviewers would select a higher rating score. As discussed, experienced reviewers should have higher expectations and demands concerning the quality of their review subjects, leading to a negative relationship between experience and rating scores. However, based on studies showing females’ more generous behavior as well as more consistent altruism (Baez et al., 2017; Mehl & Pennebaker, 2003; Rand et al., 2016), females are expected to exhibit more generous evaluations in their online reviews. The negative relationship between experience and the propensity to select a lower rating (Hypothesis 1) should therefore differ by gender and is expected to be weaker for female reviewers. Concerning extreme ratings, likewise, female reviewers should be more likely to choose the extremely positive rating but less likely to choose the extremely negative one, due to the aforementioned attributes. Given the discussion above, the following hypotheses are introduced:

H3a Experienced female reviewers have a weaker tendency to select lower ratings than experienced male reviewers.

H3b Experienced female reviewers have a stronger tendency to select the extremely positive rating than experienced male reviewers.

H3c Experienced female reviewers have a weaker tendency to select the extremely negative rating than experienced male reviewers.

Product selection for review: product quality and market popularity

Experts and regular consumers often have different opinions on products (or services). Sometimes products that receive hard criticism from industry experts achieve huge market success (e.g., iPhone, Harry Potter, FedEx), implying different perspectives on product quality between experts and consumers. For movies, general moviegoers and professional movie critics tend to have different perspectives on artistic quality (Boor, 1992; Chakravarty et al., 2010). While general moviegoers often overlook such quality and consider it abstruse or boring, the critics tend to pay more attention to the quality that meets their intellectual tastes (Chakravarty et al., 2010; Lundy, 2010). In the video game industry, expert reviewers and amateur reviewers show differences in their evaluation of video games in terms of game genre, developer, and console platform. For example, amateurs tend to rate significantly higher the video games produced by Nintendo, which tends to develop friendly video games appealing to a broad range of consumer groups, while experts rate higher those produced by hard-core game developers such as Blizzard and BioWare (Santos et al., 2019). Similarly, expert listeners of music tend to prefer more highbrow and sophisticated music, such as jazz and classical, while conventional listeners are more likely to enjoy folk, hard rock, house, and pop (Elvers et al., 2015). These differences indicate that industry experts prefer sophisticated products, which have more delicate, processed, and elaborated attributes (Herédia-Colaço & do Vale, 2018), while regular consumers may overlook these attributes in their product evaluations.

In online review communities, experienced reviewers are expected to have a taste similar to that of industry experts, focusing on the sophistication of products, in that both have strong intellectual motivations and high expectations. For instance, experienced reviewers enjoy displaying their expertise and knowledge (Pinch & Kesler, 2011) as well as developing their writing skills and enhancing their understanding of topics (Peddibhotla & Subramani, 2007) by organizing and sharpening their ideas in reviews. In terms of social image, they want to establish themselves as intelligent and knowledgeable in their communities (Mathwick & Mosteller, 2017; Wang, 2010), which is an extension of their intellectual motivation. Therefore, experienced reviewers are more likely to choose sophisticated products for their reviews, introducing the following hypothesis:

H4 Experienced reviewers are more likely to review products with sophistication.

According to Debenedetti and Larceneux (2011), professional movie critics tend not to evaluate highly the commercially successful movies that ordinary moviegoers prefer. The critics may believe that such movies focus more on popular tastes than on artistic quality, which is important in their evaluation. Likewise, fashion experts do not prefer mass-produced items (Park & Yoo, 2018), because such items do not satisfy their unique taste or allow them to present their expertise in reviews. Given the aforementioned similarities to critics, experienced reviewers would have an elite taste (Chakravarty et al., 2010), leading them to reject common products for their reviews. In terms of their desired social image, experienced reviewers would refuse to review popular products because such products would not differentiate them as experts in their communities (Wang, 2010), providing less opportunity to project their expertise. Therefore, the following hypothesis is proposed:

H5 Experienced reviewers are less likely to review products with market popularity.

Figure 1 below summarizes the proposed hypotheses:

Fig. 1 Research model

Empirical analysis

Data for analysis

This research adopts a dataset titled “Yahoo! Movies User Ratings and Descriptive Content Information,” which is available from Yahoo! Research (https://webscope.sandbox.yahoo.com/catalog.php?datatype=r), a subsidiary research organization of Yahoo!. The organization provides a variety of datasets to support the research of non-commercial users, such as academic scholars and scientists, after examining their personal information and research purposes. More details about the data request process are available at the Webscope website of Yahoo! Research (webscope.sandbox.yahoo.com). The dataset has been adopted in many studies, including the analytic modeling research of Ebesu and Fang (2017).

The initial data provided by Yahoo! Research comprised six datasets covering users’ ratings, their demographics, and movie information (e.g., movie title/ID/year, GNPP, MPAA, awards nominated). We merged the datasets to prepare a dataset for this study. After handling missing values and errors, we had 211,197 movie reviews by 7642 users in the dataset. The top experienced reviewer contributed 1632 movie reviews to the community, while the least active contributed ten reviews. We sorted the 211,197 reviews and estimated average values of each variable by user to examine the proposed hypotheses. The average number of reviews per user is 27.636, and approximately the top 23% of reviewers contributed more reviews than average to the movie community. In terms of the number of reviews per individual reviewer, the dataset illustrates the presence of experienced reviewers at Yahoo! Movies: a small number of reviewers contributed a substantially large portion of the reviews. As shown in Table 2, the top 5% contributed 29.5% of all reviews, while the top 20% contributed 53.7%, more than half of the reviews.

Table 2 Distribution of reviews by ranks

Figure 2 illustrates the overall curve fitting the distribution of reviews for the top 2000 reviewers. It suggests that the growth of the contribution slows as the size of the group increases, indicating that the large majority of reviewers contributed a considerably smaller number of reviews to Yahoo! Movies than a few top reviewers did.

Fig. 2 Distribution of review contributions by rank
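
For readers who wish to reproduce the contribution distribution behind Table 2 and Fig. 2, the following is a minimal sketch in Python, assuming the merged review records sit in a pandas DataFrame; the file name and the user_id column are illustrative assumptions, not names from the dataset documentation.

```python
import pandas as pd

# One row per review; 'user_id' is a hypothetical column name for the
# reviewer identifier in the merged Yahoo! Research datasets.
reviews = pd.read_csv("yahoo_movies_reviews.csv")

# Count reviews per reviewer and rank reviewers from most to least active.
per_user = reviews.groupby("user_id").size().sort_values(ascending=False)

print(f"Total reviews: {per_user.sum()}, reviewers: {per_user.size}")
print(f"Mean reviews per reviewer: {per_user.mean():.3f}")

# Share of all reviews contributed by the top 5%-20% of reviewers (Table 2).
for pct in (0.05, 0.10, 0.15, 0.20):
    top_n = int(per_user.size * pct)
    share = per_user.iloc[:top_n].sum() / per_user.sum()
    print(f"Top {pct:.0%} of reviewers contributed {share:.1%} of reviews")
```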

Construct operationalization

Using the variables in the dataset, we operationalized the major constructs of this study. Table 3 summarizes the definitions of these variables. To measure the experience of reviewers, we used the total number of reviews each reviewer contributed to Yahoo! Movies, operationalized as ReviewNumber. Although there are additional factors that describe experienced reviewers in online communities (e.g., quality of reviews), this number has been used as a key determinant of the group in the extant empirical studies on experienced reviewers (Banerjee & Chua, 2018; Buchanan et al., 2014; Nguyen et al., 2020; Peddibhotla & Subramani, 2007; Subramani & Peddibhotla, 2003; Wang, 2010). For example, Wang (2010) estimated the experience of reviewers solely based on the number of reviews submitted to Yelp, Citysearch, and Yahoo! Local. Similarly, Peddibhotla and Subramani (2007) categorized reviewer groups by their experience given the number of reviews contributed to Amazon.com.

Table 3 Variable definitions

Rating score is operationalized as Rating. In Yahoo! Movies, ratings range from A+ to F, comprising 13 different rating scores. Thus, A+ is converted to 13 while F is converted to 1, the same conversion scheme as in many prior studies on online reviews (Dellarocas et al., 2004; Duan et al., 2008; Simmons et al., 2011). Extremely positive rating is operationalized as ExtPositive, which stands for the most positive rating (13). We coded ExtPositive as 1 if the rating is 13 and 0 otherwise. Extremely negative rating is operationalized as ExtNegative, which refers to the lowest score (1); it is coded as 1 if the rating is the lowest (i.e., 1) and 0 otherwise. Concerning the constructs for product characteristics, we operationalized product sophistication as AwardsNominated, using the number of award nominations of the movies rated by a reviewer, which is known as a reasonable proxy for product (or service) sophistication (Bonner et al., 2003; Goldberg & Vashevko, 2013; Seng & Geertsema, 2018). Finally, market popularity is operationalized as GNPP, using the Global Non-Personalized Popularity (GNPP) index, a market popularity index used at Yahoo! Movies. To test the interaction effect of female reviewers in the proposed hypotheses, we created the variable Female, coded as 1 if a reviewer is female and 0 otherwise (i.e., male).
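
As an illustration of this coding scheme, the sketch below converts letter grades to the 13-point scale and derives the dummy variables; the letter-to-score mapping follows the A+ = 13, F = 1 convention above, while the input column names (letter_grade, gender, user_id) are hypothetical.

```python
import pandas as pd

# Letter grades with +/- variation mapped onto 13..1 (A+ = 13, ..., F = 1),
# mirroring the conversion scheme described above.
GRADE_TO_SCORE = {
    "A+": 13, "A": 12, "A-": 11,
    "B+": 10, "B": 9,  "B-": 8,
    "C+": 7,  "C": 6,  "C-": 5,
    "D+": 4,  "D": 3,  "D-": 2,
    "F": 1,
}

def operationalize(reviews: pd.DataFrame) -> pd.DataFrame:
    out = reviews.copy()
    out["Rating"] = out["letter_grade"].map(GRADE_TO_SCORE)
    out["ExtPositive"] = (out["Rating"] == 13).astype(int)  # A+ only
    out["ExtNegative"] = (out["Rating"] == 1).astype(int)   # F only
    out["Female"] = (out["gender"] == "f").astype(int)      # 1 = female
    # ReviewNumber: total reviews contributed by each reviewer.
    out["ReviewNumber"] = out.groupby("user_id")["Rating"].transform("size")
    return out
```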

Data analysis

The following five empirical models were constructed to test the proposed hypotheses. Each model examines differences in online review behaviors, including rating propensity and product selection for review, given a set of control variables that affect the dependent variables. The controls include BirthYear, MovieYear, MPAA, AverageRating, and RaterNumber, which were barely considered in the extant literature on experienced reviewers. BirthYear is the year of birth of a reviewer, an inverse measure of the reviewer's age. MovieYear indicates the release year of a movie, an inverse measure of the newness of the movie. As many prior studies suggested, newness is a key indicator of perceived product quality, affecting online review valence and product sales (Choi et al., 2018). MPAA is the Motion Picture Association of America (MPAA) rating, indicating the suitability of a movie for certain age groups; a movie with a higher rating in the system generally has more content inappropriate for young audiences, which should influence rating scores for the movie. AverageRating is the average rating score of a movie, while RaterNumber is the total number of reviews for a movie. These represent the overall evaluation of and interest in a movie across the entire community, potentially affecting the review valence of each reviewer (Wang, 2010).

Model 1 testing H1 and H3a

$$\begin{array}{c}Rating_i\:=\:\alpha_0\:+\:\alpha_1ReviewNumber_i\:+\:\alpha_2Female_i\:+\:\alpha_3ReviewNumber_i\ast Female_i\\+\:\alpha_4BirthYear_i\:+\:\alpha_5AwardsNominated_k\:+\:\alpha_6GNPP_k\:+\:\alpha_7MovieYear_k\\+\:\alpha_8MPAA_k\:+\:\alpha_9AverageRating_k\:+\:\alpha_{10}RaterNumber_k\end{array}$$

Model 2 testing H2a and H3b

$$\begin{array}{c}ExtPositive_i\:=\:\beta_0\:+\:\beta_1ReviewNumber_i\:+\:\beta_2Female_i\:+\:\beta_3ReviewNumber_i\ast Female_i\\+\:\beta_4BirthYear_i\:+\:\beta_5AwardsNominated_k\:+\:\beta_6GNPP_k\:+\:\beta_7MovieYear_k\\+\:\beta_8MPAA_k\:+\:\beta_9AverageRating_k\:+\:\beta_{10}RaterNumber_k\end{array}$$

Model 3 testing H2b and H3c

$$\begin{array}{c}ExtNegative_i\:=\:\gamma_0\:+\:\gamma_1ReviewNumber_i\:+\:\gamma_2Female_i\:+\:\gamma_3ReviewNumber_i\ast Female_i\\+\:\gamma_4BirthYear_i\:+\:\gamma_5AwardsNominated_k\:+\:\gamma_6GNPP_k\:+\:\gamma_7MovieYear_k\\+\:\gamma_8MPAA_k\:+\:\gamma_9AverageRating_k\:+\:\gamma_{10}RaterNumber_k\end{array}$$

Model 4 testing H4

$$\begin{array}{c}AwardsNominated_k\:=\:\delta_0\:+\:\delta_1ReviewNumber_i\:+\:\delta_2Female_i\:+\:\delta_3BirthYear_i\:+\:\delta_4Rating_i\\+\:\delta_5GNPP_k\:+\:\delta_6MovieYear_k\:+\:\delta_7MPAA_k\:+\:\delta_8AverageRating_k\\+\:\delta_9RaterNumber_k\end{array}$$

Model 5 testing H5

$$\begin{array}{c}GNPP_k\:=\:\varepsilon_0\:+\:\varepsilon_1ReviewNumber_i\:+\:\varepsilon_2Female_i\:+\:\varepsilon_3BirthYear_i\:+\:\varepsilon_4Rating_i\\+\:\varepsilon_5AwardsNominated_k\:+\:\varepsilon_6MovieYear_k\:+\:\varepsilon_7MPAA_k\:+\:\varepsilon_8AverageRating_k\\+\:\varepsilon_9RaterNumber_k\end{array}$$

where i represents the reviewer and k represents the movie.

Model 1 tests Hypotheses 1 and 3a. It includes Rating as the dependent variable, representing the rating score of a reviewer. ReviewNumber is an independent variable representing the total number of reviews contributed by a reviewer. It indicates the experience of the reviewer, testing Hypothesis 1, which concerns the relationship between experience and the rating propensity of reviewers. The other independent variable in the model is ReviewNumber*Female, the interaction term of ReviewNumber and Female. It examines Hypothesis 3a, testing how the relationship between experience and rating differs by gender.

In Model 2, which tests Hypotheses 2a and 3b, the dependent variable is ExtPositive while the independent variables are ReviewNumber and ReviewNumber*Female. ReviewNumber examines the relationship between experience and the tendency to give an extremely positive rating, while the interaction term examines how gender affects that relationship. Model 3 has the same independent variables as Model 2, but its dependent variable is ExtNegative. It tests whether experience is related to the propensity to select an extremely negative review valence and the interaction effect of gender, concerning Hypotheses 2b and 3c. To test Hypothesis 4, Model 4 adopts AwardsNominated, which represents product sophistication, as its dependent variable. The model tests how a reviewer's experience is related to the literary quality of the movies selected for reviews. Model 5 tests Hypothesis 5, concerning how experience is associated with the market popularity of the selected movies.

Table 4 summarizes the descriptive statistics of the variables discussed, and Table 5 illustrates their correlations. Overall, there is no significant correlation among the variables, although GNPP and AwardsNominated have a relatively strong correlation (cf. the VIFs of the models range between 1.20 and 1.79, indicating no significant multicollinearity). Since the dataset used in this study is unbalanced panel data without a time variable, in which an individual reviewer has multiple review records on different movies, the models above are likely to violate OLS (Ordinary Least Squares) assumptions. Accordingly, we conducted a Breusch-Pagan test for the linear models (i.e., Models 1, 4, and 5) and found heteroscedasticity in the models. This suggests that OLS is not a reliable estimator for these models and an alternative estimator is necessary, such as a random effects model with Huber/White standard error correction, which is a more rigorous estimator than a plain random effects model when a model on a large dataset is heteroscedastic (Hayes & Cai, 2007). For Models 2 and 3, which have binary dependent variables (i.e., ExtPositive and ExtNegative), we adopted random effects logistic regression, a more rigorous estimator than plain logistic regression in panel data analysis because it allows control for time-invariant characteristics (Williams, 2009, 2018).

Table 4 Descriptive statistics of key variables
Table 5 Correlation matrix
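
To make the estimation strategy concrete, the following is a minimal sketch, assuming the operationalized review-level data are in a pandas DataFrame with the column names introduced in Table 3; the file name, the user_id column, and the pseudo-time index are illustrative assumptions rather than the exact procedure used for the reported results.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from linearmodels.panel import RandomEffects

# One row per (reviewer, movie) observation with the Table 3 variables.
df = pd.read_csv("yahoo_movies_operationalized.csv")
df["ReviewNumber_x_Female"] = df["ReviewNumber"] * df["Female"]

regressors = ["ReviewNumber", "Female", "ReviewNumber_x_Female", "BirthYear",
              "AwardsNominated", "GNPP", "MovieYear", "MPAA",
              "AverageRating", "RaterNumber"]

# Breusch-Pagan test on the pooled OLS residuals of Model 1.
X = sm.add_constant(df[regressors])
ols = sm.OLS(df["Rating"], X).fit()
bp_lm, bp_pvalue, _, _ = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan LM = {bp_lm:.2f}, p = {bp_pvalue:.4f}")

# Random effects (reviewer-level) estimation with Huber/White-robust
# standard errors. linearmodels expects an entity-time MultiIndex; since
# the data carry no time stamp, a within-reviewer counter serves as a
# pseudo index.
df["obs"] = df.groupby("user_id").cumcount()
panel = df.set_index(["user_id", "obs"])
model1 = RandomEffects(panel["Rating"], sm.add_constant(panel[regressors]))
print(model1.fit(cov_type="robust"))

# Models 2 and 3 (binary ExtPositive/ExtNegative) call for random effects
# logistic regression, available in, e.g., Stata's xtlogit or R's
# lme4::glmer; they are not shown here.
```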

Estimation results

As mentioned above, we performed a random effects regression with Huber/White standard error correction for Models 1, 4, and 5, and a random effects logistic regression for Models 2 and 3. Due to the nature of the dependent variable of Model 1, which is bounded between the two extreme ratings, we also performed a random effects Tobit regression to check the robustness of the results (Mudambi & Schuff, 2010). The hypothesis test results remain consistent with those from the random effects regression with Huber/White standard error correction. Table 6 below summarizes the major analysis results, including the coefficients and P values of the variables.

Table 6 Regression results

In Model 1, the coefficient of ReviewNumberi (α1) is significant (P < 0.01) and negative. This indicates that as reviewers become more experienced, they tend to leave lower rating scores, supporting Hypothesis 1. The coefficient of Femalei (α2) is statistically significant (P < 0.01) and positive, while that of the interaction term ReviewNumberi*Femalei (α3) is significant (P < 0.01) and negative. This suggests that although female reviewers tend to give a higher rating, the negative relationship between experience and rating scores is stronger in the female group, which is significant but in the opposite direction to Hypothesis 3a.

Model 2 tested the relationship between the experience of online reviewers and the propensity to select the extremely positive score (i.e., A+) in their ratings. The coefficient of ReviewNumberi (β1) is significant (P < 0.01) and negative. This indicates that as reviewers become more experienced, they are less likely to choose the extremely positive score, corresponding to the prediction of Hypothesis 2a. Concerning Hypothesis 3b, Femalei (β2) is statistically significant (P < 0.01) and positive, while ReviewNumberi*Femalei (β3) is significant (P < 0.01) and negative. This implies that although female reviewers are more likely to give the extremely positive score, the negative relationship between experience and the propensity to choose the extremely positive score is stronger in the female group, which is significant but opposite to the prediction of Hypothesis 3b.

Model 3 examined how experience is related to the probability of choosing the extremely negative score (i.e., F) in online reviews. The coefficient of ReviewNumberi (γ1) is significant (P < 0.01) and negative, illustrating that more experienced online reviewers are less likely to select the extremely negative score, supporting Hypothesis 2b. Testing the interaction effect of gender (i.e., female) on the relationship between experience and the probability of selecting the extremely negative score, the coefficient of ReviewNumberi*Femalei (γ3) is not statistically significant, although Femalei (γ2) is significant (P < 0.01) and negative. This suggests that female reviewers overall are less likely to give the extremely negative score, but this propensity does not distinguish female experienced reviewers from male experienced reviewers, thus not supporting Hypothesis 3c.

The coefficient of ReviewNumberi (δ1) in Model 4, which has AwardsNominatedk as its dependent variable, is significant (P < 0.01) and positive. It suggests that more experienced online reviewers tend to rate products with sophistication, supporting Hypothesis 4. Given that the coefficient of ReviewNumberi (ε1) in Model 5 is significant (P < 0.01) and negative, experienced reviewers are less likely to review products with higher market popularity, operationalized as GNPPk. Therefore, Hypothesis 5 is supported. Table 7 below summarizes the hypothesis test results.

Table 7 Hypothesis test results
Table 8 Results of MANOVA between experienced and novice reviewers

Additional analysis

To find more details about the test results, we conducted an additional analysis, adopting MANOVA (Multivariate Analysis of Variance) to examine differences in multiple dependent variables, including AvRating, ExtPositive, ExtNegative, AwardsNominated, and GNPP. Because the sample sizes of the experienced and novice reviewer groups are unequal, we conducted Box's M test to check the equality of covariance across the two groups. Not surprisingly, we found unequal covariance in the outcome variables, which is common in large datasets. Therefore, we used Pillai's trace, instead of Wilks' lambda, to check the statistical significance of the differences between groups (Tabachnick et al., 2001). We performed the analysis with different definitions of experienced reviewers, namely the top 15%, 20%, and 25% of reviewers in terms of the number of reviews submitted to the community. As summarized in Table 8, the differences in the variables are statistically significant (p < 0.01) and correspond to the hypothesis test results for Hypotheses 1, 2a, 2b, 4, and 5. One of the most notable differences between the two groups is in AwardsNominated. The average number of award nominations of the movies selected by experienced reviewers is approximately twice that of the movies selected by novice reviewers, indicating a clear taste difference between the two groups.
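
The MANOVA itself is straightforward to reproduce; below is a minimal sketch assuming a per-reviewer pandas DataFrame with the averaged outcome variables (hypothetical file and column names). statsmodels reports Pillai's trace alongside the other multivariate statistics, and Box's M is available in third-party packages (e.g., pingouin).

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# One row per reviewer with the averaged outcomes used in this analysis.
user_df = pd.read_csv("yahoo_movies_by_reviewer.csv")

# Flag the top 20% of reviewers by review count as experienced; rerun with
# 0.15 and 0.25 quantile cutoffs for the alternative definitions.
cutoff = user_df["ReviewNumber"].quantile(0.80)
user_df["Experienced"] = (user_df["ReviewNumber"] >= cutoff).astype(int)

manova = MANOVA.from_formula(
    "AvRating + ExtPositive + ExtNegative + AwardsNominated + GNPP"
    " ~ Experienced",
    data=user_df,
)
# mv_test() reports Pillai's trace, which is read here instead of Wilks'
# lambda because the group covariances are unequal.
print(manova.mv_test())
```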

Table 9 Results of MANOVA between female and male experienced reviewers

The gender differences in AvRating, ExtPositive, and ExtNegative were estimated, adopting the same grouping scheme as the previous MANOVA. Overall, female experienced reviewers tend to be stricter than male experienced reviewers (Table 9). The average ratings and the ratio of extremely positive ratings of female experienced reviewers are consistently lower across the top 15%, 20%, and 25% reviewer groups. The ratio of extremely negative ratings is higher than that of male reviewers in the same groups, and this difference is statistically significant (P < 0.01). This result is consistent with the overall findings concerning the rating propensity of female experienced reviewers, who are less generous than male experienced reviewers, although it differs from the regression result for Hypothesis 3c.

Discussion

This research examines experienced reviewers in online review communities, who potentially affect product reputation and ultimately market performance, as a small percentage of reviewers conduct more reviews than all the rest. In terms of rating propensity, support for Hypothesis 1 indicates that the experienced tend to leave lower scores in the rating systems. This result corresponds to the marketing literature illustrating a positive relationship between consumers' knowledge and their market demandingness (House et al., 2005; Prahalad & Ramaswamy, 2004; Wikström, 1996), which makes such demanding consumers more difficult to satisfy. Since the experienced have higher expectations and demands on movies, attributable to their knowledge and expertise, they tend to choose lower ratings in their evaluations. Support for Hypotheses 2a and 2b suggests that the experienced tend to show less extremity in their review valence. Although this finding contradicts the literature arguing that experienced reviewers have a strategic purpose to manipulate product reputation and sales (i.e., fake reviewers) (Luca & Zervas, 2016; Pinch & Kesler, 2011), it corresponds to the extant literature concluding, based on their neutral rating propensity, that they tend to have motivations to help other people, including social affiliation and altruism. As such, experienced reviewers should want to position themselves as personable individuals who are good, intelligent, and fair in their communities (Wang, 2010), and they would be less likely to choose extreme ratings, which contradict the desired image. In addition, their altruistic motivations would prompt them to choose neutral ratings because community members generally perceive extreme ratings as less helpful (Hunt & Smith, 1987; Mudambi & Schuff, 2010; Schmidt & Eisend, 2015). This propensity also corresponds to the “regression to the mean” of online reviews, suggesting that as a product receives more reviews, its rating moves toward a neutral rating, neither extremely positive nor extremely negative (Cloney et al., 2018; DellaPosta & Kim, 2016). As experienced reviewers contribute more reviews, the ratio of extreme ratings should decrease in their reviews.

The surprising findings regarding Hypotheses 3a, 3b, and 3c show not only that gender differences exist in online reviews, but also that there are gender differences in the relationship between experience and rating propensity, which run opposite to the extant literature suggesting the higher generosity of females. Although females were found to give higher ratings and more extremely positive scores, corresponding to the extant studies reporting that females are more generous than males in their evaluations (Cox & Deck, 2006; Eckel & Grossman, 2008; Smith & Mangold, 2012), the interaction of gender with the relationship between experience and rating propensity was both surprising and profound. Our findings showed that as female reviewers become more experienced, the negative relationship between experience and rating scores becomes more substantial. An explanation for this finding could relate to gender, power, and online communities (Hemphill & Otterbacher, 2012). It is possible that power asymmetry in the online world affects female experienced reviewers: wanting their voices heard, they adjust their style to be better received (Hemphill & Otterbacher, 2012), and task competence, conveyed through objective success as an influence agent (Carli, 2001), becomes a focus for the female experienced that outweighs the generosity of the female sample as a whole and is significant enough to differentiate them from the male experienced. Concerning the tendency to choose the extremely negative rating, females are less likely to do so than males, which corresponds to the generosity literature; however, gender (i.e., female) was not found to have a significant interaction effect on the relationship between experience and this tendency. This indicates that the negative effect of experience on the tendency does not differ by gender. This can be explained by the literature on gender differences among online reviewers, where the reviews of well-known female members were shown to exhibit male characteristics (Hemphill & Otterbacher, 2012; Otterbacher, 2012).

Support for Hypotheses 4 and 5 reveals the preferred products of experienced reviewers. They prefer products with sophistication, in which they can demonstrate their expertise (Pinch & Kesler, 2011) and differentiate themselves from the novice, who would not be interested in such products (Mathwick & Mosteller, 2017; Wang, 2010). For the same reasons, they are less likely to select products with market popularity because reviewing popular products is less likely to satisfy the aforementioned motivations.

Theoretical contributions

This research enriches the literature on online reviewers with the following theoretical contributions. First, this study provides empirical evidence that a small number of reviewers contribute most reviews to the community and that they tend to have altruistic rather than strategic motivations, given their rating propensity. Second, although some extant literature discussed the rating propensity of the experienced (Dellarocas & Narayan, 2006; Wang, 2010), few studies addressed their subject selection pattern. This study adds a novel dimension to the literature regarding this pattern, illustrating that experienced reviewers prefer reviewing sophisticated products while avoiding those with market popularity, a tendency that would be induced by their expertise and desired social image (Mathwick & Mosteller, 2017; Wang, 2010). In particular, this difference is found to be highly substantial in that the products reviewed by experienced reviewers have approximately twice the sophistication of those reviewed by novice reviewers. Third, the findings of this study confirm the positive relationship between expertise and demandingness on products or services (House et al., 2005; Prahalad & Ramaswamy, 2004) by showing that experienced reviewers, who should be more knowledgeable, tend to give lower ratings. Fourth, this study extends the extant literature on experienced reviewers by adding knowledge about gender differences in rating propensity. This is one of the first attempts to investigate how gender affects the propensity of experienced online reviewers, specifically showing a profound gender difference in rating propensity when a female reviewer happens to be experienced. It illustrates that female experienced reviewers are less generous than regular female members and that their generosity decreases more dramatically than that of males as they gain experience in terms of the amount of contributions. Lastly, this study provides a more comprehensive model for experienced reviewer research, including additional influential factors that determine online reviews, such as the demographics of reviewers, the newness of products, and the average rating and market popularity of products, which were rarely considered in the extant literature but were found to have significant relationships with review behaviors in this study.

Managerial contributions

The findings of this study offer several managerial implications to practitioners. First, experienced reviewers in online communities produce a substantial number of reviews. For example, the 20% most experienced reviewers were found to contribute more than 50% of all reviews at Yahoo! Movies. This finding corresponds to the extant literature on Amazon.com (Peddibhotla & Subramani, 2007), CitySearch, Yahoo! Local, and Yelp (Wang, 2010), confirming the substantial impact of experienced reviewers on online review platforms. Knowing the influence of the experienced, online platform managers can take steps to increase the richness of the reviews, for example, by considering multidimensional rating systems (Chen et al., 2017). Some platforms (i.e., Rotten Tomatoes and Metacritic) have already acknowledged experienced reviewers and specified qualifications for such status. Product manufacturers could consider partnerships with online platform managers to shape multidimensional rating criteria or provide system input; IMDb has initiated an IMDbPro feature for enhanced communication and collaboration between the industry and experienced reviewers.

Second, this study verifies that the experienced in online communities are unlikely to have a strategic goal, given their rating propensity and product selection pattern, even though the communities have an environment favorable to fake reviewers with strategic goals (Luca & Zervas, 2016). For example, if they had a strategic goal, they would have rated with extremely positive or negative ratings to manipulate the overall rating as intended. However, the findings of this study suggest that they have a more neutral rating propensity. This may lessen the concerns of the buying public and practitioners who have serious doubts about the reviews of the experienced (Owen, 2011), in particular those in online movie communities (Wilkinson, 2019).

Third, the findings of this study indicate that female reviewers are more generous than males in their rating propensity. However, this generosity decreases more dramatically as their experience increases, and female experienced reviewers tend to be stricter in their rating propensity than male experienced reviewers. Therefore, practitioners may consider this difference in interpreting their reviews or working with female experienced reviewers (e.g., influencers), being aware of a tendency opposite to the previous theory that females are more generous.

Fourth, this study provides evidence that experienced reviewers are similar to professional critics in their rating propensity and product selection pattern. They tend to be stricter but avoid extreme ratings, both extremely positive and negative, and prefer sophisticated products over popular ones. These findings suggest that practitioners may use the reviews of the experienced to understand the viewpoint of professional critics, separate from that of novice consumers. Understanding these trends can help online platform managers make business decisions regarding their rating systems; for example, Netflix replaced its five-star rating system with a parsimonious ‘thumbs up, thumbs down’ system (McAlone, 2017) on the assumption that professional critics would not select extreme ratings while general consumers are more likely to use them (i.e., thumbs up, thumbs down). This approach may hinder the participation of experienced reviewers, who play a major role in developing and maintaining the online review ecosystem with a large quantity of reviews as well as superior-quality knowledge about products, but it should enable the platform to collect the unfiltered opinions of novice consumers in the market.

Conclusion

This research has several limitations that future researchers may consider in designing their research on experienced online reviewers. First, this research was not able to consider the time effect in the analysis due to the lack of time information for each record in the dataset, although the timing of reviews must have an impact on overall ratings. For instance, when the difference between the movie release date and the review date is large (i.e., when reviewers rate old movies), the rating is more likely to be positive, because consumers watch old movies only after a careful selection process gives them confidence that the movies will be satisfactory (Koren, 2010). In addition, as Lee et al. (2015) reported, it is important to consider the effect of others' previous ratings, which influence subsequent ratings. Although our empirical analysis included the average rating of each movie to control for such an effect, it was not possible to consider the ratings over time because of the lack of time information in the data. The lack of a time factor also prevents considering the number of reviews created within a certain period; for example, some reviewers were experienced but not categorized as such because they began writing reviews later than others (and vice versa). Future research can include the time factor in the analysis to enhance the robustness of the findings. Second, this research did not include text reviews in the analysis, which could disclose additional differences between experienced and novice reviewers. For example, the experienced may use fewer emotional terms and more objective information, such as market data and references, in building their arguments. Future research may find more behavioral differences between experienced and novice reviewers by analyzing text review data.