1 Introduction

One of the most powerful sources shaping consumer attitudes toward products and services is word-of-mouth (Schlosser 2011). With the development of internet technology, online consumer reviews have become a popular word-of-mouth (WOM) source for tourists. However, simply offering online reviews is no longer adequate; certain online reviews are perceived as more helpful than others (Schlosser 2011). Indeed, recent studies suggest that helpful reviews are likely both to improve the value of companies that provide customer reviews (Lee et al. 2018) and to attract consumers who are seeking information (Qazi et al. 2016). That is, a travel website that provides more helpful online reviews than competing websites is highly likely to increase its sustainability. As a result, e-commerce research has increasingly paid attention to investigating the underlying content of helpful reviews (Yin et al. 2014).

Online customer reviews consist of both quantitative and qualitative aspects, namely review ratings and the written content that explains those ratings. However, the majority of related research has focused on only one of the two. For example, prior studies have found that online reviews with negative ratings tend to be perceived as more helpful than online reviews with positive ratings (Cao et al. 2011; Sen and Lerman 2007; Willemsen et al. 2011). Other work has examined the qualitative aspect of review helpfulness and found that content readability and the sentiment of the review text are important determinants of review helpfulness (Agnihotri and Bhattacharya 2016). Consumers, however, do not rely solely on ratings; they also read the review’s written content (Chevalier and Mayzlin 2006). In addition, Schlosser (2011) suggested that consumers use the written content of online reviews in conjunction with product ratings to determine a review’s helpfulness. This leads us to examine the quantitative and qualitative aspects of online reviews together and to pose the following research question: Are hotel reviews with the same rating level (positive or negative) perceived as similarly helpful regardless of their written content?

Addressing our research question involves two main objectives. First, we examine how differences in the written content of online hotel reviews with the same rating can lead to systematic differences in how consumers perceive the reviews’ helpfulness. Second, we explore the underlying psychological mechanism that creates these systematic differences in the helpfulness of online hotel reviews. We conducted two experiments to achieve these objectives. Our experiments demonstrate that, when the hotel rating is positive, an online review with both positive and negative written content is perceived as more helpful than an online review with only positive written content. In contrast, we find that, when the hotel rating is negative, an online review with only negative written content is perceived as more helpful than an online review with both positive and negative written content. Our study illustrates that the degree of information diagnosticity stemming from the negativity bias of information in online reviews is the underlying psychological mechanism for consumers. These findings support the existence of a negativity bias in online reviews.

It is important to reveal which types of online reviews are perceived as more helpful by consumers and to highlight the underlying psychological mechanism involved because perceived online review helpfulness and customer loyalty are directly related. In this regard, our findings can provide useful insights for travel websites seeking to develop sustainable business strategies that encourage their consumers to post more helpful online reviews.

In the following sections, we first review the extant literature on online reviews and review helpfulness and then on the negativity bias of information. Thereafter, we develop our hypotheses based on the literature review. Next, we present the methodology of the two experimental studies designed to test our hypotheses. In the final section, we conclude by discussing our contributions, limitations, and directions for future research.

2 Literature review

2.1 Online review and review helpfulness

Online customer reviews are one of the most easily accessible information sources (Godes and Mayzlin 2009; Agnihotri and Bhattacharya 2016), and they have become an important source of information that influences the consumer decision-making process (Kostyra et al. 2016). Therefore, understanding online customer reviews is becoming increasingly important (Kim et al. 2020; Reyes-Menendez et al. 2019a).

As e-commerce businesses grow, the overload of online customer reviews and the conflicting information they contain can confuse consumers (Hong et al. 2017). Conflicting or spam information in online reviews may decrease the efficiency of consumers’ decision-making (Chen and Tseng 2011). Therefore, it is important for researchers and practitioners to understand how consumers perceive the helpfulness of online reviews (Hao et al. 2010), as this perception can significantly change the consumer decision-making process.

As shown in Table 1, many previous studies have sought to identify the determinants of review helpfulness. However, studies on perceived online review helpfulness still report contradictory findings (Hong et al. 2017). This is because the existing literature has focused on different aspects of online reviews when searching for the types of reviews that are perceived as more helpful for decision-making. Although online reviews consist of both quantitative and qualitative aspects, the extant literature has largely focused on either the quantitative or the qualitative aspect when investigating the relationship between review characteristics and review helpfulness.

Table 1 Literature review on online reviews and review helpfulness

Consumers, however, do not rely solely on ratings; they also take the written content of online reviews into consideration when determining review helpfulness (Chevalier and Mayzlin 2006; Schlosser 2011). This calls for studies that examine the quantitative and qualitative aspects of online reviews together in order to provide a more comprehensive view of the determinants of review helpfulness. Thus, drawing on the negativity bias and information diagnosticity, this study examines the interaction between the quantitative and qualitative aspects of online reviews and how it determines review helpfulness for consumers.

2.2 Negativity bias of information

According to the extant literature, the psychological effects of negative information outweigh those of positive information (Wu 2013). This “negativity bias,” or “positive–negative asymmetry” (Peeters 1971; Taylor 1991), has been repeatedly confirmed in the existing literature (Ito et al. 1998; Rozin and Royzman 2001). A plausible explanation for the negativity bias is that negative information is more distinctive than positive information, which makes it more diagnostic (Skowronski and Carlston 1989). That is, the negativity bias is attributed to the degree of information diagnosticity. Diagnosticity determines the likelihood of information utilization, so inferential biases can arise when people overestimate the diagnostic value of certain information (Herr et al. 1991). Negative information is more diagnostic because it clearly suggests one categorization over other possibilities (Herr et al. 1991). For example, cheating reveals more about a person’s honesty than truth telling does (Wu 2013). Thus, drawing on the negativity bias and information diagnosticity, we develop hypotheses about how reviews with identical ratings can be evaluated differently depending on their written content and how this affects review helpfulness.

3 Hypothesis development

Schlosser (2011) argued that consumers tend to trust online reviews because reviewers do not have a clear incentive or motivation to lie about their consumption experience, which distinguishes reviews from advertising. In this regard, we propose that consumers determine review helpfulness based on cues from review ratings and content rather than on reviewer characteristics. Thus, we expect that the negativity bias of information in online reviews plays a significant role in the reviews’ perceived information diagnosticity, which in turn determines their helpfulness. Following Kempf and Smith (1998), we define the perceived diagnosticity of an online review as the degree to which the consumer believes that the information in the review is useful in evaluating the review’s helpfulness. Perceived helpfulness is defined as the extent to which consumers perceive that a peer-generated seller evaluation can facilitate their purchasing decision process (Mudambi and Schuff 2010; Yin et al. 2014).

The two parts of online consumer reviews, product ratings and written content, combine to indicate the review’s overall valence. However, the majority of research on online reviews focuses only on product ratings. For example, Forman et al. (2008) found that book reviews on Amazon with extreme ratings were perceived as more helpful than reviews with moderate ratings. Mudambi and Schuff (2010) examined the impact of review ratings on review helpfulness and found that reviews with extreme positive or negative ratings are perceived as more helpful for search goods. However, previous studies have also suggested that the qualitative aspect of online reviews is important in determining review helpfulness (Wu et al. 2011). Therefore, building on the negativity bias of information, we expect that the written content of online reviews can lead to systematic differences in reviews’ perceived information diagnosticity even with identical product ratings, which in turn is likely to influence perceived review helpfulness.

Specifically, online reviews with positive product ratings typically contain either only positive written content or mixed content (i.e. positive written content with minor negative information that offers suggestions for service improvement). This is because the majority of consumer-generated online product reviews are neither purely positive nor purely negative (Wu et al. 2011).

For online reviews with positive ratings, we expect that reviews with mixed content are perceived as more diagnostic than reviews with only positive content. Due to the negativity bias, negative information carries greater weight than positive information and is therefore generally perceived as more diagnostic (Skowronski and Carlston 1989). In addition, negative information is usually rarer or more unexpected, which makes it perceived as more useful for decision-making (Fiske and Linville 1980).

Hence, when multiple online reviews have the same positive ratings, the reviews with both positive and negative content are likely to be perceived as more diagnostic than those with only positive content. This, in turn, increases the perceived helpfulness of the online reviews. Therefore, we hypothesize the following:

Hypothesis 1

Reviews with extreme positive ratings that contain both positive and negative content will be perceived as more helpful than those with only positive content because the former’s perceived diagnosticity in determining review helpfulness is higher.

In regard to online reviews with negative product ratings, the reviews can contain either only negative written content or mixed content (i.e. negative written content with some positive information). For online reviews with negative ratings, we expect that reviews with only negative content will be perceived as more diagnostic than those with mixed content.

Consumers tend to search for negative WOM in situations where they lack information and experience (Herr et al. 1991). This is because, according to the negativity bias, extremely negative cues are less ambiguous than positive or neutral ones, especially in product-judgment contexts (Mizerski 1982; Wright 1974). In addition, the theory of information diagnosticity suggests that information is perceived as useful if it helps people reduce the uncertainty and ambiguity involved in decision-making (Feldman and Lynch 1988; Herr et al. 1991). Thus, we propose the following hypothesis:

Hypothesis 2

Reviews with extreme negative ratings that contain only negative content will be perceived as more helpful than those with both positive and negative content because the former’s perceived diagnosticity in determining review helpfulness is higher.

4 Study 1

We conducted Experiment 1 to test Hypothesis 1 by analyzing the effect of review content type on helpfulness as mediated by diagnosticity. The following sections provide details on the experiment.

4.1 Method

4.1.1 Participants and procedure

To test Hypothesis 1, we collected data through a self-administered online survey using respondents drawn from Amazon MTurk. The participants were individuals who were interested in online reviews of hotels. A total of 130 responses were collected, and 115 respondents (63.5% male, 36.5% female) were included in the analysis after 15 unusable responses were removed due to missing data and untrustworthy answers.

The experiment used two stimuli (one-sided positive versus ambivalent review content) for reviews with the same five-star rating. At the beginning of the experiment, we gave the participants a consent form to indicate their agreement to participate in the study. The participants were then randomly assigned to one of the two manipulated review content conditions. They were told that the survey was designed to improve an artificial hotel review site, HotelReviews.com. After reading the survey instructions, the participants were required to read the experimental materials and complete a series of questions. We collected demographic information at the end of the experiment.

4.1.2 Experimental stimuli

The experimental stimuli described the artificial hotel review website, HotelReviews.com. The stimulus material began with an introduction about developing an online hotel review website, followed by an online review of a fictional hotel, the Mon Ami Hotel.

The fictional online review page displayed the hotel name, rating, review title, and review content. Online reviews are mostly composed of titles, ratings, and content (Chua and Banerjee 2017; Tang et al. 2014), and consumers regard a review as more helpful when its title and content are consistent (Zhou et al. 2020). The review content typically covers aspects such as service, room condition, and location (Xie et al. 2011). Based on the previous literature and real online review comments posted on well-known travel websites such as TripAdvisor, we developed stimuli including a review title and content for five-star rated reviews. We manipulated the stimuli by differentiating the review titles and review content types (one-sided positive versus ambivalent content) for five-star rated reviews. Every element of the fictional online review stimuli was identical except for the title and review content.

As shown in Fig. 1, the experimental material for the one-sided positive review content with five-star rating was developed as follows.

Fig. 1 Experimental materials for two five-star ratings with different review contents

Title: The best service ever!

Review Contents: Overall good with high quality rooms. I liked the place and inclusive services. The room was tidy and clean and very comfortable. You will find it cozy. One of the best hotels in the world to stay. Great staff, great service, great views.

Meanwhile, the experimental material for the ambivalent review content with five-star rating was developed as follows.

Title: The great hotel but unprofessional reception desk services.

Review Contents: Overall good with high quality rooms. I liked the place and inclusive services. The room was tidy and clean and very comfortable. You will find it cozy. The only bad thing is the reception desk services. I had to spend a lot of time in the lobby because of the unprofessional desk service.

4.1.3 Pre-test for stimuli

A pre-test was conducted before proceeding to the main test to confirm that the participants perceived the different review content types as intended. Eighty-five participants were recruited through Amazon MTurk in return for financial compensation. A total of 83 responses were retained for the pre-test after excluding participants who did not properly answer the questionnaire. Male and female participants were evenly distributed (males: n = 41, 49.4%; females: n = 42, 50.6%).

The participants were first asked to read the experimental materials, i.e. the online reviews of the fictitious Mon Ami Hotel. Afterward, the participants were asked to respond to the following statement about the valence of the hotel review (Cheung et al. 2012; Xie et al. 2011): “This review includes only positive comments.” The results indicated that the participant group with the one-sided positive review content (Mone-sided = 6.47, SD = 0.117) showed significantly higher scores regarding the valence of the review content than did the participant group with the ambivalent review content (Mambivalent = 4.73, SD = 0.172). These results confirm that the participants perceived the experimental materials as intended (t-value = 8.377). The participants were also asked to respond to a second statement, “This review is biased towards one side.” The results indicated that the participant group with the one-sided positive review content (Mone-sided = 6.51, SD = 0.112) showed higher scores than did the participant group with the ambivalent review content (Mambivalent = 4.60, SD = 0.175). These results further confirm that the participants perceived the valence of the review content as intended even though the reviews had the same five-star rating (t-value = 9.207) (Table 2).

Table 2 Results of pre-test for experimental stimuli of 5-star ratings online reviews

4.1.4 Measures

The measurements for the study constructs are as follows. First, the review content type (one-sided positive or ambivalent content for the five-star ratings) served as the independent variable (X). We identified each condition with the variable X, assigning “1” to the one-sided review content condition and “0” to the ambivalent review content condition. Second, we utilized a mediation variable (M) to measure information diagnosticity. Information diagnosticity was measured with the following items: (1) this review makes it easier for me to make a purchasing decision (e.g. booking a hotel or not); (2) this review enhances my effectiveness in making a purchasing decision; (3) this review is helpful for me to make a purchasing decision; and (4) this review facilitates my purchasing decision. Cronbach’s alpha for the construct was 0.912, indicating that the construct is reliable. We measured diagnosticity using a seven-point scale (ranging from 1 = strongly disagree to 7 = strongly agree). We relied on Qiu et al.’s (2012) measurements of diagnosticity and review content type.

Third, the dependent variable (Y) measured review helpfulness. Helpfulness was measured with the following items: (1) this hotel review is useful for me to evaluate the hotel’s overall quality; (2) this hotel review is useful for me to become familiar with the hotel’s overall quality; and (3) this hotel review is useful for me to understand the hotel’s overall quality. Cronbach’s alpha for the construct was 0.933, indicating that the construct is reliable. We measured helpfulness with a seven-point scale (ranging from 1 = strongly disagree to 7 = strongly agree). We relied on Hu and Chen (2016) to develop the helpfulness measure.
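Purely as an illustration (the analysis itself can be run in any statistical package), the following is a minimal sketch of how Cronbach’s alpha can be computed for multi-item scales such as ours. It assumes the item responses are stored in a pandas DataFrame; the file and column names are hypothetical.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a multi-item scale (rows = respondents, columns = items)."""
    k = items.shape[1]                           # number of items in the scale
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical usage with illustrative column names for the diagnosticity and helpfulness items:
# responses = pd.read_csv("study1_responses.csv")
# alpha_diag = cronbach_alpha(responses[["diag1", "diag2", "diag3", "diag4"]])
# alpha_help = cronbach_alpha(responses[["help1", "help2", "help3"]])
```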

Before testing the hypothesis, a correlation analysis was conducted to check for collinearity among the variables, especially between diagnosticity and helpfulness. According to previous studies, a collinearity issue exists if the correlation coefficient between variables is above 0.80 (Field 2018). As shown in Table 3, the correlation matrix revealed a significant correlation between diagnosticity and helpfulness (r = 0.771, p = 0.000). However, this correlation is below the suggested 0.80 threshold for determining a collinearity issue.

Table 3 Correlation matrix

An additional analysis was conducted to check for collinearity using the variance inflation factor (VIF) and the tolerance level. The VIF indicates whether a predictor has a strong linear relationship with the other predictors. A collinearity issue exists if the largest VIF value is greater than 10, but, as shown in Table 4, the VIF in our analysis was 1.059, implying that there was no collinearity issue (Bowerman and O’Connell 1990). Additionally, the tolerance value was 0.945, well above the 0.1 threshold below which collinearity is considered a problem. Based on these analyses, we concluded that there was no collinearity issue in our data and thus proceeded to the hypothesis test.

Table 4 Results of the collinearity test
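As an illustration only, the collinearity checks described above (correlation matrix, VIF, and tolerance) could be reproduced along the following lines; the DataFrame and column names are assumptions, not the original analysis script.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_check(data: pd.DataFrame, predictors: list) -> pd.DataFrame:
    """Correlation matrix, VIF, and tolerance for the predictors of helpfulness."""
    print(data[predictors].corr())            # flag any coefficient at or above 0.80
    X = sm.add_constant(data[predictors])     # design matrix with an intercept column
    rows = []
    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(X.values, i)
        rows.append({"predictor": name, "VIF": vif, "tolerance": 1.0 / vif})
    return pd.DataFrame(rows)                 # VIF > 10 or tolerance < 0.1 signals collinearity

# Hypothetical usage: content_type coded 1/0, diagnosticity as the item average
# collinearity_check(responses, ["content_type", "diagnosticity"])
```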

4.2 Results

The primary goal of this study was to estimate the pathways of influence from review content type to helpfulness, mediated by diagnosticity. To this end, we first conducted a t-test to compare the two means for the one-sided and ambivalent review content conditions under positive review ratings. For the same positive review rating, as shown in Fig. 2, participants who were given the ambivalent review content perceived higher helpfulness (Mambivalent = 5.5862, SE = 0.8903) than those who were given the one-sided positive review content (Mone-sided = 4.9298, SE = 1.3506). This difference in perceived helpfulness was significant (t(133) = − 3.082, p = 0.003). This means that although the online reviews had the same five-star positive rating, reviews with both positive and negative content were perceived as more helpful than reviews with only positive content.
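A minimal sketch of this group comparison is shown below; it assumes the helpfulness items have been averaged into a single column and the content-type condition is coded 1 (one-sided) versus 0 (ambivalent), with names chosen for illustration only.

```python
from scipy import stats

def compare_content_types(data, dv="helpfulness", condition="content_type"):
    """Independent-samples t-test comparing the two review-content conditions."""
    one_sided = data.loc[data[condition] == 1, dv]
    ambivalent = data.loc[data[condition] == 0, dv]
    t, p = stats.ttest_ind(one_sided, ambivalent)   # standard equal-variance t-test
    print(f"M_one-sided = {one_sided.mean():.3f}, M_ambivalent = {ambivalent.mean():.3f}")
    print(f"t = {t:.3f}, p = {p:.3f}")
```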

Fig. 2 Result of t-test in positive review ratings

We then conducted the mediation test to examine the mediating role of diagnosticity in the relationship between review content type and helpfulness by applying the Hayes PROCESS macro to conduct a mediation analysis (Preacher and Hayes 2008). Figure 3 displays this mediation model for the between-participant design in path-diagram form. The diagram in Fig. 3 represents three linear equations that can be used to estimate the various components involved in the process, assuming M and Y are modeled as continuous outcomes:

$$M_{i} = a_{0} + aX_{i} + e_{M_{i}} \quad (1)$$
$$Y_{i} = c^{\prime}_{0} + c^{\prime}X_{i} + bM_{i} + e_{Y_{i}}^{*} \quad (2)$$
$$Y_{i} = c_{0} + cX_{i} + e_{Y_{i}} \quad (3)$$

where Y is the “level of helpfulness,” X is the “review content type,” and M is the “mediator” (level of diagnosticity). The a0, c′0, and c0 terms are the regression intercepts, e denotes the estimation error, and the asterisk indicates that e*Yi and eYi are not the same estimate. We use i to denote the observation number.

Fig. 3 Mediation model in path diagram form for the five-star rating reviews. Note: Total effect (c) = direct effect (c′) + indirect effect (a * b). p < 0.01***, p < 0.05**, p < 0.10*

In Fig. 3, c represents the total effect of X → Y, whereas c′ represents the direct effect of X → Y after controlling for the proposed mediator. The independent variable’s effect on the mediator is represented by a, and the mediator’s effect on the dependent variable (controlling for the independent variable) is represented by b. Finally, we calculate the indirect effect as the product a * b. In line with Preacher and Hayes (2004), we performed bootstrapping to test the statistical significance of the indirect effect (a * b).
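The PROCESS macro runs in SPSS/SAS/R; purely as an illustration of the logic in Eqs. (1)–(3), the sketch below estimates the a, b, c, and c′ paths with ordinary least squares and builds a percentile bootstrap confidence interval for the indirect effect a * b (PROCESS itself uses a bias-corrected variant). The variable names and DataFrame are assumptions, not the authors’ original script.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simple_mediation(data: pd.DataFrame, x: str, m: str, y: str,
                     n_boot: int = 5000, seed: int = 0) -> dict:
    """Estimate Eqs. (1)-(3) and bootstrap the indirect effect a * b."""
    a = smf.ols(f"{m} ~ {x}", data=data).fit().params[x]        # Eq. (1): X -> M
    fit_y = smf.ols(f"{y} ~ {x} + {m}", data=data).fit()        # Eq. (2): X, M -> Y
    b, c_prime = fit_y.params[m], fit_y.params[x]
    c = smf.ols(f"{y} ~ {x}", data=data).fit().params[x]        # Eq. (3): total effect
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):                                     # resample rows with replacement
        sample = data.sample(n=len(data), replace=True, random_state=rng)
        a_s = smf.ols(f"{m} ~ {x}", data=sample).fit().params[x]
        b_s = smf.ols(f"{y} ~ {x} + {m}", data=sample).fit().params[m]
        boot.append(a_s * b_s)
    ci_low, ci_high = np.percentile(boot, [2.5, 97.5])          # 95% percentile CI for a * b
    return {"a": a, "b": b, "c": c, "c_prime": c_prime,
            "indirect": a * b, "ci95": (ci_low, ci_high)}

# Hypothetical usage for Study 1 (five-star reviews):
# simple_mediation(responses, x="content_type", m="diagnosticity", y="helpfulness")
```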

We conducted the mediation test based on the analyses in Hayes (2009) and Rucker et al. (2011). The mediation analysis revealed that the effect of review content type on helpfulness is mediated by diagnosticity for five-star ratings. Specifically, the total effect of review content type on helpfulness was significant (c = − 0.656, t = − 3.082, p = 0.003). The effect of ambivalent content on diagnosticity was also significant (a = − 0.455, t = − 2.575, p = 0.011). The relationship between diagnosticity and helpfulness was positive and significant (b = 0.909, t = 12.200, p = 0.000). The direct effect of review content type on helpfulness was marginally significant (c′ = − 0.242, t = − 1.678, p = 0.096). Finally, the estimated indirect effect of review content type on helpfulness mediated by diagnosticity was significant (a * b = − 0.414, 95% CI [− 0.7841, − 0.1039]; Table 5). We used a bias-corrected bootstrapping method to compute the indirect effect, which indicated that the mediating effect was significantly different from zero at p < 0.05, as the CI did not contain zero (Preacher and Hayes 2008). The results revealed that diagnosticity partially mediated the effect of review content type on helpfulness (c′ < c). Therefore, we found support for the first hypothesis. In summary, our mediation analysis indicated that ambivalent review content indirectly increases helpfulness through its positive effect on diagnosticity.

Table 5 Results of mediation analysis

5 Study 2

The purpose of Experiment 2 was to test Hypothesis 2, which predicts that reviews with extreme negative ratings and one-sided negative review content will be perceived as more helpful than those with ambivalent review content because the former’s perceived diagnosticity is higher. We conducted an experiment to analyze the effect of review content type on helpfulness through diagnosticity for one-star ratings. The following sections provide details on this second experiment.

5.1 Method

5.1.1 Participants and procedure

To test Hypothesis 2, we collected data through a self-administered online survey using respondents drawn from Amazon MTurk (as in the first experiment). The participants were individuals who were interested in online hotel reviews. We collected a total of 120 responses and included 108 participants (54.6% male, 45.4% female) in the study; 12 responses were removed as unusable due to missing data and untrustworthy answers.

The experimental design was the same as in the five-star rating experiment, except that the two experimental stimuli (one-sided negative versus ambivalent review content) were developed for one-star ratings. The experiment proceeded in the same order as Experiment 1; please refer to Sect. 4.1.1 for the experimental design, as Experiments 1 and 2 were identical except for the star rating level and content tone.

5.1.2 Experimental stimuli

The stimuli for Experiment 2 were developed using the same procedure as in Experiment 1. The difference was that the two review content types in Experiment 2 were one-sided negative and ambivalent content, both paired with the same one-star rating.

As shown in Fig. 4, the experimental material for the one-sided negative review content with one-star rating was as follows.

Fig. 4 Experimental materials for the one-star ratings

Title: “The worst service ever”

Review Contents: “The room and shower were very cold and dirty. Pillows were like rock, mirror had lipstick marks on. Manager is not friendly at all. One of the worst hotels in the world to stay. Unfriendly staff, bad service, and dirty rooms.”

Meanwhile, the experimental material for the ambivalent review content was as follows.

Title: “The worst hotel except the location”

Review Contents: “The room and shower were very cold and dirty. Pillows were like rock, mirror had lipstick marks on. Manager is not friendly at all. The only good thing is the location. It was easy to get to and all the sights were within walking distance. And some of steps were fine.”

5.1.3 Pre-test for stimuli

A pre-test was conducted to confirm that the participants perceived the different experimental materials as intended, as in Experiment 1. A total of 85 participants were recruited through Amazon MTurk in return for financial incentives. Male and female participants were evenly distributed (males: n = 43, 50.6%; females: n = 42, 49.4%).

We followed the same pre-test procedure as in Experiment 1. The participants were asked to read the experimental materials and then respond to the statement (Cheung et al. 2012), “This review includes only negative comments.” The participant group with the one-sided negative review content (Mone-sided = 6.75, SD = 0.080) showed significantly higher scores than did the participant group with the ambivalent review content (Mambivalent = 6.07, SD = 0.137). This showed that the experimental materials were perceived as intended (t-value = 4.260). The participants were also asked to respond to a second statement, “This review is biased towards one side.” The results indicated that the participant group with the one-sided negative review content (Mone-sided = 6.68, SD = 0.090) showed higher scores than did the participant group with the ambivalent review content (Mambivalent = 6.14, SD = 0.128). This also confirmed that the experimental materials were perceived as intended (t-value = 3.441) (Table 6).

Table 6 Results of pre-test

5.1.4 Measures

The constructs measured in the second experiment were the same as those used in the first experiment. The independent variable (X) was review content type (i.e. one-sided negative or ambivalent content for one-star ratings). We identified each condition with the variable X, assigning a value of 1 to the one-sided review content condition and a value of 0 to the ambivalent review content condition. We measured diagnosticity and helpfulness in the same way as in the first experiment. The Cronbach’s alpha values for these measures indicate that they are reliable (diagnosticity: α = 0.927; helpfulness: α = 0.945).

As shown in Table 7, the correlation test showed a significant correlation between diagnosticity and helpfulness (r = 0.798, p = 0.000). As the correlation coefficient did not exceed the 0.80 threshold (Field 2018), there was no indication of collinearity between the two variables. However, because the coefficient was close to 0.80, we conducted an additional check for collinearity using the VIF and tolerance level.

Table 7 Correlation matrix

As shown in Table 8, the VIF was 1.150 and the tolerance was 0.870, which suggests no collinearity between the variables (Bowerman and O’Connell 1990). Based on these analyses, we concluded that there was no collinearity issue in these variables.

Table 8 Results of the collinearity test

5.2 Results

Experiment 2 examined the influence of review content type on helpfulness, mediated by diagnosticity, for reviews with the same negative rating. We first conducted a t-test to compare the two means for the one-sided and ambivalent review content conditions under negative review ratings. Under negative ratings, participants who were given the one-sided negative review content (Mone-sided = 5.9035, SE = 0.132) perceived higher review helpfulness than those who were given the ambivalent review content (Mambivalent = 5.481, SE = 0.137), as shown in Fig. 5. The difference in perceived helpfulness was significant (t(106) = 2.384, p = 0.019). This means that although the online reviews had the same one-star negative rating, reviews with only negative content were perceived as more helpful than reviews with both positive and negative content.

Fig. 5 Result of t-test in negative ratings

Second, we conducted a mediation analysis (using the Hayes PROCESS macro) to examine the mediating role of diagnosticity in the relationship between review content type and helpfulness.

Figure 6 displays the mediation model for the between-participant design in path-diagram form. The mediation analysis revealed that the effect of review content type on helpfulness is mediated by diagnosticity for one-star negative ratings. Specifically, the total effect of review content type on helpfulness for one-star ratings was significant (c = 0.454, t = 2.384, p = 0.019). The effect of one-sided negative content on diagnosticity was also significant (a = 0.646, t = 3.984, p = 0.000). The relationship between diagnosticity and helpfulness was positive and significant (b = 0.9257, t = 13.1432, p = 0.000). However, the direct effect of review content type on helpfulness was not significant (c′ = − 0.144, t = − 1.144, p = 0.254). Finally, the estimated indirect effect of review content type on helpfulness as mediated by diagnosticity was significant (a * b = 0.598, 95% CI [0.2921, 0.9208]; Table 9).

Fig. 6 Mediation model in path diagram form for the one-star ratings. Note: Total effect (c) = direct effect (c′) + indirect effect (a * b). p < 0.01***, p < 0.05**, p < 0.10*

Table 9 Results of mediation analysis

The results revealed that diagnosticity fully mediated the effect of review content type on helpfulness. Therefore, we found support for Hypothesis 2. That is, the one-sided negative review content was perceived as more helpful than the ambivalent review content for one-star ratings because the one-sided negative content provides greater information diagnosticity.

In summary, the two experiments show that the information diagnosticity, and hence the helpfulness, of a five-star rated review improves when the review includes both positive and negative information in its written content. For one-star ratings, however, including only negative written content rather than both positive and negative content improves information diagnosticity. This indicates that the negativity bias influences the degree of information diagnosticity in online reviews, which in turn determines perceived review helpfulness.

6 Discussion and conclusions

6.1 Discussion

We examined whether the written content of online hotel reviews can generate systematic differences in the reviews’ perceived helpfulness even with identical ratings. Specifically, the results from our two experiments demonstrate that, when an online review has a positive rating, written content that contains both positive and negative information is perceived as more helpful than content that contains only positive information. In contrast, we also find that, when an online review has a negative rating, written content that contains only negative information is perceived as more helpful than content that contains both positive and negative information. Furthermore, we revealed that the level of information diagnosticity provided in online reviews is an important psychological mechanism through which consumers determine review helpfulness.

6.2 Conclusion

6.2.1 Theoretical implications

We believe that our findings provide important theoretical implications. First, we investigated the helpfulness of online reviews with extreme ratings by examining the dynamics between ratings and written content. There is no doubt that, when seeking reviews, people consider both review ratings and written content at the same time (Chevalier and Mayzlin 2006; Schlosser 2011; Chatterjee 2020; Srivastava and Kalro 2019). However, these two components of online reviews, one quantitative and one qualitative, have been examined separately in previous studies on review helpfulness, credibility, and consumer decision-making. Although some studies have considered both review ratings and written content together in their data sets or conceptual frameworks (e.g. Kim et al. 2020), findings on the dynamics between the two components and how they affect perceived review helpfulness remain limited. In this regard, our findings suggest that examining the quantitative and qualitative aspects of online reviews together provides a more comprehensive view of how customers determine review helpfulness.

A large amount of research on the information diagnosticity of online reviews has mainly compared reviews with extreme ratings (positive or negative) to reviews with moderate ratings, showing that people perceive extreme ratings as more useful than moderate ratings (e.g. Park and Nicolau 2015). However, our study shows that consumers’ perceived helpfulness of online reviews with the same extreme positive or negative rating is contingent on the valence of the written content. Although many researchers have been interested in demonstrating differences in review helpfulness between extreme and moderate review ratings, the role of the valence of the written review in this context has not been examined. Our findings show that ambivalent written content is more influential on review helpfulness under extreme positive ratings than under extreme negative ratings. This result provides a deeper understanding of the dynamics between review valence and review components.

In addition, our findings contribute to the literature by revealing the underlying mechanism that leads to systematic differences in the perceived helpfulness of online reviews with the same extreme positive or negative rating. Drawing on the negativity bias of information, our findings show that information diagnosticity plays an important role: the higher the level of information diagnosticity in an online review, the more helpful the review is for consumers’ decision-making. Previous studies analyzing field data were not able to demonstrate the underlying mechanism through which the identified determinants of review helpfulness actually affect it. We therefore believe that revealing, through experimental studies, the underlying mechanism of how consumers determine review helpfulness adds to the relevant literature.

6.2.2 Managerial implications

Qazi et al. (2016) suggested that helpful reviews are not only likely to improve the value of companies that provide customer reviews but also to attract consumers who are seeking information. In this regard, our findings can also provide useful, practical implications for travel websites and service providers.

Our findings suggest the importance of encouraging website visitors to post online reviews that contain more diagnostic information. To do so, when the product rating is positive, travel websites might design the review-posting platform in a way that encourages customers to include both positive and negative consumption experiences. For instance, companies can provide customers with two separate writing boxes for positive and negative experiences so that customers can include at least a minor complaint despite an overall satisfactory experience. In this way, positively-rated online reviews can contain more diagnostic information, which will improve the helpfulness of the reviews for other customers. In addition, given the rise of fake reviews, improving the trustworthiness of each consumer review is critical to enhancing the overall credibility of the review website itself. From this perspective, when consumers give extreme positive ratings, which can easily be perceived as fake, encouraging them to describe both positive and negative experiences is a helpful way to assure the credibility of the website. When it comes to negatively-rated reviews, travel websites might consider providing customers with a single writing box in which they can describe their experience. In this way, customers will not feel obligated to include positive information when their experience is unsatisfactory overall. Consequently, based on our findings, providing customers with a properly designed review-posting platform is likely to increase the possibility that they will post online reviews containing more diagnostic information, which can increase the perceived helpfulness of those reviews.

Our study can also provide another practical implication for hotel managers in terms of management response strategy, which has been found to be important for subsequent customer reviews (Chang et al. 2015; Li et al. 2017; Wang and Chaudhry 2018). Based on our findings, even when online reviews show the same positive rating, consumers find reviews containing both positive and negative content diagnostic and helpful for making their purchase decisions. Typically, hotel managers tend to pay more attention and respond to customer reviews with negative ratings in order to recover from the service failure and minimize the negative impact on subsequent customer reviews (Xie et al. 2014; Anderson and Han 2016). However, our findings suggest that hotel managers also need to incorporate proper response strategies for positively-rated reviews containing two-sided comments, as these can negatively affect other customers just as negatively-rated reviews do. Thus, a properly designed management response strategy that considers both review ratings and review content is necessary to positively influence other customers’ purchase decisions and subsequent reviews.

6.2.3 Limitations and future research

While our findings make important contributions, the present study has certain limitations that further research should address. First, we examine review ratings and written content as the main components of online reviews; however, the pictures posted by reviewers can also influence a review’s helpfulness by increasing the level of information diagnosticity. Therefore, including posted review pictures in future research will provide a more comprehensive view of the determinants of perceived review helpfulness. Second, while we focus on the interplay between review rating and written content in both positive and negative online reviews, it would be interesting for future research to include online reviews with neutral ratings. Finally, for generalizability, it would be helpful for future research to supplement our experimental findings with field data.