1 Introduction

Online reviews of physicians have the potential to reduce information asymmetry between healthcare providers and patients, empowering patients to make better decisions. A pertinent question, then, is whether, and to what extent, patients benefit from online reviews of physicians.

Ascertaining this efficacy is important because of the growing role that online reviews play in patients’ decisions about which physicians to see and which ones to avoid (Hanauer et al. 2014). In fact, many physicians monitor their reviews and ratings closely and try to boost their ratings on review sites such as Yelp, Vitals, and RateMDs (Footnote 1). There are even numerous instances in which physicians have filed defamation lawsuits over negative patient reviews (Footnote 2). Evidently, patients are increasingly using online reviews to select physicians as well as other healthcare providers, prompting providers to take these reviews seriously. Despite the increasing importance of online reviews in healthcare, it is not at all clear that these reviews actually lead to better patient choices. Put differently, the relationship between physician reviews and the quality of physician care remains largely unexplored. A major challenge lies in the difficulty of accurately measuring the quality of care provided by physicians. Some researchers have used surveys to assess patients’ perceptions of physicians to construct a proxy for physician quality (Doyle et al. 2013). However, patient perception may not be the same as reality.

To address this challenge, we obtained research data for this study from two sources. The first dataset (spanning 2006 to 2015) was obtained from the Dallas Fort Worth Hospital Council’s (DFWHC) Research Foundation database on COPD patients. This dataset consists of approximately 630,000 inpatient admission-discharge records, 10,200 attending physicians, and 330,000 patients. The second dataset, covering about 14,500 physicians in North Texas (spanning 2007 to 2015), was collected from Vitals.com. It provides data on physician characteristics and online reviews, including textual reviews, review ratings, and years of physician practice. We integrated the two datasets, using physician names, to create a unique dataset that provides patient health outcomes for physicians who are also rated and reviewed by their patients. We then examine whether online reviews of physicians are reliable predictors of their patients’ clinical outcomes. In other words, if a physician receives very positive online reviews, does that mean that her patients also exhibit good health outcomes?
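The name-based integration described above can be illustrated with a minimal sketch. The field names (`physician`, `name`, `rating`, `readmit`), the toy records, and the normalization rules are all hypothetical and are not taken from the DFWHC or Vitals.com data; the sketch also assumes that ambiguous or unmatched names are simply dropped, which may differ from the matching procedure actually used.

```python
import re
from collections import defaultdict

def normalize_name(raw):
    """Lower-case a name, reorder 'last, first' to 'first last',
    and drop titles and punctuation (illustrative rules only)."""
    name = raw.lower()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    name = re.sub(r"\b(dr|md)\b", " ", name)   # strip common titles
    name = re.sub(r"[^a-z\s]", " ", name)      # strip punctuation
    return " ".join(name.split())

def link_records(admissions, reviews):
    """Attach a review rating to each admission record whose physician
    name matches exactly one review profile after normalization."""
    by_name = defaultdict(list)
    for r in reviews:
        by_name[normalize_name(r["name"])].append(r)
    linked = []
    for a in admissions:
        matches = by_name.get(normalize_name(a["physician"]), [])
        if len(matches) == 1:  # keep only unambiguous matches
            linked.append({**a, "rating": matches[0]["rating"]})
    return linked

# Hypothetical toy records
admissions = [
    {"physician": "Smith, Jane A., MD", "readmit": 1},
    {"physician": "Lee, Sam", "readmit": 0},   # two profiles share this name
    {"physician": "Doe, John", "readmit": 0},  # no review profile
]
reviews = [
    {"name": "Dr. Jane A. Smith", "rating": 4.5},
    {"name": "Sam Lee", "rating": 3.0},
    {"name": "Sam Lee", "rating": 5.0},
]
linked = link_records(admissions, reviews)
```

Dropping ambiguous matches trades sample size for linkage accuracy; a real linkage would likely also use additional identifiers, such as specialty or practice location, where available.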

Our results show that patients under the care of physicians with better online reviews may not necessarily experience better clinical outcomes, compared to patients receiving care from physicians with worse review ratings. Our results have broader implications for healthcare providers and consumers.

2 Literature Review

A few recent studies in the information systems area examine online ratings and reviews of care providers. For example, Bardach et al. (2013) suggest that reviewers on Yelp may possess knowledge of important aspects of care. Gao et al. (2015) find that physicians who are rated lower in quality (by the patient population) in offline surveys are less likely to be rated online, that online ratings are positively correlated with patient reviews, and that online ratings tend to be exaggerated at the upper end of the quality spectrum. They construct their quality measure using patient surveys conducted by Consumers’ Checkbook using the instrument and procedure designed by the U.S. Agency for Healthcare Research and Quality. Gray et al. (2015) do not find clear evidence of an association between physician website ratings and traditional quality measures such as blood pressure or low-density lipoprotein control. Although these papers shed much-needed light on patient perception of providers, they either (1) rely on limited care quality measures such as offline patient satisfaction surveys or (2) are mostly limited to aggregated numeric ratings of physicians as a surrogate for patient perception and often do not consider the rich sentiments expressed in textual reviews.

Studies in medical journals examining the relationship between patient experience and clinical outcomes, such as the mortality rate, 30-day readmission rate, and clinical safety, are also relevant (e.g., Glickman et al. 2010; Boulding et al. 2011). A comprehensive review, conducted by Doyle et al. (2013), summarizes prior research that examined the relationship between patient experience and clinical outcomes. The majority of these studies find a positive association between patient satisfaction and clinical outcomes. Although these findings provide important insights, the bulk of the studies in this literature stream rely on offline surveys to solicit patient experience, which do not allow significant parsing of the textual content through sentiment-mining and topic-modeling techniques as can be done with online reviews. These prior studies have also often relied on cross-sectional hospital- or clinic-level data, limiting the extent to which their findings can be extrapolated to the context of patient experience at the physician level. Finally, the use of these survey findings by patients is not nearly as widespread as that of websites containing reviews of physicians.

The stream of research on online consumer reviews has generally found that online reviews of products, such as books, and of services, such as hospitality, enable consumers to make more informed decisions by providing them with information on other consumers’ perspectives (e.g., Vermeulen and Seegers 2009; Chevalier and Mayzlin 2006). However, it is not clear whether the findings in prior research relating to the efficacy and usefulness of reviews automatically apply to a healthcare context. That is, the true quality of healthcare services could be significantly more difficult to assess than the quality of hotels, restaurants, or other similar services.

3 Research Question

Online reviews of physicians can contain rich information and often provide significantly more information than numeric (star) ratings. For example, they can help users gather information about the experience of a physician’s past patients, including, but not limited to, the physician’s bedside manner, whether she spends sufficient time with her patients, whether she follows up after the visit, and the thoroughness of the explanations (of diagnoses and procedures) provided by her or her staff. Some aspects of online reviews, such as detailed accounts of procedures and clinical steps performed by a physician, may even provide useful cues about the clinical aspects of care. Moreover, online reviews can influence patients’ choices. Based on a survey of patients, Hanauer et al. (2014) report that 35% of the respondents selected physicians with good ratings, while 37% avoided those with bad ones. This suggests that prospective patients expect physicians who receive largely positive online reviews to deliver better clinical outcomes. However, to the best of our knowledge, there is no data-driven evidence that this is indeed the case.

There is reason for concern about the reliability of online reviews of a physician in predicting the quality of service delivered by that physician, because a patient, who typically lacks comprehensive medical training, may not be well equipped to ascertain the clinical proficiency of a physician (Footnote 3). Also, an online review of a physician may not necessarily provide information on the clinical characteristics of that physician’s care delivery and could easily overemphasize factors such as flexibility in scheduling appointments, the promptness and courteousness of the staff, and the receptiveness of the medical team. These factors are not necessarily indicative of the level of clinical care provided by the physician. This leads us to our central research question:

Are physicians who receive better online reviews more likely to deliver better clinical outcomes for their patients?

4 Research Framework

4.1 Variables

The two clinical outcome measures used in our study, Future30DayReadm and FutureERVisit, are constructed from the DFWHC dataset. Future30DayReadm is the proportion of future patient admissions within thirty days of the previous discharge date, for a given physician at a given point in time (quarter), due to the same principal diagnosis (i.e., COPD). We construct a binary variable that equals 1 for a patient visit only if that patient’s next admission date is within 30 days of the current discharge date. Then, for each attending physician, we calculate the rolling average of this dummy variable, beginning from the chronologically last (most recent) inpatient admission record, to obtain Future30DayReadm. FutureERVisit is the proportion of future patient admissions involving a visit to an emergency room, and is constructed in the same way as Future30DayReadm.
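The construction of the forward-looking readmission measure can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact procedure: the records are hypothetical, the quarterly aggregation is omitted, and the "rolling average beginning from the most recent record" is interpreted here as a running mean over each record and all of the physician's later records.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records: (physician, patient, admit date, discharge date)
records = [
    ("dr_a", "p1", date(2014, 1, 1), date(2014, 1, 5)),
    ("dr_a", "p1", date(2014, 1, 20), date(2014, 1, 25)),  # 15 days after discharge
    ("dr_a", "p2", date(2014, 2, 1), date(2014, 2, 3)),
    ("dr_a", "p2", date(2014, 6, 1), date(2014, 6, 4)),    # well beyond 30 days
]

def flag_readmissions(rows, window_days=30):
    """Return {row index: 0/1}, where 1 means the same patient's next
    admission starts within `window_days` of this row's discharge."""
    by_patient = defaultdict(list)
    for i, (_, patient, admit, _) in enumerate(rows):
        by_patient[patient].append((admit, i))
    flags = {i: 0 for i in range(len(rows))}
    for visits in by_patient.values():
        visits.sort()  # chronological order per patient
        for (_, cur), (next_admit, _) in zip(visits, visits[1:]):
            gap = (next_admit - rows[cur][3]).days
            flags[cur] = int(0 <= gap <= window_days)
    return flags

def forward_average(rows, flags):
    """For each physician, walk records from most recent to oldest and
    keep a running mean of the flag, so each record receives the average
    over itself and all later records (an assumed interpretation)."""
    by_doc = defaultdict(list)
    for i, (doc, _, admit, _) in enumerate(rows):
        by_doc[doc].append((admit, i))
    out = {}
    for visits in by_doc.values():
        total = 0
        for n, (_, i) in enumerate(sorted(visits, reverse=True), start=1):
            total += flags[i]
            out[i] = total / n
    return out

flags = flag_readmissions(records)
future_readm = forward_average(records, flags)
```

An analogous flag for emergency room visits could be passed to `forward_average` to obtain a counterpart of FutureERVisit.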

The key explanatory variables with regard to online reviews are OverallRating and SentimentScore. OverallRating is the average of the overall star ratings of a physician at a given time, and SentimentScore is the average of the sentiment scores (up to a time-point) derived from textual reviews on Vitals.com. The sentiment analysis technique that we applied classifies the sentiment of each word in a review into four sentiment categories: very positive, positive, negative, and very negative (based on the vocabulary provided by Nielsen 2011). Then, aggregation across all sentiment words within a review yields an overall sentiment score, SentScorePerReview, for the review (Footnote 4). To control for variations in clinical outcomes arising from variations in the patient-mix handled by physicians, we create several controls. (Note that these controls as well as the key explanatory variables are backward-looking, as opposed to the forward-looking outcome variables Future30DayReadm and FutureERVisit.) We also control for sentiment variance and for the latent topics underlying the textual content of online reviews.
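A lexicon-based scoring of this kind might look like the following minimal sketch. The mini-lexicon, the numeric weights attached to the four categories, and the averaging rule within a review are all assumptions made for illustration; the actual vocabulary is that of Nielsen (2011), and the aggregation rule used in the study is not reproduced here.

```python
import re

# Hypothetical mini-lexicon mapping words to the four sentiment categories
LEXICON = {
    "excellent": "very_positive",
    "caring": "positive",
    "helpful": "positive",
    "rude": "negative",
    "terrible": "very_negative",
}

# Assumed numeric weights for the four categories
WEIGHTS = {"very_positive": 2, "positive": 1, "negative": -1, "very_negative": -2}

def sent_score_per_review(text):
    """Average the weights of all sentiment words found in one review;
    reviews with no sentiment words score 0.0 (an assumed convention)."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [WEIGHTS[LEXICON[w]] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def sentiment_score(reviews):
    """Average the per-review scores over the reviews observed so far."""
    scores = [sent_score_per_review(r) for r in reviews]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical reviews for one physician, in chronological order
reviews = ["Excellent and caring doctor.", "Rude staff, terrible wait."]
per_review = [sent_score_per_review(r) for r in reviews]
overall = sentiment_score(reviews)
```

Because the physician-level score is an average over reviews, a mix of strongly positive and strongly negative reviews can net out to a neutral score, which is one reason a separate control for sentiment variance is informative.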

Next, we conduct a fine-grained textual analysis of the online reviews by deploying latent Dirichlet allocation (LDA) (Blei et al. 2003) to derive the latent topics underlying the textual content of online reviews. Figure 1 plots the distribution of the sentiment category (positive, neutral, or negative) across the four resulting latent topics (Footnote 5). Reviews under the latent topic “Overall Care” tend to be rated more positively than reviews for the other three latent topics, while reviews for the latent topic “Promptness” tend to be more negative than the rest. This provides some insight into how the underlying themes might be driving the sentiments expressed in online reviews.

Fig. 1. Frequency of sentiments by latent topics

4.2 Estimation Model and Results

To account for potential physician- and time-level fixed effects and omitted variable biases, we consider a two-stage two-way fixed-effects panel regression with instrumental variables. The physician fixed effects account for time-invariant physician attributes that are not captured in our data. The use of forward-looking measures for the outcome variables helps us mitigate possible biases in the coefficient estimates of our key explanatory variables, which can arise from simultaneity between these explanatory variables and clinical outcomes. We construct two instrumental variables (IVs), which represent the average sentiment score of online reviews and the average online rating received by the focal attending physician’s peer physician group in the same hospital system, over the previous two and a half years (10 quarters). A physician’s reviews (online perception) can be reliably predicted using the online perception of other physicians in the same hospital system, aggregated over time. However, this time-aggregated online perception of her peer group need not systematically determine the clinical outcomes of her (i.e., the focal physician’s) patients. The first-stage regression results indicate that these IVs are strong. Table 1 presents the second-stage regression estimation results.
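The logic of a fixed-effects instrumental-variable estimator can be illustrated with a deliberately small sketch. It is not the specification estimated in this study: it uses a single instrument and a single endogenous regressor, absorbs only physician fixed effects (time effects and controls are omitted), and runs on constructed toy data in which the true effect of the regressor is zero but an unobserved confounder biases ordinary least squares.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean, which absorbs
    group-level (here, physician) fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def slope(x, y):
    """No-intercept least-squares slope of y on x (inputs already demeaned)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Constructed toy panel: two physicians, four periods each
doc = ["a"] * 4 + ["b"] * 4
z_raw = [-1, 1, -1, 1] * 2      # instrument (e.g., peer-group sentiment)
u = [-1, -1, 1, 1] * 2          # unobserved confounder, orthogonal to z
fe = {"a": 10.0, "b": -5.0}     # physician fixed effects

# Regressor depends on the instrument and the confounder
x_raw = [fe[d] + zi + ui for d, zi, ui in zip(doc, z_raw, u)]
# Outcome depends only on the confounder: the true effect of x is zero
y_raw = [fe[d] + ui for d, ui in zip(doc, u)]

x = demean_by_group(x_raw, doc)
y = demean_by_group(y_raw, doc)
z = demean_by_group(z_raw, doc)

ols = slope(x, y)               # biased by the confounder
pi = slope(z, x)                # first stage: regressor on instrument
x_hat = [pi * zi for zi in z]   # fitted values from the first stage
iv = slope(x_hat, y)            # second stage: outcome on fitted values
```

In this toy setup the naive estimate is pulled away from zero by the confounder, while the two-stage estimate recovers the true (zero) effect. Standard errors from a manually run second stage are not valid, so dedicated IV routines are preferable for inference.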

Table 1. Two-stage Two-way fixed effects IV estimation results (second-stage)

The coefficient estimates of our key explanatory variables, SentimentScore and OverallRating, in Table 1 demonstrate that physicians who receive better online reviews or higher online star ratings, compared to their peers, do not necessarily exhibit better health outcomes as measured by the future 30-day readmission or ER visit rates of their patients. In fact, higher overall ratings are associated with a higher frequency of future ER visits, casting additional doubt on the efficacy of online reviews and ratings. Hence, our results suggest that neither the sentiments expressed in reviews nor numeric ratings are accurate predictors of actual clinical outcomes.

5 Robustness Checks

An endogeneity concern could arise from potential self-selection by patients, i.e., patients with poor health may choose to go to physicians perceived to be of high quality. When that happens, physicians who deliver better clinical outcomes could end up receiving relatively poor reviews. To deal with possible self-selection, we apply the two-stage Heckman selection method. The results from the Heckman method provide no evidence that patient self-selection is driving our main finding that reviews and ratings are not as useful in predicting clinical outcomes as commonly believed. These results are omitted due to space constraints.

Next, we consider the possibility that physicians whose patients experience poor clinical outcomes (high readmission or ER visit rates) may be involved in review manipulation. To examine this, we divided our physicians into two groups: those whose patients have experienced below-average readmission rates (AvgFut30DayReadm = 0), and those whose patients have experienced above-average readmission rates (AvgFut30DayReadm = 1). We repeat this for ER visit rates and again create two groups for AvgFutERVisit = 0 and AvgFutERVisit = 1, respectively. Next, we scrape the numbers of “recommended” and “not-recommended” reviews for physicians from Yelp. Not-recommended reviews are potentially suspicious because they may reflect manipulation. Thus, if we find that physicians whose patients have experienced relatively poorer clinical outcomes have a disproportionately larger number (or fraction) of such reviews, we can suspect some manipulation on Yelp, and perhaps on other websites as well. None of the t-test results in Table 2 suggest that physicians with above-average readmission or ER visit rates receive a higher number (or fraction) of “not-recommended” reviews than physicians with below-average readmission or ER visit rates. We therefore find no evidence that physicians are engaging in active manipulation of online reviews.
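The group comparison described above can be sketched with a two-sample t statistic. The variant of the t-test used in the study is not stated here, so Welch's unequal-variance version is assumed; the counts of not-recommended reviews are hypothetical and do not reproduce Table 2.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for the difference in means between samples a and b."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical counts of not-recommended reviews per physician
below_avg_readm = [1, 2, 3, 2]  # group with AvgFut30DayReadm = 0
above_avg_readm = [2, 3, 2, 3]  # group with AvgFut30DayReadm = 1

t_stat, df = welch_t(above_avg_readm, below_avg_readm)
```

With these toy numbers the statistic is far below conventional critical values at roughly five degrees of freedom, which is the pattern that would be read as no evidence of a difference between the two groups.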

Table 2. Comparison of number and percent of not-recommended Yelp reviews

6 Contributions and Implications

In summary, our paper contributes to and builds on prior research in the following four ways: (1) it attempts to study the relationship between online reviews of a physician and actual clinical outcomes of the physician’s patients, (2) it measures clinical outcomes objectively based on the readmission rate and ER visit rate at the patient-admission level, (3) it analyzes the fine-grained textual content of reviews, rather than relying only on aggregated numeric ratings, in examining patients’ opinions, and (4) it applies text mining techniques as well as econometric methods, including a series of robustness checks, to investigate whether the textual content in reviews of physicians is indeed a reliable predictor of clinical outcomes. To the best of our knowledge, there is no prior research that has addressed all of the above dimensions in a unified framework, as we have proposed in this paper.

Our study has several managerial and healthcare policy implications. First, healthcare consumers need to be cautious when using online reviews and ratings to form opinions about physician quality. Physicians who receive better online reviews may not necessarily exhibit better quality as measured by their patients’ health outcomes. Second, our results suggest that online reviews require more scrutiny than they currently receive if they are to be used to decipher physician quality. Our study lends support to the concerns raised in the popular press about over-reliance on online reviews of physicians to assess actual physician quality, particularly in the context of chronic conditions. Third, hospitals and clinics should be careful about relying on online reviews of physicians for evaluating physician performance, since these reviews do not serve as accurate predictors of future patient health outcomes.