As the literature documenting a consistent relationship between procedural volume and outcome continues to grow, there are advocates of moving forward with volume-based policy changes [1–3]. For example, the Leapfrog Group is a consortium of healthcare purchasers that supports selective referral to high-volume institutions [4]. To emphasize the advantages of selective referral, advocates point to the number of lives that would be saved if all patients were treated at high-volume institutions [3, 5–7]. There are also opponents of policy initiatives based on the volume–outcome relationship [8–11]. Their objections focus primarily on the implications of such policy changes, including long travel times for patients in rural areas, the creation of a two-tiered medical system for those rural patients unable or unwilling to travel, unintended alterations of referral patterns, a lack of continuity in postoperative care, and the possibility of further overwhelming already busy high-volume centers, which in certain geographic locations may be unable to handle the increase in demand.

Yet there are other important reasons not to move forward with policy changes based solely on the volume–mortality relationship, reasons related to the relationship itself. First, the etiology of the relationship between volume and outcome is still not well understood [9, 11]. The idea of “practice-makes-perfect” has obvious face validity, but studies do not support it over alternate theories [12]. Another explanation is based on “selective-referral patterns”: surgeons and institutions with better outcomes receive more referrals, leading to higher volumes. Additionally, it is widely recognized that volume is not a direct measure of quality. Rather, it is a proxy for other measures, such as structure and process characteristics, which more accurately reflect quality of care.

In this article we investigate the methodological limitations relating to the volume–outcome relationship. We begin with an overview of the strengths and limitations of various quality measures. In particular, we describe why mortality alone is a limited measure. We then look at some of the statistical limitations in analyses of the relationship between surgical volume and outcome. Our goal is to highlight some of the inherent difficulties in the volume–outcome relationship as reasons to be wary of making policy changes at this point.

Is Mortality the Gold Standard for Quality Measurement?

There are three accepted domains of quality of care: structure, process, and outcome [13, 14]. Even among these accepted domains, debate exists about which best reflects quality [15–17]. Outcomes are easier to measure and compare, yet process measures directly reflect whether the appropriate care is given at the appropriate time. Outcome measures are subject to many other contributing factors, even when risk-adjusted, whereas process measures are more difficult to define and measure, especially in surgery, where little work has been done in this area.

In surgery, many suggested quality measures relate to structure. The teaching status of hospitals, the existence of specialized intensive care units and operating rooms, and staffing ratios are all examples of structural characteristics that could contribute to the observed volume–outcome relationship, but studies to support this are limited [6, 18]. A recent article investigated which hospital characteristics, including house staff and nursing staff ratios, teaching status, geographic location, and hospital ownership, contribute to the volume–outcome relationship [19]. Importantly, the authors found that for high-risk procedures, outcomes appeared to be equivalent between high- and low-volume institutions when staffing was equivalent.

The value of process indicators in surgical quality measurement is also gaining interest. Much of the earlier work relates to clinical pathways and cardiovascular procedures [20–24]. A few studies have used process measurement as a means of quality improvement, primarily by studying "high-quality" outliers [19, 25, 26]. The Leapfrog Group, the consortium that supports selective referral to high-volume hospitals based on designated volume thresholds, has recently begun to investigate the use of process measures [4]. There are a number of reasons why this is difficult. First, as mentioned above, there are few data available to define these measures. Second, it will take years for hospitals to develop and institute the infrastructure necessary for such process measurement. Finally, even once these quality indicators have been identified and measured, national resources may not be sufficient to meet them. For example, the Leapfrog Group is calling for intensivist staffing of all hospitals, yet there are not enough intensivists being trained to fill those positions. In the meantime, surgical quality measurement appears to fall to outcomes.

Mortality is the most attractive outcome measure because of its ease of measurement, particularly in administrative databases. Yet mortality is subject to the limitations of any outcome measure and, in addition, to the limitation that it is a rare event for most procedures in modern surgical practice, which complicates analysis [27]. Given the lack of consensus on using outcome rather than process measures, and given the limitations of mortality as a quality measure, it seems dangerous to go one step further and use a proxy such as volume to estimate quality based solely on its relationship with mortality.

Level of Analysis

Volume at the Level of the Surgeon or the Institution

To address the limitations of the volume–outcome analysis, we begin with the level of analysis. First, it is necessary to determine which is more important, the individual surgeon or the institution. The literature on the relationship between individual provider volume and outcome is not as consistent as the data supporting the institutional volume–outcome relationship [28–32]. Surgeon volume and institutional volume represent very different aspects of quality of care. On the one hand, surgeon volume is a proxy for such individual human factors as technical skill and quality of decision making. Hospital volume, on the other hand, reflects institutional characteristics, as previously mentioned.

Outcome at the Level of the Patient versus the Institution

One can examine the relationship between institutional volume and either the institutional mortality rate or mortality at the level of the individual patient. Mounting evidence suggests that the most valid statistical approach requires analysis at the level of the individual patient. In other words, one must investigate the effect of institutional volume on mortality using regression analysis, with the patient as the unit of analysis (a sketch of such an analysis follows Fig. 1). In practice, however, one must also understand what is occurring at the institutional level, because the goal is to correlate the performance of the hospital, or the quality of care delivered by that institution, with case volume. The nature of this relationship is not yet understood, and one could hypothesize that it may take any number of forms. It is unclear at this point whether the relationship between volume and mortality is continuous, step-wise, or has a single clear cut-off (see Fig. 1). The idea of selective referral, as proposed by the Leapfrog Group, assumes a single cut-off (Fig. 1C), yet there is as yet no evidence to support this relationship over the others.

Figure 1

Possible relationships between volume and outcome.
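
Returning to the patient-level approach described above, the following is a minimal sketch of such a regression, assuming a hypothetical data set with one row per patient and hypothetical column names (died, hospital_volume, age, comorbidity_score); the appropriate risk factors and model specification would of course depend on the procedure studied.

```python
# Minimal sketch of a patient-level volume-outcome analysis.
# Assumes a pandas DataFrame with one row per patient and hypothetical
# columns: died (0/1), hospital_volume (annual cases at the treating
# hospital), age, and comorbidity_score.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

patients = pd.read_csv("patients.csv")  # hypothetical file

# Logistic regression with the patient as the unit of analysis: the
# coefficient on hospital_volume is the change in the log-odds of death
# per additional annual case, adjusted for patient-level risk factors.
model = smf.logit(
    "died ~ hospital_volume + age + comorbidity_score", data=patients
).fit()
print(model.summary())

# Express the volume effect as an odds ratio per 100 additional annual cases.
or_per_100 = np.exp(model.params["hospital_volume"] * 100)
print(f"Odds ratio of death per +100 annual cases: {or_per_100:.2f}")
```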

Most volume–outcome studies divide patients into volume groups of equal size (for example, quartiles: top 25%, upper-middle 25%, lower-middle 25%, bottom 25%) and compare mortality between these groups. This is a well-accepted analytic approach to increasing the power of a study by increasing the number of cases (N) in each group; however, there are several problems with it. First, if the groups contain equal numbers of institutions, the number of patients per group varies widely; if instead the groups contain equal numbers of patients, the volume range per group varies substantially. Often, the results of these analyses are reported as the difference in mortality across the extreme quartiles or quintiles. Despite this, some policy initiatives advocate the use of a single strict volume cut-off to discriminate between "high quality" and "low quality" (as reflected by mortality rate). Studies to date have not addressed the existence of a single volume threshold. The idea of a single "discriminator" is appealing, but it may be overly simplistic. A recent study by the authors suggests that there may be an identifiable single cut-off; however, further work is needed to determine whether such thresholds are widely applicable [33].
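
To illustrate the trade-off between the two grouping strategies, the sketch below (hypothetical column names, hospital-level counts only, no risk adjustment) forms volume quartiles once with equal numbers of hospitals per group and once with approximately equal numbers of patients per group; the first yields unequal patient counts across groups, the second unequal volume ranges.

```python
# Sketch of two ways to form volume quartiles from hospital-level data.
# Assumes a pandas DataFrame with hypothetical columns: hospital_id,
# annual_volume (cases per year), and deaths (deaths per year).
import pandas as pd

hospitals = pd.read_csv("hospitals.csv")  # hypothetical file
hospitals = hospitals.sort_values("annual_volume")

# (a) Equal numbers of hospitals per quartile: patient counts per group vary.
hospitals["hospital_quartile"] = pd.qcut(
    hospitals["annual_volume"].rank(method="first"), 4,
    labels=["Q1 (low)", "Q2", "Q3", "Q4 (high)"],
)

# (b) Approximately equal numbers of patients per quartile: volume ranges vary.
cumulative_patients = hospitals["annual_volume"].cumsum()
hospitals["patient_quartile"] = pd.cut(
    cumulative_patients, 4, labels=["Q1 (low)", "Q2", "Q3", "Q4 (high)"]
)

for col in ("hospital_quartile", "patient_quartile"):
    summary = hospitals.groupby(col, observed=True).agg(
        n_hospitals=("hospital_id", "count"),
        n_patients=("annual_volume", "sum"),
        min_volume=("annual_volume", "min"),
        max_volume=("annual_volume", "max"),
        deaths=("deaths", "sum"),
    )
    summary["mortality"] = summary["deaths"] / summary["n_patients"]
    print(f"\nGrouping: {col}\n{summary}")
```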

The Variability Issue

A basic statistical principle of a Bernoulli event (e.g., a coin flip) is that as the number of observations (N) increases, the variability of the estimate of the event rate decreases and the estimate approaches the true rate. In the example of a fair coin, the true rate of a flip resulting in heads is 50%. Yet if the coin is flipped only a few times, the observed rate of heads will likely differ greatly from this true rate. Only when N is sufficiently large can we be confident that the observed rate reflects the true rate. What counts as "sufficiently large" is a function of the underlying true rate of occurrence: the rarer an event, the greater N must be to ensure that the observed rate reflects the true underlying rate.
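
A brief simulation (purely illustrative) makes the point: the spread of observed rates around the true rate shrinks as N grows.

```python
# Simulate observed event rates for different numbers of observations (N)
# to show how the variability of the observed rate shrinks as N increases.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.5      # fair coin
n_repeats = 10_000   # number of simulated sequences per value of N

for n in (5, 10, 50, 200, 1000):
    # Number of "heads" in each sequence of n flips, divided by n,
    # gives that sequence's observed rate.
    observed = rng.binomial(n, true_rate, size=n_repeats) / n
    print(f"N={n:4d}: observed rates span {observed.min():.2f}-{observed.max():.2f}, "
          f"SD = {observed.std():.3f}")
```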

This principle can be applied to mortality rates to show the difficulty in analyzing the relationship between volume and mortality. In low-volume institutions, where only a few cases are done, there is a high degree of variability in the observed mortality rate that may not truly reflect the quality of care. If a given hospital performs one procedure, there are only two possibilities for the observed mortality rate: 0% (the patient does not die) or 100% (the patient dies). As another example, let us assume that a given hospital performs 5 procedures annually and the true underlying mortality rate is 6%. The only possible values for the observed mortality at this hospital are 0% (0 of 5 patients die), 20% (1 of 5), 40% (2 of 5), 60% (3 of 5), 80% (4 of 5), and 100% (all 5 die), with respective probabilities of 0.73, 0.23, 0.03, 2 × 10⁻³, 6 × 10⁻⁵, and 8 × 10⁻⁷. Thus, 27% of the time the measured mortality rate for this hospital is 20% or greater, despite the fact that the true mortality rate was set at 6% in this hypothetical example. In contrast, if the hospital performs 10 procedures in a given year and the true mortality rate remains 6%, the measured mortality rate will be 20% or greater only 12% of the time. If the annual volume is 15, the observed mortality rate will be 20% or greater only 6% of the time. When the number of cases performed at a hospital is low, the most common observed mortality rate will be zero, and a few outliers will drive the overall relationship. If a death does occur, the observed mortality will differ substantially from the true underlying mortality rate. As volume increases, the observed mortality rate naturally converges toward the true mortality rate, which can be estimated by the expected mortality rate. This suggests that crude volume–outcome analyses may be biased against smaller hospitals.
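
These probabilities follow directly from the binomial distribution; the short standard-library script below (using the hypothetical 6% true mortality rate from the example) reproduces them.

```python
# Binomial probabilities of observed mortality at a hypothetical hospital
# whose true underlying mortality rate is 6%, as in the example above.
from math import comb

TRUE_RATE = 0.06

def death_probabilities(n_cases: int, rate: float = TRUE_RATE):
    """Return P(k deaths out of n_cases) for k = 0..n_cases."""
    return [comb(n_cases, k) * rate**k * (1 - rate)**(n_cases - k)
            for k in range(n_cases + 1)]

# Full distribution of possible observed rates for a 5-case hospital.
for k, p in enumerate(death_probabilities(5)):
    print(f"{k}/5 deaths ({k * 20:3d}%): probability {p:.1e}")

# Probability that the observed mortality rate is 20% or greater
# for annual volumes of 5, 10, and 15 cases.
for n in (5, 10, 15):
    min_deaths = (n + 4) // 5  # smallest k with k/n >= 0.20
    p_high = sum(death_probabilities(n)[min_deaths:])
    print(f"n={n:2d}: P(observed mortality >= 20%) = {p_high:.2f}")
```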

Figure 2, which plots adjusted mortality against volume, illustrates these points. Adjusted mortality is a ratio of the observed mortality rate to the expected mortality rate (which is an estimate of the true mortality rate). The observed to expected mortality ratios and volumes in Figure 2 are drawn from an analysis of the University HealthSystem Consortium (UHC) Clinical Database for the year 2000 [34]. The UHC computes an expected mortality rate based on the sum of the probability of death for the patients treated at a given institution. The individual patient’s probability of death is calculated using logistic regression modeling of that patient’s preoperative risk factors, such as comorbidities and demographics. The probabilities are then added together to calculate the expected mortality rate for a given institution. To the eye, these data seem to demonstrate a highly correlated inverse relationship, but the correlation coefficient is not significant for three of the four graphs. Only for coronary artery bypass grafting (CABG) is there a weak (r = −0.219) but statistically significant inverse relationship (p = 0.047). This is in part due to the large number of institutions with zero mortality at the lower end of the volume spectrum. Progressing along the abscissa, the variability between institutions decreases and the graph converges.

Figure 2

Scatter plot of O/E mortality by procedural volume. The ratio of the observed raw mortality to the expected mortality rate (based on patient-specific characteristics) is plotted against procedural volume for each of the four procedures. The Pearson correlation coefficient (r) and p-value are included in parentheses for each procedure. AAA: abdominal aortic aneurysm repair; CABG: coronary artery bypass grafting; CEA: carotid endarterectomy
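
As a sketch of how such an O/E analysis can be assembled (hypothetical column names; the patient-level predicted probabilities are assumed to come from a separate risk-adjustment model, as in the UHC approach described above), the following computes each hospital's O/E ratio and its Pearson correlation with volume.

```python
# Sketch of an observed-to-expected (O/E) mortality analysis by hospital.
# Assumes a pandas DataFrame `patients` with one row per patient and
# hypothetical columns: hospital_id, died (0/1), and predicted_risk
# (the patient's probability of death from a separate risk-adjustment model).
import pandas as pd
from scipy.stats import pearsonr

patients = pd.read_csv("patients.csv")  # hypothetical file

by_hospital = patients.groupby("hospital_id").agg(
    volume=("died", "size"),                    # number of cases
    observed_deaths=("died", "sum"),            # observed deaths
    expected_deaths=("predicted_risk", "sum"),  # sum of predicted probabilities
)
by_hospital["oe_ratio"] = (
    by_hospital["observed_deaths"] / by_hospital["expected_deaths"]
)

# Correlation between procedural volume and the O/E ratio, as in Figure 2.
r, p = pearsonr(by_hospital["volume"], by_hospital["oe_ratio"])
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```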

Conclusions

Important policy changes, such as selective referral, are being suggested on the basis of the relationship between volume and outcome. Volume offers a simple measure that can be easily understood and employed by both healthcare providers and consumers. However, the use of volume as a quality indicator is not as straightforward as it may seem: mortality is itself a limited measure of quality, and the methodologies used to evaluate the volume–outcome relationship carry their own pitfalls. There is certainly value in continuing to investigate the volume–outcome relationship; studies that lead to a better understanding of the institutional factors underlying the differences in outcome are crucial. In the meantime, it seems premature to move forward with referral based on volume alone when so many issues remain to be resolved.

We strongly support the desire of the profession, the consumers, and the purchasers of healthcare to truly understand what constitutes high-quality care, but we also caution against oversimplified solutions that do not accurately address those concerns. We, as a profession, should continually strive to improve the delivery of care to all of our patients, and understanding quality is an important aspect of that commitment.