Significant progress in pediatric cardiac surgery has raised expectations and generated a need to monitor performance. In adult cardiac surgery, a correlation between volume and mortality has long been recognized [21] and used to advocate for regionalization. Pediatric cardiac surgical mortality also varies among centers and in some studies correlates with volume [9, 12, 14, 27, 29], although other factors such as patient characteristics and referral patterns may be involved. This variation is important to patients, families, physicians, and policymakers [4, 5]. However, the impact of volume and other center-specific effects on pediatric cardiac postoperative mortality is incompletely understood, and additional research in this area has been encouraged [7].

Evaluating institutional performance of pediatric cardiac surgery is difficult for three reasons related to the diversity and complexity of congenital heart disease. First, coding systems do not fully capture the complexity of the diagnoses and procedures involved. Second, the wide spectrum of risk factors complicates adjustment for case mix. Third, the relative rarity of each procedure limits statistical power for detecting institution-level differences [27, 30].

The Pediatric Cardiac Care Consortium (PCCC), a multi-institutional registry, has collected patient-level data since 1982 to support quality improvement in congenital heart surgery [19]. Through 2007, the PCCC included more than 137,000 patients from centers performing up to 500 operations/year. Detailed information on invasive cardiac procedures, as well as cardiac and noncardiac diagnoses, permits reliable risk adjustment. Many previous studies have described outcomes after pediatric cardiac operations, but they are frequently limited to specific anomalies, individual large centers of excellence, administrative data, or short time periods and thus cannot provide a detailed longitudinal assessment using clinical outcome data.

It is important to characterize the factors affecting outcomes in centers performing up to 500 operations per year because many pediatric cardiac operations are—and will continue to be—performed in such centers [1, 3, 5, 25]. Furthermore, others have suggested that volume-related variation is most evident in this size range [29]. Therefore, we analyzed the relationship of surgical volume and other risk factors to postoperative mortality at PCCC centers and determined whether these relationships changed with time.

Methods

Data Source: The Pediatric Cardiac Care Consortium

The PCCC registry collects detailed clinical data from centers performing pediatric cardiac procedures [19]. All cardiac operations (except isolated ductal ligation in preterm infants weighing less than 2.5 kg) are reported prospectively by the centers. Diagnosis and procedure coding takes place at the core facility, and the PCCC centers retain the right to study their cases locally or via other registries. A subset of the centers likely contributed to an earlier publication [29] addressing similar questions over a shorter time period.

For this study, we excluded centers outside North America, a center that transferred patients to outlying hospitals for postoperative recovery at a rate 10-fold higher than expected, and any years that a center contributed incomplete data or performed fewer than 10 operations.

Risk Adjustment and Independent Variables

The Risk Adjusted Classification for Congenital Heart Surgery, version 1 (RACHS-1) is a validated and widely used risk-adjustment system that classifies congenital cardiac operations into six categories by expected early mortality rates [13]. Risk category 1 operations have the lowest risk of death and risk category 6 operations the highest. Because category 5 operations are very rare (~0.1 % overall), categories 5 and 6 were combined (termed “category 5&6”).

We also analyzed patient sex, age at operation (partitioned at 28 days and 1 year), year of operation (with 1982–2007 divided into five time periods for most analyses), and center annualized surgical volume (calculated for each time period as the total operations performed divided by the number of participating years of that time period). We modeled volume separately as continuous (linear on the log-odds scale) and as categorical (divided into approximate tertiles).
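The annualized volume calculation described above can be sketched as follows. This is only a minimal illustration, not the code used for the study (the analyses were run in SAS); the function name, the record layout, and the example center and counts are hypothetical.

```python
from collections import defaultdict


def annualized_volume(yearly_counts, periods):
    """Return {(center, period_label): operations per participating year}.

    yearly_counts: dict mapping (center, year) -> operations in that year;
                   only participating (eligible) center-years are present.
    periods:       dict mapping period_label -> (first_year, last_year), inclusive.
    """
    totals = defaultdict(int)   # total operations per center and period
    n_years = defaultdict(int)  # participating years per center and period
    for (center, year), n_ops in yearly_counts.items():
        for label, (first, last) in periods.items():
            if first <= year <= last:
                totals[(center, label)] += n_ops
                n_years[(center, label)] += 1
    # Divide by participating years, not by the full length of the period.
    return {key: totals[key] / n_years[key] for key in totals}


# Hypothetical center "A" that contributed data in 3 of the 5 years of 2003-2007.
example_counts = {("A", 2003): 120, ("A", 2004): 130, ("A", 2006): 110}
example_periods = {"2003-2007": (2003, 2007)}
example_volume = annualized_volume(example_counts, example_periods)
```

In this invented example the center reports 360 operations over 3 participating years, so its annualized volume for the period is 120 operations/year rather than 72 (360 divided by all 5 calendar years).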

Patients, Procedures, and Outcomes

We calculated surgical volume using all operations performed for pediatric or adult congenital heart disease, with or without cardiopulmonary bypass, in each year or time period. After computing surgical volume, we excluded adults, hospital admissions ending in transfer to another center, and admissions containing any procedures not classifiable by the RACHS-1 system (expected to be 11–14 % of operations) from the multivariate analysis. We included patients with multiple surgical admissions, with the admissions treated as independent.

Choosing the appropriate end point after pediatric cardiac surgery is difficult [8, 11, 28, 31]. We used early postoperative mortality, defined as in-hospital death within 30 days, as the primary outcome measure. When a single hospital admission involved multiple operations, we included only one operation, chosen by highest RACHS-1 score [13] first, then by earliest date.
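The rule for choosing a single operation per admission can be illustrated with a short sketch. The record layout and field names (`rachs1`, `date`, `id`) are assumptions made for this example and do not reflect the registry’s actual data structure.

```python
from datetime import date


def index_operation(operations):
    """Pick one operation per admission.

    Preference order, as described in the text: highest RACHS-1 category
    first, then earliest operation date among ties.
    """
    # Negating the category makes min() prefer the highest category;
    # the date then breaks ties in favor of the earliest operation.
    return min(operations, key=lambda op: (-op["rachs1"], op["date"]))


# Hypothetical admission with three operations.
example_admission = [
    {"id": "op1", "rachs1": 2, "date": date(2005, 3, 1)},
    {"id": "op2", "rachs1": 4, "date": date(2005, 3, 9)},
    {"id": "op3", "rachs1": 4, "date": date(2005, 3, 4)},
]
chosen = index_operation(example_admission)
```

Here `op3` would be selected: it shares the highest category with `op2` but was performed earlier.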

Study Design and Statistical Analysis

The original study design specified multivariate analysis of the relationship of surgical volume and various patient-specific factors to mortality, quantification of the relative contribution of each factor to the model, and analysis of interactions. The center random effect was included, as is standard statistical practice, to account for possible confounding. After seeing that the center effect contributed more to the model than the volume term, we undertook additional analyses to better characterize the behavior of the center effect.

We summarized categorical variables in frequency tables and compared them using chi-square tests. We used mixed-effect logistic regression models to compare the association between surgical volume and mortality, with and without adjustment for additional covariates. We addressed correlation within a center by including a random effect (assumed to follow a normal distribution with a mean of 0) that represented the deviation of a center from the overall adjusted log-odds of death.
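One common way to write such a random-intercept logistic model is shown below. The notation is ours and is offered only as a sketch of the model class described above, not as the exact specification fitted in SAS: Y_ij indicates death for admission i at center j, x_ij collects the fixed-effect covariates, and b_j is the center random effect.

```latex
\[
\operatorname{logit}\,\Pr(Y_{ij}=1 \mid b_j)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_j,
\qquad b_j \sim \mathcal{N}\!\left(0,\sigma_b^{2}\right)
\]
```

Under this formulation, exp(b_j) is the odds ratio for death at center j relative to a center with an average effect, holding the covariates fixed.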

In the multivariate model, we included the following predetermined clinically relevant covariates: year (partitioned into 5-year intervals), volume, center, risk category, age (partitioned into predefined age groups), and sex. We analyzed models with interaction terms to evaluate prespecified interactions between risk category and era, volume and risk category, volume and era, and volume and age group. Each interaction model also included the main effects from the multivariate model.

We report odds ratios (ORs) for death, as defined earlier, with 95 % confidence intervals (CIs). We constructed approximate F tests for each factor in a nonlinear mixed-effects logistic regression model. We also computed approximate standard errors as well as the corresponding t statistics and p values for the factor values using the delta method. All the tests were two-sided using a cutoff of 5 % without adjustment for multiple comparisons.

To assess the relative contribution of each variable to postoperative mortality, we used the likelihood ratio test (LRT) and the Akaike information criterion (AIC) in uni- and multivariate models. In the multivariate models, we removed each variable (one at a time) to assess the increase in AIC relative to the full model [16, 24].
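For reference, the standard definitions behind this comparison can be written as follows. The symbols are assumptions introduced for illustration: k is the number of estimated parameters, L-hat is the maximized likelihood, the subscript -v denotes the full model with variable v removed, and d_v is the number of parameters associated with v.

```latex
\begin{align*}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{LRT}_v &= 2\left(\ln\hat{L}_{\text{full}} - \ln\hat{L}_{-v}\right), \\
\Delta\mathrm{AIC}_v &= \mathrm{AIC}_{-v} - \mathrm{AIC}_{\text{full}}
  = \mathrm{LRT}_v - 2\,d_v .
\end{align*}
```

A larger increase in AIC on removal of a variable therefore indicates a larger contribution to model fit, after a penalty of 2 per parameter that the variable adds.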

We performed analyses using SAS version 9.2 (SAS Institute, Cary, NC) and created figures using R version 2.14.1 (R Foundation for Statistical Computing) and Microsoft Excel.

Sensitivity Analyses

We computed the proportion of observed deaths minus the mean predicted probability of death for each center–era combination and plotted these against annualized volume to assess the potential of nonlinearity for annualized volume and to identify influential centers. We recomputed models for comparison after removing one center in which volume had dropped sharply. Considering that duration of hospitalization before death might vary by center, we repeated the multivariate model with the end point of all in-hospital mortality rather than 30-day in-hospital mortality. We evaluated an analysis using only the first admission per patient to compare the results after removing potential intra-patient correlation. To assess the effect of treating age and year as categorical variables and of annualizing volume, we computed a multivariate model using age and year as continuous variables and volume based on single years. Finally, we repeated the analysis including procedures unclassifiable by RACHS-1 as a separate category in the regression model.
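The first diagnostic, observed mortality minus mean predicted probability for each center–era combination, might be computed along the following lines. This is a minimal sketch with a hypothetical record layout; the predicted probabilities are assumed to come from an already fitted model and the example values are invented.

```python
from collections import defaultdict


def observed_minus_predicted(records):
    """Return {(center, era): observed death proportion - mean predicted probability}.

    records: iterable of (center, era, died, predicted_prob), where died is
             0 or 1 and predicted_prob is a model-based probability of death.
    """
    sums = defaultdict(lambda: [0, 0.0, 0])  # deaths, sum of predictions, admissions
    for center, era, died, predicted_prob in records:
        cell = sums[(center, era)]
        cell[0] += died
        cell[1] += predicted_prob
        cell[2] += 1
    return {key: deaths / n - pred_sum / n
            for key, (deaths, pred_sum, n) in sums.items()}


# Hypothetical center-era cell with four admissions and one death.
example_records = [
    ("A", "2003-2007", 1, 0.10),
    ("A", "2003-2007", 0, 0.20),
    ("A", "2003-2007", 0, 0.10),
    ("A", "2003-2007", 0, 0.20),
]
example_residuals = observed_minus_predicted(example_records)
```

Plotting these differences against annualized volume, as described above, would show any systematic curvature (suggesting nonlinearity) or isolated extreme points (suggesting influential centers).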

Results

The PCCC registry contains data from 57 participant centers for some or all of the period 1982–2007. These centers performed 118,084 operations for 90,124 patients during 112,030 hospital admissions. After applying exclusion criteria, 49 centers contributed 109,475 operations for volume calculations and 85,023 admissions for detailed statistical analysis (Table 1, Figs. 1, 2). Centers performed 11–534 cases per year overall (mean, 172; median, 154; standard deviation [SD], 96) and 13–458 cases per year when annualized by time period (mean, 161; median, 144; SD, 91) (Fig. 3) with tertile cut points of 109 and 188 cases per year. We rounded the SD to 100 cases per year when computing ORs for an approximate 1 SD increase (in the continuous–volume analyses), and categorized volume as small (10–99 cases/year), medium (100–199 cases/year), or large (≥200 cases/year) by approximate tertiles.

Table 1 Characteristics of hospital admissions
Fig. 1 STROBE-style flow diagram. Note that all operations in eligible years (asterisk) count toward institutional volume, whereas multivariate analysis was performed on a subset, as described in the text

Fig. 2 Annual volume and contribution of each center to the cohort. Centers are arranged (x-axis) by mean annual volume (bars, left y-axis) regardless of the number of years of participation. The percentage contribution of each center (thick and thin lines, right y-axis) accounts for length of participation

Fig. 3 Institutional surgical volumes in the PCCC. (a) Annual center activity (color) for all 57 PCCC centers (y-axis) over time (x-axis). The duration of participation varied among the individual centers (NA = center’s data incomplete or unavailable for that year) and eight centers were excluded. *High transfer rate, **outside North America, ***fewer than ten cases/year. (b) Statistical distribution of the 49 included centers’ volumes. The x-axis indicates the years and number of centers in each time period, and the y-axis indicates surgical volume annualized by time period. Boxes represent median and IQR (interquartile range, 25–75th percentiles); whiskers represent range within 1.5 × IQR; and circles represent values outside 1.5 × IQR

For most variables, operations were not uniformly distributed among volume categories and time periods (Tables 2, 3). Younger patients and higher-risk procedures were more common at larger centers, with the largest discrepancy in risk category 5&6, comprising 1.8 % of operations at small centers, 2.6 % at medium centers, and 3.8 % at large centers (p < 0.0001).

Table 2 Characteristics of the study cohort by volume category after removal of excluded data
Table 3 Characteristics of the study cohort by time period (TP) after removal of excluded data

Table 4 shows the distribution of admissions and deaths by risk category for the whole study period and for the most recent era (5.2 % of overall admissions and 2.3 % of 2003–2007 admissions ending in death). Unadjusted mortality varied widely across time periods and risk categories but generally decreased throughout the study period, except in the lowest risk category, in which it has remained low and stable since the late 1980s (Fig. 4).

Table 4 Raw data by risk category overall and for the most recent era
Fig. 4 Unadjusted mortality by time and risk category. The percentage of admissions ending in death each year is shown for the overall cohort and by risk category

Table 5 shows the association of center volume, risk category, time period, age group, and sex with 30-day in-hospital mortality. Univariate analysis identified significant effects for most of the variables. Patient sex was not significant in the univariate analysis, but females had significantly higher mortality in the adjusted multivariate model (OR, 1.20; 95 % CI, 1.13–1.29; p < 0.0001). In the multivariate model, risk category, age at operation, and time period contributed more to the prediction of death after pediatric cardiac surgery than center volume, the center random effect, or patient sex (comparing the relative contribution of each variable to the likelihood ratio chi-square statistic; Table 6).

Table 5 Effects on mortality by uni- and multivariate analysis
Table 6 Significance of each variable included in the final model

The adjusted odds of postoperative death decreased more than 10-fold over the study period (analyzing surgical year as a categorical variable, 1982 vs. 2007: OR, 12.27; 95 % CI, 8.52–17.66; p < 0.0001) (Fig. 5a). The decrease occurred in almost all risk categories except the lowest, in which mortality has been low and stable since the early 1990s (Fig. 5b). The RACHS-1 score discriminated risk categories well, except that we did not find a statistically significant difference in risk of death between categories 1 and 2 during the last period (2003–2007; Fig. 5c).

Fig. 5 Mortality by time and risk category. Risk-adjusted mortality (x-axis, log scale) is shown as odds ratios (circles) and 95 % confidence intervals (whiskers). (a) Adjusted mortality over time (reference: year 2007). (b) Adjusted mortality across time periods (TPs) by risk category (reference: TP 5). (c) Adjusted mortality across risk categories by TP (reference: risk category 1)

We evaluated center-specific mortality variation in several ways. Individual center effects, representing each center’s deviation from overall odds of death adjusted for patient factors, varied substantially but with evidence of relation to volume (Fig. 6a). Multivariate analysis including institutional volume as a continuous variable confirmed a significant inverse association between volume and mortality (OR, 0.84 per additional 100 operations/year; 95 % CI, 0.78–0.90; p < 0.0001). The volume–mortality relationship persisted across age groups (p = 0.3494 for interaction) but varied somewhat by risk categories (p = 0.0049 for interaction) and time periods (p = 0.0023 for interaction) (Table 7). Interestingly, volume had no detectable effect in risk category 1 (OR, 0.99; 95 % CI, 0.79–1.25).
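Because volume was modeled as linear on the log-odds scale, the reported OR per 100 operations/year can be rescaled to other volume differences. The helper below is a hypothetical illustration of that arithmetic, not part of the original analysis, and it should be applied only within the range of volumes observed in the registry.

```python
import math


def rescale_odds_ratio(or_per_unit, unit, delta):
    """Odds ratio for an increase of `delta`, given the OR for an increase of `unit`.

    Assumes the effect is linear on the log-odds scale, so the log odds ratio
    scales proportionally with the size of the increase.
    """
    return math.exp(math.log(or_per_unit) * delta / unit)


# Reported estimate and 95 % CI per additional 100 operations/year.
OR_100, LOWER_100, UPPER_100 = 0.84, 0.78, 0.90

# Implied odds ratio (with CI bounds) for 200 additional operations/year.
or_200 = rescale_odds_ratio(OR_100, 100, 200)
ci_200 = (rescale_odds_ratio(LOWER_100, 100, 200),
          rescale_odds_ratio(UPPER_100, 100, 200))

# Implied odds ratio for 50 additional operations/year.
or_50 = rescale_odds_ratio(OR_100, 100, 50)
```

Under the log-linear assumption, a difference of 200 operations/year corresponds to an OR of about 0.71 (0.84 squared), and a difference of 50 operations/year to an OR of about 0.92.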

Fig. 6 Individual center mortality by volume. The 49 analyzed centers are arranged by mean annual surgical volume (y-axis), with odds of mortality (x-axis, log scale) adjusted for (a) risk category, time period, age group, and sex, or for (b) risk category, time period, age group, sex, and volume. Symbols are as above, with centers having significantly increased or decreased odds of mortality highlighted respectively in red or green. For some centers, volume was adjusted up or down slightly to prevent overlap in the figure. This did not change the order of the centers or alter any center’s volume by more than two cases per year

Table 7 The volume–mortality relationship

After adjustment for volume, substantial variability in individual center effects persisted (Fig. 6b). The center effect followed a normal distribution (by assumption), with a mean of 0 and an SD of 0.29 (95 % CI, 0.21–0.37; p < 0.0001). The inclusion of volume in the multivariate model reduced the variability of the center effect by 20.2 %. Center-specific variation still remained significant (p < 0.0001 by LRT; Table 6). These findings demonstrate the importance of center-specific effects above and beyond institutional volume.
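To give a sense of scale, the SD of the center effect can be translated from the log-odds scale into odds ratios. The sketch below assumes that the reported SD of 0.29 is on the log-odds scale (consistent with the definition of the random effect in the Methods) and that center effects are normally distributed; the function name is hypothetical.

```python
import math


def center_odds_ratio(sd_log_odds, z):
    """Odds ratio, relative to an average center, for a center whose random
    effect lies z standard deviations from the mean on the log-odds scale."""
    return math.exp(z * sd_log_odds)


SD_CENTER = 0.29  # reported SD of the center random effect

# A center 1 SD above the mean.
or_plus_1sd = center_odds_ratio(SD_CENTER, 1.0)

# Range expected to contain the central 95 % of centers under normality.
central_95 = (center_odds_ratio(SD_CENTER, -1.96),
              center_odds_ratio(SD_CENTER, 1.96))
```

Under these assumptions, a center 1 SD above the mean has about 1.34 times the adjusted odds of death of an average center, and the central 95 % of centers span odds ratios of roughly 0.57 to 1.77.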

We performed sensitivity analyses to assess the stability of our results. We found neither statistical nor clinical differences in the conclusions of analyses that included only the first admission per patient, included operations with an undefined RACHS-1 risk category, used the end point of overall in-hospital mortality, excluded outliers, or treated year and age as continuous variables and did not annualize volume.

Discussion

Using multi-institutional data collected prospectively over 25 years of pediatric cardiac surgery, we analyzed trends in postoperative mortality and quantified the influence of risk factors, including institutional volume. Overall, survival after pediatric cardiac surgery improved substantially, consistent with a previous study in England [2]. The decrease in mortality occurred across all age and risk groups except the minimal risk category 1, which reached a plateau. Over time, the gaps between different risk categories narrowed, but the RACHS-1 score remained by far the best predictor of postoperative mortality. There is residual patient-specific risk not captured by RACHS-1, with younger age and female sex related to an increased risk of death.

We demonstrated a statistically significant protective effect of increased surgical volume on postoperative mortality. The effect was clinically relevant (relative odds reductions generally 10–30 %, similar to a previous report [22]) but modest compared with that of other variables. Others have found that the volume–mortality relationship varies substantially by patient age [21, 23] and may be attenuated [3, 7] or even absent [26] in the modern era; our analysis did not corroborate these findings. However, we did show that the volume–mortality relationship varied significantly by risk category. The complete absence of effect for the lowest risk category suggests that operations in this category are safely performed at smaller centers, consistent with most previous reports [10, 21, 29]. Even for higher-risk operations, the absolute effect of volume is relatively small at current mortality rates.

Previous reports suggest a critical threshold for surgical volume (variously 75–300 cases/year). We did not identify such a threshold within the volume range available in the PCCC, although it is possible that one exists outside this range (i.e., affecting extremely large centers). Regardless, a substantial fraction of patients do not have easy access to these highly specialized centers [1, 3, 5].

To assess the generalizability of results from the PCCC, we compared the PCCC with other data sets [25] and found that (1) case mix was fairly comparable with a wide range of published data sets (two other clinical registries, three administrative databases, and one very large center of excellence) and that (2) in recent years, PCCC centers performed about 10–30 % of the operations in the United States. These findings underscore the importance of understanding and improving outcomes from such centers.

Institutional surgical volume has been proposed as a quality indicator for pediatric cardiac surgery [18]. Our study found volume to be a relatively weak predictor of a center’s mortality rate, similar to previous reports from other databases [7, 9, 14, 29]. We demonstrated that volume-independent center-specific effects contribute more to the statistical model than volume. These findings are consistent with those of other studies [9, 23, 27, 29] suggesting that volume should not be used in isolation to predict quality at the level of individual institutions.

Regionalization has been considered as a pediatric cardiac surgery quality improvement strategy [4]. This is a complex policy issue, but if undertaken, our results suggest that centers must be targeted by direct analysis of quality rather than by volume alone. The implications for selective referral strategies [1] are similar. Our study provides reference data useful for identifying patient subgroups with the greatest potential benefit.

Study Strengths and Limitations

The strengths of our study included the large data set, the ability to characterize changes over time, the use of clinical rather than administrative data, and attention to recommendations regarding the statistical approach to volume–outcome analysis [15, 16, 22, 24].

The study limitations were mainly those typical of retrospective registry-based studies. We did not control for all factors that may affect outcomes. These include patient factors such as prematurity, weight, comorbidities, critical illness severity, and sociodemographic variables, as well as institutional factors such as team composition, individual surgeon training and experience, type of facility (e.g., free-standing children’s hospital, general hospital), transfusion practices, infection control, and care pathways. Such factors are not available in our data set but should be studied in the future because they may be critical to explaining the systematic but volume-independent differences between centers.

We grouped operations by RACHS-1 risk category and therefore cannot evaluate whether our results apply uniformly to all operations in each risk category or whether alternate approaches to risk adjustment would have affected our results, although the discrimination difference between RACHS-1, the Society of Thoracic Surgeons–European Association for Cardiothoracic Surgery (STS-EACTS) system, and the Aristotle Basic Complexity score is known to be small [20]. We calculated volume over discrete intervals, starting before and ending after each case, rather than on a rolling basis. This is unlikely to have substantially altered our findings because volumes were generally consistent over time except at one center that was therefore the subject of a sensitivity analysis.

The voluntary and changing makeup of the PCCC may limit our inference, and with only a few centers performing more than 350 operations per year, this study could not address the volume–mortality relationship at very large centers. However, a prior study including larger centers suggested that the volume–mortality effect is essentially limited to centers performing fewer than 300 operations/year [29]. Finally, unmeasured referral patterns may confound any observational volume–outcome analysis [17], in which case volume and mortality would indeed be correlated but not causally related.

Implications and Conclusions

Our findings have implications for future research. Studies of volume–outcome relationships and other center-specific variations set the stage for investigation of factors that may mediate these relationships and for targeted quality improvement. However, improvement efforts using mortality as the end point are hampered by limited statistical power [6]. Future research needs to identify end points that can support rapid-cycle quality improvement and incorporate other important outcomes including nonfatal complications, reoperation, neurologic and other noncardiac morbidity, and cost effectiveness.

In this large multi-institutional registry, mortality after pediatric cardiac surgery has declined substantially over the past 25 years. Mortality remains an important end point for higher-risk operations, but new end points should be developed and validated across the spectrum of pediatric cardiac operations. Center-specific variation exists even after risk adjustment, suggesting that some of the postoperative mortality is preventable, but institutional volume only partially explains this variation. Careful regionalization based on measured outcomes rather than volume may have a role in improving quality. However, we believe it may be better to identify the factors responsible for volume-independent center-specific variation and deploy them broadly to improve outcomes at centers of all sizes.