A robust and consistent association has been observed between outcomes following surgical procedures and the hospital volume of that procedure, with patients who undergo surgery at high-volume centers having better outcomes than those who undergo surgery at low-volume centers [1–13]. Importantly, this relation appears constant over varying levels of surgical complexity and specialties, having been observed following procedures ranging from appendectomy [10] to pancreaticoduodenectomy [2, 14].

As a result of these data, several authors have argued for the regionalization of technically demanding, relatively infrequent procedures to high-volume centers. Such a model has enjoyed success in the field of trauma surgery, where severely injured patients have significantly better outcomes when treated at high-volume trauma centers [15]. Accordingly, credentialing agencies as well as third-party payers have incorporated volume standards into hospital referral criteria for myriad surgical procedures that have documented volume-outcome relations [16–18].

The relation between hospital volume of thyroidectomies and outcomes following substernal thyroidectomy (ST) has not been studied. Substernal thyroidectomy is indicated for removal of a substernal goiter (defined most commonly as extension of > 50% of the thyroid gland into the mediastinum [19]), which is present in approximately 6% of patients who present for thyroidectomy [19–28]. Substernal thyroidectomy has been advocated for the treatment of all substernal goiters for several reasons: the natural history of substernal goiter is one of progressive expansion with eventual onset of compressive symptoms, medical therapy is nearly always unsuccessful [29], and surgery often relieves compressive symptoms [23]. Additionally, approximately 10% of substernal goiters harbor an underlying malignancy [20–22, 25, 27].

Compared to conventional cervical thyroidectomy (CT), ST is associated with an increased hospital length of stay (LOS), increased postoperative morbidity, and increased mortality [30, 31]. Using the New York State Statewide Planning and Research Cooperative System (SPARCS) database, we recently documented a more than eightfold risk-adjusted increase in the likelihood of mortality following ST compared to CT [31]. Worse outcomes following ST compared to CT are likely due to both the technical difficulty encountered during excision and challenges faced during perioperative management of patients with a substernal goiter (e.g., tracheal compression necessitating fiberoptic bronchoscopy for endotracheal intubation).

The purpose of this study was to determine if a volume–outcome relation exists between hospital volume of thyroidectomies and outcomes following ST. Such an association would have potential health care policy ramifications regarding regionalization. We therefore tested the hypothesis that LOS, morbidity, and mortality are improved when ST is performed at a high-volume hospital.

Materials and methods

Data were extracted from the SPARCS database for the years 1998–2004. The SPARCS database contains information on all patients discharged from acute-care, nonfederal hospitals in New York State (NYS). A total of 15 diagnostic fields and 15 procedure fields, which are based on the International Classification of Diseases, 9th Revision (ICD-9) classification system, are available for each patient discharge. For the present analysis, a data set containing patients who underwent either partial (06.51) or total (06.52) ST was created. Substernal thyroidectomies in which the extent of resection was unclear (06.50) were excluded.

The primary independent variable was the average hospital volume of thyroidectomies performed over the 7-year study period. Although hospital volume of STs was considered originally as the variable on which to base the volume calculation, too few STs occurred to make this a valid metric. Specifically, among the hospitals that performed at least one ST over the 7-year study period, the mean and median annual hospital volumes of STs were 1.0 and 0.4, respectively.

The volume variable was thus based on the average number of thyroidectomies performed. Thyroidectomies were captured using the aforementioned ST codes, as well as the following codes for CT: unilateral cervical thyroid lobectomy (06.2), partial CT (06.3), and total CT (06.4). Patients who underwent thyroid biopsy (06.1) only, as well as cases in which the extent of thyroid resection was not specified (06.50 and 06.98), were not included in the hospital volume calculation.

Because the hospital volume of thyroidectomies was not distributed normally, it was analyzed as an ordinal categorical variable, with cutoff points determined by division into tertiles. Tertiles were chosen to assess for linear trends between hospital volume groups (in contrast to analysis of a dichotomous variable) while retaining sufficient sample sizes within groups for multivariable analysis. Low-volume hospitals were defined as centers that averaged <33 thyroidectomies per year, middle-volume as between 33 and 99 thyroidectomies per year, and high-volume as ≥100 thyroidectomies per year. Division into tertiles resulted in approximately equal sample sizes within each volume group.
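The volume grouping described above can be expressed as a short classification rule. The following is a minimal sketch rather than the authors' code: the function name `volume_group` is hypothetical, and the handling of averages that fall between 99 and 100 (assigned here to the middle-volume group) is an assumption, because the text does not state it.

```python
def volume_group(mean_annual_thyroidectomies: float) -> str:
    """Assign a hospital to a volume group using the cutoffs stated in the text."""
    if mean_annual_thyroidectomies < 0:
        raise ValueError("volume cannot be negative")
    if mean_annual_thyroidectomies < 33:
        return "low"
    # Assumption: averages between 99 and 100 are treated as middle-volume.
    if mean_annual_thyroidectomies < 100:
        return "middle"
    return "high"


# Example inputs only; any non-negative annual average can be classified.
for volume in (10.7, 53.0, 170.3):
    print(volume, volume_group(volume))
```

Because the cutoffs were derived from tertiles of the patient sample, the rule yields similar numbers of patients per group but not similar numbers of hospitals.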

Additional covariates abstracted included age, sex, co-morbidity, race (white vs. nonwhite), insurance status (private insurance vs. other), extent of thyroidectomy (total vs. subtotal), and thyroid pathology [malignancy (193) vs. other]. Patient co-morbidities were captured using the Deyo co-morbidity index [32], which assigns points based on the ICD-9 coding system for 19 preexisting co-morbid conditions, ranging from 1 to 6 for each condition, for a total possible score of 37. However, owing to a high degree of co-linearity between the thyroid malignancy variable and the total co-morbidity score (which incorporates thyroid malignancy), a modified Deyo co-morbidity score, which excluded thyroid malignancy (193), was calculated. The co-morbidity variable was analyzed as continuous.

Outcome variables included hospital LOS, hospital mortality, and surgical complications. Complications following thyroidectomy were captured using ICD-9 diagnostic coding as outlined by Sosa et al. [30]. Recurrent laryngeal nerve (RLN) injury was defined as the presence of a diagnostic code in any field for either vocal cord paralysis (478.3) or a surgical complication involving the nervous system (997.00, 997.09). Endocrine complications were captured using the diagnostic codes for either hypoparathyroidism (252.1) or hypocalcemia (275.41) (grouped into the variable hypoparathyroidism for the present analysis). Postoperative bleeding was captured using the diagnostic codes for either hematoma formation (998.12) or hemorrhage related to a procedure (998.11). Acute respiratory failure (518.81) and red blood cell (RBC) transfusion (99.04) were also queried.
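The code-based capture of complications can be illustrated as follows. This is a sketch under stated assumptions, not the query used in the study: the codes are written in standard ICD-9-CM notation (for example, 478.3 for vocal cord paralysis and 252.1 for hypoparathyroidism), each discharge is assumed to be available as two lists of code strings, and the names `COMPLICATION_CODES` and `flag_complications` are hypothetical.

```python
# Diagnosis-code prefixes per complication, in standard ICD-9-CM notation.
# The grouping is an illustrative reading of the text, not the study's query.
COMPLICATION_CODES = {
    "rln_injury": ("478.3", "997.00", "997.09"),
    "hypoparathyroidism": ("252.1", "275.41"),  # includes hypocalcemia
    "bleeding": ("998.11", "998.12"),
    "respiratory_failure": ("518.81",),
}
# RBC transfusion is a procedure code, so it is matched against procedure fields.
TRANSFUSION_PROCEDURE = "99.04"


def flag_complications(diagnoses, procedures):
    """Return the set of complications coded in any field of one discharge."""
    found = set()
    for name, prefixes in COMPLICATION_CODES.items():
        # Prefix matching lets "478.3" capture subcodes such as "478.32".
        if any(code.startswith(prefixes) for code in diagnoses):
            found.add(name)
    if TRANSFUSION_PROCEDURE in procedures:
        found.add("rbc_transfusion")
    return found


# Hypothetical discharge record, for illustration only.
print(flag_complications(["240.9", "998.12"], ["06.52", "99.04"]))
```

A patient is then counted as having "at least one complication" when the returned set is non-empty.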

Statistical analyses were computed using SAS Version 9.1 (SAS Institute, Cary, NC, USA). All p values were two-sided, with statistical significance evaluated at the 0.05 α level. The assessment of differences in continuous variables between the three volume groups was performed using analysis of variance (ANOVA). The p values derived from this test are listed as p ANOVA. In the case of overall significance, the assessment of significance between individual volume groups was performed using Tukey’s studentized range (HSD) test [33]. The assessment of differences in nominal categorical variables between volume groups was performed using the omnibus χ2 test. The p values derived from this test are listed as p χ2omnibus. Furthermore, the assessment of a linear trend between increasing volume groups and both covariates and outcomes was performed by calculating the χ2 trend statistic. The p values derived from this test are listed as p χ2trend.
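For readers unfamiliar with the trend statistic, the sketch below shows one common form, the Cochran-Armitage χ2 test for linear trend with 1 degree of freedom. Whether the analysis used exactly this variant is an assumption (some software applies an N − 1 correction), the equally spaced group scores are an assumption, and the counts in the example are invented for illustration and are not study data.

```python
import math


def chi2_sf_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi2_trend(events, totals, scores=None):
    """Cochran-Armitage chi-square for linear trend across ordered groups."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores (assumption)
    n = sum(totals)
    p_bar = sum(events) / n  # pooled event proportion
    # Score-weighted deviation of observed from expected event counts.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    sum_ms = sum(m * s for s, m in zip(scores, totals))
    sum_mss = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (sum_mss - sum_ms ** 2 / n)
    chi2 = t * t / var
    return chi2, chi2_sf_1df(chi2)


# Invented counts (events and patients per group), for illustration only.
chi2, p = chi2_trend(events=[12, 8, 3], totals=[400, 400, 400])
print(f"chi2 trend = {chi2:.2f}, p = {p:.3f}")
```

Unlike the omnibus χ2 test, which asks only whether the groups differ, this statistic is sensitive specifically to a monotonic change in event rate across the ordered volume groups.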

Multivariate logistic regression models were created to evaluate the independent effect of hospital volume of thyroidectomies on both the occurrence of at least one complication and mortality following ST. Variables associated with hospital volume at the p < 0.10 level by univariate analysis were added to the model using a forward selection method. The overall contribution of the fitted model to predicting variability in the outcome of interest was assessed using the likelihood ratio χ2 test. The independent contribution of individual variables was assessed using the Wald χ2 test. Model fit was assessed using the Hosmer-Lemeshow goodness-of-fit χ2 statistic, with p > 0.05 indicating acceptable model calibration.

Results

Of the 217 New York State hospitals that performed at least one thyroidectomy over the study period, 155 (71.4%) performed at least one ST. Figure 1 depicts the distribution of these 155 hospitals according to annual volume of thyroidectomies. Whereas most hospitals averaged < 33 thyroidectomies per year (n = 117, 75.5%) (low volume), only 11 hospitals (7.1%) averaged ≥ 100 thyroidectomies per year (high volume). The mean number of thyroidectomies performed annually was 10.7 (range 0.3–31.2) for low-volume centers, 53.0 (range 35.7–89.3) for middle-volume centers, and 170.3 (range 100.9–287.4) for high-volume centers (p ANOVA < 0.0001).

Fig. 1 Distribution of the 155 New York State hospitals that performed at least one substernal thyroidectomy according to the volume of thyroidectomies performed over the 7-year study period. Most (n = 117, 75.5%) of the hospitals performed fewer than 33 thyroidectomies per year (low-volume), whereas only 11 hospitals (7.1%) performed ≥ 100 thyroidectomies per year (high-volume). The remaining hospitals (n = 27, 17.4%) performed between 33 and 99 thyroidectomies per year (middle-volume)

A total of 1153 patients underwent ST over the 7-year study period. Sample characteristics are shown in Table 1. The sample consisted of a relatively young, healthy population with a mean age of 56.7 years and a mean co-morbidity score of 0.7. Malignancy was present in 261 patients (24.3%); 680 (59.0%) patients underwent total thyroidectomy; and 988 (85.7%) patients underwent surgery at a teaching hospital. The volume groups were distributed evenly, with 372 patients (32.2%) undergoing ST at a low-volume center, 388 (33.7%) at a middle-volume center, and 393 (34.0%) at a high-volume center.

Table 1 Sample demographics

Analysis of patient covariates revealed significant differences according to volume groups (Table 2). Age decreased slightly with increasing volume; the mean age of patients who underwent surgery at a high-volume facility was 4 years younger than that of patients who underwent surgery at a low-volume center (54.5 vs. 58.5, respectively, p ANOVA = 0.003). The co-morbidity score also varied significantly by volume group (p ANOVA < 0.0001), with high-volume patients having a mean score that was twice that of low-volume patients (1.2 vs. 0.6, respectively). Furthermore, the likelihood of private insurance (p χ2trend < 0.0001), total thyroidectomy (p χ2trend < 0.0001), malignancy (p χ2trend < 0.0001), and presentation to a teaching hospital (p χ2trend < 0.0001) all increased with increasing volume group. Whereas only 72.6% of low-volume patients (n = 270) underwent surgery at a teaching hospital, all of the 11 high-volume hospitals were teaching facilities. Although race varied significantly by volume group (p χ2omnibus = 0.01), no evidence of a linear trend was observed (p χ2trend = 0.42). Finally, sex did not vary significantly by volume group (p χ2omnibus = 0.94).

Table 2 Sample characteristics according to volume group

Outcomes following ST by volume group are summarized in Table 3 and Figures 2 and 3. The likelihood of at least one postoperative complication increased significantly with decreasing hospital volume (p χ2trend = 0.005). Neither RLN injury (p χ2omnibus = 0.74) nor hypoparathyroidism (p χ2omnibus = 0.14) varied in incidence according to volume group. However, a linear trend was observed between increasing hospital volume of thyroidectomies and decreasing incidence of postoperative bleeding (p χ2trend = 0.01), red blood cell (RBC) transfusion (p χ2trend = 0.04), and respiratory failure (p χ2trend = 0.04) following ST (Fig. 2). Furthermore, a trend toward a decreased hospital LOS with increasing volume group was observed (p ANOVA = 0.06). Finally, mortality was nearly 10-fold higher in the low-volume group than in the high-volume group (p χ2trend = 0.004) (Fig. 3).

Table 3 Outcomes according to volume group
Fig. 2 Postoperative complications following substernal thyroidectomy that varied significantly according to volume group. The incidences of postoperative bleeding, red blood cell (RBC) transfusion, and respiratory failure following substernal thyroidectomy were inversely related to hospital volume of thyroidectomies

Fig. 3 Mortality following substernal thyroidectomy according to volume group. There was an inverse, linear relation between hospital volume of thyroidectomies and the likelihood of mortality following substernal thyroidectomy (χ2 trend = 8.3, p = 0.004)

Results from multivariate logistic regression analysis, using both the incidence of at least one complication and mortality as the dependent variables, are shown in Tables 4 and 5, respectively. Age, race, insurance status, co-morbidity, extent of thyroidectomy, and presence of thyroid malignancy were added to the models in addition to hospital volume. Controlling for these covariates, patients who underwent ST at either a low-volume hospital or a middle-volume hospital were more than twice as likely to incur a complication as patients who underwent ST at a high-volume center (OR = 2.11, 95% CI 1.24–3.58, p = 0.006 and OR = 2.23, 95% CI 1.34–3.69, p = 0.002, respectively) (Table 4). Patients who underwent ST at a low-volume hospital were also more than 10 times as likely to die as patients who underwent ST at a high-volume hospital (OR = 10.5, 95% CI 1.08–102.4, p = 0.04) (Table 5). Patients who underwent surgery at a middle-volume hospital also demonstrated an increased likelihood of mortality following ST compared to high-volume patients, although this value did not reach statistical significance (OR = 7.8, 95% CI 0.76–80.9, p = 0.08).
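The relation between each reported odds ratio, its confidence interval, and its p value can be checked with a short calculation. The sketch below assumes that the intervals are Wald intervals of the form exp(β ± 1.96 SE), which is consistent with the use of the Wald χ2 test described in the methods but is not stated explicitly; the function name `wald_from_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Recover the log-odds coefficient, its standard error, and the Wald p value."""
    beta = math.log(odds_ratio)
    # A Wald interval is symmetric on the log scale, so its width gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p


# Odds ratios and 95% confidence limits as reported in the text.
reported = {
    "complication, low vs. high": (2.11, 1.24, 3.58),
    "complication, middle vs. high": (2.23, 1.34, 3.69),
    "mortality, low vs. high": (10.5, 1.08, 102.4),
    "mortality, middle vs. high": (7.8, 0.76, 80.9),
}
for label, (odds_ratio, low, high) in reported.items():
    beta, se, p = wald_from_ci(odds_ratio, low, high)
    print(f"{label}: beta = {beta:.2f}, SE = {se:.2f}, p = {p:.3f}")
```

Under this assumption the recovered p values agree with the reported ones after rounding. The much larger standard errors for the mortality models reflect the small number of deaths, which is why those intervals are so wide.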

Table 4 Results from multivariable logistic regression on the occurrence of at least one complication following substernal thyroidectomy
Table 5 Results from multivariable logistic regression on mortality following substernal thyroidectomy

Discussion

Using NYS administrative data, we found that patients who underwent ST at a hospital that averaged a relatively high volume of thyroidectomies enjoyed better outcomes than patients who underwent ST at a hospital that averaged fewer thyroidectomies. These associations were observed despite the fact that patients who presented to high-volume centers had greater co-morbidity, were more likely to have a thyroid malignancy, and were more likely to require a total thyroidectomy than patients who underwent surgery at a low-volume center. Using multivariate logistic regression, we found an independent association between the hospital volume of thyroidectomies and both mortality and complications following ST, with patients who underwent ST at a low-volume center incurring a more than twofold increased likelihood of at least one postoperative complication and a more than 10-fold increased likelihood of mortality compared to patients who underwent ST at a high-volume center. These findings add to the sizable literature documenting a volume–outcome relation in both endocrine surgery and surgery in general. Although these data appear to favor regionalization of STs to high-volume centers, several methodological considerations warrant elaboration.

The most commonly invoked explanation for the volume–outcome relation in surgery involves the theory that both surgeon and hospital experience result in improved patient care; put simply, “practice makes perfect.” Due to the nature of the SPARCS database, we were unable to distinguish the relative contributions of surgeon volume from hospital volume. Independent of hospital volume, surgeon expertise has been associated with improved outcomes following a wide range of procedures, including thyroidectomy [30, 34–38]. Whereas some investigators have found that most of the volume–outcome relation is explained by surgeon experience [30, 38], others have noted an independent effect of hospital volume [34]. Harmon et al. reported that the outcomes of low-volume surgeons following colorectal resection are improved when surgery is performed at high-volume centers [39].

The relation between surgeon and hospital volumes of thyroidectomy and patient outcomes was originally studied by Sosa et al. [30]. Overall, a significant association was observed between increased surgeon volume and improved outcomes, including postoperative complications, LOS, and hospital charges. In a subgroup analysis of surgeons with operating privileges at more than one hospital, LOS was found to be significantly associated with surgeon, but not hospital, volume of thyroidectomies. However, results for additional outcomes by hospital volume groups, including mortality, were not reported.

Outcomes following ST appear intuitively dependent on both surgeon and ancillary staff expertise. The increased risk of RLN injury associated with ST as compared to CT [31] argues for the importance of surgeon experience. However, the incidence of RLN injury did not vary by volume group in the present study, although the relative rarity of this complication precluded a meaningful multivariate analysis. In contrast, because perioperative management of patients with a substernal goiter frequently mandates use of specialized procedures, such as fiberoptic bronchoscopy, sternotomy, and postoperative mechanical ventilation [25, 27], ancillary staff expertise (e.g., anesthesiologists, surgical intensivists) may also have a substantial impact on patient outcome. This hypothesis is consistent with the decreased likelihood of respiratory failure observed in patients who undergo ST at a high-volume center.

A second explanation for the observed volume–outcome relation involves reverse causality. Specifically, providers may selectively refer patients to hospitals that are known to produce favorable patient outcomes, thereby increasing patient volume at these hospitals. Owing to the retrospective nature of this study, these two possibilities cannot be distinguished.

Irrespective of the relative contributions of surgeon and ancillary staff expertise, the independent association between hospital volume and outcome following ST remains germane; regardless of the mechanism, patients who presented to high-volume hospitals incurred better outcomes than patients who presented to low-volume hospitals. This association must be interpreted with caution as unmeasured patient characteristics that differ by volume group may independently affect outcomes. However, several parameters associated with adverse outcomes following thyroidectomy, including co-morbidity, thyroid malignancy, and extent of surgery [30], were in fact more common in patients who presented to high-volume centers. Furthermore, an independent effect of hospital volume on both complications and mortality following ST was observed after controlling for the aforementioned variables using multivariate logistic regression.

Although the use of administrative data is advantageous for analysis of outcomes following relatively infrequent procedures such as ST, important limitations are also introduced. Certain variables, such as goiter size, degree of substernal extension, and use of a sternotomy incision during ST, are impossible to capture. Series of substernal goiters have reported a mean goiter mass ranging from 140 g [20] to 264 g [23], and increased goiter mass has been associated with postoperative complications following ST [27]. Goiter size may thus confound the relation between ST and adverse outcomes, although there are no data to suggest that goiter size varies by volume group. Analysis of institutional databases that capture goiter size could help to elucidate these relations, although such studies would likely suffer from inadequate power.

Additional limitations of administrative data include undercoding of variables such as patient co-morbidity. Furthermore, additional unmeasured patient characteristics may differ based on hospital volume groupings and introduce biases. This phenomenon, known as “clustering,” may in turn exaggerate measures of association in volume–outcome studies [40]. Finally, because hospital administrative data are limited to the inpatient period, the durations of both RLN injury and hypoparathyroidism are impossible to ascertain.

Conclusions

The literature concerning the hospital volume of surgical procedures and patient outcomes continues to evolve. In the present study, we provide evidence that the hospital volume of thyroidectomies is inversely related to both morbidity and mortality following ST. This effect appears to be independent of differences in patient age, socioeconomic status, co-morbidity, or extent of surgery among volume groups. Although these results suggest potential value to the regionalization of ST to high-volume centers, they must be examined in light of the limitations common to studies of this kind. Further research is needed prior to definitive recommendations.