Introduction

Gastrointestinal stromal tumors (GISTs) are the most commonly diagnosed mesenchymal tumors of the gastrointestinal tract, with an incidence of 10–15 cases per million people per year.1–3 GISTs most commonly arise from the stomach and small bowel and less frequently from the colon/rectum or other sites.4 For patients with primary GIST, margin-negative surgical resection is the treatment modality with the best chance of cure.5 Surgery alone for GIST is, however, associated with a 5-year recurrence-free survival of 70 %.6,7 Established factors associated with recurrence following resection of GIST include tumor size, mitotic rate, tumor site, and tumor rupture.8–11 Of note, 75–80 % of patients with GISTs have mutations in the receptor tyrosine kinase KIT (CD117) that lead to KIT overexpression.12

Two landmark randomized controlled trials have established adjuvant imatinib mesylate (Gleevec®, Novartis, Basel, Switzerland)—a tyrosine kinase inhibitor (TKI) targeted at KIT—as the standard of care for a subset of patients at risk of recurrence following resection of GIST.13,14 The National Comprehensive Cancer Network (NCCN) currently recommends adjuvant imatinib for patients at intermediate and high risk of recurrence following resection of GIST.5

Estimating the risk of recurrence following surgical resection of GIST is therefore critical to both physicians and patients, both for prognostication and for determining which patients warrant adjuvant TKI therapy. Several investigators have assembled systems that stratify patients according to recurrence risk.7,9,15–17 The NIH and the Miettinen risk stratification systems take tumor size and mitotic rate into consideration, while the modified NIH risk stratification system takes tumor size, site, mitotic rate, and tumor rupture into consideration when determining recurrence risk.9,15,17 The Memorial Sloan-Kettering Cancer Center (MSKCC) GIST nomogram, created to numerically predict the recurrence-free survival for a given patient, was based on the analysis of 127 patients treated for GIST at MSKCC from 1983 to 2002.16 This nomogram was therefore constructed from the experience of a relatively small number of patients at a single center.

The aim of the current study was to create and internally validate a nomogram to predict disease-free survival following surgical resection of GIST in a large multi-institutional cohort of patients who have undergone surgery for primary GIST.

Methods

Patient Population and Data Collection

Patients for this study were identified from a retrospective, multi-institutional database of 609 patients who underwent surgery for GIST between January 1998 and December 2012 at 7 major cancer centers in the USA (Johns Hopkins University, Baltimore, MD; Duke University, Durham, NC; Emory University, Atlanta, GA; Medical College of Wisconsin, Milwaukee, WI; and University of Virginia, Charlottesville, VA) and Canada (University Health Network, Toronto, ON, and Sunnybrook Health Sciences Centre, Toronto, ON). Specific inclusion criteria for this study were (1) surgical resection of a primary GIST, (2) curative intent of surgery, and (3) survival for 90 days following surgery. Exclusion criteria were (1) recurrent GIST, (2) metastatic GIST, (3) receipt of peri-operative (neoadjuvant or adjuvant) imatinib, (4) surgery for an indication other than GIST with an incidental finding of GIST on pathology, and/or (5) macroscopically positive (R2) resection margin. In total, 365 patients were included in this study. The institutional review board at each participating institution approved this study.

Standard demographic and clinicopathologic data were collected including sex, age, tumor site, pathologic tumor size, postoperative mitotic rate [number of mitoses/50 high-power fields (HPF)], margin status [negative (R0), microscopically positive (R1), macroscopically positive (R2)], mutational testing, and tumor rupture. Dates of last follow-up, recurrence, and death were collected. Recurrence was defined as biopsy-proven recurrent GIST or a lesion deemed suspicious on cross-sectional imaging.

Statistical Methods

Summary statistics for the study population were presented as percentages or as median values with interquartile range (IQR). Disease-free survival (DFS) for the entire study population was estimated using the Kaplan-Meier method, with the date of surgery as the time origin.18 Differences in DFS were compared using the log-rank test. Clinically important variables associated with recurrence for GIST were evaluated for inclusion into the nomogram. Continuous predictors such as age, tumor size, and mitotic index were transformed using cubic splines or categorized with the aim of maximizing the Wald χ² statistic. The association of relevant clinicopathologic variables with DFS was assessed using Cox proportional hazards models; the prognostic power of covariates was expressed by calculating hazard ratios (HRs) with 95 % confidence intervals (CIs).19 Backward stepwise selection with the Akaike information criterion (AIC) was used to identify variables for the multivariable Cox proportional hazards model. Selected variables were then incorporated into the nomogram.
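As an illustration of the Kaplan-Meier (product-limit) estimate described above, the following is a minimal sketch in plain Python. It is not the code used for the study, which was run in STATA and R; the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time from surgery for each patient (any consistent unit)
    events -- 1 if recurrence or death was observed at that time, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical, not from the study): months of follow-up and event flags.
toy_times = [6, 12, 12, 20, 30, 44]
toy_events = [1, 0, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

With the toy data, the estimate steps down only at the three observed event times (6, 12, and 30 months); censored patients shrink the risk set without lowering the curve.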

Model performance was evaluated in three ways: (1) by evaluating discrimination with Harrell’s C-statistic (a measure that quantifies the proportion of all patient pairs for whom the predicted and observed survival outcomes are concordant), (2) by plotting Kaplan-Meier curves over the quartiles of prediction, and (3) by examining calibration plots using a bootstrapped sample.20,21 Model validation was performed using bootstrapped resampling to quantify any overfitting of our modeling strategy and to predict future performance of the model. The discriminative ability of our nomogram was compared to that of the NIH criteria15, the modified NIH criteria9, the Miettinen criteria17, and the MSKCC GIST nomogram16 using Harrell’s C-statistic. For determining DFS based on the MSKCC nomogram, the predicted probability of recurrence at 5 years was calculated using the nomogram and patients were divided into quartiles based on their predicted 5-year recurrence risk. Discrimination for the MSKCC nomogram was calculated using the raw recurrence risk rather than the quartile grouping so as not to diminish the discriminative ability of the nomogram. A C-statistic of 0.5 indicates no discriminatory ability, while a value of 1.0 indicates perfect discrimination.
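The pairwise definition of the C-statistic given above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the routine used in the study: the function name `harrell_c` and the toy data are hypothetical, pairs tied on follow-up time are skipped, and ties in predicted risk count as half-concordant.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (simplified sketch).

    A pair is usable when the patient with the shorter follow-up had an
    observed event. It is concordant when that patient also has the higher
    predicted risk; ties in predicted risk count as one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before the end
            # of patient j's follow-up; ties in time are skipped here.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (hypothetical, not from the study).
toy_times = [6, 12, 12, 20, 30, 44]
toy_events = [1, 0, 1, 0, 1, 0]
toy_risk = [11.2, 3.0, 4.0, 5.0, 6.0, 2.0]  # higher score = higher predicted risk
toy_c = harrell_c(toy_times, toy_events, toy_risk)
uninformative_c = harrell_c(toy_times, toy_events, [1.0] * 6)
```

In the toy data, 7 of the 9 usable pairs are ordered correctly, and a score that is identical for every patient yields 0.5, matching the interpretation of the two reference values in the text.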

Statistical analyses were performed with STATA version 12.0 (StataCorp, College Station, TX) and R version 3.0.3 (http://www.r-project.org); all tests were two-sided and a P value <0.05 was considered statistically significant.

Results

Demographic and Clinicopathologic Characteristics

The median age of the 365 patients included in our study was 63 years, and 56 % of the patients were female (Supplemental Table 1). The median tumor size was 3.8 cm. The majority of tumors originated in the stomach (78 %), while a minority originated in the small bowel (15 %), colon/rectum (1 %), or in other locations (7 %). The majority of tumors (86 %) had a mitotic rate of ≤5 mitoses/50 HPF, while 8 % had a mitotic rate of 6–10 mitoses/50 HPF and 6 % had a mitotic rate of >10 mitoses/50 HPF. A small minority of patients underwent an R1 resection (4 %) or had pre-operative or intra-operative tumor rupture (1 %).

The median follow-up for our cohort was 20.1 months. During follow-up, 21 patients recurred and 24 patients died. Median recurrence-free survival was not reached. The 1-, 3-, and 5-year DFS was 95.2, 88.3, and 81.4 %, respectively. Median overall survival was not reached and the 1-, 3-, and 5-year overall survival was 97.6, 94.4, and 88.0 %, respectively.

Model Specifications and Predictors of Disease-Free Survival

The following clinically relevant predictors of recurrence for GIST were selected as candidate variables from the database based on literature review: age, sex, tumor size, tumor site, mitotic rate, tumor rupture, and margin.8–11 Backward stepwise selection using the AIC in Cox proportional hazards regression modeling identified four variables that had the strongest association with DFS: sex, tumor size, tumor site, and mitotic rate. The HRs from the univariable and multivariable Cox proportional hazards regression analyses for candidate and selected variables are shown in Table 1. On multivariable analysis, male sex (HR 3.71, 95 % CI: 1.66–8.30), tumor size (<5 cm: reference; 5–10 cm: HR 1.90, 95 % CI: 0.78–4.62; ≥10 cm: HR 3.14, 95 % CI: 1.20–8.19), and mitotic rate group (≤5 mitoses/50 HPF: reference; 6–10 mitoses/50 HPF: HR 1.57, 95 % CI: 0.54–4.57; >10 mitoses/50 HPF: HR 8.56, 95 % CI: 3.15–23.25) were independently associated with DFS (all P < 0.05).
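Because a Cox model estimates the log hazard ratio, a reported 95 % CI can be used to recover the approximate standard error and a Wald test for each individual contrast. The sketch below does this for the HRs quoted above; the helper name `wald_from_ci` is hypothetical, and the calculation assumes a normal approximation on the log scale with a critical value of 1.96. These per-category values are not the same thing as an overall test for a multi-level factor, which is presumably what the reported P values refer to.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover the log-HR standard error, Wald z, and two-sided P from a 95 % CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


# HRs and 95 % CIs as reported in the text (each versus its reference category).
reported = {
    "male sex": (3.71, 1.66, 8.30),
    "size 5-10 cm": (1.90, 0.78, 4.62),
    "size >=10 cm": (3.14, 1.20, 8.19),
    "mitoses 6-10/50 HPF": (1.57, 0.54, 4.57),
    "mitoses >10/50 HPF": (8.56, 3.15, 23.25),
}
wald = {name: wald_from_ci(*values) for name, values in reported.items()}

for name, (se, z, p) in wald.items():
    print(f"{name}: SE(log HR) = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

Under these assumptions, a contrast whose CI includes 1 (the 5–10 cm and 6–10 mitoses/50 HPF categories) has an individual Wald P above 0.05, while the highest size and mitotic rate categories and male sex fall below it.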

Table 1 Cox proportional hazards regression model showing the association of variables with disease-free survival

Continuous variables (tumor size and mitotic rate) were then explored for inclusion into the final model using restricted cubic splines. Both tumor size and mitotic rate had non-linear effects on the HR of DFS. The log relative hazard of recurrence or death based on tumor size was relatively homogeneous below 5 cm (Fig. 1a). In tumors between 5 and 10 cm in size, the log relative hazard increased steeply with increasing size. In tumors greater than 10 cm, there was a slow increase in the log hazard with increasing tumor size. Based on this pattern, tumor size was modeled in the nomogram as a categorical variable with the following categories: <5, 5–10, and ≥10 cm. For mitotic rate, the log relative hazard of recurrence or death increased until a mitotic rate of approximately 12 mitoses/50 HPF, after which it plateaued (Fig. 1b). Mitotic rate was modeled as a categorical variable in the nomogram with the following categories: ≤5/50, 6–10/50, and >10 mitoses/50 HPF; the categories were chosen based on mitotic rate groupings previously used in recurrence risk stratification.9,15

Fig. 1 Transformation of continuous variables in univariate analysis using restricted cubic splines. a Tumor size. b Mitotic rate

Nomogram

A nomogram to predict DFS of patients with GIST following surgical resection is shown in Fig. 2. The nomogram was developed based on the four independent prognostic markers: sex, tumor size, tumor site, and mitotic rate. Tumor size and mitotic rate were modeled categorically; tumor size groupings were <5, 5–10, and ≥10 cm and mitotic rate groupings were ≤5/50, 6–10/50, and >10 mitoses/50 HPF. Tumor site categories used in the nomogram were gastric and non-gastric. Each factor in the nomogram was assigned a weighted number of points, and the sum of points for each patient was associated with a specific predicted 2- and 5-year DFS. Using the nomogram, a higher score was associated with worse prognosis. For example, a man with a 6-cm gastric GIST and a mitotic rate of 6 mitoses/50 HPF would have a total of 11.2 points (sex = 6.1 points, size = 3 points, mitotic rate = 2.1 points, and site = 0 points). For this patient, the predicted 2-year DFS was 87 % and the predicted 5-year DFS was 74 %.
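The point total in the worked example is a simple sum over the four factors, as the following sketch shows. Only the point values stated in the text are included; the points for every other category would have to be read from Fig. 2 and are deliberately left out rather than guessed. The names `KNOWN_POINTS` and `total_points` and the category labels are hypothetical.

```python
# Points taken from the worked example in the text. All other categories
# must be read from Fig. 2 and are intentionally absent from this table.
KNOWN_POINTS = {
    ("sex", "male"): 6.1,
    ("size", "5-10 cm"): 3.0,
    ("mitotic_rate", "6-10/50 HPF"): 2.1,
    ("site", "gastric"): 0.0,
}


def total_points(patient):
    """Sum the nomogram points for one patient described as {factor: level}."""
    total = 0.0
    for factor, level in patient.items():
        key = (factor, level)
        if key not in KNOWN_POINTS:
            raise KeyError(f"points for {key} are not given in the text; read them from Fig. 2")
        total += KNOWN_POINTS[key]
    return round(total, 1)


# The example patient: a man with a 6-cm gastric GIST and 6 mitoses/50 HPF.
example_patient = {
    "sex": "male",
    "size": "5-10 cm",
    "mitotic_rate": "6-10/50 HPF",
    "site": "gastric",
}
example_total = total_points(example_patient)
```

The total of 11.2 points is then mapped to the predicted 2- and 5-year DFS on the lower scales of the nomogram; that mapping is graphical in the source and is not reproduced here.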

Fig. 2 A nomogram for predicting disease-free survival of patients following resection of primary GIST

Prognostic discrimination was assessed by dividing the predicted probabilities of disease-free survival into quartiles. DFS stratified by quartile was then used to plot Kaplan-Meier curves (Fig. 3). The median predicted 5-year DFS in quartiles 1–4 was 96.5, 93.4, 87.6, and 67.3 %, respectively. Patients with the lowest predicted 5-year DFS (quartile 4) did substantially worse (5-year DFS 54.9 %) than those in quartiles 1–3 (5-year DFS 89.6, 92.7, and 91.8 %, respectively) (P < 0.001).
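The quartile grouping described above can be sketched as a rank-based split of the predicted probabilities. This is an illustrative assumption about how the groups might be formed, not the study's code: the function name `quartile_groups` and the toy predictions are hypothetical, quartile 1 is taken to be the group with the highest predicted DFS (consistent with the medians reported above), and tied predictions are split by rank rather than kept together.

```python
def quartile_groups(predicted_dfs):
    """Assign each patient to quartile 1 (highest predicted DFS) through 4 (lowest)."""
    n = len(predicted_dfs)
    # Indices ordered from highest to lowest predicted DFS.
    order = sorted(range(n), key=lambda i: -predicted_dfs[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 4 // n + 1
    return groups


# Toy predicted 5-year DFS probabilities (hypothetical, not from the study).
toy_predictions = [0.97, 0.93, 0.88, 0.67, 0.95, 0.90, 0.70, 0.96]
toy_groups = quartile_groups(toy_predictions)
```

Each group can then be passed to a Kaplan-Meier estimator to obtain the observed DFS curve for that quartile.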

Fig. 3 Kaplan-Meier curves demonstrating disease-free survival for patients following resection for primary GIST according to quartiles of predicted disease-free survival

Model Performance

Discrimination of the final model was assessed using the C-statistic, which was 0.77 (Supplemental Table 2). Thus, when two patients were randomly selected, the nomogram predicted the correct ordering of DFS 77 % of the time. The 40-sample bootstrapped calibration plot for the prediction of 5-year DFS is shown in Fig. 4. The calibration plot reveals good prediction of 5-year DFS. Bootstrap validation of the accuracy of the model with 150 iterations revealed minimal evidence of model overfit. By comparison, the C-statistics for DFS based on the NIH criteria, the modified NIH criteria, the Miettinen criteria, and the MSKCC nomogram were 0.73, 0.71, 0.78, and 0.71, respectively (Supplemental Table 2).

Fig. 4 Calibration plot comparing predicted and actual disease-free survival probabilities at 5 years follow-up

Discussion

While complete surgical resection remains the treatment modality of choice for primary GIST, the risk of recurrence with surgery alone is substantial, with 20–40 % of patients experiencing recurrence at 5 years.16 In randomized controlled trials, adjuvant imatinib therapy following resection of primary GIST has been demonstrated to improve recurrence-free survival in patients at higher risk of recurrence.14,22 Outcomes following surgery alone for GIST are heterogeneous, and accurate prognostication is essential to select patients for adjuvant TKI therapy and to inform patients and family members of their prognosis. At this point, the optimal method for risk stratification for patients with primary GIST is unclear.23 In this study, we describe a nomogram that numerically predicts an individual’s DFS following surgery for primary GIST according to four clinically available variables: sex, tumor size, tumor site, and mitotic rate. This information can then be used to make an individualized treatment decision about adjuvant imatinib. This study is important because our nomogram was developed using a large, multi-institutional group of patients who underwent surgery for GIST. In addition, in contrast to the previously developed MSKCC GIST nomogram16, our nomogram takes one additional variable independently associated with recurrence into consideration: sex.

The four variables used to predict DFS in our nomogram are established factors associated with recurrence following surgical resection for GIST. Tumor size, mitotic rate, and tumor site have been consistently associated with recurrence risk following resection of GIST and have been widely used in risk stratification of GIST.9,15,16 In a recent large population-based study including 2560 patients who underwent surgery for primary GIST, sex was independently associated with recurrence risk in addition to tumor size, tumor site, mitotic rate, and tumor rupture, with men having an increased risk of recurrence (adjusted HR 1.38).7 Our nomogram is the first risk stratification model to consider these four variables, which are independently associated with prognosis following resection of primary GIST.

The nomogram described in our study demonstrates good discrimination with a C-statistic of 0.77. In comparison, when discrimination of the NIH system, modified NIH system, and MSKCC nomogram was evaluated in our cohort, the C-statistics were 0.73, 0.71, and 0.71, respectively. The Miettinen criteria had similar discrimination in our cohort with a C-statistic of 0.78. Although our nomogram requires validation in an external population, the nomogram presented herein appears to be superior to the MSKCC nomogram in predicting DFS, likely due to incorporation of sex as an additional variable in the prediction of recurrence risk.

Mutational status does not appear to be a prognostic variable but appears to be predictive of response to imatinib.5,23 In the metastatic setting, exon 11 mutations are associated with increased response to imatinib, while exon 9 mutations respond better to a higher dose of imatinib.24,25 Additionally, wild-type tumors and platelet-derived growth factor receptor alpha-mutated tumors respond poorly to imatinib. In studies of patients with primary GIST, findings on the impact of KIT mutations on prognosis have been mixed. In a study by Martin et al., KIT exon 11 mutation was associated with increased risk of recurrence after adjustment in 162 patients who underwent surgery for KIT-positive GIST >2 cm in size.26 In another study, DeMatteo et al. found that while specific KIT mutations were associated with prognosis on univariable analysis, KIT mutations were not independently associated with prognosis on multivariable analysis.8 In the current study, only 8.5 % of patients underwent mutational analysis; therefore, we did not have sufficient power to test for the impact of specific mutations on DFS. This does, however, demonstrate that although consideration of mutational testing is recommended by the NCCN, only a minority of patients undergo mutational testing in routine practice. Implementation of universal testing of mutation status for patients with GIST may allow clinicians to select patients for adjuvant imatinib therapy who are more likely to benefit from this therapy.

In conclusion, in this study, four independent predictors of recurrence following surgery for primary GIST (sex, tumor size, tumor site, and mitotic rate) were used to create a nomogram to predict DFS. The nomogram was able to stratify patients into prognostic groups and performed well on internal validation. Future studies are needed to externally validate the proposed nomogram to establish its value in the prediction of DFS following resection for GIST.