Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors of the gastrointestinal tract.1 They may occur anywhere along the alimentary tract, and they may occasionally occur outside the gastrointestinal tract, such as in the omentum and mesentery.2,3 Significant prognostic heterogeneity has been described with GISTs, which can range from clinically benign to frankly malignant tumors.4,5 The standard treatment for localized primary GIST is complete surgical resection with clear margins.6–11 Nonetheless, the risk of recurrence remains even after complete resection.12,13

Various classification systems to prognosticate GIST have been proposed. The two most widely accepted risk classification systems are the US National Institutes of Health (NIH) criteria and the Armed Forces Institute of Pathology (AFIP) criteria, which were established in 2001 and 2006, respectively.4,14 More recently, Joensuu et al. proposed a set of modified consensus criteria. Table 1 summarizes these three risk classification systems for GISTs. These classification systems have been compared and validated by several investigators.5,15–18 Several investigators have also proposed modifications to these classification systems, suggesting that additional factors such as tumor rupture, tumor ulceration, and mucosal invasion be included in the risk classification.5,19,21

Table 1 Established risk classification systems for GISTs

Huang et al. recognized the wide prognostic heterogeneity of tumors classified in the NIH high-risk category.5 They observed that large and mitotically inactive GISTs were less likely to recur compared to mitotically active GISTs, which tended to herald a much poorer outcome. Hence, they proposed that the NIH high-risk category be subdivided into two categories to prognosticate these patients more accurately. Similarly, in 2008, Goh et al. also noted the wide differences in prognostication of GISTs in the AFIP high-risk category and proposed a modification, dividing the high-risk category into a high-risk group and a very high-risk group on the basis of mitotic activity.19 In another study, Joensuu noted that in addition to tumor size and mitotic count, tumor location and tumor rupture appeared to be important prognostic factors for completely resected GISTs.20 Hence, these investigators conceptualized a risk classification system for selecting GIST patients for adjuvant therapy based on the Fletcher-NIH consensus criteria as well as taking into account the primary tumor location, as per the AFIP criteria, and the presence of tumor rupture. The importance of tumor rupture as an independent adverse prognostic factor has been highlighted in the European Society of Medical Oncology guidelines for the management of GIST.21 A tumor, grade, metastasis clinical staging system was proposed by Woodall and colleagues in 2009—a system in the same vein as the tumor, node, metastasis, grade soft tissue sarcoma staging system now used by the American Joint Committee on Cancer.22

More recently, in 2009, in a bid to quantify the risk of recurrence of GISTs, the Memorial Sloan Kettering Cancer Center (MSKCC) developed a prognostic nomogram that predicted the risk for tumor recurrence after surgical resection of a localized primary GIST, based on their cohort of 127 patients.23 The nomogram was then compared with three other existing GIST staging systems using concordance probabilities, and it was found to be slightly more accurate in the prediction of recurrence-free survival (RFS). This nomogram was subsequently validated with two other cohorts of Western patients from the Spanish Group for Research on Sarcomas (n = 212) and the Mayo Clinic (n = 148).

The present study aimed to validate this prognostic nomogram in a large cohort of Asian patients to determine its applicability in this particular subset of patients. We also compared the predictive accuracy of the GIST nomogram versus current established classification systems.

Methods

Between 1987 and 2012, a total of 313 patients who underwent complete gross resection of localized primary GIST at Singapore General Hospital were identified from a prospectively maintained surgical database. Of these, 24 patients received adjuvant imatinib therapy and were excluded, leaving 289 patients for analysis. This study was approved by our institutional review board. Some of these patients have been reported in a previous study.19 The diagnosis of GIST was confirmed by an expert pathologist using standard pathologic criteria.19 Mitotic index, which is the number of mitoses per 50 randomly selected microscopic high-power fields (HPF), was calculated. Tumor size was also measured by the pathologist, either before or after formalin fixation. At the time of diagnosis, none of the patients showed evidence of metastatic disease on staging computed tomography imaging of the abdomen and pelvis. None of the 289 analyzed patients was treated with a tyrosine kinase inhibitor in the neoadjuvant or adjuvant setting.

The patients’ demographic characteristics and clinicopathologic data were collected. R0 resection was defined as the removal of all gross disease with negative macroscopic and microscopic resection margins on histopathologic examination. The presence of tumor rupture was determined intraoperatively. The postoperative surveillance protocol included physical examination every 3 months for the first 2 years after surgery, every 6 months for the next 3 years, and yearly thereafter. Computed tomographic scans of the chest, abdomen, and pelvis were performed every year, and in higher-risk patients every 6 months or earlier if clinically indicated. Endoscopic surveillance was performed yearly. RFS was defined as the time from diagnosis to the time of first documented appearance of the tumor after complete resection on the basis of clinical or radiologic examination.

The MSKCC nomogram (Fig. 1) was applied to our cohort of patients. Points were assigned according to tumor size, number of mitotic figures per 50 HPF, and site of tumor. This was done by drawing a line upward from the corresponding values to the “Points” line. The sum of these three points, plotted on the “Total points” line, corresponded to the nomogram predictions of 2-year and 5-year RFS. The performance of the nomogram was evaluated with the concordance index (C index) and calibration. The C index measured the discriminatory ability of the nomogram, with the interpretation of the C index similar to that of the area under the receiver operating characteristic (ROC) curve.24 The C index provides the probability that given any two randomly selected patients, the patient who experiences recurrence first has a higher nomogram-predicted probability of recurrence. If both patients experience recurrence at the same time, or if the patient with shorter follow-up does not experience recurrence, then the probability does not apply to those two patients.23 Second, calibration was assessed by comparing the nomogram-predicted probability of recurrence with the Kaplan–Meier observed RFS for quartiles of patients stratified by nomogram scores after surgical resection.
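The C index described above can be illustrated with a minimal Python sketch. It is not the SAS procedure used in this study; the function name `concordance_index`, the half-credit rule for tied predictions, and the four example patients are all hypothetical and serve only to show which patient pairs are usable and how concordance is counted.

```python
def concordance_index(times, events, risk):
    """Pairwise C index for right-censored data (illustrative sketch).

    times  : follow-up time for each patient
    events : 1 if recurrence was observed, 0 if censored
    risk   : predicted probability of recurrence (e.g. 1 - predicted RFS)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Let a be the patient with the shorter follow-up time.
            a, b = (i, j) if times[i] <= times[j] else (j, i)
            if times[a] == times[b]:
                continue  # tied times: pair is not usable (simplification)
            if not events[a]:
                continue  # shorter follow-up without recurrence: not usable
            usable += 1
            if risk[a] > risk[b]:
                concordant += 1.0   # earlier recurrence had the higher predicted risk
            elif risk[a] == risk[b]:
                concordant += 0.5   # assumption: tied predictions get half credit
    if usable == 0:
        raise ValueError("no usable patient pairs")
    return concordant / usable


# Hypothetical data: months of follow-up, recurrence flag, predicted risk.
example_times = [5, 10, 15, 20]
example_events = [1, 1, 0, 0]
example_risk = [0.9, 0.4, 0.5, 0.1]
example_c = concordance_index(example_times, example_events, example_risk)
print(f"C index = {example_c:.2f}")
```

In this toy example five pairs are usable and four of them are concordant, so the C index is 0.80. A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ordering.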

Fig. 1 MSKCC GIST prognostic nomogram to predict probability of 2- and 5-year RFS25

Statistical analysis was performed with SAS software, version 9.3 (SAS Institute, Cary, NC). Continuous variables are presented as mean (standard deviation) and median (minimum, maximum), and categorical variables are presented as frequency (percentage). The Wilcoxon signed rank test and the χ²/Fisher’s exact test were used to analyze continuous variables and categorical variables, respectively. Two-tailed p values were reported, and a p value of <0.05 was considered to be statistically significant. Survival curve pairwise comparisons within NIH criteria, AFIP criteria, and Joensuu criteria were performed by the log rank test.

Results

The baseline demographic and clinicopathologic characteristics of 289 patients with primary resected GISTs who did not receive adjuvant imatinib therapy are provided in Tables 2 and 3. The median follow-up duration was 61 (range 1–266) months. One hundred fifty-three patients (52.9 %) were male, and the median age of the entire cohort of patients was 61.0 (range 27.0–92.0) years. The median tumor size was 55 (range 3–300) mm, and 170 tumors (58.5 %) had a mitotic index of <5 per 50 HPF.

Table 2 Demographic information and clinicopathologic variables

Univariate analyses (Tables 2, 3) demonstrated that tumor size ≥5 cm (28.0 vs. 3.1 %, relative risk [RR] 8.944, 95 % confidence interval [CI] 3.304–24.214, p < 0.001), mitotic count ≥5 per 50 HPF (38.7 vs. 1.8 %, RR 21.905, 95 % CI 6.976–68.779, p < 0.001), and tumor rupture (58.8 vs. 14.3 %, RR 4.103, 95 % CI 2.507–6.714, p < 0.001) were significantly associated with increased 2- and 5-year recurrence rates. Nongastric location (18.7 vs. 15.9 %, RR 1.173, 95 % CI 0.699–1.968, p = 0.545) and positive resection margins (17.4 vs. 16.9 %, RR 1.028, 95 % CI 0.406–2.605, p = 0.954) were not associated with tumor recurrence.
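As a worked illustration of how a relative risk and its 95 % CI can be obtained from a 2 × 2 table, the sketch below uses the log-based (Katz) interval. The method is an assumption; the source does not state how its intervals were computed. The counts are also not reported directly in the text: they are back-calculated for tumor rupture from the reported percentages (5.9 % of 289 patients with rupture, 58.8 vs. 14.3 % recurrence, 49 recurrences overall) and should be treated as an inference.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with a log-based (Katz) confidence interval.

    Returns (rr, lower, upper). Assumes every count is greater than zero.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts inferred from the reported percentages (assumption, not from the tables):
# 10 of 17 ruptured tumors recurred versus 39 of 272 non-ruptured tumors.
rr, ci_lower, ci_upper = relative_risk(10, 17, 39, 272)
print(f"RR {rr:.3f}, 95 % CI {ci_lower:.3f}-{ci_upper:.3f}")
```

With these inferred counts the sketch gives a relative risk of about 4.10 and an interval that agrees with the reported 2.507–6.714 to within rounding, which suggests, but does not confirm, that a similar method was used.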

Table 3 Demographic information and clinicopathologic variables

The NIH, AFIP, and Joensuu criteria were useful in stratifying patients according to risk of recurrence (Tables 2, 3). Forty-nine patients (17.0 %) were found to have recurrent disease. None of these was classified to have very low-risk or low-risk tumors according to all three risk classification systems (NIH, AFIP, and Joensuu criteria). The majority of tumors with recurrence within 2 years were classified in the high-risk category according to all three risk classification systems.

Figure 2 shows the overall probability of RFS for the entire cohort of patients with GIST for the first 5 years after surgical resection. The 2-year RFS was 77.2 % (95 % CI 71.6–81.8), and the 5-year RFS was 67.9 % (95 % CI 61.7–73.4). Supplementary Fig. 1b–d demonstrates the RFS probability stratified according to the different risk groups within each classification system (NIH, AFIP, and Joensuu). Multiple comparisons between the different risk groups within each set of criteria were performed. Patients in the high-risk category according to the NIH criteria had significantly lower RFS probability compared to patients in the other risk categories (p < 0.001). Similarly, patients with high-risk tumors according to the AFIP and Joensuu criteria were significantly more likely to have tumor recurrence.
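RFS probabilities at fixed time points, such as the 2- and 5-year values above, are conventionally read from a Kaplan–Meier curve. The following is a minimal product-limit sketch in Python, not the SAS routine used for this analysis; the function names and the six follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, survival)] at each event time."""
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        recurrences = 0
        leaving = 0
        # Process every patient whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            recurrences += records[i][1]
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set without an event
    return curve


def survival_at(curve, t):
    """Read the step function: last estimate at or before time t."""
    estimate = 1.0
    for time, survival in curve:
        if time > t:
            break
        estimate = survival
    return estimate


# Hypothetical follow-up in months; 1 = recurrence observed, 0 = censored.
km_times = [6, 12, 18, 24, 30, 36]
km_events = [1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(km_times, km_events)
print(f"2-year RFS = {survival_at(km_curve, 24):.3f}")
```

Censored patients contribute to the number at risk until their last follow-up, which is why the estimate differs from a simple proportion of recurrence-free patients.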

Fig. 2 RFS of GIST during the first 5 years

In each of the three classification systems (NIH, AFIP, and Joensuu), patients classified in the very low-risk and low-risk categories did not differ significantly in terms of RFS probabilities. The AFIP criteria showed a significant difference in RFS probabilities between intermediate-risk patients and very low-risk patients, while the NIH and Joensuu criteria did not show any difference between these two categories of patients. This is likely because of the relatively small sample sizes of patients in the very low-risk category in the NIH (n = 21) and Joensuu (n = 22) criteria compared to the AFIP criteria (n = 80).

Figure 3 assesses the calibration of the MSKCC nomogram by plotting the observed RFS against the predicted RFS; a 45-degree line is obtained if the predictions are well calibrated. The 2-year nomogram scores of our study cohort were calculated on the basis of the MSKCC nomogram and were divided into four groups by the 25th, 50th, and 75th percentiles. The concordance probability of the nomogram was 0.71 (SE 0.02) for 2-year RFS and 0.71 (SE 0.19) for 5-year RFS. Therefore, 71 % of the time, the nomogram correctly ordered the outcomes of two randomly selected patients. The Kaplan–Meier plot for observed RFS among our cohort of GIST patients lay above the diagonal line, which indicated that the MSKCC nomogram tended to overestimate the probability of recurrence compared to the actual observed recurrence among our patients.
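The quartile-based calibration check can be sketched as follows. This is a simplified illustration under stated assumptions: the observed value in each group is a plain proportion of recurrence-free patients, which is only valid if every patient is followed for the full horizon, whereas the study used Kaplan–Meier estimates. The function name and the eight example patients are hypothetical.

```python
import bisect
import statistics


def calibration_by_quartile(predicted_rfs, recurrence_free):
    """Group patients by quartile of predicted RFS and compare with outcomes.

    Returns one (group size, mean predicted RFS, observed RFS) tuple per group.
    """
    # Three cut points: the 25th, 50th, and 75th percentiles.
    cuts = statistics.quantiles(predicted_rfs, n=4)
    groups = [[] for _ in range(4)]
    for p, ok in zip(predicted_rfs, recurrence_free):
        groups[bisect.bisect_left(cuts, p)].append((p, ok))
    rows = []
    for members in groups:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed = sum(1 for _, ok in members if ok) / len(members)
        rows.append((len(members), mean_predicted, observed))
    return rows


# Hypothetical predictions and outcomes (True = recurrence-free at the horizon).
cal_predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cal_outcomes = [False, False, False, True, True, True, True, True]
cal_rows = calibration_by_quartile(cal_predicted, cal_outcomes)
for size, mean_predicted, observed in cal_rows:
    print(f"n={size} predicted={mean_predicted:.2f} observed={observed:.2f}")
```

A group whose observed RFS exceeds its mean predicted RFS plots above the 45-degree line, which is the pattern interpreted in the text as the nomogram overestimating recurrence.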

Fig. 3 Calibration of MSKCC nomogram-predicted RFS after surgery during the first 5 years

ROC analysis was used to compare the prognostic accuracy of the three GIST risk classification systems (NIH, AFIP, and Joensuu classifications) and the MSKCC nomogram (Fig. 4; Table 4). For the MSKCC nomogram, both the 2- and 5-year predicted probabilities of RFS after surgery for GIST were calculated. The 2- and 5-year nomograms, with area under curve (AUC) = 0.87 (95 % CI 0.82–0.91) and AUC = 0.87 (95 % CI 0.83–0.92), respectively, provided a better estimation than the NIH (p < 0.001) and Joensuu (p < 0.001) criteria. The 2-year nomogram had a similar ROC curve to the 5-year nomogram, implying that both were similar in terms of predictive ability. There was no significant difference between the performance of the nomogram and that of the AFIP criteria (p = 0.142). The AUC for the AFIP criteria (0.85; 95 % CI 0.81–0.88) was also found to be significantly greater than that for the NIH criteria (AUC = 0.80; 95 % CI 0.76–0.84) (p = 0.001) and the Joensuu criteria (AUC = 0.77; 95 % CI 0.73–0.81) (p < 0.001).
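The AUC values compared above can be understood through the rank-based (Mann–Whitney) formulation: the AUC is the probability that a randomly chosen patient with recurrence receives a higher risk score than a randomly chosen patient without recurrence. The sketch below is a minimal illustration with hypothetical data; it does not reproduce the study's estimates, confidence intervals, or the significance tests used to compare curves.

```python
def roc_auc(scores, recurred):
    """AUC as the fraction of (recurrence, no recurrence) pairs ranked correctly.

    scores   : risk score per patient (ordinal category or continuous value)
    recurred : truthy if the patient had a recurrence
    """
    positives = [s for s, y in zip(scores, recurred) if y]
    negatives = [s for s, y in zip(scores, recurred) if not y]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties are common with ordinal risk categories
    return wins / (len(positives) * len(negatives))


# Hypothetical ordinal risk categories: 1 = very low, 2 = low,
# 3 = intermediate, 4 = high.
auc_scores = [4, 4, 3, 2, 1, 3, 1]
auc_recurred = [1, 1, 0, 0, 0, 1, 0]
example_auc = roc_auc(auc_scores, auc_recurred)
print(f"AUC = {example_auc:.3f}")
```

Because categorical systems such as the NIH, AFIP, and Joensuu criteria produce many tied scores, the half-credit for ties matters more for them than for a continuous nomogram score.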

Fig. 4 ROC curve analysis of risk of GIST recurrence during the first 5 years

Table 4 Comparison between receiver operating characteristic curves

In this study cohort, adjuvant imatinib was routinely available to patients at high risk of recurrence from 2009, after the results from the Intergroup randomized controlled trial demonstrated improved RFS for patients with adjuvant treatment.25 Before 2009, 223 patients underwent surgical resection, of whom 3 received adjuvant imatinib, whereas from 2009 to 2012, 90 patients underwent resection, of whom 21 received adjuvant treatment. Not surprisingly, the proportion of NIH/AFIP high-risk patients in the cohort resected before 2009 was greater than that in the cohort resected between 2009 and 2012 (47.7 %/40 % vs. 27.5 %/24.6 %). The C index of the MSKCC 2-year nomogram, MSKCC 5-year nomogram, NIH, AFIP, and Joensuu in the older cohort was 0.71, 0.71, 0.66, 0.71, and 0.66, respectively, which was similar to the overall findings of the present study. However, the C index of the MSKCC 2-year nomogram, MSKCC 5-year nomogram, NIH, AFIP, and Joensuu in the later cohort from 2009 to 2012 was 0.68, 0.70, 0.73, 0.73, and 0.73, respectively.

Discussion

Accurate prognostic stratification of resected primary localized GISTs is essential to enable clinicians to counsel patients appropriately and select patients more likely to benefit from adjuvant treatment. Risk stratification may also assist the clinician in determining the intensity of postoperative surveillance. The present study validated the MSKCC nomogram in our cohort of patients and demonstrated that it had a superior predictive accuracy compared to the NIH and Joensuu criteria.

In the present study, univariate analyses demonstrated that tumor size, mitotic index, and tumor rupture were significantly associated with tumor recurrence, whereas the association between tumor location and subsequent recurrence was not statistically significant. Interestingly, the two best classification systems to predict actual tumor recurrence in this study (the MSKCC nomogram and the AFIP criteria) used the same three parameters: size, site, and mitotic index of the tumor. With regard to the other two criteria, the NIH consensus criteria only utilized tumor size and mitotic count, and the Joensuu criteria, also known as the modified consensus criteria, took into account one additional variable: the presence of tumor rupture, which deemed a tumor to be high risk regardless of its size, site, or mitotic count. According to all three risk classification systems, a tumor >5 cm in size and >5 mitotic counts per 50 HPF would be regarded as high risk, and a tumor <2 cm in size and <5 mitotic counts per 50 HPF would be regarded as very low risk, irrespective of its location or the presence of tumor rupture.

It is difficult to determine the exact reason why the AFIP criteria and the MSKCC nomogram better predicted RFS than the other two risk classification systems when applied to our patient cohort. However, the most likely reason is that both systems took into account tumor location as an important criterion. The AFIP system drew a wider prognostic divergence between tumors located in the gastric region compared to nongastric tumors. For example, a small tumor (<2 cm in size) with a mitotic index between 5 and 10 per 50 HPF would be classified as intermediate risk by both the NIH and Joensuu criteria, regardless of its location. However, according to the AFIP criteria, if the tumor were located in the stomach, it would be classified as low risk, whereas if it were located outside the stomach, it would be classified as high risk. Similarly, the MSKCC nomogram assigns 0 points to tumors sited in the stomach, 5 points for those in the colon or rectum, and 40 points for small intestine GISTs. It is also important to note that although tumor rupture was found to be significantly associated with an increased risk of tumor recurrence in this study, the low proportion of ruptured tumors (5.9 %) limited its impact when predicting RFS.

When attempting to validate the MSKCC nomogram in this study cohort, according to which approximately half of the patients harbored high-risk tumors, we found that the observed RFS was slightly better than the nomogram-predicted RFS. In this study, the MSKCC nomogram was a better prognostic predictive tool compared to the NIH and Joensuu criteria. There was no statistically significant difference between the predictive ability of the nomogram and the AFIP criteria. These findings concurred with those reported by Gold et al. who found that the MSKCC nomogram was superior to the NIH criteria but not significantly different from the AFIP criteria.23 The utility of the Joensuu classification system was not analyzed in their study. Similarly, Tanimine et al. from Japan applied the MSKCC nomogram to a small cohort of 60 Asian patients and reported that the nomogram generally overestimated recurrence risk compared to the actuarial RFS.26 The authors hypothesized that their results could be confounded by the relatively small patient cohort, with a large proportion (>50 %) of very low-risk and low-risk tumors.26

The concordance probability of 0.71 obtained in our patient population was similar to that obtained in the three cohorts of patients that were used to construct and validate the MSKCC nomogram.23 The nomogram predictions of RFS were relatively well calibrated, although the data in the Kaplan–Meier plot above the diagonal line indicated that the MSKCC nomogram tended to overestimate the probability of recurrence. Subset analysis of our patients who underwent resection before and after the introduction of adjuvant imatinib revealed an important limitation of the MSKCC nomogram. The predictive ability of the nomogram was dependent on the proportion of high/low-risk tumors in a particular study cohort. The nomogram tended to overestimate the probability of recurrence, especially for low-risk tumors, and thus its performance tended to be poorer in study cohorts with a high proportion of low-risk tumors.

The provision of imatinib mesylate as an adjuvant treatment has been shown to prolong RFS and overall survival in patients after surgical resection of GISTs.25,27,28 However, there is no clear consensus regarding the selection of patients for imatinib postoperatively or the duration of adjuvant treatment after surgical resection of primary GIST.7,21,29 The nomogram provides RFS probabilities on a continuous scale ranging from 10 to 90 %, although it does not define a specific value at which a tumor should be considered high risk or when the provision of adjuvant imatinib is recommended.30 Therefore, future validation studies of the MSKCC nomogram ought to seek to determine a value, or a range of values, at which a tumor can be considered high risk.

Conclusions

The MSKCC nomogram and AFIP criteria had the best predictive accuracy for tumor recurrence compared to the NIH and Joensuu risk classification systems in our series of Asian patients. However, the nomogram slightly underestimated the probability of RFS after surgical resection of GISTs. Our study also suggests that there is a wider than expected prognostic divergence between gastric GISTs and GISTs arising from the small intestine.