Gastric cancer (GC) is the fourth most common malignancy and the second most common cause of cancer-related death worldwide.1 Tumor–node–metastasis (TNM) stage is the most important prognostic factor for GC and is critical for prognostic evaluation and subsequent treatment decision-making after surgery. The 7th edition of the AJCC TNM staging system for GC was officially put into use on 1 January 2010.2 Although many studies have confirmed that the 7th edition of the TNM classification has higher prognostic predictive ability than the previous TNM system,3,4,5,6 some limitations remain. For instance, N3a (7–15 metastatic lymph nodes) has the same staging as N3b (> 15 metastatic lymph nodes),6,7,8,9 and the number of examined lymph nodes (eLNs) significantly affects the adequacy of pN staging.9,10,11,12 Accordingly, the greatest change in the recently released 8th edition of the AJCC TNM staging system13 is that N3a and N3b are now treated separately in the final stage grouping.

Examination of 16 or more regional lymph nodes is recommended for N status determination,14,15 whereas the N status determined in patients with ≤ 15 eLNs may not be reliable. Therefore, the predictive ability of the new AJCC TNM classification requires further investigation in GC patients with ≤ 15 eLNs.

Using data from more than 12,000 patients from Eastern and Western countries, this study aimed (1) to compare the prognostic ability of the 7th and 8th editions of the AJCC TNM classification in different populations (patients with ≤ 15 or > 15 eLNs); (2) to establish a novel prognostic model based on independent prognostic factors to predict the prognosis of patients with ≤ 15 eLNs and to compare this model with the AJCC TNM staging system; and (3) to validate the results using external data.

Patients and Methods

Patients

Patient data were obtained from the Surveillance, Epidemiology, and End Results (SEER) website (http://seer.cancer.gov/), including patient demographic information, primary tumor and lymph node characteristics, and survival. The inclusion and exclusion criteria were defined as shown in Fig. 1.

Fig. 1

Flowchart for creation of the surveillance, epidemiology, and end results patient dataset

The external validation dataset comprised patients who underwent radical surgery for primary noncardia GC from 1995 to 2011 at Fujian Medical University Union Hospital (FMUUH). The exclusion criteria were the same as those described above. A total of 2355 patients were included. The median follow-up time for the validation dataset was 66.0 months (range 4–213).

The pTNM stage of all study patients was classified according to both the 7th and 8th editions of the AJCC guidelines.

Statistical Analysis

Survival curves were estimated using the Kaplan–Meier method, and differences were tested with the log-rank test. Prognostic relevance was investigated using multivariable Cox regression analyses. For purposes of illustration and clinical applicability, a nomogram was created based on the final regression model. Concordance indices (C-indices) were calculated to evaluate the discriminatory power of both staging systems.

To assess and compare the prognostic ability of the competing staging systems, we performed bootstrap-based goodness-of-fit comparisons based on the Bayesian information criterion (BIC)16 and time-dependent receiver operating characteristic (t-ROC) analyses, as described by Gertler et al.17 Bootstrapping involves repeatedly resampling cases, with replacement, from the same study population.18 BIC values were computed from multivariable Cox regression models; lower BIC values indicate a better model fit.19 To assess differences in BIC statistically, the distribution of BIC differences was obtained from 1000 bootstrap samples of the original dataset, and 95% confidence intervals (CIs) were computed.20 Two models differ significantly in predictive capability if this 95% CI does not include zero.

t-ROC analysis extends the ROC curve to assess the discriminatory power of a prognostic model for time-dependent disease outcomes.21 In addition to comparing the ROC curves visually, we calculated the area under the ROC curve (AUC).22,23 At each time point, the AUC estimates the probability that a patient who had died was classified in a higher stage category than a patient who remained alive. Sequential AUCs were compared between the 7th and 8th editions of the AJCC TNM staging system using independent and identically distributed representations of the AUC estimators. All tests were two-sided, and P values < 0.05 were considered statistically significant.
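Harrell's C-index, used above to summarize discrimination, can be computed directly from follow-up times, death indicators, and stage. The sketch below is a minimal pure-Python version, not the authors' implementation; it assumes stages are coded as ordered integers (a hypothetical encoding such as IA = 1 through IIIC = 7) so that a higher code means a higher predicted risk, and the function name and toy data are illustrative.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair of patients is comparable when the one with the shorter follow-up
    died at that time (event == 1). The pair is concordant when that patient
    also has the higher risk score (here: the higher stage code); ties in the
    risk score count as half-concordant. For simplicity, pairs with tied
    follow-up times are skipped.
    """
    if not (len(times) == len(events) == len(risk)):
        raise ValueError("times, events and risk must have the same length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # `a` is the patient with the shorter observed follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored before the other patient's time: order unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in months, 1 = died, stage coded 1 (IA) to 7 (IIIC).
months = [5, 10, 15, 20]
died = [1, 1, 0, 1]
stage = [7, 5, 3, 1]
print(harrell_c_index(months, died, stage))  # 1.0: every comparable pair is ordered correctly
```

Because a seven-group staging system leaves many comparable pairs in the same stage, the half credit for tied risk scores is what keeps an uninformative classification at 0.5 rather than 0.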
Statistical analyses were performed using SPSS version 18.0 (SPSS Inc., Chicago, IL, USA) and R version 3.1.2 (R Foundation for Statistical Computing, Vienna, Austria).

Results

Study Population

In total, 10,194 noncardia GC cases from the SEER database and 2355 GC cases (validation cohort) from the FMUUH database were included in the study. The clinical and pathological data of the patients are presented in Table 1.

Table 1 Patient demographics

Stage Migration

In SEER, 80.6% (n = 8217) of the patients were assessed as having the same stage in both editions of the TNM classification. However, 19.4% (n = 1977) of the patients had different stages between the editions, with 17.8% (n = 1817) of the patients being downstaged in the new edition (Supplementary Table 2). In the 7th edition, the 5-year survival rates for each stage were as follows: IA, 77.5%; IB, 63.3%; IIA, 50.8%; IIB, 35.1%; IIIA, 21.7%; IIIB, 17.5%; and IIIC, 8.4% (p < 0.001, Fig. 2a). In the 8th edition, the 5-year survival rates were as follows: IA, 77.5%; IB, 63.3%; IIA, 50.8%; IIB, 35.3%; IIIA, 20.5%; IIIB, 13.5%; and IIIC, 5.3% (p < 0.001, Fig. 2b).
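The migration percentages above follow from cross-tabulating each patient's 7th- and 8th-edition stage. A minimal sketch, assuming stages are given as the seven labels IA to IIIC reported here; the stage-order list and function name are illustrative, not taken from the study.

```python
from collections import Counter
from typing import Sequence

# Stage groups reported in the text, ordered from lowest to highest.
STAGE_ORDER = ["IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC"]
RANK = {stage: k for k, stage in enumerate(STAGE_ORDER)}


def migration_summary(stage7: Sequence[str], stage8: Sequence[str]) -> dict:
    """Percentage of patients whose stage is unchanged, lowered (downstaged) or
    raised (upstaged) when moving from the 7th to the 8th edition."""
    if len(stage7) != len(stage8) or not stage7:
        raise ValueError("need two non-empty stage lists of equal length")
    counts = Counter()
    for old, new in zip(stage7, stage8):
        shift = RANK[new] - RANK[old]
        counts["same" if shift == 0 else "down" if shift < 0 else "up"] += 1
    n = len(stage7)
    return {key: 100.0 * counts[key] / n for key in ("same", "down", "up")}


# Hypothetical toy cohort of four patients.
print(migration_summary(["IA", "IIB", "IIIC", "IIIA"], ["IA", "IIB", "IIIB", "IIIB"]))
# {'same': 50.0, 'down': 25.0, 'up': 25.0}
```

Applied to the SEER counts, 8217/10,194 ≈ 80.6% of patients kept their stage and 1817/10,194 ≈ 17.8% were downstaged, which leaves 1977 − 1817 = 160 patients (≈ 1.6%) upstaged.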

Fig. 2

a Kaplan–Meier survival curves using the 7th AJCC staging system. b Kaplan–Meier survival curves using the 8th AJCC/UICC staging system

AJCC N Categories

In the SEER patients, the 5-year overall survival (OS) rates for the five AJCC N categories (N0–N3b) were 62.8, 32.3, 22.0, 12.6, and 5.4%, respectively (p < 0.001). When the patients in the N0, N1, N2, and N3a categories were stratified into subgroups (≤ 15 vs. > 15 eLNs), the patients with > 15 eLNs had significantly better survival than those with ≤ 15 eLNs for all N categories (p < 0.0001, Supplementary Fig. 1).
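The subgroup comparisons above (≤ 15 vs. > 15 eLNs within an N category) are log-rank tests. A minimal two-group version in pure Python is sketched below, taking the P value from the chi-square distribution with 1 degree of freedom; the function name and example data are hypothetical, and in practice a statistics package would be used.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, two-sided P value)."""
    # Pool both groups as (time, event, group) with group 0 = a, 1 = b.
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected deaths in group a
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk overall
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group a
        d = sum(1 for u, e, _ in data if u == t and e)         # deaths at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("zero variance: the groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical example: group a dies early, group b late (all deaths observed).
print(logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```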

Comparison of 7th and 8th AJCC Staging Systems in Different Populations

In SEER patients with ≤ 15 eLNs, the 5-year survival rates of each stage were as follows: IA, 75.1%; IB, 59.3%; IIA, 42.7%; IIB, 32.4%; IIIA, 17.6%; IIIB, 9.0%; and IIIC, 2.4% (p < 0.001, Supplementary Fig. 2A). In patients with > 15 eLNs, the 5-year survival rates of each stage were as follows: IA, 85.0%; IB, 75.7%; IIA, 70.6%; IIB, 43.9%; IIIA, 29.6%; IIIB, 20.1%; and IIIC, 6.0% (p < 0.001, Supplementary Fig. 2B).
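The stage-specific 5-year survival rates above are Kaplan–Meier (product-limit) estimates. A minimal sketch, assuming follow-up is recorded in months so that 5 years corresponds to a horizon of 60; the function name and toy data are hypothetical.

```python
from typing import Sequence


def kaplan_meier_at(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving beyond `horizon`.

    At each distinct death time t <= horizon the running estimate is multiplied
    by (1 - deaths at t / patients still at risk at t). Censored patients leave
    the risk set after their last follow-up without causing a drop.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
    return survival


# Hypothetical stage group: follow-up in months, 1 = died, 0 = censored.
months = [12, 24, 36, 48, 70]
died = [1, 0, 1, 1, 0]
print(round(kaplan_meier_at(months, died, horizon=60), 4))  # 0.2667
```

The censored patient at 24 months does not lower the curve but shrinks the risk set, which is why the drop at 36 months is 1/3 rather than 1/4.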

In the SEER patients, the prognostic ability of the 8th edition (C-index 0.716; 95% CI 0.706–0.728) was not superior to that of the 7th edition (C-index 0.716; 95% CI 0.706–0.728). Subgroup analysis showed that the 8th edition had a higher C-index than the 7th edition in patients with > 15 eLNs [0.742 (0.728–0.756) vs. 0.735 (0.721–0.749)], whereas the two editions had similar predictive power in patients with ≤ 15 eLNs [0.713 (0.704–0.722) vs. 0.713 (0.704–0.722)] (Fig. 3a).

Fig. 3

Comparison of the prognostic performance of the 7th and 8th AJCC staging systems according to the number of examined lymph nodes in the SEER patient dataset: a concordance indices; b results from bootstrap analysis (1000 samples): mean differences in Bayesian information criteria (BIC) with 95% confidence limits, based on multivariable Cox regression analysis including the 7th and 8th editions of the AJCC TNM staging system, according to the number of examined lymph nodes. The 95% CIs computed for differences in the BIC indicate significantly different predictive ability of the two staging systems when they do not include zero. c Time-dependent ROC curves for the 7th and 8th AJCC TNM staging systems according to the number of examined lymph nodes in the SEER patient dataset. The horizontal axis represents years after surgery, and the vertical axis represents the estimated area under the ROC curve for survival at the time point of interest

We also compared BIC values using bootstrap analysis. As shown in Fig. 3b, there were no significant differences between the 7th and 8th edition staging systems for the total population (mean difference in BIC, 44.5; 95% CI −6.5 to 101.8) or for patients with ≤ 15 eLNs (mean difference in BIC, 15.9; 95% CI −20.5 to 56.7). However, in patients with > 15 eLNs, the 8th edition outperformed the 7th edition (mean difference in BIC, 87.2; 95% CI 48.1–124.9). These results are consistent with the C-index results.
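The bootstrap BIC comparison can be sketched as follows. To keep the example self-contained, it swaps the multivariable Cox model used in the paper for a simpler stand-in, an exponential survival model with one constant hazard per stage, whose maximized log-likelihood has a closed form. It also assumes the number of deaths as the sample size in the BIC penalty, one common convention for survival models. The function names are hypothetical. As in Fig. 3b, a positive mean difference BIC(a) − BIC(b) favors scheme b, and the difference is significant when the 95% interval excludes zero.

```python
import math
import random
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def stage_exponential_bic(times: Sequence[float], events: Sequence[int],
                          stages: Sequence[Hashable]) -> float:
    """BIC of an exponential model with one constant hazard per stage.

    Stand-in for a Cox model: the MLE hazard of stage g is d_g / T_g
    (deaths / total follow-up), so the maximized log-likelihood is
    sum_g d_g * (log(d_g / T_g) - 1). Follow-up times must be positive.
    """
    deaths = defaultdict(int)
    exposure = defaultdict(float)
    for t, e, g in zip(times, events, stages):
        deaths[g] += e
        exposure[g] += t
    n_events = sum(deaths.values())
    if n_events == 0:
        raise ValueError("BIC needs at least one death")
    loglik = sum(d * (math.log(d / exposure[g]) - 1.0)
                 for g, d in deaths.items() if d > 0)
    n_params = len(exposure)  # one hazard per stage
    return -2.0 * loglik + n_params * math.log(n_events)


def bootstrap_bic_difference(times, events, stage_a, stage_b,
                             n_boot: int = 1000, seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Mean and 95% percentile interval of BIC(a) - BIC(b) over bootstrap samples."""
    rng = random.Random(seed)
    n = len(times)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        t = [times[i] for i in idx]
        e = [events[i] for i in idx]
        diffs.append(stage_exponential_bic(t, e, [stage_a[i] for i in idx])
                     - stage_exponential_bic(t, e, [stage_b[i] for i in idx]))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return sum(diffs) / n_boot, (lower, upper)


# Hypothetical cohort: one scheme separates early and late deaths, the other does not.
months = [float(m) for m in range(1, 11)] + [10.0 * m for m in range(1, 11)]
died = [1] * 20
split = ["high"] * 10 + ["low"] * 10
single = ["all"] * 20
print(bootstrap_bic_difference(months, died, single, split, n_boot=200))
```

Resampling whole patients keeps each patient's time, event, and both stage assignments together, so the two schemes are always compared on the same bootstrap sample.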

To evaluate the prognostic ability of the staging systems more intuitively, t-ROC curves were constructed. As shown in Fig. 3c, the 8th edition showed discriminative ability similar to that of the 7th edition in patients with ≤ 15 eLNs, whereas in patients with > 15 eLNs the t-ROC curve of the 8th edition lay consistently above that of the 7th edition throughout follow-up after surgery.
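The time-dependent AUC defined in the Methods compares, at a given time point, patients who have died with patients still alive. The minimal sketch below implements that definition directly but, as a stated simplification, drops patients censored before the time point; dedicated estimators, such as those in the R package timeROC, correct for censoring with inverse-probability-of-censoring weights. The function name and data are hypothetical.

```python
from typing import Sequence


def naive_time_dependent_auc(times: Sequence[float], events: Sequence[int],
                             stage: Sequence[float], horizon: float) -> float:
    """Cumulative/dynamic AUC at `horizon` without censoring weights.

    Cases: patients who died at or before `horizon`.
    Controls: patients still under follow-up after `horizon`.
    Patients censored before `horizon` are dropped (the naive simplification).
    Returns P(stage of a case > stage of a control), counting ties as 1/2.
    """
    cases = [s for t, e, s in zip(times, events, stage) if e and t <= horizon]
    controls = [s for t, s in zip(times, stage) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    score = 0.0
    for case_stage in cases:
        for control_stage in controls:
            if case_stage > control_stage:
                score += 1.0
            elif case_stage == control_stage:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical data: follow-up in years, 1 = died, stage coded 1 (IA) to 7 (IIIC).
years = [0.5, 1.5, 2.0, 4.0, 6.0, 7.0]
died = [1, 1, 0, 1, 0, 0]
stage = [7, 6, 2, 3, 1, 3]
for horizon in (1, 3, 5):
    print(horizon, naive_time_dependent_auc(years, died, stage, horizon))
```

Evaluating the AUC over a grid of horizons gives the curves plotted against years after surgery in Fig. 3c.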

Prognostic Factors of SEER Patients with ≤ 15 eLNs

To predict the prognosis of patients with ≤ 15 eLNs more accurately, we used independent prognostic factors, including age, ethnicity, histology, tumor size, tumor site, T category, lymph node ratio (LNR, defined as the number of positive lymph nodes divided by the number of eLNs),24 and surgery type, to establish a new prognostic model (Supplementary Table 1). All of these parameters were included in a nomogram (Fig. 4a). The outcomes were reported as 1-, 3-, and 5-year cause-specific survival probabilities.
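The LNR defined above can differ sharply between patients who share the same pN category, which is why it may help when few nodes are examined. A minimal sketch; the pN cut-offs follow the 7th/8th-edition AJCC definitions (N0: 0, N1: 1–2, N2: 3–6, N3a: 7–15, N3b: ≥ 16 positive nodes), while any LNR cut-offs used in the study's model are not reproduced here. The function names are illustrative.

```python
def pn_category(positive: int) -> str:
    """AJCC 7th/8th edition pN category from the number of positive regional nodes."""
    if positive < 0:
        raise ValueError("number of positive nodes cannot be negative")
    if positive == 0:
        return "N0"
    if positive <= 2:
        return "N1"
    if positive <= 6:
        return "N2"
    if positive <= 15:
        return "N3a"
    return "N3b"


def lymph_node_ratio(positive: int, examined: int) -> float:
    """LNR = positive lymph nodes / examined lymph nodes (eLNs)."""
    if examined <= 0:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive <= examined:
        raise ValueError("positive nodes must lie between 0 and the number examined")
    return positive / examined


# Two hypothetical patients, both pN2 with 4 positive nodes:
for examined in (5, 30):
    print(pn_category(4), examined, round(lymph_node_ratio(4, examined), 3))
# N2 5 0.8
# N2 30 0.133
```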

Fig. 4

a Gastric cancer nomogram of overall survival for SEER patients in whom ≤ 15 lymph nodes were examined. For an individual patient’s age, a straight line is drawn from the age scale up to the point scale (e.g., an age of 60 years corresponds to ~ 10 points). This process is repeated for each of the remaining parameters by drawing a straight line each time to the point axis. The points received from each prognostic variable are summed, and this number is located on the total points axis. A straight line is drawn directly down from the total points to the estimated 1-year, 2-year, or 5-year cause-specific survival probabilities to ascertain a patient’s specific risk. For example, a total score of 30 points yields estimated survival probabilities of ~ 0.55 at 2 years and ~ 0.40 at 5 years. b Comparison of the three prognostic classification systems for SEER patients in whom ≤ 15 lymph nodes were examined: results from bootstrap analysis (1000 samples); mean differences in BIC are presented with 95% confidence limits
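The point-summing procedure in the caption reflects the Cox model underneath the nomogram: each variable's points are a linear rescaling of its contribution to the linear predictor, and survival follows S(t) = S0(t)^exp(lp). The sketch below assumes that standard construction, with the predictor of widest effect range spanning 0 to 100 points; all coefficients, ranges, and baseline survival values are hypothetical placeholders, not the fitted SEER model.

```python
import math

# Hypothetical Cox coefficients (log hazard per unit) and predictor ranges;
# placeholders only, NOT the coefficients of the published nomogram.
COEFFICIENTS = {"age_decades": 0.25, "lnr": 1.60, "t_category": 0.45}
RANGES = {"age_decades": (2.0, 9.0), "lnr": (0.0, 1.0), "t_category": (1.0, 4.0)}
# Hypothetical baseline survival S0(t) for a patient at the lowest-risk end of every range.
BASELINE_SURVIVAL = {1: 0.95, 5: 0.80}

# Assumed scaling convention: the predictor with the widest effect range spans 0-100 points.
MAX_EFFECT = max(abs(b) * (RANGES[k][1] - RANGES[k][0]) for k, b in COEFFICIENTS.items())
POINTS_PER_LP_UNIT = 100.0 / MAX_EFFECT


def variable_points(name: str, value: float) -> float:
    """Points for one predictor: 0 at the lowest-risk end of its range."""
    beta = COEFFICIENTS[name]
    lo, hi = RANGES[name]
    return POINTS_PER_LP_UNIT * (beta * value - min(beta * lo, beta * hi))


def survival_probability(patient: dict, year: int) -> float:
    """Sum the points, convert back to the linear predictor, apply S0(t) ** exp(lp)."""
    total_points = sum(variable_points(name, value) for name, value in patient.items())
    relative_lp = total_points / POINTS_PER_LP_UNIT
    return BASELINE_SURVIVAL[year] ** math.exp(relative_lp)


patient = {"age_decades": 6.0, "lnr": 0.4, "t_category": 3.0}
print({name: round(variable_points(name, v), 1) for name, v in patient.items()})
print(round(survival_probability(patient, 5), 3))
```

Because points are only a rescaled linear predictor, reading survival off the total points axis is equivalent to evaluating the Cox model directly.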

Comparison of the Three Prognostic Classification Systems for SEER Patients with ≤ 15 eLNs

The nomogram’s C-index was 0.735 (95% CI 0.726–0.745), higher than those of the 8th (0.713; 95% CI 0.704–0.722) and 7th (0.713; 95% CI 0.704–0.722) editions of the TNM classification in patients with ≤ 15 eLNs. BIC analysis showed that the prognostic performance of the nomogram was significantly better than that of the 8th (mean difference in BIC, 590.7; 95% CI 464.4–693.1) and 7th editions (mean difference in BIC, 2701.5; 95% CI 2508.4–2871.2) of the TNM classification (Fig. 4b).

External Validation

External validation was performed in 2355 patients with noncardia GC from the FMUUH dataset. The C-index, BIC, and t-ROC results were consistent with those obtained in the SEER database for patients with both > 15 and ≤ 15 eLNs. However, because patients with ≤ 15 eLNs made up only a small proportion of the FMUUH cohort, the 8th edition showed significantly greater prognostic ability than the 7th edition in the cohort as a whole [C-index 0.733 (95% CI 0.714–0.752) vs. 0.722 (0.702–0.741); mean difference in BIC, 37.0; 95% CI 8.4–68.4] (Supplementary Fig. 3A–C).

Additionally, in the FMUUH cohort the nomogram’s C-index was 0.715 (0.652–0.777), higher than those of the 7th (0.692) and 8th (0.690) editions. BIC analysis further showed that the nomogram had significantly better predictive power than the 7th (mean difference in BIC, 15.7; 95% CI 11.0–46.4) and 8th editions (mean difference in BIC, 13.1; 95% CI 10.5–45.2) (Supplementary Fig. 3D).

Discussion

TNM staging is the most important tool for guiding cancer treatment and evaluating patient outcomes.25 The greatest change in the recently released 8th edition of the AJCC TNM staging system for GC is the separate inclusion of N3a and N3b in the stage grouping;6,7,8,9 however, whether the 8th edition significantly improves the prognostic ability of the staging system has not been reported.

The predictive performance of the 7th edition of the TNM staging system for GC remains controversial. A study by Wang showed that the predictive ability of the 7th edition was significantly better than that of the 6th edition in a Chinese GC population.3 However, Reim found that the 7th edition was not better than the 6th edition in a Western population of GC patients, despite its increased complexity.26 Using the published analysis method,26 we found that the 8th edition had predictive ability similar to that of the 7th edition. In total, 19.4% of the patients changed stage between the 7th and 8th editions. Interestingly, the vast majority of these patients were downstaged, possibly because the median number of eLNs in the SEER dataset was 11, below the minimum of 15 nodes recommended for TNM staging.27,28

An adequate number of eLNs is important: examination of 16 or more regional lymph nodes helps predict prognosis in patients with GC.29,30 Many patients with fewer than 10 eLNs who are classified as node negative are likely not truly node negative but understaged,30,31 and N1 disease can be upstaged to N2 or even N3 disease as more lymph nodes are harvested.31,32 In this study, OS was significantly better in patients with > 15 eLNs than in those with ≤ 15 eLNs (Supplementary Fig. 1), suggesting that in patients with ≤ 15 eLNs the lymphadenectomy was less extensive or the pathological examination less thorough, so that stage migration led to understaging and underestimation of disease severity. We therefore compared the predictive performance of the 7th and 8th editions in patients with > 15 and ≤ 15 eLNs and then developed a nomogram to predict prognosis in patients with poor operative quality, in whom the TNM system may not be reliably applied.

In patients with > 15 eLNs, the 5-year survival rates differed significantly between stages under the 8th edition. Compared with the 7th edition, the 8th edition yielded a higher C-index, and BIC analysis indicated that it performed significantly better. These findings suggest that the new staging system has better monotonicity, distinctiveness, and homogeneity. Although the BIC describes predictive capability (lower values indicate better fit), raw BIC values and BIC differences cannot be converted to a clinically meaningful scale.17 We therefore also used t-ROC analysis to compare the discriminative ability of the staging systems; the t-ROC curve of the 8th edition was consistently above that of the 7th edition throughout follow-up after surgery. The advantage of this method is that it extends ROC analysis, a popular way of determining sensitivity and specificity, to censored survival data. Furthermore, the significance of individual prognostic factors can be assessed visually.21

However, the predictive ability of the 8th edition was not improved in patients with ≤ 15 eLNs, which might be related to stage migration33 caused by an insufficient number of eLNs. Therefore, the TNM classification may not be suitable for GC patients in whom ≤ 15 lymph nodes are examined. Many previous studies have shown that the tumor–node ratio–metastasis (TNrM) staging system provides better predictive power than the pTNM classification for patients with insufficient lymph node dissection.11,33 Additionally, prognosis is predicted more accurately when postoperative pathological staging is combined with other prognostic factors rather than used alone.34,35 We therefore combined the TNrM staging system with independent prognostic factors to establish a prognostic model for patients with ≤ 15 eLNs. Compared with the 7th and 8th edition staging systems, the prognostic model yielded a higher C-index, and BIC analysis showed better prognostic performance. The model also showed strong predictive ability in patients from the Eastern FMUUH database.

This study has some limitations. First, it is a retrospective study. Second, some information, such as the extent of lymph node dissection and adjuvant treatment, is not available from the SEER database; we were therefore unable to analyze the effects of adjuvant therapy, such as chemotherapy or radiation, on prognosis. Third, the study excluded patients who received radiotherapy. However, radiotherapy is an important treatment for patients in the USA, so further research is needed in this group. Finally, the number of patients with ≤ 15 eLNs in our external validation cohort was small. Nevertheless, to our knowledge, this is the first study to compare the predictive ability of the 7th and 8th editions of the AJCC TNM staging system in a large Western population, and the results were confirmed with external data from an Eastern population. Furthermore, we presented a novel prognostic model for patients with ≤ 15 eLNs and showed that it had high predictive ability, although it needs to be validated in prospective studies.

Conclusions

The 8th edition TNM classification shows better prognostic ability than the 7th edition for noncardia GC patients with > 15 eLNs, but no improvement was found for patients with ≤ 15 eLNs. Further progress may be achieved by combining the TNM classification with additional independent clinical prognostic factors, and more precise prognostic predictions may lead to better individualized treatment recommendations.