Introduction

Colon cancer is the fourth most commonly diagnosed cancer and the fifth leading cause of cancer-related death worldwide [1]. In the USA, approximately 38% of colon cancer patients diagnosed between 2006 and 2012 had localized (stage I and II) colon cancer, with a 5-year survival rate of about 90%, which is better than that of patients with regional (stage III) colon cancer [2]. The standard treatment for non-metastatic colon cancer is radical surgery with or without adjuvant chemotherapy. Adjuvant chemotherapy is recommended for high-risk patients with stage II colon cancer [3, 4]. Great efforts have been made to identify high-risk patients, but there is still no standardized method. Adjuvant chemotherapy can improve the survival of patients with stage III colon cancer, but whether it also benefits stage II colon cancer patients remains contentious [5].

The TNM staging classification plays an important role in stratifying patients' prognosis, which in turn guides treatment strategy. Although the prognosis of stage II colon cancer patients is generally better than that of stage III patients, there is still a high level of variability [6,7,8]. Recently, several studies have reported that a higher number of examined lymph nodes (ELN) predicts a better prognosis in different cancers [9,10,11]. ELN has also been linked to the prognosis of colorectal cancer patients [12, 13]. Precise staging is key to delivering appropriate adjuvant therapy for colorectal cancer, and the number of positive regional lymph nodes is a critical factor in that staging. However, an insufficient number of ELN may lead to inaccurate assessment of lymph node involvement [9, 10]. Identifying potential stage III disease among patients staged as stage II is therefore of great clinical significance.

According to the TNM staging method, only tumor invasion depth (pT), and not regional lymph nodes, is taken into consideration for prognosis stratification of stage II colon cancer. Although the relationship between ELN number and colorectal cancer prognosis has been reported, the impact of ELN on stage migration in stage II colon cancer has rarely been studied [12,13,14,15,16]. To address this issue, we combined the ELN number and pT in the present study to generate a novel scoring system for reclassification of stage II colon cancer.

Methods

Data source

After access was granted to the Surveillance, Epidemiology, and End Results (SEER) Program database (https://seer.cancer.gov/), the SEER 1975–2016 Research Plus Additional Custom Treatment Data released in April 2019 were downloaded for analysis. The release comprises two files, titled “YR1975_2016.SEER9” (dataset 1) and “YR1992_2016.SJ_LX_RG_AK” (dataset 2). Dataset 1 contains the SEER November 2018 Research Data files from nine SEER registries for 1975–2016, and dataset 2 contains those from another four registries.

Patients included and cohort selection

The raw data were processed and saved using Excel. There were 147 variables in total, for most of which data were unavailable. In the present study, data from 17 variables were extracted: marital status, sex, age at diagnosis, year of diagnosis, primary site, grade, ELN, tumor size, site-specific factor 1 (CEA), derived AJCC T and N, reason for no surgery, race recode, vital status record, SEER cause-specific death classification, survival months, total number of tumors per patient, and chemotherapy. The target population consisted of patients who were diagnosed with stage II or III colon cancer between 2004 and 2010 according to the 6th edition of the AJCC cancer staging manual, received radical surgery, and survived more than 1 month. Patients with unavailable data for ELN, histology grade, or tumor size were excluded (Fig. 1). The number of ELN ranged from 0 to 90 and was divided into 3 groups (0–10, 11–20, and more than 20) so that it could serve as a categorical variable. Similarly, some continuous variables, including tumor size and age, were also regrouped as shown in the results.
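The selection rules described above can be sketched in code. This is a minimal illustration only, not the authors' actual workflow (which used Excel): the `Record` class, its field names, and the encodings are hypothetical, and "survived more than 1 month" is assumed here to mean a recorded survival of more than 1 month.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # All field names and encodings are hypothetical, chosen for illustration.
    stage: str                      # "II" or "III" (AJCC 6th edition)
    year_of_diagnosis: int
    radical_surgery: bool
    survival_months: int
    eln: Optional[int]              # None when unavailable
    grade: Optional[str]            # None when unavailable
    tumor_size_mm: Optional[float]  # None when unavailable


def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    if r.stage not in ("II", "III"):
        return False
    if not (2004 <= r.year_of_diagnosis <= 2010):
        return False
    if not r.radical_surgery:
        return False
    if r.survival_months <= 1:      # assumption: strictly more than 1 month
        return False
    # Exclude records with unavailable ELN, histology grade, or tumor size.
    return None not in (r.eln, r.grade, r.tumor_size_mm)


def eln_group(eln: int) -> str:
    """Map an ELN count to the three groups used in the study."""
    if eln < 0:
        raise ValueError("ELN count cannot be negative")
    if eln <= 10:
        return "0-10"
    if eln <= 20:
        return "11-20"
    return ">20"
```

For example, a stage II patient diagnosed in 2007 with 14 examined nodes and complete data would pass `is_eligible` and fall into the "11-20" group.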

Fig. 1 Flowchart for data selection

Three cohorts were selected from datasets 1 and 2 for different purposes (Fig. 1). Cohort 1 from dataset 1 consisted of stage II colon cancer patients and was used to generate a novel prognostic scoring system that can be used for recognizing high-risk patients. Cohorts 2 and 3 were both selected from dataset 2. Cohort 2 consisted of stage II colon cancer patients while cohort 3 consisted of stage III colon cancer patients. Cohort 2 was used to validate the efficiency of the prognostic scoring system externally. A comparison between the novel prognostic scoring system in cohort 2 and TNM staging classification in cohort 3 was performed.

Novel prognostic scoring system

For stage II colon cancer, the number of ELN was taken into consideration together with tumor invasion depth to regroup patients. We defined the tumor invasion score as 0 or 1 when the tumor invasion depth was T3 or T4, respectively. Correspondingly, an ELN score of 0, 1, or 2 indicated that the number of ELN was more than 20, ranged from 11 to 20, or ranged from 0 to 10, respectively. All patients were reclassified into four groups based on the combined score, which was calculated as the sum of the two scores (Table S1). Group N includes all patients with a combined score of N (N = 0, 1, 2, 3).
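The scoring rule above is simple enough to express in a few lines. The sketch below follows the cut-offs stated in the text; the function names and the string encoding of the T category ("T3", "T4") are assumptions made for illustration.

```python
# Tumor invasion score as defined in the text: T3 -> 0, T4 -> 1.
T_SCORE = {"T3": 0, "T4": 1}


def eln_score(eln: int) -> int:
    """ELN score: more than 20 nodes -> 0, 11-20 nodes -> 1, 0-10 nodes -> 2."""
    if eln < 0:
        raise ValueError("ELN count cannot be negative")
    if eln > 20:
        return 0
    if eln >= 11:
        return 1
    return 2


def combined_score(pt: str, eln: int) -> int:
    """Combined score (0-3) = tumor invasion score + ELN score."""
    if pt not in T_SCORE:
        raise ValueError("stage II colon cancer is expected to be T3 or T4")
    return T_SCORE[pt] + eln_score(eln)
```

Under this rule, a T3 tumor with 25 examined nodes scores 0 (group 0), while a T4 tumor with 8 examined nodes scores 3 (group 3). Note that a score of 1 or 2 can arise from more than one combination, for example T4 with more than 20 nodes and T3 with 11 to 20 nodes both give a score of 1.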

Statistical analyses

Differences between groups were analyzed using the χ2 test. Univariate analysis was performed to evaluate the relationship between patients’ survival and variables including marital status, sex, age, and tumor location. Multivariate Cox regression analysis was performed to identify potential independent prognostic factors among the variables with a P value < 0.05 in univariate analysis. Survival curves were plotted by the Kaplan-Meier method. Hazard ratios and 95% confidence intervals among subgroups were calculated by the log-rank test. The χ2 test and the univariate and multivariate analyses were performed using the Statistical Package for Social Sciences (SPSS) version 25. Survival differences between groups were compared using GraphPad Prism version 8. A two-sided P < 0.05 was considered statistically significant.
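For readers unfamiliar with the Kaplan-Meier method mentioned above, the following is a minimal pure-Python sketch of the product-limit estimator. It is not the software used in this study (the analyses were run in SPSS and GraphPad Prism); the function name and input layout are illustrative assumptions. An event flag of 1 marks the event of interest (for CSS, a cancer-specific death; for OS, death from any cause) and 0 marks a censored observation.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  : follow-up time for each patient (e.g. survival months)
    events : 1 if the event of interest occurred at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve
```

With five patients followed for 1, 2, 3, 4, and 5 months and event flags 1, 0, 1, 1, 0, the estimated survival drops to 0.8 at month 1, to about 0.533 at month 3, and to about 0.267 at month 4; the censored observations at months 2 and 5 only shrink the risk set.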

Results

Patients’ demographics and clinical characteristics

Three cohorts, delineated on a workflow chart (Fig. 1), were selected from the SEER database. After quality control analysis and filtering of data, patients were grouped into different cohorts. Cohort 1 from dataset 1 included a total of 13,960 patients with stage II colon cancer. Cohorts 2 and 3, both of which were selected from dataset 2, included 5312 stage II and 4713 stage III colon cancer patients, respectively.

Demographics and clinical characteristics among the cohorts are listed in Table S2. All three cohorts had similar distributions of marital status, gender, age, and tumor location. Nearly half of the population analyzed consisted of patients diagnosed between ages 61 and 79. Patients with ascending colon cancer accounted for around fifty percent of patients across cohorts (Table S2). Cohorts 1 and 2, both of which only included stage II colon cancer patients, had similar distributions of histology grade, tumor size, CEA level, tumor counts, and chemotherapy (Table S2).

Establishment of a novel prognostic scoring system

Cohort 1 was divided into 3 subgroups according to the number of examined regional lymph nodes. The differences in clinical characteristics among the three subgroups are listed in Table 1. No differences were observed in marital status, gender, or CEA. For the remaining features, differences were statistically significant (P < 0.05) but small (less than 5%), except for age, tumor location, and tumor size. Compared to group 0–10, group > 20 had a higher percentage of patients diagnosed at age ≤ 60 years and a lower percentage of patients diagnosed at age ≥ 80. The proportion of patients with smaller tumor size (< 5 cm) was highest in group 0–10 and second highest in group 11–20 (Table 1).

Table 1 Demographics and clinical characteristics of cohort 1

Univariate and multivariate analyses were performed to evaluate the relationship between clinical characteristics and patients’ survival. By univariate analysis, marital status, age at diagnosis, tumor location, tumor histological grade, CEA level, T invasion, race, tumor counts, and ELN were found to be associated with both cause-specific survival (CSS) and overall survival (OS) (P < 0.05). Although tumor size was significantly associated with CSS (P < 0.001), it had no association with OS (P = 0.102). Interestingly, patients who received chemotherapy had better OS, but not CSS, than those who did not receive chemotherapy (P < 0.001) (Table S3). Factors with P < 0.05 in univariate analysis were then further examined via multivariate analysis to identify independent prognostic factors. Results from the multivariate analysis are shown in Fig. 2. We found that independent prognostic factors for both CSS and OS included age at diagnosis, T invasion, number of ELN, and CEA level. Intriguingly, when compared to patients with only one primary tumor, those who had more than 2 tumors showed better CSS (P < 0.001; HR: 0.439, 95% CI: 0.390–0.495) but inferior OS (P < 0.001; HR: 1.583, 95% CI: 1.509–1.660).

Fig. 2 Multivariate survival analysis in cohort 1. Hazard ratios and 95% CIs are shown as forest plots. Married status, younger age at diagnosis, negative CEA, T3 stage, and a higher number of ELN were favorable prognostic factors for both CSS and OS in cohort 1

The survival curves for T invasion and ELN are illustrated in Fig. 3. Patients with a smaller number of ELN had both inferior CSS and OS (Fig. 3a, b). Patients with T4 tumors had both much worse CSS and OS when compared to T3 tumors (Fig. 3c, d). Furthermore, age at diagnosis was found to be an independent prognostic predictor and was distributed differently among the three groups. Survival curves were replotted after age was taken as a stratification factor. A smaller number of ELN predicted worse survival in subgroups of patients with different age ranges (Fig. S1).

Fig. 3 Survival curves grouped by different factors in cohort 1. a, c, e Patients with smaller ELN numbers, pT4, and higher scores had worse cause-specific survival. b, d, f Patients with smaller ELN numbers, pT4, and higher scores had worse overall survival

A novel prognostic scoring system was established, as described in the methods section, by combining T invasion and ELN to identify high-risk patients among stage II colon cancer patients, whose tumors are considered localized and who have relatively better survival. As expected, patients with higher scores had worse survival (Fig. 3e, f).

External validation of the novel prognostic scoring system

To validate the performance of the novel prognostic scoring system derived from cohort 1, cohort 2 was divided into four groups according to the new system. Univariate and multivariate analyses were then performed. Differences in the distribution of clinical characteristics among the four groups are listed in Table 2. No difference was found for marital status or gender. Compared to the other groups, group 3 had a higher percentage of older patients, patients with positive CEA, and patients receiving chemotherapy (P < 0.001). (More details can be found in Table 2.)

Table 2 Demographics and clinical characteristics of cohort 2

By univariate analysis, we found that T invasion, the number of ELN, and the combined score were associated with both CSS and OS (P < 0.001). Patients who had more than one primary tumor showed better CSS and worse OS when compared to those with only one tumor, which coincided with the results from cohort 1. Additionally, chemotherapy improved patients’ OS but not CSS, which was also the case in cohort 1 (Table S4). As in cohort 1, factors with P < 0.05 in univariate analysis were included in the multivariate Cox model for further analysis, with the exception of T invasion and ELN, which were used to generate the combined score. The combined score, patients’ age, and CEA were all independent prognostic predictors in cohort 2 (P < 0.001) (Fig. 4). Patients with higher scores had worse CSS and OS. Compared to group 0, the HR (95% CI) values for CSS, as estimated by the log-rank test, were 1.223 (0.928–1.612), 1.855 (1.390–2.474), and 4.116 (2.797–6.056) for groups 1, 2, and 3, respectively. The HR (95% CI) values for OS were 1.495 (1.274–1.754) and 2.036 (1.580–2.623) for groups 2 and 3, respectively (Fig. 4). Thus, according to the novel prognostic scoring system, patients could be effectively classified into different risk levels.

Fig. 4 Multivariate survival analysis in cohort 2. Hazard ratios and 95% CIs are shown as forest plots. Older age at diagnosis, positive CEA, and a higher score were unfavorable prognostic factors for both CSS and OS in cohort 2

Clinical value of the novel prognostic scoring system

Stage III colon cancer is a regionally advanced tumor accompanied by regional lymph node metastasis and has a worse survival outcome than stage II colon cancer. Cohort 3 was divided into five subgroups according to T and N, which were determined following the 6th edition AJCC guidelines. Survival curves for the subgroups are illustrated in Fig. 5a–d. The cumulative survival proportions at 3, 5, and 10 years after diagnosis are shown in Fig. 5e, f. Group 3 displayed the worst CSS and OS in cohort 2 (Fig. 5a, c), with CSS and OS at 3, 5, and 10 years comparable to those of the T3N2 or T4N1 groups (Fig. 5e, f). The hazard ratios for CSS (Table 3) and OS (Table 4) between pairs of groups were calculated and compared by the log-rank test. Groups defined by the novel scoring system that had similar or worse survival than the compared TNM staging groups are italicized. Because right hemicolectomy usually yields a higher number of lymph nodes and right colon cancer accounts for the majority of both cohorts, we performed a subgroup analysis, and similar results were found in the subgroup of right colon cancer (Tables S5 and S6). Based on our results, the novel prognostic scoring system could effectively identify high-risk patients with stage II colon cancer, who had even worse survival than some patients with stage III colon cancer. This, in turn, could be applied in the clinical setting to help improve the prognosis of high-risk patients.

Fig. 5 Comparison of survival curves between reclassified stage II patients using the novel system and stage III patients. a, c Survival outcome of patients regrouped using the novel prognostic scoring system in cohort 2. b, d Survival outcome of subgroups in stage III patients of cohort 3. e, f Comparison of long-term survival of CSS and OS among subgroups of cohorts 2 and 3

Table 3 Hazard ratio calculated by comparison of CSS curves between groups by log-rank test
Table 4 Hazard ratio calculated by comparison of OS curves between groups by log-rank test

Discussion

According to the AJCC cancer staging manual, primary tumor invasion (pT), involved regional lymph nodes (pN), and distant metastasis (pM) should be taken into consideration for colon cancer staging, and stage II colon cancer comprises only T3–4 tumors without regional lymph node involvement. Treatment differs substantially between stage II and stage III colon cancer, so precise assessment of stage is very important for the delivery of adjuvant therapy to colon cancer patients. The pN stage in colon cancer depends on the regional lymph nodes examined. The AJCC guidelines recommend examining no fewer than 12 lymph nodes for adequate pN staging. Inadequate ELN numbers may result in tumor understaging and lead to worse survival [12, 13, 16,17,18]. However, the optimal number of ELN has not been determined in colon cancer [19,20,21,22,23,24]. Here, we focused on the impact of ELN number on stage II colon cancer and investigated the stage migration effect caused by an insufficient ELN number.

Several factors influence the number of ELN. The choice of operation and the surgeon's skill influence how completely regional lymph nodes are resected, while variations in pathology practice and skill lead to differences in the number of regional nodes retrieved [20, 25, 26]. In our present study, some patient characteristics were associated with the number of ELN. Younger patients and those with bigger tumors were more likely to have a higher lymph node yield. Previous studies have evaluated the impact of ELN number and metastatic lymph node ratio (mLNR, calculated as the number of positive lymph nodes divided by the ELN number) on the prognosis of cancer patients and confirmed that a higher ELN number is associated with better survival across various solid tumors, including colon cancer [13, 20, 22, 27,28,29]. In our study, ELN number, pT, and patients’ age were determined to be independent prognostic predictors in stage II colon cancer. Patients with 0–10 ELN exhibited worse CSS and OS than those with a higher ELN number. Similar results were observed among subgroups with different age ranges.

Since the method for building a nomogram for cancer prognosis was introduced by Iasonos et al. in 2008 [30], studies establishing prognostic nomograms for different cancers have expanded [31,32,33,34,35]. Although useful for stratifying patients, nomograms, especially those consisting of many factors, have limitations in clinical practice. Our novel prognostic scoring system, which combines pT and ELN, is easy and convenient to use in clinical practice. Stage II colon cancer patients can be reclassified simply by taking the ELN number into account. As discussed above, patients with the highest score had the worst CSS and OS in both the training and validation cohorts according to the novel scoring system.

Although patients with stage II colon cancer generally have a better prognosis than those with stage III colon cancer, there is still extensive variability. One possible reason is that some patients classified as stage II may actually have stage III disease that was understaged because of inadequate ELN. The impact of the number of ELN on accurate staging in different cancers has been assessed, and a greater number of ELN is generally related to more accurate node staging and better prognosis [9, 10, 36]. To evaluate the stage migration effect caused by inadequate ELN in stage II colon cancer, we compared the survival of reclassified stage II colon cancer patients with that of stage III colon cancer patients. As expected, stage II colon cancer patients with a smaller number of ELN had comparable, or worse, CSS and OS when compared with stage III colon cancer patients. For instance, pT4N0 cancer with 0–10 ELN had CSS and OS similar to those of either pT4N1 or pT3N2 cancer. Inadequate ELN leads to a greater chance of failing to detect positive lymph nodes and thus to understaging in colon cancer.

Our present study focused on the effect of inadequate ELN on the prognosis of patients without regional lymph node metastasis. One limitation is that the reasons why patients had inadequate ELN were not addressed here. In addition, some molecular markers that can also predict prognosis and direct the treatment of specific stages of CRC, such as microsatellite instability (MSI) and KRAS and BRAF mutations, are not documented. MSI testing is universally recommended for the diagnosis of Lynch syndrome and can also direct adjuvant chemotherapy after surgery for a stage II population (irrespective of germline or sporadic background) [37, 38]. In contrast, identification of gene mutations such as KRAS and BRAF is recommended for metastatic colon cancer in clinical practice.

In conclusion, a greater number of ELN was found to be associated with better survival in stage II colon cancer, and we recommend examining more than 20 lymph nodes for accurate staging after surgery. According to the prognostic scoring system, stage II colon cancer can be reclassified into different subgroups, and patients with a score of 2 or 3 should be treated following the guidelines for stage III colon cancer patients.