Background

Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide [1]. Surgery and chemotherapy are the most common treatments for CRC. Surgery is used for stage I CRC, with a 5-year survival rate of approximately 90% [2]. Postoperative adjuvant chemotherapy is a standard treatment for patients with stage III CRC, in whom it improves survival and reduces the risk of disease recurrence [3,4,5]. However, although a large number of clinical trials have been carried out, whether adjuvant chemotherapy benefits patients with stage II CRC remains controversial [6,7,8]. About one-quarter of CRC patients have stage II disease, and approximately 15–25% of these patients suffer from disease recurrence [9, 10]. Current guidelines recommend adjuvant chemotherapy for stage II CRC with clinicopathological risk factors, including T4 stage, high tumour grade, <12 lymph nodes harvested, positive or unknown margins, intestinal obstruction or perforation, and lymphovascular and perineural invasion [8]. However, these clinicopathological risk factors cannot clearly classify patients as being at high or low risk of recurrence, nor can they precisely predict which patients benefit from adjuvant chemotherapy. One reason that adjuvant chemotherapy has failed to show a survival benefit in stage II CRC could be the lack of biological factors that predict recurrence, which makes it more difficult to draw meaningful clinical conclusions. Hence, biomarkers are becoming increasingly important for future personalised treatment options. MSI status has proven accurate and effective in identifying approximately 10–15% of patients who do not benefit from 5-FU adjuvant chemotherapy [11, 12].
In addition, several other molecular biomarkers have been reported, among them BRAF mutation, CDX2 gene expression, tumour-infiltrating T cell counts, non-coding RNAs and Oncotype DX colon, but the evidence is still insufficient for them to be applied in the clinic [9, 13,14,15,16,17,18,19,20,21,22]. Recently, the randomised controlled clinical study DYNAMIC showed that circulating tumour DNA (ctDNA) can guide adjuvant therapy in stage II colon cancer. Compared with standard management, the ctDNA-guided approach reduced the use of adjuvant therapy without compromising recurrence risk [23]. Despite these efforts, there is still an urgent need for effective biomarkers with prognostic and predictive value to stratify stage II CRC patients.

Tumourigenesis of CRC is a multistep process involving the progression from adenomas to adenocarcinomas, and this complex process is accompanied by the accumulation of genetic and epigenetic alterations [24,25,26,27]. Studies have shown that epigenetic changes, including DNA methylation, histone modifications, small non-coding RNAs and chromatin remodelling, play important roles in the development and progression of diseases [28, 29]. These epigenetic modifications are heritable and reversible, and they confer cellular plasticity by establishing specific cellular states and responding to changes in the microenvironment. Aberrant epigenetic modifications have been shown to play important roles in colorectal cancer [30, 31].

Among the epigenetic modifications, DNA methylation has received a lot of attention because of its stability and heritability [32]. In eukaryotes, DNA methylation occurs at the C5 position of cytosine in CpG dinucleotides, termed CpG sites, and CpG islands are regions with a high frequency of CpG sites. The CpG island methylator phenotype (CIMP), which is characterised by a high frequency of aberrant CpG island methylation, has been described in CRC [33]. CIMP-positive CRCs are associated with female sex, older age, right-sided colon location, MSI-H status, and BRAF mutations [34, 35]. Aberrant DNA methylation results in dysregulation of various genes and occurs in all stages of cancer, including initiation, growth, and metastasis [36,37,38]. DNA methylation inhibitors have been used clinically as antitumour drugs, suggesting the importance of DNA methylation in cancers [39]. Furthermore, changes in DNA methylation in cancer have been regarded as promising targets for the development of powerful diagnostic, prognostic, and predictive biomarkers. So far, 14 DNA methylation-based biomarkers have been translated into commercially available clinical tests [40].

The lack of reliable biomarkers for identifying patients with stage II CRC who are at high risk of relapse makes clinical treatment decisions difficult. There is an urgent need to identify prognostic and predictive biomarkers for stage II CRC. To investigate potential DNA methylation sites associated with stage II CRC recurrence, we used high-throughput sequencing to screen 2,521,730 CpG methylation sites in recurrence and nonrecurrence stage II CRC tissue samples. We then developed a classifier based on eight DNA methylation sites to predict the recurrence of stage II CRC and potentially guide adjuvant therapy.

Methods

Patients

This study was approved by the Institutional Review Board of Fudan University Shanghai Cancer Center and was performed in accordance with the Declaration of Helsinki. This study recruited 243 patients with stage II CRC who had clinicopathological characteristics and follow-up information available, from January 2010 to December 2015 at Fudan University Shanghai Cancer Center. Patients who had received neoadjuvant therapy were excluded. Written informed consent for sample collection was obtained. Tumour tissue samples were collected at surgery and preserved in RNAlater. All patients were treated and followed up according to the Chinese Society of Clinical Oncology guideline.

Procedures

In the discovery phase, 62 stage II CRC patients (31 with recurrence and 31 without) who did not receive adjuvant chemotherapy were selected to evaluate the genome-wide DNA methylation profile using Agilent SureSelectXT Human Methyl-Seq, a target enrichment system that delivers more information than methylation microarrays and costs less than whole-genome bisulfite sequencing. Differential methylation sites (DMSs) between recurrence and nonrecurrence samples were identified, and candidate CpG sites were selected based on (1) methylation-level differences; (2) statistical significance; (3) AUC values; and (4) more than one differential CpG site within 100 base pairs. We then quantified the methylation level of the candidate CpG sites in 243 stage II CRC patients by pyrosequencing to build and validate a predictive signature of recurrence. The least absolute shrinkage and selection operator (LASSO) method was applied to build the disease recurrence prediction classifier (Fig. 1).
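Criterion (4) can be read as a positional filter on the differential sites. The sketch below shows one possible reading and is not the authors' pipeline: a site is kept only if another differential site lies within 100 bp on the same chromosome. The function name `sites_with_neighbours`, the `window` parameter and the example coordinates are illustrative assumptions.

```python
def sites_with_neighbours(sites, window=100):
    """Keep (chromosome, position) sites that have at least one other
    site on the same chromosome within `window` base pairs.

    This is an assumed interpretation of selection criterion (4).
    """
    # Group positions by chromosome so distances are only compared within one.
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)

    kept = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, pos in enumerate(positions):
            # After sorting, the nearest neighbours are the adjacent entries.
            near_left = i > 0 and pos - positions[i - 1] <= window
            near_right = i + 1 < len(positions) and positions[i + 1] - pos <= window
            if near_left or near_right:
                kept.append((chrom, pos))
    return kept


# Hypothetical coordinates, not taken from the study data.
example_sites = [
    ("chrA", 1000), ("chrA", 1040), ("chrA", 5000),
    ("chrB", 200), ("chrB", 290), ("chrB", 900),
]
clustered = sites_with_neighbours(example_sites)
print(clustered)
```

Isolated sites such as the hypothetical `("chrA", 5000)` are dropped, which reflects the idea that several nearby differential CpGs are less likely to be a single-site artefact.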

Fig. 1: Study flowchart.

Differential methylation sites were selected in the discovery phase and validated in the training and validation phase.

Methyl-seq and data analysis

Sixty-two of the 243 samples were selected for Methyl-seq. Genomic DNA was extracted from tissue samples with a Qiagen genomic DNA extraction kit (QIAGEN, catalogue no. 69506), and all the samples passed the quality control tests. Agilent SureSelectXT Human Methyl-Seq, a comprehensive target enrichment system for analysing CpG methylation, was used to quantify CpG methylation levels according to the manufacturer’s instructions. For each CpG site, the numbers of reads representing methylation and unmethylation were obtained with Bismark, a software package that maps sequencing reads and determines their methylation state [41]. The hg19 human genome was used as the reference. The methylation level of each CpG site is represented as a β value, calculated as the number of reads supporting methylation divided by the total number of reads supporting either methylation or unmethylation. The β values range from 0 (unmethylated) to 1 (fully methylated). Δβ was calculated using the following formula: Δβ = (mean β of nonrecurrence samples) − (mean β of recurrence samples).
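The β and Δβ definitions above amount to simple arithmetic on per-site read counts. The following is a minimal sketch under that reading; the function names and the read counts are hypothetical, and the study's actual values came from Bismark output rather than from code like this.

```python
from statistics import mean


def beta_value(methylated_reads, unmethylated_reads):
    """beta = methylated reads / (methylated + unmethylated reads)."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no reads cover this CpG site")
    return methylated_reads / total


def delta_beta(nonrecurrence_betas, recurrence_betas):
    """delta-beta = mean beta (nonrecurrence) - mean beta (recurrence)."""
    return mean(nonrecurrence_betas) - mean(recurrence_betas)


# Hypothetical (methylated, unmethylated) read counts for one CpG site.
nonrecurrence = [beta_value(m, u) for m, u in [(30, 10), (45, 15), (24, 8)]]
recurrence = [beta_value(m, u) for m, u in [(10, 30), (20, 20), (5, 15)]]
print(delta_beta(nonrecurrence, recurrence))
```

Under this sign convention, a positive Δβ means the site is less methylated in the recurrence samples than in the nonrecurrence samples.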

Bisulfite-PCR pyrosequencing

Pyrosequencing was performed on the PyroMark Q96 instrument (QIAGEN, Hilden, Germany) to analyse candidate CpGs in 243 samples. In all, 100 ng of genomic DNA was used for bisulfite treatment (ZYMO Research, catalogue no. D5006), and the product was subsequently used as the PCR template. A detailed pyrosequencing protocol has been described previously [42]. The genomic sequences and primers designed for CpG sites are listed in Supplementary Table S6.

Statistical analysis

Continuous variables were presented as mean (standard deviation) or median (interquartile range), and categorical variables were described as frequencies and percentages. Student’s t test or the Mann–Whitney U-test was used to identify significant differences between groups. The LASSO model was used to select prognostic markers from all the relapse-associated CpGs in the training data set and to construct a classifier for predicting recurrence in patients with stage II CRC. We conducted receiver operating characteristic (ROC) analysis and calculated the area under the curve (AUC) to measure prognostic or predictive accuracy.
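For a binary outcome such as recurrence, the AUC equals the probability that a randomly chosen patient with the event has a higher score than a randomly chosen patient without it. The sketch below computes it by direct pairwise comparison in plain Python. It is an illustration of the quantity only, not the R code used in the study, and the function name and example values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- predicted risk per patient
    labels -- 1 for recurrence, 0 for no recurrence
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and outcomes.
auc = roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0])
print(auc)
```

The pairwise form is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.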

Disease recurrence was defined as recurrence occurring before the end of follow-up. Disease-free survival (DFS) was defined as the time from the date of resection to the date of confirmed tumour relapse or death. Overall survival (OS) was calculated from the date of surgery to death. We used the Kaplan–Meier method to analyse the association between the classifier and DFS or OS, and the log-rank test to compare survival curves. Univariable and multivariable Cox regression models were used to assess the association between the classifier and survival. Common clinical features were adjusted for in the multivariable model, and Cox regression coefficients were used to generate nomograms. Calibration plots were generated to explore the performance characteristics of the nomograms. Calibration assesses whether actual outcomes approximate the outcomes predicted by a nomogram. The x-axis represents the prediction calculated with the nomogram, and the y-axis represents the actual freedom from cancer recurrence for our patients. The 45-degree line represents the performance of an ideal nomogram, in which the predicted outcome corresponds perfectly with the actual outcome. In a well-calibrated model, points lie close to the 45-degree line.
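The Kaplan–Meier estimate behind the DFS and OS curves is a running product over event times: at each time with at least one event, the survival probability is multiplied by one minus the fraction of at-risk patients who had the event. A minimal sketch follows, assuming the common convention that patients censored at a given time are still counted as at risk at that time. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if relapse or death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up: five patients, two of them censored.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 0, 1, 1, 0])
print(curve)
```

Comparing two such curves, for example high-risk versus low-risk patients, is what the log-rank test formalises.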

Statistical significance was set at a P value of <0.05. Statistical analyses were performed using R software version 3.6.0 (https://www.r-project.org/).

Results

Patient clinical characteristics

A total of 243 patients diagnosed with clinical stage II CRC were enrolled in this study, and patients treated with neoadjuvant therapy were excluded. The clinical characteristics of the patients were listed in Supplementary Table S1. The median age was 62 years (range 25–87) and 38.7% of the patients were female.

Selection of recurrence-related candidate CpG sites

We first evaluated the genome-wide DNA methylation profile of a discovery cohort of 62 patients with stage II colorectal tumours (31 with recurrence and 31 without) using Agilent SureSelectXT Human Methyl-Seq. We analysed a total of 2,521,730 CpG methylation sites and identified 294 CpG sites as the most differential methylation sites (DMSs) between recurrence and nonrecurrence tissues (Supplementary Table S2). Most of the DMSs (220 of 294 sites, 75%) were hypomethylated in recurrence samples. The following 12 CpG sites were then selected as top candidate CpGs based on the selection criteria described above: chr13_107146306, chr13_107146299, chr13_107146263, chr13_107146261, chr13_107146255, chr4_75230386, chr4_75230392, chr4_75230411, chr18_11752756, chr10_122334562, chr4_3409383, chr4_3409701 (Fig. 2). Using unsupervised hierarchical clustering, these 12 DMSs successfully separated the 62 patients into 2 discrete clusters (Supplementary Fig. S1).

Fig. 2: CpG methylation sites are significantly associated with recurrence in stage II CRC.

a–l β values of candidate sites among nonrecurrence (n = 31) and recurrence (n = 31) stage II CRC samples.

Building and validation of a predictive signature of recurrence in stage II CRC

To obtain a more reliable signature of recurrence in stage II CRC, 243 stage II CRC patients were included to build and validate the predictive model. Samples were randomly divided into a training cohort (n = 171, 70%) and a validation cohort (n = 72, 30%). We quantified the methylation level of the candidate DNA methylation sites in the 243 patients by pyrosequencing. When pyrosequencing was used to detect sites chr18_11752756, chr10_122334562, chr4_3409383, and chr4_3409701, we also obtained the methylation information of two CpG sites adjacent to each of these sites in the genome: chr18_11752740, chr18_11752738, chr10_122334583, chr10_122334569, chr4_3409388, chr4_3409399, chr4_3409739 and chr4_3409727. Hence, a total of 20 CpG sites, comprising 12 DMSs and 8 adjacent CpG sites, were used to generate a methylation signature of recurrence in stage II CRC.

We first determined the association of the 20 individual CpG sites with prognosis in the training samples (Supplementary Table S3 and Supplementary Fig. S2). We then assessed combinations of these sites for predicting disease progression. Eight CpG sites were selected by the LASSO method to build a disease recurrence prediction classifier (Supplementary Fig. S3A, B). The risk score combining the eight CpG sites (chr4_75230411, chr13_107146255, chr13_107146299, chr18_11752740, chr10_122334569, chr10_122334562, chr4_3409383, chr4_3409727) was calculated as follows:

$$
\text{Risk score} = \frac{1}{1 + \exp(-z)}
$$

$$
\begin{aligned}
z ={} & 1.815\,\beta_{\text{chr4\_75230411}} + 6.533\,\beta_{\text{chr13\_107146299}} - 7.637\,\beta_{\text{chr13\_107146255}} \\
& - 10.165\,\beta_{\text{chr18\_11752740}} + 2.384\,\beta_{\text{chr10\_122334569}} - 1.673\,\beta_{\text{chr10\_122334562}} \\
& + 1.198\,\beta_{\text{chr4\_3409383}} + 3.474\,\beta_{\text{chr4\_3409727}} - 1.657
\end{aligned}
$$

Here, each β with a site subscript denotes the methylation level of that CpG site.

In this formula, methylation levels range from 0 to 1.
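The risk score is a logistic transform of a weighted sum of the eight methylation levels, so it always lies between 0 and 1. The sketch below encodes the reported coefficients and intercept; the site names follow the list of eight CpG sites selected by LASSO. The function name, dictionary layout and example input are illustrative assumptions rather than the authors' implementation.

```python
import math

# Coefficients and intercept as reported for the eight-site classifier.
COEFFICIENTS = {
    "chr4_75230411": 1.815,
    "chr13_107146299": 6.533,
    "chr13_107146255": -7.637,
    "chr18_11752740": -10.165,
    "chr10_122334569": 2.384,
    "chr10_122334562": -1.673,
    "chr4_3409383": 1.198,
    "chr4_3409727": 3.474,
}
INTERCEPT = -1.657


def risk_score(methylation):
    """Logistic risk score from per-site methylation levels in [0, 1].

    methylation -- dict mapping each of the eight site names to its level
    """
    for site in COEFFICIENTS:
        level = methylation[site]  # raises KeyError if a site is missing
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"methylation level for {site} must be in [0, 1]")
    z = INTERCEPT + sum(c * methylation[s] for s, c in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient with 50% methylation at every site.
example_patient = {site: 0.5 for site in COEFFICIENTS}
print(risk_score(example_patient))
```

Positive coefficients raise the score as methylation increases, and negative coefficients lower it; for example, higher methylation at chr18_11752740 reduces the predicted risk.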

With disease recurrence as the event, ROC analysis was performed in the training and validation data sets. The AUCs of the training and validation cohorts were 0.75 (95% CI: 0.67–0.82) and 0.71 (95% CI: 0.58–0.84), respectively, indicating that the signature potentially represents a robust prognostic biomarker for stage II CRC (Fig. 3a, b).

Fig. 3: DNA methylation signature is a biomarker for recurrence in stage II CRC.

ROC curves in the training (a) and validation cohort (b).

The prognostic value of the DNA methylation classifier

To study the prognostic value of the classifier, the risk score for each patient was calculated. The distributions of risk score and recurrence status of stage II patients in training cohort and validation cohort were analysed. As expected, patients with recurrence had a higher risk score and patients with lower risk score were less likely to relapse (Fig. 4a, b).

Fig. 4: Association between the DNA methylation classifier and prognosis.

Waterfall plot of risk score using DNA methylation classifier in training (a) and validation (b) cohort. DFS Kaplan–Meier survival in the training (c) and validation cohort (d).

All patients in the training cohort were further divided into a high-risk group or a low-risk group, using the median risk score as the cutoff. The Kaplan–Meier survival curve showed that patients in the high-risk group had poorer DFS than those in the low-risk group (log-rank test: P value <0.0001) (Fig. 4c). Similarly, high-risk patients had poorer DFS than low-risk patients in the validation cohort (log-rank test: P value = 0.026) (Fig. 4d).
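Dichotomising patients at the median risk score can be sketched as follows. Two details are assumptions here because the text does not state them: scores exactly equal to the median are assigned to the low-risk group, and the cutoff is computed from whatever list of scores is passed in. The function name and the scores are hypothetical.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each score 'high' or 'low' relative to the median cutoff.

    Scores equal to the cutoff are labelled 'low' (an assumed convention).
    """
    cutoff = median(risk_scores)
    groups = ["high" if score > cutoff else "low" for score in risk_scores]
    return cutoff, groups


# Hypothetical risk scores for six patients.
scores = [0.12, 0.35, 0.48, 0.61, 0.83, 0.27]
cutoff, groups = split_by_median(scores)
print(cutoff, groups)
```

A median cutoff yields two groups of roughly equal size, which keeps the Kaplan–Meier comparison balanced but is not necessarily the threshold that best separates outcomes.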

DNA methylation classifier is an independent prognostic factor for stage II CRC

The univariate Cox proportional hazards regression model indicated that the risk score was strongly associated with DFS (HR = 2.96, 95% CI: 1.84–4.75, P value < 0.001) in the training cohort (Supplementary Table S4). In the validation cohort, the risk score (HR = 2.27, 95% CI: 1.08–4.75, P value = 0.017) was also a predictive factor for DFS (Supplementary Table S5).

To further evaluate the role of the DNA methylation classifier in predicting prognosis, we included the CpG methylation risk score and common clinicopathological features, including age, gender, TNM stage, adjuvant chemotherapy, tumour grade, lymphovascular invasion, perineural invasion, number of lymph nodes harvested, positive or unknown margins and intestinal obstruction or perforation, in the multivariate Cox regression model. In the training cohort, the model showed that the risk score (HR = 2.80, 95% CI: 1.71–4.58, P value < 0.001) was an independent prognostic factor for stage II CRC (Supplementary Table S4). In the validation cohort, multivariate Cox analysis showed that the risk score was an independent prognostic factor associated with DFS (HR = 2.82, 95% CI: 1.20–6.61, P value = 0.017) (Supplementary Table S5). The methylation classifier remained a statistically significant prognostic factor after stratification by clinicopathological risk factors (Fig. 5 and Supplementary Fig. S4).

Fig. 5: DNA methylation signature is an independent prognostic factor in stage II CRC.

Kaplan–Meier survival analysis of 243 patients with stage II CRC using the DNA methylation classifier stratified by clinicopathological risk factors. a, b TNM stage. c, d Tumour grade. e, f Perineural invasion. g, h Intestinal obstruction or perforation (IOP) status.

The combination of the methylation classifier with clinicopathological risk factors performed slightly better than the methylation classifier alone in predicting disease recurrence, with an AUC of 0.734 (Fig. 6a). The classifier also showed significantly higher accuracy in the prediction of DFS (Fig. 6b, c) and overall survival (Supplementary Fig. S5A, B) than any clinicopathological risk factor. These results indicated that the methylation classifier, alone or in combination with clinicopathological risk factors, can serve as a prognostic tool for predicting disease recurrence, DFS and OS in stage II CRC.

Fig. 6: DNA methylation classifier is more accurate in the prediction of disease recurrence and disease-free survival than other clinicopathological risk factors.

a Disease recurrence ROC curves of prognosis analysis using the DNA methylation classifier (high risk vs low risk), TNM stage (T4 vs T3), tumour grade (high vs low), IOP (yes vs no), number of lymph nodes examined (12 or more vs fewer than 12), lymphovascular invasion (yes vs no), perineural invasion (yes vs no), positive margins (yes vs no), combined clinicopathological risk factors alone, and the combination of the classifier with clinicopathological risk factors. b, c Disease-free survival time-dependent ROC curve comparisons of the DNA methylation classifier, clinicopathological risk factors, and the combination of the classifier with clinicopathological risk factors.

Association of the risk score with the benefit of adjuvant chemotherapy

To explore the potential utility of our risk score in treatment decisions, we first determined the benefit of adjuvant therapy using current clinical criteria. Adjuvant chemotherapy did not improve DFS across all 243 patients (HR 1.24, 95% CI: 0.83–1.86; P = 0.3; Supplementary Fig. S6A). In contrast to the non-adjuvant chemotherapy group, in the adjuvant chemotherapy group there was no significant difference in DFS between patients with classifier-defined high risk combined with T4 stage and the other groups (Supplementary Fig. S6B, C), suggesting that patients with classifier-defined high risk combined with T4 stage responded favourably to adjuvant chemotherapy.

Construction of nomogram based on methylation classifier

To visualise the prediction model, we established a nomogram that integrated the eight-methylation-site classifier with clinicopathological risk factors, including TNM stage, tumour grade, number of lymph nodes examined, positive or unknown margins, intestinal obstruction or perforation, and lymphovascular and perineural invasion (Supplementary Fig. S7A). Among them, the DNA methylation risk score had the greatest impact on prognosis, followed by positive margins and tumour grade. The calibration curve showed that the nomogram performed well compared with an ideal model represented by the 45-degree line (Supplementary Fig. S7B). The predictive accuracy of the nomogram was assessed by ROC analysis; the AUC of the nomogram was 0.734 (Supplementary Fig. S7C). In addition, we established separate nomograms for patients without and with adjuvant chemotherapy, with AUCs of 0.868 and 0.7, respectively (Supplementary Fig. S8).

Discussion

Identification of effective prognostic and predictive biomarkers is critical for risk stratification and for guiding the treatment of stage II CRC. In this study, we presented the discovery and validation of a relapse-related DNA methylation classifier that predicts tumour recurrence in stage II CRC. In predicting disease recurrence, the AUCs of our DNA methylation classifier in the training and validation cohorts were 0.75 and 0.71, respectively. Our DNA methylation classifier can distinguish high- and low-risk groups, and the risk score calculated from the classifier serves as a prognostic indicator for stage II CRC patients. The high-risk group defined by the signature had poorer DFS than the low-risk group regardless of clinicopathological factors. Furthermore, our DNA methylation signature showed significantly higher prognostic accuracy than any clinicopathological risk factor. Multivariate Cox analysis showed that the risk score was an independent prognostic factor associated with DFS. In addition, our classifier can potentially identify patients who are suitable candidates for adjuvant chemotherapy.

The benefits of adjuvant chemotherapy are small in patients with stage II CRC, so it is unnecessary to give chemotherapy to all stage II CRC patients. Over-treatment occurs in the 75% of stage II CRC patients who will not relapse [43]. Our data also indicated that adjuvant chemotherapy did not improve DFS in stage II CRC patients under current clinical stratification. According to the updated guideline, adjuvant chemotherapy should not routinely be offered to patients who are at low risk of recurrence [44]. Identifying patients who can benefit from adjuvant therapy is critical in the management of stage II CRC. Our analysis showed that the group with classifier-defined high risk combined with T4 stage can benefit significantly from adjuvant chemotherapy. Studies with larger sample sizes are needed to further validate the utility of the classifier.

Because DNA methylation is more stable and heritable than RNA, several studies have reported that DNA methylation plays an important role in predicting prognosis in lung cancer, prostate cancer, acute myelocytic leukaemia, breast cancer, ovarian cancer and hepatocellular carcinoma [45,46,47,48,49,50], but it remains unclear whether DNA methylation can act as a prognostic factor for stage II CRC. In our research, the high-throughput technology Agilent SureSelectXT Human Methyl-Seq was applied to analyse CpG methylation. It determined the methylation level of more than 2.5 million CpG sites, considerably more than the 850K sites covered by methylation beadchips. The eight CpG sites in our classifier are not included in the Illumina Infinium 450K Human Methylation Beadchips or the Illumina Infinium MethylationEPIC (850K) Beadchips, which have been widely used in methylation studies including the TCGA project. The identification of these previously undiscovered sites demonstrates that methylation information beyond the 450K or 850K sites may yield more comprehensive prognostic biomarkers in stage II CRC.

The eight CpG sites in our classifier are related to methylation in the following genes: EREG, EFNB2, GNAL, PPAPDC1A and RGS12. Studies of their biological functions have shown that they are involved in tumour development and therapeutic response. EREG is one of the EGFR ligands. High expression of EREG was correlated with prolonged OS and PFS in patients with CRC [51,52,53,54]. EFNB2 encodes a member of the ephrin family, and variants in EFNB2 are associated with overall survival in colorectal cancer patients [55]. Previous studies have shown that EFNB2 and its receptor EPHB4 are transcriptionally upregulated in colon cancer and promote cancer cell proliferation, migration and invasion [56]. EFNB2 overexpression was correlated with poor prognosis in patients who received chemotherapy or radiotherapy [57]. GNAL encodes the α subunit of a heterotrimeric GTP-binding protein. It is significantly correlated with grade and prognosis in glioma [58]. PPAPDC1A encodes a phospholipid phosphatase that converts phosphatidic acids to diacylglycerols. High expression of PPAPDC1A is significantly associated with poor OS and DFS in lung cancer [59]. RGS12 expression is lower in various cancer types, and studies have shown its roles in cell proliferation, migration and invasion [60,61,62]. The genes in our methylation signature have all been found to be involved in tumourigenesis, prognosis and therapy response. However, their functions in CRC recurrence have yet to be elucidated.

Unlike several studies on the prognosis of stage II CRC that included both stage II and III CRC cases [18, 63, 64], our study focused only on stage II CRC. It is unclear whether the recurrence factors in stage II and stage III CRC are the same. Including only stage II CRC patients may identify stage II-specific biomarkers. The Kaplan–Meier analysis of DFS using the DNA methylation classifier stratified by the number of lymph nodes examined showed that our classifier has no prognostic value in patients with <12 lymph nodes examined, who may be unidentified stage III patients because of poor lymph node harvest. These results indicated that our classifier is potentially a stage II-specific prognostic biomarker. Whether our classifier can serve as a biomarker to guide adjuvant therapy needs to be further explored. Our results indicated that patients in the group with classifier-defined high risk combined with T4 stage may benefit from adjuvant therapy. These results need to be further validated in larger multi-centre cohorts. In addition, the number of stage II CRC cases included in our study is not large, and evidence from prospective randomised clinical trials is necessary to verify that our classifier is a clinically viable biomarker.

Conclusions

In summary, the DNA methylation-based risk score model constructed in this study provided more accurate information on the risk of recurrence compared with the use of conventional clinicopathologic criteria alone, performed effectively in predicting the recurrence risk of stage II CRC patients, possessed good power to discriminate high-risk patients from low-risk patients, and could potentially identify patients who benefit from adjuvant chemotherapy.