Introduction

Colorectal cancer (CRC) is the third most common cancer worldwide, with more than one million new cases diagnosed every year [1]. It is also the fourth most common cause of cancer-related death, with 700,000 deaths per year [1]. Because of its high incidence and mortality, CRC has become a global public health issue [2]. Initial patient management is based on the TNM staging system, which is validated and widely used in clinical practice [2]. However, TNM staging has limited prognostic accuracy, and overall survival (OS) can vary greatly even among CRC patients of the same stage. More accurate outcome prediction therefore requires incorporating molecular biomarkers into clinical practice [3]. Besides their prognostic value, biomarkers also help predict therapy response, especially when selecting patients for biological therapy. In the past decade, with the development of biological therapies such as anti-EGFR drugs, the OS of patients with metastatic CRC has increased substantially, from 12 months to 2 years [4]. However, only about half of patients with metastatic disease benefit from anti-EGFR therapy, partly because KRAS mutations confer resistance to this treatment. Additional predictive biomarkers are therefore needed to anticipate response to biological therapy and to individualize treatment [5]. Hence, molecular biomarkers hold promise for diagnosis, prognostic prediction, and the development of personalized therapy for metastatic CRC [2].

The move from a single biomarker to a more comprehensive molecular characterization is appealing [4]. At present, only KRAS, NRAS, and BRAF are used in routine clinical practice, as predictive markers of response to EGFR-targeted therapies [2]. However, the need to identify additional biomarkers is growing rapidly, because other mutations might offer the possibility of therapeutic intervention with new targeted drugs [4].
Almost all patients treated with targeted agents eventually become resistant to these biological drugs. Tumor-cell clones carrying molecular alterations that confer drug resistance expand and lead to treatment failure and recurrence. Identifying resistance mechanisms might therefore make it possible to plan different lines of treatment with targeted agents. Taken together, it will be necessary to assess multiple biomarkers in the future [4].

Comprehensive molecular characterization of tumor tissue in clinical practice will rely on high-throughput technologies that can accomplish this process in a cost-effective and timely manner using a limited amount of tissue [6]. Next-generation sequencing (NGS)-based techniques are rapidly becoming the new standard for evaluating multiple gene mutations. NGS is a useful tool for identifying single-nucleotide variants (SNVs) and copy number variations (CNVs) in many genes and samples simultaneously [7]. It can provide information on both known and novel molecular alterations [6], as well as highly sensitive, semi-quantitative information on all possible mutations in a specific genomic region [8]. An NGS platform was reported to have an accuracy of 96.1% compared with Sanger sequencing [9]. Formalin-fixed paraffin-embedded (FFPE) tissue, because of its ready availability, is the most commonly used material for molecular testing, although concerns remain about the quality and quantity of DNA obtained from FFPE tissue [4]. Previous studies have demonstrated that detecting multiple gene mutations with NGS platforms in FFPE samples is feasible [6, 8, 9].

CRC and non-small cell lung cancer (NSCLC) are two frequent tumor types with known mutations that guide treatment decisions. In this study, we used an NGS-based gene panel that requires only 10 ng of input DNA and a single-tube multiplex PCR. The panel comprises 103 amplicons targeting the most frequently mutated cancer-relevant variants in 22 genes in NSCLC and CRC: AKT1, ALK, BRAF, CTNNB1, DDR2, EGFR, ERBB2, ERBB4, FBXW7, FGFR1, FGFR2, FGFR3, KRAS, MAP2K1, MET, NOTCH1, NRAS, PIK3CA, PTEN, SMAD4, STK11, and TP53. We applied this panel, using the NGS-based multiplex PCR technique (NGS-PCR), to FFPE tissues from 207 CRC patients. The KRAS and BRAF mutation status determined by NGS was compared with that determined by the routine amplification-refractory mutation system (ARMS)-PCR. Furthermore, the clinical significance of the panel in CRC patients was evaluated.

Materials and methods

Inclusion criteria

Patients who met all of the following criteria were included: (1) pathologically proven colorectal adenocarcinoma; (2) primary tumor was resected completely; (3) primary tumor was removed at our hospital; (4) tumor tissue was preserved in the pathology department for less than 2 years.

Exclusion criteria

Patients with any of the following were excluded: (1) a positive proximal, distal, or circumferential resection margin in the resected primary tumor specimen; (2) treatment with local excision; (3) no available FFPE specimen of the primary tumor; (4) neoplastic cells accounting for less than 40% of the tumor tissue.
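Taken together, the inclusion and exclusion criteria amount to a single screening rule. The Python sketch below encodes that rule for illustration only; the `Patient` record and its field names are hypothetical and are not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record; field names are illustrative only.
    adenocarcinoma_confirmed: bool    # pathologically proven colorectal adenocarcinoma
    complete_resection: bool          # primary tumor resected completely
    operated_at_study_hospital: bool  # primary tumor removed at the study hospital
    storage_years: float              # time the tissue has been archived in pathology
    positive_margin: bool             # any positive proximal/distal/circumferential margin
    local_excision: bool              # treated with local excision
    ffpe_available: bool              # FFPE block of the primary tumor available
    tumor_cell_fraction: float        # pathologist-estimated neoplastic fraction (0-1)


def is_eligible(p: Patient) -> bool:
    """True if all inclusion criteria hold and no exclusion criterion applies."""
    meets_inclusion = (
        p.adenocarcinoma_confirmed
        and p.complete_resection
        and p.operated_at_study_hospital
        and p.storage_years < 2
    )
    meets_exclusion = (
        p.positive_margin
        or p.local_excision
        or not p.ffpe_available
        or p.tumor_cell_fraction < 0.40
    )
    return meets_inclusion and not meets_exclusion


# Usage: a case with 35% tumor cells is excluded by exclusion criterion (4).
candidate = Patient(True, True, True, 1.5, False, False, True, 0.35)
print(is_eligible(candidate))  # False
```

Note that a tumor-cell fraction of exactly 40% passes, matching the "at least 40%" wording used for the samples.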

Samples

FFPE tissue samples from resected primary CRC specimens with at least 40% neoplastic cells were obtained from 207 patients who underwent surgery at Changhai Hospital, Shanghai, China, between March 2015 and November 2016. All patients underwent radical resection of the primary tumor. Clinicopathological characteristics and follow-up results were prospectively maintained in a database. All patients were followed up every 3 months; the median follow-up period was 23 months (range, 3–37 months). Informed consent was obtained from all patients, and the project was approved by the Ethics Committee of Changhai Hospital.

DNA extraction

FFPE blocks were cut into ten consecutive 10-µm sections. H&E staining was performed on the first section, and a pathologist assessed the stained section to estimate the percentage of neoplastic cells. For each sample, DNA was extracted from freshly cut FFPE sections using the GeneRead DNA FFPE kit (Qiagen) according to the manufacturer’s protocol. Briefly, after deparaffinization with 160 µL Deparaffinization Solution, the FFPE tissue was incubated at 56 °C for 1 h in lysis buffer containing 55 µL RNase-free water, 25 µL Buffer FTB, and 20 µL proteinase K. After incubation at 90 °C to remove cross-links, 35 µL uracil-N-glycosylase (UNG) was added to specifically remove deaminated cytosine residues. After the DNA was bound to the spin column, residual contaminants were washed away with Buffers AW1 and AW2 and ethanol. DNA was finally eluted in 30 µL Buffer ATE.

All eluted DNA samples were quantified using the Qubit 3.0 fluorometer and the Qubit dsDNA HS Assay Kit (Life Technologies) according to the manufacturer’s protocol: for each sample, 199 µL of Qubit working solution and 1 µL of DNA were combined in an assay tube and measured on the Qubit 3.0.
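Because only 1 µL of sample is diluted into a 200 µL assay, the reading from the assay tube must be scaled by the 200-fold dilution to obtain the stock concentration of the eluate. The sketch below shows that arithmetic, assuming the reading is taken in ng/mL, together with the volume needed to supply a given DNA input; the Qubit instrument can typically perform this back-calculation itself, and the example reading is hypothetical.

```python
def stock_ng_per_ul(tube_reading_ng_per_ml: float,
                    sample_ul: float = 1.0, total_ul: float = 200.0) -> float:
    """Back-calculate the stock DNA concentration from a Qubit assay-tube reading."""
    tube_ng_per_ul = tube_reading_ng_per_ml / 1000.0  # ng/mL -> ng/µL
    dilution_factor = total_ul / sample_ul            # 1 µL in 200 µL -> 200-fold
    return tube_ng_per_ul * dilution_factor


def volume_for_input_ul(target_ng: float, stock: float) -> float:
    """Volume of stock (µL) that contains target_ng of DNA."""
    return target_ng / stock


# Hypothetical reading of 20 ng/mL in the assay tube.
stock = stock_ng_per_ul(20.0)               # about 4 ng/µL in the eluate
needed = volume_for_input_ul(20.0, stock)   # about 5 µL for a 20 ng library input
print(stock, needed)
```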

Library preparation

For each FFPE sample, 20 ng of DNA was used to prepare a sequencing library with the SLIMamp™ Lung and Colon Hot Spots panel kit (Pillar Biosciences) according to the manufacturer’s protocol. In brief, the 103 target regions were amplified in a first round of gene-specific PCR, and the products were purified by size selection using Agencourt AMPure XP Beads (Beckman Coulter). A second round of PCR (indexing PCR) then added Illumina index adaptors to the purified products for sample tracking and sequencing. These PCR products were again purified using Agencourt AMPure XP Beads (Beckman Coulter), and the final libraries were eluted in 22 µL nuclease-free water. The final libraries were quantified with the Qubit 3.0 fluorometer and the Qubit dsDNA HS Assay Kit (Life Technologies) according to the manufacturer’s protocol, combining 199 µL of Qubit working solution with 1 µL of library in each assay tube.
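Normalizing libraries to a molar concentration requires converting the Qubit mass concentration (ng/µL) to nM, which depends on the mean library fragment length. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the fragment length and concentration in the example are hypothetical (neither is reported here, and fragment length is usually measured on a fragment analyzer).

```python
BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration from ng/µL to nM for a given mean fragment length."""
    grams_per_litre = conc_ng_per_ul * 1e-3            # 1 ng/µL = 1 mg/L = 1e-3 g/L
    molar_mass = BP_MASS_G_PER_MOL * mean_fragment_bp  # g/mol of one fragment
    return grams_per_litre / molar_mass * 1e9          # mol/L -> nmol/L


def dilute_to(stock_nM: float, target_nM: float, final_ul: float):
    """Return (library µL, water µL) for a C1V1 = C2V2 dilution."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical library: 3.3 ng/µL with a mean fragment length of 250 bp.
molarity = ng_per_ul_to_nM(3.3, 250)              # about 20 nM
print(molarity, dilute_to(molarity, 4.0, 20.0))   # about 4 µL library + 16 µL water
```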

Sequencing on the MiSeq

Libraries were sequenced on the MiSeq (Illumina) according to the manufacturer’s protocol. Each library was normalized to 4 nM, and libraries were pooled in equal volumes (4 µL each). The pooled library was denatured with 0.2 N NaOH and diluted to a final concentration of 15 pM for sequencing with the MiSeq Reagent Kit v2 (300 cycles), or to 20 pM for the v3 kit.
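The denaturation and dilution steps reduce to C1V1 = C2V2 arithmetic. The sketch below assumes volumes that are commonly used for MiSeq loading (5 µL of the 4 nM pool plus 5 µL of 0.2 N NaOH, then 990 µL of hybridization buffer, and a 600 µL final load); these volumes are assumptions, since the study reports only the 4 nM, 0.2 N NaOH, and 15 pM or 20 pM values.

```python
def c1v1(stock: float, target: float, final_volume: float):
    """Return (stock volume, diluent volume) so that stock * v1 = target * final_volume."""
    v1 = target * final_volume / stock
    return v1, final_volume - v1


def denature_and_dilute(pool_nM=4.0, pool_ul=5.0, naoh_ul=5.0, buffer_ul=990.0):
    """Concentration after adding NaOH (nM), then after adding buffer (pM). Volumes are assumed."""
    denatured_nM = pool_nM * pool_ul / (pool_ul + naoh_ul)       # 4 nM -> 2 nM
    total_ul = pool_ul + naoh_ul + buffer_ul
    diluted_pM = denatured_nM * (pool_ul + naoh_ul) / total_ul * 1000  # 2 nM -> 20 pM
    return denatured_nM, diluted_pM


denatured_nM, stock_pM = denature_and_dilute()
# 15 pM loading concentration (v2 kit) in an assumed 600 µL final volume.
lib_ul, buffer_ul = c1v1(stock_pM, 15.0, 600.0)
print(denatured_nM, stock_pM, lib_ul, buffer_ul)  # about 2 nM, 20 pM, 450 µL, 150 µL
```

For the v3 kit, the 20 pM denatured pool would be loaded without the final dilution step.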

Detection of mutation status of KRAS and BRAF genes by routine ARMS-PCR technique

In addition to the NGS-PCR technique described above, the KRAS and BRAF mutation status was determined by the routine ARMS-PCR technique. After H&E staining, suitable samples consisting predominantly of tumor cells without significant necrosis or inflammation were selected [10]. Eight 10-µm FFPE sections were collected, and DNA was extracted with the AmoyDx FFPE DNA Kit (AmoyDx, Xiamen, China). Genomic DNA was amplified for KRAS exons 2, 3, and 4 and BRAF exon 15 using AmoyDx gene mutation PCR kits (AmoyDx, Xiamen, China). All experiments were performed according to the manufacturer’s instructions, with 5 µL of DNA used for each PCR reaction. The PCR cycling conditions were as follows: 95 °C for 5 min; 15 cycles of 95 °C for 25 s, 64 °C for 20 s, and 72 °C for 20 s; and then 31 cycles of 93 °C for 25 s, 60 °C for 35 s, and 72 °C for 20 s. Fluorescent signals were collected in the FAM and HEX channels. Each PCR run included a negative and a positive control. KRAS and BRAF mutation status was determined from the Ct values as specified in the manufacturer’s instructions.
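The final call in the study follows the Ct-based rule in the kit instructions, which is not reproduced here. ARMS-PCR kits commonly interpret results with a ΔCt rule, comparing the Ct of the mutation-specific reaction with that of a control reaction. The sketch below illustrates only that general logic; the function name and every threshold in it are hypothetical placeholders, not AmoyDx values.

```python
from typing import Optional

# Hypothetical thresholds for illustration; real values come from the kit instructions.
MAX_CONTROL_CT = 30.0   # control reaction must amplify before this cycle to be valid
DELTA_CT_CUTOFF = 6.0   # (mutation Ct - control Ct) below this value -> mutant


def call_arms(mutation_ct: Optional[float], control_ct: float,
              max_control_ct: float = MAX_CONTROL_CT,
              delta_ct_cutoff: float = DELTA_CT_CUTOFF) -> str:
    """Classify one ARMS-PCR reaction as 'mutant', 'wild-type' or 'invalid'."""
    if control_ct > max_control_ct:
        return "invalid"            # too little amplifiable DNA: repeat the assay
    if mutation_ct is None:
        return "wild-type"          # no amplification in the mutation-specific reaction
    delta_ct = mutation_ct - control_ct
    return "mutant" if delta_ct < delta_ct_cutoff else "wild-type"


print(call_arms(27.5, 24.0))   # delta Ct 3.5 -> mutant
print(call_arms(33.0, 24.0))   # delta Ct 9.0 -> wild-type
print(call_arms(None, 31.5))   # control amplified too late -> invalid
```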

Data analysis

Variants were filtered to remove known single-nucleotide polymorphisms (SNPs) and synonymous mutations. An SNP was defined as a variant with an allele frequency ≥ 5% in any subpopulation of the 1000 Genomes database. Benign variants, possibly benign variants, and variants with a minor allele frequency (MAF) < 0.02 were also excluded. The remaining variants were assessed against the Catalogue of Somatic Mutations in Cancer (COSMIC) database [8, 11]. Statistical analyses were performed with SPSS 22.0 (Chicago, IL). Fisher’s exact test or the chi-square test was used for categorical variables, as appropriate. The impact of clinicopathological factors and gene mutations on disease-free survival (DFS) and OS was analyzed using Kaplan–Meier curves and the log-rank test. Multivariate Cox regression analysis was performed using forward stepwise selection (likelihood ratio), and all factors that were statistically significant in univariate analysis were entered into the multivariate analysis. The entry and removal limits were p = 0.05 and p = 0.10, respectively, and a two-sided p < 0.05 was considered statistically significant.
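The variant-filtering rules can be expressed as a single predicate over annotated variant records. In this sketch the record layout is hypothetical, and `allele_fraction` assumes that the MAF < 0.02 criterion refers to the fraction of sequencing reads supporting the variant in the tumor sample; population frequencies are keyed by 1000 Genomes population codes (for example EAS or EUR).

```python
from dataclasses import dataclass, field
from typing import Dict

SNP_POPULATION_AF = 0.05     # >= 5% in any 1000 Genomes population -> treated as an SNP
MIN_ALLELE_FRACTION = 0.02   # assumed meaning of "MAF < 0.02": fraction of variant reads
BENIGN_CLASSES = {"benign", "likely benign", "possibly benign"}


@dataclass
class Variant:
    # Hypothetical annotated-variant record.
    gene: str
    consequence: str              # e.g. "missense", "synonymous", "frameshift"
    significance: str             # e.g. "pathogenic", "uncertain", "benign"
    allele_fraction: float        # variant reads / total reads in the tumor sample
    population_af: Dict[str, float] = field(default_factory=dict)  # e.g. {"EAS": 0.001}


def passes_filters(v: Variant) -> bool:
    """Apply the SNP, synonymous, benign and low-fraction filters in turn."""
    if any(af >= SNP_POPULATION_AF for af in v.population_af.values()):
        return False
    if v.consequence == "synonymous":
        return False
    if v.significance in BENIGN_CLASSES:
        return False
    return v.allele_fraction >= MIN_ALLELE_FRACTION


calls = [
    Variant("KRAS", "missense", "pathogenic", 0.31),
    Variant("TP53", "missense", "uncertain", 0.012),               # below 2%
    Variant("EGFR", "synonymous", "benign", 0.48, {"EAS": 0.21}),  # common SNP
]
print([v.gene for v in calls if passes_filters(v)])  # ['KRAS']
```

Variants passing these filters would then be looked up in COSMIC, a step not modeled here.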

Results

Panel design and performance

The panel was designed to include all well-known predictive markers in the receptor tyrosine kinase (RTK) pathway, such as EGFR, KRAS, NRAS, and BRAF for CRC, as well as other genes that might serve as therapeutic targets in the future, such as AKT1, DDR2, and ERBB2 [12,13,14,15]. The panel includes the following 22 genes: RTKs (ALK, EGFR, ERBB2, ERBB4, FGFR1, FGFR2, FGFR3, MET, and DDR2); RTK signaling genes (KRAS, PIK3CA, BRAF, AKT1, PTEN, NRAS, MAP2K1, and STK11); and other well-known cancer-related genes (NOTCH1, CTNNB1, SMAD4, FBXW7, and TP53) [6].
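The three subgroups listed above are also the unit used when mutation status is analyzed at the subgroup level. A minimal Python sketch of that mapping, assuming a patient's mutated genes are available as a set of gene symbols, is shown below; the function name is illustrative.

```python
PANEL_GROUPS = {
    "RTK": ["ALK", "EGFR", "ERBB2", "ERBB4", "FGFR1", "FGFR2", "FGFR3", "MET", "DDR2"],
    "RTK signaling": ["KRAS", "PIK3CA", "BRAF", "AKT1", "PTEN", "NRAS", "MAP2K1", "STK11"],
    "Other": ["NOTCH1", "CTNNB1", "SMAD4", "FBXW7", "TP53"],
}

# Invert to a gene -> subgroup lookup; each gene belongs to exactly one subgroup.
GENE_TO_GROUP = {gene: group for group, genes in PANEL_GROUPS.items() for gene in genes}


def subgroup_status(mutated_genes):
    """Map a patient's mutated genes to {subgroup: mutated?}."""
    hit = {GENE_TO_GROUP[g] for g in mutated_genes}
    return {group: group in hit for group in PANEL_GROUPS}


print(len(GENE_TO_GROUP))                  # 22
print(subgroup_status({"KRAS", "TP53"}))   # {'RTK': False, 'RTK signaling': True, 'Other': True}
```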

Clinicopathological characteristics of the included 207 CRC patients

A total of 207 CRC patients were included, of whom 129 (62.3%) were men; the median age was 61 years (interquartile range [IQR], 53–68) (Table 1). The cohort included 30 (14.5%), 63 (30.4%), 79 (38.2%), and 35 (16.9%) patients with TNM stage I, II, III, and IV CRC, respectively. Twenty-five patients (12.1%) received preoperative chemoradiotherapy, and 45 (21.7%) received postoperative chemoradiotherapy. The primary CRC lesions were radically resected in all patients, and 16 (7.7%) underwent simultaneous resection of metastatic lesions (Table 1).

Table 1 Clinicopathological characteristics of the 207 included colorectal cancer patients

The distribution of variants and mutated genes

Of the 207 patients, 193 (93.24%) had one or more variants, with a total of 414 variants. Of these 193 patients, 75, 59, 36, 12, 7, 1, 1, 1, and 1 had one, two, three, four, five, six, seven, eight, and nine variants, respectively, and 170, 20, and 3 had one, two, and three mutated genes, respectively. No mutation was identified in the remaining 14 patients. Details of the variants and mutated genes are shown in Table 2 and Fig. 1.
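The per-patient counts reported in this paragraph are internally consistent, as the short tally below checks. It also derives one quantity the text does not state directly: the number of patients with two or more variants, that is, the group defined by the total variant number ≥ 2 cutoff used in the survival analyses.

```python
COHORT_SIZE = 207
# Patients with 1, 2, ..., 9 variants and with 1, 2, 3 mutated genes, as reported in the text.
patients_by_variant_count = {1: 75, 2: 59, 3: 36, 4: 12, 5: 7, 6: 1, 7: 1, 8: 1, 9: 1}
patients_by_gene_count = {1: 170, 2: 20, 3: 3}

with_variants = sum(patients_by_variant_count.values())
total_variants = sum(k * n for k, n in patients_by_variant_count.items())
two_or_more = sum(n for k, n in patients_by_variant_count.items() if k >= 2)

print(with_variants, total_variants)                 # 193 414
print(sum(patients_by_gene_count.values()))          # 193
print(COHORT_SIZE - with_variants)                   # 14
print(round(100 * with_variants / COHORT_SIZE, 2))   # 93.24
print(two_or_more, COHORT_SIZE - two_or_more)        # 118 89
```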

Table 2 The distribution of variants and mutated genes in 207 patients
Fig. 1

OncoPrint plot of the mutation status of the 19 mutated genes in 207 colorectal cancer patients (X axis: the 207 patients; Y axis: the percentage of patients with gene mutation)

Of the 414 variants, 158 were located on chromosome 17, 99 on chromosome 12, and 51 on chromosome 3. In terms of predicted impact on the protein, 355 variants had a moderate impact and 59 a high impact. By consequence, 346 were missense, 32 stop-gained, 21 frameshift, 9 splice-site, 5 in-frame deletion, and 1 in-frame insertion variants; by variant type, 384 were SNVs, 25 deletions, and 5 insertions (Table 3).

Table 3 The distribution of the 414 variants identified in our 207 CRC patients

The four most frequently mutated genes were TP53 (73.91%), KRAS (46.86%), PIK3CA (20.29%), and FBXW7 (15.94%). The frequencies of all gene mutations are summarized in Table 2. No variants were identified in MET, NOTCH1, or STK11.

The relationship between gene mutations and TNM stage

The mutation status of NRAS, PIK3CA, and TP53 was associated with TNM stage (p = 0.048, p = 0.039, and p = 0.029, respectively). No significant association was found between TNM stage and the mutation status of the other 16 mutated genes. We also analyzed the associations between the three gene subgroups (RTKs, RTK signaling genes, and other genes) and TNM stage but found no significant association. When the 19 mutated genes were analyzed as a group, mutation status was not significantly associated with TNM stage (Supplemental Table 1). We also analyzed the relationship between total variant number and TNM stage (Supplemental Table 2); the Kruskal–Wallis test showed no significant association (p = 0.133).
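The Kruskal–Wallis test compares the distribution of a per-patient count (here, total variant number) across the four TNM stages using ranks, so it does not assume normally distributed counts. The analyses in this study were run in SPSS; the standard-library sketch below shows the same statistic with the usual correction for tied ranks (important for small integer counts) and a chi-square approximation for the p-value. Function names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        nu, q = 2, math.exp(-x / 2)              # closed form for df = 2
    else:
        nu, q = 1, math.erfc(math.sqrt(x / 2))   # closed form for df = 1
    while nu < df:
        # Recurrence: Q(nu + 2, x) = Q(nu, x) + (x/2)^(nu/2) * e^(-x/2) / Gamma(nu/2 + 1)
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank; also return tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes = [0.0] * len(values), []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (df = groups - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:          # all observations identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

In the study setting, `groups` would hold the total variant numbers of the stage I, II, III, and IV patients, giving 3 degrees of freedom.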

Comparisons of mutation status of KRAS and BRAF detected by NGS-based multiplex PCR technique and routine ARMS-PCR technique

The mutation status of KRAS and BRAF was determined by two methods: (1) the NGS-based multiplex PCR technique (NGS-PCR), performed at Shanghai Zhengu Biological Technology Co., Ltd; and (2) the routine ARMS-PCR technique, performed at the pathology department of Changhai Hospital. KRAS mutation status had also been determined by ARMS-PCR in 151 of the 207 patients, and the two measurements were highly consistent (Spearman correlation coefficient r = 0.763, p < 0.001; Supplemental Table 3). BRAF mutation status had been determined by ARMS-PCR in 149 of the 207 patients, and the two measurements were identical (r = 1.000, p < 0.001; Supplemental Table 3).
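For two binary calls (mutant or wild-type), Spearman's coefficient reduces to the phi coefficient of the 2×2 cross-tabulation, because ranking a 0/1 variable is a linear transformation of it. The sketch below computes phi alongside overall percent agreement and Cohen's kappa, a chance-corrected agreement measure often reported for method comparisons; kappa is added here for illustration and is not a statistic reported in this study, and the counts in the usage line are hypothetical.

```python
import math


def binary_agreement(a: int, b: int, c: int, d: int):
    """Agreement statistics for a 2x2 table of NGS vs ARMS calls.

    a: mutant by both, b: mutant by NGS only, c: mutant by ARMS only, d: wild-type by both.
    Returns (percent agreement, Cohen's kappa, phi coefficient).
    """
    n = a + b + c + d
    observed = (a + d) / n
    p_ngs, p_arms = (a + b) / n, (a + c) / n          # marginal mutant fractions
    expected = p_ngs * p_arms + (1 - p_ngs) * (1 - p_arms)
    kappa = (observed - expected) / (1 - expected)
    # phi equals Pearson's (and Spearman's) correlation for two 0/1 variables;
    # it is undefined if any row or column total is zero.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 100 * observed, kappa, phi


# Hypothetical counts for illustration only (not the study's data).
print(binary_agreement(40, 5, 5, 50))
```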

Correlation between gene mutations and DFS and OS in CRC patients

We analyzed the influence of total variant number on DFS and OS using different cutoff values (total variant number = 1, 2, 3, and 4; Fig. 2). Only a total variant number ≥ 2 was a significant predictor of both DFS and OS; therefore, a total variant number ≥ 2 was chosen as the cutoff value in this study.
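Choosing the cutoff amounts to dichotomizing patients at each candidate total variant number and comparing the two resulting groups with a log-rank test. The sketch below implements the standard two-sample log-rank statistic in plain Python; the function names are illustrative, and the study itself used SPSS.

```python
import math


def logrank_test(times, events, in_group):
    """Two-sample log-rank test.

    times: follow-up in months; events: 1 = event, 0 = censored;
    in_group: True for patients in the first group (e.g. variant number >= cutoff).
    Returns (chi-square statistic with 1 df, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected, variance = 0.0, 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        observed_minus_expected += d1 - d * n1 / n   # observed minus expected events, group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:                                # e.g. every patient in one group
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square(1)


def scan_cutoffs(variant_numbers, times, events, cutoffs=(1, 2, 3, 4)):
    """Log-rank result for each split of patients at 'variant number >= cutoff'."""
    return {c: logrank_test(times, events, [v >= c for v in variant_numbers])
            for c in cutoffs}
```

Because several cutoffs are compared on the same patients, the smallest p-value in such a scan tends to be optimistic; confirming the chosen cutoff in an independent cohort is the usual safeguard.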

Fig. 2

The influence of total variant number on disease-free survival and overall survival using different cutoff values

Univariate analyses of factors influencing DFS are shown in Table 4. Advanced TNM stage (p = 0.024), elevated serum CEA (p = 0.013), elevated serum CA199 (p = 0.002), total variant number ≥ 2 (p = 0.019), AKT1 mutation (p = 0.011), BRAF mutation (p = 0.036), EGFR mutation (p = 0.047), and PTEN mutation (p = 0.015) were associated with shorter DFS (Table 4).
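The DFS and OS curves behind these univariate comparisons are Kaplan–Meier (product-limit) estimates. The minimal sketch below computes the estimator and the median survival time from follow-up times in months and event indicators (1 = event, 0 = censored); it is an illustration with toy data, not the SPSS procedure used in the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, survival probability just after it)."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    curve, survival = [], 1.0
    for t in sorted(deaths):
        # Patients censored at t are still counted as at risk at t (usual convention).
        at_risk = sum(1 for ti in times if ti >= t)
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First event time at which survival drops to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data in months: 1 = recurrence or death, 0 = censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])   # [(1, 0.8), (2, 0.6), (3, 0.3)]
print(median_survival(curve))                 # 3
```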

Table 4 Univariate analyses of influencing factors of disease free survival (DFS)

Univariate analyses of factors influencing OS are shown in Table 5. Poor differentiation (p = 0.017), lymph node metastasis (p = 0.029), advanced TNM stage (p = 0.008), elevated serum CEA (p = 0.022), elevated serum CA199 (p = 0.037), tumor deposit (p = 0.008), total variant number ≥ 2 (p = 0.002), AKT1 mutation (p = 0.002), ALK mutation (p = 0.049), BRAF mutation (p < 0.001), CTNNB1 mutation (p = 0.036), KRAS mutation (p = 0.034), NRAS mutation (p = 0.015), and PTEN mutation (p = 0.042) were associated with shorter OS (Table 5).

Table 5 Univariate analyses of influencing factors of overall survival (OS)

Multivariate analyses [hazard ratio (95% confidence interval)] demonstrated that advanced TNM stage [2.383 (1.217–4.666), p = 0.011], elevated serum CEA [2.363 (1.264–4.417), p = 0.007], total variant number ≥ 2 [2.126 (1.050–4.304), p = 0.036], AKT1 mutation [11.798 (2.548–54.631), p = 0.002], and PTEN mutation [3.441 (1.304–9.077), p = 0.013] were independent predictors of shorter DFS (Table 6).

Table 6 Multivariate analyses of influencing factors of DFS and OS

Multivariate analyses demonstrated that poor differentiation [3.166 (1.430–7.009), p = 0.004], advanced TNM stage [3.357 (1.448–7.780), p = 0.005], total variant number ≥ 2 [3.539 (1.337–9.371), p = 0.011], BRAF mutation [7.268 (2.067–25.559), p = 0.002], CTNNB1 mutation [22.079 (2.521–193.356), p = 0.005], and NRAS mutation [4.365 (1.257–15.154), p = 0.020] were independent predictors of shorter OS (Table 6).
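Assuming the bracketed multivariate values are Cox hazard ratios with Wald-type 95% confidence intervals on the log-hazard scale, the standard error of log(HR) can be recovered from the interval width and the implied two-sided p-value compared with the reported one. The sketch below does this for the DFS and OS estimates given above; agreement within rounding is a useful consistency check when transcribing such tables. The function name is illustrative.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def p_from_hr_ci(hr: float, lower: float, upper: float) -> float:
    """Two-sided Wald p-value implied by a hazard ratio and its 95% CI."""
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se_log_hr
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))


# (outcome: factor, HR, lower, upper, reported p) from the multivariate results
REPORTED = [
    ("DFS: TNM stage", 2.383, 1.217, 4.666, 0.011),
    ("DFS: serum CEA", 2.363, 1.264, 4.417, 0.007),
    ("DFS: variants >= 2", 2.126, 1.050, 4.304, 0.036),
    ("DFS: AKT1", 11.798, 2.548, 54.631, 0.002),
    ("DFS: PTEN", 3.441, 1.304, 9.077, 0.013),
    ("OS: poor differentiation", 3.166, 1.430, 7.009, 0.004),
    ("OS: TNM stage", 3.357, 1.448, 7.780, 0.005),
    ("OS: variants >= 2", 3.539, 1.337, 9.371, 0.011),
    ("OS: BRAF", 7.268, 2.067, 25.559, 0.002),
    ("OS: CTNNB1", 22.079, 2.521, 193.356, 0.005),
    ("OS: NRAS", 4.365, 1.257, 15.154, 0.020),
]

for label, hr, lo, hi, p in REPORTED:
    print(f"{label:26s} reported p = {p:.3f}   recomputed p = {p_from_hr_ci(hr, lo, hi):.4f}")
```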

Discussion

In this study, a gene panel covering 103 hotspot regions in 22 genes of clinical interest for CRC or lung cancer was used for targeted sequencing. The four most frequently mutated genes were TP53 (73.91%), KRAS (46.86%), PIK3CA (20.29%), and FBXW7 (15.94%). Mutation status was not associated with TNM stage, except for NRAS, PIK3CA, and TP53. Multivariate analyses showed that advanced TNM stage, elevated serum CEA, total variant number ≥ 2, and AKT1 and PTEN mutations were independent predictors of shorter DFS, whereas poor differentiation, advanced TNM stage, total variant number ≥ 2, and BRAF, CTNNB1, and NRAS mutations were independent predictors of shorter OS. Taken together, TNM stage and total variant number ≥ 2 were independent predictors of both DFS and OS in our cohort.

The reported frequency of gene mutations in CRC varies greatly across studies, which may be attributable to differences in sample size, population, and detection sensitivity [16]. A study of 112 CRC FFPE specimens using NGS found mutation frequencies of 35.7%, 7.1%, 16.1%, 6.3%, 39.3%, and 29.5% for KRAS, BRAF, PIK3CA, NRAS, TP53, and APC, respectively [17]. The frequencies of KRAS, NRAS, BRAF, PIK3CA, TP53, APC, SMAD, and FBXW7 mutations were reported to be 44.4%, 4.0%, 4.0%, 13.1%, 52.5%, 27.3%, 2.0%, and 3.0%, respectively, in an Arab population (99 CRC cases), and 48.4%, 4.0%, 4.0%, 12.1%, 47.5%, 24.2%, 11.1%, and 0%, respectively, in a matched Western population (99 CRC cases) [16]. Many studies have reported TP53 mutation frequencies ranging from 30% to 70% in different populations [18,19,20,21]; the TP53 mutation frequency of 73.91% in our study lies just above this range. Our mutation frequencies were also higher than those reported in a previously published study [6], which may reflect differences in detection sensitivity and sample size. We found no significant association between TNM stage and the mutation status of almost all genes, except for NRAS, PIK3CA, and TP53. These findings are largely consistent with Al-Shamsi’s study, which found no significant associations between TNM stage and KRAS, NRAS, BRAF, PIK3CA, TP53, APC, SMAD, or FBXW7 mutations [16].

KRAS is a key downstream component of the EGFR signaling pathway [19], and KRAS mutation predicts resistance to anti-EGFR therapy [22]. However, the prognostic value of KRAS mutation in CRC is controversial [21, 23,24,25,26,27]. In our study, KRAS mutation was associated with OS but not with DFS. BRAF mutation has been reported to predict poor outcome in CRC patients at both early and advanced stages [21, 23, 26]; in our study, BRAF mutation was associated with shorter DFS and OS. AKT1 is a central player in the PI3K oncogenic pathway and is involved in many cellular processes, such as cell survival, proliferation, angiogenesis, and radiation and drug response [28, 29]. The clinical significance of AKT1 and NRAS mutations in CRC remains unknown [30]. Serum CEA level is a well-known marker of poor survival in CRC [31, 32]. In our univariate analyses, TNM stage, serum CEA level, total variant number ≥ 2, and AKT1, BRAF, and PTEN mutations were associated with both DFS and OS. TNM staging alone does not accurately predict outcome in CRC [3, 25], and Dienstmann’s study showed that adding BRAF and KRAS mutation status to TNM staging improved the accuracy of OS prediction [3]. Our multivariate results showed that total variant number ≥ 2 was an independent predictor of both DFS and OS. Why might total variant number ≥ 2 be a prognostic marker? Could it be related to microsatellite instability (MSI)? High microsatellite instability (MSI-H) is expected to produce more mutations and therefore more variants. We did not test MSI status in this study, but data on defective mismatch repair (dMMR) were available, and dMMR was not a predictor of DFS or OS. We also found no significant association between total variant number and TNM stage. Our panel included 103 amplicons targeting the most frequently mutated cancer-relevant variants in 22 genes in colorectal cancer.
Because SNPs, synonymous variants, and benign or possibly benign variants were excluded, the remaining variants in our study are more likely to be involved in the development and metastasis of colorectal cancer. This might explain why a total variant number ≥ 2 was associated with poor prognosis.

NGS technologies provide useful tools for identifying SNVs and CNVs in many genes and samples simultaneously [33]. In this study, we used an NGS-PCR technique to detect multiple mutations with a 22-gene panel in 207 CRC patients. Compared with conventional methods such as Sanger sequencing or ARMS-PCR, our method has several advantages. First, it conserves tissue when multiple targets are tested: our panel of 103 amplicons in 22 genes requires as little as 10 ng of input DNA, whereas conventional methods require 500–1000 ng [6]. Second, the turnaround time (TAT) from DNA isolation to results is only 48–72 h, much shorter than the combined time needed to test each gene separately. Third, it is more cost-effective when costs are calculated per gene tested. In addition, the genes included in the panel are flexible and can be adjusted as needed. Our panel is smaller than commercial panels designed for a broad spectrum of cancers [6] and costs only about US$120 per sample from DNA extraction to sequencing. Nonetheless, it covers almost all frequent mutations identified in CRC tissues, including a variety of indel and missense mutations with frequencies as low as 2% [6]. Tops et al. reported a similar 22-gene panel for CRC, but it included only 87 hotspot regions [6]. In contrast, the 22-gene panel in our study covers more hotspot regions (103), because several recently described variants with potential clinical significance were added. Most importantly, to our knowledge, this is the first study to report the clinical significance and prognostic value of such a panel in CRC patients. Furthermore, our study has the largest sample size (207 CRC cases), much larger than those of Tops’ study (29 CRC cases) [6] and Dijkstra’s study (30 CRC cases) [8].
Moreover, we identified a PIK3CA:c.3203dupA frameshift mutation that would change the last C-terminal amino acid of the PIK3CA protein and add three further amino acids. This mutation, whose impact on tumorigenesis is still unknown, has been reported in hepatocellular carcinoma and breast cancer [34, 35]. However, it has not been reported in CRC according to COSMIC [36], the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer, and a PubMed literature search also found no report of this mutation in CRC.

The demand for genetic targets to predict survival and guide targeted therapy has increased rapidly over the past 30 years [37]. Determining the mutation status of multiple genes in CRC is essential for optimizing personalized treatment. However, testing multiple genes faces several challenges, including cost, turnaround time, and limited sample material [37]. Our NGS-PCR panel may help overcome these challenges. It may therefore meet the need for unbiased and highly sensitive mutation detection and accelerate basic and clinical cancer research [9].

Our study has some limitations. First, the follow-up period was relatively short. Second, the sample size was relatively small, although this is the largest study in this field to date. A larger cohort would include more patients carrying each mutation and would yield more accurate results, especially for less frequent mutations. Larger studies are therefore warranted to verify the conclusions drawn here.

In conclusion, simultaneous detection of multiple gene mutations with a 22-gene panel, using the NGS-based multiplex PCR technique, is feasible in FFPE CRC specimens. TNM stage and total variant number ≥ 2 were independent predictors of DFS and OS, and detection of multiple gene mutations may provide prognostic information in addition to TNM stage in CRC patients.