Introduction

Large-scale genetic analyses of colorectal cancer (CRC) have recently been conducted [1, 2]. Several studies have revealed the genetic characteristics of CRC according to tumor location. When tumors were classified as right- or left-sided, BRAF, KRAS, CTNNB1, SMAD4, and PIK3CA mutations as well as microsatellite instability were more common in right-sided tumors [3,4,5]. Right-sided tumors are also associated with the CpG island methylator phenotype [6] and consensus molecular subtype (CMS) 1 [2], whereas the TP53 mutation rate is higher in left-sided tumors [4]. Even after classifying CRC as right/left-sided, the genetic characteristics and prognosis differed according to tumor location on each side [4]. These data have been used for the selection of anti-tumor agents and for predicting CRC prognosis in clinical practice.

Generally, rectal cancer (RC) is classified into types according to tumor location in daily clinical practice. Lower RC has been reported to exhibit more aggressive behavior than upper RC [7]. Lower RC requiring rectal amputation has also been reported to have a poorer prognosis than other RCs [8, 9]. However, most large-scale genetic analyses of CRC classified RC as a left-sided tumor or as a single group, and the genetic characteristics of RC by tumor location have not been fully investigated, compared with those of colon cancer.

Here, we performed comprehensive genetic profiling of primary RC in a large Japanese cohort. We categorized RC by tumor location according to the general rules of the European Society for Medical Oncology (ESMO) guidelines [10] and Japanese Classification of Colorectal, Appendiceal, and Anal Carcinoma (JCCRC) [11]. We compared the genetic characteristics of low RC with those of other RCs. The identification of clinically significant differences could provide further information regarding tumor biology and lead to improvements in the treatment of RC.

Methods

Ethical statement

In 2014, the Shizuoka Cancer Center initiated Project HOPE (High-tech Omics-based Patient Evaluation) to investigate the biological characteristics of cancer, and the present study used selected data from this project [12, 13]. In Project HOPE, various tumor types, which were surgically resected at Shizuoka Cancer Center Hospital, were evaluated by multiomics-based analyses. This project was conducted at a single institution and designed according to the “Ethical Guidelines for Human Genome and Genetic Analysis Research,” revised in 2013. Informed consent was obtained from all patients and the Institutional Review Board of Shizuoka Cancer Center approved all aspects of this study (authorization no. 25-33).

Patient selection and study design

Patients with all types of cancer who underwent surgery at Shizuoka Cancer Center Hospital and were able to supply fresh cancer tissues of sufficient quantity were candidates for Project HOPE. Patients whose pathological diagnosis could be affected by the removal of cancer tissue were excluded from Project HOPE. Patients who underwent surgery for primary RC between February 2014 and March 2019 and were analyzed in Project HOPE were included in the present study. Patients who had a tumor at a distance exceeding 15 cm from the anal verge (AV), underwent preoperative treatments such as chemotherapy and/or radiotherapy, had squamous cell carcinoma or undifferentiated carcinoma, or provided samples from two or more colorectal tumors were subsequently excluded. Thus, 611 patients were eligible for this study. All tumors were pathologically diagnosed as adenocarcinomas. The clinicopathological and genetic characteristics of these RCs were retrospectively investigated.

Classification of the rectum

The localization of RC was classified according to the ESMO guidelines [10] and JCCRC general rules [11]. In the ESMO guidelines, RC is defined as a tumor located at a distance ≤ 15 cm from the AV. Further, RC is classified according to the tumor’s specific distance from the AV as follows: low, ≤ 5 cm; mid, > 5–10 cm; and high, > 10–15 cm. According to the JCCRC, the rectum is classified according to anatomical landmarks as follows: lower rectum (Rb), below the level of the peritoneal reflection; upper rectum (Ra), above the peritoneal reflection level and up to the lower margin of the second sacral vertebra; and rectosigmoid (RS), above the lower margin of the second sacral vertebra and up to the level of the sacral promontory. We classified tumors into RS-Ra or Rb (low) RC, based on the level of the lower border of the tumor.
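The two location rules described above can be sketched in a few lines of Python. This is only an illustration: the function names `esmo_location` and `jccrc_group` and the string labels are assumptions introduced here, not part of the study's analysis pipeline. The thresholds and groupings are taken directly from the definitions in this section.

```python
from typing import Optional


def esmo_location(distance_cm: float) -> Optional[str]:
    """Classify a tumor by distance (cm) from the anal verge, per the ESMO cut-offs.

    Returns None for tumors beyond 15 cm, which were excluded from the study.
    """
    if distance_cm < 0:
        raise ValueError("distance from the anal verge cannot be negative")
    if distance_cm <= 5:
        return "low"
    if distance_cm <= 10:
        return "mid"
    if distance_cm <= 15:
        return "high"
    return None


def jccrc_group(lower_border_segment: str) -> str:
    """Group a tumor by the JCCRC segment (RS, Ra, or Rb) containing its lower border."""
    if lower_border_segment == "Rb":
        return "Rb (low)"
    if lower_border_segment in ("RS", "Ra"):
        return "RS-Ra"
    raise ValueError(f"unknown segment: {lower_border_segment!r}")
```

For example, a tumor whose lower border lies 7 cm from the AV is "mid" under the ESMO rule, yet it may be either Ra or Rb under the JCCRC depending on the peritoneal reflection, which is why the two classifications assign different numbers of patients to the low RC group.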

Sample preparation

Tumors and their surrounding normal tissue weighing ≥ 100 mg were dissected from fresh surgical specimens. Peripheral blood was collected as a control for whole-exome sequencing (WES). For DNA analysis, clinical samples were frozen in liquid nitrogen before DNA extraction. DNA was extracted from these samples using a QIAamp DNA Blood MINI Kit (Qiagen, Venlo, Netherlands). DNA was quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and Qubit 2.0 Fluorometer (Thermo Fisher Scientific). For RNA analysis, tissue samples were immersed in an RNAlater solution (Thermo Fisher Scientific) and stored at 4 °C before RNA extraction.

Whole-exome sequencing

Detailed protocols have previously been described [13,14,15]. Briefly, WES was performed on an Ion Proton System (Thermo Fisher Scientific) using the Ion Ampliseq Exome kit (Thermo Fisher Scientific). The exome library was constructed using the Ion Torrent AmpliSeq RDY Exome Kit (Thermo Fisher Scientific). Somatic mutations were identified by comparing tumor data with the corresponding blood samples.

Gene expression profiling (GEP)

GEP analysis was performed as described previously [13, 16]. Total RNA was extracted from approximately 10 mg of tumor tissue using the miRNeasy Mini Kit (Qiagen, Hilden, Germany). RNA samples with an RNA integrity number ≥ 6 were used in GEP analysis. Briefly, a total of 100 ng RNA was amplified and fluorescently labeled. These samples were hybridized to a SurePrint G3 Human Gene Expression 8 × 60 K v2 Microarray (Agilent Technologies, Santa Clara, CA, USA). Raw microarray data were normalized using GeneSpring GX software (Agilent Technologies). CMS classification was evaluated using the R package CMScaller [17].

Detection of fusion genes

A detailed protocol for the detection of fusion genes has been described previously [18]. Briefly, we constructed an in-house library that targeted total RNA to detect 491 known fusion genes. Thereafter, breakpoint sequencing was performed using a next-generation sequencer to determine the sequence of the fusion gene.

Outcome variables

Data on patients, pathological findings, genetic characteristics, and postoperative prognosis were collected. The tumor stage was defined according to tumor node metastasis classification [19]. These outcomes were compared between low and other RCs. As regards long-term outcomes, overall survival (OS) and relapse-free survival (RFS) rates were evaluated. Long-term outcomes were analyzed in cases that could be classified as CMS1–4 as well as those in which surgery was performed between February 2014 and December 2017 to exclude cases with short observation periods. Cases with pStage IV and with synchronous or metachronous malignancies were also excluded (Supplemental Fig. 1).
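The inclusion rules for the long-term outcome analysis can be expressed as a simple filter. The sketch below is illustrative only: the `Case` record and its field names are hypothetical, and the date bounds assume that "between February 2014 and December 2017" means 1 February 2014 to 31 December 2017 inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed inclusive bounds for the surgery period used in survival analysis.
START = date(2014, 2, 1)
END = date(2017, 12, 31)
CLASSIFIABLE = {"CMS1", "CMS2", "CMS3", "CMS4"}


@dataclass
class Case:
    """Hypothetical per-patient record; field names are illustrative."""
    cms: Optional[str]        # "CMS1".."CMS4", or None if unclassifiable
    surgery_date: date
    p_stage: str              # "0", "I", "II", "III", or "IV"
    other_malignancy: bool    # synchronous or metachronous malignancy


def eligible_for_survival_analysis(case: Case) -> bool:
    """Apply the four criteria stated in the text, in order."""
    return (
        case.cms in CLASSIFIABLE
        and START <= case.surgery_date <= END
        and case.p_stage != "IV"
        and not case.other_malignancy
    )
```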

Fig. 1 Gene expression analysis. The expression levels of 10 genes were compared between low and other rectal cancers. Results according to the European Society for Medical Oncology classification (a) and Japanese Classification of Colorectal, Appendiceal, and Anal Carcinoma (b) are shown

Statistical analysis

Fisher’s exact test or the chi-squared test was used to assess categorical variables, and the Mann–Whitney U test was used to compare continuous variables between the two groups. OS and RFS rates were calculated from the time of surgery using the Kaplan–Meier method and compared using the log-rank test. Univariate and multivariate analyses of the factors influencing RFS were performed using a Cox proportional hazard regression model. Risk factors with p-values < 0.10 in univariate analysis were included in the multivariate analysis. Differences were considered statistically significant at a p-value < 0.05. For CMS classification using CMScaller, q < 0.05 was considered statistically significant. All analyses, except for CMS classification, were performed using BellCurve for Excel (version 2.15; Social Survey Research Information Co., Ltd., Tokyo, Japan).
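For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched as follows. This is a minimal pure-Python illustration, not the BellCurve for Excel implementation used in the study; the function name `kaplan_meier` and the toy follow-up data in the usage note are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time from surgery for each patient
    events : 1 if the event (death or relapse) occurred at that time, 0 if censored
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= n_leaving
    return curve
```

With hypothetical follow-up times `[1, 2, 3, 4, 5]` and event flags `[1, 0, 1, 1, 0]`, the estimate drops to 0.8 at time 1, to 0.8 × 2/3 at time 3 (three patients remain at risk after the censoring at time 2), and to 0.8 × 2/3 × 1/2 at time 4.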

Results

Baseline characteristics

A total of 611 patients with RC were investigated, of whom 188 (30.8%) and 319 (52.2%) were classified as having low RC according to the ESMO and JCCRC classifications, respectively. Table 1 summarizes the baseline characteristics of the patients according to tumor location. The rate of locally advanced RC was relatively high, with 426 (69.7%) cases of pT3 or T4 and 313 (51.2%) with lymph node metastases.
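The percentages reported above follow directly from the stated counts and the cohort size of 611. As a quick arithmetic check (the helper `pct` is introduced here only for illustration):

```python
def pct(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL = 611  # eligible patients

reported = {
    "low RC (ESMO)": (188, 30.8),
    "low RC (JCCRC)": (319, 52.2),
    "pT3 or T4": (426, 69.7),
    "lymph node metastases": (313, 51.2),
}

for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{TOTAL} = {pct(count, TOTAL)}% (stated {stated}%)")
```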

Table 1 Baseline characteristics of the study population

Mutation accumulation

Table 2 shows the mutation accumulation of key genes in CRC, such as APC, TP53, SMAD4, CTNNB1, PIK3CA, PTEN, KRAS, NRAS, and BRAF, according to tumor location. Under the ESMO classification, only KRAS mutation accumulation was significantly higher in low RC than in high-mid RCs. However, under the JCCRC, the mutation accumulation of these genes was similar between low and other RCs.

Table 2 Comparison of gene mutation frequency

Gene expression profiling

GEP data were available for 607 patients. Gene expression analysis was performed on nine genes examined for mutation accumulation and an additional gene, ERBB2. Under the ESMO classification, the expression levels of CTNNB1, KRAS, and ERBB2 differed significantly according to tumor location (Fig. 1a). Under the JCCRC, the expression levels of TP53, KRAS, and ERBB2 differed significantly according to tumor location (Fig. 1b).

Prevalence of fusion genes

Fusion gene expression was examined in 601 patients. Table 3 shows fusion gene expression details according to tumor location. Overall, 14 (2.3%) patients had fusion genes, of whom 13 (92.9%) exhibited RSPO-related fusion genes: two cases of EIF3E-RSPO2 and 11 of PTPRK-RSPO3. Fusion genes were significantly more prevalent in the low RC group only under the JCCRC.

Table 3 Comparison of fusion gene prevalence

Distribution of CMS

CMS was analyzed in 601 patients (Table 4). One hundred and nine (18.1%) cases were unclassifiable. Under both classifications, the distribution of CMS was significantly different between low and other RCs. In particular, low RC had a lower rate of CMS2 and a higher rate of CMS4. Under the ESMO classification, the frequencies of CMS2 and CMS4 were 27.0% and 32.5% in high-mid tumors and 14.8% and 41.5% in low tumors, respectively. Under the JCCRC, the frequencies of CMS2 and CMS4 were 32.6% and 28.5% in RS-Ra tumors and 14.5% and 41.6% in Rb (low) tumors, respectively. The mutation accumulation of APC and TP53 in CMS4 tumors was 81.6% and 77.4%, respectively, and these figures were relatively higher than those reported previously [2]. In addition, the distribution of CMS was significantly associated with the pStage (Supplemental Table 1). As the tumor progressed, the frequency of CMS4 increased.

Table 4 Comparison of consensus molecular subtype distribution

Genetic characteristics according to sex and histology

Under the ESMO classification, the frequency of poorly differentiated or mucinous tumors tended to be higher in low RC, and under the JCCRC, males were significantly more common in the low RC group (Table 1). According to sex, the gene expression levels of PIK3CA, KRAS, BRAF, and ERBB2 (Supplemental Fig. 2a) and the distribution of CMS (Supplemental Table 2) differed significantly. According to histology, the mutation accumulation of APC and BRAF (Supplemental Table 3) and the gene expression levels of SMAD4 and PTEN (Supplemental Fig. 2b) differed significantly. The distribution of CMS tended to differ according to histology, although the difference was not statistically significant (Supplemental Table 2). The prevalence of fusion genes was similar according to sex and histology (Supplemental Table 4).

Long-term outcomes by CMS classification

Thereafter, we investigated the association between CMS and long-term outcomes because the distribution of CMS differed clearly according to tumor location in both classifications. Long-term outcomes were analyzed in pStage 0–III cases that could be classified as CMS1–4 and in which surgery was performed between February 2014 and December 2017 (Supplemental Fig. 1). Tumor stages were similar among the four groups (Supplemental Table 5). Figure 2 shows the OS and RFS rates for each type of CMS. There was no significant difference in OS between the groups (Fig. 2a). RFS was significantly different between the groups, and CMS4 had a particularly poor prognosis (Fig. 2b). Multivariate Cox regression survival analysis revealed that pT3-4, pN1-2, and CMS4 were independently associated with poor RFS (Table 5). Further analyses of RFS by pStage and CMS revealed that in Stages 0–II, RFS tended to be worse in CMS4, although this outcome was not statistically significant (Fig. 2c). In Stage III, RFS was significantly worse in CMS4 than in CMS1–3 (Fig. 2d).

Fig. 2 Long-term outcomes after surgical resection of rectal cancer. a Overall survival by consensus molecular subtype. b Relapse-free survival by consensus molecular subtype. Relapse-free survival by consensus molecular subtype and pStage: c pStage 0–II; d pStage III

Table 5 Univariate and multivariate analyses of factors for relapse-free survival

Discussion

This study demonstrated significant differences in genetic characteristics between low and other RCs. Most previous studies did not categorize RC based on location, although RC is always classified according to tumor location in daily clinical practice. To the best of our knowledge, this is the first study to compare the genetic characteristics of low RC with those of other RCs. Because this was a single-center study, detailed clinical data were available, and a detailed classification of RC was possible. Furthermore, in Western countries, the standard treatment for locally advanced RC is total mesorectal excision with neoadjuvant chemoradiotherapy (nCRT), whereas nCRT is not the standard treatment in Japan [20]. At our institution, nCRT is indicated in < 10% of all RC surgeries [21], and tumors treated with nCRT were excluded from this study. Thus, the present data are from a large number of surgically resected primary RC cases that were not modified by preoperative treatments. These features constitute the strengths of this study.

In this study, the mutation accumulation and expression levels of key oncogenes, expression of fusion genes, and distribution of CMS were compared between low and other RCs. We used two classifications in which the location of the rectal tumor was categorized. Regarding mutation accumulation, only KRAS mutation accumulation was significantly higher in low RC under the ESMO classification (Table 2). We detected significant differences in the expression levels of several genes. However, these differences were observed under only one classification (CTNNB1 and TP53), or the difference was not large even when it was identifiable under both classifications (KRAS and ERBB2) (Fig. 1). The prevalence of fusion genes was significantly higher in low RC only under the JCCRC (Table 3). In contrast, the distribution of CMS differed according to tumor location under both classifications (Table 4). The CMS system is one of the most robust classifications for CRC, in accordance with gene signatures [2]. The CMS classification system consists of the following four subtypes with distinct biological and molecular characteristics: CMS1 (microsatellite instability immune), CMS2 (canonical), CMS3 (metabolic), and CMS4 (mesenchymal). Understanding these subsets is expected to influence treatment approaches and improve clinical outcomes. In this study, in particular, low RC exhibited a lower frequency of CMS2 and higher frequency of CMS4 than other RCs. Although CMS4 frequency is reportedly higher in cases with more advanced stages (III and IV) [2, 22], as confirmed by our study (Supplemental Table 1), the rates of lymph node and distant metastases were similar between low and other RCs in the present study (Table 1). Therefore, the differences in CMS distribution may be due to tumor location. CRC with CMS4 has previously been reported to have worse OS and RFS [2]. Here, even when the study population was limited to RC, CMS4 was found to be a risk factor for poor RFS after surgical resection (Table 5), suggesting the significance of the CMS classification system in predicting the prognosis of RC after surgery. Lower (extraperitoneal) tumors have previously been reported to exhibit more aggressive behavior than upper tumors [7]. The higher frequency of CMS4 in low RC may be one of the reasons for this finding.

CMS4 was found to be significantly more common in patients with low RC, and it proved to be a risk factor for poor prognosis. Treatment strategies based on these findings may improve the prognosis of RC. The therapeutic effects of cytotoxic drugs and molecular target agents have recently been reported to differ in accordance with CMS [23]. For example, concerning first-line chemotherapy, patients with CMS2/3 benefited from combination chemotherapy with bevacizumab compared with those with CMS1/4 [24]. In RAS wild-type cases, progression-free survival and OS in CMS4 were significantly better in patients treated with 5-fluorouracil/levofolinate/irinotecan (FOLFIRI) plus cetuximab than in those treated with FOLFIRI plus bevacizumab, while these benefits were not observed in CMS1/2/3 [25]. Irinotecan-based regimens were significantly superior to oxaliplatin (L-OHP)-based regimens in patients with CMS4 CRC [26]. In high-mid or RS-Ra RC cases, tumors classified as CMS2 or CMS4 each accounted for approximately 30% of cases. In contrast, in low RC, CMS2 tumors accounted for only approximately 15% of cases, whereas CMS4 tumors accounted for > 40% of cases (Table 4). These findings suggest the importance of drug selection targeting CMS4 in the treatment of low RC. If tumor CMS could be classified before chemotherapy by tumor biopsy, it may lead to the selection of more appropriate drugs. However, this is difficult because intratumor heterogeneity makes CMS classification using biopsy samples significantly less reliable than that using resection samples [27]. The present data on CMS distribution by tumor location may help in the selection of therapeutic agents for RC cases in which the acquisition of resection samples is difficult (i.e., unresectable or preoperative cases). Total neoadjuvant therapy (TNT) has been developed for the preoperative treatment of locally advanced RC [28]. Currently, the therapeutic drugs used in TNT do not differ according to tumor location. However, it may be preferable to alter the drugs according to tumor location based on CMS distribution. The relationship between CMS classification and drug efficacy in preoperative treatment remains unclear and will be the subject of further study.

In this study, we used two representative classifications that categorize RC according to location. Inconsistencies exist between the ESMO guidelines and the JCCRC [29]. From a molecular genetic perspective, it is unclear which classification makes more sense. In our opinion, if one classification can distinguish clinically significant genetic differences and the other cannot, then the former may be more appropriate for RC. Here, we also found some genetic differences other than CMS distribution according to tumor location in each classification. If clinical significance could be found in any of these differences, then the classification that reveals that difference may be more suitable. For example, the prevalence of fusion genes was significantly higher in low RC only under the JCCRC (Table 3). Of the 14 fusion gene cases, 13 (92.9%) had low RC. Furthermore, among these 13 cases, 11 (84.6%) were PTPRK-RSPO3, and PTPRK-RSPO3 was exclusively confirmed in low RC. Several therapeutic agents against CRC with PTPRK-RSPO3 have previously been reported [30,31,32]. If such therapies become available for clinical use, it would be worthwhile to identify low RC according to the JCCRC rules and to evaluate PTPRK-RSPO3 expression only in those tumors. Tumors with fusion genes have already emerged as important therapeutic targets in lung cancer treatment [33], and the development of a specific treatment for CRC with fusion genes is anticipated.

This study had certain limitations. First, data on small tumors are scarce. Small tumors tended to be excluded from Project HOPE because the pathological diagnoses might have been affected by the removal of sample tissue. Second, the present data are from Japanese patients; hence, the generalizability of the results to other ethnic groups remains uncertain. To further validate the present results, data on small tumors and other ethnic groups are required. Third, the significance of the cases that could not be categorized by the CMS classification system remains unclear, and their distribution may differ according to tumor location. Fourth, there were some differences in the baseline characteristics of the patients according to tumor location. The differences in genetic characteristics according to sex and histology might have affected the present results on the genetic characteristics of RC according to tumor location.

In conclusion, we found significant differences in genetic characteristics between low and other RCs. In particular, CMS distribution was significantly different, and CMS4, which was more frequent in low RC, was a risk factor for poor prognosis after surgical resection. In the future, it will be necessary to translate these data into clinical practice, and further investigation will enable physicians to provide optimal and personalized treatment options for patients with RC.