Introduction

Imaging examinations are now widely used in tumour diagnosis, treatment and curative effect prediction [1,2,3,4]. In particular, high-resolution magnetic resonance imaging (MRI) is the gold standard for rectal cancer TNM staging due to its high consistency with pathological findings, particularly in the evaluation of tumour involvement of the mesorectal fascia (MRF) and extramural venous invasion (EMVI) [5,6,7,8]. Importantly, MRI-based TNM staging guides stratified treatment selection for rectal cancer among options such as surgery alone for early-stage cases, neoadjuvant treatment to achieve downstaging and downsizing, adjuvant chemotherapy for locally advanced disease and an intensified chemotherapy regimen with or without targeted therapy for patients with distant metastases [9,10,11,12].

Moreover, MRI is accurate and sensitive enough to assess the tumour response to neoadjuvant chemoradiotherapy (nCRT) or neoadjuvant chemotherapy (nCT) for locally advanced rectal cancer (LARC) [13]. In patients who respond to neoadjuvant treatment, the tumour mass undergoes volume regression, and responsive tumours are replaced by dense fibrosis and appear with low signal intensity on T2-weighted images. In contrast, in patients with no response to neoadjuvant treatment, the tumour mass shows persistent intermediate or higher signal intensity on T2-weighted images [14, 15]. By distinguishing these signal differences, the MERCURY study proposed an MRI tumour-regression grading (MRI-TRG) system that closely resembled the five-category pathological TRG (pTRG) systems developed by Dworak et al. and Mandard et al. [5, 16, 17]. In previous reports, the MRI-TRG system identified a favourable-outcome subgroup with prolonged overall survival (OS) and disease-free survival (DFS) [5, 18]. Although the MRI-TRG system follows principles similar to those of pTRG, whether it can be used as a surrogate for the pTRG classification scheme remains controversial [13, 15, 19].

Recently, an organ- and function-preserving strategy has been suggested for patients with neoadjuvant treatment-induced complete clinical response (CCR), which accounts for 11 to 38% of all LARC patients [20,21,22]. The “watch and wait” strategy could be alternatively employed for CCR patients given their good long-term OS, DFS and distant metastasis-free survival (DMFS) rates [20,21,22]. Given its powerful capacity of discriminating between the tumour and fibrosis, MRI has been an alternative means by which to identify CCR patients for organ-preserving management [23]. However, the consistency in stratification between CCR and pathological complete response (pCR) by the MRI-TRG system has not yet been clearly addressed for LARC.

Here, we recruited 1033 consecutive LARC patients from two institutions, with the patients from one institution used as the training set and those from the other used as the validation set. Further, referring to the American Joint Committee on Cancer/College of American Pathologists (AJCC/CAP) four-category TRG classification scheme [24, 25], we proposed a four-category MRI-TRG system based on volumetric analysis and investigated its prognostic value for LARC and its consistency with the AJCC/CAP TRG system, particularly for patients in the pCR subgroup.

Methods

Patient characteristics

LARC patients were retrospectively recruited from the Guangdong Institute of Gastroenterology, Sixth Affiliated Hospital of Sun Yat-sen University and Sir Run Run Shaw Hospital, College of Medicine Zhejiang University from July 2011 to December 2018. The study inclusion criteria were as follows: pathologically confirmed rectal adenocarcinoma; tumour located within 15 cm from the anal verge; clinical stage of T3–4 or/and N-positive; receipt of neoadjuvant treatment followed by total mesorectal excision (TME) surgery; and availability of both primary and restaged MRI scans, with the primary MRI scan performed within two weeks before neoadjuvant treatment and the restaged MRI performed within one week before TME surgery, respectively. Exclusion criteria were as follows: patients with distant metastasis occurring before or during treatment; primary or restaged MRI scans missing or with poor quality (such as artefact). This study was approved by the Clinical Ethics Review Committee at the Sixth Affiliated Hospital of Sun Yat-sen University (2021ZSLYEC-121), with the requirement for informed consent waived.

Treatment

The treatment strategy was determined by the multidisciplinary team at each institution. Briefly, neoadjuvant radiotherapy was delivered at 45 Gy in 25 fractions to the whole pelvis, with a boost of 5.4 Gy aimed directly at the gross tumour, and patients concurrently received 5-fluorouracil–based chemotherapy orally or intravenously. The choice among 5-fluorouracil, FOLFOX (leucovorin 400 mg/m² intravenously, followed by fluorouracil 400 mg/m² intravenously and fluorouracil 2.4 g/m² by 48-h continuous intravenous infusion, and oxaliplatin 85 mg/m² intravenously on day 1) and CAPOX (capecitabine 1000 mg/m² twice daily on days 1–14 and oxaliplatin 130 mg/m² on day 1) was at the discretion of the multidisciplinary team [26]. A subgroup of patients was enrolled from a prospective clinical trial and did not receive preoperative radiotherapy [27]. Surgery with curative intent was performed according to TME principles at 6–8 weeks after the completion of neoadjuvant radiotherapy or 2 weeks after the completion of neoadjuvant chemotherapy. Standard FOLFOX or CAPOX adjuvant chemotherapy was administered in most cases [26].

MRI-TRG system

As summarised in Supplementary Table 1, MRI scans were performed on a 1.5-Tesla or 3.0-Tesla MRI unit. Bowel preparation consisting of a Fleet’s™ enema was performed before scanning. The high-spatial-resolution axial plane of the T2-weighted sequence was perpendicular to the long axis of the tumour in restaged pelvic MRI scans. Oblique axial T2-weighted images were used to outline the region of interest, while the other sequences in the restaged pelvic MRI and primary MRI scans were used as reference images. Two physicians with 7 (Dr. Xiaolin Pang) and 9 years (Dr. Peiyi Xie) of experience in rectal cancer imaging performed the delineation; both were blinded to the patients’ clinical characteristics and follow-up information. The outline criteria were as follows: radiochemotherapy-induced anorectal fibrosis and residual tumour were each delineated on every axial plane of the T2-weighted images using the ITK-snap version 3.4.0 software (http://itk-snap.org). In detail, areas presenting with intermediate or high signal intensity on T2 and DWI images (b = 800/1200 s/mm²) were judged as residual tumour. In contrast, zones displaying a low signal on T2 and DWI images were judged as fibrosis. For images with artefacts that obscured interpretation, T2 images, DWI and the apparent diffusion coefficient (ADC) were combined for response assessment. The ADC signal was judged relative to the background of the whole image. Areas displaying a low signal on T2 and unrestricted diffusion (low signal on DWI and low signal on ADC) were judged as fibrosis, and areas displaying a high signal on T2 and restricted diffusion (high signal on DWI and low signal on ADC) were recognised as tumour [28]. After contouring on each axial plane, the ITK-snap software reconstructed the outlined tumour and fibrosis across all planes into a three-dimensional tissue mass.
The percentage of residual tumour was calculated using the following formula: tumour volume / (tumour volume + fibrosis volume) × 100%. To test the interobserver variability, another 50 LARC patients were enrolled and their T2 images were delineated independently by the two physicians. Discrepancies in assessment between the two physicians were arbitrated by a third physician (Dr. Jian Zheng) with 20 years of imaging experience. We developed a four-category MRI-TRG system as a non-invasive surrogate for the tissue-based AJCC/CAP TRG scheme [24, 25]. We hypothesised that the fractional residual tumour volume on MRI is correlated with the pathological residual tumour volume. Based on this hypothesis, we divided the observed MRI fractional residual tumour volumes in our training set into four ordered subgroups (MRI-TRG 0–3) and set the thresholds between groups such that each MRI-TRG subgroup included the same fraction of the population as the corresponding AJCC/CAP TRG group [29]. The final MRI-TRG subgroups were defined as follows: MRI-TRG 0 (complete response), where there was no remaining tumour tissue; MRI-TRG 1 (moderate response), where no more than 30% of the tumour remained; MRI-TRG 2 (minimal response), where 30 to 80% of the tumour remained; and MRI-TRG 3 (poor response), where more than 80% of the tumour remained (Fig. 1 and Supplementary Table 2).
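As a minimal sketch of the calculation described above, the percentage formula and the final cut-offs can be written as two small functions. The function names, the use of a shared volume unit such as mm³, and the boundary handling (exactly 30% assigned to MRI-TRG 1, exactly 80% to MRI-TRG 2) are assumptions made for illustration; the text states only "no more than 30%", "30 to 80%" and "more than 80%".

```python
def residual_tumour_percent(tumour_volume, fibrosis_volume):
    """Residual tumour as a percentage of tumour plus fibrosis volume.

    Both volumes must use the same unit (for example mm^3); the unit is an
    assumption of this sketch, not something the text specifies.
    """
    if tumour_volume < 0 or fibrosis_volume < 0:
        raise ValueError("volumes must be non-negative")
    total = tumour_volume + fibrosis_volume
    if total == 0:
        raise ValueError("no delineated tumour or fibrosis volume")
    return 100.0 * tumour_volume / total


def mri_trg_category(percent):
    """Map a residual tumour percentage to MRI-TRG 0-3.

    Boundary handling is an assumed reading of the stated cut-offs:
    0% -> 0, (0, 30] -> 1, (30, 80] -> 2, above 80 -> 3.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    if percent == 0:
        return 0
    if percent <= 30:
        return 1
    if percent <= 80:
        return 2
    return 3


# Hypothetical volumes: 1200 mm^3 of residual tumour, 4800 mm^3 of fibrosis.
percent = residual_tumour_percent(1200, 4800)
print(percent, mri_trg_category(percent))
```

With these hypothetical volumes the residual tumour fraction is 20%, which falls in the MRI-TRG 1 band.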

Fig. 1: Representative images of the MRI-TRG categories.

The four columns, from left to right, display axial T2-weighted images for patients in the MRI-TRG 0 to MRI-TRG 3 categories, respectively. Row a presents a representative axial T2-weighted image before neoadjuvant treatment. Row b shows a representative axial T2-weighted image after the completion of neoadjuvant treatment but before TME surgery. Row c displays the residual tumour and the replacing fibrosis delineated with the ITK-snap software on the row b T2 image. Based on the delineation in each plane (row c), the ITK-snap software reconstructed a three-dimensional mass of residual tumour (red block) and radiochemotherapy-induced anorectal fibrosis (yellow block) (row d). The red arrows indicate the tumour zone, while the yellow arrows point to the area of radiochemotherapy-induced anorectal fibrosis.

Pathological TRG assessment

The AJCC/CAP TRG category was scored by two pathologists (Dr. Xinjuan Fan and Dr. Yan Huang), each with more than 12 years of experience in gastrointestinal pathological diagnosis. Both pathologists were blinded to patient characteristics and follow-up information. The four-category AJCC/CAP TRG system was classified as follows [24, 25]: grade 0 (complete response), where there were no remaining viable cancer cells; grade 1 (moderate response), where only small clusters or single cancer cells remained; grade 2 (minimal response), where there was some cancer remaining, but with predominant fibrosis; grade 3 (poor response), where there was minimal or no tumour death with extensive cancer remaining (Supplementary Table 2).

Follow-up

After the TME operation, patients were followed up at 3-month intervals during the first 3 years and at 6-month intervals thereafter. Clinical follow-up included physical examinations, blood tests, chest and abdomen computed tomography and pelvic MRI scans [26]. OS was defined as the time from the date of diagnosis to death, or to censoring at the latest follow-up date if the patient was still alive. DFS was defined as the time from surgery to the date of relapse or death, or to censoring at the latest follow-up date. LRFS (local recurrence-free survival) and DMFS were defined as the time from diagnosis to the date of local recurrence or distant metastasis, respectively, or death, or to censoring at the latest follow-up date.
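Each endpoint defined above reduces to a pair per patient: a follow-up time and a flag indicating whether the event occurred or the patient was censored. The following is a minimal sketch of how such pairs feed a product-limit (Kaplan–Meier) survival estimate. The function name and the example follow-up data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up times (for example, months)
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up of five patients (months, event flag).
for t, s in kaplan_meier([6, 12, 12, 24, 36], [1, 0, 1, 1, 0]):
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up date but never trigger a drop in the curve, which is the role that "censored at the latest date" plays in the endpoint definitions.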

Statistical analysis

The intraclass correlation coefficient was used to evaluate the interobserver agreement on the percentage of residual tumour between the two physicians (poor agreement, k < 0.2; fair agreement, k = 0.21–0.40; moderate agreement, k = 0.41–0.60; good agreement, k = 0.61–0.80; very good agreement, k = 0.81–1.00). The correlation between the grade of the MRI-TRG system and the pathological TRG classification was determined by Kendall’s tau-b analysis (0–0.20: poor agreement; 0.21–0.40: fair agreement; 0.41–0.60: moderate agreement; 0.61–0.80: good agreement; 0.81–1.00: very good agreement) for each MRI-TRG category. Further, the sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV) in predicting the corresponding AJCC/CAP TRG category were also reported. The Kaplan–Meier method and log-rank test were used to compare patient outcomes (OS, LRFS, DFS and DMFS) among MRI-TRG categories. Receiver operating characteristic (ROC) curve analysis was used to calculate the area under the curve (AUC) as a measure of the value of MRI-TRG in predicting outcome status. Cox proportional hazards modelling was employed to estimate the hazard ratios (HRs) and 95% confidence intervals (CIs) for patient outcomes. Variables with P < 0.05 in the univariate analysis were entered into the multivariate Cox regression analysis. Two-sided P values of less than 0.05 were considered statistically significant. Statistical analysis was performed with the Statistical Package for the Social Sciences version 26 software programme (IBM Corporation, Armonk, NY, USA).
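To make the agreement statistic concrete, the sketch below computes Kendall’s tau-b for two ordinal gradings by direct pair counting and maps a value to the agreement bands listed above. This is an illustrative pure-Python version, not the SPSS procedure used in the study; the function names are hypothetical, and treating each band as an upper bound (so that 0.20 to 0.21 falls into "fair") is an assumption.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal grades."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both gradings: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one grading is constant")
    return (concordant - discordant) / denominator


def agreement_band(value):
    """Label a coefficient using the bands from the statistical analysis text."""
    bands = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "good"), (1.00, "very good")]
    for upper, label in bands:
        if value <= upper:
            return label
    raise ValueError("coefficient cannot exceed 1")


# Hypothetical gradings for six patients (MRI-TRG vs AJCC/CAP TRG).
mri = [0, 0, 1, 2, 3, 3]
path = [0, 1, 1, 2, 2, 3]
tau = kendall_tau_b(mri, path)
print(round(tau, 3), agreement_band(tau))
```

The tau-b variant is the natural choice for four-level gradings because many patient pairs share the same grade, and the denominator corrects for those ties.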

Results

Patient characteristics

A total of 1033 LARC patients treated with nCRT or nCT followed by TME surgery were enrolled, comprising 252 cases in the training set and 781 cases in the validation set (Supplementary Fig. 1). As shown in Table 1, 3, 43, 793 and 194 patients had stage cT1, cT2, cT3 and cT4 disease, respectively. Stages cN0, cN1 and cN2 were detected in 207, 411 and 415 cases, respectively. The median time from the restaged MRI scan to surgery was 4 days (range: 1–7 days).

Table 1 Patient characteristics.

In the training set, the MRI-TRG category was positively correlated with perineural invasion (P = 0.002), ypT (P < 0.001) and ypN (P = 0.001) stages. In contrast, MRI-TRG category was not correlated with tumour cell-differentiation status (P = 0.050), age (P = 0.414), gender (P = 0.736), cT stage (P = 0.155), cN stage (P = 0.271), tumour distance to the anal verge (P = 0.720), lymph–vascular invasion (P = 0.064), or neoadjuvant radiotherapy (P = 0.311). In the validation set, however, MRI-TRG category was positively correlated with perineural invasion (P < 0.001), neoadjuvant radiotherapy (P < 0.001), ypT (P < 0.001) and ypN (P < 0.001) stages (Table 1).

Association between four-category MRI-TRG system and AJCC/CAP TRG classification scheme

Among the 1033 LARC patients, 23 had a relatively poor-quality restaged DWI sequence that was nevertheless usable as a reference for response assessment. After delineation and calculation, 25.0% (258/1033) of patients were classified as MRI-TRG 0 (training set: 60/252, 23.8%; validation set: 198/781, 25.4%), 20.7% (214/1033) as MRI-TRG 1 (training set: 36/252, 14.3%; validation set: 178/781, 22.8%), 31.2% (322/1033) as MRI-TRG 2 (training set: 88/252, 34.9%; validation set: 234/781, 30.0%) and 23.1% (239/1033) as MRI-TRG 3 (training set: 68/252, 27.0%; validation set: 171/781, 21.9%) (Table 2).

Table 2 Distribution of AJCC/CAP TRG and MRI-TRG grades among study participants.

The mean time for one physician to complete the volumetric segmentation was 9 min per patient. The intraclass correlation coefficient between the two physicians was 0.72. Good agreement was confirmed between the four-category MRI-TRG system and the AJCC/CAP TRG classification scheme in both the training and validation sets (Kendall’s tau-b = 0.645 and 0.671, respectively). Specifically, compared with the other pairs (MRI-TRG 1 and AJCC/CAP TRG 1, MRI-TRG 2 and AJCC/CAP TRG 2, or MRI-TRG 3 and AJCC/CAP TRG 3), the MRI-TRG 0 category displayed the greatest sensitivity [training set: 91.7% (95% CI: 79.1–97.3); validation set: 90.1% (95% CI: 84.3–93.9)], specificity [training set: 92.2% (95% CI: 87.4–95.3); validation set: 92.8% (95% CI: 90.4–94.7)], PPV [training set: 73.3% (95% CI: 60.1–83.5); validation set: 77.8% (95% CI: 71.2–83.2)] and NPV [training set: 97.9% (95% CI: 94.4–99.3); validation set: 97.1% (95% CI: 95.3–98.2)] in identifying AJCC/CAP TRG 0 patients (Table 3).

Table 3 The predictive value of the MRI-TRG system according to the AJCC/CAP TRG system.
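The four reported measures for MRI-TRG 0 all derive from a single 2 × 2 table of MRI-TRG 0 against AJCC/CAP TRG 0. The sketch below shows the arithmetic. The cell counts are not taken from Table 3: they are back-calculated here from the reported percentages, the MRI-TRG 0 group sizes (60 and 198) and the cohort sizes (252 and 781), so they should be read as an assumption that happens to reproduce the published values. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated counts (assumption, not copied from Table 3):
# tp = MRI-TRG 0 and AJCC/CAP TRG 0, fp = MRI-TRG 0 but not AJCC/CAP TRG 0,
# fn = AJCC/CAP TRG 0 missed by MRI, tn = neither.
training = diagnostic_metrics(tp=44, fp=16, fn=4, tn=188)      # n = 252
validation = diagnostic_metrics(tp=154, fp=44, fn=17, tn=566)  # n = 781

for name, result in (("training", training), ("validation", validation)):
    print(name, {k: round(100 * v, 1) for k, v in result.items()})
```

Under these assumed counts the training and validation sets contain 48 and 171 AJCC/CAP TRG 0 patients, respectively. The relatively lower PPV alongside a high NPV reflects the fact that roughly one in four MRI-TRG 0 patients still had residual tumour on pathology, whereas few pathological complete responders were missed.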

Association of four-category MRI-TRG system with survival outcome

The median follow-up time for the overall study population was 40 months (range: 4–101 months). In the training set, significant differences in 3-year survival (OS, DFS, LRFS and DMFS) were observed among the four MRI-TRG categories (all P < 0.001) (Supplementary Fig. 2). A similar association between the four MRI-TRG categories and OS, DFS, LRFS and DMFS was also detected in the validation set (all P < 0.001) (Fig. 2). Pairwise comparison showed that, except for the comparable survival (OS, DFS, LRFS and DMFS) of MRI-TRG 0 and MRI-TRG 1 patients, any two of the four MRI-TRG categories differed significantly in survival (OS, DFS, LRFS and DMFS) (all P < 0.05) (Supplementary Table 3).

Fig. 2: The Kaplan-Meier curve analysis of survival among patients with distinct MRI-TRG categories in the validation set.

Significant differences were observed in 3-year OS (a), DFS (b), LRFS (c), and DMFS (d) among the four MRI-TRG categories.

ROC and multivariate regression analysis

In the training set, ROC analysis confirmed that the MRI-TRG system was closely correlated with individual outcomes (OS AUC = 0.829, 95% CI: 0.759–0.899, Supplementary Fig. 3A; LRFS AUC = 0.842, 95% CI: 0.775–0.909, Supplementary Fig. 3B; DFS AUC = 0.762, 95% CI: 0.697–0.826, Supplementary Fig. 3C; DMFS AUC = 0.748, 95% CI: 0.681–0.815, Supplementary Fig. 3D). A similar finding was confirmed in the validation set (OS AUC = 0.740, 95% CI: 0.674–0.805, Fig. 3a; LRFS AUC = 0.783, 95% CI: 0.732–0.835, Fig. 3b; DFS AUC = 0.786, 95% CI: 0.736–0.837, Fig. 3c; DMFS AUC = 0.722, 95% CI: 0.677–0.767, Fig. 3d).

Fig. 3: The ROC curve analysis of the four-category MRI-TRG system to predict the individual risk of disease progression in the validation set.

Close correlations were observed between the MRI-TRG system and 3-year OS (a), LRFS (b), DFS (c), and DMFS (d) respectively.
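For an ordinal score such as the MRI-TRG grade, the AUC can be read as the probability that a randomly chosen patient with the event has a higher grade than a randomly chosen patient without it, with ties counted as one half. The sketch below computes the AUC in that pairwise form. It is an illustration under simplifying assumptions: the outcome is treated as a fixed binary status and censoring is ignored, the function name is hypothetical, and the example data are invented.

```python
def auc_from_scores(scores, events):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    scores : ordinal predictor, for example MRI-TRG grade 0-3
    events : 1 if the outcome occurred, 0 otherwise
    Ties between an event and a non-event count as half a correct pair.
    """
    positives = [s for s, e in zip(scores, events) if e]
    negatives = [s for s, e in zip(scores, events) if not e]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical grades and outcome status for four patients.
print(auc_from_scores([0, 1, 1, 3], [0, 0, 1, 1]))  # 0.875
```

Because the grade has only four levels, many pairs are tied, so the half-credit rule for ties has a visible effect on the resulting value.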

Moreover, the variables found to be statistically significant in the univariate analysis were subjected to multivariate analysis (Supplementary Table 4, Supplementary Table 5). In the validation set, lymph–vascular invasion (P = 0.011), ypT stage (P = 0.010), cN stage (P = 0.049) and MRI-TRG system (P < 0.001) were independent prognosticators of OS. For LRFS, ypN stage (P < 0.001) and the MRI-TRG system (P = 0.002) were identified as independent prognostic factors. Moreover, ypN stage (P < 0.001) and the MRI-TRG system (P < 0.001) were associated with disease progression. Also, ypT stage (P = 0.007), ypN stage (P < 0.001), lymph–vascular invasion (P = 0.032) and the MRI-TRG system (P < 0.001) were independent predictive variables of DMFS.

Discussion

Here, we proposed a four-category MRI-TRG system modelled on the four-tier AJCC/CAP TRG classification for LARC patients. This four-category MRI-TRG system assessed images quantitatively by delineating the degree of tumour replacement by fibrosis in three dimensions. Following testing in the training set, an independent validation set further confirmed that our four-category MRI-TRG system had good agreement with the AJCC/CAP TRG classification scheme. In particular, the MRI-TRG 0 category had the greatest power to predict the AJCC/CAP TRG 0 category. Moreover, pairwise comparison showed that, except for the MRI-TRG 0 and MRI-TRG 1 categories, patients in any two of the four MRI-TRG categories had a significantly different prognosis. Therefore, this four-category MRI-TRG system might be a surrogate for the AJCC/CAP TRG system in estimating prognosis and may be a promising non-invasive way to identify pCR patients for organ-preserving management strategies.

The advantage of MRI lies in its dynamic, sensitive and non-invasive capacity to assess the degree of tumour regression following neoadjuvant treatment before TME surgery for LARC [5, 30]. Nevertheless, the central question that remains controversial is whether and to what extent MRI-TRG can reflect the degree of pathological tumour regression [13, 15, 19]. In previous studies, the agreement between MRI-TRG and the pathological TRG system ranged from poor to perfect [13, 15, 18, 19]. For example, Patel UB et al. found that the MRI-TRG system was strongly associated with histopathological parameters [15]. Conversely, the pooled PAN-EX (EXPERT and EXPERT-C) study suggested that the agreement between MRI-TRG and the pathological TRG category was poor (k = 0.25) [18]. Similarly, in other small cohort studies, MRI was found to have low accuracy in predicting pathological TRG [13, 19]. The underlying reason for these different findings might be the structure of the MRI-TRG system. In the MERCURY study, based on principles similar to those of the Mandard pathological TRG criteria [17], Patel UB et al. proposed a five-category MRI-TRG system for the first time [5]. The MERCURY five-category MRI-TRG system categorised individual patients qualitatively rather than quantitatively when determining the degree of tumour replacement by fibrotic stroma [5]. Therefore, the classification process was prone to subjectivity and to increased interobserver variability, as the difference in the degree of fibrosis between adjacent categories among the three intermediate categories (MRI-TRG 2 vs. MRI-TRG 3; MRI-TRG 3 vs. MRI-TRG 4) was often too subtle.

In fact, a similar issue was also found in the MERCURY study using the Mandard and Dworak pathological TRG systems [16, 17]. Excluding patients with pathological complete response and non-responders, the microscopic difference between the intermediate response categories (e.g., TRG 2 vs. TRG 3 and TRG 3 vs. TRG 4 in the Mandard system, TRG 3 vs. TRG 2 and TRG 2 vs. TRG 1 in the Dworak system) was less clear using the five-category pathological TRG system [16, 17, 31]. In contrast, the AJCC/CAP TRG system adopted a four-category approach with two intermediate groups: good responders (AJCC/CAP TRG 1 category) and a transitional subset of patients who evolve from good to poor responders (AJCC/CAP TRG 2 category), which is more easily applied [25]. Indeed, compared with the Mandard/Dworak/MSKCC pathological TRG systems [16, 17, 32], the four-category AJCC/CAP TRG system was more accurate in predicting recurrence and had the highest concordance index value (0.694) [31]. Currently, the AJCC/CAP TRG system is recommended as the main pathological tool to evaluate nCRT treatment response by the NCCN Rectal Cancer Guideline Panel [26].

To improve predictive performance, we herein proposed a four-category MRI-TRG system based on the principle of the AJCC/CAP TRG classification by quantitatively assessing the volume of both the residual tumour and the replacing fibrosis (Fig. 1). Because the individual degree of tumour replacement by fibrosis was delineated on each image slice and further reconstructed into a three-dimensional volume, our approach is objective and more accurate when assigning the MRI-TRG category. As expected, our study found in the training set, and further confirmed in the validation set, that the agreement between our four-category MRI-TRG system and the AJCC/CAP TRG classification scheme was good (Kendall’s tau-b = 0.645 and 0.671, respectively), and we proposed that our four-category MRI-TRG system could be used as a surrogate for the AJCC/CAP TRG classification. Notably, the Kaplan–Meier analysis also showed that, except for the MRI-TRG 0 and 1 categories, any two categories in our four-tier MRI-TRG system differed significantly in survival (Fig. 2 and Supplementary Table 3). In contrast, a significant survival difference between any two categories of the five-category MRI-TRG and five-category pathological TRG systems has not yet been confirmed [5, 13].

Importantly, our study showed that the MRI-TRG system is a reliable method to identify possible complete responders for organ-preserving management. Given the excellent outcome among patients in the pCR subgroup, the possibility of organ-preserving strategies (e.g., local excision, “watch and wait”) has been raised for pCR patients in the last decade to reduce treatment-related toxicities and deterioration of the quality of life (e.g., anal leakage, and sexual and urinary dysfunction) without compromising the satisfactory outcome [21, 33]. However, due to the limited size of the pCR groups (9–48 patients) included for analysis [13, 18, 34], whether MRI is sensitive and reliable for predicting pCR is still debatable. Here, to improve the PPV of MRI-TRG 0, we used a simple binary criterion instead of the volumetric analysis: once any residual tumour signal was detected on T2-weighted images, the response was not classified as MRI-TRG 0. Furthermore, using a large consecutive group of pCR patients (n = 219), we demonstrated that, even with good agreement between the four-category MRI-TRG and AJCC/CAP TRG systems overall, the MRI-TRG 0 category still had the highest power to predict pCR (Table 3). In contrast with the finding that the MRI-TRG 1 to 3 categories were sensitive in detecting pCR (sensitivity: 94.0%) [34], we found that radiochemotherapy-induced anorectal injury and changes, such as oedema, inflammation, necrosis and particularly fibrosis, were hard to distinguish from residual tumour on the restaged MRI (Supplementary Fig. 4) [35, 36]. Thus, MRI-TRG 0 showed the highest specificity and sensitivity in both the validation and training sets, while the MRI-TRG 2 and 3 categories had relatively lower accuracy in predicting pTRG 2 and 3.
Therefore, patients assessed as MRI-TRG 0 could be considered for a “watch and wait” policy in prospective clinical studies, but this assessment should be supported by other necessary examinations, such as digital rectal examination, enteroscopy and serological tumour biomarkers, and close surveillance is essential during the follow-up period [37, 38].

In addition, as shown in Fig. 1 and Supplementary Fig. 4, the MRI images from MRI-TRG 2 and 3 individuals included not only residual tumour tissue, but also treatment-induced injury and changes, such as oedema, necrosis and fibrosis. The signals of tumour tissue and non-tumour tissue were similar and mixed, which made them difficult to distinguish by the naked eye [35, 36, 39]. Notably, artificial intelligence has been widely used in tumour diagnosis, treatment and curative effect prediction [40,41,42,43]. We and other groups have also reported that artificial intelligence based on pre-treatment or post-treatment multi-parametric MRI performed well in predicting the response after neoadjuvant chemoradiotherapy for LARC patients [39, 44, 45]. Therefore, a radiomics-based deep-learning approach might be a promising option to differentiate residual tumour from treatment-induced injury in the future. In future studies, the combination of experienced radiologists with artificial intelligence might be an accurate way to distinguish radiochemotherapy-induced anorectal injury from residual tumour.

It should be noted that our study had some limitations. Firstly, this retrospective study may have introduced potential bias and confounding variables, especially regarding the neoadjuvant treatment and the enrolment of a cohort of patients from a prospective study who were not treated with radiotherapy. We addressed this issue by selecting a large, consecutively recruited patient group. In the survival analysis, radiotherapy was confirmed not to be an independent prognostic factor. Thus, the confounding effect of the treatment regimen was kept to a minimum. In addition, radiologists are prone to misinterpreting fibrosis as residual tumour [46], and greater experience is important to reduce interobserver variability. To improve interobserver agreement, two experienced physicians who had specialised in rectal cancer for seven or more years joined our project. The agreement analysis showed good interobserver agreement (intraclass correlation coefficient = 0.72). Moreover, we outlined the tumour and fibrosis on the T2 sequence and simultaneously employed the DWI sequence as supportive information to determine the zones of tumour and fibrosis, in order to improve the accuracy of the assessment of tumour response. Theoretically, the integration of T2 and DWI would achieve improved predictive efficacy [47,48,49]. Previously reported studies have confirmed that the combination of DWI with T2 sequences improved the response assessment for LARC after neoadjuvant treatment [49,50,51]. Moreover, artefacts have an evident influence on the accurate assessment of treatment response based on DWI [28, 51]. To reduce artefacts, bowel preparation consisting of a Fleet’s™ enema was performed routinely before the MRI scan in most of the enrolled cases [50]. In future clinical practice, bowel preparation should be advised to reduce artefacts. In addition, innovative techniques such as reduced FoV EPI DWI might be a novel option [51].

In conclusion, we proposed a four-category MRI-TRG system as a surrogate of the AJCC/CAP TRG classification scheme to predict the neoadjuvant treatment response. Importantly, this tool might be a powerful and non-invasive means to identify complete responders for organ-preserving management strategies.