Introduction

The treatment for locally advanced rectal cancer involves neoadjuvant chemoradiation therapy (CRT) followed by surgery and then adjuvant chemotherapy. After neoadjuvant treatment, approximately 50–60% of patients are downstaged and 15–27% demonstrate a pathologic complete response (pCR) [1]. Hence, a non-operative approach presents an increasingly attractive alternative to surgery for these patients after neoadjuvant treatment [2]. However, the accurate prediction of complete response after neoadjuvant treatment remains a challenge.

To diagnose a pCR after neoadjuvant treatment, digital rectal examination and endoscopy have accuracies of approximately 85% but are limited in extra-luminal assessment, and their interpretation varies among surgeons [3, 4]. Meanwhile, rectal MRI is the preferred imaging modality for local staging and restaging of rectal cancer; in clinical practice, it is preferably performed with endoscopy [5]. A limitation of all these modalities in current clinical practice is that they involve qualitative assessment with variable reproducibility between readers [6]. For rectal MRI, T2-weighted imaging and diffusion-weighted imaging (DWI) are the two sequences most frequently used to diagnose tumor response [3, 7,8,9]. However, the reproducibility is suboptimal, predominantly fair for T2-weighted imaging and moderate for DWI [10, 11]. Thus, there is a critical need for accurate, consistent, and quantitative measures of response. Such measures could help to stratify patients before treatment and allow alternative and more effective individualized therapies to be offered to patients.

In particular, quantitative measures derived from the computerized analysis of MR images have been shown to predict survival [12, 13]. Thus, they may also allow the identification of risk factors for local recurrence and distant disease, serving as imaging markers to tailor treatment. Such an approach involving computer-automated analysis of images is known as radiomics [14, 15]. Research involving radiomics analysis in rectal cancer is emerging [16,17,18,19,20], with a focus on predicting and diagnosing treatment response after neoadjuvant CRT [12, 19, 21,22,23,24,25].

The purpose of this study was to investigate the value of T2-radiomics extracted from pre-treatment rectal MRI combined with anatomical MRI staging criteria in predicting complete response after neoadjuvant CRT for rectal cancer. We hypothesized that the non-invasive radiomics classifiers could add value to stratify patients at the time of initial cancer diagnosis based on the predicted treatment response.

Materials and methods

Study population

This retrospective, single-institution study was compliant with the Health Insurance Portability and Accountability Act and was approved by the institutional review board with a waiver for written informed consent. We examined our retrospective database for consecutive patients who underwent pre-treatment rectal MRI before starting neoadjuvant CRT from October 2011 to January 2015 followed by surgery (a and b). The most common treatment at our institution during the study period was radiation (50.4 Gy) with preferably capecitabine over 5 weeks [26, 27]. We did not separate our analyses based on drugs or doses received, since this was outside the scope of our study. All 102 patients in this study were analyzed in a previous study [19]. In the previous study, the same patients were used to analyze the added value of radiomic features on restaging rectal MRI performed after neoadjuvant treatment, whereas here we focused on the value of radiomic features on the baseline rectal MRI performed before neoadjuvant treatment.

The exclusion criteria were: (a) unavailable baseline MRI scan, (b) mucinous tumor, (c) poor image quality, (d) incomplete tumor coverage on MRI, and (e) rectal perforation. (See Fig. 1 for the flowchart for patient inclusion.)

Fig. 1 Flowchart of patient inclusion

MRI protocol

Of 102 rectal MRI examinations, 93/102 (91%) were performed at our institution while the remaining 9/102 (9%) examinations were performed at outside institutions. MRI examinations were performed using different GE Healthcare System platforms (Discovery MR750, Optima MR450w, Signa EXCITE, and Signa HDxt; Waukesha, WI) with a phased-array coil. MRI parameters at our institution are summarized in Table 1. The minimum requirements for the inclusion of MRI sequences were: (a) high-spatial-resolution axial oblique T2-weighted imaging through the tumor, (b) section thickness of 2–4 mm, and (c) field of view of 180–220 mm. DWI and dynamic contrast-enhanced imaging were included in the qualitative assessment when available.

Table 1 MRI parameters at our institution

Qualitative MRI assessment

Anatomical MRI staging criteria were collected from the institutional standardized reports for rectal MRI that are routinely used in clinical practice. The standardized reports were made by our group of specialized radiologists with 5–15 years of experience; cases of discrepancy or missing information were resolved by a radiologist with training in oncologic imaging (EO).

The following anatomical MRI staging criteria were assessed to complete the standardized reports: (a) tumor location, (b) tumor length, (c) tumor T stage, (d) extramural depth of invasion, and (e) extramural vascular invasion (See Fig. 2). Tumor location was specified according to the distance from the anal verge as low (0–5 cm), middle (5.1–10 cm), or high (10.1–15 cm) [28]. Tumor stage was determined according to the American Joint Committee on Cancer Staging Manual of TNM staging [29] as follows: T1 (tumors are confined to the mucosal/submucosal layer), T2 (tumors invade the muscularis propria), T3 (tumors invade the mesorectum), and T4 (tumors invade the peritoneum or adjacent organs). Extramural vascular invasion was defined as present when there was expansion or irregularity of the vessel wall, loss of the normal vascular flow void, and intraluminal tumor signal within a vessel contiguous with the tumor [30, 31].

Fig. 2 Qualitative features assessed on rectal MRI in this study. a Tumor (arrow) length (yellow line); b T4a tumor (arrowhead) abutting anterior peritoneal reflection (arrow); c extramural vascular invasion (arrow), arrowhead pointing to tumor

Quantitative MRI texture analysis

Image segmentation

For the training set, a radiologist with training in oncologic imaging (EO) manually segmented the volume of interest (VOI) on all slices of all 102 high-resolution axial oblique T2-weighted images using open-source software (ITK-SNAP, version 3.4.0; http://itksnap.org). Tumor deposits, desmoplastic reaction, and extramural vascular invasion were excluded from the segmentation. For the validation set, two different radiologists (JGP and VP) with 5 years of experience in rectal MRI, respectively, independently reviewed all images and segmented tumors on all slices of 66/102 T2-weighted images following the same guidelines as above.

Quantitative texture analysis

Prior to radiomics feature extraction, all images and segmentations were resampled to isotropic voxels of 1 mm³ so that radiomics features could be computed in 3D. A total of 108 radiomics features were extracted using the CERR software, which is compliant with the Imaging Biomarker Standardization Initiative (IBSI) standards [32]. These features were divided into the following categories: 20 features from the intensity histogram, 26 features from the gray-level co-occurrence matrix (GLCM), 15 from the gray-level size-zone matrix (GLSZM), 7 shape features, and 40 edge descriptors. GLCM and GLSZM were constructed with a bin size of 128, where the normalized MRI signal intensities were grouped by their values into those bins. Then, a merging strategy was used to compute the GLCM measures, whereby the co-occurrence matrices computed using directional offsets in all directions (N = 24) were combined by averaging [33]. Edge descriptor metrics were obtained as first-order statistics (mean, standard deviation, skewness, and kurtosis) of edge maps computed with Sobel, Laplacian of Gaussian, and Gabor edge filters (2 bandwidths, σ = 2 and \(2\sqrt{2}\); 4 angles, θ = 0°, 30°, 45°, and 90°).
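The merging strategy for the GLCM can be illustrated with a minimal sketch. It is not the CERR implementation: it works on a toy 2D image with 4 gray-level bins and the 8 in-plane neighbor offsets, whereas the study used 3D volumes; the function names `quantize`, `glcm`, and `merged_glcm` are hypothetical.

```python
from itertools import product


def quantize(image, n_bins):
    """Map intensities to integer bin indices 0..n_bins-1 using equal-width bins."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant image
    return [[min(int((v - lo) / width), n_bins - 1) for v in row] for row in image]


def glcm(q, offset, n_bins):
    """Normalized co-occurrence matrix of a quantized image for one directional offset."""
    dr, dc = offset
    rows, cols = len(q), len(q[0])
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for r, c in product(range(rows), range(cols)):
        r2, c2 = r + dr, c + dc
        if 0 <= r2 < rows and 0 <= c2 < cols:
            counts[q[r][c]][q[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else counts


def merged_glcm(q, offsets, n_bins):
    """Average the per-offset matrices into a single direction-independent GLCM."""
    mats = [glcm(q, off, n_bins) for off in offsets]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n_bins)]
            for i in range(n_bins)]


# Toy image (assumption, for illustration only) and all 8 in-plane offsets.
image = [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
offsets = [(dr, dc) for dr, dc in product((-1, 0, 1), repeat=2) if (dr, dc) != (0, 0)]
quantized = quantize(image, n_bins=4)
merged = merged_glcm(quantized, offsets, n_bins=4)
```

Because every offset is paired with its opposite, the averaged matrix is symmetric and sums to 1, so texture measures computed from it do not depend on a single direction.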

Reference standard

The reference standard was the standardized histopathologic report of the surgical resection specimens from total mesorectal excision. No additional pathologic analysis was done for our project. pCR was defined as stage ypT0N0.

Statistical analysis

All statistical analyses were performed using RStudio with R version 3.5.1 and the pROC and caret packages. To select radiomics features robust to the magnetic field strength, the Wilcoxon test was performed with a significance threshold of p < 0.05 after Bonferroni adjustment for multiple comparisons. Radiomics features that were not dependent on the considered scanner parameters were deemed stable and used in the subsequent analysis.
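The stability filter can be sketched as follows. This is a simplified illustration, not the study's R code: it assumes the per-feature Wilcoxon p-values have already been computed, and the feature names and p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


def stable_features(p_values, alpha=0.05):
    """Keep features whose adjusted p-value shows no significant field-strength dependence."""
    adjusted = bonferroni_adjust(p_values)
    return sorted(name for name, p in adjusted.items() if p >= alpha)


# Hypothetical raw p-values comparing each feature between field strengths.
raw_p = {"histogram_mean": 0.0001, "sobel_kurtosis": 0.01, "shape_compactness": 0.5}
stable = stable_features(raw_p)
```

With three tests, a raw p-value of 0.01 becomes 0.03 after adjustment and the feature is still flagged as scanner-dependent, whereas a feature with a raw p-value of 0.5 is retained.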

From the stable radiomics features, Least Absolute Shrinkage and Selection Operator (LASSO) with fivefold stratified cross-validation was computed to select the most relevant radiomics features for classification. Stratified cross-validation accounts for imbalanced outcomes (or class labels), such that all cross-validation folds in the model computation have a similar prevalence of classes. For this study, this strategy maintained the ratio of the number of patients who had a pCR to those who did not. The selected most relevant radiomics features and their coefficients were used to compute a radiomics score [34], which was the weighted sum of features according to their relevance as determined by the LASSO classifier. The capacity of this radiomics score to discriminate pCR patients was investigated with the Wilcoxon test and receiver operating characteristic (ROC) curves. The radiomics score was also computed from the segmentations of all three radiologists to evaluate its inter-rater robustness.
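To make the stratification concrete, the sketch below assigns patients to five folds class by class so that each fold keeps roughly the cohort's ratio of complete to partial responders. It is an illustrative stand-in for the fold generation in the R analysis, not a reproduction of it; the function name `stratified_folds` and the random seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Deal the indices of each class round-robin into folds to preserve class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize which patients land in which fold
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Cohort composition from this study: 19 complete responders (1), 83 partial responders (0).
labels = [1] * 19 + [0] * 83
folds = stratified_folds(labels)
pcr_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each fold then contains 3 or 4 complete responders, so no fold is left without the minority class.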

Finally, a support vector machine (SVM) model with a radial basis function kernel combining the radiomics and anatomical MRI staging criteria was computed. The two models, namely, the radiomics-only and anatomical MRI-radiomics-based models, were then compared using the DeLong test of their ROC curves. The SVM classifier was again trained using cross-validation, and the accuracy was computed by evaluating the classifier model in each fold using data that were never used in the model training.

The radiomics features were extracted using the open-source CERR software (https://github.com/cerr/CERR/wiki/Radiomics). All code, including the R code used in the machine learning analysis, is available through the GitHub repository.

Results

Patient characteristics

Our final study population included 102 patients, 60 (59%) men and 42 (41%) women, with a median age of 61 years (range, 31–91 years). Eighty-three patients (81%) achieved pathologic partial response (pPR) at pathologic examination of surgical specimens and 19 (19%) achieved pCR. The average time between the end of CRT and surgery for this study population (n = 84; the end date of radiation was unavailable for 18 patients) was 91.86 days (interquartile range (IQR), 30.5 days). Among those with a complete response (n = 16), the average time between the end of CRT and surgery was 85.69 days (IQR, 25 days). Among those with a partial response (n = 68), the average time between the end of CRT and surgery was 93.32 days (IQR, 32.5 days). We conducted a t test to compare the means of the complete and partial responders: the p-value was 0.72 (no significant difference between the two groups).

Qualitative MRI evaluation

Anatomical MRI staging criteria were available for all patients. The median tumor length was 4.65 cm (range 2.6–9.4 cm). Most patients had tumor in the mid rectum (47/102, 45.7%), T3 stage, and no extramural vascular invasion (80/102, 78.4%). Table 2 summarizes the anatomical MRI staging criteria.

Table 2 Demographics and tumor characterization

Radiomic classification of pCR

Of 108 radiomics features, 35 were not robust to the magnetic field strength and were removed for the rest of the analysis (See Supplemental Information). Non-robust features mostly comprised those describing global statistics (16/20 features); edge features from Sobel (4/4); and some GLCM, GLSZM, and Gabor edge features (8/25, 3/15, and 4/40, respectively). Of the remaining 73 features, 6 were selected by LASSO: 2 shape features [surface area (coeff = 0.28) and compactness (coeff = −0.02)], 1 GLCM feature [difference variance (coeff = −0.0005)], 1 GLSZM feature [size-zone low-gray level emphasis (coeff = 0.10)], and 2 edge descriptors [standard deviation of Gabor (σ = 2, θ = 30°) (coeff = 0.11) and kurtosis of Gabor (σ = 2√2, θ = 30°) (coeff = 0.05)].
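A weighted-sum score built from these six coefficients can be sketched as below. The coefficients are those reported above; everything else is an assumption for illustration: the dictionary keys are hypothetical names, the intercept defaults to zero because none is reported, and the feature values in the example are placeholders, since the scaling applied to the features is not described here.

```python
# LASSO coefficients as reported in the text; the key names are hypothetical.
LASSO_COEFFICIENTS = {
    "shape_surface_area": 0.28,
    "shape_compactness": -0.02,
    "glcm_difference_variance": -0.0005,
    "glszm_low_gray_level_emphasis": 0.10,
    "gabor_sigma2_theta30_std": 0.11,
    "gabor_sigma2sqrt2_theta30_kurtosis": 0.05,
}


def radiomics_score(features, coefficients=LASSO_COEFFICIENTS, intercept=0.0):
    """Weighted sum of the selected features (intercept is an assumption, not reported)."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


# Placeholder feature vector: every feature set to 1.0, so the score is the coefficient sum.
example_features = {name: 1.0 for name in LASSO_COEFFICIENTS}
example_score = radiomics_score(example_features)
```

With all placeholder features equal to 1, the score is simply the sum of the coefficients, which makes the relative weight of each feature easy to read off: surface area dominates, while GLCM difference variance contributes very little per unit.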

The radiomics score computed as a weighted sum of these 6 radiomics features produced an area under the curve (AUC) of 0.75 (95% CI: 0.63–0.87) for distinguishing patients by pCR. The best cut-off value of 1.39 was associated with a specificity of 0.74 and a sensitivity of 0.70. The Wilcoxon test using the radiomics score showed a significant association with pCR (p = 0.001), where those with pCR had a lower radiomics score (median = 1.30, IQR = 0.35) when compared with patients with partial response (median = 1.51, IQR = 0.44) (See Fig. 3a).
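The sketch below shows one way an AUC and a best cut-off can be derived from per-patient scores when a lower score indicates pCR. It assumes Youden's J (sensitivity + specificity − 1) as the cut-off criterion, which the text does not specify, and it uses toy scores rather than study data; the function names are hypothetical.

```python
def auc_lower_is_positive(pos_scores, neg_scores):
    """Probability that a random positive case scores lower than a random negative case."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


def best_cutoff(pos_scores, neg_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is called positive (pCR) when its score is <= cutoff.
    """
    best = None
    for c in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s <= c for s in pos_scores) / len(pos_scores)
        spec = sum(s > c for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Toy scores (assumption): complete responders tend to score lower.
pcr_scores = [1.0, 1.2, 1.3, 1.6]
ppr_scores = [1.1, 1.4, 1.5, 1.7, 1.8]
auc = auc_lower_is_positive(pcr_scores, ppr_scores)
cutoff, sensitivity, specificity = best_cutoff(pcr_scores, ppr_scores)
```

The pairwise formulation of the AUC is equivalent to the Wilcoxon rank-sum statistic divided by the number of case pairs, which is why the ROC analysis and the Wilcoxon test tell a consistent story about the score.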

Fig. 3 Receiver operating characteristic curves showing the specificity and sensitivity of the radiomics score to identify pathologic complete response (pCR) patients (on the left) and boxplots showing the distribution of the radiomics score for pathologic partial response and pCR patients using the best cut-off value (on the right). a Results based on manual segmentations of one radiologist for the entire cohort (n = 102). b, c Results based on manual segmentations of two radiologists for a subset of patients (n = 66) performed independently of each other

Robustness of radiomics features from different segmentations for classifying pCR

Two independently computed radiomics classifiers using the features extracted from the segmentations from the two radiologists classified pCR with an AUC of 0.75 (95% CI 0.59–0.91) and 0.71 (95% CI 0.52–0.90), respectively (See Fig. 3b, c). The radiomics score cut-off value identified on the full training set was also applied to split the patients into two groups, i.e., pCR vs. not complete response (nCR), using the radiomics scores computed from features using segmentations from the two radiologists. These splits were also associated with pCR as determined using the Wilcoxon test (R1 p = 0.01, R2 p = 0.03) and resulted in a specificity of 0.73 and 0.82 and sensitivity of 0.69 and 0.64, respectively.

Support vector machine classifier model for pCR prediction

Using only the anatomical MRI staging criteria, the most important clinical features selected by the SVM model, in order, were the tumor length, the depth of extramural infiltration, a mid-rectum location, and presence/absence of extramural vascular invasion (Fig. 4a). This anatomical MRI-only model achieved an accuracy of 67% (sensitivity: 42%, specificity: 72%). The same model built in combination with the radiomics score showed an increase in accuracy to 74% (sensitivity: 58%, specificity: 77%), and the most important features were the radiomics score followed by the same anatomical MRI staging criteria from the other SVM model (Fig. 4b).
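The relationship between accuracy, sensitivity, and specificity in an imbalanced cohort can be checked with a short calculation. The confusion-matrix counts below are not reported in the study; they are back-calculated assumptions chosen to be consistent with the reported percentages, taking pCR as the positive class (19 pCR, 83 pPR) and assuming predictions pooled across cross-validation folds.

```python
def metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts consistent with the reported percentages (not study data).
anatomical_only = metrics(tp=8, fn=11, tn=60, fp=23)
combined = metrics(tp=11, fn=8, tn=64, fp=19)

anatomical_pct = [round(100 * value) for value in anatomical_only]
combined_pct = [round(100 * value) for value in combined]
```

Under these assumed counts, accuracy is dominated by the 83 partial responders, so a seven-point gain in accuracy corresponds to only a few additional complete responders being correctly identified.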

Fig. 4 Importance of the selected features in the support vector machine models using a anatomical MRI staging criteria and b anatomical MRI staging criteria and the T2-radiomics score in combination

Discussion

To allow non-operative management following neoadjuvant chemotherapy, the accurate prediction of complete response is essential but remains a challenge. In our study, we were able to compute a radiomics score using T2-weighted images from baseline rectal MRI scans that distinguished patients with pCR from patients with pPR after neoadjuvant chemotherapy (AUC = 0.75). The predictive value of this score was robust, i.e., not affected when computed using the segmentations of three different radiologists. When the radiomics score was used in combination with anatomical MRI staging criteria, the prediction of pCR was improved (accuracy of 74% for the combined model vs. 67% for the clinical-only model). To our knowledge, this is the first study using a combined approach including radiomics and anatomical MRI staging criteria to predict complete response after neoadjuvant therapy in rectal cancer patients.

A prior study using CT scans showed that texture analysis can predict five-year overall survival in colorectal cancer patients [35]. However, rectal MRI is the gold standard imaging technique for preoperative staging and has been proven to allow preoperative stratification of patients and permit better targeted therapy [36]. In a recent study, De Cecco et al. demonstrated the potential of T2-based texture parameters to act as predictive tumoral markers of response to neoadjuvant CRT in rectal cancer; however, this study was limited by a small number of patients [21]. In another recent study, Nie et al. assessed the relationship between pre-treatment MRI values and pCR to build predictive models, showing encouraging results [24]. In our study, the computed radiomics score (AUC 0.75) is in line with those of other recently published baseline radiomic studies in rectal cancer; for example, Vandendorpe et al. reported an AUC of 0.70 in predicting downstaging of locally advanced rectal cancer from neoadjuvant CRT using baseline CT images [37].

Our approach combined both T2-radiomics and anatomical MRI staging criteria in one model. Using this combination, compared with using an anatomical MRI-only or radiomics-only model, we were able to improve the accuracy of predicting tumor response and of stratifying patients at baseline into complete responders or partial responders. Our results are in line with the current movement towards individualized treatment in oncology in general and locally advanced rectal cancer specifically. In addition, our results are in line with the need to reliably assess response to treatment earlier. The development of tumor markers based on images such as anatomical and functional MRI may offer additional guidance, and the applied field of radiomics such as that used in our study warrants further investigation. Important considerations when performing radiomics analysis are the reproducibility of the extracted features and accounting for the sources of variability starting from the volumetric segmentation of the tumor [38]. To evaluate reproducibility in our study, we used segmentations from three different radiologists with different skill levels and found no difference in the predictive value of the radiomics score. While we were not able to split the data into a training and testing set because of the imbalanced dataset (only 19 complete responders), we were able to validate the robustness of the radiomics signature to segmentation by using the segmentations from two other radiologists.

The present study suggests that combined visual and quantitative characterization of rectal tumors may enhance the discrimination between complete responders and partial responders to neoadjuvant treatment at the time of diagnosis. Further studies and correlation with pathologic material will be important to better understand the potential interplay between the radiomics features combined in the radiomics score and tumor stromal elements. We postulate that tumor biology and stromal elements are different between complete responders and partial responders, and radiomics assessment may provide valuable information in the form of a virtual biopsy to aid current diagnostic methods. Knowledge of the predicted response after CRT using radiomic features may influence the treatment decisions of the multidisciplinary team, thereby allowing a more personalized treatment and improving the cost-effectiveness of rectal cancer treatment.

Our results should be interpreted in light of the limitations associated with our study design. First, the number of patients was limited due to the retrospective nature of this study. Second, we did not include patients on nonsurgical management, which may have introduced selection bias into the study. However, this decision was made so as to have a histopathologic reference standard for all patients as well as to achieve a homogeneous dataset to develop a radiomics algorithm. Third, we were limited to an imbalanced dataset between patients with partial response (n = 83) and patients with complete response (n = 19). However, we note that the percentage of patients with complete response in our cohort is in the range of what is described in the literature [1]. Therefore, the limitation is not that our data are imbalanced compared with what we see in clinical practice and what is described in the literature, but rather that the imbalance is a statistical limitation. Fourth, we did not separate our analyses based on drugs or doses received. Our study was designed as a pilot study, mainly to see if complete responders have radiomics features that may be identified at baseline. For partial responders, further studies are needed to investigate different drugs and doses on a larger dataset. Fifth, the anatomical MRI staging criteria were based on available clinical reports, which were made in consensus; we did not evaluate the inter-reader agreement of the anatomical MRI staging criteria assessment since it was not an aim of our study. Sixth, radiomics was performed on only T2-weighted images to minimize variability in acquisition parameters. Further studies exploring the performance of other imaging sequences may improve radiomics analysis. Seventh, manual segmentation of the volume of interest is a time-consuming process; hence, there is a great need to develop easy-to-use automated or semi-automated tools to encourage real-world application of radiomics in daily clinical practice. Finally, our study did not have external validation. We performed internal validation using segmentations from three different radiologists. Hence, further studies are needed to resolve these limitations and validate our data to provide generalization of the results.

In conclusion, our preliminary results suggest that radiomics features derived from T2-weighted MR images of rectal cancer have the potential to serve as imaging markers of tumoral response to neoadjuvant chemotherapy at the time of initial diagnosis of rectal cancer and, when combined with standard anatomical MRI staging criteria, to improve accuracy. Further studies with larger sample sizes are warranted for additional evaluation and validation of our preliminary data.