Introduction

In patients with rectal cancer, accurate assessment of response to chemoradiotherapy has increased in importance as the variety of treatment options has grown and a controversial non-operative approach may even be considered for some patients in whom complete tumour regression is believed to have occurred [1]. In addition, tumour shrinkage after neoadjuvant therapy may allow for a less invasive organ-sparing surgical approach and a higher rate of sphincter-preservation. However, while routine magnetic resonance imaging (MRI) has a well established role in pre-therapeutic staging of patients with rectal cancer [2], its role in response assessment after chemoradiotherapy is less well established, largely because of the difficulty encountered in distinguishing between therapy-induced scarring and residual viable tumour on T2-weighted sequences [3, 4].

To overcome these limitations [5, 6], other MRI approaches have been proposed in recent years, including the assessment of changes in T2 signal intensity [7] or of changes in tumour volume, and the non-invasive interrogation of biological tumour properties using diffusion-weighted (DW)-MRI [8–17] and dynamic contrast-enhanced (DCE)-MRI [18–24]. In addition, some researchers have attempted volumetric rather than single region-of-interest-based tumour assessment on T2-weighted or DW-MRI sequences, and reported promising results [11, 25–27].

While investigators have studied many different metrics of interest to rapidly move the field along and inform the debate on the utility of post-treatment MRI, the various metrics have not been applied in parallel in a single, homogeneous cohort of patients so that their relative value could be directly compared. As such, it is still unclear which of these metrics performs best in assessing response after neoadjuvant treatment. The purpose of this study was therefore to determine which of numerous morphological, volumetric and biological MRI metrics performs best in a head-to-head comparison using a cohort of patients with locally advanced rectal cancer from a single institution who were managed similarly with respect to neoadjuvant chemoradiation dose and imaging schedule.

Materials and methods

Patient selection

The institutional review board approved this retrospective, Health Insurance Portability and Accountability Act (HIPAA)-compliant study and issued a waiver of informed consent. Our institution’s Picture Archiving and Communication Systems (PACS) and electronic medical records were searched for the years 2007 to 2013 to identify patients who (1) had biopsy-proven primary rectal adenocarcinoma; (2) underwent long-course neoadjuvant chemoradiotherapy; (3) underwent MRI examinations before the start and after the completion of neoadjuvant treatment featuring DW and/or DCE sequences; and (4) had surgical tumour resection performed at our institution.

This initial search for patients with at least a pre-therapeutic MRI and primary rectal adenocarcinoma yielded a total of 812 patients (1/1/2007 to 1/27/2013). Because post-operative re-staging was not performed uniformly or regularly at our institution until 2010, and because of the increasing use of induction chemotherapy or definitive chemotherapy at our institution, most patients did not meet the inclusion criteria above, resulting in a total of 26 homogeneous patients with an available post-therapeutic MRI examination. Of these, one patient had to be excluded because a rectal stent placed before the post-treatment MRI interfered with imaging, and another patient was excluded due to partial excision of the tumour before MRI. The final cohort therefore consisted of 24 patients (see Fig. 1). Three patients were excluded from analysis of DW imaging, because no DW-MRI sequence was available (n=2) or because of imaging artefacts (n=1). The median interval between the two MRI examinations was 96 (range: 71–131) days, and the median interval between the second MRI and surgery was 20 (range: 3–54) days. All patients underwent standard long-course chemoradiation with 50–50.4 Gy in 25–28 fractions and application of 5-fluorouracil between the first and the second MRI, according to National Comprehensive Cancer Network (NCCN) guidelines [28].

Fig. 1

Flow chart detailing the patient selection process. The final cohort consisted of 24 patients

Magnetic resonance imaging (MRI) and image analysis

MRI examinations were performed on different MRI scanners manufactured by GE Healthcare (Waukesha, WI, USA) at a field strength of 1.5 Tesla (n=40) or 3 Tesla (n=8) using a standardized MRI protocol that included standard high-resolution T2-weighted imaging in axial, sagittal, coronal and oblique orientation (TR: 4,400–5,000 ms; TE: 90–110 ms; echo train length: 12–24; slice thickness: 3–4 mm; interslice gap: 1 mm; FOV: 20 cm; matrix: 320 x 160; NEX: 2), an axial DW sequence (single-shot spin-echo EPI sequence, b-values: 0 and 750–1,000 s/mm²; TR: 1,800–5,550 ms; TE: 60–112 ms; slice thickness: 3–5 mm; interslice gap: 1 mm; FOV: 18–40 cm; matrix: 96–256 x 96–128; NEX: 3–6; mean acquisition time: 2.4 min) and a sagittal DCE-MRI sequence (TR: 3.1–7.9 ms; TE: 0.9–4.2 ms; slice thickness: 4–10 mm; no interslice gap; FOV: 20–34 cm; matrix: 256–320 x 128–192; mean temporal resolution: 8.3 (5–11.5) s; 30–40 phases; mean acquisition time: 5.2 min). For all patients, a bolus of Gd-DTPA (Magnevist, Bayer Schering, Berlin-Wedding, Germany) at a constant dose of 0.1 mmol/kg was power injected at a rate of 2 ml/s, followed by a saline flush.

Two readers with experience reading rectal MRI (Reader 1: AMH with 5 years of experience; Reader 2: MJG with 10 years of experience) independently assessed each pre- and post-treatment MRI, blinded to clinical and histopathological information. They each assessed the following metrics, which were chosen either because they were already frequently used for tumour assessment or because they had been reported to provide significant value in response assessment on rectal MRI after therapy (see Fig. 2 a–d for examples):

Fig. 2

(a) T2-weighted image (sagittal): Measurement of longest tumour diameter. (b) T2-weighted image (axial-oblique): Measurement of product of largest bidirectional measurements (excluding the lumen) and T2 volumetry (region of interest (ROI) was drawn on every slice with visible tumour). (c) Diffusion-weighted image (axial, b=1,000 s/mm²): tumour volumetry on diffusion-weighted imaging (DWI; ROI was drawn on every slice with visible tumour) and calculation of apparent diffusion coefficient (ADC). (d) T1-weighted dynamic contrast-enhanced (DCE) image (sagittal): tumour volumetry on DCE. Ktrans was derived from an ROI in maximal tumour extent

  1. Longest diameter of the tumour on any plane (measured on T2-weighted sequences, currently part of many response assessment approaches for other tumours, including RECIST [29]).

  2. Product of largest bidirectional tumour measurements (largest diameter and perpendicular tumour diameter) of the tumour on the axial oblique plane, excluding the lumen (measured on T2-weighted sequences, a measurement similar to that proposed by the World Health Organization [30]).

  3. Relative T2 ratio (SI_rel = SI_tumour / SI_muscle) based on a single region of interest (ROI) in a representative region of the tumour and an ROI in the centre of the obturator internus muscle, carefully avoiding any intramuscular fat (a metric introduced by Kluza et al. [7]).

  4. Tumour volume on T2-weighted high-resolution sequences, with plane selection being determined based on best visibility of the tumour (e.g. coronal for low rectal cancers) and kept constant for pre- and post-therapeutic measurements [26].

  5. Tumour volume on DW imaging (measured on the DW sequence) and voxel-wise calculation of apparent diffusion coefficient (ADC) values using a monoexponential model, based on the volumetric assessment [9, 25, 31].

  6. Tumour volume on DCE sequences and the transfer constant Ktrans of the generalized Tofts model [32], measured by drawing a region of interest on a slice in maximal tumour extent, including all visible tumour on this slice [23]. A population-derived arterial input function and T1 reference times (1,317 ms at 1.5 T and 1,597 ms at 3 T) were used.

  7. MR-based tumour regression grade (mrTRG), see Patel et al. [5, 6] for examples.

All morphological tumour assessments (longest tumour diameter, product of bidirectional tumour measurements, T2 ratio) were performed with commercial PACS software (Centricity, GE Healthcare, Waukesha, WI, USA). For volumetric tumour assessments (T2-weighted, DW- or DCE-MRI sequences), each radiologist drew an ROI encircling the entire tumour on every slice where it appeared, using ImageJ (version 1.47m, National Institutes of Health, Bethesda, MD, USA [33]). The data from these ROIs were then analyzed with in-house software written in Matlab (Matlab R2014b, The MathWorks, Inc, Natick, MA, USA) to calculate the tumour volume, the apparent diffusion coefficient (mean value for the whole tumour volume) and the transfer constant Ktrans, respectively.
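
The in-house Matlab software is not described further, so the following Python sketch only illustrates how quantities of this kind are commonly derived from slice-wise ROIs. All function names are hypothetical. It assumes that volume is the summed ROI voxel count multiplied by in-plane pixel area and slice spacing (slice thickness plus interslice gap), and that ADC follows the two-point monoexponential model S(b) = S0 · exp(−b · ADC).

```python
import math


def roi_volume_mm3(voxel_counts_per_slice, pixel_area_mm2, slice_spacing_mm):
    """Tumour volume from ROIs drawn on every slice with visible tumour.

    Assumption: slice_spacing_mm = slice thickness + interslice gap.
    """
    return sum(voxel_counts_per_slice) * pixel_area_mm2 * slice_spacing_mm


def adc_two_point(s_low, s_high, b_low, b_high):
    """Monoexponential ADC (mm^2/s) from signals at two b-values (s/mm^2)."""
    return math.log(s_low / s_high) / (b_high - b_low)


def mean_tumour_adc(signals_low, signals_high, b_low=0.0, b_high=1000.0):
    """Voxel-wise ADC averaged over the whole tumour volume.

    Voxels with non-positive signal are skipped (an assumption of this sketch).
    """
    values = [
        adc_two_point(lo, hi, b_low, b_high)
        for lo, hi in zip(signals_low, signals_high)
        if lo > 0 and hi > 0
    ]
    if not values:
        raise ValueError("no voxels with positive signal")
    return sum(values) / len(values)


def relative_t2_ratio(si_tumour, si_muscle):
    """Relative T2 signal intensity: tumour ROI over muscle ROI."""
    return si_tumour / si_muscle


def percent_change(pre, post):
    """Percentage change of a parameter from pre- to post-treatment MRI."""
    return 100.0 * (post - pre) / pre
```

For example, three slices with 10, 20 and 10 ROI voxels, a pixel area of 1 mm² and a slice spacing of 4 mm would give `roi_volume_mm3([10, 20, 10], 1.0, 4.0)`, i.e. 160 mm³. These numbers are illustrative only and are not taken from the study data.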

Histopathological workup of the specimens

Each patient underwent surgical resection of the tumour by our colorectal surgeons, each with colorectal cancer surgery specialty training and certified in the performance of total mesorectal excision (TME). Either a low anterior resection (LAR) or abdominoperineal resection (APR) was performed and the specimen was examined with histopathology according to institutional standards. Apart from the routine histopathological assessment including the assessment for surgical margins and other histological features (tumour type and differentiation, involvement of the perineural (PN) or lymphovascular (PV) space by tumour, T stage and N stage; AJCC 7th edition) the percent tumour response was estimated based on the amount of fibrosis and inflammatory tissue versus the amount of residual viable carcinoma in the lesion [34, 35].

Statistical analysis

Summary statistics were calculated for pre- and post-treatment MRI parameters, including percent change. Wilcoxon signed rank tests were used to test whether the median difference for each parameter was significantly different from zero. To analyze agreement between the two readers, we used the concordance correlation coefficient [36], including 95 % confidence intervals (CIs). This coefficient combines measures of both precision and accuracy to determine how far the observed data deviate from the line of perfect concordance (i.e. the line at 45° on a square scatterplot). The closer the coefficient is to 1, the better the agreement between the two readers for that imaging parameter. Cohen’s kappa for pairwise agreement analysis was used for categorical parameters, including 95 % CIs. Spearman’s rank correlation was used to analyze the correlation of the imaging parameters with percent tumour response, including 95 % CIs. A Bland-Altman assessment for agreement was used to compare the two readers using the absolute difference as a percentage of the average of the two readers, including 95 % limits of agreement. The type I error rate was set to 5 % (two-sided). R version 3.1.1 was used for all analyses, including the epiR, psych and mada packages.
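
The analyses themselves were run in R. Purely as an illustration of the agreement and correlation measures named above, the following Python sketch computes Lin's concordance correlation coefficient, Bland-Altman percent differences with 95 % limits of agreement, and Spearman's rho with an approximate confidence interval. Function names are hypothetical. The choices of 1/n moment estimates for the concordance coefficient, signed percent differences, and a Fisher z-transformation for the interval are assumptions of this sketch and are not stated in the text.

```python
import math
from statistics import mean, stdev


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient (1/n moment estimates)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both scatter (precision) and shift from the 45-degree line (accuracy).
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman_percent(x, y):
    """Bias and 95 % limits of agreement of reader differences, as % of the pair mean."""
    diffs = [100.0 * (a - b) / ((a + b) / 2) for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def fisher_ci(rho, n, z_crit=1.96):
    """Approximate 95 % CI for a correlation via the Fisher z-transformation."""
    z = math.atanh(rho)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

With rho = −0.574 and n = 24, `fisher_ci` gives an interval of roughly −0.79 to −0.22. This is close to the interval reported for reader 1 for post-treatment DWI volumetry, although the text does not say how its intervals were computed.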

Results

Patients

The 24 patients included in the study had a mean age of 57 (range: 37–86) years; 19 of them were male and five were female. The pre-treatment clinical stage was cT2 in three patients (all of them also cN+), cT3 in 17 patients and cT4 in four patients. Histopathological tumour stage after resection was pT0 in four (percentage tumour regression %TR: 100 %), pT1 in two (%TR: 80–90 %), pT2 in seven (all N0, %TR: 30–95 %), pT3N0 in four (%TR: 60–95 %) and pT3N1 in seven (%TR: 25–99 %) patients.

Median values for each MRI parameter on pre- and post-treatment MRI are given in Table 1, as are the median percentage changes in the parameters from pre- to post-treatment MRI. The data indicate that the median difference between pre- and post-treatment parameters is significantly different from zero, except for ADC and Ktrans.

Table 1 Median (range) for pre- and post-treatment MRI parameters and percentage change in MRI parameters through therapy (including p-values for the change of MRI parameters), along with results of MRI-based treatment response grading (mrTRG)

Inter-reader agreement

Inter-reader agreement was found to differ greatly among the MRI parameters measured, with agreement on pre-treatment values generally being superior to agreement on post-treatment values (Table 2, Fig. 3). In the context of response assessment after therapy, the best agreement was found for ADC (CCC: 0.766, 95 % CI 0.510–0.898), DCE volumetry (CCC: 0.700, 95 % CI 0.527–0.817) and T2 volumetry (CCC: 0.609, 95 % CI 0.401–0.758), when assessing absolute post-treatment values. Bland-Altman plots on the percent change scale indicate that the bias is minimal, and the limits of agreement are sufficiently close for us to conclude there is good agreement (Fig. 3).

Table 2 Inter-reader agreement (Concordance Correlation Coefficient) and 95 % confidence intervals for all MRI parameters (pre- and post-treatment). Agreement on post-therapeutic MRI-based tumour regression grade (mrTRG) was calculated using Cohen’s kappa for pairwise analysis

Fig. 3

Bland-Altman plots for the measured MRI parameters at pre- and post-therapeutic examinations that were significantly associated with tumour response (units: mm³). The Bland-Altman plot shows the percent difference of the measurement between the two readers versus the average of the two readers’ measurements. The middle dotted line is drawn at the mean difference, and the upper and lower lines are the 95 % limits of agreement (calculated using the standard deviation)

Correlation between MRI parameters and histopathological tumour regression

The only two parameters that were significantly associated with histopathological tumour regression for both readers were post-treatment values for DW imaging (DWI) volumetry (rho = −0.574, 95 % CI −0.794 to −0.222 for reader 1; rho = −0.453, 95 % CI −0.724 to −0.061 for reader 2) and post-treatment DCE volumetry (rho = −0.547, 95 % CI −0.779 to −0.184 for reader 1; rho = −0.522, 95 % CI −0.764 to −0.150 for reader 2).

For reader 1 alone, both the percentage change in T2 volume (rho = −0.636, 95 % CI −0.827 to −0.313) and post-treatment absolute T2 volume (rho = −0.565, 95 % CI −0.789 to −0.209) were significantly associated with histopathological tumour regression. For this reader, percentage changes in DWI and DCE-MRI tumour volumes (rho = −0.579, 95 % CI −0.796 to −0.229 and rho = −0.745, 95 % CI −0.883 to −0.488, respectively) and morphological assessment of therapeutic response (mrTRG; rho = −0.751, 95 % CI −0.886 to −0.499) also showed highly significant associations with histopathological tumour regression.

None of the other ‘morphological’ parameters (longest tumour diameter, product of bidirectional tumour measurements, T2 ratio) or functional parameters (ADC, Ktrans) were found to be associated with tumour regression for either reader (Table 3).

Table 3 Spearman rank correlations (including 95 % confidence intervals (CIs)) of percentage change in MRI parameters and of absolute post-treatment values of MRI parameters with percentage tumour response on histopathology

Discussion

In our study, volumetric tumour measurements on post-therapeutic DW-MRI and DCE-MRI sequences were the only metrics significantly associated with percentage tumour regression on histopathology for both readers, and inter-reader agreement was higher for DCE-MRI volumetry than for DW-MRI volumetry.

Neoadjuvant chemoradiation is known to cause tumour shrinkage and to lower the rate of local recurrence in patients with advanced rectal cancer [37]. However, the exact degree of tumour regression induced varies widely among individual patients, ranging from complete pathological response (pCR) to (rarely) only minor tumour shrinkage. Accurate assessment of therapeutic response would be of great clinical value, as it could help in selecting the best therapeutic approach (potentially a less-invasive organ-sparing surgical approach) and in determining the feasibility of non-operative management, which is generally reserved for those with a complete clinical response to neoadjuvant therapy [1]. Response quantitation remains difficult on routine T2-weighted rectal MRI, in part due to scarring, which may contain tumour [3, 4]. In our study, T2 volumetry was useful, whereas uni- and bi-dimensional measurements were not. This makes intuitive sense, since measuring anything less than the entire tumour volume presupposes spherical tumour geometry that can be approximated closely by using known equations. Dimension-based response assessment seems to work fairly well for solid organ lesions with a roughly spherical growth pattern, and thus RECIST and WHO methods have predominated. However, in the luminal GI tract, it is much more difficult to derive an accurate estimation of initial tumour burden and treatment response at imaging.

The success of DWI and DCE volumetry in predicting tumour regression in our study can be interpreted to mean that in contrast to dimension-based measurements from T2-weighted imaging, and even T2 volumetry, measurements of the volumes of protons with restricted mobility (DWI) and of tissue with abnormally permeable microvasculature (DCE) more accurately approximate true tumour volume because of the biological differences between tumours and normal cellular systems.

Our finding that DWI volumetry after neoadjuvant treatment correlates with histopathological tumour regression aligns well with prior investigations, in particular with those of Ha et al. [11] and Curvo-Semedo et al. [25], who also found that DWI volumetry performed better in the detection of pCR than did volumetry on T2-weighted sequences or ADC values. These results were recently verified in a bi-institutional study [38]. To our knowledge, the use of DCE-MRI tumour volumetry for assessment of rectal cancer response to chemoradiation has not previously been studied, and represents new information regarding rectal cancer response assessment. Since DCE-MRI volumetry performed well in this study and demonstrated reproducibility across both readers, the authors believe this parameter deserves to be included in future investigations and validated in a prospective trial.

As noted above, the parameters ADC and Ktrans, which can be calculated from the volumetric measurements on DW and DCE sequences, respectively, did not perform well in our study. Prior results achieved with these parameters have been heterogeneous and somewhat contradictory [39]. Some authors have reported correlations between low pre-treatment ADC values and/or the change of ADC values through treatment and good histopathological response [8, 15, 31], while others have reported associations between low pre-treatment ADC values and more aggressive tumours [9, 14] or could not find any associations between ADC values and tumour response or pCR status at all [25]. Similarly, some authors have reported an association between post-treatment values of Ktrans and pCR status [23], while others were unable to identify the same association [24]. The reasons for these contradictory results may lie in differences in methodological aspects of the studies, such as the selection of different b-values and models to calculate ADC from DW-MRI and the use of different temporal resolutions, different pharmacokinetic models and different methods for calculating the arterial input function (based on individual measurements or a population-based prediction) for DCE-MRI.

The fact that volumetric tumour assessment on T2-weighted imaging and mrTRG assessment correlated well with tumour regression for one reader but not the other highlights the possibility for discrepancies in tumour assessment between readers – even amongst experienced readers as in this study. The reproducibility of measured MRI parameters is of great importance, in particular, if these parameters are to be used in a clinical setting, for example, to distinguish patients with histopathological complete response from those without. Therefore, our results should be regarded as a preliminary assessment to identify a group of parameters for which a more detailed investigation, possibly in a prospective trial with multiple readers of differing levels of experience, is warranted. The addition of DCE-MRI volumetry to previously investigated parameters might be of particular value, as this new parameter not only correlated well with tumour regression, but also demonstrated superior inter-reader agreement. The issue of reproducibility extends to the different histopathological grading systems that are in use – in particular, since recent investigations have questioned both their reproducibility and their clinical use [40, 41]. The grading system used in this study [34] allows for the pathologist to specify a percentage tumour regression, which resulted in a higher inter-reader reproducibility [35] compared to other (e.g. Dworak or Mandard) scoring systems. However, the known difficulties in the histopathological tumour assessment after chemoradiation warrant further investigations into the reproducibility of this parameter.

Our study has limitations. First, we applied strict inclusion criteria to assemble a patient cohort as homogeneous as possible. However, the fact that only 24 patients were included in this study limits the generalizability of our results. In addition, the total of only four patients with complete pathological response was too small to allow for more detailed statistical sub-analysis of results for this clinically important group. Second, due to our retrospective study design, minor deviations in MRI protocols (e.g. b-values in DW imaging) could not be accounted for and might have influenced our results. Finally, though we found significant associations between post-therapeutic tumour volume on DCE-MRI and regression on histopathology, and good inter-reader agreement for this parameter, a cut-off value is needed before this parameter can be used in clinical practice (e.g. to identify patients with pCR). However, the establishment of such a threshold requires a trial with a higher number of patients and was not possible in our study.

In conclusion, our study demonstrated that DCE-MRI tumour volumetry on post-treatment examinations correlated well with histopathological tumour regression and showed better inter-observer agreement than tumour volumetry on DW imaging. Therefore, this parameter should be incorporated into the assessment of tumour response to neoadjuvant treatment of rectal cancer in future prospective trials.