Introduction

Magnetic resonance enterography (MRE) plays an important role in the assessment of inflammatory intestinal lesions in patients with Crohn’s disease (CD). Current clinical guidelines and expert consensus agree that MRE can be used as an alternative to ileocolonoscopy to evaluate disease extent and activity [1, 2]. MRE is also being increasingly used to measure therapeutic response in patients with CD [3, 4], providing crucial information to guide clinical decisions.

The management of CD is evolving toward targeting objective measures of inflammation as well as symptoms. Healing of ulcers identified at endoscopy is associated with improved long-term clinical outcomes, including reductions in surgery, hospitalizations, and clinical relapse [3, 5]. However, assessing the response to treatment by endoscopy is limited by the invasiveness of the technique, its inability to assess transmural involvement and complications (fistula/abscess), and significant rates of incomplete ileal examinations [6]. MRE’s noninvasiveness and its ability to assess transmural involvement make it an attractive alternative to endoscopy for assessing the therapeutic response, as has been demonstrated in a few clinical trials [7].

Current MRE protocols to evaluate luminal CD include intravenous contrast material, but recent confirmation of gadolinium deposits in brain tissue has raised concerns about its repeated use, especially in younger patients [8].

Among the various grading systems to measure the efficacy of treatment for CD by MRE, the best characterized is the Magnetic Resonance Index of Activity (MaRIA) [4, 9, 10, 11, 12]. Calculating the MaRIA is time-consuming and requires data from gadolinium-enhanced sequences. To overcome these drawbacks, a simplified version of MaRIA (sMaRIA) that could potentially be calculated without gadolinium-enhanced sequences was developed [13]. Unlike the original MaRIA score, the sMaRIA does not include gadolinium enhancement among its components for assessing CD activity. Nevertheless, most MRE protocols include contrast-enhanced sequences because of their high spatial resolution and the morphological detail of the bowel and perienteric tissue that they may provide. Avoiding contrast-enhanced sequences could therefore compromise the accuracy of MRE in detecting inflammatory lesions and their changes after medical treatment. The current study aimed to compare the accuracy of the sMaRIA with and without intravenous contrast material in assessing the response to biologic treatment for CD.

Material and methods

This is a single-center post hoc analysis of a prospective observational study. The local ethics committee approved the study (HCB/2012/7956 and amendment HCB/2016/0290), and all patients gave written informed consent before being enrolled in the study. Data from part of the cohort were published elsewhere with different outcome assessments [14, 15].

Study patients

Patients with CD who were candidates for treatment with biologic drugs based on the severity of lesions on ileocolonoscopy and/or MRE were invited to participate in a prospective observational study to evaluate clinical outcomes. The current study included the subset of adult patients who underwent MRE and ileocolonoscopy within 1 month before starting a biologic drug and repeated both examinations after completing 46 weeks of treatment, between March 2013 and November 2019.

Study procedures

MRE studies were acquired on one of two scanners (1.5 T Aera or 3 T Vida, Siemens AG). Each patient underwent both MRE examinations on the same scanner. Supplementary Table 1 summarizes the MRE protocol in detail.

To assess CD activity by MRE, we used the sMaRIA. Supplementary Table 2 and Appendix 1 outline the components of the sMaRIA and its calculation.
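The calculation can be illustrated with a minimal sketch. It assumes the weights from the published derivation of the index [13] (1 point each for wall thickness > 3 mm, mural edema, and fat stranding, and 2 points for ulcers) and assumes that the global score is the sum of the segmental scores. The class and function names are hypothetical, and Appendix 1 remains the authoritative definition.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SegmentFindings:
    """MRE descriptors for one intestinal segment (hypothetical container)."""
    wall_thickness_mm: float
    mural_edema: bool
    fat_stranding: bool
    ulcers: bool


def segment_smaria(f: SegmentFindings) -> int:
    # Assumed weights: thickness > 3 mm, edema, and fat stranding score 1 point
    # each; ulcers score 2 points. Check against Appendix 1 before any real use.
    return (
        int(f.wall_thickness_mm > 3)
        + int(f.mural_edema)
        + int(f.fat_stranding)
        + 2 * int(f.ulcers)
    )


def global_smaria(segments: Iterable[SegmentFindings]) -> int:
    # Assumption: the global score is the sum of the segmental scores.
    return sum(segment_smaria(s) for s in segments)


# Illustrative segment: thickened wall with edema, no fat stranding or ulcers.
example = SegmentFindings(wall_thickness_mm=5.0, mural_edema=True,
                          fat_stranding=False, ulcers=False)
print(segment_smaria(example))
```

Because none of the four descriptors depends on gadolinium enhancement, the same function applies to both reading rounds; only the images available to the reader differ.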

The reference standard for the assessment of disease activity was ileocolonoscopy. Two endoscopists (ER, IO) specialized in inflammatory bowel disease (more than 10 years’ experience) classified the segmental and global disease activity according to the Simple Endoscopic Score for Crohn’s Disease (SES-CD) [16] and qualitatively assessed each intestinal segment for the presence of ulceration (superficial or deep ulcers measuring > 5 mm). To compare ileocolonoscopy and MRE findings in each segment, inflammatory lesions were assessed in six segments: the terminal ileum and five colorectal segments (ascending colon, transverse colon, descending colon, sigmoid colon, and rectum).

MRE interpretation

Two radiologists (N.C. and J.R. with 4 and 14 years’ experience in reading MRE, respectively) blinded to clinical and ileocolonoscopy findings independently interpreted MRE images for each intestinal segment. To establish disease extent on MRE, we used the same division into six segments as in ileocolonoscopy. Disagreements between readers for each sMaRIA descriptor were resolved by an adjudicator (S.R. with 13 years’ experience in reading MRE).

Studies were interpreted in two rounds, with an interval of 1 month between readings. In the first round, readers classified the sMaRIA descriptors only on unenhanced T2-weighted MRE images with and without fat-saturation (hereafter referred to as T2-sMaRIA). In the second round, readers classified the sMaRIA descriptors using the full set of images (i.e., unenhanced T2-weighted and gadolinium-enhanced T1-weighted images) (hereafter referred to as CE-sMaRIA).

MRE-based endpoints and definitions

We evaluated the accuracy of T2-sMaRIA and of CE-sMaRIA for classifying the response to biologic treatment in each intestinal segment and per patient. In the per-segment analysis, we evaluated the ability of T2-sMaRIA and CE-sMaRIA scores < 2 to identify segments with ulcer healing defined by endoscopy [13]. In the per-patient analysis, the primary endpoint was the ability of T2-sMaRIA and of CE-sMaRIA using stringent criteria (scores < 2 in all segments) to identify patients with endoscopic ulcer healing, defined as the absence of ulceration in all segments [13]. Additionally, we analyzed the accuracy of a decrease in global T2-sMaRIA and CE-sMaRIA scores ≥ 50%, the predefined measure of radiological improvement, against the reference-standard definition of endoscopic response, defined as a decrease in global SES-CD ≥ 50% [17]. Finally, we analyzed the accuracy of the two MRE approaches in identifying endoscopic remission, defined as a global SES-CD < 4 and a reduction > 2 points from the baseline global SES-CD; this endpoint was not predefined in the original study.
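The endpoint definitions above can be written as simple predicates. This is a minimal sketch with hypothetical function names; treating a baseline score of 0 as "no improvement" is an assumption made here to avoid division by zero, not a rule stated in the study.

```python
from typing import Sequence


def mre_ulcer_healing(segment_scores: Sequence[int]) -> bool:
    """Stringent per-patient criterion: sMaRIA < 2 in every evaluated segment."""
    return all(score < 2 for score in segment_scores)


def relative_decrease(baseline: float, follow_up: float) -> float:
    """Fractional decrease from baseline (0.5 means a 50% decrease)."""
    if baseline == 0:
        # Assumption: with nothing to improve from, report no decrease.
        return 0.0
    return (baseline - follow_up) / baseline


def improvement(baseline: float, follow_up: float) -> bool:
    """Decrease >= 50% in a global score (sMaRIA or SES-CD)."""
    return relative_decrease(baseline, follow_up) >= 0.5


def endoscopic_remission(baseline_ses_cd: int, follow_up_ses_cd: int) -> bool:
    """Global SES-CD < 4 and a reduction > 2 points from baseline."""
    return follow_up_ses_cd < 4 and (baseline_ses_cd - follow_up_ses_cd) > 2


# Illustrative values only, not study data.
print(mre_ulcer_healing([0, 1, 0, 0, 1, 0]))
print(improvement(baseline=12, follow_up=6))
print(endoscopic_remission(baseline_ses_cd=9, follow_up_ses_cd=3))
```

The same `improvement` predicate serves both sides of the comparison: applied to the global sMaRIA it gives radiological improvement, and applied to the global SES-CD it gives the endoscopic response used as reference standard.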

Sample size estimation

The sample size was estimated to enable the calculation of the agreement between MRE and ileocolonoscopy in the dichotomous classification of endoscopic ulcer healing per patient. The expected agreement between ileocolonoscopy and CE-sMaRIA in all segments was 0.75 [13] and the proportion of patients expected to achieve endoscopic ulcer healing was 50% [4]; thus, at least 43 patients would need to be included to ensure a precision of 0.2 with an alpha error of 0.05.

Statistical analysis

Continuous variables are expressed as median and interquartile range (IQR: 25th–75th percentiles). Categorical variables are expressed as absolute frequencies and percentages. We used McNemar’s test for paired comparisons and Cochran’s Q tests to compare paired distributions of response assessments between segments.
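For readers unfamiliar with McNemar’s test, the sketch below shows an exact (binomial) two-sided version, which uses only the discordant pairs. The study does not state which variant was used in SAS, so the exact form is an assumption, and the counts in the usage line are purely illustrative.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs classified correctly by index A but not by index B
    c: pairs classified correctly by index B but not by index A
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative discordant counts only, not study data.
print(mcnemar_exact(3, 1))
```

With few discordant pairs, as is typical when two indices agree closely, the p-value stays large, which is the pattern behind a "no significant difference" result.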

Two aspects of responsiveness (defined as the ability to detect change over time) were evaluated. First, at the patient level, we determined the standardized effect size by calculating the mean change in SES-CD scores within patients between the baseline and week 46 and dividing it by the standard deviation of the baseline scores in the group of patients that had achieved endoscopic improvement. Second, we used Linn’s method to correlate the magnitude of change between the SES-CD, T2-sMaRIA, and CE-sMaRIA. In the per-segment analysis, each segment was analyzed using the above-explained method.
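The standardized effect size described above can be sketched as follows. The function name and the toy scores are hypothetical; the sign convention (a positive value means a decrease in score, i.e., improvement) and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_effect_size(baseline: Sequence[float],
                             follow_up: Sequence[float]) -> float:
    """Mean within-patient change divided by the SD of the baseline scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    # Positive change = score decreased between baseline and follow-up.
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Toy scores for three patients, not study data.
print(standardized_effect_size([4, 6, 8], [2, 3, 4]))
```

Dividing by the baseline standard deviation expresses the change in units of between-patient variability, which is what allows indices with different scales to be compared.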

To determine the agreement between the SES-CD, T2-sMaRIA, and CE-sMaRIA for the dichotomous classification of therapeutic response, or between readers in assessing the components of the sMaRIA and disease status, we used Cohen’s kappa with its 95% confidence interval (95%CI); when kappa was low despite high observed agreement because the marginal totals were imbalanced (first kappa paradox [18]), we used Bangdiwala’s B-statistic [19].
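The first kappa paradox can be illustrated with a short sketch that computes both statistics from a square contingency table. The table below is hypothetical and deliberately imbalanced; the B-statistic is taken here as the sum of squared diagonal counts divided by the sum of products of the matching row and column totals.

```python
from typing import List, Sequence


def _margins(table: Sequence[Sequence[int]]):
    k = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    return row_totals, col_totals


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Chance-corrected agreement for a square rater-by-rater table."""
    n = sum(sum(row) for row in table)
    rows, cols = _margins(table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    expected = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (observed - expected) / (1 - expected)


def bangdiwala_b(table: Sequence[Sequence[int]]) -> float:
    """Bangdiwala's B: squared diagonal counts over products of the margins."""
    rows, cols = _margins(table)
    agreement = sum(table[i][i] ** 2 for i in range(len(table)))
    return agreement / sum(r * c for r, c in zip(rows, cols))


# Hypothetical table: 90% observed agreement, but almost all ratings "negative".
paradox: List[List[int]] = [[90, 5], [5, 0]]
print(round(cohen_kappa(paradox), 3), round(bangdiwala_b(paradox), 3))
```

In this table the raters agree on 90 of 100 cases, yet kappa is slightly negative because the expected chance agreement is also about 90%; the B-statistic stays close to the observed agreement, which is why it is a reasonable fallback in that situation.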

Significance was set at 5% (two-sided). We used SAS 9.4 software (SAS Institute) for all analyses.

Results

Study population

The flowchart in Fig. 1 shows how the sample was constituted. Table 1 reports the baseline characteristics of the 46 patients included in the study. The median age was 36 years [IQR: 28–47] and 16 (34.8%) patients were male. Overall, 29 (63%) patients had isolated ileal disease and 15 (32.6%) had ileocolonic disease; 26 (56.5%) had an inflammatory phenotype.

Fig. 1

Flowchart of patients with Crohn’s disease included in this study

Table 1 Patients’ baseline characteristics

A total of 266 intestinal segments (42 ileal and 224 colorectal) were evaluated by MRE and endoscopy at baseline and at week 46. At baseline, endoscopic ulcers were identified in a total of 65 (24.4%) segments; 37 (56.9%) ulcers were located in the ileum and 28 (43.1%) in the colorectum.

Proportion of patients in whom MRE correctly classified endoscopic ulcer healing

After 46 weeks of treatment, 25 (54.3%) patients met the endoscopic criteria for ulcer healing. CE-sMaRIA < 2 in all segments identified patients with endoscopic ulcer healing with 76% (59.3–92.7) sensitivity, 95.2% (86.1–100) specificity, 95% (85.4–100) positive predictive value, and 76.9% (60.7–93.1) negative predictive value. T2-sMaRIA identified patients with endoscopic ulcer healing with 80% (64.3–95.7) sensitivity, 95% (86.1–100) specificity, 95% (86.1–100) positive predictive value, and 80% (64.3–95.7) negative predictive value.
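The accuracy figures above can be reproduced with a short sketch. The 2 × 2 counts are not reported in the text; they are inferred here from the percentages (25 patients with and 21 without endoscopic ulcer healing), and the normal-approximation (Wald) interval truncated at 100% is an assumption that happens to match the reported intervals.

```python
from math import sqrt
from typing import Dict, Tuple

Estimate = Tuple[float, float, float]  # (proportion, lower bound, upper bound)


def wald_ci(successes: int, total: int, z: float = 1.96) -> Estimate:
    """Proportion with a Wald 95% interval, truncated to [0, 1]."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_accuracy(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Estimate]:
    """Here 'positive' means ulcer healing (sMaRIA < 2 in all segments)."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "ppv": wald_ci(tp, tp + fp),
        "npv": wald_ci(tn, tn + fn),
    }


# Inferred counts (an assumption, not reported in the article).
ce = diagnostic_accuracy(tp=19, fn=6, tn=20, fp=1)  # CE-sMaRIA
t2 = diagnostic_accuracy(tp=20, fn=5, tn=20, fp=1)  # T2-sMaRIA

for label, result in (("CE-sMaRIA", ce), ("T2-sMaRIA", t2)):
    for name, (p, low, high) in result.items():
        print(f"{label} {name}: {100 * p:.1f}% ({100 * low:.1f}-{100 * high:.1f})")
```

Under these inferred counts the two approaches differ by a single patient (19 vs. 20 true positives), which is consistent with the nearly identical accuracy estimates.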

The agreement between MRE and endoscopy for identifying patients with ulcer healing was substantial for both CE-sMaRIA (kappa = 0.7 [0.5–0.9]) and T2-sMaRIA (kappa = 0.74 [0.55–0.93]) (Table 2). The agreement between CE-sMaRIA and T2-sMaRIA for this endpoint was also substantial (kappa = 0.78 [0.6–0.96]) (Table 3). Figure 2 shows the differences in the descriptors of sMaRIA after adjudication.

Table 2 Comparison of the rate of achievement of different therapeutic endpoints as classified by MRE (CE-sMaRIA and T2-sMaRIA) and ileocolonoscopy (SES-CD)
Table 3 Comparison between indices for the correct categorization of treatment responses at different levels
Fig. 2

Proportion of patients with persistent MRE findings of activity after treatment using sMaRIA with and without contrast-enhanced sequences

Comparison of sMaRIA indices with and without contrast enhancement to identify ulcer healing (per patient)

No significant differences were observed between the two MRE indices (T2-sMaRIA vs. CE-sMaRIA) in the rates of correct classification of patients’ status after 46 weeks of treatment as either “all ulcers healed” or “persistence of ulcers in ≥ 1 segment” against the gold standard ileocolonoscopy (Table 3).

Additional endpoints used to measure radiological response

Identification of ulcer healing per segment

Of the 65 intestinal segments with ulcers at baseline ileocolonoscopy, 44 (67.7%) achieved endoscopic ulcer healing after 46 weeks of treatment (18/37 [48.6%] ileal segments and 26/28 [92.9%] colorectal segments). When all segments were pooled, the agreement between ileocolonoscopy and MRE for the identification of segments with ulcer healing was substantial for T2-sMaRIA and almost perfect for CE-sMaRIA. However, both the CE-sMaRIA and T2-sMaRIA identified endoscopic ulcer healing better in colorectal segments than in ileal segments (Cochran’s Q test, p < 0.001 for both indices) (Table 2). The agreement between the two indices in identifying intestinal segments achieving endoscopic ulcer healing was high for both ileal and colorectal segments (Table 3).

When all segments with ulcers detected at baseline ileocolonoscopy were pooled, no significant differences were observed between the T2-sMaRIA and CE-sMaRIA in the rates of correct dichotomic classification of segments after treatment (ulcers vs. no ulcers) against the reference standard (Table 3).

Identification of patients with radiological improvement (≥ 50% decrease in global SES-CD at week 46)

After 46 weeks of treatment, the global SES-CD had decreased by at least 50% in 30 (65.2%) patients. The agreement between MRE and endoscopy in identifying radiological improvement was moderate for both indices (kappa = 0.53 [0.28–0.79] for CE-sMaRIA and kappa = 0.44 [0.17–0.71] for T2-sMaRIA) (Table 2). There were no differences between the two indices in the rates of patients correctly classified as responders (p = 0.7) or non-responders (p = 0.7) (Table 3). The agreement between the two MRE indices for the identification of radiological improvement was substantial (kappa = 0.55 [0.28–0.78]) (Table 3).

Identification of patients with endoscopic remission (SES-CD < 4 and decrease ≥ 2 points from baseline score)

After 46 weeks of treatment, ileocolonoscopy classified 22 (47.8%) patients as in endoscopic remission (Fig. 3). As no sMaRIA value corresponding to endoscopic and transmural remission had been published, we first used the area under the receiver operating characteristic curve (AUC) to determine the optimal global T2-sMaRIA and CE-sMaRIA cutoffs for this endpoint at week 46. The optimal sMaRIA cutoff for endoscopic remission was ≤ 1 point (AUC = 0.79 [0.65–0.92] for global CE-sMaRIA, p < 0.001, and 0.79 [0.65–0.93] for global T2-sMaRIA, p < 0.001) (Fig. 4). Using these cutoffs, the agreement between ileocolonoscopy and MRE was moderate for CE-sMaRIA (kappa = 0.48 [0.22–0.73]) and for T2-sMaRIA (kappa = 0.43 [0.17–0.69]) (Table 2). There were no significant differences between the two MRE indices in the rates of patients correctly classified as “in remission” (p > 0.99) or “not in remission” (p = 0.3) (Table 3). The agreement between the two MRE indices for detection of remission was almost perfect (Table 4).

Fig. 3

A 46-year-old woman with severe, active Crohn’s disease in the terminal ileum. Axial T2-weighted images without fat saturation (a) and with fat saturation (b) show diffuse mural thickening (arrows in a and b) and high signal intensity on fat-saturated T2-weighted images. Contrast-enhanced T1-weighted images (c) show diffuse enhancement of the same segment (arrow). Fat stranding was not identified and the sMaRIA score was 2, whether calculated including contrast-enhanced images or only unenhanced images. Endoscopic image of the terminal ileum in the same patient (d) shows diffuse marked erythema, edema, and deep ulceration (arrow). After 46 weeks of treatment with a TNF inhibitor, MRE shows a reduction in mural thickness on T2-weighted images (arrow in e) and normalization of mural signal intensity on fat-saturated T2-weighted images (arrow in f) and of contrast enhancement on T1-weighted images (arrow in g), resulting in an sMaRIA of 0 with both approaches. Endoscopy (h) of the same segment after 46 weeks of treatment shows the absence of inflammatory lesions

Fig. 4

Receiver operating characteristic curves for the prediction of endoscopic remission using the sMaRIA including both contrast-enhanced and unenhanced sequences (a) and including only unenhanced sequences (b). T2-sMaRIA: simplified Magnetic Resonance Index of Activity using only unenhanced T2-weighted sequences; CE-sMaRIA: simplified Magnetic Resonance Index of Activity using both unenhanced T2-weighted and contrast-enhanced sequences

Table 4 Comparison of the rates of correct classification (agreement with endoscopic gold standard) of therapeutic endpoints with CE-sMaRIA vs. T2-sMaRIA

Standardized effect sizes and correlations of magnitude of changes determined with different indices

Both the global T2-sMaRIA and global CE-sMaRIA demonstrated moderate responsiveness when an improvement of at least 0.5 standard deviations in overall endoscopic disease activity (SES-CD) was used as the criterion for radiologically meaningful change. The standardized effect size measured by CE-sMaRIA was 0.63 ([0.41–0.85] p < 0.001), and the standardized effect size measured by T2-sMaRIA was 0.58 ([0.36–0.80] p < 0.001) (Supplementary Table 3).

Overall, the correlation with SES-CD for the magnitude of change from baseline to week 46 measured with the two MRE indices was similar, being classified as moderate in both analyses, per patient and per segment. The positive correlations in both analyses (patients and segments) were strongest between the MRE indices; correlations between CE-sMaRIA and SES-CD were slightly stronger than those between T2-sMaRIA and SES-CD (Table 5).

Table 5 Correlation of the magnitude of changes observed between baseline and week 46 between the SES-CD, T2-sMaRIA, and CE-sMaRIA

Interobserver agreement

The agreement between the two readers in identifying ulcer healing at the patient level was moderate for both CE-sMaRIA (kappa = 0.59 [0.34–0.82]) and T2-sMaRIA (kappa = 0.59 [0.35–0.83]). The agreement between the two readers in identifying ulcer healing at the segment level was moderate for CE-sMaRIA (kappa = 0.59 [0.4–0.76]) and substantial for T2-sMaRIA (kappa = 0.61 [0.41–0.78]). The agreement between the two readers in identifying radiological improvement was moderate for both CE-sMaRIA (kappa = 0.45 [0.19–0.69]) and T2-sMaRIA (kappa = 0.60 [0.33–0.85]).

Supplementary Table 4 reports the agreement between the two readers for the categorization of disease status according to the segmental CE-sMaRIA and T2-sMaRIA and their descriptors on post-treatment MRE examinations. We observed similar inter-observer agreement for both CE-sMaRIA and T2-sMaRIA for categorization of segments after treatment for absence of inflammatory lesions (0.74 [0.53–0.90] and 0.72 [0.52–0.87] respectively) and for healing of severe inflammatory lesions (0.55 [0.36–0.74] and 0.6 [0.40–0.78] respectively).

Discussion

Although several studies have focused on the validity [9, 20,21,22,23] and reliability [11, 24] of different MRE scoring systems for assessing activity in CD, few studies have focused on the accuracy of MRE indices in determining the response to therapy [4, 25, 26]. Determining the response to biological drugs is fundamental in both clinical practice and research. Using MRE indices for this purpose promises to improve efficiency in both applications.

Standard MRE protocols include the use of contrast-enhanced sequences that are commonly considered an essential component in the identification of inflammatory lesions. Bowel hyperenhancement was included in the analysis for the derivation of components of the sMaRIA, but was not included in the final equation of the index because it was not a significant independent predictor of activity. Recent concern about gadolinium deposits found in brain tissue after multiple doses of gadolinium-based contrast agents could limit the use of MRE in patients with CD, who require multiple reassessments of disease status throughout their lives. Avoiding gadolinium would also reduce the cost and duration of the examination and improve the tolerability of MRE [27]. On the other hand, there are concerns about whether avoiding the use of contrast-enhanced sequences may compromise the diagnostic accuracy of MRE in the detection of inflammatory lesions related to CD. To determine the accuracy of the sMaRIA with and without contrast-enhanced sequences in assessing the response to therapy, we compared the results of these two approaches against the reference standard of established definitions of endoscopic response using different conceptualizations of therapeutic response. Whether contrast-enhanced sequences were included or not, we found moderate effect sizes and correlation with endoscopic measures of activity. The effect size was smaller than in an earlier study [13], probably because of differences in the baseline severity of lesions between the two cohorts. Overall, our results suggest that MRE can assess the response to biological treatments of known efficacy without the need for intravenous contrast administration.

Another important contribution of this study to the implementation of the sMaRIA in clinical research is the determination of the optimal sMaRIA cutoff for remission. We found that sMaRIA ≤ 1 identified remission whether contrast-enhanced sequences were included or not; this cutoff allows for some residual mural thickening in segments in endoscopic remission [28, 29].

In general, we observed few cases of disagreement between the sMaRIA classifications with and without contrast-enhanced sequences in any of the approaches to measuring the response to treatment. Moreover, in the subgroup of patients who achieved endoscopic ulcer healing in all segments, the sMaRIA classifications differed from the ileocolonoscopy classifications in only 6 intestinal segments in 5 different patients. Disagreements between indices in assessing ulcer healing can be explained by the variability in the interpretation of mural signal on T2-weighted sequences [11], mainly in segments that achieve a response. Disagreements between MRE and ileocolonoscopy in assessing the therapeutic response have been reported [13, 26, 30], but the reasons for these disagreements remain controversial [31]. The endpoint with the lowest observed agreement between techniques was the classification of remission, although the diagnostic accuracy of the sMaRIA with and without contrast-enhanced sequences for this endpoint in the current study was in line with that reported in similar studies [13, 25].

Although diffusion-weighted imaging (DWI) forms part of most routine MRE protocols, we kept readers blinded to this sequence. Some studies have reported that DWI accurately grades inflammatory activity [32] or have found that the combination of T2-weighted and DWI sequences is non-inferior to gadolinium-enhanced sequences for detecting severe inflammation [33], postulating DWI as a useful sequence complementary or alternative to contrast-enhanced sequences. However, interpreting findings on “either T2-weighted or DWI” sequences as indicating active disease can decrease the specificity of MRE [29], and the DWI changes used to assess the therapeutic response are not well characterized [34, 35].

Our study has some limitations. This study focused on the ability of the two approaches to applying the sMaRIA to identify the response to treatment at different levels but did not evaluate their ability to detect penetrating complications (i.e., fistulas, sinus tracts, and/or abscesses). Persisting or worsening luminal CD lesions can lead to penetrating complications, and contrast-enhanced sequences seem to increase MRE’s sensitivity for detecting these lesions [33, 36]. Thus, caution is warranted in forgoing contrast-enhanced MRE in clinical practice. Moreover, most patients (82%) included in this study were treated with TNF inhibitors. Although our conclusions are likely to be relevant for other drugs with different mechanisms of action, this assumption remains to be tested. The results of our study would be more generalizable if they had included more readers from outside the authors’ institution, who may be less used to scoring with the sMaRIA. Lastly, estimates of the indices’ ability to characterize patients’ responses to treatment might also be influenced by the characteristics of the cohort. Most patients in this study had ileal or ileocolonic disease and only a few had isolated colonic disease; patients with isolated colonic disease were more likely to decline MRE follow-up because they perceived only marginal benefit from undergoing both examinations, given their lower risk of developing complications related to CD. However, the total number of colonic segments with ulcers at baseline (28/65) provided sufficient data for us to explore the accuracy of the sMaRIA in assessing colonic inflammation.

In conclusion, we demonstrate that the sMaRIA with or without contrast-enhanced images accurately characterized the response to biologic treatment, based on robust criteria for clinically meaningful change. The reliability and feasibility of this index favor its implementation to measure the efficacy of treatments for CD in clinical research.