Introduction

It is estimated that 1–8% of children presenting with acute abdominal pain are diagnosed with appendicitis [1, 2]. Ultrasound (US), often the initial imaging study, is helpful when an inflamed or normal appendix is seen, but when the appendix is not visualized or results are equivocal additional imaging is sometimes necessary [3, 4]. Computed tomography (CT) is highly accurate, but pediatric use is tempered by concerns about radiation [5, 6].

Use of MRI has been hampered by limited availability in the acute setting and by patient motion during lengthy acquisitions [7, 8]. The latter has warranted sedation in many children, imposing risks of aspiration and respiratory suppression, lengthening time to diagnosis, and increasing costs and demands on personnel [9, 10]. Similarly, use of intravenous contrast agents to improve appendix visualization has been associated with patient discomfort, potential delays obtaining venous access, and additional burdens on staff [11]. Recent MRI advancements such as parallel imaging and Dixon techniques have led to faster image acquisition and improved image quality and could obviate the need for sedation and contrast agents in some children, making MRI more acceptable [12,13,14].

Early MRI results without contrast agents or sedation for diagnosis of pediatric appendicitis have been favorable (sensitivity 93–100%; specificity 97–100%), with a recent meta-analysis further supporting these findings [15,16,17,18,19,20,21]. Diagnostic criteria have included maximum appendix diameter ≥7 mm, appendix wall thickness ≥2 mm, presence of intraluminal fluid and presence of localized periappendiceal fluid [22,23,24,25,26,27]. Yet those individual criteria have largely been extrapolated from US and CT findings, and their applicability to MRI is unknown. A commonly used US cut-point of 6 mm for appendix diameter yielded 91.6% accuracy; however, some authors have suggested that 6 mm is too sensitive and have shown 7 mm to be the optimal cut-point [28,29,30]. Similarly, cut-points of 8–9 mm provided the best balance of sensitivity and specificity for CT in children and adults [31, 32]. Cut-points optimizing sensitivity and specificity of appendix wall thickness have been reported to be 1.7 mm and 3.5 mm by US and CT, respectively [32, 33]. While Swenson et al. [34] reported normal pediatric appendix diameters on MRI (mean 5–6 mm), optimal MRI cut-points for appendix diameter and wall thickness in pediatric appendicitis have not been determined. Similarly, presence of intraluminal and periappendiceal fluid has been associated with appendicitis on CT, but sensitivity and specificity of these criteria on MRI are not well known [35, 36].

We performed this study to evaluate performance characteristics of rapid MRI using neither contrast agents nor sedation for diagnosis of pediatric appendicitis and to assess applicability of previously described diagnostic criteria in this setting.

Materials and methods

Our institutional review board (IRB) approved this study. We included patients aged 4–18 years with suspected appendicitis and an Alvarado score ≥4 who presented between October 2013 and March 2015. Based on published criteria, Alvarado scores were determined by the primary author (R.A.D., a pediatric radiology fellow) by review of charted signs, symptoms, vital signs and laboratory values, blinded to the imaging findings, with scores ≥4 suggesting a moderate to high likelihood of appendicitis [37]. Exclusion criteria included pregnancy and inability to tolerate the MRI examination. Patients were either prospectively recruited or retrospectively enrolled after MRI was performed for equivalent indications at the request of the clinician. We obtained informed parental consent for prospectively recruited subjects; the IRB waived informed consent for retrospectively enrolled subjects. Parents were allowed to remain with their child if they had no contraindication to entering the MRI environment.

Five MRI sequences (Table 1) were performed on one of two 1.5-T scanners (Philips Ingenia and Philips Achieva; Philips Healthcare, Cleveland, OH). Subjects were assigned to breath-held or free-breathing protocols depending on their ability to comply with instructions as determined by the MRI technologist. Patients were positioned supine and imaged from the top of the kidneys through the pubic symphysis. No intravenous contrast agent or sedation was administered.

Table 1 Parameters for MRI without contrast agents or sedation for diagnosis of pediatric appendicitis

We recorded demographic data including subject age, gender, body mass index (weight in kilograms divided by the square of height in meters) and clinical outcomes. Standard treatment for appendicitis was surgical appendectomy, either upon presentation or, in the setting of perforation, after percutaneous abscess drainage or 6 weeks of antibiotic therapy (“interval appendectomy”). Surgical pathology, if available, served as the reference standard for diagnosis, although subjects who underwent interval appendectomy were considered positive even if acute inflammatory changes were absent from post-treatment pathology specimens. Clinical outcomes were considered negative if subjects were discharged from the hospital and did not require appendectomy within 3 months of presentation. We searched a shared medical record database to determine whether subjects underwent appendectomy at our hospital or another hospital after discharge.

Two radiologists, radiologist A (K.L.H.) with 19 years of experience in pediatric radiology and radiologist B (B.R.F.) with 5 years of experience in body imaging, independently and retrospectively reviewed the MRI examinations while blinded to clinical information, US results and outcomes. Reviewers recorded whether they could identify any part of the appendix and, if so, its largest diameter and greatest thickness along with the presence or absence of T2-hyperintense intraluminal or periappendiceal fluid. Using a 5-point Likert scale, reviewers rated the degree to which MRI scans were positive or negative for appendicitis (1=definitely negative, 2=probably negative, 3=indeterminate, 4=probably positive, 5=definitely positive); for purposes of analysis, mean scores of ≤3 were considered negative and >3 positive. Reviewers also rated their level of diagnostic confidence on a 5-point Likert scale (1=not at all confident, 2=somewhat unconfident, 3=neutral, 4=somewhat confident, 5=very confident).
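The dichotomization rule described above can be illustrated with a minimal Python sketch. The function name `classify_scan` and the example scores are hypothetical and are not taken from the study data; the only rule encoded is the one stated in the text (mean score ≤3 negative, >3 positive).

```python
from statistics import mean


def classify_scan(score_a: int, score_b: int) -> str:
    """Dichotomize two reviewers' 5-point Likert ratings by their mean.

    Rule taken from the text: mean <= 3 is negative, mean > 3 is positive.
    """
    for score in (score_a, score_b):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError("Likert scores must be integers from 1 to 5")
    return "positive" if mean([score_a, score_b]) > 3 else "negative"


# Hypothetical reviewer score pairs (not study data).
for a, b in [(3, 3), (3, 4), (2, 4), (5, 4)]:
    print(a, b, classify_scan(a, b))
```

Under this rule, a pair such as (2, 4) averages exactly 3 and is therefore treated as negative, whereas (3, 4) averages 3.5 and is treated as positive.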

All statistical tests were performed using R (R Foundation for Statistical Computing, Vienna, Austria). Results were considered statistically significant at P<0.05. We determined performance characteristics (sensitivity, specificity, positive predictive value and negative predictive value) for overall MRI diagnosis and for categorical diagnostic criteria (intraluminal fluid and periappendiceal fluid) using 2 × 2 contingency tables. Agreement between reviewers with respect to Likert scores was determined by Pearson’s correlation coefficient (r), and reviewer agreement with respect to presence of intraluminal or periappendiceal fluid was evaluated using kappa statistics. Student’s t-test was used to compare differences in age and body mass index (BMI) between subjects whose MRI scans were correctly or incorrectly interpreted and to compare appendix diameter and appendix wall thickness between subjects without and with appendicitis. We used Pearson chi-square or Fisher exact tests to compare gender and breathing technique between subjects whose MRI scans were correctly versus incorrectly interpreted, to compare visualization of the appendix in MRI scans done with breath-held and free-breathing techniques, and to compare differences in intraluminal fluid visualization and periappendiceal fluid visualization between subjects without and with appendicitis. We applied univariable logistic regression models to evaluate the effects of both age and BMI on diagnostic confidence, and we plotted receiver operating characteristic (ROC) curves to illustrate MRI performance over the spectra of continuous diagnostic criteria (appendix diameter and wall thickness) and to identify optimal cut-points for these variables. Additionally, we calculated area under the curve (AUC) as a measure of how well each ROC curve separated subjects into those with and without appendicitis. Multivariable logistic regression models were applied to assess the strength of associations between diameter and wall thickness cut-points and clinical outcome.
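As an illustration of the kappa statistic used for reviewer agreement on categorical findings, the following Python sketch computes Cohen's kappa for two raters and a binary finding. The analysis in this study was performed in R; the function `cohen_kappa` and the ratings below are hypothetical and serve only to show the calculation.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items as 0 or 1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal proportion of 1s.
    p_a = sum(ratings_a) / n
    p_b = sum(ratings_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (1 = fluid present, 0 = absent); not study data.
reviewer_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reviewer_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohen_kappa(reviewer_a, reviewer_b), 2))
```

In this made-up example the raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa is lower than the raw percent agreement.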

Results

We enrolled 98 children and young adults (mean age ± standard deviation [in years] 11.0 ± 3.7; range [in years] 4.2–17.9). One patient underwent MRI twice for distinct episodes of pain separated by 88 days, and her examinations were considered independently. One patient was unable to tolerate the MRI (no images obtained) and was excluded from further analysis. Therefore, a total of 98 MRI scans from 97 patients were available for analysis (54 prospective; 44 retrospective). Each patient was on the MRI table for less than 30 min, although exact scan duration could not be determined because of unrecorded variability in the length of respiratory-triggered sequences. Ninety-five percent of MRI examinations (93/98) were completed in full. In 5% (5/98), imaging was terminated before DWI was performed. Premature termination was for anxiety (n=1), motion (n=3) or pain (n=1).

Thirty-one percent of patients (30/97) underwent urgent laparoscopic appendectomy with a positive rate of 90% (27/30). The patient who underwent two MRI scans, both interpreted as positive, was discharged home after the first MRI because of non-compelling clinical signs and was treated with laparoscopic appendectomy after the second, with positive pathology. No alternative pathology was identified in the 10% (3/30) of patients with negative appendectomies. An additional 5% of patients (5/97) with complicated appendicitis were admitted for antibiotics or percutaneous abscess drainage, undergoing interval appendectomy at a later date. Two percent of patients (2/97) underwent surgical intervention for alternative diagnoses (ovarian teratoma with torsion and Meckel diverticulum with small-bowel volvulus). Three percent of patients (3/97) were admitted for non-surgical diagnoses (pneumonia, gastroenteritis and pyelonephritis), 13% (13/97) were admitted for observation and ultimately discharged, and 46% (45/97) were discharged directly, most commonly with a diagnosis of gastroenteritis. Other than the patient with two MRI scans, no patients who were discharged without surgery underwent appendectomy, either at our institution or another regional hospital, during 3 months of follow-up.

Thirty-five percent of MRI scans (34/98) were considered positive for appendicitis (Fig. 1). Overall performance characteristics of MRI are shown in Table 2. Two false negatives occurred in patients who underwent laparoscopic appendectomy with positive pathology. Three false positives occurred with all three patients discharged home uneventfully. The rate of discovery of alternative diagnoses was 5% (5/98), including pneumonia, pyelonephritis, right inguinal hernia, small-bowel volvulus, and ovarian teratoma with torsion.
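The 2 × 2 arithmetic behind these figures can be sketched in Python. The cell counts below are reconstructed from the counts given in the text (34 positive scans, 3 false positives, 2 false negatives, 98 scans in total) and are therefore an assumption rather than a copy of Table 2; the function name `performance` is hypothetical.

```python
def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from 2 x 2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts derived from the text, not copied from Table 2:
# 34 positive scans - 3 false positives = 31 true positives;
# 98 scans - 34 positive - 2 false negatives = 62 true negatives.
metrics = performance(tp=31, fp=3, fn=2, tn=62)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these assumed counts, sensitivity and specificity round to the 94% and 95% quoted in the Discussion.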

Fig. 1

MR images in a 16-year-old girl with appendicitis. a, b Coronal T2-W single-shot fast spin-echo (FSE; a) and axial T2-W single-shot fat-saturated FSE (b) images demonstrate a fluid-filled dilated appendix (arrows) in the right lower quadrant containing a T2-hypointense appendicolith with surrounding T2-hyperintense inflammatory edema and an adjacent fluid collection with a fluid-debris level (arrowhead), consistent with a periappendiceal abscess

Table 2 Performance characteristics of MRI in diagnosis of pediatric appendicitis

With the Likert scale for diagnosis of appendicitis treated as a continuous variable, Pearson’s r for reviewer agreement was 0.86 (P<0.01, where 0=no concordance and 1=perfect concordance). Conflicting interpretations (positive versus negative) were rendered by the reviewers for 6% (6/98) of MRIs, all in subjects with clinical outcome negative for appendicitis. For the 22% (22/98) of MRI scans in which at least one reviewer was unable to identify the appendix, clinical outcome was negative for appendicitis.

Table 3 shows differences in patient demographics and breathing protocols relative to MRI interpretation (correct versus incorrect diagnosis). Subject gender was not significantly associated with reviewer confidence or MRI interpretation (P=0.53 and P=0.15). Younger age and smaller BMI were significantly associated with reduced diagnostic confidence (P=0.02 and P<0.01) but neither was significantly associated with MRI interpretation (P=0.87 and P=0.67).

Table 3 Demographics and breathing technique by MRI interpretation

Seventy-eight percent (76/98) of MRIs were performed with breath-holding, and 22% (22/98) were performed with free-breathing. Average subject age in breath-holding and free-breathing groups was 11.9 years and 7.8 years, respectively (P<0.01). At least one reviewer reported visualizing the appendix in 95% (72/76) of breath-held examinations and in only 77% (17/22) of free-breathing examinations (P=0.01). However, breathing technique was not associated with a significant difference in MRI interpretation (correct versus incorrect diagnosis; P=0.58). A trend toward decreased reviewer confidence was observed with free-breathing examinations, but this trend also did not reach statistical significance (P=0.08).
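The visualization comparison can be worked through with a short Python sketch of an uncorrected Pearson chi-square test on the 2 × 2 counts reported above (72/76 versus 17/22). The text does not state whether the chi-square or the Fisher exact test produced the reported P value, so the choice of test here is an assumption, and the function name is hypothetical.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability of a
    # chi-square variable equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: breath-held, free-breathing; columns: visualized, not visualized.
chi2, p = pearson_chi_square_2x2(72, 4, 17, 5)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

Under this assumption the result is close to the reported P=0.01; a Fisher exact test on the same counts would give a somewhat larger P value.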

Table 4 compares subject demographics and MRI findings to clinical outcomes (negative or positive for appendicitis). There was no significant difference in age, gender or BMI between patients without and with appendicitis. Differences in appendix diameter and wall thickness did reach statistical significance (P≤0.01). Kappa statistics for reviewer agreement for the presence of intraluminal and periappendiceal fluid were 0.86 and 0.60, respectively (P<0.01). In patients with appendicitis, reviewer agreement for presence of intraluminal fluid was 93% (65/70) (Cohen’s kappa=0.85, P<0.01) and for periappendiceal fluid was 80% (56/70) (Cohen’s kappa=0.59, P<0.01). Presence of intraluminal fluid alone yielded a sensitivity, specificity, positive predictive value and negative predictive value of 91%, 60%, 66% and 89%, respectively. Similarly, independent consideration of periappendiceal fluid yielded a sensitivity, specificity, positive predictive value and negative predictive value of 97%, 50%, 61% and 96%, respectively.

Table 4 MRI findings by clinical outcome

ROC curves for appendix diameter and wall thickness are displayed in Figs. 2 and 3. AUC was 0.93 for the diameter curve and 0.83 for the wall thickness curve, indicating that diameter and wall thickness are good predictors of clinical outcome. Predictive power was maximized by cut-points of 7.250 mm for diameter (sensitivity 88% and specificity 88%) and 2.475 mm for wall thickness (sensitivity 71% and specificity 93%). Because measurements at this precision are below current spatial resolution limits, we rounded cut-points to 7 mm and 2 mm, respectively, yielding sensitivities of 91% and 88% and specificities of 84% and 43%. In multivariable logistic regression models comparing clinical outcome with the rounded cut-points, odds ratios were 14.6 for diameter ≥7 mm (95% confidence interval [CI], 2.6–109.0; P<0.01) and 5.6 for wall thickness ≥2 mm (95% CI, 0.9–38.1; P=0.06), demonstrating a strong association between appendix diameter and clinical outcome of appendicitis; the association for wall thickness did not reach statistical significance.
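How a cut-point can be read off an ROC curve is sketched below in Python. The text does not name the criterion used to maximize predictive power, so Youden's J (sensitivity + specificity − 1) is assumed here purely for illustration. The diameters are hypothetical values, not study measurements, and the function names are invented for this sketch.

```python
def roc_points(values, labels):
    """(threshold, sensitivity, specificity) for the rule value >= threshold."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    points = []
    for t in sorted(set(values)):
        sens = sum(v >= t for v in pos) / len(pos)
        spec = sum(v < t for v in neg) / len(neg)
        points.append((t, sens, spec))
    return points


def best_cut_point(values, labels):
    """Threshold maximizing Youden's J (an assumed criterion)."""
    return max(roc_points(values, labels), key=lambda p: p[1] + p[2] - 1)


def auc(values, labels):
    """AUC as the chance a positive case has a larger value than a negative."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical appendix diameters in mm (label 1 = appendicitis).
diameters = [4.5, 5.0, 5.5, 6.0, 6.5, 7.5, 6.8, 7.2, 8.0, 9.5, 10.0, 12.0]
outcomes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

threshold, sens, spec = best_cut_point(diameters, outcomes)
print(f"cut-point {threshold} mm: sens {sens:.2f}, spec {spec:.2f}")
print(f"AUC {auc(diameters, outcomes):.2f}")
```

Rounding a data-driven threshold to a clinically usable value, as was done in this study, shifts sensitivity and specificity along the same curve, which is why the rounded cut-points perform differently from the unrounded ones.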

Fig. 2

Appendix diameter on MRI for pediatric appendicitis. Receiver operating characteristic curve shows sensitivity (sens) and specificity (spec) of appendix diameter with a cut-point of 7 mm

Fig. 3

Appendix wall thickness on MRI for pediatric appendicitis. Receiver operating characteristic curve shows sensitivity (sens) and specificity (spec) of appendix wall thickness with a cut-point of 2 mm

Discussion

This study demonstrates that pediatric MRI done with neither contrast agents nor sedation has excellent sensitivity (94%) and specificity (95%) for appendicitis, similar to previously published reports (sensitivity 93–100%, specificity 89–100%) [15,16,17,18,19,20,21]. Overall performance characteristics are also comparable to those described for contrast-enhanced MRI done with sedation (sensitivity 96%, specificity 96%) [38, 39] and contrast-enhanced CT (sensitivity 95–100%, specificity 94–99%) [40, 41]. Our findings provide further evidence that rapid MRI for appendicitis is feasible and might obviate contrast and sedation needs in many children.

ROC analysis revealed that cut-points of ≥7 mm for appendix diameter and ≥2 mm for appendix wall thickness provided the greatest predictive power. These two criteria are good predictors of clinical outcome, with AUC values of 0.93 and 0.83 for appendix diameter and wall thickness, respectively. However, when considered in isolation, these diagnostic criteria performed relatively poorly, providing lower sensitivities and much lower specificities. Similar findings have been reported in adults [23]. If used in isolation, appendix diameter and mural thickness would lead to an undesirable number of false-positive and false-negative interpretations.

Intraluminal and periappendiceal fluid also provided high sensitivities but relatively poor specificities. This result is perhaps unsurprising since, on the one hand, MRI is exquisitely sensitive to the presence of abdominal fluid and, on the other, fluid in these locations can occur with other pathologies as well as in the absence of pathology [7]. Accordingly, presence of intraluminal or periappendiceal fluid as independent criteria is insufficient for rendering a diagnosis of appendicitis by MRI.

When the appendix is not confidently visualized, diagnostic criteria are also unhelpful. At least one reviewer was unable to identify the appendix in nearly one-quarter of our MRI examinations, consistent with previously published MRI results in adults [42]. Interestingly, no patients in whom the appendix was not identified were ultimately diagnosed with appendicitis. In the hands of experienced readers, therefore, non-visualization of the appendix on MRI might be sufficient for excluding appendicitis. This idea is supported by a study by Nikolaidis et al. [43], in which non-visualization of the appendix and absence of localized right lower quadrant inflammatory changes on CT were found to be adequate criteria for a negative diagnosis in adults. Further study is warranted in larger populations and with reviewers at various levels of experience to determine broad applicability of this finding to MRI.

Our reviewers were less confident in their final diagnoses in subjects of younger age and with smaller BMIs. Relative absence of intra-abdominal fat might have contributed by reducing conspicuity of the appendix in patients with these characteristics. Although a trend was noted, there was no statistically significant association between free-breathing and poor reviewer confidence. Yet, irrespective of breathing protocol, young children might have been more prone to gross motion during imaging. Despite their influence on reviewer confidence, age and BMI were not shown to correlate with overall MRI interpretation (correct versus incorrect diagnosis). Therefore, radiologists can be reassured that decreased diagnostic confidence, particularly in young or small patients, does not necessarily diminish diagnostic performance.

While excellent MRI performance characteristics have been consistently reported in pediatric appendicitis, several issues might hinder its implementation. Providing MRI around the clock is not feasible in every location, and at imaging centers with high clinical volumes, fitting urgent unscheduled MRIs into an already full schedule might be a logistical challenge. Performing rapid MRI examinations without contrast agents or sedation has the potential to improve patient throughput, reduce demands on personnel, and thereby facilitate implementation. Sedation will almost certainly still be required in some patients, including those younger than 4 years, who were not included in this study.

This study has several additional limitations. We included some subjects retrospectively, potentially introducing selection bias if clinicians preferentially ordered MRIs for patients who were more likely to hold still for the examination. Our small number of false-positive and false-negative results precluded more detailed statistical analysis, including multivariable logistic regression models of diameter and wall thickness that would have helped to determine best predictors of pathology. Although pathology is the gold standard for diagnosis, negative clinical outcome was used as a surrogate for normal pathology in this study. While considered unlikely because we used a shared regional database for follow-up, subjects could have undergone appendectomy elsewhere without the researchers’ knowledge, erroneously lowering our false-negative rate. Applicability of results to 3-T magnets, which are increasingly available in clinical practice, is not known, and cost–benefit analysis of MRI in pediatric appendicitis has yet to be performed. Last, our study did not explore the diagnostic performance of individual MRI sequences, which could be examined in future studies.

Conclusion

MRI without contrast agents or sedation maintains excellent performance characteristics in the evaluation of pediatric appendicitis when diagnostic criteria are used in aggregate, similar to those achieved with US, CT and contrast-enhanced, sedated MRI. Individual diagnostic criteria, including appendix diameter, appendix wall thickness, and presence of intraluminal fluid or periappendiceal fluid, demonstrate good sensitivities but poor specificities. Optimal cut-points were ≥7 mm for appendix diameter and ≥2 mm for wall thickness. Non-visualization of the appendix on MRI favors a negative diagnosis.