Introduction

Autoimmune pancreatitis (AIP) is a rare but distinctive type of chronic pancreatitis that responds markedly well to steroid therapy [1]. The radiological and clinical features of AIP are similar to those of pancreatic ductal adenocarcinoma (PDA), making differential diagnosis important for avoiding unnecessary surgery in patients with AIP [2,3,4]. Diffuse-type AIP can be easily distinguished from PDA based on its characteristic imaging findings, i.e. diffuse sausage-like pancreatic swelling with a peripancreatic capsule-like rim or halo sign [2,3,4]. Non-diffuse-type AIP typically manifests as a focal pancreatic mass and main pancreatic duct (MPD) stricture on imaging studies, and thus differential diagnosis from PDA is challenging. Several investigators have sought imaging features that distinguish non-diffuse-type AIP from PDA using computed tomography (CT) and magnetic resonance imaging (MRI) or magnetic resonance pancreatography (MRP) [5,6,7,8,9,10,11,12]. In these studies, several key imaging features relating to the morphology of the pancreatic mass and the MPD stricture pattern were identified as important for differentiating the two diseases.

Currently, CT is the most commonly used non-invasive imaging modality for evaluating both pancreatic and extrapancreatic findings in patients with suspected AIP. However, in our experience, CT has limitations in demonstrating some specific findings of non-diffuse-type AIP, including multiple pancreatic masses and multiple MPD strictures, whereas these findings appear to be visualized more clearly on MRI/MRP. The ability of MRI/MRP to depict the characteristic findings of non-diffuse-type AIP has been addressed in previous studies [5,6,7,8,9]. However, little is known about the diagnostic performance of CT or MRI/MRP in AIP. Moreover, to the best of our knowledge, no study has compared the diagnostic performance of CT and MRI/MRP in differentiating the two diseases. Therefore, we conducted this study to intra-individually compare the diagnostic performance of CT and MRI with MRP in differentiating non-diffuse-type AIP from PDA.

Materials and methods

Study population

This retrospective study was approved by our institutional review board, and the requirement for informed consent was waived. Through a search of our medical database, we identified 169 patients diagnosed with AIP, according to the Asian [13] or HISORt [14] criteria, or international consensus diagnostic criteria [15], in our institution between January 2007 and December 2016 (Fig. 1). Among these, 61 patients (44 men and 17 women; mean age ± standard deviation, 58.4 ± 11 years [range, 18–81 years]) were finally included in this study, based on the following inclusion criteria: (a) availability of dynamic contrast-enhanced CT, including both arterial and portal phases; (b) availability of dynamic contrast-enhanced MRI, including arterial, portal and delayed phases, and MRP before steroid treatment; and (c) non-diffuse-type AIP. In criterion (c), the non-diffuse type was defined as the presence of a focal mass involving less than half of the total pancreas or multifocal masses observed on CT and MRI. Cases with any mass involving more than half of the pancreas, regardless of lesion multiplicity, were considered diffuse type and were thus excluded from the study. If the distinction between diffuse and non-diffuse type was equivocal, we referred to MRP and, if available, endoscopic retrograde pancreatography (ERP) findings. Among the 61 patients, histological diagnosis was performed for 38, and two had type 2 AIP [15].

Fig. 1 Flow diagram showing the selection of the study population

From January 2013 to December 2015, 782 patients underwent curative-intent surgery for PDA in our institution (Fig. 1). To create a 1:2 matching with the AIP group, we randomly selected 122 patients (77 men and 45 women; mean age ± standard deviation, 62 ± 9.3 years [range, 40–83 years]) from among the 564 patients with PDA who had undergone both dynamic contrast-enhanced CT and MRI with MRP before surgery, using a commercially available random number generator (QuickCalcs, GraphPad Software).
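The 1:2 random selection described above can be sketched in a few lines. This is only an illustration: the study used GraphPad QuickCalcs, whereas the sketch substitutes Python's standard `random` module, and the function name, the fixed seed and the integer patient indices are all assumptions introduced here.

```python
import random


def select_matched_controls(n_cases, candidate_ids, ratio=2, seed=0):
    """Randomly pick ratio * n_cases controls without replacement.

    The seed is a hypothetical choice made only for reproducibility.
    """
    candidates = list(candidate_ids)
    n_controls = ratio * n_cases
    if n_controls > len(candidates):
        raise ValueError("not enough candidates for the requested ratio")
    rng = random.Random(seed)
    return rng.sample(candidates, n_controls)


# 61 AIP patients and 564 eligible PDA patients (hypothetical indices 0..563)
controls = select_matched_controls(61, range(564))
print(len(controls))  # 122
```

Sampling without replacement guarantees that no PDA patient is selected twice, which is what a 1:2 matched control group requires.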

Imaging examinations

CT techniques

The CT examinations were performed on 16-, 64- or 128-multidetector CT scanners (Somatom Sensation 16, Somatom Definition AS or AS+, Siemens Medical Systems; LightSpeed VCT, GE Healthcare), using dual-phase scanning, including the arterial and portal phases. The arterial and portal phase scans were obtained with a 15- to 25-s delay using a bolus-tracking technique and a fixed 70- to 80-s delay, respectively, after intravenous injection of 2–2.5 ml/kg of iopromide (Ultravist 370, Bayer Schering Pharma) at a rate of 3 ml/s, using an autoinjector. The parameters of the GE 64-multidetector scanner were a detector collimation of 0.625 mm, rotation time of 0.5 s, pitch of 0.98, tube voltage of 120 kV, automated tube current modulation with a noise index (200–400 mA) and reconstructed slice thickness of 2.5 mm. The parameters of the Siemens 16-, 64- and 128-multidetector scanners were detector collimations of 0.75, 0.6 and 0.6 mm, respectively; rotation time of 0.5 s; pitch of 1; tube voltage of 120 kV; automated dose modulation, with the maximum allowable tube current set at 200 mA; and reconstructed slice thickness of 3 mm.

MRI techniques

The MRI examinations were performed using a 1.5-T unit (Magnetom Avanto or Vision; Siemens Medical Solutions). The scan parameters were as follows: pre-contrast T1-weighted images with fat suppression using a fast low-angle shot sequence (repetition time [TR]/echo time [TE], 224/2.5 ms; flip angle, 70°; section thickness, 6 mm; field of view [FOV], 240 × 350 mm; matrix, 176 × 256); T2-weighted images with fat suppression using respiratory-triggered fast-spin echo (TR/TE, 3,600–5,100/96 ms; flip angle, 150°; echo train length, 13; section thickness, 6 mm; FOV, 240 × 350 mm; matrix, 264 × 384) or half-Fourier acquisition single-shot turbo spin-echo (TR/TE, infinite/154 ms; flip angle, 150°; echo train length, 256; section thickness, 6 mm; FOV, 240 × 350 mm; matrix, 192 × 256); MRP with thick-slab (40 mm) images using a single-shot rapid acquisition with relaxation enhancement sequence during one breath-hold (TR/TE, infinite/1,000 ms; flip angle, 180°; echo train length, 256; FOV, 300 × 300 mm; matrix, 320 × 320); and contrast-enhanced T1-weighted images with dynamic triple-phase imaging (arterial, portal and delayed phases) using a fat-suppressed spoiled gradient-echo sequence (volumetric interpolated breath-hold examination; TR/TE, 4.1–4.2/1.5–1.7 ms; flip angle, 10°; section thickness, 4 mm; FOV, 280 × 350 mm; matrix, 256 × 320) at 10 s (arterial phase), 50 s (portal phase) and 3 min (delayed phase) after intravenous injection of gadobenate dimeglumine (MultiHance; Bracco SPA; 0.1 mmol/kg body weight) or gadoteric acid (Dotarem; Guerbet; 0.2 mmol/kg body weight) at a rate of 2 ml/s, using an autoinjector.

The time intervals between CT and MRI were 0–36 days (median, 5 days) in the AIP group and 0–34 days (median, 3 days) in the PDA group.

Image analysis

All images were reviewed using a local picture archiving and communication system monitor and digital imaging and communications in medicine imaging software. Two board-certified abdominal radiologists, who were highly experienced in diagnosing AIP and PDA, independently reviewed and rated the CT and MRI findings of all 183 patients (61 AIP and 122 PDA) in a random order. The radiologists were blinded to the clinical data, imaging results and final diagnosis, but were aware that the study population consisted of patients with AIP or PDA. The raters used a 5-point scale to rate their confidence in differentiating between the two diseases for each CT and MRI finding: 1 – definite AIP; 2 – probable AIP; 3 – indeterminate; 4 – probable PDA; and 5 – definite PDA. A rating of 1 or 2 was considered to indicate a diagnosis of AIP; a rating of 3, indeterminate; and a rating of 4 or 5, PDA. They rated their confidence in the diagnosis subjectively, without any predefined criteria, taking both pancreatic and extrapancreatic findings (e.g. sclerosing cholangitis, renal involvement or retroperitoneal fibrosis in AIP) into account. The CT and MRI reviews were separated by at least 1 month to avoid recall bias.
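The mapping from the 5-point confidence scale to a diagnosis can be written out explicitly. The following is a minimal sketch: the function names and the example ratings are hypothetical, and treating an indeterminate rating (3) as a missed diagnosis when computing sensitivity is an assumption made here, not something stated in the text.

```python
def diagnosis_from_rating(rating: int) -> str:
    """Map a 5-point confidence rating to a diagnostic category."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    if rating <= 2:
        return "AIP"            # 1 = definite AIP, 2 = probable AIP
    if rating == 3:
        return "indeterminate"
    return "PDA"                # 4 = probable PDA, 5 = definite PDA


def sensitivity(ratings, true_diagnosis: str) -> float:
    """Fraction of patients with `true_diagnosis` who were rated as such.

    Assumption: an indeterminate rating counts as a missed diagnosis.
    """
    hits = sum(1 for r in ratings if diagnosis_from_rating(r) == true_diagnosis)
    return hits / len(ratings)


# Hypothetical ratings given by one rater to five patients with proven AIP
example_ratings = [1, 2, 3, 4, 1]
print(sensitivity(example_ratings, "AIP"))  # 0.6
```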

One month after completion of the diagnosis rating for all patients, the raters reviewed all the images again to determine, in consensus, the presence of the following key imaging features for differentiating between AIP and PDA for each CT and MRI finding: multiple pancreatic masses; discrete pancreatic mass (discernible focal hypoattenuating or hypointense mass, with visible border between the mass and the normal pancreas on unenhanced or contrast-enhanced images); delayed homogeneous enhancement of the pancreatic mass (hypoenhancement in the arterial phase and homogeneous iso- or hyperenhancement in the portal or delayed phase compared with the normal pancreas [16, 17]); halo sign (thin or thick, continuous, rim-like hypoenhancing soft tissue lesion outlining the pancreas); MPD stricture and its patterns (abrupt vs. tapered narrowing); multiple MPD strictures; marked upstream MPD dilatation (> 5 mm in maximum diameter); upstream pancreatic atrophy; and other organ involvement, including sclerosing cholangitis (intrahepatic or suprapancreatic extrahepatic bile duct), renal lesion [18, 19] and retroperitoneal fibrosis.

Laboratory data

The serum concentrations of immunoglobulin G (IgG), immunoglobulin G fraction 4 (IgG4) and carbohydrate antigen 19-9 (CA19-9), if available, were recorded.

Statistical analysis

Receiver operating characteristic (ROC) analysis was used to compare the diagnostic performance of CT and MRI, for each rater, in differentiating AIP from PDA. The sensitivity of CT and MRI for the diagnosis of AIP and PDA, as determined by each rater, was compared using McNemar’s test. To measure the agreement between the two raters in terms of confidence ratings for differentiating AIP from PDA and the diagnosis based on the confidence ratings, linear weighted κ values were calculated. A κ value greater than 0 indicated agreement beyond chance (κ<0.21, poor; κ=0.21–0.40, fair; κ=0.41–0.60, moderate; κ=0.61–0.80, good; and κ>0.80, excellent agreement) [20]. The frequencies of the key imaging features for differentiating AIP from PDA were compared between CT and MRI using McNemar’s or the chi-squared test, as appropriate. A p value <0.05 was considered statistically significant. SPSS for Windows version 21.0 (IBM Corp.) and MedCalc for Windows version 12.5.0.0 (MedCalc) were used for the statistical analyses.
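Because CT and MRI were read in the same patients, the sensitivity comparison is a paired one, which is why McNemar's test applies: only the discordant pairs carry information. The sketch below illustrates the exact (binomial) form of the test. Whether the exact or the chi-squared form was used in the study is not stated, so the choice of variant is an assumption, as are the function names and the example data.

```python
from math import comb


def discordant_counts(ct_correct, mri_correct):
    """Count the discordant pairs among paired readings of the same patients."""
    pairs = list(zip(ct_correct, mri_correct))
    b = sum(1 for ct, mri in pairs if mri and not ct)  # MRI correct, CT wrong
    c = sum(1 for ct, mri in pairs if ct and not mri)  # CT correct, MRI wrong
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, hence no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical readings: 5 concordant pairs, 8 MRI-only and 1 CT-only correct
ct = [True] * 5 + [False] * 8 + [True]
mri = [True] * 5 + [True] * 8 + [False]
b, c = discordant_counts(ct, mri)
print(b, c, round(mcnemar_exact(b, c), 4))  # 8 1 0.0391
```

Concordant pairs (both modalities correct or both wrong) do not enter the p value, so a large sample with few discordant pairs can still yield a non-significant result.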

Results

Diagnostic performances of CT and MRI

The diagnostic performances of CT and MRI, including the results of ROC analysis and sensitivity for the diagnosis of AIP and PDA, are presented in Table 1. The area under the ROC curve (Az value) was significantly greater for MRI (0.993–0.995) than for CT (0.953–0.976) for both raters (p≤0.035). The sensitivities of MRI were higher than those of CT for the diagnosis of AIP (88.5–90.2% vs. 77–80.3%, p≤0.07) and PDA (97.5–99.2% vs. 91.8–94.3%, p≤0.031) for both raters, although the difference for AIP was statistically marginal (p=0.07) for rater 1.
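For ordinal confidence ratings, the area under the ROC curve can be read as the probability that a randomly chosen patient with PDA receives a higher rating than a randomly chosen patient with AIP, with ties counted as one half. The sketch below computes this empirical area. It is an illustration only: the Az values reported here came from MedCalc, whose estimation method may differ, and the function name and the ratings are hypothetical.

```python
def empirical_auc(aip_ratings, pda_ratings) -> float:
    """Empirical AUC for ratings where higher values favor PDA.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    AIP-PDA pairs; tied ratings contribute 0.5.
    """
    wins = 0.0
    for a in aip_ratings:
        for p in pda_ratings:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(aip_ratings) * len(pda_ratings))


# Hypothetical ratings from one rater (1 = definite AIP ... 5 = definite PDA)
aip = [1, 1, 2, 3]
pda = [5, 5, 4, 3, 4, 5]
print(round(empirical_auc(aip, pda), 3))  # 0.979
```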

Table 1 Diagnostic performances of CT and MRI

The two raters agreed on the confidence ratings for differentiating AIP from PDA using MRI and CT in 79.2% (145/183) and 61.7% (113/183) of the cases, respectively, and the corresponding linear weighted κ value was higher for MRI (κ=0.858, excellent agreement) than for CT (κ=0.714, good agreement). The two raters agreed on the diagnosis of AIP and PDA using MRI and CT in 95.6% (175/183) and 86.9% (159/183) of the cases, respectively, and the corresponding linear weighted κ value for MRI (κ=0.923, excellent agreement) was also higher than that for CT (κ=0.806, excellent agreement).
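The agreement figures above combine raw percentage agreement with a linear weighted κ, which penalizes a disagreement in proportion to its distance on the 5-point scale. The following sketch shows one way to compute it and to apply the agreement labels defined in the statistical analysis. The function names and the example ratings are hypothetical, and assigning values that fall between the stated intervals (e.g. 0.405) to the lower label is an assumption.

```python
def linear_weighted_kappa(r1, r2, categories=(1, 2, 3, 4, 5)) -> float:
    """Linear weighted kappa for two raters scoring the same cases."""
    k = len(categories)
    index = {cat: i for i, cat in enumerate(categories)}
    n = len(r1)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at maximum distance
        return abs(i - j) / (k - 1)

    obs_dis = sum(weight(i, j) * observed[i][j]
                  for i in range(k) for j in range(k)) / n
    exp_dis = sum(weight(i, j) * row[i] * col[j]
                  for i in range(k) for j in range(k)) / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1 - obs_dis / exp_dis


def agreement_label(kappa: float) -> str:
    """Label a kappa value using the thresholds given in the text."""
    if kappa < 0.21:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


print(linear_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(linear_weighted_kappa([1, 1, 5, 5], [1, 5, 1, 5]))        # 0.0
print(agreement_label(0.858), agreement_label(0.714))           # excellent good
```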

Key imaging features on CT and MRI

The comparison of CT and MRI for the presence of key imaging features to differentiate between AIP and PDA is summarized in Table 2. For AIP, multiple pancreatic masses (39.3% vs. 21.3%), delayed homogeneous enhancement of the pancreatic mass (93.4% vs. 73.8%), MPD stricture (85.2% vs. 54.1%), and multiple MPD strictures (47.5% vs. 11.5%) were observed significantly more frequently using MRI than CT (p≤0.008). The frequencies of halo sign, MPD stricture pattern and other organ involvement in AIP did not significantly differ between CT and MRI (p≥0.317). For PDA, discrete pancreatic mass (87.7% vs. 75.4%), MPD stricture (86.1% vs. 78.7%) and upstream pancreatic atrophy (57.4% vs. 50.8%) were observed significantly more frequently using MRI than CT (p≤0.039). The frequencies of MPD stricture pattern and marked upstream MPD dilatation in PDA did not significantly differ between CT and MRI (p≥0.18). Representative cases are presented in Figs. 2 and 3.

Table 2 Key imaging features observed on CT and MRI
Fig. 2 A 63-year-old man with AIP. Arterial-phase CT (a–c) and pre-contrast T1-weighted MR (d–f) images show ill-defined mass-like swelling (arrows in a and d) in the pancreatic tail. The MR images demonstrate two small hypointense nodules (arrowheads) in the pancreatic body or tail (e) and head (f), which are not detectable on CT

Fig. 3 A 62-year-old man with AIP. Arterial-phase CT (a–c) and pre-contrast T1-weighted MR (d, e) images show multifocal hypodense/hypointense lesions (arrows) in the pancreatic body and tail. The lesions appear more conspicuous on MRI than on CT. The slightly dilated MPD (arrowhead) is barely visible on CT (b). Conversely, the heavily T2-weighted axial MR image (f) and MRP image (g) clearly demonstrate the slightly dilated MPD (arrowheads), which is not revealed on CT, and thereby depict the multifocal strictures. Arterial (h) and delayed (i) phase MR images show delayed homogeneous enhancement in the pancreatic lesions (arrows)

Laboratory data

The serum concentrations of IgG (> 1,800 mg/dl), IgG4 (> 135 mg/dl) and CA19-9 (> 37 U/ml) were elevated above the normal limit in 17.5% (10/57), 38.6% (22/57) and 16.7% (10/61) of the patients with AIP, respectively. The serum CA19-9 level was elevated in 70.5% (86/122) of the patients with PDA. None of the patients with PDA exhibited elevated serum IgG (0/47) or IgG4 (0/46) levels.

Discussion

Our study shows that MRI provides better diagnostic performance than CT in differentiating non-diffuse-type AIP from PDA. The better diagnostic performance of MRI was due to the superiority of MRI over CT in demonstrating several key distinguishing features of the two diseases, particularly for AIP: multiple pancreatic masses, delayed homogeneous enhancement of the pancreatic mass and multiple MPD strictures. It is important to detect multiple pancreatic masses for differentiating AIP from PDA because this finding is highly specific for AIP, as demonstrated in our study as well as in the literature [5, 6]. The pancreatic mass observed in AIP usually appears as a mass-like swelling, without significant difference in the degree of contrast enhancement compared with the normal pancreas, unlike the discrete hypoenhancing mass observed in PDA. Moreover, CT inherently has poor soft-tissue contrast, and thus detection of subtle masses, particularly small lesions, in AIP is relatively difficult. Conversely, MRI has excellent soft-tissue contrast, and fat-suppressed pre-contrast T1-weighted images have an outstanding ability to detect pancreatic masses, even small lesions. These factors may contribute to the large difference observed in this study between MRI (39.3%) and CT (21.3%) in demonstrating multiple pancreatic masses in AIP. In previous studies reporting CT or MRI findings of AIP, multiplicity was observed more frequently on MRI (33–44%) [5, 6] than on CT (6%) [4], which supports our results.

The diagnostic value of evaluating enhancement patterns using MRI in AIP was addressed in a previous study by Rehnitz et al. [7]. In their study, MRI was found to be significantly better than CT at revealing delayed enhancement of the pancreatic mass in AIP, and similar results were obtained in our study. These results may be attributed to the higher soft-tissue contrast of MRI, as suggested in the previous study. However, in our study, delayed homogeneous enhancement of the pancreatic mass on CT may have been slightly underestimated because, unlike the MRI protocol, the CT protocol lacked a 3-min delayed phase.

The considerable difference between MRI (47.5%) and CT (11.5%) in evaluating multiple MPD strictures in AIP is a noteworthy result of our study. CT images seem to have inherent limitations in depicting the MPD stricture in AIP, which mostly manifests as tapered rather than abrupt narrowing, with upstream MPD dilatation being absent or mild, unlike that in PDA, as shown in this study. Thus, it is not easy to detect the MPD stricture itself using CT. Moreover, it may be even more difficult to detect the slightly dilated MPD between subtle masses in multifocal-type AIP using CT. Conversely, on MRI, especially on heavily T2-weighted cross-sectional or MRP images, MPD narrowing or non-visualization in the strictured segment, as well as the dilated upstream MPD, even if mild, would be easily perceivable because water or fluid is remarkably accentuated as a very bright signal, whereas the other structures appear hypointense and dark. The large difference in the detection of MPD stricture in AIP between MRI (85.2%) and CT (54.1%) can be understood in the same context. MRI, including MRP, is frequently used in the diagnosis of AIP since its various strengths have been demonstrated in many studies [5,6,7,8,9,10,11, 21]. Moreover, as a recent study showed promising results for MRP at high field strength, utilization of MRP in the diagnostic evaluation of AIP as an alternative to invasive ERP may increase further in the future [22].

Another important advantage of MRI revealed in our study was its higher inter-rater agreement, compared with CT, in differentiating non-diffuse-type AIP from PDA. We speculate that the relatively low inter-rater agreement of CT was most likely due to its inferiority in demonstrating the aforementioned key imaging features, which may make image interpretation more subjective than with MRI. In previous studies, inter-rater agreements with regard to the CT interpretation of AIP were modest and varied (κ=0.14–0.75), which supports our conjecture [23, 24]. On the other hand, there may be only a few radiologists who are highly experienced in the diagnosis of AIP as this disease is very rare, particularly the non-diffuse type. Hence, inter-rater variability regarding the image interpretation of non-diffuse-type AIP may be unavoidable to some degree. Nevertheless, given the results of our study, such inter-rater variability is expected to be reduced by appropriate utilization of MRI with MRP in daily practice. Further, the potential benefit of diffusion-weighted MRI (DWI) in differentiating AIP from PDA has recently been highlighted, as the apparent diffusion coefficient values for AIP were consistently lower than those for PDA [5, 6, 10]. Therefore, a combination of conventional MRI with DWI may enhance its diagnostic capability.

Our results suggest that serum concentrations of IgG and IgG4 are also helpful for the differential diagnosis of AIP and PDA as they were not elevated in any of the patients with PDA. However, elevated serum concentrations of IgG (17.5%) and IgG4 (38.6%) were not sensitive for AIP, consistent with previous study results [25, 26]. Moreover, elevation of serum CA19-9 was not specific for PDA as it was elevated in 16.7% of the patients with AIP. These limitations of laboratory tests make it even more crucial for the imaging modalities to accurately differentiate AIP from PDA.

Our study has a few limitations. Firstly, the blinded raters had participated in patient diagnosis in daily practice; hence, recall bias might have occurred. Secondly, there was a discrepancy in the study period between the two patient groups, i.e. 10 years for the AIP group versus 3 years for the PDA group, which may have caused variations in the quality of imaging examinations for the AIP group. The imaging quality of CT may have been affected more than that of MRI for the older cases, which may have increased the difference in diagnostic performance between these two modalities. However, a long study period for the patients with AIP was inevitable due to the rarity of this disease. Thirdly, 25 patients (nine with AIP and 16 with PDA) underwent intervention procedures, including biliary stenting, diagnostic ERP, endoscopic ultrasonography-guided biopsy of the pancreatic mass and endoscopic nasobiliary drainage, between the CT and MRI examinations. However, we found either no definite procedure- or device-related changes or artifacts, or only very minor changes that did not affect image interpretation. Therefore, there is very little chance that the study results were affected by the intervention procedures. Fourthly, as the two raters were restricted to selecting only between non-diffuse AIP and PDA, the difference in the diagnostic performances of CT and MRI reported in this study might overestimate the difference in a real-world scenario, where non-diffuse-type AIP is very rare and overlaps with other entities, such as non-AIP pancreatitis.

In conclusion, MRI has a better diagnostic performance than CT in differentiating non-diffuse-type AIP from PDA, which is due to the superiority of MRI over CT in demonstrating the key distinguishing features of both diseases, including multiple pancreatic masses, delayed homogeneous enhancement of the pancreatic mass and multiple MPD strictures.