Introduction

Pancreatic neuroendocrine tumours (pNETs) are uncommon neoplasms with an estimated incidence of less than 1 per 100,000 per year in the general population [1]. Nevertheless, the incidence and prevalence of this entity have increased over the last decade [2].

One of the most important factors in the management of pNETs is tumour grade, assessed by the Ki67 index, which corresponds to the percentage of tumour cells whose nuclei are stained by the monoclonal antibody MIB-1. According to the European Neuroendocrine Tumour Society (ENETS) and the 2010 revised World Health Organization (WHO) recommendations [3, 4], pNETs are graded according to their Ki67 index: grade 1 (G1) corresponds to tumours with a Ki67 index ≤2 %, grade 2 (G2) to tumours with a Ki67 index of 3–20 %, and grade 3 (G3) to tumours with a Ki67 index >20 %. This grading system is strongly correlated with survival, with 5-year survival rates ranging from 60–100 % for G1 and G2 tumours to 29 % for G3 carcinomas [5]. Recently, these cut-off values have become a subject of debate, and some authors have suggested that 5 % or 10 % cut-offs may be more clinically relevant than 2 % and 20 % [1, 6, 7].
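The grading rule above can be written as a small lookup. The sketch below is illustrative only. The function names are hypothetical, and the Ki67 index is computed from hypothetical counts of MIB-1-positive and total tumour nuclei. Because the text quotes integer bands, the sketch also assumes that any value above 2 % and up to 20 % (e.g. 2.5 %) belongs to G2.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Return the percentage of counted tumour nuclei stained by MIB-1."""
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


def who2010_grade(ki67: float) -> str:
    """Map a Ki67 index (%) to a WHO 2010 / ENETS grade.

    Assumption: values above 2 % and up to 20 % (including non-integer
    values such as 2.5 %) are treated as G2.
    """
    if not 0 <= ki67 <= 100:
        raise ValueError("Ki67 index must be between 0 and 100")
    if ki67 <= 2:
        return "G1"
    if ki67 <= 20:
        return "G2"
    return "G3"


# Hypothetical count: 30 stained nuclei out of 1,000 tumour nuclei
print(who2010_grade(ki67_index(30, 1000)))  # G2
```

In the same way, the 25 % Ki67 index of the resected lesion shown in Fig. 5 maps to G3.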

Well-differentiated G1-G2 tumours require surgical resection when possible, whereas poorly differentiated G3 carcinomas are usually treated with first-line systemic chemotherapy [8, 9]. However, the definitive pathological features of a tumour can only be determined after resection. The Ki67 index is therefore usually assessed preoperatively on endoscopic ultrasound-guided fine-needle aspiration samples. The material obtained may be scant and altered by haemorrhage. This may explain discrepancies between the Ki67 values of the biopsy and those of the resected specimen [10, 11], especially in grade 2 tumours [12, 13]. Other predictors, including imaging features, would therefore be helpful.

On imaging, tumour size [14] and enhancement have been shown to help discriminate tumour grades [15–17]. Only a few studies have addressed the value of imaging features for the assessment of pNETs, and most have focused on tumour enhancement or differential diagnosis with computed tomography (CT) [16–23]. Manfredi et al recently showed that magnetic resonance (MR) imaging could play a role [15]. Moreover, the apparent diffusion coefficient (ADC) has been shown to correlate with tumour cellularity. Diffusion-weighted imaging (DWI) has also been shown to be valuable for the detection and characterization of pNETs [24–26] and for the detection of liver metastases [27]. However, very few studies have assessed the added value of DWI for tumour grading [28–30].

The aim of our study was to assess the value of MR imaging, including DWI, for the evaluation of tumour grade in a series of resected pNETs.

Methods

Patient population

This retrospective study was approved by the institutional review board (IRB), and informed consent was waived. Between 2006 and 2014, 554 resected pNETs were identified in the local pathology database. Inclusion criteria were the presence of at least one resected pNET and a preoperative contrast-enhanced MR imaging examination performed within 3 months before resection. This examination had to include a DWI sequence with at least three b-values. Demographic, clinical, biological, and outcome data were retrieved from medical records.

The final population included 94 patients (48 women, 51 %; mean age 52 ± 12 years, range 29–79) with 108 resected tumours (Fig. 1). The median interval between preoperative MR imaging and resection was 2 months (interquartile range [IQR] 0–9).

Fig. 1

Flow chart for the study population

Magnetic resonance imaging

MR imaging was performed on a 1.5-T imager (Intera; Philips Healthcare, Best, the Netherlands) using a phased-array surface coil after 4–6 h of fasting. The protocol (Table 1) included a T2-weighted single-shot sequence and a T2-weighted fast spin-echo sequence with spectral fat saturation. It also included a transverse breath-hold 3D T1-weighted fat-suppressed spoiled gradient-recalled echo sequence, acquired before and after dynamic injection of a gadolinium chelate (0.1 mmol/kg body weight). The contrast agent was followed by a 20-mL saline flush, both administered at 2 mL/s with a power injector. Pancreatic arterial-dominant, portal venous, and equilibrium phase sequences were acquired approximately 35 s (bolus trigger), 60–70 s, and 180–200 s after intravenous contrast injection, respectively. A free-breathing, fat-suppressed, single-shot echo planar DW sequence was performed before contrast injection with b-values of 0, 150, and 600 s/mm². Cardiac gating was not used, and no antispasmodic drugs were administered.

Table 1 MR sequence parameters

Image analysis

MR images were retrospectively reviewed by two abdominal radiologists (EL, a senior resident, and MR, with 10 years of experience) on a dedicated workstation (Carestream Health, Rochester, NY, USA). Readers were aware of the diagnosis of NET but blinded to the clinical, biological, and pathological features of patients and tumours.

Qualitative image analysis

Image analysis included the following features:

- site of the pancreatic tumours;
- signal intensity on T1-w, T2-w, and DW images compared with the adjacent pancreatic parenchyma (hypo-, iso-, or hyperintense);
- homogeneity or heterogeneity on T2-w images;
- cystic components, defined as areas of marked T2-w signal intensity similar to that of fluid;
- dilatation of the main pancreatic duct (≥2.5 mm) and dilatation of the bile ducts;
- arterial or venous encasement and infiltration of peripancreatic fat;
- contrast enhancement of the tumour on arterial/pancreatic, portal venous, and delayed phase images compared with the adjacent parenchyma (hypo-, iso-, or hyperintense);
- presence of lymph nodes >1 cm and of liver metastases.

The “enhancement pattern” was also recorded. It was defined as the combination of signal intensities (hypo-, iso-, or hyperintensity) on consecutive arterial and portal venous phase images.

Quantitative image analysis

Quantitative analysis included measurement of the largest axial tumour diameter and calculation of diffusion-related parameters. Readers placed an ellipsoid region of interest (ROI) on the equatorial plane of each lesion on the intermediate b-value image (b = 150 s/mm²). The ROI was drawn to include the largest possible portion of the tumour, excluding surrounding pancreatic parenchyma and cystic components. ROIs were then copied onto the other b-value images, and the mean and minimum signal intensity (SI) on each b-value image were recorded. The mean and minimum apparent diffusion coefficient (ADC) values and the pure diffusion coefficient (D) were calculated using a monoexponential fit: b = 0, 150, and 600 s/mm² for ADC, and b = 150 and 600 s/mm² for D. When possible, the same analysis was performed on the upstream and downstream pancreas by placing an ellipsoid ROI on the pancreatic parenchyma, avoiding the main pancreatic duct.
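The sketch below illustrates the fitting step under one assumption: the standard monoexponential model S(b) = S0·exp(−b·ADC) is applied to the ROI mean signal intensities. Taking logarithms makes the model linear in b, so the coefficient equals minus the least-squares slope of ln S against b. With the two b-values used for D (150 and 600 s/mm²), this reduces to ln(S150/S600)/(600 − 150). The function names and the two-compartment toy signal are hypothetical. This is not the workstation's implementation.

```python
import math
from typing import Sequence


def fit_monoexponential(b_values: Sequence[float], signals: Sequence[float]) -> float:
    """Fit S(b) = S0 * exp(-b * coef) by log-linear least squares.

    b_values are in s/mm^2, so the returned coefficient is in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two paired (b, S) samples")
    if any(s <= 0 for s in signals):
        raise ValueError("signal intensities must be positive")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be equal")
    # ln S = ln S0 - b * coef, so coef is minus the regression slope
    return -sxy / sxx


def toy_signal(b: float) -> float:
    """Hypothetical ROI signal: 80 % slow (diffusion) + 20 % fast (perfusion-like) component."""
    return 1000.0 * (0.8 * math.exp(-b * 1.0e-3) + 0.2 * math.exp(-b * 20e-3))


adc = fit_monoexponential([0, 150, 600], [toy_signal(b) for b in (0, 150, 600)])
d = fit_monoexponential([150, 600], [toy_signal(b) for b in (150, 600)])
print(f"ADC = {adc * 1e3:.2f}, D = {d * 1e3:.2f} (x10^-3 mm^2/s)")
```

In this toy example, the fast component that is still present at b = 0 raises ADC above D, while D stays close to the slow-compartment value. This is presumably why b = 0 is excluded from the D fit, consistent with D being described as a pure diffusion coefficient.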

Pathology analysis

Haematoxylin and eosin-stained slides of each lesion were retrospectively reviewed by a pancreatic pathologist (JC) blinded to MR imaging data. The following were recorded:

- tumour differentiation and grade (according to WHO-2010 and ENETS-2012);
- angioinvasion and perineural invasion;
- regional lymph node and distant metastases;
- the Ki67 labelling index (MIB-1 antibody; DAKO, Milano, Italy).

Recent studies have shown that alternative subdivisions of the Ki67 index improve prognostic evaluation, and these were also applied: 0–5 %, 6–15 %, 16–35 %, 36–55 %, and >55 %, or 0–2 %, 3–10 %, and >10 % [1, 6, 7]. The adjacent pancreatic parenchyma was also analysed.
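The alternative Ki67 subdivisions can be applied with a sorted list of class boundaries. In this sketch, the helper name is hypothetical. It also assumes that each upper bound is inclusive (for example, 5 % belongs to "0-5 %" and 5.5 % to "6-15 %"), because the text gives integer bands only.

```python
from bisect import bisect_left
from typing import Sequence

# Upper bounds are treated as inclusive (an assumption for non-integer values)
FIVE_CLASS_BOUNDS = (5, 15, 35, 55)
FIVE_CLASS_LABELS = ("0-5 %", "6-15 %", "16-35 %", "36-55 %", ">55 %")
THREE_CLASS_BOUNDS = (2, 10)
THREE_CLASS_LABELS = ("0-2 %", "3-10 %", ">10 %")


def ki67_class(ki67: float, bounds: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the Ki67 class containing ki67 (in %)."""
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    if not 0 <= ki67 <= 100:
        raise ValueError("Ki67 index must be between 0 and 100")
    # bisect_left finds the first bound >= ki67, so each bound closes its class
    return labels[bisect_left(bounds, ki67)]


print(ki67_class(25, FIVE_CLASS_BOUNDS, FIVE_CLASS_LABELS))  # 16-35 %
print(ki67_class(25, THREE_CLASS_BOUNDS, THREE_CLASS_LABELS))  # >10 %
```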

Statistical analysis

Results are presented as mean (standard deviation) or median (range) for quantitative data, and as number (percentage) of cases for categorical variables. Continuous variables were compared between subgroups with Student's t-test or the Mann–Whitney U test, depending on their distribution. Categorical data were compared with the Chi² test or Fisher's exact test, as appropriate.

In patients with multiple lesions, tumour grade and Ki67 varied between lesions; lesions were therefore considered independent. The diagnostic value of D and ADC for differentiating tumour grades was assessed by receiver operating characteristic (ROC) curve analysis. Curves were compared using the DeLong test, cut-off values were chosen by maximizing the Youden index, and the corresponding sensitivity and specificity were computed. All tests were two-sided, and p < 0.05 was considered significant. For post-hoc tests, p < 0.017 was considered significant according to the Bonferroni correction. Inter-reader agreement for ADC and D was assessed using Bland–Altman plots and the intraclass correlation coefficient (ICC). All analyses were performed using the Statistical Package for the Social Sciences software (version 20.0, IBM SPSS Inc., Armonk, NY, USA).
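The sketch below makes the cut-off selection concrete. It computes an empirical AUROC and the Youden-optimal threshold for a marker in which lower values indicate the target class, as for ADC and G3 tumours. It is a plain-Python illustration with hypothetical function names and toy values, not the SPSS procedure, and it omits the DeLong comparison of correlated curves. The Bonferroni threshold assumes three pairwise post-hoc comparisons between grades (G1 vs. G2, G1 vs. G3, G2 vs. G3), which would give 0.05/3 ≈ 0.017.

```python
from typing import Sequence, Tuple

# Assumed three pairwise post-hoc comparisons between grades: 0.05 / 3
BONFERRONI_ALPHA = 0.05 / 3


def auroc_lower_is_positive(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Empirical AUROC when lower marker values indicate the positive class.

    Equals the probability that a random positive lesion has a lower value
    than a random negative lesion, counting ties as one half.
    """
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for p in positives:
        for n in negatives:
            if p < n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(positives) * len(negatives))


def youden_cutoff(positives: Sequence[float], negatives: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing J = Se + Sp - 1.

    A lesion is called positive when its value is <= cutoff; candidate
    cut-offs are the observed values.
    """
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    best = None
    for cutoff in sorted(set(positives) | set(negatives)):
        sens = sum(p <= cutoff for p in positives) / len(positives)
        spec = sum(n > cutoff for n in negatives) / len(negatives)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical ADCmean values (x10^-3 mm^2/s): G3 lesions vs. G1-2 lesions
g3 = [0.6, 0.7, 0.9, 1.1]
g1_2 = [1.0, 1.3, 1.6, 2.0, 2.4]
print(auroc_lower_is_positive(g3, g1_2))  # 0.95
print(youden_cutoff(g3, g1_2))  # (1.1, 1.0, 0.8)
```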

Results

Patient and tumour characteristics

A total of 94 patients (48 women, 51 %; mean age 52 ± 12 years) with 108 tumours (mean size 40 ± 33 mm) were analysed (Table 2). Eighty-eight patients (94 %) had one lesion, three had two tumours, and the remaining three patients had 3, 4, and 7 tumours, respectively. Seventeen patients (18 %) had functioning tumours: gastrinoma (n = 12), insulinoma (n = 3), VIPoma (n = 1), and glucagonoma (n = 1).

Table 2 Patient and tumour characteristics

At pathology, 55 tumours (51 %) were classified as G1, 42 (39 %) as G2, and 11 (10 %) as G3. Size increased with grade (26 ± 17, 47 ± 34, and 86 ± 52 mm for G1, G2, and G3, respectively; p < 0.001), and size was significantly correlated with Ki67 (r = 0.54, p < 0.001). Among G3 tumours, five (45 %) were poorly differentiated. These lesions tended to be larger than well-differentiated G3 tumours, although the difference was not significant (109 ± 73 vs. 71 ± 31 mm, p = 0.279). They also had a higher Ki67 (84 ± 11 % vs. 34 ± 15 %, p < 0.001).

Qualitative analysis

Arterial and venous encasement (p = 0.041 and p = 0.002, respectively), peritumoural fat invasion, lymph nodes, and liver metastases (all p < 0.001) were significantly associated with tumour grade. Pancreatic duct dilatation was more frequent in G3 tumours (36 % vs. 11 % and 26 % for G1 and G2, respectively), although the difference was not significant (p = 0.055) (Table 3).

Table 3 Qualitative MR imaging features

On arterial phase images, most G1 and G2 lesions were hypervascular (80 % and 79 %, respectively), whereas all G3 tumours were hypo- or isointense. On portal venous phase images, most G1 lesions were hyperintense (80 %), whereas most G3 lesions were hypo- or isointense (83 %). All lesions were hyperintense on b0 and b150 images, and most lesions (93 %) were hyperintense on high b-value images.

Several enhancement patterns were observed (Table 3). Hyperintensity on both arterial and portal venous phase images was significantly more frequent in G1 tumours (71 % vs. 60 % and 0 % for G2 and G3, respectively; p = 0.003). A hypo/iso pattern (i.e. hypo/hypo, hypo/iso, or iso/iso) was more frequent in G3 lesions (100 % vs. 20 % and 19 % for G1 and G2, respectively; p < 0.001).
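The pattern grouping can be stated explicitly as a mapping from the pair of (arterial, portal venous) signal intensities. In this hypothetical encoding, only the three combinations listed in the text count as "hypo/iso". The text does not say whether the reverse combination iso/hypo was also grouped there, so the sketch assigns it to "other".

```python
from enum import Enum


class Signal(Enum):
    """Tumour signal relative to the adjacent pancreatic parenchyma."""
    HYPO = "hypo"
    ISO = "iso"
    HYPER = "hyper"


# Combinations listed in the text as the hypo/iso pattern (arterial, portal venous)
HYPO_ISO_COMBINATIONS = frozenset({
    (Signal.HYPO, Signal.HYPO),
    (Signal.HYPO, Signal.ISO),
    (Signal.ISO, Signal.ISO),
})


def enhancement_pattern(arterial: Signal, portal: Signal) -> str:
    """Classify the arterial/portal venous enhancement pattern (hypothetical helper)."""
    if arterial is Signal.HYPER and portal is Signal.HYPER:
        return "hyper/hyper"
    if (arterial, portal) in HYPO_ISO_COMBINATIONS:
        return "hypo/iso"
    # e.g. hyper/iso, hyper/hypo, iso/hyper, or iso/hypo
    return "other"


print(enhancement_pattern(Signal.ISO, Signal.ISO))  # hypo/iso
```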

Quantitative analysis

The mean ADC value for the whole cohort was 1.88 ± 0.76 and 1.86 ± 0.70 × 10⁻³ mm²/s for readers 1 and 2, respectively (Table 4). The mean D value was 1.80 ± 0.73 and 1.79 ± 0.81 × 10⁻³ mm²/s for readers 1 and 2, respectively. Measurements were significantly correlated between readers (r = 0.90 for ADCmean and r = 0.89 for D, both p < 0.001). The bias (limits of agreement) between the two readings was -2.7 % (-38 %, +32 %) for ADCmean and +1.3 % (-41 %, +46 %) for D. The ICC was 0.90 and 0.87 for ADC and D, respectively.
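The percentage bias and limits of agreement above can be reproduced from paired readings as sketched below. The sketch makes two assumptions: each difference is reader 1 minus reader 2, expressed as a percentage of the pair mean, and the limits are bias ± 1.96 standard deviations. The function name and the readings are hypothetical, and the ICC is not computed here.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman_percent(reader1: Sequence[float], reader2: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement, in percent.

    Each paired difference is expressed as 100 * (r1 - r2) / mean(r1, r2);
    limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    if len(reader1) != len(reader2) or len(reader1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(reader1, reader2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical ADCmean readings (x10^-3 mm^2/s) for four lesions
r1 = [1.20, 0.95, 2.10, 1.60]
r2 = [1.25, 0.90, 2.00, 1.70]
bias, lower, upper = bland_altman_percent(r1, r2)
print(f"bias {bias:+.1f} % (limits {lower:+.1f} %, {upper:+.1f} %)")
```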

Table 4 Diffusion-weighted MR imaging parameters

From this point forward, results are presented for the most experienced reader (reader 1). ADCmean values differed significantly among tumour grades, with lower values in higher-grade tumours (p < 0.001; Table 4, Fig. 2). Similar results were observed for the minimum ADC value (ADCmin) and for D (p < 0.001 for both; Table 4, Fig. 2). ADCmean and D values were significantly correlated (r = 0.91, p < 0.001).

Fig. 2

Distribution of the ADCmean (a), ADCmin (b), and D (c) values according to tumour grade. ADC and D values were significantly lower in higher tumour grades (p < 0.001 for all). * p < 0.05, ** p < 0.01, *** p < 0.001

Poorly differentiated G3 tumours had significantly lower ADCmean and ADCmin values than well-differentiated G3 tumours: 0.69 ± 0.19 vs. 0.97 ± 0.16 × 10⁻³ mm²/s (p = 0.042) for ADCmean, and 0.62 ± 0.23 vs. 0.89 ± 0.14 × 10⁻³ mm²/s (p = 0.044) for ADCmin. D did not differ significantly (p = 0.194).

ADCmean and D values were inversely correlated with tumour size (r = -0.26, p = 0.008 for ADCmean and r = -0.23, p = 0.02 for D) and with Ki67 (r = -0.55 for ADCmean and r = -0.41 for D, both p < 0.001). Tumours were also grouped according to the Ki67 classes proposed in recent publications (0–5 %, 6–15 %, 16–35 %, 36–55 %, >55 %). With this grouping, both ADCmean and D decreased significantly and gradually across classes (p < 0.001 for both; Table 5, Fig. 3). When G2 tumours were subdivided according to Ki67 (3–10 % and 11–20 %), ADCmean and D both decreased significantly (p < 0.001 and p = 0.0008, respectively; Table 5, Fig. 3). Comparison with the pancreatic parenchyma is provided as supplemental material.

Table 5 Diffusion-weighted MR parameters according to alternative Ki-67 thresholds
Fig. 3

Distribution of the ADCmean (a) and D (b) values according to tumour Ki67 class. ADCmean and D values were significantly lower in higher Ki67 classes (p < 0.001 for both). Distribution of the ADCmean (c) and D (d) values according to the Ki67 of G1 and G2 lesions. ADCmean and D values were significantly lower as Ki67 increased (p < 0.001 for both)

Prediction of tumour grade

For the identification of G3 tumours (vs. G1-2), the sensitivity of the qualitative morphological features ranged from 36 % to 100 % and their specificity from 33 % to 95 %, with AUROCs ranging from 0.49 to 0.90 (Table 6). ADCmean and D showed the largest AUROCs (0.96 and 0.95, respectively; p > 0.05 for their comparison; Fig. 4). ADCmin had an AUROC of 0.83 (p = 0.04 vs. ADCmean and p = 0.06 vs. D). For the identification of G3 lesions, the optimal cut-off value was 1.19 × 10⁻³ mm²/s for ADCmean (sensitivity 100 %, specificity 92 %) and 1.04 × 10⁻³ mm²/s for D (sensitivity 83 %, specificity 92 %) (Fig. 5).
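For a single lesion, the reported cut-offs act as simple thresholds. The sketch below illustrates this rule and is not a validated clinical tool. It assumes that values at or below the cut-off are flagged, which the text does not specify, and it keeps ADCmean and D as separate flags rather than combining them. The helper name is hypothetical.

```python
from typing import Dict, Optional

# Reported Youden-optimal cut-offs for G3 vs. G1-2 (x10^-3 mm^2/s)
ADC_MEAN_CUTOFF = 1.19
D_CUTOFF = 1.04


def g3_flags(adc_mean: float, d: Optional[float] = None) -> Dict[str, bool]:
    """Flag each parameter that falls at or below its G3 cut-off (assumed <=)."""
    flags = {"adc_mean": adc_mean <= ADC_MEAN_CUTOFF}
    if d is not None:
        flags["d"] = d <= D_CUTOFF
    return flags


# Lesion shown in Fig. 5: ADCmean 1.15 x10^-3 mm^2/s, D not reported
print(g3_flags(1.15))  # {'adc_mean': True}
```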

Table 6 Diagnostic performances of clinical and imaging features
Fig. 4

ROC curves of ADCmean (black lines), ADCmin (thin black line), and D (grey lines) for the differentiation between G1-2 and G3 tumours (a), and between G1 and G2-3 tumours (b)

Fig. 5

Example of a G3 pNET of the body of the pancreas in a 66-year-old woman. MR imaging showed a large (>2 cm), non-hypervascular, lobulated tumour on arterial phase images (arrow in a) with a mean ADC value of 1.15 × 10⁻³ mm²/s (b). All features were consistent with a G3 tumour. The lesion was resected, and pathological analysis confirmed the tumour grade (Ki67 = 25 %)

Qualitative morphological features were less sensitive and specific for the identification of G1 tumours (vs. G2-3) (Supplemental Figure 1). Sensitivity ranged from 49 % to 95 % and specificity from 15 % to 91 %, with AUROCs ranging from 0.52 to 0.73 (Table 6). AUROCs for ADCmean, ADCmin, and D were 0.75, 0.75, and 0.66, respectively (all comparisons p > 0.05, Fig. 4). Supplemental Figure 2 provides a combination of morphological and functional imaging features for the stratification of tumour grades.

Discussion

The present study shows that morphological and functional MR findings, including DWI, have predictive value for pNET tumour grade, especially for the differentiation between G3 and G1-2 tumours.

Tumour grade has been shown to be strongly associated with tumour size [14], so small lesions are more likely to have low proliferation. This explains why previous studies have found that incidentally detected non-functioning lesions <2 cm can be managed by active surveillance: the risk of malignancy and distant progression is very low [31–33]. The same held in the present study. All G3 lesions were >3 cm, 75 % of lesions <10 mm were G1, and tumour size was significantly correlated with Ki67. This has also been reported in other series evaluating imaging features with MR [15, 34] or CT [16, 19, 35].

Nevertheless, size alone is insufficient for the accurate discrimination of tumour grade, especially between G1 and G2 tumours. Moreover, small lesions may be aggressive [36]. Therefore, numerous other imaging features have been described including vascular invasion or the presence of regional lymph nodes [15, 19].

One important feature classically associated with pNETs is hypervascularity. Studies have shown that tumour enhancement on CT is correlated with tumour vascularity on light microscopy, and that low-enhancing NETs are poorly differentiated and have decreased overall survival [20, 35]. In the present series, most G1 and, to a lesser extent, G2 lesions were hypervascular, whereas no G3 tumour showed this feature. Recently, Capelli et al suggested that, in addition to arterial phase enhancement, the enhancement pattern during the portal venous and delayed phases might also provide information [16]. In their study, tumours with significant delayed enhancement alone were often neuroendocrine carcinomas. Our results do not support these observations. Most G3 lesions were hypo- or isointense on both arterial and portal venous phase images, and 33 % of them were hyperintense on delayed phase images. However, this finding was not specific to G3 tumours. The discrepancy may be partially explained by the different imaging modalities, as Capelli et al used contrast-enhanced CT [16].

ADC and D were inversely related to tumour size. However, an important finding of this study is their stronger negative correlation with the degree of tumour proliferation, consistent with previous studies by Wang et al. and Jang et al. [28, 29]. Interestingly, this correlation with Ki67 labelling was significant for both ADC and D. Our results differ from those of Hwang et al, who used intravoxel incoherent motion (IVIM) imaging and found no difference in ADC values between G1 and G2-3 lesions [34]. Because low-grade NETs are hypervascular, perfusion plays a significant role in tumour behaviour and characterization [37]. D’Assignies et al also showed that perfusion CT values are correlated with histoprognostic factors such as the proliferation index and the WHO classification [18]. The parameter D, which reflects pure water diffusion, could therefore be expected to discriminate tumour grade less well than ADC. In practice, however, the AUROCs of ADC and D for the differentiation between G3 and G1-2 lesions were similar. The difference between our results and those of Hwang et al may be due to the smaller number of pNETs in their study (40 tumours, including only one G3 tumour).

We also analysed the performance of the minimum ADC value, as previous studies have reported a significant correlation between ADCmin and Ki67 in other tumours [38, 39]. As expected, ADCmin decreased as tumour grade and Ki67 increased, but its performance was lower than that of ADCmean. Further studies are necessary to validate these results.

Size and ADC values helped distinguish G1 from G2 tumours among hypervascular lesions, although with lower predictive value. Indeed, ADC and D values differed significantly among the three tumour grades, but their AUROCs were lower for the differentiation between G1 and G2 tumours, and the two groups overlapped substantially. This is partially explained by the narrow range of Ki67 labelling in G1 tumours and the wider range in G2 tumours [2]. Recently, authors have shown that modified Ki67 thresholds provide clinically effective prognostic stratification of patients [1, 6, 7]. Pending a possible revision of this classification, we also applied these modified thresholds (i.e. 0–5 %, 6–15 %, 16–35 %, 36–55 %, >55 %). With these thresholds, we observed a more gradual decrease in both ADC and D values and better discrimination of lesions with low Ki67 labelling.

Finally, the ADC and D values of the pancreatic parenchyma did not help predict tumour grade. Although this was not the main objective of the present study, tumour ADC and D values were significantly lower than those of the upstream pancreas in all grades. They were also lower than those of the downstream pancreas in high-grade tumours. This might help differentiate tumours from the pancreatic parenchyma in difficult cases, although it requires further validation.

Besides its retrospective design, our study has certain limitations. First, there were fewer G3 tumours than G1 and G2 tumours. This was partly due to our selection process, as we only included patients with resected lesions. It also reflects the lower frequency of G3 than G1-2 tumours (around 11 % in a series of more than 35,000 patients [40]). Our population can therefore reasonably be considered representative of the distribution of pNETs in the general population. Importantly, including patients who underwent biopsy rather than resection would have exposed the study to a risk of misclassification, as biopsy is subject to variability and error. Second, we did not analyse patient outcomes or the influence of diffusion parameters on survival or recurrence.

In conclusion, the combination of morphological and functional MR imaging features is useful for the discrimination of tumour grade in pNETs. Diffusion imaging parameters are significantly associated with tumour proliferation and are helpful for the identification of high-grade and poorly differentiated tumours.