Introduction

Pancreatic neuroendocrine tumors (pNETs) are highly heterogeneous and may originate from the mature pancreatic endocrine cells and pluripotent stem cells of the pancreas [1]. pNETs are rare, and 70% of cases are non-functional tumors, which also produce a series of substances but do not cause any hormonal syndrome [2, 3]. Although surgical resection is the recommended treatment for patients with non-functional (NF)-pNETs, it is important to define the grade of the tumors, because the grade determines patient eligibility for surgical treatment.

The current World Health Organization (WHO) classification for endocrine organs (WHO 2019) was devised for pNETs alone [4]. This classification endorses the WHO 2010 principles [5], which categorized pNETs as low- (G1), intermediate- (G2), or high-grade (G3) on the basis of the mitotic rate and the fraction of Ki-67-positive tumor cells (Ki-67 index) [6]. However, the WHO 2019 classification introduced a definition of NET G3 for neoplasms that are well differentiated in morphology but display a proliferation index in the G3 range [7]. It now divides the neoplasms previously classified as G3 neuroendocrine carcinomas into well-differentiated tumors (NET G3) and poorly differentiated neoplasms (neuroendocrine carcinoma G3), with the latter being subdivided into the small-cell and large-cell types [8, 9, 10]. This improved characterization of G3 tumors has therapeutic implications [8, 11, 12].

In general, the risk of tumor progression increases by 2% for every 1% increase in the Ki-67 index [13]. G2 and G3 pNETs show a poorer prognosis and often require more intensive treatment than G1 pNETs. Accurate preoperative pNET grading is therefore very important for selecting the appropriate clinical treatment and evaluating prognosis. However, the current classification systems are unhelpful for surgical planning.

Multi-slice spiral computed tomography (MSCT) and magnetic resonance imaging (MRI) have become the most commonly used tools for the diagnosis of pNETs because they are non-invasive and highly efficient. However, their overall value for preoperative pNET grading is limited [14].

Radiomics is a method that is gaining importance in cancer research [15, 16]. In this method, high-throughput mining of quantitative image features from standard-of-care medical imaging enables data to be extracted and applied within clinical decision support systems to improve diagnostic, prognostic, and predictive accuracy. Few studies to date have used radiomics, especially MRI radiomics, to predict the grades of NF-pNETs [17,18,19,20,21,22]. Thus, our aim was to evaluate whether the preoperative MRI rad-score was associated with grading and to evaluate its potential to differentiate G1 from G2/3 NF-pNETs.

Materials and methods

Patients

This retrospective single-center study was reviewed and approved by the Biomedical Research Ethics Committee of our institution. All patients provided written informed consent. We screened the institutional database for medical records of patients with histologically confirmed pNETs who underwent surgical resection with curative intent from November 2012 to December 2019. We included patients who had undergone surgical treatment and whose pNETs were confirmed by hematoxylin and eosin and immunohistochemical staining. Patients were excluded if they (1) had received treatment (radiotherapy, chemotherapy, or chemoradiotherapy) before undergoing imaging examinations, (2) had not undergone preoperative standard contrast-enhanced MRI during the month prior to surgery, (3) had pancreatic lesions that could not be visualized using MRI, (4) had other tumors in the pancreas, or (5) had confirmed functional pNETs, defined by the development of a distinct clinical syndrome together with hormone secretion detectable as abnormally increased serum levels or by immunohistochemical staining of the tumor tissue with the related antibodies. The patient selection process is shown in Fig. 1. Consequently, 157 consecutive patients with NF-pNETs were included in this cross-sectional study.

Fig. 1 Flow chart visualizing the patient selection process

MRI acquisition methods

All patients included in this study underwent dynamic contrast-enhanced MRI of the pancreas on 3.0-T systems (Signa Excite 3.0 T, GE Healthcare, Milwaukee, USA; Signa HDxt MR750 3.0-T, GE Healthcare; Skyra 3.0-T, Siemens, Erlangen, Germany). All patients were positioned supine with a phased array receiver coil covering the upper abdomen. The protocol comprised a breath-hold, single-shot fast spin-echo coronal T2-weighted sequence (repetition time/echo time [TR/TE] = 6316/87 ms; field of view [FoV] = 360 × 420 mm²; matrix = 224 × 270; slice thickness = 5 mm; slice gap = 1 mm) and unenhanced and dynamic contrast-enhanced T1-weighted fat-suppressed sequences (TR/TE = 2.58/1.18 ms; FoV = 440 × 440 mm²; matrix = 224 × 270; slice thickness = 5 mm; no slice gap). Dynamic contrast-enhanced images, including arterial (15 s), pancreatic parenchymal (20 s), and portal venous phase (40 s) images, were obtained with gadopentetate dimeglumine (Magnevist and Gadovist, Bayer Schering Pharma, Berlin, Germany). The contrast agent was administered intravenously at a rate of 2 mL/s and a dose of 0.2 mL/kg, followed by 20 mL of normal saline (to flush the tube).

Differences in imaging factors cause inconsistencies in the image intensity of tissues of the same nature. We used the following formula for intensity normalization (Eq. 1), where \(x\) represents the original intensity, \(f(x)\) the normalized intensity, \({\mu }_{x}\) the mean intensity, \({\sigma }_{x}\) the standard deviation, and \(s\) an optional scaling ratio, which was set to 1 by default (https://pyradiomics.readthedocs.io/en/latest/radiomics.html#radiomics.imageoperations.normalizeImage). While retaining the intensity differences of diagnostic value, this step reduces or even eliminates the intensity inconsistency caused by differences in imaging parameters before the subsequent radiomics analysis.

$$f\left(x\right)=\frac{s\left(x-{\mu }_{x}\right)}{{\sigma }_{x}}$$
(1)
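
As an illustration of Eq. 1, the following minimal sketch applies the same z-score normalization to a flat list of voxel intensities. The function name, the use of the population standard deviation, and the intensity values are assumptions made for this example; the study itself relied on the Pyradiomics implementation linked above.

```python
from statistics import mean, pstdev


def normalize_intensity(values, scale=1.0):
    """Apply f(x) = s * (x - mean) / std to every intensity (Eq. 1)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [scale * (v - mu) / sigma for v in values]


# Made-up intensities from a hypothetical region of interest.
voxels = [100.0, 120.0, 140.0, 160.0]
normalized = normalize_intensity(voxels)
```

After normalization with s = 1, the intensities have a mean of 0 and a standard deviation of 1, so images acquired with different parameters are brought onto a comparable scale.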

MRI analysis

All MR images were analyzed in consensus by two board-certified abdominal radiologists with 20 and 18 years of experience, respectively, who were aware that the study population had pNETs but were blinded to the tumor grade.

All tumors were evaluated for: (a) tumor location (head [uncinate and non-uncinate], body, or tail of the pancreas or multiple locations in the pancreas), (b) size (the maximum diameter of the tumor in the cross section) [23], (c) shape (regular or irregular), (d) margin (well delineated [tumor margin smooth and clearly visible] or ill delineated [with spiculation or infiltration on > 90° of the tumor perimeter]), (e) cystic changes within the tumor (non-enhancing areas of circular or ovoid shape or well-defined margin and high T2-weighted image [T2WI] signal intensity), (f) dilation (pancreatic and common bile duct dilation [> 3 mm and > 10 mm, respectively]), (g) tumor intensity on T1-weighted images (T1WI) and T2WI (low-, iso-, and high-intensity tumors), (h) the phase of peak enhancement value, (i) organ invasion (invasion of the liver, spleen, intestine, or stomach; the criterion was that tumors could not be separated from the organ/s), and (j) vascular invasion (invasion of the common hepatic artery, splenic artery and vein, gastroduodenal artery, superior mesenteric artery and vein, and portal vein; the criteria were vessel occlusion, stenosis, or more than half of the perimeter being in contact with the tumor).

Freehand regions of interest (ROIs) were drawn on the maximum cross-sectional diameter of the tumor. The shape, area, and location of the ROIs were the same in all three phases. Special care was taken to avoid pancreatic vessels, ducts, and artifacts in the ROI placement.

Radiomics workflow

The radiomics workflow included the following: (a) image segmentation, (b) feature extraction, (c) feature reduction and selection, and (d) rad-score calculation.

We used the draw tool, available in the Editor module of 3D Slicer version 4.8.1 (open source software; https://www.slicer.org/), to delineate the tumors in multiple slices. We extracted the volume of interest for each patient by stacking the corresponding ROIs, delineated slice by slice. Radiomic feature extraction was performed using the open source Python package Pyradiomics 1.2.0 (https://www.radiomics.io/pyradiomics.html). We used two classes of feature extraction methods: the original feature class and the filter class, which in turn included five categories: wavelet, square, square root, logarithm, and exponential. We extracted 1409 two-dimensional (2D) and three-dimensional (3D) features from the primary tumors in each scan phase. More information about the radiomics feature extraction methodology is reported in Supplementary digital content 1.

Feature selection comprised three steps: variance analysis, Spearman correlation analysis, and least absolute shrinkage and selection operator (LASSO) logistic regression. This method is reported in Supplementary digital content 2. The sequential method of Bonferroni correction was applied to adjust the baseline significance level (\(\alpha\) = 0.05) for multiple testing [24, 25]. Finally, a rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective coefficients.
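
The sequential Bonferroni adjustment can be illustrated with a short sketch. It assumes that the sequential method refers to Holm's step-down procedure; the function name and the p-values are hypothetical and are not taken from the study's analysis code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where it is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # every larger p-value is retained as well
    return rejected


# Hypothetical p-values for four candidate features.
example_p = [0.001, 0.04, 0.012, 0.03]
decisions = holm_bonferroni(example_p)
```

Unlike the plain Bonferroni correction, which compares every p-value with \(\alpha\)/m, the step-down procedure relaxes the threshold after each rejection and is therefore less conservative while still controlling the family-wise error rate.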

To assess interobserver reliability, the ROI segmentation was performed in a blinded fashion by two radiologists; for intraobserver reliability, reader 1 extracted the features twice, with a time interval of one week between the sessions. Reliability was quantified with the intraclass correlation coefficient (ICC). Features with ICC values > 0.75 were selected for subsequent investigation.
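
The reliability filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the function name and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per tumor, one column per reading."""
    n = len(ratings)      # number of tumors
    k = len(ratings[0])   # number of readings per tumor
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values of one feature measured by two readers on five tumors.
feature_readings = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.9]]
keep_feature = icc_2_1(feature_readings) > 0.75  # threshold used in the text
```

A feature is carried forward only when its ICC exceeds 0.75, so features that are sensitive to small differences in ROI delineation are dropped before selection.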

Pathological image analysis

The formalin-fixed specimens were processed into paraffin according to the standard routine methods [26]; 5-μm-thick sections were obtained and stained with hematoxylin and eosin (HE) stain for conventional histological examination and were assessed according to the WHO 2019 guidelines. Immunohistochemistry was performed according to the standard methods. We used monoclonal mouse anti-human antibodies against Ki-67 (Dako, Glostrup, Denmark).

All specimens were analyzed by two pathologists with 30 and 20 years of experience in pancreatic pathology, respectively. Pathologic information for the following factors was recorded: (1) G grading performed according to the WHO 2019 guidelines, (2) tumor node metastasis (TNM) staging performed according to the American Joint Committee on Cancer (AJCC) TNM Staging (8th Edition) [27], (3) AJCC clinical stage (8th Edition) [27], (4) lymphovascular space invasion (LVSI), (5) nerve invasion, (6) common bile duct invasion, and (7) duodenal invasion.

Statistical analysis

Normal distribution and variance homogeneity tests were performed on all continuous variables. Continuous variables with a normal distribution are expressed as mean values \(\pm\) standard deviation; otherwise, they are expressed as the median and range (min–max). The rad-score was evenly divided into two groups (< 0.51 and \(\ge\) 0.51), with the values of < 0.51 constituting the reference group. First, we examined the group differences between the patients with G1 and G2/3 pNETs. The t-test (normal distribution), Kruskal–Wallis H test (skewed distribution), and chi-square test (categorical variables) were used to determine statistical differences between the two groups. Univariate regression analysis was applied to estimate the effect size between all variables and grades of NF-pNETs. Multivariable analyses were used to evaluate the associations between the rad-score and tumor grades in three models: model 1 (not adjusted for other covariates), model 2 (adjusted for age, sex, and body mass index [BMI]), and model 3 (adjusted for the same factors as those in model 2 and for the other factors associated with grade in the univariate regression analysis). Receiver operating characteristic (ROC) curves were used to find the optimal cut-off value of the rad-score (maximizing the sum of sensitivity and specificity) for predicting the NF-pNET grades. Finally, the clinical usefulness of the rad-score was tested with decision curve analysis (DCA). A retrospective power analysis was performed.
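
The cut-off criterion described above (maximizing the sum of sensitivity and specificity) can be sketched in a few lines. The function name, the scores, and the labels are made up for illustration; the study's own analysis was run in R and EmpowerStats.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.

    labels: 1 for G2/3 and 0 for G1; a case is called positive
    when its score is greater than or equal to the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens = tp / positives
        spec = tn / negatives
        # Keep the first cutoff that achieves the highest sum.
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Made-up rad-scores and grades (0 = G1, 1 = G2/3).
scores = [0.10, 0.25, 0.35, 0.45, 0.55, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
```

Because each observed score is tried as a candidate threshold, the search is exhaustive for a cohort of this size; ties are resolved in favor of the lower cutoff.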

A two-tailed p-value < 0.05 was considered statistically significant. All analyses were performed with R version 3.3.3 (R Foundation for Statistical Computing, Vienna, Austria) and EmpowerStats (X&Y Solutions, Inc., Boston, MA, USA).

Results

Clinical and pathologic characteristics

Altogether, 157 consecutive patients with NF-pNETs (84 men [mean age, 52.58 years; age range, 23–78 years] and 73 women [mean age, 52.58 years; age range, 24–75 years]) were included in this cross-sectional study at our institution. The patients with G1 (n = 61), G2 (n = 78), and G3 (n = 18) pNETs accounted for 38.85%, 49.68%, and 11.46% of the study cohort, respectively. Thirteen patients (8.28%) were found to have pathologically confirmed distant metastatic disease (liver) in the final pathology results after resection. There were significant differences in TNM stage (T: p < 0.001; N: p = 0.003; M: p = 0.017), clinical stage (p = 0.003), perineural invasion (p < 0.001), and LVSI (p < 0.001) between patients with G1 and G2/3 pNETs. However, there were no significant differences in other characteristics (p > 0.05) between the two groups. Characteristics of all patients are summarized in Table 1.

Table 1 Clinical and pathological characteristics of NF-pNETs G1 versus G2/3

MRI characteristics

The mean tumor size was 2.58 \(\pm\) 1.69 cm for G1 tumors and 4.14 \(\pm\) 2.89 cm for G2/3 tumors. Among the various MRI findings, the patients with G1 and G2/3 pNETs showed significant differences in tumor size (p < 0.0001), shape (p = 0.049), phase of peak enhancement value (p = 0.002), organ invasion (p = 0.017), and vascular invasion (p = 0.005). The MRI characteristics of all patients are summarized in Table 2.

Table 2 MRI Characteristics of NF-pNETs G1 versus G2/3

Feature selection and radiomics signature building

The interobserver and intraobserver ICCs were satisfactory, ranging from 0.813 to 0.932 and from 0.882 to 0.943, respectively. Out of 1409 total extracted features, 1252 were excluded due to low intragroup variance (n = 527) or poor correlation with the tumor grade (n = 725). Ultimately, 157 radiomic features were retained, including 94 arterial and 63 portal radiomic features, and these were further reduced to seven by using the LASSO regularization method (Fig. 2a–c). The seven features selected by LASSO regression were still significantly associated with NF-pNET grade after Bonferroni correction. Finally, the radiomics signature was constructed, and the rad-score was calculated by using Eq. 2. The rad-scores were significantly higher in G2/3 than in G1 tumors (p < 0.0001) (Fig. 2d).

Fig. 2 Radiomic feature selection by using a parametric method, the LASSO. a Selection of the tuning parameter (λ) in the LASSO model via tenfold cross-validation based on minimum criteria. Binomial deviances from the LASSO regression cross-validation procedure were plotted as a function of log(λ). The y-axis indicates binomial deviances. The lower x-axis indicates the log(λ). Numbers along the upper x-axis represent the average number of predictors. The red dots indicate the average deviance values for each model with a given λ, while the vertical bars through the red dots show the upper and lower values of the deviances. The vertical black lines define the optimal values of λ, where the model provides its best fit to the data. The optimal λ value of 0.0723 with log(λ) = − 2.627 was selected. b LASSO coefficient profiles of the 157 texture features. The dotted vertical line was plotted at the value selected using tenfold cross-validation in a. The seven resulting features with non-zero coefficients are indicated in the plot. c The error-bar chart of the seven radiomic features. d The combo chart, including box and density plots, of the radiomic score according to the grade of non-functioning pancreatic neuroendocrine tumors. The radiomic scores of G2/3 tumors were higher than those of G1 tumors. LASSO least absolute shrinkage and selection operator

$$\begin{aligned}\text{Radiomics score}={}&0.4721155+0.22016\times \text{Original.Shape.Maximum2DDiameterSlice}\\ &-0.01464\times \text{Exponential.Glrlm.RunLengthNonUniformityNormalized}\\ &-0.17215\times \text{Square.Glszm.SmallAreaLowGrayLevelEmphasis}\\ &+0.04588\times \text{Wavelet-HLL.Glcm.Imc1}\\ &-0.18895\times \text{Logarithm.Glrlm.ShortRunEmphasis}\\ &-0.04685\times \text{Exponential.Firstorder.Kurtosis}\\ &+0.01956\times \text{Logarithm.Glrlm.GrayLevelNonUniformityNormalized}\end{aligned}$$
(2)
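
Eq. 2 can be transcribed directly into code. The sketch below uses the intercept and coefficients reported in the equation; the function name and the dictionary layout are illustrative assumptions, and the input is assumed to hold feature values that have already been preprocessed in the same way as during model fitting (the preprocessing details are in the supplementary material, not in this text).

```python
# Intercept and coefficients copied from Eq. 2.
INTERCEPT = 0.4721155
COEFFICIENTS = {
    "Original.Shape.Maximum2DDiameterSlice": 0.22016,
    "Exponential.Glrlm.RunLengthNonUniformityNormalized": -0.01464,
    "Square.Glszm.SmallAreaLowGrayLevelEmphasis": -0.17215,
    "Wavelet-HLL.Glcm.Imc1": 0.04588,
    "Logarithm.Glrlm.ShortRunEmphasis": -0.18895,
    "Exponential.Firstorder.Kurtosis": -0.04685,
    "Logarithm.Glrlm.GrayLevelNonUniformityNormalized": 0.01956,
}


def rad_score(features):
    """Linear combination of the seven selected features (Eq. 2)."""
    missing = sorted(set(COEFFICIENTS) - set(features))
    if missing:
        raise KeyError(f"missing features: {missing}")
    return INTERCEPT + sum(
        weight * features[name] for name, weight in COEFFICIENTS.items()
    )


# Hypothetical input: every feature set to zero returns the intercept.
baseline = rad_score({name: 0.0 for name in COEFFICIENTS})
```

Features with positive coefficients, such as the maximum 2D diameter, raise the score, whereas features with negative coefficients lower it.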

Univariate analysis

The univariate analysis results are shown in Table 3. An increasing rad-score (p < 0.0001), higher TNM stage, higher clinical stage (II: p = 0.079, III: p = 0.004, IV: p = 0.014), positive perineural invasion (p = 0.0004), positive LVSI (p = 0.0004), increasing tumor size (p = 0.0007), irregular shape (p = 0.049), a later phase of peak enhancement value (pancreatic parenchymal phase: p = 0.063, portal venous phase: p = 0.002), presence of organ invasion (p = 0.028), and presence of vascular invasion (p = 0.009) were associated with an increased risk of G2/3 tumors (Fig. 3).

Fig. 3 Radiomic feature maps of the four selected features. a–c A case of a 39-year-old man with G1 neuroendocrine tumor. a The axial T2-weighted fat-saturated MR image shows a circular well-demarcated hyperintense nodule (arrow) located at the pancreatic head. b The axial T1-weighted fat-saturated MR image shows that the nodule was hypointense (arrow). c The axial contrast-enhanced fat-saturated T1-weighted MR image in the arterial phase shows that the nodule (arrow) was hypervascular. d–f A case of a 56-year-old man with G2 neuroendocrine tumor. d The axial T2-weighted fat-saturated MR image shows a circular well-demarcated slightly hyperintense nodule (arrow) located at the pancreatic head. e The axial T1-weighted fat-saturated MR image shows that the nodule was hypointense (arrow). f The axial contrast-enhanced fat-saturated T1-weighted MR image in the arterial phase shows that the nodule (arrow) was hypervascular. g The comparison between G1 and G2 of these patients. MR magnetic resonance

Table 3 The result of univariate analysis

Multivariate analyses

Multivariable logistic models were used to evaluate the associations between exposure (the rad-score) and outcome (NF-pNET grades). In the crude model (model 1), the rad-score correlated with the NF-pNET grades (odds ratio [OR] 1.41, 95% confidence interval [CI] 1.23–1.61, p < 0.0001). In the minimally adjusted model (adjusted for age, sex, and BMI) (model 2), the effect size also showed a significant correlation (OR 1.41, 95% CI 1.23–1.61, p < 0.0001). After further adjustment for T stage, N stage, clinical stage, perineural invasion, LVSI, size, shape, phase of peak enhancement value, organ invasion, and vascular invasion, significance was still identified in the fully adjusted model (model 3) (OR 1.35, 95% CI 1.14–1.60, p < 0.0001). We further handled the rad-score as a categorical variable, and the same result was observed (p < 0.0001). The results of the multivariable analysis are shown in Table 4.
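
To make the reported odds ratios easier to read, the following sketch converts an OR into the underlying logistic-regression coefficient, the percentage change in odds per unit of rad-score, and the OR for a larger increment. The helper names are illustrative assumptions; only the OR of 1.35 comes from the text, and the size of one rad-score unit is defined in Table 4 rather than here.

```python
import math


def log_odds_coefficient(odds_ratio):
    """Logistic-regression coefficient (beta) corresponding to an odds ratio."""
    return math.log(odds_ratio)


def odds_change_percent(odds_ratio):
    """Percentage change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_increase(odds_ratio, units):
    """Odds ratio for an increase of several units (odds multiply per unit)."""
    return odds_ratio ** units


beta = log_odds_coefficient(1.35)               # fully adjusted model (model 3)
percent = odds_change_percent(1.35)             # change in odds per unit
two_units = odds_ratio_for_increase(1.35, 2)    # effect of a two-unit increase
```

The percentage refers to the odds of G2/3 disease, not to its probability; the two coincide only approximately, and only when the outcome is uncommon.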

Table 4 The results of multivariable analysis

Apparent performance of the rad-score

ROC analysis of the rad-score for prediction of the NF-pNET grades showed an area under the curve (AUC) of 0.775 (95% CI 0.703–0.846) (Fig. 4). The best cut-off point based on maximizing the sum of sensitivity and specificity was 0.41 (sensitivity: 63.5%; specificity: 80.3%; accuracy: 0.701).
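
The three performance measures follow from a 2 × 2 table. The counts below are not reported in the text; they are inferred here, as an assumption, as the only whole-number split of the 61 G1 and 96 G2/3 patients that reproduces the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from 2 x 2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts at the 0.41 cut-off (assumption, not taken from the paper):
# 61 of 96 G2/3 tumors at or above the cut-off, 49 of 61 G1 tumors below it.
sens, spec, acc = classification_metrics(tp=61, fn=35, tn=49, fp=12)
```

With these counts, sensitivity is 61/96, specificity is 49/61, and accuracy is 110/157, which round to the reported 63.5%, 80.3%, and 0.701.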

Fig. 4 Receiver operating characteristic curves of the radiomic score

Clinical utility of the rad-score

The decision curves showed that when the threshold probability was higher than 0.3, the rad-score used in our study to grade NF-pNETs was more efficient than the treat-all-patients or the treat-none scheme (Fig. 5).

Fig. 5 Decision curve analysis for the radiomic score. The y-axis represents the net benefit. The red line represents the radiomic score. The gray and black lines represent the hypotheses that all patients had G2/3 and G1 NF-pNETs, respectively. The x-axis represents the threshold probability, at which the expected benefit of treatment is equal to the expected benefit of avoiding treatment. The decision curves showed that when the threshold probability is higher than 0.3, the radiomic score developed in the current study to predict the grade of NF-pNETs adds more benefit than the treat-all or treat-none scheme. NF-pNETs non-functioning pancreatic neuroendocrine tumors
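
The net benefit plotted on the y-axis of a decision curve is commonly defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The sketch below assumes this standard definition; the function names are illustrative, and only the cohort sizes (96 G2/3 tumors among 157 patients) come from the text.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(n_positive, n, threshold):
    """Treat-all scheme: every patient is handled as if G2/3."""
    return net_benefit(n_positive, n - n_positive, n, threshold)


def net_benefit_treat_none():
    """Treat-none scheme: no true or false positives, so net benefit is 0."""
    return 0.0


# Treat-all reference at a threshold probability of 0.3 for this cohort.
reference = net_benefit_treat_all(n_positive=96, n=157, threshold=0.3)
```

A model is useful at a given threshold only when its net benefit exceeds both references, which is the comparison the decision curve makes visually across thresholds.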

Discussion

We examined the relationship between the rad-score and NF-pNET grade. As shown in the fully adjusted model (model 3), the MRI rad-score was significantly associated with the NF-pNET grades. Every 1-point increase in the MRI rad-score was associated with a 35% increase in the odds of G2/3 disease. As a categorical variable, rad-scores \(\ge\) 0.51 were associated with a 4.78-fold increased risk of G2/3 disease compared with rad-scores < 0.51. Our data also showed that the MRI rad-score provided a potentially valuable non-invasive tool for differentiating the NF-pNET grades (AUC = 0.775, sensitivity = 63.5%, specificity = 80.3%, accuracy = 0.701). The best cut-off point based on maximizing the sum of sensitivity and specificity was 0.41.

Accurate preoperative grading of NF-pNETs is essential for counseling patients appropriately about surgical decisions and for improving prognostication. However, this is difficult to achieve with the currently available methods. Endoscopic ultrasonography-guided fine-needle aspiration is considered moderately sensitive for the detection of pancreatic lesions and has diagnostic value for primary tumors [28, 29]. However, the effectiveness of this method for the diagnosis of pNETs is limited by the small volume of tissue acquired, its dependence on the skill of the endoscopist, and its invasive nature [30].

Many researchers have studied the correlation between imaging features and tumor grade. A study by Yamada et al. [31], which included 37 patients with pNETs, showed that the CT enhancement in the pancreatic phase, irregularity, vessel involvement, and cystic degeneration/necrosis were significant predictors of G2 pNETs. Kim et al. [32, 33] performed a multivariate analysis of the MSCT features of 161 patients with pNETs; they reported that the largest diameter (> 3 cm), portal venous enhancement rate (\(\le\) 1.1), and liver metastasis were more suggestive of poor prognosis and that the portal enhancement ratio showed high sensitivity and specificity (92.3% and 80.5%, respectively) in differentiating G3 from G1/2 pNETs. Toshima et al. [34] studied the correlation between CT and MRI features and tumor grade; in univariate analysis, the maximum tumor diameter, tumor shape, enhancement pattern, cystic degeneration, and apparent diffusion coefficient (ADC) findings were helpful in distinguishing G1/2 from G3 tumors, and in multivariate regression analysis, a lower ADC ratio (ADC value of the lesion/ADC value of the parenchyma < 0.94), non-uniform enhancement, lobulated mass, and high signal on T2WI suggested a diagnosis of G3 tumor. A study by Kulali et al. [35], which included 30 patients with pNETs, showed that (i) all high- and intermediate-grade pNETs tended to show low to intermediate signal intensity on T2WI; (ii) low-grade pNETs showed statistically higher arterial enhancement than the intermediate- and high-grade pNETs; and (iii) ADC values in the high-grade pNETs were significantly lower than those of the low- and intermediate-grade pNETs. Kim et al. [36] found that ill-defined borders and hypointense signals on venous and delayed-phase imaging were common findings of the higher-grade pNETs and that the ADC value was helpful for differentiating the G1 from the G2 pNETs. Robertis et al. [37] found that ill-defined margins and a lower mean ADC value were more common in G2–3 and stage III–IV PanNENs than in G1 and low-stage tumors. Our study showed that increasing tumor size, irregular shape, a later phase of peak enhancement value, organ invasion, and vascular invasion were associated with an increased risk of G2/3 tumors. These results were consistent with the published studies. However, there was no significant difference in T2 signal intensity or margin between G1 and G2/3 pNETs. We postulate that the distribution of G grades, number of patients, and patient grouping criteria may have been the reasons for the differences in results between our study and others.

Few published studies to date have used radiomics to develop prediction models for pNET grading. Robertis et al. [22] found that whole-tumor histogram analysis of ADC maps may be helpful in predicting tumor grade, vascular involvement, and nodal and liver metastases in pNETs, and that ADCentropy and ADCkurtosis are the most accurate parameters for identifying pNETs with malignant behavior. A logistic regression model predicted the G1 and G2/3 tumor grades with 79.3% accuracy by combining the MSCT features of pNETs with the texture features of cross-sectional CT images in the portal venous phase, including tumor diameter, vascular invasion, pancreatic duct dilation, lymphatic metastasis, and entropy [17]. In another study, the MSCT features of 66 patients with pNETs (45 with G1 and 21 with G2/3) were summarized, and texture features were extracted from the cross-sectional images of the arterial and portal venous phases to establish logistic regression models [18]. This model was better for the determination of tumor grade (AUC = 0.77) than the method using simple MSCT features (AUC = 0.68). Gu et al. [19] developed nomograms incorporating tumor margin and the fusion CT radiomic signature and discriminated G1 and G2/G3 tumors, yielding AUCs of 0.974 and 0.902 in the training and validation cohorts, respectively. Another similar study included 137 patients from two hospitals and developed a combined nomogram including clinical characteristics and CT radiomic signatures to differentiate G1 and G2/3 tumors, showing the best performance (training set: AUC = 0.907; validation set: AUC = 0.891) [20]. However, the radiomic features in the aforementioned studies were extracted from MSCT, and studies on MRI-based radiomic features are very rare. Gao et al. [21] used the MR-T1WI radiomic features of 96 patients with pNETs to develop a convolutional neural network model to predict pNET grade; the average accuracy and AUC were 85.13% and 0.9117, respectively, in cross-validation. In our study, the rad-score, which combined four arterial and three portal radiomic features, was significantly associated with the G grading of NF-pNETs (OR 1.41, 95% CI 1.23–1.61, p < 0.0001) in the crude and minimally adjusted models. When the significantly associated characteristics of the univariate analysis were fully adjusted for, significance was still identified (model 3) (OR 1.35, 95% CI 1.14–1.60, p < 0.0001). Further, the rad-score showed favorable discrimination (AUC = 0.775). The best cut-off point based on maximizing the sum of sensitivity and specificity was 0.41 (sensitivity = 63.5%, specificity = 80.3%, and accuracy = 0.701).

To assess the models beyond the purely mathematical perspective provided by performance measures, such as the AUC, we used DCA to estimate the predicted net benefit of our model across all possible risk thresholds and thereby evaluate the effects of various risk thresholds [38, 39]. DCA showed that when the threshold probability was higher than 0.3, the use of the rad-score for grading NF-pNETs provided more benefit than the treat-all-patients or the treat-none schemes.

However, our study had some limitations. First, this study was retrospective in nature. Second, the number of patients with G3 pNETs was relatively small (only 18 patients); however, considering that pNET is a relatively rare type of pancreatic tumor according to the published literature, the number of patients in this study was acceptable. Third, the images were acquired on three types of scanner. Fourth, we did not discriminate G2 from G3 any further. Finally, we did not develop a predictive model combining the clinical, MRI, and radiomic features for the G grading of pNETs. In this regard, future studies should focus on multicenter validation with a larger sample size to obtain high-level evidence for the clinical application of the rad-score.

Conclusion

In conclusion, the MRI rad-score showed a significant association with the grades of NF-pNETs. A higher rad-score was associated with a higher risk of G2/3 tumors. Thus, the MRI rad-score, a non-invasive predictive tool, can be used to preoperatively distinguish G1 from G2/3 tumors in patients with NF-pNETs.