Introduction

Pancreatic neuroendocrine neoplasms (PanNENs) are relatively rare tumors, representing approximately 2% of pancreatic neoplasms [1, 2], though their incidence is increasing, mainly because they are more frequently diagnosed incidentally. Importantly, PanNENs comprise a wide range of entities, both histologically and macroscopically [2, 3]. In this respect, histology is the only validated methodology currently available that allows tumor grading, distinguishing low-grade (G1) from more aggressive (G2/G3) neoplasms mainly on the basis of the Ki67 proliferative index, as well as the assessment of other biological characteristics important for prognosis [3,4,5]. However, endoscopic ultrasound-guided fine-needle aspiration biopsy (FNAB), an invasive procedure constrained by potential risks, has limited accuracy [6, 7], so that reliable histology may be obtained only after surgery.

On the other hand, macroscopic imaging features have been associated with the outcome of PanNENs, specifically size of the lesion, presence of necrosis, nodal involvement, or distant metastases [8,9,10], but these cannot be considered sufficiently reliable to drive the therapeutic decision [11, 12].

Recently, radiomics has emerged as a promising tool for characterizing tumors [13]: it consists of extracting quantitative data from medical images, with the aim of building models that predict histological characteristics and/or clinical outcomes [14,15,16,17]. Few studies have explored the potential of radiomics in the setting of PanNENs, and these often showed contrasting results because they were limited by small sample sizes and potentially biased methodologies [18,19,20,21,22,23,24,25,26].

The aim of the current study, based on a sufficiently large population, was to assess the association between PanNEN histological characteristics and computed tomography (CT) radiomic features (RFs), by applying a machine learning approach optimized to avoid/limit the risk of overfitting. Clinicoradiological features were also tested.

Materials and methods

Patients’ cohort

This is a monocentric, retrospective, observational study including patients who underwent upfront surgery for PanNEN at San Raffaele Scientific Institute (Milan, Italy) from January 2015 to December 2021; data were collected within the context of an Ethics Committee–approved study in patients who had signed an institutional procedure–specific informed consent. From a prospectively acquired database, adult patients without visible distant metastases who underwent adequate quality abdominal CT imaging within 1 month before index surgery were enrolled. The resulting population (n = 101) was then randomly split into training (n = 70) and validation cohorts (n = 31) according to the second level of the TRIPOD guidelines for the validation of predictive models in oncology [27].
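
The random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_cohort`, the integer patient identifiers, and the fixed seed are all assumptions introduced here for reproducibility of the example.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Shuffle patient identifiers reproducibly and split them into two cohorts.

    The seed value is an assumption made for this sketch; the source does not
    report how the random split was generated.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 101 hypothetical patient identifiers, split 70/31 as in the study.
train_ids, valid_ids = split_cohort(range(101), n_train=70)
print(len(train_ids), len(valid_ids))
```

Splitting on patient identifiers, rather than on individual images or features, keeps every patient entirely within one cohort.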

The histological endpoints considered, as defined by postoperative histological specimens, were tumor grade (G) (G1 vs. G2/3), the presence of distant metastases (M+), metastatic lymph nodes (N+), and microvascular invasion (VI).

Clinical variables

Demographic variables were retrospectively reviewed from an electronic database.

Radiological variables and radiomic features

In patients who underwent multiple preoperative CT scans, the examination closest to the date of surgery was used for review.

CT protocol

Three different CT scanners were used (Philips Brilliance [64 slices]; Toshiba Aquilion [16 slices]; Siemens Somatom Definition Flash [128 slices]). Apart from beam collimation, which relies upon the number of detector rows, scanning parameters were as follows: 0.938–0.983 (pitch), 100–120 kVp (tube voltage), 138–534 mAs (tube current), 2–3 mm (slice thickness), 1 mm (gap). The matrix size of reconstructed CT images was 512 × 512 pixels, and pixel/voxel size ranges were 0.598–0.888 mm and 0.819–2.206 mm3, respectively. CT protocol included administration of intravenous non-ionic iodine contrast medium (Iopromide, Ultravist 370 mg iodine/mL (Bayer HealthCare), 120 mL at a rate of 4 mL/s) and consisted of a multiphase acquisition (unenhanced, arterial and portal venous axial scans of the abdomen).

The variability of the features between the three scanners was tested through the Mann-Whitney test, finding no significant inter-scanner variations: this result, together with the careful application of the abovementioned acquisition protocols, should limit any potential bias due to feature repeatability.
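
A possible way to run such an inter-scanner check is sketched below. Because the Mann-Whitney test compares two groups, the sketch assumes pairwise comparisons between the three scanners; the source does not state how the three groups were handled. The feature values are invented, and the p value uses a normal approximation without tie or continuity correction, which is a simplification.

```python
import math
from itertools import combinations

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic of sample_a, two-sided p value, normal approximation)."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical values of one radiomic feature, grouped by scanner.
feature_by_scanner = {
    "scanner_A": [1.0, 3.0, 5.0, 7.0],
    "scanner_B": [2.0, 4.0, 6.0, 8.0],
    "scanner_C": [1.5, 3.5, 5.5, 7.5],
}
pairwise_p = {
    (a, b): mann_whitney_u(feature_by_scanner[a], feature_by_scanner[b])[1]
    for a, b in combinations(sorted(feature_by_scanner), 2)
}
for pair, p in pairwise_p.items():
    print(pair, round(p, 3))
```

With many features and three scanner pairs, a multiple-comparison correction would normally be considered; the source does not report whether one was applied.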

Conventional imaging parameters

CT findings were selected for analysis by two radiologists and two senior consultant pancreatic surgeons on the basis of their clinical experience; variables previously described in the literature were also considered. The selected CT findings included the following: (i) presence of necrosis, defined as tumoral tissue that did not enhance in the arterial phase and portal venous phase (PVP) [28, 29], (ii) presence of cystic/liquid component, (iii) pancreatic parenchyma atrophy, defined as a significant reduction in the volume of the gland, (iv) macroscopic arterial/venous infiltration, and (v) invasion of contiguous organs.

Delineation

The robustness of CT RFs against interobserver contouring variability was already assessed by our group [30, 31], showing a high intraclass correlation coefficient (ICC) for all features. As previously reported, since PanNENs show the greatest conspicuity on arterial phase images, tumors were contoured in this phase only. Then, a rigid registration, based upon a box ROI (region of interest) surrounding the pancreas and including the nearest structures, was performed between arterial and unenhanced CT images: a mutual information algorithm was applied, followed by manual fine-tuning through visual overlay of the images with and without contrast. The contours were transferred onto the co-registered unenhanced CT images and, where needed, adjusted for minor local anatomical discrepancies due to respiration and organ motion between different phases (Fig. 1). The contouring and rigid registration of images were performed using the Eclipse System (Varian Medical Systems Inc.).

Fig. 1

The tumor is delineated on contrast-enhanced CT, arterial phase (on the left), and then projected onto the corresponding unenhanced image (at the middle) using a registration based on a box ROI surrounding the pancreas: if necessary, the contours were then manually adjusted to correct small anatomical discrepancies in the resulting co-registered image (on the right)

Unenhanced images were chosen for radiomic feature extraction because contrast medium administration could alter the apparent tissue heterogeneity, adding the intrinsic inter-patient variability of contrast administration to that of the tissue itself.

Radiomic feature extraction

All images were resampled to cubic voxels of 0.78 × 0.78 × 0.78 mm3 with bilinear interpolation using an automatic workflow expressly developed in commercially available software (MIM Software Inc., version 6.5.5). This procedure was implemented to reduce directional bias, according to the specific recommendation of the International Biomarker Standardization Initiative (IBSI) [32, 33]. Image rebinning was also necessary, not only to speed up the process of RF extraction, but also to limit noise: we chose 64 bins, as reported in the literature [34, 35]. DICOM files were then imported into MATLAB using the Computational Environment for Radiological Research [36, 37]. One hundred eighty-two RFs of first and higher order were extracted using the SPAARC Pipeline for Automated Analysis and Radiomics Computing (IBSI-compliant) [32,33,34,35].
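
The rebinning step can be illustrated with the fixed-bin-number discretization described in the IBSI documentation. This is a sketch under assumptions: the source only states that 64 bins were used, not which discretization scheme was applied, and the Hounsfield unit values below are invented.

```python
import math

def discretize_fixed_bin_number(intensities, n_bins=64):
    """Map intensities inside a ROI to integer bins 1..n_bins.

    Assumed scheme (fixed bin number): the ROI minimum maps to bin 1 and the
    ROI maximum to bin n_bins.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # Degenerate ROI with a single intensity value: everything in bin 1.
        return [1 for _ in intensities]
    bins = []
    for x in intensities:
        if x >= hi:
            bins.append(n_bins)
        else:
            bins.append(math.floor(n_bins * (x - lo) / (hi - lo)) + 1)
    return bins

# Hypothetical HU values from a tumor ROI on unenhanced CT.
hu_values = [-20.0, 0.0, 15.5, 40.0, 60.0]
binned = discretize_fixed_bin_number(hu_values)
print(binned)  # [1, 17, 29, 49, 64]
```

Reducing the number of gray levels in this way shrinks the texture matrices computed afterwards and smooths out small intensity fluctuations, consistent with the two motivations given above.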

Statistical analysis

As previously stated, the original population was randomly split into training (n = 70) and validation (n = 31) cohorts.

RF redundancy limitation

To limit the risk of redundancy, we first applied, for each endpoint, a correlation-based filter: in short, a Spearman coefficient threshold equal to 0.80 was fixed to select redundant (≥ 0.80) and independent RFs (< 0.80). The independent variables were retained for further analysis; among redundant variables, the ones with the best p value resulting from univariate logistic regression were selected.
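
One way to implement such a correlation-based filter is sketched below. It is an interpretation, not the authors' MATLAB code: features are visited from best to worst univariate p value and kept only if their absolute Spearman coefficient with every feature already kept is below 0.80. The greedy ordering, the use of the absolute value, the feature names, and all numbers are assumptions made for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_filter(features, p_values, threshold=0.80):
    """Keep, within each correlated group, the feature with the best p value."""
    kept = []
    for name in sorted(features, key=lambda f: p_values[f]):
        if all(abs(spearman(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature values for six patients and hypothetical p values.
features = {
    "volume": [1, 2, 3, 4, 5, 6],
    "surface": [2, 4, 5, 8, 10, 13],   # monotone with volume: redundant pair
    "entropy": [3, 1, 6, 2, 5, 4],
}
p_values = {"volume": 0.03, "surface": 0.01, "entropy": 0.20}
selected = correlation_filter(features, p_values)
print(selected)  # ['surface', 'entropy']
```

In this toy example "volume" and "surface" are perfectly rank-correlated, so only the one with the lower p value survives, while "entropy" is retained as independent.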

Model development

To assess the best combination of the previously selected clinical, radiological, and radiomic variables, a machine learning bootstrap-based method was used. In short, the training set of data was bootstrapped 1000 times and a backward multivariate logistic regression (MLR) was run for each sample including two (for G and M+) or three (for N+ and VI) variables according to the number of events for each endpoint [38,39,40,41]. Three models were then developed for each endpoint: a “conventional” radiological model, a strictly radiomic model, and a combined model considering information from radiomic, conventional radiologic, and clinical variables. Finally, a prognostic index (PI) was derived according to the following formula: $PI = 1 / (1 + \exp(-\sum_i b_i x_i))$, where $b_i$ are the MLR coefficients associated with the covariates $x_i$.
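
Two pieces of this procedure lend themselves to a short sketch: drawing the bootstrap resamples and evaluating the prognostic index for one patient. The code below is illustrative only; the function names, the seed, and the coefficient and covariate values are hypothetical, and the backward logistic regression run on each resample is not reproduced here.

```python
import math
import random

def bootstrap_indices(n_patients, n_resamples=1000, seed=0):
    """Yield index lists drawn with replacement, one per bootstrap resample."""
    rng = random.Random(seed)  # fixed seed is an assumption of this sketch
    for _ in range(n_resamples):
        yield [rng.randrange(n_patients) for _ in range(n_patients)]

def prognostic_index(coefficients, covariates):
    """Logistic transform of the linear predictor: 1 / (1 + exp(-sum(b_i * x_i))).

    An intercept, if the model has one, can be passed as an extra coefficient
    paired with a constant covariate equal to 1.
    """
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# 1000 resamples of a 70-patient training set, as described in the text.
resamples = list(bootstrap_indices(70))
print(len(resamples), len(resamples[0]))

# Hypothetical two-variable model: one radiomic feature, one binary CT finding.
pi_value = prognostic_index([1.2, -0.8], [1.0, 0.5])
print(round(pi_value, 3))
```

The index always lies between 0 and 1 and equals 0.5 when the linear predictor is zero, so it can be read as a model-estimated probability of the endpoint.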

Model validation

To assess the ability of the prognostic index to stratify patients according to the histological endpoints, the coefficients of the prognostic index were directly entered into a new univariate logistic regression using data from the validation set. A model was considered validated if p < 0.05.
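
The validation step can be sketched as a univariate logistic regression of the observed endpoint on the prognostic index, followed by a significance test on the slope. The source does not specify the test used, so the Wald test below is an assumption, as are the function name and the invented validation data. The fit uses Newton-Raphson and presumes the two outcome groups overlap (no perfect separation).

```python
import math

def fit_univariate_logistic(x, y, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = b0 + b1 * x; return (b0, b1, Wald p value for b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Observed information matrix entries.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of the slope from the information matrix at the estimate.
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return b0, b1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical validation data: prognostic index and observed endpoint (0/1).
pi_valid = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.15, 0.85, 0.45]
outcome = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
intercept, slope, p_value = fit_univariate_logistic(pi_valid, outcome)
is_validated = p_value < 0.05
print(round(slope, 2), round(p_value, 3), is_validated)
```

A positive, significant slope indicates that higher prognostic index values in the validation cohort go together with a higher observed frequency of the endpoint.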

Analyses were performed using in-house MATLAB code.

Results

Patients’ characteristics are summarized in Table 1. Twenty-five patients (24.7%) had G2/G3 tumor (specifically, three patients only [2.9%] had undifferentiated neoplasm), 37 (36.6%) were shown to have nodal involvement, 14 (13.8%) had distant metastases (mostly in the liver), and 38 (37.6%) had microvascular spread of the disease on the pathological specimen. No significant differences were found between the training and validation cohorts in radiological, clinical, or pathological variables. Median tumor volume was 6 cc [1.3–19.9].

Table 1 Patients’ characteristics

The combination of two variables only (one radiomic and one clinicoradiological feature) resulted in good prediction of the risk of M+ and G with AUC = 0.85 and 0.67, respectively; these results were confirmed in the validation cohort (AUC = 0.77 and 0.72, respectively). The models predicting the risk of VI and N+ (both comprising two radiomic and one clinicoradiological feature) showed AUC = 0.82 and 0.72, respectively, in the training set; these results were confirmed in the validation cohort (0.75 and 0.62, respectively). A pure RF_model could be generated only when considering M+ and G as endpoints, with performances similar to those of the corresponding COMB_models (AUC = 0.81 and 0.68, respectively, in the training cohort; AUC = 0.81 and 0.70 in the validation set). A pure “conventional” radiological model failed to be confirmed in the validation set for all endpoints, with the only exception of microvascular invasion.
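
For readers less familiar with the metric, the AUC values reported above can be read as the probability that a randomly chosen patient with the endpoint receives a higher prognostic index than a randomly chosen patient without it. A minimal sketch of that computation follows; the scores and labels are invented and do not come from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical prognostic index values and observed endpoint (1 = present).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.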

Negative predictive values were moderate to high for all validated models for the different endpoints, ranging between 77.8% (G+ COMB_model) and 97% (M+ RF_model).
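
The predictive values, sensitivity, and specificity reported for each model follow from a 2 × 2 table of predicted versus observed status. The sketch below assumes that predictions have already been dichotomized; the source does not state the cutoff applied to the prognostic index, and the example data are invented.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def diagnostic_metrics(predicted, actual):
    """Compute NPV, PPV, sensitivity and specificity from binary labels (0/1)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return {
        "NPV": safe_ratio(tn, tn + fn),
        "PPV": safe_ratio(tp, tp + fp),
        "SE": safe_ratio(tp, tp + fn),
        "SP": safe_ratio(tn, tn + fp),
    }

# Hypothetical dichotomized predictions and observed endpoint for 8 patients.
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
actual = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = diagnostic_metrics(predicted, actual)
print(metrics["NPV"], metrics["SP"])  # 0.8 0.8
```

A high negative predictive value means that a patient classified as low risk by the model is rarely found to have the adverse histological feature, which is the property emphasized above.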

The performances of the models for each endpoint are reported in Table 2 and summarized in Fig. 2. In Fig. 3, ROC curves of models are shown.

Table 2 Overall model performance in both training and validation cohorts of radiological, radiomic, and combined models in prediction of metastases (M), grade (G), lymph nodes (L), and microvascular invasion (MI), quantified in terms of area under the ROC curve (AUC), positive and negative predictive values (PPV, NPV), specificity (SP), and sensitivity (SE). OR, odds ratio; CI, confidence interval

Fig. 2

Summary of performances of the models for the three endpoints. Training and validation. AUC, areas under the curve; PPP, positive predictive power; NPP, negative predictive power

Fig. 3

ROC curves of models developed for each endpoint, training and validation

Discussion

Few studies explored the potential of radiomics in the setting of pancreatic neuroendocrine neoplasms (PanNENs), often showing contrasting results since limited by small sample size issues and potentially biased methodologies [18,19,20,21,22,23,24,25,26]. In the present study, we applied, in a relatively large cohort of patients, a machine learning approach optimized to limit/avoid the risk of overfit; in doing so, we sought to develop and validate preoperative models (including a maximum of three variables) based upon CT images to predict tumor grade (G1 vs. G2/3), presence of distant metastasis, metastatic lymph nodes, and microvascular invasion at pathological specimen.

The vast majority of the present literature focuses on tumor grade prediction [10, 18,19,20, 22, 23, 42,43,44,45], which is indeed a crucial cornerstone for treatment planning being a surrogate for biological aggressiveness. In this respect, an image-based biomarker able to accurately predict grading could be of great impact, especially when considering patients with small lesions (< 2 cm) which are generally thought to correspond to well-differentiated (G1) tumors to be conservatively managed [11, 46]. Nevertheless, these small tumors do sometimes reveal aggressive biological behavior and need a more aggressive approach [2, 47]. Importantly, endoscopic ultrasound-guided fine-needle aspiration biopsy is not always reliable in determining tumor grading of small tumors [6, 7, 48]. A possible solution could come from radiomics, since radiomic features (RFs), being derived from the whole volume of the lesion, are paradoxically more representative of the heterogeneity of the entire lesion than histology itself (from bioptic samples). In the present study, we found that coupling one robust RF (Morphology_areaDensity_aabb) with one conventional radiological finding (tumor necrosis) resulted in good negative predictive value (77.8%) and area under the curve (0.72) for grade prediction.

These results corroborate our previous findings in a pilot study on 39 patients [26] and highlight the importance of a rigorous radiomic workflow based upon (i) a strict selection of few robust RFs and (ii) availability of an independent validation cohort to reduce any risk of overfitting. Other studies tried to avoid this issue by restricting the number of variables [20], or by splitting the cohort into training and validation sets [18, 19, 43]; however, with this second approach alone, if no attempt is made to achieve optimally robust models in the training group, performance may decrease significantly during validation, and the proposed findings may lack interpretability. In this respect, our findings are in good agreement with the results obtained by Bian et al. [20] in a group of 102 PanNEN patients applying LASSO for tumor grade prediction.

Apart from G, to our knowledge, few other studies explored the value of radiomics to predict other histological characteristics of PanNENs [25, 49].

Accurate preoperative N staging is indeed a major cornerstone in the treatment algorithm of PanNENs, since patients with different N stages have different prognoses and may need a different extent of lymphadenectomy or neoadjuvant treatment [50]; specifically, the number of positive lymph nodes is accurate in predicting recurrence for PanNENs after surgery [51]. A recent report by Mapelli and colleagues [49] found that second-order RFs extracted from T2 MR sequences have good predictive performance (AUC = 0.992) with respect to lymph nodal involvement. Our model has lower accuracy, but with the clear advantage of relying upon RFs derived from CT images, which are currently the standard of care for non-invasive staging of PanNENs.

Our models also show good results with respect to prediction of distant metastases and microvascular invasion, providing a further insight into disease biological behavior; these findings are indeed in agreement with previous literature [25].

Interestingly, the degree of clinical/biological interpretability of the RFs finally retained in each validated model is promising, being related to tumor irregular shape (as frequently observed in the setting of pancreatic adenocarcinoma) [52] and/or HU value heterogeneity. With regard to this last point, our group recently found an intriguing explanation connecting, in nonfunctioning PanNENs, microvessel density, radiological appearance in terms of HU values, and biological behavior [53]: in short, low microvessel density (assessed by CD34 staining), corresponding to hypoenhancement in arterial phase, has been found to be associated with pathological features of aggressiveness.

Finally, to quantify the incremental value of radiomics with respect to radiologists’ subjective assessment, we showed that a model based solely upon conventional radiological parameters failed to be confirmed in the validation set for all endpoints except microvascular invasion.

The present study has several limitations, the most important ones being its retrospective nature and the relatively small number of events observed. External validation is also warranted and already planned. Furthermore, our model is intended not for standalone use but rather to be embedded in the multidisciplinary assessment of the patient.

In conclusion, despite the abovementioned limitations, the combination of a few radiomic and clinicoradiological features by means of a robust methodology that avoids/limits the risk of overfitting resulted in robust presurgical prediction of histological characteristics of PanNENs, potentially providing a tool for patients’ personalized management once more extensive external validation has been accomplished.