Introduction

Pancreatic cystic lesions are frequently identified incidentally due to increased utilization of cross-sectional abdominal imaging and improvement in scanner technology. The prevalence of incidental pancreatic cysts ranges from 2% on CT to 45% on MRI, with a pooled estimated prevalence of 8% [1,2,3]. These pancreatic cysts are formed by a diverse group of lesions, ranging from benign neoplasms to neoplasms with the potential for aggressive clinical behavior. The mucin-producing cysts (e.g., intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs)) are premalignant lesions that can transform into invasive pancreatic ductal adenocarcinoma (PDAC). Accurate diagnosis of pancreatic cystic lesions is important in determining appropriate treatment and surgical candidacy [4, 5]. However, these cyst-forming pancreatic lesions can share overlapping clinical and radiological features, making it difficult to differentiate between benign and potentially malignant lesions during preoperative evaluation. As evidenced in surgical databases, 17–25% of patients who undergo surgical resection for a presumed mucin-producing cystic lesion are found to have a benign cyst [6, 7]. Pancreatic resections are among the most complex abdominal operations and are associated with considerable morbidity, with approximately 40% of patients experiencing postoperative complications [8]. Patients with a pancreatic cyst that does not meet criteria for surgical resection at the time of diagnosis are often followed clinically for 10 years or even longer based on current guidelines [9], which can be a significant burden to patients and an increased cost to the healthcare system.

Recently, radiomics-based approaches have been explored to differentiate pancreatic cysts. Radiomics converts imaging data into high-dimensional mineable quantitative features [10] and has shown remarkable progress in correlating image features and clinical features with patient outcomes [11]. Previous studies have mostly focused on risk stratification of IPMNs by identifying radiomic signatures that are predictive of the grade of dysplasia [12,13,14,15,16,17,18]. Other studies aimed to discriminate pancreatic serous cystadenoma (SCA), a benign neoplasm, from mucin-producing cystic neoplasms [19,20,21,22,23], to discriminate mucin-producing cystic neoplasms from non-mucinous cysts [24], or to discriminate among 3–4 classes of pancreatic cystic lesions [25, 26].

To our knowledge, no study has performed a head-to-head comparison of the accuracy of a radiomic analysis and an expert radiologist in determining cyst type across 5 cyst classes. This is an important void in the literature, as it remains unclear whether a radiomic approach using available methodology has a potential advantage over the current standard of care. The purposes of this study are to compare the diagnostic accuracy of a radiomics-based pancreatic cyst classifier to that of an experienced academic radiologist and to explore the added value of a radiomics-based classifier.

Materials and methods

Patients and CT acquisition

This retrospective study was HIPAA compliant and was approved by our institutional review board. A total of 214 patients (69 male, 145 female, average age: 54.8 ± 17.0 years) who underwent surgical resection for a pancreatic cyst(s) from 2003 to 2016 were randomly selected from our radiology and pathology databases, with enrichment of the rarer types of pancreatic cysts: MCNs, SCAs, solid pseudopapillary neoplasms (SPNs), and cystic pancreatic neuroendocrine tumors (PanNETs). The selected cases include 64 patients with IPMNs, 33 with MCNs, 60 with SCAs, 24 with SPNs, and 33 with cystic PanNETs (Table 1). The SCAs were resected based on clinical symptoms and/or diagnostic uncertainty in the preoperative setting. Among the 214 patients, 115 have been previously reported [25] in a study of the application of random forest and neural networks to the classification of IPMNs, MCNs, SCAs, and SPNs. In cases with multiple cysts within the pancreas, the pathologic diagnosis was labeled as the pathology of the dominant cyst.

Table 1 Demographic and image characteristics of the pancreatic cyst cases

One hundred and fifty-nine patients with cystic lesions were scanned with a dual-source MDCT scanner (Somatom Definition, Definition Flash, or Force, Siemens Healthineers), 35 patients were scanned on a 64-slice MDCT scanner (Somatom Sensation 64, Siemens Healthineers), and 20 patients were scanned on a 16-slice MDCT scanner (Somatom Sensation 16, Siemens Healthineers). Patients were injected with between 100 and 120 mL of iohexol 350 (Omnipaque, GE Healthcare) at an injection rate of 4–5 mL/sec. The contrast dose was weight based at approximately 1.5 mL/kg, up to a maximum dose of 120 mL. Scan protocols were customized for each patient to minimize dose but were on the order of 120 kVp, effective mAs of 270, and pitch of 0.6–0.8. The collimation was 128 × 0.6 mm or 192 × 0.6 mm for the dual-source scanner and 64 × 0.6 mm for the 64-slice scanner. Arterial phase imaging was performed with fixed delay or bolus triggering, usually between 30 and 35 s post-injection, and venous phase imaging was performed at 60–70 s. The venous phase images were used for the analysis in this study. All images were reconstructed with 0.5-mm increment and 0.75-mm slice thickness.

Image segmentation

Preoperative CTs were reviewed by an abdominal radiologist with > 7 years of experience to document the size and location of pancreatic cysts and presence of calcifications or pancreatic duct dilatation (> 3 mm in diameter). The 214 CT exams were randomly divided between two trained researchers (3 years of experience) for image segmentation. The entire three-dimensional (3D) volume of the cystic lesion(s) and pancreas were manually segmented (Fig. 1) based on venous phase images using the Medical Imaging Interaction Toolkit (MITK) and a commercial annotation software (Velocity™, Varian Medical Systems Inc.) [27]. The boundaries were verified by three abdominal radiologists with 7–30 years of experience. The researchers and the radiologists had face-to-face sessions to review each case to correct any errors in segmentation.

Fig. 1

Two example cases of manual segmentations of pancreas and cystic lesion. The boundary of cystic lesion is outlined in green and the boundary of the background pancreas is outlined in purple

Because the computation of radiomics features can be affected by segmentation accuracy, the inter-observer variability in segmentation was analyzed. Twenty of the 214 cases were randomly selected and segmented independently by two image labelers. The inter-observer variation between the two labelers was evaluated with two performance parameters that measure the similarity of two regions, the Dice-Sørensen similarity coefficient (DSC) and the Jaccard index (JI). For both measures, 1 indicates perfect overlap and 0 indicates absence of overlap.
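The two overlap measures can be illustrated with a minimal sketch. It assumes each segmentation is represented as a set of voxel coordinates; the function name `dice_and_jaccard`, the toy masks, and the convention for two empty masks are illustrative assumptions and are not taken from the study's software.

```python
def dice_and_jaccard(mask_a, mask_b):
    """Return (DSC, JI) for two segmentations given as collections of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Assumption for this sketch: two empty masks are treated as identical.
        return 1.0, 1.0
    overlap = len(a & b)
    dsc = 2.0 * overlap / (len(a) + len(b))  # DSC = 2|A ∩ B| / (|A| + |B|)
    ji = overlap / len(a | b)                # JI  = |A ∩ B| / |A ∪ B|
    return dsc, ji


# Toy example (not study data): two 4 x 4 single-slice masks offset by one voxel in x.
labeler_1 = {(x, y, 0) for x in range(0, 4) for y in range(4)}
labeler_2 = {(x, y, 0) for x in range(1, 5) for y in range(4)}
dsc, ji = dice_and_jaccard(labeler_1, labeler_2)
print(f"DSC = {dsc:.2f}, JI = {ji:.2f}")  # 12 shared voxels: DSC = 0.75, JI = 0.60
```

For any single pair of masks the two measures are related by JI = DSC / (2 − DSC), so JI is never larger than DSC.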

Image analysis and machine learning

A total of 488 radiomics features [10, 28] were extracted from the segmented volumes to define cystic lesion and pancreas phenotypes based on venous phase images (Fig. 2). Radiomics features used in this study included 14 first-order statistics of the volumetric CT intensities, 8 shape features of the target structure, 33 texture features from a gray-level co-occurrence matrix and a gray-level run-length matrix, 376 texture features from the 8 volumes filtered by wavelets [28], and an additional 47 texture features from the volume filtered by Laplacian of Gaussian (LoG). Ten image features were extracted from the whole pancreatic region. Table 2 lists the whole feature set used for cyst classification in this study. Two demographic features, age and gender, were also incorporated into the final model.

Fig. 2

The radiomics-based classification process. Pancreatic cysts and background pancreas are manually segmented from abdominal CT (left column). Feature extraction process extracts first-order signal intensity statistics, shape features based on 3D surface mask, texture features among adjacent voxels, and filtered features using wavelet or Laplacian of Gaussian filters (middle column). Features are analyzed with machine learning techniques such as random forest classifier to predict clinical outcomes (right column)

Table 2 The demographic and image features computed from the segmented pancreatic cystic lesion and the whole pancreas

To effectively test the limited number of cases of less common cyst types, such as SPNs, fourfold cross-validation was performed in this study. The cases of each cyst type were randomly divided into four groups, and one group from each of the 5 cyst types was combined to compose a fold. A random forest machine learning algorithm was used for cyst type classification. A total of one hundred thousand trees were built to test each fold, and each decision node was divided until a unique case remained.
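The fold construction described above can be sketched as follows. This is a minimal illustration, assuming a round-robin split within each cyst type; the function name `stratified_folds`, the random seed, and the way unequal group sizes are handled are assumptions, and the random forest training itself is not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=4, seed=0):
    """Split case indices into n_folds so that every class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random division within one cyst type
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)  # deal cases out round-robin
    return folds


# Class counts as reported in the Patients section.
counts = {"IPMN": 64, "MCN": 33, "SCA": 60, "SPN": 24, "PanNET": 33}
labels = [name for name, n in counts.items() for _ in range(n)]
folds = stratified_folds(labels)

for k, test_indices in enumerate(folds):
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    spn_in_test = sum(labels[i] == "SPN" for i in test_indices)
    print(f"fold {k}: train={len(train_indices)}, test={len(test_indices)}, SPN in test={spn_in_test}")
```

With this scheme each fold holds out 6 of the 24 SPN cases, so the rarest class is represented in every test fold.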

Radiologist interpretation

One academic abdominal radiologist (> 25 years of experience) who was blinded to the pathologic diagnosis reviewed the venous phase images for each case and provided their most likely diagnosis. The radiologist was provided the patient age and gender for each case, but was otherwise blinded as to the final pathology.

Statistical analysis

The diagnostic performance of the radiomics model was compared to that of the radiologist. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve were calculated for each cyst type.
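As an illustration of how per-class measures can be derived from a multi-class confusion matrix, the following sketch treats one cyst type as "positive" and all others as "negative" (a one-vs-rest reading, which is an assumption about how the per-type values were obtained). The function name and the 3-class matrix are made up for the example and are not the study's data.

```python
def one_vs_rest_metrics(confusion, class_index):
    """confusion[i][j] = number of cases of true class i that were predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp                                   # missed cases of this class
    fp = sum(confusion[i][class_index] for i in range(n_classes)) - tp      # other classes called this class
    tn = total - tp - fn - fp

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, total),
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
metrics = one_vs_rest_metrics(toy_confusion, class_index=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```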

Results

The mean maximal 2D diameter of the cystic lesions was 4.28 ± 3.25 cm. The sites of the cystic lesions were the head and/or uncinate process (n = 77), neck (n = 14), body (n = 50), and tail (n = 73) of the pancreas. The demographic and cystic lesion features stratified by cyst type are summarized in Table 1. As expected, there were significant differences in age (p < 0.001) and gender (p = 0.0003) among the different cyst types. MCNs (97.0% female), SCAs (75.0% female), and SPNs (91.7% female) were more common in women, cystic PanNETs were more common in men (39.4% female), and IPMNs (51.6% female) did not show a strong gender predilection in our sample. Patients with SPN (28.8 ± 11.4 years) were significantly younger than patients with other cyst types. MCNs were seen exclusively in the body and tail. MCNs (5.65 ± 5.00 cm) and SPNs (5.59 ± 3.05 cm) were larger than IPMNs (3.24 ± 2.02 cm), SCAs (4.53 ± 2.61 cm), and cystic PanNETs (3.53 ± 3.40 cm) (p = 0.001). SPNs (63.41 ± 21.32 HU) and cystic PanNETs (64.29 ± 38.15 HU) showed the highest attenuation due to the presence of solid components, and IPMNs (24.53 ± 12.92 HU) and MCNs (21.69 ± 9.19 HU) showed the lowest attenuation. Pancreatic duct dilatation was most frequently associated with IPMNs (53.1%) (p < 0.001).

The inter-observer similarity for pancreas segmentation was 90.44 ± 3.68% and 82.74 ± 6.10% in terms of DSC and JI, respectively. The similarity of cyst segmentation was 89.04 ± 11.88% and 81.14 ± 15.93% with DSC and JI, respectively. The boxplot of inter-observer variation is shown in Fig. 3. The segmentations showed high similarity between the labelers, as the segmentation was performed under controlled protocols [27].

Fig. 3

The boxplot of the inter-observer variations of manual segmentation. DSC and JI percentage values are represented for the whole pancreas and pancreatic cyst lesions. The box represents the first quartile, median, and the third quartile from the lower border, middle, and the upper border, respectively, and the lower and the upper whiskers show the minimum and the maximum values. The dot represents the average value of each performance measure

Among the 490 features (488 radiomics features plus age and gender), 30 features were selected by minimum-redundancy maximum-relevancy feature selection [29] based on mutual information, which reduces redundancy among the selected features. This subset showed the best classification performance, with AUC of 0.940 (Table 3). Age and gender were included in the model because of the known age and gender associations for pancreatic cysts. These demographic features would be available to the radiologist at the time of the exam, so including them simulates the real-world application. Age, median and mean intensities of the original images and wavelets, and fractal dimension were highly ranked for the classification. Gender was ranked as the 29th feature for the classification. The list of selected features is given in Table 4. The distribution of four representative features among the 30 selected features for each cyst type is shown in Fig. 4.
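The idea behind minimum-redundancy maximum-relevancy selection can be sketched with a greedy search. The sketch assumes features have already been discretized and uses the difference form of the criterion (relevance minus mean redundancy); whether the study used this form, and how continuous features were binned, is not stated in the text. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long sequences of discrete values."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def mrmr_select(features, target, k):
    """Greedily pick k feature names, trading relevance to the target against redundancy."""
    relevance = {name: mutual_information(values, target) for name, values in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so that ties are broken deterministically

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[chosen]) for chosen in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (not study data): a 4-class target encoded by two independent binary features.
target = [0, 0, 1, 1, 2, 2, 3, 3]
toy_features = {
    "f_a": [0, 0, 0, 0, 1, 1, 1, 1],
    "f_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # fully redundant with f_a
    "f_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
selected = mrmr_select(toy_features, target, k=2)
print(selected)  # the redundant copy is passed over in favor of the complementary feature
```

All three toy features are equally relevant on their own, but once one copy of `f_a` is chosen the other adds nothing, which is the behavior the redundancy term is meant to capture.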

Table 3 Confusion matrix of radiologist’s and radiomics-based classification
Table 4 Radiomics features highly dependent on cyst types
Fig. 4

The box plots of feature distributions highly correlated to differentiate cystic types, including age (a), median intensity (HU) (b), Laplacian of Gaussian fractal dimension (c), and cyst to pancreas volume portion (d)

The radiologist’s interpretation of the 214 cases showed AUC of 0.895 for overall cyst classification with AUC of 0.889 for IPMNs, AUC of 0.842 for MCNs, AUC of 0.769 for SCAs, AUC of 0.865 for SPNs, and AUC of 0.908 for cystic PanNETs. The radiomics-based machine learning approach showed AUC of 0.940 for overall cyst classification and AUC of 0.942 for IPMNs, AUC of 0.883 for MCNs, AUC of 0.851 for SCAs, AUC of 0.828 for SPNs, and AUC of 0.905 for cystic PanNETs. The confusion matrix is summarized in Table 3 and the ROC curves for radiologist and radiomics classification are shown in Fig. 5.
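For readers less familiar with per-class AUC, the following sketch computes a one-vs-rest AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. This is a generic formulation and not necessarily the procedure used in this study; the function name and the scores are hypothetical.

```python
def auc_one_vs_rest(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in positives:
        for sn in negatives:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical cases: the first three belong to the class of interest.
truth = [True, True, True, False, False, False]

# Continuous scores, such as a classifier's class probability.
soft_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
soft_auc = auc_one_vs_rest(soft_scores, truth)

# Hard labels, such as a single most-likely diagnosis (1 = called this class).
hard_calls = [1, 1, 0, 1, 0, 0]
hard_auc = auc_one_vs_rest(hard_calls, truth)

print(f"AUC from scores: {soft_auc:.3f}")
print(f"AUC from hard calls: {hard_auc:.3f}")
```

When only a single hard call per case is available, this pairwise AUC equals the mean of sensitivity and specificity, which is worth keeping in mind when comparing a reader's single diagnosis with a classifier's continuous output.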

Fig. 5

The receiver operating characteristic curves of the radiologist and radiomics classifications of pancreatic cysts for (a) intraductal papillary mucinous neoplasm, (b) mucinous cystic neoplasm, (c) serous cystadenoma, (d) solid pseudopapillary neoplasm, and (e) cystic pancreatic neuroendocrine neoplasm

The radiologist and radiomics feature-based classifications showed similar distributions in the confusion matrix. Although the AUC for overall cyst classification of the radiomics-based classification was slightly higher than that of the radiologist classification (AUC of 0.940 vs. 0.895), the difference did not reach statistical significance, likely due to the small sample size. Examples with discordant predictions between the radiologist and the radiomics-based model are shown in Figs. 6 and 7.

Fig. 6

Examples in which the radiologist’s prediction was incorrect but the radiomics classification was correct. (a) Axial IV contrast-enhanced CT image showed a well-circumscribed lobulated cystic lesion in the pancreatic head (arrow). The radiologist’s prediction of an intraductal papillary mucinous neoplasm was incorrect. The radiomics classification of serous cystadenoma was correct. (b) Axial IV contrast-enhanced CT image showed a well-circumscribed thick-walled exophytic cystic lesion arising from the body of the pancreas (arrow). The radiologist’s prediction of a cystic pancreatic neuroendocrine tumor was incorrect. The radiomics classification of a solid pseudopapillary neoplasm was correct

Fig. 7

Examples in which the radiomics classification was incorrect and the radiologist’s prediction was correct. (a) Axial IV contrast-enhanced CT image showed a well-circumscribed solid and cystic lesion in the pancreatic head (arrow). The radiomics classification of a serous cystadenoma was incorrect. The radiologist’s prediction of a solid pseudopapillary neoplasm was correct. (b) Axial IV contrast-enhanced CT image showed a well-circumscribed cystic lesion with internal septations in the head of the pancreas (arrow). The radiomics classification of an intraductal papillary mucinous neoplasm was incorrect. The radiologist’s prediction of a serous cystadenoma was correct

Discussion

Improvement in radiological techniques and increased utilization of cross-sectional imaging have led to an increase in the diagnosis of pancreatic cystic neoplasms. These neoplasms include a diverse cohort, with some lesions being benign and others having malignant potential. Given the considerable morbidity associated with pancreatic resections, accurate preoperative assessment of the lesion is vital in determining surgical candidacy. These cystic lesions present a diagnostic challenge, with prior work demonstrating that a considerable proportion of patients undergoing resection are found on final histopathological analysis to have a benign lesion that, in retrospect, did not warrant surgical resection. While radiological assessment remains the gold standard for evaluation of these cysts, there is a need for a more powerful tool to accurately classify cyst types. In the current study we developed a radiomics feature-based classification system that was able to accurately classify cystic lesions and performed comparably to an experienced academic radiologist.

In this study, the radiomics feature-based classification achieved AUC of 0.940 in distinguishing among five types of pancreatic cystic neoplasms. The performance was similar to that of previous studies of multi-class pancreatic cyst classification that included three or four cyst types, with accuracy of 79.6–83.6% [25, 26]. Previous studies on radiomics-based pancreatic cyst classification [19,20,21,22,23,24,25,26] did not include a direct comparison with a radiologist; therefore, it was difficult to assess whether the radiomics-based classification reported provided any added value relative to the standard of care. The current study showed that the radiomics-based pancreatic cyst classification achieved performance equivalent to that of an academic radiologist with more than 25 years of experience. These results indicate that radiomics-based classification could be valuable in improving the current standard of care. Given that this model incorporates both clinical data and radiomic features, we believe that it is more widely applicable and comprehensive in the assessment of pancreatic cysts. The radiomics-based classification showed AUC of 0.851 in the diagnosis of SCAs, which corroborated previous studies that showed AUC of 0.75–0.989 in differentiating SCAs from mucin-producing cysts [19,20,21,22,23]. The ability to confidently and accurately diagnose SCAs, a “leave-alone” benign lesion, has the potential to eliminate unnecessary imaging surveillance and unnecessary surgery, which can reduce patient morbidity and healthcare costs. These radiomics-based classification systems may achieve superior performance to clinical and/or guideline-based features [14, 15]. This refined risk assessment can help with initial triage and tailor the surveillance duration and intensity to maximize the chance of cancer detection while minimizing costs. These cost savings can potentially offset costs associated with algorithm development and implementation.

This study has a few limitations. First, it was a single-center retrospective study. Fourfold cross-validation was used to assess radiomics-based model performance due to the small sample size relative to the number of cyst types. All these cases underwent surgical resection, which may bias the dataset toward atypical appearance of benign lesions (i.e., SCAs). The dataset was enriched with rarer pancreatic cyst types relative to IPMNs to evaluate the ability of the radiologist and radiomics model to discriminate among these rarer cyst types, which may limit the generalizability to the general population, in whom IPMNs are significantly more common. This study was performed on CT scanners from a single vendor. It is unclear whether variations related to scan acquisition (e.g., protocols, vendors) may affect the performance of the radiomics classification model. We only analyzed portal venous phase images in the current study, and the addition of arterial phase images may improve the accuracy of pancreatic cyst classification. MRI is frequently used in the evaluation of pancreatic cysts and can improve diagnostic confidence in the assessment of pancreatic cysts. We chose to apply the radiomics model to CT because MRI is more heterogeneous than CT (e.g., vendor, imaging sequences) and requires additional normalization to transform its arbitrary gray intensity values. For these reasons, most of the existing publications on pancreas AI have focused on CT rather than MRI. Future research is needed to validate these results with larger external datasets from different institutions and to translate results across imaging modalities.

Second, the performance of the radiomics-based model was compared to the performance of a single academic radiologist. The experienced academic radiologist in this study may be more accurate at pancreatic cyst classification than an average radiologist in the community, which may underestimate the incremental value of the radiomics-based model. Future reader studies should also recruit multiple readers with a wide range of experience to measure the real-world impact of these radiomics tools. Third, the current radiomics model only used CT-based features plus patient age and gender. Other important clinical features such as symptoms, family history, laboratory values, and cyst fluid molecular markers [7] were not included in the current model and should be incorporated into future models. Our prior experience has demonstrated that the predictive power offered by multiple features is often additive and can result in a stronger model [7].

Conclusion

This study showed that a radiomics-based model can achieve performance equivalent to that of an experienced academic radiologist in the classification of a wide array of pancreatic cysts with variable malignant potential, supporting the ability of a radiomics-based model to accurately classify pancreatic cystic neoplasms. Further validation and clinical integration of this model could refine the management of pancreatic cysts by improving the diagnostic accuracy of cystic lesions, which can maximize the detection of malignant lesions while reducing healthcare utilization.