Introduction

Postoperative pancreatic fistula (POPF) is one of the most frequent and serious complications following pancreaticoduodenectomy, occurring in up to 20% of resections [1,2,3]. Despite advances in operative technique and improvements in perioperative management, clinically relevant fistulas (CR-POPF) are still the major cause of postoperative morbidity and mortality and can have a significant impact on length of hospital stay and associated costs [4, 5]. Identifying patients at increased risk of developing CR-POPF is a key element in tailoring treatment decisions and preventing or mitigating adverse outcomes, and several risk scores have been proposed to help achieve this aim [6,7,8,9,10,11]. These models, however, have limitations. The main one is that, although they are based on well-established risk factors, their parameters are assessed subjectively (e.g., pancreatic parenchyma texture) and/or intraoperatively (e.g., blood loss, pancreatic duct diameter). For effective decision-making, a reproducible risk profiling model should be based on characteristics that can be assessed both preoperatively and objectively.

The occurrence of CR-POPF is related to pancreatic features (e.g., pancreatic parenchyma texture, Wirsung’s duct size) and patient frailty [6,7,8, 12,13,14,15]. A preoperative computed tomography (CT) scan can provide insights into these anatomical features and also aspects of patient frailty, such as sarcopenic obesity [16]. The advent of machine learning provides a unique possibility to measure and analyze these factors through quantitative image analysis [17].

In this investigation, we aimed to develop a reliable and reproducible machine learning-based multimodal risk model capable of predicting CR-POPF by combining radiomic features and morphologic features correlated to surgical complexity and patient frailty assessed by preoperative CT images with patient characteristics.

Materials and methods

Study design

We designed an image-processing pipeline to detect and extract features from a predetermined set of slices from each abdominal CT scan. Patients’ clinical data were retrieved from a prospectively maintained database and were combined with the radiomic and geometric features extracted after the CT annotation process. The pipeline allowed the assessment of surface/volumes of pancreas, fat, muscle and bone, proceeding through calculation of fat/muscle ratio (sarcopenia) in relevant CT images, according to the flowchart shown in Online Resource 1. These data were then assessed by a machine learning classifier trained with supervised learning techniques to assess the risk of developing CR-POPF.

This study was approved by the institutional review board of our institution (n° 11/20). All analyzed data were deidentified.

Patients

A total of 641 consecutive patients who underwent pancreaticoduodenectomy for pancreatic malignancies in our institution between 2011 and 2019 were screened. Of these, 205 patients had preoperative CT scans in our center. Among them, 59 were excluded because of neoadjuvant chemotherapy (n = 57) or radiotherapy (n = 2), 37 because the CT scan was performed more than 40 days before surgery, 2 because of pancreatitis, 4 because of peripancreatic collections and 3 because of severe artifacts. A total of 100 patients were included in the study.

All included patients underwent contrast-enhanced CT (CECT) scans according to the National Comprehensive Cancer Network (NCCN) criteria within 40 days before surgery and without severe artifacts [18]. All pancreaticoduodenectomies were performed by dedicated surgeons with an open approach in a high-volume center with more than 150 pancreatic resections per year. A two-layer termino-lateral pancreatico-jejunal anastomosis was performed in every patient.

Postoperative complications were defined according to Clavien-Dindo classification with grade III or higher considered as major [19]. CR-POPF was classified according to the International Study Group for Pancreatic Fistula (ISGPF) classification [20].

Variable selection

We based the selection of predictor variables on a priori hypotheses guided by literature and clinical knowledge [6,7,8,9,10,11,12,13, 15, 21]. Patient variables included gender, age (years), height (m), and weight (kg).

CT features

Two experienced radiologists labelled all the CECTs acquired in portal phase (with a delay of 70 s after contrast injection) using MD.ai (https://www.md.ai), a cloud-based platform offering real-time collaboration and exportation of annotations and images.

CT segmentation of pancreatic parenchyma

Radiologists contoured pancreatic parenchyma of the body and tail of the pancreas by freeform region of interest (ROI) to extract radiomic features [22,23,24,25] (Fig. 1). The left border of the portal axis was chosen as landmark between the head and the body of the pancreas.

Fig. 1 Contrast-enhanced CT of the abdomen with contoured pancreatic parenchyma (body and tail) by freeform region of interest (ROI) in axial image

Pancreatic duct annotation

Radiologists annotated the anterior–posterior pancreatic duct diameter, in the plane orthogonal to the axis of the duct itself, at the landmark between the head and the body of the pancreas with the label ‘Wirsung main’. The annotation was repeated on the adjacent upper and lower slices with the labels ‘Wirsung up’ and ‘Wirsung down’. The surface area of the Wirsung section (mm²) was estimated using the surface defined by each Wirsung line, as described in Online Resource 2.

Skeletal muscle and fat tissue annotation

Skeletal muscle and non-adipose tissue (e.g., small and large bowel) were segmented using freeform ROIs on two CT slices located at the lower face of the third lumbar vertebra and the lower face of the third and fourth intervertebral discs (Online Resource 3).

Using thresholding methods, we defined the following Hounsfield unit (HU) ranges: skeletal muscle, −29 to 150 HU; subcutaneous and intramuscular adipose tissue, −190 to −30 HU; visceral adipose tissue, −150 to −50 HU (Online Resource 4). The area values obtained from each ROI were evaluated, and total abdominal muscle area (TAMA, cm²/m²), visceral fat area (VFA, cm²), and subcutaneous fat area (SFA, cm²) were calculated [26].
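The thresholding step can be illustrated with a minimal sketch. It assumes that each tissue area is obtained by counting the ROI pixels whose attenuation falls inside the quoted HU window and multiplying by the pixel area, and that TAMA is normalized by height squared, as its unit (cm²/m²) suggests. The function names, the toy slice and the 1 mm pixel spacing are hypothetical and are not taken from the study pipeline.

```python
import numpy as np

# HU windows quoted in the text (inclusive bounds).
HU_RANGES = {
    "skeletal_muscle": (-29, 150),
    "subcutaneous_or_intramuscular_fat": (-190, -30),
    "visceral_fat": (-150, -50),
}


def tissue_area_cm2(hu_slice, roi_mask, hu_range, pixel_spacing_mm):
    """Area (cm^2) of the ROI pixels whose HU value lies inside hu_range."""
    low, high = hu_range
    in_window = (hu_slice >= low) & (hu_slice <= high)
    n_pixels = np.count_nonzero(in_window & roi_mask)
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    return n_pixels * pixel_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2


def height_normalized_area(area_cm2, height_m):
    """Area divided by height squared (cm^2/m^2)."""
    return area_cm2 / height_m ** 2


# Toy 4 x 4 slice with 1 mm x 1 mm pixels; the values are illustrative only.
hu_slice = np.array([
    [40, 55, -100, -120],
    [60, 200, -90, -20],
    [35, 45, -160, -70],
    [-500, 50, 10, -60],
])
roi_mask = np.ones_like(hu_slice, dtype=bool)  # hypothetical ROI: whole slice

muscle_area = tissue_area_cm2(
    hu_slice, roi_mask, HU_RANGES["skeletal_muscle"], (1.0, 1.0)
)
visceral_area = tissue_area_cm2(
    hu_slice, roi_mask, HU_RANGES["visceral_fat"], (1.0, 1.0)
)
```

Because the fat windows overlap, the HU range alone does not separate visceral from subcutaneous fat; in this sketch that separation is assumed to come from the ROI mask passed to the function.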

Radiomic feature extraction

Radiomic features for pancreas parenchyma and skeletal muscle tissues were extracted with pyradiomics (v. 2.0), an open-source package developed and operating in Python and deployable in our pipeline [27]. The following sets of radiomic features were considered: first-order statistics; shape-based (3D); shape-based (2D); gray level co-occurrence matrix (GLCM); gray level run length matrix (GLRLM); gray level size zone matrix (GLSZM); neighboring gray tone difference matrix (NGTDM); gray level dependence matrix (GLDM). Images were not filtered before extracting radiomic features.
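To give an idea of what the first-order feature class measures, the following simplified sketch computes a few such statistics inside a binary ROI using NumPy only. It is not the pyradiomics implementation: the function name is hypothetical, the fixed bin width of 25 is an assumption, and the discretization used here is cruder than the one the package applies.

```python
import numpy as np


def first_order_features(image, mask, bin_width=25.0):
    """A few first-order statistics of the voxels inside a binary ROI mask."""
    voxels = image[mask].astype(float)
    # Discretize intensities into fixed-width bins before computing entropy.
    bins = np.floor(voxels / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return {
        "mean": float(voxels.mean()),
        "median": float(np.median(voxels)),
        "range": float(voxels.max() - voxels.min()),
        "energy": float(np.sum(voxels ** 2)),
        "entropy": float(-np.sum(p * np.log2(p))),
    }


# Toy image and ROI; the values are illustrative only.
image = np.array([[10, 20, 30], [40, 60, 80], [100, 120, 140]])
mask = np.array([
    [True, True, False],
    [True, True, False],
    [False, False, False],
])
features = first_order_features(image, mask)
```

The texture classes (GLCM, GLRLM, GLSZM, NGTDM, GLDM) build on the same discretized intensities but summarize their spatial arrangement rather than their distribution alone.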

Statistical analysis

Population split

The study population was split using tenfold cross-validation 250 times to obtain the best configuration of parameters and another 250 times to test performance. Each time, the whole dataset was divided differently into ten parts: nine acted as training cohorts and the tenth was the validation cohort, in which the algorithm settings were tested. This procedure was repeated with a different fold chosen for validation each time; after all ten folds had been used to test a specific configuration, an averaged result was obtained. After repeating this procedure 250 times, the validation results from these simulations were evaluated to obtain the best parameter settings for the pipeline. The numerical values of all configurations were drawn with a randomized parameter search [28]. Once the best configuration was found, the performance of the algorithms was evaluated with that configuration in a nested cross-validation phase to obtain averaged validation results.
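A minimal sketch of repeated tenfold cross-validation combined with a randomized parameter search is given below, using scikit-learn. Several details are assumptions rather than the study settings: the folds are stratified by outcome, only 3 repeats and 5 sampled configurations are used to keep the example fast (the study used 250 repetitions), the only tuned parameter is the regularization strength `C`, and the data are synthetic.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold

# Synthetic stand-in for the cohort: 100 patients, 20 events, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.array([1] * 20 + [0] * 80)
X[y == 1, 0] += 1.0  # give one feature some signal

# Tenfold cross-validation, reshuffled on every repeat.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

# Randomized search: each candidate C is scored by its mean AUC over all folds.
search = RandomizedSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_distributions={"C": loguniform(1e-2, 1e2)},
    n_iter=5,
    scoring="roc_auc",
    cv=cv,
    random_state=0,
)
search.fit(X, y)

n_fits_per_candidate = cv.get_n_splits()  # 10 folds x 3 repeats
best_c = search.best_params_["C"]
mean_auc = search.best_score_
```

The nested cross-validation used for the final performance estimate is not shown; one common way to obtain it is to wrap a search object like this one in an outer cross-validation loop.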

Algorithm

The proposed pipeline consisted of several steps: feature imputation, feature scaling, feature selection, dataset augmentation, selection of the classifier and its specific parameters, and training and testing of the selected classifier.
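The sequence of steps can be sketched as a scikit-learn pipeline. This is an illustration under stated assumptions, not the study code: median imputation, standard scaling, the regularization strengths and the use of an L1-penalized logistic regression as the selector are all choices made for the example, and the data are synthetic. The augmentation step is left out because scikit-learn pipelines do not resample; it would need the pipeline class from the separate imbalanced-learn package.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    # 1. Feature imputation (median strategy is an assumption).
    ("impute", SimpleImputer(strategy="median")),
    # 2. Feature scaling.
    ("scale", StandardScaler()),
    # 3. Feature selection: keep features with a non-zero L1 coefficient.
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
        threshold=1e-5,
    )),
    # 4. Classifier trained on the selected features.
    ("classify", LogisticRegression(penalty="l1", solver="liblinear", C=1.0)),
])

# Synthetic stand-in for the cohort, with a few missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = np.array([1] * 20 + [0] * 80)
X[y == 1, :3] += 1.5
X[rng.random(X.shape) < 0.05] = np.nan

pipeline.fit(X, y)
risk = pipeline.predict_proba(X)[:, 1]  # predicted probability of the event
n_selected = int(pipeline.named_steps["select"].get_support().sum())
```

Keeping every step inside one pipeline object means that imputation, scaling and selection are refit on each training fold during cross-validation, which avoids leaking information from the validation fold.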

For the feature selection step, no dimensionality reduction technique was applied, in order to keep the features interpretable; instead, LASSO regularization was used to shrink the coefficients of less important features to zero [29]. Since our dataset was imbalanced for events, a dataset augmentation technique (Synthetic Minority Oversampling Technique, or SMOTE) was adopted to generate more samples for the minority class [30]. Finally, we tested two machine learning models as classifiers: an L1-regularized logistic regression model and a random forest model.
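The idea behind SMOTE can be sketched in a few lines of NumPy: each synthetic sample is placed at a random point on the segment joining a minority-class sample and one of its nearest minority-class neighbors. The function below is a simplified illustration with hypothetical names and defaults (k = 5 neighbors); it is not the implementation used in the study.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample, one of its k neighbors, and an interpolation weight.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])


# Toy example: 20 minority samples, oversampled to match 80 majority samples.
rng = np.random.default_rng(1)
X_minority = rng.normal(size=(20, 4))
synthetic = smote_sketch(X_minority, n_new=60)
X_balanced_minority = np.vstack([X_minority, synthetic])
```

Oversampling is typically applied to the training folds only, so that the validation fold contains real patients exclusively.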

Dichotomous data are expressed as absolute numbers and continuous data as median and interquartile range (IQR).

Variable importance

We evaluated variable importance by the coefficient relative value for the logistic regression model and impurity-based feature importance for the random forest model, as shown in Online Resource 5.
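For the logistic regression model, ranking variables by coefficient can be sketched as follows. The feature names and coefficient values are invented for illustration and are not study results; the helper name is hypothetical, and features whose coefficient was shrunk to zero are simply skipped.

```python
def top_coefficients(names, coefficients, k=10):
    """Return the k largest positive and k most negative (name, coefficient) pairs."""
    pairs = [(name, coef) for name, coef in zip(names, coefficients) if coef != 0]
    positive = sorted(
        (p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True
    )[:k]
    negative = sorted((p for p in pairs if p[1] < 0), key=lambda p: p[1])[:k]
    return positive, negative


# Hypothetical features and coefficients, for illustration only.
names = ["feature_a", "feature_b", "feature_c", "feature_d", "feature_e"]
coefficients = [0.8, -0.5, 0.0, 0.3, -1.2]

positive, negative = top_coefficients(names, coefficients, k=2)
```

Coefficients are comparable across features only if the features were scaled beforehand, which is why the scaling step precedes model fitting in the pipeline.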

Evaluation

We used the scikit-learn package (version 0.23.2), a Python machine learning library, to classify patients in a supervised setting, using the data considered (patient characteristics, CECT annotations, radiomic features) as input features, as reported in previous publications [27, 31, 32].

The performance of the models was evaluated with the commonly used metrics of area under the curve (AUC), specificity, sensitivity, positive predictive value (PPV) and negative predictive value (NPV). Further metrics are reported in Online Resource 6.
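For reference, these metrics can be computed from a confusion matrix and from predicted scores as in the short sketch below. The counts and scores are hypothetical and do not reproduce the study results; the AUC is computed as the proportion of event/non-event pairs in which the event receives the higher score, with ties counted as one half.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # events correctly flagged
        "specificity": tn / (tn + fp),  # non-events correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who had the event
        "npv": tn / (tn + fn),          # cleared patients who had no event
    }


def auc_from_scores(event_scores, non_event_scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


# Hypothetical cohort of 100 patients with 20 events.
metrics = classification_metrics(tp=15, fp=10, tn=70, fn=5)
auc = auc_from_scores([0.9, 0.6, 0.4], [0.7, 0.3, 0.2, 0.1])
```

With a low event rate, NPV tends to be high even when sensitivity is moderate, because most patients classified as low risk are true non-events.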

Results

Study population

Of the 100 patients included in the study, 35 were female and 65 were male, and median age was 67.4 years (IQR 57.7–74.5). The majority of patients had pancreatic adenocarcinoma (PDAC) (n = 61), although the cohort was mixed and included patients with ampullary adenocarcinoma (n = 19), cholangiocarcinoma (n = 9), pancreatic neuroendocrine tumor (NET) (n = 5), duodenal cancer (n = 2) and benign lesions of the pancreatic head region (n = 4). Demographic and clinical characteristics and data regarding postoperative complications are summarized in Table 1.

Table 1 Baseline characteristics

The overall rate of CR-POPF was 20%, with grade B and grade C incidences of 15% and 5%, respectively. The median length of hospital stay was 13 days (IQR 9–21) for the whole population, 21 days (16–21) in patients with grade B POPF and 38 days (25.5–40.5) in patients with grade C POPF. Twenty-two patients developed major postoperative complications, classified as grade III or higher according to the Clavien-Dindo classification.

Prediction model performance

The logistic regression model demonstrated an AUC of 0.807 (0.155), specificity of 0.824 (0.133) and sensitivity of 0.571 (0.337) in predicting the occurrence of a CR-POPF (Fig. 2). PPV was 0.468 (0.310) and NPV was 0.890 (0.084). The ten most significant positive coefficients for CR-POPF prediction included seven related to the shape of the pancreas and radiomic features, two related to sarcopenia and one related to the position of the Wirsung duct. The ten most significant negative coefficients included five related to sarcopenic features, three to the pancreas shape and radiomic features, one to patient anthropometrics (height) and one to the main diameter of the Wirsung duct (Fig. 3).

Fig. 2 Logistic regression (A) and random forest (B) receiver operating characteristic (ROC) for POPF prediction

Fig. 3 Logistic regression: ten most significant coefficients, positive (A) and negative (B)

A model based on the random forest approach resulted in a loss of sensitivity. AUC was 0.749 (0.209), specificity was 0.914 (0.106), and sensitivity was 0.424 (0.346); PPV and NPV were 0.502 (0.414) and 0.869 (0.076), respectively (Fig. 2). The most important features in the random forest model for CR-POPF prediction are shown in Fig. 4.

Fig. 4 Random forest model for POPF prediction: ten most important features

The logistic regression model was also tested for its ability to predict a length of hospital stay longer than the median. We observed an AUC of 0.709 (0.160), specificity of 0.633 (0.206), sensitivity of 0.715 (0.209), PPV of 0.646 (0.173), and NPV of 0.732 (0.179). Of the ten most significant positive prognostic characteristics, eight were related to the pancreas shape and radiomic features, and two were related to sarcopenia. Interestingly, these characteristics are not the same as those observed in the prediction model for CR-POPF. Similarly, among the ten most significant negative prognostic factors, five were related to sarcopenia, only one to the shape and radiomics of the pancreas, one to patient anthropometrics (height) and three to the characteristics of the Wirsung’s duct.

CR-POPF greatly overlaps with major postoperative complications after pancreaticoduodenectomy, so we also tested our model to predict any postoperative major complication. As expected, the ability to predict major complications was only slightly reduced, with an AUC of 0.690 (0.209), specificity of 0.801 (0.149), sensitivity of 0.436 (0.347), PPV of 0.373 (0.327), and NPV of 0.846 (0.093).

Discussion

Accurate preoperative risk profiling of patients at high risk of developing CR-POPF after pancreaticoduodenectomy is fundamental in optimizing perioperative management and preventing or mitigating adverse events. Furthermore, risk stratification contributes to preoperative risk assessment, which is critical in deciding upon the best surgical approach in frail patients.

Our aim was to develop a risk model based on preoperative CECT that is reproducible, immediate, applicable across healthcare settings, and designed to identify clinically relevant pancreatic fistulas. This new approach may help to overcome the limitations of previously proposed scores and could play a role in identifying patients who might benefit from an upfront surgical approach rather than neoadjuvant therapy or a prehabilitation protocol, or who should even be excluded from surgery. Our study has shown that the machine learning model is able to identify patients who are at low risk of developing a CR-POPF. Moreover, the model can also predict whether a patient's postoperative length of stay will exceed the median.

Previous studies have reported other machine learning models to predict postoperative complications after surgical procedures but, to the best of our knowledge, only a few studies have focused on predicting outcomes after pancreatic resection [33, 34]. Using a prospective database of 110 patients undergoing pancreatoduodenectomy, 55 with and 55 without POPF, Kambakamba et al. evaluated the prognostic signature of machine learning-based texture analysis to estimate pancreatic consistency based on preoperative non-CE CT scans. Results were similar to those of our model, with a sensitivity of 76%, specificity of 64%, and an AUC of 0.78 for POPF prediction. This demonstrates the feasibility and reproducibility of this approach, even though that study evaluated only the pancreatic parenchyma [35]. Han et al. evaluated the medical records of 1769 patients who underwent pancreatoduodenectomy, 221 of whom had POPF, and developed a platform based on a machine learning algorithm for POPF prediction that incorporated both subjective and intraoperative variables [36]. This model also showed good predictive ability but is dependent on intraoperative data.

Limitations

Our study has several limitations. First, the sample size is small, which is related to our strict selection criteria and to the exclusion of a large number of patients with CT scans performed elsewhere, since ours is a referral center. Second, our model was specifically designed to assess CR-POPF. For this reason, when it is applied to different outcomes, such as length of stay or overall major postoperative complications, there is a partial decrease in reliability. Third, the main strength of the model is its ability to identify patients unlikely to develop a CR-POPF, while its PPV is less robust owing to its suboptimal sensitivity. Despite that, the identification of a subgroup of patients at low risk of POPF is clinically relevant. Strategies that aim to improve recovery and reduce the intensity of care for these patients could be implemented, while also allowing clinicians to focus attention on high-risk patients. Our next steps will focus on refining the model, including more cases from external cohorts, and on automating the entire process. We plan to enhance the accuracy of the prediction and make it available automatically at the end of the preoperative CT scan.

Conclusions

To our knowledge, our study is the first to develop a machine learning model that combines radiomic features correlated with surgical complexity and patient frailty obtained by preoperative CT scan to predict the occurrence of a CR-POPF. Our machine learning risk model appears to be a reliable tool for risk prediction in patients undergoing pancreaticoduodenectomy, performing well in excluding subjects at risk of developing a CR-POPF. This novel approach may provide an individualized, objective and reproducible risk stratification that can be easily implemented in clinical practice, enhancing effectiveness by allowing more tailored approaches rather than a one-size-fits-all approach.