Post-endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) remains a major issue with the trans-papillary approach, in which the incidence rate of PEP has been shown to be 3.5%–9.7% and the mortality rate has been shown to be 0.1%–0.7% [1].

To date, several patient- or procedure-related risk factors for PEP have been demonstrated, while the use of pancreatic stents and rectal nonsteroidal anti-inflammatory drugs (NSAIDs) has been shown to prevent PEP [2, 3]. Although these prophylactic methods are helpful, they cannot completely prevent PEP because of the complexities associated with trans-papillary ERCP (e.g., selection of cannulation methods, unintentional pancreatic injections, or guidewire passes) [3, 4]. Unmodifiable factors, such as young age and difficult cannulation arising from individual anatomy, are also unavoidable [2,3,4]. As a result, this multi-factorial etiology makes the prediction of PEP difficult [4, 5].

Therefore, if a simple prediction model for PEP were available immediately after ERCP, earlier distinction between patients at low and high risk of PEP could assist with health cost containment and the reduction of unnecessary admissions [4, 6, 7]. Moreover, earlier appropriate primary care and intensive care can prevent progression to severe pancreatitis [8,9,10]. However, a simplified and practical prediction system for PEP is not yet available, and early identification of PEP remains a challenge in this field: although there are many reports on the risk factors associated with PEP, only a few studies on prediction models for PEP immediately after ERCP have been reported [3, 5, 11,12,13]. Therefore, the aim of this study was to establish a simple predictive scoring system for PEP immediately after ERCP.

Methods

Study design and patients

The present study was a retrospective single-center cohort study including consecutive patients with suspected hepatobiliary-pancreatic disorders who underwent trans-papillary ERCP attempts between January 1, 2012 and December 31, 2019. Data on ERCP procedures were retrieved from the Jikei University ERCP database and medical records. The database was updated immediately after each procedure and contained data of > 3 months of follow up. Written informed consent to undergo ERCP was obtained from all patients prior to participation. The opportunity to opt out of this study participation was also provided, and the requirement for informed consent was waived due to the anonymous retrospective observational study (opt-out method of informed consent).

All patients who underwent trans-papillary ERCP were enrolled in this study. Patients with no papilla (e.g., pancreaticoduodenectomy, Roux-en-Y with hepaticojejunostomy) and those with ongoing acute pancreatitis were excluded. Patients in whom the procedure was discontinued because the target papilla could not be reached (e.g., gastrointestinal tract stenosis or food residue at endoscopy), and patients who only underwent stent removal without ERCP, were also excluded. This study complied with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement [14], and was approved by the Human Subjects Committee of Jikei University School of Medicine [ID no. 32-027 (10102)]. This study was also subsequently registered with the University Hospital Medical Information Network Clinical Trials Registry (identification no. UMIN000040471), and was conducted in accordance with the ethical principles of the Declaration of Helsinki (Fortaleza revision).

ERCP procedures

ERCP was performed under fluoroscopic view by seven experts who had experience of > 300 ERCPs with naïve papilla, or by 16 trainees who had experience of < 300 ERCPs, with experts’ intervention depending on the situation [15, 16]. Side-viewing duodenoscopes (JF-260, TJF-260V; Olympus Medical Systems, Tokyo, Japan) were used in most cases (e.g., normal anatomy, Billroth I). Forward-viewing scopes, including balloon-assisted endoscopes and/or cholangioscopes, were used in some cases (e.g., Billroth II, Roux-en-Y with gastrectomy) depending on the gastrointestinal anatomy, or when both diagnosis and treatment of pancreaticobiliary conditions were required (Supplementary Tables 1, 2). Selective pancreatic or biliary cannulation methods, namely contrast-assisted, single-guidewire assisted (SGW), pancreatic guidewire assisted (PGW), double-guidewire assisted (DGW), rendezvous, or crossover, were chosen depending on the situation. The PGW technique is defined as the placement of a pancreatic guidewire to facilitate biliary cannulation using only the contrast-assisted method [17]. The DGW technique is defined as the combination of pancreatic duct guidewire placement and SGW to achieve selective biliary cannulation without contrast injection [18]. Pancreatic stents of 5 Fr diameter, or occasionally endoscopic nasopancreatic drainage tubes of the same diameter, were also used when pancreatic stenting was easy.

Patients > 13 years old who underwent ERCP were conscious but sedated with intravenous midazolam and pethidine administration, and those < 12 years old were administered general anesthesia during the ERCP procedure.

Definitions of post-ERCP pancreatitis and severity

Diagnosis of PEP complied with the revised Atlanta classification [1], while severity of PEP complied with the Japanese Guidelines for the Management of Acute Pancreatitis 2015 [19]. We used the Japanese Guidelines for the following reasons. First, the majority of patients who undergo ERCP in Japan are admitted to hospital. Second, initial endoscopic biliary drainage was usually performed for patients with acute cholangitis or obstructive jaundice. Third, subsequent endoscopic treatments (e.g., stone extraction with endoscopic sphincterotomy and/or endoscopic papillary balloon dilatation) or surgeries for the underlying etiology were performed during the same admission, thus prolonging hospitalization in Japan [20]. Consequently, if the Cotton criteria or other classifications were applied, the prolonged hospitalizations (≥ 10 days) arising from these processes would cause mild PEP to be classified as severe PEP [21, 22]. Therefore, we used the Japanese Guidelines.

Selection of PEP risk factors

Univariate analysis was performed on 35 potential predictor variables for PEP, which were selected based on previous research, including the ASGE and ESGE risk factors; however, our database did not include the total number of attempts on the papilla (Supplementary Tables 1, 2) [2, 3]. The significant factors (P < 0.05) from univariate analysis were entered into multivariate analysis in order to narrow the selected variables. In the multivariate analysis, to avoid the problems that arise when there are seven or fewer events per variable, the number of explanatory variables was kept within 15.4 (108/7) [23]. Variables with a P-value < 0.10 were included but those with a P-value > 0.10 were removed. To adjust for selection bias and to reduce potential confounders, we used three types of propensity score analysis: the regression adjustment method, the inverse probability of treatment weighting (IPTW), and one-to-one propensity score matching. These three methods were used to verify that the significance and robustness of each risk factor pointed in the same direction [24]. In the three propensity score analyses, one outcome was defined as PEP incidence based on the revised Atlanta classification [1], and the other was defined as the PEP score for severity, based on prognostic factors (0 to 9 pts) plus CT grade (Grade 1 = 1 pt, Grade 2 = 2 pts, Grade 3 = 3 pts) from the Japanese Guidelines for the Management of Acute Pancreatitis 2015 [19]. Finally, exposures that showed one or more significant results (P < 0.05) among the six propensity score analyses were employed in the final predictive scoring system.
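The events-per-variable limit quoted above (108/7) can be reproduced with a short calculation. This is a minimal sketch and not the study's analysis code, which was run in Stata; the function name `max_predictors` is hypothetical, and the rule is assumed to be applied simply as the number of outcome events divided by the minimum number of events per variable.

```python
def max_predictors(n_events: int, min_events_per_variable: int) -> float:
    """Upper bound on the number of explanatory variables for a given event count.

    Assumption: the limit is n_events / min_events_per_variable, as the
    parenthetical "108/7" in the text suggests.
    """
    if min_events_per_variable <= 0:
        raise ValueError("min_events_per_variable must be positive")
    return n_events / min_events_per_variable


# 108 PEP events and 7 events per variable, as stated in the text.
limit = max_predictors(108, 7)
print(round(limit, 1))  # 15.4
print(int(limit))       # 15 whole variables fit under the limit
```

In practice only whole variables can be entered, so the limit of 15.4 corresponds to at most 15 explanatory variables.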

Development of predictive scoring system for PEP and internal validation

To establish a predictive scoring system, the probability of PEP was quantified using a propensity score analysis for combinations of the employed factors. The probability of PEP was also calculated using the 12 ASGE risk factors and the 7 definite ESGE risk factors, each with the added risk factor “Absence of pancreatic stent” (Table 1), because each guideline recommends the use of pancreatic stents to prevent PEP [2, 3]. For discrimination, the area under the receiver operating characteristic curve (AUC) was calculated and compared for PEP prediction in each model. Subsequently, four thresholds were set for stratification based on the range of propensity scores, giving low-, moderate-, high-, and very high-risk groups, which were unrelated to the total number of risk factors. Calibration of the predictive scoring model was verified for both PEP and severe PEP by calculating the incidence rate and odds ratio. The Pearson goodness-of-fit test (P > 0.05 acceptable) was also performed. Furthermore, bootstrap resampling (1000 bootstrap resamples) was performed as an internal validation to evaluate the optimism of the present model [25]. Calibration was assessed by the optimism-adjusted frequency of PEP, and discrimination by the receiver operating characteristic curve and AUC.
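The bootstrap estimate of optimism described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' Stata procedure: the data are synthetic, the "model" is a hypothetical lookup of observed event rates per covariate pattern (standing in for the study's propensity score model), and the function names `auc`, `fit_lookup`, and `optimism_corrected_auc` are invented for this example.

```python
import random
from collections import defaultdict


def auc(scores, labels):
    """AUC as the probability that a random event outranks a random non-event (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_lookup(rows, labels):
    """Toy model (assumption): observed event rate per covariate pattern,
    falling back to the overall prevalence for unseen patterns."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [events, total]
    for r, y in zip(rows, labels):
        counts[r][0] += y
        counts[r][1] += 1
    overall = sum(labels) / len(labels)
    return lambda r: counts[r][0] / counts[r][1] if r in counts else overall


def optimism_corrected_auc(rows, labels, n_boot=1000, seed=0):
    """Apparent AUC minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(rows)
    model = fit_lookup(rows, labels)
    apparent = auc([model(r) for r in rows], labels)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # skip resamples without both outcomes
        b_model = fit_lookup(b_rows, b_labels)
        boot_auc = auc([b_model(r) for r in b_rows], b_labels)  # performance on the resample
        test_auc = auc([b_model(r) for r in rows], labels)      # performance on the original data
        optimism.append(boot_auc - test_auc)
    return apparent, apparent - sum(optimism) / len(optimism)


# Synthetic cohort: three binary risk factors, event risk rising with their count.
rng = random.Random(42)
rows, labels = [], []
for _ in range(300):
    r = tuple(rng.randint(0, 1) for _ in range(3))
    rows.append(r)
    labels.append(1 if rng.random() < 0.02 + 0.12 * sum(r) else 0)

apparent, corrected = optimism_corrected_auc(rows, labels, n_boot=200, seed=1)
print(f"apparent AUC = {apparent:.3f}, optimism-corrected AUC = {corrected:.3f}")
```

The demonstration uses 200 resamples only to keep the run short; the study used 1000.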

Table 1 Patient’s characteristics and univariate analysis to select the risk factors for PEP

Statistical analysis

In multivariate analysis, variables with multicollinearity (variance inflation factor > 10), variables not observed in the PEP group, and variables giving perfect prediction because of small numbers were excluded. One-way analysis of variance with a Bonferroni correction was used for multiple comparisons of the AUCs for PEP. Cases with missing values were excluded (complete case analysis). Standardized mean differences (< 0.1) were used to assess the balance of baseline characteristics in the regression adjustment by propensity score and in the IPTW analysis [24]. To adjust for differences in patient-related and procedure-related covariates, one-to-one propensity score matching was employed using calipers with a width of 0.2 standard deviations. All covariates were drawn from the patient-related and procedure-related risk factors (Supplementary Tables 1 and 2) known as risk factors for PEP in previous studies [2, 3]. However, covariates not observed in the PEP group, covariates arising after ERCP allocation (e.g., preventive drugs for PEP), and covariates with multicollinearity (variance inflation factor > 10) were excluded. Data are presented as mean (standard deviation, SD) or frequencies (%), as appropriate. The proportions of categorical variables were compared using the Chi-square test or Fisher’s exact test (e.g., analysis of PEP ratio after propensity score matching). Continuous variables were compared using the Mann–Whitney U-test (e.g., PEP score after propensity score matching). Two-sided P-values < 0.05 were considered significant. All analyses were performed using Stata version 15 (StataCorp LP; College Station, TX, USA).
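The balance check and caliper matching described above can be illustrated as follows. This is a minimal sketch rather than the Stata routines used in the study: the function names are hypothetical, the scores are made-up numbers, greedy nearest-neighbour matching without replacement is one of several possible matching algorithms, and the caliper is assumed to be 0.2 times the population standard deviation of the pooled scores (the text does not say whether the raw score or its logit was used).

```python
import statistics


def smd_binary(p_exposed: float, p_unexposed: float) -> float:
    """Standardized mean difference for a binary covariate given its
    prevalence in the exposed and unexposed groups."""
    pooled_sd = ((p_exposed * (1 - p_exposed) + p_unexposed * (1 - p_unexposed)) / 2) ** 0.5
    return abs(p_exposed - p_unexposed) / pooled_sd if pooled_sd else 0.0


def match_one_to_one(exposed, unexposed, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (exposed_index, unexposed_index) pairs whose scores differ by no
    more than caliper_sd * SD of all scores (assumption, see lead-in).
    """
    caliper = caliper_sd * statistics.pstdev(exposed + unexposed)
    available = list(enumerate(unexposed))
    pairs = []
    for i, score in enumerate(exposed):
        if not available:
            break
        j, nearest = min(available, key=lambda item: abs(item[1] - score))
        if abs(nearest - score) <= caliper:
            pairs.append((i, j))
            available.remove((j, nearest))
    return pairs


# Hypothetical propensity scores for illustration only.
exposed_scores = [0.10, 0.30, 0.80]
unexposed_scores = [0.12, 0.28, 0.31, 0.05]
print(match_one_to_one(exposed_scores, unexposed_scores))  # [(0, 0), (1, 2)]

# A covariate present in 30% vs 28% of the two groups is balanced (SMD < 0.1).
print(smd_binary(0.30, 0.28) < 0.1)  # True
```

The third exposed patient (score 0.80) stays unmatched because no unexposed score lies within the caliper, which is how caliper matching trades sample size for comparability.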

Results

Patients, procedures, and post-ERCP pancreatitis

Following exclusion of non-papillary ERCP procedures, 3,362 eligible patients who underwent trans-papillary ERCP procedures were enrolled in this study (Fig. 1). The raw incidence rates of PEP and severe PEP were 3.2% (108/3362) and 0.77% (26/3362), respectively (Table 1). In terms of the papilla condition, PEP occurred in 96 cases (88.9%) with naïve papilla, and in 12 cases (11.1%) without naïve papilla (Table 1, Supplementary Table 1). The overall success rate was 96.9% (3160/3261) in selective CBD cannulation and 97.0% (98/101) in intentional selective pancreatic ductal cannulation. The final diagnosis and indications of ERCP were bile duct stone (42.9%), malignant biliary stricture (33.0%), and benign biliary non-stricture (12.5%) (Supplementary Table 1). The frequency of hyperamylasemia without PEP was 6.6% (213/3254) in the non-PEP group (Table 1), and no PEP-related 30 day-mortality occurred in this series.
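The proportions in this paragraph follow directly from the stated counts, which a short script can confirm. This is only an arithmetic check using numbers taken from the text; the helper name `percent` and the list layout are illustrative choices.

```python
# (label, numerator, denominator, percentage stated in the text, decimals)
reported = [
    ("PEP incidence", 108, 3362, 3.2, 1),
    ("severe PEP incidence", 26, 3362, 0.77, 2),
    ("PEP with naive papilla", 96, 108, 88.9, 1),
    ("PEP without naive papilla", 12, 108, 11.1, 1),
    ("selective CBD cannulation success", 3160, 3261, 96.9, 1),
    ("selective pancreatic duct cannulation success", 98, 101, 97.0, 1),
]


def percent(numerator: int, denominator: int, decimals: int) -> float:
    """Percentage rounded to the given number of decimals."""
    return round(100 * numerator / denominator, decimals)


for label, num, den, stated, decimals in reported:
    print(f"{label}: {num}/{den} = {percent(num, den, decimals)}% (stated {stated}%)")

# The two cannulation denominators add up to the full cohort.
print(3261 + 101 == 3362)  # True
```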

Fig. 1 Patient information and the flow of study

Univariate analysis to select risk factors for PEP

Univariate analysis revealed significant odds ratios in patient- and procedure-related risk factors (Table 1, Supplementary Tables 1 and 2). However, more than half of the 12 risk factors of ASGE showed no statistical significance [Age < 40, female, absence of chronic pancreatitis, previous pancreatitis, previous PEP, normal serum bilirubin, precut sphincterotomy, repetitive pancreatic guide wire pass, and EPLBD (> 12 mm)]. Moreover, 4 of the 7 definitive risk factors of ESGE showed no statistical difference (female, previous pancreatitis, previous PEP, and repetitive pancreatic guide wire pass) (Table 1, Supplementary Tables 1 and 2) [2, 3]. In particular, pancreatic guidewire-assisted cannulation (PGW) showed a significant association with PEP, as did single-guidewire assisted cannulation (SGW) and the rendezvous method (Table 1, Supplementary Table 2). However, there was no significant difference in number of PEP between PGW with pancreatic stent (n = 18) and PGW without pancreatic stent (n = 24) (P = 0.25) (Supplementary Table 2).

With regard to preventive drugs for PEP, no drug showed significance at the stage of univariate analysis. Nafamostat mesilate and aggressive hydration with lactated Ringer’s solution were used in only a small number of cases in the PEP and/or non-PEP groups (Supplementary Table 2).

Multivariate analysis to narrow risk factors

Three factors showed significance for PEP: naïve papilla, difficult cannulation (> 15 min), and pancreatic injections (≥ 1) (Table 2). PGW (OR, 1.83; 95% CI, 0.99–3.38; P = 0.052) and absence of pancreatic stent (OR, 1.74; 95% CI, 0.90–3.35; P = 0.098) showed odds ratios with borderline significance (P < 0.10) (Table 2).

Table 2 Multiple logistic regression to narrow risk factors

Propensity score analysis to verify the significance of risk factors

Of the five candidate risk factors for the incidence of PEP, all exposures except absence of pancreatic stent showed statistical significance in adjusted logistic regression (Table 3). Furthermore, the exposures PGW and pancreatic injections (≥ 1) showed statistical significance in the IPTW and in the one-to-one propensity score matching analysis (Table 3).

Table 3 Three types of propensity score analyses to verify the significance and the robustness of “Big. 5” risk factors for PEP

In PEP score for severity, all exposures showed statistical significance in the adjusted linear regression. Exposures of naïve papilla and PGW showed statistical significance in the IPTW, while those of naïve papilla and pancreatic injections (≥ 1) also showed statistical significance in one-to-one propensity score matching analysis. Finally, five exposures revealed significance in the six propensity score analyses, as follows: 1. Naïve papilla, 2. PGW, 3. difficult cannulation (> 15 min), 4. pancreatic injections (≥ 1), and 5. absence of pancreatic stent. Thus, the five exposures were named “Big. 5”, and were employed for the subsequent predictive scoring system for PEP (Table 3).

Predictive scoring system for PEP and internal validation

The prediction model consisting of “Big. 5” revealed an AUC of 0.86 as a predictive probability of PEP (Fig. 2A). Inclusive of the absence of pancreatic stent, a total of 13 risk factors of ASGEs and 8 of ESGEs revealed AUC of 0.84 and 0.82, respectively. The AUC of the “Big. 5” prediction model was significantly higher than that of ESGE (P = 0.0024), but not that of ASGE (P = 0.25) (Fig. 2A).

Fig. 2 Comparison of area under the curve for PEP (A) and stratification based on propensity score (B) in predictive scoring system for PEP. 1. One-way ANOVA with Bonferroni correction. 2. Five risk factors for PEP in the present model (Table 3). 3. Twelve risk factors of ASGE for PEP plus absence of pancreatic stent (Table 1). 4. Definite 7 risk factors of ESGE for PEP plus absence of pancreatic stent (Table 1). 5. Each threshold was set based on the range of propensity score

Four thresholds were set based on the range of propensity scores for stratifications from the low-risk group to the very high-risk group (Fig. 2B, Table 4). The diagnostic performance at different predicted probability cut-offs is shown in Supplementary Table 3.

Table 4 Lookup table for propensity score in the prediction model

The moderate-risk group (range: 0.0421–0.0869) usually had 3 items, which contained either one or both of “difficult cannulation (> 15 min)” and “pancreatic injections (≥ 1)” (Table 4). Likewise, the high-risk group (range: 0.1141–0.2009) usually had 4 items, which contained at least 2 of “difficult cannulation (> 15 min)”, “pancreatic injections (≥ 1)” and/or “naïve papilla” (Table 4).
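The stratification by propensity score range can be expressed as a small lookup function. This is a sketch under explicit assumptions: only the moderate-risk (0.0421–0.0869) and high-risk (0.1141–0.2009) ranges are given in the text, so treating scores below 0.0421 as low risk and scores above 0.2009 as very high risk is an assumption, and scores falling between the two reported ranges are left unassigned rather than guessed. The function name `risk_group` is hypothetical.

```python
# Ranges quoted in the text for the two middle strata.
MODERATE_RANGE = (0.0421, 0.0869)
HIGH_RANGE = (0.1141, 0.2009)


def risk_group(propensity_score: float) -> str:
    """Map a propensity score to a risk stratum.

    Assumptions: "low" is everything below the moderate range and
    "very high" is everything above the high range; the gap between the
    reported moderate and high ranges is returned as "unassigned".
    """
    if not 0.0 <= propensity_score <= 1.0:
        raise ValueError("propensity score must lie between 0 and 1")
    if propensity_score < MODERATE_RANGE[0]:
        return "low"
    if propensity_score <= MODERATE_RANGE[1]:
        return "moderate"
    if propensity_score < HIGH_RANGE[0]:
        return "unassigned"
    if propensity_score <= HIGH_RANGE[1]:
        return "high"
    return "very high"


for score in (0.01, 0.0627, 0.15, 0.30):
    print(score, risk_group(score))
```

A lookup keyed on the observed combinations of the five factors, as in Table 4, would avoid the unassigned gap, since only scores that actually occur would need a label.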

The distribution of the four strata was 71.4% in the low-risk group, 18.9% in the moderate-risk group, 7.5% in the high-risk group, and 2.3% in the very high-risk group (Table 5). As for calibration, the Pearson goodness-of-fit test indicated goodness-of-fit for the model (P = 0.993, Group = 20). Although the adjusted prevalence of PEP was 3.74% in the present model, the predicted incidence rate of PEP in the very high-risk group rose to 28.79% (95% CI, 18.30–41.25%) (Table 5). Conversely, the predicted incidence rate of PEP in the low-risk group decreased to 0.73% (95% CI, 0.41–1.20%). Moreover, the very high-risk group was approximately 60-fold more likely to have PEP than the low-risk group (OR, 55.11; 95% CI, 26.40–115.07; P < 0.001) (Table 5). Likewise, although the adjusted prevalence of severe PEP was 0.90% in this model, the predicted incidence rate of severe PEP in the very high-risk group rose to 9.09% (95% CI, 3.41–18.74%). Conversely, that of severe PEP in the low-risk group decreased to 0.15% (95% CI, 0.03–0.43%). The very high-risk group was approximately 70-fold more likely to have severe PEP than the low-risk group (OR, 68.57; 95% CI, 16.75–280.70; P < 0.001) (Table 5).
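The relationship between the predicted incidence rates and the odds ratios above can be checked by converting each rate to odds. This is a rough illustration, not the study's calculation: it uses the rounded percentages quoted in the text rather than the underlying counts in Table 5, so the results land near, but not exactly on, the reported odds ratios of 55.11 and 68.57. The function names are hypothetical.

```python
def odds(probability: float) -> float:
    """Convert a probability to odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio of one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Very high-risk vs low-risk group, using the rounded rates from the text.
or_pep = odds_ratio(0.2879, 0.0073)     # PEP: 28.79% vs 0.73%
or_severe = odds_ratio(0.0909, 0.0015)  # severe PEP: 9.09% vs 0.15%
print(f"PEP: OR about {or_pep:.1f}; severe PEP: OR about {or_severe:.1f}")
```

Because the low-risk rates are so small, rounding them to two decimals shifts the recomputed odds ratio noticeably, which explains the small gap from the published values.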

Table 5 Predicted and observed incidence rates of PEP and odds ratios using an internally validated risk-score model

With regard to internal validation, the optimism-corrected frequency of PEP was acceptable in the very high-risk group at 29.82% (95% CI, 19.58–42.59%) and in the low-risk group at 0.73% (95% CI, 0.45–1.18%) compared to the predicted incidence rate (Table 5). The optimism-corrected AUC also revealed good performance at 0.81 (Fig. 3). However, validation for severe PEP could not be performed owing to the small number of events (26), as at least 100 events are required [26].

Fig. 3 Optimism-corrected ROC curve and AUC in internal validation for PEP resampling the model 1000 times

Discussion

In the present study, we established a simple predictive scoring system for PEP immediately after ERCP. This simplified clinical scoring system for PEP was derived from the combination of five risk factors as follows: 1. Naïve papilla, 2. PGW, 3. difficult cannulation (> 15 min), 4. pancreatic injections (≥ 1), and 5. absence of pancreatic stent, which we named the “Big. 5”. These factors were strengthened using three types of propensity score analysis. In the scoring system with four stratifications, the very high-risk group had a predicted incidence rate of 28.79% for PEP and 9.09% for severe PEP, although the adjusted prevalence was only 3.74% for PEP and 0.90% for severe PEP. Conversely, the low-risk group had a predicted incidence rate of 0.73% for PEP and 0.15% for severe PEP. The very high-risk group was approximately 60-fold more likely to have PEP and 70-fold more likely to have severe PEP compared to the low-risk group. The present model showed good calibration against the observed PEP frequency and good discrimination, with an optimism-corrected AUC of 0.81 in the internal validation using 1000 bootstrap resamples.

With regard to predictive scoring systems for PEP, a small number of cohort studies have used combinations of patient- and procedure-related risk factors to indicate odds ratios or frequencies [12, 13]. However, these relied only on univariate and multivariate logistic analysis, and the integer scores employed for the scoring systems were either not validated or validated only by a split-sample method. For this reason, we employed both propensity score analysis and the bootstrap method to provide stable estimates with low bias, which can avoid overfitting bias, overly pessimistic estimates of performance with large variability, and confounding factors [23, 24, 26, 27]. Thus, we established the simplified model using the “Big. 5”, which consists of five risk factors; its performance is comparable to that of the 13 factors in ASGE and the 8 factors in ESGE (Fig. 2A).

Among the patient-related risk factors, naïve papilla accounted for 88.9% of the PEP group. However, the other risk factors were eliminated by univariate or multivariate regression analysis (Tables 1, 2). There are several possible reasons for this: the small numbers in this series (e.g., suspicious sphincter of Oddi dysfunction (SOD), previous PEP, absence of chronic pancreatitis) may have decreased statistical power, or other patient-related factors (e.g., normal serum bilirubin) may have had only weak effects compared with naïve papilla and the significant procedure-related factors. The small number of suspicious SOD cases may reflect daily medical practice, because only 1.5% of patients with a functional gastrointestinal symptom were reported to have suspected SOD [28]. In addition, our result may support the hypothesis that younger age has no significant relationship with PEP [29]. However, it is also important to consider that patients without naïve papilla accounted for 11.1% of the PEP group, and other potential factors may also have influenced the incidence of PEP in these patients.

In terms of ERCP procedures, PGW was a significant independent risk factor for PEP, possibly because accomplishing PGW may require both pancreatic injections and pancreatic guidewire passes, which are risk factors for PEP [2, 3]. The risk attributable to PGW itself is supported by the absence of a significant difference in the number of PEP cases between PGW with pancreatic stent (18 cases) and PGW without pancreatic stent (24 cases) (P = 0.25, Supplementary Table 2). Nevertheless, a pancreatic stent should be placed when PGW is performed for difficult cannulation and stent placement is easy [2, 3]. In addition, non-use of the pancreatic stent was a significant risk factor for severe PEP in our adjusted linear regression of propensity score analysis (Table 3).

Concerning the proposed application of the present model, the moderate-risk group contained either one or both of “difficult cannulation (> 15 min)” and “pancreatic injections (≥ 1)”. Likewise, the high-risk group contained at least two of the three items “difficult cannulation (> 15 min)”, “pancreatic injections (≥ 1)”, and “naïve papilla” (Table 4). These results are highly suggestive for the prevention of PEP, since “difficult cannulation (> 15 min)” is the only element on which continuation or cessation of the ERCP procedure can be judged. For example, if the propensity score was 0.0627, reflecting “naïve papilla”, “PGW”, and “pancreatic injections (≥ 1)” (Table 4), the addition of “difficult cannulation (> 15 min)” may lead to progression from the moderate-risk group, with a PEP incidence of 4.94%, to the high-risk group, with a PEP incidence of 23.88% (Tables 4, 5). This result may indicate the necessity of switching from an ERCP approach to an alternative endoscopic treatment, such as endoscopic ultrasound-guided biliary drainage [30], if selective cannulation is not achieved within 15 min.

The present model can stratify patients into low-risk and high-risk candidates for PEP by score. Our stratifications suggest that high-risk and low-risk groups can be identified immediately after ERCP. These refinements may aid the early prediction of PEP and allow selection of patients who require earlier therapeutic interventions, such as rectal NSAIDs, aggressive hydration, and nafamostat mesilate, possibly preventing progression to severe PEP [2,3,4, 9]. Indeed, in the current study, we demonstrated that the high-risk group and the very high-risk group were approximately 30-fold and 70-fold more likely, respectively, to develop severe PEP than the low-risk group (Table 5). In contrast, the low-risk group was the most common, accounting for 71.8% of adjusted ERCP cases, and had an observed incidence rate of only 0.73% (Table 5). These results suggest that the low- and moderate-risk groups could be discharged early in conjunction with 2–6 h post-ERCP monitoring of the serum amylase level [31,32,33]. Based on these considerations, this model could address outpatient issues by guiding the decision between discharge and admission on the day of the procedure, in order to reduce medical expenses not only in the US but also in other countries including Japan [5,6,7]. In summary, in the US the low-risk group may be able to undergo ERCP as an outpatient treatment, and the moderate-risk group may need a temporary hospital stay with 2–6 h post-ERCP monitoring of serum amylase. The high- and very high-risk groups may need early and appropriate primary and intensive care to prevent progression to severe pancreatitis, both in Japan and in the US.

Our study has several limitations. First, this was a retrospective study at a single center. Second, although propensity score analyses were used because of the difficulty of controlling multi-factorial etiology by assignment, in conjunction with ethical considerations, unobserved selection biases and potential confounding factors may still remain. Third, the present model could not indicate the predictive ability for severe PEP owing to the small number of events (n = 26). Finally, some risk factors, such as suspicious SOD, were observed in only a small number of cases, which may have reduced statistical power. Therefore, further investigations with a prospective multicenter study may be necessary to overcome these limitations.

In conclusion, we established and validated a simplified predictive scoring system for PEP using five minimal risk factors immediately after ERCP to help early identification of PEP and earlier therapeutic interventions. Thus, this scoring system is simple and easy to use and also achieves a high AUC.