Introduction

Although most patients with acute pancreatitis (AP) have a benign clinical course, approximately 20% of patients develop severe complications associated with significant morbidity and with mortality ranging between 30 and 50% [1]. Early identification of high-risk patients on admission may help physicians to select those patients who would benefit the most from close surveillance or aggressive intervention. Therefore, a reliable method of risk stratification for AP is of significant clinical importance [2]. Several prognostic models have been developed to predict which patients with AP are at high risk of developing severe acute pancreatitis (SAP), such as Ranson’s score, the APACHE II score [3], and the BALI 4-variable model [4]. However, these scoring systems are complicated or rely on variables that are not routinely collected at the time of hospitalization. Single predictors such as pleural effusion [5] and serum glucose [6] are simple but lack high sensitivity or specificity.

Classification and regression tree (CART) analysis is a non-parametric method based on the binary recursive partitioning of data. The results of CART analysis are presented as a decision tree, which is intuitive and facilitates the allocation of patients into subgroups by following a flow-chart form that is simple to interpret and may be applied at the bedside [7]. Several studies have shown the utility of this method in providing accurate prognostic models for mortality in different diseases [2, 8]. To the best of our knowledge, CART analysis has not yet been used in the prediction of SAP. We presumed that a certain combination of factors yields a more effective prediction of outcome than the same factors used singly. The aim of the present study was to use CART analysis to develop a simple tree model for predicting the severity of AP.

Materials and Methods

This study was based on patients with AP admitted to our hospital within 72 h of the onset of symptoms between May 2007 and January 2009. The diagnosis of AP was based on the presence of two of the following three features [3]: (1) abdominal pain characteristic of AP; (2) serum amylase and/or lipase ≥3 times the upper limit of normal; and (3) characteristic findings of AP on computed tomography (CT) scan. Patients were categorized retrospectively as having severe or mild AP according to the Atlanta criteria [9], which are also described in the recent practice guidelines for AP [3]. SAP was defined as pancreatitis associated with organ failure and/or local complications (necrosis, abscess, or pseudocyst). Organ failure was defined as: (1) shock, systolic pressure <90 mmHg; (2) pulmonary insufficiency, PaO2 ≤60 mmHg; (3) renal failure, creatinine >2.0 mg/dl after rehydration; (4) gastrointestinal bleeding >500 cc/24 h. Pancreatic necrosis was defined as a lack of contrast enhancement of more than 30% of the gland on an enhanced abdominal CT scan. Exclusion criteria included the following: previous pancreatic surgery, endoscopic retrograde cholangiopancreatography (ERCP) or trauma-induced pancreatitis, chronic pancreatitis, pancreatic cancer, pleural effusions on CT scan preceding the development of AP, pleural effusions resulting from concomitant diseases (e.g., pneumonia, chronic heart failure), chronic renal disease, AP in a moribund patient as a component of the terminal illness [5], and unavailability of complete data. Patients who were transferred from other hospitals and had received aggressive fluid resuscitation were also excluded.
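The severity definition above can be written as a small rule set. The following is only an illustrative sketch: the function and parameter names are hypothetical, and the thresholds are copied from the organ failure criteria listed in the text.

```python
def has_organ_failure(systolic_mmhg, pao2_mmhg, creatinine_mg_dl, gi_bleed_ml_24h):
    """Return True if any organ failure criterion from the text is met.

    creatinine_mg_dl is assumed to be the value measured after rehydration.
    """
    shock = systolic_mmhg < 90            # (1) systolic pressure <90 mmHg
    pulmonary = pao2_mmhg <= 60           # (2) PaO2 <=60 mmHg
    renal = creatinine_mg_dl > 2.0        # (3) creatinine >2.0 mg/dl
    gi_bleeding = gi_bleed_ml_24h > 500   # (4) bleeding >500 cc/24 h
    return shock or pulmonary or renal or gi_bleeding


def is_severe_ap(organ_failure, local_complication):
    """SAP: organ failure and/or a local complication (necrosis, abscess, pseudocyst)."""
    return organ_failure or local_complication
```

For example, `is_severe_ap(has_organ_failure(85, 80, 1.0, 0), False)` evaluates to `True` because the shock criterion is met.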

The following information was collected for each patient on admission: age, gender, temperature, pulse, systolic blood pressure, and biochemical parameters, including white blood cell count, platelet count, serum glucose, creatinine, blood urea nitrogen (BUN), and electrolytes. All patients underwent abdominal CT scan within 6 h of admission, and the presence of pleural effusion was recorded. The systemic inflammatory response syndrome (SIRS) and APACHE II score at admission were calculated according to the laboratory and physiological data [3]. This study protocol was approved by the Ethics Committee of the First Affiliated Hospital of Wenzhou Medical College.

Statistical Analysis

The sample size of the test sample was calculated in order to show a significant difference between our new prediction tree model and the APACHE II score. With an α risk of 0.05 and a β risk of 0.2, a prevalence of SAP of 0.2, and a two-sided test, the sample size was calculated for the following hypotheses: the area under the curve (AUC) is 0.85 for the new prediction tree model and 0.7 for the APACHE II score; the hypothesized rank correlation between the two diagnostic tests is 0.6 for both the positive and negative groups. Thus, the test sample needed to include at least 110 patients. For this reason, study participants were randomly assigned to the training and test samples in a 2:1 ratio (training sample, 280 patients; test sample, 140 patients) using a computer random number generator.

For the training sample, a Shapiro–Wilk test was used to evaluate whether the continuous data followed a normal distribution. According to the results of the Shapiro–Wilk test, continuous values were expressed as the mean ± standard deviation (SD) or median and interquartile range and were compared using Student’s t-test or the Mann–Whitney non-parametric test. Categorical values were described by count and proportions and compared by the χ2 test or Fisher’s exact test. All variables that were found to be different between patients with and without SAP on univariate analysis were included as candidate variables in a multiple logistic regression analysis to identify independent predictors of SAP. Odds ratios (ORs) were calculated, along with 95% confidence intervals (CIs). Multicollinearity was considered to be significant if the largest variance inflation factor exceeded 10 [10].

The CART analysis was then carried out on the training sample using the predictors identified by logistic regression analysis in order to establish a simple tree model for the prediction of SAP. An impurity function (the Gini criterion) was used for splitting [11]. According to the stopping rule, the maximum tree depth was three levels, and the minimum number of cases in the parent node and child node was 20 and 1, respectively. Cutoff points for continuous and ordinal variables in the tree model were generated automatically by the SPSS AnswerTree 3.0 program. The optimal tree was selected according to its predictive accuracy and clinical relevance [8]. The overall sensitivity (Se), specificity (Sp), negative predictive value (NPV), positive predictive value (PPV), and diagnostic accuracy (DA) of the tree model generated by CART analysis in the training sample were validated in the test sample. At the same time, a logistic regression function (LR model) including the same predictors identified by CART analysis was also developed for comparison.
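To illustrate how the Gini criterion scores a candidate split, the sketch below computes the impurity decrease for the BUN split at 25 mg/dl. The counts are derived from figures reported in the Results and Discussion (66 SAP cases among 280 training patients, of whom 18 of 23 patients with BUN >25 mg/dl had SAP); the function names are hypothetical, and this is not the AnswerTree implementation.

```python
def gini(n_sap, n_total):
    """Binary Gini impurity 1 - p^2 - (1 - p)^2, which equals 2p(1 - p)."""
    p = n_sap / n_total
    return 2 * p * (1 - p)


def gini_decrease(parent, left, right):
    """Impurity decrease of a split; each argument is (n_sap, n_total)."""
    n = parent[1]
    weighted_children = (left[1] / n) * gini(*left) + (right[1] / n) * gini(*right)
    return gini(*parent) - weighted_children


# Root node: 66 SAP of 280; BUN >25 mg/dl: 18 of 23; BUN <=25 mg/dl: 48 of 257.
decrease = gini_decrease(parent=(66, 280), left=(18, 23), right=(48, 257))
print(f"Gini decrease for the BUN split: {decrease:.4f}")
```

CART evaluates this quantity for every candidate cutoff of every predictor and keeps the split with the largest decrease.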

The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the performance of the tree model and the LR model for the prediction of SAP, and the AUCs were compared by DeLong’s test [12]. A model with an AUC above 0.7 was considered to be useful, while an AUC between 0.8 and 0.9 indicated excellent diagnostic accuracy [13].

All analyses were performed using STATA version 10.0, except for the CART analysis, which was conducted using the SPSS AnswerTree version 3.0 program, and the sample size calculation, which was performed with NCSS/PASS 2008. Differences were considered to be statistically significant if the two-tailed P-value was less than 0.05.

Results

Baseline Characteristics in the Entire Study Population

The prevalence of SAP in this entire study population was 24.8%. Of all the patients, 247 (58.8%) were male and their median age was 53 (42.0–64.5) years. The most common cause of AP was gallstones (57.9%). SIRS was recorded in 95 (22.6%) patients. Pleural effusion was found in 81 (19.3%) patients, of which 54 patients had bilateral effusion. The median APACHE II score at admission was 5 (2–7). The time interval between onset and admission was 2.1 ± 0.8 days. The baseline characteristics of the 420 patients in the training and test samples are shown in Table 1.

Table 1 Baseline characteristics of patients in the training and test samples

CART Analysis on the Training Sample

Univariate analysis revealed that SIRS, hemoglobin, platelets, serum glucose, BUN, serum calcium, and pleural effusion were significantly associated with the development of SAP. The results are summarized in Table 2. Variables significantly linked to the development of SAP in the univariate analysis were assessed by multivariate analysis. As shown in Fig. 1, multiple logistic regression identified four independent variables predictive of SAP: SIRS (yes vs. no) (OR = 2.70 [95% CI, 1.06–6.87]; P = 0.038), BUN (for each 1 mg/dl rise in BUN) (OR = 1.08 [95% CI, 1.02–1.14]; P = 0.007), serum calcium (for each 0.1 mmol/l rise in serum calcium) (OR = 0.59 [95% CI, 0.48–0.72]; P < 0.001), and pleural effusion (yes vs. no) (OR = 11.07 [95% CI, 4.39–27.92]; P < 0.001). Multicollinearity between covariates did not exist, and the variance inflation factor was less than 10 for all variables.

Table 2 Univariate analysis of predictive factors of severe acute pancreatitis (SAP) in the training sample
Fig. 1 Logistic regression plot of odds ratios and 95% confidence intervals. Significant predictors of SAP included systemic inflammatory response syndrome (SIRS), blood urea nitrogen (BUN), serum calcium, and pleural effusion (PE)

The CART analysis was carried out using the above four predictors identified by regression analysis. An exploratory strategy was followed and several trees were constructed. As shown in Fig. 2, the final decision tree model (referred to as the CART model) was based on three variables and had a total of four terminal nodes. The cutoff values in the model were 25 mg/dl for BUN (normal range, 9–20 mg/dl), bilateral for pleural effusion, and 1.86 mmol/l for serum calcium. The four terminal nodes were regrouped into two groups according to the probability of developing SAP (P < 0.001) (Fig. 3): a high-risk group consisting of node 2 (BUN > 25 mg/dl), node 4 (BUN ≤ 25 mg/dl, bilateral pleural effusion), and node 6 (BUN ≤ 25 mg/dl, no or unilateral pleural effusion, serum calcium ≤ 1.86 mmol/l), in which the probability of developing SAP was 79.03% (49/62); and a low-risk group consisting of node 5 (BUN ≤ 25 mg/dl, no or unilateral pleural effusion, serum calcium >1.86 mmol/l), in which the probability of SAP was 7.80% (17/218).
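The terminal nodes described above translate directly into a short decision rule. The following is a minimal sketch; the function name and the string encoding of pleural effusion are hypothetical choices made for illustration.

```python
def cart_risk_group(bun_mg_dl, pleural_effusion, calcium_mmol_l):
    """Assign a patient to the high- or low-risk group of the tree model.

    pleural_effusion is one of "none", "unilateral", or "bilateral".
    """
    if bun_mg_dl > 25:
        return "high"  # node 2
    if pleural_effusion == "bilateral":
        return "high"  # node 4
    if calcium_mmol_l <= 1.86:
        return "high"  # node 6
    return "low"       # node 5
```

For example, a patient with BUN 18 mg/dl, a unilateral effusion, and serum calcium 2.10 mmol/l falls into node 5, so `cart_risk_group(18, "unilateral", 2.10)` returns `"low"`.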

Fig. 2 A tree model for the prediction of severe acute pancreatitis (SAP) generated by classification and regression tree (CART) analysis in the training set of 280 patients

Fig. 3 The simple decision tree obtained by CART analysis

The diagnostic values of various predictors in the tree model are shown in Table 3. With the combination of three variables together sequentially, the tree model yielded a diagnostic accuracy of 89.3%.

Table 3 Diagnostic values of various predictors in the tree model

A logistic regression function (LR model) including the same predictors was also developed for the prediction of SAP: logit(P) = 6.807 + 0.080 BUN (mg/dl) + 2.464 pleural effusion (yes vs. no) − 4.822 serum calcium (mmol/l). The Hosmer–Lemeshow goodness of fit test was not significant (P = 0.1511), suggesting that our LR model fitted the actual data well. The ROC curves for the CART model, the LR model, and the APACHE II score for the prediction of SAP are shown in Fig. 4. The AUC of the CART model (AUC = 0.84 [95% CI, 0.79–0.90]) was comparable to that of the LR model (AUC = 0.88 [95% CI, 0.82–0.94]; P = 0.0825). Furthermore, the AUC of the CART model was significantly higher than that of the APACHE II score (AUC = 0.68 [95% CI, 0.60–0.76]; P < 0.001).
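As an illustration of how the LR model could be applied to a single patient, the sketch below converts the linear predictor into a probability. It assumes that the reported expression is the log-odds (logit) of SAP, which is the usual form of a logistic regression function; the function name is hypothetical.

```python
import math


def lr_probability(bun_mg_dl, pleural_effusion, calcium_mmol_l):
    """Predicted probability of SAP from the LR model coefficients.

    Assumes the published linear expression is the logit of the probability.
    pleural_effusion is True if any pleural effusion is present.
    """
    logit = (6.807
             + 0.080 * bun_mg_dl
             + 2.464 * (1 if pleural_effusion else 0)
             - 4.822 * calcium_mmol_l)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient: BUN 15 mg/dl, no pleural effusion, calcium 2.2 mmol/l.
print(f"{lr_probability(15, False, 2.2):.3f}")
```

Under this assumption, a higher BUN or the presence of pleural effusion raises the predicted probability, while a higher serum calcium lowers it, in line with the odds ratios from the multivariate analysis.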

Fig. 4 Receiver operating characteristic curves for logistic regression function (LR model), CART model, and the APACHE II score in the training set. CI = 95% confidence interval

Validation of the CART Model on the Test Sample

The CART model was then applied to the randomly selected test sample (n = 140; Fig. 5). The tree model achieved a sensitivity and specificity of 78.9 and 93.1%, respectively. Given the SAP prevalence of 27.1% in this population, the positive and negative predictive values in the test sample were 81.1 and 92.2%, respectively. Overall, 89.3% of patients were correctly classified. As shown in Fig. 6, the two groups of patients divided by CART analysis still had similarly high (30/37, 81.08%) and low (8/103, 7.77%) probabilities of developing SAP, respectively (P < 0.001).
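The diagnostic values above follow from the 2 × 2 table implied by the two risk groups (high risk: 30 SAP among 37 patients; low risk: 8 SAP among 103 patients). The sketch below reproduces the arithmetic; the function name is hypothetical.

```python
def diagnostic_values(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Test sample: high-risk group 30 SAP + 7 non-SAP; low-risk group 8 SAP + 95 non-SAP.
test_values = diagnostic_values(tp=30, fp=7, fn=8, tn=95)
for name, value in test_values.items():
    print(f"{name}: {100 * value:.1f}%")
```

The same function applied to the training sample counts (49 of 62 in the high-risk group and 17 of 218 in the low-risk group) gives the values reported for the training sample.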

Fig. 5 Validation in the test set of the tree model obtained by CART analysis from the training set

Fig. 6 Patients stratified by the tree model in the training sample and the test sample

Based on the ROC curve analysis (Fig. 7), the tree model still yielded a high AUC (0.86 [95% CI, 0.79–0.94]), which was comparable to that of the LR model (AUC = 0.90 [95% CI, 0.83–0.97]; P = 0.1839). The AUC of the tree model was significantly higher than that of the APACHE II score (AUC = 0.69 [95% CI, 0.59–0.80]; P = 0.005).
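As a consistency check on the reported AUCs, note that a classifier with a single binary output (high risk vs. low risk) has an ROC curve with one operating point, so its AUC equals the mean of sensitivity and specificity. The sketch below assumes that the tree model's ROC curve was built from this two-group output, which the text does not state explicitly; the function name is hypothetical.

```python
def binary_auc(tp, fp, fn, tn):
    """AUC of a two-category classifier: (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Counts derived from the risk groups: training 49/62 and 17/218; test 30/37 and 8/103.
auc_training = binary_auc(tp=49, fp=13, fn=17, tn=201)
auc_test = binary_auc(tp=30, fp=7, fn=8, tn=95)
print(f"training AUC: {auc_training:.2f}, test AUC: {auc_test:.2f}")
```

Under this assumption, the computed values round to the AUCs of 0.84 and 0.86 reported for the training and test samples.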

Fig. 7 Receiver operating characteristic (ROC) curves for the LR model, CART model, and the APACHE II score in the test set. CI = 95% confidence interval

Discussion

A rise in the BUN level reflects the disease status of initial hypovolemia and hypercatabolism in SAP, and it is an important factor for the assessment of SAP [14]. BUN as a prognostic factor has been incorporated in several clinical scoring systems, such as the BISAP score [2], POP score [15], and SPS score [14]. As expected, our data showed that the BUN level with an OR of 1.08 (95% CI, 1.02–1.14) was a valuable predictor of SAP. With the same cutoff identified by previous studies [2, 4, 14], BUN achieved a sensitivity of 27.3% and a specificity of 97.7% in the tree model. Patients with a BUN level of more than 25 mg/dl would have a 78.26% (18/23) possibility of developing SAP, while the remainder (21.74%, 5/23) would not develop SAP (Fig. 2). This suggested that azotemia would recover to normal in a few patients with a BUN level of more than 25 mg/dl if they could receive successful volume resuscitation. These results are consistent with the recent observation by Wu et al. [2] that patients who experienced a decrease in the BUN level at 24 h had substantially reduced mortality.

Pleural effusions are reported in 4–17% of patients with AP [5]. Though the mechanism causing effusion in AP is not completely known, the early appearance of pleural effusion is associated with more severe disease [16]. As one of the extrapancreatic abnormalities, a pleural effusion has been incorporated in several CT score systems, such as the EPIC score [17] and the modified CT severity index [18]. As a single predictor, several studies showed that the pleural effusion had a variable diagnostic sensitivity ranging from 58.3 to 84% and a high specificity ranging from 91 to 92.1% [16, 19]. The present study suggested that pleural effusion with an OR of 11.07 (95% CI, 4.39–27.92) was a strong predictor of SAP. However, few data are available about the relationship between bilateral pleural effusion used as a single predictor and SAP. Using a cutoff of bilateral pleural effusion, it achieved a low sensitivity of 41.7% and a high specificity of 97.6% in patients with a BUN level ≤25 mg/dl. When BUN and pleural effusions were combined sequentially (Table 3), 82.9% of patients would be correctly classified. However, the sensitivity (57.6%) is low, which means that some of these patients who were at high risk of developing SAP may not have a high BUN level and bilateral pleural effusions at admission.

Hypocalcemia is recognized as a consequence of the binding of calcium in the areas of fat necrosis and altered levels of circulating parathormone in AP [20]. It was reported by Mentula et al. [21] that serum calcium (OR 0.6, 95% CI 0.5–0.8) was the best single marker in predicting organ failure in AP after 24 h of symptom onset. With a cutoff of 1.82 mmol/l, it achieved a diagnostic sensitivity of 67% and a specificity of 94%. Our data showed that the serum calcium with an OR of 0.59 (95% CI, 0.48–0.72) played an important role in predicting SAP. With a cutoff of 1.86 mmol/l, the serum calcium achieved a sensitivity of 38.3% and a specificity of 98.5% in patients with a BUN level ≤25 mg/dl and with no or unilateral pleural effusion.

The tree model that consists of the above three parameters markedly improved sensitivity. It yielded an acceptable sensitivity of 74.2%, which was higher than that of the single parameters, and an excellent specificity of 93.9% in the training sample. Overall, 89.3% of patients were correctly classified. Also, patients may be divided into a high-risk group (79.03%) and a low-risk group (7.80%) according to the probability of developing SAP by CART analysis. This suggests that patients with a BUN level >25 mg/dl, those with a BUN level ≤25 mg/dl and bilateral pleural effusion, and those with a BUN level ≤25 mg/dl, no or unilateral pleural effusion, and serum calcium ≤1.86 mmol/l would benefit most from close surveillance or aggressive intervention, while the majority of patients with a BUN level ≤25 mg/dl, no or unilateral pleural effusion, and serum calcium >1.86 mmol/l would have a mild course and could be managed in low-cost hospital beds (Fig. 3). Moreover, the tree model proved to be well calibrated (predicted outcomes in the training sample were reproduced fairly well in the test sample) and achieved the same diagnostic accuracy of 89.3% in the test sample. Patients in the test sample could also be classified into high-risk (81.08%) and low-risk (7.77%) groups (Fig. 6). Furthermore, the overall diagnostic performance of the tree model in both the training and test samples was excellent and comparable to that of the LR model. The intuitive nature of the tree model allows an easy assessment of the risk of developing SAP without the need for complex calculations.

Our data by logistic regression analysis suggested that SIRS might be an early indicator of SAP. This is consistent with previous observations [2, 22] that persistent or developing SIRS in the first 48 h was associated with higher cumulative organ dysfunction scores and was correlated with poor survival. However, it was not eventually selected to be entered in the tree model. These differences may indicate both the unique feature and the limitations of the CART analysis [11]. The unique feature is that it could visualize significant predictors that specifically apply to selected patients. The limitation is that not all significant factors may be adopted in the decision tree, since we applied the rule to stop the CART procedure when the sample size was below 20. This rule was applied in order to avoid the generation of an over-fit model, which may lack universality. Therefore, it is possible that SIRS may become a significant variable in the CART analysis if a larger number of patients were included in the analysis.

Our study has several limitations. First, the data were collected retrospectively, which may introduce a population bias. Second, the records of pleural effusion were all based on the results of CT scans, so the findings may not generalize to a local clinic or a community hospital. In addition, other CT parameters such as retroperitoneal inflammation [17], mesenteric inflammation, and ascites [16] were not evaluated as potential predictors. It will be interesting to analyze and compare these parameters in the future. Third, the AUC of the tree model was higher for the test dataset (0.86) than for the training dataset (0.84), which can be interpreted either as high accuracy of the model or as a high degree of similarity between the training and test datasets. Therefore, these findings should be interpreted with caution, and external validation of the tree model in other patient cohorts is mandatory before its application in the future. Lastly, the gold standard of SAP was defined by the Atlanta classification, whose limitations have been highlighted in a number of publications, and a call for its revision has been made [3, 23–25]. Two additional categories, namely, “moderate” and “critical” AP, have been introduced [23, 26], of which “moderate” AP is defined as AP with sterile pancreatic complications or transient organ failure. The Atlanta classification introduced in 1992 was chosen in the present study on the following grounds. First, it is the most widely used classification of AP in the literature, which makes comparisons with other studies convenient. Furthermore, a recent study by Vege et al. [25] showed that patients with “moderate” AP had a low mortality but a high morbidity. They still required an average stay in the intensive care unit (ICU) of 5 days and a total hospital stay of 28 days, both of which are more than expected for patients with mild AP [23, 25]. However, our models were developed to classify the severity of AP and not to predict hospital mortality.

In conclusion, we propose a tree model based on CART analysis that may help clinicians to make early severity stratification in AP and identify patients who would most likely benefit from close surveillance.