The incidence of common bile duct stones (CBDS) in patients with symptomatic gallstone disease is reported to range from 3 to 33% [1]. Since the advent of laparoscopic cholecystectomy (LC), the treatment of CBDS has largely shifted to a two-stage approach, consisting of LC preceded or followed by endoscopic retrograde cholangiopancreatography (ERCP). This strategy is impaired by a 5–15% post-ERCP complication rate and, because of spontaneous CBDS passage prior to ERCP, a negativity rate of 15–25% [1–4]. Recently, growing surgical experience has increased the popularity of the laparoscopic one-stage approach. Studies indicate that the one-stage strategy might be as effective and safe as the two-stage approach, but because it is time-consuming, requires specific skills and supplies, and exposes the patient to bile leak or bile duct stenosis, most surgeons are reluctant to use it and wisely prefer to ascertain bile duct clearance using preoperative ERCP [5–9].

With the aim of restricting ERCP to patients most likely to present with CBDS, recent guidelines have provided a risk stratification algorithm based on clinical, imaging, and laboratory data, recommending ERCP in patients with a high probability of CBDS and suggesting magnetic resonance cholangiopancreatography (MRCP) or endoscopic ultrasonography (EUS) for the others [10]. Nevertheless, the overall efficacy of these guidelines remains low, first because no laboratory test identifies CBDS reliably, and secondly because, as with other predictive models, the effect of the course of biochemical parameters over time has not been assessed [1, 4, 11–15]. Furthermore, MRCP and EUS are not always feasible depending on a centre's resources, are expensive, and are not always available within the required timeframe, resulting in additional delays to treatment.

To avoid these issues, we hypothesised that the course of biochemical parameters over time should be highly informative about the persistence or passage of CBDS. We therefore developed a risk-assessment model based on a dynamic analysis of laboratory values, with the aim of better identifying patients at risk of persistent CBDS and, in turn, decreasing the rate of unnecessary ERCP and the risk of perioperative discovery of CBDS.

Materials and methods

All consecutive patients who underwent cholecystectomy from May 2010 to December 2015 at Sainte-Anne Military Hospital, an urban tertiary care centre, were retrospectively identified. Only patients who presented with suspected gallstone migration, revealed by pancreatitis, cholangitis, or the association of typical clinical signs (biliary colic, jaundice) with increased liver test values, were included in the present study. Medical charts were reviewed and compared according to the presence or absence of CBDS on preoperative ERCP or during cholecystectomy. Pancreatitis was diagnosed when the serum lipase level was over three times the upper limit of normal. Cholangitis was diagnosed according to the Tokyo Guidelines criteria [16, 17]. Exclusion criteria were cholecystectomy performed for biliary colic, acute or chronic cholecystitis, or tumour of the gallbladder; any associated disease or condition that could modify the biological test values; and a history of bile duct stricture or bile duct surgery. The study was approved by the Institutional Review Board of the hospital.

Selection algorithm for preoperative ERCP

The two-stage approach was mainly used in our unit during the study period. To decrease the rate of unnecessary ERCP while limiting the use of MRCP or EUS, patients admitted for choledocholithiasis first underwent a 3–5-day observational phase during which they were given supportive care and, if necessary, antibiotics. Blood tests evaluating liver enzymes and inflammatory markers were performed regularly. Patients with severe acute cholangitis or pancreatitis underwent urgent endoscopic biliary drainage. After the observational phase, the following decisional algorithm was applied according to the course of the biochemical parameters: patients were scheduled for first-intention cholecystectomy when liver function tests normalised, suggesting spontaneous stone passage; patients were scheduled for first-intention ERCP, followed by cholecystectomy, when liver test values increased, suggesting persistence of the stone. Laparoscopic cholecystectomy was performed during the same hospitalisation, with systematic intraoperative cholangiography to ascertain bile duct clearance.
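
For illustration, the triage logic of this decisional algorithm can be summarised in a few lines of code. The sketch below is a simplified rendering under stated assumptions: the function and field names are invented for this example, "normalisation" is reduced to a simple fall in two representative liver tests, and the richer bedside clinical judgement is not captured.

```python
from dataclasses import dataclass

@dataclass
class LiverTests:
    """Representative liver test values at admission (Day 0) and after
    the 3-5-day observational phase (Day 3-5); units as in 'Data collection'."""
    day0_ggt: float    # gamma-glutamyl transpeptidase, units/L
    day35_ggt: float
    day0_bili: float   # total bilirubin, micromol/L
    day35_bili: float

def triage(tests: LiverTests, severe_cholangitis_or_pancreatitis: bool) -> str:
    """Simplified sketch of the unit's traditional decisional algorithm."""
    if severe_cholangitis_or_pancreatitis:
        # Severe presentations bypass the observational phase entirely.
        return "urgent endoscopic biliary drainage"
    if tests.day35_ggt > tests.day0_ggt or tests.day35_bili > tests.day0_bili:
        # Rising liver tests suggest a persistent stone.
        return "first-intention ERCP, then laparoscopic cholecystectomy"
    # Falling/normalising liver tests suggest spontaneous stone passage.
    return "first-intention LC with intraoperative cholangiography"
```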

Data collection

The data retrieved included demographics; American Society of Anesthesiologists (ASA) score; body mass index (BMI); use of antibiotics; treatment type; and laboratory test values collected on the day of admission (Day 0), 3–5 days after admission following the observational phase (Day 3–5), maximal values, and differential values, defined as the change between the Day 0 and Day 3–5 values expressed as a percentage of the Day 0 value. Laboratory data included leucocyte count (/µL), neutrophil count (/µL), C-reactive protein (CRP, mg/L), alanine aminotransferase (ALT, units/L), aspartate aminotransferase (AST, units/L), gamma-glutamyl transpeptidase (GGT, units/L), alkaline phosphatase (ALP, units/L), total bilirubin (µmol/L), conjugated bilirubin (µmol/L), and serum lipase (units/L).
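
As a worked example of the differential values, the negative cut-offs reported in the Results (e.g. differential CRP ≥ −10%) imply a sign convention in which a falling value yields a negative percentage. A minimal sketch, assuming that convention:

```python
def differential(day0: float, day35: float) -> float:
    """Differential value: change from Day 0 to Day 3-5 expressed as a
    percentage of the Day 0 value. Assumed sign convention: a decrease
    gives a negative percentage, consistent with the negative cut-offs
    of the FRA model (e.g. differential CRP >= -10%)."""
    return 100.0 * (day35 - day0) / day0

# CRP falling from 120 to 90 mg/L gives -25%, a decrease that crosses
# the -10% cut-off and therefore does not count as a risk factor.
assert differential(120.0, 90.0) == -25.0
```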

Statistical analysis

Statistical analyses were performed using IBM SPSS 20.0 (IBM Corp., Armonk, NY, USA). Categorical variables are described as frequencies (percentages) and continuous variables as medians (range). Comparisons were conducted using Student's t test for continuous variables and the Chi square or Fisher's exact test for categorical variables. Day 0, Day 3–5, Maximal, and Differential values were each analysed separately. Variables with P values ≤0.3 in the continuous analysis were dichotomised into categorical variables using receiver operating characteristic (ROC) curves and compared. Variables significant at P ≤ 0.1 were included in a backward stepwise logistic regression model for predicting CBDS. Each of the Day 0, Day 3–5, Maximal, and Differential models was adjusted for the use of antibiotics. The areas under the ROC curves (AUC) of the models were compared using a covariance matrix. A final risk-assessment (FRA) model for the prediction of CBDS was then created, including the independent variables identified in the Day 0, Day 3–5, Maximal, and Differential models, and adjusted for demographic parameters. A two-tailed P value ≤0.05 was considered statistically significant. To take into account the statistical weight of the various predictors, a weighted FRA model was created, assigning points for each risk factor according to its odds ratio, and compared with the non-weighted FRA model. The intrinsic validity and predictive capacities (sensitivity [Se], specificity [Sp], positive predictive value [PPV], negative predictive value [NPV], and accuracy [Acc]) of the FRA model were determined and compared with those of our traditional algorithm using Chi square analysis. Finally, the FRA model was retrospectively tested within the framework of the two-stage approach, based on the assumption that patients predicted to have CBDS underwent first-intention ERCP. A cost analysis was performed to assess the cost-effectiveness of the FRA model, taking into account the mean timeframe to obtain MRCP for patients with choledocholithiasis.
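
The analysis itself was run in SPSS, but its two main steps (ROC-based dichotomisation and logistic modelling) are straightforward to reproduce. Below is a minimal sketch in Python using scikit-learn; the Youden index for choosing the cut-off is our assumption, as the paper does not state which ROC criterion was used, and the SPSS backward stepwise selection is not reproduced.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

def dichotomisation_threshold(y: np.ndarray, x: np.ndarray) -> float:
    """Cut-off from the ROC curve. The Youden index (max TPR - FPR) is
    assumed here; the paper does not name the criterion used."""
    fpr, tpr, thresholds = roc_curve(y, x)
    return float(thresholds[np.argmax(tpr - fpr)])

def model_auc(df: pd.DataFrame, predictors: list[str]) -> float:
    """Fit a logistic regression for persistent CBDS ('cbds' = 0/1 column,
    an assumed name) and return the in-sample AUC. Backward stepwise
    selection, as done in SPSS, is not reproduced here."""
    X = df[predictors].to_numpy()
    y = df["cbds"].to_numpy()
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])
```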

Results

Seven hundred and nineteen patients underwent cholecystectomy during the study period: 235 for biliary colic, 215 for acute cholecystitis, 39 for chronic cholecystitis, 10 for tumours of the gallbladder, and 220 for choledocholithiasis. Ten patients were excluded because of liver disease (n = 5), pancreatitis or cholangitis requiring intensive care (n = 3), or Mirizzi syndrome (n = 2). In total, 210 patients fulfilled the inclusion criteria (Fig. 1): 103 men and 107 women, with a median age of 71 years (range 19–95).

Fig. 1 Flowchart of the selection process for the study

The median duration of the observational phase was 5 days (range 1–23). According to our decisional algorithm, 67 patients were scheduled for first-intention ERCP. In six cases (9.0%), ERCP found no persistent CBDS and was thus considered unnecessary. Conversely, 143 patients underwent first-intention LC, with intraoperative cholangiography showing persistent CBDS in 32 cases (22.4%).

In all, 93 patients had persistent CBDS and were compared with the 117 patients who did not. The groups were comparable in terms of sex, age, and BMI (Table 1). Univariate analyses between the groups are presented in Tables 1 and 2. Results of the multivariate analyses for the Day 0, Maximal, Day 3–5, and Differential values are presented in Table 3. The areas under the ROC curves were 0.738, 0.735, 0.810, and 0.837 for the Day 0, Maximal, Day 3–5, and Differential models, respectively, with no significant differences between models (Fig. 2). Eight parameters were identified as independent predictors in the final multivariate analysis (Table 3). Values of the FRA model (Fig. 3) thus ranged from zero (no risk factors) to eight (all factors present). The AUC for the model was 0.881, which differed significantly from those of the Day 0 (P < 0.001), Maximal (P < 0.001), Day 3–5 (P = 0.002), and Differential models (P = 0.004).

Table 1 Baseline characteristics and continuous analysis of laboratory values with corresponding thresholds
Table 2 Univariate and multivariate categorical analysis of laboratory test values using corresponding thresholds
Table 3 Final multivariate logistic regression analysis for predicting persistent CBDS (n = 179)
Fig. 2 Receiver operating characteristic curves for the Day 0, Maximal, Day 3–5, Differential, and FRA models

Fig. 3 Parameters of the FRA model and probabilities of persistent CBDS

Points for the creation of the weighted FRA model were assigned as follows: age ≥ 80 years = 3 points; Day 0 neutrophil count ≥ 12,000/µL = 6 points; differential CRP ≥ −10% = 3 points; differential AST ≥ −35% = 3 points; Day 0 GGT ≥ 300 units/L = 5 points; differential GGT ≥ −25% = 6 points; Day 3–5 ALP ≥ 180 units/L = 3 points; and differential total bilirubin ≥ −15% = 3 points. Values of the weighted FRA model ranged from 0 to 32. Its AUC was 0.871 (P = 0.314 when compared with the non-weighted FRA model). Because this scoring system was more complex to use and no more efficient than the non-weighted FRA model, the weighted FRA model was not considered in further analyses.
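
For clarity, the two scores can be expressed in a few lines of code. The sketch below transcribes the eight risk factors and their weighted points from the paragraph above; the dictionary keys are invented names for this illustration.

```python
# Weighted points per risk factor, transcribed from the text above;
# the key names are invented for this sketch. The points sum to 32.
WEIGHTS = {
    "age_ge_80_years": 3,
    "day0_neutrophils_ge_12000_per_uL": 6,
    "differential_crp_ge_minus_10_pct": 3,
    "differential_ast_ge_minus_35_pct": 3,
    "day0_ggt_ge_300_U_per_L": 5,
    "differential_ggt_ge_minus_25_pct": 6,
    "day35_alp_ge_180_U_per_L": 3,
    "differential_bilirubin_ge_minus_15_pct": 3,
}

def fra_scores(present: dict[str, bool]) -> tuple[int, int]:
    """Return (non-weighted, weighted) FRA scores from a mapping of
    risk factor -> present. Non-weighted: one point per factor (0-8).
    Weighted: odds-ratio-based points (0-32)."""
    non_weighted = sum(1 for k in WEIGHTS if present[k])
    weighted = sum(WEIGHTS[k] for k in WEIGHTS if present[k])
    return non_weighted, weighted
```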

Probabilities of persistent CBDS are given in Fig. 3. The diagnostic accuracy of the FRA model for predicting CBDS was 80.4% when at least four risk factors were present (Table 4); this did not differ from that of the decisional algorithm (81.9%, P = 0.714). The corresponding Se, Sp, PPV, and NPV were 69.3, 88.5, 81.3, and 80.4%, respectively. The NPV was 100% for zero risk factors, and the PPV was 100% when at least six risk factors were present.

Table 4 Intrinsic validity and predictive capacities of the FRA model according to the number of risk factors
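
The quantities reported in Table 4 follow directly from the 2 × 2 confusion table at each threshold. A minimal helper, for the record (the function and argument names are ours):

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Se, Sp, PPV, NPV, and accuracy (as percentages) from the counts of
    true/false positives and negatives at a given FRA threshold."""
    return {
        "Se": 100 * tp / (tp + fn),
        "Sp": 100 * tn / (tn + fp),
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
        "Acc": 100 * (tp + tn) / (tp + fp + tn + fn),
    }
```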

When used to select patients for preoperative ERCP, the diagnostic accuracy reached 94.8% for a threshold of ≥3 risk factors (Table 4; P = 0.005 when compared with the traditional algorithm). The corresponding Se, Sp, PPV, and NPV were 94.0, 95.4, 94.0, and 95.4%, respectively. The NPV was 100% in the absence of risk factors, and the PPV was 100% when at least five risk factors were present. The rates of unnecessary ERCP (false positives) and fortuitous CBDS discovery (false negatives) were both 2.6%. The expected reductions in the rates of unnecessary ERCP and fortuitous CBDS discovery were 6.4 and 19.8%, respectively. During the study period, the mean timeframe to obtain MRCP was 3 days. The hospitalisation cost per patient was 3962 euros using the FRA model compared with 4358 euros using MRCP.

Discussion

The optimal treatment of choledocholithiasis remains controversial. Proponents of the one-stage approach report effectiveness similar to that of the two-stage approach, with shorter hospital stays and lower costs despite higher conversion and inpatient care rates [1–3, 5, 7, 8]. Numerous studies have compared the two approaches without reaching firm conclusions [9, 18]. Indeed, the efficacy of the two-stage strategy is impaired because one-third of CBDS can pass spontaneously before ERCP [18]. Thus, the criteria used to schedule patients for ERCP have to be improved by identifying those most likely to present with persistent CBDS. The American Society for Gastrointestinal Endoscopy (ASGE) recently published guidelines recommending ERCP or MRCP/EUS according to the presence of clinical, imaging, and laboratory predictors. However, their performance characteristics in detecting persistent CBDS remain low [10, 11, 19]. Adams et al. [11] failed to improve on the ASGE guidelines using a second set of laboratory tests conducted before the confirmatory study. Van Santvoort et al. [4] demonstrated that biochemical markers collected at patient admission were not useful for predicting CBDS. Indeed, all these studies treat biochemical markers as static variables and do not address the course of the variables over time.

In our study, we hypothesised that the course of biochemical parameters could reflect the persistence or passage of the stone. We derived four models from the Day 0, Day 3–5, Maximal, and Differential values, compared them with each other, and found that the Differential model was the most accurate, confirming our initial hypothesis. However, the accuracy of the model increased further when it was adjusted for Day 0 and demographic variables, suggesting that variables collected at patient admission and physiological parameters are also needed to accurately predict the likelihood of persistent CBDS.

One of the most interesting results is that all the differential values included in the FRA model have a negative cut-off, suggesting that a mere decrease in a biological value is not sufficient to predict CBDS passage; rather, there is a threshold of decrease below which CBDS clearance becomes very likely. These results demonstrate that stone persistence is possible even when liver function test values decrease, contrary to what basic clinical knowledge, on which our traditional decisional algorithm was built, suggests. This finding explains our high rate of fortuitous discovery of CBDS during LC (22.4%) and our low rate of negative ERCP (9.0%) compared with the rates usually reported (4–6% and 15–25%, respectively) [3, 20]. A strength of our study is that it precisely identifies the corresponding decrease threshold for each predictor.

The factors identified in the FRA model have already been discussed in previous studies [1, 12, 14, 21]. Age is frequently cited, although the threshold varies from 55 to 70 years [21–23]. In our study, the threshold was ≥80 years. We consider this to be a consequence of a centre effect, as the median age of our series was 71 years, but it could also reflect the fact that age acted as an adjustment factor for the biological markers, thus serving as a surrogate of the liver functional reserve on which the course of liver biochemical markers depends [24].

Numerous studies have proposed scoring systems to predict the likelihood of persistent CBDS [1, 4, 11–15]. Recently, Jovanovic et al. [25] used an artificial neural network to predict CBDS, with a PPV and NPV of 92.3 and 69.6%, respectively. Although the performances of these models are comparable, they are hardly transferable to other centres because they include non-reproducible clinical or radiological examinations or rely on overly complex scoring systems. Our model exhibits good performance characteristics because (i) as previously discussed, we used only objective and reproducible biological variables [1, 12, 21]; and (ii) the exclusion of patients with cholecystitis or biliary colic allowed inflammatory markers to be interpreted solely in the context of bile duct sepsis, correlating the rise in a marker with the risk of persistent CBDS. This correlation would probably have been lost had cholecystitis and biliary colic not been excluded, decreasing the performance of the FRA model. These choices explain the high accuracy of the model and make it easily transferable.

The diagnostic accuracy of the model was 94.8% when it was used to select patients for preoperative ERCP. The false positive and false negative rates fell from 9.0 to 2.6% and from 22.4 to 2.6%, respectively. These results demonstrate that the model is more effective in selecting patients for ERCP than the traditional algorithm, decreasing both the rate of unnecessary ERCP and the rate of fortuitous discovery of stones. Patients with no risk factors have an NPV of 100% and thus do not need ERCP, whereas patients with at least five risk factors have a PPV of 100%. For patients with one to four risk factors, the best positivity threshold should be chosen according to local surgical practice, bearing in mind that a low positivity threshold will reduce the rate of fortuitous CBDS discovery during LC while slightly increasing the rate of unnecessary preoperative ERCP, and that a higher threshold will have the opposite effect.

Our study has some limitations. First, owing to the retrospective nature of the work, some biological data were missing, notably when there was a delay between the treatment decision and the treatment itself. In such cases, laboratory tests were not repeated during the days preceding treatment, although the stone could have passed in the interval. Secondly, there were variations in the frequency of laboratory testing, although patients were tested daily in most cases. We were therefore unable to identify an optimal time interval for the Day 3–5 values, and thus a timepoint beyond which the course of the biological markers becomes uninformative. Finally, as most patients were admitted as emergencies, some decisions were made according to the availability of surgeons or gastroenterologists rather than according to the decisional algorithm. However, we believe this source of bias to be very limited, owing to the size and structure of our institution.

The use of the FRA model to select patients for ERCP proved cost-effective, because the mean timeframe to obtain MRCP in our hospital was 3 days. A diagnostic strategy using the FRA model removes both the 3-day wait for MRCP and the need to request MRCP at all, an expensive examination in itself. Furthermore, such a strategy avoids requesting supplementary EUS when MRCP is inconclusive. The use of the FRA model in our unit could save at least 396 euros per patient. However, the FRA model is obviously less cost-effective in centres that can perform MRCP at admission.

Commonly used biochemical parameters can correctly predict CBDS persistence when they are considered in a dynamic setting. Analysing the course over time of biochemical variables measured at patient admission and 3–5 days later helps to better identify patients most likely to present with persistent CBDS. The FRA model is a reliable tool to help select patients for preoperative ERCP and to decrease both the rate of unnecessary ERCP and the risk of CBDS discovery during LC. These results support the usefulness of an initial 3-day observational phase to account for the possibility of spontaneous stone passage. Furthermore, this strategy is cost-effective compared with early MRCP and avoids the need for supplementary examinations. The optimal timeframe for measuring the biochemical predictors, which should further increase the cost-effectiveness of the model, remains to be specified in a prospective series for which we have begun collecting data.