Introduction

The clinical management of kidney stone disease relies on the individual risk of future stone events. Such risk stratification, based on clinical characteristics, is recommended by clinical guidelines to appropriately tailor the intensity and burden of preventative interventions, often prescribed over the long term [1]. However, what constitutes high versus low-risk criteria is not clearly established, and as a result there is poor agreement among clinicians when predicting individual recurrence risk [2].

Within this context, the development of the recurrence of kidney stone (ROKS) nomogram sought to provide individual-level probability of future symptomatic stone events. The nomogram uses 13 features from the clinical history, including stone characteristics on imaging [3, 4]. However, the nomogram has limited adoption in clinical practice primarily due to its limited predictive performance, including when evaluated in external cohorts [2, 5, 6].

Whether ROKS can be used to discriminate between high- and low-risk populations among kidney stone patients, rather than to predict individual-level risk, is unknown. Therefore, our goal was to assess the performance of the ROKS nomogram in risk-stratifying kidney stone patients by future risk of stone events, including both 2- and 5-year recurrence. To do this, we performed an external evaluation of the nomogram, identified the optimal risk thresholds for high-low risk stratification, and reported their respective predictive potential.

Materials and methods

Patient cohort

After local institutional review board approval, a retrospective review was performed of all patients who underwent kidney stone surgery from 1997 to 2015 at our institution. Patients were identified using an institutionally maintained and deidentified database of the electronic health record [7]. Patients were identified via Current Procedural Terminology (CPT) codes (see Appendix 1) for having received percutaneous nephrolithotomy (PCNL), ureteroscopy (URS), or shockwave lithotripsy (SWL). Patient demographics and comorbidities based on ICD codes were extracted from the Synthetic Derivative (see Appendix 2). We selected a contemporary cohort of 200 patients (100 without recurrence and 100 with recurrence) with at least annual follow-up for 5 years for their stone disease. All patients had at least yearly imaging that could assess for kidney stone disease. Manual chart review was performed to adjudicate follow-up for all patients.

Stone recurrence

Stone recurrence episodes after index surgeries were identified by the first occurrence of (1) kidney stone surgeries (CPT codes, Appendix 1) due to stone growth, new stone on imaging, or asymptomatic hydronephrosis, or (2) emergency room visits for symptomatic kidney stones (ICD codes, Appendix 2). Then, manual chart review was performed to adjudicate all recurrence episodes. Patients who had secondary planned surgeries or recurrence events within 3 months of index surgery were excluded to ensure that recurrence episodes were not immediately attributable to a staged procedure or a complication from the index surgery.

Statistical methods

We evaluated the performance of each predictor in the 2018 ROKS nomogram in our patient population via Cox proportional hazards regression analysis [4]. Estimates of recurrence at 2- and 5-year follow-up were calculated for each patient. These predictions were used to assess the discriminative performance of the nomogram in our patient population using the area under the receiver operating characteristic curve (AUC-ROC).
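As an illustration of the discrimination measure, the sketch below computes the AUC-ROC as the probability that a randomly chosen patient with recurrence received a higher predicted risk than a randomly chosen patient without recurrence. It assumes the outcome is treated as binary at a fixed horizon, and the scores and outcomes are hypothetical toy values, not study data; the function name `auc_roc` is ours.

```python
def auc_roc(scores, labels):
    """Rank-based AUC: share of (recurrence, no-recurrence) pairs in which
    the recurrence case has the higher predicted risk; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted 2-year risks and observed outcomes (1 = recurrence).
toy_scores = [0.82, 0.64, 0.55, 0.71, 0.40, 0.35, 0.60, 0.28]
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_auc = auc_roc(toy_scores, toy_labels)
print(f"toy AUC-ROC = {toy_auc:.4f}")
```

An AUC-ROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.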

Then, we evaluated the ability of the ROKS nomogram to stratify patients as low or high risk of recurrence at two thresholds along the ROC curve: a) an optimized cutoff point (i.e., a point optimized for both sensitivity and specificity), and b) a sensitive cutoff point (i.e., a point with high sensitivity (0.80) and low specificity). The points were chosen in line with current statistical guidelines in urology [8]. The sensitive cutoff point was selected to determine the clinical utility of the ROKS nomogram as a tool for identifying high risk of stone recurrence. Time to recurrence was compared between the risk groups via survival analysis with Kaplan–Meier (KM) estimation and log-rank testing. All analyses were performed with p < 0.05 considered significant and were conducted in R (R Foundation for Statistical Computing, Vienna, Austria) [9].
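The two kinds of cutoff point can be sketched as follows. The criterion used for the optimized point is not specified here, so the sketch assumes Youden's J (sensitivity + specificity - 1); the sensitive point is taken as the highest threshold that still reaches a sensitivity of 0.80. Function names and the toy data are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold means high risk.
    Assumes both outcome groups are non-empty."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def optimized_cutoff(scores, labels):
    """Threshold maximizing Youden's J (an assumed criterion)."""
    best_j, best_t = None, None
    for t in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, t)
        j = se + sp - 1
        if best_j is None or j > best_j:
            best_j, best_t = j, t
    return best_t


def sensitive_cutoff(scores, labels, target=0.80):
    """Highest threshold whose sensitivity is at least `target`."""
    for t in sorted(set(scores), reverse=True):
        se, _ = sens_spec(scores, labels, t)
        if se >= target:
            return t
    raise ValueError("no threshold reaches the target sensitivity")


# Hypothetical predicted risks: five patients with and five without recurrence.
toy_scores = [0.90, 0.80, 0.70, 0.35, 0.15, 0.60, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
opt_t = optimized_cutoff(toy_scores, toy_labels)
sen_t = sensitive_cutoff(toy_scores, toy_labels)
print(opt_t, sens_spec(toy_scores, toy_labels, opt_t))
print(sen_t, sens_spec(toy_scores, toy_labels, sen_t))
```

In this toy example the sensitive cutoff sits at a lower threshold than the optimized one, trading specificity for sensitivity.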

Results

Patients were followed for a mean ± SD of 96 ± 38 months. The mean ± SD time to recurrence was 29 ± 32 months. Table 1 summarizes patient demographics and clinical characteristics.

Table 1 Patient characteristics

The ROKS nomogram demonstrated fair discriminative ability in predicting risk of recurrence, with AUC-ROCs of 0.67 and 0.63 at 2 and 5 years, respectively (Fig. 1a, b). For 2-year recurrence prediction, sensitivity and specificity were 0.60 and 0.68 at the optimized cutoff point, and 0.80 and 0.45 at the sensitive cutoff point, respectively. For 5-year recurrence prediction, sensitivity and specificity were 0.54 and 0.72 at the optimized cutoff point, and 0.80 and 0.26 at the sensitive cutoff point, respectively (Fig. 1c).

Fig. 1

Prediction of overall recurrence at a two years (the blue point represents the optimized cutoff point, with sensitivity 0.60 and specificity 0.68; the red point represents the sensitive cutoff point, with sensitivity 0.80 and corresponding specificity 0.45), b five years (the blue point represents the optimized cutoff point, with sensitivity 0.54 and specificity 0.72; the red point represents the sensitive cutoff point, with sensitivity 0.80 and corresponding specificity 0.26), and c nomogram prediction of recurrence at 2 and 5 years for each cutoff point. AUC = area under the curve

For the 2-year recurrence prediction, ROC cutoff thresholds of 0.623 for the optimized cutoff point and 0.505 for the sensitive cutoff point were examined. At the optimized cutoff threshold, 120 patients were classified as low risk and 80 as high risk. Recurrence rates for the low- and high-risk groups were 20 and 45% at 2 years, and 50 and 70% at 5 years, respectively. At the sensitive cutoff threshold, 75 patients were classified as low risk and 125 as high risk. The corresponding recurrence rates for the low- and high-risk groups were 16 and 38% at 2 years, and 42 and 66% at 5 years, respectively. For both thresholds, KM analysis revealed a significant difference in recurrence-free probability between the groups (p < 0.01, Fig. 2a, b).
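As a consistency check, the 2-year group sizes and recurrence rates above can be converted back into an approximate sensitivity and specificity. The sketch assumes the reported percentages are simple proportions of each risk group (they are rounded, and may instead be KM estimates), so the results are approximate; the helper name `implied_sens_spec` is hypothetical.

```python
def implied_sens_spec(n_low, n_high, rate_low, rate_high):
    """Approximate sensitivity and specificity implied by risk-group sizes
    and per-group recurrence rates (rates given as fractions)."""
    events_low = n_low * rate_low      # recurrences called low risk (missed)
    events_high = n_high * rate_high   # recurrences called high risk (caught)
    events = events_low + events_high
    non_events = (n_low + n_high) - events
    sensitivity = events_high / events
    specificity = (n_low - events_low) / non_events
    return sensitivity, specificity


# 2-year figures reported in the text for each threshold.
opt = implied_sens_spec(120, 80, 0.20, 0.45)   # optimized cutoff
sen = implied_sens_spec(75, 125, 0.16, 0.38)   # sensitive cutoff
print(f"optimized: sensitivity {opt[0]:.3f}, specificity {opt[1]:.3f}")
print(f"sensitive: sensitivity {sen[0]:.3f}, specificity {sen[1]:.3f}")
```

Under these assumptions the implied values fall within about 0.01 of the reported 2-year sensitivities and specificities (0.60 and 0.68 at the optimized cutoff; 0.80 and 0.45 at the sensitive cutoff).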

Fig. 2

KM curves comparing time to recurrence between low- and high-risk groups using an ROC cutoff threshold of a 0.623, optimizing sensitivity and specificity, and b 0.505, for high sensitivity and low specificity. The shaded areas represent the 95% confidence intervals of the respective curves

Of the 13 predictors used by the ROKS nomogram, family history of stone disease, any calcium oxalate monohydrate stone, and having a stone > 6 mm in diameter were all associated with risk of recurrence in the cohort (Table 2, HRs: 1.03, 1.8, 0.3, 0.61, and 0.26, respectively).

Table 2 Performance of individual predictors from the ROKS nomogram in the cohort

Discussion

Our study suggests that the ROKS nomogram modestly predicts recurrent stone events when including a broad definition of recurrence. This further emphasizes the limitations of utilizing the nomogram for counseling kidney stone patients. However, we found that the ROKS nomogram successfully stratified patients by risk-level (i.e., low vs. high) for stone recurrence based on different cutoff points. For example, using a sensitive cutoff point for stratification predicted the risk of kidney stone recurrence for the low and high-risk groups to be 16 and 38% at 2 years, and 42 and 66% at 5 years, respectively. This finding could facilitate the development of surveillance protocols for kidney stone recurrence by risk group.

Currently, the American Urological Association (AUA) recommends postoperative imaging after stone surgery to assess for residual stone and silent hydronephrosis, with periodic imaging thereafter to assess for new stone formation or stone growth [1, 10]. However, there is limited evidence supporting specific protocols for stone surveillance. Unnecessary imaging can be costly and lead to excess radiation exposure [11, 12]. The lack of data supporting long-term imaging use has led to wide variation in follow-up imaging utilization after stone surgery. For example, in an evaluation of MarketScan data, 29 and 15% of patients who underwent percutaneous nephrolithotomy did not undergo postoperative imaging by 3 and 12 months, respectively [13]. Comparatively, 55 and 39% of patients who underwent ureteroscopy, and 23 and 16% of patients who underwent shock wave lithotripsy, had no postoperative imaging at 3 and 12 months, respectively [14]. Both studies also reported significant variation in postoperative imaging modalities, emphasizing a lack of consensus for kidney stone surveillance. Thus, a rational approach to follow-up of kidney stone patients is to tailor the frequency of surveillance imaging to specific patient risk factors.

As recurrence risk varies across patients, it is necessary to personalize stone care to optimize stone surveillance strategies. However, clinicians may not be able to distinguish patients with low or high risk of recurrence, particularly compared to the ROKS nomogram [2]. Additionally, prior attempts at external validation of the ROKS nomogram have been limited. Previously, Iremashvili et al. demonstrated only moderate discrimination of the nomogram (0.655 at 2 years and 0.605 at 5 years) for prediction of recurrence risk [15]. We found a similar, modest prediction of risk with a broader definition of recurrence (i.e., including both symptomatic and radiographic recurrence). However, despite the limitations of the nomogram as an external prediction tool, our findings suggest it could be used for stone risk stratification. By stratifying patients as low or high risk for recurrence, the nomogram could help guide follow-up management and improve care for kidney stone patients.

There are several limitations to this study. The retrospective design cannot account for confounders and bias due to omitted variables. All patients were identified after being referred to a tertiary medical center and may have a higher baseline risk for stone recurrence. Furthermore, the nomogram performance may differ for other definitions of kidney stone recurrence [6]. Though we reviewed imaging reports, we were unable to review kidney stone imaging directly. Therefore, we could not differentiate whether recurrence was from a residual fragment or a new stone. We also did not include the impact of medication on stone recurrence, as it is not a parameter in the ROKS nomogram. In addition, as this study was done at a single institution, we could not assess for recurrence events that occurred at other institutions or at home. It is possible that further analysis with a larger dataset could refine our models. Despite these limitations, this study demonstrates the feasibility of kidney stone risk stratification using the ROKS nomogram.

Conclusion

We found that the ROKS nomogram has potential to serve as a tool for stratifying patients into lower and higher recurrence risk groups, despite only modest prediction of stone events under a broad definition of stone recurrence that includes both symptomatic and radiographic recurrence. This could facilitate adherence to risk-based surveillance protocols.