Introduction

Urolithiasis is a common urological condition with an increasing incidence and prevalence worldwide [1, 2]. Although current European Association of Urology guidelines recommend ureteroscopy (URS) as the first-line therapy for ureteral stones larger than 10 mm, shock wave lithotripsy (SWL) can also be used in clinical practice. Hence, accurately estimating the probability of successful SWL treatment for large stones would help maximize patient benefit [3].

Previous studies have investigated various factors derived from standard non-contrast CT that may influence the success of SWL for ureteral stones, including stone size, density, and texture [4,5,6,7,8]. The usefulness of CT texture analysis in assessing internal structural heterogeneity and its potential role in improving the accuracy of SWL success prediction remain contentious. Overall, the areas under the receiver operating characteristic curve (AUCs) of prediction models based on these factors have remained unsatisfactory [9,10,11].

Recent research indicates that stone morphology can differentiate between calcium oxalate monohydrate (COM) and calcium oxalate dihydrate (COD) stones [12]. Radiomics extracts quantitative imaging features from radiological images, creating a high-dimensional dataset that enhances diagnostic, prognostic, and predictive accuracy in clinical decision support [13, 14]. Therefore, it is plausible to hypothesize that radiomics features can predict SWL success more effectively. Within the field of urolithiasis, radiomics has been used to identify stone composition [15] and to predict the stone-free rate after flexible ureteroscopy [16]. However, no published reports have investigated whether a radiomics signature may enhance the accuracy of predicting SWL treatment outcomes for ureteral stones.

This study aimed to investigate the predictive value of radiomics based on non-contrast CT images for SWL treatment success in patients with ureteral stones larger than 10 mm. We also compared the predictive performance of radiomics with that of conventional radiological features and developed a combined model incorporating independent clinical predictors and the superior imaging signature.

Patients and methods

Patients

This retrospective study analyzed patients with ureteral stones treated with SWL at our hospital from January 2018 to January 2023. SWL was performed using the Dornier Compact Delta II lithotripter (Dornier MedTech, Wessling, Germany). Stone-free status was assessed two weeks after each SWL session using kidney, ureter, and bladder film and ultrasound. Retreatment was carried out in cases of inadequate fragmentation or incomplete clearance of the stone. If the stone remained intact after a maximum of three sessions, the case was considered a failed treatment. Treatment success was defined as complete absence of the stone.

The inclusion criteria for this study were as follows: (a) availability of pre-treatment non-contrast CT images, (b) complete follow-up during or after treatment, and (c) a maximum stone diameter larger than 10 mm. The exclusion criteria were as follows: (a) presence of multiple ureteral stones on the ipsilateral side, (b) previous open ureteral surgery or a history of ureteral stricture, and (c) any genitourinary tract anomaly.

A total of 301 patients were included in this study based on predefined inclusion and exclusion criteria. Subsequently, these patients were randomly assigned to a training cohort and a test cohort at a ratio of 8:2.
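The 8:2 random assignment can be illustrated with a minimal sketch. The function name, the fixed seed, and the use of integer patient IDs are assumptions for illustration; the text does not state which tool performed the split or whether it was stratified by outcome.

```python
import random

def split_cohort(patient_ids, train_fraction=0.8, seed=42):
    """Shuffle patient IDs and split them into training and test cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs 0..300 stand in for the 301 included patients.
train_ids, test_ids = split_cohort(range(301))
print(len(train_ids), len(test_ids))  # 241 60
```

With 301 patients, an 8:2 ratio yields 241 training and 60 test patients, which matches the cohort sizes reported in the Results.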

Baseline clinical data of the patients, including age, gender, body mass index (BMI), presence of diabetes mellitus (DM) or hypertension (HTN), stone laterality, stone location, hydronephrosis grade (HG) [17], proximal ureter diameter (PUD), and skin-to-stone distance (SSD), were obtained from the medical records.

Radiological analysis

Two experienced radiologists reviewed and evaluated the pretreatment CT images of the stones on our hospital’s picture archiving and communication system.

Radiographic parameters of the stone included maximum craniocaudal diameter (MCD), maximum transverse diameter (MATD), minimum transverse diameter (MITD), mean stone density (MSD), standard deviation of stone density (SDSD), and variation coefficient of stone density (VCSD = SDSD/MSD × 100%) [18].
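As a worked illustration of the density parameters, the sketch below computes MSD, SDSD, and VCSD from a list of Hounsfield unit (HU) values. The HU values are made up, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def stone_density_stats(hu_values):
    """Return (MSD, SDSD, VCSD) for the HU values measured within a stone."""
    msd = mean(hu_values)
    sdsd = stdev(hu_values)     # sample standard deviation (assumption)
    vcsd = sdsd / msd * 100.0   # VCSD = SDSD / MSD x 100 (%)
    return msd, sdsd, vcsd

# Hypothetical HU measurements for one stone.
hu = [900, 1000, 1100, 1000, 1000]
msd, sdsd, vcsd = stone_density_stats(hu)
print(f"MSD={msd:.0f} HU, SDSD={sdsd:.1f} HU, VCSD={vcsd:.2f}%")
```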

Radiomics feature extraction and selection

The workflow for radiomics analysis is shown in Supplementary Fig. 1. Image segmentation was performed using ITK-SNAP software (version 3.8.0, http://www.itksnap.org/). The volume of interest (VOI) of the entire stone was semi-automatically segmented.

Radiomics feature extraction was performed using the open-source PyRadiomics package (https://pyradiomics.readthedocs.io; version 3.0.1), implemented in Python. Features extracted from each patient’s VOI included geometry, intensity, and texture features. Texture features were calculated using gray level cooccurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), and neighborhood gray tone difference matrix (NGTDM) methods.
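To give a sense of what a texture feature measures, the following sketch builds a gray level cooccurrence matrix for a tiny 2D image and derives the contrast feature from it. It is a simplified illustration only and is not the PyRadiomics implementation, which works on 3D volumes, discretizes gray levels, and aggregates over multiple directions. The function names and toy images are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Count co-occurrences of gray levels (i, j) for pixel pairs at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                matrix[i][j] += 1
                if symmetric:
                    matrix[j][i] += 1  # count the pair in both directions
    return matrix

def glcm_contrast(matrix):
    """Contrast: sum of (i - j)^2 weighted by the normalized co-occurrence counts."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))

# Toy images with gray levels 0..3: one uniform, one checkerboard-like.
homogeneous = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
heterogeneous = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]

print(glcm_contrast(glcm(homogeneous, levels=4)))    # 0.0
print(glcm_contrast(glcm(heterogeneous, levels=4)))  # 9.0
```

A uniform region gives zero contrast, whereas alternating gray levels give a high value, which is why such features can capture internal heterogeneity of a stone.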

To ensure comparability across features, all extracted feature values were standardized using z-scores. Feature selection and dimension reduction in the training cohort involved three steps. First, univariate analysis was performed to identify statistically significant features. Second, Spearman’s rank correlation coefficient was calculated between features, and for any pair with a correlation coefficient greater than 0.8, only one feature was retained. Finally, least absolute shrinkage and selection operator (LASSO) regression, with the penalty parameter tuned by 10-fold cross-validation using the minimum criterion, was applied to select the most predictive features with non-zero coefficients for signature construction.
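The z-score and Spearman filtering steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the population standard deviation is used for the z-score, and the redundancy filter greedily keeps the first feature of each highly correlated pair, because the text does not specify which of the two was retained. Feature names and values are hypothetical, and the univariate and LASSO steps are not shown.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize to zero mean and unit variance (population SD, an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result

def pearson(x, y):
    # Constant features (zero variance) must be removed beforehand.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def drop_correlated(features, threshold=0.8):
    """Keep a feature only if |rho| <= threshold against every feature already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature table: one list of values per feature, one entry per patient.
features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "volume_squared": [1.0, 4.0, 9.0, 16.0, 25.0],  # monotone in volume, so rho = 1
    "entropy": [2.0, 5.0, 1.0, 4.0, 3.0],
}
scaled = {name: zscore(vals) for name, vals in features.items()}
print(drop_correlated(scaled))  # ['volume', 'entropy']
```

Because Spearman's coefficient depends only on ranks, the z-score step does not change which features are flagged as redundant; it matters for the later penalized regression, where features on different scales would otherwise be penalized unequally.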

Development of different models and performance evaluation

Following LASSO analysis, we trained models on the retained radiomics features using five machine learning (ML) algorithms to identify the most effective classifier: logistic regression (LR), support vector machine (SVM), random forest (RF), K nearest neighbor (KNN), and XGBoost.

The radiological signature was constructed in a similar way to the radiomics signature. First, radiological features were selected by univariate analysis, using a significance threshold of P < 0.05. The same ML algorithms used for the radiomics signature were then used to build the radiological signature.

The prediction performance of the radiomics and radiological signature models was assessed by the area under the receiver operating characteristic curve (AUC), accuracy, and precision. The DeLong test was used to compare the AUCs of two models. In addition, the prediction models developed in the training cohort were validated in the independent test cohort.
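The three performance metrics can be illustrated with a small, self-contained sketch. The AUC is computed here from its rank interpretation (the probability that a randomly chosen successful case receives a higher score than a randomly chosen failed case). The labels, scores, and the 0.5 classification threshold are assumptions for illustration, and the DeLong comparison is not shown.

```python
def auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_precision(labels, scores, threshold=0.5):
    """Accuracy and precision after thresholding the scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision

# Hypothetical outcomes (1 = SWL success) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

print(round(auc(labels, scores), 3))  # 0.889
acc, prec = accuracy_precision(labels, scores)
print(round(acc, 3), round(prec, 3))  # 0.667 0.667
```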

The identified superior imaging signature, along with clinical candidate variables, was used to develop a nomogram. The predictive efficacy of the nomogram was tested in both the training and test cohorts using ROC curves. Calibration curves were generated to evaluate the calibration of the nomogram, accompanied by the Hosmer-Lemeshow goodness-of-fit test. Decision curve analysis was conducted to assess the clinical usefulness of the nomogram by calculating the net benefit at various threshold probabilities.
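The net benefit used in decision curve analysis follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below applies that formula to made-up predictions; the function names and data are hypothetical and do not reproduce the analysis in this study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which every patient is classified as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = SWL success) and model-predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

for pt in (0.25, 0.5, 0.75):
    print(pt, round(net_benefit(labels, probs, pt), 3),
          round(net_benefit_treat_all(labels, pt), 3))
```

A decision curve plots these values across a range of thresholds; a model is considered clinically useful where its curve lies above both the treat-all and treat-none (net benefit of zero) strategies.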

Statistical analysis

We used R software (version 4.1.2; https://www.r-project.org) and Python software (version 3.9.7; http://www.python.org) for statistical analysis. Continuous variables were analyzed using either the one-way ANOVA test or the Mann-Whitney U test, while categorical variables were compared using the chi-squared test or Fisher’s exact test. Statistical significance was defined as a two-sided P value of less than 0.05.

Results

Clinical and radiological characteristics

Of the 301 patients enrolled, 165 were identified as successful in SWL treatment, and 136 were unsuccessful. The training cohort included 132 successful and 109 unsuccessful patients, while the test cohort included 33 successful and 27 unsuccessful patients.

Clinical and radiological characteristics of the training and test cohorts are summarized in Table 1. There were no significant differences in clinical and radiological characteristics between the training and test cohorts. Univariate analysis showed that only PUD differed significantly between the successful and unsuccessful groups in both training and test cohorts (Table 1).

Table 1 Baseline characteristics of the patients

Radiological signature building and performance evaluation

Univariate analysis identified four significant radiological predictors: MCD, MATD, MITD, and MSD. Therefore, the radiological signature was constructed using these four predictors.

Table 2 shows the prediction performance of radiological signature models developed using five ML algorithms in training and test cohorts.

Table 2 Performance of radiological models and radiomics models with different classifiers

Radiomics signature building and performance evaluation

A total of 1834 radiomics features were extracted from each patient’s CT images. LASSO analysis retained 25 features in the training cohort (Supplementary Fig. 2). Table 2 shows the prediction performance of radiomics signature models constructed using different ML algorithms in both the training and test cohorts.

Comparison of prediction performance between radiomics and radiological signature models

The DeLong test demonstrated that the LR- and SVM-based radiomics signature models showed superior predictive performance compared with radiological signature models using the same ML algorithms, in both the training cohort and the test cohort (Table 2).

In addition, there were no significant differences between training and test cohorts within LR- and SVM-based radiomics models, as well as LR- and SVM-based radiological models. This indicates a high generalization capability in these models.

Although performance in the test cohort was comparable between the LR- and SVM-based radiomics models (P = 0.520), we considered the SVM-based radiomics model to be superior because it yielded better results than the LR-based model in the training cohort (P < 0.001). Detailed performance results of the SVM-based radiomics model are shown in Supplementary Fig. 3.

Combined model construction and validation

A radiomics nomogram was developed by combining the clinical predictor (PUD) with the radiomics signature. The nomogram, shown in Fig. 1a, did not improve prediction ability in the training cohort (AUC: 0.955). In the test cohort, it showed higher prediction performance than the SVM-based radiomics signature alone, but the difference was not statistically significant (AUC: 0.939, P = 0.166) (Fig. 1b).

Fig. 1

(a) Nomogram incorporating the SVM-based radiomics signature and proximal ureter diameter. (b) Receiver operating characteristic (ROC) curve of the nomogram in the test cohort. (c) Calibration curve of the nomogram in the test cohort. (d) Decision curve analysis for the radiomics signature and the nomogram in the test cohort. The graph indicated that the nomogram added more benefit than the radiomics signature.

Calibration plots showed good calibration of the nomogram in both the training and test cohorts (Fig. 1c). The Hosmer-Lemeshow test yielded nonsignificant statistics in both the training cohort (P = 0.825) and the test cohort (P = 0.321), indicating no significant deviation from a perfect fit. Decision curve analysis (DCA) of the nomogram in the test cohort is shown in Fig. 1d.

Discussion

This study found that a radiomics signature derived from non-contrast CT scans outperformed conventional radiological models in predicting the success of SWL treatment for ureteral stones larger than 10 mm, especially when using an SVM-based ML classifier. In addition, incorporating the SVM-based radiomics signature and PUD into a nomogram yielded a higher AUC in the test cohort than the radiomics signature alone, although the difference was not statistically significant, and may provide useful information for treatment selection.

In vitro studies suggest that stone heterogeneity in CT scans can indicate susceptibility to SWL for cystine and calcium oxalate stones [8, 19], but not for brushite stones [20]. However, naked-eye observation of stone heterogeneity may not be a reliable predictor of SWL outcomes. Previous studies have used various methods to describe stone heterogeneity [18, 21, 22], but SDSD and VCSD did not show a significant correlation with SWL success in our study, consistent with previous findings [22]. While some studies have suggested that CT texture analysis may offer novel parameters to predict SWL response, its effectiveness remains controversial [9, 11, 22, 23].

Some studies [9, 22] found that CT texture parameters did not improve SWL outcome prediction, with AUCs ranging from 0.64 to 0.7.

The morphology of urinary stones can predict their fragility, as their visual characteristics are closely related to their chemical composition [12]. An automated approach to quantify stone surface morphology, distinguishing between COM and COD stones on CT images, demonstrated excellent discrimination ability with AUC values of 0.93 and 0.90, respectively [12]. This shows potential for clinical use, as COD stones are highly responsive to SWL, while COM stones are more resistant. Additionally, conventional CT measurements, such as HU values, cannot reliably distinguish COM from COD stones because of their comparable chemical compositions [8]. Radiomics provides a more comprehensive assessment of stone characteristics, including shape features derived from morphology and texture features based on higher-order gray scale matrices [13]. As expected, in our study, the proposed radiomics signature consisting of 25 robust features was identified as an independent factor for predicting SWL success in patients with ureteral stones larger than 10 mm and demonstrated excellent predictive performance in both the training and test cohorts.

Across five different ML models, radiomics signature models outperformed traditional radiological models in the test cohort. The nonlinear SVM showed more stable predictive efficiency and better generalization capability than the other four models. Possible explanations are that our sample size was insufficient for complex models such as RF, that our training data were nonlinear or linearly inseparable, and that SVM can achieve comparable performance with a smaller training set. Although few studies have used ML algorithms to predict SWL outcomes, a decision tree model [5] yielded an excellent AUC of 0.95 for predicting SWL outcomes using various clinical and stone characteristics. However, it was not validated in an independent set. The RF algorithm has also been used to predict successful SWL, but the limited sample size of 51 patients may affect the reliability of the results [24].

Our study had several limitations. First, the proposed nomogram was developed using data from a single center. To validate the reliability and generalizability of our prediction model, prospective multicenter studies with large datasets are required. Second, other factors that may affect the success of SWL, such as lithotripter effectiveness and operator skill [25], were not included. Third, subsequent treatments after SWL failure were not followed up or evaluated.

In conclusion, this study indicated that the CT-based radiomics model predicted the success of SWL treatment in patients with ureteral stones larger than 10 mm better than traditional radiological models, especially when using an SVM-based classifier. Therefore, combining radiomics and PUD may help distinguish patients with a high probability of successful treatment from those with a low probability before an SWL decision is made. The nomogram we developed may help identify patients with ureteral stones larger than 10 mm for whom SWL is a suitable option.