Introduction

Urolithiasis is a common urological condition with increasing prevalence in the United States and worldwide, contributing to a significant healthcare and economic burden [1, 2]. Noncontrast CT is considered the initial imaging modality of choice for patients with suspected stone disease [3]. CT is an accurate and widely available tool for evaluating urolithiasis [4], identifying associated urinary abnormalities, and monitoring treatment response. The introduction of dual-energy CT (DECT) has widened the scope of CT in stone evaluation by allowing determination of stone composition, which can guide optimal patient management [5]. The specific chemical composition of a stone, particularly when identified in vivo, affects the therapeutic approach, including early medical therapy and stone prevention with urinary alkalinization for uric acid stones and the choice of urologic treatment for cystine, brushite, and calcium stones [6].

DECT uses low- and high-energy X-ray spectra to separate materials based on the unique attenuation profile of each substance at specific X-ray energies [7]. It offers material-specific information and enables estimation of the effective atomic number (Zeff), leading to precise determination of stone composition, which is not possible with traditional attenuation measurements on single-energy CT [5]. There is robust evidence on the benefit of DECT in characterizing stone composition; however, a large portion of the data comes from source-based DECT scanners [8,9,10]. A recent meta-analysis showed that DECT has a pooled sensitivity of 88% and a specificity of 98% for differentiating uric acid from non-uric acid calculi [11]. However, this meta-analysis mainly included dual-source (dsDECT) and rapid kilovoltage-switching (rsDECT) scanners, with only one study on dual-layer detector DECT (dlDECT). Only a few studies [12,13,14] have evaluated urinary stones using dlDECT, a more recent technical approach to DECT that uses a single X-ray source and a two-layer detector.

Urolithiasis affects patients of all body habitus, including large patients, which poses challenges for stone characterization with DECT. DECT is frequently avoided in patients weighing over 118 kg because of concerns about diminished efficacy, potentially resulting from photon starvation and increased image noise [15, 16]. The feasibility of DECT in these patients is important because the incidence of urolithiasis is higher in patients with obesity [17]. Previous studies have shown acceptable image quality and accuracy using source-based DECT scanners for stone characterization in patients with large body habitus [18, 19]. However, to our knowledge, no study has explored the use of dlDECT for urinary stones in patients with large body habitus. Therefore, our study aimed to determine the diagnostic performance of dlDECT in the evaluation and characterization of urinary stones and to examine its feasibility and image quality in patients of different sizes.

Materials and methods

Patients

Our institutional review board approved this retrospective HIPAA-compliant study and waived the requirement for informed consent. This study comprised consecutive patients (≥ 18 years old) who underwent abdominal CT examinations on the dlDECT scanner for initial diagnosis or follow-up of urinary stones between July 2018 and November 2019. Patients without stones and those with incomplete clinical records or incomplete imaging were excluded. To assess the influence of body habitus on stone characterization and image quality, we dichotomized patients at a weight threshold of 90 kg to define large body habitus, as previously described in the literature [18, 20]. Patient demographics, clinical history, and laboratory results were extracted from electronic medical records. Radiation dose data, including volume CT dose index (CTDIvol) and dose-length product (DLP), were also collected.

Dual-energy CT acquisition and imaging reconstruction

All patients underwent an unenhanced stone protocol CT in the prone position on a dlDECT scanner (IQon, Philips, Best, The Netherlands; Table 1). The following DECT images were generated on the scanner console for routine stone protocol: axial 5-mm 120 kVp images, coronal and sagittal 3-mm reformatted images, and axial color-overlay Zeff images. All images were transferred to the picture archiving and communication system (PACS, Visage 7, Visage Imaging, San Diego, CA) for further analysis.

Table 1 Dual-energy CT scanning parameters

Subjective analysis

Two radiologists with specialized training in abdominal imaging (15 and 3 years of experience) independently reviewed 120 kVp and Zeff images on a PACS workstation. The radiologists scored image quality based on the European Guidelines on Quality Criteria for Computed Tomography [18]. The image quality analysis covered diagnostic acceptability, subjective image noise, and artifacts, each rated on a 4-point Likert scale (Table 2). These radiologists, blinded to crystallography results, visually determined the characterizability of the stone composition on the color-overlay Zeff images as follows: 0, not possible; 1, partially possible; and 2, fully possible. The radiologists then subjectively classified the stone composition according to the Zeff color scale (uric acid stones, color-coded red/yellow; non-uric acid stones, color-coded blue). Any stone showing both red/yellow and blue colors was considered a mixed stone. Discrepancies between readers regarding the determination of stone composition were resolved in a separate consensus review.

Table 2 Image quality scores based on the European Guidelines on Quality Criteria for Computed Tomography

Objective analysis

Quantitative analysis was performed on a PACS workstation by two separate radiologists (10 and 5 years of experience). The radiologists measured objective image noise by placing circular regions of interest (ROI, 1 cm²) over the subcutaneous fat at the level of the kidneys and over the peritoneal fat at the level of the vesicoureteral junction (VUJ). The transverse diameter of each patient at the level of the kidneys was also measured on axial images (Fig. S1). Stone size was measured as the longest linear dimension of the stone on axial images using magnified bone window settings [21]. The average values from the two radiologists were used for further analysis. Zeff values of stones with available reference standards were measured on separate dedicated post-processing software (IntelliSpace Portal, version 11.1, Philips, Best, The Netherlands) using the largest possible ROI for each stone on the axial plane while meticulously avoiding inclusion of unrepresentative tissue.

Reference standard

In patients who underwent definitive treatment with stone extraction, the composition of the removed stones was assessed by crystallographic analysis using semiquantitative polarized microscopy with infrared spectrophotometry. We used the crystallography results as the reference standard for stone characterization when available. A stone whose main component constituted ≥ 80% of its total volume was considered a pure stone [22].

Statistical analysis

Continuous variables such as patient and stone characteristics were presented as mean ± standard deviation or median with interquartile range (IQR) or range. The Wilcoxon rank-sum test was used to compare these data between patient groups. Categorical variables were presented as frequencies and percentages, and the Chi-square test or Fisher’s exact test was performed to compare these variables. Diagnostic accuracy, including sensitivity and specificity, was also calculated. We repeated these analyses in subgroups defined by weight and transverse diameter categories. Logistic regression was used to evaluate the effect of stone size, patient weight, and transverse diameter on stone characterization. Receiver operating characteristic (ROC) analysis was performed to assess the diagnostic performance of Zeff in stone characterization and to define the optimal Zeff cutoff as the value with the maximal Youden index. We used the overall proportion of agreement rather than the Kappa coefficient to determine interobserver agreement, because Kappa is influenced by the distribution of the data across categories and was therefore not an appropriate indicator in our study [23, 24]. P values less than 0.05 (two-tailed) were considered statistically significant. All analyses were performed using Stata statistical software (Stata 16, 2019; StataCorp).
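
The cutoff selection described above can be illustrated with a minimal Python sketch. It scans candidate thresholds and keeps the one that maximizes the Youden index, J = sensitivity + specificity - 1. The study's analyses were run in Stata, so this is not the authors' code: the function name `youden_cutoff`, the rule that a stone is called positive when its value is at or below the threshold, and the Zeff values are all illustrative assumptions, not study data.

```python
def youden_cutoff(values, is_positive):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    Assumption: a case is called positive when value <= cutoff, because a
    low Zeff is taken here to suggest a uric acid stone.
    """
    pos = [v for v, y in zip(values, is_positive) if y]
    neg = [v for v, y in zip(values, is_positive) if not y]
    best = None
    # Every observed value is tried as a candidate threshold.
    for c in sorted(set(values)):
        sens = sum(v <= c for v in pos) / len(pos)
        spec = sum(v > c for v in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


# Hypothetical Zeff measurements (NOT the study data).
zeff_values = [6.8, 7.0, 7.1, 7.4, 9.8, 10.1, 10.4, 10.9, 11.7]
is_uric_acid = [True, True, True, True, False, False, False, False, False]

cutoff, j, sens, spec = youden_cutoff(zeff_values, is_uric_acid)
print(f"cutoff={cutoff}, J={j:.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Statistical packages may report the cutoff differently (for example, as a value between two adjacent observations), so the exact number returned by a sketch like this can differ from software output even when the same stones are classified identically.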

Results

Patients

Initially, 294 CT examinations from 268 patients met the inclusion criteria. Of these, 117 CT examinations from 113 patients were excluded due to absence of stones (n = 103 CT examinations from 99 patients), incomplete clinical information (n = 2 CT examinations from 2 patients), or incomplete imaging (n = 12 CT examinations from 12 patients). The final study cohort comprised 177 CT examinations from 155 adults (80 men and 75 women) with an average age of 57 ± 15 years (Fig. 1, Table 3). The median body weight was 82.6 kg (IQR 70.8–95.7 kg). There were 119 examinations from 105 patients in the group weighing ≤ 90 kg (median weight 74.8 kg, IQR 67.1–83.0 kg) and 58 examinations from 50 patients in the group weighing > 90 kg (median weight 104.3 kg, IQR 96.2–116.1 kg). The average transverse diameter was 370.3 ± 52.6 mm. The CTDIvol and DLP were 7.1 ± 3.4 mGy and 364.4 ± 94.2 mGy·cm, respectively.

Fig. 1

Flowchart demonstrates the inclusion and exclusion criteria of eligible patients and the study design

Table 3 Characteristics of patients and CT examinations

Detection and characterization of urinary stones

We detected a total of 739 urinary tract stones with a median size of 3.7 mm (range 1–35 mm). The stones were located in (1) the kidneys (n = 616, 83%), (2) the proximal/mid ureter (n = 35, 5%), and (3) the distal ureter in the pelvis/urinary bladder (n = 88, 12%). In 86% (638/739) of urinary stones, the radiologists could subjectively interpret the stone composition on the color-overlay Zeff images (non-uric acid, n = 560; uric acid, n = 32; mixed stones, n = 46). The uncharacterized stones were smaller (median size 1.6 mm, IQR 1.4–2 mm) than the characterized stones (median size 4 mm, IQR 3–6 mm; p < 0.001). The radiologists could subjectively analyze the composition of all stones ≥ 3 mm (100%, 491/491).

On group comparison, there was no significant difference in stone characterizability between patients weighing > 90 kg and ≤ 90 kg (p = 0.20), with 85.7% (132/154) of stones characterized in the > 90 kg group and 86.5% (506/585) in the ≤ 90 kg group. The median size of uncharacterized stones was slightly larger in patients weighing > 90 kg (22 stones; 2.0 mm [IQR 1.6–2.1 mm]) than in patients weighing ≤ 90 kg (79 stones; 1.5 mm [IQR 1–2.0 mm]; p = 0.004). In patients weighing > 118 kg (10 CT examinations from 9 patients), we could characterize 32 of 37 stones (median size 4 mm, IQR 3–8.3 mm). The median size of uncharacterized stones in this subgroup was 2 mm (IQR 1.9–2.1 mm). Multivariable logistic regression revealed that stone size (OR 114.1, 95%CI 39.0–333.8; p < 0.001) and patients’ transverse diameter (OR 0.98, 95%CI 0.96–0.99; p = 0.003) were independent predictors of stone characterizability. However, stone characterizability showed no significant association with patients’ weight (OR 1.04, 95%CI 0.99–1.08; p = 0.09).

Seventy-four stones had available crystallographic results. Subjective dlDECT characterization was concordant with crystallography in 71 stones (96%; non-uric acid, n = 61; uric acid, n = 8; mixed stones, n = 2; Figs. 2 and 3). Only one calcium-based stone was misclassified as a uric acid stone, and two uric acid stones were misclassified as non-uric acid and mixed stones.

Fig. 2

Two case examples of stone characterization confirmed by the crystallographic result. Axial 120 kVp (a) and color-overlay effective atomic number (Zeff) images (b) in a 49-year-old man (weight = 124.7 kg) demonstrate a 16.2-mm stone at the right ureteropelvic junction, color-coded red. Both subjective and objective (Zeff = 7.05) analyses are suggestive of a uric acid stone. Axial 120 kVp (c) and color-overlay Zeff images (d) in another patient, a 54-year-old man (weight = 74.8 kg), reveal an 11-mm left renal stone, color-coded blue, consistent with a non-uric acid stone. Additionally, the measured Zeff value was 11.72, indicating a calcium-based stone. Both CT examinations show good image quality without significant difference, even though these two patients had different body sizes

Fig. 3

Another case example illustrating stone characterizability in a small-sized stone. Axial 120 kVp (a) and color-overlay effective atomic number (Zeff) images (b) in a 69-year-old woman (weight = 76.66 kg) demonstrate a 2.4-mm stone located in the right kidney, color-coded as blue, indicative of a non-uric acid stone. The small size of this stone precludes Zeff measurement. Crystallographic analysis revealed 90% calcium phosphate (apatite) and 10% calcium carbonate composition

For discriminating uric acid stones from non-uric acid-containing stones, dlDECT showed a sensitivity of 80% (8/10, 95%CI 44–98%), a specificity of 98% (63/64, 95%CI 92–100%), a positive predictive value of 89% (8/9, 95%CI 52–100%), and a negative predictive value of 97% (63/65, 95%CI 89–100%). The diagnostic performance of dlDECT was comparable between patients weighing > 90 kg (sensitivity 83% [5/6, 95%CI 36–100%]; specificity 98% [47/48, 95%CI 89–100%]) and ≤ 90 kg (sensitivity 75% [3/4, 95%CI 19–99%], p > 0.99; specificity 100% [16/16, 95%CI 79–100%], p > 0.99). We also found no evidence of a difference in sensitivity or specificity between the two groups of patients with different transverse diameters (> 360 mm: sensitivity 60% [3/5, 95%CI 15–95%]; specificity 100% [32/32, 95%CI 89–100%] and ≤ 360 mm: sensitivity 100% [5/5, 95%CI 48–100%], p = 0.44; specificity 97% [31/32, 95%CI 84–100%], p > 0.99).
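
As a worked check of the overall accuracy figures, the following Python sketch recomputes sensitivity, specificity, and predictive values from the 2 × 2 counts reported above (8 true positives, 2 false negatives, 1 false positive, 63 true negatives). The confidence interval method is not stated in the text; the sketch assumes exact binomial (Clopper-Pearson) limits, which appear consistent with the reported intervals up to rounding. The helper names `binom_cdf` and `exact_ci` are illustrative, and the study itself used Stata.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, iters=60):
    """Bisection root of a function that increases from <0 to >0 on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials (assumed method)."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Counts taken from the reported fractions 8/10, 63/64, 8/9, and 63/65.
tp, fn, fp, tn = 8, 2, 1, 63
metrics = {
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
    "PPV": (tp, tp + fp),
    "NPV": (tn, tn + fn),
}
for name, (x, n) in metrics.items():
    low, high = exact_ci(x, n)
    print(f"{name}: {x}/{n} = {x / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With only 10 uric acid stones in the reference set, the interval around sensitivity is wide, which is why the subgroup comparisons should be read as showing no evidence of a difference rather than equivalence.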

Zeff analysis

Of the 74 stones with available crystallographic results, 4 stones (1 uric acid and 3 calcium-based stones) were excluded from Zeff measurement due to small size (< 3 mm). Thus, Zeff measurement was performed in 70 stones (uric acid, n = 9; struvite, n = 2; cystine, n = 2; calcium-based stones, n = 55; mixed stones, n = 2). There was a significant difference in Zeff among the different types of stones (p < 0.001). The mean Zeff value was 7.10 ± 0.38 for uric acid stones, 9.97 ± 0.17 for struvite stones, 10.41 ± 0.44 for cystine stones, and 10.83 ± 1.03 for calcium-based stones (Fig. 4). Uric acid stones showed significantly lower Zeff values than struvite (p = 0.04), cystine (p = 0.04), and calcium-based stones (p < 0.001). ROC analysis revealed excellent performance of Zeff, with an AUC of 1.00 (95%CI 1.00–1.00) at a cutoff of 7.66, in discriminating uric acid stones from non-uric acid stones.

Fig. 4

Box-and-whisker plot demonstrates the effective atomic number (Zeff) in each type of urinary stone

Among non-uric stones, calcium-based stones showed a slightly higher mean Zeff than struvite (p = 0.19) and cystine stones (p = 0.47), without statistical significance. There was no statistically significant difference in Zeff between cystine and struvite stones (p = 0.33). The two mixed stones showed Zeff values reflecting their dominant components: one stone with 60% uric acid and 40% calcium oxalate monohydrate (Zeff = 7.74) and one stone with 70% calcium oxalate monohydrate and 30% uric acid (Zeff = 9.95). On ROC analysis, Zeff yielded an AUC of 0.78 (95%CI 0.64–0.92) for differentiating calcium-based stones from other non-uric stones (Fig. 5) and an AUC of 0.91 (95%CI 0.84–0.99) for differentiating them from all stone types (Fig. 6). At a cutoff of 10.17, Zeff showed a sensitivity of 76% and a specificity of 83% for discriminating calcium-based stones from other non-uric stones, and a sensitivity of 76% and a specificity of 93% for discriminating them from all stone types.

Fig. 5

Receiver operating characteristic (ROC) curve showing diagnostic performance of effective atomic number (Zeff) in discriminating calcium-based stone from other non-uric acid stones

Fig. 6

Receiver operating characteristic curve (ROC) showing diagnostic performance of effective atomic number (Zeff) in discriminating calcium-based stone from all stone types

Additional details about the Zeff measurements of each stone type, with reference to the calculated Zeff based on National Institute of Standards and Technology (NIST) data, are provided in Table 4 [25]. Struvite stones showed a lower Zeff range than cystine and calcium phosphate stones but had an overlapping range with calcium oxalate stones.

Table 4 Effective atomic number (Zeff) of urinary stones with confirmed composition

Image quality and noise

Both readers assigned acceptable diagnostic acceptability scores and low artifact scores on both 120 kVp (mean scores: 3.94 ± 0.26 and 1.20 ± 0.45) and Zeff images (mean scores: 3.89 ± 0.31 and 1.32 ± 0.50), with 71–95% interobserver agreement (Table 5). The subjective analysis also revealed acceptable image noise for both readers (2.01 ± 0.05). Patients weighing > 90 kg had slightly lower diagnostic acceptability scores than patients weighing ≤ 90 kg on both 120 kVp (3.90 vs. 3.96; p = 0.02) and Zeff images (3.75 vs. 3.96; p < 0.001) (Table 6). Artifact scores were higher in patients weighing > 90 kg (120 kVp images: 1.27 vs. 1.17; p = 0.04 and Zeff images: 1.59 vs. 1.20; p < 0.001). Subjective image noise scores were similar in both weight groups (> 90 kg: 2.02 vs. ≤ 90 kg: 2.00; p = 0.21). Meanwhile, objective image noise was slightly higher in patients weighing > 90 kg at the level of the kidney (≤ 90 kg, 10.9 ± 2.3 vs. > 90 kg, 11.6 ± 1.8, p = 0.03) and at the level of the VUJ (≤ 90 kg, 16.3 ± 3.1 vs. > 90 kg, 17.3 ± 3.0, p = 0.049).
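
The choice to report the overall proportion of agreement rather than Kappa can be illustrated numerically. When nearly all scores fall in one category, as with image quality scores clustered near the top of the scale, Kappa can be low even though the readers agree on most examinations. The Python sketch below uses invented ratings for 100 hypothetical examinations; these are not study data, and the function name `agreement_and_kappa` is illustrative.

```python
def agreement_and_kappa(reader1, reader2):
    """Return (overall proportion of agreement, Cohen's kappa)."""
    n = len(reader1)
    observed = sum(a == b for a, b in zip(reader1, reader2)) / n
    categories = set(reader1) | set(reader2)
    # Chance agreement expected from each reader's marginal frequencies.
    expected = sum(
        (reader1.count(c) / n) * (reader2.count(c) / n) for c in categories
    )
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return observed, kappa


# Hypothetical scores on a 4-point scale (NOT study data):
# both readers give 4 in 91 exams, disagree in 8, and both give 3 in 1.
reader1 = [4] * 91 + [4] * 4 + [3] * 4 + [3] * 1
reader2 = [4] * 91 + [3] * 4 + [4] * 4 + [3] * 1

observed, kappa = agreement_and_kappa(reader1, reader2)
print(f"overall agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

In this invented example the readers agree on 92 of 100 examinations, yet chance agreement is already about 0.90 because both readers almost always assign the same top score, so Kappa is only about 0.16.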

Table 5 Image quality scores and agreement for Dual-Energy CT datasets
Table 6 Image quality scores for the two weight categories

Discussion

In vivo determination of stone composition allows for effective triage of patients into various urological treatment strategies. Source-based DECT scanners, including dual-source and rapid kilovoltage-switching platforms, have been shown in several studies to enable assessment of stone composition, including in patients with large body habitus [11, 18, 19, 26]. Nevertheless, the diagnostic performance of dlDECT, a detector-based DECT platform that was introduced more recently, has yet to be thoroughly investigated in urinary stone disease. Our study found that dlDECT had a good ability to characterize urinary stones and could determine the composition of all stones ≥ 3 mm. In both subjective and objective analyses, dlDECT demonstrated high accuracy in distinguishing uric acid from non-uric acid stones, even in patients with large body sizes. Image quality was likewise acceptable in the entire cohort.

A dual-layer detector DECT scanner allows accurate registration of the spectral data and retrospective generation of dual-energy image series, permitting the assessment of incidental findings. One possible drawback of this scanner type is its lower spectral separation compared with a dsDECT scanner [27, 28]. However, we demonstrated high specificity (98%) and accuracy (96%) of dlDECT in differentiating uric acid stones from non-uric acid stones by subjective analysis, similar to prior studies on dsDECT and rsDECT [11, 26, 29]. High specificity is clinically crucial because a false-positive interpretation may delay intervention for non-uric acid stones. Our sensitivity (80%) is comparable to or slightly lower than previously reported values (82–96%) for source-based DECT [11, 26, 29]. This visual analysis can be performed on a standard diagnostic workstation without complex post-processing techniques, making it easy to apply in clinical practice.

In this study, we also quantitatively analyzed stone composition using Zeff measurement. Zeff is a unique value for each material, reflecting its density and atomic number [9]. Our study found that Zeff analysis can accurately discriminate uric acid stones from non-uric acid stones, concordant with previous studies on rsDECT scanners [30, 31]. However, Zeff analysis could not completely separate the individual types of non-uric acid stones in our study. Struvite stones had a lower mean Zeff and calcium-based stones a slightly higher mean Zeff than other non-uric acid stones, although these differences were not statistically significant. This lack of statistical significance is probably related to the small number of stones in the respective categories. The AUC of Zeff for discriminating calcium-based stones from other non-uric acid stones in our study was within an acceptable range (AUC = 0.78). Two prior studies evaluating rsDECT demonstrated that Zeff accurately classified non-uric stones in phantoms [9, 32]. In comparison, Rompsaithong et al. reported that rsDECT was limited in diagnosing subtypes of non-uric acid stones in patients [31]. Different study designs, study cohorts, and types of DECT scanners could be underlying reasons for the differences between these studies, including ours. We also observed that non-uric acid stones, except struvite stones, showed substantially lower Zeff values than the expected values based on NIST data. Cannella et al. also found consistently and significantly lower Zeff values of non-uric acid stones in their study than the expected values from NIST [32].

As mentioned earlier, only a limited number of studies on urinary tract stone characterization have been conducted on dlDECT scanners. Our results are consistent with prior ex vivo and in vivo dlDECT studies that accurately distinguished uric acid from non-uric acid stones [12, 14]. These results indicate the robustness and consistency of this diagnostic approach across different studies. However, challenges persist in the accurate subclassification of non-uric acid stones, particularly those composed of struvite and cystine, as observed in previous studies [13, 14]. This common challenge underscores the need for further research and the development of more refined techniques to improve the subcategorization of these non-uric acid stones.

Regarding image quality, large patients had slightly lower scores for diagnostic acceptability and higher scores for artifacts than small patients. However, the scores for large patients were still within an acceptable range, and there was no clinically relevant difference between the two groups in terms of stone characterizability and diagnostic accuracy. These findings are in line with prior studies on similar topics using dsDECT and rsDECT [18, 19, 22]. Duan et al. also evaluated the image quality of abdominal dlDECT in phantoms and patients of various sizes and found no significant drop in image quality in large phantoms/patients [33], which may relate to the anti-correlated noise capabilities of dlDECT. Additionally, objective image noise was somewhat higher in large patients, but there was no significant difference in visual perception in our study.

Failure to characterize stones in our study can be attributed to stone size and patients’ transverse diameter, both of which were independent predictors of stone characterizability in multivariable logistic regression analysis. Interestingly, our analysis did not find a significant association between stone characterizability and patients’ weight, and there was no significant difference in stone characterizability between patients weighing > 90 kg and those weighing ≤ 90 kg. This apparent discrepancy between transverse diameter and weight in predicting stone characterizability warrants clarification. One plausible explanation is their differing impact on image quality. Notably, Kalra et al. reported that the maximum transverse diameter of the abdomen has the strongest association with image quality when compared with other parameters, such as patient weight and abdominal wall thickness [34]. Furthermore, Haaga emphasized that patient diameter is a more reliable predictor of the tube current requirement than body weight, given that diameter correlates better with the distance traveled by the X-ray beam [35].

There are several limitations to our study. First, this was a retrospective single-center study. Second, our study had a relatively small number of uric acid stones and of some subtypes of non-uric acid stones with available crystallographic results, particularly struvite and cystine stones, which reflects the known general prevalence of the corresponding stone types [36]. This limited our evaluation of the diagnostic ability to subcategorize non-uric acid stones, which should be explored further in a larger study. Next, we evaluated stones using the 5-mm axial images from our routine practice, which may introduce a partial volume effect in small stones and make mixed stones difficult to recognize. Additionally, a few patients in our study underwent more than one CT examination, resulting in the possibility of some duplicated stones. Nevertheless, these stones can vary in size, appearance, and location over time. Consequently, we treated each CT scan as an independent dataset because of the distinct imaging characteristics associated with each scan. We also did not directly compare the diagnostic performance of dlDECT with that of dsDECT and rsDECT since this was beyond our primary objective; nonetheless, our findings were comparable to those of previous studies employing dsDECT and rsDECT. Future research might directly demonstrate the non-inferiority of this technique. Lastly, the stone composition analysis could only be performed in patients with extracted stones.

In conclusion, dlDECT accurately differentiated uric acid from non-uric acid stones in all clinically significant stones larger than or equal to 3 mm with acceptable image quality among patients with different body sizes. The in vivo diagnosis of uric acid stone could lead to proper triage and early initiation of medical therapy.