Introduction

Lumbar disc herniation is a common cause of low back pain. Magnetic resonance imaging (MRI) is typically used to diagnose lumbar disc herniation owing to its excellent soft tissue contrast [1]. However, MRI has limitations, including cost and contraindications such as pacemakers or cochlear implants [2]. Furthermore, MRI is difficult to perform in individuals who cannot remain motionless for extended periods. Computed tomography (CT) is cost-effective and has a short scan time. Nevertheless, its diagnostic performance for disc herniation is limited by the low contrast resolution between discs and the cerebrospinal fluid (CSF) [3]. In a recent meta-analysis, CT had a sensitivity of 82% and a specificity of 78% for diagnosing lumbar disc herniation [4]. Another study of standard CT (SC) assessment of lumbar disc herniation reported a moderate diagnostic accuracy of 75.8% in 2020 [5]. Meanwhile, dual-energy CT (DECT) with color-coded virtual non-calcium (VNCa) images has been reported to achieve higher diagnostic accuracies, 92% and 95% for the detection of lumbar and cervical disc herniation, respectively, compared with 82% and 77% for SC [6, 7].

Electron density (ED) reflects the distribution of electrons in atoms or molecules, representing the probability of an electron being present in a specific region around the atomic nucleus [8]. DECT-derived ED images, generated by separating Compton scattering from photoelectric attenuation using measurements at two different energies, have recently been introduced [9]. At kilovoltage CT energies, ED correlates linearly with mass density and with Hounsfield units (HU) in materials with HU > 0. ED images therefore have substantial potential for diagnostic use based on the mass density of materials [10, 11]. ED images have been used primarily for radiotherapy dose calculation, but several studies on the diagnostic use of ED maps in radiology have emerged [8, 12,13,14,15,16]. We recently reported better diagnostic performance of ED images than of SC and non-color-coded VNC DECT images for the detection of cervical disc herniation [16]. However, quantitative analysis was limited because ED images are noiseless by definition.

In this study, we aimed to compare the diagnostic accuracy of ED maps with that of SC and VNCa images for the detection of lumbar disc herniation, using the area under the receiver operating characteristic curve (AUC) as the accuracy metric. In addition, we proposed quantitative measures of image quality for the three reconstruction methods: the coefficient of variation (CV) and the normalized contrast ratio (NCR) of the intervertebral disc to CSF. To our knowledge, this study is the first to quantify the image quality of ED images.

Materials and methods

Ethics

Our institutional review board approved this retrospective study and waived the requirement for informed consent.

Patients

Among 126 consecutive patients who underwent both non-contrast DECT and MRI of the lumbar spine within a 2-week interval between March 2021 and February 2022 at our tertiary hospital, we excluded those who underwent postoperative studies (n = 23) or had spinal malignancy (n = 11), fracture (n = 18), or spondylodiscitis (n = 15). Overall, 59 patients were evaluated: 48 had chronic low back pain and/or pain radiating to the lower legs for > 3 months that had not responded to medication or spinal injections, and 11 had new-onset low back pain or lower leg pain of < 3 months' duration that had not responded to conservative treatment. Accompanying symptoms were tingling or numbness in the lower extremities in 14 patients and motor weakness in 5 patients. Six intervertebral disc levels per patient, from T12/L1 to L5/S1, were assessed, and no individual disc level was excluded. Therefore, 354 lumbar discs from 59 patients were evaluated (Fig. 1). Of the 59 patients, 52 underwent DECT for preoperative evaluation at the same time as or after MRI, whereas the remaining 7 underwent MRI after DECT for a more detailed evaluation of spinal stenosis or disc herniation. The mean interval between non-contrast DECT and MRI was 3.85 days (range, 0–14 days).

Fig. 1

Flow chart for selection of the study population

DECT protocol

All patients were scanned using a 128-slice MDCT single-source dual-layer detector spectral scanner (IQon, Philips Healthcare). All images were acquired in the craniocaudal direction from the middle of the T10 vertebral body to the level of the symphysis pubis. The CT scan parameters are specified in Appendix 1. The mean volume CT dose index was 12.7 ± 2.9 mGy (mean ± standard deviation; range, 8.9–22.0 mGy), and the mean dose-length product was 557.4 ± 144.1 mGy·cm (range, 376.4–1021.0 mGy·cm).

CT reconstruction of standard, ED, and VNCa images

SC images were reconstructed using a standard kernel. ED and VNCa images were generated from the spectral data using a commercially available visualization and analysis platform (IntelliSpace Portal 10.1; Philips Healthcare). To optimize the detection of disc herniation, we used dedicated window levels (WLs) and widths (WWs) (ED, 105 and 25; standard and VNCa, 50 and 250 HU) [11, 17]. The WL and WW for ED were calculated from the spine soft tissue CT window setting using a proportionality equation [11, 17]. Readers were allowed to adjust the WL and WW freely during image assessment. The calcium suppression index of the VNCa images was fixed at 65 [18]. All images were reconstructed with a slice thickness of 3 mm and a 3-mm increment in the axial and sagittal planes.
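As a rough check of the ED window setting, the stated values can be reproduced by assuming that, for soft tissue, the percentage of electron density relative to water (%EDW) is approximately 100 + HU/10 (that is, relative electron density ≈ 1 + HU/1000). This linear rule is our assumption for illustration and is not necessarily the exact proportionality equation of [11, 17]; the function names are hypothetical.

```python
from typing import Tuple


def hu_to_edw_percent(hu: float) -> float:
    """Approximate %EDW for a soft-tissue HU value.

    Assumption (illustrative only): relative electron density ~ 1 + HU/1000,
    so %EDW ~ 100 + HU/10. Reasonable only for soft tissue with HU > 0.
    """
    return 100.0 + hu / 10.0


def hu_window_to_edw(level_hu: float, width_hu: float) -> Tuple[float, float]:
    """Convert a (level, width) window in HU to %EDW under the same linear rule."""
    level_edw = hu_to_edw_percent(level_hu)
    # A window width is a difference of two values, so the offset of 100
    # cancels and only the slope (1/10) applies.
    width_edw = width_hu / 10.0
    return level_edw, width_edw


# Spine soft tissue window used for SC and VNCa images: WL 50 HU, WW 250 HU
print(hu_window_to_edw(50, 250))  # (105.0, 25.0)
```

Under this approximation, the ED window of 105 and 25 corresponds to the soft tissue window of 50 and 250 HU, which is why the two settings give comparable gray-scale framing of disc and CSF.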

MRI acquisition

All patients underwent non-contrast MRI on a 3.0-T scanner (Magnetom Vida, Siemens Healthineers) with the standard spine coil. The MR protocol and parameters are presented in Appendix 2.

Image analysis

Images were evaluated on a PACS viewer (G3, Infinitt). To establish the reference standard, two board-certified radiologists (with 21 and 13 years of experience in musculoskeletal imaging, respectively), blinded to all clinical and DECT information, assessed the MR images in consensus to determine the presence of disc herniation. The assessment was based on the lumbar disc pathologic classification of the North American Spine Society (version 2.0). The radiologists also recorded the presence and degree of nerve root impingement according to the Pfirrmann nerve root compression grading system (grade 1 vs grade 2–4), whether it was attributable to bulging or disc herniation, and the zone of disc herniation (Fig. 2) [19, 20]. Detailed definitions of disc herniation, nerve root compression, and disc herniation zones are provided in Appendix 3. The zonal discrepancy of a disc herniation was defined as the absolute difference between the reference zone value and the reader's zone value.
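The zonal discrepancy defined above is an absolute difference on the signed zone scale of Fig. 2. A minimal sketch, assuming each herniated disc is represented as a pair of integer zone codes (reference, reader); the function names and example pairs are hypothetical, and the handling of discs a reader did not call herniated is not modeled here.

```python
from statistics import mean

# Zone codes from Fig. 2 (negative = right, positive = left):
# +/-3 foraminal, +/-2 subarticular, +/-1 paracentral, 0 central.
VALID_ZONES = range(-3, 4)


def zonal_discrepancy(reference_zone: int, reader_zone: int) -> int:
    """Absolute difference between the reference zone and the reader's zone."""
    if reference_zone not in VALID_ZONES or reader_zone not in VALID_ZONES:
        raise ValueError("zone codes must be integers from -3 to 3")
    return abs(reference_zone - reader_zone)


def mean_zonal_discrepancy(pairs):
    """Mean discrepancy over (reference_zone, reader_zone) pairs."""
    return mean(zonal_discrepancy(ref, rd) for ref, rd in pairs)


# Hypothetical example: three herniated discs read by one reader
pairs = [(2, 2), (-1, 0), (3, 1)]
print(mean_zonal_discrepancy(pairs))  # (0 + 1 + 2) / 3 = 1
```

Because the scale runs continuously from right (−3) to left (3), calling a right paracentral herniation (−1) left paracentral (1) counts as a discrepancy of 2.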

Fig. 2

Schematic of the zonal distribution of the herniated disc (a) and the Pfirrmann grade of nerve root impingement (b). a The zones of disc herniation are denoted as follows: foraminal zones, −3 (right) and 3 (left); subarticular zones, −2 (right) and 2 (left); paracentral zones, −1 (right) and 1 (left); and central zone, 0. The herniated disc in a is in zone 2. b For the Pfirrmann grade of nerve root compression, grade 0 indicates no compromise of the nerve root by the herniated disc; grade 1, contact between the disc and the nerve root; grade 2, displacement of the nerve root by the herniated disc; and grade 3, compression of the nerve root between the disc material and the wall of the spinal canal

After the reference standard was established, four radiologists (three board-certified radiologists with 27, 10, and 7 years of experience in musculoskeletal radiology, respectively, and one radiology resident with 3 years of experience in general imaging) independently analyzed all three sets of axial and sagittal CT images without access to MRI results or clinical histories. To minimize recall bias, each reader evaluated the image sets in a different order, with an 8-week interval between image sets (Appendix 4). For each disc level in every CT series, readers also rated diagnostic confidence and the conspicuity of the intervertebral disc, focusing on discrimination between the posterior disc margin and the CSF space, on a 5-point scale (diagnostic confidence: 1 = cannot diagnose, 2 = poorly confident, 3 = diagnosis is possible, 4 = probably confident, and 5 = absolutely confident; conspicuity: 1 = completely indistinguishable, 2 = poor conspicuity, 3 = blurred margin with fair conspicuity, 4 = good conspicuity, and 5 = clear demarcation and excellent conspicuity).

For the quantitative analysis, we did not use the signal-to-noise or contrast-to-noise ratio because the ED equation contains no noise term [21,22,23,24,25]. Instead of the conventional standard deviation, we measured the CV, the ratio of the standard deviation to the mean, to express variation within regions of interest (ROIs) on image sets with different scales [22,23,24,25]. Next, we coined the term “NCR” to express the contrast between the estimated densities of the normal or diseased disc and the CSF space, using the Weber contrast equation [26, 27]:

$$\text{NCR of the normal disc (NCR-normal)} = \frac{\text{Mean of the ROI}_{\text{normal disc}} - \text{Mean of the ROI}_{\text{CSF}}\;(\Delta)}{\text{Mean of the ROI}_{\text{CSF}}}$$
$$\text{NCR of the diseased disc (NCR-diseased)} = \frac{\text{Mean of the ROI}_{\text{diseased disc}} - \text{Mean of the ROI}_{\text{CSF}}\;(\Delta)}{\text{Mean of the ROI}_{\text{CSF}}}$$
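The CV and NCR definitions above can be written in a few lines of NumPy. This is a minimal sketch, assuming the pixel values of each ROI are already available as arrays; the function names and ROI values are hypothetical, and the use of the sample standard deviation (ddof=1) is our assumption, since the SD convention is not stated here. The same NCR function serves for NCR-normal and NCR-diseased.

```python
import numpy as np


def coefficient_of_variation(roi_values) -> float:
    """CV = standard deviation / mean of the pixel values in one ROI."""
    values = np.asarray(roi_values, dtype=float)
    mean = values.mean()
    if mean == 0:
        raise ValueError("CV is undefined when the ROI mean is zero")
    # Sample SD (ddof=1) is an assumption; use ddof=0 for the population SD.
    return float(values.std(ddof=1) / mean)


def normalized_contrast_ratio(disc_roi, csf_roi) -> float:
    """Weber contrast of a disc ROI against CSF: (mean_disc - mean_CSF) / mean_CSF."""
    mean_disc = float(np.mean(disc_roi))
    mean_csf = float(np.mean(csf_roi))
    if mean_csf == 0:
        raise ValueError("NCR is undefined when the CSF mean is zero")
    return (mean_disc - mean_csf) / mean_csf


# Hypothetical ROI values (arbitrary units)
disc = [60.0, 62.0, 58.0]
csf = [40.0, 41.0, 39.0]
print(coefficient_of_variation(disc))         # 2 / 60, about 0.0333
print(normalized_contrast_ratio(disc, csf))   # (60 - 40) / 40 = 0.5
```

Because the CSF mean sits in the denominator of both CV-free contrast and Weber contrast, a CSF mean near zero or below zero makes NCR unstable, which is one reason to compute it on positively scaled images.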

To calculate the CV and NCR, one radiologist segmented three circular ROIs per patient (brush size, 10) in ITK-SNAP version 3.8.0 [28], placed in one normal disc, one diseased disc, and the CSF space (Appendix 5). To avoid invalid direct comparisons between values on different scales, both the CV and NCR were calculated from ROI measurements on min–max normalized images [29, 30]. Normalization of all image sets and data extraction from the segmented ROIs were performed in Python 3.8. The equations for %EDW, CV, and min–max normalization are provided in Appendix 6.
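A minimal sketch of the normalization and ROI extraction steps follows, assuming each image set has been loaded as a NumPy array and each ITK-SNAP segmentation as a boolean mask of the same shape (file loading is not shown). Whether the minimum and maximum were taken per volume, per slice, or from fixed bounds is not stated here; this sketch assumes the bounds of each array, and the function names and toy values are hypothetical.

```python
import numpy as np


def min_max_normalize(image):
    """Linearly rescale an image to [0, 1] using its own minimum and maximum."""
    image = np.asarray(image, dtype=float)
    low, high = image.min(), image.max()
    if high == low:
        raise ValueError("a constant image cannot be min-max normalized")
    return (image - low) / (high - low)


def roi_values(image, mask):
    """Return the pixel values of `image` inside a boolean ROI `mask`."""
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    return image[mask]


# Toy VNCa-like slice in HU; the CSF pixels (left of the top row) are negative
vnca_hu = np.array([[-10.0, -5.0, 40.0],
                    [45.0, 50.0, 90.0]])
csf_mask = np.array([[True, True, False],
                     [False, False, False]])

normalized = min_max_normalize(vnca_hu)
csf = roi_values(normalized, csf_mask)
print(csf)  # both CSF values are now >= 0
```

The rescaling is linear, so the ordering of densities is preserved, while negative CSF values on VNCa images are moved into the non-negative range before CV and NCR are computed.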

Statistical analysis

Data analysis included evaluation of individual reader performance and of the integrated results of all four readers. The AUC, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) were calculated and compared among the reconstruction methods. The overall p value for the AUC was obtained from a random-reader, random-case analysis based on an ANOVA model designed for the characteristics of multireader ROC data [31, 32] and on a mixed-effects ANOVA of jackknife pseudo-values, testing the null hypothesis that the mean accuracy of the readers is the same for all the diagnostic tests studied. These models were implemented in the MRMCaov package [33] in R version 4.3.2 (R Foundation for Statistical Computing). Individual reader p values were obtained from a fixed-reader, random-case analysis. A subgroup analysis was performed for disc extrusion, for which diagnostic accuracy was expected to be higher than for disc protrusion. Linear mixed models were used to analyze the zonal discrepancies, image conspicuity, and diagnostic confidence of the three CT series. Quantitative variables (CV, NCR-normal, and NCR-diseased) were compared per patient using the Wilcoxon signed-rank test. Inter-reader agreement among the four readers was measured using Fleiss' kappa [34], and kappa values were interpreted qualitatively according to Landis and Koch [34]. All statistical analyses except the multireader multicase (MRMC) analysis were performed using SPSS version 26 (IBM Corporation) and MedCalc version 16.4.1 (MedCalc Software). Values of p < 0.05 were considered significant.
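Fleiss' kappa and the Landis–Koch bands can be sketched as follows for per-disc binary calls by the four readers. This is a minimal illustration, not the SPSS or MedCalc implementation used in the study; it assumes every disc is rated by all readers, and the function names and example ratings are hypothetical. Because the Landis–Koch bands are stated to two decimals (0.61–0.80 substantial, 0.81–1.00 almost perfect), the sketch rounds kappa to two decimals before banding, which is our assumption.

```python
import numpy as np


def fleiss_kappa(ratings, categories=(0, 1)) -> float:
    """Fleiss' kappa for a (subjects x raters) array of categorical labels."""
    ratings = np.asarray(ratings)
    n_subjects, n_raters = ratings.shape
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    # counts[i, j]: number of raters who assigned category j to subject i
    counts = np.stack([(ratings == c).sum(axis=1) for c in categories], axis=1)
    if not np.all(counts.sum(axis=1) == n_raters):
        raise ValueError("every label must be one of `categories`")
    # Observed agreement for each subject, then averaged over subjects
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the pooled category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = (p_j ** 2).sum()
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return float((p_bar - p_e) / (1 - p_e))


def landis_koch(kappa: float) -> str:
    """Qualitative Landis-Koch band; kappa is rounded to two decimals first."""
    k = round(kappa, 2)
    if k < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical calls (1 = herniation, 0 = none): 4 discs x 4 readers
ratings = [[1, 1, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 0],
           [0, 0, 0, 0]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3), landis_koch(kappa))  # 0.746 substantial
```

Under this rounding, the reported κ of 0.801 falls in the substantial band, consistent with how the agreement results are described.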

Results

Overall, 354 intervertebral discs in 59 patients (29 men and 30 women; mean age, 59.7 ± 13.8 [SD] years; range, 29–84 years; mean body mass index, 26.2 ± 4.25 [SD] kg/m²) were included. On MRI, disc herniation was observed in 129 (36.4%) discs: protrusion in 32 and extrusion in 97 (including 9 sequestrations). Disc herniation was not observed in 225 (63.6%) discs (99 bulging and 126 normal discs). Spinal nerve root impingement was observed in 102 of the 354 intervertebral discs (28.8%). Disc herniations were located in the right foraminal zone (denoted as −3) in 2 discs, right subarticular zone (−2) in 18, right paracentral zone (−1) in 20, central zone (0) in 37, left paracentral zone (1) in 24, left subarticular zone (2) in 12, and left foraminal zone (3) in 16.

Diagnostic performance for lumbar disc herniation

Tables 1 and 2 show the diagnostic performance of the four readers for lumbar disc herniation in the integrated and individual assessments, respectively. In the integrated assessment, the AUC for both disc herniation and disc extrusion was significantly higher on ED images than on SC and VNCa images. For disc herniation, the AUC was 0.807 (95% confidence interval, 0.745–0.870) for SC, 0.833 (0.786–0.880) for VNCa, and 0.871 (0.821–0.921) for ED (p = 0.002; Table 1; Figs. 3 and 4); for disc extrusion, it was 0.766 (0.713–0.819) for SC, 0.770 (0.717–0.822) for VNCa, and 0.847 (0.800–0.900) for ED (p < 0.001). ED images showed significantly higher sensitivity for disc herniation than SC images: 70% (57–82%) for SC, 74% (68–81%) for VNCa, and 81% (70–93%) for ED (p = 0.018). Specificity was high for all imaging methods: 92% (88–96%) for SC, 93% (89–96%) for VNCa, and 93% (89–97%) for ED for disc herniation (p = 0.842), and 93% (89–97%) for SC, 93% (90–95%) for VNCa, and 94% (92–97%) for ED for disc extrusion (p = 0.625).

Table 1 Comparison of the diagnostic performance for lumbar disc herniation and extrusion as well as nerve root impingement on SC, VNCa, and ED images per intervertebral disc based on integrated data of four readers
Table 2 Comparison of the diagnostic performances of all readers for the detection of lumbar disc herniation per intervertebral disc based on each reader
Fig. 3

A 54-year-old male with back pain. a, b T2-weighted sagittal image (a) and fat-suppressed T2-weighted axial image (b) show a right central disc extrusion at the L3/4 level. c–h ED images (c, d) depict the disc herniation with a clearer posterior margin (circles) than VNCa images (e, f) and standard images (g, h)

Fig. 4

A 70-year-old male with radiating pain in the left leg that had worsened 1 month earlier. a, b T2-weighted sagittal images (a, b) show a herniated disc at L5/S1. c–h The herniated disc is slightly better seen on ED images (c, d) and VNCa images (e, f), but it is also detectable on standard images (g, h)

In the individual assessments, the AUC of ED images was significantly higher than that of SC images for readers 1 and 3 (p < 0.001) and than that of both SC and VNCa images for reader 2 (p < 0.001). However, it did not differ significantly from that of SC or VNCa images for reader 4, the most experienced reader (p = 0.361 for SC vs ED and p = 0.157 for VNCa vs ED) (Table 2). The sensitivity of ED images was significantly higher than that of SC images for reader 1 and than that of both SC and VNCa images for readers 2 and 3 (p < 0.05) (Table 2).

Diagnostic performance for nerve root impingement

For nerve root impingement, the AUC was higher on ED images than on SC and VNCa images, but the difference was not significant: 0.841 (0.808–0.874) for SC, 0.840 (0.810–0.869) for VNCa, and 0.872 (0.834–0.908) for ED (p = 0.106) (Table 1). Sensitivity for nerve root impingement was 82% (76–89%) for SC, 79% (74–85%) for VNCa, and 87% (82–92%) for ED; the overall difference was not significant (p = 0.108), but ED images were significantly more sensitive than VNCa images in the pairwise comparison (p = 0.040) (Table 1). Specificity did not differ: 86% (82–90%) for SC, 88% (84–93%) for VNCa, and 88% (83–92%) for ED (p = 0.344).

In the individual assessments, the AUC of ED images for nerve root impingement was higher than that of SC and VNCa images for all four readers, but the difference was significant only for reader 2 (p < 0.001) (Table 2). For reader 2, the sensitivity of ED images for nerve root impingement was also higher than that of SC and VNCa images (p < 0.001) (Table 2).

Inter-reader agreement for lumbar disc herniation and nerve root impingement

Inter-reader agreement among the four readers for lumbar disc herniation was substantial for all image sets and slightly higher for ED images (κ = 0.717) than for SC (κ = 0.680) and VNCa (κ = 0.670) images. Inter-reader agreement for nerve root impingement was also substantial for all image sets and higher for ED images (κ = 0.801) than for SC (κ = 0.714) and VNCa (κ = 0.723) images.

Zonal discrepancy in disc herniation

The mean zonal discrepancy was < 0.5 for all image types (Supplemental Table 1). For all readers except reader 1, the largest zonal discrepancy was recorded on SC images. The smallest zonal discrepancy was found on ED images for readers 1 and 4 and on VNCa images for readers 2 and 3; none of these differences was significant (all p > 0.05).

Image conspicuity assessment and diagnostic confidence

Image conspicuity, focusing on discrimination between the intervertebral disc and the CSF space, was rated highest on ED images, with scores in the order ED > VNCa > SC (all p < 0.001; Fig. 5 and Table 3). Diagnostic confidence for lumbar disc herniation was also significantly higher in the order ED > VNCa > SC (p < 0.05; Fig. 5) for all readers except reader 4, who rated diagnostic confidence lower on VNCa images than on SC images (Table 3).

Fig. 5

Box-and-whisker plots of subjective scores averaged across the four readers for visual conspicuity (a) and diagnostic confidence (b). Both visual conspicuity and diagnostic confidence are significantly higher on ED images than on standard and VNCa images (yellow lines, mean values; orange lines, median values). The diagnostic confidence of SC images is higher than expected given their visual conspicuity

Table 3 Comparison of image conspicuity and diagnostic confidence for the detection of lumbar disc herniation per intervertebral disc based on each reader

Quantitative measurement: CV and the NCR of the normal disc or the diseased disc (NCR-normal and NCR-diseased)

The CV values of all ROIs were much lower on ED images than on SC and VNCa images (p < 0.001) (Table 4). All CV values at the three ROI locations were also lower on VNCa images than on SC images (p < 0.001). The normalized densities and the density differences between discs and CSF were significantly higher in the order VNCa > ED > SC. However, NCR-normal (Δ(normal disc − CSF)/CSF) was highest on ED images, with a significant difference from SC images (p = 0.001) but not from VNCa images (p = 0.287). NCR-diseased (Δ(diseased disc − CSF)/CSF) on ED images was significantly higher than on both SC and VNCa images (Table 4).

Table 4 Normalized density, the NCR, and CV of normal disc, diseased disc, and CSF with regard to quantitative variables

Discussion

The present study revealed that, compared with SC and VNCa images, ED images had a higher AUC and sensitivity for the detection of lumbar disc herniation in the integrated data. Further, the sensitivity of ED images was significantly higher than that of SC and VNCa images for readers 2 and 3 and than that of SC images for reader 1. However, no significant difference was found for reader 4, the most experienced reader, even though this reader assigned much higher visual conspicuity and diagnostic confidence scores to ED images (p < 0.001). To our knowledge, this study is the first to quantify the image quality of ED images.

Given our previous finding of greater improvement in the cervical spine for all readers regardless of experience [16], the improvement in the diagnostic performance of ED images for lumbar disc herniation in the current study was smaller than expected. In the cervical spine, ED images were especially effective at the C7/T1 level, which is difficult to assess because of severe photon starvation artifacts caused by the shoulder girdle. In contrast, the lumbar spine lacks anatomical structures that generate such artifacts, so severe image degradation is uncommon. Consequently, SC images showed relatively good diagnostic accuracy, which did not differ significantly from that of ED images for the most experienced reader. These findings suggest that ED images may be more effective in locations with higher noise levels.

In our study, ED images exhibited significantly lower CV values than SC and VNCa images, as well as higher NCR values (NCR-diseased significantly higher than both, and NCR-normal significantly higher than SC images). We adopted CV and NCR to substantiate objectively the good visual conspicuity of ED images. CV provides a standardized, unit-independent measure of the dispersion of values within a region relative to their mean [23, 25]. Because CV is dimensionless, it allowed us to assess density variation within the ROIs of the normal disc, the diseased disc, and the CSF [23, 25]. Weber contrast, termed NCR in this study, is a widely used measure of contrast for bright features against a dark background [26, 27]. These variables may facilitate the comparison of image quality in noiseless images.

Raw density measurements on ED images show that the differences between structures are much subtler than they appear visually. Most soft tissue structures on ED images lie within a narrow range from 100 (water) to 106 (liver) %EDW [11]. Similarly, the HU values of most soft tissue organs are concentrated between 40 and 70 HU, a narrow band compared with the values of water (0 HU) and air (−1000 HU). Therefore, normalization is essential for comparing values on different scales [29, 30]. In addition, normalization rescaled some negative HU measurements of the CSF on VNCa images to positive values. This adjustment kept all measurements in a positive range, allowing stable and consistent comparisons of CV and NCR [29, 30].

Because ED images were originally developed for radiotherapy dose calculation, their use in diagnostic radiology has only recently been documented [8, 9, 12,13,14,15,16, 35]. Several case reports have demonstrated the favorable visual conspicuity on ED images of hematomas and lung lesions, which differ markedly in mass density from the surrounding tissue [9, 14, 15]. We presumed that intervertebral discs could be visualized more effectively on ED images owing to their higher mass density (range, 1.07–1.101 g cm⁻³) compared with other soft tissue organs and CSF (range, 1.00–1.07 g cm⁻³), in addition to the lower CV of ED images [16, 36].

Various DECT reconstruction methods with the potential to improve the diagnosis of disc herniation, such as VNCa, ED, and collagen-sensitive maps, have been introduced [7, 37,38,39]. DECT collagen-sensitive maps visualize collagenous structures, which combine high attenuation from densely packed fibers with a relatively low effective atomic number [36,37,38]. VNCa and collagen-sensitive maps are bone-removed images that maximize the contrast between the dense disc and the CSF space; however, they involve no noise reduction and therefore retain noisy speckles [7, 37,38,39]. In contrast, ED images are natural bone-present images with noise reduction, depicting the disc as homogeneously high in density within the natural bony structures. In our experience, this facilitates image interpretation and improves the detection of air-containing lesions compared with bone-removed VNCa images.

Our study has some limitations, including a small sample size and the lack of surgical confirmation of disc herniation. Evaluating the discs of each patient three times may have introduced recall bias, even with an 8-week interval between assessments. The distinct visual differences between reconstruction methods made full randomization challenging, but randomizing across image sets and patients would have mitigated potential bias and produced more reliable results [40]. Without full randomization, confounding variables may have influenced the study outcomes. Furthermore, although we varied the order in which each reader evaluated the image sets, the effect of this on the results remains unclear. Full randomization would be the most rigorous way to resolve these ambiguities, and these limitations should be addressed in future research.

In conclusion, in the integrated multireader analysis, ED images showed significantly higher diagnostic performance for lumbar disc herniation than SC and VNCa images. In addition, all readers reported higher diagnostic confidence with ED images, possibly owing to their lower CV and higher NCR values. ED images may therefore be a valuable tool for improving the identification of lumbar disc herniation.