Introduction

Dual-energy computed tomography (DECT) allows the detection of infiltration of bone marrow by a malignant tumour or of posttraumatic bone marrow lesions (BML), often also referred to as bone marrow oedema [1,2,3,4,5,6]. Several studies suggest that, especially in patients with vertebral compression fractures, DECT provides valuable additional diagnostic information compared with standard clinical CT for the differentiation between acute and chronic fractures [7,8,9,10,11].

Two algorithms have been proposed for virtual calcium subtraction in a CT image: (i) a two-material decomposition algorithm (2MD), which classifies each voxel as calcium-containing or non-calcium-containing and, thus, enables the subtraction of calcium-containing voxels (also known as the black bone technique); (ii) a three-material decomposition algorithm (3MD), which allows the virtual reduction of the attenuation of each voxel in Hounsfield units (HU) according to the amount of calcium present, thus taking into account different concentrations and partial volume effects (also known as the virtual non-calcium technique) [12, 13]. The three materials used in this model are usually fat, water and calcium. While the 3MD algorithm is useful for the detection of BML [2,3,4,5, 7, 9, 10, 14, 15], results with the 2MD algorithm have only been reported for the diagnosis of anterior cruciate ligament injury [16]. Additionally, there are two approaches for further post-processing of the DECT data sets: an image data-based (ID) and a raw data-based (RD) approach. RD takes advantage of the four material maps that are reconstructed along with the generic high- and low-kVp images. Recent studies suggested that the RD approach may help in dose reduction while maintaining image quality [17].

One of the major drawbacks of 3MD reconstructions is the inferior image quality [7, 18]. Compared to filtered back-projection (FBP), iterative reconstruction (IR) of CT data sets can be used to reduce the radiation exposure, to improve image quality or to achieve a combination of both [19,20,21,22]. However, these effects have not yet been reported for 3MD and 2MD reconstructions on DECT.

The aim of our study was to compare the diagnostic performance of different reconstruction algorithms of DECT data sets for the detection of BML using MRI as the standard of reference, on a single-source system. We expected RD and IR to result in superior image quality and an advantage in the detection of BML compared to FBP. To this end, we systematically analysed and compared the application of different levels of iteration, 2MD and 3MD algorithms, as well as a reconstruction tool based on CT raw data sets.

Material and methods

Study population

We retrospectively identified patients who underwent radiography, DECT and magnetic resonance imaging (MRI) of the thoracic and/or lumbar spine between January 2015 and February 2017. Further inclusion criteria were (i) age over 50 years, (ii) vertebral compression fracture with vertebral body collapse (mild, moderate or severe) detected using the standard CT bone kernel, (iii) availability of the CT raw data set and (iv) an MRI protocol including sagittal T1-weighted and short-tau inversion recovery (STIR) sequences. Patients who did not agree to the use of their data, had a pathological fracture (according to the medical record) or underwent MRI more than 2 days before or after DECT were excluded from the current analysis. We used the STROBE criteria for reporting observational studies [22, 23].

Imaging

The study patients underwent both a DECT and an MRI examination of the thoracic or lumbar spine. DECT was performed on a 320-row single-source imager (Canon AQUILION One Vision; Canon Medical Systems; Otawara, Japan) with tube voltages of 135 and 80 kV and automatic exposure control (standard deviation value of 12 HU) optimised for IR2. The two dual-energy scans were acquired using sequential volume acquisition (16 cm z-axis coverage) without table movement. The so-called wide volume mode was enabled if a scan length of more than 16 cm was needed. We determined the dose-length product (DLP) and the CT dose index (CTDIvol) and calculated the effective dose value using a conversion coefficient of 0.01 mSv·mGy−1·cm−1. Thereafter, the patients underwent an MRI examination at 1.5 T (MAGNETOM Avanto; Siemens; Erlangen, Germany or MAGNETOM Symphony Vision; Siemens; Erlangen, Germany) with a protocol including T1-weighted (repetition time (TR), 551; echo time (TE), 12; echo train length (ETL), 3; slice thickness, 3 mm) and STIR (TR, 6150; TE, 31; ETL, 8; inversion time, 150; slice thickness, 3 mm) sequences in sagittal orientation.
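
The dose estimate described above is a single multiplication of the DLP by the conversion coefficient. The following minimal Python sketch illustrates it; the function name and the example DLP of 500 mGy·cm are hypothetical and chosen only for illustration, while the coefficient of 0.01 is the one stated in the text.

```python
# Conversion coefficient stated in the text: 0.01 mSv per (mGy*cm).
K_MSV_PER_MGY_CM = 0.01


def effective_dose_msv(dlp_mgy_cm: float, k: float = K_MSV_PER_MGY_CM) -> float:
    """Estimate the effective dose in mSv as DLP (mGy*cm) multiplied by k."""
    if dlp_mgy_cm < 0:
        raise ValueError("DLP must be non-negative")
    return dlp_mgy_cm * k


# Hypothetical DLP, not a measurement from this study.
print(round(effective_dose_msv(500.0), 2))
```

The estimate scales linearly with the DLP, so any uncertainty in the conversion coefficient carries over directly to the effective dose.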

Reconstruction

The raw data sets were anonymised and reconstructed using FBP and adaptive iterative dose reduction 3D (AIDR 3D) with three different iteration levels (IR1: mild, IR2: standard and IR3: strong, indicating the number of iterations) using a medium soft-tissue kernel without beam-hardening compensation (FC13) at the CT scanner. 2MD and 3MD images were reconstructed from the FBP, IR1, IR2 and IR3 data sets on the CT console (dual-energy image view and dual-energy raw data analysis, version 6, Canon MS, Japan). Additionally, we reconstructed 3MD images using raw data-based software (RD). Materials used for 3MD reconstruction were fat with −136/−106 HU, water with 0/0 HU and calcium with a gradient of 0.69. Sagittal images with 3 mm slice thickness were reconstructed from the primary 0.5 mm volume data sets for measurement and image scoring.

Target and reference vertebrae

Target vertebral bodies with a loss of height were identified on 120 kV equivalent reconstructed images in bone kernel by consensus of two readers (one radiologist specialised in musculoskeletal diseases with 8 years of experience and one research student with 1 year of experience). Furthermore, we defined a reference vertebra for every patient for use in quantitative image quality analysis only. The reference vertebra was the most inferior vertebral body visible in CT and MRI that had normal signal and morphology in both imaging modalities and showed neither a reduction of height nor a prior intervention such as kyphoplasty.

Image reading

All individual series of RD, 2MD and 3MD of every patient were anonymised separately and rated by three readers with different levels of experience (one consultant radiologist with 8 years of experience, a radiologist with 2 years of experience and a research student with 1 year of experience). The readers quantified the subjective image quality on a numeric scale from 0 (non-diagnostic) to 10 (best quality) and scored the presence and extent of BML on a 0–3 scale (0 = no BML; 1 = up to 1/3; 2 = up to 2/3; 3 = up to 3/3 of the vertebral volume).

Measurements

Attenuation of vertebral bodies was measured by reader 3 in regions of interest (ROIs) selected on a workstation with a high-resolution monitor using Osirix (Version 6.4; Pixmeo SARL; Bernex, Switzerland). The polygonal ROIs were placed to include most of the bone marrow while excluding the cortical bone (by keeping a distance of 2 mm) and the central vein (see Appendix 1). We measured the attenuation of the bone marrow (in Hounsfield units, HU) as well as the attenuation and standard deviation in an ovoid ROI (3 mm²) of air placed outside the patient’s body.

Data analysis

A consensus reading of the MRI (readers 1 and 3) served as standard of reference for BML detection. For the analysis of diagnostic accuracy, the 0–3 scores were dichotomised (0 = no BML detected; > 0 = BML present). We calculated the diagnostic accuracy (in terms of sensitivity and specificity based on the agreement of two of three readers) of the different reconstruction techniques (i.e. RD and FBP, IR1, IR2 and IR3 reconstructed with a 3MD and a 2MD algorithm) for the detection of BML and the interrater reliability using Fleiss’s kappa. The subjective image quality ratings for the different reconstruction modes were compared using Friedman’s test and Dunn’s multiple comparison test. For the objective assessment of image quality, we calculated the signal-to-noise ratio (SNR), the contrast-to-noise ratio (CNR) and the intervertebral ratio (IVR) with formulas 1–3.

$$ \mathrm{SNR}=\left|\frac{\mathrm{HU}\ \mathrm{target}\ \mathrm{vertebra}-\mathrm{HU}\ \mathrm{air}}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{air}}\right| $$

Formula 1 Calculating signal-to-noise ratio: SNR = signal-to-noise ratio, HU = Hounsfield units

$$ \mathrm{CNR}=\left|\frac{\mathrm{HU}\ \mathrm{target}\ \mathrm{vertebra}-\mathrm{HU}\ \mathrm{reference}\ \mathrm{vertebra}}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{air}}\right| $$

Formula 2 Calculating contrast-to-noise ratio: CNR = contrast-to-noise ratio, HU = Hounsfield units

$$ \mathrm{IVR}=\frac{\mathrm{HU}\ \mathrm{target}\ \mathrm{vertebra}-\mathrm{HU}\ \mathrm{reference}\ \mathrm{vertebra}}{\left|\mathrm{HU}\ \mathrm{reference}\ \mathrm{vertebra}\right|} $$

Formula 3 Calculating intervertebral ratio: IVR = intervertebral ratio, HU = Hounsfield units

A high IVR indicates a high contrast between target and reference vertebrae and thus good visibility of the BML. The IVR of vertebrae with and without BML was compared using the Mann-Whitney test. The objective image quality parameters of the different reconstruction modes were also compared using Friedman’s test and Dunn’s multiple comparison test.
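
Formulas 1–3 can be written as three short functions. The Python sketch below is only an illustration of the arithmetic: the function and parameter names are hypothetical, and the attenuation values in the example are invented rather than taken from the study.

```python
def snr(hu_target: float, hu_air: float, sd_air: float) -> float:
    """Formula 1: |(HU target vertebra - HU air) / standard deviation air|."""
    if sd_air == 0:
        raise ValueError("standard deviation of air must be non-zero")
    return abs((hu_target - hu_air) / sd_air)


def cnr(hu_target: float, hu_reference: float, sd_air: float) -> float:
    """Formula 2: |(HU target - HU reference vertebra) / standard deviation air|."""
    if sd_air == 0:
        raise ValueError("standard deviation of air must be non-zero")
    return abs((hu_target - hu_reference) / sd_air)


def ivr(hu_target: float, hu_reference: float) -> float:
    """Formula 3: (HU target - HU reference vertebra) / |HU reference vertebra|."""
    if hu_reference == 0:
        raise ValueError("reference attenuation must be non-zero")
    return (hu_target - hu_reference) / abs(hu_reference)


# Invented example values in HU (not study data).
target, reference, air, air_sd = -20.0, -60.0, -1000.0, 10.0
print(snr(target, air, air_sd))         # 98.0
print(cnr(target, reference, air_sd))   # 4.0
print(round(ivr(target, reference), 3)) # 0.667
```

Unlike SNR and CNR, the IVR keeps its sign, so a target vertebra with lower attenuation than the reference vertebra yields a negative value.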

Calculations were performed using GraphPad Prism (version 6.0h for Mac OS X; GraphPad Software; La Jolla, California, USA) and the Real Statistics Resource Pack add-in for Microsoft Excel for the calculation of Fleiss’s kappa. A p-value smaller than 0.05 was considered significant.
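
For readers who want to reproduce the interrater statistic without a spreadsheet add-in, the following Python sketch implements the standard textbook definition of Fleiss’s kappa. It is not the Real Statistics implementation used in the study; the function name and the example ratings (three readers, four vertebrae, binary BML calls) are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss's kappa for a list of subjects, each a list with one label per rater."""
    if not ratings:
        raise ValueError("at least one subject is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    n_subjects = len(ratings)
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    # Chance agreement from the overall share of each category.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary BML calls (1 = present) of three readers for four vertebrae.
example = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
print(round(fleiss_kappa(example), 2))  # 0.33
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement below chance.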

Results

Study population

Our retrospective analysis included 17 patients (11 women and 6 men) with a mean age of 70 years and a standard deviation of ±8 years (range, 56 to 87 years). The maximum time interval between MRI and DECT was ±2 days. A flow chart of the inclusion process is presented in Fig. 1.

Fig. 1

Flow chart of retrospective study inclusion. A total of 260 patients were identified and evaluated for inclusion in our study; 62 were not older than 50 years, and 118 only underwent a DECT examination with a missing additional MRI or radiography. Raw data sets were not available in 60 cases, and reconstruction failed in three patients (one for 2MD and two for 3MD). Thus, we examined 17 patients with 39 vertebral compression fractures, of which 24 were positive for BML and 15 were negative

Target and reference vertebrae

Overall, we defined 39 target vertebral bodies and 17 reference bodies, 1 for each patient. Thus, we analysed 56 vertebrae, including 21 thoracic and 35 lumbar vertebrae. Twenty-four of the target vertebral bodies had an increased signal in MRI-STIR and were classified as vertebrae with oedema. Fifteen target vertebral bodies did not have high STIR signal and were classified as vertebrae without oedema.

Imaging

The mean scan length was 24.1 (standard deviation 7.9) cm, and wide volume mode was enabled in 16 cases. Examinations had a mean DLP of 321.7 (standard deviation 237.4; range 83.5 to 1168) mGy·cm and a mean CTDIvol of 11.7 (standard deviation 6.4; range 5.2 to 29.8) mGy. Thus, our mean effective dose was 3.2 mSv (standard deviation 2.4 mSv; range 0.84 to 11.68 mSv).

Image reading

The mean BML sum scores for the different reconstructions are presented in Appendix 2. For the 3MD images, the sensitivity for BML was higher using IR1–3 (87.5% (21/24)) compared to FBP (75% (18/24)). The 2MD data sets had inferior sensitivity compared to 3MD (37.5% (9/24) for IR3 as the best result). RD had a sensitivity of 83.3% (20/24). Overall, the specificity was only moderate. All results are compiled in Table 1. Absolute agreement of the readers for the presence of BML was as follows: 3MD: FBP 66.7% (26/39), IR1 71.8% (28/39), IR2 71.8% (29/39), IR3 79.5% (31/39); 2MD: FBP 69.2% (27/39), IR1 71.8% (28/39), IR2 71.8% (29/39), IR3 74.4% (29/39). The agreement was higher for IR (71.8% (28/39) to 79.5% (31/39)) than for FBP (66.7% (26/39) and 69.2% (27/39)). Reader agreement was best for 3MD_IR3 (79.5% (31/39)) and poorest for RD reconstructions (64% (25/39)). The interrater correlation for the detection of BML was poor using the 2MD series (Fleiss’s kappa of 0.05 to 0.24), moderate using 3MD derived from both RD and FBP (kappa of 0.52 and 0.54, respectively) and good for IR using 3MD (kappa of 0.62 to 0.72). Image quality was scored highest for IR3 using 3MD (6.86 ± 1.41) and was significantly higher for 3MD using IR compared to FBP (p < 0.0001) and 2MD images (p = 0.0001 to 0.0627). The results of image quality scoring are presented in Fig. 2C and in Appendix 3 and 4. Image examples of each type of reconstruction are shown in Fig. 3.
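
The step from the readers’ 0–3 scores to sensitivity and specificity can be sketched in Python as follows. The sketch assumes that "agreement of two of three readers" means a vertebra is called positive when at least two readers give a score above 0; the function names and the example scores are hypothetical and are not study data.

```python
def majority_positive(scores):
    """Dichotomise three 0-3 BML scores (> 0 = present); positive if >= 2 readers agree."""
    if len(scores) != 3:
        raise ValueError("exactly three reader scores are expected")
    return sum(1 for s in scores if s > 0) >= 2


def sensitivity_specificity(reader_scores, mri_positive):
    """Return (sensitivity, specificity) of the majority call against the MRI reference."""
    tp = fn = tn = fp = 0
    for scores, truth in zip(reader_scores, mri_positive):
        called = majority_positive(scores)
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one MRI-positive and one MRI-negative vertebra")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores of three readers for five vertebrae and the MRI reference.
scores = [(2, 1, 0), (0, 0, 1), (3, 3, 2), (0, 1, 1), (0, 0, 0)]
mri = [True, True, True, False, False]
se, sp = sensitivity_specificity(scores, mri)
print(round(se, 2), round(sp, 2))  # 0.67 0.5
```

With the counts reported above, the same ratio gives 21/24 = 87.5% for the sensitivity of 3MD with IR and 18/24 = 75% for 3MD with FBP.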

Table 1 Contingency analysis for the number of vertebrae with BML detected with different reconstruction modes. SE, sensitivity; SP, specificity; PPV, positive predictive value; NPV, negative predictive value. The data are given with 95% confidence intervals. All values were calculated based on the agreement of two of the three readers
Fig. 2

Results for (A) SNR, (B) CNR and (C) image quality scores. SNR, signal-to-noise ratio; CNR, contrast-to-noise ratio; RD, raw data-based reconstructions; 3MD, three-material decomposition; 2MD, two-material decomposition; FBP, filtered back-projection; IR1–3, iterative reconstruction levels mild, standard and strong, respectively. The 3MD series provided significantly higher SNR and CNR than the 2MD series. There were no significant differences in the SNR between RD and IR reconstructions for 3MD images. Subjective image quality scores were significantly higher for the 3MD-IR and RD series compared with the 2MD series and 3MD-FBP. Boxes are the middle 50% of data, the lines in the boxes are the medians, and the brackets include the 95% confidence intervals. Points are outliers

Fig. 3

Examples of different reconstructions showing a vertebral compression fracture of L1 with a BML (arrow) and a vertebral compression fracture of TH11 without a BML (arrowhead). RD, raw data-based reconstructions; 3MD, three-material decomposition; 2MD, two-material decomposition; FBP, filtered back-projection; IR1-3, iterative reconstruction levels mild, standard and strong, respectively

Measurements

Vertebrae with a BML showed increased attenuation in the series reconstructed using 3MD. In contrast, the 2MD series showed lower HU values for vertebrae with a BML than for vertebrae without a BML. The results of the vertebral attenuation measurement are summarised in Table 2. SNR and CNR were significantly higher for the 3MD algorithm (p < 0.02 for SNR and p < 0.03 for CNR) compared to the 2MD algorithm (see Appendix 5 and 6). RD-based reconstruction also achieved significantly higher SNR and CNR (p < 0.0001 for both) than series reconstructed using 2MD. However, RD-based reconstruction did not show significantly higher SNR or CNR than ID series with 3MD. The SNR and CNR results are presented in Fig. 2. All reconstructed images allowed differentiation of vertebral compression fractures with and without BML using IVR (see Fig. 4).

Table 2 Results of attenuation measurement in Hounsfield units (HU). In the 3MD series, vertebrae with a BML had higher HU values. Series reconstructed with 2MD showed smaller HU values in vertebrae with BML
Fig. 4

Results of IVR for vertebrae with and without BML. IVR, intervertebral ratio; RD, raw data-based reconstructions; 3MD, three-material decomposition; 2MD, two-material decomposition; FBP, filtered back-projection; IR1-3, iterative reconstruction levels mild, standard and strong, respectively. All reconstructions show significant differences between vertebrae with and without bone marrow oedema in MRI. However, IVRs are lower in 2MD, whereas they are higher in RD and 3MD reconstructions. The discrimination between vertebrae with and without oedema increases with the level of iteration in 3MD images

Discussion

In the present study, we investigated which reconstruction method is most useful for the detection of BML in patients with vertebral compression fractures. The subjective image quality ratings were significantly higher for IR series reconstructed using a 3MD algorithm compared to series reconstructed using FBP or 2MD and higher using RD reconstruction compared to FBP and 2MD. This impression was also confirmed by the objective image quality parameters. The diagnostic accuracy and interrater agreement were higher using 3MD. In contrast, only a few vertebrae with BML were detected using the 2MD series, and interrater correlation was poor.

While the use of IR and 3MD improved the detection of BML and image quality, image reconstruction using 2MD (the so-called black bone technique) is not suitable for diagnosing the presence of BML in fractured vertebral bodies. It seems that the differences in attenuation between fresh and old compression fractures seen on 2MD images are mostly driven by the compression mechanism and, thus, a higher concentration of calcium rather than by the detection of a higher water content. Compression of trabecular structures leads to an increased content of calcium per voxel and a lower density in 2MD reconstructions because more calcium-containing voxels are detected and subtracted. These findings might explain the higher HU for vertebrae without BML imaged with 2MD (Table 2) as a sign of sclerosis in these vertebrae. Our data suggest that IR does not negatively influence the detectability of BML in vertebral compression fractures. Therefore, IR can be used to increase image quality or to reduce radiation exposure compared to FBP. Surprisingly, RD-based reconstructions did not show an advantage over reconstructions using ID (with IR) in our study. However, in view of the moderate diagnostic accuracy in our results and the moderate interobserver agreement, the detection of BML with DECT remains a challenge. Experienced readers are needed to evaluate these images, even though IR in 3MD may improve the image quality and detection of BML.

Whereas some studies investigated the influence of IR on DECT images [24], to the best of our knowledge, this is the first study to analyse its value for the detection of BML. Therefore, data from other vendors and DECT techniques are missing for comparison. However, concerning diagnostic accuracy, we found a rather low specificity of 66.7% for current 3MD algorithms. In a recent meta-analysis, Li et al. found a very high specificity of 95.1% for DECT using the virtual non-calcium technique (3MD) [25]. A specificity of 97% was reported by Suh et al. [26]. This discrepancy might in part be attributable to the use of different statistical methods (we only included fractures and not all depicted vertebrae in our analysis) or to different scanning techniques (e.g. we did not use a tin filter for better spectral separation). However, our results are consistent with the findings recently reported by our group, which found a reader-dependent sensitivity of 72% and specificity of 70% using the same dual-energy technique [11]. Images reconstructed with IR were rated higher and showed increased SNR and CNR compared to FBP. In this respect, our results are consistent with the literature [19,20,21,22].

Limitations of our study include its retrospective nature and the small number of patients, so our statistical evaluation might be underpowered. Therefore, we were not able to assess a specific degree of traumatic fractures nor to compare the differences between traumatic and tumoural fractures. We included only patients over 50 years of age to focus on osteoporotic rather than traumatic fractures and to account for the development of bone marrow during physiological aging processes. We decided not to include all imaged vertebrae in our analysis to avoid artificially increasing the sample size and the number of true negative findings, which would ultimately lead to an overestimation of the specificity. We performed our DECT examinations on one specific scanner from one manufacturer. Therefore, our findings are not transferable to other systems, especially considering dual-source DECT devices. We used an automatic exposure control optimised for IR2. Thus, FBP reconstructions might have been degraded by underexposure. Even though we used a DECT acquisition with two separate scans at two tube voltages, motion artefacts are not expected, and the software on the CT console corrects for micro-movement. The method has already been described in prior studies [7, 11]. Wide volume mode, used for scan lengths over 16 cm, might generate higher radiation exposure due to a small field-of-view overlap. We included the analysis of 2MD reconstruction for BML evaluation, although there is no description of this method in the literature. However, 2MD also offers the option of calcium subtraction and, therefore, was worth an investigation. On the other hand, 2MD cannot take partial volume effects into account. Therefore, it seems less suited for the analysis of bone marrow lesions. Clinical applications of DECT with 2MD reconstruction include the assessment of intracranial vessels and acute intracranial haemorrhage in emergency patients [18, 27,28,29,30].

In conclusion, we analysed the impact of 2MD, 3MD and iterative and raw data-based reconstructions on the image quality of bone marrow images acquired with DECT. IR for virtual non-calcium imaging using DECT improves image quality and diagnostic accuracy compared to FBP. RD-based reconstruction does not offer a significant improvement over reconstruction of preprocessed image data sets. 2MD algorithms are not suited for BML detection.