Introduction

For various lung diseases, the bone suppression image (BSI) technique in chest radiography has proven useful for radiologists and in computer-aided diagnosis [1,2,3,4,5,6,7,8,9,10,11]. However, since the introduction of dual-energy subtraction (DES) for the generation of BSIs, DES has not achieved broad clinical application. A possible reason is the additional radiation exposure, although it is minimal in DES [12]. Moreover, double-shot DES provides BSIs with unstable image quality because DES-based BSIs may contain motion artifacts associated with patients’ respiration and cardiac motion [13]. Maintaining a stable quality of BSIs generated by DES may be especially difficult for older patients with lung lesions because they cannot hold their breath during the long acquisition time.

To overcome these issues, deep learning–based algorithms or software have been developed to generate BSIs. These techniques have been validated by observer performance tests. BSIs derived from this technique improved readers’ accuracy in detecting lung lesions on standard chest radiographs (CXRs) [7, 10, 11, 14,15,16,17]. Li et al [7] showed that the supplementary use of software-based BSIs with CXRs can improve radiologists’ accuracy for detecting small lung cancers on CXRs compared with DES-based BSI. However, there is always a risk of artificially adding or removing findings when using software-based techniques, which may increase false-positive and false-negative rates in the BSIs [15, 16, 18]. In this respect, bony structures must be removed effectively while maintaining soft tissues in the BSIs. Moreover, the validation of the quality of images generated by newly introduced software or techniques is a prerequisite for using these images in medicine. However, there has been no study on image quality for anatomical structures and lung lesions in BSIs generated by software.

Therefore, our study compared the image quality and radiation dose of software-based BSI (S-BSI) with dual-energy subtraction-based BSI (D-BSI).

Methods

Patients

This prospective study enrolled 40 patients (male-to-female ratio of 23:17; mean age ± standard deviation (SD): 57.7 ± 14.3 years) who met the following eligibility criteria, from Aug 2018 to Jun 2019. The inclusion criteria were as follows: (1) adults over 19 years of age; (2) those with a scheduled routine follow-up for CXRs; (3) those with abnormal CXRs; (4) those who agreed to participate in the study. The exclusion criteria were as follows: (1) patients with space-occupying lesions; (2) pregnant women. The space-occupying lesions were defined as lesions replacing one lung field (i.e., huge mass, lobar consolidation, a large amount of pleural effusion, or pneumothorax with total lung collapse) on CXR because the space-occupying lesions can obscure the appearance of normal anatomical structures, making it difficult to evaluate the image quality of CXR and BSI.

Image acquisition and generation of bone suppression images

Before the present prospective study, a preliminary phantom study was conducted to set up imaging protocols for minimizing the radiation dose while maintaining the optimal image quality for each X-ray device (see Appendix 1 for details).

All participants underwent CXRs with the posterior-anterior projection. Initially, all CXRs were acquired using the GE machine (Discovery XR656, GE Healthcare) with 120 and 60 kVp for consecutive exposures at two different energy levels, and the speed of 400 with individualized automatic exposure control. These parameters (chosen in the present study based on the results of the phantom study) were consistent with the parameters recommended by the manufacturer. In the follow-up imaging, all CXRs were acquired using a Samsung digital radiography (DR) system (GC85A) (Samsung Electronics Co., Ltd.) at 120 kVp and 1–2 mAs with individualized automatic exposure control. The device contains a dedicated deep learning–based software (Bone Suppression™ software, version 1.0, Samsung Electronics Co., Ltd.) to automatically generate BSI (see Appendix 2 for details).

Quantitative analysis of image quality

The study analyzed 160 images from 40 CXRs and 40 BSIs for each image processing method. The lung field in each CXR was divided into six lung regions (i.e., bilateral upper, middle, and lower lung regions). First, in the original CXRs, square regions of interest (ROIs) measuring 15 × 15 mm were drawn on the rib (bone area) and intercostal area (soft tissue area) at the lateral, middle, and medial portions of each lung region using an in-house annotation tool (Fig. 1). As the two images (paired CXRs and BSIs) were already aligned in the same coordinate system, the ROIs in the CXRs were sequentially overlaid on the bone suppression images to calculate the metrics. Moreover, ROIs were drawn around the margin of each lung lesion in the CXR, overlaid on the BSI, and divided into grids (15 × 15 mm each) to calculate the metrics. Three image quality metrics were chosen based on a previous study [19]: relative mean absolute error (RMAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). The metrics were calculated from the extracted ROIs. The adjusted values of RMAE (aRMAE), PSNR (aPSNR), and SSIM (aSSIM) were calculated in the lung fields, excluding the lung areas with visible motion artifacts. An RMAE closer to 0, a higher PSNR, and an SSIM closer to 100 each indicate better quality of the generated image. The metrics were defined as follows:

$$ {\displaystyle \begin{array}{c}\mathrm{RMAE}=\sqrt{\frac{1}{N}\sum \limits_{\left(x,y\right)\in \Omega}\left|\hat{z}\left(x,y\right)-z\left(x,y\right)\right|}/\left({z}_{\mathrm{max}}-{z}_{\mathrm{min}}\right)\\ {}\begin{array}{c}\mathrm{PSNR}=20{\log}_{10}\left(\left({z}_{\mathrm{max}}-{z}_{\mathrm{min}}\right)/\sqrt{\frac{1}{N}\sum \limits_{\left(x,y\right)\in \Omega}{\left(\hat{z}\left(x,y\right)-z\left(x,y\right)\right)}^2}\right)\\ {}\mathrm{SSIM}=l\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)\bullet c\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)\bullet s\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)\\ {}l\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)=\frac{2{\mu}_{{\hat{z}}_{\Omega}}{\mu}_{z_{\Omega}}+{C}_1}{{\mu_{{\hat{z}}_{\Omega}}}^2+{\mu_{z_{\Omega}}}^2+{C}_1};c\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)=\frac{2{\sigma}_{{\hat{z}}_{\Omega}}{\sigma}_{z_{\Omega}}+{C}_2}{{\sigma_{{\hat{z}}_{\Omega}}}^2+{\sigma_{z_{\Omega}}}^2+{C}_2};s\left({\hat{z}}_{\Omega},{z}_{\Omega}\right)=\frac{\sigma_{{\hat{z}}_{\Omega}{z}_{\Omega}}+{C}_3}{\sigma_{{\hat{z}}_{\Omega}}{\sigma}_{z_{\Omega}}+{C}_3}\end{array}\end{array}} $$

where z and \( \hat{z} \) are the original CXR and BSI, respectively. Ω denotes the valid regions (ROIs) in z which are composed of the anatomical landmarks and lesions, and N is the number of pixels in Ω. zmin and zmax are the minimum and maximum values of the pixels in image z, and we set the 0.1 and 99.9 percentiles as the values of zmin and zmax. SSIM is a weighted combination of three comparison measurements between zΩ and \( {\hat{z}}_{\Omega} \): luminance (l), contrast (c), and structure (s). \( {\mu}_{{\hat{z}}_{\Omega}} \) is the average of \( {\hat{z}}_{\Omega} \), \( {\mu}_{z_{\Omega}} \) is the average of zΩ, \( {\sigma_{{\hat{z}}_{\Omega}}}^2 \) is the variance of \( {\hat{z}}_{\Omega} \), \( {\sigma_{z_{\Omega}}}^2 \) is the variance of zΩ, and \( {\sigma}_{{\hat{z}}_{\Omega}{z}_{\Omega}} \) is the covariance of \( {\hat{z}}_{\Omega} \) and zΩ. C1 and C2 are two variables to stabilize the division with a weak denominator, and C3 = C2/2.
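The definitions above can be sketched in code to make the computation concrete. The following is a minimal illustration, not the authors' implementation: the function name `roi_metrics`, the flat-list input format, and the constants `k1` and `k2` (used here to derive C1 and C2 from the dynamic range, a common convention) are assumptions, since the text does not give the values of C1 and C2. RMAE follows the formula as printed, including the square root. SSIM is returned on a 0 to 1 scale; multiplying by 100 to match the reported values is likewise an assumption.

```python
import math


def roi_metrics(z, z_hat, z_min, z_max, k1=0.01, k2=0.03):
    """Compute (RMAE, PSNR, SSIM) over one ROI.

    z      -- pixel values of the original CXR inside the ROI (flat list)
    z_hat  -- pixel values of the BSI at the same pixels (flat list)
    z_min, z_max -- intensity limits of image z (the text uses the 0.1 and
                    99.9 percentiles of z)
    k1, k2 -- hypothetical constants used to build C1 and C2 (assumption)
    """
    n = len(z)
    if n == 0 or n != len(z_hat):
        raise ValueError("z and z_hat must be non-empty and of equal length")
    rng = z_max - z_min

    # Mean absolute error and mean squared error over the ROI.
    mae = sum(abs(a - b) for a, b in zip(z_hat, z)) / n
    mse = sum((a - b) ** 2 for a, b in zip(z_hat, z)) / n

    # RMAE as printed in the formula (square root of the mean absolute error).
    rmae = math.sqrt(mae) / rng
    # PSNR is unbounded when the two ROIs are identical.
    psnr = math.inf if mse == 0 else 20 * math.log10(rng / math.sqrt(mse))

    # Means, variances, and covariance over the ROI.
    mu_z = sum(z) / n
    mu_h = sum(z_hat) / n
    var_z = sum((a - mu_z) ** 2 for a in z) / n
    var_h = sum((a - mu_h) ** 2 for a in z_hat) / n
    cov = sum((a - mu_h) * (b - mu_z) for a, b in zip(z_hat, z)) / n
    sd_z, sd_h = math.sqrt(var_z), math.sqrt(var_h)

    # Stabilizing constants; C3 = C2 / 2 as stated in the text.
    c1 = (k1 * rng) ** 2
    c2 = (k2 * rng) ** 2
    c3 = c2 / 2

    lum = (2 * mu_h * mu_z + c1) / (mu_h ** 2 + mu_z ** 2 + c1)
    con = (2 * sd_h * sd_z + c2) / (var_h + var_z + c2)
    struct = (cov + c3) / (sd_h * sd_z + c3)
    return rmae, psnr, lum * con * struct
```

With identical inputs the sketch returns an RMAE of 0, an infinite PSNR, and an SSIM of 1 (up to floating-point rounding), which corresponds to the "better quality" end of each metric.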

Fig. 1

Illustration of quantitative image quality analysis (a and b) and ROI-based visual image quality analysis (c) of paired image data. a Data set 1: original chest radiograph and dual-energy subtraction-based bone suppression image. b Data set 2: original chest radiograph and software-based bone suppression image (S-BSI). Square regions of interest (ROIs) in red indicate soft tissue areas and those in yellow indicate bone areas. Three image quality metrics are calculated for the extracted ROIs. c Randomly mixed ROIs (selected from data sets 1 and 2 so as to contain each anatomical structure) were presented individually, and readers analyzed the quality of soft tissues in the images. DES dual-energy subtraction, BSS bone suppression software, CXR chest radiograph, D-BSI dual-energy subtraction-based bone suppression images, S-BSI software-based bone suppression images

Visual analysis of image quality

All images were available on a picture archiving and communication system (PACS). In the analysis using full chest images, two board-certified radiologists (G.S. Hong and J. Choe with 12 and 8 years of experience in chest radiography, respectively) independently scored the image quality of each pair of CXRs and BSIs according to evaluation criteria modified from a previous study [20]. In the modified evaluation criteria for human chest radiographs, the visibility of the ribs and intervertebral discs was excluded from the evaluation criteria of the phantom study mentioned above (see Appendix, Supplementary Table 2 for details) because of the suppressed bone structures in BSIs. The modified evaluation criteria included the visibility of blood vessels in the lung field, myocardial area, and diaphragm area and the visibility of the proximal airway and diaphragm. In addition, the readers independently scored the image quality for lung lesions in CXRs and BSIs, according to the visibility scale of lung lesions: (1) very well-visualized margin of lesions (score = 6); (2) partially poorly visualized margin of lesions (score = 3); (3) not visualized lesions (score = 0). Furthermore, to mitigate bias in the visual quality analysis using full CXRs, an ROI-based visual image quality analysis was performed (Fig. 1; see Appendix 3 for details).

Radiation dose estimations

For each X-ray device, the dose area product (DAP [dGy·cm²]) was calculated from the recorded exposure factors.

Sample size calculation and statistical analysis

The sample size was calculated using the G*Power 3.1.9.2 program for paired data sets with an effect size of 0.5, a type I error of 5%, a power of 80%, and a dropout rate of 15%. The normal distribution of DAP was confirmed by the Kolmogorov–Smirnov test. The paired t test was used to compare the aforementioned quantitative image quality metrics, visual image quality scores, and DAP between the two groups (D-BSI and S-BSI). All the statistical tests were two-sided with a significance level of 0.05. Unlike in the visual quality test, S-BSI showed a wider standard deviation in the image quality metrics than D-BSI. Therefore, the distribution of the SSIM data was investigated using a boxplot because, among the proposed metrics, SSIM is known to match very well with human-perceived visual quality [21]. All the statistical analyses were performed using SPSS (SPSS version 21.0; IBM Corp.).
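The sample size reasoning can be approximated with a short sketch. This is not the exact algorithm used by dedicated power-analysis software, which works with the noncentral t distribution; it assumes a two-sided paired t test and uses a normal approximation with a small-sample correction term. The function names `paired_sample_size` and `inflate_for_dropout` are hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired t test.

    Normal approximation plus the correction term z_alpha^2 / 2
    (an assumption; not the exact noncentral-t calculation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_beta) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


def inflate_for_dropout(n, dropout):
    """Enlarge the sample so that n pairs remain after the expected dropout."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n / (1 - dropout), 9))


n_required = paired_sample_size(0.5)                 # effect size 0.5
n_enrolled = inflate_for_dropout(n_required, 0.15)   # 15% dropout
print(n_required, n_enrolled)
```

Under these assumptions the sketch gives 34 pairs before the dropout allowance and 40 after it, in line with the 40 patients enrolled.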

Results

Quantitative analysis of image quality

Table 1 shows that S-BSIs achieve better quantitative image quality indices than D-BSIs. In bone areas, S-BSIs had better SSIM values than D-BSI (94.57 vs. 87.77, p < 0.001) but worse RMAE and PSNR values (0.50 vs. 0.20; 20.93 vs. 34.37; p < 0.001, respectively). In soft tissue areas, S-BSIs had better SSIM values than D-BSI (97.56 vs. 91.42, p < 0.001) but similar RMAE and PSNR values (0.29 vs. 0.27; 31.35 vs. 29.87; p < 0.001, respectively). After excluding the lung regions with visible motion artifacts, the adjusted quantitative metrics did not essentially differ from the quantitative indices. A representative case is illustrated in Fig. 2. Boxplots show the distribution of SSIM values and outliers according to the lung regions for image processing methods (Fig. 3). Ten outliers (2 in bone area and 8 in soft tissue area) were identified in D-BSI and 35 (17 in bone area and 18 in soft tissue area) in S-BSI. All outliers in S-BSI were identified in one case. The representative cases with outliers are illustrated in Fig. 3. The mean values ± SD of SSIM were 96.06 ± 8.83 in S-BSI and 89.59 ± 5.03 in D-BSI; with the exclusion of the outliers of the SSIM, the mean values ± SD were 97.32 ± 2.46 in S-BSI and 89.74 ± 4.54 in D-BSI. For lung lesions, the RMAE and PSNR at S-BSIs were inferior to D-BSI (0.04 vs. 0.03; 46.80 vs. 51.68; p < 0.001, respectively). However, the SSIM at S-BSI was significantly higher than D-BSI (93.52 vs. 84.70, p < 0.001). For lung lesions, the adjusted quantitative indices did not essentially differ from the quantitative indices (Table 2).
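The text does not state the rule used to flag outliers in the boxplots. As an illustration only, the sketch below assumes the conventional boxplot rule, in which values more than 1.5 times the interquartile range beyond the quartiles are flagged. The function name `boxplot_outliers` and the sample values are hypothetical, and quartile conventions differ between statistical packages, so the flagged points may not match those in Fig. 3.

```python
from statistics import quantiles


def boxplot_outliers(values, whisker=1.5):
    """Return values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    # "inclusive" treats the data as the full population when interpolating.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical SSIM values for illustration; not data from the study.
example_ssim = [95, 96, 97, 98, 99, 60]
print(boxplot_outliers(example_ssim))
```

A single low value such as the 60 in this made-up list is flagged, while the remaining values fall inside the fences.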

Table 1 Comparison of quantitative image quality for the entire lung fields between dual-energy subtraction-based and software-based bone suppression images
Fig. 2

Comparison of image quality of bone suppression images in a 32-year-old male with pneumonia. a, b Original chest radiograph and dual-energy subtraction-based bone suppression image (D-BSI). c, d Original chest radiograph and software-based bone suppression image (S-BSI). The mean SSIM values were marked in D-BSI and S-BSI, respectively. D-BSI shows the visibly detected motion-related artifact in the left middle and lower lung. However, this motion affects the entire lung fields, not the localized lung area. Note poorly visualized lung lesions (arrows) in D-BSI when compared with S-BSI

Fig. 3

Boxplots show the distribution of SSIM in the entire data set at (a) dual-energy subtraction-based bone suppression images (D-BSI) and (d) software-based bone suppression image (S-BSI). The number under the boxplot represents the case number and location (in parentheses) of the identified outliers. b, c A representative case with an outlier in D-BSI. Note the outlier located near the cardiac margin in D-BSI, which contains a cardiac motion artifact. e, f One case with outliers in S-BSI. Chest radiography shows coarse reticular opacity with a cystic change and fibrosis in both lungs. This case was diagnosed with a humidifier-disinfectant inhalation injury

Table 2 Comparison of quantitative image quality for the lung lesions between dual-energy subtraction-based and software-based bone suppression images

Visual analysis of image quality

Table 3 compares the visual image quality between D-BSI and S-BSI with the paired CXRs. In analysis using full chest X-ray images, for all anatomical landmarks of chest images, the difference in visual quality scores between paired CXRs and BSIs for S-BSI was significantly lower than that for D-BSI (reader 1, 1.2 vs. 13.98; and reader 2, 0.53 vs. 20.9; p value < 0.001, respectively). For lung lesions, the difference in visual quality scores between paired CXRs and BSIs for S-BSI was significantly lower than that for D-BSI (reader 1, 0.44 vs. 0.94, p value = 0.028; reader 2, 0.56 vs. 1.11, p value = 0.017). Results of the ROI-based visual quality analysis were identical to those obtained using full chest X-ray images. Representative cases are illustrated in Figs. 4 and 5.

Table 3 Comparison of visual image quality for anatomical landmarks between dual-energy subtraction-based and software-based bone suppression images with paired original chest radiographs
Fig. 4

Comparison of image quality for anatomical structures. a, b Original chest radiograph and dual-energy subtraction-based bone suppression image (D-BSI). c, d Original chest radiograph and software-based bone suppression image (S-BSI). Note the blurring of vessels (arrows) in the D-BSI in the right upper lung compared with the paired chest radiograph. S-BSI shows well-visualized vessels (arrows) in the corresponding area of right upper lung

Fig. 5

Comparison of the image quality for the lung lesion in a 71-year-old male with lung cancer. a, b Original chest radiograph and dual-energy subtraction-based bone suppression image (D-BSI). D-BSI shows a partially poorly visualized margin of the lung lesion (arrows) compared with the original chest radiograph. c, d Original chest radiograph and software-based bone suppression image (S-BSI). There is no definite change in the image quality of the lung lesion between the original chest radiograph and S-BSI

Radiation dose

Details on radiation doses are summarized in Table 4. The radiation dose in software-based images was significantly lower than that in the DES-based images (p < 0.001). The mean DAP was 1.48 dGy·cm² in DES-based images and 0.98 dGy·cm² in software-based images. In the dual-energy systems, the acquisition of a low-energy radiograph for dual-energy subtraction increased the total examination dose by 91.8%. Compared with the software-based technique, dual-energy systems increased the total examination dose by 48.7%.

Table 4 Comparison of radiation dose between devices with dual-energy subtraction and bone suppression software

Discussion

To the best of our knowledge, no study has focused on the image quality of S-BSIs for anatomical structures and lung lesions, and compared it with a traditional method using hardware. Several recent studies on BSI only used image quality metrics as criteria to validate the developed deep learning–based models [19, 22,23,24]. Moreover, their results were not validated by chest experts, even considering the existing gap between the metrics and human perception in the super-resolution field.

This study clarified the superiority of S-BSI to D-BSI in terms of image quality. In the bone and soft tissue areas, S-BSI had a higher SSIM than D-BSI. These results imply that bone suppression software preserves the structural similarity of soft tissues after removing bones better than DES does. These results were consistent with the visual quality analysis: both readers preferred S-BSI to D-BSI. These results were expected, considering the aforementioned motion-related instability of D-BSI [13]. Interestingly, however, the adjusted metrics did not substantially change after excluding the lung areas with visually detected motion artifacts. A reasonable explanation is that cardiac and respiratory movements affect the entire lung fields rather than only the localized lung areas with visually detected motion artifacts. In the present study, there were no advantages in using S-BSI over D-BSI in terms of the RMAE and PSNR. This is because the traditional approach for evaluating image quality, which is based on RMAE and PSNR, compares original and processed images on a point-by-point basis. As a result, these metrics are not well suited to capturing perceived visual quality [25, 26]. Although there was no definite advantage in terms of the RMAE and PSNR, the change in image quality indicated by these traditional metrics cannot be deemed significant if humans are unable to observe these changes in the processed images [27]. Therefore, it is reasonable to conclude that S-BSI offers superior perceptual image quality compared with D-BSI.

Compared with D-BSI, S-BSI had a high range of the standard deviation of quantitative metrics, which may be caused by the unstable generation of S-BSI. The outliers can indicate an exceptional result (i.e., an unstable function of bone suppression software), resulting in increased standard deviation and decreased image quality. In the S-BSI, outliers in SSIM were identified in one case diagnosed with a humidifier-disinfectant inhalation injury, which resulted in fibrotic interstitial lung disease. The exclusion of these outliers substantially decreased the variance in the SSIM values for S-BSI. In this case, the bone suppression software failed to preserve the quality of soft tissue images. This could be attributed to the inability of the software in clearly distinguishing between increased or inhomogeneous soft tissue opacity and bone structures. This could be interpreted as a potential limitation of the software’s clinical application; however, it is not rational to arrive at this conclusion based on a single data point. Most of all, its clinical relevance is unclear because there was no difference in the image quality scores between the original CXR and S-BSI in the visual analysis of readers. It is well-known that SSIM can reflect substantial point-by-point distortion imperceptible to humans, although SSIM accounts for high-level human visual perception [21, 27, 28]. Therefore, additional studies applying the software to cases involving various diseases are essential to address this issue.

One main finding of this study is that S-BSI achieved better image quality of lung lesions than D-BSI. When using image processing techniques, it is important to maintain the image quality of lung lesions in the generated images as close as possible to that in the original CXR. The reason is that the addition or removal of findings in the generated images can increase the rates of false-positives or false-negatives. Freedman et al [15] reported that viewing BSI generated by a software tool increased false-negative decisions by 2% and false-positive findings by 4% per radiologist in lung cancer patients, resulting in significant loss of specificity from 96.1 to 91.8. Bone suppression can suppress not only ribs but also lung lesions or vascular structures. For the image quality of lung lesions, we noticed that SSIM in S-BSI was significantly higher than that in D-BSI. This is consistent with the visual analysis by radiologists. Both radiologists gave higher image quality scores for lung lesions to S-BSI than to D-BSI. However, the better image quality for lung lesions in BSI does not imply better performance and confidence of radiologists. Szucs-Farkas et al [4] found no difference in the false-positive and true-positive rates between DES-based and software-based images. Li et al [7] showed that the mean confidence level of readers regarding the presence of cancer decreased by viewing supplementary D-BSI, but there was no change when using S-BSI. These results are controversial because many factors influence the reader’s performance [11, 29,30,31]. It is not easy to determine whether the improved image quality of the lesions is more important than other factors. For example, the reader’s expertise and experience level for newly processed images can be more important than image quality to distinguish artificial findings from real lesions. In addition, the presence or absence of lung lesions in BSI is always decided with a paired original CXR. Nevertheless, maintaining the image quality of lesions as close as possible to that in the original image is believed to be essential when using newly processed images in clinical practice.

As expected, this study showed that the software-based image processing technique can decrease the radiation dose over the hardware-based image processing technique. Fischbach et al [12] reported that DES increased the total examination dose by only 14% based on the phantom entrance dose. However, the present study showed that adding a second exposure in DES increased the total examination dose by 91.8%. This discrepancy is because the previous study included the radiation dose in the lateral chest examination. Excluding this radiation dose, the results of the current study are almost identical to those of the previous study. In a dual-energy system, the radiation dose level is a very important factor because the image quality of DES images can vary with the radiation dose [13]. It may cause severe bias in the comparison of image quality. To address this, a preliminary phantom study with various radiation levels was performed to determine the optimal radiation dose level before the current prospective study.

This study has several limitations. First, software-based bone suppression images can be easily differentiated from DES-based bone suppression images, which may cause bias. To address this bias, an ROI-based image quality analysis was also performed. We believe that this method can mitigate the impact of the bias and serve as a more objective method of assessing image quality. Second, the present study did not include an observer performance test for lesion detection. As described above, better image quality in BSI does not guarantee better performance. However, the current study population is not suitable for a lesion detection test because it comprised inpatients undergoing workup of lung lesions, and most of these lesions can be easily detected. Further study is needed to clarify the relationship between image quality and observer performance. Third, the generalizability of some of the current results is limited by the small number of patients.

In conclusion, bone suppression software significantly improved the image quality of bone suppression images with a relatively low radiation dose, compared with the dual-energy subtraction technique.