Introduction

Noncontrast brain computed tomography (CT) is widely used as a first-line imaging study to evaluate patients with suspected central nervous system disease because of its rapid image acquisition, easy accessibility, and low cost compared with magnetic resonance imaging. However, image noise remains a problem in brain CT because it hinders the detection of the subtle changes in Hounsfield units (HU) seen in pathologic conditions [1, 2]. In addition, beam hardening, streak, and partial volume artifacts in the posterior cranial fossa pose further challenges for brain CT [3, 4].

For the past 30 years, filtered back-projection (FBP) has been the dominant method of reconstruction because of its computational efficiency and accuracy. However, FBP requires a large amount of high-quality projection data to obtain accurate reconstructions [5], and at low-dose settings image noise and artifacts increase. Iterative reconstruction (IR) was introduced to overcome these limitations of FBP, and it has shown significant reductions in radiation dose and improvements in image quality [6, 7]. However, images reconstructed at high strength levels have a waxy, plastic, or otherwise unnatural appearance, which is a limitation of IR [8, 9].

Deep learning (DL), a subset of machine learning and artificial intelligence, has recently shown potential for improving image reconstruction in CT because it can handle a larger number of models and parameters far better than statistics-based reconstruction methods [10, 11]. Against this background, a new DL-based image reconstruction technology (DLIR; TrueFidelity, GE Healthcare) has been developed. This reconstruction engine has been applied to phantom and coronary CT angiography studies and has shown greater noise reduction and superior image quality compared with adaptive statistical iterative reconstruction-Veo (ASIR-V) [12, 13]. To our knowledge, no previous study has applied DLIR to routine noncontrast brain CT protocols. Thus, the aim of this study was to compare the objective and subjective image quality of brain CT images obtained with DLIR and ASIR-V, and to determine the DLIR strength level needed to achieve reconstructed images of the highest quality.

Materials and methods

The institutional review board approved the study protocol (Veterans Health Service Medical Center, IRB file No. 2020-03-018) and the requirement for written informed consent was waived.

Subjects

Initially, 104 patients who underwent noncontrast brain CT scans (Revolution CT, GE Healthcare) between December 2019 and January 2020 were enrolled. Among them, 42 patients with definite neuropathological CT findings such as hematoma (n = 6), edema (n = 3), neoplasm (n = 4), and encephalomalacia (n = 17) or foreign bodies such as surgical clips (n = 3), coils (n = 4), and drainage catheters (n = 5) were excluded to avoid confounding effects on image interpretation. However, patients with CT findings that can be seen in normal aging brains [14], such as mildly enlarged ventricles and widened cortical sulci, physiologic calcifications in the medial basal ganglia, and a few scattered patchy white matter hypodensities, were not excluded. Finally, 62 patients were included in this study.

CT acquisition and image reconstruction

All patients underwent noncontrast brain CT scans on a latest-generation 512-slice CT scanner (Revolution CT, GE Healthcare). Scan parameters were as follows: tube voltage, 120 kV; tube current, 100–300 mA depending on automatic modulation; beam collimation, 64 × 0.625 mm; rotation time, 0.5 s; pitch factor, 0.516; field of view, 250 mm; matrix, 512 × 512.

CT datasets were reconstructed using ASIR-V at a level of 30% and DLIR with three selectable reconstruction strength levels (low, medium, and high) with 5.0-mm slice thickness as is done in routine clinical practice. The mean reconstruction times for ASIR-V and DLIR were 20.51 and 44.39 s, respectively. Finally, we obtained the following 4 reconstruction image datasets for each patient: ASIR-V, DLIR at the low level (DLIR-L), DLIR at the medium level (DLIR-M), and DLIR at the high level (DLIR-H).

Image quality assessment

Objective image quality analysis

All images were evaluated using a dedicated PACS system (M-viewer; Infinitt Healthcare). To assess gray matter-white matter (GM-WM) differentiation, 4 regions of interest (ROIs) were measured: frontal WM and adjacent cortical GM at the level of the centrum semiovale, and thalamic deep GM and WM of the posterior limb of the internal capsule at the level of the basal ganglia. To evaluate artifacts in the posterior cranial fossa, another ROI was drawn in the interpetrous region of the posterior fossa at the level where the most noticeable artifacts were seen [15, 16]. We used ROIs with sizes ranging from 17 to 20 mm² for all WM locations and the deep thalamic GM, sizes ranging from 4.5 to 6 mm² for GM at the level of the centrum semiovale, and sizes ranging from 190 to 200 mm² for the posterior cranial fossa. ROIs were measured by an experienced neuroradiologist (I.K., with 11 years of experience with CT) and reviewed by another experienced neuroradiologist (N.Y.S., with 13 years of experience with CT). Representative images of the five ROIs are shown in Fig. 1.

Fig. 1

Axial CT images at the level of the centrum semiovale (a) and basal ganglia (b). Regions of interest (ROI) were drawn in gray matter and white matter at both levels for objective image quality analysis. Axial CT image at the posterior cranial fossa (c). ROI was drawn in the interpetrous region to analyze the artifact index

We defined the CT number (HU) of the thalamic deep GM at the level of the basal ganglia as the CT attenuation of GM. Image noise was defined as the standard deviation (SD) of attenuation values measured in the deep WM at the centrum semiovale level, which is relatively free from artifacts. The artifact index was defined as the SD within the ROI of the posterior cranial fossa, which is prone to beam hardening, streak, and/or partial volume artifacts. The artifact index may therefore reflect the amount of CT number variation caused by artifacts in addition to the inherent image noise associated with scanner- and patient-related factors [17, 18].

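We calculated the contrast-to-noise ratio (CNR) at both the centrum semiovale and basal ganglia levels using the above values in the following formula: $\mathrm{CNR} = (\text{mean HU}_{\mathrm{GM}} - \text{mean HU}_{\mathrm{WM}}) / \left[(\text{mean SD HU}_{\mathrm{GM}})^2 + (\text{mean SD HU}_{\mathrm{WM}})^2\right]^{1/2}$.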
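As an illustration of the CNR formula above, the following minimal Python sketch computes the mean and SD of two ROIs and combines them into a CNR. The function names (`roi_stats`, `cnr`) and the pixel values are hypothetical and are not taken from the study. The use of the sample SD is also an assumption, because the text does not state which SD the PACS reports.

```python
import math
import statistics


def roi_stats(pixels_hu):
    """Return (mean, SD) of the HU values sampled inside one ROI."""
    # Assumption: sample SD (n - 1 denominator); the PACS convention is not stated.
    return statistics.mean(pixels_hu), statistics.stdev(pixels_hu)


def cnr(mean_gm, sd_gm, mean_wm, sd_wm):
    """CNR = (mean HU_GM - mean HU_WM) / sqrt(SD_GM^2 + SD_WM^2)."""
    return (mean_gm - mean_wm) / math.sqrt(sd_gm ** 2 + sd_wm ** 2)


# Hypothetical ROI samples in HU, for illustration only (not study data)
gm_pixels = [36.0, 38.0, 40.0, 38.0]
wm_pixels = [28.0, 30.0, 32.0, 30.0]

mean_gm, sd_gm = roi_stats(gm_pixels)
mean_wm, sd_wm = roi_stats(wm_pixels)
cnr_value = cnr(mean_gm, sd_gm, mean_wm, sd_wm)
print(f"CNR = {cnr_value:.2f}")
```

Because the denominator combines the noise of both tissues, a reconstruction that lowers the SD in GM and WM raises the CNR even when the attenuation difference itself is unchanged.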

The noise reduction rate was calculated as follows: $\text{noise reduction rate (DLIR) (\%)} = (\mathrm{SD}_{\text{ASIR-V}} - \mathrm{SD}_{\text{DLIR}}) / \mathrm{SD}_{\text{ASIR-V}} \times 100$, where DLIR indicates DLIR-L, DLIR-M, or DLIR-H.
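The noise reduction rate can be sketched in a few lines of Python. The function name and the SD values below are hypothetical, chosen only to show the arithmetic, and do not reproduce measurements from this study.

```python
def noise_reduction_rate(sd_asir_v, sd_dlir):
    """Percent reduction in image noise (SD) of a DLIR dataset relative to ASIR-V."""
    if sd_asir_v <= 0:
        raise ValueError("ASIR-V noise must be positive")
    return (sd_asir_v - sd_dlir) / sd_asir_v * 100.0


# Hypothetical noise values in HU, for illustration only (not study data)
sd_asir_v = 4.0
sd_by_level = {"DLIR-L": 3.0, "DLIR-M": 2.5, "DLIR-H": 2.0}

rates = {
    level: noise_reduction_rate(sd_asir_v, sd)
    for level, sd in sd_by_level.items()
}
for level, rate in rates.items():
    print(f"{level}: {rate:.1f}%")
```

A rate of 0% means no change relative to ASIR-V, and a negative rate would indicate that the DLIR dataset is noisier than the ASIR-V dataset.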

Subjective image quality analysis

Subjective image quality was evaluated by the same two experienced neuroradiologists who performed the objective image quality analysis. They were blinded to the reconstruction settings and to the results of the objective image quality analysis. The two neuroradiologists independently evaluated image quality in three categories: GM-WM differentiation (ability to distinguish GM from WM), sharpness (ability to reproduce the boundaries of the brain clearly and distinctly), and overall diagnostic image quality (image texture and general appearance). We used the following 4-point scale [19]: 1, poor/non-diagnostic; 2, suboptimal but diagnostic; 3, average; and 4, excellent. Interobserver agreement between the two neuroradiologists was also calculated.

Statistical analysis

All statistical analyses were performed using SPSS version 19.0 (SPSS statistics; IBM). After using the Kolmogorov–Smirnov test to determine normal distribution, quantitative data with normal distributions were compared using the one-way analysis of variance (ANOVA) test. Quantitative data without normal distributions were compared using the Kruskal–Wallis test for objective image quality analysis. Post hoc tests were also performed using Tukey’s honestly significant difference test or the Mann–Whitney U test with the false discovery rate (FDR) correction for multiple comparisons.
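The text does not specify which FDR procedure was applied to the pairwise comparisons. As a hedged illustration only, the sketch below assumes the Benjamini-Hochberg procedure and adjusts six hypothetical raw P values, one for each pair among the four datasets. It is not the SPSS routine used in the study, and the P values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for the six pairwise comparisons (not study data)
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print([round(p, 4) for p in adj_p])
```

In this made-up example, three comparisons with raw P values just below 0.05 no longer meet the threshold after adjustment, which is the situation the correction is designed to guard against.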

Pearson’s chi-square test was performed to compare subjective image quality scores between different image datasets, and pairwise comparisons were performed using the FDR correction. Interobserver agreement was assessed using kappa statistics with the linear weighted method. A two-tailed P value < 0.05 was considered significant.
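The linear weighted kappa used for interobserver agreement can be illustrated with a short, self-contained Python sketch. The function name and the two score lists are hypothetical and do not reproduce the readers' ratings. The sketch assumes the standard formulation of Cohen's weighted kappa, with disagreement weights |i − j| / (k − 1) for k ordered categories.

```python
def linear_weighted_kappa(rater1, rater2, categories):
    """Cohen's kappa with linear weights for two raters on an ordinal scale."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed joint proportions of (rater 1 score, rater 2 score)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_dis += w * observed[i][j]
            exp_dis += w * row[i] * col[j]
    return 1.0 - obs_dis / exp_dis


# Hypothetical scores on the 4-point scale for eight image sets (not study data)
scores_r1 = [2, 3, 3, 4, 4, 3, 2, 4]
scores_r2 = [2, 3, 4, 4, 4, 3, 3, 4]
kappa = linear_weighted_kappa(scores_r1, scores_r2, categories=[1, 2, 3, 4])
print(f"linear weighted kappa = {kappa:.3f}")
```

With linear weights, a one-point disagreement (for example, 3 vs 4) is penalized one third as much as a three-point disagreement (1 vs 4), which suits an ordinal quality scale better than unweighted kappa.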

Results

Baseline characteristics and radiation dose

Patient age and gender are listed in Table 1. The study population comprised 62 consecutive patients (50 men and 12 women) with a median age of 74 years (range, 43–91 years).

Table 1 Basic characteristics and radiation doses of 62 subjects

The mean volume CT dose index (CTDIvol) was 35.90 mGy, and the mean dose-length product (DLP) was 768.38 mGy·cm.

Objective image quality analysis

Compared with ASIR-V, the image noise and artifact index of the posterior cranial fossa were gradually reduced as the strength levels of DLIR increased (P < 0.001). As the reconstruction strength of DLIR increased from low to high, the image noise reduction rate increased from 23.6% to 51.1%. CNRs at both the centrum semiovale and basal ganglia levels also improved as the strength levels of DLIR increased from low to high, compared with ASIR-V (P < 0.001). Post hoc pairwise comparisons found statistically significant differences in image noise, artifact index, and CNR at both levels for all pairs (ASIR-V vs DLIR-L, ASIR-V vs DLIR-M, ASIR-V vs DLIR-H, DLIR-L vs DLIR-M, DLIR-L vs DLIR-H, DLIR-M vs DLIR-H). There were no significant differences in CT attenuation of GM among the 4 datasets (P = 0.978). Results of the objective image quality analysis are summarized in Table 2 and Fig. 2.

Table 2 Comparison of objective image quality among the four image datasets
Fig. 2

Noise reduction rate according to the strength levels of DLIR (a). Artifact index (b). CNR in both the basal ganglia (c) and centrum semiovale (d) levels

Subjective image quality analysis

The subjective parameters showed a pattern similar to that of the objective parameters. Scores in the three categories gradually increased as the strength levels of DLIR increased from low to high.

Compared with ASIR-V images, one radiologist found all three DLIR images to have significantly better GM-WM differentiation (ASIR-V vs DLIR-L, P = 0.004; ASIR-V vs DLIR-M, P < 0.001; ASIR-V vs DLIR-H, P < 0.001) and sharpness (ASIR-V vs DLIR-L, P = 0.001; ASIR-V vs DLIR-M, P < 0.001; ASIR-V vs DLIR-H, P < 0.001). On the other hand, the other radiologist found DLIR-M and DLIR-H images to have significantly higher GM-WM differentiation (ASIR-V vs DLIR-M, P < 0.001; ASIR-V vs DLIR-H, P < 0.001) and sharpness (ASIR-V vs DLIR-M, P = 0.002; ASIR-V vs DLIR-H, P < 0.001), but ASIR-V and DLIR-L images did not significantly differ in GM-WM differentiation (P = 0.087) and sharpness (P = 0.405).

DLIR-M and DLIR-H images showed significantly better overall diagnostic quality compared with ASIR-V and DLIR-L images (both radiologists, P < 0.001). But the overall diagnostic quality did not significantly differ between DLIR-M and DLIR-H (radiologist 1, P = 0.427; radiologist 2, P = 0.440). Also, the overall diagnostic quality did not significantly differ between ASIR-V and DLIR-L (radiologist 1, P = 0.703; radiologist 2, P = 0.414) (Fig. 3).

Fig. 3

Axial CT images of a 75-year-old male patient at the level of the centrum semiovale (a–d) and basal ganglia (e–h) and through the posterior cranial fossa (i–l), using ASIR-V (a, e, i), DLIR-L (b, f, j), DLIR-M (c, g, k), and DLIR-H (d, h, l). Compared with ASIR-V and DLIR-L images, DLIR-M and DLIR-H images showed significantly better gray-white matter differentiation, sharpness, and overall diagnostic quality

Interobserver agreement was moderate for gray-white differentiation (κ = 0.585) and good for sharpness (κ = 0.765) and overall diagnostic quality (κ = 0.717).

Results of the subjective image quality analysis are summarized in Table 3.

Table 3 Comparison of subjective image quality among the four image datasets

Discussion

This study assessed the objective and subjective image quality of two reconstruction algorithms (ASIR-V and DLIR) for images obtained with routine clinical brain CT. We found that brain CT images with DLIR showed better objective image quality in terms of noise and artifact reduction compared with ASIR-V images. Also, DLIR images with medium and high strength levels demonstrated the best subjective image quality scores among the reconstruction datasets.

The conventional modeling approaches of IR face fundamental challenges because the growing number of parameters makes it more difficult to retain the necessary convergence properties of algorithms. On the other hand, DLIR can deal with complex models and a huge number of parameters through training processes, overcoming the modeling limitations of IR [20]. Recently, CT image reconstruction with a deep neural network (DNN) has shown promising performance for improving image quality with favorable noise texture for anatomical and pathological structures [12, 13, 21–23]. Even though these previous studies focused on other body regions, the results of our study are in line with their findings. When compared with the most recent generation of the IR algorithm available from the same manufacturer (ASIR-V), our results showed that the new DLIR (TrueFidelity™) significantly reduces noise and artifacts. Given that the difference in HU between GM and WM is very subtle (typically from 5 to 10 HU) in brain CT images [24], improvements in noise reduction with DLIR could greatly benefit the interpretation of CT images. Although we could not evaluate improvements in diagnostic accuracy in this study, we expect better diagnostic accuracy in the posterior cranial fossa region with DLIR because it showed superior noise and artifact reduction in our results, and we infer that DLIR will be especially advantageous when diagnosing posterior fossa infarction. Further studies are required to analyze how DLIR will impact diagnostic accuracy and its ability to detect lesions under various pathologic conditions and at diverse locations.

As expected, we found that subjective image quality parameters also significantly improved after adopting DLIR at increasing strength levels. Our results demonstrated a similar pattern in both the objective and subjective parameters used to verify noise reduction. In theory, noise reduction can be accompanied by reduced sharpness and contrast, as reported in previous studies using IR [25, 26]. Previous studies also reported that the visual impression of images reconstructed at the highest iterative levels differs from that of images generated with FBP. This plastic-looking, unfamiliar noise texture limits the use of high-level iterative reconstruction in routine clinical practice [9, 27]. In contrast, the results of our study show that DLIR improved the sharpness of structural margins and produced favorable image appearances even when the highest level of reconstruction strength was applied. DLIR incorporates a DNN trained with high-quality FBP datasets of ground truth images. Through rigorous validation and extensive testing to reduce the difference between reconstruction outputs and ground truth images, DLIR can generate images that accurately match ground truth images [20]. Thus, we believe that DLIR with a fine-tuned DNN enables the generation of more appealing image appearances in clinical brain CT imaging compared with ASIR-V.

Our study has several limitations. This is a retrospective study with a relatively small number of patients. Although the number of patients included in our study was larger than that of recent comparison studies with DLIR and IR [12, 23], further research involving a larger number of patients is required to confirm our findings. Also, our study subjects were limited to patients without definite neuropathological findings or foreign bodies and we were unable to evaluate improvements in diagnostic accuracy. While our results show that DLIR enables superior image quality in subjects with normal aging brains, further studies involving patients with various clinical conditions are required to evaluate the impact of DLIR on lesion detection and diagnostic accuracy. Furthermore, we only investigated CT images under a routine radiation dose protocol which was adapted to the clinical specifics of our institution. Considering that radiation dose reduction is possible by altering the reconstruction mechanism, the impact of DLIR on reduced radiation dose protocols needs to be evaluated in the future. Lastly, subjective image quality evaluations were performed by only two radiologists in this study. Extended research with more neuroradiologists is needed to generalize the subjective outcomes of our study.

In conclusion, brain CT images with DLIR demonstrated better performance in reducing image noise and artifacts compared with ASIR-V, and DLIR images with medium and high reconstruction strength levels provided the highest subjective image quality scores. DLIR shows great potential as an advanced reconstruction method that improves the image quality of clinical brain CT images.