Introduction

In the last decade, substantial efforts have been made to reduce patients’ exposure to ionising radiation from computed tomography (CT). However, this should not come at the expense of diagnostic accuracy. Lesion detection is a common task in daily practice and is particularly important for assessing metastatic disease in oncological patients [1]. Changes in CT image quality parameters, including image noise and the lesion-to-background contrast-to-noise ratio (CNR), can influence lesion detectability [2]. This effect has been observed particularly for low-contrast lesions, whose detection is one of the most difficult tasks in routine diagnostic radiology [2, 3].

Radiologists generally have two options for improving lesion-to-background CNR: iterative reconstruction algorithms (IRs) and imaging at low tube voltages. Several studies have demonstrated that IRs reduce image noise and consequently improve CNR compared with conventional filtered back projection (FBP) [4-8]. Reducing the tube voltage increases the attenuation of iodinated contrast material owing to the stronger photoelectric effect at lower photon energies [9]. This increased attenuation may improve the delineation between two tissues with different iodine uptake, as confirmed in previous studies [10, 11]. The main drawback of low tube voltages is the increased image noise caused by the reduced photon flux. Especially in larger patients, this increase in image noise may outweigh the improved lesion contrast and result in an overall decrease in lesion-to-background CNR. To compensate, the photon flux must be increased by raising the tube current-time product of the CT scanner.
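The trade-off described above can be made explicit with a simplified textbook quantum-noise relation. It is a sketch, assuming that image noise is dominated by photon statistics at a fixed tube voltage, patient size and reconstruction algorithm, and ignoring electronic noise; here N denotes image noise, n_γ the number of detected photons and ΔHU the lesion-to-background attenuation difference.

$$ N \propto \frac{1}{\sqrt{n_{\gamma}}} \propto \frac{1}{\sqrt{\mathrm{mAs}}}, \qquad \mathrm{CNR} = \frac{\Delta\mathrm{HU}}{N} \propto \Delta\mathrm{HU}\,\sqrt{\mathrm{mAs}} $$

For example, halving the image noise at a given tube voltage requires roughly four times the tube current-time product and therefore roughly four times the radiation dose.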

Some studies have demonstrated improved image quality parameters and lesion detection rates with IRs [12, 13], whereas others found no change in diagnostic accuracy despite improvements in image noise and CNR [14, 15]. Model-based iterative reconstruction algorithms (MBIRs) represent the third generation of IRs. Their complex statistical models take X-ray physics and CT scanner optics into account during the iteration process. MBIRs have been shown to improve image quality parameters and potentially reduce radiation dose [16-18]. A phantom study evaluating the impact of a recent MBIR on low-contrast lesion detectability demonstrated significantly improved detectability with MBIR compared with FBP at 120 kVp [19]. However, that study was limited by the assessment at a single tube voltage and by the phantom’s small diameter (16.5 cm), which does not reflect the average abdominal diameter of an adult patient in the Western world.

The purpose of our study was to assess the impact of MBIR and low-kVp imaging on image quality and low-contrast lesion detectability, using a contrast-detail phantom to simulate abdominal CT of medium-sized and large patients.

Materials and methods

Phantom design

An iodine-containing contrast-detail phantom (QRM, Moehrendorf, Germany) with 45 hypodense, low-contrast lesions was used in this study. The hyperattenuating background of the phantom simulated parenchyma in the arterial phase. The lesions had three different diameters (5, 10, and 15 mm) and three different lesion-to-background contrast values (10, 25, and 50 HU). The phantom was placed in each of two water-filled cylindrical plastic containers with diameters of 30 cm and 40 cm, mimicking the abdominal cross-sectional dimensions of a medium-sized and a large patient (estimated body weights: 72-85 kg for the medium phantom and 118-142 kg for the large phantom [20]).

Scanning protocol

Both phantoms were scanned on a third-generation dual-source CT scanner (SOMATOM Force; Siemens Healthineers, Forchheim, Germany) at 70, 80, 100, and 120 kVp. Our institutional abdominal protocol served as the reference protocol: 120 kVp, 150 reference mAs, collimation of 192 × 0.6 mm, gantry rotation time of 0.5 s, pitch of 0.8, and automatic tube current modulation with an average strength curve (CareDose4D; Siemens Healthineers). The tube current-time products of the other three protocols (70, 80, and 100 kVp) were adjusted to keep the volume CT dose index (CTDIvol) constant within each phantom size. The CTDIvol was 8 mGy for the medium phantom and 19 mGy for the large phantom. The effective mAs values were 685, 417, 200, and 120 for the medium phantom and 1616, 985, 472, and 283 for the large phantom at 70, 80, 100, and 120 kVp, respectively.

Each scan was reconstructed with FBP and with a recent MBIR, advanced modelled iterative reconstruction (ADMIRE; Siemens Healthineers), at strength 3 [21]. The I40 and Br40 kernels were applied for MBIR and FBP, respectively. Axial images with a slice thickness of 5 mm and an increment of 2.5 mm were used for evaluation. Eight datasets (four tube voltages × two reconstruction algorithms), comprising a total of 816 axial images, were generated for each phantom size.
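The reconstruction design can be summarised as a small configuration matrix. This is an illustrative sketch: the constant and function names are hypothetical, and the per-dataset image count assumes that all eight datasets of a phantom cover the same scan range and therefore contain the same number of images.

```python
from itertools import product

# Settings taken from the text; names are hypothetical.
TUBE_VOLTAGES_KVP = (70, 80, 100, 120)
RECONSTRUCTIONS = {
    "MBIR": {"algorithm": "ADMIRE", "strength": 3, "kernel": "I40"},
    "FBP": {"algorithm": "FBP", "strength": None, "kernel": "Br40"},
}
SLICE_THICKNESS_MM = 5.0
INCREMENT_MM = 2.5
TOTAL_IMAGES_PER_PHANTOM = 816


def build_datasets(phantom):
    """Return one dataset description per kVp/reconstruction combination."""
    return [
        {
            "phantom": phantom,
            "kvp": kvp,
            "recon": name,
            **settings,
            "thickness_mm": SLICE_THICKNESS_MM,
            "increment_mm": INCREMENT_MM,
        }
        for kvp, (name, settings) in product(TUBE_VOLTAGES_KVP, RECONSTRUCTIONS.items())
    ]


datasets = build_datasets("medium")
# Assumes an equal number of images in every dataset of a phantom.
images_per_dataset = TOTAL_IMAGES_PER_PHANTOM // len(datasets)
```

Under that assumption, each dataset holds 102 axial images.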

Objective image quality

A third-year radiology resident (A.E.) measured CT numbers by placing circular regions of interest (ROIs) in the background parenchyma and in the centre of the 15-mm lesions. The 15-mm lesions were chosen to minimise measurement errors due to partial volume effects. All measurements were performed three times using vendor-specific software (Syngo.via, version VB10A; Siemens Healthineers). The standard deviation of the attenuation of the background parenchyma served as the measure of image noise. The lesion-to-background CNR was calculated as follows:

$$ \mathrm{CNR} = \frac{\mathrm{ROI}(B) - \mathrm{ROI}(L)}{N} $$

where ROI (B) = mean attenuation of the background parenchyma; ROI (L) = mean attenuation of the lesion; N = mean image noise. The CNR values of the 15-mm lesions were averaged to create a mean CNR value for each dataset.
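The CNR calculation above can be sketched in a few lines. This is a minimal illustration, assuming that the three repeated ROI readings per lesion are averaged before the CNR is computed (the text does not state how repeats were combined); the function names and the example HU values are hypothetical.

```python
from statistics import mean


def cnr(roi_background_hu, roi_lesion_hu, noise_hu):
    """Lesion-to-background CNR = [ROI(B) - ROI(L)] / N for a hypodense lesion."""
    return (roi_background_hu - roi_lesion_hu) / noise_hu


def dataset_mean_cnr(measurements):
    """Mean CNR over the 15-mm lesions of one dataset.

    `measurements` maps a lesion ID to repeated readings, each a tuple of
    (background mean HU, lesion mean HU, background SD in HU).
    Repeated readings are averaged first (an assumption).
    """
    per_lesion = []
    for readings in measurements.values():
        roi_b = mean(r[0] for r in readings)
        roi_l = mean(r[1] for r in readings)
        noise = mean(r[2] for r in readings)
        per_lesion.append(cnr(roi_b, roi_l, noise))
    return mean(per_lesion)


# Hypothetical readings for two 15-mm lesions, measured three times each.
example = {
    "lesion_a": [(150, 125, 10), (152, 126, 10), (148, 124, 10)],
    "lesion_b": [(150, 140, 10), (150, 140, 10), (150, 140, 10)],
}
mean_cnr = dataset_mean_cnr(example)
```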

Low-contrast lesion detection

Six radiologists (three board-certified radiologists and three residents, with 12, 9, 7, 4, 3, and 1 years of experience in abdominal CT imaging, respectively) analysed the eight CT datasets of the medium phantom, and a different group of six radiologists (four board-certified radiologists and two residents, with 10, 8, 7, 7, 4, and 4 years of experience in abdominal CT imaging, respectively) analysed the eight datasets of the large phantom. We used two separate reader groups to reduce recall bias due to the large number of similar datasets. Readers scrolled through each dataset and recorded the position and conspicuity grade of every lesion on a reading sheet. Conspicuity was graded in three categories: 1 = perhaps present; 2 = most likely present; 3 = definitely present. Readers were free to adjust the window width and level.

Reading sessions were separated by 1-3 weeks. To reduce recall bias, the order of the datasets was randomised and the geometrical orientation of the images was changed after every reading: (1) reading from the first to the last image; (2) dataset flipped vertically; (3) dataset rotated 90° clockwise; (4) reading from the last to the first image.
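The four presentation orientations listed above map onto simple array operations. This sketch assumes that a dataset is held as a NumPy array of shape (slices, rows, columns) in acquisition order and that "flipped vertically" means an in-plane top-bottom flip; the function name is hypothetical, and the text does not specify how sessions were assigned to the eight datasets.

```python
import numpy as np


def orient_for_session(volume, session):
    """Return the image stack as presented in reading session 1-4.

    volume: array of shape (n_slices, rows, cols), first to last image.
    """
    if session == 1:  # first to last image, unchanged
        return volume
    if session == 2:  # in-plane top-bottom flip (assumed meaning of "vertically")
        return volume[:, ::-1, :]
    if session == 3:  # in-plane rotation by 90 degrees clockwise
        return np.rot90(volume, k=-1, axes=(1, 2))
    if session == 4:  # last to first image
        return volume[::-1]
    raise ValueError(f"unknown session: {session}")
```

Each transform keeps the pixel values unchanged and only alters how the stack is displayed, so lesion positions shift between sessions without affecting image content.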

Subjective image quality

Ten different slices of each CT dataset were anonymised and presented in random order to the same 12 readers, each of whom evaluated the datasets of the phantom size they had read. Thus, every reader evaluated 80 different CT images (10 slices × 8 datasets). For each image, readers graded subjective image noise (grade 1 = unacceptable; grade 2 = above average; grade 3 = average; grade 4 = below average; grade 5 = absent) and image quality (grade 1 = bad, no diagnosis possible; grade 2 = poor, diagnostic confidence substantially reduced; grade 3 = moderate but sufficient for diagnosis; grade 4 = good; grade 5 = excellent) on a five-point scale.
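The randomised presentation and the two five-point scales described above can be sketched as follows. The function name is hypothetical, the slice indices 0-9 are placeholders for the ten selected slices (which slices were chosen is not stated), and Python's `random.Random` stands in for whatever randomisation was actually used.

```python
import random

# Five-point scales as defined in the text.
NOISE_SCALE = {1: "unacceptable", 2: "above average", 3: "average",
               4: "below average", 5: "absent"}
QUALITY_SCALE = {1: "bad, no diagnosis possible",
                 2: "poor, diagnostic confidence substantially reduced",
                 3: "moderate but sufficient for diagnosis",
                 4: "good", 5: "excellent"}


def presentation_order(dataset_ids, slices_per_dataset=10, seed=None):
    """Shuffle all (dataset, slice) pairs into a single random reading order."""
    rng = random.Random(seed)
    items = [(d, s) for d in dataset_ids for s in range(slices_per_dataset)]
    rng.shuffle(items)
    return items


order = presentation_order(range(8), seed=1)  # 8 datasets x 10 slices = 80 images
```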

Statistics

Readers’ marks were compared with the construction plan of the phantom and classified as true positive (TP) or false positive (FP). Missed lesions were counted as false negatives (FNs). For each phantom, the data from the six readers were averaged, and the mean values were used for further statistics. Lesion detection rates and numbers of FPs for the various combinations of peak kilovoltage (kVp) and reconstruction algorithm were compared using Fisher’s exact test. Because the number of FPs was low, we did not use statistical methods designed for free-response data. Inter-observer agreement was assessed using Cohen’s kappa. Subjective image quality and subjective image noise ratings for the various CT protocols were compared using the Wilcoxon matched-pairs test. All analyses were performed using the Statistica software package (Statsoft, Tulsa, OK, USA). Statistical significance was set at P < 0.05.
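To illustrate how two detection rates can be compared with Fisher’s exact test, the sketch below computes the two-sided p-value for a 2 × 2 table of detected (TP) and missed (FN) lesions. It is a self-contained stand-in for the Statistica routine actually used; the function name is hypothetical, and rounding the reader-averaged counts to integers is an assumption.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    probs = [prob(x) for x in support]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-7))
    return min(p_value, 1.0)


# Hypothetical rounded mean counts (TP, FN) out of 45 lesions for two datasets.
p_example = fisher_exact_two_sided([[36, 9], [35, 10]])
```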

Results

Objective image quality

The CT numbers of the background parenchyma and the lesions were comparable for both reconstruction algorithms (P = 0.53) (Table 1). Compared with FBP, MBIR decreased image noise by 23.5%, 21.8%, 26.3%, and 23.5% in the medium phantom and by 26.9%, 25.8%, 27.9%, and 29.1% in the large phantom at 70, 80, 100, and 120 kVp, respectively. In the large phantom, image noise was comparable at 120 and 100 kVp but substantially higher at 80 and 70 kVp with both reconstruction algorithms. Compared with FBP, MBIR increased the CNR by 32.4%, 28.1%, 32.0%, and 27.3% in the medium phantom and by 23.5%, 27.8%, 29.4%, and 33.3% in the large phantom at 70, 80, 100, and 120 kVp, respectively.

Table 1 Objective image quality

In the medium phantom, reducing the tube voltage to 70, 80, and 100 kVp increased the CNR compared with 120 kVp by 60.7%, 46.4%, and 17.9%, respectively, with MBIR and by 54.5%, 45.5%, and 13.6%, respectively, with FBP. In the large phantom, the corresponding CNR increases were 5%, 15%, and 10% with MBIR and 13.3%, 20%, and 13.3% with FBP (Fig. 1).
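The percentages reported above are relative changes against a reference: FBP for the MBIR-versus-FBP comparisons, and 120 kVp within the same reconstruction algorithm for the tube voltage comparisons. A minimal sketch of that arithmetic follows; the function name and input values are hypothetical, not taken from Table 1.

```python
def percent_change(reference, value):
    """Relative change of `value` versus `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Hypothetical CNR values: 120 kVp as reference, 80 kVp as comparison.
cnr_gain = percent_change(2.0, 2.3)
# Hypothetical noise values: FBP as reference, MBIR as comparison
# (a negative result corresponds to a noise reduction).
noise_change = percent_change(20.0, 15.0)
```

Because the reported percentages derive from rounded table values, recomputing them may differ slightly in the last digit.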

Fig. 1

Example of three lesions: (1) a 10-mm lesion with a lesion-to-liver contrast value of 25 HU; (2) a 10-mm lesion with a lesion-to-liver contrast value of 10 HU; and (3) a 15-mm lesion with a lesion-to-liver contrast value of 10 HU. Despite the increased iodine attenuation of the liver parenchyma at lower tube voltages, the lesions were equally detectable with MBIR and FBP at all four tube voltages. Note the higher image noise and decreased conspicuity of lesion (2) in the large phantom.

Subjective image quality

In the medium phantom, MBIR resulted in significantly lower subjective image noise and higher image quality than FBP at all four tube voltages (P = 0.001 for each comparison) (Table 2).

Table 2 Data for subjective evaluation of image noise and image quality

In the large phantom, MBIR also resulted in significantly lower subjective image noise based on pooled data from all four tube voltages (P = 0.029). However, image quality did not differ significantly at any of the four tube voltages (P = 0.79). Notably, subjective image quality parameters were graded higher in the large phantom than in the medium phantom, likely reflecting different image quality preferences of the two reader groups. Lesion conspicuity was graded slightly higher with FBP than with MBIR in the medium phantom (2.38 ± 0.7 vs 2.30 ± 0.7, respectively; P = 0.024) and similarly with both algorithms in the large phantom (2.15 ± 0.8 vs 2.13 ± 0.7; P = 0.44).
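The paired comparison of subjective grades can be sketched with the Wilcoxon signed-rank test. Statistica’s exact implementation is not described, so this sketch uses the common large-sample normal approximation: zero differences are discarded, tied absolute differences receive average ranks with a variance correction, and no continuity correction is applied. Pairing grades of the same image reconstructed with MBIR and FBP is an assumption, and the function names are hypothetical.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties given their average rank, plus tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_matched_pairs(x, y):
    """Wilcoxon signed-rank test (normal approximation); returns (W+, two-sided p)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    ranks, tie_sizes = _average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - sum(t ** 3 - t for t in tie_sizes) / 48
    if var_w <= 0:
        return w_plus, 1.0
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))
```

For small samples with many tied grades, an exact or permutation version of the test may be preferable to this approximation.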

Low-contrast lesion detection

No significant difference in overall low-contrast lesion detection rate was found between tube voltages or reconstruction algorithms (detection rates of 76.7-80.7% for the medium phantom and 56.7-65.2% for the large phantom; P = 0.37-1) (Table 3). The average detection rate was 78.7% and 62.5% for the medium and large phantoms, respectively. Inter-observer agreement was excellent for both phantom sizes (overall κ range, 0.82-0.89). In the medium phantom, false-positive findings were substantially more frequent with MBIR than with FBP, although the difference did not reach statistical significance (P = 0.052). A sub-analysis by lesion size showed, on average, comparable detection rates for both reconstruction algorithms (Table 4, Fig. 2a and b).
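Cohen’s kappa is defined for two raters, so with six readers per phantom one plausible reading of the reported κ range is the range of pairwise kappas. The sketch below assumes that interpretation and assumes per-lesion ratings (for example, 1 = detected, 0 = missed); the function names and example ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters rating the same items."""
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_expected == 1.0:
        # Both raters used one identical category: kappa is undefined;
        # treated here as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def pairwise_kappa_range(ratings_by_reader):
    """Minimum and maximum kappa over all reader pairs."""
    kappas = [cohens_kappa(a, b) for a, b in combinations(ratings_by_reader, 2)]
    return min(kappas), max(kappas)


# Hypothetical detected (1) / missed (0) ratings of four lesions by three readers.
kappa_range = pairwise_kappa_range([[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]])
```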

Table 3 Data for the detection of 45 simulated hypodense lesions for both phantom sizes
Table 4 Number of true-positive findings by diameter and contrast value of the simulated tumours
Fig. 2

The two graphs show the percentage detection rate for each dataset by lesion size in the medium (a) and large (b) phantoms. Note the substantially lower detection rate of 5-mm lesions in the large phantom.

Discussion

Despite improved image quality parameters with MBIR compared with FBP, low-contrast detectability was not significantly improved. The improvement in image quality parameters with MBIR is consistent with the findings of recent studies [22-28]. However, our results contradict those of the only other study evaluating the impact of the same MBIR (ADMIRE; Siemens Healthineers) on low-contrast detectability [19]. Its authors reported a significant improvement in the detection rate of low-contrast lesions at 120 kVp using MBIR at strength 3 compared with FBP. There are two major methodological differences between that study and ours. First, that study used a relatively small phantom (diameter of 16.5 cm) mimicking very lean or paediatric patients. In our study, we simulated medium and large patients (diameters of 30 cm and 40 cm), and our results revealed that patient size had a substantial impact on low-contrast detectability: the average detection rate decreased from 78.7% in the medium phantom to 62.5% in the large phantom. Second, that study evaluated lesion detection accuracy with a two-alternative forced-choice approach and by scoring the total number of visible object groups. In contrast, we simulated a typical clinical scenario in which twelve clinical radiologists with a broad range of experience read the full volumetric stack of images, as is common with clinical CT datasets. We hypothesise that these differences in study design contributed to the discrepancy in low-contrast detection rates.

We also evaluated the impact of different tube voltages on image quality and low-contrast lesion detection. We were particularly interested in lower tube voltages because the third-generation dual-source CT scanner used in this study potentially enables imaging of medium and large patients at 70 and 80 kVp. With earlier scanner generations, these tube voltages were typically restricted to small patients. Changes in tube voltage can greatly influence image noise and CNR, as confirmed in our study. Our phantom design, with a higher iodine content in the background than in the hypoattenuating lesions, augmented the beneficial effect of low tube voltages on lesion-to-background CNR. Lower tube voltages of 70, 80, and 100 kVp increased CNR in both phantom sizes compared with 120 kVp (by 16-58% in the medium phantom and 9-18% in the large phantom, averaged across both reconstruction algorithms). In the medium phantom, image noise was relatively constant within each reconstruction algorithm. In the large phantom (40 cm), however, the larger diameter caused a marked increase in attenuation at lower kVp settings and hence a marked increase in image noise. This observation confirms that, in large patients, imaging at very low kVp settings can still increase image noise disproportionately, despite the very high tube current output available on the third-generation dual-source CT scanner used in this study.

Only a few studies have evaluated the effect of reducing the tube voltage on diagnostic accuracy in abdominal CT. Beneficial effects of low tube voltages on the detection or conspicuity of hypovascular and hypervascular lesions have been reported in phantom and clinical studies [10, 11, 29-32]. Despite the overall improvement in CNR at reduced tube voltages, low-contrast lesion detection was not significantly altered in our study. This finding might be explained by the low iodine concentration of the hypodense lesions in our phantom. At a given tube voltage, contrast enhancement is proportional to iodine concentration [33]. Although the CT numbers of the background parenchyma increased at lower tube voltages, the attenuation of the hypodense lesions changed only slightly. Nevertheless, the low iodine concentration of our lesions simulates a realistic physiological scenario, given that such lesions are typically hypovascular and exhibit low iodine uptake in vivo. Further studies are needed to determine the difference in iodine concentration between lesion and background above which imaging at low tube voltages becomes beneficial.
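The proportionality cited above can be written as a simple linear model. This is a sketch, assuming that lesion and background share the same non-iodine baseline attenuation and differ only in iodine concentration (c_B and c_L for background and lesion); k(kV) is an illustrative symbol for the enhancement per unit iodine concentration, which increases as the tube voltage decreases, and N is image noise as in the CNR definition.

$$ \mathrm{ROI}(B) - \mathrm{ROI}(L) \approx k(kV)\,(c_B - c_L), \qquad \frac{\mathrm{CNR}(kV_{\mathrm{low}})}{\mathrm{CNR}(kV_{\mathrm{high}})} \approx \frac{k(kV_{\mathrm{low}})}{k(kV_{\mathrm{high}})} \cdot \frac{N(kV_{\mathrm{high}})}{N(kV_{\mathrm{low}})} $$

Under this model, the absolute gain in lesion-to-background contrast from lowering the tube voltage, [k(kV_low) − k(kV_high)](c_B − c_L), scales with the iodine concentration difference, whereas the net effect on CNR also depends on how much the noise N rises at the lower tube voltage.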

Another notable finding of our study is that iterative reconstruction algorithms may also have disadvantages. Although MBIR significantly improved subjective image noise in both phantom sizes and image quality in the medium phantom, lesion conspicuity in the medium phantom was graded slightly lower with MBIR than with FBP. In addition, the number of false-positive findings was substantially higher with MBIR in the medium phantom (on average, five FPs with MBIR vs two with FBP). This difference approached statistical significance (P = 0.052), and we believe that significance would likely have been reached with a higher lesion count.

Our results agree with some previous studies evaluating the impact of IRs on diagnostic accuracy but contrast with several others [14, 15, 34]. These conflicting results reflect the wide differences in study design. In particular, phantom studies differ substantially in lesion diameter, lesion-to-background contrast, phantom shape and size, and the clinical task investigated. This raises the question of whether direct comparisons among these studies are meaningful. In addition, a comprehensive framework that fully encompasses image quality and diagnostic accuracy in a clinical context is currently lacking. As noted by Fletcher et al. [35], image quality parameters such as image noise are not necessarily correlated with radiologists’ diagnostic accuracy.

To date, no standardised, objective, and reproducible method is routinely used to assess the impact of novel radiation dose reduction techniques, such as IR, on diagnostic accuracy. Model-observer algorithms could overcome this problem in the future. These algorithms simulate human observer performance and may therefore enable standardised, reproducible evaluation of image quality in a clinical context [36-38].
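To make the model-observer idea concrete, the sketch below implements one of the simplest variants, a non-prewhitening (NPW) matched-filter observer, which scores each region of interest by its dot product with the expected lesion signal and reports the detectability index d′. It rests on strong simplifying assumptions: a known lesion location and profile, white Gaussian noise, and hypothetical ROI size, lesion contrast and noise values. Real CT noise is spatially correlated and iterative reconstruction alters its texture, which is why more elaborate observers (e.g. channelized Hotelling observers) are often used for task-based evaluation. All function names are hypothetical.

```python
import numpy as np


def disk_signal(size, radius_px, amplitude_hu):
    """Expected lesion signal: a uniform disk centred in a size x size ROI."""
    yy, xx = np.mgrid[:size, :size]
    centre = (size - 1) / 2
    return amplitude_hu * (((yy - centre) ** 2 + (xx - centre) ** 2) <= radius_px ** 2)


def npw_dprime(template, present, absent):
    """Detectability index d' of a non-prewhitening matched-filter observer."""
    w = template.ravel()
    t_present = present.reshape(len(present), -1) @ w  # one score per ROI
    t_absent = absent.reshape(len(absent), -1) @ w
    pooled_sd = np.sqrt(0.5 * (t_present.var(ddof=1) + t_absent.var(ddof=1)))
    return (t_present.mean() - t_absent.mean()) / pooled_sd


def simulate_rois(rng, signal, n_images, noise_sd_hu, background_hu=150.0):
    """Hypothetical ROIs: uniform background + white Gaussian noise + signal."""
    noise = rng.normal(0.0, noise_sd_hu, size=(n_images,) + signal.shape)
    return background_hu + noise + signal


rng = np.random.default_rng(0)
results = {}
for contrast_hu in (-10.0, -2.0):  # hypodense lesions, hypothetical contrasts
    signal = disk_signal(size=15, radius_px=4, amplitude_hu=contrast_hu)
    present = simulate_rois(rng, signal, 200, noise_sd_hu=20.0)
    absent = simulate_rois(rng, np.zeros_like(signal), 200, noise_sd_hu=20.0)
    results[contrast_hu] = npw_dprime(signal, present, absent)
```

For white noise, the expected NPW d′ equals the absolute lesion contrast multiplied by the square root of the number of lesion pixels (49 here), divided by the noise standard deviation, i.e. about 3.5 and 0.7 for these hypothetical settings; the simulated estimates scatter around those values.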

Our study had several limitations. First, our phantom was cylindrical and therefore did not precisely reflect the geometry of the human abdomen. Second, we evaluated only one iterative reconstruction algorithm from one vendor, did not compare different generations of iterative reconstruction algorithms, and evaluated only a single strength level. Investigating additional strength levels would not have been practicable because of recall bias. Third, readers may have recalled the locations of the lesions between sessions. We minimised this bias by changing the geometrical orientation of the datasets and by separating the reading sessions by 1-3 weeks.

In conclusion, the improvement in quantitative image quality parameters with MBIR compared with FBP at different tube voltages did not translate into significantly improved low-contrast lesion detectability in simulated abdominal CT of medium-sized and large patients. Although image noise and CNR are well-accepted parameters for grading quantitative image quality, they do not fully capture the factors that determine diagnostic accuracy, which limits their value as a basis for dose optimisation in CT. In the future, model-observer algorithms could replace crude measurements of quantitative image quality parameters by enabling benchmarking of image quality and diagnostic accuracy in a clinical context.