Introduction

The use of positron emission tomography (PET) in combination with fluorine-18 fluorodeoxyglucose (18F-FDG) has been expanding rapidly in the past few years and has now been established as a major tool for diagnosis and evaluation of response to therapy in oncology [1]. Most PET studies carried out today in the field of oncology involve the acquisition of whole-body scans, the majority of which are carried out in the two-dimensional (2D) mode. However, recently, with the advent of new detector technology and the implementation of model-based scatter correction techniques in the majority of commercial PET systems, there is increasing interest in the clinical performance of three-dimensional (3D) studies for whole-body PET imaging. 3D PET leads to higher overall system sensitivity, through an increase by a factor of more than 8 in the overall number of detected coincidences. This increased system sensitivity should in turn lead to reduced statistical noise and an improved signal-to-noise ratio in the reconstructed images. However, the removal of the interplane septa required for 3D acquisition is also associated with the detection of a larger fraction of erroneous coincidences such as scatter and randoms, which subsequently contribute to background noise and a reduction in the quantitative accuracy of 3D PET images [2, 3].

Although the advantages of 3D over 2D PET under brain imaging conditions have been clearly demonstrated, there is currently an ongoing debate on the value of 3D acquisitions in improving the quantitative and qualitative accuracy of whole-body PET. A number of phantom studies and performance assessments of current BGO-based PET systems have demonstrated improved noise equivalent count (NEC) rates for 3D in comparison to 2D PET [4–6]. In addition, a recent simulation-based lesion detection observer study considering similar injected activity for both modes of operation has shown improved detection performance for 3D PET [7]. However, all these studies have assumed the ability to perform “perfect” corrections for the contamination from scatter and random coincidences, and the absence of any bias associated with performing such corrections. The validity of these assumptions can be critically questioned in clinical practice, considering the approximate nature of scatter correction algorithms and the statistical noise introduced by the rejection of random coincidences, constituting in general more than 50% of the acquired datasets in 3D PET.

On the other hand, phantom and limited clinical comparison studies carried out to date have suggested no real improvement in image quality through the use of 3D in comparison to 2D PET [8–10]. These results were obtained considering similar emission acquisition times for both modes of operation. At the same time, recent clinical studies with 3D-only devices employing new crystal technology have demonstrated the ability to reduce times of emission acquisition by >50% in comparison to typical 2D protocols currently implemented using BGO systems [11, 12]. In the absence of large-scale clinical evaluations, questions still remain regarding the clinical impact of 3D PET with BGO-based systems, and more specifically whether it can lead to faster times of emission acquisition in comparison to 2D without any loss in image quality and lesion detectability.

Finally, the axial system sensitivity profile of a 3D acquisition is reduced more rapidly at the edges of the axial field of view (aFOV) in comparison to 2D. As a result it is generally necessary to ensure a larger slice overlap between successive aFOVs in 3D whole-body acquisitions, potentially increasing the time of acquisition necessary to cover the same length of patient.

The objectives of this prospective study have been to compare 2D versus 3D FDG PET whole-body imaging in terms of clinical image quality using a dedicated BGO PET scanner. A number of parameters such as (1) activity present at the start of imaging, (2) emission acquisition duration and (3) overlap between successive aFOVs were varied in order to assess their effect on the quality of the 3D whole-body images.

Materials and methods

Acquisition

A total of 70 patients (62±21 years) were included in the study. Average patient weight was 66.8±14.8 kg, ranging from 45 to 96 kg. Both 2D and 3D datasets of the same patient were acquired using the GE Advance PET scanner. This is the same PET system which forms part of the combined GE Discovery LS PET/CT scanner. 2D whole-body PET scans, from the upper torso to the lower abdomen, were performed within 55–70 (59.8±4.1) min of injection of an average of 355.2±18.5 MBq of 18F-FDG. All patient acquisitions were performed with the patient’s arms at their side. The emission time per bed position for 2D acquisitions was according to standard clinical protocols [13] and fixed at 5 min for all patient scans (20–25 min per whole-body scan), while a one-slice overlap between successive aFOVs was used.

The 3D acquisitions were carried out either before or after the 2D whole-body scans, resulting in an activity present at the start of the 3D whole-body acquisitions ranging from 159.1 to 273.8 MBq (212.2±38.7 MBq). Patients were randomly selected for 3D scans prior to 2D and vice versa, with a constraint being the inclusion of at least 30 patients in each of the two groups, resulting in a total of 31 patients having 3D whole-body scans performed prior to 2D. 3D scans started between 45 and 80 (62.4±13.5) min post injection. A variable emission duration of 3 min (24 patients), 4 min (26 patients) and 5 min (20 patients) per aFOV was used. Patients were placed in each of the three groups in a random fashion, with a sole constraint being the inclusion of at least 20 patients in each of the subgroups considered. For each of the emission durations per aFOV considered in 3D, patient scans were equally split between those performed before and those performed after 2D in order to investigate the effect of activity present at the start of acquisition and eliminate any comparison bias as a result of 2D and 3D scanning for the same patient at different time points post injection. In addition, variable overlaps between successive aFOVs were used, ranging from five (23 patients) through seven (24 patients) to nine (23 patients) slices. Similar rules were used in randomly allocating patients to each of the three subgroups of axial slice overlap.

A single transmission acquisition was carried out before the start of the emission scanning using two rotating 68Ga/68Ge rod sources. The duration of transmission acquisition was 3 min per aFOV [13], and acquisition was performed using the 2D emission acquisition parameters in terms of axial coverage. Transmission maps were resampled after their reconstruction prior to their use for attenuation correction of the 3D emission datasets. This was necessary to account for the variable slice overlap used during the 3D emission acquisitions and so match the 3D emission whole-body axial range.

Image reconstruction

Transaxial emission images of 4.3×4.3×4.25 mm³ (matrix size 128×128, 35 slices per aFOV) were reconstructed from both 2D and 3D acquired datasets. In the case of 2D, ordered subsets expectation maximisation (OSEM) with two iterations and 28 subsets was used. Fourier rebinning followed by a weighted least squares algorithm (three iterations and 32 subsets) was used to reconstruct the 3D whole-body images [14]. The parameters used in the implementation of the algorithms for the reconstruction of both 2D and 3D datasets have been optimised in previous studies [13, 14]. Transmission data were corrected for post-injection emission contamination and processed using an unsupervised segmentation algorithm prior to their use for the attenuation correction of the emission data. Both 2D and 3D datasets were corrected for the presence of random coincidences using a delayed coincidence time window methodology. Bergstrom’s Gaussian fit and Ollinger’s single scatter model-based algorithms, both implemented on the GE Advance by the manufacturer, were used to correct the 2D and 3D acquired datasets respectively for the presence of scatter coincidences [15, 16].

Image analysis

Reconstructed 2D and 3D whole-body images were presented to three experienced nuclear medicine physicians in a random order. The 2D whole-body images presented to the observers were adjusted in order to cover the same part of the body as the 3D images, thereby considering the differences arising from the variable slice overlap used during the 3D acquisitions.

All of the whole-body images considered (140 in total) were presented to each of the physicians in a total of five sessions, with an interval of at least 3 weeks between sessions in order to minimise any bias through recognition of individual patient datasets. In addition, the 2D and 3D images of the same patient were always presented in different reporting sessions. The observers were allowed to threshold the images according to their preference, and no patient history information was made available during reporting.

Each reader graded the images in terms of image artefacts, confidence in lesion reporting and overall image quality. For image artefacts and overall image quality, a non-continuous scale between 1 and 9 was used (1, unacceptable; 3, poor; 5, acceptable; 7, good; 9, excellent) [17]. For reporting confidence, a similar scale between 1 (uncertain) and 9 (high confidence) was employed. The category of image artefacts comprised well-defined artefacts, including lines as a result of insufficient slice overlap between consecutive aFOVs and distortions around high- or low-activity regions. Confidence in lesion reporting was based on the confidence with which a reader was able to attribute a high activity concentration, excluding normal variants, to a true signal (lesion), as distinct from artefactual uptake or noise. Overall image quality was assessed by considering the presence or absence of image artefacts and their effects on reporting confidence as well as image noise manifested, for example, by the level of smoothness or graininess over the liver, where large areas of uniform normal activity uptake are present [11].

Statistical analysis

Inter-reader variability was assessed using kappa statistics, with kappa values of >0.8 and 0.61–0.80 representing almost perfect and substantial agreement respectively [18]. Pearson’s correlation coefficient was used to establish statistically significant correlations between scores in the different categories and other parameters such as activity present at start of scanning, patient weight and body mass index (BMI). Differences between various correlation coefficients were compared using z statistics. Scores obtained from each observer were analysed separately using a two-sided paired t test [19]. Given the good agreement revealed by the kappa values, the pooled data from the three observers were also analysed in the same fashion. The significance level was set at 5% (p<0.05) for all tests.
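The statistics named above can be illustrated with a minimal Python sketch. This is not the analysis code used in the study: the function names are hypothetical, the kappa shown is the unweighted two-rater form (the exact kappa variant used is not stated here), and the z comparison assumes two independent samples, which is a simplification when both correlations come from the same patients.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for matched scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        raise ValueError("all paired differences are identical")
    return mean / math.sqrt(var / n), n - 1


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se
```

The t statistic and z statistic returned here would still need to be converted to two-sided p values (for example with a statistics package) before comparison with the 5% threshold.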

Results

Examples of 2D and 3D whole-body images for two different patients are shown in Fig. 1, while Fig. 2 contains coronal slices demonstrating image quality corresponding to different points of the non-continuous scale used in the evaluation. The concordance of the readers’ opinions was good for all three figures of merit considered, with kappa values summarised in Table 1.

Fig. 1

Two comparative examples of 2D and 3D whole-body images of the same patient. Patient 1 (BMI 24.4), 366 MBq injected. a 3D scan, start of acquisition at 80 min p.i., 4 min of emission/aFOV, seven slice overlap. b 2D scan, start of acquisition at 60 min p.i. Patient 2 (BMI 21.6), 355 MBq injected. c 3D scan, start of acquisition at 50 min p.i., 3 min of emission/aFOV, five slice overlap. d 2D scan, start of acquisition at 70 min p.i.

Fig. 2

Patient images demonstrating different image quality levels within the scale used in our evaluation. a “3”: patient weight, height and BMI of 95 kg, 188 cm and 26.9 respectively; injected activity of 307 MBq; 3D acquisition starting at 55 min p.i.; 3 min/aFOV; five slice overlap. b “5”: patient weight, height and BMI of 82 kg, 170 cm and 28.4 respectively; injected activity of 376 MBq; 2D acquisition starting at 68 min p.i. c “7”: patient weight, height and BMI of 75 kg, 174 cm and 24.8 respectively; injected activity of 359 MBq; 3D acquisition starting at 77 min p.i.; 4 min/aFOV; five slice overlap. d “9”: patient weight, height and BMI of 43 kg, 159 cm and 17 respectively; injected activity of 340 MBq; 2D acquisition starting at 58 min p.i.

Table 1 Inter-observer variability for image quality and image artefact scores as assessed by kappa statistics

Table 2 contains a summary of the image quality, image artefacts and reporting confidence scores from one of the observers for the 2D and 3D acquisitions of variable emission duration. Similar results in terms of significant/non-significant differences between 2D and 3D image scores were obtained in the separate analyses for the other two observers as well as from the pooled data. Statistically significant differences in overall image quality between 2D and 3D whole-body images were found with 3D emission acquisitions of 3 min per aFOV. In addition, the above statistically significant results were independent of the activity present at the start of the acquisition for the 3D datasets. This activity was calculated considering the decay of the known injected activity. No significant differences in reporting confidence or image artefacts were seen between the two modes of operation for any of the emission acquisition times considered in 3D. However, a larger difference between 2D and 3D image quality was observed with increasing patient weight. Worse overall image quality was seen on 3D whole-body images in comparison to 2D with a patient weight >70 kg. These differences, although not statistically significant, were present for all the different emission acquisition times considered (for all 70 patients, p=0.134 for a weight <70 kg and p=0.069 for >70 kg).
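The decay calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming pure physical decay of 18F with a half-life of about 109.77 min and ignoring any loss of activity through excretion; the function name and the input values in the usage line are hypothetical and are not taken from the study records.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def activity_at_scan_start(injected_mbq, minutes_post_injection,
                           half_life_min=F18_HALF_LIFE_MIN):
    """Activity (MBq) remaining after physical decay only."""
    if minutes_post_injection < 0:
        raise ValueError("scan cannot start before injection")
    # A(t) = A0 * 2^(-t / T_half)
    return injected_mbq * 2.0 ** (-minutes_post_injection / half_life_min)


# Illustrative values only: 355 MBq injected, scan starting 60 min later.
print(f"{activity_at_scan_start(355.0, 60.0):.1f} MBq")
```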

Table 2 Average image quality, image artefacts and reporting confidence scores for one of the observers, as a function of mode of operation (2D or 3D) and time of emission acquisition in 3D

Statistically significant inverse correlations were found between both 2D and 3D image quality and reporting confidence scores and patient weight, activity at the start of scanning normalised to patient weight, and patient BMI. A summary of the correlation coefficients given in Table 3 shows that although differences were not statistically significant, larger inverse correlations were seen between image quality scores and patient weight in comparison to other parameters. There were no significant differences in these correlation results between the three different emission acquisition times considered in 3D.

Table 3 Summary of the coefficients of correlation between 2D or 3D whole-body image quality and patient weight, activity at the start of scanning with and without normalisation to patient weight and patient BMI

Finally, no differences in image quality or artefact scores were found between the use of five, seven or nine axial slice overlaps in 3D whole-body acquisitions. The majority of 3D image artefacts seen were in the form of cold areas surrounding hot lesions. No significant line artefacts were seen in either 2D or 3D whole-body images irrespective of the emission duration per aFOV. Image artefact scores did not correlate with any other parameters considered, such as patient weight or BMI.

Discussion

The present study has concentrated on a qualitative comparative evaluation of clinical whole-body images acquired in 2D and 3D using a dedicated BGO-based PET scanner. No attempt was made to provide a gold standard in terms of accuracy of lesion detectability by either simulating lesions or providing evidence of pathology. Instead, the purpose of the study was to compare general image quality and lesion detectability in terms of reporting confidence between the two acquisition modes. Our results clearly demonstrate that operating in 3D mode using such a system can lead to whole-body images with similar but not superior overall quality to that achievable in 2D. Substantial inter-observer agreement was obtained in the comparative evaluation of image artefacts, confidence in lesion reporting and overall image quality for the two different modes of acquisition.

Considering the emission acquisition duration, statistically significant differences in terms of overall image quality were detected for the 3 min per aFOV 3D acquisitions, although no statistically significant differences were seen in the corresponding categories of confidence in lesion reporting and image artefacts. These results, which were independent of the activity present at the start of the 3D acquisition, suggest that although the overall image quality of 3 min per aFOV in 3D may not match that of 5 min in 2D mode, it does not appear to significantly affect lesion detectability and therefore may be acceptable.

To date, a very limited number of clinical studies have been performed in the evaluation of whole-body 3D PET using BGO-based scanners. Our results are in good agreement with those reported by Lodge et al., who found no significant differences between 2D and 3D PET in terms of image quality and lesion detectability. This comparison was based on ten clinical whole-body FDG PET studies carried out using an ECAT EXACT 47 and an interleaved imaging protocol with an average activity of 150–180 MBq at the start of scanning, a range that was considered optimal for 3D imaging using this particular system [9].

On the other hand, we obtained a good correlation between overall image quality and patient weight, patient BMI and activity present at the start of acquisition normalised to patient weight for the 2D whole-body images. Significant correlations, although smaller in magnitude, were also found for all these parameters with 3D images, independent of time of emission acquisition per aFOV. These results are in agreement with recent clinical studies that have demonstrated the need for a weight-adjusted emission duration acquisition protocol in 3D PET [11]. An alternative solution will be the use of injected activities adjusted for patient weight, although there may be problems associated with such a solution. In patients with a large BMI, higher injected activities may not completely compensate for loss in image quality, since patient weight does not clearly reflect body composition, which in turn can have a direct effect on parameters such as scatter and attenuation. This hypothesis is supported by Everaert et al., who demonstrated a negative correlation between BMI and patient image quality irrespective of the activity injected per kilogram [12]. In addition, such a solution may be hampered by legislation suggesting an upper limit to the allowable injected activity (for example a recommended maximum cut-off value of 400 MBq for standard 18F-FDG imaging in the United Kingdom). Such a limit may reduce any potentially beneficial effects of a higher injected activity for patients above 70–80 kg, in whom a larger deterioration in image quality has been seen.

In addition, a larger deterioration in image quality with increasing patient weight was observed with 3D than with 2D images. This difference, although detectable, was not statistically significant, potentially as a result of the limited number of patients with a weight >85 kg in our study. El Fakhri et al. demonstrated deterioration in lesion detectability on 3D images relative to 2D images with increasing patient BMI in a study based on the use of numerical observers for the detection of simulated spherical lesions inserted in clinical patient whole-body studies [20]. Similar results were obtained in phantom studies carried out by Kadrmas et al. [21]. Both these studies considered the use of BGO-based dedicated PET systems.

Our results with respect to image artefacts demonstrate that the axial sensitivity profile of the GE Advance PET system in the 3D mode of operation can be handled using five slice overlaps in comparison to the one slice used in 2D. These results are directly applicable to the GE Discovery LS combined PET/CT system. This slice overlap minimises the impact of the 3D axial sensitivity profile on the overall acquisition time required to cover the same patient length as in 2D. Considering a 3D whole-body acquisition comprising five aFOVs (35 transaxial slices in each) and 4 min per aFOV, the effect of the five transaxial slice overlap will be to increase the overall emission acquisition time by <2.5 min (<15% of the overall study time). The majority of the image artefacts in the 3D images were related to the presence of zeros around the areas of high activity uptake. It is thought that these artefacts are due to approximations in the 3D scatter distribution [16, 17].
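The overlap estimate given earlier in this paragraph can be reproduced with a short calculation. The sketch below is an assumption about how such a figure may be derived, not the authors' stated method: it counts the distinct slices covered by five bed positions with a one-slice overlap, then asks how many bed positions (allowing a fractional value, which is an idealisation) are needed to cover the same length with a five-slice overlap. The function names are hypothetical.

```python
def slices_covered(n_beds, slices_per_bed, overlap):
    """Distinct slices covered by n_beds positions sharing `overlap` slices."""
    return slices_per_bed + (n_beds - 1) * (slices_per_bed - overlap)


def extra_emission_time(n_beds, slices_per_bed, min_per_bed,
                        overlap_ref, overlap_new):
    """Extra minutes, and fraction of emission time, needed with overlap_new
    to cover the axial length that n_beds positions cover with overlap_ref."""
    target = slices_covered(n_beds, slices_per_bed, overlap_ref)
    step = slices_per_bed - overlap_new  # new slices gained per extra bed
    beds_needed = 1 + (target - slices_per_bed) / step  # may be fractional
    extra_min = (beds_needed - n_beds) * min_per_bed
    return extra_min, extra_min / (n_beds * min_per_bed)


# Five aFOVs of 35 slices, 4 min each, one-slice versus five-slice overlap.
extra, fraction = extra_emission_time(5, 35, 4.0, 1, 5)
print(f"{extra:.2f} min extra ({fraction:.1%} of emission time)")
```

Under these assumptions the extra time comes out at a little over 2 min, or roughly 11% of the 20 min emission time, which is consistent with the bounds of <2.5 min and <15% quoted above. If only whole bed positions can be acquired, one additional aFOV would cost the full 4 min.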

In conclusion, our results suggest that 3D acquisitions with BGO systems cannot lead to a significant reduction (considering variable patient weight and BMI) in the overall study acquisition time in comparison to 2D acquisitions without an associated compromise in the overall image quality. A 3D-associated reduction of 20% in the emission acquisition time will always need to be weighed against an increase in the overall acquisition time to cover the same patient length (due to the need for a larger overlap between successive aFOVs) as well as a potential compromise in quantitative accuracy. On the other hand, the number of patients included in our study, although larger than in any other study carried out to date, does not allow us to claim equivalence of 3D (4 min/5 min) and 2D (5 min) image quality. However, the fact that no significant differences in image quality or reporting confidence were observed when comparing 3D and 2D whole-body images, at similar overall times of acquisition, is in agreement with the results of other clinical studies [9, 10], and in contrast to expected improvements in signal-to-noise ratio and overall quality in reconstructed images for acquisitions in the 3D mode. The expected improvement in signal-to-noise ratio is based on the higher number of true coincidences available in 3D and the associated increase in the overall system sensitivity and NEC rate [4–6]. However, NEC rate is only one of the factors that may influence image quality since it does not account for differences in image noise between 2D and 3D acquisition modes. Other parameters that may influence the outcome of such comparative studies are the detector resolution, image reconstruction and biases associated with the correction algorithms used to minimise the impact of random and scatter coincidences on the reconstructed images. A combination of these parameters may account for differences in the literature with respect to true lesion detectability performance of different BGO-based PET systems for 3D whole-body imaging.
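For reference, the NEC rate discussed above is commonly defined as T²/(T + S + kR), where T, S and R are the true, scatter and random coincidence rates. The sketch below uses that standard definition; the choice k = 2 is an assumption corresponding to randoms subtraction with a noisy delayed-window estimate, and the rates in the usage line are illustrative numbers, not measurements from this study.

```python
def nec_rate(trues, scatter, randoms, k=2.0):
    """Noise equivalent count rate: T^2 / (T + S + k*R).

    k = 2 assumes randoms are subtracted using a noisy delayed-window
    estimate; k = 1 assumes a noiseless randoms estimate.
    """
    total = trues + scatter + k * randoms
    if total <= 0:
        raise ValueError("total count rate must be positive")
    return trues ** 2 / total


# Illustrative rates in kcps (not from the study).
print(nec_rate(100.0, 50.0, 50.0))
```

The formula makes explicit why a gain in true coincidences in 3D can be partly offset by the accompanying rise in scatter and randoms, and also why NEC alone says nothing about bias introduced by the corrections themselves.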

However, there is widespread agreement that overall image quality and lesion detectability decrease with increasing patient weight for both 2D and 3D modes of operation. For patient weights exceeding 70–80 kg, a faster deterioration has generally been reported for 3D whole-body images than for 2D.

Conclusion

The use of current BGO-based dedicated systems in 3D mode for whole-body oncology acquisitions does not facilitate a significant reduction in the overall acquisition time without an associated reduction in the overall image quality in comparison with that obtained in 2D. An emission acquisition duration per aFOV of 4 min can be used during 3D imaging without compromising image quality relative to 2D (5 min/aFOV), independently of patient weight. Finally, the axial sensitivity profile in the 3D mode of operation can be handled using five slice overlaps in comparison to the one slice used routinely in 2D, minimising, for this particular system geometry and detector configuration, the impact of this parameter on the overall time of emission acquisition in 3D.