Background

Over the last 20 years, 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) has played an increasing role in the management of cancer patients, and standardized uptake values (SUVs) have been increasingly used as non-invasive quantitative imaging biomarkers in oncology [1–5]. However, to use SUV as a biomarker in multicentre trials or in sites equipped with multiple PET scanners, adequate reproducibility is required, so that SUVs are comparable regardless of the PET system used.

Hardware and software evolutions can lead to important device-dependent and reconstruction-dependent variations in SUVs. For example, point-spread-function (PSF) reconstruction, which is available from the three major PET vendors and improves spatial resolution throughout the entire field-of-view [6–10], has been shown to significantly improve the diagnostic performance of 18F-FDG PET in various malignancies and to substantially increase SUVs as compared to the former generation OSEM algorithm [11–13]. More recently, Bayesian penalised likelihood (BPL) reconstruction, which can also include PSF modelling, has been reported to increase SUVs and to improve signal-to-noise ratios as compared to the standard PET reconstructions [14–16].

Adequate reproducibility of SUVs can be achieved by harmonizing patient preparation as well as acquisition and reconstruction parameters [17, 18], as recommended by the European Association Research Ltd (EARL) accreditation program [19], the North American Quantitative Imaging Biomarker Alliance (QIBA), and Uniform Protocols in Clinical Trials (UPICT) [20]. With regard to reconstruction-dependent variations, it has recently been shown that SUVs produced by an advanced reconstruction algorithm can be harmonized to meet standards such as those recommended by the European Association of Nuclear Medicine (EANM) by applying a filter during the reconstruction [21]. This method, which is recommended by the EARL accreditation program, requires the reconstruction of two data sets: one for optimal lesion detection and one for harmonized quantification, with the filter chosen from NEMA NU-2 phantom measurements so that activity concentration recoveries are as close as possible to those recommended by EANM guidelines. To avoid the reconstruction of two data sets, a proprietary software solution, marketed as EQ.PET (Siemens, Oxford, UK), has been developed to allow optimal lesion detection and harmonized quantification from a single data set [22, 23]. This software displays the reconstruction that provides optimal lesion detection for diagnostic interpretation together with harmonized SUV results (Fig. 1). While the use of a second reconstruction and the EQ.PET technology have both been validated in large series of tumor lesions [21, 23], they have not yet been compared directly with each other to test for inconsistencies. Indeed, as EQ.PET is a patented automatic software working “behind the scenes”, with no possibility for the imaging specialist to check the adequacy of region of interest placement, there is a need to validate this technology against a gold standard.

Fig. 1 Illustration of reconstruction harmonization methods and summary of the main factors influencing SUV values

Thus, the present study aimed to compare the two harmonization strategies described above, both of which enable the use of newer reconstruction algorithms that improve lesion detection while maintaining compatibility of SUVs with older systems. As SUV is mainly used for therapy assessment, we mimicked a situation in which a patient would undergo pre- and post-treatment scans on different generation PET systems by reconstructing the same raw PET data with an ordered subset expectation maximization (OSEM) algorithm known to meet EANM requirements and a PSF reconstruction designed for optimal tumor detection. A filter was then applied to the PSF reconstruction to fulfil EANM requirements, either by reconstructing a second data set or by using the EQ.PET methodology. The OSEM reconstruction and the use of a second data set, which both meet the EANM and EARL requirements, were chosen as standards of reference for SUV quantification and harmonization. We focused on SUVmax and SUVpeak, the two most frequently used SUV metrics [24, 25]. As reconstruction dependency is not the only source of SUV variability, other technical and biological parameters, as well as compliance with EANM guidelines for PET tumor imaging, were also analyzed.

Methods

Calibration and cross calibration

The calibration of the PET system was performed daily with a 68Ge cylinder with a known radioactive concentration.

The cross-calibration procedure was performed once during the present study. A solution of 18F-FDG (70.2 MBq, as assessed by the dose calibrator) was introduced into a cylindrical phantom of exactly known volume, which was then filled with water, resulting in a solution of exactly known activity concentration. A two-bed acquisition of the phantom was performed, and images were reconstructed with attenuation and scatter corrections identical to those used in patient studies. Twelve VOIs were drawn on consecutive axial slices to determine the average activity concentration of 18F-FDG within the phantom. The cross-calibration factor was calculated as the ratio of the activity concentration measured on the PET images to the true activity concentration, and was found to be 1.003.
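The cross-calibration ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used on the scanner: the function name, the phantom volume, and the VOI readings are hypothetical, and the sketch assumes that the calibrator activity has already been decay-corrected to the acquisition time.

```python
from statistics import mean


def cross_calibration_factor(voi_kbq_per_ml, activity_mbq, volume_ml):
    """Ratio of the PET-measured to the true activity concentration.

    voi_kbq_per_ml: mean concentration of each VOI drawn on the phantom images.
    activity_mbq:   activity from the dose calibrator, assumed to be already
                    decay-corrected to the acquisition time.
    volume_ml:      known phantom volume.
    """
    if activity_mbq <= 0 or volume_ml <= 0:
        raise ValueError("activity and volume must be positive")
    true_kbq_per_ml = activity_mbq * 1000.0 / volume_ml  # MBq -> kBq
    measured_kbq_per_ml = mean(voi_kbq_per_ml)
    return measured_kbq_per_ml / true_kbq_per_ml


# Hypothetical example: 70.2 MBq in an assumed 6000 mL phantom, twelve VOIs.
readings = [11.70, 11.75, 11.72, 11.74, 11.71, 11.76,
            11.73, 11.75, 11.72, 11.74, 11.73, 11.75]
factor = cross_calibration_factor(readings, activity_mbq=70.2, volume_ml=6000.0)
print(f"cross-calibration factor: {factor:.3f}")
```

A factor close to 1 indicates that the scanner and the dose calibrator agree; only the 70.2 MBq value in the example is taken from the text.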

In addition, the clocks of the PET acquisition workstation and of the dose calibrator were synchronized weekly.

Patient study

This study included 50 consecutive patients referred to our institution for staging and restaging of lung cancer. Informed consent was waived for this type of study by the local ethics committee (Ref A12-D24-VOL13, Comité de protection des personnes Nord-Ouest III), since the scans were performed for clinical indications and the trial procedures were performed independent of usual clinical reporting. The weight and height of patients on the day of the PET examination were recorded. BMI was computed as follows:

$${\text{BMI}}\; = \;\frac{{{\text{Weight}}\;({\text{kg}})}}{{{\text{Height}}\;({\text{m}})^{2} }}.$$

After a 15-min rest in a warm room, patients who had been fasting for 6 h were injected with 18F-FDG. The injected activity and the exact delay between injection and the start of the acquisition were recorded for each patient.

PET/CT acquisition and reconstruction parameters

All PET imaging studies were performed on a Biograph TrueV (Siemens Medical Solutions) with a 6-slice spiral CT component. Technical details regarding this system can be found elsewhere [6]. CT acquisition was performed first, with the following parameters: 60 mAs, 130 kVp, pitch 1, and 6 × 2 mm collimation. Subsequently, the PET emission acquisition was performed in 3D mode. Patients were scanned from the skull base to the mid-thighs.

In our department, PET images are reconstructed with a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) without filtering. For the purpose of this study, raw data were also reconstructed with the OSEM3D reconstruction algorithm (4 iterations and 8 subsets) and with the same PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) using a 7 mm Gaussian filter (PSF7). As shown in a previous study, this latter reconstruction yields protocol-specific images, with NEMA NU-2 phantom-based filtering, that meet EANM quantitative harmonizing standards, thereby reducing reconstruction-dependent variation in SUVs [21]. The OSEM3D reconstruction parameters were chosen as recommended by the manufacturer. These parameters meet the EANM requirements regarding activity recoveries.

For all reconstructions, matrix size was 168 × 168, resulting in a 4.07 × 4.07 × 4.07 mm voxel size. Scatter and attenuation corrections were applied.
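To give a sense of what a 7 mm Gaussian filter represents on a 4.07 mm voxel grid, the sketch below converts the filter width to a standard deviation and applies the resulting kernel to a one-dimensional profile. It is a generic illustration of Gaussian smoothing, not the vendor's reconstruction filter or the EQ.PET implementation; it assumes that "7 mm" denotes the full width at half maximum (FWHM), and all function names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian FWHM to its standard deviation (same unit)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel(fwhm_mm, voxel_mm, truncate=4.0):
    """Normalised 1D Gaussian kernel sampled on the voxel grid."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    radius = max(1, math.ceil(truncate * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_profile(profile, kernel):
    """Convolve a 1D profile with the kernel, replicating edge values."""
    radius = len(kernel) // 2
    last = len(profile) - 1
    smoothed = []
    for i in range(len(profile)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * profile[j]
        smoothed.append(acc)
    return smoothed


kernel = gaussian_kernel(fwhm_mm=7.0, voxel_mm=4.07)
hot_spot = [0.0] * 5 + [10.0] + [0.0] * 5  # idealised single-voxel lesion
print(f"sigma = {fwhm_to_sigma(7.0):.2f} mm")
print(f"peak after smoothing = {max(smooth_profile(hot_spot, kernel)):.2f}")
```

Because a Gaussian is separable, the same kernel could be applied successively along the three image axes. The example shows why filtering lowers SUVmax most strongly for small, sharply peaked lesions, while a uniform region is left unchanged.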

Only the PSF-reconstructed data without filtering were used for the purpose of diagnostic workup.

PET/CT analysis

The same reader analyzed all PET data on a prototype implementation of the EQ.PET functionality that is available for clinical use in Syngo.via (Siemens Medical Solutions).

The first reconstructions displayed on the screen were the PSF and the OSEM3D reconstructions. The EQ.PET filtered SUVs were calculated behind the scenes, without showing the filtered image. The EQ.PET filter for PSF reconstruction was set to 7 mm to match the 7 mm kernel of the PSF7 reconstruction. On the PSF reconstruction, VOIs with a 50% isocontour were drawn on primary tumor lesions, on mediastinal and hilar nodes considered to have pathologically increased uptake, and on metastatic lesions. The slice number on which each volume of interest was drawn was noted. The VOIs were then automatically propagated to the OSEM3D reconstruction by re-computing a 50% isocontour at the same location as in the PSF data set. This allowed SUVmax and SUVpeak to be computed for PSF, OSEM3D, and PSFEQ.

The second set of reconstructions displayed on the screen comprised the PSF and PSF7 reconstructions. The previous VOIs with a 50% isocontour were redrawn identically on the PSF reconstruction using the slice numbers previously noted. The VOIs were then automatically propagated to the PSF7 reconstruction by re-computing a 50% isocontour at the same location as in the PSF data set. This allowed SUVmax and SUVpeak to be computed for PSF7 at the same locations as previously used for the PSF and OSEM3D reconstructions.

SUVpeak was defined as the average SUV within a 1-cm3 sphere positioned within the lesion so as to maximise the enclosed average SUV. Background activity was measured as SUVpeak and SUVmean in an automatically placed 3 cm diameter sphere in the right liver lobe.
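The VOI metrics described above can be illustrated with a simplified sketch. It is not the Syngo.via or EQ.PET algorithm: the function names are hypothetical, the 50% isocontour ignores connectivity, and the 1-cm3 sphere is approximated by whole voxels whose centres fall inside the sphere (19 voxels at 4.07 mm), whereas clinical software may use sub-voxel weighting or interpolation.

```python
import math
from itertools import product


def sphere_offsets(volume_mm3, voxel_mm):
    """Voxel offsets whose centres lie inside a sphere of the given volume."""
    radius_vox = (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0) / voxel_mm
    reach = int(radius_vox)
    span = range(-reach, reach + 1)
    return [o for o in product(span, repeat=3)
            if sum(c * c for c in o) <= radius_vox ** 2]


def lesion_metrics(suv, voxel_mm=4.07, iso_fraction=0.5, peak_volume_mm3=1000.0):
    """suv maps (z, y, x) voxel indices to SUV in a region around one lesion."""
    suv_max = max(suv.values())
    # Isocontour VOI: every voxel at or above the given fraction of SUVmax.
    voi = [idx for idx, value in suv.items() if value >= iso_fraction * suv_max]
    offsets = sphere_offsets(peak_volume_mm3, voxel_mm)
    suv_peak = None
    for z, y, x in voi:
        sphere = [(z + dz, y + dy, x + dx) for dz, dy, dx in offsets]
        if not all(p in suv for p in sphere):
            continue  # skip spheres that leave the sampled region
        mean_suv = sum(suv[p] for p in sphere) / len(sphere)
        if suv_peak is None or mean_suv > suv_peak:
            suv_peak = mean_suv
    return {"SUVmax": suv_max, "VOI_voxels": len(voi), "SUVpeak": suv_peak}


# Toy image (hypothetical values): uniform background with a small hot lesion.
image = {idx: 1.0 for idx in product(range(7), repeat=3)}
image[(3, 3, 3)] = 10.0
for axis in range(3):
    for step in (-1, 1):
        neighbour = [3, 3, 3]
        neighbour[axis] += step
        image[tuple(neighbour)] = 6.0
print(lesion_metrics(image))
```

In the toy lesion, SUVpeak is much lower than SUVmax because the sphere averages the hot voxels with surrounding background, which is the same averaging that makes SUVpeak less sensitive to noise and reconstruction settings.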

For all lesions, SUVmax and SUVpeak were recorded for PSF, PSFEQ, OSEM3D, and PSF7. For this study, SUVpeak was not corrected for lean body mass.

Finally, the short-axis dimension (mm), as determined on axial CT slices, was recorded for each lesion when applicable.

Statistical analysis

Quantitative data from clinical PET/CT examinations are presented as mean (standard deviation, SD) and median when appropriate. The relationship between PSF, PSFEQ, OSEM3D, and PSF7 quantitative values was assessed with Bland–Altman plots. Ratios between PSFEQ and PSF7 quantitative values (for SUVmax and SUVpeak) according to the patient’s BMI and the location and size of the lesions were compared using the Kruskal–Wallis test for multiple group comparisons or the Mann–Whitney test when appropriate. For lesion size, the ratios between PSFEQ and PSF7 SUVs were divided into three groups (<10, 10–20, and >20 mm). For all tests, a two-tailed P value of less than 0.05 was considered statistically significant. Graphs and analyses were produced using GraphPad software.
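A ratio-based Bland–Altman summary of the kind used here can be sketched as follows. The text does not state how the reported 95% intervals were derived; the sketch assumes the usual limits of agreement (mean ± 1.96 SD of the ratios), which is an assumption rather than the authors' documented method. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_ratio(values_a, values_b):
    """Summarise paired measurements as ratios a/b against their average."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two paired series of at least two values")
    ratios = [a / b for a, b in zip(values_a, values_b)]
    averages = [(a + b) / 2.0 for a, b in zip(values_a, values_b)]
    centre = mean(ratios)
    spread = 1.96 * stdev(ratios)  # assumed 95% limits of agreement
    return {
        "mean_ratio": centre,
        "lower": centre - spread,
        "upper": centre + spread,
        "points": list(zip(averages, ratios)),  # x, y pairs for plotting
    }


# Hypothetical SUVmax values for five lesions on two reconstructions.
psf7 = [4.1, 7.9, 12.3, 3.2, 9.8]
osem3d = [4.0, 7.7, 12.1, 3.3, 9.5]
summary = bland_altman_ratio(psf7, osem3d)
print(f"mean ratio {summary['mean_ratio']:.2f} "
      f"({summary['lower']:.2f} to {summary['upper']:.2f})")
```

Points falling outside the lower and upper limits would correspond to the outliers reported for the Bland–Altman plots.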

Availability of data and materials

The data sets supporting the conclusions of this article are available upon request.

Results

Population characteristics and compliance to guidelines for tumor imaging

Clinical data are summarized in Table 1. The mean (SD) administered 18F-FDG dose was 4.00 (0.16) MBq/kg. The mean (SD) delay between the administration of 18F-FDG and the start of the PET acquisition was 60 (3.24) min. Overall, the EANM 2.0 guidelines for PET tumor imaging were fulfilled in 48/50 patients (96%). At the time of injection, the mean (SD) blood glucose level was 5.70 (1.27) mmol/l.

Table 1 Patient demographics

Validation of the EQ.PET technology and the use of an additional harmonized PET data set to overcome reconstruction dependency of SUVs

Overall, 171 VOIs were drawn over 55 lung lesions (32.2%), 87 lymph nodes (50.9%), and 29 metastases (16.9%). For all lesions combined, the mean SUVmax (SD) was 10.93 (7.03), 6.96 (4.36), 7.05 (4.36), and 7.01 (4.36) for PSF, OSEM3D, PSF7 reconstructions, and PSFEQ, respectively. The mean SUVpeak (SD) was 6.82 (4.29), 5.41 (3.59), 5.60 (3.59), and 5.58 (3.59) for PSF, OSEM3D, PSF7 reconstructions, and PSFEQ, respectively.

As shown in Fig. 2, the mean ratios between PSF and OSEM3D reconstructions for SUVmax and SUVpeak were 1.59 (95% CI: 1.12–2.06) and 1.28 (95% CI: 1.08–1.49), respectively. When using the filtered PSF7 reconstruction, the mean ratios between PSF7 and OSEM3D for SUVmax and SUVpeak were 1.02 (95% CI: 0.93–1.11) and 1.04 (95% CI: 0.95–1.14), respectively. After application of the EQ.PET filter, the mean ratios between PSFEQ and OSEM3D for SUVmax and SUVpeak were 1.01 (95% CI: 0.91–1.11) and 1.04 (95% CI: 0.94–1.14), respectively.

Fig. 2 Validation of the EQ.PET technology and the use of an additional harmonized PET data set to overcome reconstruction dependency of SUVs. From left to right, PSF and OSEM3D values, PSF7 and OSEM3D values, and PSFEQ and OSEM3D values were compared using Bland–Altman plots. Upper panel (a) displays SUVmax values, and lower panel (b) displays SUVpeak values

When comparing PSF7 and PSFEQ standardized uptake values, a perfect correlation was found, with an r² of 1.00 for both SUVmax and SUVpeak values (p < 0.0001). On Bland–Altman analysis, the mean ratios were 1.01 (95% CI: 0.96–1.06) and 1.01 (95% CI: 0.97–1.04) for SUVmax and SUVpeak, respectively (Fig. 3a).

Fig. 3 Pairwise comparison of the EQ.PET technology and the use of an additional harmonized PET data set. The upper panel (a) displays linear regression and Bland–Altman analysis for SUVmax and SUVpeak values of tumoral lesions. The lower panel (b) displays linear regression and Bland–Altman analysis for SUVmean and SUVpeak values of liver background

Outliers of all Bland–Altman plots are summarized in Table 2; 75% of them corresponded to lymph node lesions, with a median size of 9.76 mm. The median size of outlier tumors was 19.39 mm.

Table 2 Details of Bland–Altman plot outliers

Effect of harmonization strategies on the liver background

The mean SUVmean (SD) was 2.36 (0.37), 2.39 (0.37), 2.36 (0.37), and 2.36 (0.37) for PSF, OSEM3D, PSF7 reconstructions, and PSFEQ, respectively. The mean SUVpeak (SD) was 2.90 (0.46), 2.72 (0.43), 2.74 (0.44), and 2.73 (0.44) for PSF, OSEM3D, PSF7 reconstructions, and PSFEQ, respectively.

Bland–Altman plots are displayed in supplemental Fig. 1. The mean ratios between PSF and OSEM3D reconstructions for SUVmean and SUVpeak were 0.99 (95% CI: 0.96–1.02) and 1.06 (95% CI: 1.02–1.11), respectively. When using the filtered PSF7 reconstruction, the mean ratios between PSF7 and OSEM3D for SUVmean and SUVpeak were 0.99 (95% CI: 0.97–1.01) and 1.00 (95% CI: 0.97–1.03), respectively. After application of the EQ.PET filter, the mean ratios between PSFEQ and OSEM3D for SUVmean and SUVpeak were 1.01 (95% CI: 0.99–1.04) and 1.00 (95% CI: 0.97–1.04), respectively.

When comparing PSF7 and PSFEQ, a perfect correlation was found, with an r² of 1.00 for both SUVmean and SUVpeak values (p < 0.0001). On Bland–Altman analysis, the mean ratios were 1.00 (95% CI: 1.00–1.00) and 1.00 (95% CI: 0.99–1.01) for SUVmean and SUVpeak, respectively (Fig. 3b).

Confounding factors

For 16 lesions, the measurement of the short-axis dimension was not applicable: 3 primary lung lesions, 2 pleural metastases, 10 bone metastases, and 1 pulmonary metastasis, with either no visible lesion or blurred contours on computed tomography.

As shown in Fig. 4, the ratios between PSF7 and PSFEQ quantitative values (SUVmax and SUVpeak) did not differ according to the histological type of the primary lesion, the patient’s BMI, or the location of the lesion. When considering the size of the lesion, ratios between PSF7 and PSFEQ quantitative values (SUVmax and SUVpeak) were significantly different between the <10 and >20 mm groups. Mean ratios and their standard deviations tended to be higher for the smallest lesions, but all ratios were found to be within the 1.05 limit.

Fig. 4 Impact of confounding factors. Impact of the histology of the primary lung lesion (a), the BMI (b), the location of the lesion (c), and the lesion size (d) on the ratio between PSF7 and PSFEQ quantitative values (left panels SUVmax and right panels SUVpeak). Note that 16 lesions were not measurable and are, therefore, not included in the “per size” analysis (d)

Discussion

Accurate and reproducible SUVs are mandatory when using these metrics as quantitative imaging biomarkers in multicentre studies. This can be achieved by harmonizing patient preparation as well as acquisition and reconstruction parameters [17, 18], as recommended by the European Association Research Ltd (EARL) accreditation program [19] and other groups [20]. Harmonization of the reconstruction parameters may be challenging, because centres that run PET systems with new reconstruction algorithms and participate in multicentre trials often wish to use their PET system with parameters chosen to achieve optimal lesion detection. In addition, centres using similar PET systems may use different reconstruction or filtering parameters [26], highlighting the need to harmonize quantitative values even between centres running similar equipment.

The present study compared two solutions to harmonize SUV to a given standard; here, the EANM harmonizing standards: (1) the use of two data sets: one for optimal lesion detection and one for harmonized quantification [21] and (2) the use of a proprietary software that has been developed to simultaneously allow optimal lesion detection and harmonized quantification from a single data set [23].

As reconstruction is not the only source of variability in SUV measurements, we analyzed adherence to EANM guidelines for PET tumor imaging, especially with regard to other technical factors likely to hamper the accuracy and reproducibility of SUVs. Compliance was found to be good for clock synchronization, cross-calibration between the PET system and the dose calibrator, standardized injected dose, and uptake time. Despite this good compliance, when mimicking the situation in which a patient would undergo pre- and post-treatment scans on different generation PET systems by reconstructing the same raw PET data with an OSEM algorithm known to meet EANM requirements and a PSF reconstruction designed for optimal tumor detection, a substantial increase in SUVmax and SUVpeak was noticed for PSF images. Mean PSF/OSEM3D ratios were 1.59 for SUVmax and 1.28 for SUVpeak. Taking the example of a system upgrade during a trial, this 59% increase in SUVmax on PSF-reconstructed images would be likely to impact therapy assessment with either the EORTC or the PERCIST criteria, for which a 25% and a 30% increase in SUV, respectively, define progressive disease [27]. We found that after applying an appropriate filter (in this study, a 7 mm Gaussian filter), the mean PSF7/OSEM3D ratios for SUVmax and SUVpeak were 1.02 and 1.04, respectively, with narrow confidence intervals. Similar results were found with the EQ.PET technology. Thus, our study supports previous studies concluding that both methods are appropriate to harmonize data and shows that they give nearly identical results for both SUVmax and SUVpeak values. On Bland–Altman analysis comparing PSF7 and PSFEQ, there were few outliers, which occurred mainly for lymph node lesions and/or the smallest lesions. Of note, there were fewer outliers when using SUVpeak values.

Histological type of primary lung cancer, patient’s BMI, and location of the lesion were not found to be confounding factors. When considering the size of the lesion, ratios between PSF7 and PSFEQ quantitative values were significantly different between the <10 and >20 mm groups for both SUVmax and SUVpeak values, suggesting that there is a difference between the two harmonization strategies based on the size of the lesion. These results could be explained by the fact that the size of the lesion was found to be a confounding factor for PSFEQ, whereas it was not for PSF7 [12, 23]. The standard deviations were higher for the smallest lesions, suggesting that differences between the two methods of harmonization were mainly present for the smallest lesions and are likely to be due to the partial volume effect. However, the differences observed between groups in our study were extremely small, with PSF7/PSFEQ ratios ranging only from 1.002 to 1.022, and are therefore very unlikely to be clinically significant.

Liver background is expected to be less reconstruction dependent than tumoral uptake. This is why normalization of tumoral uptake by liver background has been recommended [28] and why liver background is used as a reference in the Deauville Scale (DS) when assessing therapy response in lymphoma. Our results showed no influence of PSF reconstruction on liver SUVmean, as the mean ratio between PSF and OSEM3D reconstructions was 0.99 with a narrow confidence interval. Because tumoral SUVs increased markedly with PSF reconstruction while the liver background did not, the tumor/liver ratio inherits the reconstruction dependency of the tumoral SUV and therefore cannot be used to overcome it. This finding is important in the context of ongoing efforts to further improve the DS using a semi-quantitative extension [29, 30]. In that setting, harmonization of SUVs from pooled PET data would produce optimal lesion/liver ratios and discrimination between DS 4 and DS 5.
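The point about the tumor/liver ratio can be checked with simple arithmetic on the cohort means reported in the Results. This is only an illustration: it combines mean lesion SUVmax with mean liver SUVmean across the whole cohort, whereas a per-lesion analysis would give somewhat different numbers.

```python
# Cohort means taken from the Results section (tumor SUVmax, liver SUVmean).
tumor_suvmax = {"PSF": 10.93, "OSEM3D": 6.96}
liver_suvmean = {"PSF": 2.36, "OSEM3D": 2.39}

# Tumor/liver ratio for each reconstruction.
tumor_to_liver = {name: tumor_suvmax[name] / liver_suvmean[name]
                  for name in tumor_suvmax}

# How much the normalised ratio is inflated by switching from OSEM3D to PSF.
inflation = tumor_to_liver["PSF"] / tumor_to_liver["OSEM3D"]

for name, ratio in tumor_to_liver.items():
    print(f"{name}: tumor/liver = {ratio:.2f}")
print(f"PSF versus OSEM3D inflation of the tumor/liver ratio: {inflation:.2f}")
```

The tumor/liver ratio remains roughly 1.6 times higher with PSF than with OSEM3D, that is, essentially as reconstruction dependent as the unnormalised SUVmax.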

Our study showed that using two reconstructions (one for harmonized quantification and one for optimal diagnosis) and using the EQ.PET technology equally overcame reconstruction dependency in SUVs. The choice between these methods therefore remains open. However, a potential issue is that the EQ.PET software is likely to be used mainly by departments with Siemens Medical Solutions equipment, whereas the use of a second reconstructed data set, recommended by the EARL accreditation program for quantitation, can be easily implemented in any PET unit. This is the main limitation of our work: we used Siemens software only on a Siemens PET system, thereby limiting the generalizability of these results to departments equipped by the same vendor. Indeed, the ability of the EQ.PET technology to process images acquired on non-Siemens PET systems has not been addressed here and would require validation of the software’s vendor-neutral capacity, using clinical data and other tools, such as the digital reference object technique recently published by Pierce and co-workers [31]. Nonetheless, the EQ.PET software is the quickest method, as it does not require a second standardized reconstruction that has to be stored. The EQ.PET software could also be applied to older examinations, acquired and stored before the era of PET standardization, provided that other sources of SUV variability are controlled and data regarding calibration of the PET system are available. Finally, for this study, the current EANM expected values were set as the reference standard, but the EQ.PET filter could be adapted to meet any given standard. This is important in the context of evolving guidelines.

Conclusion

The reconstruction dependency of SUV metrics impedes comparison of quantitative data between different PET systems and requires harmonization of SUVs. Harmonization can be achieved using a second reconstruction for harmonized quantification in addition to the reconstruction optimised for diagnostic purposes, or using a software solution that allows optimal lesion detection and harmonized quantification from a single data set. Both technologies produced similar results, but the software solution has the advantages of saving reconstruction and interpretation time and allowing for retrospective analysis of stored data.