New standards for phantom image quality and SUV harmonization range for multicenter oncology PET studies

Akamatsu, Go; Shimada, Naoki; Matsumoto, Keiichi; Daisaki, Hiromitsu; Suzuki, Kazufumi; Watabe, Hiroshi; Oda, Keiichi; Senda, Michio; Terauchi, Takashi; Tateishi, Ukihide

doi:10.1007/s12149-021-01709-1


Special Article
Published: 14 January 2022

Volume 36, pages 144–161, (2022)

Annals of Nuclear Medicine


Go Akamatsu ORCID: orcid.org/0000-0001-9686-8901¹^na1,
Naoki Shimada²^na1,
Keiichi Matsumoto³,
Hiromitsu Daisaki⁴,
Kazufumi Suzuki⁵,
Hiroshi Watabe⁶,
Keiichi Oda⁷,
Michio Senda⁸,
Takashi Terauchi² &
…
Ukihide Tateishi⁹


Abstract

Not only visual interpretation for lesion detection, staging, and characterization, but also quantitative treatment response assessment are key roles for ¹⁸F-FDG PET in oncology. In multicenter oncology PET studies, image quality standardization and SUV harmonization are essential to obtain reliable study outcomes. Standards for image quality and SUV harmonization range should be regularly updated according to progress in scanner performance. Accordingly, the first aim of this study was to propose new image quality reference levels to ensure small lesion detectability. The second aim was to propose a new SUV harmonization range and an image noise criterion to minimize the inter-scanner and intra-scanner SUV variabilities. We collected a total of 37 patterns of images from 23 recent PET/CT scanner models using the NEMA NU2 image quality phantom. PET images with various acquisition durations of 30–300 s and 1800 s were analyzed visually and quantitatively to derive visual detectability scores of the 10-mm-diameter hot sphere, noise-equivalent count (NEC_phantom), 10-mm sphere contrast (Q_H,10 mm), background variability (N_10 mm), contrast-to-noise ratio (Q_H,10 mm/N_10 mm), image noise level (CV_BG), and SUVmax and SUVpeak for hot spheres (10–37 mm diameters). We calculated a reference level for each image quality metric, so that the 10-mm sphere can be visually detected. The SUV harmonization range and the image noise criterion were proposed with consideration of overshoot due to point-spread function (PSF) reconstruction. We proposed image quality reference levels as follows: Q_H,10 mm/N_10 mm ≥ 2.5 and CV_BG ≤ 14.1%. The 10th–90th percentiles in the SUV distributions were defined as the new SUV harmonization range. CV_BG ≤ 10% was proposed as the image noise criterion, because the intra-scanner SUV variability significantly depended on CV_BG. We proposed new image quality reference levels to ensure small lesion detectability. 
A new SUV harmonization range (in which PSF reconstruction is applicable) and the image noise criterion were also proposed for minimizing the SUV variabilities. Our proposed new standards will facilitate image quality standardization and SUV harmonization of multicenter oncology PET studies. The reliability of multicenter oncology PET studies will be improved by satisfying the new standards.


Introduction

Whole-body ¹⁸F-fluorodeoxyglucose (FDG) PET imaging has been widely used in the management of various malignant cancers [1,2,3]. Not only lesion detection, staging, and characterization, but also therapy response assessment are key roles for FDG PET in oncology [4]. With the advent of molecular targeted therapy and immunotherapy, metabolic activity of tumors is frequently assessed by quantitative FDG PET imaging. FDG PET has become a quantitative imaging biomarker, moving beyond a qualitative functional imaging tool [5, 6].

For measuring responses to therapy by FDG PET, major methodologies such as the EORTC criteria and PERCIST have been proposed [7, 8]. In these methodologies, tumor response is assessed by visual interpretation as well as percentage change in standardized uptake values (SUVs), and then classified into the following four definitions: complete metabolic response (CMR), partial metabolic response (PMR), stable metabolic disease (SMD), and progressive metabolic disease (PMD). In this manner, maximum and peak SUVs (SUVmax, SUVpeak) and SUVs normalized by lean body mass (SULs) have been used as quantitative markers for primary and secondary endpoints in FDG PET studies and trials in oncology [9,10,11].

However, PET image quality and quantitative accuracy are considerably affected by numerous factors such as injection activity, uptake duration, subject body size, scanner specifications, and image reconstruction parameters [12, 13]. Figure 1 overviews the factors affecting diagnostic accuracy in FDG PET. Both small lesion detectability and tumor SUVs vary readily with these factors. This variability may not have a significant impact on results in the case of a single-scanner study. In multicenter studies using multiple scanners, however, the inter-scanner variability might seriously degrade the reliability of the study outcomes [14]. Therefore, in multicenter oncology FDG PET studies, imaging protocols and image characteristics should be verified and standardized using an appropriate phantom before starting the study. As stated by Boellaard [12], the required level of standardization depends on the intended use of FDG PET. When PET is used for visual interpretation such as lesion detection and characterization, image quality should be verified and standardized to ensure detectability of small lesions. On the other hand, stricter standards are required for quantitative PET. When using lesion SUVs to measure responses to certain therapies [8], harmonization of SUVs is essential to minimize the inter-scanner variability in SUVs [15]. Groups led by Kinahan have reported that reducing variability to measure true metabolic change can greatly reduce the required sample size and study costs [16, 17]. Therefore, image quality standardization and SUV harmonization are essential to improve the reliability of multicenter oncology PET studies.

Motivated by this issue, several organizations such as EANM/EARL, RSNA/QIBA, ACR/ACRIN, and SNMMI/CTN have provided their own criteria for optimizing image quality as well as reducing SUV variability [18,19,20,21,22,23,24,25,26]. In Japan, the Japanese Society of Nuclear Medicine (JSNM) provides the standard PET imaging protocol and phantom test procedures with the NEMA NU2 image quality phantom (NEMA body phantom) [27, 28]. The JSNM presents image quality reference levels and an SUV harmonization range for each sphere of the phantom (10–37 mm diameters). However, the reference levels and specified range were determined by the phantom data that had been acquired in the early 2010s with the PET scanners available at that time [29]. In the meantime, clinical PET scanner performance has been improved by recent novel technologies such as the point-spread function (PSF) modeling [30, 31], time-of-flight (TOF) measurements [32, 33], and the penalized likelihood reconstruction algorithm [34]. In particular, TOF coincidence timing resolution has been greatly improved by replacing the conventional photomultiplier tube (PMT) with a newer silicon photomultiplier (SiPM) [35,36,37,38]. With such new technologies, recent PET scanners can visualize small spheres with higher SUVs (a smaller partial volume effect). Because their SUVmax recovery curves often exceed the upper range, downsmoothing is required to satisfy the current range. Although downsmoothing of the images is a simple way to harmonize, it spoils the image contrast and may degrade the visual detectability of small lesions. To adapt to advanced PET scanners with better performance, image quality reference levels and the range for SUVmax should be updated accordingly [12]. Also, a harmonization range for SUVpeak should be established, because this term has been widely used in many clinical studies [12, 39,40,41,42].

In addition to SUV harmonization (minimizing the inter-scanner variability), image noise levels should be lowered to reduce the intra-scanner variability. Increasing image noise levels (e.g., short scan duration) would provide a positive bias for SUVs [43]. A sufficient scan duration is needed to reduce uncertainties in SUV measurements as much as possible [44]. The relationship between SUV variability and image noise levels should be investigated in detail to establish reasonable criteria for image noise levels. The combination of SUV harmonization and image noise management can lead to significant improvement in the value and reliability of quantitative FDG PET studies (Fig. 2).

Against this background, we investigated image quality and SUV variability in hot spheres of almost all recent PET/CT scanner models using an image quality phantom. The first aim of this study was to propose new image quality reference levels with a focus on 10 mm sphere detectability. The second aim was to propose a new SUV harmonization range and an image noise criterion for minimizing the inter-scanner and intra-scanner SUV variabilities.

Materials and methods

PET/CT scanners

Table 1 lists the PET/CT scanner models and image reconstruction parameters used in this study. Detailed scanner specifications and correction methods are summarized in Supplemental Table 1 [45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61]. We evaluated 23 scanner models (16 PMT-based scanners and 7 SiPM-based scanners) used at 19 clinical sites. Phantom data were acquired from November 2018 to May 2020. This study did not include human data or any personal information.

Table 1 PET/CT scanners and image reconstruction settings


Phantom experiments

Phantom measurements were performed according to the JSNM phantom test procedures [27]. The NEMA NU2 image quality phantom (NEMA body phantom) was used for all evaluations. We provided the phantom test procedure manual to all sites, and we visited several sites and supported the phantom test, if necessary. The phantom contains six spheres, having diameters of 10, 13, 17, 22, 28, and 37 mm. All spheres were filled with ¹⁸F-FDG solutions, so that the sphere-to-background activity ratio was 4. The activity concentration in the background area was 2.53 ± 0.13 (± 5%) kBq/mL, which was determined by the following equation:

$$A_{x} = \frac{a}{60} \times \exp \left( {\frac{ - 60}{{109.8}} \times \ln \left( 2 \right)} \right) \times S {\text{ [kBq/mL]}} ,$$

(1)

where A_x (kBq/mL) is the activity concentration in the background area, a (MBq) is the assumed injection activity for 60-kg subjects, and S is the assumed specific gravity of a human body, that is 1.0 (g/mL). Since the assumed injection dose was 3.7 MBq/kg in this study, a was 222 MBq (3.7 × 60). The patient’s weight section (0010, 1030) of the DICOM header was filled with the phantom background volume, so that the true SUV was 1.00 in the background area.
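As a quick check of Eq. 1, the following minimal Python sketch reproduces the target background concentration. The function and parameter names are illustrative and not from the original work. It assumes that the 60 in the exponent is an uptake time in minutes, that 109.8 is the ¹⁸F half-life in minutes, and that the 60 in the denominator is the assumed body mass in kg.

```python
import math

F18_HALF_LIFE_MIN = 109.8  # half-life value that appears in Eq. 1


def background_activity_kbq_per_ml(dose_mbq_per_kg=3.7, body_mass_kg=60.0,
                                   uptake_min=60.0, specific_gravity=1.0):
    """Background activity concentration A_x of Eq. 1 in kBq/mL."""
    a_mbq = dose_mbq_per_kg * body_mass_kg  # assumed injection activity a (MBq)
    decay = math.exp(-uptake_min / F18_HALF_LIFE_MIN * math.log(2))
    # MBq/kg is numerically equal to kBq/g; multiplying by g/mL gives kBq/mL.
    return a_mbq / body_mass_kg * decay * specific_gravity


a_x = background_activity_kbq_per_ml()
print(f"A_x = {a_x:.2f} kBq/mL")  # about 2.53, matching the stated target
```

With the default arguments the result agrees with the 2.53 kBq/mL target quoted in the text, and a ± 5% tolerance corresponds to roughly ± 0.13 kBq/mL.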

Data acquisition and image reconstruction

Emission data were acquired for 1800 s in list mode. PET images were reconstructed with various acquisition durations of 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, and 1800 s. For each acquisition duration except 1800 s, three image datasets were reconstructed with data start times of 0, 60, and 120 s. Table 1 shows the image reconstruction parameters, which are the settings used for clinical whole-body FDG PET imaging at each site. For the scanner models with PSF reconstruction, PET images were reconstructed both with and without PSF modeling. A total of 37 patterns of images were obtained. In the data analyses described below, the data were classified into four groups: overall (n = 37), TOF + PSF (n = 17), TOF (n = 15), and PSF (n = 5).

Average SUV in the background area (SUV_B,ave)

To confirm the quantitative accuracy of data, we examined the average SUV in the background area (SUV_B,ave) on PET images with 1800-s acquisition. Image analysis was performed with the PETquactIE Ver. 3 software (Nihon Medi-Physics Co., Ltd) [62]. On the axial slice of the sphere center, 12 circular regions-of-interest (ROIs) with a 37-mm diameter were placed over the background area [63]. The ROIs were also placed on the slices ± 1 and ± 2 cm away from the central slice (60 ROIs in total). The SUV_B,ave was calculated by the following equation:

$${\text{SUV}}_{{\text{B,ave}}} = \frac{{\mathop \sum \nolimits_{k = 1}^{K} {\text{SUV}}_{{{\text{B,37}}\,{\text{mm,k}}}} }}{K} ,$$

(2)

where SUV_B,37 mm is the average SUV for the 37-mm ROIs and K is the number of ROIs, that is 60. An acceptable range of the SUV_B,ave was defined as 0.95–1.05. When the SUV_B,ave did not meet this acceptable range, re-testing was done after cross calibration and, if necessary, scanner maintenance.
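The calibration check around Eq. 2 can be sketched as follows. This is only an illustration: the function names are hypothetical, and the ROI values in the usage lines are made-up numbers rather than measured data.

```python
import statistics


def suv_b_ave(roi_mean_suvs):
    """Eq. 2: average of the mean SUVs of the K background ROIs (K = 60 in the text)."""
    return statistics.fmean(roi_mean_suvs)


def within_acceptable_range(value, low=0.95, high=1.05):
    """Acceptance check for SUV_B,ave using the 0.95-1.05 range stated in the text."""
    return low <= value <= high


# Hypothetical example: 60 ROI means scattered around the true SUV of 1.00.
rois = [1.02] * 30 + [0.98] * 30
value = suv_b_ave(rois)
print(value, within_acceptable_range(value))
```

A value outside the range would trigger the re-testing step described above (cross calibration and, if necessary, scanner maintenance).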

Part I: image quality with a focus on 10 mm sphere detectability

Visual detectability score

Detectability of the 10-mm-diameter hot sphere was visually assessed by five nuclear medicine technologists on a 3-point scale (0, not visualized; 1, visualized, but similar hot spots are observed; and 2, identifiable). The VOX-BASE/MANAGER (J-MAC SYSTEM, INC., Japan) was used to display PET images using an inverted gray scale with an upper level of 4 and a lower level of 0 (SUV-scaled). The score was averaged across the three image sets and then averaged across the five raters. A score of 1.5 was defined as an acceptable level (i.e., the 10 mm hot sphere can be detected by half or more of the raters) [29].
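The two-stage averaging of the visual scores can be illustrated with a short sketch. The data layout (one list of three scores per rater) and the example ratings are assumptions made for illustration only.

```python
import statistics

ACCEPTABLE_SCORE = 1.5  # acceptance level stated in the text


def visual_detectability_score(scores_by_rater):
    """Average over the three image sets per rater, then over the raters.

    scores_by_rater: one list per rater, each holding the 0/1/2 scores
    given to the three image sets (assumed layout).
    """
    per_rater = [statistics.fmean(scores) for scores in scores_by_rater]
    return statistics.fmean(per_rater)


# Hypothetical ratings from five raters for one acquisition duration.
ratings = [[2, 2, 1], [1, 1, 1], [2, 2, 2], [1, 2, 2], [2, 1, 0]]
score = visual_detectability_score(ratings)
print(round(score, 2), score >= ACCEPTABLE_SCORE)
```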

NEC_phantom

To examine coincidence count data quality, the noise-equivalent count for phantom (NEC_phantom) was calculated by the following equations [29, 64, 65]:

$${\text{NEC}}_{{{\text{phantom}}}} = \left( {1 - {\text{SF}}} \right)^{2} \frac{{\left( {T + S} \right)^{2} }}{{\left( {T + S} \right) + \left( {1 + k} \right)fR}} {{\text{ [Mcounts]}}}$$

(3)

$$f = \frac{{S_{a} }}{{\pi r^{2} }},$$

(4)

where SF represents scatter fraction, and T, S, and R are true, scatter and random coincidence counts. T + S was calculated by subtracting estimated random coincidence counts (R) from prompt coincidence counts (T + S + R). k is a random scaling factor, depending on the random correction method used [66]. We simply set k = 1 for a delayed coincidence-based method, and k = 0 for a singles-based method. f is the ratio of object size to the transaxial field-of-view, S_a is the cross-sectional area of the phantom, and r is the radius of the detector ring. The scatter fraction (SF) for each scanner, according to NEMA NU2 standards, is shown in Supplemental Table 1. The SF values were obtained from previous publications or scanner specification sheets or measured at the clinical site.
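Equations 3 and 4 translate directly into code. The sketch below is a minimal illustration with hypothetical function names. Counts are assumed to be passed in Mcounts, so the result is in Mcounts, and the numbers in the usage line are invented for the arithmetic check.

```python
import math


def object_fov_ratio(cross_section_area, ring_radius):
    """Eq. 4: f = S_a / (pi * r^2); both inputs must use consistent length units."""
    return cross_section_area / (math.pi * ring_radius ** 2)


def nec_phantom(prompts, randoms, scatter_fraction, f, delayed_randoms=True):
    """Eq. 3: noise-equivalent count for the phantom.

    prompts  : prompt coincidences (T + S + R)
    randoms  : estimated random coincidences (R)
    k = 1 for delayed-coincidence randoms correction, k = 0 for singles-based.
    """
    k = 1.0 if delayed_randoms else 0.0
    t_plus_s = prompts - randoms  # T + S, as described in the text
    return ((1.0 - scatter_fraction) ** 2 * t_plus_s ** 2
            / (t_plus_s + (1.0 + k) * f * randoms))


# Invented numbers: 30 Mcounts prompts, 10 Mcounts randoms, SF = 0.4, f = 0.5.
print(nec_phantom(30.0, 10.0, 0.4, 0.5))
```

For the invented inputs, T + S = 20, the denominator is 20 + 2 × 0.5 × 10 = 30, and the result is 0.36 × 400 / 30 = 4.8 Mcounts.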

Image quality [10-mm-sphere contrast (Q_H,10 mm), background variability (N_10 mm), and image noise level (CV_BG)]

For image quality assessment, we evaluated the contrast for the 10 mm hot sphere, background variability and image noise level in the background area using the PETquactIE Ver.3 software [62]. On the axial slice of the sphere center, we placed a circular ROI on the 10 mm sphere. In addition, we placed twelve 10-mm-diameter circular ROIs on the background area on the slice of the sphere center and on slices ± 1 cm and ± 2 cm away from the central slice (60 ROIs in total). The percent contrast for the 10 mm hot sphere (Q_H,10 mm) was calculated as follows:

$$Q_{{{\text{H}},10\,{\text{mm}}}} = \frac{{C_{{{\text{H}},10\,{\text{mm}}}} /C_{{{\text{B}},10\,{\text{mm}}}} - 1}}{{a_{{\text{H}}} /a_{{\text{B}}} - 1}} \times 100\text{ (\%)},$$

(5)

where C_H,10 mm and C_B,10 mm are the average activity in the ROI for the 10 mm sphere and the average activity in all the background 10-mm-diameter ROIs, respectively. ${a}_{\mathrm{H}}/{a}_{\mathrm{B}}$ is the activity concentration ratio between the hot spheres and the background. The percent background variability (N_10 mm) for the 10 mm circular ROIs was calculated as follows:

$$N_{{10\,{\text{mm}}}} = \frac{{{\text{SD}}_{{10\,{\text{mm}}}} }}{{C_{{{\text{B}},10\,{\text{mm}}}} }} \times 100\text{ (\%)}$$

(6)

$${\text{SD}}_{{10\;{\text{mm}}}} = \sqrt {\frac{{\mathop \sum \nolimits_{k = 1}^{K} \left( {C_{{{\text{b}},10\;{\text{mm,k}}}} - C_{{{\text{B}},10\;{\text{mm}}}} } \right)^{2} }}{K - 1}} , K = 60,$$

(7)

where C_b,10 mm,k is the mean activity in the k-th background ROI and SD_10 mm is the standard deviation of these mean activities across the 60 background ROIs. For image noise assessment, we placed 37-mm-diameter circular ROIs on the background area in the same manner as for the background variability assessment (60 ROIs). The coefficient of variation on the background area (CV_BG) (image noise levels) was calculated by the following equation:

$${\text{CV}}_{{{\text{BG}}}} = {\text{mean of}} \left( {\frac{{{\text{SD}}_{{37\;{\text{mm}}}} }}{{C_{{{\text{B,37}}\;{\text{mm}}}} }} \times 100} \right)\left[ \% \right],\left[ {n = 60} \right],$$

(8)

where SD_37 mm and C_B,37 mm are the standard deviation and average of the activity in each 37-mm-diameter ROI, respectively. The Q_H,10 mm, N_10 mm and CV_BG were measured and averaged by five nuclear medicine technologists.
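The three metrics of Eqs. 5–8 can be sketched in a few lines. The function names are illustrative. The inputs are assumed to be ROI statistics already extracted from the images; the ROI placement itself, performed with PETquactIE in the study, is not reproduced here.

```python
import statistics


def percent_contrast(c_hot, bg_roi_means, activity_ratio=4.0):
    """Eq. 5: Q_H,10mm (%) from the hot-sphere ROI mean and the background ROI means."""
    c_bg = statistics.fmean(bg_roi_means)
    return (c_hot / c_bg - 1.0) / (activity_ratio - 1.0) * 100.0


def background_variability(bg_roi_means):
    """Eqs. 6-7: N_10mm (%), sample SD (K - 1 denominator) of the ROI means over their mean."""
    return statistics.stdev(bg_roi_means) / statistics.fmean(bg_roi_means) * 100.0


def cv_bg(roi_sds, roi_means):
    """Eq. 8: mean of the per-ROI coefficients of variation (%) over the 37-mm ROIs."""
    return statistics.fmean(sd / m * 100.0 for sd, m in zip(roi_sds, roi_means))


# Tiny invented example with two background ROIs instead of 60.
print(percent_contrast(2.5, [1.0, 1.0]))   # 50.0
print(background_variability([0.9, 1.1]))  # about 14.1
print(cv_bg([0.1, 0.2], [1.0, 1.0]))       # about 15.0
```

Note the difference between the two noise metrics: N_10 mm is the spread of ROI means between ROIs, whereas CV_BG averages the voxel-level variation within each ROI.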

Investigation of image quality reference levels allowing the 10 mm sphere to be visible

The relationships between each image quality metric and visual detectability score were examined to explore an appropriate image quality level for 10 mm sphere detection. The NEC_phantom, Q_H,10 mm, N_10 mm, Q_H,10 mm/N_10 mm, CV_BG, and visual detectability score are shown as a function of acquisition duration (30–300 s). As mentioned earlier, a visual detectability score of 1.5 was defined as an acceptable level. Figure 3 shows the workflow to determine a reference level for each image quality metric. For each image quality metric and each dataset, we determined the 10-mm-sphere-detectable value, i.e., the metric value at which the visual detectability score reaches 1.5 (Fig. 3, step 1). For all data, the acquisition duration corresponding to the visual detectability score of 1.5 was calculated by linear interpolation between the two nearest data points. If the visual detectability score was higher than 1.5 at the minimum acquisition duration of 30 s, the data with the acquisition duration of 30 s were used as the 10-mm-sphere-detectable value. Subsequently, the reference level for each image quality metric (NEC_phantom, N_10 mm, Q_H,10 mm/N_10 mm and CV_BG) was calculated (Fig. 3, step 2). The reference level was defined as the median of all 10-mm-sphere-detectable values.
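The two-step workflow can be sketched as below. This is an interpretation under stated assumptions, not the authors' implementation: the metric is assumed to vary linearly between adjacent acquisition durations (so interpolating in score is equivalent to interpolating the duration first), a score of exactly 1.5 at the shortest duration is treated like a higher score, and `None` is returned when the score never reaches 1.5, a case the text does not discuss.

```python
import statistics


def detectable_value(scores, metric, threshold=1.5):
    """Step 1: metric value at which the visual score first reaches the threshold.

    scores, metric: lists ordered by increasing acquisition duration.
    """
    if scores[0] >= threshold:
        return metric[0]  # already detectable at the shortest duration
    for i in range(1, len(scores)):
        if scores[i - 1] < threshold <= scores[i]:
            w = (threshold - scores[i - 1]) / (scores[i] - scores[i - 1])
            return metric[i - 1] + w * (metric[i] - metric[i - 1])
    return None  # threshold never reached (assumed handling)


def reference_level(detectable_values):
    """Step 2: median of the detectable values over all datasets."""
    return statistics.median(v for v in detectable_values if v is not None)


# Invented CV_BG curves (%) for three datasets at increasing durations.
d1 = detectable_value([1.0, 2.0], [20.0, 10.0])            # 15.0
d2 = detectable_value([1.8, 2.0], [10.0, 8.0])             # 10.0
d3 = detectable_value([0.5, 1.0, 2.0], [30.0, 18.0, 10.0])  # 14.0
print(reference_level([d1, d2, d3]))                        # 14.0
```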

Inter-rater variability in each image quality metric

To evaluate the inter-rater variability in Q_H,10 mm, N_10 mm and CV_BG, we calculated the respective coefficient of variation across five raters (inter-rater variability) as follows:

$${\text{Inter-rater variability}} = \frac{\sigma }{\mu } \times 100{ }\text{ (\%)},$$

(9)

where σ and μ are the standard deviation and mean of the measurement values, respectively. To remove the effect of statistical noise, the PET images with 300 s acquisition were used for this evaluation.
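Equation 9 is a coefficient of variation across raters. A minimal sketch follows; the text does not state whether σ is the sample or population standard deviation, so the sample form used here is an assumption, and the example values are invented.

```python
import statistics


def inter_rater_variability(values):
    """Eq. 9: coefficient of variation (%) of one metric measured by several raters.

    Uses the sample standard deviation (n - 1 denominator); this choice is an assumption.
    """
    return statistics.stdev(values) / statistics.fmean(values) * 100.0


# Invented CV_BG measurements (%) of the same image by five raters.
print(inter_rater_variability([9.0, 10.0, 11.0, 10.0, 10.0]))
```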

Part II: SUV variability

SUVs of hot spheres

On PET images with 1800-s acquisition, SUVmax and SUVpeak for the hot spheres were measured using PETquactIE Ver. 3 and RAVAT, respectively (Nihon Medi-Physics Co., Ltd.) [15, 62]. To measure SUVmax for each sphere, a circular ROI was placed with a diameter equal to the inner diameter of the sphere. To measure SUVpeak for each sphere, a volume-of-interest (VOI) was placed, so that the VOI covered the whole uptake. The SUVpeak was defined as the average value within a 1 mL spherical VOI (12-mm-diameter) that was placed so as to maximize the average SUV [18]. Considering this definition, we did not measure the SUVpeak of the 10-mm sphere. When showing recovery coefficient curves, the SUVs were normalized by the true value of 4.

SUV harmonization range

SUVs of the hot spheres were investigated for all sphere sizes across all images with 1800-s acquisition (n = 37). To investigate feasible lower and upper limits, the 0–30th and 70th–100th percentiles were calculated in steps of five percentiles. On PET images with PSF reconstruction, the SUVs of the 13–22 mm spheres were often overestimated because of the edge artifact [67, 68]. To quantify this overestimation, the maximum overshoot rate in SUVs (MOR) was calculated by the following equation:

$${\text{MOR}} = \frac{{{\text{SUV}}_{i} - {\text{SUV}}_{{37\;{\text{mm}}}} }}{{{\text{SUV}}_{{37\;{\text{mm}}}} }} \times 100 \left( \% \right),$$

(10)

where SUV_i is the SUV of the i-mm diameter sphere that shows the highest SUV among 13–22 mm spheres, and SUV_37 mm is the SUV of the 37-mm-diameter sphere. Based on these data, we investigated a feasible SUV harmonization range. The upper limit was determined, so that the MOR was lower than 5%. For the lower limit, we considered that it should be lower than the true SUV of 4 for all spheres.
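The MOR of Eq. 10 and a percentile-based range can be sketched as follows. The names are illustrative and the SUV values are invented. The percentile interpolation method is not specified in the text, so the "inclusive" method of Python's `statistics.quantiles` is an assumption.

```python
import statistics


def max_overshoot_rate(suv_by_diameter):
    """Eq. 10: MOR (%) relative to the 37-mm sphere.

    suv_by_diameter: dict mapping sphere diameter (mm) to SUV.
    The numerator uses the highest SUV among the 13-, 17-, and 22-mm spheres.
    """
    ref = suv_by_diameter[37]
    highest = max(suv_by_diameter[d] for d in (13, 17, 22))
    return (highest - ref) / ref * 100.0


def percentile_range(suvs):
    """10th and 90th percentiles of one sphere's SUVs across scanners."""
    deciles = statistics.quantiles(suvs, n=10, method="inclusive")
    return deciles[0], deciles[-1]


# Invented upper-limit curve: the 17-mm sphere overshoots the 37-mm sphere by about 10%.
print(max_overshoot_rate({13: 4.2, 17: 4.4, 22: 4.1, 37: 4.0}))
print(percentile_range([float(v) for v in range(11)]))  # (1.0, 9.0)
```

In the study, the MOR was evaluated on the candidate upper-limit curve formed by a given percentile for each sphere, and the highest percentile with MOR below 5% was retained.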

Relationships between SUVs of hot spheres and image noise levels (CV_BG)

On PET images with 30–300 s acquisition, we investigated relationships between SUVs of the hot spheres and image noise levels. In this evaluation, SUVmax of the hot spheres was measured using spherical VOIs that sufficiently covered the whole uptake, assuming realistic tumor uptake measurements. Each SUV of the hot spheres on PET images with 1800-s acquisition was defined as a reference, because the images were in low noise conditions. Then, on PET images with 30–300 s acquisition, relative differences of SUVs were plotted as a function of CV_BG. The measurement procedure of the CV_BG was described above (Eq. 8). The relative differences of SUVs (RD_SUV) were calculated by the following equation:

$${\text{RD}}_{{{\text{SUV}}}} = \frac{{{\text{SUV}}_{i} - {\text{SUV}}_{{i, {\text{ref}}}} }}{{{\text{SUV}}_{{i, {\text{ref}}}} }} \times 100 \left( \% \right) ,$$

(11)

where SUV_i is the SUV of the i-mm diameter sphere on each PET image and SUV_i,ref is the SUV of the i-mm-diameter sphere on PET images with 1800-s acquisition. The RD_SUV was calculated for SUVmax and SUVpeak. To investigate the effect of the uptake volume, the RD_SUV values were classified into two groups based on the sphere diameter (diameter: < 20 mm and ≥ 20 mm). This was based on the recommendation by the QIBA and PERCIST that the minimum lesion size was 2 cm in diameter for the target lesion at the baseline [8, 18].
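Equation 11 and the two categorizations used in the analysis (sphere diameter and noise level) can be sketched as below. The function names and group labels are hypothetical; the 20-mm and 10% cut-offs are taken from the text.

```python
def relative_difference(suv, suv_ref):
    """Eq. 11: RD_SUV (%) of a short-acquisition SUV against the 1800-s reference."""
    return (suv - suv_ref) / suv_ref * 100.0


def size_group(diameter_mm):
    """Sphere-size grouping used in the text (< 20 mm versus >= 20 mm)."""
    return "<20 mm" if diameter_mm < 20 else ">=20 mm"


def noise_group(cv_bg_percent):
    """Noise grouping used in the results (CV_BG <= 10% versus > 10%)."""
    return "<=10%" if cv_bg_percent <= 10.0 else ">10%"


# Invented example: a 13-mm sphere read at SUVmax 4.4 on a noisy 60-s image,
# against 4.0 on the 1800-s reference image.
print(relative_difference(4.4, 4.0), size_group(13), noise_group(14.1))
```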

Statistical analysis

All statistical analyses were performed with EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan) [69], which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria). Comparisons of values between two groups were performed with the Mann–Whitney U test. Comparisons of values among three or more groups were performed using the Kruskal–Wallis test, followed by the Steel–Dwass pair-wise multiple comparison test. Spearman’s correlation test was used to investigate the correlation of each image quality metric with the visual detectability score. Correlations between RD_SUV and CV_BG were examined with Pearson’s correlation test. In all analyses, P < 0.05 was defined as statistically significant.

Results

Average SUV in the background area (SUV_B,ave)

The mean ± SD of the SUV_B,ave was 1.00 ± 0.03 and all values were within 0.95–1.05. Supplemental Fig. 1 shows SUV_B,ave for all scanner models. There was no significant difference among reconstruction algorithms (P = 0.56).

Part I: image quality

Figure 4 shows PET images with 120-s acquisition, which were reconstructed with clinical settings. There were no artifacts in any images, but large differences were found in visual contrasts of the smallest 10 mm sphere among scanners. Figure 5 shows NEC_phantom, Q_H,10 mm, N_10 mm, Q_H,10 mm/N_10 mm, CV_BG and visual detectability score as a function of scan duration. The NEC_phantom, Q_H,10 mm/N_10 mm, and visual detectability score increased with acquisition duration, while N_10 mm and CV_BG decreased with it. The Q_H,10 mm did not correlate with acquisition duration.

Figure 6 shows distributions of 10-mm-sphere-detectable values (i.e., corresponding to visual detectability score = 1.5) for NEC_phantom, N_10 mm, Q_H,10 mm/N_10 mm and CV_BG. The data were classified into four groups by image reconstruction methods as follows: Overall (n = 37), TOF + PSF (n = 17), TOF (n = 15), and PSF (n = 5). The medians [min, max] of the 10-mm-sphere-detectable values were 3.2 [0.5, 6.8] for NEC_phantom, 10.6 [7.3, 19.6] for N_10 mm, 2.5 [0.3, 3.5] for Q_H,10 mm/N_10 mm, and 14.1% [8.8, 33.5] for CV_BG. For NEC_phantom and N_10 mm, significant differences were observed in the 10-mm-sphere-detectable values among the three groups. For more detailed information, the relationships between each image quality metric and visual detectability score are shown in the supplemental data (Supplemental Figs. 2–5). Each image quality metric was significantly correlated with the visual detectability score (P < 0.001) (Supplemental Table 2).

Medians [min, max] of the inter-rater variability in Q_H,10 mm, N_10 mm and CV_BG were 4.0 [1.0, 9.4], 5.6 [2.1, 13.3], and 0.8 [0.3, 5.6], respectively (Fig. 7). Inter-rater variability was significantly lower for CV_BG compared to Q_H,10 mm and N_10 mm (P < 0.001).

Part II: SUV variability

Figure 8 shows recovery coefficients for SUVmax and SUVpeak on PET images with 1800-s acquisition. A large variability was observed especially for the 13 mm sphere. Table 2 summarizes median, minimum, and maximum values of SUVmax and SUVpeak on PET images with 1800-s acquisition. For the small spheres (10–17 mm diameter spheres), the inter-scanner variability in SUVpeak was smaller than that in SUVmax.

Table 2 Median, minimum, and maximum values of SUVmax, SUVpeak, and CV_BG in 1800-s PET images


The mean ± SD and various (0–30th and 70th–100th) percentile values for SUVmax and SUVpeak of all spheres are shown in Table 3 for PET images with 1800-s acquisition. The MOR for each upper range of 70th–100th percentiles is also given in that table. Using the 100th percentile, we obtained MORs for SUVmax and SUVpeak of 11.0% and 2.3%, respectively.

Table 3 Mean ± SD and various percentile values for SUVmax and SUVpeak for all spheres


The MOR for SUVmax was lower than 5% when using ≤ 90th percentile values as the upper limit (Table 3). Therefore, the 90th percentile values were defined as the upper limit for the SUV harmonization range (Fig. 9). Then, the 10th percentile values were defined as the lower limit. This was selected, because the lower limit for all spheres was lower than the true SUV of 4, and the exclusion rate was the same as the upper limit.

Figure 10 shows RD_SUV as a function of CV_BG for SUVmax and SUVpeak of the hot spheres on PET images with 30–300 s acquisition. For SUVmax of the small spheres (10–17 mm diameter), a positive bias was observed in RD_SUV. Table 4 shows median, minimum, and maximum values for the RD_SUV. The medians [min, max] of the RD_SUV for SUVmax and SUVpeak in all spheres were 5.3% [− 30.6%, 340.7%] and 1.1% [− 17.8%, 49.8%], respectively. There was a significant difference in the RD_SUV between SUVmax and SUVpeak (P < 0.001). The RD_SUV for both the SUVmax and SUVpeak significantly depended on sphere diameter (< 20 mm and ≥ 20 mm) and CV_BG (≤ 10% and > 10%) (P < 0.001).

Table 4 Median, minimum, and maximum values for the RD_SUV with various categorizations


Discussion

We investigated image quality and SUV variability in hot spheres using 23 recent PET scanner models. Since almost all recent PET/CT scanner models were included in this study, the data precisely reflect current PET image characteristics available at clinical sites. Based on the data, we have proposed a reference level for each image quality metric (NEC_phantom, N_10 mm, Q_H,10 mm/N_10 mm and CV_BG) with a focus on 10 mm sphere detectability. In addition, we have proposed a new SUV harmonization range and image noise criterion with a focus on the inter-scanner and intra-scanner SUV variabilities. Our proposed new standards will be useful for image quality standardization and SUV harmonization of PET studies in oncology.

Part I: image quality

Figures 4 and 5 show PET images and image quality metrics under clinical image reconstruction conditions. Because standardization of PET image quality was not performed, there was a large difference in 10-mm-sphere contrasts among scanners. As theoretically expected, longer scan durations provided lower image noise levels and better visual detectability scores. The results indicate that a simple way to obtain better image quality is to extend scan duration [70]. Looking at the 180-s scan data, which is the standard scan duration recommended by the JSNM [27], almost all scanners achieved the visual detectability score of 2.0 (Fig. 5). Therefore, a 180-s scan for each bed position would be reasonable as a reference standard.

For each image quality metric, we have proposed a reference level that makes the 10 mm sphere visible. The calculation procedure for the reference level (Fig. 3) was the same as that of the previous work in 2014 [29], in which the reference levels were proposed as follows: NEC_phantom > 10.8 Mcounts, N_10 mm < 5.6%, Q_H,10 mm/N_10 mm > 2.8. On the other hand, we have provided reference levels as follows: NEC_phantom ≥ 3.2 Mcounts, N_10 mm ≤ 10.6%, Q_H,10 mm/N_10 mm ≥ 2.5, CV_BG ≤ 14.1%. The CV_BG has been newly added to the image quality metrics.

The proposed new reference level for the NEC_phantom was lower than that in the 2014 study [29]. This result suggests that recent PET scanners can visualize the 10 mm sphere even with a low NEC_phantom value. This is mainly because significant progress has been made in developing image reconstruction algorithms. The NEC is a count-based metric and is independent of image reconstruction algorithms [65]. Because PET image quality is determined not only by the quality of the detected coincidence counts (e.g., NEC) but also by image reconstruction algorithms and other factors (Fig. 1), the NEC_phantom alone would not be suitable for image quality standardization [71, 72].

The N_10 mm, which is a metric of background variability, showed a trend similar to that of the NEC_phantom. The proposed reference level for the N_10 mm was higher than that in the previous study. This is also probably due to advances in image reconstruction algorithms. Specifically, PSF and TOF would contribute mainly to improving contrast for the 10 mm sphere [73]. These new techniques allow recent PET scanners to visualize the 10 mm sphere even with higher background variability. In addition, smaller voxel sizes were used in this study (1.3–4.1 mm) compared with those in the previous study (3.1–5.3 mm) [29]. The higher background variability might also result from the smaller voxel sizes.

On the other hand, the reference level for Q_H,10 mm/N_10 mm (contrast-to-noise ratio) was almost the same as that in the previous study. In addition, there was no significant difference in the 10-mm-sphere-detectable values for Q_H,10 mm/N_10 mm among the image reconstruction algorithms (Fig. 6). These results suggest that the Q_H,10 mm/N_10 mm would be a useful metric for assuring 10 mm sphere visibility, irrespective of PET scanner models and image reconstruction algorithms. The Q_H,10 mm/N_10 mm includes information on both the 10 mm-sphere-contrast and background variability, and the balance of contrast and noise might be a key component for visual detectability of small hot lesions.

As for the CV_BG (image noise level), there was no significant difference in the 10-mm-sphere-detectable values among image reconstruction algorithms (Fig. 6). The CV_BG also has some advantages over the other metrics. It showed the lowest inter-rater variability among all image quality metrics (Fig. 7), because large 37-mm ROIs were used to measure the CV_BG, whereas 10-mm ROIs were used for the Q_H,10 mm and N_10 mm measurements. The CV_BG is therefore more reproducible than Q_H,10 mm and N_10 mm. Furthermore, the CV_BG has been widely used for standardization of FDG PET in oncology. RSNA/QIBA and EANM/EARL specify that image noise levels are assessed by measuring the CV in the uniform background area as part of their standardization strategies [18, 22]. They provide an acceptable level of 15%, which is close to our proposed reference level (14.1%), although the phantom and ROI conditions are somewhat different. The CV_BG and its reference level are thus compatible with other international standards, and the use of CV_BG may facilitate international standardization and global PET studies. A limitation of the CV_BG is that it does not take image contrast into account. Therefore, not only the CV_BG but also contrast-related metrics such as Q_H,10 mm/N_10 mm and recovery coefficients [29] should be evaluated to assure small lesion detectability.
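The two reference levels discussed in Part I can be checked together with a few lines of code. The following is a minimal sketch only: it assumes that CV_BG is the sample standard deviation divided by the mean of the background ROI values, expressed in percent, and that Q_H,10 mm and N_10 mm are supplied in the same units. The function names are hypothetical and are not taken from any analysis software used in this study.

```python
from statistics import mean, stdev

# Reference levels quoted in the text.
CNR_MIN = 2.5        # Q_H,10 mm / N_10 mm must be at least this value
CV_BG_MAX = 14.1     # CV_BG (%) must not exceed this value


def cv_bg_percent(background_values):
    """Coefficient of variation (%) of background ROI values.

    Assumption: CV = sample SD / mean x 100. The exact ROI layout
    (37-mm ROIs in the text) is left to the caller.
    """
    return 100.0 * stdev(background_values) / mean(background_values)


def meets_reference_levels(q_h_10mm, n_10mm, cv_bg):
    """Return True when both image quality reference levels are satisfied."""
    contrast_to_noise = q_h_10mm / n_10mm
    return contrast_to_noise >= CNR_MIN and cv_bg <= CV_BG_MAX


if __name__ == "__main__":
    # Hypothetical background values, for illustration only.
    cv = cv_bg_percent([0.9, 1.0, 1.1, 1.0])
    print(round(cv, 2), meets_reference_levels(20.0, 6.0, cv))
```

Checking both conditions in one function reflects the point made above that the CV_BG alone does not capture image contrast.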

Part II: SUV variability

As shown in Supplemental Fig. 1, the SUV_B,ave values of all scanner models were within 0.95–1.05. This result indicates that all scanners were well calibrated and that their quantitative accuracy was within ± 5%. Therefore, our phantom data are sufficiently reliable to establish an SUV harmonization range. In a previous report in 2013, the SUV_B,ave values of 16 scanners ranged from 0.87 to 1.14 [74]. The quantitative accuracy of PET scanners has probably improved with progress in scanner performance. In addition, as described in the Materials and methods section, we visited several sites and supported the phantom test when requested. Such support might have been effective in minimizing technical errors in the process of phantom preparation.

Subsequently, we investigated inter-scanner SUV variability in each sphere on PET images with 1800-s acquisition (i.e., under nearly noise-free conditions). Most scanner models showed higher SUVmax recovery coefficients than the upper limit provided by JSNM (Supplemental Fig. 6). This result suggests that the SUV harmonization range should be regularly updated according to the performance improvement of commercial scanners [12]. In comparison to the large spheres (28–37 mm diameters), the small spheres (10–22 mm diameters) had larger SUV variability (Fig. 8). Many studies have reported that TOF PET scanners provide higher SUVs for small lesions than those without TOF [26, 75, 76]. Since this study included both TOF and non-TOF scanner models (19 TOF and 4 non-TOF PET scanner models), this difference would have contributed to the large SUV variability in the small spheres.

Comparing TOF + PSF and TOF groups (Fig. 8), higher SUVs were obtained for the 17-mm sphere when using PSF reconstruction. Furthermore, in most cases, SUVmax of the 17-mm sphere was higher than that of the 37-mm sphere. This overshoot would be derived from the edge artifact [31, 67, 68]. If we use the SUVmax of a small lesion on PSF-based PET images for monitoring treatment response, this overshoot must be suppressed by SUV harmonization [77]. For SUVpeak, on the other hand, the overshoot was suppressed even in PSF-based PET images, and the inter-scanner variability was lower than that for SUVmax.

Based on various percentile values for SUVmax and SUVpeak of all spheres, we proposed a new SUV harmonization range (Fig. 9, 10th–90th percentile). To address the overshoot due to PSF reconstruction [77], we determined the upper limit so that the MOR was lower than 5% (Table 3). By satisfying our proposed harmonization range, PET images can be used for both lesion detection and quantification even if PSF reconstruction is applied, which makes SUV harmonization feasible and practical. Compared with the SUV recovery coefficients for EANM/EARL standards 2 [22, 78], our proposed SUVmax harmonization range is lower (Supplemental Table 3). This is probably due to differences in the phantom test conditions. Because of the low activity concentration, the short scan duration, and the high sphere-to-background contrast, the EANM/EARL standards 2 provided a higher bandwidth for SUVmax recovery coefficients. Taking the difference in phantom test conditions into consideration, the SUV recovery coefficient harmonization ranges would not differ substantially. Interestingly, the differences in SUVpeak recovery coefficient ranges were exceedingly small despite the different phantom test conditions. International harmonization may be possible, although further investigations are required.
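The percentile-based definition of the harmonization range can be illustrated as follows. This is a sketch under stated assumptions: the percentile uses linear interpolation between ordered values (the interpolation rule used in the study is not stated here), the input values are hypothetical recovery coefficients for one sphere size, and the additional adjustment of the upper limit based on the MOR (Table 3) is not reproduced.

```python
def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def harmonization_range(recovery_coefficients, low_q=10, high_q=90):
    """Return (lower, upper) bounds taken at the 10th and 90th percentiles by default."""
    return (percentile(recovery_coefficients, low_q),
            percentile(recovery_coefficients, high_q))


def within_range(value, bounds):
    """True if a scanner's recovery coefficient lies inside the range."""
    low, high = bounds
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical recovery coefficients from several scanner models.
    rc_values = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
    bounds = harmonization_range(rc_values)
    print(bounds, within_range(0.72, bounds))
```

In practice such a range would be derived separately for each sphere diameter and for SUVmax and SUVpeak.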

Then, we investigated intra-scanner SUV variability in relation to image noise levels. For all data (n = 37), three images were reconstructed for each acquisition time (30–300 s). The resulting number of images (n = 1110) would be adequate to investigate the relationships. For SUVmax, the variability increased as the CV_BG increased. Because SUVmax is derived from a single maximum voxel value, its variability depends considerably on image noise levels [44]. For the large spheres (≥ 20 mm diameter), a positive bias was clearly observed (ρ = 0.82). This noise-dependent bias was also reported by Lodge et al. [43]. On the other hand, for the small spheres (< 20 mm diameter), the positive bias was weaker (ρ = 0.60) and the number of negative values increased (Fig. 10). When measuring a sequential percentage change in SUVs between two time points, the variability may therefore be large for small lesions. Low image noise is essential for accurate quantitative evaluation, especially for small lesions.

As shown in Table 4, the RD_SUV values for SUVmax ranged from − 30.6 to 340.7% on the PET images with CV_BG higher than 10%. Meanwhile, on the PET images with CV_BG of 10% or lower, the RD_SUV values ranged from − 22.3 to 35.3%. In the QIBA/UPICT, the CV in the uniform area should be lower than 15% as a target level, and ideally, it should be lower than 10% [18, 79]. The SNMMI/CTN also uses the CV in the uniform area as an image noise metric, and it is recommended that the CV be 10% or lower [80, 81]. Akamatsu et al. [44] examined the relationships between image noise levels and SUVs using a phantom and a single PET scanner, and suggested that the CV in the uniform area should be below 10% to minimize the SUVmax fluctuation. Considering the results of this study and the standards set by the major nuclear medicine societies, CV_BG ≤ 10% would be reasonable and feasible as the image noise criterion.
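The comparison of RD_SUV ranges above and below the noise criterion can be sketched as below. The definition of RD_SUV used here is an assumption, since it is given in the Materials and methods section rather than in this part of the text: the sketch takes it as the percentage difference between the SUV on a short-acquisition image and the SUV on a low-noise reference image. All names and input values are hypothetical.

```python
CV_BG_CRITERION = 10.0  # image noise criterion (%) proposed in the text


def relative_difference_percent(suv, suv_reference):
    """Assumed RD_SUV: percentage difference from the reference SUV."""
    return 100.0 * (suv - suv_reference) / suv_reference


def rd_range_by_noise(measurements, suv_reference, cv_limit=CV_BG_CRITERION):
    """Group (cv_bg_percent, suv) pairs by the noise criterion.

    Returns a dict with the (min, max) RD_SUV for images that satisfy the
    criterion ("within") and for those that do not ("above"); a group with
    no images maps to None.
    """
    groups = {"within": [], "above": []}
    for cv_bg, suv in measurements:
        key = "within" if cv_bg <= cv_limit else "above"
        groups[key].append(relative_difference_percent(suv, suv_reference))
    return {key: (min(vals), max(vals)) if vals else None
            for key, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical (CV_BG %, SUVmax) pairs for one sphere.
    data = [(6.0, 4.2), (9.5, 3.8), (14.0, 5.0), (22.0, 6.0)]
    print(rd_range_by_noise(data, suv_reference=4.0))
```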

Comparison of SUVmax and SUVpeak showed that each has its own advantages and disadvantages. SUVmax has been most commonly used to measure lesion uptake in FDG PET, because its measurement is easy and observer-independent [8, 13]. The partial volume effect is relatively small even in small lesions [82]. Furthermore, SUVmax reflects the most metabolically active area inside potentially heterogeneous tumors. This is important, because the highest metabolic activity might be critical information clinically. The most challenging issue is the variability in SUVmax (Figs. 8 and 9). Because the inter-scanner and intra-scanner variabilities in SUVmax are problematic, SUV harmonization and image noise management are essential in multicenter studies. In contrast to SUVmax, SUVpeak has lower intra-scanner variability (Fig. 10) and was less sensitive to image noise levels. On the PET images with CV_BG of 10% or lower, the RD_SUV values for SUVpeak ranged from − 10.8 to 15.4%. Makris et al. [25] also reported that the SUVpeak was less sensitive to variability in image characteristics and might be less affected by noise-dependent bias in comparison to SUVmax. Since SUVpeak may provide lower inter-scanner and intra-scanner variabilities than SUVmax, it is more suitable for use in multicenter studies. However, there are some considerations if SUVpeak is to be used. Because SUVpeak is derived from a 12-mm-diameter spherical VOI, lesion uptake might be underestimated due to the partial volume effect, particularly in lesions smaller than 20 mm, and SUVpeak is not applicable to lesions smaller than 12 mm. In addition, there are various definitions of SUVpeak itself [83], and variability will be introduced depending on the image analysis software. To compare the values derived from multiple software packages, VOI definitions should be verified and standardized among them. The appropriate quantitative measure (SUVmax, SUVpeak, etc.) should be selected according to each study's purpose and the characteristics of the target lesion.
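The difference between SUVmax and SUVpeak can be made concrete with a small sketch. It assumes one particular SUVpeak definition, namely the mean of all voxels whose centers lie within a 12-mm-diameter sphere centered on the hottest voxel. As noted above, several definitions exist [83] (for example, placing the sphere where its mean is highest), so this sketch is not necessarily the definition used by any given software package. The volume is a plain nested list indexed [z][y][x], and the function name is hypothetical.

```python
def suv_max_and_peak(volume, voxel_size_mm, diameter_mm=12.0):
    """Return (SUVmax, SUVpeak) for a nested-list volume indexed [z][y][x].

    Assumed SUVpeak definition: mean of voxels whose centers fall inside a
    sphere of the given diameter centered on the maximum voxel.
    voxel_size_mm is (dz, dy, dx) in millimeters.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    dz, dy, dx = voxel_size_mm

    # Locate the hottest voxel; SUVmax is this single voxel value.
    suv_max, cz, cy, cx = max(
        (volume[z][y][x], z, y, x)
        for z in range(nz) for y in range(ny) for x in range(nx)
    )

    # Collect voxels whose centers lie within the spherical VOI.
    radius_sq = (diameter_mm / 2.0) ** 2
    inside = [
        volume[z][y][x]
        for z in range(nz) for y in range(ny) for x in range(nx)
        if ((z - cz) * dz) ** 2 + ((y - cy) * dy) ** 2
        + ((x - cx) * dx) ** 2 <= radius_sq
    ]
    return suv_max, sum(inside) / len(inside)


if __name__ == "__main__":
    # Hypothetical 9 x 9 x 9 volume with uniform background and one hot voxel.
    vol = [[[1.0] * 9 for _ in range(9)] for _ in range(9)]
    vol[4][4][4] = 5.0
    print(suv_max_and_peak(vol, (4.0, 4.0, 4.0)))
```

Because SUVpeak averages over many voxels, a single noisy voxel changes it far less than it changes SUVmax, which is consistent with the lower noise sensitivity described above. The same averaging explains the underestimation in lesions close to or smaller than the VOI.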

Limitations and future issues

The image quality reference levels that we proposed are not appropriate for all FDG PET studies. We focused on 10-mm-sphere detectability; however, if sub-centimeter lesions are the study target, smaller spheres should be evaluated for more effective standardization. In addition, the NEMA image quality phantom mimics an average human body size. In some cases, such as pediatric studies or studies on overweight patients, phantoms of corresponding size would be more suitable. Fukukita et al. [29] evaluated larger body phantoms and demonstrated that a longer scan time was required for larger phantoms to maintain visual detectability of the 10-mm sphere. Appropriate evaluations and quality controls should be made according to the purposes of the individual FDG PET studies [12].

Regarding FDG distributions, intra-tumoral FDG uptake is heterogeneous rather than homogeneous in some types of tumor [84–86]. SUVmax and SUVpeak reflect only the amount of FDG uptake in specified regions. Recently, other quantitative measures have been used to characterize lesion FDG uptake, such as metabolic tumor volume, total lesion glycolysis, and textural features [86–88]. If these quantitative metrics are used in multicenter studies, the inter-scanner and intra-scanner variabilities should be verified using an appropriate phantom to move toward harmonization.

Conclusions

We experimentally investigated image quality and SUV variability in hot spheres using 23 recent PET scanner models and the NEMA image quality phantom. First, we investigated appropriate image quality reference levels for ensuring that a 10-mm sphere is visible. The reference levels were newly proposed as Q_H,10 mm/N_10 mm ≥ 2.5 and CV_BG ≤ 14.1%. CV_BG is the most reliable and useful, because it has the lowest inter-rater variability (Fig. 7) and is compatible with other international standards such as RSNA/QIBA and EANM/EARL. In addition, we investigated the inter-scanner and intra-scanner SUV variabilities. The new SUV harmonization range (in which PSF reconstruction is applicable) and the image noise criterion (CV_BG ≤ 10%) were proposed based on these data. Our results also indicated that SUVpeak is a useful quantitative metric, because it provided lower inter-scanner and intra-scanner variabilities than SUVmax. International SUV harmonization may be facilitated using SUVpeak, although further investigations are needed.

Our proposed new standards are useful for image quality standardization and SUV harmonization of whole-body FDG PET studies in oncology. The reliability of multicenter PET studies will be improved by satisfying the standards before starting the study. We believe that the new standards will help facilitate research and development of new treatments for cancers.

References

Rohren EM, Turkington TG, Coleman RE. Clinical applications of PET in oncology. Radiology. 2004;231:305–32.
Delbeke D. Oncological applications of FDG PET imaging: brain tumors, colorectal cancer, lymphoma and melanoma. J Nucl Med. 1999;40:591–603.
Fletcher JW, Djulbegovic B, Soares HP, Siegel BA, Lowe VJ, Lyman GH, et al. Recommendations on the use of 18F-FDG PET in oncology. J Nucl Med. 2008;49:480–508.
Weber WA. Assessing tumor response to therapy. J Nucl Med. 2009;50:1S-10S.
O’Connor JPB, Aboagye EO, Adams JE, Aerts HJWL, Barrington SF, Beer AJ, et al. Imaging biomarker roadmap for cancer studies. Nat Rev Clin Oncol. 2017;14:169–86.
Meikle SR, Sossi V, Roncali E, Cherry SR, Banati R, Mankoff D, et al. Quantitative PET in the 2020s: a roadmap. Phys Med Biol. 2021;66:06RM01.
Young H, Baum R, Cremerius U, Herholz K, Hoekstra O, Lammertsma AA, et al. Measurement of clinical and subclinical tumour response using [18F]-fluorodeoxyglucose and positron emission tomography: review and 1999 EORTC recommendations. European Organization for Research and Treatment of Cancer (EORTC) PET Study Group. Eur J Cancer. 1999;35:1773–82.
Wahl RL, Jacene H, Kasamon Y, Lodge MA. From RECIST to PERCIST: evolving Considerations for PET response criteria in solid tumors. J Nucl Med. 2009;50(Suppl 1):122S-S150.
Judson I, Scurr M, Gardner K, Barquin E, Marotti M, Collins B, et al. Phase II study of cediranib in patients with advanced gastrointestinal stromal tumors or soft-tissue sarcoma. Clin Cancer Res. 2014;20:3603–12.
Yap TA, Arkenau H-T, Camidge DR, George S, Serkova NJ, Gwyther SJ, et al. First-in-human phase I trial of two schedules of OSI-930, a novel multikinase inhibitor, incorporating translational proof-of-mechanism studies. Clin Cancer Res. 2013;19:909–19.
Connolly RM, Leal JP, Solnes L, Huang C-Y, Carpenter A, Gaffney K, et al. TBCRC026: phase II trial correlating standardized uptake value with pathologic complete response to pertuzumab and trastuzumab in breast cancer. J Clin Oncol. 2019;37:714–22.
Boellaard R. Standards for PET image acquisition and quantitative data analysis. J Nucl Med. 2009;50(Suppl 1):11S-20S.
Adams MC, Turkington TG, Wilson JM, Wong TZ. A systematic review of the factors affecting accuracy of SUV measurements. AJR Am J Roentgenol. 2010;195:310–20.
Fahey FH, Kinahan PE, Doot RK, Kocak M, Thurston H, Poussaint TY. Variability in PET quantitation within a multicenter consortium. Med Phys. 2010;37:3660–6.
Daisaki H, Kitajima K, Nakajo M, Watabe T, Ito K, Sakamoto F, et al. Usefulness of semi-automatic harmonization strategy of standardized uptake values for multicenter PET studies. Sci Rep. 2021;11:8517.
Kinahan PE, Doot RK, Wanner-Roybal M, Bidaut LM, Armato SG, Meyer CR, et al. PET/CT assessment of response to therapy: tumor change measurement, truth data, and error. Transl Oncol. 2009;2:223–30.
Doot RK, Kurland BF, Kinahan PE, Mankoff DA. Design considerations for using PET as a response measure in single site and multicenter clinical trials. Acad Radiol. 2012;19:184–90.
FDG-PET/CT Technical Committee. FDG-PET/CT as an Imaging Biomarker Measuring Response to Cancer Therapy, Quantitative Imaging Biomarkers Alliance, Version 1.13, Technically Confirmed Version. QIBA, November 18, 2016. Available from: RSNA.ORG/QIBA.
Graham MM, Wahl RL, Hoffman JM, Yap JT, Sunderland JJ, Boellaard R, et al. Summary of the UPICT protocol for 18F-FDG PET/CT imaging in oncology clinical trials. J Nucl Med. 2015;56:955–61.
Scheuermann JS, Saffer JR, Karp JS, Levering AM, Siegel BA. Qualification of PET scanners for use in multicenter cancer clinical trials: the American College of Radiology Imaging Network experience. J Nucl Med. 2009;50:1187–93.
Christian P. Use of a precision fillable clinical simulator phantom for PET/CT scanner validation in multi-center clinical trials: The SNM Clinical Trials Network (CTN) Program. J Nucl Med. 2012;53(Suppl 1):437.
Kaalep A, Sera T, Rijnsdorp S, Yaqub M, Talsma A, Lodge MA, et al. Feasibility of state of the art PET/CT systems performance harmonisation. Eur J Nucl Med Mol Imaging. 2018;45:1344–61.
Kaalep A, Sera T, Oyen W, Krause BJ, Chiti A, Liu Y, et al. EANM/EARL FDG-PET/CT accreditation—summary results from the first 200 accredited imaging systems. Eur J Nucl Med Mol Imaging. 2018;45:412–22.
Kinahan PE, Perlman ES, Sunderland JJ, Subramaniam R, Wollenweber SD, Turkington TG, et al. The QIBA profile for FDG PET/CT as an imaging biomarker measuring response to cancer therapy. Radiology. 2020;294:647–57.
Makris NE, Huisman MC, Kinahan PE, Lammertsma AA, Boellaard R. Evaluation of strategies towards harmonization of FDG PET/CT studies in multicentre trials: comparison of scanner validation phantoms and data analysis procedures. Eur J Nucl Med Mol Imaging. 2013;40:1507–15.
Sunderland JJ, Christian PE. Quantitative PET/CT scanner performance characterization based upon the society of nuclear medicine and molecular imaging clinical trials network oncology clinical simulator phantom. J Nucl Med. 2015;56:145–52.
Japanese Society of Nuclear Medicine. Standard PET imaging protocols and phantom test procedures and criteria: executive summary. 2017. http://jsnm.sakura.ne.jp/wp_jsnm/wp-content/themes/theme_jsnm/doc/StandardPETProtocolPhantom20170201.pdf. Accessed 14 Aug 2021.
Senda M. Standardization of PET imaging and site qualification program by JSNM: collaboration with EANM/EARL. Ann Nucl Med. 2020;34:873–4.
Fukukita H, Suzuki K, Matsumoto K, Terauchi T, Daisaki H, Ikari Y, et al. Japanese guideline for the oncology FDG-PET/CT data acquisition protocol: synopsis of Version 2.0. Ann Nucl Med. 2014;28:693–705.
Panin VY, Kehren F, Michel C, Casey M. Fully 3-D PET reconstruction with system matrix derived from point source measurements. IEEE Trans Med Imaging. 2006;25:907–21.
Rahmim A, Qi J, Sossi V. Resolution modeling in PET imaging: theory, practice, benefits, and pitfalls. Med Phys. 2013;40:064301.
Vandenberghe S, Mikhaylova E, D’Hoe E, Mollet P, Karp JS. Recent developments in time-of-flight PET. EJNMMI Physics. 2016;3:3.
Surti S, Karp JS. Update on latest advances in time-of-flight PET. Phys Med. 2020;80:251–8.
Teoh EJ, McGowan DR, Bradley KM, Belcher E, Black E, Gleeson FV. Novel penalised likelihood reconstruction of PET in the assessment of histologically verified small pulmonary nodules. Eur Radiol. 2016;26:576–84.
van Sluis J, Boellaard R, Somasundaram A, van Snick PH, Borra RJH, Dierckx RAJO, et al. Image quality and semiquantitative measurements on the Biograph Vision PET/CT System: Initial experiences and comparison with the Biograph mCT. J Nucl Med. 2020;61:129–35.
Kataoka J, Kishimoto A, Fujita T, Nishiyama T, Kurei Y, Tsujikawa T, et al. Recent progress of MPPC-based scintillation detectors in high precision X-ray and gamma-ray imaging. Nucl Instruments Method Phys Res Sect A Accel Spectrometers, Detect Assoc Equip. 2015;784:248–54.
Wagatsuma K, Miwa K, Sakata M, Oda K, Ono H, Kameyama M, et al. Comparison between new-generation SiPM-based and conventional PMT-based TOF-PET/CT. Phys Med. 2017;42:203–10.
Ota R. Photon counting detectors and their applications ranging from particle physics experiments to environmental radiation monitoring and medical imaging. Radiol Phys Technol. 2021;14:134–48.
Boellaard R, Sera T, Kaalep A, Hoekstra OS, Barrington SF, Zijlstra JM. Updating PET/CT performance standards and PET/CT interpretation criteria should go hand in hand. EJNMMI Res. 2019;9:5–6.
Weber WA, Gatsonis CA, Mozley PD, Hanna LG, Shields AF, Aberle DR, et al. Repeatability of 18F-FDG PET/CT in advanced non-small cell lung cancer: prospective assessment in 2 multicenter trials. J Nucl Med. 2015;56:1137–43.
Machtay M, Duan F, Siegel BA, Snyder BS, Gorelick JJ, Reddin JS, et al. Prediction of survival by [18F]fluorodeoxyglucose positron emission tomography in patients with locally advanced non-small-cell lung cancer undergoing definitive chemoradiation therapy: results of the ACRIN 6668/RTOG 0235 trial. J Clin Oncol. 2013;31:3823–30.
Kahraman D, Scheffler M, Zander T, Nogova L, Lammertsma AA, Boellaard R, et al. Quantitative analysis of response to treatment with erlotinib in advanced non-small cell lung cancer using 18F-FDG and 3’-deoxy-3’-18F-fluorothymidine PET. J Nucl Med. 2011;52:1871–7.
Lodge MA, Chaudhry MA, Wahl RL. Noise considerations for PET quantification using maximum and peak standardized uptake value. J Nucl Med. 2012;53:1041–7.
Akamatsu G, Ikari Y, Nishida H, Nishio T, Ohnishi A, Maebatake A, et al. Influence of statistical fluctuation on reproducibility and accuracy of SUVmax and SUVpeak: a phantom study. J Nucl Med Technol. 2015;43:222–6.
Kaneta T, Ogawa M, Motomura N, Iizuka H, Arisawa T, Hino-Shishikura A, et al. Initial evaluation of the Celesteion large-bore PET/CT scanner in accordance with the NEMA NU2–2012 standard and the Japanese guideline for oncology FDG PET/CT data acquisition protocol version 2.0. EJNMMI Res. 2017;7:83.
Reddin JS, Scheuermann JS, Bharkhada D, Smith AM, Casey ME, Conti M, et al. Performance evaluation of the SiPM-based siemens biograph vision PET/CT system. IEEE Nucl Sci Symp Med Imaging Conf Rec. 2018;2018:1–5.
Rausch I, Cal-González J, Dapra D, Gallowitsch HJ, Lind P, Beyer T, et al. Performance evaluation of the Biograph mCT Flow PET/CT system according to the NEMA NU2-2012 standard. EJNMMI Phys. 2015;2:26.
Jakoby BW, Bercier Y, Conti M, Casey ME, Bendriem B, Townsend DW. Physical and clinical performance of the mCT time-of-flight PET/CT scanner. Phys Med Biol. 2011;56:2375–89.
Pan T, Einstein SA, Kappadath SC, Grogg KS, Lois Gomez C, Alessio AM, et al. Performance evaluation of the 5-Ring GE Discovery MI PET/CT system using the national electrical manufacturers association NU 2–2012 Standard. Med Phys. 2019;46:3025–33.
Hsu DFC, Ilan E, Peterson WT, Uribe J, Lubberink M, Levin CS. Studies of a next-generation silicon-photomultiplier–based time-of-flight PET/CT system. J Nucl Med. 2017;58:1511–8.
Vandendriessche D, Uribe J, Bertin H, De Geeter F. Performance characteristics of silicon photomultiplier based 15-cm AFOV TOF PET/CT. EJNMMI Phys. 2019;6:8.
Michopoulou S, O’Shaughnessy E, Thomson K, Guy MJ. Discovery molecular imaging digital ready PET/CT performance evaluation according to the NEMA NU2-2012 standard. Nucl Med Commun. 2019;40:270–7.
Reynés-Llompart G, Gámez-Cenzano C, Romero-Zayas I, Rodríguez-Bel L, Vercher-Conejero JL, Martí-Climent JM. Performance characteristics of the whole-body discovery IQ PET/CT system. J Nucl Med. 2017;58:1155–61.
Demir M, Toklu T, Abuqbeitah M, Çetin H, Sezgin HS, Yeyin N, et al. Evaluation of PET scanner performance in PET/MR and PET/CT systems: NEMA tests. Mol Imaging Radionucl Ther. 2018;27:10–8.
Bettinardi V, Presotto L, Rapisarda E, Picchio M, Gianolli L, Gilardi MC. Physical performance of the new hybrid PET∕CT Discovery-690. Med Phys. 2011;38:5394–411.
De Ponti E, Morzenti S, Guerra L, Pasquali C, Arosio M, Bettinardi V, et al. Performance measurements for the PET/CT Discovery-600 using NEMA NU 2–2007 standards. Med Phys. 2011;38:968–74.
Zhang J, Maniawski P, Knopp MV. Performance evaluation of the next generation solid-state digital photon counting PET/CT system. EJNMMI Res. 2018. https://doi.org/10.1186/s13550-018-0448-7.
Kolthammer JA, Su K, Grover A, Narayanan M, Jordan DW, Muzic RF. Performance evaluation of the Ingenuity TF PET/CT scanner with a focus on high count-rate conditions. Phys Med Biol. 2014;59:3843–59.
Surti S, Kuhn A, Werner ME, Perkins AE, Kolthammer J, Karp JS. Performance of Philips Gemini TF PET/CT scanner with special consideration for its time-of-flight imaging capabilities. J Nucl Med. 2007;48:471–80.
Xu B, Changbin L, Yun D, Renming T, Yachao L, Hui Y, et al. Performance evaluation of a high-resolution TOF clinical PET/CT. J Nucl Med. 2016;57(Suppl 2):202.
Teoh EJ, McGowan DR, Macpherson RE, Bradley KM, Gleeson FV. Phantom and clinical evaluation of the bayesian penalized likelihood reconstruction algorithm Q.Clear on an LYSO PET/CT system. J Nucl Med. 2015;56:1447–52.
Matsumoto K, Endo K. Development of analysis software package for the two kinds of Japanese Fluoro-D-glucose-positron emission tomography guideline. Japanese J Radiol Technol. 2013;69:648–54.
NEMA. NEMA Standards Publication NU 2–2018: performance measurements of positron emission tomographs. Rosslyn, VA: National Electrical Manufacturers Association; 2018.
Strother SC, Casey ME, Hoffman EJ. Measuring PET scanner sensitivity: relating countrates to image signal-to-noise ratios using noise equivalents counts. IEEE Trans Nucl Sci. 1990;37:783–8.
Badawi RD, Dahlbom M. NEC: some coincidences are more equivalent than others. J Nucl Med. 2005;46:1767–8.
Brasse D, Kinahan PE, Lartizien C, Comtat C, Casey M, Michel C. Correction methods for random coincidences in fully 3D whole-body PET: impact on data and image quality. J Nucl Med. 2005;46:859–67.
Reader AJ, Julyan PJ, Williams H, Hastings DL, Zweit J. EM algorithm resolution modeling by image-space convolution for PET reconstruction. IEEE Nucl Sci Symp Conf Rec. 2002;2002:1221–5.
Kidera D, Kihara K, Akamatsu G, Mikasa S, Taniguchi T, Tsutsui Y, et al. The edge artifact in the point-spread function-based PET reconstruction at different sphere-to-background ratios of radioactivity. Ann Nucl Med. 2016;30:97–103.
Kanda Y. Investigation of the freely available easy-to-use software ‘EZR’ for medical statistics. Bone Marrow Transplant. 2013;48:452–8.
Masuda Y, Kondo C, Matsuo Y, Uetani M, Kusakabe K. Comparison of imaging protocols for 18F-FDG PET/CT in overweight patients: optimizing scan duration versus administered dose. J Nucl Med. 2009;50:844–8.
Chang T, Chang G, Clark JW, Diab RH, Rohren E, Mawlawi OR. Reliability of predicting image signal-to-noise ratio using noise equivalent count rate in PET imaging. Med Phys. 2012;39:5891–900.
Maebatake A, Akamatsu G, Miwa K, Tsutsui Y, Himuro K, Baba S, et al. Relationship between the image quality and noise-equivalent count in time-of-flight positron emission tomography. Ann Nucl Med. 2016;30:68–74.
Akamatsu G, Ishikawa K, Mitsumoto K, Taniguchi T, Ohya N, Baba S, et al. Improvement in PET/CT image quality with a combination of point-spread function and time-of-flight in relation to reconstruction parameters. J Nucl Med. 2012;53:1716–22.
Matsumoto K, Suzuki K, Fukukita H, Ikari Y, Oda K, Kimura Y, et al. Variability in PET quantitation within a multicenter studies in Japan. Eur J Nucl Med Mol Imaging. 2013;40(Suppl 2):S305.
El Fakhri G, Surti S, Trott CM, Scheuermann J, Karp JS. Improvement in lesion detection with whole-body oncologic time-of-flight PET. J Nucl Med. 2011;52:347–53.
Akamatsu G, Mitsumoto K, Taniguchi T, Tsutsui Y, Baba S, Sasaki M. Influences of point-spread function and time-of-flight reconstructions on standardized uptake value of lymph node metastases in FDG-PET. Eur J Radiol. 2014;83:226–30.
Munk OL, Tolbod LP, Hansen SB, Bogsrud TV. Point-spread function reconstructed PET images of sub-centimeter lesions are not quantitative. EJNMMI Phys. 2017;4:5.
Kaalep A, Burggraaff CN, Pieplenbosch S, Verwer EE, Sera T, Zijlstra J, et al. Quantitative implications of the updated EARL 2019 PET–CT performance standards. EJNMMI Phys. 2019;6:1–16.
18F-FDG PET/CT UPICT Protocol Writing Committee. UPICT Oncology FDG-PET CT Protocol. http://qibawiki.rsna.org/images/d/de/UPICT_Oncologic_FDG-PETCTProtocol_6-07-13.pdf.
Ulrich EJ, Sunderland JJ, Smith BJ, Mohiuddin I, Parkhurst J, Plichta KA, et al. Automated model-based quantitative analysis of phantoms with spherical inserts in FDG PET scans. Med Phys. 2018;45:258–76.
SNMMI Phantom Analysis Toolkit (PAT). https://www.snmmi.org/PAT. Accessed 14 Aug 2021.
Soret M, Bacharach SL, Buvat I. Partial-volume effect in PET tumor imaging. J Nucl Med. 2007;48:932–45.
Vanderhoek M, Perlman SB, Jeraj R. Impact of the definition of peak standardized uptake value on quantification of treatment response. J Nucl Med. 2012;53:4–11.
Watabe T, Tatsumi M, Watabe H, Isohashi K, Kato H, Yanagawa M, et al. Intratumoral heterogeneity of F-18 FDG uptake differentiates between gastrointestinal stromal tumors and abdominal malignant lymphomas on PET/CT. Ann Nucl Med. 2012;26:222–7.
Miwa K, Inubushi M, Wagatsuma K, Nagao M, Murata T, Koyama M, et al. FDG uptake heterogeneity evaluated by fractal analysis improves the differential diagnosis of pulmonary nodules. Eur J Radiol. 2014;83:715–9.
Chicklore S, Goh V, Siddique M, Roy A, Marsden PK, Cook GJR. Quantifying tumour heterogeneity in 18F-FDG PET/CT imaging by texture analysis. Eur J Nucl Med Mol Imaging. 2013;40:133–40.
Lim R, Eaton A, Lee NY, Setton J, Ohri N, Rao S, et al. 18F-FDG PET/CT metabolic tumor volume and total lesion glycolysis predict outcome in oropharyngeal squamous cell carcinoma. J Nucl Med. 2012;53:1506–13.
Kitajima K, Miyoshi Y, Sekine T, Takei H, Ito K, Suto A, et al. Harmonized pretreatment quantitative volume-based FDG-PET/CT parameters for prognosis of stage I-III breast cancer: Multicenter study. Oncotarget. 2021;12:95–105.


Acknowledgements

The authors acknowledge the following colleagues for their kind support regarding the phantom experiments: Masafumi Ban, Ryuji Ikeda, Yuji Kojima, Akihito Kuroki, Takamasa Maeda, Yukito Maeda, Hiroyuki Nishida, Kazuki Nitta, Shinji Ochi, Hiroyoshi Okajima, Koji Osanai, Kazuhiro Otani, Shota Sakimoto, Minoru Sakurai, Takahiro Shiraishi, Yuji Tsutsui, Masaki Uno, Kei Wagatsuma, and Masanori Watanabe. The authors would like to thank Shohei Fukai, Noriaki Miyaji, Kazuki Motegi, and Takuro Umeda for visual assessment of phantom images. The authors appreciate the following two committees of the Japanese Society of Nuclear Medicine for their valuable support: the Expert Committee of Standardization of PET imaging (Members: Hiroshi Ito, Setsu Sakamoto, Tohru Shiga, Keiichi Matsumoto, and Hiroshi Watabe) and the PET Nuclear Medicine Committee (Members: Makoto Hosono, Masayuki Sasaki, Ukihide Tateishi, Kenji Ishii, Kengo Ito, Hiroshi Ito, Terue Okamura, Masami Kawamoto, Yuji Kuge, Ichiei Kuji, Michio Senda, Tadaki Nakahara, Yasuhiro Magata, Keiichi Matsumoto, Koji Murakami, Tsuyoshi Yoshida, and Atsuo Waki).

Funding

This study was supported in part by the National Cancer Center Research and Development Fund (2020-J-3), Foundation for Promotion of Cancer Research in Japan, the Japanese Society of Nuclear Medicine (JSNM) Working Group, and JSPS KAKENHI Grant Number JP20K08091.

Author information

Go Akamatsu and Naoki Shimada have contributed equally to this work.

Authors and Affiliations

National Institutes for Quantum Science and Technology (QST), 4-9-1 Anagawa, Inage-ku, Chiba, 263-8555, Japan
Go Akamatsu
Cancer Institute Hospital, 3-8-31 Ariake, Koto, Tokyo, 135-8550, Japan
Naoki Shimada & Takashi Terauchi
Kyoto College of Medical Science, 1-3 Imakita, Oyamahigashi-cho, Sonobe-cho, Nantan, Kyoto, 622-0041, Japan
Keiichi Matsumoto
Gunma Prefectural College of Health Sciences, 323-1 Kamioki-machi, Maebashi, Gunma, 371-0052, Japan
Hiromitsu Daisaki
Dokkyo Medical University Hospital, 880 Kitakobayashi, Mibu, Shimotsugagun, Tochigi, 321-0293, Japan
Kazufumi Suzuki
Tohoku University, 6-3 Aoba, Aramaki, Aoba-ku, Sendai, Miyagi, 980-8578, Japan
Hiroshi Watabe
Hokkaido University of Science, 7-Jo 15-4-1 Maeda, Teine, Sapporo, Hokkaido, 006-8585, Japan
Keiichi Oda
Kobe City Medical Center General Hospital, 2-1-1 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
Michio Senda
Tokyo Medical and Dental University School of Medicine, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8510, Japan
Ukihide Tateishi

Corresponding authors

Correspondence to Go Akamatsu or Naoki Shimada.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest. All authors are members of the working group of the JSNM (no payment received).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1660 KB)

About this article

Cite this article

Akamatsu, G., Shimada, N., Matsumoto, K. et al. New standards for phantom image quality and SUV harmonization range for multicenter oncology PET studies. Ann Nucl Med 36, 144–161 (2022). https://doi.org/10.1007/s12149-021-01709-1

Received: 14 October 2021
Accepted: 05 December 2021
Published: 14 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s12149-021-01709-1
