Introduction

Malignant lymphoma consists of various histologic subtypes and can be divided into Hodgkin's and non-Hodgkin's lymphoma. Diagnosis is based on the WHO classification, and both the selection of treatment and the prognosis depend on the histologic subtype [1]. Peripheral T-cell lymphoma (PTCL) is one of the non-Hodgkin's lymphomas and originates from mature T-cells. The course of PTCL is clinically aggressive and poorly responsive to therapy [2,3,4], and thus new, more effective treatments are needed. [F-18]FDG PET/CT is recommended for the evaluation of therapeutic effects on malignant lymphoma subtypes with high [F-18]FDG avidity [5].

Because of bias and variance in the results obtained from clinical images, quantitative and reproducible measures are needed to validate specific metrics used in clinical trials. Imaging biomarkers that are validated and reliably measured can act as meaningful surrogates for the evaluation of therapeutic responses in individuals or groups. Analysis of data collected during the qualification step, which substantiates performance as a response measure, can be developed into a reliable method [6,7,8,9,10,11].

The Quantitative Imaging Biomarkers Alliance (QIBA) was set up by the Radiological Society of North America (RSNA) in order to establish quantitative imaging biomarkers by reducing variability in imaging conditions and the imaging environment [12]. QIBA has 18 biomarker committees, and the [F-18]FDG-PET/CT Biomarker Committee has created a profile for response evaluation by [F-18]FDG-PET/CT in the setting of a clinical trial [13]. The profile addresses the need for phantom test standardization to ensure uniform quantitative performance across all scanners and all sites. Standardization has also been attempted in Japan. The Japan Radiological Society (JRS) has established the Japan-QIBA (J-QIBA) in cooperation with the RSNA-QIBA, and the Japanese Society of Nuclear Medicine (JSNM) has created guidelines governing standardization [14].

Darinaparsin (S-dimethylarsino-glutathione) is an organic arsenical used for treatment of malignant tumors [15,16]. Its efficacy for PTCL has been studied. The possible effect on PTCL was suggested by the results of a multicenter phase II study of darinaparsin in patients with relapsed or refractory Hodgkin and non-Hodgkin lymphoma [17]. Darinaparsin was shown to act via the MAPK pathway in a study using lymphoma cells and xenografts in SCID mice [18]. In an Asian international multicenter phase II trial, [F-18]FDG PET/CT scanners in the facilities of all countries were standardized in advance using the [F-18]FDG PET/CT profile developed by the RSNA-QIBA. We show the results of standardization using phantom tests described in the RSNA-QIBA profile.

Materials and methods

Darinaparsin (S-dimethylarsino-glutathione) has been evaluated for PTCL in previous studies [17,18]. In a phase II clinical trial of darinaparsin monotherapy in patients with relapsed or refractory PTCL in Asian countries, including South Korea, Taiwan, Hong Kong, and Japan, it was decided that the therapeutic response would be evaluated by central assessment of PET/CT imaging data. Since a different PET/CT scanner would be used at each facility, standardization of the scanners was required. As a J-QIBA activity, phantom tests of individual PET/CT scanners were performed at all facilities participating in the darinaparsin phase II clinical trial before patient enrollment. All the facilities are listed in Table 1. The institutional review board at each center approved the clinical trial.

Table 1 The facilities where phantom tests were conducted

The phantom tests were conducted in compliance with the QIBA profile requirements as stated in [F-18]FDG-PET/CT as an Imaging Biomarker Measuring Response to Cancer Therapy [13]. The National Electrical Manufacturers Association (NEMA) International Electrotechnical Commission (IEC) body phantom and [F-18]FDG prepared at each site were used for the phantom test (Fig. 1). The radioactivity level was maintained at 3.7–7.2 kBq/ml in the background area of the phantom, and at four times the background level in the hot spheres. Continuous PET data were acquired over a 1–10 min period, and each image was reconstructed with an adequate method and parameters, which were adjusted from default values as necessary at each facility.

Fig. 1 The phantom used for standardization, which had six hot spheres (10, 13, 17, 22, 28, and 37 mm in diameter)

To evaluate each scanner, we measured the following three parameters as described by QIBA: (a) standardized uptake value (SUV), (b) resolution, and (c) noise. These measurements were used to assess whether (a) the SUV for the region of interest (ROI) set in the phantom was 1.0 ± 0.1, (b) the 13-mm hot sphere in the phantom was visible, and (c) the coefficient of variation (COV) of the voxel values within the region in the background area was below 15%. Axial uniformity, also mentioned in the profile, was not measured because the shape of the NEMA body phantom was not suitable for the measurement. We evaluated whether these criteria were fulfilled after parameter adjustment.
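The three acceptance checks above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis software used in the trial: the function name `check_qiba_phantom`, its arguments, and the choice of the population standard deviation for the COV are assumptions made here, and sphere visibility is a visual judgment that is simply passed in as a flag.

```python
from statistics import mean, pstdev


def check_qiba_phantom(background_suv, sphere_13mm_visible,
                       suv_target=1.0, suv_tolerance=0.1, cov_limit_pct=15.0):
    """Check one phantom scan against the three criteria quoted in the text.

    background_suv: SUV values of the voxels inside the background ROI
                    (hypothetical input; any iterable of floats).
    sphere_13mm_visible: reader's visual judgment for the 13-mm hot sphere.
    """
    values = list(background_suv)
    suv_mean = mean(values)
    # COV = standard deviation / mean, expressed in percent.
    # Assumption: population SD over the ROI voxels.
    cov_pct = 100.0 * pstdev(values) / suv_mean
    return {
        "suv_mean": suv_mean,
        "cov_pct": cov_pct,
        # Small epsilon so that values exactly at 0.9 or 1.1 are not
        # rejected because of floating-point rounding.
        "suv_ok": abs(suv_mean - suv_target) <= suv_tolerance + 1e-9,
        "resolution_ok": bool(sphere_13mm_visible),
        "noise_ok": cov_pct < cov_limit_pct,
    }


if __name__ == "__main__":
    # Made-up voxel values, for illustration only.
    report = check_qiba_phantom([0.95, 1.00, 1.05, 1.00], True)
    print(report)
```

A scanner would pass only if `suv_ok`, `resolution_ok`, and `noise_ok` are all true; otherwise the acquisition or reconstruction parameters would be adjusted and the phantom test repeated.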

To clarify the difference between the RSNA-QIBA profile and the guideline in Japan, we also evaluated the indexes recommended in the Japanese guideline for the oncology FDG-PET/CT data acquisition protocol (JSNM guideline) [14], i.e., phantom noise equivalent count (NECphantom), % background variability (N10 mm), % contrast (QH,10 mm), and relative recovery coefficient (RC).
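For readers unfamiliar with two of these indexes, the sketch below shows how % contrast and % background variability are commonly computed for a hot sphere in the NEMA body phantom. It assumes the NEMA NU 2 style definitions (contrast normalized by the true sphere-to-background activity ratio, variability as the sample standard deviation of the background ROI means divided by their mean); whether the JSNM guideline uses exactly these forms should be checked against [14]. The function names and input values are hypothetical.

```python
from statistics import mean, stdev


def percent_contrast(hot_roi_mean, background_roi_means, activity_ratio=4.0):
    """% contrast of a hot sphere (assumed NEMA NU 2 style definition).

    hot_roi_mean: mean counts in the ROI drawn on the hot sphere.
    background_roi_means: mean counts of same-size background ROIs.
    activity_ratio: true sphere-to-background ratio (4:1 in this study).
    """
    c_b = mean(background_roi_means)
    return 100.0 * (hot_roi_mean / c_b - 1.0) / (activity_ratio - 1.0)


def percent_background_variability(background_roi_means):
    """% background variability: sample SD of background ROI means / mean."""
    return 100.0 * stdev(background_roi_means) / mean(background_roi_means)


if __name__ == "__main__":
    # Made-up ROI values, for illustration only.
    background = [0.9, 1.0, 1.1]
    print(percent_contrast(2.5, background))
    print(percent_background_variability(background))
```

With a 4:1 activity ratio, a sphere whose measured mean is 2.5 times the background yields 50% contrast, which illustrates how partial-volume loss in the 10-mm sphere lowers this index.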

Results

Twelve facilities in Asia (South Korea, Taiwan, and Hong Kong) were enrolled in this trial (Table 1), and standardization was carried out for 12 scanners including the Discovery PET/CT 600, Discovery PET/CT 690, Discovery PET/CT 710, Discovery STE 16, and Discovery VCT (GE Healthcare, total number of scanners: 8) and the TruePoint Biograph 6, TruePoint Biograph 40, Biograph mCT, and Biograph mCT Flow 40-4R (Siemens Healthineers, total number of scanners: 4). The scanners, injected doses, and imaging parameter values for each site are shown in Table 2. At each center, [F-18]FDG was injected in daily practice at 3.7–7.4 MBq/kg or 370 MBq. Scan duration remained in the range of 1.5–3.5 min at all sites except one that used flow motion. We adjusted imaging conditions as needed to meet the criteria approved by the RSNA-QIBA. One or more parameters had to be changed at 6 of the 12 facilities; no change was needed at the other 6.

Table 2 List of scanners, injected doses, scan durations, image reconstruction parameters

In accordance with the QIBA profile, the data from the phantom tests for SUV, resolution, and noise were analyzed. The SUV for the ROI set in the phantom ranged from 0.9 to 1.1, and the 13-mm spheres in the phantom were visible on all scanners. The maximum COV of the voxel values was 11.9%, below the 15% limit specified in the profile. We confirmed that the image quality met all three criteria at all sites after parameter adjustment (Table 3).

Table 3 List of SUVs, resolutions, and coefficients of variation (COV)

After the revision of imaging parameters, we assessed the physical indexes mentioned in the Japanese guideline (JSNM guideline) (Table 4). Ten of the 12 scanners did not meet the JSNM criteria (NECphantom > 10.8 Mcounts, N10 mm < 5.6%, QH,10 mm/N10 mm > 2.8, RC10 mm > 0.38). Patient enrollment began after individual institutions received the results of this field data analysis.
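The four JSNM thresholds quoted above can be applied to a scanner's measured indexes as in the following sketch. The thresholds are taken from the text; the function name `check_jsnm`, the argument names, and the example values are hypothetical and are not measurements from this study.

```python
def check_jsnm(nec_phantom_mcounts, n10_pct, qh10_pct, rc10):
    """Compare one scanner's indexes with the JSNM thresholds cited in the text.

    nec_phantom_mcounts: phantom noise equivalent count, in Mcounts.
    n10_pct: % background variability for the 10-mm sphere ROI size.
    qh10_pct: % contrast of the 10-mm hot sphere.
    rc10: relative recovery coefficient of the 10-mm sphere.
    """
    result = {
        "nec_ok": nec_phantom_mcounts > 10.8,
        "variability_ok": n10_pct < 5.6,
        "contrast_to_noise_ok": qh10_pct / n10_pct > 2.8,
        "recovery_ok": rc10 > 0.38,
    }
    # A scanner meets the guideline only if every index passes.
    result["all_ok"] = all(result.values())
    return result


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(check_jsnm(12.0, 5.0, 20.0, 0.45))
    print(check_jsnm(9.0, 7.0, 15.0, 0.30))
```

Because a scanner must pass all four indexes simultaneously, a single failing index (for example, a short scan giving a low NECphantom) is enough for it to be counted among the scanners that did not meet the criteria.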

Table 4 Mean and range for the indexes in the JSNM guideline

Discussion

This study describes the first attempt, as a J-QIBA activity, to standardize the image quality of [F-18]FDG PET/CT scans used for the evaluation of therapeutic effect in an Asian international multicenter phase II trial using the [F-18]FDG PET/CT profile approved by the RSNA-QIBA [13]. The results of standardization using phantom tests recommended by the RSNA-QIBA showed that image quality standardization was achieved safely and reliably before proceeding to patient enrollment.

In this study, phantom tests were performed at 12 institutions in Asia (South Korea, Taiwan, and Hong Kong). Because the [F-18]FDG concentration used at each site was sufficiently high, the scan time did not need to be extended beyond that of the initial protocol at any site except the one that used flow motion.

Point spread function (PSF) correction was used by default in the reconstruction process at some facilities but was not adopted in the revision process. In this clinical trial, [F-18]FDG PET/CT was used for evaluation of the therapeutic effect rather than for lesion detection, and it was important to minimize the differences between the scanners. PSF correction is known to increase noise and Gibbs artifacts [19,20] and was therefore not considered appropriate.

In addition to the measurements stated in the QIBA profile, the parameters for image quality required by the Japanese guideline for the oncology FDG-PET/CT data acquisition protocol (JSNM guideline) were also assessed [14]. Standardization by phantom tests as recommended in the JSNM guideline for a clinical trial was reported previously [21], and evaluation was done in the same way in this study. The imaging conditions were adjusted to meet the criteria in the QIBA profile at all the facilities, but 10 of the 12 scanners could not fulfill the recommendations when evaluated by the JSNM guideline. This result indicates that the image quality criteria in the QIBA profile are easier to meet than those in the JSNM guideline, and the standardization procedure in the QIBA profile was regarded as convenient because the phantom test was easy to introduce into the international multicenter study.

The standardization procedure in this study was based on, but not fully compliant with, the one described in the QIBA profile. Axial uniformity measurement, one of the criteria used in phantom testing according to the QIBA profile, could not be assessed because of the shape of the NEMA body phantom. Furthermore, applying harmonization strategies might be required for enigmatic studies in the future.

Quantitative and reproducible measures from imaging studies are needed to validate specific metrics in clinical trials and clinical practice because bias and variance in the results obtained from clinical images can come from several sources. RSNA-QIBA has developed a flexible framework to organize the work of its coordinating and biomarker committees, whose aim is to identify reproducible quantitative imaging biomarkers. The RSNA-QIBA has made an effort to liaise with the European Imaging Biomarkers Alliance (EIBALL) and J-QIBA. As shown by the initial results of an Asian international multicenter phase II trial conducted as a J-QIBA activity, results obtained in Asia using the RSNA-QIBA profile can be made comparable to those of international clinical trials in western countries.

In conclusion, using the RSNA-QIBA profile, we standardized imaging conditions by phantom tests for response evaluation by [F-18]FDG PET/CT images acquired in a multicenter study. J-QIBA can help standardize quantitative imaging data in Asian international studies.