Introduction

Non-small-cell lung cancer (NSCLC) is one of the most common cancers worldwide. Chemotherapy remains the most widely used treatment for patients with advanced NSCLC [1]. Precise and early prediction of tumour response is valuable because it can spare patients the toxicity and additional cost of ineffective treatment. Currently, the most widely used approaches to monitoring therapeutic response are based on anatomical changes identified on CT or other anatomical imaging modalities. The first study to evaluate tumour response to therapy was performed by Moertel and Hanley in 1976, and was followed by the World Health Organization (WHO) criteria in 1979, the Response Evaluation Criteria in Solid Tumors (RECIST) in 2000, and RECIST 1.1 in 2009 [2–5]. Although these response evaluation protocols have been updated and improved over time, they remain limited by their dependence on morphological changes to assess therapeutic response [6–9]. The assumption that changes in tumour dimensions are a true marker of therapeutic efficacy is prone to error because tumour tissue is composed of heterogeneous components whose regression may be slow and incomplete. This is especially true in NSCLC, in which fibrotic masses may persist even after cure.

Quantitative 18F-FDG PET is thought to overcome these limitations and is considered a more suitable tool for evaluating therapeutic response [6–9]. Fundamentally, quantitative 18F-FDG PET response assessment is based on the change in the standardized uptake value (SUV) between baseline and follow-up studies. However, SUV is affected by various technical, physical, and biological factors [10–12]. To make results reproducible and comparable across trials, a widely accepted standardized protocol is required. Two protocols are currently used to quantify anticancer treatment response in terms of metabolic change: the European Organization for Research and Treatment of Cancer (EORTC) criteria, developed in 1999 [13], and the Positron Emission Tomography Response Criteria in Solid Tumors (PERCIST) 1.0, developed by American researchers in 2009 [9]. EORTC recommends using SUV normalized to body surface area (SUVbsa) to reduce the influence of body weight on SUV [13], but does not propose a rigorous standardization protocol to ensure that the criteria are applied uniformly. PERCIST 1.0 recommends using SUV corrected for lean body mass (SUL) to avoid falsely high organ SUV in obese patients [14, 15]. It also recommends computing SULpeak and total lesion glycolysis (TLG) as response metrics. Furthermore, PERCIST 1.0 specifies a series of detailed and unambiguous rules that standardize procedures to ensure reproducibility.
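The SUV variants mentioned above differ only in the quantity used to normalize the injected activity. The following is a minimal sketch, not the algorithm of any particular workstation. It assumes decay-corrected activity concentrations in kBq/mL and injected activity in MBq, and it assumes the DuBois formula for body surface area and the James formula for lean body mass; other formulas are in use. The unit convention chosen for SUVbsa only rescales it by a fixed factor, which cancels in percentage changes as long as the same convention is used at baseline and follow-up.

```python
"""Sketch: SUV normalized to body weight, body surface area and lean body mass.

Assumptions: concentrations are decay-corrected (kBq/mL), injected activity is
in MBq, BSA uses the DuBois formula and LBM uses the James formula.
"""


def bsa_dubois_m2(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 (DuBois & DuBois)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def lbm_james_kg(weight_kg: float, height_cm: float, sex: str) -> float:
    """Lean body mass in kg (James formula); sex is 'M' or 'F'."""
    ratio_sq = (weight_kg / height_cm) ** 2
    if sex == "M":
        return 1.10 * weight_kg - 128.0 * ratio_sq
    if sex == "F":
        return 1.07 * weight_kg - 148.0 * ratio_sq
    raise ValueError("sex must be 'M' or 'F'")


def suv_bw(conc_kbq_ml: float, dose_mbq: float, weight_kg: float) -> float:
    """SUV normalized to body weight: concentration / (dose / weight)."""
    # kBq/mL divided by MBq/kg equals kBq/mL divided by kBq/g, i.e. g/mL.
    return conc_kbq_ml * weight_kg / dose_mbq


def suv_bsa(conc_kbq_ml: float, dose_mbq: float, weight_kg: float, height_cm: float) -> float:
    """SUV normalized to body surface area, here in (kBq/mL) * m^2 / MBq."""
    return conc_kbq_ml * bsa_dubois_m2(weight_kg, height_cm) / dose_mbq


def sul(conc_kbq_ml: float, dose_mbq: float, weight_kg: float, height_cm: float, sex: str) -> float:
    """SUV normalized to lean body mass (SUL)."""
    return conc_kbq_ml * lbm_james_kg(weight_kg, height_cm, sex) / dose_mbq
```

For a patient whose lean body mass is well below total body weight, SUL is correspondingly lower than body-weight SUV, which is the correction PERCIST 1.0 relies on for obese patients.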

PERCIST 1.0 improves on the EORTC criteria in many respects, but applying it involves several cumbersome tasks: accurately measuring SULpeak, accurately registering images from serial examinations of the same patient, choosing the optimal reference region in the liver, and comprehensively monitoring all the factors that affect SUV quantitation between two studies. Ideally, a single application would perform these functions automatically. PET volume computer-assisted reading (PET VCAR), an application on the Advantage Workstation (GE Healthcare), is one such program that clinicians can use to assist in monitoring treatment response [16].

Several studies have compared response evaluations using anatomical and metabolic criteria in patients with NSCLC receiving chemotherapy [17, 18]. Ordu et al. [17] used SUVmax instead of SULpeak when applying PERCIST 1.0. Ding et al. [18] calculated SULpeak manually, defining it as the largest possible mean value of a 1 cm³ spherical region of interest (ROI) within a tumour; calculating SULpeak accurately by hand is difficult. Furthermore, none of these reports described how the acquisition parameters of the two PET/CT examinations that affect SUV variability were monitored, how the images from serial examinations were registered, or how the reference region in the liver was chosen. PERCIST 1.0 was originally formulated to create a standardized mechanism providing the reproducibility needed to compare response rates between trials. If the standardized protocol is not strictly followed in practice, the most important component of PERCIST 1.0 is lost. The aim of this study was to assess the concordance among RECIST 1.1, the EORTC criteria and PERCIST 1.0, using PET VCAR, for the early evaluation of response to chemotherapy in patients with NSCLC.
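SULpeak as defined above (the largest mean over a 1 cm³ sphere) is tedious to find by hand but straightforward to compute on a voxel grid. The NumPy sketch below is an illustration under stated assumptions, not the PET VCAR implementation: the sphere is approximated by the voxels whose centres lie within the radius of a 1 mL sphere (about 6.2 mm), the sphere centre must be a voxel inside a lesion mask, and spheres that extend beyond the image are ignored. The function and argument names are hypothetical; `voxel_mm` would be roughly 3.3 × 3.6 × 3.6 mm (z, y, x) for the 70 cm, 192 × 192 PET protocol described in the Methods.

```python
import numpy as np


def sphere_offsets(radius_mm, voxel_mm):
    """Integer (z, y, x) offsets of voxels whose centres lie within radius_mm."""
    reach = [int(np.floor(radius_mm / v)) for v in voxel_mm]
    offsets = []
    for dz in range(-reach[0], reach[0] + 1):
        for dy in range(-reach[1], reach[1] + 1):
            for dx in range(-reach[2], reach[2] + 1):
                dist = np.sqrt((dz * voxel_mm[0]) ** 2 + (dy * voxel_mm[1]) ** 2
                               + (dx * voxel_mm[2]) ** 2)
                if dist <= radius_mm:
                    offsets.append((dz, dy, dx))
    return offsets


def sul_peak(sul, roi_mask, voxel_mm, sphere_ml=1.0):
    """Largest mean SUL over a ~1 mL sphere centred on a voxel inside roi_mask."""
    sul = np.asarray(sul, dtype=float)
    # Radius in mm of a sphere with the requested volume (1 mL = 1 cm^3 = 1000 mm^3).
    radius_mm = 10.0 * (3.0 * sphere_ml / (4.0 * np.pi)) ** (1.0 / 3.0)
    offsets = sphere_offsets(radius_mm, voxel_mm)
    pad = max(abs(c) for off in offsets for c in off)
    # Pad with NaN so that spheres leaving the image give NaN and are skipped.
    padded = np.pad(sul, pad, mode="constant", constant_values=np.nan)
    nz, ny, nx = sul.shape
    total = np.zeros(sul.shape, dtype=float)
    for dz, dy, dx in offsets:
        total += padded[pad + dz:pad + dz + nz, pad + dy:pad + dy + ny, pad + dx:pad + dx + nx]
    sphere_means = total / len(offsets)
    valid = roi_mask & ~np.isnan(sphere_means)
    if not valid.any():
        raise ValueError("no complete sphere is centred inside the ROI")
    return float(sphere_means[valid].max())
```

Because the search runs over every voxel in the lesion, the result does not depend on where a reader happens to place a manual ROI, which is the reproducibility problem with hand calculation noted above.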

Materials and methods

Patients

From October 2012 to November 2014, patients with histopathologically proven advanced NSCLC who were scheduled to undergo chemotherapy were prospectively enrolled in this study. Patients were excluded if they were diabetic, had brain metastases, had undergone surgery, or had received previous chemotherapy or radiotherapy. All patients received the same chemotherapy regimen: gemcitabine (1,000 mg/m²) and cisplatin (75 – 80 mg/m²) (GP protocol). The baseline 18F-FDG PET/CT scan was performed during the week before the start of treatment, and an early follow-up 18F-FDG PET/CT scan was performed 10 to 14 days after completion of two cycles of chemotherapy. Additional CT scans of the chest and other regions were performed every 3 months as clinically indicated, or whenever disease progression was clinically suspected. Treatment was continued until progression was identified on CT according to RECIST 1.1, unacceptable toxicity, or patient withdrawal. Progression-free survival (PFS) was calculated as the time from the first chemotherapy administration to disease progression or death from any cause. The study was approved by the institutional review board of the First Affiliated Hospital of Jinan University and complied with national legislation and the Declaration of Helsinki. Written informed consent was obtained from all patients.

PET/CT examinations

All 18F-FDG PET/CT examinations were performed with the same protocol on a GE Discovery PET/CT 690 system, and the scanning procedure closely followed the National Cancer Institute guidelines [19]. Patients were instructed to fast for at least 6 h before the 18F-FDG PET/CT scan. Blood glucose was measured before tracer injection. Whole-body PET/CT images, generally from the top of the skull to mid-thigh, were acquired 50 to 70 min after intravenous injection of 18F-FDG at a dose of 0.08 – 0.10 mCi/kg body weight. Patients were asked to void immediately before scanning to minimize tracer activity in the urinary tract.

CT data were acquired during breath-hold at 120 kV and 80 – 160 mA (tube current modulated with the GE AutomA technique, noise index 30), with a slice thickness of 3.75 mm, slice interval of 3.27 mm, pitch of 1.375, matrix size of 512 × 512 and scan FOV of 50 cm. PET data were acquired in 3D time-of-flight (TOF) mode with a 2-min scan per bed position, slice thickness of 3.27 mm, slice interval of 3.75 mm, matrix size of 192 × 192 and scan FOV of 70 cm. The PET data were attenuation-corrected using the integrated CT-based attenuation correction (CTAC). The CT data were reconstructed in standard mode (window width/window level 400/40) with 40 % adaptive statistical iterative reconstruction (ASiR). The PET data were reconstructed with point spread function (PSF) modelling combined with TOF.

Evaluation of the therapeutic response by PET VCAR

We selected the baseline and follow-up examinations of each patient and launched the PET VCAR workflow. The images from the two examinations were automatically coregistered based on bone and soft-tissue features on CT. SUV-related parameters were automatically reviewed and compared by PET VCAR, which flagged the relevant parameter if the injected 18F-FDG doses differed by more than 20 %, if the uptake times differed by more than 15 min, if the scan or reconstruction parameters differed between the baseline and follow-up scans, or if the blood glucose level exceeded 200 mg/dL at either scan. Patients with any flagged SUV-related parameter were excluded. All data were analysed separately by an experienced radiologist and an experienced nuclear medicine physician, who generated response assessments according to RECIST 1.1, the EORTC criteria and PERCIST 1.0. For RECIST 1.1, the reader measured the target lesions manually, and PET VCAR automatically calculated the percentage change in the sum of their longest diameters. For EORTC and PERCIST 1.0, ROIs on lesions in the baseline examination were automatically bookmarked and propagated onto the coregistered follow-up images. Each propagated bookmark was accepted or rejected based on a confidence scale, and the reader adjusted the ROIs manually as needed. EORTC results were based on the percentage change in the SUVmax of the target lesions; PERCIST 1.0 results were classified as progressive metabolic disease (PMD), stable metabolic disease (SMD), partial metabolic response (PMR) or complete metabolic response (CMR).
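The parameter checks described above amount to a small set of rules applied to each baseline/follow-up pair. The sketch below encodes them in Python. The record fields, the use of the baseline dose as the reference for the 20 % dose difference, and the representation of scan and reconstruction settings as tuples are assumptions for illustration, not PET VCAR's internal logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanParams:
    """Hypothetical per-scan record; field names are illustrative only."""
    injected_dose_mbq: float
    uptake_min: float
    glucose_mg_dl: float
    acquisition: tuple      # e.g. (mode, minutes per bed position)
    reconstruction: tuple   # e.g. (algorithm, options)


def qc_flags(baseline: ScanParams, follow_up: ScanParams) -> List[str]:
    """Return the reasons a baseline/follow-up pair would be flagged."""
    flags = []
    # Relative dose difference, taken here relative to the baseline dose (assumption).
    dose_diff = abs(follow_up.injected_dose_mbq - baseline.injected_dose_mbq) / baseline.injected_dose_mbq
    if dose_diff > 0.20:
        flags.append("injected dose differs by more than 20 %")
    if abs(follow_up.uptake_min - baseline.uptake_min) > 15:
        flags.append("uptake time differs by more than 15 min")
    if baseline.acquisition != follow_up.acquisition:
        flags.append("acquisition parameters differ")
    if baseline.reconstruction != follow_up.reconstruction:
        flags.append("reconstruction parameters differ")
    # Glucose is checked at both scans.
    if max(baseline.glucose_mg_dl, follow_up.glucose_mg_dl) > 200:
        flags.append("blood glucose above 200 mg/dL")
    return flags
```

Under the exclusion rule used in this study, any non-empty list of flags would remove the patient from analysis.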

Response evaluation with RECIST, EORTC criteria and PERCIST

The definitions of measurable baseline lesions and the response assessment criteria for RECIST 1.1, EORTC and PERCIST 1.0 are shown in Table 1. For RECIST 1.1, we chose up to five target lesions, with no more than two per organ. For EORTC, we chose as target lesions up to five lesions with the highest 18F-FDG uptake, spread across as many involved organ systems as possible; SUV was normalized to body surface area, and SUVmax was used for the response calculation. For PERCIST 1.0, we chose the hottest lesion on the baseline scan and on the follow-up scan as the target lesion. The hottest lesion on the follow-up scan could differ from the lesion measured at baseline, on the assumption that it had been present since baseline. SULpeak was used for the response calculation. The optimized reference region was automatically delineated in the right lobe of the liver; in patients with liver involvement, the descending thoracic aorta was used as the background region instead. TLG was computed automatically by PET VCAR using an iterative adaptive segmentation algorithm, which finds a threshold separating the target volume from background tissue by weighting SULmax and SULmean within the target volume with a default weighting factor of 0.5 [20].
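TLG is the product of the mean SUL and the metabolic tumour volume (MTV) of a segmented lesion. The exact update rule used by PET VCAR is not given here, so the sketch below substitutes a generic background-corrected iterative threshold, T = B + w·(m(T) − B), where m(T) is the mean SUL of ROI voxels at or above T, w = 0.5, and B is a background SUL such as the liver or aortic reference level. It illustrates the idea of an iterative adaptive threshold and should not be read as the vendor algorithm; the names and arguments are hypothetical.

```python
import numpy as np


def adaptive_tlg(sul, roi_mask, voxel_ml, background, weight=0.5, max_iter=50, tol=1e-6):
    """Segment a lesion with an iterative threshold; return (T, MTV in mL, SULmean, TLG).

    Stand-in rule (not the PET VCAR algorithm): T = B + w * (m(T) - B), where
    m(T) is the mean SUL of ROI voxels at or above T and B is a background SUL.
    """
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must lie in (0, 1]")
    values = np.asarray(sul, dtype=float)[roi_mask]
    if values.size == 0 or values.max() <= background:
        raise ValueError("ROI must contain uptake above the background level")
    # Initial guess: weighted point between background and the ROI maximum.
    threshold = background + weight * (values.max() - background)
    for _ in range(max_iter):
        mean_above = values[values >= threshold].mean()
        new_threshold = background + weight * (mean_above - background)
        done = abs(new_threshold - threshold) < tol
        threshold = new_threshold
        if done:
            break
    segmented = values[values >= threshold]
    mtv_ml = segmented.size * voxel_ml            # metabolic tumour volume
    sul_mean = float(segmented.mean())
    return float(threshold), mtv_ml, sul_mean, sul_mean * mtv_ml  # TLG = SULmean x MTV
```

For a sharply bounded lesion, the threshold settles halfway between background and the lesion's mean uptake when w = 0.5; blurred edges move it and therefore change MTV and TLG, which is why the segmentation rule must be identical at baseline and follow-up.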

Table 1 Response evaluation criteria for RECIST, EORTC and PERCIST

In addition, we applied PERCIST 1.0 with up to five of the hottest lesions (no more than two per organ) as target lesions. Each baseline target lesion had to meet the PERCIST 1.0 definition of a measurable lesion. The percentage change in the summed SULpeak of these lesions was calculated, and the response was classified according to PERCIST 1.0. The response classifications based on multiple lesions were compared with those based on a single lesion to determine whether the number of target lesions affected the response assessment.
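The PERCIST 1.0 decision rules used for both the single-lesion and the summed multiple-lesion analyses can be sketched as follows. The thresholds follow PERCIST 1.0: PMR requires a SULpeak decrease of at least 30 % and at least 0.8 SUL units; PMD requires an increase of more than 30 % and more than 0.8 SUL units, a new lesion, or a TLG increase of more than 75 % without a decline in SULpeak. Applying the same absolute 0.8 SUL threshold to summed SULpeak values is an assumption of this sketch, and the CMR decision (resolution to background level) is passed in as a flag rather than computed. The function names are hypothetical.

```python
from typing import Optional, Sequence


def pct_change(baseline: float, follow_up: float) -> float:
    """Percentage change relative to baseline."""
    return 100.0 * (follow_up - baseline) / baseline


def percist_class(base_sulpeak: float, follow_sulpeak: float, *, new_lesion: bool = False,
                  complete_resolution: bool = False,
                  tlg_change_pct: Optional[float] = None) -> str:
    """PERCIST 1.0 category from baseline and follow-up SULpeak (single or summed)."""
    if complete_resolution and not new_lesion:
        return "CMR"
    change = pct_change(base_sulpeak, follow_sulpeak)
    delta = follow_sulpeak - base_sulpeak
    # PMD: new lesion, >30 % and >0.8 SUL rise, or >75 % TLG rise with no SULpeak decline.
    tlg_progression = tlg_change_pct is not None and tlg_change_pct > 75.0 and delta >= 0.0
    if new_lesion or (change > 30.0 and delta > 0.8) or tlg_progression:
        return "PMD"
    # PMR: decrease of at least 30 % and at least 0.8 SUL units.
    if change <= -30.0 and delta <= -0.8:
        return "PMR"
    return "SMD"


def percist_multi(base_sulpeaks: Sequence[float], follow_sulpeaks: Sequence[float],
                  **kwargs) -> str:
    """Secondary analysis: classify the summed SULpeak of up to five target lesions."""
    if not 1 <= len(base_sulpeaks) <= 5 or len(base_sulpeaks) != len(follow_sulpeaks):
        raise ValueError("use one to five target lesions, paired baseline/follow-up")
    return percist_class(sum(base_sulpeaks), sum(follow_sulpeaks), **kwargs)
```

Because the summed value pools lesions that may respond differently, a strong decrease in one lesion can be diluted or amplified by the others, which is how the single-lesion and multiple-lesion analyses can disagree.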

Statistical analysis

The paired t test was used to compare each parameter before and after chemotherapy. The response classifications of RECIST 1.1, EORTC and PERCIST 1.0 were graded on a four-point ordinal scale as follows: progressive disease (PD) or PMD = 1, stable disease (SD) or SMD = 2, partial remission (PR) or PMR = 3, and complete response (CR) or CMR = 4. Concordance was assessed using Cohen’s κ coefficient. Agreement between the two assessments was categorized as poor (weighted κ < 0), slight (weighted κ = 0 – 0.20), fair (weighted κ = 0.21 – 0.40), moderate (weighted κ = 0.41 – 0.60), substantial (weighted κ = 0.61 – 0.80), and almost perfect (weighted κ > 0.80) [21]. The differences among the results of the three protocols were assessed using the Wilcoxon signed-rank test. PFS was estimated using the Kaplan–Meier method and compared using the log-rank test. P values less than 0.05 were considered statistically significant.
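The κ values reported in the Results can be reproduced from the classification counts given there with an unweighted Cohen's κ on this ordinal coding. For example, the EORTC versus PERCIST 1.0 cross-classification implied by the Results (8 PMD, 8 SMD and 19 PMR by EORTC, with the four discordant patients described there) gives κ ≈ 0.804. The sketch below is a plain-Python illustration, not the statistical software used in the study; it also accepts linear or quadratic disagreement weights, and the function names are hypothetical.

```python
from typing import Optional, Sequence

# Ordinal coding used in the study: PD/PMD = 1, SD/SMD = 2, PR/PMR = 3, CR/CMR = 4.
CATEGORIES = (1, 2, 3, 4)


def cohen_kappa(a: Sequence[int], b: Sequence[int], weights: Optional[str] = None,
                categories: Sequence[int] = CATEGORIES) -> float:
    """Cohen's kappa; weights=None (unweighted), 'linear' or 'quadratic'."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions and their row/column marginals.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


def landis_koch(kappa: float) -> str:
    """Agreement categories used in the study [21]."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

With `weights=None` the expression 1 − observed/expected reduces to the familiar (pₒ − pₑ)/(1 − pₑ), so one function covers both the unweighted and the weighted statistic.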

Results

Patient characteristics

Of the 46 consecutive eligible patients with advanced NSCLC enrolled in this prospective study, 11 were excluded for the following reasons: technical issues (three patients), patient request (four patients), and early discontinuation of chemotherapy due to a significant decline in performance status (four patients). Thus, 35 patients (26 men and 9 women; mean age 61 years, range 48 – 84 years) were included in this study: 22 with adenocarcinoma and 13 with squamous cell carcinoma. Fourteen patients had stage III disease and 21 had stage IV disease. The mean interval between completion of two cycles of chemotherapy and the follow-up PET/CT scan was 12 days (range 10 – 14 days). Patient characteristics are shown in Tables 2 and 3. Blood glucose levels, injected 18F-FDG doses and uptake times did not differ significantly between the scans before and after chemotherapy (P > 0.05).

Table 2 Clinical characteristics of the 35 evaluated patients
Table 3 Parameters before and after chemotherapy in patients with NSCLC

Treatment response assessments with RECIST, EORTC criteria and PERCIST

The RECIST 1.1, EORTC and PERCIST 1.0 results are shown in Table 4. With RECIST 1.1, the reduction in tumour diameter was 28.8 ± 1.6 % mm (mean ± SD); 8 patients had PD, 20 SD, 7 PR and 0 CR, giving a response rate (PR + CR) of 20.0 %. With the EORTC criteria, the reduction in SUVmax was 40.1 % ± 2.5 %; 8 patients had PMD, 8 SMD, 19 PMR and 0 CMR, giving a response rate (PMR + CMR) of 54.3 %. With PERCIST 1.0, the reduction in SULpeak was 36.1 % ± 2.5 %; 8 patients had PMD, 6 SMD, 21 PMR and 0 CMR, giving a response rate (PMR + CMR) of 60.0 %.
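The response rates follow directly from the category counts. A short check in Python, using only the counts reported above (the dictionary layout and function name are illustrative):

```python
# Patient counts per category, as reported in the Results (n = 35).
COUNTS = {
    "RECIST 1.1": {"PD": 8, "SD": 20, "PR": 7, "CR": 0},
    "EORTC": {"PMD": 8, "SMD": 8, "PMR": 19, "CMR": 0},
    "PERCIST 1.0": {"PMD": 8, "SMD": 6, "PMR": 21, "CMR": 0},
}
RESPONDER_CATEGORIES = {"PR", "CR", "PMR", "CMR"}


def response_rate(counts: dict) -> float:
    """Percentage of patients with a partial or complete (metabolic) response."""
    total = sum(counts.values())
    responders = sum(n for cat, n in counts.items() if cat in RESPONDER_CATEGORIES)
    return 100.0 * responders / total


for criteria, counts in COUNTS.items():
    print(f"{criteria}: {response_rate(counts):.1f} %")
```

Running it prints 20.0 %, 54.3 % and 60.0 %, matching the rates above.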

Table 4 Therapy response assessments with RECIST, EORTC criteria and PERCIST

Comparison of treatment response assessments between RECIST and EORTC criteria

RECIST 1.1 and EORTC classifications were discordant in 20 patients (57.1 %), with slight agreement between the two assessments (κ = 0.194, P < 0.05; Table 5). Of the eight patients classified as PD according to RECIST 1.1, four were classified as SMD according to EORTC, because the increase in the sum of the longest diameters of the target lesions exceeded 20 %, whereas the increase in the sum of SUVmax was less than 25 %. Of the 20 patients classified as SD according to RECIST 1.1, four were classified as PMD according to EORTC because new lesions were detected on PET but not on CT, and 12 were classified as PMR because the decrease in the sum of the longest diameters of the target lesions was less than 30 %, whereas the decrease in the sum of SUVmax was more than 25 %.
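The discordant cases above follow directly from the different thresholds of the two criteria. The sketch below encodes simplified versions of the rules: for RECIST 1.1, PD for an increase of at least 20 % and at least 5 mm in the sum of diameters and PR for a decrease of at least 30 %; for EORTC, PMD for an SUV increase of more than 25 % and PMR for a decrease of more than 25 % after more than one cycle (at least 15 % after a single cycle). It treats baseline as the RECIST nadir, which holds when only one follow-up is available, and omits lymph-node and non-target-lesion rules and the EORTC criterion of a visible increase in the extent of uptake; it is an illustration, not a complete implementation, and the function names are hypothetical.

```python
def pct_change(baseline: float, follow_up: float) -> float:
    """Percentage change relative to baseline."""
    return 100.0 * (follow_up - baseline) / baseline


def recist_class(base_sum_mm: float, follow_sum_mm: float, *, new_lesion: bool = False,
                 all_target_lesions_gone: bool = False) -> str:
    """Simplified RECIST 1.1 target-lesion response (baseline assumed to be the nadir)."""
    if new_lesion:
        return "PD"
    if all_target_lesions_gone:
        return "CR"
    change = pct_change(base_sum_mm, follow_sum_mm)
    # PD needs both a >= 20 % relative and a >= 5 mm absolute increase.
    if change >= 20.0 and follow_sum_mm - base_sum_mm >= 5.0:
        return "PD"
    if change <= -30.0:
        return "PR"
    return "SD"


def eortc_class(base_suv: float, follow_suv: float, *, cycles: int, new_lesion: bool = False,
                complete_resolution: bool = False) -> str:
    """Simplified EORTC metabolic response from (summed) SUVmax values."""
    if new_lesion:
        return "PMD"
    if complete_resolution:
        return "CMR"
    change = pct_change(base_suv, follow_suv)
    if change > 25.0:
        return "PMD"
    # PMR: >25 % decrease after more than one cycle, >= 15 % after a single cycle.
    if (cycles > 1 and change < -25.0) or (cycles == 1 and change <= -15.0):
        return "PMR"
    return "SMD"
```

For example, a hypothetical patient whose summed diameters grow from 50 mm to 61 mm (+22 %) while summed SUVmax rises by 15 % is PD by RECIST 1.1 but SMD by EORTC, the pattern seen in four patients above.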

Table 5 Comparison of treatment response assessments according to RECIST and EORTC

Comparison of treatment response assessments between RECIST and PERCIST

RECIST 1.1 and PERCIST 1.0 classifications were discordant in 22 patients (62.9 %), with slight agreement between the two assessments (κ = 0.139, P < 0.05; Table 6). Of the eight patients classified as PD according to RECIST 1.1, four were classified as SMD according to PERCIST 1.0, because the increase in the sum of the longest diameters of the target lesions was more than 20 %, whereas the increase in the SULpeak of the target lesion was less than 30 %. Of the 20 patients classified as SD according to RECIST 1.1, four were classified as PMD according to PERCIST 1.0 because new lesions were detected on PET but not on CT, and 14 were classified as PMR because the decrease in the sum of the longest diameters of the target lesions was less than 30 %, whereas the decrease in the SULpeak of the target lesion was more than 30 %.

Table 6 Comparison of treatment response assessments between RECIST and PERCIST

Comparison of treatment response assessments between EORTC criteria and PERCIST

EORTC and PERCIST 1.0 classifications were discordant in four patients (11.4 %), with almost perfect agreement between the two assessments (κ = 0.804, P > 0.05; Table 7). Of the eight patients classified as PMD according to EORTC, one was classified as SMD according to PERCIST 1.0, because the increase in the sum of SUVmax of the multiple lesions was more than 25 %, whereas the increase in the SULpeak of the single lesion was less than 30 %. Of the eight patients classified as SMD according to EORTC, one was classified as PMD according to PERCIST 1.0, because the increase in the sum of SUVmax of the multiple lesions was less than 25 %, whereas the increase in TLG was more than 75 %; two others were classified as PMR, because the decrease in the sum of SUVmax of the multiple lesions was less than 25 %, whereas the decrease in the SULpeak of the single lesion was more than 30 %.

Table 7 Comparison of treatment response assessments between EORTC and PERCIST

Progression-free survival analysis

The median overall PFS was 7 months (range 3 – 21 months). With RECIST 1.1, the median PFS was 10 months in patients with PR, 9 months in those with SD, and 3 months in those with PD. The median PFS in patients with PR was significantly longer than in those with PD (P < 0.001) but did not differ significantly from that in those with SD (P = 0.855; Fig. 1a). With the EORTC criteria, the median PFS was 11 months in patients with PMR, 4 months in those with SMD, and 3 months in those with PMD; the median PFS in patients with PMR was significantly longer than in those with SMD and in those with PMD (P = 0.020 and P < 0.001, respectively; Fig. 1b). With PERCIST 1.0, the median PFS was 11 months in patients with PMR and 3 months in both those with SMD and those with PMD; the median PFS in patients with PMR was significantly longer than in those with SMD and in those with PMD (P < 0.001 for both; Fig. 1c).
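The median PFS values and P values above come from Kaplan–Meier estimates compared with the log-rank test. The following plain-Python sketch shows both computations; it is a textbook illustration, not the software used in the study, and the function names are hypothetical. Survival is kept as exact fractions, and the median is taken as the earliest time at which estimated survival falls to 0.5 or below.

```python
import math
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, Fraction]]:
    """Survival estimate after each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Records sharing time t are contiguous because data is sorted.
        tied = [e for tt, e in data[i:] if tt == t]
        deaths = sum(tied)
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve


def median_survival(curve: List[Tuple[float, Fraction]]) -> Optional[float]:
    """Earliest time with S(t) <= 1/2; None if the median is not reached."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value, 1 df)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e == 1}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n          # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Here `events` would be 1 for progression or death and 0 for patients censored at last follow-up, matching the PFS definition in the Methods.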

Fig. 1

Progression-free survival in relation to response assessments according to RECIST 1.1 (a), EORTC criteria (b), and PERCIST 1.0 (c). PR partial remission, PMR partial metabolic response, SD stable disease, SMD stable metabolic disease, PD progressive disease, PMD progressive metabolic disease

Comparison of PERCIST treatment response assessments between multiple lesions and a single lesion

With multiple target lesions, PERCIST 1.0 classified 8 patients as PMD, 5 as SMD, 22 as PMR and 0 as CMR. The multiple-lesion and single-lesion PERCIST 1.0 classifications were discordant in three patients (8.6 %), with almost perfect agreement between the two assessments (κ = 0.803, P > 0.05). One patient was reclassified from PMR to SMD because the reduction in summed SULpeak of the multiple lesions was smaller than the reduction in the single-lesion SULpeak, and two patients were reclassified from SMD to PMR because the reduction in summed SULpeak of the multiple lesions was greater than the reduction in the single-lesion SULpeak.

Discussion

In this study, we used an automated application to evaluate the early therapeutic response to chemotherapy in patients with advanced NSCLC according to RECIST 1.1, the EORTC criteria and PERCIST 1.0. We found that the EORTC criteria and PERCIST 1.0 were more sensitive and accurate than RECIST 1.1 for detecting an early therapeutic response. Furthermore, the EORTC criteria and PERCIST 1.0 showed better concordance with each other than either showed with RECIST 1.1.

Comparisons of RECIST and PERCIST 1.0 in patients with NSCLC have been reported previously. In a study by Ding et al. [18], 44 patients with NSCLC who received cisplatin-based combination chemotherapy were evaluated using RECIST and PERCIST 1.0 after two or four to six cycles of treatment. The differences between RECIST and PERCIST 1.0 evaluations were significant if new lesions were not identified on PET images when using RECIST. To our knowledge, no published studies have compared response evaluations between RECIST and the EORTC criteria in patients with NSCLC treated with chemotherapy. In other tumours, Monteil et al. [22] found a discrepancy between RECIST and EORTC evaluations in patients with colorectal cancer receiving chemotherapy. In the present study, the therapeutic response rates were 20.0 %, 54.3 % and 60.0 % using RECIST 1.1, the EORTC criteria and PERCIST 1.0, respectively. RECIST 1.1 and EORTC classifications were discordant in 20 patients (57.1 %), and RECIST 1.1 and PERCIST 1.0 classifications were discordant in 22 patients (62.9 %), with only slight agreement in both comparisons. These results are similar to those of previous studies [18, 22]. Of the 20 patients classified as SD according to RECIST 1.1, 12 were reclassified as PMR according to the EORTC criteria and 14 according to PERCIST 1.0, because the metabolic changes exceeded the response threshold whereas the morphological changes did not.

During follow-up, the median PFS in patients with PR according to RECIST 1.1 was significantly longer than in those with PD, but not than in those with SD. In contrast, the median PFS in patients with PMR according to the EORTC criteria and PERCIST 1.0 was significantly longer than in those with SMD or PMD. Patients reclassified from SD by RECIST 1.1 to PMR by the EORTC criteria or PERCIST 1.0 had longer PFS. One possible reason is that metabolic changes occur earlier than anatomical changes after chemotherapy. Similar results have been reported in several studies [23–26]. These results indicate that the EORTC criteria and PERCIST are more sensitive and accurate than RECIST for detecting an early therapeutic response, which could allow earlier therapy modification based on the 18F-FDG PET response.

To our knowledge, no studies have compared response evaluations between the EORTC criteria and PERCIST 1.0 in patients with NSCLC receiving chemotherapy. In this study, the EORTC criteria and PERCIST 1.0 showed excellent concordance in response classification, consistent with previous studies. Ziai et al. [27] found perfect concordance between EORTC and PERCIST 1.0 evaluations in patients with small-cell lung cancer receiving chemotherapy and radiotherapy. Skougaard et al. [28] found that the EORTC criteria and PERCIST 1.0 gave similar response evaluations in patients with metastatic colorectal cancer treated with irinotecan and cetuximab. A reasonable explanation for this excellent concordance is that all of these studies focused on the most metabolically active part of the patients’ tumours, which, according to cancer stem cell theory, is regarded as the most aggressive component and the one that determines disease progression [29, 30].

EORTC and PERCIST 1.0 response classifications were discordant in only four patients (11.4 %). Similarly, Skougaard et al. [28] found that EORTC and PERCIST 1.0 evaluations were discordant in 13.1 % of patients (8/61). Although EORTC and PERCIST 1.0 evaluations are rarely discordant, the two criteria differ in several other respects. The EORTC criteria recommend SUVbsa for assessing tumour 18F-FDG uptake but do not define when SUVmax or SUVmean should be used for the response calculation, nor how target lesions should be selected. Consequently, applying the EORTC criteria may yield different outcomes depending on how these choices are made. In contrast, PERCIST 1.0 uses SULpeak to assess tumour 18F-FDG uptake, explicitly defines the number of target lesions and how they are selected, recommends TLG as an additional measure of progression, and specifies a rigorous standardization of the PET protocol to ensure reproducible SUL measurements. These definitions allow PERCIST 1.0 to be applied more consistently among international observers. This is crucial, because without consistent criteria, comparisons between studies will be inaccurate and advances in cancer treatment will be hampered [7, 9]. Thus, although the EORTC criteria and PERCIST 1.0 showed excellent concordance in response classification, PERCIST 1.0 offers the potential for better consensus and comparability. A computer-assisted reading application fully customized for PERCIST 1.0 simplifies the assessment procedure and helps ensure more accurate results.

Regarding the number of target lesions, PERCIST 1.0 recommends choosing the hottest single lesion as the target lesion. It also recommends that multiple lesions (no more than two per organ and up to five of the hottest lesions) be measured as a secondary analysis. In this study, although the results of response evaluations with multiple lesions and a single lesion were similar, they were still discordant in three patients. Further study with a larger sample size is needed to determine the optimal number of target lesions for response assessment.

The current study had some limitations. First, we did not correlate these imaging findings with the corresponding pathological findings; further study is needed to investigate this correlation. Second, for ethical reasons, the precision of SUL was not determined in this study; animal studies are needed to quantify it in future research.

Conclusion

This study demonstrated that the EORTC criteria and PERCIST 1.0 are more sensitive and accurate than RECIST 1.1 for evaluating the early therapeutic response to chemotherapy in patients with NSCLC. Compared with the EORTC criteria, the less ambiguous standardization definitions of PERCIST 1.0 help simplify response evaluation.