Introduction

Various criteria to assess tumor response to therapy by measuring changes of tumor size on imaging studies were developed in the 1960s and 1970s. The World Health Organization (WHO) recognized the need for standardized criteria across clinical trials and published the “WHO handbook for reporting results of cancer treatment” in 1979 [1, 2]. These “WHO criteria” for assessing tumor response have been refined and simplified by the “Response Evaluation Criteria in Solid Tumors (RECIST)”, which were developed jointly by the European Organization for Research and Treatment of Cancer (EORTC), the National Cancer Institute (NCI) of the USA and the National Cancer Institute of Canada Clinical Trials Group [3]. RECIST 1.0 criteria were initially published in 2000 and updated (RECIST 1.1) in 2009 [4]. The updated criteria further clarify which lesions should be measured to monitor tumor response and how these lesions are measured. Another update of RECIST is currently in preparation.

The fundamental principles of assessing tumor response, however, have not changed since the initial publication of the WHO criteria in 1979 [1, 2]. Overall “tumor burden” is quantified by summing the size of tumor lesions in a baseline scan before the start of a new therapy. Bi-dimensional measurements are used for the WHO criteria and unidimensional measurements for RECIST. Tumor response is then quantified by measuring the relative change of this sum of lesion sizes. For clinical decision making and for reporting the results of clinical trials, this continuous parameter is then dichotomized into responders and non-responders. For bi-dimensional measurements (WHO criteria), response is defined as a decrease of the sum of tumor sizes by at least 50%. For spherical tumors, this is equivalent to a decrease of the diameter by approximately 30% (RECIST). Large retrospective analyses have shown that response assessment by WHO criteria, RECIST 1.0, and RECIST 1.1 leads to very similar response classifications [3–5].
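The equivalence of the two thresholds follows from simple geometry and can be checked numerically. The following is a minimal sketch, assuming a lesion that shrinks uniformly so that both perpendicular diameters scale by the same factor; the function name is illustrative and is not part of either set of criteria.

```python
import math


def diameter_decrease_for_product_decrease(product_decrease: float) -> float:
    """Fractional decrease of one diameter that produces the given fractional
    decrease of the product of two perpendicular diameters, assuming both
    diameters shrink by the same factor."""
    remaining_product = 1.0 - product_decrease
    remaining_diameter = math.sqrt(remaining_product)
    return 1.0 - remaining_diameter


who_threshold = 0.50  # WHO: at least 50% decrease of the bi-dimensional product
equivalent = diameter_decrease_for_product_decrease(who_threshold)
print(f"Equivalent diameter decrease: {equivalent:.1%}")  # about 29%
```

Under this assumption, a 50% decrease of the bi-dimensional product corresponds to a diameter decrease of about 29%, which is close to the 30% threshold used by RECIST.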

WHO criteria and RECIST have been remarkably successful and have been applied to assess tumor response in thousands of clinical studies. Their great success is perhaps unexpected because WHO criteria were defined at a time when tumor sizes were measured on planar x-rays and by palpation and have not been fundamentally updated since then [1, 2]. Furthermore, the WHO criteria are neither disease- nor treatment-specific, and the definitions of response and progression were not based on outcome data, but on the repeatability of tumor size measurements with the technologies available at that time.

Despite these obvious limitations, retrospective analyses of clinical studies have clearly shown that tumor response by RECIST or WHO criteria is associated with improved survival in almost all solid tumors [6]. Attempts have been made to improve prediction of patient survival by using continuous metrics for tumor response, but no significant differences as compared to RECIST have been observed so far [7, 8].

EORTC PET response criteria and PERCIST

When discussing criteria for assessment of tumor response by FDG PET and changes in tumor metabolism, it is important to consider the history of the morphologic response criteria, which shows that simple—and to a large extent arbitrary—criteria can prove to be highly valuable if they are robust and routinely used in clinical trials.

EORTC PET response criteria and PET response criteria in solid tumors (PERCIST) follow the model of RECIST and define four response categories with similar names as RECIST—complete metabolic response (CMR), partial metabolic response (PMR), stable metabolic disease (SMD) and progressive metabolic disease (PMD) as shown in Table 1 [9]. However, there are some differences in the exact definitions of the response categories and PERCIST provides many more details regarding which lesions are considered “measurable” and how scans with multiple FDG-avid lesions should be analyzed.

Table 1 Overview of response classification by EORTC criteria and PERCIST

EORTC PET response criteria were published in 1999 [9]. At that time, there was only limited experience in using FDG PET for treatment monitoring. In fact, the publication lists only 10 studies including a total of 95 patients as the basis for the recommendations. Six of these studies were performed for primary brain tumors and seven studies quantified changes in tumor metabolic activity not by standardized uptake values (SUVs), but by glucose metabolic rates (mostly derived from dynamic PET studies). These studies were performed before PET imaging became fast enough to allow for whole-body imaging studies. Therefore, the available studies focused on evaluation of one lesion, often in patients undergoing neoadjuvant chemotherapy for locally advanced disease. Notably, EORTC criteria are therefore “silent” on how to assess tumor response in a patient with multiple lesions. There was only one published study on the test-retest reproducibility (repeatability) of quantitative measurements of tumor FDG uptake. Based on these data, a response was basically defined as a decrease of tumor FDG uptake by 25% or more and progression as an increase in tumor FDG uptake by more than 25% or an increase of the maximum tumor diameter by 20% or the appearance of new metastases.

The 25% threshold of SUV change to define a response was mainly based on the limited data on the repeatability of tumor SUV measurements in patients undergoing no therapy. The standard deviation of relative changes of SUVs was approximately 10% and showed no major deviation from a normal distribution. Therefore, 2.5 times the standard deviation should include about 99% of the fluctuations of the measurements that are not caused by a therapeutic effect. A comparison with the limited literature data on changes in FDG uptake during treatment also supported that a decrease in tumor FDG uptake by 25% or more was correlated with a more favorable outcome.
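The reasoning behind the 25% threshold can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, normally distributed test-retest changes with a standard deviation of about 10%; the helper name is illustrative.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


sd_percent = 10.0  # approximate SD of relative SUV changes without therapy (from the text)
k = 2.5            # multiple of the SD used to set the threshold
threshold_percent = k * sd_percent
coverage = fraction_within(k)
print(f"Threshold: +/-{threshold_percent:.0f}%, two-sided coverage: {coverage:.1%}")
```

Under these assumptions, changes of less than 25% in either direction cover roughly 99% of the measurement fluctuations expected in the absence of a treatment effect.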

When the EORTC criteria were developed, it was shown that tumors responding to treatment could be identified earlier by FDG PET imaging than by size measurements. At earlier time points, changes in FDG uptake are likely to be less pronounced than at later time points. Therefore, the EORTC criteria defined a response after 1 cycle of chemotherapy as a decrease in FDG uptake by 15–25%. Time-dependent response criteria are conceptually interesting, but difficult to generalize, because of different cycle lengths of different treatments. Furthermore, several newer oral anti-cancer drugs, such as EGFR kinase inhibitors, are not given in treatment cycles.

The net FDG influx constant, Ki, was recommended as a parameter to quantify tumor FDG uptake. Measurement of this parameter requires a dynamic PET scan of 60-min duration. Dynamic imaging is now only rarely used in oncologic FDG PET imaging because several studies have shown a close correlation between changes in static uptake measurements and changes of Ki. If a static scan is performed, the EORTC criteria recommended to measure tumor FDG uptake by SUVs normalized by body surface area (BSA), because this parameter is less influenced by the body fat content which artificially increases SUVs normalized by body weight.

Overall, several studies have confirmed that the EORTC criteria are useful for assessing tumor response and predicting patient outcome. However, since the publication of the EORTC criteria, many studies have investigated the use of FDG PET for treatment monitoring. These studies have emphasized whole-body imaging and, since 2001, have mostly used PET/CT and not standalone PET. This has prompted the development of PERCIST [10] which takes into account the greater volume of literature on treatment monitoring with FDG PET. PERCIST, therefore, provides much more detailed guidance on treatment monitoring. In 2016, a simplified guide for response assessment by PERCIST was published [11].

A summary of the response categories by PERCIST is shown in Table 1. Key differences are the use of SUVs normalized by lean body mass, not body surface area, the calculation of the “peak” SUV of the tumor tissue, not the mean SUV, a slightly different threshold for tumor response and progression, and a different approach to select lesions on the baseline and follow-up scan.

Maximum or average SUVs have been used in many studies to quantify tumor FDG uptake, but have specific limitations. Maximum SUVs are very easy to measure in an operator-independent way, but they are sensitive to image noise and overestimate tumor FDG uptake in mildly FDG-avid lesions, e.g. metastases after chemotherapy. Average SUVs avoid this problem, but require the definition of tumor borders. Definition of tumor borders introduces inter-observer variability, because it is frequently necessary to separate physiologic FDG uptake from pathologic tumor uptake. This still cannot be performed in a fully automated fashion, especially when the contrast between tumor and normal tissue is low after therapy.

The goal of SUVpeak is to avoid the limitations of SUVmax and SUVmean by measuring the activity concentration in an approximately 1.0-cm³ volume that encompasses the voxels with the highest radioactivity concentration in the tumor. Because SUVpeak averages several voxels, it is less sensitive to image noise. Because SUVpeak does not require contouring of a lesion, it is less operator-dependent than SUVmean. Several vendors have now included tools to measure SUVpeak in their PET viewing software.

Like the EORTC criteria, PERCIST does not recommend using SUVs normalized by the patient’s body weight to quantify tumor FDG uptake. Tumor FDG uptake is normalized by lean body mass and the resulting SUVs are, therefore, abbreviated as SUL. Compared to normalization by body surface area, SULs (SUVpeak values normalized to lean body mass) have the advantage that the absolute values are similar to SUVs normalized by body weight for non-obese patients.
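Because both quantities divide the same tissue activity concentration by the injected activity per unit of normalizing mass, a body-weight SUV can be rescaled to an SUL once an estimate of lean body mass is available. The sketch below assumes that lean body mass is supplied by the user; the formula used to estimate it is not specified here, and the function name and example values are hypothetical.

```python
def suv_bw_to_sul(suv_bw: float, body_weight_kg: float,
                  lean_body_mass_kg: float) -> float:
    """Rescale a body-weight-normalized SUV to a lean-body-mass-normalized SUL.

    SUV_bw = C / (dose / weight) and SUL = C / (dose / LBM),
    so SUL = SUV_bw * LBM / weight.
    """
    if body_weight_kg <= 0 or lean_body_mass_kg <= 0:
        raise ValueError("body weight and lean body mass must be positive")
    return suv_bw * lean_body_mass_kg / body_weight_kg


# Hypothetical patient: 80 kg body weight, 60 kg estimated lean body mass.
print(suv_bw_to_sul(8.0, 80.0, 60.0))  # 6.0
```

The smaller the difference between body weight and lean body mass, the closer the SUL is to the body-weight SUV, which is consistent with the similar absolute values noted above for non-obese patients.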

PERCIST uses a change in FDG uptake by 30% as a criterion for tumor response and progression. This threshold is supported by a series of studies summarized by a meta-analysis [12] which have assessed the test-retest reproducibility or reliability since the publication of EORTC criteria. These studies have shown a slightly larger variability of SUV measurements than the single study that was available at the time of the publication of the EORTC criteria. A higher variability of SUV measurements was mostly found in larger multicenter studies [13–15]. Therefore, a larger change in SUV (30%) is required for a response according to PERCIST as compared to the EORTC criteria.

More important than the minor change of the threshold for a tumor response is that PERCIST also requires a minimum absolute difference of 0.8 SUL. This avoids calling minor absolute changes in tumor FDG uptake a response. For the same reason, PERCIST also requires a minimum tumor SUL in order to consider a lesion measurable for response (1.5 times the mean SUL of the liver).
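The interplay of the relative and absolute thresholds can be illustrated with a short sketch. It uses only the numbers quoted in the text (30%, 0.8 SUL, 1.5 times mean liver SUL) and is a deliberate simplification: complete metabolic response, new lesions, and the exact handling of borderline values are not modeled, and the function names are hypothetical.

```python
def is_measurable(tumor_sul_peak: float, liver_sul_mean: float) -> bool:
    """A lesion counts as measurable if its SUL is at least 1.5 times the
    mean liver SUL (the simplified rule quoted in the text)."""
    return tumor_sul_peak >= 1.5 * liver_sul_mean


def classify_change(baseline_sul: float, followup_sul: float,
                    rel_threshold: float = 0.30,
                    abs_threshold: float = 0.8) -> str:
    """Classify a change in SULpeak using only the two thresholds discussed
    above. Complete metabolic response and new lesions are NOT modeled."""
    absolute = followup_sul - baseline_sul
    relative = absolute / baseline_sul
    if relative <= -rel_threshold and absolute <= -abs_threshold:
        return "PMR"
    if relative >= rel_threshold and absolute >= abs_threshold:
        return "PMD"
    return "SMD"


# Hypothetical values: a 35% decrease that is too small in absolute terms.
print(classify_change(2.0, 1.3))   # SMD: only 0.7 SUL lower
print(classify_change(10.0, 6.0))  # PMR: 40% and 4.0 SUL lower
```

The first example shows why the absolute threshold matters: in a lesion with low baseline uptake, a large relative change can correspond to a difference that is within the range of measurement noise.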

Furthermore, the way that lesions are selected for response assessment is fundamentally different from the EORTC criteria. EORTC criteria require the selection of target lesion(s) on the scan before therapy. The same lesions are then re-identified on the following scan and their change in FDG uptake or size is recorded. For PERCIST, a quite different approach is used. The baseline and the follow-up scan are reviewed and the lesion with the highest FDG uptake on each of these two scans is identified. Importantly, this is not necessarily the same lesion. Then, the change in SULpeak is calculated and used to assess response (Fig. 1). This new approach eliminates the variability in selecting target lesions and significantly simplifies response assessment as only two measurements are compared. In contrast, WHO criteria and RECIST 1.0 require measurements for up to 10 lesions, and RECIST 1.1 measurements for up to 5 lesions. PERCIST 1.0 recommends also evaluating the changes in the sum of up to five lesions as a secondary measure to assess response. For this approach, the SUL is determined for up to five tumors (up to two per organ) that present with the most intense 18F-FDG uptake and are typically the lesions identified on RECIST 1.1.
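The two ways of selecting lesions can be sketched as follows. The lesion names and SULpeak values are hypothetical, the limit of two lesions per organ is not enforced, and the assumption that the five hottest lesions are chosen independently on each scan is ours and is not spelled out in the text.

```python
from typing import Mapping


def hottest_lesion_change(baseline: Mapping[str, float],
                          followup: Mapping[str, float]) -> float:
    """Relative change between the hottest lesion on each scan; the two
    lesions need not be the same."""
    hottest_baseline = max(baseline.values())
    hottest_followup = max(followup.values())
    return (hottest_followup - hottest_baseline) / hottest_baseline


def sum_of_hottest_change(baseline: Mapping[str, float],
                          followup: Mapping[str, float], n: int = 5) -> float:
    """Relative change of the summed SULpeak of up to n hottest lesions,
    selected independently on each scan (an assumption of this sketch)."""
    sum_baseline = sum(sorted(baseline.values(), reverse=True)[:n])
    sum_followup = sum(sorted(followup.values(), reverse=True)[:n])
    return (sum_followup - sum_baseline) / sum_baseline


# Hypothetical SULpeak values for two lesions on both scans.
baseline_scan = {"breast primary": 8.0, "lymph node": 5.0}
followup_scan = {"breast primary": 2.0, "lymph node": 4.0}

print(f"{hottest_lesion_change(baseline_scan, followup_scan):.0%}")   # -50%
print(f"{sum_of_hottest_change(baseline_scan, followup_scan):.0%}")   # -54%
```

In this example the hottest lesion changes from the primary tumor at baseline to the lymph node at follow-up, so the single-lesion result compares two different lesions.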

Fig. 1

Lesion selection for response assessment by PERCIST. Images show the maximum intensity projection (MIP) of FDG PET/CT images of a woman with right breast cancer and contralateral lymph node metastases. On the baseline images (a), the primary tumor shows the highest FDG uptake and is used as the target lesion (red arrow). On the follow-up study (b), the highest FDG uptake is seen in the contralateral lymph node metastasis (red arrow), which is the target lesion for the follow-up scan. The change in tumor FDG uptake is calculated as [SULpeak(lymph node) − SULpeak(breast)]/SULpeak(breast)

Response assessment may obviously be different for these two approaches, as illustrated in Fig. 2. However, initial analyses in patients with breast cancer have indicated that the differences in response assessment for a one-lesion and five-lesion analysis are quite small [16], but future studies in other tumors are clearly needed.

Fig. 2

Differences in response assessment depending on the number of lesions analyzed. The liver metastasis (red arrow) is the lesion with the highest uptake on the baseline (a) and follow-up scan (b). It showed no significant change in FDG uptake: SULpeak 11.4 and 9.9, respectively (−12%). However, if 5 lesions are selected as target lesions (green arrows), the sum of these 5 lesions decreases by 32%

Clinical experience with EORTC criteria and PERCIST

Since 2009, many studies have used PERCIST to monitor tumor response by FDG PET/CT. As of January 2017, there are 907 citations of the original PERCIST paper on Thomson Reuters Web of Science, putting this paper in the 0.05% category of the most cited articles in this database for 2009. The EORTC PET criteria, which were published 10 years earlier, were also very influential and have been cited 891 times.

Several recent studies have compared response assessment by EORTC PET criteria and PERCIST [17–26]. A meta-analysis of 6 studies including 348 patients concluded that EORTC criteria and PERCIST “showed almost perfect concordance” with a kappa of 0.946 [27]. The close agreement between the two response criteria was also confirmed by four studies published after the meta-analysis (Table 2). A higher rate of discordant response classification occurred when PET was performed “very early” after start of therapy. At this time point, there are some tumors that have already responded by EORTC criteria (15% decrease in FDG uptake), whereas they are non-responders by PERCIST, which requires a larger (≥30%) decrease in FDG uptake. For example, in the study by Ho et al. [19], response rate after 14 days of oral erlotinib (EGFR kinase inhibitor) therapy was 47% by EORTC criteria, but only 30% by PERCIST. However, overall, the current literature (Table 2) indicates that only in about 10% of the studies is response classified differently by EORTC criteria and PERCIST.
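For readers less familiar with the kappa statistic, the following sketch shows how Cohen's kappa corrects the observed agreement between two classifications for the agreement expected by chance. The six patient classifications are hypothetical, and the exact variant of kappa used in the cited meta-analysis is not stated here; the unweighted form is assumed.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(ratings_a: Sequence[str], ratings_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two classifications of the same patients."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of patients with identical categories.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications of six patients by the two sets of criteria.
eortc = ["PMR", "PMR", "SMD", "PMD", "PMR", "SMD"]
percist = ["PMR", "SMD", "SMD", "PMD", "PMR", "SMD"]
print(f"kappa = {cohens_kappa(eortc, percist):.3f}")  # kappa = 0.739
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, which puts the reported value of 0.946 close to the upper end of the scale.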

Table 2 Overview of studies comparing response assessment by EORTC PET criteria and PERCIST

Importantly, PERCIST can decrease the inter-observer variability of response assessment when compared to a purely visual evaluation of tumor response. In a study by Fledelius et al. [28], 8 observers evaluated tumor response in 35 patients with NSCLC undergoing neoadjuvant chemotherapy. Tumor response was assessed by PERCIST and visual criteria proposed by Hicks et al. [29]. Both response classification systems defined four response categories, CR, PR, SD, and PD. Using PERCIST, there was complete agreement between all 8 observers for 22 patients, whereas complete agreement with visual criteria was only observed for 10 patients [28].

Limitations of RECIST and opportunities for FDG PET-based response assessment in clinical trials

While response by RECIST is very commonly used in clinical trials, its limitations for oncologic drug development are well-recognized. Response rate by RECIST is the endpoint of about 70% of oncological phase II studies [30, 31] and RECIST is also critical for alternative endpoints, such as time to progression, which is also to a large extent defined by follow-up imaging. However, less than 50% of the studies that meet their (response) endpoint in a phase II clinical trial are successful in phase III clinical trials [30, 32]. Conservative estimates for the cost of one randomized controlled trial are about 60 million US dollars [30, 32]. Therefore, the high failure rate of phase III clinical trials contributes to the significant concerns about the sustainability of the current drug development process, as costs for new combinations of immunotherapies are estimated to be more than 1 million dollars per patient [33].

The limitations of RECIST can be illustrated by the data from a meta-analysis of randomized trials evaluating dose-intensified versus standard dose chemotherapy in metastatic breast cancer patients [34]. This analysis included 10 randomized trials including a total of 2126 patients. As shown in Fig. 3a, tumor response was clearly a significant prognostic factor, but the survival differences between responders and non-responders were not large. For example, at 24 months, 50% of the responders to chemotherapy had died, while about 25% of the non-responders were still alive, indicating that a subgroup of 25% of the patients without a response did better than 50% of the responders. Even more importantly, tumor response rates in a clinical trial were not well-correlated with patient survival, as shown in Fig. 3b. This graph shows the difference in response rates between the dose-intense (experimental) and the standard dose arm on the x-axis, and the difference in survival on the y-axis. Positive values indicate an improvement of survival by the experimental treatment. Each circle represents the results of one randomized trial and the diameter of the circle indicates the size of the trial. The experimental therapies consistently improved response rates, with all studies showing a larger response rate with the experimental therapy. However, the effect on survival was much less clear. For example, there are several studies which showed no change in survival (arrows), but improved the response rate by up to almost 20 percentage points (Fig. 3b). Because of this rather weak correlation between response and survival, there is a lively ongoing discussion about whether tumor response according to RECIST should be used as an endpoint for drug approval in breast cancer [35, 36].

Fig. 3

Prediction of patient survival by tumor response. Survival of patients with metastatic breast cancer undergoing dose-intensified (experimental group) or standard chemotherapy (control) from a meta-analysis of 10 randomized trials including a total of 2126 patients. On a patient level (a), survival is clearly better for responders (R) than for non-responders (NR). However, on a trial level (b), there is only a weak correlation between the improvement in response rate (shown on the x-axis) and the improvement in survival (shown on the y-axis). Figure 3 is taken from Bruzzi et al. [34], edited. Arrows indicate studies that showed no improved survival despite an improvement in response rate

There are many potential reasons for the poor correlation between tumor response according to RECIST and survival. For example, some tumors may shrink significantly during chemotherapy, but recur quickly, whereas others may respond slowly (or not at all), but also grow slowly. The most drastic examples are perhaps neuroendocrine tumors. Small cell lung cancer (a high-grade neuroendocrine tumor) responds extremely well to chemotherapy, but recurs very rapidly, resulting in a dismal prognosis. On the other hand, well-differentiated neuroendocrine tumors respond only poorly (if at all) to chemotherapy/targeted therapies, but the prognosis of patients is nevertheless much better than for small cell carcinomas. The example of small cell carcinoma and well-differentiated neuroendocrine tumors was selected to illustrate the principal problem of treatment responses in a heterogeneous patient population. In clinical trials, these two tumor types would, of course, not receive the same treatment. However, (less dramatic) discrepancies between survival and responsiveness to chemotherapy are also known within one tumor entity. For example, a histopathologic complete response to chemotherapy is achieved significantly more often in patients with triple-negative breast cancer than in patients with estrogen receptor-positive breast cancer. Nevertheless, survival of patients with triple-negative breast cancer is significantly worse than that of patients with estrogen receptor-positive tumors [37].

These are fundamental challenges of interpreting tumor response to therapy in a heterogeneous patient population, and have been shown to affect also histopathologic response assessment [37, 38]. While similar systematic analyses are lacking for treatment monitoring with FDG PET/CT, it is likely that the correlation between response assessment and patient survival will be similarly confounded by inter-patient tumor heterogeneity. However, FDG PET has the potential to overcome other limitations of response assessment by RECIST. FDG PET/CT is more sensitive and specific for detection of metastatic disease than CT alone. Therefore, FDG PET/CT is likely more accurate in identifying disease progression and differentiating disease progression from stable disease. For example, FDG PET/CT is clearly more sensitive and specific than CT for detection of osseous metastases and should, therefore, detect progression in the skeleton more accurately than CT [39].

Conversely, CT is also limited in assessing tumor response in patients with osseous metastases. Sclerotic lesions typically persist and can even increase as a reaction of the healing bone (“flare phenomenon”) [40]. In contrast, osseous metastases generally show a decline in FDG uptake similar to that of soft tissue metastases. Thus, FDG PET/CT also has the potential to detect response with higher sensitivity and avoid a misdiagnosis of progression due to the flare phenomenon. Furthermore, response cannot be evaluated by RECIST in patients who present with only osseous metastases, whereas response monitoring with PERCIST is feasible for osseous metastases in the same way as for soft tissue metastases. Therefore, PERCIST can also decrease the number of patients in whom tumor response cannot be assessed by imaging.

A more fundamental problem that can potentially be overcome by PERCIST is the stable disease category of RECIST. The growth rate of untreated human tumors is frequently unknown and may vary over time. Randomized controlled trials with a control arm of patients receiving no active therapy have consistently shown a substantial fraction of patients who demonstrated stable disease in the control arm (Table 3). This is not only the case for tumors with a known slow growth rate and relatively favorable prognosis, such as neuroendocrine tumors, but also for tumors with a very poor prognosis. For example, in a randomized, placebo-controlled trial of gefitinib for treatment of chemotherapy-refractory NSCLC, 31% of the patients in the control arm were classified as showing stable disease (Table 3). This makes stable disease and the related “tumor control rate” a problematic endpoint for non-randomized clinical studies, because a significant fraction of patients may be considered as “benefiting from inhibition of tumor growth” while, in fact, the treatment did not affect the tumor at all.

Table 3 Frequency of stable disease in randomized controlled trials with best supportive care as the control arm

FDG uptake on PET is determined by the metabolic activity of the tumor tissue and has been shown to correlate with tumor cell proliferation in several tumor types. Therefore, tumor growth inhibition is more likely to result in a decrease in tumor FDG uptake than stable FDG uptake. The definition of PERCIST ensures that the false positive rate for response is very low (see discussion above). Therefore, FDG PET and PERCIST may be better-suited to detect true disease stabilization in single arm studies than CT.

These theoretical considerations suggest that there are several potential advantages of using PERCIST as compared to RECIST in clinical trials. So far, however, only relatively few and mostly small studies comparing RECIST and PERCIST have been published [17, 20–25, 46–61]. A meta-analysis published in 2016 identified 6 studies including 268 patients [62]. In these patients, the agreement between RECIST and PERCIST was only moderate (κ = 0.590). Discordant response classifications were found in 101, or 38%, of the patients. Overall, the response rate by PERCIST was significantly higher than by RECIST (54% vs. 35%). Table 4 provides an updated overview of studies comparing RECIST and PERCIST. In these 18 studies, the response rate by RECIST ranges from 0% to 92% and the rate of progressive disease from 0% to 66%. The therapies used included chemotherapy, hyperthermia, chemoradiation, 90Y-microspheres, and various targeted therapies (protein kinase inhibitors and antibodies). Treatment was performed in neoadjuvant and palliative settings. In this very heterogeneous group of patients, response rate by PERCIST is higher than response rate by RECIST in all studies, confirming the results of the prior meta-analysis. Results were more mixed for the PD category, with six studies demonstrating a higher PD rate with PERCIST and five showing a higher PD rate with RECIST. In the remaining studies, the PD rate was identical or was not reported.

Table 4 Overview of studies comparing RECIST and PERCIST in solid tumors

Only very few studies have compared the prognostic value of RECIST and PERCIST response classifications. Fendler et al. [52] showed in 73 patients with soft tissue sarcomas that PERCIST response is, overall, better-correlated with progression-free survival than RECIST response. Furthermore, PERCIST response identified two groups of patients with significantly different survival within patients classified as SD by RECIST. In a group of 44 patients with NSCLC treated by chemotherapy, Ding et al. [49] found that tumor response by PERCIST was a better predictor for progression-free survival than tumor response by RECIST. In a multivariate analysis including response by RECIST and PERCIST, only response by PERCIST was a significant prognostic factor. Yanagawa et al. [60] studied PERCIST and RECIST response in 51 patients with esophageal cancer undergoing neoadjuvant therapy. A complete metabolic response by PERCIST was a strong prognostic factor for overall and disease-free survival, whereas there was no significant correlation between tumor response according to RECIST and survival. Finally, Koshkin et al. [64] evaluated the prognostic value of early response assessment (day 9 of treatment) in patients with metastatic Ewing sarcoma treated with an IGF1 receptor antibody. A metabolic response at this very early time point was correlated with overall survival, with a median OS of 13.4 months as compared to 4.7 months in patients with metabolic disease progression. The prognostic value of tumor response by PERCIST was similar to response assessment by CT after 6 weeks of therapy (median overall survival of responders of 17 months as compared to 7.6 months for patients with progressive disease).

In summary, these studies confirm the significant potential of PERCIST to improve response assessment in clinical trials. However, more studies are needed that correlate tumor response according to PERCIST and RECIST with survival in order to demonstrate a superior prognostic value of PERCIST response classifications. These studies should ideally be performed as part of randomized controlled trials. This would allow not only assessment of the prognostic value of PERCIST, but also its predictive value for the effectiveness of a particular drug. The use of PERCIST in randomized clinical trials would allow for meta-analyses as shown in Fig. 3b, which could establish tumor response by PERCIST as a surrogate endpoint for clinical trials.

EORTC criteria and PERCIST versus disease and treatment specific response criteria

EORTC criteria and PERCIST can be applied to assess tumor response to therapy in all solid tumors. The criteria are based on the reproducibility of the FDG signal and some general considerations on how to define progression (appearance of new lesions, marked increase in lesion size). The definitions do not take into account the effect of specific therapies on tumor metabolism, the tumor type or the intent of treatment (curative or palliative). For RECIST, it has been recognized that the correlation between response and outcome is worse for some tumor types than others, which has prompted the development of several “modified response criteria”. These follow the general principles of RECIST, but make adjustments for specific tumor types and/or therapies. Examples include the Choi criteria for GIST tumors treated with imatinib [65], EASL and mRECIST for hepatocellular carcinoma [66], and the RANO criteria for primary and metastatic brain tumors [67].

Because of the rapid development of immunotherapy, there is now a particular interest in criteria that consider the specific type of responses seen in clinical trials of immune-modulating agents [68]. In these trials, tumor shrinkage in eventually responding patients occurred later than for chemotherapy or targeted therapies. In some patients, there was even a temporary increase in tumor size. Immune-related response criteria (irRC) were published in 2009 to address this issue and to decrease the frequency of this “pseudo-progression”. Wong et al. provide more detail on irRC in this supplement. There are currently not enough data to decide if a modification of PERCIST is necessary for monitoring tumor response to immunotherapy with FDG PET/CT and what such an “irPERCIST” would entail. In this context, there has recently been a refinement of the Lugano classification of lymphoma response criteria with respect to immunomodulatory therapy [69]. In a workshop, the Lymphoma Research Foundation, in partnership with the Cancer Research Institute, focused on the development of response guidelines for lymphomas in the setting of immunomodulatory agents, particularly checkpoint inhibitors. A provisional modification of the Lugano criteria adapted to immune-based therapy, the lymphoma response to immunomodulatory therapy criteria (LYRIC), was proposed. LYRIC keeps the core concepts of irRC and incorporates them into lymphoma-specific response criteria. A new response category, termed indeterminate response (IR), is introduced. IR does not indicate a specific underlying mechanism and recognizes that a delayed response and an immune-mediated flare can both occur in the early treatment period and, thus, may be difficult to distinguish from progression by imaging. Classification as IR provides the flexibility to have patients continue with treatment past IR in some circumstances, with a mandatory subsequent evaluation within 12 weeks to confirm or refute true PD [69].

While it is easy to hypothesize that the inflammatory response caused by immunotherapy may confound the ability of FDG PET to identify patients with a favorable response to immunotherapy, systematic studies are needed to determine how often this occurs. For example, the cell death induced by effective immunotherapy may be so rapid that there is a marked decrease in tumor FDG uptake despite the immune response. Furthermore, different immunotherapies, for example CTLA-4 and PD-L1 antibodies, may very well have a different effect on tumor cells and immune cells. Therefore, the criteria may have to be treatment-specific. The recently presented data from a phase II study of the PD-L1 antibody atezolizumab in 138 patients with metastatic NSCLC demonstrated a marked decrease in FDG uptake at 6 weeks after start of therapy in responding tumors, quite similar to what has been observed with chemotherapy or targeted therapy. Response after radiographic progression was only seen in two patients [70].

Use of tumor response to guide clinical decisions (response adapted therapies)

In this review, we have so far discussed the use of response criteria to assess tumor response in a standardized way and to provide data on the activity of experimental therapies. However, tumor response also plays a very important role in the routine clinical care of cancer patients. In malignant lymphomas, response-adapted therapies have been developed with the goal to “de-escalate” chemotherapy and reduce long-term side effects [71]. Response-adapted therapies have also shown promise in some solid tumors, such as gastroesophageal cancer [72] and breast cancer [73]. A review of this subject is beyond the scope of this article. However, we would like to mention that the criteria to assess response-adapted therapies are not necessarily identical with the response criteria used for drug development. For response-adapted therapies, response criteria can and should be optimized to minimize the number of incorrect response assessments. For example, in some clinical scenarios, the goal of a therapy is cure or pathologic complete response, whereas in others, the goal is palliation. Some treatments, such as certain protein kinase inhibitors, have a direct effect on tumor glucose metabolism and cause very rapid changes in tumor FDG uptake in sensitive cells [74–76], whereas metabolic changes may be more gradual for therapies without such an effect. Optimal thresholds to define response on FDG PET/CT for response-adapted therapies will, therefore, likely be drug- and application-specific. EORTC criteria and PERCIST can provide guidance for the development of such treatment-specific criteria, but are not guaranteed to define the best approach for response-adapted therapies.

Conclusion

Criteria for assessing tumor response to therapy with FDG PET/CT, initially published in 1999, have come of age in 2017. The original EORTC PET criteria have formalized the concept of assessing tumor response in clinical trials by quantifying changes in FDG uptake. Clinical research during the last 18 years has demonstrated that this concept is valid and that metabolic changes during therapy are correlated with patient outcome. PERCIST has refined response assessment by FDG PET/CT and provides a much more detailed framework for lesion selection, region of interest definition, and response classification. A series of studies have shown that EORTC criteria and PERCIST provide very similar assessment of tumor response, but the use of PERCIST is preferable for clinical trials because PERCIST is a much more specific standard. Therefore, it is likely that PERCIST will replace EORTC criteria in the same way that RECIST has replaced the WHO criteria. There is some evidence that PERCIST is a better approach to assess tumor response than RECIST, but this still needs to be proven by systematic clinical studies. These studies should include randomized trials to determine if the response rate by PERCIST predicts the clinical benefit of new anti-cancer drugs better than the response rate by RECIST.