Introduction

Metastatic breast cancer is an incurable disease that can be palliated by systemic therapy. Imaging plays a pivotal role in determining whether to continue, change, or stop treatment. In drug development, tumor response and progression-free survival (PFS) are increasingly used as endpoints of clinical studies, because the impact of a drug on overall survival can be confounded by the effects of second- and third-line therapies.

Change in tumor size based on anatomic imaging is the current standard for monitoring tumor response and progression in breast cancer. Size-based response criteria are commonly used as endpoints of clinical trials [1]. Guidelines to standardize assessment of changes in tumor size have been developed and continuously refined over the last 30 years [2,3,4]. Nonetheless, some inherent limitations cannot be overcome when size criteria are used. Distinguishing viable from nonviable residual tumor tissue is often difficult and osseous metastases are, in general, non-measurable. Meta-analyses of clinical trials have shown only weak correlations between tumor response and overall survival in breast cancer and other malignancies [5].

18F–Fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is an imaging modality that combines morphologic and functional imaging. Measurements of glucose metabolism by FDG PET/CT have a high degree of repeatability [6] and show less inter-observer variability than measurements of tumor size [7]. PET/CT can differentiate an active tumor from post-therapeutic changes and assess metabolic activity in osseous metastases [8]. Changes in FDG uptake have demonstrated a high accuracy for predicting histopathologic response of breast cancer in the neoadjuvant setting [9,10,11] and appear superior when compared to size measurements alone [12]. Tumor response on PET/CT has also shown promise for assessing treatment response for metastatic breast cancer in the palliative setting [13].

A key limitation of most of these studies has been the lack of a standardized approach for assessing tumor response on PET/CT. PET Response Criteria in Solid Tumors (PERCIST) address this limitation and are increasingly used to monitor tumor response [14]. However, very few studies have compared response assessment by PERCIST with the current standard, Response Evaluation Criteria in Solid Tumors (RECIST), in breast cancer or other malignancies. Specifically, it is unknown whether response assessment by PET/CT is better correlated with progression-free survival (PFS) and disease-specific survival (DSS) than response assessment by contrast-enhanced computed tomography (CE-CT).

Thus, the goal of this study was to compare CE-CT to PET/CT for prediction of PFS and DSS in patients with stage IV breast cancer undergoing systemic therapy.

Materials and methods

This retrospective study was approved by the institutional review board and informed consent was waived.

Patient selection

Darwin, a database and web-based analytics platform at Memorial Sloan Kettering Cancer Center (MSK), was used to search for patients with metastatic breast cancer who received first- or second-line chemotherapy, targeted therapy, and/or hormone therapy as part of a clinical trial from 2007 to 2012. Patients were included in this study if a) CE-CT and FDG PET/CT were acquired both at baseline, within 28 days prior to initiation of therapy, and at follow-up, within 90 days after initiation of therapy; and b) disease was present at baseline as defined by RECIST 1.1 [4] or PERCIST [14]. Patients’ medical records were reviewed and the following were documented: age at treatment initiation, start date and type of systemic treatment, histologic tumor type, tumor receptor status, and date and cause of death (whether disease-specific or not) or date of last documented visit.

Imaging techniques

CE-CT

Multi-slice CE-CTs were acquired using a variety of 16- or 64-detector row CT scanners from the thoracic inlet to the pelvic floor with 120-kVp tube voltage and 120–200-mA tube current time product after intravenous injection of 150 mL of non-ionic contrast agent with a time delay typical of portal venous phase imaging.

All studies included standard transverse and reformatted coronal and sagittal images and were interpreted on picture archiving and communication system (PACS; Centricity, GE Healthcare) workstations.

FDG PET/CT

Patients fasted for at least 4 h prior to injection of the radiotracer 18F-FDG. If plasma glucose was less than 200 mg/dL, patients were injected intravenously with 12–15 mCi (444–555 MBq) of 18F-FDG. After a 60- to 90-min uptake period, scans were acquired from the skull base to the mid-thigh using a variety of PET/CT scanners, followed by a CT scan for attenuation correction. Analysis of the PET/CT images was performed with PET-VCAR AW Suite 2.2, Advantage Workstation, GE Healthcare.

Response evaluation

For both modalities, responses were categorized into four groups: complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD).

CE-CT response was determined by one board-certified radiologist (KPD) according to the RECIST 1.1 criteria as summarized in Table 1 [4]. The reader was aware of the clinical diagnosis but blinded to the results of the FDG PET/CT and the clinical follow-up information.

Table 1 Criteria for evaluation of response by CE-CT vs. PET/CT

FDG PET/CT response was determined by a different physician certified in both radiology and nuclear medicine (CCR). The reader was aware of the clinical diagnosis but blinded to clinical follow-up information. For all target lesions identified by the reader, peak standardized uptake values normalized to lean body mass (SULpeak) were determined [14] (PET-VCAR AW Suite 2.2, Advantage Workstation, GE Healthcare). In a separate analysis, we evaluated response classifications based on the clinically more commonly used maximum standardized uptake values (SUVmax) and compared these to response classifications based on SULpeak. Selection of lesions and response categorization was based on PERCIST as summarized in Table 1 [14].
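The categorization logic can be illustrated with a minimal sketch. It assumes the commonly published thresholds of RECIST 1.1 (at least 30% decrease in the sum of diameters for PR; at least 20% and 5 mm increase for PD) and PERCIST 1.0 (at least 30% and 0.8 SUL units change in SULpeak). It is a simplification and not a reproduction of Table 1: non-target lesions, nadir tracking, and the liver-background rules for CR are omitted, and the function and parameter names are hypothetical.

```python
def classify_recist(sum_baseline_mm, sum_followup_mm,
                    new_lesions=False, lesions_resolved=False):
    """Simplified RECIST 1.1 category from the sum of target-lesion diameters."""
    if new_lesions:
        return "PD"
    if lesions_resolved:
        return "CR"
    delta = sum_followup_mm - sum_baseline_mm
    pct = 100.0 * delta / sum_baseline_mm
    # Progression needs both a relative (>=20%) and an absolute (>=5 mm) increase.
    if pct >= 20.0 and delta >= 5.0:
        return "PD"
    if pct <= -30.0:
        return "PR"
    return "SD"


def classify_percist(sul_baseline, sul_followup,
                     new_lesions=False, uptake_resolved=False):
    """Simplified PERCIST category from SULpeak of the hottest lesion."""
    if new_lesions:
        return "PD"
    if uptake_resolved:
        return "CR"
    delta = sul_followup - sul_baseline
    pct = 100.0 * delta / sul_baseline
    # Both a relative (30%) and an absolute (0.8 SUL units) change are required.
    if pct <= -30.0 and delta <= -0.8:
        return "PR"
    if pct >= 30.0 and delta >= 0.8:
        return "PD"
    return "SD"


# SULpeak values taken from the Fig. 3 caption (2.1 before, 6.5 after treatment).
print(classify_percist(2.1, 6.5))
# Hypothetical diameter sums for illustration only.
print(classify_recist(50.0, 52.0))
```

Under these assumed thresholds, the SULpeak increase from 2.1 to 6.5 reported for the patient in Fig. 3 falls into the PD category, whereas a small change in the sum of diameters remains SD.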

For assessment of PFS and DSS, patients underwent clinical follow-up and CE-CT scans at 3-month intervals until progression, followed by routine follow-up until death. At the discretion of the treating physician, some patients were also followed with PET/CT scans.

Statistics

Statistical analysis was performed with Prism 6 (GraphPad, CA, USA) and STATA SE 12.1 (StataCorp, USA). The Pearson correlation coefficient between changes in SUVmax and changes in SULpeak was calculated and differences in response categories were assessed. Concordance between FDG PET/CT- and CE-CT-based response assessment was calculated using kappa statistics. To compare CE-CT and FDG PET/CT for prediction of outcome, patients were dichotomized as follows: in the first analysis, patients with CR and PR were considered responders and patients with SD and PD were considered non-responders. In clinical practice, however, therapy is typically continued as long as the patient shows no progression on anatomic imaging. Furthermore, clinical trials have also used disease control rate (patients with CR, PR, and SD) as an endpoint instead of or in addition to the objective response rate. Therefore, in a second analysis, disease control rate on CE-CT was also determined and compared with response on FDG PET/CT. Association of response to treatment with PFS and DSS was assessed using univariate and multivariate Cox proportional hazards regression models. The predictive accuracy of all Cox regression models was assessed and compared by calculating Gonen and Heller’s K concordance statistic [15]. The resulting index is a measure of the Cox regression model’s discriminative power to predict survival. Its values range between 1, implying perfect prediction, and 0.5, which indicates random guessing. Kaplan–Meier analysis was performed to compare responders with non-responders for each modality as well as disease control for CE-CT vs. response by PET/CT. Additionally, Kaplan–Meier analysis was performed to compare responders with non-responders by both modalities and patients with divergent results. The log-rank (Mantel–Cox) test was used to evaluate the differences between Kaplan–Meier curves. P values <0.05 were considered significant.
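For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines. This is an illustrative implementation only; the analyses in this study were run in Prism and STATA. The follow-up times and event indicators below are hypothetical and do not come from the study cohort.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (progression or death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
times = [3, 6, 6, 9, 12, 15]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their last follow-up but never lower the curve themselves, which is why the estimate steps down only at observed event times.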

Results

Sixty-five patients aged 29–85 years (mean age 54 years) were eligible for this study (Fig. 1). Of these 65, 2 patients were enrolled in 2007, 0 in 2008, 2 in 2009, and the remaining 61 from 2010 to 2012. Details about treatment, imaging times, patient follow-up, and cancer baseline characteristics are summarized in Table S1 and Table 2.

Fig. 1 Patient cohort

Table 2 Imaging intervals and patient follow-up

Changes in SULpeak and SUVmax were very closely correlated (r = 0.998). Consequently, the response classification (CR, PR, SD, or PD) was the same when SUVmax was used instead of SULpeak. Response categorizations as assessed by CE-CT and FDG PET/CT are summarized in Table 3. Overall, there was only fair to moderate agreement between response assessment by CE-CT and FDG PET/CT (kappa = 0.36, weighted kappa = 0.45). However, all 22 patients categorized as CR or PR by CE-CT were also categorized as CR or PR by PET/CT. In contrast, marked differences in response assessment were observed for the 43 patients categorized as SD or PD by CE-CT. Seventeen (40%) of these patients were classified as PR or CR by PET/CT (Table 3, Fig. 2). Of the 15 patients classified as PD by CE-CT, 3 were classified as CR by FDG PET/CT, 1 as SD, and 11 (73%) concordantly as PD. All three patients with discordant CR on PET/CT had new or progressive bone lesions on CE-CT. The 28 patients categorized as SD by CE-CT were classified by FDG PET/CT as CR in 6 cases, as PR in 8 cases, as PD in 6 cases (Fig. 3) and, concordantly, as SD in only 8 cases (29%). Of the 24 patients without osseous involvement, 9 (38%) had SD on CE-CT and 2 (8%) had SD on PET/CT. Only two of the nine patients without osseous involvement and SD by CE-CT were also classified as SD by FDG PET/CT (one was categorized as CR, two as PR, and four as PD; see Table 4 for details).

Table 3 Response categorizations as assessed by CE-CT and FDG PET/CT

Fig. 2 Eighty-six-year-old woman with metastatic breast cancer before and after first-line endocrine therapy combined with bevacizumab showing stable disease by CE-CT and complete response by PET/CT. CT images demonstrate several sclerotic foci in the spine, representing biopsy-proven osseous metastases (arrows in a), with no discernible change after treatment (c). Maximum-intensity projection PET/CT images show multiple FDG-avid metastases including the spine (short arrows), pelvis (dashed arrow), and ribs (dotted arrow) (b), with resolution after treatment initiation (d). Axial CT image of the pelvis in the same patient showing no metastases before (e) and after therapy (g). Corresponding axial PET/CT image showing multiple metastases before (f) and resolution of uptake after treatment (h). The patient is still alive with a follow-up of 47 months.

Fig. 3 Thirty-eight-year-old woman with metastatic breast cancer 10 days before (a, c, and e) and 80 days after (b, d, and f) initiation of dasatinib and paclitaxel as first-line combination therapy showing stable disease by contrast-enhanced CT and progression by FDG PET/CT. Axial CT images before (a) and after (b) treatment initiation show stable axillary nodal metastases (arrow). Axial FDG PET/CT images before (c) and after (d) treatment initiation show a marked increase in FDG avidity from an SULpeak of 2.1 to an SULpeak of 6.5. Mixed changes are seen on maximum-intensity projection PET images before (e) and after (f) treatment initiation. All nodes were stable on CT. The patient stayed on protocol as she did not meet criteria of progression by RECIST 1.1. Two months later, a new left axillary lymph node became palpable and the patient was switched to another treatment regimen.

Table 4 Disease baseline characteristics and types of treatment with subgroup analysis of the number of discrepant cases and number of cases with SD

Differences in response assessment between CE-CT and FDG PET/CT were most commonly observed in patients with osseous metastases (Table 4). However, PET/CT showed substantially fewer cases with SD in all subgroups of histology, receptor status, metastatic pattern, and treatment type (Table 4). This included 23 patients with only soft tissue metastases (9 cases of SD by CE-CT vs. 2 with PET/CT) and 7 patients with invasive lobular cancers (5 cases of SD by CE-CT vs. 1 with PET/CT).

The agreement between response assessment by CE-CT and FDG PET/CT remained moderate when patients were dichotomized as responders and non-responders (kappa = 0.51). Agreement did not improve when disease control on CE-CT was compared with response on FDG PET/CT (kappa = 0.41).
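The two dichotomized kappa values can be checked with a short worked example. The 2 x 2 counts below are reconstructed from the patient numbers given in the text of this section (Table 3 itself is not reproduced here), so the tables are an assumption about how the cells combine; the function name is hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table (rows: CE-CT, columns: PET/CT)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Responders (CR/PR) vs. non-responders (SD/PD) on both modalities:
# 22 CE-CT responders, all PET/CT responders; 43 CE-CT non-responders,
# of whom 17 responded on PET/CT.
responder_table = [[22, 0],
                   [17, 26]]

# Disease control on CE-CT (CR/PR/SD, n = 50) vs. response on PET/CT:
# 22 + 14 responders among the 50; 3 responders among the 15 with PD.
disease_control_table = [[36, 14],
                         [3, 12]]

print(round(cohen_kappa(responder_table), 2))        # 0.51
print(round(cohen_kappa(disease_control_table), 2))  # 0.41
```

Both tables have the same observed agreement (48 of 65 patients); kappa is lower for the disease control comparison because the chance-expected agreement is higher when 50 of 65 patients fall into one CE-CT category.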

Kaplan–Meier curves for PFS and DSS for FDG PET/CT and CE-CT for responders and non-responders are shown in Fig. 4a and b. One-year PFS for responders vs. non-responders by CE-CT was 59.1% vs. 27.2% (p = 0.0030) compared to 63.6% vs. 0% (p = 0.0001) by PET/CT. Four-year DSS for responders vs. non-responders by CE-CT was 49.6% vs. 37.5% (p = 0.1954) compared to 58.2% vs. 18.0% (p = 0.0001) by PET/CT. Kaplan–Meier curves for PFS and DSS for responders and non-responders by FDG PET/CT vs. disease control by CE-CT are shown in Fig. 4c and d. When disease control rate on CE-CT was analyzed, 1-year PFS for patients with and without disease control on CE-CT was 43.4% vs. 20.0% (p = 0.0118). Four-year DSS for patients with and without disease control on CE-CT was 47.0% vs. 26.7% (p = 0.0035).

Fig. 4 Progression-free survival (PFS, a) and disease-specific survival (DSS, b) by tumor response on PET/CT and CT. One-year PFS for responders vs. non-responders by CE-CT was 59.1% vs. 27.2% (p = 0.0030), compared to 63.6% vs. 0% (p < 0.0001) by PET/CT, respectively. Four-year DSS for responders vs. non-responders by CE-CT was 50% vs. 38% (p = 0.2) compared to 58% vs. 18% (p < 0.0001) by PET/CT, respectively. When disease control rate on CE-CT was analyzed and compared with the numbers for response by PET/CT as mentioned above, 1-year PFS (c) for patients with and without disease control on CE-CT was 43.4% vs. 20.0%, respectively (p = 0.0118). Four-year DSS (d) for patients with and without disease control on CE-CT was 47.0% vs. 26.7%, respectively (p = 0.0035).

Univariate Cox proportional hazards regression demonstrated that response on FDG PET/CT and response on CE-CT were both significantly associated with PFS. However, the hazard ratio (HR) for non-response on FDG PET/CT was four times higher than that for non-response on CE-CT (Table 5). The multivariate model retained only response on PET/CT as a significant and independent predictor of PFS. As a result, the C index of the multivariate model equaled that of the univariate FDG PET/CT model. No incremental value of CE-CT to FDG PET/CT was identified (Table 5).

Table 5 Cox proportional hazards regression results for PFS

Univariate Cox proportional hazards regression for prediction of DSS showed a significant association with disease control rate on CT and response on FDG PET/CT, whereas response on CE-CT was not significantly correlated with DSS. Both hazard ratios and predictive accuracy as estimated by the C index were higher using FDG PET/CT response as compared to disease control rate on CE-CT. In a multivariate predictive model, only FDG PET/CT response was an independent predictor of DSS (Table 6).
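To give an intuition for what a concordance index measures, the sketch below computes Harrell's C for a binary predictor. This is a swapped-in, simpler measure: the study itself used Gonen and Heller's K statistic, which is derived from the fitted Cox model rather than from pairwise comparison of observed survival times. The data and the function name are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    risk   : predicted risk score (here 1 = non-responder, 0 = responder)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk, earlier event
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count half
    return concordant / comparable


# Hypothetical example: two non-responders with early events,
# two responders with longer follow-up (one censored).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [1, 1, 0, 0]
c_index = harrell_c(times, events, risk)
print(round(c_index, 2))
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering of patients by outcome, which matches the interpretation of the index described in the Statistics section.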

Table 6 Cox proportional hazards regression results for DSS

Discussion

This first systematic comparison of RECIST and PERCIST for monitoring tumor response in patients with breast cancer indicates that metabolic responses on FDG PET/CT are frequently discordant from morphologic responses on CE-CT. While all patients classified as responders by RECIST were also classified as such by PERCIST, patients with SD or PD by RECIST were frequently classified differently by PERCIST. Because of the differences in these patient groups, response by PERCIST was a significantly stronger predictor for PFS and DSS than response or disease control by RECIST. Our data, therefore, suggest that treatment monitoring with FDG PET/CT could benefit patients with breast cancer in two ways: (i) by avoiding taking patients off treatment who are considered to have PD by CE-CT but who respond by PET/CT; and (ii) by changing therapy in patients who are considered stable on CE-CT but who show no response on FDG PET/CT. According to our data (Table 3), this would alter overall patient management in 25% of the patients.

The marked differences in response classification by FDG PET/CT and CE-CT were often due to changes in the metabolic activity and number of osseous metastases (Fig. 2). Bone metastases that respond to treatment can paradoxically appear to increase in size or number on CT, or present as apparently new lesions, because CT does not evaluate the bone marrow, but only the osteoblastic reaction in healing bone [16]. For this reason, RECIST 1.1 criteria specify that bone lesions without soft tissue components are non-measurable [4]. This limitation is problematic in patients with breast cancer because bone is often the dominant site of metastatic disease [17]. In contrast, FDG avidity reflects tumor viability, and can differentiate between tumor progression and healing bone [18, 19]. Additionally, FDG PET/CT is more sensitive than CT for the detection of osseous metastases [20]. Therefore, disease progression is detected earlier by FDG PET/CT than by CE-CT. While this principal advantage of metabolic imaging is well-known, to our knowledge, only a very limited number of studies have investigated the impact of better assessment of osseous disease on prediction of patient outcome [21].

To overcome the low sensitivity of CE-CT for osseous metastases, many clinical trials require CE-CT plus bone scan to evaluate response. However, the osteoblastic reaction of healing bone is well-known to initially increase radiotracer uptake on bone scans, which leads to false-positive findings, and there is no established quantitative analysis of bone scans in breast cancer patients. This renders bone scans less valuable than FDG PET/CT for assessment of tumor response [22,23,24].

In clinical trials and in clinical practice, therapy is typically continued in patients with controlled disease on CT, which includes patients with CR, PR, or SD. However, a limitation of this approach is that patients with slow-growing tumors may exhibit SD in the absence of a drug effect. In placebo-controlled trials, the SD rate in the placebo arm can be as high as 30%, even for aggressive malignancies such as metastatic non-small cell lung cancer [25]. Consequently, the beneficial effect of a drug can be markedly overestimated if SD is considered as a drug effect. In contrast, PR and CR by RECIST or PERCIST are extremely rare in the absence of active treatment. Therefore, it may be preferable to define the effectiveness of a therapy by response rather than by disease stabilization. This could potentially avoid an overestimation of the effectiveness of a drug in a clinical trial and reduce the side-effects and costs of ineffective therapies in clinical practice.

These theoretical considerations are supported by the findings of the present study, where response on PET/CT was also a better predictor for patient outcome than disease control on CE-CT. The disease control rate (CR + PR + SD) was 77% (50/65) in the current study (Table 3), whereas the response rate on PET/CT was only 62% (40/65). This difference occurred because 46% (13/28) of patients showing SD by CE-CT were non-responders according to FDG PET/CT. DSS of PERCIST non-responders was very similar to that of patients with RECIST PD, with even a tendency for a worse outcome for the non-responders on FDG PET/CT (Fig. 4d). This suggests that FDG PET/CT correctly identified a significant fraction of patients who were not benefiting from the selected therapies despite showing SD by RECIST (i.e., 50% of the 28 patients showing SD by CE-CT and 20% of the total patient population). There was also a smaller group of patients who progressed by CE-CT but were responders on PET/CT. These three patients demonstrated new osseous lesions on CT, but showed an excellent clinical outcome.

In patients with metastatic breast cancer, effective second- and third-line therapies can dilute the prognostic value of the initial response to therapy for overall survival (OS) and DSS. Therefore, it is encouraging that response by PERCIST was significantly correlated with DSS despite these confounding factors.

While these observations are encouraging for the clinical use of FDG PET/CT, the following limitations should be considered. The patients in our study received therapy as part of multiple protocols including cytotoxic, hormone, and targeted therapies, as well as a combination of these therapies. Furthermore, various breast cancer subtypes were included in the study. While this heterogeneity may be considered a limitation, we would argue that the strong prognostic value of tumor response on FDG PET/CT in this heterogeneous patient population supports the robustness of response assessment by PERCIST. This is supported by the data in Table 4, which show no obvious differences in the prognostic value of FDG PET/CT in various patient subgroups. While theoretically superior, it seems impractical to define different response criteria for specific therapies and different histological subtypes of breast cancer. For the same reason, RECIST is used (with very few exceptions) for all solid tumors and all therapies. We cannot make an argument for or against the value of CE-CT and PET/CT in case of hormone therapy only, since the number of cases with such treatments was too small (n = 4).

Further limitations of this study include the retrospective analysis and the relatively small number of patients. Thus, the reported findings need to be confirmed in a prospective, multicenter trial. However, we believe that our study provides reliable estimates of the prognostic value of FDG PET/CT to design and power such confirmatory trials, because only protocol patients were included in our analysis and, hence, standardized patient follow-up was performed and all outcome data were collected prospectively.

Tumor FDG uptake can be measured in various ways, including SULpeak and SUVmax, raising the question of which parameter is most suitable for response assessment. We have used SUVmax in a preliminary analysis of our data, which showed very similar differences in PFS and DSS between responders and non-responders [26]. In the present study, the response classification was unchanged when SUVmax was used instead of SULpeak for response assessment. This suggests that response assessment by PERCIST is not significantly affected by differences in SUV quantification, which is encouraging for its broader clinical use.

In conclusion, our study suggests that in patients with metastatic breast cancer, response assessment by FDG PET/CT may be superior to CE-CT, because tumor response on FDG PET/CT correlated significantly better with PFS and DSS than tumor response or tumor control on CE-CT. The differences in response assessment by FDG PET/CT and CE-CT could cause changes in patient management in 25% of the patients. Thus, monitoring tumor response to therapy with FDG PET/CT may not only improve the quality of clinical trials using response as an endpoint, but may also reduce the morbidity and costs of ineffective therapies in clinical practice.