Introduction

The past decade has witnessed impressive advances in the treatment of metastatic breast cancer (MBC). MBC remains incurable, but many patients now live for a median of more than 3 years from the initial diagnosis of metastatic disease [1, 2]. Contributors to the improvement in overall survival (OS) have been better supportive care and the approval of novel anticancer agents and targeted therapies. However, some of the drugs that have been approved by regulators such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) have not readily received reimbursement from public and private payers [3]. One contributing factor has been an improvement in progression-free survival (PFS) with a new drug, but without a concomitant increase in OS. The health policy impact of this phenomenon was demonstrated when the U.S. FDA removed the breast cancer indication from bevacizumab after two randomized trials failed to show an improvement in OS [4]. Given the high cost of the newer anticancer agents, it has been suggested that the oncology community needs to consider the value offered by a new drug, with a clinically meaningful OS benefit being a key component of the value proposition [5–7]. However, many randomized trials in solid tumors, including MBC, use PFS as the primary endpoint, even though PFS has not been fully validated as a surrogate for OS across several tumor types [8].

The FDA definition of PFS is the time from randomization to documented disease progression [i.e., greater than 20 % increase in tumor size, based on the response evaluation criteria in solid tumors (RECIST) 1.1] or death from any cause [9]. Demonstrating an OS benefit remains a challenge in cancer drug development. An important factor contributing to the difficulty in detecting an OS benefit is survival post progression (SPP) [8]. The longer the SPP, the harder it is to detect an OS benefit. In one simulation study, it was demonstrated that even when an OS benefit existed, it could not be statistically detected if the SPP was 12 months or longer [10]. The major events that mask potential OS benefits during the SPP period include mandated patient cross over into the experimental arm of the trial upon progression, secondary therapies, and heterogeneity in access to effective supportive care [11, 12]. As a result, a longer duration of SPP increases the opportunity for these and other factors to dilute any incremental survival benefit that may be associated with the new treatment under investigation.

Given these considerations, some investigators have argued that surrogate endpoints such as improvements in PFS should be accepted by regulatory agencies and payers because doing so would save drug development time and costs, and ultimately improve patient access to effective new drugs [11, 13]. The advantage of using PFS over OS is that secondary interventions cannot contaminate the former measurement. In addition, a patient cannot be crossed over into the experimental therapy until disease progression has occurred. Arguments against the use of PFS as a primary endpoint for drug approval and reimbursement are the potential for interobserver variability in measuring tumor shrinkage, and the fact that PFS is only a measure of drug effect during administration and is poorly correlated with survival and quality of life [8, 14].

The use of PFS as a surrogate for OS has been validated in metastatic colorectal cancer with both patient- and trial-level analyses [15–18]. As a result, the FDA and other regulatory agencies have accepted PFS as a surrogate endpoint for drug approval in metastatic colorectal cancer. However, uncertainty remains in MBC. Meta-analyses at both the patient and trial level have yielded conflicting results [11, 19, 20]. Furthermore, these studies did not evaluate all lines of MBC therapy, nor did they consider the impact of targeted therapies. To address this uncertainty and to test the hypothesis that PFS is a valid surrogate endpoint for OS in MBC randomized trials, a systematic review of the literature followed by a trial-level correlational analysis was conducted in MBC patients receiving anthracyclines, taxanes, and targeted therapies.

Methods

Systematic review of randomized trials

We searched PubMed/Medline, EMBASE, and the Cochrane Central Register of Controlled Trials for randomized controlled trials evaluating anthracyclines, taxanes, and targeted therapies in patients with MBC published between January 1, 1990 and August 1, 2015. Electronic searches of the major conference proceedings were also conducted. Validated filters for randomized clinical trials were used for EMBASE and Medline [21, 22].

There was no restriction on the line of therapy being tested in each study. Trials evaluating 1st-, 2nd-, and beyond 2nd-line therapy were considered. There was also no restriction on trials evaluating single-agent or combination therapy. The trial must have utilized a parallel group-randomized design with at least 65 MBC patients enrolled into each arm. At least one of the arms must have included an anthracycline, a taxane, or a targeted therapy. A measure of progression-free and OS outcomes data must also have been reported in each study arm. Trials that reported time to progression (TTP) or time to treatment failure (TTF) were considered. However, the exact definition used in the trial was documented for subsequent statistical adjustment. Trials that only reported hazard ratios (HR) for PFS and OS were also included. Trials evaluating hormonal therapies were not incorporated into the analysis because these agents are a different class of drugs with a unique mechanism of action.

Studies were selected on the basis of the predetermined criteria and agreed upon by two evaluators. Any disagreement on specific studies between the two evaluators was resolved through discussion. Once trials meeting the inclusion criteria were identified, the following data were extracted: sample size, year of publication, regions involved (e.g., North American, European, global), line of therapy being evaluated, chemotherapy regimen, dosage, duration of therapy, definition of primary and secondary endpoints, how tumor response was assessed (WHO vs. RECIST criteria), trial duration, median number of cycles delivered, if patient cross over was allowed, if the progression-free and OS outcomes were censored, definition of PFS and all relevant clinical outcomes such as median PFS, TTP, TTF, OS, and the associated HR. The extracted data were recorded into a database for the subsequent statistical analysis.

Statistical analysis

The two co-primary endpoints for evaluating the association between PFS and OS were the correlation between the HR for PFS (HRPFS) and the HR for OS (HROS), as well as the correlation between differences in the median PFS (Δ PFS) and OS (Δ OS) between the experimental and control arms of the trials. The association between PFS and OS is related to the prediction of the endpoint of interest (e.g., HROS or Δ OS) from the surrogate (e.g., HRPFS or Δ PFS). Hence, the stronger the correlation, the more valid the surrogate. The objective of the study was to assess the validity of using PFS as a surrogate endpoint for OS in patients with MBC. Across the trials that met the inclusion criteria, the association between PFS (HRPFS or Δ PFS) and OS (HROS or Δ OS) was initially measured using the Spearman rank correlation coefficient. This was then followed with a weighted multivariable regression analysis.
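The Spearman rank correlation described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study (which was run in Stata): the function names are hypothetical, and the hazard ratios are invented purely to show the calculation. Ranks are assigned with ties averaged, and the coefficient is the Pearson correlation of the two rank vectors.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical trial-level hazard ratios, invented for illustration only.
hr_pfs = [0.60, 0.75, 0.82, 0.90, 1.05]
hr_os = [0.85, 0.80, 0.95, 0.92, 1.10]
print(round(spearman_rho(hr_pfs, hr_os), 2))
```

Because the coefficient depends only on ranks, it is insensitive to the scale of the HRs and to a few extreme trials, which is one reason it is a reasonable first look before a regression model.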

In two separate analyses, weighted (on the total trial sample size) multivariable regression analysis was used to measure the association between PFS and OS. In the first analysis, HRPFS was the primary predictor variable and HROS was the dependent variable. In the second analysis, Δ PFS was the main predictor variable and Δ OS was the dependent variable. These approaches provided a measure of the model R² statistic, which is the proportion of variability in the dependent variable accounted for by the model. Whenever HRs for PFS and OS were not reported in a given trial, they were calculated using the following formulas: HROS = median OS in the experimental group/median OS in the control group; HRPFS = median PFS in the experimental group/median PFS in the control group.
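The two calculations in this paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's Stata code: the function names are hypothetical, the regression has a single predictor rather than the full multivariable model, and the trial data are invented. The ratio is computed in the direction given in the formulas above; note that under an exponential-survival assumption the hazard ratio is usually approximated by the inverse ratio (control median over experimental median), so the direction is worth checking against the source data.

```python
def hr_from_medians(median_experimental, median_control):
    """Ratio of medians, in the direction stated in the text above."""
    return median_experimental / median_control


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x.

    Returns (intercept, slope, r_squared), where each trial contributes
    in proportion to its weight (here, total sample size).
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum(
        wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y)
    )
    ss_tot = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical trials: difference in median PFS and OS (months) and sample size.
delta_pfs = [0.5, 1.2, 2.0, 3.1, 4.0]
delta_os = [0.2, 1.5, 1.1, 3.0, 2.9]
sample_size = [220, 310, 150, 480, 270]

intercept, slope, r_squared = weighted_linear_fit(delta_pfs, delta_os, sample_size)
print(f"slope={slope:.2f}, R^2={r_squared:.2f}")
```

Weighting by sample size lets larger trials, whose estimates are less noisy, pull the fitted line harder than small ones.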

Other independent variables considered in the regression models included line of therapy, combination versus single-agent therapy, year of trial publication, region where the study was conducted (U.S. vs. European vs. global), what the primary trial endpoint was (i.e., PFS, TTP, TTF or OS), if the PFS measurement in the trial was consistent with the current FDA definition, if the trial incorporated data censoring into the analysis, and if patient cross over was permitted from the control into the experimental arm. Normality in the distribution of the dependent variables was assessed through a comparison of means and medians as well as the application of the Skew test. The independent variables were retained in the final model through a backwards elimination process (p < 0.05 to retain). The models were also adjusted for clustering on the primary study citation in cases where trials had multiple experimental arms.
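The informal part of the normality check (comparing the mean with the median and looking at skewness) can be illustrated with a short sketch. This is an assumption-laden stand-in: it computes only the sample skewness coefficient and is not the formal skewness test available in Stata. The function name and the data are hypothetical.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness: third central moment over variance^(3/2)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical differences in median OS (months), invented for illustration.
delta_os = [-1.0, 0.2, 0.8, 1.1, 1.5, 2.0, 6.5]

# A mean well above the median, together with positive skewness,
# points to a right-skewed distribution.
print(mean(delta_os), median(delta_os), sample_skewness(delta_os))
```

A skewness near zero with a mean close to the median is consistent with an approximately symmetric distribution, although it does not by itself establish normality.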

The slope of the regression line of the final model provided an estimate of how much a risk reduction in PFS (i.e., via the HR) contributes to a decrease in the risk of death for patients who were randomized into the experimental arm of the trial. In the case of the model that used Δ PFS and Δ OS as the predictor and dependent variables, respectively, the final model coefficient estimated the incremental OS benefit per incremental month of PFS reported for the experimental arm of the trial. The stability of the base case results for each modeling analysis was then evaluated in a series of one-way sensitivity analyses. All statistical analyses were performed using Stata, release 14.0 (Stata Corp., College Station, Texas, USA).

Results

The systematic literature search identified 3167 relevant references consisting of 3119 records from the database search and 48 additional records from other sources. From this initial pool of references, 880 duplicates were discarded. Following the title and abstract review, 1528 studies were rejected for being out of scope. Of the remaining references subject to the full-text review, 759 were removed using the exclusion criteria. The final set of bibliographic records that fulfilled the eligibility criteria comprised 72 randomized trials (Appendix “List of studies included in the meta analysis”), which provided 84 trial comparative arms, with median sample sizes in the control and experimental arms being 149 and 144 patients, respectively. Figure 1 shows the flowchart of the study selection process.

Fig. 1 Flow diagram of study selection process

Trial characteristics are summarized in Table 1. The publication years spanned from 1991 to 2015, with a maximum of 11 publications in 2011. The majority of trials (n = 41) were conducted globally and 55 of 72 (76.4 %) evaluated new treatments in the first-line setting. The most common progression endpoint was TTP (n = 44); PFS and TTF were used in 33 and 7 of the trials, respectively. Overall, 44 studies used the FDA definition of PFS, which is based on the RECIST 1.1 criteria [9]. OS was reported in 78 of the 84 comparative arms, with 52 studies utilizing data censoring and 21 of 84 study arms allowing crossover to the experimental regimen upon disease progression.

Table 1 Description of studies included in the analysis

The univariate Spearman rank correlation coefficient suggested a modest association between HROS and HRPFS (Spearman’s rho = 0.46; p < 0.001) as well as between Δ OS and Δ PFS (Spearman’s rho = 0.52; p < 0.001). As illustrated by Fig. 2, there was a positive trend in the association, where a lower HRPFS between the experimental and control groups indicated a reduction in the HROS. An HROS below 1.0 between the experimental and control groups would suggest a reduction in the risk of death in the former group of patients. Similarly, a larger Δ PFS was positively correlated with a greater Δ OS, indicating an improvement in overall survival between the experimental and control groups (Fig. 3).

Fig. 2 Association between the HRPFS and the HROS for the experimental versus control groups. The Spearman rho coefficient was 0.46; p < 0.001

Fig. 3 Association between Δ PFS and Δ OS for the experimental versus control groups. The Spearman rho coefficient was 0.52; p < 0.001

The weighted multivariable regression modeling confirmed the findings of the univariate correlational analysis. Through the backwards elimination process, the final variables that were retained in the model correlating HROS with HRPFS were region where the trial was conducted and patient cross over into the experimental arm. Other potentially important variables such as line of therapy, type of therapy (i.e., chemotherapy and targeted therapy alone or in combination) and type of progression endpoint used in the trial were not retained in the final model. Overall, the model R² was 0.31, indicating that only 31 % of the variability in the HROS was accounted for by the three independent variables retained in the model. Therefore, there are other important variables that contributed to the observed variability in the HROS that was reported in randomized trials evaluating new drugs in MBC.

The model coefficient between HRPFS and HROS was statistically significant, indicating a positive association between the two variables: a reduction in HRPFS from an effective experimental therapy was associated with a reduced risk of death in MBC patients across all lines of therapy (Table 2). The findings also revealed that relative to trials conducted exclusively in Europe, global trials yielded a lower HROS by approximately 16 %. Stated differently, globally conducted trials were more likely to report an OS benefit compared to trials conducted in Europe. It is tempting to speculate that this finding may be related to a lower propensity to offer multiple lines of chemotherapy to patients from regions such as Latin America, Asia, and Southern Africa. The difference in reported HROS between European and North American trials was not statistically significant.

Table 2 Multivariable regression analysis on the association between the HR for OS and PFS

The allowance of patient cross over also had a statistically significant effect on the HROS. Trials that allowed cross over reported a 7.4 % reduction in the risk of death between the experimental and control groups compared to trials that did not allow cross over (Table 2). This finding is consistent with the expectation that cross over would only be offered in cases where the experimental agent under investigation appears to be highly effective.

The findings of the weighted multivariable regression analysis investigating the association between Δ PFS and Δ OS were consistent with the former evaluation. The model indicated that for every additional month of PFS, there would be a gain of 0.79 months in OS in the experimental group relative to the control group (Table 3). Other independent variables that were retained in the model by statistical means consisted of region, the allowance of patient cross over and line of therapy (1st- or 2nd- vs. ≥3rd-line trials). Globally conducted trials reported an OS gain of 2.5 months compared to trials conducted in Europe. Trials allowing cross over were associated with a 2.73-month increment in OS. Furthermore, trials evaluating new treatments in the 3rd-line setting and beyond reported a reduced OS benefit by approximately 3.1 months (p = 0.023). Overall, the model R² was 0.44, indicating that only 44 % of the variability in Δ OS was accounted for by the four independent variables retained in the final model (Table 3).
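As a worked illustration of the Δ PFS coefficient, and assuming all other covariates in the model (region, cross over, line of therapy) are held fixed, the reported slope of 0.79 would translate a hypothetical 3-month PFS gain into the expected OS gain below. The 3-month figure is an invented input, not a result from any included trial.

```latex
\Delta\mathrm{OS} \approx 0.79 \times \Delta\mathrm{PFS}
\quad\Rightarrow\quad
0.79 \times 3\ \text{months} = 2.37\ \text{months}
```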

Table 3 Multivariable regression analysis on the association between change in OS and change in PFS between the experimental and control group

Sensitivity analysis on the primary findings

A one-way sensitivity analysis was conducted to evaluate the stability of the primary results generated from both multivariate analyses. This involved restricting the analysis, in turn, to trials published within the last 12 years (i.e., from 2004 onward), 1st-line trials only, ≥2nd-line trials only, studies that used the FDA definition of PFS, studies that utilized data censoring, studies that allowed cross over, and studies conducted globally. Of the seven sensitivity analyses performed, the statistically significant association between HRPFS and HROS was retained in only three cases: ≥2nd-line trials, those that used the FDA definition of PFS, and those allowing cross over (Table 4). Of these, trials in the ≥2nd-line setting had the highest model R² at 0.55, and the model coefficient between HRPFS and HROS increased from 0.18 in the base case to 0.40 (p < 0.001). In contrast, the model coefficient for trials in the 1st-line setting dropped to 0.01 (p = 0.90), indicating that such trials are unlikely to ever yield a statistically significant HROS. Trials allowing cross over to truly efficacious new drugs were also more likely to yield an HROS in favor of the experimental treatment (Table 4).

Table 4 Summary of sensitivity analysis on the base case results

The same series of sensitivity analyses was also performed for the multivariate models evaluating Δ PFS and Δ OS. In contrast to the former series, the current series revealed that the significant association between Δ PFS and Δ OS was maintained in 6 of the 7 analyses performed. The only case where the association was lost was when the analysis was limited to trials that used the FDA definition of PFS (Table 4). Taken together, these findings imply that Δ PFS may be a better surrogate for OS than HRPFS. However, as a measure of effect size, HRPFS is preferred because it considers the entire time horizon of the Kaplan–Meier survival curve. It was also interesting to note that limiting the analysis to trials that allowed cross over increased the model coefficient for Δ PFS from 0.79 in the base case analysis to 1.57, with the model R² increasing to 75 %. The finding that cross over trials yielded stronger and more consistent associations between improvements in PFS and OS suggests that trials allowing cross over are somehow different from those that do not.

Discussion

In order to increase the likelihood of a new drug receiving regulatory approval and eventual reimbursement, a statistically and clinically meaningful increment in OS relative to an accepted standard of care should be demonstrated [3, 7]. However, demonstrating an OS benefit is challenging in solid tumors, particularly in earlier line trials where multiple effective therapies and modern supportive care are available upon progression [10]. To avoid the contaminating effects of these subsequent therapies, drug developers have used surrogate endpoints of patient benefit such as PFS, TTP, and TTF. For a surrogate endpoint to be used as a measure for drug approval, there should be at least some evidence that supports its correlation to OS.

In disease sites such as metastatic colorectal cancer, improvements in PFS have been shown to be statistically correlated to improvements in OS in both trial-level and patient-level analyses [15, 16, 18]. However, studies evaluating PFS as a surrogate to OS in MBC have generated conflicting results and uncertainty remains [11]. In one report, Miksad and colleagues conducted a trial-level analysis to measure the association between the HRPFS and HROS in advanced-stage breast cancer patients who received anthracycline- or taxane-based chemotherapy [20]. The investigators found that HRPFS was a statistically significant predictor of HROS with up to 48 % of the variance accounted for [20]. In contrast to these findings, Burzykowski et al. conducted a patient-level analysis on 3953 patients from 11 randomized trials evaluating anthracyclines or taxanes in the first-line setting of MBC [19]. The investigators failed to find a statistically significant correlation between HRPFS and HROS. Burzykowski and colleagues concluded that PFS was not an acceptable surrogate endpoint in this treatment setting [19].

In the current study, a trial-level meta-analysis was conducted to measure the association between PFS and improvements in OS through two different endpoints: HRPFS and Δ PFS. The analysis used a weighted multivariate modelling approach, which allowed additional predictor variables to be evaluated. The findings indicated that both HRPFS and Δ PFS were modestly correlated with improvements in OS, with 31 and 44 % of the variability explained by the respective models. However, the sensitivity analysis indicated that when the analysis was limited to trials evaluating new treatments in the 2nd-line setting and beyond, the model coefficient for the PFS surrogate measures increased significantly, as did the model R². When the analysis was limited to trials in the first-line setting, the statistically significant correlations between the surrogate PFS measures and improvements in OS were lost, consistent with the findings of the patient-level analysis conducted by Burzykowski et al. [19]. These observations indicate that PFS can be a suitable surrogate for OS in MBC randomized trials evaluating new treatments in the 2nd-line setting and beyond. In the 1st-line setting, PFS as a primary trial endpoint is of limited clinical value and should be supplemented with meaningful patient-reported outcome measures such as improvements in performance status, symptom control, and weight gain [8].

There are a number of limitations in the current study that need to be acknowledged. All meta-analyses are affected by the quality of the studies analyzed. For that reason, we limited our review to published prospective randomized trials with sufficient sample size. However, publication bias remains an issue, and it must also be remembered that trial-level meta-analyses identify only associations between trial-level parameters and study outcomes. True causation can only be established with an analysis of patient-level data. The R² of the various multivariate models ranged from 31 to 75 %. Therefore, there are additional factors contributing to the variability in the PFS-OS relationship that were not accounted for. In 55 of the 84 eligible comparative trial arms, the HR for either PFS or OS was not reported. Hence, it had to be manually calculated using the reported medians. Such an approach may not reflect the true HR from a properly conducted survival analysis. In 3 of the accepted trials, we were also unsure if the current FDA definition of PFS was used. Lastly, variability in the evaluation of PFS between trials may also have impacted the observed differences in median PFS.

Despite these limitations, the findings of this correlative meta-analysis of prospective randomized trials were consistent with other trial-level analyses and indicate that improvements in PFS are correlated with increased OS. However, the effect appears to be driven by trials evaluating new drugs in the ≥2nd-line setting. Therefore, PFS can be a suitable surrogate for OS in MBC randomized trials evaluating new treatments in the 2nd-line setting and beyond. The use of PFS alone as a primary trial endpoint in the 1st-line setting is not recommended.