Introduction

Breast cancer is the most common type of cancer diagnosed in women, with an estimated incidence of 234,190 new cases and a mortality of 40,730 in the USA in 2015. In advanced stages of the disease, surgery does not provide substantial benefit and treatment mainly involves pharmacotherapy (American Cancer Society 2015). In the past 20 years, there have been significant improvements in the treatment of advanced breast cancer, with the development of novel chemotherapeutic agents, as well as targeted and endocrine therapies, which are still being actively developed (Kaufman et al. 2015; Thomas et al. 2007; Verma et al. 2012; Vogel et al. 2006).

Endpoints hold importance in drug approval processes for the accurate and efficient demonstration of drug efficacy and safety. In cancer treatment, the increasing availability of novel therapeutic agents has further complicated endpoint selection, with multiple lines of treatment and therapeutic combinations becoming possible. Furthermore, the debate regarding which outcome measure is most suitable for advanced solid tumor assessment remains open (Bruzzi et al. 2005; Burzykowski et al. 2008; Saad and Buyse 2012; Sargent and Hayes 2008).

Several studies have assessed endpoint selection in cancer cases using published randomized controlled trials (Booth and Eisenhauer 2012; Le Tourneau et al. 2009; Mathoulin-Pelissier et al. 2008; Saad et al. 2010). However, published trials only represent a portion of all trials conducted, with non-publication and delayed publication of negative results, as well as selective outcome reporting, all being current issues in the literature (Chan and Altman 2005; Chan et al. 2004; Huic et al. 2011; Saito and Gill 2014). Thus, for a comprehensive overview of clinical trial endpoint selection, we decided to use ClinicalTrials.gov as a data source to longitudinally track all registered clinical trials for a single disease over a decade.

ClinicalTrials.gov is a Web-based clinical trial registry created as a result of the Food and Drug Administration Modernization Act of 1997 (FDAMA). The ClinicalTrials.gov registration requirements were later expanded following the Food and Drug Administration Amendments Act of 2007 (FDAAA), which required more types of trials to be registered and additional trial registration information to be submitted. The registry currently contains more than 35,000 studies performed both in the US and in non-US settings. Prospective registration of clinical trials is now a standard practice, supported by an announcement made by the International Committee of Medical Journal Editors (ICMJE) in 2004 requiring trials to be registered in order to be considered for publication (DeAngelis et al. 2005). Of all the oncology trials registered in the registry, the highest number of trials was found to be conducted in breast cancer patients (Hirsch et al. 2013).

The aim of our study was to observe endpoint selection in advanced breast cancer from the clinical trial registration stage by using a clinical trial registry for a comprehensive overview. Furthermore, any shift in trends was assessed by following clinical trial registration of a single disease, advanced breast cancer, over a decade.

Methods

Search strategy

We searched ClinicalTrials.gov using the following search terms: breast AND (cancer or carcinoma) AND (metastatic or advanced), with “Interventional studies” as study type, “phase II” and “phase III” as phase, and date of first registration between October 1, 2000, and September 30, 2012. Clinical trials involving systemic anticancer therapy were included, while those involving neoadjuvant therapy, radiation therapy, surgery, high-dose chemotherapy/bone marrow transplantation, or supportive care drugs, such as bisphosphonates and denosumab, were excluded. As new requirements under the FDAAA were applicable to trials initiated after September 27, 2007, trials were divided into two intervals according to registration date: October 2000 to September 2007 (cohort A) and October 2007 to September 2012 (cohort B).
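As an illustration of the cohort split described above, the following minimal Python sketch assigns a trial to cohort A or B from its date of first registration. The function name `assign_cohort` and the assumption that each cohort runs from the first day of its opening month to the last day of its closing month are ours and are not part of the original analysis.

```python
from datetime import date
from typing import Optional

# Cohort boundaries as described in the search strategy; treating them as
# inclusive calendar dates is an assumption made for this sketch.
COHORT_A = (date(2000, 10, 1), date(2007, 9, 30))
COHORT_B = (date(2007, 10, 1), date(2012, 9, 30))


def assign_cohort(first_registered: date) -> Optional[str]:
    """Return 'A' or 'B', or None if the date is outside the study window."""
    if COHORT_A[0] <= first_registered <= COHORT_A[1]:
        return "A"
    if COHORT_B[0] <= first_registered <= COHORT_B[1]:
        return "B"
    return None
```

For example, a trial first registered on September 30, 2007 would fall in cohort A, and one registered the following day would fall in cohort B.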

Data collection

Two investigators independently extracted the data, and any discrepancy was resolved after discussion with a third reviewer. Extracted data included demographic information such as gender and age (according to ClinicalTrials.gov classifications), phase type (II or III), funding source (industry or non-industry), study allocation (randomized or non-randomized), intervention model (single or multiple arm), and outcome measures (primary and secondary outcome measures). Advanced breast cancer was defined as stage IIIB, inoperable stage IIIC, or stage IV. After the initial review, to accommodate the active update of information and to ensure data integrity, information on outcome selection was finally confirmed on April 20, 2015.

Definition of terms

For the purpose of our study, objective response rate (ORR) was defined as the percentage of patients who achieved a complete response (CR) or partial response (PR), while clinical benefit rate (CBR) was defined as the percentage of patients who achieved CR, PR, or stable disease (SD). Time to progression (TTP) was defined as time from randomization until progression without inclusion of deaths, while progression-free survival (PFS) includes deaths. Time to response (TTR) was defined as time from randomization to response, while duration of response (DoR) was defined as time from response to progression or death, whichever occurred first.
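The definitions above can be made concrete with a short Python sketch. It computes ORR and CBR from a list of best-response codes and encodes the difference between TTP and PFS as a rule for what counts as an event. The function names, the response codes, and the choice to treat death without progression as a non-event for TTP are illustrative assumptions, not the procedure used in any of the registered trials.

```python
from collections import Counter
from typing import Dict, Iterable


def response_rates(best_responses: Iterable[str]) -> Dict[str, float]:
    """Compute ORR and CBR (in percent) from best-response codes.

    Codes assumed here: CR, PR, SD, PD (progressive disease).
    """
    counts = Counter(code.upper() for code in best_responses)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no evaluable patients")
    orr = 100.0 * (counts["CR"] + counts["PR"]) / n
    cbr = 100.0 * (counts["CR"] + counts["PR"] + counts["SD"]) / n
    return {"ORR": orr, "CBR": cbr}


def is_event(endpoint: str, progressed: bool, died: bool) -> bool:
    """Return True if the patient counts as an event for the endpoint.

    PFS counts progression or death; TTP counts progression only
    (deaths without progression are not counted, as assumed here).
    """
    if endpoint == "PFS":
        return progressed or died
    if endpoint == "TTP":
        return progressed
    raise ValueError(f"unsupported endpoint: {endpoint}")


# Hypothetical cohort of 10 patients, for illustration only.
rates = response_rates(["CR", "PR", "PR", "SD", "SD", "SD", "PD", "PD", "PD", "PD"])
```

In this hypothetical cohort, 3 of 10 patients respond (ORR 30 %) and 6 of 10 achieve CR, PR, or SD (CBR 60 %). A patient who dies without documented progression is an event for PFS but not for TTP.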

Publication search

To compare the primary endpoints of published and unpublished trials, subsequent publications of trials were searched. The search was conducted using NCT numbers in PubMed and was also based on the availability of linked publications on ClinicalTrials.gov, either added voluntarily by the authors or automatically indexed by the ClinicalTrials.gov identifier. If no endpoints were stated as primary, the first endpoint described in the results section was considered the primary endpoint. Multiple publications of a single trial were all included in our study.
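Matching publications to registry entries relies on the ClinicalTrials.gov identifier, which consists of the letters NCT followed by eight digits. The sketch below shows one way such identifiers could be pulled from free text such as an abstract. The helper name `extract_nct_ids` and the example identifiers are placeholders, and this is not the search procedure used in the study.

```python
import re
from typing import List

# ClinicalTrials.gov identifiers: "NCT" followed by exactly eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def extract_nct_ids(text: str) -> List[str]:
    """Return the unique NCT identifiers found in text, sorted."""
    return sorted(set(NCT_PATTERN.findall(text)))


# Placeholder text; the identifiers are illustrative, not study data.
sample = "Registered as NCT00000102; see also NCT00000102 and NCT12345678."
ids = extract_nct_ids(sample)
```

Deduplicating the matches is useful because an abstract often repeats the same identifier, and several publications may refer to a single trial.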

Statistical analysis

Overall number and proportion (N, %) of primary and secondary endpoints were determined along with medians and ranges. Comparisons of overall characteristics and endpoints between categorical groups were carried out by the Chi-square test or Fisher’s exact test where appropriate. To analyze trends in endpoint selection, the Cochran–Armitage trend test was used. All tests were two-sided, and a P value < 0.05 was considered statistically significant. Statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
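For readers unfamiliar with the Cochran–Armitage trend test, the following Python sketch shows the usual asymptotic form of the statistic for a proportion observed across ordered groups (for example, registration years). It is an illustration under stated assumptions (equally spaced scores by default, normal approximation, no continuity correction), not a reproduction of the SAS procedure used in the study, and the counts in the example are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def cochran_armitage(
    successes: Sequence[int],
    totals: Sequence[int],
    scores: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in proportions.

    successes[i] of totals[i] trials in ordered group i chose the endpoint.
    """
    k = len(successes)
    if k != len(totals) or k < 2:
        raise ValueError("need at least two groups of matching length")
    if scores is None:
        scores = list(range(k))  # equally spaced scores 0, 1, 2, ...
    n_all = sum(totals)
    p_bar = sum(successes) / n_all  # pooled proportion under H0
    # Trend statistic: score-weighted deviations from expected counts.
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    var = p_bar * (1.0 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    if var <= 0:
        raise ValueError("variance is zero; trend test is undefined")
    z = t / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p
    return z, p_value


# Hypothetical counts: 10/50, 20/50, 30/50 across three ordered periods.
z_stat, p_val = cochran_armitage([10, 20, 30], [50, 50, 50])
```

A positive z indicates that the proportion rises with the group score and a negative z that it falls. With these hypothetical counts the two-sided P value is below 0.001.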

Results

Initially, 1265 phase II and 370 phase III trials were retrieved, 398 and 120 of which met our inclusion criteria, respectively (Fig. 1). The characteristics of the included studies are presented in Table 1. Most trials recruited only female participants (phase II: 65.8 %; phase III: 83.3 %) and included the adult/senior age group (phase II: 93.0 %; phase III: 90.8 %). Industry sponsored 42.7 % of phase II trials and 59.2 % of phase III trials. The majority of phase II trials were non-randomized (64.1 %) and single-armed (64.6 %), while most phase III trials were randomized (95.8 %) and two-armed (88.3 %).

Fig. 1 Search strategy and study selection

Table 1 Characteristics of advanced breast cancer clinical trials registered in ClinicalTrials.gov between October 2000 and September 2012

The median number of primary outcomes intended in both phase II and III trials was 1 (phase II range 1–5; phase III range 1–9), while that for secondary outcome measures was 4 in phase II (range 0–16) and 5 in phase III trials (range 0–13). Table 2 provides a list of the primary outcomes assessed in both phase II and phase III trials. In phase II trials, the most frequent primary endpoint was ORR in cohort A (60.6 %) and PFS in cohort B (40.7 %). The shift in trends was statistically significant with a decline in ORR selection (P < 0.001) and an increase in PFS selection (P < 0.001) (Fig. 2). For phase III trials, PFS was the most frequently used primary outcome in both cohort groups (cohort A: 35.9 %; cohort B: 66.1 %).

Table 2 Endpoints in clinical trials of advanced breast cancer by cohort

Fig. 2 Trend over time in the selection of ORR or PFS as a primary outcome measure in phase II advanced breast cancer trials. Significant change over time was observed (ORR P < 0.001; PFS P < 0.001). Error bars represent 95 % confidence intervals. ORR objective response rate; PFS progression-free survival

The primary outcomes of phase II and phase III clinical trials by intervention type are shown in Table 3. For phase II, the difference in the proportion of trials using ORR between different intervention types was significant. The proportion of chemotherapy-only intervention trials using ORR as primary outcome was relatively high (72.1 %). The frequency of PFS selection was significantly different between trials of different intervention types (P < 0.001), with a higher percentage in targeted and/or hormone therapy intervention trials. For phase III trials, a significant difference in PFS selection was observed (P < 0.05), where trials with targeted and/or hormone therapy as interventions used PFS more frequently.

Table 3 Top 5 primary endpoints in clinical trials of advanced breast cancer by intervention type

The comparison of primary endpoints reported in published and unpublished trials in ClinicalTrials.gov is shown in Table 4. In phase II trials, CBR was the only endpoint in which a significant difference was observed, and CBR was found to be more frequently reported in published than in unpublished trials. In phase III trials, PFS was found to be significantly different between the two groups, with higher reporting frequency in published than in unpublished trials.
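A comparison of this kind reduces to a 2 × 2 table (published or unpublished by endpoint used or not used). As a sketch of how such a table can be tested, the Python function below computes a two-sided Fisher’s exact P value by summing the hypergeometric probabilities of all tables that are no more likely than the observed one. The function name and the counts are hypothetical and do not reproduce the values in Table 4.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum probabilities of tables as or less likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only.
p_example = fisher_exact_two_sided(3, 1, 1, 3)
```

The exact test is preferable to the Chi-square test when expected cell counts are small, which is often the case for endpoints that are rarely selected.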

Table 4 Primary endpoints of published and unpublished clinical trials

Discussion

In this study, we examined the trends in outcome selection among phase II and phase III clinical trials in advanced breast cancer treatment. Guidelines on endpoint selection in clinical trials are available from regulatory authorities such as the US FDA and the European Medicines Agency (EMA), and certain endpoints are expected to be used more frequently as primary endpoints and others as secondary endpoints. Nevertheless, no studies have quantitatively analyzed endpoint selection and its changing trends in advanced breast cancer patients (U.S. Food and Drug Administration 2007; European Medicines Agency 2013). For phase II trials, ORR was the most frequently used primary outcome overall. However, an increasing trend in PFS use (cohort A, 17.6 %; cohort B, 40.7 %; P < 0.001) in place of ORR (cohort A, 60.6 %; cohort B, 39.0 %; P < 0.001) was observed.

In the early 1980s, ORR was the standard endpoint used to assess anticancer activity against advanced solid breast cancer tumors and was used by the FDA for the regular approval of drugs used to treat advanced breast cancer in postmenopausal women, such as anastrozole, letrozole, and fulvestrant (Bradbury and Seymour 2009; Johnson et al. 2003). However, in later years, the FDA required more direct evidence of a clinical benefit for drug approval (Bruzzi et al. 2005; U.S. Food and Drug Administration 2007). The use of early surrogate markers to assess new treatment lines requires caution to avoid misinterpretation of results. Unlike the old chemotherapeutic agents that had cytotoxic activity, newer targeted agents inhibit tumor growth rather than decreasing tumor size (Kummar et al. 2006; Stone et al. 2007; Yu and Holmgren 2007). Thus, ORR, which is defined as the percentage of patients whose tumor shrinks or disappears after treatment as determined by radiological tests or physical examination, may not be suitable (Dowlati and Fu 2008; George 2007; U.S. Food and Drug Administration 2007). ORR does not include SD, which could be the best response to cytostatic agents and may be as clinically relevant to survival as CR or PR (Michaelis and Ratain 2006; Robertson et al. 1997; Tuma 2006). Consequently, the EMA does not recommend the use of ORR as an endpoint for non-cytotoxic compounds (European Medicines Agency 2013). For SD assessment, TTP and PFS are preferred and represent useful surrogates in phase II trials (Michaelis and Ratain 2006; U.S. Food and Drug Administration 2007). Since a decrease in tumor size does not necessarily correlate with patient survival, and novel agents are more likely to inhibit growth rather than decrease tumor size, the transition of intended endpoint in phase II trials from ORR to PFS, and the relatively low use of ORR as primary endpoint in clinical trials involving targeted and/or hormone therapy can be explained.

PFS was also the most frequently used primary endpoint in phase III trials (cohort A: 35.9 %; cohort B: 66.1 %). Traditionally, overall survival (OS) has been the gold standard endpoint in oncology, especially in drug approval by the US and European regulatory agencies, and it is still unquestionably the most relevant clinical outcome. Unlike ORR and TTP, which are dependent on the evaluation method used and biased by the knowledge of the therapy received, OS is objective and highly representative of clinical benefit (Di Leo et al. 2003; U.S. Food and Drug Administration 2007). However, disadvantages such as randomization requirements, long follow-up and large sample size, as well as incompatibility with a crossover study design limit its use (Le Tourneau et al. 2009). These limitations, along with increasing pressure to develop newer and more effective treatment lines in a time-efficient and cost-effective manner, have highlighted the need for surrogate endpoints of survival. With PFS, the regimen attributable to outcome can be determined, making it especially desirable in advanced breast cancer where frequent rotations in treatment occur (U.S. Food and Drug Administration 2007). PFS also demonstrates clinical benefit in smaller sample sizes and shorter follow-up periods compared to OS, thus allowing for substantial cost reductions (Michaelis and Ratain 2006). However, while PFS has been validated as a surrogate for OS in first-line chemotherapy of advanced colorectal and ovarian cancer, its surrogacy is yet to be established in advanced breast cancer owing to inconsistent results (Burzykowski et al. 2008; Buyse et al. 2007; Miksad et al. 2008; Tang et al. 2007). In 2011, the FDA withdrew approval of the bevacizumab indication in breast cancer, which had obtained accelerated approval based on promising PFS results. The reason for this withdrawal was a failure of subsequent studies to demonstrate a clinically meaningful benefit that justified its associated life-threatening risks (U.S. Food and Drug Administration 2011). As PFS does not directly measure prolongation of life or improvement in quality of life, a higher PFS does not necessarily correlate with OS gain or clinical benefit (Saad and Buyse 2012; U.S. Food and Drug Administration 2007). Burzykowski et al. (2008) found that no endpoint, including tumor response, disease control, PFS, or TTP, was an appropriate surrogate for OS in metastatic breast cancer. Thus, when PFS or disease-free survival (DFS) is used as the primary endpoint, OS should be used as a supporting secondary endpoint and vice versa, which explains the frequent use of OS as a secondary endpoint in both phase II (43.2 %) and III trials (70.8 %) (European Medicines Agency 2013). Our quantification of endpoints also aligned with a previous study, which assessed the use of OS and PFS as a primary or secondary endpoint in publications of phase III trials in metastatic cancer, and found that PFS was more frequently used as a primary endpoint while OS was more frequently used as a secondary endpoint (Raphael and Verma 2015).

Factors other than intervention type could affect endpoint selection. Through multivariate logistic analyses (data not shown), intervention type, randomization, and cohort were found to be common factors influencing selection of PFS and ORR, which were the two main endpoints of interest in our study. This confirms that, even after accounting for other factors that could influence endpoint selection, time was a statistically significant factor and that endpoint selection has changed over time regardless of changing trends in trial design.

Prior studies have used published articles to assess endpoint selection in clinical trials. To comprehensively assess endpoint selection, we assessed endpoints in registered clinical trials, including unpublished trials. From this analysis, we found significant differences in the selection frequency of some endpoints between published and unpublished trials. This shows that trends in endpoint selection observed from the literature may differ from the actual endpoint selection of clinical trials because of the abundance of unpublished trials, as well as multiple publications of a single trial with significant results. Thus, inclusion of both published and unpublished trials differentiates our study from previous studies.

Although improvements in the quality of life (QoL) are an important marker of clinical benefits, measures of global health-related QoL (HRQoL) have not served as primary efficacy endpoints in oncology drug approvals (U.S. Food and Drug Administration 2007). A similar study on non-small cell lung cancer found that QoL was often used in the past, but its use is now limited (Ghimire et al. 2013). For QoL to be used as a primary endpoint to support drug approval, improvement in symptoms and reduction in drug toxicity need to be distinctive because apparent effectiveness could represent less toxicity rather than actual drug efficacy (U.S. Food and Drug Administration 2007). According to our study, QoL was rarely used as a primary endpoint in both phase II and phase III studies in both time periods. This could be due to several reasons. First, QoL may be considered more important for decision making in a clinical setting rather than for drug approval (Osoba 2011). In more than 60 % of advanced breast cancer studies, significant differences were not observed in QoL improvements between treatment groups, and thus, this may be one of the reasons why QoL assessments were rarely used as primary endpoints (Bottomley and Therasse 2002). Secondly, the implementation of patient-reported outcomes is expensive and time consuming (Joly et al. 2007; Osoba 2011). Additionally, using HRQoL assessments appears to be more useful in the area of supportive care, which was excluded in our study (Osoba 2011). Inclusion of only advanced stage trials could also be a reason for the limited use of this outcome measure, as several studies found poor compliance to HRQoL assessments in advanced breast cancer patients, resulting in incomplete data (Bottomley 2002; Bottomley and Therasse 2002).

Safety is always an important factor that needs to be considered in drug approval because the drug needs to have an acceptable safety profile as well as efficacy. Safety is especially important in the case of cytostatic drugs because patients treated with cytostatic therapies need to use the drug for a prolonged duration, and late-onset, chronic, or irreversible toxicity is more likely to occur (Schilsky 2002). The study results showed that 10.8 % of phase II and 7.5 % of phase III studies described safety as their primary outcome measure, while more than half of both phase II and phase III studies described it as a secondary endpoint.

In recent years, several efforts have been made to optimize the assessment of clinical benefit in order to provide patients with high-value care. Owing to the rapid development of new, expensive medications and other technologies, the consideration of value is becoming increasingly important. The European Society for Medical Oncology (ESMO) developed a validated tool, the ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS), to measure the magnitude of clinical benefit of new anticancer drugs and interventions without the bias of overestimation or overstatement. Such a tool enables the relative ranking of the anticipated magnitude of benefit of any new treatment (Cherny et al. 2015). In the USA, the American Society of Clinical Oncology (ASCO) has taken initiatives and proposed new thresholds for cancer drug approval for four conditions, which include breast cancer as well as other cancers, for evaluating the effectiveness of alternative cancer treatment options (Ellis et al. 2014; Schnipper et al. 2015). This initiative, the ASCO value framework, has been implemented in recent studies (Sanoff et al. 2016; Li et al. 2015). Since all of these actions were implemented after our study period, they were not considered in our study. However, because they may have significant impacts on the trend of endpoint selection in the future, investigators should consider them when deciding on endpoints.

Biomarkers can act as alternatives to the traditional endpoint of survival. The main examples of promising biomarkers include circulating tumor cells (CTCs) and free tumor DNA. CTCs are thought to play a role in metastasis and studies have shown that the number of CTCs can be a prognostic and predictive marker of overall survival (Nelson 2010; Cohen et al. 2008; Cristofanilli et al. 2004). Free tumor DNA may also be an ideal surrogate marker, as its levels have been associated with disease burden and progression (Schwarzenbach et al. 2011). Using biomarkers as surrogate endpoints in trials can accelerate the drug development process. Thus, although they were not included in our study findings because of a relatively small number of trials using them during our study period, it is clear that biomarkers can play a critical role in the drug development process in the future.

Using ClinicalTrials.gov registry data in the evaluation of trends in endpoint selection, we aimed to overcome the limitations of using only published data. However, the ClinicalTrials.gov registry has limitations of its own. Although ClinicalTrials.gov has procedures in place for quality assurance, such as an automated evaluation system and individual review before public posting, they are insufficient to ensure the quality of information entered by investigators (Becker et al. 2014). Studies have shown that trials lack clear descriptions of the primary endpoints (You et al. 2012; Zarin et al. 2011). Endpoints are often defined inconsistently, in particular TTP and PFS, which are often used interchangeably (Gourgou-Bourgade et al. 2015; Hudis et al. 2007; Mathoulin-Pelissier et al. 2008). In our study, although a shift from the use of TTP to PFS as the primary endpoint in phase III trials is apparent from the values reported, such results may not be meaningful owing to limitations in the data. The sum of the frequency of PFS and TTP was 68.7 % in cohort A, which was similar to that in cohort B (71.5 %). This implies that investigators may have shifted to using PFS instead of TTP for the same intended outcome measure. In support of this hypothesis, one clinical trial that had entered TTP as a primary endpoint, as observed at the start of our study, was found to have changed the term to PFS upon follow-up. Furthermore, during the progress of our study, we found a trial that had changed its study type from phase II to phase III 5 years after registration, which casts doubt on the validity of the information provided. Although the FDAAA has made submission of clinical trial information to the registry database mandatory, no guidelines are currently in place to control the quality of the information submitted, and the information can be freely edited at any point in time. Despite such possible limitations, studies have shown a fairly high completion of mandatory information, of which “outcomes” is one (Glas et al. 2014; Mathieu et al. 2009; Ross et al. 2009; Zarin et al. 2005, 2011). The use of ClinicalTrials.gov as a source of data for the purposes of this study should be more than sufficient in assessing overall trends in endpoint selection. As we have done with advanced breast cancer, application of a similar approach to other cancer types appears feasible for various purposes in future studies.

Conclusions

Our study assessed endpoint selection reporting in phase II and phase III clinical trials of advanced breast cancer, as it affects millions of women worldwide. For phase II trials, a decreasing trend for ORR and an increasing trend for PFS were observed. Despite ongoing debate over the use of PFS as an endpoint in advanced breast cancer trials, it was the most commonly used primary endpoint in phase III trials. To our knowledge, this is the first study to assess endpoint selection in advanced breast cancer clinical trials over a decade, and as the selection of appropriate endpoints is crucial for the success of clinical trials, changing trends should be considered when deciding upon primary and secondary outcome measures to demonstrate drug efficacy and safety.