Introduction and background

Data suggest that participation in clinical trials (CT) may improve outcomes in cancer patients, such as reduced morbidity and improved survival, compared with standard management outside of the CT setting [1,2,3]. Only a few analyses have involved breast cancer patients, and those that do combine breast cancer of various stages with other histologies and associated stages [4,5,6,7]. In addition, the few studies that concentrated on breast cancer, at any stage, focused solely on disease-free survival (DFS) and overall survival (OS) [8, 9]. These few studies examining the effect of CT participation on survival outcomes may be inherently biased because the extent of follow-up differs inside and outside of a CT. To date, no breast cancer studies have examined CT participation in the neoadjuvant setting.

Neoadjuvant treatment for early breast cancer, with cytotoxics, hormonal therapy, or both, has been extensively evaluated and has been an established standard of care since the 1990s [10,11,12,13]. Data show neoadjuvant systemic management of breast cancer leads to better outcomes such as minimized surgical management and improved DFS [14,15,16,17,18,19,20]. Patients with a pathological complete response (pCR) are most likely to see improved DFS and OS [21, 22]. Moreover, for patients without pCR, this “in vivo” evidence may be beneficial for future cancer management and treatment decisions.

Although neoadjuvant chemotherapy is a well-established approach to care in early breast cancer, there continues to be extensive clinical research in this setting. Many of these clinical trials are directed at reducing treatment-related morbidity and improving outcomes, as well as identifying potentially successful new systemic agents and biomarkers to be used in this setting [13, 23].

Therefore, it seems appropriate to investigate a well-documented endpoint that is accessible in the health record for all patients regardless of clinical trial participation, such as pCR after neoadjuvant chemotherapy.

We hypothesize that patients participating in CT have a higher pCR rate and a lower mastectomy rate than those not participating in CT. The primary aim of this retrospective cohort study is to determine whether CT participation status is associated with pCR after neoadjuvant therapy for breast cancer, taking into account commonly established predictors of pCR. An additional analysis addresses whether participation status in clinical trials is associated with reduced mastectomy rates after neoadjuvant therapy for breast cancer, taking into account established predictors of mastectomy. An exploratory aim is to assess the influence of trial participation on prognosis with regard to DFS and OS.

Materials and methods

Patient selection

Patients for this retrospective study were from the Erlangen–Nuremberg tumor registry region, which includes all patients with invasive breast cancer diagnosed from 1995 to 2014 (n = 8614). Patients were excluded in the following hierarchical order: patients treated before 2001, because clinical trials were systematically implemented in 2001 (n = 1813); patients not treated with neoadjuvant chemotherapy (n = 5226); patients with bilateral breast cancer at the time of diagnosis (n = 94); and patients with primary distant metastases (n = 92). Additionally, patients from studies not yet published, presented, and/or in study “follow-up” at the time of this analysis were excluded (n = 6). A further 345 patients were excluded because of missing information on pCR or at least one covariate, resulting in a final study population of 1038 patients (Fig. 1). All patients registered in the tumor registry gave written informed consent to be included in the registry, and the ethics committee of the medical faculty approved this retrospective study.

Fig. 1 Patient selection
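The hierarchical exclusions described above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the counts reported in the text; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts are taken from the patient selection paragraph; names are illustrative.
REGISTRY_TOTAL = 8614
EXCLUSIONS = [
    ("treated before 2001", 1813),
    ("no neoadjuvant chemotherapy", 5226),
    ("bilateral breast cancer at diagnosis", 94),
    ("primary distant metastases", 92),
    ("trial unpublished or still in follow-up", 6),
    ("missing pCR or at least one covariate", 345),
]


def cohort_flow(total, exclusions):
    """Return (step label, patients remaining) after each hierarchical exclusion."""
    remaining = total
    flow = []
    for label, n_excluded in exclusions:
        remaining -= n_excluded
        flow.append((label, remaining))
    return flow


flow = cohort_flow(REGISTRY_TOTAL, EXCLUSIONS)
for label, remaining in flow:
    print(f"after excluding '{label}': {remaining} patients remain")
final_n = flow[-1][1]
```

Applied in the stated order, the six exclusion steps leave the 1038 patients of the final study population.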

Data collection

Data are collected prospectively, as required by the certification process of the German Cancer Society and by the German Society for the Study of Breast Diseases [24]. Accordingly, each breast cancer case is prospectively documented, including patient and tumor characteristics, detailed treatment data, and epidemiological data. Treatments are independently abstracted for all patients and are administered in accordance with approved trial protocols and national guidelines to ensure objectively homogeneous treatment of breast cancer patients across several institutions. Follow-up treatments and disease characteristics are collected for up to 10 years after the primary diagnosis. All histological tumor data, such as tumor size, axillary lymph-node status, grading, estrogen receptor (ER) status, progesterone receptor (PR) status, and human epidermal growth factor receptor 2 (HER2) status, are documented. Additionally, comorbid conditions relevant to survival are routinely collected. For example, the center records renal function, cardiac disease, and diabetes mellitus characteristics. As part of the continuous certification process, the quality of the data is audited annually. Data obtained from the above-described collection and audit processes were used in the analyses presented here.

Definition of pCR and molecular subtypes

Pathological complete response (pCR) is defined as the complete eradication of invasive tumor cells from the breast and the lymph nodes (ypT0/is ypN0) after the chemotherapy at the time of the surgery, according to published and accepted criteria [22, 25]. Molecular subtype of the tumor is defined by hormone receptor (HR) status, HER2 status, and cellular proliferation rate (Ki-67). Luminal A-like tumors were ER or PR positive, HER2 negative, and had a low Ki-67 (≤ 14%); luminal B-like tumors were ER or PR positive, HER2 negative with high Ki-67 (> 14%); triple-negative (TN) breast cancers were required to be ER, PR, and HER2 negative. Finally, all HER2-positive tumors were considered to be included in a HER2-positive group, regardless of hormone receptor status or Ki-67.
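The subtype definitions above amount to a small decision rule. The sketch below is one possible Python encoding of it; the function name and argument names are hypothetical, and raising an error when Ki-67 is missing is an assumption made here for illustration, not a rule stated in the text.

```python
KI67_CUTOFF = 14  # percent; luminal A-like requires Ki-67 <= 14%, as stated in the text


def molecular_subtype(er_positive, pr_positive, her2_positive, ki67_percent=None):
    """Classify a tumor according to the definitions given in the text.

    HER2-positive tumors form one group regardless of hormone receptor
    status or Ki-67, so HER2 is checked first.
    """
    if her2_positive:
        return "HER2-positive"
    if er_positive or pr_positive:
        if ki67_percent is None:
            # Assumption for this sketch: Ki-67 is needed to split luminal A/B.
            raise ValueError("Ki-67 is required for HER2-negative, HR-positive tumors")
        return "luminal A-like" if ki67_percent <= KI67_CUTOFF else "luminal B-like"
    # ER negative, PR negative, and HER2 negative
    return "triple-negative"


print(molecular_subtype(True, False, False, 10))
print(molecular_subtype(True, True, False, 30))
print(molecular_subtype(False, False, False))
print(molecular_subtype(True, True, True, 5))
```

Checking HER2 first mirrors the statement that all HER2-positive tumors are grouped together irrespective of the other markers.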

ER and PR were assessed by immunohistochemistry (IHC) by individual institutional breast center pathologists and determined positive or negative according to the existing guidelines of the respective year of analysis. This was the same assessment used to determine hormone receptor status for the inclusion into clinical trials. HER2 was deemed positive when IHC staining indicated a 3+ result or the tumor was positive for fluorescence in situ hybridization (FISH) staining. FISH was performed systematically for all patients with IHC 2+ results for HER2.
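The HER2 rule described above can be sketched in the same way. The function below is a hypothetical helper, not the registry's implementation; returning `None` for an IHC 2+ result without a FISH result, and treating IHC 0 or 1+ without FISH as negative, are assumptions made for this illustration.

```python
def her2_positive(ihc_score, fish_positive=None):
    """Return True/False for HER2 status, or None if it cannot be determined.

    ihc_score: 0, 1, 2, or 3 (immunohistochemistry staining intensity).
    fish_positive: True, False, or None if FISH was not performed.
    """
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError("ihc_score must be 0, 1, 2, or 3")
    # Positive when IHC is 3+ or FISH is positive, as stated in the text.
    if ihc_score == 3 or fish_positive is True:
        return True
    # Assumption: an IHC 2+ result without FISH is left undetermined.
    if ihc_score == 2 and fish_positive is None:
        return None
    return False


print(her2_positive(3))
print(her2_positive(2, fish_positive=True))
print(her2_positive(2, fish_positive=False))
print(her2_positive(1))
```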

Studies conducted at the breast center

During the study period (2001–2015), a total of 11 neoadjuvant clinical trials were conducted at the breast center. As noted above, one trial was excluded from the analysis due to the “open” status of the study. The 10 studies whose participants were included in this analysis are listed below and in Table 1:

Table 1 List of clinical trials conducted in the observation period
  • A Randomized Phase III Trial Comparing Preoperative, Dose-Dense, Dose-Intensified Chemotherapy with Epirubicin, Paclitaxel, and CMF versus Standard-Dose Epirubicin-Cyclophosphamide Followed by Paclitaxel With or Without Darbepoetin Alfa in Primary Breast Cancer (Prepare [AGO]) [15, 16];

  • Preoperative Therapy with Epirubicin/Cyclophosphamide Followed by Paclitaxel/Trastuzumab and Postoperative Therapy with Trastuzumab in Patients with HER2-Over Expressed Breast Cancer (Techno[AGO]) [17];

  • A Randomized Phase III Study Exploring the Efficacy of Capecitabine Given Concomitantly or in Sequence to EC-Doc with or without Trastuzumab as Neoadjuvant Treatment of Primary Breast Cancer (GeparQuattro) [26,27,28];

  • A Randomized Phase II Biomarker Neoadjuvant Study of Sequential Doxorubicin Plus Cyclophosphamide (AC) Followed by Ixabepilone Compared to Sequential AC Followed by Paclitaxel in Women with Early Stage Breast Cancer (Epothilon CA163-100) [29, 30];

  • A Phase III Trials Program Exploring the Integration of Bevacizumab, Everolimus (RAD001) and Lapatinib into Current Neoadjuvant Chemotherapy Regimens For Primary Breast Cancer (GeparQuinto) [31,32,33,34,35,36];

  • A Randomized, Open-label, Multi-Center Study of Larotaxel at 90 mg/m² or Docetaxel Every 3 Weeks, Alone or in Combination with Trastuzumab According to Her2Neu Status, Administered After a Combination of Anthracycline and Cyclophosphamide as Pre-operative Therapy in Patients with High Risk Localized Breast Cancer (Satin) [37];

  • A Randomized Phase II Trial Investigating the Addition of Carboplatin to Neoadjuvant Therapy for Triple-Negative and Her2-Positive Early Breast Cancer (GeparSixto) [38];

  • Dual Blockade with Afatinib and Trastuzumab as Neoadjuvant Treatment for Patients with Locally Advanced or Operable Breast Cancer Receiving Taxane-Anthracycline Containing Chemotherapy (DAFNE) [39];

  • Pi3k Inhibition in Her2 Over-Expressing Breast Cancer: A Phase II, Randomized, Parallel Cohort, Two Stage, Double-Blind, Placebo-Controlled Study of Neoadjuvant Trastuzumab versus Trastuzumab + BKM120 in Combination with Weekly Paclitaxel in HER2-Positive, PIK3CA Wild-Type and PIK3CA Mutant Primary Breast Cancer (NeoPHOEBE) [40];

  • A Randomized, Multi-Center, Open-Label, Two-Arm, Phase III Neoadjuvant Study Evaluating Trastuzumab Emtansine Plus Pertuzumab Compared With Chemotherapy Plus Trastuzumab And Pertuzumab For Patients With Her2-Positive Breast Cancer (KRISTINE [Roche BO28408]) [41].

Statistical analysis

Patient and tumor characteristics of CT participants and non-participants are presented as means and standard deviations or frequencies and percentages.

The primary objective was to study whether CT participation is associated with pCR, taking into account the following predictors for pCR: age at diagnosis (continuous), body mass index (BMI, continuous), tumor size before chemotherapy (ordinal, cT1 to cT4), ER, PR, HER2 (each categorical; negative versus positive), grading (ordinal; G1 to G3), and year of diagnosis (continuous). “Year of diagnosis” was considered as a predictor because changes in neoadjuvant treatment might lead to varying pCR rates over the course of time.

For this purpose, a logistic regression model was fitted with pCR (“yes” versus “no”) as the outcome and the above-mentioned predictors (the basic model). Subsequently, an additional logistic regression model was fitted containing trial participation (categorical, “yes” versus “no”), the predictors of the basic model, and the interactions between trial participation and the other predictors (the interaction model). Both models were compared using the likelihood ratio test. A significant test result indicates that trial participation influenced pCR beyond the well-known predictors, either across all patients or at least within one of the subgroups defined by the considered predictors. In the case of a non-significant result, no further analyses were conducted in order to avoid false-positive results. If, however, the result was significant, the interaction model was compared with a reduced logistic regression model (the basic model with trial participation added but without the interaction terms; the reduced model), again using the likelihood ratio test. In the case of significance, subgroup-specific odds ratios (ORs) for trial participation adjusted for the other predictors were calculated, using the interaction model. In the case of a non-significant result, an adjusted overall OR for trial participation was calculated, using the reduced model.
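The two-step testing procedure can be summarized as a short decision rule. The sketch below only encodes the order of the decisions, given the p values of the two likelihood ratio tests; it does not fit any model. The function name and the returned labels are illustrative, and the 0.05 threshold is the significance level stated later in this section.

```python
ALPHA = 0.05  # two-sided significance level used throughout the analysis


def reporting_step(p_first, p_second=None, alpha=ALPHA):
    """Decide which odds ratios to report.

    p_first:  likelihood ratio test, basic model versus interaction model.
    p_second: likelihood ratio test, reduced model versus interaction model
              (only needed if the first test is significant).
    """
    if p_first >= alpha:
        # No evidence that trial participation adds information: stop here.
        return "no further analyses"
    if p_second is None:
        raise ValueError("second test is required after a significant first test")
    if p_second < alpha:
        # The effect of participation differs between subgroups.
        return "subgroup-specific ORs from the interaction model"
    return "overall adjusted OR from the reduced model"


# p values as reported in the Results for pCR.
print(reporting_step(0.03, 0.06))
```

With the p values reported for pCR (0.03 and 0.06), the rule leads to a single adjusted OR from the reduced model, which is the quantity presented in the Results.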

Patients with a missing outcome or missing predictor values were excluded from the analyses. Continuous predictors were used as natural cubic spline functions to describe non-linear effects [42]. The number of degrees of freedom (1–3) of each predictor was determined as done in Salmen et al. [43].

The performance of the logistic regression models, with regard to discrimination and calibration (“goodness of fit”), was assessed using the area under the receiver operating characteristic curve (AUC) and the Hosmer–Lemeshow statistic. The AUC ranges from 0.5 (no discrimination between patients with pCR and patients without pCR) to 1 (perfect discrimination). In accordance with Hosmer and Lemeshow, patients were ranked with respect to the predicted probability of pCR and categorized into equal-sized groups based on percentiles. Frequencies of predicted events in each group were compared with frequencies of observed events in each group using a scatter plot and the Hosmer–Lemeshow χ² test. A large p value indicates satisfactory calibration.
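Both performance measures can be computed directly from observed outcomes and predicted probabilities. The following is a minimal sketch in plain Python, not the R code used for the analysis. It assumes 10 risk groups and a χ² reference distribution with (groups − 2) degrees of freedom, which is the conventional choice but is not stated in the text; all function names are illustrative.

```python
import math


def auc(y_true, p_pred):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; ties count as one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Return the Hosmer-Lemeshow chi-square statistic and its degrees of freedom.

    Patients are ranked by predicted probability and split into groups of
    (nearly) equal size. Each group needs a mean prediction strictly
    between 0 and 1, otherwise the denominator is zero.
    """
    ranked = sorted(zip(p_pred, y_true), key=lambda pair: pair[0])
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # predicted events in the group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square distribution with even df,
    using the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

With the default of 10 groups, the p value would be obtained as `chi2_sf_even_df(stat, 8)`; a large value indicates that predicted and observed event counts agree across the risk groups.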

Model building was evaluated by 10-fold cross-validation with 20 repetitions to address overfitting. For this purpose, the complete model-building process (i.e., determination of cubic spline functions and estimation of regression coefficients) was carried out on each training set, resulting in several logistic regression models (one model per set), which were then used to calculate the AUC on the corresponding validation data sets. The average of all these AUC values was taken as an evaluation measure. The smaller the difference between the cross-validated AUC and the original AUC, the lower the amount of overfitting.
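The resampling scheme can be sketched as follows. This is an illustration of repeated 10-fold cross-validation in plain Python, not the original R code; `fit` and `score` are hypothetical placeholders for the complete model-building process and the AUC calculation, and the random seed is arbitrary.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=20, seed=0):
    """Yield (training indices, validation indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a new random partition for every repetition
        folds = [indices[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield training, validation


def cross_validated_score(n_samples, fit, score, k=10, repeats=20, seed=0):
    """Average validation score over all k * repeats models.

    fit(training_indices) -> model        (hypothetical: whole model-building process)
    score(model, validation_indices)      (hypothetical: e.g. AUC on the validation set)
    """
    scores = []
    for training, validation in repeated_kfold(n_samples, k, repeats, seed):
        model = fit(training)  # model building uses the training set only
        scores.append(score(model, validation))
    return sum(scores) / len(scores)


splits = list(repeated_kfold(1038))
print(len(splits))
```

Because the spline degrees of freedom are chosen inside `fit`, the validation patients never influence model selection, which is what allows the cross-validated AUC to reveal overfitting.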

The secondary objective was to study the association between CT participation and mastectomy rate. We repeated the logistic regression analysis, replacing the outcome pCR with surgical management (“mastectomy” versus “breast-conserving therapy”).

With regard to survival analysis, we first compared the follow-up duration of trial participants and non-participants. If follow-up differed, violating statistical assumptions, the pre-specified survival comparison of CT participants and non-participants was not conducted. Instead, to confirm that pCR influenced prognosis, disease-free survival functions for patients with pCR and patients without pCR were estimated and compared within each group of patients (trial participants and non-participants), using the Kaplan–Meier product limit method and the log-rank test.
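For readers unfamiliar with the product limit method, the estimator multiplies, at each event time, the fraction of patients at risk who remain event-free. The sketch below implements only this estimator in plain Python on made-up toy data; it does not include the log-rank test, and the function name and data are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product limit estimate.

    times:  follow-up time of each patient.
    events: 1 if the event (recurrence or death) was observed, 0 if censored.
    Returns a list of (event time, estimated survival probability).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing the estimate.
        at_risk -= n_leaving
    return curve


# Toy data (not from the study): five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
print(toy_curve)
```

Censored patients contribute to the risk set until their last follow-up, which is why strongly differing follow-up between groups, as anticipated above, makes a direct comparison of their survival curves problematic.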

All of the tests were two-sided, and a p value of < 0.05 was regarded as statistically significant. Calculations were carried out using the R system for statistical computing (version 3.0.1; R Development Core Team, Vienna, Austria, 2013).

Results

Patient and Tumor Characteristics

Complete data for 1038 patients, 260 who participated in CT and 778 who did not, showed some demographic and disease-related differences (Table 2). Specifically, CT subjects were slightly younger, with larger tumors and varied tumor grades compared to non-participants. However, trial participants and non-participants were similar with respect to body mass index and comorbid conditions.

Table 2 Patient characteristics according to trial participation

Median age was 52 years (interquartile range 44–64 years) in CT non-participants, and it was 51 years (43–58 years) in CT participants. There was a clear difference in tumor size between patients treated within and outside clinical trials. While less than 10% of CT patients had T1 tumors, more than 20% of patients outside CT did; additionally, more than 70% of CT patients had T2 tumors, compared with less than 60% of non-participants. Other tumor characteristics such as grading, ER status, PR status, HER2 status, and molecular subtype were similar within both patient groups (Table 2). In total, 296 patients (29%) had a pCR. The association of pCR with year of diagnosis is shown in Fig. 2. Patients treated within a CT underwent a mastectomy in 24% of the cases, while patients outside CT had a mastectomy rate of 35% (Table 2).

Fig. 2 Pathological complete response (pCR) rates relative to the year of treatment/diagnosis (solid curve) with 95% confidence intervals (dashed curves). Estimations are based on a simple logistic regression model with year of diagnosis as cubic spline function.

Prediction of pCR and the role of study participation

Comparing a prediction model with and without CT participation showed that trial participation significantly influenced pCR in addition to the considered predictors (p = 0.03, first likelihood ratio test). The interactions between trial participation and the other predictors, however, were not significant (p = 0.06, second likelihood ratio test). Thus, we could not show that the effect of participation differed between patient subgroups. The adjusted OR for trial participants versus non-participants was 1.53 (95% CI 1.03–2.28).
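Odds ratios from logistic regression are estimated on the log scale, so a reported confidence interval is usually symmetric around the log OR. The sketch below uses this to recover an approximate standard error from the reported values. It assumes a Wald-type 95% interval (normal quantile 1.96), which the text does not state explicitly; the function name is illustrative.

```python
import math


def log_scale_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Return (log OR, approximate standard error, geometric midpoint of the CI).

    Assumes the interval was built as exp(log OR +/- z * SE).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # For a log-symmetric interval the geometric mean of the bounds equals the OR.
    midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return log_or, se, midpoint


# Values reported above for pCR: OR 1.53 (95% CI 1.03-2.28).
log_or, se, midpoint = log_scale_summary(1.53, 1.03, 2.28)
print(round(log_or, 3), round(se, 3), round(midpoint, 2))
```

The geometric midpoint of the reported bounds agrees with the reported OR to two decimals, which is consistent with an interval constructed on the log scale.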

The reduced logistic regression model used to predict risks was well calibrated. The difference between actual and predicted events was quite low (p = 0.58, Hosmer–Lemeshow test). The discrimination ability of the final regression model was also good, at AUC = 0.829. The cross-validated AUC was 0.813, indicating minimal overfitting. The cross-validated AUC values of the basic and the interaction model were lower (0.810 and 0.809, respectively), confirming the main result that trial participation is predictive without differences between subgroups.

Prediction of mastectomy

Trial participation significantly influenced surgical management (p = 0.01, first likelihood ratio test). The adjusted OR for trial participants versus non-participants was 0.62 (95% CI 0.42–0.90). Again, subgroup-specific differences could not be shown (p = 0.054, second likelihood ratio test).

Trial participation and prognosis

Trial participants and non-participants differed with regard to follow-up time. The median follow-up time of CT participants without disease progression or recurrence during the observation time of this study was 8.3 years, whereas the median follow-up time of non-participants without progression or recurrence was 3.2 years. The distribution of the follow-up time is shown in Fig. 3. Therefore, survival analyses according to CT participation status were not performed. However, we did examine disease-free survival according to pCR within each group of patients (trial participants and non-participants). pCR was associated with better disease-free survival in both groups (Fig. 4a, b), although the difference was not statistically significant among trial participants, most likely due to the small sample size of that group.

Fig. 3 Boxplots for the follow-up time in the groups of patients with and without trial participation

Fig. 4 a Disease-free survival according to pCR in patients who did not take part in clinical trials (p value for the log-rank test < 0.001). b Disease-free survival according to pCR in patients who took part in clinical trials (p value for the log-rank test = 0.09)

Discussion

This is the first analysis of CT participation and outcomes in early breast cancer treated in the neoadjuvant setting. It is also one of the few studies examining the potential impact of CT participation on discrete outcomes, such as pCR and ultimate surgical management of the breast. This study demonstrated that, beyond known factors associated with pCR, CT participation is significantly associated with a higher chance of pCR in women with early breast cancer. The adjusted odds of pCR were more than 50% higher (OR 1.53) in patients participating in a neoadjuvant CT than in patients who were treated neoadjuvantly according to standard of care. Additionally, we examined predictive models for surgical outcomes that included CT participation. This analysis demonstrated that CT participation is significantly associated with an increased chance of breast-conserving therapy when compared to non-participation.

We chose achievement of pCR as the primary outcome measure because it is less likely to be influenced by detection bias than prognosis. Several studies report the prognosis of breast cancer patients who take part in CT compared to patients treated outside clinical trials [4,5,6,7, 9, 44]. However, none of these studies demonstrated a clear benefit for patients taking part in a CT. Most of the studies reported large differences between patients treated within and outside CT, indicating selection bias with regard to the compared patient populations. Patients in CT were reported to be younger [6, 7, 44] and to have higher stage disease [44].

In our study, there was a difference in age between the CT participant group and the non-participant group; however, this difference of 2.1 years might not be clinically relevant. In this analysis, three of the trials had both lower and upper age limits and accounted for > 40% of the patients in the CT participant group. Although age was controlled for in our analysis, this difference points to the potential challenge of selection bias in clinical trials.

Year of diagnosis was of importance for several reasons. First, we observed that the pCR rate increased over time; this variable therefore captures a cohort effect. Furthermore, study participation decreased over time. This might be because studies were increasingly designed for certain molecular subtypes, so that the percentage of patients taking part in clinical trials decreased overall. However, since “year of diagnosis” was considered as a predictor in the prediction models for pCR, all results obtained from the prediction models, particularly odds ratios, were adjusted for year of diagnosis. The predictor “year of diagnosis” was used as a cubic spline function, i.e., non-linearly, to incorporate the association between year of diagnosis and pCR rate in the prediction of pCR as precisely as possible.

In addition to the above-mentioned selection bias, several more biases could influence the ascertainment of the follow-up information. A detection bias with regard to breast recurrences and metastases, as well as death, seems likely when comparing the prognosis of patients treated within and outside CT. In CT, considerable effort is put into ascertaining this type of follow-up information. Outside CT, many patients do not return to their primary institutions and/or do not participate in regular follow-up visits. It has been shown that within the first 2 years post diagnosis, more than two-thirds of all patients do not return for surveillance mammography [45]. Indeed, our study demonstrates that the median follow-up time of patients without events (recurrence or death) was more than 5 years shorter for patients not participating in CT than for CT participants.

Additionally, the Hawthorne effect has been described in observational studies [46]: patients who take part in such studies may change their behavior. Similarly, patients treated within CT may behave differently from patients who did not take part in clinical trials, in ways that influence prognosis. Because we studied the effect of participation compared to non-participation on a biological outcome, pCR, the observed benefit of trial participation is unlikely to be explained by patient behavior (the “Hawthorne effect”), and it is unlikely that the percentage of patients who undergo final surgery differs between study patients and non-study patients.

Nonetheless, this study shows significant benefits related to CT participation. CT participation was associated with higher pCR rates, and we could not show that this association differed between the examined subgroups. Also, the risk of mastectomy was lower in patients taking part in CT, which could be a consequence of better response rates of the tumors to the therapies under study. While we do not believe there is a detection bias in either group, and all patients had a similar likelihood of receiving final surgery, the clinical relevance with regard to prognosis remains unclear. There have been efforts to link increases in pCR rates with a corresponding improvement in prognosis [22]. However, the recent analysis by Cortazar et al. did not show pCR to be a surrogate for event-free survival or overall survival in a study population of more than 12,000 breast cancer patients treated with neoadjuvant chemotherapy. The only subgroup showing a trend toward an association between increased pCR rate and survival was the HER2-positive subgroup [22, 25]. Separate from the question of whether increased pCR is a surrogate for improved long-term prognosis in a given treatment study, our analysis demonstrates that pCR is associated with improved survival for both CT participants and non-participants. Further analyses need to be conducted to explore the association between pCR and increased survival in individual treatment studies.

While the current study focused on an outcome that is most likely not compromised by detection bias, it has limitations. First, it is a retrospective analysis and, therefore, there was a need to manage missing data and variables. There may have been some sample bias with regard to those patients who entered into clinical trials compared to those who did not. Some patients declined participation, others were ineligible for a variety of reasons, and for some there were no trials available at the time of their diagnosis. Because we have included patients who were not eligible for participation, there may be additional unknown predictors for pCR that were not accounted for or controlled for in this analysis.

Conclusion

In addition to the demonstrated association between clinical trial participation and improved specific outcomes after neoadjuvant therapy for women with early breast cancer, such as pCR and a reduced extent of surgical intervention, it is important to recognize the positive impact of clinical trial participation regardless of ultimate outcomes. It is these interventional clinical studies that provide the evidence-based data required to develop new standards of care and to ensure future progress in cancer patient management and care.