Introduction

Breast cancer is the most common malignancy in women worldwide and the second most common cause of cancer-related death [1]. Despite local control by surgery and radiotherapy, many women who present with apparently localized breast cancer will ultimately relapse and die from distant metastases. Randomized controlled trials (RCTs) have consistently demonstrated that adjuvant chemotherapy (AC) reduces the risk of distant metastases and improves overall survival in women with breast cancer [2]. Adjuvant chemotherapy is therefore routinely recommended today for many women who are at substantial risk of recurrence following surgery [2–4].

While the survival benefit of AC is well established, the optimal time from surgery to initiation of AC (TTAC) is not known [5]. The RCTs which demonstrated the efficacy of AC stipulated a maximum TTAC of 12 weeks or less. These RCTs therefore provide no evidence that AC is effective, except when it is initiated within those timelines. No RCTs have examined whether the timing of AC has an impact on survival. Observational studies therefore provide the only available information about the effects of delay in AC on outcomes, and the results of those studies have been inconsistent [6–31].

The question of whether delay in AC adversely affects outcomes is particularly important today because the time from diagnosis to the initiation of AC for breast cancer has significantly increased over the last decade [32]. We therefore undertook a systematic review and meta-analysis to determine whether delay in initiation of AC decreases its effectiveness, and, if so, to estimate the rate of loss of effectiveness over time.

Methods

The systematic review and meta-analysis was performed in accordance with the PRISMA and MOOSE guidelines [33, 34].

Literature search

Potentially relevant studies published in English were identified through a structured literature search of MEDLINE and EMBASE (1975 to September 2015) using the Medical Subject Headings adjuvant chemotherapy, antineoplastic combined chemotherapy protocols, breast neoplasms, drug administration schedules, time to treatment, survival rate, survival analysis, and treatment outcome. These were combined with keyword searches to define a primary collection of studies. The Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, and Google Scholar were then searched. All reference lists were searched for other relevant studies. Finally, abstracts presented at the American Society of Clinical Oncology 2010–15, European Society for Medical Oncology 2010–14, San Antonio Breast Cancer Symposium 2012–14, and the IMPAKT Breast Cancer Conference 2011–15 were searched for reports of recently completed studies.

Selection criteria

Studies were required to meet the following inclusion criteria: all patients should have been treated with AC; time from surgery to initiation of AC (TTAC) should have been measured and clearly reported; and the relationship between TTAC and survival should have been reported. Where additional statistical information or clarification of data was required, the authors of the relevant studies were contacted by e-mail. Through this process, five extra studies became eligible for inclusion [10, 11, 15, 25, 28]. To reduce the risk of publication bias, published abstracts were eligible for inclusion. Studies that utilized non-standard forms of AC (e.g., perioperative chemotherapy) or examined the effect of sequencing of additional adjuvant therapies were excluded.

Validity criteria

A quality score was not utilized to grade the individual studies [35–38]. The validity of eligible studies was evaluated as described previously [39, 40]. The greatest threat to the validity of non-randomized clinical studies is uncontrolled confounding [5]. Studies were therefore only classified as “high validity” (HV) if the waiting time groups whose outcomes were compared were balanced with respect to key prognostic factors, or if the analysis had controlled for any imbalance in prognostic factors. For this purpose, age, stage (or tumor size and nodal status), and hormone receptor status were considered the key prognostic factors. Potential misclassification of palliative chemotherapy as adjuvant poses a second major threat to validity in studies that are based on administrative data [41]. Therefore, studies were only classified as HV if the adjuvant intent of chemotherapy was recorded directly and not inferred indirectly from the timing of chemotherapy in relation to surgery.

Data abstraction

Two clinicians (MR and JB) independently assessed potentially relevant studies with respect to inclusion and validity criteria. These reviewers also independently abstracted the necessary information to a pre-piloted electronic database, and the abstracted data were compared for fidelity.

Primary outcome measures

The primary outcome of the study was the association between TTAC and overall survival (OS), measured as the relative risk (RR) of death associated with a 4-week increase in TTAC. The association between TTAC and disease-free survival (DFS), measured as the RR of relapse associated with a 4-week increase in TTAC, was a secondary outcome. OS was selected as the primary outcome because it is the more objective outcome as defined by Savovic et al. [42]. Only HV studies were included in the primary meta-analysis. A secondary analysis including all eligible studies was done to determine the extent to which our validity criteria might have influenced our results [35–38].

The primary studies available for meta-analysis compared the outcomes among groups of patients who had waited a longer or shorter time to begin AC. The waiting times of individual patients were unknown. The median waiting time for each group of patients was estimated from the reported range of waiting times, and the probable distribution of waiting times across that range. Median waiting times were assigned independently by two clinicians (JB and MR), and disagreements were resolved by consensus.

The number of waiting time groups differed from study to study, and the cut-off points used to define those groups varied widely. In order to combine the results of these disparate studies in a meta-analysis, the original results were first converted into a regression coefficient (β) and standard error (SE) corresponding to the RR associated with a 4-week increase in waiting time [39, 40]. Hazard ratios (HRs) and RRs were treated as equivalent measures of effect size [43]. For studies with two waiting time groups, β was calculated as \(\log(\text{HR})/(x_{n} - x_{0})\), and the corresponding SE was calculated as \(\left(\log[\text{upper CI}] - \log[\text{lower CI}]\right)/\left([x_{n} - x_{0}] \times 3.92\right)\), where CI denotes confidence interval, \(x_{n}\) denotes exposure at group n level, and \(x_{0}\) denotes exposure in the reference group. The factor 3.92 is the width of a 95 % CI expressed in SE units (2 × 1.96). If only a P value was provided, the SE was calculated using the “test-based” method: \(\text{SE} = \log(\text{HR})/Z_{p}\), where \(Z_{p}\) is the value of a unit-normal test (e.g., \(Z_{p} = 1.96\) if p = 0.05, 2-tailed test). For the studies with more than 2 groups, weighted linear regression of the log of the HR was used to estimate β with weights equal to the inverse of the variance of the HR estimates [43]. The SE for β was then computed using the approach described in Greenland and Longnecker [44]. This approach assumes a log-linear relationship between wait time and RR over the range of waiting times covered by the original studies.
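The two-group conversion can be illustrated with a minimal Python sketch. It is not the authors' code. It takes β as the log hazard ratio per unit of exposure, with exposure expressed in units of 4 weeks, and it assumes that the reported interval is a 95 % CI. The function names and the input values (HR 1.20, CI 1.00 to 1.44, median waits of 4 and 12 weeks) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def slope_from_two_groups(hr, lower_ci, upper_ci, x_ref, x_n):
    """Return (beta, se): log-HR slope per unit of exposure and its SE.

    x_ref and x_n are the median waits of the reference and comparison
    groups, in units of 4 weeks. A 95 % CI is assumed (width = 3.92 SE).
    """
    dx = x_n - x_ref
    beta = math.log(hr) / dx
    se = (math.log(upper_ci) - math.log(lower_ci)) / (dx * 3.92)
    return beta, se


def se_test_based(hr, p_value):
    """Test-based SE of log(HR) from a two-tailed P value.

    abs() is an addition here so that the SE stays positive when HR < 1.
    """
    z_p = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(math.log(hr)) / z_p


# Hypothetical study: median wait 4 weeks (x = 1) versus 12 weeks (x = 3).
beta, se = slope_from_two_groups(1.20, 1.00, 1.44, x_ref=1.0, x_n=3.0)
rr_per_4_weeks = math.exp(beta)
print(f"RR per 4-week delay = {rr_per_4_weeks:.3f}, SE(beta) = {se:.4f}")
```

Because the two groups in this example are two 4-week units apart, the RR per 4 weeks is the square root of the reported HR, which is what the log-linear assumption implies.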

Meta-analysis

There was no a priori reason to believe that the magnitude of the effect of delay on the effectiveness of AC would differ among the primary studies. We therefore set out with the intention of using a fixed-effects model. We had used this approach in two previous meta-analyses of the association between treatment delay and outcomes, and it had proved appropriate because we encountered only low heterogeneity of effect size [39, 40]. In the present study, we encountered a higher level of heterogeneity that was inconsistent with our assumption of uniform effect size. We therefore present the results of both the fixed-effects and the random-effects models.

In the fixed-effects model, the inverse variance (\(1/\text{SE}^{2}\)) was used to weight individual studies. In the random-effects model, this initial weighting step was followed by a second step to un-weight the inverse variance weighting, by applying a random-effects variance component which is proportional to the variability of the effect sizes in the underlying studies [45]. Homogeneity was assessed by the Cochran \(\chi^{2}\) statistic and the \(I^{2}\) statistic.
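The pooling steps can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation, which relied on the R package meta. The between-study variance is estimated here with the DerSimonian-Laird method, which is an assumption, since the text does not name the estimator. The function name `pool` and the input values are hypothetical.

```python
import math


def pool(betas, ses):
    """Inverse-variance fixed-effects and DerSimonian-Laird random-effects pooling."""
    w = [1.0 / se ** 2 for se in ses]          # fixed-effects weights, 1/SE^2
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw

    # Cochran Q and I^2 quantify heterogeneity around the fixed-effects mean.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    swr = sum(w_re)
    rand = sum(wi * b for wi, b in zip(w_re, betas)) / swr

    return {
        "fixed": fixed, "se_fixed": math.sqrt(1.0 / sw),
        "random": rand, "se_random": math.sqrt(1.0 / swr),
        "q": q, "i2": i2, "tau2": tau2,
    }


# Hypothetical log-RR slopes per 4-week delay and their standard errors.
result = pool([0.0, 0.4], [0.1, 0.1])
rr_random = math.exp(result["random"])
```

Adding \(\tau^{2}\) to every study's variance makes the weights more nearly equal, which is the "un-weighting" effect described above, and it widens the confidence interval of the pooled estimate.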

To screen for potential publication bias, we examined the symmetry of standard error-based funnel plots using the linear regression method suggested by Egger et al. [46].
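The regression test for funnel plot asymmetry can be sketched as follows, under the assumption that the classical form of Egger's test is used: the standardized effect (β/SE) is regressed on precision (1/SE), and an intercept far from zero suggests asymmetry. The sketch uses only the Python standard library and therefore reports the intercept and its t statistic (n − 2 degrees of freedom) rather than a P value. The function name and inputs are hypothetical.

```python
import math


def egger_intercept(betas, ses):
    """Return (intercept, t_statistic) of the regression of beta/SE on 1/SE."""
    n = len(betas)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in ses]                # precision
    y = [b / se for b, se in zip(betas, ses)]   # standardized effect
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar

    # Residual variance and standard error of the intercept (ordinary least squares).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (n - 2) * (1.0 / n + xbar ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, t_stat


# Hypothetical, perfectly symmetric case: every study estimates the same effect.
intercept, t_stat = egger_intercept([0.05, 0.05, 0.05], [0.02, 0.05, 0.10])
```

When all studies share one effect size, the standardized effects lie on a line through the origin and the intercept is zero. Small studies with inflated effects would pull the intercept away from zero.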

All statistical analyses were performed using R package meta version 1.5-0 (R Foundation for Statistical Computing, Vienna, Austria).

Results

The search strategy yielded 1326 publications, of which 44 initially met our selection criteria (Fig. 1). Eighteen were excluded because their results duplicated or overlapped with other studies which were included. Twelve others were excluded because they presented no original information, or provided insufficient information for analysis. The characteristics and conclusions of the original studies without enough statistical information to be included in the meta-analysis are presented in electronic supplementary Table 1.

Fig. 1 Flowchart of study selection

Table 1 Characteristics of studies selected for inclusion in this analysis

Table 1 shows the design of the studies included in the 14 publications selected for analysis, and the characteristics of the patients and their treatment. Five described secondary analyses of RCTs, and one of these included separate analyses of 3 different RCTs. Six described institution-based cohort studies. Three described population-based cohort studies, and one of these included separate analyses of two different populations. Thus, these 14 publications described a total of 18 unique study populations. Table 1 also shows whether or not each study met the validity criteria described above.

Table 2 describes the relationship between TTAC and outcomes reported in the 14 publications selected for analysis. The outcome observed in each waiting time group is shown relative to the outcome observed in the group with the shortest waiting time. Nine publications, describing 12 different study populations, reported outcome in terms of OS. Eight of these 12 studies met our criteria for high validity and were included in the primary meta-analysis of the association between TTAC and OS. Seven studies reported the related end-points of DFS and relapse-free survival (RFS). Five of these 7 studies met our criteria for high validity and were included in our primary meta-analysis of the association between TTAC and DFS.

Table 2 Outcomes reported in studies selected for inclusion in this analysis

Figure 2a shows the relationship between TTAC and the risk of death observed in the 8 HV, primary studies of OS. To test the assumption that any effect of delay on OS was constant over the entire period of the observations, we plotted the slope of the lines joining each pair of observations, against the time from surgery to the midpoint of the interval between the paired observations (Fig. 3). The mean slope of these lines (RR death/4 weeks) did not differ significantly depending on whether waiting time for AC was increased within 8 weeks of surgery, or 8 weeks or more after surgery (p = 0.99).

Fig. 2 The relationship between TTAC and risk of death: Panel A shows the original results of 8 high-validity studies of OS. The risk of death in each waiting time group is shown relative to the risk in the group with the shortest waiting time. Panel B shows the relative risk of death as a function of TTAC, assuming that any effect is constant over the entire period of the study. Each line represents the results of one study. The blue line represents the weighted average of the slopes of individual studies, and the shaded area represents its 95 % confidence interval

Fig. 3 The relative risk of death associated with a 4-week delay in adjuvant chemotherapy, as a function of the time from surgery to AC. The slope of the lines joining each pair of observations in Fig. 2a, described as RR/4-week delay, is plotted here against the time from surgery to the midpoint of the interval between the paired observations

Assuming that the relationship between TTAC and the risk of death remained constant over the time frame of the observations, we then used weighted regression to derive a single estimate of the association between TTAC and the RR of death for each study population, based on the data shown in Fig. 2a. These results are shown in Fig. 2b. The slopes of these lines, expressed as RR of death/4 weeks delay, were used to represent the results of the original studies in the meta-analysis.

Figure 4 shows the results of the meta-analysis of the association between delay and OS in the 8 available HV studies. The fixed-effects model showed that a 4-week increase in TTAC was associated with a significant increase in the risk of death (RR per 4-week delay = 1.04; 95 % CI, 1.01–1.08). Those results remained significant in a sensitivity analysis in which the three most highly weighted studies [8, 12, 19] were in turn removed from the pool and the meta-analysis was repeated in the remaining studies (data not presented). We found significant heterogeneity among the individual studies (\(I^{2}\) = 60 %, p = 0.01), and we therefore repeated the analysis using a random-effects model, which confirmed that there was a significant association between TTAC and risk of death (RR per 4-week delay = 1.08; 95 % CI, 1.01–1.15).

Fig. 4 The association between time to initiation of adjuvant chemotherapy and overall survival: high-validity studies only. The forest plots show the results of the meta-analysis of the 8 high-validity studies of the association between TTAC and OS. a Fixed-effects model. b Random-effects model

To explore the impact of the validity criteria on our results, we repeated the meta-analysis after including lower validity studies which had been excluded from the main analysis. These results are shown in Fig. 5. The inclusion of these additional studies increased the heterogeneity of effect size to \(I^{2}\) = 94.9 %, p < 0.0001. Figure 5 therefore shows only the results of the random-effects model. This secondary analysis confirmed that there was a significant association between TTAC and risk of death, and the observed association was stronger than that which had been observed in the main analysis (RR per 4-week delay = 1.22; 95 % CI, 1.08–1.38).

Fig. 5 The association between time to initiation of adjuvant chemotherapy and overall survival: all relevant studies. The forest plot shows the results of the meta-analysis of all 12 studies of the association between TTAC and OS in a random-effects model. High-validity studies are indicated by an asterisk

The standard error-based funnel plot in Fig. 6 shows no significant asymmetry (p = 0.69), making it unlikely that the observed association was caused by publication bias.

Fig. 6 Standard error-based funnel plot of the results of all relevant studies of the association between TTAC and OS

Figure 7 shows the results of meta-analyses of the association between delay and the risk of recurrence in the 7 studies, including 5 HV studies, which reported outcomes in terms of DFS or relapse-free survival (RFS). The results of the 5 HV studies were relatively homogeneous (\(I^{2}\) = 42.0 %), and the fixed-effects model showed that a 4-week increase in TTAC was associated with a significant increase in the risk of recurrence (RR per 4-week delay = 1.04; 95 % CI, 1.00–1.08). The random-effects model gave similar estimates of effect size (RR = 1.06; 95 % CI, 0.99–1.12 for the 5 HV studies, and 1.05; 95 % CI, 1.01–1.10 for all 7 studies combined).

Fig. 7 The association between time to initiation of adjuvant chemotherapy and disease-free survival: the forest plots show the results of the meta-analysis of 7 studies of the association between TTAC and DFS. a Fixed-effects model. b Random-effects model

Two population-based studies reported outcomes in terms of cancer cause-specific survival. Table 2 shows that both found strong associations between TTAC and the risk of death from cancer [17, 28].

Discussion

The main finding of this study is that there is a significant association between TTAC and survival in women who receive AC for breast cancer. The primary analysis of high-validity studies showed that a 4-week increase in TTAC was associated with an increase in the relative risk of death of between 4 and 8 %, depending on the choice of analytic model (RR per 4-week delay = 1.04; 95 % CI, 1.01–1.08 in the fixed-effects model, and RR per 4-week delay = 1.08; 95 % CI, 1.01–1.15 in the random-effects model). These results were robust and remained significant when the most highly weighted studies were removed from the pool, and when additional lower validity studies were added to the pool. Funnel plots showed no significant asymmetry to suggest publication bias.

The main weakness of this meta-analysis is that it was based entirely on the results of observational studies. Differences in the distribution of prognostic factors between patients who waited longer or shorter times for AC might therefore contribute to the observed differences in outcome. If patients with higher levels of comorbidity or lower performance status waited longer to begin AC, then TTAC might be associated with poorer OS, because the risk of death from causes other than cancer was higher in those who waited longer. This might lead to overestimation of the impact of delay on OS. However, we believe that this is unlikely to explain the observed association between TTAC and OS, because we found similar associations with outcomes which are less sensitive to deaths from other causes (CSS and DFS). In contrast, if patients with more aggressive cancers started on AC sooner than others, then those with longer TTAC would be expected to have better OS, and this might lead to underestimation of the impact of delay on outcomes. To minimize this type of bias, only studies that controlled for major prognostic factors were included in our main analysis. However, we cannot rule out the possibility that our results may have been influenced by confounding by other factors which were not controlled for in this analysis.

Thus, our results are consistent with the hypothesis that delay in the initiation of AC for breast cancer causes a decrease in its effectiveness, but the validity of this conclusion may be questioned because it is not supported by Level 1 evidence [47]. However, Sackett created the typology of “Levels of Evidence” to evaluate evidence of efficacy and did not consider it suitable for evaluating evidence of harm [48]. Recognizing that randomized trials are often impossible in this context, he instead suggested that, in evaluating possible harm, we begin by asking: Was the type of study done, the strongest that could have been performed under the circumstances? [48] In the present context, well-controlled cohort studies, like those included in our primary analysis, provide the strongest possible, direct evidence of the effects of delay in AC, because it would be ethically impossible to carry out an RCT to study the effects of delay on outcome [5]. Sackett further recommended that “Hill’s criteria” should be used to evaluate the total body of evidence relating to potential harm [48, 49]. This approach includes consideration of the strength of the association; the consistency of the association in different studies; the presence of a dose-response gradient; the biological plausibility of the association; the support of experimental evidence; and coherence of the association with overall knowledge of the disease. In the present study, we found a statistically significant association between delay in AC and survival, and this association was observed consistently in different types of study and in different populations. There is a “dose-response” relationship between TTAC and the risk of death, at least over the 4-month range of TTAC for which data are available. It is certainly plausible that delay in AC decreases its effectiveness. 
This is not a post hoc argument; it is a longstanding clinical principle that AC should start as soon as possible after surgery, when the burden of residual tumor cells is at its lowest [50]. There is experimental evidence that the probability of eradication of a cancer by chemotherapy is inversely related to the burden of clonogenic tumor cells, and that delay in initiating AC provides the opportunity for tumor cell proliferation [50]. Furthermore, it has been shown in animal models that surgical removal of the primary may accelerate the growth of micrometastases [51, 52], perhaps due to a reduction in angiogenesis inhibitors following removal of the primary [53, 54]. Finally, the observation that delay in AC is associated with poorer survival in women with breast cancer is coherent with what is known about the impact of delay in cytotoxic treatment on cancer outcomes in other clinical contexts [39, 40]. Thus, when it is viewed through the lens of Hill’s criteria, the totality of the evidence strongly suggests that delay in initiation of AC causes a decrease in its effectiveness.

The precise magnitude of the effect of delay remains uncertain. As illustrated in Fig. 5, the association between delay and survival observed in the registry-based cohorts included in our review was much stronger than that observed in the RCT populations and in the institution-based cohorts. A more recent registry-based study [22], published after the cut-off date of our review, also reported a very strong association between delay in AC and survival. A previous meta-analysis which used less stringent validity criteria, and therefore included registry-based cohort studies in the main analysis, also reported a stronger association between delay and outcomes than that which we report here [31]. We are concerned, however, that confounding and/or misclassification of palliative chemotherapy as adjuvant [41] may explain the stronger association between delay and outcomes seen in the registry-based cohort studies. We therefore believe that the lower risk estimate, derived from the meta-analysis of clinical studies in which the patients and their treatment are better characterized, is closer to the truth.

Thus, we believe it is reasonable to conclude that a 4-week delay in AC is associated with an increase in the relative risk of death of approximately 5 %. The absolute risk of delay for the individual patient, however, depends on her risk of death in the absence of delay. A 50-year-old woman with a pT1c, pN0, grade 3, ER-negative, HER2-negative cancer has a 30 % risk of death within 10 years without AC, and this risk is reduced to 20 % by timely, third generation AC (http://www.predict.nhs.uk/). A 4-week delay in starting AC would increase her risk of death to 21 %, an absolute increase of 1 %. In contrast, a 50-year-old woman with a pT2, pN2a cancer with similar tumor characteristics has a risk of death of 80 % without AC and 60 % with timely, third generation AC. A 4-week delay would increase her risk of death to 63 %, an absolute increase of 3 %. Longer delays would be expected to further increase the risk of death, although it would be unwise to extrapolate beyond the 4-month range of TTAC which was studied here. Thus, the impact of a short delay in AC on the prognosis of an individual patient may be clinically important, but insufficient to completely eliminate the benefit of AC.
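The arithmetic behind these two examples can be reproduced with a short Python sketch. It assumes, as the examples do, that the relative risk of approximately 1.05 per 4-week delay is applied directly to the 10-year risk of death with timely AC, and that the effect is log-linear in the length of delay. The function name is hypothetical.

```python
def risk_with_delay(risk_timely, rr_per_4_weeks=1.05, delay_weeks=4):
    """10-year risk of death after a delay, given the risk with timely AC."""
    # Log-linear assumption: the RR compounds once per 4-week unit of delay.
    return risk_timely * rr_per_4_weeks ** (delay_weeks / 4)


for label, risk_timely in [("pT1c pN0", 0.20), ("pT2 pN2a", 0.60)]:
    delayed = risk_with_delay(risk_timely)
    print(f"{label}: {risk_timely:.0%} -> {delayed:.0%} "
          f"(absolute increase {delayed - risk_timely:.0%})")
```

The same relative risk translates into a larger absolute increase for the patient whose risk with timely AC is higher.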

Although a short delay in AC has a relatively small impact on the prognosis of the individual patient, the potential societal impact of delays in AC is extremely important. Approximately 232,000 women were diagnosed with breast cancer in the US in 2015, of whom 150,000 (65 %) would be expected to have an indication for AC [4, 55]. Assuming that a 4-week increase in TTAC causes an absolute decrease in survival of 2 % in the average patient, a 2-week reduction in average TTAC across the US would save 1500 lives.
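The population-level estimate follows from a simple proportional calculation, sketched below in Python. It assumes that the absolute effect scales linearly with the length of delay over this short range, so that a 2-week reduction yields half of the 4-week effect. The function name is hypothetical.

```python
def lives_saved(n_treated, abs_decrease_per_4_weeks, weeks_reduced):
    """Expected deaths averted by shortening average TTAC (linear scaling assumed)."""
    return n_treated * abs_decrease_per_4_weeks * (weeks_reduced / 4)


estimate = lives_saved(n_treated=150_000,
                       abs_decrease_per_4_weeks=0.02,
                       weeks_reduced=2)
print(round(estimate))
```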

In conclusion, our results show a significant inverse association between waiting time for AC and survival in breast cancer. We recommend that waiting times for AC should be as short as reasonably achievable, and that access to AC should be optimized to minimize delay.