Introduction

Greater physical activity (PA) has been linked to a decreased risk of breast cancer [1], an association believed to reflect effects of PA on circulating estrogen levels, insulin sensitivity, immune function, and adiposity [2]. Because these mechanisms may also influence breast cancer survival, PA has gained interest as a modifiable lifestyle factor that may reduce mortality. Recent studies have examined the relationship between PA and breast cancer survivorship, generally reporting lower mortality with greater activity [3, 4]. However, to date, only one study has explored the effects of post-diagnosis PA at different times on breast cancer survival [5], and it was limited to a Chinese population with short follow-up; this issue has therefore not yet been addressed in a U.S.-based study.

To assess the effect of PA after diagnosis on mortality among breast cancer survivors, we analyzed data from a large, population-based cohort of women who were diagnosed with a first primary breast cancer. We also considered possible heterogeneity of these effects over time since diagnosis, hormone receptor status, and pre-diagnosis body size.

Methods

We analyzed data from the Long Island Breast Cancer Study Project (LIBCSP), a population-based study [6] of newly diagnosed breast cancer cases. The objective of the follow-up to the parent case–control study was to assess factors associated with survival after diagnosis. Both the parent and follow-up studies were approved by the Institutional Review Boards of the participating institutions.

Study population

Participants were English-speaking adult women enrolled in the parent case–control study with a first primary in situ or invasive breast cancer diagnosed in 1996–1997 in Nassau and Suffolk counties, New York. Incident cases were identified through the pathology departments of participating hospitals, and the women's physicians were contacted to confirm the diagnosis and to obtain permission to contact the patients. Of the 1,837 eligible cases, 1,508 (82.1 %) agreed to participate and provided signed informed consent. Of these, 1,414 women agreed to be recontacted for the follow-up interview, for which contact was initiated by mail approximately 5 years after diagnosis; informed consent was obtained by telephone for 1,098 women. Of the 316 women who did not provide consent, 60 refused at mail contact, 65 refused at telephone contact, 18 refused due to illness, 22 were unable to complete the interview, 55 were lost to follow-up, and 96 were deceased with no identifiable proxy. Of those agreeing, 1,033 women (68.5 % of the original 1,508) completed the follow-up interview [7], which gathered information on the period after diagnosis.
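The enrollment counts above can be cross-checked with a few lines of arithmetic. The Python sketch below only re-derives the reported percentages and the breakdown of the 316 non-consenting women from counts stated in the text; the variable names are illustrative.

```python
# Counts reported in the study population description.
ELIGIBLE = 1837
ENROLLED = 1508
AGREED_TO_RECONTACT = 1414
CONSENTED_FOLLOW_UP = 1098
COMPLETED_FOLLOW_UP = 1033

# Reasons among women who did not provide follow-up consent.
NONCONSENT = {
    "refused at mail contact": 60,
    "refused at telephone contact": 65,
    "refused due to illness": 18,
    "unable to complete the interview": 22,
    "lost to follow-up": 55,
    "deceased, no identifiable proxy": 96,
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * numerator / denominator, 1)


print(pct(ENROLLED, ELIGIBLE))  # 82.1
print(AGREED_TO_RECONTACT - CONSENTED_FOLLOW_UP, sum(NONCONSENT.values()))  # 316 316
print(pct(COMPLETED_FOLLOW_UP, ENROLLED))  # 68.5
```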

Outcome assessment

Date and cause of death through December 31, 2009 were determined using the National Death Index (NDI) [8], a standard source of mortality data for epidemiologic research [9]. For the 1,508 cases from the parent study, we constructed variables indicating death from any cause (n = 444) and death due to breast cancer (n = 203), the latter identified by International Classification of Diseases code 174.9 (ICD-9) or C50.9 (ICD-10). Cases without a death record in the NDI were assumed to be alive on December 31, 2009.
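A minimal sketch of how the outcome variables described above might be built for each case, assuming a hypothetical record format with a diagnosis date, an optional NDI death date, and an optional underlying-cause code string. The function name `survival_record`, the 365.25-day year, and the code normalization (so that "C-50.9" and "C50.9" match) are assumptions for illustration, not the study's actual processing.

```python
from datetime import date
from typing import Optional, Tuple

# End of mortality follow-up stated in the text.
END_OF_FOLLOW_UP = date(2009, 12, 31)
# Breast cancer underlying-cause codes named in the text (ICD-9 and ICD-10).
BREAST_CANCER_CODES = {"174.9", "C50.9"}


def survival_record(diagnosis: date,
                    death: Optional[date],
                    cause: Optional[str]) -> Tuple[float, int, int]:
    """Return (years of follow-up, all-cause death flag, breast cancer death flag).

    Women with no NDI death record (death is None) are censored, assumed
    alive, at END_OF_FOLLOW_UP. Any death after that date is also censored.
    """
    if death is None or death > END_OF_FOLLOW_UP:
        end, died, bc_death = END_OF_FOLLOW_UP, 0, 0
    else:
        end, died = death, 1
        # Normalize "C-50.9"-style codes to "C50.9" before matching.
        code = (cause or "").replace("-", "").strip().upper()
        bc_death = int(code in BREAST_CANCER_CODES)
    years = (end - diagnosis).days / 365.25
    return round(years, 2), died, bc_death


print(survival_record(date(1996, 8, 1), None, None))  # (13.42, 0, 0)
print(survival_record(date(1997, 1, 1), date(1999, 1, 1), "C-50.9"))  # (2.0, 1, 1)
```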

PA assessment

Recreational PA was assessed through structured interviews at baseline and follow-up using a modified questionnaire developed for a previous study of PA and breast cancer [10]. The questionnaire was semi-open-ended and, for each activity reported, recorded the period of participation (start and stop dates), the duration of participation (number of months per year), and the average number of hours per week; the number of months per year was used to convert hours per week during the active months into average hours per week over the year. Where the number of months was missing for an activity, 12 months per year was assumed for non-seasonal activities, and the average number of months per year was imputed for seasonal activities. Each activity was assigned a metabolic equivalent (MET) value [11]; activities without a corresponding published MET value were assigned the value of a similar activity. The activity-specific MET value was multiplied by the number of hours per week, and these products were summed across all activities and averaged over the relevant period to obtain each subject's average total MET hours per week. From the baseline interview, average lifetime PA was calculated (and used as a covariate in the analysis), while data from the follow-up interview were used to calculate the primary exposure: the average number of MET hours per week for each year after diagnosis up to the time of the follow-up interview, yielding up to 7 follow-up measures of PA.
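The MET-hour calculation above can be sketched as follows. This assumes a hypothetical per-activity record (`Activity`) holding the assigned MET value, hours per week while active, months per year (or `None` if missing), and a seasonal flag; the imputed months for seasonal activities are passed in as a parameter because the text does not give that value. The MET values in the usage example are illustrative, not values from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    """One reported activity (hypothetical record layout)."""
    met: float                        # MET value assigned to the activity
    hours_per_week: float             # average hours per week while active
    months_per_year: Optional[float]  # None when the number of months is missing
    seasonal: bool = False


def months_active(a: Activity, seasonal_default: float) -> float:
    """Apply the stated rules for a missing number of months per year."""
    if a.months_per_year is not None:
        return a.months_per_year
    # Non-seasonal: assume year-round; seasonal: use a supplied average.
    return seasonal_default if a.seasonal else 12.0


def met_hours_per_week(activities: List[Activity], seasonal_default: float) -> float:
    """Total MET-h/week averaged over one year, summed across activities."""
    total = 0.0
    for a in activities:
        # Spread hours/week during active months over the whole year.
        avg_hours = a.hours_per_week * months_active(a, seasonal_default) / 12.0
        total += a.met * avg_hours
    return total


# Illustrative year: brisk walking all year plus a seasonal activity with
# missing months (MET values and the seasonal default are hypothetical).
year = [
    Activity(met=5.0, hours_per_week=2.0, months_per_year=12),
    Activity(met=6.0, hours_per_week=1.0, months_per_year=None, seasonal=True),
]
print(met_hours_per_week(year, seasonal_default=3.0))  # 11.5
```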

Covariates

Questionnaires were interviewer-administered at baseline (in person) and at follow-up (by telephone) to assess menopausal status, education, income, treatment, and other factors that may influence the development or prognosis of breast cancer, including height in meters (m) and weight in kilograms (kg) in the year before diagnosis, which were used to calculate body mass index (BMI, weight in kg divided by the square of height in m). Tumor stage and estrogen receptor (ER) and progesterone receptor (PR) status were abstracted from the medical records of the 1,402 women who signed a medical record release at baseline. Treatment and tumor characteristics were abstracted from the medical records of 598 women who signed a medical record release at follow-up. The treatment data from the medical records closely matched the self-reported data (kappa coefficients: radiation therapy κ = 0.97, chemotherapy κ = 0.96, and hormone therapy κ = 0.92 [12]), so the more complete self-reported data were used. Tumor size was obtained from the New York State Cancer Registry.
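The agreement statistic cited above, Cohen's kappa, compares the observed agreement between two sources with the agreement expected by chance from each source's marginal frequencies. A minimal Python sketch follows; the ten-subject data in the usage example are made up and are not the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two ratings of the same subjects."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Proportion of subjects on which the two sources agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the product of each source's marginal proportions.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical chemotherapy reports (1 = yes) for ten women.
self_report = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
medical_record = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(self_report, medical_record), 3))  # 0.783
```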

Statistical analysis

Approximately one-third of the sample (n = 506) did not respond to the follow-up questionnaire and therefore lacked information on post-diagnosis PA. In addition, start and stop dates were missing for 10.6 % (n = 160) of the sample, which precluded assigning these activities to specific years after diagnosis. To account for missing data, we used a novel approach that we developed previously [13]. Our primary analysis assumed that the data were missing at random (MAR) with an ignorable missing data mechanism, which requires a model for the outcome (here, a proportional hazards regression) and models describing the distribution of the missing covariates (linear and logistic regression models, as appropriate). These latter models are ancillary and not of inferential interest; their parameter estimates are not reported.

Post-diagnosis PA was categorized as 0 MET h/week (referent category), 0.01–9.00 MET h/week, and >9.00 MET h/week (9 MET h/week is equivalent to approximately 108 min/week of brisk walking at 4 miles/h or 68 min/week of jogging at 5 miles/h [14]). The 9 MET h/week cutpoint corresponds to that used in similar studies [15] and therefore facilitates comparison with our results. It also corresponds to the PA recommendations of the American College of Sports Medicine and the American Heart Association (moderate PA at an intensity of approximately 3.6 METs for 30 min/day, 5 days/week, i.e., 3.6 METs × 2.5 h/week = 9 MET h/week) [14]. For each time point, PA was modeled as a function of age at diagnosis, BMI 1 year before diagnosis, and the previous year's PA. The survival models included categorized PA in the previous period and were adjusted for age, chemotherapy, pre-diagnosis BMI (≥25 vs. <25 kg/m2), and tumor size, which have been shown to be related to both breast cancer survival and post-diagnosis PA levels [16, 17]. Including chemotherapy in the PA model, and other treatment variables (radiation therapy and hormone therapy) in the survival model, did not change the parameter estimates (data not shown). Because of the complexity of the statistical models, parsimony was a major consideration, and these variables were therefore omitted from the respective models in the final analysis. In addition to the overall association of PA with mortality, we conducted analyses stratified by time since diagnosis (before vs. after 2 years since diagnosis), ER/PR status (ER+ and PR+ vs. ER− or PR−), and pre-diagnosis BMI.
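The cutpoint arithmetic above can be checked directly: minutes per week equal 60 × MET-hours divided by the activity's MET intensity. The intensities used below (5.0 METs for brisk walking at 4 miles/h and 8.0 METs for jogging at 5 miles/h) are assumptions back-calculated from the quoted minutes, not values stated in the text; `pa_category` is an illustrative helper mirroring the three exposure categories.

```python
import math


def pa_category(met_h_week: float) -> int:
    """Three-level post-diagnosis PA category used as the exposure."""
    if met_h_week <= 0.0:
        return 0  # inactive, referent category
    if met_h_week <= 9.0:
        return 1  # 0.01-9.00 MET h/week
    return 2      # >9.00 MET h/week


def minutes_per_week(met_hours: float, met_intensity: float) -> float:
    """Minutes per week of an activity needed to accumulate met_hours."""
    return 60.0 * met_hours / met_intensity


CUTPOINT = 9.0
# Assumed MET intensities (not given in the text).
print(minutes_per_week(CUTPOINT, 5.0))  # 108.0 (brisk walking, 4 miles/h)
print(minutes_per_week(CUTPOINT, 8.0))  # 67.5, i.e. about 68 (jogging, 5 miles/h)
# Guideline: 3.6 METs x 30 min/day x 5 days/week.
guideline = 3.6 * (30 / 60) * 5
print(math.isclose(guideline, CUTPOINT))  # True
print([pa_category(x) for x in (0.0, 4.5, 9.0, 12.0)])  # [0, 1, 1, 2]
```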

Missing data on chemotherapy (32.2 %), tumor size (31.6 %), and ER/PR status (34 %) were accounted for using a logistic regression model for each variable, with age as a predictor and, for chemotherapy and tumor size, income and education as additional predictors. Only small amounts of data were missing for baseline PA (0.93 %), menopausal status (1.99 %), pre-diagnosis BMI (1.13 %), adult weight change (from age 20 to 1 year before diagnosis; 1.66 %), education (0.40 %), and income (0.27 %). Because such small amounts were unlikely to influence our results, and the added cost of modeling them outweighed the potential benefit, we excluded subjects missing these data. Our analysis thus included 1,423 women (94.4 % of the original 1,508 cases), with 420 total deaths, 195 of them due to breast cancer.

To evaluate sensitivity to the missing data assumptions, we fit two additional models for the main effects based on the one described above. The first (Model 2 in the results) assumes a non-ignorable missing data mechanism: it adds a logistic regression model for the probability that PA data are observed, as a function of age and previous PA, to account for the possibility that the data are not missing at random (NMAR), that is, that whether PA is observed is related to its potentially unobserved value [13, 18]. This approach can account for potential selection bias, in which the fully observed sample may not be representative of the source population. Finally, we conducted a complete case analysis (Model 3 in the results), which assumes that the observed data constitute a random sample from the larger population, that is, that PA is missing completely at random (MCAR) [13, 18]. This model included 946 subjects (217 deaths from any cause; 101 breast cancer deaths) with non-missing PA assessments, but still accounted for missing data in the chemotherapy and tumor size variables.
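Why a complete case analysis can mislead when PA is NMAR can be illustrated with a small simulation. This is a toy illustration, not the selection model used in the study: PA values are drawn from a hypothetical distribution, and the probability that PA is observed follows a hypothetical logistic function of the PA value itself (`slope` > 0), so more active women are more likely to be observed. With `slope = 0` the observation probability is constant (the MCAR case), and the complete case mean is approximately unbiased.

```python
import math
import random
from statistics import mean
from typing import Tuple


def complete_case_bias(n: int, slope: float, seed: int = 2024) -> Tuple[float, float]:
    """Return (true mean PA, mean PA among observed subjects) in a toy simulation."""
    rng = random.Random(seed)
    all_pa, observed_pa = [], []
    for _ in range(n):
        # Hypothetical MET-h/week distribution, truncated at zero.
        pa = max(0.0, rng.gauss(9.0, 6.0))
        # Hypothetical observation model: logit P(observed) = -0.5 + slope * PA.
        p_observed = 1.0 / (1.0 + math.exp(-(-0.5 + slope * pa)))
        all_pa.append(pa)
        if rng.random() < p_observed:
            observed_pa.append(pa)
    return mean(all_pa), mean(observed_pa)


true_mcar, cc_mcar = complete_case_bias(20_000, slope=0.0)
true_nmar, cc_nmar = complete_case_bias(20_000, slope=0.15)
print(f"MCAR: true {true_mcar:.2f}, complete case {cc_mcar:.2f}")
print(f"NMAR: true {true_nmar:.2f}, complete case {cc_nmar:.2f}")
```

Under the NMAR setting the complete case mean overstates PA, because the observed subset over-represents active women; a selection model instead models the observation probability jointly with the data.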

These models (Models 1–3 and the stratified models described above) were estimated within a fully Bayesian framework, with vague prior distributions on the model parameters as described in our earlier work [13], which yields effect estimates approximately equivalent to those of a standard frequentist analysis. Samples from the posterior distribution of the regression parameters (log hazard ratios) were obtained with WinBUGS 1.4 [19], run for 100,000 iterations, discarding the first 50,000 as burn-in and retaining every 5th iteration to reduce serial correlation. Posterior hazard ratios (HR) and corresponding 95 % credible intervals (CrI) were calculated as the anti-logarithm of the mean and of the 2.5th and 97.5th percentiles of these samples, respectively.
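The post-processing described above (burn-in, thinning, and exponentiating posterior summaries of the log hazard ratios) can be sketched in Python. WinBUGS itself is not called here; the chain in the usage example is a synthetic stand-in drawn from a normal distribution centered on log(0.33), and the linear-interpolation percentile is one common convention that may differ slightly from the one WinBUGS uses. Note that 100,000 iterations with a 50,000-iteration burn-in and a thinning interval of 5 leave 10,000 retained draws.

```python
import math
import random
from typing import List, Tuple


def burn_and_thin(chain: List[float], burn_in: int = 50_000, thin: int = 5) -> List[float]:
    """Drop the burn-in draws, then keep every `thin`-th remaining draw."""
    return chain[burn_in::thin]


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def hr_summary(log_hr_draws: List[float]) -> Tuple[float, float, float]:
    """Posterior HR (exp of mean log HR) and 95 % CrI (exp of 2.5th/97.5th percentiles)."""
    s = sorted(log_hr_draws)
    mean_log_hr = sum(s) / len(s)
    return (math.exp(mean_log_hr),
            math.exp(percentile(s, 0.025)),
            math.exp(percentile(s, 0.975)))


# Synthetic stand-in for 100,000 WinBUGS iterations of one log hazard ratio.
rng = random.Random(7)
chain = [rng.gauss(math.log(0.33), 0.2) for _ in range(100_000)]
kept = burn_and_thin(chain)
print(len(kept))  # 10000
hr, lower, upper = hr_summary(kept)
print(f"HR {hr:.2f} (95 % CrI {lower:.2f}, {upper:.2f})")
```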

Results

Median follow-up time among women in our study was 12.7 years, with times ranging from 0.23 to 13.42 years. At diagnosis, most women were postmenopausal, and ages ranged from 25 to 91 years (Table 1). Fewer than half reported receiving chemotherapy, while the majority received radiation therapy or hormone therapy.

Table 1 Characteristics of 1,423 women diagnosed with a first primary breast cancer in 1996–1997 on Long Island, NY, with follow-up assessments in 2002–2004

In Model 1 (Table 2), which assumes that post-diagnosis PA is MAR, we observed a substantial decrease in the risk of death from any cause [HR (95 % CrI): 0.33 (0.22, 0.48)] and from breast cancer [HR: 0.27 (0.15, 0.46)] at the highest level of PA (>9.0 MET h/week) compared with inactive women. A somewhat weaker association was observed for women with moderate activity levels for all-cause mortality [HR: 0.43 (0.20, 0.84)], whereas associations were similar across activity levels for breast cancer-specific mortality. The pattern was similar for Model 2, which allows for the data to be NMAR; thus, the original findings appear robust to this assumption. The hazard ratios from Model 3, a complete case analysis that does not account for missing PA data, are more pronounced, especially for breast cancer-specific deaths, suggesting that part of the association in the complete case analysis may reflect the pattern of missing data.

Table 2 Posterior HRs (and 95 % CrIs) for the association between all-cause and breast cancer-specific mortality and post-diagnosis PA levels (MET h/week) assessed yearly over the entire follow-up, among 1,423 women diagnosed with a first primary breast cancer in 1996–1997 on Long Island, NY, and followed through December 31, 2009

For all-cause mortality, the inverse association with the highest level of PA appeared stronger during the first 2 years following diagnosis than in later years, although an inverse association was noted in both periods (Table 3). The association of PA with mortality was more pronounced among women with ER+ and PR+ tumors than among those with ER− or PR− tumors. Of particular note, the CrI for moderate activity included one for women with ER− or PR− tumors, whereas an inverse association at this activity level was most evident in the ER+/PR+ group. Finally, a stronger association of PA with all-cause mortality was noted for women who were not overweight before diagnosis than for those who were, and a similar pattern was noted for breast cancer-specific mortality.

Table 3 Posterior HRs (and 95 % CrIs) for the association between all-cause and breast cancer-specific mortality and yearly post-diagnosis PA levels, stratified by time since diagnosis and pre-diagnosis BMI, among 1,436 women diagnosed with a first primary breast cancer in 1996–1997 on Long Island, NY, and followed through December 31, 2009

Discussion

In our analysis, greater levels of recreational PA after diagnosis were associated with a substantially lower risk of death from any cause, as well as of death due to breast cancer, in a large, population-based cohort of women diagnosed with a first primary breast cancer in 1996–1997. The beneficial effect of PA appeared somewhat stronger in the first 2 years following diagnosis and among women who were not overweight in the year before diagnosis, the latter for both all-cause and breast cancer-specific mortality. These findings also suggest that the effect may be stronger among women with hormone-dependent tumors, with moderate activity tending to be more protective in this group. Nevertheless, the risk of death was substantially lower among women who were physically active after diagnosis in all subgroups that we examined.

PA has several metabolic consequences that may favor survival from breast cancer [20], which was the most common cause of death in our study. One potential mechanism is increased insulin sensitivity and reduced endogenous estrogen production mediated by a reduction in adipose tissue [21–23]; PA may also act independently of adiposity by increasing circulating sex hormone-binding globulin (SHBG) and improving insulin sensitivity [24]. While hormonal pathways offer the most convincing explanations, PA may also improve the immune response [25], possibly by promoting natural killer cell, macrophage, and cytokine activity [26, 27], as well as by upregulating antioxidant enzyme activity [28], which may protect against DNA damage.

Our group has recently reported an inverse association between recreational PA before diagnosis and mortality in this cohort [29]; our findings here indicate that the association for PA after diagnosis is stronger. Two recent reviews of the relationship between PA and breast cancer survival [3, 4] both concluded that there is a fairly consistent inverse association; Patterson and colleagues reported an average reduction in the relative risk of death of 30 % [4], similar in magnitude to a more recent analysis [30] and weaker than the association we report here. However, Irwin et al. [15] observed a very strong risk reduction associated with being physically active 2 years after diagnosis: among women who expended 9 or more MET h/week compared with those who were inactive, mortality was reduced by two-thirds (all-cause mortality HR: 0.33, 95 % CI: 0.15–0.73), similar to the magnitude of association we observed.

While our findings are in general agreement with previous studies, our more pronounced associations likely reflect differences in study design. Our study followed women forward from diagnosis, whereas previous analyses included women who were well into their survivorship experience, usually 2–3 years post-diagnosis but as much as 4 years in one study [31] and over 10 years in another [32, 33]. Excluding women who do not survive past the first several years can induce length-biased sampling [34], so such samples are not generally representative of all breast cancer survivors. Additionally, our PA assessment provided longitudinal measures over the entire follow-up period, whereas most previous studies used data from single time periods, which included pre-diagnosis and post-diagnosis activity [3, 4]. Timing of assessment is important: previous research suggests that among breast cancer survivors, PA levels decline during the first year and then increase, although only about half of women return to their pre-diagnosis activity levels by 3 years [17]. Failing to fully capture the return to higher levels of PA among those who survive longer could partly explain the differences in reported associations. To our knowledge, only one other study used data on breast cancer survivors from near diagnosis until several years afterward [5], but that analysis was limited to women living in China. Its results indicate a stronger inverse association for activity undertaken several years after diagnosis, which differs from what we observed, and its findings with regard to ER/PR status were also not consistent with ours. However, our finding of a stronger inverse association among women with ER+/PR+ tumors is biologically plausible, since it is consistent with other observations that PA is associated with reduced steroid hormone levels among postmenopausal women [35], who make up the majority of this cohort.

Strengths of our study include its large size and population-based design, which included breast cancer survivors from the time of diagnosis. Additionally, we were able to ascertain PA over several years, allowing us to evaluate associations of activity both near diagnosis and in later years. We also used a rigorous approach to missing data, which is superior to the ad hoc methods often employed in epidemiologic analyses [18]. Our comparison of Model 3 with Model 1 illustrates the potential for bias in analyses that do not properly account for missing data, and our sensitivity analysis further illustrated the robustness of our main model (Model 1) under a potentially more problematic selection bias scenario (Model 2). These findings emphasize that improper treatment of missing data can lead to erroneous conclusions [18]. While selection models such as those we used can be powerful tools for handling missing data, they rely on untestable assumptions [36], so sensitivity analyses such as ours are important.

Possible limitations of our study include the use of self-reported PA [37]; however, the comprehensive instrument we used was developed specifically for the study of PA and breast cancer [10] and has been used successfully in other studies [38, 39]. The use of proxy interviews is also a potential source of bias for PA data [40]; however, the number of proxy interviews was small (<8 %), and a previous report suggests that including PA data from proxy interviews does not necessarily result in bias [41]. The strong inverse associations could partly reflect reverse causation, with healthier women being more physically active and sicker women being less active. However, this is unlikely to entirely explain our findings, as we adjusted for markers of disease severity, tumor characteristics, and treatment. The results were also robust when considering PA later in the survivorship experience, when such effects would likely be attenuated. Power to detect interactions was limited, particularly for death due to breast cancer; therefore, we did not conduct more comprehensive stratified analyses. Finally, the women in our study population were predominantly white, so these results may not be generalizable to all breast cancer survivors.

In summary, our results indicate that PA after breast cancer diagnosis is associated with better survival. The beneficial effect of PA may vary somewhat over time, by hormone receptor status, and by body size just before diagnosis; however, a reduction in mortality was consistently seen in all stratified analyses. Future research should consider PA over the entire survivorship experience in order to identify the most relevant time for interventions.