Introduction

With an estimated 1.67 million new cases per year, breast cancer is by far the most commonly diagnosed cancer among women worldwide [1]. While a long-lasting upward incidence trend now seems to have been broken in the US and some other western countries, the incidence and mortality of breast cancer are increasing rapidly in many Eastern European, Asian, Latin American and African countries [2]. Hence, primary prevention is urgently needed in many of these countries because early diagnosis and optimal treatment will remain beyond reach for the foreseeable future [3]. However, among the established and suspected causes of breast cancer, few are modifiable [4]. Besides reducing alcohol intake [5] and refraining from post-menopausal hormone treatment [6, 7], keeping a healthy body weight after menopause [8, 9] is one of the few modifiable risk factors considered causal.

With the aim of finding other realistic targets for primary prevention, physical activity has been on the agenda. A causal effect of physical activity on breast cancer is conceivable through numerous biologic pathways, involving adiposity, adipokines, insulin resistance, sex hormones, and inflammation [10]. The epidemiological literature on the subject is vast.

The possible interaction between physical activity and body mass (measured as body mass index, waist circumference or waist-to-hip ratio) has, however, not been fully investigated. We addressed this unsettled issue in a large prospective study of post-menopausal women with information on weight, height, and waist and hip circumference, using our validated instrument for self-reports of total energy expenditure during a typical day and night [11].

Methods

Subjects

The National March was a four-day national fund-raising event arranged by the Swedish Cancer Society in almost 3600 Swedish cities and villages in September 1997. All participants were invited to fill out a 32-page questionnaire with detailed questions about physical activity, anthropometric measures and conceivable confounders such as contraceptive pill use and hormonal replacement therapy, as previously described [12]. Histories of previous cancer at baseline, as well as occurrences of incident breast cancers, deaths or emigration during follow-up, were ascertained through linkages to existing nationwide, complete and continuously updated registers [13]. Accurate linkages, and thus essentially complete follow-up, were attained thanks to the use of the individually unique National Registration Numbers (NRNs), assigned to all Swedish residents shortly after birth or immigration, as identifiers both in the baseline questionnaire and in all registers. The quality of the diagnoses recorded in the Cancer Register has previously been evaluated [14]. This study complies with the guidelines of the Declaration of Helsinki. The Research Ethics Vetting Board in Stockholm approved the study and all subjects gave informed consent.

While the total number of individuals who were actually given a questionnaire during the fund raising event could not be assessed, 43,880 participants handed in completed questionnaires. Compared to the general Swedish population in 1997, the cohort members were on average less educated (39 % had no more than compulsory schooling compared to 25 % of the general population), smoked less (9.6 % compared to 19.2 % of Swedes aged 16–84) and were slightly more overweight or obese; 43 % had a body mass index (BMI) greater than 25 kg/m2 compared with 40 % in the Swedish population according to Statistics Sweden.

Figure 1 is a flow chart of how we selected the study participants. We excluded questionnaires with conflicting, unreasonable answers, as well as questionnaires with incorrect NRNs. From the remaining 43,844 cohort members we excluded men (n = 15,657) leaving a cohort of 28,187 women. We further excluded women with a history of any type of cancer (n = 1884), and women who had emigrated (n = 303) before the beginning of follow-up, October 1st of 1997.

Fig. 1 Flow chart of the study selection process, the Swedish National March Cohort

We limited our outcome to post-menopausal breast cancer; thus, follow-up started at enrollment only for cohort members who were post-menopausal at baseline. In total, 12,429 women 40 years or older reported in the baseline questionnaire that they no longer menstruated and/or filled in their age at menopause. Women younger than 40 years who had not undergone bilateral oophorectomy but indicated that they did not menstruate were not included in the study. Among those who indicated that they were still menstruating, 28 were 60 years or older. They were presumed to be post-menopausal, irrespective of their answer. Of those who did not answer any of the questions regarding menstruation, we classified the 120 who were 50 years or older as post-menopausal.

For the 13,140 women who were classified as pre-menopausal at baseline, follow-up started at the presumed time of menopause or at bilateral oophorectomy, provided they were still free of cancer and living in Sweden. We were able to ascertain history of oophorectomy during follow-up by cross-linkage with the complete Patient Register. Thus, follow-up started for 6619 women when they subsequently turned 50 (age at enrollment <50), or when they turned 60 (age at enrollment ≥50), or when they underwent bilateral oophorectomy. We excluded 6038 women because they did not reach the estimated age of menopause by the end of follow-up, December 31, 2010. Likewise, we excluded 483 who died, emigrated from Sweden, or developed any cancer before the age of menopause. After these exclusions, 19,196 remained in the final study cohort of post-menopausal women with no previous history of cancer.
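The entry rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `follow_up_start_age` and its arguments are hypothetical, and taking the earlier of presumed menopause and bilateral oophorectomy is one reading of the text rather than a documented detail of the analysis.

```python
from typing import Optional


def follow_up_start_age(age_at_enrollment: float,
                        postmenopausal_at_baseline: bool,
                        age_at_oophorectomy: Optional[float] = None) -> float:
    """Age at which follow-up for post-menopausal breast cancer begins.

    Hypothetical helper illustrating the rule in the text.
    """
    if postmenopausal_at_baseline:
        # Post-menopausal at baseline: follow-up starts at enrollment.
        return age_at_enrollment

    # Presumed menopause: 50 for women enrolled before age 50, otherwise 60.
    presumed_menopause = 50.0 if age_at_enrollment < 50 else 60.0

    if age_at_oophorectomy is not None and age_at_oophorectomy >= age_at_enrollment:
        # Assumption: entry at whichever comes first.
        return min(presumed_menopause, age_at_oophorectomy)
    return presumed_menopause


print(follow_up_start_age(45, False))      # enters at 50
print(follow_up_start_age(52, False))      # enters at 60
print(follow_up_start_age(45, False, 47))  # enters at oophorectomy, 47
```

Women whose entry age falls after the end of follow-up, or after death, emigration or a cancer diagnosis, would then be dropped, matching the exclusions listed above.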

Finally, from the cohort of 19,196 subjects, in analyses of total physical activity, those with missing information about physical activity (n = 1722) were excluded, whereas in analysis of anthropometric measures 913, 3399 and 3505 subjects were excluded due to missing values on BMI, waist circumference and waist–hip-ratio (WHR), respectively.

Exposure and covariates information

Weight, waist circumference (at the umbilicus), hip circumference (maximal trochanters), height, and physical activity were self-reported in the baseline questionnaire. How to measure waist and hip circumference was illustrated to facilitate correct measurement. We calculated WHR as the ratio of waist circumference to hip circumference. We assessed all types of activity using a comprehensive questionnaire especially developed and validated for this study. Total physical activity during a typical day was estimated using an instrument with nine ordered intensity levels [11]. Each step was assigned a value expressed as a multiple of Metabolic Energy Turnover (MET) and exemplified by common activities, the intensity of which has been previously shown to be homogeneously valued by a wide variety of lay people. MET is a commonly used measure in physical activity questionnaires, and the exemplified activities in the instrument have been confirmed, through objective measurements, to correspond to the MET values of the respective intensity levels [15]. Participants were instructed to report the time spent on each intensity level, from sleeping/total rest (0.9 METs) to hard physical labor (> 8 METs), during a typical day and night. Thus, total physical activity time should add up to 24 h and allow for an estimate of MET hours per day (MET h/day) using the equation \(\sum_{i=1}^{9} t_{i} \cdot \mathrm{MET}_{i}\), where \(t_{i}\) is the time in hours reported at intensity level \(i\) and \(\mathrm{MET}_{i}\) is the MET value assigned to that level. This instrument has previously been validated with a Pearson correlation coefficient of 0.73 with three 24-h recalls [11] and with the doubly labeled water method showing acceptable agreement [16].
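To make the MET h/day summation concrete, a minimal Python sketch follows. Only the lowest value (0.9 METs) and the description of the top level (> 8 METs) come from the text; the remaining entries of `ILLUSTRATIVE_METS`, the reported hours and the tolerance on the 24-h check are hypothetical placeholders, not the values of the actual instrument.

```python
# Hypothetical MET values for the nine intensity levels; only 0.9 (sleep/rest)
# is taken from the text, the others are placeholders for illustration.
ILLUSTRATIVE_METS = [0.9, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]


def met_hours_per_day(hours, met_values=ILLUSTRATIVE_METS, tolerance=0.5):
    """Sum of t_i * MET_i over the intensity levels for one typical 24-h day."""
    if len(hours) != len(met_values):
        raise ValueError("one reported duration is needed per intensity level")
    total_hours = sum(hours)
    if abs(total_hours - 24.0) > tolerance:
        # The reported times are meant to cover a full day and night.
        raise ValueError(f"reported hours sum to {total_hours}, expected about 24")
    return sum(t * met for t, met in zip(hours, met_values))


# Example day: 8 h sleep, 6 h sitting, and decreasing time at higher intensities.
reported_hours = [8, 6, 4, 3, 2, 1, 0, 0, 0]
print(round(met_hours_per_day(reported_hours), 1))
```

How the original analysis handled days that did not add up to 24 h is not stated here; the sketch simply rejects them.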

Statistical analyses

Follow-up ran from October 1st, 1997, or from the reported or presumed time of menopause, until breast cancer diagnosis, emigration, death or December 31st, 2010, whichever came first. BMI was introduced in the model both as a continuous variable and as a categorical variable using the standard classifications established by WHO (normal weight: 18.5 to <25 kg/m2, overweight: 25 to <30 kg/m2, obese: ≥30 kg/m2), excluding, due to small numbers, 227 women with a BMI < 18.5 kg/m2. We categorized waist circumference according to the International Diabetes Federation consensus, defining central obesity in Europids as a waist circumference ≥80 cm and severe obesity as ≥88 cm in non-pregnant women [17]. We chose 0.8 as the WHR cut-off, according to the international criteria for central obesity [18]. We analyzed total physical activity both as a continuous variable and as a categorical variable with low, medium and high levels, dividing MET h/day into tertiles with the cutoffs 31.2 and 38.1 MET h/day. Survival time was measured on the age scale, with each woman entering the risk set at her age at study entry.
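The cut-offs above translate directly into simple categorization rules. The sketch below uses hypothetical function names; whether values exactly at a cut-off (for example a WHR of exactly 0.8) fall in the higher category is an assumption, chosen to agree with the "≥" conventions used elsewhere in the text.

```python
def bmi_category(bmi):
    """WHO classes; women below 18.5 kg/m2 were excluded from the categorical analysis."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"


def waist_category(waist_cm):
    """IDF-based classes for women: central obesity >= 80 cm, severe obesity >= 88 cm."""
    if waist_cm < 80:
        return "normal"
    if waist_cm < 88:
        return "central obesity"
    return "severe obesity"


def whr_central_obesity(whr, cutoff=0.8):
    """Assumption: a waist-hip ratio at or above the cut-off counts as central obesity."""
    return whr >= cutoff


def activity_tertile(met_h_per_day, low_cut=31.2, high_cut=38.1):
    """Tertiles of total physical activity: low (<31.2), medium (31.2-38.1), high (>38.1)."""
    if met_h_per_day < low_cut:
        return "low"
    if met_h_per_day <= high_cut:
        return "medium"
    return "high"


print(bmi_category(24.7), waist_category(85), activity_tertile(35.2))
```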

Cox proportional hazards regression models, using attained age as the primary time scale, were fitted using post-menopausal breast cancer diagnosis as the indicator of failure (the first primary diagnosed breast cancer, ICD-7 code 170, excluding women diagnosed at autopsy), and emigration, death and end of follow-up as indicators of censoring. Maximum likelihood served to estimate crude and adjusted hazard ratios (HRs) with 95 % confidence intervals (CIs) of post-menopausal breast cancer incidence for different levels of BMI, WHR, waist circumference, total physical activity and their combinations. We tested linear trends of HRs by using the median value of each category as a single ordinal variable in the Cox regression model. Furthermore, as suggested by Knol et al. [19], we assessed effect modification on both the additive and the multiplicative scale. In order to study the combined effects of total physical activity and BMI, WHR or waist circumference (all considered either as binary, above/below the median, or continuous variables), we added the relevant cross-product interaction terms to assess and test multiplicative interaction, using both Wald and likelihood ratio tests. In addition, we used the estimated coefficients from Cox regression to assess interaction on the additive scale using the Relative Excess Risk due to Interaction (RERI); RERI > 0 meaning positive interaction or more than additivity [19–22]. RERI can be interpreted as the risk that is additional to the risk that is expected, based on the addition of the RRs (HRs) under exposure (RR+−, RR−+). This is calculated as the difference between the observed risk (RR++) and the expected risk (RR+− + RR−+ − 1), where RR+−, for instance, is the relative risk associated with the first exposure among those not exposed to the second risk factor, and RR++ is the relative risk of individuals exposed to both risk factors compared to unexposed individuals.
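The RERI definition above, together with its multiplicative counterpart, can be written out as a short Python sketch. The function names are hypothetical and the three relative risks in the example are invented for illustration; they are not estimates from this cohort. The synergy index is included because it is a closely related additive-scale measure.

```python
def reri(rr_both, rr_first_only, rr_second_only):
    """Relative Excess Risk due to Interaction: observed RR++ minus expected (RR+- + RR-+ - 1)."""
    expected_additive = rr_first_only + rr_second_only - 1.0
    return rr_both - expected_additive


def synergy_index(rr_both, rr_first_only, rr_second_only):
    """Ratio of the joint excess risk to the sum of the separate excess risks (1 = additivity)."""
    return (rr_both - 1.0) / ((rr_first_only - 1.0) + (rr_second_only - 1.0))


def multiplicative_ratio(rr_both, rr_first_only, rr_second_only):
    """Ratio of observed RR++ to the product RR+- * RR-+ (1 = no multiplicative interaction)."""
    return rr_both / (rr_first_only * rr_second_only)


# Invented relative risks, for illustration only.
rr_pp, rr_pm, rr_mp = 2.0, 1.4, 1.5
print(round(reri(rr_pp, rr_pm, rr_mp), 2))                  # > 0: more than additive
print(round(synergy_index(rr_pp, rr_pm, rr_mp), 2))         # > 1: more than additive
print(round(multiplicative_ratio(rr_pp, rr_pm, rr_mp), 2))  # < 1: less than multiplicative
```

The same three relative risks can therefore indicate slight positive interaction on the additive scale and negative interaction on the multiplicative scale, which is why both scales are examined. Confidence intervals for RERI require the covariance of the estimated coefficients and are not covered by this sketch.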

In addition, to study possible nonlinear dose-response associations of total physical activity with post-menopausal breast cancer, we extended the Cox regression by fitting restricted cubic splines for total physical activity and BMI, choosing four fixed knots, corresponding to the 5th, 25th, 75th, and 95th percentiles of the distribution of total physical activity and BMI [23]. The smoothing function was estimated using the whole distribution, but the plot (Fig. 2) spans only the central 90 % of the distribution in order to ignore unreliable fluctuations in both tails of the observed range. The proportional hazards assumption was investigated both for each single covariate and globally by analyzing Schoenfeld residuals. We first produced the graphical plots and then carried out formal statistical tests of their independence over the rank transformation of time; no departures from this assumption were found.
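For readers unfamiliar with restricted cubic splines, the sketch below builds the spline terms for a single exposure value using the commonly used truncated-power parameterisation, in which the curve is forced to be linear beyond the outer knots. It is not the software routine used in the analysis; the function name `rcs_basis` and the knot values are hypothetical. With four knots it returns one linear term plus two nonlinear terms, which would then enter the Cox model as covariates.

```python
def _pos_cube(u):
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Restricted cubic spline terms for one value x: [x, s_1(x), ..., s_{k-2}(x)]."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")
    scale = (t[-1] - t[0]) ** 2   # keeps the nonlinear terms on the scale of x
    gap = t[-1] - t[-2]           # distance between the last two knots
    terms = [float(x)]
    for j in range(k - 2):
        s = (_pos_cube(x - t[j])
             - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / gap
             + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / gap)
        terms.append(s / scale)
    return terms


# Hypothetical knots standing in for the 5th, 25th, 75th and 95th percentiles (MET h/day).
example_knots = [27.0, 31.0, 38.0, 45.0]
print(rcs_basis(35.0, example_knots))
```

A joint test that the coefficients of the nonlinear terms are zero is then a test of departure from linearity.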

Fig. 2 Adjusted Hazard Ratios (HR) of post-menopausal breast cancer in the Swedish National March Cohort according to total physical activity (MET h/day). The solid line indicates hazard ratios, and dashed lines indicate 95 % confidence intervals derived from restricted cubic spline regression, with knots placed at the 5th, 25th, 75th, and 95th percentiles of the physical activity distribution. The reference point corresponds to the 25th percentile. The graph is truncated and covers about 90 % of the distribution. The hazard ratios are plotted on a logarithmic scale and adjusted for age at enrollment, cigarette smoking status, alcohol drinking, use of vitamin and mineral supplements, education level, contraceptive pill use, hormonal replacement therapy, age at menarche, number of children, age at first full-term pregnancy and childlessness

We considered the following covariates, identified from the literature, as candidates for potential confounding: cigarette smoking (never, past, current); alcohol consumption (all types of alcoholic beverage: never, low (≤3 times/month), medium (1–6 times/week), high (≥1 time/day)); current use of vitamin and mineral supplements (yes or no); level of education (7–9, 10–12 or >12 years); current use of contraceptive pills (yes or no); hormonal replacement therapy (yes or no); age at menarche (≤12, 13–14 or ≥15 years); number of births (0, 1 or ≥2); age at first full-term pregnancy (≤24, 25–29, ≥30 years); and treatment for childlessness (yes or no). Because low body weight and low physical activity levels could be due to sub-clinical illness, leading to reverse causality, we ran a sensitivity analysis by fitting the same models but excluding the first 2 years of follow-up; women who were diagnosed with breast cancer within 2 years were not included in this analysis, while follow-up started 2 years later for everyone else.
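The lagged-entry sensitivity analysis amounts to shifting each woman's entry time and dropping those who leave follow-up before the new entry. A minimal sketch, assuming follow-up is stored as entry and exit ages; the function name and record layout are hypothetical, and dropping women censored within the lag period is an assumption (they contribute no person-time after the shift).

```python
def lag_follow_up(entry_age, exit_age, event, lag_years=2.0):
    """Shift entry by `lag_years`; return None if follow-up ends before the new entry."""
    lagged_entry = entry_age + lag_years
    if exit_age <= lagged_entry:
        # Diagnosed (or censored) within the lag period: no person-time remains.
        return None
    return lagged_entry, exit_age, event


# Hypothetical records: (entry age, exit age, breast cancer diagnosed?)
records = [(60.0, 61.5, True), (60.0, 70.0, False), (55.0, 63.0, True)]
lagged = [r for r in (lag_follow_up(*rec) for rec in records) if r is not None]
print(lagged)
```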

Missing data

The proportion of missing data of the exposure and confounding variables was 5 % for BMI, 18 % for both waist circumference and WHR, 9 % for total physical activity, 8 % for smoking, 5 % for vitamin use, about 2 % for education, contraceptive pill use and hormone therapy, and less than 1 % for age at menarche, number of births, and treatment for childlessness.

Multiple Imputation by Chained Equations (MICE) was implemented to assess the impact of missing data on the estimates. The procedure is based on the assumption of data missing at random (MAR) [24], and was applied by running the imputation model under two possible scenarios. The first imputed missing values in both exposure and confounders, incorporating complete information on age and survival outcomes (Nelson-Aalen cumulative hazard and breast cancer status). The second imputed missing values in the confounders only, using complete information on age and survival outcomes, and therefore did not consider subjects without information on the exposure variables (total physical activity and anthropometric measures). Variables used in the imputation model were the same as the potential confounders listed above, together with the full set of pairwise interactions. We generated 20 imputed data sets of the analytic cohort and pooled the estimated model coefficients using Rubin’s formula for the standard error [25].
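Rubin's rules for pooling a coefficient across imputed data sets can be sketched as follows. The function name is hypothetical and the three log hazard ratios in the example are invented (the analysis itself used 20 imputations); the pooled variance combines the average within-imputation variance with the between-imputation variance inflated by 1 + 1/m.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log HRs) and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least two imputations")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total_variance)


# Invented log hazard ratios and variances from three imputed data sets.
log_hr, se = pool_rubin([0.30, 0.34, 0.32], [0.01, 0.01, 0.01])
print(round(math.exp(log_hr), 2), round(se, 4))
```

Pooling is done on the log hazard scale, and the pooled coefficient is exponentiated afterwards to obtain the hazard ratio.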

We performed statistical analyses with Stata: Release 13 (Statistical Software. College Station, TX: StataCorp LP). Reported probabilities (p values) were two-sided, and p values below 0.05 were considered statistically significant.

Results

During a median follow-up time of 13.2 years and a total of 193,983 person years, there were 609 new cases of post-menopausal breast cancer. Descriptive statistics, stratified by physical activity levels, are shown in Table 1. At entry, mean age was 56 ± 11 years and mean BMI 24.7 ± 3.6 kg/m2. On average, sedentary women were older and heavier than those with higher physical activity. Education, current smoking, alcohol consumption, and reproductive factors were evenly distributed across physical activity levels.
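As a rough orientation, and not a figure reported in the tables, the crude overall incidence rate implied by these totals can be worked out directly. It ignores age and all other covariates.

```latex
\[
\text{crude rate} = \frac{609\ \text{cases}}{193{,}983\ \text{person-years}}
\approx 3.14 \times 10^{-3}\ \text{per person-year}
\approx 3.1\ \text{per 1000 person-years}
\]
```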

Table 1 Selected baseline characteristics of participants, all and categorized by total physical activity level

Incidence rates and hazard ratios of post-menopausal breast cancer as a function of BMI, waist circumference, waist–hip-ratio and total physical activity are shown in Table 2. Women with medium (31.2–38.1 MET h/day) and high (≥38.2 MET h/day) levels of physical activity exhibited substantially lower age-adjusted incidence rates compared to the most sedentary women. The fully adjusted HRs (adjusted for age at enrollment, cigarette smoking status, alcohol drinking, use of vitamin and mineral supplements, education level, contraceptive pill use, hormonal replacement therapy, age at menarche, number of children, age at first full-term pregnancy and childlessness) revealed a statistically significant (p < 0.01) and dose-dependent upward trend with about 40 % higher breast cancer rate in the lowest compared with the highest tertile of total physical activity. Higher incidence rates of breast cancer were observed both for obese and overweight compared to normal weight women, with a fully adjusted HR of 1.58 (95 % CI 1.16–2.16) and 1.20 (95 % CI 0.97–1.48), respectively (p value for trend 0.01). A positive relationship could not be confirmed for waist circumference or waist–hip-ratio (Table 2).

Table 2 Post-menopausal breast cancer incidence rates and hazard ratios (HR) with 95 % confidence intervals (CI) for total physical activity, body mass index, waist–hip-ratio and waist circumference

Associations of various combinations of total physical activity and BMI levels with post-menopausal breast cancer incidence are presented in Table 3. We found consistent dose–response trends for both BMI and physical activity. Sedentary women had 18–45 % higher incidence rates compared to women with high physical activity in all strata of BMI. Likewise, obese women had 23–33 % higher incidence rates than normal weight women in all strata of physical activity. The breast cancer incidence was doubled (HR 2.07; 95 % CI 1.31–3.25) among physically inactive and obese compared with physically active and normal weight women. We could not detect any risk-ratio heterogeneity on the multiplicative scale (p value for statistical interaction = 0.98). The incidence gradients associated with the two factors seemed to combine in a simple additive manner on the log scale. Different measures of RERI and the Synergistic Index (S) did not reveal any effect modification on the additive scale either (p values > 0.25).

Table 3 Hazard ratios (HR) and 95 % confidence intervals (CI) for post-menopausal breast cancer (referent group: high total physical activity and normal weight). The Swedish National March Cohort, n = 14,143

When examining the functional relationship between total physical activity (MET h/day) and post-menopausal breast cancer, restricted cubic regression spline models showed that a simple linear model would explain the change in the hazard rate equally well. This graphical conclusion was also confirmed by a formal statistical test of the second and third spline transformations (Fig. 2). Similar trends, although with less precision, were seen when we stratified the analyses on BMI category (Appendix Figures 1a–c of the supplementary material).

Our model estimates did not change substantially when we carried out multiple imputation for missing answers. Nor did the sensitivity analyses excluding breast cancer occurring during the first 2 years of follow-up (n = 113) change our findings.

Discussion

In this prospective investigation with complete, register-based follow-up for 159 months after initial self-reported exposure assessment, the incidence rates of post-menopausal breast cancer were consistently higher in each category of increasing BMI and consistently lower in each escalating level of physical activity at baseline. There was no effect measure modification. Instead the excess incidence linked to these two factors seemed to simply add together so that the incidence was twofold higher in those who were both sedentary and obese compared to normal weight women with a high level of total physical activity.

The epidemiological literature provides numerous reviews and meta-analyses addressing physical activity and breast cancer. In one of the most extensive reports, “Food, Nutrition, Physical Activity and the Prevention of Cancer: a Global Perspective” [26], a panel of experts reviewed the scientific evidence regarding lifestyle and breast cancer. For post-menopausal breast cancer, the evidence that fatness increases the risk was deemed convincing, while the evidence for low physical activity and abdominal fatness was deemed probable [27]. The reviews by Wu et al. [28], Friedenreich et al. [10] and Zhong et al. [29] found a 12–25 % risk reduction in physically active women, compared with inactive women, while our corresponding figure was 29 %. Renehan et al. [30] found a 25 % increased risk for a gain of 10 kg/m2 (approximately equivalent to the contrast between normal weight women and obese women), which is a lower estimate than the 58 % found in this study.
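The 29 % figure follows from inverting the hazard ratio reported in the Results, where the lowest tertile of physical activity had about 40 % higher incidence than the highest. A worked version, assuming an HR of about 1.40 for the lowest versus the highest tertile:

```latex
\[
\mathrm{HR}_{\text{high vs. low}} = \frac{1}{\mathrm{HR}_{\text{low vs. high}}}
\approx \frac{1}{1.40} \approx 0.71,
\qquad 1 - 0.71 = 0.29 \;\Rightarrow\; \text{about } 29\,\%\ \text{lower incidence}
\]
```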

In numerous previous studies an apparent protective effect of physical activity on breast cancer risk was more evident among lean than overweight women [31–36]. But data are equivocal, with some reports suggesting higher benefit of physical activity among overweight than lean women [37, 38], and others finding no heterogeneity of the association between physical activity and breast cancer according to adiposity level [39–41]. In our study we aimed to specifically examine whether there was possible interaction between the two exposures, physical activity and anthropometry, but we could not find any interaction, rather an additive effect, showing that women, regardless of BMI, may benefit from being physically active.

The precise mechanism by which physical activity may protect against breast cancer remains unclear. The hypothesized mechanisms include (1) lower serum levels of sex hormones and higher sex hormone-binding globulin (the role of sex hormones in breast cancer etiology can be considered well-established), (2) decreases in adiposity (which affects estrogen exposure in post-menopausal women), (3) changes in insulin resistance (insulin has antiapoptotic and mitogenic effects in breast cancer cells), (4) decreased adipokines (thought to act via direct mechanisms, or by increased estrogen activity, or through associations with insulin resistance), (5) inflammatory markers (deregulating normal cell growth), and (6) other mechanisms such as enhanced immune function and decreased oxidative stress [10].

Strengths of our study include its prospective design, long and complete follow-up through unique linkage to virtually complete health registers, high quality exposure data because of well-motivated volunteers, strict criteria for the breast cancer end-points, and particularly the detailed assessment of physical activity based on a validated instrument.

One limitation is that physical activity was self-reported and assessed only at baseline. A single measurement may fail to adequately reflect the association between physical activity and breast cancer development over time. Although some misclassification is unavoidable, this is likely non-differential and would entail underestimation of the association between physical activity and breast cancer risk. Also, the long latency to breast cancer development places the exposure window of interest closer to the baseline exposure assessment. Repeated exposure assessment closer to the end of follow-up might instead increase the risk of reverse causation.

Waist and hip circumferences were also self-measured by the participants and thus prone to misclassification. Nonetheless, validity of self-measured waist and hip circumferences has been shown to be reasonably high among women with correlation coefficients between 0.7 and 0.97 [42]. Furthermore, correlations may be higher between waist–hip-ratio and visceral fat, than with subcutaneous fat [43]. Visceral fat is associated with increased bioavailability of estradiol, lower levels of sex-hormone-binding globulin, insulin resistance, hyperinsulinemia, adipokines and inflammatory markers [44]; all potential biological pathways relating physical activity to breast cancer.

Also, the nature of the study sampling event (a nationwide fund raising event for cancer research) may have affected sample selection into the cohort. This may have modified possible confounding patterns inherent to the general population; nonetheless, this does not per se introduce bias in the exposure-outcome estimates [45]. Still, given the limited sample size, it is not certain that chance could be fully ruled out as a cause for the stronger findings in this study, compared to others.

We did not control for potential confounding by family history of breast cancer, which is associated with breast cancer risk. Further, information on tumor hormone receptor status and mammographic breast density was not available, so we could not assess whether these factors modify the effect of physical activity on breast cancer. In a review of eleven studies, estrogen and progesterone receptor status was examined as a potential effect modifier of the association between physical activity and breast cancer risk, but no clear pattern emerged [46].

Our study adds to existing evidence that physical activity and BMI are related to risk of post-menopausal breast cancer. We did not, however, find any evidence of additive or multiplicative interaction between BMI and physical activity, suggesting independent effects. Our and others’ results highlight that physical activity and BMI may be among the most important modifiable targets to decrease post-menopausal breast cancer incidence worldwide.