Introduction

Increasing fracture predisposition is considered a major health problem in our aging society [1, 2]. In view of the efficacy of exercise in favorably affecting bone [3, 4], fall risk [5, 6], and fall impact [7], exercise should be a very effective tool to reduce fractures in older adults. Indeed, a recent meta-analysis [8] that focused on this issue reported significant effects of exercise on relative overall fracture risk (RR 0.49; 95 % CI 0.31–0.76), i.e., a result on a par with many potent pharmaceutical agents [9]. However, the scientific base of this meta-analysis is limited since all the studies included were seriously underpowered, and consequently, most researchers reported fractures as an experimental study endpoint or simple observations. Since subordinated study endpoints do not necessarily have to be reported, the temptation to publish positive results only was high [10, 11]. The resulting publication bias, together with the problem that only a minority of the present exercise studies designed their intervention to optimally address fracture reduction [12] by means that adequately affect bone strength, fall incidence, and fall impact, may further confound the real effect of dedicated exercise programs on fracture reduction.

Thus, the aim of the present study was to evaluate, with sufficient statistical power, the long-term effect of exercise on clinical overall fractures and bone mineral density (BMD) in subjects at risk.

Our main hypothesis was that our exercise program significantly reduces risk and rate of clinical low-trauma fractures in postmenopausal females compared with their non-training controls. The secondary study hypothesis was that BMD of the exercise group changed more favorably over the 16-year interventional period compared with non-training controls.

Methods

Design overview

To evaluate the long-term effect of an intense multipurpose exercise program on risk factors of postmenopausal women with special regard to bone, we started the Erlangen Fitness and Osteoporosis Prevention Study (EFOPS) in 1998. In this article, we address the primary study endpoints “clinical overall fracture” and “bone mineral density.” The study protocol was approved by the ethics committee of the University of Erlangen (Ethikantrag 905, 4209, 4914 B) and the Bundesamt für Strahlenschutz (S9108-202/97/1). The study was conducted by the Institute of Medical Physics, University of Erlangen (FAU), Germany. All study participants gave written informed consent. The study was registered at www.clinicaltrials.gov (NCT01177761).

Setting and participants

After application of our criteria, i.e., (a) 1–8 years postmenopausal, (b) osteopenia according to the World Health Organization (WHO) (−2.5 SD < T-score < −1 SD), (c) no known osteoporotic fractures, (d) no secondary osteoporosis, (e) no use of medication or diseases affecting bone metabolism, and (f) no inflammatory or cardiovascular events (e.g., myocardial infarction, stroke), 137 eligible subjects in the Erlangen-Nuremberg (Germany) area were recruited (Fig. 1). Based on their own decision, 51 subjects joined the sedentary control group (CG) that intended to maintain their present lifestyle and physical activity level, and 86 subjects joined the exercise group (EG) that underwent the long-term exercise regime described below.

Fig. 1 Flow chart of the progress through the EFOPS study. Superscript 1, excluded subjects starting an osteoanabolic therapy during the interventional period

Table 1 gives the initial characteristics of both groups for all subjects included at baseline and for subjects included in the 16-year follow-up assessment. No significant baseline differences at all were determined between or within the groups.

Table 1 Baseline characteristics

Intervention

The study intervention has been described in detail elsewhere [13–15]; thus, only a brief summary will be given here.

Calcium and vitamin D supplementation

Based on dietary protocols (see below), all study participants were individually supplemented with calcium (Ca) (1998: up to 1500 mg/day; 2008: up to 1000 mg/day) and vitamin D (Vit. D) (up to 500 IU). After 5 years of free supplementation, subjects were asked to maintain their prescribed Ca and Vit. D intake and were given sources for low-priced supplements.

Exercise program

Certified trainers, briefed monthly by the principal investigator, supervised all the group exercise sessions over the 16-year period. Subjects of the EG kept individual training logs that were checked every 12 weeks in order to determine participants’ compliance and attendance. The EFOPS protocol generally scheduled two group classes of 60–65 min and two home training sessions of 20–25 min for 49–50 weeks per year. Exercise intensity was regularly adapted (see below) to subjects’ physical performance.

Group classes

The 20–25 min warm-up/endurance sequence started with 5–10 min of running variations or dancing and 10–15 min of low- and high-impact aerobic dance exercises with heart rates (HR) at 70–85 % maximum heart rate (HRmax) and peak ground reaction forces (peak GRF) at ≈2–3 × body weight. The number of high-impact loads was progressively increased during the first 3–4 study years (up to 150) and slightly decreased (20–25 %) during the last three study years.

The high-impact sequence (3 min) consisted of four sets of 15 repetitions of different multilateral jumping exercises. Complexity and impact of the prescribed jumping exercises progressively increased up to a peak GRF of ≈4–4.5 × body weight during the first 4 years while less-challenging jumping exercises (peak GRF ≈3–3.5 × body weight) were introduced during the last three study years.

Resistance exercise affecting all the main muscle groups was performed on machines (TechnoGym, Gambettola, Italy) or using isometric exercises, elastic bands, and free weights. After 9 months of conditioning, we consistently applied a periodized exercise schedule with 12 weeks of linearly or nonlinearly periodized high-intensity resistance training [16] on machines (9–10 exercises, 1–4 sets, 4–12 repetitions with 70–90 % one-repetition maximum (1-RM)) and 4–6 weeks of lower intensity (50–55 % 1-RM) but higher volume (13 exercises, 2–3 sets, 20–25 repetitions) or free weights (see below). Besides intensity, movement velocity was also consistently manipulated with periods of fast (explosive movement) and slow (up to 4 s) velocity. Exercise intensity prescribed in the participants’ individual training logs was based either on regular 1-RM tests (first 5 years) or on the repetition number combined with the rate of perceived exertion (Borg CR-10 scale, [17]) [16]. Of importance, we did not intend subjects to reach complete exhaustion by performing the maximum number of repetitions.

The machine-independent resistance session consisted of isometric exercises (12–15 exercises, 2–4 sets, 6–10 s), elastic band exercises (3–4 exercises, 2–4 sets with 10–20 repetitions), and a circuit of one to three sets of three resistance exercises using free weights or weighted vests, structured according to the machine-dependent protocol described above.

The 20- to 25-min home training prescribed a warm-up sequence, rope skipping, isometric and dynamic exercises, and a short stretching sequence carefully practiced in the group sessions beforehand. Home training protocols were changed every 3 to 6 months.

Measurements

The measurements detailed below were performed at baseline and repeated 16 years after the start of the intervention. Assessments were determined in a blinded fashion.

Primary study endpoints

(a) Clinical overall low-trauma fracture

(b) Bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN)

Overall fractures

Clinical overall fractures (i.e., all fractures during the last 16 years) were determined retrospectively by questionnaires combined with structured interviews after 4, 8, 12, and 16 years. In order to verify the fracture, subjects were asked to provide a medical report. Low-trauma fractures were defined as fractures occurring spontaneously without high load or resulting from falls from a standing height or lower [18]. Among the low-trauma fractures, we further identified major osteoporotic fractures (i.e., vertebral, humerus, forearm, proximal femur/hip) according to the WHO Fracture Risk Assessment Tool (FRAX®, [19]). All fractures caused by more serious trauma (e.g., vehicle/bicycle accidents or bicycle falls, falls from a higher level) were excluded from the analysis.

Bone mineral density

BMD at the LS and the FN was measured by dual-energy x-ray absorptiometry (DXA) using standard protocols of the manufacturer (QDR 4500a; Hologic, Bedford, USA). LS scans (L2–L4) and FN scans were independently analyzed by two experienced researchers. In three cases, severe degenerative changes prevented an adequate comparison of baseline and follow-up LS scan. Long-term (16-year) coefficient of variation for BMD at the LS was 0.5 % as determined by weekly spine phantom measurements.

Anthropometry

Height, weight, and waist circumference were measured using calibrated devices. Body composition was determined by bio-impedance technique (Tanita BF 305, Tokyo, Japan).

Questionnaires

Baseline questionnaires determined demographic parameters, pre-study physical activity and exercise levels, health risk factors with special regard to bone and quality of life parameters. Follow-up questionnaires conducted after 1, 2, 3, 4, 5, 8, 12, and 16 years were specifically designed to detect changes in confounding parameters (e.g., medication, diseases, lifestyle, physical activity, exercise, dietary pattern, and Ca/Vit. D supplementation) that may affect the study endpoints.

Nutritional analysis

The consumed food was weighed precisely and reported by the participants. The analysis of the protocols was performed by research assistants using Prodi-4.5/03 Expert software (NutriScience, Hausach, Germany). However, due to participants’ unwillingness to regularly perform this laborious procedure and the minor annual differences for calcium and vitamin D intake, we decided to stop assessing dietary intake by this method and started to use a standardized calcium and vitamin D questionnaire [20], initially biyearly and later in 4-yearly intervals. A validation of this questionnaire against the results of the 5-day dietary assessment showed differences of 10 % for calcium and 15 % for vitamin D intake.

Statistical analysis

Sample size calculation was based on the number of clinical low-trauma fractures (details in [15]). A completer analysis including all subjects with 16-year follow-up data was calculated; additionally, a per protocol analysis that excluded subjects with osteoanabolic medication was performed to specifically address BMD changes. Fisher’s exact test and negative binomial regression were used to determine differences between EG and CG for the number of subjects with fractures (risk ratio) and total number of fractures (rate ratio). According to their distribution, intragroup BMD changes were analyzed by paired t tests or Wilcoxon rank tests. Differences between the groups were consistently determined using Welch’s t tests. Effect sizes (ES) were calculated using Cohen’s d [21]. All tests were two-sided with a p value of less than 0.05 considered as statistically significant.

Results

Table 1 gives baseline characteristics of the EG and the CG. Mean values (MV) and standard deviations (SD) did not differ significantly between the groups (EG vs. CG) or within groups (all subjects included at study start versus subjects included in the completer analysis).

Participant flow during the EFOPS study is shown in Fig. 1. In total, 32 subjects (CG: n = 5 vs. EG: n = 27) were lost to follow-up (total dropout 23 %) for reasons listed in Fig. 1. Due to pharmacological therapy (bisphosphonates, strontium ranelate, denosumab, teriparatide), seven subjects (EG: n = 4 vs. CG: n = 3) were excluded from the per protocol analysis.

Fifty-eight subjects of the EG (67 %) were still exercising after 16 years. Average exercise frequency decreased significantly (p < .001) from 2.46 ± 0.45 sessions/week during year 1 (average compliance rate 66 %) to 2.15 ± 0.40 sessions/week during year 16 (average compliance rate 57 %). This decrease was attributable exclusively to the significant reduction (p < .001) of home training sessions (0.96 to 0.61 sessions/week), with no relevant changes for the supervised group sessions. In total, about two thirds of the EG participants exercised more than two sessions/week/year [22] during the 16-year study period. During the exercise sessions, one hairline fracture of the os pubis and four muscular lesions (strain traumas, muscle rupture) occurred.

Study endpoints

Clinical low-trauma fractures were observed in 17 subjects of the CG and 11 subjects of the EG (risk ratio 0.51; 95 % CI 0.23 to 0.97; p = .046), with 24 low-trauma fractures in the CG and 13 fractures in the EG (rate ratio 0.42; 95 % CI 0.20 to 0.86; p = .018). With respect to major osteoporotic fractures according to FRAX®, 15 low-trauma fractures were observed in the CG versus seven corresponding fractures in the EG (rate ratio 0.37; 95 % CI 0.14 to 0.88, p = .027). A further eight fractures were related to more serious trauma caused by bicycle/car accidents, bicycle falls, or falls from a higher level than standing and were thus excluded from the analysis. About half of the low-trauma fractures resulted from falls from a standing or lower level. Thus, our main hypothesis that our exercise program significantly reduces risk and rate of clinical low-trauma fractures in postmenopausal females compared with their non-training controls was verified.
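As a rough plausibility check of the ratios reported above, the following minimal sketch computes crude (unadjusted) risk and rate ratios and a two-sided Fisher's exact p value from the reported counts. It is not the authors' analysis code: the group sizes (59 EG, 46 CG) are an assumption derived here as recruited minus lost to follow-up, the rate ratio assumes equal follow-up time per subject, and the published rate ratios come from negative binomial regression, so small deviations from the crude values are expected. All function and variable names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(total, 1.0)


# Assumed completer group sizes: recruited minus lost to follow-up.
N_EG, N_CG = 86 - 27, 51 - 5            # 59 and 46
subjects_eg, subjects_cg = 11, 17        # subjects with >= 1 low-trauma fracture
fractures_eg, fractures_cg = 13, 24      # total low-trauma fractures

risk_ratio = (subjects_eg / N_EG) / (subjects_cg / N_CG)
# With equal follow-up per subject, person-time cancels out of the ratio.
rate_ratio = (fractures_eg / N_EG) / (fractures_cg / N_CG)
p_value = fisher_exact_two_sided(subjects_eg, N_EG - subjects_eg,
                                 subjects_cg, N_CG - subjects_cg)

print(f"crude risk ratio: {risk_ratio:.2f}")
print(f"crude rate ratio: {rate_ratio:.2f}")
print(f"Fisher exact p:   {p_value:.3f}")
```

Under these assumptions, the crude ratios land close to the reported risk ratio of 0.51 and rate ratio of 0.42; confidence intervals are deliberately omitted because they depend on the regression model used in the paper.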

Changes of BMD at the LS and FN are given in Tables 2 (completer analysis) and 3 (per protocol analysis). However, results did not differ relevantly between methods. With respect to the completer analysis, based on comparable baseline values (p ≤ .608), LS-BMD (EG vs. CG: −1.5 ± 5.0 vs. −5.8 ± 6.3 %, p < .001) and FN-BMD (−6.5 ± 4.6 vs. −9.6 ± 5.0 %; p = .001) significantly decreased in both groups (p = .027 to .001); however, the decrease was significantly more pronounced in the CG (Tab. 2).
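To illustrate how a between-group comparison of this kind can be derived from summary statistics, the sketch below computes Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cohen's d for the LS-BMD changes reported above. Several inputs are assumptions rather than reported facts: the group sizes (59 EG, 46 CG) ignore the three LS scans excluded for degenerative changes, and Cohen's d is computed here with the pooled standard deviation, which may differ from the variant the authors used. Function names are illustrative.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# LS-BMD change in % (mean, SD) as reported; group sizes are assumed.
eg = (-1.5, 5.0, 59)
cg = (-5.8, 6.3, 46)

t_stat, df = welch_t(*eg, *cg)
d = cohens_d(*eg, *cg)
print(f"Welch t = {t_stat:.2f}, df = {df:.1f}, Cohen's d = {d:.2f}")
```

A positive t and d here indicate a smaller BMD loss in the EG than in the CG, which is the direction of the effect described in the text.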

Table 2 Changes in the exercise and control groups analyzed by “completer analysis”
Table 3 Changes in the exercise and control groups analyzed by “per protocol analysis”

Consequently, our secondary study hypothesis that bone mineral density of the exercise group changed more favorably over the 16-year interventional period compared with non-training controls can also be accepted.

Changes of parameters with potential effect on our result

Dietary calcium and vitamin D intake slightly decreased from baseline to 16-year follow-up in both groups; however, adjusted Ca and Vit. D supplementation ensured the total supply of 1500 mg/day Ca (from 2008, 1000 mg/day) and 500 IU/day vitamin D as prescribed by the study protocol.

Regular physical activity levels decreased nonsignificantly during the 16-year period, with no difference between groups (p = .553). Non-study-related exercise decreased comparably in both groups (EG: p = .007 vs. CG: p = .015), with the greatest reduction (p = .001) in exercise with high osteoanabolic relevance (e.g., HI-Aerobic, Squash) [23]. With respect to diseases that may affect bone metabolism, the number of subjects reporting thyroid (EG 14 vs. CG 12) or gastrointestinal (EG 1 vs. CG 2) dysfunctions or inflammatory diseases (EG 3 vs. CG 5) did not vary relevantly between the groups. Of relevance however, five EG subjects who dropped out before the 16-year follow-up cited serious diseases (cancer, chronic obstructive lung disease, arthrosis, rheumatoid arthritis) as the reason for their withdrawal from exercise.

Discussion

In the present study, we generated more than 1650 participant-years, which allows us at least to evaluate the exercise effect on clinical overall fractures. We are aware that a more refined approach, i.e., focusing on clinical LS or hip fractures, would have been preferable, but the latter endpoints in particular would require sample sizes 4-fold (vertebral fractures) to 15-fold (hip fractures) higher than that generated by the present study [24]. Although this is only speculation, we presume that chronically underfinanced exercise studies will never be able to recruit, assess, and exercise enough subjects to address hip fractures in older adults.

In summary, we clearly ascertained a positive effect of the EFOPS exercise protocol on the clinical low-trauma overall fracture risk ratio (0.51; 95 % CI 0.23 to 0.97; p = .046) and rate ratio (0.42; 95 % CI 0.20 to 0.86; p = .018) with comparable data for major osteoporotic fractures according to FRAX® [19]. Whilst a comparison with pharmacological studies as a benchmark is not fully feasible given the latter’s higher evidence levels and more dedicated inclusion criteria, it does provide an insight into the dimensions of exercise-induced fracture reduction achieved. With respect to clinical overall fractures, zoledronate, which may actually be the most potent bisphosphonate [25], decreases fracture rate by 33 % [26], which is comparable with the anti-fracture efficacy of 32 and 35 % reported for denosumab [27] and teriparatide [28]. However, it would be completely inappropriate to conclude that exercise may be a true alternative to pharmaceutical therapy since a large proportion of the frail elderly, as the classical addressees, are unable or simply unwilling to start and maintain lifelong, frequent, and intense exercise programs [22, 29] comparable to the EFOPS protocol. Whether less intense and less frequent exercise protocols, which may be more attractive for this susceptible cohort, favorably affect clinical fracture rate still has to be proved.

It is difficult to attribute the positive effect of the EFOPS protocol on fracture incidence predominantly to changes of bone or fall risk factors [30]. Given the initially early postmenopausal status of our subjects, the intervention focused mainly on bone and preserving bone density. This should not be taken to mean that risk factors related to falls were neglected. Lower leg strength and balance, the two most promising physical aims of fall reduction programs [6, 31], were consistently addressed using challenging exercises with a sufficient dose (i.e., moderate or high challenge to balance and strength for at least 2 sessions per week on an ongoing basis) [6]. Further, in order to address the increased fall risk of our cohort [32], less intense but more complex exercises/movements that challenge coordination and balance were increasingly provided during the aerobic dance session. However, due to the unwillingness of many CG participants to keep a fall incidence calendar for at least 18 months (based on a corresponding power calculation), we refrained from assessing falls and were thus unable to report corresponding rate and risk ratios.

Returning to the bone aspect, changes for LS- and FN-BMD differed significantly between groups (p ≤ .001) with more favorable changes in the EG. However, BMD decreased significantly in both groups during the 16-year period with a more pronounced reduction for the proximal femur site. Of importance, the gap between the EG and CG for LS- (≈4.0–4.5 %) and FN- (≈3.0–3.5 %) BMD peaked around study year 4 and remained at least stable up to the present 16-year follow-up. These ongoing distinct BMD differences were much higher than the mean differences (LS 0.85 %; FN-region of interest −0.08 %; total hip-region of interest 0.41 %) summarized by a meta-analysis of exercise trials with postmenopausal cohorts [3]; these trials rarely exceed 18 months [3, 33], however. This may indicate that the net effect of exercise on BMD has been underestimated in the past due to the application of (too) short interventional periods. This is partially supported by Snow et al. [34], who reported very pronounced, significant net differences (EG vs. CG, 6.0 %) for FN-BMD after 5 years of weighted vest plus jumping exercise in postmenopausal women, while no group differences were detected after 9 months [35].

The study has several strong points: (a) the adequate statistical power due to the long study duration, (b) the identification and exclusion of fractures caused by serious trauma (e.g., car/bike accidents), (c) a homogeneous group of osteopenic females initially 1–8 years postmenopausal, (d) the promising, regularly adapted exercise protocol strictly supervised by instructors, and (e) the attractiveness of the exercise program validated by low dropout rates [36].

Limitations

(a) Most critically, we opted not to randomize allocation to the study arms in order to generate high long-term adherence by motivated participants. From a methodological point of view, this approach clearly reduces the evidence level of the study; on the other hand, from a pragmatic perspective, it does provide a more realistic insight into the dimension of exercise effects in subjects who deliberately decided to start exercise. To be realistic, due to the fact that the majority of elderly subjects are unable or unwilling to start and maintain frequent and strenuous exercise programs [37], exercise-induced fracture prevention may be reserved for a motivated and physically fit cohort of subjects.

(b) We placed strong emphasis on the consistent and accurate determination of covariates (i.e., lifestyle, diseases, medication, diet and physical activity) that may confound our result and thus prevent a distinct deduction of a causal effect. However, due to the study length of 16 years, we cannot be certain that all the corresponding changes were detected and considered appropriately.

(c) All the BMD scans as determined by DXA were carefully examined by two experienced researchers; however, considering degenerative changes of the spine, quantitative computed tomography (QCT) may be the better choice for monitoring BMD in this cohort [38]. Due to the rapid progress in this area, the scanner (SOMATOM Plus 4, Siemens, Erlangen, Germany) initially used was replaced by several generations of more powerful CT devices.

In summary, the study confirmed the result of our recent meta-analysis [8], which assessed the effect of exercise on overall fractures across ten studies in subjects 45 years and older (RR 0.49; 95 % CI 0.31–0.76). Given the low power of the individual studies and the likelihood of a publication bias discussed above, the present study further strengthens the evidence that exercise positively affects clinical fracture risk and rate. However, the decisive evidence has yet to be provided in a randomized controlled exercise trial with adequate power to address clinical, preferentially major, osteoporotic fractures. For now, we conclude that for subjects willing and able to exercise frequently, exercise may be the best choice for autonomous fracture prevention. We suggest consistent lifelong exercise participation, although there is some evidence for a preserved effect of exercise during deconditioning (8 years after 2 years of exercise), at least with respect to vertebral compression fractures [39]. Considering further that the rather complex fracture prevention programs with their inherent endurance, resistance, power, and general coordination components positively affect a large variety of risk factors and diseases [40] confronting older adults, the efforts to encourage subjects to start and maintain exercise should be emphatically intensified.