Introduction

Hip fractures among the elderly often result in disability, loss of independence, high societal costs, and death [1, 2]. Low muscle strength and functional impairment are commonly present already before the hip fracture [3,4,5,6,7]. In fact, more than 90% of hip fractures occur because of a fall [8], typically in sedentary and frail persons [9] with low bone mass [10]. It is known that poor physical function and a low level of physical activity are associated with an elevated risk of fractures and death in the elderly [11, 12]. However, the use of simple functional tests for prediction of hip fracture and death in postmenopausal women before old age has not been established in long-term prospective settings.

While low femoral bone mineral density (BMD) is a risk factor for hip fracture [13], the majority of hip fractures occur in patients with “normal” or “osteopenic” BMD values. This makes population-based screening for osteoporosis using densitometry alone a suboptimal solution, and it is not recommended [14]. Although BMD variation on a global scale does not reflect the expected incidence of hip fracture [15], profiling with risk factor tools (such as FRAX) and BMD is a clinically effective approach for preventing hip fracture and is a widely accepted strategy, at least in the over 75 age group. Currently, common fracture risk tools do not take into account the risk that arises from functional impairment, which is usually due to other health disorders [16]. As multiple factors contribute to hip fracture risk, combining BMD with other factors may improve the assessment of fracture risk in clinical use [17, 18].

Fall-related injury and fracture rates increase steeply with age. Hip fracture rates show one of the most dramatic changes, with a 100- to 1000-fold rise over 60 years of aging [19]. Poor balance, low muscle strength, and impaired coordination are associated with frequent falling in frail elderly people in nursing care [20]. Thus, the preservation of functional capability is of utmost importance in preventing falls [21, 22]. Crucially, the factors determining physical function remain modifiable even in old age [23].

Altogether, there are few prospective cohort studies examining the functional status of postmenopausal women and how such functional measures relate to BMD and register-based outcomes in a long-term follow-up setting. Therefore, research is needed to quantify the role of functional status and its decline in the prediction of fracture and death. This would help in identification of women who are most likely to benefit from exercise intervention.

We have previously shown an association between fracture risk and functional status in postmenopausal women [24]. In addition to self-reported fractures, the current study focuses on health registry data with hip fractures and mortality. Since we have carefully assessed baseline functional impairment in women subsequently followed up for a long time, we are now able to characterize the relationships between their task performance and key health outcomes in later old age.

Our objective was to investigate the ability of clinically applicable functional tests to predict fracture risk and mortality among postmenopausal women in a long-term prospective cohort study.

Materials and methods

Study design

The study population consisted of the ongoing Kuopio Osteoporosis Risk Factor and Prevention (OSTPRE) Study cohort. This population-based long-term follow-up study includes all the 14,220 Finnish women aged 47 to 56 years who lived in the Kuopio Province, Eastern Finland, in April 1989. A postal questionnaire was mailed to 14,120 of these women at baseline in 1989, and 13,100 (92.8%) responded. Follow-up questionnaires were mailed in 1994, 1999, 2004, 2009, and 2014 to women who had responded to the baseline enquiry and were alive at the time. The response rate varied between 80 and 93% throughout the study. The study was approved by the Kuopio University Hospital ethics committee on October 28, 1986, and is performed in accordance with the ethical standards of the Declaration of Helsinki. Oral and written information was provided to participants before the onset of data collection.

Bone mineral density measurement

In addition to the enquiry follow-up, baseline responders were asked about their willingness to participate in bone densitometry (DXA) measurement. Altogether 11,055 responders stated their willingness, which formed a pool for stratified random sample of 3686 women invited to the measurements. Out of these, 3222 women underwent the baseline DXA scan. This sample consisted of a random population sample (n = 2025) and 100% samples (n = 1197) of women with higher risk profiles: menopause within 2 years (n = 857), diseases or medication affecting bone (n = 245), and multiple behavioral risk factors (n = 95) [25]. The baseline sample (n = 3222) has been followed with bone densitometry and clinical measurements at 5-year intervals since 1989. The detailed description of DXA follow-up protocol has been published previously [26].

Clinical measurements

At the fifth-year follow-up visit (in 1994–1998), additional functional capacity measurements were introduced to the OSTPRE clinical follow-up protocol, including grip strength, ability to stand on one leg for 10 s (SOL), and ability to squat down and touch the floor (SQ). These fifth-year follow-up measurements were therefore set as the baseline for the current study. Altogether, the final sample of this study included 2815 women with valid baseline measurements of functional capacity, femoral neck bone densitometry (DPX-IQ, Madison, WI, USA), and postal enquiry data. Anthropometric measurements (height and weight) were recorded in light clothing without shoes, using a calibrated weight scale and stadiometer. Body mass index (BMI) was calculated as weight (kilograms) divided by height (meters) squared. Femoral neck BMD was expressed as T-scores calculated using young Finnish female normative values.
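The two derived measures above can be illustrated with a minimal Python sketch. The reference mean and SD below are hypothetical placeholders chosen for illustration only; they are not the young Finnish female normative values used in the study, and the function names are our own.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def t_score(bmd: float, young_mean: float, young_sd: float) -> float:
    """T-score: distance of a BMD value from the young-adult mean, in SD units."""
    if young_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (bmd - young_mean) / young_sd


# Hypothetical reference values (assumption, NOT the study's normative data).
YOUNG_MEAN_FN_BMD = 1.00  # g/cm^2
YOUNG_SD_FN_BMD = 0.12    # g/cm^2

print(round(bmi(70.0, 1.62), 1))                                      # 26.7
print(round(t_score(0.70, YOUNG_MEAN_FN_BMD, YOUNG_SD_FN_BMD), 2))    # -2.5
```

With these placeholder values, a femoral neck BMD of 0.70 g/cm2 would sit at the conventional osteoporosis threshold (T-score ≤ −2.5).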

All three functional tests were treated as dichotomous variables (no/yes). Functional impairment was defined as maximal grip strength in the weakest quartile (≤ 58 kPa; mean 45.6 kPa, median 50.0 kPa), inability to squat down, touch the floor with the fingertips, and get up without using support or being assisted, or inability to stand on one leg for 10 s while resting the hands on the hips. Any underlying medical conditions contributing to failure in the functional tests were not diagnosed or classified on site. Grip strength was measured three times from the dominant hand with a hand-held pneumatic squeeze dynamometer (Martin Vigorimeter; Medizin-Technik, Tuttlingen, Germany). Maximum strength was determined as the mean of the best two (out of three) attempts, and the results were divided into quartiles. The method is considered reproducible, with a previously reported intra-class correlation coefficient (ICC) of 0.87–0.97 for absolute grip strength values [27]. Women without any of the three functional impairments (no failed tests and grip strength above the weakest quartile) were treated as the referent category (n = 1600).
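The classification described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the study's actual data-processing code: the class and function names are hypothetical, and only the 58 kPa cutoff and the "mean of the best two of three attempts" rule come from the description.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

GRIP_CUTOFF_KPA = 58.0  # weakest-quartile threshold reported in the text


@dataclass
class FunctionalTests:
    grip_attempts_kpa: Sequence[float]  # three attempts, dominant hand
    can_squat: bool                     # squat, touch floor, rise unassisted
    can_stand_on_one_leg: bool          # 10 s, hands on hips


def max_grip_strength(attempts: Sequence[float]) -> float:
    """Mean of the best two of three grip strength attempts (kPa)."""
    if len(attempts) != 3:
        raise ValueError("expected three grip attempts")
    best_two = sorted(attempts, reverse=True)[:2]
    return sum(best_two) / 2


def impairments(tests: FunctionalTests) -> Dict[str, bool]:
    """Dichotomous impairment flags: GS (grip), SQ (squat), SOL (one-leg stand)."""
    return {
        "GS": max_grip_strength(tests.grip_attempts_kpa) <= GRIP_CUTOFF_KPA,
        "SQ": not tests.can_squat,
        "SOL": not tests.can_stand_on_one_leg,
    }


def is_referent(tests: FunctionalTests) -> bool:
    """Referent category: none of the three impairments present."""
    return not any(impairments(tests).values())


# Hypothetical participant: passes SQ and SOL but has low grip strength.
example = FunctionalTests((60.0, 55.0, 50.0), can_squat=True, can_stand_on_one_leg=True)
print(max_grip_strength(example.grip_attempts_kpa))  # 57.5
print(is_referent(example))                          # False
```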

Covariates

Covariates of interest such as current smoking, alcohol consumption, duration of hormone therapy (HT) use, and menopausal status were recorded from the baseline inquiry. Women were considered menopausal after 12 months of amenorrhea. Smoking was questioned as average cigarette consumption per day and treated as a dichotomous variable of any current smoking (smoker/non-smoker).

Fractures and deaths

Fractures were classified into two non-mutually exclusive outcomes: any fracture and hip fracture. The hip fractures of the cohort were verified using the nationwide Hospital Discharge Register (HILMO) data as well as postal enquiries sent to the participants (at 5, 10, 15, 20, and 25 follow-up years). All self-reported fractures during the years 1987–2014 were validated against patient and hospital records. Information on the circumstances contributing to fracture was not available. The seasonal distribution of fractures was compared between groups. The relevant International Classification of Diseases (ICD) codes were used to include femoral neck, pertrochanteric, and subtrochanteric fractures (ICD-10, S72.0–S72.2). Women with a hip fracture prior to the baseline visit and those with pathologic or periprosthetic hip fractures were excluded (n = 12). We have previously shown that the number of hip fractures observed in the register data was significantly higher than the self-reported number, and that patients who did not respond to postal inquiries had a significantly higher hip fracture risk. Thus, relying on self-reports only would have resulted in biased incidence and period prevalence estimates. Altogether, self-reports failed to capture 38% of hip fractures in this long-term follow-up cohort [28].

Time and cause of death were obtained until the end of 2014 from the National Causes of Death Register, according to the national adaptation of the International Statistical Classification of Diseases, Injuries, and Causes of Death (ICD). The death certification practice and cause of death register have previously been shown to be very accurate [29]. Correspondingly, follow-up for the hip fracture risk analysis ended at the end of 2014.

Statistical analyses

The fifth-year follow-up visit date (including DXA, anthropometric data, and functional tests) of the OSTPRE study was regarded as the baseline for the analysis. Depending on the event of interest, follow-up was terminated at the day of death, first fracture, first hip fracture, the end of the registry period, or the date of the last returned questionnaire during follow-up (overall fracture analysis, based on self-reports). Overall fracture risk, hip fracture risk, and mortality were estimated on a time scale of years from baseline using survival analyses, with Kaplan-Meier curves for unadjusted analysis and Cox proportional hazards regression for adjusted analysis. The mean (median) follow-up time was 13.8 (17.0) years, 17.4 (18.2) years, and 17.6 (18.3) years, respectively. In survival analyses, cases were censored at their date of death. Mortality as a competing outcome did not appear to have a significant effect on the fracture risk results. The Cox multivariable proportional hazards regression model was initially fitted with the baseline covariates of interest, including femoral neck (FN) BMD, grip strength (kPa), functional capacity (SQ, SOL), age, height, weight, history of HT use (years), amount of physical activity (hours per week), and dietary calcium intake (mg/day). Of these, duration of HT use, dietary calcium intake, and amount of self-reported physical activity were excluded from the final Cox model. Both physical activity and HT use were associated with better functional capacity and lower BMI, but neither had a significant impact on the adjusted fracture or mortality hazard models (data not shown). Proportional hazards assumptions between study groups were tested based on Schoenfeld residuals, and no significant deviations were detected. Hazard ratios are reported with their respective 95% CIs. Potential non-linearity of continuous covariates (age, T-score, BMI) was assessed with squared terms in the model.
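For readers unfamiliar with the unadjusted analysis, the Kaplan-Meier (product-limit) estimate can be sketched in plain Python as follows. The analyses in this study were run in SPSS; the function below and its toy follow-up data are illustrative assumptions only and do not reproduce any study result.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  -- follow-up time in years for each woman
    events -- 1 if the event (e.g. hip fracture) occurred at that time,
              0 if follow-up was censored (e.g. death or end of registry period)
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data (hypothetical): five women, three events, two censored.
toy_times = [2, 5, 5, 8, 10]
toy_events = [1, 0, 1, 0, 1]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 2))
```

Censored observations still count in the risk set up to and including their own follow-up time, which is why the second step divides by four women at risk rather than three.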
A slight correlation was detected between BMI and T-score (r = 0.39, p < 0.001), but multicollinearity was not a concern (BMI, tolerance = 0.85, VIF = 1.18; T-score, tolerance = 0.85, VIF = 1.18), and both were included in the analysis. The random sample of the study population (n = 2025) was drawn before the 100% samples of the high-risk strata were extracted for the clinical measurement follow-up. No differences were detected between the BMD values of the stratified and random samples (t test, p = 0.9). The area under the receiver operating characteristic curve (AUC) and the corresponding confidence intervals (CIs) were calculated to estimate how well functional impairment status (y/n), age (years), and BMD (T-score) predict the main outcomes of hip fracture and any fracture. Statistical analyses were conducted with SPSS version 23.
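The reported collinearity diagnostics can be checked with a short calculation. The sketch below assumes the simplest case, in which each of the two covariates is regressed only on the other, so that R² equals the squared correlation; under that assumption the reported correlation reproduces the reported tolerance and variance inflation factor (VIF).

```python
# Correlation between BMI and T-score reported in the text.
r = 0.39

# With only two predictors, R^2 of one regressed on the other is r^2 (assumption).
tolerance = 1 - r ** 2   # share of variance not explained by the other predictor
vif = 1 / tolerance      # variance inflation factor

print(round(tolerance, 2))  # 0.85
print(round(vif, 2))        # 1.18
```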

Results

Characteristics

The cohort consisted of 2815 women with valid measurement results and a mean baseline age of 59.1 years (SD 2.9, range 53–66) (Table 1). According to self-report, 93% of the women were postmenopausal at baseline. Half of the women (50.6%) reported HT use in the preceding 5 years, with a mean duration of 1.8 years. The proportions of women able to squat down to the floor and to stand on one leg for 10 s were 73 and 92.9%, respectively.

Table 1 Baseline characteristics of the total study population (n = 2815), referent group, and functional impairment group with their respective mean (SD) or proportions

Fracture incidence and all-cause mortality

Altogether, 650 (23.1%) women reported 718 fractures during the follow-up. Wrist (n = 279, 38.9%) and ankle (n = 118, 16.4%) were the most common fracture sites. Women with functional impairment had a higher overall fracture risk (Fig. 1). Of the individual fracture types, only hip fracture showed a specific association with functional impairment (Fig. 2). The majority (77.3%) of all fractures occurred during winter (November–April). The referent group had higher seasonal variation in overall fracture incidence: the majority of their fractures (86.0%) occurred during winter, compared with 68.3% in the functional impairment group (p < 0.01), whose fractures were spread more evenly over the seasons. A total of 86 women sustained a hip fracture during the follow-up, without any seasonal variation. The crude hip fracture incidence per 100,000 person-years in the referent and functional impairment groups was 113 (95% CI 93.1–135.9) and 261 (95% CI 230.3–294.7), respectively (Fig. 2).
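A crude incidence rate per 100,000 person-years is the number of events divided by the accumulated follow-up time. The sketch below applies this to the whole cohort; because group-specific person-years are not given in the text, total person-years are approximated here as the number of women times the mean follow-up for the hip fracture analysis (17.4 years). That approximation is our assumption, so the result is only a rough overall figure and not a value reported by the study.

```python
def incidence_per_100k(events: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return events / person_years * 100_000


n_women = 2815            # cohort size (from the text)
mean_followup_years = 17.4  # mean follow-up, hip fracture analysis (from the text)
hip_fractures = 86        # hip fractures during follow-up (from the text)

# Approximation (assumption): person-years = n x mean follow-up.
approx_person_years = n_women * mean_followup_years

print(round(incidence_per_100k(hip_fractures, approx_person_years), 1))
```

The approximate overall rate falls between the group-specific rates of 113 and 261 per 100,000 person-years, as would be expected for a pooled estimate.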

Fig. 1 Kaplan-Meier survival curves for functional impairment (red) and referent (green) groups on cumulative hazards for any fracture by time (years) (log-rank, p < 0.01) with incidence of 23.1% (n = 650) during the follow-up

Fig. 2 Kaplan-Meier survival curves for functional impairment (red) and referent (green) groups on cumulative hazards for hip fracture by time (years) (log-rank, p < 0.001) with incidence of 3.1% (n = 86) during the follow-up

The all-cause mortality during the follow-up was 16.8% (n = 473). A higher death rate was observed in women with functional impairment than in the referents, with mortality of 20.4 and 14.1%, respectively (log-rank p < 0.001) (Fig. 3). Examining each functional impairment, the highest death rate was observed in those who could not perform the single leg stand (SOL), followed by those with low grip strength (GS) and finally those who could not squat (SQ), with overall mortality of 30.5, 22.7, and 21.3%, respectively. The most common causes of death (ICD-10) were atherosclerotic heart disease (I251) (8.9%), breast cancer (C504) (3.4%), ovarian cancer (C56) (2.5%), and Alzheimer’s disease (G301) (2.3%). In the adjusted Cox model, baseline smoking (y/n), age (years), and functional impairment (any vs. none) remained independent predictors of death with respective HRs of 2.1 (1.6–2.7, p < 0.001), 1.1 (1.0–1.1, p = 0.001), and 1.4 (1.1–1.6).

Fig. 3 Kaplan-Meier survival curves for functional impairment (red) and referent (green) groups on cumulative hazards for mortality by time (years) (log-rank, p < 0.001) with incidence of 16.8% (n = 473) during the follow-up

The final Cox multivariable fracture risk models including any functional impairment were adjusted for age, BMI, and BMD T-score, which all remained significant covariates for hip fracture with HRs of 1.2 (1.1–1.3, p < 0.01), 1.1 (1.0–1.1, p < 0.01), and 2.5 (1.9–3.2, p < 0.001) per unit of change, respectively. In the multivariable estimates for any fracture, age did not appear as an independent risk factor, with HRs of 1.02 (0.99–1.05, p = 0.3), 1.02 (1.0–1.04, p = 0.03), and 1.5 (1.3–1.6, p < 0.001) for age, BMI, and BMD, respectively. The prevalence of any functional impairment in the stratified high-risk sample and the random sample was 45.6 and 41.8%, respectively, a difference of borderline significance (chi-square p = 0.050). However, the adjusted hip fracture risk estimate for any impairment in the random sample (HR 1.9, 1.0–3.3) was approximately the same as in the total sample (HR 1.7, 1.0–2.6).

The AUC was used to evaluate the discriminative ability of functional impairment (any), age (years), and BMD (T-score) in the detection of fractures. In univariate models, all three risk factors were significant (p < 0.05) indicators of hip fracture, with AUCs (95% CI) of 0.60 (0.54–0.66), 0.67 (0.61–0.73), and 0.70 (0.65–0.75), respectively. In the hip fracture multivariable model with BMI and age, the AUC estimate (95% CI) was 0.67 (0.62–0.73). After adding functional test status, BMD, or both risk factors simultaneously to the model, the estimates were 0.70 (0.65–0.75), 0.77 (0.73–0.81), and 0.78 (0.74–0.82), respectively. For any fracture as an outcome, the AUC estimate of the base multivariable model with BMI and age was 0.53 (0.51–0.56). After adding functional test status, BMD, or both to the model, the estimates were 0.54 (0.52–0.57), 0.60 (0.58–0.63), and 0.60 (0.58–0.63), respectively.
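As a reminder of what these AUC values measure, the AUC equals the probability that a randomly chosen woman with the outcome has a higher predictor value than a randomly chosen woman without it, with ties counted as one half. The pure-Python sketch below illustrates this for a dichotomous predictor such as impairment status; the toy data are hypothetical and unrelated to the study data, and confidence intervals are not computed.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], noncase_scores: Sequence[float]) -> float:
    """AUC as the probability that a case outranks a non-case (ties count 0.5)."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Toy data (hypothetical): 1 = any functional impairment, 0 = none.
fracture_cases = [1, 1, 0]
non_cases = [1, 0, 0, 0]

print(round(auc(fracture_cases, non_cases), 3))  # 0.708
```

For a single yes/no predictor, this value is the mean of sensitivity and specificity (here 2/3 and 3/4), which is one reason a dichotomous test alone tends to yield a modest AUC.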

Bone mineral density

No difference was seen in femoral neck BMD (g/cm2) or T-score between the functional impairment and healthy referent groups (Table 1). The overall number of osteoporotic (T-score ≤ − 2.5) women at baseline was low (2.5%, n = 69). Among the functional impairments, only women belonging to the lowest grip strength quartile had a significantly lower (2.7%) baseline BMD value than the referent group (p < 0.001).

The relative bone loss rate in the available 15-year DXA follow-up subsample (n = 1401) was higher in the functional impairment group (n = 516) than in the referents (n = 885), with bone loss rates of − 6.1% (SD 8.2) and − 4.9% (7.4), respectively (p < 0.01). However, at the latest 20-year DXA follow-up measurement (n = 762), no difference between the impairment (n = 251) and the referent (n = 511) groups was observed, with final bone loss rates of − 6.7% (9.4) and − 6.0% (8.9), respectively (p = 0.3). Overall, the Cox multivariable model showed a 2.5-fold elevation in hip fracture hazard per SD lower BMD, which is in line with previous literature.

Functional tests

Altogether, around one third (n = 959, 34.1%) of the women failed at least one functional test (SOL, SQ). The most common impairment was inability to squat down, touch the floor, and get up without assistance (n = 759, 27.0%). Significantly fewer women failed the one-leg stand for 10 s (n = 200, 7.1%). In addition, weaker grip strength was observed among women with failed SOL and SQ compared to the referent group, with mean grip strength of 70.0, 55.9, and 77.1 kPa, respectively (p < 0.001). All functional assessments and their combinations with respective prevalence (n, %) are presented in Table 2.

Table 2 Functional impairments with their respective prevalence (n, %) and hazard ratios (95% CIs) for mortality and fractures in comparison to the referent (n = 1600). Crude and adjusted HRs are shown. Non-significant p values (p > 0.05) are indicated with ns. All other p values are significant (p < 0.01) for crude models and (p < 0.05) for adjusted models

Discussion

This study showed that simple functional tests (low grip strength, inability to squat down or to stand on one leg) not only predicted hip fracture well, but also predicted mortality in postmenopausal women. The study also confirmed the multifactorial nature of hip fracture, where age, BMD, and functional status are all significant and independent contributors to the risk.

Previously, several lifestyle factors have been identified as predictors of falls, fractures, and bone loss in the elderly [30,31,32]. Prior to our work, it had not been conclusively shown that functional tests were long-term predictors of postmenopausal fractures and mortality. Functional measurements, such as grip strength, are commonly used tools for the assessment of physical condition. They have been shown to have prognostic value for a variety of health outcomes throughout the population, regardless of age, gender, or socioeconomic background [33,34,35]. However, due to strong multifactorial and overlapping effects, infrequent outcomes such as hip fracture are challenging to predict. Even a single strong indicator such as BMD performs better when DXA imaging is combined with clinical risk factors, resulting in higher specificity and sensitivity than either alone [36].

Confounding may always be present in observational studies, although no specific medical conditions affecting the functional test results were detected. The hip fractures of the cohort obtained from the nationwide Hospital Discharge Register are known to be accurate [28, 37]. Other fracture information (excluding hip) was based on follow-up self-reports, which were validated using medical records. Self-reporting has previously been shown to be a relatively reliable way to obtain information about past major fractures in the OSTPRE cohort, where 84% proved to be true fractures [38]. Although the absolute number of fractures is likely to be an underestimate, we do not believe that self-reporting limited the reliability of the main results. A specific limitation, such as systematic underreporting of fractures among the impairment groups, cannot be totally excluded. However, if women in the impaired groups reported fractures less reliably than others, the potential bias would be conservative rather than lead to an overestimate of events. Despite the validated outcome events based on self-reports and register data, including hip fractures and mortality, our study was observational, without a record of the actual course of events and circumstances leading to fracture. A clear majority of fractures occurred during winter (November to April), matching the period when the local temperature remains below 0°C (data not shown). During winter, the referent group had a higher incidence of fractures, which suggests a stronger association with seasonal weather conditions [26]. However, this variation did not apply to hip fracture, suggesting a stronger relationship with functional capacity than with outdoor exposure. Although outdoor activities may have exposed women to falls, the main associations between physical impairment and subsequent hip fracture risk were clear (Fig. 4).
Falls combined with low BMD are a common cause of frailty-related fractures and a considerable source of medical expenditure for non-fatal injuries [39, 40]. In this study, a reasonable number of hip fractures during the very long follow-up period provides a meaningful risk estimate and enables comparison between types of functional impairment. However, the number of women with different combinations of impairment remained too small for conclusive risk estimates. After adjusting for BMD, the results showed that the added value of combinations of impairment for fracture prediction was relatively low. While baseline BMD did not differ between groups, it remained the most significant contributor to the fracture risk estimates in all models.

Fig. 4 Cumulative effect of different functional impairments on hip fracture risk (adapted from Table 2). For the complete set of hazard ratios with functional impairment combinations and outcomes of interest, see Table 2

The strength of this study was a large population-based cohort of Caucasian women with a long follow-up time combined with clinical measurements and validated registry outcomes for hip fractures and mortality. The cohort represents a homogeneous sample of postmenopausal women before old age with a relatively narrow age range. The study demonstrated a set of quantifiable physical tasks, which can be regarded as a threshold for the generic functional capacity needed in everyday life. The simplicity of the tests suggests that they should have clinical utility for screening and risk evaluation of frailty-related health outcomes, but more studies are needed to determine their true clinical value. The finding that elevated risk was detected relatively early after menopause and well before accumulation of fractures, combined with the fact that physical functioning is modifiable, makes these findings appealing. The inability to stand on one foot for 10 s had the lowest failure rate but the highest hazard ratio for each of the outcomes. The unilateral posture requires the hip, core, and leg muscles to work together with the proprioceptive system to provide additional support for the body. Standing on one foot thus constantly challenges both muscle coordination and balance, which might explain why it gave the highest risk prediction.

Tests like timed up and go or gait speed have previously shown evidence for long-term prediction of falls, fractures, and survival in the elderly [41,42,43]. It has been suggested that BMD contributes less to fracture risk when another strong risk factor, such as frequent falling, is present [44]. A similar association between clinical balance measures and FRAX® has also been demonstrated, suggesting that functional tests could bring additional value to fracture risk estimates [45]. To study the improvement in the risk estimate, we would have needed the statistical model of FRAX®, which is currently inaccessible for integration of risk factors such as functional tests. However, the results indicate a need for further studies to determine whether functional tests that can be done without additional devices improve fracture prediction.

In conclusion, the simple functional tests described here predict hip fracture, overall fracture risk, and mortality among postmenopausal women. The tests have potential for clinical application, by assessing the degree of functional impairment and subsequent hip fracture risk, well before the onset of actual injuries. Furthermore, performance in these tasks can provide meaningful and tangible goals for an individual or for societal public health programs involving rehabilitation. However, pragmatic clinical trials are needed to evaluate how reversal of these functional deficits would be associated with the reduction of adverse health outcomes.