Introduction

The central problem in osteoporosis is fractures. Typical fractures in osteoporosis occur as a consequence of minimal trauma, e.g. a fall from standing height. Each fracture is a strong risk factor for a subsequent fracture, so the primary aim of patient management is prevention of the first fracture. Therefore, the assessment of fracture risk has become very important in diagnostic and therapeutic procedures, and in recent years, new methods of assessing fracture risk have been developed. Among them, the best known and most often used are the WHO-approved FRAX risk assessment tool [1], and the fracture risk calculator proposed by the Garvan Institute [2, 3]. These fracture risk assessment methods have been evaluated in several studies [4–14]. Since the FRAX tool and the Garvan method were introduced less than 10 years ago, true longitudinal observations are not yet available, and studies investigating these methods should be regarded as “longitudinal retrospective” studies. Studies seeking to validate these methods should ideally include longitudinal observations over at least 10 years, but shorter follow-up times may also provide some validation of the methods.

Fractures occurring during follow-up should be related to the baseline level of fracture probability, i.e. patients sustaining a fracture during a period of observation should show a higher baseline fracture probability. Several studies have compared predicted and observed fractures, and the results of these analyses are not consistent. In some studies, the numbers of predicted and observed fractures were comparable [4, 6, 7], but in others, the number of fractures observed exceeded the number predicted [5, 8–11], and in only one study was the fracture risk overestimated [12].

In the current study, we calculated the baseline fracture probability using the FRAX tool and fracture risk using the Garvan nomogram in postmenopausal women from the RAC-OST-POL study. Knowledge of baseline fracture risk or probability can be used to determine the factors (or the number of factors) that really contribute to fractures during follow-up. This point is crucial in daily practice; we should know whether calculated fracture probability does indeed indicate those who are at risk of fracture. The aims of our longitudinal study were as follows:

  1. Verification of whether incident fractures will occur during follow-up in subjects with higher baseline risk and probability

  2. Comparison of “event (fracture)-free survival” between subjects with low and high fracture risk and probability

  3. Determination of the risk factors that contribute to fracture incidence

  4. Determination of the number of risk factors related to fracture incidence.

Methods

The RAC-OST-POL study is an epidemiological, population-based program designed to reveal the problems associated with postmenopausal osteoporosis. Baseline data of populations studied have previously been presented [15], and some aspects such as nutrition [16], functional status [17], the role of education, marital status, kind of job or place of residence [18], and visual impairment [19] have been shown to be associated with osteoporosis. In our sample, 28 % of women had a prior osteoporotic fracture [15]. These baseline data were obtained in May 2010, followed by longitudinal observation. We collected data on the incidence of osteoporotic fractures over several years. The 625 women included in the RAC-OST-POL study cohort were randomly selected from the local population of the Racibórz district, and previously reported analyses have included this population of 625 women [15–19]. However, as well as the women invited by post, an additional 353 women who volunteered were also assessed. We considered that adding these women to the previously described, original randomly recruited population would be beneficial because data would be obtained from a larger population. Prior to this decision, we checked whether prior fracture prevalence was dependent on the method of recruitment (random or nonrandom). The chi-squared test showed that fracture occurrence did not differ between the groups (33 % in the random population and 29 % in nonrandom population, chi-squared = 1.24, p = 0.26). Thus all the women were combined in a larger group of 978 women because this process of inclusion did not influence fracture baseline prevalence. We also compared fracture probabilities obtained using the FRAX tool [1] and fracture risk from Garvan nomograms [2, 3] for any fractures and hip fractures, and values did not differ significantly in regard to random or nonrandom selection (data not shown).

We collected information on factors with a potential influence on bone metabolism and fracture risk including prior fracture, hip fracture in parents, smoking, rheumatoid arthritis, steroid or anticonvulsant use, alcohol intake, diabetes, thyroid disease, early menopause before the age of 45 years, malabsorption, and renal or liver failure using a structured questionnaire. Fracture probability was established using the Polish version of the FRAX tool (version 3.9) with and without bone mineral density (BMD; i.e. FRAX with BMD, and FRAX without BMD; http://www.sheffield.ac.uk/FRAX/tool.aspx), and fracture risk was calculated using the Garvan nomogram (http://garvan.org.au/promotions/bone-fracture-risk/calculator). A fracture probability of ≤10 % (obtained using the FRAX tool) was considered a low probability and a fracture probability of >10 % as a high probability according to recent recommendations proposed by a group of Polish experts [20]. Because a threshold separating high and low fracture risk had not been established for the Garvan method, we used the same values as for the probabilities obtained using the FRAX tool.

All women who were included at baseline were called once a year and asked about three issues:

  • Current therapy for osteoporosis

  • Incidence of fractures of nontraumatic origin, e.g. as a result of a fall from standing height without an external force

  • Number of falls in the last 12 months

All patients were asked for confirmation of any fracture by their doctor in order to include in the analysis only confirmed fractures. This procedure revealed no observed fractures without confirmation by a doctor. So, finally, all self-reported fractures in a follow-up period were included in the analysis. Phone calls were made in May 2011, 2012, 2013 and 2014. Data for all five observation points were available in 770 patients. The following reasons for loss of contact were noted: 175 women changed their phone number or their postal address (patients who did not respond to phone calls were sent a letter); 24 women died; 5 women did not undergo DXA at baseline; and 4 women refused to cooperate. BMD was determined using a Lunar DPX bone densitometer (GE, Madison, USA) for the femoral neck (FN BMD) and the total hip (TH BMD). Coefficients of variation were 1.6 and 0.82 %, for FN BMD and TH BMD, respectively.

Statistics

All calculations were done using Microsoft Office Excel and the STATISTICA program (StatSoft, Inc., Tulsa, OK; www.statsoft.com) run on a personal computer. Descriptive statistics of quantitative values are presented as mean values and standard deviations (SDs). The distribution of analyzed data was verified by the Shapiro-Wilk test. The t test for independent samples was performed for comparison of continuous variables between subgroups. In order to achieve age-adjustment when necessary, ANCOVA with age as a covariate was additionally performed. Values obtained using the FRAX tool in the whole study group were compared using the t test for dependent variables. Qualitative features are presented as the numbers of subjects and the percentage values in defined subgroups. The frequencies of qualitative features were compared between subgroups using the chi-squared test. The Kaplan-Meier method was used for analysis of event-free survival with the Cox F test applied for comparisons between subgroups of low or high fracture risk/probability. The traits with the strongest association with fractures noted during follow-up period were finally identified in logistic regression. All p values <0.05 were considered statistically significant.
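As an illustration of the Kaplan-Meier product-limit estimate named above, a minimal sketch in Python follows. The function name `kaplan_meier` and the eight follow-up records are hypothetical and are not data from the RAC-OST-POL cohort; the analyses in this study were run in STATISTICA, not with this code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of event-free survival.

    times  -- follow-up time of each subject (e.g. years since baseline)
    events -- True if a fracture was observed at that time, False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    n_at_risk = len(times)
    fractures = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = fractures.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying event-free at t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: 8 women, yearly contact, 3 fractures, 5 censored.
times = [1, 2, 2, 3, 4, 4, 4, 4]
events = [True, False, True, True, False, False, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: event-free survival {s:.3f}")
```

The cumulative fracture incidence at a given time is one minus the survival estimate; comparing two such curves (low versus high baseline probability) still requires a separate test, such as the Cox F test used in this study.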

Results

Baseline results

Table 1 shows the baseline clinical characteristics in the 235 subjects who reported prior fracture at enrollment and in the 535 subjects without fracture, and also shows BMDs, T-scores, Z-scores, fracture probabilities obtained using the FRAX tool without and with BMD, and fracture risks from Garvan nomograms for any fractures and hip fractures. Women with fractures were significantly older (p < 0.0001), had significantly lower BMD (p < 0.0001), and had significantly greater FRAX probability values (p < 0.000001) and Garvan variables (p < 0.000001) for any fractures and hip fractures. The differences in DXA results, FRAX probabilities, and Garvan values remained significant after age-adjustment (in ANCOVA with previous fracture as the grouping variable and age as covariate). The differences in FRAX with BMD probability values for hip fracture and the Garvan risk values for hip fracture remained insignificant even after adjustment for age. FRAX without BMD was significantly greater than FRAX with BMD (p < 0.0001 for both any fracture and hip fracture probability).

Table 1 Clinical characteristics of the population at baseline

Incident fractures during follow-up

During follow-up, 708 women had no fracture, and 62 others had a total of 71 fractures (9 women had a double fracture). Of these fractures, 53 were major fractures (i.e. those of the spine, hip, arm and forearm) in 46 women. The most common were forearm fracture (38 women), followed by ankle fracture (10 women), clinical spine fracture (7 women), hip fracture (4 women), arm fracture (4 women), foot fracture (4 women), rib fracture (2 women), and clavicle fracture (2 women).

Table 2 shows the clinical characteristics of the 62 women with incident fractures noted during the 4-year longitudinal observation period and the 708 women without fracture, and also shows BMD, T-score, Z-score, fracture probability, and fracture risk for any fractures and hip fractures. Mean age and body size variables did not differ between women with and without fracture. FRAX probability values (p < 0.05 – 0.01) and Garvan risk values (p < 0.0001) were significantly higher, and DXA variables (FN p < 0.01, TH p < 0.01) were significantly lower in women with incident fractures during follow-up.

Table 2 Clinical characteristics (data collected at baseline) of cohort studied at follow-up divided according to fracture status during follow-up

Fracture probability and risk versus fracture incidence

In the whole group of 770 women, we calculated the number with a FRAX probability ≤10 % (low probability) and with a FRAX probability >10 % (high probability). A probability of any fracture considered as high was noted in 79 women (10.3 %) for FRAX with BMD and in 147 women (19.1 %) for FRAX without BMD. Clinical characteristics of subjects with low and high fracture probabilities for FRAX with BMD are presented in Table 3.

Table 3 Clinical characteristics (data collected at baseline) of cohort studied at follow-up divided according to low (≤10 %) and high (>10 %) fracture probability estimated using the FRAX with BMD calculator

Among 46 women with a major incident fracture during follow-up, 10 (21.7 %) had a high FRAX with BMD fracture probability, whereas among those without fracture, 69 (9.6 %) had a high FRAX with BMD fracture probability. These percentage values differed significantly (chi-squared = 6.93, p < 0.01). Among 46 women with a major incident fracture during follow-up, 18 (39.1 %) had a high FRAX without BMD fracture probability, whereas among those without fracture, 17.9 % had a high FRAX without BMD fracture probability. These percentage values also differed significantly (chi-squared = 12.63, p < 0.01). The number of subjects with incident fractures during follow-up and a high fracture probability was greater when FRAX without BMD was used than when FRAX with BMD was used for analysis.

In a similar analysis of the Garvan risk data, among women with fracture, 81 % had a high fracture risk (>10 %), and among those without fracture, 70 % had a high fracture risk. These percentage values did not differ significantly. The clinical characteristics of women with a low and a high fracture risk according to the Garvan calculator are presented in Table 4.

Table 4 Clinical characteristics (data collected at baseline) of cohort studied at follow-up divided according to low (≤10 %) and high (>10 %) fracture risk estimated with the Garvan calculator

Figure 1 shows FRAX with BMD probability, FRAX without BMD probability, and Garvan risk for any fracture in relation to the age of the women studied.

Fig. 1

Changes from baseline in FRAX fracture probability and Garvan fracture risk for any fracture over the age range of the whole study group

The Kaplan-Meier method was used to compare the fracture incidence during follow-up between women with a low and a high baseline fracture probability or risk. The event-free survival curves for women with a low and a high fracture probability according to FRAX with BMD are presented in Fig. 2 (in this analysis only major fractures were considered “events”). The cumulative fracture incidence during follow-up was significantly higher in women with a high fracture probability at baseline (12.7 vs. 5.2 %; p < 0.05, Cox F test).

Fig. 2

Kaplan-Meier analysis of event-free survival in women with a low (≤10 %) and a high (>10 %) fracture probability estimated with the FRAX with BMD calculator (an “event” being defined as a major fracture during the follow-up period)

A similar analysis of the Garvan risk data is presented in Fig. 3. In this case, any incident fracture during the follow-up period was considered an “event”, and event-free survival curves were plotted for women with a low and a high fracture risk according to the Garvan calculator. The cumulative fracture incidence during follow-up was significantly higher in women with a high fracture risk at baseline (9.2 vs. 5.3 %; p < 0.05, Cox F test).

Fig. 3

Kaplan-Meier analysis of event-free survival in women with a low (≤10 %) and a high (>10 %) fracture risk estimated with the Garvan calculator (an “event” being defined as any fracture during the follow-up period)

Risk factors for fracture

We confirmed that a prior fracture at baseline affects the fracture incidence during follow-up. Among women without fracture during follow-up, 29 % had a prior fracture, and among those with an incident fracture during follow-up, 45 % had a prior fracture. These percentages differed significantly (chi-squared = 6.81, p < 0.01). We also confirmed that some clinical risk factors for osteoporosis affect fracture incidence: steroid use and rheumatoid arthritis were significantly more often present in women with a fracture during follow-up. Among 62 women with fracture, 9 (14.5 %) used steroids, and among 708 women without fracture, 31 (4.4 %) used steroids (chi-squared = 11.9, p < 0.001). The respective values in women with rheumatoid arthritis were 8/62 (12.9 %) and 39/708 (5.5 %; chi-squared = 5.43, p < 0.05).
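The chi-squared comparisons for steroid use and rheumatoid arthritis can be retraced from the counts given above. The sketch below is written in Python and assumes a Pearson chi-squared test on a 2 × 2 table without continuity correction; the text does not state which variant was used, but under this assumption the reported statistics are reproduced to within rounding. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: women with / without incident fracture; columns: factor present / absent.
# Steroid use: 9 of 62 versus 31 of 708.
steroid_chi2, steroid_p = pearson_chi2_2x2(9, 62 - 9, 31, 708 - 31)
# Rheumatoid arthritis: 8 of 62 versus 39 of 708.
ra_chi2, ra_p = pearson_chi2_2x2(8, 62 - 8, 39, 708 - 39)

print(f"steroids: chi2 = {steroid_chi2:.2f}, p = {steroid_p:.4f}")
print(f"rheumatoid arthritis: chi2 = {ra_chi2:.2f}, p = {ra_p:.4f}")
```

With only 9 and 8 exposed women in the fracture group, a continuity-corrected or exact test would give somewhat larger p values, so the choice of variant matters for borderline results.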

We investigated whether the cumulative number of clinical risk factors noted at baseline would affect fracture incidence during follow-up. The women were grouped according to the number of risk factors (zero to five; Fig. 4). The number of risk factors at baseline was significantly associated with fracture incidence (chi-squared = 20.82, p < 0.001).

Fig. 4

Number of risk factors reported at baseline among the whole study group

The odds ratios (ORs) for significant determinants of fractures during follow-up were established using logistic regression analysis. In model 1, with a major fracture as the dependent variable, baseline fracture, rheumatoid arthritis, and low FN T-score were identified as significant predictors of subsequent fracture. ORs calculated for these three factors were 1.8 (1.0 – 3.4), 3.07 (1.3 – 7.4) and 2.44 (1.1 – 5.5), respectively. In model 2, with any fracture as the dependent variable, there were four significant predictors: baseline fracture, rheumatoid arthritis, low FN T-score, and falls with ORs of 1.9 (1.1 – 3.3), 3.9 (1.7 – 8.7), 2.8 (1.3 – 5.8), and 1.8 (1.1 – 3.11), respectively.
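For readers less familiar with logistic regression, the relation between a fitted coefficient and the reported OR can be written as follows. This is a general sketch, assuming that the values in parentheses are 95 % confidence intervals of the Wald type; the text does not state how the intervals were computed. Here \hat{\beta}_j denotes the estimated coefficient of predictor j and SE its standard error (both symbols are introduced here for illustration).

```latex
\[
\mathrm{OR}_j = e^{\hat{\beta}_j},
\qquad
95\,\%\ \mathrm{CI} =
\left(
  e^{\hat{\beta}_j - 1.96\,\mathrm{SE}(\hat{\beta}_j)},\;
  e^{\hat{\beta}_j + 1.96\,\mathrm{SE}(\hat{\beta}_j)}
\right)
\]
```

Under this assumption, an interval whose lower bound is close to 1.0, such as that for baseline fracture in model 1, corresponds to a coefficient that is only marginally different from zero.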

Discussion

The most important finding of our longitudinal study was confirmation that fractures occurred more often in patients with higher baseline risk (according to the Garvan calculator) or probability (according to the FRAX tool) of fracture. This finding is a valuable clinical observation for professionals managing osteoporotic patients. Interestingly, although the calculators used provided predictions of fracture incidence with the perspective of 10 years, the discrepancy between low and high fracture risk subgroups was clearly noticeable during the 4-year follow-up period of this study. Secondly, the finding that the number of prevalent clinical risk factors increases fracture risk (regardless of the calculated risk score) should be helpful for establishing the optimum therapy. In particular, the fact that a prior fracture increases subsequent fracture incidence supports the special and significant role of this clinical risk factor. The other identified independent single risk factors for fracture were rheumatoid arthritis, steroid use, falls, and low FN T-score.

Although the main finding of our study was in showing the utility of calculating the fracture probability, another rather unexpected observation was the generally low values of FRAX fracture probability in women with a prior fracture. Both the mean FRAX probability in women with a fracture at baseline and the mean FRAX probability in women with a fracture during follow-up were much lower than the recommended thresholds of 10 % [20] or 20 % (National Osteoporosis Foundation). This means that some of the women with fractures had quite a low individual fracture probability score. This in turn suggests the necessity for early treatment also in some individuals with a low FRAX probability and indicates the possibility of underestimating fracture probability by the FRAX tool in subjects with specific clinical characteristics. For example, in a person with frequent falls but without any other defined fracture risk factors, the FRAX tool may give a low fracture probability, although the association between falls and fractures is rather strong (Table 2). However, one must remember that the FRAX tool relates to a period of 10 years and this observation should be treated as preliminary. A final conclusion must be based on a 10-year follow-up.

As is clearly apparent (in this study as well), a perfect fracture risk estimator cannot be constructed, and several alternative tools for that purpose have recently been developed. A simple way to establish risk is a combination of age and weight, and other proposed methods include more clinical risk factors and BMD. In a recent study in a group of 5,000 randomly selected Danish women [14], several methods (Osteoporosis Self-assessment Tool, Osteoporosis Risk Assessment Instrument, Osteoporosis Index of Risk, Simple Calculated Osteoporosis Risk Estimation, and age alone) were compared with the FRAX tool. The conclusion was that simpler and easier-to-use alternative methods generally have an ability to predict fractures comparable to that of the FRAX tool.

In discussing the problem of the clinical utility of different fracture predicting tools, we consider that the most important point is the conformity between predicted and observed fractures assessed in a longitudinal study. In several studies, this problem has been studied and the results are very inconsistent. Some authors noted comparable numbers of predicted and observed fractures [4, 6, 7], but others have noted even several times more fractures than predicted. In a study by Czerwiński et al. [5], the number of fractures observed was double the number predicted. The same was observed by Bolland et al. [8] and Azagra et al. [9]. However, in a Spanish study [10], there were 57 fractures predicted using the FRAX tool in 1,231 women, and the number of fractures observed was 222, giving an observed/predicted ratio of 3.9. In another study by Czerwiński et al. [11], a major fracture was predicted in 5.3 % of 886 women, and the observed rate was 17.7 %. These contradictory results obtained in different populations suggest that, at least sometimes, the FRAX tool is not clinically useful.
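The observed/predicted ratios quoted above follow directly from the cited figures. A minimal sketch in Python is given below; the function name `observed_to_predicted` is hypothetical, and the inputs are only the numbers quoted in this paragraph from references [10] and [11].

```python
def observed_to_predicted(observed, predicted):
    """Ratio of observed to predicted fractures (counts or percentages)."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return observed / predicted


# Spanish study [10]: 222 fractures observed versus 57 predicted by FRAX.
spanish_ratio = observed_to_predicted(222, 57)
# Czerwinski et al. [11]: 17.7 % observed versus 5.3 % predicted (major fracture).
polish_ratio = observed_to_predicted(17.7, 5.3)

print(f"Spanish study: {spanish_ratio:.1f}")
print(f"Czerwinski et al.: {polish_ratio:.1f}")
```

A ratio close to 1 indicates good calibration; ratios above 1 mean that the tool predicted fewer fractures than were observed.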

It is important that only in one study was the number of fractures observed lower by approximately 50 % than the number predicted [12]. The Canadian data suggest that the number of subjects qualified for treatment of osteoporosis might be too high because fracture risk was overestimated [12], but in many other populations, the number of treated subjects might be even four times too small. One of the possible reasons for the discrepancy between predicted and observed fractures is that falls are not considered a risk factor for osteoporosis in the FRAX algorithm. Especially among the elderly, falls play a significant role as one of the leading direct reasons for incident fractures. The importance of falls was shown clearly by Jarvinen et al. [21]. These authors stated that in order to reduce fracture incidence, we should put our efforts into stopping falls and not treating low BMD. We conclude that the results of the current study also support the view that falls should be treated as an essential fracture risk factor, and methods of fracture risk or probability assessment should also take falls into consideration.

Another reason for the discrepancy between the numbers of predicted and observed fractures may relate to the fact that the population in which a fracture prediction tool was developed may not have been the same as the population in which it was verified. Fractures have several causes, e.g. local habits concerning diet and physical activity, climate and weather, prevalence of clinical risk factors, and many others. Therefore, the question as to whether a sufficiently precise conformity between predicted and observed events may be achieved is essential. The results presented in the cited studies suggest that such expected conformity is only rarely observed. Another weak point of the studies published to date concerns their design.

The most desirable method to establish a prediction model and its verification should include three steps. First, a population-based sample must be observed for 10 years. Second, a tool should be developed using data obtained during this longitudinal observation period and should include factors with a significant influence on fracture incidence. Third, a newly developed tool should be verified over a true longitudinal 10-year observation period. Such verification should provide an answer as to whether a fracture prediction method is reliable and is suitable for use in daily practice with patients. Because the tools used to predict fracture were introduced only recently, studies with such a design are not yet available. Our study was established in 2010, and we now present data gathered during only a 4-year follow-up, so a full, methodologically rigorous assessment of predicted and observed fracture rates cannot be done. However, fracture probability (FRAX tool) and risk (Garvan calculator) obtained in the current preliminary study were higher in women with fracture.

Current knowledge suggests that more complicated fracture prediction algorithms based on numerous clinical factors are still imperfect. In this regard, should we consider collecting fewer data and, in the case of the FRAX tool, omitting costly measurements such as BMD? The FRAX fracture probabilities in the current study were significantly higher for the model without BMD. This observation can probably be explained by the relatively high prevalence of clinical risk factors for fractures in the population studied. Only in one study was FRAX fracture probability without BMD clearly greater than fracture probability with BMD [12], and these values were, respectively, 11.7 % and 5.5 % for any fracture and 5.5 % and 3.0 % for hip fracture. In other studies, fracture probabilities have been reported as not so distinctly different: FRAX fracture probability with BMD has been reported to be greater for any fractures [5–7] and for hip fractures [5, 6], and has also been reported to be smaller for any fractures [11, 12] and for hip fractures [7, 11, 12]. In our study, as BMD values were available in all patients, in the primary analysis, we used the FRAX tool with the BMD value incorporated. To obtain the wider clinical characteristics of the patients, we also added information about FRAX probabilities calculated without BMD, but comparison of FRAX probabilities with BMD and FRAX probabilities without BMD was not the aim of the study. An answer to the question as to whether BMD improves the predictive accuracy of the FRAX tool is essential, and we will certainly try to answer it at the end of the longitudinal observation period. For the moment, it is worth noting that not only DXA results but also the availability of the DXA examination itself may influence fracture prediction during clinical decision making.

The current study had several limitations. We conducted a follow-up of women from the RAC-OST-POL study, and have now completed only a part of the expected 10-year period of observation. Therefore, we are not able to compare the numbers of predicted and observed fractures. Radiographs were available in only a small proportion of the subjects studied, and thus some clinically silent spine fractures might not have been recognized. Phone calls cannot be used to verify information given by patients. Fracture risk calculators were originally designed to establish fracture risk or probability in treatment-naive patients, and in the current study, we applied them in the whole group including some treated patients. We did the same in some of our recent studies [22, 23]. Also, some other authors did not exclude patients receiving treatment for osteoporosis [6, 9, 10, 13] from cohorts used to establish fracture probability. On the other hand, in the current study, it was not possible to analyze any influence of treatment on fracture incidence as the study was not designed for such an analysis. Indeed, we observed that the fracture incidence during the follow-up period in patients receiving treatment was approximately twice as high as in untreated patients. This is clearly understandable when we take into consideration the enrollment protocol—the study cohort of RAC-OST-POL is representative of a local population, and the indications for treatment were therefore established by clinical practitioners and were not influenced by the study authors. Thus women receiving treatment had a priori a “worse” skeletal status. Finally, we used the same threshold concerning high and low fracture risk for the Garvan calculator as for the FRAX tool in the absence of clear recommendations for the Garvan method.

The strong aspects of the current study are the relatively large population-based study cohort, the relatively low dropout rate over 4 years, and the “true” prospective, longitudinal study design.

In conclusion, during a 4-year follow-up, fracture incidence was predicted by baseline fracture probability (obtained using the FRAX tool) and risk (obtained using the Garvan calculator). The number of clinical risk factors, a prior fracture, rheumatoid arthritis, a low FN T-score, and falls each independently increased fracture incidence.