Introduction

During the past two decades, epidemiologic studies have identified an array of fracture risk factors [1]. These studies have also underscored the fact that osteoporotic fracture is a multifactorial disorder, making it difficult to discriminate those who will sustain a fracture from those who will not. This multifactorial nature suggests that the risk of fracture for an individual cannot be accurately estimated without considering the effects of multiple risk factors. The challenge is how to develop workable prognostic models that incorporate relevant risk factors and identify those at high risk of fracture [2].

A number of prognostic models have recently been developed [3–5] using bone mineral density measurement and common fracture risk factors such as age, weight or body mass index, smoking, chronic disease, use of central nervous system medication, and history of early menopause. However, all of these models are more applicable to groups of individuals than to an individual, because they were built on the concept of risk stratification, in which continuous variables are categorized into subgroups. For example, a good prognostic model should distinguish a 65-year-old woman with a BMD T-score of -2.5 from a 67-year-old woman with a BMD T-score of -2.4, because the two women should theoretically have different risk levels even though the difference is modest. In fact, at the patient level, an individual is likely to be unique in risk profile and the risk of fracture for the individual should take that uniqueness into account [6].

Furthermore, all current prognostic models of fracture require complex computation, typically performed by computer software or a website, which may not be accessible or convenient in the primary care setting. A prognostic model that takes the form of a simple paper-based algorithm without data entry to a computer may be particularly useful [7].

For a risk factor to be useful in primary care, it should ideally meet the following criteria: accuracy, reliability, responsiveness, and non-invasiveness. Among the fracture risk factors that have been identified during the past 30 years or so, advancing age, low body weight, low bone mineral density (BMD), a history of fractures, and falls fulfill these criteria [8–12]. These risk factors have been translated into a clinical tool for primary care use for predicting hip fracture risk [6], but not for predicting the risk of any osteoporotic fracture. The present study sought to develop a nomogram that incorporates these non-invasive risk factors to predict five-year and ten-year absolute risks of any osteoporotic fracture for an individual man or woman.

Materials and methods

Setting and subjects

This study is part of the on-going longitudinal Dubbo Osteoporosis Epidemiology Study (DOES), for which details of protocol and study design have been previously described [8, 13–15]. Briefly, in 1989, all men and women aged 60 or above (as of 1989) living in Dubbo, a city of approximately 32,000 people 400 km north-west of Sydney (Australia), were invited to participate in an epidemiological study. At that time, the population comprised 1,581 men and 2,095 women aged ≥60 years, of whom 98.6% were Caucasian and 1.4% indigenous Aboriginal. These individuals were all invited to participate in DOES. This study was approved by the St Vincent’s Campus Research Ethics Committee and informed written consent was obtained from each participant.

Dubbo had been selected for the study as the age and gender distribution of the population closely resembled the Australian population [14], and it is relatively isolated in terms of medical care, so that virtually complete ascertainment of all fractures in the target population is possible. During the follow-up period, approximately 5% of women were on anti-osteoporosis treatment, with the minority (4.5%) being prescribed calcium and vitamin D.

Ascertainment of fracture

Low trauma and non-pathological fractures were considered the primary outcome of this study. Fractures occurring during the study period were identified for residents of the Dubbo local government area through radiologists’ reports from the two centers providing X-ray services as previously described [8, 13]. Fractures were only included if the report of fracture was definite and, on interview, had occurred with low trauma (e.g., fall from standing height or less). Fractures clearly due to major trauma (such as motor vehicle accidents), those due to underlying diseases (such as cancer or bone-related diseases) or those of digit, skull or cervical spine were excluded from the analysis. The present study did not include morphometric vertebral fractures.

Fractures were classified as any fracture (any first osteoporotic fracture), including those of hip, vertebrae (symptomatic), wrist, meta-carpal, humerus, scapula, clavicle, distal femur, proximal tibia, patella, pelvis and sternum. Not all individuals who sustained a fracture had had or agreed to have bone density measurements. The total number of individuals with fracture reported in this study accounted for 92% of all fractured subjects from the entire DOES population.

Measurements of risk factors

Individuals were interviewed by a nurse coordinator who administered a structured questionnaire to obtain data, including age, lifestyle factors such as duration of smoking and alcohol consumption, physical activity, any history of falls in the preceding 12 months, any history of fractures after the age of 50, family history of osteoporosis (including maternal and paternal history), presence of rheumatoid arthritis, and a history of using corticosteroids. Current use of corticosteroids was defined as using oral prednisolone of ≥7.5 mg per day or an equivalent dose of another corticosteroid for ≥6 months [16], either at the time of fracture or, for those without a fracture, at the time of analysis. Anthropometric variables (height, weight) were measured, and a dietary assessment was performed based on a frequency questionnaire for calcium intake, as described elsewhere [17].

At baseline, bone mineral density (BMD, g/cm²) was measured at the lumbar spine or femoral neck (FN) by dual energy x-ray absorptiometry, initially using a DPX densitometer (GE-LUNAR Corp, Madison, WI). The radiation dose with this method is <0.1 μGy. The coefficient of reliability of BMD in our institution in normal subjects is 0.96 and 0.98 at the proximal femur and lumbar spine, respectively [18]. In the present analysis, baseline femoral neck BMD, not lumbar spine BMD, was used, since the former is minimally affected by degenerative changes that may artificially elevate BMD. For comparison with other studies, each subject was also classified as “osteoporotic”, with a BMD 2.5 SD or more below the young normal level; “osteopenic”, with a BMD between 2.5 and 1.0 SD below the young normal level; or “normal”, with a BMD within 1.0 SD below, or above, the young normal level. The “young normal” BMD was obtained from a referent database for Australian women [19]. The “young normal” BMD was obtained from a sample of Australian men and women aged between 20 and 32 years. These values are identical to those of the LUNAR Caucasian database [20].
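The three-way classification described above can be written out as a short sketch. It is illustrative only: the function names `t_score` and `classify_bmd` are hypothetical, and the young-normal mean and SD passed in the usage example are made-up placeholder values, not the reference values used in the study.

```python
def t_score(bmd: float, young_mean: float, young_sd: float) -> float:
    """Number of young-normal SDs by which a BMD value differs from the young-normal mean."""
    if young_sd <= 0:
        raise ValueError("young_sd must be positive")
    return (bmd - young_mean) / young_sd


def classify_bmd(t: float) -> str:
    """Classify a T-score into the three categories described in the text.

    Boundary handling follows the wording of the text: 2.5 SD or more below
    is osteoporotic, and exactly 1.0 SD below counts as normal.
    """
    if t <= -2.5:
        return "osteoporotic"
    if t < -1.0:
        return "osteopenic"
    return "normal"


# Hypothetical reference values, for illustration only.
example_t = t_score(bmd=0.76, young_mean=1.00, young_sd=0.12)
example_category = classify_bmd(example_t)
```

With these placeholder inputs the T-score is about -2.0, which falls in the osteopenic band. Note that two people with T-scores of -2.5 and -2.4 land in different categories even though their BMD is almost identical, which is the limitation of categorization that the nomogram approach is meant to avoid.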

Development of predictive models

Given many potential risk factors, the number of “candidate models” for predicting fracture risk can be large. Bayesian model averaging (BMA) [21] was applied to search for the most parsimonious models with consistent and maximum discriminatory power. In terms of model consistency and accuracy, it has been shown that the BMA approach performs better than traditional algorithms, such as stepwise regression [22, 23], because it can account for model uncertainty in both predictions and parameter estimates [21, 24].

The prognostic performance of parsimonious models was assessed by the area under the receiver operating characteristic (ROC) curve [25–28], which reflects the model’s ability to discriminate between those who will sustain a fracture and those who will not.
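As a minimal sketch of what the area under the ROC curve measures, the function below computes it directly as the proportion of (fracture, non-fracture) pairs in which the fracture case received the higher predicted risk, counting ties as one half. This is a generic textbook formulation, not the software routine used in the study; the function name and the toy data are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    scores   -- predicted risk for each individual (higher = more at risk)
    outcomes -- 1 if the individual fractured, 0 otherwise
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0      # correctly ranked pair
            elif case_score == control_score:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))


# Toy data: two fracture cases (0.9, 0.3) and two non-fracture controls (0.8, 0.2).
toy_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

In the toy data three of the four case-control pairs are ranked correctly, so the AUC is 0.75. A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect discrimination.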

Cox’s proportional hazards regression model [29, 30] was used to estimate the magnitude of association between fracture risk and the risk factors. The association between each risk factor and fracture risk was initially expressed by the hazard ratio (HR) and its 95% confidence interval (CI) for each standard deviation (SD) change in continuous risk factors or each unit change in ordinal risk factors.
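For reference, the standard form of the proportional hazards model and the way a hazard ratio follows from it can be written as below. The notation is the usual textbook one and is an assumption here, since the text does not define symbols: $h_0(t)$ is the baseline hazard, $x_1,\dots,x_p$ are the risk factors, $\beta_j$ their regression coefficients, and $\Delta_j$ the size of the change being compared (one SD for a continuous factor, one unit for an ordinal one).

```latex
\[
  h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right),
  \qquad
  \mathrm{HR}_j = \exp\!\left(\beta_j\,\Delta_j\right),
\]
\[
  95\%\ \mathrm{CI\ for\ HR}_j:\quad
  \exp\!\left\{\left(\hat\beta_j \pm 1.96\,\mathrm{SE}(\hat\beta_j)\right)\Delta_j\right\}.
\]
```

Because the baseline hazard cancels in the ratio, a hazard ratio compares two individuals who differ by $\Delta_j$ in one risk factor and are alike in the others, at any time point.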

From the estimates of the models’ parameters, a series of nomograms was constructed for predicting 5-year and 10-year fracture risks for men and women separately. The nomograms were internally validated by the bootstrap method, which assesses how accurately the model will predict fracture in a new similar sample of subjects. In this method, 1,000 sub-samples of 150 subjects each were repeatedly drawn (with replacement) from the entire sample and analyzed, from which biases can be estimated [31]. The predictive accuracy of a model was assessed by the concordance index [31], which is the probability that, given two randomly drawn individuals, the individual who sustains a fracture first had the lower predicted probability of remaining fracture-free. The interpretation of the concordance index is similar to that of the area under the receiver operating characteristic curve [25–28]. In a further validation, the predicted probability of fracture was compared with the actual probability (i.e., nomogram calibration) on the entire sample, again using 1,000 bootstrap resamples to reduce overfit bias, which would overstate the accuracy of the nomogram. All analyses were performed using the R language on the Windows platform [32] with the Design and Hmisc packages [31, 33].
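The concordance index described above can be illustrated with a short sketch for censored follow-up data. This is a simplified, generic version written for illustration; it is not the routine in the Design or Hmisc packages. It assumes that a pair is usable only when the individual with the shorter follow-up time had an observed fracture, it skips pairs with tied times, and it counts tied predicted risks as one half. The function name and toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Simplified concordance index for right-censored data.

    times  -- follow-up time for each individual
    events -- 1 if a fracture was observed at that time, 0 if censored
    risks  -- predicted fracture risk (higher = expected to fracture sooner)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored individual cannot be the "earlier fracture"
        for j in range(n):
            if times[i] < times[j]:  # i fractured strictly before j's last follow-up
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: fractures observed at times 2 and 7; the others are censored.
toy_c = concordance_index(
    times=[2, 5, 7, 9],
    events=[1, 0, 1, 0],
    risks=[0.8, 0.3, 0.2, 0.4],
)
```

In the toy data there are four usable pairs and the model ranks three of them correctly, giving a concordance index of 0.75. In a bootstrap validation, a statistic like this would be recomputed on each resample to estimate how much the apparent value overstates performance in new subjects.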

Results

Characteristics of participants and univariate analysis

Data were analyzed from 1,358 women and 858 men who had been followed up between 1989 and 2004. The median duration of follow-up was 13 years (inter-quartile range, IQR: 8–14) for women and 12 years (7–14) for men, yielding 14,443 and 8,695 person-years of observation in women and men, respectively. The average age (±SD) at baseline was 71 ± 8 years for women and 70 ± 6 for men.

During the follow-up period, 426 women and 149 men sustained at least one fracture, making the overall incidence of fracture 35 per 1,000 person-years in women and 18 per 1,000 person-years in men. In both sexes, the most common sites of fracture were symptomatic vertebral (28% in women and 34% in men), hip (17% for both sexes), forearm (2% in women and 4% in men), and rib (5% in women and 23% in men). In univariate analysis, advancing age, lower weight, shorter current height, lower BMI, lower BMD, prior fracture, and a fall during the past 12 months were each associated with increased fracture risk in men and women (Table 1 and Fig. 1).

Fig. 1

Cumulative probability of fracture in women (left panel) and men (right panel) classified by age group (a, b), BMD category (c, d), body weight tertile (e, f), history of fracture (g, h), and falls during the past 12 months (i, j)

Table 1 Characteristics of any fracture and non-fracture individuals

In addition, there were no significant differences between those with and without a fracture in terms of current use of corticosteroids, rheumatoid arthritis, a parental history of osteoporosis, or lifestyle factors such as daily calcium intake, physical activity and smoking status (Table 1).

Development of models for predicting fracture risk

All potential risk factors were included in the analysis, and Bayesian model averaging was used to select optimal variables for predicting fracture risk. It was confirmed that five risk factors (age, femoral neck BMD T-score, weight, prior fracture, and falls during the past 12 months) were most significantly associated with fracture risk. The two most parsimonious models were considered: Model I includes age, BMD T-score, prior fracture and fall, and Model II replaces the BMD variable in Model I with body weight.

In multivariable analysis with sex considered as a covariate, the hazard ratios (95% CI) of fracture for model I were: BMD 1.36 (1.26–1.47) for each SD lower, age 1.25 (1.17–1.30) for each SD older, prior fracture 1.84 (1.68–2.03), and previous falls 1.22 (1.11–1.36). For model II, the hazard ratios of fracture were: body weight 1.10 (1.02–1.19), age 1.34 (1.26–1.43), prior fracture 2.05 (1.87–2.25), and previous falls 1.22 (1.10–1.36).
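To show how hazard ratios of this kind combine for one person, the sketch below multiplies the first set of point estimates quoted above (BMD, age, prior fracture, previous falls). It assumes a proportional hazards model with no interaction terms, so that effects add on the log-hazard scale, and it ignores the sex covariate and the uncertainty in the estimates. The function and dictionary names are hypothetical, and the result is a hazard relative to a reference person, not an absolute risk.

```python
import math

# Point estimates quoted in the text (first set of hazard ratios).
HAZARD_RATIOS = {
    "bmd_per_sd_lower": 1.36,
    "age_per_sd_older": 1.25,
    "prior_fracture": 1.84,
    "previous_fall": 1.22,
}


def relative_hazard(
    bmd_sd_lower: float,
    age_sd_older: float,
    prior_fracture: bool,
    previous_fall: bool,
) -> float:
    """Hazard relative to a reference person with zero on every term.

    Assumes effects are additive on the log-hazard scale (no interactions).
    """
    log_hazard = (
        bmd_sd_lower * math.log(HAZARD_RATIOS["bmd_per_sd_lower"])
        + age_sd_older * math.log(HAZARD_RATIOS["age_per_sd_older"])
        + (math.log(HAZARD_RATIOS["prior_fracture"]) if prior_fracture else 0.0)
        + (math.log(HAZARD_RATIOS["previous_fall"]) if previous_fall else 0.0)
    )
    return math.exp(log_hazard)


# Someone of reference age whose BMD is 1 SD lower, with a prior fracture and a recent fall.
example_rh = relative_hazard(1.0, 0.0, True, True)
```

Under these assumptions the example individual has roughly 3.05 times the hazard of the reference person (1.36 × 1.84 × 1.22), which illustrates how several modest risk factors can combine into a substantially raised risk.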

The prognostic performance as assessed by the area under the receiver operating characteristic curve (AUC) is shown in Table 2. Model I (which included age, femoral neck BMD, prior fracture, and fall) performed better than model II (which replaced BMD in model I with body weight). In women, the AUC value of model I was 0.76, which was significantly better (p = 0.0007) than model II’s (AUC = 0.72). In men, there was no statistically significant difference in discrimination (p = 0.146) between model I (AUC = 0.75) and model II (AUC = 0.74). In men, the addition of age (model III) significantly improved the discriminatory power over the model with BMD alone (model IV).

Table 2 Area under the receiver operating characteristic curves of different models

When the continuous BMD variable in model I was replaced by categorized BMD (normal, osteopenia, and osteoporosis according to the WHO classification system), the AUC value for model I reduced to 0.717 (SE 0.016) for women and 0.704 (SE 0.025) for men, which was significantly lower (p = 0.001 for women and p = 0.018 for men) than the model with continuous BMD measurements. Thus, models based on continuous measurements performed consistently better than the models based on categorical measurements.

Using the parameter estimates of model I and model II, we constructed two nomograms for women (Figs. 2 and 3) and two nomograms for men (Figs. 4 and 5). Each of these nomograms predicts the risk of fracture within 5 years and 10 years. In women, internal validation by the bootstrap method suggested a bias-corrected estimate of predictive discrimination of 0.51 for model I and 0.43 for model II; the corresponding estimates in men were 0.50 for model I and 0.47 for model II. The maximum calibration error in predicting probability of fracture was about 0.4% for both models in women and 0.6–1.9% in men.
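The text does not name the statistic behind the “predictive discrimination” estimates. One possible reading, offered here strictly as an assumption, is that they are Somers' Dxy rank correlations, which are related to the concordance index C by Dxy = 2(C - 0.5). The sketch below only performs that conversion; the function names are hypothetical.

```python
def dxy_to_c(dxy: float) -> float:
    """Convert Somers' Dxy to the concordance index C, using Dxy = 2 * (C - 0.5)."""
    return 0.5 + dxy / 2.0


def c_to_dxy(c: float) -> float:
    """Convert the concordance index C to Somers' Dxy."""
    return 2.0 * (c - 0.5)


# ASSUMPTION: the reported discrimination estimates are Dxy values.
reported = {
    "women, model I": 0.51,
    "women, model II": 0.43,
    "men, model I": 0.50,
    "men, model II": 0.47,
}
implied_c = {label: dxy_to_c(value) for label, value in reported.items()}
```

Under this assumption the implied concordance values are about 0.755, 0.715, 0.75 and 0.735, which are close to the AUC values of 0.76, 0.72, 0.75 and 0.74 reported for the same models. The agreement is suggestive but does not confirm which statistic was reported.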

Fig. 2

a Nomogram for predicting the 5-year and 10-year probability of any fracture for a woman, based on Model I. Instruction for usage: Mark the age of an individual on the “Age” axis and draw a vertical line to the “Point” axis to determine how many points toward the probability of fracture the individual receives for his/her age value. Repeat the process for each additional risk factor. Sum the points of the risk factors. Locate the final sum on the “Total points” axis. Draw a vertical line down to the 5-year or 10-year risk line to find the individual’s probability of sustaining a fracture within the next 5 or 10 years. Example: Mrs. A, 70 years old, has a BMD T-score of -2.5, had a prior fracture and a fall in the past 12 months; she receives approximately 12 points for age, 65 points for BMD, 13 points for prior fracture and 4 points for the fall. Her total points are, therefore, 12+65+13+4 = 94, and her probability of having a fracture is around 0.21 in the next 5 years and 0.40 in the next 10 years. In other words, among 100 women like her, one would expect 21 to have a fracture in the next 5 years and 40 in the next 10 years. b: Calibration of nomogram for any fracture for women (Model I). The diagonal dotted line indicates the reference line on which an ideal nomogram would lie (perfect prediction). Solid line indicates current nomogram performance
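The arithmetic in the Mrs. A example can be written out as a short sketch. The point values and the two probabilities are those quoted in the caption; the mapping from total points to probability is only available graphically on the nomogram and is not reproduced here. The function names are hypothetical.

```python
from typing import Dict


def total_points(points: Dict[str, float]) -> float:
    """Sum the points read off the nomogram for each risk factor."""
    return sum(points.values())


def expected_cases(probability: float, group_size: int = 100) -> int:
    """Expected number of fractures in a group of people sharing the same risk profile."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(probability * group_size)


# Points for Mrs. A as quoted in the caption of Fig. 2.
mrs_a_points = {"age": 12, "bmd": 65, "prior_fracture": 13, "fall": 4}
mrs_a_total = total_points(mrs_a_points)

# Probabilities read from the nomogram at that total (quoted in the caption).
five_year_cases = expected_cases(0.21)
ten_year_cases = expected_cases(0.40)
```

The total comes to 94 points, and the quoted probabilities correspond to 21 and 40 expected fractures per 100 women with the same profile over 5 and 10 years, respectively.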

Fig. 3

a Nomogram for predicting the 5-year and 10-year probability of any fracture for a woman, based on Model II. Instruction for usage is similar to Fig. 2a. b: Calibration of nomogram for any fracture for women, model II. The diagonal dotted line indicates reference line on which ideal nomogram would lie (perfect prediction). Solid line indicates current nomogram performance

Fig. 4

a Nomogram for predicting the 5-year and 10-year probability of any fracture for an elderly man, based on Model I. Instruction for usage is similar to Fig. 2a. b: Calibration of nomogram for any fracture for men (Model I). The diagonal dotted line indicates reference line on which ideal nomogram would lie (perfect prediction). Solid line indicates current nomogram performance

Fig. 5

a Nomogram for predicting the 5-year and 10-year probability of any fracture for a man, based on Model II. Instruction for usage is similar to Fig. 2a. b: Calibration of nomogram for any fracture for men, model II. The diagonal dotted line indicates reference line on which ideal nomogram would lie (perfect prediction). Solid line indicates current nomogram performance

Some typical point estimates of 5-year and 10-year risks are presented for model I (Table 3) and for model II (Table 4). For illustration, 5-year risks of more than 10% and 10-year risks of greater than 20% are highlighted. As expected from a multivariable prognostic model, there are multiple combinations of risk factors that can raise an individual’s risk. For example, if a 5-year absolute risk of 10% or greater is considered “high risk”, then virtually all women and men aged 80+ with BMD T-scores less than -1.0 are at high risk, regardless of their prior fracture and fall history. Women aged 70 years with BMD T-scores less than -2.0 can also be considered high risk regardless of their prior fracture and fall history. Virtually all men aged 70 years with a prior fracture can also be considered high risk, regardless of their BMD measurement. Among women aged 60 years, the 5-year risk of fracture is greater than 10% only when their T-scores are ≤-3.0. No 60-year-old man has a 5-year risk of fracture greater than 10%.

Table 3 Five-year and 10-year risks of any fracture for a woman and for a man, based on age, BMD T-scores, a history of prior fracture and falls (Model I)
Table 4 Five-year and 10-year risks of any fracture for a woman and for a man, based on age, body weight, a history of prior fracture and falls (Model II)

Discussion

Despite the fact that several fracture risk factors have been identified by epidemiologic research, the synthesis of these risk factors into a prognostic model for clinical use has not been realized. Part of the problem is that many risk factors require sophisticated measurements that are beyond the resources of a typical primary care setting. Thus, a practically useful prognostic model should make use of relevant clinical data that are easily obtained from an individual. The present study, built on previous research of risk factors, analyzed two models of prognosis, which incorporate the established risk factors of age, prior fracture, history of falls, and BMD or body weight. While weight is highly correlated with BMD, it is not surprising to observe that the model with the clinical risk factors and BMD performed better than the model with the clinical risk factors and body weight, because BMD has been shown to be more sensitive and specific in terms of fracture prediction [34, 35]. However, the difference in predictive accuracy between the two models is modest, and given the relatively low incidence of fracture, the difference is of limited practical importance.

The ultimate aim of developing a prognostic model is to provide clinicians and each individual with a risk estimate to guide clinical decisions. At present, individuals with low bone mineral density (i.e., T-scores less than -2.5) or with a history of prior low trauma fracture are recommended for therapeutic intervention [36, 37]. This recommendation is logical and appropriate, since these individuals, as shown in this study and previous studies [12, 38], have a higher risk of fracture, and treatment can reduce their risk of fracture [39–41]. However, because fracture is a multifactorial event, there is more than one way that an individual can attain the risk conferred by either low BMD or a prior fracture. Indeed, virtually all women aged 70 years with BMD T-scores less than -1.5 and all 80-year-old men with BMD T-scores less than -1.0 can be considered “high risk”. On the other hand, no 60-year-old men or women without a prior fracture and a fall are considered high risk, even when their BMD T-scores are below -2.5. This demonstrates the informativeness of a multivariable prognostic model, and the limitation of a risk stratification-based approach to risk assessment for an individual.

Other clinical factors, such as corticosteroid use, family history of fracture and underlying rheumatoid arthritis, have been shown to be predictors of fracture risk [42–44]. However, in the present study, there were no significant associations between those risk factors and fracture risk in either women or men. Therefore, these factors were not included in the prognostic model, as they did not significantly contribute to the prediction of fracture risk. It is likely, nevertheless, that a family history of fracture may in general improve the risk prediction. Vertebral deformity (morphometric fracture) has been shown to be a risk factor for fracture [45]. In the present study, vertebral deformity was not assessed at baseline for all participants; therefore, it was not included in the prognostic models. Nevertheless, there is room for further improvement in the prognosis of fracture by incorporating these clinical risk factors into the prognostic model.

Each individual is important and unique. Individualization of risk, that is, the prediction of risk for an individual given a risk profile, is a fundamental aspect of the present models. The present models considered all continuous risk factors (e.g., BMD, body weight and age) in their original units of measurement. This approach differs from previous models [3, 4], which categorized continuous risk factors into distinct groups based on some thresholds. While categorization is appealing for its simplicity, it implicitly assumes a discontinuous relationship, which is unlikely to be true for well-known risk factors, such as BMD and body weight. Such a categorization is also known to reduce statistical power [46, 47]. Furthermore, the risk estimates based on categorization of continuous risk factors can only be applied to a group of individuals, not to an individual. Prognosis is about imparting information of fracture risk to an individual and each individual is a unique case, because there exists no “average individual” in the population. The more risk factors are considered, the greater the likelihood that an individual’s profile is unique. Therefore, by modeling risk factors on their continuous scale, the present models can be uniquely tailored to an individual.

The idea of using a nomogram to develop a prognosis model for an individual is not new with more than 1,700 nomograms being advocated [48]. Several nomograms developed and used in the field of oncology exhibited better performance than risk-grouping categorization [49, 50].

Statistical prognostic models have been shown to out-perform clinical judgment [51], because they can objectively incorporate many risk data. Because of their objectivity, multivariable prognostic models can reduce the variability in risk estimates.

No prognostic models are able to discriminate all individuals who will fracture from those who will not, and the current models are not an exception. The accuracy of a predictive model is normally quantified by its capacity to separate individuals who will sustain a fracture along a continuum from those who will not. In terms of the capacity of discrimination, both models proposed in this paper performed reasonably well with the area under the ROC curve being 0.75, which is considered clinically acceptable [52]. Although the AUC is a useful measure of discrimination, it does not provide much insight into calibration, which refers to the concordance between predicted and actual probabilities [53]. In the present study, using the bootstrap technique, it was shown that the overfitting-corrected calibration was very good, particularly in women with the mean absolute calibration errors being 1–2% for all models. In summary, although the models are not perfect, their predictive accuracy and capacity of discrimination would be considered clinically useful in the real-world situation.

The predicted risk of fracture is a continuous probabilistic variable ranging from 0 to 1. This raises the issue of selecting an optimal cut-off predicted probability to classify an individual into fracture or non-fracture. This is not an easy task, because the cut-off value—if it exists at all—depends on the complex risk-benefit consideration, and perhaps more importantly, an individual’s perception of risk, which is beyond the scope of the present study. The level of predicted risk at which that individual is prepared to take action is dependent on the risk perception of the individual, which is not easily quantified. Nevertheless, the predicted probability of fracture from the present prognostic models can be viewed as a measure of severity of osteoporosis for an individual.

It is logical that individuals with a high risk of fracture, regardless of their BMD levels, should be considered for treatment, because there is evidence suggesting that treating these individuals could yield clinical benefit [54]. However, what level (or levels) of risk should be regarded as “high risk”, so that an intervention can be considered cost-effective? The issue of cost-effectiveness has been shown to be a function of national per-capita income, age, and future costs excluded [55]. For example, it was estimated that for a 50-year-old Australian woman, treatment is considered cost-effective if the woman’s 10-year risk of hip fracture is at least 1.93% [55]. However, for a 90-year-old woman, the treatment is cost-effective if her 10-year risk is 10.8% or higher. The present nomograms can help identify such individuals for intervention.

All prognosis models are preliminary in the sense that they are yet to be tested in a totally independent population. The present models, as well as previous models of risk assessment developed from observational studies, may not be readily extrapolated or generalized to other populations, because the population under study might differ from the general population. Therefore, both external and internal validation should be a priority of research in the application of risk assessment models.

An important weakness of prognostic models of fracture risk is that they are based on a single measurement of risk factors, with the underlying but unstated assumption that the risk factors do not change with time. Obviously, this assumption does not hold for many risk factors, such as BMD and body weight, that are known to decline or change with time. Moreover, the rates of decline in BMD vary substantially among individuals. Therefore, future model development should take the time-varying nature of risk factors into account to achieve a better estimate of risk for an individual.

Notwithstanding these common limitations, the present models have the advantage of flexibility and ease of use, requiring neither tedious computation, which can be impractical in primary care settings, nor inappropriate categorization. The present models visually translate clinical risk factors into paper-based nomograms that can be readily used in clinical practice.