Introduction

In 2008, the World Health Organization (WHO) Collaborating Centre for Metabolic Bone Diseases released a fracture risk assessment tool (FRAX®) for the estimation of individualized 10-year probability of hip and osteoporotic fracture (composite of hip, clinical spine, distal forearm, and proximal humerus) with and without bone mineral density (BMD) [1]. FRAX integrates seven clinical risk factors [prior fragility fracture, a parental history of hip fracture, smoking, use of systemic corticosteroids, excess alcohol intake, body mass index (BMI) and rheumatoid arthritis] which, in addition to age and sex, contribute to fracture risk independently of BMD [2, 3].

The WHO selected the femoral neck as the reference site for osteoporosis diagnosis and fracture risk assessment based upon large epidemiologic studies that have documented its performance for fracture prediction and especially for assessment of hip fracture risk [4, 5]. When BMD is included in the FRAX calculation, the femoral neck measurement must be used. Although other BMD measurement sites can also be used for fracture risk assessment and for osteoporosis diagnosis [6], they are not currently a component of FRAX.

Previous work has demonstrated the feasibility and incremental improvement in fracture risk prediction using lumbar spine BMD in addition to femoral neck BMD [7]. This approach used the femoral neck to predict non-vertebral fracture risk and the lumbar spine to predict isolated vertebral fracture risk. The global fracture prediction model then consisted of summing these mutually exclusive prediction models. Although this established “proof of principle,” the model was judged difficult for clinicians to apply in routine practice. Furthermore, it only considered main effects, whereas the FRAX tool must also consider correlations and complex interactions amongst the predictor variables.

The objective of the present analysis was to develop a simple algorithm whereby major osteoporotic fracture risk under the FRAX system using femoral neck BMD could be adjusted to simultaneously accommodate information on lumbar spine BMD. The aim was to develop a simple tool that could be readily applied in routine clinical practice without the need for elaborate calculations or modifying the structure of the FRAX tool. It was hypothesized that the offset (difference) between lumbar spine and the femoral neck T-scores would provide an indication of whether fracture risk was higher or lower than the risk predicted from the femoral neck alone. An approach was developed and internally validated using a split-cohort design.

Methods

Patient population

In the province of Manitoba, Canada, health services are provided to virtually all residents through a single public health care system. Bone density testing with dual-energy X-ray absorptiometry (DXA) has been managed as an integrated program since 1997; criteria and testing rates for this program have been published [8]. The program maintains a database of all DXA results which can be linked with other population-based computerized health databases through an anonymous personal identifier [9]. The DXA database has been previously described with completeness and accuracy in excess of 99%.

The population for this retrospective cohort study consisted of all individuals aged 50 years and older with valid DXA measurements from the lumbar spine and femoral neck. Subjects were required to have medical coverage from Manitoba Health during the observation period ending March 2008. For those with more than one eligible set of measurements, only the first record was included. The final study population consisted of 36,368 individuals (33,850 women and 2,518 men). The population was randomly divided into two equally sized subgroups, one for use in model derivation and the second for independent validation. The study was approved by the Research Ethics Board for the University of Manitoba and the Health Information Privacy Committee of Manitoba.

Bone density measurements

DXA scans were performed and analyzed in accordance with manufacturer recommendations. Hip T-scores (number of SDs above or below young adult mean BMD) and Z-scores (number of SDs above or below age-matched mean BMD) were calculated from the revised NHANES III White female reference values (Prodigy version 8.8) [4, 10]. No comparable international reference standard exists for the lumbar spine, so lumbar spine T-scores and Z-scores were calculated using the manufacturer's USA White female reference values. Vertebral levels affected by artifact were excluded by experienced physicians using conventional criteria [11]. Patients were excluded if the lumbar spine was not considered clinically usable even after these exclusions, or if they had bilateral hip replacements.

Prior to 2000, DXA measurements were performed with a pencil-beam instrument (Lunar DPX, GE Lunar, Madison WI), and after this date a fan-beam instrument was used (Lunar Prodigy, GE Lunar, Madison WI). Instruments were cross-calibrated using anthropomorphic phantoms and 59 volunteers. No clinically significant differences were identified (T-score differences <0.2). Densitometers showed stable long-term performance [coefficient of variation (CV) < 0.5%] and satisfactory in vivo precision (CV 1.7% for L1–4 and 1.1% for the total hip) [12].

The lumbar spine–femoral neck “offset” was calculated as the numeric difference in the respective T-scores (lumbar spine T-score minus femoral neck T-score). A negative offset indicated that the T-score of the lumbar spine was lower than that of the femoral neck, whereas a positive offset indicated the T-score of the lumbar spine was higher than that of the femoral neck. An absolute T-score offset exceeding 1 SD was designated as “discordant.”
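
The offset and the discordance flag defined above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study; the function names and the example T-scores are assumptions introduced here.

```python
def t_score_offset(lumbar_spine_t: float, femoral_neck_t: float) -> float:
    """Return lumbar spine T-score minus femoral neck T-score."""
    return lumbar_spine_t - femoral_neck_t


def is_discordant(offset: float, threshold_sd: float = 1.0) -> bool:
    """True when the absolute offset exceeds the threshold (1 SD in the text)."""
    return abs(offset) > threshold_sd


# Hypothetical patient: the spine T-score is lower than the hip T-score,
# so the offset is negative and its magnitude exceeds 1 SD.
example_offset = t_score_offset(-3.5, -1.7)
print(round(example_offset, 1), is_discordant(example_offset))
```

A negative result means the lumbar spine is the worse site; the sign is retained because the direction of the offset determines the direction of the later risk adjustment.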

Definitions of fractures and other clinical risk factors

Fractures and other medical diagnoses were assessed through a combination of hospital discharge abstracts (diagnoses and procedures coded using the ICD-9-CM prior to 2004 and ICD-10-CA thereafter) and physician billing claims (coded using ICD-9-CM) [13]. Use of systemic corticosteroids and other medications was obtained by linkage to the provincial Drug Program Information Network database with drugs classified according to the Anatomical Therapeutic Chemical system of the WHO [14]. The pharmacy database is accurate both for capture of drug dispensations as well as the prescription details [15].

Longitudinal health service records were assessed for the presence of hip, clinical vertebral, forearm, and humerus fracture codes (collectively designated as “osteoporotic”) before and after BMD testing that were not associated with trauma codes [16]. Incident fractures were defined as fractures that occurred after the index BMD measurement and generated two or more site-specific fracture codes in any diagnosis field (hospitalization or physician visit). We required that hip fractures and forearm fractures be accompanied by a site-specific fracture reduction, fixation, or casting code as this enhances the diagnostic and temporal specificity for an acute fracture event. To minimize potential misclassification of prior incident fractures, we required that there be no hospitalization or physician visit(s) with the same fracture type in the 6 months preceding an incident fracture diagnosis. For purposes of the FRAX calculation, prior fragility fracture was defined as a major osteoporotic fracture prior to BMD testing. A diagnosis of rheumatoid arthritis was taken from physician office visits and/or hospitalizations with a relevant ICD-9-CM/ICD-10-CA code in a 3-year period prior to BMD testing. Proxies were used for smoking (COPD diagnosis) and high alcohol intake (alcohol or substance abuse diagnosis). Prolonged corticosteroid use (over 90 days dispensed in the year prior to DXA testing at a mean prednisone-equivalent dose of 7.5 mg per day or greater) was obtained from the provincial pharmacy system. Weight and height were recorded at the time of the DXA examination (prior to 2000 this was by self-report and starting in 2000 height was assessed with a wall-mounted stadiometer and weight was assessed without shoes using a standard floor scale). BMI (in kilograms per square meter) was calculated as weight (in kilograms) divided by height squared (in meters). 
Secondary causes of osteoporosis, other than rheumatoid arthritis, do not contribute to the probability of fracture as calculated by FRAX when information on BMD is present and therefore were not considered in this analysis.

FRAX probability calculations

A 10-year probability of a major osteoporotic fracture was calculated for each case by the WHO Collaborating Centre using the US Caucasian FRAX tool (version 3.0) with femoral neck BMD and the previously defined variables, without knowledge of the fracture outcomes. To adjust for the effect of missing parental hip fracture information on FRAX estimates prior to 2005, age- and sex-specific estimates of the effect of a positive response were determined using the later years of data (2005–2007). This averaged effect incorporates both the prevalence of a positive response and the relative change in risk. The risk adjustment factor ranged from 1.06 to 1.11 for major osteoporotic fractures. These ratios were then used to adjust the FRAX risk estimate for those years of data where parental hip fracture information was not available.

Statistics

All results are reported as mean ± SD unless otherwise stated. Group comparisons were conducted with the Student t test for continuous data and with a chi-square test of independence for categorical data. A Cox proportional hazards model was used to study fracture risk as a function of the BMD offset variable, with the FRAX probability of major osteoporotic fracture (based upon femoral neck BMD) included as an additional covariate in the model. FRAX probability was coded both as a continuous variable (log-transformed due to a skewed distribution) and as a categorical variable (grouped into risk quintiles). Death was considered a competing hazard in accordance with the approach adopted by the WHO Collaborating Centre. Interactions between the offset variable and sex, age (decade), and fracture risk category (low, moderate, high) were tested in three subsequent models. Following these analyses, a simplified prediction model was developed and tested for potential application in clinical practice.

The FRAX predictions without adjustment for offset were compared with FRAX predictions adjusted for offset. Ten-year fracture outcomes were estimated using the Kaplan–Meier technique, stratified by the size of the offset. Overall performance of the prediction system was assessed from the area under the curve (AUC) for receiver operating characteristic (ROC) plots. The AUC assesses discriminatory ability but has relatively poor sensitivity for detecting improvements in model performance; the integrated discrimination improvement (IDI) estimates the improvement in average sensitivity and average specificity (integrated difference in Youden’s index) with significance testing based upon the z distribution [17]. Statistical analyses were performed with SPSS for Windows version 16.0 (SPSS Inc., Chicago, IL).
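
To make the two discrimination measures concrete, the following Python sketch computes an AUC (as the probability that a randomly chosen fracture case receives a higher predicted probability than a randomly chosen non-case) and the IDI in its usual form (mean gain in predicted probability among cases minus mean gain among non-cases). It is an illustration under stated assumptions, not the SPSS procedure used in the study: the function names and the toy data are hypothetical, follow-up time and censoring are ignored, and the significance test is omitted.

```python
from statistics import mean


def auc(scores, events):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    cases = [s for s, e in zip(scores, events) if e]
    controls = [s for s, e in zip(scores, events) if not e]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def idi(old_probs, new_probs, events):
    """Mean change in predicted probability among cases minus that among non-cases."""
    gain_cases = mean([n - o for o, n, e in zip(old_probs, new_probs, events) if e])
    gain_controls = mean([n - o for o, n, e in zip(old_probs, new_probs, events) if not e])
    return gain_cases - gain_controls


# Toy data (hypothetical): 1 = fracture, 0 = no fracture.
events = [1, 0, 1, 0]
unadjusted = [0.2, 0.2, 0.2, 0.2]
adjusted = [0.3, 0.1, 0.3, 0.1]
print(auc(unadjusted, events), auc(adjusted, events), idi(unadjusted, adjusted, events))
```

A positive IDI indicates that the adjusted predictions moved upward for those who fractured, downward for those who did not, or both.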

Absolute 10-year fracture probabilities using the FRAX model alone and the hybrid model with BMD at the lumbar spine were categorized as low risk (<10%), moderate risk (10–20%) and high risk (>20%) in accordance with Canadian reporting guidelines [18]. The number of individuals in whom the hybrid model reclassified risk to a different category was determined according to the method of Janes et al. [19]. The numbers of fractures under the two prediction systems were cross-tabulated, and the linear trend in fracture rates when a risk category under the FRAX system was reclassified under the hybrid system was assessed using the Cochran–Armitage test. Within each subgroup, fracture outcomes to 10 years were estimated using the Kaplan–Meier method.
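
The risk categories and the reclassification cross-tabulation described above can be illustrated as follows. This is a minimal sketch assuming probabilities expressed in percent; the function names and sample values are hypothetical, and the fracture outcomes and trend test are not modeled.

```python
from collections import Counter


def risk_category(prob_percent: float) -> str:
    """Map a 10-year probability (in percent) to low (<10), moderate (10-20) or high (>20)."""
    if prob_percent < 10:
        return "low"
    if prob_percent <= 20:
        return "moderate"
    return "high"


def reclassification_table(frax_probs, hybrid_probs):
    """Count individuals by (category under FRAX alone, category under the hybrid model)."""
    return Counter(
        (risk_category(a), risk_category(b)) for a, b in zip(frax_probs, hybrid_probs)
    )


def reclassified_fraction(frax_probs, hybrid_probs) -> float:
    """Fraction of individuals whose category differs between the two models."""
    table = reclassification_table(frax_probs, hybrid_probs)
    moved = sum(n for (before, after), n in table.items() if before != after)
    return moved / sum(table.values())


# Hypothetical probabilities for four individuals.
frax_only = [8.0, 18.0, 25.0, 15.0]
hybrid = [8.0, 21.6, 25.0, 9.0]
print(reclassification_table(frax_only, hybrid))
print(reclassified_fraction(frax_only, hybrid))
```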

Results

Population

The overall study population consisted of 36,368 individuals, with a mean age of 65.2 years (SD, 9.6), of whom the vast majority were women (93.1%). The mean femoral neck T-score was −1.45 (SD, 0.98) with a mean 10-year major osteoporotic fracture probability of 12.2% (SD, 7.5%). The mean lumbar spine T-score was −1.34 (SD, 1.54) with mean T-score offset 0.11 (SD, 1.11). The offset was approximately normally distributed overall and within age decades. Mean offset was studied as a function of age decade and was slightly higher for younger subjects (mean, +0.24 and SD, 1.06 for age 50–59 years) than for the oldest subjects (mean, −0.13 and SD 0.98 for age 90+ years) but the between-age differences were negligible compared to the within-decade variation (Fig. 1).

Fig. 1

Effect of age on the T-score offset between the lumbar spine and femoral neck. Ninety-five percent confidence interval bars are shown

The characteristics of the 18,153 subjects randomly allocated to the derivation subgroup and the 18,215 subjects allocated to the validation subgroup are summarized in Table 1. No significant differences (all P > 0.05) were seen in any of the baseline covariates.

Table 1 Study population baseline characteristics

Derivation of the lumbar spine–femoral neck offset rule

During the mean 5.5 years of follow-up (maximum 10 years), 2,321 individuals (1,160 in the derivation subgroup and 1,161 in the validation subgroup) experienced one or more major osteoporotic fractures, of whom 667 (338 in the derivation subgroup and 329 in the validation subgroup) experienced one or more clinical vertebral fractures. Using the derivation subgroup, the FRAX probability for major osteoporotic fractures was strongly predictive of major osteoporotic fractures (P < 0.001). When adjusted for FRAX probability, the offset variable was also independently predictive of major osteoporotic fractures [HR 1.12 (95% CI, 1.06–1.18; P < 0.001) for every SD by which the lumbar spine T-score was below the femoral neck T-score]. Identical results were obtained when the FRAX probability was recoded as a categorical variable (risk quintiles). No significant interactions were detected between the offset effect and sex (P = 0.120), age decade (P = 0.404), or FRAX risk category (P = 0.331).
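
Because the offset entered the Cox model as a continuous covariate, the hazard ratio per SD compounds multiplicatively across larger offsets. The short sketch below illustrates this arithmetic with the reported estimate and confidence limits; the function name is hypothetical, and applying the estimate to offsets of any particular size is an assumption of log-linearity rather than a result reported in this study.

```python
def hazard_ratio_for_offset(sd_below: float, hr_per_sd: float = 1.12) -> float:
    """Hazard ratio when the lumbar spine T-score is `sd_below` SDs below the femoral neck.

    Negative values of `sd_below` describe a lumbar spine T-score above the femoral neck.
    """
    return hr_per_sd ** sd_below


# Point estimate and 95% confidence limits reported in the text, applied to a 2 SD difference.
for label, per_sd in (("estimate", 1.12), ("lower", 1.06), ("upper", 1.18)):
    print(label, round(hazard_ratio_for_offset(2, per_sd), 3))
```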

In creating a simplified prediction rule using the offset variable, we: (1) assumed that small adjustments to fracture probability would give equivalent results to adjustments applied to fracture odds, (2) assumed a linear (rather than exponential) relationship between the offset variable and fracture risk (adjustment factor 0.1 per SD increase/decrease versus $\mathrm{HR}_{\mathrm{offset}} = 1.12^{\mathrm{offset}}$, where the exponent counts the SDs by which the lumbar spine lies below the femoral neck), and (3) rounded the offset to the nearest whole number. Based upon these simplifications, the following rule was developed for generating the lumbar spine-enhanced FRAX estimate: “Increase/decrease osteoporotic FRAX estimate by one tenth for each rounded T-score difference between the lumbar spine and femoral neck.” For example, consider an individual with a femoral neck T-score of −1.7 and a major osteoporotic FRAX probability of 18%. If the lumbar spine T-score is −3.5 then this indicates an offset of −1.8 (−3.5 minus −1.7). This is rounded to the nearest whole number (−2). One tenth of the FRAX estimate based upon the femoral neck is 1.8%, which is multiplied by the magnitude of the rounded offset (2, giving 3.6%). This is then added (because the lumbar spine T-score is worse than the femoral neck T-score) to the original FRAX estimate, giving 21.6% (18% + 3.6%), or a final rounded probability of 22%.
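
The rule and the worked example can be expressed as a short Python sketch. It is illustrative only: the function names are hypothetical, and the handling of offsets that fall exactly halfway between whole numbers (rounded away from zero here) is an assumption, since the text does not specify it.

```python
import math


def round_half_away_from_zero(x: float) -> int:
    """Round to the nearest whole number; exact halves go away from zero (assumption)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def spine_adjusted_frax(frax_percent: float, lumbar_spine_t: float, femoral_neck_t: float) -> float:
    """Adjust a femoral neck-based FRAX probability by one tenth per rounded SD of offset."""
    offset = lumbar_spine_t - femoral_neck_t
    rounded_offset = round_half_away_from_zero(offset)
    one_tenth = frax_percent / 10
    # A negative offset (spine worse than hip) increases the estimate; a positive one lowers it.
    return frax_percent - one_tenth * rounded_offset


# Worked example from the text: FRAX 18%, femoral neck -1.7, lumbar spine -3.5.
adjusted = spine_adjusted_frax(18.0, -3.5, -1.7)
print(round(adjusted, 1), round(adjusted))
```

For offsets that round to zero the estimate is returned unchanged, which matches the intent that only sizeable site differences alter the FRAX probability.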

Fracture prediction with the lumbar spine–femoral neck offset rule

The overall cohort was categorized according to the offset using the following ranges: above +2, between +1 and +2, between +1 and −1, between −1 and −2 and below −2. The mean FRAX probabilities (without offset), FRAX probabilities (with offset) and observed 10-year fracture outcomes (Kaplan–Meier estimates) are shown in Fig. 2. Including the offset in the FRAX calculation made the prediction more responsive to differences between the lumbar spine and femoral neck T-scores. Observed fracture rates more closely tracked predicted fracture rates using the offset than when this was not used.

Fig. 2

Predicted major osteoporotic fracture probability for FRAX (without and with the proposed lumbar spine offset) versus observed fracture outcomes (Kaplan–Meier estimate)

Fracture discrimination assessed from the ROC analysis is summarized in Table 2 in the overall cohort, in the validation subgroup, and in discordant subjects (defined as a T-score offset magnitude exceeding 1 SD). Prediction of vertebral fractures (AUC range, 0.742 to 0.794) was generally better than prediction of all major osteoporotic fractures (AUC range, 0.694 to 0.700). For prediction of major osteoporotic fractures in the validation subgroup, the AUC for FRAX with the proposed offset adjustment (simple rule 0.694) was similar to FRAX without the offset adjustment (AUC, 0.694) but was greater for vertebral fracture prediction (0.755 versus 0.748). When individuals with an offset less than 1 SD were excluded, then AUC differences were slightly greater (0.699 versus 0.697 for major osteoporotic fractures, 0.794 versus 0.778 for vertebral fractures). Despite the small change in AUC, the IDI showed that FRAX with the proposed offset adjustment resulted in a significant improvement in major osteoporotic and vertebral fracture risk prediction for all analyses. In the validation subgroup, there was a significant improvement in the fracture prediction for major osteoporotic fracture (P = 0.007) and vertebral fracture prediction (P < 0.001). No significant loss in major osteoporotic or vertebral fracture discrimination was seen when the simplified offset rule was compared with the complex rule using the same IDI testing procedure (all P > 0.05) except for vertebral fractures in the discordant subgroup (P = 0.046).

Table 2 AUC for fracture prediction and P value for improvement in risk prediction using FRAX probability for major osteoporotic fracture with and without lumbar spine offset

The change in risk classification from using the proposed lumbar spine offset adjustment is shown in Table 3. After using FRAX with the lumbar spine offset, 8.3% moved to a higher or lower risk category. The largest risk reclassification rate (12.3%) was in those initially assigned to the moderate risk category by FRAX without the lumbar spine offset, and reclassified individuals showed a significant fracture risk gradient (P trend <0.001). For moderate risk discordant cases with T-score difference >1 SD, there was 25.2% reclassification and a significant fracture risk gradient (P trend <0.001). Observed 10-year major osteoporotic fracture rates showed that the proposed lumbar spine offset rule appropriately reclassified individuals with a positive offset to lower risk and individuals with a negative offset to higher risk. In particular, for those subjects initially assigned to the moderate risk category by FRAX without the lumbar spine offset, there was appropriate risk reclassification to the low and high risk categories (Fig. 3).

Table 3 Observed fracture risk at 10 years (Kaplan–Meier estimate) according to risk reclassification under the proposed rule for computing FRAX probability for major osteoporotic fracture with lumbar spine offset for validation subjects

Fig. 3

Ten-year major fracture outcomes in moderate risk cases under FRAX without the lumbar spine offset and after reclassification using FRAX with the proposed simple lumbar spine offset rule (discordant subgroup defined as validation cases with femoral neck and lumbar spine T-score difference >1 SD)

Discussion

This analysis suggests that a relatively simple arithmetic procedure can be applied to conventional FRAX estimates of major osteoporotic fracture probability to modulate the risk assessment with knowledge of BMD at the lumbar spine. Furthermore, this simple procedure generates risk predictions that agree with observed fracture rates more closely than when the lumbar spine is not included in the risk assessment. The number of individuals who were reclassified was relatively small overall, which is not surprising given that large discrepancies between the lumbar spine and femoral neck are relatively uncommon; however, among those individuals initially designated as moderate risk under the unadjusted FRAX measurement, one in eight was appropriately reclassified to a higher or lower risk category. Among those where there were larger BMD discrepancies (offset > 1 SD), one quarter showed a change in risk classification.

The small size of the incremental improvement in risk stratification is consistent with simulation studies showing very little expected benefit from combining BMD measurement sites since results tend to be moderately correlated [20]. Likewise, large epidemiologic studies have failed to demonstrate a meaningful improvement in fracture risk stratification by using the lowest from among several measurements versus using the hip alone [21, 22]. ROC analysis is a useful index of the overall population benefits of introducing a clinical decision-making tool, but it is relatively insensitive as an indicator of improvements in fracture discrimination and does not provide information about the actual risks that the models predict or the proportion of subjects with high or low risk values; therefore, we have also presented measures based upon risk stratification tables [17, 19].

Lumbar spine BMD measurements are routinely performed in clinical practice both for baseline risk assessment and for monitoring purposes. When confronted with highly discordant measurements (lumbar spine worse than femoral neck), clinicians are in a quandary about how this should be integrated into the decision-making process. Under some guidelines such as those from the National Osteoporosis Foundation, treatment would be recommended for a lumbar spine T-score in the osteoporotic range regardless of the estimated risk [23]. Other national guidelines, such as those from the UK, do not have specific treatment recommendations for individuals with osteoporotic lumbar spine BMD when a 10-year fracture risk prediction from the femoral neck does not achieve the intervention threshold [24]. Canadian guidelines have attempted to address the issue of site discordance by substituting the minimum T-score for the femoral neck T-score [18], but this systematically overestimates fracture risk [25] and does not consider site-specific differences in fractures or the way BMD declines with age [6, 26]. Where lumbar spine and hip measurements are both performed for clinical purposes, using a procedure that accurately reflects the contribution of each measurement site to fracture risk is clearly preferred.

A limitation of this work is the lack of external validation. This is an important step before this rule can be recommended for clinical practice, as internal validation with a split-cohort design reduces but does not eliminate optimism in the model. Ideally, such independent validation would occur in several different cohorts with different case mix characteristics. A major challenge in the study of the epidemiology of osteoporotic fractures concerns the difficulty in ascertaining vertebral fractures. The protean definitions that have been applied clinically, morphometrically, and administratively speak to this challenge. Notwithstanding these legitimate difficulties, administrative definitions for vertebral fractures have been developed and tested with acceptable results [27]. Simplifications were used in constructing this FRAX enhancement. It is possible that a more complex procedure might produce slightly better performance, but this would defeat the purpose of creating a simple arithmetic procedure that busy clinicians could apply in clinical practice without need for calculation software. Other limitations include the potential for interactions between lumbar spine BMD and other risk variables (including sex and femoral neck BMD), the small number of men, and possible differences in absolute fracture risk for BMD estimates at the lumbar spine between men and women. At the time of the initial analysis, a validated Canadian FRAX tool did not exist, but one has since been released (FRAX 3.1). The effect of the offset variable after adjustment for FRAX probability of major osteoporotic fracture was found to be identical using the Canadian, US Caucasian, UK, and Swedish tools (all HR, 1.12; 95% CI, 1.06–1.18).

In summary, we believe that it is feasible to use a simple arithmetic procedure based upon the lumbar spine–femoral neck offset to adjust major osteoporotic fracture risk predictions from FRAX based upon the femoral neck to more accurately depict person-specific fracture risk. This shifts some individuals into a different risk category and thereby facilitates identification and treatment of those who would most benefit.