Introduction

Osteoporosis-related fractures are common and have a significant societal impact in terms of human and economic costs [1]. Dual-energy X-ray absorptiometry (DXA) is widely used to measure bone mineral density (BMD) for the purposes of osteoporosis diagnosis and fracture risk assessment [2, 3]. Treatment is often initiated on the basis of low BMD, either used alone or in conjunction with other clinical risk factors [4,5,6]. In appropriately selected women, approved treatments can reduce fracture risk in primary and secondary prevention settings [7,8,9].

Although BMD monitoring during the initial 3–5 years of treatment is recommended by some clinical practice guidelines [5, 10], the role of repeat BMD testing during pharmacotherapy remains controversial [11,12,13]. Group-level clinical trial data suggest that larger BMD increases are associated with greater fracture risk reduction [14, 15]. A meta-regression of published trials found that greater improvements in total hip, femoral neck, and lumbar spine BMD were strongly associated with greater reductions in vertebral fractures; greater improvements in total hip and femoral neck (but not lumbar spine) BMD were strongly associated with greater reductions in hip fractures [16]. In a clinical registry-based analysis, treatment-related increases in total hip BMD were associated with reduced fracture risk compared with stable BMD, and decreases in total hip BMD were associated with greater risk for fractures [17].

Technical challenges in monitoring BMD in individual patients include measurement error (typically 3–5%) [2, 18]. An additional source of uncertainty concerns the optimal site for BMD monitoring. Discordance between spine and hip BMD T-scores is common in routine clinical practice [19, 20], and discordance also frequently arises in the BMD change observed at different measurement sites [21].

The current study was performed to compare BMD monitoring at the hip versus the lumbar spine in women initiating anti-osteoporosis drug therapy in the routine clinical practice setting. We used population-based registries from the Province of Manitoba, Canada, to assess whether anti-fracture effects were more strongly associated with change in hip BMD or change in lumbar spine BMD.

Methods

Patient population

The study population consisted of all women aged 40 years and older, not receiving osteoporosis therapy at baseline, who subsequently initiated anti-osteoporosis therapy and had repeat BMD testing > 1 year later. Using linkage to the province-wide retail pharmacy network [22], we identified women without significant use of systemic estrogen or other anti-osteoporosis medication in the year prior to baseline BMD testing (defined as < 3 months of pharmacy-dispensed bisphosphonate, calcitonin, systemic estrogen product, raloxifene, or teriparatide) who subsequently received therapy in the year following BMD testing. We excluded women without health care coverage, without a full year of coverage before or after the baseline BMD test, and with missing total hip, femoral neck, or lumbar spine BMD measurements at baseline or follow-up. All baseline scans were performed between April 1, 1999, and December 31, 2014, with follow-up scans to March 31, 2016.
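The cohort selection rules above can be expressed as a single eligibility check. This is a minimal sketch assuming one pre-summarized record per woman; the record layout, the field names, the reading of "< 3 months" as fewer than 90 days supplied, and of "> 1 year later" as more than 365 days are all hypothetical, and the actual linkage across pharmacy, coverage, and DXA databases is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

BASELINE_START, BASELINE_END = date(1999, 4, 1), date(2014, 12, 31)
FOLLOWUP_END = date(2016, 3, 31)
MAX_PRIOR_DAYS_SUPPLIED = 90  # assumption: "< 3 months" taken as < 90 days supplied


@dataclass
class Candidate:
    """Hypothetical per-woman summary record; field names are illustrative only."""
    age_at_baseline: float
    prior_year_days_supplied: int  # bisphosphonate, calcitonin, systemic estrogen, raloxifene, teriparatide
    treated_in_year_after: bool    # anti-osteoporosis therapy dispensed in the year after baseline DXA
    coverage_year_before: bool     # full year of health coverage before baseline DXA
    coverage_year_after: bool      # full year of health coverage after baseline DXA
    baseline_scan: date
    followup_scan: Optional[date]
    complete_bmd: bool             # total hip, femoral neck and lumbar spine BMD at both scans


def is_eligible(c: Candidate) -> bool:
    """Simplified version of the inclusion and exclusion rules described in the text."""
    if c.age_at_baseline < 40:
        return False
    if c.prior_year_days_supplied >= MAX_PRIOR_DAYS_SUPPLIED:
        return False  # significant prior use, so not a treatment initiator
    if not (c.treated_in_year_after and c.coverage_year_before and c.coverage_year_after):
        return False
    if not BASELINE_START <= c.baseline_scan <= BASELINE_END:
        return False
    if c.followup_scan is None or c.followup_scan > FOLLOWUP_END or not c.complete_bmd:
        return False
    # Repeat BMD testing must occur more than 1 year later (assumption: > 365 days).
    return (c.followup_scan - c.baseline_scan).days > 365
```

Keeping every rule in one function makes it easy to count how many records each criterion removes, which is what a selection flow chart reports.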

In the Canadian Province of Manitoba (population 1.3 million in 2017), health services are provided to virtually all residents through a public healthcare system. DXA testing has been managed as an integrated program since 1997; criteria for baseline testing include screening of women at age 65 years and of younger women with additional risk factors [23]. Consistent with national guidelines, the program’s recommended interval for initial follow-up is 3 years for most patients, 1 year in those on systemic glucocorticoid therapy or aromatase inhibitors, and at least 5 years if previously reported as low risk [18]. The program maintains a database of all DXA results that can be linked with other population-based computerized health databases through an anonymous personal identifier. The DXA database has completeness and accuracy in excess of 99% [24]. The study was approved by the Health Research Ethics Board for the University of Manitoba.

Bone mineral density measurements

Lumbar spine and hip DXA scans were performed and analyzed in accordance with the manufacturer’s recommendations. Our in-house scheduling software automatically books patients on the same scanner that was used for the baseline examination. Femoral neck and total hip T-scores (number of SDs above or below young adult mean BMD) were calculated from NHANES III white female reference values [25]; lumbar spine (L1–4) T-scores were based on the manufacturer’s white female reference values. Quality assurance for the program is strictly supervised by a medical physicist [23]. The six cross-calibrated instruments used for this study (3 Prodigy and 3 iDXA, GE/Lunar Healthcare, Madison, WI) exhibited stable long-term performance (coefficient of variation < 0.5%). All reporting physicians and supervising technologists are required to maintain DXA certification with the International Society for Clinical Densitometry (ISCD).

The absolute BMD difference between the two DXA tests (in g/cm²) was compared with 95% least significant change (LSC) values using accepted methods, where the LSC is the smallest BMD change that can be considered statistically significant [18, 26, 27]. BMD measurement error for the Manitoba BMD program, used for computing the LSC, is derived from more than 400 DXA scan pairs (most performed by different technologists on different days within 28 days). Our LSC procedure requires every DXA technologist in Manitoba to participate in precision assessment, initially following any major equipment change and thereafter as an ongoing activity (10 per technologist per year). This ensures that the LSC is current and reflects the full spectrum of DXA technologist skill and experience. We have previously reported that this approach (rather than same-day repositioning by the same technologist) is more representative of the measurement error encountered during clinical monitoring [28]. From these scan pairs, we obtained the following 95% LSC values, which are within acceptable ranges [17]: total hip, 0.03 g/cm²; lumbar spine, 0.05 g/cm²; femoral neck, 0.055 g/cm². An observed absolute difference smaller than these values was considered to be within the range of measurement error (designated stable), whereas an increase or decrease equal to or exceeding these values was considered to be outside the range of measurement error (designated a detectable increase or decrease in BMD, respectively).
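The path from duplicate scans to a change category can be sketched as follows. This assumes the conventional ISCD-style calculation (root-mean-square SD across scan pairs, multiplied by 2.77, which is 1.96 × √2 rounded); the text cites "accepted methods" without spelling out the formula, so treat that step as an assumption. The function names are hypothetical; the site-specific LSC values are the ones reported above.

```python
import math
from typing import Iterable, Tuple

# Site-specific 95% LSC values reported for the Manitoba program (g/cm²).
LSC_G_CM2 = {"total_hip": 0.03, "lumbar_spine": 0.05, "femoral_neck": 0.055}


def rms_sd(pairs: Iterable[Tuple[float, float]]) -> float:
    """Root-mean-square SD (g/cm²) from duplicate scans of the same people.

    For two measurements a and b of one person, the variance is (a - b)**2 / 2."""
    variances = [(a - b) ** 2 / 2 for a, b in pairs]
    if not variances:
        raise ValueError("need at least one scan pair")
    return math.sqrt(sum(variances) / len(variances))


def lsc_95(precision_sd: float) -> float:
    """95% least significant change: 1.96 * sqrt(2) * precision, about 2.77 * precision."""
    return 2.77 * precision_sd


def classify_change(baseline: float, followup: float, site: str) -> str:
    """Label a BMD change as 'stable', 'increase' or 'decrease' against the site LSC."""
    # BMD is typically reported to 3 decimals; rounding the difference avoids
    # floating-point artefacts exactly at the LSC boundary (e.g. 0.830 - 0.800).
    delta = round(followup - baseline, 3)
    if abs(delta) < LSC_G_CM2[site]:
        return "stable"  # within measurement error
    return "increase" if delta > 0 else "decrease"  # equal to or exceeding the LSC
```

For example, a total hip change from 0.800 to 0.830 g/cm² exactly reaches the 0.03 g/cm² LSC and is therefore a detectable increase, whereas the same 0.030 g/cm² change at the lumbar spine would still be classified as stable.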

Baseline fracture probability calculations

Ten-year probabilities of major osteoporotic fracture (MOF) and hip fracture were calculated using the World Health Organization fracture risk assessment tool, Canadian version (FRAX® Desktop Multi-Patient Entry, version 3.7) [29, 30]. Briefly, age, body mass index (BMI), femoral neck BMD, and other data required for calculating fracture probability with FRAX were assessed through a combination of measurements (height and weight), information collected directly from subjects at the time of DXA scanning, hospital discharge abstracts (diagnoses and procedures coded using the International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] prior to 2004 and International Classification of Diseases, Tenth Revision, Canadian Enhancements [ICD-10-CA] thereafter), and physician billing claims (coded using ICD-9-CM) as previously described [31]. The Canadian FRAX tool was calibrated using nationwide hip fracture data, and its predictions agree closely with observed fracture risk in our population [32, 33].

Fracture outcomes

Manitoba Health records were assessed for incident non-traumatic hip, clinical vertebral, forearm, and humerus fracture diagnostic codes (collectively designated “major osteoporotic” fractures) using previously validated algorithms [34, 35]. Fractures not associated with trauma codes were identified through a combination of hospital discharge abstracts and physician billing claims. We required that hip and forearm fracture codes be accompanied by site-specific fracture reduction, fixation, or casting codes to enhance specificity for an acute fracture event. To minimize misclassification of care for a previous fracture as a new incident fracture, we conservatively required that there be no hospitalization or physician visit for the same fracture type in the 6 months preceding an incident fracture diagnosis.
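The 6-month look-back rule that separates incident fractures from ongoing care of an earlier fracture can be sketched as below. This is a simplified illustration, not the validated algorithm of [34, 35]: it assumes encounters have already been restricted to one woman and one fracture type, with trauma-coded events removed and (for hip and forearm) procedure codes matched. Approximating 6 months as 183 days and the function name are hypothetical choices.

```python
from datetime import date, timedelta
from typing import List, Optional

# Assumption: "6 months" approximated as 183 days.
WASHOUT = timedelta(days=183)


def incident_fracture_dates(encounters: List[date]) -> List[date]:
    """Return the encounters that count as incident fractures.

    An encounter qualifies only if no hospitalization or physician visit for the
    same fracture type occurred in the preceding 6 months. The input should include
    look-back encounters so that the first follow-up encounter is checked correctly."""
    incident: List[date] = []
    previous: Optional[date] = None
    for current in sorted(encounters):
        if previous is None or current - previous > WASHOUT:
            incident.append(current)
        previous = current  # any encounter, incident or follow-up care, restarts the window
    return incident
```

Because every encounter resets the window, a run of closely spaced follow-up visits yields a single incident fracture, which is the conservative behaviour the text describes.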

Statistical analysis

Statistical analyses were performed with Statistica (Version 13.0, StatSoft Inc., Tulsa, OK). Descriptive statistics for demographic and baseline characteristics are presented as mean ± SD for continuous variables or number (%) for categorical variables. Student’s t tests (continuous measures) and χ2 tests (categorical measures) were used to test for between-group differences. Time to incident fracture following the first DXA scan (index date) was studied using Cox proportional hazards regression. Observations were censored for death, migration out of province, or end of follow-up (March 31, 2017). Our primary analysis examined incident MOF as the outcome of interest, with BMD change as a categorical measure: stable (referent), detectable decrease, or detectable increase. Models considered total hip BMD change and lumbar spine BMD change, first separately and then simultaneously. We reported hazard ratios (HRs) with 95% confidence intervals (CIs), together with the overall Wald statistic, as measures of effect size. HRs were adjusted for baseline fracture probability, using the log-transformed FRAX probability to correct for its skewed distribution (MOF probability for incident MOF or vertebral fracture; hip fracture probability for incident hip fracture). The proportional hazards assumption was confirmed by testing scaled Schoenfeld residuals versus time. We undertook a series of secondary sensitivity analyses to test the robustness of our findings. First, we examined three continuous measures of BMD change: absolute change (g/cm²), annualized change, and annualized percent change. Second, we considered hip fracture and vertebral fracture as the outcomes of interest. Third, we repeated analyses using the femoral neck instead of the total hip as the monitoring site.
Last, we repeated the categorical analyses after ISCD-certified reporting physicians had excluded lumbar spine levels with structural artifact, using LSC limits adjusted for the loss in spine precision due to the smaller number of available vertebrae [36].
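The three continuous change measures and the log-transformed FRAX covariate can be computed as below. This is a minimal sketch under assumptions the text does not state: annualization is simple division by the testing interval in years (no compounding), percent change is relative to baseline BMD, and the natural logarithm is applied to the FRAX probability expressed in percent. All names are hypothetical, and the Cox model itself (fitted in Statistica in the study) is not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BmdChange:
    absolute: float            # g/cm²
    annualized: float          # g/cm² per year
    annualized_percent: float  # % of baseline BMD per year


def bmd_change(baseline: float, followup: float, interval_years: float) -> BmdChange:
    """Compute the three continuous change measures (simple, non-compounded annualization)."""
    if baseline <= 0 or interval_years <= 0:
        raise ValueError("baseline BMD and testing interval must be positive")
    absolute = followup - baseline
    annualized = absolute / interval_years
    return BmdChange(absolute, annualized, 100.0 * annualized / baseline)


def log_frax(probability_percent: float) -> float:
    """Natural log of a 10-year FRAX probability (in %), used as the adjustment covariate."""
    if probability_percent <= 0:
        raise ValueError("FRAX probability must be positive")
    return math.log(probability_percent)
```

For instance, a gain from 0.800 to 0.840 g/cm² over 4 years gives 0.04 g/cm² absolute, 0.01 g/cm² per year, and 1.25% per year. Annualizing is what lets women with very different testing intervals be compared on one scale.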

Results

The study population selection flow chart is shown in Supplemental Fig. 1. The final study population consisted of 6093 women, whose baseline characteristics are summarized in Table 1. Mean age at baseline was 63.4 ± 9.8 years, and the mean BMD testing interval was 4.7 ± 2.6 years. The majority (56.5%) of women met the BMD T-score definition of osteoporosis at one or more sites. Bisphosphonates accounted for the majority of treatment (70%), followed by systemic estrogen (23%), raloxifene (5%), calcitonin (2%), and denosumab and teriparatide (< 1%).

Table 1 Baseline characteristics of the study population

Table 2 summarizes detectable BMD change according to measurement site. Total hip and lumbar spine change categories were concordant in 57.2% of cases. A detectable decrease in BMD was seen in 19.8% of total hip measurements vs 10.8% of lumbar spine measurements, while a detectable increase was seen in 30.7% of total hip and 39.7% of lumbar spine measurements. Total hip and lumbar spine therefore each showed a detectable change in 50.5% of cases (stable in 49.5%). In contrast, the femoral neck showed a detectable change in only 20.2% of cases. Overall concordance for the femoral neck vs lumbar spine was 53.1%. A detectable decrease was equally frequent at the femoral neck and lumbar spine (10.8% each), while a detectable increase was more frequent at the lumbar spine than at the femoral neck (39.7 vs 9.4%).
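Concordance and the per-site proportions in Table 2 both come from a 3 × 3 cross-tabulation of hip versus spine change categories: concordance is the share of women on the diagonal, and the proportion with a detectable change at a site is the sum of its decrease and increase margins (e.g., 19.8% + 30.7% = 50.5% for the total hip). The sketch below uses hypothetical counts, not study data, and hypothetical function names.

```python
from typing import Dict, Tuple

CATEGORIES = ("decrease", "stable", "increase")


def summarize_crosstab(counts: Dict[Tuple[str, str], int]):
    """counts[(hip_category, spine_category)] -> number of women.

    Returns the overall concordance (share of women in the same category at both
    sites) and the marginal category shares for the first and second site."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty cross-tabulation")
    concordance = sum(counts.get((c, c), 0) for c in CATEGORIES) / total

    def marginal(position: int) -> Dict[str, float]:
        return {c: sum(n for key, n in counts.items() if key[position] == c) / total
                for c in CATEGORIES}

    return concordance, marginal(0), marginal(1)


def detectable_share(marginals: Dict[str, float]) -> float:
    """Share of women with a detectable change (decrease or increase) at one site."""
    return marginals["decrease"] + marginals["increase"]


# Hypothetical counts for 100 women, keyed (hip category, spine category).
example = {
    ("decrease", "decrease"): 5, ("decrease", "stable"): 8, ("decrease", "increase"): 2,
    ("stable", "decrease"): 3, ("stable", "stable"): 30, ("stable", "increase"): 17,
    ("increase", "decrease"): 2, ("increase", "stable"): 10, ("increase", "increase"): 23,
}
concordance, hip, spine = summarize_crosstab(example)
```

With these made-up counts, concordance is 0.58 and the detectable-change shares are 0.50 for the hip and 0.52 for the spine. Two sites can have similar detectable-change shares yet still disagree often, because concordance depends on the off-diagonal cells.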

Table 2 Detectable change in bone mineral density (BMD) between first and second DXA examinations

During a mean follow-up of 12.1 ± 3.6 years, 995 women sustained one or more incident MOFs, 246 sustained a hip fracture, and 301 sustained a clinical vertebral fracture. Table 3 shows the mean change in BMD (absolute change, annualized change, and annualized percent change) according to fracture outcome and site of BMD measurement. For the total hip, BMD increased significantly more in fracture-free women than in women with incident fracture, regardless of whether change was expressed as absolute, annualized, or annualized percent change. For the femoral neck, a significantly greater increase in BMD was seen in women without incident MOF or hip fracture, but the difference was not significant for incident vertebral fracture. In contrast, change in lumbar spine BMD was not associated with incident MOF or hip fracture, and, paradoxically, women without incident vertebral fracture had a smaller (not larger) BMD increase than women who sustained a vertebral fracture.

Table 3 Unadjusted change in bone mineral density (BMD) according to incident fracture status

Table 4 shows HRs for incident fracture, adjusted for baseline fracture probability, according to detectable BMD change in models that considered hip and spine measurements first separately and then simultaneously. Detectable decreases in total hip BMD and in lumbar spine BMD were each associated with increased risk for incident MOF when considered separately, but in the combined model only a detectable decrease in total hip BMD continued to predict increased fracture risk (HR 1.46, 95% CI 1.24–1.73). A detectable increase in total hip BMD was associated with reduced risk for incident MOF (HR 0.71, 95% CI 0.61–0.83), whereas a detectable increase in lumbar spine BMD was not (HR 0.93, 95% CI 0.81–1.06). As a monitoring site, total hip BMD change was more strongly associated with incident MOF (Wald 60.6, P < 0.001) than lumbar spine BMD change (Wald 10.9, P = 0.004), and it showed minimal attenuation in the combined model (total hip Wald 51.9, P < 0.001 vs lumbar spine Wald 1.7, P = 0.428). Similar results were seen with incident hip fracture and incident vertebral fracture as the outcomes of interest. In both sets of analyses, a detectable decrease in total hip BMD was associated with increased fracture risk, whereas a detectable increase was associated with reduced fracture risk, and these associations were largely unaffected by simultaneous adjustment for change in lumbar spine BMD. In contrast, after adjustment for change in total hip BMD, a detectable decrease in lumbar spine BMD did not predict increased fracture risk, and a detectable increase in lumbar spine BMD did not predict reduced fracture risk. Results were broadly similar for femoral neck vs lumbar spine BMD monitoring (Supplemental Table 1) and after exclusion of vertebral levels with structural artifact (Supplemental Table 2). Once again, lumbar spine BMD change was not a reliable predictor of anti-fracture effects when combined with femoral neck BMD change.
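The reported lumbar spine P values (Wald 10.9, P = 0.004; Wald 1.7, P = 0.428) are what one obtains when each Wald statistic is referred to a chi-square distribution with 2 degrees of freedom, the natural choice for a three-level categorical exposure (two indicator terms against the stable referent). That degrees-of-freedom reading is our assumption; the text does not state it. The check below uses the closed-form chi-square upper tail for even degrees of freedom, so it needs only the standard library; the function and dictionary names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with an even number
    of degrees of freedom: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!."""
    if x < 0 or df <= 0 or df % 2:
        raise ValueError("x must be >= 0 and df a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)**k / k! incrementally
        total += term
    return math.exp(-half) * total


# Wald statistics reported for the MOF outcome models.
reported = {"spine, separate": 10.9, "spine, combined": 1.7,
            "hip, separate": 60.6, "hip, combined": 51.9}
p_values = {name: chi2_sf_even_df(w, df=2) for name, w in reported.items()}
```

For 2 degrees of freedom the tail reduces to exp(−W/2): exp(−10.9/2) ≈ 0.0043 and exp(−1.7/2) ≈ 0.427, matching the reported values once rounding of W to one decimal place is allowed for.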

Table 4 Adjusted hazard ratio (HR) with 95% confidence interval (CI) for incident fracture according to detectable change in total hip and lumbar spine bone mineral density (BMD), with sites considered separately and simultaneously

We also examined BMD change as a continuous measure, expressed as HR per SD. Consistent with the categorical analyses, each SD increase in total hip BMD change was associated with a significant reduction in incident MOF, hip fracture, and vertebral fracture, both before and after simultaneous adjustment for lumbar spine BMD change (Fig. 1). In contrast, an increase in lumbar spine BMD showed only a weak trend towards lower incident MOF risk when tested separately and was not associated with incident hip or vertebral fracture. After simultaneous adjustment for change in total hip BMD, an increase in lumbar spine BMD did not predict reduced fracture risk (paradoxically, fracture risk appeared slightly greater). Similar results were seen in continuous analyses of femoral neck versus lumbar spine BMD change (Supplemental Fig. 2).
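Expressing a continuous exposure per SD is a rescaling of the Cox coefficient: if β is the log-hazard per g/cm² and s is the SD of the change variable, the HR per SD is exp(β·s), and the confidence limits scale by the same factor. Equivalently, one can standardize the exposure before fitting. The sketch below uses hypothetical inputs (the study's coefficients are not given in the text), and the function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def standardize(values: List[float]) -> List[float]:
    """Convert raw BMD changes to SD units (needs at least two values);
    a Cox model fitted on these returns coefficients already expressed per SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def hr_per_sd(beta_per_unit: float, se_per_unit: float, sd: float,
              z: float = 1.96) -> Tuple[float, float, float]:
    """Rescale a per-unit log-hazard coefficient and its standard error to one SD
    of the exposure. Returns (HR, lower 95% limit, upper 95% limit)."""
    log_hr = beta_per_unit * sd
    half_width = z * se_per_unit * sd
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)


# Hypothetical example: beta = -10 per g/cm², SE = 2, SD of the change variable = 0.03 g/cm².
hr, lower, upper = hr_per_sd(-10.0, 2.0, 0.03)
```

Scaling per SD puts hip and spine change on a common footing even though their raw variability differs, which is why it suits a side-by-side comparison of monitoring sites.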

Fig. 1

Adjusted hazard ratio (HR) with 95% confidence interval (CI) for incident fracture according to continuous change in total hip and lumbar spine bone mineral density (BMD), with sites considered separately and simultaneously. Results adjusted for baseline fracture probability. a Incident major osteoporotic fracture (MOF). b Incident hip fracture. c Incident vertebral fracture

Discussion

In this large clinical cohort of women initiating anti-osteoporosis therapy, we related fracture outcomes to BMD change. There was consistent evidence that change in total hip BMD, whether assessed categorically or as a continuous measure, was associated with incident fracture outcomes. Femoral neck BMD change showed a similar but slightly less robust association. In contrast, lumbar spine BMD change was less strongly associated with fracture outcomes, and any effect was attenuated (or reversed) after simultaneous consideration of hip BMD change.

Our results may at first appear paradoxical, since lumbar spine BMD shows a larger increase than hip BMD in individuals initiating treatment. However, this larger response is offset by lower test precision (a larger LSC) and by structural artifact (e.g., degenerative spondylosis) that can increase measured BMD independently of treatment [36, 37]. Indeed, the placebo arms of many clinical trials show an increase in spine BMD while hip BMD decreases [16]. Finally, vertebral fractures in the lumbar spine could falsely elevate spine BMD despite best efforts to exclude affected vertebrae from the region of interest.

Our findings are partly consistent with the meta-regression of clinical trial data from Bouxsein et al. [16]. In that pooled analysis of multiple clinical trials, change in hip BMD was more strongly associated with hip fracture outcomes than change in lumbar spine BMD. However, that analysis also found that change in lumbar spine BMD was more strongly associated with vertebral fracture outcomes than change in hip BMD, although the two sites were not assessed simultaneously. The weaker performance of lumbar spine BMD change in our study may reflect a higher prevalence of degenerative and structural artifact in a clinical population than among women enrolled in clinical trials.

Strengths of our study include broad inclusion criteria, which are representative of women treated in routine clinical practice, and access to a DXA registry with rigorous data collection and linkage to administrative healthcare databases [24]. Our results are therefore likely to be more applicable to clinical practice than results from prospective research cohorts [38] or clinical trials [13]. We also acknowledge several limitations. Firstly, the threat of “confounding by indication” exists in all observational studies; however, the fact that all women in this study initiated treatment mitigates this concern to a great degree. In addition, women with strong contraindications to therapy or with primary non-adherence (i.e., not filling an initial prescription) were excluded from our study, although this also reflects a common clinical reality. Secondly, the BMD testing interval was not standardized. Whereas a fixed testing interval would be used in a clinical trial or a research cohort, this cannot be strictly enforced in clinical practice. However, alternative (annualized) measures of change did not alter our findings, suggesting that this is not an important limitation. Our study also cannot provide specific guidance on the optimal testing interval, although a relatively short interval (e.g., 1–2 years after initiating therapy) is often recommended [5, 10]. Thirdly, we did not consider changes in clinical management that might have followed the results of the second BMD measurement. A detectable decrease in BMD might prompt improved adherence or a change in therapy. Alternatively, individuals with a detectable increase in BMD might be considered for a “drug holiday.” Either action would be expected to bias our results towards the null; therefore, our findings are likely to be conservative.
Fourthly, our study population was predominantly treated with bisphosphonates, and whether similar results would be seen for anabolic agents is uncertain. Finally, our study does not address the question of whether monitoring is useful in terms of guiding appropriate changes in therapy that ultimately lead to better fracture outcomes.

In conclusion, treatment-related increases in total hip BMD are associated with lower MOF, hip, and clinical vertebral fracture risk compared with stable BMD, while BMD decreases are associated with higher fracture risk. In contrast, spine BMD change is not independently associated with fracture risk. These findings may help to inform clinical management regarding BMD monitoring in women initiating anti-osteoporosis therapy. If monitoring is performed, the total hip site provides a better indicator of an anti-fracture effect than the lumbar spine.