Introduction

Evidence-based osteoporosis screening guidelines support bone density testing for elderly women in the US general population, but guidelines for younger postmenopausal women are less certain. The 2002 US Preventive Services Task Force Guidelines recommended routine bone density testing for all women aged 65 years or older (grade B recommendation) [1]. Based on evidence that the presence of certain clinical factors (e.g. lower body weight, hormone therapy) influences outcomes, the Task Force extended its recommendations to include routine screening in women aged 60–64 years who are at increased risk of osteoporosis and fracture. The Task Force made no recommendation for or against routine screening in postmenopausal women younger than age 60 years or in women aged 60 to 64 years without extra risk factors (grade C recommendation) [2]. Recently, peripheral DXA was found to be highly predictive of fracture risk in the National Osteoporosis Risk Assessment (NORA) longitudinal observational study of 200,160 postmenopausal women aged 50 years and older recruited from primary care practices in 34 states [3]. At baseline, 7% of the NORA cohort had osteoporosis and 11% had fractures of the wrist, spine, rib and hip since age 45 years, indicating a need for better strategies to identify and manage osteoporosis across a wider age spectrum of postmenopausal women.

Risk factor prescreening can be used to target bone density testing to postmenopausal women at higher risk of osteoporosis. Several risk assessment tools have been developed for this purpose [4–9]. The development cohorts for the Osteoporosis Self-assessment Tool (OST) [4], Osteoporosis Risk Assessment Instrument (ORAI) [5], and Simple Calculated Osteoporosis Risk Estimation (SCORE) [6] included postmenopausal women as young as 45 years. Diagnostic accuracy measures were not stratified by age in the original studies.

In our study, we compared the diagnostic accuracy of the OST, ORAI and SCORE in postmenopausal white women aged 45–64 years and aged 65 years or older in a large clinic-based study population. We determined whether these tools could identify osteoporosis equally well in younger and older postmenopausal women considered for bone density testing.

Materials and methods

Subjects

We conducted our study using data from a previously recruited sample of 4035 eligible consecutive postmenopausal women aged 45–96 years [10]. The study participants either self-referred or were referred by a physician for a bone mineral density scan between January 1996 and September 1999 to an outpatient osteoporosis center at the University of Liege, Liege, Belgium. The province of Liege had a population of 1,016,762 in 1998. The osteoporosis center at the university is the major referral center in the province. Referral of the study participants followed diagnostic suspicion of osteoporosis by the referring physician. Premenopausal patients and those with Paget’s disease or advanced osteoarthritis were not eligible to participate. Informed consent was obtained from all eligible study participants for whom complete medical records were available. The research protocol was reviewed and approved by the institutional review board of the University of Liege.

Variables and risk scores

We used the following independent variables assessed by chart review to calculate the risk scores for the OST, ORAI and SCORE: age, weight (kg or pounds), race (non-black race versus black race), history of rheumatoid arthritis, history of non-traumatic fracture of the wrist, rib or hip after 45 years of age, and estrogen use (never-use, ever-use or current use). Formulae for calculating the risk scores in the development cohorts are summarized in Table 1. Statistical weights used to calculate the OST and SCORE were derived by modifying linear regression coefficients to yield integer values. Statistical weights used to calculate the ORAI were derived by rounding odds ratios in the logistic regression model to the nearest integer.
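As an illustration of how the three risk scores are computed from the chart variables, the following minimal Python sketch uses the point weights as they are commonly reported in the published literature on these tools. These weights are an assumption here and are not reproduced from Table 1 of this article, which remains the authoritative source; the function and parameter names are hypothetical.

```python
import math

def ost_score(weight_kg, age_years):
    # Assumed formula: 0.2 x (weight in kg - age in years), truncated to an integer.
    # Lower OST scores indicate higher risk.
    return math.trunc((weight_kg - age_years) / 5)

def orai_score(age_years, weight_kg, current_estrogen):
    # Assumed point weights; higher ORAI scores indicate higher risk.
    if age_years >= 75:
        age_pts = 15
    elif age_years >= 65:
        age_pts = 9
    elif age_years >= 55:
        age_pts = 5
    else:
        age_pts = 0
    if weight_kg < 60:
        weight_pts = 9
    elif weight_kg < 70:
        weight_pts = 3
    else:
        weight_pts = 0
    estrogen_pts = 0 if current_estrogen else 2
    return age_pts + weight_pts + estrogen_pts

def score_score(age_years, weight_lb, non_black, rheumatoid_arthritis,
                n_fracture_types, never_estrogen):
    # Assumed point weights; higher SCORE values indicate higher risk.
    pts = 5 if non_black else 0
    pts += 4 if rheumatoid_arthritis else 0
    pts += min(4 * n_fracture_types, 12)   # wrist, rib, hip after age 45
    pts += 3 * (age_years // 10)           # 3 x first digit of age (ages 10-99)
    pts += 1 if never_estrogen else 0
    pts -= math.trunc(weight_lb / 10)      # weight in pounds / 10, truncated
    return pts

print(ost_score(50, 70), orai_score(70, 55, False),
      score_score(62, 143, True, False, 0, True))
```

Note that the SCORE takes weight in pounds whereas the OST and ORAI take weight in kilograms, which is why both units appear among the variables listed above.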

Table 1 Osteoporosis risk assessment tools analyzed in the study

For each study participant, bone mineral density was measured at the femoral neck by dual-energy X-ray absorptiometry using Hologic QDR 1000, 2000 and 4500 densitometers (Hologic Inc., Waltham, Massachusetts), which have shown very high inter-instrument correlation coefficients in cross-calibration studies [11,12]. Patients were randomly assigned to the different machines. All machines were cross-calibrated and an anthropomorphic phantom was run every morning before patients were tested to ensure data quality. Independent research personnel who were blinded regarding the specific study hypotheses performed the bone density tests. Results were recorded in g/cm², and as T-scores (number of standard deviations from the mean in normal young adults) based on the NHANES III reference values for non-Hispanic white women aged 20–29 years for the femoral neck [13]. In accordance with World Health Organization criteria, patients were considered to have osteoporosis if their femoral neck T-score was −2.5 or below [14].
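The T-score calculation and the World Health Organization classification can be sketched as follows. The reference mean and standard deviation used below (0.858 and 0.120 g/cm²) are the values commonly cited for the NHANES III femoral neck reference in non-Hispanic white women aged 20–29 years; they are an assumption here and should be checked against reference [13]. The function names are hypothetical.

```python
# Assumed NHANES III femoral neck reference values (g/cm^2); verify against [13].
REF_MEAN = 0.858
REF_SD = 0.120

def t_score(bmd, ref_mean=REF_MEAN, ref_sd=REF_SD):
    # Number of standard deviations from the young-adult reference mean.
    return (bmd - ref_mean) / ref_sd

def who_category(t):
    # WHO criteria: osteoporosis at T <= -2.5, osteopenia between -2.5 and -1.
    if t <= -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"

t = t_score(0.703)
print(round(t, 2), who_category(t))
```

Under these assumed reference values, a femoral neck bone mineral density of 0.703 g/cm² corresponds to a T-score of about −1.3.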

Statistical analysis

Descriptive statistics were tabulated for the entire study cohort aged 45–96 years. Other statistical analyses were performed separately for women aged 45–64 years and women aged 65 years or older, including construction of separate ROC curves and use of different threshold scores to calculate diagnostic accuracy measures for each age group.

We constructed receiver operating characteristic (ROC) curves for each risk assessment tool, using osteoporosis of the femoral neck as the reference variable and calculated risk score as the classification variable. The area under the ROC curve can range from 0.5 (test offers no information) to 1.0 (perfect test) and is higher for tests with higher overall diagnostic accuracy. We compared the areas under the ROC curves of the three risk assessment tools within each age group using methods for correlated ROC curves. We also compared the areas under the ROC curves for a single tool across age groups, i.e. the area under the ROC curve of each tool in women aged 45–64 years compared to the area under the ROC curve of the same tool in women aged 65 years or older, using methods for uncorrelated ROC curves.
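The area under the ROC curve and a comparison of two areas from independent samples can be sketched as below. The article does not name the specific method used for uncorrelated curves; the Hanley and McNeil standard error with a z-test is used here as one common choice and is an assumption, as are the function names and the toy scores.

```python
import math

def auc_mann_whitney(scores_pos, scores_neg):
    # AUC = P(diseased score > non-diseased score) + 0.5 * P(tie).
    # Assumes higher scores mean higher risk; negate OST scores first,
    # because lower OST scores indicate higher risk.
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def hanley_mcneil_se(auc, n_pos, n_neg):
    # Approximate standard error of an AUC (Hanley and McNeil).
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

def compare_independent_aucs(auc1, se1, auc2, se2):
    # z-test for two AUCs estimated in independent groups; two-sided P-value.
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Toy scores, purely illustrative.
auc = auc_mann_whitney([3, 4, 5], [1, 2, 3])
print(round(auc, 3))
```

For comparisons between tools within the same age group, the two areas are estimated on the same women and are therefore correlated, so this independent-samples test would not apply.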

Diagnostic accuracy measures were calculated for each risk assessment tool using a single threshold score to achieve approximately 90% sensitivity to detect osteoporosis at the femoral neck. This level of sensitivity was suitable for the purpose of prescreening to select a population appropriate for diagnostic testing by bone densitometry. We used exact methods to calculate binomial 95% confidence intervals for sensitivity and specificity [15].
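Sensitivity, specificity and exact binomial confidence intervals can be computed as in the following sketch. It assumes the Clopper-Pearson construction for the exact interval, which is the usual meaning of "exact" in this context but is not named in the text; the counts in the example are hypothetical and are not study data.

```python
import math

def sens_spec(tp, fn, tn, fp):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p).
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    # Exact (Clopper-Pearson) interval for k successes in n trials,
    # found by bisection on the binomial CDF, which decreases in p.
    def solve(cdf_at, target):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

# Hypothetical counts for illustration only.
sens, spec = sens_spec(tp=9, fn=1, tn=4, fp=6)
print(sens, spec, clopper_pearson(9, 10))
```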

Multi-category likelihood ratios (LRs) with 95% confidence intervals (large sample approximations) were calculated using two threshold scores to create three categories of risk (low, intermediate, high) for each tool. LR values further from 1.0 indicate a more useful test to discriminate between patients at lower and higher risk of disease. The threshold scores were set to achieve an LR of 0.1–0.2 for the low-risk group and 5–10 for the high-risk group, since those LR ranges generate moderate shifts in pre- to post-test probability [16]. In the case of the ORAI, the maximum achievable LR for the high-risk group was less than 5.
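A category-specific likelihood ratio is the proportion of women with osteoporosis whose score falls in that category divided by the proportion of women without osteoporosis whose score falls in the same category. The sketch below assumes the standard log-scale large-sample confidence interval, which the text does not spell out. In the example, 30 of 139 women with osteoporosis in the high-risk category is a count reported in the Results; the count of 77 of 2400 women without osteoporosis is hypothetical, chosen only so that the ratio matches the reported value of 6.73.

```python
import math

def likelihood_ratio(cat_pos, total_pos, cat_neg, total_neg, z=1.96):
    # LR = P(category | disease) / P(category | no disease).
    lr = (cat_pos / total_pos) / (cat_neg / total_neg)
    # Large-sample standard error of ln(LR).
    se_log = math.sqrt(1 / cat_pos - 1 / total_pos
                       + 1 / cat_neg - 1 / total_neg)
    lower = lr * math.exp(-z * se_log)
    upper = lr * math.exp(z * se_log)
    return lr, lower, upper

# 77 is an assumed count, not taken from the article.
lr, lower, upper = likelihood_ratio(30, 139, 77, 2400)
print(round(lr, 2), round(lower, 2), round(upper, 2))
```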

A P-value of 0.05 or less was considered significant for all statistical tests. The Stata 8.0 software was used for all analyses.

Results

The women ranged in age from 45 to 96 years, with a mean age of 61.5 years (Table 2). All women were postmenopausal and of white race. Their mean weight was 65.1 kg and their mean femoral neck bone mineral density of 0.703 g/cm² was in the osteopenic range. The prevalence of osteoporosis was 5.5% (139/2539) among women aged 45–64 years, 16.1% (241/1496) among women aged 65 years or older and 9.4% (380/4035) for the entire cohort.

Table 2 Descriptive characteristics of the study population (n=4035). BMD bone mineral density

Mean scores for the risk assessment tools reflected fewer osteoporosis risk factors in younger women as compared to older women in the cohort (Table 3). For the OST, lower scores indicate a greater number of osteoporosis risk factors; the mean OST score was 1.6 for women aged 45–64 years and −1.0 for women aged 65 years or older. For the ORAI and SCORE, higher scores indicate a greater number of osteoporosis risk factors; the mean ORAI scores were 8.0 for women aged 45–64 years and 15.6 for women aged 65 years or older, and the mean SCORE values were 7.2 and 11.7 for the respective age groups.

Table 3 Mean risk scores in the study population, stratified by age

The overall ability of the three tools to identify osteoporosis at the femoral neck in the study population subgroups did not differ significantly (Table 4). The area under the ROC curve ranged from 0.750 to 0.768 for women aged 45–64 years (P=0.23) and 0.745 to 0.762 for women aged 65 years or older (P=0.06). Comparison of the ROC curves for each tool in the uncorrelated populations also showed no statistically significant difference in the areas under the ROC curves for the younger group compared with the older group (P=0.82 for OST, P=0.88 for ORAI, P=0.64 for SCORE). ROC curves were also constructed using T-score thresholds of −1 and −2 for the reference standard. For all three tools, the areas under the curve were lower using these T-score thresholds than using the osteoporosis threshold, indicating the tools had lower discriminatory ability to identify osteopenia compared with their ability to identify osteoporosis (data available upon request).

Table 4 Area under the receiver operating characteristic curves for osteoporosis risk assessment tools in postmenopausal women aged 45–64 years versus aged 65 years or older

The sensitivities and specificities of the three tools to detect osteoporosis at the femoral neck were similar for the two age groups when different threshold scores were used (Table 5). For women aged 45–64 years, the threshold score was 1 for the OST, 8 for the ORAI and 7 for the SCORE. Specificities ranged from 39.8% to 45.0% when sensitivity was approximately 90% in the younger age group. For women aged 65 years or older, the threshold score was −1 for the OST, 13 for the ORAI and 11 for the SCORE. Specificities ranged from 42.3% to 47.5% when sensitivity was approximately 90% in the older age group.

Table 5 Ability of the risk assessment tools to identify osteoporosis at the femoral neck in postmenopausal women aged 45–64 years versus aged 65 years or older

Likelihood ratios for each tool were comparable in the two age groups (Table 6). The LR for high-risk scores on the OST was 6.73 for women aged 45–64 years and 6.99 for women aged 65 years or older; the post-test probability of osteoporosis was 28.0% and 57.3% within the high-risk categories in the respective age groups. The LR for high-risk scores on the SCORE was 5.61 for women aged 45–64 years and 5.62 for women aged 65 years or older, with post-test probabilities of 24.5% and 51.9%, respectively. The LR for high-risk scores on the ORAI was 3.60 for women aged 45–64 years and 3.45 for women aged 65 years or older, with post-test probabilities of 17.3% and 39.9%, respectively.
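The post-test probabilities follow from the pre-test probability (the prevalence in each age group) and the likelihood ratio by converting probability to odds, multiplying by the LR, and converting back. The sketch below applies this to the OST high-risk category using the prevalences and LRs reported above; the function name is hypothetical.

```python
def post_test_probability(pre_test_probability, lr):
    # probability -> odds, multiply by LR, odds -> probability
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# OST high-risk category, using prevalences of 139/2539 and 241/1496.
younger = post_test_probability(139 / 2539, 6.73)
older = post_test_probability(241 / 1496, 6.99)
print(round(100 * younger, 1), round(100 * older, 1))  # 28.0 57.3
```

Because the LRs are nearly identical in the two age groups, the roughly twofold difference in post-test probability is driven almost entirely by the difference in prevalence.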

Table 6 Likelihood ratios for low, intermediate and high-risk scores to detect osteoporosis at the femoral neck in postmenopausal women aged 45–64 years versus aged 65 years or older

The tools identified less than one-quarter of the total number of cases of osteoporosis within the cohort when a high-risk threshold score was used. Of the 139 women aged 45–64 years with osteoporosis, 30 (21.6%) were identified as high-risk by the OST, 29 (20.9%) by the ORAI and 26 (18.7%) by the SCORE. Of the 241 women aged 65 years or older with osteoporosis, 55 (22.8%) were identified as high-risk by the OST, 53 (22.0%) by the ORAI and 41 (17.0%) by the SCORE.

Discussion

The OST, ORAI and SCORE risk assessment tools showed equivalent ability to identify postmenopausal women aged 45–64 years and aged 65 years or older at higher risk of osteoporosis who might benefit from bone density testing. When different score thresholds were used for each age group, no significant difference was found in the diagnostic accuracy of each tool in the younger age group as compared to the older age group. The three tools also performed similarly when compared to each other within the younger age group. The small difference in performance among the three tools in women aged 65 years or older was not clinically meaningful and was probably an artifact of the large size of the subgroup.

The best strategy for case finding for patients at risk of fracture is still uncertain. However, several comparative studies have indicated that osteoporosis risk assessment may be a useful strategy for prescreening patients before bone density testing. Cadarette et al. assessed the diagnostic properties of the ORAI, SCORE and two other decision rules in a population-based sample of 2365 postmenopausal women aged 45 years or older from nine study centers in Canada [17]. Threshold scores recommended by the developers of the decision rules were used in this analysis. Specificity of the ORAI was 27.8% when sensitivity was 97.5% to detect osteoporosis at the femoral neck; specificity of the SCORE was 17.9% when sensitivity was 99.6%. Both tools were found to be more helpful than the National Osteoporosis Foundation guidelines and a weight criterion alone to target bone density testing in high-risk patients. Cadarette et al. also confirmed the validity of the ORAI and the OST chart and equation for identifying women with asymptomatic primary osteoporosis in a study of women aged 45 years or more from family practices of three university-affiliated hospitals [18]. A study by Geusens et al. showed that the OST, ORAI and SCORE identified osteoporosis similarly in postmenopausal women aged 45 years or older from two US studies and two studies in the Netherlands. When higher threshold scores were used for the OST and SCORE and the original threshold was used for the ORAI, specificities ranged from 52% to 58% when sensitivity was approximately 90% [19]. These studies did not analyze for a spectrum effect across the wide age range of participants [20]. We constructed separate ROC curves and used separate threshold scores to assess the diagnostic capabilities of the risk assessment tools in women aged 45–64 years and women aged 65 years or older. By analyzing these groups in parallel, we found that the diagnostic capabilities of the tools were comparable but that fewer cases were identified in the younger age group due to the lower background prevalence of disease.

We chose age 65 years as a dividing point in our analysis based on the 2002 US Preventive Services Task Force guidelines for osteoporosis screening in postmenopausal women [1]. The Task Force found good evidence that bone density testing accurately predicts fracture risk in the short term and recommended routine bone density testing after women reach 65 years of age. For this reason, prescreening tools might be of greatest interest for women under 65 years of age to help guide the decision to order a bone density test. The Task Force did not specify the exact risk factors to consider but mentioned that weight <70 kg is the single best predictor of low bone density and that lack of current use of estrogen has also been incorporated into the ORAI. They found less evidence to support other individual risk factors such as smoking, weight loss, family history, decreased physical activity and several nutrition-related variables.

Our study has several limitations. First and most importantly, we used different score thresholds to optimize risk tool performance in each age group. The score thresholds were more inclusive for the younger age group than for the older age group, reflecting the lower prevalence of osteoporosis and osteoporosis risk factors in the younger women. The similar performance of the tools in each age group supports further clinical research on osteoporosis risk assessment in postmenopausal women across a wide age range. However, our findings neither validate the tools for use in younger postmenopausal women nor imply that the particular score thresholds used in this study are appropriate for immediate clinical application. Second, our results might not be generalizable to women in the community, since our study cohort was a referral population from an osteoporosis center and since Belgian women may differ from US women in important dietary factors (e.g. calcium intake) or health behaviors (e.g. smoking, exercise). However, since the background prevalence of osteoporosis in the study cohort was lower than the estimated prevalence for US women aged 50 years and older (Third National Health and Nutrition Examination Survey, 1988–1991) [21], it is unlikely that these factors contributed to an overestimation of the discriminatory ability of these tools. Third, the risk assessment tools we evaluated were developed from populations with different characteristics from our study population (see Table 1). The OST and ORAI were developed from population-based samples and might be less likely to identify patients with known important secondary causes of osteoporosis (e.g. chronic glucocorticoid therapy, hyperparathyroidism, anorexia nervosa) who present more often to referral centers. The SCORE was developed from a clinic-based sample that might have a profile more similar to our referral population; however, since our patients were all of white race, the SCORE’s race variable could not detect extra variability of bone density in our study population. The OST and ORAI performed about as well as the SCORE in our study population, indicating that age and weight (the variables present in all three tools) were the most important factors to identify high-risk women in a variety of settings. Finally, we cannot directly correlate risk scores with fracture outcomes in this cross-sectional study. Although validation against fracture is ideal, all three risk tools were developed to estimate osteoporosis risk as an easily measured surrogate of fracture risk.

We conclude that the OST, ORAI and SCORE risk assessment tools had similar discriminatory ability to identify osteoporosis at the femoral neck in a referral population of postmenopausal white women aged 45–64 years compared to women aged 65 years or older. Risk assessment can help save costs by identifying women who have few osteoporosis risk factors and are unlikely to benefit from bone density testing [22]. Cost-saving strategies are important to avoid unnecessary testing in a younger population with a lower background prevalence of osteoporosis. We showed that the OST, ORAI and SCORE perform similarly for younger and older postmenopausal women when score thresholds are set to achieve optimal test performance for each age group. Our results suggest that further testing of these risk assessment tools in clinical settings is warranted for younger and older postmenopausal women.