Introduction

Identification of women at risk for osteoporosis is of great importance for the prevention of osteoporotic fractures. It has been estimated that screening for osteoporosis has a cost-effectiveness similar to that of screening for hypertension [15, 30, 36, 37]. To date, bone mineral density (BMD) measurements have been used to identify osteoporosis. Routine BMD measurement of all women, however, is not feasible in most populations [21, 24]: it is costly [13] and not universally available. Moreover, it is not clear which women should undergo BMD measurement. Hence, identifying a high-risk subset of women is an important element of effective preventive strategies. This is a major task in a public health setting, where the goal is to identify women for further assessment efficiently, using simple and easy clinical tools. A variety of risk predictors for low BMD have been proposed in the literature. Many of them have been reported as statistically significant without any assessment of how much they contribute to predicting BMD, that is, of their clinical relevance.

In this study we assessed whether all known risk factors for osteoporosis are equally important in predicting osteoporosis in women. We also sought to identify a set of risk factors that is more informative and more convenient to use in practice for identifying a high-risk subgroup.

Methods

We used the NHANES III public use files to develop our prediction models. To select candidate predictors, we first performed a systematic literature search to identify the risk factors for osteoporosis. Bivariate analysis was performed to rank the strength of association between each risk factor and osteoporosis, and its results guided the multiple linear regression analysis. Adjusted r², C-values, and the area under the receiver operating characteristic curve (AUC) were calculated for the different models. We finally chose one model on the basis of practicability and parsimony. Using this model, we dichotomized the predicted T-scores (described below) into osteoporosis and nonosteoporosis and computed sensitivity, specificity, and predictive values for different cutoff T-scores. The entire procedure was carried out for two femoral sites: the whole proximal femur and the femoral neck.

We used the "public use" file of the Third National Health and Nutrition Examination Survey (NHANES III), conducted in the United States during 1988–1994, to assess the potential risk predictors. NHANES III is the seventh in a series of surveys conducted since 1960 by the National Center for Health Statistics, Centers for Disease Control and Prevention, USA. It was conducted in two phases, each drawing an independent random sample: phase I from October 1988 to October 1991 and phase II from September 1991 to October 1994. A total of 33,994 participants aged 2 months and older were identified (http://www.cdc.gov/nchs/nhanes.htm). Over its 6-year duration, 20,050 participants responded to the questionnaire-based survey. Of these 20,050 participants, 14,646 had an acceptable hip bone scan; 7,532 of them (51.4%) were female, and 3,198 of the women were 51 years or older. Of these older women, 1,828 were non-Hispanic white: 869 phase I and 959 phase II participants. We used the phase II participants in our study.

We identified risk factors for osteoporosis through a Medline and Healthstar search. The search covered papers published in English and French between 1985 and 1999 with the keywords/phrases "bone density" or "densitometry, X-ray," and "risk" or "risk factors." We thus identified 888 articles, of which 14 studies were selected for identifying the risk factors [2, 3, 7, 8, 9, 12, 17, 18, 22, 23, 26, 29, 31, 32]. The inclusion criteria were BMD measurement using photon absorptiometry (SPA, CPA), single-energy X-ray absorptiometry (SXA), dual-energy X-ray absorptiometry (DXA), quantitative ultrasound (QUS), quantitative computed tomography (QCT), or quantitative magnetic resonance (QMR), and more than 100 participants aged over 40 years per study.

We chose NHANES III for our study because it placed special emphasis on obtaining information on osteoporosis and its risk factors [4, 19, 20, 34], and most of the risk factors identified in the Medline and Healthstar search were recorded in the survey. The risk factors identified are listed in Table 1. We restricted our analysis to non-Hispanic white postmenopausal women aged 51 years and older with a valid hip BMD measurement and no more than 50% missing data. Thus, 959 women were available for our study. We expressed BMD as T-scores, as recommended by the WHO. The T-score is the number of standard deviations (SD) by which an individual's BMD differs from the mean value for young adults of the same sex [37]. Osteoporosis is defined as a BMD 2.5 SD or more below the mean value for premenopausal women (T-score ≤ −2.5) [15, 37].

Table 1 Risk factors for osteoporosis identified and assessed in this study

Based on this classification, we used a T-score of −2.5 or less as the gold standard for osteoporosis (osteoporosis: T-score ≤ −2.5; nonosteoporosis: T-score > −2.5). We used BMD measured at two locations, the whole proximal femur and the femoral neck, by dual-energy X-ray absorptiometry (DXA), which uses X-ray beams of two different energies [1]. Both sites were measured in all study participants. Using their DXA T-scores, women were classified as confirmed cases of osteoporosis or as not having osteoporosis, separately for each site.
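
The T-score definition and the gold-standard rule above can be sketched in Python as follows. This is a minimal illustration, not the survey's own processing: the young-adult reference mean and SD in the example are hypothetical placeholders (the actual reference values are not reproduced here), and a T-score of exactly −2.5 is treated as osteoporotic, following the wording "−2.5 or less".

```python
def t_score(bmd: float, young_mean: float, young_sd: float) -> float:
    """Number of young-adult SDs by which an individual BMD differs from the young-adult mean."""
    if young_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (bmd - young_mean) / young_sd


def is_osteoporotic(t: float, threshold: float = -2.5) -> bool:
    """Gold-standard classification: osteoporosis if the T-score is -2.5 or less."""
    return t <= threshold


# Hypothetical reference values (g/cm^2), for illustration only.
REF_MEAN, REF_SD = 0.94, 0.12
for bmd in (0.70, 0.58):
    t = t_score(bmd, REF_MEAN, REF_SD)
    print(f"BMD {bmd:.2f} -> T-score {t:.2f}, osteoporosis: {is_osteoporotic(t)}")
```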

Each potential risk factor or predictor variable (Table 1) was cross-tabulated against osteoporosis status defined by the T-scores at each site, and P values were calculated (Table 2). Because the findings were almost identical for the two regions, only the results for the whole proximal femur are presented. Guided by this bivariate analysis, we fitted different multiple linear regression models, using the forced-entry method, to predict T-scores for each site separately. We chose models based on their ability to explain the variability in T-scores (adjusted r²) rather than on their coefficients and P values. For each model we also constructed a receiver operating characteristic (ROC) curve and determined the AUC, using osteoporosis status according to the measured T-score (gold standard) as the state variable, and calculated C-values [10, 11]. We adopted this approach because, for several models, the coefficients hardly changed and yet the P values differed markedly. All these steps were performed for both the whole proximal femur and the femoral neck.
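
As an illustration of how an AUC can be obtained with measured osteoporosis status as the state variable, the sketch below uses the Mann–Whitney formulation: the AUC is the probability that a randomly chosen osteoporotic woman has a lower predicted T-score than a randomly chosen nonosteoporotic woman, with ties counted as one half. It is a minimal sketch on hypothetical toy data, not the software used in the study, and it does not attempt the C-values of [10, 11], whose exact definition is not given here.

```python
from itertools import product
from typing import Sequence


def roc_auc(predicted_t: Sequence[float], osteoporotic: Sequence[bool]) -> float:
    """AUC via the Mann-Whitney statistic.

    Lower predicted T-scores indicate higher risk, so a case/control pair is
    concordant when the case (osteoporotic) has the LOWER predicted score.
    """
    if len(predicted_t) != len(osteoporotic):
        raise ValueError("inputs must have the same length")
    cases = [t for t, y in zip(predicted_t, osteoporotic) if y]
    controls = [t for t, y in zip(predicted_t, osteoporotic) if not y]
    if not cases or not controls:
        raise ValueError("need at least one osteoporotic and one nonosteoporotic woman")
    concordance = 0.0
    for case_t, control_t in product(cases, controls):
        if case_t < control_t:
            concordance += 1.0
        elif case_t == control_t:
            concordance += 0.5  # ties count one half
    return concordance / (len(cases) * len(controls))


# Hypothetical toy data: predicted T-scores and gold-standard status.
predicted = [-3.0, -2.2, -1.5, -2.4, -0.8, -1.0]
status = [True, True, True, False, False, False]
print(round(roc_auc(predicted, status), 3))  # prints 0.778 (7 of 9 pairs concordant)
```

The pairwise loop is quadratic, which is unproblematic for a sample of about 1,000 women; a rank-based formula would scale better for much larger data.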

Table 2 Number of respondents with risk factors by osteoporosis statusᵃ

Finally, we dichotomized the T-scores predicted by our preferred model (age and measured weight) into osteoporotic and nonosteoporotic groups, using a T-score of < −2.5 as the cutoff. We then cross-tabulated the dichotomized predicted T-scores against the gold-standard classification and calculated sensitivity, specificity, and positive and negative predictive values for different cutoffs of the predicted score (T-score < −2.5, < −2.3, < −2.0, < −1.9, etc.). Again, each region was analyzed separately, against the gold standard for that region only.
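
The cross-tabulation described above can be sketched as follows: for each cutoff, a woman is flagged when her predicted T-score lies below the cutoff, and the flags are compared with the gold standard in a 2×2 table. This is a minimal sketch on hypothetical toy data; the class and function names are illustrative, and undefined ratios (empty denominators) are returned as NaN.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CutoffPerformance:
    cutoff: float
    at_risk: int        # women flagged for DXA (predicted T-score below the cutoff)
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def performance_at_cutoffs(
    predicted_t: Sequence[float],
    osteoporotic: Sequence[bool],
    cutoffs: Sequence[float],
) -> List[CutoffPerformance]:
    """Cross-tabulate dichotomized predicted T-scores against the gold standard."""
    results = []
    for cut in cutoffs:
        tp = fp = fn = tn = 0
        for t, y in zip(predicted_t, osteoporotic):
            flagged = t < cut
            if flagged and y:
                tp += 1
            elif flagged:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        results.append(CutoffPerformance(
            cutoff=cut,
            at_risk=tp + fp,
            sensitivity=_ratio(tp, tp + fn),
            specificity=_ratio(tn, tn + fp),
            ppv=_ratio(tp, tp + fp),
            npv=_ratio(tn, tn + fn),
        ))
    return results


# Hypothetical toy data; the study's cutoffs included -2.5, -2.3, -2.0, -1.9, ...
predicted = [-3.0, -2.2, -1.5, -2.4, -0.8, -1.0]
status = [True, True, True, False, False, False]
for row in performance_at_cutoffs(predicted, status, [-2.5, -2.0, -1.0]):
    print(row)
```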

Results

Of the 959 women included in the analysis, 189 had osteoporosis. Women with osteoporosis had a mean age of 77 years, compared with 68 years for women without osteoporosis. The distribution of risk factors in the osteoporotic and nonosteoporotic groups is shown in Table 2. The following risk factors differed significantly between the two groups: age at interview, measured weight, measured height, total number of live births, hip fracture, wrist fracture, spine fracture, and maternal history of osteoporosis (results were almost identical for both regions; therefore, data for the femoral neck are not presented).

Prediction models based on whole proximal femur

For the whole proximal femur, the model containing age and measured weight was found to be the best and most clinically relevant prediction model. The adjusted r², AUC, and C-value for this and other selected models are shown in Table 3. This model estimates the T-score (age in years, weight in kg) as follows:

$$ \text{Predicted T-score} = -1.332 - 0.0404 \times \text{Age} + 0.0396 \times \text{Measured weight} \tag{1} $$

Table 3 AUC, C-values, and adjusted r² for selected models based on the whole proximal femur. AUC: area under the receiver operating characteristic (ROC) curve

The sensitivity, specificity, and predictive values for different cutoffs of the T-score predicted by our preferred model are shown in Table 4.

Table 4 Women at risk, sensitivity, specificity, and predictive values for different cutoffs of the predicted T-scores for our preferred model (variables: age and measured weight) based on whole proximal femur

Prediction models based on femoral neck

Similarly, Table 5 shows the models for the femoral neck. Again, we identified the age and measured weight model as the best model because of its simplicity and fair predictive performance. The prediction equation is

$$ \text{Predicted T-score} = -1.318 - 0.0360 \times \text{Age} + 0.0314 \times \text{Measured weight} \tag{2} $$

Table 5 AUC, C-values, and adjusted r² for selected models based on the femoral neck. AUC: area under the receiver operating characteristic (ROC) curve
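
Eqs. 1 and 2 are simple enough to evaluate by hand, but the sketch below encodes them in Python for batch use. It assumes age in years and measured weight in kilograms (the units of the worked example in the Discussion); the dictionary and function names are illustrative, not part of the study. For a 65-year-old woman weighing 50 kg it reproduces the predicted T-scores of −1.978 and −2.088 reported in the Discussion.

```python
# Coefficients (intercept, age, weight) from Eq. 1 and Eq. 2.
COEFFICIENTS = {
    "proximal_femur": (-1.332, -0.0404, 0.0396),  # Eq. 1
    "femoral_neck": (-1.318, -0.0360, 0.0314),    # Eq. 2
}


def predicted_t_score(age_years: float, weight_kg: float, region: str = "proximal_femur") -> float:
    """Predicted T-score from the age-and-measured-weight model for the given region."""
    intercept, b_age, b_weight = COEFFICIENTS[region]
    return intercept + b_age * age_years + b_weight * weight_kg


for region in COEFFICIENTS:
    print(region, round(predicted_t_score(65, 50, region), 3))
# proximal_femur -1.978
# femoral_neck -2.088
```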

Likewise, Table 6 shows the sensitivity, specificity, and predictive values of the age and measured weight model for different cutoff T-scores.

Table 6 Women at risk, sensitivity, specificity, and predictive values for different cutoffs of the predicted T-scores for our preferred model (variables: age and measured weight) based on femoral neck

Tables 4 and 6 give the number and percentage of at-risk women at different cutoff points for each region separately. Figure 1 shows the predictive values of our preferred model for different prevalence proportions (pretest probabilities) at different cutoff points.

Fig. 1A–D Performance of our preferred model (age and measured weight): predictive values for different prevalence proportions (pretest probabilities) of osteoporosis and six cutoff thresholds. A Positive predictive value for the whole femur region. B Negative predictive value for the whole femur region. C Positive predictive value for the femoral neck region. D Negative predictive value for the femoral neck region
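
Predictive values depend on the pretest probability as well as on sensitivity and specificity. A standard way to obtain curves like those in Fig. 1 is Bayes' theorem with sensitivity and specificity held fixed at each cutoff; the sketch below assumes that approach (the text does not state how Fig. 1 was computed) and uses, for illustration, the 75% sensitivity and 77% specificity reported in the Discussion for the −1.9 cutoff at the whole proximal femur. The prevalence grid is arbitrary.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given pretest probability (Bayes' theorem)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Illustrative values: -1.9 cutoff, whole proximal femur (sensitivity 75%, specificity 77%).
for prevalence in (0.05, 0.10, 0.20, 0.30):
    ppv, npv = predictive_values(0.75, 0.77, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

As prevalence falls, PPV drops and NPV rises, which is why a cutoff tuned in one population need not transfer to another.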

From Tables 3 and 5, we note that across the various models, adjusted r² varied only from 0.43 to 0.44 (whole femur) and from 0.39 to 0.40 (femoral neck). Many of these models contain variables that are difficult to measure in field conditions or require sophisticated techniques. Since the predictive power of the models differs only marginally, we chose the most parsimonious model, containing only age and weight, as our preferred model.
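
Adjusted r² penalizes each additional predictor, so a larger model must raise the raw r² by more than a trivial amount to be preferred. A minimal sketch of the standard formula follows; the raw r² value and predictor counts in the usage lines are hypothetical and only show how small the penalty is with a sample of 959 women.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted coefficient of determination for p predictors fitted to n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: a 2-predictor and an 8-predictor model with the same raw r^2.
n = 959
for p in (2, 8):
    print(p, round(adjusted_r2(0.44, n, p), 4))
```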

Tables 4 and 6 give the performance of our preferred model at different cutoff T-scores. They illustrate the importance of adjusting the cutoff to the objectives and the specific characteristics of the population under study.

Discussion

In this study we identified age and weight as the two predictor variables that are by far the most informative regarding low bone mineral density in postmenopausal white women. Considering other risk factors adds little to the identification of women at risk for osteoporosis. We therefore suggest a simple prescreening tool based on these easily obtained clinical/anthropometric measures (risk predictors) to identify postmenopausal non-Hispanic white women aged 51 years and older for further assessment by absorptiometry. As shown in "Results," simple equations requiring only a pocket calculator can be used in field settings to calculate predicted T-scores, predict osteoporosis, and decide whether to refer a particular woman for further assessment by DXA. We have presented sensitivity, specificity, and predictive values for different cutoffs of the T-scores predicted by our models for the whole proximal femur (Eq. 1) and the femoral neck (Eq. 2), respectively.

Our study shows that a cutoff of −1.7 gives a sensitivity above 80% for both regions and could therefore be recommended as a suitable screening cutoff for our study population. At this sensitivity, approximately 50% of the women would undergo BMD examination. In other words, to identify the majority (80%) of osteoporotic women in our study, we would need to examine at least half of our population. We have also reported these measures for other cutoffs to show how our models perform at different thresholds; this may help other users, since different purposes call for different cutoffs. Although the certainty that osteoporosis is present increases with increasing age and decreasing weight, we would like to emphasize the difference between screening and diagnosis. Osteodensitometry may still have a place as a prognostic parameter to monitor disease activity even when the presence of osteoporosis is beyond reasonable doubt. The age-weight screening tool is far more useful for ruling out disease risk, thereby avoiding an unnecessary diagnostic burden on both the patient and the health care system.
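
The link between sensitivity and the share of women sent for BMD examination can be made explicit: the expected fraction flagged is true positives plus false positives, sensitivity × prevalence + (1 − specificity) × (1 − prevalence). The sketch below illustrates that identity only; it is not a reproduction of Tables 4 and 6, which count flagged women directly. Because the specificity at the −1.7 cutoff is not quoted in the text, the example uses the −1.9 proximal femur values (75% and 77%) from the Discussion, with the observed 189 of 959 osteoporotic women as the prevalence.

```python
def fraction_referred(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Expected share of screened women flagged for DXA (true + false positives)."""
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


prevalence = 189 / 959  # osteoporotic women among those analyzed
share = fraction_referred(0.75, 0.77, prevalence)
# Examinations per osteoporotic woman found = share / (sensitivity * prevalence) = 1 / PPV.
print(f"referred: {share:.1%}; DXA scans per case found: {share / (0.75 * prevalence):.1f}")
```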

In field conditions a pocket calculator may not be at hand. In such situations, the charts in Fig. 2A and B, constructed from Eqs. 1 and 2, respectively, can be used. From the woman's age-weight combination and the chosen region (whole proximal femur or femoral neck), the predicted T-score can be read off, and by comparing it with the cutoff T-score one can decide whether she needs further assessment by absorptiometry. For example, for a woman aged 65 years who weighs 50 kg, Eqs. 1 and 2 give predicted T-scores of −1.978 (~−2.0) and −2.088 (~−2.1) for the proximal femur and the femoral neck, respectively. With a cutoff T-score of −1.9, both predictions fall below the cutoff, so this woman would be a candidate for further BMD assessment by DXA. Tables 4 and 6 show that at the −1.9 cutoff the sensitivity and specificity are 75% and 77% for the proximal femur and 76% and 73% for the femoral neck.
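
One way to build a paper chart like Fig. 2 is to solve Eqs. 1 and 2 for the weight at which the predicted T-score equals the cutoff: because the weight coefficient is positive, a woman lighter than this threshold weight at her age has a predicted T-score below the cutoff. The sketch assumes this construction (the text does not describe how Fig. 2 was drawn), age in years, weight in kg, and the illustrative −1.9 cutoff; the function names and the age grid are hypothetical.

```python
# Coefficients (intercept, age, weight) from Eq. 1 and Eq. 2.
COEFFICIENTS = {
    "proximal_femur": (-1.332, -0.0404, 0.0396),  # Eq. 1
    "femoral_neck": (-1.318, -0.0360, 0.0314),    # Eq. 2
}


def threshold_weight(age_years: float, cutoff: float = -1.9, region: str = "proximal_femur") -> float:
    """Weight (kg) at which the predicted T-score equals the cutoff for this age.

    Solves intercept + b_age * age + b_weight * w = cutoff for w. Since
    b_weight > 0, women lighter than w fall below the cutoff.
    """
    intercept, b_age, b_weight = COEFFICIENTS[region]
    return (cutoff - intercept - b_age * age_years) / b_weight


def refer_for_dxa(age_years: float, weight_kg: float, cutoff: float = -1.9, region: str = "proximal_femur") -> bool:
    """True when the predicted T-score is below the cutoff (weight below the threshold)."""
    return weight_kg < threshold_weight(age_years, cutoff, region)


# Text version of a chart: threshold weights (kg) by age at the -1.9 cutoff.
print("age  proximal_femur  femoral_neck")
for age in range(55, 90, 5):
    print(f"{age:>3}  {threshold_weight(age):>14.1f}  {threshold_weight(age, region='femoral_neck'):>12.1f}")
```

For the 65-year-old, 50-kg woman in the example, the thresholds are about 52.0 kg (proximal femur) and 56.0 kg (femoral neck), so she falls below both and would be referred.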

Fig. 2A, B Using charts to predict T-scores based on our preferred model (age and measured weight). A For the whole proximal femur region at a cutoff of −1.9 using Eq. 1. B For the femoral neck region at a cutoff of −1.9 using Eq. 2. The cutoff of −1.9 is for illustration only; it can be changed according to population characteristics and study goals

The important role of prediction models in identifying women who need BMD assessment has already been recognized [22, 25, 28, 35]. Previous authors have used different definitions of "low bone mass," e.g., cutoffs set at −2.0 and −3.5 [5, 6, 22]. This definition varied from study to study depending on the objective of the study, the reference population, the region scanned, and the densitometer. To use a widely accepted gold standard, we used the WHO-recommended T-scores for the whole proximal femur and femoral neck, which are based on measurements of young non-Hispanic white women aged 20–29 years [37]. Many previous studies included estrogen use; we did not, because we wanted models based on risk factors that do not involve past medication, which is often poorly remembered. We built models for two locations and expressed the score as a predicted T-score, which is simple and easy to understand and visualize. The scores from the two regions were highly correlated, as shown by a linear regression between the T-scores obtained at the two regions (adjusted r² = 0.85 [unadjusted r² = 0.92], α = 0.340 and β = 1.056).

The distinction between statistical significance and clinical relevance is crucial. Large studies can estimate effects with high precision and can therefore identify risk predictors even when their relative effects are small, but this carries the risk of introducing unnecessary complexity into explanatory models without adding clinical value. Because we explicitly assessed the relative contribution of additional risk predictors to the explained variability (adjusted r²) rather than relying on significance testing, we suggest only two easily measured variables for the prediction of low BMD. Our results show that the incremental predictive value gained by adding further variables does not justify the added complexity of the prediction formulas: although the coefficients changed little and the variable-specific P values varied somewhat, there was no perceptible difference in adjusted r². Nevertheless, we do not think that these models should replace densitometry for confirming osteoporosis. The specific role of the proposed parsimonious models is in field conditions for screening purposes, where large numbers of women have to be evaluated and cost is a limiting factor. Our prediction equations are therefore screening tools for population use, not confirmatory tests for individuals.

Our findings are in line with previously published studies. For example, a study by Koh et al. [16] conducted in eight Asian countries showed that, after "item reduction" from a multivariable regression analysis, a model based only on age and body weight performed best. That study also reported a sensitivity of 90% at a cutoff of < −1.0, which is comparable to our findings. Although we did not evaluate our models at a cutoff of < −1.0, the trend in our findings suggests the same (sensitivities of 81% and 87%, and 88% and 93% at cutoffs −1.7 and −1.5 for proximal femur and femoral neck, respectively). Similarly, Dargent-Molina et al. [6] showed that weight alone was the strongest predictor of very low BMD and had approximately the same sensitivity as the full score comprising six predictors. Likewise, Van der Voort et al. [33], assessing the role of BMI in screening for osteoporosis, concluded that measuring weight and simply asking about height was sufficient to identify osteoporotic women. Our study assessed the fit of the models but did not validate the findings externally, which would require a separate study. One reason our age-weight model performed as well as larger models may be that information on these two variables was probably available for all women, unlike the other variables. Our models are population-specific and cannot be generalized as such; we hypothesize that they need to be adjusted for other populations. We want to emphasize that cutoff values derived from age and weight algorithms will vary between populations and with the prevalence of osteoporosis (Fig. 1). This implies that the cutoff values needed to achieve, e.g., a sensitivity of 80% will have to be recalculated for populations other than North American Caucasian women. In other words, the normative values differ between populations: those derived from the NHANES III population will differ systematically from those of, for example, European populations.
A further area of research may be to use decision-analytic methods to determine the optimal cutoff point for BMD and osteoporosis prediction.

In conclusion, we have highlighted that the choice of prediction model should depend on the adjusted r² of the entire set of risk factors rather than on the P value of a single risk factor. The study has demonstrated that a simple prediction model for osteoporosis with only two variables performs practically as well as larger models and can be used for screening purposes. Finally, we acknowledge the potential role of population differences in prediction models, which calls for further research into the validity of risk scores across populations.