Introduction

Current guidelines for the assessment of osteoporosis are based on case finding [1, 2, 3, 4]. Individuals are identified on the basis of easy-to-measure risk indicators and subsequently undergo more specialized bone testing with, for example, a bone mineral density (BMD) measurement. Intervention is considered when BMD lies below an intervention threshold. The threshold used by the International Osteoporosis Foundation is set at 2.5 standard deviations below the average for young adults (T-score = −2.5 SD), corresponding to the WHO criterion for osteoporosis [4]. There is, however, an increasing appreciation that a fixed threshold for BMD is not optimal to serve as a universal intervention threshold [1]. The problem arises largely because there are many indicators other than BMD that contribute to fracture risk [5].

Of these risk indicators, age is of critical importance, and the same T-score has quite a different significance at different ages. For example, a T-score of −2.5 SD measured by DXA at the femoral neck is associated with a 10-year probability of hip fracture of 1.9% in Swedish women aged 50 years [6]. At the age of 80 years, the hip fracture probability with the same T-score is 19.4%. In addition to age, a number of other risk indicators contribute to fracture risk independently of BMD. A well-validated example is a prior fragility fracture, which, depending on the site, is associated with a twofold or greater increase in fracture risk, independently of BMD [7]. Thus, if the aim is to treat individuals whose hip fracture probability exceeds a given threshold, a less stringent T-score threshold may be appropriate in patients with a prior fracture than the one derived from age and BMD alone.
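The last point can be quantified under a simple multiplicative (log-linear) risk model, which is an assumption of this illustration. Suppose a prior fracture multiplies risk by RR independently of BMD, and risk rises by a factor GR for each SD decrease in BMD. A patient with a prior fracture then reaches the same risk at a T-score ln(RR)/ln(GR) SD higher. The helper `equivalent_t_score_threshold` is hypothetical. The inputs are taken from the text: RR = 2 (the "twofold" figure) and GR = 2.6 per SD (femoral neck BMD for hip fracture, quoted in the Methods).

```python
import math

def equivalent_t_score_threshold(t_threshold, rr_indicator, gradient_per_sd):
    """Hypothetical helper: T-score at which a person WITH a BMD-independent
    risk indicator (relative risk rr_indicator) has the same risk as a person
    WITHOUT it at t_threshold.

    Assumes risk is multiplied by gradient_per_sd for each SD decrease in BMD
    and by rr_indicator when the indicator is present (log-linear model).
    """
    # Solve rr * GR**(-t_eq) == GR**(-t_threshold) for t_eq.
    shift = math.log(rr_indicator) / math.log(gradient_per_sd)
    return t_threshold + shift

# Prior fragility fracture (RR = 2), femoral neck BMD for hip fracture (2.6/SD)
print(round(equivalent_t_score_threshold(-2.5, 2.0, 2.6), 2))
```

With these inputs, the equivalent threshold is about −1.77, or roughly 0.7 SD less stringent. The calculation holds age fixed, and the text above shows that age matters as well.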

The consideration of multiple independent risk indicators increases the detection rate (sensitivity) without loss of specificity [5, 8]. The aim of the present study was to provide a mathematical framework to quantify the impact of a case-finding strategy using a combination of risk indicators with BMD. This was a theoretical analysis, and the examples from cohorts were only used to illustrate the general principles.

Methods

Bone mineral density is normally (or near normally) distributed in the population [9, 10]. However, the risk of many fractures, particularly hip fracture, increases exponentially with decreasing BMD, so fracture risk can be expressed as a gradient of risk [11, 12]. For example, the risk of hip fracture increases 2.6-fold for each standard deviation decrease in BMD at the femoral neck [12]. Fracture risk may additionally be assessed with a dichotomous variable (e.g., past fragility fracture) or with another continuous variable (e.g., body mass index). The combined risk score is then still continuously distributed and, provided the risk indicators are wholly or partly independent, has a steeper gradient of risk. Figure 1 shows the distribution of relative fracture risk for a measured variable, or combination of variables, with gradients of risk of 2, 2.6, and 4 per standard deviation of the risk score. The steeper the gradient of risk, the greater the potential for identifying individuals at higher risk.
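The claim that an independent indicator steepens the gradient can be checked with a short calculation. This sketch assumes a log-linear model in which the log relative risk is ln(GR) times the BMD deficit in SD, plus ln(RR) for a dichotomous indicator with prevalence p. If the two are independent, the variances of these two contributions add. The gradient per SD of the combined score is then exp of its standard deviation. The 20% prevalence of prior fracture and the function name `combined_gradient` are hypothetical, chosen only for illustration.

```python
import math

def combined_gradient(gr_continuous, rr_binary, prevalence):
    """Gradient of risk (RR per SD) of a log-linear score that adds a
    dichotomous indicator to a normally distributed continuous one.

    Assumes the two indicators are independent, so their variances add.
    """
    beta_cont = math.log(gr_continuous)  # log-RR per SD of the continuous variable
    beta_bin = math.log(rr_binary)       # log-RR when the binary indicator is present
    var_score = beta_cont**2 + beta_bin**2 * prevalence * (1 - prevalence)
    return math.exp(math.sqrt(var_score))

# BMD alone (2.6/SD) plus a prior fracture (RR = 2) in a hypothetical 20% of people
print(round(combined_gradient(2.6, 2.0, 0.20), 2))
```

With these inputs the gradient rises only modestly, from 2.6 to about 2.7 per SD. Larger gains require several independent indicators or ones with stronger independent effects.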

Fig. 1 Distribution of relative fracture risk in the population at a given age for various assumptions for the gradient of risk (RR/SD). The risk at the average risk score is the reference.

Assuming that the combination of risk indicators, like BMD, is normally or near-normally distributed in the population, the risk of each individual relative to the average risk in the population (stratified by age and gender) can be determined by combining the two distributions.
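One way to make the combination of distributions explicit is sketched below. It assumes that the standardized risk score X is exactly standard normal and that risk is multiplied by GR for each SD increase in X (GR > 1). The symbols X, β, r(x), R(x) and φ (the standard normal density) are introduced here for illustration and need not match the Appendix. The sketch arrives at the same expressions as Eqs. (1) to (3):

$$
\begin{aligned}
X &\sim N(0,1), \qquad \beta = \ln(GR), \qquad r(x) = r_0\, e^{\beta x},\\
\mathrm{E}[r(X)] &= r_0\, e^{\beta^{2}/2}, \qquad R(x) = \frac{r(x)}{\mathrm{E}[r(X)]} = e^{\beta x - \beta^{2}/2},\\
R(X) > RT &\iff X > \frac{\ln(RT) + \beta^{2}/2}{\beta} = -z,\\
P &= \Pr(X > -z) = \Phi(z),\\
AR &= \frac{\mathrm{E}\left[R(X)\,\mathbf{1}\{X > -z\}\right]}{P} = \frac{1}{\Phi(z)} \int_{-z}^{\infty} \phi(x - \beta)\,dx = \frac{\Phi(z + \beta)}{\Phi(z)}.
\end{aligned}
$$

The last step uses the identity $e^{\beta x - \beta^{2}/2}\,\phi(x) = \phi(x - \beta)$, which shifts the normal density by β.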

From this combination of distributions, the proportion of individuals (P) above a given risk threshold can be calculated. This proportion depends on the chosen risk threshold (RT), expressed relative to the average population risk, and on the gradient of risk (GR), i.e., the relative risk per SD change in risk score. It can be shown (see the Appendix for details) that this proportion is given by the equation:

$$P = \Phi(z) \tag{1}$$

where Φ is the cumulative distribution function of the standardized normal distribution, and z is given by the equation:

$$z = -\frac{\ln(RT) + \left[\ln(GR)\right]^{2}/2}{\ln(GR)} \tag{2}$$
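Eqs. (1) and (2) translate directly into a few lines of code. The sketch below uses only the Python standard library. The function name `proportion_above_threshold` is hypothetical, and the gradient of risk must exceed 1.

```python
import math
from statistics import NormalDist

def proportion_above_threshold(risk_threshold, gradient_of_risk):
    """Eqs. (1)-(2): fraction of the population whose risk exceeds
    risk_threshold times the average population risk.

    Assumes a normally distributed risk score with gradient_of_risk
    (RR per SD, > 1) per SD of the score.
    """
    beta = math.log(gradient_of_risk)
    z = -(math.log(risk_threshold) + beta**2 / 2) / beta  # Eq. (2)
    return NormalDist().cdf(z)                            # Eq. (1)

# Percentages above average, double and triple risk for three gradients
for gr in (2, 2.6, 4):
    print(gr, [round(100 * proportion_above_threshold(rt, gr)) for rt in (1, 2, 3)])
```

The loop prints [36, 9, 3], [32, 11, 5] and [24, 12, 7] for gradients of 2, 2.6 and 4 per SD. These match the percentages reported in the Results.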

Similarly, the average risk (AR) in the group above the chosen risk threshold, relative to the average risk in the general population, can be calculated. This is given by the equation:

$$AR = \frac{\Phi\left(z + \ln(GR)\right)}{\Phi(z)} \tag{3}$$
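Eq. (3) can be sketched the same way. The z of Eq. (2) is recomputed inside the function so that the block stands alone. The function name `average_relative_risk` is hypothetical.

```python
import math
from statistics import NormalDist

def average_relative_risk(risk_threshold, gradient_of_risk):
    """Eq. (3): mean risk, relative to the population average, among people
    above risk_threshold (normal risk score, gradient_of_risk > 1 per SD)."""
    beta = math.log(gradient_of_risk)
    z = -(math.log(risk_threshold) + beta**2 / 2) / beta  # Eq. (2)
    phi = NormalDist().cdf
    return phi(z + beta) / phi(z)                         # Eq. (3)

# Average RR above the average-risk and twofold-risk thresholds
for gr in (2, 4):
    print(gr, [round(average_relative_risk(rt, gr), 1) for rt in (1, 2)])
```

This prints [1.7, 2.9] for a gradient of 2 per SD and [3.1, 5.0] for 4 per SD, the values reported for Table 2. As the threshold is lowered towards zero, everyone is included and the function tends to 1.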

The assumption that combining several continuous and dichotomous risk indicators yields a normally distributed risk score in the population is an important one, and it rests on the central limit theorem. To test the adequacy of this assumption we examined, as an example, baseline data on BMD and risk indicators from the Rotterdam Study [11]. A theoretical continuous risk score with arbitrary weights was calculated for women aged 55 years and over. It was based on age (continuous but not normally distributed; 1 point per 10 years of age), BMD (normally distributed; the Z-score was subtracted from the risk score), and previous fracture (dichotomous; 1 point if present). The distribution of this risk score was then tested for normality using cumulative and quantile normal probability plots [13].
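The normality check can be illustrated with simulated data. This sketch does not use the Rotterdam Study data. The inputs are hypothetical:

- a uniform age distribution from 55 to 85 years;
- a standard normal BMD Z-score;
- a 20% prevalence of previous fracture.

The sketch applies the arbitrary weights described above. It then summarizes the normal quantile plot as a single number: the correlation between the sorted scores and the expected normal quantiles (a probability plot correlation coefficient), which stands in here for visual inspection. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def risk_score(age_years, bmd_z, previous_fracture):
    # Arbitrary weights as in the Methods: +1 per 10 years of age,
    # minus the BMD Z-score, +1 for a previous fracture.
    return age_years / 10 - bmd_z + (1 if previous_fracture else 0)

def qq_correlation(values):
    """Correlation between sorted values and standard normal quantiles."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    # Blom plotting positions (i - 3/8) / (n + 1/4), i = 1..n
    qs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)

rng = random.Random(42)  # fixed seed so the simulation is reproducible
scores = [
    risk_score(rng.uniform(55, 85), rng.gauss(0, 1), rng.random() < 0.20)
    for _ in range(3374)
]
print(round(fmean(scores), 2), round(qq_correlation(scores), 4))
```

With these hypothetical inputs the correlation is close to 1, even though the age component is far from normal. This is the behavior the central limit theorem argument predicts.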

Results

Assuming a normal distribution of the risk score and a gradient of risk for fracture of 2.6 per standard deviation (SD) (e.g., BMD for hip fracture risk), 32% of the population had a risk higher than the average population risk (Fig. 2). With the same gradient of risk, 11% had a risk more than twice the average, and only 5% had a risk more than 3 times the average in the population. The proportion of individuals above a given risk threshold is shown in Table 1 for a wide range of assumptions. When, for example, the assumed gradient of risk was 2 per SD, the proportions of individuals above average, double, or triple risk were 36%, 9%, and 3%, respectively. Likewise, with a much steeper gradient of risk of 4 per SD, 24%, 12%, and 7% of individuals were identified above these risk thresholds, respectively. When the risk threshold was near the average risk in the population, the proportion of the population identified decreased somewhat with increasing gradients of risk. At higher risk thresholds, however, the proportion identified increased with increasing gradients of risk (Table 1).
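Where this switch occurs follows from Eq. (2). The derivation below differentiates P with respect to β = ln(GR), a notation introduced here for illustration, and assumes GR > 1:

$$
\begin{aligned}
z &= -\frac{\ln(RT)}{\beta} - \frac{\beta}{2}, \qquad \frac{\partial z}{\partial \beta} = \frac{\ln(RT)}{\beta^{2}} - \frac{1}{2},\\
\frac{\partial P}{\partial \beta} &= \phi(z)\,\frac{\partial z}{\partial \beta} > 0 \iff \beta^{2} < 2\ln(RT) \iff GR < \exp\!\left(\sqrt{2\ln(RT)}\right).
\end{aligned}
$$

For RT ≤ 1 the derivative is always negative, so the proportion above average risk falls as the gradient steepens. For RT = 2 the proportion rises up to GR ≈ 3.25 and then declines slightly. For RT = 3 it rises up to GR ≈ 4.4. This explains why, over the gradients discussed here (2 to 4 per SD), the proportion identified above the higher thresholds increases with the gradient.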

Fig. 2 Relative fracture risk compared with the average risk in the population for different gradients of risk (RR/SD).

Table 1 Proportion (%) of individuals detected above a given risk threshold according to gradient of risk

Whereas the proportion of the population identified as high risk changed relatively little with the gradient of risk, the performance of the test was much better when the RR per SD was greater (Table 2). When the average population risk was used as the threshold, the average risk in the test-positive group was 1.7 times the population average with a gradient of risk of 2 per SD, and 3.1 times with a gradient of 4 per SD. When the threshold was twice the population risk, the average RR in those identified increased to 2.9 and 5.0, respectively. Tests with progressively higher gradients of risk therefore identify patients at progressively higher risk, and so increase the effectiveness of subsequent intervention.
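A related quantity, not reported in the tables, is the share of all expected fractures that arise in the test-positive group. This is the detection rate (sensitivity) mentioned in the Introduction. Under the same assumptions, it equals P × AR = Φ(z + ln GR). The sketch additionally assumes that expected fracture counts are proportional to relative risk. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def share_of_fractures_detected(risk_threshold, gradient_of_risk):
    """Fraction of all expected fractures occurring in people above the
    risk threshold: P * AR = Phi(z + ln GR), from Eqs. (1)-(3)."""
    beta = math.log(gradient_of_risk)
    z = -(math.log(risk_threshold) + beta**2 / 2) / beta  # Eq. (2)
    return NormalDist().cdf(z + beta)

# Percentage of expected fractures captured at a twofold risk threshold
for gr in (2, 4):
    print(gr, round(100 * share_of_fractures_detected(2, gr)))
```

At a twofold threshold, the roughly 9% of the population identified with a gradient of 2 per SD would account for about 26% of expected fractures. With a gradient of 4 per SD, the roughly 12% identified would account for about 58%.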

Table 2 Average risk in individuals above a given risk threshold for different gradients of risk

To test the adequacy of the assumption that a risk score is normally distributed when several risk indicators are combined, we examined the distribution of a simple theoretical risk score using baseline data on women in the Rotterdam Study. The risk score could be calculated for 3,374 women with all required baseline information. The mean (SD) risk score was 7.0 (1.35). Visual inspection of the distribution and of the cumulative and quantile normal plots showed that the distribution was very close to normal (Fig. 3), deviating slightly from normality only at extreme values. This result was robust to the addition of other variables, such as smoking history or family history, and to changes in the arbitrary weights assigned to each of the risk indicators. It also did not depend on the normality of the BMD distribution. Only assigning an extremely high weight to a single dichotomous risk indicator substantially changed this result.

Fig. 3 Theoretical risk score based on data from the Rotterdam Study (3,374 women; risk score based on age, BMD, and previous fracture). (A) Distribution of the risk score (observed distribution as bars, normal distribution as solid line); (B) normal quantile probability plot (observed values of the risk score compared with the values expected under a normal distribution).

Discussion

The present analysis indicates that tests with progressively higher gradients of risk identify a similar proportion of patients, but the patients identified have a higher fracture risk. When the patients identified have, on average, a higher fracture risk, case finding and subsequent intervention become more effective. Suppose, for example, that the aim were to identify individuals with a risk twice that of the population. A test with a gradient of risk of 2 per SD would identify 9% of the population, and a test with a gradient of risk of 4 per SD would identify 12%. However, the average RR would be 2.9 in the former group and 5.0 in the latter.

A limitation of this analysis is that we have assumed a normal distribution of the combined risk indicators. In reality this will not hold exactly, but the example with the theoretical risk score showed that the deviation from normality is small when several variables are combined. Although the weights used in this theoretical risk score were arbitrary, the normality of the distribution remained robust unless an extremely high weight was given to a single dichotomous risk indicator. If the distribution differed significantly from normal, the number of patients identified would differ slightly, as would their average risk. Nevertheless, the same principle would hold: with increasing gradients of risk, a similar proportion will be identified but at a higher average risk.

A risk score based on a combination of variables can be derived by several methods, such as logistic regression, Cox regression, or Poisson regression. The gradient of risk per standard deviation is then calculated from the standard deviation of that score in the population. High gradients of risk require a combination of risk indicators that make wholly or partly independent contributions to risk. A number of risk indicators have been identified that are partly independent of both age and BMD. These include biochemical markers of bone resorption, a family history of hip fracture, propensity to fall, previous fragility fracture, smoking, and body mass index, although the last is more highly correlated with BMD [14, 15, 16, 17]. Used in combination, these risk indicators will enhance the gradient of risk of any case-finding strategy. In applying this methodology, however, it will be important to determine the interrelationships between all these putative risk indicators so that their combined predictive power can be quantified. For example, is smoking a significant risk indicator when adjusted for all other predictors? Several risk scores have been proposed previously, but many of them predict low BMD rather than fracture [17, 18, 19, 20, 21, 22, 23].
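The step from a fitted score to a gradient of risk can be sketched as follows. The coefficients are hypothetical mutually adjusted log relative risks, of the kind a Cox or Poisson model might produce, and the indicators are simulated. The 20% fracture prevalence and the correlation of 0.3 between the BMD deficit and a second continuous indicator are arbitrary illustrative values. The gradient is taken as exp(SD of the linear predictor), which presumes the score is close to normally distributed.

```python
import math
import random
import statistics

def gradient_of_risk(scores):
    """RR per SD of a log-linear risk score: exp(population SD of the score)."""
    return math.exp(statistics.pstdev(scores))

# Hypothetical mutually adjusted log relative risks (e.g. from a Cox model)
B_BMD = math.log(2.6)     # per SD decrease in BMD
B_FRACTURE = math.log(2)  # prior fragility fracture (yes/no)
B_OTHER = math.log(1.3)   # per SD of another continuous indicator

rng = random.Random(7)  # fixed seed for reproducibility
bmd_only, combined = [], []
for _ in range(20000):
    bmd_deficit = rng.gauss(0, 1)
    # Second indicator, partly correlated (r = 0.3) with the BMD deficit
    other = 0.3 * bmd_deficit + math.sqrt(1 - 0.3**2) * rng.gauss(0, 1)
    fracture = 1 if rng.random() < 0.20 else 0
    bmd_only.append(B_BMD * bmd_deficit)
    combined.append(B_BMD * bmd_deficit + B_FRACTURE * fracture + B_OTHER * other)

print(round(gradient_of_risk(bmd_only), 2), round(gradient_of_risk(combined), 2))
```

With these inputs the gradient rises from about 2.6 for BMD alone to about 3.0 for the combined score. Because the gradient is computed from the empirical SD of the combined score, it automatically reflects any correlations among the indicators in the sample.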

The present study indicates that case-finding strategies can be efficient even when using a test with a gradient of risk of 2.6 per standard deviation (comparable to hip bone mineral density for the prediction of hip fracture). In this case, 32% of the population is identified above the average population risk, and 11% above twice the average risk. Only a minority of the population has an above-average risk because bone mineral density is normally distributed, whereas the risk of fracture rises exponentially with decreasing BMD. Therefore, individuals with an average BMD for their age have a lower than average risk of hip fracture. Conversely, individuals with an average risk of hip fracture have a BMD lower than average for their age.
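The last two sentences can be quantified under the assumptions of Eqs. (1) to (3): a normally distributed score and a log-linear risk. Under these assumptions, a person at the average score has a risk of exp(−(ln GR)²/2) relative to the average population risk. The population-average risk is reached ln(GR)/2 SD above the mean score, i.e., at a BMD about that far below the age-specific mean. The function names are hypothetical.

```python
import math

def risk_at_average_score(gradient_of_risk):
    """Risk of a person at the mean score, relative to the mean population risk."""
    beta = math.log(gradient_of_risk)
    return math.exp(-beta**2 / 2)

def sd_shift_to_average_risk(gradient_of_risk):
    """SDs above the mean score (below mean BMD) at which risk equals the mean risk."""
    return math.log(gradient_of_risk) / 2

gr = 2.6  # femoral neck BMD for hip fracture, as in the text
print(round(risk_at_average_score(gr), 2), round(sd_shift_to_average_risk(gr), 2))
# 0.63 0.48
```

Under these assumptions, a woman with an average BMD for her age has roughly 63% of the average hip fracture risk at that age. Average risk corresponds to a BMD about 0.5 SD below the age-specific mean.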

A test with a higher gradient of risk identifies a similar proportion of the population, but those identified are at higher risk. This is important for case finding, where the objective is to identify individuals above (or below) a threshold risk. Consider, for example, an intervention threshold set at a 10-year probability of 15% for hip, spine, forearm, or proximal humeral fracture. A 50-year-old woman with a bone mineral density 1 standard deviation below the mean value for her age would have an approximately twofold increase in the risk of any of these fractures. In Sweden this corresponds to a 10-year fracture probability of 11.3% [8], which lies below the intervention threshold. The same woman with a prior fragility fracture would have a probability of 21.4% (relative risk = 4), which exceeds the intervention threshold.
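The 10-year probabilities in this example do not scale in proportion to the relative risk. A rough way to see why is to convert the probability to a cumulative hazard, scale the hazard, and convert back. This sketch ignores competing mortality and assumes a hazard proportional to the relative risk, both simplifications of the calculation cited [8]. The function name is hypothetical.

```python
import math

def scale_probability(probability, rr_ratio):
    """Rescale a 10-year probability when relative risk is multiplied by rr_ratio,
    assuming probability = 1 - exp(-cumulative hazard) and no competing mortality."""
    cumulative_hazard = -math.log(1 - probability)
    return 1 - math.exp(-cumulative_hazard * rr_ratio)

# RR 2 (BMD 1 SD below the mean) -> RR 4 (plus a prior fracture): hazard doubles
print(round(100 * scale_probability(0.113, 4 / 2), 1))
```

This gives about 21.3%, close to the 21.4% quoted above and below the 22.6% that simple doubling of the probability would give.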

From the currently available information on independent risk indicators for fracture, it seems possible that future research and international collaboration might identify risk scores with a gradient of risk of, say, 4 per standard deviation. If it were desired to treat individuals at twice the population risk, then only 12% of individuals would be identified for treatment (Table 1), and those individuals would have, on average, a fivefold higher risk of fracture than the age-matched population.

We conclude that when a combination of several independent risk indicators increases the gradient of risk of a risk score, it also increases the efficiency of case-finding strategies. This is because the target population identified has a higher average risk, which in turn increases the cost-effectiveness of treatment. Moreover, this strategy has a relatively small impact on the number of individuals who require treatment. More research is needed to validate the combined predictive power of different risk indicators.