Introduction

The development of fracture risk assessment tools has enabled a step change in the management of osteoporosis as patients can now be selected for therapy on the basis of absolute fracture risk rather than bone mineral density (BMD) T-score alone. Of the several assessment tools available, the most widely used is FRAX® which is recommended in more than 100 national and international guidelines [1]. The approach used to direct interventions with FRAX has varied worldwide. In the USA and Japan, for example, BMD has remained the gateway for risk assessment, and FRAX is reserved for those individuals without osteoporosis but with a low BMD [2, 3]. Both guidelines used a fixed intervention threshold for FRAX (20% and 15% 10-year probability of a major osteoporotic fracture, respectively). In many other countries, particularly in Europe, parts of Asia, and Latin America, age-dependent intervention thresholds have been preferred [1, 4,5,6]. Given that most assessment guidelines recommend treatment in postmenopausal women with a prior fragility fracture, age-dependent intervention thresholds reflect the age-specific fracture probability equivalent to a woman of average BMI with a prior fragility fracture, no additional risk factors, and without knowledge of BMD. In the UK, the National Osteoporosis Guideline Group (NOGG) adopted what has been termed a hybrid approach—namely an age-dependent threshold up to the age of 70 years with a fixed threshold thereafter [7, 8]. The reason for the fixed threshold was to decrease inequalities in access to therapy that arose at older ages (≥70 years) depending on the presence or absence of a prior fracture.

Recently, the International Osteoporosis Foundation (IOF) and European Society for Clinical and Economic Aspects of Osteoporosis and Osteoarthritis (ESCEO) recommended that individuals eligible for treatment be dichotomised into those at high risk and those at very high risk of fracture [9]. The move responded to the development of new anabolic agents that might be used preferentially in those at very high risk [10,11,12,13]. A second consideration was the quantification of the imminent risk associated with a recent fracture that could adjust conventional estimates of FRAX scores for the added risk associated with recency [14, 15]. Using age-specific intervention thresholds, IOF/ESCEO defined very high risk as a fracture probability that lay 20% above the age-specific intervention threshold, with or without the inclusion of BMD, i.e. where BMD testing is unavailable, the same probability threshold can be used. Using this approach, approximately 36% of postmenopausal women in the UK would be eligible for treatment of whom nearly half (44%) would be characterised at very high risk [9].

These thresholds for high and very high risk were developed for age-dependent intervention thresholds. The aim of the present study was to explore the manner in which very high risk might be categorised using the hybrid assessment model of NOGG.

Methods

We examined the impact of intervention thresholds in a simulated UK cross-sectional cohort of women age 50 years or more. The cohort, described previously [7], was constructed to reproduce the age distribution of women in the UK and the age-specific prevalence of FRAX risk factors [16]. The distribution of the clinical risk factors was estimated by the determination of a set of conditional distributions using cohorts of European women used in the development of FRAX. The simulated population comprised 50,633 women age 50–99 years. Simulations of greater numbers of women (up to 100,000) indicated that this number provided stability of the estimates for the distribution of risk factors.

The current NOGG strategy

The management pathway followed that currently recommended by NOGG. Under the NOGG strategy, the risk of fracture is first assessed on clinical risk factors alone, which in turn provides guidance on whether a femoral neck BMD measurement or treatment is indicated, an approach that has been endorsed by the National Institute for Health and Care Excellence [17]. An exception is the presence of a prior fragility fracture, in which case treatment is to be considered without necessarily undertaking a BMD measurement. For the present report, we assumed that treatment would be considered in all women with prior fracture. Conversely, women with no clinical risk factors are not considered for treatment. In those with clinical risk factors (apart from a prior fracture), the decision is based on the 10-year probability of major osteoporotic fracture, with some individuals deemed at high risk (treatment without BMD), some at or near the intervention threshold (BMD indicated to finalise risk evaluation and stratification) and some at low risk (lifestyle advice, reassurance, and re-evaluation in the future). Once BMD is entered into the calculation, the decision to treat or not is based on a comparison to age-specific thresholds for both major osteoporotic and hip fracture probability; a probability at or above either threshold indicates eligibility for treatment (Fig. 1).

Fig. 1

Intervention and assessment thresholds of the current NOGG guidance in the UK. The green area denotes low risk (absence of clinical risk factors). The red area denotes eligibility for treatment. The amber area, bounded by the lower assessment threshold (LAT) and upper assessment threshold (UAT), denotes probabilities where a BMD is recommended. Following a BMD assessment, intervention is recommended in those in whom fracture probability lies above the lower intervention threshold (LIT). Thresholds for categorising very high risk are set at 1.2, 1.6, and 2 times the lower intervention threshold
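
The pathway described above can be sketched in code. The following is a minimal illustration only, not an implementation published by NOGG: the function names, argument names, and category labels are hypothetical, all thresholds are supplied by the caller rather than taken from the guidance, and the handling of probabilities lying exactly on the LAT or UAT is an assumption.

```python
def assess_without_bmd(prior_fracture, has_clinical_risk_factors,
                       mof_probability, lat, uat):
    """First-stage assessment from clinical risk factors alone.

    All names are hypothetical. Probabilities and thresholds are 10-year
    probabilities of a major osteoporotic fracture, in percent.
    """
    if prior_fracture:
        # Treatment is considered in all women with a prior fragility fracture.
        return "treat"
    if not has_clinical_risk_factors:
        # Women with no clinical risk factors are not considered for treatment.
        return "low risk"
    if mof_probability > uat:
        # High risk: treatment without a BMD measurement.
        return "treat"
    if mof_probability < lat:
        return "low risk"
    # Assumption: values on the LAT or UAT themselves fall in the amber zone.
    return "measure BMD"


def assess_with_bmd(mof_probability, hip_probability, mof_lit, hip_lit):
    """Second-stage assessment once femoral neck BMD is included."""
    # A probability at or above either threshold indicates eligibility.
    if mof_probability >= mof_lit or hip_probability >= hip_lit:
        return "treat"
    return "low risk"
```

With purely illustrative thresholds of LAT = 5% and UAT = 12%, `assess_without_bmd(False, True, 8.0, 5.0, 12.0)` returns `"measure BMD"`, after which `assess_with_bmd` would be applied to the probabilities recalculated with BMD.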

High and very high risk

The NOGG intervention threshold up to age 70 years was set at a risk equivalent to that of a woman with a prior fragility fracture, in line with current clinical practice, and therefore rises with age. At age 70 years and above, fixed thresholds were applied [7, 8] (see Fig. 1). For the present report, this threshold is termed the lower intervention threshold (LIT). The threshold that designates high from very high risk is termed the upper intervention threshold (UIT). For the age-specific European guidelines, a UIT of 1.2 times the intervention threshold was proposed to distinguish very high from high risk. The algorithm identified 16% of women of the total population at very high risk representing 44% of women eligible for treatment [9]. For the NOGG hybrid guideline, we examined the effect of a similar threshold set at the upper assessment threshold (UAT, Fig. 1). We additionally examined the impact of setting the UIT at 1.6 and 2.0 times the LIT.

Factors that influenced the setting of the UIT were (1) the prevalence of very high risk in the simulated UK cohort compared with the IOF/ESCEO recommendation and (2) the appropriateness of the probability threshold with regard to the use of anabolic agents. For the latter consideration, we examined the baseline fracture probabilities in the phase 3 ARCH study of romosozumab [11]. In this study, postmenopausal women with osteoporosis and a fragility fracture were randomly assigned to receive monthly subcutaneous romosozumab (210 mg) or weekly oral alendronate (70 mg) in a blinded fashion for 12 months, followed by open-label alendronate in both groups. The study was a multinational trial and calculations of FRAX required the use of 36 FRAX models. For each FRAX model, an intervention threshold (LIT) was calculated using the same hybrid approach that was used for NOGG. For each patient, the 10-year probability of a major osteoporotic fracture (MOF) or hip fracture was calculated with FRAX using their country-specific model and expressed as a ratio to the LIT. For example, if the fracture probability of a woman age 65 was 18% and the age- and country-specific LIT was 10%, then the ratio 18/10 indicated that the patient had a baseline probability 1.8 times the LIT. Hip fracture probability ratios and MOF probability ratios were both examined because the NOGG guidelines recommend that treatment decisions be predicated on both metrics [8].
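
The ratio calculation in the worked example can be written out as follows. This is a sketch under stated assumptions: the function names and category labels are hypothetical, and treating a ratio exactly equal to the UIT multiple as very high risk is an assumption about the boundary.

```python
def probability_ratio(probability_pct, lit_pct):
    """Ratio of a 10-year fracture probability to the lower intervention
    threshold (LIT), both expressed in percent."""
    if lit_pct <= 0:
        raise ValueError("LIT must be positive")
    return probability_pct / lit_pct


def risk_category(probability_pct, lit_pct, uit_multiple):
    """Hypothetical labelling of a patient relative to the LIT and to a
    UIT defined as a multiple of the LIT (1.2, 1.6, or 2.0 in the text)."""
    ratio = probability_ratio(probability_pct, lit_pct)
    if ratio >= uit_multiple:
        return "very high"
    if ratio >= 1.0:
        return "high"
    return "below LIT"


# Worked example from the text: probability 18%, LIT 10%.
ratio = probability_ratio(18, 10)
labels = {m: risk_category(18, 10, m) for m in (1.2, 1.6, 2.0)}
```

For the patient in the example, the ratio of 1.8 lies above UIT multiples of 1.2 and 1.6 but below a multiple of 2.0, so the label depends on which UIT is applied.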

We additionally examined the multiple of LIT in two phase 3 studies of teriparatide. The first was the assessment of FRAX in participants from the pivotal global, phase 3, multicentre, double-blind, calcium- and vitamin D–controlled, randomised study of teriparatide [18], the methods and results of which have been published previously [19]. The second was the Teriparatide Once-Weekly Efficacy Research (TOWER) trial conducted in Japan [20]. TOWER was a randomised, double-blind, placebo-controlled phase 3 study of the effects of once-weekly teriparatide on the risk of vertebral fracture. The details have been previously published [21].

Finally, examples of risk categorisation are provided with the use of FRAX clinical risk factors alone and in combination. Examples are for women with a BMI of 25 kg/m². Each table gives the 10-year probabilities of a major osteoporotic fracture calculated using the UK FRAX model. Cells highlighted in green and orange denote low and high risk, respectively, according to the current NOGG guidance. Red highlights denote very high risk using a threshold 1.6 times the current intervention threshold.

A formal statistical analysis was not conducted as the study represents a simple comparison of the three thresholds using an identical ‘population’.

Results

The application of the three UITs to define very high risk in the simulated UK population of women aged 50–99 years is illustrated in Fig. 2. Given that the LIT remained constant, the proportion of the population designated at low risk was also constant, irrespective of the upper intervention threshold. Thus, the different UITs changed only the proportions of those eligible for treatment that would be characterised at high or very high risk.

Fig. 2

The proportion (%) of women characterised at low, high, and very high risk by age. The three panels use an upper intervention threshold of 1.2, 1.6, and 2.0 times the lower intervention threshold. In some instances, errors incurred by rounding give totals that differ slightly from 100%

For all scenarios, there was an age-dependent increase in the prevalence of very high risk. As expected, the higher the UIT, the lower the prevalence of very high risk. At the UIT of 1.2 times the LIT, a large proportion of postmenopausal women would be characterised at very high risk (23%) and, indeed, just over half of all postmenopausal women who were eligible for treatment would be so characterised (56%). In contrast, the application of the most stringent threshold was associated with a very low prevalence of those at very high risk (5%). The upper intervention threshold of 1.6 times the lower intervention threshold avoided these extremes. For all UIT scenarios, the number of women eligible for treatment was constant (20,451), representing 40% of postmenopausal women. UITs of 1.2, 1.6, and 2.0 yielded 23%, 10%, and 5% prevalence, respectively, of very high fracture risk categorisation amongst the entire population. Amongst those women eligible for treatment, the proportion of women at very high risk was 55.7%, 25.1%, and 12.1% using the UIT of 1.2, 1.6, and 2.0, respectively. This compares with 44% using the 1.2 UIT of IOF/ESCEO and an age-specific intervention threshold over all ages. Thus, a UIT of 1.2 has a different significance when applied to an age-specific rather than a hybrid intervention threshold, which was one of the reasons we discarded the 1.2 UIT for the hybrid model. We also discarded the 2.0 UIT because of the very low prevalence of those at very high risk.
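
The relationship between the two sets of percentages above (share of eligible women versus share of the whole cohort) can be checked with a few lines of arithmetic. This is an illustrative sketch using only the figures reported in the text; the variable names are hypothetical, and because the inputs are themselves rounded, the results agree with the reported whole-population prevalences only to within rounding.

```python
cohort_size = 50_633   # simulated women aged 50-99 years
eligible = 20_451      # women eligible for treatment (constant across UITs)

eligible_share = eligible / cohort_size

# Reported share of eligible women at very high risk, keyed by UIT multiple.
very_high_share_of_eligible = {1.2: 0.557, 1.6: 0.251, 2.0: 0.121}

# Share of the whole cohort at very high risk for each UIT multiple.
population_share = {
    multiple: share * eligible_share
    for multiple, share in very_high_share_of_eligible.items()
}

for multiple, share in population_share.items():
    print(f"UIT {multiple} x LIT: {share:.1%} of the cohort at very high risk")
```

The eligible share comes to about 40.4%, and the whole-cohort prevalences to about 22.5%, 10.1%, and 4.9%, in line with the reported 23%, 10%, and 5%.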

The characteristics of women allocated to low, high, and very high risk are shown in Table 1 with a UIT ratio of 1.6.

Table 1 The number of women according to risk category and qualifying characteristics by age using an upper intervention threshold 1.6 times the lower intervention threshold

Probability ratios for the ARCH study are shown in Table 2. The data indicate that many patients exposed to romosozumab would fall into the very high-risk category with a UIT of 1.6. When based on the probability of a major fracture, 41% of patients had a probability ratio that exceeded 1.6. When based on the probability of hip fracture, the majority of patients (73%) had a probability ratio that exceeded 1.6. Similarly, for teriparatide, high median hip fracture probability ratios at baseline were observed in the multinational phase 3 study (median ratio = 1.41) and in the TOWER study (median ratio = 2.12).

Table 2 Mean ratio and interquartile range (IQR) between 10-year fracture probability (%) for a major osteoporotic fracture (MOF) and hip fracture and the NOGG lower intervention threshold in women recruited to the ARCH study. For each age, the proportion of patients whose ratio equals or exceeds 1.6 is given (% > 1.6). Probabilities are calculated with BMD

The Appendix provides examples of the categorisation of risk with clinical risk factors alone or in combination in women according to age. With the exception of a prior fracture (always eligible for treatment), a minority of scenarios (32%) were characterised at high or very high risk in the presence of a single risk factor (scenario A). The addition of a further risk factor (a femoral neck T-score of −2.5) increased the proportion of scenarios characterised at high or very high risk from 32 to 68% (scenario B). The actual proportions of the whole population at very high risk were 4% and 8%, respectively.

Examples of the impact of recent fracture at different sites are given in the Appendix. Recent fracture alone did not invariably give rise to very high risk and depended in part on the site of the sentinel fracture (scenario C). Eight of the 25 examples (32%) were categorised at very high risk. The combination of other clinical risk factors with a recent fracture will affect the reclassification of risk. Scenario D shows examples of a relatively weak clinical risk factor (current smoking) and scenario E a stronger risk factor (current use of oral glucocorticoids) on categorisation of risk. As would be expected, the weak clinical risk factor increased the proportion of examples at very high risk from 32 to 48% and the strong risk factor from 32 to 96% of the scenarios shown. Note that, despite the multiple very high-risk categories, these examples in scenario E represent less than 1% of the postmenopausal population.

The assessment algorithm is shown in Fig. 3.

Fig. 3

Management algorithm for the assessment of individuals at risk of fracture. Patients with a prior fragility fracture are designated at least at high risk and possibly at very high risk, depending on the FRAX probability (left panel). Men and women with clinical risk factors other than a prior fracture are initially assessed with FRAX in the absence of BMD and categorised at low, intermediate, high, or very high risk (right-hand panel). Individuals at low risk are afforded lifestyle advice. Those at high or very high risk are eligible for treatment, and those at very high risk are considered for treatment with an anabolic agent. Those at intermediate risk are referred for BMD testing; probabilities are recalculated with femoral neck BMD and the risk level is then categorised

Discussion

The categorisation of risk is widely accepted within medicine as an appropriate mechanism to direct decisions on treatment; examples include the fields of cardiovascular disease [22], hypertension [23], and diabetes [24]. The further sub-categorisation of those meriting treatment into high risk and very high risk is predicated on the same principle as it aids in choosing the type of treatment to be recommended. The increasing availability of anabolic therapies in osteoporosis and their superiority to anti-resorptive treatments in head-to-head randomised clinical trials has influenced discussions about the setting of threshold values. Such considerations justify the need for dichotomy but are less helpful in its operationalisation, which by nature will always be somewhat arbitrary. With regard to the development of thresholds between high and very high fracture risk in NOGG guidance, we focussed on precedent and appropriateness.

The two precedents of relevance are the construct of existing NOGG guidance [8] and the guidance of IOF/ESCEO [9]. The hybrid nature of the existing NOGG guidance led us to consider only UITs based on hybrid thresholds (though others were initially considered). This is consistent with the IOF/ESCEO position that viewed UITs as a multiple of LITs. However, the performance of the multiple chosen in fully age-dependent models by IOF/ESCEO of 1.2 was markedly different in the setting of the NOGG hybrid model with the identification of many more women at very high risk, particularly at older ages. The proportion of women eligible for treatment who were at very high risk was 56%, compared to 36% in the fully age-dependent model of IOF/ESCEO. In contrast, the higher UIT of 1.6 times the LIT identified only 25% of women eligible for treatment to be at very high risk—a position a little more conservative than the 36% of the IOF/ESCEO algorithm.

It is perhaps relevant that our simulation population cannot take into account the recency of a prior fracture. If this were to be taken into account, then the prevalence of very high risk would be expected to increase. There are no empirical data to calculate the quantum of effect. An approximation is as follows: approximately 30% of women age 50 years or more in the UK have a prior history of a fragility fracture (see Table 1); 50% of second fractures arise within 2 years of the first fracture [25] so that a recent fracture (within 2 years) affects approximately 15% of the population. In 53% of women with a recent fracture, the fracture is a major osteoporotic fracture, which therefore affects 8.0% of the population ((53 × 15)/100). In women with a recent MOF, very high risk was found in nine of the 25 scenarios (36%; Appendix, scenario C). Thus, a recent MOF might categorise an additional 2.9% ((8.0 × 36)/100) of women age 50 years or more at very high risk.
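
The chain of proportions in this approximation can be laid out step by step. The sketch below is illustrative only: it uses the figures quoted in the paragraph above, the variable names are hypothetical, and it carries unrounded intermediate values, whereas the text rounds to 8.0% before the final step (both routes give about 2.9%).

```python
# Proportions quoted in the text (fractions of the relevant group).
prior_fracture = 0.30               # women aged 50+ with a prior fragility fracture
recent_given_prior = 0.50           # fractures falling within the last 2 years
mof_given_recent = 0.53             # recent fractures that are MOFs
very_high_given_recent_mof = 9 / 25 # scenarios at very high risk (scenario C)

# Share of the whole population at each step of the chain.
recent_fracture = prior_fracture * recent_given_prior            # about 15%
recent_mof = recent_fracture * mof_given_recent                  # about 8%
additional_very_high = recent_mof * very_high_given_recent_mof   # about 2.9%

print(f"Recent fracture: {recent_fracture:.1%}")
print(f"Additional very high risk: {additional_very_high:.1%}")
```

Because each step is a simple product, the estimate scales linearly with any one of the inputs; for example, halving the assumed share of recent fractures would halve the additional proportion at very high risk.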

Regarding appropriateness, any new threshold is ideally consistent with reimbursement policies. This is never the case with clinical progress in that, rather than being proactive, health technology assessments and reimbursement agencies are invariably reactive to precedents. In the context of osteoporosis in the UK, the NOGG guidance and thresholds, produced in 2008, were not considered by the National Institute for Health and Care Excellence (NICE) in their technology appraisals but were eventually recommended nine years later in 2017 [26]. With regard to anabolic treatments, the current position of NICE is that they not be used as first-line treatment but rather as salvage therapy when other options are exhausted [27]. The new NOGG guidance is based on the ever-increasing body of evidence that anabolic treatments are much more appropriately placed as a first-line treatment in those at very high risk than as salvage treatment used as a last resort [10,11,12,13, 28]. The probability UIT chosen by NOGG is consistent with the populations in which the efficacy of anabolic agents has been assessed, as illustrated in the analysis of the ARCH study in which treatment with romosozumab for one year, followed by oral alendronate, was superior to oral alendronate alone. Similar high-risk populations were enrolled into phase 3 studies of teriparatide.

The examples in the Appendix indicate that no single FRAX clinical risk factor is consistently associated with a fracture probability that exceeds the intervention threshold. Thus, it was combinations of clinical risk factors that provided eligibility for treatment. By the same token, a recent fracture was not consistently associated with a fracture probability that exceeded the intervention threshold for very high risk, and recategorisation from high to very high risk also depended in many instances on combinations of clinical risk factors.

This analysis has a number of strengths and limitations. One limitation is that the population studied for the impact of UITs on proportions at high/very high fracture risk is a simulated cohort rather than a real population sample. However, the analysis is a comparison of three sets of thresholds within the same population so that conclusions drawn about the relative performance of the thresholds are largely independent of the study population. Furthermore, the simulation allows the impact of any changes in thresholds to be modelled at a population level rather than in subsets of the population. The present analysis has only been conducted in a cohort modelled on the age distribution of the UK but, given that the prevalence of risk factors was derived from several European cohorts, it is likely that similar conclusions would be drawn across other European countries, with the only differences being driven by variations in the age distributions within these countries. The present study was confined to threshold probabilities for a major osteoporotic fracture whereas hip fracture probability is the other output of FRAX used in the NOGG guidance. Indeed, treatment is recommended if the hip fracture probability OR the major osteoporotic fracture probability exceeds the intervention threshold. The lower intervention threshold for hip fracture probability is 0.91, 2.3, and 5.4% at the ages of 50, 60, and 70 years, respectively. Thus, with a UIT of 1.6 times the LIT, very high risk would be set at hip fracture probabilities of 1.5, 3.7, and 8.6%, respectively. The consideration of the two fracture probabilities is likely also to increase the number of women identified at high risk and at very high risk. The present analysis does not allow an examination of the impact of threshold changes in men.
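
The hip fracture thresholds for very high risk quoted above follow from multiplying each lower intervention threshold by 1.6 and rounding to one decimal place. A minimal sketch, with hypothetical variable names and only the three ages given in the text:

```python
# Lower intervention thresholds (LIT) for 10-year hip fracture probability,
# in percent, at the ages quoted in the text.
HIP_LIT_PCT = {50: 0.91, 60: 2.3, 70: 5.4}
UIT_MULTIPLE = 1.6

# Upper intervention threshold (very high risk) at each age, to one decimal.
hip_uit_pct = {
    age: round(lit * UIT_MULTIPLE, 1) for age, lit in HIP_LIT_PCT.items()
}

for age, uit in hip_uit_pct.items():
    print(f"Age {age}: LIT {HIP_LIT_PCT[age]}%, UIT {uit}%")
```

This reproduces the values of 1.5, 3.7, and 8.6% given for ages 50, 60, and 70 years.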

Finally, it will be important to place the upper intervention thresholds in a health economic perspective. In the context of osteoporosis and fracture risk, the intervention threshold that is relevant for payers can be defined as the probability of fracture at which intervention becomes cost-effective. Whilst NOGG thresholds are driven by clinical appropriateness rather than health economics, it is still important to underpin the chosen intervention thresholds with evidence of cost-effectiveness. The LIT used in the NOGG guidance provides strategies that are highly cost-effective [29,30,31]. The upper intervention thresholds examined in this report require health economic validation using models that can accommodate the heightened risk associated with the recency of fracture [32].

These proposals for the FRAX-based criteria for very high risk categorise a small proportion of women age 50 years or more (10%), identifying a population with a level of risk comparable to that of women enrolled in previous trials of anabolic agents.