Introduction

FRAX is a computer-based algorithm (http://www.shef.ac.uk/FRAX) that calculates the 10-year probability of a major osteoporotic fracture (hip, clinical spine, humerus or wrist fracture) and the 10-year probability of hip fracture. Fracture risk is calculated from age, body mass index and well-validated dichotomised risk factors. Femoral neck bone mineral density (BMD) can be optionally input to enhance fracture risk prediction. Fracture probability differs markedly in different regions of the world so that FRAX is calibrated to those countries where the epidemiology of fracture and death is known (currently 64 countries). About 6 million calculations are performed yearly in 173 countries. FRAX was launched in April 2008 but the story really begins in the early 1990s with the definition of osteoporosis.

Osteoporosis is a young disease. The term ‘osteoporosis’ was first used in France in the 1820s to describe post-mortem bones with abnormal hollow spaces. It had entered English terminology by 1885 but lacked a specific description until Fuller Albright’s work in Boston in the 1940s [1, 2]. Researchers, however, assigned different meanings to ‘osteoporosis’, with varying emphases on clinical, pathological, physiological and biochemical factors [3]. Medical dictionaries and reference books from 1972 to 1995 offer surprisingly inconsistent definitions [4]. Several attempts at international consensus failed [5, 6], but consensus was eventually achieved at an international conference organised by Claus Christiansen and John Kanis on the back of one of Claus’ osteoporosis meetings held in Hong Kong in March 1993, sponsored by the National Institute of Arthritis and Musculoskeletal and Skin Diseases, the European Foundation for Osteoporosis and Bone Disease and the American National Osteoporosis Foundation [7].

Osteoporosis was defined as ‘a systemic skeletal disease characterized by low bone mass and microarchitectural deterioration of bone tissue, with a consequent increase in bone fragility and susceptibility to fracture’ [7]. Thus, the diagnosis of the disease relied on the quantitative assessment of BMD, known to be a major determinant of bone strength, but also captured the view that additional factors contributed to skeletal fragility. The description further recognised that the clinical significance of osteoporosis lay in the fractures that arose. In this respect, there were analogies with other multifactorial chronic diseases. For example, hypertension was and still is diagnosed on the basis of blood pressure whereas an important clinical consequence of hypertension is stroke.

Despite efforts to reinvent the wheel [8,9,10], the description of osteoporosis has survived intact to date. However, the conceptual description of osteoporosis at that time had no practical value. Christopher Nordin, in an earlier editorial, had remarked wisely that ‘it is surprising that osteoporosis research has made the progress it has when the central object of the work lacks a common definition [11]’. The consensus statement had been preceded by several proposals and counter-proposals [11,12,13], none of which had gained traction. Moreover, although BMD testing with DXA had become available, it was too enthusiastically promoted in the USA in an evangelical position statement from the National Osteoporosis Foundation [14], with the result that all BMD testing was discredited by the US Agency for Healthcare Quality and Research [15], which in turn had a negative knock-on effect in Europe and the rest of the world.

As a researcher at a World Health Organization (WHO) Collaborating Centre, John Kanis was able to secure the collaboration of the WHO for an international WHO Study Group, which he chaired in Rome on 22–25 June 1992 and which reported in 1994 [16]. The objective of the conference was to rehabilitate BMD assessment, and its outcome did just that by defining osteoporosis in densitometric terms. At that meeting, the principles of an operational definition were agreed; Joe Melton and John Kanis were charged with validating the resulting cut-off using the epidemiological database of the Mayo Clinic. With the unanimous agreement of the study group, osteoporosis in postmenopausal women was defined as a BMD that was 2.5 standard deviations or more below the mean value of young healthy women, i.e. a T-score ≤ −2.5 SD [16, 17], using the T-score notation originally devised by Tom Kelly.
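The densitometric definition reduces to a simple standardisation, sketched below in Python. The reference mean and SD are placeholder values chosen purely for illustration (they are not the reference data used by the Study Group), and the function names are hypothetical.

```python
def t_score(bmd, young_mean, young_sd):
    """Number of SDs by which a BMD value differs from the young healthy mean."""
    return (bmd - young_mean) / young_sd


def has_densitometric_osteoporosis(bmd, young_mean, young_sd):
    """True when BMD lies 2.5 SD or more below the young healthy mean."""
    return t_score(bmd, young_mean, young_sd) <= -2.5


# Placeholder reference values (g/cm^2), for illustration only.
YOUNG_MEAN, YOUNG_SD = 0.86, 0.12

for bmd in (0.90, 0.70, 0.50):
    t = t_score(bmd, YOUNG_MEAN, YOUNG_SD)
    flag = has_densitometric_osteoporosis(bmd, YOUNG_MEAN, YOUNG_SD)
    print(f"BMD {bmd:.2f} g/cm^2 -> T-score {t:+.2f}, osteoporosis: {flag}")
```

In practice the reference mean and SD depend on the skeletal site, the device and the reference population, so any real use would need the appropriate reference data.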

The definition inspired much debate by a minority for about 5 years but won general approval, shown by its almost immediate use in clinical practice and research, and through its inclusion in medical reference books by 1995 (e.g. Riggs and Melton 1995 [18]). A 2003 WHO report restates the 1994 standards, a powerful indication of their usefulness and acceptance: ‘The cornerstone of diagnosis is the measurement of bone mineral density. Diagnostic thresholds offered by the WHO have been widely accepted’ [19]. The narrative below documents the development of FRAX arising from this background. The steps are divided using pregnancy as a loose analogy.

Fertilisation

A major drawback of BMD assessment, made explicit in the 1994 WHO report, was that the majority of fragility fractures would occur in individuals without BMD-defined osteoporosis. In other words, a T-score threshold of −2.5 SD had high specificity, but the sensitivity was low and ranged from about 30 to 50% under most reasonable assumptions [16]. The low sensitivity was the principal reason not to recommend BMD testing for population-based screening.

This definition of osteoporosis presaged the development of highly effective treatments such as alendronate, which first became available on September 29, 1995. Indeed, at the launch of alendronate by Merck, Wall Street analysts predicted a billion-dollar market for alendronate within the first year. Despite tools for diagnosis, diagnostic criteria, an effective treatment of osteoporosis and a large marketing investment, this goal was to take considerably longer to achieve. Claus Christiansen and John Kanis were invited by Gideon Rodan to Merck in 1997 to offer their views on the disappointing sales figures. In their view, the problem was multifactorial, but a large component was the lack of technologies to determine whom to treat; the Merck strategy of treating everyone was untenable from a marketing, clinical, ethical and health economic view. Christiansen and Kanis offered to develop assessment algorithms in exchange for 3 days’ worth of lost revenue (about $4 million) but the offer was rejected.

Over the next year, John Kanis approached other companies to suggest that the development of assessment algorithms was a worthwhile investment. He was particularly fortunate to meet Willard Dere (then at Lilly Laboratories) who was persuaded and offered an unrestricted grant. At that time, a small team was assembled to develop algorithms for the assessment of fracture risk. The expertise represented was clinical, epidemiological, mathematical, statistical and economic (initially comprising Olof Johnell, Bengt Jonsson, Anders Odén, Alison Dawson and later also including Chris De Laet, Helena Johansson, Eugene McCloskey and Nick Harvey). Getting the first grant was the most difficult and grants from the Alliance for Better Bone Health, GE Lunar, Hologic, Pfizer, Roche, Novartis, the International Osteoporosis Foundation and the International Society for Clinical Densitometry followed in later years with a successful publication stream.

The gestation of FRAX—The first trimester

A first step was to identify the relevant risks. It was thought important to characterise the risk of fracture rather than the risk of osteoporosis. The process began by investigating the performance of BMD measurements as a predictor of fracture risk [20,21,22]. It became evident that the clinical utility of assessing fracture risk over a defined time depended not only on BMD and fracture risk, but also on mortality. Thus, fracture probability would be low in an individual at very high fracture risk if the risk of death was even higher. For this reason, accurate methods for assessing lifetime risks of fracture were developed [23] by integrating the hazard functions of death and fracture [24]. The principle was extended to include BMD [25, 26] and the mortality consequences of fracture [27,28,29,30], including deaths that could be attributed to the fracture event, which would be needed for future health economic analyses.
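The idea of integrating the fracture and death hazards can be illustrated with a minimal Python sketch. It assumes the standard competing-risk form, in which the probability of a first fracture by time T is the integral of the fracture hazard multiplied by the probability of being alive and fracture-free up to that moment. The function name, the numerical scheme and the hazard values are illustrative assumptions and are not taken from the published models.

```python
import math


def fracture_probability(fracture_hazard, death_hazard, horizon=10.0, steps=10_000):
    """Probability of a first fracture within `horizon` years, with death as a
    competing event. Both hazards are callables of time since baseline (per year).
    Uses the midpoint rule on
        integral_0^T h_f(t) * exp(-integral_0^t (h_f + h_d) ds) dt
    """
    dt = horizon / steps
    cumulative_hazard = 0.0  # integrated total hazard up to the start of the step
    probability = 0.0
    for i in range(steps):
        t_mid = (i + 0.5) * dt
        h_f = fracture_hazard(t_mid)
        h_d = death_hazard(t_mid)
        # Chance of being alive and fracture-free at the middle of the step
        event_free = math.exp(-(cumulative_hazard + 0.5 * (h_f + h_d) * dt))
        probability += h_f * event_free * dt
        cumulative_hazard += (h_f + h_d) * dt
    return probability


# Hypothetical hazards: the same high fracture hazard, with low or very high mortality.
high_fracture = lambda t: 0.05
low_mortality = lambda t: 0.01
high_mortality = lambda t: 0.50

print(f"low mortality:  {fracture_probability(high_fracture, low_mortality):.3f}")
print(f"high mortality: {fracture_probability(high_fracture, high_mortality):.3f}")
```

With the same fracture hazard, the 10-year probability falls sharply when the death hazard is very high, which is the point made in the text.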

Having decided to adopt a probability-based approach to fracture risk assessment, it was important to determine the time horizon. One-year probabilities would yield numbers that were too low for clinical consumption (e.g. a 1-year probability of hip fracture of 0.3%). At the other extreme, lifetime probabilities, though much higher, did not increase with age despite a logarithmic increase in the incidence of hip fracture. The compromise was the birth of a 10-year probability [25]. This had the advantage of providing readily understandable numbers and was a time horizon favoured by the economist Bengt Jonsson that covered the duration of interventions, largely bisphosphonates, together with the slow offset of effect once treatment had stopped [31]. Following a review of the world literature on hip fracture risk and death [32], a greater than 10-fold variation in hip fracture probability worldwide was noted. From this observation, it became clear that the assessment of fracture risk must take account of the global diversity of risk.

A further problem to be tackled was the output variable. Ideally this should include all osteoporotic fractures. But how should these be defined? In the absence of a gold standard, site-specific criteria were developed based on their association with low BMD, their predictive value for other fractures and the pattern of their incidence with age [33], criteria which have now become widely accepted. However, at the time, the epidemiology of many osteoporotic fractures was unknown or ill-defined, something that remains the case for certain fractures today. An example is rib fracture which is notoriously difficult to diagnose. For this reason, we focussed on the ‘major osteoporotic fractures’ (hip, clinical spine, forearm and proximal humerus) which account for about 80% of the fracture burden and for considerably more of the disutility and economic burden [33].

The choice of multiple endpoints created epidemiological challenges in the context of probabilities. If we were to develop the 10-year probability of a major osteoporotic fracture (MOF), then the metric of interest was the incidence of a first major fracture, whether at the hip, spine, humerus or forearm. In the absence of empirical data, Olof Johnell characterised the incidence of a first major fracture from detailed information from all fracture cases in Malmo, Sweden [24]. From this, correction factors were derived to adjust the more usual estimates of age- and sex-specific incidence of hip fracture available in other regions of the world to provide estimates of MOF incidence. The correction factors have been shown subsequently to be robust [34, 35].

A pivotal finding that determined the future of fracture risk assessment was that BMD, the diagnostic focus for osteoporosis, had relatively poor performance characteristics. As indicated in the WHO report of 1994 [16], it was evident that BMD alone was a poor screening tool in that the majority of fractures in the community occurred in individuals without BMD-defined osteoporosis [21]. For example, if the aim were that the 15% of the female population at highest risk of hip fracture at the menopause were selected for treatment, the use of hip BMD would have a specificity of 85% but a sensitivity of only 25% [21]. However, the models indicated that femoral neck BMD had a different prognostic significance at different ages [25]. Thus, age contributed significantly to fracture risk independently of BMD as shown previously for forearm BMD [36]. The implication was that diagnostic thresholds are not equivalent to intervention thresholds since the range of risk varied so markedly for any given BMD [25]. This raised the question as to whether there were other risk indicators that could improve still further the sensitivity of a risk assessment algorithm.

In 2002, the FRAX team organised a meeting of sponsors, key academics and international societies to review the progress of the work thus far and to introduce more widely the notion of risk-based assessment [37]. The expected improvements in sensitivity were modelled [38] and prototype models developed [39, 40].

The hunt for independent clinical risk factors began.

Health economic perspective

From the late 1980s, treatments were becoming available that were based on high-quality randomised placebo-controlled trials. Since health economics in this field was in its infancy, it became important to determine the information base necessary to populate health economic models and identify important drivers of cost-effectiveness. This was made possible by the work undertaken in the development of the risk assessment tool [41] and in particular by the development of methodology to integrate the multiple fracture outcomes in osteoporosis [33, 42]. The first focus was to determine the fracture probability at which hypothetical interventions became cost-effective [43, 44]. This work, led by Bengt Jonsson, provided the basis for the development of a reference model for osteoporosis in 2007 [45], which was subsequently adopted by the International Osteoporosis Foundation and is still in use today.

The second trimester

Having committed itself to the outside world in 2002 [37], the FRAX team needed to identify clinical risk factors that could provide independent information on fracture risk. For this purpose, it was necessary to have access to the individual data rather than summary material for meta-analyses. This was required because of the likely interactions between clinical risk factors. For example, if we identified a risk of smoking, to what extent was this independent of other risk factors such as the intake of alcohol? The use of primary data would also eliminate the risk of publication bias.

At that time, the FRAX team had access to two population-based cohorts with extensive baseline characterisation and long-term outcomes (death and fracture). The first was the Rotterdam study, available through Chris De Laet and Huw Pols, and was followed by the Canadian Multicentre Osteoporosis Study (CaMos) cohort, courtesy of Alan Tenenhouse. A priority was to have more cohorts from around the world so that the analyses would have international validity. Collecting other cohorts was at first problematic and required reassurance that all material would be held in confidence and that no details of individual cohorts would be published. The search for cohorts for the primary analysis was eventually closed in 2004 with the acquisition of nine cohorts from Australia, Canada, the USA, Japan and many European countries. However, other investigators were asked to provide additional cohort material for the purposes of validation.

Meta-analyses

Armed with an extensive observational base, a series of meta-analyses of potential risk factors for fracture was undertaken during 2004 and 2005. This work was only made possible by unrestricted grants from a number of commercial sources, a fact that was openly acknowledged in each of the relevant publications. The risk factors that were eventually selected comprised:

  • Body mass index [46]

  • Femoral neck BMD [47]

  • A prior fragility fracture [48]

  • Parental history of hip fracture [49]

  • Current tobacco smoking [50]

  • Ever long-term use of oral glucocorticoids [51]

  • Rheumatoid arthritis [51]

  • Other causes of secondary osteoporosis

  • Alcohol consumption of 3 or more units daily [52]

For each risk factor, it was important not only to determine the hazard ratio for fracture, but also its dependence on age, BMD and other risk factors. In addition, the commitment to a 10-year time horizon required that the impact of time on the hazard functions be assessed. At that time, there was precedent from earlier work for believing that the impact of risk factors might attenuate with the passage of time [22]. As it turned out, the effects were minor for the ‘successful’ risk factors. At the same time, the hazard for death was determined for each risk factor since this, if significant, would compete with the fracture hazard in a probability model. A good example was current smoking, which was a moderate risk factor for fracture but would turn out to be a weak risk factor for fracture probability because smoking also significantly increased the risk of death.
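The smoking example can be made concrete with a short Python sketch. It assumes constant hazards over the 10 years, for which the probability has a closed form. All baseline hazards and hazard ratios below are hypothetical values chosen for illustration and are not the estimates from the meta-analyses.

```python
import math


def probability_constant_hazards(fracture_hazard, death_hazard, horizon=10.0):
    """10-year fracture probability when both hazards (per year, positive) are
    constant: h_f / (h_f + h_d) * (1 - exp(-(h_f + h_d) * T))."""
    total = fracture_hazard + death_hazard
    return fracture_hazard / total * (1.0 - math.exp(-total * horizon))


# Hypothetical inputs, for illustration only.
BASE_FRACTURE, BASE_DEATH = 0.010, 0.020  # baseline hazards per year
HR_FRACTURE, HR_DEATH = 1.5, 2.0          # assumed hazard ratios for the risk factor

p_reference = probability_constant_hazards(BASE_FRACTURE, BASE_DEATH)
# Risk factor acts on the fracture hazard only
p_fracture_only = probability_constant_hazards(BASE_FRACTURE * HR_FRACTURE, BASE_DEATH)
# Risk factor also raises the death hazard
p_both = probability_constant_hazards(BASE_FRACTURE * HR_FRACTURE, BASE_DEATH * HR_DEATH)

print(f"hazard ratio for fracture:              {HR_FRACTURE:.2f}")
print(f"probability ratio, fracture effect only: {p_fracture_only / p_reference:.2f}")
print(f"probability ratio, with death effect:    {p_both / p_reference:.2f}")
```

Under these assumptions the ratio of 10-year probabilities is smaller than the fracture hazard ratio, and smaller still once the risk factor is allowed to raise mortality.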

The list of risk factors published hides a large number of studies that failed to qualify for inclusion, many of which remain unpublished because of their negative outcome though the relevant background was included in the WHO 2008 Technical Report [53]. A non-exhaustive list is given below:

  • Height and weight. Neither was a significant risk factor in an international context, but both are included in FRAX in order to compute body mass index.

  • Stroke, which had previously been shown to be a strong risk factor for hip fracture [54].

  • Biochemical markers of bone turnover, which held promise in the EPIDOS study [55]. As it turned out, there were insufficient data for world representation.

  • Exposure to oestrogens. Oestrogenic status was heterogeneously characterised, though oestrogen-dependent effects were documented in several cohorts from Copenhagen studied by Claus Christiansen [56].

  • Quantitative ultrasound. The several techniques used and the limited breadth of data precluded serious consideration [47, 57]. A subsequent analysis indicated that the association of quantitative ultrasound with osteoporotic fracture decreased with time, suggesting a limited role over a 10-year time horizon [58].

  • Lumbar spine BMD. The data available were much less than for BMD at the femoral neck. Moreover, the predictive value for fracture was not greater than femoral neck BMD [47].

  • Diabetes. The meta-analysis was negative, perhaps partly due to the difficulties in distinguishing type 1 and type 2 diabetes. Later work would show that type 2 diabetes is a significant risk factor independent of BMD [59].

  • Endocrine disorders such as hyperthyroidism, parathyroid disorders, adrenal and pituitary disease. Their prevalence was too low in population-based cohorts to yield helpful information.

  • Falls. The study of falls risk was problematic for several reasons. First, the database was limited and of poor quality, and meta-analysis showed no significant increase in osteoporotic fracture risk, though there was a small effect on hip fracture risk [60]. Second, in the phase 3 trial of risedronate, where hip fracture was the primary end point, hip fracture risk was not significantly decreased in patients over the age of 80 years, the majority of whom were purportedly selected on the basis of falls risk [61]. For these reasons, it was considered that more data would be required before falls could be safely incorporated into assessment algorithms that identified patients at high risk of fracture who would benefit from bone-targeted therapies.

  • Calcium nutrition. Simplified tests for the intake of milk showed no predictive value [62].

Validation of risk factors

There are a number of factors to be considered in the selection of risk factors for case finding. Of importance, in the setting of primary care, is the ease with which they can be captured and might be used. For a globally applicable tool, the chosen risk factors should also be valid in an international setting and their predictive value stable or well characterised over time. These were features that were accounted for in the meta-analytic approach. A further but critical consideration was the reversibility of risk, i.e. what was the evidence that the risk identified by a risk factor is amenable to therapeutic interventions that modify bone mass and strength. Thus, it was important to draw the distinction between reversible risk and reversibility of risk. Age is an example of an irreversible risk factor, but the risk of fracture identified by age has reversibility in the sense that the elderly respond to pharmacological intervention in much the same way as do younger patients.

Although there are well-established methods for evaluating the quality of evidence of the effectiveness of interventions, there was little or no precedent for addressing this question in the context of risk factors. Levels of evidence were therefore developed to provide a yardstick by which to evaluate the chosen risk factors; the criteria were published several years later in Calcified Tissue International [63].

At the highest level of evidence, it would be necessary to recruit patients selected on the basis of the risk factor to a randomised controlled trial. The risk factor that is best evaluated in this way is BMD, and indeed the vast majority of therapeutic studies have recruited patients on the basis of low BMD. In recent years, other trials have recruited patients on the basis of age, sex, a prior vertebral fracture and current exposure to glucocorticoids irrespective of BMD, and have shown therapeutic effects, albeit mainly for vertebral fracture risk, similar to those noted in randomised controlled trials based on BMD selection. Thus, several of the selected variables fulfilled this criterion.

For the other risk factors, comparable data were lacking. In the absence of such data, an alternative approach was to demonstrate that the presence (or absence) of a risk factor did not adversely influence therapeutic efficacy against fractures. Systematic studies of strontium ranelate [64] and teriparatide [65] had shown no significant interaction between response to treatment and the presence or absence of any of the risk factors selected, including age, height, family history of fracture, low body weight or body mass index, smoking, alcohol intake or prior non-vertebral fracture. The gaps were filled, therefore, by interrogating the phase 3 studies of risedronate [66], raloxifene [67, 68] and clodronate [69], which led to similar conclusions. A remaining gap was an analysis of the phase 3 studies of alendronate: despite repeated requests, the data were not provided by Merck or the principal investigators until more than 10 years later, long after the launch of FRAX.

The third trimester

It is fair to say that up until 2004, the FRAX team had an impressive publication record but saw little light at the end of the tunnel to indicate that risk assessment tools could eventually be delivered. On the other hand, because of the publications and collaboration with many investigators around the world, the osteoporosis community were expecting an early delivery. The completion of the meta-analyses and risk factor validation gave room for optimism.

The model

The next phase was to create models according to the framework developed in the first trimester with the data generated in the second trimester.

Each of the risk factors identified by the meta-analyses was examined for interactions with sex, age, BMD, time since baseline and the variable itself. An example of an interaction with age is that a prior fracture provided the highest risk of a subsequent fracture at the age of 50 years and fell progressively with age [48]. An example of the interaction of a variable with the same variable was the different significance of a unit change in body mass index at high or low values [46].

Four models were constructed from the risk factor analysis to compute fracture probabilities. These comprised the probability of hip fracture, with and without BMD, and the probability of other major osteoporotic fractures (forearm, proximal humerus and clinical spine), with and without BMD. The choice of four models was driven first by the knowledge acquired that the strength of the risk factors differed between hip fractures and the other major fracture outcomes. For example, the risk of hip fracture at the age of 50 years increased 3.7-fold for each standard deviation decrease in BMD at the femoral neck, whereas the risk of other major fractures increased 1.2-fold for each standard deviation decrease [47]. Second, the need to develop models that were independent of BMD was fuelled by a survey of the availability of densitometry set against the requirements needed for case finding [70]. In short, limited or no real access to BMD testing was available in many countries and the WHO specifically requested that the tool should be applicable in all settings.

For each model, fracture and death as continuous hazard functions were computed using a modified Poisson regression approach. For each risk factor, all significant interactions that were identified by the individual level data meta-analyses (‘mega-analyses’) were entered in the model. Where interactions noted in the mega-analyses were no longer significant for hip fracture and other osteoporotic fractures, these were omitted in a stepwise manner. For the death hazard, all significant interactions for death risk were included and thereafter omitted if appropriate in a stepwise manner, as undertaken for the fracture hazard.
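For readers unfamiliar with the technique, the sketch below fits an ordinary Poisson regression with a person-time offset to grouped event counts, which is the textbook way to estimate a hazard and a hazard ratio. It is not the modified approach used for FRAX, whose details are not described here. The function name, the single binary covariate and the event counts are all hypothetical.

```python
import math


def fit_poisson_rate_model(rows, iterations=25):
    """Fit log(rate) = b0 + b1 * x by Newton-Raphson.

    rows: iterable of (x, events, person_years). Expected events are
    person_years * exp(b0 + b1 * x), i.e. log person-time is the offset.
    Requires at least two distinct values of x.
    """
    rows = list(rows)
    total_events = sum(events for _, events, _ in rows)
    total_time = sum(person_years for _, _, person_years in rows)
    b0, b1 = math.log(total_events / total_time), 0.0  # start at the crude rate
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, events, person_years in rows:
            mu = person_years * math.exp(b0 + b1 * x)  # expected events
            u0 += events - mu          # score for b0
            u1 += x * (events - mu)    # score for b1
            i00 += mu                  # information matrix entries
            i01 += x * mu
            i11 += x * x * mu
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# Hypothetical grouped data: (risk factor present?, fractures, person-years)
data = [(0, 10, 1000.0), (1, 10, 500.0)]
b0, b1 = fit_poisson_rate_model(data)
print(f"baseline hazard: {math.exp(b0):.4f} per person-year")
print(f"hazard ratio:    {math.exp(b1):.2f}")
```

The same structure extends to more covariates and interaction terms, at which point a statistics library would normally replace the hand-written Newton-Raphson step.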

In addition to rheumatoid arthritis, provision was made for the inclusion of other secondary causes of osteoporosis. For this purpose, Joseph Melton kindly conducted a systematic review to identify secondary causes of osteoporosis consistently documented to be associated with a significant increase in fracture risk. Whereas there was strong evidence for the association of these disorders with fracture risk, the independence of these risk factors from BMD was uncertain [53]. It was conservatively assumed, therefore, that the fracture risk was mediated via low BMD but, in the absence of data on BMD, the risk ratio assumed for these other secondary causes was similar to that noted for rheumatoid arthritis.

WHO Scientific Group meeting

The work undertaken thus far had been part of the work programme of the WHO Collaborating Centre for Metabolic Bone Diseases at the University of Sheffield. The Collaborating Centre and WHO organised a WHO Scientific Group meeting on the ‘Assessment of Osteoporosis at the Primary Health Care Level’ which met in Brussels from 5 to 7 May 2004. The objective of the meeting was to review the work undertaken and to detail the future work required to complete the programme. It was naively assumed that this would be completed within the year, but it would be several years later when the report was finalised and signed off by the study group. Indeed, the report was not printed until the end of 2007 and launched in 2008 [53].

Performance characteristics

The performance characteristics of the models were expressed as gradients of risk per SD change in risk indicator. This provided a quantitative assessment of the predictive value of the clinical risk factors alone and the improvement in risk prediction of combining the clinical risk factors with BMD [71]. For example, at the age of 50 years, the relative risk of hip fracture per standard deviation decrease in BMD was 3.68. The gradient of risk for the clinical risk factors alone was 2.05 and when combined with BMD was 4.23. Gradients of risk were somewhat lower at higher ages, but the conclusions were similar. We had previously determined that increases in the gradient of risk, though at first sight modest, have a marked impact on both sensitivity and positive predictive value and that the gains would be proportionately less above gradients of risk of 3 or 4/SD [26]. Thus, the models, though far from perfect, were a substantial advance over the use of BMD alone. Moreover, the models could be used without BMD.
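The link between gradient of risk and sensitivity can be sketched in Python under simplifying assumptions that are not taken from the source: the risk score is normally distributed in the population, the fracture hazard rises by the gradient of risk for each SD of the score, and fractures are rare. Under those assumptions the score among people who fracture is shifted by ln(gradient) SD units, and the sensitivity of selecting the top fraction of the population follows directly. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def sensitivity_at_top_fraction(gradient_of_risk, fraction_selected):
    """Share of future fracture cases found when the highest-risk
    `fraction_selected` of the population is selected.

    Assumes a standard normal risk score, a log-linear hazard and a rare outcome,
    so that scores among cases follow N(ln(gradient_of_risk), 1).
    """
    shift = math.log(gradient_of_risk)
    cutoff = STANDARD_NORMAL.inv_cdf(1.0 - fraction_selected)
    return 1.0 - STANDARD_NORMAL.cdf(cutoff - shift)


# Gradients of risk quoted in the text for hip fracture at age 50, plus round values.
for gradient in (1.5, 2.05, 3.0, 3.68, 4.23, 5.0):
    sens = sensitivity_at_top_fraction(gradient, fraction_selected=0.15)
    print(f"gradient {gradient:4.2f}/SD -> sensitivity {sens:.2f} at 15% selected")
```

Because the outcome is assumed to be rare, specificity in this sketch is approximately one minus the fraction selected, whatever the gradient of risk.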

Validation

As mentioned, we had at this stage (2007) access to an additional 11 population-based cohorts from which we could validate the models. The finding of similar gradients of risk for fracture in the validation cohorts as those determined in the source cohorts [71] meant that country-specific models could now be built. The hazard functions of fracture and death could be applied to specific populations where fracture risks and mortality are known. This assumed that the relative importance of the risk factors and their interactions are the same in each country as in the original model.

In different regions of the world, the age-specific incidences of fracture [32] and death vary markedly so that the FRAX models had to be calibrated to the known epidemiology of fracture and death in each country. The first countries chosen were Sweden and the UK because of the breadth and quality of the epidemiological data available. It was also important to build models where treatments were widely available, such as the USA and Japan and also to provide models where the population probability of fracture was very high (Sweden and the USA), high (UK), moderate (China, France, Japan and Spain) and low (Turkey). These were countries where we had high-quality information for hip fracture and for death but much sparser data on the incidence of other fractures that were to be included (forearm, clinical spine and proximal humerus fractures). A review of the available information had suggested that the pattern of fractures was similar in the Western world and Australia, despite differences in incidence [33], and we built in this assumption for these new countries. As stated before, this assumption, where tested, subsequently appears sound [34, 35].

The FRAX team worked closely with several country organisations to prepare for the arrival of the assessment models. These included the UK National Osteoporosis Guideline Group, NOGG, chaired by Juliet Compston; the Japanese Osteoporosis Society (Hajime Orimo and Seike Fujiwara) and the National Osteoporosis Foundation in the USA (Bess Dawson-Hughes, Robert Lindsay and Joseph Melton). Whereas we were supplying a technology platform (FRAX), the application of the technology was considered to be the responsibility of specific countries. The rationale was to provide early access to the models so that country-based organisations could establish how to incorporate the tools into practice guidelines and assess the implications for the populations selected. Thus, our discussions led us to focus on several aspects of guideline development, particularly intervention thresholds. When was the probability of fracture sufficiently high to recommend an intervention? This required consideration of how to transform a continuous variable into a categorical variable, and the implications arising therefrom. This was tackled by updating health economic analyses, testing intervention thresholds and assessing the role of FRAX without BMD.

The health economic work focussed on intervention thresholds [72, 73], at that time, the hip fracture probability at which interventions became cost-effective. As treatments became available, specific analyses were undertaken for alendronate [74, 75], risedronate [76, 77], raloxifene [78, 79], hormone replacement treatment [80] and strontium ranelate [81]. More recently, similar analyses have been conducted for denosumab [82,83,84]. In addition to assessing these treatment modalities for osteoporosis, it was possible to compare cost-effectiveness with interventions in other chronic non-communicable diseases such as hypertension and hyperlipidaemia [85].

These studies were used to validate FRAX-based intervention thresholds in several countries including the UK, USA, Switzerland and Sweden [86,87,88,89].

The WHO report was finalised in late 2007 and the only remaining task was to give the assessment tool a name. Of the many suggestions, FRAX was proposed (from memory by Helena Johansson) and adopted. The name was registered since we wished to distinguish the authentic algorithms from those that might be copied later. In hindsight, this was a prudent investment since several groups reverse engineered copies, one of which is still in use today [90]. Access to FRAX was via the website built by Richlyn Systems Ltd., Sheffield, under the supervision of Eugene McCloskey at the University of Sheffield. We had asked whether the University would pay for the website development (and for trade marking) but the modest financial rewards were an insufficient temptation.

The birth of FRAX

The launch of FRAX was accompanied by several publications that had been prepared before the official launch in April 2008. Of particular importance was the publication of FRAX-based guidelines in the UK [91] and USA [92] and supporting papers in collaboration with the National Osteoporosis Guideline Group (NOGG) in the UK and the US National Osteoporosis Foundation (NOF) [86, 88, 93, 94]. The approach to FRAX-based intervention thresholds differed fundamentally. In the USA, FRAX was built into existing guidelines that predicated treatment in postmenopausal women with a prior hip or spine fracture or who had densitometric evidence for osteoporosis. FRAX was recommended in women with osteopenia (a T-score between −1 and −2.5) and intervention recommended when the 10-year probability of a major osteoporotic fracture was 20% or more or the hip fracture probability was 3% or more [92], based on a health economic assessment [86]. The NOGG took a different approach, despite guidelines at the time that took a rather similar approach to that in the USA [95]. NOGG reasoned that, if a prior fragility fracture was an indication to recommend treatment, then postmenopausal women who had a fracture probability that was equivalent to or greater than that of women with a prior fracture should be considered for treatment, irrespective of their fracture status. The major osteoporotic fracture probability that was equivalent to that of women with a prior fracture was age dependent and ranged from about 7% at the age of 50 years to 34% at the age of 90 years. Thus, NOGG recommended age-dependent intervention thresholds, in contrast to the fixed thresholds devised by the NOF. This dichotomy of approach (fixed vs. age-dependent thresholds) has persisted with the incorporation of FRAX into more than 100 guideline recommendations in different countries [96], though age-dependent thresholds are slowly gaining traction.
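
The contrast between the two approaches can be sketched in a few lines of Python. The fixed cut-offs (20% for a major osteoporotic fracture, 3% for hip fracture) are those quoted above for the NOF. The age-dependent threshold is assumed, purely for illustration, to rise linearly between the two end points given in the text (about 7% at 50 years and 34% at 90 years); the published NOGG thresholds are not claimed to be linear in age, and all function names are hypothetical.

```python
# Illustrative sketch only; not the NOF or NOGG algorithms themselves.

NOF_MOF_THRESHOLD = 20.0  # 10-year major osteoporotic fracture probability (%)
NOF_HIP_THRESHOLD = 3.0   # 10-year hip fracture probability (%)


def nof_fixed_rule(mof_prob: float, hip_prob: float) -> bool:
    """Fixed thresholds: treat if either probability reaches its cut-off."""
    return mof_prob >= NOF_MOF_THRESHOLD or hip_prob >= NOF_HIP_THRESHOLD


def age_dependent_threshold(age: float) -> float:
    """ASSUMPTION: linear interpolation from 7% at 50 years to 34% at 90 years."""
    age = min(max(age, 50.0), 90.0)  # clamp to the age range quoted in the text
    return 7.0 + (34.0 - 7.0) * (age - 50.0) / (90.0 - 50.0)


def age_dependent_rule(age: float, mof_prob: float) -> bool:
    """Age-dependent threshold: treat if the probability reaches the age-specific value."""
    return mof_prob >= age_dependent_threshold(age)


if __name__ == "__main__":
    # Two invented patients: (age, major fracture probability %, hip probability %)
    for age, mof, hip in [(55, 12.0, 1.5), (85, 22.0, 2.5)]:
        print(age, nof_fixed_rule(mof, hip), age_dependent_rule(age, mof))
```

Under these assumptions, the younger invented patient falls below the fixed thresholds but above the age-dependent one, whereas the older patient exceeds the fixed threshold but not the age-dependent one. This is the practical consequence of the dichotomy described above.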

Access to FRAX

Discussions were held with Hologic and subsequently GE Lunar, the major manufacturers of BMD equipment. Both expressed interest in incorporating FRAX into their equipment. Following regulatory review by the US Food and Drug Administration, FRAX was incorporated into DXA scanners to provide FRAX probabilities at the time of DXA scanning. The deal was that we would supply the models free of charge on the condition that the updated software was passed on to the user free of charge. They upheld the letter of the agreement but not the spirit, in providing free access only where users purchased other upgrades to existing equipment. Access was also provided free to a number of sources who wished to calculate FRAX from within electronic health records to see how such mechanisms could be put in place. For those without internet access or DXA upgrades, hand-held calculators and smartphone applications were developed subsequently by the International Osteoporosis Foundation.

New models

FRAX was launched in 2008 with 8 country-specific models. The number of models increased progressively with the growth of available epidemiologic information in the published literature. In several instances, epidemiological studies were initiated in order that FRAX models could be developed. Examples include Mexico [97], Russia [98], Armenia [99], Belarus [100] and Georgia. New epidemiology of fracture resulted in the revision of FRAX models in the USA [101], China and in Turkey [102]. By the end of 2010, the number of FRAX models had increased from 8 to 30. At the time of writing, 64 country models are available, in 34 languages, and currently provide coverage for over 80% of the global population [103].

Models have been completed for Abu Dhabi, Argentina [104], Armenia [99], Austria, Australia, Belarus [100], Belgium [105], Brazil [106, 107], Canada [108], Chile [104], the Czech Republic [109], China (revised 2013), Colombia [110], Croatia, Denmark, Ecuador [104], Estonia, France, Finland [111], Georgia, Germany, Greece, Hong Kong [112], Hungary, Iceland [34], India (surrogate), Indonesia, Iran, Ireland [113], Israel, Italy [114], Japan [115], Jordan (updated), South Korea, Kuwait [116], Lebanon, Lithuania, Malta, Mexico [104], Moldova, Morocco, Netherlands [117], New Zealand, Norway, Palestine (surrogate), the Philippines, Poland [118], Portugal [119], Romania [120], Russia [98], Singapore, Slovakia, Sri Lanka (surrogate), Spain, Sweden [121], Switzerland [122], Taiwan, Thailand, Tunisia, Turkey [123], Ukraine [124], the UK [88], the USA [92, 125] and Venezuela [104].

With regard to website visits, calculations arose from 173 countries in 2012–2013. Uptake was high in North America, the Antipodes and most countries of Europe; intermediate in Latin America and the Middle East; and very low in Africa and much of Southeast Asia [103].

FRAX and intervention

Shortly before the launch of FRAX, the Committee for Medicinal Products for Human Use (CHMP) updated its guidance on the evaluation of interventions in the treatment of osteoporosis [126]. A major departure from previous guidelines was a focus on intervention in patients at high risk of fracture and indeed, they no longer considered a distinction between prevention and treatment of osteoporosis. The guidelines proposed thresholds of major osteoporotic fracture 10-year probability of 10 to 15% for clinical trial inclusion based on our analysis of the Fracture Intervention Trial (FIT). It became important, therefore, to examine possible interactions between baseline fracture probability and therapeutic outcomes in phase 3 studies of efficacy. Analyses of phase 3 studies have shown that, in patients characterised at high risk as assessed by FRAX, a majority of osteoporosis interventions significantly reduce fracture risk including strontium ranelate [127], raloxifene [128, 129], bazedoxifene [128], clodronate [130], daily and weekly teriparatide [131, 132], abaloparatide [133], denosumab [134] and alendronate [135] as well as a basket of interventions used by general practitioners in the UK [136]. Most of these analyses were post hoc but, in the case of denosumab, the analysis was pre-planned. In addition, the ‘SCreening of Older wOmen for the Prevention of fractures’ (SCOOP) study (see below) was a prospective randomised study that demonstrated efficacy for hip fracture in women selected on the basis of hip fracture probability assessed using FRAX [136, 137].

Several of these studies showed that efficacy (relative risk reduction) was greater in patients with the higher baseline fracture probabilities [128, 130, 134]. This has implications for targeting treatments to high-risk patients in that the dividend in terms of fractures saved is amplified. This also has implications for health economic assessment and conventional meta-analyses of interventions used in osteoporosis [138]. As a result, FRAX algorithms have been integrated in health economic models so that economic thresholds could be expressed in terms of 10-year fracture probabilities [78, 82, 94, 139]. An important later development was, for the first time, to integrate FRAX-based models with relevant parameters of adherence [83, 140].

Screening with FRAX

Until recently, the effectiveness of risk assessment strategies in which samples of the general population might be evaluated for risk factors and BMD to derive individual estimates of absolute fracture risk, with targeting of anti-osteoporosis therapy on the basis of these estimates, remained uncertain. The publication of the Medical Research Council/Arthritis Research UK-funded SCOOP trial provides strong support for such a strategy [136]. This seven-centre pragmatic randomised controlled trial with 5-year follow-up included 12,483 women aged 70–85 years, who were randomised to receive a care algorithm including FRAX and drug targeting versus usual primary care for osteoporosis based on opportunistic case finding. Screening on the basis of hip fracture probability reduced the incidence of hip fractures by an average of 28%, an effect that was somewhat greater in women with the higher fracture probabilities targeted for treatment [137]. The screening algorithm resulted in a pronounced increase in the use of anti-osteoporosis medication, and greater compliance with therapy, over the period of follow-up. These findings strongly support a systematic, community-based screening programme of fracture risk in older women. In addition, the strategy appears to be cost-effective [141, 142].

Teething troubles

As had occurred with the introduction of an operational definition of osteoporosis, the introduction of FRAX engendered some controversy. The tool was praised for its simplicity for the end-user, doctors and allied health care professionals but criticised by academics for the same reason. It was evident that FRAX (like all available clinical risk assessment tools) had several limitations that should be borne in mind when interpreting the results.

Limitations

The limitations of FRAX have been reviewed recently [96]. Notably, the FRAX assessment takes no account of dose-responses for several risk factors. For example, two prior fractures carry a much higher risk than a single prior fracture [143] and a vertebral fracture with a small residual deformity is a weaker prognostic risk factor than a more severe deformity [144]. Dose-responses are also evident for glucocorticoid exposure [145], cigarette smoking [50] and alcohol intake [52]. Since it is not possible to accommodate all such scenarios within the FRAX algorithm, these limitations should temper clinical judgement. Many of these issues were addressed by the ‘FRAX Initiative’, a meeting organised by the International Society of Clinical Densitometry in collaboration with the IOF held in Bucharest, Romania, in November 2010 [143, 146].

To address some of these and other limitations, relatively simple arithmetic adjustments have been proposed, which can be applied to conventional FRAX estimates of probabilities of hip fracture and a major fracture to adjust the probability assessment with knowledge of:

  • High, moderate, and low exposure to glucocorticoids [147]

  • Concurrent data on lumbar spine BMD [148, 149]

  • Trabecular bone score [150,151,152,153]

  • Hip axis length [154]

  • Falls history [155]

  • Immigration status [156]

  • Type 2 diabetes [157, 158]

Misperceptions in discrimination and calibration

As well as calls for complexity, others argued that simpler models gave as good discrimination as the FRAX model applied to their cohort, as attested by a rash of studies published between 2009 and 2012, that compared the performance characteristics of FRAX using receiver operating characteristic (ROC) curves [159,160,161,162,163,164,165,166,167,168,169,170], to cite but a few. Other examples are given elsewhere [171]. Some of these were summarised and reported by the National Clinical Guideline Centre [172] though, for unknown reasons, the extensive validation studies that we had undertaken [71] were omitted. For a variety of reasons, these studies were flawed, not least for the inappropriate use of ROC curves [173]. Fewer such analyses have been reported recently although those listed above regrettably still appear in systematic reviews [172, 174, 175].

Several sources reported that there are limited studies of the calibration available for FRAX [174, 175]. However, the view is somewhat misplaced in that each FRAX model goes through an internal calibration step [53, 96]. Thus, if the whole population of a country were to be ‘FRAXed’ with the country-specific model, the number of deaths and hip fractures would match those predicted by FRAX. The calibration is, however, dependent on the accuracy of the fracture hazard and death hazard for each country or ethnicity. For this reason, where possible, FRAX models are built on national data.
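
The logic of this calibration argument can be illustrated with a toy calculation. If each individual's predicted probability is correct, the expected number of events in a population is the sum of the individual probabilities, and the ratio of observed to expected events should be close to 1. The sketch below uses invented numbers and hypothetical function names; it is not the procedure used to construct FRAX models.

```python
from typing import Sequence


def expected_events(probabilities_percent: Sequence[float]) -> float:
    """Expected number of events: the sum of individual probabilities (given in %)."""
    return sum(p / 100.0 for p in probabilities_percent)


def observed_to_expected(observed: int, probabilities_percent: Sequence[float]) -> float:
    """Ratio of observed to expected events; about 1 indicates good calibration."""
    expected = expected_events(probabilities_percent)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return observed / expected


if __name__ == "__main__":
    # Invented cohort of 1000 people with 10-year hip fracture probabilities (%)
    cohort = [2.0] * 500 + [5.0] * 300 + [10.0] * 200
    print(expected_events(cohort))           # about 45 expected hip fractures
    print(observed_to_expected(45, cohort))  # about 1 if 45 fractures are observed
```

A ratio well above or below 1 would point to an error in the underlying fracture or death hazards rather than in the risk factor weights, which is why the quality of the national epidemiology matters.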

The renouncement of parenthood

Whereas the examples of teething troubles above reflect normal academic debate, there have been instances of more unreasoned assaults on FRAX and its developers by a minority for reasons largely unrelated to the performance of the tool. The first was a series of nihilistic papers by Teppo Jarvinen and colleagues who disputed the view that fractures are mainly caused by osteoporosis, that patients at high risk can be identified and that the risk is amenable to bone-targeted pharmacotherapy [176,177,178]. The first point that the osteoporosis T-score has low sensitivity is well recognised and was actually the very reason for the development of FRAX. The other two points simply ignore the scientific literature [137, 179,180,181,182], yet the latest article was published by the British Medical Journal [177], thus fulfilling its own agenda of an overdiagnosis and overtreatment theme, despite the advice of its own referees [183]. To confuse risk assessment with intervention thresholds, established by independent national and international bodies, is to suggest that inanimate sphygmomanometers are the direct cause of increased treatment of hypertension. But in these media-driven times, scientific integrity cannot be permitted to get in the way of a good story that fits with editorial policy.

The term fake news has recently come to prominence, though more frequently outside the scientific and medical arena. However, lines can be crossed and ‘truth’ is now open to post hoc interpretation. We believe that an example of such has been promulgated by the WHO that published an editorial and a letter to the New England Journal of Medicine distancing themselves from the FRAX® tool [184, 185]. The editorial was somewhat disingenuous and, indeed, poorly researched. It states that the WHO wishes to make clear that the FRAX® tool to evaluate fracture risks of patients is not a ‘WHO tool’ and has not been developed, endorsed, evaluated or validated by the WHO.

The WHO Collaborating Centre for Metabolic Bone Diseases at Sheffield was established in 1991. Over the subsequent 20 years, a core remit was to develop assessment algorithms in the management of osteoporosis; for example, this was outlined clearly in the proposal for renewal of the Centre by WHO in 2000 [186]. Significant milestones of the Centre included the WHO operational definition of osteoporosis, the WHO guidelines for preclinical evaluation and clinical trials in osteoporosis and the development of FRAX [16, 17, 19, 53, 187,188,189,190]. Regarding the report on FRAX [53], the intention by the WHO was to publish this as a WHO Technical Report, formally commenced by the hosting of a WHO Study Group in 2004. The report was subsequently reviewed by the WHO and (at the expense of the Collaborating Centre) edited by a WHO-recommended editor. The passage to publication was derailed when the WHO asked the Collaborating Centre for an additional $100,000 to publish the report, leading to its subsequent publication via the University of Sheffield. Notwithstanding, the report, reviewed and edited by the WHO, approved the use of the term ‘WHO fracture assessment tool (FRAX)’ [53]. Thus, the editorial statement that ‘WHO has not authorized the use of the WHO name, acronym or emblem in connection with the FRAX® tool’, is somewhat economical with the facts. The editorial further stated: ‘it should be clear that any treatment recommendations integrated within the FRAX® tool have not been evaluated by WHO’s Guidelines Review Committee’, neglecting to mention that this committee, to the best of our knowledge, did not exist until after publication of the technical report. Moreover, no intervention thresholds were included in the WHO report and none, therefore, required approval.

In addition to the ill-researched editorial, the WHO asked the University of Sheffield to delete all references to the WHO on the FRAX website (with which it obligingly complied) and to write an erratum for all papers that mention the WHO Collaborating Centre after the year 2010 (even where the work was that of the Collaborating Centre) [191]. The summary of the process underlying the development of FRAX is still available on the WHO website [187] and contains the statement that ‘the performance characteristics of the FRAX model (without BMD) are at least as good as those provided by peripheral assessment of bone mineral density’. The question that then arises is what or who inappropriately motivated the WHO to undermine the FRAX tool, the scientists that developed the tool and the field of osteoporosis?

The WHO has done much to advance the awareness and management of osteoporosis and its ensuing burden of fractures. It has provided a definition of osteoporosis (and updated this), a position on screening accepted worldwide, guidance for the development of interventions and most recently the development of FRAX. It is a pity that the WHO does not wish to continue to take credit for one of the most significant advances in osteoporosis, a position that it had not questioned for 6 years after the launch of FRAX in 2008. It is even more disappointing that a department within the WHO appears to have combined a legally correct request for correction of affiliation status with a low-quality and jaundiced review of the situation, that has tarnished the reputation of the authors of FRAX and the WHO itself. Sadly, like many contemporaries, elements within and external to the WHO appear to be entering the post-truth world where each side has their own reality and the truth has little purchase [192].

Conclusion

Since its launch, FRAX has been shown to be a robust assessment tool, subsequently approved by the National Institute for Health and Care Excellence, UK, and the Food and Drug Administration (https://www.itnonline.com/content/fda-clears-hologic-bone-densitometer-risk-calculator) [193]. The continued interest in FRAX in terms of publications per year, uptake in national and international guidelines and its potential to be used for screening suggests that FRAX has truly come of age. Indeed, it is now established as the standard global tool for fracture risk assessment, accessible by countless healthcare professionals internationally to use in their daily interactions with patients. The future of FRAX offers exciting possibilities to further refine risk prediction, and for integration with clinical systems to optimise identification of those at highest fracture risk. Such an approach will yield the greatest chance of successfully combating the ever-increasing global burden of osteoporotic fractures over the coming decades.