Introduction

Osteoporosis is a bone disorder characterized by low bone mass and density, resulting in skeletal fragility and an increased fracture risk in the spine, hip, wrist, pelvis and humerus [1,2,3]. Bone properties associated with predicting fracture risk include bone mineral density (BMD) and bone turnover markers [4,5,6,7].

Globally, it is estimated that over 200 million people currently have osteoporosis [8], with one in three women and one in five men age 50 or older presenting with osteoporotic fractures [9]. Postmenopausal women are at an increased risk of developing primary osteoporosis due to reduced estrogen levels [10, 11].

Osteoporotic fractures, also called fragility fractures, are bone injuries that occur from low-energy trauma such as falls from standing height [12,13,14]. These types of fractures result in reduced health-related quality of life (HRQoL), increased disability and increased mortality [11, 15, 16]. Vertebral fractures are specifically associated with an increased mortality rate of 10–20% in people with osteoporosis [3, 16].

Antiresorptive agents are often used to prevent osteoporosis and osteoporotic fractures by increasing bone mass and density [17]. Antiresorptive treatments include bisphosphonates, such as alendronate, ibandronate, risedronate and zoledronate; selective estrogen receptor modulators (SERMs), such as bazedoxifene and raloxifene; and denosumab, a monoclonal antibody against receptor activator of nuclear factor kappa-B ligand (RANK-L).

Denosumab (Prolia®) is regularly used in clinical practice across developed countries to treat osteoporosis in postmenopausal women and is often reimbursed through health insurance [18,19,20,21]. In 2017, pharmacovigilance reports warned that discontinuation of denosumab therapy in patients with osteoporosis can lead to increased rates of bone turnover, significant bone mineral loss (in some cases below baseline levels) and increased vertebral fracture risk [22]. Such complications have not been observed after the discontinuation of other osteoporosis therapies (i.e. bisphosphonates, SERMs) due to differences in their mode of action.

The aim of this study was to evaluate the clinical effectiveness and safety (using randomized controlled trials [RCTs]) of denosumab, compared to bisphosphonates and SERMs, for the treatment of postmenopausal women with osteoporosis in developed countries.

This was achieved by conducting a systematic review and network meta-analyses of all available evidence on denosumab, bisphosphonates and SERMs in postmenopausal women with osteoporosis.

Method

This systematic review followed an a priori protocol and is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement for network meta-analysis [23]. A completed PRISMA network meta-analysis checklist is detailed in the Supplementary document. Similarly, the study protocol is available from the authors upon request.

Search Strategy

Two systematic literature searches were conducted in three databases (PubMed, Embase, The Cochrane Library) from inception to 27 April 2022 (Supplementary Tables S1 & S2). The first section of the search was sensitive and identified the literature relevant to denosumab in people with osteoporosis. The second section of the search was more specific and was combined with a methodological filter to limit the identified literature associated with osteoporotic patients on bisphosphonates or SERMs to RCTs. In order to capture any additional evidence that may otherwise not have been identified, grey literature searches of specialty websites were conducted (Supplementary Tables S4 & S5), clinical trial registries were reviewed (Supplementary Table S3), and the reference lists of included publications were pearled by authors. In addition, clinical experts (i.e. oncologists, rheumatologists and endocrinologists) were consulted.

Study Selection

Studies were considered for inclusion if they were RCTs and met the predetermined eligibility criteria (complete criteria are available in Table 1 & Supplementary Table S6). For the purposes of this review, developed countries were defined as World Health Organization (WHO)-Mortality Stratum A countries. This was implemented to ensure that the included RCTs had comparable causes of death and burden of disease [24].

Table 1 Study selection criteria

The search results were screened by title and abstract against predetermined inclusion criteria by three reviewers. Inter-rater reliability was checked via discordance among reviewers on a test sample (k = 200); the calculated Fleiss kappa score was high (kappa = 0.872) [25]. All articles deemed potentially relevant were then reviewed by full text by two reviewers, independently. Conflicts between reviewers on study inclusion were settled via consensus. If consensus could not be reached, a third reviewer decided whether to include or exclude the citation.
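The agreement statistic mentioned above can be illustrated with a minimal sketch of Fleiss' kappa for a fixed number of raters per item. The function name and the toy screening counts are hypothetical and are not the review's data; the sketch only shows how the statistic is formed from per-item category counts.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa, where counts[i][j] is the number of raters
    who assigned item i to category j (e.g. include / exclude)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating is in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 4 citations, 3 reviewers, columns = (include, exclude).
toy_counts = [[3, 0], [2, 1], [1, 2], [0, 3]]
toy_kappa = fleiss_kappa(toy_counts)
```

Values close to 1 indicate near-complete agreement beyond chance, which is the sense in which the reported kappa of 0.872 is described as high.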

Data Extraction

One reviewer independently extracted data into a standardized template, which was then checked against the original study record by a second reviewer. Disagreements were settled by discussion or utilization of a third reviewer. Data of interest included trial information, demographic information, intervention and comparator, outcomes of interest and any other noteworthy features. Both intention-to-treat (preferentially utilized) and per-protocol information, as well as adjusted and unadjusted (preferentially utilized) results were extracted for data analysis. For studies that reported outcomes graphically, WebPlotDigitizer was used to estimate numerical values [28].

Assessment of Quality of Evidence

The quality of RCTs was evaluated using Cochrane Risk of Bias 2.0 (RoB 2.0) [29]. The appraisal was performed by one reviewer and checked by a second reviewer. Any differences were settled via consensus. If consensus could not be reached, a third reviewer was consulted. Risk of bias (RoB) was assessed on a per-outcome basis for clinical effectiveness and safety outcomes. However, for the RoB 2.0 domains of randomization and blinding of personnel/participants, the effectiveness and safety outcomes were evaluated together.

Data Analysis

Network Meta-Analyses

A network meta-analysis was performed if data were available for three treatment arms across a minimum of two trials. The network meta-analyses were performed using Bayesian inference under the assumption of consistency, using a random-effects model. Random-effects models were used in order to account for any variation in possible effect modifiers (e.g. compliance, age, baseline fractures) as well as for discrepancies in how the intervention and comparators were delivered in the included trials. The referent comparator for each model was placebo, as most of the available direct evidence reported treatment effects relative to placebo. Default and non-informative priors with standard normal distribution and sufficiently wide standard deviations (SD) were used to compute the posterior distribution data, as it was computationally feasible [30, 31].
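For orientation, a random-effects consistency model for a dichotomous outcome is commonly written as below. This is the standard textbook formulation, shown as a sketch under assumed notation rather than the exact specification fitted in this review; the symbols are introduced here for illustration only, the priors are not reproduced, and the correlation adjustment for multi-arm trials is omitted.

```latex
\begin{align}
r_{ik} &\sim \mathrm{Binomial}\left(n_{ik},\, p_{ik}\right) \\
\operatorname{logit}\left(p_{ik}\right) &= \mu_i + \delta_{i,bk}, \qquad \delta_{i,bb} = 0 \\
\delta_{i,bk} &\sim \mathcal{N}\left(d_{bk},\, \sigma^2\right) \\
d_{bk} &= d_{1k} - d_{1b}, \qquad d_{11} = 0
\end{align}
```

Here $r_{ik}$ and $n_{ik}$ are the events and sample size in the arm of trial $i$ receiving treatment $k$, $b$ is the baseline treatment of trial $i$, $\mu_i$ is the trial-specific baseline effect, $\delta_{i,bk}$ is the trial-specific effect of $k$ relative to $b$, $\sigma$ is the between-trial SD, and treatment 1 is the referent (placebo). The last line is the consistency assumption: every relative effect is expressed through the effects relative to the referent.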

The duration of follow-up extracted and used in the network meta-analysis for the dichotomous outcomes of vertebral fractures, nonvertebral fractures, mortality, adverse events (AEs), serious AEs (SAEs) and withdrawals due to AEs was the total number of events at the last reported timepoint. Risk ratios (RR) and accompanying 95% credible intervals (95% CrI) were calculated using a logit link function and a binomial family distribution. With regard to continuous outcomes, mean percentage change for BMD was extracted for each reported timepoint. In situations where the mean percentage change was not provided (e.g. T-score, g/cm², nmol), it was imputed from the measurements at baseline and at each timepoint. The mean follow-up period (± 1 SD of the follow-up period) from all included trials was calculated, and results at this timepoint were used in the analyses. Mean differences (MD) and accompanying 95% CrI were calculated using an identity link function and a normal family distribution [32]. Missing values were imputed using methods and formulae detailed in the Cochrane Handbook [33,34,35].
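The two effect measures described above can be illustrated with simple arithmetic. The sketch below is a crude, single-trial calculation with hypothetical function names and numbers; it assumes a positive-valued measurement scale such as g/cm², and it does not reproduce the Bayesian model, the SD imputation, or the Cochrane Handbook formulae used in the review.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline, assuming a positive baseline
    (e.g. BMD in g/cm^2); negative scales such as T-scores need care."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a percentage change")
    return 100.0 * (follow_up - baseline) / baseline


def risk_ratio(events_trt: int, n_trt: int, events_ctl: int, n_ctl: int) -> float:
    """Crude risk ratio of treatment versus control from event counts."""
    if n_trt <= 0 or n_ctl <= 0 or events_ctl == 0:
        raise ValueError("need positive sample sizes and a non-zero control risk")
    return (events_trt / n_trt) / (events_ctl / n_ctl)


# Hypothetical values, not taken from any included trial.
bmd_change = percent_change(baseline=0.800, follow_up=0.840)
fracture_rr = risk_ratio(events_trt=10, n_trt=100, events_ctl=20, n_ctl=100)
```

An RR below 1 favours the treatment for an undesirable outcome such as fracture, and a positive percentage change indicates a gain in BMD.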

Modelling, Convergence and Output

The Bayesian network meta-analysis was performed in RStudio using the BUGSnet (Bayesian inference Using Gibbs Sampling to conduct network meta-analysis) package [32, 36, 37].

For dichotomous outcomes, a burn-in of 70,000 iterations of Markov chain Monte Carlo (MCMC) simulation was used, the results of which were discarded. A further 700,000 iterations of MCMC simulation were then run to estimate parameters. To assess the consistency assumption, an inconsistency model using a random-effects model was run. A burn-in of 5,000 iterations of MCMC simulation and an additional 25,000 iterations were run to compare the parameters for the assessment of consistency.

For continuous outcomes, a burn-in of 10,000 iterations of MCMC simulation was used, the results of which were discarded. A further 100,000 iterations of MCMC simulation were then run to estimate parameters. To assess the consistency assumption, an inconsistency model using a random-effects model was run. A burn-in of 5,000 iterations of MCMC simulation and an additional 25,000 iterations were run to compare the parameters for the assessment of consistency.

The Gelman-Rubin statistic as defined in Brooks and Gelman (i.e. potential scale reduction factor [PSRF] between 1 and 1.05) was used to assess whether convergence had been met in both the consistency and inconsistency models [31, 38, 39].
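As an illustration of the convergence check, the sketch below computes the basic Gelman-Rubin PSRF for one parameter from several chains of post-burn-in draws. It is the uncorrected form of the statistic (without the degrees-of-freedom adjustment described by Brooks and Gelman), the function names are hypothetical, and the 1.05 cut-off simply mirrors the threshold stated above.

```python
from statistics import mean, variance
from typing import Sequence


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Basic potential scale reduction factor for a single parameter."""
    n_chains = len(chains)
    n_draws = len(chains[0])
    if n_chains < 2 or n_draws < 2 or any(len(c) != n_draws for c in chains):
        raise ValueError("need at least two chains of equal length (>= 2 draws)")

    # W: average of the within-chain sample variances.
    within = mean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("chains show no within-chain variation")
    # B: between-chain variance of the chain means, scaled by chain length.
    between = n_draws * variance([mean(c) for c in chains])

    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return (pooled / within) ** 0.5


def has_converged(value: float, threshold: float = 1.05) -> bool:
    """True when the PSRF does not exceed the chosen threshold."""
    return value <= threshold
```

Chains that explore the same region give a PSRF near 1, whereas chains stuck around different values inflate the between-chain term and push the statistic well above the threshold.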

The results were presented in forest plots. These forest plots included the pooled treatment effects of each intervention relative to placebo. The plots also presented the treatment ranking within the network as well as the equivalent surface under the cumulative ranking curve (SUCRA).
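To make the SUCRA values easier to interpret, the following minimal sketch computes SUCRA for one treatment from its posterior rank probabilities, expressed on the 0 to 100 scale used in this article. The function name and the example probabilities are hypothetical.

```python
from typing import Sequence


def sucra(rank_probs: Sequence[float]) -> float:
    """SUCRA (0-100) for one treatment.

    rank_probs[j] is the posterior probability that the treatment
    holds rank j + 1, with rank 1 being the best.
    """
    n_treatments = len(rank_probs)
    if n_treatments < 2:
        raise ValueError("SUCRA needs at least two treatments in the network")
    if abs(sum(rank_probs) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")

    cumulative = 0.0
    area = 0.0
    # Sum the cumulative ranking probabilities over ranks 1 .. a - 1.
    for prob in rank_probs[:-1]:
        cumulative += prob
        area += cumulative
    return 100.0 * area / (n_treatments - 1)


# Hypothetical three-treatment network.
always_best = sucra([1.0, 0.0, 0.0])
always_worst = sucra([0.0, 0.0, 1.0])
```

A treatment that is certain to rank first scores 100, one that is certain to rank last scores 0, and a treatment equally likely to hold any rank scores 50.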

The league tables that present treatment effects and 95% CrI for all possible pairs of interventions in each network are available in the Supplementary document.

Network diagrams were drawn to illustrate the geometry of the treatment network in each analysis. The size of each network node is proportional to the sample size of that node, and the thickness of the lines connecting the nodes is proportional to the number of included trials (i.e. direct evidence).
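The quantities behind such a diagram can be sketched as follows: node sizes are the total sample sizes per treatment, and edge weights are the number of trials that compare each pair directly. The function name, the data structure and the trial data are hypothetical and serve only to illustrate the geometry.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def network_geometry(
    trials: List[Dict[str, int]],
) -> Tuple[Counter, Counter]:
    """Return (node_size, edge_trials) for a treatment network.

    Each trial is a dict mapping a treatment label to its arm sample size.
    """
    node_size: Counter = Counter()
    edge_trials: Counter = Counter()
    for arms in trials:
        for treatment, n_patients in arms.items():
            node_size[treatment] += n_patients
        # Every pair of arms within a trial contributes direct evidence.
        for pair in combinations(sorted(arms), 2):
            edge_trials[pair] += 1
    return node_size, edge_trials


# Hypothetical trials, not taken from the included studies.
toy_trials = [
    {"PLB": 100, "DEN": 100},
    {"PLB": 50, "ALN": 60},
    {"DEN": 80, "ALN": 80},
]
toy_nodes, toy_edges = network_geometry(toy_trials)
```

A pair of treatments with no entry in the edge counter is connected only through indirect evidence.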

Assessment of Heterogeneity and Inconsistency

Cochran’s Q statistic was used to derive the conventional I² values to characterize statistical heterogeneity for both continuous and dichotomous outcomes [40, 41]. The results of the heterogeneity assessment are available in the Supplementary document.
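The relationship between the Q statistic and I² can be sketched for a single pairwise comparison as below. The function names and inputs are hypothetical, inverse-variance weights are assumed, and the sketch does not reproduce the network-level calculation reported in the Supplementary document.

```python
from typing import Sequence


def cochran_q(effects: Sequence[float], std_errors: Sequence[float]) -> float:
    """Cochran's Q using inverse-variance weights."""
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need matching effects and standard errors (>= 2 trials)")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q: float, degrees_of_freedom: int) -> float:
    """I-squared as a percentage, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100.0
```

For example, a Q of 20 on 10 degrees of freedom corresponds to an I² of 50%, meaning that about half of the observed variation is attributed to heterogeneity rather than chance.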

Inconsistency at the global level was assessed by reviewing the fit of consistency and inconsistency models using leverage plots, as well as comparing the deviance information criterion (DIC) score for both consistency and inconsistency models [32]. A difference in DIC scores of 0 to 5 between models was considered minimal, a difference of 5 to 10 was substantial and, finally, a difference that was greater than 10 was significant and eliminated the validity of the results of the model with the higher DIC [42]. The presence of local inconsistency was evaluated by a plot that compared the posterior MD of each data point produced by the consistency and inconsistency models [32]. In situations where networks do not have closed loops, a DIC score could not be calculated [43]. Global and local inconsistency tables are available in the Supplementary document.
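The DIC thresholds described above amount to a simple decision rule, sketched below. The function name is hypothetical, and the treatment of differences of exactly 5 or 10 is an assumption, because the stated ranges overlap at those boundaries.

```python
def classify_dic_difference(dic_consistency: float, dic_inconsistency: float) -> str:
    """Label the absolute DIC difference between the two models.

    Assumption: boundary values (exactly 5 or 10) fall in the lower band.
    """
    difference = abs(dic_consistency - dic_inconsistency)
    if difference <= 5:
        return "minimal"
    if difference <= 10:
        return "substantial"
    return "significant"


def preferred_model(dic_consistency: float, dic_inconsistency: float) -> str:
    """Name the model with the lower DIC (ties favour consistency)."""
    if dic_consistency <= dic_inconsistency:
        return "consistency"
    return "inconsistency"
```

Under this rule, a "significant" difference would invalidate the results of whichever model has the higher DIC.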

Meta-Regressions

Meta-regressions were conducted to evaluate whether there was an association between the included antiresorptive treatments and the age of postmenopausal women. The meta-regressions were only conducted if 10 or more trials were included [40].

Sensitivity Analysis

In addition to the main analyses, sensitivity analyses were conducted to review the impact that high and moderate RoB had on the various analyses. This was achieved by rerunning the respective analyses and only including trials that had a low risk of attrition bias, selection bias and reporting bias. The analyses only focused on these three domains, as the risks of performance bias and detection bias in the included trials were low.

Assessment of Publication Bias

Publication bias was assessed using comparison-adjusted funnel plots (Supplementary document) [44]. This method requires a minimum of 10 trials per outcome [45].

Results

The results of the literature search are summarized in Fig. 1. The searches identified 22,979 articles. A total of 4,753 duplicate citations were removed and 18,226 items were reviewed by title and abstract. In total, 1,227 articles were reviewed by full text. A total of 12 RCTs (k = 22 publications) met the study selection criteria (Table 1) [46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67]. Table 2 details the characteristics of the 12 included trials. There was no available RCT evidence that met the predetermined selection criteria (Table 1) to investigate the effect of denosumab on HRQoL, trochanteric (TRO) BMD and fracture risk assessment (FRAX®) in postmenopausal women with osteoporosis (Table 2).

Fig. 1

PRISMA flow diagram. PRISMA preferred reporting items for systematic reviews and meta-analyses, RCTs randomized controlled trials. k number of individual publications. n number of RCTs; an RCT can be reported in multiple publications

Table 2 Characteristics of included RCTs assessing clinical effectiveness and safety

Risk of Bias (RoB)

The RoB 2.0 graph and summary are reported in Fig. 2. Across the five domains, all included RCTs (n = 12) presented a high RoB overall. The specific domain scores for each included trial are described in Supplementary Table S7.

Fig. 2

Risk of bias graph for RCTs assessing clinical effectiveness and safety outcomes combined. D1 randomization process; D2 blinding of participants/personnel; D3a missing outcome data (fracture/safety); D3b missing outcome data (BMD); D4a measurement of the outcome (fracture/safety); D4b measurement of the outcome (BMD); D5a selective reporting (fracture/safety); D5b selective reporting (BMD). BMD bone mineral density; RCTs randomized controlled trials

Effectiveness

Vertebral Fractures

Vertebral fracture data were available from nine RCTs that had a combined sample size of 19,710 (Fig. 3a) [51, 54,55,56,57, 60, 61, 63,64,65]. Neither denosumab nor the other active treatments showed a statistically significant difference from placebo after 12 to 84 months of treatment (Fig. 4). Similarly, none of the pairwise comparisons were statistically significant (Supplementary Table S8). Of these treatments, denosumab had the highest probability (SUCRA = 79.56) of being the most effective at preventing vertebral fractures, whereas bazedoxifene had the lowest probability (SUCRA = 26.09) and was ranked as the least effective active treatment. The network did not show any significant evidence of statistical heterogeneity or local inconsistency (Supplementary Figure S2 & Table S17). Global inconsistency could not be estimated as a DIC score could not be generated (Supplementary Table S18).

Fig. 3

Network diagrams. (a) vertebral fractures; (b) nonvertebral fractures; (c) FN BMD; (d) LS BMD; (e) TH BMD; (f) mortality; (g) AEs; (h) SAEs; (i) withdrawal due to AEs. AE adverse events; ALN alendronate; BAZ bazedoxifene; BMD bone mineral density; DEN denosumab; FN femoral neck; IBN ibandronate; LS lumbar spine; PLB placebo; RIS risedronate; RLX raloxifene; SAE serious adverse events; SERM selective estrogen receptor modulators; TH total hip; ZOL zoledronate

Fig. 4

Forest plot indicating the RR of vertebral fractures (relative to placebo). BAZ bazedoxifene; CrI credible interval; DEN denosumab; PLB placebo; RIS risedronate; RLX raloxifene; RR risk ratio; SUCRA surface under the cumulative ranking curve; ZOL zoledronate. Credible interval (CrI): interval within which RR values will fall with a specific probability; can be interpreted as a confidence interval [71]. Surface under the cumulative ranking curve (SUCRA): probability that a specific treatment is among the most effective options (i.e. ‘best’) in the network. A SUCRA value of 100% suggests that the treatment is the most effective treatment included in the network; a value of 0% suggests that the included treatment is the least effective treatment in the network [72]. Rank: position of treatment hierarchy within the network based on the SUCRA score, with 1 representing the most effective treatment

Nonvertebral Fractures

Nonvertebral fracture data were available from seven RCTs that had a combined sample size of 21,873 (Fig. 3b) [47, 51, 55, 56, 60, 64, 67]. Compared to all treatments, denosumab was not associated with statistically significant changes. In addition, only the pairwise comparisons between alendronate and placebo were statistically significant (Supplementary Table S9). Denosumab had the lowest probability (SUCRA = 29.51) and was ranked as the least effective treatment in the network (Fig. 4). Risedronate had the highest probability (SUCRA = 95.69) of being the most effective treatment in the network. The network did not show any significant evidence of statistical heterogeneity or inconsistency (Supplementary Figure S3, Table S19 & Table S20). In addition, sensitivity analysis suggested that results were not impacted by reporting bias (Supplementary Table S35).

Femoral Neck (FN) BMD

Data on BMD measured at the femoral neck (FN) were available from eight RCTs that had a combined sample size of 12,128 (Fig. 3c) [47, 54,55,56, 58, 59, 62, 64]. Relative to denosumab, at 19 (± 1 SD) months, alendronate, ibandronate and risedronate resulted in significant improvements in FN BMD of 11.47% (MD 11.47; 95% CrI 1.39, 21.96), 11.02% (MD 11.02; 95% CrI 0.82, 21.37) and 9.67% (MD 9.67; 95% CrI 0.88, 18.72), respectively (Supplementary Table S10). Of these treatments, alendronate had the highest probability (SUCRA = 94.05) of being the most effective treatment, whereas denosumab had a lower probability (SUCRA = 52.31) and was ranked as the fourth most effective treatment (Fig. 5). Overall, there was low total heterogeneity within the network (Supplementary Table S22). There was no evidence of local or global inconsistency in the network (Supplementary Figure S4 & Table S21).

Fig. 5

Forest plot indicating the mean percentage difference in BMD (relative to placebo). ALN alendronate; BAZ bazedoxifene; BMD bone mineral density; CrI credible interval; DEN denosumab; FN femoral neck; IBN ibandronate; LS lumbar spine; MD mean difference; PLB placebo; RIS risedronate; RLX raloxifene; SD standard deviation; SUCRA surface under the cumulative ranking curve; TH total hip; ZOL zoledronate. Credible interval (CrI): interval within which MD values will fall with a specific probability. A credible interval can be interpreted as a confidence interval [71]. Surface under the cumulative ranking curve (SUCRA): probability that a specific treatment is among the most effective options (i.e. ‘best’) in the network. A SUCRA value of 100% suggests that the treatment is the most effective treatment included in the network; a SUCRA value of 0% suggests that the included treatment is the least effective treatment in the network [72]. Rank: position of treatment hierarchy within the network based on the SUCRA score, with 1 representing the most effective treatment. It is difficult to determine whether any of the statistically significant results are also clinically significant, as there is no verified scale that associates an increase in BMD with a decrease in the risk of vertebral or nonvertebral fractures [73,74,75]

Lumbar Spine (LS) BMD

Data on BMD measured at the lumbar spine (LS) were available from nine RCTs that had a combined sample size of 10,092 (Fig. 3d) [47, 51, 54,55,56, 62, 64, 67]. Figure 5 indicates that denosumab (compared to placebo) can improve LS BMD by 7.67% (MD 7.67; 95% CrI 3.11, 12.22) at 20 (± 1 SD) months. Relative to denosumab, alendronate, ibandronate and risedronate resulted in significant improvements in LS BMD of 13.32% (MD 13.32; 95% CrI 4.90, 21.64), 13.16% (MD 13.16; 95% CrI 4.75, 21.50) and 9.52% (MD 9.52; 95% CrI 2.66, 16.37), respectively (Supplementary Table S11). Alendronate had the highest probability (SUCRA = 94.57) of being the most effective treatment in the network, and denosumab (SUCRA = 55.34) was ranked as the fourth most effective treatment. The entire network showed substantial to considerable total heterogeneity (Supplementary Table S24). However, the network arm that compared placebo to denosumab presented low heterogeneity. There was no evidence of inconsistency in the network (Supplementary Figure S5 & Table S23).

Total Hip (TH) BMD

Data on total hip (TH) BMD were available from seven RCTs that had a combined sample size of 13,666 (Fig. 3e) [47, 51, 55, 56, 59, 61, 67]. Denosumab (MD 4.55; 95% CrI 3.08, 6.05) significantly improved TH BMD after 19 (± 1 SD) months of treatment, relative to placebo (Fig. 5). Furthermore, denosumab resulted in a 2.67% (MD 2.67; 95% CrI 0.05, 5.23) and 3.07% (MD 3.07; 95% CrI 0.75, 5.21) improvement in TH BMD compared to raloxifene and bazedoxifene, respectively (Supplementary Table S12). Denosumab had the highest probability (SUCRA = 89.01) of being the most effective treatment at increasing TH BMD, whereas bazedoxifene had the lowest probability (SUCRA = 31.46) and was ranked as the least effective active treatment. There was moderate total heterogeneity within the network (Supplementary Table S26). Moreover, there was no evidence of local or global inconsistency (Supplementary Figure S6 & Table S25).

Safety

Mortality

Mortality data were available from seven RCTs that had a combined sample size of 26,882 (Fig. 3f) [47, 49, 51,52,53, 55, 56, 60, 66]. Relative to all included treatments, denosumab was not associated with statistically significant changes in mortality after 12 to 60 months. Similarly, none of the pairwise comparisons were statistically significant (Supplementary Table S13). Denosumab had the highest probability (SUCRA = 75.91) of having a favorable mortality profile, noting the difference in ranking between denosumab and raloxifene was negligible (Fig. 6). Overall, the network presented moderate evidence of statistical heterogeneity (Supplementary Table S28). There was no evidence of local inconsistency in the network (Supplementary Figure S7). Global inconsistency could not be estimated (Supplementary Table S27).

Fig. 6

Forest plot indicating the RR of safety outcomes (relative to placebo). ALN alendronate; BAZ bazedoxifene; CrI credible interval; DEN denosumab; IBN ibandronate; PLB placebo; RIS risedronate; RLX raloxifene; RR risk ratio; SUCRA surface under the cumulative ranking curve; ZOL zoledronate. Credible interval (CrI): interval within which RR values will fall with a specific probability. A credible interval can be interpreted as a confidence interval [71]. Surface under the cumulative ranking curve (SUCRA): probability that a specific treatment is among the most effective options (i.e. ‘best’) in the network. A SUCRA value of 100% suggests that the treatment is the most effective treatment included in the network; a SUCRA value of 0% suggests that the included treatment is the least effective treatment in the network [72]. Rank: position of treatment hierarchy within the network based on the SUCRA score, with 1 representing the most effective treatment.

AE

AE data were available from 12 RCTs that had a combined sample size of 24,481 (Fig. 3g) [47, 49, 51, 55, 56, 58,59,60,61,62, 64]. Compared to all treatments, denosumab was not associated with statistically significant improvements within 12 to 36 months of treatment. Furthermore, none of the pairwise comparisons were statistically significant (Supplementary Table S14). Risedronate had the highest probability (SUCRA = 65.33) of having a favorable AE profile, with denosumab ranked as having the second highest probability (SUCRA = 59.70), noting that there were no differences in the reported relative effects across interventions (Fig. 6). The network did not show any significant evidence of statistical heterogeneity (Supplementary Table S30) or inconsistency (Supplementary Figure S8 & Table S29).

SAE

SAE data were available from 11 RCTs that had a combined sample size of 24,400 (Fig. 3h) [47, 49, 51, 55, 56, 58,59,60,61,62]. After 12 to 36 months of treatment, compared to all treatments, denosumab was not associated with statistically significant improvements. None of the pairwise comparisons were statistically significant (Supplementary Table S15). Raloxifene had the highest probability (SUCRA = 69.60) of having a favorable SAE profile (Fig. 6). Meanwhile, denosumab was ranked fifth (SUCRA = 54.56). The network did not show any significant evidence of statistical heterogeneity or inconsistency (Supplementary Figure S9, Table S31 & Table S32). Furthermore, sensitivity analysis suggested that results were not impacted by reporting bias (Supplementary Table S35).

Study Withdrawal Due to Treatment-Related AEs

Data on study withdrawals due to treatment-related AEs were available from six RCTs that had a combined sample size of 22,254 (Fig. 3i) [46, 47, 49, 56, 59, 60]. After a treatment regimen of 12 to 36 months, compared to all treatments, denosumab was not associated with statistically significant improvements (Fig. 6). None of the pairwise comparisons were statistically significant (Supplementary Table S16). Zoledronate had the highest probability (SUCRA = 57.46) of having the fewest withdrawals due to treatment-related AEs (Fig. 6), with denosumab ranked a close second with a comparably high probability (SUCRA = 57.19). The network did not show any significant evidence of statistical heterogeneity or inconsistency (Supplementary Figure S10, Table S33 & Table S34). Moreover, the sensitivity analysis suggested that results were not impacted by reporting bias (Supplementary Table S35).

AE Upon Denosumab Discontinuation

Data on AEs upon denosumab discontinuation were only available from a single RCT that compared denosumab to placebo for vertebral and nonvertebral fractures [48, 50]. The sample size was 7,808 at baseline, 1,471 at 4.2 months after a loss of denosumab treatment effect and 797 at 6 months (median) [48, 50]. Loss of the denosumab treatment effect occurs approximately 6 months after the last dose.

Overall, there was likely no significant change in vertebral (Fig. 7a) and nonvertebral fracture (Fig. 7b) rates between denosumab and placebo at 4.2 and 6 months after the loss of the denosumab treatment effect, noting the evidence for this outcome is highly uncertain.

Fig. 7

Forest plots indicating RR of fracture outcomes after denosumab discontinuation (relative to placebo). (a) vertebral fractures; (b) nonvertebral fractures. CI confidence interval; DEN denosumab; PLB placebo; RR risk ratio. Loss of effect defined as 6 months post-last dose of denosumab. An additional 1-month study visit window was also reported [48, 50]

Discussion

A comprehensive literature search identified 12 RCTs (k = 22 publications) comparing denosumab to bisphosphonates (alendronate, ibandronate, risedronate, zoledronate), SERMs (bazedoxifene, raloxifene) and placebo in postmenopausal women.

No intervention differed significantly from placebo for vertebral fractures, mortality, AEs, SAEs or withdrawals due to treatment-related AEs. In contrast, risedronate showed a statistically significant effect in preventing nonvertebral fractures. Denosumab significantly improved BMD compared to placebo when measured at the LS and TH. It is important to note that it is difficult to determine whether any of the statistically significant BMD results are also clinically significant, as there is no verified scale that associates an increase in BMD with a decrease in the risk of vertebral or nonvertebral fractures [73,74,75].

Network meta-analyses could not be conducted on the published data available on AEs upon denosumab discontinuation. There appears to be no significant change in vertebral and nonvertebral fracture rates between baseline and 4.2 to 6 months after denosumab discontinuation (loss of effect). It is important to note, however, that there were significant losses to follow-up at both timepoints in each treatment, and the results are from a single study. As such, the results presented are subject to considerable uncertainty and should be interpreted with caution.

The sensitivity analyses conducted to evaluate the impact of reporting bias on the results were consistent with the main analyses for fractures (vertebral and nonvertebral), mortality, AEs, SAEs and withdrawal due to treatment-related AEs. The impact of reporting bias on the main analysis could not be assessed for BMD at LS and FN, as none of the included trials posed a low RoB in this category. Similarly, the effects of attrition bias and selection bias on the main analyses could not be assessed for the included outcomes, as none of the trials presented a low bias in either of these categories. Sensitivity analyses could not be conducted to determine the impact of bias on AEs associated with denosumab discontinuation, as each pairwise meta-analysis only included a single trial.

There was low to moderate heterogeneity for all but one of the network meta-analyses conducted. The network meta-analysis conducted on LS BMD showed substantial to considerable heterogeneity. There was no evidence of strong inconsistency in any of the meta-analyses conducted.

A single meta-regression found that there was a slight association between the effectiveness of denosumab and postmenopausal age. The regression indicated that denosumab was slightly less effective at preventing SAEs in older postmenopausal women (aged 75 to 85). Additional meta-regressions on the remaining outcomes could not be conducted as each network included fewer than 10 trials.

No publication bias was identified in the network meta-analyses that were conducted on AEs and SAEs (Supplementary Figures S11 & S12). Publication bias could only be assessed in these two outcomes as they were the only network meta-analyses that included the minimum 10 trials required.

In general, the findings of this review complement a previously reported network meta-analysis by Simpson et al. 2020 [76]. The direction of treatment effect when comparing denosumab to placebo for vertebral fractures and BMD measured at the FN is generally in accordance with the findings of Simpson et al. 2020 [76]. In contrast, this network meta-analysis utilized RR and a random-effects model to assess dichotomous outcomes, while Simpson et al. 2020 utilized hazard ratios [76]. This resulted in the analysis by Simpson et al. 2020 not being able to account for the timing of treatment benefits post-fracture [76].

There are several key limitations of this review. First, many of the meta-analyses suffer from statistical imprecision due to small sample sizes in certain treatment arms. Small sample sizes can result in wide uncertainty margins that do not accurately reflect the true effect of the treatment (compared to placebo). Second, it is still unclear how heterogeneity and inconsistency affect the findings of a network meta-analysis. Mills et al. 2013 reported that it is unclear how moderate to considerable levels of heterogeneity and inconsistency may impact the reliability of results (generated from both direct and indirect evidence) [41].

It will be of great interest to conduct further research to address the effects of denosumab on HRQoL, FRAX®, TRO BMD and AEs upon discontinuation in postmenopausal women with osteoporosis within WHO-Mortality Stratum A countries. Due to the lack of applicable evidence for these outcomes, an evidence-based decision cannot be made to inform policy decisions in developed countries.

Conclusion

In conclusion, denosumab had a varying impact on clinical outcomes in postmenopausal women with osteoporosis. Relative to placebo, denosumab was associated with significant improvements in BMD measured at both the LS and TH. Denosumab also resulted in an improvement in TH BMD compared to raloxifene and bazedoxifene. However, relative to denosumab, alendronate, ibandronate and risedronate resulted in significant improvements in LS BMD. Moreover, compared to all treatments, denosumab was not associated with a statistically significant improvement in fractures (vertebral and nonvertebral). Finally, regarding safety outcomes, relative to all included treatments (i.e. SERMs, placebo, bisphosphonates), denosumab was not associated with statistically significant changes.