Introduction

Patients with hormone-sensitive cancers receiving endocrine therapy are at an increased risk of secondary osteoporosis [1,2,3]. Secondary osteoporosis is a common bone disorder that is characterized by a decrease in bone mineral density (BMD) and a deterioration of bone microarchitecture, which results in an increased risk of fracture [4,5,6,7]. Unlike primary osteoporosis, secondary osteoporosis is caused by lifestyle factors (e.g., smoking), pharmaceuticals (e.g., corticosteroids, adjuvant aromatase inhibitor therapy [AAIT], hormone ablation therapy [HAT]), or medical conditions (e.g., hypogonadism) [1, 6,7,8,9].

Hormone-sensitive cancer patients on endocrine therapy who develop osteoporosis are treated with antiresorptive agents. These agents aim to prevent BMD loss and fractures by decreasing bone resorption and subsequently increasing bone mass [10]. Common antiresorptive agents used in patients with hormone-sensitive cancer include receptor activator of nuclear factor kappa-B ligand (RANKL) inhibitors such as denosumab (Prolia®); bisphosphonates such as zoledronate, ibandronate, alendronate, and risedronate; and selective estrogen receptor modulators (SERMs) such as raloxifene and bazedoxifene.

Denosumab can be used to treat secondary osteoporosis in two subpopulations of patients with hormone-sensitive cancers receiving endocrine therapy [1,2,3]. These two subpopulations are men with castration-sensitive prostate cancer receiving HAT in the form of androgen deprivation therapy and women with estrogen or progesterone-sensitive breast cancer on AAIT [1,2,3].

Denosumab is regularly used in clinical practice and is often subsidized in developed countries for men with prostate cancer (MPC) on HAT and women with breast cancer (WBC) on AAIT without restriction [11,12,13]. However, pharmacovigilance reports in 2017 indicated that denosumab discontinuation may lead to increased rates of BMD loss, increased bone turnover, and increased fracture risk [14]. Complications after treatment discontinuation have not been observed with other antiresorptive agents (i.e., bisphosphonates, SERMs).

Therefore, the aim of this study is to assess the clinical effectiveness and safety of denosumab compared to bisphosphonates (alendronate, ibandronate, risedronate, zoledronate), SERMs (bazedoxifene, raloxifene), and placebo for the treatment of secondary osteoporosis in MPC on HAT and WBC on AAIT.

Method

This systematic review is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and followed an a priori protocol [15].

Literature searches

Two systematic literature searches were conducted in three databases (Embase, PubMed, The Cochrane Library) from inception to 27 April 2022 (full search strings are provided in Supplementary Table S1). The first search identified literature relevant to denosumab in patients with hormone-sensitive cancers receiving endocrine therapy. The second search identified literature relevant to patients with hormone-sensitive cancers receiving endocrine therapy who were treated with either bisphosphonates or SERMs. This search string was combined with a methodological filter to limit the identified literature to randomized controlled trials (RCTs) [16].

Study selection

Studies were included if they met the predetermined selection criteria (Supplementary Tables S1 and S2). For the purposes of this review, developed countries were defined as World Health Organization (WHO)-Mortality Stratum A countries [17]. This also ensured that the populations had comparable causes of death and burden of disease [17].

Three reviewers screened the search results by title and abstract against the predetermined selection criteria; each screened approximately one-third of the total number of search results. The inter-rater reliability between the reviewers was established on a training sample (Fleiss kappa = 0.872). Two independent reviewers selected relevant articles via full-text review. Conflicts between the reviewers were settled via consensus. A third reviewer was consulted in situations where consensus could not be reached.
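The inter-rater statistic reported above can be illustrated with a minimal sketch of Fleiss' kappa. The function name and the include/exclude rating table below are hypothetical and are not the training sample used in this review; the sketch only shows how the statistic is formed from per-item rating counts.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical screening decisions: 3 raters, columns = (include, exclude).
example = [[3, 0], [2, 1], [0, 3], [1, 2]]
kappa = fleiss_kappa(example)
```

Values close to 1 indicate near-complete agreement between reviewers beyond what chance alone would produce.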

Data extraction

Data were independently extracted on a trial-arm level into a standardized template by a single reviewer. A second reviewer then checked the extracted data against the original study record. Disagreements were settled via consensus or through the utilization of a third reviewer.

The data extracted from the RCTs included trial information, intervention, outcomes of interest, demographic information, comparator, and other important trial features or information. WebPlotDigitizer was used to transform trial outcomes that were reported graphically into numerical values [18]. Per-protocol information, intention-to-treat (ITT), and adjusted and unadjusted results were extracted. ITT and unadjusted results were preferentially utilized in the data analysis.

Assessment of quality of evidence

Included RCTs were critically appraised using the Cochrane Risk of Bias 2.0 tool (RoB 2.0) [19]. The appraisals were conducted by a single reviewer and independently checked by a second reviewer. Any differences between the two reviewers were settled via consensus. In situations where consensus could not be reached, a third independent reviewer was consulted. Clinical effectiveness and safety outcomes were critically appraised on a per-outcome basis.

Data analysis

Network meta-analyses

Network meta-analyses were performed using a frequentist approach, with a random-effects model. The model was used to account for any variations in potential effect modifiers (e.g., age, baseline fractures), population-based factors, discrepancies in the delivery of the intervention or comparators, and the variances in the end-of-treatment timepoints. The referent comparator for each of the individual models was a placebo.

Dichotomous outcomes (i.e., vertebral fractures, nonvertebral fractures, mortality, treatment-related adverse events [AEs], serious adverse events [SAEs], and withdrawal due to treatment-related AEs) were analyzed based on the total number of events at the longest duration of follow-up. Results were reported as risk ratios (RR) with corresponding 95% confidence intervals (95% CI). Continuous outcomes (BMD, health-related quality of life [HRQoL]) were evaluated as mean difference (MD) with 95% CI at the mean follow-up period (±1 standard deviation [SD]) of all trials included in the network. With regard to BMD, the mean percentage change was evaluated. When the mean percentage change (g/cm2, T-score, nmol, etc.) was not reported in a publication, the difference was imputed from the extracted baseline and timepoint measurements.
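The two effect measures described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the analysis: the function names are hypothetical, the percentage change is assumed to be computed from group means at baseline and follow-up, and the confidence interval uses the standard large-sample log-scale formula for a single trial with non-zero event counts.

```python
import math
from typing import Tuple


def percent_change(baseline: float, follow_up: float) -> float:
    """Percentage change from baseline (assumed imputation formula)."""
    return 100.0 * (follow_up - baseline) / baseline


def risk_ratio(
    events_trt: int, n_trt: int, events_ctl: int, n_ctl: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Risk ratio with a Wald-type 95% CI computed on the log scale.

    Requires non-zero event counts in both arms.
    """
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(
        1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical inputs: BMD rising from 0.800 to 0.824 g/cm2, and
# 10/100 events on treatment versus 20/100 on placebo.
bmd_change = percent_change(0.800, 0.824)
rr, ci_low, ci_high = risk_ratio(10, 100, 20, 100)
```

An RR below 1 with an upper confidence limit below 1 corresponds to a statistically significant reduction in risk; 1 minus the RR gives the relative risk reduction.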

Output

The network meta-analysis was performed in RStudio using the netmeta (Network Meta-Analysis using Frequentist Methods) and meta (General Package for Meta-Analysis) packages [20,21,22,23,24].

The results of the network meta-analysis are presented in forest plots, which include (i) the estimated treatment effect and 95% CI for each treatment, relative to placebo, (ii) treatment measure and sample size associated with each treatment, (iii) probability of each treatment being the highest ranking in the network (P-score), and (iv) treatment ranking within the network. A league table that details treatment effects with 95% CI between any pair of interventions (i.e., pairwise comparisons) in a given network is presented in the Supplementary document.

P-scores rank each treatment within a given network. The P-score estimates the extent of certainty that a treatment is superior to its comparators (the closest score to 1 being the “best”) [25]. We have interpreted a P-score of >0.95 as a very high degree of certainty, a P-score of 0.8–0.9 as a high degree of certainty, and a P-score of <0.80 as an unclear degree of certainty. The interpretation is also dependent on the other P-scores in the network (e.g., if the P-scores are close together).
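As a rough illustration of how a P-score is formed, the sketch below averages, for each treatment, the one-sided normal probabilities that it is better than each competitor. The function names and the league-table values are hypothetical; in the analysis itself the ranking was produced by the netmeta package, and this sketch is not a substitute for it.

```python
import math
from typing import Dict, List, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_scores(
    treatments: List[str],
    league: Dict[Tuple[str, str], Tuple[float, float]],
    larger_is_better: bool = True,
) -> Dict[str, float]:
    """P-score per treatment from pairwise network estimates.

    league[(a, b)] holds (estimate of a minus b, standard error).
    Each pair needs to be stored in one orientation only.
    """
    scores = {}
    for a in treatments:
        total = 0.0
        for b in treatments:
            if a == b:
                continue
            if (a, b) in league:
                diff, se = league[(a, b)]
            else:
                diff, se = league[(b, a)]
                diff = -diff  # flip the sign for the reverse comparison
            if not larger_is_better:
                diff = -diff
            # Certainty that treatment a is better than treatment b.
            total += normal_cdf(diff / se)
        scores[a] = total / (len(treatments) - 1)
    return scores


# Hypothetical mean differences in percentage BMD change (larger is better).
league_table = {
    ("A", "PLB"): (3.0, 0.5),
    ("B", "PLB"): (1.0, 0.5),
    ("A", "B"): (2.0, 0.6),
}
scores = p_scores(["A", "B", "PLB"], league_table)
```

Because each pairwise probability and its reverse sum to 1, the P-scores in any network average 0.5, which is one reason a score must be read alongside the other scores in the same network.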

Each league table presents pairwise comparisons generated from both direct (i.e., effect sizes calculated from the evidence that directly compared interventions) and indirect (i.e., effect size imputed from the evidence that did not directly compare interventions) evidence. The league tables presented in the Supplementary document differentiate between direct and indirect evidence.

Network diagrams were drawn to visualize treatment networks. Each node represents a treatment, and the size of each node is proportional to the sample size of that node. Likewise, the thickness of the lines that connect each of the nodes is proportional to the number of trials that provide information on the treatment comparison (i.e., direct evidence) [26, 27].
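The quantities that drive the network diagrams can be sketched as below: a total sample size per node and a count of trials per direct comparison. The function name and the trial data are hypothetical and serve only to illustrate the bookkeeping; the diagrams themselves were drawn with the packages cited above.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def network_summary(
    trials: List[Dict[str, int]],
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], int]]:
    """Summarize a treatment network.

    Each trial is a dict mapping a treatment label to its arm sample size.
    Returns (node sizes, number of trials per direct comparison).
    """
    node_size: Counter = Counter()
    edge_trials: Counter = Counter()
    for arms in trials:
        for treatment, n in arms.items():
            node_size[treatment] += n  # node size: pooled sample size
        for pair in combinations(sorted(arms), 2):
            edge_trials[pair] += 1  # edge thickness: number of trials
    return dict(node_size), dict(edge_trials)


# Hypothetical two-arm trials.
nodes, edges = network_summary(
    [
        {"DEN": 100, "PLB": 100},
        {"ZOL": 50, "PLB": 50},
        {"DEN": 30, "PLB": 30},
    ]
)
```

In this hypothetical network, denosumab and zoledronate are connected only through placebo, so any comparison between them would rest on indirect evidence.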

Meta-regressions were not performed to explore potential effect modifiers as the netmeta package in RStudio does not support this type of analysis [20,21,22,23,24].

Assessment of heterogeneity

Statistical heterogeneity was identified using Cochran’s Q-statistic for heterogeneity (Qhet) as defined by Krahn et al. 2013 [28]. I2 was used to quantify Cochran’s Qhet statistic [29]. All heterogeneity assessments are available in the Supplementary document.
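The relationship between Q and I2 can be illustrated with a minimal sketch for a single comparison. This is the simple inverse-variance form, assumed here for illustration; it does not reproduce the design-based decomposition of Krahn et al. used in the network analyses, and the function name and inputs are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_i2(
    effects: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, int, float]:
    """Cochran's Q, its degrees of freedom, and I2 (in percent)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of trial effects from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of Q in excess of its expected value under homogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical trial-level mean differences with equal standard errors.
q_stat, dof, i_squared = cochran_q_i2([0.0, 2.0], [1.0, 1.0])
```

With these inputs Q is twice its degrees of freedom, so half of the observed variation is attributed to heterogeneity rather than chance.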

Assessment of inconsistency

Global inconsistency was reviewed using Cochran’s Q-statistic for inconsistency (Qinc) as defined by Krahn et al. 2013 [28]. Inconsistency at the local level was assessed by node-splitting [23, 28]. Given that none of the networks had closed loops, Cochran’s Qinc statistic could not be calculated and is therefore not reported [28]. However, summaries of the node-splitting conducted to evaluate local inconsistency were reported (see footnote of the league tables in the Supplementary document).

Pairwise meta-analysis

Where insufficient data were available to conduct a network meta-analysis, a pairwise meta-analysis was conducted if the evidence base included two or more RCTs comparing denosumab to a single comparator of interest (Supplementary Tables S1 and S2).

All pairwise meta-analyses were performed using random-effects models. The Mantel-Haenszel (MH) method was used to estimate primary study weights for dichotomous outcomes, whereas the restricted maximum likelihood (REML) was used for continuous outcomes. Results were reported as RR with 95% CI for each dichotomous outcome, and MD and 95% CI for continuous outcomes.
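For orientation, the Mantel-Haenszel pooled risk ratio can be sketched as below. This shows only the classical fixed-effect MH estimator; the random-effects step applied in this review is not reproduced, and the function name and trial counts are hypothetical.

```python
from typing import Iterable, Tuple


def mantel_haenszel_rr(trials: Iterable[Tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled risk ratio.

    Each trial is (events_trt, n_trt, events_ctl, n_ctl).
    """
    numerator = 0.0
    denominator = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in trials:
        total = n_trt + n_ctl
        numerator += events_trt * n_ctl / total
        denominator += events_ctl * n_trt / total
    return numerator / denominator


# Hypothetical trials; the third has no events in either arm.
pooled_rr = mantel_haenszel_rr(
    [(10, 100, 20, 100), (5, 50, 5, 50), (0, 50, 0, 50)]
)
```

A trial with zero events in both arms adds nothing to either sum, which illustrates why a pooled effect cannot be estimated when only one trial has estimable results.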

Timepoints for each analysis (dichotomous or continuous) were determined using the methods described in the network meta-analysis section.

Output

All pairwise meta-analyses were performed in RStudio using the meta (general package for meta-analysis) package [20,21,22]. In addition, the meta-analysis results are presented in forest plots that detail the sample size, estimated treatment effect, and 95% CI for each included trial.

Assessment of heterogeneity and inconsistency

The statistical methods used to measure heterogeneity in meta-analyses of continuous and dichotomous outcomes were Tau2 and I2. The I2 was interpreted in accordance with the Cochrane Handbook for Systematic Reviews of Interventions (version 6.1) [30], whereby an I2 of 0–40% represents low heterogeneity, 30–60% moderate heterogeneity, 50–90% substantial heterogeneity, and 75–100% considerable heterogeneity [30]. Heterogeneity was also visually evaluated using forest plots, as the graphs illustrated the reported effect sizes and uncertainty across included trials.
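Because the interpretation bands above overlap, a single I2 value can fall into two categories. The short sketch below, with hypothetical names, returns every band that contains a given value.

```python
from typing import List

# Overlapping bands as listed in the text above (percent).
I2_BANDS = [
    (0, 40, "low"),
    (30, 60, "moderate"),
    (50, 90, "substantial"),
    (75, 100, "considerable"),
]


def interpret_i2(i2: float) -> List[str]:
    """Return every heterogeneity label whose band contains i2."""
    if not 0 <= i2 <= 100:
        raise ValueError("I2 must lie between 0 and 100")
    return [label for low, high, label in I2_BANDS if low <= i2 <= high]


labels_40 = interpret_i2(40)
labels_42 = interpret_i2(42)
```

Under these bands an I2 of 40% reads as low to moderate, whereas 42% reads as moderate only.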

Sensitivity analysis

Sensitivity analyses were conducted to review the impact that high and moderate risk of bias (RoB) had on the network meta-analyses. Separate analyses were conducted on trials that had a low risk of attrition, selection, or reporting bias. The analyses focused only on these three RoB domains, as the risk of performance and detection bias in the included studies was low. Results can be found in the Supplementary document.

Assessment of publication bias

Publication bias in the network meta-analyses was assessed using comparison-adjusted funnel plots and these are presented in the Supplementary document [31]. The method requires a minimum of 10 trials per outcome [32].

Results

Search results

The results of the systematic searches are detailed in Supplementary Fig. S1. The database searches retrieved 22,979 articles. A total of 18,266 items were screened by title and abstract after the removal of duplicate citations. Overall, 1227 publications were reviewed in full text.

A total of 14 RCTs (k=15 publications) met the predetermined selection criteria (Supplementary Table S1). Of these, 10 RCTs (k=10 publications) met the inclusion criteria for MPC on HAT, whereas 4 RCTs (k=5 publications) met the inclusion criteria for WBC on AAIT. The characteristics of all included RCTs are detailed in Table 1. There was no available RCT evidence that reported the effect of denosumab on HRQoL, fracture risk assessment (FRAX®), and AEs upon discontinuation of denosumab (Table 1) in patients with hormone-sensitive cancer receiving endocrine therapy.

Table 1 Characteristics of included RCTs assessing clinical effectiveness and safety

Risk of bias (RoB)

The RoB 2.0 summary graph is reported in Supplementary Fig. S2. In total, 86% (12 of 14) of the included trials presented a high overall RoB judgement. Moreover, Greenspan et al. 2007b did not provide information for fracture and safety data across three domains (i.e., domains 3a, 4a, and 5a) [46]. The individual domain scores for each included trial are detailed in the Supplementary document.

Effectiveness

Vertebral fractures

MPC on HAT who have an increased fracture risk

Vertebral fracture data were extracted from two RCTs, with a combined sample size of 1,458 (Fig. 1a) [33, 42]. Statistically significant results suggest that at the end of a 12-month treatment regimen, denosumab can decrease the risk of vertebral fractures by up to 85% (RR 0.15; 95% CI 0.03, 0.67), relative to placebo (Fig. 2). Denosumab was ranked as the most effective treatment with a very high degree of confidence (P-score=0.99), while zoledronate was ranked as the least effective active treatment in the network (P-score=0.14). Pairwise comparisons between denosumab and zoledronate were also statistically significant in favor of denosumab (Supplementary Table S5). The total heterogeneity (within the design) could not be assessed as each arm only included a single trial. There was no evidence of inconsistency within the network (Supplementary Table S5).

Fig. 1

Network plots. Index: a MPC vertebral fractures; b MPC nonvertebral fractures; c MPC FN BMD; d MPC LS BMD; e WBC LS BMD; f MPC TH BMD; g WBC TH BMD; h WBC TRO BMD; i MPC mortality; j MPC AEs; k MPC SAEs; l WBC SAEs; m MPC withdrawal due to treatment-related AEs. Abbreviations: AE, adverse events; ALN, alendronate; BMD, bone mineral density; DEN, denosumab; FN, femoral neck; IBN, ibandronate; LS, lumbar spine; MPC, men with prostate cancer on hormone ablation therapy (HAT); PLB, placebo; RIS, risedronate; SAE, serious adverse events; TH, total hip; TRO, trochanter; WBC, women with breast cancer on adjuvant aromatase inhibitor therapy (AAIT); ZOL, zoledronate

Fig. 2

Forest plot of network meta-analyses indicating the RR of vertebral fractures (relative to placebo). Abbreviations: ALN, alendronate; CI, confidence interval; DEN, denosumab; MPC, men with prostate cancer on hormone ablation therapy (HAT); PLB, placebo; RR, risk ratio; ZOL, zoledronate. Notes: P-score, extent of certainty that a treatment is superior to its comparators (closest score to 1 being the “best”) [25]. Rank: position of treatment hierarchy within the network based on the P-score, with 1 representing the most effective treatment. This forest plot is the result of a network meta-analysis performed using a frequentist inference

WBC receiving AAIT who have an increased fracture risk

Data on vertebral fractures were available from two RCTs, with a combined sample size of 1849 [43, 44]. A pairwise meta-analysis was conducted as both RCTs compared denosumab to placebo (Fig. 3a). Given that only a single trial had estimable results, the combined treatment effect and heterogeneity for the pairwise meta-analysis could not be calculated.

Fig. 3

Forest plots of pairwise meta-analyses indicating effect sizes (RR or MD) of denosumab (relative to placebo). Index: a WBC vertebral fractures; b WBC nonvertebral fractures; c WBC FN BMD; d WBC mortality; e WBC AEs. Abbreviations: AE, adverse events; BMD, bone mineral density; CI, confidence interval; DEN, denosumab; FN, femoral neck; MD, mean difference; PLB, placebo; RR, risk ratio; WBC, women with breast cancer on adjuvant aromatase inhibitor therapy (AAIT)

Nonvertebral fractures

MPC on HAT who have an increased fracture risk

Nonvertebral fracture data were extracted from six RCTs, with a combined sample size of 2186 (Fig. 1b) [33, 36, 37, 39,40,41]. The results suggest that after 12 to 36 months, none of the treatments was statistically significant compared to placebo (Fig. 4). Denosumab was ranked as the most effective treatment with the highest level of confidence (P-score=0.73), while alendronate also had a comparable level of confidence (P-score=0.71). Zoledronate was ranked as the least effective active treatment in the network (P-score=0.35). None of the pairwise comparisons was statistically significant (Supplementary Table S6). The total heterogeneity (within the design) in the network was low, and there was no evidence of local inconsistency (Supplementary Tables S6 and S18).

Fig. 4

Forest plot of network meta-analyses indicating the mean percentage difference in BMD (relative to placebo). Abbreviations: ALN, alendronate; BMD, bone mineral density; CI, confidence interval; DEN, denosumab; FN, femoral neck; LS, lumbar spine; MD, mean difference; MPC, men with prostate cancer on hormone ablation therapy (HAT); PLB, placebo; RIS: risedronate; TH, total hip; TRO, trochanter; WBC, women with breast cancer on adjuvant aromatase inhibitor therapy (AAIT); ZOL, zoledronate. Notes: P-score, extent of certainty that a treatment is superior to its comparators (closest score to 1 being the “best”) [25]. Rank: position of treatment hierarchy within the network based on the P-score, with 1 representing the most effective treatment. This forest plot is the result of a network meta-analysis performed using a frequentist inference.

WBC receiving AAIT who have an increased fracture risk

Data on nonvertebral fractures were available from two RCTs, with a combined sample size of 1,849 [43, 44]. A pairwise meta-analysis was conducted as both RCTs compared denosumab to placebo (Fig. 3b). Overall, there was no statistically significant difference between denosumab and placebo after 24 to 36 months. There were low to moderate levels of heterogeneity in the analysis (I2=40%).

BMD–femoral neck (FN)

MPC on HAT who have an increased fracture risk

Femoral neck (FN) BMD data were extracted from seven RCTs, with a combined sample size of 1889 (Fig. 1c) [33,34,35,36, 40,41,42]. The results suggest that after 12 months of treatment with denosumab, BMD measured at the FN can improve by 2.73% (MD 2.73; 95% CI 2.31, 3.16), relative to placebo (Fig. 4). Zoledronate (MD 3.16; 95% CI 2.03, 4.28) and alendronate (MD 2.61; 95% CI 1.36, 3.85) also significantly improved FN BMD compared to placebo. Risedronate was ranked as the most effective treatment with the highest degree of confidence (P-score=0.73). However, the treatment had a small sample size (n=30) and high levels of uncertainty (MD 6.57; 95% CI −7.13, 20.27). In contrast, zoledronate had a similar level of confidence (P-score=0.70), a larger sample size (n=105), and comparatively low uncertainty (MD 3.16; 95% CI 2.03, 4.28). Alendronate was ranked as the least effective active treatment in the network (P-score=0.49). Denosumab was ranked as the third most effective treatment in the network (P-score=0.53). There was low total heterogeneity (within design) in the network and no evidence of local inconsistency (Supplementary Tables S7 and S19).

WBC receiving AAIT who have an increased fracture risk

Data on FN BMD were available from two RCTs with a combined sample size of 1240 [43, 44]. A pairwise meta-analysis was conducted as both RCTs compared denosumab to placebo (Fig. 3c). At 12 months, denosumab treatment resulted in an average increase in FN BMD of 3.04% (MD 3.04; 95% CI 2.29, 3.78) compared to placebo. There were moderate levels of heterogeneity in the analysis (I2=42%).

BMD–lumbar spine (LS)

MPC on HAT who have an increased fracture risk

Lumbar spine (LS) BMD data were extracted from 10 RCTs, with a combined sample size of 2315 (Fig. 1d) [33,34,35,36,37,38,39,40,41,42]. After 12 months of treatment, denosumab can improve BMD measured at the LS by 5.00% (MD 5.00; 95% CI 4.55, 5.44), relative to placebo (Fig. 4). Zoledronate (MD 6.74; 95% CI 5.93, 7.56) and alendronate (MD 3.85; 95% CI 2.69, 5.01) also significantly improved LS BMD compared to placebo. Zoledronate was ranked as the most effective treatment with a high degree of confidence (P-score=0.90), with alendronate ranked as the least effective active treatment in the network (P-score=0.35). Denosumab was ranked as the third most effective treatment in the network (P-score=0.60). The total heterogeneity (within the design) in the network was low, and there was no evidence of local inconsistency (Supplementary Tables S8 and S20). There was no evidence of publication bias (Supplementary Fig. S3).

WBC receiving AAIT who have an increased fracture risk

LS BMD data were extracted from four RCTs, with a combined sample size of 1462 (Fig. 1e) [43, 44, 46, 47]. The analysis indicates that after 12 months of treatment with denosumab, BMD measured at the LS can improve by 5.67% (MD 5.67; 95% CI 5.25, 6.09) compared to placebo (Fig. 4). All treatments significantly improved LS BMD compared to placebo. Denosumab was ranked as the most effective treatment with a high degree of confidence (P-score=0.92), while risedronate was ranked as the least effective active treatment in the network (P-score=0.33). A pairwise comparison between risedronate and denosumab suggested a statistically significant improvement in BMD at the LS in favor of denosumab (Supplementary Table S9). The total heterogeneity (within the design) in the network was low, and there was no evidence of local inconsistency (Supplementary Tables S9 and S21).

BMD–total hip (TH)

MPC on HAT who have an increased fracture risk

Total hip (TH) BMD data were extracted from nine RCTs, with a combined sample size of 2249 (Fig. 1f) [33, 34, 36,37,38,39,40,41,42]. The analysis suggested that after 12 months, none of the treatments significantly improved TH BMD compared to placebo (Fig. 4). Denosumab was ranked as the most effective treatment with the highest level of confidence (P-score=0.63), while zoledronate was ranked as the least effective active treatment in the network (P-score=0.45). None of the pairwise comparisons was statistically significant (Supplementary Table S10). The network showed considerable total heterogeneity (within the design) (Supplementary Table S22). Lastly, there was no evidence of local inconsistency within the network (Supplementary Table S10).

WBC receiving AAIT who have an increased fracture risk

TH BMD data were extracted from four RCTs, with a combined sample size of 1468 (Fig. 1g) [43, 44, 46, 47]. The results indicated that after 12 months of treatment, denosumab can improve TH BMD by 3.82% (MD 3.82; 95% CI 3.43, 4.21), relative to placebo (Fig. 4). All treatments significantly increased TH BMD compared to placebo. Ibandronate was ranked as the most effective treatment with a very high degree of confidence (P-score=0.98), with denosumab ranked as the second most effective treatment (P-score=0.69) and risedronate ranked as the least effective active treatment in the network (P-score=0.33). In addition, the pairwise comparison between denosumab and risedronate was statistically significant in favor of denosumab (Supplementary Table S11). The total heterogeneity (within the design) in the network was low (Supplementary Table S23). Similarly, there was no evidence of local inconsistency within the network (Supplementary Table S11).

BMD–trochanter (TRO)

MPC on HAT who have an increased fracture risk

No studies investigating the impact of denosumab on trochanteric (TRO) BMD were identified for this population [34, 42].

WBC receiving AAIT who have an increased fracture risk

TRO BMD data were extracted from two RCTs, with a combined sample size of 332 (Fig. 1h) [43, 46]. After 12 months of treatment, denosumab improved TRO BMD by 4.62% (MD 4.62; 95% CI 3.50, 5.74), relative to placebo (Fig. 4). All treatments were statistically significant compared to placebo. Denosumab was ranked as the most effective treatment with a very high degree of confidence (P-score=1.00), with risedronate ranked as the least effective active treatment in the network (P-score=0.50). The pairwise comparison between denosumab and risedronate was statistically significant in favor of denosumab (Supplementary Table S12). The total heterogeneity (within the design) could not be assessed. There was no evidence of local inconsistency between the direct and indirect comparisons (Supplementary Table S12).

Safety

Mortality

MPC on HAT who have an increased fracture risk

Mortality data were extracted from five RCTs, with a combined sample size of 2063 (Fig. 1i) [33, 36,37,38, 42]. After 12 to 36 months of treatment, none of the treatments reported statistically significant differences in mortality compared to placebo (Fig. 5). Denosumab was associated with the lowest mortality as it had the highest level of confidence in the network (P-score=0.59). However, both alendronate and placebo had similar levels of confidence, with P-scores of 0.52 and 0.51, respectively. Zoledronate was associated with the highest mortality as it had the lowest degree of confidence (P-score=0.38). None of the pairwise comparisons was statistically significant (Supplementary Table S13). There was low total heterogeneity (within the design) and no evidence of inconsistency in the network (Supplementary Tables S13 and S24).

Fig. 5

Forest plot of network meta-analyses indicating the RR of safety outcomes (relative to placebo). Abbreviations: ALN, alendronate; CI, confidence interval; DEN, denosumab; IBN, ibandronate; MPC, men with prostate cancer on hormone ablation therapy (HAT); PLB, placebo; RIS, risedronate; RR, risk ratio; WBC, women with breast cancer on adjuvant aromatase inhibitor therapy (AAIT); ZOL, zoledronate. Notes: P-score: the extent of certainty that a treatment is superior to its comparators (closest score to 1 being the “best”) [25]. Rank: position of treatment hierarchy within the network based on the P-score, with 1 representing the most effective treatment. This forest plot is the result of a network meta-analysis performed using a frequentist inference.

WBC receiving AAIT who have an increased fracture risk

Data on mortality were available from two RCTs, with a combined sample size of 3669 [43, 44]. A pairwise meta-analysis was conducted as both RCTs compared denosumab to placebo. Overall, there was no statistically significant difference between denosumab and placebo after 24 to 36 months of treatment (Fig. 3d). The analysis indicated low levels of heterogeneity (I2=0%).

Treatment-related AEs

MPC on HAT who have an increased fracture risk

Treatment-related AE data were extracted from five RCTs, with a combined sample size of 1957 (Fig. 1j) [33, 36,37,38,39]. The results suggest that after 12 to 36 months, none of the treatments was statistically significant compared to placebo (Fig. 5). Alendronate was associated with the lowest risk of treatment-related AEs as it had the highest level of confidence in the network (P-score=0.61). However, placebo and denosumab had comparable levels of confidence, with P-scores of 0.60 and 0.55, respectively. Zoledronate was associated with the highest risk of treatment-related AEs as it had the lowest level of confidence (P-score=0.24). None of the pairwise comparisons was statistically significant (Supplementary Table S14). There was moderate to considerable heterogeneity (within the design) in the network and no evidence of local inconsistency (Supplementary Tables S14 and S25).

WBC receiving AAIT who have an increased fracture risk

Two RCTs reported data on treatment-related AEs with a combined sample size of 3,531 [43, 44]. A pairwise meta-analysis was conducted as both RCTs compared denosumab to placebo. Overall, there was no statistically significant difference between denosumab and placebo after 24 to 36 months of treatment (Fig. 3e). The analysis indicated low levels of heterogeneity and inconsistency (I2=0%).

SAEs

MPC on HAT who have an increased fracture risk

SAE data were extracted from eight RCTs, with a combined sample size of 2380 (Fig. 1k) [33, 35,36,37, 39,40,41,42]. The results suggest that after 12 to 36 months, none of the treatments was statistically significant compared to placebo (Fig. 5). None of the pairwise comparisons was statistically significant (Supplementary Table S15). Zoledronate was associated with the lowest number of SAEs (P-score=0.82), with alendronate having a similar level of confidence with a P-score of 0.74. Risedronate was associated with the highest number of SAEs in the network (P-score=0.14). Denosumab was ranked fourth (P-score=0.27). The total heterogeneity (within the design) in the network was low and there was no evidence of local inconsistency (Supplementary Tables S15 and S26).

WBC receiving AAIT who have an increased fracture risk

SAE data were extracted from three RCTs, with a combined sample size of 3675 (Fig. 1l) [43, 45, 47]. The results suggest that after 24 to 36 months, no treatments were statistically significant compared to placebo (Fig. 5). Placebo was ranked as the treatment associated with the lowest number of SAEs as it had the highest level of confidence (P-score=0.79), denosumab was ranked second (P-score=0.55), with ibandronate associated with the highest number of SAEs in the network (P-score=0.16). None of the pairwise comparisons was statistically significant (Supplementary Table S16). Overall, the network showed low to moderate total heterogeneity (within the design) (Supplementary Table S27). However, there was no evidence of local inconsistency (Supplementary Table S16).

Withdrawal due to treatment-related AEs

MPC on HAT who have an increased fracture risk

Data on withdrawal due to treatment-related AEs were extracted from seven RCTs, with a combined sample size of 2,238 (Fig. 1m) [33, 37,38,39,40,41,42]. After 12 to 36 months of treatment, none of the treatments was statistically significant compared to placebo (Fig. 5). Alendronate was associated with the lowest number of withdrawals due to treatment-related AEs (P-score=0.96). Denosumab was associated with the highest number of withdrawals due to treatment-related AEs, as it had the lowest level of confidence in the network (P-score=0.23). None of the pairwise comparisons was statistically significant (Supplementary Table S17). There was low total heterogeneity (within the design) in the network and no evidence of local inconsistency (Supplementary Tables S17 and S28).

WBC receiving AAIT who have an increased fracture risk

Neither a network meta-analysis nor a pairwise meta-analysis could be conducted, as only a single RCT by Ellis et al. 2008 reported data on withdrawal due to treatment-related AEs [43].

Discussion

A systematic literature search identified 14 RCTs (k=15 publications) comparing denosumab to bisphosphonates (alendronate, ibandronate, risedronate, zoledronate) and placebo in patients with hormone-sensitive cancer receiving endocrine therapy. None of the included trials reported the effects of SERMs (i.e., bazedoxifene, raloxifene) relative to placebo, bisphosphonates, or denosumab in the aforementioned population.

No significant differences were found for nonvertebral fractures, mortality, AEs, SAEs, or withdrawal due to treatment-related AEs for any of the included interventions relative to placebo in patients with hormone-sensitive cancer receiving endocrine therapy. However, denosumab significantly reduced vertebral fractures relative to placebo in both MPC on HAT and WBC receiving AAIT. Similarly, denosumab was found to significantly improve BMD measured at the FN and LS in MPC on HAT, as well as BMD measured at the FN, LS, TH, and TRO in WBC on AAIT. It is difficult to determine whether the reported increases in BMD are clinically significant, as there is no validated scale that associates an increase in BMD with a decrease in the risk of vertebral or nonvertebral fractures [48,49,50]. There was not enough evidence to meta-analyze vertebral fractures or withdrawal due to treatment-related AEs in WBC on AAIT. In addition, none of the included trials reported BMD measured at the TRO in MPC on HAT.

Sensitivity analyses evaluating the impact of attrition, selection, and reporting bias on the network meta-analyses in MPC receiving HAT could not be conducted, as none of the RCTs that presented a low RoB included denosumab. In WBC on AAIT, the sensitivity analyses examining attrition and selection bias were consistent with the main analyses for BMD measured at the FN, LS, and TH, nonvertebral fractures, mortality, AEs, and SAEs. Similarly, reporting bias did not influence the main analyses of BMD measured at the FN, LS, and TH, mortality, AEs, or SAEs in WBC on AAIT. However, reporting bias did appear to affect the main analysis of nonvertebral fractures. The effect of reporting bias on the original BMD TRO analysis in WBC on AAIT could not be assessed, as none of the RCTs that presented a low RoB in this category included denosumab.

To our knowledge, the only other systematic review and network meta-analysis that compared denosumab to bisphosphonates or placebo in patients with osteoporosis is that by Simpson et al. 2020 [51]. The findings of Simpson et al. 2020 [51] cannot be directly compared to the findings of our study, as that review purposely excluded RCTs that included cancer patients with osteoporosis. Nevertheless, it is important to note that the direction of the treatment effect of denosumab (relative to placebo) for FN BMD and vertebral fractures is in line with Simpson et al. 2020 [51].

This systematic review and network meta-analysis has several limitations. The first is that the review separates the populations into two groups and is limited to developed countries, which reduced the statistical precision of the analyses. The second limitation is that some analyses suffered from imprecision due to low sample sizes; as a result, the calculated effect sizes may not accurately represent the true effect of the treatment relative to placebo. The third limitation is that it is unclear how heterogeneity or inconsistency affects the results of a network meta-analysis. It is therefore unclear whether moderate to considerable levels of heterogeneity and/or inconsistency in a network reduce the reliability of the results or the ability of the network to impute comparisons (i.e., generate indirect evidence when direct comparisons are unavailable) [52]. In addition, the majority of the included trials reported only short follow-up periods: 12 trials (k=12 publications) reported follow-up periods between 12 and 36 months [33, 34, 36,37,38,39,40,41, 43, 46, 47]. Eight trials (k=8 publications) reported follow-up periods of merely 12 months [34, 36,37,38, 40,41,42, 46], with only two trials (k=3 publications) reporting follow-up periods over 36 months [44, 45, 47]. Finally, this review only reports the clinical effectiveness and safety of denosumab relative to placebo, bisphosphonates, and SERMs. It does not address switching from denosumab to one of the aforementioned active treatments, as this is beyond the scope of the review.
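The indirect evidence referred to in the third limitation can be illustrated with a simple adjusted indirect comparison through a common comparator. This is a minimal sketch under stated assumptions, not the model used in this review: it assumes two independent placebo-controlled estimates and no inconsistency between them. All numbers and the function name `indirect_comparison` are hypothetical.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Indirect estimate of A versus B from A-versus-C and B-versus-C estimates.

    Assumes the two direct estimates are independent and share the
    common comparator C (e.g., placebo).
    """
    d_ab = d_ac - d_bc
    # Variances add, so the indirect estimate is less precise than either input.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (d_ab - z * se_ab, d_ab + z * se_ab)
    return d_ab, se_ab, ci


# Hypothetical mean differences in percentage change in BMD versus placebo.
# These values are illustrative only and do not come from the review.
d_ab, se_ab, ci = indirect_comparison(d_ac=5.0, se_ac=0.3, d_bc=3.5, se_bc=0.4)
print(f"Indirect A vs B: {d_ab:.2f} (SE {se_ab:.2f}), "
      f"95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Because the standard error of the indirect estimate combines the uncertainty of both direct estimates, comparisons informed mainly by indirect evidence tend to be less precise, which is one reason why small networks with few trials per comparison yield wide confidence intervals.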

Several important knowledge gaps were identified in this review. First, the effectiveness and safety of SERMs in patients with hormone-sensitive cancer receiving endocrine therapy could not be determined, as no RCTs met the predetermined inclusion criteria. Second, network meta-analyses comparing denosumab, bisphosphonates, and placebo could not be performed on fractures (vertebral and nonvertebral), FN BMD, mortality, AEs, and withdrawal due to treatment-related AEs in WBC on AAIT, as the evidence base only included two treatment arms. Finally, none of the included RCTs reported the effects of denosumab on HRQoL or FRAX®, or AEs upon discontinuation of denosumab. It is important that future primary research addresses these gaps to enable evidence-based decision-making on these issues.

Conclusion

In MPC on HAT, denosumab was found to be the most effective treatment for preventing vertebral fractures relative to placebo. Similarly, alendronate, zoledronate, and denosumab were found to be effective at increasing BMD measured at the FN and LS, relative to placebo.

In WBC on AAIT, denosumab and risedronate were found to be effective at increasing BMD measured at the LS, TH, and TRO, relative to placebo. Risedronate was also found to be effective at improving BMD measured at the TH and LS, relative to placebo. Furthermore, denosumab was found to be the most effective treatment for preventing vertebral fractures relative to placebo in WBC on AAIT.

None of the bisphosphonates or denosumab presented an increased risk to patients with hormone-sensitive cancer receiving endocrine therapy, relative to placebo.