Impacts on practice

  • The availability of robust evidence of treatment for moderate-to-severe ulcerative colitis is limited by short duration of the clinical trials and lack of direct comparisons.

  • New small molecules for ulcerative colitis therapy seem to have successful results, similar to current biological treatment.

  • New treatment options for moderate-to-severe ulcerative colitis have some potential advantages, such as oral route or absence of antigenicity, which means more convenience for patients.

Introduction

Inflammatory bowel diseases comprise a group of several diseases, the main ones being Crohn’s disease and ulcerative colitis (UC). The characteristic symptoms of UC are bloody diarrhoea, usually with abdominal pain, with alternating periods of remission and relapse. Mucosal inflammation commences in the rectum and can spread to the rest of the colon [1].

The incidence and prevalence of UC have been increasing worldwide, and the disease has emerged and is rising in prevalence in developing countries [2, 3]. In contrast, the incidence of UC appears to have stabilized or even decreased in both Europe and North America [4].

UC is a complex and costly disease that affects quality of life and productivity and incurs a significant economic burden. The selection of treatment depends on the severity and extent of the disease. Conventional treatments for mild-to-moderate UC are salicylates, steroids, and thiopurines; for moderate-to-severe UC, immunosuppressants and biological drugs are used [5, 6], and, more recently, new oral small molecules have been added to the therapeutic arsenal. Surgery is usually the last option; although it is considered “curative”, it has a negative and permanent impact on the patients’ quality of life [7].

Treatment for this pathology is increasingly complicated and should be chosen according to efficiency and patients’ preferences. Therefore, a systematic review and network meta-analysis (NMA) was conducted to estimate the benefits and safety profiles of the biological drugs and new oral small molecules for the treatment of moderate-to-severe UC.

Aim of the review

The aim of this study is to assess the comparative efficacy and safety of biological drugs and new oral small molecules for the treatment of moderate-to-severe UC in patients naïve to biological drugs.

Ethics approval

This manuscript has not been submitted to other journals for simultaneous consideration and has not been published previously. All data are from published studies, without manipulation. Information from patients about clinical practice or from animals has not been used to obtain data. The study did not require the approval of a review board. All authors are part of an independent group and have contributed sufficiently to the scientific work.

Materials and methods

A systematic review and NMA were conducted to compare the efficacy and safety of the biological drugs (infliximab, adalimumab, golimumab, vedolizumab, and etrolizumab) and the new oral small molecules (tofacitinib and ozanimod) for moderate-to-severe UC in patients who had not been previously treated with any biological drug. The work was performed and reported in agreement with the PRISMA guidelines [8] and ISPOR recommendations [9].

Systematic review

PubMed, Embase, and Web of Science were the databases used for our search, which is detailed in the supplementary appendix. Conference proceedings and grey literature sources for relevant studies about the drugs of interest were also investigated. The titles and abstracts of published articles identified were reviewed by two independent reviewers. Discrepancies were resolved by consensus.

Randomized controlled trials (RCTs) in adults assessing the biological drugs and new oral small molecules for moderate-to-severe UC, defined as a Mayo score ≥ 6 [5, 10], were included. The control arm could be a placebo or an alternative drug for moderate-to-severe UC. RCTs were eligible for inclusion regardless of their country of origin and could be phase 2 or 3.

The Jadad scale [11] was used to assess the methodological quality of included clinical trials (CTs). The Cochrane Collaboration’s tool [12] was used to assess the risk of bias (RoB).

For induction therapy, the efficacy outcomes were clinical remission (Mayo score ≤ 2, with no individual subscore > 1), clinical response (reduction in the Mayo score of ≥ 3 points and ≥ 30% from the baseline, with a decrease in the rectal-bleeding subscore of ≥ 1 point or a subscore of ≤ 1), and mucosal healing (Mayo endoscopy subscore of 0 or 1). For maintenance therapy, the study outcomes were clinical remission, mucosal healing and sustained clinical remission (achievement of clinical remission or clinical response in both induction and maintenance therapy). Induction outcomes were assessed at 6–8 weeks and maintenance outcomes at 48–54 weeks. To assess the safety of the treatments, serious adverse events (SAEs) and rates of infections were considered, due to their impact on the quality of life and the burden of disease. SAEs were adverse events that occur during treatment and require inpatient hospitalization, are life-threatening, or result in a persistent or significant disability.
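The outcome definitions above can be expressed as simple rules. The following Python sketch is illustrative only and is not code from the study; it assumes the standard full Mayo score with four subscores graded 0 to 3 (stool frequency, rectal bleeding, endoscopic findings, physician's global assessment), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MayoScore:
    """Full Mayo score: four subscores, each graded 0-3 (total 0-12)."""
    stool_frequency: int
    rectal_bleeding: int
    endoscopy: int
    physician_assessment: int

    @property
    def subscores(self):
        return (self.stool_frequency, self.rectal_bleeding,
                self.endoscopy, self.physician_assessment)

    @property
    def total(self):
        return sum(self.subscores)


def clinical_remission(current):
    """Mayo score <= 2 with no individual subscore > 1."""
    return current.total <= 2 and max(current.subscores) <= 1


def clinical_response(baseline, current):
    """Drop of >= 3 points and >= 30% from baseline, plus a rectal-bleeding
    decrease of >= 1 point or an absolute rectal-bleeding subscore <= 1."""
    drop = baseline.total - current.total
    enough_drop = drop >= 3 and drop >= 0.30 * baseline.total
    bleeding_ok = (baseline.rectal_bleeding - current.rectal_bleeding >= 1
                   or current.rectal_bleeding <= 1)
    return enough_drop and bleeding_ok


def mucosal_healing(current):
    """Mayo endoscopy subscore of 0 or 1."""
    return current.endoscopy <= 1


# Hypothetical patient: baseline total of 9, week-8 total of 2.
baseline = MayoScore(3, 2, 2, 2)
week8 = MayoScore(1, 0, 1, 0)
print(clinical_remission(week8), clinical_response(baseline, week8),
      mucosal_healing(week8))
```

Keeping the three definitions as separate functions mirrors the way the outcomes are analysed separately in the NMA.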

When data for multiple doses of the same medication were available, the dose described in the summary of product characteristics (SmPC) and its corresponding data were chosen (see Table S1 in the supplementary appendix).

Network meta-analysis

A Bayesian NMA was performed to provide information about the comparative efficacy and safety of the different treatments. It was conducted using the GEMTC package [13] for R and the JAGS program [14]. The results were analyzed using fixed- and random-effects models, and the Deviance Information Criterion (DIC) was used to select the better-fitting model [15]. Four independent chains of 10,000 iterations were run. Convergence was assessed by calculating the Brooks-Gelman-Rubin statistic [16].
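As an illustration of the convergence check, the sketch below computes the basic Gelman-Rubin potential scale reduction factor for a single parameter from several chains of equal length. It is a minimal Python sketch, not the GEMTC or JAGS implementation, which may apply further corrections; the function name and the simulated chains are assumptions made for the example.

```python
import random
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for one parameter.

    chains: list of m >= 2 chains, each a list of n >= 2 posterior draws.
    Values close to 1 suggest that the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean([variance(c) for c in chains])      # W: mean within-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    between = n * variance([mean(c) for c in chains])  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5


# Simulated example: four chains of 10,000 draws from the same distribution.
random.seed(1)
chains = [[random.gauss(0.5, 0.2) for _ in range(10_000)] for _ in range(4)]
print(round(potential_scale_reduction(chains), 4))
```

Chains that sample different regions of the parameter space inflate the between-chain term and push the statistic well above 1.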

The relative treatment effects were presented as odds ratios (ORs) with 95% confidence intervals (CIs). The ranking of treatments was calculated from the proportion of iterations of the sampling process in which each treatment ranked best.
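The two summaries described above can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the GEMTC output: it assumes that posterior draws of the log-OR versus placebo are available per treatment, takes a higher OR as better (as for an efficacy outcome), and uses hypothetical names and numbers.

```python
import math


def summarize_odds_ratio(log_or_draws):
    """Median OR with 2.5% and 97.5% percentile bounds from log-OR draws."""
    ordered = sorted(log_or_draws)
    n = len(ordered)

    def percentile(p):
        # Simple nearest-rank percentile on the sorted draws.
        return math.exp(ordered[min(n - 1, int(p * n))])

    return percentile(0.5), percentile(0.025), percentile(0.975)


def probability_best(draws_by_treatment):
    """Share of iterations in which each treatment has the largest log-OR."""
    names = list(draws_by_treatment)
    n_iter = len(draws_by_treatment[names[0]])
    wins = dict.fromkeys(names, 0)
    for i in range(n_iter):
        best = max(names, key=lambda name: draws_by_treatment[name][i])
        wins[best] += 1
    return {name: wins[name] / n_iter for name in names}


# Hypothetical draws for two treatments over four sampling iterations.
draws = {
    "drug_a": [1.0, 2.0, 0.5, 3.0],
    "drug_b": [0.5, 2.5, 0.4, 1.0],
}
print(probability_best(draws))
print(summarize_odds_ratio(draws["drug_a"]))
```

For a safety outcome, where a lower OR is preferable, `max` would be replaced by `min` in the ranking step.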

The statistical heterogeneity between the trials was quantified using the I² statistic [17, 18]. Inconsistency, that is, the agreement between head-to-head and indirect comparisons, was assessed using the node-splitting method [19,20,21].
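For reference, the I² statistic is usually derived from Cochran's Q as I² = max(0, (Q − df)/Q) × 100%. The Python sketch below applies this to trial-level log-ORs and their standard errors; it is a minimal illustration with hypothetical inputs, and the function name is not taken from the study.

```python
def i_squared(log_ors, standard_errors):
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical trials: similar effects give a low I², divergent effects a high one.
print(i_squared([0.70, 0.72, 0.69], [0.20, 0.25, 0.22]))
print(i_squared([0.0, 2.0], [0.1, 0.1]))
```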

Results

From the systematic review, after removing the duplicates, 612 potential references were obtained. Following abstract screening, 23 full-text articles were assessed, of which 14 references, with 18 CTs, were finally chosen. The main reasons for exclusion of abstracts were that the studies were of different diseases, had different aims, or were observational. The removal of full-text articles was due to the different definitions or measurements of outcomes, and for two studies because a negative result was obtained by the studied drug versus placebo [22, 23]. Details of the systematic review are provided in a flow diagram in the supplementary appendix (Figure S1). All studies were randomized, double-blind CTs in adults for moderate-to-severe UC. All treatments are summarized in Table 1. The baseline demographic and disease characteristics of the patients from all included studies are detailed in the supplementary appendix in Table S2.

Table 1 Included CT for the NMA

Aside from the CT of Mshimesh [30], which scored 3/5 because the randomization and blinding methods were not reported, the studies reached the maximum score on the Jadad scale, so all publications were of sufficient quality. The RoB assessment followed the same trend: most of the studies had a low RoB, except for the Mshimesh study, in which the risk was unclear due to the method of randomization and allocation concealment.

For induction, maintenance treatment and safety, a fixed-effect NMA was used to determine the OR and 95% CI for each comparison of treatments, since this model fitted the data better, with a lower DIC. Additionally, heterogeneity was low for all the analysed variables (I² ≤ 10%), and no comparison showed statistically significant inconsistency (p < 0.05) that would invalidate the model. The results are presented in the supplementary appendix.

Induction therapy

Clinical response

Among the comparisons, most of the drugs were superior to the placebo for induction of a clinical response, with the exception of etrolizumab (OR 1.87, 95% CI 0.76–4.72). Infliximab was the best drug for the induction of clinical response, regardless of the dose used: 5 mg/kg (OR 4.15, 95% CI 2.96–5.84) or 3.5 mg/kg (OR 4.07, 95% CI 1.76–9.81). The rest of treatments had a similar efficacy, excluding etrolizumab, which did not show any statistically significant differences versus the placebo. Infliximab was statistically superior to adalimumab for the induction of clinical response (OR 2.10, 95% CI 1.33–3.27).

Clinical remission

Most of the drugs showed statistically significant differences versus the placebo, with the exceptions of etrolizumab (OR 2.42, 95% CI 0.42–20.85) and ozanimod (OR 3.16, 95% CI 1.01–12.71). Infliximab 5 mg/kg ranked the best in 21.7% of the simulations and showed statistical superiority over adalimumab (OR 2.35, 95% CI 1.35–4.14) (see Figs. 1 and 2).

Fig. 1 Results of NMA induction. Clinical remission

Fig. 2 Diagram of NMA induction. Clinical remission

Mucosal healing

All therapeutic options showed statistically significantly better results than the placebo and were very similar in terms of inducing mucosal healing. Among the different treatments, ozanimod 1 mg daily ranked the best (49% of simulations). On comparing all drugs, it was found that infliximab was statistically superior to adalimumab and golimumab (OR 2.01, 95% CI 1.28–3.16 and OR 1.67, 95% CI 1.04–2.07, respectively).

Maintenance treatment

Clinical remission

All treatments reflected a superiority versus the placebo. Vedolizumab (OR 3.84, 95% CI 2.13–7.15) and tofacitinib (OR 5.51, 95% CI 3.31–9.56) showed better outcomes in terms of reaching clinical remission. Tofacitinib ranked first in 88% of the Bayesian simulations and was statistically superior to adalimumab and golimumab 50 mg or 100 mg when all treatments were compared (OR 2.34, 95% CI 1.12–4.95, OR 3.15, 95% CI 1.52–6.71 and OR 3.06, 95% CI 1.47–6.41, respectively) (see Figs. 3, 4).

Fig. 3 Results of NMA maintenance. Clinical remission

Fig. 4 Diagram of NMA maintenance. Clinical remission

Mucosal healing

All drugs showed superior results to the placebo. Adalimumab (OR 2.02, 95% CI 1.31–3.13) and golimumab 100 mg (OR 2.04, 95% CI 1.26–3.31) or 50 mg (OR 1.98, 95% CI 1.22–3.21) were very similar in terms of this variable. Infliximab (OR 3.81, 95% CI 2.13–6.97), tofacitinib (OR 5.62, 95% CI 3.51–9.57), and vedolizumab (OR 4.35, 95% CI 2.48–7.79) were more successful for this parameter. Tofacitinib was the best drug in 66.7% of the simulations. Vedolizumab and tofacitinib showed statistical superiority to adalimumab and golimumab 50 mg or 100 mg (see Table 2).

Table 2 Comparative results NMA mucosal healing at maintenance

Sustained clinical remission

Aside from golimumab 50 mg (OR 1.38, 95% CI 0.51–3.58) and 100 mg (OR 2.08, 95% CI 0.86–5.30), the rest of the treatments showed statistically significant superiority to the placebo. Tofacitinib (OR 6.56, 95% CI 3.35–14.14) had the best success in sustained clinical remission, with a probability of 80.4% of being the best, and was statistically superior to adalimumab and golimumab 50 mg (OR 3.11, 95% CI 1.27–7.95 and OR 4.81, 95% CI 1.45–16.74, respectively).

Safety

Safety was only assessed for maintenance therapy, since induction treatment is usually too short to capture most of the safety profile, which matters in a chronic disease like UC. Therefore, etrolizumab and ozanimod were not included. For both variables, the heterogeneity was very low (I² ≤ 2%). Inconsistency could not be calculated, since there were no closed loops.

Rate of infections

Tofacitinib (OR 2.08, 95% CI 1.34–3.20), golimumab 50 mg (OR 1.79, 95% CI 1.12–2.85), golimumab 100 mg (OR 1.90, 95% CI 1.23–2.95) and vedolizumab (OR 1.74, 95% CI 1.05–2.93) were the drugs showing higher rates of infection, with a statistical significance versus the placebo. However, infliximab 3.5 mg/kg (OR 0.98, 95% CI 0.27–3.02), infliximab 5 mg/kg (OR 1.23, 95% CI 0.76–2.01), and adalimumab (OR 1.26, 95% CI 0.90–1.78) were considered the safest drugs, since they did not show statistically significant differences versus the placebo. In the ranking of drugs, tofacitinib was the worst in 40.2% of the simulations.

SAEs

All treatments showed a numerically higher likelihood of SAEs than the placebo, but none of the differences was statistically significant. All drugs had similar probabilities of causing SAEs, with no statistically significant differences among them.

Discussion

The biological drugs for treating moderate-to-severe UC have changed the therapeutic perspective. Furthermore, many new drugs with different therapeutic outcomes are in the investigational phase; soon, there will be new options available for this pathology when the current therapy is not effective.

NMA can provide estimates when no head-to-head studies have been performed. This NMA adds to the current understanding of the comparative efficacy and safety of treatments for moderate-to-severe UC, covering the new therapies which are in the process of research and authorization in some countries. Moreover, it can help to position these drugs in clinical practice.
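The simplest case of such an estimate is an adjusted indirect comparison of two drugs through a common placebo arm. The Python sketch below illustrates that idea only; it is a frequentist simplification and not the Bayesian model used in this NMA. The input values and function names are hypothetical, and standard errors are back-calculated from 95% intervals assuming normality on the log scale.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_interval(lower, upper):
    """Standard error of a log-OR recovered from its 95% interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def indirect_odds_ratio(or_a, ci_a, or_b, ci_b):
    """Indirect OR of A versus B, both originally compared with placebo.

    The log-ORs are subtracted and their variances are added, so the
    indirect estimate is less precise than either direct comparison.
    """
    log_or = math.log(or_a) - math.log(or_b)
    se = math.sqrt(se_from_interval(*ci_a) ** 2 + se_from_interval(*ci_b) ** 2)
    return (math.exp(log_or),
            math.exp(log_or - Z_95 * se),
            math.exp(log_or + Z_95 * se))


# Hypothetical inputs: drug A vs placebo OR 4.0 (2.0-8.0), drug B vs placebo OR 2.0 (1.0-4.0).
estimate, lower, upper = indirect_odds_ratio(4.0, (2.0, 8.0), 2.0, (1.0, 4.0))
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

The widening of the interval in the indirect estimate is one reason why head-to-head trials remain necessary to confirm differences between active drugs.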

Based on the rankings obtained in the NMA, for induction therapy, infliximab generally had the highest probability of being the best drug across the considered variables, being statistically superior to adalimumab and golimumab. However, for induction of mucosal healing, ozanimod had a higher probability of being the best than infliximab. Doses of 3.5 mg/kg and 5 mg/kg of infliximab appeared to be similar. In the study of Jiang et al. [25], both doses showed superiority versus the placebo; however, they were only studied in the Chinese population and the median Mayo score was lower than in the rest of the infliximab studies [24, 26, 30]. Among the rest of the drugs, the treatment with the lowest probability of being the best for the three outcomes was adalimumab. Vedolizumab and golimumab were at an intermediate level.

For maintenance therapy, tofacitinib had the highest probability of being the best treatment overall, with the greatest success for sustained clinical remission. The anti-α4β7 integrin vedolizumab was in second place, followed by the anti-TNF molecules, among which infliximab had the highest probability of being the best, followed by adalimumab and golimumab. Etrolizumab and ozanimod were not considered for this part of the treatment, since the etrolizumab CT was designed for induction therapy [36] and the ozanimod CT [37] had a duration of 32 weeks, which is not sufficient for establishing clinical efficacy or assessing safety.

With respect to safety, infliximab and adalimumab produced lower rates of infections. In contrast, tofacitinib appeared to have a high rate; although in most cases the infections were minor, there was an increased risk of opportunistic infections [38, 39]. In all cases, the rate of infections could be influenced by concomitant drugs [40, 41]. Meanwhile, SAEs occurred in a higher proportion with all the studied drugs than with the placebo, but without statistical significance. For this reason, in general terms, all treatments were well tolerated.

This work is in concordance with previously published systematic reviews and NMAs, although some differences were found. Among the biological treatments, the results for induction therapy and for SAEs correspond with those obtained by Mei et al. [42]. In contrast, for maintenance therapy, Mei et al. concluded that there were no differences among the biological drugs, while in our NMA, vedolizumab appeared to be better than the anti-TNF agents. In addition to the drugs included in the work of Singh et al. [43], the current study included etrolizumab and ozanimod. Similar results were obtained for patients naïve to anti-TNF agents, with the exception that in the current study, etrolizumab and ozanimod ranked higher than vedolizumab for the induction of clinical remission and ozanimod ranked the highest for the induction of mucosal healing. The results differed from those obtained by Bonovas et al. [44]; this could be influenced by the inclusion of ozanimod and etrolizumab in the current NMA. Bonovas et al. [44] found vedolizumab to have the highest probability of being the best in terms of clinical remission, while infliximab was ranked the highest in the current study. The results for induction of mucosal healing follow the same trend: ozanimod is ranked the best here, instead of infliximab as Bonovas et al. [44] indicated. Their results for maintenance therapy also differ from those in the current study, since they did not include tofacitinib. Besides, the current research studied the outcome of sustained clinical remission.

All studies demonstrated a sufficient quality after being checked by the Jadad scale. However, the Mshimesh study [30] showed RoB, specifically selection bias.

The included CTs are homogeneous in most respects. Groups were well balanced except in the tofacitinib induction trials, where patients were randomized in a 4:1 ratio to tofacitinib or placebo, respectively [35]. Clinical outcomes were defined and measured by the Mayo score, although the etrolizumab CT [36] was laxer in terms of the definition of UC, while the CT of Mshimesh [30] was stricter. Similarly, the OCTAVE study [35] was the most restrictive, including two subscores of the Mayo score. The baseline characteristics were similar: most participants were adults, mainly men, and the age range was 36–44 years. However, the study by Suzuki et al. [29] included patients from the age of 15. Almost all groups in each CT had a Mayo score between 8 and 9 points. The patients of the ULTRA 1 [27] and 2 [28] CTs; OCTAVE 1, 2 and Sustain CT [35]; etrolizumab CT [36]; and the study of Mshimesh [30] had severe disease. The median duration of disease was 5 to almost 10 years.

It should be noted that there were some limitations to this study, the first of which is the time at which clinical outcomes were measured. For induction therapy, the time of measurement was 6 [31, 36] or 8 [24,25,26,27,28,29,30, 35, 37] weeks; for maintenance therapy, the time was 48 [24], 52 [28, 29, 35], or 54 [32, 33] weeks. Secondly, it was difficult to compare the naïve population with the second-line treatment population, due to the limited accessibility of data.

In addition, some trials used a re-randomization approach for maintenance [32, 34, 35]. In these trials, only patients who had initially responded to the drug were re-randomized, which increases the uncertainty of the results of the maintenance analyses.

This NMA only considers patients naïve to biological treatment; most of the included CTs enrolled only this kind of patient. That made the comparison easier and more realistic, thus diminishing the uncertainty associated with assumptions. However, the ULTRA 2 [28] CT; the OCTAVE 1, 2 and Sustain [35] study; the etrolizumab [36] and ozanimod [37] CTs; and GEMINI I [34] included patients receiving first- and second-line biological therapy. The ULTRA 2 CT reported the outcomes separately. For tofacitinib, separate results were published for the induction therapy. The etrolizumab CT only provided results for naïve patients for induction of clinical remission; however, we decided to include them jointly with the pre-treated ones, due to the small sample size of each group. The outcomes for ozanimod were reported jointly for first- and second-line treatment; the same is true for the GEMINI I CT. Recently, a post hoc analysis of GEMINI I was published [45], with differentiated subgroups. However, we did not take it into account, since a post hoc analysis provides a lower level of evidence.

At present, the availability of data for pre-treated patients in randomized, double-blind CTs is very poor. It would be very interesting to obtain details, because many patients receive successive therapy with these drugs and the results are different depending on previous treatment and the reason for discontinuation.

Likewise, for patients who are steroid-dependent, data on steroid-free clinical remission would be very useful for clinical practice and for the safety of the patients. It would be highly interesting to have a separate assessment of the studied treatments, depending on the use of steroids.

A limitation for clinical practice is that we considered the dose and treatment regimen stated in each SmPC [46,47,48,49] and did not consider short temporary discontinuations of treatment or dose intensifications. The small sample size and the lack of head-to-head trials could increase the uncertainty of the results. Considering the differences among the assessed drugs, comparing them in head-to-head studies would require a large sample size.

CTs have shown that all of these treatments are effective in UC therapy. They impose a high burden on the healthcare system. Because of this, the efficiency of the treatments must be known in order to make the best decisions and provide patients with the best therapy regimens. In this study, infliximab was shown to be the best in induction and maintenance treatment and considered the most cost-effective for naïve patients [50]. However, potential sources of difference from other cost-effectiveness analyses include, among other aspects, the price of drugs in different settings [51]. Therefore, the use of an infliximab biosimilar makes this therapy more efficient. Apart from that, infliximab, adalimumab, golimumab and vedolizumab are familiar to clinicians, who know their efficacy and safety in the real-world population. In contrast, new drugs should be used with caution.

Conclusion

This NMA suggests that infliximab is generally the best therapeutic option for moderate-to-severe UC in naïve patients. Vedolizumab seems to have better outcomes in maintenance than in induction therapy, appearing superior to golimumab and adalimumab, which appear to have the worst results. Tofacitinib, the first new oral treatment with a new and different target, indicates very successful results, mainly for maintenance therapy. Ozanimod and etrolizumab presented encouraging results in their phase 2 CT, which should be confirmed in their ongoing phase 3 CT. All treatments are generally well tolerated.

The therapeutic algorithm of UC is becoming more complex, and it will be necessary to position each treatment in clinical practice. The evidence found here should therefore be useful for clinical decisions, although head-to-head comparisons in different kinds of patients are necessary, and the costs of treatment and the preferences of patients should be considered.