Introduction

Multiple sclerosis (MS) is a chronic inflammatory degenerative disorder of the central nervous system [1]. There are four types of MS, the most frequent being relapsing–remitting MS (RRMS), which accounts for about 85% of cases. The other types are primary progressive MS, secondary progressive MS, and clinically isolated syndrome [2, 3]. In 2020, about 2.8 million people worldwide suffered from MS (estimated prevalence 35.9 per 100,000 population) [4]. The mean age at diagnosis is about 32 years, which means that the disease affects individuals in their most productive years of life. As MS leads to significant disability and reduced quality of life, it is considered a serious social and economic burden. There is currently no known cure for MS, but there are treatments that slow down the progression of the disease and allow patients to function normally in society. Since 1993, disease-modifying therapies (DMTs) such as interferon beta-1a, interferon beta-1b, peginterferon beta-1a, glatiramer acetate, dimethyl fumarate, teriflunomide, natalizumab, fingolimod, cladribine, ocrelizumab, and alemtuzumab have been approved for use in patients with relapsing MS or RRMS [5, 6]. However, despite the availability of numerous DMTs, the treatment of MS remains a challenge in modern medicine, and there is an ongoing search for new drugs with higher clinical efficacy and a better safety profile than the currently available pharmacotherapy.

In the last few years, there has been a dynamic increase in the number of registered DMTs, with three new therapies approved: ozanimod (May 2020) [7], ofatumumab (March 2021) [8], and ponesimod (May 2021) [9]. As the available DMTs vary in efficacy, in 2015, the Association of British Neurologists proposed a classification of DMTs used in the treatment of RRMS into two groups. The first group included drugs with moderate efficacy (average relapse reduction in the range of 30–50%), such as interferons beta, glatiramer acetate, teriflunomide, dimethyl fumarate, and fingolimod, typically used in the first-line setting. The other group included drugs with high efficacy (average relapse reduction substantially higher than 50%), namely, alemtuzumab and natalizumab [10]. According to another definition, “high-efficacy DMTs” are therapies that are more effective than the first-line therapies in reducing relapse rate and disability progression. Current evidence and recent reviews support the high efficacy of alemtuzumab, natalizumab, fingolimod, cladribine, ofatumumab, ozanimod, and ocrelizumab for RRMS treatment [11,12,13,14]. While the newest DMT, ponesimod, was not included in this list, it also has the potential to be classified as a high-efficacy drug. According to the results of the phase III OPTIMUM trial, published in 2021, ponesimod reduced the relapse rate by 30.5% and was significantly superior to an active comparator, teriflunomide [9]. High-efficacy therapies are generally recommended for patients with rapidly evolving, severe disease or failure of other therapies [14, 15].

To determine the best therapy for an individual patient, it is important to consider not only the efficacy but also the risk of side effects that might significantly affect drug tolerability and patient comfort [14, 15]. Although numerous studies have been conducted to date, there have been no direct comparisons of high-efficacy therapies in terms of the risk of adverse events (AEs). As direct head-to-head evidence on the safety profile is lacking, indirect evidence may facilitate decision-making based on the evaluation of DMT toxicity. Studies published so far have focused mainly on the assessment of key clinical efficacy parameters, while providing only an overall assessment of the safety profile [14,15,16,17,18,19,20]. There have been few systematic reviews with network meta-analyses (NMAs) dedicated solely to in-depth safety assessment [21]. Moreover, as new DMTs have been approved for use in recent years and evidence for the previously approved DMTs has been growing, there is an ongoing need to update the results of indirect comparisons. The assessment of the safety profile is important to support clinicians and patients in the choice of therapy. In addition to efficacy, the ranking of drugs in terms of individual safety endpoints may help select the most appropriate drug for an individual patient out of the numerous available DMTs. Finally, data on the safety profile of DMTs may provide useful additional information to guide health care policymakers in their decisions on reimbursement.

Considering the above gaps in knowledge and evidence, we aimed to compare the safety profile of high-efficacy DMTs used in adult patients with RRMS. We conducted a systematic review with an NMA focused on the overall and in-depth safety assessment of selected high-efficacy DMTs.

Materials and methods

General principles

The systematic review was conducted according to the recommendations of the PRISMA Extension Statement for Reporting of Systematic Reviews Incorporating Network Meta-Analysis of Health Care Interventions [22, 23] and the guidelines for conducting and interpreting NMAs developed by the International Society for Pharmacoeconomics and Outcomes Research Task Force [24] and by Cipriani et al. [25]. The systematic review was registered in the PROSPERO database (registration number: CRD42021286362) [26].

Data sources and search

Eligible studies were identified by searching three main databases: MEDLINE (via PubMed), EMBASE, and the Cochrane Library. The search was conducted in November 2021. Only studies published in English were considered. The search strategy was based on medical subject heading (MeSH) terms or Emtree terms combined with Boolean logical operators (the complete search strategy is described in Supplementary file, Tables 14–21). We also searched trial registration databases (https://clinicaltrials.gov/ and https://www.clinicaltrialsregister.eu/), the reference lists of the most recent systematic reviews on the use of DMTs in RRMS [16,17,18,19,20,21], and the reference lists of the included studies.

Selection criteria

Detailed inclusion and exclusion criteria for the systematic review and meta-analysis (described in Supplementary file, Tables 22, 23) were generally in line with previous NMAs [16, 18,19,20].

Studies were considered eligible if they were prospective randomized controlled trials (RCTs) published in English, were conducted in adult patients with a clinical diagnosis of RRMS (> 85% of the population), had a follow-up of at least 1 year (i.e., at least 48 weeks), and involved at least 70 participants in each study arm. In terms of the interventions, the eligibility criteria were broad and encompassed not only high-efficacy or potentially high-efficacy DMTs approved for use (i.e., ofatumumab, ocrelizumab, natalizumab, fingolimod, alemtuzumab, ponesimod, ozanimod, and cladribine) but also other approved DMTs such as interferon beta-1a, interferon beta-1b, peginterferon beta-1a, dimethyl fumarate, teriflunomide, and glatiramer acetate, compared with one another or with placebo.

The outcomes of interest were AEs, serious AEs (SAEs), discontinuation of study drug due to AEs, and individual AEs most commonly reported in the summary of product characteristics [7,8,9, 27,28,29,30,31]. These included infections, serious infections, upper respiratory tract infections, nasopharyngitis, urinary tract infections, fatigue, headache, and nausea. Non-randomized studies, unpublished studies, and studies published only as conference abstracts were excluded because of concerns about their methodology and/or results. If no appropriate data were available in a full-text publication, information from clinical trial registries was allowed.

Trials were selected in accordance with the PRISMA recommendations [22, 23]. The titles and abstracts of studies identified during the database search were screened, and a list of studies that initially met the inclusion criteria was prepared. Next, studies were selected on the basis of the full-text articles, considering all the inclusion and exclusion criteria. This yielded the final list of studies, which were then thoroughly assessed for bias and for the reported results. Trials were selected by two independent reviewers (K. Ś., O. O.). Any discrepancies at any stage of the review were resolved through discussion, consultation with a third reviewer (P.K.), and, finally, consensus. The level of agreement between the reviewers was high (96%).

Data extraction and quality assessment

Data from included studies were extracted independently by two reviewers (K. Ś., O. O.) using a predefined data extraction form. The following information was extracted and analyzed to assess the homogeneity of trials: design (methodology), patient characteristics, treatment regimen and previous therapy, duration of follow-up/treatment, and the size of the study arms. The quality of eligible RCTs was evaluated using the Cochrane risk-of-bias tool for randomized trials [32], which allows an evaluation of specific domains: sequence generation, allocation concealment, blinding of participants, blinding of outcome assessment, incomplete outcome data, selective outcome reporting, and “other issues.” The domain-based evaluation allows the assignment of the following ratings to each of the domains: low risk of bias (“+”), high risk of bias (“–”), or unclear risk of bias (“?”). The results of the risk-of-bias assessment for individual trials were presented graphically using Review Manager v.5.4.1.

Data analysis and synthesis

The NMA was conducted using the R software netmeta package [33], which incorporates the graph-theoretical method of an NMA (vertices, treatments; edges, randomized comparisons) and provides a point estimate from the network along with 95% confidence intervals (CIs). This frequentist method is an alternative to a standard NMA conducted within the Bayesian framework [34].
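The graph view of a treatment network can be illustrated with a short Python sketch. It builds a small network and finds the chain of randomized comparisons that links two drugs never compared head to head. This is an illustration only and not the netmeta estimator: nodes are collapsed to drug level (the NMA itself uses one node per dosage regimen), the edge list covers only a subset of trials, the trial names in the comments reflect general knowledge of those trials and should be checked against Table 1, and the labels `IFNB1A`, `TER`, and `GA` as well as all function names are assumptions introduced here.

```python
from collections import defaultdict, deque

# Illustrative drug-level edges (one per randomized comparison).
# Assumption: doses are collapsed and only a subset of trials is shown.
EDGES = [
    ("NAT", "PBO"),     # AFFIRM
    ("CLA", "PBO"),     # CLARITY
    ("FIN", "PBO"),     # FREEDOMS, FREEDOMS II
    ("ALE", "IFNB1A"),  # CAMMS223, CARE-MS I and II
    ("OCR", "IFNB1A"),  # OPERA I and II
    ("OZA", "IFNB1A"),  # RADIANCE, SUNBEAM
    ("FIN", "IFNB1A"),  # TRANSFORMS
    ("OFA", "TER"),     # ASCLEPIOS I and II
    ("PON", "TER"),     # OPTIMUM
    ("FIN", "GA"),      # ASSESS
    ("TER", "IFNB1A"),  # TENERE (assumed from general knowledge of the trial)
]


def build_graph(edges):
    """Return an undirected adjacency map: treatment -> set of comparators."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of comparisons."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # no chain of comparisons exists


def is_connected(graph):
    """An NMA needs every treatment to be reachable from every other one."""
    nodes = list(graph)
    return all(shortest_path(graph, nodes[0], n) is not None for n in nodes)


GRAPH = build_graph(EDGES)
PATH_OFA_OCR = shortest_path(GRAPH, "OFA", "OCR")
```

In this toy network, ofatumumab and ocrelizumab are linked only indirectly, through teriflunomide and interferon beta-1a, which is the kind of comparison an NMA estimates from the network as a whole.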

In the NMA, we used consistency and random-effects models with adjustments for multi-arm studies. All eligible treatments and their regimens with different doses or dosing intervals from the identified studies were included in the network, and each treatment at a given dosage regimen constituted one node (vertex in a graph). However, in the manuscript, only the treatments of interest at their licensed dosage regimens were presented: natalizumab (400 mg every 4 weeks intravenously [IV]), fingolimod (0.5 mg every day orally [PO]), alemtuzumab (12 mg IV), cladribine (3.5 mg/kg PO), ofatumumab (20 mg every 4 weeks subcutaneously [SC]), ponesimod (20 mg every day PO), ocrelizumab (600 mg IV), ozanimod (0.92 mg [1 mg] every day PO), dimethyl fumarate (240 mg twice a day PO), glatiramer acetate (20 mg per day SC and 40 mg three times a week SC), interferon beta-1a (30 µg every week intramuscularly [IM] and 44 µg three times a week SC), interferon beta-1b (250 µg every other day SC), peginterferon beta-1a (125 µg every 2 weeks SC), and teriflunomide (7 mg every day PO and 14 mg every day PO).

All comparisons assessed in the trials, including suboptimal and experimental dosage regimens and comparator treatments not assessed in the review, were presented in the Supplementary file. The heterogeneity of evidence was assessed using the Q test, I² statistic, and tau values, and consistency was assessed using the splitting approach and comparison with direct evidence [35]. Publication bias was assessed by examining the funnel plot for “small-study effects.”
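The heterogeneity measures named above can be sketched for the simpler case of a single pairwise meta-analysis. The Python example below computes Cochran's Q, the I² statistic, and a between-study variance (tau squared). It rests on assumptions: the DerSimonian-Laird estimator is used for tau squared because the text does not name the estimator applied, the network-level Q (with its within-design and between-design parts) is not reproduced, and the function names and input values are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q and its degrees of freedom for study-level effects
    (e.g., log odds ratios) with their within-study variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


def i_squared(q, df):
    """I^2 in percent: share of total variability attributed to
    heterogeneity rather than chance, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def tau_squared_dl(effects, variances):
    """DerSimonian-Laird estimate of the between-study variance
    (assumed estimator; the paper does not specify one)."""
    q, df = cochran_q(effects, variances)
    weights = [1.0 / v for v in variances]
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / c)


# Hypothetical log odds ratios and variances from three studies.
EFFECTS = [0.1, 0.3, 0.5]
VARIANCES = [0.04, 0.04, 0.04]
Q, DF = cochran_q(EFFECTS, VARIANCES)
```

With these made-up inputs, Q equals its degrees of freedom, so both I² and tau squared are truncated to zero, which corresponds to the "low heterogeneity" case.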

Treatments were ranked using the P score, a frequentist equivalent of the surface under the cumulative ranking curve. A higher P score corresponds to a higher ranking for safety (i.e., lower risk of AEs) [36]. Caution should be exercised when interpreting the treatment ranking alone, because it reflects only how likely a treatment is to rank above its competitors and does not directly incorporate the size of the differences between treatments. The average probability of an event along with relative measures from the NMA should be considered together with the treatment rankings [37, 38].
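As a rough illustration of how a P score is formed, the Python sketch below follows the usual definition: for each treatment, average over all competitors the one-sided normal probability that the treatment is better. For a safety outcome, "better" means a lower log odds ratio. The inputs are hypothetical and not taken from the NMA, the function names are assumptions, and a real analysis would take the pairwise estimates and standard errors from the fitted network model.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def p_scores(log_or, se_diff, smaller_is_better=True):
    """Mean certainty that each treatment is better than each competitor.

    log_or:  treatment -> log odds ratio versus a common reference
    se_diff: frozenset({a, b}) -> standard error of the a-versus-b difference
    """
    names = list(log_or)
    scores = {}
    for i in names:
        total = 0.0
        for j in names:
            if i == j:
                continue
            # For harms, a lower log OR is better, so the sign is flipped.
            diff = log_or[j] - log_or[i] if smaller_is_better else log_or[i] - log_or[j]
            total += normal_cdf(diff / se_diff[frozenset((i, j))])
        scores[i] = total / (len(names) - 1)
    return scores


# Hypothetical three-treatment example (values are not from the paper).
LOG_OR = {"A": -0.5, "B": 0.0, "C": 0.4}
SE_DIFF = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.2,
    frozenset(("B", "C")): 0.2,
}
SCORES = p_scores(LOG_OR, SE_DIFF)
```

Because each pairwise probability and its mirror image sum to 1, the P scores always average 0.5. This is one reason a ranking should be read alongside the effect sizes rather than on its own.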

The NMA was conducted on the odds ratio (OR) scale. To calculate the average probability of an event for each treatment, the ORs were applied to the assumed probability in the control arm. The latter was obtained from a meta-analysis of the placebo arms of all studies included in the NMA, using a random-effects model based on the Freeman-Tukey (double arcsine) transformed proportion. A network was created for each of the specified clinical outcomes if the studies used similar definitions of the outcome and reported sufficient information.
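The conversion from an OR to an average event probability can be sketched as follows. Given an assumed control-arm probability p0, the treatment-arm odds are OR × p0 / (1 − p0), and the probability is odds / (1 + odds). The Python example uses hypothetical numbers and function names. It takes the pooled placebo probability as a given input instead of reproducing the Freeman-Tukey pooling step, and mapping the confidence limits of the OR through the same formula is a simplifying assumption that ignores uncertainty in p0.

```python
def probability_from_or(odds_ratio, control_probability):
    """Convert an odds ratio into an absolute event probability,
    given an assumed probability of the event in the control arm."""
    if not 0.0 < control_probability < 1.0:
        raise ValueError("control_probability must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    control_odds = control_probability / (1.0 - control_probability)
    treatment_odds = odds_ratio * control_odds
    return treatment_odds / (1.0 + treatment_odds)


def probability_with_ci(or_point, or_low, or_high, control_probability):
    """Apply the same conversion to the point estimate and both CI limits.
    Assumption: the control probability is treated as fixed."""
    return tuple(
        probability_from_or(value, control_probability)
        for value in (or_point, or_low, or_high)
    )


# Hypothetical example: OR = 2.0 (95% CI 1.0 to 4.0), control probability 10%.
POINT, LOW, HIGH = probability_with_ci(2.0, 1.0, 4.0, 0.10)
```

An OR of 1 returns the control probability unchanged, which is why the placebo probability can serve as the reference line when treatments are compared on the absolute scale.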

Results

Search results and included studies

The database search for trials with high-efficacy DMTs and other DMTs provided a number of RCTs that met the criteria for inclusion in the review and NMA (Fig. 1). Overall, 8831 possibly relevant publications were identified, of which 8706 were excluded after screening the titles and abstracts. Of the 125 articles assessed in the full-text review, 48 were excluded (Supplementary file, Table 24) and 33 trials were included in the review [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82] (77 references, 31,926 patients randomized). The characteristics of the studies are presented in Table 1 and Supplementary file, Table 25.

Fig. 1 Search flow diagram

Table 1 Methodology of trials included in network meta-analysis

All studies were multicenter randomized phase III trials (except phase II CAMMS223, phase IV REGARD, and trials with an undefined phase, i.e., INCOMIN, INFB MS, and PRISMS) with a parallel design. Most trials were double blind, except ASSESS (rater and dose blind), BEYOND (double blind only between two doses of interferon), BRAVO (double blind only for a comparison of placebo versus laquinimod but rater blind for placebo versus interferon beta-1a), CAMMS223 (single blind), CARE-MS I, CARE-MS II, and TENERE (rater blind), EVIDENCE (assessor blind), and REGARD (open label). The length of follow-up in most trials (n = 20) was about 2 years; however, 8 studies reported 1-year results (ADVANCE, ASSESS, EVIDENCE, GALA, SUNBEAM, TENERE, TOWER, TRANSFORMS) and 5 studies had a follow-up longer than 2 years (ASCLEPIOS I and II, BEYOND, CAMMS223, CombiRx). Although some studies included patients with relapsing MS, ultimately over 91% of the participants of all trials were adults with RRMS. The majority of patients were women (62–81%), and the baseline Expanded Disability Status Scale (EDSS) score ranged from 1.9 (CAMMS223, CombiRx) to 3.0 (INFB MS), which indicates mild disability. The duration of MS from the first symptoms ranged from 1.0 (CombiRx) to 10.6 (FREEDOMS) years, and the number of relapses in the last year ranged from 1.0 (BRAVO) to 3.6 (INFB MS). In BEYOND, CAMMS223, CARE-MS I, CMSSG, CombiRx, INCOMIN, INFB MS, and MSCRG trials, the enrolled patients were previously untreated with any DMTs or selected DMTs, while in AFFIRM, EVIDENCE, and REGARD, patients had not used DMTs for a few months before randomization. If patients previously treated with DMTs were included, usually a discontinuation (wash-out) period before the entry into the study was required, and patients using the study drug were excluded. 
In the identified trials, high-efficacy DMTs were compared either with placebo (AFFIRM, CLARITY, FREEDOMS, FREEDOMS II) or with active comparators such as interferon beta-1a (CAMMS223, CARE-MS I and II, OPERA I and II, RADIANCE, SUNBEAM, TRANSFORMS), teriflunomide (ASCLEPIOS I and II, OPTIMUM), and glatiramer acetate (ASSESS). The risk-of-bias assessment is presented in Fig. 2. The majority of the included trials had a low risk of bias.

Fig. 2 Risk-of-bias assessment

NMA results

Thirty-three trials were homogeneous enough to be included in the NMA (Fig. 5). Not all predefined endpoints were reported in each trial. The final number of trials for each endpoint is presented in Supplementary file, Table 26. The results of the NMA included the ranking of high-efficacy DMTs, general and detailed assessment of the safety profile, sensitivity analysis (excluding trials with 1-year follow-up), and assessment of the networks.

Ranking of high-efficacy DMTs

The P score–based ranking of high-efficacy DMTs (including the registered dosages only) is presented in Table 2 and in Supplementary file, Table 41 (all dosage regimens from clinical trials).

Table 2 P score (overall rank based on P score among high-efficacy DMTs and placebo) for assessed endpoints. Numbers in parentheses indicate the DMT’s position in the ranking

The results indicated that individual drugs under evaluation are ranked differently depending on the safety endpoint. Considering the rate of any AEs, ozanimod (1 mg) and natalizumab were the best treatment options, while alemtuzumab (12 mg) had the lowest ranking. Interestingly, in terms of treatment discontinuation due to AEs, the best option was alemtuzumab (12 mg) and the worst option was ponesimod. Ocrelizumab had the highest P score in terms of serious AEs and serious infection, while for any infections, ofatumumab was ranked as the best option.

Considering individual AEs, natalizumab had the highest ranking of the study drugs in terms of urinary tract infections, fatigue, and headache, while alemtuzumab (12 mg) ranked the lowest for urinary tract infections and headache and next to the lowest for fatigue. Ozanimod (1 mg) and ofatumumab were assessed as the safest drugs according to the P score for upper respiratory tract infections, while alemtuzumab ranked best for nasopharyngitis. The rate of nausea was not reported for natalizumab and ocrelizumab; among the remaining high-efficacy DMTs, alemtuzumab (12 mg) was ranked as the best option, next to placebo.

General safety profile

The general safety profile was assessed in terms of any AEs, SAEs, and discontinuation of the study drug due to AEs. The included trials generally applied uniform definitions of an AE and discontinuation due to AEs, which allowed us to conduct a credible NMA (Tables 3, 5, Supplementary file, Tables 30, 32). There were no significant differences between high-efficacy drugs except the following: (1) alemtuzumab (12 mg) increased the rate of any AEs as compared with all other DMTs and placebo (p < 0.05); (2) cladribine (3.5 mg) increased the rate of any AEs as compared with ozanimod (1 mg) and placebo (p < 0.05); and (3) ocrelizumab increased the rate of any AEs as compared with ofatumumab, ozanimod (1 mg), fingolimod, natalizumab, and placebo (p < 0.05) (Table 3). The average probability of an AE was the highest for alemtuzumab (12 mg) and the lowest for natalizumab (98.2% [95% CI: 95.5; 99.3] vs 82.8% [95% CI: 70.9; 90.5]) (Fig. 3). There were no significant differences between high-efficacy DMTs in terms of discontinuation due to AEs except for ponesimod versus alemtuzumab (12 mg) and placebo (Table 5). The average probability of discontinuation due to AEs was the highest for ponesimod (10.1%; 95% CI: 4.7; 20.4) and the lowest for alemtuzumab (12 mg) (3.0%; 95% CI: 1.3; 6.7). However, the differences between DMTs were generally small (Fig. 3).

Table 3 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of adverse events presented as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod
Fig. 3 The average probability of A adverse events, B serious adverse events, C discontinuation due to adverse events in relation to placebo (dotted line) for high-efficacy DMTs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PON ponesimod

The definitions of an SAE differed slightly between the trials, depending on whether they considered MS relapse an SAE or not (Supplementary file, Table 27). This may have affected the results of this NMA. Considering only trials of high-efficacy DMTs, MS relapse was considered an SAE in the ASSESS, FREEDOMS, and AFFIRM trials, unlike in the ASCLEPIOS I and II trials, which did not include it in the definition of an SAE. On the other hand, the CARE-MS I and II trials performed separate analyses for SAEs with and without MS. Considering all trials irrespective of the definition of an SAE, there were no differences between high-efficacy DMTs in the rate of SAEs, with the exception of a higher risk for cladribine (3.5 mg) versus ocrelizumab and for ofatumumab versus ocrelizumab (both p < 0.05) (Table 4, Supplementary file, Table 31). The average probability of an SAE was the highest for cladribine (3.5 mg) (17.3%; 95% CI: 11.1; 25.9) and the lowest for ocrelizumab (8.7%; 95% CI: 5.2; 14.4) (Fig. 3).

Table 4 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of serious adverse events presented as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

Selected adverse events

In a more detailed analysis of the safety profile, DMTs were assessed for individual AEs, including general infections, serious infections, upper respiratory tract infections, nasopharyngitis, and urinary tract infections as well as other commonly reported AEs such as fatigue, headache, and nausea (Table 5).

Table 5 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of discontinuation of the study drug due to adverse events presented as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

The trials differed slightly in the way they reported on infections and serious infections (Supplementary file, Table 27). In some studies, infections were classified according to the MedDRA system organ class, that is, as “infections and infestations,” while in other trials, they were reported only as “infections” or “any infection-associated event.” The differences in definitions might have impacted the obtained results. Considering all trials irrespective of the definition of infections (Table 6), there were no differences between DMTs in terms of infections, except for the following: (1) an increased rate of infections for alemtuzumab (12 mg) versus ocrelizumab; (2) an increased rate of infections for cladribine (3.5 mg) versus ofatumumab and ozanimod (1 mg); and (3) a lower rate of infections for ofatumumab versus placebo. The average probability of any infections was the highest for cladribine (3.5 mg) (62.0%; 95% CI: 55.5; 68.1) and the lowest for ocrelizumab (47.2%; 95% CI: 33.6; 61.2), with a difference between the two DMTs of nearly 15% (Fig. 4). In the case of serious infections (Table 7), a significant increase was found only for alemtuzumab (12 mg) versus ocrelizumab. The average probability of serious infections was generally low and not significantly different between individual high-efficacy DMTs, with the highest rate of 4.6% (95% CI: 1.5; 13.3) for fingolimod and the lowest rate of 0.4% for ocrelizumab (95% CI: 0.0; 4.6) (Fig. 4).

Table 6 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of infections as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod
Fig. 4 The average probability of A infections, B serious infections, C upper respiratory tract infections, D nasopharyngitis, E urinary tract infections, F fatigue, G headache, H nausea, in relation to placebo (dotted line) for high-efficacy DMTs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PON ponesimod

Table 7 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of serious infections as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

There were no significant differences among individual high-efficacy DMTs as well as between DMTs and placebo in the rate of upper respiratory tract infections (Table 8), nasopharyngitis (Table 9), and fatigue (Table 11). However, due to differences in reporting, the natalizumab trial was excluded from the NMA for upper respiratory tract infections. There were generally minor differences between the drugs with the highest and the lowest average probability (Fig. 4), with an 8.1% difference in the average probability of an event for upper respiratory tract infections (cladribine: 18.8%; 95% CI: 12.8; 26.8 vs ozanimod: 10.7%; 95% CI: 6.3; 17.5), 4.8% for nasopharyngitis (cladribine: 21.3%; 95% CI: 15.5; 28.6 vs alemtuzumab: 16.5%; 95% CI: 10.6; 24.8), and 5.3% for fatigue (ozanimod: 11.5%; 95% CI: 5.3; 23.2 vs natalizumab: 6.2%; 95% CI: 3.5; 10.6).

Table 8 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of upper respiratory tract infections as ORs with 95% CIs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod
Table 9 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of nasopharyngitis as ORs with 95% CIs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

Considering urinary tract infections (Table 10), no significant differences were found between individual high-efficacy DMTs as well as between DMTs and placebo, except for alemtuzumab (12 mg), which showed a higher rate of urinary tract infections versus ocrelizumab. The average probability of an event was the highest for alemtuzumab (12 mg) (15.1%; 95% CI: 7.2; 29.2) and the lowest for natalizumab (5.9%; 95% CI: 2.8; 12.0) (Fig. 4).

Table 10 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of urinary tract infections as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod
Table 11 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of fatigue as ORs with 95% CIs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

No significant differences between high-efficacy drugs were revealed for headache (Table 12), except the following: (1) alemtuzumab (12 mg) versus all the other high-efficacy DMTs and placebo; (2) cladribine (3.5 mg) versus natalizumab; and (3) fingolimod (0.5 mg) versus natalizumab. The average probability of an event was notably higher for alemtuzumab (12 mg) (50.3%; 95% CI: 33.5; 67.1) as compared with all the other DMTs. The difference between alemtuzumab and the drug with the lowest average probability of an event (i.e., natalizumab, 11.5%; 95% CI: 6.0; 20.9) reached almost 39% (Fig. 4).

Table 12 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of headache as ORs with 95% CIs (statistically significant results are bolded). ALE alemtuzumab, CLA cladribine, FIN fingolimod, NAT natalizumab, OCR ocrelizumab, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

The NMA showed no significant differences between high-efficacy drugs in the rate of nausea (Table 13). However, information on the rate of nausea was lacking in the natalizumab and ocrelizumab trials. The average probability of an event was the highest for ozanimod (1 mg) (26.4%; 95% CI: 7.6; 60.8). It was more than twofold higher than for alemtuzumab (12 mg) (10.1%; 95% CI: 4.7; 20.3) and about twofold higher than for other DMTs (Fig. 4).

Table 13 Results of a comparative analysis of high-efficacy DMTs (at registered dosage) in terms of nausea as ORs with 95% CIs. ALE alemtuzumab, CLA cladribine, FIN fingolimod, OFA ofatumumab, OZA ozanimod, PBO placebo, PON ponesimod

Sensitivity analysis

After excluding trials with a follow-up duration of only 1 year, the sensitivity analysis yielded results with statistical significance similar to that of the base-case analysis for AEs, serious AEs, discontinuation due to AEs, upper respiratory tract infections, nasopharyngitis, urinary tract infections, headache, and nausea for all high-efficacy DMTs versus placebo. We were unable to conduct an NMA for infections, serious infections, and fatigue, because there were not enough trials assessing these outcomes after a follow-up of at least 2 years.

Assessment of the networks

The level of heterogeneity of the effect sizes (Supplementary file, Table 42) was low in the networks assessing AEs, SAEs, infections, serious infections, and fatigue (I² = 0% for all) as well as in those assessing upper respiratory tract infections (I² = 11.9%) and nasopharyngitis (I² = 0.8%). Moderate heterogeneity was noted for the networks assessing discontinuation due to AEs (I² = 40.1%), urinary tract infections (I² = 37.1%), and nausea (I² = 35.5%), while moderate-to-substantial heterogeneity was observed for the network assessing headache (I² = 60.2%). Overall, significant heterogeneity (within designs) was found only in the network for headache (p = 0.0004).

There was evidence of inconsistency in the networks for discontinuation due to AEs and headache, which reached borderline significance (between designs; p = 0.0384 and p = 0.0517, respectively). The splitting approach revealed a disagreement between direct and indirect evidence for alemtuzumab (12 mg) versus interferon beta-1a (44 µg) and alemtuzumab (24 mg) versus interferon beta-1a (44 µg) studies in the headache network. Moreover, a disagreement was found for dimethyl fumarate versus glatiramer acetate and glatiramer acetate versus interferon beta-1b studies in the network for discontinuation due to AEs. Therefore, the results of those networks should be interpreted with caution.

The evidence for alemtuzumab (12 mg) was the main contributor to the observed heterogeneity of the networks for headache (total heterogeneity and heterogeneity within designs) and urinary tract infections (only total heterogeneity). The heterogeneity within designs for the headache network may be due to differences in the duration of alemtuzumab trials and the number of randomized patients (CAMMS223 trial: 36-month follow-up, about 110 patients per group; CARE-MS I and II trials: 2-year follow-up and a higher number of randomized patients than in CAMMS223).

Considering the network for discontinuation due to AEs, the heterogeneity was caused mainly by glatiramer acetate and dimethyl fumarate studies. Because these studies differed in design (four-arm, three-arm, or two-arm studies), the presence of numerous arms may have resulted in an inconsistency between direct and indirect evidence.

Overall, the ORs from all NMA models (direct and indirect evidence combined) were similar to those based on direct evidence (meta-analyses of head-to-head studies). No evidence of publication bias was found in any of the networks.

Discussion

In recent years, the number of approved DMTs for MS has been gradually increasing. Considering the limited availability of high-quality studies allowing direct comparisons, there is a strong need for a reliable indirect comparison of the efficacy and safety of DMTs. According to clinical guidelines, in active RRMS, the selection of a DMT should depend on disease severity and activity, patient characteristics and comorbidities, drug availability, patient preferences as to the route of administration, and safety profile [14, 15]. Most high-efficacy DMTs, such as natalizumab, fingolimod, cladribine, alemtuzumab, and ocrelizumab, are often used in the second-line setting and/or in patients with a highly active disease course [14, 15]. Most systematic reviews with NMA published so far have focused primarily on efficacy, assessing and comparing individual DMTs by their annual relapse rate and confirmed progression of disability at 3 or 6 months. Regarding safety, recently published NMAs have focused only on general safety endpoints such as the overall frequency of AEs, serious AEs, or discontinuation of therapy due to AEs [17, 18, 20]. It should be noted that previous systematic reviews differed in the analyzed DMTs, the size and follow-up duration of included studies, and the methods of analysis (frequentist or Bayesian approach) [17, 18, 20].

Our systematic review with NMA focused mainly on a comprehensive safety assessment of 8 DMTs characterized by high clinical efficacy: natalizumab, fingolimod, cladribine, alemtuzumab, ocrelizumab, ofatumumab, ozanimod, and the newest one, ponesimod. For the comprehensive assessment, large RCTs (> 70 patients in each arm) with at least 48 weeks of follow-up were considered. The differences between trial designs and characteristics of included populations were examined. By combining the direct and indirect evidence from 33 RCTs that also assessed other DMTs, we were able to make several observations that might be useful for clinicians, patients, and healthcare decision-makers.

In terms of the general safety profile, our NMA revealed that alemtuzumab (12 mg) significantly increased the rate of AEs compared with all other high-efficacy DMTs and placebo. Therefore, it ranked the lowest for this outcome based on the P-score. Furthermore, AEs were observed more often for cladribine (3.5 mg) versus ozanimod (1 mg) and placebo as well as for ocrelizumab versus ofatumumab, ozanimod (1 mg), fingolimod, natalizumab, and placebo. On the other hand, no significant differences among high-efficacy DMTs were revealed in terms of discontinuation due to AEs (except ponesimod versus alemtuzumab), with alemtuzumab ranked as the best option. This suggests that, despite the higher risk of AEs, the AEs associated with alemtuzumab were mild enough not to lead to discontinuation of therapy. The low risk of discontinuation due to AEs for alemtuzumab may also result from the frequency of administration (only for 5 consecutive days during the first year), as compared with other DMTs that require more frequent administration during the year (e.g., oral therapies with ponesimod, fingolimod, and ozanimod that require daily administration). Furthermore, the difference in the average probability of the discontinuation of the study drug due to AEs between alemtuzumab and ponesimod was relatively small (7.1%), with the lowest P-score. The obtained results are in line with a previous NMA by Liu et al. [17], in which narrower criteria were applied in terms of follow-up duration (only 24 months) and high-risk studies were excluded. That NMA revealed that among all approved DMTs (except ponesimod, which still awaited approval at that time), alemtuzumab had the best surface under the cumulative ranking curve in terms of discontinuation due to AEs [17].
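To make the ranking logic concrete, the following is a minimal sketch of how a P-score can be derived from pairwise NMA estimates in a frequentist framework: for each treatment, it averages the one-sided probabilities that the treatment is better than each competitor. The treatment names, log odds ratios, and standard errors are hypothetical illustration values, not results from this review, and the function is not the software used by the authors.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_scores(log_or, se, small_is_good=True):
    """Return a P-score per treatment.

    log_or[i][j] is the estimated log odds ratio of treatment i versus j,
    and se[i][j] is its standard error. For a harmful outcome such as AEs,
    a smaller odds ratio is better, so the sign of the z statistic is flipped.
    """
    scores = {}
    for i in log_or:
        probs = []
        for j in log_or[i]:
            z = log_or[i][j] / se[i][j]
            if small_is_good:
                z = -z
            # One-sided probability that treatment i is better than j
            probs.append(normal_cdf(z))
        scores[i] = sum(probs) / len(probs)
    return scores


# Hypothetical log odds ratios versus placebo (assumed values for illustration)
effects = {"drug_A": 0.8, "drug_B": 0.1, "placebo": 0.0}
log_or = {i: {j: effects[i] - effects[j] for j in effects if j != i} for i in effects}
# Hypothetical common standard error for every contrast
se = {i: {j: 0.3 for j in effects if j != i} for i in effects}

scores = p_scores(log_or, se)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A P-score close to 1 indicates that a treatment is very likely better than its competitors for the given outcome, while a score close to 0 indicates the opposite; the mean of all P-scores in a network is always 0.5.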

In our study, a similar rate of serious AEs (SAEs) was revealed between individual high-efficacy DMTs as well as between DMTs and placebo, with the exception of cladribine (3.5 mg) versus ocrelizumab and ofatumumab versus ocrelizumab. The differences in the average probability of SAEs among DMTs did not exceed 9%, and the incidence of SAEs was comparable between all DMTs and placebo. Although numerous studies described SAEs, they were not reported in a standardized manner regarding the inclusion or exclusion of MS relapse as an SAE. Therefore, our results for SAEs should be interpreted with caution. In a meta-analysis including studies with a 24-month follow-up and excluding MS relapse as an SAE, Giovannoni et al. [20] also found no differences in the rate of SAEs for natalizumab, fingolimod, and cladribine (3.5 mg) versus placebo. Interestingly, the sensitivity analysis conducted by Giovannoni et al. [20] revealed that the inclusion of MS relapse as an SAE had only a small impact on the obtained results. Given the differences between studies in whether relapse was considered an SAE, there is a need to standardize the definition of SAE in MS studies.

Patients with MS are generally at higher risk of infections as well as infection-related hospitalizations and mortality compared with individuals without MS. Therefore, it is important to assess whether DMTs affect the incidence of infective events [116, 117]. The rates of infections and serious infections were reported less often in the trials included in our NMA (in 17 and 18 trials, respectively) than the other AEs of interest, which were reported in ≥ 20 trials. Despite some differences in the definition of infections or infections/infestations, our results revealed that high-efficacy DMTs have a comparable OR of infections. Higher infection rates were noted only for cladribine (3.5 mg) versus ofatumumab and ozanimod (1 mg) as well as for alemtuzumab (12 mg) versus ocrelizumab. There was a notable difference of 15% in the average probability of any infections between the best option, ocrelizumab, and the worst option, cladribine. However, there were no significant differences between any of the DMTs and placebo. Considering serious infections regardless of the definition, no differences were found among high-efficacy DMTs, except for cladribine (3.5 mg), which showed higher rates in comparison with ocrelizumab. The rate of serious infections was generally low, and the average probability of these events did not exceed 4.6%. No significant differences were found between any of the high-efficacy DMTs and placebo in terms of infections and serious infections; the ponesimod trial did not report the incidence of infections, and the ozanimod (1 mg) trial did not report the incidence of serious infections.
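As an aid to reading odds ratios alongside absolute probabilities, the sketch below shows one standard way to translate an OR into an absolute event probability for a given baseline risk. This is an illustrative assumption and not necessarily how the average probabilities in this review were derived; the baseline risk and OR values used are hypothetical.

```python
def risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio into an absolute risk for a given baseline risk.

    baseline_risk is the event probability in the reference group (0 < p < 1).
    """
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = odds_ratio * baseline_odds
    return treated_odds / (1.0 + treated_odds)


# Hypothetical example: 50% of reference patients have an infection,
# and a treatment has an OR of 1.5 versus the reference.
baseline = 0.50
treated = risk_from_odds_ratio(1.5, baseline)
print(f"Absolute risk difference: {treated - baseline:.3f}")
```

The same OR corresponds to different absolute risk differences depending on the baseline risk, which is why a non-significant OR can coexist with a visible gap in absolute probabilities between the best-ranked and worst-ranked treatments.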

Our NMA revealed no significant differences in the OR for the most common types of infections (i.e., upper respiratory tract infections, nasopharyngitis, and urinary tract infections) among all high-efficacy DMTs or between DMTs and placebo. The P-scores for the assessed DMTs indicated that the best options in terms of upper respiratory tract infections, nasopharyngitis, and urinary tract infections were ozanimod (1 mg), alemtuzumab (12 mg), and natalizumab, respectively. However, there were no clear advantages of these drugs over the others in terms of the ranking and average probability of an event. It should be noted that the level of effect size heterogeneity was very low in the NMA for infections, serious infections, upper respiratory tract infections, and nasopharyngitis. For urinary tract infections, the total heterogeneity was minor (I² = 37.1%).
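For readers less familiar with the I² statistic, the sketch below shows how it is obtained from Cochran's Q and its degrees of freedom in the simple pairwise (inverse-variance) setting. In an NMA, Q and the degrees of freedom are computed for the whole network, so this is a simplified illustration; the study effects and standard errors are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q for a set of study effects with inverse-variance weights."""
    weights = [1.0 / (s ** 2) for s in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, df):
    """I² in percent: the share of total variability beyond chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log odds ratios and standard errors from four studies
log_ors = [0.10, 0.45, 0.30, 0.80]
ses = [0.20, 0.25, 0.15, 0.30]

q = cochran_q(log_ors, ses)
print(f"Q = {q:.2f}, I² = {i_squared(q, len(log_ors) - 1):.1f}%")
```

When Q does not exceed its degrees of freedom, I² is truncated at 0%, which corresponds to the "very low" heterogeneity described for several of the infection networks.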

Our results indicated that there were no significant differences among high-efficacy DMTs in the rates of fatigue and nausea. However, the incidence of nausea was not reported for the natalizumab and ocrelizumab trials, which constitutes a limitation. Although there were no significant differences in the average probability of nausea between DMTs and placebo, this probability differed about twofold between alemtuzumab (the best option) and ozanimod (1 mg). Furthermore, we observed small heterogeneity (I² = 35.5%) in the NMA for nausea, caused mainly by alemtuzumab studies.

In terms of headache, alemtuzumab (12 mg) showed a significantly higher OR compared with other high-efficacy DMTs and placebo. On the other hand, natalizumab (classified as the best option based on the P-score) showed a lower risk of headache when compared with cladribine (3.5 mg) and fingolimod (0.5 mg). All other DMTs revealed a similar rate of headache versus placebo, but a notable difference was observed in the average probability of headache between alemtuzumab and other DMTs, especially as compared with natalizumab (39%). The network for headache showed the highest heterogeneity (I² = 60.2%) of all safety outcomes. Lucchetta et al. [21] performed an NMA including trials with only 48 months of follow-up. They revealed no significant differences in the OR for headache between natalizumab, fingolimod, cladribine (3.5 mg), ocrelizumab, alemtuzumab (12 mg), and placebo. However, Lucchetta et al. [21] did not include the recent ofatumumab and ponesimod trials or the ASSESS trial for fingolimod.

As in the case of all indirect comparisons, our study has several limitations to be acknowledged. Owing to some differences between the compared studies, the results should be interpreted with caution. According to the predefined inclusion criteria, we included trials with a relatively long follow-up of at least 1 year (≥ 48 weeks). The final analysis included only 7 trials with 1-year follow-up, while the remaining trials had a follow-up duration of at least 2 years. Thus, the results may not apply to short-term treatment outcomes. It should be noted that the results of the sensitivity analysis, which excluded trials with only 1-year follow-up, were in line with the results of the base-case analysis.

Another important aspect to consider is the comparability of the included trials, especially regarding the patient population. The homogeneity of the trials in terms of methodology (i.e., randomization, parallel design) and baseline population characteristics (i.e., age, female ratio, disease duration, and activity) was considered sufficient to conduct an NMA. However, there were some differences regarding previous therapies. Few studies included only patients who were naïve to DMTs. In most studies, participants were previously treated with DMTs or immunosuppressive drugs, which suggests a more severe course of the disease. A high disease burden and a high number of previous therapies may potentially affect the safety analysis. However, most trials required a wash-out period between the previous and current therapy, which could minimize the potential effects of previous therapies on safety outcomes. It should be noted that other studies conducted NMAs (mainly concerning the clinical efficacy of DMTs) despite differences in the baseline characteristics of patients [16,17,18,19,20].

Furthermore, the assessed safety outcomes were not reported in all included trials. If no data were reported in a full-text publication, information from clinical trial registries was used. This may be a potential source of bias because these are not officially published results. Sometimes, individual but important AEs, such as serious infections, are not reported in all publications, which makes it difficult to perform comparative analyses between DMTs. Therefore, it is important that MS publications report not only the most common events for a given drug but also those important for the disease course or for comparison with other therapies.

Although the results for the majority of high-efficacy DMTs were comparable for most study endpoints, almost all DMTs assessed here have some rare but specific adverse drug reactions. For example, autoimmune diseases (mainly of the thyroid) were reported for alemtuzumab [118], while natalizumab was associated with the risk of progressive multifocal leukoencephalopathy [119, 120], which requires monitoring during therapy. Our outcomes of interest were the individual AEs most frequently reported in the summaries of product characteristics of the assessed DMTs. Therefore, we excluded adverse events of special interest for individual DMTs, as those were of low incidence, required a longer exposure to each DMT, and/or were not reported in some RCTs.

The frequencies of some rare and/or specific adverse drug reactions, which are revealed only during a long treatment period, are often obtained from observational studies or single-arm long-term extension studies. For example, the risk to the fetus is important in terms of DMT safety assessment, especially for women of childbearing potential. However, it should be mentioned that in most RCTs for DMTs, pregnancy and breastfeeding are exclusion criteria [41, 46, 47, 93,94,95, 97,98,99], so pregnancy outcomes are generally not evaluated in clinical trials. According to their summaries of product characteristics [7, 8, 27,28,29,30,31], most of the high-efficacy DMTs are contraindicated during pregnancy, or their discontinuation should be considered on the basis of a benefit/risk assessment.

The assessed DMTs also differ in the administration routes. Cladribine, fingolimod, ozanimod, and ponesimod are taken orally; alemtuzumab and ocrelizumab, intravenously; natalizumab, intravenously or subcutaneously; and ofatumumab, subcutaneously. These differences can impact patient adherence and comfort of use and should also be individually taken into account by the physician and patients before choosing the therapy.

In conclusion, clinicians choosing an appropriate therapy for their individual patients should consider the effectiveness of DMTs, their general safety profile, the incidence of the most common AEs (with a focus on specific AEs), and the route of administration. Despite limitations, this systematic review with NMA provides the most up-to-date results for the currently available high-efficacy DMTs in terms of an in-depth comparative safety analysis.

Our findings may aid clinicians and patients in choosing the best treatment option out of a wide range of available DMTs. Moreover, they may serve as guidance for healthcare policymakers in developing reimbursement policies or as a reference in planning future clinical studies.