Key Points

Alemtuzumab, natalizumab and ocrelizumab are the most effective therapies for the annualised relapse rate in the treatment of relapsing–remitting multiple sclerosis (RRMS), among disease-modifying therapies.

RRMS guidelines should consider a three-category classification: high efficacy (i.e. alemtuzumab, natalizumab, ocrelizumab), intermediate efficacy (i.e. cladribine, fingolimod, dimethyl fumarate) and low efficacy (i.e. peginterferon, glatiramer acetate, interferons and teriflunomide).

It seems that high- and intermediate-efficacy therapies could be the first-line treatment for patients with more severe RRMS conditions.

1 Introduction

According to the Multiple Sclerosis International Federation, 2.3–2.5 million people had multiple sclerosis (MS) in 2013 (2.1–140/100,000 residents) [1, 2]. MS is classified into four major phenotypes: relapsing–remitting multiple sclerosis (RRMS), primary progressive MS (PPMS), secondary progressive MS (SPMS) and clinically isolated syndrome (CIS) [3]. RRMS is the most frequent, representing 80–85% of new cases of MS [4, 5].

RRMS is characterised by symptomatic relapse at irregular intervals, interspersed with periods of remission in which there is total or partial recovery of the patient [6]. To reduce the frequency and severity of relapses, delay disease progression, decrease the number of lesions in the central nervous system and maintain or improve patients’ quality of life, RRMS treatment should comprise disease-modifying therapies (DMTs) such as alemtuzumab, cladribine, dimethyl fumarate, fingolimod, glatiramer acetate, interferons, mitoxantrone, natalizumab, ocrelizumab, peginterferon or teriflunomide. The Association of British Neurologists (ABN) [7], in guidelines updated in 2015, recommends starting treatment with a DMT of moderate efficacy (category 1) such as interferons, glatiramer, teriflunomide, dimethyl fumarate and fingolimod. Drugs of high efficacy (category 2), such as alemtuzumab and natalizumab, are recommended only for patients with high disease activity, or who do not respond to or tolerate a category 1 DMT. The ABN guideline does not include new therapies such as cladribine and ocrelizumab [7]. These therapies were also not included in previous published meta-analyses with low risk of bias [8]. Network meta-analyses are recommended by the International Society for Pharmacoeconomics and Outcomes Research to compare different treatments simultaneously [9, 10]. Thus, we aimed to conduct a network meta-analysis of randomised clinical trials (RCTs) to provide evidence-based hierarchies of the efficacy and safety of all available DMTs for patients with RRMS.

2 Methods

This systematic review was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Network Meta-Analyses (PRISMA NMA) [11] and Cochrane Collaboration recommendations [12] and is registered with the International Prospective Register of Systematic Reviews (PROSPERO), number CRD42017059120 [13].

2.1 Search Strategy and Selection Criteria

Electronic searches were conducted in the PubMed and Scopus databases without any time limit or language restriction (updated in May 2017). Trial registration databases (ClinicalTrials) and the reference lists of reviews and included studies were also searched. Complete search strategies are provided in the electronic supplementary material (ESM), p 2.

We included studies that fulfilled the following inclusion criteria: randomised, phase II or later controlled trials (including post hoc analyses) that assessed the efficacy, safety or quality of life (QoL) of a DMT as monotherapy (head-to-head or against placebo) in adults diagnosed with RRMS. The searched DMT therapies were alemtuzumab, 12 mg per day for 5 days (first course) and for 3 days (second course), with a 1-year interval between each course (ALE12 and ALE24) intravenous (IV); azathioprine (AZA) orally (PO); cladribine cumulative dose 3.5 mg, 5.25 mg per kg (CLA3.5 and CLA5.25) PO or (CLA2.45) subcutaneous (SC); daclizumab 150 mg and 300 mg every 4 weeks (DAC150Q4W and DAC300Q4W) IV; dimethyl fumarate 240 mg twice a day or three times a day (BG240BID and BG240TIW) PO; fingolimod 0.5 mg, 1.25 mg and 5 mg per day (FING0.5QD, FING1.25QD and FING5QD) PO; glatiramer acetate 20 mg, 40 mg per day and 40 mg three times a week (GA20QD, GA40QD and GA40TIW) SC; interferon β-1a 30 µg or 60 µg each week (IFNA30QW and IFNA60QW) intramuscular (IM); interferon β-1a 44 or 22 µg three times a week (IFNA44TIW and IFNA22TIW) SC; interferon β-1b 50, 250, 375 or 500 µg, every other day (IFNB50EOD, IFNB250EOD, IFNB375EOD and IFNB500EOD) SC; pegylated interferon 125 µg every 2 or 4 weeks (PIFN125Q2W and PIFN125Q4W) SC; natalizumab 300 mg every 4 weeks (NAT300Q4W) IV; ocrelizumab 600 mg and 2000 mg every 6 months (OCRE600Q6M and OCRE2000Q6M), IV; rituximab (RTX), IV and teriflunomide 7 mg and 14 mg per day (TERI7QD and TERI14QD), PO. The considered outcomes included the annualised relapse rate (ARR), disability progression confirmed at 12 weeks (DPC12), disability progression confirmed at 24 weeks (DPC24), disability improvement confirmed at 12 weeks (DIC12), disability improvement confirmed at 24 weeks (DIC24), discontinuations due to adverse events (DAE) and change in QoL evaluated through Short Form-36 items or 12 items (SF-36 or SF-12). 
Studies with a follow-up of < 12 weeks or evaluating RRMS with other forms of MS were excluded.

Two researchers independently screened the titles and abstracts of retrieved studies to identify irrelevant records. In a second stage, full-text articles were also independently evaluated by two researchers according to defined inclusion and exclusion criteria. Discrepancies were reconciled in consensus meetings, using a third researcher as a referee.

2.2 Data Analysis

The following data were independently extracted by two researchers using Microsoft© Office Excel©: (1) study baseline characteristics (authors’ names, year of publication, country, sample size, patients’ sex and age, disease duration, onset of symptoms and follow-up, evaluated DMT therapies); (2) methodological aspects (e.g. trial design); (3) clinical outcome results (efficacy, safety or QoL).

The critical evaluation of risk of bias of the included studies was conducted by two independent reviewers, using the Cochrane Collaboration revised Risk of Bias (RoB 2.0) assessment tool [14]. In the absence of consensus, points of disagreement were resolved by the opinion of a third researcher.

Statistical analyses were performed using R v. 3.4.1/RStudio 1.0.153 [15] with the packages READR [16], META [17], METAFOR [18], GeMTC [19], RJAGS [20] and CODA [21]. Transitivity analyses were performed by comparing population, interventions, control and outcome definitions among the studies included in the meta-analyses. Transitivity was assumed for minor differences in follow-up times: 48 or 52 weeks were considered to correspond to 1-year follow-up, and 96–108 weeks to 2-year follow-up. Both pairwise and network meta-analyses were performed. Effect size measures were defined for each outcome as follows: hazard ratio for the ARR outcome; relative risk for dichotomous outcomes (DPC12 and 24 weeks, DIC12, DAE); mean difference for the changes in the QoL scores (SF-36 or SF-12).
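As a minimal illustration of how a relative effect for relapse rates can be derived from relapse counts and patient-years, the Python sketch below computes a ratio of annualised relapse rates with a Wald interval on the log scale under a Poisson assumption. This is not the authors' R code: the function names and the counts are hypothetical, and treating the effect measure for ARR as a ratio of rates is an assumption made for this sketch.

```python
import math


def annualised_relapse_rate(relapses, patient_years):
    """ARR = total relapses divided by total patient-years of follow-up."""
    return relapses / patient_years


def rate_ratio_ci(relapses_a, py_a, relapses_b, py_b, z=1.96):
    """Ratio of relapse rates (arm A vs arm B) with a 95% Wald interval.

    Assumes Poisson-distributed relapse counts, so the standard error of
    the log rate ratio is sqrt(1/relapses_a + 1/relapses_b).
    """
    ratio = annualised_relapse_rate(relapses_a, py_a) / annualised_relapse_rate(relapses_b, py_b)
    se_log = math.sqrt(1.0 / relapses_a + 1.0 / relapses_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical trial: 40 relapses over 200 patient-years on the active arm
# versus 80 relapses over 200 patient-years on the comparator arm.
ratio, lower, upper = rate_ratio_ci(40, 200.0, 80, 200.0)
print(f"rate ratio {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A ratio below 1 with an interval excluding 1 would indicate fewer relapses per patient-year in arm A under these assumptions.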

Pairwise meta-analyses for the outcome ARR were assessed using the Poisson method. Dichotomous outcomes (DPC12 and 24 weeks, DIC12, DAE) were analysed using the Mantel–Haenszel method. Changes in the QoL scores (SF-36 or SF-12) were evaluated through inverse variance. The DerSimonian–Laird estimator of τ² was employed in all analyses. All effect size measures were calculated with 95% confidence intervals (CI). Data entry was performed with contrast-based data. Heterogeneity was evaluated using the Higgins inconsistency statistic (I²) [12]. Sensitivity analyses, using the Hartung–Knapp adjustment of the random-effects model and the Sidik–Jonkman estimator for τ², were carried out.
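To make the heterogeneity quantities concrete, the following Python sketch computes the DerSimonian–Laird estimate of τ², the I² statistic and the resulting random-effects pooled estimate from study-level effects and variances. It illustrates the standard formulas only and is not the authors' implementation, which used R packages; the function name and the input values are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    effects: study effect estimates on an additive scale (e.g. log relative risk)
    variances: within-study variances of those estimates
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return {"tau2": tau2, "i2": i2, "pooled": pooled, "se": se}


# Hypothetical log relative risks from three studies with equal variances.
result = dersimonian_laird([-0.5, 0.0, 0.5], [0.01, 0.01, 0.01])
print(result)
```

In this toy example the between-study variance dominates the within-study variance, so I² is close to 1; with near-identical study effects both τ² and I² fall to zero.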

Network meta-analyses, using a Bayesian framework for each outcome based on the Markov Chain Monte Carlo simulation method, were performed. Arm-level entry data were used. For the inclusion of multiple-arm studies, correlations for the likelihood between arms were considered. A common heterogeneity parameter was assumed for all comparisons [22]. We opted for a conservative analysis of non-informative priors [23]. Effect size measures were expressed with a 95% credibility interval (CrI). Both fixed- and random-effect models were tested, and the one with the lowest deviance information criterion (DIC) was selected. Convergence was attained based on visual inspection of Brooks-Gelman-Rubin plots and potential scale reduction factor (PSRF) (1 < PSRF ≤ 1.05). To increase the estimate precision of the relative effect sizes of comparisons and to properly account for correlations between multi-arm trials, ranking probabilities for each outcome were calculated via surface under the cumulative ranking analysis (SUCRA). To estimate the robustness of the network, inconsistency, defined as the difference between the pooled direct and indirect evidence for a comparison, was assessed using node-splitting analysis [24]. Sensitivity analyses with the hypothetical removal or inclusion of studies were conducted when discrepancies were identified in the network meta-analyses: (1) first scenario: original analyses; (2) second scenario: removal of studies with high risk of bias; (3) third scenario: inclusion of studies with non-approved therapies; (4) fourth scenario: removal of studies with suspicion of impairing transitivity due to important differences in patients’ characteristics (e.g. age, Expanded Disability Status Scale [EDSS], disease duration, onset of symptoms and previous DMT experience). When possible, subgroup (e.g. population) analyses were also performed.
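The SUCRA summary mentioned above can be sketched in a few lines of Python. Given, for each treatment, the posterior probability of occupying each rank (rank 1 being best), SUCRA is the average of the cumulative rank probabilities over the first a − 1 ranks, where a is the number of treatments. The sketch below assumes the rank probabilities are already available from the Bayesian model; the treatment labels and probabilities are hypothetical and do not reproduce the study's results.

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for each treatment.

    rank_probs maps a treatment name to a list p, where p[r] is the
    probability that the treatment has rank r + 1 (rank 1 = best).
    Returns values in [0, 1]; 1 means certainly best, 0 certainly worst.
    """
    scores = {}
    for name, probs in rank_probs.items():
        n_treatments = len(probs)
        cumulative = 0.0
        area = 0.0
        # Sum cumulative probabilities over ranks 1 .. a-1 (the last rank
        # always has cumulative probability 1 and is excluded).
        for p in probs[:-1]:
            cumulative += p
            area += cumulative
        scores[name] = area / (n_treatments - 1)
    return scores


# Hypothetical rank probabilities for three treatments.
example = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.6, 0.2],
    "C": [0.1, 0.2, 0.7],
}
scores = sucra(example)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

Because SUCRA averages over the whole rank distribution, it is not the same quantity as the probability of ranking first, which is worth keeping in mind when reading percentages reported from a SUCRA analysis.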

The quality of the evidence was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group [25] for ARR and DAE96, which were classified as high, moderate, low and very low. IFNA44TIW was assumed as the common comparator.

2.3 Role of Funding Source

This study was funded by the Institutional Development Support Program of the National Health System (Proadi-SUS) and Hospital Alemão Oswaldo Cruz (n. 01/2017). This funder had no role in any of the phases of the study (i.e. study design, data collection, data analysis, interpretation, writing of the report and responsibility for submission).

3 Results

Our systematic review identified 2797 records after removal of duplicates; 2596 were considered irrelevant during the screening, and 152 were excluded in the full-text appraisal (ESM pp 3–8). The remaining 49 records represent 40 RCTs included in the systematic review and 37 in the meta-analysis. The included articles were published between 1995 and 2017, with a median in 2011. Most of the studies in the systematic review were multicentre and conducted in more than one country (n = 38); together, the studies included 29,150 participants (median of 196; interquartile range: 111–417), 66% of whom were women. Eight studies included only treatment-naive participants, and one study assessed only treatment-experienced patients; 16 studies included both treatment-naive and treatment-experienced patients, and 15 articles did not report this information. Altogether, 15 approved dosages of DMT were identified, with 16 clinical trials comparing active therapies (head-to-head trials), 14 comparing different doses of DMT and 10 evaluating the active treatment against placebo. No study evaluating azathioprine or rituximab fulfilled the inclusion criteria, so none could be included in the systematic review. Most of the studies had a follow-up of 96 weeks (median 96; interquartile range: 48–96). The main characteristics of the included studies are presented in Table 1 (supplemental characteristics are presented in the ESM, p 9).

Table 1 Characteristics of the included studies in the systematic review

The methodological quality assessed by RoB 2.0 is presented in the ESM, pp 10–14. The outcomes more frequently associated with ‘low risk of bias’ were disability improvement and disability progression confirmed at 12 weeks; ‘some concerns’ appear more frequently in disability progression confirmed at 24 weeks, whereas ‘high risk’ was associated with QoL and ARR outcomes. The two domains more frequently scored as ‘high risk of bias’ were measurement of the outcome (due to the lack of masking of the assessors) and domain referring to missing outcome data.

Network diagrams of the possible comparisons for evaluated outcomes in the first scenario (original analysis) are presented in Fig. 1 (studies included in each network meta-analysis are presented in the ESM, p 15). It was not possible to build a single network for the DIC12 or QoL outcomes (SF-36 or SF-12). Baseline EDSS, disease duration or onset of symptoms were assumed to be homogeneous, because sensitivity analyses that excluded studies suspected of impairing transitivity exhibited similar results. A fixed-effects model was selected for the ARR and DPC24 network meta-analyses, whereas a random-effects model was selected for the DPC12 and DAE analyses.

Fig. 1

Network meta-analyses. Network geometry: each node represents a therapy, and the lines represent direct comparisons in the literature; thicker lines represent largest number of studies identified, and larger nodes represent the largest number of studies for a therapy; dark-coloured nodes correspond to first-line therapies and light-coloured nodes to second- or third-line therapies and placebo. a Annualised relapse rate; b disability progression confirmed at 12 weeks; c disability progression confirmed at 24 weeks; d discontinuation due to adverse events at 96 weeks. ALE12 alemtuzumab 12 mg daily for 5 days and, 12 months later, 12 mg daily for 3 days, BG240BID dimethyl fumarate 240 mg twice daily, CLA3.5 cladribine, cumulative dose 3.5 mg/kg, DAC150Q4W daclizumab 150 mg monthly, FING0.5QD fingolimod 0.5 mg daily, GA20QD glatiramer acetate 20 mg daily, GA40TIW glatiramer acetate 40 mg three times weekly, IFNA30QW interferon β-1a 30 µg weekly, IFNA44TIW interferon β-1a 44 µg three times weekly, IFNB250EOD interferon β-1b 250 µg every other day, NAT300Q4W natalizumab 300 mg monthly, OCRE600Q6M ocrelizumab 600 mg every 6 months, PIFN125Q2W peginterferon 125 µg every 2 weeks, PLA placebo, TERI7QD teriflunomide 7 mg daily, TERI14QD teriflunomide 14 mg daily

The network meta-analysis of ARR (Fig. 1a) included 32 studies (n = 38,298 patient-years). All therapies were statistically superior to placebo. By SUCRA analysis, ALE12 presented the highest probability of being the best alternative for this outcome (96% probability), followed by NAT300Q4W with 96%, and OCRE600Q6M with 85%. On the other hand, TERI7QD and IFNA30QW appeared to be the worst therapies (23% and 7% probability, respectively) (see Table 2, Fig. 2).

Table 2 Network meta-analysis results for ARR, regarding all follow-up durations (lower) and DAE for 96 weeks of follow-up (upper)
Fig. 2

Ranking plot based on analysis of surface under the cumulative ranking curve (SUCRA) values for efficacy (annualised relapse rate) and acceptability (treatment discontinuation due to adverse events) over 96 weeks. Treatments lying in the upper-right corner are more effective and acceptable than the other treatments. TERI7QD, TERI14QD, GA40TIW, PIFN125Q2W and DAC150Q4W could not be included in network meta-analysis because of discontinuation due to adverse events. ALE12 alemtuzumab 12 mg daily for 5 days and, 12 months later, 12 mg daily for 3 days, BG240BID dimethyl fumarate 240 mg twice daily, CLA3.5 cladribine, cumulative dose 3.5 mg/kg, DAC150Q4W daclizumab 150 mg monthly, FING0.5QD fingolimod 0.5 mg daily, GA20QD glatiramer acetate 20 mg daily, GA40TIW glatiramer acetate 40 mg three times weekly, IFNA30QW interferon β-1a 30 µg weekly, IFNA44TIW interferon β-1a 44 µg three times weekly, IFNB250EOD interferon β-1b 250 µg every other day, NAT300Q4W natalizumab 300 mg monthly, OCRE600Q6M ocrelizumab 600 mg every 6 months, PIFN125Q2W peginterferon 125 µg every 2 weeks, PLA placebo, TERI7QD teriflunomide 7 mg daily, TERI14QD teriflunomide 14 mg daily

The DPC12 network meta-analysis (Fig. 1b) (n = 16 studies; 13,510 patients) revealed that ALE12 and OCRE600Q6M were significantly more efficacious than other therapies (94% and 88% probability, respectively). The DMTs IFNB250EOD and GA20QD were the worst treatments for this outcome (20% and 24% probability, respectively) (see ESM, p 16).

The DPC24 analyses (Fig. 1c) (n = 16 trials; 13,410 patients) presented IFNB250EOD (93% probability) and ALE12 (76% probability) as more efficacious treatments, whereas GA40TIW (5%) and IFNA30QW (22%) were the worst options (see ESM, p. 16). Considering the possibility that the INCOMIN (Independent Comparison of Interferon) [26] study was an outlier in this analysis, after the removal of this study, NAT300Q4W became the most effective (87% probability), followed by ALE12 (82% probability) and OCRE600Q6M (77% probability) (see ESM, p. 17).

The network of DAE96 (Fig. 1d) included 17 trials with 12,221 patients. ALE12 and OCRE600Q6M were considered the best therapies for this outcome (85% and 67% probability, respectively), whereas IFNA44TIW and CLA3.5 (22% and 38%, respectively) were ranked last in the SUCRA analyses (see Table 2). It is noteworthy that a statistical difference was found for only one comparison in this outcome (ALE12 and IFNA44TIW, with relative risk [95% CrI] 0.37 [0.17–0.81]).

The pairwise meta-analysis results confirmed those obtained in the network meta-analyses (see ESM, p. 18). In the pairwise meta-analyses, changing from the original statistical method to the Hartung–Knapp method widened the confidence intervals. Consequently, the statistically significant difference was lost for most of the comparisons, except for ARR (ESM, p 19).

The node-splitting technique revealed that no substantial differences in the magnitude or direction between the results of the direct and indirect effects were identified in the network meta-analyses (ESM, p. 20). The sensitivity analyses, using different scenarios of network meta-analyses (scenarios II–IV), found no significant differences compared with the original scenario (ESM, p. 21).

Several studies reported results for patients with more aggressive forms of RRMS. However, due to the inconsistent reports of outcome results, it was not possible to perform subgroup meta-analyses (both pairwise and network) to synthesise evidence.

Table 3 presents the results of the quality of evidence assessment (GRADE) for the ARR and DAE outcomes, with therapies listed by rank order. For ARR, the most efficacious therapies compared with IFNA44TIW presented high-confidence evidence, whereas in the comparison with the least efficacious therapies, the confidence varied from low to moderate due to imprecision and presence of methodological bias. For the outcome of DAE, most comparisons presented high or moderate confidence, the latter downgraded by the presence of methodological bias. None of the comparisons in either outcome was affected by intransitivity.

Table 3 Evidence quality assessment for annualised relapse rate and discontinuation due to adverse events (GRADE)

4 Discussion

In our study, we compared the efficacy and safety of DMTs through a systematic review of 40 studies (29,150 participants), with 33 studies (26,133 participants) included in the original analysis. Previous network meta-analyses of DMTs in RRMS, published by Tramacere et al. [8], Fogarty et al. [27], Siddiqui et al. [28] and Hamidi et al. [29], included, respectively, 39, 28, 44 and 49 studies. However, our network applies stricter inclusion criteria, considering only RRMS patients and not SPMS patients [8]. We also included the three new therapies (CLA3.5, DAC150Q4W and OCRE600Q6M) not considered in Tramacere et al., Fogarty et al. or Hamidi et al. [8, 27, 29].

The ARR outcome was the most reported by the studies, followed by DAE; although DPC was reported by most of the studies, the statistical synthesis of this outcome in the present review showed limitations due to differences in its assessment: some studies reported DPC or DIC only at 12 weeks, others at 24 weeks. QoL was poorly reported, and in a few cases, it was described incompletely (without distribution statistics) and with different assessment tools, including specific ones for MS, EQ-5D and SF-12/36, which precluded a network development. QoL and other patient-reported outcomes are highly relevant in the context of RRMS, considering that the disease is a progressively incapacitating condition. Hence, as important as avoiding a relapse and reducing disease intensity, the benefits obtained with the treatment should include physical and mental improvement from the perspective of the patient. The heterogeneity in measuring and reporting the addressed outcomes may be explained by the absence of a core outcome set for RRMS in adults, which would be paramount to guide future studies, contributing, therefore, to the consistency and pertinence of new findings.

ARR was the outcome that presented the best robustness. The greater precision for this outcome is probably because (as recurrence of relapses) it is the primary endpoint in most of the studies, being used in the definition of sample size. ALE12, NAT300Q4W and OCRE600Q6M were the most effective treatments against placebo and active comparators, with over 80% probability of being the best choice (SUCRA). No significant differences in terms of efficacy were identified among them; however, differences in costs and administration schedule exist. NAT300Q4W is administered monthly, OCRE600Q6M is given every 6 months and ALE12 is given in two courses (one over 5 days in the first year and a second over 3 days in the second year). Despite the potential benefit of ALE12 administration, little evidence of long-term efficacy exists [30]. Cost–utility studies comparing DMTs identified ALE12 as more effective and less costly than treatment alternatives, followed by NAT300Q4W [29]. No pharmacoeconomic studies were found that evaluated ocrelizumab against ALE12 or NAT300Q4W, but comparison with IFNA44TIW shows OCRE600Q6M as the most efficient therapy [31]. CLA3.5 is administered orally, which may be an additional benefit compared with daclizumab (SC) and dimethyl fumarate (PO), with similar efficacy. In contrast to Siddiqui et al. [28], we also identified that CLA3.5, DAC150Q4W, BG240BID and FING0.5QD present an intermediate efficacy profile in ARR, below ALE12 and NAT300Q4W. A potential reason for this discrepancy between the meta-analyses could be the different patient inclusion criteria.

Very similar results were obtained for DPC12 and 24. It is important to note that the INCOMIN study [26] was reported as an outlier for this outcome and may have overestimated the efficacy of IFNB250EOD [32, 33]. The influence of this trial in the ARR network was not relevant because of the greater number of studies included. Sensitivity analyses had a greater impact on secondary and tertiary outcomes, but not on ARR. This suggests that outcomes with lower statistical power are more sensitive to the inclusion or exclusion of studies.

We evaluated safety considering only DAE96, for two reasons. It was not possible to build a 48-week network, but the main reason was that ALE12 is administered once a year, which could overestimate its safety in periods of < 2 years. For this outcome, whereas in individual studies and direct meta-analyses some therapies demonstrate superiority over IFNA44TIW and placebo, in our network we identified no differences among most of the comparisons. The only statistical difference found was between ALE12 and IFNA44TIW, favouring ALE12, probably because of the difference in administration schedules (once a year vs three times weekly, respectively). Despite the wide credibility intervals, SUCRA analysis suggests that ALE12 and OCRE600Q6M are the safest therapies. It is important to note that DAE is usually associated with serious adverse events, and the ranking we obtained may not correspond with the frequency of other adverse events [33]. This analysis demonstrates how considering DAE as the only relevant safety outcome could cause health professionals to make wrong decisions, especially regarding new therapies. Unlike the DAE analysis, clinical practice demonstrates that most of these therapies are associated with high rates of adverse events, and many long-term adverse events are identified after the end of treatment (e.g. acute acalculous cholecystitis, thyroid disorders, immune thrombocytopenia with alemtuzumab) [34, 35] or even after the end of the studies (e.g. cancer, severe infections and deaths with ocrelizumab or progressive multifocal leukoencephalopathy, severe liver failures and lymphoma cases with natalizumab) [36, 37]. Moreover, the case of daclizumab illustrates this concern: despite the absence of DAE at 96 weeks or of differences from interferon in reports of severe adverse events in the DECIDE (Efficacy and Safety of Daclizumab High Yield Process Versus Interferon β 1a in Patients With Relapsing–Remitting Multiple Sclerosis) and SELECT (Daclizumab High-Yield Process in Relapsing–Remitting Multiple Sclerosis) trials, the manufacturer recently announced voluntary worldwide withdrawal of marketing authorisations for Zinbryta® for RRMS due to the identification of cases of inflammatory encephalitis and meningoencephalitis [38]. Therefore, besides the need to consider the frequency of adverse events, it is mandatory to consider results of trial extensions and real-world evidence data with adequate confidence in evidence, in order to draw conclusions about the safety profile of DMTs.

We selected IFNA44TIW as the comparator for our analyses of confidence in the evidence, instead of placebo like Tramacere et al. [8]. Due to the effects of imprecision, selecting a high-efficacy comparator, such as NAT300Q4W, would downgrade the quality of evidence for each comparison, whereas selecting a low-efficacy comparator, such as IFNA30QW or placebo, would excessively upgrade the quality of evidence. High-efficacy comparators present the highest quality of evidence for efficacy, whereas intermediate- and low-efficacy comparators present moderate and low quality. Evidence obtained for safety was mainly high quality. None of the comparisons were downgraded due to heterogeneity, publication bias or intransitivity; however, the low heterogeneity may have been caused by the small number of studies in each pairwise meta-analysis. Our sensitivity analyses demonstrated no problems associated with transitivity.

As a result of our analyses, especially of efficacy evaluation, we suggest a three-category classification for RRMS instead of the two categories recommended by the ABN [7]. We identified three efficacy clusters: high efficacy (i.e. ALE12, NAT300Q4W and OCRE600Q6M), intermediate efficacy (i.e. CLA3.5, FING0.5QD and BG240BID) and low efficacy (i.e. PIFN125Q2W, GA40TIW, IFNA44TIW, GA20QD, TERI14QD and IFNB250EOD). This reclassification would have an impact on selecting first-line therapies for patients with more aggressive conditions (highly active or rapidly evolving severe); although there is not sufficient evidence, it seems that high- and intermediate-efficacy therapies could be the first-line choice for patients with more aggressive conditions, which makes a difference, especially for FING0.5QD and BG240BID, currently in category 1 in the ABN guidelines [7], but also for CLA3.5, not included in the guideline. Although ABN guidelines draw no difference between interferons and teriflunomide, our network identified TERI7QD and IFNA30QW as the worst options for ARR, which probably justifies removing them from the guideline. Considering its efficacy profile, DAC150Q4W could be proposed as intermediate efficacy; however, the manufacturer recently announced voluntary worldwide withdrawal of marketing authorisations for Zinbryta® for RRMS due to identification of cases of inflammatory encephalitis and meningoencephalitis [38].

As in any systematic search, there is a chance that studies were missed. However, the grey literature and manual searches found no additional studies, reinforcing the quality of our search. Although we applied strict selection criteria for the type of MS, we found poor reporting of raw data in subgroup analyses of primary studies, which precluded us from performing subgroup meta-analyses (e.g. age, EDSS, disease duration, disease activity). The rapid evolution of the diagnostic criteria may produce differences in efficacy assessment, and we could not perform sensitivity analyses, because most DMTs were evaluated by a single criterion. Moreover, analyses for DAE may not accurately represent safety concerns as seen in real-world settings.

5 Conclusion

High-quality evidence shows that alemtuzumab, natalizumab and ocrelizumab present the highest efficacy among DMTs; further meta-analyses are required to evaluate the frequency of adverse events to better understand the safety profile of these therapies. Based on efficacy profile, guidelines should consider a three-category classification (i.e. high, intermediate and low efficacy). Specific studies should be conducted for a more precise selection of therapies for more aggressive RRMS conditions.