Introduction

Rheumatoid arthritis (RA) is a chronic systemic inflammatory disease that causes cartilage damage, bone erosions, and eventually joint deformity. These impairments are associated with limitations in daily activities, work productivity, and quality of life. Other tissues and organs, including the heart and lungs, may also be affected, causing additional health issues. Approximately 1 % of the population worldwide is diagnosed with RA, although the proportion varies by country [15]. The diagnosis of RA increases after the fourth decade of life and is three times more likely in women than in men. Regional differences in the prevalence of RA have been described [3, 5]. In 2005, the prevalence of RA in the USA was estimated to be 0.6 % of the adult population [3], compared with 2.1 to 3.5 % in Australia and New Zealand over a similar period. In a recent systematic review, the authors reported median annual incidence rates for the total population of 16.5 (range 9 to 24) cases per 100,000 in south Europe, compared with 29 (range 24 to 36) in north European countries and 38 (range 31 to 45) in North America [5].

Regardless of geography, RA is associated with a significant burden to healthcare systems [6] and society [7]. The total costs of RA in the UK, including indirect costs and work-related disability, were estimated before 2000 at between £3.8 and £4.75 billion per year [8]. In the USA, the total cost of arthritis and other rheumatic conditions in 2003 was estimated at approximately $128 billion, equivalent to 1.2 % of the 2003 US gross domestic product [9]. More recent publications [10–14] reported annual mean total costs per patient ranging from US$4700 to US$24,920. Annual mean indirect costs in these publications had wider ranges but are difficult to compare because of differences in methodology. These estimates continue to increase because of the aging of the population and the availability of new health technologies for the management of RA, which often come at higher costs to the healthcare system.

Recent advances in the management of RA focus on biologic disease-modifying antirheumatic drugs (DMARDs). The first biologic agents to be approved for RA were the TNF antagonists etanercept, infliximab, and adalimumab. Subsequently, two other TNF antagonists were approved: certolizumab pegol and golimumab. Other biologic DMARDs, with mechanisms of action different from those of the TNF antagonists, have also been approved for the management of RA: rituximab, a genetically engineered chimeric anti-CD20 monoclonal antibody that depletes B cells; abatacept, a soluble human fusion protein that selectively modulates T-cell costimulation; anakinra, an interleukin (IL)-1 antagonist; and tocilizumab (TCZ), a humanized monoclonal antibody against the IL-6 receptor (IL-6R). Despite the number of available therapies, there continues to be an unmet need for newer agents for biologic-naive and TNF-refractory patients.

Current evidence-based guidelines recommend that patients with moderate to severe RA fail conventional DMARDs (cDMARDs) before starting treatment with biologics, given as monotherapy or in combination with cDMARDs, to improve outcomes such as radiological damage, symptom control, function, and health-related quality of life (HRQoL) [15, 16]. In 2012, a comprehensive review from the Agency for Healthcare Research and Quality (AHRQ) in the USA compared the therapeutic effects of the different drugs for RA [17], but it was not possible to draw specific conclusions about which therapy is best for patients with RA. Considering the multiple alternatives, the different mechanisms of action among biologic DMARDs, and the scarcity of head-to-head data, it is important to assess the comparative effectiveness of biologics in RA to inform decision makers, physicians, and patients about the best alternative for each patient. To address this need, we conducted a systematic review and a network meta-analysis (NMA) of treatment strategies that incorporate biologic DMARDs and cDMARDs.

Materials and methods

Methods of study selection, quality assessment, and appraisal

A systematic review was performed to identify randomized controlled trials (RCTs) of approved biologics. In general, the review followed the Centre for Reviews and Dissemination guidance for undertaking reviews in healthcare [18] and the Cochrane Collaboration Handbook [20]. A range of databases was searched up to January 2014, together with trial registries and the reference lists of identified research and review articles. All relevant studies, regardless of language or publication status (published, unpublished, in press, or in progress), were considered eligible for this systematic review. Objectively derived search filters for randomized controlled trials were used. No date or language limits were applied; only studies in humans were sought, across multiple databases and using specific keywords (Appendix). Titles and abstracts were screened by two reviewers independently. The full texts of potentially relevant studies were obtained and assessed for inclusion by one reviewer, with a 10 % random sample checked by a second reviewer. Disagreements were resolved through discussion or, where necessary, referral to a third reviewer. Quality assessment was carried out independently by two reviewers using the Cochrane Collaboration quality assessment checklist [19, 20], and any disagreements were resolved by consensus. Data extraction sheets were designed and piloted using Microsoft Excel 2007. Data extraction was carried out by one reviewer and then checked by a second reviewer, with any disagreements resolved by consensus. A flow diagram of the numbers of studies included and excluded at each stage was prepared following the guidance in the PRISMA statement (Fig. 1).

Fig. 1 Summary of study flow

Quantitative analysis and meta-analysis methods

In the few cases where “head-to-head” comparisons of biologic treatments were available, the quantitative analysis was conducted in line with the Cochrane Handbook [20]. For the majority of trials, which were not head-to-head, a mixed treatment comparison meta-analysis was conducted using Bayesian methods [21]. The comparison treatment was extracted based on the definition provided by the authors of each RCT. Most of these studies used specific comparators (e.g., placebo or MTX) or left the choice to the physician, with a broader set of treatments in the control arms (e.g., DMARD or standard care). The main outcomes considered were ACR 20, ACR 50, ACR 70, and EULAR response (moderate or good). Data were collected for outcomes reported at 26 weeks (±2 weeks) and 52 weeks (±2 weeks). Dichotomous outcomes (e.g., number of patients responding according to ACR 20 criteria) were reported as odds ratios (ORs) with 95 % confidence intervals (CIs). Pooled effect sizes and 95 % CIs were calculated for the direct head-to-head comparisons using fixed or random effects models, as appropriate, where trials were considered to be clinically and statistically homogeneous. Analyses used Cochrane Review Manager version 5.1 (RevMan 5.1). Publication bias was assessed where there were sufficient numbers of trials.
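As an illustration of how a dichotomous outcome such as ACR 20 response is turned into an odds ratio with a 95 % confidence interval, the following minimal sketch computes a Wald interval on the log-odds scale for a single two-arm trial. It is not the RevMan implementation; the function name, the continuity correction, and the responder counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Return (OR, lower, upper) using a Wald interval on the log-odds scale."""
    a = events_trt            # responders, treatment arm
    b = n_trt - events_trt    # non-responders, treatment arm
    c = events_ctl            # responders, control arm
    d = n_ctl - events_ctl    # non-responders, control arm
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when any cell is empty,
        # a common convention to keep the log odds ratio finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical trial: 60/100 ACR 20 responders on treatment vs 30/100 on control.
or_est, ci_low, ci_high = odds_ratio_ci(60, 100, 30, 100)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 corresponds to a statistically significant difference at the 5 % level under this approximation.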

The most recently approved biologic agent for the management of RA, TCZ, was used as the reference biologic treatment for the analysis. In the absence of trials directly comparing the biologics, indirect treatment comparisons and NMA were performed. Comparisons were performed at three different levels of aggregation: (1) all doses for bDMARDs and cDMARDs were considered as separate interventions, and results were presented for the recommended doses of each DMARD (Appendix 3, Table 2); for cDMARDs, only MTX at the recommended dose was included in this analysis; (2) all doses for cDMARDs were combined and treated as the same intervention; and (3) all doses were combined for all interventions. Doses below the recommended dose for each of the biologic DMARDs were excluded.
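The idea behind an indirect treatment comparison can be sketched with the simple adjusted (Bucher-type) calculation below, in which two treatments A and B are compared through a common comparator C. This is a frequentist simplification for illustration only, not the Bayesian NMA used in this review; the function name and the input estimates are hypothetical.

```python
import math


def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc, z=1.96):
    """Indirect OR for A vs B from A-vs-C and B-vs-C log odds ratios."""
    # The common comparator C cancels on the log scale.
    log_or_ab = log_or_ac - log_or_bc
    # Variances add because the two estimates come from independent trials.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return (
        math.exp(log_or_ab),
        math.exp(log_or_ab - z * se_ab),
        math.exp(log_or_ab + z * se_ab),
    )


# Hypothetical inputs: A vs C has OR 4.0 (SE 0.3), B vs C has OR 2.0 (SE 0.4).
or_ab, low_ab, high_ab = bucher_indirect(math.log(4.0), 0.3, math.log(2.0), 0.4)
print(f"Indirect OR A vs B = {or_ab:.2f} (95% CI {low_ab:.2f}, {high_ab:.2f})")
```

The indirect estimate is less precise than either direct estimate, which is one reason wide intervals are common when biologics are compared only through shared control arms.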

If it was possible to form a connected network of trial evidence (where all treatments could be linked directly or indirectly), a Bayesian NMA was performed using WinBUGS version 1.4 (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml). Vague priors were used for the trial baselines and treatment differences [normal (0, 0.0001)] and for the random effects standard deviation [uniform (0, 2)]. A burn-in period of 10,000 simulations was used, followed by a further 20,000 simulations, which were used to obtain parameter estimates. Model fit was investigated using the deviance information criterion (DIC) and residual deviance and, depending on the amount of available evidence, the fit of fixed and random effects models was compared. The results from the most appropriate model are presented as ORs with 95 % credible intervals (CrIs).
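For orientation, a random effects NMA for binomial outcomes of the kind described above is commonly written as follows. This is a generic sketch, not the authors' WinBUGS code: the symbols are introduced here for illustration, the second argument of the normal prior is assumed to be a precision (the WinBUGS convention), so that 0.0001 corresponds to a variance of 10^4, and the correlation adjustment for multi-arm trials is omitted.

```latex
\begin{align*}
r_{ik} &\sim \mathrm{Binomial}(p_{ik},\, n_{ik})
  && \text{responders in arm } k \text{ of trial } i \\
\operatorname{logit}(p_{ik}) &= \mu_i + \delta_{ik}, \qquad \delta_{i1} = 0
  && \text{trial baseline plus arm-specific effect} \\
\delta_{ik} &\sim \mathcal{N}\!\left(d_{t_{ik}} - d_{t_{i1}},\ \sigma^{2}\right), \quad k \ge 2
  && \text{random effects around the network contrast} \\
\mu_i,\ d_t &\sim \mathcal{N}\!\left(0,\ 10^{4}\right), \qquad d_{\mathrm{ref}} = 0
  && \text{vague priors (variance scale)} \\
\sigma &\sim \mathrm{Uniform}(0,\ 2)
  && \text{between-trial standard deviation} \\
\mathrm{OR}_{a \text{ vs } b} &= \exp\!\left(d_a - d_b\right)
  && \text{reported with a 95 \% CrI}
\end{align*}
```

Here $t_{ik}$ denotes the treatment given in arm $k$ of trial $i$. The fixed effect model is the special case in which $\delta_{ik} = d_{t_{ik}} - d_{t_{i1}}$ exactly, and the two models can then be compared on DIC and residual deviance.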

The assumptions of homogeneity, similarity, and consistency as described by Song et al. [22] were assessed, as without these the NMA results may not be valid. Clinical homogeneity was assessed by reviewing the baseline characteristics of the trial populations, including baseline severity of RA, age, concomitant treatments, and comorbidities. Since most biologics were not directly compared, consistency could not be assessed with the available evidence.

Results

Systematic review

A total of 158 manuscripts were identified and included through the specific search strategy (Appendix 1, search strategy and results). In addition, 49 papers were retrieved through extended manual searches. In total, 207 publications were included for the assessment of RCTs of clinical effectiveness of biologics in RA. These 207 papers described 68 individual trials in total (Appendix 2, references of studies included, and Appendix 3, Table 1).

All trials included adult patients with moderate to severe active RA who had an inadequate response to DMARDs. Most of these studies were between 24 and 26 weeks in duration (n = 40), and the largest group recruited patients worldwide (n = 32). Twelve trials were performed only in North America, 11 only in Japan, and 3 only in Korea. Five trials were performed only in Europe, one only in China, and two in South and North America. For two trials, the location was not reported. Eighteen trials were 52 weeks in duration, and eight trials followed patients for 104 weeks, but most presented data at 24 weeks. The definition of active RA differed considerably between trials. The use of rescue medication was allowed in 24 trials and explicitly not allowed in five trials; it was not reported in the remaining 39 trials. In most trials, patients had failed on previous cDMARDs (60 trials), but in five trials, patients had failed on previous TNF treatment, and in three trials, patients had failed on both (Appendix 3, Table 1).

Overall, the methodological quality of the 68 trials was inconsistent. Despite our efforts, a number of the quality assessment criteria could not be assessed and were described as “unclear” because of inadequate reporting of methods in the trial publications. Poor reporting made it particularly difficult to assess the randomization method (60/68 trials were “unclear”) and allocation concealment (49/68 trials were “unclear”). Given the difficulties in assessing the methodological quality of a number of the trials, the reliability of the data is uncertain. Despite this uncertainty, 38/68 trials had a low risk of bias for selective outcome reporting. At the level of individual trials, methodological quality varied, and some trials had particular methodological problems that may affect the robustness of their results. Of note, only 8/68 trials described the randomization method and 17/68 trials described allocation concealment adequately. In 20 trials, patients, care providers, and outcome assessors were clearly described as blinded, but only 11 trials used a true intention-to-treat (ITT) analysis. Assessor blinding or independent verification of outcomes is particularly important for subjective outcomes such as treatment response (ACR and EULAR). Prior knowledge of the treatment being received may lead, intentionally or unintentionally, to biased assessment of outcomes in favor of the study drug of interest.

Network meta-analysis

The 68 trials (Appendix, Table 1) were used to form evidence networks for the analysis of outcomes. A total of 24 evidence networks were created and included in the analyses (four main outcomes: ACR 20, ACR 50, ACR 70, and EULAR; two time points: 26 and 52 weeks; and three levels for pooling different doses of interventions).

The results for the main effectiveness outcomes at 26 weeks from the NMA using separate doses for all bDMARDs and only MTX at the regular dose for cDMARDs (level 1) are presented in Table 1 for all comparisons with TCZ alone, and in Table 2 for all comparisons with TCZ in combination with MTX. These results showed that TCZ alone was superior to standard care/placebo in achieving ACR 20, ACR 50, ACR 70, and EULAR response at 26 weeks of follow-up (OR 13.27, 95 % CrI (3.958, 43.98); 17.45 (10.18, 31.24); 37.77 (7.226, 216.3); and 10.42 (1.963, 54.8), respectively). TCZ alone was also significantly better than MTX alone for achieving ACR 50, ACR 70, and EULAR response at 26 weeks of follow-up (OR 5.44, 95 % CrI (4.142, 7.238); 7.364 (1.4, 30.83); and 4.226 (1.184, 15.58), respectively). All of these results were estimated using random effects models, as these provided a better fit to the data than the fixed effect models (lower DIC values). The level 1 analysis of 52-week results contained fewer trials, so fixed effect results were presented for ACR 20 and ACR 50 at 52 weeks, as these models provided a better fit to the data.

Table 1 Primary endpoint comparisons with tocilizumab alone (ACR odds ratio vs TCZ-IV8, median [95 % CrI]) at 26 and 52 weeks
Table 2 Primary endpoint comparisons with tocilizumab + MTX (ACR odds ratio vs TCZ-IV8 + MTX, median (95 % CrI)) at 26 and 52 weeks

In the level 1 analysis, the combination of TCZ and MTX was significantly better than both standard care/placebo and MTX alone for ACR 20, ACR 50, ACR 70, and EULAR response at 26 weeks of follow-up. Versus standard care/placebo, the ORs were 18.63 (95 % CrI 5.32, 66.81), 24.27 (95 % CrI 14.5, 41.91), 46.13 (95 % CrI 10.08, 277), and 14.23 (95 % CrI 2.493, 84.02), respectively; versus MTX alone, they were 4.169 (95 % CrI 2.267, 7.871), 5.44 (95 % CrI 4.142, 7.238), 8.731 (95 % CrI 4.203, 19.29), and 7.306 (95 % CrI 4.393, 13.04), respectively. At 52 weeks, TCZ + MTX was significantly better than MTX alone for ACR 20 and ACR 50 response. However, no comparison with standard care/placebo was possible at 52 weeks because of a lack of data. TCZ + MTX was significantly better than etanercept alone for ACR 20, ACR 50, and ACR 70 responses at 26 weeks and for ACR 20 and ACR 50 responses at 52 weeks. Compared with adalimumab monotherapy, TCZ + MTX was significantly better for ACR 20 and ACR 50 response at 26 weeks. TCZ + MTX was also significantly better in ACR responses for the following comparisons: abatacept + MTX for ACR 50 at 26 weeks; adalimumab + MTX for ACR 50 at 52 weeks; etanercept + MTX for ACR 50 at 26 weeks and ACR 20 at 52 weeks; infliximab + MTX for ACR 50 at 26 weeks; and TCZ alone for ACR 50 at 26 weeks. Only one comparison favored the comparator: CZP-200 + MTX for ACR 20 at 52 weeks of follow-up (OR 0.489, 95 % CrI (0.279, 0.83)). Comparisons with abatacept, adalimumab alone, certolizumab alone, golimumab alone or in combination, and infliximab alone could not be assessed because of a lack of data.
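An OR below 1 with an interval that excludes 1, as in the CZP-200 + MTX comparison above, can be easier to read when re-expressed from the comparator's point of view. The sketch below simply takes reciprocals of the reported estimate and interval bounds; the function name is hypothetical, and the calculation is a re-expression of the published figures rather than a new analysis.

```python
def invert_odds_ratio(or_est, lower, upper):
    """Re-express an OR (A vs B) and its interval as B vs A."""
    # Taking reciprocals swaps the roles of the two interval bounds.
    return 1.0 / or_est, 1.0 / upper, 1.0 / lower


# Reported OR for ACR 20 at 52 weeks: 0.489 (95 % CrI 0.279, 0.83).
inv_or, inv_low, inv_high = invert_odds_ratio(0.489, 0.279, 0.83)
print(f"Inverted OR = {inv_or:.3f} (95% CrI {inv_low:.3f}, {inv_high:.3f})")

# The interval excludes 1 in either direction, so the conclusion is unchanged.
favors_comparator = inv_low > 1.0
```

Expressed this way, the odds of an ACR 20 response at 52 weeks are roughly twice as high in the arm that the original OR disfavors.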

For EULAR responses, TCZ + MTX was significantly better than adalimumab + MTX at 26 weeks and infliximab + MTX or abatacept + MTX at 52 weeks.

The level 2 NMA, which used separate doses for all bDMARDs and all doses for cDMARDs (Tables 3 and 4), and the level 3 NMA, which used combined doses for all bDMARDs and cDMARDs (Tables 5 and 6), gave slightly different results because of the inclusion of additional trials with different doses of cDMARDs and bDMARDs (licensed doses and higher doses). Overall, however, the results were consistent. In the level 3 analyses, TCZ alone and TCZ + DMARD were significantly better than DMARD alone and standard care for all outcomes assessed, with one exception: the EULAR response at 26 weeks was no longer significantly in favor of TCZ alone or TCZ + DMARD when compared with standard care/placebo. Where there were significant differences in comparison with other bDMARDs, they were in favor of TCZ alone and TCZ + DMARD, although their magnitudes differed from those described above.

Table 3 Primary endpoint comparisons with tocilizumab alone (ACR odds ratio vs TCZ-IV8, median [95 % CrI]) at 26 and 52 weeks
Table 4 Primary endpoint comparisons with tocilizumab + DMARD (ACR odds ratio vs TCZ-IV8 + DMARD, median [95 % CrI]) at 26 and 52 weeks
Table 5 Primary endpoint comparisons with tocilizumab alone (ACR odds ratio vs TCZ, median [95 % CrI]) at 26 and 52 weeks
Table 6 Primary endpoint comparisons with tocilizumab + DMARD (ACR odds ratio vs TCZ + DMARD, median [95 % CrI]) at 26 and 52 weeks

Discussion

In this systematic review, 68 trials were included to assess the clinical effectiveness of biologic DMARDs in RA. Overall, the methodological quality of the included trials was inconsistent. There was considerable clinical heterogeneity within the systematic review, particularly with regard to the patient populations and dosing schedules used. Differences between trial populations in clinical and demographic characteristics, in previous exposure to and failure of earlier treatments (i.e., DMARD IR or TNF IR), and in the definition of standard care as comparator may influence the analysis and the results of the NMA. Some of these characteristics were measured and reported in some, but not all, of the trials. These are important limitations: the impact of those differences cannot be fully quantified in the current analysis and must be considered when putting these results in context. In addition, as with most NMAs, new evidence will require additional analyses to incorporate that information and provide updated estimates of the effectiveness of biologic treatment in RA. Safety endpoints were not included in this analysis, in part because of greater heterogeneity across trials in the definition and reporting of adverse events. Therefore, the results of these analyses should be interpreted in the context of those limitations. Another limitation of this study is the use of TCZ, the only IL-6 receptor inhibitor currently available, as the reference for comparison because it was the most recently approved biologic in this indication. During the study period, TCZ was the only IL-6 receptor inhibitor with published RCTs, which limits the generalizability of the study findings to the IL-6 class. As new biologics and other new therapies become available, including those studies may improve the generalizability of the results.

The NMA was done at three levels, allowing only recommended doses for all drugs at level 1, combining all cDMARDs in level 2, and combining all cDMARDs and doses for bDMARDs (except subrecommended doses) in level 3. Overall, the results of these analyses were similar for all three levels, providing consistency to the results. Results of the network analyses showed that TCZ alone and TCZ + MTX are superior to DMARDs alone and standard care on most outcomes assessed, where enough information was available to perform the analysis. Few significant differences were found between TCZ alone and any of the biologicals alone or in combination with MTX. Only one comparison favored the comparator. Certolizumab + MTX showed significantly better response for ACR 20 at 52 weeks of follow-up compared to TCZ alone in the level 1 analysis. This difference was not found in any of the other analyses performed.

Other NMAs for biologics in RA have been published, focusing on different evidence networks and endpoints. Recently, Tvete et al. [23] conducted a multiple treatment comparison analysis using ACR 50 as the dependent variable and dose level and disease duration as the independent variables to assess the relative effect of nine biologics (adalimumab, certolizumab, etanercept, golimumab, infliximab, anakinra, abatacept, rituximab, and tocilizumab) compared with placebo or DMARD. In contrast to our analysis, Tvete et al. aggregated ACR 50 response across all periods (12 to 54 weeks) and did not consider other response variables (ACR 20, ACR 70, EULAR response). Their analysis, based on 54 publications, included all treatment and comparator arms across all publications. The authors found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs. As in our analysis, differences in patient characteristics between trials were not fully incorporated, and the authors concluded that all biologic agents were more effective than placebo.

Conclusions

The systematic review of RCTs of biologics in adults with RA who have failed treatment with conventional disease-modifying antirheumatic drugs (cDMARDs) showed inconsistencies and varying degrees of methodological quality. Network meta-analysis allows information from different clinical trials to be combined and comparisons to be made within the context of the available evidence. The resulting assessment of clinical effectiveness, using the network meta-analyses, showed that TCZ alone and TCZ + MTX were superior to cDMARDs alone and standard care on all outcomes assessed and as effective as other biologics in RA. Few significant differences were found between TCZ alone and any of the biologicals alone or in combination with MTX. As new evidence and agents become available, additional analyses must be performed to improve the quality of these comparisons.