Introduction

Patient-reported outcomes (PROs) have attracted increasing interest in the assessment of chronic diseases, and rheumatic diseases such as axial spondyloarthritis (axSpA) are no exception. In this context, several PROs are available, covering multiple dimensions: (i) disease activity, e.g., the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) [1]; (ii) function, e.g., the Bath Ankylosing Spondylitis Functional Index (BASFI) [2]; (iii) quality of life, e.g., the Ankylosing Spondylitis Quality of Life (ASQoL) questionnaire [3] or the EuroQoL 5 dimensions (EQ-5D) instrument [4]; (iv) work-related outcomes, e.g., the Work Productivity and Activity Impairment (WPAI) [5]; (v) health status, e.g., the Short Form 36 (SF36) health survey questionnaire [6] or the ASAS Health Index (ASAS-HI) [7]; and (vi) fatigue, e.g., the Functional Assessment of Chronic Illness Therapy–Fatigue (FACIT-F) [8]. In some cases, such as the Ankylosing Spondylitis Disease Activity Score (ASDAS), PROs may be combined with objective measures, such as inflammatory markers [9].

For many decades, the treatment of spondyloarthritis (SpA) has also been a great challenge. The therapeutic options were centered on nonsteroidal anti-inflammatory drugs (NSAIDs) [10] and/or physical interventions, given the little or no effect of conventional synthetic disease-modifying anti-rheumatic drugs (csDMARDs) or steroids in this context [11]. However, many patients fail to respond to NSAIDs or experience serious adverse events with them. In the last decade, the introduction of biological disease-modifying anti-rheumatic drugs (bDMARDs) has opened new possibilities for treating articular and extra-articular manifestations [12]. Currently, two groups of therapies with different mechanisms of action are available and approved for axSpA: tumor necrosis factor inhibitors (TNFi) and interleukin-17A inhibitors (IL17Ai) [13].

Classically, most trials of bDMARDs have focused on disease activity and disease progression [14]. PROs were usually considered secondary outcomes, with the exception of BASDAI, a PRO that is currently used as an outcome for disease activity. Lately, the philosophy of bringing patients to the center of the decision-making process [15] has increased the need to look further and assess other dimensions, motivating the introduction of the PRO concept [16]. This new approach raises another important question: how best to report these results. The most common way is to present the statistical significance of the difference between values recorded at two or more timepoints. However, it is important to assess whether a statistically significant difference conveys a clinically relevant difference. Jaeschke et al. proposed the concept of the minimal clinically important difference (MCID) [17], addressing the limitations of examining statistical significance by itself, especially since studies may find statistical relationships that have no clinical importance [18]. In recent years, the MCID has gained acceptance, and well-described MCID cutoffs are now available for many PROs regularly assessed in the context of axSpA [19] (Table 1).

Table 1 Different patient-reported outcomes (PROs) in axial spondyloarthritis and their respective minimal clinically important differences (adapted from Deodhar et al. 2016 [19])

Considering the concepts discussed above, a systematic literature review (SLR) was performed to assess the efficacy of bDMARDs on PROs in randomized controlled trials (RCTs) of axSpA, across different relevant dimensions.

Methods

Literature search

A literature search was performed according to the population (P), intervention (I), comparator (C), and outcomes (O) (PICO) format. The “P” was defined as adult (≥ 18 years) patients with axial spondyloarthritis (axSpA), either radiographic axSpA (r-axSpA), also known as ankylosing spondylitis (AS), or non-radiographic axSpA (nr-axSpA). Studies including patients with other diagnoses were eligible if the results for axSpA were presented separately. The “I” was defined as any biological DMARD (bDMARD), regardless of formulation or treatment duration. The “C” was defined as placebo (PBO), the same drug (at a different dose or regimen), or any different drug. For “O,” the following PROs were considered for analysis: disease activity (BASDAI); function (BASFI); quality of life (ASQoL and EQ-5D); health survey (SF36-PCS and SF36-MCS); and fatigue (FACIT-F).

Only RCTs were considered for inclusion. Observational studies, studies that did not include PROs as primary or secondary endpoints, and studies that did not report PROs quantitatively were excluded. The MEDLINE database was searched on June 1, 2018, with the filters “published in the last 10 years,” “Humans,” and “English.”

Data analyses

Trials were divided according to the target condition: only r-axSpA patients, only nr-axSpA patients, or the whole axSpA spectrum (r-axSpA and nr-axSpA). All co-medications allowed were compared across trials. The efficacy of bDMARDs on PROs was evaluated through the MCID concept (as defined in each individual publication) or by the statistically significant differences between baseline and a later time-point value, always comparing the intervention arm (i.e., bDMARD) to the comparator arm (e.g., PBO).
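The two ways of evaluating efficacy described above can be illustrated with a minimal sketch. It assumes a PRO for which lower scores are better and per-patient data are available. The scores, the function names, and the threshold `HYPOTHETICAL_MCID` are invented for illustration and are not taken from any trial or from Table 1.

```python
from statistics import mean


def change_scores(baseline, follow_up):
    """Per-patient change (follow-up minus baseline); negative means improvement."""
    return [f - b for b, f in zip(baseline, follow_up)]


def mcid_responder_rate(baseline, follow_up, mcid):
    """Fraction of patients whose score decreased by at least `mcid` units."""
    changes = change_scores(baseline, follow_up)
    responders = sum(1 for c in changes if -c >= mcid)
    return responders / len(changes)


# Toy values, invented for illustration only (not trial data).
treated_base = [6.0, 5.5, 7.0, 6.5]
treated_week12 = [3.0, 5.0, 4.5, 6.0]
placebo_base = [6.0, 6.5, 5.5, 7.0]
placebo_week12 = [5.5, 6.5, 4.0, 6.8]

# Hypothetical threshold; real MCID cutoffs are PRO-specific.
HYPOTHETICAL_MCID = 1.0

# MCID-based view: share of patients reaching the threshold in each arm.
treated_rate = mcid_responder_rate(treated_base, treated_week12, HYPOTHETICAL_MCID)
placebo_rate = mcid_responder_rate(placebo_base, placebo_week12, HYPOTHETICAL_MCID)

# Mean-difference view: average change in each arm, compared between arms.
treated_mean_change = mean(change_scores(treated_base, treated_week12))
placebo_mean_change = mean(change_scores(placebo_base, placebo_week12))

print(f"MCID responders: treated {treated_rate:.0%}, placebo {placebo_rate:.0%}")
print(f"Mean change: treated {treated_mean_change:.2f}, placebo {placebo_mean_change:.2f}")
```

The responder rate describes how many individual patients reached a meaningful change, whereas the mean change summarizes the arm as a whole. The two views can diverge when a few patients improve substantially and most do not.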

Assessment of bias

Assessment of bias was performed using the latest version of RoB 2 [23].

Results

The PICO search identified eighty-four papers. After reading all abstracts and manually screening an extra nine papers, twenty-four publications fulfilling the inclusion/exclusion criteria (14 r-axSpA, 6 nr-axSpA, and 4 both r-axSpA and nr-axSpA) were identified (Fig. 1). All of them assessed a TNFi (adalimumab (ADA), etanercept (ETN), infliximab (IFX), golimumab (GOL), or certolizumab pegol (CZP)) or the IL17Ai secukinumab (SEC).

Fig. 1 Flowchart of the SLR approach

Most of the RCTs lasted for 12, 16, or 24 weeks and were almost universally followed by an open-label phase (Supplementary Table 1).

For r-axSpA, most of the studies had relatively homogeneous inclusion and exclusion criteria [19, 24, 25, 26, 27, 28, 29, 30, 31]. In general, the inclusion criteria required fulfilment of the modified New York criteria (mNYc) for ankylosing spondylitis (AS) [32], failure of or intolerance to at least 1 or 2 NSAIDs (usually after a total of 4 weeks or 30 days), and high disease activity, defined as a BASDAI ≥ 4. In 4 trials, patients were required to be TNFi naïve [24, 25, 26, 30, 31], while in others, previous TNFi exposure was allowed [19, 27, 28], and one did not state any information regarding previous bDMARDs [29]. Regarding co-medication, all trials allowed concomitant csDMARDs, NSAIDs, or steroids at a stable dose (with the maximum dosage defined in each paper).

For nr-axSpA, all studies had similar inclusion/exclusion criteria [33, 34, 35, 36]. Inclusion required fulfilment of the ASAS axSpA criteria (with exclusion of patients meeting the radiographic mNYc), an inadequate response to at least 1 or 2 NSAIDs over at least 4 weeks, and high disease activity, defined as BASDAI ≥ 4. Two of them limited patient inclusion to a symptom duration of < 5 years [33, 36]. Regarding previous TNFi, only one study explicitly allowed them [35]. Two studies did not allow concomitant use of csDMARDs [33, 35], while the remaining studies allowed it. NSAIDs and steroids were allowed at a stable dose, even though one study did not provide information regarding steroids [33].

For studies with both subtypes of axSpA (r-axSpA and nr-axSpA), the inclusion criteria were defined according to the ASAS criteria for axSpA (regardless of mNYc) [37, 38], but one study had more idiosyncratic criteria [39]: inflammatory back pain (IBP) according to the Calin criteria plus HLA-B27 positivity plus sacroiliitis on magnetic resonance imaging (regardless of mNYc). Regarding previous bDMARD exposure, one study allowed it under specific conditions [37], another did not allow any previous exposure [38], and in a third this information was not stated explicitly [39]. In terms of co-medication, csDMARDs were allowed in one study [37], forbidden in another [39], and a third provided no information [38]. One study did not allow concomitant steroids [39], one allowed concomitant steroids at a stable dose [38], and another provided no explicit information [37]. All studies allowed stable doses of NSAIDs.

Across all publications, the studies that permitted concomitant csDMARDs allowed hydroxychloroquine (HCQ), sulfasalazine (SSZ), or methotrexate (MTX) at a stable dose. The studies that accepted stable doses of steroids usually excluded patients taking > 10 mg/day of prednisone or equivalent. In only one case were patients taking > 7.5 mg/day of prednisone or equivalent excluded [35].

Only 5 RCTs reported and compared MCID achievement between treatment arms [19, 31, 37, 40, 41] and 4 of these provided numeric values [31, 37, 40, 41] (Table 2). Most of the RCTs reported the mean difference of a given PRO between baseline and a later timepoint (as absolute values or percentage of variation), providing a statistical test (confidence interval and/or p value) to express the magnitude of the difference between the treatment and the PBO arm.
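As a rough illustration of the mean-difference style of reporting described above, the sketch below computes the between-arm difference in mean change with an approximate 95% confidence interval. It assumes per-patient change scores are available, uses an unequal-variance (Welch-type) standard error, and applies a normal approximation (z = 1.96), which is a simplification for small samples. The function name and all values are invented for illustration and are not trial data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_change_difference(treated_changes, placebo_changes, z=1.96):
    """Difference in mean change (treated minus placebo) with an approximate CI.

    Uses an unequal-variance standard error and a normal approximation;
    a t-distribution would be more appropriate for small samples.
    """
    diff = mean(treated_changes) - mean(placebo_changes)
    se = sqrt(
        stdev(treated_changes) ** 2 / len(treated_changes)
        + stdev(placebo_changes) ** 2 / len(placebo_changes)
    )
    return diff, (diff - z * se, diff + z * se)


# Toy change scores (follow-up minus baseline), invented for illustration only.
treated_changes = [-3.0, -0.5, -2.5, -0.5]
placebo_changes = [-0.5, 0.0, -1.5, -0.2]

diff, (low, high) = mean_change_difference(treated_changes, placebo_changes)
print(f"Difference in mean change: {diff:.3f} (approx. 95% CI {low:.3f} to {high:.3f})")
```

A confidence interval that excludes zero indicates statistical significance at roughly the 5% level, but it says nothing on its own about whether the difference reaches a clinically meaningful threshold such as an MCID.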

Table 2 MCID reported in RCTs in axSpA

MCID

Regarding the 5 trials that reported MCIDs, there was a relevant difference, favoring the treatment arm over PBO, for almost all reported outcomes.

For r-axSpA, one trial of ADA showed a significant difference at 24 weeks regarding ASQoL, SF36-PCS, and BASFI [31]. GOL 50 mg was superior to PBO regarding SF36-PCS (at 12 and 24 weeks) and regarding SF36-MCS only at week 12, while GOL 100 mg was superior for both components of SF36 at all given timepoints [40].

For nr-axSpA, ETN was superior to PBO regarding EQ-5D utility after 12 weeks [41].

In studies involving nr-axSpA and r-axSpA, CZP 200 mg and CZP 400 mg showed superiority compared with PBO at 24 weeks regarding ASQoL, SF36-PCS, and SF36-MCS [37]. No quantitative data were available regarding IFX and SEC.

Effect of bDMARDs expressed as a statistical difference

In the context of r-axSpA (Table 3), ETN and IFX only had data on BASDAI and BASFI. ADA, GOL, CZP, and SEC had data on BASDAI, BASFI, ASQoL, SF36-PCS, and SF36-MCS. SEC was the only bDMARD with quantitative data regarding EQ-5D and fatigue (using FACIT-F).

Table 3 Comparison of the effect of bDMARDs vs PBO in RCTs (assessed as mean difference): r-axSpA

There was an almost universal response for BASDAI, BASFI, and ASQoL favoring the treatment arm (bDMARD).

For SF36-PCS, the treatment arm was almost always superior. For SF36-MCS, GOL 50 mg achieved a relevant difference at week 12 but not at week 24, while GOL 100 mg had a relevant difference at both timepoints [26].

For EQ-5D, there was only data for SEC, which was superior to PBO only when an intravenous (IV) loading dose was given [19, 44].

For FACIT-F, there was a relevant difference for all SEC doses independently of the administration route [19, 44].

Regardless of the administration form of the loading dose, SEC was superior to PBO for all assessed PROs, except for EQ-5D [19, 43, 44, 45].

For nr-axSpA, the results broadly favored the treatment arm (Table 4). For BASDAI, all studies favored the treatment arm. The same was true for BASFI, except for ADA, which showed a positive difference in one study [35] but not in another [34]. In the case of ASQoL, CZP and GOL were superior to PBO [33, 42], but ETN was not [36]. Regarding general health and quality of life evaluation, GOL 50 mg showed a consistent positive impact on SF36-PCS, SF36-MCS, and EQ-5D [33]. ETN showed a positive effect on SF36-PCS but not on SF36-MCS or EQ-5D [41]. The results for ADA were contradictory regarding SF36-PCS: one study showed a positive difference [34] but another did not [35]; the only study in which SF36-MCS and EQ-5D were evaluated did not show any positive effect [35].

Table 4 Comparison of the effect of bDMARDs vs PBO in RCTs (assessed as mean difference): nr-axSpA

In all studies that assessed axSpA as a whole (r-axSpA and nr-axSpA; Table 5), there was a relevant difference favoring the treatment arm (CZP, IFX, or ETN) for BASDAI, BASFI, and ASQoL [39, 42] (except for BASDAI in the ETN trial at 8 weeks [38]). There was a positive impact for CZP (combined dose) on both components of SF36, PCS and MCS [37].

Table 5 Comparison of the effect of bDMARDs vs PBO in RCTs (assessed as mean difference): all axSpA (both r-axSpA and nr-axSpA)

Assessment of bias

All studies showed a low risk of bias [47].

Discussion

It is well recognized that therapeutic decisions should include both physicians’ and patients’ perspectives [15], since better outcomes are achieved through shared decision making [48]. In this context, PRO evaluation in axSpA has gained increasing importance in clinical practice for therapeutic monitoring purposes.

This systematic review fills a knowledge gap regarding the way PROs are reported in RCTs.

Regarding individual PROs, BASDAI was the most commonly reported, followed by BASFI, even though both were seldom reported using the MCID concept. Often, BASDAI was reported only to monitor therapeutic response using BASDAI50. The remaining PROs, concerning general and disease-specific quality of life and fatigue, were reported less often and in an even more heterogeneous way.

When PROs were described, in many cases no quantitative information was provided. However, the majority of the RCTs compared the values at baseline with the values at a second timepoint (coinciding with the primary outcome) and reported the statistical significance of the difference. It does not seem adequate to draw conclusions about the relative efficacy of a bDMARD from the statistical significance of a numerical difference, which may have little or no impact from the patient’s perspective. Even the MCID concept has limitations, because achieving a clinically significant response may not equate to reaching a patient acceptable symptom state (PASS) or remission [49].

Overall, PROs are still underreported as outcomes in clinical trials and are described in a very heterogeneous way, making their interpretation and comparison difficult. Once again, this SLR highlights how difficult it is to draw strong conclusions from RCTs, owing to the well-known problems of direct comparison between trials (differences in inclusion/exclusion criteria, measurement timepoints, and co-medications allowed) [50], the inherent differences in PRO evaluation (e.g., a PRO may be completed using a visual analogue scale (VAS) or a numeric rating scale (NRS)) [51], and the use of different PROs for the same dimension (e.g., FACIT-F vs BASDAI fatigue). Nevertheless, bDMARDs were broadly more effective than PBO in terms of PRO improvement, although the effect on SF36-MCS and EQ-5D was not as consistent as for other PROs. The weaker association of these outcomes with disease activity, and their high susceptibility to external factors, might be a reasonable explanation. On the other hand, disease-specific PROs are preferred over generic tools (such as SF36 and EQ-5D) owing to the lower sensitivity of the latter [52].

Conclusion

This SLR highlights the need to raise the standard of care in SpA through the genuine introduction of the patient perspective into the decision-making process. However, to achieve this goal, the target must be clearly defined, reported, and tested. Standardized PRO evaluation and reporting would help improve the approach to patients with regard to QoL maintenance. Beyond the current MCID concept, cutoffs equivalent to clinical remission or to the PASS need to be identified for several PROs, and this should be addressed in the near future.