Introduction

Rheumatoid arthritis (RA) is characterized by joint swelling and pain, often with systemic features and possible progressive joint damage [1]. Patients’ experience of flare often includes increased joint pain, fatigue, poor sleep, and decreased function [2]. RA flares are associated with cardiovascular risk, impaired fertility, complicated pregnancy, progressive joint damage, impaired function, and overall reduced quality of life [3,4,5]. Importantly, a recent study identified self-reported flares as a reliable predictor of progressive radiographic joint damage [6].

Although there is no fully agreed-upon definition of RA flare, a unifying concept includes (unexpected) worsening of RA signs/symptoms potentially requiring treatment change [7, 8]. Prior research by van der Maas et al. examined various disease activity score (DAS28) flare definitions and proposed that worsening of DAS28 ≥ 0.6 above 3.2 or an absolute increase of 1.2 constituted a flare [9, 10]. However, as with most definitions that focus on the physician’s objective account, this approach does not incorporate the patient’s perspective of flare, thus missing subjective warning signs of one or more previous, active, or developing flares [2]. Patient-reported outcome measures (PROMs), such as the French Flare Assessment in RA Questionnaire (FLARE-RA) and the OMERACT RA Flare Questionnaire (RA-FQ), have been developed to address this concern [7, 11]. Thresholds for intensity and duration of flare are being investigated by the developers of these PROMs. In addition, use of such PROMs can help physicians take a novel approach when evaluating a patient in RA flare.
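
The DAS28-based criterion described above can be expressed as a simple rule. The following is a minimal sketch and not the published algorithm: the function name is hypothetical, and it assumes one reading of the criterion, namely that a flare is either an increase of at least 1.2 from the previous DAS28, or an increase of at least 0.6 that leaves the current DAS28 above 3.2.

```python
def is_das28_flare(previous: float, current: float) -> bool:
    """Return True if the change in DAS28 meets the assumed flare rule.

    Assumption (not taken verbatim from the source): a flare is an increase
    of >= 1.2, or an increase of >= 0.6 with the current DAS28 above 3.2.
    """
    # Round to two decimals so floating-point noise does not affect
    # comparisons at the thresholds.
    increase = round(current - previous, 2)
    # Rule 1 (assumed reading): a large increase counts regardless of level.
    if increase >= 1.2:
        return True
    # Rule 2 (assumed reading): a smaller increase counts only when the
    # current score ends up above 3.2.
    return increase >= 0.6 and current > 3.2
```

Under this reading, a rise from 2.8 to 3.5 would count as a flare, whereas a rise from 2.0 to 2.7 would not, because the current score remains at or below 3.2.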

The French STPR (French acronym for “Therapeutic strategies in RA”) collaborative group developed and validated the FLARE-RA in French as a self-administered PROM intended to identify current/recent flares between visits and to potentially assess the need to change therapy [11, 12]. The French FLARE-RA has been translated into British English (not yet validated) [11] and validated in Danish [13, 14]; it has yet to be examined in American English-speaking RA patients.

In this study, we began the process toward validating the use of FLARE-RA in an American English setting. Our objectives were the following: (1) to evaluate if the British English version of the French FLARE (BE-FLARE-RA) questionnaire was suitable for American English-speaking RA patients, (2) to utilize a robust methodology to develop a FLARE questionnaire for American English-speaking RA patients (AmE-FLARE-RA) (cultural adaptation), and (3) to examine the ability of the AmE-FLARE-RA to detect flares, by assessing its correlation with established measures of disease activity and evaluating its ability to discriminate between patient-reported flare/no flare.

Patients and methods

This study complied with, and was approved by, the University of California, Los Angeles (UCLA) Office of the Human Research Protection Program. We used a mixed-methods design comprising qualitative assessments (cohort 1) and quantitative assessments (cohort 2).

Qualitative analysis: cognitive debrief and finalization of AmE-FLARE-RA (cohort 1)

Twenty-five patients meeting the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria for RA [15] were recruited from rheumatology clinics between May and October 2013. Patients voluntarily signed the UCLA institutional review board (IRB)-approved (#12-001784) consent. Gender, age, race, serology, disease duration, current RA treatment, and clinical disease activity index (CDAI) were collected. Established cut points for CDAI were used to distinguish disease activity categories: remission (≤ 2.8), low disease activity (> 2.8 and ≤ 10), moderate (> 10 and < 22), and severe (≥ 22) [16]. Educational background was also collected for the participants in the cognitive debrief.
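
The CDAI cut points listed above map directly onto a small classification rule. The sketch below is illustrative only: the function names are hypothetical, the category boundaries are those stated in this section, and the score formula is the standard CDAI definition (28-swollen joint count plus 28-tender joint count plus patient and physician global assessments, each on a 0–10 scale), which is not spelled out in the text above.

```python
def cdai_score(swollen_28: int, tender_28: int,
               patient_global: float, physician_global: float) -> float:
    """Standard CDAI: SJC28 + TJC28 + patient global + physician global.

    Both global assessments are assumed to be on a 0-10 scale.
    """
    return swollen_28 + tender_28 + patient_global + physician_global


def cdai_category(cdai: float) -> str:
    """Classify a CDAI value using the cut points stated in this section."""
    if cdai < 0:
        raise ValueError("CDAI cannot be negative")
    if cdai <= 2.8:
        return "remission"
    if cdai <= 10:
        return "low"
    if cdai < 22:
        return "moderate"
    return "severe"
```

For example, `cdai_category(cdai_score(2, 3, 4.0, 3.5))` classifies a hypothetical patient with a CDAI of 12.5.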

We used the 10-step international guidelines on Good Practice for the Translation and Cultural Adaptation Process for PROMs by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force for Translation and Cultural Adaptation (TCA) to culturally adapt FLARE-RA [17] (Appendix Table 6). Using the published BE-FLARE-RA [11], we cognitively debriefed 25 RA patients to be representative of American English speakers involved in clinical trials (ISPOR steps 1–7: forward-back translation, reconciliation, and cognitive debriefing). ISPOR steps 2–4 (forward translation, reconciliation, and back translation) were modified since AmE-FLARE-RA did not require initial translation and only one person performed the forward-backward translation [17].

BE-FLARE-RA consisted of 13 items in 2 subscales (6 physical items, 7 general items) scoring from 0 (no flare) to 10 (maximum flare) [11]. Patients completed the questionnaire prior to and after a semi-structured interview and self-reported flare status (yes/no). The patients reported whether they could answer each item/question, if it described their present or most recent flare, if the item was understandable, and whether they would ask the question in a different way. We captured all patient responses on a case report form during the interview and audio-recorded 14 patients to examine accuracy. For thematic analysis, responses were tabulated for each item based on flare status and assessed based on the proportion of patients who knew how to answer each item.

To address the results of the thematic analysis, five of the original 25 interviewees returned for a focus group to propose changes to wording and item presentation on the BE-FLARE-RA (ISPOR step 8: review of cognitive debriefing results and finalization) (Appendix Table 6). Using the nominal group methodology, we discussed wording issues identified during the thematic analysis, and the focus group recommended modifications. Sentence structure changes were made only if all five patients agreed (Appendix Table 7). Although the questionnaire did not require translation, per ISPOR (step 9: proofreading), the culturally adapted AmE-FLARE-RA was back translated to ensure content preservation and then finalized (ISPOR step 10: final report) [17].

Quantitative analysis: flare detection by AmE-FLARE-RA (cohort 2)

An additional 103 consecutive RA clinic patients meeting the 2010 ACR/EULAR classification criteria were recruited from UCLA rheumatology clinics and voluntarily signed informed consent [15]. Patients reported demographics, patient global disease activity visual analogue scale (VAS), and number of flares (none/one/several) since the last visit and independently completed the Routine Assessment of Patient Index Data 3 (RAPID3) and AmE-FLARE-RA. One patient missed the flare status question (yes/no) and was therefore omitted from the flare/no-flare patient response comparison. Seropositivity, current RA medications, physician global VAS, physician-reported flare (yes/no), 28-swollen joint count (SJC), 28-tender joint count (TJC), and CDAI were extracted from the clinic chart. We did not provide the patients or physicians with a definition of RA flare; both answered the question based on their independent concept of flare.

Statistical analyses

Descriptive statistics (means, standard deviations, and frequencies) were computed to describe the demographics of cohorts 1 and 2. Sample size for cohort 2 was based on COSMIN criteria for validation of patient-reported outcomes, under which a sample of ≥ 100 subjects is rated “excellent.” Within cohort 2, we computed Cronbach’s alpha for the overall construct as well as the physical and emotional subscales to assess internal consistency. Values greater than 0.90 are generally considered to indicate high internal reliability for clinical applications [18]. We also performed both exploratory and confirmatory factor analysis to compare the factor structure with prior work with the FLARE-RA questionnaire and to evaluate the fit of the results with the existing factor structure. Model comparisons were made based on a likelihood-ratio test. Fit statistics for the confirmatory factor analysis are reported as the root mean square error of approximation (RMSEA) and comparative fit index (CFI), with values < 0.08 and > 0.90, respectively, indicating good model fit [19]. Next, we evaluated the ability of AmE-FLARE-RA and other standard disease activity measures to discriminate between patient-reported flare groups (Y/N) and, separately, between physician-reported flare groups (Y/N). These analyses used the Wilcoxon rank-sum test to compare the distributions of the measures between groups defined by flare status. We also used the Wilcoxon rank-sum test to compare AmE-FLARE-RA and disease activity measures between cases in which both physician and patient agreed that the patient was in flare and cases in which they agreed that the patient was not in flare. A similar analysis compared the measures between groups defined by the two types of patient and physician disagreement (patient = yes/physician = no vs patient = no/physician = yes). The Kruskal-Wallis test was used to compare AmE-FLARE-RA and disease activity measures across patient-reported flare frequency categories (none/one/several). We used Spearman rank correlations to assess the association between the AmE-FLARE-RA scores and the disease activity measures. Linear regression analyses were also performed to examine the association between AmE-FLARE-RA scores and disease activity measures after adjusting for age, gender, race, and seropositivity.
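
As an illustration of the internal consistency measure described above, Cronbach’s alpha can be computed from item-level responses as alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below uses only the Python standard library; the function name and the input layout (one list of item scores per respondent) are assumptions made for illustration and do not describe the software used in this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 item scores")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

With items that move together perfectly across respondents, the function returns 1; values fall as items become less correlated.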

Results

Qualitative analysis: cognitive debrief and finalization of AmE-FLARE-RA (cohort 1)

Of the 25 RA patients in cohort 1, 92% were women, 64% Caucasian, mean (SD) disease duration was 10 (10.1) years, and mean age (SD) was 51.4 (12.5) years; 20% were in remission/low disease activity, 48% had moderate, and 32% severe disease activity. Thirty-six percent used conventional disease-modifying anti-rheumatic drugs (cDMARDs), 24% biologic DMARDs (bDMARDs) or targeted synthetic DMARDs (tsDMARDs), and 40% combination cDMARD and bDMARD. Thirty-six percent took prednisone. Education level ranged from high school (40%) to post-college education (32%). Twenty-one patients self-reported flare at the current visit (52%) or within 3 months (32%) prior to the visit (Table 1).

Table 1 Patient characteristics

All 25 patients agreed that the BE-FLARE-RA questionnaire required wording changes to accurately report their flare. Thematic analysis disclosed several recurrent issues: (1) confusion because anchors were dichotomous (“completely untrue/absolutely true”) when the numerical rating scales were 0–10; (2) inconsistent and non-specific time frames (e.g., “several consecutive days,” “in the last 3 months,” “since the last consultation”); (3) lack of specificity of items referring to flare; i.e., “due to your rheumatic disorder” could refer to RA or other concomitant disease; and (4) ambiguity of the instructions for completing the questionnaire.

Specific wording choices in BE-FLARE-RA were modified to understandable/culturally appropriate terminology (Appendix Table 7). For example, patients reported the word “appearance” was appropriate to describe swelling, but not to describe stiffness or pain. Additionally, patients were confused by several BE-FLARE-RA terms: “tired” (for fatigue) became “fatigued/exhausted,” “killers” (for pain medications) became “pain medication,” and “restricted” (pertaining to a decrease in daily activities) became “affected.”

In addition, patients requested clarification of instructions, especially relating to time frames: “in the last three months,” “since the last consultation,” or “over several consecutive days”; AmE-FLARE-RA now asks patients to think about their “current or most recent flare” for each item. Moreover, time frames were changed from “over several consecutive days” to “for more than a few days” since patients argued flares do not necessarily occur over consecutive days. Although the item “You noticed a marked worsening in your arthritis lasting more than a few days due to your flare” was removed from the latest FLARE-RA [12], it was re-incorporated as the first item by our study patients since they thought it provided a descriptive foundation for describing their flare. Finally, several patients did not relate prednisone use to their flare since most patients were either not taking prednisone (64%) or did not necessarily modify their prednisone during flares. This coincided with the recent deletion of this item during the validation of FLARE-RA [12].

A French collaborator (Dr. Fabrice Kwiatkowski) confirmed by back translation that none of the changes altered the intended measurement for any item. Aside from only obtaining audio recordings for 14 patients, there were no missing data. The final AmE-FLARE-RA is presented in Fig. 1.

Fig. 1 Final AmE-FLARE-RA Questionnaire: Flare Assessment in RhEumatoid arthritis

Quantitative analysis: flare detection by AmE-FLARE-RA (cohort 2)

Of 103 RA clinic patients completing the final AmE-FLARE-RA in a clinic, 89% were female, 67% Caucasian, with mean (SD) disease duration of 11.9 (10.1) years, and mean (SD) age of 51.1 (15.3) years. Fifteen (16%) patients were taking prednisone with a median (min-max) daily dose of 5 (2–13) mg. Additionally, 26% were in CDAI remission/low disease activity (LDA), 38% moderate, and 37% were in severe disease activity (Table 1).

Internal consistency and structural validity of AmE-FLARE-RA

Cronbach’s alpha showed high internal consistency for the AmE-FLARE-RA total score (α = 0.96) as well as for the physical and emotional subscales (0.93 and 0.94, respectively). Fit indices from the confirmatory factor analysis were moderate (RMSEA = 0.16; CFI = 0.89), with neither meeting the prespecified thresholds for good fit. Nevertheless, the factor-analytic structure supports a two-factor solution, as indicated by a statistically significant (p < 0.05) likelihood-ratio test comparing a two-factor with a single-factor solution.

Comparison of AmE-FLARE-RA scores and RA disease activity measures

Total and general AmE-FLARE-RA scores were significantly higher in the 38 patients self-reporting current flare versus the 64 patients who did not report a flare. We also found that total and physical AmE-FLARE-RA scores were significantly higher in the 19 patients with physician-reported flare versus the 84 subjects with no physician-reported flare (p < 0.05) (Table 2). As expected, there were significant differences in conventional disease activity measures (e.g., TJC, SJC, CDAI, RAPID3, and physician/patient global VAS) between those with physician-reported flare compared with those with no physician-reported flare (both p < 0.01). Many of these measures were also significantly different between self-reported flare groups with the exception of SJC and physician global VAS (p = 0.62 and p = 0.09, respectively).

Table 2 AmE-FLARE-RA and disease activity measures according to patient-reported and physician-reported flare status at the time of visit
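
The flare versus no-flare comparisons reported here rely on the Wilcoxon rank-sum test. The following sketch shows one way such a test can be computed with the normal approximation, including a correction for ties and omitting the continuity correction. It is a simplified illustration using only the Python standard library, with hypothetical function names; it is not the statistical software used in this study, and exact small-sample p-values would differ.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test (normal approximation, no continuity correction).

    Returns (W, z, two-sided p), where W is the rank sum of group_a.
    """
    pooled = list(group_a) + list(group_b)
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    expected_w = n1 * (n + 1) / 2
    # Tie correction: each run of t tied values reduces the variance of W.
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance_w == 0:
        raise ValueError("all observations are identical")
    z = (w - expected_w) / sqrt(variance_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p
```

A call such as `rank_sum_test(flare_scores, no_flare_scores)`, where both arguments are hypothetical lists of AmE-FLARE-RA total scores, returns the rank sum for the first group together with the z statistic and two-sided p-value.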

AmE-FLARE-RA discrimination across patient/physician-reported flare and flare frequency

Patient and physician agreed on current flare status at the time of the visit for 71 patients. Among these concordant cases, AmE-FLARE-RA scores, as well as CDAI, RAPID3, patient and physician global VAS, and TJC, were significantly higher in subjects for whom both physician and patient reported flare than in subjects for whom neither reported flare (p < 0.05) (Table 3).

Table 3 AmE-FLARE-RA and disease activity measures stratified by patient/physician agreement and their discordance

For 31 patients (30%), patient and physician reports of flare were discordant. Measures that differed between the two discordant groups (patient-yes/physician-no vs physician-yes/patient-no) primarily related to clinical disease assessments. SJC and physician global VAS were numerically higher for physician-yes/patient-no flares (both p = 0.05). AmE-FLARE-RA mean total scores were highest when physician and patient both reported flare (mean (SD), 7.6 (2.2)) and lowest when neither reported flare (4.9 (3.6)) (Table 3).
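
The agreement and discordance groups used in these comparisons come from cross-tabulating paired yes/no flare reports. A minimal sketch of that tabulation is shown below; the function name, the dictionary keys, and the input layout (one pair of booleans per patient) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def tabulate_agreement(pairs):
    """Cross-tabulate (patient_flare, physician_flare) boolean pairs.

    Returns the four cell counts plus the fraction of concordant pairs.
    """
    counts = Counter(pairs)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no paired reports supplied")
    concordant = counts[(True, True)] + counts[(False, False)]
    return {
        "both_flare": counts[(True, True)],
        "both_no_flare": counts[(False, False)],
        "patient_only": counts[(True, False)],    # patient = yes, physician = no
        "physician_only": counts[(False, True)],  # patient = no, physician = yes
        "concordant_fraction": concordant / n,
    }
```

Each of the four cells then defines one of the groups whose scores are compared in Table 3.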

Overall, 78 (76%) patients self-reported flares since their last clinic visit; 28 reported one flare and 50 reported several flares (Table 4). CDAI, RAPID3, TJC, patient global VAS, and physician global VAS scores differed significantly across groups based on number of flare reports (none, one, or several). While SJC was numerically higher in the “Yes, several times” group, the difference was not statistically significant. Patients without between-visit flares had lower mean scores on all disease measures than those who self-reported one or several flares between visits. There was also a numerical trend toward higher AmE-FLARE-RA total scores (mean (SD)) according to the frequency of flares between visits: no flare, 4.3 (4.1); one flare, 5.6 (2.6); several flares, 6.4 (2.7) (p = 0.07) (Table 4).

Table 4 AmE-FLARE-RA comparisons of no flare, one flare, or several flares between visits
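
The three-group comparison across flare frequency (none, one, several) uses the Kruskal-Wallis test. The sketch below illustrates the calculation of the H statistic with a tie correction. It is restricted to exactly three groups so that the chi-square p-value (2 degrees of freedom) has the closed form exp(−H/2), which avoids any dependency outside the Python standard library. Function names are hypothetical, and this is not the software used in this study.

```python
from collections import Counter
from math import exp


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(groups):
    """Kruskal-Wallis H and chi-square p-value for exactly three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected exactly three non-empty groups")
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = midranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum**2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    # With 3 groups the reference chi-square has 2 degrees of freedom,
    # whose upper-tail probability is exactly exp(-H / 2).
    p = exp(-h / 2)
    return h, p
```

The function would be called with three hypothetical lists of scores, one per flare frequency category.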

Correlations between AmE-FLARE-RA and RA disease activity measures

There were significant correlations between general, physical, and total AmE-FLARE-RA scores and RAPID3 (Spearman correlation of 0.52, 0.44, 0.50, respectively; all with p < 0.0001), and CDAI (Spearman correlation 0.44, 0.42, 0.45, respectively; all with p < 0.001). In addition, total and each subscale AmE-FLARE-RA score correlated with TJC and patient global VAS (p < 0.001), while none was significantly correlated with SJC (Table 5). The results of the linear regression analyses after adjustment for demographic factors demonstrated similar results, except that SJC was also independently associated with AmE-FLARE-RA scores (p < 0.01, data not shown).

Table 5 Correlation coefficients between AmE-FLARE-RA and measures of disease activity
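
The Spearman coefficients reported in this section are Pearson correlations computed on ranks. A minimal standard-library sketch is given below; the function names are hypothetical, ties receive average ranks, and no p-value is computed.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = midranks(list(x)), midranks(list(y))
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)
```

Because only ranks enter the calculation, any monotonic relationship between a questionnaire score and a disease activity measure yields a coefficient of 1 or −1, regardless of the scale of the raw values.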

When comparing the flare instrument with RAPID3, each of the 13 AmE-FLARE-RA questions correlated significantly with RAPID3 (correlations ranged from 0.39 to 0.55); question 9, regarding “feeling irritable,” had the highest correlation, and question 6, regarding a need for more pain medication, had the lowest (data not shown).

Discussion

The literature shows that RA flares, including self-reported flares, are associated with potentially irreversible consequences, such as destructive joint damage and cardiovascular disease [3, 4, 6]. One challenge in the management of RA patients with flares is a lack of consistent definitions. A validated instrument to detect and measure RA flares would help guide clinical decision-making and patient management. The French/BE-FLARE-RA was developed, and the French version validated, by the STPR group, and it was also translated and validated in Danish, to help address this need. However, this study determined that, for the questionnaire to be useful among American English-speaking patients, cultural adaptation of the BE-FLARE-RA was required to improve comprehension, particularly regarding wording and sentence structure. The BE-FLARE-RA is currently being used in RA clinical trials in the USA [20], even though it has yet to undergo rigorous validation, which highlights the importance of this study. This is crucial in a country like the USA, where there are an estimated 1.3 million RA patients [21].

In a 3-month, multi-center study, where most of the French patients were in remission/low disease activity, the total FLARE-RA score correlated significantly with DAS28 (r = 0.59–0.63, p < 0.001), RA Impact of Disease (RAID) (r = 0.72–0.80, p < 0.001), RAPID3 (r = 0.72–0.77, p < 0.001), and Health Assessment Questionnaire Disability Index (HAQ-DI) (r = 0.53, p < 0.001) [12]. For French patients without flare versus several flares, the median (interquartile range) total score was 0.82 (0.2, 1.8) versus 4.9 (3.5, 6.5) (p < 0.0001) [12]. Similarly, our study found that patients self-reporting flares had significantly higher mean total AmE-FLARE-RA scores, compared with those without flares at the time of the visit, and that AmE-FLARE-RA scores (total, general, physical) significantly correlated with RAPID3, CDAI, tender joint count, and patient global VAS (p < 0.001). While not statistically significant, our data showed numerically higher scores for total, physical, and general AmE-FLARE-RA in the “Yes, several times” group compared with the “No flare” group. Overall, AmE-FLARE-RA scores were numerically higher in each category compared with the French cohorts, which may be attributable to the higher proportions of patients in our cohort with moderate or severe disease activity.

There remains a disparity between physician determination of flare and patient self-report of flare, as evidenced by the discordance between physician and patient assessment of flare in nearly a third of our patients in cohort 2. This is consistent with the widely recognized discordance between physician and patient global assessments of disease activity [22,23,24], as well as the lack of consensus on the definition of flare between clinicians and patients [2, 8, 11]. Interestingly, regardless of the direction of discordance between patient and physician assessment of flare at the time of visit, AmE-FLARE-RA total scores in discordant cases were uniformly elevated (~ 6.0 on a 0–10 scale) compared with cases in which both patient and physician agreed that there was no flare. Furthermore, SJC and physician global VAS did not correlate with AmE-FLARE-RA, which is expected, as patients are not involved in performing these assessments.

Since AmE-FLARE-RA captures the number of flares between visits as well as at the time of the present clinic visit, we envision the questionnaire as a tool to detect flares (episodic worsening of disease activity) between clinic visits based on the patient’s perspective once validated prospectively (as done with French FLARE-RA). Ultimately, we hope this triggers the promotion of a critical, early decision for change in therapy, sparing irreparable joint damage. This unique feature of (AmE) FLARE-RA differs from instruments like the OMERACT RA-FQ, as well as the DAS flare definitions that are only assessed at the time of a clinic visit.

This study has strengths but also limitations. First, most patients who participated in the cognitive debrief and focus group (cohort 1) were educated beyond high school, and focus groups with patients with a broader range of education levels may be of use. Second, cohort 2 patients had higher disease activity compared with patients in the French validation study (moderate/severe disease: American ~ 70%, French ~ 40%) (Table 1); however, this difference may be partially explained by differences in disease categorization between CDAI (used in this study) and DAS28 (used in the French study). Additionally, although patients requested a change in time frames to “for more than a few days” to more accurately reflect their flare experience, this still leaves the possibility that the instrument excludes flares of shorter duration, as it does with the original French terminology. Further, as this is a cross-sectional study, the impact of changes in disease activity between visits could not be determined. Finally, we did not remove item 1 (“You noticed a marked worsening in your arthritis lasting more than a few days due to your flare”), so the measurement properties of the 12-item AmE-FLARE-RA may differ marginally from those of the 11-item French FLARE-RA. This feature, as well as reliability, should be evaluated in longitudinal studies. Despite these limitations, internal consistency values in this study were similar to those found by Fautrel et al. [12], showing a high internal consistency with the overall construct, as well as with physical and emotional subscales. Moreover, the factor-analytic structure of our data aligns with Fautrel et al.’s findings of a two-factor solution.

In conclusion, we have cognitively debriefed and culturally adapted the British English version of the FLARE-RA, originally developed with French RA patients, to American English. We have shown that the AmE-FLARE-RA questionnaire distinguishes flares from no flares, as reported by the patient or the physician, comparably with the FLARE-RA. Thus, AmE-FLARE-RA can be used to detect and measure RA flares. Longitudinal studies can now be conducted with this questionnaire to fully characterize its psychometric properties and contribute to its validation, including responsiveness, reliability, and thresholds of meaning for measurement of between-visit flares.