Introduction

Rheumatoid arthritis (RA) is characterized by fluctuation between periods of inactivity and periods of increased disease activity, so-called flares [1]. Different definitions of such flares exist in RA [2–4]. However, according to the international network OMERACT (Outcome Measures in Rheumatology), a flare measure should be able to indicate a worsening in disease activity state and should also include the patient’s perspective [5]. Traditionally, disease activity in RA has been measured using a composite score, i.e., the Disease Activity Score 28 (DAS28) [6]; however, recent years have seen several attempts at developing simpler, more user-friendly, and less time-consuming flare instruments [7, 8].

The French Flare Instrument (FI) is a newly developed patient self-assessment questionnaire aimed at detecting changes in disease activity among RA patients in daily clinical practice [8]. It seeks to capture the perspective and perception of a flare among both patients and clinicians, and is based on semi-structured interviews with 99 RA patients and consensus Delphi rounds among 13 rheumatologists. The FI can be defined as a patient-reported outcome measure (PROM) [9], and thus, as for any PROM, the ability of the FI to capture a flare depends on the psychometric strength of the instrument.

We have translated the French version of the FI into Danish. The present study is the first of two reports on this work and presents the translation process and the reliability of the Danish FI. A second publication presents the construct and criterion validity of the instrument.

Methods

Patients

Patients visiting a large outpatient clinic at the Department of Rheumatology, Aarhus University Hospital, Denmark, were invited to participate in this study. Inclusion criteria were a diagnosis of RA according to the criteria set by the American College of Rheumatology [10], age older than 18 years, and fluency in spoken Danish. Clinical characteristics are presented in Table 1.

Table 1 Clinical characteristics of 50 RA patients in the FI study

The Flare instrument

The Flare Instrument (FI) is a newly developed self-administered tool for patients with RA to identify disease activity between consultations. It is designed to detect both past and present disease activity among patients with RA in daily clinical practice. The FI is the result of 99 semi-structured patient interviews together with statements from 13 rheumatologists, generated through a Delphi process. It consists of 12 items, each of which is a statement related to disease activity in RA. The FI addresses both the patients’ and the clinicians’ experience of disease activity [8]. When completing the FI, patients are asked to rate their degree of agreement with each statement on a 10-point Likert scale (0 = completely agree, 10 = completely disagree), with higher scores indicating a flare. The FI yields three scores: the total FI score and, potentially, two subscale scores, one related to joint symptoms (FI joint) and one related to general symptoms (FI general). The total FI score, FI joint symptoms, and FI general symptoms were calculated as the arithmetic means of all 12 FI items, the joint symptoms items (items 1 to 6), and the general symptoms items (items 7 to 12), respectively. Only questionnaires with no missing data were included in the calculations.
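The scoring rule described above can be illustrated with a minimal Python sketch. It assumes that the 12 item responses are supplied in questionnaire order and that a missing response is coded as None; the function name fi_scores and this data layout are illustrative assumptions and are not part of the instrument.

```python
from statistics import mean


def fi_scores(items):
    """Return the FI total, joint and general scores for one questionnaire.

    `items` holds the 12 responses (0-10) in questionnaire order.
    Assumption: a missing response is coded as None. Questionnaires with
    any missing item are not scored (complete cases only), so None is returned.
    """
    if len(items) != 12 or any(value is None for value in items):
        return None
    return {
        "total": mean(items),        # arithmetic mean of all 12 items
        "joint": mean(items[:6]),    # items 1 to 6
        "general": mean(items[6:]),  # items 7 to 12
    }


# Hypothetical respondent: 2 on every joint item, 4 on every general item.
example = fi_scores([2] * 6 + [4] * 6)
```

For this hypothetical respondent, the joint, general, and total scores are 2, 4, and 3, respectively.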

Recently, the FI was shown to have high reproducibility in a randomized controlled trial that included 200 French RA patients with stable disease (unpublished data). In the present study, we aim to describe the translation and reliability of the FI among a consecutive sample of patients with RA visiting a large outpatient clinic at the department of rheumatology at a university hospital in Denmark.

Translation

The adaptation of the FI to a Danish version (FI-D) was performed according to the guidelines by Guillemin et al. [11] and included the following steps: (i) translation of the FI from French to Danish by two independent qualified translators; (ii) synthesis of the two translations into one coherent consensus version; (iii) a face validity test of this version among patients from the outpatient clinic and health professionals in order to test its feasibility; and (iv) back-translation of the consensus version by two independent back-translators, followed by approval of the back-translation by the developer of the original questionnaire.

Reliability

Reliability is the degree to which a measurement is free from measurement error [12]. Reliability was evaluated with 10 days between assessments to reduce the risk of (a) recollection of answers and (b) new flares. The FI was sent to the patients, and a second questionnaire (retest) was sent 8 days after they had received the first one, giving patients at least 10 days between test and retest.

Inclusion and dropout are shown in Fig. 1.

Fig. 1 Flow chart for selection of RA patients in the FI study

Statistical analysis

The sample size followed the general recommendation of having at least 50 subjects in a method comparison study [12, 13].

Descriptive statistics were calculated for all variables. Possible floor and ceiling effects were examined. Such effects were considered to be present if more than 15 % of the respondents achieved the lowest or the highest possible score, respectively [14]. Differences between test and retest were calculated, and systematic differences were assessed by paired t test. The differences were plotted against the means of the two measurements in Bland–Altman plots with 95 % confidence intervals (CI) and 95 % limits of agreement (LOA); these plots were also used to assess whether the size of the differences was related to the FI score (heteroscedasticity) [15]. Absolute measurement error was estimated by calculating the standard error of measurement (SEM), and the SEMs were converted into the minimally detectable change (MDC) (MDC = 1.96 × √2 × SEM). The MDC defines the smallest within-person change that can be interpreted as a “real” change above the measurement error [13]. The intraclass correlation coefficient (ICC), model 2.1, with corresponding 95 % CI was used to assess reliability. The ICC can range from 0.0 to 1.0, and according to recommendations, an ICC exceeding 0.90 was considered to indicate sufficient reliability for evaluation of individual patients [14].
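The agreement and reliability parameters described above can be sketched in a few lines of Python. This is an illustration only and not the Stata code used in the study. Two assumptions are made: the SEM is derived from the standard deviation of the test-retest differences (SD divided by √2), which the text does not specify, and ICC model 2.1 is taken to be the two-way random-effects, absolute-agreement, single-measurement coefficient of Shrout and Fleiss. The function names are hypothetical.

```python
import math
import statistics


def agreement(test, retest):
    """Bias, 95 % limits of agreement, SEM and MDC for paired scores."""
    diffs = [second - first for first, second in zip(test, retest)]
    bias = statistics.mean(diffs)          # mean test-retest difference
    sd_diff = statistics.stdev(diffs)      # sample SD of the differences
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    sem = sd_diff / math.sqrt(2)           # assumption: SEM from SD of differences
    mdc = 1.96 * math.sqrt(2) * sem        # MDC formula given in the text
    return {"bias": bias, "loa": loa, "sem": sem, "mdc": mdc}


def icc_2_1(rows):
    """ICC(2,1) from a subjects-by-sessions table (list of equal-length rows)."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ms_rows = ss_rows / (n - 1)                                   # between subjects
    ms_cols = ss_cols / (k - 1)                                   # between sessions
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical test and retest scores for five patients (not study data).
test_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
retest_scores = [1.5, 2.0, 2.5, 4.5, 5.0]
result = agreement(test_scores, retest_scores)
icc = icc_2_1([list(pair) for pair in zip(test_scores, retest_scores)])
```

Under the stated SEM assumption, the MDC for two measurements equals 1.96 times the SD of the differences, i.e., the half-width of the limits of agreement around the bias.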

Internal consistency was assessed using Cronbach’s α for the total FI score, FI joint symptoms, and FI general symptoms; values above 0.7 were considered acceptable [16].
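For illustration, Cronbach’s α can be computed from complete questionnaires with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The sketch below assumes one row per respondent and one column per item; the function name and the example data are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of complete data."""
    k = len(rows[0])                                   # number of items
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Three hypothetical respondents answering two items.
alpha = cronbach_alpha([[2, 3], [4, 4], [6, 8]])
```

The same function can be applied to all 12 items for the total score or to items 1 to 6 and 7 to 12 for the two subscales. It is undefined when all respondents have the same sum score, because the total variance is then zero.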

To check whether the patients’ condition had changed between test and retest, sensitivity analyses were performed on the medication item and the global item using weighted kappa with squared weights (κw2).
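A weighted kappa with squared (quadratic) weights can be sketched as follows. The sketch assumes that the response categories are ordered and equally spaced and that disagreement between categories i and j is weighted by (i − j)² / (K − 1)², where K is the number of categories; the function name and the example responses are hypothetical.

```python
def quadratic_weighted_kappa(first, second, categories):
    """Weighted kappa with squared weights for two sets of ordinal ratings.

    `categories` lists all possible responses in ascending order.
    """
    index = {category: i for i, category in enumerate(categories)}
    k, n = len(categories), len(first)
    # Observed proportions for each (test, retest) combination
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2       # squared disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]  # chance expectation
    return 1 - weighted_observed / weighted_expected


# Hypothetical test and retest responses on a 0-10 item (not study data).
kappa = quadratic_weighted_kappa([0, 3, 5, 8, 10], [0, 3, 5, 8, 10], list(range(11)))
```

With identical test and retest responses, as in this example, the coefficient is 1. The coefficient is undefined if all responses fall into a single category.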

Statistical analysis was performed with Stata 13 software (StataCorp, College Station, TX).

Ethics

All patients gave their written informed consent. The study protocol was approved by the Danish Data Protection Agency (reference number: 1-16-02-577-13), and all procedures were in accordance with the Declaration of Helsinki II.

Results

Translation

After the forward translation from French into Danish, items 1 to 4 in particular did not appear to be mutually exclusive and exhaustive. This became evident through the face validity test, and hence, these items needed rewording. Ten patients from the outpatient clinic and five health professionals (four physicians and one nurse) participated in the test of face validity. The adjusted version was then back-translated and approved by the French authors; this step did not result in further corrections. A figure of the translation process is available as extra web material, and the Danish version of the FI is available as Appendix 1.

Reliability

Data for the reliability test were collected from March 2013 to August 2013. A total of 107 patients with RA from our outpatient clinic were invited to participate by a postal questionnaire; of these, 72 patients accepted the invitation and returned the first FI. Eighteen patients completed only the first test. The FI is a two-page questionnaire, and three patients completed only the first page and were excluded due to missing data. One patient was excluded due to a hand operation between test and retest. This left 50 patients for the study (Fig. 1).

There was no statistically significant difference between the included patients and either the non-responders or the 22 excluded patients (see Fig. 1) with regard to sex, rheumatoid factor positivity, corticosteroid use, or disease duration (p > 0.05). Included patients were significantly older than non-responders (65.3 vs. 54.3 years, respectively) (p < 0.01), but there was no significant difference in age between the included patients and the 22 excluded patients (p = 0.29).

The characteristics of the included patients are presented in Table 1. The mean duration between the two tests was 12.6 days (SD 2.6).

No floor or ceiling effects in total scores and subscales were found except for FI general symptoms, where 20 % of the patients scored 0.
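The floor and ceiling criterion (more than 15 % of respondents at the lowest or highest score) can be checked with a short sketch such as the following. The score bounds of 0 and 10 follow from the item scale; the function name and the example data are hypothetical and only mimic the reported 20 % share.

```python
def floor_ceiling(scores, lowest=0, highest=10, threshold=0.15):
    """Flag floor and ceiling effects using the 15 % criterion."""
    n = len(scores)
    floor_share = sum(score == lowest for score in scores) / n
    ceiling_share = sum(score == highest for score in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical sample: 10 of 50 patients score 0 (20 %), none score 10.
check = floor_ceiling([0] * 10 + [3.5] * 40)
```

In this made-up sample, the share of patients at the lowest score exceeds 15 %, so a floor effect is flagged, whereas no ceiling effect is flagged.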

Cronbach’s α was 0.96, 0.92, and 0.93 for the total FI score, FI joint symptoms, and FI general symptoms, respectively.

As seen in Table 2, there was no systematic bias between test and retest.

Table 2 Reliability and agreement parameters for FI total score, FI joint symptoms, and FI general symptoms in 50 RA patients

Means for test and retest, the differences between them, LOA, SEM, and ICC are shown in Table 2.

Differences between test and retest plotted against the mean of the tests are shown in Fig. 2.

Fig. 2 Differences between test and retest plotted against the mean of FI total score, FI joint symptoms, and FI general symptoms

Weighted kappa showed excellent agreement on both the analgesic item and the global item (κw2 0.96 and 0.82, respectively).

Discussion

The FI was successfully translated into Danish, and excellent reliability was found for the total FI score and for the two subscales: joint symptoms and general symptoms. Thus, the results of the present study show that the FI is a reliable tool for evaluation of flares in patients with RA.

This was the first reliability study on the FI, and thus, we have no comparable studies.

The translation from French to Danish and backwards was carried out according to international guidelines for cross-cultural adaptation of health-related measures using the procedure introduced by Guillemin et al. [11]. In this process, small adaptations to the original French version were made, including the way arthritis medication was addressed (a field in which the Danish language in general is more imprecise than French) and the form of address in the questionnaire (to a more informal tone).

It is generally recommended that reliability coefficients as a minimum exceed 0.70 to discriminate between groups in clinical trials and 0.90 to evaluate individual patients [14, 17]. In our study, the FI and its subscales had an ICC ≥0.95, which indicates that the FI and its subscales are well suited to evaluate disease activity in individual RA patients and also for comparison between groups of patients. The estimated MDC implies that a change of at least 1.23 points is needed to detect a “true” within-person change in the FI total score; the thresholds are somewhat higher for FI joint symptoms and FI general symptoms (1.74 and 1.51 points, respectively).

Cronbach’s α was high, indicating good internal consistency. As α exceeded 0.90, it could be argued that some items are redundant and could be removed [16], but as this is a translation and reliability study, the purpose was not to reduce the number of items.

The present study has some limitations. Participation in this study relied on active enrolment of patients, and only 46 % of the patients returned both the first and second questionnaire. This dropout was larger than expected [14] and could potentially have introduced selection bias. As we expected, more than 50 % of the patients in our cohort were treated with DMARDs [18], but age was higher and disease duration longer than expected [19]. Because patients with longer disease duration usually have the most stable disease courses, we may have included patients with a lower prevalence of flares than usually seen in daily clinical practice. This could potentially have caused an overestimation of the ICCs in our study.

A relatively short interval between test and retest was used. A longer time between test and retest has been proposed by some authors [20]. The present interval was chosen to minimize the risk of changes in RA disease activity between the two assessments. However, such a short interval may be a potential source of bias because the patients may recall their previous answers, and this may affect the objectivity of the tests.

In reliability tests, one of the main questions is whether patients actually changed between test and retest. As the FI was mailed to the patients, there was no clinical examination to confirm this, but kappa calculations on both the global item and the use of analgesics showed excellent agreement, indicating no or only minor change in clinical symptoms.

In conclusion, we have successfully translated and adapted the FI into the Danish language with the final Danish version demonstrating good psychometric properties with excellent reliability.