Key Points for Decision Makers

Patient research partners can contribute effectively to the assessment of the face validity of instruments.

A clear consensus was reached among rheumatoid arthritis patients that each of the eight outcomes should be measured according to three distinct domains: severity, effect and ability to cope.

1 Introduction

Rheumatoid arthritis (RA) is an unpredictable auto-inflammatory disease characterised by painful and swollen joints, fatigue and joint damage. Traditionally, rheumatologists and other health professionals have decided which outcomes should be measured and these have been standardised using core sets and composite indices to allow comparison across studies. The OMERACT/American College of Rheumatology (ACR) core set [1, 2] and the Disease Activity Score (DAS) [3], developed by RA professionals, have become dominant in clinical practice and in research. Although these include patient-reported outcomes, patients were not involved in their development or selection. The OMERACT/ACR variables are tender and swollen joint counts, physician’s global assessment of disease activity, inflammatory markers, and the patient’s assessment of pain, disease activity and physical function. The patient’s assessment of pain and disease activity is commonly measured by visual analogue scales (VAS), although the wording of the VAS has not been standardised and has been shown to vary across studies [4]. Physical function is commonly measured by the Health Assessment Questionnaire (HAQ) [5]. These variables were chosen through a process of literature review and nominal groups with experts because they sample the broad range of improvement in RA (have content validity) and are all at least moderately sensitive to change [1, 3].

The DAS is a composite index used to assess disease activity in RA and provides evidence of clinical need and governs decisions on the appropriateness of biologic therapies in the UK [6]. The DAS is an algorithm of tender and swollen joint counts, blood inflammatory markers, and patient opinion of global health or disease [3]. These variables were chosen by rheumatologists from the literature and tested for their relative contribution to changes in inflammation. Patient opinion of global health or disease is measured by a VAS, the wording of which has also been found to vary across studies and clinical practice with a potential impact on prescribing biologic therapy [4].

The International Classification of Functioning, Disability and Health core set for RA [7] is the most comprehensive but with 96 items (26 % body functions, 19 % body structures, 33 % activities and participation, and 22 % environmental factors) it is not parsimonious and has not been widely used even in its brief form. This was developed using Delphi techniques with RA professionals, systematic review and empirical data collection. It was validated with RA patients after its publication, using the core set as a structure for the focus groups [8], and hence limiting the potential for new items to emerge. All these methods have focused on obtaining professional priorities for assessing disease activity, although OMERACT has since endorsed fatigue as an essential measure in RA, following substantial input from patient research partners [9].

The US Food and Drug Administration (FDA) has recognised the need to incorporate a wider patient perspective in drug evaluation programmes. It has called for (and issued draft guidance on) standardisation of methods for developing patient-reported outcome measures (PROMs), which incorporate the patient perspective [10]. PROMs have become standard measures of pain, functional disability and global health/disease activity in RA owing to their inclusion in OMERACT/ACR criteria and the DAS [13]. They have been shown to be as effective as the traditional physician- or laboratory-reported outcomes in reflecting changes in disease activity over time, and in predicting long-term morbidity and mortality [11].

When developing PROMs, it is essential to use appropriate methodology to involve patients, such as focus groups to develop appropriate wording for numerical rating scales (NRS), and cognitive interviewing to explore understanding of candidate questions, as recommended by the FDA [12, 13].

The growing rheumatology literature on the patient perspective has revealed three important issues. First, differences have been shown between physician and patient assessments of outcomes such as pain, overall health, and physical and mental function [14]. Second, RA patients identify treatment outcomes that are not routinely measured, such as sleep, well-being and normality [15–17], which may relate to disease activity. Feeling well and returning to normal were consistently described as important outcomes by RA patients in focus groups and interviews in the UK [15, 16, 18] and in Sweden [19]. Third, RA patients have both similar and different priorities to clinicians. The RA Impact of Disease (RAID) study developed an index of seven patient priority domains with patients and clinicians across ten European countries: pain, functional capacity, fatigue, emotional well-being, quality of sleep, coping and physical well-being [16].

Previously, some authors of this paper (TS, MM and SH) had developed the RA Patient Priorities in Pharmacological Interventions (RAPP-PI) patient core set in collaboration with patient research partners. This identified eight priority outcomes: pain, activities of daily living (ADL), visible joint damage, mobility, life enjoyment, independence, fatigue and valued activities [20]. Although pain, functional disability, fatigue and joint damage are routinely measured, the other outcomes in these patient-centred studies are not, despite being linked to experience of disease by patients. Feasibility is an important requirement of outcome measures [21], and the responder burden of completing many multi-item questionnaires to measure each of these domains may be unfeasible in many studies. Therefore, we wanted to develop a single measure that allows treatment outcomes to be assessed with validity and reliability in all domains of disease regarded as important to patients and clinicians.

Comparison of the RAPP-PI patient core set with commonly used PROMs in RA clinical trials [Short Form 36 Health Survey questionnaire (SF-36), RAQoL questionnaire, Nottingham Health Profile (NHP) and Arthritis Impact Measurement Scale (AIMS2)] showed that none of them covered all eight RAPP-PI outcomes [22]. Therefore, the addition of an existing validated questionnaire to the RA core set would not be sufficient to assess patient priorities highlighted by the RAPP-PI. For this, the identification of eight validated NRS with sufficient face validity from the literature or a new PROM would be required. The RAPP-PI patient core set was developed through three phases. First, in-depth interviews were conducted with RA patients to identify outcomes and the data were analysed in collaboration with patient research partners [16]. Second, nominal groups were held with different RA patients to begin to prioritise more than 60 outcomes. Third, a national survey, which was representative of the RA population, was conducted to prioritise 32 outcomes into a manageable number for a core set [20]. A patient research partner contributed to the steering committee throughout the study. The patient-centred method for outcome selection provides a foundation for involvement of patients in the identification of existing NRS or the development of a new outcome measure. This paper reports the process to enable the RAPP-PI patient core set outcomes to be measured, by identifying whether existing NRS have sufficient face validity from a patient perspective and developing a practical tool to capture changes in priorities known to be important to patients. While there is a substantial choice of instruments to assess RA (see Sect. 2 for examples), the aim of this project was to develop a comprehensive and feasible outcome measure including all of the patient priorities previously identified [20], constructed with a high level of patient involvement at every stage of the process. After validation studies, this will produce a multidimensional instrument to assess the patient perspective of impact.

2 Methods

There were two stages in the development of the RAPP-PI outcome measures: consultation meetings with patient research partners (Phase 1), and focus groups (Phase 2). Subsequent cognitive interviews and pilot testing of the measures (Phase 3) will be reported separately [23]. Ethics approval was granted by the London Dulwich Research Ethics Committee (11/LO/1524). Patient information was provided to eligible participants for each phase, and written informed consent was taken. Two patient research partners were part of the steering committee (BN and AW).

2.1 Identification of Instruments to Measure Priority Outcomes (Phase 1)

Patient research partners with RA at the Bristol Royal Infirmary attended two consultation meetings in November and December 2011. All had definite RA according to ACR criteria [24] and varying educational and professional experience (Table 1). Their mean (SD) age was 53.3 (12.2) years, disease duration 19.2 (10.5) years, 15 were female, and seven were currently in part- or full-time employment. Fifteen had previously been involved in research projects as patient partners in a variety of roles (Table 2). For three others, this was their first contribution as a patient research partner, although they had attended a training day in preparation. The partners were sent a booklet containing visual analogue scales (VAS) or NRS found in the literature review [25–41] (Table 3) selected by the research team, or items or subscales from questionnaires where no VAS/NRS could be identified that assessed a similar notion to that in the original interview data [15]. Not all scales or items identified in the literature were included. For example, items from the Valued Life Activities (VLA) were not used because they were too specific, but were discussed in the consultation meeting as extra examples of wording [42]. Limited existing scales were identified for the eight priorities (Table 3): six for pain; six for ADL; two for mobility; three for life enjoyment; four for independence; five for fatigue; and five for valued activities. For joint damage, three NRS were drafted from previous interview data [15] because no instruments were identified that assessed patient-perceived joint damage. Partners were asked to complete the booklet as if they were a participant completing the questionnaire, and then to write comments about their preferences for the scales listed, thereby identifying whether existing instruments were considered to effectively assess the eight priorities from a patient perspective.

Table 1 Participant characteristics for Phases 1 and 2
Table 2 Characteristics of 15 experienced patient partners in Phase 1 (three additional patients were partners for the first time)
Table 3 Existing scales presented to partners in Phase 1

Four priorities were discussed at each meeting. Partners shared their opinion about the face validity of the scales and then voted for their preference for a specific existing scale or the need for a new scale to be developed. Emphasis was put on a scale being ‘good enough’, rather than perfect, given the resources involved in constructing new instruments. A consensus of greater than 50 % was required for an existing scale to be used. The meetings were digitally recorded and transcribed verbatim. Content analysis [43] was used to identify reasons for supporting or rejecting the use of an instrument. These qualitative data were used to inform the draft NRS in Phase 2.

2.2 Development of New RAPP-PI NRS (Phase 2)

Two focus groups were held with patients with a diagnosis of RA [20] at the Bristol Royal Infirmary and the Royal National Hospital for Rheumatic Diseases in January and February 2012, who had not participated in Phase 1 (Table 1). Prior to the groups, draft scales were constructed where partners in Phase 1 had voted for new instruments. Data from Phase 1 on the wording of questions, timeframe, response ‘anchors’ at each end of the scale and layout were used to construct the drafts by the research team, including the two patient research partners on the team.

The draft NRS were presented to the focus group participants for feedback on the stem question, time frame, anchors and layout. The groups were digitally recorded and transcribed verbatim. The transcripts were analysed primarily by TS and CA using content analysis [43], with the research team (including the two patient research partners) contributing to the finalisation of the questions.

3 Results

3.1 Consultation Meetings (Phase 1): Identification of Instruments to Measure Priority Outcomes

In total, 18 patient research partners attended at least one consultation meeting, with 14 attending both meetings. Existing NRS for pain [16] (12/17 votes), ADL [16] (12/17 votes) and fatigue [37] (13/15 votes) were voted as acceptable. The other five priority outcomes received between 12 and 17 votes for a new scale to be developed. The selected fatigue scale consists of three components: level of fatigue, effect of fatigue and coping with fatigue. The partners argued that these three components were measuring important different aspects of fatigue and that the three components should be applied to all of the priority outcomes. ‘Level’ was interpreted as the severity of the priority outcome experienced, and was understood as distinct from ‘effect’ and ‘ability to cope’ (see Table 3 for example quotations). ‘Effect’ was understood as how a priority outcome affected their life (that is, the impact). ‘Coping’ was interpreted as a patient’s own ability to change how the outcome was experienced. A vote was taken on the inclusion of the three components at the second consultation meeting and all 15 participants (100 %) agreed that each priority should be represented by three questions on severity, effect and ability to cope. This resulted in the need for 24 questions (19 new).
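The arithmetic behind these results can be checked with a short illustrative sketch, which is not part of the study's analysis. It applies the stated consensus rule (more than 50 % of votes cast) to the three reported tallies and reproduces the item count. All function and variable names are hypothetical, and the sketch assumes that the accepted pain and ADL scales each contribute a single item while the fatigue scale contributes three.

```python
from fractions import Fraction

# Reported Phase 1 tallies for existing scales: (votes in favour, votes cast).
votes = {"pain": (12, 17), "ADL": (12, 17), "fatigue": (13, 15)}


def meets_consensus(in_favour: int, cast: int) -> bool:
    """Stated rule: strictly more than 50 % of the votes cast."""
    return Fraction(in_favour, cast) > Fraction(1, 2)


accepted = {name: meets_consensus(*tally) for name, tally in votes.items()}

priorities = [
    "pain", "ADL", "visible joint damage", "mobility",
    "life enjoyment", "independence", "fatigue", "valued activities",
]
components = ["severity", "effect", "ability to cope"]

# Every priority is represented by one question per component.
total_items = len(priorities) * len(components)

# Assumption: one existing item each for pain and ADL, three for fatigue.
existing_items = {"pain": 1, "ADL": 1, "fatigue": 3}
new_items = total_items - sum(existing_items.values())

print(accepted)
print(total_items, new_items)
```

Under these assumptions all three existing scales pass the threshold, and the counts match the 24 questions (19 new) reported above.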

The qualitative data from the discussions and comments written on the instrument booklets regarding support for or rejection of existing instruments provided the content for the draft NRS (Table 3). For example, participants consistently argued that it was important to put ‘rheumatoid arthritis’ into the wording of each question. The draft NRS were piloted with the two patient research partners on the steering committee, before proceeding to Phase 2.

3.2 Development of New RAPP-PI NRS with the Three Components Severity, Effect and Coping (Phase 2)

Although 14 patients had agreed to participate in the two focus groups, only eight took part. Their mean (SD) age was 64.7 (8.2) years, disease duration 12.8 (11.8) years, their disease severity measured on a patient global VAS (0 = well, 10 = very bad) was 4.0 (2.5), six were female, two were in part- or full-time employment, three had university qualifications and a further three had school qualifications at age 16 years. The reasons given by the other six patients for non-attendance were an RA flare (2), transport difficulties on the day (2) and unknown (2). The focus groups initially discussed the wording of the NRS (stem questions), anchors, timeframe and layout (Table 4), as specified in the topic guide (Table 5). For example, different understandings of the word ‘mobility’ were apparent: “moving around”, “get from room to room”, “doing your chores”, “movement in your arm” and “dexterity”. There was a consensus from participants that the phrase ‘getting around’ should be used to convey the original meaning of the priority from the previous interview data [17] (Table 6).

Table 4 Phase 1 consensus results on existing scales and the need for new NRS with three domains, supported by example quotations
Table 5 Focus group topic guide
Table 6 Phase 2 focus group feedback on draft NRS

In the consultation meeting (Phase 1), four participants (of 17) worried about the effect of the term ‘joint damage’ on newly diagnosed patients who completed the questions. However, the focus group participants suggested that the alternative wording ‘joint change’ could be understood by patients to include improvements of joints, as well as worsening, and that this would cause scoring problems for the NRS. The consensus from the focus groups was to use ‘joint damage’ as patients should be “realistic” about the condition. In relation to the ‘valued activities’ priority, participants agreed that as their condition had progressed, the activities that they enjoyed had to change: “I suppose I’ve modified things I’ve tried to do down the years… It should ask about doing activities you currently enjoy” (his emphasis). The group decided that the comparison to pre-RA activities could create a distorted answer to the scales. The feedback on layout included using boxes around each set of priority questions for visual separation, the importance of consistency in the format of questions, keeping questions as short as possible, and underlining key words to differentiate between the questions. Participants also spontaneously discussed their support for the eight priorities (see Table 4 for examples).

The draft NRS were edited in light of the focus group analysis and finalised through consensus with the research team, including the two patient research partners. The final NRS are provided in Fig. 1, including the instructions for completing the questions as also presented to the focus group participants for feedback (e.g. see layout quotation number 3).

Fig. 1 Final Rheumatoid Arthritis Patient Priorities in Pharmacological Interventions. RA rheumatoid arthritis

4 Discussion

This research has developed patient-centred outcome measures that have strong face validity, through involvement of patient research partners in consultation meetings and on the steering committee, and patients in focus groups. Not only were patients able to significantly contribute to this process, they also understood the process of instrument validation (in Phase 1, to decide whether existing instruments were appropriate) and the measurement properties of wording, timeframe and anchors (Phase 2). For example, the discussion on how to phrase the priority on visible joint damage included the impact of wording on patients (damage vs. change), the effect of anchors on scoring and the need for an extended timeframe.

Existing validated measures were identified by participants for three of the priorities (pain, ADL and fatigue), and only one of these (fatigue) addressed all three components of severity, effect and coping [44]. There was a complete consensus amongst patient research partners that the three components should be addressed for each of the priorities. Thus, where there are existing measures, many are multi-item questionnaires, and they do not address these different components. The development of a single item for each component of each priority allows the inclusion of all three components without expanding the questionnaire to such an extent that it becomes too burdensome for responders to complete. However, there is limited evidence that these components measure significantly different constructs. Nicklin et al. [37] found satisfactory criterion and construct validity of the Bristol RA Fatigue (BRAF) short scales for severity, effect and coping, and reported that the data also demonstrated a ‘potential disconnect between the ability to cope with fatigue and its severity, which offers the possibility that patients might be able to improve their ability to cope with fatigue and thus reduce its effect, even if fatigue severity per se cannot be changed’ (p. 1566). It is unknown whether similar effects may be found for the seven other RAPP-PI priorities and further research is required to determine how these three components are related within the Impact Triad [45].

NRS were constructed for each of the priorities where existing validated instruments did not have sufficient face validity, and all 24 NRS were formatted to have a consistent layout. The priority of visible joint damage provided the greatest challenge because no existing instrument could be found that measured patients’ self-reporting of joint damage. The three NRS were developed from previous interview data [17], and discussed during the consultation meetings and focus groups. A minority of Phase 1 participants were concerned about the effect of the term ‘joint damage’ on newly diagnosed patients, but the overall consensus was that RA patients should be realistic about the potential consequences of progression of their disease and that the implications for measurement were more important. However, testing of these NRS through an observational study may show that self-report is not an effective method for measuring this patient priority. It may be more reliable to use clinicians’ assessment of tender and swollen joints, and PROMs for self-reported anxiety about joint damage instead.

Two of the existing scales that were voted as having acceptable face validity were taken from the RAID index [16]. The similarities in domains between the RAID and RAPP-PI may derive from the common focus on patient perspective and it is reassuring that patient priorities have been identified. However, the diverse approaches in constructing the NRS may result in a different final selection of domains for the RAPP-PI. Furthermore, one noticeable difference between the RAID approach and the one adopted here is that patients clearly recognised the difference between the three components of severity, effect and coping. It would be important in an observational study to compare how these RAPP-PI components relate to the RAID NRS, that is, to establish with which component the RAID’s ‘Impact’ NRS correlates most. These data may provide an indication of the necessity of including questions about ability to cope in rheumatology instruments.

A limitation of this research is that only eight (of 14 confirmed) participants attended the focus groups (Phase 2) across two sites. Although relevant data were also generated in the consultation meetings (Phase 1), these participants were limited to one site. However, the large number of patient research partners involved, with diverse educational and professional experience as well as experience of user involvement, enabled the process of item development and face validity assessment to benefit from lively discussion. In addition to their diverse experiences of living with the condition, this discussion drew on their involvement in other studies, including other outcome measure development, and the latest research partner training day, which focused on methodological issues. Therefore, we argue that involving patient research partners in addition to patient participants allowed full engagement with conceptual issues such as the different aspects of domains.

Further testing of the NRS wording will take place through future cognitive interviewing to address this, covering comprehension of the questions and the ability to recall information and make a judgement [46]. A strength is that these new outcome measures are based on the development of a conceptual framework grounded in patient data from a previous study [17], in collaboration with patients as recommended by the FDA [10], and benefiting from a higher level of user involvement [47]. Further research aims to construct a final tool, which will assess priorities important to patients in one instrument, constructing an algorithm from observational data to produce a single score.

5 Conclusions

It is effective to collaborate with patients when developing new measures, and a range of methods can be successfully used, including consultation meetings and focus groups. This study, through extensive patient feedback, constructed an instrument with 24 NRS based on priorities identified by patients and encompassing domains where existing questionnaires contain many more items and do not address three important concepts endorsed by patients: severity, effect and coping. Further research is required to pilot the RAPP-PI PROMs for the evaluation of feasibility, construct validity and sensitivity through an observational study.