Introduction

The prognosis of patients with rheumatoid arthritis (RA) has improved markedly due to a combination of factors, such as the development of biological agents and new small molecules, a better use of conventional therapies—probably in relation to the high price of the newer drugs—and the use of strategies for treatment adjustment according to disease activity. These latter strategies imply the use of quantitative measures—so-called clinical indices—to avoid basing decisions solely on physician impression.

In general, quantitative measures for RA are composite indices based on a core set of measurements, including number of tender and swollen joints, physician assessment of disease activity, patient evaluation of pain, activity, and physical function, plus laboratory measurements, basically acute phase reactants, either the erythrocyte sedimentation rate (ESR), or C-reactive protein (CRP) [1, 2].

The composite indices developed for RA in recent decades are based on this core set; nevertheless, none of them includes all dimensions of disease activity. The most commonly used indices are: the Disease Activity Score (DAS) [3, 4], which combines, in a continuous measure, the Ritchie index (number of tender joints out of 44), a global assessment of disease activity by the patient on a Visual Analogue Scale (VAS) from 0 to 10, and ESR; simplified versions of the DAS, using counts of 28 joints instead of 44 [5] and CRP instead of ESR in the formula [6, 7]; the Simplified Disease Activity Index (SDAI), which is the arithmetic sum of the variables included in the DAS28 plus the physician’s assessment of disease activity [8]; and a version without acute phase reactants, the Clinical Disease Activity Index (CDAI) [9, 10].

The DAS28, used as part of the endpoint in almost all RA clinical trials, has some limitations, the main one being that it is complex to calculate in clinical practice. In addition, the formula weights the tender joint count twice as much as the swollen joint count, although the latter is more specific for RA than the former. Likewise, ESR is strongly weighted and may induce changes in the index even when it is within the normal range [11]. Some researchers argue that some parameters included in the DAS28 are very subjective: they may be affected by the patient’s psychological state and not solely by inflammation, and so reflect the patient’s perception rather than his or her clinical status [12]. All in all, and despite being an excellent measure for clinical trials, the DAS28 may misclassify disease activity at the individual patient level. As a result, several groups are working on adapting this index to clinical practice. In addition, the need for new methods that better identify the real inflammatory state of these patients is increasingly evident. Remission and low activity state are probably the most important targets for these new tools.

High-frequency ultrasound (US) can be used to objectively assess inflammation in RA [13]. Adding US to a clinical measure should, in principle, improve the reliability and validity of the measurement of disease activity, and several US indices have been developed, using various combinations of joints and scoring systems [14,15,16]. All of them use information based solely on US. The correlation of these indices with the DAS28 is good, but none of the US-based indices has been extensively used in clinical practice for several reasons, such as the large number of joints to be evaluated, the use of a semiquantitative scale for synovitis, and the long time required to perform the evaluation [17].

An index based on essential clinical measures plus a US measure, focused on simplicity and appropriately validated, would allow a better classification of patients at different levels of disease activity than a clinical-only or US-only index. Moreover, in cases with low activity or remission, a combined index including a US measure could provide a more objective assessment by detecting subclinical synovitis, helping the rheumatologist use medications more effectively. The main objective of this study was to develop and validate a mixed clinical-US index that reflects inflammatory disease activity in RA, for use in clinical practice, while avoiding confounding variables.

Methods

Study design

This study was carried out with mixed methods, qualitative and quantitative.

For the theoretical development of the index, discussion group and Delphi techniques were used. The purpose of the discussion group, composed of ten rheumatologists from the US group of the Catalan Society of Rheumatology plus two renowned RA experts, was to elicit items or potential elements to be included in a disease activity index and to define them as clearly as possible, including measurement variants. After the meeting, a Delphi survey (19 investigators) was carried out to prioritize the elicited items for inclusion in the index. For this task, all items were anonymously graded on 1–5 scales for “perceived degree of objectivity”, “capacity to reflect RA activity and its changes”, and “feasibility in clinical practice”. The grades were then averaged into a global ranking score from 1 to 5. Only those items whose scores were ≥ 4 were forwarded to a second quantitative phase.
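The prioritization step can be illustrated with a minimal sketch. It assumes that the global score is the mean of the three per-criterion mean grades, which the text does not specify; the item names and grades are hypothetical and are not study data.

```python
from statistics import mean

# The three grading criteria named in the Delphi survey.
CRITERIA = ("objectivity", "reflects_activity", "feasibility")


def global_score(grades_by_criterion):
    """Average the per-criterion mean grades into one global score (1-5).

    Assumption: criteria are weighted equally; the source only says the
    grades were averaged.
    """
    return mean(mean(grades_by_criterion[c]) for c in CRITERIA)


def prioritise(items, threshold=4.0):
    """Return the names of items whose global score reaches the threshold."""
    return [name for name, grades in items.items()
            if global_score(grades) >= threshold]


# Hypothetical grades from three panellists per criterion (illustration only).
example_items = {
    "item_A": {"objectivity": [5, 4, 5],
               "reflects_activity": [4, 5, 4],
               "feasibility": [4, 4, 5]},
    "item_B": {"objectivity": [2, 3, 2],
               "reflects_activity": [4, 4, 3],
               "feasibility": [5, 4, 4]},
}
selected = prioritise(example_items)
```

In this invented example only `item_A` reaches the threshold of 4 and would move on to the quantitative phase.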

For the construction and validation of the index, a cross-sectional multicentre study was conducted.

Centres in Catalonia with at least one rheumatologist highly experienced in US and with US machines available at the rheumatology offices were invited to participate. All ultrasonographers had demonstrated experience performing musculoskeletal US. All of them had passed the advanced level of the Spanish ultrasound school of our Rheumatology Society and had more than 5 years of experience performing US exams.

Patients

Patients were selected from the participating centres through a consecutive sampling among those with enough inflammatory activity to evaluate all the items needed for the index. All centres included the same number of patients during the recruitment period (2 months). Neither treatment nor comorbidities were considered as exclusion criteria. Patients who had required joint surgery in the past were excluded.

The inclusion criteria were:

  1. Diagnosis of RA according to EULAR (European League Against Rheumatism)/ACR (American College of Rheumatology) 2010 criteria [18];

  2. Any degree of inflammatory activity with any treatment.

All patients signed the informed consent form prior to inclusion in the study.

Variables

The gold standard was disease activity (present or absent), defined by consensus of the panel of rheumatologists as based on all available information in each case.

In addition, the following clinical variables were collected: swollen and tender joints from a total of 28 [shoulders, elbows, wrists, metacarpophalangeal (MCP) 1–5, interphalangeal (IP) 1–5, and knees], patient and physician global assessment (VAS), ESR, CRP, rheumatoid factor (RF), and anti-cyclic citrullinated peptide antibodies (ACPA); age; disease duration; and previous or current treatment.

We used ultrasound machines from General Electric (GE) and Esaote (E): E MyLab Six, MyLab Seven, and MyLab Twice, and GE S8 and E9. All machines were equipped with a multifrequency linear transducer, and the frequency used was the maximum possible in each machine (10–15 MHz) and in each joint, to obtain the best possible image quality. In each ultrasound machine, the same settings were used for all patients.

Power Doppler (PD) settings were: medium dynamic range, medium persistence, medium frame rate, low wall filter, and 0.5–0.8 Hz pulse repetition frequency. In each machine, these parameters were adjusted to obtain the maximum sensitivity to identify Doppler signal at the lowest possible value for each joint. The US assessment included: synovitis or tenosynovitis by grey scale (GS) and PD from a total of 42 anatomical structures [bilateral shoulder, bilateral elbow, bilateral wrist, bilateral wrist flexor tendons (all grouped), bilateral wrist extensor tendons first–sixth compartment (all grouped), bilateral MCP 2–5, bilateral IP 2–5, bilateral finger flexor tendons (all grouped), bilateral knee, bilateral posterior tibial tendon, bilateral peroneal tendons (long and brevis grouped), bilateral tibio-talar joints, bilateral subtalar joints, and bilateral metatarsophalangeal (MTP) second and third]. The standard US method was used (OMERACT definitions) [19].

Each structure was scanned in longitudinal and transverse view following EULAR ultrasound recommendations.

Blinding between clinical and US assessments was maintained by having patient’s data collected by two independent rheumatologists in each centre.

Statistical analysis

After the descriptive analysis, the study sample was divided into two random sub-samples, one for the construction of the index and the other for its validation.
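A random split of this kind can be sketched as follows. This is an illustration, not the procedure actually used: the seed, the function name, and the rule that the construction subsample receives the extra patient when the total is odd are all assumptions. The 281 identifiers match the sample size reported in the Results.

```python
import random


def split_sample(patient_ids, seed=0):
    """Shuffle patient identifiers and split them into two sub-samples.

    Assumption: with an odd total, the construction sub-sample gets the
    extra patient. The seed is arbitrary and only makes the sketch repeatable.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = (len(ids) + 1) // 2
    return ids[:cut], ids[cut:]


# Hypothetical identifiers 1..281 standing in for the study patients.
construction, validation = split_sample(range(1, 282))
```

With 281 patients this yields sub-samples of 141 and 140, with every patient assigned to exactly one of them.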

For the construction of the index, a procedure in different and successive steps was used: (1) selection of US locations; (2) selection of US scoring method; and (3) creation of the index.

For the selection of the most suitable US sites to include in the index, we used combinations based on: (a) frequency of US abnormalities, typical of RA, in GS and PD; (b) locations used in the indices published by Naredo et al. formed by 12 structures [15], and in the APPRAISE study [19]; and (c) feasibility and time spent on US assessment in the opinion of the researchers. We then tested the capacity of the different combinations to detect > 90% of the structures with synovitis and PD signal (sensitivity).
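One way to read this sensitivity criterion is sketched below. It assumes that sensitivity is the share of patients with an abnormality anywhere in the full protocol who are also abnormal at one or more sites of the reduced combination; the text does not give the exact definition, and the site names and grades are hypothetical.

```python
def combination_sensitivity(patients, combination):
    """Share of patients abnormal at any site who are also abnormal at one
    or more sites of the reduced combination.

    `patients` is a list of dicts mapping site name to a grade (0 = normal).
    Returns None when no patient has any abnormality.
    """
    positives = [sites for sites in patients
                 if any(score > 0 for score in sites.values())]
    if not positives:
        return None
    detected = sum(
        1 for sites in positives
        if any(sites.get(site, 0) > 0 for site in combination)
    )
    return detected / len(positives)


# Hypothetical grey-scale grades (0-3) per site; not study data.
patients = [
    {"wrist": 2, "mcp2": 0, "knee": 0, "elbow": 0},
    {"wrist": 0, "mcp2": 1, "knee": 0, "elbow": 0},
    {"wrist": 0, "mcp2": 0, "knee": 0, "elbow": 1},
    {"wrist": 0, "mcp2": 0, "knee": 0, "elbow": 0},
]
sensitivity = combination_sensitivity(patients, ["wrist", "mcp2", "knee"])
```

Here the reduced combination finds two of the three patients with synovitis, so it would fall short of a 90% target; the same function can be applied to PD grades.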

Once the locations to be included were selected, the scoring method needed to be defined. For this, we used three approaches: (a) a semiquantitative scale (0–3) used for GS and PD; (b) a dichotomous scale, also for GS and PD (0 or 1 for each: 1 if GS ≥ 2, and 1 if PD ≥ 1); and (c) a qualitative scale (0 or 1 overall, not separately for GS and PD) based on a decision tree proposed by the participating researchers (Fig. 1).

Fig. 1 Decision algorithm for the qualitative ultrasound score of inflammation. All researchers were asked to define the value (0 or 1) of each branch of the algorithm as to whether there was inflammation or not. The percentages (%) reflect the agreement with the final value of the branch

The composite index was calculated as the arithmetic sum of three constituent sub-scales (US assessment, swollen joints count, and CRP) selected from the previous steps. Three indices were created according to the different scoring modalities of the US evaluation.

After the creation of the index, a validation study was carried out in the subsample intended for this purpose. The dimensions of validity analysed were construct validity and reliability.

Construct validity was tested by a correlation analysis between the new index and different external measures of activity. Convergent validity was based on the correlation between the new indices and DAS28, SDAI, and CDAI, calculated according to the following formulas: DAS28 = (0.56 × √tender joint count) + (0.28 × √swollen joint count) + (0.70 × ln(ESR)) + (0.014 × VAS patient); SDAI = swollen joint count + tender joint count + VAS (patient) + VAS (physician) + CRP (mg/dl); and CDAI = tender joint count + swollen joint count + VAS (patient) + VAS (physician).
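The three formulas can be written as small functions. This is a sketch only: the function and parameter names are hypothetical, and the VAS scales (0–100 mm for the global assessment term in DAS28, 0–10 cm for SDAI and CDAI) are assumed from the usual published definitions, since the text does not state them.

```python
import math


def das28_esr(tender28, swollen28, esr_mm_h, global_vas_mm):
    """DAS28 with ESR.

    Assumption: global_vas_mm is the global assessment on a 0-100 mm VAS.
    ESR must be greater than zero because of the logarithm.
    """
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * global_vas_mm)


def cdai(tender28, swollen28, patient_vas_cm, physician_vas_cm):
    """CDAI: plain sum of joint counts and both VAS values (assumed 0-10 cm)."""
    return tender28 + swollen28 + patient_vas_cm + physician_vas_cm


def sdai(tender28, swollen28, patient_vas_cm, physician_vas_cm, crp_mg_dl):
    """SDAI: the CDAI terms plus CRP in mg/dl."""
    return cdai(tender28, swollen28, patient_vas_cm, physician_vas_cm) + crp_mg_dl


# Hypothetical patient, for illustration only.
example_das28 = das28_esr(tender28=4, swollen28=9, esr_mm_h=20, global_vas_mm=50)
example_cdai = cdai(4, 9, 5, 4)
example_sdai = sdai(4, 9, 5, 4, 1.5)
```

For this invented patient the CDAI is 22 and the SDAI is 23.5; the DAS28 depends on the natural logarithm of the ESR and is left to the function.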

Correlation between the new composite US indices and the calculated external activity measures (DAS28, CDAI, and SDAI, plus the physician’s overall assessment [PGA]) was analysed with the Spearman correlation coefficient (ρ).
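For readers unfamiliar with the coefficient, the following is a minimal standard-library sketch of Spearman’s ρ, computed as the Pearson correlation of average ranks so that ties are handled. The function names and the example values are hypothetical; in practice a statistics package would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical index values and external scores for five patients.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Because only ranks enter the calculation, the coefficient is unaffected by the different scales of the composite index and the external measures.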

The reliability of the index was evaluated by a test–retest analysis, using the intraclass correlation coefficient (ICC), in a subgroup of patients (5 per centre) re-evaluated after one week.
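The text does not state which form of the ICC was used. As an illustration only, the sketch below computes the one-way random-effects ICC for single measurements, ICC(1,1), from paired test and retest scores; the choice of this form, the function name, and the example values are assumptions.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    `measurements` is a list with one tuple per patient, each holding the
    same number k of repeated scores (k = 2 for test and retest).
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    subject_means = [sum(row) / k for row in measurements]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2
        for row, m in zip(measurements, subject_means)
        for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test and retest index values for three patients.
example_icc = icc_oneway([(1, 3), (3, 5), (5, 7)])
```

In this invented example the retest is systematically two points higher, which lowers the ICC to 0.6 even though the patients keep the same order; identical test and retest values would give 1.0.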

Results

Delphi study

Responses with values ≥ 4 over 5 were considered a priority to include in the assessment. The only parameters that met this requirement were: bilateral US evaluation of tendons and joints, CRP, and swollen joint count.

Descriptive analysis

A total of 13 hospitals participated in the study. The sample consisted of 281 RA patients, mainly women in their fifties, with a mean of 11 years since diagnosis and various levels of disease activity; 80% were receiving DMARDs and 46% biological agents (Table 1).

Table 1 Description of the complete sample (creation plus validation)

After the descriptive study, a random sample split was performed in two sub-samples, one for the index design (n = 141) and the other for the validation study (n = 140).

Selection of ultrasound locations

Two summary variables were constructed for GS and PD based on the frequency of abnormalities at each US location. The presence of synovitis and PD signal was defined by values greater than zero in any location. Among the 141 patients included, 130 (92%) had synovitis at some location, and in 89 (63%), a PD signal was detected. The most frequently affected joint was the wrist and the sites that presented fewer abnormalities were the wrist flexors and peroneal tendons. Table 2 shows the sensitivity of the different combinations of sites to detect synovitis and PD.

Table 2 Sensitivity of different locations for the detection of synovitis or power Doppler signal

Due to the similarity of the results on the capacity of these combinations to detect synovitis and PD, it was decided to use the last combination for the construction of the index, as it was considered more feasible and faster to perform. This combination contains 14 structures (7 bilateral): wrist (including flexors and extensors of the wrist), MCP (2 and 3), knee, tibio-talar joint (plus posterior tibial tendon and peroneal tendon), and MTP (2 and 3).

Methods of ultrasound scoring

We then calculated the scores by the three previously defined scoring scales: (1) semiquantitative: from 0 to 3 (each location is based on 4 evaluations (right and left side, GS and PD); therefore, the score ranges from 0 to 12); (2) Dichotomous: 0/1 (0 and 1 in GS count as 0, and 2 and 3 as 1; in PD, any value ≥ 1 counts as 1; the total score per location ranging 0–4); and (3) Qualitative: 0/1 based on the algorithm (Fig. 1) (range per location: 0–2). In the qualitative score, if one area has a 1, for instance a tendon, the rest of the area needs no assessment, as it will be a 1 in any case.
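The first two scoring scales can be sketched per location as follows. The data layout and function names are hypothetical. The qualitative scale is left out because it depends on the decision algorithm in Fig. 1, which is not reproduced in the text.

```python
SIDES = ("right", "left")


def semiquantitative_score(location):
    """Sum of the four 0-3 grades (GS and PD, right and left): range 0-12."""
    return sum(location[side][mode] for side in SIDES for mode in ("gs", "pd"))


def dichotomous_score(location):
    """Count of positive findings per location: range 0-4.

    GS grades 2 and 3 count as 1 (grades 0 and 1 count as 0);
    any PD grade of 1 or more counts as 1.
    """
    total = 0
    for side in SIDES:
        total += 1 if location[side]["gs"] >= 2 else 0
        total += 1 if location[side]["pd"] >= 1 else 0
    return total


# Hypothetical grades for one location (both wrists); not study data.
wrist = {"right": {"gs": 2, "pd": 1}, "left": {"gs": 1, "pd": 0}}
wrist_semiquantitative = semiquantitative_score(wrist)
wrist_dichotomous = dichotomous_score(wrist)
```

For this invented wrist the semiquantitative score is 4 and the dichotomous score is 2, because the grade 1 grey-scale finding on the left side does not count as positive.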

Final indices to validate

Three composite indices were created based on the sum of US assessment, swollen joint counts, and CRP value (mg/dL) with each of the three scales (index 1 semiquantitative, index 2 dichotomous, and index 3 qualitative).
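As a worked illustration of the sum, the sketch below adds the per-location US scores to the swollen joint count and CRP. The function name and the example values are hypothetical; the seven entries stand for the seven bilateral locations scored on the qualitative scale (0–2 each).

```python
def composite_index(us_location_scores, swollen_joint_count, crp_mg_dl):
    """Unweighted sum of the US score, swollen joint count, and CRP (mg/dL)."""
    return sum(us_location_scores) + swollen_joint_count + crp_mg_dl


# Hypothetical patient: qualitative US scores for seven bilateral locations,
# three swollen joints, and a CRP of 0.8 mg/dL.
example_index = composite_index([2, 1, 0, 0, 1, 0, 0],
                                swollen_joint_count=3, crp_mg_dl=0.8)
```

Here the US component contributes 4 points, giving a total of 7.8. Because the sum is unweighted, a change in CRP expressed in mg/dL counts the same as one swollen joint or one positive US side.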

Validation study

In the validation subsample, 126 patients had valid information for the construction of the three composite indices. The mean ± SD of the three indices in this subsample were 11.9 ± 9.7 for the semiquantitative; 3.3 ± 3.1 for the dichotomous; and 4.6 ± 3.8 for the qualitative. The respective ranges were: 0–43; 0–13.7; and 0–16.4.

Correlations between the three US indices were high. Correlation with external measures of activity was higher for the dichotomous and qualitative indices than for the semiquantitative index. The highest correlations were obtained with the physician’s overall assessment (PGA) (0.702 and 0.771), followed by DAS28 (0.694 and 0.678), SDAI (0.661 and 0.666), and finally CDAI (0.652 and 0.658) (Table 3).

Table 3 Correlation between composite ultrasound indices and external activity measures

For the reliability analysis, a sample of 44 patients with two US examinations separated by a week was used. The ICC values obtained were high for all three indices, ranging from 0.89 to 0.93 (Table 4). Treatment was not modified between the two US examinations.

Table 4 Interobserver reliability of the three indices created

Given its validity, reliability, and feasibility, index 3 was selected and is hereafter called USAS (UltraSound Activity Score).

Discussion

We have created a new tool, USAS, combining clinical, US, and biological information to measure and monitor inflammatory activity in RA patients in clinical practice (see Supplementary material for an example scoring sheet). Experts agreed upon the face validity of the components finally included in the proposed tool, as a reflection of inflammatory state rather than of other aspects of RA activity. Further development of the index emphasised the feasibility of the tool without losing validity. Finally, the validation showed good agreement with widely used indices of disease activity, and high reproducibility.

The utility of clinical parameters in decision making in RA is beyond doubt [20]. However, evidence shows both over- and underestimation of inflammatory activity in individual patients [19, 21,22,23]. US has demonstrated greater sensitivity than clinical examination for detecting synovitis, and PD is an excellent outcome marker for flare and structural damage and is sensitive to change in patients under different treatment strategies [15, 19, 24, 25]. US is not systematically used in the clinic to monitor disease activity for different reasons, mainly related to feasibility and scoring complexity. We created the USAS with the aim of making it as simple as possible, so that rheumatologists with any level of expertise in US could use it in clinical practice. For this reason, we included a large number of centres with US equipment of varying quality, and rheumatologists with sufficient qualification to perform US but without a specified level of expertise. Such heterogeneity, instead of becoming a problem, enhanced the possibility of designing an index useful in a real-life context. Other authors, such as Naredo et al., have used similar strategies for their US studies [15].

The originality of USAS lies in its combination of physical examination, laboratory, and US parameters. Most studies on US scores did not integrate other measures and show poor concordance with clinical parameters, probably because they measure distinct aspects of the patient’s clinical situation [14, 15, 26,27,28,29,30,31,32]. The DASECO is a US-based DAS28 index that uses GS and PD measurements as an alternative to tender and swollen joints by physical exam, plus the remaining parameters in the DAS28 formula [33]. The correlation between the DAS28 and the DASECO is good. However, we should bear in mind subtle differences between the DASECO and the original DAS28 evaluations: (1) in DASECO, PD is performed in the MTP joints, whereas DAS28 does not include foot joints; (2) in the DAS28 the joint count is binary (yes/no for tenderness and inflammation), while the US score is semiquantitative; and (3) DASECO differentiates between tenosynovitis and synovitis, the latter being more representative of RA, whereas this differentiation is not possible in DAS28. Our group did not aim to create a variant of the DAS28, as is done when CRP is entered instead of ESR in the formula, but to generate a reduced measure that is feasible and has sufficient face validity for inflammation. We could have used a weighted index or formula, as in the DASECO, but decided to use a sum instead, as in the CDAI. These decisions should favour uptake in the clinic, and we showed that they are valid, at least to the extent that we have validated the USAS. In addition, the time needed to perform the USAS was less than 30 min, including joint count and ultrasound examination. In our opinion, this time is feasible if we intend to carry out an exhaustive evaluation of the patient’s inflammatory state.

Compared to other US scores, USAS uses a qualitative scoring based on an easy algorithm, while others, except for the DASECO [33], use semiquantitative scores. Semiquantitative scoring may not be the most appropriate in clinical practice, because it is time consuming and there is significant variability in the assessment of different degrees of activity in GS and PD. In addition, PD is probably the best outcome parameter of inflammation in musculoskeletal US [34,35,36]. Based on this, the algorithm used in the USAS weighs PD heavily, which further increases its face validity.

Regarding the structures selected for US assessment, most of the relevant US scores include anatomical locations similar to ours [14, 15, 29, 30, 37]. We decided to add tendons to make the evaluation as complete as possible, in line with previous studies showing the need to evaluate joints and tendons together in RA patients [38,39,40]. Whether merging tendons with joints has an impact on sensitivity to change remains to be tested.

Recently, another group has created a combined clinical and ultrasound score, the US-CLARA [41]. This index is clearly different from ours. It uses different parameters, such as self-administered tender joint counts, that can be considered subjective and strongly influenced by other conditions such as osteoarthritis, fibromyalgia, or pain. Our score was specifically created to avoid this problem, and its main advantage is its utility in identifying inflammatory activity in patients in clinical remission and in helping clinicians to better identify the extent of inflammatory activity. Both scores were created to monitor RA patients, but ours is specifically designed to study the real inflammatory state in clinical practice, avoiding subjective clinical variables.

Finally, we return to the objectives of this new development. Although it needs further validation, we created the USAS with the aim of reflecting inflammation and of being used in clinical practice. The fact that USAS reflects inflammation may need further assessment, independently of the hurdles of finding an adequate gold standard, but it may be the reason why the correlation with DAS28, SDAI, and CDAI is lower than the correlation with PGA. We believe that classical activity measures are measuring something more than pure inflammation. On the other hand, we are not suggesting that USAS should be used in all patients in clinical practice, but mostly in those in whom there might be a discordance between the patient and doctor assessment, namely those in whom inflammation might not be as clear as in others. Further validation to confirm this and other hypotheses is under way. Once the validity and reliability of the index have been demonstrated, a prospective study will be carried out to assess its responsiveness, that is, its ability to detect a change in the construct of interest (activity). For methodological reasons we decided to perform the project in two phases: first, the creation and internal validation presented here; second, external validation in a prospective study to analyse the responsiveness of the index and identify levels of inflammatory activity using the USAS score. In summary, USAS combines clinical, laboratory, and simplified US assessments in a single score that is easy to perform and has good metric properties. Although further validation in other settings is needed, the USAS may improve understanding of the inflammatory process and outcome and facilitate the evaluation of RA patients in whom inflammation may be unclear.