Rheumatoid arthritis (RA) is a systemic inflammatory disease, which predominantly affects the synovial joints resulting in synovitis and often subsequent progressive bone and cartilage destruction. Magnetic resonance imaging (MRI), with its excellent soft tissue contrast, is a sensitive modality for the detection of active synovitis and bone erosions in RA.

Intravenous (IV) administration of paramagnetic gadolinium-containing contrast agents (Gd) makes the inflamed synovium easy to recognize and evaluate, and IV Gd is generally recommended for MRI assessment of synovitis in RA [1–4]. However, the use of intravenous Gd adds to the overall examination time and cost and may, although rarely, induce side effects [5–7]. It therefore reduces the feasibility of MRI in RA. Omission of Gd injection would allow imaging of more joints, which could potentially provide information that better reflects the overall disease status.

Some MRI sequences, such as T2-weighted fat-saturated (T2w FS) sequences and short tau inversion recovery (STIR) sequences, display areas with a high water content as bright areas. Thus, oedematous areas within the inflamed synovium show increased signal intensity on such fluid-sensitive sequences, and evaluation of synovitis of the wrist and joints on MRI with no contrast injection has been reported [8, 9]. However, only limited data exist regarding the reliability of synovitis scoring of the hand in RA patients using these fluid-sensitive sequences (STIR and T2w FS) [10], and it has not been investigated how field strength, coil type, and image resolution influence synovitis assessment.

The aim of the present study was, by comparison with Gd-enhanced MRI as the standard reference method, to explore to which extent synovitis in wrist and metacarpophalangeal (MCP) joints can be reliably assessed by MRI without Gd injection at different MRI field strengths (0.23 T, 0.6 T, 1.5 T, and 3.0 T), coil types (flex coils and dedicated phased-array extremity coils) and image resolutions.

Materials and methods

Study population

Forty-one patients and 12 healthy controls were included in the study. All patients fulfilled the American College of Rheumatology (ACR) 1987 criteria for RA.

Inclusion criteria

To be included in the study, patients and controls had to be 18–85 years old and have no contraindications to administration of MRI contrast agent. Furthermore, patients had to have ≥1 clinically swollen wrist or MCP joint.

Exclusion criteria

Patients with any changes in conventional or biological disease-modifying anti-rheumatic drugs (DMARDs) or prednisolone or any glucocorticoid injections within the last 30 days could not be included. Healthy controls were excluded if they had joint pain or familial disposition to RA or psoriatic arthritis.

Imaging procedure

Each subject underwent MRI examinations of the wrist and second to fifth MCP joints, within a period of 24 hours, on four MRI units with different field strengths: a 0.23 T Philips Panorama open unit, a 0.6 T Philips Panorama open unit, a 1.5 T Philips Achieva conventional unit, and a 3.0 T Philips Achieva conventional unit.

In total, participants had seven different coronal STIR sequences obtained at the four different MRI units, with different coil types and voxel sizes. At all field strengths a STIR sequence was obtained with a flex coil. At 0.6 T and 1.5 T, an additional STIR sequence was obtained with a three-channel (0.6 T) or four-channel (1.5 T) phased-array dedicated wrist coil (extremity coil). At 1.5 T a further STIR sequence was obtained with the same four-channel phased-array dedicated wrist coil, but using a smaller voxel size of 0.5 × 0.5 × 2 mm (other parameters were adjusted to retain a comparable scan time). MR-acquisition parameters are presented in Table 1. Coronal T1-weighted (T1w) gradient echo images (without contrast injection) were obtained at the four different MRI units using flex coils at 0.23 T and 3 T and dedicated wrist coils at 0.6 T and 1.5 T. The 1.5 T MRI was always performed last, and on this unit the final procedure was IV injection of Gd-containing contrast agent (Dotarem®, Guerbet; France; 0.1 mmol/kg body-weight), followed by repetition of the T1w sequence immediately after contrast injection.

Table 1 Magnetic resonance parameters of the short tau inversion recovery (STIR) sequence on the different field strengths, coils, and resolutions

Fifteen subjects (ten patients/five controls) did not undergo post-Gd MRI evaluation. In another patient, the second to fifth MCP joints were not included in the post-Gd study. Data from these subjects were only used for calculation of agreement between the STIR scores and were excluded from the remaining statistical analyses.

Image analysis and scoring

The STIR sequences were anonymised and all MRI data (MRI unit type and parameters) were deleted. Each of the seven STIR sequences was paired with the corresponding T1w sequence for anatomical reference. For the second and third reads (see below) images were re-anonymised. All anonymisations were performed by a person not involved in the reads.

Examinations were evaluated by a musculoskeletal radiologist (IE) with 8 years of experience in RA imaging and OMERACT RAMRIS scoring, blinded to all patients' and controls' clinical and MRI details. In the first read, the reader evaluated the coronal STIR images with the corresponding pre-contrast T1w sequence for anatomical reference and scored synovitis according to the principles of the RAMRIS system [11]. According to this method, each of the second to fifth MCP joints, the distal radio-ulnar joint, the radio-carpal joint, and the intercarpal (IC)-carpometacarpal (CMC) joints is scored on a 0–3 scale (normal, mild, moderate, severe) [11].
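As a minimal sketch of how the per-joint grades combine into a per-subject sum score, the following Python snippet assumes seven scored sites (MCP joints 2–5 plus the three wrist regions listed above), giving a possible range of 0–21. The site labels and the example grades are hypothetical and serve only as illustration; they are not study data.

```python
# Sites scored for synovitis (labels are illustrative, not from the study files).
SITES = (
    "MCP2", "MCP3", "MCP4", "MCP5",
    "distal_radioulnar", "radiocarpal", "intercarpal_CMC",
)


def synovitis_sum_score(grades):
    """Return the per-subject sum of 0-3 synovitis grades over all sites."""
    missing = [site for site in SITES if site not in grades]
    if missing:
        raise ValueError(f"missing sites: {missing}")
    for site in SITES:
        if grades[site] not in (0, 1, 2, 3):
            raise ValueError(f"grade for {site} must be 0, 1, 2 or 3")
    return sum(grades[site] for site in SITES)


# Hypothetical subject (assumed values for illustration only).
example = {
    "MCP2": 2, "MCP3": 1, "MCP4": 0, "MCP5": 0,
    "distal_radioulnar": 1, "radiocarpal": 2, "intercarpal_CMC": 1,
}
total = synovitis_sum_score(example)
```

Rejecting incomplete records rather than summing whatever is present mirrors the study's choice not to impute missing data.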

In the second read, performed in a subsequent reading session, the reader was blinded to the results of the STIR reads. This read included the T1w post-contrast sequences and the corresponding pre-contrast T1w sequences for reference. These RAMRIS scores were considered the “standard reference” scores. OMERACT RAMRIS recommendations require two planes for scoring of erosion [11–13], and as T1-weighted images were only obtained in one plane, bone erosion could only be assessed suboptimally, and data on bone erosions are consequently not reported. Results concerning bone marrow oedema have been reported elsewhere [14].

After a one-month interval a third reading session was performed. This included an evaluation of intrareader reliability on a cohort of 15 patients and three controls randomly selected by a person not involved in the reads. For feasibility reasons not all patients and controls were included in this read. The patients in this cohort were re-anonymised and rescored for synovitis on each of the STIR sequences and, separately in a subsequent reading session, the T1w-post-Gd sequences were scored by the same reader.

Statistical analysis

The scores obtained from the T1w-post-Gd MR images recommended by OMERACT RAMRIS for scoring of synovitis [3, 11, 15] were used as gold standard reference.

Statistical analyses were performed on the total sum-score of synovitis per subject and on the synovitis score per joint.

The intrareader reliability of synovitis scores in 15 patients and three controls was assessed by single-measure intraclass correlation coefficients (ICCs) based on a two-way mixed effects model. ICC values were interpreted as follows: <0.2, poor; ≥0.2 and <0.4, fair; ≥0.4 and <0.6, moderate; ≥0.6 and <0.8, high; ≥0.8, very high agreement. ICCs were also used to correlate the total synovitis score of each of the STIR sequences with the score from the post-Gd standard reference and between themselves. The non-parametric Wilcoxon signed-rank test was used to evaluate differences between scores.
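For readers who wish to reproduce this type of analysis, the sketch below computes a single-measure ICC from a two-way model and maps it onto the interpretation bands used here. It assumes the consistency form, often written ICC(3,1); the text does not state whether the consistency or the absolute-agreement definition was used, so this choice is an assumption. The function names are hypothetical.

```python
def icc_two_way_mixed_single(ratings):
    """Single-measure, two-way mixed, consistency ICC (often written ICC(3,1)).

    ratings: one row per subject, each row holding k scores
    (e.g. first read and second read, or STIR and post-Gd).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects with the same >= 2 scores each")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret_icc(icc):
    """Map an ICC onto the agreement bands used in this study."""
    if icc < 0.2:
        return "poor"
    if icc < 0.4:
        return "fair"
    if icc < 0.6:
        return "moderate"
    if icc < 0.8:
        return "high"
    return "very high"
```

Because the consistency form ignores a constant offset between the two sets of scores, a sequence that systematically scores lower than the reference can still yield a high ICC; differences in level are what the Wilcoxon signed-rank test addresses.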

Agreement rates for the presence/absence of synovitis were calculated as sensitivity, specificity, and accuracy for the different STIR protocols compared to the gold standard reference. The percentage exact agreement joint-by-joint was also calculated. P-values <0.05 were considered statistically significant. No data imputation was done.
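The agreement measures described above can be sketched as follows. This illustration assumes that synovitis is counted as present when the score is greater than 0, which the text implies but does not state explicitly. The function name and the example scores are hypothetical, not study data.

```python
def agreement_metrics(stir_scores, reference_scores):
    """Compare 0-3 STIR scores with reference (post-Gd) scores.

    Presence of synovitis is assumed to mean score > 0.
    """
    if not stir_scores or len(stir_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")

    tp = fp = tn = fn = exact = 0
    for stir, ref in zip(stir_scores, reference_scores):
        stir_pos, ref_pos = stir > 0, ref > 0
        if stir_pos and ref_pos:
            tp += 1
        elif stir_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1
        if stir == ref:
            exact += 1

    n = len(stir_scores)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / n,
        "exact_agreement": exact / n,
    }


# Hypothetical joint scores for illustration only.
stir = [0, 1, 1, 2, 0, 3, 0, 1]
post_gd = [0, 1, 2, 2, 1, 3, 0, 0]
metrics = agreement_metrics(stir, post_gd)
```

In this made-up example the accuracy for presence versus absence (0.75) exceeds the exact agreement on the grade (0.625), since a joint scored 1 on one sequence and 2 on the other counts as agreement on presence but not on grade.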

The study was approved by the regional committee on biomedical research ethics for the capital region of Denmark and the Danish Data Protection Agency. Each participant signed informed consent before inclusion.

Results

Patient demographics and clinical and biochemical data are given in Table 2.

Table 2 Demographic, clinical, and biochemical data for patients and healthy subjects

The duration of synovitis scoring per patient was on average 10 minutes, and was similar for the unenhanced and enhanced image sets.

Synovitis scores in patients and controls

Synovitis scores according to the different units and sequences are provided in Table 3 and Fig. 1 (examples in Fig. 2). For all sequences, the synovitis scores were higher in patients than in healthy controls (p < 0.01). The scores using the T1w-post-Gd sequence (gold standard reference; median 6.5/mean 7.42) were higher than with any of the STIR sequences (median 3–6.5/mean 4.22–6.03) (p < 0.05), except for the 1.5 T-flex (p = 0.089) and 1.5 T-extremity with small voxel size (p = 0.251) STIR sequences.

Table 3 Synovitis scores obtained using short tau inversion recovery (STIR) sequences from different magnetic resonance imaging field strengths and coils and from the 1.5 T T1-weighted post-Gd sequence (the gold standard reference)
Fig. 1

Box-plot with mean (2SD) synovitis scores obtained using the STIR sequences from different magnetic resonance imaging (MRI) field strengths and coils and from the 1.5 T T1-weighted post-Gd sequence (the gold standard reference). Gd=gadolinium; f=flex coil; ex=extremity coil; sm=small voxel; SD=standard deviation; *=gold standard reference

Fig. 2

Magnetic resonance images (MRIs) performed at different field strengths and using different coils of the second metacarpophalangeal (MCP) joint of a patient with rheumatoid arthritis. The synovitis score is highest on the post-contrast T1-weighted (T1 Gd) image, with lower scores in the other protocols. *Synovitis scores are based on an assessment of several continuous slices, not only the ones displayed here. RA=rheumatoid arthritis, Gd=gadolinium, f=flex coil, ex=extremity coil, sm=small voxel size

Using single-measure intraclass correlation coefficients (ICCs) based on two-way mixed effects models, the correlation of synovitis scores, on both the STIR and the T1w-post-Gd sequences, with the patients' swollen joint count (MCP 2–5 and wrist joint) was low (ICC: -0.2 to +0.12).

Intrareader reliability

The intrareader reliability of total synovitis scores (for patients and controls) was very high (ICCs ≥ 0.80) for the T1w-post-Gd sequence and for the 3 T STIR sequence, while moderate-high for the remaining STIR sequences (ICCs 0.50–0.76) (Table 3). The agreement on presence versus absence of synovitis was high both when calculated by person level (≥0.87) and by joint level (≥0.80). The proportion of joint scores with exact agreement on the score was also high (0.62–0.74).

Agreement with gold standard reference (T1w-post-Gd MRI)

Tables 4 and 5 provide the ICCs between scores obtained by the different STIR sequences and the T1w-post-Gd sequence. The sensitivity, specificity, and accuracy of the different STIR sequences, with the 1.5 T T1w-post-Gd sequences considered as gold standard reference, are also provided. There was fair to high agreement of synovitis scores by the different STIR protocols with the T1w-post-Gd standard reference score, when measured by the ICC (0.38–0.72). The highest correlation was observed with the 1.5 T STIR using the extremity coil with standard or small voxel size (ICCs 0.67 and 0.72, respectively). Lower values were observed for 0.23 T and 3 T with flex coils. Values were generally numerically higher using extremity coils than flex coils. The sensitivity and specificity on the person level were generally high. The same was true on the joint-by-joint level, as described in the following paragraph.

Table 4 Intrareader reliability and agreement of scoring on the different magnetic resonance imaging field strengths and coils using short tau inversion recovery (STIR) sequences with gold standard reference (1.5 T T1-weighted post-Gd sequence) scores
Table 5 Intraclass correlation coefficients (ICCs) of STIR synovitis scorings using different MRI field strengths and coils

The accuracy for the presence of synovitis per joint ranged from 0.70 to 0.83, whereas absolute agreements on scores were lower (0.50–0.66), being highest on the 1.5 T unit with extremity coil and small voxels and lowest at 3 T.

When discrepancy occurred in the scores, the majority of joints scored higher on T1w-post-Gd sequence compared to any of the STIR sequences. When for instance the 0.23 T unit was compared to the T1w-post-Gd, 60 % of joints (155/257) were scored the same, 36 % (93/257) scored higher on T1w-post-Gd and the remaining 4 % scored higher at 0.23 T. This tendency was present for all STIR protocols.
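The breakdown above can be checked with a few lines of Python. The sketch below classifies paired joint scores by direction of disagreement and then recomputes the percentages from the counts reported for the 0.23 T comparison; the count of 9 joints scoring higher at 0.23 T is derived as 257 − 155 − 93 and is not stated explicitly in the text. The function names are hypothetical.

```python
def discrepancy_breakdown(stir_scores, gd_scores):
    """Count joints scored the same, higher on post-Gd, or higher on STIR."""
    if len(stir_scores) != len(gd_scores):
        raise ValueError("score lists must have equal length")
    same = sum(s == g for s, g in zip(stir_scores, gd_scores))
    gd_higher = sum(g > s for s, g in zip(stir_scores, gd_scores))
    stir_higher = sum(s > g for s, g in zip(stir_scores, gd_scores))
    return same, gd_higher, stir_higher


def as_percentages(counts):
    """Convert counts to whole-number percentages of their total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Counts reported for 0.23 T versus T1w-post-Gd (third value derived).
reported = (155, 93, 257 - 155 - 93)
percentages = as_percentages(reported)
```

With these counts the rounded percentages are 60, 36, and 4, matching the figures given in the text.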

Discussion

Since contrast injection increases invasiveness, duration, and costs of MRI, it is very important to clarify to what extent and with which technique MRI without contrast injection can be used for reliable assessment of synovitis in RA. If one or more unenhanced techniques were found reliable, gadolinium-contrast injection could then be avoided in RA and other arthritides. The present study is the first systematic evaluation of unenhanced MRI for scoring RA synovitis, which applies the entire range of relevant field strengths and various coil types. When T1w-post-Gd MRI was considered the gold standard reference, STIR sequences provided fair to high agreement concerning scoring of synovitis. The accuracy for detection of synovitis (regardless of score) was quite high in all protocols used, whereas the agreement was lower on absolute scores. The highest ICCs, sensitivity, and accuracy were seen using the 1.5 T unit with an extremity coil, even though comparable values were seen for several other protocols. Overall, the intraobserver reliability and agreement with post-contrast MRI were better for 0.6, 1.5, and 3.0 Tesla than for 0.23 Tesla, and better for extremity coils than for flex coils.

Sufficient reproducibility is a prerequisite for a reliable scoring method. A high intrareader reliability for detection of synovitis on the T1w-post-Gd sequence has previously been demonstrated by the OMERACT group [16]. Agreement on presence versus absence of synovitis on the patient level was in the present study also high for the STIR sequences, whereas absolute agreement on the score was lower. This tendency was also seen for interreader agreement by Østergaard et al. evaluating synovitis on STIR sequences on 1.0 and 1.5 T MRI units [10].

In the current study the sensitivity of unenhanced water-sensitive (STIR) sequences for the presence of synovitis was between 80 % and 100 %, with post-contrast MRI as gold standard reference, as determined by an experienced reader. Specificity values ranged from 43 % to 100 %, and the accuracy was 50 % to 100 %. Similar values were seen in previous studies both on conventional strength magnets such as a 1.5 T unit (sensitivity = 77.8 %, specificity = 49.7 %, accuracy = 65.3 %) [17] and a 1.0 T unit (87 %, 42 %, 83 %, respectively) [10], and on a lower field strength (0.2 T) magnet (60 %, 96 %, 76 %, respectively). Exact agreement on scores was not evaluated in these studies. Exact agreement between STIR and T1w-post-Gd sequences for synovitis scores in our study was higher for grade 3 than for other grades; e.g., STIR readings at 0.6 T using flex coil had exact agreements of 67 % for grade 0, 37 % for grade 1, 36 % for grade 2, and 80 % for grade 3. This may be explained by the fact that small amounts of synovial fluid can be seen in normal joints without synovitis, complicating the differentiation from low-grade synovitis, while a large amount is easy to detect. Still, only enhancement after contrast injection can differentiate between fluid and inflamed synovium, and the use of the STIR sequence for this purpose may result in false-negative or false-positive results. However, it should be remembered that an indirect arthrographic effect can be seen on delayed contrast-enhanced images [18], and fibrotic synovium may show low signal intensity on T2w images and no enhancement on post-contrast T1w images [19]. Based on the current knowledge, synovitis can to some extent be detected without gadolinium injection by using STIR sequences. However, for maximal sensitivity, accurate scoring, and presumably also for sensitive evaluation of changes over time (which was not assessed in the present study), T1w sequences with contrast injection are essential.

The fact that synovitis can be assessed to some extent without contrast injection is important in the sense that this may allow diagnosis of joint inflammation in patients not suspected of joint inflammation, in whom contrast injection is generally not done [20].

In the current study, low-grade MRI synovitis was also detected in some healthy controls. This is in agreement with previous studies [21].

To assess the importance of field strength and coil type, we evaluated different MRI field strengths with extremity or flex coils and also an additional higher resolution sequence with smaller voxel size. Flex coils in general resulted in lower ICC values and lower synovitis scores compared to dedicated extremity coils. This was expected since dedicated extremity coils are specifically designed for the wrist and hand, while the flex coils are multipurpose and not specific for the hands [22]. The scores from the 1.5 T STIR protocol with smaller voxel size were most similar to 1.5 T post-Gd scores and had higher ICC values (0.72) and accuracy (66 %) compared to the other 1.5 T STIR protocols (conventional voxel size: extremity coil ICC = 0.67, accuracy = 60 %; flex coil: ICC = 0.42, accuracy = 57 %). This indicates that higher resolution and well-tailored sequences produce higher quality images that are easier to evaluate and score.

In the current study, readings were generally more reliable, as assessed by ICC scores, on the 1.5 T unit compared to lower field strengths, when post-contrast MRI was considered the standard reference, whereas absolute agreement on scores was comparable between all protocols. The agreement of STIR MRI with post-contrast T1w MRI was in our study relatively low compared to correlations observed in studies evaluating the agreement between synovitis scoring with T1w-post-Gd sequences on different field strength magnets. For example, Naraghi et al. [23] observed an ICC of 0.90 for synovitis scoring between 1.0 T and 1.5 T units, and Ejbjerg et al. [24] observed a kappa value of 0.92 between 0.2 T and 1.5 T. These differences again suggest that the STIR sequence can help appreciate major changes in synovitis, but is not sufficiently reliable and sensitive to detect minor differences in synovial hypertrophy. Scoring according to the RAMRIS method is time consuming and not widely used in clinical practice. However, the assessment of whether synovitis is present or absent is also feasible in clinical practice.

In the current study, synovitis was evaluated by an experienced reader. With a less experienced observer, the results may have been less reliable. It has been suggested that computer-aided techniques might add to objectivity and reliability of scoring [25, 26] and could consequently be of value in the evaluation of synovitis in clinical practice and research. The role of computer-aided techniques for non-enhanced images needs further evaluation.

The fact that the 3.0 T unit had the lowest synovitis scores, the lowest ICC values (0.38), and the lowest accuracy (50 %) compared to the standard of reference was unexpected. The 3.0 Tesla magnet offers a higher signal to noise ratio (SNR) compared to the lower field magnets. However, along with the gain in SNR, there is an increase in magnetic field heterogeneity. Thus, centre positioning of the imaged organ is a major issue in reducing field heterogeneity in 3 T units, compared to lower field strength units [22, 27]. We therefore believe the lower values on the 3 T unit result from the use of the flex coil and suboptimal hand positioning. In contrast, the 0.23 T and the 0.6 T units are dedicated for imaging of the extremities, and their C-shaped design enables imaging the hand and wrist in the centre of the magnet where the homogeneity is optimal. This probably was the main cause of the higher ICC and accuracy values on these units compared to the 3 T unit.

Several methodological limitations should be considered. First, images were evaluated by one reader only. More readers could have brought additional evidence on the generalisability of the results. Also, intrareader reliability was assessed only on a subset of patients and not over the entire cohort. However, this was done due to feasibility and because determining interreader agreement was not the main focus of this work. Additionally, interreader and intrareader agreements of unenhanced and enhanced MRI have previously been demonstrated in studies involving our research group [10]. Secondly, STIR protocols were not identical on the different MRI units, because of built-in differences between the MRI units. Moreover, it is likely that further optimization of the MRI protocol, especially on the 3 T unit (if possible including positioning of the hand in the isocenter of the magnet), would have improved the results. Finally, MRI was performed at one time-point only. A future longitudinal follow-up study would be useful to assess the sensitivity to change of the different imaging approaches.

In conclusion, unenhanced MRI using STIR sequences is only moderately reliable for assessing synovitis in RA MCP and wrist joints when contrast-enhanced MRI is considered the gold standard reference. Contrast injection, field strength, and coil type influence synovitis assessment, and should be considered before performing MRI in clinical trials and practice.

The best results were obtained at the 0.6 T and 1.5 T MRI units and by using extremity coils. Optimizing scanning protocols, including the use of dedicated extremity coils rather than flex coils, is important for reliable scoring of synovitis in wrists and hands of patients with RA.