Introduction

There have been numerous efforts to determine the prognostic factors affecting the outcome of rotator cuff repair, and many structural factors (tear size, muscle atrophy, fatty degeneration, etc) and clinical factors (age, patients’ expectations, surgeons’ experience, etc) have been proposed [2, 3, 10, 14, 16, 21, 24]. Fatty degeneration (FD) of the rotator cuff muscle is one of the factors negatively influencing functional and anatomic outcomes [7, 9, 14–16]. Furthermore, FD of the cuff muscle worsens in patients sustaining retears and reportedly is irreversible even after successful repairs [7, 27].

A semiquantitative grading system for FD was proposed by Goutallier et al. [8], and this system has been widely used [2, 3, 7, 8, 10, 14–16, 21, 22, 24]. FD originally was graded on axial CT images, but as MRI became the gold standard for evaluating cuff disorders, grading has shifted to the oblique sagittal MRI scan [6, 29]. Problems with the semiquantitative nature of the grading system of Goutallier et al. have been reported [18, 26, 28, 29], and quantitative assessment of FD using MRI and CT has been suggested, although such quantitative grading of FD is not yet widely accepted.

A few studies have reported the reliability of grading systems for FD of the rotator cuff muscles [6, 13, 22]. One recent study reported low interobserver reliability (kappa values) using the system of Goutallier et al. including various features of rotator cuff tears, such as degree of retraction and size of tear, perhaps resulting from the complexity and subjectivity of the grading system [22]. In one study, the interobserver and intraobserver reliabilities of FD in each cuff muscle were judged unacceptable, and use of an overall fatty infiltration grade, with an intraclass correlation coefficient (ICC) of at least 0.75, was recommended [13]. In contrast, acceptable agreement in rating the FD grade according to Goutallier et al. was reported by Fuchs et al. [6], with kappa values ranging from 0.68 to 0.83 on CT and 0.82 to 0.93 on MRI. Given these apparently conflicting studies, a question arises regarding the reliability of the grading systems of FD.

We therefore assessed interobserver and intraobserver reliabilities of the current semiquantitative grading of FD, presuming it would show acceptable reliability.

Materials and Methods

We enrolled 75 patients between October 2003 and July 2006 who met the following inclusion criteria: the patient had to (1) have a full-thickness rotator cuff tear verified by preoperative MR arthrography (MRA), (2) have surgery, (3) be available for postoperative CT arthrography (CTA) to evaluate cuff integrity and FD of cuff muscles, and (4) be available a minimum of 1 year after the operation. We excluded patients with partial-thickness rotator cuff tears, isolated glenohumeral abnormalities (superior labral anterior-posterior lesion, instability, etc), revision of surgical cuff repair, any previous shoulder surgery, isolated subscapularis tear, and irreparable cuff tear. There were 38 men and 37 women with a mean age of 59.35 years (range, 39–77 years; standard deviation [SD], 8.77). The dominant shoulder was involved in 61 patients (81.3%). We obtained prior Institutional Review Board approval for the study protocol.

We performed arthroscopy-assisted mini-open repair in 30 patients (40%) and all-arthroscopic repair in 45 (60%). All surgical procedures were performed by the senior author (JHO) and there was no partial repair. The anteroposterior (AP) dimension and retraction of the cuff tear were measured during arthroscopy with a probe. The mean AP dimension of the cuff tear at the footprint was 2.46 cm (range, 0.60–5.00 cm; SD, 1.36) and the mean retraction length was 2.34 cm (range, 0.50–6.60 cm; SD, 1.28). When the rotator cuff tears were categorized according to the tear size (AP dimension), there were 18 small (< 1 cm) (24%), 26 medium (1–3 cm) (34.7%), and 31 large to massive tears (> 3 cm) (41.3%).

For preoperative MRA, the contrast medium was injected into the joint with fluoroscopic guidance through an anterior approach. A 22-gauge spinal needle was placed in the glenohumeral joint, and 1 to 5 mL iodinated contrast material (Telebrix 30®; Guerbet, Villepinte, France) was used to verify intraarticular injection. Diluted gadopentetate dimeglumine (12–20 mL) (Omniscan™; GE Healthcare Amersham, Oslo, Norway) at a concentration of 2.5 mmol/L was injected. MRI was performed on a 1.5-T system (Philips Gyroscan Intera; Philips Medical Systems, Utrecht, The Netherlands). MRI was started immediately after contrast was injected (within 5 minutes of intraarticular injection). The shoulders were placed in a neutral position in a dedicated, phased-array, Flex-M coil (Philips Medical Systems). We obtained fat-suppressed T1-weighted spin-echo images in the transverse plane (TR/TE: 400–700 ms/10–20 ms; 3-mm section thickness; 0.3-mm interval; 160- × 160-mm field of view; 256- × 512-pixel matrix), in the oblique coronal plane parallel to the supraspinatus muscle, and in the oblique sagittal plane parallel to the joint surface of the glenoid. T1-weighted spin-echo images without fat suppression were obtained in the oblique coronal plane. We obtained T2-weighted spin-echo images in the coronal plane and in the oblique sagittal plane (TR/TE: 2500–3500 ms/89–100 ms). Three-dimensional gradient echo images (3DWatSc; TR/TE: 20 ms/9–10 ms; 20° flip angle) were obtained in the transverse plane.

For postoperative CTA, we injected 12 to 20 mL diluted (65%) iodinated contrast material (Telebrix 30®). CT was performed using a 16-multidetector CT system (Mx 8000 IDT; Philips Medical Systems), with the following scan parameters: tube voltage, 120 kV; tube current, 245 mAs; 1-mm slice thickness; 0.5-mm increment; beam collimation, 0.75 mm; effective pitch, 0.9; 150-mm field of view; 1024- × 512-pixel matrix. For data analysis, we generated oblique coronal, oblique sagittal, and axial reconstructions at a three-dimensional workstation with a 2-mm section thickness and no reconstruction interval for the axial images and a 2-mm reconstruction interval for oblique coronal and sagittal images. Oblique coronal images were reconstructed parallel to the supraspinatus muscle and oblique sagittal images parallel to the joint surface of the glenoid with identical section thickness and reconstruction interval.

Two musculoskeletal radiologists (JAC, YK) (experience level: 7 years and 2 years after fellowship training in musculoskeletal imaging) and three shoulder fellowship-trained orthopaedic surgeons (CHO, KHJ, JHO) (experience level: 2, 3, and 5 years) determined the FD of each rotator cuff muscle on preoperative MRA and postoperative CTA using the grading system of Goutallier et al. (Table 1). The Goutallier grades also were converted into the system of Fuchs et al. [6], which was developed to minimize choice variability and reduces the five grades of Goutallier et al. to three stages by grouping: Grades 0 and 1 are regarded as normal muscle, Grade 2 as moderately pathologic muscle, and Grades 3 and 4 as advanced degeneration (Table 1).
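The regrouping of Goutallier grades into Fuchs stages described above amounts to a fixed lookup. A minimal sketch follows; the names `GOUTALLIER_TO_FUCHS` and `to_fuchs_stage` and the short stage labels are illustrative assumptions, not part of either published system or of the study's workflow.

```python
# Hypothetical helper: regroup Goutallier grades (0-4) into the three
# Fuchs stages as described in the text. Stage labels are illustrative.
GOUTALLIER_TO_FUCHS = {
    0: "normal",    # Grades 0 and 1: normal muscle
    1: "normal",
    2: "moderate",  # Grade 2: moderately pathologic muscle
    3: "advanced",  # Grades 3 and 4: advanced degeneration
    4: "advanced",
}


def to_fuchs_stage(goutallier_grade: int) -> str:
    """Return the Fuchs stage for a single Goutallier grade."""
    if goutallier_grade not in GOUTALLIER_TO_FUCHS:
        raise ValueError(f"Goutallier grade must be 0-4, got {goutallier_grade!r}")
    return GOUTALLIER_TO_FUCHS[goutallier_grade]


# Made-up grades from one rater, for illustration only
grades = [0, 2, 4, 1, 3]
stages = [to_fuchs_stage(g) for g in grades]
```

Because the mapping is many-to-one, two raters who disagree between Grades 0 and 1, or between Grades 3 and 4, agree after conversion; disagreements across the Grade 2 boundaries are unaffected.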

Table 1 Criteria for grading fatty degeneration of rotator cuff muscles

To assess interobserver reliability, the two radiologists and three orthopaedic surgeons reviewed preoperative MRA and postoperative CTA scans and graded the FD of the cuff muscles. All five raters were aware all patients enrolled in the study had rotator cuff tears and had undergone repair. Several meetings were held before the interpretation to agree on which image cuts of the preoperative MRA and postoperative CTA should be selected for evaluation of the FD: the FDs of the supraspinatus, infraspinatus, and subscapularis muscles were measured on the oblique sagittal T1-weighted image at the level showing the coracoid base and where the spine and body of the scapula form a Y shape [6, 28, 29]. For evaluation of CTA, the same level of the sagittal reconstruction image was used (Fig. 1). All observers were free to observe all scans and choose one representative scan for the evaluation. The first ratings of the five observers, made at the same session, were used to evaluate interobserver reliability.

Fig. 1A–B For the FD evaluation, the oblique sagittal T1-weighted image was used at the level that shows the coracoid base and where the spine and body of the scapula form a Y shape in (A) MRA and (B) CTA.

To assess intraobserver reliability, all raters performed a second evaluation of the FD of the cuff muscles using the same images. At the second rating, all raters were unaware of their first ratings.

The intraobserver and interobserver reliabilities of the five raters were evaluated with the ICC, using a two-way random model with absolute agreement, for the FD grade of each cuff muscle on preoperative and postoperative images. The value can range from 0 to 1; a value close to 1 indicates high reliability of measurement. All analyses were performed using the SPSS® software package (Version 12.0; SPSS Inc, Chicago, IL).
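For readers who want to see what a two-way random, absolute-agreement ICC involves, a minimal pure-Python sketch follows. It is not the SPSS procedure used in the study. It assumes the single-measure form, ICC(2,1) in the notation of Shrout and Fleiss, because the text does not state whether single or average measures were reported. The function name `icc_2_1` and the example grades are hypothetical.

```python
def icc_2_1(ratings):
    """Single-measure ICC, two-way random model, absolute agreement.

    ratings: list of rows, one row per subject, each row holding the
    grades given to that subject by the same k raters in the same order.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: rater variance (msc) stays in the denominator
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up Goutallier grades: 5 shoulders (rows) rated by 3 observers (columns)
example_grades = [
    [0, 1, 0],
    [2, 2, 3],
    [4, 3, 4],
    [1, 1, 2],
    [3, 3, 3],
]
example_icc = icc_2_1(example_grades)
```

The sketch omits guards for degenerate input, such as every rater giving every subject the same grade, where the denominator is zero. The estimate itself can fall slightly below 0 when agreement is worse than chance.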

Results

For interobserver reliability of FD using the system of Goutallier et al., the ICC of the FD grade of the cuff muscle was higher in MRA (0.6–0.72) than in CTA (0.43–0.6) (Table 2). We found similar reliability using the system of Fuchs et al. (Table 3), with no difference in interobserver reliability between the two grading systems. The ICC for interobserver reliability of FD was higher and less variable for the radiologists (0.58–0.78) than for the orthopaedic surgeons (0.32–0.68) (Table 4). This pattern was consistent across cuff muscles and imaging modalities.

Table 2 Interobserver reliability for grading fatty degeneration*
Table 3 Interobserver reliability for grading fatty degeneration*
Table 4 Interobserver reliability for grading fatty degeneration

The intraobserver reliabilities of FD in each rotator cuff muscle were variable between individuals (ICC, 0.26–0.81) (Table 5). Observers with higher levels of experience tended to have higher and more consistent ICC values.

Table 5 Intraobserver reliability for grading fatty degeneration

Discussion

FD in rotator cuff tears has received attention as a prognostic factor affecting anatomic and functional outcomes [7, 9, 14–16, 20]. The reliability of the classification system is a prerequisite for communication and comparison of results between studies. Despite the recent emphasis on FD of the cuff muscle, there have been only a few studies focusing on the reliability of the system of Goutallier et al. [6, 13, 22]. Therefore, the main purpose of our study was to evaluate the reliability of the most widely used classification in MRA and CTA by radiologists and orthopaedic surgeons.

There are limitations to our study. First, we used different modalities preoperatively and postoperatively, which did not allow us to evaluate changes of the FD. Correlation of the FD between MRI and CT has been reported as only fair to moderate, and a change of evaluation modality in the same patient during followup or the use of both methods in a trial is not recommended [6]. The FD tended to have a higher grade on MRI than on CT because the connective tissues were read as muscle on CT. In addition, using CTA for postoperative evaluation might account for the low reliability in CTA because, in general, postoperative change makes evaluation difficult. Nevertheless, we cautiously believe the reconstructed sagittal CTA images used for evaluation of FD are well medial to the repair site and, with at least 1 year elapsed after surgery, postoperative change might have little influence on evaluating the FD and analyzing the reliability of FD grading. There were several reasons why we adopted multidetector CTA for postoperative cuff evaluation. For earlier repairs, we used a metal anchor, and metal artifacts are one of the well-known drawbacks of MRI. The interpretation of cuff integrity is more evident with CTA, especially near the anchor. Moreover, CTA is much less expensive than MRI and is not examiner-dependent, unlike ultrasonography. Second, we included only rotator cuff tears in the study and observers were not blinded to this fact. Grading of FD might have been influenced by recognizing the presence of the tear and its size. However, FD is an important prognostic factor in patients with rotator cuff tears and is not evident in disorders other than cuff tears. We wanted to focus on patients with full-thickness rotator cuff tears and enrolled consecutive patients under strict inclusion and exclusion criteria, although it was not a randomized procedure. Finally, there might be a bias among observers when evaluating postoperative CTA. For example, surgeons tend to think positively of surgical approaches, so the postoperative FD evaluation might have been biased. In the intraobserver analysis, ICC values of CTA were close to those of MRA; however, they were consistently lower in the interobserver analysis.

We did not find high interobserver reliability, even with MRA (ICC, 0.6–0.72), which still was better than CTA (ICC, 0.43–0.6). The system of Fuchs et al. simplifies the grades into three to reduce the complexity and subjectivity, but we found the interobserver reliability was similar to that for the system of Goutallier et al. The interobserver reliability of FD was higher and less variable for radiologists than for orthopaedic surgeons. The intraobserver reliability also varied among the examiners (ICC, 0.26–0.81). There were unequal levels of experience among the radiologists and orthopaedic surgeons who participated in the study. The level of experience might affect the results of reliability, but we believe it is not an absolute factor for higher reliability. The more experienced radiologist tended to show more consistent results. An experienced radiologist might be better at evaluating the FD of the rotator cuff muscle, because orthopaedic surgeons might have a bias as surgeons, although it depends on personal experience or skill.

The interpretation of ICC values is also debatable. The acceptable threshold of ICC values varies from study to study and from system to system. In the studies of Escolar et al. [4] and Fleiss [5], 0.75 was established as a threshold value for reliability. The criteria of Cicchetti and Sparrow [1] were 0.00 to 0.39 as poor; 0.40 to 0.59 as fair; 0.60 to 0.74 as good; and 0.75 to 1.00 as excellent. The reliability of a given evaluation system is not absolute, so whether it is clinically acceptable depends on the clinical importance of the situation. However, we believe current FD evaluation of the rotator cuff muscles has problems with interobserver and intraobserver reliability, and a more reliable and objective system is necessary.
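The Cicchetti and Sparrow bands quoted above can be written as a simple threshold lookup, which makes it easy to see where a given ICC falls. This is a minimal sketch; the function name `interpret_icc` is hypothetical, and treating negative estimates as poor is an assumption, since the quoted bands start at 0.00.

```python
def interpret_icc(icc: float) -> str:
    """Label an ICC using the Cicchetti and Sparrow bands quoted in the text."""
    if icc > 1.0:
        raise ValueError(f"ICC cannot exceed 1, got {icc!r}")
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    # 0.00 to 0.39; negative estimates are also treated as poor (assumption)
    return "poor"


# Endpoints of the interobserver ICC ranges reported in the Results
labels = {value: interpret_icc(value) for value in (0.43, 0.60, 0.72)}
```

Under these bands, the interobserver range for CTA (0.43–0.6) spans fair to good and the range for MRA (0.6–0.72) is good, but neither reaches the 0.75 threshold of Escolar et al. and Fleiss.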

We believe the current FD grading system has several drawbacks. The FD of each cuff muscle usually is measured on one cross-sectional image rather than over the whole muscle belly. Relatively distally located evaluation points may not represent the condition of the entire muscle well. Fatty infiltration after rotator cuff tear is a progressive process that increases with time, and in a rabbit torn cuff model it progressed from the musculotendinous junction toward the muscle origin [19]. The FD of the cuff therefore might be overestimated when the evaluation is performed on one cut section near the glenoid. Furthermore, the cross-sectional areas of the muscle may be highly and directly influenced by retraction of the musculotendinous junction of the torn rotator cuff [25, 26, 29]. If a repair were successful and the retracted tendon end brought back to the footprint, the muscle belly would be lateralized, and this might affect evaluation of FD. The postoperative FD would appear improved because the measurement point is the same even in successful repair. This problem could be overcome by integration of fatty content or the muscle-occupying ratio with multiple cross-sectional images. However, taking multiple cross-sectional images toward the medial part of the muscle is problematic with MRI. It enlarges the field of view and might reduce accuracy in other areas. In addition, more time is required to obtain a wider view.

Therefore, there have been attempts to develop a more objective method to measure muscle quantity and quality [12, 26, 28, 29]. Measuring the mean muscle density in Hounsfield units on CT (ICC, 0.98) was superior to using a visual rating (ICC, 0.63) [28]. Muscle volume measured in cadavers using MRI with three-dimensional image analysis software correlated well with the actual volume after dissection, with intraobserver and interobserver variabilities less than 4% [26]. Two-dimensional SPLASH MRI and proton MR spectroscopy also were developed to quantify fat content of the rotator cuff muscles [12, 17]. Fatty atrophy also was evaluated with ultrasonography [11, 23]. However, these methods are not readily available in general practice. Some of the methods need specialized software, and others need spectroscopy, which is not familiar in the orthopaedic field.

The reliability of the measuring system for FD of the rotator cuff muscles is a critical aspect of evaluation, and the current grading system does not fully meet the requirement. We recommend caution when interpreting FD grading among studies. FD is considered one of the most important outcome predictors of rotator cuff tear. Given its importance there should be a more reliable and accurate grading system for FD of the cuff muscles.