Abstract
Fatty degeneration of the rotator cuff muscles is considered one of the most important factors for the outcomes of cuff repair. However, the reliability of the grading system is not well validated. Two specialists in musculoskeletal radiology and three shoulder fellowship-trained orthopaedic surgeons reviewed the fatty degeneration grades of each cuff muscle in 75 consecutive full-thickness cuff tears. Fatty degeneration grades were assessed according to the systems of Goutallier et al. and Fuchs et al. using preoperative MR and postoperative CT arthrographies. The intraclass correlation coefficient was analyzed to assess interobserver and intraobserver reliabilities. For interobserver reliability using the system of Goutallier et al., the intraclass correlation coefficient was higher in MR arthrography (0.6–0.72) than in CT arthrography (0.43–0.6) and higher for radiologists (0.58–0.78) than for orthopaedic surgeons (0.32–0.68). There was no difference between the systems of Goutallier et al. and Fuchs et al. Intraobserver reliabilities showed a similar pattern (0.26–0.81), but the level of experience should be considered. Although the system of Goutallier et al. is most widely used in orthopaedics, reported data should be interpreted carefully because of the relatively low reliability.
Level of Evidence: Level III, diagnostic study. See the Guidelines for Authors for a complete description of levels of evidence.
Introduction
There have been numerous efforts to determine the prognostic factors affecting the outcome of rotator cuff repair, and many structural (tear size, muscle atrophy, fatty degeneration, etc) and clinical factors (age, patients’ expectation, and surgeons’ experience, etc) have been proposed [2, 3, 10, 14, 16, 21, 24]. Fatty degeneration (FD) of the rotator cuff muscle is one of the factors negatively influencing functional and anatomic outcomes [7, 9, 14–16]. Furthermore, FD of the cuff muscle is worsened in patients sustaining retears and reportedly is irreversible even in successful repairs [7, 27].
A semiquantitative grading system for FD was proposed by Goutallier et al. [8], and this system has been widely used [2, 3, 7, 8, 10, 14–16, 21, 22, 24]. FD originally was estimated on axial CT images, but as MRI became the gold standard for evaluating cuff disorders, assessment has shifted to oblique sagittal MRI scans [6, 29]. Problems with the semiquantitative nature of the grading system of Goutallier et al. have been reported [18, 26, 28, 29], and quantitative assessment of FD using MRI and CT has been suggested, although such a quantitative grading of FD is not yet widely accepted.
A few studies have reported the reliability of grading systems for FD of the rotator cuff muscles [6, 13, 22]. One recent study reported low interobserver reliability (kappa values) for the system of Goutallier et al. as well as for various features of rotator cuff tears, such as degree of retraction and size of tear, perhaps resulting from the complexity and subjectivity of the grading system [22]. In one study, the interobserver and intraobserver reliabilities of FD in each cuff muscle were judged unacceptable, and use of an overall fatty infiltration grade, with an intraclass correlation coefficient (ICC) of at least 0.75, was recommended [13]. In contrast, acceptable agreement in rating the FD grade according to Goutallier et al. was reported by Fuchs et al. [6], with kappa values ranging from 0.68 to 0.83 on CT and 0.82 to 0.93 on MRI. Given these apparently conflicting studies, a question arises regarding the reliability of the grading systems of FD.
We therefore assessed interobserver and intraobserver reliabilities of the current semiquantitative grading of FD, presuming it would show acceptable reliability.
Materials and Methods
We enrolled 75 patients between October 2003 and July 2006 who met the following inclusion criteria: the patient had to (1) have a full-thickness rotator cuff tear verified by preoperative MR arthrography (MRA), (2) have surgery, (3) be available for postoperative CT arthrography (CTA) to evaluate cuff integrity and FD of cuff muscles, and (4) be available a minimum of 1 year after the operation. We excluded patients with partial-thickness rotator cuff tears, isolated glenohumeral abnormalities (superior labral anterior-posterior lesion, instability, etc), revision of surgical cuff repair, any previous shoulder surgery, isolated subscapularis tear, and irreparable cuff tear. There were 38 men and 37 women with a mean age of 59.35 years (range, 39–77 years; standard deviation [SD], 8.77). The dominant shoulder was involved in 61 patients (81.3%). We obtained prior Institutional Review Board approval for the study protocol.
We performed arthroscopy-assisted mini-open repair in 30 patients (40%), and all-arthroscopic repair in 45 (60%). All surgical procedures were performed by the senior author (JHO) and there was no partial repair. The anteroposterior (AP) dimension and retraction of the cuff tear were measured during arthroscopy with a probe. The mean AP dimension of the cuff tear at the footprint was 2.46 cm (range, 0.60–5.00 cm; SD, 1.36) and the mean retraction length was 2.34 cm (range, 0.50–6.60 cm; SD, 1.28). When the rotator cuff tears were categorized according to the tear size (AP dimension), there were 18 small (< 1 cm) (24%), 26 medium (1–3 cm) (34.7%), and 31 large to massive tears (> 3 cm) (41.3%).
For preoperative MRA, contrast medium was injected into the joint with fluoroscopic guidance through an anterior approach. A 22-gauge spinal needle was placed in the glenohumeral joint, and 1 to 5 mL iodinated contrast material (Telebrix 30®; Guerbet, Villepinte, France) was used to verify intraarticular injection. Diluted gadopentetate dimeglumine (12–20 mL) (Omniscan™; GE Healthcare Amersham, Oslo, Norway) at a concentration of 2.5 mmol/L was injected. MRI was performed on a 1.5-T system (Philips Gyroscan Intera; Philips Medical Systems, Utrecht, The Netherlands). MRI was started immediately after contrast was injected (within 5 minutes of intraarticular injection). The shoulders were placed in a neutral position in a dedicated, phased-array, Flex-M coil (Philips Medical Systems). We obtained fat-suppressed T1-weighted spin-echo images in the transverse plane (TR/TE: 400–700 ms/10–20 ms; 3-mm section thickness; 0.3-mm interval; 160- × 160-mm field of view; 256- × 512-pixel matrix), in the oblique coronal plane parallel to the supraspinatus muscle, and in the oblique sagittal plane parallel to the joint surface of the glenoid. T1-weighted spin-echo images without fat suppression were obtained in the oblique coronal plane. We obtained T2-weighted spin-echo images in the coronal plane and in the oblique sagittal plane (TR/TE: 2500–3500 ms/89–100 ms). Three-dimensional gradient echo images (3DWatSc; TR/TE: 20 ms/9–10 ms; 20° flip angle) were obtained in the transverse plane.
For postoperative CTA, we injected 12 to 20 mL diluted (65%) iodinated contrast material (Telebrix 30®). CT was performed using a 16-multidetector CT system (Mx 8000 IDT; Philips Medical Systems), with the following scan parameters: tube voltage, 120 kV; tube current, 245 mAs; 1-mm slice thickness; 0.5-mm increment; beam collimation, 0.75 mm; effective pitch, 0.9; 150-mm field of view; 1024- × 512-pixel matrix. For data analysis, we generated oblique coronal, oblique sagittal, and axial reconstructions at a three-dimensional workstation with a 2-mm section thickness and no reconstruction interval for the axial images and a 2-mm reconstruction interval for oblique coronal and sagittal images. Oblique coronal images were reconstructed parallel to the supraspinatus muscle and oblique sagittal images parallel to the joint surface of the glenoid with identical section thickness and reconstruction interval.
Two musculoskeletal radiologists (JAC, YK) (experience level: 7 years and 2 years after fellowship training in musculoskeletal imaging) and three shoulder fellowship-trained orthopaedic surgeons (CHO, KHJ, JHO) (experience level: 2, 3, and 5 years) determined the FD of each rotator cuff muscle using the grading system of Goutallier et al. on preoperative MRA and postoperative CTA (Table 1). In addition, the FD grades by Goutallier et al. were converted into the system of Fuchs et al. [6], which was developed to minimize choice variability and in which the five grades of the system of Goutallier et al. are reduced to three stages by grouping: Goutallier’s Grades 0 and 1 are regarded as normal muscle, Grade 2 as moderately pathologic muscle, and Grades 3 and 4 as advanced degeneration (Table 1).
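The conversion from the five Goutallier grades to the three Fuchs stages described above amounts to a fixed lookup. A minimal Python sketch follows; the function name and the short stage labels ("normal", "moderate", "advanced") are illustrative shorthand chosen here, not terminology or software from the study.

```python
# Hypothetical helper: maps a Goutallier grade (0-4) to the three-stage
# grouping of Fuchs et al. as described in the text.
FUCHS_STAGE_BY_GOUTALLIER_GRADE = {
    0: "normal",    # Grades 0 and 1: normal muscle
    1: "normal",
    2: "moderate",  # Grade 2: moderately pathologic muscle
    3: "advanced",  # Grades 3 and 4: advanced degeneration
    4: "advanced",
}


def goutallier_to_fuchs(grade: int) -> str:
    """Return the Fuchs stage label for a Goutallier grade."""
    if grade not in FUCHS_STAGE_BY_GOUTALLIER_GRADE:
        raise ValueError(f"Goutallier grade must be 0-4, got {grade!r}")
    return FUCHS_STAGE_BY_GOUTALLIER_GRADE[grade]


print([goutallier_to_fuchs(g) for g in range(5)])
# ['normal', 'normal', 'moderate', 'advanced', 'advanced']
```

Because the mapping is many-to-one, two raters who disagree between Grades 0 and 1, or between Grades 3 and 4, agree after conversion, which is the sense in which the grouping is intended to reduce choice variability.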
To assess interobserver reliability, the two radiologists and three orthopaedic surgeons reviewed preoperative MRA and postoperative CTA scans and graded the FD of the cuff muscles. All five raters were aware all patients enrolled in the study had rotator cuff tears and had undergone repair. Several meetings were held before the interpretation regarding which image cuts of the preoperative MRA and postoperative CTA should be selected for evaluation of the FD: the FDs of the supraspinatus, infraspinatus, and subscapularis muscles were measured on the oblique sagittal T1-weighted image at the level showing the coracoid base and where the spine and body of the scapula form a Y shape [6, 28, 29]. For evaluation of CTA, the same level of the sagittal reconstruction image was used (Fig. 1). All observers were free to observe all scans and choose one representative scan for the evaluation. The first ratings, performed by the five observers at the same time, were used to evaluate interobserver reliability.
To assess intraobserver reliability, all raters performed a second radiographic evaluation of the FD of the cuff muscles on the same images. At the second rating, all raters were blinded to their first ratings.
The intraobserver and interobserver reliabilities of the five raters were evaluated with the ICC, using a two-way random model with absolute agreement, for the FD grade of each cuff muscle on preoperative and postoperative images. The value can range from 0 to 1; a value close to 1 indicates high reliability of measurement. All analyses were performed using the SPSS® software package (Version 12.0; SPSS Inc, Chicago, IL).
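For readers who want to see what a two-way random, absolute-agreement ICC involves, the following Python sketch computes it from the mean squares of a subjects-by-raters table. It is not the SPSS procedure used in the study. It assumes the single-measure form, often written ICC(2,1); the text does not state whether single or average measures were reported. The function name and the example grades are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measure ICC, two-way random effects, absolute agreement.

    ratings[i][j] is the grade given to subject i by rater j.
    (Assumption: single-measure form; not stated in the text.)
    """
    n = len(ratings)                      # number of subjects
    k = len(ratings[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2, k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC undefined: ratings have no variance")
    return (msr - mse) / denom


# Hypothetical Goutallier grades: 5 shoulders rated by 3 observers.
example = [
    [0, 1, 0],
    [2, 2, 3],
    [4, 3, 4],
    [1, 1, 1],
    [3, 3, 2],
]
print(round(icc_2_1(example), 2))
```

Because the rater mean square enters the denominator, a rater who systematically grades higher or lower than the others lowers this coefficient, which is what distinguishes absolute agreement from consistency.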
Results
For evaluation of interobserver reliability of FD using the system of Goutallier et al., the ICC of the FD grade of the cuff muscle was higher in MRA (0.6–0.72) than in CTA (0.43–0.6) (Table 2). We found similar reliability using the system of Fuchs et al. (Table 3), with no difference in interobserver reliability between the two grading systems. The value of ICC for interobserver reliability of FD was higher and less variable for the radiologists (0.58–0.78) than for the orthopaedic surgeons (0.32–0.68) (Table 4). This phenomenon was consistent for cuff muscles and imaging tools.
The intraobserver reliabilities of FD in each rotator cuff muscle were variable between individuals (ICC, 0.26–0.81) (Table 5). Observers with higher levels of experience tended to have higher and more consistent ICC values.
Discussion
The FD in rotator cuff tear has received attention as a prognostic factor affecting anatomic and functional outcomes [7, 9, 14–16, 20]. The reliability of the classification system is a prerequisite for communication and comparison of results between studies. Despite the recent emphasis on FD of the cuff muscle, there have been only a few studies focusing on the reliability of the system of Goutallier et al. [6, 13, 22]. Therefore, the main purpose of our study was to evaluate the reliability of the most widely used classification in MRA and CTA by radiologists and orthopaedic surgeons.
There are limitations to our study. First, we used different modalities preoperatively and postoperatively, which did not allow us to evaluate changes of the FD. Correlation of the FD between MRI and CT was only fair to moderate; a change of evaluation modality in the same patient during followup or the use of both methods in a trial was not recommended [6]. The FD tended to have a higher grade on MRI than on CT because the connective tissues were read as muscle on CT. In addition, using CTA for postoperative evaluation might account for low reliability in CTA because, in general, postoperative change makes evaluation difficult. Nevertheless, we cautiously believe reconstructed sagittal CTA images for evaluation of FD are quite medial to the repair site, and the elapsed time of at least 1 year after surgery might have little influence on evaluating the FD and analyzing the reliability of FD grading. There were several reasons why we adopted multidetector CTA for postoperative cuff evaluation. For earlier repairs, we used a metal anchor for the repair and metal artifacts are one of the well-known drawbacks of MRI. The interpretation of cuff integrity is more evident with CTA, especially near the anchor. Moreover, it is much less expensive than MRI and not examiner-dependent as in ultrasonography. Second, we included only rotator cuff tears in the study and observers were not blinded to this fact. Grading of FD might have been influenced by recognizing the presence of the tear and its size. However, FD is an important prognostic factor in patients with rotator cuff tears and is not evident in disorders other than cuff tears. We wanted to focus on patients with full-thickness rotator cuff tears and enrolled consecutive patients under strict inclusion and exclusion criteria, although it was not a randomized procedure. Finally, there might be a bias when evaluating postoperative CTA among observers. For example, surgeons tend to think positively of surgical approaches so the postoperative FD evaluation might have been biased. In the intraobserver analysis, ICC values of CTA were close to those of MRA; however, they were consistently lower in the interobserver analysis.
We did not find high interobserver reliability, even with MRA (ICC, 0.6–0.72), which still was better than CTA (ICC, 0.43–0.6). The system of Fuchs et al. simplifies the grades into three to reduce the complexity and subjectivity, but we found the interobserver reliability was similar to that for the system of Goutallier et al. The interobserver reliability of FD was higher and less variable for radiologists than for orthopaedic surgeons. The intraobserver ICC also varied among the examiners (0.26–0.81). There were unequal levels of experience among the radiologists and orthopaedic surgeons who participated in the study. The level of experience might affect the results of reliability, but we believe it is not an absolute factor for higher reliability. The more experienced radiologist tended to show more consistent results. An experienced radiologist might be better at evaluating the FD of the rotator cuff muscle, because orthopaedic surgeons might have a bias as surgeons, although it depends on personal experience or skill.
The interpretation of ICC values is also debatable. The acceptable threshold of ICC values varies from study to study and from system to system. In the studies of Escolar et al. [4] and Fleiss [5], 0.75 was established as a threshold value for reliability. The criteria of Cicchetti and Sparrow [1] were 0.00 to 0.39 as poor; 0.40 to 0.59 as fair; 0.60 to 0.74 as good; and 0.75 to 1.00 as excellent. The reliability of an evaluation system is not absolute, so whether it is acceptable depends on the clinical importance of the situation. However, we believe the current FD evaluation of the rotator cuff muscles has problems with interobserver and intraobserver reliabilities, and a more reliable and objective system is necessary.
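To make the Cicchetti and Sparrow bands concrete, the short Python sketch below labels the ICC ranges reported in the Results. The function name is hypothetical, and treating any value below 0.40 (including a negative estimate) as "poor" is an assumption made here for completeness.

```python
def cicchetti_sparrow_band(icc: float) -> str:
    """Label an ICC using the bands quoted from Cicchetti and Sparrow [1]."""
    if icc > 1.0:
        raise ValueError("an ICC cannot exceed 1")
    if icc < 0.40:   # assumption: negative estimates are also 'poor'
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# ICC ranges as reported in the Results section.
reported = {
    "interobserver, MRA": (0.6, 0.72),
    "interobserver, CTA": (0.43, 0.6),
    "interobserver, radiologists": (0.58, 0.78),
    "interobserver, orthopaedic surgeons": (0.32, 0.68),
    "intraobserver": (0.26, 0.81),
}
for label, (low, high) in reported.items():
    print(f"{label}: {cicchetti_sparrow_band(low)} to "
          f"{cicchetti_sparrow_band(high)}")
# interobserver, MRA: good to good
# interobserver, CTA: fair to good
# interobserver, radiologists: fair to excellent
# interobserver, orthopaedic surgeons: poor to good
# interobserver... (continued) intraobserver: poor to excellent
```

Under the stricter single threshold of 0.75, only the upper ends of the radiologists' interobserver range and of the intraobserver range would qualify as reliable.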
We believe the current FD grading system has several drawbacks. The FD of each cuff muscle usually is measured on one cross-sectional image rather than over the whole muscle belly. Relatively distally located evaluation points may not well represent the condition of the entire muscle. Fatty infiltration in rotator cuff tears is a progressive process that increases with time, and in a rabbit torn cuff model it progressed from the musculotendinous junction toward the muscle origin [19]. The FD of the cuff might be overestimated when the evaluation is performed on one cut section near the glenoid. Furthermore, the cross-sectional areas of the muscle may be highly and directly influenced by retraction of the musculotendinous junction of the torn rotator cuff [25, 26, 29]. If a repair were successful and the retracted tendon end brought back to the footprint, the muscle belly would be lateralized, and this might affect evaluation of FD. The postoperative FD would appear improved because the measurement point remains the same even after successful repair. This problem could be overcome by integration of fatty content or the muscle-occupying ratio with multiple cross-sectional images. However, taking multiple cross-sectional images toward the medial part of the muscle is problematic with MRI. It enlarges the field of view and might reduce accuracy in other areas. In addition, more time is required to obtain a wider view.
Therefore, there have been attempts to develop a more objective method to measure muscle quantity and quality [12, 26, 28, 29]. Measuring the mean muscle density in Hounsfield units in CT (ICC, 0.98) was superior to using a visual rating (ICC, 0.63) [28]. Muscle volume measured in cadavers using MRI with three-dimensional image analysis software correlated well with the actual volume after dissection, with intraobserver and interobserver variabilities less than 4% [26]. Two-dimensional SPLASH MRI and proton MR spectroscopy also were developed to quantify fat content of the rotator cuff muscles [12, 17]. Fatty atrophy also was evaluated with ultrasonography [11, 23]. However, these methods are not readily available in general practice. Some of the methods need specialized software, and others need spectroscopy, which is not familiar to the orthopaedic field.
The reliability of the measuring system for FD of the rotator cuff muscles is a critical aspect of evaluation, and the current grading system does not fully meet this requirement. We recommend caution when interpreting FD grading among studies. FD is considered one of the most important outcome predictors of rotator cuff tear. Given its importance, there should be a more reliable and accurate grading system for FD of the cuff muscles.
References
1. Cicchetti DV, Sparrow SA. Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior. Am J Ment Defic. 1981;86:127–137.
2. Cole BJ, McCarty LP 3rd, Kang RW, Alford W, Lewis PB, Hayden JK. Arthroscopic rotator cuff repair: prospective functional outcome and repair integrity at minimum 2-year follow-up. J Shoulder Elbow Surg. 2007;16:579–585.
3. DeFranco MJ, Bershadsky B, Ciccone J, Yum JK, Iannotti JP. Functional outcome of arthroscopic rotator cuff repairs: a correlation of anatomic and clinical results. J Shoulder Elbow Surg. 2007;16:759–765.
4. Escolar DM, Henricson EK, Mayhew J, Florence J, Leshner R, Patel KM, Clemens PR. Clinical evaluator reliability for quantitative and manual muscle testing measures of strength in children. Muscle Nerve. 2001;24:787–793.
5. Fleiss JL. The Design and Analysis of Clinical Experiments. New York, NY: Wiley; 1986.
6. Fuchs B, Weishaupt D, Zanetti M, Hodler J, Gerber C. Fatty degeneration of the muscles of the rotator cuff: assessment by computed tomography versus magnetic resonance imaging. J Shoulder Elbow Surg. 1999;8:599–605.
7. Gladstone JN, Bishop JY, Lo IK, Flatow EL. Fatty infiltration and atrophy of the rotator cuff do not improve after rotator cuff repair and correlate with poor functional outcome. Am J Sports Med. 2007;35:719–728.
8. Goutallier D, Postel JM, Bernageau J, Lavau L, Voisin MC. Fatty muscle degeneration in cuff ruptures: pre- and postoperative evaluation by CT scan. Clin Orthop Relat Res. 1994;304:78–83.
9. Goutallier D, Postel JM, Gleyze P, Leguilloux P, Van Driessche S. Influence of cuff muscle fatty degeneration on anatomic and functional outcomes after simple suture of full-thickness tears. J Shoulder Elbow Surg. 2003;12:550–554.
10. Henn RF 3rd, Kang L, Tashjian RZ, Green A. Patients’ preoperative expectations predict the outcome of rotator cuff repair. J Bone Joint Surg Am. 2007;89:1913–1919.
11. Kavanagh EC, Koulouris G, Parker L, Morrison WB, Bergin D, Zoga AC, Dlugosz JA, Nazarian LN. Does extended-field-of-view sonography improve interrater reliability for the detection of rotator cuff muscle atrophy? AJR Am J Roentgenol. 2008;190:27–31.
12. Kenn W, Bohm D, Gohlke F, Hummer C, Kostler H, Hahn D. 2D SPLASH: a new method to determine the fatty infiltration of the rotator cuff muscles. Eur Radiol. 2004;14:2331–2336.
13. Lesage P, Maynou C, Elhage R, Boutry N, Herent S, Mestdagh H. [Reproducibility of CT scan evaluation of muscular fatty degeneration: intra- and interobserver analysis of 56 shoulders presenting with a ruptured rotator cuff muscles] [in French]. Rev Chir Orthop Reparatrice Appar Mot. 2002;88:359–364.
14. Liem D, Lichtenberg S, Magosch P, Habermeyer P. Magnetic resonance imaging of arthroscopic supraspinatus tendon repair. J Bone Joint Surg Am. 2007;89:1770–1776.
15. Mellado JM, Calmet J, Olona M, Esteve C, Camins A, Perez Del Palomar L, Gine J, Sauri A. Surgically repaired massive rotator cuff tears: MRI of tendon integrity, muscle fatty degeneration, and muscle atrophy correlated with intraoperative and clinical findings. AJR Am J Roentgenol. 2005;184:1456–1463.
16. Oh JH, Kim SH, Ji HM, Jo KH, Bin SW, Gong HS. Prognostic factors affecting anatomical outcome of rotator cuff repair and correlation with functional outcome. Arthroscopy. 2009;25:30–39.
17. Pfirrmann CW, Schmid MR, Zanetti M, Jost B, Gerber C, Hodler J. Assessment of fat content in supraspinatus muscle with proton MR spectroscopy in asymptomatic volunteers and patients with supraspinatus tendon lesions. Radiology. 2004;232:709–715.
18. Pfirrmann CW, Zanetti M, Weishaupt D, Gerber C, Hodler J. Subscapularis tendon tears: detection and grading at MR arthrography. Radiology. 1999;213:709–714.
19. Rubino LJ, Stills HF Jr, Sprott DC, Crosby LA. Fatty infiltration of the torn rotator cuff worsens over time in a rabbit model. Arthroscopy. 2007;23:717–722.
20. Shen PH, Lien SB, Shen HC, Lee CH, Wu SS, Lin LC. Long-term functional outcomes after repair of rotator cuff tears correlated with atrophy of the supraspinatus muscles on magnetic resonance images. J Shoulder Elbow Surg. 2008;17:1S–7S.
21. Sherman SL, Lyman S, Koulouvaris P, Willis A, Marx RG. Risk factors for readmission and revision surgery following rotator cuff repair. Clin Orthop Relat Res. 2008;466:608–613.
22. Spencer EE Jr, Dunn WR, Wright RW, Wolf BR, Spindler KP, McCarty E, Ma CB, Jones G, Safran M, Holloway GB, Kuhn JE. Interobserver agreement in the classification of rotator cuff tears using magnetic resonance imaging. Am J Sports Med. 2008;36:99–103.
23. Strobel K, Hodler J, Meyer DC, Pfirrmann CW, Pirkl C, Zanetti M. Fatty atrophy of supraspinatus and infraspinatus muscles: accuracy of US. Radiology. 2005;237:584–589.
24. Sugaya H, Maeda K, Matsuki K, Moriishi J. Repair integrity and functional outcome after arthroscopic double-row rotator cuff repair: a prospective outcome study. J Bone Joint Surg Am. 2007;89:953–960.
25. Thomazeau H, Rolland Y, Lucas C, Duval JM, Langlais F. Atrophy of the supraspinatus belly: assessment by MRI in 55 patients with rotator cuff pathology. Acta Orthop Scand. 1996;67:264–268.
26. Tingart MJ, Apreleva M, Lehtinen JT, Capell B, Palmer WE, Warner JJ. Magnetic resonance imaging in quantitative analysis of rotator cuff muscle volume. Clin Orthop Relat Res. 2003;415:104–110.
27. Uhthoff HK, Matsumoto F, Trudel G, Himori K. Early reattachment does not reverse atrophy and fat accumulation of the supraspinatus: an experimental study in rabbits. J Orthop Res. 2003;21:386–392.
28. van de Sande MA, Stoel BC, Obermann WR, Tjong a Lieng JG, Rozing PM. Quantitative assessment of fatty degeneration in rotator cuff muscles determined with computed tomography. Invest Radiol. 2005;40:313–319.
29. Zanetti M, Gerber C, Hodler J. Quantitative assessment of the muscles of the rotator cuff with magnetic resonance imaging. Invest Radiol. 1998;33:163–170.
Acknowledgments
We thank Ki Hyun Jo, MD, for helping with data acquisition and Seong Woo Bin, MD, Hye Ran Kim, and Shang Mi Shim for support in data collection. Pacific Edit reviewed the manuscript before submission.
Additional information
Each author certifies that he or she has no commercial associations (eg, consultancies, stock ownership, equity interest, patent/licensing arrangements, etc) that might pose a conflict of interest in connection with the submitted article.
Each author certifies that his or her institution has approved the human protocol for this investigation, that all investigations were conducted in conformity with ethical principles of research, and that informed consent for participation in the study was obtained.
This study was performed in Seoul National University College of Medicine, Seoul National University Bundang Hospital.
About this article
Cite this article
Oh, J.H., Kim, S.H., Choi, JA. et al. Reliability of the Grading System for Fatty Degeneration of Rotator Cuff Muscles. Clin Orthop Relat Res 468, 1558–1564 (2010). https://doi.org/10.1007/s11999-009-0818-6
DOI: https://doi.org/10.1007/s11999-009-0818-6