Introduction

Accurate interpretation of knee articular cartilage MR imaging is clinically important. Osteoarthritis and post-traumatic cartilage defects are common and contribute to increasing medical costs [1]. There is a wide range of non-arthroplasty surgical treatment options for cartilage disease [2, 3] including autologous chondrocyte implantation, osteochondral auto- and allografting, and marrow stimulation. Lifestyle modification and medical treatment options may also slow the onset of osteoarthritis [4, 5]. MRI is one of the most commonly used modalities for the assessment of the articular cartilage of the knee [6, 7] because of its excellent soft-tissue contrast, although the reported sensitivity of MRI to detect cartilage abnormalities is lower than that of other internal derangements of the knee, such as ligamentous or meniscal injuries [6].

There have been multiple previous experimental, retrospective, and prospective studies demonstrating the diagnostic performance of MRI in assessment of articular cartilage utilizing conventional 2D fast-spin-echo sequences [8,9,10,11,12,13] as well as of specialized techniques including biochemical imaging [14], 3D gradient sequences [15, 16], and 3D double echo steady state (DESS) [17, 18]. While many of these specialized techniques are not in widespread clinical use, 3.0 Tesla (T) magnets are increasingly common in routine clinical practice, and 3.0 T MRI has been advocated as offering better diagnostic performance in comparison to 1.5 T MRI for assessment of knee articular cartilage [19,20,21,22,23]. The advantages of 3.0 T MRI include improved spatial resolution and contrast resolution, without increasing acquisition time. While prior studies in the established literature on cartilage imaging have utilized dedicated readers to evaluate the articular cartilage, no previous studies have assessed the diagnostic performance of routine clinical reports to evaluate the articular cartilage. The purpose of this study is to retrospectively compare the diagnostic performance of routine clinical knee MRI reports in the evaluation of articular cartilage at 1.5 and 3.0 T, with arthroscopy as the standard of reference.

Materials and methods

This retrospective study was performed with approval from our institutional review board with waiver of informed consent and was in compliance with Health Insurance Portability and Accountability Act regulations.

Study group

A sample size of 1372 articular surfaces (229 knees with 6 articular surfaces per knee) was calculated for a 10% confidence interval width and 95% confidence level, presuming a prevalence of articular lesions of 28% based on a study of 31,516 knee arthroscopies [24]; this sample size was then increased to 1800 articular surfaces (300 knees). The study consists of MRI and arthroscopy reports of 300 knees from 297 consecutive symptomatic patients who underwent arthroscopic knee surgery from 12 March 2012 to 29 March 2013 and who also had a routine (non-contrast, non-arthrogram) MRI examination of the same knee performed at the same institution within 365 days prior to arthroscopic surgery. Three patients had bilateral knee MRI and arthroscopic surgery. The list of consecutive patients who underwent knee arthroscopy was obtained from searching procedure codes in the radiology Picture Archiving and Communication System (PACS), which is also used to store photographic arthroscopic images. Patients were excluded from the study if non-routine MRI was performed (e.g., direct or indirect arthrogram or intravenous contrast-enhanced examination) or if the patient had undergone a previous cartilage repair surgery. Of the 297 patients included in the study (Table 1), there were 161 males and 136 females (mean age 42.8 years, range 15–83 years). There were 165 knees that underwent MRI at 1.5 T from 163 patients and 135 knees that underwent MRI at 3.0 T from 134 patients. The mean time between MRI and arthroscopy was 67.0 days (range: 1–330 days) for all patients, with the mean for 1.5 T of 73.6 days (range: 1–330 days) and the mean for 3.0 T of 59.0 days (range: 7–252 days).
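The formula behind the figure of 1372 articular surfaces is not stated in the text. One reading that reproduces it is the usual precision-based calculation for a sensitivity estimate: the number of diseased surfaces needed for a ±5% interval (10% total width) at a worst-case sensitivity of 0.5, divided by the presumed prevalence of 28%. The sketch below is illustrative only; the function name and the 0.5 worst-case assumption are not taken from the study.

```python
import math

def surfaces_needed(expected_sensitivity, half_width, prevalence, z=1.96):
    """Total surfaces needed to estimate sensitivity to within +/- half_width.

    Assumption (not stated in the study): a normal-approximation interval
    for sensitivity, scaled up by 1 / prevalence so that enough diseased
    surfaces are included.
    """
    diseased = z**2 * expected_sensitivity * (1 - expected_sensitivity) / half_width**2
    return diseased / prevalence

# Worst-case sensitivity of 0.5 is an assumption chosen for illustration.
total = surfaces_needed(expected_sensitivity=0.5, half_width=0.05, prevalence=0.28)
surfaces = round(total)
knees = math.ceil(surfaces / 6)  # six articular surfaces per knee
print(surfaces, knees)
```

Under these assumptions the result matches the 1372 surfaces and 229 knees quoted above, which suggests, but does not confirm, that a calculation of this kind was used.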

Table 1 Number, sex, age, and days between MR and arthroscopy for the study group at 1.5 and 3.0 T

MR scanning

MR imaging was performed on one of six clinical MRI scanners (four 1.5 T and two 3.0 T) manufactured by Siemens and General Electric Healthcare. The decision to perform imaging at 1.5 or 3.0 T was determined by a combination of scanner availability, patient location preference, and referring physician preference. The imaging protocols were not specifically tailored toward imaging cartilage but were instead designed as a standard, comprehensive assessment of the knee joint for evaluation of internal derangement. Imaging protocols vary slightly by scanner; however, the standard institutional protocol includes the following conventional 2D fast-spin-echo sequences: axial T2-weighted or PD-weighted images with fat suppression; sagittal and coronal fast-spin-echo proton-density weighted images with fat suppression; coronal and sagittal fast-spin-echo proton-density weighted images without fat suppression. Additionally, a 3D isotropic sequence with proton-density weighting without fat suppression was performed in the majority of examinations [Siemens Sampling Perfection with Application optimized Contrast using different flip angle Evolution (SPACE)], which was acquired in the sagittal plane and reformatted in axial, coronal, and coronal-oblique trochlear planes. Of the examinations performed at 1.5 T, 105/165 (63.6%) included the 3D isotropic sequence, as did 118/135 (87.4%) of the examinations performed at 3.0 T.

The number of examinations performed on each specific scanner, as well as the proportion of examinations including the 3D isotropic sequence per machine, is demonstrated in Table 2. Typical scanning parameters for the most commonly utilized MR machines at 3.0 T (scanner 1) and 1.5 T (scanner 2) are shown in Tables 3 and 4, respectively. Together, these two Siemens scanners comprised 231/300 of the knee MRIs included in this study. There are a few notable differences between the sequences and hardware of these two scanners in addition to differences in field strength. Specifically, the routine sequence for scanner 1 (3.0 T) included an axial T2-weighted sequence with fat suppression, while the axial fluid-sensitive sequence for scanner 2 (1.5 T) was PD-weighted with fat suppression. Additionally, scanner 1 (3.0 T) utilized an eight-channel transmit-receive coil, while the coil for scanner 2 (1.5 T) was a 15-channel transmit-receive unit. The typical scan duration for scanner 1 (3.0 T) was 22 min in comparison to 24 min for scanner 2 (1.5 T).

Table 2 Number of examinations performed on each specific scanner and proportion of examinations including a 3D isotropic sequence
Table 3 Specific scanning parameters for the most commonly utilized 3.0-T MRI machine (scanner 1 in Table 2). An eight-channel transmit-receive coil was utilized. PD = proton density; FS = fat saturated; SPACE = Sampling Perfection with Application optimized Contrast using different flip angle Evolution; NEX = number of excitations
Table 4 Specific scanning parameters for the most commonly utilized 1.5-T MRI machine (scanner 2 in Table 2). A 15-channel transmit-receive coil was utilized. PD = proton density; FS = fat saturated; SPACE = Sampling Perfection with Application optimized Contrast using different flip angle Evolution; NEX = number of excitations

Correlation of articular surfaces from MRI reports and arthroscopic surgery

All knee MRI examinations were interpreted by one of eight fellowship-trained musculoskeletal radiologists, with 5 to 41 years of experience (mean: 15.4 years, standard deviation: 15.9 years). Most examinations were interpreted in conjunction with a radiology resident or musculoskeletal imaging fellow. Knee arthroscopy was performed by one of nine orthopedic surgeons, with 6 to 25 years of experience (mean 15.0 years, standard deviation 7.2 years). Arthroscopy was performed with standard medial and lateral joint line portals and a medial suprapatellar inflow portal, and complete diagnostic assessment of the joint was performed per routine practice, which was recorded in the clinical operative note.

In both the MRI reports and operative notes from arthroscopic surgery, the articular surfaces of the knee were divided into six regions (patella, trochlea, medial femoral condyle, medial tibial plateau, lateral femoral condyle, and lateral tibial plateau). The MRI reports were structured, and the arthroscopy reports were freeform. All data for this study were obtained only from MRI and arthroscopy reports; neither images from the MR nor stored images from arthroscopies were evaluated routinely. MRI and arthroscopic descriptions of each articular surface were both classified using a modified Outerbridge classification, which is the system used by the orthopedic surgeons at this institution (Fig. 1). The adaptation of the Outerbridge classification is defined as follows: grade 0 is normal; grade I is heterogeneous signal (correlating to cartilage softening at arthroscopy); grade II is superficial fissuring, thinning, or defect <50% in depth; grade III is deep fissuring, thinning, or defect >50% in depth; grade IV is a full thickness defect. All musculoskeletal radiologists in the department are familiar with this staging system, and all routinely use descriptive terminology to describe the articular cartilage that would allow easy conversion to the Outerbridge classification. If two or more lesions were present on a single articular surface, the highest-graded lesion was recorded. If a specific articular surface was not explicitly described in the arthroscopy report or if several articular surfaces were described with a blanket statement such as “the remainder of the articular surfaces appeared normal,” then these unnamed articular surfaces were presumed normal. All data were entered into an Excel (Microsoft Corp., Redmond, WA) spreadsheet.
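The recording rules described above (one grade per surface, the highest-graded lesion wins, and surfaces not named in an arthroscopy report are presumed normal) can be sketched as a small data structure. This is an illustration only: the study data were entered by hand into a spreadsheet, and the names `Grade`, `SURFACES`, and `grade_knee` are hypothetical.

```python
from enum import IntEnum

class Grade(IntEnum):
    """Modified Outerbridge grades as defined in the text."""
    NORMAL = 0
    HETEROGENEOUS_SIGNAL = 1   # softening at arthroscopy
    SUPERFICIAL = 2            # fissuring, thinning, or defect <50% in depth
    DEEP = 3                   # fissuring, thinning, or defect >50% in depth
    FULL_THICKNESS = 4

SURFACES = (
    "patella", "trochlea",
    "medial femoral condyle", "medial tibial plateau",
    "lateral femoral condyle", "lateral tibial plateau",
)

def grade_knee(lesions):
    """Collapse (surface, grade) pairs from one report into one grade per surface.

    Surfaces that are never mentioned keep Grade.NORMAL, mirroring the
    'presumed normal' rule applied to the freeform arthroscopy reports.
    """
    grades = {surface: Grade.NORMAL for surface in SURFACES}
    for surface, grade in lesions:
        if surface not in grades:
            raise ValueError(f"unknown articular surface: {surface!r}")
        # Keep only the highest-graded lesion on each surface.
        grades[surface] = max(grades[surface], Grade(grade))
    return grades

# Hypothetical report with two patellar lesions and one trochlear lesion.
example = grade_knee([
    ("patella", Grade.SUPERFICIAL),
    ("patella", Grade.FULL_THICKNESS),
    ("trochlea", Grade.HETEROGENEOUS_SIGNAL),
])
print({surface: grade.name for surface, grade in example.items()})
```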

Fig. 1 Illustration of MRI and arthroscopic grading of cartilage lesions, per the modified Outerbridge classification

Statistical analysis

Statistics were performed in R version 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria). Two-sample t-tests were performed to compare the patient ages and days between MRI and arthroscopy in the two groups who underwent MRI at 1.5 and 3.0 T. Pearson’s chi-squared test was performed to evaluate the sex distribution between 1.5- and 3.0-T groups, the proportion of examinations at 1.5 and 3.0 T that included a 3D isotropic sequence, and the proportion of examinations at 1.5 and 3.0 T read by each radiologist.

To determine sensitivity and specificity, articular cartilage was divided into “disease negative” (grade 0) and “disease positive” (grades I–IV). Sensitivity and specificity with 95% confidence intervals were calculated for each articular surface at 1.5 and 3.0 T. The percentages of articular surfaces graded concordantly and within one grade of the arthroscopic grade were calculated for both field strengths and for each articular surface, with 95% confidence intervals. For each radiologist reading a significantly different proportion of examinations at 1.5 and 3.0 T, the per-radiologist sensitivity and specificity were calculated with 95% confidence intervals. Additionally, the knees were divided into two equal groups based on the time between arthroscopy and imaging, and sensitivity and specificity with 95% confidence intervals were calculated for knees with arthroscopy performed both early and late.
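A minimal sketch of these per-surface metrics is given below, treating arthroscopy as the reference standard and dichotomizing grade 0 against grades I–IV. The text does not say which confidence interval method was used; the Wilson score interval here is an assumption, and the function names and toy grades are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def diagnostic_summary(mri_grades, arthroscopy_grades):
    """Return {metric: (estimate, (ci_low, ci_high))} for paired 0-4 grades."""
    pairs = list(zip(mri_grades, arthroscopy_grades))
    tp = sum(1 for m, a in pairs if a > 0 and m > 0)
    fn = sum(1 for m, a in pairs if a > 0 and m == 0)
    tn = sum(1 for m, a in pairs if a == 0 and m == 0)
    fp = sum(1 for m, a in pairs if a == 0 and m > 0)
    exact = sum(1 for m, a in pairs if m == a)
    within_one = sum(1 for m, a in pairs if abs(m - a) <= 1)
    counts = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "exact": (exact, len(pairs)),
        "within_one": (within_one, len(pairs)),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in counts.items()}

# Toy grades for eight surfaces (not study data).
mri = [0, 2, 4, 0, 1, 3, 0, 0]
arthroscopy = [0, 2, 1, 1, 0, 4, 0, 3]
summary = diagnostic_summary(mri, arthroscopy)
for name, (estimate, (low, high)) in summary.items():
    print(f"{name}: {estimate:.3f} ({low:.3f} to {high:.3f})")
```

Note that a grade IV lesion on MRI with grade I at arthroscopy counts as a true positive for sensitivity but falls outside the within-one-grade tally, which is why the study reports both kinds of measure.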

The agreement between the grading of articular cartilage in MR reports and arthroscopy was evaluated with the weighted kappa statistic with 95% confidence intervals, which was interpreted according to the recommendations of Landis and Koch [25] as follows: kappa 0.00–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; 0.81–1.00, almost perfect agreement.
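The weighted kappa statistic and the Landis and Koch bands can be sketched as follows. The text does not specify the weighting scheme, so linear weights are assumed here; the function names are illustrative, and the label returned for negative values is a placeholder because the bands quoted above start at 0.00.

```python
def weighted_kappa(rater_a, rater_b, n_categories=5):
    """Linearly weighted kappa for two lists of integer grades (0..n_categories-1)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight (assumed)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n**2
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed_disagreement / expected_disagreement

def landis_koch(kappa):
    """Map a kappa value onto the bands listed in the text."""
    if kappa < 0:
        return "below the listed bands"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Toy grades (not study data): MRI report versus arthroscopy.
toy_kappa = weighted_kappa([0, 1, 2, 3, 4, 0, 2], [0, 1, 2, 4, 4, 1, 2])
print(round(toy_kappa, 3), landis_koch(toy_kappa))
```

Weighted kappa gives partial credit for near misses, so a report that is one grade off is penalized less than one that is three grades off.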

Significance testing was performed by bootstrapping individual articular surfaces to obtain a standard error, after which the Z-test was performed to obtain p-values. Significance testing was performed for all articular surfaces between the 1.5 and 3.0 T groups (rather than subdivided into multiple individual analyses of each articular surface or by individual radiologists) to maximize statistical power and to reduce the probability of obtaining a spuriously significant p-value. Sensitivity, specificity, percentage of articular surfaces graded concordantly, percentage of articular surfaces graded within one grade, and kappa values were compared between the 1.5 and 3.0 T groups by this method. Significance testing was also performed between the groups with arthroscopy performed early and late. A p-value of <0.05 was considered statistically significant for all tests.
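The bootstrap-then-Z-test procedure can be sketched as below for sensitivity. Only the outline (resample individual articular surfaces, take the standard error, apply a Z-test) comes from the text; the number of resamples, the fixed seed, the two-sided test for independent groups, and the toy data are assumptions made for illustration.

```python
import math
import random

def sensitivity(pairs):
    """pairs: (mri_positive, arthroscopy_positive) booleans, one per surface."""
    diseased = [mri for mri, arthro in pairs if arthro]
    if not diseased:
        raise ValueError("no disease-positive surfaces in this sample")
    return sum(diseased) / len(diseased)

def bootstrap_se(pairs, statistic, n_resamples=2000, seed=0):
    """Standard error of `statistic` from resampling surfaces with replacement."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(pairs, k=len(pairs))) for _ in range(n_resamples)
    ]
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(variance)

def z_test(est_a, se_a, est_b, se_b):
    """Two-sided Z-test for the difference of two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Toy groups (not study data): 100 diseased and 100 normal surfaces each.
group_15 = [(True, True)] * 60 + [(False, True)] * 40 + [(False, False)] * 80 + [(True, False)] * 20
group_30 = [(True, True)] * 62 + [(False, True)] * 38 + [(False, False)] * 80 + [(True, False)] * 20

se_15 = bootstrap_se(group_15, sensitivity)
se_30 = bootstrap_se(group_30, sensitivity, seed=1)
z, p = z_test(sensitivity(group_15), se_15, sensitivity(group_30), se_30)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The same two functions can be reused for specificity, percentage agreement, or kappa by passing a different statistic, which is what makes a single pooled comparison per metric straightforward.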

Results

Comparison of the 1.5 and 3.0 T groups

There was no significant difference between the 1.5 and 3.0 T groups for patient age (p = 0.13), sex distribution (p = 0.74), or time between MR and arthroscopy (p = 0.06). A significantly lower proportion of examinations included the 3D isotropic sequence at 1.5 T (63.6%) in comparison to 3.0 T (87.4%) (p = 5.2 × 10⁻⁶).
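The p-value for the 3D isotropic sequence can be approximately reproduced from the counts given in the Methods (105 of 165 examinations at 1.5 T and 118 of 135 at 3.0 T). The sketch below assumes that Pearson's chi-squared test was run with Yates' continuity correction, which is the default for 2 × 2 tables in R's `chisq.test`; the text does not state this explicitly, and the function name is illustrative.

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Chi-squared statistic with Yates' correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    statistic = n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: 1.5 T and 3.0 T; columns: with and without the 3D isotropic sequence.
stat, p = chi2_yates_2x2(105, 165 - 105, 118, 135 - 118)
print(f"chi-squared = {stat:.2f}, p = {p:.1e}")
```

Under this assumption the statistic is about 20.8 and the p-value is close to the reported value; without the continuity correction the p-value would be somewhat smaller.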

Comparison of diagnostic performance at 1.5 and 3.0 T

There were a total of 674 cartilage lesions in 1800 articular surfaces, for a prevalence of 37.4%. Of 1800 articular surfaces, 1532 were explicitly described in the arthroscopy reports, and the remainder were either described as normal in a blanket statement or presumed normal. There were 365 lesions in the group scanned at 1.5 T (prevalence of 36.9%) and 309 lesions in the group scanned at 3.0 T (prevalence of 38.1%). The distribution of cartilage lesions per grade per articular surface at both 1.5 and 3.0 T is given in Table 5.

Table 5 Number of cartilage lesions of each articular surface, with the prevalence in parentheses. MFC: medial femoral condyle; MTP: medial tibial plateau; LFC: lateral femoral condyle; LTP: lateral tibial plateau

The sensitivity, specificity, percent of articular surfaces graded exactly, and percent of articular surfaces graded within one grade are 61.4%, 82.7%, 62.2%, and 77.5% at 1.5 T, respectively, and 61.8%, 80.6%, 59.5%, and 75.6% at 3.0 T. For all articular surfaces at both field strengths, the sensitivity, specificity, percent of articular surfaces graded exactly, and percent of articular surfaces within one grade are 61.6%, 81.8%, 61.0%, and 76.6%, respectively. These parameters per articular surface at 1.5 and 3.0 T are shown in Table 6. Graphs of the sensitivity and specificity per articular surface at 1.5 and 3.0 T are shown in Figs. 2 and 3. Sensitivity was highest for the patella (78.4% at 1.5 T and 80.5% at 3.0 T) and lowest for the medial tibial plateau (47.9% at 1.5 T and 36.8% at 3.0 T). Specificity was highest for the trochlea (88.3% at 1.5 T and 91.1% at 3.0 T) and lowest for the patella (70.0% at 1.5 T and 79.7% at 3.0 T). The 95% confidence intervals of sensitivity and specificity were widely overlapping for each articular surface at both 1.5 and 3.0 T.
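As a consistency check, the pooled figures can be recovered as weighted averages of the values at each field strength, using the lesion counts reported above (365 and 309) and the surface counts implied by 165 and 135 knees with six surfaces each. The sketch works from the rounded percentages in the text, so agreement can only be expected to one decimal place; the variable names are illustrative.

```python
# Counts from the text: disease-positive surfaces and total surfaces per group.
lesions = {"1.5 T": 365, "3.0 T": 309}
surfaces = {"1.5 T": 165 * 6, "3.0 T": 135 * 6}
normal = {field: surfaces[field] - lesions[field] for field in surfaces}

# Reported percentages at each field strength (rounded values from the text).
reported = {
    "sensitivity": {"1.5 T": 61.4, "3.0 T": 61.8},
    "specificity": {"1.5 T": 82.7, "3.0 T": 80.6},
    "exact": {"1.5 T": 62.2, "3.0 T": 59.5},
    "within_one": {"1.5 T": 77.5, "3.0 T": 75.6},
}

# Each metric is weighted by its own denominator: diseased surfaces for
# sensitivity, normal surfaces for specificity, all surfaces for agreement.
denominators = {
    "sensitivity": lesions,
    "specificity": normal,
    "exact": surfaces,
    "within_one": surfaces,
}

def pooled(metric):
    weights = denominators[metric]
    total = sum(weights.values())
    return sum(reported[metric][field] * weights[field] for field in weights) / total

pooled_values = {metric: round(pooled(metric), 1) for metric in reported}
print(pooled_values)
```

The weighted averages agree with the pooled values quoted for all articular surfaces (61.6%, 81.8%, 61.0%, and 76.6%).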

Table 6 Sensitivity, specificity, percent of articular surfaces graded exactly, and percent of articular surfaces graded within one grade at 1.5 and 3.0 T (95% confidence intervals in parentheses)
Fig. 2 Sensitivity of MR reports at 1.5 and 3.0 T per articular surface with error bars representing 95% confidence intervals

Fig. 3 Specificity of MR reports at 1.5 and 3.0 T per articular surface with error bars representing 95% confidence intervals

When comparing all articular surfaces, there was no significant difference in sensitivity, specificity, percent of articular surfaces graded exactly, or percent of articular surfaces graded within one grade between the groups scanned at 1.5 and 3.0 T (Table 7).

Table 7 Sensitivity, specificity, percent of articular surfaces graded exactly, and percent of articular surfaces graded within one grade of all articular surfaces at both 1.5 and 3.0 T. Standard error in parentheses. P-values were produced by the z-test. There was no significant difference between any of these parameters at 1.5 or 3.0 T

The agreement between MRI reports and arthroscopy for all articular surfaces was moderate (weighted kappa of 0.56 at 1.5 T and 0.55 at 3.0 T). Weighted kappa values per articular surface are given in Table 8. There was no significant difference between agreement at 1.5 or 3.0 T (p = 0.659).

Table 8 Agreement between MRI reports and arthroscopy with the weighted kappa statistic (95% confidence intervals in parentheses)

Comparison of patients undergoing arthroscopy early and late

In comparing patients who underwent arthroscopy early (mean: 24.0 days, standard deviation: 9.8 days) and late (mean: 110.1 days, standard deviation: 69.2 days), there was no significant difference in sensitivity (early: 62.2%; late: 61.1%; p = 0.658) or specificity (early: 82.8%; late: 80.7%; p = 0.267).

Comparison of proportion of examinations read at 1.5 and 3.0 T per radiologist

The number of examinations read at 1.5 and 3.0 T for each radiologist is shown in Table 9. For each radiologist, Pearson’s chi-squared test was utilized to determine whether there was a significant difference in the number of examinations read at each field strength. Three radiologists did read a significantly different proportion of examinations at 1.5 and 3.0 T; for these radiologists, individual sensitivity and specificity were calculated along with 95% confidence intervals (data also presented in Table 9). The sensitivities ranged from 55.0% to 69.8%; however, the 95% confidence intervals were overlapping. The specificities ranged from 78.2% to 83.2%, with widely overlapping 95% confidence intervals.

Table 9 Number of examinations read at 1.5 and 3.0 T by each radiologist (the order of radiologists is sorted by descending total number of examinations read). Significance testing was performed with Pearson’s chi-squared test (*P < 0.05). Sensitivity and specificity were calculated for each radiologist who read a significantly different proportion of examinations at 1.5 and 3.0 T (95% confidence intervals in parentheses)

Discussion

In clinical practice, conventional 2D fast-spin-echo (FSE) sequences are performed most commonly for global assessment of the intra-articular structures of the knee, including evaluation of the articular cartilage. There is a wide reported range of diagnostic performance of 2D FSE MR for assessment of the knee cartilage, with sensitivity ranging from 26% to 96%, specificity from 50% to 100%, and accuracy from 49% to 94% [8,9,10,11,12,13].

Several studies with dedicated readers have shown increased diagnostic performance for assessment of knee articular cartilage at 3.0 T in comparison to 1.5 T with 2D FSE imaging, although the performance metrics have varied among studies. Kijowski et al. [22] reported a significant increase in specificity and accuracy at 3.0 T in comparison to 1.5 T (specificity of 78.0% at 1.5 T and 85.9% at 3.0 T; accuracy of 74.5% at 1.5 T and 80.1% at 3.0 T) in 200 patients with arthroscopic correlation, but no significant difference in sensitivity (69.3% and 70.5%). In contrast, Van Dyck et al. [23] showed 3.0 T to have significantly higher sensitivity in assessment of articular cartilage in 200 patients (69% at 3.0 T compared to 60% at 1.5 T), but not higher specificity or accuracy. In a study of 26 patients, Wong et al. [26] showed 3.0-T MRI to have higher sensitivity (75.6% at 3.0 T and 70.6% at 1.5 T) and accuracy (51.3% at 3.0 T and 42.9% at 1.5 T).

MRI at 3.0 T has also been shown to have better diagnostic performance in experimental animal studies. Link et al. [20] showed a statistically significant increased area under ROC curves at 3.0 T in comparison to 1.5 T in a study of 27 porcine knees. Similarly, Masi et al. [19] showed higher accuracy at 3.0 T in a study of ten porcine knees and two human volunteers.

Although the diagnostic performance of MRI reports for assessment of articular cartilage in this study is within the range of the literature utilizing dedicated readers, the previously demonstrated diagnostic advantage of 3.0 T MRI for assessment of knee articular cartilage was not achieved. The lack of improved diagnostic performance at 3.0 T was despite scanning parameters at 3.0 T that potentially would allow for increased image quality, including a slight increase in matrix size and an increased number of acquisitions in two sequences (Tables 3 and 4). There are several reasons why the potential diagnostic advantage of 3.0 T was not demonstrated in this study, including some that are likely generalizable, and additional factors that may be unique to this institution.

Interobserver variability for imaging interpretation

Imaging evaluation of articular cartilage is challenging, and there may be considerable interobserver variability. A recent study of 5 readers independently evaluating 962 articular surfaces of the knee using CT arthrography demonstrated only fair global interobserver agreement on assessment of the cartilage (kappa = 0.35), and accuracy was shown to be improved with greater experience [27]. The MRI reports in our study were created by a relatively heterogeneous group of eight radiologists with musculoskeletal imaging experience ranging from 5 to 41 years. Additionally, three of the eight radiologists read a significantly different proportion of examinations at 1.5 and 3.0 T, with the sensitivity of these radiologists ranging from 55.0% to 69.8%, albeit with overlapping 95% confidence intervals. Although the low sample size per radiologist and potential confounding factors precluded us from testing for significance in the diagnostic performance between individual readers, varying performance may play a role in obscuring the potential diagnostic advantage of 3.0 T.

General clinical factors

In contrast to a dedicated research study where the participating readers can focus on the articular cartilage to the exclusion of the remainder of the knee joint, several clinical factors may contribute to varied diagnostic performance. For example, in a typical clinical workday, the radiologist may be required to concurrently integrate information from the medical record, manage interruptions, and interpret images. It is therefore possible that common clinical pitfalls such as fatigue, distraction, and satisfaction of search may impact the diagnostic performance in this environment [28,29,30].

Interobserver variability for arthroscopic evaluation

The interobserver agreement of grading articular cartilage lesions at arthroscopy in the knee has been shown to be generally good, but not perfect: in a study by Marx et al. [31], 6 surgeons reviewing 31 videos of articular cartilage demonstrated kappa statistics ranging from 0.35 to 0.87. In our study, arthroscopic surgery was performed by a total of nine different surgeons with 6 to 25 years of experience, possibly leading to heterogeneity in the reporting of chondral lesions in the arthroscopy reports.

Limitations of arthroscopy

The agreement between arthroscopy and histology has been shown to be good, but not perfect, with a reported weighted kappa statistic of 0.619 [32]. A similar kappa statistic of 0.602 was demonstrated for the agreement between arthroscopic grading of knee cartilage in comparison to arthrotomy [33]. There has also been recent work showing a potential blind spot of the far posterior femoral condyle articular cartilage that may not be visible at arthroscopy [34], with a possible such lesion demonstrated in Fig. 4. Additionally, in the freeform arthroscopy reports, 268/1800 of the articular surfaces were either described as normal in a blanket statement or were not explicitly described. For the statistical analysis of this study, these 268 articular surfaces were presumed normal. It is possible that a low-grade chondral lesion in a different compartment from the main site of injury may not have been described by the arthroscopist. However, this potential omission would not be expected to affect the sensitivity of MR or the percentage of cartilage lesions graded within one grade.

Fig. 4 MRI performed at 3.0 T with an apparent far-posterior cartilage abnormality of the lateral femoral condyle. Arthroscopy performed 29 days later did not reveal any cartilage defects. Sagittal PD SPACE (left image) and axial T2-weighted image with fat suppression (right image) demonstrate signal heterogeneity and superficial fissuring of the far posterior non-weightbearing lateral femoral condyle with subchondral cystic change. This region of the far posterior femoral condyle may not be visible during routine arthroscopy. In the current study, this was classified as a false-positive MR grade II lesion

Effect of time between imaging and arthroscopy

Although cartilage is generally considered to have limited healing potential [35], spontaneous healing of cartilage has been demonstrated clinically, for instance in medial opening-wedge high tibial osteotomy without additional cartilage surgery [35, 36]. It is conceivable that if a patient is treated with conservative measures such as bracing, physical therapy, or activity modification, then some healing may take place, especially if there is a long interval between imaging and arthroscopy. Conversely (and probably much more likely), further cartilage damage or new cartilage lesions may occur in the interval between imaging and arthroscopy. However, our study showed no significant difference in diagnostic performance if imaging was performed early (mean time interval of 24.0 days) or late (mean time interval of 110.1 days). Figure 5 demonstrates an apparent full thickness cartilage defect (grade IV) of the patella on MRI, while arthroscopy performed 170 days later demonstrated only softening (grade I) of the patellar cartilage. Figure 6 demonstrates a case where the articular surfaces were described as normal on MRI report (with apparent signal heterogeneity of the medial femoral condyle possibly evident in retrospect), while arthroscopy performed 50 days later demonstrated a delaminating cartilage lesion of the medial femoral condyle without exposed bone (grade III). It is conceivable but purely conjectural that these two discrepancies may reflect interval partial healing and interval worsening, respectively.

Fig. 5 MRI performed at 1.5 T with an apparent full-thickness defect in the medial facet/median ridge of the patella (grade IV), as demonstrated on the sagittal proton-density-weighted image with fat suppression (left) and reformatted axial PD SPACE image (right). Arthroscopy performed 170 days later revealed some softening (grade I) of the patellar cartilage. This was classified as a true positive (cartilage disease present at both MR and arthroscopy), but grading was not within one grade

Fig. 6 MRI performed at 1.5 T, with articular surfaces classified as normal. Some signal heterogeneity of the medial femoral condyle may be evident in retrospect on the sagittal PD SPACE (left) and coronal PD-weighted image with fat suppression (middle). Arthroscopy performed 50 days after MRI revealed a delaminating cartilage lesion without exposed bone (grade III) in the medial femoral condyle. This was classified as an MR false-negative grade III lesion

Unique to this institution is the fact that a 1.5 T MR scanner (scanner 2 in Table 2) is located in the same outpatient building as the clinic location of two surgeons who specialize in cartilage repair surgery, which accounted for 106/300 examinations in this study. Since these surgeons are well known to the staff radiologists, and patients of these surgeons often prefer the convenient availability of this scanner, it is possible that radiologists spend extra time and attention assessing the articular surfaces of these patients.

Another factor that deserves discussion is the inclusion of a 3D fast spin-echo isotropic sequence (SPACE on the Siemens scanners) in the majority of examinations. This sequence results in intermediate-weighted images with hyperintense synovial fluid with minimal blurring and allows the creation of multiplanar reformations in any orientation after a single acquisition [7]. SPACE and the similar Cube sequence from General Electric are non-gradient 3D sequences that have shown promise for assessment of intra-articular structures of the ankle [37] and knee [38], including the articular cartilage [39,40,41,42,43,44,45,46]. The musculoskeletal radiologists at this institution typically use the 3D sequence as a problem-solving tool rather than for primary interpretation, and these sequences have shown variable diagnostic performance for assessment of articular cartilage in the literature. Kijowski et al. showed that Cube offered increased sensitivity but lower specificity in detecting cartilage lesions of the knee, in comparison to 2D FSE images [47], which was borne out in a similar study by Ristow et al. [48]. Milewski et al. concluded that there was no significant difference in accuracy comparing 3D fat-suppressed FSE imaging with conventional 2D FSE in 34 patients [49]. Schaefer et al. showed that SPACE had decreased accuracy in comparison to conventional 2D FSE sequences [42]. In the present study, given that a 3D sequence was performed more commonly at 3.0 T (87.4% in comparison to 63.6% at 1.5 T), it is unlikely that the inclusion or exclusion of this sequence could explain the absence of performance benefit at 3.0 T.

This study has important limitations. While many of the factors hypothesized to affect the correlation between clinical MRI reports and arthroscopic surgery are generalizable to various practice environments, the institution-specific factor of having a 1.5 T scanner that is commonly utilized in imaging of patients referred by the two cartilage repair surgeons likely introduces selection and interpretation bias. The degree of this potential bias would be difficult to quantify because the potential confounding variables (such as different patient population, different scanner hardware, and different scanning parameters) preclude reliable subgroup analyses. Additionally, the study may be underpowered despite the fact that the number of MRI reports correlated with arthroscopy (300) is greater than in prior dedicated reader studies assessing knee articular cartilage. Although a sample size calculation was performed for a 10% (±5%) confidence interval width, it may be necessary to power the study for a narrower confidence interval width given the heterogeneity of radiologists, orthopedic surgeons, and other clinical factors.

Conclusion

This study shows no significant difference in diagnostic performance of clinical MRI reports for assessment of the knee articular cartilage imaged at 1.5 and 3.0 T using arthroscopy as the standard of reference at our institution. The confluence of several patient-, machine-, physician-, and institution-specific factors in a large and busy clinical practice most likely outweighs the inherent diagnostic advantage of 3.0 T MRI that has been previously demonstrated in the literature.