Introduction

Subchondral bone marrow lesions (BMLs), visible on magnetic resonance imaging (MRI), have been shown to be an important feature in osteoarthritis (OA). BMLs are associated with pain [1,2,3,4] and predict cartilage defect progression and cartilage volume loss [5,6,7], as well as total knee replacement (TKR) surgery [4, 8,9,10].

Conventionally, BMLs are assessed on fluid-sensitive MRI sequences such as T2-weighted fat saturation, short tau inversion recovery (STIR), intermediate-weighted fat saturation (IW-FS), and proton density fat saturation (PD-FS), although they can be detected using other MRI sequences [8, 11, 12]. Previous reports indicate that gradient-recalled echo (GRE)-type MRI sequences such as T1-weighted gradient echo and spoiled gradient-recalled acquisition in steady state (SPGR) are insensitive to marrow abnormalities and may underestimate the lesion size, compared to fluid-sensitive sequences [13,14,15]. Although many studies have compared the performance of different MRI sequences in regard to their ability to detect BMLs (prevalence), reliability, and sensitivity to change [15,16,17,18,19,20], there are limited studies on how BMLs on different MRI sequences correlate with clinical outcomes.

In a recent study in a pain-free knee cohort, BMLs present on both T2- and T1-weighted fat saturation MRI sequences were associated with medial tibial cartilage volume loss and incident knee pain over 2 years [21]. Furthermore, in separate studies, it has been shown that BMLs identified on T2- and T1-weighted images predict joint replacement surgery among people with OA [8, 10]. This study aimed to determine the association of BMLs detected on two different MRI sequences with pain, physical function limitation, stiffness, cartilage defect progression, and cartilage volume loss in older adults over 2.7 years, as well as knee joint replacement surgery over 13.3 years. Given that BMLs generally appear larger on T2-weighted MRI compared to T1-weighted MRI [14, 15], we hypothesised that BMLs would be easier to detect on T2-weighted MRI sequences and would be more strongly associated with clinical outcomes compared to BMLs present on T1-weighted MRI sequences.

Methods

Participants

This study was a part of the Tasmanian Older Adult Cohort (TASOAC) study, an ongoing prospective, population-based study aimed at identifying the environmental, genetic, and biochemical factors associated with the development and progression of OA at multiple sites (hand, knee, hip, and spine). Participants between the ages of 50 and 80 years were randomly selected from the electoral roll in Southern Tasmania (population, 229,000), with an equal number of men and women. The overall response rate was 57%. Participants were excluded if they were institutionalised or reported a contraindication to having a right knee MRI scan (e.g. implanted pacemaker, metal sutures, presence of shrapnel or iron filings in the eye, claustrophobia, right knee replacement, knee too large for scanner). Figure 1 shows the study flowchart. Of all initially eligible participants, 1100 enrolled in the study, and 1099 attended a baseline clinic between March 2002 and September 2004. Follow-up data were collected for 875 eligible participants at a subsequent clinic approximately 2–3 years later. The MRI machine was decommissioned halfway through the follow-up period; therefore, MRI scans were available for approximately half of the follow-up participants.

Fig. 1 Flowchart of study participants

All research conducted was in compliance with the Declaration of Helsinki and was approved by the Southern Tasmanian Health and Medical Human Research Ethics Committee. All subjects gave informed written consent.

Anthropometrics

Weight was measured to the nearest 0.1 kg (with shoes, socks, and bulky clothing removed) using a single pair of electronic scales (Seca Delta Model 707). Height was measured to the nearest 0.1 cm (with shoes and socks removed) using a stadiometer. Body mass index (BMI) was calculated as kilogrammes per square metre.

Radiographic Knee OA

A standing anteroposterior semi-flexed view of the right knee with 15° of fixed knee flexion was performed at baseline and scored individually for osteophytes and joint space narrowing on a scale of 0–3 (0 = normal and 3 = severe) according to the Altman atlas [22] as previously described [23]. The presence of radiographic OA was defined as any score ≥ 1 for joint space narrowing or osteophytes.

Magnetic Resonance Imaging

MRI of the right knee was acquired at baseline and follow-up with a 1.5-T whole-body magnetic resonance unit (Picker, Cleveland, OH, USA) using a commercial transmit/receive extremity coil. Image sequences included the following: (a) a T1-weighted fat saturation three-dimensional (3D) gradient-recalled acquisition in the steady state (T1-w GRE MRI): flip angle, 30°; repetition time, 31 ms; echo time, 6.71 ms; field of view, 16 cm; 60 partitions; 512 × 512-pixel matrix; acquisition time, 5 min 58 s; one acquisition; sagittal images obtained at a slice thickness of 1.5 mm without an interslice gap; and (b) a T2-weighted fat saturation two-dimensional (2D) fast spin echo (T2-w FSE MRI): flip angle, 90°; repetition time, 3067 ms; echo time, 112 ms; field of view, 16 cm; 15 partitions; 228 × 256-pixel matrix; sagittal images obtained at a slice thickness of 4 mm with an interslice gap of 0.5–1.0 mm.

Bone Marrow Lesions

Subchondral BMLs were assessed on T2-w FSE and T1-w GRE fat saturation MR images using OsiriX software at the medial and lateral sites of the femur and tibia, and the superior and inferior sites of the patella at baseline. BMLs were defined as areas of increased signal intensity on T2-w FSE and T1-w GRE, located immediately under the articular cartilage. One trained observer measured the BMLs on each sequence by using software cursors to measure the maximum lesion area (in mm²) on the single slice where the lesion appeared largest. If more than one lesion was present at the same site, the largest BML was used. Baseline and follow-up MRI images were read paired with the chronological order known to the observer. Intra-observer reliability was assessed in 40 randomly selected subjects after a 2-week interval between the readings. The intra-class correlation coefficient (ICC) using a two-way mixed-effects model [24] was 0.98 (95% CI 0.96, 0.99) for T2-weighted and 0.94 (95% CI 0.90, 0.96) for T1-weighted sequences. For analysis, BMLs were categorised into three groups: (1) BMLs present on T2-weighted MRI (T2-w FSE), (2) BMLs present on T1-weighted MRI (T1-w GRE), and (3) BMLs present on both T2-weighted and T1-weighted MRI (T1 and T2).

Cartilage Morphology Evaluation

Cartilage defects were assessed by a trained observer at baseline and follow-up on T1-weighted MR images (score range 0–4), as previously described: grade 0 = normal cartilage; grade 1 = focal blistering and intra-cartilaginous low-signal intensity area with an intact surface and base; grade 2 = irregularities on the surface or base and loss of thickness < 50%; grade 3 = deep ulceration with loss of thickness > 50%; and grade 4 = full-thickness chondral wear with exposure of subchondral bone. A cartilage defect also had to be present on at least two consecutive slices. The cartilage was considered to be normal if the band of intermediate signal intensity had a uniform thickness. If more than one defect was present at the same site, the highest score was used. Medial tibial, lateral tibial, medial femoral, lateral femoral, and patellar compartments were measured. Baseline and follow-up images were read at different time points. The baseline scores were available to the reader when assessing the follow-up scores. Intra-observer repeatability was assessed in 50 subjects with at least 1 week between the two measurements, with ICCs of 0.93, 0.92, 0.95, 0.80, and 0.94 at the medial tibia, medial femur, lateral tibia, lateral femur, and patella, respectively [25]. Change in cartilage defect score from baseline to follow-up was dichotomised as 0 or 1, with 0 representing no change or a decrease in cartilage defects and 1 representing an increase of one or more on the 0–4 scale.
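The dichotomisation rule described above can be written out in a few lines of Python. This is an illustrative sketch only, not the study's analysis code; the function name `defect_worsened` and the example scores are hypothetical.

```python
def defect_worsened(baseline: int, follow_up: int) -> int:
    """Return 1 if the 0-4 defect score rose by one grade or more, else 0."""
    for score in (baseline, follow_up):
        if not 0 <= score <= 4:
            raise ValueError("cartilage defect scores must lie in 0-4")
    return 1 if follow_up - baseline >= 1 else 0


# Hypothetical (baseline, follow-up) scores for three sites:
# an increase, no change, and a decrease.
pairs = [(1, 2), (2, 2), (3, 2)]
changes = [defect_worsened(b, f) for b, f in pairs]
```

Under this rule, both an unchanged and an improved score map to 0, so the binary outcome isolates worsening only.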

Knee tibial and patellar cartilage volume was measured by a trained observer on T1-weighted MR images at baseline and follow-up by means of image processing on an independent workstation using Osiris software as previously described [25, 26]. The volumes of individual cartilage plates (medial tibia and lateral tibia) were isolated from the total volume by manually drawing disarticulation contours around the cartilage boundaries on a section-by-section basis. These data were then re-sampled by means of bilinear and cubic interpolation (area of 312 × 312 mm and 1.5 mm thickness, continuous sections) for the final 3D rendering. The baseline and follow-up images were read at different time points. The baseline cartilage volume value was available to the reader when assessing the follow-up scans. The coefficient of variation (CV) was 2.1% for the medial tibia, 2.2% for the lateral tibia, and 2.6% for the patella.

Knee femoral cartilage volume was determined at baseline and follow-up by means of image processing on an independent workstation using Cartiscope™ (ArthroLab Inc., Montreal, Quebec, Canada), as previously described [27,28,29]. The quantitative segmentation of the cartilage–synovial interfaces was carried out with the semi-automatic method under reader supervision and with corrections when needed. Cartilage volume was evaluated directly from a standardised view of 3D cartilage geometry as the sum of elementary volumes. Baseline and follow-up images were read paired with chronological order known to the reader. The coefficient of variation (CV) was approximately 2% [27]. The cartilage volume assessment was done for the medial and lateral condyles delineated by Blumensaat's line.

WOMAC Scores

Knee pain, physical function limitation, and stiffness were assessed at baseline and follow-up using the self-administered Western Ontario and McMaster Universities OA Index (WOMAC) [30]. Each item was scored on a 10-point numeric rating scale from 0 (no pain, no function limitation, and no stiffness) to 9 (most severe pain, most severe physical function limitation, and most severe stiffness) [30]. The index comprises 5 pain items, 17 function limitation items, and 2 stiffness items. The items in each subscale are summed to form a total score for pain (range 0–45), function limitation (range 0–153), and stiffness (range 0–18). The total WOMAC score was calculated by summing the pain, function limitation, and stiffness total scores (range 0–216) [30]. For cross-sectional analysis, we categorised the subscales into three levels (none, mild, and moderate to severe) because the WOMAC data were not normally distributed. These levels were based on pain cut-offs used by an OA Expert Group in the Global Burden of Disease (GBD) 2010 study [31]. Total pain score was categorised as 0 (none), 1–13 (mild), and 14–45 (moderate to severe). Total function limitation score was categorised as 0 (none), 1–45 (mild), and 46–153 (moderate to severe). Total stiffness score was categorised as 0 (none), 1–4 (mild), and 5–18 (moderate to severe). Total WOMAC score was categorised as 0 (none), 1–64 (mild), and 65–216 (moderate to severe). For longitudinal analysis, change in WOMAC scales was calculated as follow-up minus baseline.
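The scoring and categorisation rules above can be sketched as follows. This is a minimal illustration in Python rather than the code used in the study; the item counts, score ranges, and cut-offs are taken from the text, whereas the function names and the example participant are hypothetical.

```python
# Item counts per subscale; each item is scored 0-9 (from the text above).
ITEMS = {"pain": 5, "function": 17, "stiffness": 2}

# Upper bound of the "mild" category for each score (from the text above).
MILD_MAX = {"pain": 13, "function": 45, "stiffness": 4, "total": 64}


def subscale_total(name, item_scores):
    """Sum one subscale after checking the item count and the 0-9 range."""
    if len(item_scores) != ITEMS[name]:
        raise ValueError(f"{name} needs {ITEMS[name]} items")
    if any(not 0 <= s <= 9 for s in item_scores):
        raise ValueError("item scores must lie in 0-9")
    return sum(item_scores)


def categorise(name, total):
    """Map a summed score to 'none', 'mild', or 'moderate to severe'."""
    if total == 0:
        return "none"
    return "mild" if total <= MILD_MAX[name] else "moderate to severe"


# Hypothetical participant
pain = subscale_total("pain", [2, 3, 0, 4, 5])
function = subscale_total("function", [1] * 17)
stiffness = subscale_total("stiffness", [0, 0])
total = pain + function + stiffness
```

The maximum possible total, 9 × (5 + 17 + 2) = 216, matches the stated range of the total WOMAC score.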

Total Knee Replacement (TKR) Surgery

The incidence of TKR surgery was determined by data linkage to the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) between 1 March 2002 and 21 September 2016. AOANJRR started data collection in Tasmania in September 2000 and collects data from both public and private hospitals. Data validation against State and Territory Health Department data is done using a sequential multi-level matching process [32]. Identifying information, including first name, last name, sex, date of birth, and current and historical addresses, was provided to AOANJRR and used to identify participants who had a TKR. Ethical approval for data linkage was obtained from the Tasmanian Health and Medical Human Research Ethics Committee.

Comorbidities and Pain Medication Use

Participants used a self-reported questionnaire to report whether or not they had any of the following comorbidities (yes/no): diabetes, heart attack, hypertension, thrombosis, asthma, bronchitis/emphysema, osteoporosis, hyperthyroidism, hypothyroidism, rheumatoid arthritis, and other major illness. They also used a self-reported questionnaire to list the pain medications they were taking (medication name, dose, and frequency).

Statistical Analysis

The exposures for all analyses were BMLs present on T2-w FSE, BMLs present on T1-w GRE, and BMLs present on both MRI sequences. Five outcomes were analysed, each fitted in a separate model for each of the three exposures: baseline WOMAC scales, change in WOMAC scales, worsening or stabilising of site-specific cartilage defects, change in cartilage volume, and incident TKR.

Adjacent category ordinal logistic regression was used to estimate the association of BMLs on T1, T2, and both MRI sequences with baseline categories of knee pain, physical function limitation, stiffness, and total WOMAC. Multivariable models were adjusted for age, sex, BMI, and radiographic OA. Standard errors were adjusted to account for any correlation of observations for the same individual (i.e. BMLs present on both MRI sequences).

Linear regression was used to estimate the association of BMLs present on T1-w GRE, T2-w FSE, and both MRI sequences with change in WOMAC scales in separate models. Standard errors were adjusted to account for any correlation of observations for the same individual. Multivariable models were adjusted for age, sex, and BMI in the first instance, then additionally for radiographic OA and baseline WOMAC score. The outcome variable was transformed using a Box-Cox transformation to satisfy model assumptions.
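For readers unfamiliar with the Box-Cox transformation, the sketch below shows its standard form in Python. It is illustrative only. The text does not report the value of lambda or how negative change scores were handled, so the fixed lambda of 0.5 and the constant shift used here are assumptions, and the change scores are hypothetical. In practice lambda is usually estimated from the data.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical change scores (follow-up minus baseline). Negative values are
# possible, so a constant shift is ASSUMED here to make every value positive.
changes = [-3, 0, 2, 10]
shift = 1 - min(changes)  # assumption: the smallest value maps to 1
shifted = [c + shift for c in changes]
transformed = [box_cox(v, 0.5) for v in shifted]  # lambda = 0.5 is an assumption
```

The transform is monotonic, so the ordering of participants is preserved while the skew of the outcome is reduced.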

Site-specific associations between BMLs and cartilage defects were defined as the association within the same site (e.g. medial tibial BMLs predicting medial tibial cartilage defect worsening). Log–binomial regression was used to estimate the risk of worsening site-specific cartilage defects over 2.7 years for baseline BMLs, adjusted for age, sex, BMI, and baseline cartilage defect score.
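To show what the relative risk from such a model represents, the sketch below computes an unadjusted relative risk from a 2 × 2 table in Python. With a single binary exposure and no covariates, the exponentiated log–binomial coefficient equals this ratio of observed risks. The adjusted estimates reported in this study require a fitted regression model, which this sketch does not attempt. The function name and all counts are hypothetical.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Unadjusted relative risk with a 95% Wald interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: defect worsening at sites with and without a baseline BML
rr, lower, upper = relative_risk(30, 100, 15, 100)
```

A relative risk above 1 with an interval that excludes 1 would indicate a higher risk of worsening at sites with a BML.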

Multi-level mixed-effects linear regression was used to estimate the longitudinal association of baseline BMLs with cartilage volume loss over 2.7 years. Point estimates of change in cartilage volume over 2.7 years for those with BMLs at baseline compared to those without BMLs at baseline were reported. Multivariable models were adjusted for age, sex, and BMI.

Because BMLs perfectly predicted TKR (i.e. all participants who underwent TKR surgery had a BML at baseline), we were unable to model these data and present them descriptively.
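The modelling problem described above can be illustrated with a 2 × 2 table in Python. When no events occur in the unexposed group, the odds ratio has a zero denominator and a regression model has no finite estimate for the exposure. This is a sketch under stated assumptions: the function name and all counts are hypothetical and are not the study data.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: exposed with / without the event
    c, d: unexposed with / without the event
    Returns None when any cell is zero, because the log odds ratio is then
    not finite (perfect prediction, also called separation).
    """
    if 0 in (a, b, c, d):
        return None
    return (a * d) / (b * c)


# Hypothetical table in which every event occurs in the exposed group (c = 0)
separated = odds_ratio(12, 88, 0, 100)
# Hypothetical table with events in both groups, for contrast
estimable = odds_ratio(15, 85, 5, 95)
```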

We conducted a sensitivity analysis to examine whether the number of comorbidities and pain medication use were confounders.

All statistical analyses were performed using Stata 14 (StataCorp, College Station, Texas, USA). Statistical significance was set at p < 0.05 (two-tailed).

Results

Characteristics of Participants

The study sample contained 394 participants who had MRI measures at baseline and the 2-year follow-up. There were no significant differences in participant characteristics, including age, sex, BMI, baseline cartilage defects, and cartilage volume, between the study sample (n = 394) and the remainder of the cohort (n = 705) who did not have MRI scans at follow-up. The characteristics of the participants, stratified by BMLs on any of the MRI sequences at baseline, are shown in Table 1. There were no significant differences in terms of age, sex, BMI, radiographic OA, WOMAC scales, total cartilage volume at baseline, and absolute change in total cartilage volume between those with and without baseline BMLs. The prevalence of any cartilage defects at baseline, an increase in cartilage defect score, and incident TKR was higher in those with baseline BMLs.

Table 1 Characteristics of participants split by the absence and presence of BMLs on any of the MRI sequences

BML Prevalence and Size

A total of 231 (59%) participants had BMLs on at least one sequence. There were 388 BMLs detected on T2-w FSE and 378 BMLs detected on T1-w GRE. Of all BMLs, 354 (86%) were detected on both MRI sequences and very few were detected on only one of the sequence types [i.e. 34 (8%) only on T2-w FSE and 24 (6%) only on T1-w GRE], as shown in Fig. 2. Examples are presented in Fig. 3. For BMLs present on both sequences, the mean total BML area was slightly larger on T2-w FSE, although the size differences were not statistically significant (Fig. 4).
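The counts above are internally consistent, as the short Python check below illustrates. It assumes the percentages are taken over all distinct BMLs, which gives a denominator of 412; that total is derived here and is not stated explicitly in the text.

```python
# Counts reported in the text
both = 354     # BMLs seen on both sequences
t2_only = 34   # BMLs seen only on T2-w FSE
t1_only = 24   # BMLs seen only on T1-w GRE

t2_total = both + t2_only            # all BMLs visible on T2-w FSE
t1_total = both + t1_only            # all BMLs visible on T1-w GRE
all_bmls = both + t2_only + t1_only  # derived total, assumed denominator


def pct(count):
    """Percentage of all detected BMLs, rounded to the nearest integer."""
    return round(100 * count / all_bmls)
```

With these inputs, `t2_total` and `t1_total` reproduce the reported 388 and 378 lesions, and `pct` reproduces the reported 86%, 8%, and 6%.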

Fig. 2 Venn diagram of BML distribution. Yellow circle represents the BMLs on T2-w FSE, blue circle represents the BMLs on T1-w GRE, and the green overlapping area represents the BMLs present on both sequences. (Color figure online)

Fig. 3 BMLs are indicated by white arrows. 1a and 1b: BMLs present on T2-w FSE but not on T1-w GRE. 2a and 2b: BMLs present on T1-w GRE but not on T2-w FSE. 3a and 3b: BMLs present on both MRI sequences

Fig. 4 Mean BML size (mm²) at each knee site on T2-w FSE and T1-w GRE

Knee Pain, Functional Limitation, Stiffness, and Overall Disability (Total WOMAC Score)

Table 2 shows cross-sectional associations between BMLs present on T2-w FSE, T1-w GRE, and both MRI sequences and baseline category of knee pain, physical function limitation, stiffness, and total WOMAC score. The presence of BMLs on T2-w FSE, T1-w GRE, and both MRI sequences at baseline was associated with increased odds of moving to a higher category of knee pain, physical function limitation, and total WOMAC score compared to the reference group with no BMLs. The effect sizes were similar for each sequence and remained unchanged and significant after adjustment for age, sex, and BMI, and after further adjustment for radiographic OA. Participants with a BML present on T2-w FSE, T1-w GRE, and both MRI sequences were consistently estimated to have increased odds of moving to a higher category of stiffness, but evidence for the association was weaker.

Table 2 Adjacent category logistic regression of baseline knee pain, physical function limitation, stiffness, and total WOMAC on BMLs present on T2-w FSE, T1-w GRE, and both MRI sequences

We next examined whether the presence of BMLs on T2-w FSE, T1-w GRE, and both MRI sequences, compared to the reference group with no BMLs, was associated with changes in knee pain, physical function limitation, stiffness, and total WOMAC score over 2.7 years (Table 3). BMLs present on T2-w FSE, T1-w GRE, and both MRI sequences were associated with the worsening of pain and stiffness over 2.7 years, with similar effect sizes, after adjustment for age, sex, BMI, radiographic OA, and baseline WOMAC score. There was no evidence for an association between BMLs present on T2-w FSE, T1-w GRE, or both MRI sequences and changes in physical function limitation and total WOMAC score in unadjusted or adjusted analyses.

Table 3 Linear regression estimates of change in knee pain, physical function limitation, stiffness, and total WOMAC after 2.7 years on presence of BMLs on T2-w FSE, T1-w GRE, and both MRI sequences at baseline

Cartilage Defects

Table 4 shows the relative risks of worsening site-specific cartilage defects over 2.7 years for BMLs present on T2-w FSE, T1-w GRE, and both MRI sequences. The presence of BMLs on T2-w FSE, T1-w GRE, and both MRI sequences was associated with a higher risk of site-specific cartilage defect worsening over 2.7 years in adjusted analysis at all sites except the medial tibial and inferior patellar sites. The relative risk estimates for each site were of a similar magnitude for the three sequence types, with the largest effect observed for the lateral femoral site.

Table 4 Log–binomial regression of worsening of site-specific cartilage defects over 2.7 years on site-specific presence of BMLs on T2-w FSE, T1-w GRE, and both MRI sequences

Cartilage Volume Loss

Table 5 shows estimated changes in site-specific cartilage volume over 2.7 years for site-specific BMLs present on T2-w FSE, T1-w GRE, and both MRI sequences, compared to the reference group with no BMLs. The presence of BMLs was associated with significantly greater cartilage volume loss at the lateral tibial and superior patellar sites for all MRI sequences. Increased cartilage volume loss was also associated with the presence of medial femoral BMLs identified on T2-w FSE, and with lateral tibiofemoral BMLs identified on both MRI sequences and on T2-w FSE, but not with BMLs on T1-w GRE. While there was no evidence for an association between BMLs and site-specific cartilage volume loss at the medial tibial, lateral femoral, inferior patellar, medial tibiofemoral, total tibiofemoral, and overall sites, the effect size estimates were consistently negative.

Table 5 Mixed-effects model regression point estimates of mean change in site-specific cartilage volume loss over 2.7 years for site-specific BMLs present on T2-w FSE, T1-w GRE, and both T1 and T2, compared to the reference group with no BMLs

Total Knee Replacement (TKR)

Nineteen participants (6% of our study population) had a TKR. All TKR participants (100%) had a BML on both MRI sequences and on T1-w GRE, and 95% had a BML on T2-w FSE. This indicates that BMLs were a very strong predictor of TKR on each sequence type. We were not able to model these data due to the perfect prediction.

Further adjustment of all our presented models for number of comorbidities and use of pain medication did not change effect sizes by more than 10% (data not shown).
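The 10% change-in-estimate criterion can be expressed as a simple check in Python. This is a sketch only: the text does not state exactly how the relative change was calculated, so the formula below (absolute change divided by the original estimate) is an assumption, and the function name and example estimates are hypothetical.

```python
def changes_estimate(original, adjusted, threshold=0.10):
    """True if further adjustment moves the estimate by more than `threshold`.

    The relative change is taken as |adjusted - original| / |original|,
    which is an assumed definition.
    """
    if original == 0:
        raise ValueError("relative change is undefined for an estimate of 0")
    return abs(adjusted - original) / abs(original) > threshold


# Hypothetical effect sizes before and after adding a candidate confounder
small_shift = changes_estimate(1.50, 1.56)  # 4% relative change
large_shift = changes_estimate(1.50, 1.80)  # 20% relative change
```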

Discussion

This study describes associations of BMLs detected on two different MRI sequences with clinical outcomes in OA, including pain, function, stiffness, cartilage damage and loss, and TKR surgery. We found that subchondral BMLs were commonly seen on both T2-w FSE and T1-w GRE sequences in an older adult population. While the difference in BML size on each sequence was not statistically significant, BML area was slightly larger on the T2-w FSE sequences compared to T1-w GRE sequences. Despite this, contrary to our hypothesis, associations with clinical outcomes including symptoms, cartilage damage and loss, and TKR were similar. This suggests that either T2-w FSE or T1-w GRE MRI sequences could be used separately to assess BMLs.

Our study found that 86% of BMLs were seen on both MRI sequences in our sample of community-dwelling older adults. Prevalence assessments for BMLs in previous studies vary widely. One study reported 74% in community-dwelling adults without knee pain [21], whereas another study reported 75% in knees with and without medial joint space narrowing [33]. Our rate of BMLs detected on both MRI sequences is higher than the previous studies. A number of factors may contribute to this inconsistency including the use of different sequence types, study populations, study sizes, and different BML scoring systems and readers.

There have been limited studies evaluating how BMLs on different MRI sequences correlate with clinically important outcomes. Recently, Wluka et al. [21] reported that BMLs present on both T1- and T2-weighted MRI sequences were associated with increased cartilage loss and incident knee pain compared to BMLs seen only on T2-weighted sequences. These findings support recommendations suggesting a combination of both fluid-sensitive and GRE-type MRI sequences should be used. However, our study did not find this. We found that BMLs were typically seen on both MRI sequences, and were equivalently associated with symptoms, cartilage damage and loss, and TKR surgery. This suggests that there is no meaningful difference in prediction of clinically important outcomes using either sequence. Furthermore, in studies where both fluid-sensitive and GRE-type MRI sequences are not available, either sequence could be used for clinical research.

There is great debate about the ideal sequence to assess BMLs. Several previous studies have compared the performance of different MRI sequences in regard to BML detection, reliability, and sensitivity to change over time [15,16,17,18,19,20]. This has led to mixed recommendations about the optimal MRI sequence to measure BMLs. As BMLs often appear larger on fluid-sensitive sequences compared to T1-weighted sequences [11, 19, 20], authors often suggest measuring them using water-sensitive sequences [11, 34]. Our study also found that BMLs appeared slightly larger on the T2-weighted sequences compared to the T1-weighted sequences. However, mixed findings from other studies [17, 18] have led to the hypothesis that a combination of both fluid-sensitive and GRE-type MRI sequences would result in superior accuracy in assessing BMLs. One other study has assessed this in addition to ours; it found no difference between a fluid-sensitive sequence (IW-TSE) and a DESS sequence in overall prevalence or sensitivity to change over time [20]. This led the authors to conclude that either sequence could be used for assessment of BML change in a clinical trial, which is consistent with our study findings.

Studies which have used histology to characterise BMLs have offered great insight into the compositional characteristics of BMLs. Zanetti et al. were one of the first to examine the histology of BMLs and found that they consisted of oedema, fibrosis, trabecular bone changes, and necrosis [35]. Combining different MRI sequences may offer new insights into the different cellular changes occurring in BMLs [36, 37]. A study using a combination of fluid-sensitive and GRE-type MRI sequences showed significantly greater oedema, fibrosis, and necrosis in BMLs present on both MRI sequences compared to BMLs present on only fluid-sensitive sequences [38].

This study has several potential limitations. First, this study consisted of 394 participants who had MRI scans at both time points, therefore excluding 705 from our larger cohort. However, the two groups were similar in terms of age, sex, BMI, baseline cartilage defects, and volume, so our findings should be generalisable. Second, the initial response rate in our study was lower than desirable (57%), but it is similar to other Australian cohort studies [39]. The relationship between outcomes and exposures is not necessarily biased due to a lower response rate [40]. The study quality and validity should be judged with other criteria and not the response rate alone [41]. Third, the BMLs assessed in this study were read by one reader who measured the BMLs on both sequences at the same time. Therefore, the reader may have been more likely to pick up BMLs on each sequence because they were comparing the images from each sequence to each other. This may have led to an overestimate of BML presence on each sequence. However, this method does provide assurance because the reader was able to confidently document whether or not a BML was present on each sequence, meaning that BMLs were less likely to be missed by the reader. Fourth, baseline WOMAC scales were categorised into three levels as the data were not normally distributed and had a large number of zeros. While there is no consensus on the exact cut points to be used, we adopted cut-offs based on the expert consensus from an OA Expert Group from the GBD 2010 study.

Conclusions

BMLs were commonly seen on both T1-w GRE and T2-w FSE MRI sequences. They were equivalently associated with clinical outcomes including symptoms, worsening of cartilage defects, cartilage volume loss, and TKR. Our study demonstrates that BMLs can be assessed on either MRI sequence alone with no clinical predictive advantage of either sequence.