Background

Low back pain (LBP) is an extremely common health problem [1], with an enormous global burden [2]. While some progress has been made in the management of LBP, the best options provide only small or moderate treatment effects [3, 4]. One explanation for the failure to identify treatments with large effects is the current inability to identify a specific cause for LBP in most people [3]. As a result, a single intervention is usually provided to heterogeneous groups of patients with potentially different causes of their pain. Identifying more homogeneous subgroups of LBP patients has been identified as a key research priority in the field [5]. Most previous research in this area has focussed on identifying clinical and psychosocial variables associated with patients who respond better to different interventions [6, 7]. However, very little attention has focussed on identifying subgroups based on biological mechanisms or anatomical structures. Some early work has investigated subgroups based on different pain mechanisms [8–10] due to increasing evidence for the role of central mechanisms in the development of chronic LBP [11]. Subgrouping based on possible spinal patho-anatomical causes of LBP has received little attention and its value is unknown.

The importance of magnetic resonance imaging (MRI) findings such as disc herniation, facet joint arthropathy and Modic changes (bone marrow and endplate lesions visible on MRI) in identifying the source of an individual patient’s LBP remains unclear and controversial. Many MRI findings are common in people without LBP, yet these findings are typically more common in people with LBP than those without [12–14]. Research into the importance or otherwise of MRI findings has been frustrated by the lack of a widely accepted gold standard [14]. An alternative approach in such cases is to investigate if the presence of MRI findings predicts different response to interventions [15]. If this were the case, it would provide evidence for the importance of such findings and a logical rationale for selecting specific interventions for individual patients.

To our knowledge, there has been no review of a range of MRI findings as effect modifiers for LBP interventions. Therefore, the aim of this systematic review was to investigate if the presence of MRI findings at baseline identifies patients with LBP or sciatica who respond better to particular interventions.

Methods

The review protocol was specified in advance and registered on PROSPERO, the international prospective register of systematic reviews (the full protocol is available at http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42013006571). The PRISMA statement was used to guide the conduct and reporting of the study [16].

Search strategy

A sensitive search was performed of MEDLINE, EMBASE and The Cochrane Central Register of Controlled Trials to identify potential studies from the earliest records up to 20th of June, 2015. We used a search strategy based on the recommendations of the Cochrane Back Review Group [17] for randomised controlled trials (RCTs) and LBP, combined with Medical Subject Headings and keywords related to ‘MRI’ and ‘effect modification/subgroups’. After piloting the search strategy, we decided to use two different searches and then combine the results.

Search 1 included terms from each of the following domains: (1) RCTs, (2) LBP/sciatica and (3) MRI. Search 2 included terms from each of the following domains: (1) RCTs, (2) LBP/sciatica and (3) effect modification/subgroup. Searches 1 and 2 were merged to generate the final search strategy (refer to Appendix Tables 4 and 5 for the full search strategy). Reference and citation tracking of relevant articles was performed. A final list of the included studies was sent to two experts in the field who reviewed the list for possible omissions.

Study selection

To be included, studies were required to meet all the following criteria:

  1. Participants: recruited samples of populations with current LBP or sciatica, who were not diagnosed with serious disease (e.g. cancer, spinal infection, spinal fracture, inflammatory arthritis or cauda equina syndrome) as the source of LBP.

  2. Interventions: investigated any type of intervention for LBP, including conservative, surgical, or placebo. Included studies needed to have compared any intervention for LBP or sciatica, with any type of intervention, placebo or no treatment control.

  3. Outcome: reported for either pain (e.g. measured by the visual analogue scale, numerical rating scale) or disability (e.g. measured by the Roland Morris Disability Scale, Oswestry Disability Index). In studies that included participants with a primary complaint of LBP, self-reported LBP was considered the primary outcome while in trials of sciatica self-reported leg pain was considered the primary outcome [3].

  4. Study design: included studies needed to be an RCT which had used methods capable of identifying whether patients with a specific MRI finding had a different treatment effect than those without the MRI finding or with a different MRI finding. Studies were required to have reported patients’ results separately for either (1) samples with and without a particular MRI finding (e.g. disc herniation) or (2) people with a different type or severity of MRI finding (e.g. mild vs. severe disc degeneration).

One reviewer screened titles and abstracts of each citation and excluded clearly irrelevant studies. For each potentially eligible study, the full text was retrieved and two reviewers independently assessed whether the study fulfilled the inclusion criteria. In cases of disagreement, a third reviewer was consulted and a decision made by consensus. The search had no language restrictions.

Data extraction

Relevant data were independently extracted by two reviewers using a standardised form. In cases of disagreement, a joint review of the original article was performed until consensus was reached. The extraction form included the following criteria: clinical settings, sample, age, treatment groups, MRI findings and point estimates and measures of variability for outcomes. Outcome data were extracted for short-term outcomes (0 to ≤6 months) and long-term outcomes (>6 months). When multiple time points fell within the same category, we used the one closest to 3 months for short-term and closest to 12 months for long-term.
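The time-point rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the extraction procedure actually used by the reviewers: the function name `pick_time_point`, the constants and the example follow-up schedule are hypothetical, and the tie-breaking behaviour (the first listed time point wins when two are equally close) is an assumption, since the text does not specify one.

```python
from typing import Optional, Sequence

# Thresholds taken from the text; the names are hypothetical.
SHORT_TERM_MAX = 6.0      # short-term: 0 to <=6 months
SHORT_TERM_TARGET = 3.0   # prefer the follow-up closest to 3 months
LONG_TERM_TARGET = 12.0   # prefer the follow-up closest to 12 months


def pick_time_point(months: Sequence[float], term: str) -> Optional[float]:
    """Return the follow-up (in months) to extract for the given term.

    term is "short" (0 to <=6 months) or "long" (>6 months).
    Returns None when no follow-up falls in the category.
    """
    if term == "short":
        candidates = [m for m in months if 0 <= m <= SHORT_TERM_MAX]
        target = SHORT_TERM_TARGET
    elif term == "long":
        candidates = [m for m in months if m > SHORT_TERM_MAX]
        target = LONG_TERM_TARGET
    else:
        raise ValueError("term must be 'short' or 'long'")
    if not candidates:
        return None
    # Assumption: on a tie, min() keeps the first candidate in input order.
    return min(candidates, key=lambda m: abs(m - target))


# Hypothetical trial with follow-up at 1.5, 3, 6, 12 and 24 months.
follow_ups = [1.5, 3, 6, 12, 24]
short_choice = pick_time_point(follow_ups, "short")
long_choice = pick_time_point(follow_ups, "long")
```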

Risk of bias

There is no established method to assess the risk of bias for studies of effect modification. We therefore used the risk of bias tool recommended by the Cochrane Back Review Group [17] to assess the conduct of the RCTs included in our review. Because this tool assesses trial conduct rather than the effect modification analysis itself, the risk of bias findings were not emphasised in the interpretation of results, as would be common in a review of an intervention. Two reviewers independently assessed the criteria of all included studies. In cases of disagreement, a third reviewer was consulted and a decision made by consensus (refer to Appendix Table 6 for further details on the criteria list for the methodological quality assessment). Data pooling was appropriate only if the studies were considered homogeneous with regard to population sample, MRI measure, clinical outcomes and treatment.

Analysis

Due to the small number of included trials and the heterogeneity between them in terms of MRI findings, treatment and clinical outcomes, we were unable to undertake the pre-specified meta-analysis. Therefore, each MRI finding of the lumbar spine was examined for its individual capacity for effect modification and interaction. The results are presented descriptively for LBP and sciatica populations.

We extracted (1) mean differences and 95 % confidence intervals (95 % CI) from studies that reported continuous outcomes, (2) hazard ratios (HR) and 95 % CI from studies that reported time-to-event categorical outcomes, and (3) contingency table data to calculate odds ratios (OR) for categorical outcomes. If not reported or provided, the effect modification and subgroup interaction were calculated using the method suggested by Kent et al. [7] for continuous outcomes and the method suggested by Hancock et al. [18] for categorical outcomes. In brief, for continuous outcomes this involved using the following formula:

((in subgroup and received intervention treatment) − (in subgroup and received comparison treatment)) − ((not in subgroup and received intervention treatment) − (not in subgroup and received comparison treatment)).
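As a minimal sketch of this contrast, the Python function below takes the mean outcome in each of the four cells and returns the difference between the two subgroup-specific treatment effects. The function name and the numbers are hypothetical and are not taken from any included trial.

```python
def interaction_contrast(sub_int: float, sub_comp: float,
                         nosub_int: float, nosub_comp: float) -> float:
    """Difference-in-differences of four cell means.

    sub_int    : mean outcome, in subgroup, intervention treatment
    sub_comp   : mean outcome, in subgroup, comparison treatment
    nosub_int  : mean outcome, not in subgroup, intervention treatment
    nosub_comp : mean outcome, not in subgroup, comparison treatment
    """
    effect_in_subgroup = sub_int - sub_comp
    effect_outside_subgroup = nosub_int - nosub_comp
    return effect_in_subgroup - effect_outside_subgroup


# Hypothetical mean improvements on a 0-100 disability scale.
example = interaction_contrast(sub_int=30.0, sub_comp=20.0,
                               nosub_int=22.0, nosub_comp=20.0)
```

With these illustrative values the treatment effect is 10 points inside the subgroup and 2 points outside it, so the interaction is 8 points.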

For dichotomous outcomes, the approach involves reconstructing the data set from the reported contingency table data and running logistic regression.
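For readers who want to see the arithmetic, the sketch below computes the interaction for a dichotomous outcome directly from two 2×2 tables as a ratio of odds ratios. It is not the reviewers' code. It relies on the fact that, in a logistic model containing treatment, subgroup and their product as the only terms, the exponentiated interaction coefficient equals this ratio and its Wald standard error on the log scale is the square root of the summed reciprocal cell counts. All counts and function names are hypothetical, and tables with zero cells are not handled.

```python
import math
from typing import Tuple

# A 2x2 table as (intervention improved, intervention not improved,
#                 comparison improved,   comparison not improved).
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> float:
    """Odds of improvement with intervention vs comparison."""
    a, b, c, d = table
    return (a * d) / (b * c)


def ratio_of_odds_ratios(mri_pos: Table, mri_neg: Table,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Interaction OR (MRI positive vs MRI negative) with a Wald CI.

    Raises ZeroDivisionError if any cell count is zero.
    """
    log_ror = math.log(odds_ratio(mri_pos)) - math.log(odds_ratio(mri_neg))
    se = math.sqrt(sum(1.0 / n for n in mri_pos + mri_neg))
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))


# Hypothetical counts, not taken from any included trial.
ror, ci_low, ci_high = ratio_of_odds_ratios(
    mri_pos=(30, 10, 20, 20),   # OR = 3.0
    mri_neg=(25, 15, 20, 20),   # OR = 1.67
)
```

A ratio above 1 indicates that the intervention performs relatively better in the MRI-positive subgroup. With cell sizes of this order the interval is wide, which is the same pattern seen in the underpowered trials discussed later in the review.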

Four studies had key information not available from published manuscripts and additional information was requested [19–22]. Two studies reported combined RCT and observational cohort data [19, 20]. The separated RCT data for the intention-to-treat analysis were requested. The effect modification and/or the subgroup interaction were calculated by the current review authors for six studies [19, 20, 23–26].

In this review, the term subgroup interaction refers to how much more effective (compared with the control intervention) the intervention is in the subgroup (MRI positive) than for those not in the subgroup (MRI negative).

Results

Study selection

The search identified 7163 papers. After review of titles and abstracts, we excluded 7096 (Fig. 1). Based on full-text review of 67 papers, we excluded a further 59 and included eight trials in the review [19–26]. The primary reasons for the exclusion of trials retrieved in full-text are noted in Appendix Table 7. No additional studies were identified after contacting two experts in the field of MRI and LBP or sciatica.

Fig. 1 Flow diagram of review process

Risk of bias

The risk of bias assessments for the included studies are shown in Table 1. Randomisation, drop-out rate, co-interventions and outcome timing were the only criteria scored ‘yes’ in all trials. Participant blinding, outcome assessor blinding and the absence of selective outcome reporting were the criteria most commonly scored ‘no’.

Table 1 Risk of bias of the included studies

Study characteristics

The characteristics of the included studies are shown in Table 2. Three trials studied patients with LBP [24–26] and five studied patients with sciatica [19–23]. The samples were recruited from secondary health care [19–21, 23, 24] and tertiary health care [22, 25, 26] settings. The number of participants varied from 120 to 472 and most studies sampled predominantly adults in their middle age. The treatments evaluated in the trials included surgery, injections and rehabilitation. No study had the primary aim of investigating MRI effect modifiers. LBP duration was categorised as acute (<6 weeks), sub-acute (6–12 weeks) and chronic (greater than 12 weeks) [17].

Table 2 Individual study characteristics

Results of the review

Due to the heterogeneity of samples, MRI findings, clinical outcomes and treatment, it was not possible to perform meta-analysis of the results for any of the included studies. For ease of interpretation, the studies were grouped into LBP population [24–26] or sciatica population [19–23] as the importance of MRI findings might be quite different in these two populations. Detailed findings of all included studies are presented in Table 3.

Table 3 Subgroup treatment effect and interaction for low back pain and sciatica population

Low back pain population samples

One study reported a population with sub-acute LBP (symptoms ≥6 weeks) [25] and two reported populations with chronic LBP (symptoms ≥1 year) [24, 26]. All three studies investigated Modic changes (Modic changes type 1 corresponding to vertebral body oedema and hyper-vascularity; Modic changes type 2 reflecting fatty replacements of the red bone marrow; and Modic changes type 3 consisting of subchondral bone sclerosis [27, 28]) as effect modifiers [24–26], while one study investigated disc herniation and facet joint arthritis [26].

Cao et al. [25] investigated various intradiscal injection regimens for patients with Modic changes (n = 120). Patients with Modic changes type 1, when compared with patients with Modic changes type 2, had slightly greater improvement in disability in the short term (3 months) when treated by Diprospan (steroid) injection, compared with saline (mean difference 8.30; 95 % CI 1.01–15.59, on a 0–100 disability scale). Other subgroup interactions for pain and disability with Modic changes were not significant.

Hellum et al. [26] investigated whether features of disc degeneration were effect modifiers for disc prosthesis compared with multidisciplinary rehabilitation at two-year follow-up (n = 154). The presence of Modic changes type 1 and/or 2 was not a significant effect modifier for improvements in disability (percentage of patients improved ≥15 points on a 0–100 scale, categorised by yes/no), with OR ranging from 0.63 (95 % CI 0.15–2.65) to 2.96 (95 % CI 0.65–13.52). Similarly, disc herniation, facet joint arthropathy and high intensity zone were not significant effect modifiers for improvement in disability when treated with surgery, compared with rehabilitation [26].

Buttermann [24] investigated whether Modic changes type 1 was an effect modifier for spinal injection and steroid, compared with discography alone at 1–3 and 12–24 months (n = 171). The presence of Modic changes type 1 was not a significant effect modifier for injection success (coded as ‘yes’ if the overall opinion about their injection was considered successful) at short- (OR 7.94; 95 % CI 0.40–156.46) or long-term follow-up (OR 2.20; 95 % CI 0.11–45.98).

Sciatica population samples

Three studies reported potential MRI effect modifiers in population samples with sub-acute sciatica (symptoms ≥6 weeks) [21–23] and two in samples with chronic sciatica (symptoms ≥12 weeks) [19, 20]. Three studies investigated disc herniation [20–22], two investigated spinal stenosis [19, 21], one investigated disc height [21] and one investigated different types of MRI findings (disc prolapse vs. spinal stenosis) [23] as effect modifiers.

Pearson et al. [20] studied whether features of disc herniation were effect modifiers for discectomy, compared with conservative rehabilitation at three and 12 months follow-up (n = 472). Patients with central disc herniation, when compared with patients without central disc herniation, had a substantially better response to surgery at long-term follow-up (12 months), mean difference 1.60; 95 % CI 0.17–3.03 (0–6 point Likert scale). In patients with central herniation, one-year pain outcomes were substantially better (mean difference 1.60; 95 % CI 0.10–3.10; 0–6 point Likert scale) for those receiving surgery compared with rehabilitation. In those without central herniation, surgery was no better than rehabilitation (mean difference 0.00; 95 % CI −0.40 to 0.40; 0–6 point Likert scale). Other disc herniation characteristics (e.g. posterolateral and protrusion) were not associated with significant treatment interactions.
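The relationship between the two subgroup effects and the interaction reported for central disc herniation can be checked with a short calculation. The sketch below is an approximation written for illustration, not the method used to obtain the published interval: it assumes the two subgroups are independent and that both reported intervals are symmetric, normal-based 95 % CIs. The function name is hypothetical.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def interaction_from_subgroups(effect_pos: float, ci_pos: Interval,
                               effect_neg: float, ci_neg: Interval
                               ) -> Tuple[float, float, float]:
    """Approximate interaction and CI from two subgroup effects.

    Assumes independent subgroups and symmetric CIs at the same
    confidence level, so the half-widths combine in quadrature.
    """
    half_pos = (ci_pos[1] - ci_pos[0]) / 2.0
    half_neg = (ci_neg[1] - ci_neg[0]) / 2.0
    difference = effect_pos - effect_neg
    half_width = math.hypot(half_pos, half_neg)
    return difference, difference - half_width, difference + half_width


# Subgroup effects as reported above for central disc herniation [20].
estimate, low, high = interaction_from_subgroups(
    effect_pos=1.60, ci_pos=(0.10, 3.10),    # with central herniation
    effect_neg=0.00, ci_neg=(-0.40, 0.40),   # without central herniation
)
```

The point estimate reproduces the reported interaction of 1.60. The approximate interval (roughly 0.05 to 3.15) is a little wider than the reported 0.17–3.03; under the assumptions above an exact match is not expected, but both intervals exclude zero.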

Peul et al. [22] investigated if disc herniation was an effect modifier for response to early surgery compared with prolonged conservative care (n = 283). Sequestrated disc herniation (Hazard ratio, 0.94; 95 % CI 0.56–1.57) and disc herniation enhancement (Hazard ratio, 0.85; 95 % CI 0.47–1.54) did not have any significant interaction with treatment, for 12 month outcomes (very much improved and much improved were coded as recovered).

Arts et al. [21] investigated if disc herniation, spinal stenosis and disc height were effect modifiers for response to tubular discectomy, compared with conventional microdiscectomy, at one-year follow-up (n = 325). None of the MRI findings produced significant interactions with treatment for long-term recovery outcomes.

Pearson et al. [19] investigated whether features of spinal stenosis were effect modifiers for response to surgery, compared with rehabilitation, in 278 patients at three and 24 months follow-up. Spinal stenosis did not produce any significant interactions with treatment for short- and long-term disability outcomes.

Tafazal et al. [23] investigated whether features of disc herniation (disc prolapse) or lumbar spinal stenosis were effect modifiers for the efficacy of corticosteroid injection in 150 patients. Neither MRI feature produced significant interactions with bupivacaine (a local anaesthetic) and steroid or bupivacaine alone at short-term follow-up.

Discussion

Statement of principal findings

This review identified only eight studies that provided adequate data to assess if MRI findings were treatment effect modifiers. Three studies reported data from people with LBP and five studies reported data from people with sciatica. The included studies investigated 38 interactions for combinations of different MRI findings, interventions and outcomes. No effect modifiers were consistently identified across more than one study. A single study showed that patients with Modic changes type 1 had slightly greater improvement in disability than patients with Modic changes type 2 in the short term when treated by Diprospan injection, compared with saline (mean difference 8.30; 95 % CI 1.01–15.59, on a 0–100 disability scale). A single study reported that patients with sciatica and central disc herniation (compared with those without central disc herniation) had substantially greater benefits from surgery than rehabilitation (mean difference 1.60; 95 % CI 0.17–3.03, on a 0–6 point Likert scale). However, these are single study results and caution should be taken when interpreting the findings. Some other subgroup interactions presented trends and confidence intervals that included potentially important interactions; however, these trials were underpowered due to their small sample sizes.

Strengths and weaknesses of the study

We believe that this is the first systematic review of RCTs to investigate if a range of MRI findings are effect modifiers for interventions in people with LBP and/or sciatica. The strength of this review is the use of a pre-specified protocol and the comprehensive approach to identifying all suitable RCTs. We also provide data for all included trials on the interaction effect as well as the subgroup effects for those with and without the MRI finding of interest. We used a sensitive search strategy and contacted experts in the field, reducing the risk of missing any important trial. A limitation of our review is that the inconsistency of MRI findings, interventions and outcomes investigated across the studies inhibited our ability to perform meta-analysis. Furthermore, most trials were not powered for subgroup interaction analysis, as it was not the primary aim of the study. As a result, some non-significant findings may include a potentially important interaction (e.g. OR 7.94; 95 % CI 0.40–156.46) [24]. Another limitation of our review is the possibility of publication bias as we did not attempt to identify unpublished trials that might have been found in other clinical trials registries and in conference proceedings. Furthermore, this review could have missed important trials that for some reason were not captured by our search, not cited by a relevant study or unknown to our experts in the field.

In our review, we used the Cochrane Risk of Bias tool to assess quality of RCT conduct; however, this tool does not necessarily reflect the risk of bias associated with effect modification analyses. Currently, there is no validated measure to assess risk of bias in effect modification analyses. Factors including the use of an appropriate test of interaction, adequate power for the interaction test and an a priori hypothesis of direction of effect may be important to the risk of bias in effect modification studies but are not included in the Cochrane risk of bias tool [29, 30].

The reliability of different MRI findings is important for the interpretation of this study. Carrino et al. [31] reported the reliability of lumbar MRI findings to be generally moderate to good. For example, they reported Kappa values of 0.66, 0.55, 0.59 and 0.54 for disc degeneration, spondylolisthesis, Modic changes and facet arthropathy, respectively. The type of MRI machine used and the experience of the image readers may influence reliability. A recent study found that Modic changes Type 1 was detected more often using low field MRI (0.3 Tesla), whereas Modic changes Type 2 was detected more often when using high field MRI (1.5 Tesla) [32].

Comparison with other studies

Three previous reviews have investigated effect modifiers for LBP treatments. Two of these reviews investigated effect modifiers for specific interventions (manual therapy/exercise and psychosocial intervention) [6, 7]. These reviews did not include MRI findings as potential effect modifiers. The third review specifically investigated Modic changes as effect modifiers [33]. Interestingly, all reviews found a limited number of suitable studies, which had inconsistent findings, had small sample sizes, and provided limited evidence for strong effect modifiers. These results corroborate our findings. The review investigating Modic changes as an effect modifier for different LBP treatments had several methodological limitations [33]; for example, it included single subgroup designs (i.e. studies including all people with Modic changes and no people without Modic changes), and these types of studies cannot robustly test if effect modification occurred [34].

Meaning of the study

From 38 treatment effect modification interactions investigated, only two were positive: one for LBP and one for sciatica populations. These positive findings could represent spurious findings. However, the lack of statistically significant interactions may also be partly due to most studies being underpowered for this type of analysis. Consequently, it remains unclear whether MRI findings are important effect modifiers for interventions for LBP and sciatica populations. What is clear is that there are very few trials and most of these are underpowered, reinforcing the need for more and larger trials in this potentially important and evolving area.

Recommendations for future research

Studies on subgroup interaction are a research priority in LBP [5] and well-conducted trials provide the possibility to answer the important and controversial question about the importance or otherwise of MRI findings. The need for larger, high-quality trials is evident. Due to the nature of subgroup and interaction analyses, such trials need a larger sample size than if their only interest was the main effect of treatment. One way to gain statistical power would be to combine several sets of individual patient data, to acquire an adequate number of individuals with and/or without an MRI finding of interest. Furthermore, it is important that future studies use standardised definitions of LBP, sciatica, MRI findings and clinical outcomes. Without this, it is very difficult to perform meta-analyses or compare findings between studies.
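The sample size penalty for interaction analyses can be illustrated with a simple calculation. The sketch below is a textbook approximation rather than a result from this review. It assumes a continuous outcome with a common standard deviation, 1:1 randomisation within each subgroup, and an interaction of the same magnitude as the main effect tested at the same power and significance level. Under those assumptions the variance of the interaction estimate is 1 / (p × (1 − p)) times that of the main effect, where p is the proportion of participants with the MRI finding. The function name is hypothetical.

```python
def interaction_inflation_factor(p: float) -> float:
    """Approximate sample size multiplier for an interaction test.

    p is the proportion of participants in the subgroup (0 < p < 1).
    With N participants in total and common variance s^2:
        Var(main effect) = 4 * s^2 / N
        Var(interaction) = 4 * s^2 / (N * p * (1 - p))
    so their ratio, and hence the multiplier, is 1 / (p * (1 - p)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))


balanced = interaction_inflation_factor(0.5)     # subgroups of equal size
unbalanced = interaction_inflation_factor(0.2)   # finding present in 20 %
```

Under these assumptions a trial needs about four times as many participants when half the sample has the MRI finding, and more when the finding is less common, which is one reason pooling individual patient data across trials is attractive.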

A key finding from our review was that only trials including surgery or injections had investigated MRI findings as effect modifiers for LBP interventions. We could find no evidence for the importance or otherwise of MRI findings for conservative interventions. While we recommend the need for larger, high-quality trials, it is important to note that limited evidence exists for the use of surgery in most patients with LBP [35].

Conclusions

This review identified eight studies that investigated if MRI findings identify patients with LBP and/or sciatica who respond better to a variety of interventions. Included studies recruited participants from secondary and tertiary health care settings. While two statistically significant interactions were found between specific MRI findings and response to treatment, the limited number of suitable studies and the heterogeneity between them did not permit definitive conclusions about effect modification. Further well-designed, adequately powered studies are required.