Introduction

Low back pain (LBP), defined as pain in the lumbar spinal region with or without sciatica, is a common cause of disability worldwide, with a lifetime prevalence of 60–85% [1–3]. In about 95% of cases LBP is nonspecific, but it can be caused by serious underlying pathology such as disc herniation, spinal stenosis, infection, inflammation, tumour or fracture [4, 5]. When serious spinal pathology is suspected, diagnostic confirmation is required, since delayed treatment has been associated with poorer outcomes [1, 4, 5]. Contrary to guideline recommendations, however, diagnostic confirmation is also often requested when the likelihood of a specific cause of LBP is very low [4, 6], often out of fear of missing serious pathology or to reassure patients [7]. Diagnostic confirmation can be obtained with several imaging techniques, one of which is magnetic resonance imaging (MRI). Imaging results, combined with the available clinical information, help physicians make treatment decisions [8, 9]. MRI is currently the imaging modality of choice: it does not use ionising radiation and visualises soft tissues particularly well. It is therefore regarded as the most useful method for detecting spinal infections, spinal metastases, nerve root disorders and disc abnormalities [10]. Nonetheless, the role of MRI in diagnosing lumbar spinal pathology remains controversial [11, 12], partly because many studies did not report differences in patient outcomes when comparing lumbar MRI and subsequent treatment with conservative care without diagnostic imaging. However, most of these studies included patients at low risk of serious underlying pathology [13–15]. Furthermore, studies have not provided conclusive information about the diagnostic accuracy of MRI [16], largely because there is no ‘gold’ reference standard for identifying serious underlying spinal pathology in LBP [13, 17, 18].
Additionally, heterogeneity among primary diagnostic studies complicates the interpretation of diagnostic test accuracy results. Potential sources of heterogeneity include variation in the pathologies considered, MRI techniques, reference standards, patient populations and methodological quality. To provide more evidence on the diagnostic role of MRI in LBP and sciatica, this systematic review summarizes the available evidence on its diagnostic accuracy in identifying serious underlying pathology.

Methods

Literature search

A database search of MEDLINE, EMBASE and CINAHL was conducted (up to December 2009). The search strategy was designed to identify publications for four separate diagnostic test accuracy reviews, one for each imaging technique (MRI, CT, X-ray and myelography), in the identification of lumbar spinal pathology. It was developed to find prospective or retrospective cohort or case-control studies assessing the diagnostic accuracy of MRI in the identification of lumbar spinal pathology (i.e. radicular syndrome, spinal stenosis, spinal tumours, spinal fractures, spinal infection/inflammation, disc herniation, spondylolisthesis, spondylolysis, ankylosing spondylitis, disc displacement, osteoporotic fractures and other degenerative disc diseases) in adult patients with LBP or sciatica.

MRI had to be compared with a reference test defined as (1) findings at surgery, (2) expert panel opinion or (3) diagnostic work-up. Only full published reports with sufficient data to construct a diagnostic two-by-two table were included. In addition to the electronic search, the reference lists of all retrieved relevant publications were checked. Two reviewers (AV/RvR and MW) independently applied the selection criteria to all titles and abstracts and reviewed relevant full papers. Disagreements were resolved in a consensus meeting or, if disagreement persisted, by consulting a third review author (MvT).

Data extraction and methodological quality

Data extraction and quality assessment were performed independently by two review authors (RvR and MW), using a standardised form to ensure reliable data collection. Data were extracted on:

  • Author, year of publication and journal;

  • Study design;

  • Study population characteristics: pathology considered, age, gender, number of subjects included in the study and in the analysis, clinical features, inclusion and exclusion criteria, level of measurement and setting;

  • Index and reference test characteristics: type of test, year and methods of execution, cut-off values, positivity thresholds and outcome scales;

  • Diagnostic parameters: diagnostic two-by-two table or parameters to reconstruct this table.

Methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) list, consisting of 20 items each scored as “yes”, “no” or “unclear” [19, 20]. Scoring criteria are available upon request. Disagreements were resolved by consensus. A radiologist (AG) was consulted on the assessment of the test technology used (item 13). Quality items were not weighted and no summary quality score was calculated, since summary scores are problematic to interpret and potentially misleading [21, 22].

Statistical analysis

For each primary diagnostic study, sensitivity and specificity with 95% confidence intervals (CI) were calculated and presented in a forest plot, stratified by pathology subgroup. For clinically homogeneous studies, pooled estimates of sensitivity and specificity were calculated with a bivariate random effects meta-analysis (STATA 10 software) [23]. This method provides random effects estimates of mean sensitivity and specificity with 95% CIs, accounting for both within- and between-study variation as well as any correlation between sensitivity and specificity. The pooled summary estimates of sensitivity and specificity, with a 95% confidence ellipse, were plotted in ROC space and presented together with the corresponding prior probabilities, positive and negative likelihood ratios (LR+ and LR−) and the diagnostic odds ratio (DOR). Where pooled estimates could not be calculated, the ranges of sensitivity and specificity and the prior probability were presented for each subgroup. Several potential sources of heterogeneity in diagnostic accuracy could only be explored descriptively because of the limited number of included studies.
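The per-study calculation described above can be sketched in Python. The paper does not state which confidence interval method was used, so this sketch assumes the Wilson score interval; the class `TwoByTwo`, the functions `wilson_ci` and `study_accuracy`, and the counts in the example are hypothetical illustrations, not data from the included studies. The bivariate random effects pooling itself was done in STATA 10 and is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical diagnostic two-by-two table: MRI (index test) vs reference standard."""
    tp: int  # MRI positive, reference positive
    fp: int  # MRI positive, reference negative
    fn: int  # MRI negative, reference positive
    tn: int  # MRI negative, reference negative


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def study_accuracy(t: TwoByTwo) -> dict:
    """Sensitivity, specificity (with 95% CIs) and prior probability for one study."""
    diseased = t.tp + t.fn  # reference-positive units (patients or discs)
    healthy = t.tn + t.fp   # reference-negative units
    return {
        "sensitivity": t.tp / diseased,
        "sensitivity_ci": wilson_ci(t.tp, diseased),
        "specificity": t.tn / healthy,
        "specificity_ci": wilson_ci(t.tn, healthy),
        "prior": diseased / (diseased + healthy),  # prevalence in the study sample
    }


# Hypothetical counts, for illustration only.
result = study_accuracy(TwoByTwo(tp=45, fp=8, fn=15, tn=32))
print(result)
```

Each study's table would be processed this way before the results are drawn as one row of the forest plot.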

Results

Literature search and data extraction

Figure 1 summarizes the search process, which resulted in eight articles on MRI being included in this review. Together, these eight studies included 467 patients, in whom 1,476 discs or foramina were assessed. The proportion of men ranged from 50 to 78%. Six studies were prospectively designed. All studies were performed in a secondary care setting; most used surgery as the reference standard and one used expert panel consensus. One study [27] presented accuracy data for both a full and a limited MRI protocol performed in the same patients; only the results of the full protocol were included, since this protocol, which included both axial and sagittal weighted images, more closely resembled those used in the other studies. The included studies were stratified by pathology: lumbar disc herniation [16, 24–29] and spinal stenosis [26, 30]. Lumbar disc herniation was subdivided into (1) herniated nucleus pulposus (HNP) and (2) nerve root compression due to HNP.

Fig. 1

Flow chart of literature search process

Methodological quality

Figure 2 shows the scores for each quality item across the eight included studies, while the quality assessment of the individual studies can be found in Fig. 3. Poor reporting of several quality items hindered assessment of the risk of bias and may have affected the validity of the reported sensitivities and specificities.

Fig. 2

Results of the assessment of methodological quality items presented as percentage across all included studies

Fig. 3

Methodological quality for each included study

Findings

The sensitivity and specificity of the studies are presented in a forest plot (Fig. 4), subdivided into the different pathology subgroups.

Fig. 4

Forest plot of study results per pathology group showing sensitivities and specificities with 95% confidence intervals, together with the numbers of true positive (TP), false positive (FP), false negative (FN) and true negative (TN) results

Lumbar disc herniation

Herniated nucleus pulposus

Six studies [24–29] assessed the diagnostic accuracy of MRI for lumbar disc herniation, i.e. HNP including sequestration, extrusion and protrusion or disc bulging. In all studies surgery was the reference standard. One study [25] presented results for the combined identification of HNP and degenerative disc disease and was therefore not included in the pooled analysis.

Five studies (197 patients; 322 discs) were considered sufficiently clinically homogeneous for pooled analysis. One study [27] assessed more than one disc level per patient. Prior probabilities ranged from 49% [26] to 77% [28], with a mean of 63%. The sensitivity of MRI for HNP ranged from 64 to 92% and the specificity from 55 to 100% [28]. The results of the bivariate analysis are plotted in ROC space (Fig. 5). The pooled summary estimates of sensitivity and specificity were 75% (95% CI 65–83%) and 77% (95% CI 61–88%), respectively, corresponding to an LR+ of 3.30 (95% CI 1.76–6.21), an LR− of 0.33 (95% CI 0.21–0.50) and a DOR of 10.12 (95% CI 3.88–26.39).
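These derived measures follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity and DOR = LR+ / LR−. The sketch below, using a hypothetical helper `likelihood_ratios`, applies these formulas to the rounded pooled point estimates only; it does not reproduce the confidence intervals from the bivariate model.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple:
    """Return (LR+, LR-, DOR) for given sensitivity and specificity (proportions)."""
    lr_pos = sens / (1 - spec)   # factor by which a positive MRI raises the odds of disease
    lr_neg = (1 - sens) / spec   # factor by which a negative MRI lowers the odds of disease
    dor = lr_pos / lr_neg        # equivalently (sens * spec) / ((1 - sens) * (1 - spec))
    return lr_pos, lr_neg, dor


# Rounded pooled HNP estimates from the bivariate analysis.
lr_pos, lr_neg, dor = likelihood_ratios(sens=0.75, spec=0.77)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```

With the rounded inputs this gives LR+ ≈ 3.26, LR− ≈ 0.32 and DOR ≈ 10.04, close to but not identical with the reported 3.30, 0.33 and 10.12; the reported values presumably derive from the unrounded model estimates.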

Fig. 5

Summary ROC plot presenting pooled estimates of sensitivity and specificity, including the 95% confidence ellipse and hierarchical summary ROC curve, based on bivariate analysis of five studies describing the diagnostic accuracy of MRI for identifying lumbar disc herniation

Because of the small number of included studies, sources of heterogeneity were explored only descriptively. The two studies with a relatively low prior probability of 49% [26, 29] also showed slightly lower sensitivities (0.64 and 0.71 vs. 0.72–0.92), with similar specificities, compared with studies with a higher prior probability. Furthermore, the two studies subject to partial verification [27, 28] showed slightly higher sensitivities (0.83 and 0.92) than studies that avoided partial verification (0.64–0.72), with no clear differences in specificity. The study that did not use appropriate MRI technology [24] showed a lower specificity but a similar sensitivity compared with studies that did.

Nerve root compression due to HNP

The diagnostic accuracy of MRI for nerve root compression due to HNP was evaluated in two studies (n = 128), with prior probabilities of 77.9% [16] and 93.9% [27]. Both showed high sensitivity (81 and 92%) but widely differing specificity (52 and 100%). The high specificity in the study of Chawalparit et al. may be due to partial verification, since not all patients underwent the reference test (surgery). Because the two studies used different reference standards (findings at surgery versus expert panel consensus), their results were not pooled.

Spinal stenosis

Two studies [26, 30] assessed the accuracy of MRI in identifying spinal stenosis in 983 foramina of 118 patients, both using surgery as the reference standard. Although clinically homogeneous, the studies showed very different prior probabilities (2.7% [30] and 83% [26]), possibly because of differences in population characteristics or in the number of foramina or levels assessed per study. Both showed high sensitivity (87 and 96%) with lower specificity (68 and 75%). Because this group comprised only two studies, summary estimates were not pooled.

Discussion

This review summarizes the evidence on the diagnostic accuracy of MRI for identifying specific lumbar spinal pathology in adult patients with LBP or sciatica. For HNP, the pooled sensitivity of 75% (95% CI 65–83%) and specificity of 77% (95% CI 61–88%) yield a positive predictive value of 84% and a negative predictive value of 64% at the mean prior probability of 63% in the included studies. In other words, about one in six positive MRI results would be false positives and more than one in three negative results false negatives, so a substantial proportion of patients would be misclassified. The studies on spinal stenosis and on nerve root compression due to HNP were not suitable for pooling. All results should be interpreted cautiously, since they are based on a small number of studies of moderate quality with several unaddressed sources of heterogeneity, which limit their validity and generalizability.
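The predictive values follow from Bayes' theorem. A minimal sketch, with a hypothetical helper `predictive_values`, reconstructs them from the rounded pooled estimates and the mean prior probability, and also returns the expected proportion of misclassified patients (false positives plus false negatives):

```python
def predictive_values(sens: float, spec: float, prior: float) -> tuple:
    """Return (PPV, NPV, proportion misclassified) for a given prior probability."""
    tp = sens * prior               # diseased, MRI positive
    fn = (1 - sens) * prior         # diseased, MRI negative
    fp = (1 - spec) * (1 - prior)   # not diseased, MRI positive
    tn = spec * (1 - prior)         # not diseased, MRI negative
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv, fp + fn


ppv, npv, misclassified = predictive_values(sens=0.75, spec=0.77, prior=0.63)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}, misclassified = {misclassified:.3f}")
```

With these rounded inputs the sketch gives a PPV of about 0.85 and an NPV of about 0.64, within about one percentage point of the reported values, and suggests that roughly a quarter of patients (about 24%) would be misclassified.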

One of the most important sources of heterogeneity was the wide range of prior probabilities (from 2.7 to 82.9% for spinal stenosis [26, 30]). This is likely to influence the results, since the posterior probability depends strongly on the prior probability. The variation in prior probabilities probably results, at least in part, from patient selection and setting, both of which were often scored as unclear or inadequate in the included studies. Studies with relatively high prior probabilities may be overrepresented in this review, as some studies [24, 28] may have included only patients who were likely to undergo, or actually underwent, surgery. Furthermore, all studies were performed in secondary care, where prior probabilities are likely to be higher because only patients with a relatively high suspicion of specific pathology are referred. The results therefore apply only to this setting.
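The dependence of the posterior on the prior probability is easiest to see in the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The sketch below applies the pooled HNP likelihood ratios (LR+ 3.30, LR− 0.33) to the lowest and highest prior probabilities among the pooled HNP studies (49% and 77%). The helper `post_test_probability` is hypothetical, and using one pooled likelihood ratio across priors assumes that test accuracy does not itself vary with prevalence.

```python
def post_test_probability(prior: float, lr: float) -> float:
    """Convert a prior probability to a posterior probability via a likelihood ratio."""
    pre_odds = prior / (1 - prior)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


LR_POS, LR_NEG = 3.30, 0.33  # pooled HNP estimates reported in the Results
for prior in (0.49, 0.77):
    after_pos = post_test_probability(prior, LR_POS)
    after_neg = post_test_probability(prior, LR_NEG)
    print(f"prior {prior:.2f}: after positive MRI {after_pos:.2f}, "
          f"after negative MRI {after_neg:.2f}")
```

At a prior of 49% a negative MRI lowers the probability of HNP to about 24%, whereas at 77% it remains about 52%, so the same test result carries quite different meaning in different populations.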

A second issue is the lack of a gold reference standard, which resulted in wide variation in the reference tests used. In this review, studies were included only if they used surgery, expert panel consensus or diagnostic work-up as the reference standard. Surgery, especially when combined with clinical follow-up, is often regarded as the best reference test, but it is subject to partial verification, since often only patients with a strong suspicion of a specific underlying cause undergo surgery. Verification bias may lead to higher sensitivity and lower specificity [31], although it has also been found to increase both [32]. The studies in which partial verification was likely [25, 27, 28] did indeed show somewhat higher sensitivity than studies without verification bias, with comparable or higher specificity.

A third issue concerns characteristics of the index test, such as its reliability, i.e. inter- and intra-observer variation, which may have affected the results because there is no consensus on radiological definitions of findings related to subjective symptoms [33]. The size of this effect is difficult to estimate, however, since observer variation was addressed in only two studies, both of which reported relatively poor interobserver agreement (kappa < 0.70) [27, 30]. Most studies did not report objective cut-off values, clear definitions or detailed procedures, which may have led to different numbers of positive and negative test results. Furthermore, two of the included studies [24, 25] used older MRI techniques with poorer visualising capacities, probably resulting in poorer identification of lumbar spinal pathology.
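Interobserver agreement of this kind is usually summarized with Cohen's kappa, which corrects the observed agreement between two readers for the agreement expected by chance. A minimal sketch for two readers classifying the same scans as positive or negative; the helper `cohens_kappa` and the agreement counts are hypothetical:

```python
def cohens_kappa(both_pos: int, r1_only: int, r2_only: int, both_neg: int) -> float:
    """Cohen's kappa for two readers rating the same scans as positive/negative."""
    n = both_pos + r1_only + r2_only + both_neg
    observed = (both_pos + both_neg) / n
    r1_pos = (both_pos + r1_only) / n  # proportion rated positive by reader 1
    r2_pos = (both_pos + r2_only) / n  # proportion rated positive by reader 2
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 100 scans.
kappa = cohens_kappa(both_pos=40, r1_only=10, r2_only=8, both_neg=42)
print(f"kappa = {kappa:.2f}")
```

Here the readers agree on 82% of scans, yet kappa is only 0.64, below the 0.70 threshold cited in the text, because half of the agreement would be expected by chance alone.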

Finally, heterogeneity arose because some studies reported results at disc level and others at patient level. Most studies assessed several discs per patient, some with unequal numbers of discs per patient, so the same patient was included multiple (and unequal) times in the analyses. Because results from discs within the same patient are not independent, this yields confidence intervals that are too narrow and may lead to overestimation of diagnostic accuracy. Overestimation can occur when patients with signs of lumbar disc herniation at several disc levels are more likely to undergo multiple testing than patients without these signs.
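The narrowing of confidence intervals can be illustrated with the Kish design effect, a standard correction for clustered data: with m discs per patient and an intra-patient correlation ρ, the effective sample size is n / (1 + (m − 1)ρ). This sketch assumes equal numbers of discs per patient (the included studies did not always have this) and uses a simple Wald interval; the values of m, ρ and the proportion are hypothetical, and the included studies did not report such correlations.

```python
import math


def wald_half_width(p: float, n: float, z: float = 1.96) -> float:
    """Half-width of a Wald 95% interval for a proportion p estimated from n units."""
    return z * math.sqrt(p * (1 - p) / n)


def effective_n(n_discs: int, discs_per_patient: int, icc: float) -> float:
    """Kish effective sample size for equal-sized clusters (discs within patients)."""
    design_effect = 1 + (discs_per_patient - 1) * icc
    return n_discs / design_effect


# Hypothetical: 100 patients, 3 discs each, intra-patient correlation 0.3, proportion 0.75.
n_discs, m, icc, p = 300, 3, 0.3, 0.75
naive = wald_half_width(p, n_discs)  # treats every disc as independent
adjusted = wald_half_width(p, effective_n(n_discs, m, icc))
print(f"naive +/-{naive:.3f}, clustering-adjusted +/-{adjusted:.3f}")
```

In this example the disc-level interval (±0.049) is about 21% narrower than the clustering-adjusted one (±0.062), illustrating how repeated inclusion of the same patient can overstate precision.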

Strengths and weaknesses

To our knowledge, this is the first systematic review summarizing the available evidence on the diagnostic test accuracy of MRI for identifying lumbar spinal pathology in LBP patients. A potential limitation is the use of a study design filter in our search strategy. The filter was used to reduce the 18,239 citations retrieved without it, although methodological filters have known limitations [34]. To reduce the risk of missing important articles, the reference lists of included articles and of review articles were checked; nevertheless, relevant publications may still have been missed.

Furthermore, this review assessed the accuracy of MRI used in isolation, whereas in routine clinical practice MRI is often combined with other clinical observations or test results. Diagnosis and treatment decisions are never based on MRI findings alone, but on the entire clinical assessment, including MRI, standing X-rays and clinical evaluation. However, our search strategy did not identify any studies assessing the accuracy of MRI combined with other tests.

Implications for clinical practice and research

The importance of these findings should be interpreted in light of their clinical consequences. The value of MRI in identifying lumbar spinal pathology largely depends on the role MRI results play in clinical decisions about the management of LBP or sciatica, and on the resulting outcomes. MRI may be used either to rule out the target condition, sparing patients unnecessary invasive treatment, or to identify as many affected patients as possible when delayed treatment leads to worse outcomes. Its role therefore depends largely on the suspected underlying pathology, the setting and patient characteristics.

This review shows that MRI will misclassify a considerable proportion of patients, who may consequently not receive adequate management of LBP. However, the evidence on the diagnostic accuracy of MRI identified in this review is inconclusive and limited to lumbar disc herniation and spinal stenosis; it therefore cannot be generalized to other specific pathologies underlying LBP or sciatica. Stronger evidence on the role of MRI in identifying lumbar spinal pathology in LBP patients requires high-quality, accurately reported studies.

Additionally, because MRI is not used in isolation in routine clinical practice, combined diagnostic information may yield different estimates of diagnostic accuracy that are easier to place in their clinical context. Future research should therefore address the added diagnostic value of MRI in the triage of LBP patients and its impact on subsequent treatment decisions.