Introduction

Low back pain is one of the most common complaints in primary care. The great majority of low back pain is benign in nature, and specific diagnoses are rarely made [17]. The main purpose of the primary care assessment is to identify those cases where low back pain is caused by serious spinal pathology such as vertebral fracture, malignancy, infection, or inflammatory disease [13].

Whilst malignancy is the most common of these serious diseases, it is estimated to occur in less than 1% of primary care patients with low back pain [7]. However, early detection and treatment of spinal malignancies is important to prevent the spread of any metastatic disease and the development of further complications such as spinal cord compression [15]. The consequences of late or missed diagnosis of spinal malignancy necessitate the use of accurate screening tools in primary care. Ideally, primary care practitioners should be able to identify the small number of patients with spinal malignancy without subjecting a large proportion of their low back pain patients to unnecessary diagnostic testing [11].

Clinical guidelines for the management of low back pain recommend the use of “red flag” screening questions to alert clinicians to the presence of serious disease, and indicate when further investigation is required [13]. The evidence for using these “red flags” is often based on single studies [1] or simply referenced to previous guidelines in which there was no evidence [17]. Most clinical features considered to be “red flags” for malignancy in low back pain are derived from the study performed by Deyo and Diehl in 1988 [6].

Because of the importance of identifying patients with low back pain caused by spinal malignancy in primary care and the relative lack of data in the clinical guidelines, we performed a systematic review. We aimed to describe the diagnostic accuracy of tests used in primary care to screen for spinal malignancy in patients with low back pain.

Materials and methods

Data sources

A comprehensive search of the literature was performed to identify all relevant original, peer-reviewed articles evaluating tests for spinal malignancy in patients presenting with low back pain. The primary search was performed from the earliest available dates to 15th August 2006, on the MEDLINE, EMBASE, and CINAHL electronic databases. A subject-specific search strategy was used, combining sensitive searches of the diagnostic (index) tests available to primary care practitioners, and the target disease (low back pain) [4] (Appendix 1). The index tests included information from the history and physical examination, diagnostic imaging, and laboratory tests. Non-English language reports were included, but articles were excluded from analysis if appropriate translation was not available.

From the results of the electronic search, the bibliographies of all systematic reviews and eligible diagnostic and screening studies were reviewed. Eligible studies were entered into Web of Science to identify any articles in which they had been cited. Contact was made with experts on diagnostic testing, and on low back pain, to identify unpublished studies missed by the search process and to review the list of identified studies to ensure the search was comprehensive.

Study selection

The titles of the studies identified by the search were screened in order to exclude those that were clearly outside the scope of the review. To determine eligibility for the analysis, studies were included if they satisfied the following criteria: (a) reported on a cohort of patients presenting for either treatment for low back pain or lumbar spine X-rays; (b) confirmed the diagnosis of malignancy with an appropriate reference standard; (c) evaluated the diagnostic performance of a test available to primary care practitioners; and (d) reported results in sufficient detail to allow reconstruction of contingency tables of the raw data.

Study quality assessment

There are several potential threats to internal and external validity in studies of diagnostic accuracy [2]. Studies with methodological shortcomings may overestimate the accuracy of a diagnostic test [3]. Therefore, all eligible studies identified by the search underwent methodological quality assessment using the QUADAS scale [18].

Data extraction

Two authors independently extracted the following data from each eligible article: author(s), year, journal, setting (i.e., primary care, secondary care), index tests, reference standard, number of patients, prevalence of cancer, and true-positive, true-negative, false-positive, and false-negative results for the index tests. Disagreements were resolved via discussion and consensus. Because there were empty cells in the contingency tables, a value of 0.5 was added to each cell in order to circumvent computational problems [10]. From the extracted data, sensitivity, specificity, and positive (LR+) and negative (LR−) likelihood ratios with their 95% confidence intervals (95%CI) were calculated using Meta-DiSc software [19]. We considered clinical features to be useful for raising the index of suspicion of malignancy if the point estimate of the LR+ and the lower bound of its 95%CI were greater than 1. Conversely, the LR− was considered useful for lowering the suspicion if the point estimate and the upper bound of its 95%CI were below 1. It was our intention to pool the results and perform a meta-analysis if sufficient statistical and clinical homogeneity existed amongst the studies. If insufficient data were reported in the articles, we contacted the authors of the original studies in order to gain access to the primary data.
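The calculations described above can be sketched in a few lines of Python. This is a minimal illustration and not the Meta-DiSc implementation: the function names are hypothetical, the confidence intervals use the common log-scale normal approximation, and the 0.5 correction is assumed to apply only to tables that contain an empty cell. The counts in the usage note are invented for illustration and are not taken from any included study.

```python
import math


def ratio_with_ci(ratio, se_log, z=1.96):
    """Return (point estimate, lower, upper) from a normal approximation on the log scale."""
    return (ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log))


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+ and LR- (each with 95% CI) from a 2x2 table."""
    cells = [float(tp), float(fp), float(fn), float(tn)]
    # Assumption: add 0.5 to every cell only when the table has an empty cell.
    if min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells

    diseased = tp + fn        # patients with malignancy
    non_diseased = fp + tn    # patients without malignancy
    sensitivity = tp / diseased
    specificity = tn / non_diseased

    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    # Standard errors of the log likelihood ratios.
    se_log_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / non_diseased)
    se_log_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / non_diseased)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": ratio_with_ci(lr_pos, se_log_pos),
        "lr_neg": ratio_with_ci(lr_neg, se_log_neg),
    }


def raises_suspicion(result):
    """LR+ point estimate and lower 95% CI bound both above 1."""
    point, lower, _ = result["lr_pos"]
    return point > 1 and lower > 1


def lowers_suspicion(result):
    """LR- point estimate and upper 95% CI bound both below 1."""
    point, _, upper = result["lr_neg"]
    return point < 1 and upper < 1
```

For example, `diagnostic_accuracy(10, 40, 0, 60)` describes a hypothetical feature with no false negatives; the empty cell triggers the 0.5 correction, so the estimated sensitivity falls just below 100% and both likelihood ratios remain finite.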

Results

Search results

The search of the electronic databases retrieved 8,944 articles (Fig. 1). After review of the titles, 8,461 articles were excluded because they were clearly ineligible. The remaining studies were categorised according to study type to screen out any reviews, case series, case reports, and case-control studies. Two authors reviewed the titles and abstracts of the cohort studies to identify all studies evaluating a cohort of low back pain patients. Any discrepancies were resolved by reading the full text and subsequent consensus. Four systematic reviews were identified by the search, and were read to identify any eligible studies missed by the search strategy.

Fig. 1 Study selection process and reasons for exclusion

The full texts of the 13 studies that investigated cohorts of low back pain patients were read by two authors and assessed for eligibility. Only six studies assessed tests available to primary care clinicians for the diagnosis of malignancy and reported data in sufficient detail for analysis [5, 6, 8, 9, 12, 16].

Study characteristics

The six eligible studies assessed a total of 5,097 patients presenting for low back pain treatment or lumbar spine X-rays (Table 1). The prevalence of malignancy in these studies ranged from 0.1% [12] to 3.5% [8]. Three of the eligible studies recruited patients seeking low back pain treatment from walk-in hospital clinics [5, 6, 9]. The other studies reported on patients recruited from secondary referral centres [12], patients presenting to an accident and emergency department [16], or patients from the office of an orthopaedic surgeon [8]. The most common reference standard used in the studies was X-ray, although the retrospective studies used the final clinical diagnosis as the reference standard [8, 9]. Two studies also used a 6-month follow-up to identify patients with malignancy who may not have received an X-ray [5, 6].

Table 1 Study characteristics

Study quality assessment

To be eligible for this review, studies needed to have used an appropriate reference standard; hence this item was not included in the quality assessment table (Table 2). Most studies were either of poor quality or poorly reported, fulfilling between two and six of the 13 criteria. Inadequate reporting was a problem in all of the studies, with no study reporting sufficient information to determine if all criteria had been met. There was poor reporting of the details of the index tests and the reference standard, and whether the tests were interpreted in a blinded fashion. Most studies were subject to partial verification bias, as they failed to perform the same reference standard on the entire cohort or on a random sample of patients.

Table 2 Study quality assessment using QUADAS scale

Index test results

Data on a total of 22 different clinical features were extracted from the six eligible studies (Table 3). Four features were investigated by more than one study: age >50 years, a previous history of cancer, not improved after 1 month, and clinician judgement. The results for these features were pooled and are also presented in Table 3.

Table 3 Clinical features and data extracted from eligible studies

The features investigated can be separated into features from the clinical assessment (both history and physical examination) of the patient, and results of laboratory testing. For the history and physical examination features, only age ≥50 years (LR− = 0.34) [5, 6, 8, 9] had a significant LR−. A number of features had a significant LR+, including a previous history of cancer (pooled estimate from two studies: LR+ = 23.7), failure to improve after 1 month (pooled estimate from two studies: LR+ = 3.0), no relief with bed rest (LR+ = 1.7), duration of pain >1 month (LR+ = 2.6), and age ≥50 years (pooled estimate from four studies: LR+ = 2.2) [5, 6, 8, 9].

Some laboratory test results had significant likelihood ratios, such as erythrocyte sedimentation rate (ESR) ≥ 50 mm/h (LR+ = 18.0; LR− = 0.46), the presence of anaemia (LR+ = 3.9; LR− = 0.53), hematocrit < 30% (LR+ = 18.2), and white blood cell count (WBC) ≥ 12,000 (LR+ = 4.1) [6].

The accuracy of clinician judgement in the identification of patients with malignancy was assessed by two studies and had LR+ (95%CI) of 11.9 (4.8–29.6) in an accident and emergency setting [16], and 12.6 (1.1–143.9) in a secondary referral centre [12].

One study reported on a combination of features: age >50 years, unexplained weight loss, a past history of cancer, or no improvement in low back pain after a month. This combination had a reported sensitivity of 100% [6] and a specificity of 60%, which was reported in a subsequent paper [7]. The LR+ (95%CI) was 2.4 (2.1–2.7) and the LR− (95%CI) was 0.06 (0.00–0.91).

Discussion

Using clinical features or tests to screen for serious pathologies in low back pain patients involves identifying features which, when present, raise the index of suspicion and, when absent, lower the index of suspicion of having the disease. For malignancy in particular, raising the index of suspicion is most important because the prevalence of the disease within this patient group is around 1%. This systematic review identified a number of features that raise the probability of malignancy; however, these features are not equally useful for this purpose. The LR+ of the features ranged from 1.7 to 55.6, and this needs to be appreciated when judging the clinical importance of a red flag identified in a clinical assessment.

Age ≥50 years, no improvement after 1 month, a previous history of cancer, and no relief with bed rest are commonly suggested “red flags” for malignancy in clinical guidelines [17], and are supported by the results of this review. Of these four red flags, a previous history of cancer is the most informative, with a pooled LR+ of 23.7. The other three all had an LR+ of about 3 or less. Other common “red flags” include unexplained weight loss, fever, thoracic pain, or being systemically unwell [17]. Being systemically unwell was not evaluated by any of the eligible studies, and the other features did not significantly raise or lower the probability of having malignancy [6].

While laboratory tests are not recommended routinely in low back pain patients [13, 17], tests for ESR and anaemia were found to be useful screening tools for malignancy. Hematocrit <30% (LR+ = 18.2) and WBC ≥12,000 (LR+ = 4.2) also significantly raise the suspicion of malignancy [6]. In the study that evaluated these laboratory tests, however, the decision to perform them was based on clinician judgement [6], and the results would therefore be subject to a form of filter bias [14]. Overall clinician judgement for the presence of malignancy also had a significant LR+ of 12.1 [16], but the factors and features that contributed to this overall judgement were not reported.

Providing data, such as likelihood ratios, on the diagnostic accuracy of clinical features to screen for malignancy allows clinicians to evaluate whether further testing is warranted in patients with low back pain. The results of this review show that whilst a number of features have significant likelihood ratios, only four features (a previous history of cancer, an elevated ESR, low hematocrit, and clinician judgement) are able to raise the post-test probability of malignancy to a clinically significant level when used in isolation. This process is illustrated in Table 4, which shows the post-test probability of cancer in patients with a positive response to each red flag. The analysis is conducted for pre-test probabilities of 1 and 5%. For example, if the prevalence of malignancy (pre-test probability) in a low back pain patient is presumed to be 1%, and the patient is aged ≥50 years, the (post-test) probability would only increase to 2.2%. In fact, all but one of the red flags from the clinical assessment had only modest predictive ability. The exception is a previous history of cancer, which raises the probability to 19.2%, a change in disease probability that would be sufficiently large to warrant further investigation.

Table 4 Application of red flags to clinical decision making
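The conversion from pre-test to post-test probability discussed above follows the standard odds form of Bayes' theorem: convert the probability to odds, multiply by the likelihood ratio, and convert back. A minimal sketch is given below; the function name is hypothetical, and the likelihood ratios in the loop are the rounded values quoted in the text, so the printed figures may differ slightly from the published table (for instance, a rounded LR+ of 23.7 at a 1% pre-test probability gives about 19.3% rather than 19.2%, presumably because of rounding).

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a disease probability with a likelihood ratio via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Rounded LR+ values as quoted in the text (not the unrounded study data).
red_flags = {
    "age >= 50 years": 2.2,
    "previous history of cancer": 23.7,
    "ESR >= 50 mm/h": 18.0,
    "hematocrit < 30%": 18.2,
    "clinician judgement": 12.1,
}

for pre_test in (0.01, 0.05):
    for name, lr_pos in red_flags.items():
        post = post_test_probability(pre_test, lr_pos)
        print(f"pre-test {pre_test:.0%}, {name}: post-test {post:.1%}")
```

At a 1% pre-test probability, an LR+ of 2.2 yields a post-test probability of about 2.2%, which matches the worked example for age in the text and shows why a modest likelihood ratio changes little when the disease is rare.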

Clearly, it would be helpful to have a clinical screening tool with greater accuracy than the clinical red flags in Table 4. One strategy would be to rely upon combinations of red flags, an approach more analogous to overall clinician judgement. The only combination of features that was evaluated had a significant LR+ of only 2.4 and a significant LR− of 0.06, as the focus was on increasing the sensitivity [6]. Further study is needed that focuses on raising the suspicion of malignancy by investigating to what extent combinations of features can increase the post-test probability. Almost three-quarters of the clinical features identified by this review were investigated in only one study [6], and it is possible that other features not previously evaluated may be useful in the diagnosis of malignancy. Due to the low prevalence of the disease, large-scale, high-quality studies need to be performed for practitioners to have further confidence in their ability to screen for serious pathologies such as malignancy. Another area of research would be the investigation of the salient features that are considered when clinicians form an overall judgement that the patient may have cancer, especially as this ‘test’ was the second most informative clinical test to identify patients with cancer. This test was found to be quite informative in two studies, but neither outlined the cues the clinicians were considering when forming this judgement.

The quality of the studies included in the review is an important consideration because certain methodological shortcomings can have large effects on estimates of diagnostic accuracy [14]. The largest of these effects are caused by studying a non-representative sample of patients, or failing to apply the same reference standard to the entire cohort or a random sample of the population [3]. Only one eligible study reported performing the same reference standard (X-ray) on all patients in their cohort [12]. The other studies combined the use of X-ray as a reference standard with clinical follow-up [5, 6, 8, 9, 16]. As clinical follow-up may fail to identify false-negative test results, the diagnostic performance of the test will be overestimated [14]. Overall, the reporting of design-related characteristics of the studies was poor, and the methodological quality was low.

To increase the external validity of our findings, we excluded case-control studies, and only extracted data from studies of clinical populations of low back pain patients. The use of clinical features for detecting serious spinal pathology is presumably most useful in the community primary care setting as this is where patients with low back pain are usually managed [13]. However, there were no studies identified by this review that were performed on a consecutive series of low back pain patients presenting to community primary care providers.

In conclusion, malignancy is rare in low back pain patients. The most informative tests to screen for malignancy are a previous history of cancer, overall clinician judgement, elevated ESR, and reduced hematocrit. Popular red flags such as unexplained weight loss, age >50, and failure to improve after 1 month have only modest predictive ability and on their own are not useful to screen for cancer.