Introduction

Most physicians consider non-specific low-back pain (LBP) a recurring, benign, and self-limiting condition, but for patients it is a painful and disabling experience for which they frequently demand treatment. Several treatments are available for LBP, such as analgesics, non-steroidal anti-inflammatory drugs (NSAIDs), exercise, behavioral therapy, spinal manipulation, and acupuncture. Numerous randomized trials have investigated the effectiveness of treatments for non-specific LBP. However, there are important differences in how these trials have been conducted: many trials assess the effect of combinations of different interventions; some compare interventions with no treatment or placebo, whereas others compare different interventions with each other. Trials comparing two interventions often find no difference between groups, so it remains unclear whether either intervention is effective, which raises questions about the basic benefit of each treatment.

Several systematic reviews have addressed the effect of a particular treatment for LBP. Conclusions from many of these reviews are based on qualitative analysis. This type of analysis assigns levels of evidence (from “strong” to “no evidence”) regarding the effectiveness of a treatment, taking into account the participants, interventions, controls, outcomes, and methodological quality of the original studies [80]. Quantitative analysis, on the other hand, is a statistical approach that pools data (meta-analysis) to provide an overall effect estimate, which allows direct comparison between the effects of different treatments.

The objective of the present study was to synthesize the results of randomized controlled trials (RCT) for common LBP treatments comparing the interventions to placebo/sham or no-treatment comparison groups, to estimate a pooled effect size for each treatment, and compare them with each other.

Methods

Study selection

We searched for RCT in systematic reviews of treatments for acute and chronic non-specific LBP in the latest issue of the Cochrane Library (issue 2, 2005), and we used Medline/PubMed, Embase, CINAHL, and AMED to search for additional papers, limiting the search to the period from the last search in each Cochrane review until December 2005. The following terms were used for the search: LBP (Mesh) or LBP (tw), placebo effect (Mesh). In addition, RCT keywords were used for the search: “LBP” and the name of the treatment of current interest (i.e., exercise, manipulation, behavioral treatment, NSAIDs, and acupuncture) [80].

Low-back pain was defined as pain located below the scapulas and above the cleft of the buttocks [81], and non-specific LBP was defined as back pain not attributable to a recognizable, known specific pathology (i.e., infection, tumors, osteoporosis, fracture, structural deformity, inflammatory disorder, radicular syndrome, or cauda equina syndrome) [87]. Three criteria defined relevant trials: (1) the trials should compare the treatment to a no-treatment comparison group, (2) the trials should investigate an unselected and general population, and (3) the treatment should be practiced and be available in several countries.

The no-treatment comparison groups included subjects receiving placebo, sham treatments, no treatment, or those on a waiting list. Waiting list implied that the patients were waiting for a treatment for their back pain. Placebo was defined as a medicine which has no inherent pertinent pharmacologic activity, but which is effective by virtue of the factor of suggestion attendant upon its administration. Sham treatment was defined as a procedure in which medical personnel go through the motions without actually performing the treatment, such as sham acupuncture for acupuncture or detuned transcutaneous electrical nerve stimulation (TENS). No treatment implied that the patients were not prescribed any drugs by the physician and were not recommended any treatment or home exercises for their back pain by any other medical personnel. They could, however, receive a booklet with advice about daily activities.

We looked separately at trials investigating acute and subacute/chronic LBP. Acute LBP was defined as pain lasting less than 6 weeks and subacute/chronic LBP as pain lasting more than 6 weeks.

Outcome measures

Outcomes were self-reported pain intensity and self-reported physical functioning. In the LBP literature, several outcome measures have been used to assess the construct of pain intensity [for example, 10 cm or 100 mm visual analogue scale (VAS), McGill Pain Questionnaire, and numeric (11, 21, or 101 points) rating scale (NRS)] [86]. LBP-specific functioning can also be measured with various instruments [for example, Oswestry Disability Index (0–100), Quebec Back Pain Disability Scale (0–100), and the 24-point Roland Morris Disability Questionnaire] [44]. We used the standardized mean difference (SMD) to estimate the treatment effect of the individual trials for similar constructs. This allows direct comparison of studies that used different measures of sufficiently similar constructs [19].

Data extraction

Data were extracted from the included trials separately both for acute and subacute/chronic LBP, and also for short-term (assessment closest to 6 weeks after randomization) and long-term (assessment between 6 and 12 months after randomization) follow-up. We extracted effect sizes estimated as SMD and Relative Risk (RR) or data for calculation of effect size. Cohen categorized effect size values as small (ES: 0.2–0.5), moderate (ES: 0.5–0.8), and large (ES: >0.8) [16] but it is uncertain how this applies to the field of LBP.
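As a minimal sketch of how these thresholds could be applied, the hypothetical helper below maps an SMD to Cohen's verbal labels. The function name, the handling of the shared boundaries (0.5 and 0.8 are both assigned to “moderate”), and the label for values under 0.2 are assumptions, since the text does not specify them.

```python
def cohen_category(effect_size: float) -> str:
    """Map an effect size (SMD) to Cohen's verbal label.

    Thresholds follow the ranges quoted in the text; how the boundary
    values are assigned is an assumption made for this illustration.
    """
    es = abs(effect_size)  # the label depends on magnitude, not direction
    if es > 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    if es >= 0.2:
        return "small"
    # The text names no category under 0.2; this label is hypothetical.
    return "below small"


if __name__ == "__main__":
    for value in (0.1, 0.3, 0.6, 0.9):  # illustrative values only
        print(value, cohen_category(value))
```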

We calculated effect size from either continuous (mean, SD, and confidence interval) or dichotomous variables (number of patients with good/excellent response to treatment). For continuous variables, effect size was calculated as SMD, which is defined as the difference in the outcome measure between two groups divided by the SD of the control group, the SD of the treatment group, or the pooled SD [19, 41]. For dichotomous data, the effect size was calculated as RR, where RR is the risk of an event in the treatment group divided by the risk of the event in the comparison group. If no data were available for estimating effect size, the author of the trial was contacted by e-mail, and if data were provided, these trials were also included. If variance data were not reported as SDs, they were calculated from the trial data using the standard error of the mean (SE) or 95% confidence intervals. If variance data were not reported at all, SDs from other relevant studies were used, i.e., studies concerning the same treatment and the same condition (acute or chronic). The pooled SD of the treatment effects for each group from relevant studies was calculated using the following formula [19]:

$$ \mathrm{SD}_{\mathrm{pooled}} = \sqrt{\frac{(n_1 - 1) \times \mathrm{SD}_1^2 + (n_2 - 1) \times \mathrm{SD}_2^2}{N - 2}}, $$

where n1 and n2 are the numbers of participants in the intervention and control groups, N is the total number of participants, and SD1 and SD2 are the standard deviations for the intervention group and control group, respectively [19]. For studies with missing variance data, we used the SDpooled from relevant studies, expressed as a percentage of the mean difference of change [25].
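The calculations described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure implemented in the software used for the review: the function names and example numbers are hypothetical, the conversion from a 95% confidence interval assumes a normal approximation (z = 1.96), and the sign convention of the SMD (treatment minus control) is chosen for illustration.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Recover a group's SD from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """Recover a group's SD from the 95% CI of its mean.

    Assumes a normal approximation, i.e. CI = mean +/- 1.96 * SE.
    """
    se = (upper - lower) / (2 * 1.96)
    return sd_from_se(se, n)


def pooled_sd(n1: int, sd1: float, n2: int, sd2: float) -> float:
    """Pooled SD of two groups, with N = n1 + n2 as in the formula above."""
    total_n = n1 + n2
    numerator = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    return math.sqrt(numerator / (total_n - 2))


def smd(mean_treatment: float, mean_control: float, sd: float) -> float:
    """Standardized mean difference: group difference divided by an SD."""
    return (mean_treatment - mean_control) / sd


def relative_risk(events_treatment: int, n_treatment: int,
                  events_control: int, n_control: int) -> float:
    """Risk of the event in the treatment group over risk in the control group."""
    return (events_treatment / n_treatment) / (events_control / n_control)


if __name__ == "__main__":
    # Hypothetical trial: pain improvement on a 0-100 scale, 30 patients per arm.
    sd = pooled_sd(30, 20.0, 30, 20.0)
    print("pooled SD:", sd)
    print("SMD:", smd(25.0, 15.0, sd))
    # Hypothetical dichotomous outcome: 12/40 vs 24/40 patients with the event.
    print("RR:", relative_risk(12, 40, 24, 40))
```

Dividing by the pooled SD rather than the control-group SD is one of the three options named in the text; the `smd` helper accepts whichever SD the analyst chooses.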

The quality of the included trials was reported as assessed by the authors of the systematic Cochrane reviews. For the original articles published after the reviews, one of the authors of the present study (AK) assessed the quality according to the 11-item criteria list recommended in the method guidelines for systematic reviews of the Cochrane Back Review Group [80]. Some reviews used a different set of criteria based on a three-item list [38].

Analysis

A quantitative meta-analysis was performed in which the effect sizes (SMDs and RRs) were pooled using a random-effects model.
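The text does not state which random-effects estimator was used. As an illustration only, the sketch below assumes the widely used DerSimonian-Laird method, applied to per-study effect sizes and their variances; relative risks would typically be pooled on the log scale and back-transformed afterwards. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_pool(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, Tuple[float, float], float]:
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, 95% confidence interval, tau squared).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures the weighted spread of studies around that estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Method-of-moments estimate of the between-study variance, floored at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


if __name__ == "__main__":
    # Hypothetical SMDs and variances, not data from the included trials.
    estimate, ci, tau2 = random_effects_pool([0.2, 0.5, 0.9], [0.04, 0.05, 0.06])
    print(estimate, ci, tau2)
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with a fixed-effect analysis; the more the studies disagree, the wider the pooled confidence interval becomes.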

We assessed statistical heterogeneity using the I² statistic and confidence intervals [34]. We present the effect sizes separately for dichotomous and continuous variables. For continuous variables, the pooled effect sizes for the treatments for the acute and chronic condition are presented. In addition, the effect sizes for the individual studies for each treatment for the chronic condition and short-term follow-up are presented.
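A minimal sketch of how the I² statistic can be obtained from Cochran's Q, assuming inverse-variance weights; the function name and example inputs are hypothetical and are not data from the included trials.

```python
from typing import Sequence


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """I squared in percent: share of total variation due to heterogeneity."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    print(i_squared([0.2, 0.5, 0.9], [0.04, 0.05, 0.06]))
```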

Review Manager, Version 4.2 for Windows (Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2003) was used for the analyses.

Results

We included seven reviews from the latest issue of the Cochrane Library, issue 2, 2005, from which we included 41 of 228 trials (Tables 1, 2). An additional six trials were identified from the updated search in PubMed/Medline, Embase, CINAHL, and AMED. Table 1 shows the number of trials included from the systematic reviews and the updated search. About 20% of the trials were included, ranging from 4 to 50%. In most cases, a trial was not included because one type of intervention was compared to another intervention and not to a no-treatment group.

Table 1 The total number of trials included in the systematic reviews from the latest issue of the Cochrane Library, issue 2, 2005, the number of these trials included in the present review for the acute and the chronic condition according to the inclusion criteria, and the number of included trials from the updated literature search
Table 2 Reasons for not including trials listed in the reviews from the Cochrane Library, issue 2, 2005 and reasons for not including trials comparing treatment with a no-treatment group in the updated search

Acute low-back pain

Table 3 describes the studies that evaluated the effectiveness of treatments for acute LBP. The effect sizes of the individual studies and the pooled effect sizes are presented in Tables 4 and 5 and in Fig. 1. Unless otherwise noted, the effect sizes presented are calculated from continuous outcome measures. We included four studies on exercise therapy [13, 26, 50, 54], which were published between 1993 and 2005: the quality was high in two studies and low in two. Regarding short-term follow-up, the pooled effect sizes for pain relief and function were 0.07 (95% CI: −0.30 to 0.44) and 0.38 (95% CI: −0.40 to 1.16), respectively. For long-term follow-up, the corresponding figures were −0.04 (95% CI: −0.35 to 0.27) and −0.13 (95% CI: −0.35 to 0.09) (Table 4 and Fig. 1). Moreover, these studies were characterized by a high degree of heterogeneity (I² = 69.1%).

Table 3 Description of the included studies for acute LBP
Table 4 Effect sizes (SMD) for acute LBP for pain and function
Table 5 Effect sizes (relative risk) for acute LBP for pain
Fig. 1 The pooled effect sizes for treatments for acute (top) and chronic (bottom) low-back pain, for pain and function at short- and long-term follow-up, presented in a forest plot

We included three studies on spinal manipulation [8, 29, 63], which were published from 1974 to 1988: one was of medium and two were of low quality. Measures of variance were missing in all three studies, and we therefore used SDs from other RCT concerning manipulation in patients with acute LBP [36, 88]. The pooled SDs of these two studies were 85 and 140% of the mean difference, respectively. For the calculations of effect size, we chose an SD of 112.5% of the mean difference. We also performed a sensitivity analysis with SD values of 85 and 140% of the mean difference, applied to the studies without measures of variance [8, 29, 63]. The pooled effect sizes for short-term pain relief were 0.50 with SD at 85%, 0.33 with SD at 140%, and 0.40 (95% CI: −0.09 to 0.89) with SD at 112.5% (Table 4 and Fig. 1). The studies were homogeneous (I² = 10.7%).

Three studies on NSAIDs were included [5, 18, 71], which were published from 1994 to 2003: one was of high quality and two were of moderate quality. The pooled effect size for short-term pain relief was 0.51 (95% CI: 0.16 to 0.86) (Table 4 and Fig. 1), but the studies exhibited a high degree of heterogeneity (I² = 67.8%).

Four studies on muscle relaxants comparing non-benzodiazepines with placebo were included [6, 9, 47, 78]. They were published from 1979 to 2003: two were of high and two of moderate quality. The pooled effect size was calculated from a dichotomous variable; for pain at short-term follow-up it was 0.52 (95% CI: 0.42 to 0.65) (Table 5). The studies were homogeneous (I² = 0.0%).

Chronic low-back pain

Table 8 describes the studies included for the evaluation of treatments for chronic LBP; the effect sizes are presented in Tables 6 and 7 and in Figs. 1 and 2. We included six studies on exercise therapy [45, 64, 65, 69, 76, 91], which were published from 1981 to 2000: one was of medium and five were of low quality. The pooled effect size was 0.52 (95% CI: −0.21 to 1.25) for short-term pain relief, and 0.25 (95% CI: −0.04 to 0.54) for long-term pain relief. For short- and long-term improvement in function, the effect sizes were 0.22 (95% CI: −0.07 to 0.51) and 0.13 (95% CI: −0.32 to 0.58), respectively. The studies had a high degree of heterogeneity (I² = 88.5%).

Table 6 Effect sizes (SMD) for chronic LBP for pain and function
Table 7 Effect sizes (relative risk) for chronic LBP for pain
Fig. 2 The pooled effect sizes for included trials for treatments for chronic low-back pain, for pain, presented in a forest plot

We included seven studies on behavioral treatment [49, 60, 70, 74–77], which were published from 1982 to 1993: one was of high, four of modest, and two of low quality. At short-term follow-up, the effect sizes were 0.57 (95% CI: 0.33 to 0.81) for pain relief and 0.24 (95% CI: −0.01 to 0.49) for function. The studies were homogeneous (I² = 0.0%).

Five studies concerning manipulation were included [21, 61, 63, 73, 89]. The included studies were published from 1978 to 1995: four studies were of modest and one was of low quality. Measures of variance were not available for three studies [21, 63, 89] and, therefore, the variance measures from the two other studies were used [61, 73]. The pooled SDs of these two studies were 60 and 130% of the mean difference, respectively. For the calculations of effect size, we chose an SD of 95% of the mean difference. We also performed a sensitivity analysis with SD values of 60 and 130% of the mean difference, applied to the studies without measures of variance [21, 63, 89]. The pooled effect sizes for short-term pain relief were 0.49 with SD at 60%, 0.19 with SD at 130%, and 0.35 (95% CI: −0.01 to 0.69) with SD at 95% of the mean difference. The studies were moderately heterogeneous (I² = 50.7%).

Two studies on transcutaneous electrical nerve stimulation (TENS) [43] were included [17, 53]; they were published in 1990 and 1993 and were of modest and low quality, respectively. The pooled effect size for short-term pain relief was small, 0.19 (95% CI: −0.13 to 0.51). The studies were quite homogeneous (I² = 17.7%).

We included seven studies on acupuncture [12, 14, 42, 46, 55, 57, 72], which were published from 1980 to 2002; most were of moderate to high quality. The pooled effect size for short-term pain relief was modest, 0.61 (95% CI: 0.41 to 0.81). The studies were homogeneous (I² = 0.0%).

We included three studies evaluating the effect of benzodiazepines [2, 66, 85]. They were published in 1990 and 1992 and were considered to be of moderate quality. The effect size for short-term pain relief was moderate, RR 0.82 (95% CI: 0.72 to 0.94) (Table 7). The studies were homogeneous (I² = 0.0%).

We considered four studies on the effect of NSAIDs on chronic LBP [10, 15, 40], but one was excluded because there were no data available for estimating effect size [10]. The studies were published from 2003 to 2004 and were of high quality. The pooled effect size for pain at short-term follow-up was moderate, RR 0.61 (95% CI: 0.50 to 0.74).

Discussion

The purpose of the present study was to investigate the effect sizes of treatments for non-specific LBP in randomized controlled studies comparing treatment with no treatment. In general, only a few studies of all the included RCT in the systematic reviews of the Cochrane Library, issue 2, 2005, compared treatments with no treatment (Fig. 1). The results are sobering as there are only modest effect sizes, if any, for short-term pain relief.

For the acute condition, the effect sizes of NSAIDs and manipulation were only modest and there was no effect of exercise; function and long-term follow-up were missing from most studies. Our results are in accordance with “the European Guidelines for acute LBP”, which recommend NSAIDs and consideration of a short course of manipulation if the patients do not return to normal activities [20].

For chronic LBP, acupuncture and behavioral therapy had the largest effect sizes, followed by exercise and NSAIDs, although all had only a modest effect. TENS and manipulation had small effect sizes. Function was measured only in the exercise and behavioral therapy trials, which demonstrated scarcely any effect, and data on long-term follow-up were missing.

“The European Guidelines of the management of chronic LBP” [1] recommend behavioral therapy, exercise, and a brief educational intervention, in addition to a brief treatment with NSAIDs and muscle relaxants. They also suggest a short course of manipulation as a treatment option. Although acupuncture is not included in the European guidelines’ recommendations, we found that acupuncture had a modest effect size when compared to placebo or no-treatment groups.

Supervised exercise is recommended as the first-line treatment in the management of chronic LBP, without any recommendations on the specific type of exercise [1]. The definition of exercise is wide: “a series of specific movements with the aim of training or developing the body by a routine practice or as physical training to promote good physical health” [32]. Accordingly, the exercise studies included in the present study cover all kinds of exercises and different combinations of exercise programs. We based our results on quantitative analysis in order to produce a single estimate of a treatment effect [19]. However, to do this in a meaningful way, heterogeneity must be taken into account. Heterogeneity concerns the variation in results across studies, which might be the result of differences in patient selection, type of treatments and combinations of these, duration of treatment, and more. Heterogeneity is tested with a chi-square test and quantified by the I² value, which describes the percentage of the total variation across studies that is due to heterogeneity rather than chance. We demonstrated a high level of heterogeneity for exercise trials, and although it seems reasonable to compare all kinds of exercise programs, the limitation is that it is not possible to calculate a meaningful pooled effect size. So, to achieve a consistent and homogeneous measure of the effect of exercise, it seems suitable to compare studies with the same exercise programs or to use advanced statistical methods to explore these characteristics [32]. The heterogeneity for manipulation was also high, which might be explained by different manipulation methods and different populations.

It might be surprising that we found only modest effects of the most common treatments for non-specific LBP. For acute LBP, the low effect sizes might be explained by the fact that it is usually a self-limiting condition with a recovery rate of 90% within 6 weeks [20, 37]. Chronic non-specific LBP is not defined as a clinical entity or diagnosis, but is rather a symptom in patients with very different stages of impairment and disability, in whom the specific causes of the pain are unknown [1]. This implies that we compare effects of treatments for a condition without a specific diagnostic test, based on patients’ ratings of pain, with different unspecific radiological findings, and where the prognosis is influenced by psychosocial and work-related factors. This is a serious limitation of the study, which may contribute to the modest effect sizes, and it emphasizes the need to define appropriate sub-classifications of non-specific LBP.

Most RCT for treatments of LBP compare combinations of different interventions with either one intervention or another combination of interventions [4, 7, 51], and only a small part (about 20%) compare treatments with a no-treatment group (Table 1). This is interesting as it would be natural first to investigate the basic benefit of treatments and then compare them to each other. The purpose of the present study was to investigate the basic benefit with the consequence that only a small part of trials from each systematic review would be included (Tables 1, 2) and some treatments in general use would be excluded. Although this limits the generalization of the result, it brings into focus the importance of no-treatment controlled trials for investigating the pure effect of treatments for non-specific low-back pain. However, we are aware of the multitude of difficulties in carrying out no-treatment controlled trials, particularly within surgery.

Although surgery for degenerative conditions affecting the lumbar spine is a common type of treatment, it was excluded from the present study as most RCT compare different surgical techniques. To our knowledge, the only exception is the study by Fritzell et al. [22] comparing lumbar fusion with a no-treatment group. Most patients with chronic low-back pain who are referred to a surgical clinic have been through several non-surgical treatments, including exercise, and they often demand surgical treatment. Hence, it might be hard for the physician to persuade these patients to enroll in an RCT, since there is a risk that they will be randomized to the no-treatment group.

Another limitation is the quality of the included studies. Generally, quality scores ranged from 3 to 7 on the 11-item criteria list, with a few exceptions of 9 and 11, both for acute and chronic LBP (Tables 3 and 8). In addition, in several studies, mostly concerning manipulation, the variability of the effect estimate was not reported, and for this reason we used the SD from other studies. Although we performed sensitivity analyses, this is a limitation that calls for a cautious interpretation of the results.

Table 8 Description of the included studies for chronic low-back pain

Overall, the follow-up in most of the studies was insufficient. Exercise therapy has short- and long-term follow-up for both the acute and the chronic condition, but apart from this treatment there is a lack of long-term follow-up. However, it is likely that long-term follow-up would not change the conclusion of the current study, as the effect sizes in general were small at short-term follow-up and would most probably not have changed at long-term follow-up.

Outcome measures in the field of LBP are pain and function measured by psychometric scales. In the present study, several different scales were used for both pain and function. Although the measurements can be standardized by calculating effect sizes or by rescaling each trial's outcomes for pain and functioning from 0 to 100, comparison of trials would be easier and more precise if the same scales were used, as recommended by Bombardier et al. [11].

It is common to accept co-interventions in RCT, and it might be difficult to carry out RCT that do not allow patients to visit other health care providers. However, co-interventions blur the effects of the treatments in the RCT, which is why we decided to exclude trials where patients were allowed to have co-interventions [47, 58, 67].

In conclusion, the effect sizes for the pure benefit of treatments of LBP compared to no-treatment groups were small to moderate for both the acute and chronic conditions. There was a lack of long-term follow-up for pain and function, and the quality of the studies was low to moderate. To increase our knowledge about treatments of non-specific LBP, there is still a need to develop more effective interventions.