Background

Non-specific back pain has a lifetime prevalence of about 80% [44]. More than half of the patients with non-specific low back pain (LBP) resume full functioning within one month after the onset of a new LBP episode; 75–90% of the patients resume full functioning within 3 months [30]. Performing activities of daily living is one of the main components of quality of life and is, besides return to work, the most important patient-centred outcome. Return to work as a sole outcome is not sufficient because (a) it excludes many patients who did not work before the LBP episode (housewives, elderly people), (b) return to work depends strongly on external factors (e.g. the economic situation in the given region or profession), and (c) return to work and function are strongly correlated, but do not represent the same underlying construct. Therefore, it is important to evaluate prediction of function in a separate analysis. Persistent pain can restrict function, with consequences not only for the patient in different life situations, e.g. interpersonal interactions, but also for those close to the patient in work or private life.

Recent guidelines recommend the assessment of risk factors for severe disease and prolonged disability if a patient with LBP has not returned to full activity at 4 and 6 weeks after the onset of disabling LBP [21, 22, 34, 40]. Predicting or explaining functional recovery or disability would help to concentrate scarce health care and social work resources on patients in need. If a patient’s modifiable risk factors for persistent limitations are known, an informed assignment to different interventions is possible. For example, if social risk factors are predominant over medical factors, emphasis may be put on social work. Disclosing a broader set of risk factors than those traditionally considered from the medical perspective would enhance the effectiveness of health care providers. In addition, it would help to explain why medical interventions are ineffective when social risk factors are predominant.

Furthermore, these predictive tools could be used to “negotiate” and explain the rationale for management strategies to the patients. This might enhance the patient’s understanding of the problem and thus, e.g., enhance compliance during the rehabilitation process.

The possibility of accurate prediction of risk in clinical trials will allow the implementation of strategies in these trials to balance out known prognostic factors or to control for confounders in the analyses [31].

Several authors systematically reviewed the risk factors for function-related outcomes [7, 8, 15, 26, 29, 32, 35, 37, 45], but to our knowledge, there is no systematic overview of instruments predicting persisting restriction of function for patients with non-specific LBP.

We systematically reviewed the literature to find predictive models and tools for the transition from subacute to chronic non-specific LBP with persistent restriction in function. To report and discuss the predictive value of the instruments found, we evaluated the methodological quality, discriminative properties, and ability to predict or to explain the function-related outcome of these studies of patients with subacute non-specific LBP.

Methods

The search for evidence to answer the question “value of predictive tools to determine long-term restriction in function” was combined with the search for the similar study question “value of predictive tools to determine long-term non-return to work”. In this publication we report only the values for the predictive instrument for persisting restriction of function.

We searched for studies reporting predictive values of tools (questionnaires, assessments, clinical examination, etc.) or models (combining different individual risk factors or assessments into a “decision rule”, “clinical rule” or “predictive tool”) for the prediction of chronic non-specific LBP with persisting restriction in function. For simplicity of reading, we will use the term “instruments” in this text to summarize all these assessments, clinical rules, etc.

Inclusion criteria were: prospective cohort study, patients with subacute non-specific LBP, and application of the instruments between 2 and 12 weeks after the initial medical visit for a first or a new episode of LBP [45]. We excluded retrospective studies, studies that applied the predictive instruments in a general population, studies that applied the instruments too early (less than 10 days) or too late (more than 3 months) after the medical visit for a new LBP episode, studies that included pregnant patients, patients with neck pain or patients with specific pathologies such as inflammatory diseases or cancer, and studies that did not have at least a three-month follow up.

An epidemiologist and an information specialist defined the search strategy (available from the first author) for the different electronic databases. The search had no language, date, or publication status restriction. The systematic searches were conducted in July 2004 in the following databases: Medline in-Process (Ovid version, 1966–2004), Embase (1974–2004), PsycINFO/PsycLIT (1987–2004), CINAHL (1982–2004), Central (2nd Quarter 2004), PEDro (from inception to 2004), Psyndex (1977–2004), and Sociofile (1974–2004). In addition, we checked the reference lists of the publications included, of relevant systematic reviews, of relevant articles on the topic, and of guidelines and expert reports. Furthermore, we searched the Related Articles section in PubMed for the studies and reviews included (also after July 2004). Studies for the research questions were selected in two stages. Initially, two reviewers (RH, TL) independently assessed 55% of the retrieved abstracts, and two other reviewers (CH, HJ) assessed 55% of the identified abstracts. Agreement for the overlapping 10% was analysed and judged as good. In a second step, two reviewers (CH, RH) read the full text of the pre-selected studies and used checklists to decide on definite inclusion. Articles in other languages were assessed by physiotherapists with sufficient knowledge of the given language. Disagreements were resolved by discussion with the second author (LMB).

We checked the methodological quality of each study with a criteria checklist based on recommendations [11, 12, 18, 19] (see legend of Table 2 for items). As effects of individual quality components can be masked by simple summary scores, we additionally looked for serious methodological flaws that would strongly bias the predictive values and that were not covered by the checklist [17]. The following additional issues were evaluated: number of events, whether interaction was considered, data reduction methods (for example principal component analysis), and the variable selection process. Special issues for multivariable models were discussed [5]. We extracted all available relevant data at baseline, the univariable and multivariable associations, and the predictive values, as far as needed and possible to answer the question of predictive value. Given the heterogeneity in the tools and models, a meta-analysis would not have provided meaningful interpretable information.

Results

Selection process

Figure 1 shows the flowchart of the inclusion process of the combined search for work related outcome (not reported here) and function-related outcome. We identified 4,968 references in the databases and finally included 15 publications on function-related outcomes [4, 9, 10, 13, 14, 20, 23, 25, 27, 36, 39, 42, 43, 46, 47].

Fig. 1

In this article we report only on studies on function-related outcomes. (a) Note that two articles were published after the end of the systematic search of the databases [36, 43], and one article was excluded in the abstract screening at first due to the wording “..in patients with chronic low back pain...” in the title [14].

Table 1 shows the studies included. Publication years ranged from 1993 to 2005. Five studies used logistic regression [4, 20, 27, 39, 42], six studies multiple linear regressions [10, 13, 14, 23, 36, 47], one study used latent transition regression analyses [43], two studies used recursive partitioning [9, 10], one study used analysis of covariance [25] and one study used univariable analyses only [46].

Table 1 Included studies dealing with the transition from subacute to chronic low back pain

Quality

See Table 2 for rating of the items from the checklist. All studies were classified as having moderate quality. The checklist does not reflect the general quality of the study, merely the quality concerning predictive issues. Percentages of patients available at follow up ranged from 56 to 100%. Five reports had follow-up rates below 80% [4, 13, 14, 36, 43] (see Table 1).

Table 2 Quality of reviewed studies

The majority of the studies did not include all relevant risk factors (e.g. specific self-efficacy or expectations of the patient regarding recovery) of the relevant domains (biomedical, psychosocial, socio-economic, as well as occupational). For example, several studies evaluated no risk factor from the psychological domain [4, 20, 27, 46]. No factors from the social and occupational domain were evaluated in several other studies [13, 27, 36, 46]. Some studies built their model from a comprehensive set of risk factors (i.e. from all relevant domains) [10, 14, 23, 25]. See Table 1 for independent variables included.

Predictive values for function-related outcomes

Table 3 shows the predictive values of the studies evaluating prediction of continuous function-related outcomes (disability questionnaires). The variance explained (R²) in outcome, where reported, ranged from 28 to 51%; mean 42% (SD 8). The highest explained variance was observed for disability as a predictor and for psychosocial factors.

Table 3 Studies reporting explained variance from regressions. Predictive instruments and models used for function-related outcomes at different time-points. Models predict restricted function

Table 4 shows the results from the studies reporting odds ratios for dichotomized functional outcome: the median odds ratio for individual factors (we inverted odds ratios that were less than 1) was 2.20 (interquartile range 1.49 to 3.68). Odds ratios above four were observed for the following predictors: lack of energy (9.9), high disability with severely limiting back pain (8.1), high disability with moderately limiting back pain (6.1), pain radiating below the knee combined with neurological signs (5.7), high score in the Oswestry Disability Index (5.2), social isolation (4.3) and avoidance coping style (4.1).
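The pooling step described above (inverting odds ratios below 1 so that all associations point in the same direction, then summarizing by median and interquartile range) can be sketched as follows. This is a minimal illustration only: the function name `fold_odds_ratio` and the input values are hypothetical and are not data from the reviewed studies, and the quartile convention ("inclusive") is an assumption, since the convention used in the review is not stated.

```python
from statistics import median, quantiles


def fold_odds_ratio(odds_ratio: float) -> float:
    """Express an odds ratio as a value >= 1 by inverting values below 1."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return 1.0 / odds_ratio if odds_ratio < 1 else odds_ratio


# Hypothetical odds ratios for illustration (NOT taken from Table 4).
example_odds_ratios = [0.5, 2.0, 0.25, 3.0, 1.5]

# Protective associations (OR < 1) are inverted so that only the
# strength of the association is summarized, not its direction.
folded = [fold_odds_ratio(value) for value in example_odds_ratios]

# Quartiles depend on the interpolation convention; "inclusive" is assumed.
lower_quartile, _, upper_quartile = quantiles(folded, n=4, method="inclusive")

print(f"median OR: {median(folded):.2f}")
print(f"interquartile range: {lower_quartile:.2f} to {upper_quartile:.2f}")
```

Note that inverting an odds ratio discards the direction of the effect, so the summary describes only how strong the reported associations are.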

Table 4 Studies reporting odds ratios. Predictive instruments and models used for function-related outcomes at different time-points. Models predict restricted function

Table 5 reports the probabilities of having impaired function at follow up for the different classifications. The probability of having impaired function at follow up for patients in the “low risk” groups ranged from 0 to 11.7% with a mean of 6% (SD 4.3). The probability of having impaired function at follow up for patients in the “high risk” groups ranged from 26.8 to 82.1% with a mean of 51.6% (SD 18.9). Predictive values for not having impaired function are therefore better than those for having impaired function. For example, “not being distressed” was a better predictor of not having impaired function than “being distressed” was a predictor of having impaired function at follow up [10].

Table 5 Studies reporting probabilities. Predictive instruments and models used for disability related outcomes at different time-points

Table 6 shows predictive values for the dichotomized functional outcome. The diagnostic odds ratio for impaired function ranged from 4.0 to 28.7; mean 11.3 (SD 11.8). The percentage of patients correctly classified overall ranged from 40 to 72%; mean 55% (SD 14.65).

Table 6 Predictive values for dichotomized functional outcome

The sensitivity (correctly classified regarding functional limitations) ranged from 63 to 91%; mean 77% (SD 13.8). Specificity (correctly classified regarding good function) ranged from 29 to 93%; mean 63% (SD 26.8). Positive predictive value ranged from 22 to 98%; mean 59% (SD 39.8), and negative predictive value ranged from 35 to 96% with a mean of 67% (SD 33.0).
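For readers less familiar with these accuracy measures, the sketch below derives sensitivity, specificity, predictive values, overall correct classification and the diagnostic odds ratio from a single 2×2 table of predicted versus observed outcome. The class and function names (`TwoByTwo`, `accuracy_measures`) and the counts are hypothetical and chosen only for illustration; they do not reproduce any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of predicted risk against observed outcome."""
    true_positive: int   # predicted impaired, impaired at follow up
    false_positive: int  # predicted impaired, good function at follow up
    false_negative: int  # predicted good function, impaired at follow up
    true_negative: int   # predicted good function, good function at follow up


def accuracy_measures(table: TwoByTwo) -> dict[str, float]:
    """Return the standard accuracy measures for one 2x2 table."""
    tp, fp = table.true_positive, table.false_positive
    fn, tn = table.false_negative, table.true_negative
    if fp == 0 or fn == 0:
        # The diagnostic odds ratio is undefined with an empty off-diagonal
        # cell; continuity corrections are deliberately not applied here.
        raise ValueError("diagnostic odds ratio undefined when fp or fn is 0")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "overall_correct": (tp + tn) / (tp + fp + fn + tn),
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts for 100 patients (illustration only).
example = TwoByTwo(true_positive=40, false_positive=20,
                   false_negative=10, true_negative=30)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Because the predictive values depend on the proportion of patients with impaired function in the sample, they can change when an instrument is applied in a population with a different prevalence, whereas the diagnostic odds ratio depends only on sensitivity and specificity.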

The decision rule of Wahlgren [46] had a high diagnostic odds ratio and good overall classification for the prediction of functional outcome at 6 months, but with wide confidence intervals. Dionne and colleagues [9] evaluated generalizability. The diagnostic odds ratio was reduced from 8.27 to 4.1 because of the decreased specificity (from 57 to 29%) of the predictive tool in the new population. The overall correct classification was moderate in the development sample, but low in the new population.

Leroux [25] reported only associations between the independent variables and the one-year Roland Morris score from an analysis of covariance; these results are not shown in our tables.

Table 7 shows the large diversity of predictors included to predict function-related outcomes. Factors that are modifiable by treatments and were consistent predictors of function-related outcomes were function at baseline (measured with questionnaires) (reported 9 times), depression (8), somatization (3), psychological demand (3) and avoidance coping strategies (twice, once as avoidance coping style and once as guarding, which also concerns the avoidance of physical activities). Pain intensity was related to functional limitations positively (7) and negatively (once). The number of pain days was positively related to functional limitations (3); radiating pain, pain combined with disability, and the number of pain sites were each positively related to functional limitations once. The non-modifiable factors age, gender and education were important in several instruments: higher age was associated with greater functional limitations in nine populations, younger age was related to functional limitations in one study, and one study showed a U-shaped relationship between age and function [42]. Being a woman was related to greater functional limitations in six populations. Furthermore, diverse medical interventions were related to functional limitations.

Table 7 Predictors in different risk domains

Discussion

We systematically reviewed the literature to find predictive models and tools for the transition from subacute to chronic non-specific LBP with persistent restriction in function and we analysed the methodological quality. We found instruments with limited ability to predict or explain function-related outcomes in patients with non-specific LBP. The methodological quality related to predictive issues was moderate, especially regarding the selection processes for risk factors.

Predictive tools should contain risk factors and protective factors for problems with function-related outcomes from all relevant domains (biomedical, psychosocial, occupational, social, and patient expectations about recovery of functioning). If we evaluate the predictive tools in the light of guidelines for LBP [3, 21, 40] and systematic reviews on risk factors [15, 26, 32, 37, 38] we conclude that not all known risk factors were assessed and included in the instruments. In the case of LBP and its outcome function these factors would probably be age, gender, marital status, perceived disability, pain intensity, poor expectations for recovery of function, general and specific self-efficacy, somatization, pain catastrophizing, fear avoidance beliefs, distress, anxiety, health locus of control, coping strategies, symptoms of depression, pain behaviour, work intensity (heavy work and fast work pace), job tenure, availability of modified duty, perceptions of work (e.g. job satisfaction, monotonous work, job (or housework) stress, beliefs that work, housework or other activities are dangerous for the back, emotional effort concerning work or housework, social support or control at workplace) and delayed coordinated care.

Several medical interventions were shown to be predictors of functional limitations. We cannot conclude whether patients with poor prognoses received more interventions or whether the interventions themselves had a causal relationship with functional limitations. One might argue that these therapies could have increased other risk factors such as fear avoidance beliefs. For a discussion of the evidence on treatments in acute and subacute back pain see [40].

How to “operationalise” these constructs (e.g. distress, coping strategies, or pain behaviour) in a standardised manner should be defined to improve comparability and transferability of the predictive instruments.

Furthermore, the automatic selection processes used in the regression analyses in some of the studies included could lead to biased regression coefficients (on average too large) and to unstable variable selection, i.e. minor changes in the data may lead to the selection of different predictors. For example, if a factor strongly influences the prognosis of a patient, but has a low prevalence in the population and is underrepresented in the sample from which a clinical rule is derived, then this strong predictor will not be statistically significant in a regression model, will not be selected by automatic selection processes and, finally, will be missing from the instrument [33]. If the instrument is later applied to a patient with this risk factor, the prognosis will be overly optimistic because the important risk factor was not assessed by the instrument. Therefore, one should not rely only on automatic selection processes. One possibility is to first fit a regression model with (a) a set of predictors that have been shown to be related to the desired outcome in several studies (see list above) and (b) a set of factors whose predictive value can be justified theoretically.

Some factors, such as heavy work and fast work pace, could be summarized into a predictive index (e.g. work intensity) to reduce the number of variables in the model. Instead of relying on an automatic (stepwise) procedure, one could compare models containing a “minimal” set of predictors with models including more predictors, using a likelihood ratio test to evaluate whether the additional predictors add significant predictive value.
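The comparison of a “minimal” model with a larger nested model is commonly carried out as a likelihood ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution with as many degrees of freedom as there are additional predictors. The following is a minimal sketch under that assumption, using only the Python standard library. The function names and the log-likelihood values are hypothetical, and in practice the log-likelihoods would come from the fitted regression models.

```python
import math


def chi2_survival(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (integer df)."""
    if df < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    # Closed forms for df = 1 and df = 2, then a recurrence in steps of 2.
    if df % 2 == 1:
        prob, k = math.erfc(math.sqrt(half)), 1
    else:
        prob, k = math.exp(-half), 2
    while k < df:
        prob += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return prob


def likelihood_ratio_test(loglik_minimal: float, loglik_extended: float,
                          extra_predictors: int) -> tuple[float, float]:
    """Compare two nested models; return (test statistic, p-value)."""
    # For nested models the extended model cannot fit worse, so a negative
    # difference (numerical noise) is clamped to zero.
    statistic = max(0.0, 2.0 * (loglik_extended - loglik_minimal))
    return statistic, chi2_survival(statistic, extra_predictors)


# Hypothetical log-likelihoods (illustration only): the extended model
# adds two predictors to the minimal model.
statistic, p_value = likelihood_ratio_test(-120.0, -116.0, extra_predictors=2)
print(f"LR statistic = {statistic:.2f}, p = {p_value:.4f}")
```

The test is valid only when the minimal model is nested in the extended model and both are fitted to the same patients, which means that patients with missing values for the additional predictors have to be handled identically in both fits.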

With respect to generalizability, only one study applied the model in a new population, an important step in validation: Dionne et al. [9] applied a model in a different population and obtained similar predictive values, but with decreased specificity. They considered over 100 variables in the development process and included enough patients in the development sample and the validation sample (860 patients at follow up).

Risk factors may change over time. Therefore, in studies where predictive models were devised or evaluated, analyses based on repeated measurements of predictors may be used [16]. Nevertheless, this was not done in the studies selected.

Limitations of this review

Our search in electronic databases was systematic for articles published before July 2004. We included three studies published after July 2004 (one found by expert contacts, two by searching “Related Articles” in Medline), but the search string was not rerun. Therefore, we cannot exclude that we missed some relevant studies published after July 2004.

We wanted to include studies that assessed risk factors during the period of 2–12 weeks after onset of a new back pain episode. However, differences in defining a back pain episode and the actual assessment time point could have led to an inconsistency in the selection process regarding this selection criterion. We included some studies with patients having a long pain duration, for example the studies from Dionne et al. [9, 10]. Our argument for including these studies in spite of the long duration of pain is that a patient may suffer from back pain for years without being restricted in his/her functioning. Most people suffering from an attack of LBP will reduce activities of daily living for some days, but will resume most activities within days. There are possibly some activities they will abandon, but the majority of people (approximately 90%) with an LBP episode are not so constrained by the back pain that they would consult a medical doctor [6]. If, however, a patient becomes restricted in activities of daily living and therefore visits a doctor, a new situation, “the acute phase of an LBP episode with restricted activities of daily living”, has begun. Therefore, we counted the 2–12 weeks from this time-point on (when the patient contacted a medical doctor because of LBP).

Future research

We propose the construction of a comprehensive predictive model for persistent restriction in function. Building a model containing all relevant risk factors assembled in the many systematic reviews on risk factors for consequences of LBP will enhance the information gained from such a decision tool. In a further step, the predictive values should be evaluated in patients with LBP at about 6 weeks after the onset of a new episode leading to functional limitations; we hypothesise that at this time-point accurate prediction of prognosis will have the highest impact on clinical practice and costs. The validation process should follow expert recommendations [2, 19, 24, 41, 48]. According to Altman and Lyman [1], multiple separate and uncoordinated studies may delay the process of defining the role of prognostic markers; we therefore suggest an internationally coordinated study.

Implication for practice

The instruments evaluated in this review do not provide optimal information for the allocation of health care and social work resources, since there is no instrument that includes a comprehensive set of risk factors from all relevant domains covered by health care, psychology and social work providers. There is evidence that most of the risk factors shown in Table 7 are modifiable (see e.g. [28] for pain, self-reported function, fear avoidance beliefs about physical activity, and depression, or [40] for an overview). An assessment including a comprehensive instrument would allow an informed assignment of health care, psychological and social work resources towards the modifiable risk factors and improve the triage between inexpensive standard interventions and expensive (in terms of money and time) coordinated interdisciplinary rehabilitation programs.

Clinical research, e.g. randomized trials and outcome studies, without an instrument that accurately identifies prognostic factors makes it difficult firstly to balance the prognostic factors (e.g. by stratification or minimisation), secondly, to adjust for the case mix and, thirdly, to control for confounders by multivariable analyses [31].

With the reviewed instruments, one might accurately classify patients at both extreme ends of the “no risk–high risk” continuum, but patients in the middle “grey” zone cannot be classified accurately. As a consequence of this lack of accuracy, expensive treatments will be assigned to patients who would have improved even with minimal interventions. On the other hand, patients who would have improved with an intensive multimodal rehabilitation program might receive only minimal interventions, so that pain and restrictions persist. Nevertheless, even if a clear and accurate decision cannot be made by the use of one of these instruments, using a combination of them (to address all risk domains: biological, medical, psychological, socioeconomic as well as occupational) will reduce uncertainty and provide information to apply effective interventions towards the modifiable risk factors.

Conclusions

The instruments reviewed had only limited ability to predict or explain function-related outcomes. Using one of the presented tools would provide limited information on the spectrum and amount of risk factors involved.

To provide clinicians with an accurate predictive instrument, a comprehensive predictive model should be devised by assessing known and putative risk factors (e.g. age, gender, pain intensity and history, treatments in the past, Body Mass Index, self-reported function, neurological signs, depression, somatization, fear avoidance beliefs, self-efficacy, coping strategies, physical and psychological job demands) in a sufficiently large population. The model should then be applied to different populations to assess external validity.