Introduction

In Sweden, up to 25% of patients report unimproved or worse pain and up to 40% are dissatisfied with the outcome of lumbar fusion [35]. There are many possible reasons for mixed outcomes of lumbar fusion surgery, including instrumentation failure, inadequate surgical technique and poor patient selection. Factors previously suggested to be predictive of pain and disability-related outcomes include pre-surgical pain/function [42], negative personality traits [19, 31, 40, 41, 44], emotional status [41], anxiety/depression [3, 23, 28, 40, 41], fear avoidance (FA) beliefs [28], negative outcome expectations [20, 48], negative coping [3], smoking status [41], gender [9], exercise [9], litigation [23], duration of back pain and workers’ compensation [3, 19, 40, 41].

Understanding of how psychological factors influence pain perception and the development of pain-related disability has improved in recent years. The FA model summarizes the current literature, describing how increased anxiety, fear of movement/(re)injury and negative emotions can be related to the use of negative coping strategies such as pain catastrophizing, resulting in avoidance behaviour, functional disability, depression and pain chronicity [25, 46]. Factors involved in the FA model have been estimated to explain 40–50% of the variance in pre-operative measures of functional disability and health-related quality of life (HRQOL) in lumbar fusion patients [1]. Psychological factors have also been shown to have strong influences on, and mediating roles for, pain, mental health, fear of movement/(re)injury, disability and HRQOL in patients scheduled for lumbar fusion [1]. In addition to our own observations in a randomised controlled trial [2], a previous study showed the importance of post-operative rehabilitation for the outcome of lumbar fusion [4]. In both trials, biopsychosocially orientated rehabilitation was more effective than exercise therapy in improving functional outcome [2, 4].

Previously published studies investigating the prediction of lumbar fusion outcome have not included the complete array of psychological factors outlined by the FA model, nor have they included the type of post-operative rehabilitation as a predictor variable in their regression models. Previous studies have also restricted their analyses to linear relations, implicitly assuming that no nonlinear relations exist between response and predictor variables. Furthermore, previous studies have not tested the validity of their regression models’ performance. We hypothesised that a model capable of predicting long-term problems with functional disability, back pain intensity and HRQOL in lumbar fusion patients could be constructed by including relevant variables based on the current literature, using regression methods capable of analysing nonlinear relations, using regularization methods for variable subset selection and using bootstrap resampling to examine prediction error. The purpose of this study was to investigate individual predictive factors, and the validity of the resulting predictions, for functional disability, back pain intensity and HRQOL 2–3 years after lumbar fusion, using a regression method that accommodates nonlinear relations with pre-surgical variables.

Materials and methods

Study design and selection of patients

A prospective cohort design was used to study the predictive value of pre-surgical demographic, work-related, psychological and clinical variables for functional disability, back pain intensity and HRQOL outcomes 2–3 years after lumbar fusion. The patients were recruited from the Karolinska University Hospital’s Orthopaedic Clinic, Stockholm, Sweden, over a 2-year period between 2005 and 2007. The inclusion criteria were: men and women aged 18–65 years with a >12-month history of back pain and/or sciatica; a primary diagnosis of spinal stenosis, degenerative or isthmic spondylolisthesis or degenerative disc disease; selection for lumbar fusion with or without decompression; and competence in the Swedish language. The exclusion criteria were previous lumbar fusion, rheumatoid arthritis and ankylosing spondylitis. The Ethics Committee for Medical Research in the Stockholm health region approved the study.

Of the 107 patients recruited in this study, 58 received posterolateral fusion with pedicle screws, 32 received transforaminal intervertebral fusion and 17 received posterolateral fusion without pedicle screws. In 99 of the patients, decompression was performed by full laminectomy and partial facet joint resection in central spinal stenosis and spondylolisthesis (isthmic and degenerative) or by partial facet joint resection in root canal stenosis. The patients participated in a randomised controlled trial analysing the short- and long-term effectiveness of a psychomotor therapy (N = 53) compared with an exercise therapy (N = 54) applied during the first 3 months after lumbar fusion [2]. Psychomotor therapy combined cognitive-behavioural and motor relearning strategies to modify maladaptive pain cognitions, behaviour and motor control. Exercise therapy encompassed physical training focusing on muscular strength, endurance and cardiovascular fitness.

Evaluation of outcome

The response variables in the study were functional disability, back pain intensity and HRQOL 2–3 years after lumbar fusion. Pre-surgical predictor variables were collected 1 month prior to surgery. These variables included demographics (age, gender, BMI, smoking), work-related variables (work status, sickness benefits), psychological variables (mental health, fear of movement/(re)injury, outcome expectancy, catastrophizing, functional self-efficacy, control over pain by using coping strategies and ability to decrease pain by using coping strategies) and clinical variables (back pain intensity, leg pain intensity, straight leg raise, functional disability, diagnosis, surgical technique, post-operative rehabilitation). The medical records system and a self-report questionnaire were used to collect data on demographics, work-related variables and clinical variables such as diagnosis, straight leg raise, surgical technique and post-operative rehabilitation. The straight leg raise was considered positive if pain in the sciatic distribution was reproduced between 30° and 70° of passive flexion of the straight leg [6]. The Oswestry Disability Index (ODI) version 2.0 [13, 14] was used to measure functional disability, with lower scores on the 0–100 scale reflecting less low back pain-related disability. A 0–100 mm visual analogue scale (VAS), anchored at no pain and unbearable pain, was used to record patients’ self-rated average back pain and leg pain intensity during the previous 7 days [17, 30]. The European Quality of Life Questionnaire (EQ-5D) [12] was used to measure HRQOL, with a 0–100 scale from worst to best possible health state calculated using UK index tariffs [7]. The mental health subscale of the Medical Outcomes Study Short Form 36 (SF-36) was used to measure mental health and is presented as a 0–100 score, with higher scores representing better mental health.
This subscale is a summary score of five questions related to anxiety, depression, loss of behavioural/emotional control and psychological well-being experienced during the previous month [47]. The Tampa Scale for Kinesiophobia (TSK), with a score range of 17–68 (low–high), was used to measure the patient’s current pain-related fear of movement/(re)injury [45]. The Self-Efficacy Scale (SES), with a score range of 8–64 (low–high), was used to measure patients’ beliefs in their ability to perform physical activities [11]. To investigate the patient’s beliefs about the expected outcomes related to future low back pain, the Back Beliefs Questionnaire (BBQ), with a score range of 9–45, was used [37]. Higher scores on the BBQ represent a more positive attitude and better ability to manage future back pain. The Coping Strategies Questionnaire (CSQ) was used to measure the patient’s current use of coping strategies [32]. The CSQ catastrophizing subscale (CSQ-CAT), with a score range of 0–36 (low–high), was used to measure the patient’s use of pain-related negative thinking. The self-perceived effectiveness of coping strategies in controlling pain (CSQ-COP) and in decreasing pain (CSQ-ADP) were measured by two single-item scales with a score range of 0–6 (low–high). All questionnaires have been shown to have good reliability and validity in Scandinavian settings [10, 16, 18, 21, 22, 24, 26, 36].
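As an illustration of how two of these measures are scored, the sketch below computes an ODI total from its ten item scores using the standard ODI rule (sum of the answered items, each scored 0–5, divided by five times the number of answered items, multiplied by 100) and applies the straight leg raise criterion described above. The function names are hypothetical, treating the 30° and 70° limits as inclusive is our assumption, and neither function is taken from the study’s own data processing.

```python
def odi_score(items):
    """Oswestry Disability Index (0-100) from ten item scores.

    Each item is scored 0-5; unanswered items are passed as None and are
    excluded from both the sum and the maximum possible score.
    """
    answered = [s for s in items if s is not None]
    if len(items) != 10 or not answered:
        raise ValueError("expected 10 items with at least one answered")
    if any(not 0 <= s <= 5 for s in answered):
        raise ValueError("item scores must lie between 0 and 5")
    return 100 * sum(answered) / (5 * len(answered))


def slr_positive(sciatic_pain_reproduced, angle_deg):
    """Positive straight leg raise: sciatic-distribution pain reproduced
    between 30 and 70 degrees of passive flexion (limits assumed inclusive)."""
    return sciatic_pain_reproduced and 30 <= angle_deg <= 70


print(odi_score([1, 2, 3, 4, 5, 0, 1, 2, 3, None]))  # 21 / (5 * 9) * 100
print(slr_positive(True, 45), slr_positive(True, 80))  # True False
```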

Analysis

To identify the most important pre-surgical variables for the prediction of functional disability, back pain intensity and HRQOL 2–3 years after lumbar fusion, a categorical regression (CATREG) method in SPSS version 17 was used. CATREG is capable of describing nonlinear relations by using a regression with transformation approach and optimal scaling methodology [15, 42, 43].

Response and predictor variables measured on interval scales were treated as rank-ordered (ordinal) variables. To allow for possible nonlinear relations between variables, monotonic transformations were used for interval variables with a limited number of categories (ability to decrease pain, control over pain), while a spline transformation was used for the remaining interval variables with a larger number of categories. The spline transformation was based on second-degree polynomials with one interior knot, which controls the smoothness of the transformations. A nominal scaling level was used for variables such as age, gender, smoking, work status, sickness benefits, diagnosis, surgical technique, straight leg raise and post-operative rehabilitation.
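To make the spline transformation more concrete, the sketch below builds a second-degree spline with one interior knot (a truncated power basis), fits it by least squares to a target and standardizes the result, as optimal scaling does. It is a simplified, hypothetical illustration: CATREG alternates such fits with the regression step and, for ordinal variables, constrains the spline to be monotonic, neither of which is shown here, and placing the knot at the median is our assumption.

```python
import numpy as np


def quadratic_spline_basis(x, knot):
    """Truncated power basis for a degree-2 spline with one interior knot."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([
        np.ones_like(x),                      # intercept
        x,                                    # linear term
        x ** 2,                               # quadratic term
        np.clip(x - knot, 0.0, None) ** 2,    # extra curvature beyond the knot
    ])


def spline_transform(x, target, knot=None):
    """Least-squares quadratic-spline transformation of x towards target,
    standardized to mean 0 and variance 1."""
    x = np.asarray(x, dtype=float)
    if knot is None:
        knot = float(np.median(x))  # assumption: single knot at the median
    basis = quadratic_spline_basis(x, knot)
    coef, *_ = np.linalg.lstsq(basis, np.asarray(target, dtype=float), rcond=None)
    z = basis @ coef
    return (z - z.mean()) / z.std()


# Hypothetical predictor with no effect below 50 and a curved effect above it.
x = np.linspace(0, 100, 51)
target = np.clip(x - 50, 0, None) ** 2
z = spline_transform(x, target)
print(round(float(np.corrcoef(z, target)[0, 1]), 3))
```

A single straight line in x cannot reproduce this flat-then-curved shape, whereas the one-knot quadratic spline can, which is the kind of nonlinear relation the transformation is meant to capture.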

Elastic net regularization was used to improve prediction accuracy by shrinking the regression coefficients, which makes them more stable and reduces the estimation variance caused by possible multicollinearity [49]. Shrinkage is achieved by applying a penalty to the regression model; as the penalty increases, the coefficients shrink towards zero, and the coefficients of the most stable predictors shrink to zero more slowly. A 0.632 bootstrap method was used with 200 resamples drawn with replacement from the original sample; on average, each bootstrap resample contains about 63.2% of the distinct original observations. From the 200 resamples, the smallest (most parsimonious) subset of predictors with a prediction error within 1 standard error (SE) of the minimum prediction error was selected [8].
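The procedure can be sketched as follows for ordinary (not optimally scaled) numeric predictors: elastic net coefficients are fitted by coordinate descent for the criterion (1/2n)‖y − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖²], the prediction error is estimated with Efron’s 0.632 rule (0.368 × apparent error + 0.632 × out-of-bag error), and the largest penalty whose error lies within 1 SE of the minimum is chosen. This is a hypothetical numpy illustration, not SPSS CATREG’s implementation; the penalty parameterization, the mixing value α = 0.5, averaging the out-of-bag error per resample and the rough SE used are all our assumptions.

```python
import numpy as np


def soft_threshold(z, g):
    """Lasso shrinkage operator: move z towards 0 by g, stopping at 0."""
    return np.sign(z) * max(abs(z) - g, 0.0)


def elastic_net(X, y, lam, alpha=0.5, max_iter=500, tol=1e-8):
    """Coordinate descent for centred X, y minimizing
    (1/2n)||y - Xb||^2 + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                                  # residual when b = 0
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(max_iter):
        b_old = b.copy()
        for j in range(p):
            r += X[:, j] * b[j]                   # partial residual without j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b


def fit(X, y, lam, alpha=0.5):
    """Fit with an unpenalised intercept by centring X and y."""
    mx, my = X.mean(axis=0), y.mean()
    b = elastic_net(X - mx, y - my, lam, alpha)
    return my - mx @ b, b


def err632(X, y, lam, alpha=0.5, n_boot=200, seed=0):
    """0.632 bootstrap estimate of mean squared prediction error, plus a rough SE."""
    rng = np.random.default_rng(seed)
    n = len(y)
    a, b = fit(X, y, lam, alpha)
    apparent = np.mean((y - a - X @ b) ** 2)      # error on the fitting sample
    oob = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # resample with replacement
        out = np.setdiff1d(np.arange(n), idx)     # cases left out (about 36.8%)
        if out.size == 0:
            continue
        a_b, b_b = fit(X[idx], y[idx], lam, alpha)
        oob.append(np.mean((y[out] - a_b - X[out] @ b_b) ** 2))
    oob = np.array(oob)
    return 0.368 * apparent + 0.632 * oob.mean(), 0.632 * oob.std() / np.sqrt(len(oob))


def one_se_lambda(X, y, lams, alpha=0.5, n_boot=200):
    """Largest penalty (sparsest model) whose error is within 1 SE of the minimum."""
    results = [err632(X, y, lam, alpha, n_boot) for lam in lams]
    errors = np.array([e for e, _ in results])
    best = errors.argmin()
    limit = errors[best] + results[best][1]
    return max(lam for lam, e in zip(lams, errors) if e <= limit)


# Hypothetical data: only the first of five predictors matters.
rng = np.random.default_rng(1)
X = rng.standard_normal((107, 5))
y = 2 * X[:, 0] + rng.standard_normal(107)
lams = [0.01, 0.1, 0.5, 1.0]
lam = one_se_lambda(X, y, lams, n_boot=50)
print(lam, fit(X, y, lam)[1].round(2))
```

Every penalty value is evaluated on the same bootstrap resamples (the same seed), so the errors along the penalty path are compared on equal terms.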

The predicted values from the most parsimonious CATREG model for each response variable were used to test the discriminative power of the models, with each response variable dichotomised at its median. Receiver operating characteristic (ROC) analysis was used to investigate the sensitivity (the proportion of actual positives correctly classified) and specificity (the proportion of actual negatives correctly classified) of the models.
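The median split and the two classification measures can be sketched as below with the Python standard library. The area under the ROC curve is computed with the Mann–Whitney formulation (the probability that a randomly chosen positive case has a higher predicted value than a randomly chosen negative case, ties counting one half). Which side of the median counts as positive, the threshold and all numbers in the example are hypothetical.

```python
from statistics import median


def dichotomise_at_median(values):
    """1 for values above the median, 0 otherwise."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and compare with the labels."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


observed = [10, 20, 30, 40, 50, 60]    # hypothetical 2-3 year outcome scores
predicted = [12, 25, 33, 28, 45, 70]   # hypothetical model predictions
labels = dichotomise_at_median(observed)              # [0, 0, 0, 1, 1, 1]
print(sensitivity_specificity(predicted, labels, 30))
print(roc_auc(predicted, labels))                     # 8 of 9 pairs ordered correctly
```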

Results

The follow-up rate at 2–3 years after surgery was 81%. Missing data from non-responders were imputed according to intention-to-treat principles: a 10th or 90th percentile value was used when external evidence in the medical records indicated a good or bad outcome; otherwise, mean imputation was used. Table 1 outlines descriptive statistics for the response and predictor variables used in the regressions.
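The imputation rule can be sketched as follows. Mapping a good outcome to the 10th percentile and a bad outcome to the 90th follows the text for scales on which lower scores are better (ODI, pain VAS); the `lower_is_better` flag that reverses this for scales on which higher scores are better (such as EQ-5D), the function name and the example data are our assumptions.

```python
from statistics import mean, quantiles


def impute_non_responder(observed, evidence=None, lower_is_better=True):
    """Single imputed value for a non-responder.

    evidence: "good", "bad" or None (no external evidence in medical records).
    lower_is_better: True for ODI/VAS-type scales; False for scales on which
    higher scores are better (an assumption, not stated in the source).
    """
    deciles = quantiles(observed, n=10, method="inclusive")  # 10th ... 90th percentiles
    p10, p90 = deciles[0], deciles[-1]
    if evidence is None:
        return mean(observed)                                # mean imputation
    good, bad = (p10, p90) if lower_is_better else (p90, p10)
    if evidence == "good":
        return good
    if evidence == "bad":
        return bad
    raise ValueError("evidence must be 'good', 'bad' or None")


responders = list(range(0, 101, 10))   # hypothetical ODI scores of responders
print(impute_non_responder(responders, "good"),
      impute_non_responder(responders, "bad"),
      impute_non_responder(responders))
```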

Table 1 Variables

In Fig. 1, the CATREG elastic net paths for the penalisation of the full predictor models are displayed. For functional disability 2–3 years post-surgery, the elastic net regularization method in 200 bootstrapped samples found the most parsimonious shrunken model of stable predictors to contain eight pre-surgical predictors (functional disability, mental health, fear of movement/(re)injury, outcome expectancy, catastrophizing, control over pain, leg pain intensity and post-operative rehabilitation). The same pre-surgical predictors, together with the straight leg raise, formed the most parsimonious shrunken model for back pain intensity 2–3 years post-surgery. The most parsimonious shrunken model for HRQOL 2–3 years post-surgery contained four predictors: mental health, outcome expectancy, catastrophizing and control over pain.

Fig. 1

CATREG elastic net regularization, bootstrapped with 200 resamples. Each pre-surgical predictor is represented by a symbol in the right-hand column. The penalised shrinkage of the beta coefficients for each pre-surgical candidate predictor shows that the more stable predictors decrease to zero more slowly. The dotted reference line on the x-axis marks the smallest subset of predictors with the least prediction error estimated from the 200 bootstrap resamples of the original sample

Regression coefficients and the relative importance of each of the eight predictors in the models for functional disability and back pain intensity 2–3 years post-surgery, as well as of the four predictors in the model for HRQOL 2–3 years post-surgery, are shown in Tables 2, 3 and 4. The prediction models significantly explained variance in functional disability, back pain intensity and HRQOL 2–3 years post-surgery, with R² = 0.416, 0.360 and 0.256, apparent errors of 0.599, 0.640 and 0.744, and expected prediction errors of 0.873, 0.955 and 0.944, respectively. Significant predictors for functional disability 2–3 years post-surgery were pre-surgical leg pain intensity (β = −0.301, P ≤ 0.001), post-operative rehabilitation (β = 0.230, P = 0.024), pre-surgical catastrophizing (β = 0.240, P = 0.041) and pre-surgical control over pain (β = −0.212, P = 0.040). Significant predictors for back pain intensity 2–3 years post-surgery were pre-surgical catastrophizing (β = 0.230, P = 0.002), pre-surgical leg pain intensity (β = −0.291, P = 0.026) and the straight leg raise (β = 0.219, P = 0.021). Significant predictors for HRQOL 2–3 years post-surgery were pre-surgical control over pain (β = 0.231, P = 0.031) and pre-surgical outcome expectancy (β = 0.250, P = 0.002). Table 5 shows the discriminative ability of the models for the median-dichotomised classification of functional disability, back pain intensity and HRQOL 2–3 years post-surgery.

Table 2 Predictor subset regression coefficients for functional disability 2–3 years post-surgery
Table 3 Predictor subset regression coefficients for back pain intensity 2–3 years post-surgery
Table 4 Predictor subset regression coefficients for HRQOL 2–3 years post-surgery
Table 5 Median dichotomised classification results for functional disability, back pain intensity and HRQOL 2–3 years post-surgery using the predicted values from regression models as test scores
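The relative importance reported alongside CATREG coefficients is, to our understanding, Pratt’s measure: each predictor’s standardized coefficient multiplied by its zero-order correlation with the response, divided by R². The sketch below computes it from hypothetical inputs; it normalizes by the sum of the products, which equals R² for an unpenalized least-squares fit on standardized variables but only approximates it for a regularized fit. None of the numbers correspond to Tables 2–4.

```python
def pratt_importance(betas, zero_order_r):
    """Pratt's relative importance: beta_j * r_j, rescaled to sum to 1.

    A negative value can occur (e.g. for a suppressor variable).
    """
    products = [b * r for b, r in zip(betas, zero_order_r)]
    total = sum(products)  # equals R^2 for unpenalized standardized least squares
    return [p / total for p in products]


# Hypothetical standardized coefficients and zero-order correlations.
importance = pratt_importance([0.5, 0.3, -0.1], [0.6, 0.4, 0.1])
print([round(v, 3) for v in importance])
```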

Discussion

The major finding of this study was that good prospective disability, back pain and HRQOL outcomes after lumbar fusion surgery can be predicted by screening pre-surgical psychological variables. In clinical practice, this translates to high outcome expectancy (≥28 points) recorded by the BBQ [37], high control over pain (≥4 points) recorded by the CSQ-COP [32] and low catastrophizing (<18 points) recorded by the CSQ-CAT [32]. Therefore, to obtain an accurate prediction of outcome, patients can simply complete the BBQ as well as the catastrophizing and control over pain subscales of the CSQ, and the clinician can easily sum the scales. The predictive power of psychological variables is further supported by our finding that patients scheduled to receive early post-operative rehabilitation with a biopsychosocial approach are predicted to have lower levels of prospective functional disability than patients receiving traditional post-operative exercise therapy.
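A screening check against the three cut-offs above might look like the sketch below. The text does not state how the cut-offs should be combined, so counting how many are met is a hypothetical illustration rather than a validated decision rule; the function name is also hypothetical, and the range checks use the score ranges given in the Methods.

```python
def favourable_profile(bbq, csq_cop, csq_cat):
    """Count how many of the three cut-offs named in the text are met.

    Score ranges follow the Methods: BBQ 9-45, CSQ-COP 0-6, CSQ-CAT 0-36.
    Combining the cut-offs into a count is a hypothetical choice.
    """
    if not (9 <= bbq <= 45 and 0 <= csq_cop <= 6 and 0 <= csq_cat <= 36):
        raise ValueError("score outside the questionnaire's range")
    criteria = {
        "high outcome expectancy (BBQ >= 28)": bbq >= 28,
        "high control over pain (CSQ-COP >= 4)": csq_cop >= 4,
        "low catastrophizing (CSQ-CAT < 18)": csq_cat < 18,
    }
    return sum(criteria.values()), criteria


count, detail = favourable_profile(bbq=30, csq_cop=5, csq_cat=20)
print(count)  # 2: catastrophizing is not below 18
```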

Previous studies that included pre-surgical outcome expectations in multivariate models found them to be significant predictors of pain, functional and HRQOL-related outcomes [5, 20, 29, 33, 39, 48]. Our results showed that patients’ expectations of future back pain-related outcomes were the most important predictor of prospective HRQOL, but were not predictive of pain or function-related outcomes. This may be explained by our inclusion of a broader range of psychological factors than previous studies [27]; with these factors included, control over pain and catastrophizing emerged as significant predictors of pain and function-related outcomes.

Self-reported lower limb pain has largely been ignored in predictive research in favour of testing the predictive value of self-reported back pain intensity. Because this variable depends on the patient’s self-report of symptoms, we chose the term “leg pain intensity”, as patients found this terminology easier to understand in Swedish. In our study, high pre-surgical leg pain intensity predicted lower levels of functional disability and back pain intensity 2–3 years post-surgery. As an indicator of leg pain of radicular origin, a positive straight leg raise also predicted lower prospective levels of back pain. These findings are not surprising, as a patient’s self-reported leg pain together with a positive straight leg raise most likely reflects somatic illness more than back pain does, and a biological intervention such as surgery would be expected to affect somatic illness in particular. This suggests the importance of a thorough pre-surgical assessment of pain and neurology to identify patients with predominantly peripheral symptoms. One might expect diagnoses with characteristic peripheral symptoms, such as degenerative disc disease and lateral and central spinal stenosis, also to predict prospective back pain, but this was not the case in our study. Separate analyses of diagnostic subgroups also produced similar results for the predictive variables. Although the reason for this is unclear, the results suggest that the underlying diagnosis indicating surgery, whether a clinical syndrome of instability or spinal stenosis, is less important than the presence or absence of leg pain and a positive straight leg raise in predicting prospective back pain outcomes.

In previous outcome prediction studies specific to lumbar fusion patients, regression models have been reported to explain 25–30% of the variance in post-surgical ODI and pain VAS [9, 41]. In our study, the models significantly explained 41.6, 36.0 and 25.6% of the variance in the 2–3 year measures of ODI, back pain VAS and EQ-5D, respectively. The analysis of nonlinear relations and the optimal scaling transformations of the variables used in CATREG help to increase the predictors’ beta values, and hence the variance explained by the models, owing to a better fit to the data than linear regression modelling provides.

Ordinary least squares (OLS) linear regression and standard logistic regression are known to perform poorly with regard to both prediction accuracy and model complexity [34, 38]. Several regularized regression methods have been developed to overcome these shortcomings. Zou and Hastie [49] proposed the “Elastic Net” regularization method, which combines the lasso (L1) and ridge (L2) penalties and uses shrinkage of regression coefficients to reduce their variability and to provide subset selection of stable predictors (Fig. 1).
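For reference, the elastic net criterion as formulated by Zou and Hastie [49] for standardized predictors and a centred response is reproduced below (our transcription; the parameterization used inside SPSS CATREG may differ). The naive criterion reduces to the lasso when λ₂ = 0 and to ridge regression when λ₁ = 0, and the rescaling by (1 + λ₂) corrects the double shrinkage of the naive estimate.

```latex
\hat{\beta}_{\text{naive}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1,
\qquad
\hat{\beta}_{\text{enet}} = (1 + \lambda_2)\,\hat{\beta}_{\text{naive}}
```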

In many studies, authors have attempted to decrease model complexity by excluding variables that were non-significant in univariate tests, in the belief that a variable’s effect should be established before it enters the model. For estimation, however, prior significance testing is not relevant if a variable’s effect is supported by subject knowledge [34]. Another approach is to reduce the model during modelling, for example by backward stepwise selection, which eliminates the least significant candidate predictors from a full model [38]. A disadvantage of stepwise methods is the instability of predictor selection and exaggerated significance (P values that are too small), especially when the ratio of observations to candidate variables is below 10. Statistical texts recommend one predictor per 50 observations and the use of bootstrap resampling for reliable selection among candidate predictors in standard linear/logistic regression [34, 38]. No previous study investigating predictors of spinal surgery outcome has been able to follow such recommendations. The assumptions of standard linear or logistic regression regarding linearity between variables, normality of residuals and the ratio of cases to variables do not apply to CATREG.
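The observations-per-predictor argument can be made concrete with the counts reported in this study: 107 patients and the 20 candidate predictors listed under Evaluation of outcome, each counted once. Counting each variable once is a simplification; in standard regression, nominal variables with more than two categories would be coded as several dummy variables, lowering the ratio further.

```python
# Candidate predictors as listed under "Evaluation of outcome".
candidate_predictors = {
    "demographic": ["age", "gender", "BMI", "smoking"],
    "work-related": ["work status", "sickness benefits"],
    "psychological": ["mental health", "fear of movement/(re)injury",
                      "outcome expectancy", "catastrophizing",
                      "functional self-efficacy", "control over pain",
                      "ability to decrease pain"],
    "clinical": ["back pain intensity", "leg pain intensity",
                 "straight leg raise", "functional disability", "diagnosis",
                 "surgical technique", "post-operative rehabilitation"],
}

n_patients = 107
n_candidates = sum(len(v) for v in candidate_predictors.values())
print(n_candidates)                          # 20
print(round(n_patients / n_candidates, 2))   # 5.35 observations per predictor
print(10 * n_candidates, 50 * n_candidates)  # 200 1000 patients needed at 1:10 and 1:50
```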

Apart from the prediction of the observed response variables, the prediction of future outcomes is also of interest. Van der Kooij [43] found that applying the 0.632 bootstrap to CATREG was the best-performing resampling method for testing prediction accuracy. This method gives a measure of the expected prediction error when the observed model parameters are applied to predict the outcome of future observations. The expected prediction errors of 0.873, 0.955 and 0.944 for applying the models to future populations to predict prospective functional disability, back pain intensity and HRQOL are high compared with the apparent prediction errors in our sample of 0.599, 0.640 and 0.744, respectively. The apparent errors are within the ranges reported in earlier studies, suggesting adequate internal validity, but the high expected prediction errors suggest inadequate external validity. However, van der Kooij [43] has shown that expected prediction errors increase considerably with sample sizes below 1000.

In discriminating between high and low levels of functional disability, back pain intensity and HRQOL 2–3 years post-surgery, the prediction models showed high specificity, meaning that only a few false positives would receive surgery when an undesirable outcome was to be expected. Adequate sensitivity also showed that each prediction model correctly identified patients who responded positively to surgical treatment. These results suggest that pre-surgical screening of these predictors may be useful for determining the prognosis of spinal surgery.

Conclusions

This study demonstrates the importance of pre-surgical psychological factors, leg pain intensity, the straight leg raise and post-operative psychomotor therapy in predicting functional disability, back pain and HRQOL-related outcomes.