Introduction

The most recent Global Burden of Disease study reveals that chronic low back pain is the single greatest cause of years lived with disability (YLD), causing more than 146 million YLDs in 2013, an increase of 61% since 1990 [1]. Osteoarthritis was in 13th place. Population aging is clearly one of the reasons behind these statistics (http://www.unfpa.org/ageing). Degenerative conditions of the musculoskeletal system are among the most prevalent and symptomatic disorders associated with middle and old age [2], and the increasing number of aged people will hence be paralleled by an increase in the number suffering from degenerative joint diseases. Analyses suggest that patients of the 2020s will be more demanding of treatment and less willing to live with their symptoms than our current elderly [2]. This will result in a greater demand for elective surgery and a consequent need for resource allocation policies, with services being prioritized based on documented effectiveness. In turn, this requires that we embrace a more evaluative culture, with surgical procedures being systematically registered and scrutinized regarding their respective patient-oriented outcomes, using comparative effectiveness research (CER) methodology [3].

Anecdotally, lumbar spine surgery is believed to deliver poorer patient outcomes than total hip (THR) and knee (TKR) replacement surgery; however, the formal studies hitherto conducted in small, select groups of spine patients do not always substantiate this [4,5,6,7,8,9,10,11,12]. Notably, these studies have predominantly employed generic health-related quality of life (HRQL) outcome measures, such as the Short Form 36 (SF-36), which do not always adequately reflect the benefit of specific types of surgery [9, 13] and are less well able to discriminate between the successes and failures of treatment [14, 15]. It is well known that the proportion of patients that can be considered a success after treatment depends very much on how success is defined, in terms of both the specific metric employed and the cutoff values applied [16, 17]. Although the spine and the large joints of the lower extremity are not comparable from a (patho)anatomical/physiological or biomechanical point of view, disorders in these regions impact the same “core domains” of importance to the patient (pain, function, quality of life, etc.), which allows them to be compared, given an appropriate set of questions that tap these domains. “Success” can be measured by the achievement of a “minimal clinically important change (MCIC) score” on the given outcome instrument, but this measure is not without its drawbacks. Firstly, it is influenced by baseline scores [18], and secondly, although indicating “improvement”, it does not necessarily tell us whether the patient is doing well in the end. Recent studies suggest that the patient’s achievement (or not) of an “acceptable symptom state” (PASS) may offer a more rigorous measure of success and better tease out differences between treatments [19]. Finally, enquiry as to the patient’s perspective on complications arising after surgery may provide a hitherto poorly investigated, but extremely important aspect of patient outcome [20].

The aim of the present study was to compare the outcomes after surgery in a large number of patients with different degenerative disorders of the lumbar spine, hip, or knee, using a brief patient-reported outcome measure (PROM) that includes the “Core Outcome Measures Index” (COMI). The latter is a set of single items on pain, function, symptom-specific well-being, quality of life, and disability, all in relation to the specified joint problem. It was developed and validated [21,22,23] following initial expert group recommendations for assessing outcomes in back patients [24], and then further validated for use in other joints/locations, including the hip and knee [25,26,27], and in many languages (http://www.eurospine.org/forms.htm). The PROM also includes single items on satisfaction with care, global treatment outcome, reoperations, and complications, all rated from the patient’s perspective (as per the “Patient Self-Assessment form” of the EUROSPINE Spine Tango Registry). This parsimonious set hence provides a comprehensive and candid reflection of surgical outcome, in all the major areas that matter most to patients undergoing elective surgery.

Materials and methods

We carried out an analysis of PROM data, collected prospectively from patients operated on in our orthopedic hospital between 2005 and 2014 (spine) or in 2014 (hip and knee) and stored within our in-house Surgical Outcomes registry. The spine data were collected using the framework of the EUROSPINE Spine Tango Registry [28] (http://www.eurospine.org/spine-tango.htm). We included patients who had a good understanding of one of the languages in which the PROM was available [28], were undergoing first-time surgery for the given joint/region of the spine (and unilaterally, for THR and TKR), and had reached at least 1 year postoperatively. We included spine patients who had lumbar degenerative disorder as the main pathology (with “pain relief” indicated as one of the goals of surgery), further categorized as herniated disc, spinal stenosis without spondylolisthesis, degenerative spondylolisthesis, degenerative deformity, or degenerative disc/segment disease, according to the Spine Tango diagnostic groups algorithm (http://www.eurospine.org/cm_data/def_of_degen_patho.pdf). Patients underwent decompression and/or instrumented fusion as deemed appropriate by the treating surgeon. For the hip and knee groups, we took all patients undergoing total joint replacement for primary osteoarthritis. For the hip, direct anterior or posterior approaches were used, which are both abductor sparing; for the knee, classic approaches with or without tuberosity osteotomies were used. We had no exclusion criteria. We obtained ethics committee approval for the re-use of routinely collected data.

Questionnaire

The PROM we used is shown in Table 1. Preoperatively, it comprised the Core Outcome Measures Index (COMI) [21,22,23], a validated, multidimensional index (scored 0–10) containing one question for each domain, formulated in relation to the affected joint/region. At 12 months’ follow-up, in addition to the COMI, the PROM included a range of single-item measures of treatment success (Table 1).

Table 1 Items and response options for the PROM used in the study

Patients completed the questionnaire at home, having received it from the research department by post, so the information given was free of care provider influence.

Statistical analysis

We present descriptive data as mean ± standard deviation (SD) or percentages. We used crosstabs/contingency analyses with Chi squared to assess the significance of differences between group distributions for various nominal variables.
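For readers less familiar with contingency analysis, the Pearson chi-squared statistic can be sketched as follows. This is an illustration only, not the SPSS/StatView procedure used in the study; the function name `chi_squared` and the example counts are hypothetical and are not study data.

```python
from typing import Sequence


def chi_squared(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Return (Pearson chi-squared statistic, degrees of freedom)
    for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts (NOT study data): rows = two groups,
# columns = smokers / non-smokers
example_table = [[30, 70], [10, 90]]
stat, dof = chi_squared(example_table)
print(f"chi-squared = {stat:.2f}, df = {dof}")
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, the alpha level used here.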

We applied multiple linear regression analysis to investigate the influence of group (coded as dummy variables) on the COMI score at 12 months’ follow-up (FU) while controlling for potential confounders [age, sex, BMI (in categories corresponding to those of the Spine Tango surgery form: < 20, 20–25, 26–30, 31–35, > 35), smoking status (no, yes), and American Society of Anesthesiologists Physical Status Score (ASA 1–5)].
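The dummy coding of group membership mentioned above can be sketched as follows. This is a minimal illustration, not the coding used in the statistical software; the group labels, the function name `dummy_code`, and the choice of the hip group as reference category are assumptions made for the example.

```python
# Hypothetical group labels based on the diagnostic categories named in the text
GROUPS = [
    "hip",
    "knee",
    "herniated disc",
    "spinal stenosis",
    "degenerative spondylolisthesis",
    "degenerative deformity",
    "degenerative segment",
]


def dummy_code(group: str, reference: str = "hip") -> dict[str, int]:
    """Return k-1 indicator (0/1) variables for a k-level group factor.

    The reference group is represented by all indicators being 0, so each
    regression coefficient estimates the difference from that reference.
    """
    if group not in GROUPS or reference not in GROUPS:
        raise ValueError(f"unknown group: {group!r} or {reference!r}")
    return {g: int(g == group) for g in GROUPS if g != reference}


print(dummy_code("knee"))  # indicator for "knee" is 1, all others 0
print(dummy_code("hip"))   # reference group: all indicators 0
```

Each indicator then enters the regression model alongside the confounders, so that its coefficient is the adjusted difference in outcome between that group and the reference group.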

We used multivariable logistic regression analysis to evaluate the influence of pathology/group on different indices of “success”, controlling for the aforementioned confounders. The indices included: achievement of MCIC (reduction of ≥ 2.2 points between the preoperative and postoperative COMI score [29]); achievement of a good global treatment outcome (operation helped/helped a lot); satisfaction with care (satisfied/very satisfied); achievement of an acceptable symptom state (satisfied/very satisfied to spend rest of life with current symptoms).
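The dichotomization of the four “success” indices described above can be sketched as follows. This is an illustration under stated assumptions: the 2.2-point MCIC threshold is taken from the text, whereas the response labels, field names, and the `FollowUp` record are hypothetical stand-ins for the wording in Table 1.

```python
from dataclasses import dataclass

MCIC_THRESHOLD = 2.2  # minimum reduction in COMI score, as stated in the text

# Hypothetical response labels; the exact wording is given in Table 1
GOOD_GLOBAL_OUTCOME = {"helped", "helped a lot"}
SATISFIED = {"satisfied", "very satisfied"}


@dataclass
class FollowUp:
    comi_pre: float              # preoperative COMI score (0-10)
    comi_post: float             # COMI score at 12 months (0-10)
    global_outcome: str          # global treatment outcome item
    satisfaction_with_care: str  # satisfaction with care item
    symptom_state: str           # "rest of life with current symptoms" item


def success_indices(r: FollowUp) -> dict[str, bool]:
    """Dichotomize each outcome into success (True) or failure (False)."""
    return {
        "mcic": (r.comi_pre - r.comi_post) >= MCIC_THRESHOLD,
        "good_global_outcome": r.global_outcome in GOOD_GLOBAL_OUTCOME,
        "satisfied_with_care": r.satisfaction_with_care in SATISFIED,
        "pass": r.symptom_state in SATISFIED,
    }


# Hypothetical patient (NOT study data): improved by 1.5 points only
patient = FollowUp(8.0, 6.5, "helped", "satisfied", "somewhat dissatisfied")
print(success_indices(patient))
```

The example shows how one patient can count as a success on some indices (global outcome, satisfaction with care) and as a failure on others (MCIC, acceptable symptom state), which is why the choice of index matters when groups are compared.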

We set an alpha level for the analyses of 0.05, and used SPSS version 22 (IBM Corp., USA, 2013) and StatView 5.0 software (StatView, SAS Inc., Berkeley, CA) for the analyses.

Results

Final study group

A total of 4594 patients (3937 spine (with different pathologies), 368 hip and 269 knee) fulfilled the study inclusion criteria (Fig. 1; Table 2). Women predominated in the degenerative spondylolisthesis, degenerative deformity, degenerative segment, and knee groups, while men predominated in the stenosis, herniated disc, and hip groups. Patients were significantly younger and had lower ASA grades in the herniated disc and degenerative segment groups. A greater proportion of patients were overweight or obese (BMI > 25 kg m−2) in the knee and spinal stenosis groups. There were more smokers in the herniated disc and degenerative segment groups, and fewer in the hip and knee groups.

Fig. 1 Flowchart showing the flow of patients through the study

Table 2 Demographic data

Group differences in baseline COMI scores

Preoperatively, the hip group showed slightly but significantly lower scores (i.e., a better status) for the COMI sum score and most of the COMI sub-domains than all other groups (Table 3). The knee group showed significantly lower scores than almost all the spine subgroups. The only COMI item for which scores did not differ significantly between the groups was “symptom-specific well-being”.

Table 3 Preoperative COMI scores and sub-domain scores

Group differences in the improvement in COMI from preoperatively to 12 months postoperatively

From preoperatively to 12 months postoperatively, there was a significant reduction in the COMI score (i.e., improvement in status) in each of the groups (Table 4). However, compared with the hip group, the magnitude of the improvement was significantly less in all other groups, ranging from 0.6 points less (for TKR) to 2.7 points less (for degenerative deformity) after adjusting for confounders (Table 4). A similar pattern was observed for all the COMI sub-domain scores (detailed results not shown; see example for symptom-specific well-being item in Fig. 2).

Table 4 Group differences in the improvement in COMI from preoperatively to 12 months postoperatively
Fig. 2 Mean (SD) scores for “symptom-specific well-being” measured preoperatively and 12 months postoperatively in the different groups (scores re-scaled as 0–10)

Group differences in dichotomized ratings of success at 12 months postoperatively

The achievement of the different indices of “success” is shown in Table 5.

Table 5 Proportion of patients perceiving a successful surgery according to different criteria

“Satisfaction with medical care” gave the highest proportion of successful outcomes all round, at 96% for the hip and knee groups and between 83 and 90% for the spine subgroups. In adjusted analyses, the odds of being satisfied were approximately four to five times greater (p < 0.0001) for the hip and knee patients compared with the statistical reference group, spine degenerative deformity (with no difference between the spine subgroups).

A good “global treatment outcome” was reported by 95–98% of hip/knee patients and by 73% (stenosis) to 84% (herniated disc) of the spine patients; the figures were approximately 5–10% lower in each group for the respective proportions of patients achieving the MCIC for the COMI (Table 5). Adjusted analyses showed that, compared with spine degenerative deformity patients, the odds of a good global treatment outcome and achievement of MCIC were significantly higher (odds ratios 1.6–16.9; Table 5) for hip, knee, degenerative spondylolisthesis, and herniated disc patients.

The proportion of patients achieving the patient-acceptable symptom state (PASS; item 3 in Table 1) was the outcome that varied most widely between the groups, from as low as 44% for the spine degenerative deformity group to 93% for the hip group. Again, adjusted analyses showed that, compared with degenerative deformity patients, the odds of achieving PASS were significantly higher for hip and knee patients (odds ratios 13.8 and 5.3, respectively; Table 5), and for herniated disc and degenerative spondylolisthesis patients (odds ratios 1.4 and 1.6, respectively; Table 5).
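To illustrate how proportions such as these translate into odds ratios, a crude (unadjusted) odds ratio can be sketched as follows. This is an illustration only: the function names are hypothetical, and the result is not expected to match the adjusted odds ratios reported here, which come from logistic regression models that control for confounders.

```python
def odds(p: float) -> float:
    """Convert a proportion (strictly between 0 and 1) into odds."""
    if not 0 < p < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1 - p)


def crude_odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio of success in a group versus a reference group."""
    return odds(p_group) / odds(p_reference)


# PASS proportions quoted in the text: 93% (hip) vs 44% (degenerative deformity)
crude_or = crude_odds_ratio(0.93, 0.44)
print(f"crude OR = {crude_or:.1f}")
```

The crude value from these two proportions is about 16.9, larger than the adjusted odds ratio of 13.8, which indicates that part of the raw difference is accounted for by the confounders included in the model.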

Group differences in patient-reported complications and reoperations

Overall, 25% of patients reported that some type of complication had arisen as a consequence of their operation 1 year earlier. The highest rate (29%) was seen in patients with degenerative deformity and the lowest (18%) in the hip patients (Table 6). Approximately half of the hip and knee patients reported that the complications were at least “moderately bothersome”; the corresponding figure was around 70% for patients with degenerative deformity, degenerative segment or herniated disc, and 80% for those with degenerative spondylolisthesis and spinal stenosis (Table 6). The two most common patient-reported complications in spine, hip, and knee patients alike were sensory disturbances (20–23% of all complications reported) and continuing/new pain (23–42% of all complications) (Table 7). Problems with wound healing were more commonly reported by hip and knee patients (11–16% of complications) than spine patients (7%).

Table 6 Patient-rated complications and reoperations within the first postoperative year
Table 7 Description of main patient-rated complications

The presence of a complication had a significant negative association with satisfaction and with achievement of a good global treatment outcome, MCIC and PASS (each p < 0.0001). Patient-reported reoperation rates did not differ significantly between the groups.

Discussion

When PROMs are used to serially evaluate the course of change in large numbers of patients, brief instruments ease administration, reduce respondent burden, increase completion rates, and lower costs. These factors are critical to the adoption of systematic evaluation in routine practice [30]. Using a very brief, multidimensional instrument to cover all the core domains of importance to patients (including some novel and sensitive indices that are not included in existing joint-specific or generic instruments), we showed that the extent to which THR proved superior to TKR and spine surgery was highly sensitive to the method used to categorize success. This may explain some of the previous discrepancies in the literature based mainly on generic quality of life measures [4,5,6,7,8,9,10,11,12] or satisfaction [4]. Our PROM contained questions relevant to any patient with a painful musculoskeletal disorder, but formulated in reference to the particular joint problem. In this sense, it represented an instrument that was both joint specific (and hence highly responsive) and yet generic (and hence comparable between different musculoskeletal conditions). The open nature of the questions makes them similar to those in “patient-individualized” questionnaires, known to be especially responsive [31]. For example, the function item enquires about “your normal” work/housework, as opposed to listing specific activities (that the patient may or may not normally do). This serves to ensure that the items’ content is always relevant to the patient, making them more likely to be “shifting” items (i.e., susceptible to change) with effective treatment. From a practical perspective, a “one size fits all” instrument simplifies data collection and comparative analyses, making it useful not only in research but also in hospital-wide quality control and benchmarking activities. These are the reasons underlying the decision of EUROSPINE, the Spine Society of Europe, to adopt the PROM used in the present study for its Spine Tango Spine Surgery Registry, for all spinal pathologies alike.

THR is considered to be one of the most successful orthopedic procedures available today [5], and the results of our study also substantiated this. It was in top place for all indices, including satisfaction with care, improvement, current state, patient-rated complications, and repeat surgery. However, the extent to which it distinguished itself from the other treatments clearly depended on the precise metric used. Scores on “satisfaction with care” showed the least difference among the groups. The odds of being satisfied were the same for hip and knee patients, and were approximately fourfold those for the spine patients; however, all groups showed respectable figures, with at least 83% of patients being satisfied. Satisfaction with care, which is influenced by the patient–provider relationship and concerns treatment delivery, typically yields higher proportions of success than constructs focused on therapeutic improvement [23]. It can be an important concern in quality improvement initiatives (e.g., to document that patients are not disgruntled when popular but ineffective treatment or unnecessary imaging is denied [24]), but its external validity/generalizability may be limited. The effectiveness of a procedure can be measured as either “the extent of improvement” (doing better) or the “actual state” (doing well) following treatment. Our indices of improvement teased apart further differences between the groups, with > 95% of hip and knee patients reporting a good global treatment outcome, compared with 73–84% of spine patients. A difference of this size, if it were seen between two treatments for the same condition, would yield a highly relevant and clinically significant “number needed to treat” of as few as five. Compared with the hip group, the improvement in COMI score 12 months postoperatively was significantly lower for all other groups, by 0.6 points for knee and by 2–3 points for spine patients, the latter again being a clinically relevant difference [21].
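The “number needed to treat” arithmetic referred to above can be sketched as follows, using the rounded proportions of good global treatment outcome quoted in the text (95% for hip/knee; 73–84% for spine). The function name is hypothetical, and the calculation is an illustration of the conventional formula (the reciprocal of the absolute difference in success rates, rounded up), not a re-analysis of the study data.

```python
import math


def number_needed_to_treat(success_a: float, success_b: float) -> int:
    """NNT = 1 / absolute difference in success proportions, rounded up."""
    diff = abs(success_a - success_b)
    if diff == 0:
        raise ValueError("no difference in success rates; NNT is undefined")
    return math.ceil(1 / diff)


# Largest difference quoted in the text: 95% vs 73% -> 1 / 0.22
nnt_largest_gap = number_needed_to_treat(0.95, 0.73)
# Smallest difference quoted in the text: 95% vs 84% -> 1 / 0.11
nnt_smallest_gap = number_needed_to_treat(0.95, 0.84)
print(nnt_largest_gap, nnt_smallest_gap)
```

The largest gap gives a value of 5, consistent with the “as few as five” quoted above, while the smallest gap gives 10.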

Of all the indices, the PASS was the index that revealed the lowest rates of success for all pathologies and the greatest differences between the pathologies, with only about half of the patients in the spine group achieving an acceptable symptom state, compared with 81% of the knee and 93% of the hip patients. This highlights the fact that even large and statistically significant improvements in outcome scores do not necessarily mean that an acceptable state is reached in the end. This is perhaps our most poignant take-home message. Compared with the stringent measures used in the present study—measures that capture the impact of the precise symptoms surgery aims to relieve—it seems that the quality of life instruments (SF-36, EQ 5D) used in previous studies may have been too generic and too insensitive, resulting in their painting an overly rosy picture of what spine surgery can achieve. That these findings are not peculiar to our own hospital, but are instead typical of those reported on a wider geographical basis, is shown by the similarity of the results reported for other spine units in the EUROSPINE Spine Tango Registry [28, 32].

The success of surgery seemed to diminish in line with the increasing complexity of the “motion segment” (hip, knee, spine). Multisegmental spine pathology (present in 50% of our spine patients) might serve to increase the complexity again, as might previous surgery at a different spinal level (12% patients). Problems with the hip and knee are often unilateral and may be relieved by resting the joint, whereas disorders of the centrally located spine may result in limitations that are more difficult to cope with. The common involvement of the neurologic system in spine patients is a confounding factor that is generally not an issue in hip or knee arthritis. Baseline status was significantly better in THR patients, as has been reported before [33], suggesting the threshold for surgery was lower or the impact of the problem was less severe; this may provide for better chances of full recovery postoperatively. The indications are usually much clearer for large-joint replacement than for most spine surgery. In spine, it is generally the case that the greater the concordance between symptoms, multimodal imaging (X-ray, MRI), and the rationale for the planned procedure, the better is the outcome. There are clear subsets of spine patients that benefit more from a given surgery—e.g., herniated disc patients with greater leg pain than back pain undergoing decompression [34] and discogenic pain patients with a distinct pain pattern undergoing fusion [35]—and we need to better identify such subgroups.

In summary, there are numerous reasons why outcomes are, and can be expected to be, significantly worse for spine surgery than for THR or TKR. However, if this is not exposed by the use of sensitive and stringent measures, and we instead elect to believe the generic quality of life data that suggest comparable or even superior results for spine surgery [4, 5, 8], then we will fail to seek and attract the necessary investment in research to improve the situation. Our study shows the importance of routinely assessing patient-orientated outcome for all orthopedic procedures, and recommends a suitable, practicable instrument for doing so. The findings should provide the evidence required to lobby research funding bodies, governmental agencies, industry, and charitable foundations to make greater investments in spine registries and spine research, in the hope of ultimately improving spine patient-rated outcomes.