Introduction

Degenerative spondylolisthesis (DS) is defined as slippage of a vertebral body over the vertebral body below due to degenerative changes of the spine. DS can present with spinal stenosis and consequently neurogenic claudication and low back pain. Risk factors for DS include age > 70 years, female gender, and sedentary lifestyle [1]. Patients with DS are typically evaluated with a physical and neurologic exam, and imaging including standing radiographs and MRI. Initial management of symptomatic DS consists of conservative treatment, including oral pain medication, injections, and physical therapy.

In patients with progressive neurologic symptoms, disability, or diminished quality of life, surgical intervention for DS and associated spinal stenosis is indicated. Previous research demonstrated that patients undergoing surgery had substantially greater improvement in pain and function than patients who were treated without surgery during two years of follow-up [2].

Over the last few decades, instrumented fusion of vertebral bodies in addition to decompression of the spinal canal has become increasingly common as the standard surgical treatment for lumbar DS. In some countries, 90% of decompression surgeries include concomitant fusion [3]. The necessity of fusion in addition to decompression (D + F) in treating DS was the focus of two randomized controlled trials (RCTs) published in 2016 [4, 5]. Because their results were somewhat conflicting, the controversy remained [6,7,8]. Since then, multiple studies have been published on this subject, which may help to reach consensus on this dilemma [9,10,11,12]. Therefore, by means of this systematic review and meta-analysis, we aimed to assess whether decompression with fusion has better clinical outcomes (e.g., functionality) than decompression alone in patients with DS and associated lumbar spinal stenosis.

Methods

This review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (supplementary material 4) [13, 14]. This study was registered in the international prospective register of systematic reviews (PROSPERO CRD42021291603).

Inclusion criteria for studies

Studies were considered for this review according to the following inclusion criteria: (1) prospective studies, including RCTs, quasi-randomized studies and non-randomized studies; (2) patient population older than 18 years of age; (3) patients undergoing decompression or decompression with fusion for lumbar spinal stenosis due to DS; (4) at least one of the clinical outcomes (i.e., functionality, leg pain, back pain, walking improvement) or radiological outcomes measured at a minimum of 1 year of follow-up; (5) publication in English. Excluded were non-original studies, conference abstracts, studies conducting retrospective analyses and studies concerning non-instrumented fusion techniques.

Interventions

Posterior decompression

Posterior decompression could be performed according to the surgeon’s preference. Recent research showed no difference in clinical outcomes between various posterior decompression techniques in patients with lumbar spinal stenosis [15].

Fusion

Any form of posterior decompression was accepted, before or after the fusion procedure, as long as both were performed in the same surgical session. Fusion had to include posterior instrumented fusion according to the surgeon’s preference. Selection of grafts, devices or additional instrumentation was also left to the surgeon’s preference.

Search strategy

An experienced librarian conducted a systematic search using a combination of terms related to DS, fusion and decompression techniques. On the 12th of November 2021, MEDLINE, Embase, EmCare, Web of Science and the Cochrane Library were systematically searched for eligible articles, each from inception. The search strategy is available in Supplementary Table 1. In addition, the reference lists of the included studies were checked for further eligible articles. Two reviewers (P.G. and M.B.) independently screened all available records based on title and/or abstract. In case of disagreement, a third independent reviewer was consulted. Following this step, the same two authors independently screened the full text of the manuscripts against the inclusion criteria. Disagreements were resolved through consensus with the involvement of a third reviewer.

Data collection and analysis

Two authors (P.G. and M.B.) independently extracted all data into a pre-specified spreadsheet. Discrepancies in extraction were resolved by consensus. The following were extracted: (1) study characteristics (e.g., study design, inclusion criteria); (2) clinical outcomes (e.g., Oswestry disability index (ODI), visual analogue scale (VAS) for leg and back pain, walking improvement, Short-Form-36 Physical Component Summary (SF-36 PCS)); (3) surgical outcomes (e.g., operative time, blood loss, length of hospital stay, reoperations and complications); and (4) postoperative radiological outcomes.

Assessment of risk of bias

Risk of bias analysis was performed for all (quasi-)RCTs using the criteria recommended by the Cochrane Collaboration [16]. These criteria cover selection bias, performance bias, attrition bias, detection bias and selective outcome reporting bias. Two authors (P.G. and M.B.) independently scored these criteria as low risk of bias, high risk of bias, or unclear. Disagreements were resolved by consensus and, if necessary, by evaluation of a third author. Risk of bias was not formally assessed for non-randomized studies, as the evidence level of these studies was expected to be low compared with the RCTs.

Bias across studies

Conflict of interest was determined for all included studies based upon the information provided by the authors in their publication. Publication bias was assessed using a funnel plot and based upon symmetry; no formal tests were conducted because there were too few data to reliably test this.

Data analyses

Measures of treatment effect

Only data from RCTs were considered for the meta-analysis. The primary outcome was functional status as measured by the ODI, a continuous outcome. Continuous outcomes were expressed as mean difference (MD) with 95% confidence intervals (CI). A negative effect size indicates that decompression is more beneficial than D + F, meaning patients have better functional status after decompression only. Patient-reported outcomes were analyzed at two years of follow-up. When multiple values were available from a single study, we used the value thought to correspond best to that time interval; in practice, this was the latest time point of follow-up. Risk of reoperation was calculated as an odds ratio (OR). A random-effects model based on the DerSimonian and Laird approach was used for all analyses [17]. RevMan 5.4.1 (The Nordic Cochrane Center, The Cochrane Collaboration, Denmark) was used to perform the meta-analysis. Because the reported complications were heterogeneous, we only described them per study.
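For readers who want to see the pooling step spelled out, the following is a minimal sketch of a DerSimonian and Laird random-effects pooling of mean differences. It is not the RevMan implementation used in this review; the function names (mean_difference, dersimonian_laird) and all numbers in the usage example are hypothetical and serve only as illustration.

```python
import math


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return the mean difference (group A minus group B) and its standard error."""
    md = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return md, se


def dersimonian_laird(effects, std_errors, z=1.96):
    """Pool study-level effects with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, lower CI bound, upper CI bound, tau squared).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and the method-of-moments estimate of between-study variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical ODI data for three trials: (mean, SD, n) for decompression,
# then (mean, SD, n) for D + F. These are NOT the values from the included RCTs.
trials = [
    (24.0, 18.0, 120, 25.0, 19.0, 115),
    (21.0, 17.0, 60, 20.0, 16.0, 62),
    (27.0, 20.0, 30, 29.0, 21.0, 33),
]
pairs = [mean_difference(*t) for t in trials]
pooled, lower, upper, tau2 = dersimonian_laird(
    [md for md, _ in pairs], [se for _, se in pairs]
)
print(f"MD {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau^2 = {tau2:.2f}")
```

With decompression as group A and a scale on which lower scores are better, a negative pooled MD favors decompression alone, matching the sign convention described above.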

Statistical heterogeneity

Statistical heterogeneity was examined by inspecting the forest plot and formally tested by the Q-test (chi-square) and the I2 statistic. We were not able to explore cases of considerable heterogeneity (defined as an I2 statistic > 75%) by subgroup analysis, because there were insufficient data to do so.
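As an illustration of the heterogeneity statistics, the sketch below computes Cochran's Q and I2 from study-level effects and standard errors and applies the > 75% threshold for considerable heterogeneity. The function name and the example inputs are hypothetical; the p-value of the Q-test is omitted because it requires a chi-square distribution function that is not in the Python standard library.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I2 (as a percentage)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to between-study
    # differences rather than chance; it is truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical mean differences and standard errors for three studies
q, df, i2 = heterogeneity([-1.0, 1.0, -2.0], [2.4, 3.0, 5.1])
considerable = i2 > 75.0  # threshold used in this review
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%, considerable: {considerable}")
```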

Data synthesis and quality of the evidence

We evaluated the overall quality of the evidence for the primary outcome and for the secondary outcomes, provided that at least three studies evaluated these outcomes. The GRADE method was applied, which ranges from high to very low quality and is based upon the following five domains: limitations of design, inconsistency of results, indirectness, imprecision, and other factors (e.g., publication bias) [18]. We downgraded for these determinants as follows: (1) limitations of design if > 50% of the study population originated from studies with a high or unclear risk of bias for allocation concealment (we focused on this specific aspect of the risk of bias because there is empirical evidence from large meta-epidemiological studies that selection bias results in exaggerated effects [19]); (2) inconsistency if the I2 statistic exceeded 75% or if only one study reported on the outcome; (3) indirectness if the included study population was thought not to be generalizable to patients with DS; (4) imprecision when there were < 400 patients for continuous outcomes or < 300 events for dichotomous outcomes; and (5) other considerations when publication bias or conflict of interest was apparent.
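The downgrading rules above can be written down as a small decision function. The sketch below is only an illustration: the class and field names are hypothetical, and it assumes that the rating starts at high (RCT evidence) and drops one level for each domain that is triggered, which the text does not state explicitly.

```python
from dataclasses import dataclass

LEVELS = ["high", "moderate", "low", "very low"]


@dataclass
class OutcomeEvidence:
    n_studies: int
    share_poor_concealment: float  # fraction of patients from studies with high/unclear risk
    i2: float                      # I2 statistic in percent
    indirect: bool                 # population not generalizable to DS
    continuous: bool               # True for continuous, False for dichotomous outcomes
    n_patients: int
    n_events: int
    publication_bias_or_coi: bool


def grade(ev):
    """Return (quality level, list of domains that triggered a downgrade)."""
    downgrades = []
    if ev.share_poor_concealment > 0.5:
        downgrades.append("limitations of design")
    if ev.i2 > 75 or ev.n_studies == 1:
        downgrades.append("inconsistency")
    if ev.indirect:
        downgrades.append("indirectness")
    if (ev.continuous and ev.n_patients < 400) or (
        not ev.continuous and ev.n_events < 300
    ):
        downgrades.append("imprecision")
    if ev.publication_bias_or_coi:
        downgrades.append("other considerations")
    # Assumption: one level lost per triggered domain, floored at "very low"
    level = LEVELS[min(len(downgrades), len(LEVELS) - 1)]
    return level, downgrades


# Hypothetical dichotomous outcome with few events
example = OutcomeEvidence(
    n_studies=5, share_poor_concealment=0.3, i2=0.0, indirect=False,
    continuous=False, n_patients=600, n_events=60, publication_bias_or_coi=False,
)
print(grade(example))
```

Under these assumptions, a dichotomous outcome with fewer than 300 events is rated moderate because of imprecision alone, even when all other domains are unproblematic.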

Results

Search results

The initial search in November 2021 retrieved 2403 studies. After removing duplicates and screening based on the title and abstract, 25 studies remained (Fig. 1). After assessing full-text articles, 18 additional studies were removed (see supplementary material 2). Of the remaining 7 studies available for the qualitative analysis, 5 were suitable for the quantitative analysis [4, 5, 9, 11, 20,21,22]. The search was rerun on October the 17th 2022, which did not lead to new studies for inclusion.

Fig. 1

Flowchart of the study selection process

Of the 7 included studies, 5 were RCTs and 2 were prospective observational studies [4, 5, 9, 11, 20,21,22]. Table 1 gives an overview of these 7 included studies. Of the RCTs, one was conducted during the 1980s, while the others were conducted from the 2000s onward. Two RCTs were conducted in the USA, two in European countries and one in Japan. Sample sizes of the RCTs ranged from 33 to 267 patients, and the average age of patients ranged from 62 to 67 years across the studies. One of the RCTs did not report specifically on the degree of slip of the patients included [20]. Another RCT only included grade I DS, while the other 3 RCTs included patients with 3 mm of slip or more. In two of the five RCTs, flexion–extension radiographs were used to judge suitability for randomization. Decompression and fusion techniques used in the studies are reported in Table 1. In general, the decompression techniques used were highly variable and ranged from limited midline-structure-preserving techniques to more aggressive decompressions. Fusion techniques usually involved pedicle fixation with the use of autograft.

Table 1 Overview of the included studies

Risk of bias analysis

The results of the risk of bias analysis of RCTs are shown in Fig. 2. Three studies had a low risk of selection bias due to reporting of random sequence generation [4, 9, 11], while two had a low risk of selection bias due to reporting on allocation concealment. As blinding of patients and personnel was not possible due to fundamental differences in operating techniques between decompression and D + F, all studies had a high risk of performance bias. As all RCTs used patient-reported outcomes (PROs) and patients were not blinded, all studies had a high risk of detection bias. Risk of attrition bias was low for four RCTs and unclear for one RCT. Furthermore, two RCTs had a high risk of reporting bias, while all RCTs were estimated to have a low risk of other forms of bias. Publication bias was not formally assessed because there were too few data.

Fig. 2

Risk of bias assessment for all included RCTs. (A) shows the risk of bias summary per study, while (B) shows the risk of bias graph

Primary outcome

Oswestry disability index

Of the 7 included studies, three RCTs and two observational studies reported on the ODI after decompression and D + F at two years of follow-up (Table 2) [4, 5, 9, 21, 22]. None of the three RCTs detected a statistically significant difference between the treatment arms, while both observational studies found statistically significantly more favorable results on the ODI after D + F compared to decompression alone. Pooling of the data of the three RCTs showed no difference in ODI at two years of follow-up between the groups, namely a MD of − 0.31 with a 95% CI of − 3.81 to 3.19 (Fig. 3A). Study heterogeneity was low (I2 = 16%). Overall, there is high-quality evidence of no difference in ODI between both techniques at two years of follow-up (Table 3).

Table 2 Outcomes of RCTs and of prospective observational studies. For clinical outcomes of RCTs, values measured at two years of follow-up are shown with their standard deviations, when reported. If not reported, one-year results are shown. + indicates the outcome is in favor of D + F, − indicates the outcome is in favor of D, and ± indicates there is no difference between D + F and D. "In favor" means that a statistically significant difference was shown in the individual study. If differences were not tested, no symbol is shown. Scores for leg pain, back pain and functional status are reported from 0 to 100, with 0 indicating no pain or disability. Pooled results are results reported by at least 3 RCTs, given as mean differences with their 95% confidence intervals. NR: not reported

Fig. 3

Pooled results of decompression alone versus decompression with fusion on (A) the primary outcome, the Oswestry disability index, and on the secondary outcomes (B) leg pain and (C) back pain

Table 3 Complications described in the RCTs and prospective observational studies

Secondary outcomes

Leg pain

Three studies reported VAS scores for leg pain at two years of follow-up. All of these studies were RCTs and showed no difference in leg pain between decompression and D + F. Pooled results of these RCTs (Fig. 3B) also showed no difference in leg pain between both techniques (MD − 1.79, 95% CI − 5.08 to 1.50). Study heterogeneity was low (I2 = 0%). Overall, there is high-quality evidence of no difference in leg pain reduction between decompression and D + F at two years of follow-up.

Back pain

Four studies reported VAS scores for back pain: three were RCTs and one was an observational study [4, 9, 11, 21]. The three RCTs reported no difference in VAS for back pain between both procedures, while the observational study reported a statistically significantly lower VAS for back pain at two years of follow-up after D + F (1.8 vs 4.5, N = 139) [21]. Pooled results of the three RCTs (Fig. 3C) show a MD of − 2.54 with a 95% CI of − 6.76 to 1.67 and low study heterogeneity (I2 = 0%). Overall, there is high-quality evidence of no difference in back pain between decompression and D + F at two years of follow-up.

SF-36 Physical component summary

Only two studies reported outcomes of the SF-36, specifically the physical component summary [5, 22]. One of these studies was an RCT and one an observational study, both from the same lead author. Both studies showed statistically significantly more favorable outcomes for the D + F group (Table 2). Because only one RCT assessed the physical component summary, no additional analyses could be conducted.

Improvement of walking

Two RCTs assessed walking capability after surgery [4, 20]. Bridwell et al. assessed walking improvement by asking patients whether they felt their ability to walk distances was worse, the same or significantly better after surgery. Three out of nine patients (33%) in the decompression group versus 20 out of 24 patients (83%) in the D + F group reported significantly better walking [20]. Försth et al. assessed walking distance by a 6-min walk test and by a single question. Walking distance at 2 years after surgery did not differ significantly between patients undergoing decompression vs. D + F (396 ± 144 m vs. 382 ± 152 m). Self-reported improvement in walking distance also did not differ significantly between both groups (86% for decompression vs 88% for D + F).

Blood loss

Blood loss after both procedures was assessed in four RCTs and one observational study [4, 5, 9, 11, 21]. All studies showed less blood loss after decompression alone (Table 2). Pooled results show a MD of − 320.41 with a 95% CI ranging from − 389.10 to − 251.73 (supplementary material 3A). Studies showed moderate heterogeneity (I2 = 59%). Overall, there is high-quality evidence of less blood loss after decompression only compared to D + F (Table 3).

Length of hospital stay

Length of hospital stay was assessed in five studies, of which four were RCTs. All studies measured a shorter hospitalization after decompression compared to D + F. Pooled results showed a MD of − 1.7 with a 95% CI of − 1.8 to − 1.7 (supplementary material 3B). Heterogeneity between studies was low (I2 = 0%). Overall, there was high-quality evidence of a shorter length of hospital stay after decompression compared to D + F (Table 3).

Reoperations

Reoperations were assessed in six studies, of which five were RCTs. None of the studies showed a statistically significant difference in reoperation rates between both groups. Pooled results (supplementary material 3C) showed an odds ratio of 1.41 with a 95% CI from 0.84 to 2.36 for reoperations. Study heterogeneity was low (I2 = 0%). Overall, there was moderate-quality evidence of no difference in reoperations between both groups.
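To make the effect measure concrete, the sketch below computes an odds ratio with a 95% CI for a single study from its 2 × 2 counts, using the standard error of the log odds ratio. It illustrates the measure only, not the pooled random-effects estimate reported above. The function name and the counts are hypothetical, and the 0.5 continuity correction for empty cells is a common convention assumed here.

```python
import math


def odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Return the odds ratio (group A vs. group B) and its confidence interval."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    if 0 in (a, b, c, d):
        # Assumed convention: add 0.5 to every cell when any cell is empty
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 20 of 100 reoperations in group A, 10 of 100 in group B
estimate, lower, upper = odds_ratio(20, 100, 10, 100)
print(f"OR {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A confidence interval that includes 1, as for the pooled estimate above (0.84 to 2.36), means that a difference in reoperation odds between the groups was not demonstrated.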

Costs

Costs were assessed by two RCTs, but only reported by one study [4, 5]. Försth et al. reported higher direct costs for patients undergoing D + F with a mean difference of $6,800. Indirect costs were similar for both groups, making D + F on average more costly.

Radiological outcomes

Radiological outcomes were assessed by two RCTs and one observational study, but only reported by two studies [4, 20, 22]. Bridwell et al. performed regular postoperative X-rays and showed statistically significantly more slip progression in patients undergoing decompression alone versus D + F [20]. Ghogawala et al. performed various measurements on CT or MRI imaging to assess whether these were predictors of clinical outcomes. Only one (negative) radiological predictor was identified for the PCS in the decompression alone group, namely disk space height.

Complications

Table 4 gives an overview of the reported complications in both groups. A total of 172 complications occurred over a sample size of 846 patients. The total number of complications seems to be higher after D + F compared to decompression alone (25.2% vs. 16.0%). The most frequently reported complications were dural tears and neurologic deterioration.

Table 4 GRADE evidence summary of findings for the effect of Decompression vs. Decompression + Fusion

Discussion

The current review aimed to determine whether decompression with fusion would lead to any benefits in patient-reported outcomes compared to decompression alone in patients with low grade DS. Based on the inclusion of only RCTs and prospective comparative studies, we were able to provide high-quality evidence on outcomes such as functionality, leg pain and back pain, which was previously not possible due to conflicting evidence. The current review shows no advantages of D + F in function, leg pain, back pain, and reoperations, compared to decompression alone in patients with low grade DS at two years. Furthermore, decompression alone was associated with less perioperative blood loss, a shorter length of hospital stay, and lower costs.

Comparison with other studies

Multiple reviews have been published in previous years, using different methodology [23, 24]. Some of these also included retrospective studies or may not have used the most efficient methods to perform data synthesis. Because we wanted to draw firmer conclusions, we only included prospective studies and also included the recently published Norwegian study, which had the highest weight in the pooled results [9]. Among the studies that were excluded from this review, we can identify studies in favor of decompression and fusion from the same Norwegian study group as the recently published trial by Austevoll et al. [25], but also studies from Europe and North America implying non-inferiority of decompression alone compared to D + F [26, 27]. These discrepancies in the literature further emphasize the necessity of the current review.

Strengths and limitations

Based on the recent literature we were able to provide high-quality evidence on a few patient-reported outcomes. There are, however, some limitations that have to be acknowledged. One is the heterogeneity in the surgical strategies and postoperative treatment protocols. Examples of this are the variability in decompression techniques and the differences in duration of postoperative hospital stay in Asia compared to Europe or the USA. This variability in surgical techniques may be a confounder, as there is some evidence that minimally invasive surgery may be associated with lower reoperation and fusion rates, less slip progression and greater patient satisfaction compared to open surgery [28, 29]. We expect this to have a limited impact on our conclusion, especially as recent studies show no difference in clinical outcomes after various decompression techniques for lumbar spinal stenosis [15]. On the other hand, fusion techniques have also improved throughout the years, and the use of these novel techniques may likewise have led to improved outcomes after decompression with fusion for DS. Another limitation is that most studies had no long-term data available. Therefore, we could only draw conclusions for two years of follow-up on most outcomes, which is a common endpoint in degenerative spine research. Finally, the current results are only applicable to stable DS. Strengths of our study are the prospective registration, the low statistical heterogeneity between studies, and the quality of evidence provided.

Implications

Based on the outcomes of this review, there was high-quality evidence regarding the outcomes functionality, leg pain, back pain and blood loss, meaning that further research would be unlikely to change the conclusions. Therefore, based on these data, D + F should not be the only treatment option for all patients with low grade spondylolisthesis and associated spinal stenosis. However, as we only drew conclusions on clinical outcome data with two years of follow-up, long-term clinical data are warranted to verify our conclusions at five- or ten-year follow-up. Furthermore, studies performing health economic evaluations from a societal perspective are warranted, as only one of the included studies compared costs. Such studies should evaluate whether there are differences in functionality or quality-adjusted life years between decompression and decompression with fusion, and whether these differences would justify differences in costs between both procedures. Finally, as the two U.S. studies included in our review show, some patients do seem to benefit more from D + F than from decompression alone. Identifying those patients who are more likely to benefit from concomitant fusion should also be the focus of further research. One study focusing on patient selection is currently underway [30].

Conclusion

Based on the current literature, there is high-quality evidence of no difference in functionality after decompression alone compared to decompression with fusion in low grade DS at 2 years of follow-up. Decompression alone was also associated with no difference in leg pain reduction, back pain reduction, and reoperations, but with less perioperative blood loss and a shorter length of hospital stay, compared to decompression and fusion. Based on the current data, decompression alone is non-inferior to decompression with fusion as a treatment option for low grade DS. Further studies should focus on long-term comparative outcomes, health economic evaluations, and identifying those patients who may benefit more from decompression with fusion than from decompression alone.