Introduction

Spinal fusion remains the established gold standard for the treatment of painful degenerative disc disease (DDD) [1–3], but it has the drawbacks of stiffness and adjacent segment degeneration [4–6]. Recently, artificial total disc replacement (TDR), a motion-preserving option, has been used to treat patients with symptomatic DDD [7, 8]. However, it remains controversial whether TDR is more effective and safer than lumbar fusion. A previous meta-analysis concluded that TDR showed significant superiority for the treatment of DDD when compared with fusion [9]. However, that meta-analysis was based on a small sample size and insufficient analyses. Strong evidence based on the latest high-quality RCTs is still needed to test the above conclusion. The aim of our meta-analysis is to systematically re-evaluate the effectiveness and safety of TDR compared with fusion for the treatment of lumbar DDD.

Materials and methods

Search methods

Two authors (MJR and SSC) independently searched for all RCTs comparing TDR with lumbar fusion for DDD published up to March 2013. We searched Medline, Embase, Clinical, Ovid, BIOSIS and the Cochrane Central Register of Controlled Trials. A manual search of Spine, European Spine Journal, and the American and British versions of Journal of Bone and Joint Surgery was also performed to identify additional studies. Publication language was limited to English. The key words used for the search were as follows: DDD, lumbar fusion, low back pain, TDR and randomized controlled trial.

Criteria for selected trials

Two reviewers (MJR and SSC) checked the titles and abstracts identified from the databases. For items that could not be decided on the basis of titles and abstracts, the full text was retrieved for second-round selection. All randomized controlled trials (RCTs) comparing TDR with fusion for the treatment of lumbar DDD were taken into consideration. The indication for surgical treatment was low back pain, with or without radicular pain, that failed to respond to conservative treatment. Patients older than 18 years of age with symptomatic lumbar DDD were included in this study. The interventions included various types of TDR and fusion in the lumbar spine. Studies with patients who had acute spinal fracture, infection, tumor, osteoporosis, or rheumatoid arthritis were excluded. The reviewers applied the inclusion criteria to select the potentially appropriate trials. Disagreements between the two investigators were resolved by discussion, with the aim of reaching consensus.

Data extraction

Two reviewers participated in the extraction of relevant data from the included reports. One reviewer (MJR) extracted all relevant data into a table; a second reviewer (SSC) checked the data. Disagreement was resolved by further discussion. The data extracted to describe the characteristics of the investigations were the characteristics of participants, intervention details, number of participants in each intervention group, sex ratio, follow-up rate and follow-up period.

Methodological assessment

The modified Jadad scale was used to assess the methodological quality of each study [10]. It comprises eight items designed to assess randomization, blinding, withdrawals and dropouts, inclusion and exclusion criteria, adverse effects and statistical analysis (Table 1). The score can range from 0 to 8. Scores of 0–3 indicate poor to low quality and scores of 4–8 good to excellent quality.

Table 1 Modified Jadad scale with eight items

Outcomes for meta-analysis

Primary outcomes consisted of the visual analog scale (VAS), the Oswestry disability index (ODI) and patient satisfaction. Other outcome measures, such as reoperation rate, employment rate, operation time, blood loss and complications, were considered secondary outcome measures.

Assessment of clinical relevance

The clinical relevance of the seven included studies was assessed according to the five questions recommended by the Cochrane Back Review Group [11]. Positive (+) was recorded if the clinical relevance item was met, negative (−) if it was not, and unclear (?) if the data were inadequate for answering the question. An improvement of 20 % in the pain score [12] and of 25 % in the functioning score was considered clinically important [13].
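The two thresholds above can be expressed as a small check. This is a minimal sketch, not the authors' procedure: it assumes that improvement is measured as the percentage reduction from the baseline score (lower VAS and ODI scores being better) and that a value equal to the threshold counts as clinically important. The function names and the example scores are hypothetical.

```python
# Thresholds taken from the text: 20 % for pain, 25 % for functioning.
PAIN_THRESHOLD = 20.0
FUNCTION_THRESHOLD = 25.0


def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage reduction from baseline (assumes lower scores are better)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def clinically_important(baseline: float, follow_up: float, threshold: float) -> bool:
    """True if the improvement reaches the threshold (inclusive: an assumption)."""
    return percent_improvement(baseline, follow_up) >= threshold


# Hypothetical scores, for illustration only (not data from the included trials).
print(clinically_important(70.0, 40.0, PAIN_THRESHOLD))
print(clinically_important(50.0, 45.0, FUNCTION_THRESHOLD))
```

Whether improvement should be judged within each treatment arm or between arms is not specified by this sketch; the check only applies the percentage rule to a pair of scores.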

Statistical analysis

The Q- and I²-statistics were used to test for statistical heterogeneity [14, 15]. The Q-statistic tested the null hypothesis that all studies shared a common effect size, with minimal dispersion of the effect size across studies. I² can be readily calculated from the basic results of a typical meta-analysis as I² = 100 % × (Q − df)/Q, where Q is Cochran's heterogeneity statistic and df is the degrees of freedom. An I² value <25 % was considered homogeneous, a value between 25 and 50 % low heterogeneity, a value between 50 and 75 % moderate heterogeneity, and a value above 75 % high heterogeneity [15]. Although the random-effects model cannot explain or remove heterogeneity, we still used it because it was considered more suitable than the fixed-effect model for the statistical combination of LBP trials [16]. Dichotomous variables are presented as relative risk (RR) and continuous variables as mean difference (MD), both with 95 % confidence intervals (CI) and probability values. These data were calculated when one outcome was assessed in different ways in different trials. The meta-analysis was performed with RevMan 5.1 software (Cochrane Collaboration, Oxford, UK). A level of P ≤ 0.05 was considered statistically significant.
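The heterogeneity statistics and the random-effects pooling described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the RevMan computation: it assumes inverse-variance weights, a DerSimonian-Laird estimate of the between-study variance (the text does not name the estimator), I² truncated at 0 when Q < df, and a particular handling of the category boundaries at 25, 50 and 75 %. All function names and the example inputs are hypothetical.

```python
import math


def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I^2 in percent."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = 100 % x (Q - df) / Q, set to 0 when Q does not exceed df.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2


def classify_i2(i2):
    """Categories from the text; boundary handling is an assumption."""
    if i2 < 25:
        return "homogeneous"
    if i2 < 50:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"


def random_effects_pool(effects, std_errors, z_crit=1.96):
    """DerSimonian-Laird pooled estimate with a 95 % CI (assumed estimator)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    q, df, _ = heterogeneity(effects, std_errors)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - z_crit * se_pooled, pooled + z_crit * se_pooled, tau2


# Hypothetical mean differences and standard errors, for illustration only.
mds = [-4.0, -6.5, -5.0]
ses = [1.5, 2.0, 1.2]
q, df, i2 = heterogeneity(mds, ses)
print(q, df, i2, classify_i2(i2))
print(random_effects_pool(mds, ses))
```

When the estimated between-study variance is 0, the random-effects weights reduce to the fixed-effect weights, so both models give the same pooled estimate.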

Results

Search results

The literature search process and its results are shown in Fig. 1. Seven published RCTs [17–23] with a total of 1,584 patients were included according to the inclusion criteria. The characteristics of the studies and participants are listed in Table 2.

Fig. 1 Study selection process

Table 2 Characteristics of the seven included randomized controlled trials (RCTs)

Results of methodological quality

As shown in Table 2, most studies achieved high quality on the modified Jadad scale. However, the main shortcoming in nearly all studies was the lack of blinding, which might lead to a certain degree of detection bias. All of the included studies followed their participants for 2 years. The two studies [18, 21] that scored <4 had inappropriate randomization. The data of Sasso's study were not pooled into the meta-analysis because of the extremely low follow-up rate, with only 18 of the 67 patients completing the follow-up [18].

Clinical relevance

The results of the clinical relevance assessment are shown in Table 3. The patient details and intervention procedures were clearly recorded in all included studies to allow researchers to replicate them in clinical practice. Complications, one of the relevant outcomes, were not reported in one study [21]. An improvement of more than 20 % in pain scores and of more than 25 % in functioning scores was achieved in all included studies. In other words, the effect of the treatment was clinically important. The consistent outcomes suggested that the treatment benefits were likely worth the potential harms.

Table 3 Clinical relevance

Heterogeneity

The participants in the seven included studies had similar demographic characteristics and similar baseline pain and functioning status. Four different artificial discs (ProDisc-L, Maverick, CHARITÉ and FlexiCore) were used in these studies. Circumferential fusion was performed in four studies [18, 19, 21, 22], ALIF in Blumenthal's study [17] and instrumented PLF or PLIF in Berg's study [23]. The surgical data were not pooled together because of the above differences. Most outcomes were measured by the same method across the studies. In the random-effects meta-analysis, heterogeneity was observed in duration of hospitalization (I² = 91 %, P < 0.00001), proportion of patients choosing the same treatment again (I² = 31 %, P = 0.22), operation time (I² = 99 %, P < 0.00001), blood loss (I² = 95 %, P < 0.00001) and reoperation rate (I² = 55 %, P = 0.06). The outcomes regarding patient functioning, pain and the proportion of patients returning to full-time/part-time work (I² = 0 %) and complications (I² = 5 %) were consistent.

Meta-analyses results

As shown in Fig. 2, the patients' functioning ability measured by ODI was better in the TDR group than in the fusion group (MD −5.09; 95 % CI [−7.33, −2.84]; P < 0.00001), a statistically significant difference. However, an MD of five Oswestry points was not clinically relevant. The VAS pain score in the TDR group was lower than that in the fusion group (MD −5.31; 95 % CI [−8.35, −2.28]; P = 0.0006, Fig. 3), but the MD of five points was also not clinically significant. The duration of hospitalization was shorter in the TDR group than in the fusion group (MD −0.82; 95 % CI [−1.38, −0.26]; P = 0.004, Fig. 4). In addition, a greater proportion of patients in the TDR group were willing to choose the same operation again (OR 2.32; 95 % CI [1.69, 3.20]; P < 0.00001, Fig. 5), with 79.2 vs. 63.0 %. However, there was no significant difference between the TDR group and the fusion group in operation time (MD −44.16; 95 % CI [−94.84, 6.52]; P = 0.09, Fig. 6), blood loss (MD −29.14; 95 % CI [−173.22, 114.94]; P = 0.69, Fig. 7), complications (OR 0.72; 95 % CI [0.45, 1.14]; P = 0.16, Fig. 8), reoperation rate (OR 0.83; 95 % CI [0.39, 1.77]; P = 0.63, Fig. 9) or the proportion of patients who returned to full-time/part-time work (OR 1.10; 95 % CI [0.86, 1.41]; P = 0.47, Fig. 10).
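The reported P values for the mean differences can be cross-checked against their confidence intervals. The sketch below assumes a normal approximation in which the 95 % CI spans the estimate ± 1.96 standard errors; it is a consistency check on the reported figures, not the computation RevMan performs. The function name `p_from_ci` is hypothetical, and the check applies to MDs only (odds ratios would first need to be log-transformed).

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover the standard error, z score and two-sided P value from a CI.

    Assumes a symmetric normal-approximation interval on the analysis scale.
    """
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Mean differences and 95 % CIs as reported in the text.
reported = {
    "ODI": (-5.09, -7.33, -2.84),
    "VAS": (-5.31, -8.35, -2.28),
    "Hospitalization": (-0.82, -1.38, -0.26),
}
for name, (md, lo, hi) in reported.items():
    se, z, p = p_from_ci(md, lo, hi)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, P={p:.6f}")
```

Under this approximation the recovered P values agree with those reported for ODI, VAS and duration of hospitalization.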

Fig. 2 Oswestry disability index in TDR and fusion groups

Fig. 3 Visual analog scale in TDR and fusion groups

Fig. 4 Duration of hospitalization in TDR group and fusion group

Fig. 5 Proportion of patients who would choose the same treatment again after TDR and fusion treatments

Fig. 6 Operation time in TDR group and fusion group

Fig. 7 Blood loss in TDR group and fusion group

Fig. 8 Complications in TDR and fusion groups after TDR and fusion treatments

Fig. 9 Reoperation rate after TDR and fusion treatments

Fig. 10 Proportion of full-time/part-time work after TDR and fusion treatments

Functional assessment

No meta-analysis of functional recovery was carried out because different assessment systems had been used in the studies and few usable data could be extracted and pooled. Outcomes and conclusions concerning functional recovery varied. At the 2-year follow-up, significant differences in overall clinical success between the TDR and fusion groups were reported by Blumenthal et al. [17] (63.6 vs. 56.8 %, P = 0.0004), Zigler et al. [19] (63.5 vs. 45.1 %, P = 0.0053), and Gornet et al. [20] (73.5 vs. 55.3 %, P < 0.001). However, Berg et al. [23] reported no difference in ODI success between the TDR group and the fusion group. The authors nevertheless believed that the efficacy of TDR could be improved by stricter selection of surgical indications. Overall, there is strong evidence that TDR patients showed more satisfactory outcomes than fusion patients in ODI and VAS scores. Therefore, it could be said that functional recovery was better in the TDR group than in the fusion group.

Discussion

Lumbar fusion is a well-established procedure for the treatment of degenerative lumbar diseases [3, 24–27]. However, it alters the original biomechanics of the spine because motion is lost at the fused segments [28]. In addition, adjacent disc degeneration is a common complication of spinal fusion [2, 6, 29–31]. Adjacent segment degeneration can cause significant stenotic lesions or instability, for which additional operations are often required [32].

TDR has increased in popularity as an alternative to lumbar fusion [33]. The technique aims to restore and maintain spinal segment motion in an attempt to prevent degeneration at the levels adjacent to the operated segments [34, 35]. TDR provides the opportunity to restore normal segmental motion of the spine and normal loading of the adjacent segment, preventing progression of degeneration in the adjacent disc. However, excessive forces are concentrated on the facet joints at the level of TDR insertion. Therefore, most problems with TDR occur at the insertion level and not at the adjacent level. There is still debate on the preferred surgical method for the degenerative lumbar spine. The purpose of this study is to compare the effectiveness and safety of TDR with those of fusion for the treatment of lumbar DDD.

The results of our meta-analyses confirmed that TDR shows safety and efficacy comparable to those of lumbar fusion at 2-year follow-up. In addition, TDR shows significant superiority in improved physical function and reduced pain. We consider this superiority to be associated with the maintenance of normal spinal segmental motion. These findings are in accordance with the conclusion proposed by Yajun et al. [9]. However, our meta-analyses also offered new findings. TDR shows significant superiority over fusion in shortened duration of hospitalization. This suggests that functional recovery was better in the TDR group than in the fusion group.

Some basic epidemiological information about the participants can be derived from Table 2. First, the number of TDR patients was about twice the number of fusion patients, except in the study by Berg et al. [23]. Second, most lumbar DDD patients were aged 35–45 years. This indicates that more attention should be given to middle-aged people who perform heavy manual work, because their intervertebral discs are no longer as good as they were in adolescence. Third, the sex ratio was 1:1, indicating that sex was not correlated with lumbar DDD.

The statistical results of TDR versus fusion were not stated in the previous systematic reviews because of the lack of relevant RCTs [36, 37]. The previous meta-analysis confirmed that the TDR does not show significant superiority for the treatment of lumbar DDD when compared with fusion [9]. That meta-analysis was based on 837 patients with lumbar DDD. The two latest high-quality RCTs [20, 22] comparing TDR with spinal fusion were added in our study, giving a total of 1,584 patients. When the usable data from the six high-quality studies were pooled, we found that patients with TDR had better function, better back or leg pain status and a shorter hospital stay. There was no significant difference between the TDR group and the fusion group in operation time, blood loss, complications, reoperation rate or the proportion of patients who returned to full-time/part-time work.

In our meta-analysis, seven published RCTs on lumbar TDR versus fusion were analyzed. Five studies had good methodological quality (Jadad scores ≥4); two studies scored only 3 points, which implied a higher risk of bias. The most prevalent methodological shortcoming appeared to be insufficient blinding of the outcome assessors to the intervention. The low number of included studies limited our assessment of potential publication bias by funnel plot, and unpublished research with negative results could not be identified. Therefore, publication bias may exist, which could result in overestimation of the effectiveness of the interventions.

Different fusion procedures and different types of artificial discs may affect the comparison of outcomes between the interventions, although no artificial disc has been shown to be superior or inferior to the others [23]. The fusion method could also result in different operative data, even if there is no significant difference in clinical and functional results [38, 39]. In addition, the results are affected by heterogeneity caused by random sampling. For example, the results for operating time, blood loss and duration of hospitalization presented significant heterogeneity. Therefore, the results of this meta-analysis should be accepted with caution. Moreover, the benefits of motion preservation and protection of adjacent levels, as well as long-term complications and surgical revisions, remain unproven by the existing data. More independent high-quality RCTs with long-term outcomes are needed to strengthen the quality of the evidence and to complement these findings.

Conclusion

TDR showed safety and efficacy comparable to those of lumbar fusion at 2-year follow-up. TDR demonstrated superiority in improved physical function, reduced pain and shortened duration of hospitalization. Benefits in operating time, blood loss, motion preservation and long-term complications remain unproven.