Introduction

Anterior cervical decompression and fusion has been commonly performed and considered the surgical standard for treatment of symptomatic cervical degenerative disc disease. Several studies [5, 23, 46] report relief of pain and recovery of neurologic dysfunction in patients with cervical disc disease treated with anterior cervical decompression and fusion. Two 5- to 8.5-year mean followup studies [2, 19] suggest anterior cervical decompression and fusion can result in accelerated disc degeneration and mechanical instability on the adjacent segments. Furthermore, some authors [5, 12, 40] report reoperation rates of 7.8% to 10.4% at 2 to 10 years required to treat complications such as nonunion, bone graft collapse, or graft extrusion.

Cervical disc arthroplasty is an alternative to fusion after anterior neural decompression. The theoretical advantage of cervical arthroplasty has been the maintenance of segmental mobility and, thereby, reduction or avoidance of adjacent-segment degeneration and other limitations of fusion [15, 27, 43]. The potential disadvantages are possible wear and toxicity [27], the issue of biocompatibility [41], high incidence of heterotopic ossification [6, 22], and implant migration or subsidence [35].

Numerous investigators [3, 7–9, 17, 20, 26, 28–31, 33, 36, 38, 39, 48] have reported randomized controlled trials (RCTs) comparing cervical disc arthroplasty with anterior cervical decompression and fusion for treatment of symptomatic cervical disc disease, including several FDA investigational device exemption studies [3, 8, 9, 17, 26, 28, 33, 38] with mean followups from 1 to 5 years. However, the findings of these studies are ambiguous: some trials [3, 7–9, 26, 33, 38, 48] suggested better neurologic outcomes and a lower incidence of adjacent segment degeneration with arthroplasty versus fusion, whereas others [20, 29–31] reported no difference between the two procedures.

To clarify these ambiguous findings we conducted a meta-analysis to determine whether (1) cervical disc arthroplasty was superior to fusion with better pain relief and recovery of neurologic dysfunction; (2) cervical disc arthroplasty was associated with a lower incidence of reoperation and complications compared with fusion; and (3) cervical disc arthroplasty could reduce the incidence of adjacent segment degeneration compared with fusion.

Search Strategy and Criteria

Through an electronic search and independent, manual searches by two clinical librarians we identified all RCTs in all languages up to June 2012 comparing cervical disc arthroplasty with anterior cervical decompression and fusion for treating symptomatic cervical disc disease. The sources of electronic searching included MEDLINE®, EMBASE, and The Cochrane Central Register of Controlled Trials. The following key terms were included in our searches: “cervical disc arthroplasty,” “fusion,” “arthrodesis,” and “randomized controlled trial” (Appendix 1). Additionally, bibliographies of all selected full text articles were reviewed to identify more articles. We had no restrictions related to language.

After applying the search strings, we identified 503 potentially eligible articles. Two reviewers (SY and XY) independently checked the titles and abstracts of all articles. Of the 503 articles, 303 were duplicates (Fig. 1). One hundred two articles were excluded based on their titles and abstracts because of an apparent lack of relevance. This left 98 articles. Eligible articles included those with: (1) patients older than 18 years with symptomatic cervical disc disease; (2) random allocation of treatments; (3) inclusion of an arm treated with any type of cervical disc arthroplasty; (4) inclusion of an arm treated with anterior cervical decompression and fusion; (5) postoperative followup of the included patients of at least 1 year; and (6) inclusion of at least one valid primary outcome.

The primary outcomes included: (1) neck disability index (NDI); (2) neck and arm pain assessments measured by VAS or the numerical rating scale (NRS); (3) SF-36 mental and physical health surveys (MCS and PCS); (4) neurologic status; (5) flexion-extension ROM at the index and adjacent levels; (6) reoperations related to index surgery and adjacent segments; and (7) major surgical complications. The secondary outcomes included: (1) surgical data, such as operation time, blood loss, and hospital stay; (2) patient satisfaction; and (3) employment rate. Using these criteria, another 52 of the 98 manuscripts were excluded after the abstracts were reviewed.

Fig. 1 The flow chart shows the article selection process we performed.

The full texts of all 46 remaining articles were assessed by the same two reviewers. If no agreement could be reached, a third reviewer (YSQ) made the final decision. Of these, 33 were excluded because they had invalid outcome measures or insufficient followup times, pertained to the same patients, or reported none of the primary outcomes. We considered outcomes expressed without mean values or SD and graphic outcomes without numerical values as invalid outcome measures. The exclusions left 13 articles from 10 studies for the current review and meta-analysis, involving a total of 2227 patients [3, 7–9, 17, 26, 28, 30, 31, 33, 36, 38, 48] (Fig. 1). Full-text, published articles and unpublished data of completed studies were included. Authors of studies for which only the abstracts or partial data were available were contacted for detailed study data. Six articles were paired reports [3 and 26, 9 and 28, 17 and 38] from three corresponding RCTs with different followup times. Of all the included articles, eight reports [3, 8, 9, 17, 26, 28, 33, 38] from five prospective multicenter RCTs were FDA-regulated, investigational device exemption (IDE) studies comparing different cervical arthroplasty prostheses with anterior cervical decompression and fusion. Three prospective multicenter FDA-regulated IDE studies [3, 9, 38] reported mean followups of 4 or 5 years and involved 1213 patients treated with either cervical disc arthroplasty or anterior cervical decompression and fusion.
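The article counts reported for each screening step can be checked with a few lines of arithmetic. The following is a minimal sketch; the counts are those stated in the text, whereas the stage labels and the function name are our own paraphrases and are not taken from the flow chart.

```python
# Counts are those reported in the text; the labels are paraphrased.
IDENTIFIED = 503
EXCLUSION_STAGES = [
    ("duplicates", 303),
    ("titles/abstracts lacking relevance", 102),
    ("abstracts failing eligibility criteria", 52),
    ("full texts excluded", 33),
]


def remaining_after_each_stage(start, stages):
    """Return the number of articles left after each exclusion stage."""
    counts = []
    left = start
    for _label, removed in stages:
        left -= removed
        counts.append(left)
    return counts


counts = remaining_after_each_stage(IDENTIFIED, EXCLUSION_STAGES)
for (label, removed), left in zip(EXCLUSION_STAGES, counts):
    print(f"{label}: removed {removed}, remaining {left}")
```

The running totals agree with the 98, 46, and 13 articles reported at the corresponding steps.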

We recorded the characteristics of the 13 included papers (Table 1), recruitment characteristics (Table 2), and details of the clinical outcome measurement (Table 3). All included studies had definite inclusion and exclusion criteria. We classified the followup times as short term (1–3 years) or midterm (4 or 5 years). Of the 10 trials, six, including five prospective multicenter FDA-regulated studies, were sponsored by industry.

Table 1 Characteristics of the included studies
Table 2 Study characteristics
Table 3 Details and heterogeneity of clinical outcome measurement of the included studies

The risk of bias was independently assessed by two reviewers (SY and XY) using the 12 criteria recommended by the Cochrane Back Review Group [14]. The reviewers tried to reach consensus on each criterion. Based on the recommendation by the Cochrane Back Review Group, studies were rated as having a “low risk of bias” when at least six of the 12 criteria were met without serious flaws. Studies with serious flaws, or those in which fewer than six of the criteria were met, were rated as having a “high risk of bias.” Among all the included articles, 10 were considered to meet at least six of the 12 criteria, without serious flaws, and were rated as “low risk of bias” (Fig. 2). The remaining three articles were graded as “high risk of bias,” because fewer than six of the criteria were met in these studies. Eight articles reported adequate allocation sequences and six reported adequate allocation concealments.
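The rating rule described above can be written as a small decision function. This is an illustrative sketch only; the function and parameter names are hypothetical, and judging whether a flaw is "serious" remains a reviewer decision that the code simply takes as an input.

```python
def rate_risk_of_bias(criteria_met, serious_flaw=False,
                      total_criteria=12, threshold=6):
    """Classify a study as 'low' or 'high' risk of bias.

    A study is rated 'low' when at least `threshold` of the
    `total_criteria` items are met and no serious flaw was identified;
    otherwise it is rated 'high'.
    """
    if not 0 <= criteria_met <= total_criteria:
        raise ValueError("criteria_met must be between 0 and total_criteria")
    if serious_flaw or criteria_met < threshold:
        return "high"
    return "low"


# Hypothetical examples, not scores from the included articles.
print(rate_risk_of_bias(8))                     # meets 8 of 12, no serious flaw
print(rate_risk_of_bias(5))                     # fewer than six criteria met
print(rate_risk_of_bias(9, serious_flaw=True))  # serious flaw overrides the count
```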

Fig. 2 The risk of bias for the included studies was assessed in our meta-analysis.

Two reviewers independently evaluated the clinical relevance of the included studies according to the five criteria recommended by the Cochrane Back Review Group [14] (Table 4). If the relevance criteria were met, a positive score was assigned; if the criteria were not met, a negative score was assigned; and if the data were not available or inadequate for the criteria, the relevance was considered unclear. A 25% improvement for pain and a 10% improvement for function were considered clinically important.

Table 4 Clinical relevance of included studies

Heterogeneity was assessed using the chi-square test [18]. An I² value greater than 50% was considered to indicate substantial heterogeneity. A sensitivity analysis was performed for the measured effects by omitting any study that may have largely influenced the clinical findings. For most outcome measures, there was no significant heterogeneity across the included studies (Table 3). We did detect significant heterogeneity among the three trials that reported valid data regarding neck and arm pain assessment by VAS. A sensitivity analysis showed that the heterogeneity between the trials was attributed mainly to the study reported by Nabhan et al. [31]. Significant heterogeneity also was observed in the four studies reporting information about segmental motion [7, 17, 33, 48], and this did not change when high risk of bias trials or other trials were omitted.
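For readers unfamiliar with these statistics, the chi-square heterogeneity statistic (Cochran's Q) and I² can be computed from per-study effect estimates and their variances as sketched below. The input values are made up for illustration and are not data from the included trials; the function name is our own.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, degrees of freedom, I^2 in percent).

    Q is the inverse-variance weighted sum of squared deviations of the
    study effects from the fixed-effect pooled estimate, and
    I^2 = max(0, (Q - df) / Q) * 100.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and variances for three studies.
q, df, i2 = cochran_q_and_i2([0.10, 0.30, 0.50], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
print("substantial heterogeneity" if i2 > 50 else "no substantial heterogeneity")
```

Under the threshold stated above, an I² above 50% for a pooled outcome would be flagged as substantial heterogeneity and would prompt a sensitivity analysis.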

The same two reviewers independently extracted the data of included studies and reached a consensus on each item. Data included demographics, methodologic characteristics, interventions, surgery information, and the primary and secondary clinical outcomes mentioned above. The neurologic status was determined by measuring motor function, sensory function, and deep tendon reflexes. Neurologic success was based on the maintenance or improvement in all three indicators. To be considered an overall success, patients had to achieve all of the following: an improvement of 15 or more points from the preoperative to the postoperative NDI score, maintenance or improvement in neurologic status, no serious implant-associated or implantation procedure-associated adverse events, and no subsequent surgeries or interventions that were classified as treatment failures. Grade 3 or 4 adverse events according to WHO criteria [44] were considered serious adverse events. The definition of major surgical complications included implant wear, migration, dislodgement and subsidence, graft donor site morbidity, graft extrusion, vertebral body fracture, segmental kyphosis, failed kinematics, pseudarthrosis, neurologic injury, worsening of myelopathy or radiculopathy, heterotopic ossification and osteolysis, recurrent laryngeal nerve palsy, dysphagia, Horner’s syndrome, dural perforation, pharyngeal or esophageal perforation, and hematoma and infection [13, 37]. Segmental motions at index and adjacent levels were evaluated by the flexion-extension ROMs using dynamic lateral radiographs of the cervical spine in five included studies [7, 17, 26, 33, 48].
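The composite definitions of neurologic success and overall success can be expressed as simple predicates. The sketch below is illustrative; the record type and field names are hypothetical, and it assumes that NDI improvement means a decrease in score from the preoperative to the postoperative assessment.

```python
from dataclasses import dataclass


@dataclass
class PatientOutcome:
    # Hypothetical record layout; field names are not from the source studies.
    ndi_preop: float
    ndi_postop: float
    motor_maintained_or_improved: bool
    sensory_maintained_or_improved: bool
    reflexes_maintained_or_improved: bool
    serious_adverse_event: bool      # WHO Grade 3 or 4, implant or procedure related
    failure_reintervention: bool     # subsequent surgery classified as treatment failure


def neurologic_success(p: PatientOutcome) -> bool:
    """Maintenance or improvement in all three neurologic indicators."""
    return (p.motor_maintained_or_improved
            and p.sensory_maintained_or_improved
            and p.reflexes_maintained_or_improved)


def overall_success(p: PatientOutcome) -> bool:
    """All four conditions of the composite endpoint must hold."""
    ndi_improvement = p.ndi_preop - p.ndi_postop  # assumes lower NDI is better
    return (ndi_improvement >= 15
            and neurologic_success(p)
            and not p.serious_adverse_event
            and not p.failure_reintervention)


# Made-up patient, for illustration only.
example = PatientOutcome(50, 30, True, True, True, False, False)
print(overall_success(example))
```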

We rated the strength of evidence by using the GRADE (Grades of Recommendation, Assessment, Development and Evaluation) approach for all the pooled clinical outcomes. Study design, risk for bias, consistency, directness, and precision were assessed for rating the strength of evidence. No high-quality evidence was obtained in our study. With short-term followup, there was moderate-quality evidence for NDI, neurologic status, pain assessment using NRS scoring, SF-36, and segmental motion at adjacent levels (Table 5). Regarding midterm followup, moderate-quality evidence was obtained for NDI, neurologic status, pain assessment using NRS scoring, and SF-36 physical scoring (Table 6).

Table 5 Summary of strength of evidence with short-term followup
Table 6 Summary of strength of evidence with midterm followup

The Mantel-Haenszel method with a random effects model was used for dichotomous outcomes, and risk ratios (RR) were calculated. For continuous outcomes, the inverse variance method was used with a random effects model, and standardized mean differences were calculated. Both were reported with corresponding 95% CIs. By using the GRADE profiler software (GRADEpro, Version 3.6, The Cochrane Collaboration, Copenhagen, Denmark), a rating system with four levels of evidence (high, moderate, low, and very low), taken from the Cochrane Back Review Group, was applied to evaluate the level of evidence [16]. We used Review Manager software (RevMan Version 5.1.6, The Cochrane Collaboration, Copenhagen, Denmark) for statistical analysis.
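To illustrate the kind of pooling described above, the following sketch combines risk ratios and standardized mean differences under a random effects model using the DerSimonian-Laird estimator of between-study variance. It is a generic textbook formulation and is an assumption on our part: the calculations performed by Review Manager, including its Mantel-Haenszel random effects option, may differ in detail. All function names and input values are hypothetical, and trials with zero events in an arm are not handled.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% CI


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its approximate variance for one two-arm trial."""
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var


def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g and a common large-sample approximation of its variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))  # small-sample correction
    var = (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))
    return g, var


def dersimonian_laird(effects, variances):
    """Random effects pooled estimate, its standard error, and tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def pooled_risk_ratio(trials):
    """Pooled RR with 95% CI from (events_t, n_t, events_c, n_c) tuples."""
    per_study = [log_risk_ratio(*t) for t in trials]
    effects = [e for e, _ in per_study]
    variances = [v for _, v in per_study]
    pooled, se, _tau2 = dersimonian_laird(effects, variances)
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se),
            math.exp(pooled + Z_95 * se))


# Made-up trial counts, for illustration only.
rr, lower, upper = pooled_risk_ratio([(10, 100, 20, 100), (5, 50, 10, 50)])
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The same `dersimonian_laird` function can be applied to the effect sizes and variances returned by `standardized_mean_difference` to pool continuous outcomes.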

Results

With mean followups ranging from 1 to 3 years, the NDI was lower (p = 0.02) in patients treated with cervical disc arthroplasty than in patients treated by fusion (Fig. 3). Neurologic status in the cervical disc arthroplasty group also was better (p = 0.03) than that of the fusion group (Fig. 4). However, there were no differences in neck and arm pain scoring between the cervical disc arthroplasty group and the fusion group measured by the NRS and VAS, respectively (Fig. 5). There also was no difference in SF-36 physical and mental scoring after cervical disc arthroplasty and fusion (Fig. 6).

Fig. 3 This forest plot for NDI at short-term followup shows a difference between cervical disc arthroplasty and fusion procedures. Patients treated with cervical disc arthroplasty reported lower NDI scores than patients treated with fusion. NDI = neck disability index; IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 4 This forest plot for neurologic status at short-term followup shows a difference between cervical disc arthroplasty and fusion procedures. There was a higher rate of neurologic success postoperatively for patients treated with cervical disc arthroplasty. M-H = Mantel-Haenszel method; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 5 The forest plot for pain assessment at short-term followup shows there was no difference in neck and arm pain scoring between the cervical disc arthroplasty group and the fusion group measured by NRS and VAS, respectively. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty; NRS = numerical rating scale.

Fig. 6 The forest plot for SF-36 at short-term followup shows no difference in PCS and MCS scores between the cervical disc arthroplasty group and the fusion group. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty; PCS = physical health surveys; MCS = mental health surveys.

There were fewer major surgical complications in patients treated by cervical disc arthroplasty than by fusion (RR = 0.45; 95% CI, 0.27–0.75; p = 0.002) (Fig. 7). The rate of reoperation related to index surgery also was lower in the cervical disc arthroplasty group (RR = 0.42; 95% CI, 0.22–0.79; p = 0.007) (Fig. 8).
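As a consistency check on the reported risk ratios, a two-sided p value can be approximately recovered from a ratio and its 95% CI. The sketch below assumes the interval is symmetric on the log scale and uses a normal approximation; the function name is our own, and rounding of the published values means the recovered p values agree with those reported only approximately.

```python
import math


def p_from_rr_ci(rr, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value implied by a risk ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(RR)
    z = math.log(rr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Values as reported for short-term followup.
p_complications = p_from_rr_ci(0.45, 0.27, 0.75)
p_reoperation = p_from_rr_ci(0.42, 0.22, 0.79)
print(f"major complications: p ~ {p_complications:.4f}")
print(f"index-level reoperation: p ~ {p_reoperation:.4f}")
```

Both recovered values fall close to the reported p = 0.002 and p = 0.007.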

Fig. 7 This forest plot shows pooling of risk ratios for major surgical complications at short-term followup after the sensitivity analysis. The rates of major complications were lower for patients treated with cervical disc arthroplasty than for patients treated by fusion. M-H = Mantel-Haenszel method; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 8 This forest plot shows pooling of risk ratios for reoperations at short-term followup. There was a lower reoperation rate related to index surgery for patients treated with cervical disc arthroplasty, but no difference for reoperation rates at adjacent levels between the cervical disc arthroplasty group and the fusion group. M-H = Mantel-Haenszel method; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Compared with fusion, cervical disc arthroplasty retained segmental motion at the index level (Fig. 9). However, the segmental motion at adjacent levels for the cervical disc arthroplasty group was not greater than that of the fusion group (Fig. 10). There was no difference in reoperation rate at adjacent levels after cervical disc arthroplasty and fusion (p = 0.19) (Fig. 8).

Fig. 9 The forest plot compares four RCTs including 1058 patients for segmental motion at the index level at short-term followup. It shows that cervical disc arthroplasty retained the segmental motion at the index level better than fusion. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 10 The forest plot for the segmental motion at adjacent levels at short-term followup shows no difference between the cervical disc arthroplasty group and the fusion group. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

In the three prospective multicenter FDA-regulated IDE studies [3, 9, 38] reporting mean followups of 4 or 5 years, the pooled results also showed cervical disc arthroplasty was associated with a lower (p < 0.001) NDI score (Fig. 11), higher (p = 0.004) SF-36 PCS score (Fig. 12), and better neurologic success rate (p = 0.04) (Fig. 13) compared with fusion. Moreover, better pain relief was reported for cervical disc arthroplasty than for fusion measured by NRS scoring for the midterm period (Fig. 14). There was also a lower rate (p = 0.01) of reoperation related to the index surgery in patients treated by cervical disc arthroplasty than by fusion (RR = 0.45; 95% CI, 0.24–0.83) (Fig. 15). At 4 to 5 years mean followup we also found no difference (p = 0.31) in the reoperation rate at adjacent levels after cervical disc arthroplasty and fusion.

Fig. 11 The forest plot compares three RCTs including 704 patients for NDI at midterm followup. The NDI scores were lower for patients treated with cervical disc arthroplasty than for patients treated by fusion. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 12 The forest plot for SF-36 at midterm followup shows that the SF-36 PCS scores were higher for patients treated with cervical disc arthroplasty than for patients treated by fusion. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty; PCS = physical health surveys.

Fig. 13 This forest plot for neurologic status at midterm followup shows that there was a higher rate of neurologic success postoperatively for patients treated with cervical disc arthroplasty. M-H = Mantel-Haenszel method; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Fig. 14 The forest plot for pain assessment at midterm followup shows that the NRS scores for neck and arm pain were lower for patients treated with cervical disc arthroplasty than for patients treated by fusion. IV = inverse variance; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty; NRS = numerical rating scale.

Fig. 15 This forest plot shows pooling of risk ratios for reoperations at midterm followup. The reoperation rates related to index surgery were lower for patients treated with cervical disc arthroplasty; however, there was no difference for reoperation rates at adjacent levels between the cervical disc arthroplasty group and the fusion group. M-H = Mantel-Haenszel method; ACDF = anterior cervical decompression and fusion; CDA = cervical disc arthroplasty.

Discussion

Anterior cervical decompression and fusion has been considered the surgical standard for treatment of symptomatic cervical disc disease for decades. One of the main disadvantages of the procedure, however, is that the segments adjacent to a fusion are subjected to increased ROM and intradiscal pressures, which may lead to adjacent segment degeneration. Cervical disc arthroplasty is designed to preserve motion and avoid the limitations of fusion [10, 11]. Although more studies comparing the two interventions, including RCTs, have been reported, the evidence regarding whether cervical disc arthroplasty is superior to anterior cervical decompression and fusion remains insufficient owing to the ambiguous results. We therefore conducted a meta-analysis of RCTs to determine whether cervical disc arthroplasty is associated with better pain relief and recovery of neurologic dysfunction, a lower incidence of reoperation and complications compared with fusion, and a reduced incidence of adjacent segment degeneration.

Readers should be aware of limitations in the literature in general and our study in particular. First, the most prevalent methodologic shortcomings in the included studies were the lack of information concerning the blinding method and intention-to-treat analysis. Only two reports [17, 38] used patient and outcome assessor blinding, and another two [9, 28] used single-blinding of the patients. None of the included studies reported intention-to-treat analyses. Second, six of the included studies, including five prospective multicenter FDA-regulated studies, were sponsored by the medical device industry. Although these studies were carefully monitored, their impetus and funding from industry may be a potential source of bias. Third, the strength of evidence graded in our study was relatively low for most clinical results. This low grade was driven by the high risk for bias within individual studies and the lack of precision across the studies. Many trials used weak study designs with inadequate blinding, insufficient allocation concealment, and no intention-to-treat analysis. Although blinding is not always feasible because of the nature of the surgical intervention, adequate allocation concealment is always possible in an RCT. We could not assess the possibility of publication bias because of the small number of included studies. Meanwhile, the imprecision across the studies was attributed mainly to the limited total sample size. Fourth, the types of arthroplasty in the included studies may have had different effects on the final treatment effect. We did not assess the relative outcomes of arthroplasty in subgroups with different types of prostheses for stratified analysis because of the limited number of included trials. This limitation may be another potential source of bias for the final conclusion. 
Fifth, as the followups for the studies examined were no longer than 5 years, it was impossible to draw conclusions regarding the long-term results of followup. Our review differs from previously published reviews [4, 21, 45, 47] because it assesses the updated and full range of relevant RCTs with no or fewer restrictions related to the followup periods, number of surgical segments, and languages. Furthermore, we assessed the strength of evidence by using the GRADE approach and provide a current synthesis of the state of the evidence on cervical disc arthroplasty and anterior cervical decompression and fusion.

Our meta-analysis indicates that cervical disc arthroplasty provided better recovery of neurologic dysfunction than fusion. Patients treated with cervical disc arthroplasty showed a greater reduction in NDI and a higher neurologic success rate than those of the fusion group with short-term and midterm followups. The reason for greater improvement in neurologic recovery after cervical disc arthroplasty potentially could be related to the maintenance of motion in the cervical spine [33, 39], rapid postoperative recovery [17, 42], and the lack of necessity for postoperative bracing [25]. Regarding pain relief, the two procedures did not differ postoperatively for the short-term period. However, with midterm followup, cervical disc arthroplasty showed superiority in pain relief compared with fusion. All functional outcomes evaluated after cervical disc arthroplasty were superior or at least equivalent to those after fusion in our study.

Compared with fusion, cervical disc arthroplasty was more durable with fewer failures and reoperations related to index surgeries in our analysis. The RRs for the reoperation rates related to index surgeries with short-term and midterm followups were 0.42 and 0.45, respectively, favoring cervical disc arthroplasty over fusion. Cervical disc arthroplasty also was associated with a lower incidence of major surgical complications (4.0%) than anterior cervical decompression and fusion (8.9%). More surgically related complications occurred in patients treated with fusion, largely from pseudarthrosis, dysphagia, graft donor site morbidity, and graft extrusion [1, 13]. The most frequent complications of cervical disc arthroplasty included heterotopic ossification [6, 22], implant wear, migration and subsidence [27], and segmental kyphosis [34]. Cervical disc arthroplasty could greatly reduce the risk of dysphagia compared with fusion. In a randomized clinical trial, McAfee et al. [24] reported a lower incidence of dysphagia at 3 and 12 months after cervical disc arthroplasty versus anterior cervical decompression and fusion. Long-term resolution of symptoms also occurred at a higher rate in patients who had cervical disc arthroplasty (74%) than in patients who had anterior cervical decompression and fusion (41.4%). They suggested that the esophageal retraction and soft tissue dissection during anterior cervical plating may be the primary reason for the risk of postoperative dysphagia [24].

Cervical disc arthroplasty can restore disc height, maintain spinal movement, and reduce kinematic strain on adjacent segments. Therefore, the reduction or avoidance of adjacent segment degeneration or disease was expected for cervical disc arthroplasty as an alternative to anterior cervical decompression and fusion. However, whether development of adjacent segment degeneration and disease after fusion may be related to natural degeneration, and whether restoration of motion with cervical disc arthroplasty could alter the rate of adjacent segment degeneration or disease remain open issues [19, 20, 32]. Our data indicated that although the patients treated with cervical disc arthroplasty could retain segmental motion at the operative level, the adjacent level motion did not differ after cervical disc arthroplasty and fusion. Moreover, the pooled results in our analysis also indicated that cervical disc arthroplasty could not reduce the reoperation rate attributable to adjacent segment degeneration. Nunley et al. [32] and Jawahar et al. [20] concluded that the risk of having adjacent segment degeneration develop was equivalent after cervical disc arthroplasty and fusion but higher in patients with osteopenia and concurrent lumbar degenerative disc disease.

Our meta-analysis suggests, for the treatment of symptomatic cervical disc disease, cervical disc arthroplasty is superior to fusion; it provided better functional recovery and reduced the risk of reoperations related to index surgery and complications. However, cervical disc arthroplasty did not reduce the risk of adjacent segment degeneration any more than fusion. Future studies with high methodologic quality and long-term followup periods are needed for updated meta-analyses to better evaluate the two procedures for treatment of symptomatic cervical disc disease.