Introduction

Anterior cervical discectomy and fusion (ACDF) is a well-accepted surgical option for the treatment of symptomatic cervical disc disease [1]. ACDF can achieve neural decompression, maintain cervical lordosis and provide segmental stabilization. Clinical studies have reported excellent outcomes and relatively low complication rates after this procedure [2–5]. However, there is evidence suggesting that this approach may ultimately lead to kinematic strain on adjacent spinal levels [6, 7], resulting in disc degeneration and mechanical instability [8–10].

Cervical disc arthroplasty (CDA) has been introduced as an alternative treatment option for degenerative cervical spine conditions. The aims of CDA are to achieve the same neural decompression as that of traditional anterior surgery and to provide stability without eliminating intervertebral motion, thereby theoretically normalizing the kinematics of the spine and possibly protecting the adjacent levels.

Several studies reported that CDA may be associated with a higher rate of neurological success and a lower rate of adjacent segment disease (ASD) than ACDF at 2 years after surgery [11–16]. However, some reports have been critical of this procedure, and controversies exist regarding the effects of arthroplasty on ASD and the incidence and impact of heterotopic ossification (HO) [17–20]. To establish CDA as a viable surgical alternative for the treatment of cervical radiculopathy and myelopathy, long-term follow-up studies are needed. Several US Food and Drug Administration (FDA) trials have reported the mid- to long-term outcomes after CDA and ACDF. We performed a systematic literature review and meta-analysis to elucidate the mid- to long-term effectiveness and safety of CDA and ACDF.

Methods

Study selection

All randomized controlled trials comparing CDA with ACDF for the treatment of cervical disc disease were identified. The electronic databases of PubMed (1966–2013), Cochrane Controlled Trials Register (CENTRAL; Issue 1, 2013), and EMBASE (1984–2013) were searched.

The searches used a combination of keywords describing technical procedures (arthroplasty, prosthesis, implantation, discectomy, and total disc replacement) and anatomical features and pathology (cervical vertebrae), including both MeSH terms and free text words. In addition, searches were performed for the specific names of the prostheses.

Only randomized controlled trials were included in this review. The searches were limited to studies published in English, and only trials that reported outcomes after a minimum of 48 months of follow-up were included. Patients with single-level or two-level cervical spondylosis were included.

All the retrieved articles were manually reviewed by two authors (R.C.P. and S.Y.M.) and were discussed among all the authors to make a decision regarding inclusion. If there was any disagreement among the authors regarding inclusion of an article, the senior author (S.Y.M.) made the final decision. This review was conducted according to the standards of the QUOROM guidelines [21].

Data extraction

Two authors independently extracted the relevant data from the included studies regarding design, inclusion and exclusion criteria, age, gender, type of disc prosthesis, type of control intervention, and follow-up period. The outcomes pooled in this analysis included ASD, reoperation, improvement in movement and functioning measured by the Neck Disability Index (NDI), improvement in arm pain measured by a visual analog scale (VAS) score, improvement in neck pain measured by a VAS score, the Short Form 36 Health Survey (SF-36) physical component score (PCS), neurological success, heterotopic ossification (HO) and complications.

Assessment of methodological quality

The quality of the studies was independently assessed by two authors, according to the guidelines in the Cochrane Handbook for Systematic Reviews of Interventions, version 5.0 [22]. The following domains were assessed: randomization, blinding (of patients, surgeons and assessors), allocation concealment, and follow-up coverage. Each domain was classified as adequate, unclear or inadequate.

Heterogeneity

We evaluated the heterogeneity of data using the I² statistic, which assesses the impact of heterogeneity on the meta-analysis [23]. We rated I² < 30 % as low heterogeneity, I² = 30–60 % as moderate heterogeneity, and I² > 60 % as high heterogeneity. A random effects model was used when I² was >30 % and a fixed effects model was used when I² was <30 %.
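The thresholds above can be sketched in a few lines. This is a minimal illustration, not the Review Manager implementation: it assumes I² is derived from Cochran's Q with inverse-variance weights, and all function names are hypothetical. The text does not say which model applies when I² is exactly 30 %, so the sketch assigns that boundary case to the random effects model as an assumption.

```python
def cochran_q(effects, variances):
    """Cochran's Q for per-study effects with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, k):
    """I² in percent from Cochran's Q over k studies, truncated at 0."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q * 100.0)


def heterogeneity_level(i2):
    """Rating used in this review: <30 low, 30-60 moderate, >60 high."""
    if i2 < 30:
        return "low"
    if i2 <= 60:
        return "moderate"
    return "high"


def pooling_model(i2):
    """Fixed effects below 30 %, random effects otherwise.

    Assumption: I² of exactly 30 % is treated as random effects,
    because the text leaves that case unassigned.
    """
    return "fixed" if i2 < 30 else "random"
```

For example, the reoperation analysis (I² = 39 %) would be rated moderate and pooled with a random effects model, whereas the arm pain analysis (I² = 26 %) would be rated low and pooled with a fixed effects model.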

Data analysis

For continuous data, we calculated the pooled mean difference (MD) and its 95 % confidence interval (CI) using the change from baseline, standard deviation (SD), and total number of participants in each treatment arm. If the SD was not reported, we calculated this from the reported p value or CI. For dichotomized outcomes, we calculated the odds ratio (OR) and its 95 % CI using the number of events and total number of participants in each treatment arm.
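The per-study quantities described above can be sketched as follows. This is an illustration under stated assumptions, not the Review Manager code: it uses large-sample normal approximations, recovers an SD from the 95 % CI of a single-arm mean, and computes an unadjusted odds ratio with a Wald interval on the log scale. The function names are hypothetical, and the odds ratio helper assumes no empty cell in the 2 × 2 table.

```python
import math
from statistics import NormalDist

# Two-sided 95 % critical value of the standard normal (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def sd_from_ci(lower, upper, n):
    """SD of one arm's mean change, recovered from its 95 % CI.

    Assumes the CI is mean +/- Z95 * SD / sqrt(n).
    """
    se = (upper - lower) / (2 * Z95)
    return se * math.sqrt(n)


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference between two arms with its 95 % CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - Z95 * se, md + Z95 * se


def odds_ratio(events1, n1, events2, n2):
    """Odds ratio of arm 1 versus arm 2 with its 95 % CI.

    Assumes every cell of the 2 x 2 table is non-zero.
    """
    a, b = events1, n1 - events1
    c, d = events2, n2 - events2
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - Z95 * se_log)
    upper = math.exp(math.log(estimate) + Z95 * se_log)
    return estimate, lower, upper
```

Pooling across studies then weights these per-study estimates, for example by inverse variance for mean differences or by the Mantel–Haenszel method for odds ratios.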

Statistical analyses were performed using Review Manager 5.0 software (available from the Cochrane Collaboration at http://www.cochrane.org). Results were regarded as statistically significant if p was <0.05.

The strength of evidence was rated using the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) approach for all pooled clinical outcomes. This rating included assessment of the study design, risk for bias, consistency, directness and precision [24].

Results

Search results

The process of identifying relevant studies is shown in Fig. 1; 86 references were extracted from the selected databases. After screening of the titles and abstracts, 59 of these were excluded because they were not relevant to the topic or were not randomized controlled trials. The remaining 27 reports underwent detailed and comprehensive evaluation. Finally, five randomized controlled trials were included in this systematic review and meta-analysis [25–29]. These five studies were all randomized US FDA-approved Investigational Device Exemption pivotal trials, which reported outcomes after a mean follow-up period of 59.2 months. The inclusion criteria mainly included symptomatic cervical disc disease secondary to disc herniation or focal osteophytes. The exclusion criteria mainly included obvious cervical stenosis and cervical segmental instability. The main characteristics of the included studies are shown in Table 1.

Fig. 1 Flow chart of the article selection process

Table 1 Characteristics of the included studies

The sample sizes of the included studies ranged from 73 to 541, and a total of 1,557 patients who were treated for symptomatic cervical disc disease refractory to nonoperative management were enrolled in the five studies. Of these 1,557 patients, 1,041 completed 4–6 years of follow-up, including 599 who underwent CDA and 442 who underwent ACDF; 82 patients in one of the studies underwent two-level surgery [27], and the other 959 patients underwent single-level surgery. The mean patient age was 45.3 years. Four types of cervical disc prosthesis were used: Prestige ST/Prestige II (Medtronic, Minneapolis, MN, USA), ProDisc-C (DePuy Synthes, West Chester, PA, USA), Bryan (Medtronic) and Kineflex|C (SpinalMotion, Mountain View, CA, USA).

Methodological quality

The results of the methodological quality assessment are shown in Table 2. All five studies precisely described the randomization method used [25–29]. As the study design was not described in three of the studies [25, 28, 29], we consulted the previous studies in which the short-term outcomes had been reported to obtain this information [13–15]. Only one study blinded both the patients and the assessors [28]. In addition, one study blinded only the patients [29] and one blinded only the assessors [25]. None of the studies documented concealment of randomization. Descriptions of patient drop-outs and withdrawals were included in all five reports. Therefore, the methodological quality of all five studies included was level B.

Table 2 Methodological quality of the included studies

Analysis of outcomes

ASD

Radiological adjacent segment degeneration was not reported in these five studies. Three studies reported the rate of symptomatic adjacent segment degeneration, namely adjacent segment disease (ASD) [25, 27, 28]. The pooled results show that the rate of ASD was not significantly different between patients who underwent CDA (6.4 %) and those who underwent ACDF (5.7 %) (OR 0.95, 95 % CI 0.59–1.53; p = 0.83) with no heterogeneity (I² = 0 %) (Fig. 2).

Fig. 2 Adjacent segment disease. Forest plot comparing the odds ratios of adjacent segment disease between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). MH Mantel–Haenszel, CI confidence interval

Reoperation

Four studies reported the rate of reoperation [25, 26, 28, 29]. The overall rate of reoperation was significantly lower in patients who underwent CDA (3.9 %) than in those who underwent ACDF (9.1 %) (OR 0.44, 95 % CI 0.22–0.89; p = 0.02) with moderate heterogeneity (I² = 39 %) (Fig. 3). The rate of reoperation for ASD was lower in patients who underwent CDA (2.9 %) than in those who underwent ACDF (4.8 %), but this difference was not significant (OR 0.62, 95 % CI 0.34–1.13, I² = 0 %; p = 0.12).
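As a rough consistency check, a two-sided p value can be recovered from a reported odds ratio and its 95 % CI. The sketch below is an illustration only: it assumes a Wald interval that is symmetric on the log scale, the function name is hypothetical, and the inputs are the rounded values quoted in the text, so the recovered p values are approximate.

```python
import math
from statistics import NormalDist


def p_from_ratio_ci(estimate, lower, upper):
    """Two-sided p value implied by a ratio estimate and its 95 % CI.

    Assumes the CI is exp(log(estimate) +/- z * SE) for a normal z.
    """
    z95 = NormalDist().inv_cdf(0.975)
    se_log = (math.log(upper) - math.log(lower)) / (2 * z95)
    z = math.log(estimate) / se_log
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Pooled odds ratios and 95 % CIs as reported in the text.
reported = {
    "ASD": (0.95, 0.59, 1.53),
    "reoperation": (0.44, 0.22, 0.89),
    "reoperation for ASD": (0.62, 0.34, 1.13),
}
p_values = {name: p_from_ratio_ci(*v) for name, v in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p ~ {p:.2f}")
```

Under these assumptions the recovered values round to the reported p values of 0.83, 0.02 and 0.12.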

Fig. 3 Reoperation. Forest plot comparing the odds ratios of reoperation between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). MH Mantel–Haenszel, CI confidence interval

NDI

Three studies reported changes in the NDI after a minimum of 48 months [25, 26, 28]. The pooled results show that patients who underwent CDA had a significantly greater improvement in NDI than did those who underwent ACDF (MD 5.49, 95 % CI 2.79–8.20; p < 0.0001) with no heterogeneity (I² = 0 %) (Fig. 4).

Fig. 4 Neck Disability Index. Forest plot comparing the mean differences in improvement in Neck Disability Index between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). IV inverse variance, CI confidence interval

Neck pain

Two studies reported changes in the neck pain VAS scores [25, 28]. The pooled results show that patients who underwent CDA had a significantly greater improvement in neck pain than did those who underwent ACDF (MD 5.42, 95 % CI 0.21–10.63; p = 0.04) with moderate heterogeneity (I² = 57 %) (Fig. 5).

Fig. 5 Neck pain visual analog scale (VAS) score. Forest plot comparing the mean differences in improvement in neck pain VAS score between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). IV inverse variance, CI confidence interval

Arm pain

Two studies reported changes in the arm pain VAS scores [25, 28]. The pooled results show that patients who underwent CDA had a significantly greater improvement in arm pain than did those who underwent ACDF (MD 9.19, 95 % CI 6.57–11.81; p < 0.00001) with low heterogeneity (I² = 26 %) (Fig. 6).

Fig. 6 Arm pain visual analog scale (VAS) score. Forest plot comparing the mean differences in improvement in arm pain VAS score between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). IV inverse variance, CI confidence interval

SF-36 PCS

Two studies reported changes in the SF-36 PCS [25, 28]. The pooled results show that patients who underwent CDA had a significantly greater improvement in SF-36 PCS than did those who underwent ACDF (MD 1.91, 95 % CI 0.94–2.89; p = 0.0001) with no heterogeneity (I² = 0 %) (Fig. 7).
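The pooled mean differences reported for the NDI, neck pain, arm pain and SF-36 PCS can be checked against their p values in the same spirit. The sketch below is illustrative only: it assumes a symmetric normal-theory interval on the original scale (no log transform, unlike an odds ratio), the function name is hypothetical, and the inputs are the rounded values quoted in the text.

```python
from statistics import NormalDist


def p_from_difference_ci(estimate, lower, upper):
    """Two-sided p value implied by a mean difference and its 95 % CI.

    Assumes the CI is estimate +/- z * SE for a normal z.
    """
    z95 = NormalDist().inv_cdf(0.975)
    se = (upper - lower) / (2 * z95)
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Pooled mean differences and 95 % CIs as reported in the text.
reported_md = {
    "NDI": (5.49, 2.79, 8.20),
    "neck pain VAS": (5.42, 0.21, 10.63),
    "arm pain VAS": (9.19, 6.57, 11.81),
    "SF-36 PCS": (1.91, 0.94, 2.89),
}
p_md = {name: p_from_difference_ci(*v) for name, v in reported_md.items()}
for name, p in p_md.items():
    print(f"{name}: p ~ {p:.2g}")
```

Under these assumptions the recovered values agree with the reported ones: below 0.0001 for the NDI, about 0.04 for neck pain, below 0.00001 for arm pain and about 0.0001 for the SF-36 PCS.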

Fig. 7 Short Form 36 Health Survey physical component score (SF-36 PCS). Forest plot comparing the mean differences in improvement in SF-36 PCS between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). IV inverse variance, CI confidence interval

Neurological success

Three studies reported the neurological success rate [25, 28, 29]. The available data did not show heterogeneity (I² = 0 %). The neurological success rate was 93.2 % in patients who underwent CDA and 89.9 % in those who underwent ACDF, but this difference was not significant (OR 1.54, 95 % CI 0.91–2.63; p = 0.11) (Fig. 8).

Fig. 8 Neurological success. Forest plot comparing the odds ratios of neurological success between cervical disc arthroplasty (CDA) and anterior cervical discectomy and fusion (ACDF). MH Mantel–Haenszel, CI confidence interval

Range of motion (ROM)

Four studies reported the mean flexion–extension ROM at the index level, but the SD could not be calculated. In each study, the ROM was significantly higher in patients who underwent CDA than in those who underwent ACDF. Coric et al. [26] reported a mean ROM of 8.6° in patients who underwent CDA and 0.2° in patients who underwent ACDF after 5 years. Zigler et al. [29] reported a mean ROM of 9.42° in patients who underwent CDA and 1.02° in patients who underwent ACDF after 5 years. Sasso et al. [28] reported a mean ROM of 8.48° in patients who underwent CDA and a restricted ROM in patients who underwent ACDF after 4 years; and Burkus et al. [25] reported a mean ROM of 9.42° in patients who underwent CDA and a restricted ROM in patients who underwent ACDF after 5 years.

HO

Coric et al. [26] reported bridging ossification in seven patients (17 %) who underwent CDA; Zigler et al. [29] reported complete bridging ossification at the index level in six patients (6 %) who underwent CDA; and Burkus et al. [25] reported bridging ossification in three patients (3.2 %) who underwent CDA. HO was not reported in any patients who underwent ACDF.

Adverse events

Burkus et al. [25] reported dysphagia or dysphonia in 22 patients (8.3 %) who underwent ACDF and in 24 patients (8.7 %) who underwent CDA (p = 0.879). There were no revision surgeries (0 %), defined as any surgical procedure used to adjust or modify the original implant configuration, in the CDA group, compared with five revision surgeries in five patients (1.9 %) in the ACDF group. Coric et al. [26] reported implant loosening in one patient (3.1 %) who underwent ACDF and dysphagia in one patient (2.4 %) who underwent CDA. No implant breakages or device failures occurred in the patients who underwent CDA. Zigler et al. [29] reported dysphagia in one patient (0.9 %) and pseudarthrosis in six patients (5.7 %) who underwent ACDF. No reoperations in CDA patients were performed for implant breakages or device failures.

No high-quality evidence was obtained in this study. Moderate-quality evidence was obtained regarding the rate of ASD, difference in NDI, difference in SF-36 PCS, and difference in ROM after surgery (Table 3).

Table 3 Strength of evidence after mid- to long-term follow-up

Publication bias

As fewer than ten studies were included, we did not assess publication bias using a funnel plot diagram.

Discussion

This systematic review and meta-analysis compared mid- to long-term results after CDA and ACDF. Five randomized controlled trials that reported outcomes after 4–6 years of follow-up in patients who underwent CDA or ACDF for symptomatic cervical disc disease refractory to nonoperative management were identified. There was slight heterogeneity among these trials in terms of inclusion criteria, age, interventions, type of control intervention, and methods of outcome assessment. Based on the methods of allocation sequence generation, allocation concealment, blinding and follow-up, the overall methodological quality of all five studies included was level B.

It was expected that patients who underwent CDA would have a lower rate of adjacent segment degeneration or disease than those who underwent ACDF. In this analysis, we compared the rate of ASD, which is symptomatic adjacent segment degeneration, between patients who underwent CDA and those who underwent ACDF. Some previous studies reported that the short- and mid-term rates of ASD were significantly lower in patients who underwent CDA than in those who underwent ACDF [30, 31]. However, it remains unclear whether the development of ASD after fusion is related to natural degeneration, and whether restoration of the ROM after CDA influences the postoperative rate of ASD [32–34]. Yang et al. [35] performed a meta-analysis of outcomes in patients with 2–5 years of follow-up, and found that the rate of ASD was lower in patients who underwent CDA (8.8 %) than in those who underwent ACDF (13 %), but this difference was not significant (p = 0.32). Of the studies included in our review, only Nunley et al. [27] described assessment of adjacent level disease as its main purpose. Other studies described the rates of ASD and reoperation. Our data show that physiological segmental motion is maintained after CDA. Unexpectedly, the rate of ASD after 4–6 years of follow-up was higher in patients who underwent CDA (6.4 %) than in those who underwent ACDF (5.7 %), but this difference was not significant. A larger sample size is needed to definitively determine whether there is a long-term difference in the rate of ASD after CDA compared with ACDF.

Barna et al. [36] reported HO in ten patients (25 %) who underwent CDA, with spontaneous fusion across the disc replacement in three patients (7.5 %) after 4 years. Suchomel et al. [37, 38] reported HO (grade III–IV) in 63 % of patients who underwent CDA using a ProDisc-C™ prosthesis after 4 years. In the studies included in this review, HO (grade III–IV) was reported in 3–17 % of patients who underwent CDA [26, 27, 29]. A fused implant in an incorrect position is likely to result in overload of the adjacent segments, probably to a greater degree than after ACDF. The long-term outcomes regarding adjacent segment protection with motion preservation are still unclear.

Four studies reported the reoperation rate. Reoperation was defined as any subsequent revision or removal of the implant, supplemental fixation, or posterior decompressive procedure. Patients who underwent CDA had a significantly lower overall rate of reoperation than those who underwent ACDF (p = 0.02). However, the rate of reoperation for ASD was not significantly different between patients who underwent CDA and those who underwent ACDF (p = 0.12). Delamarter et al. [29, 39] reported that patients who underwent ACDF had a five times higher rate of reoperation after 5 years than those who underwent CDA. The most common reasons for reoperation at the index level in patients who underwent ACDF were pseudarthrosis and dysphagia. Half of the reoperations in patients who underwent ACDF were for ASD. All the reoperations in patients who underwent CDA were for recurrent pain at the index or adjacent levels, with no cases of reoperation for implant breakage or device failure.

The pooled results of this study show that the improvements in the NDI, arm pain VAS score and neck pain VAS score were significantly greater in patients who underwent CDA than in those who underwent ACDF. The SF-36 PCS was better in patients who underwent CDA than in those who underwent ACDF. Yin et al. [40] performed a meta-analysis of outcomes after 1–3 years of follow-up, and found no differences in the neck pain VAS score, arm pain VAS score or SF-36 PCS between patients who underwent CDA and those who underwent ACDF, but found that the NDI was better in patients who underwent CDA than in those who underwent ACDF. Fallah et al. [41] performed a meta-analysis of outcomes after 2–4 years of follow-up, and found that the NDI, neck pain VAS score, arm pain VAS score and SF-36 PCS were better in patients who underwent CDA than in those who underwent ACDF. In this study, these scores were significantly improved after 4–6 years in patients who underwent CDA.

Fallah et al. [41] and Yin et al. [40] found that patients who underwent CDA had higher short- to mid-term rates of neurological success than did those who underwent ACDF. It may be that the wider lateral decompression that is required for CDA results in more successful neural decompression. In the longer term, this difference may not be significant because of factors such as HO and prosthesis subsidence [42]. In the pooled results of this study, the neurological success rate after 4–6 years tended to be higher in patients who underwent CDA (93.2 %) than in those who underwent ACDF (89.9 %), but this difference was not significant (p = 0.11).

Publication bias could not be assessed in this review because of the small number of included studies. The types of arthroplasty performed in the included studies may have affected outcomes. Only studies published in English were included, which may have introduced a selection bias. Although only 82 of the 1,041 patients who completed long-term follow-up underwent two-level surgery, this may also have introduced a selection bias.

The findings of this meta-analysis suggest that CDA has better overall mid- to long-term clinical outcomes than ACDF in patients with symptomatic cervical disc disease. CDA results in better functional recovery and reduces operative risk as compared with ACDF. A review of the literature showed that too few studies have investigated adjacent segment disease; future research should therefore focus on this question. CDA appears to be a good alternative surgical treatment for patients with symptomatic cervical disc disease. Further studies with higher methodological quality are needed to better evaluate outcomes after CDA and ACDF for symptomatic cervical disc disease.