Introduction

Cervical spondylosis and disc disease are the most common causes of radiculopathy and myelopathy, which can result in significant disability [1]. For decades, anterior cervical discectomy with fusion (ACDF) has been regarded as the “gold standard” surgical intervention for cervical disc disease, achieving neural decompression, segmental stabilization, and excellent clinical outcomes [2–4]. In recent years, however, cervical disc arthroplasty (CDA) has emerged as an alternative that is designed to preserve segmental motion and, in theory, to prevent adjacent segment disease (ASD) [5, 6]. CDA is increasingly regarded as an acceptable surgical treatment for cervical disc disease.

To date, the legitimacy of CDA has been supported by an accumulation of short- and intermediate-term follow-up studies [7] and a few long-term studies [8, 9]. Following anterior cervical decompression and artificial disc replacement, motion patterns and disc height are preserved and subsequent degeneration at adjacent levels can be delayed or reduced, resulting in excellent outcomes and clinical recovery [10–13]. However, cervical arthroplasty may also be accompanied by complications, such as increased flexibility of the adjacent disc, erosion of the prosthesis, and heterotopic ossification. Most studies have compared single-level disc replacement with ACDF. Although some authors have advocated and performed multi-level disc replacement instead of extensive fusion for multi-level cervical disc disease, the clinical effect and safety of multi-level CDA are still disputed [14, 15].

Previous meta-analyses have focused on the comparison between single-level CDA and fusion; the outcomes and benefit of multi-level CDA compared with single-level CDA are less clear. The purpose of this study was to compare outcomes after multi-level CDA with those after single-level CDA, in order to evaluate whether the favorable outcomes and reliability of single-level CDA extend to multi-level surgery.

Materials and methods

Search strategy

To identify all of the relevant literature, we conducted a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)-compliant search of Medline, Embase, PubMed, the Cochrane Library, and the Cochrane Central Register of Controlled Trials (CENTRAL) using combinations of the following keywords: “cervical arthroplasty”, “cervical artificial disc replacement”, “cervical prosthesis”, “cervical total replacement”, “single-level”, and “multi-level or two-levels or three-levels”. We searched for randomized controlled trials (RCTs), prospective cohort studies, and retrospective cohort studies published between January 1990 and November 2014 that compared single-level with multi-level cervical arthroplasty. We placed no restrictions on the language of publication. References cited in the relevant articles were also reviewed. All studies were carefully examined to identify duplicate data. Criteria used to define duplicate data included study centers, treatment information, and any additional inclusion criteria.

Inclusion and exclusion criteria

Studies that met the following criteria were eligible for inclusion: (1) original research; (2) comparison of multi-level CDA with single-level CDA; (3) follow-up of more than 1 year. We excluded studies of the thoracic or lumbar spine, duplicate reports of an earlier trial, reviews, and case reports.

Data extraction

Two of the authors independently extracted the data from eligible studies, discussed discrepancies, and reached consensus on all items. The information extracted from each primary study included the title, author names, year of publication, country of origin, study design, sample size, type of arthroplasty prosthesis, duration of follow-up, and outcome parameters. The corresponding author of each study was contacted to obtain any missing information when required. The extracted data were rechecked for accuracy and against the inclusion criteria by the corresponding author.

Outcomes

The following outcomes were extracted from the included publications.

  1. Disability was assessed postoperatively using the neck disability index (NDI).

  2. Pain was assessed using the arm and neck visual analog scale (VAS).

  3. Complications included the following severe events related to surgical procedures or implants: numbness or paresthesia, dural tear, cerebrospinal fluid leakage, hematoma formation, dysphagia, dysphonia, and deep infections.

  4. Heterotopic ossification (HO) rate after cervical total disc replacement.

  5. Incidence of reoperation. Secondary procedures were performed in cases of continued neck and shoulder pain, prosthesis flexibility, or adjacent-level degeneration, and when implants or prostheses needed to be removed because of infection.

  6. Success rate. Outcomes in some studies were graded based on the Odom classification [16]. The SF-36 quality of life score was also used in some studies to assess the recovery of patients [17].

Quality assessment

The quality of the studies was independently assessed by the authors according to the Newcastle-Ottawa Scale (NOS). The manual was downloaded from the Ottawa Hospital Research Institute website. The NOS uses a star (“☆”) rating system, in which each star counts as one point, to judge the quality of cohort studies on three aspects: selection, comparability, and outcomes. Scores range from 0 to 9. Studies with a score ≥7 were regarded as high-quality.

Statistical analysis

We performed all meta-analyses with STATA 12.0 (StataCorp LP, College Station, TX, USA). For continuous outcomes, means and standard deviations were pooled to generate a mean difference (MD) with 95 % confidence intervals (CI). For dichotomous outcomes, the risk ratio (RR) or the odds ratio (OR) and 95 % CI were assessed. A value of p < 0.05 was considered statistically significant. Statistical heterogeneity was assessed using the I² statistic, which describes the proportion of the total variation across studies that is due to heterogeneity, on a scale from 0 to 100 % [18]. The random effects model was used when obvious heterogeneity was observed among the included studies (I² > 50 %). The fixed-effects model was used when there was no significant heterogeneity between the included studies (I² ≤ 50 %) [19]. Publication bias was not evaluated because fewer than 10 studies were assessed.
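The model-selection rule described above can be sketched in Python. This is a minimal illustration, not the STATA procedure used in the study: it assumes inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance for the random effects model (the text does not name the estimator), and the function name `pool_effects` and the inputs in the usage note are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95 % normal quantile


def pool_effects(effects, std_errors, i2_threshold=50.0):
    """Pool per-study effects (e.g. SMD or log RR) by inverse-variance weighting.

    Uses a fixed-effects model when I² <= i2_threshold and a random effects
    model (DerSimonian-Laird, an assumption of this sketch) otherwise.
    """
    k = len(effects)
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q and the I² statistic (percentage of variation due to heterogeneity)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    model = "fixed"
    if i2 > i2_threshold:
        model = "random"
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)  # between-study variance
        weights = [1.0 / (se ** 2 + tau2) for se in std_errors]

    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return {"model": model, "i2": i2, "pooled": pooled,
            "ci": (pooled - Z_95 * se, pooled + Z_95 * se), "z": z, "p": p}
```

For example, `pool_effects([-0.5, 0.0, 0.6], [0.1, 0.1, 0.1])` (made-up inputs) yields an I² well above 50 % and therefore selects the random effects model, whereas three identical effects give I² = 0 % and the fixed-effects model.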

Results

Study characteristics

The search of PubMed, Embase, Medline, the Cochrane Library, and the Cochrane Central Register of Controlled Trials (CENTRAL) initially identified 156 studies. Eighty-seven studies were excluded because they did not meet the inclusion criteria. A flow diagram of the selection process for relevant articles is shown in Fig. 1. Finally, eight studies were included in our meta-analysis; their characteristics are presented in Table 1. Of the eight studies, four were prospective cohort studies and the other four were retrospective cohort studies. In total, 506 patients underwent single-level CDA, and 240 patients in the multi-level group received CDA at 532 levels.

Fig. 1

Flow diagram of the selection process for relevant articles. A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)-compliant search of Medline, Embase, PubMed, the Cochrane Library, and the Cochrane Central Register of Controlled Trials (CENTRAL) was performed

Table 1 Patient and study characteristics of the eight included studies in the meta-analysis

Quality assessment

The study-specific quality scores from the NOS system are shown in Table 2. The mean score of the included studies was 7.375, with a range from 6 to 8. Seven of the eight studies were identified as relatively high-quality.

Table 2 Methodological quality of studies included in the meta-analysis assessed by the Newcastle-Ottawa Scale

Outcomes analysis

The comparison between multi-level and single-level cervical arthroplasty was based on common clinical outcomes, including NDI, VAS, reoperation rate, heterotopic ossification, and quality of life and satisfaction (success rate and SF-36 quality of life score) [20]. In that study, the mean NDI, radicular VAS, and cervical VAS were comparable between the two groups pre-operatively (p > 0.05 for all three comparisons) and decreased significantly, and to the same extent, at all post-operative time points compared with pre-operative values. At 24 months post-operatively, there was no significant difference between the groups in the NDI score (single-level 27.3 ± 1.8 versus multi-level 29.2 ± 2.9, p = 0.713), the radicular VAS score (single-level 24.4 ± 2.1 versus multi-level 26.7 ± 3.9, p = 0.790), or the cervical VAS score (single-level 23.5 ± 1.7 versus multi-level 23.9 ± 3.2, p = 0.593). The SF-36 quality of life score also improved markedly in both groups relative to the pre-operative baseline during follow-up, with no notable difference between the groups. The success rate, which takes into account both functional improvement and absence of revision surgery, was 69 % and 66 % in the single- and multi-level groups, respectively (p = 0.727). The number of patients with at least one complication or re-operation did not differ significantly between the groups (p = 0.109). Consistent with this, Kim et al. [21] reported that double- and single-level Bryan arthroplasty showed similar NDI and VAS scores at 1-year follow-up. Moreover, Wu et al. [22] reported that the clinical outcomes measured by neck VAS, arm VAS, and NDI all improved significantly after surgery compared with pre-operative status in both groups. Notably, there was no significant difference between the single- and multi-level groups at any time point of evaluation (i.e., pre-operatively and at 12 and 24 months post-operatively).

However, in another study [14] of 140 patients who underwent cervical arthroplasty with PCM prostheses (229 segments), the mean improvement in the NDI was 37.6 % for single-level cases versus 52.6 % for multi-level cases, a statistically significant difference (p = 0.021). The mean improvement in the VAS showed the same pattern: 58.4 % for single-level versus 65.9 % for multi-level cases. Only the reoperation rates and serious adverse events were similar between single-level and multi-level arthroplasty. These findings suggest that clinical recovery after multi-level cervical arthroplasty is superior to that after single-level arthroplasty, a conclusion that is not supported by the other three cohorts. This discrepancy may be due to differences between the properties of PCM prostheses and those of Bryan and other prostheses (discussed in “Discussion”). Notably, after the relevant data from the eight included studies were pooled and analyzed by meta-analysis, the overall difference in clinical outcomes between single- and multi-level CDA was not significant.

NDI

Four studies reported postoperative NDI scores for single-level and multi-level CDA. All four cohorts completed 1-year follow-up, and three of them completed 2-year follow-up. The meta-analysis showed between-study heterogeneity (1-year follow-up, I² = 93.9 %; 2-year follow-up, I² = 88.9 %), so the random effects model was used to calculate the pooled standardized mean difference (SMD) with the corresponding 95 % CI. In the forest plots, the overall effect estimate is shown by a diamond spanning the total or subtotal 95 % CI. A diamond that overlaps the vertical line of no effect indicates no statistically significant difference in NDI score between the two treatment groups. The SMD for the NDI was 0.024 (95 % CI −0.171 to 0.218; z = 0.24, p = 0.811) at 1-year follow-up and 0.009 (95 % CI −0.605 to 0.621; z = 0.03, p = 0.979) at 2-year follow-up (Fig. 2).
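As a consistency check on the figures above, the z statistic and p-value can be recovered from a pooled SMD and its 95 % CI, since under a normal approximation the standard error equals the CI width divided by 2 × 1.96. The sketch below is illustrative only; the helper name `z_and_p_from_ci` is hypothetical and is not part of the original analysis.

```python
import math

Z_95 = 1.959964  # two-sided 95 % normal quantile


def z_and_p_from_ci(estimate, ci_low, ci_high):
    """Return (standard error, z, two-sided p) implied by an estimate and its 95 % CI."""
    se = (ci_high - ci_low) / (2.0 * Z_95)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Reported NDI results at 1-year and 2-year follow-up
for label, smd, low, high in [("1 year", 0.024, -0.171, 0.218),
                              ("2 years", 0.009, -0.605, 0.621)]:
    se, z, p = z_and_p_from_ci(smd, low, high)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because the inputs are rounded to three decimals, the recovered z values agree with the reported z = 0.24 and z = 0.03, and the recovered p-values fall within about 0.01 of the reported ones.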

Fig. 2

Forest plot of the meta-analysis of the NDI score comparing single-level with multi-level CDA. Diamonds stand for the overall effect estimate. When the diamonds overlap the vertical line of no effect (null line), it indicates that there is no statistically significant difference in NDI score between the two treatment groups in 1- and 2-year follow-up

Neck pain

Four studies reported postoperative neck VAS scores at 1-year follow-up and three studies reported them at 2-year follow-up. The random effects model was used for the 1-year analysis because of heterogeneity (I² = 73.5 %). The fixed-effects model was used for the 2-year analysis because heterogeneity was not significant (I² = 0.0 %). The results showed no significant difference in neck VAS score between the single-level and multi-level CDA groups. The SMD for the neck VAS was 0.179 (95 % CI −0.213 to 0.571; z = 0.90, p = 0.370) at 1-year follow-up (Fig. 3a) and 0.039 (95 % CI −0.159 to 0.237; z = 0.39, p = 0.700) at 2-year follow-up (Fig. 3b).

Fig. 3

Forest plot of the meta-analysis of the neck VAS scores of single-level CDA versus multi-level CDA. a 1 year follow-up; b 2 years follow-up. The diamond of overall effect estimate overlaps the line of no effect, indicating there is no significant difference in neck VAS score between the two treatment groups

Arm pain

Four cohorts reported arm VAS scores at 1-year follow-up and three studies reported them at 2-year follow-up. The random effects model was used because of heterogeneity (1-year follow-up, I² = 73.9 %; 2-year follow-up, I² = 65.2 %). The results showed no significant difference in arm VAS score between the single-level and multi-level CDA groups. The SMD for the arm VAS was 0.199 (95 % CI −0.197 to 0.594; z = 0.98, p = 0.325) at 1-year follow-up and 0.039 (95 % CI −0.363 to 0.324; z = 0.11, p = 0.910) at 2-year follow-up (Fig. 4).

Fig. 4

Forest plot of the meta-analysis of the arm VAS score comparing single-level with multi-level CDA. The diamond overlaps the line of no effect, indicating no significant difference in arm VAS score between the two treatment groups

Adverse events

Reoperation

Three studies reported the incidence of reoperation. There was no significant difference in the incidence of reoperation between multi-level and single-level CDA (RR = 0.715, 95 % CI 0.271–1.890; z = 0.68, p = 0.499), with very low heterogeneity (I² = 0.0 %) (Fig. 5).
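Ratio measures such as the RR are pooled on the logarithmic scale, so the 95 % CI is symmetric around ln(RR) rather than around the RR itself. The following sketch, again only an illustrative check with a hypothetical helper name, recovers the z statistic and p-value from the reported RR and CI, and confirms that the RR sits at the geometric mean of the CI bounds.

```python
import math

Z_95 = 1.959964  # two-sided 95 % normal quantile


def ratio_stats_from_ci(ratio, ci_low, ci_high):
    """Return (SE of ln(ratio), z, two-sided p, geometric mean of the CI bounds)."""
    se_log = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = math.log(ratio) / se_log
    p = math.erfc(abs(z) / math.sqrt(2.0))
    centre = math.sqrt(ci_low * ci_high)  # should be close to the reported ratio
    return se_log, z, p, centre


# Reported reoperation result: RR = 0.715, 95 % CI 0.271-1.890
se_log, z, p, centre = ratio_stats_from_ci(0.715, 0.271, 1.890)
print(f"SE(ln RR) = {se_log:.3f}, |z| = {abs(z):.2f}, p = {p:.3f}, centre = {centre:.3f}")
```

With the reported reoperation figures this gives |z| of about 0.68 and p of about 0.50, in line with the values in the text; the sign of z is negative because the RR is below 1.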

Fig. 5

Forest plot of the meta-analysis of the reoperation rate comparing single-level CDA with multi-level CDA. The diamond overlaps the null line, indicating no statistically significant difference between the two groups

Heterotopic ossification

Five studies reported the postoperative incidence of heterotopic ossification (HO). HO was graded according to the classification proposed by McAfee et al. [23]. The meta-analysis of the incidence of HO from McAfee Grade I to IV showed no significant difference between the single-level and multi-level CDA groups (RR = 0.648, 95 % CI 0.295–1.427; I² = 80.3 %, z = 1.08, p = 0.282) (Fig. 6a). A meta-analysis of the severe complication, Grade IV HO, was also performed in three studies. The difference between the two groups was not significant (RR = 1.246, 95 % CI 0.608–2.554; z = 0.60, p = 0.547), with moderate heterogeneity (I² = 27.5 %) (Fig. 6b).

Fig. 6

Forest plot of the meta-analysis of the postoperative incidence of heterotopic ossification (HO) comparing single-level with multi-level CDA. a The incidence of HO from Grade I to IV was pooled; b only the data on severe (Grade IV) HO were pooled. The diamond touches the null line, indicating no statistically significant difference between the two groups

Quality of life

Success rate

Clinical outcomes were assessed using the Odom outcome classification [16]. Clinical success was defined as a patient report of “excellent” or “good” in the Odom rating system. Three cohorts reported the success rate of multi-level and single-level CDA. The difference between the two groups was not significant (RR = 1.470, 95 % CI 0.768–2.815; I² = 91.5 %, z = 1.16, p = 0.245) (Fig. 7).

Fig. 7

Forest plot of the meta-analysis of the clinical success rate comparing single-level with multi-level CDA. The diamond touches the null line, indicating no statistically significant difference between the two groups

SF-36 quality of life score

The SF-36 physical component score (PCS) and mental component score (MCS) were reported in two studies. Pooled analysis showed no significant difference in the PCS between multi-level and single-level CDA. At 1-year follow-up, SMD (95 % CI) = 0.075 (−0.201 to 0.350), I² = 0.0 %, z = 0.53, p = 0.595 (Fig. 8a). At 2-year follow-up, SMD (95 % CI) = −0.070 (−0.641 to 0.501), I² = 59.9 %, z = 0.24, p = 0.810 (Fig. 8b). The difference in the MCS between the two groups was also not significant. At 1-year follow-up, SMD (95 % CI) = 0.146 (−0.130 to 0.421), I² = 0.0 %, z = 1.03, p = 0.301; at 2-year follow-up, SMD (95 % CI) = 0.002 (−0.273 to 0.278), I² = 59.9 %, z = 0.02, p = 0.986 (Fig. 8c).

Fig. 8

Forest plot of the meta-analysis of the postoperative SF-36 quality of life score comparing single-level with multi-level CDA. a Physical component score (PCS) at 1-year follow-up; b PCS at 2-year follow-up; c mental component score (MCS) at 1- and 2-year follow-up. The diamond touches the null line, indicating no statistically significant difference between the two groups

Discussion

Cervical disc arthroplasty has been used increasingly in recent years to treat degenerative cervical disease. A number of studies have compared single-level CDA with ACDF. Previous meta-analyses reported that the clinical outcomes of cervical arthroplasty are equivalent or superior to those of ACDF for the treatment of single-level cervical disease [24, 25]. Single-level CDA is commonly accepted because its outcomes are as satisfactory as those of ACDF. However, the outcomes and reliability of multi-level CDA remain debated.

In our previous study, we designed a prospective randomized controlled trial to compare the clinical outcomes of two-level ACDF and CDA [26]. The evaluation of the 65 patients indicated that Bryan artificial cervical disc replacement appeared reliable and safe in the treatment of patients with two-level cervical disc disease. In a cadaveric study [27], Phillips showed that two-level CDA allowed near-normal mobility at the index and adjacent levels. In another in vitro study [28], Laxer showed that adjacent discs experience substantially lower pressure after two-level disc replacement than after two-level anterior fusion (ACDF). Nonetheless, to date, the number of studies comparing the outcomes of multi-level CDA with ACDF is too small to be pooled by meta-analysis. To investigate whether the outcomes and reliability of multi-level CDA are superior or inferior to those of single-level CDA, we focused our literature search on studies comparing the two procedures. Only eight cohorts met the inclusion criteria of our meta-analysis; four were prospective and the others retrospective. No RCT comparing multi-level CDA with single-level CDA was found. The methodological quality assessed by the NOS system showed that seven of the cohorts were relatively high-quality and one was of moderate quality. Clinical heterogeneity among the included studies arose from differences in biomechanical properties, surgical indications, cervical prostheses, and the biophysical environment. These methodological limitations should therefore be considered when interpreting the findings of this meta-analysis. In addition, the possibility of publication bias was not assessed because of the small number of included studies.

In our meta-analysis, multi-level arthroplasty improved the NDI to a similar extent as single-level CDA. The heterogeneity arose from the study of PCM prostheses reported by Pimenta [14], which showed an overall mean improvement in the NDI of 37.6 % for single-level cases versus 52.6 % for multi-level cases, a statistically significant difference (p = 0.021). In contrast, Huppert [20] reported no significant difference between the groups in the NDI score at 1 and 2 years postoperatively. The retrospective cohort of Wu [22] measured clinical outcomes by neck VAS, arm VAS, and NDI, and likewise found no significant differences in these parameters between the single- and multi-level groups at 1- or 2-year follow-up. We therefore pooled the studies reporting the NDI of the two groups using the random effects model because of the high heterogeneity. The data on neck VAS, arm VAS, and quality of life were also analyzed, and our results showed that multi-level CDA was as effective as single-level CDA in improving the outcomes and functional recovery of patients. The variation in outcomes between studies may be attributable to the different prostheses used. Cervical prostheses are mainly divided into three types: minimally constrained or unconstrained (Bryan, Mobi-C), semi-constrained (PCM, Prestige-LP), and constrained (Prodisc-C). Park [29] reported that four types of prostheses (Bryan, Mobi-C, PCM, and Prestige-LP) preserved sagittal range of motion (ROM) and increased superior adjacent segment kinematics, regardless of prosthesis design. Patients who received the unconstrained prostheses Bryan and Mobi-C showed increased ROM at the index level and a high incidence of adjacent segment degeneration (ASD) compared with patients who received the semi-constrained prostheses (PCM and Prestige-LP). Most of the prostheses studied in this meta-analysis were Bryan prostheses.

A long-term follow-up of Bryan CDA [8] reported that the clinical outcomes at 4 and 6 years postoperatively appeared consistent with the earlier results at 1 and 2 years postoperatively. The mean angular motion at 4 and 6 years postoperatively for single-level patients was 7.3° and 7.7°, respectively. Two-level patients had slightly less motion, with mean caudad values of 5.7° (4 years) and 6.0° (6 years) and cephalad values of 4.2° (4 years) and 6.2° (6 years), suggesting that two-level CDA reliably preserved cervical ROM at long-term follow-up. Because ROM improvement was reported using inconsistent criteria, measured either over the total cervical spine or at different implanted levels, we did not pool the ROM results by meta-analysis.

Data on adverse events were not reported consistently in the included studies. We reviewed postoperative complications, reoperation rate, and heterotopic ossification in multi-level and single-level CDA. The reoperation rate was reported in three studies, and the analysis showed that the multi-level group had a reoperation rate similar to that of the single-level group. The incidence of HO after cervical arthroplasty varies widely in the previous literature. The reported incidence of HO ranges from none to more than two-thirds of cases across a variety of cervical artificial discs. For the Bryan disc, the reported rate of HO has varied from 48.1 % in 52 levels, to 29 % in 59 levels, to 17.8 % in 90 levels [30–32]. However, rates of HO above 60 % have been reported for other kinds of artificial discs [33–35]. The discrepancy in the reported incidence might be attributed to the type of artificial disc, differences in how HO was determined, and the level of scrutiny. HO was graded according to the classification proposed by McAfee et al. [23]. Grade 0 means no HO is present, and Grade IV represents severe HO causing inadvertent arthrodesis, bony ankylosis, bridging trabecular bone continuous between adjacent endplates, and <3° of motion on lateral flexion–extension radiographs. Five studies reported the incidence of HO (Grade I to Grade IV) and three of them reported data on Grade IV HO. The data on severe (Grade IV) HO and on the total incidence of HO were pooled separately. The meta-analysis showed no significant difference in the incidence of HO (severe or total) between multi-level and single-level CDA.

This study has several strengths and limitations. The strengths include a rigorous search strategy, no language restrictions, article screening and methodological assessment performed in duplicate, abstracted data verified by a second reviewer, and use of the NOS system to judge the quality of the evidence. In addition, this is the first meta-analysis to compare multi-level with single-level CDA. However, some limitations should be acknowledged. First, there was no RCT comparing the outcomes of multi-level and single-level CDA. The included studies comprised four prospective and four retrospective cohorts, whose level of evidence is lower than that of RCTs. Second, the statistical power could be improved in the future by including more studies. Because of the small number of included studies, some parameters could not be analyzed by subgroup, so the high heterogeneity, which may undermine the consistency of the outcomes, could not be reduced. Moreover, clinical heterogeneity might be caused by the different indications for surgery, the characteristics of the different prostheses, and the surgical techniques used at different treatment centers.

In summary, this meta-analysis indicates that the outcomes and functional recovery of patients treated with multi-level CDA are equivalent to those of patients treated with single-level CDA. Nevertheless, more well-designed studies with large groups of patients are needed to provide further evidence of the effect and reliability of multi-level CDA in the treatment of cervical disc disease.