Introduction

Low back pain (LBP) is a very common and chronic condition associated with disability and medical consultations in Western countries. The literature reports that the 1-year incidence of LBP ranges between 1.5 and 36%, recurrence rates range between 24 and 80%, and the prevalence of LBP reaches 18% [1]. Although LBP is associated with many different etiologies, degenerative disc disease (DDD) is one of the most frequent and important causes, especially in young adults. Recent innovations in regenerative medicine provide new viable treatment options [2]. Total disc replacement (TDR) and interbody fusion have been developed to treat DDD in patients in whom conservative treatment has failed. Currently, interbody fusion is the gold standard in the treatment of DDD: it provides solid anterior support and removes abnormal motion and disproportionate loading on pathologic disc tissue, thereby ensuring pain relief and improved quality of life [3]. TDR has been adopted by many spinal surgeons as a reliable alternative to fusion, with mid- to long-term follow-up available [4,5,6,7]. Although the biomechanical advantages of TDR, such as restoration of disc height, motion preservation and reduction of adjacent segment degeneration (ASD), have been highlighted, lumbar arthroplasty is not universally accepted [4, 7].

Recently, Stubig et al. [8] have demonstrated comparable clinical results and lower operative costs in patients treated with TDR when compared to lumbar interbody fusion.

Nevertheless, concerns regarding long-term outcome, implant survival and late complications still limit wide adoption of TDR within the spine surgery community [7].

The aim of this study is to report the clinical and radiographic outcomes, the rate of complications and the influence on spinal alignment at long-term follow-up of patients who underwent lumbar total disc arthroplasty, thereby providing evidence to help determine the profile of the patients best suited to TDR.

Material and methods

A retrospective review of patients who underwent TDR was performed. Between 1998 and 2008, 47 patients underwent TDR. A single surgeon highly trained in spine surgery (C.F.) performed all procedures. Written informed consent was obtained from each patient.

Indications for TDR applied by the senior author were: age between 25 and 65 years, chronic discogenic back pain resistant to conservative treatment for at least 1 year, absence of permanent nerve root compression, and symptomatic lumbar DDD evidenced by radiographic analysis and magnetic resonance imaging (MRI). Exclusion criteria for disc arthroplasty were: significant arthritic changes at the facet joints, previous spinal surgery at the painful level, lumbar fractures, scoliosis greater than 15° Cobb angle, instability and/or spondylolisthesis, symptomatic disc herniation, pregnancy, spinal tumours, general or local infections and comorbidities such as autoimmune disease, obesity and major bone disease.

Patients were screened for main demographic data such as age at surgery, sex, body mass index (BMI), smoking status and baseline comorbidities. Surgical data included treated levels, number of motion segments involved in the surgical procedure, surgical time and complications. The senior surgeon submitted every patient to a clinical [visual analogue scales (VAS) back, VAS leg and Oswestry Disability Index (ODI), when available] and radiological (standing lumbosacral and flexion/extension X-rays) evaluation protocol preoperatively and post-operatively. Facet joint status was evaluated with a CT scan in all patients before surgery. Preoperative MRI was retrospectively analysed to assess disc degeneration and vertebral endplate signal changes according to the Pfirrmann and Modic classifications, respectively [9, 10]. Patients without a recent follow-up (within 1 year from the date of the present study) were recalled for a new evaluation.

Retrospectively, a trained spine surgeon analysed all the collected X-rays (preoperative, at 1 year and at final follow-up) to evaluate spinopelvic parameters [pelvic incidence (PI), lumbar lordosis (LL), segmental lordosis (SL), pelvic tilt (PT) and sacral slope (SS)].

Intervertebral disc height was systematically recorded. The mobility of each implant was evaluated by measuring, on the dynamic radiographs, the angles formed by the superior and inferior sides of the implant. By subtracting the hyperextension angle from the hyperflexion angle, the implant’s arc of mobility was obtained. A difference greater than 3° was set as the threshold to consider the implant still mobile [4].
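The mobility criterion described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sign convention (both angles measured in the same direction between the implant endplates) and the example angles are assumptions, not part of the study protocol.

```python
# Threshold reported in the text: an arc strictly greater than 3 degrees
# means the implant is considered still mobile.
MOBILITY_THRESHOLD_DEG = 3.0


def arc_mobility(flexion_angle_deg, extension_angle_deg):
    """Arc of mobility: hyperflexion angle minus hyperextension angle.

    Assumes both angles are measured between the superior and inferior
    sides of the implant with the same sign convention.
    """
    return flexion_angle_deg - extension_angle_deg


def is_mobile(flexion_angle_deg, extension_angle_deg,
              threshold_deg=MOBILITY_THRESHOLD_DEG):
    """True when the arc of mobility exceeds the threshold."""
    return arc_mobility(flexion_angle_deg, extension_angle_deg) > threshold_deg


# Hypothetical measurements, for illustration only.
example_arc = arc_mobility(12.0, 4.0)
example_mobile = is_mobile(12.0, 4.0)
example_fused = is_mobile(5.0, 3.0)
```

With this reading, an implant whose arc is exactly 3° is not counted as mobile, because the text requires a difference greater than the threshold.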

Statistical analysis

Continuous variables were reported as mean ± standard deviation and compared using the Student t test or Mann–Whitney U test if not normally distributed. Categorical variables were expressed as the number of cases or percentage. Confidence intervals (CIs) were calculated for each comparison.

Repeated measures analysis of variance (ANOVA) was used to compare the means of continuous, normally distributed variables across three or more time points. The Friedman test was used if variables were not normally distributed.
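As a rough illustration of the non-parametric option, the Friedman statistic for three repeated measurements can be computed with the standard library alone. The sketch below is not the software used in the study: the scores are invented, ties receive average ranks without a tie correction, and the p value relies on the chi-square approximation, which is coarse for small samples.

```python
import math


def friedman_statistic(rows):
    """Friedman chi-square for n subjects (rows) by k repeated measures.

    Values are ranked within each subject; tied values receive the mean
    of their rank positions. No tie correction is applied.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            # extend the group while the next value equals the current one
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for t in range(i, j + 1):
                rank_sums[order[t]] += mean_rank
            i = j + 1
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


def p_value_three_measures(statistic):
    """Chi-square upper tail for 2 degrees of freedom (k = 3 only)."""
    return math.exp(-statistic / 2.0)


# Hypothetical VAS back scores: preoperative, 1 year, final follow-up.
scores = [[8, 3, 4], [7, 2, 3], [9, 4, 5], [6, 1, 2]]
stat = friedman_statistic(scores)
p = p_value_three_measures(stat)
```

The closed-form tail is valid only for three time points, where the statistic has 2 degrees of freedom; other designs would need a general chi-square distribution function.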

A Kaplan–Meier survival function was created to analyse survivorship of the present series, with revision for any reason as the endpoint.
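A minimal sketch of the Kaplan–Meier product-limit estimator is given below. The three revision times (2, 7 and 10 years) are those reported in this series; the follow-up times of the 27 unrevised patients are not available individually, so they are all censored at a placeholder of 164 months (the mean follow-up), which is an assumption made purely for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up in months for each patient
    events: True if revision occurred at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # events and censored patients both leave the risk set after t
        at_risk -= leaving
    return curve


# Revisions reported in the series (months); censoring times are placeholders.
times = [24, 84, 120] + [164] * 27
events = [True, True, True] + [False] * 27
curve = kaplan_meier(times, events)
```

Under these placeholder censoring times the last point of the curve corresponds to revision-free survival of 27 out of 30 patients; real censoring times would change the shape of the curve between events but not the estimator itself.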

For all the analysed data, a two-tailed p value of < 0.05 was considered statistically significant.

A post hoc power calculation was performed considering the final VAS back as the primary outcome measure. With the VAS back value of 7.32 ± 1.33 reported in the literature for patients affected by DDD [4] and a type I error probability (α) of 0.01, the resulting post hoc power of the present study on 30 patients was Φ(14.8), corresponding to 100% power.
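To see why Φ(14.8) is reported as 100% power, the standard normal cumulative distribution function can be evaluated directly. The sketch below takes the value 14.8 as reported and does not attempt to reproduce how it was derived; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# z value as reported in the text; its derivation is not reproduced here.
z_reported = 14.8

# Power as the standard normal CDF evaluated at the reported z value.
power = NormalDist().cdf(z_reported)

# Complementary probability (type II error), computed with erfc so that
# it does not vanish to zero through floating-point cancellation.
beta = 0.5 * math.erfc(z_reported / math.sqrt(2.0))

print(f"power = {power:.6f}, beta = {beta:.3e}")
```

The complementary probability is far below any practically meaningful level, which is why the power rounds to 100%.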

Results

Demographic data

Thirty of the 47 patients were included, with a mean follow-up (FU) of 164 ± 36.5 (120–240) months. Seventeen patients (36.2%) were excluded: 7 were lost to follow-up and 10 had incomplete perioperative clinical or radiological data. The study population included 25 females (83.3%) and 5 males (16.7%) with a mean age at surgery of 40.2 ± 10.3 (17–58) years and an average BMI of 26 ± 2.4 (21–31) kg/m². Twenty-four Maverick lumbar disc prostheses (Medtronic, Memphis, TN, USA), 5 ProDisc II lumbar implants (Synthes, Paoli, PA, USA) and 3 Charité artificial discs (DePuy Spine, Raynham, MA, USA) were implanted. Biomechanically, 29 implants (91%; Maverick and ProDisc II prostheses) were considered relatively constrained/fixed bearing devices and three (9%; Charité artificial disc) unconstrained/mobile bearing devices [11].

Mean surgical time was 121 ± 55.2 (80–240) minutes. Thirty-two levels were treated. The most frequently treated level was L5S1 (n = 15, 46.9%), followed by L4L5 (n = 12, 37.5%), L3L4 (n = 4, 12.5%) and L2L3 (n = 1, 3.1%). In two patients (6.7%) with two-level lumbar disease, a double-level TDR was performed. In two other patients (6.7%) with double-level lumbar degenerative disease, a hybrid construct was performed. Such constructs combine the advantages of a single-level anterior lumbar interbody fusion (ALIF) with those of a single-level arthroplasty [12]. Forty percent of the patients presented a Modic 1 sign and 26.7% a Modic 2 sign. In the remaining patients (33.3%), no Modic sign was identified. All operated discs had a preoperative Pfirrmann grade of 3 or more (22 discs grade 3, 10 discs grade 4). Figure 1 shows an example of L5S1 DDD treated with TDR in a young female.
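The distribution of treated levels can be checked with simple arithmetic. The short sketch below uses only the counts given above; the variable names are illustrative.

```python
# Number of implants per treated level, as reported in the text.
levels = {"L5S1": 15, "L4L5": 12, "L3L4": 4, "L2L3": 1}

total_levels = sum(levels.values())

# Share of each level as a percentage of all treated levels.
shares = {level: 100.0 * n / total_levels for level, n in levels.items()}

for level, pct in shares.items():
    print(f"{level}: n = {levels[level]}, {pct:.1f}%")
```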

Fig. 1 Sample of L5S1 degenerative disc disease. a, b Preoperative standing X-rays, c, d preoperative sagittal magnetic resonance images, e, f last follow-up standing X-rays

Patient features and surgical data are summarized in Table 1.

Table 1 Patient features and surgical data

Clinical and radiological outcomes

Significant improvements in VAS back (p < 0.001) and ODI score (p < 0.001) were found, whereas VAS leg values did not change significantly (p = 0.087).

Moreover, the improvements in VAS back and ODI score were stable: no significant change was identified between 1-year and long-term follow-up (p > 0.2).

Clinical outcomes are summarized in Table 2.

Table 2 Clinical outcomes

Within the limits of the available numbers, analysis of clinical scores and failure rates showed no differences between fixed and mobile TDR designs (p > 0.7 and p > 0.5, respectively); between different device models (p > 0.7 and p > 0.6, respectively); between the hybrid group and the single-level TDR group (p > 0.5 and p > 0.6, respectively); or between the hybrid group and the double-level TDR group (p > 0.2 and p > 0.6, respectively). Moreover, no differences in clinical score (p > 0.6) or failure rate (p > 0.5) related to the grade of preoperative disc degeneration (Pfirrmann 3 vs. Pfirrmann 4) were observed.

Three degrees was set as the threshold to assess the mobility of the prosthesis. According to this criterion, 68.75% of the implants (22 implants) had preserved motion. Average mobility in flexion and extension at final follow-up was 10.2° ± 1.1° at L4–L5 and 7.9° ± 1.2° at L5–S1. There was a significant difference in mobility between the L4L5 and L5S1 segments (p < 0.001, 95% CI 1.36–3.74). Nonetheless, no correlation between the mobility of the implant and clinical scores (ODI, VAS back and VAS leg) was found (p > 0.15). The statistical analysis highlighted no significant differences in mobility among the different devices and biomechanical designs of TDR (p > 0.2). The preoperative grade of disc degeneration did not correlate with post-operative motion preservation of the implant (p > 0.3).

No difference in clinical outcomes based on the instrumented level was found. Moreover, BMI was not correlated with clinical outcome.

Complications

Two (6.7%) approach-related complications were reported: a case of retrograde ejaculation (which resolved spontaneously within 6 months) and a postsympathectomy syndrome in a young female. Two cases of partial resolution of back pain were observed.

At long-term follow-up assessment, 1 case of aseptic loosening of TDR and 3 ASD were observed.

The case of aseptic loosening was recorded 2 years after surgery at the L3L4 level in a patient with a previous lumbar interbody fusion at a lower level. The prosthesis was removed, and an anterior corpectomy with a Harms cage and posterior fusion was performed. This was the only case requiring implant removal.

Two cases [L3L4 TDR at 7 years (Fig. 2) and L5S1 TDR at 10 years] of upper level ASD required surgical revision with circumferential fusion. The other case of ASD was managed conservatively. The overall rate of complications was 20%.
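The reported rates can be reproduced from the individual events listed in this section. The tally below is a sketch under one explicit assumption: the two cases of partial resolution of back pain are not counted as complications, since counting them would not give the reported 20%. Variable names are illustrative.

```python
COHORT_SIZE = 30  # patients included in the final analysis

# Complications listed in the text (partial pain resolution excluded,
# which is an assumption about how the 20% figure was obtained).
complications = {
    "retrograde ejaculation": 1,
    "postsympathectomy syndrome": 1,
    "aseptic loosening": 1,
    "adjacent segment degeneration": 3,
}

# Events that required revision surgery.
revisions = {
    "aseptic loosening": 1,
    "adjacent segment degeneration": 2,
}


def rate(count, total=COHORT_SIZE):
    """Percentage of the cohort affected."""
    return 100.0 * count / total


overall_rate = rate(sum(complications.values()))
approach_rate = rate(complications["retrograde ejaculation"]
                     + complications["postsympathectomy syndrome"])
revision_rate = rate(sum(revisions.values()))
```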

Fig. 2 Case of adjacent segment degeneration (ASD) in a hybrid construct at 7 years. a, b Preoperative standing X-rays showing proximal ASD, c, d post-operative standing X-rays

Figure 3 shows the Kaplan–Meier survival function of the population, with revision for any cause considered as failure. Table 3 summarizes the reported complications.

Fig. 3 Kaplan–Meier survival function for failure for any cause of revision in patients treated with total disc replacement

Table 3 Complications

No cases of infection and no cases of clear subsidence of the prosthesis were found. However, some devices were sub-optimally implanted. These radiographic imperfections had no clinical relevance.

Sagittal alignment

Spinopelvic parameters (PI, LL, PT) were retrospectively evaluated on preoperative, 1-year post-operative and last follow-up radiographs (Table 4). No significant changes in spinopelvic parameters between baseline and 1-year follow-up were recorded, whereas differences between preoperative and last FU values were noted. However, at final follow-up 73.3% of the patients were considered globally well aligned [13].

Table 4 Spinopelvic parameters

Discussion

Total VAS back and ODI scores decreased significantly from the preoperative evaluation to 1 year after surgery. Although these scores increased from the 1-year clinical examination to the last follow-up, a change probably secondary to age-related degeneration of the lumbar spine, they remained significantly lower than the preoperative values. These clinical improvements are consistent with outcomes reported in the literature. Our results are comparable to other mid- to long-term studies (more than 5 years) and to studies with larger cohorts of patients, which confirm the beneficial effect of TDR in the treatment of DDD [5, 6].

We did not find any differences in clinical and functional outcomes related to age, sex, BMI, preoperative grade of disc degeneration, biomechanical design or type of implant. For double-level DDD, a double-level TDR was performed in two patients, while a hybrid construct was preferred in two other cases.

Furthermore, no clinical and functional differences were found between the hybrid group and double-level TDR. These findings seem to be confirmed by Andrieu et al. [14].

Moreover, we did not find any statistically significant differences between patients treated with single-level and multi-level TDR. By contrast, other authors have shown that single-level TDR gave better results than double-level TDR [15]. It would be reasonable to expect a higher overall complication rate in the multi-level group due to increased surgical time, blood loss and number of instrumented levels [5]. We must underline that we treated only two patients with hybrid constructs and two with double-level TDR, which precludes a careful analysis.

We reported an overall complication rate of 20% and a long-term revision rate of 10% (2 cases of ASD and 1 prosthesis removal due to aseptic loosening). These results are in line with those of other authors, as most papers have shown similar rates [6, 16, 17]. Nevertheless, the existing literature also includes higher rates of overall complications. Guyer et al. [18] reported a comparable reoperation rate (10.3%) but a higher overall rate of complications (71.1%). The reason for this variability in the overall complication rate is not clear; different (less selective) indications for TDR may explain these data.

With regard to ASD, different surgical strategies have been proposed as potential solutions [19].

TDR has the theoretical advantage of restoring disc height and preserving spinal segment motion, thereby reducing excessive strain at adjacent levels and decreasing the risk of ASD. Our cohort confirmed these findings, showing a high rate of preserved mobility. However, we reported an overall ASD rate of 10%. Our results appear to be average compared with the recent literature. Some authors have reported very low ASD rates [5], but these may represent an exception, since most studies have shown worse results [20]. Huang et al. [21] described a clear relationship between TDR range of motion and the presence of ASD at 8.6-year follow-up: patients with motion of 5° or greater had a 0% prevalence of ASD, whereas patients with motion of less than 5° had a 34% prevalence of ASD (p = 0.021, odds ratio 13.5). However, the authors stated that ASD had no statistically significant effect on clinical outcome.

Based on these outcomes, we are not able to conclude that TDR is a protective factor against ASD, but we can state that TDR is at least as safe as fusion [22].

We observed a significant difference in mobility between the two most frequently treated levels, but the mobility of the implant was not correlated with clinical outcome. These findings are confirmed by other authors [4, 23]. We did not find any difference in post-operative implant mobility related to the grade of preoperative disc degeneration. The relatively constrained/fixed bearing design reduces stress and excessive load on the facet joints [11]. Moreover, the lordosis distribution related to Roussouly morphotypes could represent another significant factor in the preservation of implant mobility.

Huec et al. [23] reported that when a prosthesis is implanted at L5–S1 or L4–L5, the local lordosis increases and the lordosis at the level above significantly decreases. However, the authors stated that the prosthesis has enough range of motion to allow the patient to achieve or maintain the natural sagittal and spinopelvic balance needed to prevent undue stress on the muscles and the sacroiliac joint.

Huec et al. [23] demonstrated that TDR did not significantly modify spinopelvic parameters, confirming our findings between baseline and 1-year FU. In our series, differences were noted at long-term FU, probably secondary to age-related degeneration of the lumbar spine.

One of the most controversial aspects of TDR is its ideal indication. The experience acquired in the last 20 years has allowed us to determine the profile of the patients best suited to TDR. The inclusion and exclusion criteria have already been stated. Disc arthroplasty requires stable fixation to the bone. Patients with poor bone quality might develop vertebral body failure and fixation failure. In our experience, two other main aspects have to be analysed in detail. The first is the importance of coronal deformity. We set a threshold of 15° Cobb angle, as reported by other authors [4]. We observed that the only case of aseptic loosening in our cohort affected a patient with a coronal deformity approaching this threshold. However, within the limits of the available numbers, we were not able to find significant correlations between coronal deformity and clinical outcome. The asymmetrical load distribution at the bone–device interfaces could explain the reported failure. Therefore, we underline the importance of evaluating not only the sagittal alignment but also the coronal alignment.

The second aspect we would like to discuss is baseline PI. PI and spinopelvic relations have become widely accepted in daily clinical practice only in the last decade; therefore, the senior surgeon did not originally consider PI as a possible limitation of TDR. Plais et al. [4] excluded patients classified as Type 4 according to Roussouly et al. [24], since the high stress and excessive load on the facets are associated with facet joint arthrosis and impairment of the prosthesis. Pellet et al. [25] supported this thesis. The authors showed that in cases of low PI, it was necessary to maintain a Roussouly type 1 or 2 without increasing lordosis with an L4L5 prosthesis. By contrast, L5S1 arthrodesis seemed a more suitable approach for treating patients with elevated PI and SS (back type 3 or 4). This retrospective analysis showed that our cohort had intermediate PI values and, for this reason, we obtained good clinical and functional outcomes with this surgical technique.

The study presents several limitations. It is a retrospective case series with a limited population; however, TDR has very strict and selective indications, as previously listed. Thirty-six percent of the patients who originally underwent TDR were excluded from the final analysis, a figure compatible with other long-term studies [4]. Seven (16.7%) patients were lost to follow-up; ten other patients were clinically and radiologically evaluated for study enrolment but were ultimately excluded due to incomplete perioperative clinical and radiological data.

This study results from the analysis of a single surgeon’s database. Its strengths are: a long-term follow-up [mean FU of 164 ± 36.5 (120–240) months]; strict and homogeneous indications for surgery; and limited heterogeneity of surgical procedures and of clinical and radiological FU. Nonetheless, given the limited size of the present series, type II errors cannot be excluded. Another source of bias is the poor quality of some MRI images, especially in the oldest examinations.

Wide adoption of TDR remains limited due to concerns regarding long-term outcome, implant survival and late complications. The aim of our study was to report our 20-year experience with disc arthroplasty, showing long-term complications and clinical and radiological results.

Conclusion

Relevant clinical improvements and long-lasting reduction in pain were achieved at long-term follow-up in patients who underwent TDR. Nevertheless, an optimal surgical indication is crucial to achieve excellent clinical outcomes. We strongly advocate further high-quality long-term studies to better clarify the role of single-level TDR, multi-level TDR and hybrid constructs and, above all, the importance of spinopelvic alignment in indication and outcome. Lastly, our long-term results confirm the existing evidence on the efficacy and safety of TDR as a reliable option to treat DDD when the surgical indication is optimal.