Introduction

Spinal fusion surgery is an effective treatment method to prevent curve progression in patients with idiopathic scoliosis. However, there are concerns regarding the long-term effects of spinal fusion, such as back pain, poorer physical function, reduced spine mobility, reduced functional sports activity, poorer self-perception of appearance, poorer psychological well-being and late infections requiring implant removal.

Back pain after spinal fusion surgery is common [1,2,3,4,5,6,7,8,9,10]. When compared with matched non-scoliotic controls, the incidence of back pain in patients with spinal fusion was higher [2, 3]. There were suggestions that fusion to the lower lumbar vertebrae caused a higher incidence of back pain [5, 7]. Late infections causing back pain and requiring implant removal have been reported [9]. The incidence of back pain may be related to poorer physical function, which affects the activities of daily living [1, 2, 4, 11, 12]. However, other reports documented no worsening of physical function, or better perceived function, compared to non-surgical patients [3, 8, 10, 13,14,15,16,17,18,19,20,21,22,23]. A reduction in spine mobility is expected after surgery, especially when fusion is extended to the lower lumbar spine [19]. This may affect the patients’ functional sports activities [12]. Corrective surgery is expected to maintain or improve patients’ self-image and self-perception of appearance by reducing the curvature magnitude [15, 24]. However, not all patients attain this improvement, and some have reported poorer self-perception of appearance [6, 11]. The psychological well-being and mental health of postoperative patients may also be affected [11, 17].

In posterior spinal fusion (PSF) surgery for patients with adolescent idiopathic scoliosis (AIS), the selection of fusion levels and determination of the lowest instrumented vertebra (LIV) had been reported to affect the functional outcome, spine mobility and degeneration of the spine [5, 7, 25]. Due to these important factors, this study was designed to investigate whether LIV selection would affect the mid-long-term (more than 10 years) health-related quality of life (HRQoL), functional outcome and degeneration of the remaining unfused lumbar intervertebral discs in AIS patients who had PSF surgery. We also analysed whether LIV selection would affect the degeneration of the adjacent disc below LIV, the second disc below LIV (adjacent disc + 1), any disc below the LIV and all the discs below the LIV.

Methods

Study design

This was a cross-sectional study conducted at a tertiary deformity centre. All patients who attended the centre’s outpatient clinic were screened for inclusion. This study received institutional review board approval from the University Malaya Medical Centre Medical Ethics Committee (approval number MECID No: 2016126-4669). Inclusion criteria were AIS patients who had undergone PSF surgery more than 10 years ago, age at time of surgery between 10 and 18 years, and consent to participate in the study. Exclusion criteria were non-idiopathic scoliosis, congenital scoliosis, psychological disorders and medical or surgical co-morbidities. Consent was obtained from all patients who agreed to participate.

Division of groups

Patients were divided into 2 groups:

  • Group 1 (G1): LIV at L3 or higher

  • Group 2 (G2): LIV at L4 or lower
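The grouping rule above can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the study's methods: the names `LEVELS`, `assign_group` and `discs_below` are hypothetical, and the level ordering assumes a standard spine with 12 thoracic and 5 lumbar vertebrae.

```python
# Vertebral levels ordered from cranial to caudal (assumes 12 thoracic, 5 lumbar).
LEVELS = [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)]


def assign_group(liv: str) -> str:
    """Return 'G1' if the LIV is at L3 or higher, otherwise 'G2' (L4 or lower)."""
    if liv not in LEVELS:
        raise ValueError(f"unknown vertebral level: {liv}")
    return "G1" if LEVELS.index(liv) <= LEVELS.index("L3") else "G2"


def discs_below(liv: str) -> list[str]:
    """List the unfused discs below the LIV, from the adjacent disc down to L5/S1."""
    if liv not in LEVELS:
        raise ValueError(f"unknown vertebral level: {liv}")
    chain = LEVELS + ["S1"]
    start = LEVELS.index(liv)
    return [f"{chain[i]}/{chain[i + 1]}" for i in range(start, len(LEVELS))]


if __name__ == "__main__":
    for liv in ("T12", "L3", "L4"):
        print(liv, assign_group(liv), discs_below(liv))
```

In this sketch the first element returned by `discs_below` corresponds to the adjacent disc below LIV and the second to the adjacent disc + 1.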

Data collection

Patients were interviewed and requested to complete the research questionnaires. Whole spine anteroposterior and lateral radiographs were taken. Magnetic resonance imaging (MRI) of the lumbosacral spine was performed in all study patients. Radiographic parameters were recorded and measured with digital software (Centricity PACS, version 5.0, GE Healthcare, Chicago, IL, USA).

The demographic parameters obtained were age, follow-up duration, height, weight, body mass index (BMI), gender, smoking status and occupation status (blue collar vs white collar). The postoperative radiographic data obtained were Cobb angle, coronal balance, pelvic obliquity, coronal LIV tilt, sagittal vertical axis (SVA), thoracic kyphosis, lumbar lordosis, pelvic tilt, sacral slope, pelvic incidence, pelvic incidence-lumbar lordosis (PI-LL) mismatch, upper instrumented vertebra (UIV) level, LIV level, type of fusion (selective vs non-selective), number of fused levels and types of instrumentations.

The characteristics of the unfused intervertebral discs on MRI were classified using the Pfirrmann classification [26], which is divided into the following five grades:

  • Grade I: The structure of the disc is homogeneous, with bright hyper intense white signal intensity and a normal disc height.

  • Grade II: The structure of the disc is inhomogeneous, with hyper intense white signal. The distinction between nucleus and anulus is clear, and the disc height is normal, with or without horizontal grey bands.

  • Grade III: The structure of the disc is inhomogeneous, with intermediate grey signal intensity. The distinction between nucleus and anulus is unclear, and the disc height is normal or slightly decreased.

  • Grade IV: The structure of the disc is inhomogeneous, with hypo intense dark grey signal intensity. The distinction between nucleus and anulus is lost, and the disc height is normal or moderately decreased.

  • Grade V: The structure of the disc is inhomogeneous, with hypo intense black signal intensity.

For further sub-analysis, patients were stratified into the following 2 groups:

  • Healthy disc (Patients with the worst Pfirrmann grade of I, II or III for any disc below LIV)

  • Degenerated disc (Patients with the worst Pfirrmann grade of IV or V for any disc below LIV)

The Pfirrmann grades of the adjacent disc below LIV, the second disc below LIV (adjacent disc + 1), the worst disc below LIV and all discs below LIV were documented (Fig. 1). Pfirrmann scores were calculated as the average of Pfirrmann grades for all unfused discs below LIV. The formula is displayed below:

$$\text{Pfirrmann score} = \frac{\text{Sum of Pfirrmann grades of all unfused discs below the LIV for a patient}}{\text{Total number of unfused discs below the LIV for the same patient}}$$

Fig. 1 a MRI of a 28-year-old patient, postoperative 13 years, T4 to L1 fusion with Pfirrmann grade 2 at L1/2, L2/3, L3/4, L4/5 and L5/S1 discs; b MRI of a 41-year-old patient, postoperative 30 years, T4 to L2 fusion with grade 3 L2/3 disc, grade 3 L3/4 disc, grade 4 L4/5 disc and grade 2 L5/S1 disc; c MRI of a 32-year-old patient, postoperative 17 years, T2 to L3 fusion with grade 2 L3/4 disc, grade 2 L4/5 disc and grade 3 L5/S1 disc; d MRI of a 53-year-old patient, postoperative 38 years, T4 to L4 fusion with grade 5 L4/5 disc and grade 4 L5/S1 disc
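As a worked illustration of the Pfirrmann score and the healthy/degenerated stratification defined above, the following sketch applies both rules to the four example patients described in the Fig. 1 caption. The function names `pfirrmann_score` and `disc_status` are hypothetical and are used here for illustration only; the grades are transcribed from the caption.

```python
# Pfirrmann grades (1 to 5) of the unfused discs below the LIV,
# transcribed from the Fig. 1 caption, listed from cranial to caudal.
FIG1_PATIENTS = {
    "a": [2, 2, 2, 2, 2],  # LIV L1: L1/2 to L5/S1
    "b": [3, 3, 4, 2],     # LIV L2: L2/3 to L5/S1
    "c": [2, 2, 3],        # LIV L3: L3/4 to L5/S1
    "d": [5, 4],           # LIV L4: L4/5 and L5/S1
}


def pfirrmann_score(grades):
    """Mean Pfirrmann grade over all unfused discs below the LIV."""
    if not grades:
        raise ValueError("at least one unfused disc is required")
    return sum(grades) / len(grades)


def disc_status(grades):
    """'degenerated' if the worst disc is grade IV or V, otherwise 'healthy'."""
    if not grades:
        raise ValueError("at least one unfused disc is required")
    return "degenerated" if max(grades) >= 4 else "healthy"


if __name__ == "__main__":
    for label, grades in FIG1_PATIENTS.items():
        print(label, round(pfirrmann_score(grades), 2), disc_status(grades))
```

Under these definitions, patients a and c fall into the healthy disc group (scores 2.0 and about 2.33), while patients b and d fall into the degenerated disc group (scores 3.0 and 4.5). Patient b shows that a single grade IV disc is enough to classify a patient as degenerated even when the mean score is moderate.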

The HRQoL parameters obtained were SRS-22r, SF-36 and Oswestry Disability Index (ODI). Functional sports activities were assessed by using the Modified Cincinnati Sports Activity Scale (MCSAS). The SRS-22r was further sub-analysed based on five domains: function, pain, self-image/appearance, mental health and satisfaction with treatment. The SF-36 was further sub-analysed based on eight domains: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional and mental health.

Statistical analysis

Student’s t-test was used to investigate differences in continuous variables between groups. The chi-square test was used to investigate differences in categorical variables between groups. Pearson’s correlation was used to investigate the correlation between variables. Interobserver reliability was estimated by calculating Cohen’s kappa coefficient and the corresponding 95% confidence interval (CI). Values of 0.01–0.20 indicate none to slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement. The cut-off for statistical significance was set at 0.05.
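For readers unfamiliar with Cohen’s kappa, the sketch below shows how the coefficient can be computed from two observers’ gradings and mapped onto the agreement bands listed above. It is a minimal illustration rather than the software used in this study: the function names and the example gradings are hypothetical, the confidence interval is not computed, and the label for values of zero or below is an assumption because the bands above start at 0.01.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Map kappa onto the agreement bands quoted in the text."""
    if kappa <= 0:
        return "no agreement"  # assumption: band not stated in the text
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical Pfirrmann grades from two observers for eight discs.
    observer_1 = [2, 2, 3, 4, 2, 3, 2, 5]
    observer_2 = [2, 2, 3, 3, 2, 3, 2, 5]
    k = cohen_kappa(observer_1, observer_2)
    print(round(k, 3), interpret(k))
```

Because Pfirrmann grades are ordinal, a weighted kappa could also be considered; the unweighted form is shown here only because the text does not specify a weighting scheme.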

Results

A total of 48 patients were recruited consecutively during their scheduled clinic appointments. These patients had their surgeries done between 1982 and 2010. The mean follow-up duration was 17.7 ± 6.3 years. The mean age of the cohort was 33.3 ± 8.7 years. The majority of patients were female (93.7%). There were no smokers in either group. The LIV was at L3 or higher (G1) and at L4 or lower (G2) in 39.6% and 60.4% of patients, respectively. There were no significant differences in demographics between the two groups. There were no significant differences in the coronal and sagittal parameters between the two groups except for postoperative coronal LIV tilt (4.2 ± 8.8° in G1 versus −4.9 ± 12.0° in G2, p = 0.004). A total of 27 patients (56.3%) had LIV at L4. There was only one patient each with LIV at T12 and L1. There were no significant differences in types of instrumentations between the two groups (Table 1).

Table 1 Demographics and postoperative data comparison between patients with LIV L3 or higher and patients with LIV L4 or lower

Pfirrmann grading was done by two observers, CST and CKC. The interobserver reliability was analysed and the overall kappa value was 0.76 (95% CI, 0.67 to 0.87), indicating substantial agreement. For grades with disagreements, the two observers met to discuss and decide on a mutually agreed grade. Table 2 shows the Pfirrmann grades and scores of the adjacent disc below LIV, adjacent disc + 1 below LIV and all discs below LIV. Regardless of the LIV, the majority of discs were Pfirrmann grade II: adjacent disc below LIV (34 discs, 70.8%), adjacent disc + 1 below LIV (30 discs, 65.2%) and all discs below LIV (81 discs, 65.9%). None of the discs had Pfirrmann grade I. Although not statistically significant, there were more degenerated discs (Pfirrmann grade IV and V) in G2 compared to G1: 7 discs versus 2 discs in adjacent disc below LIV, 4 versus 1 disc in adjacent disc + 1 below LIV, and 11 versus 7 discs in all discs below LIV. In the largest proportion of patients (23 patients, 47.9%), the most degenerated disc was Pfirrmann grade II. The mean Pfirrmann scores averaged 2.5 ± 0.6 with no significant differences between the two groups.

Table 2 Pfirrmann grades and scores comparison between patients with LIV L3 or higher and patients with LIV L4 or lower

Figure 2 demonstrates the distribution of Pfirrmann grade according to disc levels. One patient had Pfirrmann grade II at T12/L1 level while two patients had grade II at L1/L2 level. No patients had grade III or higher in these two levels. At L2/3 level, four patients (3.3%) and three patients (2.4%) had grade II and III, respectively. At L3/4 level, 12 patients (9.8%), five patients (4.1%) and two patients (1.6%) had grade II, III and IV, respectively. At L4/5 level, 33 patients (26.8%) had grade II, five (4.1%) had grade III, seven (5.7%) had grade IV and one (0.8%) had grade V. At L5/S1 level, there were 29 (23.6%), 11 (8.9%) and eight (6.5%) patients with Pfirrmann grade II, III and IV, respectively.

Fig. 2 Distribution of Pfirrmann grade according to disc levels
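The level-by-level counts reported above can be tallied to check how the percentages were derived. The sketch below transcribes the counts from the text; the variable and function names are hypothetical. It suggests that the percentages are expressed relative to the total number of graded discs (123 across all levels) rather than the number of patients, which is an inference from the arithmetic and is not stated explicitly in the text.

```python
# Disc counts by level and Pfirrmann grade, transcribed from the Results text.
COUNTS = {
    "T12/L1": {"II": 1},
    "L1/L2": {"II": 2},
    "L2/L3": {"II": 4, "III": 3},
    "L3/L4": {"II": 12, "III": 5, "IV": 2},
    "L4/L5": {"II": 33, "III": 5, "IV": 7, "V": 1},
    "L5/S1": {"II": 29, "III": 11, "IV": 8},
}

# Total number of graded discs across all levels.
TOTAL_DISCS = sum(sum(grades.values()) for grades in COUNTS.values())

# Number of discs at each grade, summed over levels.
BY_GRADE = {}
for grades in COUNTS.values():
    for grade, n in grades.items():
        BY_GRADE[grade] = BY_GRADE.get(grade, 0) + n


def pct(n):
    """Percentage of all graded discs, rounded to one decimal place."""
    return round(100 * n / TOTAL_DISCS, 1)


if __name__ == "__main__":
    print("total discs:", TOTAL_DISCS)
    for grade in ("II", "III", "IV", "V"):
        print(grade, BY_GRADE[grade], pct(BY_GRADE[grade]))
```

The tally gives 81 grade II discs (65.9% of 123), which agrees with the figure reported for all discs below LIV, and 18 discs of grade IV or V, which agrees with the 11 versus 7 degenerated discs reported for G2 and G1.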

When we compared patients with healthy discs (Pfirrmann grade I to III) and those with degenerated discs (Pfirrmann IV and V), we found that patients with degenerated discs were older (37.9 ± 11.4 years versus 31.4 ± 6.7 years, p = 0.017) and had longer follow-up (22.1 ± 8.9 years versus 15.9 ± 3.6 years, p = 0.001). Other demographic and postoperative radiological parameters did not show any significant differences (Table 3).

Table 3 Demographics and postoperative data comparing healthy disc and degenerated disc

There were no significant correlations between LIV and Pfirrmann grade for adjacent disc below LIV (p = 0.253), Pfirrmann grade for adjacent disc + 1 below LIV (p = 0.457), worst Pfirrmann grade of any disc below LIV (p = 0.284) or mean Pfirrmann scores (p = 0.581) (Table 4). There was no significant correlation between age and worst Pfirrmann grade of any disc below LIV (p = 0.057). However, there were significant moderate correlations between age and Pfirrmann grade for adjacent disc + 1 below LIV (r = 0.475, p = 0.001) and between age and mean Pfirrmann scores (r = 0.546, p < 0.001). There was a significant weak correlation between age and Pfirrmann grade for adjacent disc below LIV (r = 0.365, p = 0.011). There were no significant correlations between BMI and Pfirrmann grade for adjacent disc below LIV (p = 0.785), Pfirrmann grade for adjacent disc + 1 below LIV (p = 0.740), worst Pfirrmann grade of any disc below LIV (p = 0.166) or mean Pfirrmann scores (p = 0.591).

Table 4 Correlation between LIV and age with Pfirrmann grades or scores

The comparison of HRQoL and sports activity scores between G1 and G2 is shown in Table 5. In the SRS-22r questionnaire, only the pain domain demonstrated a significant difference between G1 and G2 (4.3 ± 0.5 versus 4.0 ± 0.6, p = 0.044). Similarly, G1 obtained significantly higher scores in the bodily pain domain of the SF-36 questionnaire (88.7 ± 12.3 versus 77.8 ± 18.7, p = 0.018). There were no significant differences in other domains of SRS-22r and SF-36 between the two groups. There were also no significant differences in the ODI and MCSAS scores. Patients with healthy discs showed significantly higher scores in the self-image/appearance domain of SRS-22r when compared to patients with degenerated discs (3.9 ± 0.6 versus 3.4 ± 0.4, p = 0.006). There were no statistical differences between these two groups in other HRQoL questionnaires (Table 5).

Table 5 Health-related quality of life and sports activity scores comparing between LIV at L3 or higher and LIV at L4 or lower and between healthy disc (Pfirrmann I-III) and degenerated disc (Pfirrmann IV and V)

Discussion

Long-term degeneration of the lumbar spine in idiopathic scoliotic patients who underwent spinal fusion surgeries has been assessed by plain radiographs [23, 27] and MRI scans [5, 18, 25, 28, 29]. Danielsson et al. [27] found that AIS patients who had been treated with brace or surgery had more degenerative disc changes when compared to normal controls on plain radiographs during follow-up of more than 20 years. However, they found no differences between AIS patients treated with brace or surgery. Sudo et al. [23] followed up AIS patients for about 17 years and found that 23% developed mild degeneration at the adjacent disc level below the fusion mass on plain radiographs.

Danielsson et al. [5] examined 32 AIS patients with spinal fusion using Harrington rods 25 years post-surgery and found that there were significantly more degenerative disc changes, disc height reduction and end-plate changes in the lowest unfused disc compared with the control non-scoliotic patients on MRI scans. Kelly et al. [29] investigated the long-term outcome for AIS patients who had anterior spinal fusion and found that amongst the six patients who had lumbar MRI performed, all demonstrated loss of signal intensity on the T2-weighted image in the disc below the fusion mass with disc space narrowing. Green et al. [18] compared MRI before surgery with follow-up MRI after a mean follow-up of 11.8 years and found that 85% of patients had new disc pathology. Akazawa et al. [25] reported that in AIS patients 35 years after spinal fusion surgery, patients with fusion to L4 or lower had less lumbar lordosis, considerable SVA imbalance and more severe disc degeneration compared with those with LIV at L3 or higher. Ghandhari et al. [28] found that 3 to 5 years following PSF for AIS, 6 out of 37 (16.2%) patients had developed disc degeneration when preoperative MRIs were compared with follow-up MRIs. This was not associated with the current age of the patients, preoperative or postoperative vertebral tilt angle, visual analog scale (VAS) and ODI score, level of fusion or choice of fusion device.

Our study found that both groups (LIV L3 or higher and LIV L4 or lower) had comparable demographic and postoperative radiological parameters except that G2 (LIV L4 or lower) patients had a significantly larger postoperative coronal LIV tilt angle (Table 1). We noted that there were more patients in G2 compared to G1, which does not represent current practice. We postulate that the surgeries performed during that period were more conservative, resulting in longer fusions and fewer selective fusions. It may also be due to the surgeons’ experience or preference, the implant technology available and the late presentation of patients with larger curves requiring longer fusions. Although G2 had higher proportions of patients with degenerated disc (Pfirrmann IV and V) for adjacent disc below LIV, adjacent disc + 1 below LIV and all discs below LIV, we found no significant differences in all Pfirrmann grades or scores when compared to G1 (LIV L3 or higher group) (Table 2). The selection of LIV was also not significantly correlated with any of the Pfirrmann grades or scores (Table 4). Therefore, we conclude that the selection of LIV might not have any significant mid-long-term effect on the degeneration of the remaining unfused lumbar intervertebral discs.

When we sub-analysed by dividing the patients into healthy and degenerated discs, we found that there were significant differences in age and follow-up duration (Table 3). Patients with degenerated discs were older and had longer follow-up duration. Further analysis showed that age had a significant moderate correlation with Pfirrmann grade for adjacent disc + 1 below LIV and with Pfirrmann scores (Table 4). Age also had a significant weak correlation with Pfirrmann grade for the adjacent disc below LIV. Therefore, degeneration of the discs may be related to ageing of patients.

We did not find any significant correlation between BMI and Pfirrmann grades or scores (Table 4). Lumbar intervertebral disc degeneration on MRI has been found to be more frequent in overweight and obese individuals. Samartzis et al. [30] evaluated 1,040 men and 1,559 women (mean age 41.9 years) and found that BMI was significantly higher in subjects with disc degeneration (mean 23.3 kg/m2) than in subjects without degeneration (mean 21.7 kg/m2) (p < 0.001). Our findings may differ from those described in the literature because our patients were younger (mean age = 33.3 ± 8.7 years) and had a lower BMI (mean BMI = 20.4 ± 3.6 kg/m2, G1 = 20.6 ± 4.6 kg/m2, G2 = 20.2 ± 2.9 kg/m2). The lower BMI in our Asian population may not be seen in certain populations such as Western societies. Therefore, the results from this study may not be applicable to populations with a higher average BMI.

For the HRQoL and sports activity scores, patients with LIV L4 or lower had significantly lower scores for the SRS-22r pain domain and the SF-36 bodily pain domain (Table 5). All other parameters showed no significant differences. Therefore, our patients with LIV L4 or lower had more significant back pain when compared with those patients with LIV L3 or higher. Other than back pain, both groups had similar physical function, self-image, satisfaction with treatment, mental health, and functional sports activity.

There were limitations in this study. The first limitation was the small sample size. A larger sample size would be able to reduce the bias and improve the strength of the study’s results. However, this was limited by the number of surgeries performed previously and the willingness of patients to undergo an MRI during follow-up. The second limitation was that preoperative data and records were unavailable for comparison and analysis. The third limitation was the unavailability of preoperative MRI to assess any degenerative disc condition that may have been present prior to surgery. The fourth limitation was the unavailability of a matched control group of normal subjects and a matched group of AIS patients treated non-surgically with a brace. The fifth limitation was the short follow-up of 17.7 ± 6.3 years for these patients to develop degenerative disease, since they were young (mostly in the adolescent age group) when they had their surgeries. A longer follow-up period may yield a more accurate result. Even though it is routine in our centre to continue follow-up for patients who had spine surgery, some patients may have had concerns, symptoms or other factors that motivated them to attend clinic appointments. Unfortunately, we did not record or analyse these factors.

Conclusions

Patients with fusion to L4 or lower had more significant back pain when compared to patients with fusion to L3 or higher. However, both groups had similar physical function, self-image, satisfaction with treatment, mental health, and functional sports activity. We did not find any significant differences in the degeneration of the remaining unfused lumbar intervertebral discs with the selection of LIV.