Introduction

Anterior spinal instrumented fusion (ASF) is a viable option to correct main thoracolumbar/lumbar curves (LC) in adolescent idiopathic scoliosis (AIS). With selective LC correction and fusion (SLF), the compensatory thoracic curve (TC) is expected to resolve through spontaneous thoracic curve correction (STCC). Several authors have shown that the lumbar and thoracic curves remain stable from the postoperative period to follow-up, whether at 3–6 months or after as long as 17 years [5–13].

A few studies have reported improved outcomes with ASF [7, 16, 17], better trunk mobility [26], and fusions that are, on average, approximately one level shorter than with posterior spinal fusion (PSF) [16, 17, 27–29]. Large matched-pair studies comparing ASF and PSF do not exist [16, 17]. Studies that improve the understanding of LC characteristics, identify predictors of LC-resolution, and provide data for selecting the optimal lowest instrumented vertebra (LIV) are scant. Clinical and biomechanical studies have demonstrated the benefit of preserving distal lumbar motion segments, particularly from L3 to S1: a more distal lumbar fusion and a lack of LIV-horizontalization increase the likelihood of degeneration and back pain [37–42]. The optimal LIV position, however, is often difficult to predict preoperatively. With SLF, there is particular interest in achieving a horizontal LIV and good STCC. However, STCC after SLF has been shown to be smaller than the preoperative TC flexibility and difficult to predict [13, 43], and only a few studies have tried to establish criteria predicting STCC [28, 44]. Accurate prediction of LC correction and STCC is valuable because fusing too many or too few levels, or leaving an excessive residual deformity, can have long-lasting effects [45]. Therefore, the objectives of this study were to identify predictors of radiographic outcomes, distal adding-on, STCC, and SLF failure.

Materials and methods

A retrospective review of all patients with AIS and a main LC operated with SLF using ASF over a 9-year period (n = 245) was conducted. Patients between 10 and 30 years old with Risser ≥2 were included. No patient was excluded. Medical charts were reviewed for demographics, surgical details, complications, and outcomes. Abbreviations used and radiographic parameters are explained in Table 1.

Table 1 Abbreviations, radiographic parameters, and measurement techniques

Surgical technique

In the study period, SLF was indicated for LCs defined as the major structural curve, with the TC defined as the minor structural curve. According to the retrospectively applied Lenke classification, 151 patients (62 %) had type 5, 58 (24 %) type 6, 7 (3 %) type 3, 4 (2 %) type 2C, and 25 (10 %) type 1C.

Instrumentation with ASF using bicortical screws was performed with a 4.0-mm single-rod system (USIS, Ulrich Medical Systems, Ulm, Germany) in 153 patients (62 %), a 4.5-mm single-rod system (XIA-anterior, Stryker, Marseilles, France) in 53 patients (22 %), and a dual-rod system with 5.0-mm/4.0-mm rods (Halm-Zielke-Device, Depuy, Raynham, USA) in 39 patients (16 %). Anterior mesh cages were used in 11 patients (5 %); the remaining patients received rib grafts only. ASF was performed via an open thoraco-lumbophrenic retroperitoneal approach. The fusion usually extended from the upper to the lower end vertebra (EV). LC correction was achieved by derotation and the cantilever rod-reduction manoeuvre.

Radiographic analysis

Standard deformity parameters were evaluated preoperatively, postoperatively, and at the final follow-up on biplanar full-spine and bending radiographs (Table 1). At the 6-month follow-up, fusion was also assessed using tomography. Curve flexibility (%/°) and the LC- and TC-corrections (%/°) were calculated as described in Table 1. The apices of the LC and TC, the EV and stable vertebra (SV), and the upper instrumented vertebra (UIV) and LIV were identified on AP radiographs. The distance (in levels) of the LIV from the SV and from the EV was recorded. A negative value, e.g. LIV = SV–1, indicated that the LIV was 1 level cephalad to the SV.
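The signed LIV–SV and LIV–EV distances can be expressed as a difference of positions in the craniocaudal vertebral sequence. The following Python sketch assumes a standard count of 12 thoracic and 5 lumbar vertebrae and ignores transitional anatomy; the helper name `level_offset` is hypothetical.

```python
# Hypothetical helper: signed distance (in levels) between two vertebrae.
# Assumes 12 thoracic and 5 lumbar vertebrae (no transitional anatomy).
LEVELS = [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)] + ["S1"]


def level_offset(vertebra: str, reference: str) -> int:
    """Return the signed level distance of `vertebra` from `reference`.

    Negative values mean `vertebra` lies cephalad to `reference`,
    e.g. an LIV one level above the SV gives -1 (LIV = SV-1).
    """
    return LEVELS.index(vertebra) - LEVELS.index(reference)


print(level_offset("L3", "L4"))  # LIV at L3, SV at L4
print(level_offset("L2", "L4"))  # LIV two levels above the SV
```

The same function gives the LIV–EV relation when the EV is passed as the reference.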

On follow-up radiographs, changes in construct alignment, instrumentation failure, and evidence of non-union were noted. For stratification of radiographic outcomes, an optimal radiographic outcome was defined as a target LC ≤20°, which in a balanced spine corresponds to an LIV-tilt ≤10°. For the TC, the cut-off was set at TC ≤30°. The target parameters were based on prior studies establishing indicator variables for radiographic and clinical outcomes, including complications, risk of advanced degeneration, and the need for revision surgery [13, 28, 46–53]. An increase in the LIV adjacent segment disc angle (LIVDA) ≥5° was defined as ‘adding-on’. Any increase in the TC from preoperation to follow-up was defined as TC-progression.
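The outcome definitions above can be written as simple threshold rules. This sketch is illustrative only; the field names are hypothetical, and it assumes that the LIVDA increase for adding-on is measured from the postoperative to the follow-up radiograph, as in the LIVDA analysis reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class Radiographs:
    """Hypothetical record of the angles (degrees) needed for the target definitions."""
    lc_followup: float
    tc_preop: float
    tc_followup: float
    livda_postop: float
    livda_followup: float


def classify(r: Radiographs) -> dict:
    """Apply the study's outcome cut-offs to one patient."""
    return {
        "target_lc": r.lc_followup <= 20.0,  # optimal LC outcome
        "target_tc": r.tc_followup <= 30.0,  # optimal TC outcome
        # Assumption: increase measured from postoperative to follow-up LIVDA.
        "adding_on": r.livda_followup - r.livda_postop >= 5.0,
        "tc_progression": r.tc_followup > r.tc_preop,  # any TC increase
    }


example = Radiographs(lc_followup=18, tc_preop=38, tc_followup=27,
                      livda_postop=3, livda_followup=9)
print(classify(example))
```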

Surgical outcome analysis

Complications were assessed according to Glassman [54]. Revision surgery was defined as any surgery related to the index procedure. Indications for late PSF and predictors of revision surgery were recorded.

Clinical analysis

The subjective outcome was defined using the modified Odom criteria [55], with stratification into excellent, good, moderate, and poor.

Statistical analysis

Independent Student t tests, Welch ANOVA, Pearson’s Chi-squared test, Fisher’s exact test, tests for correlations, and independent bootstrap t tests were applied. Crosstabulation tables and correlation coefficients were computed. 95 % confidence intervals (CI) were computed for the means, differences of means, probabilities, odds ratios (OR), and regression curves. Univariate and multivariate discrete logistic regression analysis and linear regression models were applied to identify the criteria that predicted the optimal radiographic outcomes. For both models, a conditional forward variable selection algorithm was applied. Residual analyses were completed to describe the performance of the models. ROC-analyses were performed, and specificities, sensitivities, and AUCs with 95 % CI were computed. All tests were two-sided, and p < 0.05 was considered statistically significant. All analyses were completed using STATISTICA 10 (StatSoft, UT, USA), StatXact 10 (Cytel Software Corporation, Cambridge, USA) and MATHEMATICA 7 (Wolfram Research, IL, USA).

Results

Sample characteristics

The sample included 245 patients, of whom 223 were female (91 %). The average age was 17 years (range 12–30 years), and the average fusion length was 4.6 ± 0.9 levels (range 3–8 levels). Follow-up averaged 32 months (range 6–129 months).

The LC apex was most commonly at L1 (46 %), followed by L2 (20 %), the L1–L2 disc (13 %), and the T12–L1 disc (9 %). The preoperative LIV-ROT was 0.9 ± 0.6 (range 0–2.5). The distribution of UIV and LIV is illustrated in Fig. 1. Overall, 68 % of patients had the LIV at SV-2 and 28 % at SV-1. The mean distance between the LIV and the SV was −1.6 ± 0.6 levels, and that between the LIV and the EV was −0.1 ± 0.5 levels.

Fig. 1

Distribution of upper instrumented vertebra (UIV) and lowest instrumented vertebra (LIV) among the 245 patients with thoracolumbar/lumbar curve fusions

Radiographic results

The detailed radiographic results are summarised in Table 2 and Fig. 4; for a comparison with the literature, see Table E4 (Electronic Supplement). The radiographic results stratified by implant type are summarised in Table 3. With comparable baseline flexibility, the rigid dual-rod system provided better postoperative and follow-up LC correction (%) than the single-rod systems.

Table 2 Summary of radiographic results
Table 3 Lumbar and thoracic curve resolution stratified according to different implant types

Main thoracolumbar/lumbar curve (LC)

The preoperative LC averaged 49°, LC-bending 22° (LC-flexibility 57 %), postoperative LC 20°, and follow-up LC 25°. The LC correction averaged 59 % postoperatively and 48 % at follow-up. The preoperative Fusion-Cobb averaged 48°, the postoperative Fusion-Cobb 13°, and the follow-up Fusion-Cobb 16°, representing corrections of 73 % and 67 %, respectively. At follow-up, 86 patients (35 %) had an LC ≤20°, and 177 patients (72 %) had a Fusion-Cobb ≤20°. The postoperative LC was on average 2.3° smaller than LC-bending (difference −2.3 ± 10.5°), whereas the follow-up LC was on average 2.9 ± 10.5° larger than LC-bending. The scoliosis correction index (postoperative correction (%)/preoperative flexibility (%)) at follow-up was 1.1 ± 0.6.
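The percentage measures in this paragraph can be reproduced with the conventional definitions below. Table 1 is not reproduced here, so these formulas are an assumption: flexibility (%) = (preoperative Cobb − bending Cobb)/preoperative Cobb × 100 and correction (%) = (preoperative Cobb − postoperative Cobb)/preoperative Cobb × 100. Applied to the group means, they give values close to, but not identical with, the reported means, because a mean of per-patient ratios generally differs from the ratio of the means.

```python
def flexibility_pct(preop: float, bending: float) -> float:
    """Assumed definition: share of the preoperative Cobb angle corrected on bending."""
    return (preop - bending) / preop * 100.0


def correction_pct(preop: float, later: float) -> float:
    """Assumed definition: share of the preoperative Cobb angle corrected at a later time point."""
    return (preop - later) / preop * 100.0


def scoliosis_correction_index(correction: float, flexibility: float) -> float:
    """Correction (%) divided by preoperative flexibility (%), as defined in the text."""
    return correction / flexibility


# Group means from the Results (LC 49 deg, LC-bending 22 deg, postoperative LC 20 deg).
flex = flexibility_pct(49, 22)  # about 55.1 %
corr = correction_pct(49, 20)   # about 59.2 %
print(round(flex, 1), round(corr, 1), round(scoliosis_correction_index(corr, flex), 2))
# Fusion-Cobb 48 -> 13 -> 16 deg gives about 72.9 % and 66.7 %.
print(round(correction_pct(48, 13), 1), round(correction_pct(48, 16), 1))
```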

Statistical analysis showed a strong correlation between preoperative LC and both postoperative LC (p < 0.01, r = 0.6) and follow-up LC (p < 0.01, r = 0.6), as well as between LC-bending and both postoperative LC (p < 0.01, r = 0.59) and follow-up LC (p < 0.01, r = 0.6). A generalised linear model showed that LC-bending and preoperative LC were significantly related to follow-up LC (Fig. E1 Electronic Supplement); in 94 % of all subjects, the difference between the observed and predicted follow-up LC was within ±10°. The correlations between preoperative Fusion-Cobb and postoperative Fusion-Cobb (p < 0.01, r = 0.6) and follow-up Fusion-Cobb (p < 0.01, r = 0.6) were stronger. The same was true for the correlations between LC-bending and postoperative Fusion-Cobb (p < 0.01, r = 0.65) and follow-up Fusion-Cobb (p < 0.01, r = 0.65). The postoperative LC correction (%) differed significantly when the LIV was selected at SV-2 vs. SV-3 (p = 0.0009) or at SV-1 vs. SV-3 (p = 0.0003).

Regarding the axial plane, the change from preoperative to postoperative LC-AVR was 0.6 ± 1°. Follow-up LC correlated with preoperative LC-AVR (p < 0.001, r = 0.3); however, this correlation was considerably weaker than that between follow-up LC and preoperative LC, and no significant correlation was observed between preoperative LC-AVR and postoperative LC. Neither preoperative LIV-ROT nor LIV-1 ROT correlated significantly with postoperative LC or postoperative Fusion-Cobb. These results indicate that preoperative LC had a greater impact on postoperative and follow-up LC than preoperative apical rotation did.

Prediction of LC-resolution at follow-up

Follow-up LC and an LC ≤20° were mainly associated with LC-bending (p < 0.0001, r = 0.6) and preoperative LC (p < 0.0001, r = 0.6). Accordingly, a target follow-up LC ≤20° was more likely in patients with smaller LC-bending (p < 0.0001, 16° vs. 26°), preoperative LC (p < 0.0001, 42° vs. 53°), preoperative TC (p < 0.0001, 31° vs. 42°), and TC-bending (p < 0.0001, 15° vs. 24°). Similarly, a follow-up Fusion-Cobb ≤20° was more likely in patients with smaller preoperative LC (p < 0.0001, 45° vs. 60°), LC-bending (p < 0.0001, 18° vs. 33°), postoperative LC (p < 0.0001, 17° vs. 29°), preoperative TC (p < 0.0001, 31° vs. 42°), and TC-bending (p < 0.0001, 18° vs. 29°).

LC-flexibility (%) did not predict an LC ≤20°, but the probability of an LC ≤20° at follow-up depended significantly on the distance from the LIV to the SV (p < 0.01): it was 28 % for patients with the LIV at SV-2 and almost doubled to 52 % with the LIV at SV-1 (p < 0.0004).
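The phrase ‘almost doubled’ refers to the ratio of the two probabilities. Because odds ratios are reported elsewhere in the paper, the short sketch below shows, using only the two reported probabilities, how the same comparison looks on the odds scale.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


p_sv2 = 0.28  # reported probability of follow-up LC <= 20 deg with LIV at SV-2
p_sv1 = 0.52  # reported probability with LIV at SV-1

probability_ratio = p_sv1 / p_sv2       # ~1.86, i.e. "almost doubled"
odds_ratio = odds(p_sv1) / odds(p_sv2)  # ~2.79 on the odds scale
print(round(probability_ratio, 2), round(odds_ratio, 2))
```

Because the outcome is common, the odds ratio is clearly larger than the probability ratio, so the two should not be used interchangeably.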

To identify the most valuable radiographic predictors, a multivariate analysis was performed with ‘LC ≤20°’ as the dependent variable. Preoperative parameters that showed a significant impact in the univariate analysis (p < 0.01) and were suitable as clinical predictors were entered into the models. Multivariate logistic regression analysis showed that preoperative LC (p = 0.04, OR 1.04) and LC-bending (p = 0.009, OR 1.06) were significant predictors of follow-up LC. The resulting prediction model, illustrated in Fig. 2, revealed high accuracy. A contour plot (Fig. E2 Electronic Supplement) shows how such models can support decision-making regarding LC-resolution prediction.
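Because the model intercept is not reported, absolute probabilities cannot be recomputed from the text. The reported odds ratios do, however, allow a relative comparison between two patients: in a logistic model, ORs multiply, so their logarithms add. The sketch assumes that both ORs are per degree and refer to the same modelled event; which event (LC ≤20° or LC >20°) that is, is not stated explicitly, so the direction of the effect is left open.

```python
import math

OR_LC_PREOP = 1.04    # reported OR for preoperative LC (assumed per degree)
OR_LC_BENDING = 1.06  # reported OR for LC-bending (assumed per degree)


def odds_multiplier(delta_lc_preop: float, delta_lc_bending: float) -> float:
    """Factor by which the modelled odds change between two patients whose
    preoperative LC and LC-bending differ by the given amounts (degrees)."""
    log_odds_change = (delta_lc_preop * math.log(OR_LC_PREOP)
                       + delta_lc_bending * math.log(OR_LC_BENDING))
    return math.exp(log_odds_change)


# Patient B has a 10 deg larger preoperative LC and a 10 deg larger LC-bending than patient A.
print(round(odds_multiplier(10, 10), 2))
```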

Fig. 2

Prediction model for LC ≤20° and LC >20°. Logistic regression analysis showed that both preoperative LC [p = 0.037, OR 1.04 (95 % CI 1.0–1.08)] and LC-bending [p = 0.009, OR 1.06 (95 % CI 1.01–1.1)] were significant predictors of follow-up LC. The resulting prediction model is illustrated. Of all patients with LC ≤20°, 46 % (95 % CI 35–84 %) were correctly classified as having an LC ≤20° (sensitivity), and the model correctly classified 85 % (95 % CI 78–90 %) of all subjects with LC >20° (specificity)

Compensatory thoracic curve (TC)

The mean preoperative TC was 39°, TC-bending 21° (flexibility 48 %), and follow-up TC 29°. The mean STCC was 32 % (p = 0.001). Preoperatively, 67 patients (32 %) had a TC ≤30° and 167 (81 %) had a TC-bending ≤30° (valid n = 207). At follow-up, 110 patients (55 %) had a TC ≤30°, 181 (90 %) had a TC smaller than or equal to the preoperative TC, and 20 (10 %) had TC-progression (valid n = 202). Statistical analysis showed a strong correlation between preoperative TC and both postoperative TC (p < 0.001, r = 0.8) and follow-up TC (p < 0.01, r = 0.8), as well as between preoperative TC-bending and both postoperative TC (p < 0.01, r = 0.8) and follow-up TC (p < 0.01, r = 0.8). The postoperative TC was 5 ± 7.5° (range −11° to 33°) larger than TC-bending, and the follow-up TC was 10 ± 8.2° (range −9° to 39°) larger than TC-bending.

Regarding the axial plane, the change from preoperative to postoperative TC-AVR averaged 0 ± 0.5°. Preoperative TC-AVR correlated with both follow-up TC (p < 0.001, r = 0.3) and postoperative TC (p < 0.001, r = 0.3). These results indicate that preoperative TC and TC-bending had a greater impact on postoperative and follow-up TC than preoperative apical rotation did.

Prediction of spontaneous thoracic curve correction (STCC)

Statistical analysis revealed a strong correlation between preoperative LC and preoperative TC (p < 0.001, r = 0.6). Similar correlations were observed postoperatively (p < 0.01, r = 0.6) and at follow-up (p < 0.01, r = 0.6). Notably, the correlation between LC-flexibility (°/%) and TC-flexibility was weak (r = 0.3, p < 0.001), and LC and TC corrections correlated only weakly, both from preoperation to postoperation and from preoperation to follow-up (p = 0.1 and p < 0.001, respectively; r = 0.3). Thus, postoperative STCC was not strongly associated with LC correction, indicating the need for further analysis.

Preoperatively, the LC exceeded the TC by 15 ± 8.6°. Postoperatively, the difference (LC − TC) was −6 ± 8.5°, indicating a slightly larger TC than LC. Patients with a follow-up TC ≤30° had an average follow-up TC of 19°, compared with 42° in those with TC >30° (p < 0.001). A follow-up TC ≤30° was more likely in patients with smaller TC-bending (p < 0.001, 15° vs. 29°), preoperative TC (p < 0.001, 31° vs. 48°), preoperative LC (p < 0.001, 45° vs. 56°), and LC-bending (p < 0.001, 18° vs. 28°), and in those with better LC-resolution: postoperative LC (p < 0.001, 17° vs. 25°), postoperative Fusion-Cobb (p < 0.001, 10° vs. 17°), follow-up LC (p < 0.001, 22° vs. 31°), follow-up Fusion-Cobb (p < 0.001, 13° vs. 21°), and achievement of the target LC ≤20° (p < 0.00001).

Of the patients, 10 % (n = 20) had TC-progression. Statistical analysis revealed that patients with TC-progression had a greater preoperative coronal imbalance (p = 0.04, C7-CSVL 2.8 vs. 2.2 cm), preoperative shoulder tilt (p = 0.02, 3.5° vs. 1.9°), follow-up LC (p = 0.03, 25° vs. 20°), postoperative Fusion-Cobb (p = 0.03, 17° vs. 13°), and follow-up Fusion-Cobb (p = 0.02, 21° vs. 16°). Patients with LC ≤20° were less likely to experience TC-progression (p = 0.01).

Regarding the prediction of STCC, a linear regression model showed that preoperative TC and TC-bending were significant predictors of follow-up TC, explaining 67 % of its variation (r = 0.8, p < 0.0000001, Fig. 3). The residual analysis revealed that in 96 % of the subjects the difference between the observed and predicted values was within ±10°, and in 56 % it was within ±5°, indicating high prediction accuracy. A multivariate logistic regression analysis was performed to analyse the predictive strength of preoperative TC and TC-bending for a follow-up TC ≤30°. In this model, preoperative TC (p < 0.00004, OR 1.16; 95 % CI 1.09–1.23) and TC-bending (p = 0.037, OR 1.06; 95 % CI 1.0–1.12) were significant predictors. The model correctly classified 84 % of all subjects with TC ≤30° and 79 % of all subjects with TC >30° (Fig. E3 Electronic Supplement). In addition, a ROC-analysis showed a sensitivity of 70 % and a specificity of 60 % for a TC ≤30° at follow-up using a preoperative TC cut-off of 38°.

Fig. 3

Interaction of preoperative TC, TC-bending, and follow-up TC. For the prediction of STCC from preoperative parameters, the linear regression model shows that preoperative TC and TC-bending are significant predictors of follow-up TC (preoperative TC regression coefficient b = 0.48, 95 % CI 0.33–0.62, p < 0.000000001; TC-bending b = 0.42, 95 % CI 0.27–0.58, p < 0.0000003). Preoperative TC and TC-bending are strongly correlated with follow-up TC and explain 67 % of its variation (multiple correlation coefficient r = 0.82, adjusted $r^2$ = 67 %, p < 0.0000001). With $x_1$ the preoperative TC, $x_2$ the TC-bending, and $y$ the follow-up TC (all in degrees), the regression equation for the mean response (shown with its 95 % CI mean prediction band) is $y = 2.26 + 0.48x_1 + 0.42x_2$. The residual analysis revealed that in 96 % of all subjects, the difference between the observed and predicted values (i.e., the residuals) was within ±10°; in 56 % of the subjects, it was within ±5°
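The regression equation in the Fig. 3 caption can be applied directly. The sketch below evaluates the mean response and, as in the residual analysis, the share of patients whose observed follow-up TC lies within a given tolerance of the prediction. It returns a mean estimate only (no prediction interval) and is meaningful only within the range of the study data; the function names and the example observations are hypothetical.

```python
def predict_followup_tc(tc_preop: float, tc_bending: float) -> float:
    """Mean follow-up TC (deg) from the reported equation y = 2.26 + 0.48*x1 + 0.42*x2."""
    return 2.26 + 0.48 * tc_preop + 0.42 * tc_bending


def share_within(observed, predicted, tolerance: float) -> float:
    """Fraction of patients with |observed - predicted| < tolerance (degrees)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return sum(abs(r) < tolerance for r in residuals) / len(residuals)


# Using the group means (TC 39 deg, TC-bending 21 deg) gives about 29.8 deg,
# close to the reported mean follow-up TC of 29 deg.
print(round(predict_followup_tc(39, 21), 1))

# Invented observed values, for illustration only.
obs = [30.0, 25.0, 36.0]
pred = [predict_followup_tc(40, 20), predict_followup_tc(35, 18), predict_followup_tc(38, 22)]
print(share_within(obs, pred, 10.0), share_within(obs, pred, 5.0))
```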

Lowest instrumented vertebra adjacent disc angle (LIVDA)

The increase in LIVDA from postoperation to follow-up correlated negatively with preoperative LIVDA (p < 0.01, r = −0.7) and positively with postoperative LIVDA (p < 0.001, r = 0.5), but not with preoperative LC or follow-up length. The LIVDA increase was also related to preoperative LIV-ROT (p = 0.0008; −1.3 ± 5.9° for LIV-ROT = 0, 1.7 ± 5.2° for LIV-ROT = 1, and 3.6 ± 5.2° for LIV-ROT = 2). At follow-up, 31 % of patients had adding-on. Patients with adding-on had a follow-up LIVDA averaging 9.8°, compared with 4.4° for those without (p < 0.001), and an average LIVDA increase of 7.6° vs. −1.3° (p < 0.001). Several factors were associated with adding-on. Patients with adding-on had a smaller preoperative LIVDA (2.3° vs. 5.7°, p < 0.001). ROC-curve analysis for the risk of adding-on established a cut-off value for preoperative LIVDA of <3.5°, with a specificity of 71 % and a sensitivity of 86 %. Clinical case examples illustrate these observations (Fig. 4d–f and Fig. E4 Electronic Supplement). In addition, the risk of adding-on increased in patients who failed to achieve an LC ≤20° (p = 0.008). Adding-on was more likely with LIV-SV = −2 (40 %) than with LIV-SV = −1 (20 %) (p < 0.0001). The incidence of adding-on was 14 % in patients with LIV-ROT = 0, 31 % with LIV-ROT = 1, and 45 % with LIV-ROT = 2 (p = 0.02). Regarding the relationship between LIV and EV, adding-on was observed in 56 % of patients with LIV-EV = −1, 28 % with LIV-EV = 0, and 0 % with LIV-EV = 1. The differences were significant (p = 0.01–0.04).
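A cut-off such as the preoperative LIVDA threshold is commonly chosen on the ROC curve by maximising Youden’s index J = sensitivity + specificity − 1. The text does not state which criterion was used, so the sketch below is an assumption. Because a smaller preoperative LIVDA indicated a higher risk, a value below the cut-off counts as test-positive. The toy data are invented and unrelated to the study sample.

```python
def sens_spec(values, adding_on, cutoff):
    """Sensitivity and specificity when value < cutoff is called positive."""
    pairs = list(zip(values, adding_on))
    tp = sum(1 for v, y in pairs if y and v < cutoff)
    fn = sum(1 for v, y in pairs if y and v >= cutoff)
    tn = sum(1 for v, y in pairs if not y and v >= cutoff)
    fp = sum(1 for v, y in pairs if not y and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(values, adding_on):
    """Scan midpoints between observed values; return (cutoff, sens, spec) with maximal J.
    Requires at least two distinct values and both outcome classes."""
    xs = sorted(set(values))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        cutoff = (lo + hi) / 2
        sens, spec = sens_spec(values, adding_on, cutoff)
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best[:3]


# Invented preoperative LIVDA values (deg) and adding-on status, for illustration only.
livda = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
added = [True, True, False, True, False, False, False, False]
print(youden_cutoff(livda, added))
```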

Sagittal plane

Results are summarised in Table 2. The physiological spinal and spinopelvic interdependencies were consistent preoperatively, postoperatively, and at follow-up (p < 0.01, r = 0.4–0.7). At follow-up, the proximal junctional kyphosis (PJK) angle correlated with the TLA (p < 0.001, r = 0.4) and the TK (p < 0.001, r = 0.6), which in turn correlated with the SS (p = 0.007, r = −0.2), suggesting that the PJK angle was a physiological function of sagittal thoracolumbar alignment, particularly TK.

Surgical results

Clinical outcomes

Of the 174 patients with complete clinical follow-up data, 97 (56 %) reported an excellent outcome, 72 (41 %) a good outcome, and 5 (3 %) a moderate outcome. Clinical outcome differed significantly between Lenke types 5 and 6 (p = 0.01) and between types 3 and 5, in favour of type 5 (p = 0.03). Clinical outcome was worse with a higher follow-up LIVDA (p = 0.005) and with adding-on (p < 0.05), and tended to be worse after late PSF (p = 0.06).

Complications

Twenty patients (8 %) required PSF. The main causes, including combinations, were proximal adding-on (n = 1), distal adding-on (n = 7), non-union (n = 10), and TC-progression (n = 7); four of these patients had both TC-progression and distal adding-on. In total, 13 patients were diagnosed with non-union, 9 of whom required revision. Two patients required revision surgery for superficial wound infection. Rod fractures occurred in 14 % of patients, only with the 4.0-mm and 4.5-mm single-rod systems (p = 0.1). Rod fracture was significantly more frequent in patients with a follow-up LC >20° (p = 0.01), a Fusion-Cobb >20° (p < 0.00001), a greater preoperative CSVL (p = 0.03), a larger postoperative LC (p < 0.05), and a larger postoperative Fusion-Cobb (p = 0.003). Regarding medical complications, nine patients (4 %) had a urinary tract infection and three (1 %) had pancreatitis; all resolved with medical treatment.

Statistical analysis showed an increased risk of late PSF with LIV-SV = −2 (60 %) vs. LIV-SV = −1 (25 %) (p = 0.04). Other risk factors included TC-bending (p = 0.008, 28° vs. 20°; OR 1.05), preoperative LC (p = 0.02, 56° vs. 48°; OR 1.01), rod fracture (p < 0.00001, 29 % vs. 5 %), and failure to achieve an LC ≤20° (p = 0.01, 12 % vs. 3 %) or a Fusion-Cobb ≤20° (p = 0.03, 15 % vs. 6 %). In the multivariate regression analysis, only preoperative TC remained in the model after variable selection. Hence, preoperative TC (p = 0.004, 47° vs. 38°) was the strongest predictor of late PSF: an increase in preoperative TC of 14° approximately doubled the odds of late PSF (p = 0.001, OR 1.05 per degree, 95 % CI 1.01–1.09).
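The statement that a 14° larger preoperative TC doubles the risk follows from compounding the per-degree odds ratio, as in any logistic model. Strictly, the doubling applies to the odds, which approximate the risk when the event is uncommon (late PSF occurred in 8 % of patients):

```latex
\mathrm{OR}_{14^\circ} = 1.05^{14} = e^{14\ln 1.05} \approx e^{0.683} \approx 1.98,
\qquad
\Delta_{\times 2} = \frac{\ln 2}{\ln 1.05} \approx 14.2^\circ
```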

Discussion

In the current study, age and follow-up length were comparable to those of previous studies [43]. Compared with the means of other studies (Table E4, Electronic Supplement), our series had a larger preoperative TC and a more rigid LC, with larger TC and LC at follow-up. Because the study covered a long time period and, thus, a wide range of radiographic outcomes, the sample was well suited to studying risk factors for radiographic and clinical outcomes in ASF surgery, including complications.

Lumbar curve (LC)

With a mean LIV-EV relation of −0.1, most patients in our study had EV-to-EV fusions. Hence, the results refer to ASF with the EVs mostly selected as the LIV and UIV, in accordance with the practice of most surgeons using ASF [6, 7, 13, 17, 28, 43]. Nevertheless, the number of fusion levels varies with EV-to-EV fusion, with a mean of 4.6 in 13 studies [6, 7, 10, 17, 21, 28, 43, 44, 51, 52, 56, 57]. Differences in fusion length might be explained by a surgeon’s hesitation to fuse to L4 and decision to stop at L3 [48]. In our study, 12 % of patients had the LIV at L4, compared with 16–28 % in other studies using ASF [10, 56, 58] and 30–45 % in series using PSF [10, 53].

Radiographic success with SLF usually depends on the individual surgeon’s acceptance of residual deformity. Accordingly, the indications for SLF and fusion-level selection have varied in the past, being based mostly on expert opinion, and have included parameters such as an LC to TC Cobb ratio ≥1.25 [59], preoperative TC <50° [59], Risser ≥2 [59], TC <40° [60], a TC two-thirds the magnitude of the LC [6], TC-flexibility >50 % [6], LC <85° [52], LC-flexibility >50 % [52], TC-flexibility >50 %, or LC <30° on a stretch film [52]. Owing to historical changes in the indications at the authors’ institution, our study included several curve types according to the retrospectively applied Lenke classification, which allowed an analysis of SLF across different curve severities. The wide range of curve severity and the large sample size enabled an adequately powered statistical analysis of predictors of LC ≤20°. We considered this approach well suited to delineating both the selection criteria for SLF and the predictors of the target outcomes. Selecting LC ≤20° as a strict target outcome in the prediction models yields ‘safe’ recommendations for SLF that neither underestimate the deformity nor overestimate the curve response to surgical intervention. Based on the intergroup differences between patients with a follow-up LC ≤20° and >20° and on the prediction models for LC ≤20°, we defined the following cut-offs: patients with LC-bending <20°, preoperative LC <50°, and the LIV at SV-1 or at the SV have a high probability of achieving the target LC. Because simple group comparisons usually have limited accuracy, the prediction models (Fig. 2, Figs. E1 and E2 Electronic Supplement) offer additional information for individual risk assessment. To test the model’s performance in an independent sample, we applied the equations to patients from previous studies.
In a study by Schulte [43] reporting 27 SLF using dual-rod ASF, the actual follow-up LC was on average 4.5° smaller than the LC predicted by our model (mean difference −4.5°). In our study, the majority of patients had 4.0-mm or 4.5-mm single-rod constructs, and the dual-rod construct achieved better LC correction; Schulte’s dual-rod results [43] are consistent with this finding. Accordingly, the model offers a rather conservative estimate for planning LC correction and anticipating LC-resolution when modern dual-rod constructs are used.
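The proposed LC cut-offs and the external check against Schulte’s data can be expressed as two small functions. This is a hypothetical screening sketch of the published thresholds, not a validated clinical decision rule; the function names and example numbers are invented for illustration, and the LIV position is coded as the signed level distance LIV − SV (0 = at the SV, −1 = SV-1).

```python
def favourable_for_target_lc(lc_preop: float, lc_bending: float, liv_minus_sv: int) -> bool:
    """True if all proposed cut-offs for reaching a follow-up LC <= 20 deg are met:
    LC-bending < 20 deg, preoperative LC < 50 deg, and LIV at SV-1 or at the SV."""
    return lc_bending < 20.0 and lc_preop < 50.0 and liv_minus_sv in (-1, 0)


def mean_bias(observed, predicted) -> float:
    """Mean of (observed - predicted); negative means curves were smaller than predicted."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    return sum(diffs) / len(diffs)


print(favourable_for_target_lc(45, 18, -1))   # all cut-offs met
print(favourable_for_target_lc(45, 18, -2))   # LIV at SV-2: not met
print(mean_bias([20.0, 16.0], [22.0, 20.0]))  # invented values
```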

Spontaneous thoracic curve correction (STCC)

The postoperative STCC usually remains stable until follow-up, without significant changes for as long as 17 years [25]. However, because the preoperative TC was larger and more rigid in our series than in the literature (Table E4, Electronic Supplement), 10 % of our patients had TC-progression, resulting from both proximal adding-on and true TC increase. Li [17] compared 22 ASF and 24 PSF for Lenke 5 curves and reported that 18 % of patients after ASF and 17 % after PSF had TC-progression instead of STCC; risk factors were a smaller LC to TC Cobb ratio and less TC-flexibility. Notably, STCC after SLF is frequently slightly less than the correction seen on TC-bending [13, 43, 57]. In Schulte’s study [43], 93 % of the patients had a larger follow-up TC than preoperative TC-bending, and the TC was larger than the fused LC in 41 % of the patients; predictors of STCC could not be established. In a multicenter study of 49 ASF for LC [28], 12 % had TC-progression, defined as a follow-up TC >40°. The LC to TC Cobb ratio, the magnitude of TC-bending, and the status of the triradiate cartilage were identified as predictors of TC-progression, but cut-off values for estimating the risk of progression were not offered. Ding [61] reported on the outcomes of 130 ASF and observed proximal adding-on in 8 % of cases; the representative cases of proximal adding-on associated with distal adding-on resembled TC-progression. Maturity and UIV selection were identified as risk factors. Recently, Lark [44] reported the outcomes of 29 matched pairs treated with SLF or non-SLF. The results indicated that including the TC improves LC- and TC-correction at the cost of decreased TK and trunk mobility. Predictors of STCC were not reported.

The large sample size and the wide range of STCC in our study enabled the establishment of prediction models for STCC and for a target TC ≤30° with high accuracy. In addition, based on the intergroup differences in means and SD between patients with and without a follow-up TC ≤30° and on the prediction models for TC ≤30°, we defined potentially useful cut-offs associated with a significantly increased chance of achieving a target TC ≤30°: TC-bending <20°, preoperative TC <40°, and a postoperative target LC ≤20°. Because statistical group comparisons have limited accuracy, the prediction models in Fig. 3 and Fig. E3 (Electronic Supplement) offer additional information for individual risk assessment.
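Likewise, the three TC cut-offs can be checked individually, which shows at a glance which criterion a planned SLF fails. Again, this is a hypothetical illustration of the published thresholds rather than a validated tool; the keys and example values are invented.

```python
def tc_criteria(tc_preop: float, tc_bending: float, lc_postop: float) -> dict:
    """Evaluate each proposed cut-off for reaching a follow-up TC <= 30 deg."""
    return {
        "tc_bending_below_20": tc_bending < 20.0,
        "tc_preop_below_40": tc_preop < 40.0,
        "lc_postop_at_most_20": lc_postop <= 20.0,
    }


checks = tc_criteria(tc_preop=42, tc_bending=18, lc_postop=19)
unmet = [name for name, ok in checks.items() if not ok]
print(unmet)  # criteria that are not satisfied
```

The third criterion refers to a postoperative value, so before surgery it can only be anticipated, for example from the LC cut-offs above.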

Evolution of the LIVDA and CSVL

Asymmetric wedging of the disc below the LIV as a result of adding-on may accelerate disc degeneration over time. Distal adding-on was observed in 31 % of our patients (Fig. E4 Electronic Supplement) and was one of the risk factors for late PSF. As in our study, several authors observed a significant LIVDA increase from preoperation to postoperation after SLF [9, 11, 21, 58] and from postoperation to follow-up [11, 13, 53, 58, 62]. In two long-term studies, each with 17 years of follow-up, the LIVDA averaged 3.2°, 9.6°, and 11.8° [9] or 0.2°, 6.3°, and 9° [25]. Predictors of an LIVDA increase could not be established in these studies. However, the large residual LIVDA in both raised concerns regarding the fate of the disc below the LIV as these patients age.

Fig. 4

a Radiographic course of a 15-year-old female patient; ASF using lumbar interbody mesh cages (Lenke 5 curve). b Radiographic course of a 16-year-old female patient; ASF using a 4.5-mm single-rod system (Lenke 5 curve). c Radiographic course of a 16-year-old female patient; ASF using a 4.5-mm single-rod system (Lenke 6 curve). d Radiographic course of a 15-year-old female patient; ASF using a 5.0-mm single-rod system. Note instrumentation to the LIV (L3) with an almost horizontal preoperative LIVDA, and adding-on postoperatively, at 6 months, and at the late 6-year follow-up. Excellent restoration of lumbar lordosis. e Radiographic course of a 14-year-old male patient; ASF using a 4.5-mm single-rod system. Note excellent correction of a large thoracolumbar curve, but TC-progression (Lenke 6) with distal adding-on. Good clinical outcome. f Radiographic course of a 15.5-year-old female patient; ASF using a 5.0/4.0-mm dual-rod system (Lenke 5 curve). Note mild distal adding-on on the postoperative, 6-month, and later follow-up radiographs

In our study, the LIVDA increase from postoperation to follow-up was strongly related to the preoperative and postoperative LIVDA but not to the preoperative LC. Notably, the preoperative LIVDA was significantly smaller in patients with adding-on, and ROC-curve analysis showed that a preoperative LIVDA <3.5° predicted an increased risk of adding-on. In addition, adding-on was significantly affected by the magnitude of LIV-ROT, particularly as it approached grade 2, and by the relationship of the LIV to the SV and the EV, with a high risk of adding-on if the LIV was 2 levels short of the SV or 1 level short of the EV. Similar to our findings, Satake [56] reported on 61 ASF and noted a LIVDA ≥5° in 38 % of the patients; in the LIV = EV-1 group, the disc below the LIV was parallel preoperatively and its angle increased significantly by follow-up. In the LIV = EV group (with a nearly parallel disc below the LIV) and the LIV = EV+1 group, the LIVDA was much smaller than in the LIV = EV-1 group. A more cephalad LIV and a shorter fusion created larger disc wedging below the LIV. Likewise, very short fusions according to the ‘Hall principle’ showed inferior LC correction and a greater loss of LC correction than EV-to-EV fusion using a dual-rod construct [63] (Fig. E4 Electronic Supplement).

Satake [56] reported that adding-on occurred mostly when the subjacent disc was nearly parallel, similar to a study by Sudo [25]. Likewise, Kaneda [64] observed disc wedging with a follow-up LIVDA of 6.6° in patients with LIV = EV-1, compared with 3° in patients with LIV = EV. In a study by Wang [58], the LIVDA increase was greater in patients with LIV = EV-1 (9.3°) than in those with LIV = EV (2.6°). Our findings add evidence to previous observations on the association between the LIV-EV relation and adding-on: we observed a significantly increased risk of adding-on depending on the LIV-EV relation and with a preoperative LIVDA <3.5°. To summarise, our findings indicate that if the parallel disc is excluded from the fusion, disc wedging is likely. In retrospect, selecting an LIV above the parallel disc in nominal EV-to-EV fusions might reflect a failure to identify the true EV of the LC [6, 7, 10, 17, 21, 28, 43, 44, 51, 52, 56, 57].

Based on our findings, we recommend careful identification of the EV and analysis of the relationship of the LIV to the EV and SV. To avoid adding-on, our data indicate that including the parallel disc in the fusion is beneficial.
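
The radiographic risk factors for adding-on described above can be collected into a simple preoperative checklist. The sketch below is hypothetical and is not a validated tool. Levels are indexed cranial to caudal from T1 to L5, and "short of" means the LIV lies cephalad of the reference vertebra. As an assumption, being *at least* 1 level short of the EV or *at least* 2 levels short of the SV counts as the risk condition. LIV-ROT is left out because the text gives no numeric threshold for it.

```python
from typing import List

# Vertebral levels ordered cranial to caudal (T1..T12, L1..L5).
LEVELS: List[str] = [f"T{i}" for i in range(1, 13)] + [f"L{i}" for i in range(1, 6)]


def levels_short(liv: str, reference: str) -> int:
    """Levels by which the LIV lies cephalad of a reference vertebra (negative if caudal)."""
    return LEVELS.index(reference) - LEVELS.index(liv)


def adding_on_flags(liv: str, ev: str, sv: str, preop_livda_deg: float) -> List[str]:
    """Hypothetical screen for the adding-on risk factors reported in the text.

    Assumption: 'at least' 1 level short of the EV or 2 levels short of the SV
    is treated as the risk condition; LIV-ROT is not evaluated.
    """
    flags: List[str] = []
    if levels_short(liv, ev) >= 1:
        flags.append("LIV short of EV")
    if levels_short(liv, sv) >= 2:
        flags.append("LIV 2+ levels short of SV")
    if preop_livda_deg < 3.5:
        flags.append("preoperative LIVDA < 3.5 deg")
    return flags


print(adding_on_flags("L2", ev="L3", sv="L4", preop_livda_deg=2.0))
# ['LIV short of EV', 'LIV 2+ levels short of SV', 'preoperative LIVDA < 3.5 deg']
print(adding_on_flags("L3", ev="L3", sv="L4", preop_livda_deg=6.0))  # []
```

Returning the reasons, rather than a single yes/no value, keeps each factor visible. This matters because the text reports them as separate risk factors, not as one combined score.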

Correction of scoliosis and maintenance of correction

In our study, the postoperative LC correction averaged 59 % and decreased to 49 % at follow-up, while the correction of the Fusion-Cobb was 73 % and 67 %, respectively. These data compare well with those reported in the literature (Table 4 Electronic Supplement). We observed a significant loss in Fusion-Cobb, particularly with single-rod systems, in agreement with a previous study [58]. Likewise, rod fracture and non-union were observed only with single-rod systems and did not occur with the dual-rod system. Nevertheless, using dual-rod systems, Sudo [25], Bullmann [57], and Geck [63] reported a loss of LC correction of 3.8°–4.5°. Using a single-rod system, Tao [7] reported an LC correction loss of 4.5°, compared with 2° using a posterior pedicle screw system. In our study, the change in Fusion-Cobb with the dual-rod system also averaged 2°, and LC correction was better with dual-rod constructs. The loss of LC correction was largely the sum of minor changes within the instrumented segment and in the LIVDA.
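
The correction percentages above are relative to the preoperative Cobb angle. The text does not spell out the formula. The sketch below assumes the usual definition, correction (%) = 100 x (preoperative Cobb - later Cobb) / preoperative Cobb, and takes loss of correction as the follow-up angle minus the postoperative angle. The function names and the single-patient values are hypothetical and are not study data.

```python
def percent_correction(preop_deg: float, later_deg: float) -> float:
    """Cobb-angle correction as a percentage of the preoperative angle."""
    if preop_deg <= 0:
        raise ValueError("preoperative Cobb angle must be positive")
    return 100.0 * (preop_deg - later_deg) / preop_deg


def correction_loss(postop_deg: float, followup_deg: float) -> float:
    """Loss of correction in degrees (positive = curve increased after surgery)."""
    return followup_deg - postop_deg


# Hypothetical patient: 50 deg preoperatively, 20.5 deg postoperatively, 25.5 deg at follow-up.
print(percent_correction(50.0, 20.5))  # 59.0
print(percent_correction(50.0, 25.5))  # 49.0
print(correction_loss(20.5, 25.5))     # 5.0
```

The mean of per-patient percentages generally differs from a percentage computed on mean Cobb angles, so reported averages depend on which convention is used.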

While most radiographic parameters remain stable from postoperation to follow-up, late CSVL improvement is common [11], as it was in our study: CSVL averaged 2 cm postoperatively and 1.2 cm at follow-up. When successful global coronal balance is defined as a CSVL ≤2 cm [13], this was achieved in 83 % of patients. However, the CSVL improvement over time came at the cost of an increase in the fractional lumbosacral curve, the TC, and particularly the subjacent disc angle (LIVDA).
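
The balance criterion above can be applied to a series as a simple proportion. The sketch below assumes that CSVL is recorded as a signed offset in centimetres, with the sign giving the side of deviation, so balance is judged on its absolute value against the 2 cm threshold from [13]. The function name and the values are hypothetical.

```python
from typing import Sequence


def share_balanced(csvl_cm: Sequence[float], threshold_cm: float = 2.0) -> float:
    """Fraction of patients with |CSVL| <= threshold (global coronal balance)."""
    if not csvl_cm:
        raise ValueError("no measurements")
    balanced = sum(1 for d in csvl_cm if abs(d) <= threshold_cm)
    return balanced / len(csvl_cm)


# Hypothetical follow-up CSVL offsets in cm (sign = side of deviation).
followup = [0.5, -1.8, 2.0, 2.6]
print(share_balanced(followup))  # 0.75
```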

Thoracic kyphosis and lumbar lordosis

The literature (Table 4 Electronic Supplement) indicates that ASF improves thoracolumbar kyphosis, as it did in our study [28]. Notably, unlike with PSF [7, 10], symptomatic PJK was not a concern with ASF in our study or in others [21, 25]. In the current study, interbody structural support did not improve LL reconstruction. These findings add to the results of previous investigators, who found no benefit of cages [6, 11, 62].

Surgical results and complications

In our study, ASF proved to be a safe procedure, with no neurovascular complications in a series of 245 patients. Our data echo previous reports of good outcomes in the majority of patients treated with ASF [7]. ASF also has the benefit of sparing the posterior musculature. However, late PSF was indicated in 8 % of patients. In the literature, revision rates vary (0–15 %) and are thought to be slightly lower with dual-rod constructs [6, 7, 11, 13, 16, 17, 51, 53, 57, 62–66]. Notably, using a dual-rod system, Sudo [25] also reported the need for late PSF in 7 % of patients to address distal adding-on and TC-progression. In a multicenter study of 100 patients [67], non-union occurred in 7 % with a dual-rod construct and in 0 % in the single-rod group. Whether complication rates can be reduced using PSF cannot be determined from the currently available data.

Revision rates vary in the literature, and the reasons for revision are multifactorial. Variations might reflect different surgeon thresholds for revision. In Bitan's series [68], 5 of 29 patients (17 %) had adding-on, 2 had a non-union, 1 had significant back pain, and 1 had a top screw pullout with loss of correction. Nevertheless, the authors reported a zero revision rate. In our series, a proactive approach was taken for patients with non-union, significant loss of correction, or adding-on, and the late PSF rate was 8 %. Statistical analysis showed that revision was attributable to non-union, TC-progression, and adding-on. Our study showed that the risk of non-union could be lowered to zero using a dual-rod construct. The risk of TC-progression and adding-on may be reduced by careful LIV selection, as discussed, and by adjusting the criteria for SLF according to the prediction models established in our study.

Conclusion

A strength of our study is that it is based on a large sample of patients without any exclusions. Detailed statistical assessment of the radiographic factors identified significant interdependencies between clinically relevant radiographic parameters for LC- and TC-resolution. Using stepwise regression analysis, we established prediction models that may be helpful in clinical decision-making. In future studies, these models can be tested and adjusted in independent samples and distinct curve types.
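
The text does not specify the stepwise procedure used, including its direction, entry and removal criteria, and software. The sketch below illustrates the general technique only. It performs forward selection using the Akaike information criterion (AIC) with ordinary least squares in NumPy, a swapped-in choice rather than the study's method. The function names and the synthetic data are hypothetical, and the sketch does not reproduce the study's models.

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """Fit OLS with an intercept and return the Gaussian AIC (up to a constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept + currently selected predictors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def forward_stepwise(X: np.ndarray, y: np.ndarray) -> list:
    """Add, one at a time, the predictor that lowers AIC most; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected: list = []
    best_aic = ols_aic(X[:, selected], y)  # intercept-only model
    while remaining:
        aic, j = min((ols_aic(X[:, selected + [k]], y), k) for k in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return selected


# Synthetic data: the outcome depends on predictors 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(X, y))
```

Stepwise selection tends to give an optimistic in-sample fit. This is one reason why models built this way benefit from testing in independent samples, as proposed above.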