Introduction

Sagittal alignment has been extensively studied since the turn of this century. The anatomical basis for understanding spinal alignment originated in the French school [1, 2], and its clinical application in adult spinal deformity (ASD) was first highlighted by Glassman and Schwab [3,4,5]. Sagittal alignment is now widely accepted as a critical factor in patient assessment and surgical decision making. Postoperative global sagittal alignment and the residual body compensation (mainly pelvic retroversion) present in fused patients have a large impact on clinical results and potential complications [6], stressing the importance of adequate sagittal plane restoration when performing surgery.

Traditionally, two classifications have been used to assess the sagittal plane: the Roussouly classification [7] and the SRS-Schwab classification [5, 8]. More recently, two methods were proposed to evaluate the long-term implications of realignment: the GAP score [9] and the T4-L1-Hip axis [10]. All four have been investigated for their association with clinical outcomes and mechanical complications, and different methods are sometimes used in combination [11]. However, there is no consensus on which of these methods is most accurate in predicting complications following ASD surgery. Prior investigations show inconsistent results depending on the study cohort characteristics, the methodology used to assess complications, and the analyzed adverse events [12,13,14,15,16,17]. Importantly, the primary outcome frequently varies between studies (mechanical complications vs only proximal junctional kyphosis vs readmissions, etc.), which makes comparisons between methods difficult to interpret [18,19,20,21].

The aim of our study was to analyze a cohort of ASD patients who underwent panlumbar fusion and to compare three alignment assessment methods based on normative data (Roussouly, GAP score, and T4-L1-Hip axis) to determine which method is most predictive of different adverse events.

Methods

We conducted a retrospective study with patients taken from a prospective multicenter database (ESSG: European Spine Study Group) comprising adult spinal deformity (ASD) patients. General inclusion criteria for the database were patients older than 18 years with at least one of the following requirements: coronal Cobb ≥ 20°, sagittal vertical axis (SVA) ≥ 5 cm, pelvic tilt (PT) ≥ 25°, or thoracic kyphosis ≥ 60°. Institutional review board approval was obtained at each participating institution prior to patient enrollment. Informed consent was obtained from each patient.

For this specific study we selected patients fulfilling the following criteria: surgically treated ASD, more than 6 instrumented levels (UIV at or above L1), fusion extending distally to the pelvis, and complete data at the immediate postoperative period (6 weeks) and 2 years after surgery.

We analyzed three different alignment methods based on normative data in the 6-week postoperative radiograph: (1) mismatch according to the Roussouly classification (categorical variable: matched or mismatched): the postoperative type, determined from the 6-week sacral slope magnitude [7], was compared with the ideal type based on PI [22]; (2) GAP score (continuous variable) and GAP categories (categorical variable: proportioned, moderately disproportioned, severely disproportioned) as described by Yilgor et al. [9]; (3) the T4-L1-Hip axis method as termed by Hills et al. [10], composed of two continuous variables: (a) the relative L1PA, calculated at 6 weeks as L1PA minus ideal L1PA (ideal L1PA was calculated from normative data using the formula 0.5 × PI − 21), and (b) the T4-L1PA mismatch, the difference between 6-week T4PA and 6-week L1PA (mismatch considered as > 4°).
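The two T4-L1-Hip axis variables follow directly from the formulas quoted above. The following is a minimal Python sketch, not the study's actual code: the function names are hypothetical, and applying the > 4° threshold to the magnitude of the T4PA minus L1PA difference is an assumption, since the text does not state a sign convention.

```python
MISMATCH_THRESHOLD_DEG = 4.0  # threshold quoted in the text


def ideal_l1pa(pi_deg: float) -> float:
    """Ideal L1 pelvic angle from pelvic incidence: 0.5 * PI - 21 (degrees)."""
    return 0.5 * pi_deg - 21.0


def relative_l1pa(l1pa_deg: float, pi_deg: float) -> float:
    """Measured 6-week L1PA minus the ideal L1PA for this patient's PI."""
    return l1pa_deg - ideal_l1pa(pi_deg)


def t4_l1pa_difference(t4pa_deg: float, l1pa_deg: float) -> float:
    """Signed difference between 6-week T4PA and 6-week L1PA."""
    return t4pa_deg - l1pa_deg


def is_t4_l1pa_mismatched(t4pa_deg: float, l1pa_deg: float,
                          threshold_deg: float = MISMATCH_THRESHOLD_DEG) -> bool:
    # Assumption: the threshold is applied to the magnitude of the difference.
    return abs(t4_l1pa_difference(t4pa_deg, l1pa_deg)) > threshold_deg


# Hypothetical patient: PI = 50, L1PA = 10, T4PA = 16 (all in degrees)
print(relative_l1pa(10.0, 50.0))          # 6.0
print(is_t4_l1pa_mismatched(16.0, 10.0))  # True
```

For the hypothetical patient, the ideal L1PA is 0.5 × 50 − 21 = 4°, so a measured L1PA of 10° gives a relative L1PA of 6°.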

We performed binary logistic regressions with stepwise likelihood ratio method, generating models that best predicted each of the following adverse events:

  • mechanical complications in isolation:

    • proximal junctional kyphosis (PJK): a difference of more than 25° in the proximal junctional angle (PJA) measured from the immediate postoperative sagittal radiograph to a given successive follow-up sagittal radiograph.

    • proximal junctional failure (PJF): proximal junctional failure due to fracture of the UIV or UIV + 1, implant failure (pull-out or dislodgment), or proximal spondylolisthesis, usually causing symptoms. All cases required revision surgery.

    • PJK and PJF together: this variable groups all PJK (by definition, angular kyphosis only, with a PJA change > 25°) and all PJF (proximal failure requiring revision surgery).

    • rod breakage: breakage of one or more rods detected in the follow-up radiographs.

  • mechanical complications (MC) in general: any mechanical complication; this variable groups proximal junctional complications and rod breakages and instrumentation failure (screw dislodgment, pull-out, osteolysis).

  • reinterventions in general: any unplanned reoperation after the index surgery for any complication that includes MC and non-MC reinterventions such as infections, postoperative radicular pain, and pseudomeningoceles.

  • reinterventions due to MC: any unplanned reoperation after the index surgery due to a mechanical complication (PJK, PJF, rod breakage, screw pull-out, etc.)

  • readmissions: any unplanned readmission due to a complication secondary to the index surgery.
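The outcome definitions above nest inside one another: PJK is an angular criterion, "PJK and PJF together" is the union of the two proximal junctional events, and general MC adds rod breakage and other instrumentation failures. A minimal Python sketch of that nesting, with hypothetical function and key names (the database's actual field names are not given in the text):

```python
PJK_THRESHOLD_DEG = 25.0  # PJA change threshold quoted in the text


def has_pjk(pja_postop_deg: float, pja_followup_deg: float) -> bool:
    """PJK: PJA increased by more than the threshold since the immediate postoperative film."""
    return (pja_followup_deg - pja_postop_deg) > PJK_THRESHOLD_DEG


def composite_outcomes(pjk: bool, pjf: bool, rod_breakage: bool,
                       other_instrumentation_failure: bool) -> dict:
    """Derive the grouped outcome flags from the isolated ones."""
    pjk_or_pjf = pjk or pjf
    return {
        "PJK+PJF": pjk_or_pjf,
        # General MC groups proximal junctional events, rod breakage, and
        # other instrumentation failure (dislodgment, pull-out, osteolysis).
        "MC": pjk_or_pjf or rod_breakage or other_instrumentation_failure,
    }


# Hypothetical patient with an isolated rod breakage
print(composite_outcomes(pjk=False, pjf=False, rod_breakage=True,
                         other_instrumentation_failure=False))
```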

Receiver operating characteristics (ROC) analysis was performed to calculate the predictive ability of the models in the occurrence of each adverse event. The area under the curve (AUC) was used to evaluate the accuracy of each evaluation system together with the Odds Ratio (OR) and 95% confidence intervals.
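As an illustration of the two accuracy measures named above, the sketch below computes an AUC from its rank-based definition (the probability that a patient with the event scores higher than one without it, counting ties as one half) and converts a logistic regression coefficient into an odds ratio with a Wald 95% confidence interval. It is a plain-Python sketch with hypothetical function names and made-up inputs, not the SPSS procedure used in the study.

```python
import math


def roc_auc(scores, events):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Return (OR, lower, upper): exp(B) and its Wald 95% confidence interval."""
    return (math.exp(coef),
            math.exp(coef - z * std_err),
            math.exp(coef + z * std_err))


# Made-up scores for four patients, two of whom had the event
print(roc_auc([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1]))  # 0.75
```

In the made-up example, three of the four event/non-event pairs are ranked correctly, which gives an AUC of 0.75.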

Univariate analysis was carried out with additional variables (baseline demographic and radiographic, surgical, and 6-week radiographic), each tested against the adverse events previously listed. Variables showing a statistically significant difference (set at p < 0.05), that is, risk factors for a specific adverse event, were added to the previous models, and binary logistic regression analyses were run again to create a second round of predictive models that combined the alignment assessment methods with the additional selected variables (listed below).

  • demographic variables: age, gender, ASA score, and body mass index (BMI)

  • baseline PROMs: ODI score, SRS-22 subtotal, SF-36 PCS, and SF-36 MCS

  • surgical variables: surgical approach, surgical time, estimated blood loss, upper instrumented vertebra (UIV), and the use of spinal osteotomies.

  • radiographic variables: preoperative and at 6 weeks: pelvic incidence (PI), lumbar lordosis (LL), T2–T12 kyphosis, T10–L2 kyphosis, relative spinopelvic alignment (RSA) based on global tilt (GT), relative lumbar lordosis (RLL) based on L1–S1, lumbar distribution index (LDI) (L4–S1/L1–S1 × 100), relative pelvic version (RPV) based on sacral slope (SS), T4 pelvic angle (T4PA), and L1 pelvic angle (L1PA).

Statistical analysis was carried out using the SPSS software (version 20, SAS Institute Inc., Cary, NC). Normality of the variables was tested using the Kolmogorov–Smirnov test. The distribution of quantitative variables is given as mean and standard deviation or median and quartiles (Q1; Q3) as required; qualitative values are expressed in percentages. In the univariate analysis, qualitative variables were compared using the chi-square test or the Fisher exact test, and quantitative variables using the Student t-test or the Mann–Whitney U test. The significance threshold was set at 5% (p < 0.05), and variables that demonstrated statistical significance were used in the multivariable analysis. Binary logistic regression was performed with the stepwise likelihood ratio method, and the variance explained was estimated with Nagelkerke's R2. ROC curves were plotted to obtain the AUC, and ORs were calculated to assess the predictive accuracy of the alignment methods.
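The univariate screening step for qualitative variables can be sketched as follows. This is an illustrative plain-Python version, assuming a 2 × 2 table per candidate variable and a Pearson chi-square test without continuity correction; the study used SPSS, and the function names and counts here are hypothetical. For one degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with p taken from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function for 1 df
    return chi2, p


def screen_variables(tables, alpha=0.05):
    """Keep the variables whose association with the adverse event has p < alpha."""
    return [name for name, cells in tables.items()
            if chi_square_2x2(*cells)[1] < alpha]


# Hypothetical counts: (exposed with event, exposed without event,
#                       unexposed with event, unexposed without event)
candidates = {"variable_a": (30, 20, 15, 35), "variable_b": (25, 25, 24, 26)}
print(screen_variables(candidates))
```

Variables retained by such a screen would then enter the second round of logistic regression alongside the alignment methods. With small expected cell counts, the Fisher exact test mentioned in the text would replace the chi-square test.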

Results

A total of 391 patients met the inclusion criteria and were analyzed in this study. Table 1 summarizes the baseline demographic, radiographic and surgical variables of the cohort. The MC rate in this series of panlumbar fusions was 45.5% (178): rod breakage 18.7% (73), PJK 11.8% (46), PJF 5.4% (21), PJK + PJF 17.3% (67), and other (dislodgment, screw pull-out, osteolysis) 9.6% (38). The rate of general reinterventions was 36.1% (141), that of reinterventions due to MC 28.4% (111), and that of readmissions 31.2% (122).

Table 1 Sample description

In the first round of binary logistic regression models performed to competitively test the three alignment assessment methods measured at 6 weeks postoperatively, we found that the best overall predictors of adverse events in most models were T4-L1PA mismatch and GAP score. We could not demonstrate a predictive ability of the Roussouly mismatch. The T4-L1PA mismatch best predicted general MC, PJK, PJK + PJF, and readmissions, while the GAP score best predicted PJF and reinterventions due to MC and to any complication (Table 2). None of the methods was able to predict rod breakage in this first round (p > 0.05). The variance of the adverse events explained by these models (predictive ability, Nagelkerke's R2) ranged between 0.031 and 0.113, and the odds ratios (expressed as Exp(B)) ranged between 1.070 and 1.456 (Table 2).

Table 2 First round of binary logistic regression

To measure the predictive accuracy of the different sagittal alignment methods, ROC curves were plotted for each adverse event, yielding AUC values between 0.57 and 0.70 (Table 3, Fig. 1). In this analysis, the GAP score performed almost as well as the T4-L1PA mismatch in predicting MC, PJK + PJF, and readmissions. The GAP score best predicted PJF and reintervention due to any complication, while T4-L1PA mismatch best predicted PJK.

Table 3 ROC curves calculated for each adverse event with each created model
Fig. 1 Area under the curve (AUC) showing the predictive ability of the different sagittal alignment methods for each adverse event

Introducing additional variables in a second round of binary regressions improved predictions in all models (Nagelkerke's R2 between 0.082 and 0.329). Only after adding them was it possible to predict rod breakage, together with the T4-L1-Hip axis method. Depending on the model, the following variables were significant (Table 4): baseline demographic and PROMs (ODI, SF-36 PCS, SF-36 MCS, ASA score); surgical (blood loss, UIV level); 6-week postoperative radiographic (T2–T12 kyphosis, RSA, RLL, T4PA, L1PA, and LDI).

Table 4 Binary logistic regression creating models to predict adverse events with all the variables selected by univariate analysis

Discussion

Different methods for sagittal alignment assessment compete for predicting adverse events after ASD surgery [7,8,9,10]. These methods help surgical planning, allowing surgeons to personalize alignment goals for each patient using sagittal preoperative measures. Published results are contradictory depending on the investigated adverse event, the characteristics of the selected cohort, and the analyzed alignment method [18,19,20,21]. Evaluating the predictive ability of three of these methods in a cohort of panlumbar fused ASD patients, we have seen in our study that the best predictor parameters in most models were T4-L1PA mismatch and GAP score; we could not prove a predictive ability of the Roussouly mismatch.

The T4-L1-Hip axis method [10] was developed after the analysis of a disease-free multinational volunteer cohort of individuals with normal sagittal balance and spinopelvic alignment. The method defines ideal alignment using parameters that are either fixed or directly modifiable in surgery, including pelvic incidence (PI), the L1PA (defined as a function of PI), and the T4-L1PA mismatch (which describes thoracic alignment relative to lumbar alignment). Sagittal alignment targets should restore patients to an ideal shape: the ideal L1PA is based on PI with the formula L1PA = 0.5 × PI − 21, and T4PA should be nearly equivalent to the L1PA (within 4°), aligning the T4-L1-Hip axis. While this method, which proposes the correct position of T4 over L1 in space, appears to describe normal thoracic and lumbar alignment accurately, no studies have yet confirmed that realignment based on these targets improves outcomes. The method was described in a set of long thoracolumbar fusions, surgeries in which both T4 and L1 are captured and positioned with instrumentation. The utility of this alignment scheme is unclear in patients with an upper instrumented vertebra in the lower thoracic spine.

The GAP score [9] includes 4 sagittal parameters (L1–S1 lumbar lordosis, L4–S1 lumbar lordosis, sacral slope and global tilt), and targets are calculated based on the ideal value of each one (respectively, relative lumbar lordosis, RLL; lumbar distribution index, LDI; relative pelvic version, RPV; and relative spinopelvic alignment, RSA), calculated with formulae based on PI. An age modifier adds a biological component to the prediction. The score stratifies patients into three categories originally associated with an increasing rate of mechanical complications. A recent systematic review of the GAP score's capacity to predict MC occurrence analyzed eleven retrospective articles and reported a global AUC of 0.68 ± 0.2, showing moderate predictive accuracy [23].

The Roussouly classification [7] was derived from a cohort of healthy subjects and groups patients into 4 types depending on their sacral slope. Each type has a characteristic location of the lumbar apex and the inflection point, as well as a specific number of vertebrae included in the lordosis and a specific distribution of the lordosis arches. These characteristics were later proposed as ideal targets when surgically restoring the sagittal profile [14, 24].

The SRS-Schwab classification [5, 8] was not used in our analysis because it is the only one based not on normative data from a cohort of asymptomatic subjects but on a series of adult deformity patients. This classification uses three sagittal modifiers to quantify deformity (sagittal vertical axis, SVA; pelvic tilt, PT; and PI-LL mismatch). It sets thresholds for each parameter based on the correlation between radiographic parameters and HRQOL measures (mainly disability, using an ODI over or under 40). This classification was not conceived as a predictor of complications, and mechanical complications are not uncommon even when optimal values are achieved postoperatively [25]. We believe this is because it is not patient tailored and might underestimate the malalignment with regard to a patient’s individual PI [9, 18]. For example, a PT > 20° is considered pathological in the classification, but this value should be considered normal in patients with high PI [26].

A few papers have compared these different methods of alignment assessment. Jacobs et al. [18] highlighted the ability of the GAP score (AUC = 0.86) over the SRS-Schwab classification (AUC = 0.69) in predicting mechanical complications and attributed it to the fact that in the GAP score all parameters are related to the patient's individual PI (it is patient tailored), making RPV a better parameter than pelvic tilt (PT), and RLL a better parameter than PI-LL mismatch. The GAP score was also found by Sun et al. [19] to be more effective than the Roussouly classification in predicting PJK (AUC = 0.863) and PJF (AUC = 0.724). They further showed GAP categories (AUC = 0.561) to be as effective as the GAP score (AUC = 0.555) in predicting implant-related complications. On the other hand, Teles et al. [21] showed the SRS-Schwab classification to be more predictive of mechanical complications (AUC 0.67) than the GAP score (AUC 0.53).

In our study, the predictive accuracy of the tested methods varied depending on the adverse event to predict and the statistics applied. Readmissions, mechanical complications as a whole, PJK, and the combination of PJK and PJF were better predicted with the T4-L1PA mismatch using regression analysis. It is logical to think that if the ideal lumbar distribution is not achieved (L1PA) and thoracic anterior forces lie in front of L1 (mismatch T4PA), the fate of the construct is to fail over time. However, when ROC analysis was performed we discovered a similar predictive ability with the GAP score.

On the other hand, reintervention for mechanical complications and PJF were better predicted with the GAP score when logistic regression was run, although the T4-L1-Hip axis seemed to predict them similarly when ROC analysis was performed. Many authors have also reported GAP’s ability to predict complications [18, 27, 28]. Despite the existing controversy over the use of this method, it is still a good guide to assess ideal alignment. Predictions are aided in this method by a biological marker (age), which we find necessary because, as we have seen in the current study, the accuracy of prediction when only sagittal alignment assessment is used has a lot of room for improvement.

In our current study we could not demonstrate the impact of the Roussouly classification in predicting adverse events that was seen in other publications [14, 29]; other authors share this finding [19, 21]. This classification is less analytical than the other available methods. Our result may reflect a limitation of our approach: classifying patients based only on three SS categories, as we did in this study, might be misleading if the rest of the characteristics are not considered. First, patients with SS values under 35° can be classified as either type 1 or type 2. Second, SS is subject to pelvic compensation, which may lead to misclassifying patients with spinal pathology. Moreover, this classification used as a predictive method may be too simplistic to reflect the continuum.

The variance in adverse events explained by the current sagittal alignment methods was low (between 3 and 11%) using Nagelkerke’s statistic, which tests the overall performance of the models. This statistical tool is based on the log-likelihood as a type of scoring rule [30]. It can be interpreted as a measure of how close the predictions of the model are to the observed 0 and 1 outcomes. The AUC values also vary from 0 to 1; values of 0.5 indicate that the model performs no better than random, and values of 0.5–0.7 indicate poor performance. However, pseudo-R2 statistics should be interpreted with caution, as they do not represent the proportionate reduction in error as the R2 in linear regression does. We cannot conclude that only 3–11% of the adverse events are explained by the models; the statistic reflects only part of the quality of the models. This is due to heteroscedasticity (the error variance differs for each value of the predicted score).
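For reference, Nagelkerke's statistic rescales the Cox and Snell pseudo-R2 so that its maximum is 1. A minimal Python sketch computing it from the log-likelihoods of the intercept-only and fitted models (the function name and the example values are hypothetical):

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke's pseudo-R2 from the null and fitted model log-likelihoods.

    Cox-Snell R2 = 1 - exp(2 * (ll_null - ll_model) / n); Nagelkerke divides
    it by its upper bound, 1 - exp(2 * ll_null / n).
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical: 100 patients, half with the event, so the null model
# assigns probability 0.5 to every patient.
n = 100
ll_null = n * math.log(0.5)
print(nagelkerke_r2(ll_null, ll_null, n))  # 0.0: the model adds nothing
print(nagelkerke_r2(ll_null, 0.0, n))      # 1.0: perfect prediction
```

The two boundary cases show why the value reads as closeness to the observed 0 and 1 outcomes rather than as a share of events explained.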

While sagittal alignment has a clear and demonstrated impact on patient outcomes, sagittal malalignment represents one of many risk factors associated with postoperative complications. In our cohort predictions improved with the addition of variables such as PROMs (ODI, SF-36PCS, SF-36MCS), ASA score, surgical variables (blood loss, UIV level), and some radiographic postoperative parameters (T2–T12 kyphosis, RSA, RLL, T4PA, L1PA, and LDI).

We were not able to predict rod breakage by any of these alignment methods alone (consistent with Sun et al. [19]). The addition of a surgical variable (blood loss) and several independent sagittal regional parameters (RSA, LDI) resulted in some improvement, but the utility and external validity of such a model are likely low. Therefore, other unknown factors apart from sagittal alignment, or the variables we currently collect in our databases, must be involved in adverse events and outcomes. The biomechanical loads supported by a fused spine [31] and patient’s biological state likely play a considerable role in the risk of complications, such as bone quality [27], muscle status [32], patient comorbidities or frailty [33], and serum markers, telomere length, or omics [34]. All of these are future potential variables to add to these equations to improve predictability.

Limitations of the study include the retrospective design, which may have introduced selection or information biases. As in other articles, the selected cohort may not be comparable to other series. The use of postural radiographic parameters (pelvic tilt, sacral slope) and of postoperative radiographic parameters of unfused segments of the spine may bias the results. Additionally, there may be a considerable difference in risk profiles between patients with an upper thoracic UIV and those with a lower thoracic UIV. Another limitation is that we did not account for rod type, number, or diameter in our study; these factors have the potential to affect the occurrence of specific mechanical complications. Finally, statistics need to be carefully interpreted, as pseudo-R2 methods do not exactly represent biological processes. Furthermore, comparison between methods is difficult to interpret because their measures were conceived on different scales. A prospective study comparing surgeons using different alignment schemes is needed to determine which planning scheme results in the least error and which results in the fewest complications.

In conclusion, our examination of the T4-L1-Hip axis and the GAP score suggests their potential superiority in predicting adverse events compared to the Roussouly method. Despite this, it is essential to acknowledge that the three analyzed methods exhibit only partial efficacy in anticipating complications. While postoperative sagittal alignment emerges as a pivotal modifiable risk factor, it is imperative to underscore the significance of incorporating a diverse range of variables—both modifiable and non-modifiable—for an enhanced predictive accuracy. Therefore, a comprehensive approach, encompassing a broader spectrum of factors, is indispensable for refining our predictive models and advancing the understanding of adverse events in this context.