Introduction

Adult spinal deformity (ASD) impacts patients’ health-related quality of life (HRQoL), particularly in terms of pain, function and cosmesis [1]. Over the past two decades, sagittal malalignment has been identified as negatively affecting HRQoL in ASD patients, and its prevalence, clinical significance and the results of corrective surgery have been reported [2,3,4,5]. The Scoliosis Research Society (SRS) classification for ASD is mainly based on sagittal plane alignment and deformity [6, 7].

Coronal malalignment (CM) may also cause severe impairment of HRQoL in ASD patients [8]. Certain types of CM are reported to increase the risk of insufficient correction of coronal deformity, implant failure, and persistent low back pain after ASD surgery [9,10,11,12]. In particular, ASD patients with Parkinson’s disease show a high prevalence of CM, with frequent suboptimal surgical results [13, 14]. However, CM has received minimal attention in the literature compared with sagittal malalignment. Recently, a new classification system of CM for ASD was published (the Obeid-CM classification) [15]. This classification is expected to guide surgeons toward an appropriate corrective strategy and technique for each subtype of CM. However, a validation study among spine surgeons is required before a new classification system can be considered established.

In idiopathic scoliosis, full-length side bending anteroposterior (AP) radiographs have been used, in addition to standing AP and lateral radiographs, to evaluate the stiffness of the coronal curves and to decide on the levels of instrumentation [16]. For the treatment of ASD, it remains to be seen whether side bending radiographs help us better understand CM.

Therefore, the primary aim of this study was to establish the intra- and inter-rater reliability of the Obeid-CM classification with reference to full-length AP and lateral radiographs. The secondary aim was to determine whether the additional use of side bending radiographs improves the inter-rater reliability of the classification compared with classification without them.

Materials and methods

The institutional review board of Bordeaux University Hospital approved this study (approval number: CE-GP-2019–14).

Classification system

The Obeid-CM classification system first divides CM patients into two main types according to their CM deformity patterns [15]. Concave CM (Type 1) is defined as CM with the coronal T1 plumbline falling on the side of the concavity of the main coronal curve, whereas convex CM (Type 2) is defined as CM with the coronal T1 plumbline falling on the side of the convexity of the main coronal curve; in both types the deviation is greater than 20 mm (Fig. 1). In the second step, both types are divided into two subtypes according to the first modifier (A or B), which accounts for the location of the main curve, giving a total of four subtypes. Types 1A and 2A indicate a thoracolumbar or lumbar main curve and include patients whose curve apex is located between T12 and L4. Type 1B indicates a thoracic main curve and includes patients whose main curve has its apex above T12. Type 2B indicates a lumbosacral main curve whose apex is located below L4. In the third step, Types 1A and 2A are each divided into two additional subtypes according to the second modifier (1 or 2), which accounts for the stiffness of the main curve or the condition of the lumbosacral junction, respectively. In total, the classification system contains six subtypes: 1A1, 1A2, 1B, 2A1, 2A2, and 2B. Type 1A1 designates a potentially flexible main curve and type 1A2 designates a rigid or fused curve. Type 2A1 is assigned to patients who have a flexible and non-degenerate lumbosacral junction. Conversely, type 2A2 is assigned to those who have a degenerate or previously operated lumbosacral junction (Fig. 2).
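
The three-step decision logic described above can be summarized as a short sketch. The Python snippet below is illustrative only and is not part of the published classification: the `CoronalFindings` structure, its field names, and the string labels are hypothetical, and apex locations that the text does not assign to a subtype (for example, a concave CM with an apex below L4) are rejected rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class CoronalFindings:
    """Hypothetical summary of one reader's findings for a CM case."""
    plumbline_side: str  # "concave" or "convex" side of the main coronal curve
    apex_region: str     # "thoracic" (above T12), "thoracolumbar_lumbar" (T12 to L4), "lumbosacral" (below L4)
    main_curve_rigid_or_fused: bool = False           # used only for Type 1A
    ls_junction_degenerate_or_operated: bool = False  # used only for Type 2A


def obeid_cm_subtype(f: CoronalFindings) -> str:
    """Return one of 1A1, 1A2, 1B, 2A1, 2A2, 2B following the three steps in the text."""
    # Step 1: main type from the side on which the coronal plumbline falls.
    if f.plumbline_side == "concave":
        main_type = "1"
    elif f.plumbline_side == "convex":
        main_type = "2"
    else:
        raise ValueError("plumbline_side must be 'concave' or 'convex'")

    # Step 2: first modifier from the location of the main curve apex.
    if f.apex_region == "thoracolumbar_lumbar":
        # Step 3: second modifier, defined only for Types 1A and 2A.
        if main_type == "1":
            return "1A2" if f.main_curve_rigid_or_fused else "1A1"
        return "2A2" if f.ls_junction_degenerate_or_operated else "2A1"
    if main_type == "1" and f.apex_region == "thoracic":
        return "1B"
    if main_type == "2" and f.apex_region == "lumbosacral":
        return "2B"
    # Assumption: combinations not described in the text are not classified.
    raise ValueError("combination not described in the classification text")
```

For example, `obeid_cm_subtype(CoronalFindings("convex", "thoracolumbar_lumbar", ls_junction_degenerate_or_operated=True))` returns `"2A2"`. The inputs themselves (which curve is the main curve, whether it is rigid) remain the reader's judgment.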

Fig. 1

The first grading into two types depends on which side of the main coronal curve the coronal T1 plumbline falls on: left, concave CM (Type 1); right, convex CM (Type 2)

Fig. 2

Schema of the Obeid-CM classification

Case readings

Fifteen readers from fourteen international institutions were assigned 28 cases representing CM (absolute coronal distance from the C7 plumb line to the coronal sacral vertical line (C7-CSVL) > 20 mm) and were asked to classify them according to the Obeid-CM classification with reference to full-length standing anteroposterior and lateral radiographs (first assignment). Four of the fifteen readers were expert scoliosis and deformity correction surgeons who had participated in the establishment of the classification (elaborators). The remaining eleven readers were spine surgeons certified in their respective countries (non-elaborators). All readers were requested to read the original paper and understand the classification system before the process. Readers were blinded to any other information, such as the treatments or outcomes of the cases. Radiographs of the cases were shared with the readers as slides, and the readers could not obtain any additional personal information. The person who selected the reading cases (I.O.) did not participate in the subsequent reliability analyses. The assignments were repeated 2 weeks later, with the cases presented in a different order (second assignment). Immediately after the second reading, the readers were asked to grade the cases again, in another order, with reference to side bending radiographs (SBRs) in addition to the anteroposterior and lateral radiographs used in the preceding assignments (third assignment). After data collection for all assignments was complete, I.O. revealed his classification of the 28 cases. The “correct answer” for each case was determined by the majority decision in the third assignment (with SBRs) among the 15 readers and I.O.
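
The majority decision used to define the reference grading, and the concordance rate reported later for individual cases, can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name and the example grades are hypothetical, and the text does not state how ties were resolved, so the tie behaviour here (the label seen first wins) is an assumption.

```python
from collections import Counter


def majority_decision(grades):
    """Return (majority label, concordance rate) for one case.

    grades: list of subtype labels, one per reader.
    Concordance rate = share of readers who chose the majority label.
    Assumption: on a tie, the label encountered first is returned.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    label, votes = Counter(grades).most_common(1)[0]
    return label, votes / len(grades)


# Hypothetical case graded by 16 readers (15 readers plus the case selector).
example_grades = ["1A1"] * 10 + ["1A2"] * 6
label, concordance = majority_decision(example_grades)
print(label, f"{concordance:.1%}")
```

With these hypothetical grades, ten of sixteen readers agree, so the majority label is 1A1 with a concordance rate of 62.5%.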

Statistical analysis

Intra- and inter-rater reliability of the first and second readings for each type and subtype grading were determined by calculating Cohen’s and Fleiss’ kappa coefficients, respectively. The results were compared between elaborators and non-elaborators of the classification. Subsequently, inter-rater reliability of the second and third assignments was compared by calculating Fleiss’ kappa coefficients. The strength of agreement and reliability was interpreted as follows: kappa value from 0.8 to 1 (almost perfect), 0.6 to 0.8 (substantial), 0.4 to 0.6 (moderate), 0.2 to 0.4 (fair), and under 0.2 (slight) [17]. Then, the majority decision of the classification for each reading case among the 15 readers was determined both for the assignments without SBRs (first and second) and for that with SBRs (third). The concordance rate for the majority decision among the readers was compared to assess the contribution of SBRs. The cases with a low concordance rate were examined in detail. All statistical analyses were conducted using SPSS (version 25.0, IBM Corp., USA).
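
For readers unfamiliar with the statistic, the following sketch shows how Fleiss’ kappa can be computed from raw gradings and mapped onto the agreement bands listed above. It is not the SPSS procedure used in the study; the function names and the input layout (one list of labels per case) are hypothetical, and assigning a boundary value such as 0.8 to the higher band is an assumption, since the stated ranges share their end points.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of labels from the same number of raters."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_case_agreement = 0.0
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this case.
        per_case_agreement += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))

    observed = per_case_agreement / n_cases
    total = n_cases * n_raters
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - expected) / (1.0 - expected)


def agreement_strength(kappa):
    """Map a kappa value onto the bands quoted in the text (boundaries go to the higher band)."""
    if kappa >= 0.8:
        return "almost perfect"
    if kappa >= 0.6:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    if kappa >= 0.2:
        return "fair"
    return "slight"
```

As a check on the bands, `agreement_strength(0.52)` returns `"moderate"` and `agreement_strength(0.91)` returns `"almost perfect"`.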

Results

Distribution of reading cases

The 28 cases comprised 7 cases of type 1A1, 4 cases each of types 1A2 and 2B, 2 cases each of types 1B and 2A1, and 9 cases of type 2A2, according to “the correct answer”. The radiographic parameters were as follows: coronal Cobb angle of the lumbar curve, 42.0 ± 23.6°; Cobb angle of the thoracic curve, 24.9 ± 16.2°; C7-CSVL, 64.6 ± 28.1 mm. All of the patients had intolerable back and/or leg symptoms and required deformity correction surgery. Eight of them had previously undergone fusion surgery.

Intra-rater agreements and reliability

Intra-rater agreements and reliability between the first and second assignments averaged 0.95 (almost perfect; range from 0.79 to 1.00) for main curve types, 0.86 (almost perfect; 0.72 to 1.00) for subtypes with the first modifier, and 0.73 (substantial; 0.55 to 0.95) for subtypes with two modifiers (Table 1). Almost all the readers achieved almost perfect or substantial intra-rater agreements. No difference was noted in agreements between elaborators and non-elaborators for the main curve types (0.95 vs 0.95), subtypes with the first modifier (0.85 vs 0.86), or subtypes with two modifiers (0.70 vs 0.74).

Table 1 Intra-rater agreements and reliability between first and second assignments for each reader
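
As an illustration of how a per-reader intra-rater value of this kind is obtained, the sketch below computes Cohen’s kappa between one reader’s first and second gradings. It is a generic textbook formulation, not the SPSS routine used in the study, and the function name and the four-case example are hypothetical.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa between two gradings of the same cases (same order)."""
    if len(first) != len(second) or not first:
        raise ValueError("both readings must cover the same, non-empty set of cases")
    n = len(first)
    # Observed agreement: share of cases graded identically in both readings.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal label frequencies of each reading.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readings use a single label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical reader who changes one of four gradings between readings.
first_reading = ["1A1", "1A1", "1A2", "2B"]
second_reading = ["1A1", "1A2", "1A2", "2B"]
print(round(cohen_kappa(first_reading, second_reading), 2))
```

In this hypothetical example the observed agreement is 0.75 and the chance agreement is 0.3125, giving a kappa of about 0.64.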

Inter-rater agreements and reliability

Inter-rater reliability among readers averaged for the first and second assignments was calculated as 0.91 (almost perfect) for main curve types, 0.75 (substantial) for subtypes with the first modifier, and 0.52 (moderate) for subtypes with two modifiers (Table 2). No difference was noted in agreements between those among elaborators and those among non-elaborators in main curve types (0.96 vs 0.90), subtypes with first modifier (0.76 vs 0.74), and with two modifiers (0.52 vs 0.51).

Table 2 Inter-rater agreements and reliability among fifteen readers

Contribution of SBRs for reliability

In the third assignment, with additional SBRs, inter-rater reliability was 0.88 (almost perfect) for main curve types, 0.73 (substantial) for subtypes with the first modifier, and 0.53 (moderate) for subtypes with two modifiers (Table 2). All readers altered their grading from the second assignment after the inclusion of SBRs, in two to thirteen cases each (mean, 6.9 cases; 24.5%).

Contribution of SBRs for the decision of classification

After the inclusion of SBRs, the concordance rate for subtypes with two modifiers increased by more than 20% in two cases compared with the decision without SBRs. Both cases were graded as 1A2. On the other hand, the majority decision changed in six cases after reference to SBRs. Of those, three cases that had been labelled as 1A2 with a concordance rate of 63 to 70% changed to 1A1 with 60 to 73% agreement. One other case, which had been labelled as 2A1 with a concordance rate of 90%, changed to 2A2.

Low concordance cases

Even after the inclusion of SBRs, the concordance rate was not high enough in some cases. In the classification of main curve type, two cases had less than 75% concordance. They had a coronal lumbar curve and a contralateral lumbothoracic curve of similar degree (Fig. 3a). If readers considered the lumbar curve as the ‘main curve’, these cases were classified as concave CM (Type 1). In contrast, if the lumbothoracic curve was considered ‘main’, these cases had a convex CM (Type 2). In the classification of subtypes with the first modifier, two cases had less than 60% concordance. They had a coronal lumbosacral curve and a contralateral lumbar curve of similar degree (Fig. 3b). The readers were less decisive as to whether the ‘main treatment’ would be lumbosacral (Type 2B) or lumbar (Type 2A). In the classification of subtypes with the second modifier, concordance in two cases decreased by 40% compared with the classification with the first modifier. These two cases showed a large lumbar curve that was only slightly flexible; the curve remained despite bending to the contralateral side (Fig. 3c).

Fig. 3

Low concordance cases: a coronal lumbar curve with a contralateral lumbothoracic curve of similar degree; b coronal lumbosacral curve with a contralateral lumbar curve of similar degree; c large lumbar curve with incomplete flexibility. The curve remains despite bending to the contralateral side

Discussion

In this study, we demonstrated that there were considerable intra- and inter-rater agreements and reliability in the Obeid-CM classification in ASD with reference to full-length standing anteroposterior and lateral radiographs. Even non-specialist deformity surgeons could grade the patients with the same reliability as the elaborators of the classification. After the inclusion of SBRs, inter-rater reliability of the classification did not improve; however, SBRs contributed to altering the decision or increasing the concordance rate of the decision in certain cases.

The previously proposed version of the Scoliosis Research Society (SRS) classification of ASD paid attention to coronal balance as part of the global balance modifiers; a coronal C7-CSVL distance of greater than 3 cm was considered malalignment [18]. The widely used SRS-Schwab ASD classification succeeded and simplified the previous version, and it requires division into four groups for the coronal plane according to the main curve: thoracic only, lumbar only, double curve, and no major coronal deformity [6]. CM appears to have been regarded as less significant than sagittal malalignment. One of the reasons might be the reports that emphasized the importance of considering and restoring sagittal alignment in ASD surgery [2, 4, 5]. However, some studies have also demonstrated that global coronal balance is an independent or important factor related to HRQoL parameters in ASD [8, 19]. Moreover, insufficient improvement in CM at correction surgery is associated with poor postoperative clinical outcome [20]. Bao et al. proposed a division between patients with a trunk shifted to the convex side and those with a trunk shifted to the concave side [10]. Zhang et al. reported a similar system [11]. These classifications alerted surgeons to the risk of deterioration of coronal balance and poorer outcome following ASD surgery in certain cases. However, CM patients, especially those with a lumbothoracic or lumbosacral main curve, have not always fitted well into these classifications or been provided with adequate surgical planning. To connect each step of classification with surgical strategy, we took part in proposing the Obeid-CM classification [15].

The first step of this classification is to divide the coronal deformities into concave CM (type 1) and convex CM (type 2). This concept of division is included in previously reported systems [10, 11]. The intra- and inter-rater agreements and reliability of this step were almost perfect. Unlike the previously reported systems, the Obeid-CM classification requires division of cases into subtypes. Almost perfect intra-rater and substantial inter-rater agreements and reliability were shown in the second step, with division into four subtypes with the first modifier (Types 1A, 1B, 2A, 2B). This step allows surgeons to identify the levels for correction and recognize the cases that require specialized correctional maneuvers such as L5 PSO for type 2B [21]. The amount of CM, such as the C7-CSVL distance, was not included in this step; the amount of correction needed might be too complex to determine only from the classification of coronal alignment. In the third step, dividing into a total of six subtypes with two modifiers, substantial intra-rater and moderate inter-rater agreements and reliability were obtained. Although these results were comparable to those of other classification systems related to spinal deformity [22, 23], the difference in reliability between the second step and the last step reflected the discrepancies between the readers in evaluating the stiffness of the main curve or lumbosacral junction.

Thereafter, we evaluated whether additional SBRs diminish those discrepancies between the readers compared with grading without them. The authors stated in a previous paper that SBRs are required to evaluate the flexibility of the CM in all ASD cases [15]. We had hypothesized that SBRs help us better understand coronal plane deformity in ASD, as they do in idiopathic scoliosis [16], and contribute to concordance in the grading. This study revealed that they do not improve inter-rater reliability. Although the readers altered their grading in 25% of cases after the inclusion of bending radiographs (between the second and third readings), it should be noted that grading was altered in 22% of cases between the first and second readings, when the readers did not reference the SBRs. However, an increase in concordance, or a change in the majority decision with subsequent concordance, was found in certain cases graded as 1A or 2A. The results indicate that surgeons should consider the necessity of SBRs on a case-by-case basis. The cases graded as type A in the first modifier might benefit from SBRs despite the radiation exposure and associated costs.

Not all of the study cases showed high concordance in this classification. Discussion of low concordance cases is important to improve the classification system. We found that different decisions on the main coronal curve caused low concordance in cases with a lumbar curve and a contralateral coronal curve of the same degree (Fig. 3a, b). Thus, the main coronal curve might be better defined clearly as the one that causes the deviation of the T1 vertebra away from the CSVL. Another cause of low concordance was a large lumbar curve with limited flexibility (Fig. 3c). In the subanalysis, all cases graded 1 for the second modifier with high concordance among readers (≥ 80%) had flexible curves on SBRs, where the correction ratio in lumbar Cobb angle by bending was ≥ 50%. In contrast, all the cases graded 2 for the second modifier with high concordance had rigid curves, where the correction ratio was less than 25%. The cases with a correction ratio between 25 and 50% on SBRs showed low concordance. From this point of view, a correction ratio of 25 or 50% might be an option for the border between second modifiers 1 and 2 to improve the classification system. However, verification and modification according to surgical decision making and postoperative results will be needed. Thus, we have not included those statements as guidance at present.
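
The subanalysis above can be expressed numerically. The sketch below assumes that the correction ratio is the relative reduction in lumbar Cobb angle from the standing to the side bending radiograph, a common definition that the text does not spell out; the function names and the example angles are hypothetical. The 25% and 50% cut-offs are the candidate borders discussed above, not adopted criteria.

```python
def correction_ratio(standing_cobb_deg, bending_cobb_deg):
    """Relative reduction in Cobb angle on side bending (assumed definition)."""
    if standing_cobb_deg <= 0:
        raise ValueError("standing Cobb angle must be positive")
    return (standing_cobb_deg - bending_cobb_deg) / standing_cobb_deg


def subanalysis_band(ratio):
    """Place a correction ratio in the three ranges observed in the subanalysis."""
    if ratio >= 0.50:
        return "flexible"      # range in which high-concordance cases were graded 1
    if ratio < 0.25:
        return "rigid"         # range in which high-concordance cases were graded 2
    return "intermediate"      # 25 to 50%: range in which concordance was low


# Hypothetical lumbar curve: 40 degrees standing, 26 degrees on bending.
ratio = correction_ratio(40.0, 26.0)
print(f"{ratio:.0%}", subanalysis_band(ratio))
```

For this hypothetical curve the correction ratio is 35%, which falls in the intermediate range where readers disagreed most.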

The strength of the present study was the inclusion of both elaborators of the classification system and non-elaborators as readers. We found that non-elaborators were able to grade almost the same as elaborators. Surgical correction of CM may require three-column osteotomy such as asymmetrical PSO [24, 25]. In every case with CM, the Obeid-CM classification indicates to deformity surgeons whether such technically demanding procedures may be required.

This study has several limitations. First, the “true” classification of the 28 reading cases was hard to determine even for the main elaborator of the Obeid-CM classification (I.O.). Thus, “the correct answer” for the reading cases was defined according to the majority decision in the third assignment. Second, we found that the classification has considerable intra- and inter-rater agreements and reliability. However, in terms of ASD care, it remains to be seen whether this classification system contributes to improving patients’ HRQoL and reducing the complication rate. In addition, we have not clarified which types or subtypes of CM are likely to deteriorate clinically and radiographically during conservative treatment, and which types are likely to require surgical correction. On the basis of the results of this study, further investigation will be planned to assess them.

In conclusion, there were adequate intra- and moderate inter-rater agreements and reliability for the Obeid-CM classification in ASD. The classification provides possible surgical strategies to treat a case with coronal malalignment, and might help not only deformity expert surgeons but also non-expert surgeons better understand coronal malalignment in ASD.