Introduction

With industrialization, modernization of the transport and construction industries, and the evolution of sports, the incidence of traumatic injuries is growing. Globally, spinal injuries constitute a significant proportion of traumatic musculoskeletal injuries. Evidence suggests that 75–90% of spinal fractures occur in the thoracic and lumbar regions, most commonly involving the thoracolumbar junction (T10–L2) [1–3]. To promote communication between physicians, guide treatment decisions, improve patient outcomes, and further research, several thoracolumbar spine injury classification systems have been proposed. However, none of them is universally accepted or has attained widespread clinical use. Therefore, in 2013, the AOSpine Spinal Cord Injury and Trauma Knowledge Forum proposed a new AOSpine thoracolumbar spine injury classification system [4]. This classification is based on the evaluation of three basic parameters: (1) morphologic classification of the fracture; (2) neurological status; and (3) clinical modifiers.

A spinal fracture classification that is universally adopted should be comprehensive, clinically relevant, and demonstrate adequate reliability and reproducibility. Previously proposed thoracolumbar spine fracture classification systems, such as the Denis classification system [5], are not comprehensive and have low reliability and reproducibility [6]. The Magerl classification system is comprehensive [7], but it is too complicated to achieve universal acceptance in clinical practice. The Thoracolumbar Injury Classification System (TLICS) [8], proposed by the American spinal injury study group in 2005, requires magnetic resonance imaging (MRI) to demonstrate compromise of the posterior ligamentous complex, restricting its use in acute trauma settings and developing countries [9, 10]. In addition, this classification system has poor reliability in identifying injury to the posterior ligamentous complex [9].

The new AOSpine thoracolumbar spine injury classification system is the most recent thoracolumbar spine fracture classification system. It includes the merits of the multiple classification systems that are available in the literature and refines them. The AOSpine thoracolumbar spine injury classification system is based on a computed tomographic (CT) scan rather than magnetic resonance imaging. As a revision of the original Magerl AO classification system, it simplifies morphologic classification of the fracture, includes an evaluation of neurological deficit, and accounts for the presence or absence of important medical conditions that may affect treatment decisions. The AOSpine thoracolumbar spine injury classification system has good interobserver reliability and intraobserver reproducibility [4, 11, 12]. Kepler et al. developed a spine injury score for the AOSpine thoracolumbar spine injury classification system (TL AOSIS), and confirmed that the system was ideal for the establishment of a globally accepted treatment algorithm for thoracolumbar trauma [13].

Currently, reports on the reliability and reproducibility of the AOSpine thoracolumbar spine injury classification system by Chinese spinal surgeons are scarce. The objective of this study was to determine if the AOSpine thoracolumbar spine injury classification system can be reliably applied by Chinese orthopedic surgeons with different levels of experience in spinal trauma.

Methods

Patient population

This retrospective study included patients with acute, traumatic thoracolumbar spinal injuries treated at XXX Hospital between January 2015 and October 2015. Patient records were obtained from the hospital database. This study was approved by the institutional review board.

Patients were included if they had: (1) acute, traumatic thoracolumbar spinal injuries and (2) complete clinical records with imaging. Patients were excluded if they had nontraumatic thoracolumbar fractures, including pathological fractures (i.e., fractures associated with spinal tumors and infections) and osteoporotic fractures. A consecutive series could not be used for this study, as a broad spectrum of spinal injuries was analyzed.

Procedure

Anteroposterior and lateral radiographs, as well as CT scans (axial images, sagittal reconstructions, and coronal reconstructions), were rated by six orthopedic surgeons who were divided into two groups according to their level of training in spinal trauma. Group A included three orthopedic surgeons who had 2 years of clinical experience in spinal trauma. Group B included three orthopedic surgeons who were postgraduates with 1 year of clinical experience in spinal trauma. The training of orthopedic surgeons in China includes 5 years of undergraduate study in the medical sciences to obtain a Bachelor’s degree and ≥6 years of graduate study to specialize and obtain a Master’s degree and doctorate. After obtaining a Bachelor’s degree and passing the National Medical Licensing Examination, it is possible to practice medicine in a hospital. In the current study, Group A included surgeons who had obtained a Master’s degree and had practiced in spinal trauma for 2 years. Group B included surgeons who were first-year postgraduate students applying to take a Master’s degree in orthopedic (spine) surgery.

Cases were graded on two different occasions, one month apart. On the second occasion, the order of the cases was scrambled using a random number generator to avoid recall bias. When multiple injuries were present, the level of injury to be graded was designated. For A-type injuries, to ensure that the raters were assessing the same injury, only cases with single vertebral body injury (disregarding B and C coding) were included. For B-type and C-type injuries, only the most severe injury was considered; however, concurrent A-type or B-type injuries at the same level were graded.
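The reordering step for the second grading occasion can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study; the function name `scramble_case_order`, the seed value, and the integer case identifiers are assumptions made for the example.

```python
import random


def scramble_case_order(case_ids, seed):
    """Return a new list with the cases in a pseudo-random order.

    A fixed seed makes the second-round order reproducible for auditing.
    """
    rng = random.Random(seed)   # independent generator; global state is untouched
    shuffled = list(case_ids)   # copy, so the first-round order is preserved
    rng.shuffle(shuffled)       # in-place shuffle of the copy
    return shuffled


# Hypothetical identifiers 1..109, one per case in the series.
first_round = list(range(1, 110))
second_round = scramble_case_order(first_round, seed=2015)
```

Every case appears exactly once in `second_round`, so each rater grades the same set of cases on both occasions and only the presentation order changes.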

Statistical analysis

Statistical analysis was conducted using SPSS version 22. The Kappa coefficient (κ) was used to assess the interobserver reliability and intraobserver reproducibility of the classification system for the most severe injury type (i.e., A, B, or C) and subtypes for A-type and B-type injuries. Kappa coefficients were interpreted according to Landis and Koch [14], whereby κ values of 0.00–0.20 were defined as slight agreement or reproducibility; 0.21–0.40 were defined as fair agreement or reproducibility; 0.41–0.60 were defined as moderate agreement or reproducibility; 0.61–0.80 were defined as substantial agreement or reproducibility; and 0.81–1.00 were defined as almost perfect agreement or reproducibility. ANOVA was used to evaluate differences between the orthopedic surgeons in Group A and Group B. Statistical significance was set at P < 0.05.
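To make the agreement statistic concrete, the sketch below computes a two-rater Cohen's kappa and maps a κ value onto the Landis and Koch bands listed above. It is illustrative only: the analysis in this study was run in SPSS, and the text does not state which kappa variant was used or how the six raters were pooled, so the choice of Cohen's kappa is an assumption. The function names and the ten example ratings are hypothetical, and the "poor" label for negative values comes from Landis and Koch rather than from the bands quoted here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch(kappa):
    """Map a kappa value onto the Landis and Koch agreement bands."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical injury types (A, B, or C) assigned by two raters to ten cases.
rater_1 = ["A", "A", "A", "B", "B", "C", "A", "B", "C", "A"]
rater_2 = ["A", "A", "B", "B", "A", "C", "A", "B", "C", "B"]

kappa = cohen_kappa(rater_1, rater_2)
band = landis_koch(kappa)
```

In this toy example the raters agree on 7 of 10 cases, but chance alone would produce agreement on 36% of cases given their label frequencies, so κ is well below the raw agreement of 0.70.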

Results

This study included an analysis of 109 cases of acute, traumatic thoracolumbar spinal injuries (Table 1).

Table 1 Distribution of thoracolumbar injuries

Interobserver reliability

The overall Kappa coefficient for all cases was 0.362, which represents fair reliability. The Kappa statistic was 0.385 for A-type injuries, 0.292 for B-type injuries, and 0.552 for C-type injuries (Table 2). These values represent fair reliability for A- and B-type injuries and moderate reliability for C-type injuries. Kappa coefficients by fracture subtypes are shown in Table 2.

Table 2 Interobserver reliability by injury type/subtype

Intraobserver reproducibility

The Kappa coefficient for intraobserver reproducibility was 0.442 for A-type injuries, 0.485 for B-type injuries, and 0.412 for C-type injuries. These values represent moderate reproducibility for all injury types. Kappa coefficients by fracture subtypes are shown in Table 3.

Table 3 Intraobserver reliability by injury type/subtype

Between-group comparison

Interobserver reliability was significantly better in Group A than in Group B (P < 0.05) (Table 4). There were no significant between-group differences in intraobserver reproducibility (P > 0.05).

Table 4 Between-group differences in interobserver agreement (κ)

Discussion

In the current study, we retrospectively reviewed 109 patients with acute thoracolumbar spine fracture using the new AOSpine thoracolumbar spine injury classification system. It is known that morphologic classification of a spinal fracture is an important but challenging parameter to evaluate. In practice, not all clinical assessments and pre-operative plans are conducted by the most experienced surgeons. In fact, a well-designed classification system must show adequate reliability and reproducibility in residents as well as attending doctors. Therefore, we investigated whether the AOSpine thoracolumbar spine injury classification system can be reliably applied by Chinese orthopedic surgeons with different levels of experience in spinal trauma.

A spinal injury classification system should be comprehensive enough to include different patterns of spinal trauma and should demonstrate adequate reliability and reproducibility [15]. Previous classifications, including the Denis classification [5, 6], the Magerl classification [6, 16–18], and the TLICS [8], have shown low interobserver reliability, making their widespread adoption difficult. Our results demonstrated fair interobserver reliability for the morphologic grading of fracture type in the new AO spine injury classification system, including fair interobserver reliability for A-type and B-type injuries and moderate interobserver reliability for C-type injuries. Interobserver reliability was lower in the current study than that reported by the group of surgeons who developed the classification system, but it is usually difficult for independent studies to duplicate the reliability of classification systems as originally reported [16, 17]. The interobserver reliability of the new AO spine injury classification system for the morphologic grading of fracture type and subtype in the current study was also lower than previously reported by Urrutia [15] and Kepler [11]. The raters in the latter studies were more experienced in spinal fractures than the raters in the current study, and they may have previously used the Magerl classification or the TLICS classification in clinical practice, which could explain our discrepant results.

In accordance with the findings in the current study, Vaccaro [4] and Urrutia [15] found lower interobserver agreement when classifying B-type injuries compared to A-type or C-type injuries. This observation confirms that accurately evaluating the posterior or anterior tension band is challenging, as was previously reported for the Magerl [19, 20] and TLICS classification systems [9, 10, 21]. The new AO spine injury classification system uses CT scans to evaluate morphologic classification of the fracture. Therefore, diagnosis of posterior ligamentous complex damage relies on clinical examination, X-ray, and CT scan. X-ray and CT scan are considered useful in the diagnosis of bone injury [22, 23]. However, it has been suggested that X-ray and CT scan have limited utility in the diagnosis of ligamentous injuries [24]. As reported by the Spine Trauma Study Group [25], certain indirect factors indicate the presence of complex lesions, including vertebral translation, interspinous space greater than in adjacent levels (over 2 mm according to Daffner [26]), facet joint diastasis seen in CT scan, local kyphosis without vertebral injury (over 20° according to Nagel [27]), facet joint diastasis seen in X-ray, palpable interspinous gap, spinous avulsion, and vertebral compression exceeding 50% without lesion of the posterior wall. Rajasekaran et al. confirmed that CT was necessary for accurate classification of all injuries based on the new AOSpine thoracolumbar classification system; compared to MRI, no significant difference was found in terms of assessment of fracture stability or management, with the exception of improved identification of B2 fractures [28]. Therefore, it is necessary for spinal surgeons to be trained in the new AO spine injury classification system and to have some clinical experience when evaluating the integrity of the posterior wall of the vertebral body and the posterior or anterior tension band by X-ray and CT scan.

In the current study, the observers were divided into two groups according to clinical experience: Group A (higher level of experience) and Group B (lower level of experience). It has been suggested that a spinal surgeon’s level of experience does not substantially influence the application of a classification system or interobserver reliability [4, 15]. In contrast, our results suggest that interobserver reliability was significantly better in Group A than in Group B. In accordance with Sadiqi [29], significant differences were not observed between Group A and Group B in intraobserver reproducibility. Group A achieved interobserver reliability >0.55, which may be considered the minimal level for a classification system to be useful [30]. These data suggest that the new AO spine injury classification system may be applied in day-to-day clinical practice in China following extensive training of healthcare providers; however, this classification system is associated with a steep learning curve.

Vaccaro [8] recognized the limitations of MRI, namely, the relatively poor reliability associated with the identification of posterior ligamentous complex injuries, and acknowledged that a classification system heavily dependent on MRI would be unlikely to gain widespread use in the developing world. Guen Young Lee [31] demonstrated that the reliability of MRI for assessing posterior ligamentous complex integrity according to the TLICS was fair to moderate (κ = 0.440 for the first and 0.389 for the second review). However, MRI shows higher sensitivity, specificity, and accuracy than CT in distinguishing ligamentous lesions [32, 33] and may reduce the risk of failure to diagnose a posterior ligamentous complex injury and associated late deformity [17, 19]. Furthermore, MRI can be useful for diagnosing subtle posterior ligamentous complex injuries, particularly in situations where fracture displacement on presentation is not representative of maximal displacement at the time of injury. In addition, MRI is often helpful in determining the location and severity of neurological compromise and identifying injury to non-bony structures. Evidence suggests that posterior ligamentous complex integrity plays an important role in fracture stability [34]. In addition, evaluation of neurological status is critical for a complete assessment of a patient’s functional status and eventual prognosis, and is an important factor when making decisions about the need for surgery. Therefore, completing an MRI examination helps young surgeons to establish definitive diagnoses and make appropriate therapeutic decisions for patients with acute spinal injuries.

This study had several limitations. First, we only investigated the reliability and reproducibility of the morphologic classification of the fracture using the new AO spine classification system. Second, our observers were all young orthopedic surgeons; a study including surgeons with a greater level of experience would be informative. Third, we did not verify the presence of a posterior ligamentous complex injury by MRI or surgically. Finally, our study was a preliminary retrospective study based on radiology. To identify fracture types that should be managed conservatively or with surgery and to minimize the limitations associated with the current study, prospective randomized controlled trials in different healthcare providers and clinical settings are warranted.

Conclusions

It is well known that a universally accepted classification system should be reliable and reproducible, have prognostic implications, predict probabilities of complications, and guide treatment decision making [35]. In the current study, our results showed relatively low overall interobserver reliability and intraobserver reproducibility and demonstrated that the spinal surgeon’s level of experience does substantially influence the classification and interobserver reliability of the new AOSpine thoracolumbar spine injury classification system in young Chinese spinal surgeons. However, among the more experienced surgeons in this study, the new AO spine injury classification system surpassed the minimal level for a classification scheme to be considered useful. These data suggest that the new AO spine injury classification system may be applied in day-to-day clinical practice in China following extensive training of healthcare providers.