Introduction

In 2013, Vaccaro et al. [1] published the AOSpine thoracolumbar spine injury classification system, which was designed to incorporate critical elements of both the Magerl classification system and the thoracolumbar injury classification system (TLICS) [2, 3]. The ultimate goal of the new AOSpine thoracolumbar spine injury classification system was to allow for the development of a globally accepted treatment algorithm that would provide treatment recommendations for a wide variety of thoracolumbar injuries; however, in an effort to prevent the mistakes of previous classification systems, the classification was published first, followed by numerous studies designed to identify any possible problems that would prevent the global acceptance of an associated treatment algorithm.

The AOSpine thoracolumbar spine injury classification system separates fractures into three major types: type A (compression injuries), type B (tension band injuries), and type C (translational injuries). Type A and B injuries are further subdivided into five and three subtypes, respectively (Table 1). Next, the neurologic status of the patient is evaluated and classified: N0, neurologically intact patient; N1, resolved transient neurological symptoms; N2, persistent radicular symptoms; N3, incomplete spinal cord injury or cauda equina injury; N4, complete spinal cord injury; and NX, neurologic exam is unobtainable. Lastly, the patient is evaluated for patient-specific modifiers. M1 is assigned to compression-type injuries in which the status of the posterior ligamentous complex is unclear, and M2 is assigned to any patient in whom patient-specific morbidities, such as ankylosing spondylitis or polytrauma, affect the treatment algorithm [1].

Table 1 The AOSpine thoracolumbar spine injury classification system

Once the classification was published, the first step in the development of the treatment algorithm was establishing the inter- and intraobserver reliability of the system. Vaccaro et al. [1] reported substantial reliability in identifying type A (κ = 0.72) and type C injuries (κ = 0.70), and moderate reliability for type B injuries (κ = 0.58) among members of the AOSpine Classification Group. Similarly, Kepler et al. [4] reported moderate overall reliability (κ = 0.56) in 100 spine surgeons from around the world with no previous knowledge of the classification system, and substantial agreement for type A (κ = 0.80), type B (κ = 0.68) and type C injuries (κ = 0.72).

Next, a series of studies was performed to determine whether the surgical algorithm could be applied globally or whether, given the regional variations in the treatment of thoracolumbar trauma, a regional treatment algorithm would be needed. Schroeder et al. [5] identified the global severity of each variable in the classification, and found no regional or experiential variability. In a follow-up study, no regional variability in the ability to correctly classify type A thoracolumbar injuries was identified, and in a final study, Schroeder et al. [6] demonstrated no regional difference in the ability to identify an injury to the posterior ligamentous complex (PLC). However, although the ability to identify a PLC injury did not vary by region, the authors reported only slight (κ = 0.11) interobserver reliability in determining the integrity of the PLC [7].

Utilizing the results of the aforementioned studies, Kepler et al. [8] published the thoracolumbar AOSpine Injury score (TL AOSIS) (Table 2), which assigned integer values to each variable of the AOSpine thoracolumbar spine injury classification system. The goal of the current study is to determine the surgical threshold for the TL AOSIS.

Table 2 The thoracolumbar AOSpine injury score (TL AOSIS)

Methods

A modified form of the Delphi method was used to establish the surgical algorithm for the treatment of thoracolumbar trauma [9]. The AOSpine Trauma Knowledge Forum designed an initial survey and sent it to a worldwide group of spine surgeons. The AOSpine Trauma Knowledge Forum then interpreted and summarized the results of the survey. Using the results of each survey, another survey was designed, which was again sent to the larger group. In this way, the biases of any single individual were negated, while the AOSpine Trauma Knowledge Forum could still guide the larger global spine community. The results of the previous surveys were reported in detail in multiple previous publications [4–8], and the results of a final survey were used to develop the surgical algorithm to accompany the TL AOSIS.

A final survey was sent to all AOSpine members from the six AO regions of the world (North America, South America, Europe, Africa, Asia, and the Middle East). The survey asked surgeons if a patient should undergo an initial trial of conservative management, or if surgical management was warranted. The survey covered a broad spectrum of injuries, including those that have historically created the greatest therapeutic controversy: A2N0M0, every controversial iteration of A3 (A3N0M0, A3N0M1, A3N1M0, A3N1M1, A3N2M0, and A3N2M1), A4 (A4N0M0, A4N0M1, A4N1M0, A4N1M1, A4N2M0, and A4N2M1), B1 (B1N0, B1N1, and B1N2), and B2 (B2N0, B2N1, and B2N2). By definition, an A2 fracture cannot lead to neurologic compromise or involve the posterior ligamentous complex, so only A2N0M0 was included. Similarly, type B fractures all involve a disruption of the tension band, so the M1 modifier is not relevant to these fractures. Fractures that were not controversial, such as all type C fractures, were excluded, since these are acknowledged to be unstable. Similarly, the survey did not include incomplete or complete spinal cord injuries, as the literature is consistent that, with relatively few exceptions, these patients undergo surgical intervention throughout the world where resources exist [10]. To eliminate the possibility of misinterpretation of imaging, no imaging studies were presented. Instead, a written description of the injury as well as the AOSpine thoracolumbar spine injury classification was given (Fig. 1). Utilizing the results of the survey, as well as incorporating information from the previous surveys and limited input from the AOSpine Trauma Knowledge Forum, surgical thresholds were established.
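The set of surveyed injuries can be reproduced by enumeration, which also serves as a check that the combinations described above add up to the 19 controversial fractures reported in the Results. The following is only an illustrative sketch; the function name `survey_injury_codes` is hypothetical and not part of the study.

```python
from itertools import product


def survey_injury_codes():
    """Enumerate the injury combinations described in the Methods text."""
    # A2 can only appear as N0 and M0 (see the definition in the text).
    codes = ["A2N0M0"]
    # A3 and A4: neurologic status N0 to N2 crossed with modifier M0 or M1.
    for morphology, n, m in product(("A3", "A4"), range(3), range(2)):
        codes.append(f"{morphology}N{n}M{m}")
    # B1 and B2: neurologic status N0 to N2; the M1 modifier does not apply.
    for morphology, n in product(("B1", "B2"), range(3)):
        codes.append(f"{morphology}N{n}")
    return codes


codes = survey_injury_codes()
print(len(codes))  # 19
```

The count is 1 (A2) + 12 (A3 and A4) + 6 (B1 and B2) = 19.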

Fig. 1 Example of the questions asked in the survey

Statistical analysis

Absolute numbers and frequencies were used to describe the distribution of participating surgeons according to AO region, experience, and sub-specialty. To investigate the relationship between AO region and the surgeons’ initial treatment recommendation for controversial thoracolumbar fractures, Chi-square and Fisher’s exact tests were used as appropriate. The threshold for statistical significance was set at P < 0.05. The analysis was performed using SAS version 9.2 (SAS Institute, Cary, NC).
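For readers unfamiliar with the test, the Pearson Chi-square statistic for a region-by-recommendation table can be sketched in a few lines. This is not the SAS analysis used in the study; it is a plain Python illustration, the function name `chi_square_statistic` is hypothetical, and the counts in the usage example are invented solely to show the shape of the input. The sketch assumes every row and column total is non-zero.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows, e.g. one row per AO region with the columns
    [recommend surgery, recommend non-operative care].
    Assumes all row and column totals are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of region and recommendation.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts for six regions (NOT data from the survey).
hypothetical = [[30, 50], [40, 20], [60, 90], [10, 15], [45, 70], [20, 33]]
stat, dof = chi_square_statistic(hypothetical)
# With 5 degrees of freedom, the 0.05 critical value is about 11.07.
print(dof, stat > 11.07)
```

When expected cell counts are small, the Chi-square approximation becomes unreliable, which is why Fisher’s exact test is used instead in those cases.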

Results

Four hundred and eighty-three surgeons from all six AO regions of the world completed the survey (Table 3). Table 4 shows the percentage of surgeons who would recommend surgical management for each injury, and whether the responses varied by region. Significant regional variability (P < 0.05) was identified in 15 of the 19 controversial fractures. The only four fracture types with worldwide agreement on their treatment were A3N0, A3N0M1, A4N2M1, and B2N1.

Table 3 Demographics of survey respondents
Table 4 The number of surgeons from each region who would recommend surgical intervention for controversial thoracolumbar fractures

Using the results of the survey, the AOSpine Trauma Knowledge Forum determined that injuries in which less than 30 % of surgeons would recommend surgical intervention should undergo a trial of non-operative care, and similarly, injuries in which more than 70 % of surgeons would recommend surgery should undergo surgical intervention. While these values are arbitrary, they were determined by consensus of the AOSpine Trauma Knowledge Forum. Using these thresholds, two controversial fractures would always undergo a trial of non-operative care (A2N0 and A3N0), and twelve fracture types would be recommended to undergo surgical management (A3N1M1, A3N2M1, A4N0M1, A4N1M1, A4N2M0, A4N2M1, B1N0, B1N1, B1N2, B2N0, B2N1, and B2N2).
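The consensus rule described above can be written as a small function. This is a minimal sketch of the 30 % and 70 % cut-offs as stated in the text; the function name and the wording of the returned labels are hypothetical, and the example percentages are illustrative inputs rather than survey results.

```python
def consensus_recommendation(percent_surgery, low=30.0, high=70.0):
    """Map the share of surgeons recommending surgery to a consensus label."""
    if not 0.0 <= percent_surgery <= 100.0:
        raise ValueError("percent_surgery must be between 0 and 100")
    if percent_surgery < low:
        # Fewer than 30 % recommend surgery.
        return "trial of non-operative care"
    if percent_surgery > high:
        # More than 70 % recommend surgery.
        return "surgical intervention"
    # Between the two cut-offs there is no consensus in either direction.
    return "no consensus"


for pct in (15.0, 50.0, 85.0):  # illustrative values only
    print(pct, consensus_recommendation(pct))
```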

Combining the results of the current survey with the TL AOSIS, it was determined that injuries with a TL AOSIS of three or less should undergo a trial of conservative treatment, and injuries with a TL AOSIS of more than five would carry a recommendation for surgical intervention. Operative or non-operative treatment was determined to be equally acceptable for injuries with a TL AOSIS of four or five.
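Expressed as a decision rule, the surgical thresholds are straightforward. The sketch below encodes only the three score bands stated above; the function name `tl_aosis_recommendation` and the label strings are hypothetical, and the score itself is assumed to have been computed from Table 2.

```python
def tl_aosis_recommendation(score):
    """Return the treatment recommendation for a total TL AOSIS."""
    if score < 0:
        raise ValueError("TL AOSIS cannot be negative")
    if score <= 3:
        # Three points or less: initial trial of conservative treatment.
        return "trial of conservative treatment"
    if score <= 5:
        # Four or five points: either approach is acceptable.
        return "operative or non-operative treatment"
    # More than five points: surgery is recommended.
    return "surgical intervention"


for score in (2, 5, 9):
    print(score, tl_aosis_recommendation(score))
```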

Case examples

A neurologically intact patient (N0) with an A2 compression fracture would be awarded two points, and a trial of non-operative management is recommended (Fig. 2). A neurologically intact patient (N0) with a complete burst fracture (A4) would be awarded five points, and operative or non-operative treatment may be equally considered. Lastly, in a patient with a complete burst fracture (A4) and cauda equina syndrome (N3), nine points are awarded and surgical treatment is recommended (Fig. 3).
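The arithmetic behind these case examples can be sketched as a simple sum of component points. The point values below are not copied from Table 2; they are inferred from the three totals quoted in this paragraph under the assumption that N0 contributes zero points (giving A2 = 2, A4 = 5, and N3 = 9 − 5 = 4). The dictionary and function names are hypothetical, and Table 2 remains the authoritative source for all values.

```python
# Component points inferred from the case examples, ASSUMING N0 = 0.
# Consult Table 2 for the full, authoritative TL AOSIS values.
CASE_EXAMPLE_POINTS = {
    "A2": 2,
    "A4": 5,
    "N0": 0,
    "N3": 4,
}


def case_score(morphology, neurology, points=CASE_EXAMPLE_POINTS):
    """Sum the morphology and neurologic-status points for one injury."""
    return points[morphology] + points[neurology]


print(case_score("A2", "N0"))  # 2
print(case_score("A4", "N0"))  # 5
print(case_score("A4", "N3"))  # 9
```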

Fig. 2 A neurologically intact patient with an A2 compression fracture (pincer fracture) is awarded two points, and non-operative treatment is recommended

Fig. 3 A patient with an A4 fracture (complete burst) with cauda equina syndrome is awarded nine points, and surgical treatment is recommended

Discussion

The aim of this study was to develop a surgical algorithm to accompany the AOSpine thoracolumbar spine injury classification system, and we propose that injuries with a TL AOSIS of three or less are treated non-operatively, and injuries with a TL AOSIS of more than five are treated operatively. These thresholds are data driven, as they are determined by the recommendations of 483 spine surgeons worldwide. However, in two cases (A3N1M1 and B1N0), more than 70 % of surgeons recommended operative intervention, but using the TL AOSIS, the cases are only awarded five points, and thus operative or non-operative care is appropriate. The AOSpine Trauma Knowledge Forum felt strongly that an A3N1M1 injury (a single endplate burst fracture with a transient neurologic injury and an indeterminate PLC injury) belonged in the gray zone because of the wide variability of presenting symptoms that can be associated with N1, and because of the inability of the surgeons to agree on the integrity of the PLC. The treatment algorithm may be significantly different for a patient with an A3 fracture with mild splaying of the spinous processes and transient dermatomal numbness, compared to a patient with mild splaying of the spinous processes and transient paralysis. Lastly, over 70 % of the respondents would recommend surgical intervention for a B1 injury (a bony Chance fracture). However, a substantial body of literature has demonstrated that B1 fractures with significant bony apposition are amenable to treatment with extension bracing, and non-operative care may be associated with improved health-related quality of life [11]. In contrast, B1 fractures that cannot be reduced often result in painful non-unions, so initial surgical management may be beneficial [12–15]. Because of this variability, operative and non-operative treatment may be equally appropriate.

Due to the regional treatment variations in the literature, significant research was performed by the AOSpine Trauma Knowledge Forum to determine if a single global treatment algorithm could be proposed, or if a regional interpretation was needed. Analysis of the current results clearly demonstrates that there are regional differences with 15 of the 19 controversial fractures demonstrating significant regional variability (P < 0.05). However, while these differences are statistically significant, the clinical impact is less apparent. The new algorithm recommends conservative care for A2N0, and while regional variability in treatment is identified, even in South America, the most surgically inclined region, only 17.2 % of the spine surgeons believe that the surgery is indicated. Similarly, while there is regional variability in the treatment of B2N2 fractures, 81.3 % of surgeons from Africa, the least surgically inclined region, recommend surgical management. Furthermore, six of the injuries with regional treatment variability are awarded four or five points, and so either operative or non-operative care is recommended.

The regional variation in the treatment of type A fractures was the main reason that a regional threshold was considered; however, the classification of type A fractures into four subcategories allows for a meaningful distinction in treatment recommendations that is not available in the TLICS, and yet is not as onerous as it would have been had we attempted to guide treatment for the more than 20 compression variants in the Magerl system [2, 3]. The results of the current study demonstrate the importance of these subclassifications by demonstrating that there is global acceptance that compression fractures not involving the posterior wall (i.e., A1 and A2 fractures) should be treated initially with non-operative management. Furthermore, there is a global agreement that burst fractures only involving a single endplate in a neurologically intact patient (A3N0M0) should be treated with a trial of non-operative management. These results were somewhat surprising given the recent literature from Europe reporting the results of surgical treatment for thoracolumbar compression and burst fractures [16, 17]. However, it is possible that some of the perceived regional treatment variability is, at least in part, due to the failures of the previous classification systems to offer a useful distinction between burst fractures. For instance, in 2014, Schnake et al. [16] reported on the treatment of burst fractures (Magerl A.3) with combined anterior–posterior stabilization; however, these fractures were not further subclassified into one of the nine subclassifications of burst fractures in the Magerl system. Similarly, when Bailey et al. [18] reported on the treatment of burst fractures with or without a brace, they also did not subclassify the Magerl A.3 fractures. The current study, as well as the previously published injury severity score [5], clearly establishes a global acceptance that A3 and A4 fractures are distinct injuries which require a different treatment algorithm. The ability of the new AOSpine thoracolumbar spine injury classification system to separate these fractures substantially lessened the need for a regional interpretation of the TL AOSIS.

Because of the failure of the existing literature to separate A3 and A4 fractures, a perceived medical equipoise in treatment of thoracolumbar burst fractures has been reported. However, the results of the current study demonstrate that almost nine out of ten surgeons worldwide believe that incomplete burst fractures (A3N0M0) should be treated with a trial of non-operative care, with similar results even in regions such as Europe, which has been reported to treat more thoracolumbar burst fractures surgically. Nevertheless, using the results of the current study and the available literature, it is impossible to firmly recommend specific treatment for all iterations of A3 and A4 fractures. In an effort to improve our understanding of these fractures, AOSpine has sponsored an ongoing study that will prospectively collect outcome data on patients with A3 and A4 fractures; its findings may require adjustments to the surgical algorithm for these fractures in the future.

While the current study is unable to definitively recommend treatment for all injury types, the surgical algorithm to accompany the AOSpine thoracolumbar spine injury classification system is a significant improvement over the current classifications. With over 50 fracture subtypes, and the failure to formally consider the neurologic status of the patients, there is no globally accepted surgical algorithm based on the Magerl system [2]. Similarly, while the TLICS is a straightforward system, the worldwide adoption has been limited due to the perception that it fosters the treatment biases of its developers, and does not accurately represent the accepted treatment algorithms in many parts of the world [19–21]. The current surgical algorithm is a simple system that formally considers the neurologic status of the patient and is data driven to allow for worldwide acceptance.

This study has significant limitations, including the fact that it was based on a descriptive survey of surgeons. We chose to use a descriptive survey rather than images, because it eliminated any interobserver variability in the interpretation of the images. While good inter- and intraobserver reliability of the AOSpine thoracolumbar spine injury classification system has been demonstrated [4], the current methodology ensured that there was no variability in the interpretation of the case presentations. The most important limitation is that this system fails to definitively recommend treatment for fractures that are awarded four or five points. This limitation is in large part due to the variability in thoracolumbar trauma. If every possible variable were accounted for in the current system, the AOSpine thoracolumbar spine injury classification system would be more complex than the Magerl system, a system with relatively low reliability due to its complexity. Furthermore, AOSpine is actively involved in research designed to help determine the best treatment for many of the injuries that are awarded four or five points, which may eliminate this ambiguity in the future. Additionally, we acknowledge that there is regional variability in the treatment of some injury patterns. However, with the addition of meaningful subclassification, the AOSpine thoracolumbar spine injury classification system has been able to partially mitigate these regional variations, and ongoing prospective research may eliminate some of them through identification of best practice standards of care.

Moreover, while the results of the current study clearly find that patients with an A3N0M0 injury should be treated non-operatively, the study recommends operative or non-operative treatment for A3N0M1 injuries. An M1 modifier indicates that the surgeon is uncertain about the status of the PLC, such as fractures with significant focal kyphosis. This gives the surgeon a significant amount of discretion in the treatment of A3N0 fractures, as there are no strict criteria to define an M1 injury. Another limitation to the study is that it is possible that some of the 483 spine surgeons who answered the survey do not routinely treat spine trauma. We quantified the experience of the surgeons based on years in practice, but we did not ascertain the average number of thoracolumbar trauma cases treated. It is possible that there is no correlation between years in practice and familiarity with spine trauma surgery. The last limitation relates to the less than unanimous support for surgical or non-surgical treatment of any specific injury. The threshold of 70 % agreement may be viewed as arbitrary; however, the authors believe that this degree of consensus internationally is substantial.

Conclusion

The treatment algorithm to accompany the AOSpine thoracolumbar spine injury classification system is a data-driven algorithm that is the result of a worldwide survey on the treatment of thoracolumbar injuries. The classification is simple enough to allow for substantial interobserver reliability, but complex enough to afford meaningful separation between injury types and guide treatment. While updates will undoubtedly be required as our understanding of thoracolumbar trauma improves, the AOSpine thoracolumbar spine injury classification system and the surgical algorithm proposed here have the potential to become the new standard for research, teaching, and clinical decision-making for thoracolumbar injuries. However, further prospective clinical studies are necessary to validate this treatment algorithm and to assess its outcomes.