Introduction

Vertebral compression fractures (VCFs) constitute a major health problem affecting more than 1.4 million people each year worldwide [1], leading to pain, significant morbidity [2, 3], and healthcare expenses [4]. Non-surgical management (NSM) may not relieve pain, frequently leads to prolonged immobilization, and may lead to pulmonary deterioration, persistent pain, progressive kyphotic deformity, weight loss, depression, and an overall compromise in quality of life [2, 5, 6]. In addition, patients with VCFs are prone to new adjacent fractures (a fivefold increase in risk) [7]. In one prospective study, elderly women with at least 1 VCF had an age-adjusted increased risk of mortality of 32 %; the survival impact was more profound with greater numbers of vertebral fractures [3].

Minimally invasive techniques such as vertebroplasty (VP) and balloon kyphoplasty (BKP) have been employed to treat painful VCFs. There is class I evidence to support the superiority of vertebral augmentation procedures (VAPs) over NSM [8–10], as well as non-randomized prospective studies [11–13], systematic reviews [14–19], and numerous retrospective series supporting safety and effectiveness of these procedures. However, recently published randomized controlled trials (RCTs) that showed no superiority of VP over NSM [20] or over a simulated procedure (sham) [21, 22] have raised questions regarding the value of VP. These trials have been criticized for potential methodological flaws confounding the outcomes [23]. Meta-analysis may help resolve controversies by combining data and increasing the power of the analysis. Therefore, we performed a new systematic review evaluating the latest published literature related to the treatment of VCFs.

The null hypothesis in the current review is that there is no difference in safety or efficacy between BKP, VP, and NSM. The objective of this study was to determine if differences existed between BKP and VP, BKP and NSM, and VP and NSM in the treatment of symptomatic osteoporotic VCFs. To this end, we reviewed only the published prospective studies to date (class I and II data). In addition, we used meta-regression and subgroup analyses to identify potential predictors of outcomes.

Materials and methods

Literature search and selection

As of February 1, 2011, a PubMed search using “kyphoplasty” and “vertebroplasty” as keywords resulted in 1,587 articles, out of which 27 studies satisfied the inclusion/exclusion criteria. Inclusion criteria were prospective comparative studies of VAPs (Fig. 1), studies enrolling ≥20 patients, and studies performed for mid/lower thoracic and lumbar vertebral fractures due to osteoporosis. Exclusion criteria were single-arm studies, BKP studies not using inflatable balloons, studies not available in English, systematic reviews and meta-analyses, studies including traumatic non-osteoporotic or cancer-related fractures, and studies not reporting clinical outcomes. Of the 27 identified studies, 8 described randomized studies. Nine articles compared VP to NSM (6 articles reported on 5 randomized studies, and 3 articles reported on 2 non-randomized studies). Six articles compared BKP to NSM (1 article reported on 1 randomized study, and 5 articles reported on 2 non-randomized studies). Finally, 12 articles compared BKP to VP (1 article reported on 1 randomized study, and 11 articles reported on 11 non-randomized studies). Some of the studies reported effects for the same group of patients and were combined into one analyzable “study” (Kasperk/Grafe et al., 4 total [11, 12, 24, 25] and Rousing et al., 2 total [20, 26]; see Table 1). This systematic review was reported in accordance with the PRISMA statement [27]. Study bias was assessed with the 6-category risk of bias assessment suggested in the Cochrane Handbook [28] and advocated in the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [29] (Table 2).

Fig. 1 Literature search and selection

Table 1 Summary of study characteristics
Table 2 Bias assessment by study summary

Statistical methods

The primary analytic approach was to pool treatment subgroups to calculate a mean effect, and then to compare subgroups in a pair-wise manner using the Z test [30]. This method allowed for the assessment of the maximum number of effects in a body of evidence that comprised 27 publications on 3 treatments. Sham arms were considered to be part of the NSM group. Considering sham as an independent treatment was desired but not possible due to only two studies reporting on sham treatment. We applied a mixed-effect model when performing pair-wise comparisons. In general, we sought a minimum of four studies contributing to each subgroup in order to generate an estimated within-group effect.
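The pair-wise Z test described above can be illustrated with a minimal sketch. It assumes that each pooled subgroup effect is approximately normally distributed, that the subgroups are independent, and that a standard error can be back-calculated from a reported 95 % confidence interval. The function names `se_from_ci` and `z_test` are hypothetical, and this is not the Comprehensive Meta-Analysis implementation. The inputs are the pooled pain changes for BKP and VP reported in the Results; because the intervals are rounded, the sketch only lands close to the reported P value for that comparison.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Back-calculate a standard error from a 95 % confidence interval (normal approximation)."""
    return (upper - lower) / (2.0 * z_crit)

def z_test(mean_a, se_a, mean_b, se_b):
    """Two-sided Z test for the difference between two independent pooled effects."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Pooled pain change (0 to 10 scale) for BKP and VP, taken from the Results
se_bkp = se_from_ci(-5.96, -4.18)
se_vp = se_from_ci(-5.22, -3.87)
z, p = z_test(-5.07, se_bkp, -4.55, se_vp)
print(f"z = {z:.2f}, P = {p:.2f}")
```

Working from rounded interval limits rather than the per-study data is a simplification, so small discrepancies from the published P values are expected.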

We also used the method of indirect treatment comparisons (ITC) [31] and direct treatment comparisons. While these comparisons were limited by the available studies, the ITC and direct methods preserve randomization [32] and are thus worthwhile methods for assessing the stability of the conclusions drawn from the primary statistical approach.
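As an illustration of how an indirect comparison through a common comparator can work, the sketch below uses the widely described adjusted indirect comparison: the effect of A versus B is estimated as the difference between the A versus C and B versus C effects, and the variances add. Whether this matches the exact computation performed by the ITC software is an assumption, and the function name and all input values are hypothetical.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z_crit=1.96):
    """Adjusted indirect comparison of A vs. B through a common comparator C."""
    d_ab = d_ac - d_bc                           # indirect effect estimate
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)   # variances of the two paths add
    ci = (d_ab - z_crit * se_ab, d_ab + z_crit * se_ab)
    return d_ab, se_ab, ci

# Hypothetical inputs: A vs. C and B vs. C effects with their standard errors
d_ab, se_ab, ci = indirect_comparison(-3.0, 0.5, -2.0, 0.4)
print(d_ab, round(se_ab, 3), tuple(round(x, 2) for x in ci))
```

Because the variances of both paths are summed, the indirect estimate is always less precise than either contributing comparison, which is consistent with wide confidence intervals when few studies feed one path.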

Mean, SD, and N, if not directly reported, were imputed from other summary statistics [33]. For effects measured repeatedly over time, such as pain scores, mean differences from baseline were used in a meta-regression of days from baseline to assess for time-dependent effects. When the meta-regression yielded a non-significant slope, we combined multiple time point measures to yield a more precise per-study effect size. If the original scale of measure for an effect could not be preserved, we calculated standardized mean differences [34].
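The kind of imputation mentioned above can be sketched as follows. The exact formulas used in the review are not stated here, so this is an assumption: it shows two standard conversions, from a standard error and from a 95 % confidence interval of a mean, using the normal approximation (for small samples a t quantile would be more appropriate). Function names and example values are hypothetical.

```python
import math

def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean and the sample size."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z_crit=1.96):
    """Standard deviation from a 95 % confidence interval of a mean (normal approximation)."""
    se = (upper - lower) / (2.0 * z_crit)
    return sd_from_se(se, n)

# Hypothetical treatment arm: n = 25, mean change with 95 % CI (-3.0, -1.0)
print(round(sd_from_ci(-3.0, -1.0, 25), 2))
```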

All summary effect sizes were assessed for heterogeneity using the I² statistic. We identified a priori baseline fracture age as a potential covariate. A meta-regression was performed to assess for the significance of the slope and to search for any trends [35]. We set our Type-I error at α = 0.05.
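For readers unfamiliar with the heterogeneity statistic, a minimal sketch of its usual definition follows: Cochran's Q is computed from inverse-variance weights, and I² is the percentage of Q in excess of its degrees of freedom, truncated at zero. The function name and the example inputs are hypothetical, and the sketch is not taken from the software used in the review.

```python
def i_squared(effects, variances):
    """I-squared (in percent) from per-study effects and their variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical studies: identical effects give no heterogeneity,
# widely separated effects give a value near 100
print(i_squared([1.0, 1.0, 1.0], [0.5, 0.5, 0.5]))
print(i_squared([0.0, 10.0], [1.0, 1.0]))
```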

Unless otherwise noted, data are reported as mean effect sizes with the 95 % confidence interval in parentheses. The review was conducted in Comprehensive Meta-Analysis Version 2 [36]. Indirect treatment analysis was conducted using the ITC Software Application [37]. Analysis was performed by one of the co-authors (G.C) and results were confirmed by an independent statistician (B.S).

Results

Analysis of pooled treatment arms

Results are summarized in Tables 3 (includes mean values, SE, and I²) and 4 (pair-wise treatment comparisons).

Table 3 Summary of endpoints
Table 4 Pair-wise treatment comparison results

Disability improvement is reported as a standardized mean difference utilizing two scales: the Oswestry Disability Index (ODI) and Roland Morris Disability Questionnaire (RMD). A meta-regression showed no significant time-dependent effect on disability scores for any treatment arm. In terms of disability reduction, BKP, −3.93 (−5.73, −2.12), showed a trend toward greater improvement than VP, −1.95 (−3.33, −0.56) (P = 0.08), and was significantly better than NSM, −0.77 (−1.15, −0.39) (P = 0.008). The difference between VP and NSM in terms of disability reduction was not significant (P = 0.23) (Fig. 2). There was substantial heterogeneity in the BKP (I² = 91) and VP (I² = 83) arms.

Fig. 2 Change in disability from baseline and forest plot

QOL improvement is reported as Physical Component Summary (PCS) units from the combined SF-36 and SF-12 surveys. BKP, 7.13 (4.78, 9.48), showed significantly more PCS improvement than VP, 2.70 (−0.87, 6.28) (P = 0.043) (Fig. 3). It should be cautioned that these results are based on only four studies for BKP with an I² = 92, and five studies for VP with an I² = 6.

Fig. 3 Change in quality of life from baseline and forest plot

Pain ratings were rescaled to a 0 to 10 scale, with 0 being no pain and 10 being the worst pain imaginable. The meta-regression of days from procedure versus pain rating for each of the treatment arms demonstrated no significant correlation. The pooled pain relief (0 = no pain relief, −10 = maximum possible pain relief) across all studies was −5.07 (−5.96, −4.18) for BKP, −4.55 (−5.22, −3.87) for VP, and −2.17 (−2.92, −1.41) for NSM. A wide scatter in pain relief for BKP (I² = 99), VP (I² = 99), and NSM (I² = 99) was evident, as shown in Fig. 4. Both BKP (P < 0.01) and VP (P < 0.01) performed significantly better than NSM, while no significant difference was observed between the two interventional procedures (P = 0.35).

Fig. 4 Change in pain from baseline and forest plot

Subsequent adjacent fractures and overall subsequent fractures are reported as event rates. For overall subsequent fractures (95 % CI in parentheses), both BKP, 11.7 (6.1, 21.4) % (P = 0.04), and VP, 11.5 (6.7, 19.0) % (P = 0.01), showed significantly lower rates of fracture than NSM, 22.7 (18.0, 28.1) % (Fig. 5), while there was no significant difference between BKP and VP (P = 0.96). Significant heterogeneity was observed in BKP (I² = 78) and VP (I² = 78), while NSM showed more consistent results (I² = 20). For subsequent adjacent fractures, there was no detectable difference between BKP, 10.4 (6.7, 16.0) %, and VP, 8.4 (5.3, 13.2) % (P = 0.51).

Fig. 5 Subsequent fracture rate and forest plot

Cement extravasation, reported as an event rate, was significantly less frequent for BKP, 18.1 (13.9, 23.2) %, than for VP, 41.1 (36.6, 45.8) % (P = 0.01) (Fig. 6). The VP group (I² = 89) showed substantially more heterogeneity than the BKP group (I² = 38). Spinal canal extravasations occurred too infrequently in both groups to provide a meaningful analysis. There were no reported spinal canal extravasations for BKP and 7 reported for VP.

Fig. 6 Cement extravasation rate and forest plot

For vertebral height restoration, we converted the relative values to quasi-absolute measures by assuming normal vertebral body height to be 30 mm as reported for a large population of osteoporotic women in a previous study [38]. The calculation of Hedges’s g showed a significant difference favoring BKP over VP for height restoration, 1.87 (0.64, 3.11) (P = 0.003). However, heterogeneity was moderate (I² = 60).
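The two steps in this paragraph can be sketched as follows. The conversion from a relative (percentage) height restoration to millimetres is an assumption about how the 30 mm reference height was applied, and Hedges's g is computed with the usual pooled standard deviation and small-sample correction factor. Function names and example values are hypothetical and are not data from the included studies.

```python
import math

NORMAL_HEIGHT_MM = 30.0  # normal vertebral body height assumed in the text

def percent_to_mm(percent_restored):
    """Convert a relative height restoration (percent of normal height) to millimetres."""
    return percent_restored / 100.0 * NORMAL_HEIGHT_MM

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Bias-corrected standardized mean difference between two independent groups."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / s_pooled           # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)   # small-sample correction factor
    return j * d

# Hypothetical arms: means, standard deviations, and sample sizes
print(percent_to_mm(10.0))
print(round(hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10), 3))
```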

Kyphotic angle was reported as a degree difference in index VB angulation, defined as |A|−|B| where A is the pre-op angle and B is the post-op angle. BKP, 4.85° (2.87°, 6.83°), was superior to VP, 1.74° (0.49°, 3.00°) (P = 0.009) (Fig. 7). Both the BKP group (I² = 96) and the VP group (I² = 96) showed substantial heterogeneity. Most BKP studies reported a 3.7°–8° reduction, with 2 outliers that showed minimal change [12, 39].

Fig. 7 Kyphotic angle and forest plot

Analysis of randomized trials

Results are summarized in Table 5, which compares mean values (where applicable) between RCTs, non-RCTs and pooled studies. Here, we report the comparisons among RCTs only. Seven randomized trials were recognized: one comparing BKP to VP (Liu [40]), one comparing BKP to NSM (Wardlaw [8]) and five comparing VP either to NSM (Rousing [20], Voormolen [9], Klazen [41]) or to a sham procedure (Buchbinder [22], Kallmes [21]).

Table 5 Comparison results stratified by study type

In terms of disability reduction, there was no difference between BKP/VP (P = 0.16) or VP/NSM (P = 0.1), with BKP showing a trend toward greater improvement than conservative management (P = 0.078). One study was available for BKP, three for VP and four for NSM. QOL improvement was superior for BKP versus VP (P = 0.02), based on only one randomized trial for BKP and two studies for VP. Pain ratings were similar between procedures (P = 0.46), while BKP performed better than NSM (P = 0.03) and VP showed a trend toward more pain relief than NSM (P = 0.06), with six studies available for VP/NSM and two for BKP. For overall subsequent fractures, no differences were encountered: BKP versus VP (P = 0.8), BKP versus NSM (P = 0.6) and VP versus NSM (P = 0.11), with four studies available for VP/NSM and two studies for BKP. For kyphosis correction, BKP was superior to VP (P = 0.001), with only one RCT available for VP/BKP. Finally, no randomized trials reported cement extravasation for BKP, so comparisons were not feasible.

Additional analyses

The sensitivity analyses on average baseline index fracture age against subsequent fractures, cement extravasation, and disability did not yield significant results. The meta-regression of pain reduction against baseline fracture age exhibited a clear pattern, with clinically significant pain reduction before 7 weeks (~−5.0 to −7.0 points) and substantially less pain reduction between 7 weeks and 4 months, especially for VP (~−2.3 to −3.5 points for VP and ~−3.8 to −4.5 points for BKP) (Fig. 8).

Fig. 8 Meta-regression of fracture age versus pain reduction in vertebroplasty and kyphoplasty

We attempted confirmatory analysis of pain change, subsequent fractures, and disability using the techniques of direct and indirect treatment comparison, though in many cases the low study count made interpretation difficult. For pain change, indirect treatment comparison trended toward BKP over VP (effect size −1.99 [95 % CI −5.28, 1.29]). The wide confidence band is likely due to only three studies contributing to the BKP–NSM path. Direct treatment comparison significantly favored BKP over VP (effect size −0.39 [−0.74, −0.04], P = 0.02). For subsequent fractures, inconsistent results and low study count resulted in wide confidence bands. Thus, the results neither refuted nor confirmed the results of the grouped treatment analysis. For disability, indirect treatment comparison trended toward BKP over VP (effect size −3.75 [−10.39, 2.88]), while direct treatment comparison also trended toward BKP over VP (effect size −0.80 [−1.81, 0.19]). Low study count likely accounted for the wide confidence intervals.

Discussion

Traditionally, VP has been accepted as a successful procedure for treating VCFs; but three recently published RCTs comparing VP with a sham procedure [21, 22] or NSM [20] have created contention about the efficacy of VP. Potential flaws confounding the outcomes have been previously outlined: these include low accrual rates at busy centers, inclusion of patients with subacute/chronic fractures, sham design, and no reported clinical examination to determine the source of pain [23]. These studies do not report on what happened to the majority of patients that fulfilled the inclusion criteria but opted not to participate in the study. Non-uniform evaluation of fractures with MRI [20, 21], higher crossover rates in the NSM arm [21] and other pain generators unrelated to the fracture (e.g., discogenic/facetogenic pain) are additional problems. Most of the limitations of those RCTs were presented by Bono et al. [23] on behalf of the North American Spine Society, and were responded to by study authors [42]. Nevertheless, these trials demonstrate the likelihood that a subset of patients will not benefit from VP. Further randomized studies are needed to address this issue, although a subsequently published RCT (VERTOS II) found clear superiority of VP versus NSM [41]. The current study represents an updated systematic review of prospective studies of VAP and NSM for the treatment of osteoporotic VCFs. We supplement the most recent meta-analysis, published by Han et al. [43] which pertains only to comparative trials between VP/BKP, by also including analysis of randomized and non-randomized controlled trials comparing VAPs with NSM.

Disability/QOL

Disability instruments such as the ODI and RMD and QOL scales such as SF-12/36 are standard questionnaires designed to minimize subjective variability and allow for reproducible and comparable measures [44, 45]. BKP was shown to be superior to NSM in terms of reduced disability and non-significantly better than VP, whereas VP was not significantly different from NSM. Direct treatment comparisons and ITC provide secondary evidence of the disability benefit of BKP over VP. QOL improvement (PCS component) was also superior in BKP over VP. Similar observations were made in randomized trials although this effect was milder (no difference between VP/BKP and trend favoring BKP vs. NSM). This potential advantage of BKP in disability and QOL was not observed (or could not be validated due to insufficient data) in previous analyses [15, 18, 19, 43]. In addition to a possible procedural effect, this may reflect different patient selection criteria and clinical acumen and expertise of practitioners of BKP compared to VP. Caution should be used due to the pooled nature of this analysis, the heterogeneity of the results and the low number of studies.

Pain relief

Pain relief as measured by the VAS was similar between BKP and VP, while both treatments were significantly better than NSM. When considering only randomized trials, the effect was diminished only for VP (non-significant trend favoring VP vs. NSM). This 4- to 5-point difference between VAPs and NSM should be considered not only statistically but also clinically important (more than 30 % improvement from baseline pain) [46, 47]. However, this should be interpreted with caution because of significant heterogeneity (I² = 99) in pain relief for all three treatment groups. Surprisingly, great variance exists even between RCTs; for instance, the effect size for pain reduction in VERTOS II [41] is almost double that in INVEST [21] or the Buchbinder trial [22]. This is partially attributable to the non-uniform scales across studies but also points to the unreliability of this method: patient-rated VAS pain scores may refer to maximum, average, or current pain, pain with or without medications, positional pain, etc., predisposing to heterogeneity in responses. The wide scatter in ranges of pre-operative pain and pain relief suggests that using a single VAS measure as the sole measure of treatment efficacy is inconsistent and unreliable.

Subsequent/adjacent fractures

The results of the current study are consistent with prior meta-analyses and prospective and comparative studies reporting higher rates of subsequent vertebral fractures after non-surgical treatment of osteoporotic VCFs when compared to VP and BKP [11–13, 15, 16]. The mechanism whereby vertebral augmentation may reduce the risk of subsequent fracture might be that anterior column support along with reduction of kyphosis lessens the flexion moment on the surrounding vertebrae, thus reducing the likelihood of further fractures [48]. Of note, the subsequent fracture rate in the NSM group is in accordance with the literature (around 22 % at 1 year after the initial fracture [7]). Even if we base our conclusions on randomized-only trials, VAPs are at least equivalent to NSM and are not associated with an increased risk of subsequent fractures.

Cement extravasation

There was significantly less cement extravasation in the BKP arm than in the VP arm, with high heterogeneity in the VP group. In contrast, the BKP arm yielded more consistent results. The lower rate of cement extravasation after BKP is consistent with previous studies [15, 18, 49–52]. A number of factors may have contributed to the heterogeneity in the VP group: procedural technique, variation in considering extravasation as a complication [51], different postoperative radiological follow-up (plain films vs. computed tomography [51, 53]), cement viscosity (inverse relationship [52]), cement pressure [49], fracture level (higher extravasation rates above T7), and cement volume (dose dependent [54]). The optimum cement amount per level has not been established, and cement volume does not seem to correlate well with either clinical success [55] or restoration of vertebral stiffness or strength [56, 57]. The lower rate of cement extravasation seen after BKP may be attributed to the cavity created by the inflatable balloon, which allows for low-pressure, controlled filling with higher-viscosity cement. In addition, with balloon expansion, cancellous bone is compacted, thus creating a dam effect during cement filling. A recent meta-analysis of only comparative trials found no difference between BKP/VP [43]; this may be due to the study design, which did not allow for the inclusion of significant RCTs or prospective studies comparing VP to NSM. Several of these trials noted a significantly increased rate of cement leakage with VP (Klazen, 72 % of patients [41]; Buchbinder, 37 % [22]; Alvarez, 60 % [58]; De Negri, 33 % [59]).

Height restoration/kyphotic angle reduction

The random effect model showed a significant difference in VB height restoration in favor of BKP, which also showed significantly greater kyphotic angle reduction. The BKP arm was more heterogeneous, but this was due to the presence of two studies that paradoxically report significant VB height restoration with no change in kyphosis [12, 39]. This finding likely reflects the correction of biconcave fractures that are not associated with the development of kyphotic angulation.

Kyphosis reduction may be attributed to postural reduction in the prone position for the procedure and/or balloon expansion. The beneficial effects of restoration of spinal sagittal balance are well documented [60]. Theoretically, improvement in spinal alignment will reduce the flexion moments around the affected vertebrae and relax the paraspinal muscles, leading to more upright posture, reduced pain, and fewer subsequent fractures. Studies give conflicting results: some authors found no correlation of deformity correction with clinical outcome [12, 39, 61, 62], others found a positive correlation [63, 64], and others did not report or investigate this outcome [40, 65–68]. Insufficient study data made it difficult to perform a statistical analysis to test for a relationship, so the question remains open.

Serious adverse events

Most studies either did not report [9, 40, 63, 64] or did not encounter [11–13, 20, 59, 65, 67, 68] any serious adverse events (SAEs). In the remaining studies, most SAEs were related to spinal canal/foramen cement leakage. In VP studies, three patients had postoperative paraparesis related to cement extravasation and required reoperation that reversed their symptoms [39, 58]. Two patients had postoperative radiculopathy and were successfully managed non-operatively [58, 62]. Additionally, in the VP group the authors reported one psoas hematoma, one injury to the thecal sac (managed conservatively [21]), and one osteomyelitis requiring further surgery [22]. No cases of symptomatic cement extravasation were reported in the BKP arm, while one case of osteomyelitis was recorded [69].

The only study describing SAEs in detail was the FREE trial [8] (BKP vs. NSM randomized) where the profile was similar in both groups: 58 events in 149 cases in the BKP arm and 54/151 in the NSM arm. In the same study, three cases of pulmonary embolism were reported more than 6 weeks after the procedure. In VERTOS II (VP vs. NSM), the authors performed postoperative computed tomography in two-thirds of the patients and found clinically silent cement embolus in peripheral pulmonary vessels in one-fourth of them [70]. Overall, the literature suggests that both procedures had safe SAE profiles with occasional case reports of symptomatic cement extravasation in the VP arm.

Optimal intervention time

There is controversy regarding the optimal time of intervention, with some authorities recommending early intervention [61, 71] and others suggesting that late augmentation does not compromise outcome [25, 72]. The majority of VP studies that yielded significant pain relief (greater than a 4-point drop) had a mean fracture age of less than 7 weeks (see Fig. 8). The most important observation was that in both arms there appears to be a period of substantially greater pain relief (fracture age of less than approximately 7 weeks), after which results were suboptimal or inconsistent.

Limitations

Unlike previous systematic reviews and meta-analyses [14–19, 50, 73, 74], the current study included only prospective comparative studies. This restriction limited the total number of studies, and therefore the power of the analysis. However, by including only class I and II evidence, our analysis was less prone to bias than analyses of retrospective or single-arm studies. Significant heterogeneity in effect sizes and data reporting methods (e.g., RMD/ODI, VB height restoration, clinical/subclinical fractures) limited interpretation of treatment differences in many cases and likely represents the developmental nature of the level I and II studies.

The six-point assessment of bias revealed that 50 % of studies had incomplete outcomes, while 68 % had a risk of bias due to lack of randomization. The review used mixed-effect analysis of study treatment effects subgrouped by bias ratings (Table 2). For NSM, pain reduction was affected by randomization (P = 0.001) and incomplete outcome reporting (P = 0.053), while subsequent fracture rate was marginally affected by incomplete outcome reporting (P = 0.073). For VP, disability improvement was marginally affected by randomization (P = 0.059), and significantly affected by incomplete outcome reporting (P < 0.001). The BKP group did not show any detectable effect of bias on outcomes.

These sensitivity analyses have the potential to be highly variable due to the relatively low number of studies contributing to the analyses. Lower quality trials, defined by a higher risk for bias, have been shown to significantly amplify beneficial results [75].

Conclusions

Our analysis indicated that vertebral augmentation was superior to NSM in the treatment of osteoporotic VCFs in terms of reducing pain and subsequent fractures. Balloon kyphoplasty was superior to VP and NSM in terms of QOL. As expected, kyphosis reduction was variable but was greater with BKP than with VP, along with a lower cement extravasation rate for BKP. Surgical interventions on VCFs within the first 7 weeks show evidence of greater pain reduction. The significant heterogeneity of effects, even among randomized trials, indicates that the current class I and II evidence is delivering inconsistent messages.