Introduction

Osteoporosis is a systemic disorder characterized by low bone density that leads to fragile bones and a higher risk of fracture [10, 27]. Osteoporotic vertebral compression fractures (OVCFs) occur more frequently than ankle, wrist or hip fractures and constitute a major health problem worldwide. OVCFs directly and indirectly affect patient quality of life and impose costs on public health-care systems [5, 31, 37, 38]. It is estimated that OVCF will develop in 8 % of women older than 50 years and 27 % of men and women older than 65 years [7, 23].

OVCF results in a loss of anterior vertebral height and causes spinal deformities, reduced pulmonary function, restriction of the abdominal and thoracic contents, impaired mobility and clinical depression [12, 25, 40, 42, 44]. However, the majority of OVCFs are associated with a limited period of pain; therefore, most patients pay little attention to their symptoms unless an obvious accident occurs. OVCFs have traditionally been treated with analgesics, bed rest, physical therapy and antiresorptive medications. However, conservative management cannot reverse the kyphotic deformities that cause biomechanical changes in the spinal segment and increase the incidence of adjacent vertebral fractures [20]. Moreover, OVCFs result in prolonged hospitalization, increased morbidity, diminished quality of life and increased societal burdens. Additionally, anti-inflammatory drugs and certain types of analgesics are poorly tolerated by elderly patients, and bed rest and immobilization cause progressive demineralization and predispose patients to future fractures [45].

Because of the poor quality of osteoporotic bone, classical open surgery with metal implants often fails and contributes to persistent back pain, neurological symptoms and functional limitations [9, 35]. In particular, open procedures have been limited by neurological deficits and spinal sequence instability.

Vertebroplasty (VP) was introduced in France by Galibert and Deramond in 1984 for treating hemangiomas at the C2 vertebra [13]. Since 1984, VP has been used to treat vertebral compression fractures caused by myeloma, trauma and osteoporosis. Balloon kyphoplasty (KP) was first performed in 1998. It is a minimally invasive surgical technique that corrects kyphosis secondary to collapsed vertebral bodies using a balloon (an inflatable bone tamp) [14].

At present, both of these minimally invasive techniques are used to treat OVCF. The procedures include placing spinal needles into fractured vertebral bodies and injecting bone cement under radiological control. They both increase bone strength and reduce the pain caused by OVCF. However, KP also aims to restore the height of the vertebral bodies. Recently, two randomized controlled trials (RCTs) have indicated that both procedures can produce more immediate pain relief than conservative treatments [19, 47]. In 2011, an RCT with 24 months of follow-up reported that kyphoplasty reduced pain and improved function, disability and quality of life more effectively than nonsurgical therapy without increasing the risk of additional vertebral fractures [8].

However, there is controversy over which of the two procedures leads to superior results and long-term outcomes. There is no consensus as to whether KP or VP is the optimal treatment. Although there are a limited number of RCTs, several non-randomized controlled trials (non-RCTs) have been published. The purpose of this systematic review is to evaluate the evidence from RCT and non-RCT studies that compared the safety and efficacy of KP and VP for treating OVCF patients and to develop GRADE (Grading of Recommendations, Assessment, Development and Evaluation)-based recommendations for using the procedures to treat OVCF [1, 2].

Materials and methods

Search strategy

To assemble all of the relevant published studies, PRISMA-compliant searches of MEDLINE, EMBASE, ScienceDirect, OVID, the Cochrane CENTRAL database and Google Scholar were performed for all peer-reviewed studies published through March 2012 that compared KP with VP for treating OVCF. The following search terms were used to maximize the search specificity and sensitivity: surgery, kyphoplasty, balloon kyphoplasty, vertebral compression fracture, osteoporosis and vertebroplasty. Broad MeSH terms and Boolean operators were selected for each database search.

Secondary searches of the unpublished literature were conducted by searching the WHO International Clinical Trials Registry Platform, UK National Research Register Archive and Current Controlled Trials from their inception to 1 March 2012. Conference proceedings, such as those of the European Federation of National Associations of Orthopaedics and Traumatology and British Orthopaedic Association Annual Congress, and the ISTP database were also searched for entries up to March 2012.

The reference lists of all the full-text papers were examined to identify any initially omitted studies. We made no restrictions on the publication language.

Inclusion criteria

Studies were considered eligible for inclusion if they met the following criteria:

Study design: Interventional studies (RCTs or CCTs) and observational studies (cohort or case–control studies).

Population: Patients with OVCF of an osteoporotic etiology.

Intervention: KP.

Comparator: VP.

Outcomes: Reported at least one of operative time, subjective pain perception, quality of life, incidence of adjacent vertebral fractures, cement leakage, postoperative vertebral body height and local kyphosis angle [21].

Exclusion criteria

Patients were excluded from the meta-analysis if they had neoplastic etiology (i.e., metastasis or myeloma), infection, neural compression, invasive disease, traumatic fracture, neurological deficits or spinal stenosis. The other exclusion criteria were severe degenerative disease of the spine, previous surgery at the vertebral body in question and long-term use of steroidal or nonsteroidal anti-inflammatory drugs.

Study selection

Two reviewers (D.X. and JX. M.) independently screened the titles and abstracts against the eligibility criteria. Subsequently, the full texts of the studies that potentially met the inclusion criteria were read to determine the final inclusion. We resolved disagreements by reaching a consensus through discussion.

Data extraction

Two of the authors (D.X. and JX. M.) independently extracted data from each full-text report using a standard data extraction form. The extracted data included the title, authors, year of publication, study design, sample size, population, age, gender, type of interventions, numbers of vertebral bodies, surgical procedures, duration of follow-up and outcome parameters. The corresponding authors of the included studies were contacted to obtain any required information that was missing. The extracted data were verified by XL. M.

Outcome

The primary outcomes included the visual analog scale (VAS), Oswestry Disability Index (ODI) and incidence of bone cement leakage. The following items were included as secondary outcomes: operative time; kyphosis angle; anterior vertebral body height; the incidences of certain complications, including adjacent fractures, mechanical procedural failures, neurological deficits, nerve root irritation and lung embolisms. We defined “short term” as occurring within 1 week and “long term” as occurring after 6 or more months. If no data were reported for a specified time, we selected the closest measurements for pooling purposes.

Assessment of methodological quality

Following the Cochrane Handbook for Systematic Reviews of Interventions 5.0, the methodological quality of the included studies was independently assessed by two authors (D.X. and XL. M.). Any disagreements were resolved by discussion. A third author (JX. M.) was the adjudicator when no consensus could be achieved. We evaluated the RCTs using the “Cochrane collaboration’s tool for assessing the risk of bias”, which included the following key domains: adequate sequence generation; allocation concealment; blinding; incomplete outcome data; free from selective reporting; and free from other bias, including baseline balance between groups, no support by funding and valid sample size estimation. The methodological quality of the non-RCTs (i.e., CCTs and observational studies) was assessed using the Methodological Index for Non-Randomized Studies (MINORS) form [43], which is a validated instrument designed to assess the quality of comparative or non-comparative non-RCT studies.

Data analysis

We performed all of the meta-analyses with the Review Manager software (RevMan Version 5.1; The Nordic Cochrane Center, The Cochrane Collaboration, Copenhagen, Denmark). For continuous outcomes, such as ODI and VAS, the means and standard deviations were pooled to a weighted mean difference (WMD) and 95 % confidence interval (CI). Risk ratios (RRs) and 95 % confidence intervals (CIs) were used to evaluate the dichotomous outcomes, such as the incidence of bone cement leakage or adjacent fractures. The inverse variance and Mantel–Haenszel techniques were used to combine separate statistics. A P value <0.05 was considered to be statistically significant.
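The pooling described above can be illustrated with a minimal Python sketch. It is not the RevMan implementation: the function names and the example numbers are hypothetical, and the 1.96 multiplier assumes a normal approximation for the 95 % CI.

```python
import math

def study_mean_difference(mean_kp, sd_kp, n_kp, mean_vp, sd_vp, n_vp):
    """Mean difference (KP minus VP) and its standard error for one study."""
    md = mean_kp - mean_vp
    se = math.sqrt(sd_kp ** 2 / n_kp + sd_vp ** 2 / n_vp)
    return md, se

def pool_inverse_variance(effects):
    """Fixed-effect inverse-variance pooling of (estimate, standard error) pairs.

    Returns the pooled estimate and the bounds of its 95 % CI.
    """
    weights = [1.0 / se ** 2 for _, se in effects]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, effects)) / total
    se_pooled = math.sqrt(1.0 / total)
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio.

    Each table is (events_kp, n_kp, events_vp, n_vp).
    """
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in tables)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in tables)
    return num / den

# Hypothetical VAS data for two studies (not taken from the included trials).
vas_effects = [
    study_mean_difference(2.1, 1.0, 40, 2.6, 1.2, 45),
    study_mean_difference(1.8, 0.9, 30, 2.0, 1.1, 30),
]
wmd, ci_low, ci_high = pool_inverse_variance(vas_effects)

# Hypothetical cement leakage counts (events, group size) for two studies.
leakage_rr = mantel_haenszel_rr([(6, 50, 12, 55), (4, 40, 9, 42)])
```

A WMD below zero on the VAS or ODI, or an RR below one for leakage, would favor KP under this sign convention.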

Statistical heterogeneity was assessed using the Q statistic. A fixed-effects (inverse variance) model was used when the effects were assumed to be homogeneous (P > 0.05). P < 0.05 implied statistical heterogeneity, and a random-effects model was used in those circumstances. The subgroup analyses were stratified by study design. The sensitivity analysis was performed by excluding the studies with higher statistical heterogeneity.
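The model-selection rule above can be sketched as follows. The text does not name the random-effects estimator, so the DerSimonian–Laird estimate of the between-study variance is used here as an assumption; all function names are hypothetical, and the chi-square tail probability is computed with a simple series to keep the sketch free of third-party dependencies.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function.
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def pool_with_heterogeneity_check(effects, alpha=0.05):
    """Pool (estimate, standard error) pairs; switch to random effects if Q is significant."""
    w = [1.0 / se ** 2 for _, se in effects]
    fixed = sum(wi * y for wi, (y, _) in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, effects))
    df = len(effects) - 1
    p_het = chi2_sf(q, df)

    model, tau2 = "fixed", 0.0
    if p_het < alpha:
        # Assumed estimator: DerSimonian-Laird between-study variance.
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        model = "random"

    w_star = [1.0 / (se ** 2 + tau2) for _, se in effects]
    estimate = sum(wi * y for wi, (y, _) in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return model, estimate, se_pooled, p_het
```

When the Q test is not significant, tau2 stays at zero and the random-effects weights reduce to the fixed-effect weights, so one code path serves both models.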

Evidence synthesis

The evidence grade was determined using the guidelines of the GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) working group [1]. Although the GRADE system acknowledges the primacy of RCTs, it also recognizes circumstances in which observational studies generate high-quality evidence of treatment effects [24]. The GRADE system uses a sequential assessment of the evidence quality that is followed by an assessment of the risk–benefit balance and a subsequent judgment on the strength of the recommendations. The evidence grades are divided into the following categories: (1) high, which indicates that further research is unlikely to alter confidence in the effect estimate; (2) moderate, which indicates that further research is likely to significantly alter confidence in the effect estimate and may change the estimate; (3) low, which indicates that further research is very likely to significantly alter confidence in the effect estimate and is likely to change the estimate; and (4) very low, which indicates that any effect estimate is uncertain. Uniformity of the estimated effects across studies and the extent to which the patients, interventions and outcome measures are similar to those of interest may lower or raise the evidence grade. As recommended by the GRADE working group, the lowest evidence quality for any of the outcomes was used to rate the overall evidence quality. The evidence quality was graded using the GRADEpro Version 3.6 software. The strengths of the recommendations were based on the quality of the evidence.
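The rule that the overall rating equals the lowest rating among the outcomes amounts to taking a minimum over an ordered scale. The short Python sketch below illustrates this; it is not part of GRADEpro, the function name is hypothetical, and the per-outcome grades are placeholders rather than the grades assigned in this review.

```python
# GRADE categories ordered from lowest to highest quality.
GRADE_LEVELS = ["very low", "low", "moderate", "high"]

def overall_quality(outcome_grades):
    """Return the lowest grade found among the rated outcomes."""
    return min(outcome_grades.values(), key=GRADE_LEVELS.index)

# Placeholder grades for illustration only.
example_grades = {
    "outcome A": "low",
    "outcome B": "very low",
    "outcome C": "moderate",
}
overall = overall_quality(example_grades)
```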

Results

Search results

A total of 242 titles and abstracts were preliminarily reviewed, of which 12 studies [11, 15, 17, 22, 28, 29, 33, 36, 39, 41, 48, 49] eventually satisfied the eligibility criteria (Fig. 1). These studies included one randomized controlled trial [28], three controlled clinical trials (CCTs) [15, 36, 41], five prospective cohort studies [11, 22, 29, 33, 39] and three retrospective cohort studies [17, 48, 49]; no retrospective case series were included. In total, 1,081 patients and 1,332 vertebral bodies were included in the 12 studies.

Fig. 1 The study selection and inclusion process

The funnel plot for the incidence of bone cement leakage demonstrated limited evidence of small study exclusion and publication bias. The diagram was asymmetrical, and few studies were plotted on the left side of the funnel (Fig. 2).

Fig. 2 The funnel plot for the cement leakage rate outcomes

Quality assessment

Among the 12 included studies, only 1 RCT [28] had a low risk of bias, and the remaining 11 non-RCTs [11, 15, 17, 22, 29, 33, 36, 39, 41, 48, 49] had a high risk of bias resulting from study design limitations. Only one trial [28] reported an adequately generated allocation sequence, and two trials [28, 36] reported allocation concealment. Three studies [28, 36, 39] reported using single-blinded outcome assessors, and no studies reported using double-blinded assessors; the other studies did not specify a blinding method. The methodological quality of the RCTs is presented in Fig. 3. In contrast to the RCTs, the methodological quality assessments of the non-RCTs used the MINORS form. The MINORS quality scores of the non-RCTs are presented in Table 1. The mean score was 12.8 (range, 8–16), which corresponded to 53 % of the maximum score of 24. This result indicated that there was considerable variability in the evidence base.

Fig. 3 The methodological quality of the RCTs

Table 1 The study designs and MINORS appraisal scores for the non-RCTs

Demographic characteristics

In total, 1 RCT and 11 non-RCTs with 1,081 patients (331 males and 750 females) were eligible for inclusion. The individual sample sizes ranged from 45 to 192 patients. A total of 455 patients underwent KP, and the remaining 626 received the VP procedure. All of the included studies had defined eligibility criteria. All of the studies recruited patients with the following attributes: (1) moderate to severe pain caused by radiologically confirmed compression fractures that did not improve with conservative treatment; (2) kyphotic deformities and risk of progressive vertebral height loss; (3) no neurological deficits, systemic or spinal infections, traumatic fractures, pathologic fractures or spinal stenosis; (4) not receiving treatment for osteoblastic, matrix or tissue-producing solid tumors; and (5) no osteomalacia or neoplasms. However, 1 of the 12 studies [11] included patients with ≥15 % vertebral height loss and VAS scores ≥5, but excluded patients with ≥50 % vertebral height loss and local kyphotic deformities ≥30 degrees. Movrin et al. [33] recruited patients with <90 % vertebral height loss. Two of the included studies [36, 41] only recruited patients with fresh OVCF and A1 or A3 fractures [30]. Only Schofer et al. [41] defined fresh fractures as up to the 28th day after the event that caused the fracture. One RCT [28] recruited only patients who presented with thoracolumbar junction (T12–L1) OVCF. Santiago et al. [39] recruited only patients without known causes of osteoporosis (e.g., corticosteroid therapy, inflammatory spondyloarthropathy and diabetes mellitus). Patients in five of the included studies [11, 28, 41, 48, 49] had only one vertebral fracture. The demographic characteristics of the included studies are summarized in Table 2.

Table 2 The demographic characteristics of the included studies

Outcome analyses

Primary outcomes

Pain was measured using a VAS and was classified by the length of the follow-up period, i.e., short term or long term. In the subgroup analysis, we pooled the outcome values by study design. Seven studies [11, 15, 22, 28, 33, 36, 41] reported short-term VAS scores. The CCT subgroup analysis did not find a significant difference between the KP and VP groups. However, the RCT subgroup analysis found that KP was less effective than VP (WMD = 0.30, 95 % CI 0.08, 0.52; P = 0.007). The cohort study subgroup analysis found that KP was more effective than VP (Fig. 4). Long-term VAS scores were available in ten of the studies [15, 22, 28, 29, 33, 36, 39, 41, 48, 49]. The RCT and CCT subgroup analyses found no significant differences between the KP and VP groups. However, the cohort study subgroup analysis found that KP was more effective than VP (WMD = −1.51, 95 % CI −2.92, −0.09; P = 0.04) (Fig. 5). The pooled VAS pain score outcomes are summarized in Table 3.
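As a consistency check on the reported intervals, the standard error and a two-sided P value can be recovered from a WMD and its 95 % CI. The Python sketch below assumes a normal approximation and a symmetric interval; the function name is hypothetical, and the inputs are the two WMD results quoted above.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95 % interval

def p_value_from_difference_ci(estimate, lower, upper):
    """Recover z and the two-sided P value from an estimate and its 95 % CI."""
    se = (upper - lower) / (2 * Z_975)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Short-term VAS, RCT subgroup: WMD = 0.30, 95 % CI 0.08, 0.52
z_short, p_short = p_value_from_difference_ci(0.30, 0.08, 0.52)

# Long-term VAS, cohort subgroup: WMD = -1.51, 95 % CI -2.92, -0.09
z_long, p_long = p_value_from_difference_ci(-1.51, -2.92, -0.09)
```

Both recomputed P values should land near the reported ones, with small discrepancies expected from rounding of the published interval bounds.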

Fig. 4 The weighted mean difference (WMD) estimates for the short-term VAS scores

Fig. 5 The weighted mean difference (WMD) estimates for the long-term VAS scores

Table 3 The pooled VAS and ODI score outcomes

Separate subgroup analyses were also performed for the short- and long-term ODI outcomes. Two studies [22, 36] reported short-term ODI scores. The KP and VP patients did not differ significantly in the CCT subgroup analysis. However, the cohort study subgroup analysis found that patient functional recovery after KP treatment was superior to recovery after VP treatment (Fig. 6). The results of the five trials [15, 22, 29, 36, 39] that provided long-term ODI data were consistent with these short-term outcomes (Fig. 7). The ODI score data are summarized in Table 3.

Fig. 6 The weighted mean difference (WMD) estimate for the short-term ODI scores

Fig. 7 The weighted mean difference (WMD) estimate for the long-term ODI scores

Eleven studies [11, 15, 17, 22, 29, 33, 36, 39, 41, 48, 49] reported complications related to cement leakage. The overall pooled analysis of bone cement leakage found a significantly lower rate in KP patients than in VP patients (RR = 0.65, 95 % CI 0.47, 0.89; P = 0.007). However, the CCT subgroup analysis did not find a significant difference between the KP and VP groups (Fig. 8).
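Because risk ratios are pooled on the logarithmic scale, the same kind of consistency check for the leakage result has to work with log(RR). The Python sketch below assumes a normal approximation on the log scale; the function name is hypothetical, and the inputs are the pooled RR and CI quoted above.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95 % interval

def p_value_from_ratio_ci(ratio, lower, upper):
    """Recover z and the two-sided P value from a ratio measure and its 95 % CI."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(ratio) / se_log
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Pooled cement leakage: RR = 0.65, 95 % CI 0.47, 0.89
z_leak, p_leak = p_value_from_ratio_ci(0.65, 0.47, 0.89)
```

The negative z reflects an RR below one, i.e., fewer leakage events in the KP group under the paper's convention.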

Fig. 8 The risk ratio (RR) estimate for cement leakage

Secondary outcomes

Adequate operative time data were available in two of the trials [28, 49]. The pooled RCT and cohort study subgroup analysis demonstrated a shorter operative time in VP patients than in KP patients (Table 4).

Table 4 The pooled operative time, short- and long-term local kyphosis angle, and anterior vertebral body height outcomes

As shown in Table 4, the local kyphosis angle after surgery was evaluated in both short-term and long-term follow-up. Meta-analyses were performed for the study design subgroups. Three studies [33, 36, 41] reported short-term postoperative kyphosis angles. Our overall pooled results did not show a significant difference between the KP and VP patients (WMD = −2.25, 95 % CI −5.14, 0.65; P = 0.13). In contrast, five trials [11, 28, 36, 41, 48] reported long-term postoperative kyphosis angles. The RCT and cohort study subgroup analyses found that the mean long-term kyphosis angle of the KP patients was significantly smaller than the angle of the VP patients. However, CCT subgroup analysis did not find a significant difference between the KP and VP patients.

Four studies [28, 36, 39, 49] examined postoperative anterior vertebral body height. There were statistically significant differences in this height between the KP and VP patients in the RCT and cohort study subgroup analyses. However, the pooled CCT subgroup analysis did not find a significant difference (Table 4).

In the eight studies [15, 22, 28, 29, 33, 36, 41, 48] that examined the incidence of adjacent vertebral compression fracture data, there were no significant differences between the KP and VP patients in any of the subgroup analyses (Fig. 9).

Fig. 9 The risk ratio (RR) estimates for adjacent-level fractures

No other complications were reported in the included studies.

Quality of the evidence and recommendation strengths

Ten outcomes in this systematic review were evaluated using the GRADE system. The following seven outcomes were important: short- and long-term VAS scores, short- and long-term ODI scores, anterior vertebral height, incidence of bone cement leakage and adjacent vertebral compression fractures. The evidence quality for each outcome was low or very low (Table 5). Therefore, we agreed that the overall evidence quality was very low. This finding may lower the confidence in any recommendations.

Table 5 The GRADE evidence quality for each outcome

Discussion

An ideal treatment for OVCF should result in lasting symptom improvement and durable kyphotic deformity correction. Currently, KP and VP are alternatives after medical therapy has failed or when patients cannot tolerate the pain. These minimally invasive procedures can provide rapid and lasting pain reduction and improved quality of life. Although several published studies [26, 32, 34] have demonstrated that KP and VP improve preoperative clinical status and quality of life, it is not clear which of these two interventions provides better outcomes. Furthermore, there have been no guidelines or recommendations for surgically treating OVCF. Therefore, there is a need for an evidence base to help surgeons make clinical decisions and develop optimal treatments. To the best of our knowledge, this study is the first systematic review to use the GRADE system to evaluate the quality of the evidence comparing KP and VP treatments for OVCF.

Because of the challenges clinicians face from the lack of randomized surgical trials and the large number of observational surgical studies, non-RCTs were included in this review. However, including non-RCTs in the present study introduces a high risk of bias. The methodological quality assessment identified a number of limitations to the current evidence base. Ultimately, only 1 RCT and 11 non-RCTs met the pre-defined eligibility criteria. The “Cochrane collaboration’s tool for assessing the risk of bias” and the MINORS form were used to evaluate the RCTs and non-RCTs, respectively. All of the non-RCTs had insufficient information on the randomization methods. Apart from two studies [28, 36], all of these trials had poor allocation concealment, which allowed for selection and allocation bias. Although blinding of the participants and surgeons was not performed in all of the studies, three of the studies [28, 36, 39] used the assessor blinding method. The lack of blinding permitted further detection and performance biases and the potential for type II statistical errors. Combining the results of the observational studies could cause significant bias. To some extent, the observational studies included in this systematic review overestimated the treatment effect. Moreover, confounding factors that should be balanced by randomization distorted the intervention effect in the non-RCTs. Therefore, most of the included studies had relatively high methodological assessment risks, which may have influenced the accuracy and reliability of the pooled results.

Some degree of clinical heterogeneity was induced by the different surgical technologies used, number of vertebrae treated, varying spinal vertebral bodies, types of fractures, gender differences, pre-surgical medical status, follow-up times, differing OVCF severities and mean durations between injury and surgery. Heterogeneity may have been caused by poor non-RCT study design, which poses greater bias risks than other study types. Although we performed subgroup analyses that were stratified by study design, heterogeneity cannot be completely resolved. Accordingly, although the results of the meta-analysis should be considered appropriate, methodological quality defects and clinical heterogeneity should be considered when interpreting the findings.

Judgments concerning the quality of outcome evidence across studies can be made in the context of this systematic review. Well-designed non-RCTs may provide high-quality evidence in the circumstances described by the GRADE working group. However, the present study could not improve the low quality level of the evidence, which may lower the confidence in any recommendations. Although the pooled results from the subgroup analyses may not apply to the studies as a whole, this systematic review represented the best available method of synthesizing the current evidence.

The results of the short- and long-term VAS analyses were inconsistent across the subgroups. Although the cohort study subgroup analysis indicated a significant difference in the short- and long-term VAS scores between the KP and VP groups, the weakness of the cohort study design could have biased this result. The exact mechanism of pain reduction remains unclear. Belkoff et al. [3, 4] provided evidence of pain reduction that was attributable to immobility and inhibition of micromovement in the fractured fragment. In addition, a cytotoxic effect of polymethylmethacrylate (PMMA) causes damage to terminal nerve endings and contributes to pain reduction [16]. However, Togawa et al. [46] reported that PMMA did not influence the thermic pain reduction effect. Schofer et al. [41] reported that significant pain reduction was achieved in both the KP and VP groups of patients with fresh thoracolumbar compression fractures, but found no difference in pain reduction between the two groups. Furthermore, the duration of the follow-up period was comparable to the natural healing time; therefore, it is difficult to distinguish the effect of the intervention from the effect of the natural recovery process. Therefore, the difference in the VAS scores between the KP and VP groups may tend to diminish over time.

Although KP appears to be more effective for short- and long-term functional improvements, there were no significant differences in the ODI (i.e., quality of life) scores between the KP and VP patients in either the short- or long-term follow-up outcomes. Previous studies have demonstrated that KP and VP significantly improve quality of life compared with the preoperative status. The ODI scores may also be affected by the ongoing osteoporosis process. Additionally, the patient selection in non-RCTs may depend on different indications for KP or VP. Therefore, there may be no significant differences in quality of life between the KP and VP groups. Moreover, because of the limited number of included studies and the lack of RCTs that reported ODI scores, the GRADE quality of the evidence could be low.

Apart from general complications, there are specific complications due to cement leakage. Severe complications occur in up to 8 % of the KP and VP patients [16]. However, cement leakage does not usually cause any clinical symptoms. In the cohort study subgroup analysis, the incidence of cement leakage after KP was lower than the incidence after VP, and this difference was statistically significant. By contrast, no statistically significant differences were observed in the CCT analysis. Moreover, there was no evidence of cement leakage in the RCT data. Although 11 of the included studies reported the incidence of cement leakage, they reported no cases of spinal stenosis or pulmonary embolism that were caused by cement leakage. To the best of our knowledge, low-viscosity cement and high injection pressure lead to more frequent cement leakage through fractures and blood vessels. Filling the cavity in the vertebral body that is created by the balloon with high-viscosity cement injected at low pressure is characteristic of KP and could lead to lower cement leakage rates. Lovi et al. [29] reported that performing VP with firmer cement decreases the risk of cement leakage. Consequently, patients with vertebral fissures, especially fractures in the posterior edge of the vertebral body, may be candidates for KP. In addition to the cause of the leakage, the differing methods of measuring cement leakage influenced the results. Heini et al. [16] provided further evidence that little cement leakage is found by standard X-ray imaging, whereas high rates are observed with computed tomography. Therefore, the GRADE evidence quality for cement leakage was very low because of the high risk of publication bias and outcome inconsistencies.

There was no statistically significant difference in the incidence of adjacent-level fracture between the two surgical methods based on the results of the RCT, CCT and cohort subgroup analyses. Whether bone cement augmentation causes an increased incidence of new adjacent vertebral body fractures is unresolved [18]. Additionally, it is difficult to discriminate between adjacent-level fractures following a surgical procedure and new OVCFs in these patients. In a systematic review, Hulme et al. [18] demonstrated that the incidence of adjacent-level fractures did not increase compared with an osteoporotic population that had already suffered OVCF. Because of the small sample size and poor study design, we could not confidently make conclusions about this complication. Because of the insufficient quality of evidence, the effect estimate is uncertain and has a lower GRADE recommendation strength.

Restoration and repositioning of fractured vertebral body height are easily achieved with low-pressure bone cement injection when using balloon kyphoplasty. Numerous publications have provided further evidence that following KP or VP, a significant postoperative increase in anterior vertebral body height is observed compared with the preoperative height. However, Hulme et al. [18] reported that there was no significant difference between KP and VP in vertebral body height restoration. After grading the present evidence, we found insufficient evidence to demonstrate a significant difference in postoperative anterior vertebral body height between the KP and VP groups. Restoration of the vertebral body height by KP can be associated with the possibility of restoring the shape of the vertebral body via a balloon. However, restoration of vertebral body height was partially attributable to the patient’s bedding being in a sagging position. Although Rollinghoff et al. [36] and Schofer et al. [41] demonstrated that there was no relationship between improved vertebral body height and clinical outcome in either the KP or VP groups, restoration of vertebral body height associated with OVCF was theoretically better in the KP group. Therefore, patients with significant height loss of the fractured vertebrae may be better candidates for KP.

In our systematic review, the subgroup analysis of CCTs demonstrated that there was no significant postoperative difference between the KP and VP groups in the short- or long-term kyphosis angle. The short- and long-term kyphosis angles of the KP group were superior to those of the VP group based on the cohort subgroup analysis. However, the single included RCT, which did not report a short-term kyphosis angle, demonstrated that the KP group had a lower kyphosis angle compared with the VP group; this difference was statistically significant. The quality of the included studies and evidence may influence the reliability of the results. Because an RCT was included in the analysis of the long-term kyphosis angle, the long-term results could be plausible. Additionally, as one of the confounding factors, the measurement of kyphosis angle may be affected postoperatively by patient pain and anxiety. The different kyphosis angle results of the two surgical procedures may be attributed to the following factors: different amounts of endplate subsidence of the index vertebrae, preoperative baseline measurements and different types or amounts of bone cement. The reduction of the angle may also depend on the natural fracture healing process [6]. Furthermore, patient positioning may influence measurement accuracy. Therefore, detection bias may influence the reliability of the outcomes. The quality level of evidence of short- and long-term kyphosis angle measurements after KP or VP was very low according to the GRADE system. However, Schofer et al. [41] reported a reduction of the kyphosis angle by an average of 3–6° after KP compared with a 1° reduction after VP. This suggested an additive effect from the balloon-induced restoration. Therefore, patients with severe OVCF or with a high kyphosis angle may be candidates for KP.

The primary limitations of this systematic review include the following: (1) the statistical power could be improved by including more studies; (2) two methods were used to evaluate study quality because both RCTs and non-RCTs were included in this study, and differences between the quality assessment tools may have led to assessment bias; (3) poorly designed non-RCTs were more likely to suffer from various types of bias; (4) publication bias from significant conclusions being more easily published and non-English publications not being included in this review may have caused important studies to be overlooked; (5) no economic outcomes were reported in the included studies; and (6) the overall GRADE quality of evidence was very low, which lowers confidence in any subsequent recommendations. Although we used the GRADE system to evaluate the evidence quality and recommendation strengths, judgment is still required.

Conclusion

This systematic review and grading of the evidence comparing KP and VP for treating OVCF offer useful conclusions and demonstrate that KP and VP are both safe and efficacious surgical procedures. Patients with large kyphosis angles, vertebral fissures, fractures in the posterior edge of vertebral body or significant height loss due to fractured vertebrae may be better surgical candidates for KP. However, the KP procedure has higher material costs than the VP technique, which could negatively affect KP utilization. The overall GRADE evidence quality was very low; therefore, further validation is required, and medical institutions should conduct high-quality RCTs.