Introduction

Osteoporotic vertebral compression fractures (OVCFs) are common in the elderly population, with approximately 1.4 million new vertebral fractures occurring worldwide every year [1]. Among people older than 50 years, as many as one-quarter will sustain at least one vertebral fracture in their lifetime [2]. OVCFs can cause significant pain, reduce mobility, impair quality of life [3], and increase mortality risk [4].

Traditional conservative therapies for OVCFs include bed rest, analgesics, physical therapy, and immobilization [5]. Although most fractures heal within a few months, some patients experience persistent pain and disability requiring hospitalization, long-term care, or both [6]. Moreover, conservative therapy is poorly tolerated by elderly patients: the combination of analgesia and prolonged immobilization has been reported to increase the risk of adverse effects such as cognitive impairment, falls, constipation, and nausea [7].

Based on the premise that fracture stabilization can relieve pain, percutaneous vertebroplasty (PVP), the percutaneous injection of bone cement such as polymethylmethacrylate (PMMA) into the fractured vertebral body, was introduced as an alternative treatment for OVCFs. Since its introduction, this minimally invasive technique has become widely accepted and is now a routine therapy for OVCFs. Several studies have concluded that vertebroplasty relieves pain better than conservative therapy [8,9,10,11,12,13,14]. However, two double-blind randomized controlled trials (RCTs) published in 2009 found that sham injections provided pain relief similar to that of PVP [15, 16], prompting debate about its benefit. There is also concern that PVP could increase the risk of new fractures after the procedure [17, 18]. Citing the lack of high-quality supporting data, the American Academy of Orthopaedic Surgeons guideline strongly recommends against the use of PVP for OVCFs [19].

Since 2012, numerous systematic reviews and meta-analyses comparing PVP with non-operative therapy for OVCFs have been published [20,21,22,23,24]. However, these have varied in inclusion criteria and methodological rigor and have reported conflicting results, so whether PVP benefits patients with OVCFs remains controversial. In the past 3 years, several pivotal RCTs have been published [25,26,27,28]. We therefore conducted this meta-analysis, incorporating these recent studies, to provide an updated evaluation of the efficacy and safety of PVP compared with non-operative therapy (either conservative therapy or sham injection) for OVCFs.

Methods

Study inclusion and exclusion criteria

Eligibility criteria were defined using the PICOS (Population, Intervention, Comparison, Outcome, and Study design) framework. The population (P) was adults diagnosed with OVCFs not caused by malignancy, trauma, or any other specific condition. The intervention (I) was PVP, defined as the percutaneous injection of bone cement into a fractured vertebral body. The comparator (C) could be sham injection, best supportive care, pharmacological treatment, or any other non-operative therapy. Studies had to report at least one outcome (O) of interest: (1) pain relief or (2) the rate of new vertebral fractures. The study design (S) had to be an RCT with a follow-up period of more than 2 weeks. Trials were excluded if they (1) were abstracts, letters, or meeting proceedings; (2) reported duplicate data; or (3) lacked outcomes of interest.

Outcome measures

The primary outcome was pain relief as measured by a visual analog scale (VAS) or numerical rating scale (NRS), analyzed at 1 to 2 weeks (short term), 1 to 3 months (medium term), and 6 to 12 months (long term) of follow-up. The secondary outcome was the rate of new vertebral fractures, including both clinically and radiologically apparent fractures. For this rate, the denominator was the number of participants, but the numerator could include more than one new fracture per participant. For both outcomes, if a trial reported data at multiple time points, data from the longest follow-up were used in the meta-analysis.
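The time-window rule above can be made concrete with a short Python sketch. The window boundaries in weeks (4 to 13 weeks for "1 to 3 months", 26 to 52 weeks for "6 to 12 months") are our own assumptions. So is the reading that "longest follow-up" is applied within each window. The trial data are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Assumed window bounds in weeks; the review names the windows only as
# "1 to 2 weeks", "1 to 3 months" and "6 to 12 months".
WINDOWS = {
    "short": (1.0, 2.0),
    "medium": (4.0, 13.0),  # roughly 1 to 3 months
    "long": (26.0, 52.0),   # roughly 6 to 12 months
}


def window_for(weeks: float) -> Optional[str]:
    """Return the follow-up window that contains `weeks`, or None for a gap."""
    for name, (lo, hi) in WINDOWS.items():
        if lo <= weeks <= hi:
            return name
    return None


def latest_per_window(timepoints: dict[float, float]) -> dict[str, tuple[float, float]]:
    """For each window, keep the (weeks, value) pair with the longest follow-up."""
    chosen: dict[str, tuple[float, float]] = {}
    for weeks, value in timepoints.items():
        name = window_for(weeks)
        if name is None:
            continue  # e.g. a 20-week assessment falls between windows
        if name not in chosen or weeks > chosen[name][0]:
            chosen[name] = (weeks, value)
    return chosen


# Hypothetical trial: mean pain scores keyed by follow-up time in weeks.
trial = {1: 5.1, 2: 4.8, 4: 4.2, 13: 3.9, 20: 3.8, 26: 3.6, 52: 3.4}
print(latest_per_window(trial))
# {'short': (2, 4.8), 'medium': (13, 3.9), 'long': (52, 3.4)}
```

Time points that fall between windows are simply dropped. Under this reading, each window contributes at most one value per trial.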

Data sources and search strategy

Two authors (Shenghan Lou and Xu Shi) independently searched MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials from inception to 1 January 2019, without language restrictions. The search strategy combined MeSH terms and keywords related to “spine,” “fracture,” “vertebroplasty,” and “randomized controlled trial” (File S1). The reference lists of the included RCTs and of relevant reviews were also screened for additional RCTs.

Study selection and data extraction

Two authors (Shenghan Lou and Xu Shi) independently screened titles, abstracts, and full-text articles against the inclusion criteria. For all eligible trials, the same two authors extracted data independently and in duplicate using a standardized electronic form. Discrepancies were resolved through consensus or consultation with a third independent reviewer (Yansong Wang). The major categories of variables coded were (1) study characteristics, (2) participant characteristics, (3) intervention characteristics, (4) outcome characteristics, and (5) risk of bias. When the required data were not reported directly, we calculated them from the available statistics, following the methods described in the Cochrane Handbook [29]. Where data were presented only graphically, GetData Graph Digitizer 2.26 (http://getdata-graph-digitizer.com/) was used to digitize and extract them.
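The Cochrane Handbook describes several conversions of this kind. The sketch below shows one of them: recovering a group's standard deviation from a reported standard error, or from a 95% CI, for a single group mean. The review does not state which statistics were actually converted, so the inputs are hypothetical. The 1.96 multiplier is the large-sample approximation; the Handbook recommends a t-based multiplier for small groups.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """SD of a single group mean from its standard error: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """SD of a single group mean from its 95% CI (large-sample normal approximation).

    The CI half-width equals z * SE, so SE = (upper - lower) / (2 * z); with
    z = 1.96 the divisor is the familiar 3.92. For small groups, replace z
    with the appropriate t quantile.
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


# Hypothetical reported values for one trial arm.
print(round(sd_from_se(0.3, 100), 2))       # 3.0
print(round(sd_from_ci(6.2, 7.8, 49), 3))   # 2.857
```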

Risk of bias assessment

Two authors (Shenghan Lou and Xu Shi) independently assessed the risk of bias of the RCTs using a modified Cochrane risk of bias tool. This tool offers the response options “definitely or probably yes” (low risk of bias) and “definitely or probably no” (high risk of bias), an approach that has been validated previously [30, 31]. It addresses seven domains: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other sources of bias. Each domain was judged as low, high, or unclear risk of bias. Disagreements were resolved through discussion and, if necessary, by consulting a third reviewer (Yansong Wang).

Data synthesis and statistical analysis

For dichotomous outcomes, we calculated risk ratios (RRs) with 95% confidence intervals (CIs). For continuous outcomes, we calculated mean differences (MDs) with 95% CIs.
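For readers who want to reproduce a single trial's contribution, the Python sketch below computes an RR and an MD with 95% CIs. It uses the standard large-sample formulas, with the CI for the RR built on the log scale. The function names and the two-arm counts are hypothetical, and the continuity correction that zero event counts would require is not implemented.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int):
    """Risk ratio (treatment vs control) with a 95% CI built on the log scale.

    Uses the usual large-sample SE of log(RR) for a binary outcome; zero event
    counts would need a continuity correction, which this sketch omits.
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - Z95 * se_log)
    upper = math.exp(math.log(rr) + Z95 * se_log)
    return rr, lower, upper


def mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Mean difference (treatment minus control) with a 95% CI.

    For pain scores, a negative MD favors the treatment arm.
    """
    md = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return md, md - Z95 * se, md + Z95 * se


# Hypothetical two-arm trial, not data from the included studies.
print(risk_ratio(10, 50, 20, 50))
print(mean_difference(3.0, 2.0, 50, 4.0, 2.0, 50))
```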

Meta-analyses were performed using the inverse variance (IV) method with a random-effects model to account for between-study heterogeneity [32]. Heterogeneity was assessed with the I² statistic [33] and classified as not important (I² ≤ 40%), substantial (40% < I² < 75%), or considerable (I² ≥ 75%) [29]. Publication bias was assessed using funnel plots and Egger's regression test [34].
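The Python sketch below shows random-effects inverse-variance pooling, assuming the DerSimonian-Laird estimator of the between-study variance τ². We assume this matches the random-effects option in Review Manager 5. The function also returns Cochran's Q and I² = (Q − df)/Q, truncated at zero, and applies the heterogeneity bands defined above. Function names are hypothetical, and the CI uses a normal approximation.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def dersimonian_laird(effects, variances):
    """Random-effects inverse-variance pooling with the DerSimonian-Laird tau^2.

    effects: per-trial estimates (e.g. MDs, or log RRs for ratio measures)
    variances: their squared standard errors
    """
    w = [1 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "ci": (pooled - Z95 * se, pooled + Z95 * se),
            "q": q, "tau2": tau2, "i2": i2}


def classify_i2(i2):
    """Heterogeneity bands as defined in this review (I^2 in percent)."""
    if i2 <= 40:
        return "not important"
    if i2 < 75:
        return "substantial"
    return "considerable"


# Hypothetical MDs and variances for two trials.
result = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
print(result["pooled"], result["tau2"], result["i2"], classify_i2(result["i2"]))
# 1.0 1.0 50.0 substantial
```

When Q does not exceed its degrees of freedom, τ² is set to zero and the estimate reduces to the fixed-effect inverse-variance result.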

Where heterogeneity was substantial or considerable (I² > 40%), sensitivity analyses, meta-regression analyses, and subgroup analyses were conducted to explore its clinical sources. Sensitivity analyses omitted one study at a time to evaluate the influence of each study on the pooled effect estimates. Random-effects meta-regression using the unrestricted maximum likelihood method was performed for continuous variables, and subgroup analyses with tests for subgroup differences were performed for categorical variables.
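The leave-one-out sensitivity analysis described above can be sketched in Python as follows. The pooling function repeats the DerSimonian-Laird assumption so that the block runs on its own. The four trials and their variances are invented to show how a single outlying trial can drive I². The review does not give a numeric rule for when an omission "changes" the pooled estimate.

```python
import math


def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects estimate and I^2 (percent)."""
    w = [1 / v for v in variances]
    fixed = sum(a * b for a, b in zip(w, effects)) / sum(w)
    q = sum(a * (b - fixed) ** 2 for a, b in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(a * a for a in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(a * b for a, b in zip(w_re, effects)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2


def leave_one_out(labels, effects, variances):
    """Re-pool the data omitting one study at a time."""
    results = {}
    for k, label in enumerate(labels):
        rest_e = effects[:k] + effects[k + 1:]
        rest_v = variances[:k] + variances[k + 1:]
        results[label] = dl_pool(rest_e, rest_v)
    return results


# Hypothetical MDs and variances for four trials; trial "D" is an outlier.
labels = ["A", "B", "C", "D"]
effects = [-1.5, -1.6, -1.4, 0.2]
variances = [0.04, 0.05, 0.04, 0.05]
for label, (pooled, i2) in leave_one_out(labels, effects, variances).items():
    print(f"without {label}: MD = {pooled:.2f}, I2 = {i2:.0f}%")
```

In this invented example, dropping trial D lowers I² to 0%, whereas dropping any other trial leaves heterogeneity considerable.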

All tests were two-tailed, and p ≤ 0.05 was considered statistically significant. Statistical analyses were conducted in Review Manager (version 5.3) and Comprehensive Meta-Analysis (version 2.0).

Results

Search results

We identified 1337 records. After removing 283 duplicates, we screened the titles and abstracts of 1054 records and excluded 1022 that did not meet the inclusion criteria. Of the 32 full-text articles assessed, 19 were excluded. Finally, 13 distinct RCTs were included in the systematic review and meta-analysis [13,14,15,16, 25,26,27,28, 35,36,37,38,39]. The study selection process is shown in Fig. 1.

Fig. 1

Flow diagram of the literature selection process

Study characteristics

The 13 RCTs recruited a total of 1624 participants (814 in the intervention groups and 810 in the control groups), with sample sizes ranging from 34 to 385. Five trials [15, 16, 25, 27, 28] compared PVP with a sham vertebroplasty procedure, and the other 8 compared PVP with conservative therapy [13, 14, 26, 35,36,37,38,39]. All trials except Leali [26] clearly described baseline characteristics. Most participants were female, and mean ages ranged from 65 to 81 years. Mean pain duration varied across trials, from around 5.5 days [35] to 208.2 days [36]. Hansen 2016 [25] did not report the mean duration of pain, but its inclusion criteria specified a pain duration of 8 weeks or less. Mean baseline pain scores were similar across trials and exceeded 7 out of 10 in all trials except Blasco [37] and Chen [36]. Follow-up ranged from 2 weeks to 36 months. The main characteristics of the included trials are summarized in Table 1.

Table 1 Characteristics of the studies included in systematic review and meta-analysis

Risk of bias assessment

Among the included studies, some trials lacked information on random sequence generation (n = 4) [13, 14, 26, 36], allocation concealment (n = 4) [26, 35,36,37], withdrawals (n = 1) [26], or trial protocols (n = 6) [13, 14, 26, 35, 36, 38], so the risk of bias for these items was rated unclear. Owing to their open-label design, 8 studies (61.5%) had a high risk of performance and detection bias [13, 14, 26, 35,36,37,38,39]. One study had a high risk of attrition bias because of substantial loss to follow-up [13]. One trial had a high risk of reporting bias [28] because its publication did not report all the outcomes pre-specified in its protocol. Two studies had a high risk of other bias because baseline characteristics were imbalanced between the treatment and control groups [13, 39]. Details of the risk of bias assessment are shown in Fig. 2.

Fig. 2

The methodological quality of the RCTs. Risk of bias summary. “+” means low risk; “?” means unclear risk; “−” means high risk

Pain relief

PVP versus sham injection

Pain relief was reported in 5 studies [15, 16, 25, 27, 28]. The Vertebroplasty for Acute Painful Osteoporotic fractURes (VAPOUR) trial [28] was clinically different from the other blinded trials [15, 16, 25, 27], with more acute fractures and worse pain scores, so it was analyzed separately as its own subgroup. In the VAPOUR trial [28], PVP provided significantly greater pain relief than sham injection at 1 to 2 weeks (MD, −1.20; 95% CI, −2.26 to −0.14; Fig. 3a), 1 to 3 months (MD, −1.30; 95% CI, −2.56 to −0.04; Fig. 3b), and 6 to 12 months (MD, −1.30; 95% CI, −2.54 to −0.06; Fig. 3c). In contrast, pooling the other 4 trials [15, 16, 25, 27] showed no significant improvement in pain with PVP in the short term (MD, 0.01; 95% CI, −0.48 to 0.50; I² = 0%; 4 trials; Fig. 3a), medium term (MD, −0.41; 95% CI, −0.92 to 0.10; I² = 0%; 4 trials; Fig. 3b), or long term (MD, −0.53; 95% CI, −1.14 to 0.08; I² = 0%; 3 trials; Fig. 3c). Although these 4 trials did not demonstrate a benefit of PVP, the pooled MD moved progressively in favor of PVP over time, from 0.01 (95% CI, −0.48 to 0.50) in the short term to −0.53 (95% CI, −1.14 to 0.08) in the long term.
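When only an MD and its 95% CI are reported, an approximate two-sided p value can be recovered. This assumes a symmetric normal-theory CI, so that SE = (upper − lower)/(2 × 1.96) and z = MD/SE. The Python sketch below applies this to two of the 6 to 12 month estimates reported above. Because the published figures are rounded, the recovered p values are approximate and were not reported by the trials themselves.

```python
import math


def p_from_ci(md, lower, upper, z95=1.959963984540054):
    """Approximate two-sided p value for H0: MD = 0 from a symmetric 95% CI."""
    se = (upper - lower) / (2 * z95)          # CI half-width is z95 * SE
    z = md / se
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))


# MDs and 95% CIs at 6 to 12 months, as reported in the text.
print(round(p_from_ci(-1.30, -2.54, -0.06), 3))  # VAPOUR trial
print(round(p_from_ci(-0.53, -1.14, 0.08), 3))   # other four sham-controlled trials
```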

Fig. 3

Forest plot for the pain relief at 1 to 2 weeks (a), 1 to 3 months (b), and 6 to 12 months (c). PVP, percutaneous vertebroplasty; IV, inverse variance

PVP versus conservative therapy

Pain relief was reported in 7 RCTs [13, 14, 35,36,37,38,39]. Pooled results indicated that patients treated with PVP had greater pain relief than those receiving conservative therapy at 1 to 2 weeks (MD, −1.83; 95% CI, −2.29 to −0.97; I² = 91%; 6 trials; Fig. 4), 1 to 3 months (MD, −1.60; 95% CI, −2.02 to −1.18; I² = 68%; 6 trials; Fig. 4), and 6 to 12 months (MD, −1.33; 95% CI, −1.71 to −0.95; I² = 63%; 6 trials; Fig. 4). In sensitivity analyses, omitting Blasco [37] markedly changed the pooled effect size at 1 to 2 weeks (Fig. S1), omitting Farrokhi [38] markedly changed it at 1 to 3 months (Fig. S2), and omitting either Blasco [37] or Chen [36] markedly affected the pooled results at 6 to 12 months (Fig. S3). Subgroup analyses (Table 2) suggested that PVP was more effective in patients with a shorter duration of fracture (≤ 6 weeks) than in those with a longer duration (> 6 weeks), although this advantage declined over time. Subgroup analyses also showed greater pain relief in trials with more severe baseline pain (> 7) than in trials with less severe pain (≤ 7) at all three time points (Table 2).
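The tests for subgroup differences in Table 2 compare pooled subgroup estimates. One common form is sketched below in Python, assuming that each subgroup contributes one pooled estimate with its SE. It computes Q_between = Σ w_g(θ_g − θ̄)², with w_g = 1/SE_g² and θ̄ the weighted mean of the subgroup estimates. For two subgroups (for example, fracture duration ≤ 6 vs > 6 weeks), Q_between is compared with a χ² distribution on 1 degree of freedom, whose upper tail equals erfc(√(Q/2)). The subgroup estimates in the example are hypothetical.

```python
import math


def subgroup_difference_test(estimates, ses):
    """Chi-squared test for a difference between two subgroup pooled estimates.

    Q_between = sum_g w_g * (theta_g - theta_bar)^2, with w_g = 1 / se_g^2 and
    theta_bar the inverse-variance weighted mean. Only two subgroups (df = 1)
    are handled, where the chi-squared(1) upper tail is erfc(sqrt(Q / 2)).
    """
    if len(estimates) != 2 or len(ses) != 2:
        raise ValueError("this sketch handles exactly two subgroups")
    w = [1 / s ** 2 for s in ses]
    theta_bar = sum(wi * t for wi, t in zip(w, estimates)) / sum(w)
    q = sum(wi * (t - theta_bar) ** 2 for wi, t in zip(w, estimates))
    p = math.erfc(math.sqrt(q / 2))
    return q, p


# Hypothetical pooled MDs (and SEs) for fractures <= 6 weeks vs > 6 weeks.
print(subgroup_difference_test([-2.5, -1.0], [0.3, 0.3]))
```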

Fig. 4

Forest plot for pain relief at all time points. PVP, percutaneous vertebroplasty

Table 2 Subgroup analyses for the pain relief

Publication bias

Visual inspection of the funnel plots (Figs. S4, S5, and S6) showed no evidence of publication bias, and Egger's test was not significant at 1 to 2 weeks (two-tailed p = 0.22), 1 to 3 months (p = 0.17), or 6 to 12 months (p = 0.25).
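Egger's test regresses each trial's standardized effect (effect/SE) on its precision (1/SE); an intercept far from zero suggests funnel-plot asymmetry. The Python sketch below fits this ordinary least-squares regression by hand and returns the intercept, its SE, and the t statistic on k − 2 degrees of freedom. The data are hypothetical. The p value is left to a t-distribution routine such as SciPy's `scipy.stats.t.sf`, which the block does not import.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, SE of intercept, t).
    Without small-study effects the intercept should be near zero; a two-sided
    p value is 2 * scipy.stats.t.sf(abs(t), k - 2) if SciPy is available.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three trials")
    x = [1 / s for s in ses]                       # precision
    y = [e / s for e, s in zip(effects, ses)]      # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (k - 2)       # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + mx ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept


# Hypothetical effects (e.g. MDs) and their standard errors for five trials.
effects = [1.6, 0.9, 2.5 / 3, 0.8, 0.68]
ses = [1.0, 0.5, 1 / 3, 0.25, 0.2]
print(egger_test(effects, ses))
```

With only six trials per time point, as in this comparison, the test has limited power, so the funnel plots and the test are best read together.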

The rate of new vertebral fractures

Across 11 trials [13, 16, 25,26,27,28, 35,36,37,38,39], the rate of new vertebral fractures was slightly higher in the PVP group (116 fractures in 706 participants; 16.43%) than in the control group (111 fractures in 701 participants; 15.83%), but the pooled difference was not statistically significant (RR, 0.97; 95% CI, 0.63 to 1.49; I² = 60%; Fig. 5). The funnel plot and Egger's test showed no evidence of publication bias (p = 0.89; Fig. S7).

Fig. 5

Forest plot for the rate of occurrence of new vertebral fractures. PVP, percutaneous vertebroplasty

Sensitivity analyses showed that omitting Blasco [37] markedly affected the pooled results (Fig. S8): without this trial, heterogeneity decreased from substantial (I² = 60%) to not important (I² = 19%). Meta-regression analyses found no statistically significant association between the treatment effect and mean age (10 trials, p = 0.89 for slope), proportion of female participants (10 trials, p = 0.19 for slope), mean pain duration (9 trials, p = 0.93 for slope), or mean baseline pain score (10 trials, p = 0.13 for slope). Subgroup analyses and tests for subgroup differences likewise suggested that the effect did not differ by type of fracture (clinical or radiological), duration of follow-up (6, 12, or 24 months), or type of control (sham injection or conservative therapy) (Table 3).

Table 3 Subgroup analyses for the rate of new vertebral fractures

Discussion

Main findings

We reviewed 13 RCTs involving a total of 1624 patients to evaluate PVP compared with conservative therapy or sham injection for the treatment of OVCFs. The VAPOUR trial [28], the only sham-controlled trial to find a significant difference, suggested that PVP was superior to the sham procedure for pain reduction, whereas the pooled results of the other four sham-controlled trials showed no significant benefit. Compared with conservative therapy, PVP provided greater pain relief as early as 1 to 2 weeks, and the difference remained significant at 6 to 12 months. Because the benefit of PVP was clearly smaller in trials with a sham injection control than in trials with a conservative therapy control, the benefit observed in open-label trials might be related to placebo effects. In addition, PVP did not increase the risk of new vertebral fractures after the procedure.

Given the clinical heterogeneity between the VAPOUR trial [28] and the other 4 blinded trials [15, 16, 25, 27], further analysis suggested that there might be a time window in which PVP outperforms non-operative treatment. Because fractures heal naturally, the interval between fracture and treatment is likely an important factor influencing outcomes. This notion is consistent with the subgroup analyses: in both the sham-controlled trials (Fig. 3) and the conservative therapy-controlled trials (Table 2), PVP resulted in greater pain relief in patients with a shorter duration of fracture (≤ 6 weeks) than in those with a longer duration (> 6 weeks). Of the studies with fractures of ≤ 6 weeks' duration [13, 28, 35, 39], 3 [28, 35, 39] showed clear pain-reduction benefits from PVP. Although Rousing [13] found no significant difference, that study (47 patients) was underpowered, and baseline pain scores were significantly lower (p = 0.02) in the treatment group (mean, 7.5) than in the control group (mean, 8.8). In principle, PVP should also be more effective in patients with more severe pain caused by more severe vertebral collapse. In agreement with this idea, our subgroup analyses found greater pain relief in patients with more severe baseline pain. Baseline pain scores in the VAPOUR trial [28], the only blinded trial that found a significant difference between PVP and placebo, ranged from more than 8 to 8.6. By contrast, all the trials that failed to show a benefit of PVP [15, 16, 25, 27, 37] had baseline pain scores below 8, and the trial by Blasco recorded a baseline pain score below 7 [37].

The results of this trial [37] differed markedly from those of the other open-label studies [35, 36, 38, 39], suggesting that patients with less severe pain (< 7) might obtain less pain relief. If patients with more severe pain obtain greater relief and those with less severe pain obtain less, PVP would appear more effective in studies that enroll a lower proportion of patients with less severe pain. In other words, studies with different baseline pain thresholds for eligibility might obtain different results: PVP might appear more effective in studies with a stricter (higher) threshold and less effective in studies with a more lenient (lower) one, which is consistent with published data. The VAPOUR trial [28], which reported positive outcomes, used a strict threshold, recruiting patients with pain scores of 7 or more (out of 10). The trial by Firanescu [27], which reported negative outcomes, used a more lenient threshold, recruiting patients with pain scores of 5 or more (out of 10). Likewise, the study by Kallmes [15] used an even more lenient threshold, a pain score of 3 or more (out of 10), and also failed to show a benefit of PVP. Finally, although the other 4 blinded trials [15, 16, 25, 27] found no significant differences between the PVP and control groups, the pooled MD moved progressively in favor of PVP over time (Fig. 3), suggesting that PVP might outperform placebo in patients who need durable, stable pain relief.

Comparison with other studies

Several systematic reviews and meta-analyses have compared PVP with non-operative treatment for OVCFs [20,21,22,23,24]. Consistent with previous meta-analyses, we found that PVP was superior to conservative treatment at all time points [21, 22, 24] and did not increase the risk of new vertebral fractures after the procedure [20, 22, 24]. However, our systematic review differs from previous reviews in several respects. First, to make full use of the existing data, we used the time windows 1 to 2 weeks, 1 to 3 months, and 6 to 12 months instead of the individual time points 1 week, 2 weeks, 1 month, 3 months, 6 months, and 12 months, which could improve the generalizability and usefulness of our meta-analysis [40]. Second, although Xie et al. [20] and Mattie et al. [21] reported including more than 10 trials in their recent meta-analyses (12 and 11 RCTs, respectively), some of these were follow-up reports of the original trials; in fact, only 9 and 8 distinct RCTs, respectively, were included. By contrast, our study included 13 distinct eligible RCTs, and both the number of RCTs and the total sample size in the previous meta-analyses [20,21,22,23,24] were smaller than in our study. Third, Mattie et al. [21] used a change of 1.5 points on the VAS or NRS as the minimal clinically important difference (MCID) to assess the clinical value of PVP for pain relief. This threshold has also been widely used in individual trials [15, 16, 27] and in the current Cochrane review of PVP [41]. However, the MCID has been argued to apply to changes within individual patients rather than to differences between group means [42, 43], and it might therefore not be appropriate for use in a meta-analysis [43]. We therefore did not apply an MCID threshold.

Finally, to account for clinical heterogeneity, we reported blinded and open-label studies separately. Because the VAPOUR trial [28] was clinically different from the other blinded trials [15, 16, 25, 27] (more acute fractures and worse pain scores), we also analyzed it as a separate subgroup. This approach suggested that baseline pain scores and fracture duration might be key factors affecting the efficacy of PVP, and that PVP might be more effective than placebo at long-term follow-up.

Limitations

Our study has several limitations. First, the included studies had methodological limitations, such as unclear randomization methods, unclear allocation concealment, and inadequate blinding. Second, publication bias cannot be excluded because of ongoing and unpublished trials, although no statistical evidence of it was detected. Third, although baseline pain scores and fracture duration might be key factors affecting the effect of PVP, other clinically relevant but unreported confounders may exist, such as patient characteristics, bone mineral density, and PVP technique. Moreover, owing to the limited number of studies, whether baseline pain scores and fracture duration truly modify the effect of PVP remains to be determined by further trials. Given these limitations, the results of this meta-analysis should be interpreted cautiously.

Implications for future studies

Given the limitations of current research, further studies are needed to confirm our findings and explore the clinical utility of PVP, with attention to the following points.

First, since our findings suggest that PVP might be effective only in patients with acute OVCFs and severe pain, further studies should focus on these patients and use more consistent inclusion criteria. For example, each study should require confirmation of OVCFs with magnetic resonance imaging (MRI) as an inclusion criterion and apply the same baseline pain threshold for eligibility. Second, since baseline characteristics could affect the effect of PVP, further studies should report them clearly, including the duration of back pain, baseline pain scores, the type of non-operative therapy, the PMMA volume, the baseline T-score, and other clinically relevant factors, so that their association with the effect of PVP can be assessed. Third, since the pooled results showed a trend toward a larger effect of PVP over time, further studies with longer follow-up (more than 12 months) and adequate sample sizes are needed to determine its long-term therapeutic effects. Finally, as noted above, using the change in mean pain scores directly as the outcome might not be appropriate [42,43,44]; further studies should therefore use an outcome suited to assessing clinical improvement after PVP. Reporting the proportion of patients who achieve the MCID might help, as it provides a more interpretable result with direct clinical implications [44] and also makes comparisons between groups easier [43].
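The Python sketch below illustrates why a group-level mean difference can hide individual responses. It counts responders in two hypothetical arms with the same mean change, using a pain reduction of at least 1.5 points (the threshold discussed above) as the response criterion. It then compares responder proportions with a risk difference and a Wald 95% CI. The data and function names are invented for illustration. The Wald interval is a simple choice that performs poorly with small samples.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def responders(baseline, follow_up, mcid=1.5):
    """Number of patients whose pain score fell by at least `mcid` points."""
    return sum(1 for b, f in zip(baseline, follow_up) if b - f >= mcid)


def mean_change(baseline, follow_up):
    """Mean within-patient reduction in pain score."""
    return sum(b - f for b, f in zip(baseline, follow_up)) / len(baseline)


def risk_difference(r_t, n_t, r_c, n_c):
    """Difference in responder proportions with a simple Wald 95% CI."""
    p_t, p_c = r_t / n_t, r_c / n_c
    rd = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return rd, rd - Z95 * se, rd + Z95 * se


# Two hypothetical arms of five patients with the same mean reduction (2 points).
base = [8.0] * 5
arm_a = [6.0, 6.0, 6.0, 5.2, 6.8]   # fairly uniform responses
arm_b = [4.0, 4.0, 8.0, 8.0, 6.0]   # some large responses, some none
print(mean_change(base, arm_a), mean_change(base, arm_b))
print(responders(base, arm_a), responders(base, arm_b))   # 4 3
print(risk_difference(responders(base, arm_a), 5, responders(base, arm_b), 5))
```

Both arms have the same mean change, yet their responder proportions differ. This is the information that a responder-based outcome would make explicit.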

Conclusion

Among patients with OVCFs, PVP showed variable outcomes. The procedure benefited patients with acute OVCFs and persistent, severe pain, but not patients with older fractures or less severe symptoms. Further research is recommended to confirm this finding and to explore the clinical utility of PVP.