Introduction

Lumbar fusion with autogenous iliac crest bone graft (ICBG) has been developed over several decades and has become the gold standard surgical procedure for patients with symptomatic degenerative disc disease (DDD), spondylolisthesis, and other painful discogenic syndromes. The goal of fusion is to attain rigid union of the degenerative and unstable motion segments in order to reduce or eliminate pain and decrease disability [1–3]. However, lumbar arthrodesis with ICBG has several serious shortcomings, including donor-site morbidity and a relatively high frequency of nonunion [4–6].

To prevent the morbidity associated with iliac crest harvesting and nonunion, graft substitutes such as local bone, demineralized bone matrix (DBM), and platelet gel have been advocated. However, no substitute has yet been proven superior to iliac crest graft [7–10]. Urist et al. [11] described bone morphogenetic protein (BMP) in 1965, and clinical applications of BMP have been developed over the more than 45 years since. Currently, the most widely used BMP in clinical practice is recombinant human bone morphogenetic protein-2 (rhBMP-2), an osteoinductive bone growth factor that is a member of the transforming growth factor-β superfamily [12, 13]. In previous clinical studies, rhBMP-2 delivered in a variety of carriers has been shown to achieve fusion rates superior to those of ICBG in lumbar fusion across different surgical procedures [14–18]. However, FDA-approved clinical use of rhBMP-2 has been limited since 2002 to anterior lumbar interbody fusion with proprietary lordotic tapered cages (LT cages, Medtronic Sofamor Danek, Memphis, TN), which means that rhBMP-2 use for other lumbar fusion procedures remains off-label. In addition, some potential complications associated with the use of rhBMP-2 in spinal fusion have been reported and highlighted by independent studies and the FDA. Given the limitations of the original controlled studies, which include variations in study design, sample size, and methods of outcome measurement, it is still uncertain whether rhBMP-2 can be regarded as a fully satisfactory alternative to ICBG in clinical use.

The purpose of this study is to evaluate the effectiveness and, more importantly, safety of rhBMP-2 compared with ICBG in lumbar fusion. We sought to evaluate all randomized controlled trials (RCTs) on this topic to determine whether rhBMP-2 can ultimately replace ICBG.

Materials and methods

Literature search

Relevant randomized controlled trials (RCTs) published up to February 2012 were identified from PubMed, EMBASE, and the Cochrane Central Register of Controlled Trials. The references of the retrieved articles were also hand-searched. The search keywords were: lumbar fusion, recombinant human bone morphogenetic protein-2, rhBMP-2, bone substitute. Two reviewers independently checked all titles, abstracts, and the full text of potentially eligible articles to decide which trials fit the inclusion criteria. Disagreement was resolved by discussion; if necessary, a third independent investigator made the decision.

Selection of studies

RCTs published in English were regarded as eligible according to the following criteria: (1) participants were adults who underwent lumbar fusion for degenerative disease; (2) the intervention was lumbar fusion using rhBMP-2 as a substitute for ICBG in the treatment group; (3) the study reported at least one outcome of interest. Studies were excluded if (1) participants had an acute spinal fracture, infection, tumor, osteoporosis, or rheumatoid arthritis; (2) only the abstract was available.

Data extraction

The following data were extracted independently by the two reviewers: participant characteristics; sample size of each intervention group; follow-up rate and time (months); characteristics and details of each intervention; industry sponsorship and financial interest. The primary outcome was fusion failure assessed by radiographs and computed tomography (CT) scans; the secondary outcomes were reoperation and clinical improvement of the Oswestry Disability Index (ODI) [19] (defined as a decrease of at least 15 % over the preoperative score). The adverse events reported in the studies were also recorded.

Risk of bias assessment

Risk of bias was independently assessed by the two reviewers according to the 12 criteria and the instructions recommended by the Cochrane Back Review Group (CBRG) [20]. Each criterion was scored as yes, unclear, or no, where yes indicates that the criterion was met. Studies were rated as having a “low risk of bias” when at least 6 of the 12 CBRG criteria were met and the study had no serious flaws. Studies with serious flaws or those in which fewer than six of the criteria were met were rated as having a “high risk of bias”. Disagreement was resolved by discussion.
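The rating rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rate_risk_of_bias`, the score labels, and the `serious_flaw` flag are hypothetical and are not part of the CBRG instrument itself; the judgment of what counts as a serious flaw remains with the reviewers.

```python
def rate_risk_of_bias(scores, serious_flaw=False):
    """Rate one study from its 12 CBRG criterion scores.

    scores: list of strings, each "yes", "unclear", "no", or "na"
            (hypothetical labels; "na" marks a criterion scored as
            not applicable).
    serious_flaw: reviewer judgment, e.g. a very high dropout rate.
    """
    criteria_met = sum(1 for s in scores if s == "yes")
    # Low risk requires at least 6 criteria met AND no serious flaw.
    if criteria_met >= 6 and not serious_flaw:
        return "low"
    return "high"


# A study meeting six criteria is rated "low" unless a serious flaw is judged present.
six_met = ["yes"] * 6 + ["no"] * 5 + ["na"]
print(rate_risk_of_bias(six_met))                     # low
print(rate_risk_of_bias(six_met, serious_flaw=True))  # high
```

The second call mirrors the situation in which a study meets six criteria but is still rated as high risk because of a serious flaw.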

Clinical relevance

The five questions recommended by the Cochrane Back Review Group [20] were used to assess the clinical relevance of the included studies. Positive (+) indicated that the clinical relevance item was met, negative (−) that the item was not met, and unclear (?) that the data were inadequate to answer the question.

Data analysis

For dichotomous outcome data (fusion failure rate, reoperation, clinical improvement in ODI), relative risk (RR) and 95 % confidence interval (95 % CI) were calculated. For continuous outcomes, weighted mean difference (WMD) and 95 % CI were calculated. A fixed-effect model, which is more likely to reveal a difference between the two interventions, was used when there was no statistically significant heterogeneity. The random-effects model acknowledges that the effect of treatment may not be identical from study to study because of heterogeneity, and it typically results in a more conservative estimate [21]. Therefore, the data were analyzed with a random-effects model when heterogeneity existed. We assessed statistical heterogeneity using the Q statistic and the I² statistic [22], with significance thresholds of 0.1 and 50 %, respectively. A funnel plot and statistical tests (Egger’s test [23] and Begg’s test [24]) were used to explore potential publication bias [25]. To assess the stability of the overall result if publication bias existed, we corrected the summary results by the trim and fill method [26]. Because different surgical procedures or total doses of rhBMP-2 may influence the summary effect differently, subgroup analyses were conducted. To detect whether the combined RRs of the subgroups differed significantly, tests of interaction developed by Altman et al. [27] were performed. We also performed sensitivity analyses to assess the changes in the overall result by omitting studies. Data analyses were performed using Review Manager 5.1 and STATA 12.0.
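As a rough illustration of the pooling steps described above, the following Python sketch computes per-trial log RRs, a fixed-effect summary, Cochran's Q, I², and a random-effects summary. Several assumptions apply: it uses generic inverse-variance weighting and the DerSimonian-Laird estimator, which may differ from the weighting actually applied in Review Manager or STATA; the trial counts are invented for illustration and are not data from the included studies; and the model choice uses only the 50 % I² threshold, leaving out the Q-test p value to avoid extra dependencies.

```python
import math


def log_rr(events_t, n_t, events_c, n_c):
    """Log relative risk and its standard error from one trial's 2x2 counts."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return math.log(rr), se


def pool(trials, z=1.96):
    """Pool log RRs by inverse variance; report Q, I^2, and both model estimates."""
    effects = [log_rr(*t) for t in trials]
    y = [lr for lr, _ in effects]
    w = [1 / se ** 2 for _, se in effects]

    # Fixed-effect estimate and Cochran's Q around it.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(trials) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance for the random-effects model.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (1 / wi + tau2) for wi in w]
    random_est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

    def summary(est, weights):
        # Back-transform the estimate and its 95 % CI to the RR scale.
        se = math.sqrt(1 / sum(weights))
        return tuple(math.exp(v) for v in (est, est - z * se, est + z * se))

    return {"Q": q, "I2": i2, "tau2": tau2,
            "fixed": summary(fixed, w), "random": summary(random_est, w_re)}


# Hypothetical counts: (failures_rhBMP2, n_rhBMP2, failures_ICBG, n_ICBG).
trials = [(5, 100, 10, 100), (8, 120, 15, 118), (3, 60, 7, 62)]
result = pool(trials)
model = "random" if result["I2"] > 50 else "fixed"
rr, lo, hi = result[model]
print(f"I2 = {result['I2']:.0f} %, {model}-effect RR = {rr:.2f} ({lo:.2f}-{hi:.2f})")
```

The sketch does not handle trials with zero events in either arm; a continuity correction would be needed in that case.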

Results

From computer and manual searching, 1,053 publications were obtained. Based on titles and abstracts, 576 publications were immediately excluded. Of the 477 potentially relevant publications, 464 were excluded according to the inclusion criteria. Full texts of the remaining 13 studies were reviewed by the authors. Two of them were duplicate publications on the same set of participants [28, 29] and one study [30] only reported the data of the control group. Finally, ten RCTs [14–18, 31–35] involving 1,342 patients met our inclusion criteria (Fig. 1).

Fig. 1 Flowchart of the study selection process

Basic demographics and follow-up information are shown in Table 1. Patients in nine RCTs [14–18, 31–33, 35] had symptomatic, single-level lumbar degenerative disc disease (DDD) with low back or leg pain, or both, and had failed to respond to nonoperative treatment for at least 6 months. Additionally, seven RCTs [14, 15, 18, 31–33, 35] had patients with spondylolisthesis classified no higher than Meyerding grade 1 [36]. Glassman et al. [34] enrolled patients over 60 years of age with single-/two-level lumbar DDD, stenosis, and adjacent level fusion. Patients underwent anterior lumbar interbody fusion (ALIF) in three studies [16–18], and instrumented posterolateral fusion (iPLF) or posterior lumbar interbody fusion (PLIF) in seven studies [14, 15, 31–35]. The total doses of rhBMP-2 in five studies were fixed [14, 15, 31–33], from 12 to 40 mg. In four studies [16–18, 35], the total doses, from 1.95 to 12 mg, varied with the size of the fusion device. However, the concentration of rhBMP-2 remained constant within each study. In six studies [14–16, 31, 33, 35], all of the local bone removed as a result of the decompression was discarded; one study [34] used available local bone in all cases; the remaining three RCTs [17, 18, 32] did not give an explanation. Both the treatment and control groups used pedicle instrumentation in six studies: Cotrel-Dubousset Horizon pedicle screws and rods (CD Horizon Spinal System; Medtronic Sofamor Danek, Memphis, TN) were applied in four [15, 31–33], one study [14] used Texas Scottish Rite Hospital pedicle screw instrumentation (TSRH Spinal System; Medtronic Sofamor Danek, Memphis, TN), and one study [34] only stated that all cases used the same screw/rod implant system. The details of the operative characteristics are presented in Table 2.

Table 1 Basic characteristics of included studies
Table 2 Operative details of included studies

In one study [14], there were two treatment groups (rhBMP-2/TSRH group and rhBMP-2 only group) and one control group (ICBG/TSRH group). To avoid heterogeneity, the rhBMP-2 only group was omitted from this meta-analysis.

The risk of bias assessment of the included studies is presented in Table 3. Given that compliance can be considered irrelevant for surgical interventions [20], Criterion 11 was scored as not applicable. Although the study by Dimar et al. [15] met six criteria, it was rated as having a “high risk of bias” because of its high dropout rate (35 %), which could be considered a serious flaw. Among the ten eligible RCTs, four trials [17, 18, 31, 33] reported adequate generation of the allocation sequence, and two trials [17, 31] reported allocation concealment. Only one trial [17] blinded the care providers and outcome assessors, and one trial [32] did not mention the blinding method. Finally, seven studies were rated as having a “low risk of bias” [14, 16–18, 31, 33, 35] and three studies were rated as having a “high risk of bias” [15, 32, 34].

Table 3 Risk of bias assessment and conflicts of interest of included studies

Details on industry funds and financial interest are also presented in Table 3. Industry funds were received in eight studies [14, 15, 17, 18, 31–34] and one or more authors had a financial interest in seven studies [14, 15, 18, 31–33, 35].

The clinical relevance of the included studies is presented in Table 4. In two trials [17, 32], complications as a main clinical outcome were not fully reported. In the study reported by Haid et al. [35], the reviewers considered that the likely treatment benefits were not worth the potential harms. This was attributed mainly to the relatively low rate of satisfaction in the rhBMP-2 group (72.4 % vs. 80.0 %), as well as the significantly higher rate of new bone formation into the canal or neuroforamina (24 of 32 rhBMP-2 patients, 75.0 %; 4 of 31 ICBG patients, 12.9 %; p < 0.0001).

Table 4 Clinical relevance

Fusion failure

Apart from plain or dynamic radiographs, all included studies incorporated computed tomography (CT) scans, which may increase the accuracy of fusion assessment. Because the length of postoperative follow-up may affect the fusion rate, we pooled the relevant data at postoperative 6, 12 and 24 months. Seven studies reported data on fusion failure at 6 months [16–18, 31–33, 35], seven studies provided relevant data at 12 months [14, 16, 17, 31–33, 35], and data at 24 months were obtained from nine studies [14–18, 31, 33–35]. The fusion failure rate at postoperative 6 months in the rhBMP-2 group was significantly lower than that of the ICBG group (p < 0.0001, RR = 0.55, 95 % CI = 0.42–0.72; Fig. 2). Significant differences were also found at 12 months (p = 0.0003, RR = 0.53, 95 % CI = 0.37–0.75) and 24 months (p < 0.00001, RR = 0.31, 95 % CI = 0.21–0.46; Figs. 3 and 4). Heterogeneity was absent or moderate during follow-up (6 months: I² = 0 %; 12 months: I² = 46 %; 24 months: I² = 0 %).

Fig. 2 Forest plot of fusion failure at 6 months

Fig. 3 Forest plot of fusion failure at 12 months

Fig. 4 Forest plot of fusion failure at 24 months

Reoperation

Reoperation rates were available from all included studies [14–18, 31–35]. The combined result showed a significantly lower rate of reoperation in the rhBMP-2 group in comparison to the ICBG group (p = 0.0001, RR = 0.52, 95 % CI = 0.37–0.72); no heterogeneity was found (I² = 0 %; Fig. 5).

Fig. 5 Forest plot of reoperation

Clinical improvement on ODI

Clinical improvements on ODI were summarized from four studies [16, 18, 33, 35]. Clinical improvement was defined as a decrease of at least 15 % from the preoperative ODI score. The summary RR and 95 % CI showed a favorable trend in the rhBMP-2 group, although the difference was not statistically significant (p = 0.12, RR = 0.73, 95 % CI = 0.49–1.08). There was no significant heterogeneity between trials (I² = 0 %; Fig. 6).

Fig. 6 Forest plot of the Oswestry Disability Index at 12–24 months

Clinical outcomes

Clinical outcomes at postoperative 24 months are shown in Table 5. Not surprisingly, the improvement in each clinical outcome at the 24-month follow-up interval, as compared to preoperative scores, was statistically significant for both intervention groups. Eight studies [15–18, 31, 33–35] reported data on the mean improvement of the ODI score. Seven of them [15–18, 31, 33, 34] compared the mean improvement of the two intervention groups, and none found a significant difference. Two studies, by Dimar 2nd et al. [31] and Burkus et al. [16], stated only that the ODI scores were similar in both groups over all time intervals and did not report a specific p value. Four studies [15, 17, 33, 34] presented and compared the mean improvement of the SF-36 PCS, and only one study [17] showed a significant difference (p = 0.015). One study by Dimar 2nd et al. [31] did not provide the corresponding data, but stated that the SF-36 PCS scores were similar in both groups over all time intervals. Eight studies [14–17, 31, 33–35] provided data on the mean improvement of the NRS back pain score. Of these, two RCTs [17, 35] showed significantly better improvement with rhBMP-2 than with ICBG (p = 0.009 and p = 0.032, respectively). In the study by Burkus et al. [17], patients treated with rhBMP-2 had a significantly improved NRS leg pain score compared to those treated with ICBG, but there were no significant differences in the remaining six studies that reported the corresponding data [15, 16, 31, 33–35].

Table 5 Clinical outcomes

Subgroup analysis

In subgroup analyses, we divided all included studies into two groups, namely “posterolateral fusion group” and “anterior fusion group.” At 6 months follow-up, the summary RRs on fusion failure were statistically significant (anterior subgroup: p = 0.02, RR = 0.38, 95 % CI = 0.16–0.86; posterolateral subgroup: p = 0.0002, RR = 0.58, 95 % CI = 0.44–0.77). No heterogeneity was observed within either subgroup (I² = 3 % and I² = 0 %, respectively; Fig. 2). At postoperative 12 months, there was a significant difference between the two intervention groups on fusion failure in both the anterior subgroup (p = 0.01, RR = 0.28, 95 % CI = 0.11–0.74) and the posterolateral subgroup (p = 0.007, RR = 0.59, 95 % CI = 0.41–0.87). However, heterogeneity was found in the posterolateral subgroup (p = 0.09, I² = 51 %; Fig. 3). At 24 months follow-up, significant differences were found in both subgroups (anterior subgroup: p = 0.0002, RR = 0.25, 95 % CI = 0.12–0.52; posterolateral subgroup: p < 0.00001, RR = 0.34, 95 % CI = 0.22–0.53). Heterogeneity was present in the anterior subgroup (p = 0.10, I² = 57 %; Fig. 4). The subgroup analysis on reoperation demonstrated that, in both subgroups, the rate of reoperation in the rhBMP-2 group was significantly lower than that of the ICBG group (anterior subgroup: p = 0.02, RR = 0.48, 95 % CI = 0.26–0.89; posterolateral subgroup: p = 0.002, RR = 0.54, 95 % CI = 0.36–0.79). However, heterogeneity was found in the anterior subgroup (p = 0.12, I² = 52 %; Fig. 5). With regard to the ODI clinical improvement, the summary RRs stratified by the type of operative procedure were also not statistically significant (anterior subgroup: p = 0.59, RR = 0.86, 95 % CI = 0.50–1.49; posterolateral subgroup: p = 0.07, RR = 0.58, 95 % CI = 0.32–1.04), with no heterogeneity (Fig. 6).

We repeated the subgroup analysis according to the total dose of rhBMP-2. One study reported by Glassman et al. [34] was omitted because the total dose of rhBMP-2 was not provided. Of the remaining nine studies, four studies [14, 15, 31, 32] and five studies [16–18, 33, 35] were allocated to the “high-dose subgroup (40 mg)” and “low-dose subgroup (<40 mg)”, respectively. On the primary outcome, both subgroups showed a significant reduction in the fusion failure rate at postoperative 12–24 months (high-dose subgroup: p < 0.0001, RR = 0.34, 95 % CI = 0.20–0.57; low-dose subgroup: p < 0.00001, RR = 0.25, 95 % CI = 0.13–0.46) and no heterogeneity was found (Fig. 7).

Fig. 7 Forest plot of the subgroup analysis on different total doses of rhBMP-2

Adverse events

Eight studies reported data on adverse events [14–16, 18, 31, 33–35]. All the authors of the included studies declared that no unanticipated rhBMP-2-related adverse events occurred in either intervention group. Owing to the different data formats, meta-analysis was not performed. The details are presented in Table 6. Glassman et al. [34] reported that multiple complications were observed in six ICBG patients but in none of the rhBMP-2 patients. A rate of retrograde ejaculation (RE) of 4.1 % (6 of 146) was reported by Burkus et al. [16] in the entire cohort who underwent ALIF, without comparison between the two intervention groups.

Table 6 Category of adverse events

Sensitivity analysis

To evaluate whether the studies rated as having a high risk of bias significantly affected our results, we performed a sensitivity analysis by excluding these studies [15, 32, 34]. After their exclusion, the summary RR of fusion failure at 12 months was 0.57 (95 % CI = 0.39–0.82) and at 24 months was 0.28 (95 % CI = 0.17–0.45); neither was significantly different from the previous RRs.

Test of interaction

On the fusion failure rate, RRs in the respective subgroups at each postoperative interval were compared and no significant differences were found at 6 months follow-up (Z = −0.93, p = 0.176, 95 % CI = 0.27–1.58), at 12 months follow-up (Z = −1.43, p = 0.076, 95 % CI = 0.17–1.32), or at 24 months follow-up (Z = −0.7, p = 0.242, 95 % CI = 0.31–1.73).
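The comparison of two subgroup RRs can be sketched as follows, assuming the usual approach of working on the log scale and recovering each standard error from its reported 95 % CI. The function names are hypothetical, and the inputs are the 6-month anterior and posterolateral subgroup estimates reported in the subgroup analysis. Under these assumptions the sketch yields a Z value and a CI for the ratio of RRs close to those reported for the 6-month interval; the reported p value appears consistent with a one-sided tail probability, and a two-sided p value would be roughly twice as large.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log RR recovered from its 95 % CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(rr1, ci1, rr2, ci2, z=1.96):
    """Compare two independent RRs on the log scale."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(se_from_ci(*ci1, z) ** 2 + se_from_ci(*ci2, z) ** 2)
    z_stat = diff / se
    ratio = math.exp(diff)  # ratio of relative risks
    ci = (math.exp(diff - z * se), math.exp(diff + z * se))
    # One-sided tail probability of the standard normal distribution.
    p_one_sided = 0.5 * math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, ratio, ci, p_one_sided


# 6-month fusion failure: anterior vs. posterolateral subgroup.
z_stat, ratio, ci, p = interaction_test(0.38, (0.16, 0.86), 0.58, (0.44, 0.77))
print(f"Z = {z_stat:.2f}, ratio of RRs = {ratio:.2f}, "
      f"95 % CI = {ci[0]:.2f}-{ci[1]:.2f}, one-sided p = {p:.3f}")
```

Because the CI for the ratio of RRs includes 1, the sketch agrees with the conclusion that the treatment effect did not differ significantly between approaches.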

Publication bias

The funnel plot of fusion failure at 24 months is presented in Fig. 8. Evidence of publication bias was found in both Egger’s test (p = 0.003 < 0.1) and Begg’s test (p = 0.048 < 0.05). After correcting for publication bias by the trim and fill method, the summary RR in both the fixed-effect model and the random-effects model was 0.340 (95 % CI = 0.229–0.505), which was not materially different from the uncorrected result (0.31, 95 % CI = 0.21–0.46).

Fig. 8 Funnel plot on fusion failure

Discussion

Ten RCTs involving 1,342 patients that compared rhBMP-2 with ICBG for lumbar fusion were identified in this meta-analysis. Compared with ICBG, the use of rhBMP-2 significantly reduced the risk of fusion failure at all time intervals and the risk of reoperation. Subgroup analyses stratified by the type of surgical procedure yielded similar results. A greater ODI clinical improvement was found in the rhBMP-2 group than in the ICBG group, but the difference was not statistically significant. The results from the test of interaction demonstrated that the use of rhBMP-2 in each approach obtained higher fusion rates than ICBG, but there was no significant evidence to support a different treatment effect between the approaches. In clinical practice, the most important reasons why patients consider fusion surgery are pain relief and return of function. In this meta-analysis, all included studies that reported clinical outcomes (the ODI score, the SF-36 PCS score, or the NRS back pain and leg pain score) showed a significant improvement in the rhBMP-2 group when compared with the preoperative scores. However, significant differences between the rhBMP-2 group and the ICBG group on clinical outcomes were only found in the SF-36 PCS score reported by Burkus et al. [17], in the NRS back pain score reported by Burkus et al. [17] and Haid et al. [35], and in the NRS leg pain score reported by Burkus et al. [17].

In contrast to the anterior approach, variable factors in the posterolateral approach, such as extensive soft tissue stripping, limited bony surface area, and compression due to tension of the posterior muscles, produce a challenging environment for attaining robust fusion [32, 37]. Valuable experience from previous nonhuman primate trials in posterolateral spinal arthrodesis [38–40] has demonstrated that a stand-alone collagen sponge carrier was insufficient to obtain solid fusion. Therefore, for this approach, surgeons should use specific doses of rhBMP-2 combined with specially designed carriers. RhBMP-2 stimulates bone formation and healing by promoting the differentiation of primitive mesenchymal cells into osteoblasts by a cascade mechanism [41, 42]. Moreover, this effect requires a relatively stable environment. Osteoconductive bulking agents, such as biphasic calcium phosphate (BCP) granules and HA-TCP compression-resistant matrix (CRM), have been proven to resist the compression force of the posterior musculature and to ensure close contact between rhBMP-2 and the surface of the cancellous bone [43, 44]. Therefore, rhBMP-2 with osteoconductive materials can maximize the effect of osteoinduction. In this meta-analysis, studies that used osteoconductive materials in posterolateral lumbar fusion had successful fusion rates at 2 years follow-up (88–100 %), which is consistent with the results of previous animal trials [38, 43–45].

The purpose of this meta-analysis was to evaluate the effectiveness and, more importantly, safety of rhBMP-2 compared with ICBG in lumbar fusion. Unfortunately, due to the lack of data and the inconsistent data formats, a quantitative analysis of complications associated with rhBMP-2 use could not be conducted. Although none of the authors of the included studies directly attributed complications to rhBMP-2, and despite its potential advantages, the use of rhBMP-2 has been associated with various potential complications.

Haid et al. [35] identified unintended bone formation outside of the disc space by thin-cut CT scans in 24 of 32 rhBMP-2 patients (75 %) who underwent single-level PLIF, as compared to 4 of 31 (13 %) control patients. Although the authors stated that these findings were not associated with adverse outcomes, the difference in incidence was highly statistically significant (p < 0.0001). To our knowledge, heterotopic bone may form when rhBMP-2 spills from carriers into unwanted sites [46]. However, these complications were not included in the authors’ comments on adverse events related to rhBMP-2. In a case series, Wong et al. [47] identified five patients with ectopic bone formation in the spinal canal and neurological complaints after either PLIF or TLIF using rhBMP-2. Three patients underwent extensive revision surgery. Similarly, Chen et al. [48] presented four cases (4/147, 3 %) of bone formation in and around the neural foramen following the use of rhBMP-2 in TLIF. These patients reported neurological complaints and three of them required decompression. In view of this, this possible complication should be recognized when rhBMP-2 is used.

Bony resorption or osteolysis, which presumably results from excessive osteoclastic activity [46, 49], is another potential complication, although resorption or osteolysis may represent a normal remodeling process in a progression to fusion [46]. Severe resorption or osteolysis can also cause problems with graft subsidence and result in mechanical failure. In a 6-year follow-up study on 279 patients originally reported in 2002 [16], Burkus et al. [50] revealed seven adverse subsidence events (5.4 %), four of which required additional surgery. Interestingly, all these adverse events were reported to have occurred within the first 2 years [50], but were not reported in the original 2-year follow-up industry-sponsored publication [16]. In a nonindustry-supported prospective study, Vaidya et al. [51] reported significant lucency and subsidence in 70 % (14 of 20) of ALIF levels with rhBMP-2 (27 % had >10 % subsidence). By contrast, the mean subsidence without rhBMP-2 was 6 %.

Retrograde ejaculation (RE), which is secondary to trauma to the superior hypogastric plexus in the retroperitoneal space [52], is also a concern. In the RCT of ALIF comparing rhBMP-2 against ICBG with the LT-cage, Burkus et al. [16] reported a rate of RE of 4.1 % (6 of 146) in the entire cohort, without comparison between the two intervention groups. However, reanalyzing relevant data from the 2002 Food and Drug Administration document on the same cohort [53], Smoljanovic et al. [54] reported a high rate of RE associated with rhBMP-2 (7.9 %, rhBMP-2 group vs. 1.4 %, ICBG group; Fisher exact p = 0.05). Similarly, Carragee et al. [55] conducted a retrospective analysis on 243 patients who underwent ALIF with or without rhBMP-2; five RE events (7.2 %) were reported in the rhBMP-2 group and one (0.6 %) in the ICBG group (Fisher exact p = 0.0025). Although the incidence of RE might vary by approach, surgical technique, and comorbidities (such as diabetes), these statistically significant findings indicated a strong association of rhBMP-2 with RE.

The risk of malignancy with the use of high-dose (40 mg) rhBMP-2 in iPLF has already been highlighted by the FDA [56]. Dimar 2nd et al. [31] reported on 463 patients undergoing iPLF using a dose and concentration of rhBMP-2 (40 mg, 2 mg/ml) that was significantly higher than the dose and concentration in clinically available rhBMP-2/ACS (12 mg, 1.5 mg/ml). The authors found no difference in complication rates between the two intervention groups, and reported no BMP-related complications. In a retrospective case series of 1,037 cases (including the participants in the study by Dimar 2nd et al. [31]) in 2011, relevant statistical data from three studies [57–59] in which very large doses of rhBMP-2 (36–320 mg, mean: 91.2 mg) were used in spinal fusion were quoted by Glassman et al. [60] to support a conclusion that the use of high-dose rhBMP-2 did not increase the incidence of associated complications. However, on the same cohort in the study reported by Dimar 2nd et al. [31], the FDA document in 2010 reported a 3.8 % incidence (9/239) of new malignancy with the use of high-dose (40 mg) rhBMP-2 compared with 0.9 % (2/224) in controls (NNH < 33), meaning that there may be a real association with approximately 90–95 % probability [61]. Unfortunately, this finding was not mentioned by the original authors [31, 60].

One of the major concerns for surgeons is that the use of a new technology should bring benefits to patients while minimizing harm. The results of this meta-analysis demonstrated that the use of rhBMP-2 could significantly decrease the risk of fusion failure and reoperation, as well as achieve significant improvement in various clinical outcomes when compared with the preoperative score. However, statistics and conclusions on rhBMP-2 morbidity are inconsistent: the presentation of rhBMP-2 complications in the original industry-sponsored trials did not seem to fully reflect the data available from the FDA document and subsequent independent studies. In a systematic review focusing on the safety of rhBMP-2, Carragee et al. [61] concluded that original, industry-sponsored trials underestimated rhBMP-2-related adverse events. Evidence from multiple independent studies [47, 48, 51, 54, 55] indicated that there appears to be an increased risk of uncommon and serious complications with the use of rhBMP-2 in lumbar fusion. Therefore, it is difficult for us to determine the true nature and frequency of complications associated with rhBMP-2.

There are several limitations in this meta-analysis. First, only ten RCTs were identified, and they were limited to English-language publications. As a result, publication bias exists. Although the treatment effect may be overestimated [23], the results after correcting for this bias were not materially different from the uncorrected results. Second, three of the ten included studies were rated as having a “high risk of bias”. Furthermore, almost all included studies reported conflicts of interest, except the 2002 study by Burkus et al. [16]. As a result, to some extent, the presence of conflicts of interest and the potential for reporting bias may affect the reliability of the results. However, this kind of bias can hardly be detected by the risk of bias assessment form recommended by the CBRG [20]. As a consequence of these factors, additional high-quality independent RCTs with long-term outcomes and attention to complications are needed to further evaluate the efficacy and safety of rhBMP-2.

Conclusions

Compared with ICBG, the use of rhBMP-2 can significantly increase fusion success rate and simultaneously decrease the risk of reoperation. However, compared with those original publications, the subsequent studies and the FDA documents have presented a completely different pattern of complications associated with rhBMP-2 use. The complications seem to be more common and serious than reported in these included studies. Therefore, for surgeons, the advantages of rhBMP-2 as an alternative to ICBG in lumbar surgery must be cautiously weighed against the disadvantages. Furthermore, we must note that great efforts should be made to further define the relationship between rhBMP-2 use and potential complications.