Introduction

Lumbar spinal posterolateral fusion (PLF) with the use of bone grafts is a common surgery used to treat a variety of degenerative spinal disorders [1, 2]. Because fusion failure is a common postoperative complication, bone grafts such as iliac crest bone graft (ICBG) are used to increase the spinal fusion rate, and various bone graft substitutes such as bone morphogenetic proteins (BMPs) are used in an attempt to avoid autogenous ICBG-related morbidity during PLF [3–5].

Recombinant human bone morphogenetic protein-7 (rhBMP-7), which is also termed osteogenic protein-1 (OP-1), is one such osteoinductive protein. When combined with a collagen matrix, rhBMP-7 has been shown to increase the bone-healing rate and improve autograft performance in animal spinal fusion models [6]. In 2004, rhBMP-7 received a Humanitarian Device Exemption as an autograft alternative for revision PLF from the U.S. Food and Drug Administration (FDA) [7]. To date, rhBMP-7 has been used clinically for more than a decade. However, because only a few relevant clinical studies, each including a limited number of patients, have been published [12, 14, 15, 18, 20], the effectiveness of rhBMP-7 relative to autogenous bone grafts remains controversial and requires further investigation.

Hence, to evaluate the effectiveness of rhBMP-7 as a bone graft substitute in patients undergoing single-level PLF, we conducted this systematic literature review and meta-analysis. We collected all relevant randomized controlled trials (RCTs) from the published literature and compared patients who received rhBMP-7 (experimental group) with those who received autologous bone grafts (control group). The primary outcomes were fusion rates, clinical success rates, and safety and adverse event reports; the secondary outcomes were operation time and hospital stay duration.

Materials and methods

Search strategy and study selection

We searched the electronic databases PubMed, EMBASE, Scopus, and the Cochrane Collaboration Library from inception to July 1, 2016 for relevant randomized controlled trials (RCTs). We used the search terms (spine AND fusion) AND (BMP-7 OR bone morphogenetic protein OR osteogenic protein-1 OR OP-1) and limited the search results to humans and clinical research without language restrictions. In addition, the references of the retrieved articles were searched. Articles were considered eligible if they met the following criteria: (1) clinical research from RCTs; (2) study populations comprising adult patients with lumbar degenerative or isthmic spondylolisthesis who had undergone single-level PLF; and (3) a comparison of OP-1 and autologous bone graft. Articles were excluded if they met any of the following criteria: (1) inclusion of only minors in the study population; (2) inclusion of patients with spinal deformities, fractures, tumors, or infections; and (3) cases of spondylolisthesis classified higher than Meyerding Grade II [8].

Data extraction

We extracted the following information from the eligible articles (Table 1): (1) patient characteristics, including the diagnosis, age, gender, number of enrolled subjects and follow-up; (2) intervention (surgery) data, including the dose, carrier and concentration of OP-1, type of graft in the control group, number of fusion levels, and the presence or absence of pedicle instrumentation (Table 2); and (3) outcomes data, including fusion success, clinical success, safety and adverse event reporting, operation time, blood loss, and hospital stay duration. Fusion outcomes were evaluated based on plain films (static, dynamic) and computed tomography (CT) scans. To be considered a successful fusion, radiographic evidence of complete bridging bone between the transverse processes at the level of the spondylolisthesis was required, and comparison of the flexion–extension lateral X-ray films was required to reveal ≤5° of angulation and ≤2–3 mm of translation. A clinically successful result was defined as a minimum improvement of 20% from the preoperative Oswestry Disability Index (ODI) [9].
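The two success definitions above can be written as simple predicates. The following is a minimal sketch only: the function and parameter names are hypothetical, the 20% ODI improvement is assumed to be relative to the preoperative score (lower ODI being better), and the 3 mm translation default is an assumption taken from the upper end of the stated ≤2–3 mm range.

```python
def is_fusion_success(bridging_bone: bool,
                      angulation_deg: float,
                      translation_mm: float,
                      max_angulation_deg: float = 5.0,
                      max_translation_mm: float = 3.0) -> bool:
    """Radiographic fusion criterion as described in the text.

    max_translation_mm defaults to 3.0, which is an assumption: the text
    gives the limit only as a range (<=2-3 mm).
    """
    return (bridging_bone
            and angulation_deg <= max_angulation_deg
            and translation_mm <= max_translation_mm)


def is_clinical_success(pre_odi: float, post_odi: float,
                        min_improvement: float = 0.20) -> bool:
    """True if ODI improved by at least min_improvement of the baseline score.

    Assumes a relative improvement, (pre - post) / pre, with lower ODI
    meaning less disability.
    """
    if pre_odi <= 0:
        raise ValueError("preoperative ODI must be positive")
    return (pre_odi - post_odi) / pre_odi >= min_improvement
```

For example, a hypothetical patient whose ODI falls from 50 to 40 meets the 20% threshold, whereas a fall from 50 to 45 does not.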

Table 1 Characteristics of the Included Studies
Table 2 Investigated Operative Data in the Included Studies

Study quality assessment

We used the Physiotherapy Evidence Database (PEDro) scale to perform a methodologic quality assessment of each included RCT [10]; this scale was initially developed to rate the quality of RCTs in the Physiotherapy Evidence Database (http://www.pedro.fhs.usyd.edu.au). The scale contains 11 items, and each satisfied item (except the first item) contributes 1 point to the total PEDro score (range 0–10 points). Two independent reviewers assessed the quality of each eligible article and resolved discrepancies through consensus.
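The scoring rule described above (11 items, with the first item excluded from the total) can be sketched as follows. The function name and the list-of-booleans representation are illustrative assumptions, not part of the PEDro materials.

```python
def pedro_score(items_satisfied: list[bool]) -> int:
    """Total PEDro score (0-10) from the 11 item ratings.

    items_satisfied[0] corresponds to the first item, which is rated
    but does not contribute to the total score.
    """
    if len(items_satisfied) != 11:
        raise ValueError("the PEDro scale has exactly 11 items")
    # Count satisfied items 2-11 only; item 1 is skipped.
    return sum(1 for ok in items_satisfied[1:] if ok)
```

Under this sketch, a hypothetical trial satisfying every item would score 10, and a trial satisfying only the first item would score 0.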

Meta-analysis

The Cochrane Collaboration statistical program (Review Manager 5.3; available at tech.cochrane.org) was used for data analysis. Across all component studies, we performed the meta-analysis using the fixed-effects model if the test for heterogeneity was not significant; otherwise, we used the random-effects model. Forest plots were used to graphically present the results of the individual studies and the pooled estimates.
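The model-selection rule described above can be illustrated with a short sketch. It is not a reproduction of Review Manager's computation: it assumes inverse-variance pooling of log risk ratios with a DerSimonian–Laird random-effects estimate, and the heterogeneity significance threshold (alpha = 0.10) is an assumption because the text does not state one. All function names are hypothetical, and the sketch does not handle studies with zero events.

```python
import math


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (integer df >= 1)."""
    x = q / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-x)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(x))
    # Recurrence for the regularized upper incomplete gamma function.
    while a < df / 2.0:
        sf += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def pool_risk_ratio(studies, alpha=0.10):
    """Pool risk ratios from (events_trt, n_trt, events_ctl, n_ctl) tuples.

    Uses the fixed-effect estimate unless Cochran's Q test is significant
    at the (assumed) level alpha, in which case random effects are used.
    """
    y = [math.log((a / n1) / (c / n2)) for a, n1, c, n2 in studies]
    v = [1 / a - 1 / n1 + 1 / c - 1 / n2 for a, n1, c, n2 in studies]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    p_het = chi2_sf(q, df) if df > 0 else 1.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    model = "fixed"
    if p_het < alpha:
        # DerSimonian-Laird between-study variance.
        c_dl = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c_dl)
        w = [1 / (vi + tau2) for vi in v]
        model = "random"

    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {
        "model": model,
        "rr": math.exp(est),
        "ci": (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)),
        "i2": i2,
        "p_het": p_het,
    }
```

Calling `pool_risk_ratio` with a list of per-study 2×2 counts returns the pooled RR, its 95% CI, I², and the model that the rule selected.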

Results

Search results

A total of 801 references were initially retrieved from the above-mentioned electronic databases via the described search strategy, but only 10 RCTs remained after screening (Fig. 1) [11–20]. One of these RCTs did not report the necessary outcomes [19] and four were duplicate reports of the same set of patients [11, 13, 16, 17]. Finally, a total of five RCTs with 539 patients (361 in the rhBMP-7 group, 178 in the control group) were included in the current meta-analysis (Fig. 1) [12, 14, 15, 18, 20]. A manual search of the references from the retrieved articles did not yield any additional eligible studies.

Fig. 1 Flow diagram of the literature search

Methodological quality and publication bias

Although our study was based on RCTs, we conducted a further evaluation based on the PEDro scale to investigate the methodological quality of the included studies. According to the PEDro scale, the study quality scores ranged from 4 to 8 (mean: 6.6, median: 6) (Table 1). The main reasons for the relatively low quality scores were the absence of blinding of the surgeons, subjects or assessors. Additionally, low follow-up rates led to reduced quality scores [12, 15]. The small number of included studies limited our ability to assess or draw conclusions regarding publication bias. Furthermore, all articles included in this meta-analysis were published in English, which may bias the data.

Fusion success

Relevant data were extracted from all five eligible articles. Using subgroup analysis, we investigated the influence of spinal instrumentation on the effect estimate by grouping the eligible studies according to whether a pedicle internal fixation system was used. The “instrumented group” comprised two articles including a total of 132 patients (66 in the rhBMP-7 group, 66 in the control group) [14, 20], and the “non-instrumented group” comprised three articles including a total of 245 patients (172 in the rhBMP-7 group, 73 in the control group) [12, 15, 18]. In the instrumented group, there was no statistical heterogeneity (I² = 0%, P = 0.53), and a significant difference was observed between the rhBMP-7 and control groups (risk ratio [RR] = 0.76, 95% CI [0.60, 0.98], P = 0.03; Fig. 2): fusion rates were significantly lower with rhBMP-7 (54% versus 74%). In the non-instrumented group, no statistical heterogeneity was found (I² = 0%, P = 0.50) and no significant difference was observed between the compared groups with regard to fusion rate (RR = 0.97, 95% CI [0.83, 1.14], P = 0.73; Fig. 2). As a whole, the test for overall fusion rates suggested that there was no significant difference between the two groups, regardless of the use of spinal instrumentation (RR = 0.89, 95% CI [0.78, 1.02], P = 0.09; Fig. 2). Specifically, the overall patient fusion rates were 69.3% in the rhBMP-7 group and 75.5% in the control group.

Fig. 2 Fusion rate, rhBMP-7 group vs. control group with 1-level PLF. No significant difference in fusion rate was observed between the compared groups

Clinical success of improvement from preoperative ODI

Although the five eligible studies reported the clinical outcomes in detail, the evaluation methods differed. The studies that used ODI in their assessments reported similar conclusions: the ODI scores either decreased significantly or improved by at least 20% from baseline in both groups after surgery, but no significant difference was found between the two groups in the clinical assessment [11, 12, 14, 15]. Three articles [12, 15, 20] documented specific clinical success data, and no statistical heterogeneity was found among them (I² = 0%, P = 0.38). The pooled data from these three studies did not reveal a significant difference with regard to ODI improvement (RR = 0.95, 95% CI [0.84, 1.06], P = 0.33; Fig. 3).

Fig. 3 Clinical outcomes: Oswestry, rhBMP-7 vs. control group with 1-level PLF. No significant difference in ODI improvement was observed between the two groups

Overall adverse events and revision rates

All the studies except for that published by Kanayama et al. [14] reported the safety and adverse event data in detail. The complete data were abstracted from these four articles, which included a total of 515 patients (350 in the rhBMP-7 group, 165 in the control group). There was no evidence of statistical heterogeneity among these publications (I² = 0%, P = 0.82). The pooled effect for overall complications from these four studies did not reveal a significant difference between the two groups (RR = 0.78, 95% CI [0.60, 1.01], P = 0.06; Fig. 4.1). In addition, reoperation rate data were available for three studies [12, 18, 20]. Statistical heterogeneity was found among these studies (P = 0.04, I² = 70%), so the estimates were pooled using a random-effects model; the pooled result did not indicate any significant difference between rhBMP-7 and autogenous bone graft (RR = 1.63, 95% CI [0.38, 7.04], P = 0.51; Fig. 4).

Fig. 4 Complication rates and revision rate, rhBMP-7 vs. control group with 1-level PLF

Operation time

Details regarding the operation time were available for three studies [11, 15, 20]. Vaccaro et al. (2008) provided the means for these data but omitted the standard deviations (SDs) [12]. Because we were unable to obtain the specific data from these authors, we imputed the missing SDs for the treatment and control groups by averaging the available SDs for the treatment and control groups, respectively. Data pooling yielded a weighted mean difference (WMD) of −16.70 (95% CI [−25.83, −7.57]) that significantly favored the treatment (rhBMP-7) group (P = 0.0003; Fig. 5).
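The SD imputation described above can be sketched as follows: within one arm, a missing SD is replaced by the mean of the SDs reported by the other studies for that same arm. The helper names and the example numbers are hypothetical; the second function shows the per-study mean difference and standard error that a weighted-mean-difference analysis would then use.

```python
import math
from statistics import mean


def impute_missing_sds(sds):
    """Replace None entries with the mean of the SDs available for the same arm."""
    known = [s for s in sds if s is not None]
    if not known:
        raise ValueError("at least one SD must be available for imputation")
    fill = mean(known)
    return [fill if s is None else s for s in sds]


def mean_difference(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Mean difference (treatment minus control) and its standard error."""
    md = mean_trt - mean_ctl
    se = math.sqrt(sd_trt ** 2 / n_trt + sd_ctl ** 2 / n_ctl)
    return md, se
```

Imputation is applied to each arm separately, for example `impute_missing_sds([10.0, None, 20.0])` for the treatment arm, before the per-study differences are weighted by the inverse of their variance.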

Fig. 5 Operation time, rhBMP-7 vs. control group with 1-level PLF

Hospital stay

Relevant hospital stay data were extracted from two eligible articles including a total of 147 patients [15, 20], and no statistical heterogeneity was detected (I² = 0%, P = 0.52). Our analysis revealed no benefit of rhBMP-7 in reducing hospital stay (WMD = 0.74, 95% CI [−0.94, 2.42], P = 0.39; Fig. 6).

Fig. 6 Hospital stay, rhBMP-7 vs. control group with 1-level PLF

Blood loss

Only one study [20] provided specific data regarding blood loss and reported no significant difference in blood loss between the two groups (P = 0.50). However, Vaccaro et al. (2008) [12] reported that the mean operative blood loss was significantly lower for the rhBMP-7 group than the autograft group (P = 0.00004).

Discussion

The aim of the present meta-analysis was to compare the efficacy and safety of rhBMP-7 (3.5 mg per side) and autogenous ICBG based on pooled data from all available and relevant published RCTs. The fusion rate, clinical success, adverse events, operation time, and hospital stay duration were assessed. The results indicated that, compared with autogenous ICBG, rhBMP-7 appears to yield lower fusion rates in instrumented posterolateral fusion procedures and comparable fusion rates in non-instrumented procedures, and that it may also reduce the operation time. Additionally, no significant differences were found between rhBMP-7 and ICBG in terms of ODI-based clinical success, overall adverse events, revision rates, or duration of hospitalization.

It is well established that instrumentation may play a beneficial role in the modern practices of reduction and fusion for PLF, especially in patients with low-grade isthmic spondylolisthesis [21, 22]. The outcomes of fusion are generally good, although reports vary widely. Jenis et al. conducted a histological and radiographic analysis of rhBMP-7 implantation in a rabbit lumbar fusion model and found that the long-term osteoinductive effect of rhBMP-7 did not appear to be affected by spinal fixation, whereas fixation appeared to enhance early fusion in the autograft group [23]. Based on these data, we conducted a subgroup analysis to investigate the effect of instrumented and non-instrumented posterolateral procedures on fusion success. In the instrumented group, rhBMP-7 appeared to yield significantly lower fusion rates compared with autograft, whereas this difference was not seen in the non-instrumented group. Taken together, it seems that the use of rhBMP-7 instead of ICBG produces no additional beneficial effect on fusion rates in posterolateral lumbar surgery.

Additionally, our data showed that operating times were shorter with rhBMP-7. This advantage makes sense because iliac crest autograft harvesting is not needed in the rhBMP-7 group. For the other clinical outcomes, such as ODI-based clinical success and hospital stay duration, no statistically significant effect was found in our analyses. Thus far, previous studies have also shown no clear advantage in clinical success, cost, or duration of hospitalization when rhBMP-7 was used as a substitute for ICBG.

Given the wide use of BMPs, the adverse effects of the OP-1 bone graft substitute have increasingly attracted attention [24]. A systematic review conducted by Carragee et al. revealed that the use of rhBMP-2 in spinal fusion was associated with statistically significant increases in overall adverse events, including life-threatening events [25]. Additionally, BMPs are expressed by and promote the growth of some types of cancer [26–28]. Therefore, we analyzed the adverse effects across the two groups. Our data showed no statistical difference in overall complication rates and revision rates between the use of rhBMP-7 and the use of ICBG. According to our analysis, the safety and adverse events reported in the studies included in our meta-analysis focused on organ system complications (e.g., cardiovascular, respiratory, gastrointestinal) and decompression and fusion surgery complications (e.g., dural tears, surgical infections, neural injury), and no evidence of rhBMP-7 material-related systemic toxicity, ectopic bone formation, or implant migration was observed [11, 12, 15, 18]. When assessing the reoperation rates, we found that the pooled data from the three relevant studies did not indicate any significant differences between the rhBMP-7 and control groups. The main reasons for revision surgery were persistent back pain or leg pain and fusion failure. It is noteworthy that in the study conducted by Delawi et al. [11], one patient in the rhBMP-7 group was diagnosed with a glioblastoma 11 months after surgery. However, there have been no previous reports relating the use of BMPs to the occurrence of glioblastoma. On the contrary, several recent studies have indicated the potential use of BMPs to prevent glioblastoma growth and recurrence in humans [29–31].

Although this meta-analysis was based on five RCTs of relatively high methodologic quality, several limitations should be noted. First, the small number of studies and enrolled patients might not provide sufficient statistical power (risk of a type II error). Therefore, larger controlled studies of higher quality will be needed to draw more reliable conclusions. Second, the different rhBMP-7 carriers used among the studies might have been responsible for the different fusion rates, thus potentially limiting the reliability of the outcomes. Finally, although previous studies have indicated that rhBMP-7 was more cost effective than autologous bone grafts in patients with tibial nonunion, there is currently insufficient evidence for the cost effectiveness of rhBMP-7 relative to autologous bone graft for spondylolisthesis treatment. Further studies that include cost–benefit analyses will be helpful.

In conclusion, with the exception of reducing the operation time, our review suggests that the use of rhBMP-7 instead of ICBG produces no additional beneficial effect on fusion rates, ODI-based clinical success, overall adverse events, revision rates, or duration of hospitalization in single-level posterolateral lumbar surgery. In patients undergoing instrumented posterolateral fusion, rhBMP-7 even appeared to yield lower fusion rates. Taken together, rhBMP-7 does not seem to be an effective tool for facilitating lumbar fusion in single-level PLF compared with autogenous ICBG. Additionally, the current literature would benefit from well-designed and well-conducted large RCTs that directly compare rhBMP-7 and autograft use in clinical practice.