Introduction

Posterolateral fusion (PLF) is the most commonly used spinal arthrodesis technique. Generally, this technique is used to correct deformity, neurological involvement, stenosis, or instability [1]. In most cases, the bone graft required to perform PLF is extracted from the iliac crest. This area is preferred because of its high cancellous bone concentration, which helps generate the new tissue necessary for bone fusion. However, iliac crest bone graft (ICBG) harvest is often associated with problems such as donor site morbidity, and the graft may be insufficient in individuals with osteoporosis [2].

Bone morphogenetic proteins (BMPs) are growth factors that facilitate osteoinduction. They have been effectively used in the field of orthopedic surgery [3]. The FDA has approved two types of BMPs, BMP-2 and BMP-7 (OP-1) [4]. Some authors have reported on the benefits of using this family of proteins in procedures related to spine surgery [5, 6]. Extensive data have shown that molecules belonging to the BMP family can initiate signaling cascades that are essential for bone formation, including the migration of pluripotent mesenchymal stem cells and their differentiation into osteoblasts [7]. Although recombinant human BMP-2 (rhBMP-2) has been proven to be osteoinductive and is considered a promising substitute for autogenous bone grafts, questions have been raised with regard to its safety [8]. In fact, some authors have claimed that high doses of rhBMP-2 may be associated with the risk of developing malignant tumors [9]. Given its brief history, the use of BMP-2 for procedures involved in orthopedic surgery remains controversial with insufficient evidence available.

Therefore, the objective of this meta-analysis was to gain a general understanding of the efficacy and safety of BMP-2 in order to draw more accurate conclusions, which can aid decision-making regarding the use of BMP-2 in comparison to ICBG in PLF. Furthermore, this study aimed to answer some of the questions raised with regard to the effectiveness of BMP-2.

Materials and methods

Search strategy and study selection

This meta-analysis was conducted according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) statement [10]. A systematic search of the literature using PubMed, EMBASE, Scopus, and the Cochrane Collaboration Library database was carried out through July 2018. The following search terms were used: (1) posterolateral fusion of lumbar spine; (2) bone morphogenetic protein and iliac crest bone graft; (3) randomized clinical trials (RCTs). In addition, the reference lists of retrieved papers and recent reviews were reviewed. The search was limited to studies published in the English language.

Titles and abstracts were screened first, followed by a second screening based on full-text review. Studies were considered eligible if they met the following criteria: (1) the study design was a randomized clinical trial; (2) the type of bone morphogenetic protein was defined as BMP-2 (rhBMP-2); (3) the outcome of interest was fusion and was measured; and (4) relative risk (RR) or hazard ratio (HR) and its corresponding 95% confidence interval (CI) (or data to calculate them) were reported. Studies that were not conducted in humans were excluded.
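
As an illustration of criterion (4), the following minimal sketch shows how a relative risk and its 95% CI can be derived when a trial reports only event counts. The function name and the counts are hypothetical and are not taken from any included study; the interval is the usual Wald interval on the log scale and assumes no zero cells.

```python
import math

def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a Wald CI on the log scale.

    Assumes every count is greater than zero (no continuity correction).
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial samples
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts: 45 of 50 fused in one arm, 40 of 50 in the other
rr, ci_low, ci_high = relative_risk(45, 50, 40, 50)
print(f"RR = {rr:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

An interval that spans 1 indicates that the difference between the two arms is not statistically significant at the 5% level.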

Data extraction

We extracted the basic data independently from each study: article, year, number of patients, sex, age, BMP-2 dose, delivery vehicle, BMP-2 concentration, and duration. The variables for comparison were limited to quantitative variables related to the efficacy, safety, and optimization of the use of BMP-2. Some of these were dichotomous; for the other variables, the mean was used as the measure of central tendency and the standard deviation as the measure of dispersion. First, the articles were read in full, taking into account the comparisons presented in each one. With this method, the following aspects could be compared: surgery (surgery time, blood loss, and hospitalization duration), fusion (at 6, 12, and 24 months), clinical variables (Oswestry Disability Index (ODI), 36-Item Short Form Health Survey (SF-36), and Back Pain Score), adverse effects (respiratory effects, infection, malignancy, non-union, and total union), and additional surgical procedures (ASPs). In terms of safety, many comparisons could be drawn, but only those that were controversial or doubtful in the literature were selected. Fusion was evaluated by radiological examination (static and dynamic) and CT. On CT, fusion was defined as the presence of either unilateral or bilateral bridging bone. On conventional radiographs, a successful fusion was defined as bridging of the trabecular bone between the transverse processes with the absence of motion (with motion defined as + 3 mm of translation and + 5° of angulation on flexion–extension views), absence of radiolucent lines through the fusion mass, and absence of any secondary signs of non-union, such as fracture or loosening of the screws, on at least one side of the spine.

Study quality assessment

The risk of bias of the RCTs was assessed using Review Manager (RevMan) version 5.3 (The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, 2014). If there was a conflict between the two reviewers, a third reviewer was consulted and the disagreement was resolved by discussion. The following domains were evaluated: random sequence generation, allocation concealment, blinding, incomplete outcome data, and selective outcome reporting (Fig. 1).

Fig. 1

Risk of bias summary (green = low risk; red = high risk; white = unknown) (color figure online)

Statistical analysis

The meta-analysis was performed using the Review Manager 5.3 software provided by the Cochrane community. The odds ratio (OR) with a 95% confidence interval (CI) was calculated for the dichotomous variables, and the mean difference (MD) with its 95% CI was calculated for the continuous variables. Heterogeneity was evaluated using the Chi-square test and the I² statistic, which describes the percentage of variation across studies that is due to heterogeneity rather than chance. When heterogeneity is substantial, a prediction interval rather than a confidence interval can give a better sense of the uncertainty around the effect estimate. I² varies from 0 to 100%: 0–40% indicates unimportant heterogeneity, 30–60% moderate heterogeneity, 50–90% substantial heterogeneity, and 75–100% considerable heterogeneity. The inverse variance method and the fixed-effects model were applied depending on whether or not there was significant statistical heterogeneity in the results. p values less than 0.05 were considered significant.
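
The pooling and heterogeneity statistics described above can be sketched in a few lines. This is a minimal illustration of fixed-effect inverse variance pooling of odds ratios, with Cochran's Q (the Chi-square heterogeneity statistic) and I²; it is not the RevMan implementation. The function name and the 2×2 counts are hypothetical, and the sketch assumes no zero cells.

```python
import math

def pool_fixed_effect(tables, z=1.96):
    """Fixed-effect inverse variance pooling of odds ratios.

    tables: list of (events_treatment, n_treatment, events_control, n_control).
    Assumes no cell is zero (no continuity correction is applied).
    """
    log_ors, weights = [], []
    for a, n1, c, n2 in tables:
        b, d = n1 - a, n2 - c                  # non-events in each arm
        log_ors.append(math.log((a * d) / (b * c)))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weights.append(1 / variance)           # inverse variance weight

    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_weight
    se = math.sqrt(1 / total_weight)

    # Cochran's Q and the I² statistic derived from it
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z * se),
        "ci_high": math.exp(pooled + z * se),
        "q": q,
        "i2": i2,
    }

# Hypothetical trials: (fused with BMP-2, n BMP-2, fused with ICBG, n ICBG)
example = [(45, 50, 25, 50), (30, 50, 29, 50)]
result = pool_fixed_effect(example)
print({key: round(value, 2) for key, value in result.items()})
```

In this hypothetical example the two trials disagree strongly, so I² is high; in such a case the fixed-effect estimate should be interpreted with caution.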

Results

Literature search

The results of the selection process are presented in Fig. 2. As regards the type of BMP, only two types were approved for use by the FDA: BMP-2 and BMP-7 [4]. In this study, only BMP-2 was included in the comparison. No similar meta-analyses had been published since 2008, except for a meta-analysis comparing BMP-7 and ICBG that was published in 2017 [11, 12]. The decision to conduct a meta-analysis and update the evidence on the use of BMP-2 compared to ICBG in PLF was based on the following: first, four of the studies selected were published after 2008; second, the last meta-analysis on the use of BMP-2 dated from 2008. The studies included in this meta-analysis were updated, rigorously selecting three meta-analysis studies. The meta-analysis comprised six RCTs [1, 2, 13, 14, 15, 16]. With regard to the included patients, the sample was homogeneous, with a similar average age in the six RCTs, a higher proportion of women than men, and a follow-up time of between 6 and 48 months. In terms of the type of intervention, all patients underwent posterolateral lumbar fusion and were randomly divided into two groups for the administration of two types of osteoinductive material: ICBG or BMP-2 in different vehicles, doses, and concentrations. Seventeen studies were excluded because they did not treat humans, did not share common variables to enable comparison in the meta-analysis, or were of low quality.

Fig. 2

Study selection flow diagram (preferred reporting items for systematic reviews and meta-analysis)

Baseline data

The main characteristics of the six selected studies are summarized in Table 1. The studies were published from 2002 to 2017. There was a total of 908 patients: 446 received BMP-2 and 462 received ICBG, and 380 were men and 528 were women. The RCTs were performed in elderly people, aged 52–70 years. The dose varied between studies. Furthermore, the average duration was 24 months.

Table 1 Characteristics of the randomized controlled trials of bone morphogenetic protein-2 versus iliac crest bone graft for posterolateral fusion of the lumbar spine included in the meta-analysis

Fusion success

At 6 months, the difference between the two groups was significant. A total of 687 patients were included. About 86% achieved fusion in the BMP group and 60% in the ICBG group (OR 3.75, 95% CI: 2.58–5.44, p < 0.00001, I² = 86%) (Fig. 3a). A total of 448 patients were studied at 12 months, for which three RCTs could be compared. Approximately 88% and 80% achieved fusion in the BMP and ICBG groups, respectively. This difference was significant (OR 1.76, 95% CI: 1.06–2.92, p = 0.03, I² = 43%) (Fig. 3b). At 24 months, 519 patients were included. During this period, 94% achieved fusion in the BMP group and 83% in the ICBG group (OR 3.12, 95% CI: 1.71–5.72, p < 0.0002, I² = 0%) (Fig. 3c).
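
As a rough plausibility check, a crude odds ratio can be computed directly from the fusion percentages reported above. This is only a sketch: the crude value ignores the per-study weighting used in the pooled analysis, so it is expected to be close to, but not identical to, the pooled OR. The function name is hypothetical; the percentages and pooled ORs are those reported in the text.

```python
def crude_odds_ratio(p_treatment, p_control):
    """Odds ratio computed from two overall event proportions (unweighted)."""
    odds_treatment = p_treatment / (1 - p_treatment)
    odds_control = p_control / (1 - p_control)
    return odds_treatment / odds_control

# (fusion rate with BMP-2, fusion rate with ICBG, pooled OR reported in the text)
reported = {
    "6 months": (0.86, 0.60, 3.75),
    "12 months": (0.88, 0.80, 1.76),
    "24 months": (0.94, 0.83, 3.12),
}

for label, (p_bmp, p_icbg, pooled_or) in reported.items():
    crude = crude_odds_ratio(p_bmp, p_icbg)
    print(f"{label}: crude OR {crude:.2f} vs pooled OR {pooled_or:.2f}")
```

The crude values (about 4.1, 1.8, and 3.2) point in the same direction as the pooled estimates and are of similar size.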

Fig. 3

Forest plot of risk difference in fusion rates from randomized control trials: a at 6 months; b at 12 months; c at 24 months

Surgery time, blood loss and hospitalization days

The surgical variables were analyzed in terms of surgery time, blood loss, and hospitalization duration. The surgery time was significantly longer in the ICBG group (MD −17.56, 95% CI: −23.98 to −11.14, p < 0.00001, I² = 83%) (Fig. 4a). Similarly, the blood loss was also greater in the ICBG group (MD −61.19, 95% CI: −101.73 to −20.66, p = 0.003, I² = 78%) (Fig. 4b). Hospitalization duration was significantly longer in the ICBG group (MD −0.40, 95% CI: −0.67 to −0.14, p = 0.0005, I² = 83%) (Fig. 4c).

Fig. 4

Forest plot of risk difference in surgery variables rates from randomized control trial studies included: a surgery time, b blood loss and c hospitalization days

Clinical success

The three variables that indicated the patients' quality of life were as follows: ODI, SF-36, and Back Pain Score. The ODI decreased in both groups, but the difference between the two groups was not significant (MD 2.57, 95% CI: −3.51 to 8.66, p = 0.83, I² = 0%) (Fig. 5a). Figure 5b shows SF-36 results (MD −0.89, 95% CI: −4.31 to 2.54, p = 0.61, I² = 0%). Figure 5c shows Back Pain Score outcomes (MD 0.13, 95% CI: −0.74 to 0.99, p = 0.77, I² = 0%).

Fig. 5

Forest plot of risk difference in clinical variables rates from randomized control trial studies included: a Oswestry Disability Index (ODI), b 36-Item Short Form Health Survey (SF-36) and c Back Pain Score

Additional surgical procedures (ASPs)

A total of 799 patients were included in the analysis of ASPs. The experimental group (7%) had fewer ASPs than the control group (13%) (OR 0.49, 95% CI: 0.30–0.79, p = 0.004, I² = 0%) (Fig. 6).

Fig. 6

Forest plot of risk difference in additional surgical procedures rates from randomized control trial studies included

Overall adverse events

A summary of all the adverse effects compared in the study is shown in Tables 2 and 3. The differences between the groups were not significant for any of the adverse events except for non-unions (OR 0.28, 95% CI: 0.11–0.68, p = 0.005, I² = 0%).

Table 2 Risk difference in adverse events rates from randomized control trial studies included: a respiratory, b malignancy, c wound/surgical infection, d total adverse events
Table 3 Risk difference in non-union rates from randomized control trial studies included

Discussion

Based on the findings of the present meta-analysis, we can say that, first, BMP-2 achieved a higher fusion rate than ICBG, together with a shorter operative time, lower blood loss, and fewer hospitalization days, and second, more ASPs were performed in the ICBG group. No meta-analysis including all recent RCTs has been conducted in the last 10 years. This meta-analysis updates the results of studies comparing BMP-2 with ICBG. The results showed no significant differences in the clinical variables, such as ODI, SF-36, or Back Pain Score, or in adverse events.

The RCTs were of considerable quality; therefore, conclusions supported by stronger evidence could be drawn from the present study [1, 2, 13, 14, 15, 16]. The higher fusion rate observed may be due to a greater ability of BMP to generate bone bridges. The fusion rate was higher in both the short and long term, and the difference was largest at 6 months, which suggests that the regenerative capacity of BMP-2 is faster than that of ICBG. This evaluation of the fusion rate was mostly radiographic. CT grading also showed differences, but only two RCTs could assess it. The fact that higher-resolution imaging shows a higher fusion rate in the BMP group could support the idea that BMP-2 has a greater osteogenic capacity than ICBG.

ICBG is the gold standard for spine fusion, but with regard to certain variables such as surgery time or blood loss, different bone graft alternatives should be evaluated [12]. This is quite advantageous for BMP, as there is no need to harvest bone autograft from the iliac crest. BMP therefore presents a lower surgical morbidity. Lower blood loss during surgery and fewer days of hospitalization translate into a higher quality of care as well as lower consumption of resources and fewer complications or reinterventions. In addition, patients with other conditions, such as osteoporosis, could benefit, since the bone graft from the iliac crest is of lower quality in these patients [17, 18]. This is an important point to keep in mind, since most patients are between 50 and 70 years old. The clinical variables were equally distributed. This could be correlated with the fact that there were no differences in adverse events.

Despite these results, there are still concerns regarding the relative value derived from the use of BMP as a replacement for iliac crest bone [12, 17]. In addition, these studies showed the relative safety of BMP. The greatest number of adverse events was found in the ICBG group, although the differences between the two groups were not significant except for non-unions. Thus, according to the results of the meta-analysis, it can be concluded that these events are distributed similarly in both groups. Some studies have found an association between BMPs and cancer. In the present meta-analysis, a higher percentage of cancer was found in the BMP group, although the difference was not significant. However, the cancer rate was 2.5 times higher in the BMP group. This is an alarming finding, since with greater statistical power the difference could be significant [8, 19, 20, 21]. With reference to the results obtained, the safety parameters included by the RCTs focused on respiratory complications, malignancy, or infection, but no evidence of poor safety was observed with the use of BMP [22]. Non-unions could lead to a higher number of reoperations or complications. However, the adverse effects of BMP could be due to a dosing problem. Nevertheless, the doses used in the different studies proved to be safe compared to ICBG. Therefore, treatment with BMP could be an optimal substitute for ICBG due to its greater efficacy, lower surgical morbidity, and adverse events similar to those of ICBG, resulting in significant clinical improvement.

Nonetheless, the different fusion techniques could be studied together with BMP. Recent studies have shown similar or better fusion rates in patients who received BMP-2 compared with those who received ICBG for posterolateral or transforaminal interbody fusions [23]. With regard to the type of BMP, BMP-2 was shown to be the most effective in terms of fusion, and the one for which differences were actually found, with a strong recommendation against BMP-7 (OP-1) [11]. Our results differ from those of the meta-analysis comparing BMP-7 and ICBG, which found that, with the exception of reducing the operation time, the use of rhBMP-7 instead of ICBG produced no additional beneficial effect on fusion rates, clinical success (ODI), overall adverse events, revision rates, or duration of hospitalization in one-level PLF [11].

There are some limitations that must be taken into account. First, the doses and the different BMP carriers may have influenced the results obtained in some way. Second, according to these results, it seems logical to think that the use of BMP-2 is more beneficial than ICBG, but it is also important to consider the cost–benefit ratio: only one study analyzed BMP vs. ICBG in terms of cost, so the comparisons needed to draw pertinent conclusions could not be made. The use of BMP reduces the need for additional surgery because of the higher fusion rate compared to ICBG, so in the long term BMP would be more cost-effective than ICBG. The greater number of complications and persistent symptoms in the ICBG group offsets the higher initial cost of using BMP. Therefore, the total cost in each group is sensitive to changes in the incidence of complications and the need for additional treatment or revision surgery [24]. Third, there was a lack of detailed information on the definition of fusion. In addition, two studies included criteria for fusion using CT.

In summary, this meta-analysis concludes that the use of BMP-2 in PLF reduced surgical morbidity and had more beneficial effects on the fusion rate in both the short and long term. The quality of life based on clinical scores was the same in both groups. Finally, safety was similar, except that the ICBG group had a higher non-union rate.