Introduction

Lumbar spinal fusion is a widely used technique for the surgical management of degenerative lumbar pathology and may be indicated when conservative care fails to adequately control symptoms [1, 2]. Pseudarthrosis remains a major clinical challenge in fusion surgery. In general, solid bony fusion depends on multiple factors: (1) patient age; (2) smoking status; (3) metabolic status; (4) quality of graft-bed preparation; (5) a stable and loaded construct; (6) comorbidities (e.g., osteoporosis); (7) number of fused levels; and (8) the bone graft selected [3, 4]. Among these, graft selection is a key determinant of the success of spinal fusion.

Autologous iliac crest bone graft (ICBG) can be packed into the posterolateral gutters and the intervertebral space to promote lumbar fusion, and it has been regarded as the “gold standard” because it has three inherent properties: it is osteoconductive, osteoinductive, and osteogenic [5, 6]. However, ICBG harvesting is associated with multiple donor-site complications, including persistent iliac pain, iliac fractures, vascular and nerve injuries, hematomas, and deep infections [7, 8]. In addition, the amount of available ICBG is limited, especially in multi-segment fusion, revision surgery, and patients with osteoporosis [9]. Because of these disadvantages, a variety of alternative bone substitutes, such as recombinant human bone morphogenetic proteins (rhBMP-2 and rhBMP-7), hydroxyapatite (HA), β-tricalcium phosphate (β-TCP), demineralized bone matrix (DBM), autograft local bone (ALB), bone marrow aspirate (BMA), silicate calcium phosphate (Si-CaP), platelet-rich plasma (PRP), and allograft, have been studied and applied alone or in various combinations to promote lumbar fusion. The ideal bone substitute should possess osteoconductive and osteoinductive properties and, when possible, osteogenic cells, to achieve a fusion rate comparable to that of ICBG. The grafts developed primarily to provide a conductive scaffold are ceramic products (HA, TCP, and Si-CaP) and DBM, while rhBMP and DBM have osteoinductive properties that facilitate osteogenesis. Other biological agents, including PRP and BMA, are rich in platelets (and their growth factors) and mesenchymal stem cells (MSCs), which could enhance the osteogenic potential of the scaffold materials.

Currently, most RCTs comparing the efficacy and safety of different bone substitutes are based on relatively small samples and lack data comparing multiple grafts with one another [3, 4, 9, 10, 11]. Previous pairwise meta-analyses also could not rank these bone substitutes because some of them had never been compared directly [12, 13, 14, 15, 16]. Therefore, this network meta-analysis (NMA) was carried out to compare the effectiveness and safety of all available bone grafts used in lumbar spinal fusion for lumbar degenerative disease and to provide a ranking spectrum of the grafts.

Methods

This review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (see “Appendix 1”) [17]. A protocol was prepared in advance and uploaded to the PROSPERO online platform.

Data sources and search strategy

Two researchers independently and systematically searched PubMed, EMBASE, and CENTRAL from their inception dates to June 2019, using keywords including “lumbar degenerative disease,” “lumbar spine,” “spinal fusion,” “bone graft,” and “bone substitutes.”

Eligibility criteria and study selection

The inclusion criteria were as follows: (1) patients diagnosed with a lumbar degenerative disease who underwent spinal fusion with bone graft materials; (2) studies reporting definitive outcomes, such as fusion rate and the number of adverse events; (3) head-to-head RCTs; (4) fusion judged on the basis of computed tomography (CT) or plain X-ray results. The exclusion criteria were: (1) studies with a single-arm design; (2) pathology other than degenerative disease, such as infectious or inflammatory diseases, spinal tumors, and trauma; (3) studies with fewer than 10 subjects in any treatment arm.

Study selection comprised two steps: screening titles and abstracts, and reviewing full texts. Throughout the screening process, the two independent authors strictly followed the inclusion and exclusion criteria. Finally, references cited in eligible studies that were considered potentially relevant were also retrieved and assessed in full. Disagreements between the two authors were resolved through discussion with a third investigator.

Data extraction

Two authors independently extracted the following information from each included study: (1) study characteristics: lead author, publication year, study design, country of the lead author, study period, and follow-up; (2) patient information: number of enrolled subjects, number of dropouts, percentage of male patients, and age at operation; (3) operative information (intervention and comparison): types and dosages of bone grafts, and surgical methods; (4) outcome information: fusion success rate (based on plain or extension-flexion radiographs or thin-slice CT scans) and frequency of adverse events at final follow-up. Differences between the two authors were resolved by a third author after discussion.

Risk of bias assessment

The risk of bias was assessed using the Cochrane Collaboration’s risk of bias tool [18]. Each study was assessed on seven items: (1) random sequence generation; (2) allocation concealment; (3) performance bias; (4) detection bias; (5) incomplete outcome data; (6) reporting bias; (7) other bias. Each item was judged as low, high, or unclear risk of bias.

Data synthesis and statistical analysis

The primary and secondary outcomes were the fusion rate and the number of treatment-related adverse events, respectively. We recorded all adverse events occurring during the course of treatment without distinguishing between their specific classifications. We used the odds ratio (OR) and 95% credible interval (95% CrI) as summary statistics to quantify treatment effects. A classic half-integer continuity correction was used so that studies with no events could still be included in the analyses [19].
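The half-integer continuity correction described above can be illustrated with a minimal Python sketch. The function name `odds_ratio_cc` and the choice to apply the correction only to tables that contain a zero cell are assumptions made for illustration. The Wald interval is a simple frequentist stand-in, not the Bayesian credible interval reported in this study.

```python
import math

def odds_ratio_cc(events_a, total_a, events_b, total_b, correction=0.5):
    """Odds ratio of arm A versus arm B with an approximate 95% Wald interval.

    Hypothetical helper: adds `correction` to all four cells of the 2x2 table
    only when at least one cell is zero (an assumption, not the study's code).
    """
    a, b = events_a, total_a - events_a  # arm A: events, non-events
    c, d = events_b, total_b - events_b  # arm B: events, non-events
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(log_or - 1.96 * se)
    upper = math.exp(log_or + 1.96 * se)
    return math.exp(log_or), lower, upper
```

Without the correction, an arm with zero events would give an odds ratio of zero or infinity and an undefined standard error, so the study could not be pooled.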

To illustrate which interventions were directly compared in the primary RCTs, we generated network plots using the “network” suite of commands in Stata version 14.0 (StataCorp LLC, College Station, Texas, USA). R 3.5.3 software (R Core Team, Vienna, Austria) was used to call WinBUGS 1.4.3 software (MRC Biostatistics Unit, Cambridge, UK) for the Bayesian NMA. A random-effects model was used to compare treatments using Markov chain Monte Carlo (MCMC) methods with Gibbs sampling, based on 40,000 iterations obtained after a 10,000-iteration burn-in phase. After the NMA, interventions were ranked according to their estimated effect sizes to show which treatment ranked highest, second highest, and so on, using the surface under the cumulative ranking curve (SUCRA) [20]. Statistical significance was defined as a two-sided P value of less than 0.05.
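As a hedged illustration of how SUCRA values can be derived from MCMC output, the sketch below ranks the treatments within each posterior draw, tabulates the rank probabilities, and averages the cumulative rank probabilities over the first T − 1 ranks. The function name `sucra` and the input layout (one list of per-treatment effect values per draw) are assumptions. This is not the WinBUGS code used in the study.

```python
def sucra(draws, higher_is_better=True):
    """Return one SUCRA value (0 to 1) per treatment.

    draws: list of posterior draws; each draw is a list with one effect
    value per treatment (hypothetical layout chosen for illustration).
    """
    n_draws = len(draws)
    n_trt = len(draws[0])
    # rank_counts[k][j] = number of draws in which treatment k has rank j + 1
    rank_counts = [[0] * n_trt for _ in range(n_trt)]
    for draw in draws:
        ordered = sorted(range(n_trt), key=lambda k: draw[k],
                         reverse=higher_is_better)
        for position, k in enumerate(ordered):
            rank_counts[k][position] += 1
    scores = []
    for k in range(n_trt):
        cumulative = 0.0  # P(treatment k is ranked j + 1 or better)
        total = 0.0
        for j in range(n_trt - 1):
            cumulative += rank_counts[k][j] / n_draws
            total += cumulative
        scores.append(total / (n_trt - 1))
    return scores
```

A treatment that ranks first in every draw receives a SUCRA of 1, and one that always ranks last receives 0.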

Standard pairwise meta-analysis was also performed for all direct head-to-head comparisons, using a random-effects model to account for the anticipated variation in study populations. The pooled effect estimates from both the NMA and the pairwise meta-analyses were presented as summary ORs with 95% CrIs and 95% prediction intervals (95% PrI). Inconsistency was estimated as the difference between direct and indirect comparisons for each closed loop, using node-splitting analysis (p < 0.05 indicated significant inconsistency).
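The idea of comparing direct and indirect evidence within a closed loop can be sketched with a simplified frequentist analogue (the Bucher adjusted indirect comparison followed by a z-test). This is an assumption-laden illustration and not the Bayesian node-splitting model used in this study. The function names and the log odds ratio inputs are hypothetical.

```python
import math
from statistics import NormalDist

def indirect_estimate(log_or_ac, se_ac, log_or_bc, se_bc):
    """Indirect log OR of A versus B via the common comparator C."""
    return log_or_ac - log_or_bc, math.sqrt(se_ac ** 2 + se_bc ** 2)

def inconsistency_test(direct, se_direct, indirect, se_indirect):
    """Difference between direct and indirect log ORs with a two-sided p value."""
    diff = direct - indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, p_value
```

For the single ICBG–allograft–ALB loop, for example, the direct allograft versus ALB estimate would be compared with the indirect estimate obtained through ICBG.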

A novel presentational approach (the summary forest plot matrix) was used to display the results, including the forest plots and estimated effects for both the NMA and the pairwise meta-analyses, the SUCRA value for each intervention, and the between-study heterogeneity, as described by Tan et al. [21]. A comparison-adjusted funnel plot was used to identify possible small-sample effects in each network using Stata software [22]. Subgroup NMAs of the fusion success rate and the incidence of adverse events were performed for posterolateral lumbar fusion (PLF) and lumbar interbody fusion (LIF) to assess the stability of the NMA results.

Results

Study inclusion and baseline characteristics

Figure 1 shows the flowchart of study retrieval and selection. Database searching initially identified a total of 5185 records, and two additional records were identified through manual searching. After exclusion of duplicates, 3604 titles/abstracts were left for screening. Finally, 47 full-text articles were assessed for eligibility, and 27 RCTs [3, 4, 9, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46] were included in the qualitative and quantitative syntheses.

Fig. 1

PRISMA flowchart of study search and selection

Table 1 summarizes the trials included in this NMA. These studies included 2488 patients with an overall female percentage of 58.2% (range 36.8–72.5%). The mean follow-up period was 19.8 ± 8.5 months, with an overall dropout rate of 10.5%. Several fusion techniques were performed, including PLF in 18 studies [4, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37, 39, 40], posterior LIF (PLIF) in four studies [38, 43, 44, 46], anterior LIF (ALIF) in one study [45], transforaminal LIF (TLIF) in three studies [9, 41, 42], and extreme lateral LIF (XLIF) in one study [3].

Table 1 Characteristics of included trials

The risk of bias summary and the risk of bias graph are presented in Fig. 2. Blinding of participants and personnel was judged to be at high risk of bias in most studies, while the other items were predominantly at low or unclear risk of bias.

Fig. 2

Risk of bias summary for each included RCT (a) and the risk of bias graph (b), based on the Cochrane Collaboration tool. The percentages of “high risk of bias,” “low risk of bias,” and “unclear risk of bias” for each item are presented in a bar diagram

NMA for spinal fusion rate and all recorded complications

Figure 3 displays the network plot illustrating the interventions directly compared in the primary RCTs. In total, 13 individual or combined intervention regimens were available for analysis: ICBG (n = 962), rhBMP-2 (n = 746), rhBMP-7 (n = 329), Si-CaP (n = 92), PRP + ICBG (n = 20), HA + BMA + ALB (n = 20), HA + ALB (n = 25), DBM + ALB (n = 28), allograft (n = 102), ALB + β-TCP + HA (n = 10), ALB (n = 82), allograft + BMC (bone marrow concentrate) (n = 40), and ALB + β-TCP (n = 32).

Fig. 3

Network plots illustrating the interventions directly compared in the total network meta-analysis (a) and in the subgroup analyses of posterolateral lumbar fusion (b) and lumbar interbody fusion (c). Each node represents a type of bone graft, and each line represents a direct comparison between two grafts. The nodes and lines are weighted by the numbers of related patients and trials

The NMA results for the fusion success rate are presented in the summary forest plot matrix in Fig. 4. A ranking spectrum is provided along the diagonal, depicting the efficacy order of the intervention regimens. In general, rhBMP-2 provided the highest fusion rate, which was significantly superior to that of ICBG (OR = 0.21, 95% CrI 0.11–0.36, p < 0.001), ALB (OR = 0.18, 95% CrI 0.04–0.78, p = 0.022), rhBMP-7 (OR = 0.15, 95% CrI 0.06–0.38, p < 0.001), allograft (OR = 0.13, 95% CrI 0.03–0.60, p = 0.009), and DBM + ALB (OR = 0.07, 95% CrI 0.00–0.98, p = 0.048). The efficacy of allograft was significantly enhanced by the addition of BMC (OR = 0.16, 95% CrI 0.04–0.64, p = 0.010). No significant difference was demonstrated for any other comparison in the NMA. DBM + ALB was associated with the lowest fusion success rate. The summary forest plot matrix for the NMA of the recorded complications is shown in Fig. 5. Among the available interventions, DBM + ALB was associated with the highest incidence of complications, while β-TCP + ALB had the most favorable safety profile. ICBG ranked second in the frequency of complications, which was significantly higher than that of allograft (OR = 0.14, 95% CrI 0.02–0.92, p = 0.041) and ALB (OR = 0.14, 95% CrI 0.02–0.83, p = 0.030). All other comparisons showed no significant difference between groups.

Fig. 4

Summary forest plot matrix for NMA of fusion rate. The matrix consisted of the forest plots (below the diagonal) as well as the estimated effect sizes (above the diagonal) for pairwise meta-analyses and NMA, the SUCRA curves (along the diagonal, ordered by SUCRA values), and the between-study variance (τ²). NMA network meta-analysis, CrI credible interval, PI prediction interval

Fig. 5

Summary forest plot matrix for NMA of complications. The matrix consisted of the forest plots (below the diagonal) as well as the estimated effect sizes (above the diagonal) for pairwise meta-analyses and NMA, the SUCRA curves (along the diagonal, ordered by SUCRA values), and the between-study variance (τ²). NMA network meta-analysis, CrI credible interval, PI prediction interval

The cluster ranking plot is shown in Fig. 6, in which the bone grafts are divided into four groups using the median SUCRA values of the two networks. In general, allograft + BMA, ALB + β-TCP + HA, HA + BMA + ALB, and β-TCP + ALB provided both an increased fusion rate and a decreased frequency of complications. In contrast, although rhBMP-2, Si-CaP, and ICBG provided favorable fusion rates, they were also associated with an increased risk of complications, especially ICBG. ALB, allograft, and rhBMP-7 provided below-median efficacy but better safety. DBM + ALB, HA + ALB, and PRP + ICBG formed the least favorable group, with both below-median efficacy and below-median safety.

Fig. 6

Cluster ranking plot dividing the bone grafts into four groups (colored red, blue, green, and purple) using the median SUCRA values of the network meta-analyses for efficacy and safety. Values close to 100% indicate increased spinal fusion rate or increased incidence of adverse events. rhBMP recombinant human bone morphogenetic protein, BMA bone marrow aspirate, ALB autograft local bone, β-TCP β-tricalcium phosphate, HA hydroxyapatite, Si-CaP silicate calcium phosphate, ICBG autologous iliac crest bone graft, PRP platelet-rich plasma, DBM demineralized bone matrix, BMC bone marrow concentrate
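The four-group split by median SUCRA values can be sketched as follows. The function name `cluster_by_median`, the dictionary inputs, and the rule that a value exactly at the median falls into the lower group are assumptions for illustration. No values from this study are used.

```python
from statistics import median

def cluster_by_median(efficacy_sucra, complication_sucra):
    """Assign each graft to one of four groups using the two median cut-offs.

    Both arguments map a graft name to a SUCRA value, where higher values
    mean a higher fusion rate or a higher incidence of complications.
    Ties at the median go to the lower group (an illustrative assumption).
    """
    efficacy_cut = median(efficacy_sucra.values())
    complication_cut = median(complication_sucra.values())
    groups = {}
    for graft, efficacy in efficacy_sucra.items():
        fusion = "higher" if efficacy > efficacy_cut else "lower"
        harm = "more" if complication_sucra[graft] > complication_cut else "fewer"
        groups[graft] = f"{fusion} fusion, {harm} complications"
    return groups
```

The most favorable group is then the set of grafts labeled "higher fusion, fewer complications".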

Subgroup analyses

Supplementary Figures S1–4 show the forest plot matrices for the efficacy and safety of the available bone grafts in the PLF and LIF subgroups, and the corresponding ranking spectrums are available in Supplementary Table S1. RhBMP-2 was the most effective bone graft in both subgroups, providing a significantly higher fusion rate than ICBG (OR = 0.24, 95% CrI 0.13–0.44, p < 0.001) and rhBMP-7 (OR = 0.17, 95% CrI 0.07–0.45, p < 0.001) in the PLF subgroup (Supplementary Figure S1), and than ICBG (OR = 0.06, 95% CrI 0.00–0.61, p = 0.017) in the LIF subgroup (Supplementary Figure S2). None of the other head-to-head comparisons showed a significant difference in fusion rate. The incidence of complications was similar among the available grafts in the PLF and LIF subgroups (Supplementary Figures S3–4). Compared with the total NMA, the subgroups provided a similar ranking of the available bone grafts, indicating no obvious instability in the NMA results.

Inconsistency assumption and small-sample effect test

The results of the inconsistency test are provided in Supplementary Figure S5. Only a single closed triangular loop (ICBG–allograft–ALB) was available in the integrated networks for spondylodesis efficacy and safety (Fig. 3a). According to the node-splitting analysis, no significant inconsistency was found between the direct and indirect comparisons in this loop (p > 0.05).

The comparison-adjusted funnel plots are presented in Supplementary Figure S6a–f. They show no obvious asymmetry, although some small-sample trials in each network are located at the bottom of the funnels. Thus, no obvious publication bias was detected, but small-sample effects cannot be excluded and may introduce a risk of bias.

Discussion

The main finding of our study was that rhBMP-2, allograft + BMA, ALB + β-TCP + HA, Si-CaP, β-TCP + ALB, and HA + BMA + ALB tended to provide a higher lumbar fusion success rate than ICBG, but of these, Si-CaP and rhBMP-2 were associated with an above-median incidence of complications.

To achieve solid spinal fusion in lumbar degenerative disease, many alternative biological and synthetic bone substitutes have been identified or are currently under development [47]. The optimal alternative to ICBG, nevertheless, remains elusive. As a low-molecular-weight glycoprotein belonging to the transforming growth factor-β superfamily, rhBMP-2 possesses strong osteoinductive properties and has been widely accepted as the most effective osteobiologic agent for inducing arthrodesis since its introduction in spinal fusion [48, 49]. Several meta-analyses providing high-level evidence have compared the efficacy of rhBMP-2 and ICBG and consistently reported a superior spinal fusion rate in the rhBMP-2-treated group [12, 13, 50]. In the individual participant data meta-analysis performed by Simmonds et al. [12], which included RCTs of rhBMP-2 versus ICBG in spinal fusion surgery for degenerative disk disease and related conditions, the radiographic fusion rate was 12% higher with rhBMP-2 than with ICBG. Chen et al. [50] conducted a meta-analysis based on 10 high-quality RCTs comparing rhBMP-2 and ICBG for lumbar fusion and showed a significantly decreased risk of fusion failure at all time intervals (6, 12, and 24 months) in the rhBMP-2 group compared with the ICBG group. A similar result was found in our study, which showed that rhBMP-2 was the most effective bone graft substitute among all available grafts, providing a significantly higher fusion rate than ICBG, ALB, rhBMP-7, allograft, and DBM + ALB.

Despite these encouraging results, the use of rhBMP-2 in lumbar spondylodesis is still an off-label procedure that has not been approved by the US Food and Drug Administration [51]. Recent articles have reported several adverse events associated with rhBMP-2 application, including heterotopic bone growth, increased risk of malignancy, bony resorption or osteolysis, retrograde ejaculation (RE), radiculitis, and direct neural toxicity [50, 52,53,54,55]. Fu et al. [14] reported a significantly increased overall cancer risk at 24 months after treatment with rhBMP-2. Poorman et al. [56] also reported increased odds of developing radiculitis or neurological complications attributed to BMP use compared with the non-BMP group. Even so, the small number of adverse events has limited the power to detect differences between groups, precluding definite conclusions. In our results, rhBMP-2 was associated with an above-median but lower-than-ICBG incidence of overall adverse events. The adverse events associated with ICBG are mostly caused by graft harvesting and should be less life-threatening than the aforementioned adverse events attributed to rhBMP-2. Thus, it is essential to weigh the benefits and harms that rhBMP-2 may bring to patients, and the application procedure should be performed with proper caution to ensure that the graft is contained within the cage or the area where bone should grow.

Apart from rhBMP-2, another member of the BMP family, rhBMP-7 or osteogenic protein-1 (OP-1), has been shown to initiate the cascade of bone formation in a variety of clinical situations, including lumbar spondylodesis [57]. To date, the effectiveness and safety of rhBMP-7 relative to ICBG remain controversial [13, 15, 26]. The current NMA showed that OP-1 had nonsignificantly lower efficacy than ICBG and sat at the median level of safety among all available graft materials, which was nonsignificantly better than that of ICBG. Similarly, Ye et al. [15] found no significant difference between the rhBMP-7 and ICBG groups, although rhBMP-7 appeared to yield a lower fusion rate in the instrumented PLF subgroup. Additionally, although the rhBMP-7 group recorded a lower rate of adverse events, no significant difference was found between the two groups. Thus, the current review does not recommend rhBMP-7 as an effective alternative to ICBG, because it offers no additional benefit and tended to yield a lower fusion rate.

ALB is often used as an alternative to ICBG and provides almost the same characteristics, including a three-dimensional osteoconductive scaffold, osteoinductive potential provided by inherent BMPs, and osteogenic activity derived from osteoblasts [58]. The bone chips obtained during laminectomy are predominantly cortical, with only a small percentage of trabecular or unmineralized bone, which makes up the main component of the marrow cavity. ICBG is rich in cancellous trabecular bone and would theoretically be superior to ALB in fusion rate because of its greater osteoinductive activity. Our NMA found that ALB provided a lower fusion rate than ICBG, but the difference did not reach statistical significance in the available patient samples. Regarding safety, ALB was identified as having the lowest incidence of complications, which was significantly lower than that of ICBG. Thus, ALB could still be used as an alternative to ICBG in lumbar arthrodesis, providing a nonsignificantly lower fusion rate but a clearly decreased risk of postoperative complications.

Calcium phosphate (CaP) ceramics, such as HA, β-TCP, and Si-CaP, are another set of bone graft substitutes that mainly exhibit osteoconductivity through their intrinsic three-dimensional scaffold [51, 59]. In general, these ceramic-based grafts are biocompatible, have an appropriate safety profile, and are able to mimic physiological bone [60, 61]. When augmented with osteoinductive growth factors, autologous mesenchymal stem cells, or local bone, the ceramics can acquire osteoinductive and osteogenic capacity. Moreover, the use of ceramic materials avoids the complications associated with autograft harvesting and allows large-scale production. This study analyzed a total of five CaP ceramic-based intervention regimens, TCP + HA + ALB, Si-CaP, TCP + ALB, HA + ALB + BMA, and HA + ALB, in which the ceramics were augmented with osteoinductive materials to enhance osteogenesis. Apart from HA + ALB, the other four graft combinations tended to provide a higher fusion rate than ICBG. Thus, CaP ceramics are recommended for use in combination with autogenous bone as alternatives to ICBG to obtain solid fusion. We were unable to assess the effectiveness of purely osteoconductive scaffolds in spinal fusion, but unsatisfactory results with stand-alone CaP bone graft substitutes have been reported previously [62,63,64].

Allogenic bone graft is another conventional alternative to ICBG for spondylodesis, but it appears biologically inferior because it lacks osteoinductivity and osteogenic potential [65]. Nevertheless, in the current study, we found that when allograft was mixed with BMA, its fusion rate rose significantly and ranked second only to rhBMP-2. DBM is a class of commercially available grafts derived from allograft that theoretically contains all types of BMPs involved in osteoinduction, albeit at lower concentrations. It may therefore offer another potential alternative to ICBG for spinal fusion. However, few data on DBM were available for analysis.

Limitations

Several limitations should be noted. First, the small samples enrolled in the primary trials might not provide sufficient power to detect small differences between groups (type II error). Therefore, larger controlled trials of higher quality should be conducted to draw more definite conclusions. Second, the assessment of solid spinal fusion depended mainly on radiological evaluation. It must be taken into consideration that a predictive value of no more than 70% has been reported for radiological assessment [66, 67]. Novel assessment methods are therefore required to evaluate fusion more precisely. Finally, potential clinical heterogeneity, such as differences in the fusion technique, the number of segments fused, the use of internal fixation instrumentation, and the amount of graft provided, may have affected the reliability of the results. Hence, subgroup analyses were carried out for some of these confounding factors to decrease potential heterogeneity, and they gave stable ranking orders similar to those of the total NMA.

Conclusions

In summary, ranking spectrums of the efficacy and safety of various bone grafts were provided graphically to guide the selection of potential alternatives to ICBG in spondylodesis. RhBMP-2 had the highest success rate, which was statistically significant when compared with ICBG, ALB, allograft, and DBM + ALB. However, rhBMP-2 should be applied with proper caution because of the widely reported life-threatening adverse events, although their incidence is low. ALB alone, ALB plus synthetic ceramic materials, and allograft mixed with BMC were also shown to be potentially effective alternatives to ICBG.