Introduction

Neurofibromatosis type 2 (NF2) is an autosomal dominant tumor predisposition syndrome of the nervous system caused by mutations in the NF2 gene on chromosome 22, resulting in loss of function of its gene product, merlin, a key tumor suppressor [1,2,3]. The pathognomonic hallmark of NF2 is bilateral vestibular schwannomas (VS) by the third decade of life [4, 5]. Hearing loss is among the chief considerations in management of both sporadic and syndromic VS, as it may result from either treatment or the natural history of the disease. While VS are benign and typically indolent, treatment is often recommended in the setting of large tumors, worsening symptoms, or progressive tumor growth, with consideration given to microsurgical resection versus stereotactic radiosurgery (SRS) in most modern treatment paradigms [2]. Treatment decision-making is considerably more complicated in NF2 patients, as syndromic tumors are more likely to be multifocal, extensively involved with adjacent neurovascular structures, and challenging to resect safely without hearing loss. Such hearing loss is a major potential source of severe morbidity, given the very high baseline risk of contralateral hearing loss [6, 2].

In this clinical context, bevacizumab (Avastin), a Food and Drug Administration (FDA)-approved monoclonal antibody targeting vascular endothelial growth factor (VEGF), was first proposed by Plotkin et al. [7] in 2009 as a possible alternative or adjuvant therapy for management of hearing symptomatology associated with VS in NF2. Encouragingly from a translational perspective, a systematic review years later found that bevacizumab could confer hearing benefits in select patients, but concluded that the certainty at that time was low [8]. Since then, clinical evidence supporting this practice has grown beyond anecdotal reports, and translational studies have confirmed the association between expression of VEGF and its receptor VEGFR-1 and VS progression [9,10,11,12]. Notwithstanding, granular data describing the clinical course of VS lesions in NF2 following bevacizumab have remained limited to small case series, and the best available evidence supporting clinical practice is Level III (expert opinion, case series/reports) [13]. Additionally, mounting evidence suggests that bevacizumab-related toxicity may be more significant than previously estimated [14]. Whether the therapeutic benefits counterbalance the toxicity risks has yet to be substantiated beyond individual studies. Correspondingly, the aim of this meta-analysis was to assess the reproducibility and precision of all pertinent studies in the literature describing radiographic tumor control, hearing loss, and adverse event outcomes in response to bevacizumab therapy for VS lesions in NF2 patients.

Methods

Search strategy

The search strategy was designed using the PICOS format: Among NF2 patients with VS (Population) treated with bevacizumab (Intervention and Comparator), what are the radiographic tumor control and hearing loss rates (Outcome), based on observational studies (Study Type) during treatment? The literature review was conducted according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines and recommendations [15]. Electronic searches were performed using Ovid Embase, PubMed, SCOPUS, and Cochrane databases from inception to March 2019. The literature was searched independently by two investigators (VML & KR) using the following string of terms: (bevacizumab OR Avastin) AND (schwannoma OR acoustic neuroma) (Supplementary Table 1).

Selection criteria

Included articles reported patients (1) with confirmed diagnoses of VS and NF2, (2) treated with intravenous (IV) bevacizumab, and (3) with either radiographic tumor control or hearing outcome reported. There was no restriction regarding prior treatments or patient age. Exclusion criteria were (1) bevacizumab treatment for meningioma, ependymoma, and other non-schwannoma tumors, and (2) publications reporting n < 3 patients. For institutions publishing serial overlapping cohorts, only the most complete reports were included for quantitative assessment at each time interval. Publications were limited to those in the English language.

Data extraction

Outcomes were abstracted directly from article texts, tables, and figures independently by two investigators (VML & KR). The primary endpoints were radiographic tumor control and hearing outcomes at final bevacizumab dose compared to pre-treatment measurements. As no standardized bevacizumab regimen currently exists, the timing of the final dose was at the discretion of each institution, and in some cases coincided with the point at which maximum tolerability was reached. Radiographic tumor control was defined using MRI criteria as partial response (> 20% volume reduction), stable (< 20% change in volume), or progression (> 20% increase in volume), previously validated categorizations specific to NF2-associated VS lesions [16]. Hearing outcome in those with assessable hearing (6% or greater on a 50-word list [17]) was defined as improved, stable, or worsened based on Word Recognition Scores (WRS), as described by the American Academy of Otolaryngology Head and Neck Surgery (AAO-HNS) Committee on Hearing and Equilibrium classification [18, 19]. Plotkin et al. [7] utilized these hearing thresholds in their seminal work, and they have since become the standard categorizations in this niche scenario irrespective of the scale used.
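
The categorical definitions above can be expressed as a short sketch. The function names are hypothetical and introduced only for illustration. The text does not say how a change of exactly 20% is classified, so assigning it to the stable category is an assumption, as is scoring the 50-word list as a simple count of correct words.

```python
def classify_radiographic_response(baseline_volume, followup_volume):
    """Categorize volumetric change between the baseline and final-dose MRI."""
    if baseline_volume <= 0:
        raise ValueError("baseline_volume must be positive")
    change_pct = 100.0 * (followup_volume - baseline_volume) / baseline_volume
    if change_pct < -20.0:
        return "partial response"  # more than 20% volume reduction
    if change_pct > 20.0:
        return "progression"       # more than 20% volume increase
    # Assumption: a change of exactly 20% in either direction counts as stable.
    return "stable"


def has_assessable_hearing(words_correct, list_length=50):
    """True when the word recognition score is 6% or greater."""
    # Integer arithmetic avoids floating-point issues at the 6% boundary.
    return 100 * words_correct >= 6 * list_length
```

For example, a tumor shrinking from 10.0 to 7.0 cm³ (a 30% reduction) would be classified as a partial response under these thresholds.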

Secondary endpoints were complications from bevacizumab, including hypertension, proteinuria, amenorrhea, or any serious toxicities, defined as grade 3 and above according to the Common Terminology Criteria for Adverse Events (CTCAE) 4.0 [20] (Supplementary Table 2). In brief, Grade 3 refers to complications that are severe or medically significant but not immediately life-threatening, for which hospital admission or prolongation of hospitalization is indicated. More specifically, Grade 3 toxicities are disabling with respect to self-care and activities of daily living, whereas Grade 4 toxicities have potentially life-threatening consequences mandating urgent medical intervention.

Meta-analysis

The incidence rate was the primary summary statistic of this study. Incidence was calculated with initial variance by Fisher’s exact test for binomial data, and then transformed by the Freeman-Tukey transformation to stabilize the variances [21]. All statistics were pooled by meta-analysis of proportions using the random-effects (RE) model described by DerSimonian and Laird [22] to provide the overall study statistic. Heterogeneity was assessed using I² for RE modeling, with values > 50% indicating substantial heterogeneity [23]. Meta-analytic data were presented as forest plots. All P-values were 2-sided, significance was defined using an alpha threshold of 0.05, and the P-effect was used to describe testing against a null hypothesis of statistically negligible incidence. All statistical analyses were conducted with STATA 14.1 (StataCorp, College Station, Texas).
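
The pooling procedure described above can be illustrated with a minimal sketch. It is not the STATA routine used in this study. It assumes one common form of the Freeman-Tukey double-arcsine transformation (with variance 1/(n + 0.5)), the DerSimonian and Laird moment estimator of between-study variance, and a simplified back-transformation, sin²(t/2), in place of an exact inversion. The function names and the example counts are hypothetical and are not data from the included studies.

```python
import math


def freeman_tukey(events, n):
    """Double-arcsine transform of events/n and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)


def back_transform(t):
    """Simplified inverse of the double-arcsine transform (an approximation)."""
    t = min(max(t, 0.0), math.pi)  # keep within the invertible range
    return math.sin(t / 2.0) ** 2


def pool_proportions(studies):
    """Pool (events, n) pairs with a DerSimonian-Laird random-effects model."""
    transformed = [freeman_tukey(x, n) for x, n in studies]
    y = [t for t, _ in transformed]
    v = [var for _, var in transformed]
    df = len(studies) - 1

    # Fixed-effect weights and Cochran's Q
    w = [1.0 / vi for vi in v]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, y))

    # Between-study variance (tau^2) and I^2 as a percentage
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0

    # Random-effects weights, pooled estimate, and 95% CI on the transformed scale
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled_t = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": back_transform(pooled_t),
        "ci_low": back_transform(pooled_t - 1.96 * se),
        "ci_high": back_transform(pooled_t + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical counts for illustration only
print(pool_proportions([(3, 12), (9, 25), (20, 50)]))
```

The variance of the transformed proportion depends only on the sample size, which is why the transformation stabilizes weights for small cohorts with proportions near 0 or 1.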

Quality and bias assessment

The certainty of each outcome was evaluated using the Grading of Recommendations, Assessment, Development and Evaluations (GRADE) approach and presented as a summary of findings to identify the certainty of all pooled outcomes [24]. The quality of evidence for each study was then evaluated using a modified Newcastle–Ottawa Scale (NOS) [25] for assessment of single-arm cohort studies [26]. Overall methodologic quality was then summarized based on the quality trends observed. In terms of bias for each outcome, when n ≥ 10 studies, publication bias was assessed using funnel plots, and small-study biases were evaluated using Egger’s linear regression test and Begg’s correlation test [27, 28].
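
As a rough illustration of the small-study bias assessment described above, the sketch below fits Egger's regression (standardized effect against precision) by ordinary least squares, together with the 10-study rule. It is an assumption-laden sketch rather than the software routine referenced in the Methods. The function names are hypothetical, Begg's test is not shown, and the P-value for the intercept (from a t distribution with k − 2 degrees of freedom) is left out because it would require a statistics library.

```python
import math


def enough_studies_for_bias_tests(k, minimum=10):
    """Bias tests are attempted only when at least `minimum` studies are pooled."""
    return k >= minimum


def egger_regression(effects, standard_errors):
    """Regress effect/SE on 1/SE; the intercept reflects funnel-plot asymmetry."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least 3 studies are needed to fit the regression")
    x = [1.0 / se for se in standard_errors]                 # precision
    z = [e / se for e, se in zip(effects, standard_errors)]  # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance and standard error of the intercept
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(sse / (k - 2) * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept
```

An intercept far from zero relative to its standard error would suggest that smaller studies report systematically different effects from larger ones.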

Results

Search results

Following a primary search result of 186 articles and the removal of 74 duplicate citations, the titles and abstracts of 112 articles were evaluated against the selection criteria (Fig. 1). Full-text analysis was performed for 21 articles, of which 2 prospective [29, 30] and 6 retrospective studies [31,32,33,34, 17, 35] satisfied all study criteria (Table 1).

Fig. 1 Search results according to PRISMA guidelines

Table 1 Study characteristics and primary demographics

Demographics and clinical features

One hundred and sixty-one NF2 patients harboring 196 VS were reported with adequate tumor control data; assessable hearing data were reported in 114 patients (Table 1). Overall median female proportion was 55% (range 33–71%), median age was 30 years (range 15–34), and indications for bevacizumab in all cohorts included progression of tumor size and/or hearing loss. Both initiation and maintenance bevacizumab regimens varied greatly between cohorts, with a dose range of 5–10 mg/kg/2–6 weeks (Table 2). Median treatment duration was 14 months (range 11–22 months).

Table 2 Clinical features of all included studies

Previous treatments were incompletely reported, but at least 15% of patients had undergone surgical resection for VS, while 10% had been treated with radiotherapy of any modality, only one of whom was confirmed to have been irradiated less than 12 months prior to initiation of bevacizumab. Three studies [30, 17, 35] administered bevacizumab per protocol to previously untreated patients, as an assessment of its efficacy in delaying or avoiding surgery.

Radiographic response

The pooled incidence of partial response was 41% (95% CI 31–51%; P-effect < 0.01; I² = 42%; P-heterogeneity = 0.10), calculated by RE modeling based on eight individual cohorts [31, 29, 32,33,34, 30, 17, 35] describing 196 VS (Fig. 2). In the same population, pooled incidence of stable response was 47% (95% CI 39–55%; P-effect < 0.01; I² = 9%; P-heterogeneity = 0.36), while the pooled incidence of tumor progression was 7% (95% CI 1–15%; P-effect < 0.01; I² = 59%; P-heterogeneity = 0.02).

Fig. 2 Forest plot of the incidence of radiographic responses of vestibular schwannoma lesions in NF2 patients, categorized as partial response, stable, or progression. The effect size (ES) of incidence, its 95% CI, and the relative weightings are represented by the middle of the square, the horizontal line, and the relative size of the square, respectively

Hearing outcome

The pooled incidence of hearing improvement was 20% (95% CI 9–33%; P-effect < 0.01; I² = 42%; P-heterogeneity = 0.11), calculated by RE modeling based on seven individual cohorts [31, 29, 32, 34, 30, 17, 35] describing 114 patients with assessable hearing (Fig. 3). In the same population, pooled incidence of stable hearing was 69% (95% CI 51–85%; P-effect < 0.01; I² = 61%; P-heterogeneity = 0.02), and pooled incidence of worsened hearing was 6% (95% CI 1–15%; P-effect = 0.01; I² = 25%; P-heterogeneity = 0.25).

Fig. 3 Forest plot of the incidence of hearing outcomes in NF2 patients with vestibular schwannoma and testable hearing, categorized as improved, stable, or worsened. The effect size (ES) of incidence, its 95% CI, and the relative weightings are represented by the middle of the square, the horizontal line, and the relative size of the square, respectively

Complications

The pooled incidence of treatment-induced hypertension of all severities was 33% (95% CI 20–45%; P-effect < 0.01; I² = 44%; P-heterogeneity = 0.10), and the pooled incidence of proteinuria of all severities while on bevacizumab therapy was 43% (95% CI 23–64%; P-effect < 0.01; I² = 78%; P-heterogeneity < 0.01). These estimates were calculated by RE modeling based on seven individual cohorts [31, 29, 32, 34, 30, 17, 35] describing 145 treated patients. The pooled incidence of amenorrhea was 70% (95% CI 51–87%; P-effect < 0.01; I² = 13%; P-heterogeneity = 0.32), calculated by RE modeling based on four individual cohorts [31, 29, 17, 35] describing 35 treated female patients.

The pooled incidence of serious toxicity (Grade 3 or higher) was 17% (95% CI 10–26%; P-effect < 0.01; I² = 13%; P-heterogeneity = 0.33), calculated by RE modeling based on five individual cohorts [31, 29, 34, 30, 17] describing 125 patients. Although studies lacked granularity in describing how hypertension was managed, the incidence of Grade 3 hypertensive events (with implied need for medical/clinical intervention) was anecdotally low, as reported by Morris et al. [30] (1/61 patients, 2%), Plotkin et al. [17] (1/31 patients, 3%), and Blakeley et al. [29] (2/14 patients, 14%).

There was one on-treatment death, reported by Alanin et al. [31], due to spontaneous intracerebral hemorrhage in a 23-year-old female patient. The need to cease treatment early was documented by four studies. Blakeley et al. [29] reported the reasons as immune thrombocytopenia and surgery for another tumor (2/14 patients, 14%). Hochart et al. [34] reported the reasons as hypertension and infection (2/7 patients, 29%). Plotkin et al. [17] reported the reason as proteinuria (3/31 patients, 10%). Finally, Morris et al. [30] reported four cases of hypertension, two cases of bleeding, two cases related to wound healing, one case of fatigue, and one case of infection as reasons for cessation (10/61 patients, 16%).
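
The per-study cessation fractions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text, keyed by reference number, and rounds to the nearest whole percent; truncating instead would give slightly lower values for 2/7 and 3/31. The variable names are illustrative.

```python
# Early-cessation counts from the text: (patients who stopped, cohort size)
cessation = {
    "[29]": (2, 14),
    "[34]": (2, 7),
    "[17]": (3, 31),
    "[30]": (10, 61),
}


def percent(events, n):
    """Percentage rounded to the nearest whole number."""
    return round(100 * events / n)


percentages = {ref: percent(e, n) for ref, (e, n) in cessation.items()}

# Reasons for cessation itemized for reference [30]; they should sum to 10
reasons_ref_30 = {
    "hypertension": 4,
    "bleeding": 2,
    "wound healing": 2,
    "fatigue": 1,
    "infection": 1,
}
total_ref_30 = sum(reasons_ref_30.values())

print(percentages, total_ref_30)
```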

Surgical intervention

The pooled incidence of treatment failure requiring surgical intervention for VS resection was 11% (95% CI 2–20%; P-effect = 0.02; I² = 59%; P-heterogeneity = 0.06), calculated by RE modeling based on four individual cohorts [33, 30, 17, 35] describing 125 treated patients.

Quality and bias assessment

The certainties of all reported outcomes were then assessed against the GRADE criteria (Table 3). For radiographic response, certainty ranged from moderate in those with stable response, to very low in those with progression. For hearing outcomes, certainty ranged from moderate in those with worsening, to very low in those with stable response. For complications, certainty ranged from moderate regarding serious toxicity and amenorrhea to very low regarding hypertension and proteinuria. Finally, certainty of surgical intervention occurring was also very low.

Table 3 GRADE assessment for reported outcomes

Against the modified NOS criteria, the median score was 5 (range 3–5) out of a maximum of 5, evaluating studies for selection, ascertainment, causality, and reporting (Supplementary Table 3). All studies were deemed to be of moderate to high quality overall, with the primary reason for point deduction being lack of explicit therapy duration. The risks of publication and small-study biases could not be reliably assessed because fewer than 10 cohorts contributed to each outcome, and so these analyses were not conducted.

Discussion

Bevacizumab has recently emerged as a compelling therapeutic option for managing progressive VS in NF2. Our meta-analysis suggests that a substantial proportion of treated patients will experience a beneficial impact on tumor control or hearing preservation; however, the risk of serious complications is not negligible, and may include hypertension, proteinuria, or amenorrhea. Certainty of these outcomes ranged from very low to moderate. Taken together, these findings support further consideration of bevacizumab in the very challenging NF2 population, while underscoring the need for prospective, and ideally randomized, controlled trials to establish the longer-term treatment effect and risk profile in a more rigorous fashion.

In the largest series to date, Morris et al. [30] reported in a cohort of 61 patients that bevacizumab treatment resulted in radiographic tumor control in 82% and hearing preservation in 95%, when comparing pre-treatment baseline to outcomes at termination of therapy. These findings are largely representative of both the other included studies and the overall pooled confidence intervals of the study outcomes. Notable exceptions include Hochart et al. [34], who reported the highest proportion of radiographic progression (4/11 lesions, 36%), as well as Alanin et al. [31] and Sverak et al. [35], who each reported the highest proportion of worsening hearing (2/9 patients, 22%). These outliers represent the primary sources of heterogeneity driving our study outcomes in spite of their small sample sizes, which emphasizes the importance of cautiously interpreting the reported incidences of tumor control or hearing preservation on bevacizumab until their reproducibility can be better validated.

Our study also identified significant estimates of bevacizumab treatment complications, including the concerning finding that almost 1-in-5 patients experienced an adverse event requiring medical intervention within the first year of therapy. Furthermore, the results estimated that 1-in-3 will experience hypertension, and up to 1-in-2 will experience proteinuria. In addition, a non-negligible proportion of patients appeared to cease treatment early due to intrinsic and induced complications. Although concerning, these significant incidence rates are not necessarily surprising, given that multiple clinical trials of bevacizumab in recurrent intracranial glioblastoma [36,37,38] have reported serious toxicity events in > 30% of patients, and hypertension or proteinuria in > 50% and > 30%, respectively. However, while the treatment risks are comparable, the context in which the counterbalance of potential benefits is weighed is different, particularly given the more benign nature of VS and the near-universal fatality of glioblastoma. What is currently absent is any confirmation that a relationship between bevacizumab dosage and complications does or does not exist, which will prove of the utmost utility in future attempts to standardize treatment schedules.

Another major concern arising from our systematic review is the marked heterogeneity in treatment practices for bevacizumab administration. When a single dose schedule was reported, this ranged from 5 mg/kg/2 weeks [33] to 10 mg/kg/6 weeks [35]. In studies [31, 32, 30] that reported both initiation and maintenance regimens, doses at initiation ranged from 5 to 10 mg/kg/2 weeks, and from 2.5 mg/kg/2 weeks to 15 mg/kg/3 weeks for maintenance. Complicating this discussion further, expert opinions and local treatment practices regarding the timing of therapy cessation versus institution of a drug holiday remain topics of active dispute; correspondingly, the validity of direct comparisons in treatment benefit between studies remains limited at present [13, 30].

Finally, the present analysis has identified age as a key parameter that will demand greater granularity in future study designs. Morris et al. [30] identified a pediatric subgroup (< 18 years old) of 6 NF2 patients in their study, and observed that VS lesions in this subgroup had a greater propensity for tumor progression while on bevacizumab therapy, as compared to adult counterparts. The poor pediatric treatment response is further supported by Hochart et al. [34], who described the only pediatric-specific experience, a series of 7 NF2 patients that demonstrated the highest rate of radiographic progression among all included studies. Adding yet another dimension of complexity to the treatment implications in children with NF2, the high incidence of amenorrhea and its reproductive implications warrant particular attention among young female patients of child-bearing age [31, 29, 17, 35].

Limitations of the literature and future directions

The literature describing bevacizumab for VS in NF2 is limited by a number of constraints. Outside the obvious need for greater cohort sizes, the first is the lack of standardized regimen doses and durations, as well as surveillance protocols, which likely contributes to the reported heterogeneity of the outcomes in the meta-analysis. It is unclear whether the successes, need for subsequent surgery, and time-dependent complications of therapy will change if longer regimens are followed, as in the case of glioblastoma, or whether tumor control and hearing preservation may still be maintained on a less toxic treatment scheme in this benign disease process. Indeed, treating the time-to-event of all outcomes reported in this study as homogeneous requires significant caution, as it was beyond the scope of this study to adjust for variance in institution-based follow-up protocols.

As there currently exists no standard therapy duration, our results are limited by the variable use of patient-dependent therapy holidays, heterogeneous dosing regimens and schedules, and post-cessation reporting practices. Although currently beyond the scope of the PICOS question and systematic review design, reliably determining the durability of bevacizumab effects both on and off therapy would greatly inform clinicians about the sustainability of therapy, the durability of tumor control or hearing protection following cessation, the lowest effective dose and frequency, and the possibility of progressive toxicity with longer-term use. Elucidating these parameters will require greater assessment standardization and methodological transparency in the future.

Another concern is the risk of selection bias in reported outcomes. Ultimately, the precision of this meta-analysis relies on the accurate reporting of the included outcomes, favorable or otherwise, a practice that cannot be assumed in observational cohort studies. We attempted to mitigate some of the systematic bias introduced by patient selection by excluding very small series of 1–2 cases, given that reporting and publishing practices are such that institutions would be highly unlikely to publish case reports of unsuccessful outcomes, absent a highly interesting complication or novel application. Notwithstanding, until a randomized, controlled, blinded trial can be conducted, the reported incidences of benefit and complications following bevacizumab therapy remain highly suspicious for the influence of systematic bias, potentially resulting in overstatement of benefit and underestimation of risk. Given the orphan nature of this disease, an international registry could assist in attaining greater statistical power sooner.

A final limitation that warrants consideration is the paucity in reporting of both long-term outcomes and the quality-of-life (QOL) impact sustained on bevacizumab. At present, all studies report outcomes of bevacizumab regimens with a duration of less than two years. Further, only Morris et al. [30] reported QOL outcomes, demonstrating significant increases in quality of life (using the disease-specific NFTI-QOL measure [39]) in patients after 3 and 6 months of bevacizumab therapy, with no significant change noted in a matched, historic, bevacizumab-free cohort. If greater granularity can be achieved in demonstrating that bevacizumab positively impacts long-term QOL in NF2 patients, firmer recommendations regarding treatment benefits may be possible. To that end, long-term follow-up both on and off therapy is required to ascertain the true clinical efficacy of bevacizumab in this population, particularly given that the high risk of adverse events may be more reasonable to accept if post-treatment follow-up studies demonstrate compelling long-term benefit, ideally in multiple domains (e.g. tumor control, hearing preservation, QOL).

Conclusions

This meta-analysis systematically evaluated the current literature regarding the efficacy and safety of the anti-angiogenic monoclonal antibody bevacizumab for the treatment of VS lesions in NF2 patients. We report that a significant majority of patients will experience tumor control and hearing improvement or preservation during therapy; however, a considerable proportion will also experience a serious treatment complication, such as hypertension, proteinuria, or amenorrhea. Pediatric patients may respond less favorably than adults based on presently reported evidence. Perhaps most importantly, our review demonstrates the growing need for a standardized approach to both the clinical utilization of bevacizumab in this population and reporting practices for short- and long-term follow-up and QOL outcomes, as well as the ultimate need for higher-level evidence, preferably in the form of a randomized controlled trial.