Key Points

The number of “low risk of bias” randomized controlled trials (RCTs) of intravitreal bevacizumab (IVTB) in patients with age-related macular degeneration (ARMD) that illustrate its cardiovascular safety is limited, the number of patients is modest for the purpose, and the number of events is low.

Consequently, estimates are imprecise and fragile and result in a low quality of evidence (high level of uncertainty).

At present, we simply do not know whether IVTB affects cardiovascular risk in patients with ARMD (as compared with any other treatment) and in which direction.

1 Introduction

Neovascular age-related macular degeneration (ARMD) is the leading cause of vision loss among elderly people in developed countries. Uncontrolled expression of the vascular endothelial growth factor (VEGF)-A in the retinal tissue promotes angiogenesis and vascular permeability, resulting in progression of wet ARMD [1, 2]. Treatment of the disease is of major importance in delaying vision loss and providing functional benefit. The first anti-VEGF-based treatment approved (in 2006) for this purpose, pegaptanib, a pegylated aptamer that inhibits VEGF, was followed by ranibizumab (in 2007), a humanized anti-VEGF monoclonal antibody fragment, and finally (in 2012) by aflibercept, a fusion protein (a portion of the VEGF receptor as an active principle) that functions as a decoy for VEGF. All these treatments are administered intravitreally (IVT) [3] but yield detectable levels in the systemic circulation. Hence, there is a rationale for potential occurrence of systemic adverse events (AEs) [1]. These are probably related to the intrinsic properties of VEGF. Physiologically, VEGF promotes vascular repair in response to hypoxia. Therefore, long-term use of IVT anti-VEGF treatments for ARMD might increase the risk of a prolonged systemic inhibition of VEGF, with possible adverse consequences, particularly those of a thromboembolic nature. Moreover, the population affected by ARMD, especially in the late stages of the disease, is at an increased risk of cardiovascular (CVD) and cerebrovascular incidents [4, 5].

Bevacizumab is a humanized monoclonal antibody against VEGF-A that selectively inhibits all of its isoforms and bioactive proteolytic breakdown products, thus inhibiting angiogenesis [6]. It has been approved since 2004 for the treatment of metastatic colorectal cancer and lung, kidney, ovarian, and brain cancers. Approval for the use of bevacizumab has never been obtained (nor sought) for the treatment of ARMD, yet it is the most commonly used treatment for ARMD worldwide [7]. The reason is its considerably lower price as compared with other anti-VEGF treatments. Although IVT bevacizumab (IVTB) is as effective as ranibizumab [8], its use in this indication is off label. Off-label prescribing—prescription of a medication in a manner different from that approved by the regulatory agencies—is legal and common, yet it is often carried out in the absence of adequate supporting data. Off-label uses have not been formally evaluated, and evidence provided for one clinical situation may not apply to other situations. Hence, off-label use undercuts expectations that drug safety and efficacy have been fully evaluated [9]. The systemic use of bevacizumab is associated with serious CVD AEs: hypertension, arterial thrombotic events, hemorrhage, and death [1]. As suggested by animal and limited human data on systemic exposure, IVTB might affect circulating levels of VEGF and also induce systemic AEs [10, 11]. Since it is used off-label, bevacizumab—unlike the approved treatments in this setting—is not embraced by periodic safety update reports, risk minimization measures, or other pharmacovigilance tools. Consequently, publicly available data are the key source of information. 
The issue has attracted much attention, and at least eight systematic reviews/meta-analyses of randomized controlled trials (RCTs) with IVTB in ARMD addressing the problem of systemic safety were published between 2014 and June 2015 [1, 8, 12–17] [see the Electronic Supplementary Material 1 (ESM 01), Table S1]. However, not all have addressed the same aspects of CVD safety, and their conclusions have not been unequivocal. When it comes to the safety assessment of an intervention, RCTs (particularly published reports) and meta-analyses of RCTs have several major limitations, especially regarding rare AEs; typically, they are designed primarily to assess efficacy, not safety. As such, they are underpowered [18] to detect a relevant safety difference. For example, the largest RCT of bevacizumab versus ranibizumab in patients with ARMD identified in the recently published reviews [1, 8, 12–17] enrolled a total of close to 600 patients in each arm. As such, it had only around 20 % power to detect an absolute difference in incidence of, for example, myocardial infarction (MI) of 1 %, e.g., 3 versus 2 % (over 1 year), which translates into a relative risk of 1.50 and a practically relevant absolute effect of ten more patients with MI per 1000 treated. Similarly, the largest number of RCTs included in the bevacizumab versus ranibizumab in ARMD meta-analyses [12] was nine, with a total of around 3600 subjects. An ad hoc calculation [19] suggests that, assuming closely similar trial/arm sizes and mild to moderate heterogeneity (up to I² = 50 %), such an analysis had between 40 and 60 % power to detect the described practically relevant difference. Safety reporting is commonly poor, i.e., affected by attrition/selective reporting bias (a typical example is not reporting “non-events”). Trial quality is typically assessed with respect to efficacy (primary) outcomes, and a trial of good quality in this respect could be of poor quality with respect to the safety aspects [20].
Consequently, we undertook a systematic review of the literature to identify high-quality clinical and epidemiological safety data in an attempt to estimate whether IVTB affected the risk of CVD AEs in patients with ARMD, and if so, to estimate the incidence of such events.
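
The power figure quoted above for the largest trial (around 600 patients per arm, 3 versus 2 % incidence) can be checked approximately with a standard two-sided test of two proportions. The following is a minimal sketch under a normal approximation; the function name and the choice of approximation are illustrative assumptions, not the method of reference [18] or [19].

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes equal arm sizes and a normal approximation (illustrative only).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    diff = abs(p1 - p2)
    upper = nd.cdf((diff - z_crit * se_null) / se_alt)
    lower = nd.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


power = power_two_proportions(0.03, 0.02, 600)
relative_risk = 0.03 / 0.02
extra_per_1000 = round((0.03 - 0.02) * 1000)
print(f"power = {power:.2f}, RR = {relative_risk:.2f}, "
      f"extra events per 1000 = {extra_per_1000}")
```

Under these assumptions the approximation returns a power close to 20 %, in line with the figure given in the text.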

2 Materials and Methods

Our systematic review of the literature (Fig. 1a) included the selection of studies for quality assessment and a selection of high-quality studies for the risk and incidence estimation. Study identification and selection were conducted by the two authors independently, and disagreements were resolved via consensus. The estimation of the bevacizumab-associated risk was to be based on RCTs, non-randomized controlled clinical studies, and controlled epidemiological studies, i.e., cohort or case–control studies. Estimation of incidence was to be based on bevacizumab arms from RCTs and non-randomized controlled clinical studies and on case-series (uncontrolled cohorts).

Fig. 1

Study design (a) and PRISMA flow chart (b) of the study identification, selection, and assessment process. Studies meeting inclusion/exclusion criteria were first assessed for quality in line with their design, with a focus on consideration and reporting of systemic adverse events. The Cochrane Collaboration risk of bias tool for randomized controlled trials was adapted to fit the specific needs related to adverse events [21]. The Newcastle–Ottawa tool for (epidemiological) cohort studies [22] was adapted to assess also uncontrolled clinical studies (i.e., case-series). AE adverse event, RCT randomized controlled trial

2.1 Literature Search and Selection of Studies for Qualitative Assessment

Elements of PICOS were combined in the literature search and study selection criteria. Electronic databases [PubMed MEDLINE, Ovid MEDLINE, all Cochrane Library, and EBSCO (Academic Search Complete, CINAHL, and ERIC)] were searched systematically and repeatedly between 1 June 2014 and 15 June 2015 (in periods: 1 June 2014–6 July 2014, 26 January 2015–5 February 2015, and 8 June–15 June 2015). Studies published up to 15 June 2015 were considered. The search strategy was designed to be sensitive rather than specific: we used the terms “age-related macular degeneration” or “AMD (ARMD)”, or synonyms, i.e., “neovascular” or “wet” or “exudative” combined with “macular degeneration” or “choroidal neovascularization” to identify the disease; “bevacizumab” or “Avastin” combined with “intravitreal” or “intra-ocular” to identify the treatment (the terms were used with the “all fields” option). The search was limited to articles in English and German. We also reviewed the reference lists of the retrieved articles. Studies meeting the following criteria were included in quality assessment: (1) Studies pertained to IVT administration of bevacizumab in humans with ARMD; (2) Studies were RCTs, case–control studies, stratified cohort studies, or clinical non-randomized controlled or uncontrolled studies (prospective or retrospective); (3) Studies explicitly addressed at least one of the following outcomes: (a) all-cause mortality; (b) vascular mortality; (c) mortality of unknown causes; (d) incident hypertension; (e) arterial thrombotic incidents (stroke or transitory ischemic attack; MI or angina; peripheral artery occlusion); (f) venous thromboembolism; (g) non-ocular hemorrhage; or alternatively, “systemic AEs” were mentioned.
Exclusion criteria were as follows: (1) Congress abstracts; (2) Studies with ≤20 subjects exposed to bevacizumab; (3) Bevacizumab used in a combined treatment (e.g., bevacizumab + photodynamic therapy); (4) Follow-up period of <6 months and/or fewer than three repeated bevacizumab administrations. When a study was published in more than one article, the one with the most complete data was included. The exceptions were separate reports (publications) on extended results from the same studies; these were considered as separate contributions to effect evaluation at different time points. Since no data pooling across time points was conducted, this did not represent a “unit of analysis” issue and there was no “double data counting”. Selection agreement between the two authors was assessed after title and abstract screening and after full-text evaluations (Fig. 1a). We used only published data and did not contact the authors of primary studies.

2.2 Quality Assessment and Selection of Studies for Quantitative Synthesis

Quality assessment was based primarily on definition, ascertainment and reporting of the targeted AEs and subject selection criteria. We used the Cochrane Collaboration risk-of-bias instrument, modified to include the specifics for AEs, to evaluate RCTs [21], and we used the respective Newcastle–Ottawa Scales (NOS) for the assessment of stratified cohort and case–control studies [22]. Non-randomized controlled clinical studies (concurrent or historical controls) were managed as stratified cohort studies. For uncontrolled cohorts (i.e., essentially large case-series), we used the elements from NOS pertaining to patient selection, ascertainment of exposure, and outcome-related items. To avoid misleading (biased) estimates, only studies judged to be of high quality were considered for quantitative data evaluation (Fig. 1a). An RCT was considered to be of high quality when (1) the risk of reporting bias and the risk of attrition bias were low: the “adverse event” modification of these items required that definitions of AEs were given; that pro-active methods of AE monitoring were implemented and described; that numerical values were provided, including explicit reporting of “zero events”; (2) not more than one of the other risks of bias [selection (random sequence generation; allocation concealment); performance (blinding of participants), and detection (blinding of investigators; blinding of outcome assessment)] was unclear; and (3) no risk of bias was “high”. 
Stratified cohort studies (and non-randomized controlled clinical studies) were considered to be of high quality when at least one star could be assigned to each of the eight NOS elements [selection of cohorts—representativeness of the exposed cohort, selection of the non-exposed cohort, ascertainment of exposure, evidence of no outcome of interest at baseline; comparability of cohorts—by design/analysis (two stars possible); and outcome—method of assessment, length of follow-up sufficient for outcome to occur and adequacy of follow-up (proportion and accounting for loss to follow-up)] i.e., eight of nine stars were assigned. A case–control study was considered to be of high quality when at least seven of nine possible stars by the respective NOS [22] could be assigned, i.e., one to each of the following: case definition, case representativeness, definition of controls, comparability of cases and controls (two stars possible), ascertainment of exposure and same method for cases and controls, and non-response rate. We considered it acceptable to have hospital-based (and not necessarily community-based) controls. Finally, uncontrolled cohorts (i.e., large case-series) were considered to be of an adequate quality when a star could be assigned to the following elements of NOS for the cohort studies [22]: representativeness of the exposed cohort, ascertainment of exposure, assessment of the outcome, follow-up long enough for the outcome to occur and follow-up adequacy (account for loss to follow-up). We considered it acceptable when such studies included patients with existing CVD/cerebrovascular morbidity (e.g., hypertension or coronary artery disease) but reported on worsening of the condition.

2.3 Outcome Measures, Data Extraction, and Quantitative Synthesis

Data were extracted independently by the two authors in standardized forms using the Cochrane Collaboration methodology [23], and any discrepancies were resolved via consensus. CVD/cerebrovascular safety outcomes were extracted as explicitly reported regarding definition (e.g., “myocardial infarction”, “stroke”) and numerical values: we did not attempt to generate “composite” measures by combining individual outcomes, e.g., generating “any CVD event” from reports on a variety of individual events. Hence, we avoided double counting of subjects beyond the double counting that occurred in primary studies (e.g., a patient with different events reported as incident for each event). However, when each outcome is assessed separately, the latter does not generate a “unit of analysis issue” [23]. In RCTs, study-level binary data were extracted to illustrate the cumulative incidence of a particular AE as a number of patients with an event per total number of patients reported as “safety dataset” or alternatively (if “safety dataset” was not declared), as those who “received at least one dose of the assigned treatment” or as an “intent-to-treat dataset”. For case–control, cohort, and non-randomized controlled clinical studies, adjusted ratio measures (risk, odds, rate, hazard) were considered. For estimation of incidence, n/N data were extracted from bevacizumab arms in clinical studies. Data pooling was considered when two or more studies/arms reported on the same outcome and were clinically comparable, i.e., came from studies of similar designs (RCT, observational) and referred to a similar follow-up period (separate pooling by time period, no pooling across time periods). For estimation of incidence, “clinical similarity” implied also that the patients were similar regarding the pre-existing CVD/cerebrovascular burden.
For RCTs, we used Peto odds ratio (OR) for outcomes with frequencies around or below 1 %; for outcomes with an incidence of >1 %, we used random-effects (due to the variability of study particulars) Mantel–Haenszel (M–H) OR [24]. In both cases, we also report conditional exact M–H ORs with mid-P confidence intervals (CIs), as the method does not use continuity corrections and does not exclude zero-event trials. For observational studies (adjusted ratio measures), we anticipated the random-effects inverse-variance method [24]. For meta-analysis of proportions (incidence), we used the random-effects Freeman–Tukey double arcsine transformation method. We performed exploratory random-effects meta-regression (restricted maximum likelihood estimation) to evaluate the effects of study duration and the mean number of delivered doses on the incidence of AEs. To illustrate heterogeneity of study effects, we reported the estimate of true between-study variance (τ²), the inconsistency index (I²) illustrating the proportion of variability (heterogeneity) not ascribable to random error, the Q statistic, and the 95 % prediction interval (PI) (whenever more than three studies were pooled) as likely the most intuitive measure of dispersion of true effects (“true heterogeneity”) [25]. With three RCTs, we used the 90 % PI since the critical t value with 1 df and probability of 0.05 is high, resulting in non-intuitively wide intervals. Considering the low number of trials, I² estimated to be 0.0 % was considered likely inaccurate, hence the upper confidence limit (UCL) for I² as an indicator of (in)consistency is also reported [26]. Between-treatment comparisons referred to bevacizumab versus “non-anti VEGF treatments” and versus “other anti-VEGF treatments”. Hence, different comparisons from a single trial were used for different analyses, thus a “unit of analysis issue” was not generated. Considering the comprehensive literature search, we did not specifically evaluate publication bias.
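
As a rough illustration of the Peto method named above, the sketch below pools 2×2 tables using the usual one-step "observed minus expected" formula. The function name and the tuple layout are illustrative assumptions and do not reproduce the STATA routine used for the review. The first table uses the MI counts reported by one trial [51] (0/220 versus 6/221); the second is a hypothetical zero-event trial, included only to show that such trials carry no information under this method.

```python
import math


def peto_odds_ratio(trials):
    """Pooled Peto odds ratio with a 95 % CI.

    trials: iterable of (events_a, n_a, events_b, n_b) tuples.
    Trials with no events (or events in every patient) have zero
    hypergeometric variance and are skipped.
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for events_a, n_a, events_b, n_b in trials:
        n_total = n_a + n_b
        events = events_a + events_b
        if events == 0 or events == n_total:
            continue  # contributes nothing to the Peto estimate
        expected_a = n_a * events / n_total
        variance = (n_a * n_b * events * (n_total - events)
                    / (n_total ** 2 * (n_total - 1)))
        sum_o_minus_e += events_a - expected_a
        sum_v += variance
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))


single = peto_odds_ratio([(0, 220, 6, 221)])
with_zero_event_trial = peto_odds_ratio([(0, 220, 6, 221), (0, 50, 0, 50)])
print(single)
print(with_zero_event_trial)
```

The two calls return the same estimate, which is why an exact conditional method that retains zero-event trials is reported alongside the Peto OR.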
Although it would have been statistically correct, we considered that adjusting for multiplicity of comparisons would have been too conservative and would have resulted in missing a potentially relevant safety signal. We used STATA 13 (StataCorp, College Station, TX, USA) software.
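
The heterogeneity measures described in this section, and the reason for preferring a 90 % PI when only three trials are pooled, can be sketched as follows. This is a minimal illustration that assumes a DerSimonian–Laird estimate of τ² (the estimator is not specified in the text) and a prediction interval based on a t distribution with k − 2 degrees of freedom; the three log odds ratios and variances are hypothetical, and all function names are illustrative.

```python
import math


def t_quantile_df1(p):
    """Quantile of Student's t with 1 df (identical to the Cauchy distribution)."""
    return math.tan(math.pi * (p - 0.5))


def random_effects_summary(effects, variances):
    """Q, I^2 (%), DerSimonian-Laird tau^2, and the random-effects pooled mean."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    return {"q": q, "i2": i2, "tau2": tau2, "mu": mu, "se_mu": se_mu}


def prediction_interval_three_trials(mu, se_mu, tau2, level):
    """Prediction interval for k = 3 trials, i.e., t with k - 2 = 1 df."""
    t_crit = t_quantile_df1(0.5 + level / 2.0)
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width


# Hypothetical log odds ratios and their variances for three trials
summary = random_effects_summary([-0.9, 0.1, 0.8], [0.20, 0.15, 0.25])
pi95 = prediction_interval_three_trials(summary["mu"], summary["se_mu"],
                                        summary["tau2"], 0.95)
pi90 = prediction_interval_three_trials(summary["mu"], summary["se_mu"],
                                        summary["tau2"], 0.90)
print(summary)
print("95 % PI (log OR):", pi95)
print("90 % PI (log OR):", pi90)
```

With 1 df the two-sided critical t value is about 12.7 at the 95 % level and about 6.3 at the 90 % level, so the 95 % PI is roughly twice as wide as the 90 % PI for the same data.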

2.4 Summarizing and Grading Quality of Evidence

We used the GRADE methodology [27, 28] that a priori treats RCT-based data as high-quality evidence (⊕⊕⊕⊕) and observational data as low-quality evidence (⊕⊕○○). Considering the literature search, publication bias was not evaluated in the quality grading. Furthermore, we did not evaluate primary study limitations (risk of bias), since they were all selected on a criterion of high-quality using the established instruments for the respective study designs. Quality grading was based on (im)precision, sparseness, (in)directness, (in)consistency, and size of the effect. When the body of evidence consisted of a low number of trials/small sample size combined with sparse data (common lack of events, low number of events), we downgraded quality for both imprecision and sparseness.

3 Results

3.1 Eligible Studies

Of the identified 2028 non-duplicate records, 186 were retrieved for full-text evaluation (Fig. 1b). In this last round of eligibility assessment, 154 studies were excluded: ten RCTs (five had fewer than 20 patients treated with bevacizumab [29–33], two had <6 months of follow-up [34, 35], and three had no explicit reference to “systemic AEs” [36–38]); one population-based time-series analysis indicating no change in stroke-related hospitalization rates co-incident with the increasing ophthalmological use of bevacizumab and ranibizumab, but including a variety of treated condition(s) (not only ARMD) [39]; one population-based nested matched case–control study indicating no association between exposure to either bevacizumab or ranibizumab within 6 months and occurrence of ischemic stroke, MI, venous thromboembolism (VTE) or chronic heart failure (HF), but considering a variety of treated ophthalmological conditions [40]; and 142 clinical case-series or non-randomized controlled studies (mostly retrospective chart reviews; small and/or short and/or not mentioning “systemic AEs”). Hence, a total of 32 studies met the inclusion/exclusion criteria and were assessed for quality (Fig. 1b). A total of 15 studies (ten RCTs in 12 publications; one non-randomized controlled clinical study; two population-based stratified cohort studies, and two uncontrolled clinical cohorts) met the quality criteria and were considered for estimation of the risk of CVD AEs associated with IVTB for ARMD, and estimation of their incidence (Fig. 1b).

3.2 Study Quality and Other Characteristics

3.2.1 Randomized Controlled Trials

We specifically focused the risk-of-bias assessment of the 14 eligible RCTs (16 publications) on consideration and reporting of CVD AEs; ten trials (reported in 12 publications) were judged to be of high quality (see ESM 01, Table S2). Clinical heterogeneity was apparent in several aspects: (1) comparator treatments: five trials compared bevacizumab with ranibizumab (results at 12 [42, 45, 47, 48, 51] and 24 months [43, 46]), one compared it with a “standard of care” (including photodynamic therapy [PDT], pegaptanib, and sham injections) [41], and four trials compared different bevacizumab regimens [44, 49, 50, 52]; (2) bevacizumab regimens: variations regarding loading dose (yes/no), dosing intervals (fixed, flexible, duration), or “as needed” treatment (see ESM 01, Table S3); (3) selection of patients depending on the pre-existing CVD morbidity: six RCTs excluded patients with a history of or ongoing “serious” CVD morbidity [41, 42, 47–49, 52] and four did not [44, 45, 50, 51]; and (4) pre-existing “CVD burden” in treated patients (see ESM 01, Table S3).

3.2.2 Population-Based Cohort Studies and Non-Randomized Controlled Clinical Studies

All three eligible population-based cohort studies [57–59] were focused on CVD safety of bevacizumab (also ranibizumab), assessing the risks of death (all-cause), MI, stroke, and bleeding (see ESM 01, Table S4). Two [57, 58] were judged to be of high quality, whereas one [59] was not: it reported age-standardized incidence rates in a cohort of ~1200 mostly bevacizumab-treated patients with ARMD and concluded no difference versus age-standardized rates in the general national population. It therefore failed to achieve comparability of cohorts by either design or analysis. A further limitation was an inability to define the time elapsed between the last injection and occurrence of events, hence the adequacy of follow-up was questionable. Among five eligible non-randomized controlled clinical studies (all retrospective chart reviews), one compared bevacizumab with ranibizumab with occurrence of atherothrombotic events (ATEs) as a primary outcome [60] and was judged to be of high quality (see ESM 01, Table S4), whereas the remaining four [61–64] were focused on efficacy and, considering systemic safety, failed to achieve comparability of cohorts [61–64] and/or representativeness of cohorts (patient inclusion based on data completeness) [61], appropriate outcome assessment [62–64], or adequate follow-up [61].

3.2.3 Uncontrolled Clinical Cohorts

In the case of uncontrolled clinical cohorts (i.e., case-series), “representativeness of the exposed cohort” was concluded when patients’ characteristics were documented to be “typical” for ARMD patients with an indication for anti-VEGF treatment, and no bias-introducing selection criteria were implemented (e.g., inclusion based on “efficacy data completeness”). Among ten eligible reports, two were specifically focused on systemic safety of IVTB and were judged to be of adequate quality [65, 66], whereas the remaining eight dealt primarily with efficacy and, considering systemic safety, failed to achieve “cohort representativeness” [67, 68], appropriate outcome assessment [69–74], or adequate follow-up (not reporting/accounting for withdrawals and questionable consistency of follow-up) [67–73] (see ESM 01, Table S5).

3.3 Intravitreal (IVT) Bevacizumab for Age-Related Macular Degeneration (ARMD) and Risk of Cardiovascular (CVD)/Cerebrovascular Adverse Events (AEs)

3.3.1 Randomized Controlled Trials

Bevacizumab versus non-anti VEGF treatments One 12-month trial [41] evaluated bevacizumab (n = 65) versus “standard of care”, where 28 patients received a non-anti VEGF treatment (PDT and verteporfin, n = 16; or sham injections, n = 12) (see ESM 01, Table S3). Lack of new-onset hypertension, stroke, intracranial hemorrhage, or any non-ocular bleeding in both arms was explicitly stated. There was one case of MI and one vascular death in the bevacizumab arm versus 0 among controls.

Different bevacizumab regimens—“higher” versus “lower” injection frequency The four trials comparing different bevacizumab regimens (over 12 [49, 50, 63] or 23 months [52]) (Table 1) generally compared “more frequent” and “less frequent” dosing; however, clinical heterogeneity was considerable (duration, actually delivered doses, patient selection based on history of CVD diseases). The type of the explicitly addressed CVD/cerebrovascular AEs and their definitions also varied (see ESM 01, Table S3). Consequently, all-cause mortality was reported in two trials [52, 63] of different durations, delivered doses, and patient selection criteria, and none indicated differences between regimens (Table 1). Incidence of vascular death and of arterial thrombotic events were each reported in two trials [49, 52] of different durations, none of which indicated differences between dosing schedules (Table 1). Incidence of heart failure and of vascular AEs (by MedDRA classification) were reported in a single 23-month trial [52], and the incidence of the latter outcome appeared lower with more frequent dosing [risk difference (RD) −3.6 %, p = 0.032] (Table 1). The incidence of CVD/cerebrovascular incidents was reported in three trials [49, 50, 63] with 12-month durations—there were six events in 478 patients, and neither individual trials nor the pooled estimate indicated differences between treatment regimens (Table 1). However, all trials were small, the number of events was very low, with “no-event trials” regarding some outcomes (Table 1), hence all individual and pooled estimates were extremely imprecise.

Table 1 Meta-analysis of randomized controlled trials of more versus less frequent bevacizumab dosing in patients with age-related macular degeneration in respect to cardio-/cerebrovascular adverse events (by event)

Bevacizumab versus other anti-VEGF treatments Of the six RCTs, one small 12-month trial [41] compared bevacizumab (n = 65) with pegaptanib (n = 38), whereas all other comparisons were against ranibizumab, reporting data after 12 months (five trials [42, 45, 47, 48, 51]) and 24 months (two trials [43, 46]) of treatment (see ESM 01, Table S3).

All-cause mortality All trials (k = 6, N = 3141) explicitly reported on this outcome (Fig. 2). There appeared to be no statistically significant difference between bevacizumab and other anti-VEGF treatments at 12 months. The same was true regarding comparisons with ranibizumab at 12 months (k = 5, N = 3038) and 24 months (k = 2, N = 1795). However, the number of events was low, and all individual trial and pooled estimates were imprecise. At 12 months, 95 % PIs indicated substantial dispersion of effects and the upper 95 % confidence limit of I² indicated inconsistency.

Fig. 2

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: all-cause mortality after 12 (upper panel) or 24 (lower panel) months of treatment. One 12-month trial (Tufail et al. [41]) compared bevacizumab with pegaptanib, whereas all other comparisons were with ranibizumab. Hence, at 12 months, the pooled estimate is given for bevacizumab vs. “other anti-vascular endothelial growth factor treatments” and also vs. ranibizumab. For primary analysis, effect measure was random-effects Mantel–Haenszel odds ratio. Since there was a “no-event” arm at 12 months, an alternative method without continuity correction was implemented: exact conditional M–H odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of, or ongoing, cardio-/cerebrovascular disease or incident. BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

Vascular death Four trials [41, 42, 45, 51] (2339 patients) explicitly reported on this outcome (Fig. 3). There appeared to be no statistically significant difference between bevacizumab and other anti-VEGF treatments at 12 months. The same was true regarding comparisons with ranibizumab at 12 months (k = 3, N = 2236) [42, 45, 51] and 24 months (k = 2, N = 1795) [43, 46]. However, the number of events was low, and all individual trial and pooled estimates were imprecise. At 12 months, the PI indicated substantial dispersion of effects, and the upper 95 % confidence limit of I² indicated inconsistency.

Fig. 3

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: incidence of vascular death after 12 (upper panel) or 24 (lower panel) months of treatment. One 12-month trial (Tufail et al. [41]) compared bevacizumab with pegaptanib, whereas all other comparisons were with ranibizumab. Hence, at 12 months, the pooled estimate is given for bevacizumab vs. “other anti-vascular endothelial growth factor treatments” and also vs. ranibizumab. For primary analysis, effect measure was Peto odds ratio. Since there were “no-event” arms at 12 months, an alternative method without continuity correction was implemented—exact conditional Mantel–Haenszel odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of- or an on-going cardio-/cerebrovascular disease or incident. BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

MI or angina All six trials (3141 patients) explicitly reported on this outcome, but only one included “angina” [45, 46] (Fig. 4). There appeared to be no statistically significant difference between bevacizumab and other anti-VEGF treatments at 12 months. The same was true regarding comparisons with ranibizumab at 12 months (k = 5, N = 3038) and 24 months (k = 2, N = 1795), although one trial [51] reported significantly fewer MIs with bevacizumab (0/220) versus ranibizumab (6/221 [2.7 %]) at 12 months. The number of events was low, and all individual trial and pooled estimates were imprecise. At 12 months, the PI indicated a substantial dispersion of effects, and the upper 95 % confidence limit of I² indicated inconsistency.
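
The single-trial MI result quoted above (0/220 versus 6/221) can be examined with a two-sided Fisher exact test. The sketch below is an illustration only: the text does not state which test the trial report [51] used, and the function name is an assumption.

```python
from math import comb


def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher exact p value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables (with the observed
    margins) that are no more likely than the observed table.
    """
    events = events_a + events_b
    denom = comb(n_a + n_b, events)

    def pmf(x):
        return comb(n_a, x) * comb(n_b, events - x) / denom

    p_observed = pmf(events_a)
    x_min = max(0, events - n_b)
    x_max = min(events, n_a)
    return sum(pmf(x) for x in range(x_min, x_max + 1)
               if pmf(x) <= p_observed * (1 + 1e-9))


p_value = fisher_exact_two_sided(0, 220, 6, 221)
print(f"two-sided Fisher exact p = {p_value:.3f}")
```

With only six events in total, the p value falls below 0.05 but rests entirely on a handful of cases, which illustrates the fragility of the estimates noted in the text.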

Fig. 4

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: incidence of myocardial infarction or angina after 12 (upper panel) or 24 (lower panel) months of treatment. One 12-month trial (Tufail et al. [41]) compared bevacizumab with pegaptanib, whereas all other comparisons were with ranibizumab. Hence, at 12 months, the pooled estimate is given for bevacizumab vs. “other anti-vascular endothelial growth factor treatments” and also vs. ranibizumab. One trial (data after 12 and 24 months: IVAN 2012 [45] and IVAN 2013 [46], respectively) reported both myocardial infarction and angina. The analysis is based on cumulative counts (angina cases depicted in brackets). For primary analysis, effect measure was Peto odds ratio. Since there were “no-event” arms at 12 months, an alternative method without continuity correction was implemented: exact conditional Mantel–Haenszel odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of, or ongoing, CVD/cerebrovascular disease or incident. BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

Stroke All six trials (3141 patients), two of which recorded no events, explicitly reported on this outcome (Fig. 5). There appeared to be no statistically significant difference between bevacizumab and other anti-VEGF treatments at 12 months. The same was true regarding comparisons with ranibizumab at 12 months (k = 5, N = 3038) and 24 months (k = 2, N = 1795). However, the number of events was low, and all individual trial and pooled estimates were imprecise. At 12 months, the 95 % PIs indicated a substantial dispersion of effects, and the upper 95 % confidence limit of I² indicated inconsistency.

Fig. 5

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: incidence of stroke after 12 (upper panel) or 24 (lower panel) months of treatment. One 12-month trial (Tufail et al. [41]) compared bevacizumab with pegaptanib, whereas all other comparisons were with ranibizumab. Hence, at 12 months, the pooled estimate is given for bevacizumab vs. “other anti-vascular endothelial growth factor treatments” and also vs. ranibizumab. For primary analysis, effect measure was Peto odds ratio. Since there were “no-event” trials at 12 months that are disregarded by the method, an alternative method that includes such trials and does not use continuity correction was implemented—exact conditional Mantel–Haenszel odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of- or an on-going CVD/cerebrovascular disease or incident. BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

Transitory ischemic attack (TIA) Four trials [42, 45, 47, 51] (2721 patients) comparing bevacizumab with ranibizumab explicitly reported on this outcome at 12 months (Fig. 6). There appeared to be no statistically significant difference between treatments, but data were sparse, all individual trial and pooled estimates were imprecise, the PI indicated a substantial dispersion of effects, and the upper 95 % confidence limit of I² indicated inconsistency. At 24 months, only one trial [46] reported on this outcome: 1/296 versus 1/314 cases with bevacizumab and ranibizumab, respectively.

Fig. 6

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: incidence of atherothrombotic events (ATE) after 12 (upper panel) or 24 (middle panel) months of treatment and of incident/worsening hypertension after 12 months (lower panel). All comparisons were with ranibizumab. For analysis of atherothrombotic event, effect measure was random-effects Mantel–Haenszel odds ratio. For primary analysis of hypertension, effect measure was Peto odds ratio. Since both trials reporting on hypertension had a “no-event” arm, an alternative method without continuity correction was implemented—exact conditional Mantel–Haenszel odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of- or an on-going cardio-/cerebrovascular disease or incident. ATE atherothrombotic event, BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

ATEs Only two trials comparing bevacizumab with ranibizumab (N = 1795) explicitly reported on this outcome, listing data for 12 [42, 45] and 24 months [43, 46] (Fig. 6). While 12-month data were inhomogeneous (I² = 62.2 %), 24-month data indicated no difference between treatments.
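For orientation, the I² statistic quoted above is conventionally defined from Cochran's Q and its degrees of freedom. Assuming that standard definition was used, the reported 12-month value with k = 2 trials implies the Q shown below; this back-calculation is illustrative and is not reported by the authors.

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) \times 100\,\%
\qquad\Longrightarrow\qquad
Q = \frac{k-1}{1 - I^2/100} = \frac{1}{1 - 0.622} \approx 2.65 \quad (k = 2)
```

With only two trials, Q has a single degree of freedom, so I² itself is estimated very imprecisely.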

Hypertension Only two trials (N = 1626) explicitly reported on this outcome, both comparing bevacizumab with ranibizumab and listing data at 12 months [42, 51], and one also at 24 months [43] (Fig. 6). At 12 months, there were only four events (two in each trial), all with bevacizumab, yielding a highly imprecise pooled estimate but a statistically significantly higher risk with bevacizumab. At 24 months, the reported incidence [43] was 4/586 (0.7 %) versus 3/599 (0.5 %) for bevacizumab and ranibizumab, respectively, indicating no difference between treatments.

HF Only one trial comparing bevacizumab with ranibizumab explicitly reported on this outcome, listing data at 12 months [45] and 24 months [46]. While no relevant difference was apparent at 12 months—2/296 (0.7 %) with bevacizumab versus 3/314 (1.0 %) with ranibizumab—at 24 months, there were fewer events with bevacizumab, with borderline statistical significance: 2/296 (0.7 %) versus 7/314 (2.2 %): absolute RD −1.6 % (approximate Miettinen 95 % CI −3.9 to 0.5); p = 0.112.
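The absolute risk difference quoted above can be reproduced from the reported counts. The sketch below also adds a simple Wald interval purely for orientation; that interval is an assumption of this example and is not the Miettinen method cited in the text, so its limits are expected to differ somewhat from those reported.

```python
import math

def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (arm A minus arm B) with a simple Wald CI.

    Illustrative helper; the Wald interval is a stand-in, not the
    Miettinen interval reported in the text.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se

# HF at 24 months [46]: bevacizumab 2/296, ranibizumab 7/314
rd, lo, hi = risk_difference(2, 296, 7, 314)
print(f"RD = {100 * rd:.1f} % (Wald 95% CI {100 * lo:.1f} to {100 * hi:.1f})")
```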

VTE Four trials comparing bevacizumab with ranibizumab explicitly reported on this outcome, all at 12 months [42, 45, 47, 51] (2721 patients) and two [43, 46] also at 24 months (1795 patients) (Fig. 7). At 12 months, the odds of VTE tended to be higher with bevacizumab in three of four trials and overall. However, the number of events was low, and all individual trial and pooled estimates were imprecise, while the PI indicated a considerable dispersion of effects and the I² indicated inconsistency. The odds of VTE also tended to be higher with bevacizumab at 24 months, and one of the two trials yielded a statistically significant difference [10/586 (1.7 %) vs. 3/599 (0.5 %); p = 0.046].

Fig. 7

Meta-analysis of randomized controlled trials comparing intravitreal bevacizumab with other anti-vascular endothelial growth factor treatments in age-related macular degeneration: incidence of venous thromboembolism after 12 (upper panel) or 24 (lower panel) months of treatment. All comparisons were with ranibizumab. For primary analysis, effect measure was Peto odds ratio. Since there were “no-event” arms at 12 months, an alternative method without continuity correction was implemented—exact conditional Mantel–Haenszel odds ratio [24]. CVD excl.? refers to non-inclusion of patients with a history of- or an on-going CVD/cerebrovascular disease or incident. BEV bevacizumab, CI confidence interval, CVD cardiovascular, LCL lower confidence limit, M–H Mantel–Haenszel, OR odds ratio, PI prediction interval, UCL upper confidence limit, VEGF vascular endothelial growth factor

3.3.2 Observational Data

One population-based cohort study (Curtis et al. [57]) indicated no difference between IVTB and PDT or pegaptanib regarding all-cause mortality, and the risk of MI or stroke (Table 2). It also indicated no difference between bevacizumab and ranibizumab regarding all-cause mortality and the risk of MI, but a higher risk with bevacizumab regarding stroke [adjusted hazard ratio (HR) 1.23, 99 % CI 1.02–1.47] (Table 2). In the secondary analysis accounting only for patients started on either treatment within the same timeframe and with a further adjustment for socioeconomic status, the difference was no longer statistically significant, but the estimate was imprecise: HR 1.15 (99 % CI 0.81–1.64) (Table 2). Another population-based cohort study (Kemp et al. [58]) reported a tendency of a higher risk of MI for bevacizumab- or ranibizumab-treated patients versus PDT, but the estimate was extremely imprecise (adjusted HR 2.32, 95 % CI 0.70–7.74) (Table 2). No other adjusted estimate was reported. Primary data indicated no difference between bevacizumab and ranibizumab regarding MI or stroke, but the cohorts were small and the number of events was low (Table 2). Finally, one non-randomized clinical study [60] reported 12 cases of ATE (six stroke, two MI, one unstable angina, one TIA, one sudden death, and one peripheral ATE) among 97 bevacizumab-treated patients and three cases (two stroke, one MI) among 219 ranibizumab-treated patients. With adjustment for diabetes, hypertension, pre-existing CVD disease, lung disease, and history of ATE, the risk was considerably higher with bevacizumab, but the estimate was imprecise (HR 6.11, 95 % CI 1.61–23.2).

Table 2 Observational data (record-linkage studies) regarding 1-year cardio-/cerebrovascular risk associated with intravitreal bevacizumab (BEV) in patients with age-related macular degeneration

3.4 Mortality and Incidence of CVD/Cerebrovascular AEs in IVT Bevacizumab-Treated ARMD Patients in Clinical Studies

All bevacizumab treatment arms from clinical studies were classified into several subsets with a reasonable clinical homogeneity (within a subset) considering the study design (RCTs, case-series, non-randomized controlled studies), treatment/follow-up duration, and selection of patients with respect to the history of CVD/cerebrovascular incidents. Considering that not all AEs were explicitly addressed in all studies, the numbers of treatment arms, patients, and events across these “design-by-outcome” subsets were rather low (most commonly, a single bevacizumab arm with a limited number of patients). Data are summarized in Table 3 and ESM 02 (including forest plots). Exploratory meta-regression considered all arms per outcome (if more than five), but arms from two RCTs reporting data after both 12 and 24 months in separate publications were considered only at 24 months [43, 46].

Table 3 Mortality and incidence of cardio-/cerebrovascular adverse events in bevacizumab-treated patients with age-related macular degeneration in clinical studies: summary of meta-analysis

All-cause mortality Estimates varied from 0 % in a single 6-month case-series to 6.5 % estimated by pooling three bevacizumab arms from 24-month RCTs (Table 3, ESM 02 Fig. S1). In a meta-regression (12 arms) accounting for both study duration and the number of delivered doses, longer study duration was associated with higher all-cause mortality (coefficient = 0.007; 95 % CI 0.002–0.011, p = 0.003), and a higher number of delivered doses tended to be associated with higher all-cause mortality (coefficient = 0.005, 95 % CI 0.000–0.010, p = 0.071).

Vascular death Estimates varied from 0.3 % (two arms from 12-month RCTs) to 2.1 % (three arms from 24-month RCTs) (Table 3, ESM 02 Fig. S1). Meta-regression (eight arms) did not indicate associations between either study duration or the mean number of delivered doses and incidence.

MI Estimates varied from 0.3 % from one 13-month case-series to 2.1 % in one 30-month arm from a non-randomized controlled study (Table 3, ESM 02 Fig. S2). Neither study duration nor the mean number of delivered doses appeared associated with the incidence (meta-regression on nine arms).

Stroke Estimates varied from 0.3 % from one 13-month case-series to 6.2 % in one 30-month arm from a non-randomized controlled study (Table 3, ESM 02 Fig. S2). Neither study duration nor the mean number of delivered doses appeared associated with the incidence (meta-regression on 13 arms).

ATEs Estimates varied from 0.3 % in a single 12-month RCT arm to 12.4 % in one 30-month arm from a non-randomized controlled study (Table 3, ESM 02 Fig. S3). In a meta-regression (eight arms) accounting for both study duration and the number of delivered doses, longer study duration tended to be associated with higher incidence (p = 0.072), but the mean number of delivered doses did not (p = 0.851).

VTE Estimates varied from 0 % in a single 6-month case-series to 1.4 % in three arms from 24-month RCTs with exclusion of patients with CVD burden (Table 3, ESM 02 Fig. S3). In a meta-regression (seven arms), longer study duration (p = 0.054) and higher mean number of delivered doses (p = 0.034) were each individually associated with a higher incidence, but not when both moderators were accounted for.

Hypertension Estimates varied from 0.3 % in a single 12-month RCT arm to 1.4 % in a single 13-month case-series (Table 3, ESM 02 Fig. S3).

HF Estimates varied from 0.7 % in a single 24-month RCT arm to 1.7 % in two 24-month RCT arms (Table 3, ESM 02 Fig. S3).

3.5 Summary of Findings and Quality of Evidence

Considering the limited meaning of exposure/dose–response data in identification of a treatment effect (e.g., dose-dependency might indicate an effect, but it cannot be accurately quantified, whereas a lack of dose-dependency does not exclude a treatment effect), the body of evidence about CVD/cerebrovascular safety of IVTB in patients with ARMD pertains primarily to comparisons with other treatments.

All-cause mortality The highest-quality evidence comes from two 24-month RCTs versus ranibizumab and is graded as “low” due to a considerable imprecision: the pooled effect extends from a relevant benefit (by 25 % lower odds) to a relevant harm (by 69.8 % higher odds) (Table 4). In absolute terms, with the control risk of 52 deaths/1000 patients/2 years, it extends from 12 fewer to 33 more deaths. One-year RCT-based evidence on the same comparison comes from a meta-analysis of five trials, but its quality is “very low” due to a considerable imprecision (effect estimate is even more imprecise), dispersion (heterogeneity) of effects (wide 95 % PIs) and inconsistency (upper confidence limit of I² is 100 %). Well-adjusted 1-year observational data for concurrent exposure to bevacizumab or ranibizumab are also imprecise, as they extend from a relevant benefit (by 29 % lower risk with bevacizumab) to an appreciable harm. Neither RCT-based nor observational data indicated a significant difference, but their point estimates were in opposite directions. There are no relevant RCT-based data for a sound comparison of bevacizumab with PDT or pegaptanib. Observational 1-year data do not indicate a relevant difference, but the quality of evidence is “very low” due to indirectness since control treatments were not used within the same time period as bevacizumab. Although confounding in this study was generally well controlled for, there is an inevitable residual bias pertaining to development of “general baseline risk” over time and its impact is neither controllable nor estimable.
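The absolute effects quoted in this section appear consistent with the usual conversion of an odds ratio into a risk at an assumed control risk. A minimal sketch of that conversion follows; the function name is hypothetical, and the odds ratios 0.75 and 1.698 are simply the 25 % lower and 69.8 % higher odds stated above.

```python
def absolute_effect_per_1000(odds_ratio, control_risk):
    """Difference in events per 1000 patients implied by an odds ratio.

    control_risk is a proportion (e.g. 0.052 for 52/1000).
    Illustrative helper, not taken from the authors' analysis.
    """
    control_odds = control_risk / (1 - control_risk)
    treated_odds = odds_ratio * control_odds
    treated_risk = treated_odds / (1 + treated_odds)
    return 1000 * (treated_risk - control_risk)

# All-cause mortality at 24 months: control risk 52/1000
low = absolute_effect_per_1000(0.75, 0.052)    # 25 % lower odds
high = absolute_effect_per_1000(1.698, 0.052)  # 69.8 % higher odds
print(f"{low:.1f} to {high:.1f} deaths per 1000 patients over 2 years")
```

Under these assumptions the result lands close to the range cited in the text for all-cause mortality; small differences are a matter of rounding.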

Table 4 Summary of evidence: all-cause and vascular mortality

Vascular death Evidence comes exclusively from RCTs (Table 4). The highest-quality evidence comes from two 24-month trials versus ranibizumab and is graded as “low” due to a considerable imprecision: the pooled effect extends from a relevant benefit (by 38 % lower odds) to a relevant harm (by 147 % higher odds). In absolute terms, with the control risk of 17 vascular deaths/1000 patients/2 years, it extends from six fewer to 24 more. One-year RCT-based evidence on the same comparison comes from a meta-analysis of three trials but is of “very low” quality due to a considerable imprecision, dispersion of effects, and inconsistency. There are no relevant RCT-based data for a sound comparison of bevacizumab with PDT or pegaptanib.

MI The highest-quality evidence comes from a meta-analysis of five 1-year RCTs versus ranibizumab and two 24-month RCTs with the same comparison (Table 5). In both cases, quality of evidence is “low”. One-year data suggest a lower risk with bevacizumab but are imprecise: the estimate extends from a considerable benefit (by 73.5 % lower odds) to still appreciable harm (by 14.6 % higher odds). In absolute terms and with the control risk of 13/1000/year, it extends from ten fewer to two more patients with an event. There is also a considerable dispersion (95 % PI from 12 fewer to 21 more) and inconsistency (upper confidence limit of I² = 78.3 %) of effects. The 2-year pooled estimate is in the same direction but is even more imprecise. One-year observational data come from two studies that are not appropriate for pooling, and each provides “very low” quality of evidence due to imprecision as the estimates extend from relevant benefit to relevant harm (with point estimates in opposite directions). There are no relevant RCT-based data for a sound comparison of bevacizumab with PDT or pegaptanib. One-year observational data versus PDT do not demonstrate a difference, but provide “very low” quality of evidence either due to indirectness (one study; interventions used in different time periods) or due to extreme imprecision (another study). Observational data versus pegaptanib also do not indicate a difference, but evidence is of “very low” quality due to indirectness (one study; interventions used in different time-periods).

Table 5 Summary of evidence: myocardial infarction and stroke. Both relative and absolute effects (difference in the number of patients with the outcome/1000 treated) are depicted

Stroke The highest-quality evidence comes from two 24-month RCTs versus ranibizumab and is graded as “low” due to imprecision: it extends from a considerable benefit (by 63.1 % lower odds) to a considerable harm (by 78.9 % higher odds) (Table 5). In absolute terms, and with the control risk of 16/1000/2 years, it extends from ten fewer to 12 more patients with an event. One-year RCT-based evidence comes from a meta-analysis of five trials but is of “very low” quality due to even more pronounced imprecision, as well as dispersion and inconsistency of effects. One-year observational data also provide only a “very low” level of evidence due to imprecision. There are no relevant RCT-based data for a sound comparison of bevacizumab with PDT or pegaptanib. One-year observational data provide only “very low” quality of evidence for the same reasons as in the case of MI.

TIA Evidence comes only from RCTs versus ranibizumab at 12 or 24 months and is of “very low quality” due to imprecision, data sparseness, dispersion, and inconsistency (Table 6).

Table 6 Summary of evidence: transitory ischemic attack, atherothrombotic events, hypertension, heart failure and venous thromboembolism. Both relative and absolute effects (difference in the number of patients with the outcome /1000 treated) are depicted

ATEs Evidence is based on comparisons with ranibizumab in two RCTs with 12- and 24-month data and in one observational study (Table 6). The 24-month RCT data do not indicate a difference but provide “low” quality of evidence due to imprecision: the estimate extends from a relevant benefit (by 35.9 % lower odds) to a relevant harm (by 59.3 % higher odds). In absolute terms and with the control risk of 45/1000/2 years, it extends from 16 fewer to 25 more patients with an event. Data at 1 year are even more imprecise (“very low quality”). Observational data indicate a greatly increased risk with bevacizumab, but evidence is of “very low quality” due to extreme imprecision and indirectness.

Hypertension Evidence comes from RCTs versus ranibizumab (Table 6). One-year data indicate a greatly increased risk of hypertension with bevacizumab, but quality of evidence is “very low” due to imprecision and data sparseness.

HF Evidence comes from a single RCT versus ranibizumab and is of “very low” quality (Table 6).

VTE Evidence comes from RCTs versus ranibizumab at 12 months (four trials) and at 24 months (two trials) (Table 6). Both effects indicate a considerably increased risk with bevacizumab, but the level of evidence is “low” (24 months) or “very low” (12 months) due to imprecision of the estimates and data sparseness, and at 12 months also due to dispersion of effects and inconsistency.

Overall, there is still a high level of uncertainty about whether bevacizumab differs from ranibizumab (or older treatments, i.e., PDT and pegaptanib) regarding any of the assessed outcomes.

4 Discussion

IVT-administered anti-VEGF agents have become a mainstay of treatment of ARMD. At least theoretically and in line with data on systemic bioavailability, even with IVT administration they might affect the risk of systemic vascular events [75, 76]. Hence, CVD/cerebrovascular safety is a potentially important element of their overall assessment. In this respect, bevacizumab is rather specific since it is commonly used in this setting [7], largely due to its favourable cost, but it is used off-label. Hence, its assessment rests on evaluation of published data. In line with the chronological development in the field and arising practical questions (e.g., choosing between bevacizumab and ranibizumab), the evaluation of IVTB in patients with ARMD has been based practically exclusively on RCTs versus ranibizumab. Several recent reviews/meta-analyses [1, 8, 12–17] have addressed these trials in order to evaluate systemic safety. The present overview is specific in that it is focused on one aspect of systemic safety of IVTB and attempts to identify whether it affects CVD/cerebrovascular risk by assessing both RCTs and epidemiological (observational) data.

4.1 Main Findings

The present systematic review addressed nine AE outcomes considered illustrative of a potential impact of IVTB on CVD/cerebrovascular risk in patients with ARMD: all-cause mortality, vascular mortality, incidence of MI, stroke, TIA, any ATE, VTE, hypertension, and HF. The overall body of evidence comprises five RCTs with low risk of bias comparing bevacizumab with ranibizumab [12-month (all five trials) and 24-month (two trials) outcomes]; one RCT with low risk of bias comparing bevacizumab with pegaptanib, PDT, or sham injection (12-month outcomes); four RCTs with low risk of bias comparing different bevacizumab regimens [12-month (three trials) and 23-month (one trial) data]; three high-quality observational studies providing data on comparisons between bevacizumab and ranibizumab (k = 3), PDT (k = 2), or pegaptanib (k = 1); and two high-quality case-series used in addition to randomized and non-randomized clinical-based bevacizumab-treated arms in estimation of AE incidence and exploration of the effect of duration/cumulative dose on AE occurrence through meta-regression. Not all included studies reported on all targeted AEs. Overall, the incidence of each of the outcomes is low or very low, and a number of studies/arms reported no events.

There are no relevant RCT-based data that would enable reasonable assessment of CVD/cerebrovascular risk associated with IVTB in the treatment of ARMD as compared with PDT or pegaptanib. Observational data do not indicate any signals of an increased or a reduced risk of all-cause mortality, MI, or stroke, but the quality of evidence is very low, and the uncertainty of the (non)existence of an effect (and its direction) is very high.

However, comparisons with ranibizumab are far more relevant from a practical standpoint, as they might directly influence choices among currently preferred treatments.

Regarding all-cause mortality, 12-month RCT data (k = 5, N = 3038) do not demonstrate a difference between treatments; however, the quality of evidence is “very low” due to imprecision, dispersion of effects, and inconsistency. At 24 months (k = 2, N = 1795), the estimate again does not indicate a difference, but the quality of evidence is “low” due to imprecision, and the fact that there are only two trials leaves uncertainty about dispersion of effects and inconsistency. One observational study (10,968 patients) also did not demonstrate a difference at 1 year; however, the level of evidence is “very low” due to imprecision. The present assessment differs from that in a recent review [12]. Based on eight RCTs (the same five included in the present overview, two additional RCTs that we excluded based on small sample size and/or inadequate quality, and one unpublished trial), the authors concluded “moderate quality of evidence” and no difference between bevacizumab and ranibizumab regarding all-cause mortality at the “longest follow-up”. However, the estimate was as imprecise as the present estimates, and dispersion/inconsistency was not specifically explored. When the present five RCTs are considered “at longest follow-up”, the estimate again does not exclude a relevant harm or a relevant benefit (0.726–1.525), and dispersion or inconsistency of individual study estimates is not relevantly improved. Overall, the existing RCT-based and observational data do not demonstrate a difference between bevacizumab and ranibizumab regarding all-cause mortality, but the level of uncertainty about (non)existence of the effect (and its direction) is high.

Regarding vascular mortality, RCT-based data at 12 months (k = 3, N = 2236) and at 24 months (k = 2, N = 1795) are also judged to be of “very low” or “low” quality for the same reasons as for all-cause mortality. Estimates do not demonstrate a difference between treatments. Point estimates are at the level of “relevant harm”, but imprecision is considerable and overall uncertainty about (non)existence of the effect is particularly high.

Regarding MI and stroke, the body of evidence consists of the same RCT-based data at 12 months (k = 5, N = 3038) and at 24 months (k = 2, N = 1795) and two observational studies at 1 year (10,968 and 1267 patients, respectively). For neither outcome do the RCT-based data demonstrate a difference between treatments (at any time point). All point estimates (both outcomes, both time points) are at the level of a “relevant benefit” or close to it, but evidence is judged to be of “low” or “very low” quality for the same reasons as for all-cause mortality. Two observational studies also did not demonstrate a difference between the treatments for either outcome, but provided only “very low” quality evidence due to largely imprecise estimates. The present assessment differs from that in a recent review [12] (the same five RCTs included in the present overview, and one same unpublished trial regarding both outcomes), where the authors concluded “moderate quality of evidence” and no difference between bevacizumab and ranibizumab at the “longest follow-up”. However, the estimates were similarly imprecise as the present estimates, and dispersion/inconsistency was not specifically explored. Regarding MI, the discrepancy could be partly because, for one primary trial [45, 46], we also included patients with angina (as they could be distinguished from “MI patients”). However, even if only MI is counted at the “longest follow-up”, the estimates do not exclude a relevant benefit or a relevant harm (random effects M–H OR 0.418–1.670, Peto OR 0.367–1.527, conditional exact M–H OR with mid-P CIs 0.354–1.353). The same is true regarding stroke at the “longest follow-up” (random-effects M–H OR 0.396–1.642, Peto OR 0.399–1.610, conditional exact M–H OR with mid-P CIs 0.389–1.625). 
Overall, the existing RCT-based and observational data do not demonstrate a difference between bevacizumab and ranibizumab regarding incidence of MI or stroke, but the level of uncertainty about (non)existence of the effect (and its direction) is high.

Considering ATEs, the main body of evidence identified in the present review consists of only two RCTs reporting data at 12 and 24 months (Fig. 6). At 24 months, individual study results are close to each other and indicate no difference between treatments (OR 1.007), but due to a largely imprecise estimate that does not exclude a relevant benefit or a relevant harm and because there are only two RCTs, evidence is considered to be of “low” quality. This is in discordance with the recent review [12] that included six RCTs (all five identified in the present review and one unpublished trial), and concluded “no difference between treatments” and “moderate” quality of evidence, although the estimate was similarly imprecise as the one presented here. The discrepancy between the present and the published review [12] is because we decided to “count” the outcomes exactly in the way they were reported in the primary studies: the two depicted trials were the only ones in which “ATEs” was accompanied with a numerical value reported by the authors. For all other trials, generation of “ATEs” would have required summing-up events like stroke or MI and would have represented a risk of double counting. One observational study (316 patients over up to 3 years) included in the present review indicated a greatly increased risk of ATE with bevacizumab versus ranibizumab. Although, due to imprecision, it is considered to provide only “very low” quality evidence, it further supports a conclusion that the existing data leave a rather high level of uncertainty about (non)existence of a difference between bevacizumab and ranibizumab.

Considering TIA (four RCTs at 12 months, one also at 24 months), VTE (four RCTs at 12 months, two also at 24 months), hypertension (two RCTs at 12 months, one also at 24 months) and HF (only one RCT), the existing evidence is considered to be of “low” or of “very low” quality primarily due to huge imprecision of estimates, low number of trials/events and, in some cases, dispersion/inconsistency of individual trial results. Although data suggest a considerably (and significantly) greater risk of hypertension or VTE (close to statistical significance) with bevacizumab versus ranibizumab, there is a high level of uncertainty about the (non)existence of the bevacizumab effect.

Evidence based on RCTs comparing different bevacizumab regimens is also burdened with the low numbers of patients and events, and imprecise estimates, and is therefore also of very low quality, leaving a high level of uncertainty about potential difference between regimens. Incidence of individual AEs largely differs across the included bevacizumab treatment arms, and there is no clear-cut signal of the effect of “cumulative dose”.

4.2 Strengths and Limitations of the Present Review

We conducted a comprehensive literature search that is unlikely to have missed any published randomized or observational comparison of IVTB versus any other treatment (or a comparison between different bevacizumab regimens) in this indication, or any relevant case-series illustrative of incidence of CVD/cerebrovascular adverse events in bevacizumab-treated patients with ARMD. Non-inclusion of unpublished studies might be viewed as a limitation; however, we considered that the unpublished status or only an abstract form were unsuitable for an independent, reliable assessment of inclusion/exclusion criteria and quality assessment specifically with respect to CVD/cerebrovascular AEs ascertainment and reporting. We consider scrutinized quality using well-established instruments focused on ascertainment, evaluation, and reporting of these specific AEs and inclusion of only high-quality (low risk of bias) studies to be a further strength of the present work: estimation based only on high-quality primary studies is likely to protect against “false findings” [77]. Closely related to this, we consider that the decision not to include small studies (low patient numbers/duration too short for the targeted AEs to occur), which are prone to “chance findings” further improved accuracy of the present analyses [77]. The cut-off of at least 20 subjects per arm was defined considering that such a sample still had a reasonable probability to record at least one event assuming the true incidence of around 5 %. The cut-off of at least 6 months of treatment/follow-up and/or at least three drug administrations was set arbitrarily. Strictly speaking, the most direct way of assessing CVD/cerebrovascular risk associated with IVTB in ARMD would be to compare it with placebo or a non-anti-VEGF treatment (e.g., PDT). 
One recent network meta-analysis [15] of RCTs indicated that all IVT anti-VEGF agents in this indication, including bevacizumab, were associated with a higher risk of thrombotic events than placebo (absolute RD up to 5 %). However, regarding bevacizumab, this assessment was exclusively indirect [15] and should therefore be considered, at best, as “low” quality evidence [78]. As anticipated based on the experience of others [1, 8, 12–17], only one small RCT has been published that compared bevacizumab with sham treatment or PDT [41]. Hence, due to the low availability of appropriate primary trials, multiple-treatment comparison would not contribute much to the overall evidence and we preferred to focus on direct RCT-based comparisons. In respect to the evaluation of therapeutic interventions, observational data are generally considered to provide a “low” level of evidence [27, 28], but in a situation with a limited number of RCTs and with the low incidence AEs as the outcomes of interest, observational data should be viewed as a potentially valuable complementary source of information.
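The rationale given in this subsection for the cut-off of at least 20 subjects per arm can be checked with a one-line binomial calculation. This sketch assumes independent patients who share a common true incidence of 5 %, which is a simplification; the function name is hypothetical.

```python
def prob_at_least_one_event(n_patients, incidence):
    """Probability of observing one or more events among n independent patients."""
    return 1 - (1 - incidence) ** n_patients

# 20 subjects per arm, assumed true incidence of 5 %
p = prob_at_least_one_event(20, 0.05)
print(f"P(at least one event) = {p:.3f}")
```

Under these assumptions the probability is roughly 64 %, which fits the description of a “reasonable probability” of recording at least one event.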

Having in mind that statistically determined heterogeneity may not always be of practical relevance (e.g., individual study effects in the same direction, but variable sizes of effects), and that a “formal” lack of it might not always be viewed as evidence of consistency of individual study estimates (e.g., inadequate power for the test of heterogeneity; low heterogeneity, but subsets of trials yield estimates in opposite directions) [79], we a priori decided that pooled estimates should be generated only across RCTs with a reasonable clinical similarity and that exploration of heterogeneity through meta-regression in a setting with a handful of primary trials should be avoided. The most prominent elements of clinical heterogeneity regarding high-quality RCTs were duration of treatment/follow-up and the extent of “CVD/cerebrovascular burden” at baseline. Due to their limited number, “subsetting” of RCTs was based on only one criterion: study duration. Having in mind that CVD/cerebrovascular changes resulting in respective clinical AEs need a certain period of exposure to a “noxious stimulus,” we considered it more important to group the trials based on duration than on patient inclusion based on “CVD burden”, particularly since reporting on this issue was not very uniform across the trials. However, estimates of incidence of individual AEs were drawn for “duration-by-CVD burden-by-type of study” subsets since the number of bevacizumab arms was greater than the number of trials.

5 Conclusions and Implications for Further Research

The existing published data (RCTs, epidemiological studies, and other observational studies) do not provide grounds for a reliable estimate of the potential impact of IVTB on CVD/cerebrovascular risk in patients with ARMD. Although they do not indicate any clear-cut signals of an increased or a reduced risk (as compared with any other treatment), the level of uncertainty about whether there is any effect and in which direction is high. The uncertainty is primarily due to a limited number of high-quality studies (RCTs) with a limited number of patients and limited follow-up periods combined with a low incidence of CVD/cerebrovascular events. Under such conditions, the estimates are not only imprecise but also fragile, and a few events more or less could significantly impact the overall conclusion. Since bevacizumab is commonly used in this setting, resolution of this issue, particularly with respect to other anti-VEGF agents, is important. More high-quality RCTs paying more attention to CVD/cerebrovascular safety, and more high-quality epidemiological studies, are clearly needed. A considerable contribution might also come from more efficient research-synthesis methods, e.g., individual-patient meta-analysis of the already completed and still ongoing RCTs, or even from pharmacokinetic–pharmacodynamic modelling studies.