Background

Warfarin is commonly used for primary and secondary prevention of numerous thrombotic conditions [1]. New oral anticoagulants (OAC) including dabigatran and rivaroxaban have recently become available in several countries, and randomized controlled trials to support their use in many of these indications now exist [2–6]. Bleeding is the main serious adverse effect when using any of these agents. Estimates of bleeding rates for warfarin vary widely depending on the study design, with annual incidences of 0.6% for fatal bleeding, 3.0% for major bleeding, and 9.6% for major or minor bleeding being reported [7]. Newer OACs have been shown to have similar [2, 3, 8] or lower rates of major bleeding [2], depending on the doses used. Many different factors such as concurrent use of interacting medications, labile international normalized ratio (INR), previous bleeding episodes, and age can increase a patient’s risk for bleeding [9, 10]. Risk assessment tools [clinical prediction rules (CPR)] that are able to quantify an individual patient’s risk of bleeding while on OAC would be a valuable addition to clinical practice by aiding clinicians in evaluating the benefits versus risks of initiating OAC.

When we previously systematically reviewed this topic [11], no such tools had been sufficiently evaluated and/or had performance characteristics [defined by likelihood ratios (LR)] sufficient to be recommended for use in routine practice. Given the emergence of new OACs and time for better tools to be developed, the purpose of this review was to systematically review and evaluate the performance of CPRs for estimating bleeding risk in patients on OAC therapy based on research published since 2006.

Methods

A standardized systematic review methodology was employed [12, 13] with the required adaptations for studies of diagnostic tests or clinical prediction rules.

Criteria for considering studies for this review

Types of studies

We included studies that prospectively or retrospectively evaluated the ability of a CPR using only readily available clinical parameters to distinguish between patients at high and low risk of experiencing major bleeding on warfarin, ximelagatran, rivaroxaban, apixaban, or dabigatran therapy for any indication. We excluded studies of CPRs that were published in abstract format only, were conference proceedings, did not report observed bleeding rates in the CPR risk strata, or were published in languages other than English.

Types of participants

Participants included adults with any condition for which OAC (warfarin, ximelagatran, rivaroxaban, apixaban, or dabigatran) was prescribed. The latter three of these OAC agents were included in our search strategy because they are either recently marketed in several countries or are expected to be soon. Ximelagatran was included because although its development was halted due to non-bleeding-related safety concerns, it has been subjected to large trials that could inform our research questions [14, 15].

Types of intervention

To be included, study participants must have undergone risk stratification via a CPR that used only clinical variables readily available when the decision to commence OAC therapy is being made.

Types of outcome measures

The primary outcome of interest was the ability to distinguish between patients at high and low risk of experiencing major bleeding on warfarin therapy. Predictive ability was defined in terms of LR. Authors’ definitions of “major bleeding” were accepted if they were similar to the International Society on Thrombosis and Haemostasis’ definition, which includes fatal bleeding, and/or symptomatic bleeding in a critical area or organ such as intracranial, intraspinal, intraocular, retroperitoneal, intra-articular or pericardial, or intramuscular with compartment syndrome, and/or bleeding causing a fall in hemoglobin level of 20 g/L (1.24 mmol/L) or more, or leading to transfusion of two or more units of whole blood or red cells [16].

Secondary outcome measures included the ability of the CPRs to predict mortality, minor bleeding, and erratic INR (for warfarin only).

Search strategy for identification of studies

A systematic search of the following databases was performed: PubMed, MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, and International Pharmaceutical Abstracts. The search timeframe was 2006 to January 2011 for each given our previously published systematic review covering the literature up to 2006 [11]. The following search terms were used: risk assessment, assessment tools, clinical prediction tools, bleeding risk, bleeding, and warfarin, ximelagatran, dabigatran, apixaban, or rivaroxaban. In MEDLINE, a search strategy previously identified as having 98% sensitivity for identifying CPRs was employed [17]. A manual review of reference lists from retrieved articles was performed to identify any additional studies.

Methods of the review

Initially, two reviewers independently evaluated the titles, abstracts, and citations of all identified articles to select those potentially meeting the inclusion criteria. Articles so selected were independently subjected to full text review to establish whether they met the inclusion criteria and did not meet any exclusion criteria. Concordance between reviewers was measured by simple agreement. Any disagreements were resolved by discussion and consensus. Reviewers were not blinded to author or journal names.

Quality assessment

Both reviewers independently subjected the included studies to quality assessment. Three methods of quality assessment were carried out according to the methodologic standards published by Laupacis et al. [18], McGinn et al. [19], and the hierarchy of evidence for CPRs published by McGinn [19]. For the methodologic quality assessment, each question was assigned a score of 0 (criteria not met), 1 (criteria partially met), or 2 (criteria fully met). The scores were summed to give an overall quality score out of 34 points for each study.

Data extraction

Both reviewers independently read each article and abstracted data using a standardized content review form. Data extracted included: the aim of the study, design, number, type, sample size, duration, inclusion and exclusion criteria, patient characteristics, elements comprising the CPR, and results. Attempts were made to acquire additional information from investigators as required. Discrepancies of data extraction were resolved by consensus through review of the published report.

Analysis of CPR performance characteristics

The LR for each stratum of the CPRs was calculated using published data and the method of Peirce and Cornell [20]. In this case, LRs represent the ratio of the probability of each test result (e.g., low, intermediate, high risk using the CPR) in people who experienced bleeding to the probability of that result in those who did not bleed. The c statistic commonly used to evaluate CPRs provides only an overall representation of the percentage of cases in which the CPR correctly predicts the outcome. In contrast, the LR is the most directly clinically applicable measure of diagnostic test performance, particularly when the tests produce more than two strata of results (e.g., low, intermediate, or high risk of bleeding) [21–24]. Likelihood ratios can be used clinically to estimate an individual patient’s risk of an outcome when combined with the clinician’s baseline level of suspicion of that risk. The cutoffs for performance based on LR proposed by Jaeschke et al. were used in interpreting these estimates [25, 26]. Performance was considered moderate when the LR was greater than 5.0 or less than 0.20. Performance was considered strong when the LR was greater than 10.0 or less than 0.10. Statistically significant LRs less extreme than these values were considered to have weak predictive performance.
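As a rough illustration of the calculation described above, the following Python sketch computes a stratum-specific LR from counts of patients with and without major bleeding and labels it using the cutoffs given in the text. The counts are hypothetical, the function names are introduced here only for illustration, and the confidence interval uses a common log-scale approximation that is assumed for this sketch and is not necessarily the method of Peirce and Cornell [20].

```python
import math


def stratum_lr(bleed_in_stratum, bleed_total, no_bleed_in_stratum, no_bleed_total):
    """LR = P(stratum | major bleed) / P(stratum | no major bleed)."""
    p_bleed = bleed_in_stratum / bleed_total
    p_no_bleed = no_bleed_in_stratum / no_bleed_total
    return p_bleed / p_no_bleed


def lr_confidence_interval(bleed_in_stratum, bleed_total,
                           no_bleed_in_stratum, no_bleed_total, z=1.96):
    """Approximate 95% CI on the log scale (assumed method; needs non-zero counts)."""
    lr = stratum_lr(bleed_in_stratum, bleed_total, no_bleed_in_stratum, no_bleed_total)
    se = math.sqrt(1 / bleed_in_stratum - 1 / bleed_total
                   + 1 / no_bleed_in_stratum - 1 / no_bleed_total)
    return lr * math.exp(-z * se), lr * math.exp(z * se)


def classify_lr(lr, ci_low, ci_high):
    """Apply the cutoffs stated in the text to a statistically significant LR."""
    if ci_low <= 1.0 <= ci_high:
        return "not significant"
    if lr > 10.0 or lr < 0.10:
        return "strong"
    if lr > 5.0 or lr < 0.20:
        return "moderate"
    return "weak"


# Hypothetical cohort: (patients with major bleed, patients without) per stratum.
counts = {"low": (5, 400), "intermediate": (30, 500), "high": (15, 50)}
bleed_total = sum(b for b, _ in counts.values())
no_bleed_total = sum(n for _, n in counts.values())

for stratum, (b, n) in counts.items():
    lr = stratum_lr(b, bleed_total, n, no_bleed_total)
    ci = lr_confidence_interval(b, bleed_total, n, no_bleed_total)
    print(stratum, round(lr, 2), [round(x, 2) for x in ci], classify_lr(lr, *ci))
```

Treating an interval that includes 1.0 as "not significant" is a simplification made for this sketch.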

For CPRs where multiple studies (>2) assessed their performance or where more than one study had a validation cohort containing more than 2,000 patients, we attempted to quantitatively combine the results. Our aim was thereby to provide a synthesis of evidence for CPRs that have been the most extensively studied.

Planned subgroup analysis

No subgroup analyses were prospectively planned.

Results

Description of studies

Six studies meeting the inclusion criteria were identified [27–32]. Two of these [27, 32] were excluded due to the authors not reporting the actual bleeding risks identified within the CPR strata. The four included studies are described in Table 1.

Table 1 Included studies

Patient characteristics

Patient characteristics varied within the included studies. Mean age of subjects ranged from 58.4 years [19] to 80.2 years [22]. Three of the studies specifically evaluated patients with atrial fibrillation [28–30] and one focused on patients with VTE/PE [31]. Three of the studies involved only patients receiving warfarin [28, 30, 31], and one involved patients receiving warfarin or ximelagatran [29]. No studies involving dabigatran, apixaban, or rivaroxaban were identified.

Intervention characteristics

One of the studies evaluated a CPR which existed at the time of our previous review [11], the modified outpatient bleeding risk index (mOBRI) [30]. The other three studies described novel CPRs, which we refer to as “RIETE” [31] and “hypertension, abnormal renal/liver function, stroke, bleeding history or predisposition, labile INR, elderly, drugs/alcohol concomitantly score (HAS-BLED)” [28, 29]. Lip et al. [29] are unique in that they simultaneously evaluated the performance of three other CPRs (all of which we have previously evaluated) in a new retrospective cohort of patients.

mOBRI

The outpatient bleeding risk index was first developed from a retrospective review by Landefeld et al. in 1989 [33]. It was subsequently modified by Beyth et al. [34] in a reanalysis of Landefeld’s cohort resulting in the mOBRI which includes age ≥65 years, history of stroke, history of gastrointestinal bleeding, and serious comorbidity (renal insufficiency, recent myocardial infarction, severe anemia, or diabetes). Others have evaluated the mOBRI in different settings [35, 36].

More recently, Airaksinen et al. [30] examined the mOBRI’s performance in patients undergoing percutaneous coronary intervention, 72% of whom were receiving anticoagulation for atrial fibrillation. Only one prior study evaluated this CPR in atrial fibrillation patients [35]. That study evaluated the mOBRI in an anticoagulation clinic and failed to demonstrate moderate or better performance [11]. In Airaksinen et al.’s retrospective study, 9% of the population was deemed to have a high bleeding risk (mOBRI score 3–4). Of these, ten patients (26%) had a major bleed, three of whom were on oral anticoagulation alone and the rest on additional antiplatelet therapy.

Lip et al. also retrospectively evaluated the mOBRI in a cohort of 3,665 warfarin recipients with atrial fibrillation participating in the SPORTIF III and V trials [29]. One hundred thirty-six major bleeds were observed during 1.4 years of observation.
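The mOBRI components listed above lend themselves to a simple additive score. The sketch below assumes one point per component and uses the high-risk range of 3–4 stated in the text; the split of the remaining scores into low (0) and intermediate (1–2) follows the usual description of the index and is an assumption here, as are the function and parameter names.

```python
def mobri_score(age_years, prior_stroke, prior_gi_bleed, serious_comorbidity):
    """One point each for age >= 65, stroke, GI bleeding, and serious comorbidity.

    serious_comorbidity should be True if any of renal insufficiency, recent
    myocardial infarction, severe anemia, or diabetes is present.
    """
    return (int(age_years >= 65) + int(bool(prior_stroke))
            + int(bool(prior_gi_bleed)) + int(bool(serious_comorbidity)))


def mobri_stratum(score):
    """High = 3-4 (from the text); low = 0 and intermediate = 1-2 are assumed."""
    if score >= 3:
        return "high"
    if score >= 1:
        return "intermediate"
    return "low"


# Hypothetical patient: 70 years old, prior stroke, diabetes, no GI bleeding.
score = mobri_score(70, prior_stroke=True, prior_gi_bleed=False, serious_comorbidity=True)
print(score, mobri_stratum(score))
```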

RIETE

A novel six-point major bleeding risk CPR was developed and evaluated using a derivation sample and a validation sample (both retrospective) and logistic regression methods to determine optimal cutoffs for low, intermediate, and high risk of major bleeding [31]. The study data were derived from patients in a large Spanish venous thromboembolism registry (“RIETE”). The CPR involves four clinical variables and two laboratory-based ones (serum creatinine and presence/absence of anemia). Major bleeding rates over 3 months of exposure were 0.1%, 2.8%, and 6.2% in the low-, intermediate-, and high-risk strata, respectively. Based on LRs, the authors concluded that the CPR was capable of distinguishing between high and low bleeding risk (Table 2).

Table 2 Performance characteristics of CPRs in validation groups

HAS-BLED

This most recent bleeding risk CPR was developed and retrospectively evaluated in patients in the Euro Heart Survey on Atrial Fibrillation cohort; its authors used data derived from this population along with a literature review to develop their seven-characteristic scale. HAS-BLED includes hypertension, abnormal renal and liver function, stroke, bleeding history, labile INRs, age ≥75, aspirin/NSAID use, and alcohol use (1 point each). Based on their findings, the authors assert that the risk of bleeding for a particular patient will outweigh the potential benefit of oral anticoagulation if the HAS-BLED score exceeds the CHADS2 score for atrial fibrillation-associated stroke risk [37]. HAS-BLED was also retrospectively evaluated in a cohort of 7,329 warfarin or ximelagatran recipients participating in the SPORTIF atrial fibrillation trials [29].
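The scoring and the comparison with CHADS2 described above can be sketched as follows. This is a minimal illustration: each item is passed as a yes/no value so that no age or laboratory cutoff is hard-coded, renal and liver function are counted as separate items, and the class and function names are hypothetical. The CHADS2 score is assumed to be calculated elsewhere and passed in as an integer.

```python
from dataclasses import dataclass, fields


@dataclass
class HasBledItems:
    """One point per item that is present; all default to absent."""
    hypertension: bool = False
    abnormal_renal_function: bool = False
    abnormal_liver_function: bool = False
    stroke: bool = False
    bleeding_history: bool = False
    labile_inr: bool = False  # left False for warfarin-naive patients
    elderly: bool = False
    aspirin_or_nsaid: bool = False
    alcohol: bool = False


def has_bled_score(items):
    """Sum of the items that are present."""
    return sum(bool(getattr(items, f.name)) for f in fields(items))


def bleeding_risk_exceeds_stroke_risk(has_bled, chads2):
    """The authors' proposed rule: HAS-BLED score greater than CHADS2 score."""
    return has_bled > chads2


# Hypothetical patient with hypertension, prior stroke, and advanced age.
patient = HasBledItems(hypertension=True, stroke=True, elderly=True)
score = has_bled_score(patient)
print(score, bleeding_risk_exceeds_stroke_risk(score, chads2=4))
```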

None of the included studies addressed our secondary outcome measures of mortality, minor bleeding, or erratic INR (for warfarin only).

Methodological quality of included studies

Table 3 lists the scores for the quality assessment completed by the reviewers for the included studies. According to the predefined criteria, the studies included in this update of our previous review were of similar or slightly higher quality than those previously evaluated. The new studies all lacked prospective validation or prospective evaluation of their effects on clinical use. The HAS-BLED CPR is the only one for which its proponents propose a course of action based on a score [28], although the effects of doing so have not been studied.

Table 3 Quality characteristics and scores for the included studies, based on the method of Laupacis [18]

Levels of evidence

Table 1 depicts the design and population characteristics of the included studies and their level of evidence. The included studies were all derived and/or validated in either a split sample or a retrospective database.

Based on the evidence hierarchy by McGinn et al. [19] and our previous analysis [11], the mOBRI would still be classified as having level 2 evidence as it has been validated in different populations prospectively and has demonstrated reproducibility. Evidence that its application has changed clinician-prescribing behavior with favorable outcomes is not yet available. The two new CPRs evaluated would be classified as having level 4 evidence: “Rules that need further evaluation before they can be applied clinically” [19].

Predictive ability of included studies

The performance characteristics of the included studies are described in Table 2. Based on the finding of five studies evaluating the mOBRI and two studies involving more than 2,000 subjects each for HAS-BLED, we chose to perform pooled analyses of those studies, also depicted in Table 2.

Our analysis of the mOBRI data of Airaksinen et al. [30] revealed performance characteristics somewhat worse than in our previous pooled analysis of studies involving this CPR [11]. Only for “high risk” did the LR achieve statistical significance, and its magnitude was in the “weak” category. Airaksinen et al. used survival analysis to assess differences in major adverse cardiac events or mortality between the low/intermediate/high bleeding risk categories and found an association only with the latter. They did not report other measures of CPR predictive ability. Our pooled analysis of all five studies involving the mOBRI (Table 2) shows that the “low” and “high” risk strata have weak predictive ability and that classifying a patient as intermediate risk has no predictive utility at all.

The RIETE CPR [31] showed statistical significance for the low- and high-risk strata, and the LRs were considered strong for the low-risk stratum (LR 0.03). These authors used LRs to assess predictive ability in the validation cohort and our analysis closely matched theirs. In other words, a “low-risk” classification on the RIETE CPR may be clinically useful for identifying low bleeding risk venous thromboembolism patients. This requires prospective validation.
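To show what an LR of this size means for an individual patient, the sketch below converts a pre-test probability to a post-test probability through odds, which is the standard way of combining an LR with a baseline estimate of risk. The pre-test probability of 3% is a hypothetical value chosen only for illustration, and the function name is introduced here.

```python
def post_test_probability(pre_test_probability, lr):
    """Convert probability to odds, multiply by the LR, and convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (1 + post_test_odds)


# Hypothetical baseline risk of major bleeding of 3%, combined with LR 0.03.
baseline = 0.03
print(post_test_probability(baseline, 0.03))

# The same baseline combined with an LR of 1.0 is unchanged.
print(post_test_probability(baseline, 1.0))
```

An LR close to 1 leaves the baseline estimate essentially unchanged, which is why strata with LRs near 1 are described as having no predictive utility.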

The authors of the HAS-BLED analyses did not recommend a low- or high-risk stratification scheme, and instead recommend that the decision to initiate oral anticoagulation be based on comparing the HAS-BLED score to the CHADS2 score for stroke risk. According to the authors, “the risk of bleeding outweighs the potential benefit of OAC if the HAS-BLED bleed score exceeds the individual CHADS2 index” [28]. In the study of Pisters et al. [28], this was based on the c statistic, which was 0.72 (0.65–0.79) in the validation cohort, corresponding to having “modest value” and below the threshold of 0.8 for “genuine clinical utility” [38, 39]. In addition to analyzing the LRs for each of the six risk scores, we explored threshold effects by analyzing HAS-BLED scores of ≥2 vs. <2, ≥3 vs. <3, and ≥4 vs. <4 in the individual studies and in the pooled analysis. In the study of Pisters et al. [28], scores of 4 or 5 showed moderate predictive ability [LR 6.0 (2.23–16.0) and 9.0 (1.12–71.7), respectively] and a score of ≥4 was associated with moderate predictive ability [LR 6.42 (2.68–15.4)]. In the analysis of Lip et al. [29], the much larger cohort did not reveal any score stratum with more than weak predictive ability, and no threshold effects were detected. Their analysis was based on c statistics, which were in the 0.6–0.7 range for all the reported analyses, corresponding to “limited clinical value” [38, 39]. For the pooled HAS-BLED analysis, none of the risk strata or threshold groupings yielded LRs any better than “weak”, and the threshold effect detected in the study of Pisters et al. [28] was not preserved. This may represent a more accurate estimate of the effect or an artifact of pooling statistically or clinically heterogeneous data.
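The threshold analysis described above amounts to collapsing the per-score counts into two groups at each cutoff and computing an LR for each group. The following sketch shows one way to do this; the per-score counts are hypothetical and do not come from either study, and the function name is introduced here for illustration.

```python
def threshold_lrs(counts_by_score, threshold):
    """Return (LR for score >= threshold, LR for score < threshold).

    counts_by_score maps each score to a tuple of
    (patients with major bleed, patients without major bleed).
    """
    bleed_total = sum(b for b, _ in counts_by_score.values())
    no_bleed_total = sum(n for _, n in counts_by_score.values())
    bleed_above = sum(b for s, (b, _) in counts_by_score.items() if s >= threshold)
    no_bleed_above = sum(n for s, (_, n) in counts_by_score.items() if s >= threshold)

    lr_at_or_above = (bleed_above / bleed_total) / (no_bleed_above / no_bleed_total)
    lr_below = ((bleed_total - bleed_above) / bleed_total) / (
        (no_bleed_total - no_bleed_above) / no_bleed_total)
    return lr_at_or_above, lr_below


# Hypothetical counts per score: (major bleed, no major bleed).
counts = {0: (2, 300), 1: (8, 400), 2: (10, 200), 3: (12, 80), 4: (8, 20)}

for cutoff in (2, 3, 4):
    above, below = threshold_lrs(counts, cutoff)
    print(f">={cutoff}: LR {above:.2f}; <{cutoff}: LR {below:.2f}")
```

The sketch assumes that both groups at each cutoff contain patients with and without bleeding; a cutoff that leaves a group empty would need separate handling.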

Comments

Since our previous evaluation of performance of CPRs for bleeding [11], two new CPRs have been developed (RIETE and HAS-BLED) and the mOBRI has been further studied.

Quality

Based on the comprehensive rating system used, we consider the overall quality of the available studies to be poor. For any of the existing CPRs to be recommended for routine use in clinical practice, prospective validation of their use and, ultimately, evidence that using them improves patient outcomes would be required [19]. At this point, no such evidence exists and no progress toward it is evident since our last review.

Performance

Using pre-specified thresholds for clinical usefulness based on LR estimates (LR >10, >5 or <0.2, <0.1), moderate or better performance was detected in only one case: a RIETE score of 0 points, which is strongly predictive of the absence of major bleeding [LR 0.03 (0.01–0.20)]. The finding in one study [28] that a HAS-BLED score ≥4 was moderately predictive did not persist when pooled with data from another study involving similar patients [29].

Based on our pooled analysis of all trials involving the mOBRI (Table 2), we believe that sufficient evidence is now available that this CPR lacks the ability to make clinically useful distinctions and it is doubtful that more studies are warranted. A limitation of our approach is that because the mOBRI has been studied in a variety of patient populations, pooling could obscure clinically useful LRs. We believe this is unlikely, however, since none of the individual studies of the mOBRI demonstrated such LRs either.

HAS-BLED has recently been endorsed by the European Society of Cardiology and the Canadian Cardiovascular Society for routine use in clinical practice [40, 41]. This seems premature given the CPR authors’ conclusion that it requires prospective validation prior to routine use [28], c statistics indicating only limited clinical value in a large cohort study [29], and our finding that it lacks power to predict major bleeding in any of the evaluable strata. Hence, the authors’ assertion that bleeding risk exceeds the prospect of benefit when the HAS-BLED score is greater than the CHADS2 score [28] requires prospective evaluation before being adopted as a decision-making rule. Also, adoption of such a rule in practice would somehow have to integrate the evidence that many patients place significantly more disutility on stroke than on major bleeding [42, 43].

Several explanations for the lack of moderate to strong predictive ability of the studied CPRs are possible. In studies with short follow-up periods and/or subjects at low inherent risk of bleeding, the incidence of major bleeding may be too low to result in sufficiently large LRs or may result in excessively wide confidence intervals. The predictive variables used in existing CPRs simply may not contain enough useful information to distinguish between different risks of bleeding. Many reasons for this are possible. A prominent one is that, in CPR derivation populations, exposure to bleeding risk factors like aspirin, non-steroidal anti-inflammatory drugs, oral anticoagulants, or combinations thereof is so heterogeneous (by dose, drug, or duration) that these factors do not emerge in regression models as precisely as necessary to be predictive. This may be especially true of the mOBRI, for which the LR estimates are fairly precise and not very predictive. The list of patient factors that can increase bleeding risk is extensive and includes genetic factors, age, sex, personal characteristics such as compliance and dietary intake, comorbid conditions such as cancer and liver disease, and concurrent medications [44]. The metabolism and action of warfarin itself have been associated with 30 different genes with polymorphisms leading to large interindividual variations in dosing requirements and higher bleeding risks [44]. Perhaps this complexity makes creation of a clinical prediction rule that can be easily understood by clinicians and widely used in different populations impossible.

Applicability to practice

The mOBRI, RIETE, and HAS-BLED CPRs involve a small number of easily ascertained variables. HAS-BLED includes labile INR, a factor that can only be determined after initial warfarin exposure. Presumably this does not disqualify warfarin-naive patients, however, who could simply not be given a point for that parameter.

Using the dual criteria of levels of evidence and quantitative performance characteristics (LR), none of the identified CPRs are supported by sufficient evidence to recommend their adoption in clinical practice. The mOBRI and HAS-BLED are the most developed in terms of quantity of evidence; however, they both suffer from poor predictive performance and their widespread uptake into clinical practice could only be recommended once a positive impact on clinical decision making had been demonstrated. A low score on the RIETE CPR appears to be useful for identifying venous thromboembolism patients at negligible risk of bleeding, but this requires further validation.

Conclusion

Bleeding risk CPRs could provide clinicians with valuable individualized information to aid in decision making prior to starting oral anticoagulation therapy. Unfortunately, none of the available CPRs exhibit sufficient predictive accuracy or have trials evaluating the impact of their use on patient outcomes. Hence, it remains the case that no existing CPR can be recommended for widespread use in practice at present. A low RIETE score is promising as a means to identify patients at extremely low risk of bleeding. The current evidence does not support the use of HAS-BLED or mOBRI in routine practice. While the goal of accurately estimating a patient’s risk of major bleeding remains elusive, risk factors that are known to increase an individual patient’s risk of bleeding should be evaluated and minimized if possible. Further prospective trials are required to develop a CPR that can be reliably employed in clinical practice.