Key Points

Age appropriateness of anticoagulants used for the long-term treatment of atrial fibrillation is still uncertain.

Using a structured comprehensive literature search and a subsequent Delphi process, an interdisciplinary expert group rated the appropriateness of oral anticoagulants for the long-term treatment of atrial fibrillation in older people with regard to efficacy, tolerability and safety.

In older people, the majority of these drugs, the non-vitamin K oral anticoagulants and warfarin, are seen to be beneficial or very beneficial while regionally used older vitamin K antagonists should be used with caution as evidence is missing.

The evidence basis for the use of these drugs in older people regarding geriatric syndromes is very limited.

1 Introduction

Atrial fibrillation is an age-related condition afflicting up to 2% of the general population [1] but up to about 13% in patients aged over 75 years [2]. It is a leading cause of morbidity and mortality as a result of stroke, but also a major risk for dementia [3]. Anticoagulation reduces the embolic risk by more than half [4]. As the prevalence of atrial fibrillation rises with age [3], anticoagulants are most often used in older people.

Against the background of pharmacological complexity in older people, the Fit fOR The Aged (FORTA) classification was introduced in 2008 with the aim of guiding physicians in their efforts to rapidly optimise and prioritise medications. FORTA is based on the benefit, risk and appropriateness of drugs for older patients in everyday clinical settings [5, 6]. It represents the first classification system in which both negative (harmful or critical drugs, D and C labels) and positive (beneficial drugs, A and B labels) labelling is combined at the level of individual drugs or drug groups. The system and the derived FORTA list ([7], updated February 2016 [8]) are based on individual indications (implicit listing depending on patient characteristics/diagnoses) and therefore differ from negative lists such as the Beers Criteria list [9], which do not require intricate knowledge of patients (explicit lists [10]). In a randomised controlled trial (RCT) (VALFORTA), FORTA significantly improved medication quality as measured by the FORTA score, which adds over- and under-treatment errors. FORTA also reduced adverse drug effects at a number needed to treat of only five [11]. Here, we present the rating process of an independent multi-professional international expert panel for eight oral anticoagulants (OACs) used to treat atrial fibrillation that was based on a structured comprehensive literature review and a subsequent two-step Delphi approach using the FORTA classification.

2 Methods

2.1 Procedure

The present expert rating procedure was similar to that used to assess urological drugs earlier (for details see [12]). In brief, a structured comprehensive review on clinical trials providing relevant data for OACs used in older people was performed; based on this structured comprehensive review, eight OACs were assessed and labelled by ten raters (all authors plus the initiator MW) according to the FORTA system.

2.1.1 Structured Comprehensive Literature Review

A literature search in PubMed/MEDLINE was performed from November 2015 through February 2016 using the search terms (drug name) (atrial fibrillation) in the International Nonproprietary Names terminology, plus the standard filters (randomized controlled trial) (full text available) (age 65+ years) (no language exclusion). The aim was to identify appropriate clinical trials to examine the efficacy, safety and tolerability of OACs used for the treatment of atrial fibrillation in older people. Primary research questions were to assess study and total patient numbers, quality of major outcome data and data of geriatric relevance. Abstracts were retrieved and reviewed for appropriateness by MW, and rechecked. Randomised controlled studies with >100 patients exposed to the particular drug for at least 6 months providing relevant data on stroke and/or safety (major bleeding, intracranial bleeding or geriatric syndromes, e.g. frailty, falls, dementia) for treatments were included, if abstracts pointed to such endpoints and the full paper proved to contain them, in particular, whether the article explicitly reported results in age groups ≥65, ≥70, ≥75, ≥80 or ≥85 years. Sub-analyses were only included if they contained data on the population searched for and those that were not reported in the primary paper. No other sources or primary data from investigators were included. The included studies were analysed for separate data on the group of older people that were recorded. Conflicts of interpretation would have been discussed further in the rater panel, but did not occur. Key information from appropriate articles was extracted into a Microsoft Word file with particular focus on the presence of information on geriatric syndromes [Table 1 of the Electronic Supplementary Material (ESM)]. By definition, only class 1 studies according to the Oxford Centre for Evidence-based Medicine were included ([13] individual RCT with narrow confidence interval). 
No meta-analysis of the data was planned; instead, summary values concerning effect/safety parameters were provided to the raters for their assessment.
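The inclusion criteria described above can be sketched as a simple screening predicate. This is only an illustration under stated assumptions: the `Trial` record and its field names are hypothetical and were not part of the review, which was carried out by manual screening of abstracts and full texts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """Hypothetical record for one screened article (field names are illustrative)."""
    randomised_controlled: bool
    patients_on_drug: int            # patients exposed to the particular drug
    exposure_months: float           # duration of exposure
    reports_stroke: bool             # relevant data on stroke
    reports_safety: bool             # major/intracranial bleeding or geriatric syndromes
    older_group_min_age: Optional[int]  # lower age bound of the reported older group, if any


def is_eligible(trial: Trial) -> bool:
    """Apply the stated criteria: RCT, >100 patients, at least 6 months,
    stroke and/or safety data, and explicitly reported results in people aged >=65."""
    return (
        trial.randomised_controlled
        and trial.patients_on_drug > 100
        and trial.exposure_months >= 6
        and (trial.reports_stroke or trial.reports_safety)
        and trial.older_group_min_age is not None
        and trial.older_group_min_age >= 65
    )


# Illustrative records, not data from the review.
print(is_eligible(Trial(True, 616, 12, True, True, 70)))    # True
print(is_eligible(Trial(True, 100, 12, True, True, 70)))    # False: not >100 patients
print(is_eligible(Trial(True, 500, 12, True, True, None)))  # False: no older group reported
```

Encoding the criteria as one predicate makes the strictness of the filter visible: a trial without explicitly reported results for older people is excluded regardless of its size.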

2.1.2 Identification of the Raters

The initiator of the project (MW) identified raters based on online information. Experts were eligible if they met the following criteria: geriatricians or cardiologists with documented clinical experience in the pharmacotherapy of (multi-morbid) older people; high academic status; prominent standing in the leading geriatric/cardiology medical associations; substantial number, and the quality and relevance of publications. Accordingly, ten raters from seven European countries were identified who met those criteria and could also accept the invitation to participate (the other authors). This number was between the pre-set minimum of eight raters and the maximum of 15 raters.

2.1.3 Selection of Drugs to be Assessed

In the first step, the initiator selected OACs used in thromboembolic prevention for atrial fibrillation. The proposed choice of drugs was refined by the raters who voted for adding fluindione. The studied drugs were the vitamin K antagonists (VKAs) warfarin, phenprocoumon, acenocoumarol and fluindione, and all currently marketed non-vitamin K oral anticoagulants (NOACs): dabigatran, rivaroxaban, edoxaban and apixaban. The raters proposed dabigatran to be assessed for both the marketed high- and low-intensity approaches and edoxaban only for the licensed high-intensity approach.

2.1.4 Analysis of Summary of Product Characteristics

The most recent summary of product characteristics (SmPC) was downloaded for all drugs from the EMA website, or if not available, from other reliable sources (e.g. http://www.fachinfo.de) or the producer. The texts were thoroughly analysed using the same template as above (Table 1 of the ESM).

2.1.5 FORTA Labels

From this material, the initiator derived a proposal for initial FORTA labels. The proposal together with the spreadsheet and full texts/abstracts were forwarded to the rater team for the review and addition of further articles felt to be relevant.

2.1.6 Two-Step Delphi Process

The initiator and the rater group convened in March 2016; raters were instructed about the process with particular focus on the FORTA procedure. The evidence synthesis was presented and a round table discussion was chaired by MW. After the meeting, raters reviewed the literature and the structured comprehensive review, classified each of the listed drugs according to FORTA and had the option to provide comments in a survey form (Table 2 of the ESM). Rating was performed blinded to others’ scores. Results were collated, and aggregate scores, along with anonymised comments, were shared with the raters. Where a consensus was not reached, raters were asked to resubmit scores.

2.1.7 Delphi Process, Statistics

Details of the Delphi method (all experts rate independently without knowing their peers’ ratings, knowing only the reached consensus) and the corresponding statistical analysis have been described in detail elsewhere [7, 12]. In brief, the international raters assessed the OACs after instruction about FORTA based on the structured comprehensive review and SmPC. The aggregated list of raters’ labels was statistically analysed and the aggregate findings were sent out to the raters for a second rating round if the corrected consensus coefficient was <0.8. The raters’ FORTA labels were converted into numerical values A → 1, B → 2, C → 3 and D → 4, respectively; the arithmetic mean (m) was calculated for each item, reconverted to FORTA labels and compared with the original author-based labels.
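The label conversion and averaging step can be illustrated with a minimal sketch. The mapping A → 1 to D → 4 and the arithmetic mean are taken from the text. The reconversion rule (rounding the mean to the nearest label, with halves rounded up) is an assumption made here for illustration, and the function names are hypothetical. The corrected consensus coefficient is not reproduced because its formula is given only in the cited references [7, 12].

```python
import math

# Mapping stated in the text: A -> 1, B -> 2, C -> 3, D -> 4.
LABEL_TO_SCORE = {"A": 1, "B": 2, "C": 3, "D": 4}
SCORE_TO_LABEL = {score: label for label, score in LABEL_TO_SCORE.items()}


def mean_score(votes):
    """Arithmetic mean m of the raters' labels after numerical conversion."""
    scores = [LABEL_TO_SCORE[vote] for vote in votes]
    return sum(scores) / len(scores)


def consensus_label(votes):
    """Reconvert m to a FORTA label.

    Assumption: nearest label, halves rounded up (not specified in the text).
    """
    return SCORE_TO_LABEL[math.floor(mean_score(votes) + 0.5)]


# Vote counts as reported in the Results section.
warfarin = ["B"] * 7 + ["C"] * 3
edoxaban_round2 = ["B"] * 6 + ["A"] * 4
apixaban = ["A"] * 10

print(mean_score(warfarin), consensus_label(warfarin))                # 2.3 B
print(mean_score(edoxaban_round2), consensus_label(edoxaban_round2))  # 1.6 B
print(mean_score(apixaban), consensus_label(apixaban))                # 1.0 A
```

Under this assumed rounding rule, the sketch reproduces the reported outcome that markedly different vote distributions (seven B and three C, or six B and four A) collapse into the same label B.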

3 Results

3.1 Literature Search

Three hundred and eighty abstracts were potentially relevant based on the search in PubMed/MEDLINE. Figure 1 shows 32 papers identified from abstracts meeting the inclusion criteria as checked in the full text (except for two abstracts); they contained results on clinical trials on older people or explicitly reported data from subgroups of older people aged ≥65 years (which is the most commonly used, but unofficial, definition of ‘elderly’) for the eight drugs investigated. Explicit results on clinical trials for older patients were reported for all drugs except for phenprocoumon, acenocoumarol and fluindione. Table 1 shows the number of abstracts retrieved, the numbers of studies reporting data on older patients to support drug efficacy and safety, patients’ numbers and information on geriatric syndromes. The drug with the most patients studied is warfarin; for each NOAC, several thousand older patients were studied as well in a grand total of eight eligible studies to date. Information on geriatric syndromes was only available in three trials on warfarin, concerning mental status, falls or frailty. The hazard ratios or odds ratios and event rates for the individual trials regarding efficacy and safety parameters, as well as their comparators, are compiled in Table 1 of the ESM. Only one placebo-controlled trial of warfarin (the most studied drug) provided a subgroup analysis for both efficacy and safety on 616 patients aged >70 years (AFFIRM trial [14]). All NOACs were compared with warfarin only, and superiority was claimed for one (rivaroxaban), two (dabigatran, edoxaban) or all three (apixaban) major endpoints (stroke/systemic embolism; major bleeding; intracerebral bleeding) with non-inferiority substantiated otherwise for all endpoints.

Fig. 1 Flow diagram for the structured comprehensive review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement [15]

Table 1 Results of the structured comprehensive review on oral anticoagulants; if not separated, patients may have been counted twice in the age categories

3.2 Analysis of Summary of Product Characteristics

All package inserts explicitly mentioned the elderly population. A summary is provided in Table 1 (see ESM). Information available on side effects and contraindications of particular interest in older populations (e.g. geriatric syndromes) was not found in any of the reviewed SmPCs. All contained precautions regarding renal function and high age in general; only for NOACs do specific dosing recommendations exist reflecting renal function and high age (dabigatran and apixaban).

3.3 Delphi Process Leading to the Final FORTA Classification

Final ratings as well as the individual score categories are shown in Table 2. Proposed ratings were confirmed in 89% of cases (deviation for one out of nine items); only for high-intensity edoxaban was the final result of B different from the proposed A rating. Table 3 of the ESM compiles the raters’ comments.

Table 2 Results of the two-step Delphi process to label oral anticoagulants according to the Fit-fOR-The-Aged (FORTA) classification. The FORTA class is shown as well as the number of votes in each FORTA category. Comments were condensed from data, summary of product characteristics and raters’ comments shown in full in Tables 1 and 3 of the Electronic Supplementary Material

One of the nine items had to be re-rated in the second survey (high-intensity edoxaban). This was necessary as the first round resulted in five A and five B votes, leading to a corrected consensus coefficient of 0.75. The second round resulted in six B and four A votes, and the final vote remained unchanged at B.

For regionally used VKAs, two to four raters without experience with these VKAs refrained from voting. Table 2 also summarises the rationales (key points) behind the categorisation of the individual drugs as derived from data and the raters’ comments given in Table 3 of the ESM. Ratings were markedly different for, e.g., warfarin (seven B, three C) and high-intensity edoxaban (six B, four A), yet led to the same label B because the FORTA principle does not support intermediate values (for simplifying purposes). In contrast to this heterogeneity, apixaban was unanimously rated A by all ten raters. No item was assigned the FORTA-D (Don’t) label. Three VKAs (phenprocoumon, acenocoumarol and fluindione) were labelled FORTA-C, mainly reflecting the lack of study data in older people. This category indicates that these drugs require even more intense monitoring than that required for studied drugs.

Warfarin, dabigatran, edoxaban and rivaroxaban were labelled B (beneficial), which means atrial fibrillation can be safely and effectively treated in older people, and this label affirms that it is standard to treat this condition. Apixaban was labelled A (absolutely), meaning it was seen as the drug with the most beneficial risk–benefit ratio in this group. This differentiation was mainly based on the fact that endpoint superiority was most prominent for apixaban; either renal problems (dabigatran) or limited data on superiority (rivaroxaban, edoxaban) reduce the distance of these NOACs from warfarin, so that the remaining difference cannot be detected by FORTA, as the number of categories is limited.

4 Discussion

4.1 Strengths and Weaknesses

This structured comprehensive review for the first time confirmed the paucity or absence of data of geriatric importance in one of the most successful and important areas of drug treatment.

Concerning weaknesses, inclusion was limited to studies with >100 participants treated for a minimum of 6 months, so smaller trials may have been missed. Reporting of endpoints was heterogeneous, in particular for bleeding events, as were patient populations regarding co-morbidities (e.g. reflected in CHADS2 or CHA2DS2-VASc scores), thus precluding quantitative comparisons. No attempts were made to obtain data on unpublished observations.

If older subgroups were not explicitly reported in larger studies, these data remained excluded.

Although SmPCs were included to consider unpublished information, some valuable information from clinical studies may not have been detected by the screening procedure. The experiences from uncontrolled studies, real-life cohort studies, registries or even case reports are lost in such an approach; however, they may contain relevant information, sometimes even triggering regulatory actions (case series and ‘Dear doctor letters’). This is reflected by the considerable discrepancy between numbers of primarily identified studies and included abstracts, in particular for warfarin (237 vs. 24).

The strength of the Delphi process is to bring opinions from different professional and regional backgrounds into a quantitative rating process, which is the typical strategy to assess treatments (and diagnostics) for which consensual elements are essential as evidence is sparse. Concerning weaknesses, the multidisciplinary nature of this Delphi exercise may result in biases and inconsistencies. For instance, not all of the raters had practical experience with all drugs (e.g. regionally used VKAs). Furthermore, the group was small and did not include key stakeholders (e.g. general practitioners, pharmacists) or experts from North America, and therefore a larger set of experts might have rated differently. However, the degree of consensus (only one out of nine items had to be re-rated) was remarkable, as experts with different professional backgrounds voted without knowing their colleagues’ opinions. This is in line with the degree of rating consensus for the first round of the Delphi process for the published FORTA list, both in 2012 and 2015 [7, 8], which was almost the same (92%) for a much larger group of raters (20 from different countries).

As the experts were instructed as a group about FORTA and the structured comprehensive review at the inaugural meeting, anonymity could not be guaranteed; conversely, this collective instruction ensured that rating was performed on the same basis of information. To further independence, communication of any opinion relevant to the voting was strongly discouraged (‘forbidden’) at this meeting, and a formal agreement was obtained on not communicating the individual votes between the experts during the Delphi rounds. Potential conflicts of interest together with industry sponsorship are openly listed below. We consider that the methodology, procedure and approach were robust in minimising any potential bias of this origin.

4.2 Key Findings on Oral Anticoagulant Appropriateness

The results of the FORTA process show that, within a given drug class, the perceived appropriateness of individual drugs may substantially vary: the ratings for OACs range from C to A. Such differences can be based on proven differences in efficacy and safety (for example, newer drugs may have better efficacy and/or safety), but also on the quality of the available trial(s), and the specific patient population studied. Such compelling evidence, as typically derived from RCTs, is an exception rather than the rule for the older population. Studies especially designed for older people may reflect specific outcomes of interest, for example, cognition or frailty aspects, rather than efficacy or tolerability. Such studies are typically even rarer, as discussed here for dementia: this pivotal geriatric condition and atrial fibrillation are clearly associated as shown in a recent meta-analysis and a cohort study [16, 17]. Only two older studies on warfarin reported on cognitive function or falls in relation to treatment [18, 19], but none for the other drugs. In an earlier uncontrolled trial, even the hypothesis was derived that dementia may be prevented by warfarin [20], and a relationship between anticoagulation control and dementia was reported [21]. A current meta-analysis suggests a possible cognitive benefit of anticoagulation [22]. Further non-controlled studies such as those by Perera et al. and Lefebvre et al. [23, 24] show that frail patients are at higher risk of bleeding. However, no comparison between drug and placebo supports the assessment of efficacy/safety here.

Oral anticoagulation treatment is often withheld in clinical practice because of the risk of falls. One analysis suggests this risk might be over-emphasised [25], though people who fall while on anticoagulation treatment would also seem to have greater mortality [26]. This balance of degree of risk vs. derived benefit of anticoagulation, in falling patients, requires further research.

Thus, a major result of this structured comprehensive review is the fact that though older people are included in several RCTs, no relevant data on specific geriatric syndromes or side effects with geriatric relevance other than bleeding have been sufficiently studied and, thus, cannot be used to guide the FORTA assessment. The latter has consequently been based only on efficacy/safety data and use conditions (e.g. renal function, dosing regime), as in some cases this information is only found in the SmPCs.

This was the case for three VKAs: phenprocoumon, which is almost exclusively used in Germany; fluindione, which is only used in France; and acenocoumarol, which is used mainly in France. The lack of eligible studies was the basis for labelling them FORTA-C: this category (cautious) is typical for drugs with potential risks that need to be applied under close surveillance as comparable study data are missing. Therefore, it is recommended to rethink the current practice of using unstudied drugs in older people.

Warfarin was seen as beneficial (FORTA-B), as the overall positive impression is backed by studies in >26,000 older people, and a strong indication to treat the disease in older patients is derived from the data on this particular drug. The assumption that warfarin is beneficial in older people is based on one subgroup analysis including 616 patients comparing warfarin with a matched group of patients not taking warfarin [14]. A smaller placebo-controlled study involved 110 older patients taking warfarin but only reported efficacy data [27]. Strictly speaking, these approximately 700 patients tell us that warfarin is useful in older people, and NOACs may be similarly, or more, efficacious and safe. Thus, the placebo basis for the entire OAC data construct is very slim in the elderly.

The NOACs were seen as either beneficial (B) or in one case very beneficial (A). They provide at least one (apixaban two) large trial containing thousands of older people. Efficacy and safety parameters were looked at specifically for older people, and superiority to warfarin seemed most consistent for apixaban. Superiority was seen for all important endpoints, major and intracranial bleeding, all-cause stroke prevention and mortality. Rivaroxaban, though tested in a study with sicker patients, showed no superiority in those endpoints except for intracerebral bleeding, which however also disappeared at age >75 years. Dabigatran at both dosing intensities was superior in major bleeding and stroke prevention. Edoxaban (high intensity only) was safer but not more efficacious than warfarin. The latter three NOACs were not seen to be better than warfarin in older people, though the vote on edoxaban was closest to the A level.

Some of those observational studies or registries cited above (e.g. [20,21,22,23,24]), but not included in this structured comprehensive review, point to the importance of data from studies or even case reports not fulfilling inclusion criteria of RCTs commonly required for such reviews. This instrument thus does not cover all evidence available, though even regulatory action including ‘Dear doctor letters’ may be triggered by these observations such as for bleeding complications of dabigatran treatment in 2011 in Germany [28]. The consensus process by experts in the field may offer some compensation for this deficiency, as their experience and knowledge of the field should reflect major ‘other’ evidence; of course this introduces subjective reasoning and can only be seen as a weak remedy.

It is speculative to explain the endpoint differences for the NOACs; certainly, renal function is reduced at a high age, and this is a particular concern for any drug with predominant renal excretion and a narrow therapeutic index. In this context, dabigatran (renal excretion 80%), and possibly edoxaban (renal excretion 50%) have an age-related safety problem that may result in accumulation [29], in particular, if renal dosing is not optimal. This feature, however, cannot explain all differences observed.

In essence, the votes in this Delphi-process were mainly guided by the available endpoint data; it is an exceptional advantage of NOACs in this regard that, for atrial fibrillation, they were primarily tested in older people with average ages between 71 and 73 years in the large phase III studies. Yet, warfarin-specific data on geriatric syndromes are widely missing as none of these trials was specifically designed to meet the needs of a geriatric population, e.g. by including tests on mental function, frailty or fall risk.

Bearing this in mind, FORTA reflects only currently available data; it should be seen as a stimulus to fill the huge gaps in clinical data concerning older people, and it should be revisited if new data become available. The absence of information on side effects and contraindications of particular relevance to older populations (e.g. geriatric syndromes) in any of the reviewed SmPCs continues to be unacceptable.

5 Conclusions

All NOACs and warfarin were classified as beneficial or very beneficial in older persons (FORTA-A or -B), underlining the overall positive assessment of the risk/benefit ratio for these drugs against available evidence. Differentiations between FORTA-A and -B were limited owing to the restricted number of categories in this system, thus not reflecting distinct advantages or disadvantages in full. For other vitamin K antagonists (FORTA-C) regionally used in Europe, the lack of evidence should challenge current practice.