Impact statements

  • Machine learning algorithms play a vital role in the implementation of personalized medicine and can be used to predict the clinically relevant therapeutic outcomes with warfarin.

  • Genotyping for CYP2C9, VKORC1, and CYP4F2 polymorphisms aids in identifying patients who are likely to have poor anticoagulation control with warfarin, as well as in determining the appropriate dose.

Introduction

Warfarin, the most widely used anticoagulant drug, poses certain challenges in clinical use owing to its narrow therapeutic window and wide inter-individual variability [1,2,3]. Anticoagulation control has been identified as a predictor of the health-related quality of life for patients who use warfarin [4].

Warfarin is one of the few drugs for which the clinical utility of pharmacogenetic measures is established. Single nucleotide polymorphisms (SNPs) in the genes encoding cytochrome P450 2C9 (CYP2C9), vitamin K epoxide reductase complex 1 (VKORC1), and CYP4F2, in conjunction with non-genetic factors, explain around two-thirds of the variability in warfarin dosing [5]. Genotype-based dosing has been demonstrated to improve the prediction of the therapeutic dose for warfarin, relative to clinical algorithms [6]. We previously reported that patients with specific genotypes of CYP2C9, VKORC1, and CYP4F2 required a reduced stable weekly warfarin dose (SWWD), showed higher variability in the prothrombin time-international normalized ratio (PT-INR), and had an increased risk of bleeding [7].

A recent systematic review reported that nearly 80% of warfarin dosing algorithms were directed toward dose initiation and that most of these algorithms were developed using multiple linear regression and employed in the context of Asians/Whites [8]. Another recent review on various dosing methods revealed conflicting results due to differences in the study population, dosing regimen, and estimation of outcomes [9]. Time spent in the therapeutic range (TTR) is a vital clinical endpoint that reflects both the sub-therapeutic (leading to thrombosis) and supra-therapeutic (leading to bleeding episodes) effects of warfarin. Nevertheless, algorithms that predict TTR from clinical and genetic factors have not been widely investigated, compared with algorithms for dose prediction.

Machine learning algorithms (MLAs), which have recently emerged as a promising method to model drug responses in the field of pharmacogenetics and pharmacometrics, employ data-driven models that aim to predict outcomes after being trained using a data set [10, 11]. Machine learning algorithms were found to be useful in the context of major depressive disorders and cancer states [12, 13]. They also demonstrate great potential for implementing personalized medication therapy [14].

While several studies have evaluated machine learning for warfarin therapy using non-genetic factors, there is a dearth of research that uses genetic factors as predictors [15,16,17]. Moreover, anticoagulation status (ACS), an important clinical outcome variable, has hardly been investigated, as only one study has evaluated the non-genetic factors that contribute to it [17]. We previously conducted a pharmacogenetic study to evaluate the association of genetic polymorphisms of CYP2C9, VKORC1, and CYP4F2 with clinically significant outcomes [7]. The primary objective of the current study was to evaluate the application of the decision-tree procedure in creating tree-based classification models to predict the ACS and SWWD outcome variables. Classification and regression tree (CART), chi-square automatic interaction detector (CHAID), and C5.0 algorithms are commonly used decision tree models [18]. Further, we compared the findings with those obtained from the International Warfarin Pharmacogenetics Consortium (IWPC) dataset [19].

Aim

To identify clinical and genetic factors that predict poor ACS and SWWD using supervised MLAs.

Ethics approval

The current study was conducted as part of a warfarin pharmacogenomics research project approved by the Institutional Ethics Committee (E024-PI-11/18) on April 22, 2019.

Method

Study procedure

This cross-sectional study was carried out in the Department of Cardiology, Salmaniya Medical Complex (a tertiary care hospital), Kingdom of Bahrain, from September 2019 until November 2020. Written consent was obtained from the study participants. We adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines [20].

Patients who received warfarin for at least the previous six months were included. Age, sex, comorbidities, warfarin dosage regimen, PT-INR, and concomitant drugs were obtained for all participants. Further, congestive heart failure, hypertension, age, diabetes, stroke, sex, vascular disease (CHA2DS2-VASc) scores; hypertension, abnormal liver or renal function, stroke, bleeding, labile INRs, elderly, drugs/alcohol (HASBLED) scores; and sex, age, medical history, treatment, tobacco use, and race (SAMe-TT2R2) scores were estimated. The following concomitant drugs were considered to interact with warfarin: statins, proton pump inhibitors, carbamazepine, phenytoin, valproic acid, and amiodarone [21].

The genetic polymorphisms in CYP2C9 (rs1799853 and rs1057910), VKORC1 (rs9923231), and CYP4F2 (rs2108622) were obtained using the allele discrimination genotyping methods, as previously described [7].

IWPC dataset

The IWPC dataset was obtained from the PharmGKB website [22]. Cases in which one or more of the following variables were unavailable were excluded: age group, drug interaction, PT-INR, and VKORC1 and CYP2C9 genotypes. Smokers were also excluded, such that our population included only non-smokers.

In the IWPC data, age was coded as a categorical variable in 10-year intervals. Hence, for machine learning analyses, age was categorized as follows: young: < 40 years; middle-aged: ≥ 40 to 69 years; and elderly: ≥ 70 years.
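The recoding of decade-coded ages into the three analysis categories can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the label format (`"40 - 49"`, `"90+"`) and the function name `age_category` are assumptions about how the 10-year bins might be represented.

```python
def age_category(decade_label: str) -> str:
    """Map a 10-year age bin to the three categories used for the IWPC analysis.

    The label format ('40 - 49', '90+') is an assumption made for illustration.
    """
    # Take the lower bound of the bin: '40 - 49' -> 40, '90+' -> 90.
    lower = int(decade_label.replace("+", " ").split()[0])
    if lower < 40:
        return "young"
    if lower < 70:
        return "middle-aged"
    return "elderly"


for label in ("30 - 39", "60 - 69", "70 - 79", "90+"):
    print(label, "->", age_category(label))
```

Because only the lower bound of each bin is inspected, the mapping works only when the category cut-offs coincide with bin boundaries, as they do for the 40- and 70-year cut-offs above.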

Outcomes

The warfarin therapeutic range was defined as 2.5–3.5 for subjects who underwent mechanical valve replacement surgery. For other indications, the therapeutic range was defined as 2–3 [23].

Time spent in the therapeutic range (TTR) was obtained by the Rosendaal method [24]. Individuals with TTR ≥ 70% were categorized as having adequate ACS while those with TTR < 70% were categorized as poor ACS [25].
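The TTR calculation and the ACS categorization can be illustrated with a short sketch. It assumes the usual reading of the Rosendaal method, in which the INR is taken to change linearly between consecutive measurements, and it uses a continuous fraction of each interval rather than whole-day counts. The function names, the `(day, INR)` input format, and the example values are hypothetical.

```python
def fraction_in_range(inr_start, inr_end, low, high):
    """Fraction of a linearly interpolated INR segment lying within [low, high]."""
    if inr_start == inr_end:
        return 1.0 if low <= inr_start <= high else 0.0
    lo_v, hi_v = sorted((inr_start, inr_end))
    overlap = min(hi_v, high) - max(lo_v, low)
    return max(overlap, 0.0) / (hi_v - lo_v)


def rosendaal_ttr(measurements, low=2.0, high=3.0):
    """Percentage of time in range from (day, INR) pairs sorted by day.

    Requires at least two measurements taken on different days.
    """
    total_days = 0.0
    days_in_range = 0.0
    for (d0, inr0), (d1, inr1) in zip(measurements, measurements[1:]):
        span = d1 - d0
        total_days += span
        days_in_range += span * fraction_in_range(inr0, inr1, low, high)
    return 100.0 * days_in_range / total_days


def acs_category(ttr_percent, threshold=70.0):
    """Adequate ACS if TTR >= 70%, otherwise poor."""
    return "adequate" if ttr_percent >= threshold else "poor"


# Hypothetical INR series: (day of measurement, INR value).
series = [(0, 2.5), (10, 3.5), (20, 2.5)]
ttr = rosendaal_ttr(series)
print(ttr, acs_category(ttr))
```

For patients with mechanical valve replacement, the same functions would be called with `low=2.5, high=3.5`.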

The warfarin dose was considered stable when two consecutive PT-INR values with a gap of at least 1 week were observed in the therapeutic range from the start of the warfarin therapy [26].
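The stability criterion can be expressed as a simple scan over a patient's INR series. The sketch below is one possible reading of the definition; the function name, the `(day, INR)` input format, and the example values are assumptions.

```python
def first_stable_index(measurements, low=2.0, high=3.0, min_gap_days=7):
    """Return the index of the second of two consecutive in-range INR values
    taken at least `min_gap_days` apart, or None if the criterion is never met.

    `measurements` is a list of (day, INR) pairs sorted by day.
    """
    for i in range(1, len(measurements)):
        (d0, inr0), (d1, inr1) = measurements[i - 1], measurements[i]
        both_in_range = low <= inr0 <= high and low <= inr1 <= high
        if both_in_range and (d1 - d0) >= min_gap_days:
            return i
    return None


# Hypothetical series: stable at the third measurement (index 2).
print(first_stable_index([(0, 1.5), (7, 2.4), (14, 2.6)]))
```

A return value of `None` corresponds to patients who did not achieve stable therapeutic control and would therefore be left out of the SWWD analysis.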

The SWWD from the IWPC dataset was analyzed for comparison with the Bahraini data. TTR was not available in the IWPC dataset.

Machine learning analyses

SPSS version 28 (IBM Corp. Released 2020. IBM SPSS Statistics for Windows, Version 27.0. Armonk, NY: IBM Corp.) and SPSS Modeler version 18 were used for machine learning analysis.

The datasets were partitioned (80:20) across training and testing cohorts.
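A random 80:20 partition can be sketched as below. This is an illustration only and does not reproduce the partitioning performed in SPSS Modeler; the function name and the seed value are assumptions.

```python
import random


def split_80_20(records, train_fraction=0.8, seed=42):
    """Shuffle the records reproducibly and cut them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_80_20(range(100))
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so the same patients fall into the training and testing cohorts on every run.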

Median with interquartile ranges (IQR) was used to represent the numerical variables. The median differences of continuous variables between the training and testing cohorts were evaluated using the Mann–Whitney U test. The chi-square test was used for categorical variables.

Ages were categorized as follows: < 40 years (young); 40 to < 65 years (middle-aged); and ≥ 65 years (elderly) for the dataset associated with the Bahraini population. Due to the small numbers in various categories of single nucleotide polymorphisms (SNPs), they were considered as a single entity for the analysis of anticoagulation control. Predictors included were CYP2C9, VKORC1, and CYP4F2 genotypes, age, gender, and the presence of concomitant interacting drugs. All predictors in the study were categorical variables.

The C5.0 decision tree classification algorithm was used to predict poor ACS outcomes. CART and CHAID analyses were used to predict SWWD in the Bahraini population and the IWPC dataset. These models were selected as they yielded the lowest errors relative to the conventional regression models. Linear regression analysis was carried out with SWWD as the dependent variable and the above-mentioned predictors as the independent variables.
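To illustrate how a tree-based model chooses a nodal split, the sketch below searches categorical predictors for the "one level versus the rest" split that most reduces the sum of squared errors of the dose, which is the least-squares idea behind CART-style regression trees. It is not the SPSS Modeler implementation (which also handles pruning, stopping rules, and other details), and CHAID and C5.0 rely on different criteria (chi-square-based tests and information-based measures, respectively). The data, field names, and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(rows, predictors, target="swwd"):
    """Return (gain, predictor, level) for the 'level vs. rest' split that
    most reduces the sum of squared errors of the target."""
    parent = sse([r[target] for r in rows])
    best = None
    for name in predictors:
        for level in sorted({r[name] for r in rows}):
            left = [r[target] for r in rows if r[name] == level]
            right = [r[target] for r in rows if r[name] != level]
            if not left or not right:
                continue  # a split must send patients to both branches
            gain = parent - (sse(left) + sse(right))
            if best is None or gain > best[0]:
                best = (gain, name, level)
    return best


# Hypothetical toy data: weekly dose in mg, not taken from the study.
toy_rows = [
    {"vkorc1": "wild", "sex": "M", "swwd": 42.0},
    {"vkorc1": "wild", "sex": "F", "swwd": 40.0},
    {"vkorc1": "variant", "sex": "M", "swwd": 26.0},
    {"vkorc1": "variant", "sex": "F", "swwd": 24.0},
]
print(best_binary_split(toy_rows, ["vkorc1", "sex"]))
```

In this toy example the genotype split separates the doses far better than sex does, so it is selected as the first node; a full tree would repeat the search within each branch.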

The comparison of algorithms for SWWD was evaluated with the percentage of predicted doses within 20% of the actual dose (PPD-20%AD), as previous studies considered this threshold to be clinically significant [15].
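The PPD-20%AD metric reduces to counting predictions that fall within 20% of each patient's actual dose. A minimal sketch, with a hypothetical function name and made-up doses:

```python
def ppd_within_20(actual, predicted, tolerance=0.20):
    """Percentage of predicted doses within `tolerance` (20%) of the actual dose."""
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * a
    )
    return 100.0 * hits / len(actual)


# Hypothetical weekly doses in mg.
actual = [30.0, 40.0, 20.0, 35.0]
predicted = [33.0, 30.0, 23.0, 35.0]
print(ppd_within_20(actual, predicted))
```

Here three of the four predictions lie within 20% of the actual dose; the second prediction misses by 10 mg against an allowed 8 mg.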

Logistic regression analyses were carried out to identify the factors associated with MLA-predicted ACS, as well as with algorithm-predicted stable doses falling within 20% of the actual dose. The root mean squared error (RMSE), calculated as the square root of the mean of squared differences between the actual and algorithm-predicted doses, was estimated along with the mean absolute error (MAE). The relationship with the anticoagulation status (adequate/poor) is expressed in terms of the odds ratio (OR) with 95% confidence intervals (95% CI). The area under the receiver-operating characteristics curve (AUROC) was used to evaluate the model-predicted poor anticoagulation status, relative to the actual status. As per the sample size recommendation from the TRIPOD guidelines, a minimum of 10 participants is required per candidate predictor parameter [20]. Thus, with a total of six predictors in the present study, the minimum estimated sample size was 60.
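The error metrics and the sample size rule described above can be written out directly. The sketch below uses hypothetical function names and made-up doses; it follows the definitions in the text (MAE as the mean absolute difference, RMSE as the square root of the mean squared difference, and 10 participants per predictor).

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted doses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted doses."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def minimum_sample_size(n_predictors, per_predictor=10):
    """Rule of thumb: at least 10 participants per predictor parameter."""
    return n_predictors * per_predictor


# Hypothetical weekly doses in mg.
actual = [30.0, 40.0, 20.0]
predicted = [33.0, 36.0, 20.0]
print(mae(actual, predicted), rmse(actual, predicted), minimum_sample_size(6))
```

Because RMSE squares each difference before averaging, it is never smaller than MAE and is more sensitive to a few large dosing errors.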

Results

Demographic characteristics

The 232 study participants had a median age of 69 (IQR: 57–76) years. The other demographic characteristics have been summarized in Electronic Supplementary Table 1. The medians (IQR) for CHA2DS2-VASc, HASBLED, and SAMe-TT2R2 scores were 4 (3–4), 2 (1.75–3), and 1 (1–2), respectively.

The analysis of the group with SWWD included 218 patients; 14 patients who did not achieve stable therapeutic control were excluded.

The training (n = 173 for ACS and n = 163 subjects for SWWD) and testing cohorts were found to be similar (Table 1).

Table 1 Comparison of variables between training and test cohorts in the study participants

The IWPC dataset contributed n = 1962 subjects out of a possible n = 5700. Data on 3738 subjects were excluded for the following reasons: patients were smokers (n = 2929), or data were unavailable for the CYP2C9 genotype (n = 50), the VKORC1 1639 genotype (n = 513), the age group (n = 1), concomitant medications (n = 85), or PT-INR (n = 160). Characteristics across the IWPC training and testing cohorts were comparable (Table 2).
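The exclusion counts can be checked with simple arithmetic. The sketch below assumes that the reported categories are mutually exclusive, so that each subject is counted under one reason only; the dictionary keys are shorthand labels chosen for illustration.

```python
# Exclusion counts as reported for the IWPC dataset.
exclusions = {
    "smokers": 2929,
    "CYP2C9 genotype unavailable": 50,
    "VKORC1 genotype unavailable": 513,
    "age group unavailable": 1,
    "concomitant medications unavailable": 85,
    "PT-INR unavailable": 160,
}

total_available = 5700
total_excluded = sum(exclusions.values())
included = total_available - total_excluded

print(total_excluded, included)
```

The counts add up to 3738 exclusions, leaving 1962 subjects, which matches the reported sample size.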

Table 2 Comparison of training and testing cohorts of the IWPC dataset

Prediction of the poor anticoagulation status

The median (IQR) TTR amongst the study participants was 67.6% (54.5–77%). The TTR was less than 70% in 141 participants (60.8%). Electronic Supplementary Fig. 1 shows the decision tree from the C5.0 algorithm. The first nodal split was observed on the CYP4F2 genotype and the second on the CYP2C9 genotype, while the third, fourth, and fifth nodal splits were found on the VKORC1 genotype, age status, and the presence of interacting drugs, respectively. The CYP2C9 genotype (Node 20 in Electronic Supplementary Fig. 1) was associated with poor ACS in the sub-group with the variant CYP4F2 allele (Node 10). The variant allele of VKORC1 (Node 4) was associated with poor ACS in the sub-group with the wild-type CYP2C9 allele (Node 2). The young and middle-aged groups were associated with an increased risk of poor ACS in the sub-group with the variant VKORC1 allele, the CYP2C9 allele, and wild-type CYP4F2 (Node 6). The young and middle-aged groups were also associated with a high risk of poor ACS in the sub-group with the variant VKORC1 allele, the wild-type CYP2C9 allele, and the variant CYP4F2 genotype (Node 16). Patients who were treated with interacting drugs and had wild-type VKORC1 and CYP2C9 genotypes with the variant CYP4F2 genotype had an increased risk of poor ACS (Node 14).

Logistic regression analysis revealed VKORC1 SNPs to be significant (OR: 4.6; 95% CI: 1, 21.3) in the C5.0 algorithm for predicting poor ACS. Further, AUROC for the C5.0 algorithm was 0.53.
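For context on the AUROC value, the statistic can be computed as the probability that a randomly chosen patient with poor ACS receives a higher predicted score than a randomly chosen patient with adequate ACS, with ties counted as one half. The sketch below uses this pairwise definition; the function name, labels, and scores are hypothetical and are not taken from the study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 = poor ACS, 0 = adequate ACS.
    scores: model-predicted score (higher means more likely poor ACS).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

On this scale, 0.5 corresponds to discrimination no better than chance and 1.0 to perfect separation, so an AUROC of 0.53 indicates discrimination only marginally above chance.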

Prediction of stable weekly warfarin dose

The median (IQR) SWWD was 31.5 (24.5–42) mg. Linear regression analysis revealed that CYP4F2 and VKORC1 genotypes, in addition to age categories, were significant predictors of SWWD (Electronic Supplementary Table 2). The linear regression model explained 32.6% of the variation in SWWD in the Bahraini population. The CART analysis revealed VKORC1 polymorphisms to be the most significant predictor, followed by CYP2C9 (Fig. 1). Patients with an SNP in either of these genes required lower weekly doses, relative to those with the wild-type. An evaluation of model parameters revealed a similar MAE across the training (10.2 mg/week) and testing (9.1 mg/week) cohorts.

Fig. 1 CART analysis of SWWD. Each node represents the number of patients included for that specific variable with the percentage of patients included from the previous node. The predicted dose was the median SWWD in milligrams.

The other evaluation parameters of the predictive algorithms, which were found to be similar, are presented in Electronic Supplementary Table 3. The percentage of patients with CART-predicted doses within 20% of the actual dose was 38.2%. The corresponding RMSE was 13.6.

Multivariate logistic regression analysis of the predicted outcomes from the testing cohort revealed only the SNP in CYP4F2 to be significantly associated with a CART-predicted SWWD within 20% of the actual dose (OR: 11.2; 95% CI: 1.1, 111).

The first nodal split in the CHAID analysis occurred on VKORC1. Males with the VKORC1 C/T genotype were found to require relatively higher doses than females (Fig. 2). Females with the variant allele of CYP4F2 were found to require higher doses. Further, among patients with wild-type CYP4F2, the presence of an interacting drug was found to reduce the dosage requirement. For those with the homozygous VKORC1 T/T genotype, interacting drugs increased the warfarin dose required to achieve a stable therapeutic PT-INR. CART predicted the following variables in order of significance: the CYP2C9 genotype, presence of potentially interacting drugs, CYP4F2 genotype, sex, age, and VKORC1 genotypes. An evaluation of model parameters revealed a similar mean absolute error between the training (9.3 mg/week) and testing (9.9 mg/week) cohorts, and the other parameters were also comparable (Electronic Supplementary Table 3). The percentage of patients with CHAID-predicted doses within 20% of the actual dose was 49%, while the RMSE was 13.4. Logistic regression analysis did not reveal any significant association among the independent factors for CHAID-predicted doses within the 20% threshold.

Fig. 2 Prediction of SWWD by the CHAID algorithm. Each node represents the number of patients included for that specific variable with the percentage of patients included from the previous node. The predicted dose was the median SWWD in milligrams.

Comparison of algorithms between populations

Following the application of the CART algorithm to the IWPC dataset, VKORC1, followed by age, CYP2C9, the presence of potentially interacting drugs, and sex, were identified as predictors, in the order of their significance. The decision tree for the IWPC dataset is depicted in Electronic Supplementary Fig. 2. The first nodal split occurred on VKORC1 genotypes; the G/G genotype group required the highest stable warfarin dose, relative to the A/G and A/A genotypes. Elderly individuals with the wild-type VKORC1 genotype (G/G) were found to require lower doses, similar to those with the A/G genotype. Among the CYP2C9 genotypes, *1/*1, *1/*2, and *1/*11 were predicted to require higher warfarin doses, while *1/*3, *1/*13, *2/*2, *2/*3, and *3/*3 were predicted to require significantly lower doses across all the genotypes of VKORC1. Model parameters were comparable across the training and testing cohorts with the CART algorithm (Electronic Supplementary Table 4). The mean absolute error was 9.3 mg/week for the training and 10.5 mg/week for the testing cohorts. The percentage of patients with predicted doses within 20% of the actual dose was 43.1% with the CART model, accompanied by an RMSE of 19. Logistic regression analysis did not reveal any significant associations among the independent factors with the outcome for the IWPC dataset, as predicted by the CART model.

The CHAID algorithm also revealed the first nodal split on VKORC1 genotypes (Electronic Supplementary Fig. 3). Those with the homozygous variant VKORC1 (A/A) genotype required lower doses. Of this group, those with CYP2C9 genotypes *1/*3, *1/*13, *1/*14, *2/*2, *2/*3, and *3/*3 required the lowest doses. With regard to the other VKORC1 genotypes (A/G and G/G), elderly individuals required lower doses, as in the case of those with the CYP2C9 *1/*1 genotype. Similarly, males in the elderly age group with G/G and A/G genotypes in VKORC1 required higher stable doses. Model parameters were comparable across the training and testing cohorts with the CHAID algorithm (Electronic Supplementary Table 4). The mean absolute error was 9.3 mg/week for the training and 10.5 mg/week for the testing cohorts. The percentage of patients with predicted doses within 20% of the actual dose was 43.4% with the CHAID model. The RMSE was 19.2. Logistic regression analysis did not reveal any significant association among the independent factors for the IWPC dataset, as predicted by the CHAID model.

The MLA outputs for the prediction of SWWD in the IWPC dataset concurred with those in our population: VKORC1 was the most significant predictive factor, followed by CYP2C9, and those with the wild-type CYP2C9 genotype (*1/*1) required relatively higher stable doses of warfarin. Further, males were found to require higher stable doses, while the presence of interacting drugs was associated with lower SWWD.

Discussion

Statement of key findings

To our knowledge, the current research is the first study to evaluate the utility of MLAs in the prediction of ACS and SWWD with the use of genetic factors (in addition to non-genetic covariates) among patients receiving warfarin. CYP4F2 and CYP2C9 genotypes, age, the presence of potentially interacting drugs, and VKORC1 genotypes were identified as key predictors of ACS. For SWWD, VKORC1 and CYP2C9 genotypes, sex, CYP4F2 genotype, the presence of potentially interacting drugs, and age were found to be the key predictors. An evaluation of the classification MLAs in the IWPC dataset revealed findings similar to ours.

Comparison of findings with other studies

Previously, Liu et al. evaluated nine MLAs in their predictions of warfarin therapeutic doses using the IWPC dataset and observed that multivariate adaptive regression splines (MARS) and Bayesian additive regression trees (BART) performed well among White patients; further, the performance of support vector regression, BART, MARS, and lasso regression (LAR) was similar to that of multiple linear regression (MLR) in the Asian population; finally, MLR and LAR performed well in Black patients [15]. We observed that the CART and CHAID analyses performed well in our population in the context of the stable warfarin dose, as did the C5.0 algorithm for ACS. Further, the overall performances of the models evaluated by Liu et al. ranged between 37 and 47%, which is comparable to the present study, where the performances ranged between 38 and 49% in our population, and 44.7% for the IWPC dataset. The RMSE of the models evaluated in our population ranged between 13.4 and 13.6, which is lower than the 21.6 with the neural network, 17.3 with support vector regression, and 14.51 with MVR methods reported by Sharabiani et al. [27]. The performance of the C5.0 algorithm in predicting poor ACS was similar to the findings of Gordon et al., who identified the accuracies (AUROCs) of the stochastic gradient boosting method and recurrent neural network algorithms to be 0.6 [17]. However, the authors of that study observed that the time-varying neural network model outperformed all other models with an AUROC of 0.8 [17]. A recent study in sub-Saharan African patients revealed only slight differences between the 21 models, which also included decision trees, used to predict the stable warfarin dose [28]. Nguyen et al. evaluated predictions of stable daily warfarin doses with CART and other network algorithms, observing that the multiple linear methods and gradient boosting machine showed the best performance [29].

Our finding on the association of CYP4F2 polymorphisms with anticoagulation control and stable warfarin dose outcomes is supported by an emerging body of literature, which indicates correlations between CYP4F2 polymorphisms (A/G, A/A) and reduced warfarin doses in Korean, Caucasian, Asian, and Japanese populations [30,31,32,33]. A recent meta-analysis found that patients with CYP4F2 polymorphisms required 11% [95% confidence interval: 8–14%] higher doses than those with the wild-type genotype [34]. Nonetheless, a study on the German population revealed an improvement of only 0.5–0.7% with the inclusion of CYP4F2 SNPs [35]. The CYP4F2*3 genotype was associated with higher warfarin doses in European Americans but not African Americans [36]. In the present study, we found CYP4F2 to be a significant predictor of ACS and SWWD in the MLAs. Our finding calls for larger studies to assess the inclusion of CYP4F2 polymorphisms in the pharmacogenetic algorithms for predicting warfarin dose. Further, we found that males with VKORC1 polymorphisms required a significantly greater dose of warfarin, which aligns with the extant literature [37,38,39]. Although the exact reasons for this remain unknown, factors such as variations in the rate of gastric emptying, intestinal enzymatic expression, body water content, and hepatic metabolism may contribute to the altered pharmacokinetics of warfarin and, consequently, the dose required.

Strengths and limitations

To our knowledge, this research represents the first study that evaluated the utility of MLAs in their prediction of ACS using genetic co-variates among patients receiving warfarin. Our findings were qualitatively concordant with those from the IWPC dataset, which provided a measure of external support for the prediction of SWWD using this approach. However, the study was limited by its cross-sectional design and by the lack of data on compliance with therapy. Further, endpoints associated with supratherapeutic anticoagulation (such as bleeding episodes), dietary consumption of green leafy vegetables, serum vitamin K concentration, and mRNA expression of the identified polymorphisms could not be assessed. Moreover, the IWPC dataset did not include age as a continuous variable and instead presented age in categories with 10-year intervals, which may also have affected the findings associated with this variable.

Conclusion

Machine learning algorithms are promising tools for delineating the factors that inform decision-making in warfarin therapy. The factors identified in our population and in the IWPC dataset were similar. The study highlights the need to externally validate the MLAs in a prospective study.