Introduction

Warfarin is one of the most widely used oral anticoagulants. It competitively inhibits vitamin K epoxide reductase complex subunit 1 (VKORC1), which catalyzes the recycling of vitamin K in the liver. The inhibition of vitamin K epoxide reductase (VKOR) depletes reduced vitamin K and thereby impairs the activity of coagulation factors II, VII, IX, and X [1]. The pharmacological difficulties of warfarin treatment stem from its narrow therapeutic range and the large inter-individual differences in the dose needed to maintain its effect. Because of these features, coagulation ability, measured as the prothrombin time–international normalized ratio (PT-INR), must be monitored repeatedly to determine the adequate dose for each patient at each time point [2], which lowers the risk of adverse events such as ischemic or hemorrhagic stroke [3].

Warfarin is a racemic mixture of S- and R-warfarin. S-warfarin is approximately three to five times more potent as an anticoagulant than R-warfarin and is predominantly metabolized by cytochrome P450 2C9 (CYP2C9) [4]. Cumulative evidence indicates that genetic polymorphisms of VKORC1 and CYP2C9 are closely associated with the optimal dose of warfarin [5]. The CYP2C9 variants CYP2C9*2 (430C>T) and CYP2C9*3 (1075A>C) reduce CYP2C9 activity to 12% and 5% of the wild-type activity, respectively. As a result, the maintenance warfarin dose in subjects carrying these variants is strongly decreased [6]. By a different mechanism, VKORC1 (-1639G>A), a genetic polymorphism in the promoter region of VKORC1, decreases the expression level of VKORC1, which also leads to a decreased maintenance warfarin dose [7]. Furthermore, it has been reported that a polymorphism of cytochrome P450 4F2 (CYP4F2) affects the warfarin dose [8]. CYP4F2 regulates the accumulation of vitamin K in the liver by catalyzing the production of hydroxylated vitamin K [9]. The CYP4F2 variant CYP4F2*3 (1297C>T) decreases the enzymatic activity of CYP4F2, leading to increased hepatic vitamin K. Therefore, this polymorphism increases the requirement for warfarin [10].

So far, several algorithms for predicting the warfarin dose have been proposed to determine the optimal dose prior to warfarin administration [11, 12]. Most of these algorithms use patient characteristics such as age, weight, height, and concomitant drugs, together with the genotypes of CYP2C9, VKORC1, and/or CYP4F2 [13]. These genotype-guided warfarin dosing algorithms were considered a rational approach to optimize warfarin dosing and, potentially, to reduce adverse events. However, they are not widely used in the real world, at least partly because the coefficient values assigned to each factor differ strongly among the algorithm formulas and because the mutual validity among these algorithms has not been addressed [14].

The clinical significance of CYP2C9, VKORC1, and CYP4F2 genetic polymorphisms has been recognized in many populations of different races. The reduction in CYP2C9 activity caused by CYP2C9 polymorphisms (predominantly CYP2C9*2 and CYP2C9*3) varies significantly among races; thus, the relationship between warfarin responsiveness and the race-dependent frequency of these polymorphisms has attracted attention [15]. Among Caucasians, the allele frequencies of CYP2C9*2 and CYP2C9*3 are in the range of 8–12% and 6–10%, respectively, but they are lower in Southeast Asians. Variant allele frequencies are also higher in Caucasians than in Japanese, which may contribute, at least partially, to the racial differences [16]. The frequency of the VKORC1 (-1639 G>A) minor allele is around 90% in Asians, around 40% in Caucasians, and around 9% in African-Americans [17]. The frequency of the CYP4F2*3 minor allele is approximately 30% in Asians and Caucasians and approximately 7% in African-Americans [18]. Because of the differences between the algorithms, it is unknown whether the significance of the genetic polymorphisms differs among populations. Previously, we demonstrated the importance of the CYP2C9 and VKORC1 genetic polymorphisms in a Japanese population and created a corresponding algorithm to predict the maintenance warfarin dosage [19]. In this study, we examined the mutual validity among the algorithms by using the previously collected clinical data of the Japanese population as a test set.

Methods

The Ethical Review Committee of Osaka University approved this study (approval number, 766).

Study subjects

Clinical data of the subjects from our previous study [19] were used as registered data. Briefly, the study subjects were 125 Japanese patients on stable warfarin anticoagulant therapy who provided written informed consent to allow their samples and clinical data to be used for secondary analyses. As shown in Table 1, the patients’ age, weight, height, PT-INR value, and genotypes of CYP2C9, VKORC1, and CYP4F2 were used for this study. The warfarin maintenance dose was defined as the dose that kept the PT-INR between 1.5 and 3.0 during the last three clinic visits and is designated as the actual warfarin dose (AWD) in this study. Patients were excluded if they had hepatic or renal dysfunction or if they were receiving amiodarone, bucolome, fluconazole, miconazole, or sulfamethoxazole in addition to warfarin, because these drugs affect the metabolism of warfarin.

Table 1 Characteristics of the study subjects

Selection criteria for algorithms

As shown in Table 2, the algorithms, designated as Original and algorithms I (IWPC) to V, were selected for this study according to the following criteria: (1) The algorithms should include genetic polymorphisms of CYP2C9, VKORC1, and/or CYP4F2 as variables. (2) The articles with the algorithms should provide the patients’ information related to race and disease. (3) The algorithms should include both weight and height as their variables. (4) The algorithms should be designed to calculate the maintenance dose rather than the initial dose and should not require the initial dose as a variable. (5) The algorithms should be in articles with free access to the full text in the PubMed Central database. We excluded algorithms as candidates if they required the patients’ information on chronic kidney disease, smoking, and drinking of alcoholic beverages as variables.

Table 2 Summary of selected algorithms

The warfarin doses were calculated using the information on 125 patients and algorithms that fulfilled all selection criteria.

Statistical analyses

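The performance of each algorithm was evaluated against the AWD by the mean absolute error (MAE), the root mean square error (RMSE), and, as a relative error, the root mean square percentage error (RMSPE). These were calculated using the following formulas, in which CWD and AWD with subscript i denote the calculated and actual warfarin doses of patient i and n is the number of patients: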

$$ \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{CWD}_i-\mathrm{AWD}_i\right| $$
(1)
$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left(\mathrm{CWD}_i-\mathrm{AWD}_i\right)}^2} $$
(2)
$$ \mathrm{RMSPE}=100\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left(\frac{\mathrm{CWD}_i-\mathrm{AWD}_i}{\mathrm{AWD}_i}\right)}^2} $$
(3)
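
The three error measures in Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustration of the formulas, not the SPSS procedure used in the study; the function name `error_metrics` and the example doses are hypothetical.

```python
import math


def error_metrics(cwd, awd):
    """Return (MAE, RMSE, RMSPE) for paired calculated and actual doses (mg/day)."""
    if len(cwd) != len(awd) or not cwd:
        raise ValueError("cwd and awd must be non-empty and of equal length")
    n = len(cwd)
    # Per-patient difference, calculated minus actual, as in Eqs. (1) to (3).
    diffs = [c - a for c, a in zip(cwd, awd)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Each difference is scaled by that patient's actual dose; result in percent.
    rmspe = 100 * math.sqrt(sum((d / a) ** 2 for d, a in zip(diffs, awd)) / n)
    return mae, rmse, rmspe


# Hypothetical doses for illustration only (not study data).
awd_example = [2.0, 4.0, 5.0]
cwd_example = [3.0, 3.0, 5.0]
mae, rmse, rmspe = error_metrics(cwd_example, awd_example)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  RMSPE={rmspe:.1f}%")
```

Because the RMSPE divides each error by the patient's own AWD, the same absolute error weighs more heavily for patients on low maintenance doses.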

In addition, we examined the distribution of the difference between the AWD and the calculated warfarin dose (CWD) using Bland-Altman plots. Limits of agreement (LA) were defined as the symmetric range encompassing 95% of the data. Assuming a normal error distribution, the upper and lower limits of agreement (ULA and LLA, respectively) were calculated as the mean difference (MD) ± 1.96 times the standard deviation (S.D.) of the differences. We also calculated the percentage of patients whose CWD was within 20% of the AWD to evaluate the potential clinical value of each algorithm according to a previous report [20]. Finally, one-way ANOVA with the Bonferroni correction for multiple comparisons was performed, and p < 0.05 was considered statistically significant. IBM SPSS® Statistics Version 21 (IBM Corp.) was used for statistical analyses.
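
The limits of agreement and the 20% criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SPSS workflow: the difference is taken as CWD minus AWD, the S.D. is the sample standard deviation (n − 1 denominator), and the 20% boundary is treated as inclusive. The function names and example doses are hypothetical.

```python
import statistics


def limits_of_agreement(cwd, awd):
    """Return (MD, LLA, ULA) for the paired differences CWD - AWD."""
    diffs = [c - a for c, a in zip(cwd, awd)]
    md = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # assumption: sample S.D. (n - 1 denominator)
    return md, md - 1.96 * sd, md + 1.96 * sd


def percent_within(cwd, awd, tolerance=0.20):
    """Percentage of patients whose CWD lies within `tolerance` of their AWD."""
    # Assumption: a CWD exactly at the 20% boundary counts as within range.
    hits = sum(1 for c, a in zip(cwd, awd) if abs(c - a) <= tolerance * a)
    return 100 * hits / len(awd)


# Hypothetical doses for illustration only (not study data).
awd_example = [2.0, 4.0, 5.0, 3.0]
cwd_example = [2.2, 3.0, 5.5, 4.0]
md, lla, ula = limits_of_agreement(cwd_example, awd_example)
within_20 = percent_within(cwd_example, awd_example)
print(f"MD={md:.3f}  LLA={lla:.2f}  ULA={ula:.2f}  within 20%: {within_20:.1f}%")
```

A positive MD under this convention means that the algorithm overestimates the dose on average.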

Results

To analyze the correlations between the AWD and the CWD derived from the algorithms designated as Original (a), I (IWPC) (b), II (c), III (d), IV (e), and V (f), we created scatter plots of the AWD against the CWD, each with a line indicating perfect prediction (Fig. 1). In addition, we statistically analyzed the performance of each algorithm (Table 3). Because the Original algorithm was created from the clinical data of the study subjects, we used its errors (MAE, 0.63; RMSE, 0.84; RMSPE, 42) and its MD between the AWD and the CWD (MD, − 0.01; LLA, − 1.66; ULA, 1.63) as control values against which the other algorithms were evaluated. Interestingly, algorithms I (IWPC) and II had MAE values (I, 0.66; II, 0.67) that were the closest to that of the Original algorithm (0.63), and their scatter plots were similar to each other (Fig. 1b and c). In addition, the percentages of patients whose AWD was within 20% of the CWDs derived from algorithms I and II (I, 49.6%; II, 50.4%) were higher than those from the other algorithms. In contrast, algorithm V, which was created with an African-American population, had MAE (1.18) and RMSE (1.34) values that were higher than those of the Original algorithm. Algorithm V also had by far the highest RMSPE (99%) and MD between the AWD and the CWD (MD, + 0.98; LLA, − 0.81; ULA, + 2.77) of all algorithms, a significant difference, and its scatter plot appeared to show the weakest correlation (Fig. 1f). We noticed that the study cohort included patients with PT-INR between 1.5 and 2, which is outside the range for which algorithms I through V were developed. Therefore, we created scatter plots of the AWD against the CWD for the 51 subjects with PT-INR between 2.0 and 3.0 and obtained similar plots (Supplementary Figure 1).

Fig. 1
Scatter plot of the actual warfarin dose (AWD) vs. the calculated warfarin dose (CWD). Scatter plots were applied to examine the correlation between the AWD and the CWD derived from each algorithm. The solid line indicates the line of equivalence, which shows that the AWD and the CWD are perfectly matched. The x-axis represents the AWD (mg/day) and the y-axis represents the CWD (mg/day). a Original algorithm. b Algorithm I (IWPC). c Algorithm II. d Algorithm III. e Algorithm IV. f Algorithm V

Table 3 Performance of each algorithm

Importantly, algorithm V showed a significantly higher MD than the Original algorithm, and its range of ± 1.96 S.D. was shifted in the positive direction. To confirm visually the distribution of the differences between the AWD and the CWDs derived from the Original algorithm and algorithm V, Bland-Altman plots were created [21] (Supplementary Figure 2). In addition, the mean CWD derived from algorithm V (mean ± S.D.; 3.65 ± 0.79) was significantly higher than the mean AWD (2.67 ± 1.25) and the mean CWD derived from the Original algorithm (2.66 ± 0.93). Accordingly, the percentage of patients whose AWD was within 20% of the CWD derived from algorithm V (24.0%) was the smallest among all algorithms.
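
As a consistency check on the reported values, the mean of paired differences equals the difference of the means. Taking the difference as CWD minus AWD (the sign convention that matches the reported signs), the means above reproduce the MD of algorithm V, and its limits of agreement imply the S.D. of the differences:

$$ \mathrm{MD}_{\mathrm{V}}=\overline{\mathrm{CWD}}_{\mathrm{V}}-\overline{\mathrm{AWD}}=3.65-2.67=0.98,\qquad \mathrm{S.D.}\approx \frac{\mathrm{ULA}-\mathrm{LLA}}{2\times 1.96}=\frac{2.77-\left(-0.81\right)}{3.92}\approx 0.91 $$

The same arithmetic for the Original algorithm gives 2.66 − 2.67 = − 0.01, which matches its reported MD.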

Next, we focused on algorithms III and IV, which use the CYP4F2 genotype information to calculate the CWD. Interestingly, as shown in Table 3, both algorithms had values of MAE (III, 0.72; IV, 0.75), RMSE (III, 0.91; IV, 0.94), RMSPE (III, 62%; IV, 53%), and the MD between the AWD and the CWD (III: MD, + 0.22; LLA, − 1.51; ULA, + 1.94; IV: MD, + 0.31; LLA, − 1.43; ULA, + 2.05) that were higher than those of the Original algorithm, even though algorithm III was created using Japanese population data. These results indicate that integrating the CYP4F2 genetic information into an algorithm may not significantly improve the correlation between the AWD and the CWD.

To evaluate the mutual validity among these algorithms, we created scatter plots of the CWD derived from the Original algorithm (Original CWD) against the CWDs derived from algorithms I (IWPC) to V (Fig. 2). We also analyzed the MDs between the Original CWD and the CWD from each test algorithm. Algorithm I (IWPC) appeared to have the best correlation among all scatter plots (Fig. 2a). It also showed the smallest absolute MD (MD, − 0.20; LLA, − 0.68; ULA, + 0.27) compared with the other algorithms (II, − 0.25; III, 0.23; IV, 0.32; V, 0.99) (Table 3). To exclude the possibility that the results were influenced by the data from the subjects with PT-INR 1.5–2.0, we created scatter plots of the Original CWD against the CWDs from the other algorithms using only the subjects with PT-INR between 2 and 3 (Supplementary Figure 2) and confirmed that algorithm V tends to overestimate the dose.

Fig. 2
Scatter plot of the calculated warfarin dose (CWD) derived from each of the algorithms I (IWPC) to V vs. CWD derived from the Original algorithm. Scatter plots were applied to examine the correlation between the CWD derived from each algorithm I (IWPC) to V and the CWD derived from the Original algorithm (Original CWD). The solid line indicates the line of equivalence, which shows that the Original CWD and the CWD derived from each algorithm are perfectly matched. The x-axis designated as Original represents the Original CWD (mg/day). The y-axis represents the CWD derived from each algorithm (mg/day). a Algorithm I (IWPC). b Algorithm II. c Algorithm III. d Algorithm IV. e Algorithm V

Finally, because algorithm I (IWPC) has been applied in several retrospective studies and performed similarly to the Original algorithm, we examined whether it is the most applicable of the published algorithms to the Japanese population. Using our cohort, which was used to develop the Original algorithm, we created scatter plots (Fig. 3) and calculated the MD (shown as MD3 in Table 3) between the CWD derived from algorithm I (IWPC) and the CWDs derived from algorithms II to V. Each scatter plot for algorithms II (Fig. 3a), III (Fig. 3b), IV (Fig. 3c), and V (Fig. 3d) was similar to the corresponding plot in Fig. 2b to e. As shown by MD3 in Table 3, algorithms III, IV, and V gave higher doses than algorithm I (IWPC).

Fig. 3
Scatter plot of the calculated warfarin dose (CWD) derived from each algorithm II to V vs. CWD derived from the algorithm I (IWPC). Scatter plots were applied to examine the correlation between the CWD derived from each algorithm II to V and the CWD derived from the algorithm I (CWD from algorithm I). The solid line indicates the line of equivalence, which shows that the CWD from algorithm I and the CWD from each algorithm are perfectly matched. The x-axis designated as algorithm I represents the CWD from algorithm I (IWPC) (mg/day). The y-axis represents the CWD derived from each algorithm (mg/day). a Algorithm II. b Algorithm III. c Algorithm IV. d Algorithm V

Discussion

The FDA-approved drug label for warfarin states that CYP2C9 and VKORC1 genotype information can contribute to the prediction of the warfarin dose, and multiple algorithms have been proposed to predict the warfarin maintenance dose. The maintenance dose is influenced by genetic variants, by demographic parameters including weight, height, and age, and by environmental exposures, so algorithm performance is likely to depend on the characteristics of the study cohorts. However, the mutual validity among algorithms remains to be addressed, even though verification of the algorithms is a critical step toward precision medicine in warfarin therapy. Previously, we proposed the Original algorithm based on the clinical data of 125 Japanese patients, and in this study, we analyzed the mutual validity among the five selected algorithms using the same clinical data as a test set. Moreover, we examined whether adding the CYP4F2 genotype information improves the prediction of the warfarin dose in the Japanese population compared with using only the CYP2C9 and VKORC1 genotypes.

In the preliminary stage of this study, we found that both body height and weight are essential to obtain a good correlation between the AWD and the CWD. We probed the performance of the selected algorithms by creating scatter plots and by calculating several statistical parameters. As a result, we found that, apart from the Original algorithm, the algorithm proposed by the IWPC (algorithm I) showed the best correlation between the CWD and the AWD. Because algorithm I was based on data from Caucasian, Asian, African-American, and mixed populations and accounts for racial differences as a variable, it could be applied to the Asian population, suggesting that racial differences might influence the warfarin dosage independently of the CYP2C9 and VKORC1 genotypes. Interestingly, two patients with the VKORC1 GG genotype had remarkably higher CWDs derived from the Original algorithm than from the other algorithms, and their points deviated substantially from the line of perfect prediction in every scatter plot in Fig. 2 (the two points overlap and therefore appear as one). Because their AWDs were more than 7 mg/day, factors other than the genotypes might have affected the warfarin dose in these two patients.

In contrast to algorithm I, the CWD from algorithm V, which was derived exclusively from an African-American population, agreed less well with the AWD (RMSPE, 99%), although this algorithm includes the genotypes of VKORC1 and CYP2C9, age, and body surface area (BSA). Bland-Altman analyses showed that the difference is significant. These findings are consistent with the previously reported observation that the genetic polymorphism of VKORC1 contributes to the variability of the warfarin dose to different degrees depending on race. It could be proposed that one or more additional genetic factors reduce the sensitivity to warfarin independently of VKORC1 in the African-American population. In addition to the VKORC1 genetic polymorphism, the influence of the CYP2C9*5, *6, *8, and *11 alleles might be as significant in African-Americans as the influence of the CYP2C9*2 and *3 alleles in Caucasians or Asians [20], and algorithm V includes the genotypes of CYP2C9 *2, *3, *5, *6, *8, and *11 as variables. However, the CYP2C9 variable is coded simply as 0 for patients with the wild-type genotype and 1 for those with any mutant genotype. Considering the difference in enzyme activity among the genotypes, algorithm V might have an intrinsic limitation, beyond the racial difference, when applied to this study.

Regarding the CYP4F2 genotype, its integration into the algorithms did not significantly improve the warfarin dose prediction among our registered patients. The importance of the CYP4F2 genotype for warfarin dose prediction is controversial. CYP4F2 is involved in vitamin K metabolism, but warfarin does not alter its activity directly. Additional vitamin K metabolism-related factors might modify the influence of CYP4F2 activity on the warfarin dose. We noticed that both algorithms III and IV, which include the CYP4F2 genotype, tended to generate a slightly overestimated CWD compared with the AWD. Therefore, these algorithms might be improved by specific adjustments, depending on the population. However, further studies are required to test whether an algorithm with the CYP4F2 genotype as a variable, possibly in combination with additional vitamin K metabolism-related factors, would improve genotype-guided warfarin dosing in the real world beyond algorithms using only the CYP2C9 and VKORC1 genotypes.

Our study has other limitations in addition to the number of subjects. We did not evaluate reported algorithms that require other variables, such as smoking status, alcohol consumption, genotypes of APOE (which encodes apolipoprotein E), and other factors that can affect the warfarin dose, because we did not have those clinical data for all the registered patients. However, we believe that algorithms using fewer variables to predict dosages are more applicable in the real world.

In summary, we validated the algorithms for warfarin dosing that use the VKORC1, CYP2C9, and CYP4F2 genotypes as variables by applying the clinical data of a Japanese population as a test set. Despite the apparent differences in the formulas between the algorithms, the CWD derived from algorithms based on Asians appeared to be consistent with the AWD. The IWPC algorithm showed high accuracy for the Japanese population because it accounts for racial difference as a variable, whereas an algorithm derived exclusively from the African-American population is less useful for warfarin dose prediction in this population.

In conclusion, racial differences are critical for the pharmacogenomics-based prediction of warfarin dosing. We expect prospective clinical studies to propose and/or confirm the flowchart for warfarin therapy regimens administered as precision medicine adjusted for each racial population.