Introduction

Idiosyncratic drug-induced liver injury (DILI) is a significant public health issue. Although the occurrence is relatively uncommon, once clinically significant DILI occurs, about 10% of the patients may develop life-threatening clinical outcomes, such as acute liver failure, with most requiring liver transplantation or succumbing to death within 6 months (Andrade et al. 2019; Garcia-Cortes et al. 2020). Without prompt identification and drug cessation, 5.7–18.5% of DILI cases may progress to chronic liver disease, and in rare cases, to hepatic fibrosis and cirrhosis (Medina-Caliz et al. 2016; Hayashi and Björnsson 2018).

Clinical DILI manifestations are heterogeneous. By convention, DILI is classified according to the activity of aminotransferases and alkaline phosphatases into hepatocellular (HC), cholestatic (CS) and mixed injury at the time of recognition (Aithal et al. 2011), which has diagnostic and prognostic implications (EASL 2019). Indeed, initial biochemical presentation, histologic features, and clinical outcomes considerably vary among individuals who develop DILI, even when caused by the same agent. DILI could also mimic other liver diseases, such as autoimmune hepatitis and fatty liver diseases, which makes the clinical DILI diagnosis challenging. As a result, DILI is frequently under- or misdiagnosed.

We previously proposed a concept of drug–host interplay in DILI, theorizing that DILI susceptibility and phenotype are defined by drug properties, host responses, and their interplay (Chen et al. 2015). To date, very few studies have evaluated DILI phenotypes while considering the effects of both drug properties and host factors, and their interactions. In this study, we aimed to analyze well-characterized DILI cases at the Spanish DILI Registry and the information on drug properties collected from several established knowledge databases to explore factors associated with initial biochemical presentation of the DILI cases, applying a machine learning approach. Overall goals of these analyses were: (1) to identify drug properties and host factors that are associated with biochemical liver injury types at the time of DILI recognition and (2) to develop random forest models to classify biochemical injury patterns, and explore factors (or combinations of factors) that contribute to an accurate classification of biochemical liver injury types, i.e., HC vs. CS injury. Further, utilizing the knowledge gained from the machine learning approach, we developed a prediction model for practical use to aid in future causality assessment, by providing an estimated likelihood of HC vs. CS injury based on drug properties of the causal drug and host factors. Our analysis demonstrated that both drug properties and host factors are associated with initial biochemical presentation and that they interact with each other. A simplified prediction model showed a fair performance, suggesting other host factors need to be considered in future research.

Materials and methods

Study design

A cross-sectional analysis was conducted using the data retrieved from the Spanish DILI Registry. A random decision forest approach (non-parametric, ensemble machine learning) was applied to explore factors (or combinations of the factors) that contribute to accurate classification of biochemical liver injury types. Further, a simplified model was developed to predict HC vs. CS injury at presentation, while considering drug–drug and drug–host interactions. The performance of the random forest model and of the simplified prediction model was further validated using independent, well-characterized DILI cases from the Latin American DILI Network. Detailed methods are provided below.

Study population

Among cases enrolled at the Spanish DILI registry, cases that (1) met DILI criteria according to the international consensus (Bénichou 1990; Aithal et al. 2011), (2) were adjudicated to a single drug, and (3) were scored definite, probable, or possible when applying the CIOMS/RUCAM causality assessment scale (Danan and Benichou 1993), were included in the analysis. DILI cases attributed to illegal drugs, herbal medicines, dietary supplements, biological products, or drugs with non-oral routes of administration, and cases with pre-existing liver diseases, such as viral hepatitis, cirrhosis, cholangitis, alcoholic steatohepatitis and autoimmune hepatitis, were excluded, leaving 610 cases for our analysis.

DILI cases at the Latin American DILI Registry that met the above inclusion criteria were included in our analysis as an independent validation set (N = 308). Detailed methods of both registries have been described elsewhere (Andrade et al. 2005; Bessone et al. 2016). The study protocols were approved by local ethics committees. All patients enrolled in the registries gave their written informed consent.

Case categorization based on biochemical presentation

The first set of liver enzyme measurements (alanine aminotransferase [ALT] and alkaline phosphatase [ALP]) available at or after the time of DILI recognition was used to calculate the ratio of ALT (fold-increase above the upper limit of normal, ULN) to ALP (fold-increase above ULN), i.e., the R-value (Bénichou 1990). The pattern of liver injury was classified using the R-value as HC (R ≥ 5), CS (R ≤ 2) and mixed (2 < R < 5) (Aithal et al. 2011; EASL 2019). Culprit drugs were classified according to the Anatomical Therapeutic Chemical (ATC) Classification by the World Health Organization (WHO) (World Health Organization 2018).
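As an illustration only, the R-value calculation and the three-way classification described above can be sketched in Python. The function and argument names are hypothetical and are not part of the registry software; only the formula and the cutoffs (R ≥ 5, R ≤ 2, 2 < R < 5) come from the text.

```python
def r_value(alt, alt_uln, alp, alp_uln):
    """R = (ALT / ULN of ALT) / (ALP / ULN of ALP).

    All four inputs must be positive and, within each pair,
    expressed in the same units (e.g. U/L).
    """
    if min(alt, alt_uln, alp, alp_uln) <= 0:
        raise ValueError("all inputs must be positive")
    return (alt / alt_uln) / (alp / alp_uln)


def injury_pattern(r):
    """Map an R-value to hepatocellular (HC), cholestatic (CS) or mixed."""
    if r >= 5:
        return "HC"
    if r <= 2:
        return "CS"
    return "mixed"


# Made-up values: ALT 10x ULN and ALP 2x ULN give R = 5, i.e. HC.
example = injury_pattern(r_value(alt=400, alt_uln=40, alp=240, alp_uln=120))
```

Because the R-value depends on which measurements are used, the same patient can fall into a different category if a later set of enzyme values is entered.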

Other clinical variables

Patient information on demographics, co-medications, comorbidities, and laboratory data at DILI recognition was collected from the DILI registry database. Eosinophilia was defined as an eosinophil value > 5% of white blood cells. Lymphopenia was defined as a lymphocyte count of less than 20% or less than 1.5 × 10³ cells.

Drug categorization based on biochemical injury type

To explore drug properties associated with specific biochemical injury types, we classified causal drugs implicated in the Spanish DILI Registry based on their dominant injury types. Drugs dominantly causing CS injury were arbitrarily defined as presenting CS injury in ≥ 60% cases and HC injury in ≤ 25% cases, while drugs causing HC injury were defined as presenting HC injury in ≥ 80% cases but no CS injury. Drugs implicated as causal in at least three DILI cases were included in the classification. The mixed injury was not considered in this drug classification, focused on HC vs. CS injury. Reports from other prospective DILI registries, case reports in the literature, and the information available at the LiverTox database (https://livertox.nlm.nih.gov/) were also used to assess/validate the classification of the most prevalent type of liver injury.
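The drug-level rule above can be written down compactly. The following Python sketch is illustrative and assumes a hypothetical input format (one injury label per case for a given drug); it applies only the thresholds stated in the text and does not reproduce the additional validation against external sources.

```python
from collections import Counter


def dominant_injury_type(case_patterns, min_cases=3):
    """Classify one drug from the injury labels of its cases.

    case_patterns: list of "HC", "CS" or "mixed", one entry per case.
    Returns "CS-dominant", "HC-dominant", or None when the drug has too
    few cases or meets neither definition.
    """
    n = len(case_patterns)
    if n < min_cases:
        return None
    counts = Counter(case_patterns)
    hc, cs = counts["HC"], counts["CS"]
    # Integer comparisons avoid floating-point edge cases at the cutoffs.
    if cs * 100 >= 60 * n and hc * 100 <= 25 * n:
        return "CS-dominant"  # CS in >= 60% and HC in <= 25% of cases
    if hc * 100 >= 80 * n and cs == 0:
        return "HC-dominant"  # HC in >= 80% of cases and no CS case
    return None
```

Mixed cases count toward the denominator but never toward either numerator, which matches the statement that mixed injury was not considered as a category in this classification.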

Drug properties

Drug property information was retrieved from the Liver Toxicity Knowledge Base (LTKB) database developed and maintained at the US Food and Drug Administration’s National Center for Toxicological Research (Chen et al. 2013; Hong et al. 2016). This knowledge base accumulates comprehensive drug property information on US-marketed pharmaceuticals. Information on drugs not marketed in the United States was obtained from the drug summary of product characteristics at the Spanish Medicines Agency (in Spanish, Agencia Española de Medicamentos y Productos Sanitarios, AEMPS). In the LTKB database, hybridization ratio was defined as the ratio between the number of sp3 and sp2 orbitals in the drug molecule. Heterorings were defined as the organic rings with no carbons in their main atomic substituents (heteroatoms) (e.g., sulfur or halogen atoms). Information on specific variables such as enterohepatic circulation and percentage of drug elimination in parent drug form was obtained from the DrugBank database (Wishart et al. 2006). Drug disposition was categorized according to the Biopharmaceutical Drug Disposition Classification System (BDDCS) (Benet et al. 2011; Broccatelli et al. 2012). Hepatic metabolism was classified in accordance with the study done by Lammert et al. (2010). Lipoaffinity was determined as described by Liu et al. (2001). Compound electronegativity was determined using the Pauling electronegativity scale to calculate a mean electronegativity value of all atoms for each compound. High electronegativity was defined as a mean electronegativity value ≥ 1.016 (Matsunaga et al. 2003). Bile salt export pump (BSEP) inhibition is generally reported as a drug’s IC50 value, the drug concentration required to inhibit 50% of BSEP activity (Warner et al. 2012). Drugs with BSEP IC50 < 300 μM were considered BSEP inhibitors in the current study.

Statistical analysis

Results are presented as mean ± SD or median [interquartile range] (for continuous variables) or percentage (for dichotomous or ordinal variables). We only considered the variables available in at least 70% of the drugs/cases in this study. First, univariate analyses were performed to study the associations between the observed injury types and clinical variables in the DILI cases. We also compared drug properties between drugs dominantly associated with HC and CS injury, as defined above. We used the Student’s t test, Wilcoxon Rank Sum test, analysis of variance (ANOVA) with the post hoc Tukey’s HSD test or Kruskal–Wallis test with the Mann–Whitney U test, as appropriate, for continuous variables and the Chi-square test for categorical variables. Due to the exploratory nature of this study, p values were not adjusted for multiple comparisons in the univariate analysis.

Next, a random decision forest approach was applied to explore both host and drug factors which significantly contributed to the classification of biochemical liver injury types. Mixed cases were excluded, leaving 501 cases for this analysis. We trained a random decision forest regression model to identify the best-performing decision tree to distinguish HC from CS injury, using a combination of drug/host variables. Through iterative bootstrap sampling from 70% of the original data (training set), random forest classification models were developed using R software (version 3.6.0), by which 100,000 decision trees were generated. The remaining population (30%) was used for cross-validation. All models were evaluated based on their respective p value (McNemar’s test), accuracy, and predictive values. To assess the importance of variables in the accuracy of classification, we evaluated the variables in all decision tree models using two scales, Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG). MDA is a measure of reduction in general model accuracy from permuting values in the feature, i.e., the number or proportion of observations that are incorrectly classified by removing the feature (values from the feature) in question from the model. Thus, the higher the MDA (i.e., the greater the reduction in accuracy when the variable is removed), the more important the variable is deemed for classification of the data. MDG is a measure of average gain of node purity by splits of a given variable, i.e., a measure of how well a variable can split mixed labeled nodes into pure nodes. The higher the MDG, the more important the variable is deemed in gaining node purity. The random decision forest analyses were performed using three different datasets: the entire cases, amoxicillin/clavulanate cases, and non-amoxicillin/clavulanate cases, as nearly one-fourth of the DILI cases at the Spanish DILI Registry were attributed to amoxicillin/clavulanate (23%). The best-performing model for the entire cases dataset was validated using the independent cohort from the Latin American DILI Network.
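To make the two importance measures more concrete, the following pure-Python sketch computes the building blocks behind them: the Gini impurity decrease of a single split (the quantity averaged in MDG) and the drop in accuracy after permuting one feature (the idea behind MDA). This is a simplified illustration, not the analysis code used in the study, which was run in R. A full random forest aggregates these quantities over all nodes and trees and computes MDA on out-of-bag samples. All function names and the toy classifier in the test data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity of the parent node minus the size-weighted impurity of
    the two child nodes produced by a split (goes_left: list of bools)."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_accuracy_decrease(predict, rows, labels, feature,
                                  n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature across rows.

    rows: list of dicts mapping feature name -> value.
    """
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(base - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats
```

A feature that the classifier never uses has a permutation decrease of exactly zero, whereas a feature that drives the predictions loses accuracy when shuffled; this is why a higher MDA is read as higher importance.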

After a panel of the investigators carefully vetted the above results, we developed a prediction model for practical use based on the best-performing decision tree in the entire cases dataset. We used a binary logistic regression model with the biochemical injury type as an outcome (HC vs. CS injury) using JMP Pro 14 from SAS Institute Inc., Cary, NC. The factors included in the best-performing decision tree were considered as predictors. They were further evaluated for potential drug–drug, drug–host, and host–host interactions using tabulations, and their contribution to the prediction performance (measured by the area under the ROC curve) was assessed iteratively. A final model was selected based on the maximum area under the ROC curve while considering the simplicity for broader clinical use. Significant factors (p value < 0.05 of Wald test, used to evaluate the significance of individual coefficients in the model) yet yielding a negligible contribution to the predictive performance were not included in the final model for simplicity. The developed model was also applied in the Latin American DILI Network cohort for validation.
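The area under the ROC curve used for model selection has a simple rank interpretation: it is the probability that a randomly chosen HC case receives a higher predicted probability of HC injury than a randomly chosen CS case, with ties counted as one half. A minimal Python sketch of this calculation follows; the function name and input layout are assumptions for illustration, and the study itself used JMP Pro.

```python
def roc_auc(scores, labels, positive="HC"):
    """Area under the ROC curve from pairwise comparisons.

    scores: predicted probability of the positive class, one per case.
    labels: true class of each case; anything other than `positive`
            is treated as the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred cases; a rank-based implementation would be preferable for much larger datasets.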

Results

Clinical characteristics and the associations with biochemical liver injury types

A total of 610 patients with the three types of liver injury: HC, CS and mixed (median age and interquartile range: 59 [43–70] years) were included. Overall, 52% (N = 316) were women, and the majority were native-born Spanish. Patient characteristics and clinical manifestations at DILI recognition among different injury patterns are summarized in Table 1. Patients with CS were older than patients with HC injury (p value < 0.001, median age and range: HC 56 [39–68] years vs. CS 66 [53–77] years). Women were more prevalent among patients with HC compared to patients with CS injury, albeit the difference did not reach statistical significance (54% vs. 45%, p value = 0.104). Jaundice, eosinophilia, and lymphocytopenia were more frequently observed among patients with CS and mixed injury while positive autoantibodies were more prevalent among patients with HC injury (p value = 0.021 for the three-group comparison) (Table 1).

Table 1 Demographics and clinical characteristics of 610 DILI cases classified by the type of liver injury

Patients with HC injury had a higher prevalence of drug allergy history than patients with CS injury (17% vs. 7%, p value = 0.026). Prevalence of underlying diseases at the time of event was 74% in the entire population and was significantly lower in patients with mixed type of liver injury (62%, p value = 0.012 for the three-group comparison). Vascular, endocrine, cardiac, and renal diseases were more prevalent in patients with CS injury while rheumatologic diseases were more prevalent in patients with HC injury (Table 1).

Categorization of causal drugs by dominant biochemical injury types and the association with drug properties

Among the 610 DILI cases, 155 drugs/combination drugs were identified as a cause of DILI. Of them, 91 drugs (59%) presented exclusively with HC or CS injury in this population, while 64 drugs (41%) presented with either HC or CS injury, depending on the case. Of the 91 drugs, the majority were implicated in one or two DILI cases and were thus excluded from this analysis. Only 14 of the 91 drugs were associated with 3 or more DILI cases. After supplementing with DILI cases retrieved from other resources (see the methods), six drugs, i.e., azathioprine (immunosuppressant), captopril (ACE inhibitor), chlorpromazine (antipsychotic), cloxacillin (beta-lactamase resistant penicillin), norfloxacin (fluoroquinolone), and thiamazole (i.e., methimazole, antithyroid drug) were considered as drugs dominantly causing CS injury. Nine drugs, i.e., acarbose (alpha glucosidase inhibitor), bentazepam (benzodiazepine), cyproterone (antiandrogens and estrogens), ebrotidine (H2 receptor antagonist), isoniazid (hydrazide), leflunomide (selective immunosuppressant), paracetamol (i.e., acetaminophen, analgesic), sertraline (selective serotonin reuptake inhibitor), and trovafloxacin (fluoroquinolone) were considered as drugs causing mainly HC injury.

Properties of drugs dominantly associated with HC vs. CS injury are summarized in Table 2. None of the drug properties was significantly associated with injury types, probably due to the low numbers of drugs that present dominant injury types (6 CS drugs vs. 9 HC drugs). Significant hepatic metabolism (≥ 50%) tended to be more prevalent among drugs dominantly presenting HC vs. CS injury (89% vs. 50%, p value < 0.1).

Table 2 Physicochemical, pharmacokinetic and pharmacodynamics properties of drugs causing hepatocellular injury vs. drugs causing cholestatic injury

Random decision forest analysis of drug properties and host factors in classifying HC vs. CS injury

We performed random decision forest analysis to identify factors associated with specific injury types, HC vs. CS injury. To note, mixed cases were excluded in this and the following analyses to focus on variables discriminating two distinct DILI types, HC and CS injury (N = 501). The top 18 variables deemed important by MDA and MDG in the analysis are shown in Fig. 1. The accuracy of the best-performing model in the entire cases dataset was 0.84 (95% CI 0.78, 0.88) (Fig. 2). The top-performing models showed the best yet equivalent performance, including age, duration of treatment, daily dose, lipoaffinity index, AlogP, serum half-life, vascular diseases, and hybridization ratio.
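For readers who wish to reproduce an accuracy estimate with a 95% confidence interval from a confusion matrix, the sketch below uses the Wilson score interval. The interval method used in the study is not stated, so this choice is an assumption, and the counts passed to the function are placeholders to be replaced with the user's own data.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%).

    correct: number of correctly classified cases.
    total:   number of classified cases.
    Returns (lower, upper).
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Placeholder counts, not taken from the study.
lower, upper = wilson_interval(correct=126, total=150)
```

Other interval methods (for example, the exact binomial interval) give slightly different bounds for the same counts, so a small discrepancy with published intervals is to be expected.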

Fig. 1 Importance of the variables for the classification of hepatocellular (HC) vs. cholestatic (CS) injury in the random forest models using the entire cases dataset. This figure shows the top 18 variables deemed important by Mean Decrease Accuracy (MDA) (left panel) and Mean Decrease Gini (MDG) (right panel). The two measures were computed using the entire cases dataset models in the analysis. The higher values in the measures represent the higher importance of the variables in accurate classification of the outcomes, HC vs. CS injury (see the methods)

Fig. 2 Best-performing decision tree model for classifying hepatocellular (HC) vs. cholestatic (CS) injury in the entire cases dataset selected by a random forest approach. All the identified variables are continuous variables, except for lipoaffinity (< 2, yes/no) and the presence of vascular disease (yes/no), both of which are binary variables. For continuous variables, the best cutoffs determined by the computer are shown underneath each node. Two numbers in each node show (1) the number of cases included in the node over the number of total cases (%) and (2) the fraction of HC cases in the node, ranging from 0 to 1. At the bottom, 12 nodes show the predicted probability of having hepatocellular injury (0 to 1) and the percentage of cases included in the node. Green color represents a higher probability of HC vs. CS injury (probability of HC cases > 0.5), while blue color represents a lower probability of HC vs. CS injury (probability of HC cases < 0.5)

The random decision forest models were also developed using the amoxicillin/clavulanate cases and the non-amoxicillin/clavulanate cases datasets. The accuracy of the model was 0.82 (95% CI 0.66, 0.92) for the amoxicillin/clavulanate cases and 0.83 (95% CI 0.76, 0.88) for the non-amoxicillin/clavulanate cases (models are not shown). In the model for the amoxicillin/clavulanate cases, the combination of older age (≥ 56 years) and longer latency (≥ 10 days) was associated with a higher likelihood (89%) of CS injury while younger age (< 56 years) was associated with a higher likelihood (76%) of HC injury. In the model for non-amoxicillin/clavulanate cases, factors contributing to the accurate discrimination were consistent with the model of the entire cases dataset, including age, duration of treatment, lipoaffinity index, and hybridization ratio. A combination of low lipoaffinity (< 2), shorter treatment duration (< 60 days), and low hybridization ratio was associated with a higher likelihood (73%) of CS injury, while the combination of high lipoaffinity, low hybridization ratio, and younger age was associated with a higher likelihood (96%) of HC injury.

The performance of the model, based on the best-performing decision tree, was validated using the Latin American DILI Network cohort. The application of the model in the whole Latin American cohort showed an accuracy of 0.69 (95% CI 0.62, 0.75) (N = 200) (Table 3). After excluding amoxicillin cases, the accuracy of the model remained similar, 0.72 (95% CI 0.64, 0.78) (N = 169).

Table 3 The performance of the best model in the Spanish DILI and the Latin American DILI registries, a validation cohort

Prediction model for practical use considering drug and host factors

Factors significantly contributing to the classification of HC vs. CS injury in the random decision forest models were considered in this prediction model.

The most significant host factor, age, showed a linear association with the injury types for amoxicillin/clavulanate cases, but for the rest of the DILI cases (i.e., non-amoxicillin/clavulanate cases), the age effect was apparent only after age 30 years (Online Resource 1). Thus, a continuous age variable was applied only after age 30 in the model. The two key drug properties, lipoaffinity and hybridization ratio, showed an interaction; a higher hybridization ratio (> 0.5) only increased the chance of HC injury when lipoaffinity was low (< 2). Thus, a combinatory categorical variable was created for the two drug properties. The age variable and the combinatory drug properties categories yielded an area under the ROC curve of 0.74. Drug metabolism, longer half-life, daily recommended dose, latency, treatment duration, concomitant use of cardiovascular drugs, and endocrine comorbidities showed significant associations with the biochemical injury types but did not add a statistically significant contribution to the model prediction. Thus, these variables were not included in the final model for simplicity.
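The two predictors described above can be encoded as follows. This Python sketch reflects one possible reading of the text: the age effect is modeled as a hinge term that is zero up to age 30, and the two drug properties are collapsed into three categories because hybridization ratio only matters when lipoaffinity is low. The exact coding and the fitted coefficients of the published model are not given here, so the category labels, the number of levels, and the function names are assumptions.

```python
def age_term(age_years, threshold=30.0):
    """Continuous age effect applied only above the threshold age."""
    return max(age_years - threshold, 0.0)


def drug_property_category(lipoaffinity, hybridization_ratio,
                           lipo_cutoff=2.0, hybrid_cutoff=0.5):
    """Combine lipoaffinity and hybridization ratio into one category.

    Hybridization ratio is only used when lipoaffinity is low (< 2),
    mirroring the interaction described in the text.
    """
    if lipoaffinity >= lipo_cutoff:
        return "high_lipoaffinity"
    if hybridization_ratio > hybrid_cutoff:
        return "low_lipoaffinity_high_hybridization"
    return "low_lipoaffinity_low_hybridization"


def model_inputs(age_years, lipoaffinity, hybridization_ratio):
    """Predictor values that would be passed to a logistic regression."""
    return {
        "age_over_30": age_term(age_years),
        "drug_category": drug_property_category(lipoaffinity,
                                                hybridization_ratio),
    }
```

Encoding the interaction as a single categorical variable keeps the model to two predictors, at the cost of discarding any graded effect of the two drug properties within each category.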

The model’s predictive performance was validated in the DILI cases from the Latin American DILI Network Registry. Both the age variable and the combinatory drug properties categories showed significant associations, and the area under the ROC curve was 0.68, slightly lower than the performance observed in the training set of the Spanish DILI cases.

Discussion

Liver injury presentation is one of the most elusive manifestations of idiosyncratic DILI. This study, combining comprehensive clinical data from a large database at the Spanish DILI Registry and the drug property information, demonstrates for the first time that both drug properties and host factors contribute to the initial biochemical DILI presentation, HC vs. CS injury. Our analysis also suggests drug–drug and drug–host interactions play a role in the biochemical manifestation, reiterating the importance of considering such interactions in future studies/analyses. Using two different measures of variable importance (MDA and MDG) from random decision forest, the top 18 factors contributing to the accurate discrimination of injury types were consistent, including age, duration of treatment, daily dose, lipoaffinity index, AlogP, serum half-life, vascular diseases, and hybridization ratio. This model yielded 82–84% accuracy in the original Spanish DILI cohort and 69–72% accuracy in the Latin American validation cohort. Our simplified model, developed for practical use and consisting of the patient’s age and the drug’s lipoaffinity and hybridization ratio, showed a fair performance in the testing cohort (74%) to predict HC vs. CS injury, which suggests further opportunities to improve prediction using a larger and more diverse dataset.

Among drug properties, lipoaffinity and hybridization ratio were consistently identified as significant contributors defining the initial biochemical presentation. Low lipoaffinity (< 2) was associated with a higher prevalence of CS injury, regardless of age. There was a significant drug–drug interaction between lipoaffinity and hybridization ratio; the latter was influential on the biochemical presentation only when lipoaffinity was low. Other drug factors, such as daily recommended dose, half-life, latency, and drug metabolism, also showed significant associations with the biochemical presentation, but had only a marginal effect on the prediction of HC vs. CS injury.

The most influential host factor affecting the biochemical presentation was age, which is consistent with the finding of a recent study, showing that age older than 65 years is the strongest determinant of CS injury (Weersink et al. 2020). Among amoxicillin/clavulanate cases, the prevalence of CS injury linearly increased along with aging, although among non-amoxicillin/clavulanate cases, the association was only linear after age 30. Indeed, our cohort was not optimal to investigate adolescents and young adults as they are sparsely represented in our registry. Thus, our observation of a seemingly higher likelihood of CS injury among adolescents and young adults needs to be further confirmed.

In a recent study, we observed that some drugs other than amoxicillin/clavulanate are associated with a shifting injury phenotype with aging, whilst a few other drugs show a consistent HC signature regardless of age (Weersink et al. 2020). Interestingly, our developed model supports these findings.

The consistent association of CS injury with increasing age illustrates the complexity of host factors influencing phenotypic presentation, as older patients have significantly more comorbidities and receive a higher number of drugs (Lucena et al. 2020). Not surprisingly, vascular diseases showed a significant association with CS injury. In addition, diabetes and endocrine diseases were found to be important factors in classifying patients according to the type of liver injury. Indeed, the net contribution of underlying diseases, in addition to co-medications, to CS injury remains to be elucidated and may explain the further opportunities to improve the performance of the simplified model in the prediction of this phenotype. Host factors, with special attention to age, are the cornerstone to be considered in a complex context to define the CS injury pattern. Other host factors, such as race/ethnicity, genetic/epigenetic factors, and reproductive status, may modulate individuals’ response to injury stimuli and influence the initial biochemical presentation of DILI. These factors were not assessed in this analysis and warrant consideration in future investigations.

Our simple prediction model for practical use yielded a reasonable performance in the original dataset with a slightly lower performance in the validation set, suggesting the initial biochemical presentation may not be fully predictable using a simplified model. Indeed, differences in prescription patterns, culprit drugs, the overrepresentation of female sex in the validation cohort, and other genetic, epigenetic and environmental factors might explain the differences in the performance in the two DILI cohorts. Nonetheless, initial biochemical presentation is influenced by the pattern of elevation of liver enzymes; the biochemical presentation (R-value) changes over time after the injury insult due to (1) differences in enzyme half-life (t1/2; longer for ALP than for ALT), and (2) different timing in ALT elevation vs. ALP elevation after acute liver injury, shorter for ALT (Kim et al. 2008; Lowe et al. 2020). Despite the limitations, when the model was applied to the overall population (including mixed injury), none of the cases with a computed probability of HC injury > 95% had CS injury (93% HC, 7% mixed), while in cases with a high computed probability (> 60%) of CS injury, 52% actually had CS injury (24% HC, 24% mixed) (data not shown). Considering mixed injury is intermediate, the predicted probability may have clinical implications, providing additional information (i.e., probability of HC vs. CS injury, based on drug and host factors), which can be useful in the causality assessment.

This study has several limitations. It did not include broader racial populations (mainly Caucasian); thus, whether the findings can be extrapolated to other racial/ethnic populations deserves further investigation. Furthermore, the study populations did not include a sufficient number of pediatric patients, which precluded addressing the effect of age on DILI type throughout the lifecycle. In non-amoxicillin/clavulanate cases, we observed a non-linear age effect with a higher proportion of CS cases in younger age groups (< 30 years). Thus, the biological significance of this observation remains uncertain. The impact of concomitant medication use was not thoroughly investigated in this study. Concomitant medications have been associated with the severity of DILI and the reporting frequency of liver events in large spontaneous adverse event reporting systems (Suzuki et al. 2009, 2015), suggesting that co-administered medications may contribute to DILI risk via drug–drug and drug–host interactions and may also contribute to biochemical injury patterns, which warrants further investigation.

In summary, our machine learning analysis and subsequent prediction modeling demonstrated that initial biochemical presentation at DILI recognition is associated with both drug and host factors and their interactions. The simplified model showed a fair performance, yet has some clinical implications, supplementing the information on the predicted biochemical presentation based on the patient’s age and drug properties. As discussed in the concept paper (Chen et al. 2015), DILI manifestations are determined not just by the drug but also by how the host responds to the injury insult. We believe that considering both drug and host factors in evaluating DILI risk and phenotypes, while critically assessing data independence in the analysis, opens an avenue for future DILI research and would aid in the causality assessment.

Lastly, including diverse populations and drugs in the modeling approach is the key to developing a broadly applicable model. Further international collaboration is encouraged.