Introduction

Clearance is one of the most fundamental pharmacokinetic parameters, and its accurate in vivo prediction is necessary for compound prioritization and first-in-human estimates. However, the surprising inaccuracy in predictions from in vitro to in vivo extrapolation (IVIVE) has recently been reviewed (1,2).

The typical IVIVE process involves measuring an intrinsic clearance (CLint) in microsomes or hepatocytes and applying biological scaling factors and a model of hepatic disposition to estimate an in vivo hepatic clearance (CLH). In an attempt to eliminate the systematic error with IVIVE, groups have begun applying regression-based or empirical scaling factors (3).

When examining the widespread IVIVE error, significant interlaboratory in vitro variability has been noted (1,4,5). Variability may result from interdonor differences, although pooled lots are now commonly used to reduce lot-to-lot variation. It may also result from differences in the biological scaling factors applied, although efforts have been directed toward reaching a consensus (6,7). There could also be variation due to the use of fresh vs. cryopreserved hepatocytes; however, previous studies have not found significant differences (8,9).

When collating in vivo hepatic clearance values from intravenous studies, Stringer et al. (5) found low variability; however, upon examining in vitro hepatocyte CLint values, the authors found large coefficients of variation (CVs), which increased with increasing CLint. Nagilla et al. (4) noted the paucity and variability of in vitro literature data, explaining that CLint values should be taken from a consistent assay rather than arbitrarily chosen from different literature sources. Now that more data have been generated, we reexamine the interlaboratory variability, and search for trends with variability and physicochemical and pharmacokinetic parameters. We also examine trends in prediction accuracy for compounds with uniform in vitro values.

Methods

A total of 14 papers were examined (Table I) and overlapping values were found for 50 compounds with data generated in human hepatocytes (Supplementary Table I). All in vitro CLint values were scaled to a predicted CLint,in vivo (Eq. 1) using consistent scaling factors of 120 × 10⁶ hepatocytes/g liver and 21.4 g liver/kg body weight, and the fraction unbound in the hepatocyte incubation (fuhep) values taken from the Wood et al. (2) database:

Table I Human Hepatocyte Data Examined for this Evaluation
$$ \text{Predicted } CL_{int,\,in\ vivo} = \frac{CL_{int,\,in\ vitro}}{fu_{hep}} \cdot 120 \cdot 21.4 $$
(1)
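As an illustration of Eq. 1, the scaling can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function name is hypothetical, and the in vitro CLint is assumed to be expressed per 10⁶ cells (e.g., µl/min/10⁶ cells), which the text does not state explicitly. Under that assumption the result is in µl/min/kg and must be divided by 1000 to obtain ml/min/kg.

```python
# Scaling factors quoted in the Methods.
HEPATOCELLULARITY = 120.0  # 10^6 hepatocytes per g liver
LIVER_WEIGHT = 21.4        # g liver per kg body weight


def predicted_clint_in_vivo(clint_in_vitro, fu_hep):
    """Scale an in vitro CLint to a predicted in vivo CLint (Eq. 1).

    Assumption: clint_in_vitro is per 10^6 cells (e.g., ul/min/10^6 cells),
    so the returned value is per kg body weight in the same volume unit.
    """
    if not 0.0 < fu_hep <= 1.0:
        raise ValueError("fu_hep must be in the interval (0, 1]")
    return clint_in_vitro / fu_hep * HEPATOCELLULARITY * LIVER_WEIGHT


# Hypothetical compound: 10 ul/min/10^6 cells with fu_hep = 0.5.
scaled = predicted_clint_in_vivo(10.0, 0.5)
print(round(scaled / 1000.0, 2), "ml/min/kg")
```

Keeping the two scaling factors as named constants makes it explicit that every literature value is scaled identically, which is the point of using consistent factors across sources.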

Coefficients of variation (CV) were determined as standard deviation divided by the average.
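The CV calculation can be sketched as follows. The use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used, and the `min_n` guard simply mirrors the observation in the Results that n = 2 does not give a meaningful CV. The function name and example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values, min_n=3):
    """Return the CV in percent: sample standard deviation / mean * 100.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    if len(values) < min_n:
        raise ValueError(f"need at least {min_n} values to report a CV")
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / avg * 100.0


# Hypothetical scaled CLint values for one compound from three sources.
print(coefficient_of_variation([10.0, 20.0, 30.0]))
```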

Values for hepatic clearance (CLH,in vivo) (ml/min/kg), fraction unbound in the blood and plasma (fub, fup), and intrinsic clearance (CLint, in vivo) (ml/min/kg) were taken from Wood et al. (2). CLint, in vivo values were calculated using the well-stirred model (since the difference in bias between the well-stirred and parallel tube model, the two extremes for models of hepatic disposition, was determined to be minimal) (2).
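For readers unfamiliar with the back-calculation, the well-stirred model, CLH = QH · fub · CLint / (QH + fub · CLint), can be rearranged to CLint = CLH / (fub · (1 − CLH/QH)). The sketch below implements that rearrangement. The hepatic blood flow of 20.7 ml/min/kg is a commonly used human value and is an assumption here, as the text does not state the value used; the function name is hypothetical.

```python
ASSUMED_Q_H = 20.7  # ml/min/kg; assumed human hepatic blood flow, not from the text


def clint_in_vivo_well_stirred(cl_h, fu_b, q_h=ASSUMED_Q_H):
    """Back-calculate CLint,in vivo (ml/min/kg) from CLH with the well-stirred model.

    cl_h : observed hepatic blood clearance (ml/min/kg)
    fu_b : fraction unbound in blood
    q_h  : hepatic blood flow (ml/min/kg)
    """
    if not 0.0 < fu_b <= 1.0:
        raise ValueError("fu_b must be in the interval (0, 1]")
    if not 0.0 <= cl_h < q_h:
        raise ValueError("cl_h must be non-negative and below hepatic blood flow")
    return cl_h / (fu_b * (1.0 - cl_h / q_h))


# Hypothetical compound: CLH = 10 ml/min/kg, fub = 0.5.
print(round(clint_in_vivo_well_stirred(10.0, 0.5), 1))
```

The denominator approaches zero as CLH approaches QH, so small errors in CLH translate into large errors in CLint,in vivo for high-extraction compounds.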

The ChEMBL database (https://www.ebi.ac.uk/chembl) (21) was used to obtain values for molecular weight (MW), logP, logD, polar surface area (PSA), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds, and number of aromatic rings.

Values for the steady state volume of distribution (VDss) (l/kg) and mean residence time (MRT) (hr) were found for 45 compounds in Obach et al. (22).

Classification within the Biopharmaceuticals Drug Disposition Classification System (BDDCS) was determined using Benet et al. (23) and Hosey et al. (24).

Main metabolizing enzyme information was found for 33 compounds in El-Kattan et al. (25).

The relationship between variability and the properties was evaluated by examining the coefficient of determination, R².
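One way to obtain such a value for a single property against CV is to square the Pearson correlation coefficient, which equals the R² of a simple linear regression. The sketch below assumes that this is the quantity reported; the text does not describe the fitting procedure, and the function name and data are hypothetical.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    s_xy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = fsum((a - mean_x) ** 2 for a in x)
    s_yy = fsum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return s_xy * s_xy / (s_xx * s_yy)


# Hypothetical property values and CVs for four compounds.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]))
```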

The accuracy of predictions was determined based on whether the predicted CLint values fell within two-fold of the observed CLint values (Eq. 2).

$$ 0.5 \le \frac{\text{observed } CL_{int}}{\text{predicted } CL_{int}} \le 2 $$
(2)
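The criterion in Eq. 2 can be expressed as a small helper. This is a minimal sketch with hypothetical function names; the labels "underpredicted" and "overpredicted" follow directly from the direction of the observed/predicted ratio.

```python
def within_two_fold(observed, predicted):
    """True if observed/predicted lies in [0.5, 2], i.e., Eq. 2 is satisfied."""
    if observed <= 0 or predicted <= 0:
        raise ValueError("clearance values must be positive")
    return 0.5 <= observed / predicted <= 2.0


def classify_prediction(observed, predicted):
    """Label a single prediction relative to the two-fold criterion."""
    if within_two_fold(observed, predicted):
        return "accurate"
    # A ratio above 2 means the prediction is too low.
    return "underpredicted" if observed / predicted > 2.0 else "overpredicted"


# Hypothetical values in ml/min/kg.
print(classify_prediction(100.0, 60.0))
print(classify_prediction(100.0, 10.0))
```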

Results

Coefficients of Variation and Physicochemical Parameters

Data for 50 compounds were evaluated and each compound had values from 2 to 9 sources. Of the 50 compounds, 17 had n = 2, preventing a statistically relevant CV from being calculated. For the remaining 33 compounds, the CVs ranged from 8.53% to 99.8%. The potential for CV dependence on the number of values (n) was examined first. Pindolol, with the second lowest CV of 19.0%, had data from three sources, and triazolam, with the second highest CV of 99.4%, similarly had data from three sources. Imipramine, with n = 5, had the lowest CV of 8.53%. Therefore, a high value of n did not necessarily cause high CV values, as shown in Fig. 1a. The fold difference between the highest and lowest predictions for each compound was also examined and there did not appear to be a dependence on n (Fig. 1b).

Fig. 1 The dependence of CV (a) and the largest fold difference (b) on n.

Sixteen physicochemical and pharmacokinetic properties were examined in relation to CV (Fig. 2) and there were no direct correlations, as the highest R² value was only 0.071. The 5 largest correlations are reported in Table II. The data were then divided into a lower CV group (CV < 50%) and higher CV group (CV ≥ 50%) and median parameter values were examined (Table III). The largest relative difference was seen with fub and fup values, followed by the number of rotatable bonds. In the lower CV half, 29% of compounds had ≥7 rotatable bonds compared to 6.3% of compounds with higher CV.

Fig. 2 Trends between various physicochemical and pharmacokinetic properties and CV.

Table II Highest Correlations, R², of CV with Parameters
Table III Median parameter values for compounds with lower vs. higher CV values with (n)

BDDCS class, molecular species, and main metabolizing enzymes were also examined. BDDCS Class 1 drugs appeared to have a wider range of CV values than Class 2 drugs (Fig. 3a). When examining molecular species, neutral drugs had the highest CV values (Fig. 3b). Looking at main metabolizing enzymes, compounds metabolized by CYP3A4 appeared to have the highest CV values (Fig. 3c). For CYP3A4 substrates, 38% had a CV > 90%, while no CYP2D6, CYP1A2, CYP2C, or UGT substrates had CVs > 90%.

Fig. 3 Trends between CV and BDDCS class (a), molecular species (b), and main metabolizing enzyme (c).

Given the difference seen between BDDCS classes, the data were also split by class 1 and class 2 compounds (n = 21 and 11, respectively). Examining the same physicochemical properties with CV for both classes, there were no correlations for BDDCS class 1 compounds (every R² value was less than 0.10). For BDDCS class 2, though, there were potential trends (Fig. 4a). The number of HBA and HBD and number of aromatic rings had the largest correlations; however, the smaller number of compounds should be noted. The lack of correlation with BDDCS class 1 compounds is shown in Fig. 4b for comparison.

Fig. 4 The highest correlations of CV with physicochemical properties for BDDCS class 2 compounds (a) and the lack of correlation for BDDCS class 1 compounds (b).

Uniformity of Predictions and Physicochemical Parameters

Next the variability relating to the accuracy of predictions was examined. Accurate predictions are typically defined as predictions that fall within two fold of observed values (16,26,27). Here, if a compound had predictions all falling either within two-fold or outside two-fold, it was categorized as “uniform”. If a compound had some predictions falling within two-fold, and some falling outside two-fold, it was categorized as “non-uniform”. The same properties were then examined to determine if any drive the difference between the two categories.
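The uniform/non-uniform categorization can be sketched as below. The function name is hypothetical, and the observed value in the usage example is arbitrary; the fold errors of 3.8, 14, and 29 are the triazolam underpredictions quoted in the Discussion.

```python
def classify_uniformity(observed, predictions):
    """Categorize a compound by whether its predictions agree on accuracy.

    A prediction is accurate when observed/predicted lies within [0.5, 2].
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if observed <= 0 or any(p <= 0 for p in predictions):
        raise ValueError("clearance values must be positive")
    accurate = [0.5 <= observed / p <= 2.0 for p in predictions]
    if all(accurate):
        return "uniform, accurate"
    if not any(accurate):
        return "uniform, inaccurate"
    return "non-uniform"


# Arbitrary observed value; predictions constructed from the fold
# underpredictions reported for triazolam (3.8-, 14-, and 29-fold).
observed = 100.0
predictions = [observed / fold for fold in (3.8, 14.0, 29.0)]
print(classify_uniformity(observed, predictions))
```

This illustrates why a compound can have a very high CV and still be "uniform": all three values differ widely from one another, yet each one fails the two-fold criterion.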

Returning to the 50 compiled compounds, there were 31 uniform compounds and 19 non-uniform compounds. Of the uniform predictions, 6 (19%) were accurate predictions, and 25 (81%) were inaccurate underpredictions. The most distinct difference between the uniform and non-uniform categories was seen with CLint, in vivo. Compounds with uniform predictions typically had higher CLint,in vivo values (Fig. 5). Furthermore, 37% of non-uniform predictions had CLint, in vivo values <10 ml/min/kg compared to 10% of uniform predictions.

Fig. 5 Relationship between compounds with uniform vs. non-uniform predictions and CLint, in vivo.

Accuracy of Predictions and Physicochemical Parameters

Finally, all 31 compounds with uniform predictions were further examined. It is expected that new understandings of mechanisms will help reduce the current IVIVE underprediction, but for now, it is important to know which new compounds may yield results that will be accurate, and which may not. Here only 6 compounds had accurate predictions, limiting the power of the evaluation. Despite this, there were accuracy distinctions when considering logD, CLint, in vivo, and fup (Table IV). Of the accurate predictions, 83% had a logD of <1.0 compared to 28% of inaccurate predictions. 42% of compounds with logD of <1.0 had accurate predictions and 5.0% of compounds with logD ≥1.0 had accurate predictions. For CLint, in vivo, 31% of compounds with CLint, in vivo < 100 ml/min/kg had accurate predictions compared to 6.7% with CLint, in vivo ≥ 100. Finally, for fup, 11% of predictions with fup < 0.1 were accurate compared to 33% of predictions with fup ≥ 0.1.

Table IV Properties of Compounds with Accurate, Uniform Predictions

Discussion

Variability in the in vitro data generated and used for IVIVE can significantly affect clearance predictions. This compilation found varying reported data for 50 compounds. Of these, 33 had n ≥ 3, and CV values for the same compound were as high as 99.8%. Trends were sought in hopes of determining in the future which new compounds may yield more reliable predictions than others. However, after confirming that variability was not dependent on n, no direct trends appeared with the physicochemical properties examined.

Upon more generally splitting the compounds into low and high CV groups, though, there appeared to be marked relative differences in the median values for fub and fup and in the number of rotatable bonds. After further examining the binding values, however, an obvious trend did not appear. For fub, 35% of the low CV group had high protein binding (fu ≤ 0.05) and 31% of the high CV group also had high binding. A similar result was seen with fup, where 47% of the low CV group had high protein binding and 38% of the high CV group did also. A difference did hold for rotatable bonds: in the lower CV half, 29% of compounds had ≥7 rotatable bonds compared to 6.3% of compounds with higher CV. It has previously been shown that decreasing rotatable bond count is paralleled by increasing permeation rate (28), and here this may lead to larger variability. Wood et al. (29) previously examined the importance of the unstirred water layer (UWL) on clearance predictions with hepatocytes. Given that the UWL has been shown to reduce the apparent permeability of highly permeable compounds, the authors showed that shaking of incubations can lead to 3- to 5-fold higher CLint values (29). Perhaps the increase in variability noted with lower rotatable bond counts (and thus higher permeability) could be related to experimental differences in incubation shaking among laboratories; moving forward, this factor should be considered for new chemical entities.

Interestingly, BDDCS class 1 drugs had a larger CV range than BDDCS class 2 drugs, and neutral drugs had more variation than acidic or basic drugs. Although the number of drugs with main metabolizing enzyme information was more limited, CYP3A4 substrates had higher CV values, perhaps due to the potential for extrahepatic metabolism. When examining R² values with class 2 drugs and different properties, the numbers of HBA and HBD stood out; these have also been shown to be related to permeation rate (28,30). As more data are generated and shared, it would be useful to reevaluate these potential trends and their statistical significance with a larger sample size.

Some compounds had large CV values; however, upon further examination, no matter which value was used, the predictions would have fallen outside of two-fold of the observed value and been considered inaccurate. For instance, for triazolam, which had a CV of 99.4%, data from three sources underpredicted by 3.8-, 14-, and 29-fold. For these cases, the compounds were deemed to have “uniform” predictions. The main difference noted between uniform and non-uniform compounds was that uniform compounds had higher CLint, in vivo values. The majority of the uniform compounds were uniformly inaccurate (81%), and all of these inaccurate compounds were underpredicted. This is not unexpected given the high inaccuracy previously reported (1,2) and emphasizes the need to find a mechanistic reason for the underprediction. It has been noted that compounds with high CLint,in vivo commonly have large error (2,31,32), which explains why these compounds would have uniform inaccurate predictions. More low clearance (CLint, in vivo < 10 ml/min/kg) compounds fell in the non-uniform category, confirming the experimental challenges for low turnover compounds (5,33).

Finally, trends in accuracy for the 31 compounds with uniform predictions were examined. More or less confidence could theoretically be placed in a new compound’s extrapolation results if any trends exist, and accordingly more or fewer experiments may be needed. Of the 50 drugs examined, only 6 compounds had uniform accurate predictions, limiting the power of the evaluation. Of the accurate compounds, there were 5 accurate BDDCS class 1 and 0 accurate class 2 compounds (the 6th accurate compound was class 3), supporting the hypothesis that class 1 drugs would have more accurate predictions (1). Based on this dataset, it appears that less lipophilic, lower intrinsic clearance, and lower protein binding compounds have more accurate predictions. The intrinsic clearance finding agrees with the idea of CLint-dependent underprediction mentioned earlier, and the protein binding finding agrees with previous studies concluding that highly bound compounds have more inaccuracy (34,35). It will be useful to reevaluate these trends as more uniform, accurate data are generated for compounds.

Conclusions

This investigation highlights the interlaboratory variability in generated CLint values and the need for consistent and improved methodologies. Compounds with lower rotatable bond counts and therefore higher permeability had more variability, perhaps due to experimental differences in incubation shaking and the role of the unstirred water layer. Compounds with uniform predictions typically had higher CLint, in vivo values and uniform underpredictions, confirming a lack of mechanistic understanding with IVIVE, while compounds with non-uniform predictions typically had lower CLint, in vivo values, reaffirming the current experimental challenges for compounds falling within this clearance range. While only a limited number of uniform predictions were accurate, lipophilicity, intrinsic clearance, and protein binding may be determinants of accurate IVIVE.