Key Points

We developed an effective computational protocol for the generation of valid physiologically-based pharmacokinetics (PBPK) models for arbitrary molecules, an important task in preclinical drug discovery.

This method utilizes ADMET Predictor to calculate pharmacokinetic parameters as inputs for PBPK modeling and simulation using the Simcyp simulator.

More than 60% of compounds have satisfactory performance utilizing this method.

1 Introduction

Pharmacokinetics is the study of the time courses of a drug administered to the body, which includes the processes of absorption, distribution, metabolism and excretion (ADME) [1]. Usually, it is essential to quantitatively measure the concentration of the drug in plasma at different time points in a pharmacokinetic study for the analysis of drug behavior and dose adjustment. In addition to clinical trials, which always involve time, cost and ethical considerations, the prediction of concentration profiles under various administration conditions can also be achieved by the implementation of physiologically based pharmacokinetic (PBPK) [2,3,4] modeling. Computational tools for both PBPK modeling and pharmacokinetic parameter prediction have been developed, further reducing the experimental expense. By virtue of such tools, quick and convenient in silico prediction of drug behavior in the human body can be easily performed without investing many resources in the experiments, informing further studies in drug toxicity, dosing strategy and potential drug-drug interactions. As such, this PBPK modeling can be particularly useful in the preclinical phase and can serve as a tool to help select drug candidates that are more likely to have desirable pharmacokinetic profiles.

In the literature, one study predicted the bioavailability (Fa%) of a structurally diverse group of drugs using theoretical descriptors and neural network modeling [5]. Another study applied a genetic algorithm to optimize the prediction model for drug Fa%, plasma protein binding and urinary excretion [6]. There are also studies predicting the Fa% of a chemical series with GastroPlus [7, 8]. Evaluation of the Fa% prediction performances from different software platforms, SimCYP and GastroPlus, has also been conducted focusing on low-solubility drugs [9]. Collectively, these studies focused on the value of Fa% and area under the curve (AUC) as the most important parameters for drugs after administration, but these parameters cannot fully explain the shape of the drug concentration-time (CT) profile. Therefore, how a drug is absorbed, distributed, metabolized and excreted in the course of time still lacks systematic prediction guidance.

This research aims to develop a pure in silico method to predict the pharmacokinetic profile of a compound efficiently, taking advantage of the available high-quality PBPK models in the Simcyp compound library and public domain. This method has potential application in selecting drug candidates with favorable pharmacokinetic profiles to enter the next stage of drug development.

2 Methods

In this study, we developed a novel method to predict the plasma concentration profile of a target compound based on PBPK models constructed using the model of a structurally similar drug that serves as the template. We utilized the SimCYP simulator (V19, Release 1; Sheffield, UK) [10] software to construct PBPK models for a target drug by only substituting the predicted ADME parameters of the target drug for those from the PBPK model of the corresponding template drug. We applied ADMET Predictor (V9.5) [11, 12], a software tool developed by Simulations Plus, Inc., to predict the ADME properties of target drugs, which include physicochemical parameters such as fraction unbound in plasma (fu) and blood-to-plasma partition ratio (B/P) and ADME input parameters such as volume of distribution (Vd), Michaelis-Menten constant (Km) and maximal metabolism rate (Vmax) of common enzymes. To better validate our constructed PBPK models as well as evaluate the performance of the two software tools, we selected 18 drugs from the SimCYP compound library (including substrates and inhibitors) as the template drugs. In total, 13 drug pairs were formed based on their structural similarity. For each pair of drugs, one serves as the template and the other serves as the target drug. For the target drug in a drug pair, we assumed that no PBPK model was available, and new PBPK models were constructed based on the PBPK model of the template drug. We tested three protocols by introducing ADME properties predicted by ADMET Predictor into the template PBPK model and evaluated the model performance using the observed pharmacokinetic profile of the target drug. The corresponding PBPK models constructed using the three protocols were called V1, V2 and V3 models.

2.1 Drug Preparation

Drugs selected for the construction of in silico PBPK models come from the built-in drug database of the SimCYP software. Simplified Molecular-Input Line-Entry System (SMILES) [13] strings of all drugs from the SimCYP built-in library, including substrates and inhibitors, were collected from the DrugBank database [14]. The SMILES strings of drugs were used not only for their structural similarity calculation on a web platform, but also as inputs for the generation of their properties using the ADMET Predictor.

2.2 Structure Similarity Calculation

Tanimoto scoring is a commonly used method to compute the fingerprint-based similarity between two compounds [15]. In this study, we applied the maximum common substructure (MCS)-based Tanimoto algorithm for the similarity calculation. The Tanimoto score (TS) is defined by the function below (Eq. 1) [16]:

$${\text{TS}}\left( {X,Y} \right) = \frac{{N_{Z} }}{{N_{X} + N_{Y} - N_{Z} }}$$
(1)

where NX and NY are the numbers of bits in fragment bit strings of the two compounds and NZ is the intersection set, i.e., the number of common substructures shared by these two compounds. TS (X, Y) ranges from 0 to 1, measuring the structural similarity between two compounds from the lowest to highest (when the two molecules are identical). TS scores were calculated using ChemMine for all combinations of drugs in the SimCYP compound database [17].
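Eq. 1 can be illustrated with a minimal Python sketch. This is not the ChemMine implementation; the function names and the representation of each compound as a set of hypothetical feature labels are assumptions made only for illustration.

```python
def tanimoto_score(n_x: int, n_y: int, n_z: int) -> float:
    """Eq. 1: TS = N_Z / (N_X + N_Y - N_Z)."""
    denominator = n_x + n_y - n_z
    if denominator <= 0:
        raise ValueError("N_X + N_Y - N_Z must be positive")
    return n_z / denominator


def tanimoto_from_features(features_x: set, features_y: set) -> float:
    """Apply Eq. 1 to two feature sets (hypothetical stand-ins for fragment bits)."""
    shared = features_x & features_y  # features common to both compounds
    return tanimoto_score(len(features_x), len(features_y), len(shared))


# Two made-up compounds sharing 4 of their features.
compound_x = {"f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"}
compound_y = {"f1", "f2", "f3", "f4", "g1", "g2"}
print(tanimoto_from_features(compound_x, compound_y))  # 4 / (8 + 6 - 4)
```

Identical feature sets give a score of 1, and sets with nothing in common give 0, matching the range described above.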

2.3 Validation of PBPK Models for Drug Templates

We first validated the PBPK models of all 18 selected drugs by utilizing their observed data from the literature. In detail, we utilized the original built-in models of those drugs in SimCYP to run the simulation. In terms of the trial design, the dose regimens, simulation time as well as population information including age, weight and health condition were the same as those reported in the clinical study of pharmacokinetics measurement. Meanwhile, the parameters of the built-in PBPK model, such as the drug’s ADME properties, remained the same for all the drugs except fluoxetine. Because fluoxetine is a racemate, we adjusted some of its ADME and pharmacokinetic parameters according to the literature so that the predicted curve fit the experimental data better [18,19,20]. The key ADME parameters predicted by ADMET Predictor for the 18 drugs are all listed in Table S1, including the details of the adjusted parameters of fluoxetine. The observed drug concentration data of each template drug were extracted from published concentration-time (CT) curves using WebPlotDigitizer [21]. The CT curves from simulations were then overlaid on the observed drug concentrations. The predicted pharmacokinetic parameters of each template drug, including the maximal concentration (CMax), time at which CMax was observed (TMax) and area under the curve (AUC), were compared to the observed ones.

2.4 Evaluation of Inherent Differences Among Software Platforms

The quality of models constructed for target drugs is not only affected by the structural similarity between the template drug and the target drug but also relies on the prediction quality of ADMET Predictor and how good the collaboration is between the software. There may be some inherent differences among different software platforms, including but not limited to the training set data and algorithms for constructing models. More importantly, the prediction accuracy of ADMET Predictor for an individual ADME parameter is unknown. Thus, we utilized parameters predicted by ADMET Predictor for the 18 drugs to simulate their pharmacokinetic profiles using SimCYP and then compared them to those predicted using SimCYP built-in parameters. Since the calculation of molecular weight (MW) must be very accurate, the reliability of this parameter from ADMET Predictor for each drug will not be evaluated (Category I). The following ADME parameters predicted by ADMET Predictor belong to Category II: B/P, Fu, the logarithm of octanol-buffer partition coefficient (log Po:w) and acid dissociation constant (pKa); ADME parameters in Category III include human jejunum effective permeability (Peff), Vd and cytochrome P450 (CYP) metabolism parameters (Km, Vmax or CLint). The prediction accuracy decreases from Category I to Category II and then to Category III. The values of these ADME parameters for 18 drugs are listed in Table S1. To investigate the different qualities of the calculated parameters, we modified the template step by step by introducing more and more ADME predicted parameters. Specifically, in substitution protocol Version 2 (V2), we replaced log Po:w, pKa, B/P and Fu values in the SimCYP drug template with the calculated results from ADMET Predictor. 
In substitution protocol Version 3 (V3), all the above-mentioned ADME parameters were replaced by values predicted by ADMET Predictor: the parameters replaced in V2 plus Peff in absorption, Vd in distribution and the CYP metabolism parameters of the template drug.

2.5 Model Construction for Target Drugs

The parameter substitution plan is the same as that for the ADMET Predictor software evaluation in Sect. 2.4. In total, three versions of PBPK models for a target drug were built by modifying the models of the template drug: (1) in Version 1 (V1), only the MW of the template drug was changed to that of the target one; (2) in Version 2 (V2), in addition to the MW, the same parameters of the template drug as in the above-mentioned Version 2 (log Po:w, pKa, B/P and Fu) were replaced by those predicted for the target drug; (3) in Version 3 (V3), in addition to MW and physicochemical properties, Peff, Vd and CYP metabolism parameters of the template were also replaced with the calculated ones for the target drug, in accordance with the above-mentioned Version 3. All the ADME properties of the target drugs were predicted by ADMET Predictor, a software tool that can predict > 140 properties based on its built-in quantitative structure-property relationship (QSPR) models [22]. Information about the experimental subjects and trial design of each target drug during simulations was derived from the corresponding clinical reports.
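The three substitution protocols can be pictured as progressively larger overwrites of a template parameter set. The sketch below is only a schematic under stated assumptions: the dictionary keys, the function name `build_target_model` and all numeric values are hypothetical placeholders, and it does not use or represent the SimCYP or ADMET Predictor interfaces.

```python
# Parameters replaced by each protocol, following Sects. 2.4 and 2.5.
V1_KEYS = ("MW",)
V2_KEYS = V1_KEYS + ("logPow", "pKa", "BP", "Fu")
V3_KEYS = V2_KEYS + ("Peff", "Vd", "CYP")
PROTOCOL_KEYS = {"V1": V1_KEYS, "V2": V2_KEYS, "V3": V3_KEYS}


def build_target_model(template: dict, predicted: dict, version: str) -> dict:
    """Copy the template parameters and overwrite those named by the protocol."""
    if version not in PROTOCOL_KEYS:
        raise ValueError(f"unknown protocol: {version}")
    model = dict(template)  # leave the template itself untouched
    for key in PROTOCOL_KEYS[version]:
        model[key] = predicted[key]
    return model


# Made-up placeholder numbers, not real drug parameters.
template = {"MW": 300.0, "logPow": 2.0, "pKa": 7.0, "BP": 1.0, "Fu": 0.10,
            "Peff": 3.0, "Vd": 1.0, "CYP": {"CYP3A4_CLint": 10.0}}
predicted = {"MW": 320.0, "logPow": 2.5, "pKa": 6.5, "BP": 0.8, "Fu": 0.05,
             "Peff": 4.0, "Vd": 2.0, "CYP": {"CYP3A4_CLint": 25.0}}

for version in ("V1", "V2", "V3"):
    model = build_target_model(template, predicted, version)
    changed = sorted(k for k in model if model[k] != template[k])
    print(version, changed)
```

Every parameter not named by the chosen protocol keeps its template value, which is why the quality of the template model matters for all three versions.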

2.6 Evaluation of Models for Target Drugs

To evaluate the performance of PBPK models with input parameters from ADMET Predictor, the simulated CT curves were overlaid on the experimental data of target drugs. To quantitatively evaluate how well the experimental and simulated curves overlaid each other, we calculated the root mean square error (RMSE) [23] of the observed and predicted concentrations at different time points. The formula for the RMSE calculation is as follows (Eq. 2):

$${\text{RMSE}} = \left[ \frac{1}{N} \sum\limits_{i = 1}^{N} \left( C_{{\text{p}}i} - C_{{\text{o}}i} \right)^{2} \right]^{\frac{1}{2}}$$
(2)

where Coi and Cpi represent the observed and predicted drug concentration at time point i. N is the number of time points (N > 1) from the extracted observed data. Specifically, in this study, to facilitate the comparison between models for different drugs with various concentration scales, we introduced normalized root mean square error (NRMSE) to evaluate the performance of PBPK models, which is calculated using the following formula (Eq. 3):

$${\text{NRMSE}} = \frac{{{\text{RMSE}}}}{{C_{{{\text{max}}}} - C_{{{\text{min}}}} }}$$
(3)

where \(C_{{{\text{max}}}}\) and \(C_{{{\text{min}}}}\) are the maximum and minimum values among the observed and predicted concentrations using all three versions of models.
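Eqs. 2 and 3 can be sketched in a few lines of Python. The function names are hypothetical, the concentrations are made-up numbers, and the sketch assumes that predicted concentrations have already been sampled at the observed time points.

```python
import math


def rmse(predicted, observed):
    """Eq. 2: root mean square error over N paired time points (N > 1)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n = len(observed)
    if n < 2:
        raise ValueError("at least two time points are required")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def nrmse(predictions_by_version, observed, version):
    """Eq. 3: RMSE divided by (Cmax - Cmin), taken over the observed data and
    the predictions of all model versions pooled together."""
    pooled = list(observed)
    for curve in predictions_by_version.values():
        pooled.extend(curve)
    span = max(pooled) - min(pooled)
    if span == 0:
        raise ValueError("concentration range is zero")
    return rmse(predictions_by_version[version], observed) / span


# Made-up concentrations at three observed time points.
observed = [0.0, 2.0, 4.0]
predictions = {"V1": [0.0, 2.0, 4.0], "V2": [1.0, 3.0, 5.0], "V3": [0.0, 1.0, 2.0]}
for version in predictions:
    print(version, round(nrmse(predictions, observed, version), 3))
```

Because the denominator is shared by the three versions of one drug pair set, their NRMSE values are directly comparable with each other.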

The flowchart of the experiment protocol is shown in Fig. 1.

Fig. 1

Flowchart of experiment protocol. pKa acid dissociation constant, log Po:w logarithm of octanol-buffer partition coefficient, B/P blood-to-plasma partition ratio, Fu fraction unbound in plasma, Vss volume of distribution at steady state, Peff human jejunum effective permeability, Km Michaelis-Menten constant, Vmax maximal metabolism rate, V1 version 1, V2 version 2, V3 version 3

3 Results

3.1 Drug Pair Selection and Validation of PBPK Models for Drug Templates

Thirteen pairs formed from the 18 drugs, each with a calculated TS ≥ 0.5, were selected for in silico PBPK modeling. Drug pairs with TS < 0.5 were not considered to be structurally similar and were excluded from this study. The calculated TS for the 13 selected pairs (Groups A–M) is listed in Table 1. Since both drugs in a pair will in turn serve as the template and target drug for cross validation, we used X-1 and X-2 to label the two drugs in the pair, respectively, where X can be A to M.

Table 1 Calculated Tanimoto coefficient between each pair of drugs

The predicted mean plasma concentration-time profiles overlaid with observed data of all 18 template drugs are shown in Fig. 2. Accordingly, Table 2 [20, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39] lists the predicted pharmacokinetic parameters (CMax, TMax, AUC) versus observed values. From Table 2, excluding the drugs with unavailable observed pharmacokinetic parameters (dextromethorphan, mephenytoin and fluoxetine), the predicted pharmacokinetic parameters of most drugs are within the standard deviation ranges of their observed values. The predicted values of CMax, TMax and AUC for theophylline are all slightly beyond the margin of error but still within two standard deviations. Overall, as shown in Fig. 2, the observed CT profiles are within the 95% confidence interval (CI) ranges (upper and lower gray dashed curves) of the simulated CT curves. Therefore, the PBPK models for the template drugs have been well validated.

Fig. 2

Predicted concentration profiles and observed data of all drugs. Prediction results for all drugs except fluoxetine are from the original SimCYP template. The result for fluoxetine is from the adjusted fluoxetine template. Upper and lower dashed gray curves represent the 95% confidence interval

Table 2 Comparison between predicted and observed pharmacokinetic profiles of all drugs

3.2 Evaluation of Inherent Differences Among Software Platforms

The predicted pharmacokinetic parameters of the 18 drug templates modified by replacing the ADME parameters with those predicted by ADMET Predictor are listed in Table 2. The CT profiles of those 18 drugs are shown in Fig. 3 (V2) and Fig. 4 (V3). In V2, most drugs exhibit satisfactory prediction results. As Fig. 3 shows, for 14 of 18 drugs most of the experimental data points lie within the predicted confidence interval. Only for triazolam, atomoxetine, simvastatin and pravastatin do nearly half or more of the data points fall outside the confidence interval, showing poor prediction performance. V3 shows that bupropion, caffeine and phenobarbital have a very good overlay between the clinical report and the predicted result from the modified drug template, with the observed data lying within the confidence interval of the predicted curve. For fluoxetine, alprazolam, quinidine and triazolam, although the predicted results do not show an excellent overlay with the experimental data, most of the clinical data points lie within the confidence interval of the prediction profiles. For lorazepam, although the observed data were all at or around the upper confidence limit of the predicted profile, the shape of the predicted curve is very similar to that of the observed pharmacokinetic profile. Unfortunately, the other drugs do not show very satisfactory prediction results, using clinical data points as reference.

Fig. 3

Predicted concentration profiles using SimCYP drug template with input parameters from ADMET Predictor (log Po:w, pKa, B/P and Fu) and observed data of all drugs. Upper and lower dashed gray curves represent the 95% confidence interval. log Po:w logarithm of octanol-buffer partition coefficient, B/P blood-to-plasma partition ratio, Fu fraction unbound in plasma, pKa acid dissociation constant

Fig. 4

Predicted concentration profiles using SimCYP drug template with input parameters from ADMET Predictor (log Po:w, pKa, B/P, Fu, Peff, Vd and CYP parameters) and observed data of all drugs. Upper and lower dashed gray curves represent the 95% confidence interval. log Po:w logarithm of octanol-buffer partition coefficient, B/P blood-to-plasma partition ratio, Fu fraction unbound in plasma, pKa acid dissociation constant, Peff human jejunum effective permeability, Vd volume of distribution, CYP cytochrome P450

To quantitatively measure the deviation of predicted concentration profiles from the experimental data, the difference between observed and predicted values was evaluated by NRMSE (Table 3). The lower the NRMSE value is, the smaller the difference between the predicted and experimental concentration profile, i.e., the better the performance of the created drug model. The average NRMSE of V2 is 0.26 compared with the average value of 0.43 for V3, showing that V2 introduces less prediction error when combining the two software platforms for prediction. For V2, although dextromethorphan has an NRMSE value as large as 0.45, this is likely caused by the deviation of the curve from the first data point; all the remaining data points are very close to the predicted curve. Fourteen of 18 drugs have NRMSE values < 0.4, and 7 of them are < 0.2, showing the satisfactory prediction and collaboration quality of the two software tools. For V3, the top three drugs, caffeine, phenobarbital and bupropion, all have very small NRMSE values, which is consistent with the fact that the simulated CT curves overlay the experimental data points well, as shown in Fig. 4. Interestingly, the NRMSE values of fluoxetine (0.41), alprazolam (0.28), quinidine (0.53) and triazolam (0.29) are quite different, even though the simulated CT curves of the four drugs are relatively satisfactory. Taken together, both the overlay of simulated CT curves with the measured CT data points and NRMSE should be used to evaluate the quality of the ADME parameters predicted by ADMET Predictor. Overall, the ADME parameters predicted by ADMET Predictor can produce satisfactory CT curves using the SimCYP simulator for about half of the tested drugs.

Table 3 Calculated normalized root mean square error (NRMSE) between predicted results by modified drug template and experimental concentration profiles of drugs

As illustrated in Figs. 3 and 4, more V2 version models (Fig. 3) have better performance than V3 version models (Fig. 4), suggesting log Po:w, pKa, B/P and Fu are more accurately predicted by ADMET Predictor than Peff, Vd and CYP parameters. Regarding a specific parameter, the prediction performance varies from one compound to another. Thus, we recommend adopting a different version of parameter substitution mainly based on the structural similarity between the template and target drugs. When the structural similarity is very high (TS > 0.9), fewer parameter substitutions are preferred, while when the structural similarity is not very high, more parameter substitutions are desirable, as the prediction errors are smaller than the differences of the parameters between the target and the template.

3.3 Predicted Concentration Profiles for the in Silico PBPK Models

The CT profiles predicted by all three versions (Versions 1, 2, and 3) of PBPK models are shown in Fig. 5. The NRMSE value is also calculated to measure the differences between observed and predicted values of three versions, respectively, which are summarized in Table 4. The table cell is marked with “*” if the NRMSE value of V1, V2 or V3 is < 0.2. In the following, we grouped all 13 drug pairs/26 drug pair sets into three groups according to their Tanimoto scores for the sake of discussion.

Fig. 5

Predicted concentration profiles of three versions and observed data of all predicted drugs

Table 4 Calculated normalized root mean square error (NRMSE) between predicted (three versions) and experimental concentration profiles of drugs in each drug pair set

Group I (TS ≤ 0.7). Six drug pairs, A–F, belong to this group. According to Table 4, the performance of the three protocols does not show an obvious pattern for Group I. V1, V2 and V3 have two (A-1 and D-1), five (A-1, A-2, D-1, D-2 and F-1) and three (B-2, C-2 and D-2) pair sets in “*” table cells, respectively. Most of those pair sets also exhibit a good overlay between experimental data points and prediction curves as shown in Fig. 5, indicating the collaboration between SimCYP and ADMET Predictor is good. For the other pair sets from A-1 to F-2, all three protocols have NRMSE values > 0.2, and the simulated CT curves do not overlay the experimental data points well. Interestingly, for the D-2 drug pair set, although the NRMSE of the V2 model is the lowest, the CT curve predicted by the V3 model fits the shape of the observed data better, as shown in Fig. 5. This phenomenon is caused by the deviation of the first data point from the predicted curve of V3, which caused its NRMSE to be larger than that of V2. When this outlier is eliminated and the NRMSE value is recalculated, V3 becomes the best for this pair set (NRMSEs are now 0.57, 0.16 and 0.06 for the V1, V2 and V3 protocols, respectively).

Group II (0.7 < TS ≤ 0.9). This group contains five drug pairs, G–K. As shown in Table 4, most drug pair sets have at least one version with NRMSE value < 0.2, except H-1 and I-2. Notably, the NRMSE value of I-2 is only 0.21, and the predicted CT curve exhibits good consistency with experimental data (Fig. 5). The failure of the H-1 model is likely caused by using problematic ADME parameters predicted by ADMET Predictor for the target drug. The “collaboration” between the two software tools should not be a problem for this drug pair since the NRMSE values of H-2 are very low for both the V2 and V3 models (0.08 and 0.02, respectively). As shown in Table 4, the V3 version models apparently outperform the V1 and V2 models for most drug pair sets, as seven out of ten V3 models have NRMSE values < 0.2, while none of the V1 models and two of the V2 models have NRMSE values < 0.2. Interestingly, for drug pair set J-2, the V2 and V3 models have highly similar performances with good prediction results as shown in Fig. 5; however, for K-2, none of the three model versions exhibits satisfactory prediction (Fig. 5), even though the NRMSE values of the V1 and V2 models are equal to or lower than the cutoff.

Group III (TS > 0.9). This group contains two drug pairs, L and M. As shown in Table 4, most models have satisfactory NRMSE values. For L-1 and L-2 drug pair sets, the predicted profiles of the V2 and V3 models are very close to the clinical data points. For drug pair M, which has a TS of 0.95, the V3 models perform poorly, while the V1 and V2 models have not only satisfactory NRMSE values but also CT curves that overlay the measured data points very well. This phenomenon may arise because the prediction error of ADMET Predictor and the error caused by the inherent difference between the two software platforms exceed the small difference in the ADME parameters between the template and target drugs. Indeed, the NRMSE values of the two drugs in drug pair M, 0.51 and 0.70, are very large (Table 4).

As shown in Fig. 5, the performance of the three parameter substitution versions varied from one drug pair to another, mainly depending on the net effect of two sources of errors: the prediction errors of ADMET Predictor and the errors of applying the template model to describe the target. For the first source of errors, more and more prediction errors are introduced from V1 to V2 and then to V3. The second source of errors is big for dissimilar drug pairs (Group I) and small for highly similar drug pairs (Group III). For a structurally dissimilar drug pair, V2 or V3 is necessary to overcome the large second source errors, even though more first source errors are introduced. On the other hand, for a structurally similar drug pair, V1 or V2 is preferable as the errors from both sources are small. More discussion on choosing proper versions of a parameter substitution scheme is provided below.

4 Discussion

In this study, we developed a novel approach to construct in silico PBPK models for target drugs lacking experimental ADME and other pharmacokinetic parameters using an established PBPK model of a structurally similar drug as the model template. We used 18 drugs, which formed 13 drug pairs (A–M) and 26 drug pair sets (each drug in a pair serves the template and target roles alternately), to evaluate three ADME parameter substitution protocols, which correspond to three versions of PBPK models. The performance of the in silico PBPK models was critically evaluated using experimental pharmacokinetic profiles and parameters.

4.1 Practical Guidance on Selecting a Suitable Drug Template

We attempted to obtain guidance on selecting a suitable template drug for a given target drug. We focused on using structural similarity to select the template drugs. It was found that drug pairs with Tanimoto score > 0.70 (Groups II and III) tended to show better prediction performance among the three versions compared with drug pairs with Tanimoto score ≤ 0.70 (Group I). Higher structural similarity between the two drugs within a drug pair is expected to increase the likelihood of good prediction results. After comparing the model performance of all three versions of models, we developed the following guidance: for Group I drug pairs, V2 or V3 is recommended; for Group II drug pairs, V3 is recommended; for Group III drug pairs, V2 is recommended. Following this practical guidance, 16 out of 26 drug pair sets have NRMSE values < 0.2, the threshold to recognize a good PBPK model. Nevertheless, the prediction accuracy of ADMET Predictor and the extent of inherent difference between it and SimCYP are also crucial factors that affect the model performance. In the evaluation of the error caused by combining the two software tools, the prediction accuracy differed among the modified drug templates, which shows that the influence of the introduced error can be very different for different drugs.
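The guidance above amounts to a simple lookup on the Tanimoto score. The following Python sketch encodes it; the function name and the choice to return an empty tuple for pairs below the 0.5 selection cutoff are assumptions made for illustration.

```python
def recommend_protocols(ts: float) -> tuple:
    """Map a Tanimoto score to the recommended substitution protocol(s)."""
    if not 0.0 <= ts <= 1.0:
        raise ValueError("Tanimoto score must lie between 0 and 1")
    if ts < 0.5:
        return ()            # not considered structurally similar in this study
    if ts <= 0.7:
        return ("V2", "V3")  # Group I
    if ts <= 0.9:
        return ("V3",)       # Group II
    return ("V2",)           # Group III


for score in (0.45, 0.65, 0.80, 0.95):
    print(score, recommend_protocols(score))

# Share of drug pair sets reported with NRMSE < 0.2 when following the guidance.
print(f"{16 / 26:.1%}")
```

The last line reproduces the reported success rate: 16 of 26 drug pair sets is about 61.5%, i.e., more than 60%.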

Additional criteria other than structural similarity between the template and target drugs may be introduced to further improve the computational protocol, since drug pairs with similar or identical TS may have different prediction accuracies, as indicated by Fig. 5. This phenomenon is more obvious for drug pair sets with low TS. For example, for drug pair D (lorazepam/midazolam), the prediction for midazolam by the V3 version of parameter substitution is much more accurate than that for lorazepam. This discrepancy may come from the failure of parameter prediction by ADMET Predictor and/or the imperfect collaboration between the two software platforms. Fortunately, this inconsistency problem becomes less severe when the drug pairs share higher structural similarities, as for drug pair sets from G to M.

4.2 Another Possible Method to Evaluate the Prediction Results of the Three Versions

Another method to evaluate the prediction results of V1, V2 and V3 is the fold error in the AUC of the three prediction versions compared to the clinical data. However, the fold error in the AUC can only show the difference between the total area under the prediction curve and the literature-reported pharmacokinetic curve without delineating the concrete shapes of the curves. In contrast, the shape of the predicted drug CT curve can be reflected by the difference between predicted and observed drug concentrations at each time point when using RMSE as an evaluation method. Furthermore, the variation of the dosages can contribute to large RMSE discrepancies among drugs. Therefore, we normalized RMSE to eliminate the influence of dosages on RMSE values. The utilization of NRMSE can help to reduce the false-positive rate.
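The point that an AUC-based fold error cannot capture curve shape can be shown with a small numerical sketch. The linear trapezoidal AUC and the symmetric fold-error definition (the larger of predicted/observed and observed/predicted) are assumptions chosen for illustration, as are the function names and the made-up concentrations.

```python
import math


def auc_trapezoid(times, concentrations):
    """Area under the curve by the linear trapezoidal rule."""
    pairs = list(zip(times, concentrations))
    return sum((t2 - t1) * (c1 + c2) / 2
               for (t1, c1), (t2, c2) in zip(pairs, pairs[1:]))


def fold_error(predicted_value, observed_value):
    """Symmetric fold error (assumed definition); 1.0 means perfect agreement."""
    return max(predicted_value / observed_value, observed_value / predicted_value)


def rmse(predicted, observed):
    """Root mean square error over paired time points."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Made-up profiles: same total exposure, but the predicted peak is one step late.
times = [0.0, 1.0, 2.0, 3.0]
observed = [0.0, 4.0, 2.0, 0.0]
predicted = [0.0, 2.0, 4.0, 0.0]

auc_obs = auc_trapezoid(times, observed)
auc_pred = auc_trapezoid(times, predicted)
print(auc_obs, auc_pred, fold_error(auc_pred, auc_obs))
print(rmse(predicted, observed))
```

Both profiles have the same AUC, so the fold error signals perfect agreement, whereas the RMSE is clearly non-zero because the peaks occur at different times.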

4.3 Perspective of Applying in Silico PBPK Modeling for Compounds Lacking Experimental ADME and Pharmacokinetic Properties

SimCYP simulator is an advanced software with well-constructed drug pharmacokinetic models in its built-in drug library, with each drug template containing comprehensive drug parameters. It can intuitively show simulated drug CT curves contributed by these parameters under different trial designs. On the other hand, ADMET Predictor can predict many pharmacokinetic parameters of an input compound based on its structural information without giving additional information. However, constructing a drug pharmacokinetics model needs full-scale pharmacokinetic parameters, and some of them cannot be predicted reasonably. Considering this, we can partially rely on the pharmacokinetic parameters of another compound which shares high structural similarity with the unknown target compound. In this study, we put forward a novel approach to build PBPK models for a target drug with a lack of measured ADME and other pharmacokinetic parameters using the PBPK model of a template drug which is structurally similar to the target drug. Also, we proposed overall guidance on selecting a suitable template drug and using its PBPK model as the model template. The success of this computational approach depends on two important factors, the availability of a high-quality PBPK model for the template compound and accuracy and consistency of the ADME and pharmacokinetic parameters predicted by ADMET Predictor software for the target drug. Thus, the performance of two software tools can greatly contribute to the experimental results of our study. As a calculator of ADMET properties for compounds, the prediction results of drug properties may not be close enough to the real state, leading to errors when constructing drug models. Additionally, not all the ADME/pharmacokinetic properties can be calculated with the current version of ADMET Predictor. 
For example, the prediction of metabolism in ADMET Predictor is limited to only five commonly used enzymes (CYP1A2, CYP2D6, CYP2C9, CYP2C19 and CYP3A4), and the prediction results of the transporters related to the drug can only be reported qualitatively rather than quantitatively. On the other hand, there are currently 70 established compounds in SimCYP’s drug libraries (including both the substrate and inhibitor libraries), and the libraries are still under development. We tested 18 compounds that shared structural similarity, and this study will be extended as more clinically validated PBPK models and related parameters for in-use drugs become available.

At the current stage, application of this proposed method to construct a PBPK model for a candidate compound may encounter some difficulties, such as failing to identify a template drug in the SimCYP library that shares high structural similarity with the target compound or several template drugs being identified in SimCYP library that have similar structural similarity with the target compound. For the first problem, a practical solution is to greatly expand the library of PBPK models, and we are now constructing high-quality PBPK models for the top-selling drugs. For the second problem, we can add additional criteria to further prioritize the templates; those criteria include but are not limited to key ADME properties (such as aqueous solubility, permeability and metabolism profile) and drug targets.

Nevertheless, we have proposed a practical approach to generate PBPK models for a compound lacking experimental ADME/pharmacokinetic properties. This model can serve as the initial version of the PBPK models for the target compound, and its performance can be improved using the measured pharmacokinetic profiles and properties in the future. The computational protocol introduced in this work can have important applications in selecting drugs to enter the drug optimization phase or drug candidates to enter preclinical studies.

5 Conclusions

In this work, we have introduced and tested a novel computational protocol to develop an in silico PBPK model for a compound lacking measured ADME/pharmacokinetic properties and pharmacokinetic profiles. The general idea is to choose a proper PBPK model as the template when the corresponding compound, the template drug, is structurally similar to the target drug. For the target drug, we calculated the ADME properties using ADMET Predictor from Simulations Plus, Inc. We have developed overall guidance for using this method to build PBPK models for an arbitrary drug. First, the structural similarity between the template and target drug is very important; thus, template drugs that have the highest structural similarity to the target drug should be considered first. Second, once the template drug is selected, the ADME parameter substitution protocol is selected based on the Tanimoto score between the target and template drugs. If TS is ≤ 0.7, the V2 or V3 protocol is recommended; if TS is > 0.7 but ≤ 0.9, the V3 protocol is suggested; if TS is > 0.9, V2 is recommended. Following this guidance, > 60% (16 out of 26) of the PBPK models have satisfactory performance. It is emphasized that this method relies greatly on the collaboration between SimCYP and ADMET Predictor as well as the prediction accuracy of ADMET Predictor. The NRMSE values of the template and target drugs can guide the selection of proper substitution protocols. If the NRMSE values are small, one can select a protocol with many ADME parameters being substituted, such as V3; however, if the NRMSE values are large, adopting the V2 or V1 protocol can minimize the error due to the poor “collaboration” between the two software platforms. Unfortunately, the NRMSE value of the target drug is unknown in practice. A tool that can predict this NRMSE parameter is thus needed to further improve this method. 
While future experimental work is definitely needed to further improve the model performance, our novel approach proposed in this work can help identify drug candidates with favorable pharmacokinetic profiles, reducing experimental cost and providing insight into drug discovery and development.