The drug discovery and development process is time consuming and expensive, requiring approximately 15 years and more than two billion dollars to bring a drug to market [1]. Stage-appropriate use of models is an integral part of the drug discovery process. Early-phase drug discovery uses various in silico and in vitro models to explore potency, absorption, distribution, metabolism, and excretion (ADME) properties, and safety. As drug discovery progresses, preclinical in vivo animal models are used to estimate how a compound will behave in humans, and ultimately the compound is studied in a controlled clinical environment (clinical models) before it is approved for use in the general population.

In silico tools are one class of models employed throughout the drug discovery process in an attempt to reduce its time and cost. They have a direct impact on how drug discovery progresses and are especially useful in the early phase, when a clinical candidate is being pursued and optimized. These tools are used to design and prioritize the synthesis of compounds with desirable affinity, specificity, ADME properties, and safety, with the goal of delivering the best possible compound to test in the clinical setting.

In this chapter, we provide an overview of various in silico models and tools employed to identify and resolve ADME challenges during the process of drug discovery. Generally speaking, in silico ADME tools are classified into two major categories, structure-based and ligand-based. Each class of in silico tools is addressed in the subsequent sections.

1 Structure-Based In Silico Models

When sufficient structural information exists on the protein of interest, generally in the form of a nuclear magnetic resonance or crystallographic X-ray structure, structure-based drug design techniques are used in early-phase drug discovery. In structure-based drug design, interactions between the protein and the ligand are the focus of the study, and this is commonly referred to as rational drug design. Novel ligands can be designed de novo, meaning the interactions between a hypothetical ligand and the protein are optimized with the goal of creating a compound with high affinity and selectivity. Molecular docking can be used to orient a ligand within the active site of the protein to provide an estimate of the protein-ligand interaction. However, molecular recognition between a protein and a ligand is a complex process that does not occur in a static structure. Molecular dynamics (MD) and Monte Carlo (MC) simulations are computational techniques used to create trajectories that model the protein-ligand fluctuations and dynamics in atomic detail [2, 3].

1.1 Molecular Docking

The goal of molecular docking is to model the potential interaction between a protein and a ligand [4]. Although several docking programs exist [4,5,6,7,8,9,10,11], each docking program can be broken down into two general parts: the search function used to orient and place the ligand inside the binding pocket (binding pose generation) and the scoring function used to quantify the protein-ligand interaction and predict the binding affinity (binding affinity prediction). This chapter provides an overview of the current status of molecular docking but does not go into detail on search algorithms or scoring functions, both areas of active research.

For certain protein targets, the search algorithm may generate bioactive binding poses (root-mean-square deviation <2 Å) during the search process for 90% of compounds, but this percentage can be as low as 40% for other protein systems [12]. This is especially challenging for ADME targets that are known to bind a diverse array of compounds and are promiscuous in nature. For many ADME targets, factors such as the size of the binding pocket (relatively large and hydrophobic), the water network within the active site, and protein flexibility lead to significant challenges while utilizing molecular docking. Figure 4.1 illustrates this point on one class of ADME targets, the cytochrome P450 (CYP) family of enzymes. CYPs are estimated to be involved in the metabolism of approximately 75% of drugs currently on the market with CYP3A4 known to metabolize approximately 50% of such compounds [20]. While several publications exist on CYP3A4 docking [21,22,23,24,25,26], the abovementioned problems limit its use in early-phase drug discovery programs outside of qualitative idea generation.
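The 2 Å pose criterion mentioned above can be made concrete with a minimal sketch. It assumes that both poses are given as lists of heavy-atom coordinates in the same atom order and in the same protein reference frame (so no superposition is needed), and it ignores symmetry-equivalent atoms. The function names and the example coordinates are hypothetical and are not taken from any docking program.

```python
import math

def pose_rmsd(pose_a, pose_b):
    """Root-mean-square deviation (in Å) between two poses.

    Each pose is a list of (x, y, z) tuples with atoms in the same order.
    Assumption: both poses share the protein frame, so no alignment is done.
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(pose_a))

def reproduces_reference(pose, reference, cutoff=2.0):
    """True if the docked pose lies within the RMSD cutoff of the reference pose."""
    return pose_rmsd(pose, reference) < cutoff

# Hypothetical three-atom ligand: the docked pose is shifted 1 Å along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
print(pose_rmsd(docked_pose, reference_pose))
print(reproduces_reference(docked_pose, reference_pose))
```

Applying such a check to every pose returned by the search step, rather than only to the top-ranked one, separates failures of pose generation from failures of scoring.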

Fig. 4.1

Reproduced from Danielson et al. Potentially increasing the metabolic stability of drug candidates via computational site of metabolism prediction by CYP2C9: The utility of incorporating protein flexibility via an ensemble of structures. Eur J Med Chem 2011 Sep.;46(9):3953–63. Copyright © 2001 published by Elsevier Masson SAS. All rights reserved [13]. Examples of protein flexibility in cytochrome P450 enzymes: (a) Changes in Arg47 side-chain rotamer in P450 BM-3 depending on the bound ligand (palmitoleic acid and corresponding protein in blue, PDB-code: 1FAG [14]; N-palmitoylmethionine and corresponding protein in magenta: 1ZO9 [15]). (b) Alternative loop conformations are observed in CYP119 when different ligands are bound. Compared to the apo structure of CYP119 (F/G loop in orange: 1IO7 [16]), the F/G loop adapts distinct configurations when 4-phenylimidazole (ligand and loop in magenta: 1F4T [17]) or imidazole (blue: 1F4U [17]) is bound. (c) In CYP3A4 significant protein flexibility occurs in the F/G portion of the protein (apo: orange, 1TQN [18]; erythromycin bound: blue, 2J0D [19]) to accommodate erythromycin and part of the F–F′ loop becomes disordered. This motion causes the solvent-accessible volume of the binding site to significantly increase and can dramatically affect ligand binding. (d) CYP3A4 exhibits a protein breathing motion increasing the size of the binding pocket to accommodate two ketoconazole (ligands in magenta, protein in blue: 2V0M [19]) compounds without significant conformational changes of the helices or loop regions composing the binding pocket (apo: orange: 1TQN [18])

In instances where the docking search algorithm identifies a bioactive binding pose, current scoring functions are not accurate enough to reliably predict the binding affinity [27,28,29]. The correlation between the experimentally measured and predicted binding affinities for a series of compounds binding to the same protein target is usually weak and often influenced by the size of the ligand rather than the underlying physicochemical contributions to the binding affinity [30, 31]. Therefore, bioactive binding poses are not always ranked as the most energetically favorable (or top ranked) during the docking procedure [12]. In addition, the lack of accuracy and separation in binding affinity prediction makes it challenging to predict the binding affinities of compounds within a structure-activity relationship (SAR) series let alone in silico de novo-designed compounds. A recent review by Lill [32] discusses many of the current problems and challenges of molecular docking and goes into greater depth on techniques used to overcome such obstacles.

Post-processing is one such technique designed to overcome the problem of using simplistic scoring functions in docking and can significantly improve the successful prediction of binding affinities [33, 34]. Post-processing techniques incorporate dynamic information of the protein-ligand system after the docking process has been completed. The top-scored binding pose, or several favorably scored poses, is used as input to subsequent MD simulations. In combination with free-energy methods such as free-energy perturbation [35], thermodynamic integration [36], molecular-mechanics Poisson-Boltzmann or generalized Born surface area [37], or linear interaction energy analysis [38], a more accurate estimation of the free energy of binding is possible [33]. However, this process is relatively time consuming and, to limit computational time, requires that the bioactive binding pose be among the top-ranked binding poses, a criterion that is not always met when carrying out molecular docking studies on large and rather promiscuous ADME targets.

1.2 Molecular Dynamics

Molecular dynamics (MD) is a computational technique used to study the physical movement of atoms. The first MD simulation of a biomolecular system was performed in 1977 on bovine pancreatic trypsin inhibitor using a simplistic molecular mechanics potential to describe the properties of the system [39]. Although this simulation covered only 9.2 ps, it was a groundbreaking study that showed that integrating Newton’s equations of motion over a series of very short time steps (usually one or two femtoseconds) could transform a once static X-ray structure into a dynamic trajectory from which time-averaged properties could be calculated. Underlying any MD simulation is a physics-based force field that defines all parameters of the system. Several force fields and MD programs exist [40,41,42,43,44,45,46], and the parameters are usually defined by high-level quantum chemical calculations or empirically fit to experimental properties. In addition to the force field parameters, a potential function, or mathematical relationship, is needed to describe how the individual atoms of a system interact during the MD simulation. Most force field potentials describe the interactions between atoms in the system in terms of a five-component description of intra- and intermolecular forces. The AMBER force field potential is shown in Eq. (4.1) and consists of bonded (bond, angle, and dihedral terms) and nonbonded (van der Waals and electrostatic terms) components [42].

$$ \begin{array}{ll}
V\left(r^N\right)=\sum_{\mathrm{bonds}}K_r\left(r-r_{\mathrm{eq}}\right)^2 & \mathrm{bond\ term}\\
\qquad +\sum_{\mathrm{angles}}K_{\theta}\left(\theta-\theta_{\mathrm{eq}}\right)^2 & \mathrm{angle\ term}\\
\qquad +\sum_{\mathrm{dihedrals}}\frac{V_n}{2}\left[1+\cos\left(n\phi-\gamma\right)\right] & \mathrm{dihedral\ term}\\
\qquad +\sum_{i<j}^{\mathrm{atoms}}\left(\frac{A_{ij}}{R_{ij}^{12}}-\frac{B_{ij}}{R_{ij}^{6}}\right) & \mathrm{van\ der\ Waals\ term}\\
\qquad +\sum_{i<j}^{\mathrm{atoms}}\frac{q_i q_j}{\varepsilon R_{ij}} & \mathrm{electrostatic\ term}
\end{array} $$
(4.1)

In this type of potential, covalent (intramolecular) bonds are treated as simple Hooke’s law springs with a characteristic force constant K_r and equilibrium bond length r_eq. The angle term accounts for bond angle bending in the system, and the dihedral term represents the intrinsic torsional energy due to twisting about bonds. The van der Waals term accounts for the attractive London dispersion and the short-range repulsive nonbonded forces and is calculated with a 12-6 Lennard-Jones potential. Atomic partial charges assigned by the force field are used to calculate the nonbonded electrostatic interaction between two atoms using Coulomb’s law. Summing over all pairs, triplets, and quartets of atoms in the system, the force field potential provides an estimate of the energy of the system at a particular configuration. A more detailed description of MD and the algorithms associated with this technique can be found elsewhere in the literature [3, 41,42,43, 47,48,49].
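As a minimal sketch of how the five terms described above are evaluated, the functions below compute each contribution for a single bond, angle, dihedral, or atom pair. The dihedral term is written in the standard AMBER form, V_n/2 [1 + cos(nφ − γ)]. All function names and parameter values are illustrative assumptions, not actual AMBER parameters, and unit conversion constants (for example, for the electrostatic term) are omitted.

```python
import math

def bond_energy(r, k_r, r_eq):
    """Harmonic (Hooke's law) bond stretching term."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, k_theta, theta_eq):
    """Harmonic bond angle bending term (angles in radians)."""
    return k_theta * (theta - theta_eq) ** 2

def dihedral_energy(phi, v_n, n, gamma):
    """Periodic torsional term with barrier V_n, periodicity n, and phase gamma."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def lennard_jones_energy(r_ij, a_ij, b_ij):
    """12-6 Lennard-Jones term: repulsive A/R^12 minus attractive B/R^6."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6

def coulomb_energy(r_ij, q_i, q_j, epsilon=1.0):
    """Coulomb interaction between two partial charges (no unit conversion)."""
    return (q_i * q_j) / (epsilon * r_ij)

# Illustrative values only: one term of each type, summed as in Eq. (4.1).
total = (
    bond_energy(r=1.10, k_r=340.0, r_eq=1.09)
    + angle_energy(theta=math.radians(111.0), k_theta=50.0, theta_eq=math.radians(109.5))
    + dihedral_energy(phi=math.radians(60.0), v_n=1.4, n=3, gamma=0.0)
    + lennard_jones_energy(r_ij=3.8, a_ij=1.0e6, b_ij=6.0e2)
    + coulomb_energy(r_ij=3.8, q_i=0.25, q_j=-0.40)
)
print(total)
```

In a real force field, the full energy is obtained by summing these functions over every bond, angle, dihedral, and nonbonded atom pair in the system, which is what makes the nonbonded terms the dominant computational cost.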

Currently, MD simulations are performed on macromolecular systems comprised of thousands of atoms, and several different explicit and implicit water models exist to solvate the system [47,48,49,50,51,52,53]. The nanosecond time scale is routinely reached in MD simulations, and in specialized instances protein systems have even been simulated up to the millisecond time scale [54, 55]. With increasing computer power and advances in technologies and methods, millisecond time scale simulations may become routine in the near future. However, this also brings with it additional challenges such as storing, analyzing, and interpreting such a vast array of data. Despite these challenges, MD simulations are routinely used to turn a static X-ray crystallographic structure into a dynamic system. Snapshots taken from the MD simulation provide some estimate of protein flexibility and can be used as alternative templates for molecular docking, and this technique has been utilized in several CYP isoforms [13, 56,57,58,59,60,61]. While MD simulations have become routine in the computational chemistry field, their application in early-phase drug discovery has not. This is especially true for ADME targets due to the very limited number of high-resolution X-ray crystallographic structures and the promiscuous nature of these targets. Additionally, the time- and resource-intensive nature of MD simulations and the rather fast-paced movement of chemistry SAR on project teams further limit the application of MD simulations during this phase.

2 Ligand-Based In Silico Models and Tools

2.1 Quantitative Structure-Property Relationship (QSPR) Models

Quantitative structure-activity relationship (QSAR) models are one of the commonly employed ligand-based techniques to predict the activity of compounds. The field of modern QSAR can be traced back more than 50 years to a model produced by Hansch [62]. QSAR sophistication has grown from its early application on a small congeneric series of compounds using simple linear regression to now being applied to data sets comprised of thousands of diverse compounds utilizing a wide variety of statistical and machine learning algorithms.

When such models are used to predict various properties, including ADME endpoints, they are referred to as quantitative structure-property relationship (QSPR) models. Given the promiscuity and limited structural knowledge of ADME targets, QSPR models are commonly used in the pharmaceutical industry to address ADME-related challenges. The basic premise of QSPR methodology is to develop a relationship between an observed property and structural features of a compound. Considering a set of compounds with observed experimental data (training set), a model is developed that can be used to predict the activity of other compounds (test set) not included in the initial training set. Compounds are represented using a variety of molecular descriptors that describe the chemical structure and properties of the compound. A relationship between the molecular descriptors and the observed response is computed using mathematical techniques such as linear regression, artificial neural network, support vector machine (SVM), and random forest (RF). A general description of such algorithms is summarized in Sect. 4.2.1.4. Figure 4.2 illustrates the general process of building and applying QSPR models to a group of compounds, and each step of the process is further explained below.

Fig. 4.2

Schematic representation of key components when building and applying QSPR models. The top section shows the generalized equation representing a typical QSPR model and lists key components required to derive such an equation for a given data set. The bottom section depicts a typical workflow used to build and use a QSPR model

2.1.1 Data Set Selection and Curation

The first step in creating any QSPR model is the selection of the data set on which the model will be built. A key consideration when choosing a data set is that the data should be accurate, reliable, reproducible, and measured using identical experimental conditions for all compounds. This can be a significant challenge when building QSPR models based on public databases compiled by collating data from multiple labs spanning a variety of experimental protocols. Stouch et al. demonstrated that models based on data sourced from multiple labs showed poor predictive capabilities for compounds tested in a rigorous and consistent manner [63]. For example, in the case of a hERG inhibition model provided by an external vendor, the data were collated from several different laboratories using a variety of assay conditions: different cell types expressing the hERG channel and different activation potentials for the channel, along with a combination of binding and inhibition data. The predictions from the vendor model had a poor correlation coefficient of 0.01 and a high root-mean-square error (RMSE) of 1.3 log units for the test set evaluated by the authors.

Following the selection of data, the importance of data curation cannot be overemphasized. In order to create the best possible QSPR model, it is critical to minimize the inclusion of potentially erroneous data. The potential sources of erroneous data include false positives/false negatives, under-/overestimated responses, spurious results (e.g., microsomal stability >100%), incorrect structural representation of compounds, data below the analytical detection limits, and impure material. For example, while building a classification model for P-glycoprotein (P-gp) efflux, Desai et al. excluded compounds reported as non-substrates displaying >60% inhibition of a fluorescent P-gp substrate, very slow passive permeability, and very low cell partitioning (all cases suggesting potential false negatives) in addition to compounds with poor mass recovery (potentially spurious data) [64]. When feasible, it is good practice to find and utilize analytical data related to identity and purity of compounds. Such information is commonly available in an industrial setting but not easily found for data compiled from multiple sources and available in public databases like ChEMBL. In a previous study, several public and commercial databases were investigated, and error rates in chemical structure annotation ranged from 0.1% to 3.4% [65].

In order to properly curate the assay data that will be used to build a model, it is critical to understand the experimental protocol and potential caveats associated with that given measurement. One of the common issues leading to potentially erroneous results is poor solubility of the compound in the medium used for the assay (e.g., little or none of the compound is in solution, giving an incorrect assay value). This can potentially be addressed by running a parallel experiment to measure the solubility of the compound in the buffer used for the ADME assay. For example, at Eli Lilly and Company, aqueous kinetic solubility in pH 7.4 phosphate buffer is measured for all compounds tested in high-throughput ADME assays. This information is used to curate the data for various ADME endpoints: compounds that are not in solution at the concentration used for the given ADME assay are not included in the QSPR model. In summary, although it is often overlooked and underappreciated, data curation based on a detailed understanding of the experimental measurement is a critical step in building high-quality QSPR models.
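The solubility-based curation step described above can be sketched as a simple filter. The record layout, the field name `kinetic_solubility_um`, and the choice to exclude compounds with no solubility measurement are assumptions made for illustration; they do not describe any specific in-house system.

```python
def curate_by_solubility(records, assay_concentration_um):
    """Split assay records into kept and removed lists.

    A record is kept only if its measured kinetic solubility (µM) is at least
    the concentration used in the ADME assay. Records without a solubility
    value are removed (a conservative, hypothetical choice).
    """
    kept, removed = [], []
    for record in records:
        solubility = record.get("kinetic_solubility_um")
        if solubility is not None and solubility >= assay_concentration_um:
            kept.append(record)
        else:
            removed.append(record)
    return kept, removed

# Hypothetical records for an assay run at 10 µM.
assay_records = [
    {"compound_id": "CPD-001", "kinetic_solubility_um": 50.0, "assay_value": 82.0},
    {"compound_id": "CPD-002", "kinetic_solubility_um": 2.0, "assay_value": 97.0},
    {"compound_id": "CPD-003", "kinetic_solubility_um": None, "assay_value": 45.0},
]
kept, removed = curate_by_solubility(assay_records, assay_concentration_um=10.0)
print([r["compound_id"] for r in kept])
print([r["compound_id"] for r in removed])
```

Returning the removed records as well as the kept ones makes it easy to audit how much of the data set is lost to each curation rule.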

2.1.2 Training Set Selection

Following data curation, the next logical step in creating a QSPR model is selecting compounds to construct and train the model. A question that is sometimes asked is how many compounds need to be in the training set. No easy answer exists: the size of the training set needed to build a useful model depends on the complexity of the endpoint and the intended use of the model. For example, for models intended to be applied prospectively to compounds spanning a wide range of structural diversity, the training set should reflect similar structural diversity and perhaps as much diversity as possible. Prospective model performance, meaning how well the model predicts compounds not in the training set, also depends on whether the training set encompasses the entire range of the assay response. For models such as microsomal metabolic stability that are based on a continuous response (assay range from 0% to 100%), the ideal situation is to have a training set containing compounds spanning the entire 0–100% range and uniformly distributed if possible. For a categorical response such as low or high, an even or close to even distribution of compounds between the categories is desired.

Models constructed with training sets that span a narrow spectrum of the entire assay response (e.g., a training set containing 95% of compounds that have microsomal metabolic stability of >90% when the assay range spans 0–100%) or with a highly skewed distribution of the categorical response (e.g., 95% of compounds in the training set belong to the “high” class) are likely to result in QSPR models with limited utility when used prospectively.
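A quick way to spot the skewed training sets described above is to tabulate the response distribution before any model is built. The sketch below is one possible check; the function names, the number of bins, and the example data are assumptions for illustration.

```python
from collections import Counter

def class_fractions(labels):
    """Fraction of training compounds in each category."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def bin_fractions(values, low=0.0, high=100.0, n_bins=5):
    """Fraction of training compounds in equal-width bins across the assay range."""
    if not values:
        raise ValueError("values must not be empty")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        # Clamp so that values at or beyond the range edges land in the end bins.
        index = min(max(int((value - low) / width), 0), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]

# Hypothetical skewed sets resembling the examples in the text.
labels = ["high"] * 95 + ["low"] * 5
stability = [95.0] * 95 + [10.0, 30.0, 50.0, 70.0, 85.0]
print(class_fractions(labels))
print(bin_fractions(stability))
```

A training set in which one bin or one class holds nearly all compounds, as in this example, is a signal to gather or select additional data before modeling.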

2.1.3 Molecular Descriptors

Following data curation and training set selection, molecular descriptors must be calculated in order to derive the mathematical relationship between chemical structure and assay activity. Molecular descriptors are numerical parameters derived from chemical structures, and a wide variety of descriptors are used to build QSPR models. Physicochemical (e.g., log P, pKa, MW), topological (e.g., atom connectivity), constitutional (e.g., number of nitrogen atoms), and quantum chemical (e.g., dipole moment, atomic charges) descriptors are a few examples of common types. For a deeper understanding of molecular descriptors, the reader is referred to a publication by Todeschini and Consonni [66].

In addition to molecular descriptors, molecular fingerprints are often used to represent chemical structures [67, 68]. A molecular fingerprint is comprised of a series of substructures, and the presence/absence of such substructures determines the numerical code for the molecular fingerprint [69,70,71]. For example, the Molecular Access System (MACCS) fingerprint uses a set of structural features to code the compound into a binary representation [72]. Figure 4.3 shows an example snippet of the MACCS fingerprint representation for the drug diazepam. The column titled “key positions” in the figure assigns a number to a particular chemical feature, listed under “fragment description.” The “fingerprint code” is a binary value associated with the absence (assigned zero) or presence (assigned one) of the chemical feature. Using the “key positions” and “fingerprint code,” one can derive the final fingerprint shown in Fig. 4.3. Only the entries for features that are present in the compound are kept, which keeps the fingerprint vector sparse.

Fig. 4.3

Snippet of MACCS fingerprint of diazepam
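The mapping from key positions to a binary fingerprint code, and then to a sparse list of "on" positions, can be sketched as follows. The key numbers and fragment descriptions below are hypothetical placeholders and are not the actual MACCS key definitions; in practice the presence of each fragment is determined by substructure matching in a cheminformatics toolkit rather than assigned by hand.

```python
# Hypothetical key definitions: position -> fragment description.
KEY_DEFINITIONS = {
    1: "carboxylic acid",
    2: "sulfur atom",
    3: "halogen",
    4: "amide",
    5: "aromatic ring",
}

def fingerprint_bits(present_features, key_definitions=KEY_DEFINITIONS):
    """Binary fingerprint code: 1 if the fragment is present, otherwise 0."""
    return [
        1 if key_definitions[position] in present_features else 0
        for position in sorted(key_definitions)
    ]

def on_bit_positions(present_features, key_definitions=KEY_DEFINITIONS):
    """Sparse representation: only the key positions whose fragment is present."""
    return [
        position
        for position in sorted(key_definitions)
        if key_definitions[position] in present_features
    ]

# Hand-assigned features for a diazepam-like compound (illustrative only).
features = {"halogen", "amide", "aromatic ring"}
print(fingerprint_bits(features))
print(on_bit_positions(features))
```

Storing only the on-bit positions is equivalent to the full binary vector as long as the total number of keys is known, and it is considerably more compact for long fingerprints.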

Typically, when constructing a QSPR model, a large collection of molecular descriptors and a variety of fingerprints are calculated. The descriptors and fingerprints are subsequently evaluated using statistical approaches to select the optimal combination to relate chemical structure to the activity of the endpoint. When constructing a model for the first time, several versions of the QSPR model may be built using various combinations of descriptors or fingerprints followed by several iterations of prospective model evaluation (Sect. 4.2.1.5) to identify the optimal collection of descriptors or single best fingerprint [73].

2.1.4 QSPR Model Training/Building

After data curation, training set preparation, and descriptor/fingerprint selection, the QSPR model is ready to be built. Mathematical algorithms such as linear regression, artificial neural network, SVM, and RF are routinely used to train and build QSPR models [74]. Linear regression (for continuous response) or discriminant (for categorical response) models assume that the measured property value is an additive response to the underlying molecular descriptors. For example, in the QSPR model for solubility shown in Eq. (4.2) [75], it is assumed that solubility is linearly dependent on lipophilicity (log P) and topological polar surface area (TPSA).

$$ \log S=-1.0377 \log P-0.0210\mathrm{TPSA}+0.4488 $$
(4.2)

Besides prediction, linear models may provide mechanistic insight and can be interpretable in nature as long as the molecular descriptors are “simple” and intuitive. Thus, in case of the solubility model in Eq. (4.2), the negative coefficient for log P suggests that an increase in the lipophilicity of compounds is expected to decrease solubility.
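Because Eq. (4.2) is a plain linear combination, applying it is a one-line calculation. The sketch below simply encodes the published coefficients; the function name and the example descriptor values are hypothetical.

```python
def predict_log_s(log_p, tpsa):
    """Predicted log S from Eq. (4.2), given log P and TPSA."""
    return -1.0377 * log_p - 0.0210 * tpsa + 0.4488

# Hypothetical compounds that differ only in lipophilicity.
less_lipophilic = predict_log_s(log_p=2.0, tpsa=50.0)
more_lipophilic = predict_log_s(log_p=3.0, tpsa=50.0)
print(less_lipophilic, more_lipophilic)
print(more_lipophilic - less_lipophilic)
```

The difference between the two predictions equals the log P coefficient, which illustrates the interpretability argument: each unit increase in log P lowers the predicted log S by about 1.04, independent of the other descriptor.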

Given the complexity of most ADME-related responses, linear models appear to only be applicable over a relatively narrow spectrum of compounds that contain conserved structural motifs. In practice, such models are rarely useful prospectively due to their inability to extrapolate and predict compounds outside their immediate domain of applicability. Machine learning methods such as RF [76, 77] and SVM [78, 79] have been applied to QSPR models to combat the abovementioned limitations and are capable of elucidating more complex relationships between structural descriptors and the observed response.

In general terms, RF models are based on several iterations of the recursive partition approach, and SVM models identify a hyperplane in the high-dimension descriptor space to enable maximum separation of observed responses. Within the pharmaceutical industry, a large amount of ADME data are generated in a consistent manner, and therefore such machine learning methods are preferred to build “global” QSPR models that are designed to be applicable across multiple drug discovery projects that cover a broad spectrum of chemical space [80]. In our experience, such models typically outperform linear QSPR models in extracting structure-property relationship knowledge from large sets of diverse compounds. However, given the complexity of RF and SVM models, they are relatively less interpretable compared to linear models and often offer limited mechanistic insight to go along with predictions. Although generally less interpretable, it should be noted that it is possible to get an estimation of the most influential descriptors for RF models, in turn providing some understanding of key molecular characteristics influencing a given endpoint. For example, in case of an RF model for P-gp efflux, Desai et al. identified that molecular features related to the number of hydrogen bond donors (HBD), TPSA, and hydrogen bond strength were most influential in terms of P-gp efflux of compounds [64].

2.1.5 QSPR Model Evaluation

The performance of a QSPR model is evaluated using a variety of parameters depending on the type (continuous vs. categorical) and the intended use of the model. Performance parameters are typically calculated at three stages of the model building process. For example, after building a continuous response model, the first stage is to assess the ability of the model to fit the training set compounds. This metric is commonly referred to as r² in the QSAR/QSPR literature. The second stage evaluates the ability of the model to predict compounds left out of the model building process in an iterative manner (called cross-validation, either leave-one-out or leave-some-out); the resulting metric is referred to as q². The third stage is known as external or prospective validation, in which the model’s ability to predict compounds that were not used during any stage of the model building process is evaluated.

The ability of the model to fit the training set simply serves as a feasibility assessment. It does not provide an assessment of the model’s ability to predict compounds outside the training set and therefore isn’t particularly useful [81]. Cross-validation is based on prediction of compounds left out of the model but is still an internal validation as it derives the test set from the existing pool of compounds. Depending on the modeling method employed, the cross-validation test set can bias the choice of descriptors and other model-related parameters [82]. Many experts in the QSAR community believe that this type of validation often overestimates a model’s ability to predict a true external or prospective test set. Therefore, in order to comprehensively evaluate the utility of a QSPR model, it is critical to assess its predictive ability against an external prospective test set [64, 83,84,85].
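Leave-one-out cross-validation, as described above, can be sketched for the simplest possible case: a one-descriptor linear model fit by least squares. Each compound is left out in turn, the model is refit on the remaining compounds, and the left-out compound is predicted. Here q² is computed with the common convention 1 − PRESS/SS, which is an assumption of this sketch (conventions differ, and q² is also described as a squared correlation between observed and cross-validated values). The data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(xs, ys):
    """Predict each compound from a model trained on all other compounds."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

def q_squared(observed, cv_predictions):
    """q2 = 1 - PRESS / SS, using the cross-validated predictions."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predictions))
    ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss

# Hypothetical descriptor values and noisy responses.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [3.1, 4.9, 7.2, 8.8, 11.1]
cv_predictions = leave_one_out_predictions(descriptor, response)
print(q_squared(response, cv_predictions))
```

Note that the left-out compounds still come from the same pool as the training compounds, which is why this remains an internal validation and does not replace a prospective test set.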

For QSPR models based on continuous data, the square of the correlation coefficient (r²) between the observed and predicted values (referred to as q² when used in the context of cross-validation) is the most common performance parameter reported. RMSE between the observed and predicted values is another key parameter used to assess continuous response model performance. Higher values of r² (maximum 1 for a perfect model) and smaller values of RMSE are desirable [86]. In many cases, Spearman’s rank correlation coefficient (ρ) is also reported as an indicator of model performance [87]. Depending on the intended use of the QSPR model, one or more of these parameters may be utilized to determine how well a particular model is performing. For example, if the goal is to identify a model wherein predictions are correlated with the observations (not necessarily to predict the absolute value of the property), the r² of a prospective test set would serve as a useful parameter. On the other hand, to simply rank order the prospective compounds, a model with a high ρ value would be sufficient. If the goal is to accurately predict the absolute value of the property, a model with low RMSE would be necessary. The ideal QSPR model would have favorable performance values for all of the abovementioned metrics.
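The three continuous-response metrics discussed above can be computed with a few lines of standard-library Python. This is a minimal sketch: r² is taken as the squared Pearson correlation, as described in the text, and ρ is computed as the Pearson correlation of average ranks. The function names and example values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(observed, predicted):
    """Square of the correlation coefficient between observed and predicted."""
    return pearson_r(observed, predicted) ** 2

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(observed, predicted):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(observed), average_ranks(predicted))

# Hypothetical model whose predictions are all offset by one log unit.
observed = [1.0, 2.0, 3.0, 4.0]
offset_predictions = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed, offset_predictions))
print(spearman_rho(observed, offset_predictions))
print(rmse(observed, offset_predictions))
```

The example shows why the metrics answer different questions: the offset predictions correlate and rank perfectly with the observations, yet every prediction is wrong by one unit, which only the RMSE reveals.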

Classification QSPR models have a different set of performance metrics compared to regression models. Commonly reported performance parameters for classification models are based on the fraction/percent of correct predictions (overall accuracy), the accuracy of each experimental class (sensitivity and specificity), and the accuracy of each predicted class (positive predictive value, PPV, and negative predictive value, NPV). Table 4.1, which is referred to as a contingency table or confusion matrix, provides details for calculating the abovementioned parameters. In addition to these widely used metrics, parameters such as the kappa index are often reported to assess the agreement between prediction and the experimentally determined category. A kappa value of 1 indicates perfect agreement between predictions and experimental values, −1 suggests complete disagreement, and 0 indicates the prediction is no better than random chance. In general, a kappa value >0.4 is considered an indicator of reasonable model performance with useful predictive power [88, 89].

Table 4.1 Contingency table with equations for a classification QSPR model
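The standard two-class definitions of these metrics can be sketched from the four cells of a contingency table: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The kappa index is computed here as Cohen's kappa, which is an assumption of this sketch, and the counts in the example are hypothetical. No guard against empty rows or columns is included.

```python
def classification_metrics(tp, fp, fn, tn):
    """Performance metrics for a two-class model from contingency table counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # correct among experimental positives
        "specificity": tn / (tn + fp),  # correct among experimental negatives
        "ppv": tp / (tp + fp),          # correct among predicted positives
        "npv": tn / (tn + fn),          # correct among predicted negatives
        "kappa": (accuracy - chance) / (1.0 - chance),
    }

# Hypothetical balanced test set of 100 compounds.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With these counts the overall accuracy is 0.8 but kappa is about 0.6, because half of the agreement would be expected by chance alone in a balanced two-class problem.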

2.1.6 Interpretation of Model Prediction

In addition to the abovementioned parameters for model evaluation, several other factors should be considered when assessing the utility of a QSPR model or applying it to a given drug discovery project. In the case of a continuous response model, an applicability domain-related parameter, if available, should be considered in addition to the predicted value. Such a parameter indicates whether the QSPR model can, or should, be used to predict a compound of interest given the compounds the model was trained on. If the compound of interest is vastly different from all compounds in the training set, it is expected that such an applicability domain parameter would be unfavorable. Several methods to estimate the applicability domain for a QSPR model have been described in the literature, and they generally provide a qualitative indicator of the confidence for each prediction or a quantitative estimation of the confidence interval around the predicted value [90,91,92,93].
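One simple way to illustrate the idea of an applicability domain is a distance-based flag: a query compound is considered inside the domain if its nearest training compound in descriptor space is closer than a chosen threshold. This is only one illustrative choice and is not claimed to be the method of any of the cited references. The sketch assumes descriptors that have already been scaled to comparable ranges, and the threshold value is hypothetical.

```python
import math

def nearest_training_distance(query, training_set):
    """Euclidean distance from the query to its nearest training compound."""
    return min(math.dist(query, compound) for compound in training_set)

def in_applicability_domain(query, training_set, max_distance):
    """Qualitative flag: True if the query is close to at least one training compound."""
    return nearest_training_distance(query, training_set) <= max_distance

# Hypothetical scaled descriptor vectors (two descriptors per compound).
training_descriptors = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.5)]
similar_query = (0.3, 0.5)
dissimilar_query = (0.9, 0.9)
print(in_applicability_domain(similar_query, training_descriptors, max_distance=0.25))
print(in_applicability_domain(dissimilar_query, training_descriptors, max_distance=0.25))
```

Reporting the distance itself alongside the flag gives users a graded sense of how far a prediction is being extrapolated.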

In addition to the standard contingency table metrics commonly reported (see Table 4.1), if one is evaluating a classification QSPR model built with a machine learning method (e.g., RF or SVM), the predicted score of each compound gives an estimation of the relative confidence or reliability of the prediction [64, 77, 94]. For example, for two compounds predicted to be in the same category, the prediction for the compound with the higher score is assumed to be the more reliable of the two.

In addition to the abovementioned numerical parameters reported to determine QSPR model applicability/reliability, a thorough assessment of the utility of a model for a given chemical scaffold or drug discovery project should always consider the following:

  • The inherent experimental variability in the measurement, especially in the case of high-throughput ADME assays. Model performance has been shown to be directly related to the inherent variability in the measurement of the given assay parameter [95]. For regression QSPR models built on continuous data, one should evaluate the performance of the model based on the proportion of predicted values that fall within the experimental variability of the measured response and not rely on an r² value alone. For example, if the inherent variability of an assay is threefold, a model built on these data should be evaluated by checking the proportion of the prospective test set that is predicted within threefold of the experimental values. Such a model may not reach an r² value of 0.9, but if 90% of the predicted compounds are within threefold of the experimental values, the model will still be useful.

  • Due to the variability in ADME high-throughput assays, we build and advise the use of categorical QSPR models for such data.

  • The QSPR model should be evaluated on a prospective test set that spans the entire spectrum of the response, or in the case of a categorical model, the test set should have a balanced distribution of compounds from each category or one that mirrors the training set distribution.

  • The assessment of a QSPR model should not be based on a small fraction of compounds, only the most recent compounds, or only the potent compounds from a given chemical scaffold or drug discovery project.

  • A QSPR model should not be evaluated based on its performance against a second experimental endpoint not directly predicted by the model. For example, comparing predictions from a QSPR model built on in vitro microsomal metabolic stability data against an in vivo clearance outcome should not be done without establishing if this is permissible. The compound and scaffold of interest may be cleared by mechanisms other than microsomal metabolism, and an in silico microsomal clearance QSPR model should not be expected to accurately predict the in vivo clearance value for such cases.
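
The "within threefold" check from the first point above can be sketched as follows. This is a minimal illustration, assuming positive-valued predicted and measured responses on the same linear scale; the function name and the toy values are hypothetical.

```python
def fraction_within_fold(predicted, measured, fold=3.0):
    """Fraction of compounds predicted within `fold` of the measured value.

    Assumes all values are positive and on the same (linear) scale.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("predicted and measured must be non-empty and equal in length")
    hits = sum(
        1 for p, m in zip(predicted, measured)
        if max(p / m, m / p) <= fold
    )
    return hits / len(predicted)


# Toy values for illustration only.
predicted = [10.0, 30.0, 5.0, 100.0, 2.0]
measured = [12.0, 9.0, 14.0, 20.0, 2.5]

within_threefold = fraction_within_fold(predicted, measured, fold=3.0)
print(within_threefold)
```

In this toy set, three of the five predictions fall within threefold of the measured value, giving a fraction of 0.6.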

2.2 ADME QSPR Models Used at Eli Lilly and Company

Over the past couple of decades, many reports on the application of QSPR models to ADME-related physicochemical properties and in vitro/in vivo endpoints have been published. For brevity, the reader is referred to review articles that summarize this area of research [96,97,98]. Table 4.2 provides a brief summary of ADME QSPR models developed and used at Eli Lilly and Company. The data set for each individual model was generated by/for Eli Lilly and Company using consistent experimental conditions for each individual ADME in vitro or in vivo assay. Total data set size ranges from 2,000 to 80,000 compounds depending on the throughput of the particular assay. All ADME QSPR models are built using an SVM algorithm with an optimum molecular fingerprint selected for each assay endpoint.

Table 4.2 Representative list of ADME QSPR models used at Eli Lilly and Company

2.3 Prospective Validation of ADME QSPR Models at Eli Lilly and Company

In an industrial drug discovery paradigm where new pharmacological targets are constantly explored, it is important to update global QSPR models to ensure their applicability and prospective prediction performance. Figure 4.4 highlights the outcome of this chronological process at Eli Lilly and Company where prospective performance of ADME QSPR models was maintained for several classification models used over the past several years.

Fig. 4.4 Prospective validation of ADME QSPR classification models used at Eli Lilly and Company. Average PPV and NPV over the last 8–10 versions are shown. Error bars represent the standard deviation. All models were applicable for ~80% of prospective test sets when score cutoffs were used to “accept” a prediction

As drug discovery project teams synthesize and test new compounds in various ADME in vitro assays, the global models are updated by curating and adding the new data to their respective training sets. Before updating any particular model, the existing model is prospectively evaluated to measure its predictive performance against data generated after the model was built. The result of this assessment for a set of seven Eli Lilly and Company ADME models is shown in Fig. 4.4. The training set sizes for these models range from ~4,000 to 75,000 compounds and increase with every model update cycle. Focusing on the mouse metabolic turnover model, the oldest version of the QSPR model in Fig. 4.4 was built using ~40,000 compounds. Before updating the model, it was prospectively evaluated against an additional ~4,000 compounds, and after showing suitable performance, the new data were added to the training set of the existing model to build the next version containing ~44,000 compounds.

All models in Fig. 4.4 are SVM models using fingerprints as descriptors and provide categorical predictions, along with a score representing the reliability of such a prediction. As explained in Sect. 4.2.1.6, predictions associated with higher scores are expected to have greater likelihood of aligning with the measured response. Based on the prospective validation results, suitable score cutoffs (typically 0.7 on a scale of 0–1.0 for both prediction categories) are assigned to “accept” a given prediction, while predictions with scores below the cutoffs for a given category are labeled as “indeterminate.” The PPVs/NPVs shown in Fig. 4.4 are calculated for compounds with “acceptable” scores. For all models listed in Fig. 4.4, >80% of the test set compounds had “acceptable” scores, and thus the models were applicable for >80% of the test sets. As shown in Fig. 4.4, the average PPV/NPV for the ADME models ranged from 75% to 85% in prospective testing. Given such consistent prospective performance, the ADME QSPR models are routinely used to design and prioritize compounds for synthesis and testing during early-stage drug discovery. The performance of various versions of the global P-gp efflux model and its application in identifying and addressing challenges related to central nervous system (CNS) drug discovery projects is described in detail by Desai et al. [64].
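
The score-cutoff logic described above can be sketched as follows. This is an illustrative reading of the text and not the authors' implementation: it assumes each prediction carries a single score between 0 and 1 expressing confidence in the predicted category, uses the 0.7 cutoff quoted above, and relies on hypothetical record and function names with toy data.

```python
def label_with_cutoff(predicted_class, score, cutoff=0.7):
    """Return the predicted class if the score passes the cutoff, else 'indeterminate'."""
    return predicted_class if score >= cutoff else "indeterminate"


def prospective_summary(records, cutoff=0.7):
    """Coverage, PPV, and NPV computed only on 'accepted' predictions.

    records: list of (predicted_class, score, measured_class) tuples,
    with classes given as 'positive' or 'negative' (assumed labels).
    """
    accepted = [(pred, meas) for pred, score, meas in records if score >= cutoff]
    coverage = len(accepted) / len(records)
    pos = [meas for pred, meas in accepted if pred == "positive"]
    neg = [meas for pred, meas in accepted if pred == "negative"]
    ppv = sum(m == "positive" for m in pos) / len(pos) if pos else None
    npv = sum(m == "negative" for m in neg) / len(neg) if neg else None
    return {"coverage": coverage, "ppv": ppv, "npv": npv}


# Toy prospective test set for illustration only.
records = [
    ("positive", 0.90, "positive"),
    ("positive", 0.80, "negative"),
    ("positive", 0.75, "positive"),
    ("positive", 0.95, "positive"),
    ("negative", 0.85, "negative"),
    ("negative", 0.72, "negative"),
    ("negative", 0.90, "positive"),
    ("negative", 0.80, "negative"),
    ("positive", 0.55, "negative"),  # below cutoff: indeterminate
    ("negative", 0.60, "negative"),  # below cutoff: indeterminate
]

summary = prospective_summary(records)
print(summary)
```

Reporting coverage alongside PPV and NPV makes explicit how many compounds the model declined to call, which is the trade-off that a score cutoff introduces.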

2.4 Trends Between Calculated Physicochemical Properties and ADME Parameters

To complement the usefulness of ADME QSPR models, the influence of the physicochemical properties of compounds on ADME properties is well documented. One of the earliest analyses of ADME properties was performed by Lipinski, leading to the “rule of five,” which suggests that poor absorption and permeability are more likely if the molecular weight (MW) is >500, the number of NH and OH hydrogen bond donors is >5, the calculated log P (i.e., clog P) is >5, and the number of N and O atoms is >10 [99]. The goal of this guideline was not necessarily to rule out certain synthetic ideas but rather to steer the synthetic chemistry effort toward chemical space that is more likely to yield compounds with superior ADME properties. Subsequently, several analyses describing the trends between calculated physicochemical properties and in vitro/in vivo ADME parameters have been reported [100,101,102,103]. In an exhaustive analysis of a large and structurally diverse set of preclinical compounds profiled at GlaxoSmithKline, Gleeson reported relationships between several ADME assays and calculated physicochemical descriptors [100]. This included in vitro ADME endpoints like solubility, permeability, rat brain tissue and plasma protein binding, P-gp efflux, and inhibition of the CYP isozymes. Several in vivo ADME parameters like oral bioavailability, clearance, volume of distribution, and CNS penetration in the rat were also analyzed. Some of the calculated physicochemical descriptors used in this analysis were clog P, clog D, the number of hydrogen bond acceptors (HBA) and donors (HBD) (typically counted as number of N + O for HBA and NH + OH for HBD), positive and negative ionization states, molecular flexibility, molar refractivity, MW, TPSA [104], and the number of rotatable bonds. From this descriptor list, ionization state, clog P, and MW were identified as the most influential physicochemical properties for ADME properties. The paper suggested that compounds with a MW of <400 and a clog P of <4 were preferred with regard to maintaining a favorable ADME profile. In another report by Varma et al. [102], ionization state, lipophilicity, and polar descriptors were found to be the physicochemical determinants of renal clearance in humans based on a compiled data set of ~400 marketed drugs. It is important to keep in mind that the conclusions about correlations between physicochemical and ADME properties can be strongly influenced by the size and nature of the database employed. Moreover, many of the physicochemical parameters are not independent of each other. For example, an increase in MW is likely to be associated with an increase in the number of heteroatoms like N and O, which in turn are associated with TPSA.
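
As a concrete illustration of the rule of five thresholds quoted above, the following sketch lists which criteria a compound exceeds. It assumes the four descriptors have already been calculated by some other tool; the function name, argument names, and example values are hypothetical.

```python
def rule_of_five_violations(mw, hbd, clogp, n_plus_o):
    """List the rule-of-five criteria exceeded by a compound.

    mw: molecular weight; hbd: count of NH and OH donors;
    clogp: calculated log P; n_plus_o: count of N and O atoms.
    """
    violations = []
    if mw > 500:
        violations.append("MW > 500")
    if hbd > 5:
        violations.append("HBD > 5")
    if clogp > 5:
        violations.append("clog P > 5")
    if n_plus_o > 10:
        violations.append("N + O > 10")
    return violations


# Toy descriptor values for illustration only.
flags = rule_of_five_violations(mw=620.0, hbd=2, clogp=5.6, n_plus_o=9)
print(flags)
```

Returning the list of exceeded criteria, rather than a single pass/fail value, is in keeping with the intent described above of steering design rather than ruling out ideas.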

Figures 4.5, 4.6, 4.7, and 4.8, along with summary Table 4.3, detail Eli Lilly and Company’s ADME in vitro data in relation to key physicochemical properties over the past 2 years. Figure 4.5 shows the trend that as clog P increases so does microsomal unbound intrinsic clearance (Clint,u) [105]. This analysis indicates that compounds with a clog P value <4 are more likely to have slow unbound intrinsic clearance (Fig. 4.5) and a low CYP3A4 inhibition potential (Fig. 4.8). Similarly, compounds with clog P between 2 and 4 (Fig. 4.6) and TPSA <100 Å² (Fig. 4.7) are more likely to have rapid permeability across MDCK cells. Desai et al. have previously published physicochemical trends for efflux by the P-gp transporter and reported a most basic pKa <8.0 and a TPSA <60 Å² as key physicochemical properties of P-gp non-substrates [64].
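
A trend such as "compounds with clog P <4 are more likely to have slow Clint,u" amounts to comparing the fraction of compounds in a given category on either side of a property threshold. The sketch below shows that comparison on invented toy values; it does not reproduce Eli Lilly and Company data, and the function name and category labels are assumptions.

```python
def fraction_in_category(records, category, predicate):
    """Fraction of compounds in `category` among those whose property passes `predicate`.

    records: list of (property_value, category_label) tuples.
    Returns None when no compound satisfies the predicate.
    """
    selected = [label for value, label in records if predicate(value)]
    if not selected:
        return None
    return sum(label == category for label in selected) / len(selected)


# Invented (clog P, Clint,u category) pairs for illustration only.
records = [
    (1.5, "slow"), (2.8, "slow"), (3.2, "moderate"), (3.9, "slow"),
    (4.2, "slow"), (4.5, "rapid"), (5.1, "rapid"), (6.0, "moderate"),
]

slow_below_4 = fraction_in_category(records, "slow", lambda clogp: clogp < 4)
slow_at_or_above_4 = fraction_in_category(records, "slow", lambda clogp: clogp >= 4)
print(slow_below_4, slow_at_or_above_4)
```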

Fig. 4.5 Experimental rat microsomal Clint,u vs clog P. Green = slow, yellow = moderate, red = rapid Clint,u. Global data analysis suggests compounds with clog P of <4 are less likely to have rapid Clint,u

Fig. 4.6 Experimental MDCK permeability vs clog P. Green = rapid permeability, red = slow. Global data analysis suggests that compounds with clog P between 2 and 4 are more likely to have rapid permeability

Fig. 4.7 Experimental MDCK permeability vs TPSA. Green = rapid permeability, red = slow. Global data analysis suggests compounds with TPSA of <100 Å² are more likely to have rapid permeability

Fig. 4.8 CYP3A4 inhibition vs clog P. Green = low inhibition, red = high inhibition. Global data analysis suggests compounds with clog P of <4 are more likely to have low inhibition of CYP3A4

Table 4.3 Trends between calculated physicochemical properties and ADME endpoints from Eli Lilly and Company data

2.5 Pharmacophore Modeling

Another ligand-based modeling technique that is used in drug discovery is pharmacophore modeling. The word pharmacophore has several definitions associated with it despite the concept being around for over 40 years. A medicinal chemist may define a pharmacophore as a structural fragment or functional group related to a chemical compound or series of compounds. Computational chemists often define a pharmacophore as a collection of hydrogen bond acceptors, hydrogen bond donors, aromatic rings, charged atoms, and hydrophobic regions of compounds that provide affinity and specificity to a particular target. The official IUPAC definition states, “A pharmacophore is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response” [106].

No matter the definition, the concept of pharmacophore modeling is simple and even intuitive to medicinal chemists working in early drug discovery. The technique models the interaction between a ligand and a binding site, thereby producing a model of the spatial arrangement of molecular features essential for biological activity. The central premise of a pharmacophore model is that if a compound contains the needed molecular features in a spatial orientation that matches the model, the compound should bind to the target of interest. Pharmacophore models have been created for several ADME targets and have also been used to predict activity, selectivity, toxicity, and enrichment in high-throughput screening experiments [20, 74, 107,108,109,110].

This chapter provides only an overview of pharmacophore modeling and briefly introduces the two general parts of any pharmacophore modeling program; extensive literature describes pharmacophore models in greater detail [111,112,113]. In general, pharmacophore modeling can be broken down into two steps: (1) molecular superpositioning of ligands and (2) scoring how well a ligand matches the pharmacophore features.

The molecular superpositioning (also known as alignment) of ligands is time consuming and represents a significant challenge in creating any pharmacophore model. This step inherently involves the alignment of flexible compounds that have multiple possible conformations. Precomputing ligand conformers is common in many of the pharmacophore programs available today [111,112,113]. When conformers are pre-generated, pattern-matching techniques are then used to create the ligand alignment. Many pharmacophore programs use a rigid-body alignment technique that is some type of maximum common substructure search [114] implemented with the Bron-Kerbosch clique detection algorithm [115], which accounts for the spatial arrangement of pharmacophore features. Scoring functions differ between software packages, but they generally account for the number of matching pharmacophore points, the spatial orientation and internal energy of the matching ligand conformer, and some sort of volume or binding site matching term. Throughout the pharmacophore building process, several parameters must be set and optimized, which complicates the creation of an optimal pharmacophore model, let alone one that the entire community uses or accepts. The choice of the reference ligand, or set of ligands, used to create the pharmacophore alignment is often subjective and requires the skill and knowledge of a computational expert.
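
The clique-detection idea mentioned above can be illustrated with a small sketch. It is a generic example and not the algorithm of any specific pharmacophore program: it assumes each ligand conformer is reduced to a list of typed feature points with 3-D coordinates, builds a correspondence graph whose nodes pair same-type features and whose edges join pairs with compatible inter-feature distances, and then applies the basic Bron-Kerbosch recursion without pivoting. The 1.0 Å tolerance, feature names, and coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def correspondence_graph(feats_a, feats_b, tol=1.0):
    """Nodes pair same-type features (i in A, j in B); edges join node pairs
    whose inter-feature distances in A and B agree within `tol` angstroms."""
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(feats_a)
        for j, (type_b, _) in enumerate(feats_b)
        if type_a == type_b
    ]
    graph = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a feature may be matched only once
        d_a = dist(feats_a[i1][1], feats_a[i2][1])
        d_b = dist(feats_b[j1][1], feats_b[j2][1])
        if abs(d_a - d_b) <= tol:
            graph[(i1, j1)].add((i2, j2))
            graph[(i2, j2)].add((i1, j1))
    return graph


def bron_kerbosch(r, p, x, graph, cliques):
    """Basic Bron-Kerbosch recursion; appends every maximal clique to `cliques`."""
    if not p and not x:
        cliques.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & graph[v], x & graph[v], graph, cliques)
        p = p - {v}
        x = x | {v}


# Toy feature sets (type, xyz) for two ligand conformers; illustration only.
ligand_a = [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)), ("hydrophobe", (0, 4, 0))]
ligand_b = [
    ("acceptor", (1, 1, 0)),
    ("donor", (4, 1, 0)),
    ("hydrophobe", (1, 5, 0)),
    ("hydrophobe", (9, 9, 9)),
]

graph = correspondence_graph(ligand_a, ligand_b)
cliques = []
bron_kerbosch(set(), set(graph), set(), graph, cliques)
best_match = max(cliques, key=len)
print(sorted(best_match))
```

The largest clique gives the biggest set of feature pairings that are mutually consistent in geometry, which is the starting point for a rigid-body superposition; conformer handling and scoring are outside the scope of this sketch.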

It can be especially challenging to create useful pharmacophore models for targets that are known to be flexible and promiscuous in binding many compounds. Most ADME targets fall into this class, yet there is no lack of pharmacophore models published for such targets [107, 109, 116,117,118]. For example, pharmacophore models have been published for several CYP enzymes, including CYP3A4, that are known to be extremely flexible and recognize diverse compounds. Figure 4.9 displays a pharmacophore model for the organic anion-transporting polypeptide 1B1 (OATP1B1), a liver-specific uptake transporter that lacks high-resolution structural information.

Fig. 4.9 Reproduced from Ekins et al. Comparative pharmacophore modeling of organic anion-transporting polypeptides: a meta-analysis of rat Oatp1a1 and human OATP1B1. J Pharmacol Exp Therap 2005, 314(2):533–541 [116]. Pharmacophores generated from substrate data for human OATP1B1 expressed in oocytes (showing bilirubin mapped to features) (a), human embryonic kidney cells (showing bilirubin monoglucuronide mapped to features) (b), rat Oatp1a1 expressed in oocytes (showing aldosterone mapped to features) (c), CHO cells (showing BSP mapped to features) (d), HeLa cells (showing taurohyodeoxycholate mapped to features) (e), merged OATP1B1 model using pharmacophores described in a and b (f), meta-analysis model using all cell type compound data for human OATP1B1 (showing bilirubin mapped to features) (g), and merged Oatp1a1 model using pharmacophores described in c, d, and e (h), showing aldosterone mapped to features (i). Pharmacophore features include hydrophobes (cyan), negative ionizable (blue), and hydrogen bond acceptors (green)

While many pharmacophore publications exist, in many instances pharmacophore models are created using a small subset of compounds known to bind to such targets (10–15 compounds maximum). Such models may perform well on very similar compounds (meaning if the alignment was done with a statin compound, the pharmacophore model more than likely will predict other statin-like compounds as likely to interact with the target), but they are not particularly useful in a drug discovery setting where diverse chemistry is being explored on many projects.

The other extreme, creating a pharmacophore model based on hundreds of compounds, is also problematic for ADME targets. Generating a “unique” pharmacophore pattern for ligand binding is extremely challenging given the diversity of compounds. More often than not, very few pharmacophore features will match across several hundred diverse structures. For example, a pharmacophore model constructed on 500 OATP1B1 inhibitors may only have three pharmacophore points that match the majority of the 500 compounds. When this occurs, the pharmacophore model is not useful as it is incapable of differentiating between active and inactive compounds in the data set. In order for any pharmacophore model to be useful, it must not only differentiate active from inactive compounds but also have predictive power that informs the design of de novo compounds. This validation criterion is not examined in many published ADME pharmacophore models, and it is essential to evaluate before making the claim that a useful model has been created.

2.6 Site of Metabolism Prediction

Understanding and modulating drug metabolism is one of the fundamental concepts of ADME. Several computational techniques exist to predict the site of metabolism (SOM) on compounds. It should be noted that publications and research on SOM prediction exist for metabolizing enzymes other than CYPs [119,120,121,122]. However, due to their significance in metabolizing compounds, SOM predictions by CYP enzymes dominate the published literature and will be the focus of this section.

Prior studies predicting SOM of compounds interacting with CYPs have utilized a variety of computational methods such as quantum chemical calculations, pharmacophore models, QSAR, molecular docking, MD simulations, and basic empirical/chemical rules [13, 121, 123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138]. Recent reviews published on CYP SOM prediction provide a good summary of prior studies and techniques used [139, 140]. Although previous studies have been performed to predict SOM, there is no consensus about which method performs “best.” In general, the top performing methods claim to accurately predict the experimental SOM 80% of the time or greater.

Recent thinking suggests that the SOM of a compound is influenced by two factors: (1) the intrinsic reactivity of each site in the compound to oxidation and (2) the accessibility of individual atoms to the CYP heme group, the site where oxidation occurs in the enzyme. The intrinsic reactivity is normally estimated using Hartree-Fock, semiempirical methods such as the Austin Model 1, or density functional theory quantum mechanical calculations of the chemical reaction. Accessibility to the CYP heme group is routinely estimated with solvent-accessible surface area calculations, molecular docking, and other structural features.

Several commercial SOM prediction programs exist that allow users to profile compounds to overcome metabolic liabilities. While this may be possible, caution should be used when proposing such a strategy using SOM tools in isolation. Vaz et al. [141] address problems associated with the metabolic “blocking” strategy. Metabolic “blocking” occurs when a halogen atom, typically a fluorine atom, is attached to the atom/region of the compound susceptible to metabolism in order to reduce the metabolic turnover. Despite literature examples where this strategy was shown to be successful, “blocking” typically shifts the SOM to another atom or region of the compound due to the promiscuous nature of CYPs. In many instances, halogenating a site, typically an aromatic ring, makes the compound more lipophilic. This can ultimately lead to no change, or even an increase, in affinity for CYPs and thus expose other sites on the compound to oxidation. In addition, the more lipophilic compound could potentially fit the CYP pocket better and hence become a potential CYP inhibitor. In possibly fixing one ADME problem (metabolism) by introducing additional lipophilicity through “blocking,” another problem may arise in the form of solubility limitations.

When trying to mitigate metabolic ADME problems, we suggest that multiple in silico tools and methods be used to provide a balanced ADME profile of a compound. In addition to SOM prediction software, in silico models of unbound intrinsic clearance, metabolic stability, log P, and solubility should be monitored with any structural change proposed to mitigate a metabolic liability. Besides altering the reactivity of a particular site, we suggest evaluating options to reduce the affinity of a compound for CYPs as well. A reduction in log P by modifying hydrophobic groups into polar moieties and/or removing hydrophobic fragments from the compound is more likely to provide the reduction in metabolic turnover needed for a particular project.

2.7 SPR/STR Knowledge Extraction Using Matched Molecular Pair Analysis

Knowledge-driven modification of compounds is desirable to achieve the optimal potency and ADME properties. For each drug discovery project, a useful QSAR/QSPR model is able to accurately predict the activity of a compound. However, the model provides limited information pertaining to what modifications should be made to the compound in the next cycle of drug design. The matched molecular pair analysis (MMPA) technique is a promising approach to address this issue. The term MMPA was coined by Kenny and Sadowski [142] to describe any systematic method of identifying structural matched molecular pairs (MMPs) from a set of compounds and the associated property change. In this context, MMPs are generally defined as pairs of compounds that differ only by a single, localized structural transformation, and Fig. 4.10 shows an example [144].

Fig. 4.10 Permission to use from Papadatos et al. [143]. Example of a matched molecular pair. The transformation is H to CF3 (a single-point change) and is highlighted in blue. The asterisk in the context denotes the attachment point

The basic premise of MMPA is essentially an extraction of information within a chemical series featuring a common core. The property of interest can be plotted against the substituents at a given position of the core in order to identify the effects of the structural transformation on the property [145]. Various automated methods, including supervised and unsupervised methods, have been developed to identify MMPs and quantify the associated biological changes on large data sets. Supervised methods require predefined molecular transformations to identify the MMPs in the data set [144, 146]. However, any possible MMPs that are outside the predefined structural transformation dictionary cannot be identified. Unsupervised methods have the potential to identify all MMPs within a compound data set without a predefined molecular transformation dictionary [147,148,149,150,151]. They first decompose the compounds into fragments, then index the fragments for rapid sorting, and finally identify the core scaffolds and R-group substituents. For a more detailed summary of current MMPA methods, the reader is referred to a review by Griffen et al. [145].
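
The indexing step of an unsupervised method can be sketched as follows. This is a simplified illustration, not a reimplementation of any cited algorithm: it assumes the fragmentation has already been carried out, so each compound arrives as a (compound ID, core, substituent) record, and the SMILES-like labels and compound IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_matched_pairs(fragmented):
    """Index compounds by core and pair those that differ only in the substituent.

    fragmented: list of (compound_id, core, substituent) tuples.
    Returns (id_a, id_b, core, transformation) tuples.
    """
    index = defaultdict(list)
    for compound_id, core, substituent in fragmented:
        index[core].append((compound_id, substituent))

    pairs = []
    for core, members in index.items():
        for (id_a, sub_a), (id_b, sub_b) in combinations(members, 2):
            if sub_a != sub_b:
                pairs.append((id_a, id_b, core, f"{sub_a}>>{sub_b}"))
    return pairs


# Toy pre-fragmented compounds for illustration only.
fragmented = [
    ("cpd1", "[*]c1ccccc1", "[*][H]"),
    ("cpd2", "[*]c1ccccc1", "[*]C(F)(F)F"),
    ("cpd3", "[*]c1ccncc1", "[*][H]"),
    ("cpd4", "[*]c1ccncc1", "[*]C(F)(F)F"),
    ("cpd5", "[*]C1CCCCC1", "[*][H]"),
]

pairs = find_matched_pairs(fragmented)
for pair in pairs:
    print(pair)
```

Because the pairs are found through a dictionary lookup on the core, no transformation dictionary has to be defined in advance; here both pairs turn out to share the same H to CF3 transformation in two different contexts.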

After the MMPA algorithm identifies all possible MMPs, the results are tabulated to show differences between MMPs for a measured endpoint. The effect of a specific chemical substitution is typically summarized by the mean response change, the sample standard deviation of the response change, and the standard error of the mean for each endpoint. The total number of pairs identified for each substituent is also reported to assess the significance of the effects. Leach et al. recommended that at least 20 MMPs be identified for a useful molecular transformation [144]. More recently, Kramer et al. have recommended the use of a paired t-test to calculate the number of pairs necessary to achieve statistical significance with a given average activity difference. They also demonstrated the importance of building pairs from identical assays measured in the same laboratory [152].
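
The summary statistics listed above can be computed directly from the per-pair response changes of one transformation. The sketch below is a minimal illustration using only the Python standard library; the function name and the toy values are hypothetical, and the t statistic shown is the usual paired form (mean change divided by its standard error), which would still need to be compared against a t distribution with n − 1 degrees of freedom to obtain a p value.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_transformation(deltas):
    """Summary statistics for the response changes of one transformation.

    deltas: per-pair differences (e.g., change in a log-scale endpoint).
    Requires at least two pairs and a non-zero standard deviation.
    """
    n = len(deltas)
    avg = mean(deltas)
    sd = stdev(deltas)      # sample standard deviation (n - 1 denominator)
    sem = sd / sqrt(n)      # standard error of the mean
    t_stat = avg / sem      # paired t statistic against a zero mean change
    return {"n": n, "mean": avg, "sd": sd, "sem": sem, "t": t_stat}


# Toy response changes for illustration only.
deltas = [0.5, 0.3, 0.7, 0.4, 0.6]
stats = summarize_transformation(deltas)
print(stats)
```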

To provide quick and easily understandable guidance, the effects of a molecular transformation on different endpoints can be summarized by a simple colored arrow or circle that informs medicinal chemists which compounds to synthesize [153]. In addition, the structural transformation information can be summarized as rules in a knowledge database. By querying a compound of interest against the knowledge database with MMP rules in place, virtual compounds can be proposed to determine if the property of interest is likely to improve with the associated structural modification.

MMPA methods have been used to assess the mean effect of different substituents on various ADME parameters such as solubility [143, 144, 154], permeability [147, 149], clearance [149], and CYP inhibition [147]. Not surprisingly, common structural modifications, such as replacing hydrogen with a methyl group or changing a methyl to an ethyl substituent, were the most frequently observed MMPs [149].

In general, the structural changes that displayed favorable changes for an endpoint could also be explained by the associated change in physicochemical properties. For example, Gleeson et al. reported that replacing an aliphatic hydrogen atom with a hydroxyl, ethyl, or benzyl group leads to a decrease in CYP3A4 pIC50 of >0.2 log unit in 55%, 15%, and 10% of MMPs, respectively. This finding correlates well with the change in clog D (pH 7.4) of the substituents [147], meaning that as the compound becomes less lipophilic, it is less likely to be an inhibitor of CYP3A4. This observation is aligned with our internal analysis of trends between lipophilicity and CYP3A4 inhibition (Fig. 4.8).

Leach et al. also found that the addition of heavy halogens on aromatic rings was detrimental to solubility and a numerical estimate for such effects was also calculated. For instance, adding bromine to an aromatic ring led to over an order of magnitude reduction of aqueous solubility [144]. Therefore, if a drug discovery team is trying to increase the solubility of their scaffold, they should avoid adding heavier halogens, such as bromine, to their compounds.

While molecular substitutions that track closely with the molecular properties can be useful in guiding the design of new compounds, they may not be overly insightful to a well-versed medicinal chemist. It is more interesting to identify the substituents that display changes not associated with their physicochemical property changes. For example, despite the considerable increase in lipophilicity caused by phenyl substitution of an aliphatic hydrogen (Δclog D at pH 7.4 of +1.8 log units), the average change in pIC50 of CYP1A2 inhibition for 147 pairs of compounds was quite insignificant (ΔpIC50 of 0.11) [147].

Another type of MMP is the “switch” transformation, which acts to turn the activity on or off. Regardless of the starting value of the endpoint, such an MMP transformation results in approximately the same ending value. For example, it was reported that the replacement of a hydrogen by a 4-piperidine group resulted in a microsomal clearance value of ~20 μL/min/mg for all the studied compounds regardless of the starting microsomal clearance values [149].

One should be aware that MMPA results depend on both the transformation and the chemical context. This is manifested by the observation that although many of the molecular transformations are statistically significant with large mean activity changes, most of them also have high variability [149]. Therefore, drawing conclusions based on the average activity change across the entire MMPA data set may be misleading for the chemical series of interest [143, 147]. For example, global, context-independent MMPA indicated that substituting a pyrimidine for a hydrogen atom increased CYP2C9 inhibition [147]. However, when the same substitution occurred for an aliphatic hydrogen (context dependent), a decrease in CYP2C9 inhibition was observed [147].

Another example also showed the importance of the chemical context for the MMP transformation. It was observed that transforming a piperidine ring into a morpholine ring has conflicting effects on solubility depending on whether the attachment point was a polar aromatic ring or a positively ionizable aliphatic ring (Fig. 4.11) [143]. Several recent publications have proposed adding two-dimensional contextual information about the compound or three-dimensional (3-D) information pertaining to the binding environment into MMPA to address the issue of context dependency [155, 156].
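
The difference between a global and a context-dependent summary can be seen by grouping the response changes of a single transformation by the chemical context of the attachment point. The sketch below uses invented toy values and hypothetical context labels; it does not reproduce the data in the cited publications.

```python
from collections import defaultdict
from statistics import mean


def mean_change_by_context(observations):
    """Global and per-context mean response change for one transformation.

    observations: list of (context_label, delta) tuples.
    """
    by_context = defaultdict(list)
    for context, delta in observations:
        by_context[context].append(delta)
    global_mean = mean(delta for _, delta in observations)
    local_means = {context: mean(values) for context, values in by_context.items()}
    return global_mean, local_means


# Invented solubility changes for illustration only.
observations = [
    ("polar aromatic", 0.6),
    ("polar aromatic", 0.4),
    ("ionizable aliphatic", -0.5),
    ("ionizable aliphatic", -0.3),
]

global_mean, local_means = mean_change_by_context(observations)
print(global_mean, local_means)
```

In this toy case the global mean is close to zero even though each context shows a clear and opposite effect, which is the kind of masking that context-aware MMPA is intended to expose.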

Fig. 4.11 Permission to use from Papadatos et al. [143]. Global and local MMPA distributions for the piperidine to morpholine transformation for a solubility data set. The colors reflect the effect of each transformation with red, amber, and green denoting unfavorable (decrease), zero, and favorable (increase) changes in solubility. Different outcomes are observed depending on the context of the compound; if the attachment point is a polar aromatic ring [V], then there is an increase in solubility, while if the attachment point is a positively ionizable aliphatic ring [Y], then solubility decreases

3 Integrated and Iterative Use of Models in Early Drug Discovery

As mentioned in the introduction to this chapter, the application of in silico, in vitro, and in vivo models is inherent to the drug discovery process. It should be noted that the use of such models in isolation is unlikely to be fruitful and may even be misleading. Therefore, models should be applied in an integrated and iterative fashion to build structure-activity and structure-property knowledge toward identifying the best clinical candidate possible for any given drug discovery project.

Once a scaffold has been identified that interacts with the desired pharmacological target, one needs to select a set of compounds to be tested in vitro in order to assess the applicability of in silico ADME models for that particular scaffold. As depicted in Fig. 4.12, this representative set should span the range of predicted in silico values, include various physicochemical characteristics, and include as much structural diversity as possible in order to systematically evaluate the in silico model(s). While it would be preferred to select “active” compounds against the biological target for this assessment, this is not a requirement. It is more important to focus on including diversity as mentioned above. The in silico-in vitro analyses will help assess whether the in silico model(s) are applicable for a particular scaffold and whether they, along with predicted physicochemical properties, can be used to guide and prioritize the synthesis of compounds. In an analogous manner, it is equally important to explore the relationship between in vitro ADME models and the in vivo profile of compounds in order to select an appropriate suite of in vitro tools to prioritize the selection of compounds for in vivo assessment. This iterative learning cycle (shown in Fig. 4.12) provides an efficient strategy to identify and resolve various challenges related to optimizing compound potency and ADME properties, rather than using a filtration approach where only the active compounds progress to in vitro and in vivo ADME measurements.

Fig. 4.12 Integrated and iterative use of models in early-phase drug discovery. The left schematic shows the recommended process to identify and integrate in silico, in vitro, and in vivo models. The schematic on the right illustrates the importance of the iterative learning cycle

To detail how this integrated and iterative process unfolds in the pharmaceutical industry, consider the following example. The typical goal of most small-molecule drug discovery projects is to identify compounds that can attain, and maintain, sufficient in vivo unbound concentration to engage the pharmacological target following oral dosing. To that end, it is important to balance compound potency with key ADME parameters like solubility, permeability, and clearance from the body. For this example, let us assume that the discovery project team has access to global QSPR models for solubility, permeability, and microsomal stability.

The first step in establishing the in silico-in vitro connectivity is to select a set of compounds from the scaffold, measure them in the corresponding in vitro assays, and compare the measured values with the predictions. This set of compounds should span a range of predicted properties (solubility, permeability, and microsomal stability) and calculated physicochemical properties (e.g., clog P, TPSA), and it should be structurally diverse. This step will determine whether the global ADME QSPR models are applicable to the scaffold in question and whether they provide reasonable predictive performance to enable the prioritization and design of compounds predicted to have a balanced ADME profile in terms of the three ADME endpoints mentioned above.

Before implementing this strategy, it is important to test a small set of compounds spanning a range of measured solubility, permeability, and microsomal stability in the in vivo models to determine whether the oral exposure of these compounds is aligned with their in vitro profile. For example, if the in vivo clearance is rapid for compounds with low microsomal turnover in vitro, it would suggest that the primary clearance mechanism for such compounds is likely to involve non-oxidative pathways and/or excretion via the renal or biliary route. Typically, elimination routes outside the oxidation pathway would not be identified using a microsomal stability assessment (in silico or in vitro). In such cases, one might consider testing the compounds in an in vitro hepatocyte clearance model (which will account for various non-CYP metabolic enzymes) to see if better alignment is observed with in vivo clearance. Once a suitable suite of in silico and in vitro tools that aligns with key in vivo characteristics has been identified, an efficient and robust strategy to integrate these models in an iterative manner can be implemented.
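The in vitro-in vivo check described above can be illustrated with a short Python sketch that flags compounds whose in vivo clearance is rapid despite low microsomal turnover. Everything specific here is an assumption made for illustration: the function name `flag_clearance_disconnects`, the record field names, the cutoff values, and the data are hypothetical, and a real project would set thresholds appropriate to its assay and species.

```python
def flag_clearance_disconnects(compounds, low_turnover_cutoff,
                               high_clearance_cutoff):
    """Return ids of compounds with low in vitro microsomal turnover
    but rapid in vivo clearance.

    compounds: list of dicts with hypothetical keys
        "id"               compound identifier
        "microsomal_clint" in vitro microsomal intrinsic clearance
        "in_vivo_cl"       measured in vivo clearance
    Both cutoffs are project-specific assumptions, in the same units
    as the corresponding field.
    """
    flagged = []
    for c in compounds:
        low_turnover = c["microsomal_clint"] <= low_turnover_cutoff
        rapid_clearance = c["in_vivo_cl"] >= high_clearance_cutoff
        if low_turnover and rapid_clearance:
            flagged.append(c["id"])
    return flagged


if __name__ == "__main__":
    # Hypothetical values, not measurements.
    data = [
        {"id": "A", "microsomal_clint": 5, "in_vivo_cl": 60},
        {"id": "B", "microsomal_clint": 5, "in_vivo_cl": 10},
        {"id": "C", "microsomal_clint": 80, "in_vivo_cl": 70},
    ]
    print(flag_clearance_disconnects(data, low_turnover_cutoff=10,
                                     high_clearance_cutoff=50))
```

Flagged compounds are candidates for follow-up in a hepatocyte clearance model or for an investigation of renal or biliary excretion, as discussed above.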

4 Summary

In this chapter, a variety of structure- and ligand-based in silico methods used to identify and resolve challenges related to the optimization of key ADME properties have been described. Given the promiscuity of many ADME targets and the limited availability of high-resolution 3-D structures, structure-based in silico techniques like docking and MD simulation face significant challenges and therefore have limited applicability for this purpose. Ligand-based in silico methods such as pharmacophore models can be useful to identify key structural features responsible for the interaction with the target of interest. However, due to the broad ligand specificity of many ADME targets and the likelihood of multiple binding sites (e.g., P-glycoprotein), pharmacophore models also have limited prospective applicability across structurally diverse chemical scaffolds.

QSPR models, especially machine learning models, can extract knowledge from a wide variety of chemical scaffolds and a large number of compounds, which enables their use as predictive models for many ADME endpoints. Not surprisingly, QSPR models are among the most commonly employed in silico tools for ADME optimization during the drug discovery process, especially in an industrial setting where a large number of structurally diverse compounds are routinely measured in a variety of ADME assays. At the same time, QSPR models have limited interpretability and thus typically do not provide direct clues for designing new compounds to address ADME challenges.

To address that limitation of QSPR models, trends with calculated physicochemical properties like molecular weight, clog P, TPSA, and others are effectively utilized during the design process to optimize the ADME characteristics of a given chemical scaffold. Similarly, knowledge extracted by the MMPA of existing ADME data also provides clues that identify fragment replacements toward improving the ADME characteristics.

To summarize, an effective amalgamation of in silico tools is valuable in guiding the design of compounds with favorable ADME properties on a drug discovery project. These models must be verified to show that they provide valid predictions; otherwise, the integrated in silico-in vitro-in vivo cycle breaks down. Finally, in silico tools should never be used in isolation. They make up one arm of the integrated and iterative learning cycle that we recommend using in order to effectively drive a drug discovery project.