1 Introduction

The advancement of a new chemical entity (NCE) to a drug candidate is a slow, complex, expensive and multi-task process. Along this long road, identification of the disease and the isolation and validation of the molecular target(s) are the first crucial steps. Next, the right drug candidates to interact with the validated target are designed, synthesized and tested for their preclinical and clinical efficacy and safety (Satyanarayanajois 2011; Speck-Planche and Cordeiro 2015). Despite the great advances in science and technology, this process can take around 15 years with a cost of hundreds of millions of dollars (Paul et al. 2010). In fact, much of this cost comes from failures, which account for 75% of the total drug discovery and development expenses. On the other hand, such failures, if appropriately consolidated, contribute to the body of knowledge on biological complexity.

To prevent late-stage project interruptions, research has shifted towards reducing uncertainties and obtaining a proof of concept (POC) for a molecule as a potential medicine in earlier phases of development. Thus, investigation of the fate of a molecule in the organism, considering appropriate pharmacokinetics as well as safety and adverse reaction profiles, should advance in parallel with affinity for the target receptor(s) (Gaviraghi et al. 2001; Swift and Amaro 2013). The fate of drug molecules within the organism is principally controlled by ADME properties, which stand for absorption, distribution, metabolism and elimination (Rogge and Taft 2010; Testa et al. 2005b). Poor absorption and the resulting poor bioavailability were in the past among the main reasons for the failure of drug candidates. According to more recent statistics, the most important issues to be confronted are drug efficacy and drug safety, associated mainly with plasma protein binding, metabolism and off-target activity (Kola and Landis 2004).

Computer-aided approaches and chemoinformatics, applied during the different stages of the pipeline, permit an effective handling of such failures and uncertainties, facilitate candidate selection and speed up their long journey to the market. Reliable models obtained by Quantitative Structure-Activity Relationships (QSAR) and Quantitative Structure-Property Relationships (QSPR) offer decision support upon rationalizing the drug discovery procedure in line with the Quick Win, Fast Fail concept, allowing a pre-selection of compounds with more chances to succeed in later phases (Owens et al. 2015). In this context, a new scientific area has emerged, defined as pharmacoinformatics, which enables the management of all available information from binding to kinetics and toxicity for safer drug candidates (Goldmann et al. 2014).

In fact, successful drug candidates usually represent a compromise between the numerous, sometimes competing objectives so that the advantages for patients outweigh potential drawbacks and risks. However, in order to benefit from QSAR/QSPR models, the appropriate criteria for their evaluation and thereupon their proper use and/or interpretation are essential. Such criteria as well as the ultimate goal of the models may differ according to the timeline and the particular process modeled.

The present chapter provides an outline of the philosophy, the state of the art and the strategies for QSAR/QSPR generation. The distinction between QSAR and QSPR is primarily associated with the traditional drug design steps, concerning lead optimization for efficient receptor binding and predictions of pharmacokinetic/toxicity properties, respectively. After an overview of the common features of in silico modeling, QSAR models for pharmacodynamic properties, e.g., binding to target receptor(s) or off-target proteins, and QSPR models for pharmacokinetic processes (ADME properties) are discussed in separate sections. According to the underlying mechanism, QSPR models comprise both models for passive phenomena and models for binding to proteins. In all cases, two critical interdependent issues are addressed throughout the chapter: (i) the value of global models, built on large and chemically diverse datasets, and that of local models, built specifically for a series or project, and (ii) the importance or not of model interpretability (Cox et al. 2013; Fujita and Winkler 2016).

2 Historical Aspects and Background

Early QSAR studies were based on the assumption that biological activity can be quantitatively expressed as a function of chemical structure (Brown and Fraser 1868). They involved the establishment of model equations in order to understand and, if possible, to predict biological activity on the basis of structural parameters, as expressed by an equation of type (1).

$$ \text{Biological activity} = a_{0} + a_{1}P_{1} + a_{2}P_{2} + \cdots + a_{n}P_{n} $$
(1)

where P1…Pn are physicochemical/molecular properties characterizing the compound structures and a0, a1…an are the constants derived by multiple linear regression analysis (Hansch et al. 1995b; Hansch and Fujita 1964; Martin 1978).
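As a minimal sketch of how the constants in Eq. (1) can be obtained, the following Python snippet estimates a0…an by ordinary least squares with NumPy. The descriptor values and activities are synthetic placeholders invented for illustration, not data from any cited study, and `fit_mlr` is a hypothetical helper name.

```python
import numpy as np

def fit_mlr(P, activity):
    """Least-squares estimate of a0..an in Eq. (1).

    P        : (m, n) array, one row per compound, one column per property
    activity : (m,) array of measured biological activities
    Returns the coefficient vector [a0, a1, ..., an].
    """
    P = np.asarray(P, dtype=float)
    y = np.asarray(activity, dtype=float)
    X = np.column_stack([np.ones(len(P)), P])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic, noise-free example: activity = 1.0 + 0.5*P1 - 2.0*P2
P = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0], [4.0, 0.5]])
activity = 1.0 + 0.5 * P[:, 0] - 2.0 * P[:, 1]
coef = fit_mlr(P, activity)  # recovers a0, a1, a2 up to rounding
```

With real data the residuals are not zero, and the fitted constants should be judged together with the statistical criteria used for model evaluation.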

Although biological activity was not always considered at the molecular level, it was recognized as an essential prerequisite that the compounds analyzed should act at the same receptor and with the same mechanism of action. Within a congeneric series it was assumed that all other factors influencing the manifestation of drug action should have similar impact. In regard to the description of chemical structure, the well-known Hansch analysis recognized three major categories of physicochemical parameters, namely lipophilicity, electronic properties and steric (geometric) properties (Eq. 2).

$$ \log BR = -a(\log P)^{2} + b\log P + \rho\sigma + \delta E_{s} + c $$
(2)

where logBR is a general expression for biological activity in its logarithmic form, to be linearly related to free energy, logP is the logarithm of the octanol-water partition coefficient, the widely accepted measure of lipophilicity, σ is Hammett’s electronic substituent constant and Es is Taft’s steric substituent constant (Hansch 1969; Hansch and Fujita 1964).

Evidently, early QSAR models could be developed only for congeneric compounds, having a common skeleton and different substituents. In those models, lipophilicity was considered as the physicochemical property of primary importance, since it was understood to influence both pharmacokinetics and pharmacodynamics (Kubinyi 1979; Leo et al. 1971; Pliška et al. 1996; Van de Waterbeemd and Testa 1987). A parabolic relationship between lipophilicity and membrane passage was assumed; thus the quadratic term in Eq. 2 reflects transport to the active site, considering all other pharmacokinetic issues equal within a congeneric series (Hansch and Clayton 1973). Since the parabolic relationship between potency and logP did not fit all datasets, Kubinyi proposed a bilinear relationship, which allows for different slopes at low and high logP values (Kubinyi and Kehrhahn 1978). At the same time, calculation methods for logP were developed, based on the additivity principle. The hydrophobic substituent constant π and, soon afterwards, the hydrophobic fragmental constant f, or their sums Σπ and Σf accounting for all substituents/fragments on the parent structure, could replace logP of the whole molecule, in line with the other substituent constants in Hansch analysis (Hansch and Leo 1979; Rekker and Mannhold 1992).
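As a short worked step (added here for illustration), holding the electronic and steric terms of Eq. 2 constant and differentiating with respect to logP gives the optimum lipophilicity implied by the parabolic model, assuming a > 0:

$$ \frac{\partial \log BR}{\partial \log P} = -2a\log P + b = 0 \quad\Rightarrow\quad \log P_{\text{opt}} = \frac{b}{2a} $$

The bilinear model is commonly written in the following form, where the symbol β (not defined in the text above) is an additional nonlinear parameter fitted together with a, b and c:

$$ \log BR = a\log P - b\log(\beta P + 1) + c $$

For small P the slope with respect to logP is approximately a, while for large P it approaches a − b, which is how the model accommodates different slopes on the two sides of the optimum.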

In fact, Hansch analysis, first applied in agrochemistry, drug design, toxicology, industrial and environmental chemistry (Dunn 1988; Hansch et al. 1995a, 1963; Muir et al. 1967), marked a breakthrough in the way of thinking in medicinal chemistry and the start of the new discipline of QSAR (Ganellin 2004), with the mission of exploiting the increasing amount of information in order to facilitate drug discovery.

Since those early days, QSAR has undergone a tremendous evolution in regard to all aspects, the target end points, the structural representation, the implemented statistical tools, as well as its own standpoints (Cherkasov et al. 2014; Cramer 2012; Puzyn et al. 2010; Tsantili-Kakoulidou and Agrafiotis 2011). In view of biological complexity QSAR has adapted to the multi-task concept, taking advantage of technological achievements and moving from the perception of single-objective drug design to the multi-objective drug discovery and development (Fujita and Winkler 2016; Jorgensen 2004; Speck-Planche and Cordeiro 2015). The multiple tasks addressed by QSAR/QSPR and the tools implemented to construct the models are illustrated in Fig. 1.

Fig. 1 Tasks addressed by QSAR/QSPR and tools implemented in model construction

Thus, QSAR/QSPR models are generated to address two goals, each of which has its own value. One goal is to establish models which provide insight into the properties or chemical features that correlate with a biological assay and thereupon an understanding of the mechanism of action. Such models are valuable support for the design of novel compounds with affinity to a target protein. The second goal is to create models which provide accurate predictions for large, chemically diverse datasets and address a variety of biological endpoints, as well as different pharmacokinetic processes. Such models allow ranking of compounds prior to synthesis or setting priorities among drug candidates for proceeding to further development (Birchall et al. 2008a, b; Nicolotti et al. 2002).

3 Experimental Data and Endpoints in QSAR/QSPR

The multi-objective QSAR starts with data analysis for hit identification, followed by hit-to-lead optimization (lead discovery) and lead optimization (Jorgensen 2009). For hit identification, virtual screening has gained a crucial role, as a consequence also of the continuous emergence of novel biological targets (Schneider 2010; Vasudevan and Churchill 2009). QSAR end-points are usually measured at the molecular or cellular level. The advent of robotized biological testing in the 1990s (Ashour et al. 1987; Houston and Banks 1997; Löfås and Johnsson 1990; Navratilova et al. 2007) has led to the creation of large databases, freely accessible in the public domain, which incorporate millions of compounds with associated bioactivities. PubChem (https://pubchem.ncbi.nlm.nih.gov) and ChemSpider (http://www.chemspider.com), the two major collections of chemical structures on the web, currently include over 30 million compounds each. ZINC (http://zinc.docking.org), a database frequently used for virtual screening applications, incorporates a total of approximately 21 million compounds (Irwin 2008; Moura Barbosa and Del Rio 2012; Wang et al. 2012). In such databases the results of many screens are presented in the form of scores for many compounds on a given assay, while they also contain information on the structures of compounds and the target of particular assays. More detailed data about binding assays can also be found in Binding Database (www.bindingdb.org) which is a public web-accessible database of measured binding affinities containing more than 1 million binding data for nearly 500,000 small molecules and thousands of proteins (Gilson et al. 2016).

However, these databases should be used with caution, since they may include inconsistencies concerning both chemical and biological data, while the chemical structures may be inaccurate or presented in an inconsistent way. Therefore, curation of the data sets is recognized as a critical step for the establishment of good quality models (Akhondi et al. 2012; Cherkasov et al. 2014).

More to the point, there are databases with sets of inactive compounds (decoys) for several biological targets together with a small set of known active compounds (Mysinger et al. 2012) or even software to produce decoy datasets based on similarity with known active compounds (Cereto-Massagué et al. 2012). Decoy data sets are useful for validation of the QSAR/QSPR models.

When searching in structural databases for experimental binding affinities, one may find different types of biological data. They may be expressed as a continuous response such as IC50, EC50, Ki, Kd, % inhibition, etc., or as a categorical response, e.g., active/inactive. Continuous response values are preferably used as their negative logarithms, so as to be in linear correlation with free energy. In line with this concept, the ChEMBL database introduced the pChEMBL activity value, defined as −log(IC50, XC50, EC50, AC50, Ki, Kd or Potency) in M units (Papadatos et al. 2015). This value allows a number of roughly comparable measures of half-maximal response concentration/potency/affinity to be compared on a negative logarithmic scale (https://www.ebi.ac.uk/chembl/faq#faq67). This approach has also been implemented in software for large scale off-target pharmacology and predictive safety of small molecules such as CTLink (http://www.chemotargets.com).
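The negative-logarithm transformation described above can be sketched as follows. This is only an illustration of the arithmetic, not the ChEMBL implementation; the function name `p_activity` and the unit table are assumptions introduced here.

```python
import math

# Multipliers that convert a concentration to mol/L (M).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def p_activity(value, unit="nM"):
    """Return -log10 of a concentration-type endpoint (IC50, Ki, ...) in M."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])

# An IC50 of 10 nM corresponds to 1e-8 M, i.e. a value of about 8.
example = p_activity(10.0, "nM")
```

On this scale a tenfold gain in potency raises the value by one unit, which is what makes endpoints of different magnitude roughly comparable.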

Besides the compound databases, there is also a wealth of deposited gene expression data available for downloading and/or online interrogation. For example, the NCBI gene expression omnibus (GEO) (Barrett et al. 2007) hosts over half a million single array chip expression profiles and the EBI hosts the Array Express database (Parkinson et al. 2010) with a similar, largely overlapping number of arrays. Gene expression-based screening (GE-HTS) represents a strategy for identifying modulators of biological processes with little a priori information about their underlying mechanisms. It is mainly used in cancer research, where it detects compounds that may revert undesired oncogenic states to nonmalignant or drug-sensitive states (Evans and Guy 2004; Williams 2012). It is evident that for the screening procedure good prediction models are necessary, complying with the second goal as described in Sect. 2. In such cases model interpretability is not a priority. In contrast, the transition from hit identification to lead discovery and optimization requires models which provide an understanding of the molecular factors involved and a sound physicochemical interpretation, while in-house affinity measurements of the novel compounds are used as endpoints.

The range of affinity values is a crucial issue for model construction. Generally, it should be significantly greater than the experimental error among the biological data. Considering that such errors can often exceed half a log unit (Gedeck et al. 2006), an endpoint value range of at least 1.0 log unit is recommended to obtain a reasonable QSAR model (Cherkasov et al. 2014).
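A quick check of this recommendation before model building might look like the sketch below. The function name and the default thresholds (1.0 log unit for the range, 0.5 log unit for the experimental error) are taken from the figures quoted above and used here as assumptions; appropriate values depend on the assay.

```python
def endpoint_range_ok(p_values, min_range=1.0, exp_error=0.5):
    """Check that the spread of log-scale endpoints is wide enough to model.

    p_values  : iterable of endpoint values on a log scale (e.g. pIC50)
    min_range : minimum acceptable spread in log units
    exp_error : assumed experimental error in log units
    """
    values = list(p_values)
    spread = max(values) - min(values)
    return spread >= min_range and spread > exp_error

wide = endpoint_range_ok([5.1, 6.0, 7.4])    # spread of 2.3 log units
narrow = endpoint_range_ok([6.0, 6.3, 6.6])  # spread of about 0.6 log units
```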

Lead optimization in regard to other pharmaceutical properties, while maintaining affinity, is the next important step. This is a multi-objective process involving many experimental parameters (assays) related to physicochemical properties, ADME properties, plasma and tissue protein binding, target selectivity, off-target activities and toxicity. These properties considerably influence the efficacy and safety of drug candidates and are potential causes for attrition. Rapid in vitro measurements have been and are being developed for permeability and for plasma protein binding assessment, and toxicity protocols have been established (Artursson et al. 2001; Kansy et al. 1998; Kariv et al. 2001; Rich and Myszka 2000). On the other hand, there are many efforts for in silico prediction of many of these endpoints by constructing appropriate QSARs or QSPRs (A Cabrera-Perez et al. 2012; Dearden 2007; Lambrinidis et al. 2015; Swift and Amaro 2013). Certain global models for toxicity predictions are approved by OECD and provide support to regulatory authorities (Larregieu and Benet 2013). More to the point, predictions on secondary targets may be useful for the safety profile as well as for drug repurposing (Hodos et al. 2016; Sheridan et al. 2015). The implementation of QSAR/QSPR in the complex drug discovery process is demonstrated in Fig. 2.

Fig. 2 Implementation of QSAR/QSPR in the drug discovery process

The splitting of QSAR models to encompass various areas of biological complexity has challenged the development of workflows which integrate QSAR/QSPR models of selected endpoints, including affinities for different target proteins/off-targets and pharmacokinetic data (Cartmell et al. 2005). Consensus predictions using all acceptable models may contribute to further decisions in selecting future experimental screening sets. In inductive knowledge transfer approaches, which treat multi-task modeling, the individual QSAR models are not considered separately but are viewed as nodes in a network of inter-related models (Cherkasov et al. 2014; Qiu et al. 2016). Evidently, the quality of such integrated models largely depends on the quality of the available experimental data compiled in relevant databases, which should be carefully curated, as well as on the range of endpoint values, as already commented (Cherkasov et al. 2014; Gedeck et al. 2006). Whether interpretability of such models is a prerequisite depends on their purpose and on the stage of the drug discovery process at which they are used. In regard to toxicity, for QSAR models to be accepted for regulatory purposes, interpretability is often a crucial issue. According to OECD, “To facilitate the consideration of a QSAR model for regulatory purposes, it should be associated with … a mechanistic interpretation, if possible” (www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf).

4 Tools Implemented in Model Construction

4.1 Molecular Structure Representation-Descriptors

Molecular structures are represented by descriptors which mediate their relation with activity. Thus, molecular descriptors are at the core of QSAR modeling.

In line with the definition of Todeschini and Consonni (2009), molecular representation has moved forward from substituent constants to variables suitable to portray diverse molecules belonging to different chemical classes. A variety of software calculates a large number of different physicochemical/molecular properties and theoretical descriptors, starting from SMILES, 2D chemical graphs or 3D x, y, z coordinates, or based on mathematical algorithms or statistics. Some of the most popular software packages are DRAGON, which calculates more than 4000 descriptors (http://www.talete.mi.it/products/dragon_description.htm), ADAPT (Stuper and Jurs 1976) (http://research.chem.psu.edu/pcjgroup/adapt.html), OASIS (Mekenyan and Bonchev 1986), CODESSA (Katritzky et al. 1994), MOE-Chemical Computing Group (https://www.chemcomp.com/) and MolConnZ (http://www.edusoft-lc.com/molconn/).

According to molecular structure representation, descriptors may reflect various levels of dimensionality, ranging from 0D to 4D and xD. 0D descriptors are based on the molecular formula and are independent of molecular connectivity and conformation. 1D descriptors reflect the substructure representation of a molecule, 2D descriptors are based on the two-dimensional structural formula, while 3D descriptors are conformation dependent. 3D descriptors are based on the thermodynamically favored conformation and necessitate geometry optimization. 4D descriptors reflect interactions with some probe within a grid, while higher-dimensional (xD) descriptors are receptor dependent. They represent each ligand molecule as an ensemble of conformations, orientations, tautomeric forms and protonation states (Ekins et al. 1999; Hopfinger et al. 1997; Vedani et al. 2000, 2005). Using enhanced molecular dynamics simulations, the overall conformational change of the receptor upon ligand binding can be simulated, producing more vital structural descriptors (Sohn et al. 2013). Such approaches can be considered as a promising link between structure-based and ligand-based strategies (Polanski 2009; Caporuscio and Tafi 2011). An atlas of the available descriptors, the theory used for their calculation and their information content has been compiled by Todeschini and Consonni (2009). In Table 1, a classification of representative descriptors is presented.

Table 1 Classification of representative descriptors according to the theory used

Among the physicochemical descriptors, logP keeps its central role in drug-protein and drug-membrane interactions, as well as in permeability models. Nowadays, there are many algorithms for logP or logD calculation, implemented in relevant software. They are based on the additivity principle and have been developed upon analysis of a large amount of experimental data (Mannhold and Dross 1996). More to the point, calculation of logD necessitates knowledge of pKa, while charge is also a crucial determinant in drug action (Csizmadia et al. 1997). Actually, most of the logP, pKa and solubility prediction algorithms are QSPR models per se. Some global logP models are implemented in software workflows which allow users to supply their own compound library as input in order to refine predictions (Tetko et al. 2001). A comprehensive description and classification of the logP/logD calculation systems and software is provided by Mannhold et al. (2009). Among them, ClogP is often considered as a reference calculation system, while it has been included in most rules for druglikeness (see Sect. 5). Some software for logP/logD prediction is freely available on the web.

Despite the large arsenal of available software, the correct selection for logP/logD prediction is not always easy, since often the outcome of the different algorithms shows considerable variations. Although this is not an issue for models intended to screen large compound libraries, it becomes crucial for local models established for lead optimization or for predictions within congeneric compounds (Chrysanthakopoulos et al. 2009; de Melo et al. 2009). In such cases it is important that the compounds analyzed fall within the applicability domain of the training set, used to construct the prediction algorithm (see Sect. 4.3) (Tetko et al. 2009).

Next to lipophilicity, other molecular properties such as molecular volume and surface area, polarizability, molar refractivity, polarity descriptors, dipole moments, hydrogen bond acidity/basicity, as well as quantum chemical descriptors, including energy parameters like EHOMO and ELUMO, maximum and minimum electrostatic potentials, partial charges etc., are most commonly used by medicinal chemists. Such descriptors, considered as “well-founded”, actually fall within the frame of the three categories: lipophilicity, electronic and geometric descriptors, reflecting the recognition forces and steric requirements of binding to the receptor active site. Thus they provide insight into the mechanism of action. More to the point, easily calculated physicochemical and molecular properties have created the basis for the development of the drug-like concept (see Sect. 4.1.1).

On the other hand, theoretical descriptors may be considered to reflect a direct detailed representation of molecular structure. However they are not easily interpretable and they do not provide a straightforward perception of the mechanism of action. Their use in QSAR/QSPR models is often faced with some skepticism and their contribution to model quality and validity performance compared to classical descriptors has been questioned in the case of lead optimization (Vallianatou et al. 2013). However, it is true that in some cases the most predictive model may not be the most interpretable (Birchall et al. 2008a, b; Nicolotti et al. 2002). The value of models with high prediction accuracy but low interpretability has already been discussed in Sect. 3.

To obtain information about molecular structure from QSAR/QSPR models with low interpretability, a procedure called reversible decoding or inverse QSAR is being developed. Topological and molecular signature descriptors are considered to be more suitable for inverse QSAR/QSPR (Faulon et al. 2005; Gozalbes et al. 2002).

Moreover, sub-structural descriptors and molecular fingerprints are important to establish similarity/diversity approaches, which gain increasing interest within the scientific community (Willett 2004). Such approaches are widely used for virtual screening and design of chemical libraries, which aid in the primary identification of promising hits.
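As an illustration of a fingerprint-based similarity measure, the sketch below computes the Tanimoto coefficient, one widely used choice that is not named in the text above and is introduced here as an assumption. Fingerprints are represented simply as sets of "on" bit positions; the bit values are invented and do not correspond to real molecules.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints (bit positions are placeholders).
query = {1, 2, 3, 4}
library = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {3, 4, 5, 6}, "cpd_C": {7, 8}}
ranking = rank_by_similarity(query, library)
```

In a virtual screening setting the library would hold fingerprints generated by a cheminformatics toolkit, and the ranked list would be cut at a similarity threshold or a fixed number of top hits.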

Recently, chemical similarity between molecules has been extended to the evaluation of clinical effects, when combined with information derived from computing similarity based upon lexical analysis of patient package inserts. It is expected that drugs with high structural similarity (both by 2D and 3D comparison) are much more likely to have significant overlap of their clinical effects, compared to drugs that are structurally different (low 2D similarity but high 3D similarity) (Yera et al. 2014). However, in the search for new candidates, chemical similarity does not always lead to biological similarity. The structure-activity landscape may present so-called activity cliffs. Such discontinuities cannot be predicted by statistically derived QSAR models (Guha 2011).

In the case of toxicity predictions, the incorporation of biodescriptors (short-term assays) as independent variables is suggested. Such descriptors are derived by in vitro quantitative high-throughput screening (qHTS) and, in combination with chemical descriptors, lead to hybrid models, which may exhibit higher accuracy (Sedykh et al. 2011).

Gene expression signatures of a desired biological state, derived from gene expression data are used to screen a compound library to identify compounds that induce this target signature and corresponding phenotype, while they may also be used as descriptors (Hieronymus et al. 2006; Stegmaier et al. 2004).

4.1.1 Drug-Like Filtering

The use of combinatorial methods during the last 30 years has produced a vast number of compounds, which tend to be more lipophilic, less soluble and of higher molecular weight than conventional drug entities (Hertzberg and Pope 2000). Such properties are often associated with unfavorable absorption, poor or inconsistent bioavailability, as well as with lack of selectivity and increased toxicity (Oprea 2000). To face this situation the concept of druglikeness was launched, defining boundaries on the chemical space and functioning as a filter to guarantee a physicochemical profile enabling further development (Leeson and Springthorpe 2007; Yusof et al. 2013). Druglikeness provides useful guidelines for early stage drug discovery, following simple rules of thumb, which suggest cut-off values or ranges for certain properties. According to the rule of 5 (RoF), molecular weight (MW) should not exceed 500 Da, calculated lipophilicity (clogP) should not exceed 5, hydrogen bond donor sites (HBD) should not be more than 5, and hydrogen bond acceptor (HBA) sites not more than 10. When two of these limits are violated, bioavailability problems may occur in the case of orally administered drugs (Lipinski et al. 1997). RoF was further extended to include cutoff values or ranges for additional properties, the most common being: Polar Surface Area (PSA) < 140 Å², number of rotatable bonds (ROTB) < 10, Molar Refractivity (MR) in the range of 40–130, number of aromatic rings (AROM) < 3, total number of atoms in the range of 20–70 (Ursu et al. 2011; Veber et al. 2002). Lipophilicity is also related to safety endpoints. Increased relative risk (6:1) for an adverse event may be anticipated for compounds possessing high lipophilicity (ClogP > 3) and low topological polar surface area (TPSA < 75 Å²) (Hughes et al. 2008). It is also reported that for ClogP > 3 there is a dramatically higher risk for hERG channel inhibition, an endpoint associated with cardiotoxicity (Wager et al. 2011).
Stricter cutoff values are proposed for compounds intended to act in the Central Nervous System (CNS-likeness; Pajouhesh and Lenz 2005). A quantitative estimate of drug-likeness (QED) has been proposed by Bickerton et al. (2012), which relates the similarity of a compound’s properties to those of oral drugs based on eight commonly used molecular properties: MW, logP, HBDs, HBAs, PSA, ROTBs, AROMs and count of alerts for undesirable substructures.
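The rule of 5 described above can be applied as a simple filter once the four properties have been calculated by any descriptor software. The sketch below assumes the properties are already available in a dictionary; the key names, the helper functions and the example values are hypothetical.

```python
# Upper limits of the rule of 5 as stated in the text.
RO5_LIMITS = {"mw": 500.0, "clogp": 5.0, "hbd": 5, "hba": 10}

def ro5_violations(props):
    """Return the names of the rule-of-5 properties that exceed their limit."""
    return [name for name, limit in RO5_LIMITS.items() if props[name] > limit]

def passes_ro5(props, max_violations=1):
    """Flag a compound as acceptable if it has at most max_violations."""
    return len(ro5_violations(props)) <= max_violations

# Invented property values for two hypothetical compounds.
small = {"mw": 350.4, "clogp": 2.1, "hbd": 2, "hba": 5}
large = {"mw": 620.7, "clogp": 6.2, "hbd": 3, "hba": 8}
small_ok = passes_ro5(small)   # no limit exceeded
large_ok = passes_ro5(large)   # MW and clogP both exceed their limits
```

Tolerating one violation by default reflects the statement that problems are expected when two limits are violated; a stricter or looser policy is a single-parameter change.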

For lead compounds the rule of 3 is suggested, according to which MW < 300, logP ≤ 3, HBD ≤ 3 and HBA ≤ 3 (Congreve et al. 2003). The rule of 3 is applicable mainly to fragment-based lead generation.

The rules of thumb are very simple and understandable; however, they do not take into account inaccuracies in the prediction of logP and, more importantly, they do not consider the demands of the receptor. For instance, receptors of the PPAR family possess a very large hydrophobic cavity in their active center, requiring lipophilic ligands with high molecular weight, which in many cases violate two limits of the rule of 5 (Giaginis et al. 2008, 2007). Target-specific lipophilicity profiles, obtained through calculation of the logP and logD of ligand series for different receptors, have recently been investigated, revealing other targets for which the compound libraries had mean logP ≥ 5, i.e., outside the traditional RoF space with respect to lipophilicity. Such knowledge in the early stages of drug development is very useful for the formulation strategy at a later stage (Bergström et al. 2016).

The advantages of smaller and less lipophilic compounds as safer and more selective drug candidates were further recognized in terms of receptor binding. In metrics such as ligand efficiency (LE) and ligand lipophilicity efficiency (LLE), affinity is normalized against molecular size, expressed as the number of heavy atoms, or against lipophilicity, respectively (Abad-Zapatero 2007; Hopkins et al. 2014). Ligand efficiency dependent lipophilicity (LELP) takes both lipophilicity and molecular size into consideration by dividing logP (clogP) by LE (Tarcsay et al. 2012). In terms of thermodynamics, according to the above metrics, drug-receptor binding should be optimized with regard to the enthalpic component through specific interactions. Such metrics may be used to prioritize drug candidates with nearly equal potency (Hann 2011; Leeson and Springthorpe 2007).
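The three metrics can be sketched in a few lines. The definitions below follow commonly used conventions and are assumptions insofar as the text above does not give explicit formulas: LE is taken as 1.37 × pActivity divided by the heavy atom count (the factor 1.37 kcal/mol approximates 2.303·RT near 300 K), LLE as pActivity − logP, and LELP as logP divided by LE. All input values are invented.

```python
def ligand_efficiency(p_activity, heavy_atoms, rt_ln10=1.37):
    """LE in kcal/mol per heavy atom (free energy approximated from pActivity)."""
    return rt_ln10 * p_activity / heavy_atoms

def ligand_lipophilicity_efficiency(p_activity, logp):
    """LLE: potency corrected for lipophilicity."""
    return p_activity - logp

def lelp(logp, p_activity, heavy_atoms):
    """LELP: lipophilicity divided by ligand efficiency."""
    return logp / ligand_efficiency(p_activity, heavy_atoms)

# Two hypothetical candidates with equal potency (pIC50 = 8.0).
cand_1 = {"p": 8.0, "logp": 3.0, "ha": 25}
cand_2 = {"p": 8.0, "logp": 5.0, "ha": 40}
for c in (cand_1, cand_2):
    c["LE"] = ligand_efficiency(c["p"], c["ha"])
    c["LLE"] = ligand_lipophilicity_efficiency(c["p"], c["logp"])
    c["LELP"] = lelp(c["logp"], c["p"], c["ha"])
```

Although both candidates are equally potent, the smaller and less lipophilic one obtains the higher LE and LLE and the lower LELP, which is the kind of prioritization the metrics are intended to support.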

An update on recent applications of efficiency metrics and on strategies to control drug-like properties and to replace problematic elements for improving drug design has recently been published by Meanwell (2016).

4.2 Modeling Techniques

Statistical tools mediate the relationship between structural descriptors and the response variable(s), leading either to regression or to classification models. Model building methods are incorporated in different software packages (Bruce et al. 2007). Multiple linear regression (MLR) analysis is a simple and still widely used technique, which however can handle only a limited number of variables. Thus, as a first step, variable selection methods are applied to reduce the large number of calculated descriptors to a set which is information rich but as small as possible. Redundant descriptors and descriptors which show low variance and/or collinearity are removed. For further descriptor reduction, stepwise regression approaches are commonly used, with the drawback, however, that they are local search processes and may converge to local optima (Paterlini and Minerva 2010).
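The first reduction step, removal of low-variance and collinear descriptors, can be sketched as below. The thresholds (`var_tol`, `corr_max`) and the greedy keep-the-first policy are assumptions made for illustration; the descriptor matrix is synthetic.

```python
import numpy as np

def prefilter_descriptors(X, var_tol=1e-8, corr_max=0.95):
    """Return indices of columns kept after variance and collinearity filters."""
    X = np.asarray(X, dtype=float)
    # Step 1: drop (near-)constant descriptors.
    candidates = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    # Step 2: greedily keep a descriptor only if it is not highly
    # correlated with any descriptor that has already been kept.
    selected = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_max
               for k in selected):
            selected.append(j)
    return selected

d0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([
    d0,                                # informative descriptor
    np.full(5, 7.0),                   # constant, removed in step 1
    2.0 * d0 + 1.0,                    # collinear with d0, removed in step 2
    np.array([2.0, -1.0, 0.0, 3.0, 1.0]),
])
kept = prefilter_descriptors(X)        # columns 0 and 3 survive
```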

A promising alternative for variable selection is the use of genetic algorithms (GA). GAs explore the descriptor space simultaneously by a population of candidate solutions which compete and recombine, mimicking the process of natural selection (Mitchell 1998).
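A deliberately small sketch of GA-based descriptor selection is given below. Each candidate solution is a boolean mask over the descriptors and its fitness is the adjusted R2 of an ordinary least-squares fit; tournament selection, uniform crossover, bit-flip mutation and elitism are design choices assumed here for illustration, and the data are synthetic. Production implementations typically use cross-validated fitness functions.

```python
import numpy as np

def adjusted_r2(X, y, mask):
    """Adjusted R^2 of an OLS fit using only the descriptors flagged in mask."""
    k, m = int(mask.sum()), len(y)
    if k == 0 or m - k - 1 <= 0:
        return -np.inf
    A = np.column_stack([np.ones(m), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (m - 1) / (m - k - 1)

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Evolve boolean descriptor masks; return (best_mask, best_fitness)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5           # random initial masks
    for _ in range(generations):
        fit = np.array([adjusted_r2(X, y, ind) for ind in pop])
        children = [pop[np.argmax(fit)].copy()]     # elitism: keep the best
        while len(children) < pop_size:
            parents = []
            for _ in range(2):
                # Tournament: the fitter of two random individuals is chosen.
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] >= fit[j] else pop[j])
            take_first = rng.random(n) < 0.5        # uniform crossover
            child = np.where(take_first, parents[0], parents[1])
            child ^= rng.random(n) < p_mut          # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    fit = np.array([adjusted_r2(X, y, ind) for ind in pop])
    return pop[np.argmax(fit)], float(fit.max())

# Synthetic data: only descriptors 0 and 3 carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * data_rng.normal(size=60)
best_mask, best_fit = ga_select(X, y)
```

Because the fitness here rewards in-sample fit only, a few uninformative descriptors may survive alongside the informative ones; penalizing model size more strongly or using a validation-based fitness reduces this.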

Reduction of the descriptor space is inherent in multivariate data analysis (MVDA), a popular statistical technique which permits the simultaneous (not one at a time) treatment of a large number of descriptors, while tolerating inter-relation between them (Eriksson et al. 2001; Wold et al. 2001). It is a projection method from a space with high dimensionality to a space with few dimensions (latent variables), characterized as principal components. Principal component analysis (PCA) is a powerful unsupervised classification method. Projection to latent structures, also known as partial least squares (PLS), is the regression extension of PCA. PLS can handle more than one response variable, under the precondition that they are to some degree inter-related. This is very important for multi-target drug design, for toxicity models or for the establishment of activity profiles of antimicrobial or anticancer agents (Vallianatou et al. 2013; Koukoulitsa et al. 2009). PLS analysis generates coefficients for the original variables (descriptors), which permit a straightforward interpretation of the model.
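The projection idea behind PCA can be illustrated with a singular value decomposition of the autoscaled descriptor matrix. This is a minimal sketch under the assumption of mean-centering and unit-variance scaling; the function name and the toy data are invented, and PLS regression itself is not shown.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project autoscaled descriptors onto the first principal components.

    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # mean-centre each descriptor
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0                     # leave constant columns unscaled
    Xs = Xc / std                           # scale to unit variance
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    explained = S ** 2 / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]

# Three perfectly inter-related descriptors collapse onto one component.
x = np.arange(10.0)
X = np.column_stack([x, 2.0 * x + 1.0, -x])
scores, loadings, explained = pca_scores(X)
```

The scores are the coordinates of the compounds in the latent-variable space, while the loadings show how strongly each original descriptor contributes to each component.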

MLR and PLS are linear methods and any non-linearity should be incorporated through data transformation before the analysis. On the other hand, machine learning (ML) methods are gaining increasingly important roles in the construction of classification and/or prediction models in several steps of the drug discovery process (Tao et al. 2015). They are effective dimension reduction methods, while allowing non-linearity and variable interactions to be included in the models. Thus they can reflect biological complexity, leading to models with high accuracy. Their drawback is their black box character, i.e., the difficulty of rationalizing and interpreting them in chemical terms. The most popular ML techniques are artificial neural networks (ANN) and associative neural networks (ASNN), inspired by the function and structure of neural networks in the brain, the k-nearest neighbor technique (k-NN), support vector machines (SVM), regression trees (RT) and random forest (RF) (Byvatov et al. 2003; Sakiyama 2009). The latter are also very useful in the creation of gene expression signatures (Lima et al. 2016). An overview of the machine learning methods used mainly as prediction tools for ADME properties is given in a recent review (Tao et al. 2015). Table 2 includes commonly used statistical tools, which are referred to in the representative QSAR and QSPR examples discussed in Sect. 5.

Table 2 Statistical tools, commonly used in QSAR/QSPR prediction or classification models

Models are evaluated by statistical data, the most common being the correlation coefficient (R or r), the determination coefficient (R2 or r2) and the standard error of estimate (s), given also as root mean square error of estimate (RMSE). The adjusted determination coefficient (R2adj), corrected for degrees of freedom, allows for comparison between QSARs with different numbers of descriptors and can indicate whether a given QSAR model is overfitted by incorporating too many descriptors. The Fisher test F provides an indication of a chance correlation, while the Student test t is used to evaluate the significance of descriptors in MLR. In multivariate data analysis, the variable importance in the projection (VIP) criterion is used instead. In ANN, the contribution of molecular descriptors is based on the ratio between the performance of the neural network before and after the elimination of each descriptor (sensitivity analysis).
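As an illustration, the adjustment for degrees of freedom is commonly written as follows, where n denotes the number of compounds and p the number of descriptors (symbols introduced here for the example, not taken from the text):

$$ R^{2}_{\text{adj}} = 1 - \left( 1 - R^{2} \right) \frac{n - 1}{n - p - 1} $$

For a hypothetical model with n = 20, p = 3 and R2 = 0.85, this gives R2adj = 1 − 0.15 × 19/16 ≈ 0.82. Adding descriptors that do not improve the fit increases p without increasing R2, so R2adj decreases.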

Visualization of the results, by fitting the line on the graph of observed versus predicted values, makes it possible to check for outliers or trends in the data, while it provides an overview of the predictive power of the model. In fact, a good model should show a 1:1 correlation between observed and predicted values. Detected outliers should be submitted to further investigation, since they may reveal interesting information. Further statistical data are related to internal or external model validation (Sect. 4.3).

For classification models, common statistical data to evaluate the merit of the models are % sensitivity, defined as the percentage of true positives relative to the sum of true positives + false negatives, % specificity, defined as the percentage of true negatives relative to the sum of true negatives + false positives, and %CCR (correct classification rate or balanced accuracy), equal to (sensitivity + specificity)/2. It should be noted that acceptance criteria depend on the quality of the experimental data, as well as on the ultimate goal of the QSAR/QSPR performed.
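The three classification statistics follow directly from the confusion-matrix counts. The short sketch below is only an illustration; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (% sensitivity, % specificity, % CCR) from confusion-matrix counts."""
    sensitivity = 100 * tp / (tp + fn)   # true positives among all actual positives
    specificity = 100 * tn / (tn + fp)   # true negatives among all actual negatives
    ccr = (sensitivity + specificity) / 2
    return sensitivity, specificity, ccr

# Hypothetical test set: 50 actives (40 recognized) and 50 inactives (30 recognized)
print(classification_metrics(tp=40, fn=10, tn=30, fp=20))  # (80.0, 60.0, 70.0)
```

Because CCR averages the two rates, it is not inflated when one class dominates the data set, which is why it is often preferred over plain accuracy for unbalanced sets.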

4.3 Model Validation

Whatever modeling technique is used, validation of QSAR models has received considerable attention in the last decades (Guha and Jurs 2005; Tropsha et al. 2003; Veerasamy et al. 2011). Validation requirements are becoming increasingly strict so as to assure robust models, which can lead to reliable predictions and to proof of concept. According to the European Centre for the Validation of Alternative Methods (ECVAM), four tools are accepted for estimating the prediction accuracy: (i) cross-validation, (ii) bootstrapping, (iii) randomization of the response data, and (iv) external validation (Worth et al. 2004).

Cross-validation as an internal model validation method is usually performed by the ‘leave-one-out’ (LOO) or ‘leave-many-out’ (LMO) procedure to determine the predictive residual sum of squares (PRESS) and the cross-validated correlation coefficient q2, which are metrics reflecting the internal predictive ability of the model. In contrast to r2, which increases with the number of variables included in the model and tends towards the value of 1, q2 typically passes through a maximum, which corresponds to the optimal number of variables.
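A leave-one-out calculation can be sketched as follows for the simplest case of a single descriptor. The example assumes the common definition q2 = 1 − PRESS/SS, with SS taken around the mean of all observed values (some authors use the training-fold mean instead); function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def r_squared(x, y):
    """Determination coefficient of the model fitted to all compounds."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    rss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return 1 - rss / sum((b - my) ** 2 for b in y)

def loo_q2(x, y):
    """Return (PRESS, q2) from leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    return press, 1 - press / sum((b - my) ** 2 for b in y)

# Hypothetical descriptor values and responses for six compounds
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
press, q2 = loo_q2(x, y)
print(r_squared(x, y), press, q2)
```

Since each compound is predicted by a model that has not seen it, PRESS is never smaller than the residual sum of squares of the full fit, so q2 cannot exceed r2 for such a least-squares model.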

To check that the obtained model is not a result of chance factors, randomization of the Y response is recommended (Rücker et al. 2007). All models obtained with the randomized training set should be inferior, with r2 and q2 values around 0 or with negative values respectively for a set with 0% similarity with the original set (Gasteiger et al. 2003; Klopman and Kalos 1985).
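The randomization test can be illustrated with a minimal sketch in which the response vector is shuffled repeatedly and the fit statistic is recomputed each time. A single descriptor is used for brevity, in which case r2 equals the squared Pearson correlation. The number of trials, the seed and the data are assumptions made for the example.

```python
import random

def r_squared(x, y):
    """r2 of a one-descriptor linear model (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_trials=100, seed=0):
    """Return the r2 values obtained after shuffling the response n_trials times."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    scores = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)    # break the link between structure and response
        scores.append(r_squared(x, y_perm))
    return scores

# Hypothetical data: ten compounds with a nearly linear response
x = [float(i) for i in range(1, 11)]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 9.9]
scrambled = y_randomization(x, y)
print(r_squared(x, y), sum(scrambled) / len(scrambled))
```

A model that reflects a genuine structure-response relationship should give an r2 for the original data that lies well above the values obtained for the scrambled responses. With small data sets the scrambled r2 values scatter above zero, so the whole distribution, not just its mean, should be inspected.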

A prerequisite for model validation is external validation, either by dividing the data set into training and test sets and rebuilding the models and/or by using a blind test set. The errors produced in the predictions should be comparable to those achieved for the training set. Recently, Roy et al. have proposed a modified correlation coefficient r2m as a novel metric for external validation, which represents the actual difference between the observed and predicted response data without consideration of the training set mean, taking into account the r2 with intercept and the r20 without intercept. When the axes denoting observed and predicted y are interchanged, the resulting modified correlation coefficient r′2m may differ from r2m. A difference Δr2m = abs(r2m − r′2m) less than 0.2 and an average r2m = (r2m + r′2m)/2 higher than 0.5 indicate robustness of the model (Roy et al. 2009; Roy et al. 2012).
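The text does not spell out the formula. The sketch below assumes the form commonly attributed to Roy et al., r2m = r2 × (1 − sqrt(r2 − r20)), where r20 is the determination coefficient of the regression forced through the origin. The assignment of observed values to the y-axis for r2m (and to the x-axis for r′2m) is likewise an assumption, as are the function names and the data.

```python
def _r2_and_r20(y, x):
    """r2 (with intercept) and r20 (through the origin) for y regressed on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)  # slope through origin
    r20 = 1 - sum((b - k * a) ** 2 for a, b in zip(x, y)) / syy
    return r2, r20

def rm2(y, x):
    """Modified correlation coefficient for the given choice of axes."""
    r2, r20 = _r2_and_r20(y, x)
    return r2 * (1 - max(0.0, r2 - r20) ** 0.5)  # max() guards against rounding

def rm2_metrics(observed, predicted):
    """Return (average r2m, delta r2m) over both axis assignments."""
    forward = rm2(observed, predicted)   # observed on the y-axis (assumed convention)
    reverse = rm2(predicted, observed)   # axes interchanged
    return (forward + reverse) / 2, abs(forward - reverse)

# Hypothetical external test set
observed = [5.1, 6.3, 7.0, 8.2, 9.1]
predicted = [5.4, 6.0, 7.3, 8.0, 9.4]
average, delta = rm2_metrics(observed, predicted)
print(average, delta, average > 0.5 and delta < 0.2)
```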

Model applicability domain (AD), defined as the region of chemical space where predictions can be made without extrapolation, is an important issue that should be taken into consideration for the proper use of QSAR/QSPR. There are different methods for the assessment of the applicability domain for particular types of QSAR models (Jaworska et al. 2005; Netzeva et al. 2005; Sahigara et al. 2012). Distance/leverage-based methods are usually applied. With regard to QSAR models for regulatory purposes, the OECD clearly states that the AD should be described “in terms of the most relevant parameters, i.e., usually those that are descriptors of the model” (Jaworska et al. 2003).
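A leverage-based check can be sketched for a model with one descriptor and an intercept, where the leverage of a query compound reduces to h = 1/n + (x − mean)²/Sxx. The warning threshold h* = 3(p + 1)/n used here is a widely quoted convention, not a value given in the text, and all names and data are hypothetical. For several descriptors the general form h = xᵀ(XᵀX)⁻¹x would be needed.

```python
def leverage(x_train, x_query):
    """Leverage of a query compound for a one-descriptor model with intercept."""
    n = len(x_train)
    mean = sum(x_train) / n
    sxx = sum((v - mean) ** 2 for v in x_train)
    return 1 / n + (x_query - mean) ** 2 / sxx

def in_domain(x_train, x_query, n_descriptors=1):
    """True if the leverage is below the assumed warning value h* = 3(p + 1)/n."""
    h_star = 3 * (n_descriptors + 1) / len(x_train)
    return leverage(x_train, x_query) <= h_star

# Hypothetical training descriptor values (e.g. logP of ten compounds)
x_train = [float(i) for i in range(1, 11)]
print(in_domain(x_train, 6.0))    # close to the training set centroid
print(in_domain(x_train, 20.0))   # far outside the training range
```

Compounds whose leverage exceeds the threshold are predicted by extrapolation, and their predicted values should be treated with corresponding caution.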

The performance of the models over time, in particular in the case of global QSPR models, has been addressed by continuous updating of the original models, so as to extend the applicability domain allowing predictions for new compounds of different chemotypes (Rodgers et al. 2011).

5 QSAR/QSPR Applications in the Drug Discovery Process

QSAR/QSPR models can be established for all processes across the drug discovery pipeline. Initial virtual screening may be followed by modeling of the affinity of ligand series to the receptor or to other off-target proteins. In parallel, models for permeability and other pharmacokinetic properties like plasma protein binding, affinity to uptake or efflux transporters and metabolic stability may be established to evaluate safety and efficacy of the candidates.

5.1 Modeling Pharmacodynamics

Pharmacodynamic models focus on predictions of receptor affinity. It should be noted, however, that binding to proteins is governed by the same recognition forces, regardless of whether they are target receptors, plasma and tissue proteins, metabolizing enzymes or off-target proteins. These forces reflect interactions between the small molecules and the amino acid residues within the active site of the protein.

Computational techniques to detect and/or optimize efficient binding involve both ligand- and structure-based methods and are applied to optimize receptor binding as well as to predict ADME properties involving proteins, like plasma protein binding, binding to metabolizing enzymes or transporters (Fig. 2).

5.1.1 Ligand-Based Drug Design (LBDD)

Ligand-based Quantitative Structure-Activity Relationships (QSAR), established by the procedures already discussed in Sects. 35, do not require knowledge of the structure of the target protein. In most cases, they are two-dimensional models, although they may embrace three-dimensional information by incorporating 3D descriptors. Such models take advantage of the large number of available descriptors and the progress in statistical techniques, as well as of the associated philosophy (see Sect. 4). They can be further classified as global or local models.

Global models are useful for virtual screening, off-target screening or for plasma/tissue protein binding (Helgee et al. 2010; Sheridan 2014). For global models, the goal is to encompass a large applicability domain, while interpretability may not be an issue, at least in the early stages. More important may be the continuous updating of the models to incorporate new chemotypes, so as to expand their applicability domain (Rodgers et al. 2011). In fact, the goal of such global models is not the search for new chemical entities, but the prioritization of existing or virtual compounds. In contrast, for lead optimization on receptor binding, local models are more helpful. They are built under the precondition that all analyzed molecules interact with the same type of receptor in the same manner. Evidently, in these cases interpretability is a determining factor, since the primary goal is to understand the receptor requirements and search for novel compounds with the desired physicochemical/molecular properties. Yet, the inverse-QSAR methodology (see Sect. 4), although based on descriptors which do not confer interpretability, may still allow the construction of viable molecules (Wong and Burkowski 2009).

The three-dimensional structure of the molecules can serve to create 3D-QSAR models, which provide a direct link to potency. 3D-QSAR has emerged as an extension to the classical 2D-QSAR, using robust chemometric techniques, such as PLS. In 3D-QSAR the precondition for identical binding sites in the same relative geometry for all molecules should be strictly obeyed. After geometry optimization, molecules are superimposed and carefully aligned in a rational and consistent way to create a hypermolecule. A sufficiently large box is positioned around this hypermolecule and a grid distance is defined. Different atomic probes, e.g., a carbon atom, a positively or negatively charged atom, a hydrogen bond donor or acceptor, or a lipophilic probe, are used to calculate field values at each grid point, i.e., the energy values which the probe would experience in the corresponding position of the regular 3D lattice. Using these fields as input descriptors in PLS analysis, principal components, defined by different proportions of the fields, are generated.
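The grid-based field calculation can be sketched as follows. This is a simplified illustration only: a single generic Lennard-Jones parameter set is used for all atoms, the probe carries a charge of +1, and the numerical values (`epsilon`, `r_min`, the 30 kcal/mol truncation, the toy "molecule") are assumptions for the example rather than the parameters of any particular 3D-QSAR program.

```python
import itertools
import math

def grid_points(lo, hi, spacing):
    """Regular cubic lattice between lo and hi (same limits on each axis)."""
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    return list(itertools.product(axis, repeat=3))

def probe_fields(atoms, point, probe_charge=1.0, epsilon=0.1, r_min=3.4, cutoff=30.0):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) energy at one grid point.

    atoms: list of (x, y, z, partial_charge); energies in kcal/mol, distances in Angstrom.
    """
    steric = electrostatic = 0.0
    for x, y, z, charge in atoms:
        r = max(math.dist((x, y, z), point), 1e-6)   # avoid division by zero
        steric += epsilon * ((r_min / r) ** 12 - 2 * (r_min / r) ** 6)
        electrostatic += 332.0 * probe_charge * charge / r
    # Truncate extreme values close to or inside the atoms
    return min(steric, cutoff), max(-cutoff, min(electrostatic, cutoff))

def field_descriptors(atoms, grid):
    """One descriptor row per molecule: all steric values, then all electrostatic values."""
    fields = [probe_fields(atoms, p) for p in grid]
    return [f[0] for f in fields] + [f[1] for f in fields]

# Hypothetical one-atom "molecule" with a partial charge of -0.5 at the origin
atoms = [(0.0, 0.0, 0.0, -0.5)]
grid = grid_points(-1.0, 1.0, 1.0)
row = field_descriptors(atoms, grid)
print(len(grid), len(row))
```

Each aligned molecule yields one such row, and the resulting table, with thousands of highly inter-correlated columns in realistic cases, is the type of input for which PLS is suited.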

The most popular 3D-QSAR methodologies are Comparative Molecular Field Analysis (CoMFA) and Comparative Molecular Similarity Indices Analysis (CoMSIA). CoMFA, developed by Cramer in 1988, is based upon the calculated energies of steric and electrostatic fields (Cramer et al. 1988). CoMSIA, instead of interaction fields, calculates similarity indices using a distance-dependent Gaussian functional form. Five types of similarity indices are calculated: steric, electrostatic, hydrophobic, hydrogen-bond donor and hydrogen-bond acceptor (Klebe 1998). An important advantage of CoMFA and CoMSIA is the graphical representation of the results. 3D contour maps in CoMFA display the different contributions of the potentials to the activity, while in CoMSIA they highlight the areas within the region occupied by the ligands that ‘favor’ or ‘dislike’ the presence of a structural feature with a given physicochemical property. In this sense the CoMSIA representation is more easily interpretable than CoMFA contour maps.

The difficulties of both methods are associated with the structure alignment, which may affect the results, while it limits their application to strictly similar compounds. The use of a single conformation for a given ligand represents a limitation of 3D-QSAR since the bioactive conformation may not be necessarily the thermodynamically optimal one. Moreover, orientation in the binding site may be ambiguous, especially in the absence of structural information on the biological receptor. To face such problems, higher dimension QSAR methodologies (xD-QSAR) have been developed. Additional dimensions offer the possibility to represent each ligand molecule as an ensemble of conformations, orientations, tautomeric forms and protonation states (Ekins et al. 1999; Hopfinger et al. 1997; Vedani et al. 2005, 2000).

A general drawback of ligand-based QSAR models is the underlying assumption that chemical similarity correlates with biological similarity, considering a rather smooth structure-activity landscape. The presence of activity outliers however shows that this is not always the case and structure-activity landscape may present activity cliffs (Guha 2011). In such cases, outliers deserve special attention and should be investigated separately. Outliers representing activity cliffs can be identified by structure-based methods, like docking or pharmacophore approaches. In this aspect combination of both ligand- and structure-based approaches may provide insight on the behavior of such outliers (Vallianatou et al. 2013).

5.1.2 Structure-Based Drug Design (SBDD)

Structure-based methods rely on detailed knowledge of target protein structures and target protein-ligand complex providing a more straightforward understanding of the mechanistic aspects in drug-receptor interactions. X-ray crystallography as well as NMR have contributed immensely in this field (Anderson 2003).

In the PDB database (http://rcsb.org), more than 120,000 biological macromolecular structures are deposited, covering more than 40,000 organisms and 38,000 distinct protein sequences. However, in order to use those data, a proper and detailed preparation of the protein must be performed (Anderson 2003; Sastry et al. 2013). The preparation process includes hydrogen addition, protonation or deprotonation based on pKa prediction of acidic or basic side chains, and side chain optimization to achieve the optimum number of hydrogen bond interactions. Once the structure of the protein is well studied and analyzed, all essential parts for interactions between the co-crystallized ligand and the protein are gathered to design new optimized molecules. In this respect, the key issue for a successful structure-based design is the identification of the target and the appropriate binding site. In Fig. 3 a representative crystal structure of a protein-ligand complex and the interaction points are illustrated. In Fig. 3a, the PPARα receptor is represented by ribbons in complex with aleglitazar, shown in space-filling mode (CPK representation). Figure 3b shows the ligand interaction diagram of aleglitazar inside the binding pocket.

Fig. 3 a Ribbon representation of PPARα in complex with aleglitazar (CPK representation), b Ligand interaction diagram of aleglitazar inside the binding pocket. Hydrophobic residues are colored green, hydrophilic residues are colored cyan, positively charged residues are colored blue and negatively charged residues are colored red. Hydrogen bonds are depicted with dashed lines

Additionally, the crystal structure of a protein target can be used for virtual screening. In virtual screening procedures based on the structure of a protein, a large database is screened and all molecules are ranked by an empirical docking scoring function for binding affinity (Hillisch et al. 2015). Top-ranked molecules are then tested in vitro to validate the model, and the new lead compounds are optimized using computer-aided combinatorial techniques (CombiGlide, version 4.1, Schrödinger, LLC, New York, NY, 2016). Thus, using fragment-based algorithms, new virtual chemical libraries are designed based on the core skeleton of the previously identified hit compound, and top-ranked “theoretical” molecules are passed to medicinal chemists for synthesis and further in vitro testing.

However prediction of binding constants based on the correlation with docking scores is not always feasible, especially in the case of structurally diverse compounds. ΔG values calculated by molecular docking may have an acceptable calculation error of 2 kcal/mol corresponding to 2 log units of dissociation constants Kd (Enyedy and Egan 2008; Keserü 2001). Moreover, they may show little differentiation, since they are the outcome of enthalpy–entropy compensation (Brandt et al. 2011). Therefore docking calculations alone are not sufficient, if the principal query is to predict binding constants.
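The link between an error in the calculated binding free energy and the resulting uncertainty in the dissociation constant follows from ΔG = RT ln Kd. The sketch below assumes a temperature of 298.15 K, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def free_energy_error_to_log_units(ddg_kcal, temperature=298.15):
    """Convert an error in delta G (kcal/mol) into an error in log10(Kd)."""
    return ddg_kcal / (R_KCAL * temperature * math.log(10))

print(round(free_energy_error_to_log_units(2.0), 2))  # 1.47
```

At room temperature one log unit of Kd thus corresponds to about 1.36 kcal/mol, so an error of 2 kcal/mol translates into roughly 1.5 log units, i.e. the order of magnitude quoted above. An uncertainty of this size covers much of the potency range typically spanned within a congeneric series, which illustrates why docking scores alone rarely rank such compounds reliably.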

In the past years, many success stories have been achieved using structure-based drug design (SBDD). Some representative examples are reported below:

Amprenavir (Agenerase) and nelfinavir (Viracept) (Kaldor et al. 1997) were the first drugs reaching the market designed against HIV protease using SBDD methodology. Zanamivir (Relenza) was designed against neuraminidase (Varghese 1999), Tomudex against thymidylate synthase (Rutenber and Stroud 1996) and imatinib mesylate (Glivec) against Abl tyrosine kinase (Schindler et al. 2000). Moreover, SBDD has contributed to address more complicated targets, like nucleic acids as well as protein-protein interactions. Thus, inhibitors have been developed for the HIV-1 RNA target TAR (Lind et al. 2002; Filikov et al. 2000), the IL2/IL2Rα receptor interaction (Tilley et al. 1997), the VEGF/VEGF receptor (Wiesmann et al. 1998) and Bcl2 (Enyedy et al. 2001).

5.2 Modeling Pharmacokinetics

Pharmacokinetic processes are controlled both by passive phenomena and binding to proteins, the latter concerning plasma and tissue proteins, metabolizing enzymes and transporters. Passive phenomena include passive diffusion through various biological barriers, hemolysis or cell retention. They are governed primarily by lipophilicity, while molecular weight and hydrogen bonding may contribute as additional factors (Avdeef 2012; van de Waterbeemd and Smith 2001). There are also border cases between passive diffusion and binding such as phospholipidosis or drug membrane interactions (Hanumegowda et al. 2010). Volume of distribution is also the outcome of membrane permeability and tissue binding (Hollósy et al. 2006). Among the biological barriers, the gastrointestinal tract and the blood brain barrier are of highest interest and relevant QSPR models are discussed in the following sections.

5.2.1 Modeling Permeability

Several in vitro techniques have been developed for rapid estimation of membrane permeability. Artificial membranes used in the parallel artificial membrane permeability assay (PAMPA) (Kansy et al. 1998) or in immobilized artificial membrane (IAM) chromatography (Tsopelas et al. 2016a, b) provide easy measurements. However, cell-based protocols such as Caco-2 or MDCK cell lines are more widely accepted as measures of effective permeability, which is considered a reliable index mainly for human intestinal absorption (Thiel-Demby et al. 2008; Usansky and Sinko 2005; Volpe 2008; Yee 1997). The Caco-2 model is recommended by the US FDA for the classification of compounds according to the Biopharmaceutics Classification System (BCS) (Larregieu and Benet 2013). Several QSPR models to predict Caco-2 or MDCK permeability have been published, which, however, include a limited number of compounds (Castillo-Garit et al. 2008; Irvine et al. 1999; van de Waterbeemd et al. 1996). It has been shown, however, from local models that a high Caco-2 permeability rate should correspond to a high human intestinal permeability rate (or extent of absorption), independent of the laboratories of origin and regardless of whether carrier-mediated transport is occurring (Larregieu and Benet 2014).

Due to the considerable inter- and intra-laboratory variability of Caco-2 effective permeability, classification models may be a better option, while meeting the requirements for BCS. Two representative studies performed on large datasets are reported below. Sherer et al. applied random forest (RF) to the largest dataset ever reported (15791 compounds) to establish a moderate model with an R2 = 0.52, RMSE = 0.20 using 8 descriptors (Sherer et al. 2012). A later model derived by rule-based decision trees using 1289 compounds achieved determination of 3 permeability classes (High-H, Medium-M, Low-L). The best rule, based on the combination of PSA-MW-logD (3P Rule), was able to identify the H, M and L classes with accuracy of 72.2, 72.9 and 70.6%, respectively, while a consensus system based on three voting binary classification trees predicted 78.4/76.1/79.1% of H/M/L compounds on the training set and 78.6/71.1/77.6% on the test set (Pham-The et al. 2013).

Recently, a QSPR study to predict Caco-2 cell permeability was performed on a large data set of 1272 compounds, which were filtered and curated (Wang et al. 2016). Four different methods including multiple linear regression (MLR), partial least squares (PLS), support vector machine (SVM) regression, and boosting trees were employed to build prediction models with 30 molecular descriptors. The nonlinear model derived by Boosting performed better with R2 = 0.97, RMSE = 0.12, Q2 = 0.83, RMSECV = 0.31 for the training set and R2 = 0.81, RMSE = 0.31 for the test set.

5.2.2 Predicting Human Intestinal Absorption/Oral Bioavailability

Considerable efforts are oriented to establish QSPR models for human intestinal absorption and oral bioavailability. Relevant software packages are available either for direct predictions or for predictions of ADME properties like lipophilicity, solubility, ionization, which would allow a rough evaluation of the potential of drugs to be orally absorbed. The rules of thumb, discussed in Sect. 4.1, are very helpful in this case.

Human intestinal absorption (HIA) is usually measured as the percentage of the dose that reaches the portal vein after passing the intestinal wall (%HIA). On the other hand, oral bioavailability (%F) describes the passage of a substance from the site of absorption into the systemic circulation after first pass hepatic metabolism. Intestinal metabolism, acidic stability and the effect of transporters contribute to the outcome. Absorption in gastrointestinal tract is governed by permeability through cell membranes (transcellular absorption) or through the intercellular space between cells of the gastrointestinal mucosa (paracellular transport). The effect of lipophilicity on absorption has been previously described by linear, bilinear, sigmoidal or parabolic models (Kubinyi et al. 1993; Kubinyi and Kehrhahn 1978). However, for the establishment of global QSPR models, which would permit predictions for different chemotypes of novel compounds, additional physicochemical parameters or molecular descriptors, should be implemented. Molecular weight, polarity or hydrogen bonding parameters as well as the charge state are most commonly used, being also consistent to describe Caco-2 permeability as discussed above (Kumar et al. 2011; Tsopelas et al. 2016a, b; Veber et al. 2002).

The main problems to be addressed for the establishment of robust global HIA models concern the significant variability of the datasets from one source to another and the distribution of endpoints, since they include commercially available drugs and are often heavily biased towards compounds with high intestinal absorption values (Hou et al. 2007). This fact will influence the predictive capacity of the in silico models and better predictions will be obtained for compounds with high intestinal absorption values, compared to the rest of the dataset. A scientific and technical report of the European Commission Joint Research Centre and the Institute for Health and Consumer Protection compiles literature models for HIA published till 2010, along with databases with ADME endpoints (Mostrag-Szlichtyng and Worth 2010). In this chapter, representative examples and latest investigations are discussed.

One of the first attempts to predict %HIA was published by Wessel et al., who applied a genetic algorithm with a neural network (GA-NN) technique to develop a non-linear model for a set of 86 drugs. They identified the six most significant variables, namely: the cube root of the gravitational index, related to the size of the molecule, the normalized 2D projection of the molecule on the YZ plane (SHDW-6), related to the shape, the number of single bonds (NSB), related to flexibility, as well as the charge on hydrogen bond donor atoms (CHDH-1), the surface area multiplied by the charge of hydrogen bond acceptor atoms (SCAA-s) and the surface area of hydrogen bond acceptor atoms (SAAA-2), related to hydrogen-bonding properties. The predicted %HIA values achieved good statistics, with root mean square errors (RMSE) of 9.4 %HIA units for the training set, 19.7 %HIA units for the cross-validation (CV) set, and 16.0 %HIA units for the external prediction set (Wessel et al. 1998).

The general solvation equation developed by Abraham’s group (Abraham et al. 2002) was used by Zhao et al. to model the human intestinal absorption data of 169 drugs (Zhao et al. 2001). The model, Eq. (3), derived by stepwise MLR, was based on Abraham’s linear solvation energy (LSE) descriptors, namely: excess molar refraction (E), solute polarity/polarizability (S), the McGowan characteristic volume (V), solute overall hydrogen bond acidity (A) and basicity (B).

$$ \begin{aligned} & \% \text{HIA} = 92 + 2.94\,\text{E} + 4.10\,\text{S} + 10.6\,\text{V} - 21.7\,\text{A} - 21.1\,\text{B} \\ & \text{R}^{2} = 0.74,\; \text{s} = 14 \\ \end{aligned} $$
(3)

According to Eq. (3) the volume and the hydrogen bond descriptors were found to be the most important.
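As a worked illustration, Eq. (3) can be evaluated for a given set of Abraham descriptors. The descriptor values below are hypothetical, and restricting the result to the 0–100% range is an assumption added here, not part of the published equation.

```python
def predict_hia_eq3(E, S, V, A, B):
    """%HIA according to Eq. (3); the result is clipped to the 0-100 % range."""
    raw = 92 + 2.94 * E + 4.10 * S + 10.6 * V - 21.7 * A - 21.1 * B
    return max(0.0, min(100.0, raw))

# Hypothetical drug-like solute
print(round(predict_hia_eq3(E=1.0, S=1.0, V=1.5, A=0.5, B=1.0), 2))  # 82.99
```

The hydrogen bond terms carry the largest coefficients per unit of descriptor: in this example the acidity and basicity together lower the prediction by about 32 %HIA units, whereas the volume term raises it by about 16.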

Klopman et al. compiled a large dataset of 467 drug molecules for human intestinal absorption. The data were split into a training set of 417 and external prediction set of 50 molecules. Structural fragments promoting or preventing HIA were identified using the CASE program (http://www.multicase.com/) and their occurrence was subsequently used in a multiparameter linear equation (4) to predict human intestinal absorption (Klopman et al. 2002).

$$ \% {\text{HIA}} = {\text{c}}_{0} + \sum\limits_{\text{i}} {\text{c}}_{\text{i}} {\text{G}}_{\text{i}} , $$
(4)

where c0 is a constant, ci are the regression coefficients and Gi is the presence (1) or absence (0) of a certain structural fragment. The final QSAR model included 37 descriptors: 36 statistically significant structural descriptors identified by CASE analysis and one important physicochemical parameter, the number of hydrogen bond donors (H donors). The model was able to predict the %HIA with an r2 = 0.79 and a standard deviation s = 12.32% for the compounds of the training set. The standard deviation for the external test set (50 drugs) was 12.34%. The merit of the model is that it indicates certain substructures with negative impact on %HIA, such as quaternary nitrogens and SO2 groups connected to an aromatic ring, and others with positive impact on %HIA. A drawback of the model is that the training set was biased towards high absorption values (Klopman et al. 2002).

Using Zhao’s data set, Sun proposed a PLS-DA classification approach for human intestinal absorption modeling, using atom type descriptors. Drugs were classified as “good” (absorption > 80%), “medium” (20% ≤ absorption ≤ 80%) or “poor” (absorption < 20%), according to their %HIA. A five-component PLS-DA model separated all 169 compounds very well, with r2 = 0.921 and q2 = 0.787. Since in the case of virtual screening only poorly absorbed compounds would need to be identified and removed, the author also proposed a three-component PLS-DA model with r2 = 0.939 and q2 = 0.861 to separate the compounds with less than 20% absorption (Sun 2004).
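The class assignment used to prepare such a classification data set can be written as a simple binning function. Only the 20% and 80% limits come from the text; assigning the boundary values themselves to the “medium” class is an assumption made here, and the function name is hypothetical.

```python
def hia_class(percent_hia):
    """Assign an absorption class from a measured %HIA value."""
    if percent_hia > 80:
        return "good"
    if percent_hia < 20:
        return "poor"
    return "medium"   # 20 % to 80 %, boundaries included by assumption

print([hia_class(v) for v in (95, 80, 50, 20, 5)])
# ['good', 'medium', 'medium', 'medium', 'poor']
```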

Recently, a dataset of 578 compounds, split into a training set of 403 compounds, a validation set of 87 and an external prediction set of 87, was analyzed using ensemble learning (EL) techniques (gradient boosted tree, GBT, and bagged decision tree, BDT) to derive both qualitative (classification) and quantitative models. Topological polar surface area proved to be the most important descriptor, with a negative contribution, followed by lipophilicity expressed as XlogP. Classification accuracy > 99% was reported, while the QSAR models yielded correlation coefficients R2 > 0.91 between the measured and predicted HIA values (Basant et al. 2016).

Prediction models are available also for the more complex process of oral bioavailability (Andrews et al. 2000; Hou et al. 2007; Kim et al. 2014; Kumar et al. 2011; Martin 2005; Moda et al. 2007; Tian et al. 2011). Models published up to the year 2010 are compiled in the scientific and technical report of the Joint Research Centre of the European Commission. The same report also lists relevant software for the prediction of oral bioavailability (Mostrag-Szlichtyng and Worth 2010).

Recently, in silico approaches focus more on physiologically based pharmacokinetics (PBPK), which go beyond human intestinal absorption and oral bioavailability, providing realistic descriptions of absorption, distribution, metabolism, and excretion processes (Bois and Brochot 2016; Jamei 2016). PBPK modeling has gained a significant impact on regulatory science and decisions (Huang et al. 2013) and best practice for its use to address regulatory questions, has been reported (Zhao et al. 2012).

5.2.3 Predicting Blood Brain Barrier Penetration

In drug discovery for CNS active drugs, it is important to determine whether a candidate molecule is capable of penetrating the blood brain barrier (BBB). For drugs targeted at the CNS, BBB penetration is a necessity, whereas for drugs acting in peripheral tissues, BBB penetration may lead to undesirable adverse effects (Di et al. 2009; Ecker and Noe 2004). The log BB, defined as the logarithm of the ratio of the concentration of a drug in the brain to that in the blood, measured at equilibrium, is an index of BBB permeability. The optimal threshold for classification as a CNS acting drug is typically specified between 0 and −1 (Clark 2003). Log BB values, although widely used, do not take into account plasma and tissue binding and therefore do not reflect the free amount of the drug in the brain. The permeability surface area product (PS, quantified as logPS), representing the uptake clearance across the BBB, is used as a direct measure of permeability and theoretically is not confounded by plasma and brain tissue binding.
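The definition of log BB and its use for classification can be sketched as follows. The default threshold of −1 is one end of the range quoted above and is chosen here only for illustration; function names and concentrations are hypothetical.

```python
import math

def log_bb(c_brain, c_blood):
    """log BB from equilibrium concentrations in brain and blood (same units)."""
    return math.log10(c_brain / c_blood)

def crosses_bbb(logbb, threshold=-1.0):
    """Classify a compound as BBB penetrant if log BB reaches the chosen threshold."""
    return logbb >= threshold

value = log_bb(c_brain=0.5, c_blood=2.0)
print(round(value, 2), crosses_bbb(value), crosses_bbb(value, threshold=0.0))
```

The same compound (log BB ≈ −0.6) is classified as penetrant with a threshold of −1 but as non-penetrant with a threshold of 0, which shows how strongly reported classification accuracies depend on the chosen cut-off.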

Several models have been published trying to predict blood-brain barrier permeability from various physicochemical properties of molecules, including, among others, molecular size, lipophilicity or number of groups that can establish potential hydrogen bonds (Clark 1999; Kaliszan and Markuszewski 1996; Konovalov et al. 2007; Luco 1999; Vastag and Keseru 2009). Rules of thumb are also suggested, as discussed in Sect. 4.1. Literature models published up to the year 2010 are compiled in the scientific and technical report of the European Commission Joint Research Centre and the Institute for Health and Consumer Protection (Mostrag-Szlichtyng and Worth 2010). Some representative models and recent publications are discussed in this chapter.

Already in 1980, Levin had related log Pc (which is a close analog of log PS) to a simple linear function of logP and molecular weight. The overall effect was represented as log(P × MW^(−1/2)) = logP − ½ logMW, whereby increasing logP was supposed to produce a steady increase in log PS, whereas increasing MW had the opposite effect (Levin 1980). In 1999, Clark analyzed a set of 55 diverse organic compounds and generated a multiple linear regression model based on in silico calculated polar surface area (PSA) and logP values, with negative and positive contributions, respectively (Clark 1999).
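The combined lipophilicity-size term of Levin can be evaluated directly, as in the short sketch below. Only the descriptor term itself is computed; the regression coefficients of the original correlation are not given in the text and are not assumed here. The function name and the example values are hypothetical.

```python
import math

def levin_term(logp, mw):
    """log(P * MW**-0.5), computed as logP - 0.5 * log10(MW)."""
    return logp - 0.5 * math.log10(mw)

# Two hypothetical compounds with the same logP but different molecular weight
print(round(levin_term(2.0, 100.0), 3), round(levin_term(2.0, 400.0), 3))  # 1.0 0.699
```

A fourfold increase in molecular weight lowers the term by 0.5 × log 4 ≈ 0.3 log units, which is small compared with the range of logP values typically encountered among drugs.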

The linear solvation energy relationship approach (LSER), also used to model human intestinal absorption, has been applied to blood/brain permeability prediction (Platts et al. 2001). For a dataset of 148 diverse compounds, a transparent QSAR incorporating five Abraham descriptors and an indicator variable (equal to 1 for carboxylic acids and 0 for other compounds) was obtained using MLR. The model shows good statistics (R² = 0.74, s = 0.34, R²cv = 0.71). According to the model, increasing molecular size strongly enhances brain uptake, while increasing polarity/polarizability, hydrogen-bond acidity, hydrogen-bond basicity and the presence of carboxylic acid groups have a detrimental effect. Platts' model has been implemented in the commercially available ADME Boxes software (previously Pharma Algorithms; now ACD Labs, http://www.acdlabs.com/), providing a very fast estimation of logBB. Later, the data set was extended to include 328 compounds with in vivo and in vitro logBB values. A correlation coefficient r² = 0.75 and a standard deviation s = 0.3 were achieved by incorporating an additional indicator for in vitro data (Abraham et al. 2006).

For a data set of 88 diverse compounds, QSAR models with three or four descriptors, selected from a pool of 324 by a variable selection and modeling method, have been reported for logBB prediction. In both models, calculated lipophilicity (AlogP98) was combined either with the atomic type E-state index (SsssN) and the van der Waals surface (r = 0.842, q = 0.823, and s = 0.416) or with the kappa shape index of order 1, the atomic type E-state index (SsssN) and the atomic level based AI topological descriptor (AIssssC) (r = 0.864, q = 0.847, and SE = 0.392). The success rate of the reported models in test sets was 82% in the case of BBB+ compounds. A similar success rate was observed with BBB− compounds (Narayanan and Gunturi 2005).

The VolSurf technique, which is based on molecular interaction fields, has also been used for blood/brain partitioning modeling (Crivori et al. 2000). The model was built on the basis of 230 diverse compounds and more than 70 VolSurf descriptors. Its prediction accuracy (assessed against an external test set) is 90% for BBB permeable molecules and 60% for non-permeable ones. The computational procedure is fully automated and fast, and it provides a valuable tool for the virtual screening of large datasets of diverse molecules (Cruciani et al. 2000). The shortcoming of this approach, however, is its low interpretability.

Linear discriminant analysis (LDA) based on physicochemical descriptors calculated in silico has been used to establish two distinct classification models (Vilar et al. 2010). The data set consisted of the 307 compounds used by Abraham et al. (2006) for which in vivo logBB values were available. Considering that molecules with log BB > 0.3 cross the BBB readily while molecules with log BB < −1 are poorly distributed to the brain, these values were selected as thresholds for classifying the compounds into two categories. For the threshold 0.3, a two-component model was obtained with lipophilicity and topological polar surface area (TPSA), the latter with a negative coefficient. For the threshold −1, the total number of acidic and basic atoms was additionally incorporated, also with a negative sign. The models were validated with external data sets using the area under the receiver operating characteristic (ROC) curve as the evaluation criterion. In a ROC curve, the fraction of true positives (sensitivity) is plotted against the fraction of false positives (1 − specificity). An area under the ROC curve of 0.95 for model 1 and 0.97 for model 2 is reported, demonstrating the high predictive power of the models, considering that for a perfect classifier the area under the curve is 1 and for a random classifier it is 0.5 (Vilar et al. 2010).
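The area under the ROC curve can be computed without drawing the curve, because it equals the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one (ties counting one half). The following Python sketch uses this pairwise formulation; the labels and scores are invented for illustration and are not taken from Vilar et al.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (positive, e.g. BBB+) or 0 (negative).
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count one half
    return wins / (len(pos) * len(neg))


# Hypothetical discriminant scores for four compounds.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

A classifier that separates the two classes perfectly yields 1.0 with this function, while identical scores for all compounds yield 0.5, matching the two reference values quoted above.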

Based on logPS values in rats, Suenderhauf et al. developed predictive computational models (decision tree induction) for a dataset of 153 compounds. The established models exhibited a correct classification rate of 90%. The models confirmed the involvement of lipophilicity, molecular size and charge in BBB permeation (Suenderhauf et al. 2012).

5.2.4 Modeling Plasma Protein Binding

A special case of binding of small molecules to macromolecules is plasma protein binding. Plasma protein binding (PPB) is the reversible association of a drug with the proteins of the plasma and is mainly due to hydrophobic and electrostatic interactions. Since only the unbound fraction (fu) of a drug is able to pass across cell membranes, PPB strongly influences the volume of distribution, half-life and efficacy of drugs. Extensive plasma protein binding may be associated with drug safety issues, low clearance, low brain penetration, as well as drug–drug interactions (Ito et al. 1998; Rowley et al. 1997). In fact, plasma protein binding belongs to the ADME properties, representing mainly the “D” of the acronym.

Among the plasma proteins, human serum albumin (HSA) has a central role, and the affinity of drugs for this protein is considered to dominate PPB and the related pharmacokinetic issues. Two primary active sites on HSA have been recognized for drug binding, Sudlow's sites 1 (warfarin site) and 2 (benzodiazepine site). α1-Acid glycoprotein (AGP) is the second essential plasma protein, with two main variants and a complicated physiological role (Lambrinidis et al. 2015).

Modeling of total plasma protein binding and/or of HSA binding has been the objective of many researchers and offers a representative case where combined structure- and ligand-based methods act synergistically. Structure-based methods are very helpful for initially classifying the compounds according to the preferred binding site or protein, prior to proceeding to ligand-based methods. Since PPB is involved in practically every class of therapeutics, the ultimate goal is to construct global HSA or PPB models, where structural diversity plays an important role. Representative successful efforts are described below. Often more than one model is suggested by the same research group, since interpretability may compete with accuracy in predictions.

A multiple computer-automated structure evaluation method (M-CASE) was used by Saiakhov et al. (2000) to analyze 154 structurally diverse compounds for total plasma protein binding. M-CASE starts by searching for a ‘baseline correlation’ via an internal baseline activity identification algorithm subroutine (BAIA), using the octanol–water partition coefficient, which is the most important parameter. For compounds showing residual binding not accounted for by the baseline correlation, the algorithm continues to identify the responsible structural characteristics, called biophores. Several local QSAR models built for subsets with common biophores are included in the final global model. The binding site(s) of each biophore, including the warfarin, benzodiazepine and digitoxin sites, as well as AGP and lipoproteins, are also characterized. Lipophilicity, as the prevalent parameter, showed a different contribution in each local QSAR, indicating different lipophilicity requirements for each binding site. A crucial structural fragment present in the molecules was found to be part of a phenyl ring. The model, after classifying the compounds according to their biophores, was able to predict correctly the percentage bound to plasma for 80% of the compounds, with an average error of 14%.

A large data set of 1008 compounds, partitioned into a training set of 808 compounds and an external validation set of 200 compounds, was used by Votano et al. to construct models of human serum protein binding (Votano et al. 2006). A robust ANN model based on topological descriptors in combination with logP was established, with r² = 0.90, MAE = 7.6 for the training set and r² = 0.70, MAE = 14.1 for the external validation set. MAE stands for Mean Absolute Error.
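The two statistics quoted for such models measure different things, as the following sketch shows. It assumes that r² denotes the squared Pearson correlation between observed and predicted values (some authors use the coefficient of determination instead); the percentage-bound values are invented for illustration.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute deviation between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Squared Pearson correlation coefficient (assumed meaning of r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation undefined for constant input")
    return cov ** 2 / (var_o * var_p)


# Hypothetical % bound values: predictions are perfectly correlated
# with the observations but compressed towards the mean.
observed = [10.0, 50.0, 90.0]
predicted = [20.0, 50.0, 80.0]
print(r_squared(observed, predicted))            # perfect correlation
print(mean_absolute_error(observed, predicted))  # yet a non-zero error
```

The example illustrates why both numbers are usually reported together: a high correlation alone does not guarantee small prediction errors.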

Votano's data set was used by Ghafourian and Amin (2013) to construct linear regression and nonlinear models using classification and regression trees (CART), boosted trees and random forest. The interpretable linear regression and simple regression tree models were able to identify the important contribution of hydrophobicity, van der Waals surface area and aromaticity to high PPB. On the other hand, the more complicated ensemble method of boosted regression trees produced the most accurate PPB predictions.

Combination of chemometrics with molecular modeling confirmed the preponderant contribution of hydrophobic regions of drug molecules and the specific roles of polar groups, which anchor drugs to HSA binding sites 1 and 2 (Estrada et al. 2006). Identification of the binding site before performing QSAR analysis can evidently lead to better models. For 889 chemically diverse compounds with binding affinity for domain III-A, a group contribution model was developed based on 74 chemical fragments (R² = 0.94, Q² = 0.90) (Hajduk et al. 2003). The authors further suggested a combination of QSAR models for full-length albumin and for domain III-A to allow for discrimination between compounds that bind to the latter site and those that bind elsewhere on the protein. An important point is that the fragments used in the model are mapped by most of the topological descriptors included in Votano's model, indicating that they can be considered quite universal. Thus, they provide a convenient look-up table for quantitative estimation of the effect of a particular group on albumin binding.

A free web prediction platform was constructed by Zsila et al., who combined a support vector machine (SVM) classification model with molecular docking calculations. The classification model was based on 45 descriptors, with logP being the most important. The platform (http://albumin.althotas.com) enables users (i) to predict whether albumin binds the query ligand, (ii) to determine the probable ligand binding site (site 1 or site 2) according to the classification model, (iii) to select, using Tanimoto similarity, the albumin X-ray structure complexed with the most similar ligand and (iv) to calculate the complex geometry using molecular docking calculations (Zsila 2013).
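Step (iii) relies on the Tanimoto coefficient, the number of fingerprint features shared by two molecules divided by the number of features present in either. A minimal sketch follows, assuming that fingerprints are represented as sets of "on" bit positions; the structure labels and bit sets are hypothetical and do not reproduce the platform's actual implementation.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def most_similar(query: set, library: dict) -> str:
    """Return the key of the library entry most similar to the query."""
    return max(library, key=lambda name: tanimoto(query, library[name]))


# Hypothetical fingerprints of ligands co-crystallized with albumin.
library = {
    "structure_A": {1, 2, 3, 8},
    "structure_B": {2, 3, 4},
    "structure_C": {9, 10},
}
query = {2, 3, 4, 5}
print(tanimoto(query, library["structure_B"]))  # 3 shared bits / 4 bits in total
print(most_similar(query, library))
```

The same coefficient can be used to judge the structural diversity of a data set: low pairwise similarities indicate a diverse set.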

The continuous update of HSA models in order to maintain their performance over time is essential in drug discovery and development settings, extending their applicability domain and robustness. In this sense, Rodgers et al. proposed a procedure for monthly updating of human plasma protein binding models over a period of 21 months (Rodgers et al. 2007), which was later extended to three years, using partial least squares (PLS), random forest (RF) and Bayesian neural networks (BNN). The authors started with a large data set, the size of which had doubled by the end of the study (Rodgers et al. 2011). Consensus predictions of HSA binding constants using the final models generated by all three techniques showed RMSE = 0.55. These results justified the need for the automatic regular updating of QSAR models (autoQSAR) in the case of ADME properties.
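A consensus prediction and its root mean squared error can be sketched as follows. The sketch assumes that the consensus is the unweighted mean of the individual model predictions, which is a common choice but is not stated in the text; all numerical values are invented.

```python
import math


def consensus(predictions_by_model):
    """Unweighted mean prediction per compound across several models.

    predictions_by_model: list of equally long prediction lists,
    one list per model (e.g. PLS, RF, BNN).
    """
    return [sum(values) / len(values) for values in zip(*predictions_by_model)]


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical binding constants (log units) for three compounds.
observed = [4.0, 5.0, 6.0]
pls = [4.5, 5.5, 5.0]
rf = [3.5, 5.0, 6.5]
bnn = [4.0, 4.5, 6.5]
combined = consensus([pls, rf, bnn])
print(combined)
print(rmse(observed, combined))
```

In an updating scheme of the kind described, such statistics would be recomputed each time the models are rebuilt on the enlarged data set.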

An analogous approach for modeling HSA binding, as well as other ADME properties, over time is implemented in a software architecture, the so-called “Discovery Bus”, which allows exhaustive exploration of descriptor and model space and automates model validation and continuous updating, providing automated QSPR through a competitive workflow (Cartmell et al. 2005).

Recently, ensemble machine learning-based QSPR models have been established for four-category classification and PPB affinity prediction, using a dataset of 930 compounds. The structural diversity of the compounds was tested by the Tanimoto similarity index. In the test set, the classification QSPR models proved superior, with an accuracy > 93%, while the regression QSPR models yielded r² > 0.920 between the measured and predicted PPB affinities, with a root mean squared error < 9.77. Lipophilicity, expressed as XLogP, was the most important descriptor (Basant et al. 2016).

For further PPB models and for the state of the art in predicting binding to α1-acid glycoprotein, the second important plasma protein, the reader is referred to a recent comprehensive review by Lambrinidis et al. (2015).

5.2.5 Prediction Models for Metabolism

Metabolism, the M in ‘ADME’, is one of the main factors influencing the fate and toxicity of a chemical. Metabolism (or biotransformation) includes a large set of chemical reactions, which generally convert drugs or other xenobiotics into more polar and more easily excreted, i.e., less toxic, forms. However, in some cases, metabolism may lead to toxic metabolites and/or intermediates. Thus, metabolites with physicochemical and pharmacological properties that differ substantially from those of the parent drug have important implications for both drug safety and efficacy (Testa et al. 2004; Testa 2009).

The utility of conventional QSARs for predicting the metabolic fate of chemicals is rather limited. Most of the models are established to predict phase I metabolism, mainly addressing cytochrome P450 (CYP450) isoforms, a superfamily of enzymes including more than 70 families of proteins, which play a predominant role in the biotransformation of drugs and xenobiotics. Based on a ‘guesstimate’ of the number of drug metabolites that are known to be produced by cytochrome P450 isoforms and other oxidoreductases (EC 1), as well as hydrolases (EC 3) and transferases (EC 2), it is supposed that oxidoreductases are the main enzymes responsible for the formation of toxic or active metabolites, whereas transferases play the major role in producing inactive and nontoxic metabolites (Testa 2009).

Terfloth et al. (2007) investigated the application of several model-building techniques, such as k-NN, decision trees, Multilayer Perceptron Neural Networks (MLPNN), Radial Basis Function Neural Networks (RBF-NN), Logistic Regression (LR) and Support Vector Machine (SVM), to predict the isoform specificity for CYP450 3A4, 2D6 and 2C9 substrates. The applied descriptors included simple molecular properties and functional group counts, topological descriptors, and descriptors related to the shape of molecules or the distribution of interatomic distances considering the 3D structures of the molecules. A 9-descriptor model, established by combining automatic variable selection with the SVM technique, gave the best results. The achieved predictivity for an external data set of 233 compounds was 83%. Promising results were also obtained for the decision tree based model with only three descriptors, for which 80% predictivity for the external data set was achieved.

Burton et al. (2006) constructed classification models for human CYP1A2 and CYP2D6 inhibition using binary decision trees. The decision tree for CYP2D6 had sensitivity 88%, specificity 92% and positive predictivity 90%. The external validation had accuracy 89%, sensitivity 91%, specificity 92% and precision 90%. For CYP1A2, accuracy was 89%, sensitivity 95%, specificity 83% and precision 85% for the training set, while the test set had 81% accuracy, 76% sensitivity, 86% specificity and 85% precision. The authors identified a range of useful descriptors. Van der Waals surface area (VSA) was particularly efficient and allowed the development of models reaching 95% correct classification. 3D descriptors also provided promising results.

Sheridan et al. (2007) applied the Random Forest (RF) technique for predicting CYP450 (3A4, 2D6, 2C9) sites of metabolism, using descriptors that describe the environment around each non-hydrogen atom in each molecule. The authors identified several descriptors positively and negatively related to the oxidation sites of molecules. Compared to the results obtained using the MetaSite software (Molecular Discovery) of Cruciani et al. (2005), Sheridan's model performed better in the case of CYP3A4. For CYP2D6 and CYP2C9 the predictions of Sheridan's model were only slightly better.
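The classification statistics quoted above for the CYP inhibition models all derive from the four cells of a confusion matrix. The following sketch shows the standard definitions; positive predictivity is the same quantity as precision. The counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification statistics from confusion-matrix counts.

    tp/fn: inhibitors predicted correctly / missed.
    tn/fp: non-inhibitors predicted correctly / wrongly flagged.
    """
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("counts do not allow all metrics to be computed")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictivity
    }


# Hypothetical test set: 50 inhibitors and 50 non-inhibitors.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```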

In the case of metabolism, computer-based expert systems have a much broader applicability. Among them, MetaSite is widely used (Cruciani et al. 2005). It makes predictions based on the lability of hydrogens and orientation effects derived from the 3D structure of a CYP active site, independently of the availability of pre-existing data. MetaSite can handle 3A4, 2D6, 2C9, 1A2 and 2C19 and can be extended to any CYP for which a homology model can be generated. It is advantageous for enzymes such as CYP1A2 and CYP2C19, for which there are currently not enough data in the literature to generate a QSAR model. Moreover, the MetaSite methodology is easy to use, fast and fully automated. Other expert systems are MetabolExpert, developed by CompuDrug (Darvas 1988), METEOR (Testa et al. 2005a), COMPACT (Computer-Optimised Molecular Parametric Analysis of Chemical Toxicity) (Lewis et al. 1996; Lewis 2001) and META, implemented in the MCASE ADME Module (MultiCASE) (Klopman et al. 1999, 1997; Talafous et al. 1994).

More information about predicting drug metabolism can be found in a recent review by Kirchmair et al. (2015).

5.2.6 Integrated ADME Prediction Models

In the previous sections, separate models for different processes along the drug discovery and development pipeline were discussed. Medicinal chemistry teams should take advantage of them by applying them to their project compounds, selected by early stage techniques, e.g., virtual screening, structure- or ligand-based design for the target of interest, or drug-like filtering. The multi-objective character of drug development, however, has prompted the creation of software tools and web platforms for integrated ADME and ADME-related predictions. Many of them are commercial. They differ greatly in terms of their capabilities and applications. Prediction software for physicochemical properties related to ADME, like lipophilicity and ionization, has already been discussed in Sect. 4. Solubility is another endpoint of interest for oral absorption as well as for formulation issues. Such predictions serve as inputs to models of key ADME properties, mainly gastrointestinal absorption, BBB permeability, oral bioavailability (including affinity to uptake or efflux transporters) and plasma protein binding. Predictions of possible metabolites, as well as of toxicity endpoints like mutagenicity, carcinogenicity or teratogenicity, are also implemented in certain software. Some popular software packages are Know-it-All (Bio-Rad Laboratories, http://www.bio-rad.com/), ADME Boxes (Pharma Algorithms; now included in ACD/ADME Suite) and ADMET Predictor (Simulations Plus Inc., http://www.simulations-plus.com/). VolSurf/VolSurf+ (Molecular Discovery and Tripos) also predicts various ADME properties, including passive intestinal absorption, blood-brain barrier permeation, solubility, protein binding, volume of distribution and metabolic stability, using different models based on VolSurf descriptors.

Moreover, there is a trend towards developing more sophisticated mathematical PBPK models (see also Sect. 5.2.2). In these software tools, in vitro and/or in vivo ADME data are integrated with the results of QSAR/QSPR models (e.g., for percentage plasma protein binding or blood/brain barrier penetration) for organism-based ADME modeling. GastroPlus and Cloe, which mimic the processes inside living organisms, are more commonly used. Simcyp (http://www.simcyp.com/) is a proprietary PBPK simulator that provides a platform for modeling the ADME properties of drugs and their metabolites, as well as drug-drug interactions, in virtual patient populations (Jamei et al. 2009).

As a caveat for the use of ADME prediction software, it should be noted that the results are to be regarded as rough estimates, useful for screening purposes or as starting points for further modeling or experimental verification.

6 Conclusions

Drug discovery and development is a complicated, multi-objective and expensive enterprise, with drug candidates being a compromise between competing pharmacodynamic and pharmacokinetic processes. In silico predictions along the different stages of the pipeline provide valuable support in the selection of drug candidates with balanced properties, so as to control each stage early enough and reduce failures in clinical phases. High technology provides new endpoints that may serve to establish efficient QSAR and QSPR models, which themselves profit from the evolution of computational and statistical techniques. Local and global models have their own value, depending on the underlying goal and the timeline. Initial screening and the prediction of off-target affinities or ADME properties benefit more from global models, while local models are suitable for selected project ligands with potential affinity for a target receptor. Interpretability of models is an important issue. The medicinal chemist is more familiar with models containing readily understandable physicochemical or molecular descriptors, which provide insight into the mechanism of action. However, the most accurate model is not always the most interpretable. In such cases the intended use of the model is the determining factor. Nevertheless, toxicity models for regulatory purposes must have a certain degree of interpretability, as required by the OECD.

The correct use of the models implies that the user is aware of their merits and pitfalls. Their evaluation should consider the accuracy and range of the endpoints, while external validation with blind test sets is a strict prerequisite, in particular for global models. In such cases, determination of their applicability domain is useful in order to evaluate when predictions are reliable.

In conclusion, the results of the in silico models at the different stages of drug discovery should be taken into consideration for prioritizing drug candidates before proceeding to the next step. The ultimate goal is to produce safe and efficient drug candidates, a goal which can be achieved by finding the golden ratio between affinity for the target receptor, with regard also to off-targets, and appropriate pharmacokinetic properties, in compliance with the concept of druglikeness. The tools are available; they need to be properly used.