11.1 Introduction

A potential drug molecule is one that binds and modulates a molecular target effectively while remaining safe, with low toxicity, in the disease context for which it is administered. Drug discovery and development is a complex process that can take 12–15 years and cost more than $1 billion. In the modern era, it involves the cooperation of many disciplines, such as chemistry, biology, mathematics and computer science (Herrling 2005). A chemical moiety with significant therapeutic value is extensively analyzed for its safety and efficacy before it is marketed. The overall process is divided into two major phases: drug discovery and drug development. The multistep process termed ‘drug discovery’ includes identification and validation of the drug target and of the lead molecule. It follows two important approaches: one is the identification and validation of a potential disease-oriented target molecule, and the other is phenotypic screening to identify and refine potential small molecules that can interact with the target (Ernst and Obrecht 2008). This molecular interaction may block, promote or modify the activity of the target. In recent years, the drug discovery process has undergone radical changes owing to novel techniques in genomics and proteomics, and drug target identification and validation have become more specific (Umashankar and Gurunathan 2015). In the past decade, the emergence of microbial resistance (Amini and Tavazoie 2011), complicated new diseases and unexpected adverse side effects have accelerated the search for potent lead molecules (Ashrafuzzaman 2014). Infectious diseases, particularly Gram-positive bacterial infections, are among the most serious threats to public health worldwide: they are difficult to treat and are associated with high morbidity and mortality rates.
Gram-negative bacteria are highly adaptive pathogens that develop resistance to antibiotics through several mechanisms. The production of β-lactamases, which hydrolyze the β-lactam ring, is the most common resistance mechanism in Gram-negative bacteria against β-lactam antibiotics. Most bacteria can adapt to their surroundings and subsequently develop several protective mechanisms that reduce their susceptibility to antibiotics. In some cases, bacteria acquire resistance through horizontal gene transfer within and between species (Palumbi 2001; Thomas and Nielsen 2005). Horizontal gene transfer is the most important mechanism accelerating the spread of extended-spectrum β-lactamases (ESBLs), which cause severe drug resistance problems in health care (Giske et al. 2008; Hawkey and Jones 2009). Bacterial strains capable of producing ESBLs are resistant to several antibiotics, including penicillins and cephalosporins, and also to other antibacterials such as quinolones and aminoglycosides. The prevalence of antibiotic-resistant bacteria in a population correlates strongly with the level of antibiotic use in the same population; this correlation has been found at both national and regional levels (Bronzwaer et al. 2002; Albrich et al. 2004).

11.2 Global Battle Against Infectious Diseases

In the middle of the seventeenth century, smallpox was the most fatal and feared of diseases. The discovery of penicillin led to a new generation of antibiotics that cured a wide range of infectious diseases. Several studies then focused on understanding the mechanisms microbes use to survive antibiotics, and pharmaceutical and biotech companies rushed to identify significant bacterial targets and to create novel methodologies against bacteria. Recent evidence suggests that mutation is not the only way bacteria develop antibiotic resistance; they can also transfer the genetic instructions for evading an antibiotic to other bacterial species. In the late 1800s, pathogen-specific medical diagnosis aided the identification of the microbes that caused specific diseases. Molecular genetic techniques, the polymerase chain reaction (PCR) and, more recently, sophisticated high-throughput rapid sequencing of pathogen genomes are all used to observe individual genetic variants, facilitating identification of the genetic basis of drug resistance. Other diagnostic tools, including microchip and serological techniques and the enzyme-linked immunosorbent assay, can be more sensitive than traditional techniques in finding and measuring antibodies to pathogens (Pallen et al. 2010). Current data suggest that Gram-positive bacteria cause 45–70% of infectious diseases and are behind the increase in rates of drug resistance in many infections. The pace of drug resistance among bacterial pathogens is increasing, yet virtually no new antibiotics are being developed (Spellberg et al. 2004). Gram-positive organisms of the genera Staphylococcus, Streptococcus and Enterococcus are the predominant bacterial species causing clinical infection; hence, recent attention has focused on multi-drug resistance (MDR) and antimicrobial resistance (AMR) (Menichetti 2005; Doernberg et al. 2017).

Sulfonamides, a class of synthetic antimetabolites, were first used clinically in 1932 against a wide range of both Gram-negative and Gram-positive bacteria. These synthetic antimetabolites inhibit dihydropteroate synthetase, which leads to repressed DNA replication. Until 1938, β-lactam was another widely used antibiotic. The 28 members, which include antibiotic/β-lactamase inhibitor combinations, are broadly classified into three subclasses: penicillins, cephalosporins and carbapenems, which are valued for their very broad-spectrum activity against most aerobic and anaerobic Gram-positive and Gram-negative bacteria (Walsh 2003; Collignon et al. 2009; Lewis 2013). Recently, glycopeptides such as vancomycin (VANC) and teicoplanin (TEIC) have been widely used against Gram-positive bacteria; they share a mechanism of action similar to that of β-lactams, except that they interrupt cell wall synthesis via an interaction with the D-alanyl-D-alanine (DADA) moiety of peptidoglycan precursors, which inhibits the cross-linking stabilization step in bacterial cell wall formation (Malabarba and Goldstein 2005). The cyclic lipopeptide daptomycin has an extensive range of activity against Gram-positive bacterial infections, including MRSA. Structurally, daptomycin comprises a 13-member hydrophobic polypeptide with a lipophilic side chain. It has a unique mechanism of action: the lipophilic region inserts into the bacterial cell wall and oligomerizes into pore-like structures, through which a significant efflux of potassium ions results in rapid bacterial cell death (Silverman et al. 2003; Steenbergen et al. 2005).

11.3 Methods in Drug Design

Drug development commences with the identification of a molecular target and lead molecules, followed by lead optimization and preclinical in vitro and in vivo studies to recognize potent compounds that fulfill the primary criteria for drug development (Bleicher et al. 2003). However, developing lead molecules through in vitro and in vivo methods takes a long time and is very expensive (DiMasi et al. 2003); hence, in recent years in silico drug design has been widely used to predict active lead molecules. Here, we focus on the discovery phase. Traditional drug discovery (in vitro and in vivo) requires about 12–14 years and costs up to $1.2–1.4 billion to bring a drug from discovery to market (Hileman 2006). About 90% of the drugs entering clinical trials fail to obtain FDA approval and reach the consumer market (Tollman 2001). More recently, high-throughput screening (HTS) experiments have been used to sort thousands of molecules with robotic automation; however, HTS is still expensive and requires a great amount of resources. Computer-aided drug design (CADD) can therefore reduce the cost and time involved and help ensure that the best possible lead compounds are used in animal studies. CADD tools are not merely applied to identify potential lead molecules; they can also predict effectiveness and possible side effects and aid in improving the bioavailability of candidate drug molecules (Yang et al. 2016). CADD has played a crucial role in the identification of many pharmaceutically available drugs that have obtained FDA approval and reached the consumer market (Kitchen et al. 2004; Clark 2006; Talele et al. 2010). CADD methods are broadly classified into two categories: structure-based (SB) drug discovery and ligand-based (LB) drug discovery.

11.3.1 Structure-Based Drug Design

Structure-based drug design (SBDD) methods are prominent tools in modern medicinal chemistry that utilize three-dimensional structural information from biological targets (Salum et al. 2008). Understanding the mechanism of small-molecule recognition and interaction with biological macromolecules is of great importance in pharmaceutical research and development. In recent years, through applications such as molecular docking, molecular dynamics simulation and structure-based virtual screening (SBVS), SBDD has played a crucial role in the identification of potential drug molecules against various drug targets (Kalyaanamoorthy and Chen 2011). In SBDD, the binding site topology (including clefts, cavities and sub-pockets) and the electrostatic properties of the target molecule are carefully examined (Wilson and Lill 2011).

SBDD is an iterative method involving multiple steps for finding a lead. The first step includes the cloning, purification and structure elucidation of the target protein or nucleic acid by NMR, X-ray crystallography or homology modeling, followed by identification of potential ligand molecules and evaluation of biological properties such as potency, affinity and efficacy through various experimental analyses (Fang 2012). SBDD also provides structural descriptions of the target-ligand complex for understanding the binding mode and conformations, characterization of key molecular interactions, characterization of unknown binding sites, mechanistic studies and elucidation of ligand-induced conformational changes (Kahsai et al. 2011). Methods used in SBDD, such as molecular dynamics, give insight not only into how ligands bind to target proteins but also into target flexibility and the interaction pathway. SBDD has contributed to several compounds reaching the clinical trial stage and obtaining FDA approval for the market (Burger and Abraham 2006; Wang et al. 2010; Hanson et al. 2015). Thus, SBDD is a cyclic process consisting of several steps, starting from a known target structure and proceeding through several in silico studies conducted to identify potential ligands. The mechanism of structure-based drug design is illustrated in Fig. 11.1, which shows the binding site features of the protein (Fig. 11.1a); available drug molecules bound in the binding site, with a few empty spaces that may be filled with water molecules (Fig. 11.1b); and finally the new drug, designed according to the binding site features so that it fits the binding site well (Fig. 11.1c).

Fig. 11.1
Mechanism of SBDD showing the design of a new molecule as per the binding site feature of a protein

11.3.2 Ligand-Based Drug Design (LBDD)

LBDD is one of the most frequently used methods in computer-aided drug design. It is applied when the 3D structure of the target is unavailable or the binding site is not accurately known, but experimentally active compounds that bind to the biological target of interest are available. The common assumption in drug identification is that compounds with similar chemical properties may exhibit similar biological activity. Ligand-based virtual screening (LBVS) is based on the exploration of molecular descriptors gathered from known active compounds. In general, shared characteristics of a compound series are identified and subsequently applied as molecular filters. These filters are used to discover potential lead molecules for experimental evaluation and to reduce the chemical space to be explored in further screening steps (Geppert et al. 2010; Sliwoski et al. 2013). This is the main principle and motivation of LBDD, in which a compound with interesting biological properties can act as a template for finding potential lead molecules. Basically, three approaches (2D fingerprints, 3D methods and pharmacophores) are widely used for defining and quantifying chemical similarity in LBDD.

11.3.2.1 Pharmacophore Modeling

Pharmacophore modeling is an essential way to describe the steric and electronic features needed for optimal interaction of a lead with receptor molecules. According to the International Union of Pure and Applied Chemistry (IUPAC), a pharmacophore is “the ensemble of steric and electronic features … necessary to ensure the optimal supramolecular interactions with a specific biological target structure to trigger or to block its biological activity” (Kaserer et al. 2015). In drug discovery approaches with small molecules, it is important to assign proper protonation and tautomeric states to the lead molecules. A pharmacophore describes the set of interactions required for binding in the cavity of the target molecule, represented as a set of spatially arranged spheres of a certain type and diameter. These spheres are commonly known as pharmacophoric features (Fig. 11.2). They include hydrophobic centroids, hydrogen-bond acceptors, hydrogen-bond donors, positively ionizable groups and negatively ionizable groups, all common features that target their corresponding sites. For example, a hydrophobic feature corresponds to hydrophobic protein side chains in the cavity, and a hydrogen-bond acceptor feature has a hydrogen bond-donating counterpart in the protein (Langer and Hoffmann 2006; Wolber and Langer 2005). Pharmacophore models are frequently employed in virtual screening to find potential lead molecules. For example, Mustata et al. developed a potential lead molecule against Myc-Max via a pharmacophore model generated using known disruptors. In another study, Petersen et al. identified a novel PPARγ partial agonist using a pharmacophore model; the model was built from a collection of known partial agonists and validated with a newly discovered partial agonist (Mustata et al. 2009; Petersen et al. 2011).
Pharmacophore-based screening matches the atoms or functional groups of each molecule, and the geometric relations between them, to the pharmacophore in the query. Basically, two steps are involved in a pharmacophore-based search: in the first step, the software checks whether each candidate molecule has the atom types or functional groups required by the pharmacophore; in the second, it checks whether the spatial arrangement of these elements matches the query.
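The two-step search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular package: each molecule is reduced to a list of (feature type, coordinates) pairs for a single conformer, the query is a list of feature types plus target inter-feature distances, and the names QUERY_TYPES, QUERY_DISTANCES, LIBRARY and the 0.5 Å tolerance are all hypothetical.

```python
from itertools import permutations
from math import dist


def has_required_features(molecule, query_types):
    """Step 1: the molecule must contain every feature type the query needs."""
    available = [ftype for ftype, _ in molecule]
    return all(available.count(t) >= query_types.count(t) for t in set(query_types))


def matches_geometry(molecule, query_types, query_distances, tolerance=0.5):
    """Step 2: some assignment of molecule features to query features must
    reproduce every query distance within the tolerance (angstroms)."""
    for picked in permutations(range(len(molecule)), len(query_types)):
        # Skip assignments whose feature types do not line up with the query.
        if any(molecule[m][0] != q for m, q in zip(picked, query_types)):
            continue
        if all(
            abs(dist(molecule[picked[i]][1], molecule[picked[j]][1]) - target) <= tolerance
            for (i, j), target in query_distances.items()
        ):
            return True
    return False


def pharmacophore_search(library, query_types, query_distances, tolerance=0.5):
    """Apply the cheap feature check first, then the geometric check."""
    hits = []
    for name, molecule in library.items():
        if not has_required_features(molecule, query_types):
            continue
        if matches_geometry(molecule, query_types, query_distances, tolerance):
            hits.append(name)
    return hits


# Hypothetical three-point query: indices refer to positions in QUERY_TYPES.
QUERY_TYPES = ["donor", "acceptor", "hydrophobic"]
QUERY_DISTANCES = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 4.0}

# Invented toy molecules, one conformer each.
LIBRARY = {
    "mol_fit": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobic", (3, 4, 0))],
    "mol_wrong_shape": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("hydrophobic", (10, 0, 0))],
    "mol_no_acceptor": [("donor", (0, 0, 0)), ("hydrophobic", (3, 4, 0))],
}

if __name__ == "__main__":
    print(pharmacophore_search(LIBRARY, QUERY_TYPES, QUERY_DISTANCES))
```

Running the feature check first means the more expensive geometric comparison is only attempted for molecules that could match at all. A practical implementation would also have to enumerate conformers, which this sketch leaves out.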

Fig. 11.2
Basic pharmacophore features (a) and the lead molecule superimposed on the pharmacophore model (b)

11.3.2.1.1 2D pharmacophore searching

Searching a 2D database to find potential lead molecules is one of the crucial steps in drug discovery. Pharmacophore-based virtual screening has been used to identify potential hit molecules in the drug development process. This approach can be used to virtually screen millions of compounds for hit identification. However, problems can arise in substructure searching when the number of compounds identified reaches into the thousands. This problem can be mitigated by grouping the compounds according to the similarity between each compound in the database and the query (Vyas et al. 2008). The structure-activity relationship of these compounds can be derived from the biochemical data in this process, even before synthetic plans are made for lead optimization (Enyedy et al. 2003). Beyond structural similarity, activity similarity has also been the subject of several studies.
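One common way to quantify the similarity between a database compound and the query is the Tanimoto coefficient on 2D structural keys. The sketch below assumes that each compound has already been reduced to a set of substructure keys; the key names, the 0.5 threshold and the function names are illustrative assumptions and are not taken from the cited studies.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto coefficient: shared keys divided by all keys present in either set."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_hits_by_similarity(query_keys, hits, threshold=0.5):
    """Split substructure hits into those similar to the query and the rest,
    each list sorted from most to least similar."""
    similar, dissimilar = [], []
    for name, keys in hits.items():
        score = tanimoto(query_keys, keys)
        (similar if score >= threshold else dissimilar).append((name, score))
    similar.sort(key=lambda pair: pair[1], reverse=True)
    dissimilar.sort(key=lambda pair: pair[1], reverse=True)
    return similar, dissimilar


if __name__ == "__main__":
    # Invented structural keys for a query and three database hits.
    query = {"ring", "amide", "hydroxyl", "halogen"}
    hits = {
        "hit_a": {"ring", "amide", "hydroxyl", "halogen"},
        "hit_b": {"ring", "amide", "nitro", "ester"},
        "hit_c": {"ring", "amide", "hydroxyl", "ester"},
    }
    similar, dissimilar = group_hits_by_similarity(query, hits)
    print("similar:", similar)
    print("dissimilar:", dissimilar)
```

With thousands of substructure hits, a ranking of this kind lets the most query-like compounds be inspected first instead of the whole list.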

11.3.2.1.2 3D pharmacophore searching

3D pharmacophore modeling acts as an efficient filter for virtual screening of large compound libraries owing to its simplicity and abstract nature. The computational complexity of the hit identification process in virtual screening is greatly reduced by the sparse pharmacophoric representation of ligand-protein interaction. The generation of a query pharmacophore model that specifies the type and geometric constraints of the chemical features is the first step in a typical pharmacophore-based virtual screening experiment. Both ligand-based and structure-based models can be created and used separately or in combination via parallel virtual screening. Ligand-based screening is generally used when a crystallographic, solution or modeled structure is lacking. Both ligand-based and structure-based pharmacophores can screen for potential novel compounds that have features and activity similar to those of known compounds and can bind the same site of the protein, as shown in Fig. 11.3. Several software products, such as Catalyst, Sybyl/Unity, MOE and Phase, are widely used for ligand-based pharmacophore building. Structure-based methods in pharmacophore modeling have gained significant interest in recent years, and several new approaches have been described, including the application of pharmacophore fingerprints for lead identification (Karnachi and Kulkarni 2006; Langer and Hoffmann 2006).

Fig. 11.3
Working method of 3D pharmacophore searching against small molecule databases

11.3.2.1.3 Fingerprinting

Pharmacophore fingerprints are defined as binary-encoded information about the presence or absence of pharmacophore features, such as sets of centers and the inter-center distances between them, in a single molecule or a compound collection. By default, seven center types that are probably the most important for ligand-receptor interactions are defined, including hydrogen-bond acceptor (A) and donor (D), groups with formal negative (N) and positive (P) charges, hydrophobic (H) and aromatic ring (R). Generally, fingerprinting focuses on two- to four-point pharmacophore fingerprints, but a larger number can be used, and utilization of up to nine pharmacophores has been described (Martin and Hoeffel 2000; Cato 2000). Traditionally, pharmacophore triplets are widely used and are most effective in terms of information content versus complexity; they are usually generated for a set of compounds instead of an individual one. For each compound, the low-energy conformers are calculated, and every possible combination of three or four features is used to set the corresponding bit in the fingerprint. The fingerprint obtained for the set is termed the ‘union key’ (Cato 2000). For proteins with a known binding site, pharmacophore fingerprints can be calculated from complementary site-points in the binding site. Programs such as the ChemProtein module of Chem-X or GRID are often used to generate site-points using a variety of probe atoms (Mason and Cheney 2000; Mason and Beno 2000). Chem-X is one of the most popular software packages. Its fingerprints are defined according to all the potential pharmacophores that can be present in some low-energy conformer of the molecule.
Another method, the Oriented Substituent Pharmacophore PRopErtY space (OSPPREYS) approach, introduced by Martin and Hoeffel, is aimed towards better representation of diversity and similarity in combinatorial libraries in the 3D pharmacophore space (Martin and Hoeffel 2000). Pharmacophore fingerprint methods have a wide range of applications; they can be used to measure molecular similarity (Willett 2006), to design libraries, to assess their diversity and to search them for novel active compounds (Beno and Mason 2001).
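The triplet idea can be made concrete with a short sketch. It assumes that conformers are already available as lists of (center type, coordinates) pairs, encodes each triplet by its three typed, binned distances, and represents the fingerprint as a set of keys, which is equivalent to the set bits of a sparse binary fingerprint. The bin boundaries and the canonical key layout are simplifying assumptions and do not reproduce the encoding used by Chem-X or any other package.

```python
from itertools import combinations
from math import dist

BIN_EDGES = (2.0, 4.0, 6.0, 8.0)  # assumed distance bin boundaries in angstroms


def distance_bin(distance):
    """Index of the distance bin (0 for the shortest range)."""
    return sum(distance >= edge for edge in BIN_EDGES)


def triplet_keys(conformer):
    """All three-point pharmacophore keys present in one conformer."""
    keys = set()
    for triple in combinations(conformer, 3):
        edges = []
        for (type_a, xyz_a), (type_b, xyz_b) in combinations(triple, 2):
            pair = tuple(sorted((type_a, type_b)))
            edges.append((pair, distance_bin(dist(xyz_a, xyz_b))))
        # Sorting the three edges makes the key independent of feature order.
        keys.add(tuple(sorted(edges)))
    return keys


def compound_fingerprint(conformers):
    """Keys found in any of the supplied low-energy conformers of a compound."""
    fingerprint = set()
    for conformer in conformers:
        fingerprint |= triplet_keys(conformer)
    return fingerprint


def union_key(compound_fingerprints):
    """Union of the fingerprints of a set of compounds."""
    return set().union(*compound_fingerprints)


if __name__ == "__main__":
    # Invented conformer with acceptor (A), donor (D) and hydrophobic (H) centers.
    conformer = [("A", (0, 0, 0)), ("D", (3, 0, 0)), ("H", (0, 5, 0))]
    print(triplet_keys(conformer))
```

Because the fingerprints are plain sets, two libraries can be compared by the overlap of their union keys, and the keys missing from a library indicate pharmacophores it does not cover.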

11.3.2.2 QSAR Modeling

Quantitative structure-activity relationship (QSAR) modeling is a highly popular approach in ligand-based drug design. The method quantifies the correlation between the chemical structures of a series of compounds and a chemical or biological process. The basic assumption underlying QSAR is that structurally similar molecules, or compounds with similar physicochemical properties, show similar activity (Akamatsu 2002; Verma and Hansch 2009). The first step in developing a QSAR model is the identification of a group of chemical entities or potential lead molecules that show the desired biological activity. The developed QSAR model is then used to optimize the active compounds to maximize the relevant activity, and the optimized compounds are tested experimentally for the desired activity. Four main steps are involved in QSAR model building (Fig. 11.4). In the first step, potential lead molecules with experimentally measured values of the desired biological activity are identified. In the second step, molecular descriptors associated with various structural and physicochemical properties of the molecules are identified. In the third step, the correlation between the molecular descriptors and biological activity is determined to explain the variation in activity in the dataset. Finally, the statistical stability and predictive power of the QSAR model are tested.
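The four steps can be illustrated with the simplest possible model, a one-descriptor linear regression. The sketch below is a toy under stated assumptions: the descriptor and activity values are invented for illustration and are not measured data, a single descriptor stands in for the many used in practice, and leave-one-out cross-validation (q²) is used as one common, but not the only, check of predictive power.

```python
def fit_line(xs, ys):
    """Step 3: least-squares fit of activity = slope * descriptor + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Fraction of the variance in the observed activities explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(xs, ys):
    """Step 4: leave-one-out cross-validated q2 (each compound predicted by a
    model fitted without it)."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return r_squared(ys, predictions)


if __name__ == "__main__":
    # Steps 1 and 2: invented compounds with one descriptor and one activity each.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical logP-like values
    activity = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical pIC50-like values
    slope, intercept = fit_line(descriptor, activity)
    fitted = [slope * x + intercept for x in descriptor]
    print("slope, intercept:", slope, intercept)
    print("r2:", r_squared(activity, fitted))
    print("q2 (leave-one-out):", loo_q_squared(descriptor, activity))
```

A q² much lower than r² would suggest that the model fits the training compounds but predicts unseen ones poorly.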

Fig. 11.4
Working method of QSAR modeling and predictions

In the classical or 2D QSAR method, various electronic, hydrophobic and steric features are correlated with biological activity for a congeneric series of compounds (Acharya et al. 2011). In the classical method, the molecular descriptors used for correlation with activity are mostly representative of fragments of the parent molecule. The major advantage of the classical method is that it is more effective for a congeneric series of molecules; however, the fragment-based descriptors are usually inadequate to capture the 3D conformational features of a molecule that are crucial for its activity (Winkler 2002; Bernard et al. 2005; González et al. 2009). To describe these features, the 3D QSAR method was developed, in which various geometric, physical and quantum chemical descriptors are used to describe the 3D features of a molecule; those descriptors are then combined to create a pharmacophore that can explain the biological activity of ligands (Chang and Swaan 2006). The developed pharmacophore model is then subjected to stability and statistical analysis to obtain the final 3D QSAR model. Several techniques, including CoMFA, CoMSIA and Catalyst, are currently used for this drug design approach.

11.3.2.3 CoMFA

Comparative molecular field analysis (CoMFA) is a 3D QSAR technique mainly used to describe structure-activity relationships in a quantitative manner. In this method, a set of molecules is identified and aligned on a 3D grid based on their 3D structures, and the values of steric and electrostatic potential energies are calculated at each grid point. The selected molecules should have a similar binding mode to the same kind of receptor. In the next step, a group of molecules is selected as a training set to derive the CoMFA model. The remaining molecules form a test set, which independently checks the validity of the derived models. A pharmacophore hypothesis is generated to orient the superposition of all molecules and to afford a rational and consistent alignment. At each grid point, the interaction energy between the molecule and a probe (a carbon atom, a positively or negatively charged atom, a hydrogen-bond donor or acceptor, or a lipophilic probe) is calculated, and these values are correlated with the biological activity. Principal component analysis (PCA) and partial least squares (PLS) are the most widely used methods for development of the pharmacophore model in CoMFA. The developed model is then tested for statistical significance and robustness (Gohda et al. 2000; Akamatsu 2002; Yasuo et al. 2009). The results can be represented as contour maps that indicate points of the lattice where variations in field values are related to variations in biological activity. These maps can be used to identify the regions of molecules where certain types of interactions have a favorable or unfavorable influence on the biological activity. Recently, several modifications have been described that serve as alternatives to CoMFA (Sen et al. 2012).
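The grid calculation at the heart of this procedure can be sketched as follows. The code evaluates a Lennard-Jones 6-12 steric term and a Coulomb electrostatic term between a probe and every atom of an aligned molecule, then truncates large values. The well depth, contact distance, +1 probe charge and 30 kcal/mol truncation are illustrative assumptions rather than the settings of a specific CoMFA implementation, and a single set of steric parameters is used for all atoms to keep the example short.

```python
from itertools import product
from math import dist

COULOMB_CONSTANT = 332.06  # kcal * angstrom / (mol * e^2)


def make_grid(low, high, spacing):
    """Regular cubic lattice of probe positions enclosing the aligned molecules."""
    steps = int(round((high - low) / spacing)) + 1
    axis = [low + i * spacing for i in range(steps)]
    return list(product(axis, repeat=3))


def probe_fields(atoms, point, probe_charge=1.0, well_depth=0.1, r_min=3.4, cutoff=30.0):
    """Steric and electrostatic energies (kcal/mol) felt by a probe at one grid point.

    atoms is a list of (partial_charge, (x, y, z)) tuples.
    """
    steric = 0.0
    electrostatic = 0.0
    for charge, xyz in atoms:
        r = max(dist(point, xyz), 1e-3)  # avoid division by zero on top of an atom
        ratio = r_min / r
        steric += well_depth * (ratio ** 12 - 2.0 * ratio ** 6)
        electrostatic += COULOMB_CONSTANT * probe_charge * charge / r

    def clip(value):
        # Energies diverge near atomic nuclei, so they are truncated at the cut-off.
        return max(-cutoff, min(cutoff, value))

    return clip(steric), clip(electrostatic)


def field_descriptors(atoms, grid):
    """One descriptor row per molecule: steric and electrostatic value at each point."""
    row = []
    for point in grid:
        row.extend(probe_fields(atoms, point))
    return row


if __name__ == "__main__":
    # Invented two-atom "molecule" with opposite partial charges.
    molecule = [(0.4, (0.0, 0.0, 0.0)), (-0.4, (1.5, 0.0, 0.0))]
    grid = make_grid(-4.0, 4.0, 2.0)
    print(len(grid), "grid points,", len(field_descriptors(molecule, grid)), "descriptors")
```

Stacking one such row per training-set molecule yields the descriptor matrix that is then correlated with activity, typically by PLS.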

11.3.2.4 Comparative Molecular Similarity Indices Analysis (CoMSIA)

CoMSIA is another 3D QSAR method, introduced by Klebe and coworkers (1994), based on the calculation of similarity indices between the aligned molecules and a common probe atom placed at the points of the interaction grid. Most features of CoMSIA are similar to those of CoMFA, with two differences: the molecular field expression includes five different properties (hydrophobic, hydrogen-bond donor and hydrogen-bond acceptor terms in addition to steric and coulombic contributions), and similarity indices, rather than interaction energies, are calculated by comparing each ligand molecule with a common probe. These field properties are correlated with the biological property by PLS analysis, but the contour maps are more contiguous and easier to interpret in CoMSIA because there are no cut-off values (Flower 2002; Klebe et al. 1994). To calculate the similarity indices, a Gaussian-type functional form is used to describe the steric, electrostatic and hydrophobic components of the energy function, which avoids the use of an arbitrary cut-off value in the energy calculation (Acharya et al. 2011). The Gaussian function also provides a smoother description of potential energy in regions near the van der Waals radii of the atoms (Klebe et al. 1994).
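The effect of the Gaussian form can be seen in a small sketch. It follows the commonly cited shape of the CoMSIA similarity index, a negative sum over atoms of probe property times atom property times exp(-alpha r²); the sign convention, the attenuation factor alpha = 0.3 and the unit probe value are assumptions here and should be checked against Klebe et al. (1994) before reuse.

```python
from math import dist, exp


def similarity_index(atoms, point, probe_value=1.0, alpha=0.3):
    """Gaussian-weighted similarity index of one property at one grid point.

    atoms is a list of (property_value, (x, y, z)) tuples, where the property
    may be, for example, a steric, electrostatic or hydrophobic parameter.
    """
    total = 0.0
    for atom_value, xyz in atoms:
        r_squared = dist(point, xyz) ** 2
        total += probe_value * atom_value * exp(-alpha * r_squared)
    return -total


if __name__ == "__main__":
    atom = [(1.0, (0.0, 0.0, 0.0))]
    # The index stays finite even when the probe sits exactly on the atom,
    # so no truncation value is needed.
    for distance in (0.0, 1.0, 2.0, 4.0):
        print(distance, similarity_index(atom, (distance, 0.0, 0.0)))
```

Unlike a Lennard-Jones or Coulomb term, the Gaussian is bounded at r = 0 and decays smoothly, which is why no cut-off has to be imposed.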

11.4 Virtual Screening (VS) for Lead Discovery

The discovery of novel leads that interact with targets is one of the important steps in drug discovery. This is conventionally achieved by wet-lab high-throughput screening (HTS) in many pharmaceutical companies, but because of its high cost and low hit rate, cheaper and faster in silico screening approaches have been developed and broadly applied (Clark 2008; Ripphausen et al. 2010). As an alternative, virtual screening (VS) uses computational power to test a large set of small molecules in a limited time at low cost. VS is a stepwise process with a cascade of sequential filters that narrow down and select a set of lead-like hits with potential biological activity against the intended drug targets. It can be broadly classified into two categories, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS). A broad range of computational techniques can be applied in this process, including drug-likeness screening, counting schemes, functional group filters, topological drug classification, pharmacophore point filters and pharmacophore-based virtual screening. Molecular docking is a computationally intensive method that has been applied to very large databases of chemical structures.
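A filter cascade of this kind can be sketched as a list of predicates applied in order, cheapest first. The example assumes that descriptors (molecular weight, logP, hydrogen-bond donor and acceptor counts) and functional group labels have already been computed for each compound. The drug-likeness stage uses the Lipinski rule-of-five thresholds in their strict form; the flagged functional groups and all compound data are hypothetical.

```python
FLAGGED_GROUPS = {"acyl_halide", "aldehyde"}  # hypothetical list of unwanted groups


def passes_lipinski(compound):
    """Drug-likeness filter using the rule-of-five thresholds (no violation allowed)."""
    return (
        compound["mw"] <= 500
        and compound["logp"] <= 5
        and compound["hbd"] <= 5
        and compound["hba"] <= 10
    )


def passes_group_filter(compound):
    """Functional group filter: reject compounds carrying any flagged group."""
    return not (set(compound["groups"]) & FLAGGED_GROUPS)


def run_cascade(compounds, filters):
    """Apply the filters in order and record how many compounds survive each stage."""
    remaining = list(compounds)
    survivors_per_stage = []
    for label, keep in filters:
        remaining = [c for c in remaining if keep(c)]
        survivors_per_stage.append((label, len(remaining)))
    return remaining, survivors_per_stage


FILTERS = [
    ("drug-likeness", passes_lipinski),
    ("functional groups", passes_group_filter),
]

if __name__ == "__main__":
    # Invented compounds with precomputed descriptors.
    library = [
        {"name": "cpd1", "mw": 320.0, "logp": 2.5, "hbd": 2, "hba": 5, "groups": ["amide"]},
        {"name": "cpd2", "mw": 650.0, "logp": 4.0, "hbd": 3, "hba": 8, "groups": ["ester"]},
        {"name": "cpd3", "mw": 280.0, "logp": 1.2, "hbd": 1, "hba": 3, "groups": ["aldehyde"]},
    ]
    survivors, report = run_cascade(library, FILTERS)
    print([c["name"] for c in survivors], report)
```

More expensive stages, such as a pharmacophore search or docking, would be appended to the same list so that they only see compounds that passed the cheap filters.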

Protein-ligand docking has become one of the most widely used tools in modern drug discovery to predict the most likely binding mode of small molecules at a particular receptor, to explore the specific interactions that may be formed and to estimate ligand-binding affinity. A number of protein-ligand docking methods are available to date, from academic groups and commercial software vendors. The binding free energy between protein and ligand is estimated with rather heuristic terms, and these functions are referred to as scoring functions. Scoring is a very important step in the docking workflow, which includes protein preparation, ligand database preparation, docking calculation and post-processing. Basically, the scoring process comprises three different aspects relevant to docking and design. The first aspect is the ranking of the conformations generated by the docking search for one ligand interacting with a given protein; this aspect is crucial for detecting the binding mode that best approximates the experimentally observed situation. The second aspect is ranking different ligands with respect to their binding to one protein, that is, prioritizing ligands according to their affinity, which is essential in virtual screening. The third aspect is ranking one or more ligands with respect to their binding affinity to different proteins, which is essential for the consideration of selectivity and specificity (Leach and Hann 2000; Lewis et al. 2000). The amount and quality of available information on the target protein is one of the key factors in designing a virtual screening project (Klebe 2006). The coordinates of the 3D structure of known targets are valuable data and can be used to improve the quality of the results. The 3D structures of biomolecules are obtained by three principal methods: NMR spectroscopy, X-ray crystallography and homology modeling.
Currently, the PDB contains more than 70,000 experimentally solved 3D structures of proteins that can be used as targets in VS and as templates in homology modeling.
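The three scoring aspects map onto three small operations on docking output. The sketch below assumes that scores are energy-like, so that a lower (more negative) value is better, and that docking results are already available as plain dictionaries; the ligand names, protein names and score values are invented.

```python
def best_pose(poses):
    """Aspect 1: pick the best-scoring pose of one ligand in one protein."""
    return min(poses, key=lambda pose: pose["score"])


def rank_ligands(docking_results):
    """Aspect 2: rank ligands against one protein by their best pose score."""
    best_scores = {
        ligand: best_pose(poses)["score"] for ligand, poses in docking_results.items()
    }
    return sorted(best_scores.items(), key=lambda item: item[1])


def selectivity_gap(scores_by_protein, target):
    """Aspect 3: how much better one ligand scores on the intended target than
    on its best off-target protein (positive means the target is preferred)."""
    off_target = [s for protein, s in scores_by_protein.items() if protein != target]
    if not off_target:
        raise ValueError("at least one off-target score is required")
    return min(off_target) - scores_by_protein[target]


if __name__ == "__main__":
    # Invented docking output for two ligands against one protein.
    results = {
        "ligA": [{"pose": 1, "score": -7.2}, {"pose": 2, "score": -8.5}],
        "ligB": [{"pose": 1, "score": -6.1}],
    }
    print(best_pose(results["ligA"]))
    print(rank_ligands(results))
    print(selectivity_gap({"target": -8.5, "off1": -6.0, "off2": -7.0}, "target"))
```

Since scoring functions are heuristic, rankings like these are usually treated as a prioritization for follow-up work rather than as quantitative affinity predictions.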

11.4.1 Protein Modeling

Proteins are the fundamental structural elements in living organisms; they act as catalytic agents, signal transmitters, transporters and molecular machines in cells (Nelson et al. 2008). Most proteins do not function individually; they must interact with other molecules to carry out their cellular roles, and any alteration in the protein interface can lead to a pathological condition. Hence, protein interfaces may be used as potential targets in rational drug design (Rask-Andersen et al. 2011; Jubb et al. 2015). Many experimental methods, including NMR and X-ray crystallography, have been used to identify and characterize the protein-protein interface at the level of individual atoms and residues, and various mass spectrometry-based approaches, such as chemical cross-linking and hydrogen/deuterium exchange, have been used to report the location of the interface, typically at lower resolution (Hoofnagle et al. 2003; Kaveti and Engen 2006; Gobl et al. 2014; Shi 2014). Though these experiments provide valuable knowledge of the protein recognition mechanism, they face technical challenges such as expressing and purifying aggregation-prone protein samples, obtaining high-quality crystals and protein size constraints, and they are both labor-intensive and time-consuming. Hence, in the absence of an experimentally determined structure, an alternative computational approach such as comparative or homology modeling is used to predict the 3D model of a protein that is related to at least one known protein structure. The model gives the 3D structure based on the alignment of the target to one or more known protein structures (Pieper et al. 2002).

11.4.1.1 Homology Modeling

Comparative or homology modeling is one of the easiest of the three structure prediction approaches. In homology modeling, the process consists of fold assignment, target-template alignment, model building and model evaluation. There are several computer programs and web servers that automate the comparative modeling of proteins. Generally, the 3D structure of a protein can be predicted by several different approaches, and the choice depends strongly on the sequence identity (SI), the percentage of identical amino acid residues shared by the target sequence and its templates (Santos Filho and Alencastro 2003). Ab initio modeling is another method for predicting the 3D structure of a protein and is most suitable when there is no template with significant sequence identity to the target sequence. If the sequence identity between target and template protein is above 30%, comparative or homology modeling is a suitable approach (Baker and Sali 2001; D’Alfonso et al. 2001). In practice, homology modeling consists of seven important steps: template recognition and initial alignment, alignment correction, backbone generation, loop modeling, side chain modeling, model optimization and model validation (Peitsch et al. 2000; Westbrook et al. 2002; Orengo et al. 2002; Lo Conte et al. 2002).
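The role of sequence identity in choosing a modeling approach can be shown with a short sketch. It assumes that the target and template have already been aligned (gaps written as "-") and computes identity over aligned, non-gap positions; other conventions for the denominator exist, so this definition is an assumption. The 30% threshold is the one quoted above, and the sequences and function names are illustrative.

```python
def percent_identity(aligned_target, aligned_template):
    """Percentage of identical residues over aligned (non-gap) positions."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_method(identity, threshold=30.0):
    """Apply the rule of thumb from the text: above the threshold, use a template."""
    return "homology modeling" if identity > threshold else "ab initio"


if __name__ == "__main__":
    # Invented pre-aligned fragments.
    target = "ACDE-G"
    template = "ACDFHG"
    identity = percent_identity(target, template)
    print(identity, suggest_method(identity))
```

When several templates are available, the same function can be used to rank them, which corresponds to the simplest selection rule of taking the most similar template.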

Template selection is the initial step in reliable homology modeling. The percentage of sequence identity between the sequence of interest (query) and a possible template can be determined with different software. A template can be found by searching with the query sequence against a database such as the Protein Data Bank (Westbrook et al. 2002), SCOP (Lo Conte et al. 2002) or CATH (Orengo et al. 2002). Three main classes of protein comparison methods are used in fold identification. In the first class, the target sequence is subjected to pairwise sequence alignment with each database sequence independently to find homologous sequences (Fiser 2010). Programs such as BLAST (Schäffer et al. 2001), FASTA (Srivastava et al. 2009) and CDART are frequently used to search for related protein sequences and template structures. The second class relies on multiple sequence alignment profiles and compares sequences using profile analysis, profile-profile comparisons, hidden Markov models and intermediate sequence search (Rychlewski et al. 2000; Yona and Levitt 2002; Zhou and Zhou 2005; Fiser 2010). SAM and PSI-BLAST (Karplus et al. 2003) are the most often used programs for this approach. The third class is also a pairwise method, in which the target sequence is assumed to adopt one of the many known 3D folds; the alignment is optimized with respect to a structure-dependent scoring function independently for each sequence-structure pair, i.e., the target sequence is threaded through a library of 3D folds (Kelley et al. 2000).

The next important step is the sequence alignment between the target and the template structure. The alignments produced by fold assignment methods are widely used as a starting point, and it is generally agreed that profile-based alignments produce better quality models than sequence-based alignments. In addition, HMM-based alignments produce higher quality models than the PSSM-based alignments produced by PSI-BLAST (Yan et al. 2013). When a protein sequence is compared pairwise against a library of 3D structure profiles, the method is also known as fold assignment. Once a list of potential templates has been obtained using different search methods, it is necessary to select the template most appropriate for the modeling problem. Choosing the template with the highest sequence similarity is the simplest selection rule (Retief 2000). After the selection of a template, a suitable method is used to construct the 3D model from the template and the alignment. Generally, rigid-body assembly, segment matching, spatial restraints and artificial evolution are used for model building. Rigid-body assembly relies on the natural dissection of the protein into conserved core regions, variable loops that connect them and side chains that decorate the backbone. Segment matching constructs a model by using a subset of atomic positions from the template structure as guiding positions and by identifying and assembling short all-atom segments that fit these guiding positions; the segments can be found by scanning all known protein structures (Xiang 2006). Several programs are available for modeling the query sequence. MODELLER, developed by Andrej Šali and co-workers, remains one of the most widely used comparative modeling programs and implements the spatial restraints approach. It starts by aligning the target sequence with the related known 3D structure, and its output is a model that contains all main chain and side chain non-hydrogen atoms. In addition to MODELLER, other tools including SWISS-MODEL, RAMP, PrISM, COMPOSER, CONGEN+2 and DISGEO/Consensus are often used in homology modeling (Schwede et al. 2003; Vyas et al. 2012). The homology modeling approach is thus implemented in several programs, both commercial and publicly available.

Model evaluation and validation are necessary to ensure that the constructed model has good stereochemistry. The most important factor in the assessment of constructed models is the scoring function; evaluation programs assess the location of each residue in a model with respect to the environment expected from high-resolution X-ray structures. The stereochemistry of the modeled protein can be verified by analyzing parameters such as bond lengths and angles, torsional angles and chirality of residues using PROCHECK (Laskowski et al. 1993), WHATCHECK (Hooft et al. 1996), PROSA (Sippl 1993) and MolProbity (Davis et al. 2007; Chen et al. 2010). The reliability of a predicted model is also checked against other parameters such as planarity of the peptide bond, chirality of the Cα atoms, bond lengths and angles in the main chain, planarity of aromatic systems, the inner packing of globular proteins, the elements of secondary structure, and the distribution of hydrophobic and hydrophilic residues in the predicted protein structure (Schwartz et al. 2001).
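Torsional angles and peptide-bond planarity are geometric quantities that can be computed directly from atomic coordinates. The following is a minimal sketch of a torsion (dihedral) angle calculation from four points, using the common atan2 formulation; it is not the code used by PROCHECK or MolProbity. The helper names, the test coordinates and the 30° planarity tolerance are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) defined by four consecutive atoms."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)          # normal of the plane (p1, p2, p3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def is_planar(omega_deg, tolerance_deg=30.0):
    """True if a peptide torsion is close to cis (0) or trans (180).

    The tolerance is a hypothetical value chosen for this sketch.
    """
    deviation = min(abs(omega_deg), abs(abs(omega_deg) - 180.0))
    return deviation <= tolerance_deg


# Idealized test geometries (not taken from a real structure).
cis = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
trans = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0))
twisted = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0))
```

For a real model, the four points would be the coordinates of Cα, C, N and Cα of consecutive residues for the peptide torsion, or the corresponding backbone atoms for the other main chain torsions.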

11.4.1.2 Threading

Comparative modeling rests on the observation that proteins with similar sequences, often closely related homologues, tend to have similar 3D structures with similar conformations, so that a careful alignment of corresponding amino acid residues allows the unknown structure to be modeled. When no sequences are clearly related to the modeling target, the alternative method of threading is employed to predict structure via fold recognition. Protein threading, i.e., sequence-structure alignment, is a promising template-based fold recognition method, which identifies a suitable fold from a structure library for the query sequence and provides an alignment between the query protein and the fold (Shan et al. 2001). The word ‘threading’ was first coined by Jones et al. (1992); the original term was ‘optimal sequence threading,’ later shortened to ‘threading.’ In this method, the query sequence is threaded onto the backbones of the template structures. Threading requires four basic components: (1) a template library representing the 3D protein structures to be used as templates; (2) an energy function to describe the fitness of the query for any template; (3) a threading algorithm to search for the lowest energy among the possible alignments for a given sequence-template pair; (4) a criterion to estimate the confidence level of the predicted structure. Threading methods are further classified into two broad categories: singleton threading, which considers only the preference of amino acids in the query sequence for single sites of the templates; and pairwise threading, which uses the preference of pairs of amino acids in the query sequence to lie within contact distance when they are aligned to a given structure. Singleton threading constructs a 1D structure profile for each amino acid residue position in a template using 3D structural information, such as secondary structure type, degree of environmental polarity and fraction of residue surface accessible to solvent. Typically in threading, it is assumed that the backbones of the structures are rigid and only the amino acid side chains of the query and the template are different. Threading exploits the fact that proteins with different functions can possess a similar structure even though they may have little to no sequence similarity. LOOPP (Learning, Observing and Outputting Protein Patterns) (Tobi and Elber 2000; Meller and Elber 2001; Teodorescu et al. 2004) and THREADER are software packages that can be used for structure prediction via fold recognition. Both LOOPP and THREADER rely on similar strategies, yet they use different energy and scoring functions to generate possible alignments with feasible templates. THREADER uses a solved protein structure as a scaffold on which to place the target protein sequence; secondary structure information predicted for the target sequence can be used to constrain the alignment between the target and the template. It uses a set of knowledge-based potentials, derived from statistical data compiled from known protein structures, together with pairwise pseudo-energies to identify misfolded proteins.

The strategy of LOOPP is similar to that of THREADER, but it differs in its implementation of an empirical energy function and its scoring method. The most notable aspect of LOOPP is its extensive parameterization, which is based on structures from the Protein Data Bank (PDB) and a database of close to five million decoy structures (Berman et al. 2000; Tobi and Elber 2000). Three novel implementations of common protocols (the pairwise contact model, gap penalties and Z-scores) differentiate LOOPP from other threading methodologies. Its new pairwise interaction model (empirical energy function) is the key to its threading algorithm. Basically, two main types of empirical energy functions exist in this method: (1) those that score pairwise contacts between residues within a specified distance of one another; (2) those based on the environment of an amino acid residue at a point in the structural lattice (Meller and Elber 2001). Several threading programs including the NCBI threading package (Bryant and Lawrence 1993), PROFIT (Sippl and Weitckus 1992), PROSPECT (Xu et al. 1998), CASP-3 (CASP 1999), TOPITS (Rost and Sander 1995) and SAS (Milburn et al. 1998) are used for singleton and pairwise interactions. The NCBI threading package provides a good statistical assessment of a threading result, and recently CASP-3 was used as a top performer in threading with pairwise interactions.
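A pairwise contact energy of the first type can be sketched in a few lines. The example below threads a query sequence onto fixed template positions without gaps and sums a contact potential over all residue pairs closer than a cutoff. It is a toy illustration, not the LOOPP or THREADER energy function: the two-letter H/P alphabet, the potential values, the 7 Å cutoff, the minimum sequence separation and the template coordinates are all hypothetical.

```python
import math


def contact_pairs(coords, cutoff=7.0, min_separation=3):
    """Index pairs of template positions that are in spatial contact."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def threading_energy(sequence, coords, potential, cutoff=7.0):
    """Sum the contact potential over all contacting pairs (gapless threading)."""
    if len(sequence) != len(coords):
        raise ValueError("gapless threading needs one residue per template position")
    energy = 0.0
    for i, j in contact_pairs(coords, cutoff):
        key = tuple(sorted((sequence[i], sequence[j])))
        energy += potential.get(key, 0.0)
    return energy


# Hypothetical hairpin-like template: two antiparallel strands of three positions.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
            (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

# Hypothetical potential: only hydrophobic-hydrophobic contacts are rewarded.
toy_potential = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}

e_good = threading_energy("HHPPHH", template, toy_potential)
e_poor = threading_energy("PPHHPP", template, toy_potential)
```

The sequence whose hydrophobic residues fall on contacting positions receives the lower energy. A real threading program would additionally search over alignments with gaps and convert the raw energies into Z-scores.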

11.4.1.3 Ab Initio Method

The ab initio method is a modeling technique often used for structure prediction when the query protein sequence has little or no similarity to proteins of known structure; in this method the query protein is folded starting from a random conformation. The ab initio method is based on the thermodynamic hypothesis proposed by Anfinsen, according to which the native structure corresponds to the global free energy minimum under a given set of conditions (Floudas et al. 2006). Basically, the ab initio category has two subclasses, fragment-based and biophysics-based methods. These are often called, respectively, first-principles methods that employ database information and first-principles methods without database information (Floudas 2007). All of the proposed approaches rely on minimization of the energy function over the conformational parameters. The typical method has four basic steps for finding the conformation with the lowest energy: (1) start with an unfolded or arbitrarily folded conformation; (2) generate alternative conformations using some heuristics; (3) estimate their corresponding energies; and (4) repeat the generation of alternative conformations until the final criterion is reached. The accuracy of the energy function, the efficiency of the search algorithm and the selection of the best models play a crucial role in ab initio structure prediction. In the most basic treatment of the folding process, quantum mechanics is used to model and estimate the interactions of atoms. Currently, with high performance computing facilities, a force field (FF) or energy function is employed to express a variety of atomic interactions such as van der Waals forces, torsion angles, electrostatics and bond lengths. Energy functions are usually combined with a search procedure to locate the conformation that has the minimum energy function value. The most popular optimization methods are molecular dynamics and Monte Carlo simulation (Adcock and McCammon 2006). The category of ab initio prediction with database information focuses only on predicting a protein’s final configuration as accurately as possible. In this approach, the structure prediction starts with the primary amino acid sequence, for which different conformations are searched, leading to the prediction of native folds. After the folds have been recognized and predicted, model assessment is performed to verify the quality of the structure. ROSETTA and I-TASSER are widely used fragment-based methodologies for ab initio structure prediction of a protein. TASSER was initially created by Zhang and Skolnick (2004), and the enhanced versions Chunk-TASSER (Zhou and Skolnick 2007) and I-TASSER (Wu et al. 2007) were developed later. TASSER is a hierarchical approach that encompasses three phases, hence its name: threading/assembly/refinement (“TASSER”). The first step, threading, is an iterative sequence-structure alignment algorithm that uses the program PROSPECTOR_3 (Skolnick et al. 2004). The second step, assembly, uses parallel hyperbolic Monte Carlo sampling to rearrange the template fragments (Zhang et al. 2005). The final step, refinement, is performed using a clustering program called SPICKER (Zhang and Skolnick 2004), and the full atom optimization is conducted using the CHARMM22 force field. ROSETTA prediction involves the identification of small fragments from structural databases that are consistent with the local sequence preferences.
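The four-step search described above can be sketched as a Metropolis Monte Carlo loop over a small set of torsion angles. This is a toy under stated assumptions, not the sampling scheme of ROSETTA or TASSER: the energy function is a hypothetical cosine well with its minimum at -60° for every angle, and the step size, temperature factor `kT`, step count and random seed are arbitrary illustrative choices.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: zero when every torsion angle equals -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def metropolis_search(energy, start, steps=5000, step_size=20.0, kT=0.2, seed=0):
    rng = random.Random(seed)
    current = list(start)                      # (1) arbitrary starting conformation
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):                     # (4) repeat until the step budget is used
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step_size, step_size)   # (2) perturb one angle
        e_trial = energy(trial)                # (3) evaluate the trial energy
        accept = e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT)
        if accept:
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


start_conformation = [180.0, 60.0, -120.0]
best_conformation, best_energy = metropolis_search(toy_energy, start_conformation)
```

Occasionally accepting uphill moves is what allows the search to escape local minima; production methods replace the toy energy with a force field or knowledge-based potential and the single-angle move with fragment insertion or similar heuristics.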

11.4.1.4 Protein Validation Server

Protein structure has proved to be a crucial piece of information for biochemical research. Of the millions of proteins sequenced to date, only a small fraction have an experimentally solved structure, and the only feasible way to bridge the gap between sequence and structure data is computational modeling. Unlike experimental structures, computationally modeled structures span a broad spectrum of accuracy, so the accuracy of each model has to be estimated. Over the past two decades, several approaches have been developed to analyze the accuracy of protein structures and models. They use stereochemistry checks, molecular mechanics energy-based functions and statistical potentials. Typically, features like molecular environment, hydrogen bonding, secondary structure, solvent exposure, planarity, chirality, phi/psi preference, chi angles, non-bonded contact distances, unsatisfied donors/acceptors, pairwise residue interaction and molecular packing are analyzed in these approaches. A good quality model should resemble a native protein, with the spatial features of the residues complying with the empirically characterized constraints on torsional angles captured in Ramachandran plots (Ramachandran et al. 1963). PROCHECK (Laskowski et al. 1993) and MolProbity (Chen et al. 2010) are widely used programs for determining whether a modeled protein structure has native-like features. Traditionally, several studies have examined protein structures using an all-atom description. The Ramachandran plot of the backbone dihedral angles ɸ (rotation about the N-Cα bond) and ψ (rotation about the Cα-C bond) is a representative microscopic description of the protein structure. Dihedral angle prediction has several applications in protein structure prediction, which include secondary structure prediction (Rost 2001; Wood and Hirst 2005; Kountouris and Hirst 2009), generation of multiple alignments (Huang and Zou 2006a, b; Miao et al. 2008), identification of protein fold (Karchin et al. 2003; Zhang et al. 2008) and fragment-free tertiary structure prediction (Faraggi et al. 2009). Quality assessment is an important step in the modeling process, in which errors at the template level, alignment level, selected fragment level and structural level are analyzed. A template structure for a target sequence is identified by considering the significance of the score that indicates the fitness of the target to the template. Most frequently, the statistical significance of a raw score is expressed either as an E-value (in homology searches) or as a Z-score (in threading algorithms). A Z-score is calculated as the measured value minus the population mean, divided by the standard deviation of the population. A Z-score is therefore negative if the measured value is less than the mean and positive if it is greater than the mean. WHAT IF makes extensive use of Z-scores. The root mean square of the Z-scores (RMS Z-score) of a population provides basic information about its spread, and it should be close to 1.0.
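The Z-score definition given above, and the reason an RMS Z-score close to 1.0 is expected, can be checked with a short sketch. The population values are made up for illustration, and the functions are not taken from WHAT IF; only the Python standard library is used.

```python
import math
import statistics


def z_scores(values, mean, std):
    """Z = (measured value - population mean) / population standard deviation."""
    return [(v - mean) / std for v in values]


def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical reference population (e.g., raw scores of a set of reference structures).
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

zs = z_scores(population, mu, sigma)
rms_z = rms(zs)
```

When the Z-scores are computed against the same population that defined the mean and standard deviation, the RMS Z-score is 1.0 by construction. For a model scored against a reference population, a value well above 1.0 indicates a wider spread than expected and a value well below 1.0 a narrower one.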

11.4.2 Protein and Ligand Preparation

The success of the various drug design approaches depends largely on whether reasonable starting structures are used for both the protein and the ligand. A protein structure retrieved from the PDB (X-ray structure) consists of heavy atoms only and may contain water molecules, cofactors, activators, ligands and metal ions as well as several protein subunits; it also lacks information on bond orders and topologies. Because of these structural issues, several protein preparation approaches have been developed (Sastry et al. 2013; Pitt et al. 2013). The determination of the protonation states of the amino acids in the protein is the first crucial step in protein preparation. Several freely available software packages including PROPKA (Li et al. 2005), H++ (Anandakrishnan et al. 2012) and SPORES (ten Brink and Exner 2010) are widely used for this first step. The next important step is to add hydrogen atoms and optimize the hydrogen bond network of the protein. PDB2PQR is a widely used tool for these tasks (Dolinsky et al. 2007). The following steps are the assignment of partial charges, capping of residues, treatment of metals, filling of missing loops and missing side chains and minimization of the protein structure to relieve steric clashes; in addition, a crucial decision must be made regarding whether water molecules will be left in or removed from the binding site. To address this last problem, tools such as 3D-RISM (Kovalenko 2003; Young et al. 2007; Abel et al. 2008), SZMAP (Myrianthopoulos et al. 2016), JAWS (Michel et al. 2009) and WaterMap (Young et al. 2007; WaterMap, Schrödinger 2014) are utilized, several of them as part of commercial software (Jorgensen and Tirado-Rives 2005; SZMAP Software Inc.). In the case of a protein structure co-crystallized with substrates and cofactors, the Protein Preparation Wizard of Maestro (Maestro, Schrödinger, LLC) is used to assign proper bond orders and generate accessible tautomer and ionization states prior to virtual screening.

The selection of the ligand molecules for docking is another important step in virtual screening. Ligands can be obtained from various databases such as ZINC or PubChem, or they can be sketched by means of tools such as ChemSketch or ChemDraw (Dias and de Azevedo 2008). A wide variety of small molecule databases are available for virtual screening-based drug design. Many of them are free and contain lead molecules with desirable characteristics. ZINC is a public access database containing a large number of commercially available compounds; it was developed in the Department of Pharmaceutical Chemistry at the University of California, San Francisco. NCI is another open database, developed by the Developmental Therapeutics Program of the National Cancer Institute, NIH; it currently contains over 250,000 molecules from both organic synthesis and natural sources. ASINEX is a regularly updated commercial database currently containing 600,000 screening compounds, 27,000 macrocycles, 23,000 fragments and 7000 building blocks. SPECS is a monthly updated database containing more than 240,000 novel drug-like small molecules obtained from an academic research institute. MAYBRIDGE is one of the widely used commercial databases, containing a screening hit discovery collection of more than 53,000 compounds and offering a fragment library of 30,000. CHEMBRIDGE encompasses about one million drug-like and lead-like molecules in two non-overlapping collections of 460,000 and 620,000 compounds, respectively. Table 11.1 shows widely used small molecule repositories. After potential lead molecules have been selected, they should be preprocessed before docking. Because a ligand database contains several thousand small molecules, manual steps in data preparation must be avoided. Typically, information on available ligands is stored in 2D form in the databases, which serve as data repositories. The atom and bond types of the 2D structures retrieved from these repositories must be checked and corrected, and protonation states and charges have to be assigned. Then, the structures must be converted to 3D, taking into account conformational properties of the ligand such as rotational barriers or allowed side-chain rotamers. In addition, protein-ligand interaction site-points that guarantee proper hydrogen-bonding directionality must be assigned (Claussen et al. 2001). LigPrep is the most widely used module for ligand preparation implemented in Schrödinger (LigPrep, Schrödinger 2011). In this module, ionization/tautomeric states are generated either with a pair of fast rule-based programs or with Epik, which is based on the more accurate Hammett and Taft methodologies (Shelley et al. 2007; Epik, Schrödinger 2011).

11.4.3 Active Site Prediction

The prediction and characterization of small-molecule binding sites is of great importance for drug discovery. Often, possible binding sites for potential small molecules are known from co-crystal structures of the target or of a closely related protein with natural ligand molecules. Hajduk and coworkers used heteronuclear-NMR-based screening to identify and characterize ligand binding sites on protein surfaces (Hajduk et al. 2005). Screening a large number of lead-like molecules against 23 target proteins revealed that 90% of the ligand molecules bound to specific locations on the protein surface, indicating that certain properties of small-molecule binding sites are common to molecular recognition in general. Computational studies are mostly used to predict the binding site when it is unknown or when a new binding site is to be identified, e.g., for allosteric molecules. Computational methods like Q-SITEFINDER, POCKET (Levitt and Banaszak 1992), SURFNET (Laskowski 1995), APROPOS (Peters et al. 1996), LIGSITE (Hendlich et al. 1997), CAST, CASTp (Binkowski et al. 2003) and PASS (Brady and Stouten 2000) are often used for binding site prediction. Computational methods for the identification of a binding site can be categorized into three major classes: (1) geometric algorithms that find concave invaginations in the protein molecule; (2) energy-based methods; and (3) methods that consider the dynamics of protein structures. Geometric algorithms find a putative binding site through the detection of cavities on the protein surface. In these algorithms, grids are used to describe the molecular surface of the protein, and the boundary of the binding site is determined by rolling a spherical probe over the grid surface. Algorithms of this kind are used in SURFNET, LIGSITE and POCKET; in SURFNET, for example, spheres are placed between all pairs of target atoms and the radius of each sphere is then reduced until it contains no atoms other than the pair. An et al. (2005) developed the Pocket Finder algorithm and expanded the geometric method by contouring a smoothed van der Waals potential for the target protein to identify candidate ligand binding sites. The SiteMap technique, developed by Schrödinger, Inc., identifies the known binding site in >96% of cases by linking together site-points that contribute to tight protein-ligand binding. SiteMap provides quantitative and graphical information that helps guide efforts to modify the ligand structure to enhance its properties (Halgren 2007; Halgren 2009) (Table 11.1).
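The geometric idea of detecting cavities on a grid can be illustrated with a strongly simplified scan. In the sketch below, an empty grid cell is flagged as a pocket cell when it is enclosed by protein on both sides along at least two of the three coordinate axes. This is only loosely in the spirit of grid-based methods such as LIGSITE and is not the published algorithm of any of the programs named above; the toy "protein" (a cube with a cavity open on one face), the unit grid and the threshold `min_events` are hypothetical.

```python
from itertools import product


def find_pocket_cells(occupied, grid_range, min_events=2):
    """Return empty grid cells enclosed by protein along >= min_events axes."""
    lo, hi = min(grid_range), max(grid_range)

    def hits_protein(cell, axis, step):
        # Walk from the cell along one axis direction until the grid edge.
        pos = list(cell)
        pos[axis] += step
        while lo <= pos[axis] <= hi:
            if tuple(pos) in occupied:
                return True
            pos[axis] += step
        return False

    pockets = set()
    for cell in product(grid_range, repeat=3):
        if cell in occupied:
            continue
        events = sum(
            1 for axis in range(3)
            if hits_protein(cell, axis, -1) and hits_protein(cell, axis, +1)
        )
        if events >= min_events:
            pockets.add(cell)
    return pockets


# Toy protein: a 5 x 5 x 5 block of occupied cells with a 3 x 3 x 4 cavity
# that opens onto the top face (z = 4).
cube = set(product(range(5), repeat=3))
cavity = set(product(range(1, 4), range(1, 4), range(1, 5)))
protein = cube - cavity

pocket_cells = find_pocket_cells(protein, range(-1, 6))
```

For this toy geometry the scan recovers exactly the cavity cells and none of the solvent cells surrounding the block. Real implementations map atoms with van der Waals radii onto a much finer grid and typically scan additional diagonal directions.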

Table 11.1 Widely used small molecule repositories with basic information about the class of the compounds and their size

11.4.4 Molecular Docking

In modern drug discovery, protein-ligand and protein-protein interaction mechanisms play a significant role, and docking predicts the orientation of a ligand when it is bound to a protein receptor or enzyme, using shape and electrostatic interactions to quantify the binding. Molecular docking is an attractive approach for understanding protein-ligand interactions in rational drug design and drug discovery; in such a mechanistic study a molecule is placed into the binding site of the receptor molecule, mainly in a non-covalent fashion, to form a stable complex with potential efficacy and higher specificity (Rohs et al. 2005; Guedes et al. 2014). The information obtained from a docking study can be used to assess the binding energy, free energy and stability of drug-biomolecule complexes in their optimized conformations, with the aim of identifying the complex with the lowest binding free energy. The two basic steps involved in molecular docking, usually related to sampling methods and scoring schemes, are (1) prediction of the ligand conformation as well as its position and orientation within the binding site (usually referred to as the pose) and (2) assessment of the binding affinity (Fig. 11.5).

Fig. 11.5 Basic steps involved in the molecular docking approach. (a) Three-dimensional structure of lead molecules; (b) three-dimensional structure of the protein; (c) ligand is docked into the binding site of the protein; (d) binding affinity and interactions of ligand molecules with protein

Most docking tools employ search algorithms such as genetic algorithms (GA), Monte Carlo algorithms, molecular dynamics algorithms and conformational search algorithms. Conformational search algorithms operate in docking by applying systematic and stochastic search methods (Agrafiotis et al. 2007; Yuriev et al. 2011). The basic methodology of molecular docking falls into three categories: induced fit docking, where both ligand and receptor molecules are flexible; rigid body docking, where ligand and receptor molecules are rigid; and flexible docking, in which both interacting molecules are likewise treated as flexible (Meng et al. 2011). The molecular docking process involves the following major steps: (1) Preparation of the protein: before docking, the 3D structure of the receptor molecule (retrieved from the PDB or obtained by molecular modeling) should be pre-processed by stabilizing the charges, filling in the missing residues, generating hydrogen atoms and removing free water molecules from the cavity. (2) Active site prediction: the binding site of the receptor molecule is predicted in this step; water molecules and hetero atoms are removed. (3) Ligand preparation: the small molecules can be retrieved from small molecule databases; Lipinski’s rule of five should be applied when choosing the ligand molecules. (4) Docking: in the final step, the ligand is docked against the protein and the interactions are analyzed; the scoring function assigns docking scores based on the best pose of each docked ligand complex. Over the last two decades, approximately 60 different docking tools and programs have been developed for both academic and commercial use, including DOCK (Venkatachalam et al. 2003), AutoDock (Österberg et al. 2002), FlexX (Rarey et al. 1996), Surflex (Jain 2003), GOLD (Jones et al. 1997), ICM (Schapira et al. 2003), Glide (Friesner et al. 2004), Cdocker, LigandFit (Venkatachalam et al. 2003), MCDock, FRED (McGann et al. 2003), MOE-Dock (Corbeil et al. 2012), LeDock (Zhao and Caflisch 2013), AutoDock Vina (Trott and Olson 2010), rDock (Ruiz-Carmona et al. 2014) and UCSF Dock (Allen et al. 2015). Table 11.2 shows the basic information on the currently used docking tools and scoring functions.
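Applying Lipinski's rule of five during ligand selection can be sketched as a simple filter. The rule flags compounds with a molecular weight above 500 Da, a logP above 5, more than 5 hydrogen bond donors or more than 10 hydrogen bond acceptors. The sketch assumes that these descriptors have already been computed by a cheminformatics package; the ligand names and descriptor values are hypothetical, and allowing at most one violation is a common convention rather than something stated in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    return lipinski_violations(d) <= max_violations


# Hypothetical library entries with precomputed descriptors.
library = [
    Descriptors("lig_A", 320.4, 2.1, 2, 5),
    Descriptors("lig_B", 612.7, 6.3, 4, 9),
    Descriptors("lig_C", 540.2, 3.0, 3, 8),
]
selected = [d.name for d in library if passes_rule_of_five(d)]
```

Here `lig_B` is rejected because it violates two criteria, whereas `lig_C` is kept with a single violation. Setting `max_violations=0` gives the strict form of the filter.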

Table 11.2 Basic characteristics of widely used docking tools

11.4.5 Scoring Methods

Molecular docking approaches use scoring functions to calculate the binding energies of the predicted ligand-receptor complexes. The scoring function is a key element of a protein-ligand docking algorithm and determines its accuracy (Gohlke and Klebe 2001; Schulz-Gasch and Stahl 2004; Jain 2006; Rajamani and Good 2007; Gilson and Zhou 2007). Speed and accuracy are the two basic requirements of a scoring function. Scoring functions are used mainly to delineate correct poses from incorrect poses, or binders from inactive compounds, within a reasonable computation time. Overall, scoring functions can be divided into three categories: force field-based, empirical and knowledge-based scoring functions (Kitchen et al. 2004). A classical force-field scoring function estimates the binding energy of a complex by calculating the sum of bonded terms such as bond stretching, angle bending and dihedral variation, and non-bonded terms including electrostatic and van der Waals interactions. Electrostatic terms use a set of derived force-field parameters such as those of AMBER or CHARMM (Miller et al. 2017) and are calculated by a Coulombic formulation. In addition to these terms, force field-based scoring functions also consider hydrogen bond, solvation and entropy contributions. Software such as DOCK (Kuntz et al. 1982), GOLD (Shoichet et al. 1993) and AutoDock (Morris et al. 1998) offers users such functions. Force fields are mathematical expressions describing the dependence of the energy of a system on the coordinates of its particles. Force-field scoring functions differ in their treatment of hydrogen bonds and in the energy function used, and they can be further refined with other techniques such as linear interaction energy (Michel et al. 2006) and the free-energy perturbation method (FEP) (Kollman 1993; Briggs et al. 1996) to improve accuracy in predicting binding energies. To reduce computational expense, alternative approaches such as the Poisson-Boltzmann/surface area (PB/SA) and generalized-Born/surface area (GB/SA) models have been used, which treat water as a continuum dielectric medium (Rocchia et al. 2002; Liu and Zou 2006; Lyne et al. 2006; Thompson et al. 2008; Guimaraes and Cardozo 2008).
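The non-bonded part of a force-field score can be sketched as a sum of Lennard-Jones (van der Waals) and Coulomb terms over all ligand-receptor atom pairs. The sketch below is a simplification made for illustration and is not the scoring function of DOCK, GOLD or AutoDock: it assumes a single Lennard-Jones well depth `epsilon` and diameter `sigma` for every atom pair, whereas real force fields such as AMBER or CHARMM use per-atom-type parameters. Energies are in kcal/mol, distances in Å and charges in units of the elementary charge.

```python
import math

# Coulomb constant in kcal*angstrom/(mol*e^2), as commonly used in molecular mechanics.
COULOMB_CONSTANT = 332.0637


def nonbonded_energy(ligand, receptor, epsilon=0.1, sigma=3.5, dielectric=1.0):
    """Sum of 12-6 Lennard-Jones and Coulomb terms over ligand-receptor atom pairs.

    Each atom is given as ((x, y, z), partial_charge).
    Assumption: one (epsilon, sigma) pair is used for all atom pairs.
    """
    total = 0.0
    for lig_xyz, lig_q in ligand:
        for rec_xyz, rec_q in receptor:
            r = math.dist(lig_xyz, rec_xyz)
            sr6 = (sigma / r) ** 6
            lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
            coulomb = COULOMB_CONSTANT * lig_q * rec_q / (dielectric * r)
            total += lennard_jones + coulomb
    return total


# Two neutral atoms at the Lennard-Jones minimum distance r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0) * 3.5
e_vdw = nonbonded_energy([((0.0, 0.0, 0.0), 0.0)], [((r_min, 0.0, 0.0), 0.0)])

# Two opposite unit charges 4 angstroms apart, with the Lennard-Jones term switched off.
e_elec = nonbonded_energy([((0.0, 0.0, 0.0), 1.0)], [((4.0, 0.0, 0.0), -1.0)], epsilon=0.0)
```

At the minimum distance the van der Waals term equals `-epsilon`, which gives a quick sanity check of the implementation. A distance-dependent dielectric or a continuum solvent model would replace the constant `dielectric` in more realistic scoring.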

The empirical scoring function is another method, which evaluates the types of physical events involved in the formation of the ligand-receptor complex. The binding energy of a complex is calculated by summing up a set of empirical energy terms including van der Waals energy, electrostatic energy, hydrogen bonding energy and desolvation terms. Each empirical energy term is multiplied by a coefficient, and the coefficients are determined by least squares fitting so as to reproduce the binding affinity data of a training set of protein-ligand complexes with known three-dimensional structure (Ballester and Mitchell 2010). Due to the simple energy terms and the nature of their fitting to known binding affinities of the training set, empirical scoring functions are computationally more efficient and faster than force-field-based methods. Surflex and FlexX, GlideScore (Friesner et al. 2004; Halgren et al. 2004), PLP (Gehlhaar et al. 1995; Gehlhaar et al. 1999), SYBYL/F-Score (Rarey et al. 1996), LigScore (Kramer et al. 1998) and ChemScore are some examples of programs and functions that use empirical scoring (Jain 2003). Table 11.3 provides the widely used scoring functions implemented in the most frequently used molecular docking programs.
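The fitting of the coefficients can be sketched as an ordinary least squares problem solved through the normal equations. The example uses only the Python standard library and a tiny synthetic training set; the three terms (hydrogen bond count, lipophilic contact count, rotatable bonds), the term values and the "measured" affinities are hypothetical and were constructed so that the exact coefficients are known.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_coefficients(terms, affinities):
    """Least squares fit: solve (X^T X) w = X^T y for the term weights w."""
    n_terms = len(terms[0])
    xtx = [[sum(row[i] * row[j] for row in terms) for j in range(n_terms)]
           for i in range(n_terms)]
    xty = [sum(row[i] * y for row, y in zip(terms, affinities)) for i in range(n_terms)]
    return solve_linear(xtx, xty)


def empirical_score(term_values, weights):
    return sum(w * t for w, t in zip(weights, term_values))


# Hypothetical training set: [h-bonds, lipophilic contacts, rotatable bonds] per complex.
training_terms = [[2.0, 10.0, 1.0], [0.0, 20.0, 3.0], [4.0, 5.0, 0.0], [1.0, 15.0, 2.0]]
# Synthetic affinities generated with weights (-1.0, -0.1, 0.3), in arbitrary energy units.
training_affinities = [-2.7, -1.1, -4.5, -1.9]

weights = fit_coefficients(training_terms, training_affinities)
predicted = empirical_score([3.0, 8.0, 1.0], weights)
```

Because the synthetic data are noise-free, the fit recovers the generating weights; with real affinity data the residuals would be non-zero, and the quality of the scoring function would depend on how representative the training set is.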

Table 11.3 Widely used empirical scoring functions in frequently used molecular docking tools

A third approach comprises knowledge-based scoring functions, which use statistical analysis of the structural information in experimentally determined protein-ligand complexes to obtain interatomic contact frequencies and distances between the ligand and the protein. This approach uses pairwise energy potentials derived from known ligand-receptor complexes to obtain a general function (Huang et al. 2006). These potentials are constructed from the frequency distributions, and the score is calculated by summing up the individual interactions. Compared to force field and empirical scoring functions, knowledge-based scoring functions offer a good balance between accuracy and speed; they are relatively robust and enable the scoring process to be as fast as with an empirical scoring function (Muegge 2006; Huang and Zou 2006a, b). More recently, consensus scoring methods have been developed, which combine several scores to assess the docking conformation.
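The conversion of contact frequencies into pairwise potentials is commonly done with an inverse Boltzmann relation, E(r) = -kT ln[f_obs(r) / f_ref(r)], where f_obs is the observed frequency of an atom pair at distance r and f_ref is the frequency expected in a reference state. The sketch below is an illustration of that relation under stated assumptions and not the potential of any specific program: the distance bins, the counts, the uniform reference state and the value kT = 0.592 kcal/mol (roughly room temperature) are all hypothetical choices.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, kT=0.592):
    """Per-bin energies E = -kT * ln(f_obs / f_ref); None where a count is zero."""
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            energies.append(None)   # no statistics available for this bin
            continue
        ratio = (obs / obs_total) / (ref / ref_total)
        energies.append(-kT * math.log(ratio))
    return energies


def score_pose(pair_distances, energies, bin_width=1.0):
    """Sum the bin energies over all ligand-protein atom pair distances of a pose."""
    total = 0.0
    for d in pair_distances:
        index = int(d // bin_width)
        if index < len(energies) and energies[index] is not None:
            total += energies[index]
    return total


# Hypothetical counts for one atom-pair type in 1-angstrom bins (0-1, 1-2, ..., 4-5).
observed = [0, 5, 40, 30, 25]
reference = [20, 20, 20, 20, 20]

pair_energies = inverse_boltzmann(observed, reference)
pose_score = score_pose([2.5, 2.9, 1.4], pair_energies)
```

Distances that occur more often than expected receive negative (favorable) energies, and rarely observed distances receive positive ones. Published potentials differ mainly in the choice of reference state and in how sparsely populated bins are handled.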

11.4.6 Molecular Dynamics (MD) Simulations

Molecular dynamics (MD) simulations have in recent years played a critical role in computational drug discovery. Simulation studies can provide detail concerning individual particle motion as a function of time, using physics-based energy functions and explicit representations of atomic systems to model protein dynamics. MD simulation studies provide basic information for evaluating the stability and function of a protein, for monitoring specific behaviors over the course of many simulations and for obtaining information about target structure or properties that is unobtainable from the static native structure. MD simulation was first developed in the late 1950s, when Alder and Wainwright performed it using a hard-sphere model. The first molecular simulation of a protein, BPTI, was reported in 1977 with a crude molecular mechanics potential and covered only 9.2 ps (Adcock and McCammon 2006). Molecular dynamics simulation mimics the physical motion of each atom of the macromolecule in its actual environment. The atoms of a protein molecule are allowed to interact for a certain period of time, which allows their trajectories in and around the protein molecule to be computed. A variety of properties such as free energies, kinetic measures and other macroscopic quantities of macromolecules can be calculated from the trajectories. Several studies have revealed the role of classical MD simulations in obtaining different conformations of proteins and nucleic acids, including early attempts to simulate spontaneous complex phenomena such as protein folding (Frenkel and Smit 2001). In recent research, MD simulation has been widely used to overcome the major limitation of static structure-based drug design and to complement routinely applied ligand docking calculations, which do not sample the major protein conformational rearrangements during ligand binding (Carlson 2002; Fanelli et al. 2008).
MD simulation is a multistep process that starts from knowledge of the potential energy of the system as a function of its position coordinates; the force acting on each atom of the system is computed from this potential energy and the coordinates. The next important step is setting up the simulation environment, which reproduces the actual environment, including the appropriate pressure and temperature. In general, protein simulation is carried out in a canonical (NVT) ensemble, particularly during the initial equilibration steps, or in an isothermal-isobaric (NPT) ensemble. For simulation, the protein molecule is placed in a unit cell and solvated with a suitable explicit solvent. Explicit water models such as TIP3P, TIP4P (Jorgensen et al. 1983), TIP5P (Mahoney and Jorgensen 2001), SPC and SPC/E (Berendsen et al. 1987) are the most popular models used to imitate the specific nature and complexity of molecular hydration, including the orientation of solvent dipoles and effective electrostatic shielding, subtle rearrangements of the hydrogen bond network, saturation of hydrophobic surfaces and the accompanying changes in entropy.
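
The first step above, obtaining forces from the potential energy, amounts to taking the negative gradient of the potential with respect to the coordinates. The sketch below illustrates this for a single Lennard-Jones pair and checks the analytic force against a finite-difference derivative. The Lennard-Jones form and the argon-like parameters are assumptions chosen for illustration; a real force field sums many bonded and non-bonded terms.

```python
def lj_energy(r, epsilon=0.238, sigma=3.4):
    """Lennard-Jones pair energy U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=0.238, sigma=3.4):
    """Analytic radial force F(r) = -dU/dr (positive values are repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def numerical_force(r, h=1e-6):
    """Central finite-difference estimate of -dU/dr, used only as a check."""
    return -(lj_energy(r + h) - lj_energy(r - h)) / (2.0 * h)


r_min = 2.0 ** (1.0 / 6.0) * 3.4  # separation at which the pair energy is lowest
force_at_4 = lj_force(4.0)        # attractive (negative) beyond r_min
```

At the energy minimum the force vanishes, and at larger separations it is attractive, which is the behavior an integrator relies on when it propagates the atoms.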

There are two main families of MD simulation methods, classical and quantum simulation, which are distinguished by the model chosen to represent a physical system. Classical molecular simulation uses a basic ball-and-stick model of molecules, in which the atoms correspond to soft balls and the bonds to elastic sticks. Several force fields are widely used in the molecular simulation approach. AMBER (Case et al. 2005), CHARMM (Brooks et al. 1983) and GROMOS are widely used force fields that differ principally in the way they are parameterized, but they generally give similar results; NAMD (Phillips et al. 2005) and GROMACS (Pronk et al. 2013) are widely used simulation packages that run such force fields. Quantum simulation, or first-principles MD simulation, began in the 1980s with the seminal work of Car and Parrinello, explicitly taking into account the quantum nature of the chemical bond. With the arrival of high-performance computers and graphics processing unit (GPU) architectures, MD simulation software can run efficiently on innovative hardware infrastructures, surpassing conventional alternatives. Even these methods, running on specialized hardware, fail to describe slow unbinding events. In fast-paced drug discovery programs, this is the major issue limiting the use of MD-based simulation for kinetic prediction (Borhani and Shaw 2012). However, these sampling issues have led to the development of several innovative algorithms that form the basis of enhanced sampling methods, which speed up the description of slow processes and accelerate rare events characterized by states of high free energy (Abrams and Bussi 2014). Sampling methods including free energy perturbation (Jorgensen and Thomas 2008), umbrella sampling, replica exchange, metadynamics (Laio and Parrinello 2002), steered MD (Isralewitz et al. 2001), accelerated MD (Hamelberg et al. 2004), milestoning (Faradjian and Elber 2004), transition-path sampling (Bolhuis et al. 2002), Monte Carlo sampling of conformational space, quantum mechanics/molecular mechanics (QM/MM) and molecular docking simulation are recently used methods for studying protein-ligand binding and estimating the associated energetics and kinetics (Durrant and McCammon 2011; Harvey and Fabritiis 2012).
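
To make one of the enhanced sampling methods listed above concrete, the sketch below computes the standard Metropolis acceptance probability used in temperature replica exchange, min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T), for swapping the configurations of two replicas. The energies and temperatures in the usage lines are hypothetical values, and a production implementation would also handle the exchange schedule and velocity rescaling.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping configurations between replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:  # favorable swaps are always accepted
        return 1.0
    return math.exp(delta)


# Hypothetical potential energies (kcal/mol) of two neighboring replicas
p_favorable = exchange_probability(-100.0, 300.0, -105.0, 310.0)
p_unfavorable = exchange_probability(-105.0, 300.0, -100.0, 310.0)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, whereas the reverse swap is accepted only with a probability that shrinks as the temperature gap or the energy gap grows.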

11.4.7 QM/MM Simulations

Most biological systems, such as enzymes, contain too many heavy atoms to be described at any level of ab initio theory, while classical molecular mechanics force fields are not sufficiently flexible to model processes in which chemical bonds are broken or formed. A proper model must also account for the complex environment of the reaction, which involves efficient thermal averaging of the energy landscape. To overcome these issues, an alternative approach has been developed that treats a small part of the system at the level of quantum chemistry (QM) while retaining the computationally cheaper force field (MM) for the larger part (Fig. 11.6).

Fig. 11.6 The focused QM region inside the MM region of the whole protein

This hybrid QM/MM simulation strategy was introduced by Warshel and Levitt and has become a powerful tool for the analysis of enzyme reaction mechanisms, playing a significant role in exciting applications such as drug design (Gao and Truhlar 2002; Shaik et al. 2010; van der Kamp and Mulholland 2013; Lonsdale and Mulholland 2014). Basically, three classes of interaction contribute to the QM/MM potential energy: interactions between atoms in the QM region, interactions between atoms in the MM region and interactions between QM and MM atoms. Quantum mechanics calculations are also an essential complement or alternative to experiment, aiding the interpretation of experimental outcomes through the theoretical prediction of molecular characteristics such as electrical and magnetic properties and properties related to geometrical derivatives (Cohen et al. 2012). QM treats molecules as collections of nuclei and electrons, without any reference to chemical bonds, which is important for understanding the behavior of a system at the atomic level. This method applies the laws of quantum mechanics to approximate the wave function that solves the Schrödinger equation for the motions of the electrons (Atkins and de Paula 2006; Tannor 2008). QM methods are more accurate, but they entail expensive and time-consuming calculations. Semi-empirical methods such as AM1 and PM3 reduce this cost by treating only the valence electrons of the system explicitly. The combined QM-MM methods provide the accuracy of a QM description at the low cost of MM (Lin and Truhlar 2007; Menikarachchi and Gascon 2010; Honarparvar et al. 2014). Quantum mechanics-based methods such as ab initio and density functional theory (DFT) methods operate on length scales ranging from a few picometers to nanometers. These electronic structure methods allow accurate theoretical studies to be extended to both macromolecules (synthetic polymers and proteins) and condensed matter (liquids and solids).
DFT provides information on the system without requiring calculation of the full wave function. DFT is rooted in the Hohenberg–Kohn theorems, according to which the exact energy of a molecular system depends on its electron density, the latter being a function of the spatial coordinates. The total energy of a system can be calculated as the sum of several functionals: the kinetic energy, the nucleus-electron potential energy, the electron-electron repulsion energy and the exchange-correlation functional. The choice of QM method, the choice of MM force field, the segregation of the system into QM and MM regions, the simulation type and advanced conformational sampling are the five important aspects of a QM-MM calculation of an enzyme. The choice of QM method is crucial: QM methods range from fast, semi-empirical methods to more accurate and more computationally expensive methods; however, not all methods are applicable to all systems, for reasons of accuracy, practicality or lack of parameters. Table 11.4 shows the accuracy of different quantum methods.

Table 11.4 Accuracy of different quantum mechanics methods
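
The two energy decompositions described in this section can be summarized in equation form. The notation is generic and assumed for illustration: the first expression is the additive QM/MM energy built from the three classes of interaction, and the second is the DFT total energy written as a sum of functionals of the electron density ρ.

$$ {E}_{\mathrm{total}}={E}_{\mathrm{QM}}+{E}_{\mathrm{MM}}+{E}_{\mathrm{QM}/\mathrm{MM}}, $$

$$ E\left[\rho \right]=T\left[\rho \right]+{V}_{\mathrm{ne}}\left[\rho \right]+J\left[\rho \right]+{E}_{\mathrm{xc}}\left[\rho \right], $$

where EQM and EMM are the energies of the QM and MM regions, EQM/MM is the coupling between them, T is the kinetic energy, Vne the nucleus-electron potential energy, J the electron-electron repulsion energy and Exc the exchange-correlation functional.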

11.5 Drug Delivery Approach Using Computational Methods

In the drug delivery approach, a potential drug molecule must be able to sustain its effectiveness, which poses key challenges: an administered drug must penetrate obstacles such as endothelial or epithelial membranes and also survive the host’s defenses to be effective. Overcoming these challenges requires some form of drug encapsulation, such as the molecular encapsulation architecture known as a drug delivery system (DDS) (Allen et al. 2004; Blanco et al. 2015). This approach of controlling the pharmacokinetics, pharmacodynamics, non-specific toxicity, immunogenicity, biorecognition and efficacy of drugs was developed to minimize drug degradation and loss, to prevent harmful side effects and to increase drug bioavailability and the fraction of the drug that accumulates in the required zone (Reddy and Swarnalatha 2010). A drug delivery system involves several components, such as the drug formulation, the medical device or dosage technology that carries the drug inside the body and a mechanism for its release. Most commercial applications of nanoparticles in medicine are directed at drug delivery, for which several solutions have been proposed, including liposomal and lipid-based colloidal nanoDDS, nanoparticulate polymeric micelles (as drug carriers) and polymer-based nanoparticulate DDS. Molecular modeling and computational chemistry provide several tools, such as quantum mechanical ab initio methods, molecular dynamics, free energy perturbation and docking, to quantify drug-carrier, carrier-medium and drug-medium interactions (Neumann et al. 2004).

11.6 Polymer Used as Carrier

Polymers are natural or synthetic substances with high molar masses, composed of a large number of repeating units; they play a significant role in the development of drug delivery systems because they can release both hydrophilic and hydrophobic drug molecules. A polymer covalently bonded to drug molecules carries them to their target site. Hence, polymers offer several advantages as inert carriers to which a drug can be conjugated; for example, they improve the pharmacokinetic and pharmacodynamic properties of drug molecules. Polymers are important constituents of pharmaceutical dosage forms: in solid forms such as tablets and capsules; dispersed in systems such as suspensions, emulsions, creams or ointments; and in particulate systems such as microcapsules, microparticles and nanoparticles. Their formulation is accepted to influence the clinical performance of pharmaceutical dosage forms (Duncan 2003; Raizada et al. 2010). The main function of a polymeric carrier is to carry and transport drug molecules to the site of action. A polymeric drug delivery system also protects the drug molecule from interaction with other macromolecules, including proteins and nucleic acids, which could alter the chemical structure of the drug. Both non-biodegradable and biodegradable polymers have been used in drug delivery systems. Polymers are selected on the basis of their desirable physical properties and are used in both non-biological and biological settings. Polymethyl methacrylate, polyvinyl alcohol, polyurethane and polyethylene are a few examples of polymers used in non-biological processes. In recent years, polymers have been used as carrier molecules because of their unique features, such as chemical inertness, freedom from impurities, appropriate physical structure and ability to be processed readily. 
Polyethylene-co-vinyl acetate, polymethyl methacrylate, polyvinyl alcohol, poly-N-vinyl pyrrolidone, polyacrylic acid and polyacrylamide are often used in controlled drug delivery systems (Poddar et al. 2010; Harekrishna Roy et al. 2013). Smart polymers are those able to change their properties in response to changes in biological conditions (Yang and Pierstorff 2012). Several stimuli, including temperature, pressure, pH, electric field, magnetic field, light, change in concentration, ionic strength and potential, may change the properties of a polymer (Schmaljohann 2006). For example, a temperature-responsive polymer changes its hydrophilicity/hydrophobicity, enhancing membrane permeation. This alteration in polymer properties can be used to allow adhesion to a cell surface, to break down a cellular membrane and to release biologically active compounds. Recently, polymers have been used to develop controlled drug release systems and sustained release formulations, which help regulate drug administration by preventing under- or overdosing. These advanced drug-releasing systems play a significant role in improving bioavailability and minimizing side effects and other inconveniences (Liechty et al. 2010).

11.6.1 Drug-Polymer Interaction

Most computational studies of drug delivery use molecular dynamics simulation, which mimics the natural pathway of molecular motion to sample successive configurations. Initial velocities are assigned to the molecules from the Maxwell–Boltzmann distribution at a given temperature, and their subsequent motion follows Newton’s laws. The interactions between molecules are computed at each time step, and the equations of motion are then solved numerically with an appropriate time step to update the velocities and positions for the next step (Frenkel and Smit 2002).
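
The procedure just described (drawing initial velocities from the Maxwell–Boltzmann distribution, then repeatedly computing forces and updating positions and velocities) can be sketched as follows. The example assumes a one-dimensional harmonic oscillator in reduced units and the velocity Verlet integrator, one common choice among several; all parameter values are illustrative.

```python
import math
import random


def maxwell_boltzmann_velocities(n, mass, kT, rng):
    """Draw n velocity components from a Gaussian with variance kT / mass."""
    sigma = math.sqrt(kT / mass)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def harmonic_force(x, k=1.0):
    """Force of a harmonic spring, F = -k * x."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equation of motion and return the (x, v) trajectory."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass, k=1.0):
    """Kinetic plus harmonic potential energy."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


rng = random.Random(0)  # fixed seed so the sketch is reproducible
velocities = maxwell_boltzmann_velocities(1000, mass=1.0, kT=1.0, rng=rng)
trajectory = velocity_verlet(x=1.0, v=velocities[0], mass=1.0, dt=0.01, n_steps=1000)
```

With a sufficiently small time step the total energy stays nearly constant along the trajectory, which is a simple check that the time step is appropriate.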

In classical molecular dynamics simulations, the interactions between molecules are described by a force field with certain functional forms and several parameters. Force fields such as AMBER (Cornell et al. 1995), OPLS (Jorgensen et al. 1996) and CHARMM (Mackerell et al. 1998) are widely used to study polymer and peptide drug interactions. Interactions such as hydrogen bonding (Zhang et al. 2012; Miyazaki et al. 2011), dipole-dipole interaction (Marsac et al. 2009; Khougaz and Clas 2000), ionic interaction (Yoo et al. 2009; Kindermann et al. 2011) and van der Waals interaction (Marsac et al. 2009) generally occur between drug and polymer. Dissipative particle dynamics (DPD) is a widely used mesoscale simulation method in which chemically distinct components are identified and interaction parameters are defined between the various chemical species. In this model, a fluid system is simulated using a set of interacting particles, each of which represents a cluster of small molecules instead of a single molecule. Drug, polymer, surfactant and solvent are represented as distinct bead types. The number of beads in a polymer chain is determined by

$$ {N}_{\mathrm{DPD}}=\frac{M_{\mathrm{p}}}{M_{\mathrm{m}}\ {C}_{\infty }}, $$

where Mp is the polymer molecular weight, Mm the monomer molecular weight and C∞ the polymer characteristic ratio. However, a detailed mechanism of drug-polymer interaction is lacking, for example how chemically substituted cellulosic polymers interact with drug molecules at the molecular level and how structural variables such as molecular weight and substitution pattern affect the drug-polymer interaction. In addition to classical MD and DPD, two further levels of molecular modeling are used: coarse-grained molecular dynamics (CG-MD) simulations, which model excipients such as modified cellulosic polymers at monomer-level resolution and drugs at a similar level, and continuum transport modeling, in which the transport of polymer, drug and solvent through a capsule is determined by solving the relevant diffusion equations. The full spectrum of the CG-MD approach contains contributions from several different fields. Several software packages can integrate the equations of motion, including the popular GROMACS (Van der Spoel et al. 2005), NAMD (Phillips et al. 2005), CHARMM (Klauda et al. 2010) and AMBER (Wang et al. 2004) packages. Many coarse-grained methods use one of these integrators to perform simulations.
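
As a worked example of the bead-number relation, the short Python function below evaluates N_DPD = Mp / (Mm C∞). The molecular weights and the characteristic ratio in the usage line are hypothetical round numbers, not values for any particular polymer.

```python
def dpd_bead_count(polymer_mw, monomer_mw, characteristic_ratio):
    """Number of DPD beads per chain: N_DPD = Mp / (Mm * C_inf)."""
    if polymer_mw <= 0 or monomer_mw <= 0 or characteristic_ratio <= 0:
        raise ValueError("all inputs must be positive")
    return polymer_mw / (monomer_mw * characteristic_ratio)


# Hypothetical polymer: Mp = 10000 g/mol, Mm = 100 g/mol, C_inf = 5
n_beads = round(dpd_bead_count(10000.0, 100.0, 5.0))
```

In this illustration a chain of 100 monomers is mapped onto 20 beads, so each bead stands for five monomer units.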

11.7 Computational Methods Used in Toxicity Studies

Toxicity is a measure of the adverse effects of chemicals, and specific types of these adverse effects are known as toxicity endpoints, for example, carcinogenicity or genotoxicity. These adverse effects can be measured quantitatively or qualitatively to identify the harm that substances cause to humans and animals (Rowe et al. 2010). A number of factors determine the toxicity of chemicals, including route of exposure, dose, duration of exposure, ADME properties (absorption, distribution, metabolism and excretion), biological properties and chemical properties (Raies and Bajic 2016). Toxicity has been determined using in vitro models, such as high throughput screening (AltTox), and in vivo animal models. Recently, computational toxicity methods have been widely used to minimize the need for animal testing, to reduce the cost and time of toxicity tests and to improve toxicity prediction and safety assessment. The major advantage of computational toxicity methods is their ability to assess chemicals for toxicity even before they are synthesized (Madan et al. 2013). In silico toxicology encompasses a wide range of computational tools, including databases that store chemicals together with their toxicity and chemical properties, software for generating molecular descriptors, simulation tools for systems biology and molecular dynamics and modeling methods for toxicity. Rule-based methods and structural alerts are often-used computational methods that determine toxicity from chemical properties and indicate how drugs should be altered to reduce their toxicity. Another method, read-across, predicts the unknown toxicity of a chemical from similar chemicals (analogs) with known toxicity from the same chemical category (Dimitrov and Mekenyan 2010; Modi et al. 2012; Benigni et al. 2013; Venkatapathy and Wang 2013). 
There are two approaches to developing a read-across method: an analog, or one-to-one, approach and a category, or many-to-one, approach. Both approaches are quite sensitive, identifying similar chemicals by calculating their properties and the similarities between them. The main advantage of read-across is its transparency (Cronin 2011): it is easy to interpret and implement (Enoch 2009), it can model quantitative and qualitative toxicity endpoints, and it allows a wide range of descriptors and similarity measures to be used to express similarity between chemicals (Dimitrov and Mekenyan 2010).
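
The read-across idea can be sketched in a few lines of Python. The example assumes that each chemical is described by a set of structural features and that similarity is measured with the Tanimoto coefficient, one of many possible similarity measures; the feature sets and toxicity labels are invented for illustration. Setting k=1 corresponds to the analog (one-to-one) approach and k>1 to the category (many-to-one) approach.

```python
def tanimoto(features_a, features_b):
    """Tanimoto similarity between two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def read_across(query, analogs, k=1):
    """Similarity-weighted mean toxicity of the k analogs most similar to the query."""
    ranked = sorted(analogs, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    weights = [tanimoto(query, features) for features, _ in ranked]
    if sum(weights) == 0:
        return None  # no analog shares any feature with the query
    weighted = sum(w * tox for w, (_, tox) in zip(weights, ranked))
    return weighted / sum(weights)


# Hypothetical chemicals: (structural features, known toxicity; 1.0 = toxic, 0.0 = non-toxic)
query = {"nitro", "aromatic_ring", "amine"}
analogs = [
    ({"nitro", "aromatic_ring"}, 1.0),
    ({"aromatic_ring", "amine", "hydroxyl"}, 0.0),
    ({"alkyl_chain"}, 0.0),
]
analog_prediction = read_across(query, analogs, k=1)
category_prediction = read_across(query, analogs, k=2)
```

Because every prediction can be traced back to named analogs and their similarity scores, the sketch also shows why read-across is regarded as transparent.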

Quantitative structure-activity relationship (QSAR) modeling is another widely used method that employs molecular descriptors to predict a chemical’s toxicity. Generally, a QSAR model predicts the toxicity (T) of a lead molecule from a feature vector of its chemical properties (θp) using a function f that calculates T given θp:

$$ T=f\left({\theta}_{\mathrm{p}}\right). $$

There are two types of QSAR model: local QSAR models, which are generated from congeneric chemicals, and global QSAR models, which are built from diverse chemicals. Local QSAR models predict toxicity based on the mode of action of specific chemicals; hence, they are more accurate, as they are customized for those chemicals (Valerio 2009). Two basic steps are involved in the development of a QSAR model: the generation of molecular descriptors and the construction of models to fit the data. The number of molecular descriptors can be reduced using selection methods based on simulated annealing, genetic algorithms or principal component analysis (Deeb and Goodarzi 2012; Devillers 2013). If there is a small number of descriptors, two-dimensional scatterplots of each descriptor versus the biological activity can help identify significant descriptors (Devillers 2013). Many tools provide pre-built QSAR models, such as the OECD QSAR Toolbox (OECD 2015), TopKat (Accelrys 2015) and METEOR (Lhasa Limited, Meteor Nexus 2014). The major advantage of QSAR is that it is easy to interpret and can model both categorical and continuous toxicity endpoints and molecular descriptors, as well as both toxic and non-toxic chemicals. However, it may not always be applicable, as a large number of chemicals are needed in model development for QSAR to achieve statistical significance (Valerio 2009; Deeb and Goodarzi 2012).
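
The two steps of QSAR model development can be illustrated with a deliberately small sketch: descriptors are first screened by their correlation with the measured toxicity (the numerical counterpart of inspecting scatterplots), and a linear function f is then fitted to the best descriptor by least squares. The descriptor names, descriptor values and toxicity values are hypothetical, and real QSAR models typically use many descriptors and more flexible functions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of T = slope * descriptor + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set of five congeneric chemicals
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0],
    "molecular_weight": [310.0, 150.0, 420.0, 200.0, 280.0],
}
toxicity = [2.1, 3.9, 6.2, 7.8, 10.1]

# Step 1: keep the descriptor most strongly correlated with toxicity
best = max(descriptors, key=lambda name: abs(pearson(descriptors[name], toxicity)))
# Step 2: fit the model and predict the toxicity of a new chemical
slope, intercept = fit_line(descriptors[best], toxicity)
predicted = slope * 6.0 + intercept
```

With only five training chemicals the fit has little statistical weight, which mirrors the limitation noted above that QSAR needs a large number of chemicals to reach statistical significance.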

Pharmacokinetic (PK) models relate the concentration of drug molecules in tissues to time, estimating the amount of a chemical in different parts of the body and quantifying the ADME (absorption, distribution, metabolism and excretion) processes (Jack et al. 2013; Sung et al. 2014). PK models are mainly used to relate the chemical concentration in a part of the body to the time of toxic responses. PK models fall into two categories: compartmental and non-compartmental (Sung et al. 2014). A compartmental model consists of one or more compartments, with each compartment represented by a differential equation (Sung et al. 2014). A one-compartment model represents the whole body as a single compartment, assuming rapid equilibration of the chemical concentration within the body and neglecting the time the chemical takes to distribute. A two-compartment model consists of a central and a peripheral compartment, each represented by a differential equation. These models provide mechanistic insight based on concentration and time, physiological descriptors of tissues such as volumes and blood flows, and ADME processes such as chemical binding/partitioning, metabolism and excretion (Jack et al. 2013; El-Masri 2013).
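
A minimal sketch of the two compartmental models follows. It assumes an intravenous bolus dose, first-order elimination from the central compartment and first-order exchange between compartments; the one-compartment model is evaluated analytically and the two-compartment differential equations are integrated with a simple explicit Euler scheme. All rate constants, the dose and the volume are hypothetical.

```python
import math


def one_compartment(dose, volume, k_el, t):
    """Concentration after an IV bolus: C(t) = (dose / volume) * exp(-k_el * t)."""
    return dose / volume * math.exp(-k_el * t)


def two_compartment(dose, k12, k21, k_el, t_end, dt=0.001):
    """Amounts (central, peripheral, eliminated) at t_end, by explicit Euler steps."""
    central, peripheral, eliminated = dose, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_central = (-(k12 + k_el) * central + k21 * peripheral) * dt
        d_peripheral = (k12 * central - k21 * peripheral) * dt
        d_eliminated = k_el * central * dt
        central += d_central
        peripheral += d_peripheral
        eliminated += d_eliminated
    return central, peripheral, eliminated


# Hypothetical drug: 100 mg dose, 10 L volume, rate constants in 1/h
half_life = math.log(2.0) / 0.1
c_half = one_compartment(100.0, 10.0, 0.1, half_life)
central, peripheral, eliminated = two_compartment(100.0, 0.5, 0.3, 0.1, 10.0)
```

Setting the transfer rate k12 to zero reduces the two-compartment model to the one-compartment case, which is a convenient consistency check on the numerical integration.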

11.8 Outcome of Drug Research in Bacterial Inhibitors

Bacterial infection is one of the major threats to human health because bacteria frequently cause severe disease not only as primary agents but also following pathologies caused by other agents. Compared to Gram-negative bacteria, Gram-positive bacteria have a much thicker peptidoglycan layer, which is responsible for the increasing occurrence of bacterial resistance to antibiotics in medicinal practice (Springer et al. 2010; Nikaido 2003). Since the discovery of several antibiotics in the mid-twentieth century, resistance has been a concern (Peters et al. 2008). Although the emergence of antibacterial resistance is not new, it continues to be a major health concern. A report from the Centers for Disease Control and Prevention on antimicrobial resistance revealed that more than 21% of hospital-acquired infections were caused by an antimicrobial-resistant pathogen. Hence, there is a need for new alternatives in the treatment of infections by multi-resistant bacteria. Among the several pathogens, methicillin-resistant Staphylococcus aureus (MRSA), penicillin-resistant Streptococcus pneumoniae, glycopeptide-intermediately-resistant S. aureus (GISA), methicillin-resistant S. epidermidis, glycopeptide-resistant Enterococcus spp. and vancomycin-resistant enterococci (VRE) are the more important etiological agents of hospital and community infections and are responsible for high rates of morbidity and mortality in hospitalized patients (Woodford and Livermore 2009; Livermore 2009; Arias and Murray 2009). Several fluoroquinolones, ramoplanin, beta-lactams and quinupristin/dalfopristin are currently on the market. Moellering et al. (1999) studied the clinical efficacy and safety of quinupristin-dalfopristin in the treatment of patients with vancomycin-resistant infections. In that study, the overall clinical and bacteriologic success rate was 66%. In another study, Nichols et al. (1999) compared quinupristin-dalfopristin with cefazolin, oxacillin and vancomycin in two randomized, open-label clinical trials.

Oxazolidinones are a unique class of antimicrobial agents possessing activity against Staphylococcus aureus and glycopeptide-intermediately-resistant S. aureus (Rybak et al. 2000; Wootton et al. 2000); they are also effective against a wide range of Gram-positive bacteria and Mycobacterium tuberculosis. Linezolid was the first approved derivative with acceptable tolerability in humans for the treatment of pneumonia and of skin and soft tissue infections caused by VRE (Cammarata et al. 2000). Daptomycin is another antibacterial agent used to treat infections caused by a wide range of Gram-positive bacteria. Recent studies from the US and Europe revealed that daptomycin was active against all Staphylococcus aureus and against Gram-positive bacteria such as Leuconostoc, which are characteristically resistant to glycopeptides (Barry et al. 2001; King and Phillips 2001). The effectiveness of daptomycin has been demonstrated in various animal models of Gram-positive infection. Several global randomized, double-blind phase II trials have investigated the efficacy of daptomycin in the treatment of community-acquired pneumonia (Pertel et al. 2008).

11.9 Future Aspects of Computational Methods in Targeting Bacterial Infections

The drug resistance of Gram-positive bacteria is a serious issue in clinical practice. Several antibacterial agents have already been approved by the US Food and Drug Administration for several infections, while other agents are still undergoing clinical trials. However, the lack of effective antibiotics in development implies that future treatment strategies for resistant bacteria will have to show enhanced therapeutic efficacy. The battle against antibiotic resistance can be fought on two fronts: by advancing research efforts toward the discovery of novel and potent agents or by enhancing the effectiveness of the currently available ones. With the increasing prevalence of bacterial resistance, there is a need to identify potential lead molecules to combat resistant bacteria. Conventional drug development requires huge investment and at least 12–15 years of experimentation, and even then a candidate often does not reach the market; hence, alternative approaches and strategies are required to develop safe and effective novel antimicrobial therapies. The current pipeline of antibiotic research and development is not very productive, so computational approaches such as structure-based drug design, ligand-based drug design, pharmacophore modeling and molecular docking are useful for understanding the mechanism of bacterial resistance to antibiotics. In combination with the experimental approach, computational biology has great potential in the future discovery of antimicrobial drugs.