1 Introduction: The Need for Computational Biology

Discovering novel drug molecules demands huge investments of time, infrastructure, and labor to identify, optimize, and validate the drug-likeness of such molecules through in vitro, in vivo, and preclinical experiments (Lei et al. 2016; Rifaioglu et al. 2019). To ease the process, the early stages of identifying drug-like molecules are shifting toward the application of computational tools. Constant advancements in software and algorithms help to bridge the “innovation gap” that exists due to higher investments and lower approval rates. The process of drug discovery sequentially includes the identification and validation of disease targets, lead compound identification and optimization, and finally success in clinical trials. Accordingly, establishing a drug can take around 10 to 13 years with huge capital expenditure (Malathi et al. 2018). Challenges arising from the pleiotropic nature of biomolecules and the interaction of chemical compounds with multiple pharmacological targets (often encountered in combinatorial/multitargeted approaches) can be addressed by chemo- and bioinformatics tools that make use of databases on the physicochemical characteristics and therapeutic use of compounds (Lagunin et al.). The fact that 80% of the population in developing countries relies on conventional herbal remedies for primary healthcare, together with the steep 380% rise in sales of plant-based supplements in the United States from 1990 to 2000, encouraged the development of numerous databases on ethnomedicine (Dunkel et al. 2006; Mosihuzzaman and Choudhary 2008). This expands the prospects of utilizing traditional knowledge on medicinal plants in modern-day drug discovery and drug repurposing.

In 1971, the Protein Data Bank (PDB) came into existence as the first open-access digital repository in the field of biology. It is a collection of 3D structures (resolved by laboratory experiments, namely X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy) of biological macromolecules and receptor–ligand complexes (https://www.rcsb.org/) (Burley et al. 2017). ChemCom (Chemical Comparator) is an application based on Java Web Start (JavaWS) technology and includes the UnionBit Tree Algorithm to search and compare large chemical libraries (Saeedipour et al. 2015). The list of such repositories is long (Lagunin et al.). A few open-source databases on medicinal plants, phytochemicals and other chemical compounds can be listed as follows: Plants For A Future (PFAF), Indian Medicinal Plants Phytochemistry And Therapeutics 2.0 (IMPPAT 2.0), Native American Ethnobotany database, SuperNatural 3.0, The Natural Compound (NC) collection, NCBI PubChem, ChEMBL, Collection of open natural products (COCONUT), Traditional Chinese Medicine Information Database (TCMID), Dr. Duke’s Phytochemical and Ethnobotanical Databases, Aromatic and Medicinal Plants Index (by Purdue University), Agricultural Science and Technology (or AGRIS supported by the Food and Agriculture Organization (FAO) of the United Nations), Compendium of Ayurveda Medicinal Plants of Sri Lanka, Botanical.com, Chinese Herbal Medicine Dictionary (by Complementary and Alternative Healing University), Clinicaltrials.gov database, Medicinal Plant Database (by Botanical Survey of India), EcoPort, TIPdb (a database of indigenous and endemic plant species in Taiwan), Traded Medicinal Plants Database, Herbal Medicines Compendium Medicinal Herbs and Plant Database, Drugs Herbs and Supplements by MedlinePlus, ZINC database, Marowina database medicinal support, Natural Medicines, Herbs at a Glance, Prelude Medicinal Plants Database, Raintree Tropical Plant Database, The World Flora Online, and TRAMIL database (Xie et al.; Duke 2020). Commercial databases with paid access include Chemical Abstracts Service (CAS), HerbalThink-TCM, Dictionary of Natural Products (DNP), and HerbMed.

The growing body of data on the bioactivities of chemical compounds, the composition of phytoconstituents in an extract, and the target receptors responsible for a specific bioactivity needs to be stored and retrieved systematically. “Omic” technologies have led to the development of diverse databases, and interpreting, interconnecting, or mining them is a major challenge to human capabilities. Hence, computational approaches including artificial intelligence and machine learning algorithms (e.g., artificial neural networks (ANN), Naive Bayes, K-Means, support vector machine (SVM), random decision forest, etc.) are widely adopted to provide solutions to complex biological questions (Gupta et al. 2021; Muzammil et al. 2023). Continuous development of in silico tools for chemoinformatics and bioinformatics provides insight into the vast multiomics data and offers different perspectives to scientists in the domain of drug discovery. Chemoinformatics particularly aims to model a statistical correlation between the observed bioactivity and structural parameters. These approaches relating to computer-aided drug design have gained noteworthy momentum in the drug discovery process. Genome-wide functional genetic screening (e.g., using deep learning algorithms) is a cutting-edge technique that has led to the discovery of genotype–phenotype interconnections and established new phenotypes (Zhang et al. 2011). Genomics and proteomics analyses in high-throughput screening have shown promising results in rationalizing the drug discovery process; however, the cost inflation incurred by these technologies is not matched by the expected growth in the drug approval rate. Freely available software packages that are frequently employed in machine learning and statistical analysis of data are R, PSPP by the GNU Project, and WEKA, while commercial ones include MATLAB, SAS/STAT, SIMCA, SPSS Statistics by IBM, and TIBCO Data Science/Statistica (Dzemyda et al. 2019).

The first thing to check when selecting software for computer-aided drug design is its vendor and license: whether it is academic, commercial, open-source, or in-house software. Open-source software is popular among academic personnel because, unlike commercial software, it requires no license fee and its source code is freely available and can be modified by the user. Based on the intended use, license fee, and characteristic features of the software/platforms, attempts have been made to categorize and list the in silico tools employed in the various domains of drug discovery (Singh et al. 2021). Molecular docking, pharmacophore modeling, (Q)SAR methods, molecular dynamics simulation, network pharmacology and machine learning algorithms accelerate the drug discovery process and complement the traditional bioactivity-guided fractionation, high-throughput screening and systems biology approaches. In this chapter, the tables summarizing the in silico tools cover only a fraction of the popular platforms, and readers are encouraged to explore other alternatives in the various domains of drug discovery and protein engineering.

2 Visualization of Molecular Structures

Molecular graphics enhances the experience of representing, modeling and analyzing multifaceted biochemical systems. Besides modeling the 3D architecture of molecular structures, 2D illustrations of molecules have gained interest among chemical scientists and biologists in the field of theoretical chemistry and drug discovery because of their clear representation of structural characteristics and interactions between atoms (Zhou and Shang 2009). Visualizing molecular structures in any virtual reality environment demands rapid, high-quality rendering of geometries to build molecular models with intuitive and informative interactions. Different visualization techniques (such as the space-filling model, the ball-and-stick model, and the surface reconstruction of the secondary structures, i.e., alpha helices and beta sheets) are used to represent a molecular model, and some available platforms to design, analyze and visualize molecular structures are listed in Table 1.

Table 1 Software/servers for designing and visualizing molecular structures

3 Prediction of Pharmacokinetic/Pharmacodynamic Profile

Evaluating the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of a molecule is a major step in discovering novel drug compounds. In general, compounds of natural origin tend to have more desirable ADMET properties than synthetic compounds. Early prediction of the ADMET properties of a chemical compound can be of utmost importance since most drug failures occur in the later phases due to undesirable pharmacokinetic and toxicological characteristics. Lipinski’s rule of five is often checked to predict the drug-likeness (in humans) of an orally administered compound (Lipinski 2004; Rego et al. 2022). According to it, a drug molecule should show at most one violation among the following criteria: (a) the ligand’s molecular weight should be less than or equal to 500 Daltons, (b) the number of H-bond donors should not exceed 5, (c) the number of H-bond acceptors should not exceed 10, and (d) the octanol–water partition coefficient (miLogP) should not exceed 5. A further criterion that is commonly checked alongside these, (e) the number of rotatable bonds should not exceed 10, comes from the later extension by Veber and co-workers rather than from Lipinski’s original rule.
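The criteria listed above can be checked mechanically once the descriptors of a compound are known. The following is a minimal sketch in Python, assuming the descriptor values have already been computed by a chemoinformatics package; the class name `Descriptors`, the function names, and the example values (roughly those of a small aspirin-like molecule) are illustrative assumptions and not part of any specific tool. The thresholds follow the commonly stated "not more than" form of the criteria.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float      # molecular weight in Daltons
    h_donors: int          # number of hydrogen-bond donors
    h_acceptors: int       # number of hydrogen-bond acceptors
    log_p: float           # octanol-water partition coefficient
    rotatable_bonds: int   # number of rotatable bonds


def rule_violations(d: Descriptors) -> list:
    """Return the names of the criteria that the compound violates."""
    checks = {
        "molecular weight <= 500 Da": d.mol_weight <= 500,
        "H-bond donors <= 5": d.h_donors <= 5,
        "H-bond acceptors <= 10": d.h_acceptors <= 10,
        "logP <= 5": d.log_p <= 5,
        "rotatable bonds <= 10": d.rotatable_bonds <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound that breaks at most `max_violations` criteria."""
    return len(rule_violations(d)) <= max_violations


# Illustrative values only (assumed, approximately an aspirin-like molecule).
small_molecule = Descriptors(180.2, 1, 4, 1.2, 3)
# Illustrative values for a large, polar compound (assumed).
large_molecule = Descriptors(820.0, 8, 15, 6.3, 14)

print(is_drug_like(small_molecule))      # True: no criterion is violated
print(rule_violations(large_molecule))   # lists all five criteria
```

Returning the list of violated criteria, rather than only a yes/no answer, makes it easier to see which property of a lead compound needs optimization.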

Most of the software packages that predict the ADMET of compounds (e.g., their affinity toward transporter proteins, blood proteins and drug-metabolizing cytochrome P450 isoforms, etc.) consider their structural/physicochemical characteristics to develop (Q)SAR models. Derek Nexus (Lhasa Ltd.), TOPKAT (Accelrys), OSIRIS Property Explorer, MCASE (Multicase) and PASS can be used to predict various toxicities and report the teratogenic, mutagenic, cardiotoxic, hepatotoxic, carcinogenic, and renal-toxic nature of the compounds (Kar et al. 2018). The online server of GUSAR (www.way2drug.com) predicts the LD50 values of query compounds in rodents when administered via four different routes. Some other software/web servers to study the ADMET properties are listed in Table 2.

Table 2 In silico tools used in the prediction of ADMET properties of small molecules

4 Prediction of Structures Including Homology Modeling

Molecular modeling for structure-based drug design requires 3D structures of the receptor and ligand molecules (experimentally determined by X-ray crystallography and NMR spectroscopy). In cases where experimental data are unavailable, the existing data and sequences can be used to predict the structures by homology-based modeling, sometimes referred to as comparative modeling of proteins. The amino acid sequence of a protein (acquired from NCBI or UniProt) is used to generate the structure using computational tools. Evolutionarily related proteins share similarity in sequence, and homologous proteins exhibit similarity in their structure (substitution matrices such as BLOSUM 60 are used to score such sequence similarity). The 3D protein structure is found to be evolutionarily more conserved than the sequence alone (Kaczanowski and Zielenkiewicz 2010). Homology modeling starts with recognizing a template that shows similarity in sequence (searching is accomplished by employing BLAST (Basic Local Alignment Search Tool), PSI-BLAST (Position-Specific Iterated BLAST), or fold recognition methods) and subsequently aligning the target sequence with the known structures (resolved by experiments) in the database. A template with less than 30% similarity is generally not preferred in homology modeling. BLAST compares a query sequence with the existing database and identifies the most suitable sequences with significant similarity, i.e., it identifies the homologous sequences. Alignments with an expectation value (E-value) closer to zero are statistically more significant and indicate a higher similarity. A higher E-value makes the alignment of two sequences difficult; considering sequences from other homologous proteins can help in this scenario (Pearson 2013; Alves et al. 2023). Multiple sequence alignment programs, e.g., CLUSTALW, can align sequences by introducing insertions and deletions. Alignment correction, if not done properly, will generate defective structures.
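The 30% guideline for template selection can be illustrated with a short calculation on a pairwise alignment. The sketch below is a simplified illustration and not the procedure of BLAST or any specific tool: it assumes that the two sequences are already aligned (with "-" marking gaps), it counts exact identities rather than a matrix-based similarity, and it uses the number of gap-free aligned columns as the denominator, which is only one of several conventions in use. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def template_is_suitable(aligned_target: str, aligned_template: str,
                         threshold: float = 30.0) -> bool:
    """Apply the rule of thumb that templates below the threshold are avoided."""
    return percent_identity(aligned_target, aligned_template) >= threshold


# Toy alignment (made-up sequences): 5 identical residues in 6 gap-free columns.
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))  # 83.3
```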
Some of the methods used to build models are spatial restraint, rigid-body assembly, segment matching and artificial evolution. Modeling tools such as Modeller can be used to build the backbone from the aligned sequences, while the community-wide CASP experiments are used to benchmark such structure predictions. Most often, aligning the model sequence with the template sequence creates gaps that can be resolved by considering conformational changes and insertions/deletions/substitutions of amino acid residues. Thus, refining the model includes loop modeling and side-chain modeling following the principles of molecular dynamics, Monte Carlo, and genetic algorithms. After modeling, the energy of the structures is minimized by employing force fields (for instance OPLS, AMBER, MM3, and CHARMM22 force fields) (Lewis-Atwell et al.). Loop modeling can be knowledge-based or energy-based. Knowledge-based loop modeling, sometimes referred to as template-based or homology-based, searches existing databases to identify known loop conformations that match the input sequence and the geometric descriptors of the anchoring points (Karami et al.). It does not require complex simulations or high computational power; however, it relies on the availability of appropriate loop conformations in the existing repositories of protein structures to cover the entire conformational space. Energy-based loop modeling corresponds to nontemplate-based or de novo methods that use an energy function and minimize it by Monte Carlo methods or molecular dynamics to optimize the loop conformation. Proteins that share structural similarity also exhibit similarity in the torsion angle about the Cα–Cβ bond (χ1 angle) when side-chain conformations are considered. Conserved residues can be copied in their entirety from the template to the model, which yields highly accurate results compared to methods that copy only the backbone or predict the side chains.
Modeling of side chains includes knowledge-based methods that extract a library of rotamers from known crystallographic structures and substitute the side chains on the backbone structure. After side-chain modeling, the model is assessed using root mean square deviation (RMSD) values. The errors found in the final model depend on the extent of similarity between the template and the target. If it is >90%, the crystallographic structure is predicted fairly well, whereas for a value <90%, the RMSD errors will be significant. The errors can be estimated by using a force field to calculate the model’s energy and checking whether the bond lengths and angles fall within the normal range (Dolan et al. 2012; Wink et al. 2019; Lima et al. 2022). However, this method does not evaluate the folding nature of the model, and misfolding in proteins is assessed with 3D distribution functions. Model validation is necessary to establish the prediction accuracy.
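As a reminder of what the RMSD measures, the sketch below computes it for two sets of matched atomic coordinates. It is a minimal illustration under two assumptions: the atoms are listed in the same order in both structures, and the structures have already been superimposed (no optimal rotation or translation is performed here, which dedicated structure-comparison tools would do first). The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in angstroms: every atom of the model is shifted by 1 along x.
template_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
model_atoms = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 1.0, 0.0)]
print(rmsd(template_atoms, model_atoms))  # 1.0
```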

The stereochemical aspects of the protein can be explored by WHATCHECK, WHAT IF, VADAR, and PROCHECK. The Ramachandran plot, obtained by plotting the backbone torsional angles φ (phi) and ψ (psi) of the amino acid residues of a protein against each other in two dimensions, is used to analyze the stereochemical and geometrical nature of the structure and to check whether any residues fall in the sterically unfavored regions of the plot. A higher proportion of residues in the favored region indicates the structural feasibility of the model (Agnihotry et al. 2022). Popularly used tools for homology-based modeling are MODELLER, SWISS PDB VIEWER, SWISS MODEL and COMPOSER (Malathi et al. 2018). MODELLER is also used for sequence searching and for comparing and clustering protein structures or sequences. In brief, homology modeling comprises template identification, sequence alignment, structural modification, energy minimization and model validation to predict the 3D structure.
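The φ and ψ values that enter a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms: φ by C(i−1), N(i), Cα(i), C(i) and ψ by N(i), Cα(i), C(i), N(i+1). The following sketch shows one standard way to compute a dihedral angle from Cartesian coordinates. It is an illustration only; the helper names are hypothetical, the coordinates in the example are toy points rather than real protein atoms, and validation tools additionally classify the resulting angles into favored and disallowed regions, which is not attempted here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions of residue i from the five atoms that define them."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: the outer bonds are perpendicular when viewed along the axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```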

5 Interaction Networks

In 2007, Hopkins introduced the concept of network pharmacology, which makes use of network analysis algorithms (applied to the existing knowledge of biological networks consisting of structural/physicochemical properties of proteins/ligands, the interaction of a protein/gene with another protein/gene/ligand, and signaling and metabolic pathways) to predict the therapeutic action of small molecules, elucidate their mechanism of action, and understand drug–disease relationships at the system level (Csermely et al. 2013). Visualization of biological networks (such as pie-nodes and edge-pie matrix visualization) and network comparison (by employing network alignment and computing pair-wise similarity between selected networks) are essential for network analysis, for the identification of key components/nodes/interactions in a biological system of interest, and for highlighting the union/intersection/complement regions in a set of biological networks. Networks can highlight the interacting elements within a complex biochemical system, thus aiding in the visualization and exploration of big data. However, the large size and high complexity of biological networks give rise to the so-called “hairballs” in the networks. Hence, there is a need for an efficient and interactive graphical user interface for network comparison and visualization (Pirch et al. 2021; Almeida et al. 2022). One needs to consider several types of relationships (namely “target–effect,” “target–pathway,” “pathway–effect,” and “target–pathway–effect” relationships) to investigate the pleiotropic and synergistic effects of a drug compound or a combination of drug compounds. The benefit of conceptualizing such “cause–effect” relationships unfolds gradually: if the bioactivity of a drug relates to certain molecular targets and their corresponding pathways are established, then other modes of action that influence those pathways can yield similar effects.
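Two of the operations mentioned above, identifying key nodes and finding the intersection of two networks, can be sketched with plain data structures. The example below is a deliberately small illustration: it ranks nodes by degree (the number of direct interaction partners), which is only one of many centrality measures used in network analysis, and it treats interactions as undirected. All node labels and function names are hypothetical placeholders.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_ranking(network):
    """Nodes sorted by decreasing degree (ties broken alphabetically)."""
    return sorted(network, key=lambda node: (-len(network[node]), node))


def shared_edges(edges_a, edges_b):
    """Interactions present in both networks, ignoring edge direction."""
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    return set_a & set_b


# Hypothetical drug-target-pathway interactions for two conditions.
network_1 = [("drug", "target1"), ("drug", "target2"), ("drug", "target3"),
             ("target1", "pathwayA")]
network_2 = [("target1", "drug"), ("target1", "pathwayA"),
             ("target4", "pathwayB")]

hubs = degree_ranking(build_network(network_1))
print(hubs[0])                                   # drug
print(len(shared_edges(network_1, network_2)))   # 2
```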

Analysis of biological pathways (such as signaling pathways, regulatory pathways, metabolic pathways, signal transduction pathways, etc.) makes use of various pathway databases (Lagunin et al.). Examples include WikiPathways (https://www.wikipathways.org), HumanCyc (https://humancyc.org/), NetPath (http://www.netpath.org/), Reactome (https://reactome.org/), KEGG (https://www.genome.jp/kegg/pathway.html), SignaLink (http://signalink.org/), and Small Molecule Pathway Database (https://www.smpdb.ca/). QIAGEN Ingenuity Pathway Analysis (IPA) is an online platform that is used to analyze, integrate, model and interpret the nexus of data from “omic” technologies including RNAseq experiments and Single-Nucleotide Polymorphism (SNP) microarrays. It aids in the identification of genes and pathways that functionally interact with the drug molecules and compares the gene regulatory circuits involved in the phenotypic responses. Connectivity Map (CMap) connects genes, drugs (currently in use) and diseases, and enables data-driven analysis of the repurposing/reprofiling/repositioning of drugs (it does so by analyzing the disease-specific and drug-specific gene signatures). A user provides “gene hit lists” (also known as “signatures”) to CMap, which compares them with a database of differential gene expression (DE) profiles (obtained by perturbing cell lines with numerous drug-like molecules) and outputs a ranked list of compounds whose expression patterns are similar to the query hit list. The CMap resource hosts over 1.5 million gene expression profiles from around 5000 chemical compounds and 3000 genetic reagents that are tested in various cell lines (Lim and Pavlidis 2021). The similarity in the gene expression profiles based on drug–drug, drug–disease, and disease–disease relationships is used to create the disease–drug networks for studying the potential side effects, targets and pathways associated with the drug compound.
Aside from CMap, Gene Expression Omnibus (GEO) and the Comparative Toxicogenomics Database (CTD) can be used to create such disease-specific gene expression signatures. DIGEP-Pred is a free web-based platform that considers the structural characteristics of compounds to predict drug-induced variations in gene expression profiles (Lagunin et al. 2013). Natural Product-based Drug Combination and Its Disease-specific Molecular Regulation (NPCDR) is an interactive database that shares knowledge on drug combinations (of natural products) with clinical or experimental validations. It also provides information on disease-specific molecular recognition and pathways and allows integration of available databases, easing the research on network pharmacology and medicinal chemistry (Sun et al. 2022).
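The idea of comparing a query signature against a collection of drug-induced expression profiles, as described for CMap, can be illustrated with a toy scoring scheme. The sketch below is explicitly not the statistic used by CMap, which relies on rank-based enrichment; it assumes instead that each compound profile is a mapping from gene to log fold change and scores a compound by the mean change of the query's up-regulated genes minus the mean change of its down-regulated genes. The gene names, compound names, values and function names are all hypothetical.

```python
def connectivity_score(profile, up_genes, down_genes):
    """Mean change of the query's up genes minus that of its down genes."""
    up_values = [profile[g] for g in up_genes if g in profile]
    down_values = [profile[g] for g in down_genes if g in profile]
    up_mean = sum(up_values) / len(up_values) if up_values else 0.0
    down_mean = sum(down_values) / len(down_values) if down_values else 0.0
    return up_mean - down_mean


def rank_compounds(profiles, up_genes, down_genes):
    """Compounds sorted from most similar to most opposite to the query."""
    scored = [(name, connectivity_score(profile, up_genes, down_genes))
              for name, profile in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical log fold changes induced by two compounds in a cell line.
example_profiles = {
    "compound_mimic": {"G1": 2.0, "G2": -1.5, "G3": 0.1},
    "compound_reverse": {"G1": -2.0, "G2": 1.0, "G3": 0.0},
}

# Query signature: G1 is up-regulated and G2 is down-regulated.
ranking = rank_compounds(example_profiles, up_genes={"G1"}, down_genes={"G2"})
print(ranking)  # [('compound_mimic', 3.5), ('compound_reverse', -3.0)]
```

In a repurposing setting the compounds at the bottom of such a ranking are often the interesting ones, since a strongly negative score suggests that the compound reverses a disease signature.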

The platforms that are free for academic use in bioinformatics and systems biology research to analyze complex data from “omic” technologies include OmicsNet (https://www.omicsnet.ca/) (Zhou and Xia 2018), Cell Illustrator (http://www.cellillustrator.com/home), Cytoscape (https://cytoscape.org/), ConsensusPathDB (http://cpdb.molgen.mpg.de/), Gene Set Enrichment Analysis or GSEA (https://www.gsea-msigdb.org/gsea/index.jsp), The Database for Annotation, Visualization and Integrated Discovery or DAVID (https://david.ncifcrf.gov/), and VANESA (https://cbrinkrolf.github.io/VANESA/). Software with paid licenses includes the geneXplain platform (https://genexplain.com/), QIAGEN Ingenuity Pathway Analysis (https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/analysis-and-visualization/qiagen-ipa/), and Elsevier’s Pathway Studio (https://www.elsevier.com/en-in/solutions/pathway-studio-biological-research). Other alternatives that can be explored in this domain of research are presented in Table 3.

Table 3 Some software/servers to generate, visualize, and analyze biological networks

6 Pharmacophore Modeling and Molecular Docking

From a large library of chemical compounds, virtual screening identifies the lead compounds having a specific bioactivity. Virtual screening can be structure-based or ligand-based. The former approach utilizes the 3D structure of the target protein and performs molecular docking to report the potential active compounds that exhibit a good binding affinity/score with the target receptor structure. Molecular docking is a structure-based approach used to predict the 3D orientation of the ligand molecule with respect to a particular conformation of the receptor molecule when both interact and form a stable complex (Sahoo et al.). It is one of the first-line tools used in discovering/designing novel drug molecules: it predicts the binding affinity of a chemical compound with the target receptor and ranks the ligands based on their respective docking scores. Molecular docking predicts an atomistic model of the receptor–ligand interactions and their binding orientations. In site-specific or targeted docking, the active sites of the target protein are reviewed or predicted by using programs such as CASTp, Q-SiteFinder, LigA Site, and MetaPocket, while blind docking considers the entire protein structure as the probable region of ligand interaction (Wong and Kwan 2015). Search algorithms that fish out favorable conformations from a practically unlimited number of possibilities include matching algorithms, incremental construction methods, multiple copy simultaneous searching, Monte Carlo and genetic algorithms. The scoring functions (empirical, force field-based, or knowledge-based) of a docking software estimate the binding affinity of the ligand with the target receptor and rank the ligands based on docking scores.

Qualitative “structure–activity relationships” (i.e., SAR) and quantitative structure–activity relationships (i.e., QSAR) are used in virtual screening (and target fishing) if the structures of the chemical compounds are available or predicted/designed. These approaches assume the bioactivity of a ligand as a function of its structural or physicochemical characteristics. Analysis and comparison of the structures are achieved with the help of some descriptors (such as structural fragments, fingerprints, constitutional, topological, electro-topological, quantum-chemical and physicochemical descriptors) (Lagunin et al.). Pharmacophore modeling considers a group of atoms in the structure whose presence directs the pharmacological effect of the ligand. Ligand-based virtual screening employs QSAR approaches that aim to develop mathematical models to study the correlation between the observed bioactivities and structural/physicochemical characteristics. Software such as Sybyl-X 2.0 and E-Dragon perform QSAR studies (Browne et al.; Fedyushkina et al. 1990).

Two techniques, namely comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA), are encountered in 3D QSAR for ligand-based drug design (Chavda and Bhatt 2019). In CoMFA, a library of ligands comprising their physicochemical characteristics and biological activities is created. These bioactive compounds differ from one another by some substitutions. Seventy percent of the data is used as the training set (regression models are generated from it by Partial Least Squares (PLS) regression and correlated with the pIC50 value), whereas the remaining 30% is kept as the test set (used to establish the prediction accuracy of the QSAR regression models). Finally, the models undergo leave-one-out (LOO) cross-validation. The descriptors in CoMFA are determined with an sp3 probe atom. The Coulombic potential energy gives the electrostatic field, and the Lennard-Jones potential energy, which describes van der Waals interactions, gives the steric field. The 3D steric and electrostatic contour plots depict the variation in bioactivity with the alteration of molecular fields. The SEAL similarity method in CoMSIA takes into account the electrostatic, steric, hydrogen bonding and hydrophobic descriptors to predict the similarity between molecules using Gaussian functions. The contour plots produced by CoMSIA portray the favorable and unfavorable regions for the interaction of ligands (Bordás et al. 2003).
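To make the two field terms concrete, the sketch below evaluates a Coulombic and a 12-6 Lennard-Jones interaction energy between a probe and the atoms of a molecule at a single grid point. It is an illustration under stated assumptions and not the implementation of any CoMFA package: the probe charge of +1, the single pair of Lennard-Jones parameters applied to every atom, the energy cutoff of 30 kcal/mol, and all function names are assumed for the example. Distances are in angstroms, charges in elementary charge units, and energies in kcal/mol.

```python
import math

# Converts q1*q2/r (e^2 per angstrom) into kcal/mol.
COULOMB_CONSTANT = 332.0636


def coulomb_energy(q_probe, q_atom, r, dielectric=1.0):
    """Electrostatic interaction energy between two point charges."""
    return COULOMB_CONSTANT * q_probe * q_atom / (dielectric * r)


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: repulsive at short range, attractive beyond."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def field_at_point(point, atoms, q_probe=1.0, epsilon=0.1, sigma=3.4,
                   cutoff=30.0):
    """Return (steric, electrostatic) energies of the probe at one grid point.

    `atoms` is a list of (x, y, z, partial_charge) tuples.
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge in atoms:
        # Avoid division by zero when a grid point coincides with an atom.
        r = max(math.dist(point, (x, y, z)), 0.1)
        steric += lennard_jones_energy(r, epsilon, sigma)
        electrostatic += coulomb_energy(q_probe, charge, r)
    # Truncate extreme values so that grid points inside atoms do not dominate.
    steric = min(steric, cutoff)
    electrostatic = max(-cutoff, min(electrostatic, cutoff))
    return steric, electrostatic


# One hypothetical atom with a partial charge of -0.5 placed 10 angstroms away.
print(field_at_point((0.0, 0.0, 0.0), [(10.0, 0.0, 0.0, -0.5)]))
```

Repeating this evaluation over a regular 3D grid around each aligned ligand yields the columns of field descriptors that the regression model is then fitted on.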

Approaches to build pharmacophore-based models identify the molecular characteristics that direct the macromolecular recognition of ligands, thus triggering the biological response. The aromaticity, hydrophobicity, presence of hydrogen bond acceptors/donors and anion/cation residues are considered to model pharmacophores that act as a query to search the potential bioactives from a database of compounds in virtual screening. Developing pharmacophore models can follow either structure-based or ligand-based approaches. The former approach relies on the availability of X-ray crystallographic or NMR spectroscopic 3D structure of the receptor molecule/target protein. The active sites and the spatial interactions are described by certain physicochemical properties that complement the interacting ligands and selectively identify the compounds with high binding affinity. A good model must incorporate protein flexibility to consider the structural changes that occur during the formation of the receptor–ligand complex. Ligand-based modeling is useful in cases where the 3D molecular structure of the receptor molecule is not available and the pharmacophores are generated by studying the common features (e.g., hydrophobic and electrostatic interaction, hydrogen bonding, etc.) that exist at the same position in the ligand structures. In ligand-based pharmacophore modeling, chemical compounds in the training set create a conformational space that takes care of ligand flexibility (Braga et al.). HipHop, DISCO, HypoGen, and PHASE are some software for generating pharmacophore models.

The structural data generated by NMR, X-ray crystallography, and homology modeling are static in nature and fail to describe the dynamic nature of the biorecognition process during receptor–ligand binding. These experimental data highlight the binding sites for some endogenous agonists; however, other active sites (including the allosteric and cryptic binding sites) are often not identified. Neither the receptor nor the ligand is a frozen/rigid entity; instead, the structures interact while in constant motion in a solution (any biological fluid). Moreover, an approaching ligand can cause a series of conformational changes in the receptor structure that improve its binding affinity. In order to consider the flexibility of the macromolecular structures, the relaxed complex scheme (RCS) has been developed, which extracts several conformations of the receptor sampled by simulation and then performs molecular docking of the ligand with each of the conformations. Scoring functions often treat conformational entropy and solvation energy as negligible when calculating the binding affinity in order to make the process computationally less expensive (at the cost of the model’s accuracy).

Researchers often employ both QSAR modeling and molecular docking to predict the bioactivity and investigate the mechanism of action of compounds. In one study, the immunomodulatory effect of the ligands was evaluated by employing forward stepwise multiple linear regression to develop a QSAR model with 52 physicochemical descriptors (the important ones being dipole moment, steric energy, amide group count, λmax (UV-visible) and molar refractivity) using the SCIGRESS platform. Finally, molecular docking was performed to predict their binding affinity with immunomodulatory targets, namely TLR-4, iNOS, COX-2, CD14, IKK-β, CD86, and COX-1 (Yadav et al. 2010). A similar QSAR model with 50 descriptors from SYBYL-X 1.3 was used to study the cytotoxicity of ursolic acid analogs against human glioblastoma and lung cancer cell lines. The model exhibited a good regression coefficient (r²) and cross-validation regression coefficient (r²cv), with values ranging from 0.8 to 0.96. The relevant parameters for cytotoxicity were found to be LUMO energy, ring count, dipole vector and solvent-accessible surface area (Kalani et al. 2012).

Some freely available software and web servers to generate descriptors (that include arithmetical, topological, constitutional, geometrical, electrostatic, thermodynamic, quantum-chemical descriptors and other molecular fingerprints) are AFGen (http://glaros.dtc.umn.edu/gkhome/afgen/overview), ISIDA-fragmentor (https://complex-matter.unistra.fr/equipes-de-recherche/laboratoire-de-chemoinformatique/software-development/#c89382), E-DRAGON (http://www.vcclab.org/lab/edragon/), Open3DQSAR (https://open3dqsar.sourceforge.net/), ToMoCoMD-CARDD (http://tomocomd.com/), MOLGEN (http://molgen.de/?src=documents/molgenqspr.html), Mold2 (https://www.fda.gov/science-research/bioinformatics-tools/mold2), Toxicity Estimation Software Tool or TEST by the United States Environmental Protection Agency (https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test) and Open Babel (http://openbabel.org/wiki/Main_Page), while a commercial alternative is the CODESSA PRO project (http://www.codessa-pro.com/). Along with the model’s high internal accuracy (i.e., R² > 0.9 and R²cv > 0.8, calculated using the training set only), external validation of the (Q)SAR model with experimental data is desirable as per the OECD guidelines (www.oecd.org/env/ehs/risk-assessment/37849783.pdf). In order to better correlate the structural characteristics with the bioactivities, one must use molar units (such as mol/kg or mmol/kg) instead of mass units (i.e., mg/kg) in the models (Dearden et al. 2009). In inverse docking or target fishing, identification of the possible targets/receptors for the query ligand is performed by software such as GOLD, FlexX, TarFisDock, TarSearch-X, and TarSearch-M.
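The two internal accuracy measures quoted above, the fitted R² and the cross-validated R², can be illustrated on a toy data set. The sketch below is a simplified stand-in for a real (Q)SAR workflow: it fits an ordinary least-squares line on a single descriptor instead of a multivariate PLS model, it performs leave-one-out cross-validation by refitting the line with each compound held out in turn, and it normalizes the cross-validated error by the total variance about the overall mean activity (one common convention). The descriptor and activity values and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def total_variance(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Goodness of fit on the training data itself."""
    slope, intercept = fit_line(xs, ys)
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - residual / total_variance(ys)


def loo_q_squared(xs, ys):
    """Leave-one-out cross-validated R^2 (often written q^2)."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_variance(ys)


def passes_internal_validation(xs, ys, r2_min=0.9, q2_min=0.8):
    return r_squared(xs, ys) > r2_min and loo_q_squared(xs, ys) > q2_min


# Hypothetical descriptor values and activities (e.g., pIC50) for five ligands.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1]
print(passes_internal_validation(descriptor, activity))
```

Because each held-out compound is predicted by a model that never saw it, the cross-validated value cannot exceed the fitted R² for a least-squares model, which is why a large gap between the two is usually read as a sign of overfitting.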

The bioactivities of a novel compound (i.e., its potential drug targets) can be evaluated by using pairwise similarity with known compounds (e.g., the ChEMBL database calculates the Tanimoto coefficient based on fingerprints), molecular docking, pharmacophore modeling, Bayesian statistics, and the design of substructural descriptors or fingerprints. However, one must take care to avoid the “activity-cliff” problem, which arises when compounds share analogous structural characteristics but exhibit dissimilar bioactivity spectra. Despite being a rapid and efficient technique in virtual screening, pharmacophore modeling relies on knowledge of reported active ligands, requires sampling conformers with a search algorithm, and uses a rigid framework to search for hit compounds in the database (Horvath 2010; Kaushik et al. 2018; Lans et al. 2020).
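The similarity search and the activity-cliff check described above can be sketched as follows. The sketch assumes fingerprints are represented as sets of "on" bit indices. The compound names, fingerprints, activity values, and cutoffs are hypothetical, and this is not the implementation used by ChEMBL or any other database.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query_fp, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def find_activity_cliffs(compounds, sim_cutoff=0.8, activity_gap=2.0):
    """Flag pairs that look alike but differ strongly in activity.

    compounds maps a name to (fingerprint, activity in log units).
    Both cutoffs are illustrative assumptions, not recommended values.
    """
    cliffs = []
    for (name_a, (fp_a, act_a)), (name_b, (fp_b, act_b)) in combinations(
        compounds.items(), 2
    ):
        similar = tanimoto(fp_a, fp_b) >= sim_cutoff
        if similar and abs(act_a - act_b) >= activity_gap:
            cliffs.append((name_a, name_b))
    return cliffs


if __name__ == "__main__":
    # Hypothetical compounds: (set of on-bit indices, activity)
    compounds = {
        "cpd_A": (set(range(1, 10)), 8.1),
        "cpd_B": (set(range(1, 11)), 5.2),
        "cpd_C": ({20, 21, 22}, 5.0),
    }
    library = {name: fp for name, (fp, _) in compounds.items()}
    print(rank_by_similarity(set(range(1, 10)), library))
    print(find_activity_cliffs(compounds))
```

Pairs flagged in this way are candidates for closer inspection before model building, since a similarity-based prediction would assign them nearly identical activities.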

Some platforms for protein-protein or protein-DNA docking include SPServer (http://aleph.upf.edu/spserver/), pyDockDNA (https://model3dbio.csic.es/pydockdna), CoDockPP (http://codockpp.schanglab.org.cn/), DOCKSCORE (http://caps.ncbs.res.in/dockscore/), PIIMS Server (http://chemyang.ccnu.edu.cn/ccb/server/PIIMS/index.php), GalaxyDomDock (https://galaxy.seoklab.org/cgi-bin/submit.cgi?type=DOMDOCK_INTRO), P3DOCK server, HDOCK server, and GRAMM (Global RAnge Molecular Matching) (https://gramm.compbio.ku.edu/). ezCADD is a fast 2D/3D molecular visualization tool that supports small-molecule docking, protein-protein docking, prediction of binding sites, identification of drug targets, homology modeling, and structure quality assessment (Tao et al. 2019). FragVLib is open-source software (distributed under the GNU General Public License) that generates a virtual library of ligand fragments (used for structure-based drug design) by searching for binding-pocket similarity across a database of ligand-receptor complexes (Khashan 2012). eMolFrag performs virtual fragmentation of molecules and extracts the resulting fragments to build a library for virtual screening (Liu et al. 2017).
Other software/servers used in virtual screening (structure-based and/or ligand-based) include the following, although other popular platforms exist: DENVIS (https://github.com/deeplab-ai/denvis), ReMODE (Receptor-based MOlecular Design for de novo drug design, available at http://cadd.zju.edu.cn/relation/remode/), Pocket2Drug (https://github.com/shiwentao00/Pocket2Drug), DrugRep (http://cao.labshare.cn/drugrep/), DockingPie (a docking plugin for PyMOL), CB-Dock2 (https://cadd.labshare.cn/cb-dock2/php/index.php), PharmRF (https://github.com/Prasanth-Kumar87/PharmRF), DeepDock (https://github.com/OptiMaL-PSE-Lab/DeepDock), Knime workflow (https://hub.knime.com/), RNALigands (http://rnaligands.ccbr.utoronto.ca/php/downloads.php), AutoDock Vina (https://vina.scripps.edu/), eSPC (https://spc.embl-hamburg.de/), RASPDplus (https://github.com/HITS-MCM/RASPDplus), LigRMSD (https://ligrmsd.appsbio.utalca.cl/), LeDock (http://www.lephar.com/index.htm), VSpipe (https://github.com/sabifo4/VSpipe), PyRx (https://pyrx.sourceforge.io/), LiSiCA (Ligand Similarity using Clique Algorithm, available at http://insilab.org/lisica/), AILDE (http://chemyang.ccnu.edu.cn/ccb/server/AILDE/), Open3DALIGN (https://open3dalign.sourceforge.net/), PrepFlow (https://ifm.chimie.unistra.fr/prepflow), QSAR-Co-X (https://github.com/ncordeirfcup/QSAR-Co-X), PyRMD (https://github.com/cosconatilab/PyRMD), SwissSimilarity (http://www.swisssimilarity.ch/), PharmMapper (http://lilab-ecust.cn/pharmmapper/), and ZINCPharmer (http://zincpharmer.csb.pitt.edu/).

7 Molecular Dynamics Simulation

The deterministic description of motion that classical mechanics provides for the macroscopic world contrasts with the probability functions that quantum mechanics uses to describe motion in the microscopic world. This is because the electron clouds (that interact during bonding) exhibit wave-particle duality rather than simple mechanical behavior. Simulating systems of proteins and other receptor molecules interacting with ligands at the atomistic level has become important to the drug discovery process. Breakthroughs in hardware-based computational power and the development of new algorithms ease the calculation of the molecular forces that exist in the system. Such simulations overcome the limitations of the conventional “lock and key” model of receptor–ligand interaction (where the receptor is held rigid and only the conformations of the ligand are sampled, restricting the atomistic motions to keep the model simple). They account for the dynamic nature of proteins, which sample numerous conformational states, some of which are selectively stabilized when an agonist or antagonist binds. Any simulation starts with a model of the receptor–ligand system (built from NMR, crystallography, or homology-modeling data); subsequently, the forces experienced by each atom in the system are estimated, and the atomic positions are updated following Newton’s laws of motion. These forces result from bonded interactions (i.e., chemical bonds, atomic angles, and dihedral angles) and nonbonded interactions (i.e., van der Waals interactions, modeled with the Lennard-Jones 6–12 potential, and charged/electrostatic interactions, modeled with Coulomb’s law). Chemical bonds and atomic angles are modeled as virtual springs, while dihedral angles are modeled with a sinusoidal function that approximates the difference in potential energy between eclipsed and staggered conformations.
The parameters of these functions specify the stiffness and equilibrium lengths of the springs, the equilibrium atomic angles (and dihedral angles), the partial atomic charges (responsible for electrostatic interactions), and the van der Waals atomic radii. This parameterization constitutes the “force field” that describes how the molecular system evolves under the influence of the atomic forces. Finally, the simulation time is advanced (by 1–2 femtoseconds, i.e., 1–2 × 10⁻¹⁵ s), and the process is iterated (on the order of 10⁶ times) (Durrant and McCammon 2011). Different force fields exist depending on how they are parameterized, although they mostly generate similar outputs. The AMBER, CHARMM, and GROMOS force fields are the ones generally encountered in simulation modeling. Molecular dynamics simulation demands a huge number of calculations; hence, computer clusters or supercomputers with numerous processors need to operate in parallel. Message Passing Interface (MPI) compatible simulation software such as NAMD, CHARMM, and AMBER distributes the work across multiple processors so that they can be used simultaneously to execute a complex task. Such simulations can estimate the values of NMR-related parameters (e.g., spin relaxation), thus allowing comparison between theoretical predictions and experimental values.
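The force-evaluation and time-stepping loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm of any particular package: it includes only the Lennard-Jones 6–12 term (no bonded or Coulomb terms), works in reduced units rather than femtoseconds, and uses the velocity Verlet integrator, which is one common way of applying Newton’s laws of motion numerically. All function and parameter names are hypothetical.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0):
    """Return (forces, potential energy) for pairwise Lennard-Jones 6-12."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3  # (sigma / r)^6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Force on atom i is -dU/dr along the separation vector d
            f_scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += f_scale * d[k]
                forces[j][k] -= f_scale * d[k]  # Newton's third law
    return forces, energy


def velocity_verlet(positions, velocities, masses, dt, n_steps):
    """Advance the system in place; return the final total energy."""
    forces, potential = lj_forces(positions)
    for _ in range(n_steps):
        for i, m in enumerate(masses):
            for k in range(3):
                velocities[i][k] += 0.5 * dt * forces[i][k] / m  # half kick
                positions[i][k] += dt * velocities[i][k]         # drift
        forces, potential = lj_forces(positions)                 # new forces
        for i, m in enumerate(masses):
            for k in range(3):
                velocities[i][k] += 0.5 * dt * forces[i][k] / m  # half kick
    kinetic = sum(
        0.5 * m * sum(v * v for v in vel) for m, vel in zip(masses, velocities)
    )
    return potential + kinetic


if __name__ == "__main__":
    # Two atoms starting at rest, 1.5 sigma apart (reduced units)
    pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
    vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    _, e_start = lj_forces(pos)
    e_end = velocity_verlet(pos, vel, [1.0, 1.0], dt=0.001, n_steps=1000)
    print("energy drift:", abs(e_end - e_start))
```

A small drift in total energy over many steps is a quick sanity check that the time step is short enough. Production codes add bonded terms, electrostatics, cutoffs, and neighbor lists to the same basic loop, which is where most of the computational cost arises.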

Molecular systems are simulated by following Newton’s laws of motion. Such simulations output trajectory graphs for evaluating the stability of the target protein or its docked complexes. To perform a molecular dynamics simulation, the protein topology is generated by applying a force field such as Amber or Gromos (using the GROMACS or LEaP program), while the PRODRG server can be used to obtain the ligand topology (Strasser and Wittmann 2013). The structures are placed inside a cubic box and solvated using the flexible simple point-charge (SPC) water model. Following system neutralization, the steepest descent algorithm minimizes the energy of the system. At a chosen temperature (e.g., 300 K), position-restrained simulations are performed for a certain period of time under constant volume and temperature (NVT) and constant pressure and temperature (NPT) dynamics. The LINear Constraint Solver (LINCS) algorithm is frequently used in molecular simulations with bond constraints (Hess et al. 1997). The Particle Mesh Ewald algorithm estimates the long-range electrostatic energy (Madelung energy) of the complex/crystal. After the molecular dynamics simulation, the trajectories are plotted against time with the XMGrace tool, and parameters such as the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), and number of intermolecular hydrogen bonds are used to analyze the stability of the protein-ligand complex (Van Der Spoel et al. 2005). The advantages of molecular dynamics simulation come at a cost: the process is computationally expensive. Simulations that are too short sample the conformational space inadequately.
Because force fields only approximate the quantum-mechanical behavior of atoms, molecular dynamics simulations perform poorly for systems in which quantum effects dominate, such as those with bonds involving transition metal atoms (Durrant and McCammon 2011). The tools/platforms that can be employed to perform molecular dynamics simulations and analyze the output files after simulation are listed in Table 4.
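The stability metrics named above follow directly from their definitions, as the minimal sketch below shows. It assumes that the trajectory frames have already been superimposed on the reference structure and that coordinates are given as (x, y, z) tuples; the function names are hypothetical, and simulation packages ship their own analysis tools for these quantities.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two pre-aligned conformations."""
    sq = sum(
        (a - b) ** 2 for p, q in zip(coords, reference) for a, b in zip(p, q)
    )
    return math.sqrt(sq / len(coords))


def rmsf(trajectory):
    """Per-atom fluctuation around each atom's mean position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean_pos = [
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        ]
        sq = sum(
            (frame[i][k] - mean_pos[k]) ** 2
            for frame in trajectory
            for k in range(3)
        )
        result.append(math.sqrt(sq / n_frames))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from the center of mass."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    sq = sum(
        m * sum((p[k] - com[k]) ** 2 for k in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(sq / total)


if __name__ == "__main__":
    # Hypothetical two-atom, two-frame trajectory (arbitrary length units)
    frame_0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    frame_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print("RMSD:", rmsd(frame_1, frame_0))
    print("RMSF:", rmsf([frame_0, frame_1]))
    print("Rg  :", radius_of_gyration(frame_0, [1.0, 1.0]))
```

An RMSD curve that levels off over time, low RMSF values in the binding-site residues, and a steady Rg are commonly read together as signs that the complex has equilibrated and remains compact.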

Table 4 Platforms to perform molecular dynamics simulations and analysis of output files

8 Conclusion

To ease the process of drug discovery, research is shifting toward the application of computational tools. Challenges arising from the pleiotropic nature of biomolecules and the interaction of chemical compounds with multiple pharmacological targets (often encountered in combinatorial/multitargeted approaches) can be addressed by chemo- and bioinformatics tools that make use of databases on the physicochemical characteristics and therapeutic use of compounds. Early prediction of the ADMET properties of a chemical compound is of utmost importance, since most drug failures occur in the later phases due to undesirable pharmacokinetic and toxicological characteristics. Simulating systems of proteins and other receptor molecules interacting with ligands at the atomistic level has become important for identifying novel drug-like compounds. Such simulations account for the dynamic nature of proteins, which sample numerous conformational states, some of which are selectively stabilized when an agonist or antagonist binds. Breakthroughs in hardware-based computational power and the development of new algorithms ease the calculation of the molecular forces that exist in the system. Biological networks can highlight the interacting elements within a complex biochemical system, thus aiding the visualization and exploration of big data. In brief, molecular docking, pharmacophore modeling, (Q)SAR methods, molecular dynamics simulation, network pharmacology, and machine learning algorithms accelerate the drug discovery process and complement traditional bioactivity-guided fractionation, high-throughput screening, and systems biology approaches. The examples listed or tabulated in this chapter represent only a fraction of the popular software/platforms, and readers are encouraged to explore other alternatives in the various domains of drug discovery and protein engineering.