
2.1 Background

Since ancient times, humans have explored natural sources such as plants and herbs for medicinal purposes. Early use of these products was limited to the direct application of plant leaves, barks, extracts, fruits, seeds, etc., as medicines. The earliest traces of the medicinal use of plants are found in the clay tablet records of the Sumerian civilization (Petrovska 2012). Historical records from ancient Egypt, Greece, and the Indus Valley civilization have shed light on the early therapeutic use of plants (Bernardini et al. 2018; Dias et al. 2012). However, for a long time, and to some extent even today, these practices have been regarded as medicinal witchcraft despite their success in treating certain ailments of humans and animals. The emergence of molecular biology has led to remarkable discoveries that emphasize the importance of natural resources as medicinal agents.

When we talk about natural products (NPs) in the context of drug discovery, the majority of instances come from the plant kingdom, that is, from medicinal plants. Phytochemicals are prominent examples of NPs in drug discovery. Besides plants, fungi, bacteria, certain insects, and some vertebrates have also been explored for medicinal purposes. Several comprehensive reviews describe the types of NPs in detail (Cragg et al. 1997; Harvey et al. 2015; Romano and Tatonetti 2019). In this chapter, our focus is mainly confined to plant-based NPs, or phytochemicals, and to the Bioinformatics and Cheminformatics techniques that have been applied so far to design and develop new drugs.

2.2 Historical Overview of Drug Discovery from Natural Products

Although the use of natural products for the therapy of various ailments has prehistoric traces, from the modern perspective drug discovery from NPs dates back to the nineteenth century, when extraction and chemical analysis came into existence. For a comprehensive review of the use of NPs for medicinal purposes, readers are referred to Petrovska (2012). This journey began with the isolation of alkaloids from plants. Friedrich Sertürner, a German pharmacist, first isolated morphine from the poppy plant in 1806 (Pathan and Williams 2012). This is the first instance of the extraction of an NP from a plant in modern pharmacology. About a decade later, in 1817, more alkaloids were extracted from plants such as ipecacuanha and strychnos (Petrovska 2012). The credit for the commercial extraction of alkaloids, mainly morphine, goes to Heinrich Emanuel Merck in Germany in 1826 (Atanasov et al. 2015).

The second phase of advancements in the exploitation of NPs in drug discovery began with the laboratory synthesis of various phytochemicals and their derivatives. The most remarkable example is acetylsalicylic acid, also known as Aspirin, the wonder drug used as a painkiller. Charles Frédéric Gerhardt first synthesized acetylsalicylic acid by treating sodium salicylate with acetyl chloride in 1853 (ref). Sodium salicylate is the sodium salt of salicylic acid, which has mainly been obtained from the willow tree and other salicin-rich plants. Willow bark was known for its medicinal uses in the ancient Sumerian and Egyptian civilizations (Norn et al. 2009). Later, in 1897, Felix Hoffmann at the Bayer Company developed the synthesis that led to the commercial production of Aspirin.

The discovery and synthesis of Aspirin from plant-based extracts ignited rapid growth in NP-based drug discovery, which continues to progress with modern strategies and techniques. In the next section, we discuss some strategies for NP-based drug discovery, focusing mainly on computational and data-driven methods from Bioinformatics and Cheminformatics.

2.3 Strategies for Drug Discovery Through Natural Products

In this section, we discuss some modern strategies applied in NP-based drug discovery and design. These strategies make use of established Bioinformatics and Cheminformatics methods. The leading methods incorporated in these approaches include QSAR analysis, pharmacophore modeling, molecular docking, molecular dynamics simulation, gene expression-based drug discovery, NP library construction, biomarker identification for NP-based drugs, NP-derived database development, big-data-driven drug discovery, machine learning-based methods, combinatorial library construction, fragment-based drug discovery, and pharmacokinetic property prediction (see Fig. 2.1 for a generalized in silico drug discovery process).

Fig. 2.1 Graphical representation of the in silico drug discovery process from NPs. The flow chart shows stages labeled screening and docking, lead optimization, ADMET, clinical trials, and FDA.

2.3.1 Quantitative Structure-Activity Relationship (QSAR)

QSAR is a well-known technique in computer-aided drug discovery. Its formal definition and technical details are covered in a wide range of previously published literature. As a brief overview for the reader, QSAR correlates the biological activity of a chemical compound with its structure. It uses a set of “predictor variables,” which in the context of chemical compounds are molecular descriptors such as physicochemical properties. Based on these values, the biological activities of new chemical compounds can be predicted (Verma et al. 2010). The applications of QSAR in NP-based drug discovery are well established and have produced promising results (Ref).
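The idea of predicting activity from descriptors can be illustrated with a minimal sketch. It assumes the simplest possible model, a linear fit by ordinary least squares, which is only one of many model types used in QSAR. The function names, the two descriptors (logP and a scaled molecular weight), and all numerical values are hypothetical and chosen purely for illustration.

```python
def fit_linear_qsar(descriptors, activities):
    """Fit activity ~ w0 + sum(w_i * descriptor_i) by ordinary least squares."""
    rows = [[1.0] + list(d) for d in descriptors]  # prepend intercept term
    n = len(rows[0])
    # Build the normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(n)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def predict_activity(weights, descriptor):
    """Predict the activity of a new compound from its descriptor vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], descriptor))


# Hypothetical training set: (logP, molecular weight / 100) -> activity (e.g. pIC50).
TRAIN_DESCRIPTORS = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5), (0.5, 4.0), (2.5, 1.5)]
TRAIN_ACTIVITIES = [4.1, 4.4, 5.0, 3.45, 4.95]

weights = fit_linear_qsar(TRAIN_DESCRIPTORS, TRAIN_ACTIVITIES)
print(predict_activity(weights, (2.0, 2.0)))
```

In practice, a QSAR study would use many more compounds and descriptors, validate the model on held-out data, and often prefer regularized or nonlinear models over this plain linear fit.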

2.3.2 Pharmacophore Modelling

A pharmacophore “is the ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target structure and to trigger (or to block) its biological response (IUPAC).” Pharmacophore modeling and its application in drug discovery have grown in recent years, and NP-based drug discovery has also benefited from it. A detailed description of pharmacophore approaches is beyond the scope of this chapter. The reader is referred to Leach et al. (2010) and Yang (2010) for a thorough treatment of the various aspects of pharmacophore modeling and its application in drug discovery.

2.3.3 High Throughput Virtual Screening and Molecular Docking

Screening thousands of compounds against a target molecule in the laboratory costs too much money and time. To overcome these complications and make the process time- and cost-effective, high throughput virtual screening (HTvS) is applied to scan a library of bioactive compounds against a target for drug discovery. Subsequently, a selected set of candidate compounds is taken forward for docking analysis, which further elucidates the patterns of binding and interaction with the target molecule. Docking helps identify the possible physical interactions between ligand and receptor while suggesting conformational poses that might fit into the binding cavity of the receptor. The combination of HTvS and molecular docking is a well-established approach in computer-aided drug discovery in general and in NP-based drug development in particular.
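The screen-then-dock workflow described above can be sketched as a ranking step that keeps only the top-scoring compounds for docking. The sketch below assumes a ligand-based score, the Tanimoto similarity between sets of structural feature identifiers, which is one common choice and not necessarily the one used in any study cited here. The compound names and feature sets are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of feature identifiers."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def virtual_screen(library, reference_features, top_n):
    """Rank library compounds by similarity to a reference and keep the top_n."""
    scored = [
        (name, tanimoto(features, reference_features))
        for name, features in library.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical library: compound name -> set of fingerprint bits that are "on".
LIBRARY = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 9},
    "cmpd_C": {7, 8},
}
REFERENCE = {1, 2, 3, 5}  # hypothetical known active compound

# The shortlisted compounds would then be passed on to docking.
for name, score in virtual_screen(LIBRARY, REFERENCE, top_n=2):
    print(name, round(score, 2))
```

A structure-based screen would replace the similarity score with a fast docking score against the receptor, but the shortlist logic stays the same.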

2.3.4 Molecular Dynamics Simulations

The first successful application of MD simulation to study protein folding in 1979 opened a window for theoretical chemists to develop the technique further and study more intricate biological processes (e.g., protein–ligand interaction) at the atomic level that are otherwise difficult to study under laboratory conditions. With this, MD simulation became a preferred tool for studying the conformational effects of ligand binding on the target molecule. MD simulations, QSAR, pharmacophore modeling, and molecular docking are widely applied in in silico drug discovery studies. Mechanistic insights into the binding of a drug molecule to its target are crucial to understanding how the drug acts on that target, and thus lead to the development of more specific ligands. NP-based drug discovery and development have benefitted greatly from the strengths of MD simulations.
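At its core, an MD simulation repeatedly integrates Newton's equations of motion for every atom. The following minimal sketch assumes the velocity Verlet integrator, a scheme commonly used in MD codes, and applies it to a single particle on a harmonic spring in reduced units. This toy system stands in for a real force field and is an illustrative assumption, not a protein–ligand simulation.

```python
def velocity_verlet(position, velocity, force, mass, dt, n_steps):
    """Propagate one particle in one dimension with the velocity Verlet scheme."""
    acceleration = force(position) / mass
    for _ in range(n_steps):
        # Update the position using the current velocity and acceleration.
        position = position + velocity * dt + 0.5 * acceleration * dt * dt
        # Recompute the force at the new position.
        new_acceleration = force(position) / mass
        # Update the velocity with the average of old and new accelerations.
        velocity = velocity + 0.5 * (acceleration + new_acceleration) * dt
        acceleration = new_acceleration
    return position, velocity


def harmonic_force(x, k=1.0):
    """Restoring force of a spring with force constant k (reduced units)."""
    return -k * x


x, v = velocity_verlet(1.0, 0.0, harmonic_force, mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v * v + 0.5 * x * x  # kinetic + potential energy
print(x, v, total_energy)
```

The near-constant total energy over many steps is the property that makes this family of integrators attractive for long simulations. Production MD packages add force fields, thermostats, and periodic boundaries on top of this basic loop.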

2.3.5 Prediction of Pharmacokinetic Properties and Toxicity

Knowledge of the pharmacokinetic properties, i.e., Absorption, Distribution, Metabolism, and Excretion (ADME), and of the safety of NP-based compounds in terms of toxicity is a challenge in the drug discovery process. Several statistical and knowledge-based methods exist for the in silico prediction of these properties, which help eliminate “unlikely” candidates from the pool of NPs. By convention, the ADME properties of a candidate molecule depend on physicochemical properties such as molecular weight, lipophilicity, and the numbers of hydrogen bond donors and acceptors. In practice, “Lipinski’s Rule of Five” is applied as a filter to separate out candidates that violate these parameters. However, applying the rule of five requires considerable caution and judgment when relatively large NP-based compounds are involved, so the rule has its limitations.
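The rule-of-five filter mentioned above can be sketched in a few lines. The thresholds are the standard ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors), and the sketch assumes the common convention of tolerating one violation. The descriptor values are taken as precomputed inputs, and the example compounds and their values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physicochemical descriptors of one compound."""
    name: str
    mol_weight: float   # in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def lipinski_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d, max_violations=1):
    """Accept a compound with at most max_violations violations (assumed cutoff)."""
    return lipinski_violations(d) <= max_violations


# Hypothetical candidates with illustrative descriptor values.
candidates = [
    Descriptors("small_phenolic", 180.2, 1.2, 1, 4),
    Descriptors("large_glycoside", 1200.0, 6.5, 8, 18),
]
for c in candidates:
    print(c.name, lipinski_violations(c), passes_rule_of_five(c))
```

Because many bioactive NPs are large and polar, a filter like this would flag them as violators, which is one reason the cutoff is exposed as a parameter instead of being hard-coded.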

2.3.6 Computational Combinatorial Chemistry and Library Design

The emergence of combinatorial chemistry in the 1980s (Liu et al. 2017) opened channels for developing libraries of structurally diverse chemical compounds based on the structural and chemical properties of known bioactive compounds. Since then, with advances in computing, several methods have been developed that use combinatorial chemistry approaches to generate thousands of synthetic compounds and build ready-to-screen chemical libraries for drug discovery. In the context of NPs, this approach has been applied in recent years to develop libraries of candidate compounds built on the scaffolds of naturally occurring compounds (Grabowski et al. 2008; Mang et al. 2006). As part of this, computational mutagenesis is a recent technique that involves mutating specific structural features of the target NPs to generate libraries of novel candidate compounds (Chen et al. 2002; Romano and Tatonetti 2019).
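The enumeration step behind a scaffold-based combinatorial library can be sketched as a Cartesian product over substituent choices. The sketch below uses plain text labels in place of real chemical structure notation. The scaffold name, the attachment positions R1 and R2, and the substituent lists are hypothetical.

```python
from itertools import product


def enumerate_library(scaffold_template, substituents):
    """Generate every combination of substituents on a fixed scaffold.

    scaffold_template: string with named placeholders such as {R1}, {R2}.
    substituents: dict mapping each placeholder name to its candidate groups.
    """
    positions = sorted(substituents)
    library = []
    for combo in product(*(substituents[p] for p in positions)):
        assignment = dict(zip(positions, combo))
        library.append(scaffold_template.format(**assignment))
    return library


# Hypothetical NP scaffold with two attachment points.
SCAFFOLD = "coumarin[R1={R1}, R2={R2}]"
SUBSTITUENTS = {
    "R1": ["H", "OH", "OCH3"],
    "R2": ["H", "CH3"],
}

library = enumerate_library(SCAFFOLD, SUBSTITUENTS)
print(len(library))
for entry in library:
    print(entry)
```

The library size grows as the product of the numbers of options at each position, which is why even a few positions and substituents quickly produce thousands of virtual compounds.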

2.3.7 Machine Learning Approaches for NP-Based Drug Discovery

Machine learning algorithms have found their way into almost every domain of human life where human–machine interaction is possible, and computer-assisted drug discovery has not remained untouched. Today, various aspects of drug discovery draw on the strengths of machine learning algorithms, whether in target identification or validation, de novo inhibitor design, virtual screening, docking, or ADMET property prediction (Vamathevan et al. 2019). Similar trends have been observed in NP-based drug discovery in recent years, where machine learning algorithms are applied to simplify the intricate decision-making process. Possible applications of machine learning methods in predicting the functions of NPs from their two-dimensional structures have also been discussed (Liu et al. 2019). Moreover, ongoing improvements in machine learning methods, together with efforts to address their limitations in NP-based drug discovery, suggest potential success in the near future.
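As a minimal illustration of how a learned model can label compounds as active or inactive, the sketch below assumes a k-nearest-neighbours classifier over descriptor vectors. This is only one simple algorithm among the many used in the studies cited above, and the descriptor values and labels are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(training_set, query, k=3):
    """Label a query compound by majority vote among its k nearest neighbours."""
    nearest = sorted(training_set, key=lambda item: squared_distance(item[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (descriptor vector, experimentally assigned label).
TRAINING_SET = [
    ((1.0, 1.0), "active"),
    ((1.2, 0.9), "active"),
    ((0.8, 1.1), "active"),
    ((4.0, 4.2), "inactive"),
    ((4.1, 3.9), "inactive"),
    ((3.8, 4.0), "inactive"),
]

print(knn_predict(TRAINING_SET, (1.1, 1.0)))
print(knn_predict(TRAINING_SET, (4.0, 4.0)))
```

A real application would scale the descriptors, tune k, and report accuracy on compounds that were held out from training.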

2.3.8 Big Data and Data-Driven Drug Discovery

This section extends the previous one, where we discussed the applications of machine learning in drug discovery. These algorithms rely on large amounts of data, which must be both accurate and complete to reduce the possibility of false-positive predictions. In the context of drug discovery, big data refers to the huge amount of chemical information accumulated in publicly accessible databases such as ZINC, PubChem, ChEMBL, and DrugBank, which store millions of active compounds, both naturally occurring and synthesized (Thomford et al. 2018). Big data also encompasses the large amount of clinical data stored in Electronic Health Records (EHRs). In addition, disease biomarker databases, disease pathways, protein–protein interaction networks, protein–drug interaction networks, cancer gene expression data, etc., add to the paradigm of big data. This leads to a novel but challenging aspect of drug discovery, i.e., data-driven drug discovery.

2.4 Tools and Databases for NP-Based Drug Discovery

This section catalogs tools and databases, whether based on the strategies above or otherwise, that are applied in NP-based drug discovery. They include resources generally used in computer-aided drug discovery as well as sources specific to NP-based drug development. Table 2.1 lists the available sources along with brief descriptions. For comprehensive details on the sources for in silico drug discovery, the reader is referred to Chen et al. (2017), Lagunin et al. (2014), Ma et al. (2011), Naqvi and Hassan (2017), Naqvi et al. (2018), and Nguyen-Vo et al. (2020).

Table 2.1 List of tools and databases for natural product-based drug discovery

2.5 Recent Advances in Drug Discovery from Natural Products

This section discusses recent case studies of natural product-based drug discovery that use the state-of-the-art in silico methods described in the previous sections. It is intended to help the reader understand the current state of NP-based drug discovery. The studies discussed here are either purely computational or follow a hybrid approach that integrates laboratory techniques with in silico methods.

  1.

    Monoamine oxidase B (MAO B) is associated with the catalysis of arylalkylamine neurotransmitters. Its malfunction is thought to be involved in the development of Parkinson’s disease. In an attempt to develop potential inhibitors of MAO B, Mladenović et al. (2017) applied 3D-QSAR models, in a hybrid in vitro and in silico approach, to evaluate the biological activity of coumarin-based compounds. Coumarin is a phytochemical found in high concentrations in tonka beans. In this approach, they developed a combination of structure-based and ligand-based 3D-QSAR models and eventually identified six relatively active inhibitors of MAO B, which might act as lead candidates for drug development against Parkinson’s disease. Dhiman et al. (2018) reviewed the application of 3D-QSAR to a diverse set of NP-based compounds, such as coumarins, morpholine, piperine, naphthoquinone, amphetamine, flavonoids, caffeine, and curcumin, as potential MAO inhibitors. They concluded that QSAR, molecular docking, and CoMFA are effective in finding selective and highly active inhibitors of MAO.

  2.

    As discussed in the previous sections, molecular docking in combination with molecular dynamics simulations has proven very effective in computer-aided drug discovery studies. In recent years, these tools have been applied successfully to discover potent and effective inhibitors of several known biomarkers of life-threatening diseases. Khan et al. (2009) studied the inhibitory effects of the flavonoid derivatives quercetin, rutin, kaempferol 3-O-beta-d-galactoside, and macluraxanthone against the activity of acetylcholinesterase (AChE) and butyrylcholinesterase (BChE) using molecular docking and enzyme inhibition assays. They found that macluraxanthone binds effectively to both enzymes. Quercetin also exhibited strong intermolecular interactions with both enzymes.

    Cozza et al. (2006) identified ellagic acid as an effective inhibitor of casein kinase 2 (CK2) using virtual screening and molecular docking methods. In a more recent study, Zhang et al. (2020) screened a library of 2080 NPs for potential antiviral activity, with the aim of identifying potent inhibitors of the HIV-1 capsid (CA) protein. Based on molecular docking, they found that the compounds rubranol and hirsutanonol show strong intermolecular binding with HIV-1 CA.

    Ebola virus nucleoprotein (EBOV NP) is important for the proliferation of the virus. To develop effective inhibitors against EBOV NP, Nasution et al. (2018) screened a library of 190,084 NPs from the ZINC database. To evaluate the binding affinity and effectiveness of the top-scoring compounds, they applied a flexible docking approach and molecular dynamics simulation. Eventually, α-lipomycin and 3-(((S)-1-amino-1,2,3,4-tetrahydroisoquinolin-5-yl)methyl)-5-((5-((5R,7S)-5,7-dihydroxy-3-oxodecyl)-2-hydroxyphenoxy)methyl)pyrrolo[3,4-b]pyrrol-5-ium were found to bind strongly, making them potent candidates for anti-Ebola drugs.

  3.

    Machine learning algorithms coupled with QSAR or molecular docking have proven very effective in elucidating the inhibitory effects of NPs and in identifying novel drug candidates (Korotcov et al. 2017; Lavecchia 2015). Classical MD simulation, when integrated with machine learning-based methods, enhances the performance of in silico drug discovery processes (Perez et al. 2018). Shi et al. (2020) applied machine learning models to discover inhibitors of New Delhi metallo-beta-lactamase 1 (NDM-1). NDM-1-producing bacteria play a crucial role in drug-resistant bacterial infections. They screened a library of NP-based compounds using prediction models and compared the performance of these models with a virtual screening and docking strategy. The machine learning models achieved 90.5% accuracy in predicting potent inhibitors, compared with 69.14% accuracy for the traditional docking approach.

    Besides inhibitor discovery, machine learning is also applied to ADME property prediction and toxicity profiling of NPs. To assess the efficacy of machine learning in the toxicity profiling of natural compounds, Onguene et al. (2018) carried out a toxicity assessment of three compound libraries from African flora with anti-malarial and anti-HIV activity. The toxicity predicted by the machine-learning models was found to agree with the available experimental toxicity data.

  4.

    Biological and chemical data have seen a manifold increase in recent years. Moreover, clinical data for many patients are available worldwide in the form of EHRs. This has opened a new window in the realm of drug discovery, called “data-driven drug discovery.” The astronomical amount of data, commonly referred to as “big data,” has tremendous scope for novel drug discovery (Lusher et al. 2014). In recent years, efforts have been made to study drug effects and their interactions with pathogenic drug targets using the information stored in clinical records and EHRs (Tatonetti et al. 2012; Yao et al. 2011). Despite the potential of this approach to revolutionize drug discovery, there are certain limitations, such as hurdles in accessing clinical records or EHRs and the limited understanding of clinical data among informatics researchers. However, projects such as the Electronic Medical Records and Genomics (eMERGE) network (McCarty et al. 2011) and Observational Health Data Sciences and Informatics (OHDSI) (Hripcsak et al. 2015) are working to remove these barriers.

2.6 Challenges and Prospects

In this chapter, we have discussed methods and strategies for finding potential drug candidates among NP-based compounds. Most of these approaches, whether traditional laboratory-based methods or computational techniques, seem promising in providing relevant answers to drug discovery problems. But then the question arises: do the drug candidates and so-called “potential” inhibitors reach their destination, that is, do they become marketed drugs that treat real diseases in the real world? The statistics are somewhat encouraging. According to the survey by Newman and Cragg (2016), out of 1562 drugs approved between 1981 and 2014, 646 are either NPs or NP-derived. Another survey suggests that around one-third of the new molecular entities (NMEs) approved by the FDA originate from NPs (Patridge et al. 2016). The statistics for the success of drug candidates discovered through in silico methods are also promising (Zhu et al. 2018).

Despite the success of in silico drug discovery studies, both in general and for NP-based compounds in particular, some challenges still need to be addressed for a better future of NP-based drug discovery. These challenges include the under- or overuse of computational methods for discovery, the uneven distribution of chemical data across several sources, which limits access in most cases, and controlled or absent access to clinical data such as EHRs. However, with the rapid development of computer hardware capable of mimicking more intricate biological and molecular processes, processing large chemical libraries, and providing better solutions to the challenging problems faced in computer-aided drug discovery, the future of in silico methods for NP-based drug discovery is bright and promising.