
5.1 Introduction

Drug discovery is an extensive, costly, and time-consuming process with a low rate of success. Developing a drug from scratch and bringing it to market while maintaining its efficacy takes around 13–15 years and costs billions of dollars on average, and these figures are still rising. By comparison, the rate at which novel drugs are launched is low. It is estimated that more than half of all drugs approved in the last three decades were either natural products (NPs) or their semisynthetic derivatives (Newman and Cragg 2016; Patridge et al. 2016).

The reason lies in the diversity of NPs across species and their use for medicinal purposes since ancient civilizations. Compared with libraries of nonnatural compounds, NPs have a greater molecular mass, more sp3 carbon atoms, more H-bond acceptors and donors, a more hydrophilic nature, and greater molecular rigidity (Atanasov et al. 2015; Feher and Schmidt 2003). This structural advantage can be useful when tackling protein-protein interactions, owing to the greater rigidity of NPs (Lawson et al. 2017). Although many NPs do not adhere to Lipinski’s rule of five, owing in part to their high molecular mass, they remain a class that is used for therapeutic purposes.
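To make the rule-of-five comparison concrete, the following minimal sketch counts violations from precomputed descriptors. The `Descriptors` class, the function names, the one-violation tolerance, and both example compounds are assumptions introduced here for illustration; in practice the descriptor values would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties; a cheminformatics toolkit would normally supply these."""
    mol_weight: float       # molecular mass in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-five criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular mass > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Hypothetical descriptor values, chosen only for illustration.
small_synthetic = Descriptors(mol_weight=310.0, logp=2.4, h_bond_donors=2, h_bond_acceptors=5)
large_np = Descriptors(mol_weight=820.0, logp=1.1, h_bond_donors=9, h_bond_acceptors=16)

print(passes_rule_of_five(small_synthetic))  # True
print(lipinski_violations(large_np))
```

The hypothetical large NP fails on mass, donors, and acceptors while its logP stays low, which mirrors the heavier, more polar profile described above.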

Natural products, despite being a source of inspiration for drug discovery, also pose disadvantages for the pipeline. NPs have diverse and complex molecular structures, which makes it challenging to generate 3D molecular structures and their analogs while accounting for stereochemistry, force fields, and algorithms for predicting protein-bound conformations (Friedrich et al. 2019). Dereplication tools are required to avoid the rediscovery of known compounds. Other challenges include procuring the source materials and extracting, detecting, and isolating bioactive compounds; generating activity profiles is also time-consuming and has a low success rate. Given these facts, predicting activity in advance with in silico methods can help simplify the process.

The capital needed for in silico experiments is comparatively lower than the expenses associated with experimental procedures. Software licensing costs continue to be a significant cost component and have been steadily rising in recent years, although open-source packages (for example, scikit-learn and CDK) are available. Moreover, an efficient on-site computing center is no longer necessary, as calculations can be performed affordably in the cloud at very large scales and with a low degree of complexity. Computational drug discovery offers well-established cheminformatics techniques that ease the process, reduce losses, and save time. These techniques include data mining on large datasets, dereplication, chemical space analysis, visualization and comparison, prediction of bioactivity, prediction of ADME and safety profiles, natural product-inspired de novo design, and prediction of natural products prone to interfere with biological assays (Chen and Kirchmair 2020).

5.2 Natural Products’ Collections

The definition of “natural products” is not universally agreed upon: some authors limit the term to small-molecule secondary metabolites, while others broadly accept any chemical substance produced by a living organism as an NP. The latter definition covers more diversity, and hence the lines separating the subclasses remain ill-defined. Under this definition, the therapeutic classes of NPs include phytochemicals, fungal metabolites, toxins, antibodies, and NPs with limited activity. For in silico work, NP collections can also be categorized as physical and virtual collections.

5.2.1 Physical Collection

The importance of NPs in treating ailments dates back to ancient civilizations. Over the past decades, natural compounds and their structural analogs have added significantly to the therapeutic arsenal against numerous diseases, including cancer and infectious disorders. According to one survey, only 6% of the estimated 400,000 plant species have undergone activity studies, and less than 20% have undergone phytochemical investigation (Fabricant and Farnsworth 2001). Phytochemicals, which act as antioxidants and are a source of many life-saving medicines, form a broad class of NPs that includes polyphenols, terpenoids, and alkaloids. Fungal metabolites have been explored as antidiabetic, antibacterial, antioxidant, antitumor, and even insecticidal agents (Daley et al. 2017). In most cases, chemotherapy medications are derived from naturally occurring toxins produced by large clades of organisms, such as plants, fungi, and bacteria. The next important therapeutic class is antibodies, with more than 60% of drugs approved and more than 500 in the developmental stage as per a survey in 2016 (Cragg et al. 1997). Their common mechanisms of action include receptor blocking or downregulation and induction of target cell signaling, which can be exploited for rheumatoid arthritis, non-Hodgkin lymphoma, multiple myeloma, and various other diseases (Carter and Lazar 2018). NPs such as biopolymers and spider silk are known for their use in drug delivery systems rather than for therapeutic value.

To support virtual screening of NPs in in silico studies, the majority of compound suppliers worldwide now freely provide information on the structures and some other features of their compounds. According to one survey, only about 10% of the known NPs in virtual databases are available for experimental procedures (Chen et al. 2017). This lack of physical availability is a bottleneck in drug discovery. However, the readily available NPs have favorable physicochemical properties for the drug discovery pipeline. Moreover, more than half of them have a molecular weight of less than 300 Da, which leaves ample room for optimization (Chen et al. 2017). There are more than 100 commercial suppliers of purified NPs in the world, but only a few of them supply more than 5000 NPs.

It is noteworthy in this respect that the 25 k readily obtainable NPs mentioned above encompass more than 5700 Murcko scaffolds. Additionally, these NPs, which include alkaloids, steroids, and flavonoids, provide a fair representation of all major NP classes (Chen et al. 2018).

5.2.2 Virtual Collection

Rapidly growing interest in NPs has led to steep growth in the number of NP databases. These virtual collections can be categorized into (i) general NP databases, (ii) databases of traditional NPs, and (iii) specialized databases (Chen et al. 2019a). The second category comprises databases of traditionally used NP-based drugs, whereas the third comprises databases focusing on organisms from specific habitats, on specific biological activities, or on specific NP classes. A survey reported that since the 2000s, approximately 120 different databases and collections have been released and used in the context of NPs (Sorokina and Steinbeck 2020). Of these, 98 are still accessible in some way, and approximately 50 are open access. The open-access resources include NP collections published as supplementary material in scientific publications as well as those available in the ZINC database (Sterling and Irwin 2015). The NP collections in ZINC provide information about structure and origin but no additional information. Databases may be open access or commercial. Among the free NP databases is Super Natural II, which contains more than 325 k NPs (Banerjee et al. 2015). It can be queried through a chemistry-aware online interface, although bulk download is not officially supported. The Universal Natural Products Database (UNPD) is another free database, with more than 200 k NPs and downloadable resources (Gu et al. 2013). Unfortunately, the UNPD appears to be nonfunctional. Virtual databases may be specific to a geographical region (such as databases of Chinese herbals only), specific to a section of NPs (such as databases of marine NPs only), or general, such as COCONUT (Sorokina et al. 2021). Some examples of functional databases are listed in Table 5.1.

Table 5.1 Examples of some active databases

Other examples include NuBBEDB (Pilon et al. 2017), KNApSAcK (Nakamura et al. 2013), CMAUP (Zeng et al. 2019), and smaller databases such as FooDB. In contrast, data on the therapeutic efficacy and protein-bound conformations of NPs are scarce. Among the most relevant resources, the Marine Natural Library deserves special mention, as it allows download of its full dataset of more than 14 k marine NPs (Bugni et al. 2008). NPs seem to have a slight upper hand over synthetic compounds, as their “libraries” already exist in nature. General chemical databases such as PubChem and ChEMBL (Li et al. 2010; Leach 2017) also include NP records annotated by class, while more specific ones (such as ArachnoServer, VenomKB, and the Dictionary of Marine Natural Products) provide even more granular annotations for aggregating NP libraries with various characteristics of interest (Dona et al. 2017; Romano et al. 2018).

5.3 Cheminformatics and Computational Approaches for NP-Based Drug Discovery

5.3.1 Computational-Based Approaches

Computational approaches are the broader category, within which cheminformatics falls. Cheminformatics is the application of computational approaches to the collection, storage, and analysis of large databases in service of a major goal, drug discovery. Along with cheminformatics, other informatics approaches such as bioinformatics and semantic methods have also been reviewed (Romano and Tatonetti 2019). Computational techniques have long been regarded as an important part of drug discovery and development. The approaches they offer include structural elucidation, analysis of physicochemical and structural properties, determination of macromolecular targets, and prediction of ADME properties and safety profiles. For these purposes, computational methods can be broadly classified as structure-based or ligand-based (Podlogar et al. 2001). This classification depends on the level of structural information available about the target to support the calculations. Structure-based methods rely on information about the three-dimensional (3D) structure of the molecular target of interest, typically obtained from X-ray crystallography, nuclear magnetic resonance, or homology modeling (Cerqueira et al. 2015). Ligand-based approaches, in contrast, rely on information about active ligands (and inactive compounds, when available) (Lill 2007). With the increasing need for virtual screening of NPs and for database maintenance, cheminformatics has become established in the drug discovery process. Its methods are generally classified as direct or indirect, based on the type of properties they exploit. Direct approaches deal with chemical activity and its constants, reactive groups, and ADME profiling, whereas indirect approaches deal with structural specifications, compound category, or other observations (Romano and Tatonetti 2019).

5.3.2 Cheminformatics and NP-Based Pipeline

So far, cheminformatics and related informatics approaches have been reviewed in the context of the drug discovery pipeline. They have played an important part in curating fragmented NP databases and in analyzing the results. Cheminformatics and computational approaches are closely linked: cheminformatics is essentially the application of computational approaches, as shown in Fig. 5.1. Cheminformatics techniques specific to NP-based drug discovery are NP-based QSAR analysis, molecular docking and dynamics, computational mutagenesis, and library construction. Numerous classes of NPs have been studied using QSAR, and the chemical descriptors used tend to be dictated by the particular class (Huang et al. 2016). For example, descriptors for small-molecule NPs include categorical variables indicating their class and species of origin. Similarly, in molecular docking, the specific class of NP determines the interaction between target and ligand. For example, if a macromolecular NP belonging to a specific class is suspected of interacting with small-molecule metabolites, docking simulations can be used to find out which metabolites could bind to that NP (Pithayanukul et al. 2009). Other aspects of molecular docking include protein preparation and flexibility, and pose scoring in terms of binding affinity. Generating extensive compound libraries and screening them helps predict potential drug candidates, with the understanding that only a small fraction of compounds will be hits (Terrett et al. 1995). In the case of NPs, such libraries existed in nature long before synthetic ones. In this chapter, we discuss different analytical methods used in computational approaches for NPs.
Antibodies, despite their large molecular weight, are relatively easy to screen in large numbers via docking, because their specific structural and binding properties reduce the computational complexity of the simulations (Mann 2002). Additionally, noteworthy success stories have emerged from screening smaller NP databases against specific drug targets. For example, ellagic acid, known to have both antiproliferative and antioxidant properties, was identified by Moro et al. by screening a proprietary database of 2000 NPs against the oncoprotein casein kinase 2 (Cozza et al. 2006).

Fig. 5.1 Amalgamation of cheminformatics and computational approaches. Schematic listing cheminformatics techniques (computational mutagenesis, molecular docking, and so on) and computational approaches (structural elucidation, structural diversity analysis, NP-likeness assessment, and so on)

5.4 Computational Approaches Related to Natural Products

5.4.1 Structural Elucidation

Sourcing materials for the extraction and isolation of NPs is expensive and time-consuming, and as knowledge about NPs has accumulated, the rate at which novel compounds are discovered has been decreasing. In order to make the most of the available experimental resources, analytical and computational approaches need to be integrated for early detection of both favorable and unfavorable features of NPs (Pereira and Aires-de-Sousa 2018). Databases that provide measured analytical data (such as bioactivities, chromatographic data, MS, NMR spectroscopy, and FTIR data) for known NPs, and their interrogation with computational methods, play a crucial role in this interplay of technologies. However, even the largest of these databases include only a small subset of the NPs under consideration. This is why NMR and MS fragmentation predictions are increasingly made using computational methods, often in tandem with structure generators (Pereira and Aires-de-Sousa 2018).

Structure-based (SB) and ligand-based (LB) cheminformatics techniques have recently become indispensable tools for the virtual screening of NP candidates, both in small datasets of isolated compounds and in huge databases. The most commonly used LB approaches are quantitative structure-activity relationships (QSAR), assessment of drug-likeness, prediction of absorption, distribution, metabolism, and excretion (ADME), similarity searching, and pharmacophore identification. Commonly used SB methods include molecular dynamics, docking, and binding cavity analysis (Pereira and Aires-de-Sousa 2018).

Recently, however, the risk of re-isolating well-known molecules has slowed drug development from natural products. Dereplication, the rapid and automated identification of previously isolated compounds, directs researchers toward new discoveries and cuts the time and effort needed to develop innovative drug leads. Dereplication identifies compounds by comparing processed experimental data with data for known compounds; it therefore requires a variety of computing tools and resources to process and analyze compound data. The combination of analytical data with multivariate data analysis is a key technique for computer-assisted dereplication (Chanana et al. 2017). Multivariate methods such as principal component analysis (PCA), cluster analysis, and/or discriminant analysis may be used to single out interesting NPs in complicated mixtures, such as NPs in extracts that are specific to a certain organism of interest (Chanana et al. 2017; Abdelmohsen et al. 2014).
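The core comparison step of dereplication can be sketched as a lookup of a measured mass against a table of known compounds. This is a simplified illustration, not a complete workflow: the function names, the 10 ppm tolerance, and the three-entry reference table are assumptions made here, the masses are approximate monoisotopic values, and the measured value is assumed to have already been converted to a neutral mass.

```python
from typing import Dict, List, Tuple

# Approximate monoisotopic masses (Da), included for illustration only.
KNOWN_COMPOUNDS: Dict[str, float] = {
    "caffeine": 194.0804,
    "quercetin": 302.0427,
    "ellagic acid": 302.0063,
}


def ppm_error(measured: float, reference: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - reference) / reference * 1e6


def dereplicate(measured_mass: float,
                known: Dict[str, float],
                tolerance_ppm: float = 10.0) -> List[Tuple[str, float]]:
    """Return known compounds whose mass lies within the tolerance, best match first."""
    hits = []
    for name, reference_mass in known.items():
        error = ppm_error(measured_mass, reference_mass)
        if abs(error) <= tolerance_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# A hypothetical measurement: quercetin and ellagic acid share a nominal mass
# of 302, but only one of them falls inside a 10 ppm window.
for name, error in dereplicate(302.0425, KNOWN_COMPOUNDS):
    print(f"{name}: {error:+.2f} ppm")
```

An empty result flags a candidate for further study, whereas a hit suggests a known compound. Real pipelines combine mass with retention time, MS/MS, or NMR data, since mass alone cannot separate isomers.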

By analyzing spectroscopic data, computer-assisted structure elucidation (CASE) systems aim to identify the most likely structure of a molecule. CASE systems enumerate the structures that agree with the experimental (spectroscopic) data and rank them. Ideally, CASE systems operate with low error rates and in a fully automated fashion. Sophisticated CASE systems can also assign stereochemical attributes to NP structures, because they take stereospecific NMR data and/or calculations based on density functional theory (DFT) into account (Burns et al. 2019).

NP dereplication is an appealing application for machine learning techniques. For instance, a recent study investigated the ability of machine learning algorithms to assign NPs to eight NP classes (such as chromans) using 13C NMR spectroscopic data (Martínez-Treviño et al. 2020). An XGBoost classifier achieved remarkable overall performance: for the majority of NP classes, more than 80% of the compounds in a test set were correctly assigned. Another study successfully applied a convolutional neural network-based method to the rapid identification of novel NPs from a filamentous marine cyanobacterium (Reher et al. 2020).

One of the most up-to-date resources for managing MS/MS spectra and sharing the results of their analysis is the Global Natural Product Social Molecular Networking (GNPS) platform. It enables researchers to investigate a dataset and compare it with all publicly available data. GNPS makes online dereplication possible through automated molecular networking analysis (Wang et al. 2016).
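Molecular networking connects spectra whose fragmentation patterns are similar. The sketch below shows the idea with a plain cosine score on binned peaks. It is a simplified stand-in, not the GNPS implementation: the bin width, the 0.7 edge threshold, the function names, and the three toy spectra are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 0.5) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_score(a: List[Peak], b: List[Peak], bin_width: float = 0.5) -> float:
    """Cosine similarity between two binned spectra (0 = no shared bins, 1 = identical)."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def network_edges(spectra: Dict[str, List[Peak]], threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Connect every pair of spectra whose cosine score reaches the threshold."""
    return [(x, y) for x, y in combinations(sorted(spectra), 2)
            if cosine_score(spectra[x], spectra[y]) >= threshold]


# Toy spectra (hypothetical values): a and b share two of three peaks, c shares none.
spec_a = [(100.1, 50.0), (150.2, 100.0), (200.3, 25.0)]
spec_b = [(100.1, 50.0), (150.2, 100.0), (300.4, 25.0)]
spec_c = [(400.0, 10.0)]

print(network_edges({"a": spec_a, "b": spec_b, "c": spec_c}))
```

In such a network, an unknown spectrum that clusters with annotated spectra inherits a structural hypothesis from its neighbors, which is what makes the approach useful for dereplication.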

5.4.2 Analysis of Physicochemical and Structural Properties

Cheminformatics has contributed significantly to the characterization of NPs in terms of their physicochemical and structural properties. The chemical space that NPs occupy is substantially larger than that of synthetic compounds, and they also occupy regions of chemical space that are often inaccessible to synthetic molecules (Ertl and Schuffenhauer 2008; Singh and Culberson 2009).

Compared with synthetic drugs and synthetic, drug-like compounds, NPs are generally heavier and more hydrophobic (Chen et al. 2019b). In addition, their structural complexity is usually higher, particularly with respect to stereochemistry (often measured by the number of chiral centers and the number of bridgehead atoms in ring systems) and three-dimensional molecular shape (Henkel et al. 1999; Lucas et al. 2015).

NPs display an astounding variety of ring systems, particularly aliphatic ones (Ertl and Schuffenhauer 2008; Grabowski and Schneider 2007). Researchers found that 83% of the core ring scaffolds of NPs were missing from commercially available screening databases. In terms of atom composition, the two characteristics that set NPs apart from synthetic compounds are their low number of nitrogen atoms and their large number of oxygen atoms (Feher and Schmidt 2003; Wetzel et al. 2007; López-Vallejo et al. 2012). Nevertheless, the vast majority of known NPs and, even more so, those found in actual NP libraries have pharmacological properties (Chen et al. 2018).

Physicochemical and structural characteristics vary among NPs from different kingdoms. For instance, NPs from marine species are more likely than those from terrestrial species to contain macrocycles or long aliphatic chains (El-Elimat et al. 2012; Muigg et al. 2013; Saldivar-Gonzalez et al. 2018). Their NPs are distinguished by a high number of heteroatoms and, in conjunction with this, a wide range of functional groups (Pilkington 2019; Shang et al. 2018; Ertl and Schuhmann 2019; Ertl and Schuhmann 2020).

5.4.3 Structural Diversity Analysis

In terms of structural diversity, NPs are unparalleled, and this is also evident at the fragment level (Tran et al. 2020). Some studies use the concept of molecular scaffolds to compare NPs with synthetic compounds and thereby evaluate the structural diversity of NPs (Bemis and Murcko 1996). A recent study contrasts the scaffolds that are unique to NPs with those of synthetic compounds and presents an intuitive depiction of them (Ertl and Schuhmann 2020). Scaffold analysis also makes it possible to compare the scaffolds commonly seen in NPs derived from bacteria, plants, fungi, or mammals (Chen et al. 2018).

Scaffold Hunter is a powerful Java-based application for the intuitive, visual analysis of the structural diversity of a set of compounds (Schäfer et al. 2017; Lachance et al. 2012). It is built on the hierarchical representation and classification of molecular scaffolds. An early version of this tool was used to develop the structural classification of natural products (SCONP), a technique for mapping the chemical space of NPs (Koch et al. 2005).

Principal component analysis (PCA) is a common technique for mapping chemical space, since it transforms high-dimensional data into a low-dimensional space with little loss of information. Its most useful output is the PCA scatter plot, which shows how the data points are distributed in the low-dimensional space (Saldívar-González et al. 2019; Shen et al. 2012).
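The projection behind such a scatter plot can be sketched without external libraries. The example below centers a small descriptor table, builds the covariance matrix, and extracts the two leading components by power iteration with deflation. The function names and the five-compound descriptor table are hypothetical, and the descriptors are left unscaled for brevity, whereas real analyses usually standardize them first.

```python
import math
import random


def center(rows):
    """Subtract the column mean from every descriptor column."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iterations=500, seed=0):
    """Power iteration on a symmetric matrix: (largest eigenvalue, unit eigenvector)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v


def pca(rows, n_components=2):
    """Return (scores, explained variances) for the leading principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for k in range(n_components):
        value, vector = dominant_eigenpair(cov, seed=k)
        components.append(vector)
        variances.append(value)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(a * b for a, b in zip(row, c)) for c in components]
              for row in centered]
    return scores, variances


# Hypothetical descriptor table: five compounds, three descriptors.
descriptors = [[1, 2, 0.1], [2, 4, 0.0], [3, 6, 0.1], [4, 8, 0.0], [5, 10, 0.1]]
scores, variances = pca(descriptors)
print(variances)  # the first component carries almost all of the variance
```

Each row of `scores` gives the x and y coordinates of one compound in the scatter plot, and the ratio of the variances indicates how much information the two-dimensional view retains.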

ChemGPS was created, and adapted for use with NPs under the name ChemGPS-NP, so that the principal components do not need to be recalculated as new compounds are added to a dataset. ChemGPS-NP has been employed in a number of studies for mapping the chemical space of small molecules, predicting modes of action, and analyzing structure-activity relationships (Frédérick et al. 2012; Korinek et al. 2017; Muigg et al. 2013).

The recently developed Uniform Manifold Approximation and Projection (UMAP) method and t-distributed stochastic neighbor embedding (t-SNE) are two further robust methods for dimensionality reduction. t-SNE creates plots in which similar objects are modeled by nearby points and dissimilar objects by distant points. UMAP is faster but delivers results conceptually comparable to those of t-SNE (Van der Maaten and Hinton 2008; Burton 2020).

Researchers recently developed the Statistical-Based Database Fingerprint (SB-DFP), a new technique for representing the chemical space of a compound database by a single fingerprint. In principle, the SB-DFP can be derived from any chemical fingerprint and any reference set, which gives it a wide range of applicability. It is created by contrasting the binomial distributions of the features of the chosen molecular fingerprint among the compounds in a dataset of interest with those in a reference dataset (Sánchez-Cruz and Medina-Franco 2018).
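One way to read this construction is as a per-bit comparison of proportions. In the sketch below, a bit of the database fingerprint is switched on when the feature is significantly more frequent in the dataset of interest than in the reference set. The one-sided two-proportion z-test, the 1.645 cut-off, the function names, and the toy fingerprints are assumptions made for illustration; the published procedure may differ in its details.

```python
import math
from typing import List

Fingerprint = List[int]  # one 0/1 value per bit position


def bit_frequencies(fingerprints: List[Fingerprint]) -> List[float]:
    """Fraction of compounds in which each bit is set."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    return [sum(fp[i] for fp in fingerprints) / n for i in range(n_bits)]


def z_statistic(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-proportion z statistic with a pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if std_err == 0.0:
        return 0.0  # feature present in all or in none of the compounds
    return (p1 - p2) / std_err


def database_fingerprint(dataset: List[Fingerprint],
                         reference: List[Fingerprint],
                         z_threshold: float = 1.645) -> Fingerprint:
    """Set a bit when it is significantly enriched in the dataset (one-sided, about 5%)."""
    p_data = bit_frequencies(dataset)
    p_ref = bit_frequencies(reference)
    n_data, n_ref = len(dataset), len(reference)
    return [1 if z_statistic(pd, n_data, pr, n_ref) > z_threshold else 0
            for pd, pr in zip(p_data, p_ref)]


# Toy 4-bit fingerprints (hypothetical): only bit 0 is enriched in the dataset.
dataset = [[1, 0, i % 2, 1] for i in range(20)]
reference = [[1 if i < 2 else 0, 0, i % 2, 1] for i in range(20)]

print(database_fingerprint(dataset, reference))  # [1, 0, 0, 0]
```

The resulting single fingerprint keeps only the features that distinguish the dataset from the reference, so two databases can be compared with the same similarity measures used for individual molecules.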

5.4.4 Natural Product-Likeness Assessment

Computational techniques can quantify the NP-likeness of compounds and distinguish NPs and NP-like substances from synthetic compounds with high accuracy. As a result, they are often used in the design of new compounds, the construction of libraries, the selection of NPs (and NP derivatives and analogs) from collections of mixed compounds, and the prioritization of compounds (Chen et al. 2022; Yu 2011).

The NP-Likeness Score is one of the best-known approaches (Ertl and Schuffenhauer 2008). It assesses the NP-likeness of compounds using Bayesian statistics, based mostly on how similar their fragments are to those of known NPs. With certain modifications, the NP-Likeness Score has been implemented in several programs and platforms (Jayaseelan and Steinbeck 2014; Vanii Jayaseelan et al. 2012; Sorokina and Steinbeck 2019). Other options are a rule-based strategy and a conceptually related method using extended connectivity fingerprints (ECFPs) (Zaid et al. 2010). A more recent method for identifying NPs and NP-like substances in large sets of molecules is NP-Scout (Chen et al. 2019b).
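The fragment-based idea can be illustrated with a toy log-ratio score. Each fragment receives a weight from its frequency in an NP set relative to a synthetic set, and a molecule is scored by averaging the weights of its fragments. This is a sketch of the general principle only, not the published scoring function: the fragment labels, the pseudocount smoothing, the averaging step, and the two tiny training sets are assumptions introduced here.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def fragment_weights(np_mols: List[Set[str]],
                     synthetic_mols: List[Set[str]],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Log ratio of smoothed fragment frequencies in NPs versus synthetic molecules."""
    np_counts = Counter(f for mol in np_mols for f in mol)
    sm_counts = Counter(f for mol in synthetic_mols for f in mol)
    n_np, n_sm = len(np_mols), len(synthetic_mols)
    weights = {}
    for fragment in set(np_counts) | set(sm_counts):
        p_np = (np_counts[fragment] + pseudocount) / (n_np + 2 * pseudocount)
        p_sm = (sm_counts[fragment] + pseudocount) / (n_sm + 2 * pseudocount)
        weights[fragment] = math.log(p_np / p_sm)
    return weights


def np_likeness(fragments: Set[str], weights: Dict[str, float]) -> float:
    """Mean weight of the known fragments; positive values indicate NP-like character."""
    known = [weights[f] for f in fragments if f in weights]
    return sum(known) / len(known) if known else 0.0


# Hypothetical training sets, each molecule reduced to a set of fragment labels.
np_set = [{"lactone", "pyran"}, {"lactone", "phenol"}, {"pyran", "phenol"}]
synthetic_set = [{"sulfonamide", "phenol"}, {"sulfonamide", "aryl_halide"},
                 {"aryl_halide", "phenol"}]

weights = fragment_weights(np_set, synthetic_set)
print(np_likeness({"lactone", "pyran"}, weights) > 0)            # True
print(np_likeness({"sulfonamide", "aryl_halide"}, weights) < 0)  # True
```

Fragments that occur equally often in both sets, such as the phenol label here, contribute a weight of zero and therefore do not shift the score in either direction.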

In order to properly characterize the structural properties of NPs, a novel method known as the Natural Compound Molecular Fingerprint (NC-MFP) has been developed (Seo et al. 2020).

5.4.5 Identification of Bioactive Natural Products

Computational approaches have proven effective in identifying bioactive NPs. The full spectrum of virtual screening methods has been used in NP research, from straightforward, fast methods based solely on 2D molecular fingerprint similarity to more sophisticated 3D methods based on molecular shape similarity, pharmacophore models, molecular interaction fields, or docking. Machine learning techniques have recently become a cornerstone of virtual screening for bioactive NPs (Kirchweger and Rollinger 2018).
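The simplest end of this spectrum, 2D fingerprint similarity, can be sketched in a few lines. Fingerprints are represented here as sets of on-bit indices and compared with the Tanimoto coefficient. The function names, the 0.5 threshold, and the toy fingerprints are assumptions for illustration; real fingerprints would be generated by a cheminformatics toolkit.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by the total number of distinct on-bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_screen(query: Set[int],
                      library: Dict[str, Set[int]],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to the query, keeping those above the threshold."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: the query is a known active, the library holds NP candidates.
query_fp = {1, 2, 3, 4}
np_library = {
    "np_1": {1, 2, 3, 4, 5},   # close analog
    "np_2": {1, 2, 8, 9},      # partial overlap
    "np_3": {10, 11, 12},      # unrelated
}

print(similarity_screen(query_fp, np_library))  # [('np_1', 0.8)]
```

Because the score depends only on bit overlap, the method is fast enough for very large libraries, but it cannot recognize actives that are structurally unrelated to the query.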

The sparseness of the structural information currently available makes it extremely difficult to dock NPs to the structures of macromolecules. This is because docking algorithms and scoring functions are particularly sensitive to even very small changes in 3D structure, such as those frequently induced by ligand binding (including solvent effects). However, the careful use of homology modeling, induced fit docking, and molecular dynamics simulations can help overcome this challenge. Docking against a variety of representative protein structures may be an effective strategy when dealing with highly flexible proteins, for binding mode prediction as well as virtual screening (Amaro et al. 2018; Grienke et al. 2010).

Docking algorithms frequently produce more accurate results in binding mode prediction than in virtual screening. A sufficiently accurate binding pose, offering crucial insights for optimization, can be generated if the NP of interest is not excessively large or flexible, the ligand binding site is well defined (i.e., not too shallow and not solvent-exposed), and the interaction between the binding partners involves two or more directed interactions (Chen and Kirchmair 2020). Binding pose prediction is more practical than virtual screening because it sidesteps the most difficult part of docking, namely scoring compounds according to their binding affinity, and lets researchers focus their efforts on a single ligand-target combination. Importantly, docking makes it possible to clarify the stereoselectivity of ligand binding (and of other processes, such as metabolism), which is especially relevant in NP research. The importance of using correct stereochemical data with 3D techniques, particularly docking, cannot be overstated (Warren et al. 2006).

5.4.6 Determination of Macromolecular Targets

In silico target prediction can be regarded as a large-scale application of virtual screening, in which one, a few, or even many compounds are tested against the broadest possible range of macromolecules (Grisoni et al. 2019). Numerous methods and models have been described in recent years, and they are now recognized as crucial resources in the early stages of drug development. Because of the difficulties associated with docking and structure-based approaches in general (specifically, the limited representation of macromolecules in the available structural data), the majority of target prediction algorithms are ligand-based (Cereto-Massagué et al. 2015; Ezzat et al. 2019; Sam and Athri 2019; Chaudhari et al. 2017).

Ligand-based approaches span the whole spectrum, from simple similarity-based methods to sophisticated machine learning and network-based methods. Surprisingly, despite the wide variety of computational approaches available today for target prediction, the value of these methods in practical situations is still poorly understood. This is partly due to the (generally) high costs of experimentally evaluating such models in a systematic, prospective manner, and partly due to the common use of inadequate, cursory retrospective validation techniques (Mathai et al. 2020; Mathai and Kirchmair 2020). To the best of our knowledge, the Similarity Ensemble Approach (SEA) is the only computational strategy for which consistent experimental validation has been documented (Keiser et al. 2007; Keiser et al. 2009; Lounkine et al. 2012).

A recent study compared the performance and scope of a similarity-based approach and a machine learning approach for predicting the targets of small molecules. It found that the structural similarity between the compounds of interest and the compounds represented in the training set (or knowledge base) is a key determinant of the predictive power of both methods. This is important to bear in mind when working with NPs, given that target prediction models are primarily built for and trained on experimental measurements for synthetic compounds (Mathai et al. 2020).

Surprisingly, in the same study, the similarity-based approach outperformed the machine learning approach on the data at hand. The results imply that the basic similarity-based strategy is a realistic choice, in particular when model interpretability is taken into consideration. This is also demonstrated by the successful operation of several well-known similarity-based approaches such as SwissTargetPrediction (Gfeller et al. 2014). However, a direct comparison between the two approaches should be treated with extreme caution for a number of reasons.
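A basic similarity-based target prediction can be sketched as a nearest-neighbor lookup in a knowledge base of ligands with known targets. The example below is a generic illustration and does not reproduce SwissTargetPrediction, SEA, or the approach evaluated in the study above; the function names, target labels, and fingerprints are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of fingerprint on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query: Set[int],
                    knowledge_base: List[Tuple[Set[int], str]]) -> List[Tuple[str, float]]:
    """Score each target by its most similar known ligand and rank targets by that score."""
    best: Dict[str, float] = {}
    for ligand_fp, target in knowledge_base:
        score = tanimoto(query, ligand_fp)
        if score > best.get(target, 0.0):
            best[target] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical knowledge base of (ligand fingerprint, annotated target) pairs.
knowledge_base = [
    ({1, 2, 3, 4}, "target_A"),
    ({1, 2, 9}, "target_A"),
    ({7, 8, 9}, "target_B"),
    ({20, 21}, "target_C"),
]

ranked = predict_targets({1, 2, 3, 9}, knowledge_base)
print(ranked[0])  # ('target_A', 0.75)
```

The score attached to each prediction is the similarity to the nearest known ligand, which makes the result easy to interpret. A low top score signals that the query, for example an NP unlike the mostly synthetic ligands in the knowledge base, lies outside the region where predictions are reliable.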

In addition to 3D similarity-based methods, 3D pharmacophore-based approaches are extensively used in NP research for target prediction. In one profiling investigation, for example, secondary metabolites extracted from the medicinal plant Ruta graveolens were screened against a battery of over 2000 pharmacophore models covering over 280 targets (Rollinger et al. 2009). As a result of this in silico search, arborinine was discovered to be an inhibitor of acetylcholinesterase (estimated IC50 = 35 μM), among other potential bioactive NPs and interactions.

Machine learning-based methods have undoubtedly sparked the most interest in NP target prediction in recent years. SPiDER, TIGER, and STarFish are a few notable examples (Reker et al. 2014b; Schneider and Schneider 2017a; Cockroft et al. 2019).

SPiDER combines self-organizing maps with “fuzzy” molecular descriptors, which makes it applicable to NPs (Rodrigues et al. 2016b; Merk et al. 2018). The model helped identify targets of the macrolide archazolid A (Reker et al. 2014a), including PPAR, 5-lipoxygenase, FXR, the glucocorticoid receptor, and prostaglandin E2 synthase 1. It also successfully predicted the target of the 16-membered depsipeptide doliculide, which is prostanoid receptor 3 (Schneider et al. 2016). SPiDER also successfully predicted targets of several fragment-like NPs, including (i) sparteine, whose targets include the nicotinic and muscarinic receptors, p38 mitogen-activated protein kinase, and the kappa opioid receptor (Rodrigues et al. 2016a), (ii) DL-goitrin, whose targets include the muscarinic M1 receptor and the pregnane X receptor, (iii) isomacroin, whose targets were experimentally verified to be the adenosine A3 receptor and the platelet-derived growth factor receptor, and (iv) graveolinine, whose targets were experimentally confirmed to be cyclooxygenase-2 and the serotonin 5-HT2B receptor (Rodrigues et al. 2015).

TIGER follows a conceptual framework similar to that of SPiDER, but it uses modified CATS descriptors and a new scheme for scoring the predicted targets (taking ensemble similarity into account). TIGER successfully identified the cholecystokinin receptor, the orexin receptor, and the glucocorticoid receptor as targets of the marine NP (±)-marinopyrrole A (Schneider and Schneider 2017a). The model also correctly identified, among other proteins, the estrogen receptors α and β as targets of the stilbenoid resveratrol (Schneider and Schneider 2017b).

A stacked ensemble target prediction approach called STarFish was developed using synthetic chemical data (Cockroft et al. 2019).

Most recently, multitask deep neural networks were trained on medical indication data and used to identify privileged chemical scaffolds in NPs (here, a privileged scaffold is one for which many NPs sharing that scaffold are active in the same indication). Based on the predictions of these models, a privileged scaffold dataset was created for 100 indications, which may serve as a starting point for NP-based drug development (Lai et al. 2020).

5.4.7 Prediction of ADME and Safety Profiles of NPs

ADME and safety profiling play a major role in drug discovery. ADME failures account for around 40% of all drug failures (Bhhatarai et al. 2019). In silico ADME techniques have seen significant progress, as summarized in Table 5.2. Although pharmacokinetic (PK) failures have decreased as a result of preclinical ADME investigations, drug toxicity remains a major concern. Such failures at late stages of the drug discovery pipeline cause a huge loss of time and capital, and in silico models provide early predictions that can guide optimization. Another concern is drug–drug interactions (DDIs), which can result in toxicity and severe adverse drug reactions (ADRs) and complicate the whole process. Established and broadly applicable computational filters are best suited for screening, synthesizing, and optimizing drug candidates (Ekins et al. 2000). The earliest ADME models were developed in the 1960s using Hansch’s conventional QSAR methods. Comparative molecular field analysis (CoMFA), a 3D-QSAR technique, was developed later, and three-dimensional visualization became an important direction for QSAR.

Table 5.2 Progress in in silico ADME (Bhhatarai et al. 2019)

The ADME and safety properties that can be evaluated by computational approaches include solubility, permeability, clearance, metabolic stability, drug–drug interactions, blood–brain barrier penetration, and cardiotoxicity.

Software available for predicting ADME properties includes MolCode toolbox, PreADMET, Discovery Studio, VolSurf+, QikProp, ADMEWORKS Predictor, Chembench, and admetSAR (Shin et al. 2017).

The major challenges in the ADME profiling of NPs relate to off-targets such as the hERG channel, cytochrome P450 enzymes (implicated in drug–drug interactions and toxicity), and P-glycoprotein (implicated in drug resistance). A plethora of models based on statistical, machine learning, and pharmacophore approaches address these and many other off-targets. Another major concern is that most computational models are validated on compounds of synthetic origin. Nevertheless, models such as FAME 3 have been reported to be effective even though the majority of compounds in their training sets are of synthetic origin (Šícho et al. 2019).

5.4.8 Case Study

Five tropical plants (M. charantia, B. javanica, E. longifolia, T. divaricata, and G. mangostana) have been shown to exhibit inhibitory activity against H5N1 neuraminidase. For the bioassays, different plant parts (leaves, roots, and fruits) were extracted, chromatographed, and fractionated. The anti-H5N1 neuraminidase activity of the plant extracts and fractions ranged from excellent to moderate. At 250 µg/ml, G. mangostana showed the highest inhibition (82.95%). Pure compounds were then isolated from the five plants. Rubraxanthone, mangostin, and garcinone C stood out, with IC50 values ranging from 89.71 to 95.49 µM (Ikram et al. 2015). This process is depicted below (Fig. 5.2), and the docking results for garcinone C are shown in Fig. 5.3.

Fig. 5.2
The schematic representation exhibits the chemical structure of Oseltamivir and Garcinone C along with the strategies of novel neuraminidase.

Strategies for the discovery of novel neuraminidase inhibitors from natural products (Ikram et al. 2015)

Fig. 5.3
4 illustrations exhibit the structure of crystallographic oseltamivir poses, hydrogen bonds of Garcinone C in the active site of neuraminidase, cation pi interactions between R 371, R 292, R 152, and crystallographic pose of oseltamivir.

(a) Superimposition of the docked and crystallographic oseltamivir poses (green and blue, respectively). The RMSD was 0.84 Å. (b) Predicted hydrogen bonds of garcinone C in the active site of neuraminidase. (c) Predicted cation-π interactions between R371, R292, R152, and the xanthone moiety of garcinone C in the active site of neuraminidase. (d) The crystallographic pose of oseltamivir, a potent inhibitor, shown for reference (PDB ID: 2HU4) (Ikram et al. 2015)
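The RMSD quoted in panel (a) measures how closely a redocked pose reproduces the crystallographic pose. As a minimal sketch, assuming both poses list the same atoms in the same order and share one coordinate frame (so no superposition step is applied), it is the square root of the mean squared distance between matched atoms. The coordinates below are made-up toy values, not the oseltamivir coordinates from 2HU4, and the function ignores refinements such as symmetry correction.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in angstroms


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    if not pose_a:
        raise ValueError("poses must contain at least one atom")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(pose_a))


# Hypothetical three-atom poses: the docked pose is shifted by 1 Å along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

toy_rmsd = rmsd(docked_pose, crystal_pose)
print(f"RMSD = {toy_rmsd:.2f} Å")
```

In redocking experiments, an RMSD below roughly 2 Å is commonly taken to indicate that the docking protocol reproduces the experimental binding mode.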

5.5 Challenges to Computational Approaches

A major challenge for NP-based drug discovery is the management and representation of data. ArachnoServer and ConoServer are rich and highly descriptive NP databases, but they are restricted to specific clades of toxin-producing species (Kaas et al. 2012). A partial solution is the Tox-Prot manual annotation program within UniProtKB/Swiss-Prot, which provides a more general and improved representation of NP data (Jungo et al. 2012). However, this is not a complete solution. Another concern is the fragmentation of databases: the data are scattered across many resources maintained by organizations of varying size. A further difficulty is the shortage of funding needed to maintain these databases, which leads to mismanagement of data and, ultimately, to the databases ceasing to function. Examples of such databases are listed in Table 5.3.

Table 5.3 List of databases discontinued in 2019

For a clearer view, functional and defunct databases are compared in Fig. 5.4.

A fundamental obstacle to the experimental screening of NPs is their propensity to interfere with biological assays. Quercetin, for example, has reportedly been found active in more than 800 unique bioassays. The most common interference mechanisms are aggregate formation, covalent binding, membrane disruption, metal chelation, interference with assay spectroscopy, and decomposition in buffers (Baell and Holloway 2010). Such compounds can be flagged using a statistically derived set of rules known as the pan-assay interference compounds (PAINS) rule set (Baell and Nissink 2018).

Fig. 5.4
A chart of nonfunctional databases contains 57000 C N P D, 6000 M N P D, 1000 A T B D, 60000 antimarin, 1393 T M D B, and 212100 U N P D. For functional databases, 18248 3 D MET, 420 Biofaquim, 85198 Zinc, 20035 NP Atlas and 1200 PhytoHub.

Pictorial representation of functional and defunct databases

5.6 Conclusion and Future Perspectives

Between the 1980s and the 2010s, two-thirds of the medications either featured NP pharmacophores (35%) or were analogs of NPs (5%). The modern computational techniques discussed above can significantly expedite NP-based drug development and reduce its risk. The integration of computational approaches with cheminformatics and other informatics methods has eased the management, storage, and representation of vast NP databases. Computational tools assist in the structural elucidation of bioactive NPs and in the early prediction of their various properties, as discussed above, which streamlines the drug discovery pipeline. However, major challenges remain: the availability of descriptive databases, the fragmentation and maintenance of databases, and the physical availability of the particular NP. These challenges have been partially resolved by the introduction of databases like COlleCtion of Open Natural prodUcTs (COCONUT), which provides a web interface to browse and download elucidated and predicted NPs collected from open sources. More broadly, machine learning (ML) is increasingly being applied in drug discovery. For instance, clustering techniques have supported de novo molecular design, the prediction of protein target druggability, and the segmentation of cell-type images. Computational approaches therefore hold great promise for NP-based drug discovery. Combining computational methods with advanced analytical technologies can further improve the drug discovery pipeline for NPs. Advances in higher-field NMR instruments and probe technology have made it possible to determine the structure of NPs from extremely small amounts, reducing the wastage of hard-to-obtain material. Pauli and associates suggested conducting early, relatively sophisticated purity analyses on lead NPs using quantitative NMR and LC–MS to avoid futile downstream efforts.
Further advances in metabolomics, genome mining, and microbial culturing techniques have added to the future scope of the NP-based drug pipeline. In addition, antivirulence strategies may represent an alternative method for combating infections, for which NPs that target bacterial quorum sensing may be of interest. In 2019, Insilico Medicine, an American company, created an AI system called GENTRL (Generative Tensorial Reinforcement Learning) that, in just 46 days, successfully generated six inhibitors of discoidin domain receptor 1, a kinase linked to lung fibrosis. Cheminformatics, bioinformatics, and other related fields have made significant contributions to NP-based drug discovery over the years, and reviews of their successful applications and limitations have recently been published.