
12.1 Introduction

Continuous developments in bioinformatics have paved the way for applications in disease genetics, drug target exploration, and drug design and discovery (Xia 2017). Because drug discovery is complex and costly, in-silico approaches are increasingly used alongside experimental techniques to streamline drug development. The pipeline, which runs from target discovery or identification through the design and synthesis of drugs that modify pathological processes, remains expensive and time-consuming (Pereira et al. 2020).

Bioinformatics has been a mainstay of biological research for some time. It is the discipline concerned with storing, accessing and analyzing biological data using computational tools and algorithms (Jawdat 2006). Major contributors to its development include whole-genome sequencing and progress in proteomics, genomics and transcriptomics. These fields generate high-quality data that are deposited in numerous databases and made accessible for further work (Jin et al. 2021). Deep learning has attracted considerable attention for its strong performance in machine learning tasks such as structure prediction, biological sequence analysis, protein interaction prediction, diagnosis and image processing, as well as the prediction of biological properties and features (Li et al. 2019a). Metabolomics data have enabled researchers to study biological processes and to characterize the pathways and factors that are essential to physiological responses. The omics fields have influenced not only the medical sciences but also other areas; for example, they provide considerable input for identifying the factors that govern plant growth and its molecular processes. All of this depends on the availability of data, which bioinformatics makes possible (Ambrosino et al. 2020). The omics era has expanded systems biology by providing insights at every level needed to understand biological systems, including genes, transcripts, proteins and metabolites. These data require complex processing and computation, which bioinformatic approaches and tools can provide (Waseem et al. 2020). Translational bioinformatics has gained considerable interest for advancing precision medicine: it focuses on patient-specific needs by studying pharmacogenomics with the aid of artificial intelligence techniques (Ritchie et al. 2019).
The future of bioinformatics in the health sciences is quite promising, as the field aims to develop modern approaches that can be applied directly in clinical practice.

12.2 High-Throughput Drug Screening

Drug discovery involves detecting or identifying drug candidates that act on a biological target. High-throughput screening (HTS) makes it possible to screen a large library of drug candidates against a selected target in multi-well plates, ultimately leading to the discovery of novel lead compounds. A major advantage of HTS is that inactive compounds are eliminated early, before pre-clinical or clinical testing, which saves the cost of analyzing them further. With advances in computational chemistry and genetics, many new druggable targets have emerged and libraries of synthetic and semi-synthetic compounds have grown exponentially (Kainkaryam and Woolf 2009; Hsu et al. 2021). HTS examines the effect of a compound library on different targets using a one-compound-per-well format: the library is screened in vitro against a specific target, and inhibitors or stimulators are identified for further processing. However, the output of drug discovery is still limited by unexpected and undesirable pharmacological and toxicological profiles of screened compounds that emerge in clinical trials (Scannell et al. 2012). The pharmacokinetic profile of screened compounds is a further limitation: it is often identified only at a later stage, with a considerable loss of time and resources. Cell-based assays are relatively slow and expensive compared with biochemical HTS methods, but they provide data on toxicity as well as kinetic properties. They are therefore used in the early stages of drug development to screen compound libraries for desirable characteristics.
In cell-based HTS, quality control and automation must be carefully managed to optimize the outcome and support the development of a new chemical entity (NCE) (Schaduangrat et al. 2020).
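Plate-level quality control in HTS is commonly summarized with the Z'-factor, which compares the separation between positive and negative control wells with their variability: Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|. The sketch below is a minimal illustration, assuming each plate carries replicate control wells; the function name `z_prime` and the signal values are hypothetical and not taken from the cited studies.

```python
from statistics import mean, stdev

def z_prime(positive_controls, negative_controls):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = stdev(positive_controls) + stdev(negative_controls)
    window = abs(mean(positive_controls) - mean(negative_controls))
    if window == 0:
        raise ValueError("controls do not separate; Z' is undefined")
    return 1 - 3 * spread / window

# Hypothetical raw signals from the control wells of one plate
positive = [100.0, 102.0, 98.0, 101.0, 99.0]  # e.g. reference active compound
negative = [10.0, 12.0, 8.0, 11.0, 9.0]       # e.g. vehicle-only wells

score = z_prime(positive, negative)
print(f"Z' = {score:.2f}")
```

With these hypothetical controls Z' is about 0.89. Values between 0.5 and 1 are conventionally read as an assay window wide enough for reliable hit calling, so a plate scoring below that would be flagged before its compound wells are interpreted.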

Traditional HTS uses a one-compound-per-well format in which substantial chemical resources are spent searching for active compounds, because a library typically contains only a few hits among thousands of compounds (Volochnyuk et al. 2019). Inactive compounds make up the vast majority and consume a great deal of time and resources even with automation. Moreover, the data sometimes contain false positives and false negatives, leading to miscalculations that compromise the overall drug development process. Miniaturization overcomes some of these limitations: it uses small amounts of test reagents and compounds and is also time-efficient (Wilson et al. 2020; Wölcke and Ullmann 2001). Another strategy for boosting HTS efficiency is pooling, in which compounds are first screened as mixtures and a secondary screen is then performed on the individual compounds from pools that tested positive (Kainkaryam and Woolf 2009). Pooling thus optimizes resources and reduces error and cost. Its limitations, however, lie in choosing, developing and implementing a suitable pooling design. Combinatorial pooling has also found applications in diagnostics; for example, one recent study applied it to high-throughput SARS-CoV-2 testing to detect asymptomatic carriers of the virus, who pose a significant threat of spreading the disease (Shental et al. 2020).
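The pooling idea can be made concrete with a classic two-stage (Dorfman-style) design, used here as a simple stand-in for the more elaborate designs discussed in the literature. This is a minimal sketch under idealized assumptions: a pooled well reads positive whenever any member is active, and there are no false positives, false negatives or compound interference. The function name and compound identifiers are hypothetical.

```python
from typing import Callable, List, Tuple

def two_stage_pooled_screen(
    compounds: List[str],
    is_active: Callable[[str], bool],
    pool_size: int,
) -> Tuple[List[str], int]:
    """Idealized two-stage pooled screen; returns (hits, number_of_assays)."""
    hits: List[str] = []
    assays = 0
    for start in range(0, len(compounds), pool_size):
        pool = compounds[start:start + pool_size]
        assays += 1  # primary screen: one well holds the whole mixture
        if any(is_active(c) for c in pool):  # pool reads positive if any member is active
            for compound in pool:  # secondary screen: deconvolute the positive pool
                assays += 1
                if is_active(compound):
                    hits.append(compound)
    return hits, assays

# Hypothetical library of 100 compounds containing two actives
library = [f"C{i:03d}" for i in range(100)]
true_actives = {"C007", "C042"}

hits, n_assays = two_stage_pooled_screen(library, lambda c: c in true_actives, pool_size=10)
print(hits, n_assays)  # ['C007', 'C042'] 30, versus 100 assays one-per-well
```

The savings depend on the hit rate: when actives are common, most pools test positive and the retests erase the gain, which is one reason the choice of pooling design matters.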

12.3 Bioinformatics in High-Throughput Drug Screening

With the advent of bioinformatic tools that complement the drug discovery process, many of the limitations and problems encountered in HTS in the late 1990s have been overcome. Bioinformatics has emerged as a multidisciplinary field and a vital part of drug discovery and development, contributing to the screening of compound libraries, identification of biological targets, proteomics, genomics, and the biological, chemical and virtual screening of compounds. Together with cheminformatics, it has resolved various problems in drug development and screening (Parikh et al. 2023). Bioinformatic techniques are employed to identify novel drug targets, model target proteins, design druggable compounds, determine their interactions with the target, and predict physicochemical properties and toxicology profiles. Machine learning techniques and algorithms are being developed to support these processes (Chavda et al. 2021).

Various data mining tools are used to analyze large datasets in order to identify potential targets as well as compounds that bind to those targets and elicit a response. These tools also help establish the effectiveness of a drug candidate and characterize its binding interactions (Yang et al. 2012; Patel et al. 2020).

Molecular modelling tools generate models of target proteins and biological systems using a range of techniques. These models allow the target structure to be analyzed virtually and the binding site, the binding interactions and the functional groups required for binding to be predicted (Haghighatlari and Hachmann 2019).

Virtual high-throughput screening tools were developed to overcome the limitations of traditional HTS, which consumes large amounts of reagents and other resources. When HTS assays are complex and tedious, virtual screening approaches are used to complement them: compound libraries are screened in silico to determine their binding interactions with biological targets using molecular modelling tools (Mcintosh-Smith et al. 2015). Such techniques let scientists screen large libraries efficiently without consuming laboratory resources. However, these tools rely on complex algorithms, require considerable expertise to operate, and are not error-free. A major advantage of virtual HTS is that it is cheaper and faster than traditional experimentation, which uses large volumes of reagents to find a potential active agent among millions of compounds (Mohammad et al. 2021). Structure-based and ligand-based virtual screening are two strategies for hit-to-lead discovery and optimization (Fig. 12.1). These approaches are purely computational, whereas HTS is purely experimental, but both aim to generate a lead compound for successful drug discovery. Combining them can help deliver successful drug candidates without wasting resources or incurring the added cost of analyzing thousands of compounds in HTS experiments (Zhang et al. 2022).

Fig. 12.1
Virtual high-throughput screening strategies for lead identification (Stumpfe and Bajorath 2020; Guterres and Im 2020; Da Silva Rocha et al. 2019). Flow diagram: ligand-based and structure-based high-throughput virtual screening, followed by pre-processing (drug-likeness and lead-likeness filtering) and hit-to-lead optimization.
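The drug-likeness pre-processing step in Fig. 12.1 is often implemented as a rule-based filter. As an illustration, the sketch below applies Lipinski's rule of five to precomputed descriptors, allowing at most one violation as in Lipinski's original formulation. The `Compound` record and its descriptor values are hypothetical; in practice the descriptors would be computed by a cheminformatics toolkit, and lead-likeness filters use stricter cut-offs.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Compound:
    name: str
    mol_weight: float   # molecular weight in Da
    logp: float         # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four criteria a compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def drug_like(library: List[Compound], max_violations: int = 1) -> List[Compound]:
    """Keep compounds with no more than `max_violations` rule-of-five violations."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]

# Hypothetical pre-computed descriptors for three library members
library = [
    Compound("cmpd_A", 350.4, 2.1, 2, 5),   # no violations
    Compound("cmpd_B", 620.7, 6.3, 1, 8),   # molecular weight and logP both too high
    Compound("cmpd_C", 480.5, 5.4, 3, 7),   # logP slightly too high
]
print([c.name for c in drug_like(library)])  # ['cmpd_A', 'cmpd_C']
```

Filtering this cheaply before docking or similarity searching is what keeps the later, more expensive virtual screening steps focused on a smaller library.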

Molecular docking is one virtual screening method. It models the interaction between a drug molecule and its biological target and then analyzes the binding energies and the interacting amino acid residues of the binding pocket. The placement of the drug within its target is modelled on the induced fit theory, and the resulting binding mode can help reveal the drug's mechanism of action (Lin et al. 2020). Pharmacophore modelling is another method: a basic structural model of the features a drug candidate needs for activity is built, lead compounds are generated from it, and databases are then screened. The features of a pharmacophore are derived from its complementary target, and by adjusting them, compounds with the desired activity can be designed. A further approach is to screen small molecules by chemical similarity searching in databases such as ZINC (Seidel et al. 2017; Lin et al. 2020).
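Chemical similarity searching usually compares binary structural fingerprints with the Tanimoto coefficient, |A ∩ B| / |A ∪ B|. The minimal sketch below represents each fingerprint as a set of "on" bit positions; the bit patterns, compound names and the 0.5 cut-off are hypothetical, and real fingerprints would be generated by a cheminformatics toolkit rather than written by hand.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_search(
    query: Set[int], database: Dict[str, Set[int]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return database entries at or above `threshold`, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprints (bit positions that are set)
query_fp = {1, 4, 7, 9, 12}
database = {
    "cmpd_A": {1, 4, 7, 9, 15},   # shares 4 of 6 distinct bits -> 0.67
    "cmpd_B": {2, 3, 5},          # no shared bits -> 0.0
    "cmpd_C": {1, 4, 7, 9, 12},   # identical -> 1.0
}
for name, score in similarity_search(query_fp, database):
    print(name, round(score, 2))
```

The threshold trades recall for precision: lowering it returns more structurally diverse candidates, but also more compounds whose similarity to the query is unlikely to translate into similar activity.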

Quantitative structure-activity relationship (QSAR) modelling correlates quantifiable properties of compounds with their biological activity, based on experimental data. Quantitative descriptors such as lipid solubility, permeability, electronic features, molecular size and shape, and ADME properties are compared to identify active agents against a target of interest. 3D QSAR modelling is still used in the pharmaceutical industry because it can provide accurate predictions with relatively little computation (Vucicevic et al. 2019). In-silico screening methods have been used to develop drugs for a wide range of diseases, including tuberculosis (Macalino et al. 2020), cardiovascular diseases (CVDs) (Savoji et al. 2019), COVID-19 (Gupta et al. 2023), hepatitis (Hdoufane et al. 2022), diabetes (Akhtar et al. 2019), neurodegenerative diseases (Aldewachi et al. 2021) and cancer (Vougas et al. 2019).
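In its simplest form, a QSAR model is a regression of measured activity on one or more descriptors. The sketch below fits a one-descriptor linear model by ordinary least squares, assuming hypothetical lipophilicity (logP) values and activities expressed as pIC50; practical QSAR work uses many descriptors, external validation sets and often more flexible models.

```python
from statistics import mean
from typing import List, Tuple

def fit_linear_qsar(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(x: List[float], y: List[float], slope: float, intercept: float) -> float:
    """Coefficient of determination of the fitted line on the training data."""
    y_bar = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical training set: descriptor (logP) and measured activity (pIC50)
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.0, 5.5, 6.1, 6.4]

slope, intercept = fit_linear_qsar(logp, pic50)
print(f"pIC50 = {slope:.2f} * logP + {intercept:.2f}")  # pIC50 = 0.48 * logP + 4.55
print(f"R^2 = {r_squared(logp, pic50, slope, intercept):.3f}")
print(f"predicted pIC50 at logP 2.5: {slope * 2.5 + intercept:.2f}")
```

A high training R^2 on four points says little about predictive power; this is why validation on compounds not used for fitting is central to QSAR practice.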

12.4 Applications of Bioinformatics in High-Throughput Drug Screening

Omics technologies, including proteomics, genomics, metabolomics and transcriptomics, have been a turning point in the health sciences by providing data on biological systems. The first step in HTS is target identification, and many drug targets have been discovered with the help of bioinformatic approaches (Martis et al. 2011). Data mining approaches include high-throughput chemogenomics and proteomics. A wide range of data mining resources provide the information needed to identify a biological target, including protein and literature databases (UniProt, PubMed, InterPro), text mining tools (GeneWays, Textpresso, BioRat), microarray databases (SMD, Oncomine, caArray), clustering platforms (GenePattern, ArrayMiner, Genecluster), supervised analysis platforms (SAM), and interactome and pathway databases (KEGG, PathwayExplorer, Pathguide) (Yang et al. 2012; Agamah et al. 2020). One study used various bioinformatic tools and databases, including the OralInt algorithm, to build a human-virus interactome for Zika virus, highlighting several potential druggable targets against the virus (Fig. 12.2) (Esteves et al. 2017).

Fig. 12.2
Databases and bioinformatic sources in high-throughput drug screening (Yang et al. 2012). Chart of sources: structural and text databases, molecular docking, clustering platforms, and interactome and pathway resources.
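A common first pass over a host-virus interactome is to rank host proteins by how many viral proteins they contact, on the reasoning that heavily targeted host factors may be attractive drug targets. The sketch below computes this degree-based ranking. It is a generic network heuristic, not the OralInt algorithm used in the Zika study, and the protein identifiers are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def rank_host_targets(interactions: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank host proteins by the number of distinct viral partners (degree)."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for viral_protein, host_protein in interactions:
        partners[host_protein].add(viral_protein)  # sets ignore duplicate edges
    ranked = sorted(partners.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(host, len(viral)) for host, viral in ranked]

# Hypothetical (viral protein, host protein) interaction pairs
edges = [
    ("VIRAL_1", "HOST_A"), ("VIRAL_2", "HOST_A"), ("VIRAL_3", "HOST_A"),
    ("VIRAL_1", "HOST_B"), ("VIRAL_2", "HOST_B"),
    ("VIRAL_3", "HOST_C"),
    ("VIRAL_1", "HOST_A"),  # duplicate report of an existing interaction
]
print(rank_host_targets(edges))  # [('HOST_A', 3), ('HOST_B', 2), ('HOST_C', 1)]
```

Degree alone ignores essentiality, druggability and toxicity, so such a ranking is only a starting point for cross-referencing with pathway and structural databases.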

Assay development is crucial to the success of the screening process. Assay specificity and sensitivity underpin the whole experiment, and bioinformatic techniques have been used to develop highly sensitive screening assays. Virtual screening assays are developed as a complement to HTS and can be regarded as basic simulations of HTS assays built on biophysics and computer science. These simulations are also used to optimize assay conditions: parameters such as temperature, reagents and incubation time can be adjusted in the model to obtain a highly sensitive assay. MolMind is one tool that combines laboratory-based assays with in-silico methods (Szymański et al. 2012). In-silico toxicological analysis serves as a preliminary assay that filters out potentially toxic compounds during virtual screening. ADMET (absorption, distribution, metabolism, excretion and toxicity) methods are used together with computational toxicology tools (i-drug discovery, ToxScope, OncoLogic, MetaDrug, HazardExpert and e-TOX) to determine the toxicological profiles of drug candidates (Szymański et al. 2012). In one study, imaging and fluorescence-based methods were combined to create a high-throughput drug screening assay using 3D organoids to assess organoid growth and drug effects (Li et al. 2022).

Data mining approaches and microarray techniques are also used in HTS. One study used microarray analysis across several databases to identify microRNAs and genes as biomarkers for the diagnosis and treatment of atrial fibrillation (Li et al. 2019b). Studying biological pathways with bioinformatic tools has made it possible to identify disease biomarkers and drug targets. Another study used several bioinformatics tools and databases to show that the expression of multiple RNAs is involved in the regulation and progression of preeclampsia, and reported that activation of the JAK-STAT signaling pathway is associated with its progression (Liu et al. 2019).
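Microarray-based biomarker searches of this kind typically begin by comparing expression between patient and control samples. The sketch below ranks genes by a log2 fold-change filter and Welch's t statistic, assuming positive, linear-scale intensities and at least two replicates per group. The gene names and values are hypothetical, and a real analysis would add normalization, p-values and multiple-testing correction (for example with a dedicated package such as limma).

```python
from math import log2, sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple

def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_candidate_biomarkers(
    case: Dict[str, List[float]],
    control: Dict[str, List[float]],
    min_abs_log2fc: float = 1.0,
) -> List[Tuple[str, float, float]]:
    """Return (gene, log2 fold change, t) for genes passing the fold-change filter."""
    results = []
    for gene, case_values in case.items():
        control_values = control[gene]
        lfc = log2(mean(case_values) / mean(control_values))
        if abs(lfc) >= min_abs_log2fc:
            results.append((gene, lfc, welch_t(case_values, control_values)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)

# Hypothetical linear-scale intensities, three samples per group
case = {
    "GENE_A": [8.0, 8.4, 7.6],
    "GENE_B": [4.0, 4.2, 3.8],
    "GENE_C": [1.0, 1.5, 0.5],
}
control = {
    "GENE_A": [2.0, 2.2, 1.8],
    "GENE_B": [4.1, 3.9, 4.0],
    "GENE_C": [4.0, 5.0, 3.0],
}
for gene, lfc, t in rank_candidate_biomarkers(case, control):
    print(f"{gene}: log2FC = {lfc:+.2f}, t = {t:+.2f}")
```

Here GENE_B is dropped by the fold-change filter, while GENE_A (up-regulated) and GENE_C (down-regulated) are kept as candidates for follow-up in pathway databases.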

12.5 Challenges and Limitations of Bioinformatics in High-Throughput Drug Screening

Virtual HTS is an efficient, robust and cost-effective technique for screening biologically active molecules from large datasets, but it does not replace traditional HTS methods; it complements them by narrowing down the possible hits and leads. Although computational analysis can screen a library of thousands of compounds in a day, it faces challenges arising from data complexity and sometimes produces erroneous results. The main goal, however, remains the generation of efficient leads for subsequent optimization in drug development pipelines. The complexity of the data generated by computational analysis also makes effective interpretation difficult and requires highly skilled analysts. Machine learning techniques such as decision tree models and artificial neural networks have been developed to handle this complexity (Han et al. 2008; Butkiewicz et al. 2012). The computations and equations behind QSAR models are highly complex, require careful analysis and validation, and may sometimes be impractical (Spiegel and Senderowitz 2020). The computational complexity of many multilayered techniques is a further hindrance. Method development requires data validation, and results are sometimes not reproducible, which raises questions about their validity (Stumpfe and Bajorath 2020). A major challenge highlighted by multiple researchers in the field is the accuracy of the data obtained from virtual screening. Even when computational analysis yields significant evidence of activity, it is sometimes impractical to translate the outcome to human patients.
In structure-based drug screening, the predicted binding energies of actives and inactives are often close, which limits accuracy, and putative interactions are sometimes generated for inactive compounds. As with HTS, this yields ineffective lead compounds that are identified only at a later stage of testing (Jasial et al. 2016).

A current debate concerns the ligand promiscuity of biological targets, which may point to inaccuracies in the binding interactions generated by virtual screening. In addition, when prior knowledge of drug-target binding interactions is used, virtual screening methods face a bias in the selection of the screening library, leading to high hit rates that can be mistaken for prediction accuracy (Stumpfe and Bajorath 2020).
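One way to check whether a virtual screen genuinely separates actives from inactives, rather than merely reporting a high hit rate, is to run it on a benchmark set with known actives and compute an enrichment factor: the hit rate among the top-ranked fraction divided by the hit rate expected from random selection. The minimal sketch below assumes a hypothetical ranked list and active set. Because of the library-selection bias described above, an enrichment factor is only as informative as the composition of the benchmark.

```python
from typing import List, Set

def enrichment_factor(ranked_ids: List[str], actives: Set[str], fraction: float) -> float:
    """EF at `fraction`: (actives in top subset / subset size) / (all actives / library size)."""
    n_total = len(ranked_ids)
    n_selected = max(1, round(n_total * fraction))
    hits_selected = sum(1 for cid in ranked_ids[:n_selected] if cid in actives)
    hits_total = sum(1 for cid in ranked_ids if cid in actives)
    if hits_total == 0:
        raise ValueError("benchmark contains no known actives")
    # Integer cross-multiplication keeps the ratio exact before the final division
    return (hits_selected * n_total) / (n_selected * hits_total)

# Hypothetical benchmark: 100 compounds ranked by a docking score, 5 known actives
ranking = [f"M{i:03d}" for i in range(100)]
known_actives = {"M000", "M003", "M008", "M050", "M090"}

print(enrichment_factor(ranking, known_actives, 0.10))  # 6.0: 3 of 5 actives in the top 10
```

An enrichment factor of 1 means the ranking is no better than random; at a fraction of 1.0 it is 1 by construction, which is why it is reported for small top fractions such as 1% or 10%.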

12.6 Future of Bioinformatics in High-Throughput Drug Screening

Bioinformatics has become an indispensable part of drug discovery and screening. Traditional high-throughput screening requires testing large libraries of up to millions of compounds, most of which are inactive, which wastes resources, time and money. Bioinformatic tools and techniques make it possible to shrink the chemical library before high-throughput screening assays by ruling out likely inactive agents in in-silico (virtual) screening steps, so that only compounds that show promising results in virtual screening are selected (Stumpfe et al. 2012; Stumpfe and Bajorath 2020). These techniques therefore save cost, resources and time by providing specific and reasonably accurate predictions. The outlook for virtual screening is promising, and the approach is expected to progress further, mainly because of its screening efficiency and its capacity to handle enormous amounts of data. Nevertheless, its remaining problems must be addressed to sustain an integrative approach to drug screening. The foremost problem is the generation of inaccurate binding energies and similarity hits, which require rigorous post-analysis to assess the accuracy of the results. Scientists are working on this problem and have made some progress. In this post-genomic era, molecular and chemical biology remain promising areas of growth that will enhance both our understanding and the applications of virtual drug screening (Heikamp and Bajorath 2012; Sabe et al. 2021).

Advances in artificial intelligence are a turning point for the pharmaceutical and health sciences, as they help overcome limitations encountered in drug discovery (Zhong et al. 2018). Virtual screening has become indispensable in drug discovery and development. Sequential screening, a widely known concept in which computational screening is integrated with experimental screening, should be incorporated in practice to avoid problems at later stages and to overcome the limitations of each technique alone (Achary 2020).

12.7 Conclusions and Future Perspectives

The field of bioinformatics has contributed significantly to drug discovery and development, notably through virtual high-throughput screening. However, the vast amount of unverified data in genetic and protein repositories means that bioinformaticians must pre-process it before integration and interpretation can begin. In addition, the biological complexity of the available data hinders its wider use. Experimental limitations and the limited availability and accessibility of user-friendly software also appear to slow down the HTS process. Publicly available machine learning and artificial intelligence platforms can address some of these limitations, and interdisciplinary collaborative research can further facilitate drug development.