1 Introduction

Computational approaches to drug design and discovery are often praised for their ability to reveal not only how strongly a drug molecule binds to a biological target, but also how the two bind together. Their value in research is demonstrated by the several hundred novel drugs approved for clinical practice whose discovery was significantly assisted by in silico simulations (e.g. the HIV protease inhibitors nelfinavir and amprenavir) [1]. Methodologically, modern computer-aided drug design (CADD) methods have evolved from the classical Hansch correlation analysis of quantitative structure-activity relationships (QSAR) towards more complex mathematical modeling and computational chemistry simulations [2]. Advances in computer technology have increased the success ratio in hit discovery and subsequent lead optimization. They make it possible to carry out sophisticated data mining of many thousands of intercorrelated molecular descriptors (e.g. by partial least squares regression, principal component analysis, pattern recognition methods, artificial neural networks, nearest neighbor methods, cluster analysis, SIMCA, etc.), to investigate in depth the electronic structures and interactions of large molecular systems such as proteins and nucleic acids, and to perform structure-based virtual screening (SBVS) of large virtual ligand libraries against a selected biological target on supercomputers [3].

The philosophy of current rational drug discovery is to accelerate the whole process and reduce its costs through proper use of various in silico methods. However, accurate computational methods are also very complex and time consuming, and the underlying computing power has to be paid for. Compared with relatively simple experimental methods such as combinatorial synthesis, bioisostere synthesis, scaffold variation, peripheral lead modification, multi-target directed lead combination, Topliss tree procedures for lead optimization, or in vitro high-throughput screening, even advanced in silico methods can hardly promise a quicker and neater route to novel drugs [4]. For this reason, CADD methods are often simplified to the lowest acceptable level of theory so that satisfactory results can be obtained with as little effort as possible. How to judge the benefits of the mostly over-simplified, molecular mechanics-based methods that accompany drug research remains a matter of lively scientific debate.

In the present study, we focus on searching for novel, as yet unknown ligands of the human orexin 2 receptor (OX2R) by SBVS. OX2R is closely linked to a rare disease called narcolepsy, which is characterized by excessive daytime sleepiness accompanied by one or more of three additional symptoms (cataplexy, i.e. sudden loss of muscle tone; vivid hallucinations; brief periods of total paralysis) associated with the occurrence of rapid eye movement (REM) sleep at inappropriate times [5]. At the biochemical level, narcolepsy manifests itself as decreased production of orexin peptides by orexin neurons in the lateral hypothalamus. This causes relaxation of the orexinergic neurons distributed throughout the central nervous system, which in turn leads to an imbalance in sleep-wake cycle regulation and to impaired food intake and pleasure-seeking behavior. In principle, narcolepsy could be treated pharmacologically by activating OX2R with suitable synthetic agonists, but no such agonist is known at present. Only a few antagonists (e.g. suvorexant) are used in practice, as hypnotics.

Since an X-ray model of OX2R has already been determined, SBVS can now initiate the discovery of small-molecule agonists by first evaluating the in silico binding energy of a set of chemical structures towards the receptor [6]. To carry out SBVS, we used the open-source software-as-a-service (SaaS) platform iStar with iDock as its molecular docking engine. In total, 1 million ligand molecules were docked into the OX2R model on a small cloud system, exploiting the very fast calculations of the iDock program. The 1000 top-scoring compounds from iDock were then flexibly re-docked into the OX2R model with the AutoDock Vina program, using a pleasingly parallel scheme that distributes separate jobs over a petaflops-scale supercomputer. The results of both approaches are compared, generalized and interpreted with respect to their usefulness for the discovery of novel OX2R modulators.

2 Human Orexinergic Neuronal System

The human orexinergic neuronal system is composed of two homologous receptors, orexin 1 receptor (OX1R) and orexin 2 receptor (OX2R), and two activating neuropeptides, orexin A (OXA) and orexin B (OXB). The orexin peptides are formed by hydrolytic cleavage of prepro-orexin, a 131-amino-acid precursor expressed in orexin neurons of the lateral hypothalamus. From there they are distributed to the central nervous system (CNS). OXA consists of 33 amino acids stabilized by two disulfide bridges (Cys6–Cys12, Cys7–Cys14), with a pyroglutamoyl residue at the N-terminus and an amide group at the C-terminus. Its structure is folded into three helical sections. OXB is a shorter chain of 28 amino acids with an amidated C-terminus. Unlike OXA, OXB lacks the stabilizing disulfide bridges and comprises only two helices. Nonetheless, both orexin peptides are relatively flexible structures with many interchanging conformations, as determined experimentally by nuclear magnetic resonance (NMR) [7].

Orexin receptors, on the other hand, are found throughout the CNS, and to a minor extent also in the pancreas, gastrointestinal system, kidney and adipose tissue. Structurally, orexin receptors belong to the G-protein-coupled receptor (GPCR) family; they consist of a seven-helix transmembrane domain connected to a C-terminal globular functional unit located in the cytosol. OX1R is built of 425 amino acids, while OX2R is composed of 444 amino acids. Both orexin receptors contain a disulfide bridge linking the third transmembrane segment (TM3) with the second extracellular loop (ECL2) and share a high degree of structural similarity (63.23% pairwise identity, 282 identical positions and 81 similar positions, as determined by alignment in the Clustal Omega program) (Fig. 1).

Fig. 1 X-ray models of OX1R (PDB ID: 4ZJ8) and OX2R (PDB ID: 4S0V). Both receptors share 63.23% pairwise sequence identity (marked by asterisks over the residues)

OXA strongly activates both OX1R and OX2R, whereas OXB preferentially activates OX2R, with about 10 times higher efficiency than at OX1R. OX2R pathways are predominantly associated with the regulation of wakefulness and arousal, whereas the OX1R subsystem is involved in feeding control, coordination of reward, and coping with nociception and stress. In general, activation of orexin receptors in the CNS produces neuroexcitation by closing K+ channels and activating Na+/Ca2+ exchange [8].

Intensive research on orexin receptors has by now revealed many details of the orexinergic signaling cascades. The 3D structures of OXA, OXB, OX1R and OX2R have been determined by X-ray crystallography and NMR and analyzed in depth. Modulation of orexin receptors has been shown to be potentially useful in the treatment of sleep disorders, narcolepsy, cataplexy, obesity, hypophagia, attention deficit, depression and bipolar disorders, and moreover in colon cancer and Parkinson’s disease. These studies are very important for the development of new ligands capable of modulating orexin receptor activity. Since the discovery of orexin receptor antagonists has already succeeded with suvorexant, attention in this research area has shifted mainly to the development of small-molecule agonists able to cross the blood-brain barrier, which might be deployed to tackle narcolepsy.

3 Problem Definition

In theory, it is fairly easy to propose a method for discovering OX2R antagonists, since the only requirement is to block the interactions between OXA/OXB and the receptor. Once the natural agonist is prevented from interacting with the receptor, the signaling process cannot develop and the orexinergic system stays relaxed. Finding efficient OX2R antagonists is of course not straightforward, but it appears considerably easier than searching for agonists. For OX2R agonists, one needs to mimic the peptide-protein interactions within the OXA/OXB-OX2R complex, which is generally regarded as a challenging task. With GPCRs in particular, activation depends on a variety of binding interactions and subsequent conformational changes, which makes the development of agonists extremely difficult [9].

Fortunately, agonists and antagonists share one property: significant affinity for the target receptor. Because designing novel OX2R agonists from scratch would require very complex and time-consuming studies of the activated receptor, we narrowed the objective of the present work to searching for promising OX2R modulators. This task was accomplished by SBVS via molecular docking. Such SBVS can be concisely defined as the computational evaluation of the binding energies of a very large set of compounds towards the OX2R model. Once the binding energies have been estimated, the top-scoring candidates can be regarded as potential modulators of OX2R.

SBVS coupled with flexible molecular docking is nowadays a common computational chemistry approach that has been described in depth in the literature [10]. Nevertheless, innovations and improvements continue to emerge, especially with regard to calculation speed, accuracy, user-friendliness and availability. To investigate the benefits of different computing technologies in SBVS, we used the open-source SaaS platform iStar, which employs the iDock program as its molecular docking engine, and the free AutoDock Vina program running in high-performance computing (HPC) mode on a petaflops-scale supercomputer. First, we compared the performance of both platforms in SBVS of 1000 ligands representing a random selection of FDA-approved drugs. Then, we launched 1 million docking jobs, corresponding to 1 million drug-like ligands, on the iStar-iDock cloud to gain insight into the distribution of binding energies in the chemical space close to OX2R. Because the calculations in AutoDock Vina were deliberately set to higher precision, we re-docked only the 1000 top-scoring ligands from the first-stage docking of 1 million ligands on the iStar-iDock cloud. In all studies, we used an X-ray model of OX2R (PDB ID: 4S0V) as the target receptor. The following sections briefly describe the calculations undertaken and the results achieved.

4 SBVS Using iStar and iDock Cloud System

SBVS relies on a more or less demanding computational method known as molecular docking, which aims to find the mutual geometrical arrangement of a ligand and a receptor/enzyme molecule that exhibits the lowest potential energy and thus the strongest mutual attraction. Molecular docking is solved as a minimization problem using various methods driven by the potential energy gradient (e.g. steepest descent, the conjugate-gradient Polak-Ribière and Fletcher-Reeves algorithms, eigenvector following, etc.). Expressing the potential energy of a molecular system requires a mathematical definition of each energy contribution, namely (1) bond stretching (E_s); (2) bond bending (E_b); (3) dihedral torsion (E_t); (4) van der Waals interactions (E_vdW); (5) hydrogen bonding (E_hb); and (6) electrostatic interactions (E_e), together with a set of atom- and bond-specific constants (i.e. a force field), see Eq. (1) [3].

$$ \begin{aligned}
E_{s} & = \sum_{i=1}^{n} K_{s} \left( R_{i}^{a} - R_{i}^{0} \right)^{2}; \\
E_{b} & = \sum_{i=1}^{n} K_{b} \left( \theta_{i}^{a} - \theta_{i}^{0} \right)^{2}; \\
E_{t} & = \sum_{i=1}^{n} \frac{V_{n}}{2} \left( 1 + \cos \left( n\varphi_{i}^{a} - \varphi_{i}^{0} \right) \right); \\
E_{vdW} & = \sum_{i,j \in vdW;\, i \ne j}^{n} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right]; \\
E_{hb} & = \sum_{i,j \in hb;\, i \ne j}^{n} \left[ \frac{C_{ij}}{R_{ij}^{12}} - \frac{D_{ij}}{R_{ij}^{10}} \right]; \\
E_{e} & = \sum_{i,j \in e;\, i \ne j}^{n} \left[ \frac{q_{i} q_{j}}{\varepsilon R_{ij}} \right].
\end{aligned} $$
(1)
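
As an illustration only, the individual terms of Eq. (1) can be evaluated for a single bond, angle, torsion or atom pair as sketched below. This is a minimal sketch and not part of any docking program: the function names and all numerical constants are hypothetical, and the electrostatic term is written exactly as in Eq. (1), i.e. without any unit conversion factor.

```python
import math


def harmonic(k, value, reference):
    """Bond-stretching or bond-bending term of Eq. (1): K * (x - x0)^2."""
    return k * (value - reference) ** 2


def torsion(v_n, n, phi, phi0):
    """Dihedral term of Eq. (1): V_n / 2 * (1 + cos(n * phi - phi0)), angles in radians."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - phi0))


def van_der_waals(a_ij, b_ij, r_ij):
    """12-6 term of Eq. (1): A / R^12 - B / R^6."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6


def hydrogen_bond(c_ij, d_ij, r_ij):
    """12-10 term of Eq. (1): C / R^12 - D / R^10."""
    return c_ij / r_ij ** 12 - d_ij / r_ij ** 10


def electrostatic(q_i, q_j, r_ij, epsilon=1.0):
    """Coulomb term of Eq. (1): q_i * q_j / (epsilon * R)."""
    return q_i * q_j / (epsilon * r_ij)


# Hypothetical constants: with A = 1 and B = 2 the 12-6 curve has its
# minimum at R = (2A / B)^(1/6) = 1, where the energy equals -B^2 / (4A) = -1.
r_min = (2.0 * 1.0 / 2.0) ** (1.0 / 6.0)
print(van_der_waals(1.0, 2.0, r_min))
```

The total potential energy is then the sum of these contributions over all bonds, angles, torsions and interacting atom pairs of the system.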

SBVS is simply a high-throughput extension of molecular docking that rapidly evaluates, often through parallelized and distributed calculations, the binding energies of many ligands to a selected receptor.

For the present study, we used the open-source SaaS platform iStar with iDock as its molecular docking engine. This web platform is built from several components that together provide the user with highly effective SBVS capabilities (Fig. 2). Briefly, the web client is implemented with Twitter Bootstrap, jQuery, jQuery UI, three.js, zlib.js, jquery-dateFormat and jquery_lazyload. It is compatible with Google Chrome 30, Mozilla Firefox 25, Microsoft Internet Explorer 11, Apple Safari 6.1 and Opera 17. The web server uses the node.js, mongodb, express and spdy modules. The files used for SBVS are managed in MongoDB [11].

Fig. 2 Architecture of the open source platform iStar. It is a software-as-a-service system designed for bioinformatics and chemometrics [11]

Functionally, iStar handles the scheduling, deployment and monitoring of SBVS jobs, as well as messaging and storage of the resulting data. Besides the iGrep, iCUDA and iView applications, it hosts the iDock program developed by Hongjian Li for parallelized molecular docking. The ligands for virtual screening, originally obtained from the ZINC database, are converted to PDBQT format and stored in the web server database (http://istar.cse.cuhk.edu.hk/idock/). The ligands for SBVS can be selected from a pool of 17 million compounds by setting limits on molecular weight, calculated logP, apolar desolvation, polar desolvation, number of hydrogen bond donors and acceptors, topological polar surface area (tPSA), net charge, and number of rotatable bonds. At present, the iStar & iDock platform provides the ligands stored in the ZINC database as the All Clean subset. To run SBVS, however, the platform needs an externally supplied PDB file with the 3D model of the receptor. In most cases, the free protein database at rcsb.org can be used for this purpose.
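
The descriptor-based ligand selection described above amounts to a range filter over precomputed properties. The following minimal sketch illustrates the idea; it is not the iStar implementation, and the `Ligand` record, the identifiers and the numerical limits are hypothetical placeholders chosen only for demonstration (the desolvation descriptors are left out for brevity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    """Hypothetical record of precomputed descriptors for one ligand."""
    ligand_id: str
    mol_weight: float      # g/mol
    logp: float            # calculated logP
    hbd: int               # hydrogen bond donors
    hba: int               # hydrogen bond acceptors
    tpsa: float            # topological polar surface area, A^2
    net_charge: int
    rotatable_bonds: int


# Hypothetical (min, max) limits per descriptor, both bounds inclusive.
LIMITS = {
    "mol_weight": (200.0, 500.0),
    "logp": (-1.0, 5.0),
    "hbd": (0, 5),
    "hba": (0, 10),
    "tpsa": (20.0, 140.0),
    "net_charge": (-1, 1),
    "rotatable_bonds": (0, 10),
}


def passes(ligand, limits=LIMITS):
    """True if every limited descriptor lies within its inclusive range."""
    return all(lo <= getattr(ligand, name) <= hi
               for name, (lo, hi) in limits.items())


def select(ligands, limits=LIMITS):
    """Keep only the ligands that satisfy all descriptor limits."""
    return [lig for lig in ligands if passes(lig, limits)]


library = [
    Ligand("LIG-A", 350.4, 2.1, 2, 5, 75.0, 0, 4),    # inside all limits
    Ligand("LIG-B", 712.9, 6.3, 6, 12, 190.0, 0, 15),  # too large and lipophilic
]
print([lig.ligand_id for lig in select(library)])
```

Tightening or loosening the entries of `LIMITS` directly controls the size of the virtual library that is submitted for docking.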

Besides the iStar platform, which provides user-friendliness and management of computing power, the most crucial element for SBVS is the iDock program. iDock was developed from the open-source code of AutoDock Vina, the benchmark program in flexible molecular docking, and thus inherits many substantial features from its predecessor. Although iDock aims to improve on AutoDock Vina, it implements only docking of flexible ligands into rigid receptors. By default, iDock uses grid maps with a granularity of 0.15625 Å and explores the global ligand-receptor interactions by distributing independent Monte Carlo tasks to separate threads [12].

The scoring function of iDock is exactly the same as that used in Vina; it estimates the binding energy of the ligand-receptor system as a sum of five terms (Gaussian energy 1, Gaussian energy 2, repulsion, hydrophobic interaction, and hydrogen bonding). The optimization algorithm is also the same in iDock and Vina and has two parts, global and local. The global search follows the Monte Carlo principle with random mutation of the current solution, while the local search is based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which approximates the inverse Hessian of the scoring function. Compared with Vina, iDock increases the default number of parallel Monte Carlo runs from 8 to 64 and stops the BFGS local optimization only when no further allowed step can be performed. Owing to these changes, iDock is somewhat more likely than Vina to find the energy minimum. iDock also revises Vina’s CPU and memory utilization: it evaluates the capacity of every vector structure and employs the rvalue references of the C++11 standard.
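
The two-level search can be illustrated on a toy problem. The sketch below is a strongly simplified stand-in, not the iDock or Vina algorithm: the one-dimensional `score` function replaces the real scoring function, plain gradient descent with a numerical derivative and a fixed number of iterations replaces BFGS, and all parameter values are hypothetical. It only shows how independent Monte Carlo tasks, each alternating random mutation with local refinement and Metropolis acceptance, are combined by keeping the lowest energy found.

```python
import math
import random


def score(x):
    """Toy rugged 1-D energy surface standing in for a docking score."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(f, x, step=0.05, iters=200, h=1e-6):
    """Gradient descent with a central-difference derivative (stand-in for BFGS)."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def monte_carlo_task(f, rng, n_steps=50, temperature=1.2, start_range=5.0):
    """One independent task: mutate, refine locally, accept by Metropolis."""
    x = local_minimize(f, rng.uniform(-start_range, start_range))
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        candidate = local_minimize(f, x + rng.gauss(0.0, 1.0))  # random mutation
        e_candidate = f(candidate)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if e_candidate < e or rng.random() < math.exp((e - e_candidate) / temperature):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def dock(f, n_tasks=8, seed=0):
    """Run n_tasks independent Monte Carlo tasks and return the best (x, energy)."""
    results = [monte_carlo_task(f, random.Random(seed + k)) for k in range(n_tasks)]
    return min(results, key=lambda r: r[1])


best_x, best_e = dock(score, n_tasks=64)
print(f"lowest energy {best_e:.3f} at x = {best_x:.3f}")
```

Because the tasks are independent, running more of them (64 instead of 8 in this sketch) can only lower or preserve the best energy found for a given seed sequence, at a proportionally higher cost; in a real docking program the tasks would be distributed to separate threads.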

The iStar web platform along with iDock engine allows users to easily start extensive SBVS without the necessity to solve computational issues. The system automatically proposes the grid box center and size in the active site of the receptor and offers simple filters for designing a custom virtual ligand library. Therefore, one only needs to upload the biological target model and to set parameters for selecting the screened ligands. Once the SBVS is completed, the user is notified by e-mail about the results, which can be downloaded from the web server. The results involve a list of docked compounds, their binding energy estimates and binding modes in PDBQT format.

5 SBVS with AutoDock Vina and Salomon Supercomputer

The principles underlying the AutoDock Vina program have already been described in the previous section. Vina remains a classical tool in computational chemistry and biology for evaluating interactions between ligands and enzymes or receptors. Conveniently, Vina is natively implemented as a multithreaded application, since it deals with tasks that are ideally separable computationally. To date, many Vina variants have been published that offer some improvements over the original C++ code. Nevertheless, Vina remains a robust standard that helps medicinal chemists understand drug action at the molecular level. With some effort, Vina can be extended to SBVS by incorporating it into distributed calculation schemes on computer grids or supercomputers equipped with a scheduler that manages the submitted jobs.

For the present study, we developed code implementing job arrays that submits individual flexible molecular docking tasks to the PBS scheduler and collects the results from the compute nodes. The ligands are stored in the LIGANDS directory, while the grid-box parameters, number of CPUs, exhaustiveness, and the rigid and flexible receptor parts are defined in conf.txt.
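
A job-array scheme of this kind might be organized along the following lines. This is a minimal sketch under stated assumptions, not the code used in this study: the helper names, the `RESULTS` directory and the worker script name `dock_one.py` are hypothetical, the scheduler directives assume PBS Professional syntax (`#PBS -J` for job arrays, `PBS_ARRAY_INDEX` as the per-task index), and only the standard `--config`, `--ligand` and `--out` options of the `vina` command line are used.

```python
from pathlib import Path


def ligand_for_index(ligand_dir, index):
    """Return the index-th (1-based) PDBQT ligand of the directory in sorted order."""
    ligands = sorted(Path(ligand_dir).glob("*.pdbqt"))
    return ligands[index - 1]


def vina_command(ligand, conf="conf.txt", out_dir="RESULTS"):
    """Build the argument list for docking one ligand with the shared conf.txt."""
    ligand = Path(ligand)
    out_file = Path(out_dir) / f"{ligand.stem}_out.pdbqt"
    return ["vina", "--config", conf, "--ligand", str(ligand), "--out", str(out_file)]


def pbs_array_script(n_ligands, ncpus=24, worker="dock_one.py"):
    """Render a PBS job-array script that starts one worker per ligand index."""
    lines = [
        "#!/bin/bash",
        f"#PBS -J 1-{n_ligands}",            # one sub-job per ligand
        f"#PBS -l select=1:ncpus={ncpus}",   # one full node per sub-job (assumed)
        'cd "$PBS_O_WORKDIR"',
        f'python3 {worker} "$PBS_ARRAY_INDEX"',
    ]
    return "\n".join(lines) + "\n"


# The hypothetical worker would pick its ligand from the array index and run
# Vina, e.g. with
#   subprocess.run(vina_command(ligand_for_index("LIGANDS", index)), check=True)
print(pbs_array_script(1000))
```

Each sub-job writes its own output file, so collecting the results reduces to reading the binding energies from the files in the output directory once the array has finished.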

For this SBVS, we used the Salomon supercomputer (Czech Republic), which consists of 1008 compute nodes with 24,192 CPU cores and 129 TB of RAM in total. The peak performance of the system exceeds 2 petaflops. Each node has two 12-core Intel Xeon E5-2680v3 processors, i.e. 24 cores per node. The system runs CentOS Linux, is interconnected by a 7D enhanced hypercube InfiniBand network, and offers considerable power for medium-sized virtual screenings.

Although supercomputers offer great computational power, they are considered too complicated for users who are laymen in information technology. Compared with cloud systems, supercomputers are generally better suited to high-performance computing (HPC), but traditional medicinal chemists will prefer cloud solutions, which carry out many elementary steps automatically without involving the user.

6 Results and Discussion

Using the iStar & iDock platform running on four powerful computers, each with four Intel Xeon E7-4830 v2 processors and 512 GB of DDR3 RAM (i.e. 320 cores in total), we performed SBVS of 1,000,448 ligands against OX2R (PDB ID: 4S0V). The calculations used multithreading only. The grid box of 17 × 14 × 16 Å was automatically centered at x = 52 Å, y = 8 Å, z = 53 Å. The calculations were completed after 125 h, with a single docking job taking approximately 30 s. Elementary statistics of the binding energies estimated in this SBVS are given in Table 1.
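
The reported timings allow a rough throughput estimate. The short calculation below is a back-of-the-envelope sketch that assumes the 30 s figure is the elapsed time of one docking job and that all jobs take equally long; the function names are introduced here for illustration only.

```python
N_LIGANDS = 1_000_448      # ligands docked on the iStar & iDock platform
SECONDS_PER_JOB = 30.0     # approximate duration of one docking job
WALL_HOURS = 125.0         # total elapsed time of the screening


def effective_concurrency(n_jobs, seconds_per_job, wall_hours):
    """Average number of docking jobs that must run simultaneously."""
    return n_jobs * seconds_per_job / (wall_hours * 3600.0)


def wall_seconds_per_ligand(n_jobs, wall_hours):
    """Elapsed time per ligand averaged over the whole screening."""
    return wall_hours * 3600.0 / n_jobs


concurrency = effective_concurrency(N_LIGANDS, SECONDS_PER_JOB, WALL_HOURS)
per_ligand = wall_seconds_per_ligand(N_LIGANDS, WALL_HOURS)
print(f"about {concurrency:.1f} jobs in flight, {per_ligand:.2f} s per ligand overall")
```

Under these assumptions, roughly 67 docking jobs were in progress at any time, which corresponds to an overall rate of about one ligand every 0.45 s.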

Table 1 Basic statistics of SBVS on iStar & iDock platform for OX2R (PDB ID: 4S0V)

The top-scoring candidate for OX2R exhibited the binding energy of −13.08 kcal/mol. Its binding mode in the central tunnel of the transmembrane domain of OX2R is displayed in the left part of Fig. 3.

Fig. 3 The top-scoring candidates in OX2R (PDB ID: 4S0V) provided by iDock (left) and by AutoDock Vina (right)

The top-scoring compound shown in Fig. 3 is a ligand with relatively strong affinity for OX2R. It occupies the opening of the transmembrane helical domain, where it can hinder activation by OXA/OXB. However, as already mentioned, we cannot deduce from this binding mode whether the compound might be a potential OX2R agonist. Further investigation is necessary to elucidate the subsequent steric interactions in the ligand-receptor complex.

The 1000 top-scoring candidates from the iStar & iDock SBVS were also submitted to flexible molecular docking in the same OX2R model using the AutoDock Vina program on the Salomon supercomputer. In the configuration file, a grid box of 30 × 30 × 30 Å was centered at the same point as in the iStar & iDock calculations. Unlike in the iStar & iDock calculations, the 38 amino acid residues encompassed by the grid box were treated as flexible in Vina. The Vina calculations were set to use 24 CPUs in multithreading mode with the exhaustiveness parameter equal to 24. One docking task in Vina took 4.5 h on average, and all 1000 jobs were finished on Salomon after 6 h. The 1.5 h delay was caused by the scheduler, which starts tasks without giving any preference to the jobs of a single user. The results are summarized in Table 2.
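
For orientation, a Vina configuration file with the settings stated above could be generated as sketched below. This is an illustrative sketch only: the grid-box center, grid-box size, CPU count and exhaustiveness are taken from the text, whereas the helper name and the receptor file names `4s0v_rigid.pdbqt` and `4s0v_flex.pdbqt` are hypothetical.

```python
def render_vina_config(receptor_rigid, receptor_flex, center, size, cpu, exhaustiveness):
    """Return the text of a Vina configuration file as 'key = value' lines."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor_rigid}",   # rigid part of the receptor
        f"flex = {receptor_flex}",        # flexible side chains
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"cpu = {cpu}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


conf_text = render_vina_config(
    receptor_rigid="4s0v_rigid.pdbqt",   # hypothetical file name
    receptor_flex="4s0v_flex.pdbqt",     # hypothetical file name
    center=(52, 8, 53),                  # grid-box center in angstroms (from the text)
    size=(30, 30, 30),                   # grid-box size in angstroms (from the text)
    cpu=24,
    exhaustiveness=24,
)
print(conf_text)
```

The ligand is deliberately left out of the file so that the same configuration can be shared by all docking jobs, with the ligand supplied per job on the command line.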

Table 2 Basic statistics of SBVS by AutoDock Vina and iStar & iDock for OX2R (PDB ID: 4S0V), based on the 1000 top-scoring candidates from the complete SBVS by iStar & iDock

Interestingly, the binding energy estimates from AutoDock Vina and iStar & iDock correlate only weakly, although significantly (Pearson’s R = 0.3677, p = 2.2437e-33; Spearman’s R = 0.3558, p = 3.2961e-31). The best candidate according to iDock, scored at −13.08 kcal/mol (Fig. 3), received a binding energy of −10.7 kcal/mol in Vina. Conversely, the top-scoring candidate in AutoDock Vina, with a binding energy of −15.4 kcal/mol, was assigned only −12.393 kcal/mol by iDock. iDock is thus much faster than Vina, but because it omits receptor flexibility it probably does not assign the best scores to the docked ligands. This discrepancy can be properly resolved only by calculations at a higher level of theory or by experiment. Nevertheless, the iStar & iDock cloud platform remains attractive, even though the accuracy of both iDock and AutoDock Vina still has to be properly investigated.
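
The two correlation coefficients quoted above can be computed from paired binding energies as sketched below. This is a minimal pure-Python sketch; the four energy pairs in the example are hypothetical placeholders and not data from this study, and the p-values are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired binding energies in kcal/mol (iDock, Vina).
idock = [-13.1, -12.4, -12.0, -11.5]
vina = [-10.7, -15.4, -11.0, -10.2]
print(pearson(idock, vina), spearman(idock, vina))
```

For the reported Pearson coefficient of 0.3677, the squared value is about 0.135, i.e. only roughly 14% of the variance in one set of scores is accounted for by a linear relationship with the other.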

7 Conclusions

We have reported on very demanding simulations of non-covalent interactions between 1 million chemical compounds and the OX2R receptor using molecular docking and a SaaS cloud system. The 1000 top-scoring ligands from this phase of SBVS were re-docked into OX2R by flexible molecular docking to obtain more accurate estimates of the Gibbs free energies. For this purpose, we developed an SBVS protocol that distributes the computational jobs on a supercomputer. We have shown that cloud systems may be equipped with fast docking algorithms, but that their results can contradict the outputs of practically identical calculations. Since iDock and AutoDock Vina differ mainly in the handling of flexible residues, we may suppose that the low correlation between the binding energies is caused by improper optimization in the iDock program. Nonetheless, the top-scoring candidate found by AutoDock Vina (−15.4 kcal/mol) seems worthy of further research, because a score lower than −12.5 kcal/mol is empirically taken as a significant in silico level.