Introduction

Tuberculosis (TB) is the most important cause of human death from a curable infectious disease. It is estimated that, worldwide, one hundred million people are infected annually and about ten million develop the disease, with five million of those progressing to an infectious stage, culminating in approximately three million deaths. According to the World Health Organization [1], the overall incidence of TB increases by approximately 0.3% per year. The resurgence of this health problem is due mainly to the proliferation of multidrug-resistant (MDR-TB), extensively drug-resistant (XDR-TB), and, more recently, totally drug-resistant (TDR-TB) strains of Mycobacterium tuberculosis (Mt). In addition, the high susceptibility of HIV/AIDS patients to TB aggravates the problem. Therefore, there is an urgent need for the discovery and development of new and better drugs for TB treatment [2].

Enzymes of the shikimate pathway (SP) are promising targets for the development of antimicrobial agents [3] and herbicides [4], because they are essential to the survival of algae, higher plants, bacteria, fungi, and apicomplexan parasites but are absent in mammals [5]. The SP is a seven-step biosynthetic route that converts erythrose 4-phosphate to chorismate, a precursor of aromatic amino acids and many other essential compounds [6].

The object of our study is the fifth enzyme of the SP, shikimate kinase (SK) (EC 2.7.1.71), which catalyzes the specific phosphorylation of the 3-hydroxy group of shikimate using ATP as a co-substrate, yielding shikimate-3-phosphate and ADP [7, 8]. This enzyme is an established target against Mt, since Parish and Stoker demonstrated, by disruption of the aroK gene, which codes for SK, that the SP is essential for the viability of Mt [9].

SK is a member of the nucleoside monophosphate kinase (NMP kinase) family, whose members undergo large conformational changes during catalysis (Fig. 1) [10]. The enzymes of this family are composed of three domains: the CORE, which contains a highly conserved phosphate-binding loop (P-loop); the LID domain, which undergoes substantial structural changes upon substrate binding; and the NMP-binding domain, which is responsible for the recognition and binding of a specific substrate [11].

Fig. 1 Structure of shikimate kinase in complex with ADP and shikimate (PDB access code: 2DFN)

Drugs are usually discovered by trial and error, by means of high-throughput screening approaches that use in vitro experiments to evaluate the activity of a large number of compounds against a known target. This procedure is very costly and time-consuming. If crystallographic information is available for the protein target, then molecular docking simulations can be a helpful computational approach in the drug-discovery process [12]. Molecular docking is a simulation method that predicts the conformation of a receptor-ligand complex, in which the receptor can be either a protein or a nucleic acid, and the ligand is a small molecule. This computer simulation can generate many possible positions for the ligand in the receptor-binding pocket. Therefore, a criterion is needed to compare all candidate positions of the ligand and to select the best one.

Our goal here is to find potential inhibitors of shikimate kinase from Mycobacterium tuberculosis (MtSK) using virtual screening (VS). VS can decrease costs and improve hit rates for lead discovery. To this end, we used the MtSK structure [13] as a target for molecular docking simulations with MOLDOCK [14]. Our docking protocol was validated against an ensemble of 12 crystallographic structures available for complexes of MtSK. The VS was validated by inclusion of a known SK inhibitor in a small-molecule database with over 4500 structures. We describe the results in terms of MOLDOCK scores and modes of interaction, and discuss the importance of the active site residues in the ligand binding process.

Materials and methods

Molecular docking simulations

One of the fundamental questions in structural biology is the study of protein-ligand interactions, particularly considering the pharmacological applications of such study in structure-based drug design [15]. To simulate the interaction of MtSK with a library of ligands, we used the MOLDOCK program [14], an implementation of a variant of the evolutionary algorithm (EA). A published evaluation of MOLDOCK indicates that it can reproduce the crystallographic position of a ligand and that it exhibits better overall performance than SURFLEX, FLEXX, and GOLD [14]. In the present work, all simulations were performed on an iMac (Intel Processor Core 2 Duo, 2.66 GHz, 2 GB SDRAM DDR3 1066 MHz).

Re-docking and cross-docking

In molecular docking simulations, the best binary complex (protein-ligand) is the one closest to the crystallographic structure. For that reason, we must establish a methodology that assesses the distance from the computer-generated solution (pose) to the crystallographic structure. This distance can be calculated using the root-mean-square deviation (RMSD), which measures the differences between the values predicted by a model and the values actually observed for the object being modeled (here, the protein-ligand complex). The RMSD is calculated between two sets of atomic coordinates: one for the crystallographic structure ($x_{\text{ctal}}$, $y_{\text{ctal}}$, $z_{\text{ctal}}$; the object being modeled) and another for the atomic coordinates obtained from the docking simulations ($x_{\text{pose}}$, $y_{\text{pose}}$, $z_{\text{pose}}$; the predicted model). A summation is then taken over all N atoms being compared, using the following equation:

$$ \text{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(x_{\text{ctal},i} - x_{\text{pose},i}\right)^{2} + \left(y_{\text{ctal},i} - y_{\text{pose},i}\right)^{2} + \left(z_{\text{ctal},i} - z_{\text{pose},i}\right)^{2}\right]}\,. $$
(1)

In docking simulations, the best results are expected to give RMSD values below 2.0 Å relative to the crystallographic structures [16]. The procedure of recovering the crystallographic position of the ligand is often called “re-docking”; it is fundamentally a validation method that determines whether the molecular docking algorithm is able to reproduce the crystallographic position by computer simulation. In this work, all RMSD values were calculated for non-hydrogen atoms.
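As an illustration of Eq. 1 and the 2.0 Å acceptance criterion, the following is a minimal Python sketch. The function name `rmsd` and the coordinates are hypothetical; the sketch assumes that both coordinate lists contain only non-hydrogen atoms listed in the same order, and it is not the routine used by MOLDOCK.

```python
import math

def rmsd(ctal, pose):
    """RMSD between two equally ordered lists of (x, y, z) tuples, in the input units."""
    if len(ctal) != len(pose) or not ctal:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xc, yc, zc), (xp, yp, zp) in zip(ctal, pose):
        # Squared displacement of atom i between crystal and docked pose.
        total += (xc - xp) ** 2 + (yc - yp) ** 2 + (zc - zp) ** 2
    return math.sqrt(total / len(ctal))

# Hypothetical heavy-atom coordinates (in Å), not taken from any PDB entry.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

value = rmsd(crystal, docked)
accepted = value < 2.0  # re-docking criterion quoted in the text
```

In this toy case every atom is displaced by 1 Å along z, so the pose would pass the 2.0 Å criterion.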

In order to validate our docking protocol, we used the SK crystallographic coordinates available at the Protein Data Bank (PDB) under the access code 2DFN [13]. We performed the docking simulation against the active site of MtSK and compared the docked poses with the crystallographic structure. We used the MOLDOCK default protocol with the search space centered at coordinates x = −15.23, y = −14.38, and z = 14.88 Å, and a docking sphere radius of 9 Å. Figure 2 shows the docking sphere used in the simulations.

Fig. 2 Search space sphere (green) defined for molecular docking simulations

In the implementation of the EA in MOLDOCK, computational approximations of an evolutionary process, called genetic operators, are applied to simulate the survival of the most favorable features. In a search problem with many possible solutions (candidates), each candidate is ranked by a scoring function (fitness function), and only the best-ranked solutions are kept for the next iteration. This cycle is repeated until an optimal solution is found. In molecular docking simulations, the optimal solution is the one with the best score, which should be the closest to the crystallographic structure. MOLDOCK offers two biologically inspired algorithms to perform positional searches in docking simulations. The first is the optimizer search algorithm (MOLDOCK Optimizer), which is based on a guided differential evolution algorithm (GDEA) [14]. The second is a simplex evolution (SE) algorithm called MOLDOCK SE. GDEA is based on an EA variant called differential evolution (DE), which provides a distinct method for selecting and modifying candidate solutions (individuals). We used MOLDOCK Optimizer as the search algorithm.
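The rank-and-keep cycle described above can be sketched with a generic differential evolution loop. This is a minimal sketch of the classic DE scheme (random base vector, one difference vector, binomial crossover), not the guided variant implemented in MOLDOCK; the function `differential_evolution`, its parameters, and the toy scoring function are all assumptions made for illustration. A real docking run would score ligand poses rather than points in space.

```python
import random

def differential_evolution(score, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `score` over the box `bounds` with classic differential evolution."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    gene = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    gene = min(max(gene, lo), hi)  # keep the trial inside the box
                else:
                    gene = pop[i][d]
                trial.append(gene)
            trial_fit = score(trial)
            # Selection: the trial replaces its parent only if it scores no worse.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

def toy_score(v):
    """Hypothetical score: squared distance to an arbitrary optimum (lower is better)."""
    target = (1.0, -2.0, 3.0)
    return sum((a - b) ** 2 for a, b in zip(v, target))

best_solution, best_score = differential_evolution(toy_score, [(-10.0, 10.0)] * 3)
```

The fixed seed only makes the sketch reproducible; stochastic search algorithms of this kind are normally run several times with different seeds.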

In addition to re-docking, a procedure called “cross-docking” can be used to further validate a docking protocol when several crystallographic structures are available for the same protein. This procedure involves docking a number of ligands found in a variety of crystal structures of an identical protein to a single rigid crystallographic conformation of that protein [17]. When a protein target undergoes major conformational changes upon ligand binding, a significant difference is expected between the crystallographic and docked structures. We identified 12 MtSK structures in the PDB with ligands in the shikimate-binding site (PDB access codes: 2DFN, 1U8A, 1WE2, 1ZYU, 2G1K, 2IYQ, 2IYR, 2IYS, 2IYX, 2IYY, 2IYZ, and 3BAF). This search was performed on February 24, 2011. These validation procedures, re-docking and cross-docking, constitute the initial stage of the virtual screening protocol (phase 1) described in the next sections.

Virtual screening

Our virtual screening (VS) protocol is divided into four phases, as shown in Fig. 3. Phase 1 is focused on the selection and validation of a docking protocol, as described earlier in the section on re-docking and cross-docking. Phase 1 ends when an adequate protocol is found (selection criterion: RMSD < 2.0 Å). It should be pointed out that the RMSD criterion depends on the number of torsion angles, and a less demanding criterion may be adopted for re-docking of a ligand with more than 10 torsion angles [14]. Once a docking protocol is chosen, we select a small-molecule database to be used in the screening (phase 2). Here we used a library of ligands commercially available from Acros Organics. The ligands (mol2 format) were downloaded from http://zinc.dock.org [18], with a total of 4579 small molecules. In addition to commercially oriented databases, the ZINC database also provides an interface to build small-molecule databases based on molecular similarity, measured, for example, by the Tanimoto coefficient [19, 20].
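For readers unfamiliar with the Tanimoto coefficient mentioned above, a minimal sketch follows. It assumes that each molecule is represented by the set of "on" bits of a binary fingerprint; the function name `tanimoto` and the bit sets are hypothetical, and the sketch does not reproduce how ZINC computes similarity.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty fingerprints have zero similarity.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical fingerprints: two shared bits out of six distinct bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5, 6})
```

Values range from 0 (no bits in common) to 1 (identical fingerprints).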

Fig. 3 Flowchart of virtual screening process

In phase 3, we run docking simulations for each ligand present in the selected database. The MOLDOCK program is the workhorse of the present protocol and was used in all docking simulations described here. During a typical docking simulation, several orientations can be obtained for each ligand. Here we selected the one with the lowest (most favorable) score. The scoring function used by MOLDOCK improves on the accuracy of earlier scoring functions by adding a new hydrogen bonding term and new charge schemes. Four scoring functions are implemented in MOLDOCK, including the MOLDOCK score and the PLANTS score [14, 21]. These two functions offer grid-based versions, in which hydrogen bond directionality is not considered. In the present protocol we employed the grid-based MOLDOCK score, since it offers approximately four-fold greater speed by precalculating potential-energy values on an equally spaced cubic grid.

The MOLDOCK score is based on the piecewise linear potential (PLP) scoring functions developed by Yang et al. [22, 23]. The docking scoring function $E_{\text{MOLDOCK SCORE}}$ is defined as follows:

$$ E_{\text{MOLDOCK SCORE}} = E_{\text{intramol}} + E_{\text{intermol}}, $$
(2)

where $E_{\text{intermol}}$ is the intermolecular interaction energy:

$$ E_{\text{intermol}} = \sum_{i \in \text{ligand}} \sum_{j \in \text{protein}} \left[ 332\,\frac{q_{i} q_{j}}{D\,r_{ij}} + E_{\text{PLP}}(r_{ij}) \right]. $$
(3)

All non-hydrogen atoms in the ligand and protein are included in the summation. The first term accounts for electrostatic interactions, in which the factor 332 is used to obtain energy in kcal mol−1. D represents a distance-dependent dielectric constant, $D = 4r_{ij}$. The second term ($E_{\text{PLP}}$) is a PLP, described elsewhere [22, 23]. To ensure that no energy term can exceed the clash penalty, the electrostatic term for distances below 2.0 Å is capped at the value it takes at 2.0 Å.
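The electrostatic part of Eq. 3 can be illustrated with a short Python sketch. It assumes that the cut-off described above amounts to evaluating the term at 2.0 Å whenever the interatomic distance is shorter; the function name, constants, and charges are hypothetical, the PLP term is omitted, and this is not MOLDOCK's own code.

```python
COULOMB_FACTOR = 332.0   # numerical factor from Eq. 3
CUTOFF_DISTANCE = 2.0    # Å; below this the term keeps its value at 2.0 Å

def electrostatic_term(q_i, q_j, r_ij):
    """Electrostatic contribution of one ligand-protein atom pair (PLP term omitted)."""
    r = max(r_ij, CUTOFF_DISTANCE)   # cap the term at short distances (assumed reading)
    dielectric = 4.0 * r             # distance-dependent dielectric, D = 4 r_ij
    return COULOMB_FACTOR * q_i * q_j / (dielectric * r)

# Hypothetical unit charges of opposite sign at three distances (Å).
e_far = electrostatic_term(+1.0, -1.0, 4.0)
e_cut = electrostatic_term(+1.0, -1.0, 2.0)
e_near = electrostatic_term(+1.0, -1.0, 1.0)  # same value as at the 2.0 Å cut-off
```

Because D grows with distance, the term decays as the inverse square of the distance rather than as its inverse.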

Intramolecular energy is given by the following equation:

$$ E_{\text{intramol}} = E_{\text{penalty}} + \sum_{i \in \text{ligand}} \sum_{j \in \text{ligand}} E_{\text{PLP}}(r_{ij}) + \sum_{\text{single bonds}} A\left[ 1 - \cos\left( n\varphi - \varphi_{0} \right) \right] $$
(4)

The term $E_{\text{penalty}}$ is a penalty energy added to $E_{\text{intramol}}$ when two non-bonded atoms are closer than 2 Å (for non-hydrogen atoms). This term avoids unrealistic molecular topologies for the ligands. The second term is a PLP, already mentioned [22, 23]. The last term accounts for torsion energy, which is expressed as a periodic function. In this term, A, n, and $\varphi_{0}$ are empirically determined [22, 23]. MOLDOCK defines a limiting sphere where the search is focused. If a ligand non-hydrogen atom is positioned outside this limiting sphere (the search space sphere), then a constant penalty of 10000 is added to the total energy (implemented for the grid-based version of the MOLDOCK score).
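The torsion term of Eq. 4 and the search-sphere penalty can be sketched as follows. The parameter values passed to `torsion_energy` are hypothetical (the empirical values of A, n, and the reference angle are not given in the text), and the sketch assumes that the constant penalty is applied once per pose rather than once per offending atom, which the text does not specify. The sphere center and radius are those quoted for the docking protocol.

```python
import math

SPHERE_CENTER = (-15.23, -14.38, 14.88)  # Å, search space center from the protocol
SPHERE_RADIUS = 9.0                      # Å
SPHERE_PENALTY = 10000.0                 # constant penalty quoted in the text

def torsion_energy(phi, a, n, phi0):
    """Periodic torsion term A[1 - cos(n*phi - phi0)] for one rotatable bond (radians)."""
    return a * (1.0 - math.cos(n * phi - phi0))

def sphere_penalty(heavy_atoms, center=SPHERE_CENTER, radius=SPHERE_RADIUS):
    """Return the constant penalty if any heavy atom lies outside the search sphere."""
    for atom in heavy_atoms:
        if math.dist(atom, center) > radius:
            return SPHERE_PENALTY  # assumed: applied once per pose
    return 0.0

# Hypothetical torsion parameters and atom positions, for illustration only.
e_min = torsion_energy(0.0, a=1.0, n=3, phi0=0.0)
e_max = torsion_energy(math.pi / 3, a=1.0, n=3, phi0=0.0)
inside = sphere_penalty([(-15.23, -14.38, 14.88), (-12.0, -14.0, 15.0)])
outside = sphere_penalty([(-15.23, -14.38, 14.88), (-5.0, -14.38, 14.88)])
```

With n = 3 the torsion term has three equivalent minima per full rotation, each separated from the next by a barrier of height 2A.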

After identification of potential inhibitors by molecular docking simulations, the best-scored ligands were submitted to the web server FAF-Drugs [24] in order to assess their physical-chemical properties (phase 4). These are key properties that need to be considered in the early stages of the drug discovery process, and FAF-Drugs allows users to filter molecules via simple rules such as molecular weight, polar surface area, logP and number of rotatable bonds. The ligands were filtered following Lipinski’s rule of five (RO5). RO5 states that orally bioavailable drugs, in general, have a molecular weight less than or equal to 500, a logP less than or equal to 5, a number of hydrogen bond donor groups less than or equal to 5, and a number of hydrogen bond acceptor groups less than or equal to 10 [25].
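The RO5 thresholds listed above translate directly into a simple filter. The sketch below is only an illustration of the rule; the `Ligand` class, its field names, and the property values are hypothetical, and the code does not reproduce the FAF-Drugs server, which computes the properties from the molecular structure.

```python
from dataclasses import dataclass

@dataclass
class Ligand:
    name: str
    mol_weight: float   # molecular weight
    logp: float         # octanol-water partition coefficient (logP)
    hb_donors: int      # hydrogen bond donor groups
    hb_acceptors: int   # hydrogen bond acceptor groups

def passes_ro5(lig):
    """True if the ligand satisfies all four rule-of-five thresholds from the text."""
    return (lig.mol_weight <= 500
            and lig.logp <= 5
            and lig.hb_donors <= 5
            and lig.hb_acceptors <= 10)

# Hypothetical property values, for illustration only.
candidates = [
    Ligand("hypothetical-1", 350.4, 2.1, 2, 5),
    Ligand("hypothetical-2", 612.7, 3.0, 4, 9),   # fails on molecular weight
    Ligand("hypothetical-3", 480.0, 5.8, 1, 6),   # fails on logP
]
selected = [lig.name for lig in candidates if passes_ro5(lig)]
```

Here the rule is applied strictly, with no violations allowed, which is one possible reading of the filtering step.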

In addition to the 4579 small molecules present in the Acros database, we added staurosporine (PubChem Compound Identification: CID 44259) to the database used in the VS. This molecule has already been tested directly on MtSK [26]. Its addition allows us to test whether the VS protocol is able to identify SK inhibitors present in a database with over 4500 ligands.

Enrichment factor

In order to further validate the present VS protocol, we calculated the enrichment factor (EF), which measures the improvement of the hit rate obtained by a VS protocol compared with a random selection. EF is defined by the following equation:

$$ \text{EF} = \frac{H_{a}/H_{t}}{A/N} $$
(5)

where $H_{a}$ is the number of active compounds among the $H_{t}$ top-ranked compounds of a database of N compounds, of which A are active [27, 28]. A successful VS implies EF >> 1. For this validation simulation, a subset of a previously described kinase decoy database (containing 627 molecules) was spiked with four known MtSK inhibitors (A = 4) [26]. This database of MtSK decoys and active MtSK inhibitors is available for download at: http://azevedolab.dominiotemporario.com/doc/mtsk_decoys_set.zip.

Results and discussion

Re-docking and cross-docking

Re-docking simulations (phase 1 of the VS protocol) using the structure 2DFN generated an RMSD of 1.6 Å. In addition, cross-docking simulations generated RMSD ranging from 1 to 2 Å further validating the present docking protocol. These two tests indicated that the docking simulation was successful, and that the protocol is good enough to be used for the virtual screening process.

Virtual screening

VS uses computational methodologies to identify biologically active molecules against a specific protein target. Two main methodologies are used in VS: methods that search for similarity to validated ligands, and molecular docking methods that require crystallographic information on the target. Here we made use of the second approach. VS studies by other research groups on the identification of MtSK inhibitors as antitubercular drugs, using similar molecular docking procedures, have been published previously [29, 30]. Nevertheless, very limited experimental data on the direct effect of these compounds on MtSK are available so far. The novelty of the present work relies on the method used in the VS and on the selection of compounds according to their pharmacological properties. The VS simulations were carried out using the MOLDOCK program, with MtSK (PDB access code 2DFN) as the target. The ligand library comprises 4580 molecules (Acros database plus staurosporine). Addition of a known SK inhibitor allows us to test the accuracy of the present protocol.

After the docking simulations, we selected the 20 top-scoring compounds from the initial set of 4580 compounds (selection based on MOLDOCK score). Staurosporine was among the 20 top-scoring compounds obtained in the VS, with a MOLDOCK score of −144.168. Identification of a known SK inhibitor among the best VS results gives further validation for this VS protocol. These 20 potential inhibitors were submitted to the filter tests available at the web server FAF-Drugs [24], to exclude compounds with physical-chemical features known to be undesirable for oral bioavailability. We could have applied this filter analysis prior to the docking simulations, since it would reduce simulation time. Nevertheless, we kept the filtering analysis after the docking simulations, since the MOLDOCK protocol was fast enough to run in less than a week of CPU time on an iMac (Intel Processor Core 2 Duo, 2.66 GHz, 2 GB SDRAM DDR3 1066 MHz). In addition, applying the filtering analysis prior to the docking simulations could eliminate candidates that fail the filters but present a promising MOLDOCK score and whose toxicity could be reduced by small modifications of the structure.

Especially interesting is the fact that staurosporine is a well-known cyclin-dependent kinase (CDK) inhibitor that has been the subject of numerous structural and functional studies [31–35]. Staurosporine is non-selective and too toxic for use in therapy, but UCN-01, a hydroxylated form of staurosporine (7-hydroxystaurosporine), shows greater selectivity for CDK and is currently undergoing clinical trials in the United States and Japan [33]. This opens new possibilities for testing new molecular moieties as potential SK inhibitors: CDK inhibitors that have already shown low toxicity make a promising dataset to be explored.

The FAF-Drugs parameters used were those of Lipinski's rule of five [25]. Of the 20 selected molecules, nine fit Lipinski’s rule of five, including staurosporine. Figures 4a–i show the molecular structures of all nine ligands. Since staurosporine is already a known SK inhibitor and was included only to test the VS protocol, we excluded it from the rest of the analysis. The selected ligands are shown in Table 1. The MOLDOCK scores for these eight molecules range from −144.208 to −151.943. All eight ligands show MOLDOCK scores better than that of staurosporine.

Fig. 4 Molecular structures of the top-scoring compounds identified in the VS protocol. a) Staurosporine. b) ZINC15707201. c) ZINC20462780. d) ZINC15707234. e) ZINC15675581. f) ZINC15707188. g) ZINC22936889. h) ZINC20464408. i) ZINC22936937

Table 1 Physical-chemical properties of ligands that fitted Lipinski's rule of five after analysis by FAF-Drugs

Enrichment factor

The complete database used to calculate the EF contains 627 decoys and four active molecules (N = 631). After application of the present VS protocol to this database, six molecules (Ht = 6) were retrieved as hits (0.95% of the database). Among these hits, two molecules were known MtSK inhibitors (Ha = 2) [26]. Thus, the enrichment factor was found to be 52.6, indicating that the hit list is 52.6 times richer in active compounds than a random selection from the database. Previously published benchmarking sets for molecular docking of kinases exhibited EF values ranging from 1.2 to 54, indicating that the present VS protocol is adequate for our purposes.
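The reported value can be checked with a few lines of Python implementing Eq. 5. The function name `enrichment_factor` is hypothetical; the input numbers are those given above.

```python
def enrichment_factor(h_a, h_t, a, n):
    """EF = (Ha / Ht) / (A / N): hit rate in the selection relative to the whole database."""
    if h_t <= 0 or a <= 0 or n <= 0:
        raise ValueError("Ht, A and N must be positive")
    return (h_a / h_t) / (a / n)

# Values from the text: 2 actives among 6 hits; 4 actives in a database of 631 compounds.
ef = enrichment_factor(h_a=2, h_t=6, a=4, n=631)
ef_rounded = round(ef, 1)
```

A random selection of six compounds would have EF = 1 on average, so the value obtained here corresponds to a hit rate of 2/6 against a background rate of 4/631.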

Intermolecular interactions

In order to better understand the interactions of these eight molecules with MtSK, we used the program LIGPLOT [36] to assess which atoms of the small molecules and of the protein are responsible for hydrogen bonds and van der Waals contacts. A comparison of the MOLDOCK scores obtained for these ligands is not sufficient to predict activity; in vitro assays are necessary for that. Therefore, it is not possible to say that the selected compounds with the best MOLDOCK scores would be the most potent ones. We can only observe that, among the selected compounds, better scores indicate a greater potential to interact with the shikimate-binding cavity.

The docking simulation results corroborate the importance of some residues of the shikimate-binding site in establishing intermolecular interactions with the substrate as well as with the tested ligands. The binding of shikimate to its cavity involves pivotal residues that make protein-ligand interactions possible, as shown in Fig. 5. These residues are essential to ligand binding and, ultimately, to the reaction catalyzed by the enzyme. The SK residues that form intermolecular hydrogen bonds (HB) with shikimate are Gly80, Arg136 and Arg58. Shikimate makes van der Waals contacts with residues Ile45, Asp34, Pro11, Pro118, Gly79, Phe57, Leu119 and Gly81.

Fig. 5 Shikimate-binding pocket with main residues found in intermolecular interactions with shikimate

Information about intermolecular interactions for all eight top-scoring compounds is summarized in Table 2. Analysis of the shikimate-binding site indicated that all top-scoring compounds interact with residues Lys15, Ser16 and Arg117. Figure 6 shows the intermolecular interactions for the top-scoring compound (ligand 1, ZINC15707201). This figure is representative of the positioning of all top-scoring compounds in the shikimate-binding pocket. Ligands 1, 2, 4, 5, 6 and 8 also interact with residue Val116, suggesting that it is relevant to intermolecular interactions as well. Two previously published molecular docking studies focused on MtSK identified intermolecular interactions with the same residues [29, 30], further corroborating the pivotal importance of these residues for ligand-binding affinity. Especially interesting is the fact that these previous molecular docking studies analyzed completely different molecular moieties, such as dipeptides (arginine-aspartate/lysine-aspartate) [30] and triazole/tetrazole heteroaromatic systems [29]. These molecular structures were not present in the database used in the present study.

Table 2 Intermolecular interactions for the top-scoring ligands selected in the VS procedure. The presence of an X indicates that the interaction occurs. HB means hydrogen bonds and VDW means van der Waals contacts

Fig. 6 Shikimate-binding pocket with main residues found in intermolecular interactions with the top-scoring compound (ZINC15707201)

Conclusions

Advanced molecular docking algorithms available today make it possible to undertake large virtual screening studies on small-molecule libraries of up to millions of compounds. Here we described an efficient molecular docking protocol that was able to recover the crystallographic position of a ligand present in the active site of SK. Re-docking and cross-docking simulations generated RMSD values below 2 Å. The virtual screening protocol was able to confirm a known SK inhibitor, staurosporine, as a top-scoring compound and presents an enrichment factor of 52.6. Furthermore, the present work indicates new molecules with the potential to become drugs against TB. In addition, we identified the MtSK binding-cavity residues that are essential for the interactions of this enzyme with a variety of molecules. Analysis of the top-scoring compounds also indicates that MtSK is able to bind a variety of molecular moieties not previously identified.