Introduction

Tuberculosis (TB) remains one of the deadliest diseases worldwide. The World Health Organization estimates that one third of the world's population is infected with Mycobacterium tuberculosis, the causative agent of TB [1]. An estimated 8 million new cases of TB and 2 million deaths occur each year [2]. The reemergence of TB is due to the high incidence of AIDS, the proliferation of drug-resistant strains and the decline in health care structures and national surveillance. The emergence and spread of multi-drug resistant strains of M. tuberculosis (MDR-TB), defined as resistant to at least isoniazid and rifampicin, could threaten global TB control [3]. More recently, a survey of the frequency and distribution of extensively drug-resistant (XDR) TB cases showed that during 2000–2004, of 17,690 TB isolates, 20% were MDR and 2% were XDR [4]. XDR-TB is defined by the World Health Organization as resistance to at least rifampicin and isoniazid plus resistance to the fluoroquinolones and to at least one of the injectable drugs capreomycin, kanamycin and amikacin [5]. XDR-TB has a wide geographic distribution [6] and a high fatality rate [7]. The emergence of XDR-TB worldwide raises the bleak prospect of virtually untreatable TB. There is thus an urgent need for the development of new chemotherapeutic agents to treat TB.

CMP kinases belong to the family of nucleoside monophosphate kinases (NMK), which are key enzymes in the metabolism of nucleotides [8]. Substrate specificity studies on recombinant human UMP/CMP kinase (pyrimidine nucleoside monophosphate kinase) have shown that UMP and CMP are far better substrates than dCMP [9]. Bacterial cytidylate kinase, or cytidine monophosphate kinase (CMP kinase), catalyses the phosphoryl transfer from ATP to CMP and dCMP, resulting in the formation of nucleoside diphosphates. CMP kinase has been shown to be essential for bacterial growth in Streptococcus pneumoniae [10]. The crystal structure of Staphylococcus aureus CMP kinase has recently been solved [11]. The crystal structure of Escherichia coli CMP kinase resembles those of other NMP kinases, sharing common features such as a central five-stranded β-sheet connected by α-helices, a fingerprint sequence of Gly-X-X-Gly-X-Gly-Lys (P-loop), and an anion hole in the central cavity for substrate binding [12]. However, E. coli CMP kinase differs from other NMP kinases in containing an insertion of 40 residues and a short LID domain [12]. These differences could allow the rational design of drugs specific for the CMP kinase of pathogenic bacteria such as M. tuberculosis. However, no report has yet described the three-dimensional structure of M. tuberculosis CMP kinase. In the present work we modeled the structure of CMPK from M. tuberculosis (MtCMPK). The structure of MtCMPK in complex with cytidine-5′-monophosphate (CMP) was analyzed to identify the structural basis for the substrate interactions in the binding site and to predict the enzyme substrate specificity. The structural features and the structural stability were assessed by molecular dynamics (MD) simulation.

Materials and methods

Molecular modeling

Homology modeling is usually the method of choice when there is a clear relationship of homology between the sequence of a target protein and at least one experimentally determined three-dimensional structure. This computational technique is based on the assumption that the tertiary structures of two proteins will be similar if their sequences are related, and it is the approach most likely to give accurate results [13].

For modeling of MtCMPK we used restraint-based modeling as implemented in the program MODELLER 9v1 [14]. This program is an automated approach to comparative modeling by satisfaction of spatial restraints [15]. The modeling procedure begins with alignment of the sequence to be modeled (target) with related known three-dimensional structures (templates). This alignment is usually the input to the program, and the output is a three-dimensional model for the target sequence containing all main-chain and side-chain non-hydrogen atoms [16].

The high degree of primary sequence identity between MtCMPK (target) and E. coli CMPK indicates that the crystallographic structures of E. coli CMPK are good templates for modeling the MtCMPK enzyme. The alignment of MtCMPK (target) and E. coli CMPK is shown in Fig. 1 [17].

Fig. 1

Sequence alignment for EcCMPK and MtCMPK. The sequence identity between EcCMPK and MtCMPK is ∼40%. The alignment was performed using ClustalW and edited with BioEdit [17]
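
As an illustration of how a percent identity such as the one quoted above can be obtained from a pairwise alignment, a minimal Python sketch follows. It assumes that identity is counted over alignment columns in which both sequences carry a residue; other denominators (for example the length of the shorter sequence) are also in common use and give slightly different values, and the source does not state which was applied. The two toy sequences are invented for the example and are not the EcCMPK/MtCMPK alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and both strings come from the
    same alignment, so they have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap column: not counted in the denominator
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (made-up sequences, for illustration only)
toy_a = "MKV-AIDGPS"
toy_b = "MRVLAVDG-S"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```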

A total of 1000 models were generated for each binary complex and the final models were selected based on the MODELLER [14] objective function.
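
The selection step can be sketched as follows. This is a minimal illustration, assuming that the objective-function value of every generated model has already been collected into a dictionary and that lower values indicate better satisfaction of the spatial restraints; the file names and scores are hypothetical and do not come from this study.

```python
from typing import Dict


def best_model(scores: Dict[str, float]) -> str:
    """Return the name of the model with the lowest objective-function value."""
    if not scores:
        raise ValueError("no models to rank")
    return min(scores, key=scores.get)


# Hypothetical model names and objective-function values
example_scores = {
    "model_0001.pdb": 1523.4,
    "model_0002.pdb": 1498.7,
    "model_0003.pdb": 1601.2,
}
print(best_model(example_scores))
```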

Evaluation of binding affinity

Analysis of the interaction between a ligand and a protein target remains a scientific challenge. The affinity and specificity between a ligand and its protein target depend on directional hydrogen bonds and ionic interactions, as well as on shape complementarity of the contact surfaces of both partners [18]. We used the programs X-SCORE [19], SCORE [20] and PEARLS [21] to evaluate the binding affinity of the ligand for E. coli CMPK and MtCMPK.

Analysis of the models

The overall stereochemical quality of the final MtCMPK models was assessed with the program PROCHECK [22, 23] and the objective function supplied by MODELLER [14].

Molecular dynamics simulations

MD simulations were performed with the GROMACS [24] package using the Gromos 96.1 (43A2) force field. The CMP topology was generated with the PRODRG program [25]. Accurate force fields are essential for reproducing the conformational and dynamic behavior of condensed-phase systems. The Gromos 96.1 force fields are well parameterized for proteins, but the parameters for small molecules are still limited for simulations of more complicated biological systems. Accordingly, GAMESS was used to derive the atomic charges of the CMP molecule [26], which was submitted to single-point ab initio calculations at the RHF 6-31G* level in order to obtain Löwdin-derived charges. Manipulation of structures was performed with the Swiss-PDBViewer v3.7 program [27]. The first system was composed of the apoenzyme MtCMPK (system 1) and the second of the MtCMPK enzyme, three sulphate ions and the CMP ligand (system 2). Both systems were simulated for 3 ns. Na+ counter ions were added to both systems (six Na+ ions to system 1 and 14 Na+ ions to system 2) using the genion program of the GROMACS simulation suite to neutralize the negative charge of the systems.

Each structure was placed in the center of a truncated cubic box filled with simple point charge (SPC/E) water molecules [28], containing 14,993 water molecules for system 1 and 16,741 water molecules for system 2. The initial simulation cell dimensions were 44.78 × 47.23 × 49.38 Å for system 1 and 51.37 × 48.76 × 55.23 Å for system 2, and the protein was solvated by a layer of water molecules at least 10 Å thick in all directions in both systems. During the simulations, bond lengths within the proteins were constrained using the LINCS algorithm [29]. The SETTLE algorithm was used to constrain the geometry of water molecules [30]. In the MD protocol, all hydrogen atoms, ions, and water molecules were first subjected to 1000 steps of energy minimization by steepest descent followed by 500 steps of conjugate gradient to remove close van der Waals contacts. The systems were then submitted to a short molecular dynamics run with position restraints for a period of 20 ps, followed by full molecular dynamics without restraints. The temperature of the system was then increased from 50 to 300 K in 5 steps (50 to 100 K, 100 to 150 K, 150 to 200 K, 200 to 250 K, 250 to 300 K); the velocities at each step were reassigned according to the Maxwell-Boltzmann distribution at that temperature and equilibrated for 10 ps, except for the last step of the thermalization phase, which lasted 40 ps. Energy minimization and MD were carried out under periodic boundary conditions. The simulation was computed in the NPT ensemble at 300 K with Berendsen temperature coupling and a constant pressure of 1 atm with isotropic molecule-based scaling [31]. The LINCS algorithm, with a 10⁻⁵ Å tolerance, was applied to fix all bonds containing a hydrogen atom, allowing the use of a time step of 2.0 fs in the integration of the equations of motion. No extra restraints were applied after the equilibration phase.
The electrostatic interactions between nonligand atoms were evaluated by the particle-mesh Ewald method [32] with a charge grid spacing of ∼1.0 Å; the charge grid was interpolated on a cubic grid with the direct sum tolerance set to 1.0 × 10⁻⁵. The Lennard-Jones interactions were evaluated using a 9.0 Å atom-based cutoff [33].
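
The thermalization schedule described above can be tabulated with a short Python sketch. It assumes that the 2.0 fs time step also applies during heating and that the five temperature windows are of equal width; the function name and its parameters are illustrative and are not part of GROMACS.

```python
TIME_STEP_FS = 2.0  # integration time step stated in the text (assumed to hold during heating)


def heating_schedule(start_k=50, stop_k=300, n_stages=5,
                     stage_ps=10.0, last_stage_ps=40.0):
    """Return one (T_from, T_to, duration_ps, n_steps) tuple per heating stage."""
    increment = (stop_k - start_k) / n_stages
    stages = []
    for i in range(n_stages):
        t_from = start_k + i * increment
        t_to = t_from + increment
        # Every stage lasts stage_ps except the final one
        duration = last_stage_ps if i == n_stages - 1 else stage_ps
        n_steps = round(duration * 1000.0 / TIME_STEP_FS)  # ps -> fs -> steps
        stages.append((t_from, t_to, duration, n_steps))
    return stages


for t_from, t_to, duration, n_steps in heating_schedule():
    print(f"{t_from:5.0f} K -> {t_to:5.0f} K: {duration:4.0f} ps, {n_steps} steps")
```

Summing the durations gives the total length of the thermalization phase implied by the protocol.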

All analyses were performed on the ensemble of system configurations extracted at 0.5-ps time intervals from the simulation. MD trajectory collection was initiated after 1 ns of dynamics to ensure a fully equilibrated system. The MD simulations and the analysis of results were performed on a personal computer with an Intel Core 2 Duo E6300 processor (1.86 GHz) and 4 GB of RAM.

The convergence of the simulations was analyzed in terms of the secondary structure, the root-mean-square deviation (RMSD) from the initial model structures, and the root-mean-square fluctuation (RMSF).
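
For reference, the RMSD between two conformations with N matched atoms is the square root of the mean squared distance between corresponding atoms. A minimal Python sketch is given below; it assumes the two coordinate sets are already superimposed and listed in the same atom order, and it performs no fitting itself. The coordinates in the example are arbitrary.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Arbitrary two-atom example
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
snapshot = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(reference, snapshot))
```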

The RMSFs were calculated relative to the backbone structure averaged over the last 2 ns, and all coordinate frames from the trajectories were first superimposed on the initial conformation to remove any effect of overall translation and rotation.
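
The per-atom RMSF relative to an average structure can be sketched in Python as follows. The sketch assumes that the frames have already been superimposed on a common reference, as described above, and that every frame lists the same atoms in the same order; the two-frame trajectory is an arbitrary toy example.

```python
import math


def rmsf(frames):
    """Per-atom RMSF about the time-averaged position.

    frames: list of frames, each a list of (x, y, z) tuples, one per atom.
    Assumption: frames are pre-superimposed and share the same atom order.
    """
    if not frames or not frames[0]:
        raise ValueError("need at least one frame with at least one atom")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for atom in range(n_atoms):
        # Average position of this atom over all frames
        mean = [sum(frame[atom][k] for frame in frames) / n_frames
                for k in range(3)]
        # Mean squared displacement from that average position
        msf = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy trajectory: atom 0 is fixed, atom 1 moves along x
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy_frames))
```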

Results and discussion

Quality of the models

No crystallographic structure is available for MtCMPK; however, the sequence identity (38.70%) between MtCMPK and E. coli CMPK (PDB accession code: 1KDO) makes the E. coli CMPK structure a good template for modeling MtCMPK. The atomic coordinates of the crystallographic structure of the template were hence used as the basic model for MtCMPK modeling. The atomic coordinates of all water molecules were removed from the template.

The Ramachandran ϕ–ψ plot for the template (E. coli CMPK) was used to compare the overall stereochemical quality of the MtCMPK structure against that of the template solved by crystallography. The homology model has over 92.1% of its residues in the most favorable regions.
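
The ϕ and ψ angles plotted in a Ramachandran diagram are backbone dihedral angles: ϕ is defined by the atoms C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The following Python sketch computes a dihedral angle from four points. It is a generic geometric illustration, not the PROCHECK implementation, and the coordinates used in the example are arbitrary rather than taken from the model.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in the range (-180, 180], about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    b1n = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1n) * c for c in b1n))
    w = _sub(b2, tuple(_dot(b2, b1n) * c for c in b1n))
    x = _dot(v, w)
    y = _dot(_cross(b1n, v), w)
    return math.degrees(math.atan2(y, x))


# Arbitrary four-point example with the central bond along z
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```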

Overall description

The structural model of MtCMPK contains a seven beta-ribbon motif. The beta-ribbons are composed of residues 7–12, 33–37, 69–75, 78–84, 127–132, 145–150 and 202–206. Ten alpha-helices surround the beta-ribbon structure. The helical regions are composed of residues 19–30, 38–51, 58–66, 89–91, 95–105, 109–120, 134–138, 153–167, 173–189 and 211–225. Figure 2 shows a schematic drawing of the MtCMPK structure (monomer). As described in the M. tuberculosis H37Rv genome annotation [34], MtCMPK consists of 230 amino acids with a predicted molecular mass of 24,177.29 Da and a theoretical pI value of 5.05. E. coli CMPK consists of 223 amino acids with a predicted molecular mass of 24,746.34 Da and a theoretical pI value of 5.56. Analysis of both structures indicates the conservation of the residues that make intermolecular hydrogen bonds with cytidine-5′-monophosphate.
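
The residue ranges listed above can be converted into secondary-structure content with a few lines of Python. The sketch assumes that the ranges are inclusive and uses the chain length of 230 residues given in the text; the variable and function names are illustrative.

```python
# Residue ranges as listed in the text (assumed inclusive)
BETA_RANGES = [(7, 12), (33, 37), (69, 75), (78, 84),
               (127, 132), (145, 150), (202, 206)]
HELIX_RANGES = [(19, 30), (38, 51), (58, 66), (89, 91), (95, 105),
                (109, 120), (134, 138), (153, 167), (173, 189), (211, 225)]
CHAIN_LENGTH = 230  # number of amino acids in MtCMPK, from the text


def residues_in(ranges):
    """Total number of residues covered by a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def fraction_of_chain(ranges, chain_length=CHAIN_LENGTH):
    """Percentage of the chain covered by the given ranges."""
    return 100.0 * residues_in(ranges) / chain_length


print(f"beta:  {residues_in(BETA_RANGES)} residues, "
      f"{fraction_of_chain(BETA_RANGES):.1f}% of the chain")
print(f"helix: {residues_in(HELIX_RANGES)} residues, "
      f"{fraction_of_chain(HELIX_RANGES):.1f}% of the chain")
```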

Fig. 2

Tertiary structure of the apoenzyme MtCMPK. The structure is presented as a ribbon diagram. The structure contains ten alpha-helices surrounding a seven-stranded β-sheet. The image was generated using PyMOL [22]

MD simulations

We performed molecular dynamics simulations of the MtCMPK structure in the apo form (system 1) and of the MtCMPK:CMP complex (system 2) to elucidate the influence of CMP on the overall structure of MtCMPK. The root-mean-square deviation (RMSD) of the positions of all backbone C-alpha atoms from their initial configuration was calculated as a function of simulation time for both systems and is shown in Fig. 3. This figure indicates that the structure without ligand presents a higher RMSD than the structure of the binary complex.

Fig. 3

Graphical representation of the root-mean-square deviation (RMSD) of all Cα atoms from the starting structure of the models as a function of time. Panel a shows the RMSD of the apoenzyme MtCMPK; panel b shows the RMSD of the MtCMPK-CMP complex. The dashed line indicates the equilibration phase; the solid line shows the last 2 ns of calculation

As shown in Fig. 3a, the RMSD of system 1 increases over the course of the MD simulation, with relative stability between 250 and 1500 ps. System 2, on the other hand, reaches a plateau at an RMSD of 2.6 ± 0.2 Å for all C-alpha atoms after a rapid increase during the first 250 ps. Figure 3b suggests that the 2980 ps unrestrained simulation was sufficient to stabilize a fully relaxed MtCMPK-CMP model. Accordingly, the MtCMPK-CMP binary complex structure appears to be more stable than the apo form of the enzyme.

Superposition of the average structure of system 2 with the initial model (Fig. 4) does not show major conformational changes from the initial model, which is consistent with the relatively low RMSD value. System 1 presents high RMSD values; however, its secondary structure was preserved (Fig. 5), and these high values are due to the flexibility of the apo form of MtCMPK.

Fig. 4

Superposition of the average structure over the last 2000 snapshots with the initial minimized structure of MtCMPK-CMP. The structures are presented as ribbon diagrams. The average structure is colored light gray; the initial structure is colored dark gray

Fig. 5

Superposition of the average structure over the last 2000 snapshots with the initial minimized structure of the apoenzyme MtCMPK. The structures are presented as ribbon diagrams. The average structure is colored light gray; the initial structure is colored dark gray

The flexibility of the proteins was assessed by the RMSF values from the MD trajectories, which reflect the flexibility of each residue in the molecule (Fig. 6). The major backbone fluctuations occur in the loop regions and in the region surrounding the beta-alpha-beta fold, whereas regions with low RMSF correspond exclusively to the rigid beta-alpha-beta fold. These results indicate the stability of our model structures.

Fig. 6

Graphical representation of the root-mean-square fluctuations (RMSF) of all Cα atoms from the starting structure of the models as a function of time. The graphic shows the RMSF of the apoenzyme MtCMPK and of the MtCMPK-CMP complex. The average over the last 2 ns of calculation is shown as a dashed line for the apoenzyme MtCMPK and as a solid line for the MtCMPK-CMP complex

LID domain

Like E. coli CMP kinase, the MtCMPK structure model contains a short LID domain [12]. Shikimate kinase from M. tuberculosis (MtSK), a member of the NMP kinase family, has three domains: the CORE, LID and NMP-binding domains. A comparison between the crystal structure of MtSK in complex with MgADP and that of Erwinia chrysanthemi SK suggested a concerted conformational change of the LID and shikimate binding (SB) domains upon nucleotide binding [35, 36]. More recently, a comparison between the structures of the MtSK-MgADP-shikimate dead-end ternary complex and the MtSK-MgADP binary complex showed that the LID and SB domains undergo concerted movements toward each other [37], resulting in an additional closure of the enzyme active site. The RMSF values calculated for both systems described here suggest that the short LID domain in MtCMPK may undergo conformational changes like those observed for the structure of MtSK (Fig. 6).

The LID of MtCMPK corresponds to residues Arg160-Asp174. The large mobility of the LID domain is consistent with the need for this region to undergo conformational changes upon substrate binding to shield the enzyme active site from water in order to avoid ATP hydrolysis [38]. The difference in behavior of the LID domain between system 1 and system 2 is evident: the LID domain appears in a more open conformation in the apoenzyme than in MtCMPK-CMP (Fig. 7), indicating that CMP stabilizes this region, which, like the substrate binding loop, is involved in substrate entrance and exit.

Fig. 7

Superposition of the apoenzyme MtCMPK with the structure of MtCMPK-CMP showing the LID domain. The distance between the Leu169 C-alpha atoms of the two structures is shown. The structure of the apoenzyme MtCMPK is colored light gray and the MtCMPK-CMP structure is colored dark gray

The NMPbind domain

The NMPbind domain has also been described as undergoing motions when it binds the phosphoryl acceptor substrate. The NMPbind domain of MtCMPK corresponds to the segment Gly38-Glu123. Like GMP kinases from yeast, MtCMPK has a two-stranded β-sheet in the NMPbind domain [12], in contrast to EcCMPK, which contains a three-stranded antiparallel β-sheet [11]. The RMSD between MtCMPK and MtCMPK-CMP over the last 2 ns is 9.02 Å, demonstrating a large motion in this region. These data are in agreement with the RMSF values, which confirm the differences between the ligand-free enzyme and the enzyme in complex with CMP.

The sulphate ion bound in the phosphate donor site

MtCMPK contains the classical mononucleotide-binding motif: a β strand (β1 for MtCMPK) followed by a helix (α1), connected by a glycine-rich loop with a strongly conserved fingerprint sequence Gly-X-X-Gly-X-Gly-Lys (P-loop), where X stands for any amino acid [8]. This motif forms a giant anion hole, also known as the P-loop. In MtCMPK, this motif corresponds to residues Gly13-X-X-Gly16-X-Gly18-Lys. The RMSF values indicate the high flexibility of helix α1 (Lys19-Leu30) and the phosphate donor site (Gly13-Gly18) in the apo form. Compared with the ligand-free model, the presence of CMP substantially stabilizes this structure and draws it closer together. It has been suggested that negatively charged ions with tetrahedral geometry (such as sulphate and phosphate ions) can inhibit kinases from binding nucleotides bearing a β-phosphate [39].
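
A fingerprint of this kind can be located in a protein sequence with a regular expression. The Python sketch below searches a one-letter-code sequence for Gly-X-X-Gly-X-Gly-Lys and reports 1-based start positions. The test sequence is artificial and is not the MtCMPK sequence, and the function name is illustrative.

```python
import re

# G-x-x-G-x-G-K in one-letter code; the lookahead lets overlapping hits be reported
P_LOOP = re.compile(r"(?=(G[A-Z]{2}G[A-Z]GK))")


def find_p_loops(sequence):
    """Return (1-based start position, matched segment) for each motif occurrence."""
    return [(match.start() + 1, match.group(1))
            for match in P_LOOP.finditer(sequence.upper())]


# Artificial sequence used only to exercise the pattern
print(find_p_loops("AAAAGAAGAGKAAAA"))
```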

Interaction with cytidine-5′-monophosphate

The specificity and affinity between an enzyme and its inhibitor depend on directional hydrogen bonds and ionic interactions, as well as on shape complementarity of the contact surfaces of both partners [40]. Analysis of the hydrogen bonds between cytidine-5′-monophosphate and MtCMPK reveals ten intermolecular hydrogen bonds, compared with 14 for EcCMPK; the residues involved in the interaction with the ligand are shown in Table 1. Analysis of the affinity constants of the protein-ligand complexes calculated by the programs X-SCORE, SCORE and PEARLS (Table 2) reveals that the pKd value of 4.93 calculated by the SCORE program is the closest to the experimental value of 4.45 pKd units [41]. EcCMPK has a higher affinity for CMP than MtCMPK, which is consistent with the larger number of hydrogen bonds for the former as compared with the latter.
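
To relate pKd values to more familiar quantities, recall that Kd = 10^(−pKd) and that the corresponding binding free energy is ΔG = RT ln Kd = −RT ln(10) · pKd for a 1 M standard state. The Python sketch below applies these textbook relations. The temperature of 298.15 K is an assumption, since the source does not state the temperature at which the experimental value was measured.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def kd_from_pkd(pkd):
    """Dissociation constant in mol/L from pKd = -log10(Kd)."""
    return 10.0 ** (-pkd)


def binding_free_energy(pkd, temperature_k=298.15):
    """Binding free energy in kcal/mol: dG = RT ln(Kd) = -RT ln(10) * pKd.

    Assumption: 1 M standard state; temperature defaults to 298.15 K.
    """
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(10.0) * pkd


for label, pkd in (("SCORE", 4.93), ("experimental", 4.45)):
    print(f"{label}: Kd = {kd_from_pkd(pkd):.2e} M, "
          f"dG = {binding_free_energy(pkd):.2f} kcal/mol")
```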

Table 1 Intermolecular contacts of EcCMPK and MtCMPK with CMP
Table 2 pKds for EcCMPK and MtCMPK

Conclusions

The molecular models of MtCMPK show that the interaction with CMP is favorable, as observed for EcCMPK. The results obtained are in agreement with experimental data [12], demonstrating that the LID domain, NMPbind domain and phosphate donor site have low mobility when complexed with CMP and undergo large motions in the ligand-free form. Moreover, the data presented here suggest that the mode of action of MtCMPK may be similar to that of MtSK, with open/closed conformational changes of the LID domain and the substrate binding loop involved in substrate entrance and exit.