Introduction

According to the World Health Organization [1], 21 Latin American countries are endemic for Chagas disease, which is caused by the parasite Trypanosoma cruzi (T. cruzi), and about 6–7 million people are infected with T. cruzi [2]. Chagas disease has two phases: an initial acute phase that lasts about 2 months with mild or no symptoms, followed by a chronic phase, in which 30 % (10 %) of patients suffer from cardiac (digestive) disorders before death or heart failure [1].

Chagas disease can be cured with benznidazole and nifurtimox if treatment is initiated soon after infection [3]. Both drugs are activated by a NADH-dependent, mitochondrially localized, bacterial-like, type I nitroreductase (NTR) and resistance to these compounds is associated with loss of NTR [4]. It is well established that both drugs are far from being ideal because of their controversial efficacies in chronic phase, severe host toxicity, and long treatment durations [5].

The identification of new protein targets for drug development against Chagas disease is therefore needed, and strategies to identify enzymatic functions specific to T. cruzi are highly desirable. Indeed, the enzyme forms that are absent in humans can be used as targets for the development of new drugs that would have a low chance of side effects [6–8].

Recently, several proteins have been proposed as new biotargets against Chagas disease and 3D models were built based on homology modeling followed by molecular dynamics simulations. These T. cruzi biotargets include NADH-dependent fumarate reductase [9], ribosomal P0 antigenic protein [10], silent-information regulator 2 proteins or sirtuins [11], and a putrescine–cadaverine permease [12], among others. While these studies provide useful information on specific targets, a large-scale search of promising proteins is desirable.

In this context, the analogous enzyme pipeline (AnEnPi), which compares genomic datasets for analogous enzymes by clustering the primary structures of enzymes with the same described activity, is a tool able to identify enzymatic activities that may be potentially useful for the design of new therapeutic targets. AnEnPi searches for enzymes specific to a parasite and enzymes that are analogous for the same enzymatic function, but structurally distinct in the parasite and its human host [13]. Using AnEnPi with a Blastp similarity raw score of 120 as cut-off, Capriles et al. identified 41 protein sequences classified as analogous or specific for T. cruzi compared to H. sapiens on the basis of similarity detection by the local alignment program BLAST, and then used automatic 3D structure modeling to infer differences between the host and parasite protein structures [14].

In this study, we revisit these 41 protein candidates both in terms of specificity and 3D modeling by using a more complex strategy and more accurate techniques. Indeed, it is known that profile–profile alignments are more sensitive than the local alignment program PSIBLAST for similarity detection [15, 16], and automatic 3D modeling methods lead to significant structural errors for sequence identities <80 % [17]. The workflow for T. cruzi protein modeling we developed consists of the following steps: sequence clustering, prediction of cell localization and enzyme classification (EC) number and analysis of metabolic pathways, homology search in H. sapiens and other organisms, 3D template identification, model generation, and refinement by molecular dynamics (MD) simulations.

Overall, we find that it is possible to cluster the 41 protein sequences identified as analogous or specific for T. cruzi versus H. sapiens into seven groups, and based on the current Protein Data Bank (PDB) structures, it is possible to provide structures for 33 T. cruzi sequences associated with five very probable enzymatic activities: ATPase (EC: 3.6.3.6), trypanothione reductase (1.8.1.12), 2,4-dienoyl-CoA reductase (1.3.1.34), cysteine synthase (2.5.1.47), and leishmanolysin (3.4.24.36). The implications of our results for drug development are discussed.

Materials and methods

Our pipeline for the 41 protein sequences is shown in Fig. 1.

Fig. 1

Workflow used for protein characterization

Sequence clustering

We used the sequence search of the PFAM server to identify the PFAM domains for each of the 41 sequences [18]. Sequences having the same PFAM domains were then grouped into clusters.
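The grouping step can be illustrated with a minimal Python sketch. It assumes that the PFAM hits have already been retrieved and stored as a mapping from sequence identifier to domain accessions, and that "same PFAM domains" means identical domain sets regardless of order. The function name, identifiers, and accessions are placeholders, not data from this study.

```python
from collections import defaultdict


def cluster_by_domains(domains_by_seq):
    """Group sequence IDs whose sets of PFAM domains are identical."""
    clusters = defaultdict(list)
    for seq_id, domains in domains_by_seq.items():
        # A frozenset ignores domain order and repeated domains (assumption).
        clusters[frozenset(domains)].append(seq_id)
    return [sorted(members) for members in clusters.values()]


# Placeholder input: hypothetical identifiers and accessions.
example = {
    "seqA": ["PF00001", "PF00002"],
    "seqB": ["PF00002", "PF00001"],
    "seqC": ["PF00003"],
}
clusters = cluster_by_domains(example)
print(clusters)
```

Here seqA and seqB fall into one cluster because they carry the same two domains, while seqC forms a cluster of its own.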

Cell localization prediction

We used both the EUK-mPloc 2.0 and CELLO v.2.5 servers [19, 20]. Note that the EUK-mPloc 2.0 learning dataset includes 8897 protein sequences (7766 different proteins), classified into 22 eukaryotic subcellular locations.

EC number prediction and metabolic pathway identification

Pathway graphs can be thought of as three information layers: (i) the lowest layer is associated with the metabolites that act as substrates (input) and products (output) of enzymatic reactions, (ii) the intermediate layer consists of the enzymes that perform the enzymatic reaction with which the biological engineer intended to interfere through drug inhibition, and (iii) the highest organization level is given by the topology of a metabolic path in which enzymes are inserted. An enzyme within a non-redundant path is a better target than an enzyme in a redundant one since its inactivation will inevitably result in the elimination of the corresponding product. If that path is a key path for the whole parasite metabolism, the target inhibition is expected to have deleterious consequences on the parasite biology. Considering these three organization levels [21], we used KEGG [22, 23] to map the EC number of an enzyme target and MetaCyc [24] to retrieve information about the metabolic reactions of each enzymatic reaction. We also checked that our predictions are consistent with the AnEnPi server [13].
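The redundancy argument can be made concrete with a small Python sketch. It is not the KEGG or MetaCyc analysis used here. It only assumes a toy network given as (enzyme, substrates, products) triples and asks whether a product can still be made once one enzyme is switched off. All names in the example are hypothetical.

```python
def producible(reactions, seeds, disabled=None):
    """Return every metabolite reachable from the seeds, skipping one enzyme."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, products in reactions:
            if enzyme == disabled:
                continue
            # A reaction fires once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def is_non_redundant(reactions, seeds, enzyme, product):
    """True if the product is normally made but is lost without the enzyme."""
    return (product in producible(reactions, seeds)
            and product not in producible(reactions, seeds, disabled=enzyme))


# Toy network: E2 and E3 both lead to C (redundant); only E4 makes D.
toy = [
    ("E1", ["A"], ["B"]),
    ("E2", ["B"], ["C"]),
    ("E3", ["A"], ["C"]),
    ("E4", ["C"], ["D"]),
]
print(is_non_redundant(toy, {"A"}, "E2", "C"))
print(is_non_redundant(toy, {"A"}, "E4", "D"))
```

In this toy case E2 is bypassed by E3, so inhibiting it does not remove C, whereas inhibiting E4 eliminates D.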

Homology identification

To identify homologs in H. sapiens and other organisms, we used HHpred [25]. A homolog was identified if the sequence identity was >30 % for an amino acid coverage of more than 90 %.
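The two thresholds can be written as a simple filter. The Python sketch below assumes that coverage is measured as the fraction of the query sequence spanned by the alignment, which is one common convention and is not stated explicitly above. The function name and the example numbers are illustrative only.

```python
def is_homolog(identity_pct, aligned_residues, query_length,
               min_identity=30.0, min_coverage=90.0):
    """Apply the identity (>30 %) and coverage (>90 %) thresholds."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # Coverage relative to the query length is an assumption of this sketch.
    coverage_pct = 100.0 * aligned_residues / query_length
    return identity_pct > min_identity and coverage_pct > min_coverage


# Illustrative values for a query of 898 residues.
print(is_homolog(36.0, 880, 898))  # high identity, near-full coverage
print(is_homolog(24.0, 866, 898))  # identity too low
print(is_homolog(35.0, 400, 898))  # coverage too low
```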

3D modeling of T. cruzi targets

We searched for template 3D structures using the HHsearch method hosted on Mobyle against the PDB and SCOP databases [26]. The 3D models were then generated by the HHalign-Kbest procedure [27], which performs HMM alignments based on primary sequences and 3D structures. Note that if HHalign-Kbest identified a hit with a sequence identity >35 % (or <35 %), 20 (or 50) models were automatically created. The output consists of the five best models ranked according to their Z-scores awarded by Qmean4 [28], and only the best model was used for homology modeling.
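The model-count rule and the final selection can be summarized in a few lines of Python. This sketch does not call HHalign-Kbest or Qmean4. It assumes that a higher Z-score denotes a better model and that an identity of exactly 35 % falls in the low-identity case, neither of which is specified above. The function names, model names, and scores are made up for illustration, while the two identities passed to the first function (82 % and 31 %) are values reported in the Results.

```python
def models_to_build(identity_pct, threshold=35.0):
    """Return 20 models above the identity threshold and 50 otherwise."""
    # Treating exactly 35 % as the low-identity case is an assumption.
    return 20 if identity_pct > threshold else 50


def best_model(scored_models):
    """Return the name of the model with the highest Z-score."""
    # "Highest Z-score is best" is an assumption of this sketch.
    return max(scored_models, key=lambda item: item[1])[0]


# Hypothetical (name, Z-score) pairs.
candidates = [("model_03", -1.8), ("model_11", -0.6), ("model_07", -2.4)]
print(models_to_build(82.0), models_to_build(31.0), best_model(candidates))
```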

When the template did not cover the full sequence, we examined the putative biological information of the protein (aqueous solution, anchored or inserted into the membrane), and then used the template-based C8-Scorpion [29], Porter 4.0 [30], and PrDOS [31] servers to determine whether the missing residues are predicted to have secondary structure content or to be disordered.

Model refinement by molecular dynamics

Simulations were performed with the Charmm22* force field [32] at pH 7, i.e., with deprotonated Glu and Asp and protonated Arg and Lys for all proteins, and neutral His with a protonated Nε atom for all proteins except leishmanolysin. This all-atom force field has been able to properly fold many soluble proteins [33–35]. The soluble proteins were centered in a cubic box of TIP3P water molecules [36] with a box extended 1.5 nm outside the protein on all sides, and the appropriate numbers of Na+ and Cl− ions were added to ensure neutral systems. The leishmanolysin protein simulation includes the Zn2+ ion covalently bonded to the three histidines of the protein by application of a harmonic restraint on the bond lengths connecting Zn2+ to His. The ATPase transmembrane protein was immersed in a 1-palmitoyl-2-oleoylphosphatidylethanolamine (POPE) bilayer and then centered in a cubic box of TIP3P molecules [37]. Each system was subjected to energy minimization with steepest descent and TNPACK [38], followed by a 1-ns MD simulation in the NPT ensemble [39], and a 1-ns NVT MD simulation with a velocity-rescaling thermostat found to sample the canonical ensemble [40].
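With the protonation states given above, the number of counterions needed for neutrality follows directly from the sequence. The Python sketch below is a back-of-the-envelope estimate, not the procedure used to build the systems. It assumes that only Arg, Lys, Asp, and Glu carry a charge, that the charges of the two termini cancel, and that no cofactor or metal ion is present. The function names and the example sequence are hypothetical.

```python
def net_charge(sequence):
    """Net charge with Arg/Lys at +1, Asp/Glu at -1, and neutral His."""
    positive = sum(sequence.count(residue) for residue in "RK")
    negative = sum(sequence.count(residue) for residue in "DE")
    # The N- and C-terminal charges are assumed to cancel each other.
    return positive - negative


def neutralizing_ions(sequence):
    """Return the number of Na+ or Cl- ions needed to reach zero net charge."""
    charge = net_charge(sequence)
    return {"NA": max(-charge, 0), "CL": max(charge, 0)}


# Hypothetical peptide: 2 positive (K, R) and 3 negative (D, E, E) residues.
print(neutralizing_ions("MKDEEHR"))
```

A bound Zn2+ ion, as in the leishmanolysin system, would add +2 to the total and is not handled by this sketch.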

The GROMACS program (version 5.03) was used to perform the MD simulations with periodic boundary conditions [41]. The bond lengths with hydrogen atoms were fixed with the LINCS algorithm and the equations of motion were integrated with a time step of 2 fs [42]. The electrostatic interactions were calculated using the particle mesh Ewald method and a cutoff of 1.1 nm [43]. A cutoff of 1.2 nm was used for the van der Waals interactions. The nonbonded pair lists were updated every 10 fs. In what follows, the analysis is based on MD simulations of 100 ns at 300 K. The MD-generated structures were evaluated with respect to the minimized structure using the Cα root-mean-square deviation (RMSD), and clustered using an RMSD cutoff of 3 Å.
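The two analysis steps, RMSD and cutoff-based clustering, can be sketched in plain Python. This is not the GROMACS implementation, and the clustering algorithm actually used is not specified above. The sketch assumes that all frames have already been superimposed on a common reference, and it uses a simple greedy "leader" scheme chosen here only for brevity. The function names and the toy coordinates (in Å) are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) Cα positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def leader_cluster(frames, cutoff=3.0):
    """Assign each frame to the first cluster whose leader lies within the cutoff."""
    clusters = []  # list of (leader_frame, [member indices])
    for index, frame in enumerate(frames):
        for leader, members in clusters:
            if ca_rmsd(leader, frame) <= cutoff:
                members.append(index)
                break
        else:
            clusters.append((frame, [index]))
    # Most populated cluster first.
    return sorted((members for _, members in clusters), key=len, reverse=True)


# Toy two-atom "frames"; real input would hold one Cα position per residue.
f1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
f2 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
f3 = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
groups = leader_cluster([f1, f2, f3])
print(ca_rmsd(f1, f2), groups)
```

The population of the first group divided by the number of frames gives the kind of percentage quoted for the most populated cluster in the Results.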

Model evaluation

The quality of the models was evaluated by using the meta-server PSVS 1.5, which analyzes the Ramachandran plot and provides Molprobity and Procheck scores [44]. Following CASP10 recommendations [45], models were also investigated by ModFOLD4 [46], which returns an estimate of both the global and local (per-residue) quality of 3D protein models.

Hydrophobicity analysis of active sites was performed using the Kyte and Doolittle scale [47], as implemented in Chimera [48], and electrostatics were calculated using the PDB2PQR on-line facility, which performs Poisson–Boltzmann electrostatics calculations [49].
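The Kyte and Doolittle scale assigns each amino acid a hydropathy value, from −4.5 (Arg, most hydrophilic) to 4.5 (Ile, most hydrophobic). The Python sketch below averages these values over a set of residues. Chimera colors the molecular surface residue by residue, so the averaging is a simplification introduced here, and the function name is hypothetical. The example uses the three active-site residues reported in the ATPase section as differing between T. cruzi (D359, L361, T392) and H. sapiens (K423, T425, M460).

```python
# Kyte-Doolittle hydropathy values, one-letter amino acid codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(residues):
    """Average Kyte-Doolittle value over a string of one-letter codes."""
    if not residues:
        raise ValueError("at least one residue is required")
    return sum(KYTE_DOOLITTLE[residue] for residue in residues) / len(residues)


t_cruzi = mean_hydropathy("DLT")    # D359, L361, T392
h_sapiens = mean_hydropathy("KTM")  # K423, T425, M460
print(round(t_cruzi, 2), round(h_sapiens, 2))
```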

For two protein targets, we also examined the vibrational frequency modes of T. cruzi and H. sapiens models by using the Elnemo [50] and WEBnm@ [51] on-line services.

Results and discussion

Table S1 in Supplementary Materials presents the 41 targets considered in this study, with their identification numbers in the T. cruzi genome, amino acid lengths, family descriptions, PFAM domains, and predicted EC numbers. Our predicted EC numbers are consistent with the original genome function annotation list from GeneDB [52]. Using our methodology, we are able to cluster these 41 sequences identified by Capriles et al. as analogous or specific for T. cruzi versus H. sapiens into seven groups based on PFAM analysis and enzymatic activity. Group 4, with eight sequences related to cruzipain (EC: 3.4.22.51), and group 5, with one sequence related to triacylglycerol lipase (EC: 3.1.1.3), were eliminated after PFAM protein domain analysis because they include large domains (>250 amino acids) of unknown functions (e.g., DUF3586 in cruzipain) that lack any sequence homology with available PDB structures and cannot be modeled by de novo in silico methods [53–55] at the moment. The remaining five groups are related to ATPase (EC: 3.6.3.6, two sequences), trypanothione disulfide reductase (EC: 1.8.1.12, two sequences), 2,4-dienoyl-CoA reductase (EC: 1.3.1.34, two sequences), leishmanolysin (EC: 3.4.24.36, 23 sequences), and cysteine synthase (EC: 2.5.1.47, two sequences). In the first three groups, as one sequence is a fragment of the other, we analyzed the longer sequence. Both cysteine synthase proteins have 332 residues and share 98 % sequence identity, and one sequence was modeled. The 23 sequences in the leishmanolysin group have between 516 and 621 amino acids, and as explained below, we modeled the structure of one sequence.

Table 1 summarizes the main results of our procedure for the five sequences considered for 3D structure and refinement. For each target, we give its predicted cell localization, whether we found significant homology hit with H. sapiens and distant organisms in terms of sequences and PDB structures, and finally the amino acid alignment between the T. cruzi and homologous structures. Among these five sequences, only the trypanothione disulfide reductase and the cysteine synthase sequences have clear homologs in H. sapiens. We now describe the details of our analysis for each protein and its implication for drug design.

Table 1 Cluster representatives

ATPase (predicted EC: 3.6.3.6)

The T. cruzi ATPase sequence of 898 amino acids has a clear homolog (36 % sequence identity) with an H+-transporting ATPase in Arabidopsis thaliana (PDB: 3B8C [56], region 7–885, EC 3.6.3.6). The T. cruzi sequence has 26 % sequence identity with an ATPase in Oryctolagus cuniculus (PDB: 3AR4 [57], region 1–994, EC: 3.6.3.8), and 23 % with a Na+/K+ transporting ATPase in Mus musculus (region 9–851, EC: 3.6.3.9). There is no clear homolog in H. sapiens, as there is at most 24 % sequence identity with a calcium transporting ATPase (region 7–872, EC: 3.6.3.8).

Using the X-ray structure from A. thaliana (PDB: 3B8C) as a template for T. cruzi and H. sapiens sequences, there is a Cα RMSD of 1.2 Å between the T. cruzi model and the template, and 1.0 Å between the T. cruzi and H. sapiens structures. Figure S1 shows the superposition of the T. cruzi model on the 3B8C template. However, among five residues important for ACP (phosphomethylphosphonic acid adenylate ester) binding in 3B8C, three residues are conserved (F385 (453), K408 (479), and R441 (522) in T. cruzi (H. sapiens) amino acid numberings), but L443 in T. cruzi is changed to G517 in H. sapiens, i.e., changing the flexibility of the backbone, and more importantly, the K423 residue in H. sapiens is mutated to D359 in T. cruzi, i.e., changing the electrostatic feature of the active site. This makes the choice of this template questionable.

Because the X-ray structure from A. thaliana has a very low resolution (3.6 Å), the X-ray structure from Oryctolagus cuniculus (PDB: 3AR4) with a resolution of 2.15 Å is definitely a better structural template and the sequence alignment shown in Figure S2 was used for the homology modeling and MD refinement. Figure 2a shows the superposition of the T. cruzi structure on the 3AR4 template with a Cα RMSD of 0.2 Å (free of any minimization) with the location of the six transmembrane α-helices (residues 50–99, 219–284, 614–644, 664–725, 749–799, and 823–854 in T. cruzi). Stability of the T. cruzi structure was assessed by 100 ns MD simulation at 300 K, which led to a mean RMSD of 5.3 Å from the minimized structure (Fig. 2d). Figure 2e shows the superposition of the most populated cluster representing 85 % of the conformational MD ensemble on the minimized structure. The minimized and all MD generated structures have good but lower quality than the structure arising from comparative modeling using Ramachandran and ModFOLD4 metrics (Fig. S3).

Fig. 2

a ATPase (EC 3.6.3.6). T. cruzi model (cyan), superimposed on O. cuniculus structure (PDB: 3AR4, orange) in explicit membrane environment. b Top: Focus on the active site using the template (orange) with ATP (gray) of T. cruzi model (cyan) in right and H. sapiens model (green). Middle: Electrostatic potential (Kb.T.ec−1) – blue: negative, red: positive. Bottom: Hydrophobicity – orange: hydrophobic, blue: hydrophilic. c Varying amino acids of the active site in the three species. d RMSD variation as the function of time (ns), relative to the initial structure. e Superimposition of the initial structure (cyan) and most populated cluster (red) over a 100-ns period molecular dynamics

The T. cruzi sequence is predicted to be a P-type H+-exporting ATPase (EC: 3.6.3.6) found in the oxidative phosphorylation pathway and allowing proton transport across the plasma membrane to generate the electrochemical potential gradient of cells. P-type ATPases catalyze the selective active transport of ions like H+, Na+, K+, Ca2+, Zn2+, and Cu2+ across diverse biological membrane systems, and several molecules have been shown to inhibit ATPase activity to different degrees [58]. For instance, artemisinin was found to inhibit growth of cultured T. cruzi at concentrations in the low micromolar range and inhibit Ca2+-dependent ATPase activity in T. cruzi membrane [59]. Also, miltefosine was found to inhibit the Na+-ATPase. This compound was, however, also found to inhibit the protein kinase C present in the plasma membrane of T. cruzi [60].

The alignment of the T. cruzi and H. sapiens structures (RMSD of 1.2 Å) using an O. cuniculus template suggests that the machinery for the dynamic and rotational motion of the cytoplasmic-compartment sector with respect to the membrane sector, driven by the H+ electrochemical potential gradient, is conserved. This is supported by normal mode analysis of the two systems, free of ATP ligand, since the lowest frequency modes are almost identical (data not shown). Figure 2b zooms in on the active sites in the template, T. cruzi and H. sapiens and shows their hydrophobic and electrostatic surfaces. As seen, while the hydrophobic surface is slightly changed (bottom panel), the electrostatic potential (middle panel) is completely changed from T. cruzi to H. sapiens, being more negative in H. sapiens than in T. cruzi. Figure 2c shows that four residues of the active site crucial for ATP binding are strictly conserved among the template, T. cruzi and H. sapiens, i.e., F385, K408, R441, and L443 in T. cruzi amino acid numbering. As seen in Fig. 2b (bottom panel), the mutated residues between T. cruzi (H. sapiens) are D359 (K423), L361 (T425), and T392 (M460). Interestingly, these three mutations observed in T. cruzi are strictly conserved in T. rangeli, T. vivax, T. brucei, T. congolense, B. saltans, L. major, T. grayi, and G. theta, indicating a strong evolutionary pressure on these positions in related species.

Whether these three mutations in or near the active site that change the electrostatic potential can govern the selectivity of the ATP-driven pumps for a drug in T. cruzi with respect to H. sapiens, and make the ATPase a suitable candidate, as suggested by the chemogenomics resource for neglected tropical diseases (TDR) [61] and cell surface proteome analysis of human-hosted T. cruzi life stages [62], remains to be explored.

Trypanothione reductase (predicted EC: 1.8.1.12)

The 492-amino-acid trypanothione reductase (TryR) in T. cruzi has a clear homolog in Trypanosoma brucei (PDB: 2WPF [63], EC: 1.8.1.12) with a sequence identity of 82 % over the amino acid region 1–488. The T. cruzi sequence also has 42 and 39 % sequence identities with two glutathione disulfide reductases with EC: 1.8.1.7 in Pseudomonas aeruginosa and Nostoc punctiforme, and 36 % sequence identity with a glutathione reductase (EC: 1.8.1.7) in H. sapiens. As a result, the X-ray structure (PDB: 2WPF), consisting of a homo-dimer with 2 × 488 amino acids, was selected as structural template and the sequence alignment shown in Figure S4 was used for the homology modeling of the T. cruzi and H. sapiens sequences. The superposition of the 3D models created by HHalign-Kbest for H. sapiens and T. cruzi on the template leads to small RMSD deviations of 1 and 0.6 Å, respectively (Fig. 3a, b). Analysis of the 100-ns MD simulation of the T. cruzi model at 300 K shows that the system is very stable (Fig. 3d), with one unique cluster and a mean RMSD of 2.5 Å from the full homo-dimer minimized structure (Fig. 3e). As for the ATPase models, the MD-generated structures of trypanothione reductase have lower quality than the structure obtained by comparative modeling as evaluated by the Ramachandran and ModFOLD4 metrics (Fig. S3). Under the harmonic approximation, the T. cruzi and H. sapiens structures show identical atomic fluctuations and deformation energies projected onto the calculated normal modes along the amino acid sequence (data not shown).

Fig. 3

a Dimer structure of the trypanothione disulfide reductase (EC 1.8.1.12). T. cruzi model (cyan), superimposed on T. brucei structure (PDB: 2WPF, orange) and H. sapiens model (green). Gray depicts the second monomer. b Focus on only chain A of T. cruzi and H. sapiens superimposed on template structure. c Focus on the active site. Top left: Template (orange) with WPF inhibitor (gray) and T. cruzi (cyan). Top right: T. cruzi and H. sapiens (green). Middle: Electrostatic potential (Kb.T.ec−1) – blue: negative, red: positive. Left: T. cruzi, right: H. sapiens. Bottom: Hydrophobicity, orange: hydrophobic, blue: hydrophilic. Left: T. cruzi, right: H. sapiens. d RMSD variation as the function of time (ns), relative to the initial structure. e Superimposition of the initial structure (cyan) and most populated cluster (red) over 100-ns period molecular dynamics

Trypanothione reductase is an essential enzyme of the unique trypanothione-based thiol metabolism of Trypanosomatidae. Trypanothione is an unusual form of glutathione containing two molecules of glutathione joined by a spermidine (polyamine) linker. TryR is a flavoenzyme that catalyzes the reaction trypanothione + NADP+ ↔ trypanothione disulfide + NADPH + H+ [64]. Trypanosomatids lack both glutathione reductase and thioredoxin reductase, and as a result, TryR is the only connection between NADPH- and thiol-based redox systems, the latter substituting for many antioxidant functions [65].

The active sites in T. cruzi and T. brucei have both hydrophilic and hydrophobic characteristics, while the active site in H. sapiens is highly hydrophobic (Fig. 3c, bottom). Further analysis of the active sites shows that all residues binding the WPF ligand (3,4-dihydroquinazoline inhibitor) are conserved between T. cruzi and T. brucei, while only a single cysteine at positions 53 and 42 is conserved between T. cruzi and H. sapiens (Fig. 3c, top), therefore drastically changing the electrostatic potential surface (Fig. 3c, middle). Clearly, the low amino acid conservation of the catalytic site between Trypanosomatidae and humans, and the importance of this enzyme for protozoan parasites, make this protein target very attractive for drug development.

It has to be emphasized that previous studies have reported T. cruzi trypanothione reductase inhibitors such as the antimicrobial chlorhexidine and a piperidine derivative [66]. Recently, Lavorato et al. described the antitrypanosomal activity and cytotoxicity profile of 20 novel 1,3-bis(aryloxy)propan-2-amine derivatives as new candidates for further development as potential anti-trypanosomal agents [67]. Also, 82 novel TryR inhibitors down to the nM range were identified by using a combined in vitro and in silico screening approach [68].

2,4-dienoyl-CoA reductase (predicted EC: 1.3.1.34)

The 717-amino-acid T. cruzi protein shares 31 % sequence identity with the 2,4-dienoyl-CoA reductase (DECR1, EC: 1.3.1.34) from E. coli (region 1–671) with a structure determined by X-ray (PDB: 1PS9) at a 2.2 Å resolution [69]. The cellular localization and EC number in T. cruzi are predicted to be identical to those in E. coli. The DECR1 enzyme participates in the beta-oxidation and metabolism of polyunsaturated fatty enoyl-CoA esters. DECR1 catalyzes the reaction trans-2,3-dehydroacyl-CoA + NADP+ ↔ trans,trans-2,3,4,5-tetradehydroacyl-CoA + NADPH + H+. The T. cruzi sequence also has 34 % sequence identity with a protein in Pseudomonas aeruginosa with the same EC number (1.3.1.34). Searching in the proteome of H. sapiens, the T. cruzi sequence shares only 19 % sequence identity with two proteins differing by EC numbers (4.2.1.22 and 4.1.1.17) and covering only the amino acid region 391–717.

Using the sequence alignment shown in Fig. S5 for homology modeling, the superposition of the T. cruzi model (sequence 4–717) on the 1PS9 structure (sequence 1–671) has an RMSD of 0.3 Å prior to energy minimization (Fig. 4a). Analysis of the 100-ns MD simulation at 300 K shows that the T. cruzi model fluctuates about 5 Å from its minimized structure (Fig. 4c), and the large RMSD deviation comes from the long loop regions that need to be introduced from residues 666 to 717 (Fig. S5). Analysis of the MD-generated structures leads to two clusters, with the first cluster representing 90 % of the ensemble (Fig. 4d) and high quality as assessed by ModFOLD4 values (Fig. S3).

Fig. 4

a 2,4-dienoyl-CoA reductase (NADPH) (EC: 1.3.1.34). T. cruzi model (cyan), superimposed on E. coli structure (PDB: 1PS9, orange). b Focus on the active site. Top: Template (orange) with FMN inhibitor (gray) and T. cruzi (cyan). Middle: Electrostatic potential (Kb.T.ec−1) – blue: negative, red: positive. Bottom: Hydrophobicity – orange: hydrophobic, blue: hydrophilic. c RMSD variation as the function of time (ns), relative to the initial structure. d Superimposition of the initial structure (cyan) and most populated cluster (red) over a 100-ns period molecular dynamics

There are currently no chemical compounds associated with the T. cruzi gene, as reported in the TDR database. Most catalytic residues that allow FMN (flavin mononucleotide) substrate binding are conserved between the T. cruzi and E. coli sequences (G57, Q100, R214, R288, A330, and R331 in E. coli vs. G68, Q111, R230, R310, A332, and R333 in T. cruzi) and lead to a positive electrostatic field (Fig. 4b). The two residues S24 and H26 in E. coli are changed into P30 and Y32 in T. cruzi (Fig. 4b), but these two mutations are not sufficient to alter the polar features of the binding site, located in an inner enzyme region (Fig. 4c). Interestingly, humans have an enzyme for the same function, but with a completely different 3D structure (homotetramer, each of 302 amino acids, PDB 1W6U), and a distinct catalytic center [70], making DECR1 very suitable for the design of drugs specific to T. cruzi.

Cysteine synthase (predicted EC: 2.5.1.47)

The T. cruzi sequence of 332 amino acids has a clear homolog (73 % sequence identity) with a cysteine synthase of 324 amino acids from Leishmania donovani (region: 2–319, EC: 2.5.1.47, PDB: 3TBH at a 1.74-Å resolution [71]). The T. cruzi sequence also shares 55 % sequence identity with a cysteine synthase in Arabidopsis thaliana (EC: 2.5.1.47) and 33 % sequence identity with a human protein (region 6–332 with EC: 4.2.1.22, a cystathionine β-synthase). As a result, the homodimer structure from Leishmania donovani was selected as structural template and the sequence alignment shown in Figure S6 was used for the homology modeling of the T. cruzi protein. Figure 5a and b show the superposition of the predicted HHalign-Kbest model on the Leishmania donovani structure with an RMSD of 0.2 Å. Analysis of the 100-ns MD simulation at 300 K of the homodimer of T. cruzi shows an average RMSD of 4.5 Å with respect to the minimized structure (Fig. 5d), with the most populated cluster representing 70 % of the conformational ensemble (Fig. 5e) and having high quality as measured by the PSVS metrics (Fig. S3).

Fig. 5

a Dimer structure of the cysteine synthase (EC: 2.5.1.47). T. cruzi model (cyan), superimposed on L. donovani structure (PDB: 3TBH, orange). Gray depicts the second monomer. b Focus on only chain A of T. cruzi superimposed on template structure. c Focus on the active site. Top: Template (orange) and T. cruzi (cyan). Middle: Electrostatic potential (Kb.T.ec−1) – blue: negative, red: positive. Bottom: Hydrophobicity – orange: hydrophobic, blue: hydrophilic. d RMSD variation as the function of time (ns), relative to the initial structure. e Superimposition of the initial structure (cyan) and most populated cluster (red) over 100-ns period molecular dynamics

There is currently no chemical compound associated with the T. cruzi gene, as reported in the TDR database [61]. Cysteine synthase is a key constituent for the survival of trypanosomatids, and thus vital to the survival of T. cruzi in its vertebrate hosts. It constitutes one of the key pathways in the parasite defense against oxidative stress. Two different routes for cysteine biosynthesis have been described: reverse-transsulfuration (RTS) and de novo pathways. RTS has been demonstrated in fungi and mammals and includes the complete process leading to cysteine from methionine via the intermediary formation of cystathionine. These reactions are catalyzed by two enzymes: CβS (cystathionine β-synthase) and CGL (cystathionine γ-lyase). The de novo pathway, which proceeds in two steps starting with the formation of O-acetylserine from L-serine and acetyl-coenzyme A by serine acetyltransferase, is found in plants, bacteria, and some protozoa, such as Entamoeba histolytica, Leishmania major, and Leishmania donovani, but is absent in mammals [72].

The T. cruzi sequence is a clear homolog of an O-acetylserine sulfhydrylase from L. donovani. The active site with the Asp-Gly-Ser-Gly-Ile ligand is almost fully conserved in terms of amino acids and hydrophilic character from T. cruzi to L. donovani, with only a single amino acid change (T80 changed to S79) (Fig. 5c). As shown in Fig. S6, the residues involved in the active sites with cysteine synthase and cystathionine β-synthase activities are conserved in T. cruzi, L. donovani, and H. sapiens and the differences in the physico-chemical properties of the residues flanking the active sites are minor (Fig. S7). Clearly, the high conservation of the active site and the presence of this enzyme in humans do not make this protein a good target for drug design specific to T. cruzi. However, there have been several reports of inhibitor screening and development against this molecule from different organisms such as Entamoeba histolytica and Mycobacterium tuberculosis, using an off-catalytic site strategy, and it was suggested that among the open form, the intermediate state, and the closed form of the enzyme, the intermediate state might be the ideal target for the design of very effective high-affinity inhibitors [73].

Leishmanolysin (predicted EC: 3.4.24.36)

The superfamily of leishmanolysin consists of 23 T. cruzi sequences with 516–621 amino acids and cross-sequence identities varying between 37 and 96 %. Multiple sequence alignment using ClustalW [74] shows that 22 sequences (except the smallest of 516 amino acids with a truncated N-terminus) have a common domain of 565 amino acids with > 30 % sequence identity and, among these, there are seven sequences of 567 amino acids and seven sequences with > 570 amino acids displaying either longer N-terminus (two sequences), longer C-terminus (four sequences), or longer N- and C-termini (1 sequence). The N-terminus can be extended by up to 56 amino acids.

Taking the TC00.1047053511211.90 sequence of 567 amino acids as a representative of the 23 T. cruzi sequences, the region 77–527 shares 37 % sequence identity with a leishmanolysin from Leishmania major (region: 103–558, EC: 3.4.24.36, PDB: 1LML at a 1.8-Å resolution [75]). This sequence has at most a sequence identity of 25 % with a protein from H. sapiens with EC = 4.2.1.22 (region 69–603), but the alignment shows a gap percentage of 16 %. Based on this result, the X-ray structure 1LML was selected as structural template of the region 77–527, and it remained to determine whether the missing residues 1–76 and 528–567 could be modeled by other means. The N-terminus cannot be easily modeled since leishmanolysin is a membrane-bound zinc proteinase, active in situ, but the exact anchorage of the N-terminus to the membrane is unknown. Using the best secondary structure prediction methods, i.e., C8-Scorpion and Porter 4.0, the region 528–567 is predicted to be free of any secondary structure. Similarly, the C-terminus region is predicted as disordered using PrDOS, preventing therefore its 3D modeling. Note that this intrinsic disorder of the C-terminus is shared by the other 21 sequences.
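Sequence identity and gap percentage, as quoted above, are both simple column counts over a pairwise alignment. The Python sketch below shows one way to compute them. Definitions differ between tools, so it assumes that identity is taken over the columns where both sequences have a residue and that the gap percentage is taken over all alignment columns. The function name and the toy alignment are hypothetical, and HHpred may use different conventions.

```python
def alignment_stats(aligned_a, aligned_b):
    """Return (percent identity, percent gapped columns) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))
    gapped = sum(1 for a, b in columns if a == "-" or b == "-")
    paired = len(columns) - gapped
    identical = sum(1 for a, b in columns if a == b and a != "-")
    # Identity over residue-residue columns only (assumption).
    identity = 100.0 * identical / paired if paired else 0.0
    # Gap percentage over all alignment columns (assumption).
    gap_pct = 100.0 * gapped / len(columns) if columns else 0.0
    return identity, gap_pct


# Toy alignment: 4 identical pairs out of 5, and 1 gapped column out of 6.
identity, gap_pct = alignment_stats("MKT-LV", "MRTALV")
print(round(identity, 1), round(gap_pct, 1))
```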

The sequence alignment shown in Figure S8 was used for the homology modeling, and the superposition of the T. cruzi structure on its template (with an RMSD of 0.4 Å) is shown in Fig. 6a. Figure 6b zooms in on the active site in the template and T. cruzi and shows the high hydrophilic character and negative electrostatic field of the active site. The three histidines essential for Zn2+ binding are conserved in both species, as well as the residues Glu and Met. Stability of the T. cruzi model complexed with Zn2+ was assessed by 100-ns MD simulation at 300 K. The system has an average RMSD of 5.5 Å from the minimized structure (Fig. 6c), and the most populated cluster, representing 50 % of the conformational MD ensemble (Fig. 6d), has very good quality according to the PSVS and ModFOLD4 metrics (Fig. S3).

Fig. 6

a Leishmanolysin (EC: 3.4.24.36). T. cruzi model (cyan) superimposed on the L. major structure (PDB: 1LML, orange). b Focus on the active site. Top: template (orange) with zinc ion (gray) and T. cruzi model (cyan). Middle: electrostatic potential (kBT/ec) – blue: negative, red: positive, gray: Zn ion. Bottom: hydrophobicity – orange: hydrophobic, blue: hydrophilic. c RMSD variation as a function of time (ns), relative to the initial structure. d Superimposition of the initial structure (cyan) and the most populated cluster (red) over the 100-ns molecular dynamics simulation

Leishmanolysin is the predominant glycoprotein surface antigen of promastigotes of various species of Leishmania. The crystal structure revealed three domains, two of which had novel folds at the time. The N-terminal domain has a structure similar to the catalytic modules of zinc proteases, which shows that leishmanolysin is a member of the metzincin class of zinc proteinases [75]. The similarity of the active site structure and amino acid composition of the T. cruzi model to previously well-characterized metzincin-class zinc proteinases suggests testing known zinc-metallopeptidase inhibitors, such as 1,10-phenanthroline or peptide-based inhibitors [76].

We emphasize that, for the five targets, we have evaluated the stability of the binding-site residues by calculating the Cα RMS deviations of the active site residues in the minimized structure and in the most populated (best) MD cluster with respect to their starting positions. All residues in the active site remain at their native positions although the ligands (ATP in ATPase, WPF in trypanothione reductase, FMN in NADPH and Asp-Gly-Ser-Gly-Ile in cysteine synthase) are absent in the simulations, with most residues deviating by less than 0.3 Å and at most by 1 Å. We have also checked that the contacts between the designed protein receptors and their natural ligands are reasonable by analyzing the Cα and Cβ deviations of the active site residues in the T. cruzi minimized structure and in the most populated MD cluster with respect to the PDB structure used for homology modeling, without any minimization. Small values of 0.1–0.2 Å indicate that the contact maps of the minimized T. cruzi models bound to their corresponding ligands are very similar to the contact maps of the homologous proteins. As expected, the largest values are observed for the most populated MD cluster of each T. cruzi model, but all remain below 1.5 Å. Finally, we have verified that the dynamics does not change the hydrophobic/hydrophilic character and the electrostatic potential of the active sites; these properties are essentially conserved from the minimized structure to the most populated MD cluster.
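The per-residue Cα deviation check described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this work: it assumes that the two structures have already been superimposed in a common reference frame, so no fitting step is performed, and the residue labels and coordinates are hypothetical.

```python
import math

def ca_deviations(reference: dict, model: dict) -> dict:
    """Per-residue Cα displacement (Å) between two coordinate sets.

    Assumption: both sets are already superimposed; keys are residue labels
    and values are (x, y, z) Cα coordinates in Å.
    """
    if reference.keys() != model.keys():
        raise ValueError("both structures must contain the same residues")
    return {res: math.dist(reference[res], model[res]) for res in reference}

def rmsd(deviations: dict) -> float:
    """Root-mean-square of the per-residue deviations (Å)."""
    values = list(deviations.values())
    return math.sqrt(sum(d * d for d in values) / len(values))

# Hypothetical active-site Cα coordinates (Å), for illustration only.
start = {"H1": (0.0, 0.0, 0.0), "H2": (3.8, 0.0, 0.0), "H3": (7.6, 0.0, 0.0)}
cluster = {"H1": (0.3, 0.0, 0.0), "H2": (3.8, 0.4, 0.0), "H3": (7.6, 0.0, 0.0)}

per_residue = ca_deviations(start, cluster)
print({res: round(d, 2) for res, d in per_residue.items()})
print(f"active-site Cα RMSD = {rmsd(per_residue):.2f} Å")
print("largest deviation:", max(per_residue, key=per_residue.get))
```

Inspecting the per-residue values alongside the overall RMSD makes it possible to state both the typical deviation and the largest one, as done in the text.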

Conclusions

The identification of specific T. cruzi protein targets for drug development is desirable. In recent years, drug discovery for Chagas disease has evolved from limited compound testing in manual assays to sophisticated in vitro and in vivo assays. Drug discovery must take into account that T. cruzi is a genetically diverse organism with phenotypic variations, and that inhibitor susceptibility is a well-established phenotypic variation both in vitro and in vivo [77]. In the next 5 years, 200,000 people living with Chagas disease will die from heart disease and related complications, and the BENEFIT trial pursues a global initiative of diagnosis, treatment, and research [78].

In this study, by following a complex bioinformatics procedure, we have proposed structural models for five T. cruzi enzymes with ATPase, trypanothione reductase, 2,4-dienoyl-CoA reductase, cysteine synthase, and leishmanolysin functions. Based on sequence and active-site similarity with several organisms, notably H. sapiens, we show that these protein targets, except cysteine synthase, are very attractive for drug development against T. cruzi. This development based on computer simulations is not straightforward, however. It requires exploring a much longer dynamic time-scale of the enzymes and enzyme/inhibitor complexes with advanced simulation methods so as to evaluate the correct binding pose [79, 80], and it requires time-consuming free energy procedures to predict accurate binding affinities that go beyond simple scores [81–83].