Abstract
All eukaryotes express mitogen-activated protein kinases (MAPKs) that govern diverse cellular processes including proliferation, differentiation, and survival. Even though these proteins are highly conserved throughout nature, MAPKs from closely related species often possess distinct signature sequences, making them well suited as drug discovery targets. Based on the central amino acid in the TXY dual phosphorylation loop, mammalian MAPKs are classified as extracellular signal-regulated kinases (ERKs), c-Jun amino-terminal kinases (JNKs), or p38 stress-response MAPKs. The presence of MAPKs in nonmetazoan eukaryotes suggests significant evolutionary conservation of these important signalling pathways. We recently cloned a novel stress-response MAPK gene (tgMAPK1) from Toxoplasma gondii, an obligate intracellular human parasite that can cause life-threatening infections in immunocompromised patients, and we now present data on a second T. gondii MAPK gene (tgMAPK2) that we cloned. We show that tgMAPK1 and tgMAPK2 are members of two distinct and previously unknown protozoan MAPK subfamilies that we have named pzMAPK1/pzMAPK3 and pzMAPK2. Our phylogenetic analysis of a collection of protozoan and metazoan MAPK genes in relation to ERK8-like genes demonstrates that an ERK8-like family, which includes the pzMAPK2 subfamily, is represented across a large variety of eukaryotic kingdoms and is evolutionarily very distant from other MAPK families.
Introduction
Mitogen-activated protein kinases (MAPKs) play pivotal roles in signal transduction pathways and have been identified in all eukaryotes examined thus far. MAPKs are part of the eukaryotic protein kinase (ePK) superfamily consisting of seven major groups: (1) the cyclic nucleotide and calcium/phospholipid-dependent kinases (the AGC group); (2) the CMGC group, comprising the (2a) cyclin-dependent kinases (CDK), (2b) MAPKs, (2c) glycogen-synthase kinases (GSK), and (2d) CDK-like kinases; (3) calmodulin-dependent kinases (CaMK); (4) tyrosine kinases (TyrK); (5) the casein kinase 1 (CK1) group; (6) the STE group, which includes many signaling molecules functioning in MAPK pathways; and (7) the tyrosine kinase-like (TKL) group, which includes enzymes that are related to those in the TyrK group although they are serine-threonine protein kinases (Hanks 2003; Hanks and Quinn 1991).
All MAPKs are proline-directed serine/threonine kinases that phosphorylate substrates with a proline residue in the P + 1 site of the substrate recognition consensus motif ψX[S,T]P, where ψ represents either proline or an aliphatic amino acid (Clark-Lewis et al. 1991). All MAPKs must possess the [LIVM][TS]XX[LIVM]XT[RK][WY]YRXPX [LIVM] [LIVM] signature sequence, where X represents any amino acid (Kültz 1998). Historically, mammalian MAPKs have been classified as extracellular signal-regulated kinases (ERKs), c-Jun amino-terminal kinases (JNKs), or p38 stress-response MAPKs, based on whether the central amino acid in the TXY dual phosphorylation loop (Roux and Blenis 2004) is a glutamic acid, proline, or glycine, respectively (Martin-Blanco 2000).
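The signature and TXY rules above can be written as a short pattern match. The following is a minimal sketch rather than the procedure used in this study: the function name `classify_mapk` and the toy peptide fragments in the usage note are hypothetical, and the code assumes that the [TS]XX triplet at the start of the signature corresponds to the dual phosphorylation loop.

```python
import re

# MAPK signature from the text, with X (any residue) written as "."
SIGNATURE = re.compile(r"[LIVM][TS]..[LIVM].T[RK][WY]YR.P.[LIVM][LIVM]")

# Central residue of the TXY loop -> traditional mammalian class
TXY_CLASS = {"E": "ERK", "P": "JNK", "G": "p38"}


def classify_mapk(seq):
    """Return (has_signature, loop_motif, class_label) for a protein sequence.

    Assumption: the [TS]XX triplet that opens the signature is treated as
    the activation-loop motif.
    """
    match = SIGNATURE.search(seq)
    if match is None:
        return False, None, None
    loop = match.group(0)[1:4]
    if loop[0] == "T" and loop[2] == "Y":
        # A TXY loop whose central residue is not E, P, or G (for example
        # the protozoan TDY motif) is reported as "other".
        return True, loop, TXY_CLASS.get(loop[1], "other")
    return True, loop, "non-TXY"
```

For example, `classify_mapk("GFLTEYVATRWYRAPEIMLNS")` reports a TEY loop of the ERK type, whereas a fragment carrying TGH in the same position is reported as non-TXY.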
In 1998, Kültz produced a comprehensive phylogenetic and functional analysis of all MAPK sequences that had been identified at that time. This analysis included 93 sequences, primarily representing animals, plants, and fungi. Based on the early divergence of organisms within these three kingdoms, the MAPK family of proteins is thought to have evolved 1.0 to 1.5 billion years ago. Kültz’s analysis determined that the MAPK family could be subdivided into three major subgroups: the stress-activated protein kinases (SAPKs), the extracellular signal-regulated kinases (ERKs), and a third group, which he called the MAPK3 subgroup. The first two groups reproduced the traditional dichotomy, with members of the ERK subgroup sharing the TEY dual phosphorylation motif and members of the SAPK subgroup represented by either the TGY or TPY motifs. The final group largely consisted of “oddball” sequences that did not share a high degree of similarity with either of the other two MAPK subgroups. Among the sequences in the MAPK3 group were five protozoan sequences: two from Trypanosoma brucei, one from Leishmania mexicana, and two from Plasmodium falciparum. These were intermingled with unusual human and rat MAPKs, as well as sequences from Caenorhabditis elegans and Dictyostelium discoideum. Kültz’s phylogenetic results for the protozoa indicated little similarity among most of these sequences, and there was no evidence to suggest the existence of one or more distinct protozoan subfamilies.
As increasing numbers of MAPKs have been discovered in organisms of diverse evolutionary origin, it has become clear that a more robust classification system is needed. Since the time of Kültz’s study, there has been considerable progress in the discovery and characterization of protozoan MAPKs, particularly with respect to the Apicomplexa. Additional MAPK sequences have been identified in P. falciparum (Carlton et al. 2002), Cryptosporidium hominis (Xu et al. 2004), Cryptosporidium parvum (Abrahamsen et al. 2004), and Theileria annulata (Pain et al. 2005), and we have recently characterized a MAPK in T. gondii (Brumlik et al. 2004). Motivated by our identification of a second MAPK in T. gondii, we have analyzed a collection of Apicomplexan MAPK sequences along with a diverse sample of other protozoan and non-protozoan MAPKs. The phylogenetic results presented in this report offer a reevaluation of Kültz’s overly broad “MAPK3” subgroup, with our findings strongly supporting the existence of at least two distinct protozoan subfamilies.
Materials and Methods
Toxoplasma gondii Sequence Identification and Selection
We translated the T. gondii genomic database (Release 3.0; http://www.toxodb.org), which affords 10-fold coverage of the avirulent Me49 genome, in all possible open reading frames (ORFs) that were greater than 50 codons in length. In our previous study, we used the highly conserved MAPK motif HRDLKP to screen for tgMAPK1, which was the first MAPK gene to be identified and characterized in T. gondii (Brumlik et al. 2004). In the current study we wished to identify any possible homologues of tgMAPK1 using a slightly less stringent screen. To do this, we rescreened the T. gondii genomic database for genes encoding an HRD[LIVM]KP motif instead. To ensure that the presence of introns did not obscure our screen, we additionally screened the genome using other highly conserved MAPK motifs (GXGXXG, DFGLXR, RWYRXP), thereby covering all possible ORFs in the T. gondii genome. This screening strategy identified a second MAPK gene candidate (designated tgMAPK2) which possessed the classical TXY dual phosphorylation motif (Hanks and Quinn 1991; Roux and Blenis 2004). We subsequently cloned this gene by reverse-transcriptase polymerase chain reaction (RT-PCR), using total RNA that was isolated from both the avirulent Me49 strain and the virulent RH strain. The primers used for PCR are listed in Table 1; the RT-PCR methodology was the same as we described previously (Brumlik et al. 2004). Following gel purification, amplicons were either directly sequenced using an ABI 377 Automated Sequencer (Applied Biosystems, Foster City, CA) or were first cloned into a pCR2.1-TOPO vector (Invitrogen, Carlsbad, CA) prior to sequencing. Primers used for RT-PCR were also utilized for sequencing (Table 1). We additionally used the forward and reverse sequencing primers that were provided with the pCR2.1-TOPO vector kit (Invitrogen).
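The motif screen described above can be illustrated with a short six-frame translation. This is only a sketch under stated assumptions, not the software used in this study: an ORF is taken to be any stop-free stretch longer than `min_codons` in one of the six frames, the standard genetic code is assumed, and the names `screen`, `translate`, and `reverse_complement` are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard table is appropriate for the nuclear genome).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Conserved MAPK motifs named in the text, with X written as "."
MOTIFS = {
    "HRD[LIVM]KP": re.compile(r"HRD[LIVM]KP"),
    "GXGXXG": re.compile(r"G.G..G"),
    "DFGLXR": re.compile(r"DFGL.R"),
    "RWYRXP": re.compile(r"RWYR.P"),
}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    # Codons containing ambiguous bases are translated as "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def screen(dna, min_codons=50):
    """Return (strand, frame, motif_names, orf) for every ORF with a motif hit."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Stop codons ("*") delimit the candidate ORFs in this frame.
            for orf in translate(seq[frame:]).split("*"):
                if len(orf) <= min_codons:
                    continue
                found = [name for name, rx in MOTIFS.items() if rx.search(orf)]
                if found:
                    hits.append((strand, frame, found, orf))
    return hits
```

Because each motif is matched within a single stop-free stretch, a gene whose HRD[LIVM]KP exon is interrupted can still be recovered through one of the other three motifs, which is the rationale for screening with several motifs.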
Sequence Selection
Representative sequences from plants, animals, and fungi were chosen from major subfamilies in the ERK and SAPK families to reflect the diversity within each of these taxonomic groups; these sequences are identified by previously reported accession numbers (Kültz 1998). To incorporate recently identified MAPKs, we used the predicted amino acid sequences encoded by the tgMAPK1 and tgMAPK2 genes to perform BLASTp analyses against all sequence databases available at the National Center for Biotechnology Information (Altschul et al. 1997). We also performed BLASTp searches of the Apicomplexan Database Resources (http://www.apidb.org), an NIH/NIAID-funded Bioinformatics Resource Center, to search for additional MAPKs not yet included in the NCBI databases. Our search of these databases detected a third putative T. gondii MAPK which has recently been annotated (T. gondii genomic database; release 3.0; http://www.toxodb.org), a sequence with a TGH motif that can only be activated by a single phosphorylation event at its threonine. We have designated this MAPK TgMAPK3 for the purposes of this analysis.
After filtering out those sequences that did not possess all required residues within the MAPK signature sequence and accounting for duplication in the databases, we were initially left with a very large assortment of over one hundred putative protozoan MAPK sequences. To avoid the inclusion of nearly identical sequences, we restricted this set to include only a single representative species for each genus (P. falciparum for Plasmodium, L. major for Leishmania, C. hominis for Cryptosporidium, etc.), resulting in a collection of 35 protozoan sequences. Since our primary focus was the analysis of the three T. gondii MAPKs and their Apicomplexan MAPK homologues, we selected sequences from other well-studied protozoa based on their similarity to TgMAPK1, TgMAPK2, or TgMAPK3. An analysis based on pairwise sequence similarity identified two sequences in L. major, two sequences in T. brucei, and two sequences in Tetrahymena thermophila as good candidates, and we included these six sequences in our analysis as representatives from non-Apicomplexan protozoan species. The final selection of 32 protozoan and nonprotozoan MAPK sequences is provided in Table 2.
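The first two filtering steps (signature check, then one representative species per genus) might be sketched as follows. This is an illustration only: the record layout, the function name `select_representatives`, and the fallback rule for genera not named in the text (keep the first species encountered) are assumptions, and the later similarity-based selection is not modelled.

```python
import re

# MAPK signature (Kültz 1998), with X written as "."
SIGNATURE = re.compile(r"[LIVM][TS]..[LIVM].T[RK][WY]YR.P.[LIVM][LIVM]")

# Representative species named in the text; any other genus falls back
# to the first species seen for it (an assumption of this sketch).
PREFERRED = {
    "Plasmodium": "falciparum",
    "Leishmania": "major",
    "Cryptosporidium": "hominis",
}


def select_representatives(records):
    """Filter (binomial, name, sequence) records.

    Keeps records that carry the full signature, drops exact duplicate
    sequences, and retains only the representative species of each genus.
    """
    seen_sequences = set()
    chosen_species = {}
    kept = []
    for binomial, name, seq in records:
        if not SIGNATURE.search(seq) or seq in seen_sequences:
            continue
        seen_sequences.add(seq)
        genus, species = binomial.split()[:2]
        representative = chosen_species.setdefault(
            genus, PREFERRED.get(genus, species))
        if species == representative:
            kept.append((binomial, name, seq))
    return kept
```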
Alignment
Sequences were aligned using ClustalW version 1.83 (Thompson et al. 1994). Due to the considerable variation in sequence length among the 32 sequences aligned (see Table 2) and the sensitivity of our alignment to slight changes in the ClustalW gap and mismatch penalty parameters within certain nonconserved regions, blocks of unambiguously aligned positions were selected using Gblocks (Castresana 2000) and the alignment was manually adjusted using BioEdit 7.0.5.2 (Hall 1999). Allowed substitution groups for conserved positions were [LIVM], [DN], [EQ], [ST], [KR], and [FYW], as previously described (Kültz 1998).
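The rule for calling a position conserved can be stated compactly in code. The sketch below assumes an alignment given as equal-length strings with "-" for gaps, and treats any column containing a gap as not conserved; the function names are hypothetical.

```python
# Allowed substitution groups for conserved positions (Kültz 1998)
GROUPS = [set("LIVM"), set("DN"), set("EQ"), set("ST"), set("KR"), set("FYW")]


def column_is_conserved(column):
    """True if a column is invariant or falls within one substitution group."""
    residues = set(column)
    if "-" in residues:          # assumption: gapped columns never count
        return False
    if len(residues) == 1:       # invariant position
        return True
    return any(residues <= group for group in GROUPS)


def conserved_positions(alignment):
    """Return 0-based indices of conserved columns in an alignment."""
    return [i for i, column in enumerate(zip(*alignment))
            if column_is_conserved(column)]
```

For the toy alignment `["LTDKA", "ITNRG", "VSDKA"]`, the first four columns fall within the [LIVM], [ST], [DN], and [KR] groups, while the last column (A, G, A) does not.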
Phylogenetic Analysis
Two sets of phylogenetic trees, the first including 10 Apicomplexan sequences (A-Seqs) and the second including 28 selected protozoan, plant, yeast, and metazoan sequences (B-Seqs), were constructed by the methods of maximum likelihood (ML), neighbor-joining (NJ), and Bayesian inference (BI) using PHYLIP version 3.63 (Felsenstein 1989) and MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003). Phylogenetic results were visualized using the TreeView package (Page 1996).
For the ML analysis, the PROML algorithm was used to estimate the topology and relative branch lengths. The Jones-Taylor-Thornton (JTT) transition matrix (Jones et al. 1992) was selected as the amino acid substitution model, with rate variation across sites modelled according to the gamma distribution along with an invariant class (gamma+I). Five categories were selected to approximate the gamma distribution for rate variation, with one reserved for the invariant sites. The gamma rate α and the proportion of invariant sites p were estimated by ProtTest version 1.2.6 (Abascal et al. 2005), with α = 1.52 and p = 0.129 for the A-Seqs and α = 1.41 and p = 0.135 for the B-Seqs. In each case, we ran the PROML algorithm specifying a slow and thorough analysis, global rearrangements, and random jumbling of the sequences three times. Under the same modelling assumptions, we then ran the PROML algorithm on 100 bootstrap replicates that were obtained using SEQBOOT (Felsenstein 1985) and constructed the majority consensus tree (Margush and McMorris 1981) using CONSENSE. Global rearrangements and multiple sequence jumbling were not selected for the bootstrap datasets because of the considerable computational time that was required for each run.
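The bootstrap step resamples alignment columns with replacement, so that each replicate has the same number of positions as the original alignment. A minimal sketch of that resampling, which is not the SEQBOOT implementation itself and uses hypothetical function names and a fixed seed for reproducibility, is:

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every sequence, so columns
    # stay intact and only their multiplicity changes.
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=100, seed=1):
    """Return a list of bootstrap replicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A tree is then estimated from each replicate, and the fraction of replicate trees that contain a given clade is reported as its bootstrap support.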
For the NJ analysis, distances were first estimated using PROTDIST according to the JTT distance matrix with a gamma correction for rate variation. Parameter estimates for the gamma rate variation model (without invariant sites), obtained using ProtTest, were α = 0.935 for the A-Seqs and α = 0.793 for the B-Seqs. Phylogenies were constructed using the NJ algorithm (Saitou and Nei 1987) as implemented in NEIGHBOR. Five hundred bootstrap replicates were obtained using SEQBOOT and analyzed using the same modeling assumptions, and the majority consensus tree was produced using CONSENSE.
The Bayesian analysis was run using MrBayes with default priors. The JTT model with gamma+I rate variation was specified a priori, and four gamma rate categories were used to approximate the distribution. For the A-Seqs, 2 runs of 4 chains each (3 heated, 1 cold) were run for 200,000 generations, with sampling every 100th generation, producing 2000 samples from the posterior distribution, of which the first 500 were discarded as burn-in. The final standard deviation of the split frequencies between the two runs was 0.0036, and mean parameter estimates for the gamma+I model were α = 1.669 and p = 0.139. For the B-Seqs, 2 runs of 4 chains each (3 heated, 1 cold) were run for 500,000 generations, with sampling every 100th generation, and the first 1000 of the resulting 5000 samples were discarded as burn-in. The final standard deviation of the split frequencies was 0.005 with mean parameter estimates of α = 1.403 and p = 0.157.
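The sample counts quoted above follow directly from the run length and sampling interval. The arithmetic is sketched below, counting samples as in the text (generations divided by the sampling interval, without an extra sample for the starting state); the function name is hypothetical.

```python
def mcmc_samples(generations, sample_every, burn_in):
    """Return (total samples, samples kept, burn-in fraction)."""
    total = generations // sample_every
    kept = total - burn_in
    return total, kept, burn_in / total


# A-Seqs: 200,000 generations, sampled every 100th, 500 discarded
a_total, a_kept, a_fraction = mcmc_samples(200_000, 100, 500)

# B-Seqs: 500,000 generations, sampled every 100th, 1000 discarded
b_total, b_kept, b_fraction = mcmc_samples(500_000, 100, 1000)
```

Under this counting, 1500 of 2000 samples (25% burn-in) are retained for the A-Seqs and 4000 of 5000 samples (20% burn-in) for the B-Seqs.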
Results
Identification of a Second T. gondii MAPK Gene, Designated tgMAPK2
We have previously described the cloning of tgMAPK1 from T. gondii and the preliminary characterization of its gene product, a 58-kDa stress-response MAPK (Brumlik et al. 2004). In this report, we have identified a 2.0-kilobase pair (kb) gene, designated tgMAPK2, predicted to encode a 683-amino acid, 73-kDa protein in the virulent RH T. gondii strain (Fig. 1). A stop codon is situated 219 base pairs (bp) upstream from a plausible initiation (ATG) codon and accompanying T. gondii translational initiation site (Seeber 1997), with no alternative initiation codon residing in the intervening region. Moreover, a region predicted to encode a conserved MAPK nucleotide-binding domain closely follows the initiation codon (starting at G20XG22XXG25), consistent with our assignment of the tgMAPK2 initiation codon. The deduced amino acid sequence encoded by the first 305 codons of the tgMAPK2 gene shares 50% amino acid sequence identity with T. gondii TgMAPK1 (Brumlik et al. 2004) throughout all 12 of the conserved domains that are characteristic of MAPKs (Kültz 1998). Remarkably, the deduced amino acid sequence of tgMAPK2 shares 64% amino acid sequence identity with human ERK8 (Fig. 2). The TDY(178–180) motif is typical of other protozoan MAPKs (with an acidic amino acid between the threonine and the tyrosine residues). Interestingly, the tgMAPK2 gene from the virulent RH strain encodes 11 copies of a VSSSHHG repeat in the C-terminal coding region extension, versus nine copies in tgMAPK2 from the avirulent Me49 strain. In addition, three nonneutral amino acid substitutions are predicted to be encoded by the virulent (RH) T. gondii tgMAPK2 sequence compared to the avirulent (Me49) T. gondii tgMAPK2 sequence. The large number of strain-specific nucleotide polymorphisms in T. gondii is unusual. A total of 46 nucleotides differ between tgMAPK2 in RH and Me49, which conceivably could lead to measurable differences in the biological activities of their encoded TgMAPK2 variants (Fig. 1).
Sequence Alignment
The final alignment of the 32 sequences, which consisted only of regions for which the majority of the sequences could be unambiguously aligned as selected by Gblocks, included 255 positions. The subset of the alignment including the 10 Apicomplexan sequences of central interest to this study is shown in Fig. 2, with the human ERK8 sequence included for reference. Seventy-three positions are invariant or conserved within allowed substitution groups for all 32 sequences included in the alignment. We identified 34 additional positions that are conserved in all 10 of the Apicomplexan sequences, together accounting for 42.1% of the aligned positions. Between the sequences TgMAPK1 and ChMAPK, 77.1% of the positions were conserved, with 70.4% of the positions conserved among the quartet including TgMAPK2, ChMAPK1, PfMAPK1, and TaMAPK and 72.3% of the positions conserved among the non-TXY quartet including TgMAPK3, ChMAPK2, PfMAPK2, and TaMAPK2.
Phylogenetic Analysis
We first focused on the 10 Apicomplexan sequences, constructing unrooted trees using the methods of BI, ML, and NJ as described. The Bayesian consensus topology and the posterior probabilities associated with each node are shown in Fig. 3. The phylogeny groups the Apicomplexan sequences into the three clades suggested by the alignment: ApiMAPK1 (TgMAPK1, ChMAPK), ApiMAPK2 (TgMAPK2, ChMAPK1, PfMAPK1, TaMAPK), and ApiMAPK3 (TgMAPK3, ChMAPK2, PfMAPK2, TaMAPK2). The ML and NJ methods both reproduced this trichotomy, with bootstrap support values for (ApiMAPK1, ApiMAPK2, ApiMAPK3) of (76%, 100%, 100%) and (95%, 100%, 100%) respectively. The topology of the ApiMAPK3 clade was also consistently reconstructed by all three methods. There was, however, some discrepancy among the methods regarding the arrangement of the four sequences within ApiMAPK2, with the NJ analysis suggesting an alternative topology. The ML phylogeny was in agreement with the BI topology for ApiMAPK2, but the bootstrap support was low (values for the two internal nodes were 57% and 62%) and removing the two internal edges only decreased the log-likelihood from −3565.1 to −3570.0, an insignificant difference according to the Kishino-Hasegawa-Templeton test. We have therefore dashed the internal edges within this clade to indicate that the consensus among the three methods is represented by a tetrachotomy at the root of ApiMAPK2.
After confirming the three Apicomplexan clades as suggested by the alignment, we selected the pairs (TgMAPK1, ChMAPK), (TgMAPK2, ChMAPK1), and (TgMAPK3, ChMAPK2) as representatives of ApiMAPK1, ApiMAPK2, and ApiMAPK3, respectively. These sequences were analyzed with 6 other protozoan sequences and 16 sequences from mammals, plants, yeast, and other metazoans (see Table 2). The consensus tree from the Bayesian analysis is shown in Fig. 4. This tree has a central tetrachotomy and reproduces the (YSAPK, SAPK) and (YERK, ERK) clades identified by the Kültz analysis. The third major clade joins metazoan ERK8-like sequences (HsERK8, RnERK7, CeERK8, DmERK8) to a monophyletic group including the representative Apicomplexan sequences from ApiMAPK2 (TgMAPK2 and ChMAPK1) and the protozoan pair (TbMAPK2, LmMAPK2). We call this protozoan clade pzMAPK2. The posterior probability of the node creating the pzMAPK2/ERK8 clade is 1.00, and this clade is separated from the central tetrachotomy by a long internal edge. The fourth clade includes the plant MAPK sequences (PERK) and the remaining protozoan sequences, joining these clades with posterior probability 0.99. Within the protozoa, the two representative sequences from the ApiMAPK3 are joined with TtMAPK3 (another sequence with the TGH motif) with posterior probability 1.00, creating a clade that we call pzMAPK3. The pair (LmMAPK4, TbMAPK4) is then joined with this clade with posterior probability 0.78, while TtMAPK1 is joined with the ApiMAPK1 sequences with posterior probability 0.88. We call the collection of five non-TXH protozoan sequences pzMAPK1.
The ML and NJ analyses produced trees consistent with the BI consensus tree shown in Fig. 4, with the exception that both methods suggested a different topology for the four sequences within the PERK group (a bifurcating root node connecting the pairs [AtMPK1, CrMAPK] and [OsMAP1, NtWIPK]). However, none of the three methods strongly supported a specific PERK topology (the BI posterior probability was 0.58, while bootstrap support values for both the NJ and the ML nodes were under 60%), suggesting that the PERK topology is best represented by a tetrachotomy. We have indicated this in Fig. 4 by dashing the internal edges within the PERK clade.
While the three methods all agreed on the topology of the tree outside of the PERK clade, the bootstrap support values from the ML and NJ analyses were far lower than the BI posterior probability values for certain nodes. Such discrepancies between bootstrap support values and Bayesian posterior probabilities are commonly observed in practice (Huelsenbeck et al. 2002). Simulations have determined that while bootstrap values may be slightly conservative in their support for true interior nodes, Bayesian methods are more likely to assign high posterior probabilities to incorrect nodes (Alfaro et al. 2003; Douady et al. 2003). Furthermore, Bayesian methods may assign high posterior probabilities to arbitrary resolutions of polytomies (Lewis et al. 2005; Suzuki et al. 2002). These findings suggest that high posterior probabilities that are unsupported by bootstrap values should not be overly emphasized in practice.
Concerning our analysis, respective ML and NJ bootstrap support values for the node joining ApiMAPK1 to TtMAPK1 were 37% and 44%, and for the node joining the pzMAPK3 clade to the (LmMAPK4, TbMAPK4) pair the respective support values were 52% and 35%. The decrease in log-likelihood between the BI consensus tree shown in Fig. 4 (which has a log-likelihood of −8496.8) and a tree in which the internal edges in the protozoan clade were shortened to length 0 to create a tetrachotomy joining TtMAPK1, ApiMAPK1, the pair (LmMAPK4, TbMAPK4), and the pzMAPK3 clade was only 6.0, with a p-value of 0.456 according to the Shimodaira-Hasegawa test as implemented in PROML. For this reason, we consider the topology of the pzMAPK1 clade to be somewhat ambiguous and do not attach much significance to the association of (LmMAPK4, TbMAPK4) with the non-TXY sequences in pzMAPK3. The bootstrap support for the node separating the pzMAPK1/pzMAPK3 clade from PERK was also weak, with support values of 33% for both the ML and NJ analyses. Finally, the methods did not agree in the magnitude of their support for the joint PERK-protozoan clade. While the BI posterior probability for this node was 0.99, the ML and NJ bootstrap support values were only 50% and 25%, respectively. These results suggest that the phylogenetic connection between plant MAPKs and the pzMAPK1/pzMAPK3 protozoan MAPKs may be somewhat tenuous and should be further investigated.
To reflect the differences in statistical support among the methods, the edges joining the nodes that were consistently reproduced by all three methods but with less than 60% bootstrap support in both the ML and the NJ analyses are indicated by heavily dashed lines in Fig. 4. While these dashed edges demonstrate our lower level of confidence for the topology in this region of the tree, we do not mean to suggest that the simultaneous removal of all of these edges would have a negligible impact on our results. In fact, doing so significantly decreases the likelihood by 43.1 (p = 0.002).
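The dashing rule can be made explicit with the support values reported above. In the sketch below the node labels are shorthand introduced for illustration, and the function `is_dashed` is hypothetical; only the percentages and the 60% threshold come from the text.

```python
# (ML bootstrap %, NJ bootstrap %) for nodes reproduced by all three methods
NODE_SUPPORT = {
    "ApiMAPK1 + TtMAPK1": (37, 44),
    "pzMAPK3 + (LmMAPK4, TbMAPK4)": (52, 35),
    "pzMAPK1/pzMAPK3 vs PERK": (33, 33),
    "PERK + protozoan clade": (50, 25),
}


def is_dashed(ml_support, nj_support, threshold=60):
    """An edge is dashed when both bootstrap values fall below the threshold."""
    return ml_support < threshold and nj_support < threshold


dashed_edges = [node for node, (ml, nj) in NODE_SUPPORT.items()
                if is_dashed(ml, nj)]
```

All four of these nodes meet the criterion, whereas a node such as ApiMAPK1 in the Apicomplexan tree (76% and 95%) would not.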
Discussion
Traditionally, MAPK sequences have been categorized by their TXY activation loop signature sequence (Roux and Blenis 2004). Kültz (1998) modified this definition to include several distantly related proteins, which instead possess diverse [T/S]XX motifs, including THE and SEG, that can obviously only be phosphorylated at the T/S position. The inclusion of non-TXY sequences resulted in the formation of the Kültz MAPK3 subgroup, which included animal MAPKs with the SEG motif, a C. elegans MAPK with the THE motif, and protozoan sequences with TDY and TEY motifs. Within this subgroup, the three animal SEG sequences (two human and one rat) formed the “MAPK3 subfamily” (Kültz 1998). These three sequences were highly similar, with 86% identity among the positions shown in Kültz’s alignment figure, but had little similarity to the protozoan sequences: only 32% of the positions were conserved across the larger MAPK3 subgroup, a figure which includes the 25% of positions that were invariant in all 93 MAPKs considered by Kültz. Furthermore, the phylogenetic evidence for the existence of the larger MAPK3 subgroup was weak, with only 19% bootstrap support. We believe that Kültz’s inclusion of non-TXY sequences distorted his results, and that the classification of the protozoan sequences within the MAPK3 subgroup was simply an artifact of long-branch attraction.
The broadening of the definition of MAPKs to include non-TXY sequences has also resulted in the addition of protozoan sequences to the MAPK family. Sequences such as Pfmap-2 from P. falciparum (with a TSH motif instead) have been classified as “atypical MAPKs” (Dorin et al. 1999), and in a recent study of the Plasmodium berghei Pb-map2 MAPK, Tewari and coworkers (2005) analyzed a collection of non-TXY protozoan MAPKs. This analysis phylogenetically distinguished these sequences from other protozoan and metazoan MAPKs and noted that these “atypical MAPKs” seem to be restricted to the Alveolata. Our present analysis supports these conclusions, clearly demonstrating the existence of a monophyletic non-TXY subfamily of closely related sequences (our pzMAPK3). However, we note that the pzMAPK3 subfamily is no more “atypical” phylogenetically than pzMAPK2, with both of these subfamilies separated from the center of the tree by equally large distances (Fig. 4). Furthermore, amino acid sequence alignment of MAPKs from pzMAPK2 and pzMAPK3 subfamilies does not indicate a lesser degree of homology between these subfamilies (Fig. 2) except with respect to the TDY to T[GS]H conversion of their corresponding activation loops.
In fact, a striking and unexpected consequence of our analyses was the demonstration of an extremely large evolutionary distance between the ERK8-like MAPKs and the remaining MAPK families. While we are not the first to notice significant differences between the mammalian ERK7/ERK8 sequences and other mammalian ERKs (Abe et al. 2002; Bogoyevitch and Court 2004), our identification of ERK8 homologues in C. elegans and D. melanogaster indicates that ERK8-like MAPKs represent a distinct family that is evolutionarily conserved across a wide variety of eukaryotes except plants (we found that while similarity searches yielded several potential ERK8-like plant homologues, all of these were consistently grouped phylogenetically within the PERK family, suggesting a single subfamily for all plant MAPKs).
We also note that an association between mammalian ERK7/ERK8 and protozoan MAPKs has been previously observed, as reported in a recent analysis of a newly identified MAPK homologue in Entamoeba histolytica (Ray et al. 2005). This report placed HsERK8 and RnERK7 within a phylogenetic clade that included certain protozoan MAPKs, a result which led the authors to suggest, “[It] is likely that a common ancestor for MAPKs and ERK 7/8 existed before protozoans and slime molds diverged from the ancestral lineage that gave rise to animals, fungi, and plants” (Ray et al. 2005). Although our results also clearly demonstrate the monophyletic nature of the ERK8-like protozoan and metazoan MAPKs, there is a significant difference in our findings. Ray and coworkers suggested a single protozoan MAPK subgroup with ERK7/ERK8 as members, while we instead have presented evidence of two distinct protozoan subfamilies separated by a large phylogenetic distance.
It is interesting to note that, despite extensive research on the set of all P. falciparum kinases (Ward et al. 2004), no MAPK has yet been identified in this species that is homologous to TgMAPK1 and ChMAPK, and our database searches indicate that the ApiMAPK1 subfamily does not appear to have any other members. We also note that the evolutionary distance between TgMAPK1 and ChMAPK is substantially greater than for the T. gondii and C. hominis sequence pairs in the other two Apicomplexan subfamilies, suggesting that these MAPKs are far less conserved across even closely related species.
The identification of meaningful phylogenetic relationships among the protozoan MAPKs is a challenging problem due to the high level of divergence among many of these sequences. The goal of our analysis has been to obtain a phylogeny with strong statistical support while reflecting the diversity of protozoan MAPKs, and for this reason we found it beneficial to focus on sets of protozoan MAPKs with high levels of similarity. We acknowledge that the results presented here are “Apicomplexan-centric” and do not represent the majority of the protozoan MAPKs currently included in the NCBI databases. However, our opinion is that a smaller, more resolved tree is of greater benefit than a larger star-like tree with poor statistical support, and we therefore have included only a small subset of the many protozoan MAPKs that we catalogued in our searches.
Furthermore, because our study involved a collection of 32 highly diverse sequences, many positions could not be unambiguously aligned and were discarded from the final analysis. This led to a considerable loss of phylogenetic signal, and for this reason we did not attempt to precisely reconstruct phylogenetic relationships within closely related subfamilies. Instead, we selected representative sequences to broadly outline the major relationships among these groups. More focused analyses of specific protozoan and metazoan subfamilies would increase the number of well-aligned positions, allowing for the resolution of phylogenetic relationships among groups of closely related MAPKs.
In addition to suggesting further phylogenetic analyses, the existence of multiple distinct protozoan subfamilies also poses compelling biological questions. Our results should prove useful for identifying common structural features of MAPKs, which is of great value for the development of novel MAPK inhibitor drugs. However, we would certainly like to determine whether or not the protozoan MAPK subfamilies are functionally distinct. The vast majority of the protozoan MAPK sequences that we have selected for phylogenetic analysis have not been characterized, making every newly cloned protozoan MAPK gene such as tgMAPK2 of great potential significance. While P. falciparum MAPK2 seems to play a critical role in stage differentiation (Dorin et al. 1999) and T. thermophila MAPK3 and T. gondii MAPK1 are both associated with stress response (Nakashima and Nozawa 1999; Brumlik et al. 2004), little biological information is available regarding the remaining protozoan MAPKs described in Table 2. Most MAPKs possess C-terminal extensions beyond the region shown in our ClustalW alignment (Fig. 2), ranging in size from 88 amino acids in the case of human p38α (41 kDa) to 449 amino acids in the case of the P. falciparum MAPK1 (90 kDa). These extensions have been found to be responsible for interactions with upstream/downstream signaling molecules (Zhang and Liu 2001). Therefore, it does not seem currently possible to obtain functional information from MAPK primary sequence data alone, and experimental results will be necessary to determine the connection, if any, between the phylogenetic and functional classification of the protozoan MAPKs.
References
Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105
Abe MK, Saelzler MP, Espinosa R 3rd, Kahle KT, Hershenson MB, Le Beau MM, Rosner MR (2002) ERK8, a new member of the mitogen-activated protein kinase family. J Biol Chem 277:16733–16743
Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, Zhu G, Lancto CA, Deng M, Liu C, Widmer G, Tzipori S, Buck GA, Xu P, Bankier AT, Dear PH, Konfortov BA, Spriggs HF, Iyer L, Anantharaman V, Aravind L, Kapur V (2004) Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science 304:441–445
Alfaro ME, Zoller S, Lutzoni F (2003) Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol Biol Evol 20:255–266
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Bogoyevitch MA, Court NW (2004) Counting on mitogen-activated protein kinases—ERKs 3, 4, 5, 6, 7 and 8. Cell Signal 16:1345–1354
Brumlik MJ, Wei S, Finstad K, Nesbit J, Hyman LE, Lacey M, Burow ME, Curiel TJ (2004) Identification of a novel mitogen-activated protein kinase in Toxoplasma gondii. Int J Parasitol 34:1245–1254
Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ (2002) Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419:512–519
Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552
Clark-Lewis I, Sanghera JS, Pelech SL (1991) Definition of a consensus sequence for peptide substrate recognition by p44mpk, the meiosis-activated myelin basic protein kinase. J Biol Chem 266:15180–15184
Dorin D, Alano P, Boccaccio I, Ciceron L, Doerig C, Sulpice R, Parzy D (1999) An atypical mitogen-activated protein kinase (MAPK) homologue expressed in gametocytes of the human malaria parasite Plasmodium falciparum. Identification of a MAPK signature. J Biol Chem 274:29912–29920
Douady CJ, Delsuc F, Boucher Y, Doolittle WF, Douzery EJ (2003) Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol 20:248–254
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
Felsenstein J (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5:164–166
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98
Hanks SK (2003) Genomic analysis of the eukaryotic protein kinase superfamily: a perspective. Genome Biol 4:111
Hanks SK, Quinn AM (1991) Protein kinase catalytic domain sequence database: identification of conserved features of primary structure and classification of family members. Methods Enzymol 200:38–62
Huelsenbeck JP, Larget B, Miller RE, Ronquist F (2002) Potential applications and pitfalls of Bayesian inference of phylogeny. Syst Biol 51:673–688
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8:275–282
Kültz D (1998) Phylogenetic and functional classification of mitogen and stress-activated protein kinases. J Mol Evol 46:571–588
Lewis PO, Holder MT, Holsinger KE (2005) Polytomies and Bayesian phylogenetic inference. Syst Biol 54:241–253
Margush T, McMorris FR (1981) Consensus n-trees. Bull Math Biol 43:239–244
Martin-Blanco E (2000) p38 MAPK signalling cascades: ancient roles and new functions. Bioessays 22:637–645
Nakashima S, Nozawa Y (1999) Stress-responsive gene expression in Tetrahymena. Acta Biol Hung 50:413–424
Page RD (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12:357–358
Pain A, Renauld H, Berriman M, Murphy L, Yeats CA, Weir W, Kerhornou A, Aslett M, Bishop R, Bouchier C, Cochet M, Coulson RM, Cronin A, de Villiers EP, Fraser A, Fosker N, Gardner M, Goble A, Griffiths-Jones S, Harris DE, Katzer F, Larke N, Lord A, Maser P, McKellar S, Mooney P, Morton F, Nene V, O’Neil S, Price C, Quail MA, Rabbinowitsch E, Rawlings ND, Rutter S, Saunders D, Seeger K, Shah T, Squares R, Squares S, Tivey A, Walker AR, Woodward J, Dobbelaere DA, Langsley G, Rajandream MA, McKeever D, Shiels B, Tait A, Barrell B, Hall N (2005) Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science 309:131–133
Ray D, Dutta S, Banerjee S, Banerjee R, Raha S (2005) Identification, structure, and phylogenetic relationships of a mitogen-activated protein kinase homologue from the parasitic protist Entamoeba histolytica. Gene 346:41–50
Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574
Roux PP, Blenis J (2004) ERK and p38 MAPK-activated protein kinases: a family of protein kinases with diverse biological functions. Microbiol Mol Biol Rev 68:320–344
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Seeber F (1997) Consensus sequence of translational initiation sites from Toxoplasma gondii genes. Parasitol Res 83:309–311
Suzuki Y, Glazko GV, Nei M (2002) Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA 99:16138–16143
Tewari R, Dorin D, Moon R, Doerig C, Billker O (2005) An atypical mitogen-activated protein kinase controls cytokinesis and flagellar motility during male gamete formation in a malaria parasite. Mol Microbiol 58:1253–1263
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Ward P, Equinet L, Packer J, Doerig C (2004) Protein kinases of the human malaria parasite Plasmodium falciparum: the kinome of a divergent eukaryote. BMC Genomics 5:79
Xu P, Widmer G, Wang Y, Ozaki LS, Alves JM, Serrano MG, Puiu D, Manque P, Akiyoshi D, Mackey AJ, Pearson WR, Dear PH, Bankier AT, Peterson DL, Abrahamsen MS, Kapur V, Tzipori S, Buck GA (2004) The genome of Cryptosporidium hominis. Nature 431:1107–1112
Zhang S, Liu Y (2001) Activation of salicylic acid-induced protein kinase, a mitogen activated protein kinase, induces multiple defense responses in tobacco. Plant Cell 13:1877–1889
Acknowledgments
This work was supported by funds from NIH Grant AI060424 (to T.J.C.) and the Tulane endowment.
Additional information
[Reviewing Editor: Dr. David Pollock]
Cite this article
Lacey, M.R., Brumlik, M.J., Yenni, R.E. et al. Toxoplasma gondii Expresses Two Mitogen-Activated Protein Kinase Genes That Represent Distinct Protozoan Subfamilies. J Mol Evol 64, 4–14 (2007). https://doi.org/10.1007/s00239-005-0197-x