Introduction

Reversible protein phosphorylation is a major regulatory mechanism of cellular functions such as stress-activated signal transduction, mitogenic signal transduction, and cell cycle control (e.g., Sun and Tonks 1994; Hanada et al. 2001). Protein kinases have been in the spotlight for several decades (Burnett and Kennedy 1954; Bolen 1995; Shokat 1995; Yeh and Pellegrini 1999; Schlessinger 2000; Cowan and Storey 2003), leading to the recent development of drug therapy using kinase inhibitors (reviewed by Krause and Van Etten 2005). Yet far less attention has been given to the kinases' counterparts in cell regulation, the protein phosphatases (PPs). It is becoming clear that the interplay between kinases and phosphatases is quite complex. Thus, in order to fully understand the process of phosphorylation, it is imperative to focus research on phosphatases as well as on kinases.

Protein serine/threonine phosphatases are divided into four structurally distinct superfamilies (Cohen 1989; Shenolikar 1994; Barford, Jia, and Tonks 1995; Wera and Hemmings 1995): PP1, PP2A, PP2B, and PP2C. The PP2C superfamily of phosphatases (also referred to as PPM) is defined by distinct amino acid sequence and three-dimensional (3D) structure (Tamura et al. 1989; Mann et al. 1992; Wenk et al. 1992). The PP2C superfamily does not seem to be evolutionarily related to PP1, PP2A, and PP2B, which are all multisubunit enzymes. This study focuses on the PP2C superfamily as an embodiment of the evolutionary diversity of protein phosphatases.

PP2C Functions

PP2C is a monomeric enzyme which displays broad substrate specificity. Distinguishing characteristics of PP2C are its (i) absolute requirement for divalent cations, mainly Mg2+ or Mn2+; (ii) distinctive structural features; and (iii) insensitivity to inhibition by okadaic acid (Barford et al. 1995; Wera and Hemmings 1995). At least 15 distinct human PP2C paralogues have been identified (Table 1). All of these PP2Cs have Mg2+- and/or Mn2+-dependent phosphatase activity against artificial substrates in vitro (Komaki et al. 2003).

Table 1 The 15 human PP2C paralogues and their cellular functions

The majority of PP2Cs are involved in regulation of stress-activated protein kinase (SAPK) cascades which relay signals in response to external stimuli (Meskiene et al. 2003). These cascades are a subfamily of the mitogen-activated protein kinase (MAPK) cascades. Different PP2Cs negatively regulate SAPK pathways at different levels. For instance, PP2Cβ inhibits the TAK1 pathway (Hanada et al. 2001; Li et al. 2003) and is involved in the NF-κB pathway (Prajapati et al. 2004). PP2Cα inactivates the p38 pathway and the c-Jun amino-terminal kinase (JNK) pathway (Takekawa et al. 1998) and is additionally involved in the Wnt signaling pathway (Strovel et al. 2000).

Recent studies have demonstrated that the PP2C superfamily is also associated with eukaryotic cell cycle processes, which are controlled by the ordered activation and inactivation of cyclin-dependent protein kinases (CDKs). Reversible protein phosphorylation is one of the mechanisms through which extracellular and intracellular signals regulate CDKs (Cheng et al. 1999, 2000). Additionally, we have previously reported that overexpression of PP2Cα activates the expression of the tumor suppressor gene TP53/p53, which leads to G2/M cell cycle arrest and apoptosis (Ofek et al. 2003). Thus, the PP2C superfamily appears to be directly involved in several cell regulation and cell signaling processes. In fact, many PP2C members have been reported to inactivate CDK and MAPK family kinases by dephosphorylating a conserved threonine residue on the so-called T-loop of these kinases (Marley et al. 1996; Cheng et al. 1999; Takekawa et al. 2000), suggesting that members of the PP2C superfamily may be general T-loop phosphatases.

Phylogenetic Study of PP2C

An evolutionary comparison of kinomes across species (Manning et al. 2002) has demonstrated the value of a phylogenetic study of kinases. This enabled mapping kinases specific to each lineage, as well as delineating the pathways involving kinases shared throughout various evolutionary lineages. Similar to families of protein kinases, the multiplicity of PP2C proteins suggests a broad functional diversity of these proteins. Thus, the aim of this study is to conduct a comprehensive genomic evolutionary analysis of all known members of the PP2C superfamily in Metazoa. This is the first analysis of its kind undertaken in the study of the PP2C superfamily and, as such, promises to be highly informative in characterizing protein isoform diversification. Consequently, we performed an extensive search of all genomic databases for PP2C genes. We then conducted a phylogenetic analysis of all PP2C members found, with the aim of assigning paralogy and orthology relations to each sequence found. These assignments enabled us to predict functions of previously unidentified genes in the superfamily, as well as explore the differences between the families within the PP2C superfamily and estimate the relative dates of the diversification events. We thereby explore the breadth of PP2C functional conservation throughout the metazoan kingdom.

Methods

Search for PP2C Members in Metazoa

Sequences were retrieved from the following databases: GenBank (http://www.ncbi.nlm.nih.gov; Benson et al. 2005), ENSEMBL (http://www.ensembl.org; Hubbard et al. 2005), FLYBASE (http://www.fruitfly.org; Stapleton et al. 2002), and WORMBASE release WS130 (http://www.wormbase.org; Harris et al. 2004). Initially, all fully sequenced non-plant eukaryote genomes (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, Canis familiaris, Bos taurus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Danio rerio, Takifugu rubripes, Tetraodon nigroviridis, Drosophila melanogaster, Anopheles gambiae, Apis mellifera, Caenorhabditis elegans, Saccharomyces cerevisiae) were screened for PP2C sequences using BLAT (Kent 2002), TBLASTN (Altschul et al. 1990), and the Orthologue Prediction section of ENSEMBL. Sequences uncovered in these stages were filtered according to two criteria: (i) due to an unreliable alignment containing excess gaps, all yeast sequences were discarded; and (ii) orthologous sequences for which there was a significant deviation from the PP2C signature defined in PROSITE (Bairoch and Bucher 1994; entry PS01032) were discarded. Accession numbers of all sequences used in the study are available as supplementary material.
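The signature-based filtering step can be illustrated with a minimal sketch. It converts a PROSITE-style pattern into a regular expression and keeps only the sequences that contain a match. The pattern and sequences below are toy placeholders, not the actual PS01032 signature or real PP2C sequences, and the function names are hypothetical. Note also that an exact regular-expression match is stricter than the "significant deviation" criterion described above, which tolerates some mismatch.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern (e.g. 'C-x(2)-[LIVM]') to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (2) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def filter_by_signature(sequences: dict, pattern: str) -> dict:
    """Keep only the sequences that contain an exact match to the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return {name: seq for name, seq in sequences.items() if regex.search(seq)}


# Toy pattern and sequences, for illustration only.
TOY_PATTERN = "[LIVM]-x(2)-D-G-H-{P}"
toy_sequences = {
    "candidate_1": "AAALKKDGHAAA",   # contains the toy signature
    "candidate_2": "AAALKKDGHPAA",   # violates the final {P} position
}
kept = filter_by_signature(toy_sequences, TOY_PATTERN)
```

In practice one would score partial matches rather than require an exact hit, so this sketch marks only the strict end of the filtering spectrum.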

Sequence Alignment and Phylogenetic Reconstruction

Multiple sequence alignment (MSA) was performed using the MUSCLE program version 3.52 (Edgar 2004). Maximum likelihood-based phylogenetic reconstruction was performed with the PhyML program (Guindon et al. 2005) using among-site rate variation with four discrete rate categories and the JTT model of sequence evolution. Node supports were determined by performing 100 bootstrap replicates (Felsenstein 1985). Bootstrap values higher than 70% were considered significant.
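The bootstrap procedure (Felsenstein 1985) resamples alignment columns with replacement and rebuilds the tree from each pseudo-replicate. The following is a minimal sketch of the resampling step only. The alignment is a toy example, the function names are hypothetical, and tree inference on each replicate (done by PhyML in this study) is outside the sketch.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one pseudo-replicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment: dict, n_replicates: int = 100, seed: int = 0) -> list:
    """Generate n_replicates resampled alignments of the original length."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment, for illustration only.
toy_alignment = {"seq_a": "MKTL", "seq_b": "MRTL", "seq_c": "MKSL"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
```

The support of a node is then the percentage of replicate trees in which the same clade is recovered; here, values above 70% were treated as significant.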

Due to the extended evolutionary time spanned by the sequences in this analysis and the low similarity between different paralogues, the resulting alignment includes regions which are difficult to align as well as many gaps. This is a general problem when analyzing superfamily genes, and one must verify that the inferred phylogeny does not depend on those parts of the alignment which are uncertain. In order to test the robustness of the results to alignment uncertainty, we ran the Gblocks program (Castresana 2000) on the alignment to remove poorly aligned regions, and then performed the phylogenetic analysis on the remaining positions. The new tree was found to be essentially identical to the tree obtained from the full alignment. All minor differences between the trees were supported by low bootstrap values in the new tree. The settings used to run Gblocks, together with the resulting reduced alignment and phylogenetic tree, are given in the supplementary material.
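The idea of discarding unreliable alignment positions before tree building can be sketched with a much simpler rule than the one Gblocks applies. The example below drops every column whose gap fraction exceeds a threshold. This is not the Gblocks algorithm, which also considers conservation and the length of contiguous blocks; the threshold, function name, and toy alignment are assumptions made for illustration.

```python
def trim_gappy_columns(alignment: dict, max_gap_fraction: float = 0.5) -> dict:
    """Remove alignment columns in which gaps exceed max_gap_fraction."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [
        column for column in columns
        if column.count("-") / len(column) <= max_gap_fraction
    ]
    # Rebuild each sequence from the retained columns.
    return {
        name: "".join(column[i] for column in kept)
        for i, name in enumerate(names)
    }


# Toy alignment, for illustration only.
toy_alignment = {
    "seq_1": "MK-TL",
    "seq_2": "MK-TL",
    "seq_3": "MKATL",
    "seq_4": "M--TL",
}
trimmed = trim_gappy_columns(toy_alignment, max_gap_fraction=0.5)
```

Comparing the trees inferred from the full and trimmed alignments, as done in this study, then indicates whether the topology depends on the uncertain regions.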

A specific sequence was considered part of a group if it belonged to a monophyletic clade in the tree. For this purpose, it was assumed that the root of the tree is on one of the branches leading to one of the more anciently derived groups (as defined in the Results section).
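The monophyly criterion can be made concrete with a short sketch. It parses a rooted Newick string into nested tuples of leaf names, discarding branch lengths and support values, and then asks whether some clade contains exactly the sequences of a candidate group. The parser is deliberately minimal: it assumes a string terminated by ";" with no whitespace between tokens and no quoted names. The tree and leaf names are toy placeholders.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested tuples of leaf names."""
    pos = 0

    def skip_label():
        # Skip support values and branch lengths after a node.
        nonlocal pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            pos += 1                      # consume ")"
            skip_label()
            return tuple(children)
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        skip_label()
        return name

    return parse_node()


def leaf_set(node) -> frozenset:
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def is_monophyletic(tree, group) -> bool:
    """True if some clade of the rooted tree contains exactly the group."""
    group = frozenset(group)

    def search(node):
        leaves = leaf_set(node)
        if leaves == group:
            return True
        if isinstance(node, str) or not group <= leaves:
            return False
        return any(search(child) for child in node)

    return search(tree)


# Toy tree, for illustration only.
toy_tree = parse_newick("((A:0.1,B:0.2)95:0.3,(C:0.1,(D:0.1,E:0.1)80:0.2)60:0.1);")
```

Because the test depends on where the tree is rooted, the rooting assumption stated above matters: the same unrooted topology can make a group monophyletic under one root and paraphyletic under another.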

Site-Specific Evolutionary Rate Analysis

The conservation pattern of the PP2C superfamily was analyzed by estimating site-specific evolutionary rates, using the Bayesian approach of the Consurf server (Mayrose et al. 2004; Landau et al. 2005). The analysis was conducted using the reconstructed PhyML tree. Site-specific positive selection was analyzed using the Selecton server (Doron-Faigenboim et al. 2005), using the Bayesian method (Yang et al. 2000). Once again, the reconstructed PhyML tree was given as input.
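As a rough illustration of what a per-site conservation score captures, the sketch below computes the Shannon entropy of each alignment column. This is a much cruder proxy than the Bayesian rate estimation used by Consurf: it ignores the phylogenetic tree and the substitution model, so closely related sequences are over-counted. The function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [residue for residue in column if residue != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(sequences) -> list:
    """Per-column entropy; 0.0 means the column is fully conserved."""
    return [column_entropy(column) for column in zip(*sequences)]


# Toy alignment, for illustration only.
profile = conservation_profile(["DGH", "DGH", "DAH", "DAH"])
```

Low-entropy columns correspond to slowly evolving sites; a tree-aware method such as the one used here additionally corrects for uneven sampling of lineages.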

Results

The search for PP2C members in the human genome led to the identification of 15 members (Table 1). Each of these members was found in the vast majority of the vertebrate genomes searched, with only a few orthologues missing. It is unclear whether these exceptions stemmed from incomplete sequencing or from gene losses in these organisms. All 15 groups of orthologues clustered as monophyletic groups in the reconstructed phylogenetic tree (Fig. 1), providing firm phylogenetic support for the hypothesis that the PP2C superfamily arose through a series of duplication events and for the classical classification of the PP2C family members. Furthermore, the phylogenetic tree enabled annotation of previously unidentified genes according to their location on the tree.

Fig. 1

Maximum likelihood phylogenetic tree of PP2C, with the TA group used as an outgroup. For brevity, the tree presented here shows a collapsed version of the full tree, where all clades representing vertebrate orthologues were collapsed, as were all clades of orthologous insects (collapsed clades are shown as black triangles). The lengths of the branches leading to the collapsed groups are the lengths of the branches leading to the ancestral groups in the original tree. Major PP2C groups are shaded in color for clarity. The branch colored yellow represents a putative gene duplication event with subsequent gene loss in vertebrates in one of the duplicants. The scale bar represents one branch length unit, which is equivalent to an average of one substitution per site. A full tree in Newick format is available in the supplementary material.

Two Types of PP2C

Mammalian PP2Cs have been previously classified into two subgroups according to differences in amino acid sequence motifs. Group 2 consists of PP2Cη, PP2Cζ, and PPM1H, while group 1 includes all other PP2Cs (Komaki et al. 2003). TA-PP2C, discovered only a year later (Mao et al. 2004), was not classified as belonging to either of the two groups. Groups 1 and 2 differ in their PP2C signature, as well as in the other sequence motifs found to characterize the PP2C superfamily (Komaki et al. 2003). All residues forming the catalytic domain (Das et al. 1996) are part of the PP2C signature and these additional motifs.

The phylogenetic reconstruction of the PP2C family (Fig. 1) shows that the three families belonging to group 2 (PP2Cη, PP2Cζ, and PPM1H) form a monophyletic clade (99% bootstrap), supporting the previous finding whereby group 2 is characterized by a unique sequence (Komaki et al. 2003). Since the differences between group 1 and group 2 are displayed in residues which surround the catalytic site, this may hint at a functional divergence of the PP2C superfamily into two functionally distinct groups.

Mapping of PP2C Duplications

Nine different clades in the tree comprise members in protostomes. These include sequences belonging to monophyletic PP2C groups (TA-PP2C, ILKAP, PP2Cγ, Wip1, and PP2Cε), as well as sequences assumed to have been derived from ancestral forms of PP2C prior to their duplication. For example, a worm sequence which forms an outgroup of the PDP1 and PDP2 vertebrate clades is assumed to be derived from the ancestral sequence of PDP1 and PDP2 prior to their duplication and will be referred to as the PDP1|PDP2 ancestrally derived sequence. Similarly, there is a protostome ancestrally derived sequence of POPX1|FEM2, PP2Cα|PP2Cβ, group 2 (PP2Cη|PP2Cζ|PPM1H), and PDP1|PDP2. Consequently, it appears that much of the PP2C diversification occurred before bilaterian diversification. On the other hand, the nine paralogous groups, PP2Cα, PP2Cβ, POPX1, FEM2, PP2Cζ, PPM1H, PPM1K, PDP1, and PDP2, are only present in vertebrates. Therefore, the mapping of the emergence of each one of the PP2C groups during evolution is possible, as depicted in Fig. 2. Nine groups emerged prior to the divergence of protostomes, while another nine were created by duplications prior to the divergence of vertebrates.
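The reasoning used to date each group can be written out as a small sketch: a clade with members in both protostomes and vertebrates must predate the bilaterian split, whereas a group found only in vertebrates is assigned to the later period. The clade lists below are taken from the paragraph above; the dictionary layout, labels, and function name are assumptions made for illustration.

```python
def classify_by_distribution(presence: dict) -> dict:
    """Assign each clade to a duplication period from its lineage distribution."""
    timing = {}
    for clade, lineages in presence.items():
        if "protostomes" in lineages:
            timing[clade] = "before bilaterian diversification"
        else:
            timing[clade] = "before vertebrate diversification"
    return timing


# Clades with protostome members, as listed in the text.
ANCIENT = ["TA-PP2C", "ILKAP", "PP2Cγ", "Wip1", "PP2Cε",
           "POPX1|FEM2", "PP2Cα|PP2Cβ", "group 2", "PDP1|PDP2"]
# Paralogous groups present only in vertebrates, as listed in the text.
VERTEBRATE_ONLY = ["PP2Cα", "PP2Cβ", "POPX1", "FEM2", "PP2Cζ",
                   "PPM1H", "PPM1K", "PDP1", "PDP2"]

presence = {clade: {"protostomes", "vertebrates"} for clade in ANCIENT}
presence.update({group: {"vertebrates"} for group in VERTEBRATE_ONLY})

timing = classify_by_distribution(presence)
n_ancient = sum(t.startswith("before bilaterian") for t in timing.values())
n_recent = len(timing) - n_ancient
```

This presence-based rule gives only the latest possible date for the older clades, and it is sensitive to missing genomes, since an unsampled lineage is indistinguishable from a true absence.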

Fig. 2

The two active periods of gene diversification by duplication are marked by arrows in the currently accepted species tree (Halanych 2004). The rectangle represents the putative emergence of the groups: PP2Cα, PP2Cβ, PPM1K, FEM2, POPX1, PP2Cζ, PPM1H, PDP1, and PDP2. The diamond represents the latest possible dating of the emergence of the groups: Wip1, ILKAP, PP2Cγ, TA-PP2C, PP2Cε, PP2Cα|β, PDP1|PDP2, group 2, and FEM2|POPX1.

More precise mapping and relative dating of the different gene duplication events proved to be more difficult. A majority of the more ancient duplications were supported by very low bootstrap values on the tree (Fig. 1). Thus, it is currently impossible to map which members were created by each ancient duplication event, and to determine the order of duplications. This is further aggravated by lack of available genomes, specifically in nonvertebrate chordates and poriferans. However, the formation of four more recent duplications was supported by very high bootstrap values. These duplications (PP2Cα versus PP2Cβ, PDP1 versus PDP2, PP2Cζ versus PPM1H, and POPX1 versus FEM2) occurred before vertebrate diversification and are supported by a single form in nonvertebrates (sea urchin and/or insects). For example, the clades of PDP1 and PDP2 do not include any sea urchin sequences, yet a sea urchin sequence forms an outgroup. An additional duplication, between PP2Cη and the PP2Cζ|PPM1H ancestor, is more difficult to map precisely. According to the tree, the duplication occurred after the speciation of insects and before the speciation of vertebrates. However, it is unclear whether this duplication occurred before or after the speciation of sea urchin, due to the location of the two sea urchin sequences (Fig. 1; denoted Group 2 S. purpuratus 1 and 2).

Protostome-Specific Duplications

The phylogenetic tree supports an ancient duplication event in the ancestral PP2Cγ (Fig. 1; marked in yellow) where the gene was apparently lost in the lineage leading to vertebrates yet remained in protostomes. This novel paralogue further underwent another duplication in insects. Yet another insect-specific duplication is evident, as is apparent by the existence of both α|β D. melanogaster 2 and α|β D. melanogaster 1 (see Fig. 1). In both these insect-specific duplications, no sequence is available from A. gambiae, which may indicate either a loss in this lineage or missing data in the A. gambiae genome. Little is known of these expansions in insects, yet they are supported by EST evidence in D. melanogaster (UCSC genome browser [Kent et al. 2002]), ruling out the hypothesis whereby these sequences represent pseudogenes.

Mapping Functional Regions in PP2C

We performed a comprehensive analysis of the pattern of amino acid conservation throughout the MSA of the PP2C superfamily. This analysis took into account the phylogenetic tree, thus enabling more precise inference of conservation of amino acid sites (Pupko et al. 2002). Two crystal structures of PP2C members exist in the Protein Data Bank (Berman et al. 2000): one of the human PP2Cα (Das et al. 1996) and one of the Mycobacterium tuberculosis PP2C (Pullen et al. 2004). The conservation pattern obtained for the PP2C superfamily was mapped onto the van der Waals surface of PP2Cα, since it is the only metazoan PP2C with a known 3D structure. A clear pattern of high conservation is apparent throughout the N-terminus of the protein, while the C-terminus of the protein is highly variable. The variability of the C-terminus is expected, since the C-terminus of PP2Cα is unique to this family and may serve as a substrate recognition domain in the cleft that is created between it and the catalytic domain (Das et al. 1996). However, the relatively high conservation of the N-terminus is more surprising. The results reinforce the previous knowledge of functionally important sites in PP2C, showing that the nine catalytic residues found in PP2Cα (Das et al. 1996) are indeed highly conserved. More surprisingly, over 50 additional sites appeared to be highly conserved. While some of these sites cluster around the catalytic site in the globular N-terminus, others form the cleft between the N-terminus and the C-terminus and form part of the bulk above the catalytic region (according to the orientation in Fig. 3). The high conservation of the N-terminal part of the cleft suggests that this cleft may have a shared function throughout all the PP2C families.

Fig. 3

The conservation pattern of the PP2C superfamily as inferred by Consurf (Landau et al. 2005). Conservation scores are color-coded onto the van der Waals surface of PP2Cα, where maroon corresponds to maximal conservation, white corresponds to average conservation, and turquoise corresponds to maximal variability. The Mg2+ ions and associated water molecules are shown in yellow, and the nine previously identified catalytic sites (Das et al. 1996) are shown in red. These nine sites also attained the highest level of conservation in the analysis. Arrows show the globular N-terminus region and the C-terminus tail.

In order to study the differences between the PP2C families, pairs of PP2C families were analyzed for site-specific positive selection using the Selecton server (Doron-Faigenboim et al. 2005). The underlying assumption was that following the gene duplication events which created the two families, both genes underwent a specialization process. Such a process may have led to a rapid fixation of mutations due to positive selection forces. Thus, all pairs of PP2C families which most recently diverged (PP2Cα-PP2Cβ, PDP1-PDP2, FEM2-POPX1, PPM1H-PP2Cζ) were analyzed. The analysis of pairs of sequences, as opposed to an analysis of the entire superfamily, enables a more reliable MSA at the codon level. However, in all of these families no site-specific or global positive selection was observed. This may be due to the fact that purifying selection within each of the families obscures the footprint of positive selection which the families underwent, and due to the small species sampling.
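The distinction underlying a test for positive selection, namely whether a codon change alters the encoded amino acid, can be shown with a minimal sketch. The code below counts synonymous and nonsynonymous codon differences between two aligned coding sequences using the standard genetic code. It is not the Bayesian site-specific method implemented in Selecton: raw counts are not normalized by the number of synonymous and nonsynonymous sites, multiple hits are ignored, and no tree is used. The function name and toy sequences are assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple:
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned, in-frame, uppercase DNA sequences. Codons with gaps are skipped."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b or "-" in codon_a + codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy sequences: GCT -> GCC is silent (Ala), AAA -> AGA changes Lys to Arg.
differences = count_codon_differences("ATGGCTAAA", "ATGGCCAGA")
```

An excess of nonsynonymous over synonymous change at a site, after proper normalization, is the signal that site-specific methods look for; its absence here is consistent with strong purifying selection within each family.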

Additional evidence for gene specialization may be obtained by analyzing insertion and deletion events in the different gene families. To this end, the MSA was utilized as a rough indicator of insertion and deletion events (using visual inspection). A schematic drawing was created depicting PP2C domains common to all groups, as well as those unique to specific groups (Fig. 4). Only such blocks which are well defined and clearly observed when viewing the alignment were depicted. Furthermore, these blocks were flanked by anchors of regions which are conserved throughout the entire alignment. This analysis suggests that the evolution of the PP2C family included several significant insertion or deletion events which may have led to the specialization of the duplicants.
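The blocks in this study were identified by visual inspection. A programmatic approximation of the same idea might look like the sketch below, which scans the alignment for runs of columns in which only one group has residues and all other groups have gaps. The minimum block length, the group labels, the function name, and the toy alignment are assumptions; coordinates are 0-based and inclusive, and the flanking-anchor criterion described above is not checked.

```python
def group_specific_blocks(alignment: dict, groups: dict, min_length: int = 3) -> list:
    """Return (group, start, end) for runs of columns occupied by one group only."""
    length = len(next(iter(alignment.values())))

    def owner(column: int):
        # The single group with residues in this column, or None.
        present = {
            groups[name]
            for name, seq in alignment.items()
            if seq[column] != "-"
        }
        return present.pop() if len(present) == 1 else None

    blocks, start, current = [], 0, None
    for column in range(length + 1):
        label = owner(column) if column < length else None
        if label != current:
            if current is not None and column - start >= min_length:
                blocks.append((current, start, column - 1))
            start, current = column, label
    return blocks


# Toy alignment with an insertion unique to group "B", for illustration only.
toy_alignment = {
    "a1": "MKT---LLG",
    "a2": "MKT---LLG",
    "b1": "MKTAAALLG",
    "b2": "MKTAGALLG",
}
toy_groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
blocks = group_specific_blocks(toy_alignment, toy_groups)
```

Such a scan cannot tell an insertion in one group from a deletion in all the others; that distinction requires placing the event on the tree.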

Fig. 4

A schematic drawing of the PP2C superfamily alignment. Bars represent blocks which are unique to specific groups. Gray rectangles represent the rest of the alignment, including all regions which are common to all PP2C proteins. Coordinates of the blocks (according to the MSA) appear underneath each block. The MSA is available as part of the supplementary material.

Discussion

In this study, we present an analysis of PP2C evolution. The reconstructed phylogeny displays the relationship between PP2C paralogues and orthologues throughout Metazoa, revealing the existence of at least 15 PP2C groups which were created via gene duplication. Analysis of the PP2C superfamily suggests that two waves of duplications were responsible for the creation of the majority of the PP2C members. The first wave of duplications presumably led to the formation of functionally different groups which specialized in different catalytic processes. Our analysis suggests that this wave took place before the divergence of bilaterians. Presumably, these ancient duplications occurred successively in a short time frame (represented by short branches between the different groups; see Fig. 1), rendering it difficult to determine the order of the duplication events (as is evident from the low bootstrap support on these branches). The second wave of duplications presumably led to the formation of tissue-specific groups, and most likely took place at the beginning of vertebrate evolution. More precise timing of these duplication events is still beyond reach due to the lack of sequence data in poriferans.

Extensive gene duplication events during early chordate evolution have been previously reported (e.g., Miyata and Suga 2001; McLysaght et al. 2002). These extensive events may have been the result of a whole-genome duplication or a series of partial chromosomal block duplications. These duplications, which took place in early vertebrate evolution, are thought to account for the existence of the four HOX gene clusters (Larhammar et al. 2002; Prince 2002), as well as for the four different MHC gene clusters (Abi-Rached et al. 2002). Numerous studies have found protein families with a pattern of evolution similar to the pattern of PP2C found here (e.g., Miyata and Suga 2001; Wakeham et al. 2005). Many of these families include kinase and phosphatase families involved in cell signaling. Furthermore, a comparative study of kinome catalogs across species (Manning et al. 2002) revealed that the creation of functionally distinct kinase families occurred during early metazoan evolution. Thus, we postulate that the evolution of signal transduction occurred in two major active periods. The first, occurring before metazoan radiation, may have been driven by the increase in complexity of multicellular organisms, which required more sophisticated signaling between and within cells. Furthermore, the more complex developmental mechanisms also required a more elaborate network of signaling proteins. This suggests that the more anciently derived PP2Cs evolved following a requirement for new signaling pathways. For example, the more anciently derived Wip1 and ILKAP display this pattern: whereas Wip1 evolved as part of the cell cycle pathway, ILKAP evolved to participate in cell adhesion and growth factor signaling.

The second active period, before vertebrate diversification, occurred concomitantly with the development of tissues such as skeletal muscle and the nervous system. Indeed, the transition between nonvertebrates and vertebrates is believed to be one of the major leaps in complexity during evolution, involving the evolution of cells and structures such as the neural crest, the brain, and the spinal cord (Gilbert 2001). Insights into this phenomenon are evident throughout all PP2C groups found to have diverged in this second active period. All PP2C vertebrate-specific duplications characterized in this study show specialized tissue-specific expression patterns. In fact, in each of these duplications, one duplicant is uniquely expressed in either skeletal muscle or nervous system tissue. Whereas PP2Cα and FEM2 are ubiquitously expressed, their duplication partners PP2Cβ and POPX1 are tissue specific. PP2Cβ splice variants were shown to display skeletal muscle- and heart-specific tissue expression (Marley et al. 1998; Seroussi et al. 2001). Similarly, POPX1 displays brain-specific expression (Koh et al. 2002). PP2Cζ is expressed in the testicular germ cells (Kashiwaba et al. 2003), while PPM1H is expressed in the brain and is involved in neuronal inhibitory pathways (Labes et al. 1998). Finally, PDP1 and PDP2 both catalyze the same reaction but differ in tissue distribution. PDP1 is highly expressed in skeletal muscle, whereas PDP2 is expressed in liver (Huang et al. 1998).

The evolutionary conservation pattern of PP2C suggests that the highly conserved catalytic domain and the surrounding core are shared by all PP2C members. Of specific interest are those residues which have not been previously identified as critical to the enzymatic function of the protein. Since these sites are abundant on the protein surface, they may represent novel protein or ligand binding sites. As the PP2C superfamily plays an important role in delicate signals relayed across the cell, it is likely that the different PP2C proteins are further bound by tight regulation. Thus, the novel conserved sites found may be the key to understanding these regulatory mechanisms.

Aside from the shared conserved domain, several PP2C families have a unique appendage which may be the specificity determinant of each such family. For instance, in the C-terminal region, PP2Cα and PP2Cβ share a unique tail (Fig. 4). These differences may be responsible for the fact that the two paralogues differ in their substrate binding abilities. Further investigation of the differences between the families could also focus on the gene-regulation level, for instance, by comparing the promoters and nontranslated regions of the different paralogues. We have previously reported that the nontranslated regions of PP2Cβ are highly conserved throughout orthologous genes (Seroussi et al. 2001), indicating a significant regulatory role for these regions. Furthermore, different PP2Cβ transcripts were found to differ by alternative splicing and alternative promoters (Ohnishi et al. 1999). It thus seems that the diversification which gave rise to new PP2C families continued with the creation of variants which differ at the transcriptional level. Pinpointing the precise functional differences among and within the PP2C families remains a challenge which may play a pivotal role in our understanding of complex signal networks in cells.