Introduction

The evolution of chordates shows an impressive diversification and specialization of structural and biochemical features. At the molecular level, these phenotypic adaptations are paralleled by the expansion and increased complexity of gene families over vertebrate history. While point mutations and subsequent selection generated some genetic variability, large-scale genomic duplication events are thought to be mainly responsible for the diversity in vertebrate gene families (for review see Ohno 1999; Zhang 2003; Sémon and Wolfe 2007). Two large-scale duplication events (1R and 2R) early in chordate evolution generated four paralogous genes in vertebrates for each protochordate/invertebrate ortholog (Ohno 1970, 1999; Panopoulou and Poustka 2005). Evidence from recently completed fish genomic assemblies supports the earlier suggestion (Ohno 1970) that an additional round of duplication (3R) occurred early in the ray-finned fish lineage. Thus, there is a one-four model for tetrapods (i.e., one gene in invertebrates could have four homologs in tetrapods) and a one-eight model for teleost genes (i.e., possibly eight genes in teleosts homologous to one invertebrate gene) (Taylor et al. 2003; Meyer and Van de Peer 2005). Though gene-specific evolutionary trajectories resulted in gene families deviating from these predictions, multiple examples consistent with the 1R/2R/3R theory are found throughout vertebrates (Meyer and Schartl 1999; Taylor et al. 2003; Hoegg et al. 2004; Prohaska and Stadler 2004; Steinke et al. 2006).

Mechanistically, each duplication event initially provided organisms with two sets of genes presumably carrying redundant functions. Sustained selective pressure may conserve one of the duplicates, while the other may be relieved from such constraints and could rapidly evolve without deleterious consequences (Zhang 2003; Sémon and Wolfe 2007). The rapid accumulation of mutations in the regulatory or coding sequence of a duplicate would allow a conservation of gene dosage and prevent aberrant phenotypes caused by the doubled expression of the ancestral gene product (see Conrad and Antonarakis 2007). This often leads to the loss or pseudogenization of the rapidly evolving duplicate, as exemplified by the accumulation of nuclear hormone receptor (NHR) pseudogenes in vertebrate genomes (Zhang et al. 2008). In contrast, some duplicated genes may follow a radically different evolutionary trajectory, and eventually assume only a portion of the functions of the ancestral gene (subfunctionalization), or develop entirely new functions (neofunctionalization) (Zhang 2003; Sémon and Wolfe 2007). The vertebrate retinoic acid receptors are good examples of such diversification, as family members assumed distinct expression patterns and ligand-binding abilities post-duplication (Escriva et al. 2006).

The PGC-1 gene family is a small family of transcriptional coactivators that plays a pivotal role in several aspects of metabolic regulation. In mammals, there are three members: PGC-1α, PGC-1β, and PGC-1 related protein (PRC). PGC-1α was the first member of the family to be discovered and was initially characterized as an inducer of brown fat differentiation (Puigserver et al. 1998). Since then, it has been linked to multiple metabolic programs in diverse mammalian systems (for review see Puigserver and Spiegelman 2003; Handschin and Spiegelman 2006). PGC-1α and PGC-1β are relatively similar in structure and tissue distribution, but, while they exhibit some functional redundancy in certain metabolic programs, distinctive capacities have been attributed to each coactivator (Lin et al. 2002a; Handschin and Spiegelman 2006; Uldry et al. 2006). In comparison, PRC is much more divergent in structure and expression, and its functional role in metabolic regulation remains largely unexplored (see Andersson and Scarpulla 2001).

Of the three family members, PGC-1α is probably the most studied, and is known to assume many well defined roles in mammals (Handschin and Spiegelman 2006; Scarpulla 2008). Its complex protein architecture allows it to interact with myriad regulators of gene expression, thereby conferring a major role in the coordination of acclimation to developmental, physiological, and environmental stressors. The protein has four main functional regions: the activation domain (AD), the nuclear respiratory factor-1 (NRF-1) binding domain, the MEF-2c binding domain, and the RNA binding domain (RBD). Within the AD, canonical leucine-rich motifs (LXXLL) allow PGC-1α to interact with members of the NHR superfamily, including PPARs, estrogen-related receptor alpha (ERRα), and hepatocyte nuclear factor 4alpha (HNF4α), mediating their effects on metabolic homeostasis (Vega et al. 2000; Wu et al. 2002). In addition, PGC-1α possesses a PPARγ-specific interaction domain that permits ligand-independent interactions with the NHR during brown fat differentiation (Puigserver et al. 1998). PGC-1α exerts its effects on the muscle phenotype via the nuclear respiratory factors (NRF-1 and 2) and myocyte enhancer factor 2c (MEF2c). The coactivator binds these transcription factors through independent motifs and mediates their respective roles on muscle metabolism (Wu et al. 1999; Michael et al. 2001; Lin et al. 2002b; Vercauteren et al. 2008). Considered together, these features garnered PGC-1α the label of a “master controller” of oxidative metabolism in mammals (Puigserver and Spiegelman 2003; Scarpulla 2006). Though PGC-1α may play a similar role in control of mitochondrial content of birds (Ueda et al. 2005), recent studies suggest that its role may differ in lineages that diverged early in vertebrate evolution. Specifically, in the context of metabolic remodeling in fish muscle in response to exercise, dietary and temperature stressors, mitochondrial changes seem to occur in a PGC-1α independent manner, possibly compensated through changes in PGC-1β, while other roles such as the PGC-1α regulation of lipid homeostasis appear to be conserved (McClelland et al. 2006; LeMoine et al. 2008).

The modular structure of PGC-1α, with separate interaction sites involved in distinct metabolic programs, could allow for the different binding domains to follow independent evolutionary trajectories. Thus, while some functions of PGC-1α may be conserved throughout vertebrates, others could experience lineage-specific diversification. In this study, we test these possibilities by evaluating the evolutionary history of PGC-1α in representative vertebrate species and the potential functional implications entailed.

Methods

PGC-1 Sequences

Protein sequences from the different family members were obtained from the ENSEMBL and GENBANK databases (see Tables 1, 2). PGC-1α nucleotide sequences were also generated from muscle cDNA using PCR. Total RNA was extracted from flash frozen ground muscle tissues of targeted species (Table 1) using the RNeasy Extraction kit according to manufacturer’s instructions (QIAGEN, Mississauga, ON, Canada). Total RNA was reverse transcribed and the PGC-1α orthologs amplified from each species using consensus primers and species-specific primers (Supplemental Data). The cycling and reaction conditions were optimized for each target using Pfx proofreading polymerase (Invitrogen, Burlington, ON, Canada). Amplicons were run on agarose gels, purified using a QIAquick gel extraction kit (QIAGEN) and cloned with the QIAGEN cloning kit or TOPO TA kit (Invitrogen) following manufacturer’s instructions. Clones were sequenced in both directions at the Genome Quebec Sequencing center (Montreal, QC, Canada) and at Queen’s University Biology Molecular Core Facility (Kingston, ON, Canada).

Table 1 PGC-1α sequence information of representative vertebrate species used in this study
Table 2 Protein information for the PGC-1 homologs used in the phylogenetic reconstruction of the PGC-1 family

Phylogenetic Analyses

PGC-1α nucleotide sequences were aligned with ClustalW (Thompson et al. 1994). Through comparison of the nucleotide and translated protein sequences in MEGA 4.1 (Tamura et al. 2007), putatively non-homologous regions with either gaps or alignment discrepancies (including clade-specific insertions) between taxa were manually excluded from further analysis.

We reconstructed the phylogeny of the PGC-1 family to confirm the identity of the PGC-1α orthologs we retrieved. As outgroups of the modern vertebrate PGC-1s, we included putative homologs from a tunicate (Ciona intestinalis) and two arthropod species (Anopheles gambiae, Drosophila melanogaster). Protein sequences of the family members were aligned in ClustalW and subjected to a Bayesian analysis (MrBayes v. 3.1.2; Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003) for 500,000 generations, sampling every 100, with default settings (three heated chains and one cold chain for each of the simultaneous runs). We used a fixed-rate modeling approach with the PRSET aamodelpr = mixed option. This allows the MCMC sampler to explore each of 10 empirically determined fixed-rate models specified by the program, with each model contributing to the final tree achieved after convergence proportional to its posterior probability (Ronquist et al. 2005). The burn-in of 15% (750 of the 5000 trees sampled) was determined based on a preliminary run and visual inspection of the likelihood traces in Tracer v1.4 (Rambaut and Drummond 2007).
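The sampling arithmetic in this paragraph can be checked with a short sketch. The function name `mcmc_sample_counts` is ours and is not part of MrBayes; the calculation assumes one tree is saved per sampling interval and ignores the starting tree.

```python
def mcmc_sample_counts(generations, sample_freq, burnin_fraction):
    """Return (total sampled trees, trees discarded as burn-in, trees retained)."""
    total = generations // sample_freq          # one tree saved per sampling interval
    discarded = round(total * burnin_fraction)  # trees dropped before summarizing
    return total, discarded, total - discarded


# Values reported for the PGC-1 family analysis: 500,000 generations,
# sampling every 100, 15% burn-in
total, discarded, kept = mcmc_sample_counts(500_000, 100, 0.15)
print(total, discarded, kept)
```

With the reported settings this gives 5000 sampled trees, of which 750 are discarded, matching the numbers stated in the text.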

The retrieved PGC-1α homologs were annotated using the annotated rat PGC-1α sequence as a reference. Accordingly, the PGC-1α nucleotide sequences were partitioned into four functional domains (see Fig. 3) relative to the nucleotide sequence of the rat PGC-1α homolog (NM_031347): the AD from nucleotide (nuc.) 93–593, the NRF-1 domain (nuc. 594–1200), the MEF2c domain (nuc. 1201–1683), and the RNA binding domain (RBD, nuc. 1684–2167). We used Modeltest v. 3.7 (Posada and Crandall 1998) in conjunction with PAUP* (v 4.0b10, Swofford 2000) to select the best model of nucleotide evolution for each domain partition separately, according to the Akaike information criterion (AIC). The domain-specific evolutionary models selected were SYM + Γ for AD (lnL = −4382.29), GTR + I + Γ for NRF-1 (lnL = −6276.69), TVM + I + Γ for MEF2c (lnL = −5083.84), and GTR + I + Γ for RBD (lnL = −4435.0703). These models of nucleotide evolution were specified in separate Bayesian phylogenetic analyses in MrBayes 3.1.2 using the LSET command (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003), with the default prior distributions. The Metropolis coupled MCMC used two incrementally heated Markov chains and was run for 1 × 10^6 generations until all standard deviations of the split frequencies were <0.01, sampling every 100 generations. The first 2500 sampled trees were discarded as “burn-in” to ensure the analysis reached stationarity for each domain partition, and the remaining trees were used to estimate the Bayesian posterior probabilities. Trace analyses were performed to ensure sufficient effective sample size (>100) in Tracer v.1.4 and inferred tree topologies were retrieved in MEGA 4.1. In addition, maximum likelihood (ML) analyses were performed in PAUP* v 4.0b10, on each of the four domain partitions separately, specifying the nucleotide substitution models as chosen above. Support for trees generated for each analysis was estimated using non-parametric bootstrap analysis, with 100 pseudoreplicates.
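As an illustration of model selection by AIC, the sketch below ranks candidate substitution models by AIC = 2k − 2 lnL, where k is the number of free parameters. It is not the Modeltest implementation: the function names are ours, the log-likelihoods are hypothetical, and the parameter counts cover the substitution model only (branch lengths are ignored).

```python
def aic(ln_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * ln_likelihood


def best_model_by_aic(candidates):
    """candidates maps model name -> (lnL, number of free parameters)."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical log-likelihoods for one partition (not values from this study).
# Parameter counts: JC = 0, HKY + gamma = 5, GTR + I + gamma = 10.
candidates = {
    "JC": (-4500.0, 0),
    "HKY+G": (-4400.0, 5),
    "GTR+I+G": (-4390.0, 10),
}
best, scores = best_model_by_aic(candidates)
print(best, scores[best])
```

In this toy case the richer model is preferred only because its gain in log-likelihood outweighs the penalty of 2 per additional parameter.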

Analysis of Substitution Patterns

We employed three approaches to investigate patterns of substitution in different clades of interest. First, we used our ML phylogenies (based on the AIC) together with the codon-based model in PAML 4.1 (Yang 1997) to estimate the ratios of nonsynonymous (dN) to synonymous (dS) substitutions (ω = dN/dS) for each domain partition and test if these ratios were different among the lineages. The likelihood ratio test (LRT) indicated that the free model (each branch has its own ω; see Figures S1–4 in supplementary file 2) was the best model for each domain partition (P < 0.01). However, as our purpose was to compare the evolution of the different domains, we report here the estimation of ω for each domain as an estimate of the accumulation of dN/dS according to the single branch ratio model.
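The likelihood ratio test used to compare nested codon models can be sketched as follows. This is a minimal illustration rather than the PAML procedure itself: the function names are ours, the log-likelihoods are hypothetical, and the degrees of freedom must be supplied as the number of extra free parameters in the richer model (for a free-ratio versus one-ratio comparison, typically the number of branches minus one).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)             # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    # Recurrence Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical values: one-ratio model vs. a two-ratio model (one extra omega)
stat, p = likelihood_ratio_test(lnl_null=-4740.0, lnl_alt=-4735.0, df=1)
print(round(stat, 2), round(p, 4))
```

The test is only valid when the simpler model is a special case of the richer one, which holds for the one-ratio, two-ratio, and free-ratio models compared here.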

Second, we used a simple permutation test to compare mean pairwise divergences of sarcopterygians (tetrapods) and actinopterygians (fishes) using MEGA 4.1. Pairwise distances were estimated using the Kimura 2-parameter Γ distributed model of substitution for each domain, a single model to ensure comparability among domains. Sequences for each domain were randomly assigned (n = 10^3 permutations) to each group to generate a null distribution of the mean pairwise difference between tetrapods and fishes. The observed mean divergence in substitutions between the two groups was compared to our null distribution and a P-value generated.
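The permutation scheme described above can be sketched as follows. This is an illustration only: the function names and the toy distance matrix are hypothetical, and the one-sided P-value with an add-one correction is our assumption, since the text does not specify how the P-value was computed. In the study itself the distances came from the Kimura 2-parameter Γ model.

```python
import random
from itertools import combinations


def mean_within(dist, members):
    """Mean pairwise distance among the sequences indexed by `members`."""
    pairs = list(combinations(members, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_test(dist, group_a, group_b, n_perm=1000, seed=0):
    """One-sided test that group_a is more divergent than group_b."""
    observed = mean_within(dist, group_a) - mean_within(dist, group_b)
    pool = list(group_a) + list(group_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # random reassignment of sequences to the two groups
        a, b = pool[:len(group_a)], pool[len(group_a):]
        if mean_within(dist, a) - mean_within(dist, b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero (an assumption here)
    return observed, (hits + 1) / (n_perm + 1)


# Toy symmetric distance matrix: sequences 0-2 are "fish", 3-5 are "tetrapods"
fish, tetrapods = [0, 1, 2], [3, 4, 5]
dist = [[0.0] * 6 for _ in range(6)]
for i, j in combinations(range(6), 2):
    if j < 3:
        d = 0.5   # fish vs. fish
    elif i >= 3:
        d = 0.1   # tetrapod vs. tetrapod
    else:
        d = 0.3   # fish vs. tetrapod
    dist[i][j] = dist[j][i] = d

observed, p_value = permutation_test(dist, fish, tetrapods)
print(round(observed, 3), round(p_value, 3))
```

Shuffling group labels rather than individual distances keeps the non-independence of pairwise distances intact within each permuted group.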

Finally, we estimated mean substitution rates independently for each of the four functional domains within sarcopterygians and actinopterygians separately using BEAST version 1.4.8 (Drummond and Rambaut 2007). We used a model with discrete Γ distribution with six rate categories. We further specified a relaxed molecular clock, where substitution rates are assumed to be uncorrelated between neighboring branches and also to follow a lognormal distribution (Drummond et al. 2006). Substitution rates for each domain were estimated by MCMC sampling in BEAST. For each domain we ran the MCMC analysis twice, each with 10,000,000 steps from which we discarded 1,000,000 as burn-in. We assigned nine internal TMRCA priors assuming a lognormal distribution based on fossil evidence following Benton and Donoghue (2007) (see Table 3). For each domain we combined the two runs using LogCombiner version 1.4.8. We compared the rates of evolution between the AD and the other domains using paired t-tests in each branch for all species. In addition, we computed the average (±standard deviation) substitution rate across all branches for sarcopterygians and actinopterygians separately, and compared them by one-tailed t-tests.
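The comparison of mean rates between the two clades can be illustrated with a small sketch. The text does not name the t-test variant; the fractional degrees of freedom reported in the Results are consistent with an unequal-variance (Welch) test, which is what this sketch assumes. The function name and the rate values are hypothetical.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-branch substitution rates (arbitrary units)
fish_rates = [2.1, 2.6, 1.9, 2.8, 2.4]
tetrapod_rates = [1.5, 1.7, 1.4, 1.6, 1.8]
t, df = welch_t(fish_rates, tetrapod_rates)
print(round(t, 2), round(df, 2))
```

A one-tailed P-value would then be read from the upper tail of a t distribution with the computed degrees of freedom.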

Table 3 Prior probability distributions for MCMC estimates of substitution rates using BEAST

Gene Predictions

We used the ENSEMBL genome browsers to identify the genomic location of putative PGC-1α homologs in human (Chr. 4), chicken (Chr. 4), Xenopus (Sc. 231), stickleback (group VII) and medaka (Chr. 18). We then compared the annotated region flanking these loci to investigate the respective syntenic relationships in these species. In addition, we used the ENSEMBL gene structure predictions of the chicken PGC-1α (ENSGALG00000014398) and stickleback PGC-1α (ENSGACG00000019546) along with the transcript sequences to infer the intron/exon structures and compare them to the human PGC-1α gene (ENSG00000109819).

Results

PGC-1α Gene in Vertebrates

The PGC-1α homologs that were amplified all clustered with established PGC-1α proteins, suggesting that the retrieved sequences were orthologous to the mammalian PGC-1α gene (Fig. 1). To further confirm the identity of the putative PGC-1α homologs, we used the ENSEMBL genome browser to analyze the genomic region surrounding the PGC-1α genes in representative species. The synteny analysis revealed homologous organization in the model species examined (Fig. 2a). In addition, the comparison of the predicted PGC-1α gene intron/exon structures revealed a conserved organization among vertebrates (Fig. 2b). The gene included 12 introns and 13 exons in all species examined, encoding a polypeptide ranging in size from 798 amino acids in tetrapods to 975 amino acids in fish (Fig. 2b). Collectively, these data suggested that the PGC-1α sequences obtained in the current study were true orthologs of the mammalian PGC-1α.

Fig. 1

Phylogeny of the PGC-1 family. Tree resulting from a Bayesian phylogenetic analysis of PGC-1 family proteins from representative species. PGC-1α coding sequences (Table 1) were translated using MEGA 4.1; protein sequences of other family members, including the putative PGC ancestors from two arthropods (Drosophila and Anopheles mosquito) and a primitive chordate (sea squirt), were retrieved from ENSEMBL and GENBANK databases (Table 2). All nodes were supported by ≥95% posterior probabilities except where noted (93%, 90%, *70%). The scale indicates the estimated number of substitutions per site

Fig. 2

Synteny and structure of the PGC-1α gene in vertebrates. a Syntenic comparisons of the PGC-1α gene in representative species; the black boxes represent the genomic location of the genes. b Deduced intronic (lines) and exonic (boxes) organization of the PGC-1α gene in vertebrates, adapted from the ENSEMBL prediction database for PGC-1α from human (ENSG00000109819), chicken (ENSGALG00000014398), and stickleback (ENSGACG00000019546)

Interaction Domains

The general architecture of the PGC-1α protein was relatively conserved across the vertebrate taxa sampled. The positions of the specific amino acid residues (a.a.) discussed in the following sections refer to the location on the rat PGC-1α protein (NP_112637). Of the four functional domains, the amino terminal AD showed the greatest similarity among vertebrates. In particular, the two activation motifs, AD1 (a.a. 30–40) and AD2 (a.a. 82–95), crucial to PGC-1α transcriptional activity (Sadana and Park 2007), showed over 90% identity across taxa. The three NHR boxes, also characteristic features of PGC-1α (Fig. 3), were identical in all species investigated; the residues flanking these boxes as well as their spatial organization were highly conserved as well.

Fig. 3

Structure and conservation of the PGC-1α protein in vertebrates. The activation domain (AD), NRF-1 domain, MEF2c domain, and RBD, used to partition the PGC-1α sequences, are indicated on the rat protein sequence. Nuclear hormone receptor boxes (black boxes) and Host Cell Factor (white box) binding sites are represented on representative vertebrate species. The fish-specific serine and glutamine-rich insertions are indicated by an arrow. For each species, the number above a delineated domain represents the percentage of residues in that region identical to the rat protein

The central section of the protein was comparatively more divergent among vertebrates. For our analysis, we further divided it into the two major transcription factor interacting regions, the NRF-1 (a.a. 180–403) and MEF2c (a.a. 404–564) domains (Fig. 3), based on findings of previous studies (Michael et al. 2001; Vercauteren et al. 2008; Wu et al. 1999). Lineage-specific insertions occurred within both domains. In the NRF-1 interaction domain, substantial serine-rich insertions (56–86% serine) appeared in chondrosteans (12–13 residues), bowfin (22 residues), and teleosts (28–31 residues). The teleosts also exhibited a glutamine-rich insertion (4–14 residues) flanking the well conserved tri-lysine residues typical of the NRF-1 interaction domain of vertebrate PGC-1α orthologs (a.a. 187–189). In that same domain, the interaction motifs for PPARγ (a.a. 292–339) and Host Cell Factor (HCF; a.a. 383–387) were reasonably conserved across species. The MEF2c domain was the least conserved part of the protein, with an additional serine-rich insertion (5–15 a.a.) specific to neopterygians. Within the central domain, there is a region of unknown function (a.a. 448–464) that showed ≥50% a.a. identity in all the PGC-1α orthologs. This region is also over 75% identical to rat PGC-1β (a.a. 722–738).

The carboxy terminal RBD is composed of the RNA recognition motif (a.a. 678–709) and two sequential series of serine–arginine repeats (a.a. 565–598 and a.a. 620–632). The RNA recognition motif exhibited 66% identity among vertebrates. The serine–arginine repeats were relatively more divergent in primary sequence, but most species still exhibited an overall high proportion of these residues in this region.

Sites of Post-translational Modification

We also assessed the conservation of numerous sites throughout the protein that are modified in ways that may regulate PGC-1α activity (Rodgers et al. 2005; Teyssier et al. 2005). All three p38 MAP kinase phosphorylation sites (Thr262, Ser265, Thr298) were present in all taxa (Table 4). In contrast, of the two AMP kinase phosphorylation sites identified in mammals (Thr177, Ser538; Jäger et al. 2007), Ser538 was conserved in all species, while Thr177 was specific to tetrapods. Two methylation sites (Arg665, Arg667; Teyssier et al. 2005) were shared in all species, while a third site (Arg669) was absent in several tetrapod and fish species (Table 4). The majority of lysine residues targeted for acetylation (Rodgers et al. 2005) were conserved; however, Lys320 was present only in sarcopterygians (including tetrapods), and Lys441 was limited to amniotes (Table 4).

Table 4 Covalent modifications of PGC-1α and their conservation in vertebrates

Phylogenetic Divergence

The structural comparisons showed that the majority of PGC-1α critical residues was conserved across vertebrates, but that the ray-finned fishes possessed unique and extensive amino acid insertions in two major transcription factor binding domains (NRF-1 and MEF2c). We undertook phylogenetic analyses of PGC-1α to see how this protein evolved in major vertebrate lineages.

Tree topologies based on the whole DNA partition and amino acid sequences were consistent with evolutionary relationships among vertebrates (Figs. 1, 4). However, the actinopterygian clade exhibited higher divergence (i.e., longer branch lengths) as a group as well as among specific lineages (Fig. 4). To further understand this phenomenon, we analyzed the four main functional domains independently.

Fig. 4

PGC-1α phylogeny in vertebrates. Tree (50% majority rule) resulting from combined, partitioned Bayesian analysis of all PGC-1α data (see Fig. 5 for details on the models for each partition). The tree topology is supported by ≥99% posterior probabilities except where noted. The scale indicates the estimated number of substitutions per site

The Bayesian and ML phylogenetic analyses of each domain partition revealed four trees generally consistent between analytical approaches; thus we will principally discuss the results of the Bayesian analysis (Fig. 5). First, to assess if the different functional domains exhibited different rates of evolution, we tested in PAML if the estimated single ratio model (ω = dN/dS) indicated differences in evolutionary rates across domains. Essentially, as presented in Parsch et al. (2001), we estimated a single ω for the whole sequence, and compared this to a model where each domain has its own ω. This analysis confirmed that the different domains had divergent rates of evolution (whole PGC-1α: ω = 0.17, lnL = −20259; AD: ω = 0.11, lnL = −4732; NRF-1: ω = 0.24, lnL = −6491; MEF2c: ω = 0.21, lnL = −5258; RBD: ω = 0.08, lnL = −4639) according to the LRT (2ΔlnL = 1722, df = 3, P < 0.000001), as the likelihood values of the domain models are additive and can be compared to the likelihood of the single ω model for the whole sequence as presented previously (Parsch et al. 2001). The tree topologies generated were overall in accordance with the accepted classification, with a few exceptions (Fig. 5). However, disagreements between partition-specific topologies and accepted classification (e.g., the position of the bowfin for NRF-1 domain or of the lungfish for RBD domain) were usually based on nodes with lower posterior probabilities (0.57 and 0.66, respectively) (Fig. 5b, d). When considering both the protein alignment and architecture conservation, the AD was the most conserved functional domain of PGC-1α in the vertebrate species examined (Fig. 5a). Furthermore, the BEAST analysis suggested that the AD exhibited the lowest substitution rates of all PGC-1α domains in vertebrates (AD-NRF t(32) = −4.33, P = 0.0001; AD-MEF t(32) = −4.69, P < 0.0001; AD-RBD t(30) = −3.93, P = 0.0005). For example, the NRF-1 domain experienced substitution rates 1.5- and 2-fold faster than the AD in tetrapods and fish, respectively (Table 5). In addition, analysis of the pairwise distances and substitution rates within the two major groups suggested no significant difference in the number of substitutions accumulated in the actinopterygian and sarcopterygian lineages for this domain (Table 5; Fig. 6a). In contrast, the other three domains showed very different evolutionary patterns, with the ray-finned fishes exhibiting significantly higher divergence than tetrapods (Figs. 5b–d, 6b–d). In particular, fishes that experienced relatively large insertions in the NRF-1 and MEF2c domains were the most divergent vertebrate species for these respective domain partitions (Fig. 5b, c). Furthermore, the NRF-1 domain of ray-finned fish experienced rates of evolution 1.3-fold faster than tetrapods (t(34.74) = 1.43, P = 0.081; Table 5). Similarly, actinopterygians exhibited accelerated evolution of the MEF2c domain, with substitution rates approximately 1.8-fold faster than sarcopterygians (t(34.79) = 2.32, P = 0.013; Table 5). Though the RBD exhibited a similar pattern in substitution rates (45% higher in fish than in tetrapods; t(34.69) = 2.002, P = 0.027), there was less overall divergence between tetrapods and fishes (Figs. 5d, 6d; Table 5).

Fig. 5

Phylogeny of the PGC-1α functional domains in vertebrates. The best models of evolution according to the Akaike information criterion were selected for the activation (SYM + Γ, lnL = −4382.29), NRF-1 (GTR + I + Γ, lnL = −6276.69), MEF2c (TVM + I + Γ, lnL = −5083.84), and RNA binding (GTR + I + Γ, lnL = −4435.0703) domains. Each separate partition was then subjected to Bayesian and maximum likelihood analyses (see text for details). The maximum likelihood support and Bayesian posterior probabilities for the branches are shown before and after the slash, respectively. The scale indicates the estimated number of substitutions per site

Table 5 Mean substitution rates (±SD), measured as substitutions per site per million years, estimated for each of four PGC-1α domains for the Actinopterygii and Sarcopterygii
Fig. 6

Comparison of the difference of the mean number of substitutions between the ray-finned fishes and the tetrapod lineage. The bars represent the expected frequency of the mean number of substitutions generated by 10^3 permutations with random assignment to each group of the Kimura 2-parameter pairwise differences within each clade. The white bars indicate higher number of accumulated substitutions in fish versus tetrapods, while gray bars indicate the opposite. The arrow indicates the observed mean difference between the two groups

We also looked at the ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) across domains. This ratio can be informative as it may reveal neutral evolution (ω = 1), purifying selection (ω < 1), or positive selection (ω > 1) in specific clades or gene partitions of the protein. For each domain, the free-ratio model (ω can vary among branches) was always the best model (P < 0.01) when compared to the one-ratio model (ω equal for all branches) and the two-ratio model (separate ω for the fish and tetrapod lineages), suggesting that ω varies among species. This also suggests that the evolutionary trajectories within the different clades are too divergent to detect significant clade-specific evolutionary patterns. However, under the single branch model where ω is estimated for the entire dataset in each domain partition, both the AD (ω = 0.11) and RBD (ω = 0.08) showed stronger trends of purifying selection than the NRF-1 (ω = 0.25) and MEF2c (ω = 0.20) domains.

In all four domains, amphibians presented higher overall divergence compared to other sarcopterygian clades (e.g., MEF2c region, Fig. 5c). In fish, the domain-specific divergences were relatively variable across species. Interestingly, in the two domains harboring extensive insertions in neopterygians (NRF-1 and MEF2c), the bowfin appeared closer to the teleost group than to the bichir and sturgeon in our phylogenetic analysis. Conversely, across domains, the chondrosteans showed an overall divergence similar to tetrapods (Fig. 5).

Overall, our phylogenetic analyses of PGC-1α collectively suggested asymmetric evolution across the different functional sites, with radically different evolutionary trajectories of the NRF-1 and MEF2c domains in ray-finned fish, as this lineage experienced accelerated rates of evolution for these functional domains.

Discussion

The evolution of vertebrate transcriptional regulatory networks has been punctuated by the multiplication and structural diversification of the transcription factors (Meyer and Schartl 1999; Escriva et al. 2003). Most studies to date have focused on the DNA-binding transcription factors such as HOX (reviewed in Hoegg and Meyer 2005; Hurley et al. 2005) and myogenic factors (Atchley et al. 1994; Macqueen and Johnston 2008). In the evolution of the metabolic phenotype, the best studied gene family is the NHR superfamily (see Escriva et al. 2003; Bury and Sturm 2007). However, the recent recognition of coregulators as central mediators of complex physiological responses in mammals suggests a pivotal role of these factors in vertebrate evolution. In the current study, we evaluated the evolutionary history of the PGC-1 family of coactivators and assessed the divergence of PGC-1α in representative vertebrate taxa. Our results suggest modular evolution of the coactivator as the functional domains of the protein experienced radically different patterns of evolution across species.

Evolution of the PGC-1 Family

The PGC-1 family likely evolved from an ancestral gene early in metazoan history. A distant putative homolog was found in Drosophila with a role in regulating metabolic homeostatic response to nutritional status (Gershman et al. 2007). Our additional database searches revealed the presence of potential homologs in mosquito and sea-squirt species (see Table 2). The putative ancestor harbored several features common to all three PGC-1 paralogs: a carboxyl terminus serine–arginine region in tandem with an RNA recognition motif, a putative HCF binding motif, and degenerate carboxyl leucine-rich boxes that could mediate interactions with NHR (Huang et al. 1998; Gershman et al. 2007). Based on the characteristics of these invertebrate sequences, it is likely that the PGC-1 common ancestor lacked many of the functions of vertebrate PGC-1 paralogs. For example, invertebrate PGC-1α homologs lack the N-terminal canonical LXXLL motifs and the PPARγ binding site, which is perhaps not surprising given that the nuclear receptor families (e.g., PPAR, ER, and ERR) only expanded and diversified into multi-isoform families as a result of duplication events early in vertebrate history (Escriva et al. 2004; King-Jones and Thummel 2005).

The one-four model, often implicated in the diversification of gene families in vertebrates (Ohno 1999; Panopoulou and Poustka 2005), is consistent with the topology of the PGC-1 family. Our amino acid phylogenetic reconstruction of the PGC-1 family history suggested a first duplication event that gave rise to the PGC-1α/β ancestor and PRC ancestor (Fig. 1). A second duplication could have produced the PGC-1α and β paralogs while one of the PRC duplicates was lost. All three family members share common structural motifs and functions; however, they differ in tissue distributions and responsiveness to metabolic signals (Puigserver et al. 1998; Andersson and Scarpulla 2001; Kressler et al. 2002; Lin et al. 2002a; Vercauteren et al. 2006). These structural features collectively argue for a common ancestry of the family with subsequent neo- or sub-functionalization of the paralogs, as seen in other vertebrate gene families (Lin et al. 2006; Woolfe and Elgar 2007).

PGC-1α Evolution in Vertebrates

We retrieved several PGC-1α homologs from representative vertebrate species via sequencing and public database searches (GENBANK, ENSEMBL). Synteny and intron/exon arrangement analyses of selected sequences revealed that these sequences were orthologous to the mammalian PGC-1α (Fig. 1a, b). We note that, despite extensive RT-PCR and database searches, no evidence of a duplicate PGC-1α gene in any taxon was found. The structural integrity of PGC-1α was overall well conserved over the course of vertebrate history.

Regulatory Sites

As is the case for many regulators of gene expression, PGC-1α activity is strongly regulated post-translationally through covalent modification of multiple residues (Cao et al. 2004; Rodgers et al. 2005; Teyssier et al. 2005; Jäger et al. 2007). Most of the regulatory sites were highly conserved in all species (see Table 4). However, the presence of a few lineage-specific sites may suggest interesting differences in the activity and adaptive role of the protein in these lineages. As an example, phosphorylation of PGC-1α by AMPK promotes mitochondrial biogenesis in murine muscle cells, and the mutation of both regulatory sites ablates PGC-1α coactivating activity on its own promoter (Jäger et al. 2007). Thus, the presence of both residues in tetrapods versus only one in all other species suggests potential differences in the responsiveness of PGC-1α to AMPK in these species. This may in turn have important consequences for the role of the coactivator in metabolic programs modulated through AMPK, such as acclimation to exercise in muscle. Furthermore, it could explain the absence of a PGC-1α induction in response to endurance training in zebrafish (McClelland et al. 2006). Similarly, the absence of some residues targeted for deacetylation or methylation in several lineages (e.g., Lys441, Arg669) could indicate an overall lower basal activity of the coactivator in these species, as suggested by mutation analyses in mammals (Rodgers et al. 2005).

In general, the regulatory sites of PGC-1α were relatively well conserved across vertebrates. Similarly, the overall organization of the four main functional domains was conserved in all species. However, when we evaluated the phylogenetic signal of these domains independently, they exhibited very distinct evolutionary patterns within different clades.

N- and C-Termini

The amino terminus of the protein, carrying the AD, was the most conserved feature of the coactivator (Fig. 5a, 6a; Table 5). Our substitution rate analysis suggested that this domain evolved at similar rates in actinopterygians and sarcopterygians (Table 5). Furthermore, the AD experienced the slowest substitution rates of all the functional domains of the protein (Table 5). Overall, these results suggest that this domain may be under strong purifying selection in vertebrates. Functionally, this region is critical for the effects of PGC-1α on gene expression; upon interaction with DNA-bound proteins, the coactivator recruits histone remodeling proteins (SRC-1, CBP/p300) via this domain, facilitating the transcriptional activation of target genes (Puigserver et al. 1999, 2003). Another highly conserved characteristic of this region is the presence and spacing of three leucine-rich boxes (LXXLL), critical motifs for interactions with members of the NHR superfamily (Vega et al. 2000; Wu et al. 2002; Oberkofler et al. 2003). The preservation of these NHR binding sites would allow for similar capabilities for coactivation of these transcription factors (e.g., PPARs) in all vertebrate taxa, a possibility suggested by expression profiles in goldfish (LeMoine et al. 2008).
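
As an illustration of how the presence and spacing of leucine-rich boxes can be checked in a protein sequence, the following is a minimal sketch and not the procedure used in this study. The function name `find_lxxll` and the toy peptide are hypothetical; the pattern simply encodes the LXXLL consensus (leucine, any two residues, two leucines) and assumes an ungapped sequence.

```python
import re

# Lookahead pattern so that overlapping boxes are also reported:
# L, any two residues, L, L (the LXXLL consensus).
LXXLL = re.compile(r"(?=(L..LL))")

def find_lxxll(seq):
    """Return (1-based start, motif) for every LXXLL box in an ungapped protein sequence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in LXXLL.finditer(seq)]

# Hypothetical toy peptide for illustration only (not a real PGC-1alpha sequence).
toy = "MAELKKLLSSPLSALLDG"
for pos, motif in find_lxxll(toy):
    print(pos, motif)
```

Comparing the start positions returned for orthologous sequences gives a quick view of whether the number of boxes and the distances between them are preserved across taxa.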

The carboxyl terminus of the protein was comparatively more divergent in the ray-finned fish relative to tetrapods (Fig. 5d, 6d; Table 5). However, we found the RNA recognition motif to be relatively well conserved across species, and while the SR domains were variable among species, all vertebrate PGC-1α orthologs possessed a region rich in serine and arginine residues upstream of the RNA recognition motif. This suggests a conservation of the RNA binding/processing function in vertebrates. In contrast, the high variability of the region flanking the RNA recognition motif in fish may prevent interactions with other transcription factors associated with this area, such as NRF-1 and FOXO1 (Puigserver et al. 2003; Vercauteren et al. 2008). Interestingly, within the PGC-1 family, the termini are the most conserved features of the proteins, suggesting a strong selective pressure promoting the general integrity of the AD and RBD, which are essential to the coactivating capabilities of the paralogs.

NRF-1 and MEF2c Domains

Within the central portion of PGC-1α, defining landmarks of the PGC-1 paralogs (the PPARγ and HCF interaction domains) were relatively well conserved across species, suggesting similar binding capacities. Given the stasis of these domains, our phylogenetic analysis of the functional domains presented an interesting contrast. First, the PGC-1α domains exhibited asymmetric evolutionary dynamics (Fig. 5). While the AD showed little variability within and between the sarcopterygian and actinopterygian clades, both the NRF-1 and MEF2c domains were markedly less conserved, especially among ray-finned fishes (Figs. 5a–c, 6b; Table 5). In particular, these domains evolved significantly faster in fish than in tetrapods, suggesting divergent evolutionary pressures over these domains in these lineages (Table 5). In addition, most fish species experienced several important insertions within these regions (see Fig. 3). An initial polyserine insertion in the NRF-1 binding domain likely occurred at the base of the actinopterygian lineage, and was then repeatedly extended at the base of the neopterygian and teleost clades. Downstream of these residues, we also found a glutamine-rich insertion of variable length in most teleost species (see Fig. 3). The divergence of the NRF-1 domain along with the gradual expansion of the serine insertion suggests a relaxation of the functional constraints associated with this region in fish lineages. Indeed, the location of these residues may have important consequences for the ability of PGC-1α to interact with NRF-1. The NRF-1 domain has not been finely mapped in PGC-1α, but in PRC the minimal NRF-1 interaction domain spans 34 a.a. containing tri-lysine residues in the central portion of the protein (Vercauteren et al. 2006). In fish PGC-1α the polyserine insertion is located directly upstream of these conserved lysines, and thus possibly within the putative NRF-1 binding domain.
Therefore, in fish these insertions could prevent interactions of PGC-1α with NRF-1, and consequently alter its capacity to mediate NRF-1 effects on mitochondrial capacity. It should be noted, however, that the minimal domain of NRF-1 that interacts with PGC-1α is absolutely identical among representative species of major vertebrate lineages (e.g., mouse, zebrafish, chicken, Xenopus, anole; data not shown). A functional divergence of PGC-1α could provide a mechanistic explanation for the absence of correlation between PGC-1α and the NRF-1 axis observed in goldfish tissues with development (fiber type differences) and physiological remodeling (diet and temperature) (LeMoine et al. 2008). Similarly, in the MEF2c domain of neopterygians, an additional serine insertion was accompanied by an increased nucleotide divergence relative to other fish (Fig. 5c). Apart from a relatively well conserved stretch (a.a. 448–471), this domain showed little apparent homology between sarcopterygians and actinopterygians. In mammals, this region has no other known functions, and the exact residues allowing interactions with MEF2c have not yet been fully mapped (Michael et al. 2001). Interestingly, a polymorphism flanking the conserved region of that domain has been associated with decreased capacity to bind MEF2c in humans (Zhang et al. 2007). Therefore, it is conceivable that the high variability of this domain reflects the low complexity of that region, and that despite this apparent divergence, PGC-1α and MEF2c assume an evolutionarily conserved role in jointly regulating muscle phenotype in vertebrates. However, searches of current annotated public genetic databases (ENSEMBL) for MEF2c suggest that bony fishes possess a duplicated isoform of the protein. Furthermore, within and among fish species as well as across all vertebrates, these isoforms present relatively low levels of amino acid identity in the MEF2c domain necessary for its interactions with PGC-1α.
Thus, it is possible that the divergence of the MEF2c binding domain in fish PGC-1α reflects the coevolutionary history of the two binding partners (MEF2c and PGC-1α). In addition, other parts of the proteins could be involved in the interactions between PGC-1 and its partners, but without direct structural data on these interactions we cannot elaborate further on their exact nature. Rigorously exploring these possibilities is beyond the scope of the current study, but they generate interesting hypotheses that warrant further testing.
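
Homopolymeric insertions such as the polyserine and glutamine-rich stretches described above can be located in a sequence with a simple run search. The sketch below is illustrative only and is not the method used in this study; the function name `homopolymer_runs`, the default minimum run length of four residues, and the toy peptide are all assumptions made for the example.

```python
import re

def homopolymer_runs(seq, residue="S", min_len=4):
    """Return (1-based start, length) for each run of `residue` at least `min_len` long.

    The default threshold of 4 is an arbitrary choice for illustration.
    """
    pattern = re.compile(re.escape(residue) + "{%d,}" % min_len)
    return [(m.start() + 1, len(m.group())) for m in pattern.finditer(seq.upper())]

# Hypothetical toy peptide for illustration only (not a real PGC-1alpha sequence).
toy = "MKTSSSSSSAQQQQKKK"
print(homopolymer_runs(toy, "S"))  # serine runs
print(homopolymer_runs(toy, "Q"))  # glutamine runs
```

Applied to orthologs from successive clades, the reported run lengths would show whether an insertion lengthens along a lineage, which is the pattern inferred here for the serine insertion in the NRF-1 domain.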

Each of the phylogenies generated in this study indicated that fishes and amphibians are more divergent than other lineages. It is interesting to note that multiple species in these lineages are polyploids (Ohno 1999; Otto and Whitton 2000; Beçak and Kobashi 2004; Hellsten et al. 2007). It has been suggested that such polyploidization events demonstrate the genomic “plasticity” of these lineages, which are characterized by much higher divergence rates than other vertebrates (see Robinson-Rechavi and Laudet 2001; Venkatesh 2003; Wagner et al. 2004). However, there is no evidence for retention of duplicated PGC-1α genes in any species investigated in this study.

Summary

Transcription factors play a major role in the development and specialization of adaptive traits in vertebrate species, but the modulation of their activities through coregulators provides finer adjustments of the tissue-specific response to a variety of signals. This study suggests that the diversification and evolution of coregulators such as the PGC-1 family could provide invaluable evolutionary opportunities to adaptively tailor regulatory networks and their effects in specific lineages. Specifically, the conservation of the AD, RBD and NHR boxes suggests that PGC-1α could assume a similar role in coactivating NHR across vertebrate taxa. In contrast, the potential disruption of the NRF-1 binding site in fish would reduce the function of PGC-1α as a mediator of mitochondrial biogenesis in these species, a role that could possibly be assumed in fish by other PGC-1 paralogs (e.g., PGC-1β) as suggested previously (LeMoine et al. 2008). Although gene expression studies in fish suggest such divergence in PGC-1α function (McClelland et al. 2006; LeMoine et al. 2008, 2010), direct molecular testing of these interactions is warranted to establish with certainty the nature of the coactivating capabilities of PGC-1α in lower vertebrates.