Introduction

Pituitary growth hormone (GH) is a protein hormone from the pituitary gland, which is found in most vertebrates and is structurally related to pituitary prolactin and somatolactin and to various proteins expressed in the placenta (Forsyth and Wallis 2002; Goffin et al. 1996; Rand-Weaver et al. 1993). These proteins are in turn members of the cytokine/growth factor superfamily, members of which have a similar 3D structure comprising a four-helix bundle with “up-up-down-down connectivity” (Abdel-Meguid et al. 1987; De Vos et al. 1992; Kossiakoff and De Vos 1999; Mott and Campbell 1995). In mammals the gene encoding GH comprises five exons and four introns and extends over 2–3 kb of DNA. In most mammals there is a single GH-like gene, but in higher primates a series of duplications has given rise to a cluster of GH-related genes, several of which are expressed in the placenta (Chen et al. 1989; Wallis and Wallis 2002). Some caprine ruminants appear to have two GH-like genes (Wallis et al. 1998).

GH regulates somatic growth and has a number of other metabolic actions. These actions are mediated by binding of the hormone to a cell surface receptor (a member of the cytokine receptor superfamily) followed by receptor dimerization, leading to activation of a number of intracellular pathways, particularly that involving the tyrosine kinase Jak2 (Herrington and Carter-Su 2001).

The molecular evolution of GH shows an episodic pattern in which long periods of slow basal change are interrupted by bursts of rapid evolution (Wallis 1996). In mammals two particularly marked episodes of rapid change have occurred, in the Cetartiodactyla (Cetacea plus Artiodactyla [Montgelard et al. 1997]) and Primates (Forsyth and Wallis 2002; Liu et al. 2001b; Maniou et al. 2002; Ohta 1993; Wallis 1981, 1994, 1996; Wallis et al. 2001). We have argued that these periods of rapid change may reflect adaptive evolution, possibly involving changes in receptor-binding and biological properties (Forsyth and Wallis 2002; Wallis 1996, 1997; Wallis et al. 2001). An episodic pattern of this kind is also seen in the evolution of a number of other protein hormones (Wallis 2001).

We have studied the bursts of rapid evolutionary change seen in GH evolution in cetartiodactyls and primates by determining sequences of GH genes from a number of additional species including the cetartiodactyls dolphin (Maniou et al. 2002), chevrotain (Wallis and Wallis 2001), and red deer (Lioupis et al. 1997) and the primates marmoset and slow loris (Wallis et al. 2001; Wallis and Wallis 2002). Others have also determined novel GH sequences for primates (Adkins et al. 2001; Liu et al. 2001b), allowing detailed delineation of the episode of rapid change in this group. Here we report the sequences of three further cetartiodactyl GH genes, those of hippopotamus (hippo), camel, and giraffe, and analyze in detail the nature of the episode of rapid GH evolution in this group. This analysis takes into account recent molecular evidence suggesting that cetaceans are nested within the artiodactyl phylogenetic tree (Arnason et al. 2000; Gatesy et al. 1996; Graur and Higgins 1994; Kleineidam et al. 1999; Liu et al. 2001a; Madsen et al. 2001; Murphy et al. 2001; Nikaido et al. 1999; Shimamura et al. 1997; Smith et al. 1996). Our new data on GH genes are assessed in the light of current ideas about the phylogeny of Cetartiodactyla.

Materials and Methods

Preparation of Genomic DNA

Tissue samples were obtained from Dr. A.J. Sami (camel, Camelus dromedarius) and Dr. F. Catzeflis (hippo, Hippopotamus amphibius, and giraffe, Giraffa camelopardalis). DNA was prepared from these by the SDS–proteinase K procedure of Towner (1991).

Amplification of GH Genes by Polymerase Chain Reaction (PCR)

Several sets of oligonucleotide primers were used for PCR (Table 1, Fig. 1). Design of these was based on known sequences of related cetartiodactyl GH genes and on trial and error. The GH gene was amplified by PCR using these primers (50 pmol each), 10 μl of 10× Herculase polymerase reaction buffer, 0.2 mM of each dNTP, 100 ng of genomic DNA, and 5 units of Herculase DNA polymerase (Stratagene, La Jolla, CA) adjusted to a final volume of 100 μl with H2O. An initial 2-min denaturation step at 92°C was followed by 31 cycles of denaturation (92°C, 1 min), annealing (60°C, 1 min), and chain elongation (72°C, 4 min). The reaction was stopped after a final extension of 10 min at 72°C. The size and purity of the PCR products were estimated by subjecting samples to 1% agarose gel electrophoresis.
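The reaction set-up and cycling program described above can be checked with a little arithmetic. The following is a minimal sketch that uses only the quantities stated in the text; the function and constant names are illustrative, and ramp times between temperatures are ignored.

```python
# Reaction and cycling parameters as stated in the text.
PRIMER_PMOL = 50.0           # pmol of each primer per reaction
REACTION_VOLUME_UL = 100.0   # final reaction volume in microlitres
CYCLES = 31
STEP_MINUTES = {"denaturation": 1.0, "annealing": 1.0, "elongation": 4.0}
INITIAL_DENATURATION_MIN = 2.0
FINAL_EXTENSION_MIN = 10.0


def primer_concentration_um(pmol: float, volume_ul: float) -> float:
    """Final primer concentration in micromolar (1 pmol/ul equals 1 uM)."""
    return pmol / volume_ul


def total_program_minutes() -> float:
    """Sum of programmed hold times only; ramping is not included."""
    per_cycle = sum(STEP_MINUTES.values())
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


print(primer_concentration_um(PRIMER_PMOL, REACTION_VOLUME_UL))  # uM per primer
print(total_program_minutes())  # minutes of programmed hold time
```

Under these assumptions each primer is present at 0.5 μM and the programmed hold time is 198 min.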

Table 1 Primer sequences used for amplifying the camel, hippo, and giraffe GH genes
Figure 1

Cloning of the GH genes from camel, hippo, and giraffe genomic DNAs. Exons are shown in black and introns in white. Regions of the genes amplified by the PCR primer sets used are indicated. Primer sequences are shown in Table 1. The camel GH gene was cloned using primer set 1 (1936 bp); the giraffe gene using primer sets 2 (1932 bp), 3 (1776 bp), and 4 (1366 bp); and the hippo gene using primer sets 5 (1770 bp), 6 (1900 bp), and 7 (1320 bp).

Cloning and Sequencing of the GH Gene

PCR products were cloned into the phagemid PCR-Script Amp SK(+) vector according to the instructions of the PCR-Script Amp SK(+) cloning kit (Stratagene) and transformed into the supercompetent E. coli cells provided with the kit. Double-stranded phagemids carrying PCR products were prepared by the QIAprep Miniprep procedure (Qiagen) and subjected to automatic DNA sequencing, using DNA sequencing services provided by GENPAK (University of Sussex), SEQLAB (Gottingen, Germany), Genescreen Ltd. (New Milton, Hants, UK), and MWG-Biotech (Ebersberg, Germany). To correct for potential errors arising during PCR amplification, sequencing was carried out on at least two clones, derived from independent PCR reactions, for the whole GH gene; both DNA strands were sequenced. In the case of hippo and giraffe, initially ambiguous results were subsequently explained by the presence of two gene sequences.

Sequence Analysis

GH gene and protein sequences were aligned with those available for other mammalian GHs using the ClustalW program (Higgins and Sharp 1988), followed by manual adjustment. Phylogenetic analysis was carried out using parsimony, neighbor-joining, and maximum likelihood options in PAUP (Swofford 1998). Analysis of rate variation in GH evolution was carried out using the neighbor-joining method in PAUP, with a defined tree. MacClade (Maddison and Maddison 1992) and PHYLIP (Felsenstein 1993) were also used for phylogenetic analysis. Nonsynonymous (Ka) and synonymous (Ks) substitution rates in coding sequence were determined using the method of Nei and Gojobori (1986), as modified to allow correction for transition/transversion ratios by Zhang et al. (1998) and Zhang and Nei (2000), and a matrix containing these was used as input for the neighbor-joining option in PAUP. Significance of differences between ratios was tested using Fisher’s exact test (Zhang et al. 1998). Sequence alignments and accession numbers of sequences can be found at the following Web site: http://www.biols.sussex.ac.uk/Home/Mike_Wallis/GHAlign/.

Results and Discussion

Cetartiodactyl GH Genes

Pairs of PCR primers designed on the basis of known sequences for cetartiodactyl GH genes allowed amplification of fragments of camel, hippo, and giraffe genomic DNAs corresponding to sizes expected for the GH gene (Fig. 1). The amplified DNA was cloned and clones obtained were subjected to DNA sequencing. Sequencing was hindered in the case of the camel gene, presumably by the presence of secondary structure; the problem was overcome by using betaine and sequencing primers close to the problem regions. As sequencing progressed it became clear that for hippo and for giraffe there were two very similar GH gene sequences, and both of these were sequenced in each case. For each gene examined, both DNA strands were sequenced and all ambiguities were resolved.

The nucleotide sequences determined for the camel, hippo (two sequences), and giraffe (two sequences) GH genes (1887–1936 bp excluding PCR primers) have been deposited in the GenBank/EMBL/DDBJ database, with accession numbers AJ575419–AJ575423. In each case the overall organisation of these cetartiodactyl GH genes is similar to that of the corresponding genes in other mammals and the following analysis is based on this. The coding sequence is split into five exons by four introns, the positions of which are identical to those in other mammalian GH genes. The boundaries between exons and introns conform to the GT–AG rule; the exception to this rule seen in the pig GH gene, where there is a GC at the start of intron 1, is not seen in any of the other cetartiodactyl GH genes studied. The two sequences obtained for the hippo GH gene differ at 11 bases, plus a 21-base insertion/deletion (indel) in intron 3. None of these differences are expected to affect the derived protein sequence. The two giraffe GH sequences differ at five bases; again, none of these substitutions would affect the derived protein sequence.
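The GT–AG check described above is easy to express in code. The sketch below is illustrative only: the toy gene and its intron coordinates are invented for the example (they are not GH sequence), and the second toy intron deliberately starts with GC to mimic the kind of exception reported for pig intron 1.

```python
def check_splice_boundaries(gene, introns):
    """Return (donor, acceptor, conforms) for each intron.

    `introns` holds 0-based, end-exclusive (start, end) coordinates.
    An intron conforms to the GT-AG rule if it starts with GT and ends with AG.
    """
    seq = gene.upper()
    results = []
    for start, end in introns:
        donor = seq[start:start + 2]
        acceptor = seq[end - 2:end]
        results.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return results


# Hypothetical toy gene assembled from exon and intron pieces.
parts = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTCCCTTTCAG"),   # canonical GT...AG
    ("exon", "GGATCC"),
    ("intron", "GCAAGTTTTTCCAG"),    # GC donor, as an example exception
    ("exon", "TAG"),
]

toy_gene = ""
toy_introns = []
for kind, piece in parts:
    if kind == "intron":
        toy_introns.append((len(toy_gene), len(toy_gene) + len(piece)))
    toy_gene += piece

for donor, acceptor, ok in check_splice_boundaries(toy_gene, toy_introns):
    print(donor, acceptor, "conforms" if ok else "exception")
```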

The sequences include 181–189 bp of sequence 5′ of the start codon. An alignment of these and corresponding 5′ sequences for GH genes from other cetartiodactyls is shown in Fig. 2. The TATA box is seen at positions −93 to −87 (residue numbering as in Fig. 2) and the ATG start codon at positions +1 to +3 (or −3 to −1 in giraffe, where there is an additional ATG, as in bovids and red deer). Interestingly the TATA box in both the hippo and the camel GH gene sequences differs from the conserved sequence seen in other mammalian GH sequences (and indeed most other mammalian genes that contain this element). This seems surprising, given the importance of this element in initiation of transcription, but a number of deviations from the TATAAA consensus sequence have been reported (Breathnach and Chambon 1981; Bucher 1990). The substitution seen at the first base of the TATA box of the camel GH gene clearly does not lead to failed expression since isolation of GH from the camel pituitary gland has been described (Martinat et al. 1990).

Figure 2

Alignment of the 5′ untranslated region for GH genes from cetartiodactyls, human, and dog. Positions for the regulatory elements dPit-1, pPit-1 (distal and proximal Pit-1 elements), CRE (cyclic AMP response element), NRE3 (negative regulatory element), and TATA box are shown. The “ancestral” sequence (AncCetart) for the cetartiodactyl GH gene was derived, using dog as outgroup. (-) Represents identity to the “ancestral” GH sequence; (•) represents a gap.

The negative regulatory element (NRE3) that is conserved in most mammalian GH genes (except dog; Fig. 2), and probably represents a binding site for transcription factor YY1 (Park and Roe 1996), is retained unchanged in the GH genes of camel, hippo, and giraffe (Fig. 2). Two putative binding sites for the Pit-1 transcription factor (Theill and Karin 1993) are seen in positions corresponding to those in other mammalian GH genes (Krawczak et al. 1999), the distal one fully conserved and overlapping the NRE3 and the proximal one identical to (camel and hippo) or differing at only one base from (giraffe) the sequence in pig (Fig. 2). A cyclic AMP response element (CRE) is found between the Pit-1 sites in the human GH gene promoter (Eberhardt et al. 1996; Shepard et al. 1994), but the corresponding region in these and other cetartiodactyls shows a number of differences from the human sequence (including in most cases substitutions in the critical CGTCA motif), suggesting that the CRE may not be functional in Cetartiodactyla. A glucocorticoid regulatory element has been identified in the first intron of the human GH gene by Slater et al. (1985); a sequence corresponding to this can be identified in the GH genes of cetartiodactyls, but this differs markedly from the consensus sequence and may not be functional. The TAG stop codon is followed by 68–110 bases of 3′ untranslated sequence containing a potential polyadenylation signal (AATAAAA) in the two longest (camel and giraffe) sequences.

The observed changes in recognized regulatory elements in the GH promoter region suggest that regulation of the gene may have varied considerably during the course of mammalian evolution. It is also notable that the number of substitutions outside the defined elements is quite variable, with a particularly high rate of change on the lineage leading to the chevrotain.

Cetartiodactyl GH

The derived amino acid sequence for hippo GH is identical to that of dolphin GH, though the signal peptides differ at one residue (and coding sequences differ at several synonymous sites). The sequence of hippo GH, like that of dolphin GH, differs from that of fin whale at three residues, as discussed previously (Maniou et al. 2002). Hippo and dolphin GHs are similar to pig GH, differing from this at two residues. The sequence of giraffe GH is that of a typical ruminant GH, differing from that of bovine GH at 1 residue and from that of pig at 19 residues. The camel GH sequence is identical to that of alpaca GH (Biscoglio de Jimenez Bonino et al. 1991) and differs from that of pig GH at two residues. Each of the pre-GHs comprises a 26- to 27-residue signal peptide and a 190-residue mature GH sequence. Signal peptide sequences show rather more variation than the mature GH sequences, as is usually the case, but with no obvious pattern. Figure 3 gives an alignment of amino acid sequences for mature GHs of cetartiodactyls and various other mammals. The putative sequence for the GH of the ancestral eutherian mammal (essentially an appropriately weighted consensus sequence) appears to be identical to that of pig GH and dog GH (Wallis 1994; confirmed by more recent data, though the amino acid at residue 74 (Fig. 3) is uncertain [Adkins et al. 2000], and assigned here by reference to the marsupial possum outgroup).

Figure 3

Alignment of mammalian GH sequences. The complete sequence of pig GH (identical to the “ancestral sequence” derived for eutherian GHs) is shown in the top line. (-) Indicates identity to the pig sequence; (•) indicates a gap. Numbers at the right-hand side of the alignment indicate number of differences from the pig GH sequence.

The two differences seen between hippo and pig GHs are the same as those between dolphin GH and pig GH and have been assessed previously in the light of the structural models available for the GH–receptor complex. Residue 148 (numbered 151 in the alignment in Fig. 3 because of inserted gaps) is located a substantial distance from either receptor-binding site, but residue 47 (49 in the alignment in Fig. 3) is located within binding site 1 and the substitution at this position could have a substantial effect on binding to the receptor (Maniou et al. 2002). The two differences between pig and camel GH sequences involve conservative substitutions at residues located a substantial distance (>12 Å) from the receptor binding sites and seem unlikely to affect binding. Similarly the single difference between the sequences of bovine and giraffe GHs reflects a conservative substitution at a site distant (>15 Å) from either receptor-binding site. Four of the 19 residues that differ between giraffe and pig GHs are sited close to a receptor binding site (within 5 Å).

Cetartiodactyl Phylogeny

Discussion of GH evolution in the Cetartiodactyla requires a phylogeny of the order as a basis. Considerable recent work supports the view that Cetacea is embedded within Artiodactyla, with Hippopotamidae as the sister group of Cetacea, and that the clade comprising Cetacea + Hippopotamidae forms a sister group of Ruminantia (Arnason et al. 2000; Gatesy et al. 1996; Gatesy 1997; Graur and Higgins 1994; Kleineidam et al. 1999; Liu et al. 2001a; Matthee et al. 2001; Montgelard et al. 1997; Murphy et al. 2001; Nikaido et al. 1999). The position of Giraffidae within Ruminantia has been relatively little studied using molecular data; some authors (e.g., Matthee et al. 2001) represent Giraffidae as sister group of Bovidae + Cervidae, whereas others (e.g., Liu et al. 2001) have Cervidae + Giraffidae as sister group of Bovidae. The position of Tylopoda (camel, etc.) is also uncertain. Traditionally Tylopoda were shown as the sister group of Ruminantia, with Suina (pig, etc.) as outgroup; some recent molecular studies have supported this view (e.g., Matthee et al. 2001), but others have shown camel as the outgroup for other cetartiodactyls (e.g., Murphy et al. 2001) or have failed to resolve the branching order.

The new data for GH genes were analyzed with regard to these phylogenetic uncertainties. The available 5′ and 3′ sequences are rather short, and their use for phylogenetic analysis proved rather uninformative. Analysis of intron sequences and coding sequences proved more useful. The analyses were kept separate in order to avoid confounding the useful features of each. Coding sequences could be aligned with complete certainty, and a suitable outgroup was available (cat GH), but they contained relatively few informative sites; alignment of intron sequences required introduction of gaps, and a suitable outgroup was not available, but they contained more informative sites.

Phylogenetic analysis of coding sequences gave the tree shown in Fig. 4a. Monophyly of Ruminantia and of Pecora (higher ruminants, excluding chevrotain) within Ruminantia were strongly supported. The position of Giraffidae within Ruminantia was not defined. The clade including Cetacea, hippo, and Ruminantia was weakly supported, but there was no clear support for Cetacea and hippo as sister groups within this. There was quite strong support for camel as the sister to the Cetacea + hippo + ruminant clade, with pig as the outgroup for the other cetartiodactyls.

Figure 4

Cetartiodactyl phylogeny based on (a) coding sequences and (b) intron sequences for the GH gene. Numbers on branches are bootstrap support values (%; 1000 replicates) from parsimony (top), neighbor joining (center), and maximum likelihood (bottom) analyses.

Phylogenetic analysis of intron sequences was carried out by producing an alignment for each intron, removing gaps, and then combining the four introns for the purposes of analysis. The phylogenetic tree is shown in Fig. 4b. Monophyly of Ruminantia and Pecora is again strongly supported. Giraffe as sister group of Cervidae + Bovidae is now supported quite strongly, as is the sister group relationship between hippo and dolphin. Ruminantia as a sister group of hippo + Cetacea is supported weakly at best. In the absence of an outgroup the relative positions of pig and camel cannot be established.
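The intron preprocessing described above (align each intron, remove gapped columns, then combine the introns) might be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in PAUP: the alignments are tiny invented examples, "removing gaps" is interpreted as dropping every column in which any sequence has a gap, and both "-" and "." are treated as gap characters.

```python
def strip_gap_columns(alignment, gap_chars="-."):
    """Drop every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i in range(length)
        if all(alignment[name][i] not in gap_chars for name in names)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def concatenate(alignments):
    """Join per-intron alignments end to end, species by species."""
    names = list(alignments[0])
    return {name: "".join(a[name] for a in alignments) for name in names}


# Hypothetical toy alignments for two introns.
intron_a = {"pig": "ACG-TA", "hippo": "ACGGTA", "camel": "A-GGTA"}
intron_b = {"pig": "TTGC", "hippo": "TAGC", "camel": "TT-C"}

combined = concatenate([strip_gap_columns(intron_a), strip_gap_columns(intron_b)])
for name, seq in combined.items():
    print(name, seq)
```

Stripping gaps per intron before concatenation keeps the columns of each intron in register; the combined matrix can then be passed to whatever tree-building program is in use.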

The trees obtained using intron and coding sequence data are complementary and consistent. Together they provide quite strong support for the phylogenetic relationships shown in Fig. 4. Clearly, data from the GH gene alone cannot be expected to resolve all the problems of cetartiodactyl phylogeny, but the result obtained accords with the detailed analysis by Matthee et al. (2001) and provides a working basis for analysis of GH evolution in this group. It should be noted that the amino acid sequence data are consistent with the tree of Fig. 4a; in particular, the identity of hippo and dolphin GHs provides further support for the sister group relationship between hippopotamids and Cetacea.

Two GH Gene Sequences in Hippopotamus and Giraffe

Analysis of the GH genes of hippo and giraffe revealed two similar GH gene sequences in each case. For giraffe the two sequences differ at five bases, none of which affected the encoded protein. The two hippo sequences showed more differences—11 substitutions and a substantial (21 base) indel in intron 3—but again there was no effect on the encoded protein sequence. A similar situation was encountered in the chevrotain GH gene, though there protein sequences differed at one residue (Wallis and Wallis 2001). The explanation for these different sequences is not clear. They could reflect the occurrence of duplicate genes in the genomes of these species or two alleles of the same gene. Duplicate GH-like genes have been described in sheep and goat, where an allelic polymorphism occurs, one allele having a single GH gene and the other having duplicate genes, in tandem (Valinsky et al. 1990; Lacroix et al. 1996; Wallis et al. 1998). Attempts were made to explore the existence of tandem duplicate GH-like genes in hippo and giraffe by cloning the intergene region using PCR primers leading out from the 3′ region of one gene and the 5′ region of the other. This approach successfully cloned intergene regions in the marmoset GH cluster (Wallis and Wallis 2002) but was not successful for either hippo or giraffe. This may be because the duplicate GH genes in these species are a substantial distance apart, but it could also be because there is no gene duplication and the two sequences identified arise as a consequence of allelic variation.

Episodic Evolution of GH in Cetartiodactyls

GH evolution in mammals has been episodic in nature, with a slow basal rate of change and occasional bursts of accelerated evolution (Wallis 1994, 1996; Wallis and Wallis 2001). The present data, together with previous reports, show that a single such burst occurred in the Cetartiodactyla. The sequences of camel and hippo GHs are very similar to those of pig GH, confirming that slow basal evolutionary change applied during most of cetartiodactyl evolution. On the other hand, the sequence of giraffe GH differs markedly from that of pig but is very similar to that of other ruminants, confirming that the episode of rapid evolution occurred on the lineage leading to ruminants, largely after divergence of the hippopotamid + cetacean clade, and that after the episode of rapid change the evolutionary rate fell back to the basal one. Using divergence times of 63 MYA for the divergence of Cetacea/hippos from the line leading to ruminants (Gatesy and O’Leary 2001), and 45 MYA for the divergence of Tragulidae from the line leading to Pecora (Miyamoto et al. 1993), the rate of evolution of GH during the episode of rapid change would have been ~5.0 × 10⁻⁹ substitutions/amino acid site/year, compared with a basal rate of ~0.25 × 10⁻⁹ substitutions/amino acid site/year, i.e., a 20-fold rate increase. This figure is very approximate in view of uncertainty about times of divergence, and might be an underestimate given that the episode of rapid change may not have extended over the whole period shown (Fig. 5).
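The rate comparison above can be reproduced with a short calculation. The sketch below takes the two rates and the two divergence times from the text and uses the 190-residue length of mature GH given earlier. Converting a rate into an expected number of substitutions on the branch is an added illustration, and it assumes that the rapid episode spans the whole 63–45 MYA interval.

```python
BASAL_RATE = 0.25e-9      # substitutions / amino acid site / year
RAPID_RATE = 5.0e-9       # substitutions / amino acid site / year
MATURE_GH_RESIDUES = 190  # length of mature GH
BRANCH_START_MYA = 63.0   # Cetacea/hippo split from the line leading to ruminants
BRANCH_END_MYA = 45.0     # Tragulidae split from the line leading to Pecora


def fold_increase(rapid: float, basal: float) -> float:
    """Ratio of the rapid rate to the basal rate."""
    return rapid / basal


def expected_substitutions(rate: float, sites: int, duration_my: float) -> float:
    """Expected substitutions over `sites` residues in `duration_my` million years."""
    return rate * sites * duration_my * 1e6


duration = BRANCH_START_MYA - BRANCH_END_MYA
print(fold_increase(RAPID_RATE, BASAL_RATE))
print(expected_substitutions(RAPID_RATE, MATURE_GH_RESIDUES, duration))
print(expected_substitutions(BASAL_RATE, MATURE_GH_RESIDUES, duration))
```

With these inputs the rapid rate corresponds to about 17 substitutions over the 18-million-year interval, whereas the basal rate would give fewer than one.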

Figure 5

Phylogenetic trees for cetartiodactyl GHs, showing changes undergone by various components of the GH gene. Cat was used as outgroup. The numbers shown on the five trees represent (a) amino acid substitutions in the signal peptide, (b) nucleotide substitutions in the 5′ upstream region (nucleotides 1–159 in Fig. 2), (c) nucleotide substitutions in introns, (d) nonsynonymous substitutions in sequence coding for mature protein, and (e) synonymous substitutions in sequence coding for mature protein. In each case numbers on branches were derived using the neighbor joining option in PAUP and a defined tree. The heavy line indicates the branch showing accelerated evolution for nonsynonymous substitutions. Prior to analysis, gaps were removed (b and c) and introns 1–4 were combined (c). For each of hippo, chevrotain, and giraffe, sequence 1 was used for analysis; use of sequence 2 did not significantly alter the result obtained.

The episode of rapid change is confined to amino acid sequence/nonsynonymous sites in coding sequence for mature GH. Analysis of synonymous sites in this coding sequence, the signal peptide sequence, 5′ and 3′ sequence, and introns gives no indication of a burst of rapid change on the lineage leading to ruminants (the branch emphasized in Fig. 5). For amino acid residues or nonsynonymous substitutions in mature GH, 69% of the substitutions seen in the cetartiodactyl tree in Fig. 5d occur during the episode of rapid change; this is significantly higher (p < 0.01; Fisher’s exact test [Zhang et al. 1998]) than the proportion occurring in any of trees 5a–c or 5e (1.2–7.1%).

The ratio of nonsynonymous/synonymous substitutions (Ka/Ks) in a coding sequence has been used frequently as a definitive indicator of positive selection (Endo et al. 1996; Messier and Stewart 1997; Zhang et al. 1998). Normally the rate of synonymous substitution is much higher than that of nonsynonymous substitution (Ka/Ks < 1.0). If the rate of nonsynonymous substitution is significantly higher than that of synonymous substitution, then the most likely explanation is positive selection. For the periods of slow evolution shown in Fig. 5d, this ratio is 0.027 “before” the period of rapid change and 0.029 “after” it. For the episode of rapid change Ka/Ks increases to 1.23, but this is not significantly greater than 1.0 (Fisher’s exact test).
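The logic of testing whether Ka/Ks exceeds 1.0 can be illustrated with a small sketch. The site and substitution counts below are hypothetical, chosen only to give a ratio slightly above 1; they are not the counts underlying Fig. 5. The sketch also applies a plain Jukes–Cantor correction rather than the transition/transversion-corrected Nei–Gojobori method used in the paper, and it sets up Fisher's exact test as a one-sided comparison of changed versus unchanged sites, which is one common arrangement.

```python
from math import comb, log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for an observed proportion p of changed sites."""
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def ka_ks(n_changes, n_sites, s_changes, s_sites):
    """Ka/Ks from counts of changes and sites (Jukes-Cantor corrected)."""
    ka = jukes_cantor(n_changes / n_sites)
    ks = jukes_cantor(s_changes / s_sites)
    return ka / ks


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)


# Hypothetical counts for one branch (illustration only).
N_SITES, S_SITES = 430, 140      # nonsynonymous and synonymous sites
N_CHANGES, S_CHANGES = 18, 5     # substitutions of each kind on the branch

ratio = ka_ks(N_CHANGES, N_SITES, S_CHANGES, S_SITES)
p_value = fisher_exact_greater(N_CHANGES, N_SITES - N_CHANGES,
                               S_CHANGES, S_SITES - S_CHANGES)
print(ratio > 1.0, p_value > 0.05)
```

With these toy counts the ratio is above 1.0 but the test is far from significant, which mirrors the situation described in the text: a ratio above 1 on a branch with few substitutions does not by itself establish positive selection.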

There are two main possible explanations for the episode of rapid change seen here: loss of function, with relaxation of the purifying selection that normally maintains a conserved sequence, and positive selection associated with change in function. Complete loss of function would be expected to lead to an increase in Ka/Ks to 1.0. However, there is no biological evidence for loss of function; the growth-promoting function of GH appears to be essentially the same in ruminant and nonruminant artiodactyls. There are some differences between the physicochemical and receptor-binding properties of GH in pig and ox or sheep (Amit et al. 1992, 1997; Cadman and Wallis 1981; Hughes et al. 1985), but these do not reflect major changes in physiological function. Furthermore, it is notable that the rate of evolution and Ka/Ks ratio after the episode of rapid change return to the low “basal” value. It is difficult to see how a highly conserved protein which has lost function and accumulated a substantial number of substitutions at random could subsequently rediscover a different functional structure which then became strongly conserved. It is likely therefore that the burst of rapid change in GH sequence that occurred during cetartiodactyl evolution was due to positive selection, associated with functional change. The detailed nature of this functional change is unclear, though we have suggested previously that fluctuation of the role of GH in controlling metabolic processes other than somatic growth may have led to accelerated evolution by the mechanism of function switching (Wallis 1997; Wallis and Wallis 2001).

GC Content

The mammalian GH gene has a high G+C content, presumably because it is located in a GC-rich isochore (Bernardi 2000). There is some evidence (Duret et al. 2002) that such GC-rich regions are tending to disappear in cetartiodactyls and primates. The possibility that the episode of rapid change in cetartiodactyl GH evolution might be associated with a change in GC content was investigated.

GC content was determined for various regions of the GH genes of cetartiodactyls and a carnivore outgroup (Table 2). For coding sequences, changes at codon positions 1, 2, and 3 were analyzed separately. All the GH genes examined show a high G+C content. For the individual codon positions, position 2 is AT-rich (unlike all other regions of the GH gene), whereas codon position 3 is very GC-rich. However, for no region of the GH gene is there a significant difference (Fisher’s exact test) between the GC content in ruminants and that in other cetartiodactyls, indicating that the episode of rapid evolutionary change of GH is not associated with change in content of G+C.
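The per-position GC calculation described above can be sketched in a few lines. The coding sequence used here is a short invented example, not GH sequence, and the function names are illustrative.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_by_codon_position(cds: str) -> dict:
    """G+C percentage at codon positions 1, 2, and 3 of an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return {pos + 1: gc_percent(cds[pos::3]) for pos in range(3)}


# Hypothetical in-frame toy sequence: ATG GCC CTG CTC
toy_cds = "ATGGCCCTGCTC"
print(gc_percent(toy_cds))
print(gc_by_codon_position(toy_cds))
```

In this toy example position 2 is AT-rich and position 3 is entirely G or C, the same qualitative pattern as that reported for the GH coding sequences.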

Table 2 Percentage G+C in GH genes

Analysis of individual substitutions accepted during the course of cetartiodactyl GH evolution indicates that the number of G/C→A/T substitutions exceeds the number of A/T→G/C substitutions by a factor of 2.5–5, depending on which part of the gene is examined, in accordance with the idea that the GC bias is decreasing. But there is no evidence to tie such a decrease to the episode of rapid change seen on the line leading to ruminants.
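The directional tally described above might be computed as follows once ancestral and descendant sequences for a branch are available. This is a sketch under assumptions: the two sequences are invented, they are taken to be already aligned without gaps, and the ancestral states are assumed to have been reconstructed beforehand.

```python
def count_gc_direction(ancestor: str, descendant: str) -> dict:
    """Classify substitutions along one branch by their effect on G+C content."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    strong, weak = set("GC"), set("AT")
    counts = {"GC_to_AT": 0, "AT_to_GC": 0, "other": 0}
    for a, d in zip(ancestor.upper(), descendant.upper()):
        if a == d or a not in "ACGT" or d not in "ACGT":
            continue  # unchanged site or ambiguous base
        if a in strong and d in weak:
            counts["GC_to_AT"] += 1
        elif a in weak and d in strong:
            counts["AT_to_GC"] += 1
        else:
            counts["other"] += 1  # G<->C or A<->T, no net change in G+C
    return counts


# Hypothetical ancestral and descendant sequences for a single branch.
toy_counts = count_gc_direction("GCGCATATG", "ACGTATGTC")
print(toy_counts)
```

Summing such counts over all branches of the tree, separately for each gene region, gives the G/C→A/T to A/T→G/C ratio discussed in the text.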