Introduction

The evolutionary history of the globin genes and proteins has been explored in considerable detail using a combination of molecular phylogenetic approaches and application of the molecular clock principle. These studies have revealed that a duplication of an ancestral globin gene led to the separation of precursor genes of the α- and β-globin lineages approximately 450 million years ago (MYA) (Czelusniak et al. 1982; Goodman et al. 1984; Hardison 1998). The precursor α-globin gene duplicated early in vertebrate evolution to produce two gene lineages, each of which evolved differences in the timing of its developmental expression. Subsequent tandem gene duplications of these two ancestral genes led to the evolution of a closely linked family of expressed genes and pseudogenes in mammals. For example, the human α-globin gene cluster comprises seven tightly linked genes (ζ2–ψζ1–ψα2–ψα1–α2–α1–θ), with the ζ2-globin gene expressed in early embryonic development and the α1- and α2-globin genes expressed during foetal and adult development (Orkin 1978; Lauer et al. 1980; Proudfoot and Maniatis 1980). ψζ1, ψα2 and ψα1 are known to be pseudogenes and the functional status of θ-globin is still unclear (see below).

Despite inconsistencies in the adopted naming strategies, comparative sequence and phylogenetic analyses have revealed that the embryonically expressed α-like globins of birds (π-globin) and mammals (ζ-globin) are orthologous (Czelusniak et al. 1982; Proudfoot et al. 1982) and it is likely that the α-globin family consisted of at least two functional genes (ζ-globin and α-globin) in the common ancestor of avians and mammals. The timing of further duplication events leading to the evolution of ζ2/ψζ1- and α1/α2-globin genes in humans is unclear, as each pair of genes has evolved in concert, presumably by mechanisms such as gene conversion. However, analyses of flanking sequences of goat ζ and ψζ genes reveal extensive regions of similarity with the corresponding ζ and ψζ genes of humans, suggesting that “the duplication of ζ genes was very ancient, pre-dating the mammalian radiation” (Flint et al. 1988).

The α-like globin gene θ-globin has also been described for a variety of eutherian mammals including primates (Marks et al. 1986; Shaw et al. 1987), rabbits (Cheng et al. 1986), horses (Clegg 1987), and rodents (Satoh et al. 1999), indicating that the gene has a long evolutionary history in mammals. Molecular clock estimates suggested that θ-globin arose after duplication of an ancestral α-globin gene approximately 250–260 MYA, prior to the radiation of all mammals (Clegg 1987; Shaw et al. 1987). However, these time estimates were based on the assumption that the adult α-globin lineage has evolved at an approximately similar rate to the θ-globin lineage (a rate of 0.1 × 10⁻⁹ per site per year in amino acid replacement sites), an assumption that has not been rigorously tested. The functional status of θ-globin in mammals is still uncertain, with evidence that it is a pseudogene in a number of taxa including rabbit (Cheng et al. 1986), a rat (Satoh et al. 1999), and a primate galago (Otolemur crassicaudatus [Sawada and Schmid 1986]) and, possibly, the horse (Equus caballus [Clegg 1987]), suggesting that θ-globin could be under relaxed selective constraints in these species. Therefore, the application of a molecular clock in comparisons of α-globin and θ-globin genes is unlikely to be feasible and divergence times may be grossly overestimated.

Counter to the hypothesis that θ-globin is a pseudogene, in primates such as humans, orang-utan, and baboons, the gene shows no evidence of a loss of function (Shaw et al. 1987; Hsu et al. 1988). Shaw et al. (1987) compared rates of evolution of amino acid replacement versus silent-sites among θ-globin genes of a baboon (Papio anubis) and orang-utans and concluded that stabilizing selection was still operating on the coding sequence of these genes. Similarly, the human sequence showed ratios of replacement to silent-site substitution rates considerably less than one (∼0.1) compared to baboon and orang-utan θ-globin (Hsu et al. 1988). Human θ-globin coding sequences appear to contain no mutations that would prevent expression, and transcripts from this gene have been detected in human fetal and adult erythroid tissues, although no θ-containing hemoglobin has been detected (Hsu et al. 1988; Albitar et al. 1989, 1992). Despite these studies, naturally occurring deletions of θ-globin do not appear to cause hematological alterations in newborn and adult humans (Fei et al. 1988) and the physiological role of θ-globin remains uncertain. In addition, the above pairwise comparisons of replacement versus silent-site substitution rates did not take account of the potential occurrence of lineage-specific rate differences and no statistical analyses were presented to rule out the hypothesis that θ-globin is evolving at neutral rates of evolution in these mammals (Yang 1998). Overall, both the timing of the original duplication to produce the progenitor of θ-globin and the functional status of this gene in mammals require further investigation. Our aim is to contribute to these investigations by carrying out molecular evolutionary analyses of the α-globin gene family in divergent groups of mammals, such as marsupials.

Little is known about the α-globin gene family in marsupials at the DNA level. Prior to the commencement of our study, only a single α-globin cDNA sequence had been obtained from adult erythrocytes of the native cat, Dasyurus viverrinus (Wainwright and Hope 1985). However, clues to the types of α-globin genes present in marsupials have come from protein studies of the tammar wallaby Macropus eugenii (henceforth referred to as the tammar). Initial characterization of the hemoglobins of tammar neonates revealed the presence of four major embryonic hemoglobins, each with a different isoelectric point (Holland et al. 1988). Isolation and partial sequencing of hemoglobin polypeptides from these neonates showed the existence of three distinct α-like globin sequences (Holland and Gooley 1997). Comparisons with protein databases provided evidence that two of the polypeptides (referred to as ζ and ζ′-globin) were orthologous to embryonic ζ-type globins and the third polypeptide (α-globin) was orthologous to adult α-globins of eutherian mammals. Holland and Gooley (1997) further showed that the ζ′ and ζ polypeptides were present until about 2–3 days after birth, ultimately being replaced by adult α-globin. These results suggested that marsupials, like eutherian mammals, also have a family of differentially expressed α-like globin genes, including both embryonic and adult-expressed genes.

We have recently reported the existence of a novel β-like globin gene, called ω-globin (Wheeler et al. 2001) that is linked to the α-globin gene cluster in the tammar (Wheeler et al. 2004). The latter study revealed the existence of an adult α-globin gene and a second gene, provisionally called θ-globin, that were located approximately 12.5 and 7 kb upstream of ω-globin, respectively. Recently, analyses of a BAC clone from the dunnart Sminthopsis macroura also confirmed that ω-globin was linked to the α-globin gene cluster in this species (De Leo et al. 2005), showing that this arrangement is likely to be conserved in marsupials. Here we provide a detailed characterization of the adult α-globin and θ-globin genes and report complete gene sequences of two ζ-globin genes from the tammar. Molecular phylogenetic and maximum likelihood analyses are used to investigate the evolutionary history of this gene family and the functional status of the θ-globin gene in mammals.

Methods

The isolation and sequencing of two α-like globin genes (adult α-globin and a gene provisionally called θ-globin) of the tammar have been described in a separate paper on the linkage of the β-like globin gene ω-globin to the α-globin gene family in this species (Wheeler et al. 2004). Briefly, the two genes were isolated from a lambda clone, λTG2.11, which overlapped with a second clone (λTG3.4) containing ω-globin. The open reading frame from one of these genes encoded a protein that was identical to the partially sequenced (31–amino acid) α-chain from the tammar (Swiss Prot No. P81043 [Holland and Gooley 1997]) and differed by a single amino acid residue from the full-length adult α-globin chain of the closely related eastern grey kangaroo, Macropus giganteus (Beard and Thompson 1971). This gene is referred to as α-globin (GenBank No. AY459589). The coding region of the second gene was shown by maximum parsimony (MP) analyses (Wheeler et al. 2004) to form a sister lineage with eutherian θ-globin genes and was provisionally referred to as θ-globin (GenBank #: AY459590).

PCR Amplification and DNA Sequencing of Tammar ζ-globin Genes

Two ζ-globin proteins were previously identified in the tammar and their associated genes are henceforth referred to as tammar ζ-globin and ζ′-globin. The approach used to sequence these genes involved PCR amplification with a combination of primers and template consisting of either total cellular DNA or cDNA derived from a whole tammar wallaby pouch young (see details below). Unless otherwise stated, standard PCR amplifications were carried out in a final volume of 50 μl of 1 × PCR buffer (Amplitaq Gold; Perkin Elmer) that contained approximately 100 ng of template DNA, 10 pmol of each primer, 0.2 mM dNTPs, 2 mM MgCl2, and 1 unit of polymerase. Cycling conditions were as follows: 95°C for 9 min, then 34 cycles (94°C, 45 s; 48°C, 45 s; and 72°C, 60 s). Amplifications were carried out on either an OMN-E 500 (Hybaid) thermocycler or a PC-960G gradient thermal cycler (Corbett Research). PCR products were purified using the UltraClean PCR Clean-up DNA purification kit (MoBio Laboratories Inc.) according to the manufacturer’s protocol. Sequencing was performed using the ABI Prism Big Dye Terminator Cycle sequencing kit (PE Applied Biosystems) in 20- or 10-μl reaction volumes according to the manufacturer’s instructions. PCR primers were used as sequencing primers and each fragment was sequenced on both strands. Reaction products were purified by ethanol precipitation (as specified by Applied Biosystems) and sequenced on ABI 373 (version 3.0) automated DNA sequencers. Sequence files were edited using SeqEd version 1.0.3 (Applied Biosystems) and a consensus of bidirectional sequencing was determined. SeqEd was also used to derive amino acid sequences from the genes. Nonsynonymous and synonymous divergence values were calculated with the computer program DNASP (Rozas and Rozas 1997), using the method of Nei and Gojobori (1986).

Isolation of Tammar ζ-Globin

Initial PCR primers were designed from the consensus sequence of an alignment of human, mouse, goat, and horse ζ-globin and chicken π-globin coding sequences (GenBank accession numbers: NM_005332, M26898, X04726/X04862, X07052, V00411.1). The primers G298 (forward 5′-CAG ASC AAG ACC TAC TTC C-3′) and G301 (reverse 5′-TTG AAG ITS ACC GGG TCC AC-3′) were used to PCR-amplify and sequence a portion of exon 2 of tammar ζ-globin. This exon 2 sequence was used to design an additional primer, G310 (forward 5′-AAG GTG GTC AAT GCT CTT GG-3′), that was used in combination with G320 (reverse 5′-TTA GCG GTA CTT CTC TGT CAG-3′) to PCR-amplify and sequence the second exon, second intron, and third exon of ζ-globin. In addition, G310 was used in combination with a poly (dT) primer to amplify and sequence cDNA derived from a whole newborn pouch young. The cDNA was prepared by reverse transcription using an oligo (dT) primer (UT17: 5′-gtaaaacgacggccagttttttttttttttttt-3′) and the Superscript II reverse transcriptase kit (Gibco). The cDNA-derived sequence was identical to the putative exon sequence of the G310/G320 fragment, confirming the intron/exon boundaries of the gene and providing evidence that this gene is transcribed in tammar newborns.
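The degenerate positions in primers G298 and G301 can be made explicit by enumerating the sequences each primer represents. The following is a minimal sketch in Python, not part of the original protocol; the function name `expand_primer` is illustrative, and inosine (I) is treated as pairing with any base, which is a simplifying assumption.

```python
from itertools import product

# Subset of IUPAC nucleotide ambiguity codes. Treating inosine (I) as
# "any base" is a simplifying assumption made for this illustration.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC", "N": "ACGT", "I": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in bases]
    return ["".join(combo) for combo in product(*choices)]

g298 = "CAG ASC AAG ACC TAC TTC C"   # forward primer G298, as given in the text
g301 = "TTG AAG ITS ACC GGG TCC AC"  # reverse primer G301, as given in the text

g298_variants = expand_primer(g298)
g301_variants = expand_primer(g301)

print(g298_variants)       # the two sequences covered by the single S position
print(len(g301_variants))  # I (4 options) x S (2 options)
```

Under these assumptions G298 corresponds to two sequences and G301 to eight, which gives a rough sense of how much the effective concentration of any one primer species is diluted.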

The first exon and first intron of tammar ζ-globin were obtained using a “genome-walking” strategy. Tammar total genomic DNA was digested with the restriction enzyme EcoRI and ligated to an adapter sequence. The adapter was made by kinase treatment of the primer G280 (5′-AAT TCG AAG CTT GGG GTC TCT GGC C-3′) and annealing it to the primer S61 (5′-GGC CAG AGA CCC CAA GCT TCG-3′). The adapter-ligated genomic DNA was then used as a template for PCR amplifications using S61 and the exon 2 primer G311 (reverse 5′-CTG AGC TTG GAG AGG GCA C-3′). The resulting product was gel purified, reamplified with S61 and G311, and sequenced. This sequence was shown to contain the first intron and a portion of the second exon of ζ-globin. The remaining region of exon 1 and the 5′ and 3′ flanking regions of the gene were PCR-amplified and sequenced using a similar genome-walking approach based on the method of Zhang and Gurr (2000). The entire gene was PCR-amplified with the primers G458 (forward, 5′-CAC CTC TTT GGG CTG TTC CTA C-3′) and G459 (reverse, 5′-CTG GAA YAG GGA AGA AGG TTG A-3′), located, respectively, in the 5′ and 3′ flanking regions, and sequenced with internal primers to verify the DNA sequence of the entire gene.
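The design of the adapter can be checked computationally: S61 should anneal to the 3′ portion of G280, leaving a single-stranded 5′ extension compatible with EcoRI-cut ends (EcoRI leaves a 5′-AATT overhang). The short Python sketch below is an illustration rather than part of the original method; it uses the two primer sequences as given in the text, with spacing removed.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

g280 = "AATTCGAAGCTTGGGGTCTCTGGCC"  # adapter strand G280 (5'->3')
s61 = "GGCCAGAGACCCCAAGCTTCG"       # adapter strand / primer S61 (5'->3')

# Region of G280 that base-pairs with S61 in the annealed adapter.
duplex_region = reverse_complement(s61)
anneals = g280.endswith(duplex_region)

# Whatever is left at the 5' end of G280 remains single-stranded.
overhang = g280[: len(g280) - len(duplex_region)]

print(anneals)   # True if S61 is fully complementary to the 3' part of G280
print(overhang)  # single-stranded 5' extension of the adapter
```

The overhang recovered this way is AATT, consistent with ligation of the adapter to EcoRI-digested genomic DNA.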

Isolation of Tammar ζ′-Globin

During genome-walking experiments using the primer G310 (given above), a second ζ-like globin sequence was obtained (ζ′-globin) which included exon 2 and intron 2 sequence, with intron 2 diverging considerably from that of the first ζ-globin gene sequenced previously. An intron 2-specific primer, G481 (reverse 5′-CCT CTC CTG CCC ACC TTT AG-3′), was designed and used with primer G316 (forward 5′-CAT GTC TCT GAG CAA GAC TG-3′), located at the start of exon 1, to amplify the exon 1, intron 1, and exon 2 regions of ζ′-globin. The remaining intron 2, exon 3, and 5′ and 3′ flanking regions of ζ′-globin were obtained using the genome-walking approach of Zhang and Gurr (2000). Finally, the primer G475 (reverse, 5′-GGA CAA AAG GAT AAG ATG AAA CTC-3′), located in the 3′ flanking region, was used in combination with G316 to PCR-amplify ζ′-globin, and internal primers were used for cycle-sequencing to verify the sequence of the gene.

Phylogenetic Analyses

We selected a wide range of α-like globin genes for phylogenetic analyses, including exemplar sequences of both embryonic and adult-expressed genes from mammals, avians, and amphibians. We included only a single exemplar of the α2-globin gene in these analyses because of evidence for concerted evolution of this gene with α1-globin in humans and horses that has made the coding sequences of these genes almost identical within species (Liebhaber et al. 1981; Clegg 1987). Phylogenetic analyses were carried out using fish α-globin genes (yellowtail and salmon) as outgroups to root the phylogeny.

Evolutionary relationships among α-like globin genes were assessed using the phylogenetic programs PAUP*4b10 (Swofford 2002) and MRBAYES (Version 3; Huelsenbeck and Ronquist 2001). The accession numbers of the globin gene sequences used in these analyses are given in the legend to Fig. 2 and phylogenetic analyses were restricted to the coding regions of each gene. Tests of homogeneity of base frequencies among taxa, implemented using PAUP*, resulted in rejection of a hypothesis of homogeneity when all sites were included (χ² = 219.39, df = 87, p < 0.001). Exclusion of third codon positions resulted in acceptance of the homogeneity hypothesis (χ² = 77.15, df = 90, p = 0.831).
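The two p values quoted for the homogeneity tests can be reproduced approximately from the reported χ² statistics and degrees of freedom. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution so that it needs only the Python standard library; this is a substitute chosen for illustration and is not the calculation performed by PAUP*. The function name is hypothetical.

```python
import math

def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail probability P(X >= x) for chi-square with df
    degrees of freedom, using the Wilson-Hilferty cube-root transformation."""
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Statistics as reported in the text.
p_all_sites = chi2_sf_wilson_hilferty(219.39, 87)        # all codon positions
p_without_third = chi2_sf_wilson_hilferty(77.15, 90)     # third positions excluded

print(p_all_sites)       # far below 0.001: homogeneity rejected
print(p_without_third)   # close to the reported 0.831: homogeneity not rejected
```

The approximation is adequate here because both tests have many degrees of freedom; for small df an exact chi-square routine would be preferable.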

MP analyses were carried out using PAUP* with a standard heuristic search option and random input orders of taxa (option: random stepwise addition), repeated 100 times. MP bootstrap analyses (Felsenstein 1985) were carried out using 1000 bootstrap pseudoreplicates and a heuristic search option with no random input order of taxa.

To determine the most appropriate model of evolution for the globin sequence data, a series of likelihood ratio tests was carried out to compare different nested models using the programs Modeltest (Posada and Crandall 1998) and PAUP*. The analyses indicated that the general time-reversible model (Rodriguez et al. 1990), with a proportion of invariant sites and unequal rates among sites (Yang 1996) modeled with a gamma distribution (GTR + I + G), was the most appropriate model for the Bayesian analyses. The MRBAYES analysis was carried out using the GTR + I + G model with default uninformative priors, running four chains simultaneously for 2 million generations and sampling trees every 100 generations. The likelihood values converged to relatively stationary values after about 1000 generations. A burn-in of 100 trees was, therefore, chosen with a >50% posterior probability consensus tree constructed from the remaining 20,900 trees.

ML Analyses Using PAML

Likelihood ratio tests of hypotheses of neutral evolution vs stabilizing selection on different branches of the α-globin phylogeny were carried out using the program PAML (version 3.14beta3; Yang 1997). Full details of these analyses are described under Results.

Promoter Sequence Analyses

Transcription factor binding sites were identified by means of the programs TFSITESCAN (http://www.ifti.org/cgi-bin/ifti/Tfsitescan.pl) and MATINSPECTOR V2.2 professional (http://www.genomatix.de/mat_fam) using the default options and the TRANSFAC database.

Results

Characterization of Tammar α-Globin and θ-Globin Genes

Both tammar α-globin (GenBank No. AY459589) and θ-globin (GenBank No. AY459590) have conserved intron/exon splicing signals and a three-exon, two-intron structure (Fig. 1). The first introns of tammar α-globin and θ-globin are, respectively, 286 and 291 bp long and the second introns are 281 and 254 bp long. Both genes have conserved initiation and stop codons and polyadenylation signals. The exons are open reading frames encoding polypeptides of 141 amino acids, typical of known functional mammalian α-globin genes. Upstream from the initiation codon of tammar α-globin are conserved promoter signals, including an ATA element and a CCAAT box. In contrast, the promoter region of tammar θ-globin appears atypical of most mammalian α-like globin genes (Fig. 1). A possible ATA element is located at –49 bp, but there appears to be no CCAAT box. A GC-rich region (>80% GC over a sequence length of 58 bp) is found from about 60 bp upstream of the initiation codon of θ-globin. Transcription factor analysis of tammar θ-globin reveals a possible consensus site for a direct repeat binding protein, α-PAL (Efiok et al. 1994), within the GC-rich region (87 bp upstream from the initiation codon; Fig. 1). Examination of the human θ-globin promoter (accession No. NM_005331) also reveals a region 36 bp upstream of the initiation codon that closely matches the α-PAL consensus (T/c G/c CAT/c GCGCA [Efiok et al. 1994]).

Figure 1

DNA sequence of tammar θ-globin and corresponding predicted amino acid sequence in single-letter code. Noncoding DNA is shown in lowercase. Possible ATA and α-PAL sites in the promoter region of θ-globin are underlined, with the α-PAL site also shown in boldface. Other conserved signals, including initiation, termination, a polyadenylation signal, and donor and acceptor splice signals of the introns, are underlined.

Phylogenetic analyses based on both MP and Bayesian analyses generally support the monophyly of tammar θ-globin with eutherian θ-globin genes (Fig. 2, Table 1). Exclusion of third codon positions in MP analyses gave much higher bootstrap support (75%) for this grouping than analyses including third codon positions (55%). Bayesian analyses applying a single model/set of parameter estimates across all sites supported an alternative arrangement with tammar θ-globin grouping with marsupial α-globin genes (tree not shown), in contradistinction to Bayesian analyses where different sets of parameter estimates were applied to each codon position. All analyses strongly support the monophyletic grouping of mammalian θ-globin and α-globin genes and a sister group relationship with chicken α-globin (Fig. 2, Table 1), providing evidence that the θ-globin lineage arose from a tandem gene duplication of an ancestral α-globin gene after the divergence of the avian and mammalian lineages.

Figure 2

(A) A single most parsimonious (MP) tree of length 2053 steps from a PAUP* analysis of marsupial, eutherian, avian, and amphibian α-like globin gene coding sequences with all characters equally weighted. Sequences of yellowtail and salmon α-globin were used as an outgroup to root the tree. Numbers on branches refer to nodes with bootstrap values provided in Table 1. (B) A 50% majority-rule consensus phylogram from a Bayesian analysis constructed using a single GTR + I + G model of evolution and estimated base frequencies in an unlinked analysis using MrBayes. Numbers adjacent to branches refer to posterior probabilities. GenBank accession numbers for sequences are as follows: Tammar θ (AY459590), α (AY459589), ζ (AY789121), ζ′ (AY789122); horse θ (Y00284), α1 (M17902), ζ (X07051); goat α (J00043); Human α1 (V00491), θ (X06482), ζ (NM005332); mouse αA1 (NM008218), ζ (X62302); rabbit α (X04751), ζ1 (AH001223), θ (X04751); Rat θ (X56330); native cat (Dasyurus viverrinus) α (M14567); Chicken αA, π, αD (AF098919); duck (Cairina moschata) αD (X01831); pigeon (Columba livia) αD (AB001981); Geochelone nigra αD (SEG AB1165195); yellowtail (Seriola quinqueradiata) αA (AB034639); salmon (Salmo salar) α (X97289); salamander (Hynobius retardatus) larval (AB034756); salamander (Pleurodeles waltlii) α (M13365); Xenopus laevis α (X14259), tadpole α T5 (X02798).

Table 1 Bootstrap support values (%) and posterior probabilities (%) of nodes shown in the α-globin gene phylogeny in Fig. 2A (except nodes 2 and 8)

Further support for the orthology of tammar θ-globin and eutherian θ-globin genes also comes from its position in the α-globin cluster. Tammar θ-globin is located approximately 4.5 kb downstream of adult α-globin and a further 7 kb upstream of the β-like globin gene ω-globin, the three genes being in the order 5′–α–θ–ω–3′ (Wheeler et al. 2004). The θ-globin gene of humans is located in a similar position (3 kb downstream) adjacent to adult α-globin (α1), although in the human α-globin cluster, ω-globin has not been detected in any recent analyses (Flint et al. 2001).

Is Tammar θ-Globin a Functional Gene?

RT-PCR was used to amplify a cDNA identical in the region of overlap to the coding region of θ-globin, confirming that this gene is expressed in 1-day-old tammar neonates (results not shown). However, no putative θ-like globin has been identified in hemoglobin isolated from a number of neonatal tammar blood samples (Holland and Gooley 1997; unpublished data, K. Gill and colleagues). Therefore, either (i) the θ-globin RNA is not translated, (ii) θ-like polypeptides form a very low abundant component of the total hemoglobin in neonates, or (iii) the gene is expressed in other tissues.

Clegg (1987) suggested that horse θ-globin is unlikely to function as a true globin due to the encoded protein containing a number of amino acid substitutions that would severely impair the stability of the hemoglobin subunit. To determine if tammar θ-globin has accumulated amino acid mutations similar to those found in horse θ-globin, the θ-chains of horse, orang-utan, human, and tammar were compared with the human and tammar α-chains (Fig. 3). Tammar θ-globin differs from the human α-globin chain at two amino acid residues that have been proposed (Clegg 1987) to be important for the formation of hemoglobin. The first of these substitutions, 32 Met > Thr, is unlikely to prevent the formation of functional hemoglobin, as the tammar adult α-chain also has Thr at this position. The second substitution, 38 Thr>Ser, is also seen in the horse and human θ-globin chains and is proposed to affect a residue vital for the α/β cooperative interaction (Clegg 1987). However, residue 38 is Gln in the embryonic α-type globins in the tammar (ζ and ζ′) and other mammals so a substitution here is unlikely on its own to prevent the formation of a stable tetramer. The most noteworthy feature of these θ-chain comparisons is the presence in horse θ of Cys rather than Tyr at position 140, the penultimate position. To our knowledge Tyr is invariant here in the α and β-type chains of all normal hemoglobins, where it plays an important part in stabilizing the deoxy conformation and in the conformational changes that occur in oxygenation and lead to cooperativity in the tetrameric molecule.

Figure 3

An alignment of marsupial and eutherian α- and θ-globin polypeptide chains. Numbered positions, in boldface, are residues required for stability of the hemoglobin as suggested by Clegg (1987). Residue 32 shows a mutation (T—M) in both horse and tammar θ-globin, which could potentially hinder the formation of a functional tetrameric hemoglobin. Residue 140 (Cys) in horse θ is most unusual and would decrease the cooperativity of any tetramer (see text). Accession numbers for protein sequences are given in the legend to Fig. 2.

Additional clues to the functional status of tammar θ-globin can also be obtained by molecular evolutionary analyses, in particular, focusing on estimates of amino acid replacement (nonsynonymous) and silent-site (synonymous) divergence rates and the ratio between these values. The nonsynonymous divergence level of tammar θ-globin relative to other eutherian θ-globin genes ranges from 0.28 to 0.32, almost twice the level observed in the comparison between orthologous tammar and eutherian α-globin genes (0.14–0.20) (Table 2). These comparisons suggest a higher rate of amino acid changing mutations in the θ-globin lineage, an observation that may indicate a relaxation in the effects of stabilizing selection. Estimates of synonymous divergence levels among eutherian and tammar θ-globin genes are between 1.23 and 3.00, suggesting that this relaxation in selection does not approach neutral rates of evolution. However, pairwise estimates of nonsynonymous and synonymous divergence levels do not distinguish whether particular branches of a phylogeny may be evolving at neutral rates, while other branches are under stabilizing selection.
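The statement that these divergence levels do not approach neutrality can be illustrated with a rough bound on the nonsynonymous/synonymous ratio. The sketch below combines the extremes of the two ranges quoted above; because the individual pairwise values in Table 2 are not reproduced here, and the synonymous range also covers eutherian-only comparisons, the result is only a loose envelope and not a per-comparison estimate.

```python
# Ranges quoted in the text (pairwise Nei-Gojobori divergence estimates).
dn_range = (0.28, 0.32)   # nonsynonymous: tammar theta vs eutherian theta
ds_range = (1.23, 3.00)   # synonymous: among eutherian and tammar theta genes

# Loosest possible bounds on dN/dS, pairing opposite extremes of each range.
ratio_low = dn_range[0] / ds_range[1]
ratio_high = dn_range[1] / ds_range[0]

print(round(ratio_low, 3), round(ratio_high, 3))

# Under strictly neutral evolution the expected ratio is about 1.
print(ratio_high < 1.0)
```

Even the upper bound is well below 1, in line with the interpretation given above, although synonymous divergence values above 1 are likely affected by saturation and should be read with caution.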

Table 2 Nonsynonymous (below the diagonal) and synonymous (above the diagonal) divergence rates between pairs of mammalian α- and θ-globin genes

To test more rigorously the possibility that θ-globin is evolving at neutral rates of evolution, we carried out a series of likelihood ratio tests, applying different estimates of the nonsynonymous/synonymous divergence ratio (ω = dN/dS) to different branches of an α-globin phylogeny using the program PAML (version 3.14 beta; Yang 1997). To reduce computational time and potential saturation problems at synonymous sites we used a reduced number of taxa in the phylogeny to include only chicken and mammalian adult α-globin genes and mammalian θ-globin genes (Fig. 4). This phylogenetic tree is based on strongly supported relationships among mammalian orders (Madsen et al. 2001; Murphy et al. 2001) and our analyses that support the orthology of marsupial and eutherian adult α-globin and θ-globin genes, respectively. Analyses were repeated (data not shown) using the same taxon set with the divergence order of θ-globin genes identical to that found in the MrBayes tree (Fig. 2) and the conclusions for the likelihood ratio tests were not found to differ from those presented below.

Figure 4

Phylogenetic tree used in the PAML (Yang 1997) analyses showing relationships among marsupial and eutherian α-globin and θ-globin genes.

If the θ-globin lineage was evolving at neutral rates of evolution then a ratio ω = 1 would be expected, while other parts of the α-globin phylogeny would evolve with ω < 1, with genes under stabilizing selection. A null hypothesis model with ω = 1 (log likelihood value = l₀) can therefore be compared with an alternative hypothesis with the ω ratio free to vary along specified branches of the θ-globin lineage (log likelihood value = l₁) with a second ratio for the remainder of the tree (Table 3). Given that the null hypothesis is nested within the more general hypothesis with two ratios that are free to vary, twice the log likelihood difference, 2Δl = 2(l₁ − l₀), should approximate a χ² distribution with one degree of freedom corresponding to an increase of one parameter in the more complex model (Yang 1998). An alternative test of a potential loss of constraint along a lineage of the phylogeny compares the null hypothesis of a single ratio in the tree with an alternative hypothesis allowing two ratios: ω₀ (background ratio) and ωH, free to vary in a specified θ-globin lineage. The likelihood scores for the PAML analyses are shown in Table 3 and results for χ² analyses are shown in Table 4. The ability of the methodology to distinguish branches that are evolving at neutral rates of evolution is exemplified by the analysis of the branch leading to rabbit θ-globin, a known pseudogene. The likelihood ratio test failed to reject the null hypothesis of ω = 1 along this branch, i.e., that the entire coding sequence of this gene is evolving at a neutral rate of evolution in the branch leading to rabbit θ-globin (Table 4A). A similar result was obtained for the horse θ-globin lineage. Rat θ-globin is also proposed to be a pseudogene (Satoh et al. 1999), but for likelihood ratio tests involving this lineage, a hypothesis of ω = 1 was rejected. All other tests strongly rejected the null hypothesis of ω = 1 along different branches of the θ-globin lineage at the 1% level of significance (Table 3A). These branches included those leading to human θ-globin and marsupial (tammar) θ-globin, the latter appearing to be under strong stabilizing selection, with an estimated ω of 0.001.
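The likelihood ratio test described above reduces to a short calculation once the two log likelihood values are available. The following Python sketch computes 2Δl and its p value for one degree of freedom, using the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(√(x/2)). The log likelihood values in the example are hypothetical placeholders, not values from Table 3, and the function name is illustrative.

```python
import math

def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), where statistic = 2 * (lnl_alt - lnl_null)
    and p_value is the chi-square upper-tail probability with df = 1.
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against rounding
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log likelihoods, chosen only to illustrate the arithmetic:
# null model (omega fixed at 1 on the branch) vs. alternative (omega free).
statistic, p_value = lrt_one_df(lnl_null=-2510.0, lnl_alt=-2503.0)

print(statistic)        # 14.0
print(p_value < 0.01)   # True: the null would be rejected at the 1% level
```

For reference, the 5% and 1% critical values of χ² with one degree of freedom are approximately 3.84 and 6.63, so any 2Δl above 6.63 corresponds to rejection at the 1% level used in the text.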

Table 3 Log likelihood values and parameter estimates under different models using PAMLa
Table 4 Likelihood ratio tests (2Δl) of models given in Table 3

Tests of a single ω ratio across the tree (null model) versus two estimated ω ratios in different parts of the phylogeny largely mirrored the findings from the above tests (Table 4B). The hypothesis of a single ratio was strongly rejected in favor of a two-ω ratio hypothesis, with a separate ω ratio for the horse, rabbit, and tammar θ-globin lineages. These analyses suggest that these genes are evolving under an alternative ratio compared to the background ratio in the phylogeny. In the case of the tammar θ-globin lineage (ω = 0.001) this ratio is considerably lower than the average ratio in the remainder of the tree (ω = 0.139). For the rabbit (ω = 0.416) and horse (ω = 2.852) lineages the ratio is significantly larger than the background ratio suggesting a relaxation of selective constraints.

Isolation and Characterization of Two ζ-Globin Genes

A combination of degenerate PCR-amplifications and genome-walking methods was used to isolate the entire genomic DNA sequences of two ζ-globin genes from the tammar. The conceptually translated polypeptide sequence of one of the genes is identical to a full-length polypeptide sequence of ζ-globin (K. Gill et al., unpublished data). The second gene showed four differences from a full-length polypeptide sequence of the ζ′ chain (K. Gill et al., unpublished data), but these differences are most likely to represent either sequencing errors or allelic variants. In keeping with the nomenclature of this protein study the two genes are referred to as ζ-globin (GenBank No. AY789121) and ζ′-globin (GenBank No. AY789122). Each gene encodes a peptide of 141 amino acids and has the same three-exon, two-intron structure typical of all known functional mammalian α-globin genes. The coding regions of each gene differ by about 13% and their encoded polypeptides differ by 16 amino acids (11.3%). The first introns of ζ- and ζ′-globin are, respectively, 425 and 962 bp long and the second introns are, respectively, 214 and 425 bp long. The significance, if any, of the twofold length differences in the introns of these genes is unknown. Dot-plot analyses comparing the intron sequences between the two genes reveal no long regions of alignment >20 bp and low sequence similarity (<50%) within these regions of alignment (results not shown). The promoter regions of ζ-globin and ζ′-globin include putative ATA and CCAAT sites, common to all functional α-globin genes.

Phylogenetic analyses using a Bayesian approach, applying different models of sequence evolution to different codon positions (unlinked analysis) and a single model for all sites (linked analysis), provide strong support (posterior probability = 100%) for a monophyletic group containing tammar ζ-globin and ζ′-globin, eutherian ζ-globin and avian π-globin genes (Fig. 2, Table 1). There is weak support for this group having a sister group relationship to salamander/toad larval α-like globin genes in some analyses (Table 1). The Bayesian trees also show monophyly of mammalian ζ-globin genes but only low posterior probabilities were obtained for this grouping. The MP tree based on an unweighted analysis groups tammar ζ-globin genes with avian α-globin and amphibian larval α-globin genes, to the exclusion of a second group containing eutherian ζ-globin genes. However, this unusual arrangement is likely to result from saturation of third codon positions or the lack of homogeneity of base frequencies at this position. Exclusion of third codon positions from analyses led to the acceptance of a hypothesis of homogeneity of base frequencies and, for MP analyses, gave a high bootstrap value of 84% for the monophyly of mammalian ζ-globin genes and avian π-globin (Table 1). Overall, the analyses support the conclusion that tammar ζ-globin and ζ′-globin are orthologous to eutherian ζ-globin genes. A sister lineage relationship of tammar ζ-globin and ζ′-globin was supported by high MP bootstrap values (>94%) and high posterior probabilities (100%).
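Excluding third codon positions and inspecting base composition can be sketched as below. This is only an illustration of the data manipulation, assuming an in-frame coding sequence; the helper names and the example sequence are hypothetical, and the formal homogeneity test applied in the analyses is not reproduced.

```python
from collections import Counter


def drop_third_positions(cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def base_frequencies(seq: str) -> dict[str, float]:
    """Return the relative frequency of A, C, G, and T in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical four-codon fragment, for illustration only.
cds = "ATGGTGCTGTCC"
print(drop_third_positions(cds))   # first and second positions only
print(base_frequencies(cds[2::3]))  # composition at third positions
```

Comparing the third-position frequencies across taxa gives a quick impression of whether base composition is heterogeneous at that position.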

Discussion

The analyses presented here provide strong evidence that the tammar α-globin gene family consists of at least four genes, i.e., two ζ-globin genes, one adult α-globin gene and θ-globin. Previously published analyses indicate that two of the genes, α-globin and θ-globin, are linked in the order 5′–α–θ–3′ with both genes located upstream of a β-like globin gene called ω-globin (Wheeler et al. 2004). Given the orthology of the two tammar ζ-globin genes to eutherian ζ-globin genes, and the conserved position of these genes to the 5′ side of the α-globin and θ-globin genes in several eutherian mammals (e.g., goat [Flint et al. 1988], human [Lauer et al. 1980]), it is likely that the tammar α-globin gene cluster is arranged in the order 5′–ζ?–ζ?–α–θ–ω–3′, with the relative order of the two ζ-globin genes being unknown. Recent sequence analyses of a BAC clone from a second marsupial, Sminthopsis macroura, have also shown that a ζ-globin gene is located 5′ to α-globin and θ-globin, providing further evidence for this arrangement in the tammar (De Leo et al. 2005). Our analyses cannot exclude the possibility that additional functional α-like globin genes or pseudogenes are present in the tammar. However, protein studies by Holland and Gooley (1997) did not detect additional α-like globin polypeptides, so further loci in the family are most likely to be pseudogenes.

The phylogenetic analyses support the long-term evolutionary history of the ζ-globin gene in mammals and the orthology of this gene to avian π-globin genes. In addition, there is support from some analyses for the ζ-globin/π-globin lineage being the sister group to amphibian larval α-like globin genes. The latter finding suggests that the tandem gene duplication leading to an ancestral ζ-globin gene occurred prior to the divergence of these vertebrate lineages and supports previous proposals that the embryonic α-like globin genes of amphibians, avians, and mammals are orthologous (Proudfoot et al. 1982).

The proposal by Flint et al. (1988) for an ancient duplication of ζ-globin that predated the mammalian radiation is partly supported by our finding of two ζ-globin genes in the tammar. However, two lines of evidence suggest that an alternative hypothesis might be favored. First, the two genes show the relatively high sequence similarity of 87% in their coding regions, which is higher, for example, than the similarity of adult α-globin genes of the tammar and dasyurid marsupial Dasyurus viverrinus (82%). Second, phylogenetic analyses provide strong support for their monophyly to the exclusion of eutherian ζ-globin genes (Fig. 2). These results suggest that ζ-globin may have independently duplicated within both the marsupial and the eutherian lineages. However, it is also possible that past gene conversion events have maintained similarities in the coding sequences of the two tammar ζ-globin genes and eliminated the evolutionary signal of a more ancient duplication event. Such a process has led to the concerted evolution of the two ζ-globin genes in both goats (Flint et al. 1988) and humans (Proudfoot et al. 1982). Given a similarity level of 87%, it might be expected that intron sequences would also show moderate similarity. For example, dot-plot sequence comparisons of the first intron of β-globin genes of the tammar and American opossum Didelphis virginiana reveal extensive regions of homology, even though the coding sequences of the two genes show a similarity of only 83% (Wheeler et al. 2001). In contrast, the first introns of tammar ζ- and ζ′-globin are considerably different in length (425 vs. 962 bp, respectively) and dot-plot and sequence alignments of both introns of each gene show an absence of significant regions of similarity. The latter finding is typical of intron sequence comparisons of paralogous genes that have diverged prior to the mammalian radiation and suggests that the ζ-globin duplication may be older than that predicted by the coding sequences, supporting the hypothesis of Flint et al. (1988).

The existence of a marsupial θ-globin gene is largely consistent with a previous dating of the α-/θ-globin duplication (250 MYA) to before the marsupial/eutherian divergence from a common ancestor (∼120 MYA) (Efstratiadis et al. 1980; Clegg 1987; Hope et al. 1990). However, the duplication date is probably considerably overestimated, given that θ-globin genes have evolved up to two times faster than α-globin genes (Table 2), negating the original assumption that α-globin and θ-globin evolved at the same rate (Clegg 1987; Shaw et al. 1987). The relatively poor bootstrap support and low posterior probabilities for the grouping of marsupial and eutherian θ-globin genes indicate a low phylogenetic signal on the branch leading to marsupial/eutherian θ-globin genes, suggesting that the α-/θ-globin duplication occurred just prior to the divergence of marsupials and eutherians from a common ancestor.
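Why unequal rates lead to an overestimated duplication date can be illustrated with a simple strict-clock calculation. This sketch is an assumption-laden illustration and not the dating method of the cited studies: it supposes that the original estimate treated both paralogs as evolving at the α-globin rate r, so that the divergence d = 2rT, whereas a θ-globin lineage evolving k times faster gives d = (1 + k)rT. The function name is hypothetical.

```python
def corrected_divergence_time(naive_time_mya: float, rate_ratio: float) -> float:
    """Rescale a duplication date that was estimated assuming equal rates.

    naive_time_mya: date obtained assuming both paralogs evolve at rate r.
    rate_ratio: k, how many times faster the second paralog evolves.
    Equal rates give d = 2*r*T; unequal rates give d = (1 + k)*r*T,
    so T_corrected = T_naive * 2 / (1 + k).
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return naive_time_mya * 2.0 / (1.0 + rate_ratio)


# Illustrative rescaling of the 250 MYA estimate for several rate ratios.
for k in (1.0, 1.5, 2.0):
    print(k, round(corrected_divergence_time(250.0, k), 1))
```

Under these assumptions, any rate ratio above 1 moves the estimated duplication toward the present, in the direction argued above.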

The functional status of θ-globin in mammals has been uncertain since its original isolation from a primate in 1986 (Marks et al. 1986). In the current study, there is evidence for θ-globin transcripts in 1-day-old tammar neonates, but to date, θ-globin polypeptides have not been detected in tammar neonate blood (Holland and Gooley 1997). The polypeptide sequence of tammar θ does not appear to contain substitutions that would impair the function of the globin or its ability to contribute to a normally functioning tetramer. In addition, evolutionary analyses, using a relatively new approach to test for neutral rates of evolution along different lineages of a phylogeny (Yang 1998), lend support to the long-term functional status of θ-globin in both marsupials and eutherians, particularly in the lineage leading to human θ-globin. The likelihood ratio tests (LRTs) suggested that rabbit and horse θ-globin are both evolving under relaxed selective constraints indicative of a pseudogene. This result was expected for rabbit θ-globin, a known pseudogene, but the functional status of horse θ-globin has been uncertain, although the translated product of this gene contains a number of amino acid substitutions (e.g., 140 Tyr–Cys) suggestive of a pseudogene (Clegg 1987). The LRTs failed to detect the pseudogene status of rat θ-globin (ψθ1 [Satoh et al. 1999]), but it is possible that in the lineage leading to rat θ-globin, θ-globin initially evolved under stabilizing selection and only recently became a pseudogene. The tammar θ-globin lineage was shown to have an estimated nonsynonymous/synonymous ratio (ω = 0.001) that is significantly smaller than the average ratio in the phylogeny (Table 2). This finding most likely reflects an elevated synonymous rate along the branch leading to tammar (estimated dS = 87.99 when ω was free to vary), rather than a reduced nonsynonymous rate (estimated dN = 0.094), the latter being similar among taxa.
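The relationship between the quoted branch estimates can be checked with a short calculation of ω = dN/dS. This sketch uses only the dN and dS estimates given above and the background ratio reported for the remainder of the tree; the function name is hypothetical.

```python
def omega(dn: float, ds: float) -> float:
    """Return the nonsynonymous/synonymous rate ratio dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


tammar_omega = omega(dn=0.094, ds=87.99)  # estimates quoted for the tammar branch
background_omega = 0.139                  # average ratio in the rest of the tree

print(f"tammar omega = {tammar_omega:.4f}")
print(f"background / tammar = {background_omega / tammar_omega:.0f}-fold")
```

The very small ratio follows from the large dS in the denominator, while dN itself is unremarkable, which is the interpretation given in the text.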

The 5′ upstream regions of tammar θ-globin and human θ-globin show evidence of a binding site for the transcription factor α-PAL, which was previously identified in the proximal (−20-bp) promoter region of the housekeeping gene eukaryotic initiation factor-2α (eIF-2α). A previous study by Leung et al. (1989) of the human θ-globin promoter showed that an upstream GC-rich region acts as a promoter element in both erythroid and nonerythroid cell lines, replacing the normal activity of the ATA and CCAAT box regions. At the time of that study, α-PAL factors had not been identified. The promoter of eIF-2α, like those of the θ-globin genes, is GC rich and contains no CCAAT or ATA elements (Efiok et al. 1994). Binding of α-PAL to the eIF-2α promoter is believed to act in place of the ATA element, by recruiting the basal promoter machinery. These α-PAL elements are often found in genes involved in controlling metabolism, cell growth and proliferation, DNA repair, and translation (Efiok et al. 1994; Chiorini et al. 1999). Therefore, the identification of an α-PAL element may implicate the θ-globin genes in a similar role, and this role may have evolved before the divergence of eutherians and marsupials from a common ancestor.

Conclusion

The results presented here suggest that a cluster of at least four genes (5′–ζ–α–θ–ω–3′), comprising three α-like globin genes and one β-like globin gene (ω-globin [Wheeler et al. 2004]), existed prior to the divergence of the marsupial and eutherian lineages. The possibility remains that a second ζ-globin gene was also present in the common ancestor of these mammalian lineages. Analyses of the α-globin gene family of divergent marsupial lineages (e.g., the opossum, D. virginiana) or monotremes may help to further resolve this hypothesis.