Introduction

Heme proteins that can reversibly bind oxygen and have a particular fold originally identified in myoglobin (Kendrew et al. 1958) typify the hemoglobin (Hb) superfamily. In vertebrates, hemoglobin and myoglobin transport and store oxygen, respectively. They accomplish these tasks with kinetic and allosteric properties finely tuned by the protein structure. The vertebrate Hb fold consists of seven or eight helices (labeled A through H) arranged in a three-on-three topology (Perutz 1979). The iron within the heme group is normally coordinated by a single histidine (the proximal histidine, or His F8) and the sixth iron coordination position harbors the exogenous ligand. The packing of the protein is maintained by specific interhelical contacts (Lesk and Chothia 1980). Structural analysis reveals that these contacts can be organized into a template (Bashford et al. 1987), and much mechanistic insight has been derived from sequence alignment in conjunction with a large body of biochemical and biophysical studies.

In contrast to vertebrate Hbs, invertebrate Hbs constitute a heterogeneous group (Weber and Vinogradov 2001) that includes leghemoglobins (Kundu et al. 2003), flavohemoglobins (Frey and Kallio 2003; Wu et al. 2003), and a variety of other hemoglobins found in plants, protists, algae, Bacteria (Moens et al. 1996; Weber and Vinogradov 2001), and Archaea (Freitas et al. 2005). This heterogeneity in sequence is accompanied by various tertiary features and quaternary arrangements (Bolognesi et al. 1997; Moens et al. 1996; Vinogradov 1985; Vinogradov et al. 1993). In several cases, the main function differs from ligand binding and release, and although invertebrate sequence templates are useful (Moens et al. 1996), the diversity of properties encountered among recently discovered Hbs argues against a generalized interpretation of weak sequence conservation patterns.

Within the invertebrate Hbs, there exists a family of globins, referred to as truncated hemoglobins (trHbs) (Pesce et al. 2000), that are clearly distinct from all other globins. Truncated globins have primary structures that are 20 to 40 residues shorter than mammalian Hbs (Pesce et al. 2000). The deletions are distributed throughout the sequence and result in shortened or missing helices as well as modified connecting loops. Because these proteins have been identified in Bacteria, unicellular eukaryotes, and higher plants (Milani et al. 2005) and occur with a high preponderance in Bacteria, it can be proposed that the trHb fold existed long before the vertebrate fold.

Some of the organisms that contain trHbs have pathogenic properties, perform photosynthesis, fix nitrogen, or have distinctive metabolic capabilities. The few trHbs that have been characterized in sufficient detail are presumed to aid in these traits (Milani et al. 2005). For example, the truncated globin from the cyanobacterium Nostoc sp. is thought to protect its nitrogen-fixation apparatus from oxidative damage by scavenging oxygen (Hill et al. 1996). In Mycobacterium bovis, one of two trHbs (trHbN) participates in nitric oxide detoxification (Ouellet et al. 2002). When limited functional data are available, an assessment of sequence variations helps in conditioning functional expectations; such an assessment was undertaken by Wittenberg and coworkers (2002).

The original phylogenetic analysis of trHb sequences showed that these proteins branched into three groups, designated I (trHbN), II (trHbO), and III (trHbP) (Wittenberg et al. 2002). When this analysis was completed, 42 sequences were available and structural information existed for Group I globins only. At present, over twice as many trHb genes can be identified in databases, and two members of Group II have also been structurally characterized (Giangiacomo et al. 2005; Milani et al. 2003). This report examines the expanded set of trHb sequences and their groupings. Residues known to be important for ligand binding and anticipated to play structural and functional roles are inspected for conservation patterns. In addition, an updated phylogenetic analysis is used to explore gene history and make predictions about the properties of trHbs that have yet to be characterized.

Methods

BLAST Searches

To perform BLAST (Altschul et al. 1990) searches, one trHb sequence from each of Groups I, II, and III was selected as a starting query sequence. These sequences were those in the original trHb phylogenetic analysis (Wittenberg et al. 2002) and were from M. tuberculosis (trHbN), Bacillus subtilis (trHbO), and Caulobacter crescentus (trHbP). In analyzing the search results, only the highest scoring hits were retained from each genome and the accepted hits were queried again against all prokaryotic and eukaryotic databases. At this stage, sequences with E-values above 0.1 were discarded. This allowed a certain level of specificity while maintaining some discrimination among groups. In turn, each discovered sequence was selected as a query sequence until all results were redundant. Protein-protein and protein-nucleotide BLAST searches were used depending on the type of information available for the genome of interest. Redundant sequences from different strains of the same species were eliminated from the final analysis.
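
The repeat-until-redundant search described above behaves like a breadth-first traversal over hits. The following is a minimal sketch under stated assumptions, not the procedure actually scripted for this study: `run_blast` is a hypothetical stand-in that reads from a toy in-memory table instead of calling BLAST, all identifiers and E-values are invented, and the per-genome "highest scoring hit" filter is omitted for brevity. Only the 0.1 E-value cutoff and the re-query loop come from the text.

```python
from collections import deque

E_VALUE_CUTOFF = 0.1  # hits with E-values above this are discarded (from the text)

# Toy stand-in for a sequence database: query id -> list of (hit id, E-value).
# All identifiers and E-values are invented for illustration.
TOY_HITS = {
    "seedN": [("hitA", 1e-30), ("hitB", 0.5)],
    "hitA": [("seedN", 1e-30), ("hitC", 1e-5)],
    "hitC": [("hitA", 1e-5)],
}


def run_blast(query_id):
    """Hypothetical search function; a real pipeline would call BLAST here."""
    return TOY_HITS.get(query_id, [])


def exhaustive_search(seeds, search=run_blast, cutoff=E_VALUE_CUTOFF):
    """Re-query every accepted hit until no new sequences are found."""
    accepted = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit_id, e_value in search(query):
            if e_value > cutoff:
                continue  # too weak: discard
            if hit_id not in accepted:
                accepted.add(hit_id)
                queue.append(hit_id)  # a newly found sequence becomes a query
    return accepted


found = exhaustive_search(["seedN"])
```

In this toy run the search stops once every accepted hit has been queried and returns only sequences already in the set, which is the "all results were redundant" condition.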

Alignments

Alignments were completed using ClustalX (Thompson et al. 1997) with a Gonnet 250 matrix. Manual adjustments were made to the alignments using Jalview software (Clamp et al. 2004). The sequences for the overall tree, which included all groups, were aligned five at a time; these sets were randomly selected from each of the groups. Manual adjustments were guided by the proximal histidine, the only residue conserved across all trHb sequences. After this manual adjustment all sequences were globally realigned. Alignments within the groups required no further manual adjustment and were checked with current structural data.

Phylogenetic Trees

The phylogenetic analysis was completed with MEGA version 2.1 (Kumar et al. 2001). The overall tree was constructed using minimum evolution (ME) with p-distance, as were the trees for the separate groups. p-distance and Poisson correction were found to give similar results in the overall tree and identical results in the group trees. p-distance was chosen in preference to the Poisson correction because of the diversity of taxa. ME was chosen because the data set was small enough that computational time was not a concern and therefore neighbor joining was not necessary. A total of 500 bootstrap replications were used to test the topology in all cases. Percentage bootstrap values are reported with each tree.
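
For readers unfamiliar with the two distance measures compared above, the sketch below computes both for a pair of aligned sequences. It uses the standard definitions (p-distance as the proportion of differing sites, Poisson correction as d = -ln(1 - p)); the function names, the toy sequences, and the choice to skip gapped columns are assumptions of this example and are not taken from the MEGA settings used in the study.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


def poisson_distance(p):
    """Poisson-corrected distance, d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# Invented 8-column alignment: 7 comparable sites, 2 differences.
p = p_distance("HKLFYG-A", "HRLFWGTA")
d = poisson_distance(p)
```

For small p the two distances nearly coincide; the Poisson-corrected value grows faster as the sequences diverge.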

Orthologue and Paralogue Tests

Each member of the three groups was tested against all other members of the same group using the triangular orthology test (Tatusov et al. 1997). Members of each group were subjected to the same analysis against members of the other groups. The criteria for consideration as orthologues were first hit and E < 0.001. The paralogues (not first hit) had E values at least an order of magnitude greater.
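
The two stated criteria can be written as a simple per-hit check. This is only a sketch of those criteria: it does not build the three-way best-hit triangles of the full test, and the function names and the example hit list are hypothetical.

```python
ORTHOLOGUE_E_LIMIT = 1e-3  # E < 0.001, as stated in the text


def is_orthologue_candidate(rank, e_value, limit=ORTHOLOGUE_E_LIMIT):
    """True when a hit is the first hit (rank 1) and its E value is below the limit."""
    return rank == 1 and e_value < limit


def label_hits(hits, limit=ORTHOLOGUE_E_LIMIT):
    """Label a best-first list of (hit id, E value) pairs for one query."""
    labels = {}
    for rank, (hit_id, e_value) in enumerate(hits, start=1):
        if is_orthologue_candidate(rank, e_value, limit):
            labels[hit_id] = "orthologue candidate"
        else:
            labels[hit_id] = "not an orthologue candidate"
    return labels


# Invented hit list for a single query, sorted best-first.
example = label_hits([("hit1", 1e-20), ("hit2", 1e-8), ("hit3", 0.05)])
```

Only the first hit can qualify; lower-ranked hits are never labeled as orthologue candidates, whatever their E value.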

Results and Discussion

Overall Tree and Sequence Alignment

The completion of an updated trHb phylogenetic tree was undertaken first to determine if the organization into the three groups detected with the initial 42 entries held for the 111 sequences identified in databases up to 2004. The final tree comprised 105 sequences and is illustrated in its entirety in Fig. S1 of the Supplementary Material. The new tree maintained the same number of main branches; 24 trHbs were organized in Group I, 57 in Group II, and 24 in Group III. (The alignment of all protein sequences and a list of organisms and gene information are presented in Fig. S2 and Table S1 of the Supplementary Material.) As observed on the basis of the 42-item set, only the proximal histidine (His F8) is strictly conserved across all sequences. Phe CD1, long thought to be also strictly conserved (Ptitsyn and Ting 1999), is replaced in 36 sequences. TrHbs form a diverse family as indicated by pairwise identity values that are occasionally lower than 20% (Milani et al. 2005).

To explore further the sequence relationships across and within the three trHb groups, the 105 proteins were tested for homology type. Within each group, all sequences were orthologous, excluding those sequences in organisms that had two representatives in the same group. It was also confirmed that sequences from one group were paralogous to sequences from another. The discrimination among groups was sharp: even the best-scoring hits across groups gave E values two orders of magnitude greater than the best scoring hits within a group. In exception to this observation, seven Group I globin sequences yielded E values comparable to the intergroup values when tested against the other Group I globins (see below). In general, paralogues are unlikely to share a common function (Tatusov et al. 1997). Consequently, the three groups of trHbs are expected to consist of proteins that have adapted to perform distinct roles. The occasional coexistence of members of multiple groups in the same organism supports this view, but stronger evidence arises from recent biochemical studies that have revealed that functional roles are not conserved across Groups I and II (Ouellet et al. 2003). The analysis of the trHb sequences provided below focuses on orthologous gene sets to reveal valid conservation patterns.

Group I Proteins: N Globins

Figure S3 (Supplementary Material) shows the alignment of the Group I globin sequences identified in BLAST searches; a subset is presented in Fig. 1. The globin domain of trHbNs exhibits extensive and nonuniform variability in primary structure.

Figure 1

Sequence alignment of representative truncated globins. Conservation patterns are derived from the complete set of sequences given as Supplementary Material. The symbols immediately above the sequences represent ClustalX (Thompson et al. 1997) notation for strictly conserved and two levels of strongly conserved sites. Top set: Group I globins (Fig. S3, Supplementary Material). Helix designations are based on the structures of sperm whale myoglobin (Phillips and Schoenborn 1981) (top two lines) and Mt-trHbN (Milani et al. 2001) (third line). \BCC/ indicates a 3-residue insertion and \D/ a 13-residue insertion containing the CD corner and the 7-residue D helix entirely missing in trHbs. Only the globin domain is annotated. The framed residues make up the hydrophobic tunnel in Mt-trHbN. The vertical lines on the left refer to the subgroup 1 and subgroup 2 classification (see text). Middle set: Group II globins (Fig. S4, Supplementary Material). Helix designations are based on the structures of sperm whale myoglobin (top two lines) and Mt-trHbO (Milani et al. 2003) (third line). \BCC/ indicates a 3-residue insertion; \DE/ indicates a 15-residue insertion containing the CD corner, the 7-residue D helix, and 1 E residue; and \G/ and \H/ indicate 1-residue insertions. Bottom set: Group III globins (Fig. S5, Supplementary Material). No structure is available for these proteins. Residues at expected positions B10, CD1, F8, and G8 are framed.

Heme Cavity Residues

All studied trHbs, whether from Group I or Group II, are capable of binding exogenous diatomic ligands such as O2, CO, and NO. Large differences in the affinity for these ligands are observed from protein to protein (Milani et al. 2005). Extensive studies of invertebrate hemoglobin phylogeny have mapped the heme cavity for essential residues (Weber and Vinogradov 2001). Experimental data confirm that the same positions are important to the properties of Group I globins (Egawa and Yeh 2005; Wu et al. 2003). This is shown in four three-dimensional structures, those of M. tuberculosis (Mt-trHbN) (Milani et al. 2001), Paramecium caudatum (Pc-trHbN), Chlamydomonas eugametos (Ce-trHbN) (Pesce et al. 2000), and Synechocystis sp. PCC 6803 (S6803-trHbN) (Falzone et al. 2002; Hoy et al. 2004). These structures reveal a two-on-two α-helical fold resembling the three-on-three fold of vertebrate hemoglobins (Fig. 2A). The most noticeable differences between the truncated and the full-length folds are the shortened A helix, the absence of a D helix, a long pre-F loop, and a variable-length F helix (Falzone et al. 2002; Pesce et al. 2000).

Figure 2

A Ribbon structure of Ce-trHbN. The two-on-two α-helical fold is typical of trHbs. The five black dots indicate the positions of conserved glycines (see text). PDB 1DLY (Pesce et al. 2000). B Stereo view of key residues in the heme pocket of metcyano Mt-trHbN. The bound cyanide is stabilized by a H-bond from Tyr B10, and Tyr B10 is stabilized by a H-bond from Gln E11. PDB 1RTE (Milani et al. 2004a). C Stereo view of key residues in the heme pocket of metcyano S6803-trHbN. The bound cyanide is stabilized by a H-bond from Tyr B10 and possibly interactions with Gln E7 and Gln E11. His H16 is covalently linked to the heme 2-α-vinyl. PDB 1S69 (Trent et al. 2004).

Characterized trHbNs stabilize bound cyanide in the ferric state and oxygen in the ferrous state through a H-bond network involving residues B10 and either E7 or E11 or both (Egawa and Yeh 2005; Milani et al. 2005). The details of the interactions vary: in Mt-trHbN, a direct Tyr B10-O2 H-bond occurs that is stabilized by Gln E11 interacting with Tyr B10 (Couture et al. 1999b; Milani et al. 2001, 2004a; Yeh et al. 2000) (Fig. 2B); Pc-trHbN appears to use a combination of Tyr B10, Gln E7, and Gln E11 (Das et al. 2000), as do Ce-trHbN (Couture et al. 1999a; Das et al. 2001) and possibly S6803-trHbN (Fig. 2C) (Das et al. 2001; Trent et al. 2004; Vu et al. 2004a). In each case, Tyr B10 plays a central role (Couture et al. 1999b).

Position B10 is occupied by a tyrosine in all but nine of the Group I globin sequences (Fig. S3, Supplementary Material). Two of these, from Nostoc sp., contain a residue capable of hydrogen bonding (a histidine), whereas the other seven contain a hydrophobic residue. In general, trHbNs that have a hydrogen bonding residue at B10 have a glutamine at E7 or E11, or at both positions, presumably completing the H-bond network. In the seven sequences containing a hydrophobic side chain at B10, large hydrophobic residues are found at E7 and E11.

Figure 3 illustrates the phylogenetic tree corresponding to the trHbN alignment. The seven sequences with hydrophobic B10 cluster together, in a branching pattern inconsistent with taxonomy and arising from this and multiple other nonconservative modifications. The two Nostoc sp. sequences containing His B10 are highly divergent from the Tyr B10 sequences but, nevertheless, belong to the same branch. The E values between members of the majority set and members of the minority set are the highest recorded for any group, with Myxococcus xanthus and Xanthomonas campestris trHbNs topping the list (E = 6 × 10⁻⁴). These observations indicate the existence of two subgroups within the N group. In fact, an orthology limit on E values only tightened by a factor of 5 would make some members of these two subgroups paralogous to one another by the previously stated limits. As will appear, this situation is unique to the N globins.

Figure 3

Minimum evolution tree of 24 Group I globin sequences. The first 17 sequences belong to subgroup 1, indicated by the presence of a hydrogen bonding amino acid at the B10 position. The last seven sequences are subgroup-2 N globins. Phyla are listed on the right side of the tree for the prokaryotes. Eukaryotic sequences are identified separately.

None of the minority trHbNs (subgroup 2 in Fig. S3, Supplementary Material) have been experimentally studied thus far. The low conservation of residues that are the hallmark of the heme cavity of other trHbs suggests that, if these subgroup 2 proteins bind diatomic ligands, they modulate their affinity with an alternative to the trHb H-bond network. A low-polarity heme pocket is reminiscent of vertebrate globins, whereas the high polarity observed in subgroup 1 is expected to favor oxygen chemistry as in peroxidases (Hiner et al. 2002). The two subgroup proteins may therefore perform distinct tasks in the cell. It is also conceivable that the outlier sequences are inactive and accumulating nonselective mutations.

Ligand Access to the Heme Group

An intriguing characteristic of Group I globins is a hydrophobic access tunnel to the heme distal side. This tunnel is detected in all structures (Milani et al. 2001, 2004b; Pesce et al. 2000; Wittenberg et al. 2002), although its formation may depend on the presence of bound ligand (Hoy et al. 2004). Figure S3 (Supplementary Material) emphasizes the 12 residues that line this tunnel in M. tuberculosis trHbN. Seven, including E15, are well conserved (occupied by a ClustalX strong group of side chains); three are weakly conserved but retain a hydrophobic character (only one or two sequences preventing a higher level of overall conservation); and two (at E19 and G19) are not well conserved unless the two N subgroups are separated. In Mt-trHbN, Phe E15 adopts two conformations, one of which appears to block ligand access to the heme group (Milani et al. 2001). Further structural and ligand binding kinetic analyses in other Group I globins will be necessary to confirm a ligand-gating role for residue E15.

Hexacoordination and Heme Reactivity

Two Group I trHbs, S6803-trHbN and Synechococcus sp. PCC 7002 (S7002) trHbN, exhibit unique properties compared to the other members of the group. First, they are hexacoordinate hemoglobins and in their resting state use His F8 and His E10 as ligands to the iron atom (Couture et al. 2000; Lecomte et al. 2001; Scott et al. 2002). Exogenous ligand binding in place of His E10 results in a large conformational change in the E and B helices (Hoy et al. 2004; Vu et al. 2004a). TrHbNs show no general potential for bis-histidine hexacoordination. The only other subgroup 1 sequences containing His E10 are those of Nostoc sp. The characterization of Nostoc sp. trHbNs is incomplete, but early reports did suggest a propensity for hexacoordination (Thorsteinsson et al. 1996). Second, S6803-trHbN and S7002-trHbN are capable of forming a covalent linkage between the 2-vinyl group of the heme and the H16 histidine (Scott et al. 2002; Vu et al. 2002, 2004b). This posttranslational modification prevents heme loss and has the potential to modulate the reactivity of the heme group. Figure S3 (Supplementary Material) illustrates considerable variability at the nominal H16 position across the N globins. As the database of structures and chemical properties grows, it will become possible to ascertain compositional and reactivity trends at this position.

Oligomerization State

Cooperative binding of oxygen is a property of the majority of vertebrate and many invertebrate Hbs (Baldwin and Chothia 1979; Monod et al. 1965; Perutz 1990). The cyanobacterial globins described above remain monomeric in solution up to high (mM) protein concentrations and, therefore, display no cooperative ligand binding behavior. Mt-trHbN, in contrast, functions as a dimer and the molecular basis for the cooperativity has been attributed to conformational changes propagated in the B helix, the E helix, and the pre-F loop (Couture et al. 1999b; Yeh 2004). According to the X-ray structure of Mt-trHbN, the residues responsible for the dimerization interactions are Arg35 (B12) and a stretch of four residues referred to here as DD1-DD4 (Tyr72-Thr73-Gly74-Ala75, in the EF loop). DD1-DD4 form H-bonds to the Arg B12 side chain through their carbonyl groups. One of the heme propionates forms a H-bond with the backbone NH of Ala75, and the carbonyl group of this residue forms a H-bond with the hydroxyl group of Tyr72. This arrangement probably requires small side chains at the DD3 and DD4 positions. Figure S3 (Supplementary Material) shows that all sequences, except that from Gemmata obscuriglobus, have a charged residue at the B12 position, whereas residues DD1-DD4 exhibit some degree of conservation. S6803-trHbN has similar residues at these positions but does not form dimers. Accordingly, the residues at DD1-DD4 are only weak predictors of the oligomerization state of trHbNs.

Residues Characteristic of the TrHb Fold

The residues discussed above participate in globin function. Additional residues are important to the maintenance of the fold and its thermodynamic stability. Interhelical contacts have been reviewed elsewhere (Lecomte et al. 2005; Lesk and Chothia 1980). Here we focus on turn and loop residues of apparent structural importance. In the structure of Pc- and Ce-trHbNs, two pairs of glycines, one between the A and the B helices and one directly following the E helix, and a single glycine six residues downstream from the second Gly-Gly motif were reported to be essential for establishing the truncated globin fold (Pesce et al. 2000). Figure 2A shows the location of these glycines in the structure of Ce-trHbN. These structural glycines appear to be well conserved, especially in subgroup 1 (Fig. S3, Supplementary Material).

The last residue of interest is a strictly conserved aspartic acid between the B and the C helices. In the X-ray structure of Mt-trHbN (Milani et al. 2001), the carboxylate group of this aspartic acid (Asp39) forms a H-bond to the backbone NH of residue 41 and the side chain of His97 (G11). In S6803-trHbN, which does not contain His G11, the equivalent Asp28 forms the side-chain backbone H-bond to Arg30 (C3) and possibly a side-chain side-chain H-bond with the same residue. Every sequence in the Group I set has the potential for a side-chain side-chain H-bond except N. punctiforme subgroup 2 trHbN (Leu at C3 and Asp at G11). The side-chain backbone H-bond involving Asp28 is an N-capping interaction (Aurora and Rose 1998; Karpen et al. 1992; Presta and Rose 1988) that provides a strong start signal for the short C 3₁₀-helix. The conservation of this residue may be related to secondary and perhaps tertiary structure requirements associated with the shielding of the heme group from solvent.

Group II Proteins: O Globins

Figure S4 (Supplementary Material) shows the alignment of 57 Group II globins (4 of which are displayed in Fig. 1). This set is endowed with a much higher degree of conservation than Group I. Every helix (excluding the short A) has either a strictly conserved residue or multiple positions occupied by residues from a ClustalX strong group. To date, two metcyano trHbO crystal structures have been solved: that of M. tuberculosis (Mt-trHbO) (Milani et al. 2003) and that of Bacillus subtilis (Bs-trHbO) (Giangiacomo et al. 2005). In a feature specific to Group II globins, the region between the E and the F helix contains an additional eight-residue helix, called the Φ helix (Milani et al. 2003). Structures and the ligand binding data point to functionally important residues in the Group II globin fold.

Heme Cavity Residues

The most distinctive feature of the Group II globins is a variation on the trHbN H-bond network. According to amino acid replacement studies in Mt-trHbO, Tyr B10 and Tyr CD1 control O2 association and dissociation rates, Tyr B10 playing a minor role (Ouellet et al. 2003). The X-ray structure reveals that Tyr CD1, not B10, forms the H-bond to the cyanide ligand, with which Trp G8 also interacts. The same H-bond network is expected in the O globin from Mycobacterium leprae (Visca et al. 2002). In contrast, Bs-trHbO exhibits replacements at CD1 (Tyr→Phe) and E11 (Leu→Gln) (Giangiacomo et al. 2005), and the residue at CD1 can no longer form a H-bond to the ligand. Studies of Mt-trHbO variants suggest that when Tyr CD1 is replaced with a phenylalanine, Tyr B10 assumes the role of H-bond donor (Ouellet et al. 2003). This is the case in wild-type Bs-trHbO (Giangiacomo et al. 2005). The ability to rearrange structurally while stabilizing the bound ligand explains why other trHbOs do not require Tyr CD1 for this purpose. As for the E11 replacement, a glutamine at that position forms H-bonds to the ligand and Tyr B10 in Mt-trHbN (Giangiacomo et al. 2005). Finally, Trp G8 in both Mt-trHbO and Bs-trHbO is thought to contribute a weak stabilizing H-bond and to modulate the rate of ligand escape out of the pocket (Giangiacomo et al. 2005; Milani et al. 2003).

The above structural observations suggest that residue properties at CD1 and E11 are correlated. Figure S4 (Supplementary Material) shows that all O globins that have a Tyr at position CD1 have a nonpolar residue at the E11 position. In the absence of a tyrosine, either a phenylalanine (non-hydrogen bonding) or a histidine (hydrogen bonding) is found at CD1. Phe CD1 occurs in conjunction with a H-bond donor at the E11 position (Gln or Ser), whereas His CD1 is accompanied by a hydrophobic residue (Leu or Phe). This analysis offers four possible configurations for ligand interactions: Tyr CD1 and hydrophobic E11, His CD1 and hydrophobic E11, Phe CD1 and Gln E11, and Phe CD1 and Ser E11. Tyr B10 and Trp G8 are invariant in all cases. Thus, one of the necessary H-bonding elements involved in ligand stabilization can be located either at the CD1 position or the E11 position. It is interesting that, so far, no instance of hydrogen bonding residues has been found simultaneously at the E11 and CD1 positions.
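
A co-occurrence pattern of this kind can be checked by tallying residue pairs read from the two alignment columns. The sketch below is illustrative only: the residue pairs are invented rather than taken from Fig. S4, the function names are hypothetical, and the set of residues treated as hydrogen bonding (Tyr, His, Gln, Ser) simply follows the classification used in the paragraph above.

```python
from collections import Counter

# Residues treated as hydrogen bonding in this sketch (one-letter codes).
H_BOND_CAPABLE = {"Y", "H", "Q", "S"}

# Invented (CD1, E11) residue pairs, one per sequence; not data from Fig. S4.
pairs = [("Y", "L"), ("Y", "L"), ("H", "F"), ("F", "Q"), ("F", "S"), ("H", "L")]


def tally_pairs(residue_pairs):
    """Count how often each (CD1, E11) combination occurs."""
    return Counter(residue_pairs)


def both_h_bonding(residue_pairs, capable=H_BOND_CAPABLE):
    """Return the pairs in which CD1 and E11 are both hydrogen bonding."""
    return [
        (cd1, e11)
        for cd1, e11 in residue_pairs
        if cd1 in capable and e11 in capable
    ]


counts = tally_pairs(pairs)
violations = both_h_bonding(pairs)
```

An empty `violations` list corresponds to the observation that no sequence carries hydrogen bonding residues at both positions at once.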

Ligand Access to the Heme Group

The Group II globins do not have the tunnel detected in the Group I globin structures. However, the surface of the characterized Group II globins exhibits a shallow depression on the proximal side of the heme. This depression provides partial solvent access to the heme C-pyrrole and is mostly lined with hydrophobic residues (Milani et al. 2003), namely Phe E14, Tyr E18, Leu F4, Tyr H10, Ala H14, and Leu H18. Of these positions, E14, F4, and H18 are strongly conserved and H14 is systematically hydrophobic. The E18 position is occupied by a hydrophobic side chain in all sequences save plants, and the H10 position has a wide assortment of residues, including an arginine in Bs-trHbO. It has been suggested that this depression has functional significance, perhaps in redox chemistry (Milani et al. 2003). If this is the case, variations at the H10 position may serve to tune the electrostatic environment of the heme group. Whether or not redox chemistry occurs, the hydrophobic depression may serve as a docking site for a reaction partner.

Among other strongly conserved residues, Group II globins contain an arginine at F5 (80% occurrence) and one at F7 (95% occurrence). Plants account for the majority of deviations from the consensus at F5. In the X-ray structures, Arg F5 points toward solvent. However, Arg F7 is involved in a salt bridge with the heme propionate in Mt-trHbO and is H-bonded to the backbone of His 69 (a residue of the ΦF corner, located 7 positions prior to the proximal histidine) in Bs-trHbO (Giangiacomo et al. 2005; Milani et al. 2003). When forming a salt bridge with a propionate, it is likely that Arg F7 stabilizes the heme in its cavity and shields the proximal histidine from solvent (Milani et al. 2003). When interacting with His 69, Arg F7 may favor a fixed conformation and seal, along with E10, the aperture across the propionates (Giangiacomo et al. 2005). Conservation of Arg F7 may be linked to the positioning of the cofactor within the protein structure.

Residues Characteristic of the TrHb Fold

It appears that the majority of Group II globins contain the Gly-Gly motifs and the single conserved glycine in the EF loop noted in Group I globins. The Gly-Gly pair in the EF loop is strictly conserved, whereas in 15% of the sequences the pair between the A and B helices has a gap at one of the positions. Structural data show that the Gly-Gly motif between the A and B helices produces the same turn whether there is one Gly or two.

The residue at the equivalent position to the C helix N-cap in the Group I globins is not as well conserved in the Group II sequences. Both trHbO structures contain a H-bond acceptor and display the side-chain main-chain interaction (Asp29 to Val31 in Mt-trHbO and His31 to Leu33 in Bs-trHbO), but H-bond donors and hydrophobic residues also occur at the same position. It is possible that the sequences that lack a helix start signal at this position have a different local stability and heme shielding.

Group III Proteins: P Globins

Group III globins account for the largest proportion of highly conserved positions in trHbs (Fig. S5, Supplementary Material). Most come from Proteobacteria, but it is interesting that proteobacterial Group II globins show less conservation than Group III globins. Thus far, no member of Group III has been characterized functionally or structurally and a sequence analysis is by default based on the known properties of globins in the other two groups. With respect to the heme pocket positions discussed above, Group III globins resemble more closely Group II than Group I globins. The pronounced identity level within Group III raises the expectation that characterization of a limited number of such proteins will be helpful to derive a relationship between sequence and activity across the group and contribute new insight into the adaptability of the trHb fold.

An obvious feature of Group III globins is the absence of the Gly-Gly motifs reliably found in the other two groups. If indeed Group III globins adopt the trHb fold, these motifs appear dispensable. At this point, an alternative arrangement of helices, dictated by bulky residues in the loops between the A and B helices and the E and F helices, cannot be eliminated. For the rest of this discussion, however, it is assumed that interhelical contacts (Lesk and Chothia 1980) contribute the most to the robustness of the trHb fold and that Group III and Group II globins have similar structures. Given the significant differences in helical assignments already reported for the Group I and Group II globins (Giangiacomo et al. 2005), the helical assignments derived for Group III globins and used below are necessarily speculative.

Conservation of Tyr B10 in Group III globins suggests that this residue plays a role similar to that in other trHbs and establishes direct or indirect interactions with exogenous ligands. In addition to Tyr B10, a conserved tryptophan occurs at G8. This same residue modulates ligand affinity and discrimination in trHbOs. Group II globins have a distinct pocket with a H-bond network consisting of B10, G8, and either CD1 or E11. In Group III globins, the CD1 position is occupied invariably by a Phe. The absence of a H-bond donor at the CD1 position may be compensated for by a conserved Trp, which aligns with the E15 position in Bs-trHbO. The structure places this residue in direct contact with the heme, and Trp G8 could provide the additional ligand interaction satisfied in certain trHbOs by the E11 residue. A ligand-gating role for the residue at E15 adds another possible pressure for conservation. The presence of Tyr B10, Trp E15, and Trp G8 is anticipated to confer unique ligand binding properties to Group III globins.

The strict conservation of a histidine located 26 residues prior to His F8 is conspicuous. A similar separation of 24 residues is observed between His F8 and the distal histidine ligand (E10) in S6803-trHbN and S7002-trHbN. A shift of a couple of residues is plausible in the alignment of trHbN and trHbP given the variable EF loop. Thus, Group III globins are likely to contain a histidine on the distal side of the heme. This residue could either act as His E7 in vertebrate globins (stabilizing bound oxygen but not normally coordinating the iron) or act as His E10 in cyanobacterial hemoglobins (coordinating the iron in the absence of exogenous ligand). S6803-trHbN NMR (Falzone et al. 2002; Vu et al. 2004a) and X-ray (Hoy et al. 2004; Trent et al. 2004) data demonstrate that the state of ligation on the distal side is coupled to structural rearrangement of the protein, a phenomenon that may occur in the Group III globins as well.

Gene History

Table S1 in the Supplementary Material lists all of the trHbs considered for this study. It organizes the prokaryotes by bacterial division, groups simple eukaryotes together, and clusters the higher plants separately. With the reservation that any such analysis is, by definition, subject to revision as the databases grow, several traits can be considered. Only two organisms, Mycobacterium avium (an Actinobacterium) and Methylococcus capsulatus (a Proteobacterium), contain genes for a member of each of the three groups. Those two divisions account for all organisms that contain trHb genes from two different groups as well. Three Actinobacteria have trHb genes from Group I and Group II, and 13 Proteobacteria have trHb genes from Group II and Group III. The Proteobacterium Hyphomonas neptunium is thus far the only organism that contains a Group I and a Group III gene. The predominance of trHbs in these two divisions could imply that the gene originated in a last common ancestor (LCA) of Proteobacteria and Actinobacteria. Phylogenetic analysis of 16S rRNA shows the proteobacterial and actinobacterial node to be one of the most ancient ones in the bacterial tree (Schloss and Handelsman 2004).
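The tallies quoted above amount to counting group combinations in a presence table such as Table S1. A minimal Python sketch of that bookkeeping follows. It is not the procedure used in this study: the function names are hypothetical, and the table holds only the three organisms whose group content is stated explicitly in the text, so the counts it produces do not reproduce the full data set.

```python
from collections import Counter

# Only organisms whose trHb group content is stated in the text.
# The complete list is in Table S1 and is not reproduced here.
presence = {
    "Mycobacterium avium": {"I", "II", "III"},
    "Methylococcus capsulatus": {"I", "II", "III"},
    "Hyphomonas neptunium": {"I", "III"},
}


def tally_combinations(table):
    """Count organisms per combination of trHb groups, keyed as e.g. 'I+III'."""
    return Counter("+".join(sorted(groups)) for groups in table.values())


def share_with_group(table, group):
    """Among organisms carrying more than one group, return the fraction that include `group`."""
    multi = [groups for groups in table.values() if len(groups) > 1]
    if not multi:
        return 0.0
    return sum(group in groups for groups in multi) / len(multi)


combos = tally_combinations(presence)
for combo, count in sorted(combos.items()):
    print(combo, count)
```

Applied to the full table, `share_with_group(table, "II")` would quantify how often a Group II gene accompanies genes from other groups.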

The presence of a Group II globin gene in almost all instances of organisms having genes from more than one group leads to the conclusion that the Group II gene originated prior to the Group I and Group III genes. This view is supported by the branching pattern in Fig. S1 (Supplementary Material). It is likely that the Group I and Group III genes resulted from duplication of the Group II gene. Horizontal gene transfer (HGT) is thought to play a major if not dominant role in the evolution of species (Brown 2003), and, in fact, individual examination of the groups reveals many instances of this mechanism.

Figure 4 shows that there are two major branches in Group II, one dominated by Actinobacteria and Proteobacteria and the other dominated by plants and Firmicutes. This tree brings out two points. First, the LCA of Actinobacteria and Firmicutes occurred after the split from Proteobacteria (based on 16S rRNA; Schloss and Handelsman 2004), but it appears that the Group II globin genes of Actinobacteria and the Group II globin genes of Proteobacteria are more similar to each other than they are to those of Firmicutes. It should be noted that the bootstrap support for this branching pattern is very low, and that bacterial deep phylogeny is inherently inaccurate (Creevey et al. 2004). Additional sequences will therefore be necessary to confirm this interpretation. Second, all of the Firmicute sequences come from the Bacillales subdivision. These two pieces of evidence suggest that the second major branch of the Group II globins was most likely the result of a gene transfer event to the ancestor of Bacillales and subsequent transfer to the other represented divisions. In this view, the transfer originated from either Proteobacteria or Actinobacteria, and the gene evolved for its product to perform a specific function in the Bacillales. The plant sequences most likely arose from an HGT promoted by infection or a symbiosis event. Many Proteobacteria are plant pathogens, and it is possible that the gene came from the same ancestral sequence as was transferred to Bacillales. The genome of Phaeodactylum tricornutum, a diatom, contains a trHbO gene (Scala et al. 2002). The sequence is an outlier to the plant branch in Fig. 4. P. tricornutum trHbO lacks the first 44 residues that are characteristic of all higher plant trHbOs (Fig. S4, Supplementary Material). However, beyond the leader sequence, the globin domains of the plants and the diatom show a high level of conservation, and the branching is supported by a bootstrap value of 99% (results not shown).

Figure 4

Minimum evolution tree of 57 Group II globin sequences. Phyla are listed next to the prokaryotic sequences and plants are listed separately. FAPs: filamentous anoxygenic phototrophs.

The current Group III globin tree (Fig. 5) shows that this lineage originated within a proteobacterial ancestor and perhaps was transferred to M. avium and Desulfitobacterium hafniense horizontally. Genomes that contain a Group III globin gene typically contain a Group II globin gene as well. Table S1 (Supplementary Material) shows that only seven organisms contain solely a Group III globin gene. Thus, it appears that a gene duplication event occurred in an ancestral Proteobacterium and for the most part both Group II and Group III genes were retained. In the seven cases above, the organism most likely lost the need for the Group II globin. Retention of the Group II globin gene is consistent with the view that the individual trHb groups have nonidentical functions.

Figure 5

Minimum evolution tree of 24 Group III globin sequences. Phyla are listed on the right side of the organism name.

The Group I tree (Fig. 3) is more difficult to interpret because it contains two subgroups. The Group I gene apparently also originated in Proteobacteria, likely before the emergence of Group III globins. It can be proposed that when Proteobacteria and Cyanobacteria split, Cyanobacteria retained the Group I globin and discarded the Group II globin. Alternatively, certain Cyanobacteria such as Synechocystis sp. and Synechococcus sp. could have obtained the gene by HGT from Proteobacteria, as they are prone to such transfers (Porter 1986). As discussed below, no Proteobacteria have both a Group I and a Group II globin gene, and the two genes must have been in competition in their early history. The ancestor of Mycobacterium apparently obtained its Group I globin gene from HGT, after its Group II globin had diverged sufficiently from the incoming Group I globin to perform two necessary and different functions in that organism. The above scenario makes the transfer of the Group I globins to photosynthetic eukaryotes more plausible as Cyanobacteria could have transferred the gene to Chlamydomonas at the time of chloroplast generation. The genome of the diatom P. tricornutum contains a Group I globin gene most closely related to that of Methylococcus capsulatus and Synechococcus sp. (Scala et al. 2002), in agreement with current views on the origin of diatoms (Nisbet et al. 2004).

The common origin of the trHbs in Proteobacteria prompted the construction and analysis of the trHb proteobacterial tree (Fig. 6). The most interesting aspect of this tree is the close branching relationship exhibited by the Group I and Group II globins, which contrasts with the proximity of the Group I and Group III globins in the overall tree (Fig. S1, Supplementary Material). This is consistent with the above analysis because Group I and Group II globins are expected to resemble each other more closely if they outcompeted one another in various Proteobacteria. The proteobacterial tree also indicates that the Group II and Group III globins originated before the split of α- and β-Proteobacteria, as those subdivisions represent all but one of the members that contain both genes. Finally, Group I globins seem to have originated in or have been preferentially retained by the γ subdivision of Proteobacteria. The above gene history scenario is summarized in Fig. 7.

Figure 6

Minimum evolution tree of proteobacterial globin sequences. The bacterial subdivision is listed next to each sequence.

Figure 7

Hypothetical path for the major events in Group I, II, and III truncated globin gene history. The genes for the three groups are referred to as glbN, glbO, and glbP, respectively. Dotted lines represent horizontal gene transfer. LCA, last common ancestor; Proteo, Proteobacteria; Actino, Actinobacteria; Myco, Mycobacteria; Cyano, Cyanobacteria; UPE, unicellular photosynthetic eukaryotes.

Conclusions

Heme proteins are the products of some of the most ancient genes in the tree of life. Current thinking attributes oxygen scavenging or redox properties to the ancestral Hb (Bolognesi et al. 1997; Hardison 1996; Moens et al. 1996), but details of the structure and ligand binding characteristics of this ancestral Hb are, of course, unknown. The updated phylogenetic analysis places the origin of the trHb genetic history in Proteobacteria, one of the most ancient bacterial lineages (Hedges 2002). The nearly exclusive presence of Group III globin genes in α- and β-Proteobacteria leads to the conclusion that these genes arose through duplication of the Group II globin gene prior to the α/β subdivision. The Group I globin gene most likely appeared before the Group III globin gene; in this view, the Group I globin rendered the Group II globin obsolete or became obsolete itself in Proteobacteria that had both. Among Actinobacteria, Mycobacterium data show that a different scenario can lead to the concomitant use of Group I and Group II globins for distinct functions.

It is clear that the trHb gene has undergone numerous evolutionary steps to form at least two functionally distinct groups, I and II, which use an identical cofactor and have a similar fold. These two groups of proteins have specific and subtle methods of modulating ligand affinity, a cornerstone of trHb activity. The functional diversification has extended further in Group I, resulting in two subgroups. The alignment data also lead to the expectation that Group II and III globins display more consistent functional features than their Group I counterparts. However, they also suggest that Group III globins will offer yet a different array of properties. Experimental data on these globins will be essential for refining the gene history and the appearance of multiple groups within one organism.

Supplementary Material

The supplementary material contains a list of trHb sequences used in this work (Table S1), a minimum evolution tree of 105 trHb sequences (Fig. S1), the amino acid alignment of these 105 trHbs (Fig. S2), and the amino acid alignment of 24 Group I globins (Fig. S3), 57 Group II globins (Fig. S4), and 24 Group III globins (Fig. S5).