Introduction

Examination of the prokaryotic branches of unrooted rRNA trees led some years ago to the suggestion that the ancestors of Bacteria were extreme thermophiles (Achenbach-Richter et al. 1987). This possibility gained support when the rooting of the rRNA-based universal tree of life showed that its deepest, oldest branches are occupied by thermophilic and hyperthermophilic prokaryotes (Stetter 1994, 2006). Within the Bacteria the earliest-branching organisms are the Aquificales and Thermotogales, while among the Archaea the deepest and shortest branches, i.e., the slowest-evolving clades, correspond to the Nanoarchaeota, Pyrodictiaceae, and Methanopyraceae (Stetter 2006). The position and length of the branches of thermophiles and hyperthermophiles [hereinafter, thermophiles, i.e., all prokaryotes with optimal growth temperatures above 50°C (Beeby et al. 2005)] in rRNA trees lent strong support to the idea that the last common ancestor (LCA) of all extant living beings was a thermophile (Stetter 1994, 2006; Woese et al. 1990). This hypothesis has been extrapolated to the origin of life itself, which some authors assume took place in high-temperature geological settings (Barion et al. 2007; Di Giulio 2003a, b; Pace 1991; Stetter 1994; Wächtershäuser 2006).

However, alternative possibilities have been suggested (Galtier et al. 1999; Gogarten-Boekels et al. 1994; Miller and Lazcano 1995; Sleep et al. 1989). Several thermophilic sequences are displaced from their basal position in universal trees if molecular markers other than elongation factors or ATPase subunits are employed (Forterre 1996; Forterre et al. 1993; Klenk et al. 1994), or if alternative phylogeny-building methods are used (Brochier and Philippe 2002). That is, other molecular trees raise the possibility that the LCA was not a thermophile. The debate is further complicated by the fact that recent large-scale genomic comparisons have failed to reach a consensus on the root of the tree (Lake et al. 2007; Skophammer et al. 2006; Zhaxybayeva et al. 2005). Indeed, it has been argued that thermophiles are not primitive but the result of secondary adaptations (Forterre 1996; Miller and Lazcano 1995), and that the earliest-branching bacterial species at the base of the rRNA tree exhibit thermophilic characteristics because of lateral transfer of sequences encoding reverse gyrase and other thermoadaptive traits from archaeal extremophiles (Forterre et al. 2000).

Protein disulfide oxidoreductases (PDOs) appear to be uniquely suited to testing the hypothesis that the LCA flourished in a high-temperature extreme environment. PDOs are redox enzymes of approximately 230 amino acids involved in dithiol–disulfide exchange reactions (Pedone et al. 2004; Ren et al. 1998). The PDO superfamily includes a widely distributed set of enzymes: thioredoxin (Trx) and glutaredoxin, which take part in the reduction of disulfides, as well as the eukaryotic protein disulfide isomerase (PDI) and the disulfide bond-forming (DsbA) families, which are involved in the formation of disulfide bridges during protein folding (Ladenstein and Ren 2006).

Computational analysis of genomic data indicates that PDO may play a major role in some thermophilic prokaryotes by participating in the formation or stabilization of intracellular protein disulfide bonds (Beeby et al. 2005; Ladenstein and Ren 2006). High-resolution structural analysis of a PDO isolated from Pyrococcus furiosus has shown that this protein consists of two thioredoxin-fold units arranged in tandem, each of which carries a CXXC catalytic-site motif. The two CXXC motifs have somewhat different redox properties (Ladenstein and Ren 2006; Ren et al. 1998). Characterization of wild-type and mutant PDOs from P. furiosus has shown that both CXXC motifs are essential for isomerase activity and that the enzyme’s reductive/oxidative properties depend on the CXXC motif located in the carboxy-terminal half of the protein (Pedone et al. 2004). Although the primary sequences of the PDO thioredoxin units are not highly conserved, three-dimensional structure analysis indicates that they are the outcome of an ancient gene duplication (Ren et al. 1998) and that the resulting sequences are homologous to the eukaryotic PDI (Tian et al. 2006) and to the N-terminal redox-active disulfide domain of the bacterial alkyl hydroperoxide reductase (AhpF) (Wood et al. 2001).

As discussed here, the growing number of PDO sequences and their homologs in completely sequenced cellular genomes allows us to draw a congruent picture of the early evolution of PDO and to provide some insights into whether the LCA was a thermophile. It has been suggested that the PDO-encoding sequences found in thermophilic bacteria have an archaeal origin (Pedone et al. 2004). The results presented here support the possibility that the duplication that led to PDOs with a double redox site (two CXXC motifs) first took place in the crenarchaeota, and that PDO spread from them into the Bacteria by horizontal gene transfer via the euryarchaeota. These conclusions imply that the LCA lacked PDO-encoding sequences.

Material and Methods

The PDO (PF0094) sequence from Pyrococcus furiosus was downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/) (Kanehisa and Goto 2000). Homologs of this sequence were searched for with BLAST (Altschul et al. 1997) in a database of 456 completely sequenced genomes from the three domains of life, available as of October 2006. Additional searches were performed with the same BLAST tool in the nonredundant database of the NCBI (National Institutes of Health, USA). The sequences are shown in Fig. 1. To identify the homologous proteins with a double redox site (two CXXC motifs), the best hits were aligned using Clustal_X (Thompson et al. 1997), and the alignment was edited manually with the BioEdit software (Hall 1999). To confirm that the two thioredoxin folds present in PDO are homologous to the duplicated regions found in AhpF and PDI, the first and second domains of PDO were compared using BLAST (e-value cutoff 0.001) with the corresponding portions of the complete AhpF and PDI sequences. Neighbor-joining and minimum evolution trees were calculated for all available PDO sequences with MEGA3 version 3.1 (Kumar et al. 2001) (Fig. 2A). Although the two PDO thioredoxin-fold units exhibit considerable divergence, they still retain significant similarity in their primary structure. Their aligned sequences were therefore used to calculate trees in which, within the limits of resolution, the clades of the paralogous first and second thioredoxin-like fold units are identical (Fig. 2B). We also compared the PDO sequences with the available AhpF dataset to calculate minimum evolution trees using Poisson-corrected distances (Fig. 3) and gamma distances that take site rate variation into account (alpha parameter 2.01) (not shown). These trees all have essentially the same topology. The alpha parameter was estimated using TREE-PUZZLE (Schmidt et al. 2002).
Five hundred bootstrap replications were performed for each of these three trees.
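The Poisson and gamma corrections mentioned above convert the observed proportion of differing amino acid sites, p, into an estimate of substitutions per site, using the standard formulas d = -ln(1 - p) and d = a[(1 - p)^(-1/a) - 1], where a is the gamma shape parameter (2.01 here). The following is a minimal sketch of those formulas, not MEGA3's code; the function names and the pairwise skipping of gapped sites are assumptions made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Poisson-corrected amino acid distance: d = -ln(1 - p)."""
    return -math.log(1.0 - p)


def gamma_distance(p: float, alpha: float) -> float:
    """Gamma-corrected distance: d = alpha * ((1 - p) ** (-1 / alpha) - 1)."""
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


# Hypothetical toy fragments, not sequences from this study.
p = p_distance("MKV-LCATCG", "MKIALCSTCG")
print(p, poisson_distance(p), gamma_distance(p, alpha=2.01))
```

With a = 2.01 the gamma distance exceeds the Poisson distance, which in turn exceeds p; as a grows very large the gamma correction converges to the Poisson one, i.e., to the case with no rate variation among sites.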

Fig. 1

Multiple sequence alignment of PDO proteins

Fig. 2

A Minimum evolution PDO phylogeny, 500 bootstrap replications (260 amino acid sites and 20 sequences). The root is placed at the midpoint. B Minimum evolution PDO-amino (species-1) and PDO-carboxy (species-2) phylogeny, 500 bootstrap replications (130 amino acid sites and 42 sequences)

Fig. 3

Minimum evolution tree of PDOs and the N-terminal domain of alkyl hydroperoxide reductases (AhpF). PDO enzymes are derived from a fusion of two Trx-like domains (F1), perhaps in the crenarchaeota. The root (R) is placed between the AhpF and PDO portions of the tree. PDOs have fused at least three times to a FAD/NAD(P)H reductase domain found in alkyl hydroperoxide reductases (F2, F3, and F4). AhpFs have lost the first CXXC motif of the PDO domain, as indicated by the absence of stars

The trees shown in Fig. 2A and B were rooted at the midpoint calculated with the MEGA3 software. The tree shown in Fig. 3 was rooted visually between the sets of PDO and AhpF sequences under comparison. Maximum-likelihood analyses of these data were also performed using the PHYML program (Felsenstein 1989); in all cases the resulting trees had the same overall topology (not shown). The 3D structure classification of PDO, AhpF, and PDI was based on the Structural Classification of Proteins (SCOP) database (Murzin et al. 1995).
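Midpoint rooting places the root halfway along the longest leaf-to-leaf path of an unrooted tree. The sketch below illustrates that criterion on a small, hypothetical tree stored as an adjacency list with branch lengths; the data structure and function names are assumptions, and MEGA3's own implementation may differ in details such as tie-breaking between equally long paths.

```python
from typing import Dict, List, Optional, Tuple

# Unrooted tree: node name -> list of (neighbour, branch length); leaves have degree 1.
Tree = Dict[str, List[Tuple[str, float]]]


def distances_from(tree: Tree, start: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Path length and parent pointer from `start` to every node (trees have no cycles)."""
    dist: Dict[str, float] = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint_root(tree: Tree) -> Tuple[str, str, float]:
    """Return (u, v, offset): the root lies on branch u-v, `offset` units from u."""
    leaves = [n for n, edges in tree.items() if len(edges) == 1]
    if len(leaves) < 2:
        raise ValueError("need at least two leaves")
    # 1. Find the pair of leaves separated by the longest path (first pair wins ties).
    longest, end_a, end_b = -1.0, leaves[0], leaves[1]
    for a in leaves:
        dist, _ = distances_from(tree, a)
        for b in leaves:
            if b != a and dist[b] > longest:
                longest, end_a, end_b = dist[b], a, b
    if longest <= 0:
        raise ValueError("longest leaf-to-leaf path has zero length")
    # 2. Walk from end_b back toward end_a until the next step would pass the midpoint.
    #    Since dist[end_a] == 0 < half, the walk never reaches end_a itself.
    dist, parent = distances_from(tree, end_a)
    half = longest / 2.0
    node = end_b
    while dist[parent[node]] >= half:
        node = parent[node]
    up = parent[node]
    return up, node, half - dist[up]


# Hypothetical toy tree with internal nodes X and Y.
toy: Tree = {
    "A": [("X", 1.0)],
    "B": [("X", 1.0)],
    "C": [("Y", 1.0)],
    "D": [("Y", 5.0)],
    "X": [("A", 1.0), ("B", 1.0), ("Y", 1.0)],
    "Y": [("X", 1.0), ("C", 1.0), ("D", 5.0)],
}
print(midpoint_root(toy))  # ('Y', 'D', 1.5)
```

In the toy tree the longest path runs from A to D (length 7), so the root falls 3.5 units from A, which is 1.5 units along the Y-D branch. Because the method relies only on branch lengths, an unusually long branch (such as a fast-evolving lineage) can pull the root toward itself.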

Results

A search for homologs of the PDO (PF0094) sequence from Pyrococcus furiosus with a double redox site (two CXXC motifs) has extended and confirmed previous reports of their phylogenetic distribution (Beeby et al. 2005) (Fig. 1). Analysis of the completely sequenced prokaryotic genomes available as of October 2006 shows that PDO homologs are neither universally nor randomly distributed but are present only in the following thermophilic organisms: (a) euryarchaeota: Pyrococcus furiosus DSM 3638, Pyrococcus horikoshii OT3, Pyrococcus abyssi GE5, Thermococcus kodakarensis KOD1, Thermoplasma acidophilum, Thermoplasma volcanium, Picrophilus torridus DSM 9790, Ferroplasma acidarmanus Fer1; (b) crenarchaeota: Aeropyrum pernix K1, Sulfolobus tokodaii str. 7, Sulfolobus solfataricus P2, Sulfolobus acidocaldarius DSM 639, Pyrobaculum aerophilum str. IM2; and (c) Bacteria: Aquifex aeolicus VF5, Thermotoga maritima MSB8, Carboxydothermus hydrogenoformans Z-2901, Thermus thermophilus HB27, Thermus thermophilus HB8, and Thermoanaerobacter tengcongensis. The PDO from T. tengcongensis, which Beeby et al. (2005) included in their analysis, has both CXXC sites but an e value of 0.022, which does not meet the BLAST cutoff of e = 0.0001 used here (see Material and Methods). The PDO sequences of two bacterial species, Halothermothrix orenii H 168 and Chloroflexus aurantiacus, were retrieved from the nonredundant database (Fig. 1). We also included in our dataset a 239-amino-acid PDO homolog found in Nanoarchaeum equitans, although it lacks the CXXC motif in the amino-terminal portion of its sequence.
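The two inclusion criteria described above, a sufficiently small BLAST e value and a CXXC motif in each thioredoxin-like half, can be expressed as a simple filter. In this sketch the boundary between the halves defaults to the sequence midpoint, which is only a crude stand-in for domain boundaries read from the alignment; the function names, the midpoint split, and the toy sequences are hypothetical. The default cutoff mirrors the e = 0.0001 value quoted above.

```python
import re
from typing import List, Optional, Tuple

# Lookahead so that overlapping motifs (e.g. in "CCPCC") are all reported.
_CXXC = re.compile(r"(?=C..C)")


def cxxc_positions(seq: str) -> List[int]:
    """0-based start positions of every CXXC motif in an ungapped protein sequence."""
    return [m.start() for m in _CXXC.finditer(seq.upper())]


def redox_sites_by_half(seq: str, boundary: Optional[int] = None) -> Tuple[bool, bool]:
    """(amino-half has CXXC, carboxy-half has CXXC).

    Assumption: the halves are split at `boundary`, defaulting to the midpoint.
    A motif straddling the boundary is counted in neither half.
    """
    if boundary is None:
        boundary = len(seq) // 2
    starts = cxxc_positions(seq)
    amino_site = any(s + 4 <= boundary for s in starts)  # motif wholly in amino half
    carboxy_site = any(s >= boundary for s in starts)    # motif wholly in carboxy half
    return amino_site, carboxy_site


def is_double_cxxc_candidate(seq: str, evalue: float, cutoff: float = 1e-4) -> bool:
    """True if the hit passes the e-value cutoff and has a CXXC motif in both halves."""
    if evalue > cutoff:
        return False
    amino_site, carboxy_site = redox_sites_by_half(seq)
    return amino_site and carboxy_site


# Hypothetical toy sequences, not real PDOs.
both_sites = "MACPHCKL" + "A" * 20 + "GGCGACTV"
amino_site_lost = "MASPHSKL" + "A" * 20 + "GGCGACTV"
print(is_double_cxxc_candidate(both_sites, 1e-30))       # True
print(is_double_cxxc_candidate(both_sites, 0.022))       # False: fails the e-value cutoff
print(is_double_cxxc_candidate(amino_site_lost, 1e-30))  # False: no amino-terminal CXXC
```

The last two calls mirror, in toy form, the two exclusion cases in the text: a homolog with both CXXC sites but an e value of 0.022, and a homolog that lacks the amino-terminal motif.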

A neighbor-joining tree (500 bootstrap replications) including all the available PDO sequences is shown in Fig. 2A. This tree excludes a number of shorter PDO paralogs found in the crenarchaeotal species S. acidocaldarius (183 aa), S. tokodaii (182 aa), S. solfataricus (171 aa), and A. pernix (200 aa), and in the bacterium T. tengcongensis (188 aa), which lack the CXXC motif in the amino-terminal half of their sequences. With the exception of the node connecting the crenarchaeotal and euryarchaeotal species included in this study, all other deep nodes have very low bootstrap values. These low values are consistent with previous observations of relatively low sequence conservation in PDO (Ladenstein and Ren 2006). Nonetheless, the branching order of all the archaeal species in this tree is the same as in canonical rRNA phylogenies (not shown), and the most recent speciation events depicted in Fig. 2A have good bootstrap support.
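Bootstrap values such as those discussed above come from resampling alignment columns with replacement, rebuilding a tree from each pseudo-replicate, and counting how often a given clade reappears. The sketch below shows the standard nonparametric resampling step; the tree-building step is abstracted into a hypothetical callback (`clades_of_tree`) because the actual distance and neighbor-joining machinery used here was MEGA3's, which this sketch does not reproduce. The default of 500 replicates matches the setting used in this study.

```python
import random
from typing import Callable, Dict, FrozenSet, Set

Alignment = Dict[str, str]  # taxon name -> aligned sequence
Clade = FrozenSet[str]      # set of taxon names forming one clade


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement; every taxon gets the same columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: Alignment,
    clade: Clade,
    clades_of_tree: Callable[[Alignment], Set[Clade]],
    replicates: int = 500,
    seed: int = 0,
) -> float:
    """Percentage of pseudo-replicates whose inferred tree contains `clade`.

    `clades_of_tree` is a hypothetical stand-in for the full tree-building step
    (e.g. distance calculation plus neighbor joining) returning the tree's clades.
    """
    rng = random.Random(seed)
    hits = sum(
        clade in clades_of_tree(bootstrap_replicate(alignment, rng))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

Shorter alignments give each replicate fewer informative columns to resample, so support for deep nodes tends to fall, which is one reason half-length sequences yield lower bootstrap values.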

In the PDO tree the thermophilic bacteria T. maritima and A. aeolicus are closely connected (albeit with a low bootstrap value) and branch from the euryarchaeota, suggesting an early lateral gene transfer event (Fig. 2A). The same is true of the other bacterial species in our sample, which branch together in a clade in which (a) the two T. thermophilus strains group with the green nonsulfur bacterium C. aurantiacus, and (b) the two firmicutes C. hydrogenoformans and H. orenii branch together. However, this clade does not include T. tengcongensis, the third firmicute in our analysis. The PDO sequence of N. equitans has not been included in the tree shown in Fig. 2A.

Since structural data indicate that PDO is the outcome of an ancient internal duplication, we compared the set of first thioredoxin-like fold units with the set of second ones (Fig. 2B). The result is a tree formed by two well-defined groups of paralogous sequences, each with essentially the same branching order as in Fig. 2A and each also rooted in the crenarchaeota. This is consistent with the hypothesis that the presence of PDOs in bacterial thermophiles can be explained by lateral gene transfer events. The low bootstrap values in this tree result from comparing sequences half the length of those used to construct the phylogeny in Fig. 2A. The tree in Fig. 2B has two unusual features: (1) the long branch of the first half of the N. equitans PDO (N. equitans-1), whose lack of the active-site CXXC motif (Fig. 1) can be understood as a secondary loss due to its higher rate of change; and (2) the peculiar grouping of the two halves of the T. tengcongensis PDO (T. tengcongensis-1 and T. tengcongensis-2), which branch together (bootstrap value 100) instead of with their respective paralogous homologs.

The bacterial PDO homologs alkyl hydroperoxide reductase (AhpF) and disulfide bond-forming protein (DsbA), as well as the eukaryotic protein disulfide isomerase (PDI), have a broad phylogenetic distribution (Tian et al. 2006; Wood et al. 2001). The overall structure of the PDO and AhpF sequences is the same in all the organisms in which they have been compared (http://www.bacteria.fciencias.unam.mx/PDO/PDO.html). The finding that each half of the PDO sequences is more similar to the corresponding portion of AhpF than to the other PDO half confirms the hypothesis that the latter genes arose by duplication of the entire PDO. The phylogeny of AhpF and PDO sequences is depicted in Fig. 3. Because of their highly divergent primary structure (Ren et al. 1998), the DsbA sequences, which are also homologous to PDO and AhpF, were not included in this tree.

Discussion and Conclusions

Evidence for a PDO in the LCA would have provided strong support for the idea that the LCA was a thermophile. However, the phylogenetic analysis of the PDO sequences presented here shows that the root of their phylogeny, defined by the ancient internal duplication, lies within the crenarchaeota, and that PDO spread from them into the bacterial domain through horizontal gene transfer (HGT) events via the euryarchaeota. It could be argued that the position of the euryarchaeal species N. equitans in Fig. 2A and B indicates that the internal duplication that gave rise to PDOs first took place between the Crenarchaea and the Euryarchaea. However, N. equitans is a highly derived parasitic archaeum (Brochier et al. 2005; Makarova and Koonin 2005), and the basal position of its sequences may be an artifact. Indeed, the rooted phylogeny shown in Fig. 3 demonstrates that the ancestral duplication that gave rise to PDOs is firmly located in the crenarchaeota.

The trees shown in Fig. 2 suggest that one of these HGTs took place prior to the early branching of the Aquificales and Thermotogales. The results presented here support previous suggestions on the archaeal origin of bacterial PDO (Pedone et al. 2004) and are consistent with the other lines of evidence showing that significant gene exchanges took place between archaea and thermophilic bacteria (Forterre et al. 2000; Gogarten and Townsend 2005; Makarova and Koonin 2003; Nelson et al. 1999). Our findings argue, but only weakly, against a thermophilic LCA. The LCA could have been a thermophile that did not use S–S bonds to stabilize its proteins, or it could have been a mesophile.

It is difficult to explain the position of the firmicute T. tengcongensis in the tree shown in Fig. 2B. The grouping of its two thioredoxin folds in the same branch may be due either to a high rate of molecular evolution that obscures their phylogenetic signal or to a recent, independent gene duplication event. By contrast, the position of N. equitans in the PDO phylogeny shown in Fig. 2B is consistent with the possibility that it is a highly derived archaeum related to the Thermococcales (Brochier et al. 2005) that has undergone many modifications as part of its adaptation to a parasitic lifestyle (Brochier et al. 2005; Makarova and Koonin 2005; for an alternative view, see Di Giulio 2006). This explanation may also account for the length of the N. equitans branch in the upper half of the tree in Fig. 2B, which corresponds to its first thioredoxin fold and lacks the CXXC redox motif (Fig. 1). The current data also suggest that independent losses of this motif have taken place in the first halves of the C. aurantiacus and P. aerophilum PDO sequences. Because PDO isomerase activity requires both CXXC motifs, whereas its reductive/oxidative properties depend on the CXXC site located in the carboxy-terminal half of the protein (Pedone et al. 2004), retention of only the carboxy-terminal site suggests that in N. equitans, C. aurantiacus, and P. aerophilum the PDO homologs take part only in reductive/oxidative processes.

Low bootstrap values for the earliest branching events and the evidence of lateral gene transfer shown in Figs. 2 and 3 suggest that PDOs and their homologs are not good evolutionary markers for studying deep phylogenies. However, they provide important insights into early stages of cell evolution. A comparison of PDOs with the N-terminal half of bacterial AhpF indicates that the amino-terminal and carboxy-terminal domains of PDO resemble the corresponding domains of AhpF more closely than the two domains of either protein resemble each other. This finding confirms the hypothesis that AhpF arose by a duplication of the entire PDO. Biochemical evidence for the protective role of AhpF against oxidative damage (Chuang et al. 2006) suggests that it evolved after free oxygen began to accumulate in the Precambrian environment. Although AhpF sequences are widely distributed among the Bacteria, their absence from the ɛ- and most α-proteobacteria, as well as from actinobacteria, bacteroids, cyanobacteria, spirochetes, and green sulfur bacteria, may indicate that alternative mechanisms protect against oxidation in these groups. The lack of AhpF genes in the sequenced chlamydial genomes, however, is best explained as the outcome of secondary events related to their parasitic lifestyle.

The available evidence allows for a congruent picture of the evolution of PDO. Evidence for a PDO in the LCA would have reinforced the idea that it was a thermophile; our results suggest, however, that the LCA lacked PDO. As summarized in Fig. 4, the evidence suggests that although the LCA may not have been a heat-loving entity, it was already endowed with a Trx-like protein with a CXXC motif involved in oxidation-reduction processes (such as the synthesis of deoxyribonucleotides), which was inherited vertically by the Archaea and the Bacteria. A gene duplication event in the crenarchaeota led to the emergence of an ancestral PDO. The gene encoding this ancestral PDO was passed on vertically to the euryarchaeota and then transferred from them into the Bacteria by horizontal gene transfer. This hypothesis, which is consistent with the phylogenies shown in Figs. 2 and 3, implies that additional duplication events led to the bacterial AhpF and, in the Eucarya, to the PDI protein.

Fig. 4

The gene duplication event that originated PDO enzymes is hypothesized to have taken place not in the LCA but later, in a hyperthermophilic archaeal lineage. Phylogenetic analysis suggests that the PDO gene spread from the euryarchaeota into the Bacteria via horizontal gene transfer. Additional duplications led to the N-terminal domain of the AhpF enzymes and, eventually, to the eucaryal PDI.