Introduction

Thiolase is a key enzymatic activity present in the three domains of life. There are two main classes of thiolases: (i) acyl-CoA:acetyl-CoA C-acyltransferase (EC 2.3.1.16), also known as 3-oxoacyl-CoA thiolase or thiolase I; and (ii) acetyl-CoA:acetyl-CoA C-acetyltransferase (EC 2.3.1.9), also known as acetoacetyl-CoA thiolase or thiolase II. Although both types of thiolase catalyze reversible reactions, thiolase I is involved in catabolic processes, whereas thiolase II usually shows an anabolic function, albeit sometimes acting catabolically. Thiolase I has a broad chain-length specificity for its substrate (from 4 to 22 carbons) and catalyzes the thiolytic removal of an acetyl group from a 3-oxoacyl-CoA, as in the last step of every catalytic cycle of fatty acid β-oxidation (Yang et al. 1990). In contrast, thiolase II is specific for C4 chains and catalyzes the nondecarboxylating Claisen condensation of two acetyl-CoA molecules to form acetoacetyl-CoA (Heath 2002). This is the first enzymatic step in many anabolic processes such as the biosynthesis of eukaryotic ketone bodies and sterols and the synthesis of poly(3-hydroxybutyric acid), a major energy and carbon storage molecule in many bacteria (Kadouri et al. 2002). Thiolase II can also act in the last thiolytic cleavage of the fatty acid β-oxidation spiral. Both classes of thiolases share significant sequence similarity (Igual et al. 1992), a common crystallographic fold, and possibly the same reaction mechanism with an acetylated cysteine as a covalent intermediate (Modis and Wierenga 2000). Altogether, this indicates that both anabolic and catabolic thiolases share a common evolutionary origin.

Different classes of thiolases have been biochemically characterized, and some of their primary sequences determined. In eukaryotes, the diverse thiolases are located in different cell compartments. Thiolase I is present in animal mitochondria and in the peroxisomes of every eukaryote examined. Thiolase II has been found in animal mitochondria (involved in the metabolism of ketone bodies and isoleucine), the peroxisome/glyoxysome of fungi and plants (where it serves to complete the degradation of fatty acids), and the cytosol of animals and fungi (with a well-characterized anabolic function in the biosynthesis of ketone bodies and sterols). Some eukaryotic cells contain an additional peroxisomal thiolase I specific for long- and branched-chain fatty acid β-oxidation, known as SCP-X. In some cases, this is a fusion protein between a thiolase and a sterol carrier protein (Stolowich et al. 2002). One evolutionary trend observed in the fatty acid β-oxidation pathway is the formation of multifunctional complexes that allow substrate channeling. In bacteria, the catabolic thiolases are 40- to 46-kDa monofunctional proteins that constitute the β subunit of the multifunctional complex (Kunau et al. 1995). The molecular association of the other β-oxidation pathway activities involved (3-hydroxyacyl-CoA dehydrogenase, 2-enoyl-CoA hydratase, and epimerase) ranges from monofunctional subunits, such as the recently described complex of Euglena gracilis (Winkler et al. 2003), to fused genes, such as those encoding the multifunctional proteins of bacteria, mitochondria, and peroxisomes (Kunau et al. 1995). Purified peroxisomal catabolic thiolases are homodimers, whereas the mitochondrial ones are homotetramers.

Taking advantage of the increasing number of complete genome sequences, and since the diverse thiolases have a common origin, it is now possible to carry out comparative phylogenetic analyses of the available genes in order to elucidate the distribution and evolutionary origin of the thiolase isoenzymes from the different eukaryotic cell compartments. We show here that eukaryotic thiolases I and II form well-defined phylogenetic clusters that in general correspond to isoenzymes with a specific cell location. We also observe exceptions showing that, during eukaryotic evolution, some thiolase genes duplicated, gaining or losing diverse targeting sequences, which resulted in changes in their subcellular location, and even in functional shifts between anabolic and catabolic modes of action. The phylogenetic analysis of thiolases I and II suggests that all eukaryotic enzymes derive from at least two different types of proteobacteria and that they were possibly acquired very early during eukaryotic evolution.

Materials and Methods

Sequence Retrieval and Alignment

Sequences were retrieved from GenBank (www.ncbi.nlm.nih.gov) after identification by BLAST (Altschul et al. 1990). A file with all the protein sequences retrieved (934 sequences) was constructed after successive searches using different sequence queries (NCBI accession numbers): Escherichia coli FadA (AAC76848), Sulfolobus solfataricus Acab-1 (AAK40853), Homo sapiens mitochondrial 3-oxoacyl thiolase (BAA03800), Rattus norvegicus peroxisomal thiolase I (BAA14106), Xenopus laevis cytosolic thiolase II (AAD34967), Arabidopsis thaliana thiolase II (NP_199583) and thiolase I (NP_171965), Bacillus subtilis YusK (BG14023), and H. sapiens sterol carrier protein 2 (NP_02970). The sequences from Thalassiosira pseudonana, Cyanidioschyzon merolae, Myxococcus xanthus, and Bacteriovorax marinus were obtained from the Web sites of their genome projects (http://genome.jgi-psf.org/diatom/, http://merolae.biols.u-tokyo.ac.jp/, http://tigrblast.tigr.org/ufmg/index.cgi?database=m_xanthus|seq, and http://www.sanger.ac.uk/cgi-bin/blast/submitblast/b_marinus, respectively). The sequences were aligned with CLUSTAL W (Thompson et al. 1994) and the alignment was manually refined with the program ED of the MUST package (Philippe 1993).
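The pooling of hits from successive searches into a single file can be illustrated with a minimal sketch. It assumes that each search returns a list of (accession, sequence) pairs and that redundancy between searches is resolved by accession number; the function names `pool_hits` and `to_fasta` are hypothetical and are not part of BLAST or of any package cited here.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str]  # (accession, protein sequence); an assumed record layout


def pool_hits(searches: Iterable[List[Hit]]) -> Dict[str, str]:
    """Merge hit lists from successive searches, one record per accession."""
    pooled: Dict[str, str] = {}
    for hits in searches:
        for accession, sequence in hits:
            # Keep the first record seen for each accession; later
            # searches that recover the same protein add nothing.
            pooled.setdefault(accession, sequence)
    return pooled


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize the pooled records as FASTA text, in insertion order."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in records.items())
```

Pooling by accession only removes hits recovered by more than one query; identical proteins deposited under different accessions would still need a separate redundancy check.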

Prediction of Targeting Sequences and Protein Domains

Putative presequences for mitochondrial targeting were examined with the TargetP program, version 1.01 (http://www.cbs.dtu.dk/services/TargetP/) (Emanuelsson et al. 2000), and the MitoProt program (http://ihg.gsf.de/ihg/mitoprot.html) (Claros and Vincens 1996). The possible peroxisome targeting signals located in protein C-termini (PTS1) or N-termini (PTS2) were visually inspected. Analysis of protein domains was performed with the NCBI CDART facility (http://www.ncbi.nih.gov/Structure/lexington/lexington.cgi?cmd=rps) that automatically searches in the databases Pfam and COG.

Phylogenetic Analysis

The complete data set (934 sequences) was initially analyzed by neighbor joining (NJ) (Saitou and Nei 1987) using the MUST package (Philippe 1993) to study the general topology of the groups and facilitate the selection of representative sequences of each thiolase subfamily. The final selection of sequences was done on an alignment of 674 sequences after eliminating partial sequences, redundancies, and sequences from environmental samples. This first inspection also allowed the detection of a subset of 165 very divergent sequences (including the eukaryotic SCP-X) that were analyzed separately. Three different data sets were constructed: (i) a wide sample of eukaryotic thiolases I and II (73 sequences, 289 sites), (ii) a sample of the divergent eukaryotic and prokaryotic thiolase sequences (112 sequences, 193 sites), and (iii) a sample of the remaining eukaryotic and prokaryotic thiolase sequences (109 sequences, 299 sites).
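The criteria used to eliminate partial and redundant sequences are not detailed above. As a rough illustration only, the sketch below assumes that "partial" means shorter than a user-chosen cutoff (`min_length`, a hypothetical parameter) and that "redundant" means an exact duplicate of a sequence already kept; removal of environmental sequences is not modeled.

```python
def filter_sequences(records, min_length):
    """Drop short (assumed partial) and exactly duplicated sequences.

    records: list of (identifier, sequence) pairs.
    min_length: hypothetical length cutoff below which a sequence
                is treated as partial.
    """
    seen = set()
    kept = []
    for identifier, sequence in records:
        if len(sequence) < min_length:
            continue  # assumed partial sequence
        if sequence in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(sequence)
        kept.append((identifier, sequence))
    return kept
```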

The three data sets were analyzed with Bayesian methods using the program MrBAYES 3 with a mixed substitution model and a Γ law (eight rate categories) and a proportion of invariant sites to take among-site rate variation into account (Ronquist and Huelsenbeck 2003). The Markov chain Monte Carlo search was run with 4 chains for 1,000,000 generations, with trees being sampled every 100 generations. The stabilization of the different parameters (tree likelihood, α shape parameter, and proportion of invariant sites) was studied with the program TRACER (Rambaut and Drummond 2003). The first 2500 trees were discarded as “burn-in,” keeping only trees generated well after the stabilization of those parameters. The Bayesian analysis was repeated four times for each data set. Trees reconstructed with the JTT model and a Γ law (eight rate categories) plus invariant sites produced identical topologies. Additional phylogenetic analyses were carried out using distance (minimum evolution; ME) and maximum parsimony (MP) methods implemented in PAUP* 4b10 (Swofford 2000). In both cases, 1000 heuristic searches were performed using the tree-bisection-reconnection branch-swapping option and random sequence addition. ME and MP bootstrap values were calculated with the same heuristic search options upon 1000 replicates. Maximum likelihood (ML) trees were constructed with the JTT model with a Γ law (eight rate categories) and a proportion of invariant sites using the program PHYML (Guindon and Gascuel 2003). For ML bootstrap analysis, 500 replicates of each data set were constructed using SEQBOOT from the PHYLIP 3.6 package (Felsenstein 1999) and analyzed with PHYML using the same parameters described above. The consensus tree was obtained using CONSENSE from the PHYLIP 3.6 package (Felsenstein 1999).
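The sampling figures given above fix the size of the burn-in. The short calculation below works it out for a single run, under the assumption that exactly one tree is written per sampling interval (the starting tree at generation 0, which some programs also write, is ignored).

```python
generations = 1_000_000   # length of the MCMC search
sample_every = 100        # a tree is sampled every 100 generations
burn_in_trees = 2500      # trees discarded as burn-in

sampled_trees = generations // sample_every          # trees written per run
retained_trees = sampled_trees - burn_in_trees       # trees kept for the consensus
burn_in_fraction = burn_in_trees / sampled_trees     # share of the sample discarded
burn_in_generations = burn_in_trees * sample_every   # generations covered by burn-in

print(sampled_trees, retained_trees, burn_in_fraction, burn_in_generations)
```

Under these assumptions the burn-in corresponds to the first 250,000 generations, that is, one quarter of the sampled trees.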

Results and Discussion

Diversity of Biochemical Functions, Subcellular Localization, and Phylogenetic Distribution of Thiolases

We constructed an exhaustive alignment of all available thiolase sequences (see Materials and Methods). Inspection of the alignment and preliminary phylogenetic analysis revealed that the eukaryotic thiolases formed six distinct clusters. One of these clusters was very divergent and, hence, analyzed independently (see below). A representative selection of 73 eukaryotic sequences from the remaining five clusters was used to construct a phylogenetic tree (Fig. 1). It showed excellent statistical support for each one of the five thiolase groups (posterior probabilities [PP] of 1 and most bootstrap proportions [BP] of >90). In general, we observed stronger statistical support with the Bayesian and ML methods than with the MP and ME methods, the latter being more affected by among-site and among-species rate variation (Tateno et al. 1994). The tree was compatible with the possibility that thiolases I and II form two different monophyletic groups. This is in agreement with signature sequence positions that appear to differentiate the two thiolase classes (see Supplementary Material).
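For readers unfamiliar with how such support values arise, a bootstrap proportion is simply the percentage of replicate trees in which a given group is recovered. The sketch below is a simplified illustration, not the procedure of any cited program: it assumes that each replicate tree has already been reduced to a set of clades, each clade being a frozenset of taxon names, whereas consensus programs typically work on bipartitions of unrooted trees. The function name `clade_support` is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]  # a group of taxa, identified by their names


def clade_support(clade: Iterable[str], replicate_trees: List[Set[Clade]]) -> float:
    """Percentage of replicate trees that contain the given clade."""
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)
```

The same counting, applied to the trees sampled after burn-in in a Bayesian analysis and expressed as a fraction rather than a percentage, gives the posterior probability of a clade.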

Figure 1

Phylogenetic tree of eukaryotic thiolase sequences. Numbers close to nodes are the posterior probabilities (PP). At several key nodes, maximum likelihood (ML), maximum parsimony (MP), and minimum evolution (ME) bootstrap values are also indicated. The scale bar represents the number of substitutions per 100 positions per unit branch length. m, predicted mitochondrial presequence; pts1 and pts2, peroxisomal targeting sequences 1 and 2; c, cytosol; ?, no recognizable targeting sequence.

The long-chain, membrane-bound, mitochondrial thiolase I cluster contained exclusively animal sequences (Fig. 1). This is in agreement with a wealth of biochemical evidence favoring the classical view that the mitochondrial localization of the β-oxidation pathway was idiosyncratic for animal cells (Kunau et al. 1995). However, the cluster corresponding to the thiolase I component of the multifunctional, mitochondrial matrix complex included not only animal sequences but also a thiolase with a predicted mitochondrial targeting sequence from the diatom Thalassiosira pseudonana (Fig. 1). This confirms the presence of a mitochondrially located β-oxidation pathway in the Heterokonta. This is also in agreement with biochemical evidence for the presence of this pathway in diatoms and several other photosynthetic eukaryotes, including green plants and euglenids (Winkler et al. 2003 and references therein).

The other main enzyme involved in fatty acid β-oxidation is peroxisomal thiolase I, which formed a statistically well-supported cluster including, among others, sequences from fungi, animals, and plants (Fig. 1). All these sequences (except the diatom T. pseudonana and the more divergent sequences from the fungi Aspergillus nidulans and Laccaria bicolor) contained a canonical peroxisomal targeting sequence of PTS2 type, i.e., the less frequent amino terminally located bipartite signal with the consensus sequence [R/K]-[L/V/I]-X5-[H/Q]-[L/A] (Swinkels et al. 1991). Interestingly, the fungi did not include any sequence from the complete genome of Schizosaccharomyces pombe, a possible indication that this organism might not contain a peroxisome (Emanuelsson et al. 2003). A sequence from the slime mold Dictyostelium discoideum clustered with the plant glyoxysomal thiolases and also showed a canonical PTS2. The red alga Cyanidioschyzon merolae contained an unusual thiolase I that belongs to this peroxisomal group. This sequence was much longer (1145 amino acid residues instead of the 400 to 450 of the usual peroxisomal thiolase). The inspection of protein domains revealed that it is a multifunctional protein. The first 200 amino acids showed similarity to the enoyl-CoA hydratase/isomerase family. Residues between position 300 and position 600 showed similarity to the 3-hydroxyacyl-CoA dehydrogenase family. Finally, the C-terminal region of the protein (from position 700) showed similarity to thiolases and contained a canonical PTS2 motif. To the best of our knowledge this is the first example of a thiolase gene fused to other genes coding for enzymes participating in the β-oxidation pathway, and it would represent an extreme case of integration of the multifunctional complex.
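The PTS2 consensus [R/K]-[L/V/I]-X5-[H/Q]-[L/A] translates directly into a regular expression, which gives a quick way to pre-screen sequences before visual inspection. The sketch below is an illustration under stated assumptions and not the procedure used in this study: the function name `find_pts2` and the 40-residue N-terminal search window are hypothetical choices, and the example sequence is a made-up N-terminal fragment.

```python
import re

# [R/K]-[L/V/I]-X5-[H/Q]-[L/A], with X5 taken as any five residues
PTS2 = re.compile(r"[RK][LVI][A-Z]{5}[HQ][LA]")


def find_pts2(sequence, window=40):
    """Return (start, motif) for the first PTS2-like match, or None.

    window: number of N-terminal residues searched (an assumed cutoff);
            pass None to scan the whole sequence.
    """
    region = sequence.upper()[:window]
    match = PTS2.search(region)
    return (match.start(), match.group()) if match else None


# Made-up N-terminal fragment used only to exercise the pattern
print(find_pts2("MHRLQVVLGHLAGRSESS"))
```

Passing `window=None` would be needed for cases such as the C. merolae fusion protein described above, where the motif lies in the thiolase-like region rather than at the N-terminus of the whole protein.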

Finally, the peroxisomal thiolase I cluster also contained, as the most divergent sequence, one from the microsporidian Encephalitozoon cuniculi. Increasing evidence supports that microsporidia represent a lineage of highly derived and reduced fungi adapted to intracellular parasitism instead of early-branching eukaryotes. They apparently lack several eukaryotic cell features such as conventional mitochondria and peroxisomes (Keeling and Fast 2002). The only sequence in the E. cuniculi genome exhibiting similarity to thiolases is the one that clustered with the peroxisomal group (Fig. 1). The emergence of E. cuniculi far from fungi in the thiolase tree was most likely due to a long branch attraction artifact (LBA) since its sequence is very divergent. Our phylogenetic analysis, together with other data, suggests a peroxisomal origin but a cytosolic location of the E. cuniculi thiolase I because of (i) the absence of any recognizable peroxisomal targeting signal in this sequence (or of any predictable mitochondrial presequence), (ii) the lack of evidence for the presence of either peroxisomes or the β-oxidation pathway in E. cuniculi, since there are no sequences for the corresponding enzymes in the complete genome (Katinka et al. 2001), and (iii) the cytosolic localization of the mevalonate pathway in animals and fungi (Lange et al. 2000). The metabolic chart inferred from the genome sequence of E. cuniculi indeed includes a complete enzymatic mevalonate pathway to convert acetate into isopentenyl diphosphate (Katinka et al. 2001). The E. cuniculi peroxisomal-like thiolase would therefore perform an anabolic function (the first step in the mevalonate pathway) in a different compartment (the cytosol), suggesting that E. cuniculi has retained at least one element of peroxisomal metabolism that has been recruited for another biochemical function in a different cell compartment.

Thiolase II sequences split into two well-supported clusters (Fig. 1). One contained the anabolic, cytosol-located, animal thiolases, whereas the other grouped a mixture of thiolases with different cell locations, including sequences from protists, plants, fungi, and animals. In general, each phylogenetic group contained thiolases of the same cell compartment, with a few exceptions that deserve some comment. The plant cluster comprised the glyoxysomal thiolase II sequences exhibiting a canonical PTS2, but also a recently duplicated copy in Arabidopsis thaliana with a cytosolic location. It was closely related to the sequence from Raphanus sativus, identified by complementation of an erg10 mutation in Saccharomyces cerevisiae and biochemically characterized (Vollack and Bach 1996). Recognizable PTS were absent in the sequences of both R. sativus and the second copy of A. thaliana. The yeast erg10 gene codes for a cytosolic thiolase II, with a well-characterized function in the biosynthesis of ergosterol (Hiser et al. 1994), and clustered with other fungal sequences (Fig. 1). Two sequences from the red alga C. merolae branched close to the plant glyoxysome/cytosol group. In this case, one contained a predicted mitochondrial targeting presequence, whereas the other (labeled with a question mark in Fig. 1) lacked any recognizable targeting signal, to either the mitochondria or the peroxisome. Two sequences from Candida tropicalis grouped within the cluster of fungal cytosolic thiolases, but it has been demonstrated that these are almost identical isoenzymes showing a double cell destination: cytosol and peroxisome (Kanayama et al. 1997). Actually, within this set of fungal sequences, only those from C. tropicalis exhibited a PTS1, i.e., the C-terminal tripeptide AKL. The animal sequences in this cluster formed the group of mitochondrial thiolases II. All these sequences showed a predictable mitochondrial targeting presequence except the more divergent sequence from Caenorhabditis elegans. In this case the sequence showed the PTS1 motif KKL, one of those used in the prediction of peroxisomal proteomes by Emanuelsson et al. (2003). Two fungal sequences branching close to the animal cluster point to the existence of thiolase II in some fungal peroxisomes. Both contain a PTS2. Moreover, the biochemical function of the corresponding protein in Yarrowia lipolytica has been characterized as an essential peroxisomal enzyme required for n-decane utilization (Yamagami et al. 2001). A sequence from the diatom T. pseudonana that lacked any predictable targeting signal (labeled with a question mark in Fig. 1) branched close to the base of the group of plants and fungi. The sequence from D. discoideum also lacked any predictable targeting signal. The very divergent sequences from Plasmodium yoelii, P. falciparum, and Giardia lamblia presented a predicted mitochondrial targeting sequence. Interestingly, not only does the diplomonad G. lamblia contain nuclear genes of putative mitochondrial ancestry, but it has been recently demonstrated that it even has a mitosome, the remnant of an ancestral mitochondrion that still maintains several essential metabolic functions (Tovar et al. 2003). It would thus be possible that G. lamblia mitosomes have a thiolase activity.
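A PTS1 is read off the last three residues of the protein, so a simple check on the C-terminal tripeptide is enough for a first screen. The sketch below is an illustration only: the function name `c_terminal_pts1` is hypothetical, and the set of accepted tripeptides is deliberately restricted to the two variants mentioned in the text (AKL and KKL). A real screen would use a broader list, such as the one compiled by Emanuelsson et al. (2003), which is not reproduced here.

```python
# Only the two PTS1 variants named in the text; an intentionally
# incomplete set used for illustration.
PTS1_TRIPEPTIDES = frozenset({"AKL", "KKL"})


def c_terminal_pts1(sequence, tripeptides=PTS1_TRIPEPTIDES):
    """Return the C-terminal tripeptide if it is an accepted PTS1, else None."""
    # Strip a trailing stop symbol, if the sequence file includes one.
    tail = sequence.rstrip("*").upper()[-3:]
    return tail if tail in tripeptides else None
```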

Finally, as mentioned before, a sixth group of divergent eukaryotic thiolases was detected. It contained exclusively animal and fungal sequences and corresponded to SCP-X enzymes (Fig. 2), the peroxisomal thiolase I specific for the degradation of branched-chain acyl-CoA, including the oxidation of the lateral chain of cholesterol (Stolowich et al. 2002). The gene in vertebrates and some arthropods (Anopheles gambiae and Spodoptera littoralis [Takeuchi et al. 2004]) is the result of a fusion: a thiolase I in the N-terminal region and a sterol-carrier protein (SCP-2) in the C-terminal portion. All the sequences examined, of either animal or fungal origin, contained a PTS1 motif.

Figure 2

Phylogenetic tree of eukaryotic SCP-X-like thiolases I and their closest prokaryotic homologues. Numbers close to nodes are the posterior probabilities (PP). At several key nodes, maximum likelihood (ML), maximum parsimony (MP), and minimum evolution (ME) bootstrap values are also given. The scale bar represents the number of substitutions per 100 positions per unit branch length. Eukaryotic species are in boldface and different groups of prokaryotic species are shown. The eukaryotic sequences fused to a sterol carrier protein are indicated.

Eukaryotic Thiolases Have Different Proteobacterial Origins

In order to investigate the evolutionary origin of the six eukaryotic thiolase families, we carried out extensive phylogenetic analyses including a selection of eukaryotic sequences together with a wide representation of prokaryotic homologues. In the case of the SCP-X, the phylogenetic tree showed that the eukaryotic sequences formed a well-supported group and that the closest relatives to this group were sequences belonging to α- and β-proteobacterial species. This suggests a proteobacterial origin of the eukaryotic enzyme, although without a clear preference for any specific proteobacterial group (Fig. 2). Notably, most of the archaeal thiolase sequences belonged to this group of divergent sequences.

The analysis of the remaining five eukaryotic families revealed a complex picture. In fact, as occurs in eukaryotes, many prokaryotic species possess a large number of thiolase gene copies. The prokaryotic sequences used for our analysis included representatives of well-characterized metabolic functions, such as the catabolic thiolases involved in fatty acid β-oxidation and the anabolic enzymes for the biosynthesis of poly(3-hydroxybutyric acid). We also incorporated other sequences derived from the complete genome projects, such as the archaeal homologues, which still await a full biochemical characterization. The phylogenetic relationships among the prokaryotic sequences did not always agree with the monophyly of known phyla (Fig. 3). This suggests that gene duplication, differential gene loss, and horizontal gene transfer (HGT) have had an important role in the evolution of prokaryotic thiolases. A likely example of HGT concerned the thiolase II of Synechocystis sp. that branched within a group of archaeal thiolases. Nevertheless, all the eukaryotic groups, as in the previous case of the SCP-X, clearly emerged as sister groups of proteobacterial sequences.

Figure 3

Phylogenetic tree of prokaryotic and eukaryotic thiolases I and II. Numbers close to nodes are the posterior probabilities (PP). At several key nodes, maximum likelihood (ML), maximum parsimony (MP), and minimum evolution (ME) bootstrap values are also given. The scale bar represents the number of substitutions per 100 positions per unit branch length. Eukaryotic species are in boldface, and different groups of prokaryotic species are indicated.

The mitochondrial membrane-bound thiolase I sequences branched robustly with diverse δ- and γ-proteobacteria, very distant from their α-proteobacterial relatives. This result was unexpected for a protein of proteobacterial origin located in mitochondria. Curiously, the animal cytosolic thiolase II sequences emerged closely related to α-proteobacteria (Fig. 3), which strongly suggests a mitochondrial origin for this cytosolic enzyme. The very diverse eukaryotic cluster of thiolases II with diverse subcellular locations also showed a proteobacterial affinity, but without a clear preference for any proteobacterial group. They emerged close to several δ-proteobacteria, but also to a group containing α-, β-, and γ-proteobacteria. This eukaryotic cluster also showed a very likely case of HGT to a bacterial species, Cytophaga hutchinsonii, which branches within this cluster, close to the thiolase II from animal mitochondria. Remarkably, it has been recently reported that the aconitase and isocitrate dehydrogenase from C. hutchinsonii and other related bacteria are close to their eukaryotic homologues. This led to the proposal that an ancestral bacterium from the Cytophaga-Flavobacterium-Bacteroides (CFB) group contributed genes to the mitochondrial citric acid cycle (Baughn and Malamy 2002) and, hence, that the mitochondrion could derive from a bacterial consortium including CFB rather than from a single α-proteobacterium (Walden 2002). However, in the case of the thiolases, C. hutchinsonii was the only CFB species branching within the eukaryotes, which suggests a recent HGT from eukaryotes to this species rather than an ancient contribution of CFB.

Unexpectedly, the two remaining groups of eukaryotic thiolases, the mitochondrial-matrix and the peroxisomal thiolases I, branched robustly with δ-proteobacterial homologues (Fig. 3). In the case of mitochondrial-matrix thiolase I, no closely related α-proteobacterial sequences were found, whereas only a very distantly related sequence from the α-proteobacterium Rhodopseudomonas palustris branched in the proximity of the peroxisomal thiolase I group. This points to a puzzling robust relationship between the eukaryotic and the δ-proteobacterial enzymes. Moreover, this result supports the chimeric nature of the mitochondrial proteome (Walden 2002), whose mitochondrial matrix thiolase I appears to come from δ-proteobacteria in several eukaryotes.

The α-proteobacterium that originated the mitochondria most likely had a complete β-oxidation pathway (Gabaldon and Huynen 2003) that was subsequently lost in most eukaryotes (Fig. 4). The presence of non-α-proteobacterial thiolases in mitochondria can be explained by relocation of the enzymes. The acquisition of the appropriate regulatory and targeting signals would allow these new genes to compete with, substitute for, or diversify the metabolic function performed by the original endosymbiont genes. Our observations indicate that those processes occurred repeatedly in different eukaryotic lineages (Fig. 4 and see below), sometimes resulting in changes not only in cell location but also in metabolic function. For example, fungi and some plants recruited a thiolase II of probable δ-proteobacterial origin to carry out the anabolic activity in the cytosol, whereas in animals this enzyme is responsible for the last cycle of the fatty acid β-oxidation and has a mitochondrial location. For the corresponding biosynthetic function in the cytosol, animal cells have an enzyme of α-proteobacterial ancestry, which was incorporated into the nuclear genome after the origin of mitochondria. In summary, our results show that there is no direct relationship among the phylogenetic origin of a metabolic enzyme, its final metabolic function, and the cell compartment where it acts.

Figure 4

Schematic representation of the subcellular location and phylogenetic affinity of eukaryotic thiolases. α, α-proteobacteria; δ, δ-proteobacteria; γ, γ-proteobacteria; p, undetermined proteobacteria; ?, uncertain subcellular location.

Speculations on the Origin and Evolution of Eukaryotic Thiolases

The results of our phylogenetic analyses strongly suggest that the diverse eukaryotic thiolases have different evolutionary origins, but all of them can be traced back to proteobacterial ancestors. Of the six monophyletic eukaryotic thiolase clusters, two (SCP-X and the thiolase II cluster of different subcellular locations) did not show a clear phylogenetic affinity for any specific proteobacterial lineage. One (animal cytosolic thiolase II) branched close to α-proteobacterial sequences, suggesting a mitochondrial origin, while two others (mitochondrial matrix and peroxisomal thiolases I) branched as sisters to δ-proteobacteria, suggesting an early contribution of this proteobacterial subdivision to the eukaryotic heritage (Figs. 2 and 3). Finally, the mitochondrial membrane thiolase I branched far from any α-proteobacterial representative but close to several γ- and δ-proteobacterial sequences. It is therefore difficult to conclude whether it derives from either of these two proteobacterial subdivisions. However, not being of mitochondrial origin, a δ-proteobacterial ancestry would be the most parsimonious explanation, since this would involve a single, δ-proteobacterial, donor that contributed several thiolase genes to the ancestral eukaryotic lineage. This would imply that at least all the non-SCP-X thiolases I were inherited from δ-proteobacteria. Nevertheless, given the complexity of the thiolase phylogeny, we cannot completely rule out the possibility that gene duplication and differential gene losses, and even undetected HGT between different proteobacterial groups, account for these observations, although the number of events required would render this possibility unlikely.

The distribution of the different thiolase types and their inferred evolutionary affinities in the various eukaryotic groups was diverse (Fig. 4). The picture emerging from our study is that the only universal metabolic trait in eukaryotes is the peroxisomal ability to degrade acyl-CoA by β-oxidation. At least one of the thiolases involved appears to have a δ-proteobacterial ancestry. Animal cells, and perhaps diatoms, have a second β-oxidation pathway located in mitochondria. Remarkably, no mitochondrial thiolase appears to be of α-proteobacterial origin, but they tend to branch with members of other proteobacteria, particularly from the δ subdivision.

While the α-proteobacterial affinity of the animal cytosolic thiolase II can be easily explained by a mitochondrial endosymbiotic origin, the relationship between several eukaryotic thiolases and the δ-proteobacterial homologues is intriguing. Since these enzymes are widespread in distant eukaryotic groups, they most likely have an ancient origin and were present in the common ancestor to known eukaryotes. A possible explanation might be that the α-proteobacterial ancestor of mitochondria had acquired these genes from δ-proteobacteria by HGT prior to the endosymbiosis. This possibility has been suggested to account for the presence of other genes of non-α-proteobacterial origin in eukaryotes (Schnarrenberger and Martin 2002). However, there is an excellent sampling of complete α-proteobacterial genome sequences and no HGT from δ-proteobacteria can be inferred for these genes, weakening, but not completely excluding, this argument. HGT from δ-proteobacteria to a primitive eukaryotic lineage could be an alternative explanation, as suggested for other bacterial-like genes in eukaryotes (Doolittle 1998). At any rate, this δ-proteobacterial connection, though it concerns members from only one gene family, is in agreement with an ancient symbiosis between a δ-proteobacterium and a methanogenic archaeon for the origin of eukaryotes as postulated by the syntrophy hypothesis (López-García and Moreira 1999; Moreira and López-García 1998). In this model, a second symbiosis with the α-proteobacterial ancestor of mitochondria would have occurred either simultaneously or in a more advanced evolutionary stage (Fig. 5). It would be interesting to study the rest of the genes involved in these metabolic pathways to see whether their phylogeny is congruent with that revealed by thiolases.

Figure 5

Schematic model for the origin of eukaryotic thiolases. Dashed black arrows represent gene transfer events and solid black arrows indicate targeting of nuclear-encoded cytosol-synthesized proteins. α, α-proteobacterial origin; δ, δ-proteobacterial origin; EGT, endosymbiotic gene transfer; mit, mitochondrion; perox, peroxisome/glyoxysome; Th-I, thiolase I; Th-II, thiolase II; ?, uncertain subcellular location. The star indicates that, although we favor a δ-proteobacterial/archaeal symbiosis to explain the origin of the eukaryotic thiolases, HGT of thiolase genes from a δ-proteobacterium to a premitochondrial protoeukaryotic lineage is also possible. The dotted gray arrows show alternative possibilities for the origin of mitochondria and peroxisomes.

Our data are also compatible with an early origin of the peroxisomes with a functional β-oxidation pathway, as proposed by de Duve (1969). Whether the peroxisome was present prior to the mitochondrial endosymbiosis and whether it has an endosymbiotic origin remain unresolved issues. However, we favor the idea of a peroxisomal origin predating the mitochondrial acquisition, since this would be a more parsimonious explanation for the prevalence of δ-proteobacterial-type thiolases in the eukaryotic cell. In our view, the latter were already present in the ancestral eukaryotic cell, so that the incoming mitochondrial-type thiolases would have lost the competition against a fully functional set of δ-proteobacterial-type thiolases.