Introduction

Protein degradation is a key cellular process. All organisms possess specific systems that digest proteins into small peptides and, finally, single amino acids. In eukaryotes, most proteins are degraded by the ubiquitin-proteasome system (reviewed by Glickman and Ciechanover 2002; Groll et al. 2005). Ubiquitin chains are added to proteins, which thereby become tagged for destruction by a complex multiprotein machine, the proteasome. Proteasomes are found not only in eukaryotes but also in archaea and in some eubacteria, such as several actinomycete species (Volker and Lupas 2002; Gille et al. 2003). The eukaryotic 26S proteasome is formed by a proteolytic core, called the 20S proteasome, and 19S regulatory complexes. The 20S proteasome is formed by 28 related subunits. There are two types of subunits, α and β, which are arranged to form four seven-membered rings, with the two outer rings containing only α subunits and the two inner rings only β subunits (Glickman and Ciechanover 2002; Groll et al. 2005). This structure is conserved in the archaeal and eubacterial proteasomes (reviewed by de Mot et al. 1999; Volker and Lupas 2002). The α and β subunits probably diverged before the archaea/eukaryote split but still conserve substantial primary sequence similarity and a characteristic protein fold, often called the proteasome fold (Hughes 1997; Bouzat et al. 2000; Volker and Lupas 2002; Groll et al. 2005). However, whereas prokaryotes have single α and β subunits, in eukaryotes these subunits are encoded by multiple genes, in such a way that each of the proteasome rings is formed by seven distinct subunits (e.g., Bouzat et al. 2000). The origin of the proteasome is still obscure. Two different views have been proposed: (1) proteasomes originated just before the archaea/eukaryote split, being subsequently horizontally transferred to actinomycetes (e.g., Volker and Lupas 2002); and (2) proteasomes originated in eubacteria, and archaea and eukaryotes derive from an actinomycete ancestor that already contained proteasomes (Cavalier-Smith 2002).

Many eubacteria do not contain proteasome-related genes, but they have other types of protease systems (reviewed by Groll et al. 2005). Most interestingly, a complex structurally reminiscent of the proteasome, called HslVU (or, sometimes, ClpQY), can be found in many eubacterial species (Gille et al. 2003). This complex contains two types of proteins. HslU (also known as ClpY) is an AAA+ ATPase, unrelated to any of the subunits of the 20S proteasome. HslV (ClpQ), however, has significant similarity to proteasomal β subunits (Chuang et al. 1993). The structure of the HslVU complex has been determined and also shows interesting similarities to proteasomes. This complex has four characteristic rings with six subunits each. The two outer rings are formed by six HslU subunits each, while the two inner rings contain six HslV subunits each (Rohrwild et al. 1997; see also Sousa et al. 2000; Wang et al. 2001). Determination of the three-dimensional structure of HslV proteins of multiple species has shown that they also possess the characteristic proteasome fold found in α and β proteasomal subunits (e.g., Bochtler et al. 1997). This has led several authors to postulate that HslVU complexes are the eubacterial counterparts of proteasomes or even some type of precursor complexes from which 20S proteasomes derived, passing from a state with two rings of proteases to the current structure with four protease rings (Volker and Lupas 2002). For some time, it was thought that HslVU and proteasomes were mutually exclusive, so that a species may have one or the other (or neither, as in some eubacteria) but not both (de Mot et al. 1999). However, more recently it was found that protists such as the euglenozoans Leishmania and Trypanosoma or the apicomplexan Plasmodium contain both proteasome complexes and HslU and HslV genes (Couvreur et al. 2002; Gille et al. 2003). It was then suggested that these eukaryotes may have acquired the genes by horizontal transfer from the endosymbiont that gave rise to mitochondria or by more recent gene transfers favored by the direct contact of these unicellular eukaryotes with bacteria in their insect vectors (Couvreur et al. 2002; Gille et al. 2003).

In this study, we describe the finding of HslU and HslV genes in many different eukaryotic lineages. We also show that eukaryotic HslU proteins are most similar to their proteobacterial counterparts, a result that is also found, albeit with low statistical support, for HslV proteins. Structural analyses strongly suggest that eukaryotic and prokaryotic HslU and HslV proteins may fold in very similar ways. These results suggest that active HslVU complexes may be present in many eukaryotic lineages and that eukaryotes most likely acquired these proteins from the endosymbiotic proteobacteria from which mitochondria derived. The implications of these results for our understanding of the evolutionary history of HslVU complexes and proteasomes are discussed.

Methods

Sequence Data Mining

All sequences analyzed in this study were obtained from the National Center for Biotechnology Information (NCBI) databases. To generate a representative database, several prokaryotic HslU and HslV sequences were used as queries to perform TBLASTN searches against the nonredundant, est, month, wgs, htgs, and gss NCBI databases. We used both the blastcl3 client and web searches at http://www.ncbi.nlm.nih.gov/BLAST/. Large sets of representative sequences were retrieved for prokaryotic species. Then specific searches were performed to characterize all available eukaryotic sequences. Because all HslU and HslV sequences are very similar, these searches quickly became saturated (i.e., no matter which query sequence was used, searches generally detected the same eukaryotic sequences). The analyses were finished in April 2006.
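The saturation criterion described above can be illustrated with a minimal sketch. It assumes that each search has already been run and reduced to a set of hit identifiers; the function names, the two-query window, and the query and hit labels are hypothetical placeholders, not data or procedures from this study.

```python
def new_hits_per_query(hits_by_query):
    """For each query, in order, list the hits not seen with earlier queries."""
    seen = set()
    novel = []
    for query, hits in hits_by_query:
        fresh = set(hits) - seen
        novel.append((query, sorted(fresh)))
        seen |= fresh
    return novel, seen


def is_saturated(novel, window=2):
    """True when the last `window` queries contributed no new hits (assumed rule)."""
    tail = novel[-window:]
    return len(tail) == window and all(not fresh for _, fresh in tail)


# Hypothetical hit lists; the identifiers are placeholders.
searches = [
    ("query_A", ["euk_1", "euk_2"]),
    ("query_B", ["euk_2", "euk_3"]),
    ("query_C", ["euk_1", "euk_3"]),
    ("query_D", ["euk_2"]),
]
novel, seen = new_hits_per_query(searches)
print(is_saturated(novel))
```

In this toy case the third and fourth queries recover only sequences already found, which is the behavior described as saturation.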

Multiple-Sequence Alignments and Phylogenetic Trees

We used CLUSTALX 1.83 (Thompson et al. 1997) to align the sequences obtained in our searches. Multiple alignments were then manually refined using GeneDoc 2.6 (Nicholas and Nicholas 1997). Phylogenetic trees were obtained by both the neighbor-joining and the maximum parsimony methods. Neighbor-joining analyses were performed using Mega 3.1 (Kumar et al. 2004) with the correction for multiple substitutions and the pairwise deletion option. Maximum parsimony analyses were also performed using Mega 3.1, with the following parameters: (1) all sites included; (2) initial trees obtained by random addition, with 10 replicates; and (3) close-neighbor interchange with search level 3. Interior branch tests and bootstrap tests (1000 replicates) were performed to establish the reliability of the topologies obtained for the neighbor-joining and maximum parsimony analyses. TreeView 1.6.6 (Page 1996) was used to explore the trees and generate the figures presented below. The results of the phylogenetic analyses allowed us to detect false-positive eukaryotic sequences (i.e., sequences annotated as eukaryotic but of prokaryotic origin) as well as to identify sequences that were present two or more times. These false-positive or duplicate sequences were eliminated from our final trees.
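As an illustration of the distance and bootstrap steps mentioned above, the following sketch computes a pairwise distance with pairwise deletion of gapped sites and builds one bootstrap replicate of an alignment. It is not the Mega implementation. The Poisson correction is an assumption (the text does not name the correction used), as are the treatment of "X" and "?" as missing data and the toy sequences.

```python
import math
import random

# Assumption: these characters are treated as gaps or missing data.
GAP_CHARS = {"-", "?", "X"}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Pairwise deletion: a site is skipped only for this pair, and only if
    at least one of the two sequences has a gap or missing residue there.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites for this pair")
    p = differences / compared  # proportion of differing sites
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)  # assumed Poisson correction


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment (placeholder sequences, not real HslU or HslV data).
toy = {"seq1": "MKT-LVA", "seq2": "MRTALV-"}
d = poisson_distance(toy["seq1"], toy["seq2"])
replicate = bootstrap_alignment(toy, random.Random(0))
print(round(d, 4))
```

For the toy pair, five sites are comparable and one differs, so p = 0.2 and the corrected distance is -ln(0.8). A full bootstrap analysis would repeat the resampling (1000 times in this study), rebuild a tree from each replicate, and count how often each branch is recovered.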

Three-Dimensional Structure Modeling

Comparison of the primary sequences of eukaryotic HslU and HslV proteins with those of prokaryotes for which crystal structures are available, and generation of the three-dimensional models, were performed using Swiss-Model (http://www.swissmodel.expasy.org/) (Peitsch 1996). The models were then analyzed with Swiss-PdbViewer 3.7 (Guex and Peitsch 1997) (details at http://www.swissmodel.expasy.org/spdbv/). The same program was used to generate the figure with the three-dimensional structures shown below. The known structures of the Escherichia coli HslU and HslV proteins were used as templates. Protein Data Bank codes for the templates of HslV were 1ned (Bochtler et al. 1997), 1e94 (Song et al. 2000), 1g4b, 1ht1, and 1hqy (the latter three from Wang et al. 2001). For HslU, we used the template 1do0 (Bochtler et al. 2000).

Results

HslU Genes of Likely Mitochondrial Origin Are Present in Most Eukaryotic Lineages

Figure 1 shows a multiple-sequence alignment of all the eukaryotic HslU sequences detected, compared with the sequence of the canonical Escherichia coli HslU gene. The complete sequences available include not only the ATP-binding site, located in the N-terminal (N) domain, but also the intermediate (I) and C-terminal (C) domains (see, e.g., Bochtler et al. [2000] and Fig. 1 legend for a precise definition of these domains). The I domain is the least conserved region of the HslU protein. We found HslU sequences in species from five of the six main eukaryotic groups (Simpson and Roger 2004), namely, amoebozoa (Dictyostelium discoideum), plantae (the vascular plant Oryza sativa and algae such as the green alga Ostreococcus tauri and the red alga Cyanidioschyzon merolae), chromoalveolata (alveolata such as species of the Plasmodium, Toxoplasma, and Tetrahymena genera; haptophytes such as Emiliania huxleyi; stramenopiles such as species of the Phytophthora and Thalassiosira genera; cryptophytes such as Guillardia theta), rhizaria (the cercozoan Bigelowiella natans), and excavata (the euglenozoans Leishmania infantum and species of the Trypanosoma genus). It is unclear whether HslU is present in the sixth main eukaryotic group, the opisthokonta, which includes, among others, fungi and animals. In fact, there are many HslU sequences in the databases that have been annotated as belonging to animal species, including human. In those cases, their similarity to bacterial sequences allowed us to identify them as being of bacterial origin and to eliminate them from our final analyses. However, a single cDNA sequence putatively derived from the fish Danio rerio was found that is quite different from all other sequences, either prokaryotic or eukaryotic. This sequence encodes just a small part of the HslU protein (see Fig. 1), precluding its precise classification. Thus, it is impossible at present to know whether this sequence is of eukaryotic origin. Moreover, even if it is indeed eukaryotic, contamination of the Danio samples by DNA from other eukaryotic organisms cannot be ruled out. Considering that no other animals seem to have HslU genes, we think it is very unlikely that this sequence actually derives from a vertebrate.

Fig. 1.

Multiple-sequence alignment of eukaryotic HslU sequences. Escherichia coli HslU is shown as a representative prokaryotic sequence. The domains defined in the HslU protein (Bochtler et al. 2000) are located in this figure as follows: N domain (positions 2–107 and 288–387), I domain (positions 108–287), and C domain (positions 388 to the end of the proteins).
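The domain boundaries given in this legend can be written as a small lookup, sketched here for readers who want to tabulate substitutions by domain. The function name is hypothetical, and positions are assumed to be 1-based coordinates in the Fig. 1 numbering; position 1 is not assigned to any domain in the legend and is left unassigned.

```python
def hslu_domain(position):
    """Map a 1-based position (Fig. 1 numbering) to an HslU domain label.

    Boundaries are those of the legend: N = 2-107 and 288-387,
    I = 108-287, C = 388 to the end. Returns None for position 1,
    which the legend does not assign to a domain.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    if 2 <= position <= 107 or 288 <= position <= 387:
        return "N"
    if 108 <= position <= 287:
        return "I"
    if position >= 388:
        return "C"
    return None


for pos in (1, 107, 108, 287, 288, 388):
    print(pos, hslu_domain(pos))
```

Note that the N domain is discontinuous in the primary sequence: the I domain is inserted within it.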

Figure 2 shows a neighbor-joining phylogenetic tree for HslU sequences (101 prokaryotic plus the 16 longest eukaryotic sequences available) in which the maximum parsimony results have also been included. Both methods of phylogenetic reconstruction generate almost identical trees, in which the eukaryotic sequences always appear as a monophyletic group, sister to the α-proteobacterial HslU genes. The eukaryote/α-proteobacteria clade is strongly supported by the interior branch test. However, bootstrap support is quite low (see Fig. 2). In any case, the high similarity of eukaryotic and proteobacterial genes, already detected by other authors, albeit in much smaller analyses (Couvreur et al. 2002; Gille et al. 2003), is striking and suggests that the origin of eukaryotic HslU genes may be the endosymbiotic event that generated the mitochondria, which derive from an α-proteobacterial ancestor.

Fig. 2.

Neighbor-joining tree of HslU sequences. Numbers in brackets indicate how many species were included in the collapsed branches. Bootstrap results for maximum parsimony analyses were similar enough to be embedded in this tree. Thus, the numbers above the branches indicate the bootstrap support (in percentages) according to neighbor-joining (left) and maximum parsimony (right) analyses. Bootstrap values are detailed only for those branches that were supported in both types of phylogenetic reconstruction, had significant interior branch tests (PC ≥ 95% [see Sitnikova et al. 1995]), and for which at least one of the two bootstrap values was above 50%.
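The reporting rule in this legend can be expressed as a simple filter. The sketch below follows one reading of the legend: a branch is detailed if it is recovered by both methods, its interior branch test reaches the PC threshold, and at least one bootstrap value exceeds 50%. The class, field names, and example values are hypothetical and do not correspond to branches in the actual tree.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    in_nj: bool      # recovered in the neighbor-joining tree
    in_mp: bool      # recovered in the maximum parsimony tree
    pc: float        # interior branch test confidence (percent)
    boot_nj: float   # neighbor-joining bootstrap support (percent)
    boot_mp: float   # maximum parsimony bootstrap support (percent)


def is_detailed(branch, pc_min=95.0, boot_min=50.0):
    """Apply the legend's criteria (as interpreted here) to one branch."""
    in_both = branch.in_nj and branch.in_mp
    significant = branch.pc >= pc_min
    some_bootstrap = max(branch.boot_nj, branch.boot_mp) > boot_min
    return in_both and significant and some_bootstrap


# Hypothetical branches with made-up support values.
branches = [
    Branch("clade_A", True, True, 99.0, 72.0, 40.0),
    Branch("clade_B", True, True, 99.0, 45.0, 38.0),
    Branch("clade_C", True, False, 97.0, 80.0, 0.0),
]
print([b.name for b in branches if is_detailed(b)])
```

Under these criteria a branch with a significant interior branch test but bootstrap values of 50% or less in both analyses is not annotated.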

HslV Genes Are Also Present in Multiple Lineages

We detected HslV genes in almost as many lineages as HslU genes. Figure 3 shows the multiple-sequence alignment of the 22 available eukaryotic sequences together with their E. coli homolog. HslV genes were found in amoebozoa (D. discoideum), plantae (both plants [Cycas and Physcomitrella] and algae [Ostreococcus, Cyanidioschyzon, Chlamydomonas]), chromoalveolata (the alveolata Plasmodium, Neospora, Toxoplasma, and Tetrahymena, the stramenopiles Phaeodactylum, Phytophthora, and Thalassiosira, and the haptophyte Prymnesium), and excavata (Leishmania, Trypanosoma). We did not find HslV sequences from opisthokonta or rhizaria. We generated phylogenetic trees based on HslV sequences, which included the 20 longest eukaryotic sequences (length > 100 amino acids) plus 140 prokaryotic HslV sequences (Fig. 4). In both neighbor-joining and maximum parsimony trees, eukaryotic sequences again group together with α-proteobacterial sequences. However, HslV sequences are quite short and therefore the information that they contain is limited. Consequently, bootstrap support for the corresponding topologies is very low (see details in legend to Fig. 4).

Fig. 3.

Alignment of eukaryotic HslV protein sequences with the prokaryotic Escherichia coli canonical HslV.

Fig. 4.

Neighbor-joining tree of HslV sequences. Conventions as in Fig. 2, except that here we show bootstrap data for all branches with significant interior branch tests. A dash indicates lack of support in maximum parsimony analyses. The branch that groups eukaryotes and all but five α-proteobacteria in the neighbor-joining tree is strongly supported by interior branch tests (PC = 99%), but bootstrap support is negligible. This branch is not supported by maximum parsimony analyses. However, this is because, in the maximum parsimony tree, the five additional α-proteobacterial sequences are included with the rest. Therefore, both types of analyses support the conclusion that eukaryotic sequences are most similar to α-proteobacterial sequences.

Eukaryotic HslU and HslV Sequences Can Be Modeled to Structures Compatible with Functional Activity

To determine whether eukaryotic sequences may encode products able to form functional HslVU complexes, we used the available crystal structures of HslU and HslV proteins to model the three-dimensional folding of the deduced eukaryotic proteins. As an example, Fig. 5 shows the folding of the HslU (Fig. 5a) and HslV (Fig. 5b) proteins of the red alga Cyanidioschyzon merolae (results for the proteins of other eukaryotic species are similar). The modeled structures are almost identical to those of the bacterial proteins. Thus, C. merolae HslU shows the three characteristic domains already described in prokaryotic proteins (C, I, and N domains; see Fig. 5a), while C. merolae HslV is modeled as having a typical proteasome fold. These results indicate that the eukaryotic-specific amino acid changes (e.g., all those accumulated in the I domain of HslU that we show in Fig. 1) do not significantly alter the predicted protein folding.

Fig. 5.

Three-dimensional structures of the Cyanidioschyzon merolae HslU (a) and HslV (b) proteins modeled according to the corresponding Escherichia coli templates. The three domains in HslU are indicated. These models can be compared to the E. coli structures shown in Figs. 2 and 3 of Bochtler et al. (2000), to which they are virtually identical.

Discussion

Our results establish that HslU and HslV genes are present in many eukaryotic species, thus extending the previous findings of Couvreur et al. (2002) and Gille et al. (2003), who detected those genes in a few protozoans. Given the likelihood of contamination of eukaryotic samples (especially those from partially sequenced genomes) by eubacterial DNA, it is important to stress the coherence of the results, which makes it very unlikely that they are an artifact. First, all eukaryotic genes appear as a monophyletic ensemble in the HslU phylogenetic trees, clearly distinct from the prokaryotic sequences. They also appear together in the HslV trees, although mixed with α-proteobacterial sequences due to the poor resolution provided by the short HslV sequences. Second, in many cases, both genes have been found together in the same species. This includes amoebozoa species (such as Dictyostelium discoideum), plantae species (Cyanidioschyzon merolae, Ostreococcus tauri), chromoalveolata (several Plasmodium species, Phytophthora infestans, Tetrahymena thermophila, Toxoplasma gondii, Thalassiosira pseudonana), and excavata (Leishmania infantum, two Trypanosoma species). The likelihood of all of them being false positives is negligible.

In summary, most of the main eukaryotic lineages have some species with HslU and HslV genes. Most significantly, both unikont species (such as D. discoideum) and bikont species (all the rest) have been found to have these genes. Because there is good evidence for the unikont/bikont split being the deepest dichotomy in eukaryotes (see reviews by Cavalier-Smith [2004] and Richards and Cavalier-Smith [2005]), we think that the last common eukaryotic ancestor contained both genes. This result, together with the phylogenetic analyses, which strongly hint that these genes have a proteobacterial origin, suggests that they were part of the set of genes transferred to eukaryotes in the endosymbiotic process involving the α-proteobacterium that gave rise to mitochondria. If this is indeed the correct evolutionary history, then we may deduce that HslU and HslV genes became dispensable and disappeared in many organisms independently. We have evidence suggesting that this process is still continuing today. As indicated above, most eukaryotic gene sequences are compatible with their encoding functional products. However, the sequences of both genes of Tetrahymena thermophila contain several stop codons, which suggests that they have become inactive quite recently.

The substantial sequence conservation, conserved three-dimensional structures, and appearance of both genes in many species suggest that functional HslVU complexes similar to those found in eubacteria may still be produced in many eukaryotes. We can speculate that they serve as a backup or complementary system for the proteasome, although other functions related to protein degradation are also possible. Our results also contribute to the understanding of the relationships between the proteasome and the HslVU complex. When thinking about the origin of those two protein complexes, we envisage two main options. First, proteasome and HslVU may be viewed as alternatives, in which eubacteria have HslVU, while archaea and eukaryotes have proteasomes. Under this hypothesis, both actinomycete proteasomes and eukaryotic HslVU complexes would have been horizontally transferred, in the first case from eukaryotes to actinomycetes and in the second case by endosymbiosis. This is an option already suggested by Volker and Lupas (2002) and Couvreur et al. (2002), among other authors. A second option is that the HslVU complex is much more ancient than the proteasome and that HslV is a likely precursor of proteasome subunit genes. Thus, proteasomes may have emerged in actinomycetes, perhaps as a substantial modification of HslVU complexes, and later archaea and eukaryotes received proteasomes as a consequence of their deriving from an actinomycete species (as suggested by Cavalier-Smith 2002). The finding of HslVU complexes in eukaryotes would again be explained by horizontal transmission by endosymbiosis. It is unclear how to test which of these two hypotheses is correct. However, in our opinion any proof will depend more on the true relationships among the three domains (eubacteria, archaea, and eukaryotes) than on further extensions of the phylogenetic range in which HslVU or proteasomes are present.