Introduction

The evolution of gene or protein families has a central importance in biology (Ohno 1970). Evolutionary analysis of individual protein families can uncover patterns of protein diversification. The acquisition of complete genome sequences has greatly increased our ability to study gene family evolution. The analysis of complete genome data has clear advantages over analysis of EST or cDNA sequences. With genome data we now have the ability to examine all of the members of any given gene family, not just those that are expressed under certain conditions. Angiosperms have long been well known for possessing a large number of complex gene families. However, due to the lack of genome data from algae and basal land plant species, it has often been unclear when and how these gene families diversified.

The small heat shock protein (sHSP) family is found in all three domains of life: Archaea, Bacteria, and Eukarya (Plesofsky-Vig et al. 1992; de Jong et al. 1993; Caspers et al. 1995; Waters 1995; Waters and Vierling 1999a; Franck et al. 2004). Duplications of the sHSPs within the vertebrate lineage gave rise to the well-studied α-crystallin proteins that are an important structural component of the vertebrate eye lens (Plesofsky-Vig et al. 1992; de Jong et al. 1993, 1998; Caspers et al. 1995; Franck et al. 2004). The functional importance of the sHSPs is illustrated by the fact that sHSP genes have been found in many genomes, even in those highly reduced genomes that have lost many other genes (Gil et al. 2003; Waters et al. 2003). It is now well established that the sHSPs are chaperone proteins that can prevent the irreversible denaturation of other proteins (Becker and Craig 1994; Boston et al. 1996; Buchner 1996; Ehrnsperger et al. 1997; van Montfort et al. 2002; Wang et al. 2004). The sHSPs are expressed during heat shock (when the need for chaperones increases) and the presence of these proteins can confer thermal tolerance; however, many sHSPs are also known to be developmentally regulated (Waters et al. 1996; van Montfort et al. 2002; Wang et al. 2004). The sHSP monomers range in size from 12 to 24 kDa; however, most form oligomeric assemblies (van Montfort et al. 2002). Crystal structures of two sHSPs are known (HSP16.5 from Methanococcus jannaschii and HSP16.9 from Triticum aestivum) and both of these proteins form hollow spheres composed of β-sandwiches (Kim et al. 1998; van Montfort et al. 2001).

The sHSPs have a very interesting pattern of evolution, and within this family the evolutionary history of the land plant sHSPs has been particularly noteworthy (de Jong et al. 1993, 1998; Waters 1995, 2003; Waters et al. 1996; Waters and Vierling 1999a, b). Compared to the sHSPs in other organisms the land plant sHSPs are unusually diverse. There are 19 sHSPs in Arabidopsis thaliana, and these 19 proteins (all of which are nuclear-encoded) can be organized into at least six different subfamilies or classes (Scharf et al. 2001). There are three known subfamilies of cytosolically localized plant sHSPs. In addition, there are at least three organelle sHSP subfamilies, with one localized to the endoplasmic reticulum (ER), one to the mitochondrion (MT), and the third to the chloroplast (CP). Members of these distinct sHSP subfamilies have not been identified outside of plants (Waters 2003; Waters and Vierling 1999a, b). Analysis of sHSP cDNAs in a moss, Funaria hygrometrica, has shown that homologues of at least three of the angiosperm sHSP subfamilies had their origins early in land plant evolution. However, the total number of sHSPs found in mosses or other basal land plant groups is not known due to a lack of genome data. In addition, evolutionary analysis of a sHSP protein from a green alga, Chlamydomonas reinhardtii (HSP22) (Kloppstech et al. 1985; Waters and Vierling 1999b; Waters 2003), did not provide any insight into the evolution of the sHSPs in algae and plants (Waters and Vierling 1999b; Waters 2003). Therefore, at this time it is not known how and when these plant sHSP families diversified.

The evolution and origin of the organelle-localized sHSPs have been of particular interest. There are no known members of the MT- or ER-localized sHSPs within other eukaryotes (de Jong et al. 1998; Franck et al. 2004). In addition, the phylogenetic position of the CP-localized sHSPs suggests that these proteins did not originate with the cyanobacterial endosymbiont of chloroplasts (Waters and Vierling 1999b). Together these data are strong evidence that the sHSP proteins have had a very different evolutionary history than have many other protein families with members targeted to different cellular compartments (Boorstein et al. 1994; Iwabe et al. 1996). For example, in the HSP70 family the nuclear-encoded MT or CP HSP70s are clearly derived from the endosymbionts of the MT and CP, respectively, and the ER HSP70 evolved very early on in eukaryotic evolution.

Analysis of the sHSPs in algae can address many questions concerning the evolution of the sHSPs. Toward this end we analyzed the complete genomes of five algal species: a chlorophyte green alga, Chlamydomonas reinhardtii; two closely related prasinophyte green algae, Ostreococcus tauri and Ostreococcus lucimarinus; a red alga, Cyanidioschyzon merolae; and a diatom, Thalassiosira pseudonana (Armbrust et al. 2004; Grossman 2003; Misumi et al. 2005). While the ancestors of the diatoms, green algae, and red algae most likely diverged more than 1 billion years ago (Baldauf et al. 2000; Baldauf 2003; Nozaki et al. 2003; Yoon et al. 2004), these algal species, in particular, the red and green algae, are more closely related to land plants than are the better-known fungal and animal eukaryotes. Thus, a comparison of these five genomes allows us to examine the evolution of the sHSPs and answer a number of questions, including: Do these diverse algal species have sHSPs? If they do, are any of them members of the land plant sHSP families? In particular, because all of these organisms are photosynthetic and have either primary or secondary (T. pseudonana) plastids, analyses of these genomes can assist in our understanding of the evolution of the CP-targeted sHSPs.

If the algal sHSPs are members of the land plant sHSP families, this would suggest that these subfamilies duplicated and diverged not within land plants but rather much earlier in green plant evolution, before land plants evolved. If, on the other hand, none of the algal sHSPs are closely related to any of the land plant sHSP families, this would indicate that while the algal sHSPs and land plant sHSPs are all members of the large sHSP family, the duplications that generated the land plant sHSPs occurred after the divergence of the common ancestor of green algae and the land plants. The answers to these questions will have significant implications for our understanding of the selective pressures that drove the diversification of the sHSPs and of their role within the CP and other organelles.

Materials and Methods

Identification of sHSPs in Complete Algal Genomes

We have identified 17 algal sHSPs. This was accomplished by using known algal and land plant sHSP sequences as queries against the following complete genomes: Chlamydomonas reinhardtii v. 3 (http://genome.jgi-psf.org/Chlre3/Chlre3.home.), Ostreococcus lucimarinus v. 2.0 (http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html) and Ostreococcus tauri v. 2.0 (http://genome.jgi-psf.org/Ostta4/Ostta4.home.html), Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/), and Thalassiosira pseudonana v. 3.0 (http://genome.jgi-psf.org/thaps3/thaps3.home.html). The sequences used as queries included the previously identified algal sHSP HSP22 from Chlamydomonas reinhardtii (NCBI accession number X15053) as well as additional sHSP sequences from Funaria hygrometrica (a moss) and from Arabidopsis thaliana. The programs Blastp, Blastn, and Tblastn were used at each of the individual genome web sites. All sequences with an e value of <0.05 were downloaded and examined more carefully. Some hits within each genome were clearly false ORFs (open reading frames), with very short ORFs and/or with none of the diagnostic sHSP conserved amino acids (Plesofsky-Vig et al. 1992; de Jong et al. 1993; Caspers et al. 1995; Waters 1995; Waters and Vierling 1999a; Franck et al. 2004) present.
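The e-value screen described above can be illustrated with a minimal sketch. It assumes the hits are available as tab-separated lines in the standard 12-column BLAST tabular layout (e value in column 11); the searches reported here were run at the genome web sites, so this file format, the function names, and the example identifiers are illustrative assumptions rather than the procedure actually used.

```python
from typing import NamedTuple

E_VALUE_CUTOFF = 0.05  # threshold stated in the text


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def parse_tabular(lines):
    """Parse tab-separated BLAST lines (assumed 12 standard columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Column 1 = query ID, column 2 = subject ID, column 11 = e value.
        hits.append(Hit(fields[0], fields[1], float(fields[10])))
    return hits


def candidate_subjects(hits, cutoff=E_VALUE_CUTOFF):
    """Return distinct subject IDs with a hit below the cutoff, first-seen order."""
    kept = []
    for hit in hits:
        if hit.evalue < cutoff and hit.subject not in kept:
            kept.append(hit.subject)
    return kept


# Hypothetical hits; the subject names are placeholders, not real gene models.
example_lines = [
    "HSP22\tmodel_A\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t90.0",
    "HSP22\tmodel_B\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.2\t20.0",
]
candidates = candidate_subjects(parse_tabular(example_lines))
```

Sequences passing this screen would still need the manual inspection described in the text, since short false ORFs and hits lacking the diagnostic sHSP residues can also fall below the cutoff.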

The 17 sHSP sequences identified were analyzed further using a variety of sequence analysis tools. First, the accuracy of gene models was checked against EST sequences available at the genome sites. When gene models excluded the start Met residues found in EST clones, the gene models were changed to reflect the EST data. Most of the gene models could be confirmed with EST analysis; however, no EST clones for the sHSP genes from O. tauri and T. pseudonana were found. In these cases careful examination of upstream regions was performed to identify alternative start Met residues. As described in detail below, only the C-terminal portion of the sHSPs was used in the phylogenetic analysis and thus a slightly longer or shorter N-terminal region would not alter the results of this analysis. Second, the amino acid sequences were generated from the DNA sequences using Vector NTI (v. 7) and compared against the predicted sequences on the genome databases for accuracy. The Vector NTI program was also used to calculate protein pairwise sequence identity. Intron position was then determined from the gene models and is reported based on where the intron sequence is relative to the amino acid alignment in Fig. 1. That is, if the intron is placed between two codons, the position is reported at the last codon in the first exon. For example, in CrHSP20 the intron lies between codons for amino acids 258 and 259 in Fig. 1. The codon position is listed as 258. Estimates of the size of the proteins (kilodaltons) were also generated using Vector NTI (v. 7). Amino acid sequence alignments were then generated with ClustalW (Thompson et al. 1994) and optimized by hand in BioEdit (v. 7.0.5) (Hall 1999).
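Two of the conventions above can be sketched in a few lines. The identity calculation below is an assumption: it counts identical residues over alignment columns where neither sequence has a gap, and the text does not state which denominator Vector NTI uses. The intron rule follows the text directly. Function names and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumed definition; the denominator used by Vector NTI is not stated.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def reported_intron_position(last_codon_exon1, first_codon_exon2):
    """Apply the text's rule: report the last codon of the first exon."""
    if first_codon_exon2 != last_codon_exon1 + 1:
        raise ValueError("intron must lie between two adjacent codons")
    return last_codon_exon1


# Toy alignment (not real sHSP sequence): 4 identical of 5 compared columns.
toy_identity = percent_identity("MKV-LA", "MRVGLA")

# Example from the text: intron between alignment positions 258 and 259.
crhsp20_intron = reported_intron_position(258, 259)
```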

Fig. 1

Amino acid and secondary structure alignment of algal sHSPs. The highly variable N-terminal domain spans from residue 1 to residue 218. The C-terminal or α-crystallin domain spans from residue 219 until the end of the protein. Secondary structure predictions were generated with PredictProtein as described in Materials and Methods. Helical regions are indicated by a double underline and strands are indicated by a single underline. The shaded regions were generated in GeneDoc and reflect varying levels of sequence conservation. The known secondary structures for a small heat shock protein from Triticum aestivum (van Montfort et al. 2001) are indicated above the alignment

Small HSPs are targeted to certain organelles in land plants and these proteins form evolutionarily related subfamilies. Therefore, knowledge of the cellular location of the algal sHSPs will be useful in the interpretation of phylogenetic patterns among the sHSPs. Localization predictions were performed using the following programs: PSORT (http://psort.nibb.ac.jp/form.html), Predotar (http://genoplanteinfo.infobiogen.fr/predotar/), and TargetP (http://www.cbs.dtu.dk/services/TargetP/) (Emanuelsson et al. 2000). These methods have been widely used in bioinformatic analysis, including in previous genomic analysis of sHSPs (Scharf et al. 2001) and other HSPs (Lin et al. 2001) of Arabidopsis thaliana.

The photosynthetic algal species studied here are members of deeply branching eukaryotic lineages. To determine if the sHSPs of these algae are related to the land plant sHSPs or are in fact more closely related to sHSPs from nonphotosynthetic eukaryotic lineages, we also examined the complete genome sequences of four nonphotosynthetic species from early-diverging eukaryotic lineages at The Institute for Genomic Research (TIGR) web site: Plasmodium falciparum and Plasmodium yoelii, both apicomplexans and members of the Alveolates; and Trypanosoma brucei and Trypanosoma cruzi, both kinetoplastids and members of the Euglenozoa. Names and gene model numbers of these nonalgal sHSPs are available in the Supplementary Material. The EST databases of other early eukaryotic lineages were not examined. By limiting the searches to complete genomes we were able to examine all the sHSPs in each species. The analysis of Plasmodium sHSPs is particularly interesting because these species are known to have apicoplasts or remnant plastids (Waller and McFadden 2005). The oomycetes, of which Phytophthora is a member, are stramenopiles and thus are related to the diatoms. A search of two Phytophthora genomes (P. sojae and P. ramorum) failed to identify any sHSP proteins. These organisms are known pathogens whose genomes may be highly specialized for this lifestyle.

For phylogenetic analysis, sHSP homologues from archaea, bacteria (including cyanobacteria), fungi, animals, and land plants were obtained from public databases (NCBI). An alignment of these sHSPs with the Plasmodium and Trypanosoma sHSPs and the 17 algal sHSPs was generated with ClustalW and optimized in BioEdit. The full alignment of 115 amino acid residues from 53 proteins is available as Supplementary Material. Only the conserved C-terminal α-crystallin region was aligned and used for the phylogenetic analysis. The much more variable N-terminal domain is difficult to align across the evolutionary distance included in this analysis. Bayesian methods were used to construct phylogenetic trees (MrBayes v. 3.1) (Ronquist and Huelsenbeck 2003). In the Bayesian analysis four simultaneous chains of 2 million generations each were run. First, using MrBayes, we determined that the best model of amino acid evolution for this data set was the WAG model. All subsequent analysis used the WAG model. Then the analysis was run until the standard deviation of the split frequencies was below the critical value of 0.01. From this we determined that the “burn-in” should be the first 25% of the generations run. During the final analysis trees were collected or sampled every tenth generation. Trees from the first 500,000 generations (those prior to convergence of the likelihood values) were discarded as “burn-in.” The remaining trees were retained and used to generate the consensus tree. The posterior probabilities (a measure of the reliability of the branches) are presented on the tree.
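The burn-in figures above follow from simple arithmetic, sketched here as a check. The function name is hypothetical, and the count of sampled trees is per run and ignores the initial state, which is an assumption about how the samples are tallied rather than a statement about MrBayes output.

```python
def burn_in_summary(generations, sample_every, burn_in_fraction):
    """Return generation and sample counts implied by a fractional burn-in."""
    total_samples = generations // sample_every
    burn_in_generations = int(generations * burn_in_fraction)
    burn_in_samples = burn_in_generations // sample_every
    return {
        "total_samples": total_samples,
        "burn_in_generations": burn_in_generations,
        "burn_in_samples": burn_in_samples,
        "retained_samples": total_samples - burn_in_samples,
    }


# Values stated in the text: 2 million generations, every tenth generation
# sampled, first 25% discarded as burn-in.
summary = burn_in_summary(generations=2_000_000, sample_every=10,
                          burn_in_fraction=0.25)
```

With these inputs the burn-in is 500,000 generations, matching the number given in the text, and three quarters of the sampled trees contribute to the consensus.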

Results

Seventeen Algal sHSPs Have Been Identified

In the Chlamydomonas reinhardtii genome five sHSP genes in addition to C.r. HSP22 were identified, bringing the total number of sHSPs in this CP green algal genome to six (Table 1, Fig. 1). Some, but not all, of these genes were identified in earlier versions of the C. reinhardtii genome (Schroda 2004). In addition, some of the gene models based on earlier releases of the C. reinhardtii genome had different placements of introns. One C. reinhardtii gene encodes a 20-kDa protein and is referred to here as C.r. HSP20. There are two different genes encoding a 25-kDa sHSP in C. reinhardtii (C.r. HSP25A and -25B). In addition, genes encoding a 30-kDa and a 36.7-kDa sHSP were also identified within the C. reinhardtii genome (HSP30 and HSP36.7, respectively).

Table 1 Seventeen small heat shock proteins are identified in five complete algal genomes

The C. reinhardtii sequence data are organized by scaffold and have not yet been mapped to chromosomes; therefore we do not have chromosomal locations. However, the genome data do provide information on intron position and the location of the genes in relation to other genes on the same scaffold. There are two sets of genes that are located adjacent to each other: C.r.HSP25A and C.r.HSP25B, and C.r.HSP20 and C.r.HSP22. C.r.HSP25A and C.r.HSP25B are very similar, with 95% amino acid sequence identity, while C.r.HSP20 and C.r.HSP22 are only 55% identical. All of the C. reinhardtii sHSP genes have introns. Both C.r.HSP25A and C.r.HSP25B have a single intron in the same position (Table 1). C.r.HSP20 has one intron while HSP22 has two introns, but the positions of the only intron in HSP20 and the second intron in HSP22 are conserved (Table 1). The HSP25 intron position is not shared with either HSP20 or HSP22. However, the position of the single intron in HSP30 is conserved with the intron in HSP25A and -B. Interestingly, HSP30 is only 28% identical to HSP25A and -B and is located on a different scaffold. There are three introns in the gene for C.r.HSP36.7 and their positions are not conserved with any of the other sHSP genes in C. reinhardtii. This gene is not located near any of the other sHSP genes.

The two prasinophyte green algal genomes each have three sHSPs. The three Ostreococcus lucimarinus sHSPs are HSP20, HSP22, and HSP28. Ostreococcus tauri has HSP21, HSP23, and HSP33 (Table 1). Only one of the sHSP genes in O. tauri has an intron and none of the O. lucimarinus sHSP genes have introns. The intron position in O.t.HSP21 is not shared with any of the C. reinhardtii sHSP genes. The sHSPs within each Ostreococcus genome are diverse, with less than 40% amino acid identity. However, each gene has a close homologue in the other Ostreococcus genome and these proteins are more than 50% identical. None of the Ostreococcus sHSP genes are located near each other.

Within the genome of Cyanidioschyzon merolae, a red alga, two distinct sHSP genes were found. One encodes a 20-kDa protein (C.m.HSP20) and the other encodes a 27-kDa protein (C.m.HSP27). The genes for HSP20 and HSP27 are found adjacent to each other on the same chromosome; neither of these genes contains an intron, and the two proteins share only 24% sequence identity.

Within the genome of Thalassiosira pseudonana, a diatom, three sHSP genes were identified (Table 1). One gene encodes a 16-kDa protein (T.p.HSP16), another a 19-kDa protein (T.p.HSP19), and the third gene is for a 20-kDa protein, T.p.HSP20. The three sHSPs in T. pseudonana also have low pairwise sequence identity, at approximately 22%. Only one of these three genes, HSP19, contains an intron. This intron position is not shared with any of the other algal sHSPs. None of the T. pseudonana sHSP genes are located near each other.

Algal sHSPs Contain the Highly Conserved α-Crystallin Domain

Multiple sequence alignment of the 17 algal sHSPs indicates that these proteins are all clearly members of the ancient sHSP family. However, it also shows that there is considerable sequence diversity among these algal sHSPs (Fig. 1). The N-terminal region (residues 1–218 in Fig. 1) is highly divergent, with not a single amino acid conserved among all the algal sHSPs. In previous analysis of plant sHSPs it was shown that these individual families contain conserved regions in the N-terminal domain (Waters 1995; Waters and Vierling 1999a, b). None of these conserved regions is seen in the algal alignment. Each of the algal sHSPs possesses the α-crystallin or HSP20 domain. This region is found in the C-terminal portion of the proteins and is found in all known sHSPs (from bacteria to archaea to animals and land plants). It contains the highly conserved “GVL” residues at positions 365–368.

It has been previously established that while sHSPs display considerable primary sequence diversity, there can be conservation of structural features among these proteins (van Montfort et al. 2002). In order to examine the conservation of structural features within the algal sHSPs, we included the known secondary features of sHSP 16.9b from Triticum aestivum in the alignment of algal sHSPs (Fig. 1). A comparison of the secondary structure predictions indicates that some but not all of the secondary structural features are shared within the algal sHSPs and between the algal sHSPs and the wheat sHSP. In the C-terminal domain of the sHSPs it is evident that most of the algal proteins share a β-sheet (β9) at the conserved GVL residues. This sheet is also found in the archaebacterial sHSP M.j.HSP16.5 (Kim et al. 1998). Other structural features are less conserved among the algal sHSPs; for example, sheets β4 and β5 of the wheat protein are conserved in some but not all algal sHSPs. Three C. reinhardtii proteins (C.r. HSP20, -22, and -30) have helices predicted in this region. It is also interesting to note that many of the algal sequences have a predicted helix where the wheat protein has sheet β8. The archaebacterial sHSP M.j.HSP16.5 also has a helix in this position, suggesting that the β8 sheet may be land plant specific.

Cellular Localization of Algal sHSPs

Some but not all of the algal sHSPs (C.r.HSP25A and -B, C.r.HSP30, C.r.HSP36.7, O.l.HSP20, O.l.HSP28, O.t.HSP23.6, O.t.HSP33, C.m.HSP27, T.p.HSP19, T.p.HSP20) have predicted α-helices in their N-terminal regions very near the start of the protein (Fig. 1). The presence of these helices raises the possibility of organelle localization (Peeters and Small 2001). With this in mind the algal sHSPs were examined in detail using three different methods of cellular localization prediction (Supplemental Table 1). In some cases there was a clear prediction of cellular location. Three algal proteins, C.r.HSP36.7, O.l.HSP28, and O.t.HSP33, were predicted to be targeted to the mitochondria (MT). However, the localization prediction methods were not consistent in their predictions of other proteins. The programs Predotar and PSORT predicted HSP25A and HSP25B from C. reinhardtii to be either CP- or MT-localized proteins. TargetP predicted that C.r.HSP25A and -B would be found in the CP (but not the MT). However, these predictions have relatively low reliability. The three methods also did not agree in their predictions of the location of C.r.HSP30. PSORT predicts a MT location, TargetP predicts with low probability a CP location, and Predotar predicts neither. The three methods also did not agree on the localization of HSP27 from the red alga C. merolae, but in each case MT localization was predicted with low probability. The three prediction methods suggested that O.t.HSP23 either is localized to the ER or is secreted. There was also no clear prediction for the locations of T.p.HSP19 and T.p.HSP20 (Supplemental Table 1). Each of the algal species has at least one sHSP (C.r.HSP22, C.m.HSP20, O.l.HSP20, O.l.HSP22, O.t.HSP21, and T.p.HSP16) that lacks an extended N-terminal region with a helix (Fig. 1). Most of these are predicted to be found in the cytosol (Supplemental Table 1).
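One way to make the notion of a "clear prediction" explicit is a strict agreement rule across the three programs, sketched below. The rule itself is an assumption for illustration and is not stated in the text; the function name is hypothetical. The C.r.HSP30 entry encodes the calls described above, whereas the second entry is a made-up input showing the unanimous case, not data from Supplemental Table 1.

```python
def consensus_location(predictions):
    """Return the shared location if every program makes the same call.

    `predictions` maps program name to a predicted compartment, or None
    when the program makes no organelle call. Anything short of unanimous
    agreement is reported as "unresolved" (an assumed, strict rule).
    """
    calls = list(predictions.values())
    if calls and all(call is not None for call in calls) and len(set(calls)) == 1:
        return calls[0]
    return "unresolved"


# Calls for C.r.HSP30 as described in the text.
crhsp30 = consensus_location({"PSORT": "MT", "TargetP": "CP", "Predotar": None})

# Hypothetical unanimous input, for illustration only.
unanimous = consensus_location({"PSORT": "MT", "TargetP": "MT", "Predotar": "MT"})
```

A stricter or looser rule (for example, a two-of-three majority, or weighting by each program's reliability score) would change which proteins count as resolved, which is one reason the predictions are best treated as provisional.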

Phylogenetic Analysis Reveals That the Algal sHSPs Are Not Members of the Land Plant sHSP Families

Phylogenetic analysis was performed on the C-terminal domain of the sHSPs. This region has the highest level of sequence similarity and has proved useful in a number of evolutionary studies of the sHSPs (Plesofsky-Vig et al. 1992; de Jong et al. 1993, 1998; Caspers et al. 1995; Franck et al. 2004; Fu et al. 2006). The goal of our analysis is to examine the relationship of the newly identified algal sHSPs to the well-studied land plant sHSP families, to other eukaryotic sHSPs, and to the bacterial sHSPs. We have included some nonphotosynthetic species (Plasmodium falciparum and Plasmodium yoelii, Trypanosoma brucei and Trypanosoma cruzi) that are members of early-diverging eukaryote groups, in addition to the photosynthetic algae of interest, to provide some depth to the tree. The Plasmodium sequences are interesting because this lineage contains a relict plastid or apicoplast (Waller and McFadden 2005). Searches of the two complete Plasmodium genomes found no evidence of CP-targeted proteins.

The evolutionary relationships of all of the algal sHSPs indicate that not only are the algal sHSPs not members of the land plant subfamilies, but also there are no algal sHSP subfamilies. None of the algal sHSPs is found within the land plant lineage of cytosolic and organelle-localized proteins. In addition, with the exception of the closely related Ostreococcus species (see JGI genome web site for more information; http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html), the algal sHSPs are closely related only to those sHSPs from the same species. It is significant that the algal sHSPs predicted to be MT localized do not form a subfamily and are not closely related to each other. It is also important to note that the lack of resolution in this tree in the deeper nodes (Fig. 2) reflects the lack of resolution in most, if not all, eukaryotic trees. The purpose of this study was not to use sHSPs to understand the phylogeny of eukaryotes but to understand how the sHSPs themselves have evolved.

Fig. 2

Phylogenetic relationships of algal small heat shock proteins. The tree is based on a Bayesian analysis of the 17 algal sHSPs with small heat shock proteins from land plants, other eukaryotes, and bacteria. Posterior probabilities, a measure of the reliability of the branches, are placed either below or above the branches (the highest probability is 1.0). The branch uniting the land plant sHSPs is marked with an arrow. The branch length reference is placed at the lower left of the tree. The tree was rooted by the Streptomyces albus sequence. The complete alignment and accession numbers for the proteins included in the analysis can be found in the Supplementary Materials. Sequences newly identified in this analysis are marked with an asterisk

Discussion

In this study 17 sHSPs were identified in five complete algal genomes. The algal species studied represent three evolutionarily distinct photosynthetic lineages: diatoms, red algae, and green algae. These five species are diverse in habitat, in genome size, and in their levels of adaptation to extreme conditions. Three of these species, Ostreococcus tauri, Ostreococcus lucimarinus, and Chlamydomonas reinhardtii, are members of the green plant or Viridiplantae lineage and are more closely related to land plants than are the diatom, Thalassiosira pseudonana, and the red alga, Cyanidioschyzon merolae. The Ostreococcus species are members of the Prasinophyceae. They are closely related to each other; both are extremely small organisms with small genomes and are found in marine environments (DOE JGI O. lucimarinus genome web page, http://genome.jgi-psf.org/Ost9901_3/Ost9901_3.home.html; Derelle et al. 2006). Ostreococcus lucimarinus is similar in many ways to O. tauri, but unlike O. tauri it is well adapted to high light intensities. Neither species is known to be adapted to extreme temperatures. Chlamydomonas reinhardtii is also a green alga and is also not known to be tolerant of extreme temperatures. The red alga Cyanidioschyzon merolae is also a small organism and, like the Ostreococcus species, has a small compact genome (Matsuzaki et al. 2004). This alga is adapted to extreme conditions of high temperatures and low pH (45°C, pH 1.5) (Matsuzaki et al. 2004). The diatom T. pseudonana is a marine alga and it is not known to be adapted to any extreme conditions. It does differ from the other species in that it possesses a secondary plastid while the red and green algae all have primary plastids. We can conclude from our analysis that the number and diversity of sHSPs are not correlated with adaptation to extreme conditions. In addition, even very small, compact genomes have sHSPs; therefore, these proteins are not dispensable in free-living photosynthetic organisms.

Algal sHSPs Are Variable in Sequence and Structure

Evolutionary analyses of the 17 algal sHSPs with their homologues in other species have identified both conserved and divergent sequence and structural features. These patterns of conservation suggest conservation of oligomeric structure but also suggest some functional divergence among the algal sHSPs. There are two known crystal structures of sHSPs, one from wheat, Triticum aestivum, and the other from Methanococcus jannaschii (Kim et al. 1998; van Montfort et al. 2001). Both of these proteins are oligomeric β-sandwiches whose basic building block is a dimer (Kim et al. 1998; van Montfort et al. 2001, 2002). It is notable that these two structures are almost superimposable even though the proteins share only 25% amino acid identity. Thus, even though the algal sHSPs share as little as 30% sequence identity, it is likely that all of these algal sHSPs also have a β-sandwich structure. Analysis of the alignment of algal sHSPs revealed a region of length heterogeneity in the C-terminal region between residue 290 and residue 340 in Fig. 1. This region between β5 and β7 is very important in dimer contacts (van Montfort et al. 2002) and is also known to be of variable length in other sHSPs (Caspers et al. 1995; de Jong et al. 1998). In the C.r.HSP30 protein there is a predicted helix in this region. This may result in a different dimer structure that could have structural and functional implications. Work done in mammalian α-crystallins suggests that variation in this particular region of the protein can result in a differently shaped dimer with an increased surface area (Feil et al. 2001; van Montfort et al. 2002). Increased surface area in this region could have important implications for the type and amount of substrate binding.

It is well established that the sHSPs are chaperone proteins, and the current model for sHSP chaperone activity is that higher temperatures lead to conformational changes which expose hydrophobic patches that are the sites for substrate binding. Whether or not different sHSPs have varying substrate specificities is still an open question. However, we do know that at least in HSP18.1 from pea (Pisum sativum) the substrate binding regions are found in both the N-terminal and the C-terminal regions (Lee et al. 1997). The substrate binding residues in the N-terminal region were found at the very beginning of the protein where there is no sequence conservation across sHSPs and little structural information. The C-terminal substrate binding region is located between β3 and β5 just before the region of sequence length heterogeneity discussed above. Among the algal sHSPs this region displays a high level of sequence divergence. The predicted secondary structures of the algal sHSPs in this region are also variable, with both strands and helices present in the structure predictions. It is then quite possible that while the algal proteins are all chaperones, the substrates for these proteins may vary.

Evolutionary History of the sHSPs

Our evolutionary analysis of the algal sHSPs clarifies some aspects of the evolution of the sHSPs but, at the same time, raises a number of interesting questions. First, our analysis indicates that the algal sHSPs are not members of the land plant sHSP subfamilies (Fig. 2). Because none of the algal sHSPs belong to the land plant organelle-localized sHSPs (whose cellular locations are known experimentally), it is not possible to use phylogenetic affinity to predict cellular location (Marcotte et al. 2000). In addition, due to the lack of experimental data for the organisms studied, at this time we must rely on bioinformatic methods to predict cellular location. The bioinformatic methods for cellular location prediction that we did employ are clearly useful but can sometimes be inaccurate (Emanuelsson and von Heijne 2001; Heazlewood et al. 2004). The results of these analyses must be seen, then, as predictions and not as confirmed locations. That said, from our analysis we can conclude that the red alga, Cyanidioschyzon merolae, as well as the green algae, Chlamydomonas reinhardtii and the Ostreococcus species, most likely have MT-targeted sHSPs. The presence of algal CP sHSPs is much less clear. None of the algae have sHSPs that possess clearly recognizable CP transit sequences. However, two C. reinhardtii proteins, C.r.HSP25A and C.r.HSP25B, are predicted to be found in the CP and, possibly, the MT.

The finding that the algal sHSPs do not form subfamilies, but instead proteins from each species are usually grouped together, is somewhat surprising. From this we can conclude that the sHSPs have had a very different evolutionary history than the HSP70s (Boorstein et al. 1994; Lin et al. 2001), a highly conserved chaperone family that also has organelle and cytosolic members. Further, the sHSP evolutionary history is very different from that of the cell cycle genes (Robbens et al. 2005) and the kinesin-like proteins (Abdel-Ghany et al. 2005). Phylogenetic analysis of both of these families found that the green algae and land plant proteins were closely related, with subfamilies containing both algal and land plant homologues (Robbens et al. 2005; Abdel-Ghany et al. 2005). It is also important to note that in a study of the heme biosynthesis pathway in photosynthetic eukaryotes, Obornik and Green (2005) found that most of the enzymes involved in this pathway are from the cyanobacterial ancestor of the plastid and were transferred to the nucleus early in chloroplast evolution. Importantly for our analysis, even T. pseudonana, which has a secondary plastid, contained the cyanobacteria-derived heme biosynthetic proteins. Thus, we can conclude that the patterns of evolutionary relationships among the algal, land plant, and bacterial sHSPs do not follow a typical pattern.

The lack of a clearly identifiable CP sHSP in the algae suggests physiological diversity between the algal and the land plant plastids. It is particularly interesting that the red alga Cyanidioschyzon merolae lacks a CP-localized sHSP, as it is adapted to high temperatures and contains a plastid (Matsuzaki et al. 2004). Both Ostreococcus genomes also lack genes for CP sHSPs, even though O. lucimarinus is adapted to high light intensities. It has been suggested that the angiosperm CP sHSP has an important role in maintaining the redox state of the chloroplast and may be important in the tolerance of angiosperms to oxidative stress (Neta-Sharir et al. 2005). We can thus conclude that the adaptation of O. lucimarinus to high light intensities (and the resulting oxidative stress within the chloroplast) and the adaptation of the C. merolae CP to the stress of high temperatures differ significantly from the adaptation of angiosperms to these conditions, because they were not accomplished with a protein similar to the land plant CP sHSPs. Beyond this, we can conclude that the CP-localized sHSP found in land plants is not common to all photosynthetic eukaryotes and must have evolved long after the chloroplasts evolved.

Analysis of the phylogenetic relationships of the sHSPs indicates that these proteins do not reflect the evolutionary history of the organelles. This in turn suggests the possibility of multiple origins of MT and CP sHSPs. The red and green algal plastids are derived from the primary plastid endosymbiont, a cyanobacterium (Keeling 2004; Yoon et al. 2004), and the plastids of diatoms are a product of a secondary endosymbiotic event involving a red alga (Keeling 2004; Yoon et al. 2004). It has been established that all MT and MT-like organelles have had a single endosymbiotic origin (Embley 2006). Therefore, if the organelle-localized sHSPs evolved from the endosymbionts, then CP-localized and MT-localized proteins should be closely related to their bacterial homologues, reflecting the relationships of these organelles with the primary endosymbionts (as has been demonstrated for numerous other proteins). It appears unlikely that a MT sHSP was derived from the MT endosymbiont and that this protein was subsequently lost in many lineages, including the fungi and animals. First, it has been established that MT evolved early in eukaryote evolution and that the transfer of MT genes to the nucleus also occurred early in eukaryote evolution (Germot et al. 1996; Embley 2006). In addition, even in cases like the FtsZ proteins, where the MT FtsZ genes have been lost in some lineages, analysis shows that when the MT FtsZ proteins are present they are clearly closely related to their bacterial homologues. Therefore, the best evidence against this scenario is that the plant and algal MT proteins are not closely related to the sHSPs from the bacterial lineage that gave rise to mitochondria.

Because the MT-targeted proteins from the red and green algae do not form a subfamily, it also seems unlikely that there was a single origin of the MT sHSPs within the algae. In the phylogenetic tree (Fig. 2) it appears that the MT-targeted proteins are more closely related to the cytosolic sHSPs found in the same genome than to the MT-targeted proteins of other species. Of course, gene conversion among all the sHSPs in a genome could homogenize the sequences and obscure phylogenetic relationships. However, the extremely high sequence divergence among the sHSPs within each genome, and the fact that such gene conversion events could remove the sequences necessary for proper localization within the cell, make this scenario highly unlikely. The evidence against either a single eukaryotic or a single algal MT-localized family of sHSPs leaves open the possibility of multiple origins of the MT-localized sHSPs.

If in fact there were multiple origins of the MT proteins, this raises still more questions about the origin of the CP-localized sHSPs. It is intriguing that HSP25A and HSP25B from C. reinhardtii may be dual-targeted (that is, present in both the MT and the CP). Of course, the exact determination of the cellular location of these two sHSPs by biochemical methods will be needed to confirm the bioinformatic predictions made here. That said, our results do raise the possibility that the CP proteins evolved from the MT proteins with an intermediate stage as dual-targeted proteins. Early in organelle evolution, endosymbiont genes encoding proteins needed in these organelles were transferred to the nucleus (Martin 2003a, b; Martin et al. 1998). At the same time, transport machineries and transit or targeting sequences evolved to get the now nuclear-encoded proteins into these organelles (Keegstra and Froehlich 1999; Reumann et al. 2005). These transport machineries evolved independently and are highly specific for each organelle. However, it has been noted that a few proteins can be imported into both organelles by possessing an ambiguous targeting sequence (Peters and Small 2001). Peters and Small (2001) note that these ambiguous targeting sequences often have few negatively charged residues but have many arginine, serine, leucine, and phenylalanine residues. Examination of the transit sequences of HSP25A and HSP25B suggests that these two proteins could be dual-targeted by possessing ambiguous targeting sequences. The presence of dual-targeted green algal sHSPs suggests a mechanism for the evolution of the land plant CP sHSPs.
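
The compositional pattern that Peters and Small (2001) describe for ambiguous targeting sequences (few negatively charged residues, many arginine, serine, leucine, and phenylalanine residues) can be illustrated with a minimal sketch. This is not the prediction method used in this study. The function name, the 40-residue N-terminal window, and the toy sequence are assumptions introduced only for illustration, and no classification threshold is implied.

```python
from collections import Counter

# Residues reported by Peters and Small (2001) as abundant in
# ambiguous (dual) targeting sequences: Arg, Ser, Leu, Phe.
ENRICHED = set("RSLF")
# Negatively charged residues (Asp, Glu), reported as rare in them.
ACIDIC = set("DE")


def presequence_composition(seq, n_term=40):
    """Summarize residue composition of the N-terminal window.

    n_term is an assumed window length, not a value from the study.
    Returns the window length and the fractions of enriched
    (R, S, L, F) and acidic (D, E) residues within that window.
    """
    window = seq[:n_term].upper()
    if not window:
        raise ValueError("empty sequence")
    counts = Counter(window)
    total = len(window)
    enriched = sum(counts[aa] for aa in ENRICHED) / total
    acidic = sum(counts[aa] for aa in ACIDIC) / total
    return {"length": total, "enriched_RSLF": enriched, "acidic_DE": acidic}


# Made-up toy sequence for illustration only; NOT a real sHSP presequence.
toy = "MLSRLFSRALRSSLFRAASE"
result = presequence_composition(toy, n_term=20)
print(result)
```

A summary of this kind only describes composition; whether a given sequence is in fact dual-targeted would still require dedicated prediction tools and, ultimately, experimental localization.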

The analysis of the land plant sHSPs indicates that the CP and MT families are closely related to each other but at the same time are distantly related to bacterial sHSPs (Waters 1995; de Jong et al. 1998; Waters and Vierling 1999a, b). This close relationship could be the result of a shared evolutionary history followed by gene duplication and subfunctionalization (Force et al. 1999, 2005; Lockton and Gaut 2005). The MT-localized proteins may have evolved first, after which ambiguous targeting sequences evolved, allowing CP targeting as well. This was followed by gene duplication, which led to the specialization of one protein for the MT and another for the CP. Experimental determination of sHSP localization is clearly needed, but this suggestion does provide a plausible and testable hypothesis concerning the evolution of the CP- and MT-localized sHSPs.

Conclusions

Homologues of the sHSPs were found in five algal genomes. Comparative and evolutionary analysis of amino acid sequences and the predicted secondary structural features of these proteins clearly identify them as members of this ancient and complex gene family. Our analysis indicates that the numbers and diversity of sHSPs are not correlated with adaptation to extreme conditions, and that even algae with very small genomes have sHSPs. There is also evidence for considerable structural as well as functional divergence among the algal sHSPs. It is notable that none of the algal sHSPs are members of the diverse subfamilies of sHSPs found in land plants, nor do the algal sHSPs form subfamilies based on cellular location. These findings provide further evidence that the CP and MT sHSPs did not originate from the endosymbionts that gave rise to the CP and MT.