Introduction

Ribosomal DNA is a very useful molecule for examining phylogenetic relationships among the many groups that emerged during the early evolution of eukaryotes, primarily because no other molecule has been sequenced as extensively. Improved methods of tree construction and better taxon sampling have revealed novel lineages and identified several significant new groupings (Bolivar et al. 2001; Dacks et al. 2001; Silberman et al. 2002; Cavalier-Smith and Chao 2003a, b). However, the dramatically variable rate and mode of rRNA evolution make rooting eukaryote rRNA trees by the outgroup method entirely unsound and often lead to artefactual grouping of nonsister organisms that share exceptionally rapid rates of rRNA evolution or other systematic biases (Philippe and Adoutte 1998; Philippe et al. 2000). Another problem is its lack of resolving power for the basal branchings of the main eukaryote groups, as seen in published trees, where the branching order in the backbone that connects the major groups is largely unresolved and their clustering is unreproducible, receiving no bootstrap support and often contradicted by other more robust evolutionary data (Cavalier-Smith and Chao 2003b). Since these problems have been widely recognized, our understanding of eukaryote phylogeny has seen some major changes (e.g. Roger 1999; Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2002).

Protein-coding genes allow independent testing of the tree and are increasingly used for phylogenetic studies. As with rRNA, systematic biases among lineages and random errors must be considered. The best way to reduce both problems is to use more than one protein for phylogenetic reconstruction. Several such studies using concatenated protein sequences have greatly improved resolution in some parts of the eukaryote tree (Baldauf et al. 2000; Moreira et al. 2000; Bapteste et al. 2002; Lang et al. 2002). However, the value of protein sequences is still limited by restricted taxon sampling, despite recent increases in such studies (e.g. Archibald et al. 2001; Edgcomb et al. 2001; Fast et al. 2001; Keeling 2001; Dacks et al. 2002; Simpson et al. 2002). The problem of rooting the eukaryote tree is more difficult, because even when bacterial outgroups have homologous proteins they are usually so divergent as to cause artefactual long-branch rooting. The location of the root can in principle be solved independently of such biases by the accumulation of data on shared derived characters such as indels and gene fusions that can be used to exclude the root from particular parts of the tree (Baldauf and Palmer 1993; Stechmann and Cavalier-Smith 2002).

The current picture of eukaryote phylogeny based on cladistic reasoning using such discrete molecular evidence and cell biological data as well as sequence trees from many molecules shows a major split into two branches, one leading to the opisthokonts (animals, Choanozoa, and fungi) and a much more diverse one containing plants, chromalveolates, excavates, Cercozoa, Retaria (Foraminifera, Radiolaria), Apusozoa, and Heliozoa, collectively called bikonts (Cavalier-Smith 2002; Simpson and Roger 2002; Stechmann and Cavalier-Smith 2002; Cavalier-Smith and Chao 2003a, b). A gene fusion found in all bikont groups but in no opisthokonts led to the conclusion that the root of the eukaryote tree is probably between these two major clades (Stechmann and Cavalier-Smith 2002). It is currently uncertain whether Amoebozoa are sisters of opisthokonts or lie on the bikont side of the tree or even branch below this major bifurcation (Simpson and Roger 2002; Stechmann and Cavalier-Smith 2002; Cavalier-Smith and Chao 2003b). The exact order of emergence of the major bikont groups also remains unresolved (Simpson and Roger 2002; Cavalier-Smith and Chao 2003b).

Even though data from scores of proteins will probably be needed to determine the branching order of many groups robustly (Bapteste et al. 2002), it is also important to increase the range of taxa sampled for several well-selected proteins to explore the full range of eukaryotic diversity and because increased taxon sampling itself tends to increase tree quality. The heat-shock protein Hsp90 is a highly conserved molecular chaperone of about 700 amino acids, which is required for folding numerous key proteins including many involved in cell cycle and other regulation (Young et al. 2001) in all organisms except archaebacteria (eubacterial homologs are called HtpG). Hsp90 offers particular promise as a universal eukaryotic phylogenetic marker as it is longer than several others that have been used and appears to evolve relatively uniformly in different lineages. Hsp90 homologs are present in the cytosol of all studied eukaryotes, with a different paralog in the endoplasmic reticulum (ER) in all except fungi, which appear to have lost it. The chloroplast Hsp90 of flowering plants appears to be derived from the ER version (Krishna et al. 2001; Emelyanov 2002); the mitochondrial version is still more divergent, its exact origin being unclear (Emelyanov 2002) and it is peculiar in being located both within mitochondria and in the cytosol or even nucleus (Cechetto and Gupta 2000; Morita et al. 2002). The cytosolic version is therefore more suitable for global eukaryotic phylogenies, though the ER version may help root the tree as it is less divergent than bacterial ones. A few phylogenetic studies involving Hsp90 have been carried out, mainly adding more sequences to already sampled groups (see Gupta 1995; Archibald et al. 2001) or for more detailed analyses of some parts of the tree (Alveolata: Fast et al. 2002; Euglenozoa: Simpson et al. 2002).

We have amplified the cytosolic Hsp90 gene from six widely different, previously unsampled eukaryote groups in order to carry out a global eukaryote phylogenetic analysis and contribute to the growing database of protein-coding genes of phylogenetic utility. We sequenced seven Hsp90 genes from three chromist and three protozoan phyla: Apusozoa (represented by Amastigomonas marina), for which this is the first nuclear protein-coding sequence suitable for making a sequence tree; Choanozoa (represented by Corallochytrium); and Cercozoa (represented by Thaumatomonas). Apusozoa are of particular interest for eukaryote evolution because they usually branch relatively early within the eukaryote rRNA tree, but their precise position is uncertain (Atkins et al. 2000; Cavalier-Smith and Chao 2003b). Our trees place Amastigomonas well within the bikonts but do not support a particularly early branching for Apusozoa. Our Hsp90 trees all robustly confirm that Choanozoa are sisters to animals and consistently support the monophyly of opisthokonts. However, even though Hsp90 seems less prone to systematic biases among lineages than most other molecules, it apparently does not contain enough information to resolve the basal branching order of bikonts, which are monophyletic on likelihood and distance trees, although the trees correctly recover many established groups.

Materials and Methods

PCR Amplification and Cloning of Cytosolic Hsp90 Genes

All DNA samples were provided by E. E-Y Chao, isolated as previously reported (Cavalier-Smith and Chao 1995). Table 1 lists organisms and their origin used in this study. Degenerate PCR primers were constructed based on an amino acid alignment kindly provided by J. Archibald (UBC). The forward primer hsp90IF was designed based on the conserved amino acid region DALDKIR (GAC GCT CTG GAY AAR ATH MG). The two reverse primers hsp90IIIR and hsp90IVR were designed on the basis of the conserved regions MKAQALR (CTC AGA GCY TGN GCY TTC AT) and MEEVD (GTC NAC YTC YTC CAT), respectively, the latter of which marks the C-terminus of Hsp90 proteins. PCRs were carried out in two rounds: initial PCRs were done using the hsp90IF and hsp90IVR primers. The products of these reactions were used for a seminested PCR-reamplification using hsp90IF and hsp90IIIR as primers. From the two haptophytes partial Hsp90 sequences were obtained using the following PCR primers, based on our Goniomonas Hsp90 sequence: Gonio-hsp90IIF (GGA CCA GCT CGA GTA CCT CGA), corresponding to positions 417–437 in the Goniomonas Hsp90 sequence, and Gonio-hsp90IIR (GTG TCC TCG TGG ATG CCG AGC), corresponding to positions 1140–1160. All amplifications were carried out using 50–200 ng genomic DNA or 1 µL of a 1:10 dilution of initial PCR product for the seminested PCR, 0.25 µM primer, 0.2 mM dNTPs, 2.5 mM MgCl2, 1 × PCR Buffer, and 1.5 units Taq-Polymerase (Invitrogen). Cycling conditions were as standard with 48–55°C annealing temperatures, depending on the primers used. All products were gel-purified before being cloned into the pCR2.1-TOPO vector using the TOPO TA cloning kit (Invitrogen). Several clones from each transformation were analyzed and inserts were sequenced using dye terminators and separated on an automated ABI-377 sequencer. Both strands were sequenced using the primer walking method.
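The degenerate primers listed above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original protocol: it expands the standard IUPAC ambiguity codes to count how many distinct oligonucleotides each primer pool contains and reverse-complements the reverse primers so that they can be read in the coding direction. The function names (`degeneracy`, `reverse_complement`) are hypothetical helpers introduced only for illustration.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Complement of every IUPAC code (for example Y = C/T pairs with R = G/A).
COMPLEMENT = str.maketrans("ACGTRYMKSWHBVDN", "TGCAYRKMSWDVBHN")


def clean(primer: str) -> str:
    """Remove the codon spacing used in the text and upper-case the primer."""
    return primer.replace(" ", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct sequences present in a degenerate primer pool."""
    return prod(len(IUPAC[base]) for base in clean(primer))


def reverse_complement(primer: str) -> str:
    """Reverse complement that keeps IUPAC ambiguity codes intact."""
    return clean(primer).translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text (5' to 3').
primers = {
    "hsp90IF": "GAC GCT CTG GAY AAR ATH MG",
    "hsp90IIIR": "CTC AGA GCY TGN GCY TTC AT",
    "hsp90IVR": "GTC NAC YTC YTC CAT",
}

for name, seq in primers.items():
    print(name, "degeneracy:", degeneracy(seq))

# Reading the reverse primer on the coding strand shows the MEEVD codons.
print(reverse_complement(primers["hsp90IVR"]))
```

Read on the coding strand, hsp90IVR becomes ATG GAR GAR GTN GAC, which matches the MEEVD motif codon by codon; the same check applied to hsp90IIIR gives the codons for MKAQAL followed by the first two bases of the arginine codon.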

Table 1 List of organisms for which Hsp90 sequences were obtained in this study

Sequence Analysis and Phylogenetic Methods

Sequences were analyzed for open reading frames using BioEdit 5.0.6 (Hall 1999) and identified with the BlastX program. Putative introns were manually detected and analyzed. The Thaumatomonas Hsp90 sequence contains two putative introns at positions 461–558 (89 nucleotides) and 788–862 (75 nucleotides) both with canonical GT-AG splice sites. The partial Hsp90 sequence of Prymnesium also contains a putative intronic sequence at position 152–252 (100 nucleotides) also showing GT-AG splice sites. After exclusion of the intronic sequences single ORFs were found for Thaumatomonas and Prymnesium and confirmed as Hsp90 using BlastP. Additional Hsp90 sequences were taken from Genbank. Multiple alignments were by ClustalX (Thompson et al. 1997) using default options and then refined manually with the GDE program (Smith et al. 1994). This initial alignment contained 57 eukaryotic cytosolic Hsp90 sequences, 12 homologs from the endoplasmic reticulum and 11 prokaryotic Hsp90 sequences. After removal of regions where more than one species had gaps and ambiguously aligned sites from the central acidic region of the gene (see Young et al. 2001) the alignment contained 512 positions. Phylogenetic analyses were carried out using Phylip 3.6a3 (Felsenstein 2002). Distances were calculated using ‘protdist’ with the JTT model (Jones et al. 1992) of amino acid sequence evolution. Trees were constructed using the minimum evolution (ME) (Rzhetsky and Nei 1992) and Fitch-Margoliash least squares method (FM) (Fitch and Margoliash 1967) provided in the Phylip package and with the BIONJ (Gascuel 1997) program. Both the ME and FM calculations were done with ten input order jumbles and global rearrangements. Bootstrap analyses used 100 replicates, but with only one random addition sequence for each because of computational constraints. 
To allow for rate variation across sites, all methods were run using a gamma distribution (Yang 1995) with eight categories and relative rates and probabilities estimated by the program. The parameter alpha was estimated by Tree-Puzzle 5.1 (Schmidt et al. 2002) to calculate the coefficient of variation of rates (CV) among amino acid positions ($\mathrm{CV} = 1/\sqrt{\alpha}$).
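The relationship between the gamma shape parameter and the CV can be made concrete. For a gamma distribution of relative rates with mean 1 and shape α, the variance is 1/α, so the coefficient of variation is the reciprocal of the square root of α. The short Python sketch below assumes nothing beyond that formula; the function name `rate_cv` is a hypothetical helper, and the α values are those reported in the legends of Figs. 1 to 3.

```python
import math


def rate_cv(alpha: float) -> float:
    """Coefficient of variation of gamma-distributed rates with shape alpha.

    With mean 1 the variance is 1/alpha, hence CV = 1/sqrt(alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


# Alpha values quoted in the figure legends of this study.
for alpha in (0.9, 0.75, 0.53):
    print(f"alpha = {alpha}: CV = {rate_cv(alpha):.2f}")
```

Smaller α values correspond to a larger CV, that is, to stronger rate heterogeneity among amino acid positions.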

In order to use more rigorous methods of tree reconstruction, the long branching prokaryotic sequences and ER homologs were removed from the alignment as well as some of the eukaryotic sequences, where more than two representatives from one group or near identical sequences were present. In this alignment of 46 sequences, the most conserved part of the central acidic region was included resulting in 519 unambiguously aligned positions. To test possible effects of the outgroup sequences (here: ER homologs) on the branching order of the eukaryotic ingroup, i.e. their possible root, we also calculated trees from the 512 position amino acid alignment, where the prokaryotic sequences were excluded as well as some more eukaryotic sequences, to keep the number (50) low enough for a maximum likelihood calculation. Distance trees were calculated as described above. In addition, maximum likelihood trees were calculated with ‘ProML’ in Phylip using the JTT amino acid substitution model. To allow for rate variation among sites, the hidden Markov model (Felsenstein and Churchill 1996) was used, rate variation among sites being estimated through a gamma distribution using eight categories. The jumble option was used to randomize the input order of the taxa and the global rearrangement option was used to optimize the resulting tree. Bootstrap analyses used 50 replicates. Parsimony trees were calculated with PAUP*4.0b10 (Swofford 1998) using full heuristic searches with 5000 separate random stepwise additions of sequences and TBR for optimizing results. For bootstrap analysis, 1000 resamplings with single jumbles were carried out.
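The bootstrap analyses mentioned above rest on resampling alignment columns with replacement. The following Python sketch illustrates that single step under simplifying assumptions; it is not the Phylip or PAUP* implementation, and the function name `bootstrap_alignment` and the toy alignment are hypothetical. Each pseudo-replicate would then be passed to the chosen tree-building method, and the proportion of replicate trees containing a given group gives its bootstrap percentage.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap pseudo-replicate of an alignment.

    Columns are drawn with replacement until the replicate has the same
    number of columns as the original; every taxon receives the same columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "taxonA": "MEEVDKIRAL",
    "taxonB": "MEEVDKLRAL",
    "taxonC": "MDEVEKIRSL",
}

rng = random.Random(1)  # fixed seed so the example is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), "replicates of", len(replicates[0]["taxonA"]), "columns")
```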

Results

Distance trees derived from all 80 sequences were rooted using the prokaryotic Hsp90 sequences (Fig. 1). All show three major clades: prokaryotic Hsp90 sequences, eukaryotic ER/chloroplast Hsp90 homologs, and cytosolic Hsp90 sequences, all with bootstrap support of ≥90% (except in ME, where all three groups are below 50% bootstrap support and FM where ER Hsp90 has only 76% support). Mitochondrial sequences were omitted as there are very few and their even longer branches put them in an indeterminate position (Emelyanov 2002). The cytosolic and ER/chloroplast versions form a reasonably well supported clade, in accordance with the views that the cytosolic and ER Hsp90 originated as a gene duplication in the ancestral eukaryote and that the chloroplast version arose later from a duplicated ER Hsp90 gene (Emelyanov 2002). In addition to its restricted taxon sampling, the ER/chloroplast subtree is confused by the apparent long branches of the Arabidopsis chloroplast sequences (see also legend of Fig. 1) and of the Cryptosporidium sequence and is not currently useful for eukaryote phylogeny. By contrast, in the cytosolic Hsp90 subtree, although there are closely related paralogs in Achlya, and somewhat more distant ones in vertebrates and within flowering plants (one duplication appears to precede the dicot/monocot split; not all paralogs are included in our trees), these do not affect the deeper parts of the tree, and the overall branch lengths are much more uniform than in other molecules commonly used for phylogeny. This subtree is rooted between opisthokonts (animals, Choanozoa, and fungi) and the bikont/Amoebozoa grouping, with moderate and low bootstrap support respectively, which fits the gene fusion and indel evidence for the position of the eukaryote root (Stechmann and Cavalier-Smith 2002).

Figure 1

Distance tree of 80 Hsp90 sequences using 512 amino acid positions. The tree was derived using the least squares method and a gamma distribution (α = 0.9). Numbers at nodes or after group names represent bootstrap percentages (100 replicates) obtained from different distance methods (BioNJ, Fitch-Margoliash, Minimum Evolution). Due to insufficient space, not all bootstrap percentages are shown. The A. thaliana NM_11656 sequence in our tree corresponds to the A. thaliana AAF21187 sequence in the tree by Emelyanov (2002) which in turn does branch with the chloroplast Hsp90 sequence of Secale cereale (Schmitz et al. 1996; Emelyanov 2002). We think that both the A. thaliana sequences shown here are possible chloroplast Hsp90 sequences and therefore are labeled as chloroplast homologs. Hsp90 sequences obtained in this study are shown in bold.

The new choanozoan Hsp90 sequence (Corallochytrium) branches as sister to the animals, with moderate to very high bootstrap support. Animals and fungi are each robustly monophyletic.

The bikont part of the tree contains well established groups together with our other new sequences. The cercozoan Thaumatomonas branches as sister to the kinetoplastid Euglenozoa in all three analyses, with low bootstrap support (unfortunately the size limitation of Fig. 1 makes this branching order appear unresolved). This relationship is consistent with a single secondary symbiogenetic origin of plastids in euglenoids and Rhizaria as postulated by the cabozoan theory (Cavalier-Smith 1999; Cavalier-Smith and Chao 2003b). The Guillardia theta nucleomorph sequence (representing the red alga from which the nucleomorph evolved: Douglas et al. 2001) often fails to group with the green plants, as might have been expected, and forms a poorly supported cluster with Naegleria and Dictyostelium at the base of the bikont/Amoebozoa part of the tree. Moreira et al. (2000) also found that Hsp90 was one among several proteins that did not recover the monophyly of kingdom Plantae in single gene trees, even though it was strongly supported using concatenated protein sequences. In one analysis (ME), however, the nucleomorph sequence did branch with the green plants and Goniomonas, albeit receiving no support at all (not shown). The sequence of the colorless cryptophyte Goniomonas branches weakly with the green plants. The new Ochromonas Hsp90 sequence groups firmly with the two Achlya sequences as a heterokont clade, which surprisingly weakly groups with the apusozoan Amastigomonas, this cluster in turn weakly grouping with Goniomonas (the only other chromist) and the green plants. Bootstrap support for the positions of all four new bikont sequences is low to moderate in all analyses.

Figure 1 shows that the branches for the prokaryotic Hsp90 sequences are about ten times longer and the eukaryotic ER sequences about three to four times longer than for the cytosolic homologs. To reduce possible influences of the long-branch prokaryotic outgroup sequences on these weakly supported positions, further analyses, including maximum likelihood, were carried out after excluding them (see Fig. 2). Finally, ER sequences were also removed, which made the dataset more homogeneous in amino acid composition as tested by the chi-square test implemented in Tree-Puzzle, where all remaining 46 sequences passed the test. Recently we showed that the root of the eukaryote tree probably lies between opisthokonts and bikonts (Stechmann and Cavalier-Smith 2002), so we have rooted the trees in this way. Interestingly, when the long branch outgroups are excluded (prokaryotes excluded: Fig. 2; ER homologs also excluded: Fig. 3) the FM distance trees show bikonts as holophyletic, in contrast to Fig. 1 where the amoebozoan Dictyostelium is weakly placed among them. The maximum likelihood tree (Fig. 4) also shows bikonts as monophyletic, but the parsimony tree (Fig. 5) does not, as it weakly groups Dictyostelium with Naegleria. Dictyostelium also branched with Naegleria using ME or with Naegleria and Guillardia theta using BioNJ, but without bootstrap support (not shown).
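The idea behind a chi-square test of amino acid composition can be sketched as follows. This is a simplified Python illustration, not the Tree-Puzzle code: it assumes that each sequence is compared with the pooled composition of the whole dataset, whereas the program's own choice of expected frequencies may differ. The names `pooled_frequencies` and `composition_chi2` and the toy sequences are hypothetical; the constant is the tabulated 5% critical value for 19 degrees of freedom, which applies when all 20 amino acids are present.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Tabulated 5% critical value of chi-square with 19 degrees of freedom.
CHI2_CRIT_DF19_P05 = 30.144


def pooled_frequencies(sequences):
    """Amino acid frequencies pooled over all sequences (gaps and X ignored)."""
    counts = Counter(c for seq in sequences for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_chi2(sequence, expected_freqs):
    """Chi-square statistic of one sequence against expected frequencies."""
    counts = Counter(c for c in sequence if c in AMINO_ACIDS)
    n = sum(counts.values())
    stat = 0.0
    for aa in AMINO_ACIDS:
        expected = n * expected_freqs.get(aa, 0.0)
        if expected > 0:  # skip residues absent from the reference composition
            stat += (counts[aa] - expected) ** 2 / expected
    return stat


# Toy data (hypothetical sequences, for illustration only).
seqs = {
    "taxonA": "MEEVDKIRALDKEEVAQ",
    "taxonB": "MEEVDKLRALDREEVAQ",
    "taxonC": "MDEVEKIRSLDKEEIAQ",
}
freqs = pooled_frequencies(seqs.values())
for name, seq in seqs.items():
    stat = composition_chi2(seq, freqs)
    verdict = "passes" if stat < CHI2_CRIT_DF19_P05 else "fails"
    print(f"{name}: chi2 = {stat:.2f} ({verdict} at the 5% level)")
```

A sequence whose composition deviates strongly from that of the rest of the dataset yields a large statistic and would be flagged as a potential source of compositional bias.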

Figure 2

Distance tree of 50 Hsp90 sequences using 512 amino acid positions but excluding bacteria. The tree was derived using the least squares method and a gamma distribution (α = 0.75). Numbers at nodes are bootstrap percentages obtained with BioNJ, Fitch-Margoliash and Minimum Evolution (100 replicates). Due to the limited space, not all bootstrap percentages are shown and bootstrap values below 50% are omitted (mainly at the backbone of the tree). In a maximum likelihood analysis using the same alignment (not shown), the Dictyostelium sequence branches with the G. theta nucleomorph sequence and the main bikont groups emerge in a different order than the one shown here. Note also that bikonts are holophyletic in this figure, which does not show due to limitations in the figure size.

Figure 3

Distance tree of 46 cytosolic Hsp90 sequences using 519 amino acid positions. The tree was derived using the least squares method and a gamma distribution (α = 0.53). Numbers at nodes represent bootstrap percentages (100 replicates) obtained with the BioNJ, Fitch-Margoliash and Minimum Evolution methods. Bootstrap values below 50% are not shown. Due to insufficient space, bootstrap values within the bodonids and green plants are left out.

Figure 4

Protein maximum likelihood tree of the 46 taxa dataset using 519 positions. Bootstrap percentages (50 replicates) are not shown below 50%.

Figure 5

Maximum parsimony tree of 48 taxa, including the partial haptophyte sequences using 519 positions. The data missing from the two partial haptophyte sequences were coded as X, but genuinely missing amino acids within the regions sequenced were treated as a 21st state. Bootstrap percentages above 50% are shown.

In marked contrast to previous taxonomically more restricted studies (Gupta 1995; Archibald et al. 2001) all five analyses showed the opisthokonts as a monophyletic group, with moderate to strong bootstrap support ranging from 53 to 82% (Figs. 3, 4, 5). Within the opisthokonts, the choanozoan sequence always branches with strong to very strong bootstrap support as sister to the animals. The two partial Hsp90 sequences from the haptophytes Prymnesium and Pavlova were included only in the parsimony tree (Fig. 5), as this method alone is known to cope well with missing data (Baldauf et al. 2000). They grouped weakly with Goniomonas, in accord with the classification of both groups in the Chromista, but the third chromist group—the heterokonts—did not group with them, but was weakly sister to alveolates, as often observed on rRNA and other protein trees (Baldauf et al. 2000). However the heterokont/alveolate branch was quite close by, and the separation insignificant. In separate distance analyses of a shortened alignment, restricted to the 211 alignable positions available for haptophytes (not shown), the haptophytes cluster with heterokonts and green plants, but with no bootstrap support. Alveolates were holophyletic in all trees except the one restricted to 211 positions.

The lack of resolution within the bikonts is also shown by their differing branching order in other respects between the distance, likelihood, and parsimony trees. The apusozoan Amastigomonas branches with low bootstrap support as sister to heterokonts in most distance trees (Fig. 3), as in the analysis with more sequences (Fig. 1), but is weakly sister to the Guillardia nucleomorph in the FM (Fig. 2), ML (Fig. 4), and parsimony (Fig. 5) analyses. The cercozoan Thaumatomonas is weakly sister to Euglenozoa in distance (Figs. 1 and 3) and parsimony trees (Fig. 5) and in both analyses (distance and ML, Fig. 2) using ER sequences as outgroup; but in the ML analysis, it is sister to all eukaryotes except the discicristates, which are sisters to all other bikonts (not shown). The cryptophyte Goniomonas is sister to green plants on distance (Figs. 1, 2, 3) and ML trees that exclude haptophytes (Fig. 4), but to haptophytes on the parsimony tree (Fig. 5), and to the Euglenozoa or haptophytes in restricted-position distance trees (not shown). The percolozoan Naegleria is sister to the Euglenozoa, forming a discicristate clade, only with ML (Fig. 4); with distance it branches at the base of green plant/cryptophyte/rhodophyte cluster (Fig. 3) and with parsimony as sister to Dictyostelium (Fig. 5).

Discussion

The strongest conclusion from our analyses is that the choanozoan Corallochytrium groups firmly as sister to the animals, further supporting the finding that Choanozoa are closer to animals than to fungi (Snell et al. 2001; Lang et al. 2002; Cavalier-Smith and Chao 2003b). Corallochytrium is one of the few new sequences that never changed position with the different methods and different alignments. Our results also give further support to the opisthokont clade, which has moderate to strong support when distant paralogs are excluded, but weaker support when they are not. Our ML and FM results, which show a holophyletic bikont grouping, are consistent with the view that the eukaryote tree is best thought of as comprising three major clades, the opisthokonts, the bikonts, and the Amoebozoa, with the branching order among them not yet settled (Cavalier-Smith 2002; Simpson and Roger 2002; Stechmann and Cavalier-Smith 2002). We attribute the occasional grouping of Dictyostelium with Naegleria and/or G. theta to tree reconstruction artefacts, possibly caused by the fact that each of these three branches is restricted to a single taxon, making it long and unbroken by closer relatives that might cancel out any systematic biases it may cause.

It is very interesting, however, that when the long-branch prokaryotes are removed, Dictyostelium groups as sister to the opisthokonts in the FM distance tree (Fig. 2), which agrees with a tree based on concatenated mitochondrial proteins, where there is very high bootstrap support for an opisthokont/amoebozoan clade as well as for the monophyly of bikonts and opisthokonts (Lang et al. 2002). Our trees show that the position of Dictyostelium is unreliable and seems to change with different taxon sampling for the outgroup and the distance criterion used. Moreover, on the single ML tree that was computationally feasible to calculate for the ER rooted dataset, Dictyostelium branched within the bikonts; possibly more extensive taxon sampling for Hsp90 would help stabilize the trees and provide further evidence for bikont and amoebozoan monophyly. However, the agreement in the position of the eukaryote root between the concatenated mitochondrial protein tree and the nuclear Hsp90 tree may mean that the actual position of the eukaryote root is between the bikonts and an opisthokont/amoebozoan clade. In the case of the mitochondrial tree, the opisthokont and amoebozoan branches are both longer than the bikont ones, which raises the possibility that the grouping was a long branch artefact on the mitochondrial tree. In the case of cytosolic Hsp90, however, there is no significant difference in the branch lengths between the opisthokonts, bikonts, and Amoebozoa. Therefore there is less reason to doubt the reality of the opisthokont/Amoebozoa clade than for the mitochondrial data, even though the ER/chloroplast outgroup is distinctly longer. The mitochondrial tree, however, has the advantage that the branch lengths of the outgroup are relatively shorter than for Hsp90. The congruence of the two trees therefore is probably significant. As both opisthokonts and Amoebozoa ancestrally had single cilia (Cavalier-Smith 2002) we designate this putative clade the unikonts.
Further sequencing of amoebozoan Hsp90s is needed to test whether the eukaryotic root is really between the unikonts and bikonts, as we now suspect.

The firm grouping of the chrysomonad alga Ochromonas (phylum Ochrophyta) with the oomycete Achlya (phylum Bigyra: Cavalier-Smith 1997) supports all previous evidence for the close relationship between these two phyla within the heterokont clade. The failure of all chromalveolates (alveolates plus chromists: Cavalier-Smith 1999, 2003) to group together, even though different pairs of them often do (e.g. Cryptophyta/Haptophyta; Alveolata/Heterokonta), is consistent with other single-gene trees such as 18S rRNA, and provides further evidence for the view that the divergence between the three chromist groups occurred so early after the divergence of chromists and alveolates that single-gene sequences contain insufficient data to resolve their correct branching order (Cavalier-Smith 1999, 2003). Given the firm evidence from other sources (Fast et al. 2001; Cavalier-Smith 2002, 2003) for the single symbiogenetic origin of the chromalveolate plastid from one red alga, it should not be misconstrued as evidence against their monophyly (see more detailed discussion of this with respect to the similarly poorly resolved 18S rRNA trees: Cavalier-Smith and Chao 2003b). The weak grouping of Goniomonas with haptophytes is biologically more reasonable than the probably artefactual grouping of cryptophytes with glaucophytes often seen on rRNA trees. However, at present only concatenated protein trees (Yoon et al. 2002) have recovered (and very robustly) the chromobiote clade comprising haptophytes and heterokonts, long predicted by cell biological and evolutionary arguments (Cavalier-Smith 1986, 1994, 2003).

Early rRNA studies had suggested that on unrooted trees Apusozoa were somewhat closer to the opisthokonts than the other bikonts and also closer to opisthokonts than Amoebozoa (Cavalier-Smith 1998a, 2000; Cavalier-Smith and Chao 1995). More recent studies with larger taxon sampling and more sophisticated analysis leave the relative position of Apusozoa and Amoebozoa more ambiguous and essentially unresolved given the uncertainties in the trees; nonetheless, they weakly tend to suggest that Apusozoa may be among the earliest diverging bikonts (Cavalier-Smith 2002; Cavalier-Smith and Chao 2003b). Our Hsp90 sequence from Amastigomonas is the first apusozoan protein sequence suitable for making trees. It provides no support for the idea that Apusozoa are closer to the opisthokonts than are the Amoebozoa or that they are especially early diverging bikonts (Cavalier-Smith and Chao 2003b). While in our trees they are always nested well within the bikonts, there is no consistent position and no strong support for any of them. Probably numerous protein sequences and multiple taxon sampling will be needed to establish their relationships. Our present data neither support nor argue against their recent classification with Heliozoa, Cercozoa, and Retaria as the Rhizaria (Cavalier-Smith 2002) or the possibility that centrohelid heliozoa may be their closest relatives (Cavalier-Smith and Chao 2003a). They are however consistent with the evidence that apusomonads have the derived dihydrofolate reductase-thymidylate synthase (DHFR-TS) gene fusion that characterizes the bikonts (Stechmann and Cavalier-Smith 2002). If, as we suspect, Amoebozoa truly lack this fusion, they must be more distant from the bikonts than are Apusozoa.
Thus both the DHFR-TS gene fusion and the ML and FM Hsp90 trees support the monophyly of the bikonts and the inclusion of Apusozoa within the bikonts (Cavalier-Smith 2002) and contradict the (weakly supported) topology of early rRNA trees with respect to the branching order of Apusozoa and Amoebozoa.

Cercozoa are an important and very diverse protozoan phylum (Cavalier-Smith 1998a, b; Cavalier-Smith and Chao 2003b) for which there is very little protein sequence data; the only previous protein trees that include any are for tubulin and actin (Keeling et al. 1998; Keeling 2001), both of which are known to have very strong systematic biases in certain lineages (Philippe and Adoutte 1998). It has been proposed that Cercozoa are sisters of the even more recently established phylum Retaria (Foraminifera and Radiolaria; Cavalier-Smith 1999), the putative clade comprising them being designated ‘core Rhizaria’ (Cavalier-Smith 2003). Such a clade is currently weakly supported by actin (Cercozoa and Foraminifera; Keeling 2001) and rRNA trees (Cercozoa and Retaria; Cavalier-Smith and Chao 2003a, b), and strongly by a polyubiquitin insertion (Cercozoa and Foraminifera; Archibald et al. 2003). Unfortunately our sequence for the cercozoan Thaumatomonas does not yet provide an additional test for the core Rhizaria: to do this sequences are needed also from Retaria, which mostly cannot be cultured. According to the cabozoan hypothesis (Cavalier-Smith 2002) core Rhizaria are sisters of excavates (Simpson and Patterson 2001) and the green plastid of chlorarachnean Cercozoa and euglenoid excavates was acquired by a single enslavement event in their common ancestor (Cavalier-Smith 1999, 2003). Unfortunately the Hsp90 trees are insufficiently robust with respect to the critical branchings to test this hypothesis, but they do not clearly contradict it.

The contradictions between the three major phylogenetic methods with respect to the branching order of the bikonts simply indicate that there is insufficient phylogenetic information in this molecule to resolve them reliably. The maximum likelihood tree agrees with the results from concatenated gene trees (Bapteste et al. 2002; Baldauf et al. 2000) better than do the parsimony or most distance trees in three respects, i.e. in showing monophyly of bikonts and discicristates, and showing heterokonts as related to alveolates. It does not have any strongly supported branches that are contrary to other evidence. This fact, coupled with its relatively uniform branch lengths and amino acid composition, suggests that cytosolic Hsp90 may be giving a reasonably accurate picture of eukaryote evolution, especially with gamma-corrected ML trees. The relatively uniform branch lengths of cytosolic Hsp90 may be a consequence of its need to interact with such a great variety of other proteins (Archibald et al. 2001). In animals it is required for a vast array of processes ranging from cell cycle control via the centrosome (De Carcer et al. 2001), through mitochondrial protein import (Young et al. 2003), to the stability of epigenetic inheritance via chromatin replication (Sollars et al. 2003); disrupting it by mutation or phenotypically can cause a huge variety of morphological and other changes (Rutherford and Lindquist 1998; Queitsch et al. 2002).

Its relatively uniform mode of evolution suggests that it would be worthwhile obtaining several Hsp90 sequences from each protist phylum to test further the picture emerging from actin, tubulin and rRNA trees, all of which seem to be much more prone to lineage-specific biases. However, the lack of robust resolution of basal branchings among bikonts with all single-gene trees suggests that it will be necessary to combine data from at least several protein-coding genes of comparable length and phylogenetic quality to get consistently reliable trees. Even though it may be necessary to sequence scores of protein genes to get robust support (Bapteste et al. 2002) if they are randomly chosen, it seems possible that selecting fewer than ten long eukaryotic genes, which like Hsp90 evolve relatively uniformly across all taxa, might enable one to generate reasonably sound trees. Such trees should be better than single-gene trees for testing the monophyly of protozoan phyla that are not yet robustly circumscribed (notably Apusozoa, Retaria, Loukozoa and Amoebozoa: Cavalier-Smith 2002) as well as recent ideas concerning higher-level eukaryote relationships, e.g. the postulated holophyly of excavates and cabozoa (Cavalier-Smith 1999, 2003).

In conclusion, we suggest that, despite the limited resolution possible with a single gene, cytosolic Hsp90 is as good as or better than any other single molecule so far used for eukaryote global phylogeny. It should be sequenced from an even more representative selection of protist taxa and other equally good proteins sought for inclusion in multi-gene phylogenies.