Introduction

Microsporidia are unicellular eukaryotes, obligate intracellular parasites that infect a wide diversity of organisms. Their hosts are free-living or parasitic animals belonging to all classes of vertebrates, various groups of invertebrates, and some apicomplexan protists. In humans, microsporidia are considered etiologic agents of opportunistic infections, mainly in patients with AIDS. The majority of human infections are due to two genera, Enterocytozoon and Encephalitozoon, the latter including the species Encephalitozoon cuniculi. The phylum of microsporidia consists of more than 1200 species belonging to nearly 150 genera (Franzen and Muller 2001). Despite a wide diversity of morphology and life cycle (Dunn and Smith 2001; Weiss 2001), two characteristics unify all members of the phylum and define membership in it: outside their hosts, microsporidia survive only as very small spores, and these spores contain a characteristic extrusion apparatus (the polar filament and its anchoring disk) required for entry into the host cell.

Although they are true eukaryotes, microsporidia have several unusual characteristics. They lack mitochondria and peroxisomes, and their Golgi apparatus never appears as dictyosomes. Their ribosomes and ribosomal components show prokaryotic features (Curgy et al. 1980; Ishihara and Hayashi 1968; Vossbrinck and Woese 1986): low sedimentation coefficients (ribosomes, 70S; ribosomal subunits, 30S and 50S; rRNA, 16S and 23S), a high rRNA content (60%), and the absence of a free 5.8S rRNA. The first 16S rRNA sequence obtained from a microsporidium emerged at the base of eukaryotes in the small subunit (SSU) rRNA phylogeny (Vossbrinck et al. 1987). This result supported the idea that microsporidia diverged very early in the evolutionary history of eukaryotes, before the endosymbiosis that gave rise to mitochondria and before the appearance of the 5.8S rRNA. Microsporidia were placed within the Archezoa, a subkingdom defined to gather all protozoa considered to be primitively amitochondriate: Archaemoeba, Metamonada (phylum including the diplomonads), Parabasalia (phylum including the trichomonads), and Microsporidia (Cavalier-Smith 1987). The emergence of three of these groups at the base of the eukaryotic SSU rRNA tree (Leipe et al. 1993) reinforced the hypothesis that the Archezoa, with the exception of the Archaemoeba Entamoeba histolytica, diverged before the mitochondrial endosymbiosis. However, views on the early evolution of eukaryotes changed radically after the discovery of nuclear genes of mitochondrial origin (genes transferred from the ancestral mitochondrial genome to the nuclear genome) in protists belonging to different amitochondriate phyla: Archaemoeba (Clark and Roger 1995), trichomonads (Brown and Doolittle 1995; Bui et al. 1996; Germot et al. 1996; Roger et al. 1996), and diplomonads (Arisue et al. 2002; Hashimoto et al. 1998; Keeling and Doolittle 1997; Roger et al. 1998; Tachezy et al. 2001).

In these three phyla, the past presence of mitochondria suggested by these genes was further supported by the discovery in their cytoplasm of a new organelle that seems to share a common origin with mitochondria (Bui et al. 1996; Germot et al. 1996; Mai et al. 1999; Roger et al. 1996; Tovar et al. 1999, 2003). In trichomonads, this organelle, called the hydrogenosome, produces energy and molecular hydrogen via a specific pyruvate-to-acetate conversion.

A nuclear gene encoding a mitochondrial (mt)-type HSP70 was found in four microsporidia: Nosema locustae (Germot et al. 1997), Vairimorpha necatrix (Hirt et al. 1997), E. cuniculi, and E. hellem (Peyretaillade et al. 1998b). Its presence supported the view that mitochondria were lost secondarily during microsporidian evolution. The complete genome sequence of E. cuniculi revealed five other genes very probably of mitochondrial origin (Katinka et al. 2001). One of these, encoding the β subunit of the pyruvate dehydrogenase E1 component, was also identified in N. locustae (Fast and Keeling 2001); Fast and Keeling also found in E. cuniculi and N. locustae the α subunit of this component, likewise of probable mitochondrial origin. The four other encoded proteins and the mt-HSP70 protein of E. cuniculi may be assigned to the Fe–S cluster assembly machinery, which is specifically associated with mitochondria in both yeasts and mammals (for review see Lill and Kispal 2000). The presence of these proteins and of some others potentially related to known mitochondrial functions, five of which possess a predicted transit peptide resembling the targeting peptides of mitochondria and hydrogenosomes, led Katinka et al. (2001) to propose that microsporidia contain a mitochondrion-derived organelle. A study of the localization of the mt-HSP70 protein in Trachipleistophora hominis strongly supported this hypothesis: antibodies raised against this protein react with tiny organelles surrounded by a double membrane in the cytoplasm of T. hominis, suggesting remnants of mitochondria (Williams et al. 2002).

One year before the discovery of an mt-HSP70 gene in microsporidia, the placement of microsporidia in new molecular phylogenies had also cast doubt on their primitive lack of mitochondria. Indeed, phylogenetic trees based on β-tubulin (Edlind et al. 1996) and α-tubulin (Keeling and Doolittle 1996) showed the microsporidian lineage emerging within fungi. It was therefore considered that these parasites had "evolved degeneratively from higher free-living forms" (Edlind et al. 1996), and the basal position inferred from previous phylogenies was attributed to a methodological artifact, the long-branch attraction phenomenon (Felsenstein 1978). Accordingly, phylogenies of the elongation factors EF-1α and EF-2 including sequences from the fish microsporidium Glugea plecoglossi showed a basal emergence with a long branch, especially for EF-1α (Kamaishi et al. 1996a, b).

Comparison of the sequences of the SSU rRNA gene and of various proteins (elongation factors, α- and β-tubulins), and of the lengths of their branches in phylogenies, revealed an increase in the evolutionary rate of these genes in microsporidia relative to most eukaryotes. This increase varies among genes and seems to have been especially intense for the SSU rRNA and elongation factor genes. In contrast, the tubulin genes, whose rate increased only moderately, appear to have retained a clearer phylogenetic signal. That microsporidia are not an early-emerging phylum received additional support from phylogenies of four proteins, mt-HSP70 (Germot et al. 1997; Hirt et al. 1997; Peyretaillade et al. 1998b), the large subunit (LSU) of RNA polymerase II (Hirt et al. 1999), TATA box binding protein (Fast et al. 1999), and valyl-tRNA synthetase (Weiss et al. 1999), as well as from the phylogeny of the large subunit rRNA (Peyretaillade et al. 1998a; Van de Peer et al. 2000). In all cases, microsporidia were indeed late-emerging, often as a sister group to fungi or within them. The complete sequence of the E. cuniculi genome (Katinka et al. 2001) provided many new data for a thorough study of the phylogenetic position of microsporidia. We report here a phylogenetic analysis of 99 E. cuniculi protein sequences that provides new insight into the relationships between microsporidia and fungi and into the role played by long-branch attraction in the positioning of microsporidian sequences.

Methods

Construction of Families of Homologous Sequences

Two series of BLAST similarity searches (Altschul et al. 1997) were carried out with the 2001 predicted protein sequences of E. cuniculi. The first was run against the SWISS-PROT/TrEMBL database to detect all known homologs in eukaryotic and prokaryotic species. The second was run against the E. cuniculi protein sequences themselves to detect potential paralogs. We then built families of homologous sequences using the criteria employed to construct the HOBACGEN database: for each pair of sequences, at least 80% of their length aligned together and at least 50% similarity within the alignment (for details see Perrière et al. 2000). These thresholds are intended to ensure that all members of a family are homologs and that families very probably contain phylogenetic information.
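The text does not say how qualifying pairs are merged into families. One plausible reading, assumed here and not taken from the HOBACGEN source, is single-linkage clustering: any pair meeting both thresholds joins the two families it belongs to. The sketch below also assumes the 80% coverage criterion must hold for both sequences of a pair; the hit layout (`aligned_frac_a`, `aligned_frac_b`, `similarity`) is hypothetical.

```python
# Hypothetical thresholds mirroring the HOBACGEN-style criteria described above.
MIN_ALIGNED_FRACTION = 0.80  # fraction of each sequence's length in the alignment
MIN_SIMILARITY = 0.50        # similarity within the aligned region


def passes_criteria(hit):
    """hit: dict with 'aligned_frac_a', 'aligned_frac_b', 'similarity' (fractions)."""
    coverage = min(hit["aligned_frac_a"], hit["aligned_frac_b"])
    return coverage >= MIN_ALIGNED_FRACTION and hit["similarity"] >= MIN_SIMILARITY


def build_families(sequence_ids, hits):
    """Single-linkage clustering (an assumption) using a union-find structure.

    hits: dict mapping (id_a, id_b) -> hit dict.
    Returns a sorted list of families, each a sorted list of sequence ids.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), hit in hits.items():
        if passes_criteria(hit):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two families

    families = {}
    for s in sequence_ids:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(f) for f in families.values())
```

With single linkage, A and C end up in the same family through B even if their own alignment would fail the thresholds; a stricter scheme would require every pair to qualify.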

Selection of Families

All families with fewer than five sequences were eliminated. As the aim of this study was to test whether microsporidia diverged early or, on the contrary, are closely related to fungi, the only unambiguous outgroups under both hypotheses were Bacteria and Archaea. Thus, all families lacking prokaryotic sequences were excluded, as were families lacking fungal sequences. To study the position of E. cuniculi among eukaryotes, families lacking animal or plant sequences were also eliminated.
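The selection rules above can be written as a single predicate. A minimal sketch, assuming each sequence carries a hypothetical taxonomic label; following the Results, where families lacking plant sequences were eliminated, it requires both animal and plant sequences to be present.

```python
PROKARYOTES = {"bacteria", "archaea"}


def keep_family(family, min_size=5):
    """family: list of (sequence_id, group) pairs, where group is a hypothetical
    label such as 'bacteria', 'archaea', 'fungi', 'animals', 'plants'."""
    groups = {group for _, group in family}
    return (
        len(family) >= min_size          # at least five sequences
        and bool(groups & PROKARYOTES)   # a prokaryotic outgroup is present
        and "fungi" in groups            # fungal sequences are present
        and "animals" in groups          # animal sequences are present
        and "plants" in groups           # plant sequences are present
    )
```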

A preliminary phylogenetic study was done to determine orthology and paralogy relationships and to verify the presence of phylogenetic signal in each family. The homologous sequences of each family were first aligned with the program CLUSTAL W (Thompson et al. 1994). After visual inspection of the alignment, partial sequences that introduced a large number of gaps relative to the length of the complete sequences were eliminated, as were sequences with a large number of undetermined sites. A tree was then built with the neighbor-joining method (Saitou and Nei 1987) as implemented in the PHYLO_WIN program (Galtier et al. 1996). Distances were estimated under the Poisson model after excluding all gap-containing positions, and bootstrap support was estimated from 100 replicates.
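For illustration, the distance step of this preliminary analysis can be sketched in a few lines: remove every column containing a gap in any sequence, then apply the Poisson correction d = -ln(1 - p), where p is the proportion of differing sites. This is a minimal stand-in, not PHYLO_WIN's code; the toy alignment is hypothetical.

```python
import math


def strip_gapped_columns(alignment):
    """Complete deletion: drop every column that has a gap in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p), p = proportion of differing sites."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)  # p = 1: saturated


def distance_matrix(alignment):
    """Pairwise Poisson distances computed on the gap-free columns only."""
    aln = strip_gapped_columns(alignment)
    n = len(aln)
    return [[poisson_distance(aln[i], aln[j]) for j in range(n)] for i in range(n)]


toy = ["MKV-LA", "MKVQLS", "MRVQLS"]  # hypothetical alignment
print(strip_gapped_columns(toy))      # ['MKVLA', 'MKVLS', 'MRVLS']
```

The resulting matrix would then be passed to a neighbor-joining routine.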

Visual inspection of the trees enabled us to select families according to two criteria: a sufficient phylogenetic signal and, for multigene families, unambiguous identification of orthologous sequences. Families whose trees did not show a clear separation between eukaryotes and prokaryotes were excluded. Families with more than 30 sequences were reduced to about 30 sequences while maintaining their evolutionary diversity.

Phylogenetic Analyses

For each family, four methods of phylogenetic reconstruction were employed: a distance method and three maximum likelihood methods. In all cases, all gap-containing positions were excluded.

The distance between each pair of sequences was estimated by the likelihood method implemented in TREE-PUZZLE version 5.0 (Strimmer and von Haeseler 1996), under the JTT-F model with among-site rate variation modeled by a gamma distribution with eight rate categories plus invariant sites, the shape parameter being estimated from the data. A tree was reconstructed from these distances with the BIONJ method (Gascuel 1997). Statistical support for internal branches was estimated by bootstrapping with 500 replicates. Bootstrap data sets were created with the SEQBOOT program (Felsenstein 1993). For each replicate, distances were estimated with TREE-PUZZLE 5.0 under the JTT-F model, using the rate-distribution parameters estimated from the initial data set.
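Each bootstrap pseudo-replicate is an alignment of the same length whose columns are drawn with replacement from the original. A minimal sketch of this column resampling, written as a stand-in for what SEQBOOT does with protein data (not SEQBOOT's code); the seed and toy alignment are arbitrary.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in cols) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=500, seed=1):
    """Generate n_replicates pseudo-alignments (500 in the analysis described)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

Because whole columns are resampled, the residues of one site stay together in every replicate; distances for each replicate would then be recomputed with the gamma parameters fixed at their values from the original alignment, as described above.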

Phylogenetic trees were inferred by the maximum likelihood method of the program PROTML, version 2.3b3 (Adachi and Hasegawa 1996), with the JTT model of amino acid substitution (Jones et al. 1992) and equilibrium amino acid frequencies estimated from the data (JTT-F model), retaining the 300 top-ranking trees (options -jf -q -n 300). Statistical support for internal branches was estimated by the resampling of estimated log-likelihoods (RELL) method (Kishino et al. 1990) as implemented in PROTML (version 2.3b3).
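RELL avoids re-optimizing every tree on each pseudo-replicate: it resamples the per-site log-likelihoods already computed for a fixed set of candidate trees (here, the retained top-ranking trees) and records which tree has the highest total. A minimal sketch; the input layout `site_lnl` is a hypothetical stand-in for PROTML's per-site output. Support for an internal branch would then be the summed support of the candidate trees that contain it.

```python
import random


def rell_support(site_lnl, n_replicates=10000, seed=0):
    """site_lnl: dict tree_name -> list of per-site log-likelihoods (same site order).

    Returns, for each tree, the fraction of RELL replicates in which it has the
    highest summed log-likelihood (ties go to the first tree listed).
    """
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    rng = random.Random(seed)
    wins = dict.fromkeys(trees, 0)
    for _ in range(n_replicates):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]  # resample sites
        totals = {t: sum(site_lnl[t][i] for i in idx) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / n_replicates for t in trees}
```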

Phylogenetic trees were also inferred by the maximum likelihood method of the program PROML (Felsenstein 1993) with the JTT model of amino acid substitution and among-site rate variation modeled by a gamma distribution with five rate categories plus invariant sites. The alpha parameter of the gamma distribution and the fraction of invariant sites were estimated with TREE-PUZZLE version 5.0 (Strimmer and von Haeseler 1996). Finally, PROML was also used with a constant evolutionary rate among sites, yielding an approach that differs from PROTML in its heuristic for exploring tree space but not in its model of sequence evolution. Option G (global search) of PROML was used in both analyses. The run time of this algorithm precluded bootstrapping.

Estimation of the Relative Evolutionary Rates of the E. cuniculi Proteins

To determine the evolutionary rate of a protein in a species A relative to its rate in a species B, the relative rate test compares the distances dAC and dBC, where C is a third species that diverged from the lineage of A and B before their separation (Sarich and Wilson 1973). For the test to be as accurate as possible, species C should be the closest certain outgroup to species A and B. To determine the relative evolutionary rate of the E. cuniculi proteins without making any assumption about their phylogenetic position, the only species that can serve as species C are prokaryotes. However, the large phylogenetic distance between prokaryotes and eukaryotes generally prevents reliable distance estimation. In practice, the average distance between prokaryotic and eukaryotic sequences is often close to one substitution per site, sometimes higher, which indicates the presence of saturated sites. Thus, only distances between eukaryotic species were used.
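As a rough illustration of why distances near one substitution per site are unreliable, take the simple Poisson correction (an assumption made only for this sketch; the analysis itself used JTT-F with gamma and invariant sites). With $\hat p$ the observed proportion of differing sites over $n$ gap-free sites, the delta method gives:

```latex
\hat d = -\ln(1-\hat p), \qquad
\operatorname{Var}(\hat p) = \frac{p(1-p)}{n}, \qquad
\operatorname{Var}(\hat d) \approx \left(\frac{\partial d}{\partial p}\right)^{2}\operatorname{Var}(\hat p)
  = \frac{1}{(1-p)^{2}}\cdot\frac{p(1-p)}{n}
  = \frac{p}{n(1-p)}
  = \frac{e^{d}-1}{n}.
```

At $d = 1$ the variance is about $(e-1)/n \approx 1.72/n$, roughly 7.8 times the value at $d = 0.2$ (about $0.22/n$) for the same number of sites, and it diverges as $p \to 1$.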

We estimated the relative evolutionary rate of each E. cuniculi protein by the ratio of the human–E. cuniculi distance to the human–Saccharomyces cerevisiae distance (distances estimated by TREE-PUZZLE with a gamma distribution with eight rate categories plus invariant sites). This ratio increases as the evolutionary rate of the E. cuniculi sequence increases relative to that of the S. cerevisiae sequence, whatever the true phylogenetic position of E. cuniculi. However, this requires that pairwise distances be correctly estimated and that the evolutionary rates of the human and yeast sequences did not vary relative to those of other eukaryotic sequences. To minimize the effect of such variation, the average distance between animal and E. cuniculi sequences and the average distance between animal and fungal sequences were used.
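The averaged ratio described above can be sketched as follows. The distance dictionary and all values in the usage example are hypothetical; in the study the distances came from TREE-PUZZLE.

```python
from statistics import mean


def relative_rate(dist, animals, fungi, target):
    """Mean animal-target distance divided by mean animal-fungus distance.

    dist: dict mapping frozenset({taxon_x, taxon_y}) -> pairwise distance.
    A value above 1 suggests the target sequence evolved faster than the fungal ones.
    """
    d_target = mean(dist[frozenset((a, target))] for a in animals)
    d_fungi = mean(dist[frozenset((a, f))] for a in animals for f in fungi)
    return d_target / d_fungi


# Hypothetical distances for two animals, two fungi, and E. cuniculi.
toy = {frozenset(pair): d for pair, d in [
    (("Hs", "Ec"), 2.0), (("Dm", "Ec"), 2.2),
    (("Hs", "Sc"), 0.9), (("Hs", "Sp"), 1.0),
    (("Dm", "Sc"), 1.1), (("Dm", "Sp"), 1.0),
]}
print(round(relative_rate(toy, ["Hs", "Dm"], ["Sc", "Sp"], "Ec"), 3))  # 2.1
```

Averaging over several animal and fungal sequences dampens the effect of any single animal or fungal sequence whose own rate changed.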

Results

Construction and Selection of Families of Homologous Sequences

The procedure of construction of homologous sequence families of the HOBACGEN database (Perrière et al. 2000) enabled us to group the 2001 predicted E. cuniculi protein sequences and their homologs from the SWISS-PROT/TrEMBL database into 1766 families. Most families contain only one E. cuniculi sequence. Different criteria were then used to select those appropriate to study the phylogenetic position of E. cuniculi (Fig. 1).

Figure 1 Selection of homologous sequence families.

Two hundred eighty-four families contained between 5 and 475 sequences. Of these, 148 contained prokaryotic sequences, and 22 of these 148 families did not contain fungal sequences. Most of these 22 families were small (14 contained fewer than 15 sequences) and lacked one or several eukaryotic phyla: 9 families had no plant sequences, 2 families had no animal sequences, and 3 families had only prokaryotic homologs. The eukaryotic members of these 22 families therefore seemed to have diverged greatly. Four additional families lacking plant sequences were eliminated.

A total of 124 families, including 153 E. cuniculi sequences, were kept for a preliminary phylogenetic analysis that led us to select 77 families for detailed analysis. Within each family, prokaryotic and eukaryotic sequences formed two clearly distinct monophyletic groups. Sequences from animals, plants, and fungi generally formed monophyletic groups. Nine families contained several paralogous sequences resulting from one or several duplication events that occurred after the prokaryote–eukaryote divergence but before the divergence of all eukaryotes. Each subtree of orthologous sequences was used as the outgroup for the other subtree(s), which allowed prokaryotic sequences to be eliminated. Twelve families without prokaryotic sequences, but each containing several E. cuniculi sequences, were also subjected to a preliminary phylogenetic analysis. In two of them, a duplication event appeared to have occurred before the divergence of eukaryotes. These two families, each containing two E. cuniculi sequences, were added to the previously selected families. Thus, 79 families, including 99 E. cuniculi sequences, were finally investigated.

Phylogenetic Analysis

For each family, four methods of phylogenetic reconstruction were employed: the BIONJ method with distances estimated with a gamma distribution (BIONJ), the maximum likelihood method PROTML without gamma distribution (PROTML), the maximum likelihood method PROML without gamma distribution (PROML without gamma), and the maximum likelihood method PROML with a gamma distribution (PROML with gamma). The results are summarized in Table 1. Overall, the E. cuniculi sequences emerged at the base of the following groups: (1) animals, fungi, and plants in 56% (PROML with gamma) to 80% (BIONJ) of the trees; (2) fungi (or very rarely within them) in 13% (BIONJ) to 23% (PROML with gamma); (3) animals and fungi (or very rarely within them) in 4% (BIONJ and PROTML) to 10% (PROML with gamma); and (4) animals in 1% (BIONJ) to 8% (PROML without gamma). As the PROML without gamma method gives results intermediate between those of the PROTML and PROML with gamma methods (Table 1), we examined in detail only the results of the three other methods; in what follows, "PROML" refers to PROML with gamma. The information provided by the PROML without gamma method is presented in the Discussion.

Table 1 Positions of the E. cuniculi sequence in the 99 protein phylogenies by four methods

For 66 of the 99 analyzed proteins, the three other methods placed the E. cuniculi sequences at a similar position (Table 2). In 53 cases, the microsporidian proteins were placed before the divergence of plants, fungi, and animals (Fig. 2A). When protist sequences were present, they generally emerged either before or after the E. cuniculi sequences. Because no consistent grouping appeared between sequences of E. cuniculi and any protist phylum, the occasional clustering of E. cuniculi sequences with protist sequences did not seem to have evolutionary significance and more probably resulted from a methodological artifact. In the 13 other cases where the three methods gave similar results, the E. cuniculi proteins emerged later: 2 at the base of fungal and animal sequences, 1 at the base of animal sequences, and 10 at the base of fungal sequences (Fig. 2B). In the phylogenies of these 10 proteins, the internal branch separating fungal and microsporidian sequences from other eukaryotic sequences was supported by bootstrap values ranging from 18 to 87% with the BIONJ method and by RELL values ranging from 27 to 99% with the PROTML method (no bootstrap values were available with the PROML method). The statistical support for this branch exceeded 50% for seven proteins and 70% for two proteins.

Table 2 Comparison of E. cuniculi sequence positions in the 99 phylogenies^a

Figure 2 PROML trees based on (A) ribosomal proteins L2–L8 (15 sequences; 226 sites; relative evolutionary rate of the E. cuniculi sequence, 3.25) and (B) SRP54 proteins (19 sequences; 377 sites; relative evolutionary rate of the E. cuniculi sequence, 1.27). SWISSPROT accession numbers are given in parentheses.

For the 33 other proteins, the sequences of E. cuniculi were placed at a different position according to the method used (Table 2). The discrepancies between the PROML trees and the other ones (29 with both PROTML and BIONJ trees) were more numerous than those between the BIONJ and the PROTML trees (13 trees). Seventeen E. cuniculi proteins emerged at the base of animal, fungal, and plant sequences in the BIONJ and PROTML trees, whereas they emerged later in the PROML trees, mainly at the base of fungal sequences (5) or of fungal and animal sequences (7). In seven cases, E. cuniculi branched at the base of animals, fungi, and plants in the BIONJ trees but at the base of fungi in the PROTML trees. Similarly, in PROML trees, E. cuniculi emerged later in six of the seven cases, mostly at the base of fungi (four) or animal and fungi (one). For two proteins, E. cuniculi was placed at the base of fungi in the BIONJ and PROML trees, but at other “nonbasal” positions in the PROTML trees.

The number of E. cuniculi sequences that are not placed at the base of the animal, fungal, and plant sequences also depends on the method used. However, in most cases, the microsporidian sequence was placed at the base of the fungal or of the fungal and animal groups: 13 and 4 of 19 with BIONJ, 18 and 4 of 29 with PROTML, 21 and 4 of 35 with PROML without gamma, and 23 and 10 of 43 with PROML with gamma, respectively (Table 1). This nonrandom position within "higher" eukaryotes indicates that the close microsporidia–fungi relationship is not an artifact but results from a phylogenetic signal. In contrast, the basal positions observed for most families could result from the high evolutionary rate of the E. cuniculi sequences, producing long-branch attraction. To test this hypothesis, we computed the relative evolutionary rate of the E. cuniculi sequences. If the hypothesis is correct, the E. cuniculi sequences with the lowest relative evolutionary rates should preferentially support the close relationship of microsporidia with fungi.

Estimation of the Relative Evolutionary Rates of E. cuniculi Proteins

The relative evolutionary rate of each E. cuniculi protein was estimated by the ratio of the average distance between animal and E. cuniculi sequences to the average distance between animal and fungal sequences. Relative rate values sorted by increasing order are represented in Fig. 3 together with an indication of the phylogenetic position of each protein. The sequences supporting a close relationship between fungi and E. cuniculi are preferentially among those with low relative evolutionary rates.

Figure 3 Relative evolutionary rate of E. cuniculi proteins. Proteins were sorted by increasing rate and categorized according to their position in the trees (for abbreviations see Table 1, footnote a). The scale for the evolutionary rate corresponds to the BIONJ graph. The PROTML, PROML without gamma, and PROML with gamma graphs were moved up slightly in order to be visible. Arrowheads indicate ribosomal proteins.

To evaluate the statistical significance of this result, a Wilcoxon rank–sum test was carried out. It tests whether, in a series of 0s and 1s ordered by relative evolutionary rate, the 1s preferentially appear at one end of the series. For each tree-building method, two tests were performed. In the first (test 1), all proteins supporting a position of E. cuniculi at the base of eukaryotes were coded 0, and the others 1. In the second (test 2), all proteins supporting a position of E. cuniculi at the base of eukaryotes were coded 0, those supporting a position at the base of fungi were coded 1, and proteins supporting any other position were excluded. The results are given in Table 3. Whatever the method used and whatever the proteins retained for the test, the concentration of proteins supporting a nonbasal position among those with the lowest evolutionary rates is statistically significant.
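A minimal sketch of such a rank–sum test on 0/1-coded proteins. The text does not state whether an exact or approximate test, or a one- or two-sided alternative, was used; this sketch assumes the normal approximation (average ranks for ties, no tie correction of the variance) and a one-sided alternative that proteins coded 1 have lower relative rates.

```python
import math


def rank_sum_test(rates, labels):
    """Wilcoxon rank-sum test, normal approximation (an assumption).

    rates: relative evolutionary rates; labels: 1 = nonbasal position, 0 = basal.
    Returns (W, z, p) where W is the rank sum of the 1s and p is the one-sided
    probability of a rank sum this small if labels were unrelated to rates.
    """
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    ranks = [0.0] * len(rates)
    i = 0
    while i < len(order):  # assign average ranks to tied rates
        j = i
        while j + 1 < len(order) and rates[order[j + 1]] == rates[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    w = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    mu = n1 * (n0 + n1 + 1) / 2                       # expected rank sum under H0
    sigma = math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12)   # its standard deviation
    z = (w - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2))            # standard normal CDF at z
    return w, z, p
```

For example, if the three slowest of ten proteins are the only ones coded 1, W = 1 + 2 + 3 = 6 against an expected 16.5, giving z of about -2.39 and a one-sided p of about 0.008.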

Table 3 Wilcoxon rank–sum test for the four methods and the two groups of selected proteins

Discussion

Phylogenetic Position of Microsporidia

Microsporidia are nowadays regarded as close relatives of fungi. This view is based on phylogenetic analyses of the LSU rRNA (Van de Peer et al. 2000) and of a small number of proteins (Edlind et al. 1996; Fast et al. 1999; Germot et al. 1997; Hirt et al. 1997; 1999; Keeling and Doolittle 1996; Weiss et al. 1999). The availability of the complete genome sequence of the microsporidium E. cuniculi enabled us to test this hypothesis with almost 100 proteins. We found that only 17% (BIONJ) to 43% (PROML with gamma) of the 99 analyzed proteins clearly favor a late divergence of microsporidia. Two arguments nevertheless suggest that this minority of molecular phylogenies reflects the true evolutionary history of microsporidia.

A first argument is that the position of a late-emerging E. cuniculi sequence within the crown is not random. The frequency of emergence at the base of fungal and/or animal sequences is very high: 95% (18 of 19) with the BIONJ method, 86% (25 of 29) with the PROTML method, 94% (33 of 35) with the PROML without gamma method, and 91% (39 of 43) with the PROML with gamma method. In addition, when E. cuniculi emerges at the base of either fungal or animal sequences, the microsporidian lineage mostly appears as a sister group of fungi: 93% (13 of 14) with the BIONJ method, 86% (18 of 21) with the PROTML method, 72% (21 of 29) with the PROML without gamma method, and 79% (23 of 29) with the PROML with gamma method. These results could also suggest a massive horizontal gene transfer from fungi to microsporidia. Indeed, although less common than between prokaryotes, horizontal gene transfers between eukaryotes may have occurred independently of secondary endosymbiotic events (Andersson et al. 2003). To explain the discrepancies between phylogenies of SSU rRNA or elongation factors and those of tubulins, Sogin (1997) proposed that microsporidia may have "borrowed genes from host genomes." Horizontal gene transfers could have been facilitated by an intracytoplasmic parasitic lifestyle, which would lead one to hypothesize that microsporidia were parasites of fungi during early evolution.
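The percentages above follow directly from the counts given in the text; a small arithmetic check, using only those counts (the dictionary names are illustrative):

```python
# (proteins at the base of fungi and/or animals, all proteins with a nonbasal position)
crown_counts = {"BIONJ": (18, 19), "PROTML": (25, 29),
                "PROML without gamma": (33, 35), "PROML with gamma": (39, 43)}
# (proteins sister to fungi, proteins at the base of either fungi or animals)
sister_counts = {"BIONJ": (13, 14), "PROTML": (18, 21),
                 "PROML without gamma": (21, 29), "PROML with gamma": (23, 29)}


def percentages(counts):
    """Round k/n to the nearest whole percent for each method."""
    return {method: round(100 * k / n) for method, (k, n) in counts.items()}


print(percentages(crown_counts))
print(percentages(sister_counts))
```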

A second argument comes from testing the repeated claim that the basal position of microsporidia in some phylogenies is due to the long-branch attraction artifact (see, e.g., Van de Peer et al. 2000). If this claim is true, the proteins whose phylogeny places microsporidia close to fungi are expected to be those that underwent no increase, or the smallest increase, in evolutionary rate in E. cuniculi. Under the assumption of a massive gene transfer from fungi to microsporidia, by contrast, no relationship is expected between the relative evolutionary rate of E. cuniculi proteins and their phylogenetic position. In fact, most of the trees supporting a microsporidia–fungi relationship correspond to the E. cuniculi proteins with the smallest relative evolutionary rates. We conclude that microsporidia did not diverge early within eukaryotes: their basal position results from long-branch attraction, and microsporidia cluster preferentially with fungi when a phylogenetic signal is present.

Long-Branch Attraction

Long-branch attraction results from the acceleration of the evolutionary rate of some lineages relative to others (Felsenstein 1978). This acceleration causes fast-evolving sequences to cluster together and/or to emerge at the base of the tree. The effect of lineage-specific rate variation on the performance of different phylogenetic methods has been tested several times by simulation, mostly with nucleotide sequences (e.g., Huelsenbeck 1995a; Kuhner and Felsenstein 1994). These studies showed that all methods are sensitive to such variation. The maximum parsimony method performs poorly in its presence. Distance-based methods such as the neighbor-joining (NJ) method, as well as the maximum likelihood (ML) method, are very efficient when the evolutionary model used to build the tree is close to the simulated model. When the two models differ, the performance of these methods decreases, the ML method appearing more robust than the NJ method. Notably, all models of sequence evolution capture only part of the complexity of true evolutionary mechanisms.

This report represents the first study of the influence of the acceleration of the evolutionary rate of a lineage on the performance of different tree-building methods based on real rather than simulated sequences. Comparing the results obtained with the PROML without gamma and PROML with gamma methods allows us to assess how accounting for among-site rate variation affects the performance of PROML. As expected, adding a gamma correction decreases the impact of the acceleration of E. cuniculi sequences (Table 1): the number of proteins whose phylogeny placed the E. cuniculi sequence at the base of fungal sequences or at the base of fungal and animal sequences (i.e., trees in which the bias is weak) is higher with a gamma correction (33 instead of 25). This result underlines the importance of taking among-site rate variation into account in the presence of fast-evolving sequences, as previously shown for microsporidian EF-2 (Hirt et al. 1999) and LSU rRNA (Peyretaillade et al. 1998a; Van de Peer et al. 2000) sequences and more generally by simulations (e.g., Huelsenbeck 1995a, b; Sullivan and Swofford 2001). Accounting for among-site rate variation also partly explains the discrepancies between PROTML and PROML with gamma trees (29 trees). However, the relatively high number of discrepancies between PROTML and PROML without gamma trees (17 trees; see supplementary information), which share the same model, suggests that the topology search algorithm also influences the results. PROML's algorithm seems to be slightly less sensitive to long-branch attraction than PROTML's.

Although both account for among-site rate variation, the BIONJ and PROML with gamma trees showed different positions for 29 E. cuniculi proteins, 25 of which had a basal position in the BIONJ trees and a nonbasal position in the PROML with gamma trees. This result suggests that ML methods are less biased by the presence of fast-evolving sequences than distance-based methods. To test this assumption further, phylogenetic trees were inferred from the same distance matrix with another distance-based method, the Fitch and Margoliash method (Felsenstein 1993). The results, very similar to those with BIONJ (data not shown; see supplementary material), support the superiority of ML methods over the distance methods tested here. The better results with the PROML method may come from the greater robustness of ML methods to differences between the evolutionary model used to build the trees and the true evolutionary process.

Relative Evolutionary Rates

Whatever the method used, the relationship between the relative evolutionary rate of the E. cuniculi proteins and their phylogenetic position is not absolute. It is noteworthy that, among E. cuniculi proteins having a high (>2) relative evolutionary rate, six support a nonbasal emergence of E. cuniculi in the PROML trees but their positions seem to be randomly distributed (Fig. 3 and supplementary material). Moreover, some of the E. cuniculi proteins having the lowest relative evolutionary rates are placed at a basal position, possibly because of an imperfect estimation of the relative evolutionary rates. The use of the ratio of the average distances between animal and E. cuniculi sequences to that between animal and fungal sequences is based on the hypothesis that the relative evolutionary rate of the fungal sequences was the same for all proteins. However, an increase or a decrease in the evolutionary rate of some fungal proteins cannot be ruled out, and the real evolutionary distances are unknown. Several parameters can influence the accuracy of distance estimation, including the presence of some saturated sites and the difference between the evolutionary model and the true evolutionary process.

Modeling among-site rate variation with a gamma distribution, as PROML does, seems to capture the evolutionary process only partially, at least for some proteins. The acceleration of the evolutionary rate of elongation factor EF-1α in ciliates may be explained in part by an increase in the number of variable positions (Moreira et al. 1999). In microsporidia, EF-1α has been shown both to evolve very rapidly (Kamaishi et al. 1996a, b) and to contain many nonconservative substitutions at otherwise universally conserved positions (Baldauf and Doolittle 1997). A large number of unique substitutions has also been observed in the microsporidian SSU rRNA (Stiller and Hall 1999). The loss of invariability of some sites in such sequences may reflect a global increase in their evolutionary rate in microsporidia, which produced substitutions at sites whose rate is so low that no substitution occurred in the sequences of other species. Alternatively, some sites under strong selective constraints could have fixed mutations as a result of concomitant substitutions in the cellular components interacting with them. Indeed, other components of the translation apparatus, such as LSU rRNA and EF-2, have also undergone an acceleration of their evolutionary rate. Moreover, all 17 E. cuniculi ribosomal proteins analyzed in this study have a high relative evolutionary rate (1.5–4.33), and 10 of them are among the 11 E. cuniculi proteins with the highest relative evolutionary rates (Fig. 3). The fast evolution of components of the translation apparatus could also explain the size reduction of SSU and LSU rRNAs, as well as the fusion of the 5.8S sequence to the LSU rRNA gene in microsporidia. More generally, the discrepancy between the true evolutionary process and the evolutionary model used for phylogenetic analyses could explain, at least in part, the imperfect estimation of the distances and, consequently, of the relative evolutionary rates of microsporidian proteins.
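The "loss of invariability" discussed above can be made concrete by listing alignment columns that are invariant in every other taxon yet carry a different residue in the microsporidian sequence. The sketch below assumes a gapped alignment stored as equal-length strings; the function name and the toy sequences are hypothetical, and gap-containing columns are simply skipped.

```python
def uniquely_substituted_sites(alignment, focal):
    """Return indices of columns that are invariant (and ungapped) in all
    non-focal sequences but carry a different residue in the focal one."""
    others = [seq for name, seq in alignment.items() if name != focal]
    target = alignment[focal]
    sites = []
    for i, residue in enumerate(target):
        column = {seq[i] for seq in others}
        # Conserved in every other taxon, no gaps, and different in the focal taxon.
        if len(column) == 1 and "-" not in column and residue != "-" and residue not in column:
            sites.append(i)
    return sites

# Toy alignment (hypothetical sequences, not real EF-1alpha data).
alignment = {
    "human": "MKVLAG",
    "yeast": "MKVLSG",
    "fly":   "MKVLAG",
    "Ecun":  "MRVIAG",
}
sites = uniquely_substituted_sites(alignment, "Ecun")
```

Here columns 1 (K to R) and 3 (L to I) are conserved in the three other taxa but substituted in the focal sequence; column 4 is not counted because it already varies among the other taxa.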

Because animals are an outgroup to E. cuniculi and fungi, the relative evolutionary rates roughly quantify the increase in the evolutionary rate of E. cuniculi proteins relative to fungal ones. Most analyzed proteins evolved more rapidly in E. cuniculi than in fungi, with an acceleration ranging from about onefold to more than fourfold. Several proteins that retained a phylogenetic signal in other organisms but evolved much faster in E. cuniculi were excluded when we constructed the sequence families; the range of variation is therefore likely underestimated. To explain the acceleration of the evolutionary rate of the microsporidian genome, the parasitic lifestyle has frequently been invoked through phenomena such as population bottlenecks, relaxed selection, and/or positive selection. Because these phenomena affect the whole genome, they should accelerate all proteins by roughly the same amount. Our results do not support this hypothesis: the heterogeneity of the relative evolutionary rates suggests that the relaxation of selection or positive selection varied from one protein to another.
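Under a uniform, genome-wide acceleration, per-protein relative rates should cluster tightly apart from estimation noise, whereas a wide spread argues against it. A minimal way to summarize that spread is the range and the coefficient of variation (CV); the rate values below are hypothetical, and the CV is an illustrative statistic not reported in the text.

```python
from statistics import mean, pstdev

def rate_heterogeneity(rates):
    """Return (min, max, coefficient of variation) of per-protein relative rates."""
    return min(rates), max(rates), pstdev(rates) / mean(rates)

# Hypothetical relative rates for a handful of proteins (not the study's values).
rates = [1.0, 1.3, 1.6, 2.2, 3.1, 4.33]
low, high, cv = rate_heterogeneity(rates)
# low = 1.0, high = 4.33, cv is roughly 0.5
```

A CV near zero would fit a single genome-wide acceleration factor; a value around 0.5, as in this toy example, indicates that proteins were accelerated to very different degrees.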

A possible explanation is a link between the function of a protein and its relative evolutionary rate. However, because proteins involved in the same machinery interact, their evolution is probably not independent (see above), so a link between the machinery in which a protein participates and its relative evolutionary rate would be expected. We attempted to test this hypothesis by classifying the proteins according to their functional role category (Table 4). Among the five main categories, which together contain 87 of the 99 proteins studied, the proportion of proteins supporting a close relationship between E. cuniculi and fungi and the mean relative evolutionary rate are similar (0.27–0.40 and 1.49–1.64, respectively), except for proteins involved in protein biosynthesis (0.06 and 2.19, respectively), in accordance with our previous observation. This apparent homogeneity, however, hides marked heterogeneities. For instance, among the functional role subcategories of metabolism, the phylogenies of the two proteins involved in nitrogen and sulfur metabolism support a close relationship of E. cuniculi and fungi, whereas those of the five proteins involved in nucleotide metabolism support a basal emergence of E. cuniculi. The same phenomenon is observed among the subcategories of the other major categories: cell growth, cell division, and DNA synthesis (cell growth and cell polarity or cell cycle control and mitosis/DNA synthesis and replication), transcription (mRNA transcription/rRNA transcription), and protein destination (protein targeting, sorting and translocation/protein modification). These heterogeneities are difficult to explain and would need a larger number of genes to be confirmed (each subcategory contains, on average, only slightly more than four genes). Nevertheless, these results suggest that proteins involved in the same machinery did not evolve independently and that the various machineries underwent different selective pressures.
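The per-category figures above (proportion of proteins supporting an E. cuniculi plus fungi grouping, and mean relative rate) amount to a simple group-by. The sketch below assumes one record per protein with its category, a boolean for whether its tree supports the fungal relationship, and its relative rate; the records and category labels are hypothetical and are not the contents of Table 4.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (functional category, supports E. cuniculi + fungi, relative rate)
records = [
    ("metabolism", True, 1.4),
    ("metabolism", False, 1.7),
    ("protein synthesis", False, 2.5),
    ("protein synthesis", False, 2.1),
    ("transcription", True, 1.5),
]

def summarize(records):
    """Per category: (proportion supporting fungi, mean relative rate, n proteins)."""
    groups = defaultdict(list)
    for category, supports_fungi, rate in records:
        groups[category].append((supports_fungi, rate))
    return {
        cat: (sum(s for s, _ in rows) / len(rows), mean(r for _, r in rows), len(rows))
        for cat, rows in groups.items()
    }

summary = summarize(records)
# summary["protein synthesis"] is about (0.0, 2.3, 2)
```

With only a few proteins per subcategory, as noted in the text, such proportions can swing widely from one or two genes, which is why the observed heterogeneities need more data to be confirmed.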

Table 4 Classification of the 99 E. cuniculi proteins according to their functional role category (major category and subcategory) as given by Katinka et al. (2001)

Microsporidia: Sister Group of Fungi or True Fungi?

Several similarities between fungi and microsporidia have been noted (for review see Wittner and Weiss 1999). An important example is their common capacity to form spores. Mitosis is closed, and dense structures called “spindle pole bodies” resemble those of yeasts. Meiosis seems to enable the diplokaryotic nucleus (two closely apposed nuclei) of some microsporidia and the dikaryotic nucleus (two separate paired nuclei) of some fungi to become monokaryotic (Flegel and Pasharawipas 1995). Chitin, a major polysaccharide of the fungal cell wall, is present in the inner layer of the microsporidian spore wall. Trehalose, a disaccharide frequently found in fungi, has also been detected in microsporidia. In addition, the annotation of the E. cuniculi genome predicts the ability to synthesize both chitin and trehalose (Katinka et al. 2001). Other shared molecular features include an insertion of 10–12 amino acids in the EF-1α sequence (Kamaishi et al. 1996b), separate genes encoding dihydrofolate reductase and thymidylate synthase (Duffieux et al. 1998), and a three-component mRNA capping system (Hausmann et al. 2002). Although most of these characters are not exclusive to microsporidia and fungi, they support the hypothesis of a close relationship between the two groups. However, they do not help determine the precise evolutionary position of microsporidia: are microsporidia true fungi, or are they their sister group?

Most of the phylogenies built so far, including those in this study, do not give a clear-cut answer to this question because fungi and microsporidia are poorly sampled. Two recent analyses, based on sequences of either EF-1α and the RNA polymerase II largest subunit (Tanabe et al. 2002) or α- and β-tubulins (Keeling 2003), included various fungi and microsporidia and led to different conclusions. In the EF-1α tree, the microsporidium G. plecoglossi emerges far from fungi and clusters with E. histolytica. Given the poor resolution of the relationships between the major groups in this tree, the nonrejection of alternative hypotheses placing microsporidia elsewhere within eukaryotes (Tanabe et al. 2002), and the strong acceleration of EF-1α evolution in microsporidia (see above), this protein does not appear to be a good marker. Nevertheless, on the basis of a two-amino-acid insertion shared by all fungal sequences and by sequences from a few phylogenetically unrelated organisms, but absent from the two known microsporidian sequences, Tanabe et al. (2002) proposed that microsporidia are not nested within fungi but are very likely the sister group of animals and/or fungi. They considered this conclusion compatible with their phylogenetic analysis of RNA polymerase II.

In contrast, recent thorough analyses of α- and β-tubulins, separately or combined, support an emergence of microsporidia from within fungi, possibly from Zygomycetes, close to Zoopagales and Entomophthorales (Keeling 2003). However, tests of alternative hypotheses do not exclude an emergence of microsporidia just after chytrids or, for α-tubulin, as a sister group of all fungi. Moreover, because the evolutionary rate of these proteins increased in fungi (except chytrids) and in microsporidia, tubulins may not be ideal markers (Keeling 2003).

Our finding that the acceleration of the evolutionary rate differs among E. cuniculi proteins suggests that alternative markers should be used to investigate further the evolutionary origin of microsporidia. Among the proteins examined in the present work, those supporting a close relationship with fungi (see supplementary material) could be good candidates for extensive sequencing of genes from several species representative of the fungal and microsporidian lineages.