Introduction

Actin is an important eukaryotic protein well known for its essential role in cytoskeletal formation, cell division and many other cellular processes (Zheng et al. 2009). Due to its ubiquity and high degree of sequence conservation, actin is often used as a phylogenetic marker for higher-level eukaryote systematics (e.g., Baldauf et al. 2000; Bhattacharya and Weber 1997; Keeling 2001; Leander and Keeling 2004; Voigt and Wostemeyer 2001). In land plants and animals, actin is typically encoded by a complex gene family resulting from gene duplication and diversification (Bhattacharya et al. 2000), and the expression of the actin gene duplicates can differ depending on the tissue type and developmental stage in these multicellular organisms (Meagher et al. 1999). In contrast, in unicellular members of the green algae, alveolates and some other algal lineages such as heterokonts, actin genes are generally believed to occur as a single copy (Bhattacharya and Ehlting 1995; Bhattacharya et al. 1991, 1998; Goodner et al. 1995; Leander and Keeling 2004). However, recent studies showing actin gene multiplicity in some unicellular algal lineages such as red algae (e.g., Wu et al. 2009) suggest that there is still much to learn about actin gene family evolution in eukaryotes and that paralogy may be more of a confounding factor in the interpretation of actin gene/protein trees than originally believed.

Cryptomonads are an important group of unicellular algae that evolved photosynthesis by a “secondary” endosymbiotic event between a bi-flagellated host and a red algal endosymbiont (Archibald 2007, 2009b; Cavalier-Smith et al. 1996; Douglas and Penny 1999). Together with the chlorarachniophytes, cryptomonads are of particular evolutionary significance by virtue of the fact that the algal endosymbiont nucleus still persists in a miniaturized form called a “nucleomorph,” a situation that does not exist in any other secondary plastid-containing group (Reyes-Prieto et al. 2007; Gould et al. 2008). Nucleomorph genomes are the most highly reduced and compact nuclear genomes known (Douglas et al. 2001; Moore and Archibald 2009). The extent to which the nuclear genome of cryptomonads, and indeed all secondary plastid-containing algae, is an amalgam of eukaryotic genes from both the host and the endosymbiont is an open question, one with important implications for deep eukaryote phylogeny (Elias and Archibald 2009; Lane and Archibald 2008; Moustafa et al. 2009).

In 2000, Stibitz et al. discovered two distinct types of actin genes in the cryptomonad Pyrenomonas helgolandii. In their phylogenetic analyses, one of these genes grouped together with homologs from other photosynthetic cryptomonads as well as Goniomonas, a non-photosynthetic and plastid-lacking cryptomonad (McFadden et al. 1994). The second actin gene, much more divergent in nature, branched with a red algal actin sequence, suggesting that it could be of secondary endosymbiotic origin. More recently, actin genes of apparent red algal origin were found in P. helgolandii, Guillardia theta and several Rhodomonas species by Tanifuji et al. (2006) and were confirmed to be present in the nuclear genome of these organisms, not in the nucleomorph genome. These results suggested that at some point during cryptomonad evolution actin genes encoded in the endosymbiont nuclear genome were transferred to the host nucleus, with their protein products acquiring a function in the host cytosol. However, statistical support for the relationship between the cryptomonad and the red algal actins was weak and the phylogenies were complicated by the divergent nature of both the red algal and the presumed nucleomorph-derived actin genes (Tanifuji et al. 2006). Indeed, and as mentioned above, it is now known that some species of red algae have multiple actin genes and in some cases, each gene has a different evolutionary rate (Hoef-Emden et al. 2005; Kitade et al. 2008; Le Gall et al. 2005; Wu et al. 2009; Zuccarello et al. 2009).

Here, we present an analysis of four actin genes from the nuclear genome of the cryptomonad G. theta and four actin genes from the red alga Galdieria sulphuraria. Phylogenetic analysis of these sequences in the context of actin proteins from across the full breadth of eukaryotic diversity suggests that gene duplication and differential gene loss have been an important factor in the evolution of the actin gene family in red algae. Consideration of intron–exon positions in the G. theta actin genes assisted in determination of inter-relationships between these four genes and revealed a probable case of reverse transcriptase-mediated complete intron loss.

Materials and Methods

Actin Gene Finding, Intron, and Target Peptide Prediction

A BLAST-based search for actin genes in the nuclear genome of the cryptomonad Guillardia theta was carried out against a preliminary 8X genome assembly generated by the Community Sequencing Program of the Joint Genome Institute (JGI) (http://www.jgi.doe.gov/sequencing/why/50026.html). The Galdieria sulphuraria genome project database (Weber et al. 2004) (http://genomics.msu.edu/galdieria/) was also searched using tBLASTn (Altschul et al. 1997) with the red algal-like actin protein sequence of Pyrenomonas helgolandii (AF284835) used as a query with an e value cutoff of <0.01. Preliminary protein sequences were inferred based on these tBLASTn results. In order to confirm whether the resulting G. theta genes did indeed encode bona fide actin homologs, preliminary phylogenetic trees were constructed using an alignment that included diverse actin sequences as well as actin-related proteins from Arabidopsis thaliana and actin super-family sequences from the green algae Chlamydomonas reinhardtii, Volvox carteri and Micromonas sp. RCC299. Only homologs that were “true” actin proteins (i.e., not actin-like sequences) were used in subsequent analyses. A total of four actin genes were found in both the G. theta and the G. sulphuraria genomes.
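The e value filtering step described above can be sketched as follows. This is only an illustration, assuming the tBLASTn hits are available in the standard 12-column tabular layout (e value in column 11); the scaffold names and scores in the example are invented and are not results from this study.

```python
import csv
import io

E_VALUE_CUTOFF = 0.01  # cutoff stated in the text

# Hypothetical tabular hits (12-column BLAST tabular layout); all values invented.
EXAMPLE_HITS = (
    "AF284835\tscaffold_A\t88.5\t370\t42\t1\t1\t370\t1200\t2309\t1e-150\t520\n"
    "AF284835\tscaffold_B\t45.0\t200\t100\t5\t10\t209\t500\t1099\t3e-20\t95.1\n"
    "AF284835\tscaffold_C\t28.0\t60\t40\t2\t100\t159\t40\t219\t0.5\t30.2\n"
)


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, e_value) pairs whose e value is below the cutoff."""
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        subject_id, e_value = row[1], float(row[10])
        if e_value < cutoff:
            kept.append((subject_id, e_value))
    return kept


candidates = filter_hits(EXAMPLE_HITS)
```

Hits retained by such a filter are only candidates; as described above, a phylogenetic check against actin-related proteins is still needed to separate "true" actins from actin-like sequences.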

In the case of G. theta, complete actin cDNA sequences were successfully amplified and sequenced by RT–PCR and 5′/3′ RACE (see below) for 3 of the 4 actin genes identified by BLAST (designated G. theta actins 1, 2, and 3). To verify predicted intron positions, alignments between genomic and complete cDNA sequences were generated using MEGA 4.0 (Kumar et al. 2008; Tamura et al. 2007). Since amplification of a cDNA from the G. theta actin 4 gene was not successful, intron/exon positions for the actin 4 gene of G. theta were predicted solely by comparison to the other two divergent actin genes and their inferred protein sequences.

For Galdieria sulphuraria, scaffolds containing actin genes were downloaded from the G. sulphuraria genome database and imported into the Artemis genome assembly software (release 11.22; Rutherford et al. 2000). Intron–exon boundaries were determined by aligning genomic DNA and EST sequences. In the case of G. sulphuraria actins 1 and 2, no ESTs were identified; protein sequences were therefore inferred from these genes using other actin proteins as a guide. To compare intron positions in the actin genes of both G. theta and G. sulphuraria, we used the actin gene on chromosome VI of Saccharomyces cerevisiae (EMBL RefSeq Genome database, accession and locus no. NC_001138) as a reference, as was done by Hoef-Emden et al. (2005).

Actin proteins were examined manually for the presence of amino (N)-terminal extensions that could serve as targeting peptides. The N-terminal regions of new proteins from G. theta and G. sulphuraria were searched for the presence of signal or transit peptide sequences using iPSORT (http://ipsort.hgc.jp/; Bannai et al. 2002).

RNA Extraction, RT-PCR, and 5′/3′ Rapid Amplification of cDNA Ends (RACE)

Total RNA from G. theta was extracted using Trizol reagent (Invitrogen, Carlsbad, California) according to the manufacturer’s instructions. DNase I treatment was performed at 37°C for 20 min. To remove DNase I, the sample was purified by phenol/chloroform extraction and ethanol precipitation.

5′/3′ RACE was performed using the 2nd Generation 5′/3′ RACE kit from Roche Diagnostics (Mannheim, Germany) according to the manufacturer’s instructions with the following modifications. Primer sequences used to synthesize first-strand 5′ cDNA ends were as follows: host_5′_sp1 (5′-GATGGAGGAGGACTGGGCA-3′) for G. theta actin 1, sca164_5′_606_629_sp1 (5′-TATCCTCTCTCAGACAGCAGC-3′) for actin 2, and sca708_5′_LA1_sp1 (5′-GCTCATTCCCAATTGCGATGACCTGTCC-3′) for actin 3. For the first PCR of 5′ RACE experiments, a poly-T forward primer and the following reverse primers were used: host-5′_sp2 (5′-CGCTCGGTGAGGATCTTCAT-3′) for actin 1, sca164_5′_320_345_sp2 (5′-ATGGGATGCTCGTCGGGCAAG-3′) for actin 2, and sca708_5′_1_sp2 (5′-AGATTTCTCCAGGTCATTGGATTCA-3′) for actin 3. A second PCR was done only for the actin 3 gene using a common PCR anchor primer, which was included in the RACE kit, and primer sca708_5′_sp3 (5′-AGACATTACTGCTTGCACTGCGAC-3′).

For 3′ RACE, total RNA was reverse transcribed using a poly-T primer. The first PCR reactions were performed using an anchor primer and specific primers as follows: host 3′_sp1 (5′-ATGAAGATCCTCACCGAGCGCGGCTACTCC-3′) for actin 1, sca164_3′_sp1 (5′-CAATCTTGAGGCTTGACCTTGCTG-3′) for actin 2, and sca708_3′_sp1 (5′-CTGTACCAATCTATGAAGGCTATGC-3′) for actin 3. A second PCR was also performed using an anchor primer and the following second PCR-specific primers: host 3′_sp1 (5′-TCCTCCTCCATCGAGAAGTCC-3′) for actin 1, sca164_3′_sp1 (5′-CGTGATCTTACCGAATACATGTGC-3′) for actin 2, and sca708_3′_sp1 (5′-CTCGACATCTGCAGAGCAAGAGAT-3′) for actin 3.
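As a quick consistency check on the primers listed above, the sketch below computes a reverse complement and compares two of the actin 1 primers. The helper function and variable names are illustrative and not part of the original protocol; the primer sequences are copied from the text.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as given in the text (5' to 3').
HOST_5_SP2 = "CGCTCGGTGAGGATCTTCAT"            # 5' RACE, first PCR, actin 1
HOST_3_SP1 = "ATGAAGATCCTCACCGAGCGCGGCTACTCC"  # 3' RACE, first PCR, actin 1

rc = reverse_complement(HOST_5_SP2)
# True if the 5' RACE primer is the reverse complement of the start
# of the 3' RACE primer, i.e. both cover the same region of actin 1.
same_region = HOST_3_SP1.startswith(rc)
```

With these two sequences the check succeeds, which suggests that the 5′ and 3′ RACE products for actin 1 were designed to overlap at this site.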

Primer sequences for RT-PCR were as follows: host_425_450_F (5′-CATGTACGTCGCCATCCAGGCTGTGCTCT-3′) and host_839_859_R (5′-CGTGGATGCCAGCCGACTCGAGTCCGATG-3′) for actin 1, sca164_320_345_F (5′-CTTGCCCGACGAGCATCCCATCCTAGTC-3′), and sca164_R (5′-TCTCATATGAGACTTCGCAATCATTAGAGGTT-3′) for actin 2, and sca708_320_348_F (5′-TATACCGGAAGAACATCCTGTCCTATTGA-3′) and sca708_717_745_R (5′-AGATTTCTCCAGGTCATTGGTTCAGCGGC-3′) for actin 3.

All PCR reactions were performed using the following general conditions: 94°C for 10 min, followed by 30 cycles of 94°C for 15 s, 55°C for 30 s, and 72°C for 45 s. PCR products were cloned into the pGEM-T Easy vector (Promega, Madison, WI, USA). At least three independent bacterial colonies were grown in LB broth overnight and the plasmids were extracted using the QuickLyse Miniprep Kit (Promega, Madison, WI, USA). Plasmid inserts were sequenced using the CEQ Dye Terminator Cycle Sequencing Kit (Beckman Coulter, Inc., Fullerton, CA, USA) and a Beckman Coulter CEQ8000 capillary DNA sequencer. The G. theta sequences determined in this study have been submitted to the DNA Data Bank of Japan (DDBJ) under the following accession numbers: G. theta actin 1, AB544304; G. theta actin 2, AB544305; G. theta actin 3, AB544306; G. theta actin 4, AB544307.

Phylogenetic Analysis

Multiple sequence alignments of actin proteins were constructed and edited using the CLUSTAL W option of the MEGA 4.0 sequence alignment package. All positions containing gaps and missing data were removed. The dataset consisted of actin proteins including 288 unambiguously aligned amino acid positions and 102 operational taxonomic units (OTUs).
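The removal of all positions containing gaps or missing data (complete deletion) can be illustrated with a short sketch. The function name, the set of symbols treated as missing, and the toy sequences are assumptions made for illustration; the actual trimming was done in MEGA 4.0.

```python
def complete_deletion(alignment, missing_chars="-?"):
    """Drop every alignment column in which any sequence has a gap or
    missing-data symbol. `alignment` maps sequence name -> aligned string."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    kept_columns = [
        column
        for column in zip(*seqs)
        if not any(char in missing_chars for char in column)
    ]
    return {
        name: "".join(column[i] for column in kept_columns)
        for i, name in enumerate(names)
    }


# Invented toy alignment (not actin data).
toy = {
    "seq_a": "MDE-VAALV",
    "seq_b": "MDEEVAA?V",
    "seq_c": "M-EEVQALV",
}
trimmed = complete_deletion(toy)
```

In the toy example three of the nine columns are removed, leaving six sites per sequence; the same principle reduced the actin alignment to the 288 positions analyzed here.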

Phylogenetic analyses based on actin protein sequences were carried out using maximum likelihood (ML), Bayesian inference (BI), neighbor joining (NJ), and maximum parsimony (MP) methods. In order to select an appropriate amino acid substitution model for ML analysis, the likelihood ratio test was performed using ProtTest version 10.2 (Abascal et al. 2005). ML phylogenetic trees were constructed using RAxML version 7.0.4 (Stamatakis et al. 2008) with the RtRev substitution matrix and Gamma + Invar model (4 site rate categories). Starting trees for RAxML analyses were generated using a single MP tree and with 10 randomly swapped MP trees. Bootstrap values were calculated using the rapid bootstrap method and the CAT model with 1,000 replicates.

The same alignment and evolutionary models were used for BI methods using MrBayes 3.1.2 (Altekar et al. 2004; Ronquist and Huelsenbeck 2003). Simultaneous Markov chains were run for 5,000,000 generations, sampling every 100 generations. The first 5,000 of 50,000 trees were discarded as “burn-in.” Posterior probabilities for each node were obtained with MrBayes 3.1.2. MP and NJ methods were carried out with the JTT substitution matrix and Gamma (4 site rate categories) using MEGA 4.0, and bootstrap values for the MP and NJ analyses were based on 1,000 replications.
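The sampling and burn-in figures quoted above are mutually consistent, as the following arithmetic sketch shows. It counts sampled trees per run and ignores the starting tree; the variable names are illustrative.

```python
GENERATIONS = 5_000_000   # chain length stated in the text
SAMPLE_EVERY = 100        # sampling frequency stated in the text
BURN_IN_TREES = 5_000     # trees discarded as burn-in

sampled_trees = GENERATIONS // SAMPLE_EVERY        # trees sampled per run
burn_in_fraction = BURN_IN_TREES / sampled_trees   # share of samples discarded
retained_trees = sampled_trees - BURN_IN_TREES     # trees used for posteriors
```

In other words, the first 10% of the 50,000 sampled trees were discarded, and posterior probabilities were computed from the remaining 45,000.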

Genetic Distances Among Three Divergent G. theta Actin Genes

Two datasets were constructed to compute genetic distances between the actin 2, 3, and 4 genes of G. theta. The first dataset contained 1,113 bp of exon DNA sequences aligned with CLUSTAL W and inspected manually. For the second dataset, five introns present in all three genes at amino acid positions 73, 218, 246, 281, and 351 (relative to the S. cerevisiae actin, see above) were aligned individually using CLUSTAL W and inspected by eye, and then concatenated to produce a dataset containing a total of 222 bp of intronic sequence. All positions containing gaps and missing data were removed. The number of base substitutions per site was examined by pair-wise sequence analysis with MEGA 4.0. Analyses were conducted using the maximum composite likelihood method. Rate variation among sites was modeled on a gamma distribution with four rate categories.
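The workflow of aligning introns individually, concatenating them, removing gapped columns, and comparing genes pair-wise can be sketched as below. Note that this sketch uses the simple uncorrected p-distance (proportion of differing sites) as a stand-in; it does not implement the maximum composite likelihood method with gamma-distributed rates used in the study. All names and sequences are invented for illustration.

```python
from itertools import combinations


def concatenate(blocks):
    """Join separately aligned blocks (dicts of name -> aligned sequence)."""
    names = list(blocks[0])
    return {name: "".join(block[name] for block in blocks) for name in names}


def drop_gapped_columns(alignment, missing_chars="-?N"):
    """Remove columns where any sequence has a gap or missing-data symbol."""
    names = list(alignment)
    columns = [
        col
        for col in zip(*(alignment[name] for name in names))
        if not any(ch in missing_chars for ch in col)
    ]
    return {name: "".join(col[i] for col in columns) for i, name in enumerate(names)}


def p_distance(a, b):
    """Uncorrected proportion of sites that differ between two sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Two invented "intron" alignments for three hypothetical genes a, b, c.
toy_blocks = [
    {"a": "GTAAGT-CAG", "b": "GTAAGTACAG", "c": "GTAAGTACAG"},
    {"a": "GTGCGTTTAG", "b": "GTACGTTCAG", "c": "GTACGTTTAG"},
]
cleaned = drop_gapped_columns(concatenate(toy_blocks))
distances = {
    (x, y): p_distance(cleaned[x], cleaned[y])
    for x, y in combinations(sorted(cleaned), 2)
}
```

Model-based estimates such as those in Table 2 correct for multiple substitutions at the same site and can therefore exceed 1.0, whereas a p-distance cannot.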

Results

Novel Actin Genes from the Cryptomonad Guillardia theta and the Red Alga Galdieria sulphuraria

Preliminary tBLASTn searches of the G. theta nuclear genome identified seven scaffolds as containing actin gene candidates with e values <0.01. In order to determine whether these genes represent classical actin loci or more distantly related actin paralogs, a preliminary phylogenetic tree was constructed from an amino acid alignment containing a broad set of actins together with a variety of actin-related proteins from Arabidopsis thaliana, Chlamydomonas reinhardtii, Volvox carteri, and Micromonas sp. RCC299. Four of the seven G. theta actin genes correspond to proteins that branched with the “true” actin clade (data not shown) and were thus designated actin genes 1, 2, 3, and 4 (Table 1). Complete cDNA sequences of actins 1, 2, and 3 (but not 4) were successfully amplified by RT-PCR and 5′/3′ RACE, allowing the presence and position of spliceosomal introns to be determined. Actin 1 did not contain any introns, while actins 2, 3, and 4 contained 14, 13, and 11 introns, respectively, ranging from 45 to 237 bp in size. A significant degree of similarity in the intron positions of these three genes was observed (see below). Actin genes 1 and 3 corresponded to the partial actin sequences published previously (Stibitz et al. 2000, AF284835; Tanifuji et al. 2006, AB126028), while actins 2 and 4 have not been identified before. None of the actin protein sequences presented here for G. theta were found to possess obvious amino terminal signal and/or transit peptide sequences characteristic of nucleus-encoded, plastid- or periplastid-targeted proteins (data not shown). We thus assume, as did previous authors (Stibitz et al. 2000; Tanifuji et al. 2006), that their protein products function in the host cell cytoplasm.

Table 1 Actin genes from the cryptomonad Guillardia theta and the red alga Galdieria sulphuraria, along with their accession numbers (G. theta) or coding sequence (CDS) prediction IDs (G. sulphuraria)

Eight scaffolds in the Galdieria sulphuraria genome supercontig database (Build 3.0, May 2007) were found to contain actin or actin-related genes on the basis of a tBLASTn search. Protein sequences were inferred and subjected to preliminary phylogenetic analysis, resulting in four scaffolds (stig_0, 9, 11, and 29) being found to contain canonical actin genes. Gene models for four of these genes were present in the “Build 3.0” database (stig_0.Gs01760.1, stig_9.Gs15600.1, stig_11.Gs17630.1, and stig29.Gs35290.1), the latter two corresponding to sequences in the G. sulphuraria EST database (gi|00000756 and gi|00000116, respectively). The sequences derived from the stig_0.Gs01760.1 and stig_9.Gs15600.1 scaffolds were not found in GenBank or the NCBI EST database. The four G. sulphuraria actin genes are referred to herein as actin 1 (stig_11.Gs17630.1), actin 2 (stig29.Gs35290.1), actin 3 (stig_0.Gs01760.1), and actin 4 (stig_9.Gs15600.1) (Table 1). A search for additional actin genes in the genomes of the red alga Cyanidioschyzon merolae and the haptophyte Emiliania huxleyi uncovered no novel sequences.

Phylogenetic Analyses

Molecular phylogenetic analyses of 102 actin protein sequences from across the eukaryotic tree were carried out using maximum likelihood (ML), neighbor-joining (NJ), maximum parsimony (MP), and Bayesian inference (BI) methods (Fig. 1). As shown previously (Stibitz et al. 2000; Tanifuji et al. 2006), cryptomonad actin sequences are partitioned into two very distinct clusters. These were arbitrarily designated “type-1” and “type-2,” with the G. theta actin gene 1 belonging to the former and genes 2, 3, and 4 belonging to the latter. The cryptomonad type-1 clade possesses sequences with relatively short branch lengths, while the type-2 clade is much more divergent (in particular the G. theta actin 4 sequence) and corresponds to the clade that was shown previously to have affinity to red algal sequences and was proposed to be of secondary endosymbiotic origin (Stibitz et al. 2000; Tanifuji et al. 2006). Overall, phylogenies constructed from DNA sequence alignments (with all three codon positions as well as 1st and 2nd positions only) showed very similar branching patterns to those seen in the protein-based tree (Fig. 1; data not shown).

Fig. 1

Maximum likelihood phylogenetic tree constructed from an alignment of 102 actin proteins and 288 amino acid sites using RAxML. Support values are as follows: numbers above branches are bootstrap support percentages from ML (left) and NJ (right) analyses, numbers under branches are bootstrap values from MP analysis (left) and Bayesian posterior probabilities (right). No numbers are shown where bootstrap support was less than 80% or posterior probabilities were less than 0.8. Arrowheads show new actin proteins from the cryptomonad Guillardia theta and the red alga Galdieria sulphuraria. Scale bar indicates inferred number of amino acid substitutions per site

Unexpectedly, the red algal actin protein sequences also partition into two highly distinct clusters (arbitrarily “type-1” and “type-2”). The type-2 clade includes all members of the subphylum Rhodophytina (as in Yoon et al. 2006) for which sequences are known, as well as the actin 1 and 2 homologs of G. sulphuraria, a member of the Cyanidiophytina, which branch at the base of this group with maximum support (Fig. 1). The type-1 red algal actin clade consists of the actin 3 and 4 genes/proteins from G. sulphuraria, as well as single homologs from Cyanidioschyzon merolae and Cyanidium caldarium. Significantly, G. sulphuraria possesses actin homologs that branch in both the type-1 and -2 clades. In the context of this large phylogeny, the cryptomonad type-2 clade branches adjacent to a red algal type-2 and haptophyte clade, although there is no support for this position (Fig. 1).

Given that the G. theta actin 4 protein and, to a lesser extent, the red algal type-2 proteins are quite divergent in our phylogenies, additional trees were constructed from alignments excluding actin 4 of G. theta and red algal type-2 actin proteins, with the exception of those from G. sulphuraria, to reduce potential long branch attraction artifacts. The overall topology of this tree (Fig. S1) was essentially the same as that shown in Fig. 1.

Actin Introns

Figure 2 shows the positions of introns and intron phases in the G. theta and the G. sulphuraria actin genes, using the actin protein of the yeast Saccharomyces cerevisiae as a reference. Since the cDNA sequence of actin 4 from G. theta was unavailable, the intron positions and phases of actin 4 were predicted by comparison to the other actin gene sequences. As noted above, G. theta actin genes 2, 3, and 4 contain numerous introns, many of which are in homologous positions (including the same intron phase) and share significant sequence similarity (see below). Actins 2 and 3 share 6 intron positions (4 of the 6 introns in the same position were also in the same phase), while actins 2 and 4 share 5 intron positions (4 of 5 phases were shared). Actins 3 and 4 share 11 introns (10 of 11 phases were shared). Overall, 5 introns (amino acid positions 73, 218, 246, 282, and 351) are shared among the three G. theta actin genes designated as type-2.

Fig. 2

Distribution of introns in the Guillardia theta type-2 and Galdieria sulphuraria type-1 actin genes. Boxes represent exons, lines represent introns. Numbers above the lines indicate amino acid positions relative to the yeast Saccharomyces cerevisiae (see text), numbers under the line correspond to intron phase. Gray boxes indicate same intron position, and dotted lines indicate introns in the same position and in the same phase. The intron positions of G. theta actin 4 were predicted by in silico analyses (see “Materials and methods”)

In the case of the red alga Galdieria sulphuraria, actins 3 and 4, which belong to the red algal type-1 clade, contain introns whereas actins 1 and 2 possess no introns at all. G. sulphuraria actin 3 contains seven introns corresponding to amino acid positions 52, 73, 113, 157, 177, 221, and 246. Actin 4 contains each of these introns (in the same phase as their counterparts in actin 3) plus an eighth intron near the 3′ end of the gene. Two intron positions (73 and 246) were shared among actins 2, 3, and 4 of G. theta and actins 3 and 4 of G. sulphuraria (Fig. 2).
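The comparison of intron positions between species amounts to a set intersection, sketched below. The G. sulphuraria actin 3 positions are those listed above, and the G. theta positions are the five shared among actins 2, 3, and 4; the variable names are illustrative, and intron phase is not considered in this sketch.

```python
# Intron positions (amino acid coordinates relative to S. cerevisiae actin).
# Positions shared by G. theta actins 2, 3, and 4. The fourth position is
# given as 281 in the Methods and 282 in the Results; either value leaves
# the intersection below unchanged.
G_THETA_TYPE2_SHARED = {73, 218, 246, 281, 351}

# Positions in G. sulphuraria actin 3; all are also present in actin 4.
G_SULPHURARIA_ACTIN3 = {52, 73, 113, 157, 177, 221, 246}

shared_positions = sorted(G_THETA_TYPE2_SHARED & G_SULPHURARIA_ACTIN3)
```

The intersection recovers positions 73 and 246, the two intron positions reported as shared between the G. theta type-2 genes and G. sulphuraria actins 3 and 4.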

Pair-Wise Comparisons of Actin Genes in G. theta

Table 2 shows the results of pair-wise distance calculations for both exonic and intronic regions of the actin 2, 3, and 4 genes of G. theta. A total of 1,113 bp of exon sequence was used. The smallest distance (i.e., the highest degree of similarity) was 0.365, between actins 2 and 3, and the greatest distance was 0.684, between actins 2 and 4. In order to infer genetic distances among introns, 5 homologous intron sequences were aligned independently, and a 222 bp concatenate was used for pair-wise analyses. Here, the smallest distance was 0.788, between actins 3 and 4, while the greatest was 1.698, between actins 2 and 4.

Table 2 Estimated evolutionary divergence of cryptomonad type-2 actin genes from Guillardia theta

Discussion

The main rationale for this study was to better understand the origin and the evolution of divergent actin genes in cryptomonads relative to red algal actins. To that end, we analyzed four actin coding genes from the nuclear genome of the cryptomonad Guillardia theta as well as four novel actins from the red alga Galdieria sulphuraria, a member of the Cyanidiophytina. Phylogenetic analyses show that the actin genes from both cryptomonads and red algae are divided into two distinct clusters in the context of global eukaryote phylogeny (Fig. 1), although not all red algae appear to possess both types. Actin is, on the whole, one of the most highly conserved proteins in eukaryotes, but the branch lengths leading to both the red algal type-2 and the cryptomonad type-2 clades were relatively long when compared to those of most other major algal clades. This is consistent with the results of previous studies (Stibitz et al. 2000; Tanifuji et al. 2006). Having ruled out the (remote) possibility that these two divergent actin clusters in fact belong to actin-related gene families, and are not canonical actin genes, we conclude that these two families are the result of gene duplications and/or endosymbiotic gene transfer. Recently, it was reported that several algal lineages, including red algae, encode conservative and/or divergent actin genes (Wu et al. 2009). These different isoforms were suggested to be the product of gene duplication events occurring after the major algal lineages diverged. Our findings are similar in showing evidence for lineage-specific gene duplications, but also suggest much deeper paralogies in the evolution of red algal and cryptomonad actins.

Origin and Diversification of Actin Genes in Cryptomonads and Their Relationship to Red Algal Homologs

We have designated the two classes of cryptomonad actin genes as type-1 and -2. The type-1 clade includes sequences from all previously examined cryptomonads (Stibitz et al. 2000; Tanifuji et al. 2006), including the plastid-lacking, deeply diverging species Goniomonas truncata (McFadden et al. 1994). These facts are consistent with the idea that the cryptomonad type-1 actin gene is the host-derived version, present in the nucleus before cryptomonads acquired photosynthesis by secondary endosymbiosis (Stibitz et al. 2000), although it is not known whether the cryptomonad plastid was acquired before or after the divergence of Goniomonas from plastid-bearing cryptomonads (see Archibald 2009a, b).

In contrast, the type-2 actins of cryptomonads form a monophyletic group, albeit weakly supported, with haptophytes and the red algal type-2 clade (Fig. 1). Stibitz et al. (2000) first suggested that the P. helgolandii homolog, here designated as a type-2 sequence, is the product of endosymbiotic gene transfer (Fig. 3a), because it branched with the two red algal sequences available at that time. However, these authors acknowledged the possibility that this relationship might be due to a long-branch attraction artifact, because the red algal actins and the P. helgolandii homolog were relatively long branches, and the bootstrap support for their monophyly was low. Our phylogenetic trees, including those constructed from alignments with reduced taxon sampling and more amino acid sites, were also consistent with the monophyly of cryptomonad type-2 and red algal type-2 clades, although bootstrap support was always weak (data not shown). Given that actins 1 and 2 from the red alga G. sulphuraria serve, to some extent, to break up the long branch leading to the base of the red algal type-2 clade, we reasoned that the long branch attraction problem might be reduced when compared to previous phylogenetic analysis and a more highly resolved tree might be the result. Unfortunately this was not the case (Figs. 1, S1).

Fig. 3

Hypotheses for the evolution of actin genes in cryptomonads and red algae. a Evolutionary scheme showing an endosymbiotic gene transfer (EGT) hypothesis for the origin of type-2 actin genes in Guillardia theta and other cryptomonads. b Evolutionary scheme of cryptomonad actin genes based on gene duplication. c Possible evolutionary scenario for the evolution of multiple actin isoforms in Galdieria sulphuraria and other red algae involving gene duplication/loss and/or lateral gene transfer (LGT). Arrows within circles denote gene duplication events

If the cryptomonad type-2 actins are not of red algal endosymbiotic origin, where did they come from? The most straightforward evolutionary scenario (Fig. 3b) is that they are the result of a gene duplication of a cryptomonad type-1 gene. However, as noted above, our phylogenetic trees (and those published previously) provide no evidence for a specific relationship between the type-1 and -2 clusters (Figs. 1, S1). Even when all red algal actin genes and the divergent actin 4 of G. theta are excluded, the cryptomonad type-2 clade still shows no affinity to the cryptomonad type-1 clade or indeed any of the other major algal clades (data not shown). If the type-1 and -2 actins are the product of gene duplication, it occurred some time ago and the increase in evolutionary rate of the type-2 sequences has apparently erased all traces of this relationship, to the point that the two paralogs no longer even branch together to the exclusion of sequences from other eukaryotes.

The complete absence of introns in the G. theta actin 1 gene (Fig. 2) precludes consideration of conserved intron positions between the type-1 and -2 genes of cryptomonads. However, intron comparisons do add further strength to our phylogenetic trees in suggesting that G. theta actins 2, 3, and 4 are recent duplicates of one another. These three genes share five introns, and the highly divergent actin 4 shares all 11 of its introns with actin 3. Pair-wise distance comparisons tell the same story: despite the extraordinarily divergent nature of its protein sequence, the actin 4 gene is still very similar to actin 3 in its intron sequences (Table 2). It is possible that actin 4 is in fact a recent pseudogene, as we were unable to amplify an actin 4 cDNA by RT-PCR.

The nuclear genome of the cryptomonad G. theta is being sequenced as part of the Joint Genome Institute’s Community Sequencing Program, and preliminary investigations suggest that introns are abundant (Archibald Laboratory, unpublished data). It is thus possible that the intron-free structure of the G. theta actin 1 gene is the result of reverse transcriptase-mediated complete intron loss, which is an established model for intron loss by homologous recombination between an intron-lacking cDNA and the genomic locus from which it is derived (Mourier and Jeffares 2003). In addition, only one type-1 sequence was found in the 8X genome of G. theta, in contrast to Goniomonas and Rhodomonas species where multiple copies were found (Tanifuji et al. 2006). More analysis is needed to elucidate the evolutionary processes underlying this variation in actin gene structure and copy number.

In sum, our analyses are consistent with—but by no means prove—the hypothesis that cryptomonad type-2 actin genes are derived from red algae by endosymbiotic gene transfer. It is possible that a much broader sampling of cryptomonad type-1 and -2 genes, and in particular, discovery of less divergent members of the type-2 clade, will allow us to better infer their evolutionary origin. The haptophytes also deserve closer attention, as the sequences available at the present time show weak but consistent affinity to the cryptomonad type-2 and red algal type-2 sequences, and not the cryptomonad type-1 cluster.

Actin Gene Phylogeny in Red Algae

The presence of multiple copies of actin genes in red algae has been reported previously (Hoef-Emden et al. 2005; Le Gall et al. 2005; Takahashi et al. 1998; Wu et al. 2009; Zuccarello et al. 2009). The consensus view is that their actin paralogs were produced by gene duplication after the major red algal lineages diverged from one another. However, our phylogenetic analyses show that the actin genes from some red algae fall into two very distinct clusters (Figs. 1, S1). Red algae are generally split into two major lineages, Rhodophytina and Cyanidiophytina, on the basis of their morphological and genetic features (Yoon et al. 2006). All examined Rhodophytina species have sequences in the red algal type-2 clade, and only two genes from a member of the Cyanidiophytina, both from G. sulphuraria, branch with the red algal type-2 clade. The red algal type-1 clade currently consists solely of sequences from Cyanidiophytina species.

Evidence for “recent” actin gene duplication within the Cyanidiophytina comes from consideration of the four G. sulphuraria homologs and the presence and distribution of their introns. G. sulphuraria actin 4 shares 7 of 8 intron positions with actin 3 (Fig. 2). This is intriguing given that the actin 3 homolog branches robustly with a single C. merolae sequence, which does not contain any introns (Matsuzaki et al. 2004), as well as a sequence from Cyanidium caldarium (Figs. 1, S1). These data suggest that the two type-1 actin gene copies in G. sulphuraria are the product of gene duplication in the common ancestor of G. sulphuraria, C. merolae, and C. caldarium, followed by loss of one of the paralogs in C. merolae (and possibly C. caldarium) as well as loss of introns in the remaining gene. This is consistent with the low intron density seen in the small C. merolae genome (Matsuzaki et al. 2004; Nozaki et al. 2007).

Overall, one scenario for actin gene evolution in red algae (Fig. 3c) is as follows: (1) a common red algal ancestor possessed both types of actin genes before the red algae diverged into Rhodophytina and Cyanidiophytina, (2) the Rhodophytina lost the red algal type-1 actin gene, and the Cyanidiophytina (with the sole exception thus far of G. sulphuraria) lost the red algal type-2 actin gene, (3) various novel actin genes were subsequently generated by more recent gene duplications within each lineage, and (4) introns were gained and/or lost in the various actin paralogs prior to and/or after the split between Rhodophytina and Cyanidiophytina (see next section).

The main problem with this scenario is that the two clusters of red algal actin sequences do not branch together in our phylogenies, as would be predicted if gene duplication took place in the common ancestor of Rhodophytina and Cyanidiophytina. This issue was in fact recognized previously in the analyses of Hoef-Emden et al. (2005), who sequenced 19 red algal actin genes and omitted the sequences from the Cyanidiophytina available at that time, due to valid concerns over long-branch attraction artifacts. As is the case in cryptomonads, if the red algal type-1 and -2 sequences are the product of a gene duplication event specific to the common ancestor of all red algae, the divergent nature of the red algal type-2 sequences is such that all phylogenetic signal supporting their monophyly appears to have been erased.

A more radical alternative is that one of the red algal sequence types, and indeed one of the cryptomonad actins, is not a paralog but the product of a lateral gene transfer (LGT) event. Evidence is mounting that LGT can be an important factor in eukaryotic genome evolution (see Keeling and Palmer 2008 and references therein for a recent review), but most examples characterized thus far involve prokaryote-to-eukaryote LGT. Nevertheless, eukaryote–eukaryote transfers have been identified, including those involving highly conserved proteins that have traditionally been used for eukaryote systematics, such as alpha-tubulin (Simpson et al. 2008) and elongation factors (Keeling and Inagaki 2004). At the present time, there is no evidence for or against this hypothesis, as “donor” or “recipient” lineages are not obvious from our trees and heterogeneity in the rates of actin sequence evolution makes building reliable phylogenies difficult. Regardless, it is potentially significant that while the actin 3 and 4 genes of G. sulphuraria were detected by EST sequencing in Weber et al. (2004), actins 1 and 2 were not. This is consistent with the possibility of functional differentiation of actin genes in this organism.

Comparison of Actin Introns in Cryptomonads and Red Algae

The actin gene families of G. theta and G. sulphuraria are superficially similar in that each possesses an intron-lacking family (cryptomonad type-1 and red algal type-2) and an intron-containing family (cryptomonad type-2 and red algal type-1). Actin intron distribution in members of the Rhodophytina has been examined previously; the reported introns occur at positions 35 (Erythrotrichia carnea), 41 (Bonnemaisonia hamifera ACT1, B. hamifera ACT2, Nemalionopsis shawi ACT1, and Chondrus crispus ACT1), 73 (Glaucosphaera vacuolata), 296 (B. hamifera ACT2), and 304 (Palmaria palmata ACT1) (Hoef-Emden et al. 2005).
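
As an illustrative aside, the Rhodophytina intron positions listed above can be tabulated to see which of them occur in more than one species. The Python sketch below is not part of the original analysis: the function names are hypothetical, species names are simply taken as the first two words of each gene label, and the data are limited to the positions quoted in this paragraph.

```python
from collections import defaultdict

# Intron positions reported for Rhodophytina actin genes
# (Hoef-Emden et al. 2005), transcribed from the paragraph above.
INTRON_POSITIONS = {
    "Erythrotrichia carnea": {35},
    "Bonnemaisonia hamifera ACT1": {41},
    "Bonnemaisonia hamifera ACT2": {41, 296},
    "Nemalionopsis shawi ACT1": {41},
    "Chondrus crispus ACT1": {41},
    "Glaucosphaera vacuolata": {73},
    "Palmaria palmata ACT1": {304},
}


def species_of(gene_label):
    """Assumption: the species binomial is the first two words of the label."""
    return " ".join(gene_label.split()[:2])


def genes_by_position(intron_positions):
    """Map each intron position to the set of genes that carry it."""
    by_position = defaultdict(set)
    for gene, positions in intron_positions.items():
        for position in positions:
            by_position[position].add(gene)
    return dict(by_position)


def positions_shared_between_species(intron_positions):
    """Return the sorted positions found in genes of two or more species."""
    shared = []
    for position, genes in genes_by_position(intron_positions).items():
        if len({species_of(gene) for gene in genes}) >= 2:
            shared.append(position)
    return sorted(shared)


if __name__ == "__main__":
    for position, genes in sorted(genes_by_position(INTRON_POSITIONS).items()):
        print(position, sorted(genes))
    print("Shared between species:",
          positions_shared_between_species(INTRON_POSITIONS))
```

With only the positions quoted here, position 41 is the single site found in more than one species, which illustrates how few intron positions are shared among these genes.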

Since the intron-lacking actin 1 and 2 genes from G. sulphuraria branch at the base of the red algal type-2 clade, two intron gain/loss scenarios are possible. First, a common ancestral actin gene of red algal type-2 completely lacked introns, and the introns in the Rhodophytina were acquired after the Rhodophytina and Cyanidiophytina diverged. Alternatively, multiple instances of intron loss have occurred in red algal type-2 actins, with a few introns still remaining in Rhodophytina species. Although it is at present impossible to distinguish which scenario is more likely, it is important to note that there are few shared intron positions in the red algal type-2 actin genes of different species. Only one intron (position 73) in G. vacuolata is shared with the type-1 actins (3 and 4) of G. sulphuraria. Interestingly, although the intron-containing cryptomonad type-2 and red algal type-1 actin genes do not branch together in our phylogenies, they nevertheless share two intron sites at positions 73 and 246 (Fig. 3). However, the intron at position 73 is also found in the streptophyte green alga Cosmarium botrytis (Hoef-Emden et al. 2005), and thus is not evidence for a specific relationship (via endosymbiotic gene transfer) between the cryptomonad type-2 sequences and the type-2 genes of red algae. In sum, the data suggest a complex pattern of intron gain and loss in red algal actin genes and those of cryptomonads.

Conclusion

The nuclear genomes of some cryptomonads encode two very distinct actin genes with different evolutionary rates and intron densities. The origin of the divergent type-2 gene remains a mystery, although it is still possible that it is derived from red algae by endosymbiotic gene transfer. Within red algae, a probable and hitherto unrecognized deep paralogy was revealed. The most straightforward explanation is that the common ancestor of red algae possessed two actin genes, and that only one of the two types was retained in each lineage. The sole exception thus far is G. sulphuraria. Much more data from diverse red algal genomes will be needed to rigorously test this hypothesis, but it nevertheless raises questions about the utility of actin as a broad phylogenetic marker for red algal systematics and, indeed, eukaryote systematics as a whole. Given that paralogy appears to be a recurring feature of the actin gene family on recent and ancient evolutionary time scales, it will be difficult to rigorously assess orthology versus paralogy in the absence of complete genome sequences.