
I. Introduction

The term ‘algae’ as used here includes all plastid-containing eukaryotes, except land plants and ‘blue-green algae’ (a popular misnomer for Cyanobacteria). Algae cover a large variety of about 20 taxonomic groups (among the best-known ones are green, red, brown and golden algae, diatoms, glaucophytes, raphidophytes, cryptophytes, haptophytes, chlorarachniophytes, dinoflagellates and euglenids). Some of these groups include both unicellular and multi-cellular species (e.g., the large-size brown algal kelp, various red and green algal taxa). In rare instances, algae are secondarily non-photosynthetic, carrying a plastid genome with reduced coding capacity; these include the colorless green algae Prototheca and Helicosporidium (Knauf and Hachtel 2002; Pombert and Keeling 2010); the euglenid Euglena (Astasia) longa (Knauf and Hachtel 2002); and Plasmodium and its apicomplexan relatives (McFadden and Waller 1997; Wilson and Williamson 1997).

Plastid genomes are best described and compared within an evolutionary framework (a phylogenetic tree based on plastid protein sequences is shown in Fig. 3.1), which is, however, more easily said than done. One reason is that the phylogenetic placement of reduced or fast-evolving plastid sequences is challenging due to a lack of phylogenetic signal. Another difficulty arises from the different evolutionary routes followed by plastids: vertical descent from a Cyanobacterium, and lateral acquisition from other eukaryotes. The latter entails the transfer of both the complete plastid DNA (ptDNA) and an often undetermined number of nuclear genes from the symbiont to the host nucleus, leading to potential phylogenetic misinterpretations. For instance, the plastid tree in Fig. 3.1 groups dinoflagellates such as Kryptoperidinium and Durinskia with diatoms, apicomplexans with stramenopiles, cercozoans with green algae, and so on. One may indeed wonder which of the shown phylogenetic relationships represent vertical evolutionary descent at all. The only notable exceptions are primary photosynthetic eukaryotes (green, red, glaucophyte algae and land plants, collectively known as ‘Plantae’; Cavalier-Smith 1981; see also Chap. 1), whose plastids derive directly from a cyanobacterial origin, and which are therefore expected to form a monophyletic group with nuclear, plastid and mitochondrial genes in phylogenetic analyses (Baurain et al. 2010).

Fig. 3.1.

Phylogeny based on ptDNA-encoded proteins. The phylogeny was inferred with a set of 76 derived ptDNA-encoded protein sequences that are most common across algae. Non-photosynthetic species such as Plasmodium or Helicosporidium were excluded, as their ptDNAs encode only small subsets of genes that evolve at elevated rates (i.e., problematic for phylogenetic inference). Sequences were aligned with software developed in-house. Briefly, derived protein sequences are pre-aligned with Muscle (Edgar 2004), and alignments are iteratively refined with HMMalign (S. Eddy; http://hmmer.janelia.org) using E-values obtained with respective HMM models as an optimization criterion. Sequence positions that are not aligned with a posterior probability value of 1.0 are discarded, and alignments are concatenated. The resulting dataset including 48 ptDNAs (plus a cyanobacterial outgroup) has 17,409 amino acid positions. It was analyzed using PhyloBayes that implements Bayesian inference and the CAT model, known to be least sensitive to LBA artifacts (Lartillot and Philippe 2004; Rodriguez-Ezpeleta et al. 2007b; Lartillot and Philippe 2008, and references therein). It is best suited for the placement of rapidly-evolving species. All branches are supported by posterior probability values of 1.0 except where indicated. Species are color-coded according to their plastid origins as green, red or magenta (glaucocystophytes). Plastids that were transferred by secondary or higher-order eukaryote-eukaryote endosymbiosis are marked with an asterisk. The branch length of Chromera (dotted broken line) is about three times as long as indicated. Its association with Alveolata could be due to LBA; if analyzed alone, Chromera tends to group weakly with stramenopiles.
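The filtering and concatenation steps described in the caption can be illustrated with a minimal Python sketch. It is not the in-house software mentioned above: the function names, the toy sequences and the simplification of one posterior probability per alignment column are all assumptions made for illustration.

```python
from typing import Dict, List


def filter_columns(alignment: Dict[str, str],
                   posteriors: List[float],
                   threshold: float = 1.0) -> Dict[str, str]:
    """Keep only alignment columns whose posterior probability reaches the threshold.

    Assumption: one posterior value per column (a simplification for illustration).
    """
    keep = [i for i, p in enumerate(posteriors) if p >= threshold]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}


def concatenate(alignments: List[Dict[str, str]], missing: str = "-") -> Dict[str, str]:
    """Concatenate per-protein alignments; taxa lacking a protein are padded with gaps."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        length = len(next(iter(aln.values()))) if aln else 0
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * length)
    return supermatrix


# Toy data (hypothetical sequences, not taken from the real dataset).
protein_1 = filter_columns({"Porphyra": "MTAIL", "Cyanophora": "MTA-L"},
                           [1.0, 1.0, 0.8, 1.0, 1.0])
protein_2 = filter_columns({"Porphyra": "MSQ"}, [1.0, 0.9, 1.0])
matrix = concatenate([protein_1, protein_2])
print(matrix)
```

Padding absent proteins with gap characters keeps every taxon at the same length in the concatenated matrix, which is what most phylogenetic programs expect as input.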

A. Origin and Evolution of Primary Photosynthetic Algae and Their Plastids

The origin of primary plastids represents a relatively late step in eukaryotic evolution, well after the endosymbiosis with the α-Proteobacterium that evolved into the mitochondrion. Most plastid genomes retain many more features of their (cyano)bacterial ancestor than do mitochondrial genomes, such as large conserved bacterial operons and bacteria-like RNA polymerases (but see the notable exception in jakobid mitochondria; Lang et al. 1997). Although plastids came in relatively late, the exact nature of the eukaryotic group that acquired plastids remains vague, as primary (ancestrally) non-photosynthetic members belonging to Plantae are unknown. In fact, even phylogenomic evidence for the monophyly (i.e., divergence from a single, common origin) of Plantae varies with taxon and gene sampling, with significant statistical support in some cases (e.g., Rodriguez-Ezpeleta et al. 2005 and references therein) but not in others (e.g., Burki et al. 2009; Baurain et al. 2010; Parfrey et al. 2010; Chan et al. 2011). Likewise, the branching order of primary photosynthetic lineages has been elusive, depending much on the choice of genes and species included in phylogenies (Rodriguez-Ezpeleta et al. 2005; Reyes-Prieto and Bhattacharya 2007; Deschamps and Moreira 2009). Taken together, much remains to be done in terms of resolving the origin and the evolutionary divergence of Plantae. Apparently, the resolution of the deepest branches of the eukaryotic tree remains unsatisfying, as deep eukaryotic (protist) diversity continues to be poorly sampled at the genome level. Yet, for the sake of simplicity, we will assume in the following that Plantae is a valid taxonomic grouping, and therefore discuss plastids in two major subdivisions, (1) those derived from a primary endosymbiotic event and (2) those that have been acquired by higher-order (secondary, tertiary …) endosymbioses among eukaryotes.

B. Algae with Second-Hand Plastids: Eukaryote-Eukaryote Endosymbioses

In contrast to Plantae – which are characterized by plastids with two surrounding membranes – there are three or four membranes in algae that have undergone eukaryote-eukaryote endosymbiosis, the focus of most reviews on plastid DNAs (ptDNAs; e.g., Douglas and Gray 1991; Wolfe et al. 1991; Douglas 1998; McFadden 1999; Moreira and Philippe 2001; Archibald and Keeling 2002; Stoebe and Maier 2002; Bhattacharya et al. 2004; Reyes-Prieto et al. 2007; Gould et al. 2008; Archibald 2009; Keeling 2009; Keeling 2010). These plastids are in most instances retraced to either a red or green algal origin (Fig. 3.1), but whether the endosymbiotic event is secondary, tertiary or higher-order often remains speculative. In particular, the source of the highly reduced ‘apicoplast’ plastids in alveolates (e.g., Plasmodium, Eimeria, etc.) remains uncertain, believed to be of either red (Williamson et al. 1994; Fast et al. 2001; Foth and McFadden 2003) or green algal origin (Kohler et al. 1997; Funes et al. 2002, 2004). Even gene transfer from mitochondrial DNA (mtDNA) to apicoplast DNA has been proposed (Obornik et al. 2002), and there is currently no convincing avenue for overcoming the massive phylogenetic artifacts (long-branch-attraction artifacts or LBA) that are the likely cause of the unsettled dispute. Phylogenetic analyses including these species are so questionable because of both the small number of remaining plastid genes and their extreme evolutionary rates.

Another confounding factor in these analyses is the number of symbiotic events that took place across eukaryotes. Plastids in cryptophytes, alveolates, stramenopiles plus haptophytes (collectively, CASH) likely arose from a single secondary endosymbiosis with a red alga because of a unique, shared feature of their plastids, the presence of chlorophyll c, and because phylogenies based on plastid sequences (plus a few nuclear genes involved in plastid function) clearly regroup CASH with red algae. However, the consensus in interpretation stops here. Based on the idea that eukaryote-eukaryote endosymbiosis is a very rare event, proponents of the ‘chromalveolate hypothesis’ (Cavalier-Smith 2002; Keeling 2009, 2010) postulate a single, ancient secondary endosymbiosis with a red alga. This supposition is contested by others predicting much more frequent (higher-order), serial plastid transfers (Sanchez-Puerta et al. 2007; Baurain et al. 2010; Gray 2010 and references therein). We share the interpretation of frequent transfers because of accumulating evidence in this direction. In dinoflagellates, for instance, there is compelling evidence for a number of subsequent plastid replacements (e.g., Minge et al. 2010). Another (contentious) example is in two presumed photosynthetic relatives of Apicomplexa, Chromera velia and Alveolata sp. (CCMP3115). Phylogenies with several concatenated nuclear genes confirm that they represent taxonomically deep divergences relative to Apicomplexa, and a phylogeny of 34 plastid genes places their plastids close to (but outside of) stramenopiles (heterokonts; Janouskovec et al. 2010). According to the authors’ interpretation, this represents support for the chromalveolate hypothesis. Yet, the plastid phylogeny that we performed for the purpose of this review, with an extended number of species (79) and proteins (76), comes to a different conclusion, placing Chromera and Alveolata plastids together within stramenopiles (Fig. 3.1), indicative of a higher-order endosymbiosis. This example indicates that phylogenetic analyses with data from fast-evolving genomes have to be interpreted with extreme prudence, in particular when these diverge deeply in a tree, an indicator for a potential phylogenetic reconstruction artifact (Philippe et al. 2005). In turn, when broader taxon sampling and/or the use of a superior (more realistic) evolutionary model, such as CAT (Lartillot and Philippe 2004; Lartillot et al. 2007), leads to an alternative tree topology favoring the grouping of rapidly evolving with slowly evolving species, even with limited statistical support (as in Fig. 3.1), it is more likely the correct one. Clearly, further investigation of the given example is needed, which falls outside the mission of this review.

Given the confusion in distinguishing secondary and higher-order eukaryote-eukaryote endosymbionts, we will only refer to the following five well established taxa: (1) golden, brown, diatom and raphidophyte algae (Stramenopila; Patterson 1989), (2) Alveolata plus Stramenopila and Rhizaria (SAR group; Burki et al. 2007; Hackett et al. 2007; Rodriguez-Ezpeleta et al. 2007a), (3) haptophytes, (4) cryptophytes, and (5) euglenids (belonging to the ‘JEH group’ uniting jakobids, Euglenozoa plus Heterolobosea; Rodriguez-Ezpeleta et al. 2007a). It is noteworthy that there is only one major eukaryotic supergroup without photosynthetic members (and without evident genetic remnants of eukaryote-eukaryote endosymbioses), the Unikonta. This group comprises Opisthokonta (animals, fungi and their protist relatives), Amoebozoa, and arguably, Apusozoa.

In the following, we will review plastid genome organization in the various groups of algae. The highly reduced alveolate ptDNAs (apicoplasts) will not be discussed in detail as they have been well described elsewhere (Wilson and Williamson 1997; McFadden 2011).

II. Plastid Genome Organization, Genes and Functions

We will start with a short introduction on the structure of plastid genomes and the type of genes they encode, across all eukaryotes. For sequence records we refer to the plastid genome section at GenBank, and two curated databases, GOBASE (O’Brien et al. 2009) and ChloroplastDB (Cui et al. 2006). Note that (1) the catalogue of complete ptDNAs in GenBank’s genome section is currently incomplete (e.g., most records of the reduced apicomplexan and several green algal ptDNAs are missing and have to be retrieved from the nucleotide section), and gene and intron information is only validated as to consistency; (2) information in GOBASE is no longer being updated as of 2010; and (3) ChloroplastDB was last updated in 2007 (at the time of writing this review), and it lacks a taxonomical grouping of species as well as data on certain structural RNAs (RNase P, tmRNA and signal recognition particle RNAs).

A. Plastid Genome Structure

Generally, plastids contain a single type of chromosome in multiple copies. Restriction analysis and sequencing revealed that most ptDNAs are circular-mapping (not to be confused with truly circular DNA molecules), likely representing linear head-to-tail concatemers, plus subgenome-size fragments that tend to occur in genomes carrying repeat regions (Bendich 2004, 2007; Oldenburg and Bendich 2004). A similar genome structure is observed in mitochondria (Oldenburg and Bendich 2001; Ling and Shibata 2004). The mechanism of replication remains essentially an open question. Small organelle genomes might be replicated by a rolling circle mechanism, but the presence of a substantial fraction of subgenome-size fragments suggests a more complex mechanism, likely including recombination, or template switching in other instances. Experimental evidence for this or any other type of organization and replication is available for only a small fraction of known plastid genomes. A notable curiosity exists in dinoflagellates, where several genes are separately encoded on DNA minicircles (Zhang et al. 1999, 2002; Howe et al. 2008). However, whether these circles represent the principal genome organization or highly abundant subgenomic molecules (e.g., replicative rolling circle DNAs) remains to be demonstrated. For more information on dinoflagellate plastids, we refer to recent publications (e.g., Zhang et al. 1999, 2002; Stoebe and Maier 2002; Hackett et al. 2004; Laatsch et al. 2004; Howe et al. 2008; Keeling 2010).

A widespread feature of pt genomes is a large inverted repeat (IR) region that contains genes for rRNAs and a variable number of tRNAs and proteins (e.g., Gardner et al. 1993; Douglas and Penny 1999; Sanchez Puerta et al. 2005; Belanger et al. 2006; Cattolico et al. 2008; Tanaka et al. 2011). The biological role of the IR region is likely increased gene dosage for ribosomal components (ribosomes are among the most abundant sub-cellular structures). The IR region may be present or not in related species (e.g., Pedinomonas minor, Parachlorella kessleri and Oocystis solitaria have this trait, whereas Chlorella vulgaris does not; Turmel et al. 2009b). Similarly, the two ptDNAs of photosynthetic cryptomonads have large inverted repeat regions containing rRNA genes, in contrast to the direct repeats in Porphyra species and the absence of repeats in Cyanidium (Glockner et al. 2000), Cyanidioschyzon (Ohta et al. 2003) and Gracilaria (Hagopian et al. 2004). The pt genome of the secondarily non-photosynthetic cryptomonad Cryptomonas paramecium has single-copy rRNA genes, like most red algae, which is best explained as a secondary loss of the repeat. This comparison shows that, although repeat features are given high attention in many publications on complete plastid genomes, they are not well conserved across eukaryotes and are of undefined value for understanding the evolution of genome structure and function.
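The distinction between inverted and direct repeats drawn above can be made concrete with a small Python sketch. The function names, coordinates and toy sequences are hypothetical; real IR detection on a ptDNA would additionally have to tolerate mismatches and handle the circular map.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(genome: str, start_a: int, start_b: int, length: int) -> bool:
    """True if the segment at start_b is the exact reverse complement of the one at start_a."""
    a = genome[start_a:start_a + length]
    b = genome[start_b:start_b + length]
    return len(a) == length and len(b) == length and a == reverse_complement(b)


def is_direct_repeat(genome: str, start_a: int, start_b: int, length: int) -> bool:
    """True if both segments are identical and in the same orientation."""
    a = genome[start_a:start_a + length]
    b = genome[start_b:start_b + length]
    return len(a) == length and len(b) == length and a == b


# Toy genomes: a 6-nt repeat separated by a 4-nt single-copy spacer.
inverted = "AAGGCT" + "TTTT" + "AGCCTT"   # second copy is the reverse complement
direct = "AAGGCT" + "TTTT" + "AAGGCT"     # second copy has the same orientation

print(is_inverted_repeat(inverted, 0, 10, 6), is_direct_repeat(direct, 0, 10, 6))
```

In both arrangements the repeated genes are present in two copies, so gene dosage alone does not distinguish them; only the relative orientation of the copies differs.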

B. Plastid-Encoded Functions, Genes and Introns

Plastids perform numerous biological functions that rely to a large extent on nuclear genes, whose products are translated in the cytoplasm and transported into plastids. For detailed information on protein import, see McFadden 1999; Wastl et al. 2000; Wastl and Maier 2000; van Dooren et al. 2001; Foth et al. 2003; Nassoury et al. 2003; Patron et al. 2005; Durnford and Gray 2006; Chaal and Green 2007; Patron and Waller 2007; Kessler and Schnell 2009; Ma et al. 2009; Felsner et al. 2010; Hempel et al. 2010; Kovacs-Bogdan et al. 2010; Li and Chiu 2010; Strittmatter et al. 2010.

Biological processes that involve at least some ptDNA-encoded genes are translation and photosynthesis. Only species that lost their photosynthetic capacity gradually eliminate the corresponding genes (Gockel and Hachtel 2000; de Koning and Keeling 2006). Additional biological processes that rely on pt-encoded genes involve transcription, protein transport and plastid division. Further, in a more restricted number of cases, ptDNAs code for components for tRNA processing (RNase P RNA), quality control of protein translation (tmRNAs; Gueneau de Novoa and Williams 2004), the signal recognition particle RNA (Rosenblad and Samuelsson 2004; Schunemann 2004), plus several other functions that are limited to the most gene-rich ptDNAs, in particular from red algae (Glockner et al. 2000; Ohta et al. 2003; Hagopian et al. 2004). Currently recognized pt genes and their functions are compiled in Table 3.1. This list is expected to extend as the functions of ycf genes and additional ORFs are being identified. All of the above processes are directly derived from the cyanobacterial ancestor of plastids (only a few genes/functions were acquired by lateral transfer). The pattern of genes and functions represented by ptDNA-encoded genes often does not correspond with phylogenetic affinities (i.e., gene presence/absence is an unreliable phylogenetic marker), as gene migration to the nucleus or complete gene loss has occurred numerous times independently across various eukaryotic lineages.

Table 3.1. Plastid-encoded genes and their products

The most reduced ptDNAs of photosynthetically active species are those in dinoflagellates that are organized in minicircles (Howe et al. 2008), encoding a bit more than a dozen identified genes, followed by the apicomplexan relatives Alveolata sp. (CCMP3115; 124 genes) and Chromera velia (112 genes; Janouskovec et al. 2010). At the other end of the spectrum, red algae have the most gene-rich, densely packed pt genomes (with up to ∼254 genes).

Intron counts for ptDNAs are highly variable: none in almost all red algae and in plastids of red algal origin (e.g., Douglas and Penny 1999; Glockner et al. 2000; Hagopian et al. 2004; Sanchez Puerta et al. 2005; Oudot-Le Secq et al. 2007), but 26 in the green alga Floydiella terrestris and, contrary to expectations, more than 100 in the red alga Compsopogon caeruleus (B.F.L., unpublished). Introns in pt genomes belong to group I and II, and are sometimes difficult to classify because distinct secondary structure features are highly derived; i.e., such introns are only detected because coding regions are discontinuous. The most derived introns (group III, some organized in ‘twintrons’) are present in Euglena gracilis and E. longa ptDNAs (Copertino and Hallick 1993; Hallick et al. 1993), and are likely derived from group II introns (Copertino and Hallick 1993). The >100 introns in Compsopogon caeruleus (staghorn alga) ptDNA are also group II-related, some typical but others barely recognizable (B.F.L., unpublished). Finally, in some instances of group II intron-mediated trans-splicing, exons are located in distant genomic regions, transcribed separately and ligated to give rise to functional mRNAs (e.g., Goldschmidt-Clermont et al. 1991; Rochaix 1996; Rivier et al. 2001; Turmel et al. 2002; Belanger et al. 2006; Brouard et al. 2008, 2010; Jacobs et al. 2010). In conclusion, identification of introns can be difficult and some may be missed, even when applying the most sophisticated search algorithms.

Currently, only a few tools are available for automated intron recognition plus classification (Eddy 2008; Beck and Lang 2009; Gardner et al. 2009), and for ptDNA annotation in general (Wyman et al. 2004; Jansen et al. 2005; Beck and Lang 2010). Tools developed by us (MFannot, RNAweasel; Beck and Lang 2009, 2010), although not fine-tuned for ptDNAs, appear to be most effective and miss only a few genes, small exons, and complex gene structures due to trans-splicing. Identification of structured RNAs is an area that needs improvements, including precise delineation of rRNA gene extremities and intron/exon boundaries. RNase P RNA can be identified with RNAweasel or MFannot, but search models for tmRNAs and signal recognition particle RNAs remain to be added, together with an update of structural models that allow prediction of the whole range of plastid introns. Given the rapidly increasing number of genome sequences produced by new sequencing technologies, we will have to develop increasingly effective, semi-automated ways of genome annotation and GenBank submission to keep pace with data production.

III. Plastids Derived from Primary Endosymbiosis with Cyanobacteria

Plantae is a potentially monophyletic assemblage of photosynthetic (and some secondarily non-photosynthetic) lineages with primary plastids, i.e., derived directly from an endosymbiotic cyanobacterium. This large and diverse group is divided into the glaucophytes, rhodophytes (red algae) and Viridiplantae (green algae and land plants). To date, plastid genomes are available for only two glaucophytes and seven red algae (two of which are unpublished; B.F.L.), but for a large and rapidly growing number of green algae. The reason for this bias may be related to the difficulty of growing sufficient quantities of cell material for red and glaucophyte algae, a difficulty that no longer exists with the new sequencing technologies that require only small quantities of total DNA.

A. Rhodophyta

Rhodophyta is a morphologically diverse group with several thousand described species, both unicellular and multicellular ones. Red algal cells are characterized by the lack of centrioles and a flagellar apparatus, and the presence of phycoerythrin-containing plastids with unstacked thylakoids. Resolution of phylogenetic relationships among red algal lineages is currently limited by taxon and gene sampling (e.g., Le Gall and Saunders 2007; Verbruggen et al. 2010 and references therein), and may also be hampered by unequal rates of sequence evolution among red algae.

Complete plastid genomes are available from only five species. These include the multicellular taxa Porphyra purpurea and Porphyra yezoensis (Bangiales; Reith and Munholland 1995), the unicellular Cyanidiales Cyanidioschyzon merolae (Ohta et al. 2003) and Cyanidium caldarium (Glockner et al. 2000), and the florideophycean Gracilaria tenuistipitata (Hagopian et al. 2004). Two additional ptDNAs are currently being sequenced in our laboratory (Stylonema alsidii, UTEX LB1424 and Compsopogon caeruleus, UTEX LB1553).

The first sequenced red algal ptDNA (P. purpurea; Reith and Munholland 1995) turned out to be more cyanobacterial-like than that of any other alga, based on features such as gene count, a large tRNA set, genes encoding transcriptional regulators and bacteria-like operons. This conclusion also applies to other red algal pt genomes. Whereas land plant and green algal ptDNAs encode 88–138 genes (Lemieux et al. 2007; Turmel et al. 2007), this number is close to double in red algae (230–254). Many of these genes are unique to red algae or rare in other ptDNAs, and include RNase P RNA (present in all red algal ptDNAs including Cyanidium; otherwise only present in a few green plastids including Nephroselmis, Pycnococcus, Monomastix, Ostreococcus and in cyanelles of the two glaucophytes; Shevelev et al. 1995; Turmel et al. 2009a; our own analysis), tmRNA (http://www.indiana.edu/∼tmrna/; Andersen et al. 2006) and signal recognition particle RNA (Andersen et al. 2006). In contrast, genes for components of the NADPH dehydrogenase complex (in ptDNAs of some prasinophyte and most land plant lineages) are absent.

B. Glaucophyta

Glaucophytes (glaucocystophytes) are freshwater algae that are particularly important for understanding the origin and evolution of photosynthesis in eukaryotes. Plastids of these organisms are unique in having retained two cyanobacterial features: a true, bacterial-type peptidoglycan cell wall (Pfanzagl et al. 1996), and carboxysomes – polyhedral micro-compartments involved in CO2 fixation (Kaplan and Reinhold 1999). The presence of these unique features strongly suggests that glaucocystophyte plastids originated directly from a symbiosis with a Cyanobacterium, and there has been a perception that this algal group might therefore have emerged early in the evolution of photosynthetic eukaryotes. However, more recent phylogenetic analyses with broad species sampling and a large number of genes do not support this idea, placing the origin of the glaucophyte plastid close to the divergence point of green and red plastids (see for instance Fig. 3.1).

The only complete plastid genome sequence from glaucophytes is that of Cyanophora paradoxa (Löffelhardt and Bohnert 1994). Recently, we have sequenced most of the Glaucocystis nostochinearum ptDNA (Lang et al. unpublished). Despite their evolutionary distance (see the deep divergence in Fig. 3.1), the two genomes are similar in terms of genome organization and gene content (a potential inverted repeat region remains to be confirmed for Glaucocystis ptDNA). The number of genes in glaucophyte ptDNAs (a total of 191 in Cyanophora, including protein, tRNA and rRNA genes; Cui et al. 2006) is relatively low compared to that of red algae (between 230 and 254). This might seem unexpected when considering that glaucophyte plastids still have a bacterial cell wall and other ‘primitive’ cyanobacterial features. In addition, in phylogenetic analyses with plastid data, glaucophyte branches are amongst the shortest ones, whereas the red algal plastids are among the more rapidly evolving ones. Evidently, gene counts do not correlate with evolutionary rates in this example.

C. Viridiplantae

Viridiplantae (green plants) is a morphologically and ecologically diverse group including the Streptophyta (land plants and their closest green algal relatives, the charophytes) and Chlorophyta (i.e., the rest of the green algae; Lewis and McCourt 2004; Sluiman 1985). Based on flagellar apparatus ultrastructure and features related to cytokinesis, Chlorophyta is further divided into four classes: Prasinophyceae (a paraphyletic group of unicellular species thought to be descendants of the ancestral flagellates from which the main green algal lineages evolved), Trebouxiophyceae, Chlorophyceae and Ulvophyceae (Lewis and McCourt 2004; Mattox and Stewart 1984). Although molecular data support the early divergence of prasinophytes (e.g., Guillou et al. 2004), the branching order of Trebouxiophyceae, Ulvophyceae and Chlorophyceae within Chlorophyta remains uncertain (see Pombert et al. 2004, 2006 for discussion and references), which is also consistent with our analysis (Fig. 3.1).

To date, 28 green algal plastid genomes (22 from Chlorophyta and 6 from Streptophyta) have been fully sequenced (Table 3.2), and they revealed an unexpected diversity both within and between algal groups. Overall, green algal ptDNAs differ in many respects from the well characterized plastid genomes of land plants (see Chaps. 4, 5). The latter typically share the same quadripartite structure (characterized by the presence of two copies of a large inverted repeat sequence separating a small single-copy and a large single-copy region) and have the same gene partitioning pattern between the two single copies. Their genes are densely packed and most of them are organized in conserved clusters. In contrast, green algal ptDNAs are “hotbeds” for chloroplast genome evolution (Belanger et al. 2006), exhibiting great diversity in genome and gene organization, including loss or inversion of the inverted repeat, gene rearrangements, intergenic expansions, invasion by repeat elements and introns, gene loss, gene expansion and gene fragmentation.

Table 3.2. Features of green algal plastid genomes for which complete genome sequences are available

Prasinophytes

Prasinophytes are primarily marine unicellular algae that show great variation in terms of cell size and shape, flagella number, membrane covering (i.e., with or without scales) and biochemical features (Graham and Wilcox 2000). Seven prasinophyte clades are currently recognized; however, the exact relationships between these lineages and their affiliation with other green algal groups remain unresolved (Marin and Melkonian 2010).

The six currently available plastid genome sequences belong to: (1) Nephroselmis olivacea (Pseudoscourfieldiales) – a flagellate unicellular alga; (2) Pycnococcus provasolii (Pseudoscourfieldiales, Pycnococcaceae) – a coccoid picoplanktonic alga; (3) Ostreococcus tauri (Mamiellales) – the smallest known eukaryotic organism; (4) Pyramimonas (Pyramimonadales) – a scaly quadriflagellate alga; (5) Monomastix – a scaly flagellate of unknown affiliation; and (6) Pedinomonas minor (Pedinomonadales) – a small naked uniflagellate with no clear affiliation to the other prasinophyte clades that is probably related to, or ancestral to, Trebouxiophyceae (Turmel et al. 2009b; see also Fig. 3.1). Overall, prasinophyte ptDNAs show extreme diversity in size (an almost 3-fold variation), gene repertoire and genome organization. On the other hand, these genomes are similar in base composition and harbor no or just a few introns (Table 3.2).

Interestingly, both ancestral and derived types of genome organization (relative to the presumed plastid genome in the most recent common ancestor of green plants; Turmel et al. 1999) have been reported among the plastid genomes described in this group. Ancestral types are characterized by large gene complements, ancestral gene clusters and a quadripartite genome structure (i.e., two identical copies of a large inverted repeat (IR), separated by single-copy (SC) regions), whereas derived types have reduced and re-arranged genomes. With 128 conserved genes, the 200.8 Kbp plastid genome of Nephroselmis has the largest gene complement yet reported for a chlorophyte alga and has retained many ancestral gene clusters (Turmel et al. 1999). Its quadripartite architecture resembles that of streptophyte counterparts in displaying (1) unequal SC regions – a large and a small one – that contain highly conserved sets of genes and (2) IR-encoded rRNA operons transcribed towards the small SC region. The ptDNA of Nephroselmis codes for several genes with limited phylogenetic distribution; for instance, ftsI (involved in peptidoglycan synthesis) has not been reported in other ptDNAs, and ndh genes (coding for subunits of the NADH:ubiquinone oxidoreductase) are absent from chlorophyte ptDNAs, but are present in other prasinophytes and land plants. At the other extreme is the plastid genome of Ostreococcus, with 88 genes highly scrambled over 71.6 Kbp, representing the smallest genome with the most reduced gene complement among photosynthetic green plants (Robbens et al. 2007). Both the small size and overall low proportion of intergenic spacers (representing 15% of the genome and varying from 1 to 476 nt length) as well as the presence of three cases of overlapping genes make this genome one of the most compact green plant ptDNAs (Table 3.2). 
Moreover, in contrast to Nephroselmis, the SC regions of the Ostreococcus ptDNA – although different in size – have the same number of genes, and the rRNA operons are transcribed away from the SC regions.
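The contrast in compactness between the two genomes can be made explicit with a short calculation. The sketch below uses only the figures quoted in the text (gene counts, genome sizes and the 15% intergenic fraction of Ostreococcus); the dictionary layout and function names are illustrative assumptions, and the Nephroselmis count refers to conserved genes only.

```python
# Figures quoted in the text; the data structure itself is hypothetical.
GENOMES = {
    "Nephroselmis": {"genes": 128, "size_kbp": 200.8},
    "Ostreococcus": {"genes": 88, "size_kbp": 71.6, "intergenic_fraction": 0.15},
}


def genes_per_kbp(genes: int, size_kbp: float) -> float:
    """Average gene density in genes per kilobase pair."""
    return genes / size_kbp


def intergenic_kbp(size_kbp: float, intergenic_fraction: float) -> float:
    """Total length of intergenic spacers in kilobase pairs."""
    return size_kbp * intergenic_fraction


for name, g in GENOMES.items():
    density = genes_per_kbp(g["genes"], g["size_kbp"])
    print(f"{name}: {density:.2f} genes per kbp")

ostreo = GENOMES["Ostreococcus"]
spacer = intergenic_kbp(ostreo["size_kbp"], ostreo["intergenic_fraction"])
print(f"Ostreococcus intergenic DNA: {spacer:.2f} kbp")
```

On these figures, the gene density of the Ostreococcus ptDNA is roughly twice that of Nephroselmis, even though it encodes fewer genes.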

Reductions in plastid genome size and gene complement as well as the loss of the inverted repeat took place independently in several other prasinophyte lineages, leading to a variety of distinct genome configurations. For instance, the Pycnococcus plastid genome resembles the Ostreococcus counterpart in being small and highly compact (with two cases of overlapping genes and only ∼11% intergenic regions). However, it lacks the IR, and its gene complement is more similar to that of chlorophycean plastid genomes (Turmel et al. 2009a). On the other hand, the plastid genome of Monomastix has a larger size but a slightly lower number of genes (Table 3.2; Turmel et al. 2009a). The ptDNA of Pyramimonas displays intermediate genome size, compactness and gene repertoire (including six ndh genes present only in Nephroselmis and land plants, and two other genes – rpl22 and ycf65 – not reported in other chlorophytes; Turmel et al. 2009a). Lastly, the ptDNA of Pedinomonas, although very small, compact, and with a low gene count (Table 3.2), has retained the highest degree of ancestral gene linkages among all chlorophyte algae (i.e., linkages that predate the divergence of chlorophytes and streptophytes; Turmel et al. 2009b).

Trebouxiophyceae

Trebouxiophyceae (sensu Friedl 1995) are a group of morphologically heterogeneous algae (unicellular non-flagellated or filamentous) that inhabit mostly soil and freshwaters. Most phycobionts of lichens, ciliates and animals are also included in this class (Booton et al. 1998; Graham and Wilcox 2000; Lewis and McCourt 2004). To date, five plastid genomes have been published: four from photosynthetic species (Chlorella vulgaris and Oocystis solitaria – Chlorellales; Parachlorella kessleri and Leptosira terrestris – Ctenocladales) and one from a non-photosynthetic relative (Helicosporidium sp. – Chlorellales) (Wakasugi et al. 1997; de Cambiaire et al. 2007; Turmel et al. 2009a). In addition, nearly complete ptDNAs are available from Coccomyxa sp. C-169 (Coccomyxaceae; GenBank accession number HQ693844), Chlorella ellipsoidea and the colorless Prototheca wickerhamii (Knauf and Hachtel 2002; Yamada 1991). All trebouxiophyte ptDNAs sequenced so far are rather AT-rich, with Helicosporidium and Leptosira being among the most AT-rich green algal genomes (Table 3.2).

Although the plastid genomes from the four fully characterized photosynthetic species have similar gene contents, they vary significantly in size (a twofold variation). Most of this variation is accounted for by size differences in intergenic regions (Table 3.2). Gene order also varies considerably. For instance, the Chlorella plastid genome has retained many of the gene clusters present in streptophytes and prasinophytes. On the other hand, Leptosira shares little similarity in gene order with other plastid genomes and exhibits derived traits reminiscent of evolutionary patterns described for the ulvophyte and chlorophycean lineages (Turmel et al. 2009b).

The IR is missing in both Chlorella and Leptosira pt genomes, which is a feature also shared with the non-photosynthetic Helicosporidium (Table 3.2). Nevertheless, it is believed that the last common ancestor of trebouxiophytes possessed a plastid genome with a quadripartite structure (very similar to that of Nephroselmis and streptophytes) and that the IR was lost independently on at least two occasions. These suggestions are based on the finding of IRs in other trebouxiophyte plastid genomes (including that of Chlorella ellipsoidea, which has a large IR with a split rRNA operon; Yamada and Shimaji 1987) and on the presence of an IR remnant in Chlorella vulgaris (de Cambiaire et al. 2007).
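
The quadripartite layout referred to throughout this section (two single-copy regions separated by two copies of a repeat in inverted orientation) can be made concrete with a small Python sketch. It builds a toy genome from three hypothetical segments and checks whether two coordinate ranges are reverse complements of each other. The function names and sequences are assumptions made for illustration. Real IR detection in sequenced ptDNAs tolerates mismatches and relies on alignment tools rather than exact matching.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_quadripartite(lsc, ir, ssc):
    """Concatenate LSC, IR copy A, SSC and the inverted IR copy B."""
    return lsc + ir + ssc + reverse_complement(ir)


def is_inverted_repeat(genome, range_a, range_b):
    """True if the two 0-based, half-open ranges are exact inverted copies."""
    a = genome[range_a[0]:range_a[1]]
    b = genome[range_b[0]:range_b[1]]
    return len(a) > 0 and a == reverse_complement(b)


# Toy segments (hypothetical sequences, far shorter than any real region).
lsc, ir, ssc = "ATGGCCATTGTAATGGGC", "GGATCCTTAA", "CCGTAT"
genome = build_quadripartite(lsc, ir, ssc)
ir_a = (len(lsc), len(lsc) + len(ir))
ir_b = (len(genome) - len(ir), len(genome))
print(is_inverted_repeat(genome, ir_a, ir_b))
```

A genome that has lost its IR, as described above for Chlorella vulgaris and Leptosira, would have no pair of ranges that passes this check.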

The ptDNAs of the non-photosynthetic trebouxiophytes Helicosporidium and Prototheca are both highly reduced in size (partially sequenced; ∼37.5 and 45 Kbp, respectively). Based on its structure and compactness, the Helicosporidium genome is more similar to that described in the non-photosynthetic plastids of apicomplexan parasites. As expected, it lacks all genes for photosynthesis (de Koning and Keeling 2006), but its size reduction is due not only to gene loss but also to reduced non-coding regions, overlapping genes, and the loss of the IR. Notable is the loss of the rRNA operon structure – an event that is thought to have taken place independently in several other lineages (including the trebouxiophyte C. ellipsoidea, several ulvophytes and charophytes as well as other non-photosynthetic algae; de Koning and Keeling 2006).

Chlorophyceae

The Chlorophyceae (sensu Mattox and Stewart 1984) comprise mostly freshwater species, but several marine species are also known. Species in this group show diverse morphologies – from unicellular (flagellated or coccoid) to complex multicellular (colonial or filamentous) forms – and distinct configurations of their flagellar apparatus. Based on the arrangement of the flagellar basal bodies in their motile cells, two sister clades are generally described in this group. They are commonly referred to as CW (“clockwise”; Chlamydomonadales) and DO (“directly opposed”; Sphaeropleales) groups (Booton et al. 1998). Three additional lineages (Oedogoniales, Chaetopeltidales and Chaetophorales) are basal to these clades, but their divergence order is not well understood (Brouard et al. 2010; Buchheim et al. 2001; Shoup and Lewis 2003; Turmel et al. 2008). To date, seven plastid genomes from representatives of the five main chlorophycean lineages have been completely sequenced: (1) Chlamydomonadales – Chlamydomonas reinhardtii (Maul et al. 2002), Volvox carteri (Smith and Lee 2009, 2010), and Dunaliella salina (Smith et al. 2010); (2) Sphaeropleales – Scenedesmus obliquus (de Cambiaire et al. 2006); (3) Chaetophorales – Stigeoclonium helveticum (Belanger et al. 2006); (4) Oedogoniales – Oedogonium cardiacum (Brouard et al. 2008); and (5) Chaetopeltidales – Floydiella terrestris (Brouard et al. 2010).

Overall, plastid genomes in this group show tremendous variation in terms of genome size, intergenic spacers and intron numbers (Table 3.2). At the same time, the number of genes encoded in these genomes has been kept remarkably constant, within the range of derived prasinophyte pt genomes (Pycnococcus and Monomastix; Table 3.2). In terms of general genome organization, both types – with or without inverted repeats – are found among chlorophycean ptDNAs.

In cases where ptDNAs maintained the quadripartite structure, the organization of the IR and SC regions as well as the gene distribution within these regions differ among lineages. For instance, in Chlamydomonas, the two SC regions have similar sizes and differ radically in both gene content and gene organization from their counterparts in ancestral prasinophyte plastid genomes (Maul et al. 2002). Interestingly, although the Scenedesmus ptDNA shares with its Chlamydomonas counterpart a similar quadripartite structure, the sets of genes in the SC regions are very different between the two species, which indicates that genes were shuffled since the divergence of the DO and CW lineages (de Cambiaire et al. 2006). On the other hand, the Oedogonium plastid genome revealed an atypical structure with an IR significantly larger than in most of its green algal counterparts (with the notable exception of Nephroselmis) and two SC regions of vastly unequal size. Furthermore, the partitioning of genes among the two SC regions is distinctly different from that in Chlamydomonas and Scenedesmus (de Cambiaire et al. 2006).

Consistent with the situation among trebouxiophytes, the IR-lacking ptDNAs of Stigeoclonium and Floydiella also have loosely packed genes and intergenic regions rich in short repeats (Brouard et al. 2010). The most re-arranged chlorophycean plastid genome appears to be that of Stigeoclonium, which completely lacks the ancestral gene partitioning pattern displayed by Nephroselmis and streptophytes, and overall, exhibits the fewest ancestral features among all plastid genomes completely sequenced to date (Belanger et al. 2006).

Chlorophycean ptDNAs differ substantially in the amount of short repeated sequences. At one extreme, there are Oedogonium and Scenedesmus, in which such sequences occupy only 1.3% and 3% of genomes, respectively. At the other extreme, there are the ptDNAs of Chlamydomonas, Stigeoclonium, Volvox, and Floydiella, which are extremely rich in repeated sequences. For instance, short palindromic repeats (potentially acquired via mitochondria-to-plastid transfers involving mobile introns) constitute ∼64% of the Volvox plastid genome. Repeats larger than 30 bp account for half of the Floydiella pt genome (almost three times more than in Chlamydomonas and Stigeoclonium; Brouard et al. 2010; Smith and Lee 2009).

Several atypical features have also been described in this group, including: (1) strong bias in the distribution of genes between the two DNA strands (in Stigeoclonium and Scenedesmus); (2) breakup of protein-coding genes by putatively trans-spliced group II introns (rbcL, psaC, petD, psaA) (in Stigeoclonium and Floydiella); (3) fragmentation of protein-coding genes into distinct open reading frames (contiguous or distant from each other) that are not associated with any introns (rpoC1, rps2, rpoB); (4) the substantial expansion (over fivefold increase) of many protein-coding genes (e.g., cemA, clpP, ftsH, rpoB, rpoC1, rpoC2, rps3, rps4, and ycf1) due to the presence of insertions whose post-transcriptional fate (i.e., excised or not) and biological significance are mostly unknown; (5) intergenic intron-like sequences of unknown origin and function in Dunaliella; and (6) genes (int and dpoB, coding for a tyrosine recombinase and a DNA-dependent DNA polymerase, respectively) potentially acquired via horizontal gene transfer from a mitochondrial genome donor in Oedogonium (Belanger et al. 2006; Brouard et al. 2008, 2010; Smith et al. 2010).

Overall, the plastid genome in this group of algae has experienced major changes, and it displays the lowest degree of ancestral traits relative to other chlorophytes. Some of the most eccentric ptDNAs among all Viridiplantae are also found in this group: over 520 Kbp and over 77% intergenic spacers in Floydiella and Volvox; 73% AT-content in Scenedesmus; and 43 introns in Dunaliella (Table 3.2).

Ulvophyceae

Ulvophyceae are unicellular (including macroscopic forms composed of a single, large multinucleate cell) and multicellular species that are common in rocky intertidal coasts of temperate regions, but secondarily freshwater species are also known. The flagellar basal bodies in their motile cells are arranged in a counterclockwise (CCW) orientation (Floyd and O’Kelly 1984). To date, complete plastid genome sequences are available from three unicellular ulvophyte species: Oltmannsiellopsis viridis (Oltmannsiellopsidales; Pombert et al. 2006), Pseudendoclonium akinetum (Ulotrichales; Pombert et al. 2005) and Bryopsis hypnoides (Bryopsidales; Lu et al. 2010). The first two species belong to lineages believed to occupy a basal position within the group, whereas the phylogenetic position of the latter is uncertain (Lu et al. 2010 and Fig. 3.1). Partial sequence information is also available from Codium fragile (Ulvales; Manhart et al. 1989) and Caulerpa sertularioides (Bryopsidales; Lehman and Manhart 1997).

Although different in size, the Oltmannsiellopsis and Pseudendoclonium plastid genomes share a similar number of genes and coding density (Table 3.2). The difference in genome size is mostly accounted for by a difference in intron numbers (Table 3.2). The 27 introns in Pseudendoclonium make up 14.8% of the genome and are thought to have arisen from the intragenomic proliferation of a few founding introns in this lineage (Pombert et al. 2005). Both genomes share a quadripartite structure that deviates from the ancestral type. Nevertheless, the IR sequences in the two genomes differ in size (with that of Oltmannsiellopsis being ∼12 Kbp larger) and gene content (the Pseudendoclonium IR encodes only the rRNA operon, while the Oltmannsiellopsis IR contains five additional genes). Also, Pseudendoclonium shows evidence of inter-organellar lateral transfer (involving some dispersed repeats and one intron) between its plastid and mitochondrial genomes (Pombert et al. 2005).

The plastid genome of Bryopsis differs significantly from those of Oltmannsiellopsis and Pseudendoclonium in several important ways. These include the absence of IRs (also lacking in the two other ulvophytes for which partial information is available; Caulerpa and Codium) and the presence of multimeric forms of ptDNA (including monomer, dimer, trimer, tetramer, and even higher-order multimers), which is a trait that has only been reported in land plants (Lu et al. 2010). Furthermore, this genome is unique in possessing 10 tRNA genes that have not been found in other completely sequenced chlorophyte ptDNAs. Note that while five of them are known in embryophytes, the other five have only been reported in some bacterial genomes. Also, its rRNA locus consists of five (rrn23, rrn16, rrn7, rrn5, and rrn3) instead of the usual four coding regions; a similar situation is only found in C. reinhardtii ptDNA (Maul et al. 2002). The number of genes reported for this ptDNA is similar to that of the other two ulvophytes (Table 3.2). However, our preliminary analyses indicate a larger gene complement for this genome; likewise, the number of introns in this genome might prove to differ from that listed in Table 3.2. Overall, although ulvophyte ptDNAs feature an atypical quadripartite structure, they have maintained a relatively large gene complement, and their degree of remodeling is intermediate relative to that seen in their trebouxiophyte and chlorophycean counterparts.

Charophyceae

Charophytes comprise thousands of mainly freshwater algal species exhibiting great variability in morphology and reproduction. They are subdivided into six monophyletic lineages: (1) Mesostigmatales represented by the scaly biflagellate Mesostigma viride (previously regarded as a member of the Prasinophyceae), (2) Chlorokybales also represented by a single species (the sarcinoid Chlorokybus atmophyticus), (3) Klebsormidiales, (4) Zygnematales, (5) Coleochaetales and (6) Charales. Phylogenetic analyses indicate Mesostigmatales and Chlorokybales as the earliest-diverging charophycean lineages (forming a distinct clade; Turmel et al. 2007). The branching order among the other groups remains debatable. Charales are the closest relatives of plants in some studies, while other analyses suggest that Charales diverged prior to Coleochaetales and Zygnematales (see Turmel et al. 2006 for discussion and references; see also Fig. 3.1).

Complete plastid genome sequences are available from six species belonging to five of the six main charophycean lineages: Mesostigma viride (Mesostigmatales), Chlorokybus atmophyticus (Chlorokybales), Staurastrum punctulatum and Zygnema circumcarinatum (Zygnematales), Chaetosphaeridium globosum (Coleochaetales) and Chara vulgaris (Charales; Lemieux et al. 2007; Turmel et al. 2002, 2005, 2006). In addition, the almost complete ptDNA of Klebsormidium flaccidum has been sequenced (Fig. 3.1; BFL unpublished). Overall, charophycean ptDNAs vary in size, gene content, intron content and gene order, and include the most gene-rich green plastid genomes (Table 3.2).

Consistent with their basal position among charophytes, the plastid genomes of Mesostigma and Chlorokybus are gene-rich and feature a typical quadripartite structure (Turmel et al. 2007). The two genomes are similar in gene content and gene order, with the notable presence in each of the two genomes of genes that have not been identified in other green algal and land plant pt genomes. Genes are loosely packed in Chlorokybus (the average size of intergenic spacers in Chlorokybus is twice that of Mesostigma), which is also reflected in the larger genome size (Table 3.2; Turmel et al. 2007). Nevertheless, relative to the gene order in Nephroselmis and Streptophyta ptDNAs, the Chlorokybus plastid genome is more rearranged than its Mesostigma counterpart. Both genomes are intron-poor, with none in Mesostigma and a single intron in Chlorokybus (Table 3.2).

Relative to Mesostigma and Chlorokybus, the plastid genomes of the two zygnematalean lineages, Staurastrum and Zygnema, have a slightly reduced gene repertoire (Table 3.2) and lack the rRNA-encoding IR typical of other charophytes and streptophytes. Notably, the lack of IR is also shared with Spirogyra maxima – another zygnematalean species for which partial genome information is available (Manhart et al. 1990). Furthermore, both these genomes are loosely packed with genes (due to the expansion of their intergenic spacers), and feature a larger number of introns (which have also expanded in size). However, the two genomes differ extensively from one another in gene order. Also, many intergenic regions in the Staurastrum ptDNA harbour tandem repeats while such sequences are virtually absent in the Zygnema counterpart (Turmel et al. 2005).

On the other hand, the pt genomes of Chaetosphaeridium globosum (Coleochaetales) and Chara vulgaris (Charales) exhibit the typical quadripartite structure found in streptophytes, and resemble their land plant counterparts more closely than do other charophycean relatives. Although the two genomes have similar coding capacities (Table 3.2), Chara features four genes (rpl12, trnL(gag), rpl19, and ycf20) that are entirely missing from other charophycean and land plant ptDNAs. Furthermore, despite similarities in genome organization, gene content and intron composition, the two genomes differ in size, gene density and AT content, with the Chara genome representing the largest and most AT-rich streptophyte ptDNA (Table 3.2). Notably, Chara’s increased genome size and AT-content are mainly accounted for by increased AT-rich intergenic spacers and introns, which represent 38.8% and 13.4% of the total genome, respectively (Turmel et al. 2006). Overall, among streptophyte green algae, the ptDNAs of the charophytes Mesostigma and Chlorokybus exhibit the most ancestral features (including the largest gene complement among Viridiplantae; 137–138 genes), while the genomes of Chara and Chaetosphaeridium most closely resemble their land plant counterparts.

IV. Plastids Acquired via Eukaryote-Eukaryote Endosymbiosis

According to the chromalveolate hypothesis, chlorophyll c-containing plastids originated from a single photosynthetic ancestor, which acquired its plastids only once by secondary endosymbiosis with a red alga (Cavalier-Smith 2002; Keeling 2009, 2010). However, phylogenetic studies suggest a much higher incidence of plastid transfer among eukaryotes, favoring complex evolutionary scenarios involving multiple eukaryote-eukaryote endosymbioses (Sanchez-Puerta et al. 2007; Archibald 2009). Arguably the most rigorous analysis in this respect is by Baurain and co-workers (Baurain et al. 2010), who find that monophyly of Cryptophytes, Alveolates, Stramenopiles, and Haptophytes (CASH) is seen with neither mitochondrial nor nuclear sequence data. This means that the very strongly supported phylogenetic relationships in trees constructed with plastid proteins (plastid-encoded as in Fig. 3.1; as well as nucleus-encoded genes of cyanobacterial origin) do not represent the evolution of CASH species but more likely multiple plastid transfers. In some instances, higher-order eukaryote-eukaryote endosymbioses are in fact evident, for instance, the grouping of plastids from the dinoflagellates (Durinskia and Kryptoperidinium; Imanian et al. 2010) with diatoms (Fig. 3.1), and the (weak) association of Alveolata sp. (Apicomplexa) plastids with stramenopiles.

A shared characteristic of ‘second hand’ plastid genomes is their reduced coding capacity relative to that of the plastid donor, which is in most instances a red and only in rare cases a green alga (i.e., in the rhizarian Bigelowiella and relatives, and the euglenozoan Euglena). Plastids of red origin are in general remarkably similar in gene content, despite their turbulent evolutionary past. In the following we will focus on the few main differences, and refer the reader otherwise to the corresponding original publications. It should be noted that gene counts and identifications differ slightly across different papers and database compilations (Cui et al. 2006; O’Brien et al. 2009). Although minor (up to about ten), these differences need to be resolved in the future, by establishing gene identification based on the same criteria. Eventually, all ptDNAs should be reannotated by using the same tools, a task that was unfortunately out of reach for this review.

A. Stramenopila

Stramenopiles is the largest group among CASH protists whose monophyly is well supported (e.g., Baurain et al. 2010). A sizable portion of stramenopile taxa are non-photosynthetic and without plastid relicts, such as oomycetes (Phytophthora) and bicosoecids (Cafeteria). Whether or not the stramenopile ancestor had plastids, and of which origin, has been the subject of heated debates. The controversy is in part due to over-interpretation of BLAST analyses and lack of resolution in single-gene phylogenies (Stiller et al. 2009 and references therein). The few clear examples pointing to a plastid origin of genes in plastid-less stramenopiles may in fact be explained by transfer of individual genes, rather than endosymbiotic events.

PtDNA sequences are available from bacillariophytes (diatoms), phaeophytes (brown algae), raphidophytes, pelagophytes, xanthophytes, but curiously not from chrysophytes (golden algae).

Diatoms

Bacillariophyta are highly diverse (>250 genera), unicellular, silica-walled algae that either live attached to surfaces or are planktonic. Complete ptDNAs have been sequenced from four phylogenetically relatively distant species: Phaeodactylum tricornutum, Thalassiosira pseudonana (Oudot-Le Secq et al. 2007), Odontella sinensis (Kowallik et al. 1995) and Fistulifera sp. (Tanaka et al. 2011).

These ptDNAs are relatively uniform, coding for a similar set of 160–170 genes. A putative serine recombinase gene (serC2) is potentially of plasmid origin. It also occurs in the diatom plastids residing in certain dinoflagellates (Imanian et al. 2010).

Phaeophytes

Brown algae are a large group of multicellular organisms (∼250 genera) that occur mostly in marine habitats and grow attached to surfaces. Complete ptDNAs are published from two representatives of distinct orders, Ectocarpus siliculosus and Fucus vesiculosus (Le Corguille et al. 2009). Their gene counts are similar to those of diatoms, with only minor differences.

Raphidophytes

Raphidophytes are a small group (four genera) of flagellated unicellular organisms that occur in both marine and freshwater habitats, and that lack a rigid cell wall. A complete ptDNA sequence is available for two strains of Heterosigma akashiwo (Cattolico et al. 2008). The number of ptDNA-encoded genes (197) is relatively high compared to other algae with plastids from secondary or higher-order endosymbioses, and a putative serine recombinase gene is present as in diatoms. Another unusual ORF codes for a potential G-protein-coupled receptor. Again, the functionality and biological role of these extra genes remain to be demonstrated. Several protein-coding genes and their mRNAs contain large, in-frame inserts when compared to orthologs in other plastids. These inserts likely represent derived forms of protein introns (inteins; Liu 2000; Gogarten and Hilario 2006) that may have lost their capacity for splicing. In fact, one typical bona fide intein has been identified in the dnaB gene of H. akashiwo ptDNA (Cattolico et al. 2008).

Pelagophytes

This group of algae, known for causing algal blooms, was originally included in the Chrysophyceae, but based on biochemical, physiological and phylogenetic criteria it now forms its own class, Pelagophyceae. Complete ptDNAs are available from Aureococcus anophagefferens and Aureoumbra lagunensis (Ong et al. 2010). The large inverted repeat, otherwise common in second-hand red plastids, is missing, and the two genomes code for only 137 and 141 genes, respectively. About 20 genes that are usually present in stramenopile ptDNAs are absent from both pelagophytes. According to our phylogenetic analysis with plastid data, pelagophytes branch deeply within stramenopiles, but their placement relative to the raphidophytes and xanthophytes is unresolved (Fig. 3.1).

Xanthophytes

The Vaucheria litorea plastid genome was characterized during the course of a most unusual investigation of the green sea slug Elysia chlorotica. This animal acquires plastids (“kleptoplasts”, see Chap. 2) by ingesting Vaucheria litorea as food and sequestering the organelles in the digestive epithelium, where photosynthesis continues for several months (Rumpho et al. 2008). As it turns out, the plastid genome sequence is typical for stramenopiles (167 genes), and contains the common inverted repeat. According to the authors, some nuclear gene products that have to be imported and are required for plastid function are likely encoded in the animal’s nuclear genome (the algal nucleus is digested during the organelle sequestration process). So far, horizontal gene transfer from the algal genome to the mollusk genome has been demonstrated only for a few nuclear genes. Evidently, nuclear genome sequences of the sea slug and of Vaucheria are required to substantiate this unusual case of horizontal gene transfer (see Chap. 2).

B. Alveolata

Alveolates comprise ciliates, apicomplexans and dinoflagellates, but only the latter two groups include members with photosynthetic plastids.

Dinoflagellata

In most dinoflagellates, the ptDNA consists of multiple minicircles that code for a total of about a dozen genes. Here we will only discuss the pt genomes of Kryptoperidinium foliaceum and Peridinium quinquecorne that possess a conventional genome organization, since their ptDNAs derive from a higher-order endosymbiosis with diatoms (Imanian et al. 2010; see also Fig. 3.1). These dinoflagellate ptDNAs possess IR regions similar to those in diatoms, and K. foliaceum has a putative serine recombinase gene that is characteristic of diatom and raphidophyte ptDNAs. According to the authors’ interpretation (Imanian et al. 2010), the larger size of the K. foliaceum ptDNA may be due to the insertion of numerous plasmid-derived genes that are dispensable for plastid function.

Apicomplexa

As already mentioned in the introduction, ptDNAs have been sequenced from two photosynthetic relatives of Apicomplexa, Chromera velia and Alveolata sp. (CCMP3115; Janouskovec et al. 2010). The Chromera plastid DNA is very rapidly evolving, and therefore difficult to place in phylogenetic analyses. Its genome is larger than that of Alveolata sp., and translates UGA stop codons as tryptophan as is otherwise common for (in most cases also rapidly evolving) mtDNAs.
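
The practical consequence of the UGA reassignment is that conceptual translation with the standard genetic code would truncate proteins at every in-frame UGA. The following Python sketch illustrates this on a toy sequence. The codon table is the standard code written in the usual compact TCAG ordering. The function name `translate`, its `uga_is_trp` switch and the example sequence are assumptions made for illustration and do not come from the Chromera annotation.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, uga_is_trp=False):
    """Translate a DNA or RNA string in frame 1; '*' marks a stop codon.

    With uga_is_trp=True, UGA (TGA) is read as tryptophan instead of stop.
    Codons containing other symbols (e.g. N) are reported as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    code = dict(STANDARD_CODE)
    if uga_is_trp:
        code["TGA"] = "W"
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(code.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


toy_mrna = "AUGUGAUGGUAA"  # hypothetical sequence with one in-frame UGA
print(translate(toy_mrna))                   # standard code
print(translate(toy_mrna, uga_is_trp=True))  # UGA read as tryptophan
```

Under the standard code the toy sequence shows a premature stop at the second codon, whereas with the reassignment it reads through to the terminal UAA.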

The gene count of both ptDNAs is modest (124 and 112 genes, respectively) compared to other second-hand red algal ptDNAs. A gene for a horizontally transferred phosphonopyruvate decarboxylase is inserted into the rRNA operon of Alveolata. According to our phylogenetic analysis (Fig. 3.1), plastids of the two species could have a common origin by vertical descent, yet the positioning of the Chromera ptDNA alone is unresolved, somewhere close to stramenopiles. According to our phylogenetic results with Alveolata, its plastids may stem from a tertiary endosymbiosis with a photosynthetic stramenopile rather than from a unique secondary acquisition, as proposed by the chromalveolate hypothesis. In fact, the authors of the original genome paper state that ‘comparing gene content among alveolate plastids reveals the nearly mutually-exclusive gene sets of apicomplexans and dinoflagellates’, which can be interpreted as further evidence against their common origin.

C. Cercozoa (Rhizaria)

Chlorarachniophytes are a small group of photosynthetic marine flagellates with two recognized genera Chlorarachnion and Bigelowiella. Similar to cryptophytes (for details on cryptomonads see below) they carry a second reduced nucleus (nucleomorph), but of green algal origin (not precisely identified according to our analyses presented in Fig. 3.1 and those published by others; Rogers et al. 2007). A complete ptDNA sequence is available for Bigelowiella natans. The genome has a small size (69.2 Kbp), a highly compact gene organization, and a nearly full complement of photosynthesis-related genes that is similar to those in some of the less gene-rich green algae such as Chlamydomonas (Rogers et al. 2007). Most of the reduction in gene content comes from the loss of ycf and tRNA genes.

D. Cryptomonada

Cryptomonads are unicellular flagellates that are mostly photosynthetic, containing chlorophyll c and phycobilins as photosynthetic pigments. They carry direct physical evidence for eukaryote-eukaryote endosymbiosis in the form of a second, remnant eukaryotic nucleus, the ‘nucleomorph’ (for a recent review see Moore and Archibald 2009), of evidently red algal origin. Non-photosynthetic cryptomonad species include Cryptomonas paramecium, which contains plastids with a secondarily reduced plastid genome (Donaher et al. 2009), and heterotrophic Goniomonas species that have no plastids. Whether Goniomonas is indeed primarily without plastids (e.g., Keeling et al. 1999), and may thus represent the ancestral group that engulfed an alga with red plastids, remains to be demonstrated with nuclear genome sequence data.

The three completely sequenced cryptomonad ptDNAs are from Guillardia theta (Douglas and Penny 1999), Rhodomonas (Pyrenomonas) salina (Khan et al. 2007) and the non-photosynthetic C. paramecium (Donaher et al. 2009). The gene count of cryptomonad ptDNAs is >180, more than in green algae but about a quarter less than in red algae. The non-photosynthetic C. paramecium has about 70 fewer genes in its plastid genome, including only a few remaining members of the pet, psa and psb photosynthetic gene families (Donaher et al. 2009). An interesting acquisition in R. salina ptDNA is a gene for the tau/gamma subunit of DNA polymerase III (dnaX) that was likely acquired by lateral gene transfer from a firmicute bacterium (Khan et al. 2007). Whether or not this gene is transcribed, translated, and functional in plastids remains to be shown.

E. Haptophyta

Haptophytes (prymnesiophytes) are unicellular photosynthetic flagellates (some are colonial), and unlike in cryptophytes, heterotrophic taxa are unknown in this clade. Currently, pt genomes of only two species are available, those of Emiliania huxleyi (Sanchez-Puerta et al. 2005) and Pavlova lutheri (Burger et al. unpublished). Their genomes have about the same size and gene content (105 Kbp and 155 genes in E. huxleyi), and carry few notable features. Phylogenetic analyses based on pt data sometimes (but not always) unite haptophytes and cryptophytes (Bachvaroff et al. 2005; Keeling 2009; Le Corguille et al. 2009; Fig. 3.1).

F. Euglenids

Euglenids are unicellular flagellates, some of which contain plastids (chlorophyll a and b, β-carotene and xanthophylls), which were acquired via secondary endosymbiosis with a green alga. Euglenid ptDNA sequences are available from two species, Euglena gracilis (Copertino and Hallick 1993; Hallick et al. 1993) and the non-photosynthetic Euglena (Astasia) longa (Knauf and Hachtel 2002). In both instances, plastid genes are loaded with a large number of unusual introns (see above). At only 73 Kbp, the A. longa ptDNA has about half the size of its photosynthetic relatives, with all photosynthesis-related protein genes missing except for rbcL. According to published phylogenetic analyses based on pt sequences (Turmel et al. 2009a), Euglena plastids derive from a relative of the green alga Pyramimonas, which is clearly corroborated by our phylogenetic analysis (Fig. 3.1).

V. Conclusions

The availability of information on plastid genomes has increased over the last few years at an almost disquieting pace, in particular in green algae (as well as in land plants that are not covered in this chapter). Unfortunately, from the standpoint of evolutionary biology, the traditional bias in attention to green algae and plants remains. In particular, we have sequence data from just a handful of red algal pt genomes, a skimpy two from glaucophytes, and similarly low coverage for the numerous groups of algae with second-hand plastids. In fact, we are surprised that sequencing of almost identical flowering plant ptDNAs appears to be more important than sequencing those for which we know so little.

During the course of writing this review, we have come across several issues that touch on data production and analysis. For most pt genome projects underway, sequencing is performed with new technologies, some of which are fraught with systematic error (e.g., pyrosequencing technology suffers from frameshifts in homopolymer stretches, among other, less well understood sequencing artifacts). This shortcoming may lead to mistaking genes for pseudogenes with great confidence (based on high coverage of systematic error). In a few cases, we have seen omission of gene annotation that may be due to such frameshifts. Further, as new genome data are pouring in at an unprecedented rate, detailed genome annotation by the end user (typically manual intervention) becomes increasingly challenging. The best solution to both issues, that is, detecting erroneous gene features and potential sequencing error while keeping up with high standards of genome annotation, is the development of automated genome annotation pipelines. We are aware of only one published tool for organelle genome annotation (DOGMA; Wyman et al. 2004), in addition to the currently unpublished but freely available tools developed by ourselves (MFannot, RNAweasel; Lang et al. 2007; Beck and Lang 2009, 2010). These are still far from perfect, justifying a continued time investment that should ideally be integrated with ongoing large-scale sequencing projects. In this context we noticed that plastid gene identification is relatively straightforward, based on a wide consensus on gene names and functions (which cannot be said for mitochondrial genes). Yet it seems that renaming ycf genes with now known functions would be timely, as would a systematic identification and renaming of conserved ORFs as ycf, as long as they are present in distant species. Identification of weakly conserved genes is best achieved by HMM searches (http://hmmer.janelia.org; Eddy 1996, 1998), which are as fast as BLAST and far more sensitive and reliable.
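
One simple safeguard against the homopolymer-related frameshifts mentioned above is to list long single-nucleotide runs in an assembly, so that apparent pseudogenes or missing genes near such runs can be re-examined. The Python sketch below shows one possible way to do this. The function name `find_homopolymers`, the default threshold of six nucleotides and the toy sequence are assumptions made for illustration. This is not part of DOGMA, MFannot or RNAweasel.

```python
import re


def find_homopolymers(seq, min_len=6):
    """Return (start, base, length) for each run of one nucleotide.

    start is 0-based; only runs of at least min_len bases are reported.
    The threshold is an arbitrary illustrative choice, not a published cutoff.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) further copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Toy contig (hypothetical sequence) with one A-run and one T-run.
contig = "ATGAAAAAAACGTTTTTTGC"
for start, base, length in find_homopolymers(contig):
    print(f"{base} x {length} at position {start}")
```

Runs reported this way only indicate where an error is more likely. Whether a frameshift is real still has to be decided from the reads or from comparison with orthologous genes.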