Introduction

Multiple endosymbiotic events led to the current diversity of photosynthetic eukaryotes. The single endosymbiotic uptake of a cyanobacterium by a non-photosynthetic protist, the primary endosymbiosis, gave rise to the plastids of land plants, green and red algae, and glaucophytes (Price et al. 2012; Rodríguez-Ezpeleta et al. 2005). During endosymbiosis, a large number of genes in the cyanobacterial endosymbiont genome were either lost or transferred to the host nuclear genome, resulting in the highly reduced organelle genomes of modern-day plastids. The transferred genes encode many proteins essential for plastid function, and these nucleus-encoded proteins are targeted back into the plastids across the double envelope membrane (Timmis et al. 2004). In contrast to plants, various groups of photosynthetic algae (e.g., chlorarachniophytes, cryptophytes, dinoflagellates, euglenophytes, haptophytes, and heterokonts) as well as a parasitic group (apicomplexans) obtained complex plastids, known as “secondary plastids,” through multiple secondary endosymbioses between a protist and a green or red alga (Archibald 2009; Gould et al. 2008; Ishida 2005; Keeling 2010). In these events, the nuclear genomes of the integrated organisms were generally lost, and many genes encoding plastid proteins were secondarily transferred from the endosymbiont to the host nuclear genome. Unlike in most secondary plastid-bearing algae, a relict endosymbiont nucleus called the nucleomorph exists in chlorarachniophytes and cryptophytes (Archibald and Lane 2009).

Chlorarachniophytes are a small group of marine unicellular algae (the phylum Chlorarachniophyta contains only 14 species in 8 genera), and they possess secondary plastids derived from a green algal endosymbiont (Hirakawa 2014). The secondary plastid is surrounded by four envelope membranes, and the nucleomorph is localized in the periplastidal compartment, which is the space between the inner and outer pair of plastid membranes. The nucleomorphs contain an endosymbiotically derived genome that is extremely reduced and compacted in comparison with green algal genomes. To date, complete nucleomorph genomes have been sequenced for four species of chlorarachniophytes (Gilson et al. 2006; Suzuki et al. 2015; Tanifuji et al. 2014). The nucleomorph genomes encode hundreds of proteins, including the same set of 17 plastid-associated proteins, suggesting that nucleomorphs are essential to sustain the plastids. Although the nucleomorph genomes show similar architectures (e.g., three linear chromosomes, subtelomeric ribosomal DNA operons, and 189 conserved protein-coding genes), they exhibit remarkable variation in size, ranging from 373 to 611 kbp, mainly due to the presence/absence of multiple duplicated genes. It seems likely that chlorarachniophyte nucleomorph genomes underwent most of their reductive evolution prior to the divergence of this group, after which multiple gene duplication events led to increased genome sizes and extensive rearrangements of gene order occurred at the species level (Suzuki et al. 2015).

Chlorarachniophyte plastids contain two different genomes, a nucleomorph and plastid genome, which have existed over the same evolutionary time after the secondary endosymbiosis. It should be interesting to investigate and compare evolutionary histories of these two genomes in chlorarachniophytes. Whereas four nucleomorph genomes have been sequenced so far, complete plastid genome sequences are available for only two chlorarachniophytes, Bigelowiella natans (Rogers et al. 2007) and Lotharella oceanica (Tanifuji et al. 2014). To gain further insight into endosymbiotic genome evolution in chlorarachniophytes, we sequenced the plastid genomes of three species, Gymnochlora stellata, Lotharella vacuolata, and Partenskyella glossopodia. Comparative analyses of five chlorarachniophyte plastid genomes revealed that they were highly conserved in size, gene content, and gene order, although their nucleomorph genomes are divergent in these features. Architectural conservation of these plastid genomes may be related to their high gene density because frequent rearrangements are likely to disrupt the coding sequences. A remarkable finding was the presence of group I and II introns in the plastid genomes of four chlorarachniophytes, but not B. natans, suggesting that the loss of introns occurred in at least one lineage during the reductive evolution of chlorarachniophyte plastid genomes. Furthermore, we performed phylogenetic analyses using multiple plastid genome-encoded proteins, suggesting that chlorarachniophyte plastids are derived from a green algal lineage that was closely related to Bryopsidales in the group of Ulvophyceae.

Materials and methods

DNA extraction and plastid genome sequencing

G. stellata CCMP2053 and P. glossopodia RCC365 were cultivated at 20 °C under white illumination (80‒100 μmol photons/m²) on a 14:10 h light:dark cycle in 250–500 mL flasks containing ESM medium (Kasai et al. 2009) or IMK medium (Wako Pure Chemical Industries, Ltd., Osaka, Japan). The cells were collected by gentle centrifugation from two- to three-week-old cultures. Total DNA was extracted by a standard phenol–chloroform protocol, and plastid DNA was isolated by Hoechst dye-cesium chloride density gradient ultracentrifugation at 50,000 rpm for 24 h with a VTi 65.2 rotor (Beckman Coulter, Inc., Brea, CA, USA). Plastid DNA of P. glossopodia was also separated using pulsed-field gel electrophoresis according to the methods described by Ishida et al. (2011), and the DNA was purified from the gels using a GELase Agarose Gel-Digesting Preparation Kit (Epicentre, Illumina, Inc., Madison, WI, USA). Plastid DNA of G. stellata was Sanger sequenced and assembled at the National Institute of Genetics in Japan. These contigs of G. stellata had an average coverage of 3.1-fold. To fill the gaps between the resulting contigs, multiple polymerase chain reactions (PCR) were performed, followed by sequencing with an ABI 3130 Genetic Analyzer. Plastid DNA of P. glossopodia was sequenced by three runs using the 454 GS Junior System (454 Life Sciences, Roche Co., Branford, CT, USA) and one run using the Illumina HiSeq 2000. The resulting 267,551 single-end reads from the GS Junior were assembled using Newbler v.2.5 (454 Life Sciences, Roche Co.) and three small gaps were closed by PCR. These contigs of P. glossopodia had an average coverage of 268.1-fold. A dGTP BigDye Terminator Cycle Sequencing Kit (Applied Biosystems, Life Technologies, Waltham, MA, USA) was used to analyze some PCR fragments that could not be sequenced by a BigDye Terminator v3.1 Kit. The 49,531,305 paired-end reads from the Illumina HiSeq 2000 run were used to correct GS Junior pyrosequencing errors.
The plastid genome sequence of Lotharella vacuolata was obtained in the previous sequencing project of its nucleomorph genome (Suzuki et al. 2015). The contigs composed of Sanger reads of L. vacuolata had an average coverage of 4.5-fold.

Gene annotation

Open reading frames (ORFs) longer than 50 nucleotides were manually predicted as protein-coding genes using Artemis 13.2 (Rutherford et al. 2000). Functional annotation of the ORFs was carried out by BLASTX and BLASTP searches (Altschul et al. 1997). Ribosomal RNA (rRNA) genes were predicted using RNAmmer 1.2 (Lagesen et al. 2007) as well as BLASTN searches against the rRNAs of Bigelowiella natans. Transfer RNA (tRNA) genes were predicted using tRNAscan-SE 1.21 (Schattner et al. 2005). Group I and group II introns were predicted using the RNAweasel web server (Lang et al. 2007). The plastid genomes of G. stellata, L. vacuolata, and P. glossopodia are deposited in the GenBank/EMBL/DDBJ databases under the accession numbers AP014947, AP014949, and AP014948, respectively.
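As an illustration of the length criterion only, the following minimal Python sketch scans the three forward reading frames of a sequence for ORFs longer than a threshold. It is not the procedure used here (ORFs were predicted manually in Artemis); the function name `find_orfs`, the restriction to ATG start codons, and the forward-strand-only scan are simplifying assumptions made for the example.

```python
# Hypothetical helper for illustration; not part of Artemis or any cited tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed here)

def find_orfs(seq, min_len=50):
    """Return (start, end, frame) for forward-strand ORFs longer than min_len nt.

    Coordinates are 0-based and half-open; the stop codon is included.
    Only ATG is treated as a start codon, which is a simplification.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start > min_len:      # "longer than 50 nucleotides"
                    orfs.append((start, end, frame))
                start = None                   # look for the next ORF
    return orfs

# Toy sequence: one 66-nt ORF starting at position 2 (frame 2).
toy = "CC" + "ATG" + "GCT" * 20 + "TAA" + "GG"
print(find_orfs(toy))
```

To cover both strands, the same function would be applied to the reverse complement of the sequence.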

Estimation of rearrangement scenarios among plastid genomes

Possible rearrangement scenarios between two plastid genomes were estimated using UniMoG 1.0 (Hilker et al. 2012) with the double cut and join operation, which considers gene inversions, translocations, fissions, and fusions (Yancopoulos et al. 2005). Plastid genomes were compared within each group of five chlorarachniophytes, six Ulvophyceae species, 34 Trebouxiophyceae species, seven Chlorophyceae species, and three Pedinophyceae species (Online Resource 1). Each dataset consists of conserved plastid genes encoding proteins, rRNAs, and tRNAs, except for duplicated genes in regions of inverted repeats (IRs).
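The double cut and join (DCJ) distance can be sketched for the simplest case, namely two circular, single-chromosome genomes with identical gene content and each gene present once. Under these assumptions the distance equals the number of genes minus the number of cycles in the adjacency graph of the two genomes. The Python code below is a minimal sketch of that special case, not the UniMoG implementation; the function names and the signed-integer gene encoding are assumptions made for the example.

```python
# Minimal sketch: DCJ distance between two circular genomes with the same genes.
# Genes are signed integers; the sign gives the coding strand (hypothetical encoding).

def adjacencies(genome):
    """Map every gene extremity to its neighbouring extremity on a circle."""
    def ends(g):
        # (left, right) extremities in reading direction: tail "t", head "h"
        name = abs(g)
        return ((name, "t"), (name, "h")) if g > 0 else ((name, "h"), (name, "t"))

    partner = {}
    n = len(genome)
    for i in range(n):
        right = ends(genome[i])[1]
        left = ends(genome[(i + 1) % n])[0]   # wrap around: circular chromosome
        partner[right] = left
        partner[left] = right
    return partner

def dcj_distance(a, b):
    """DCJ distance = number of genes - number of cycles in the adjacency graph."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same genes")
    pa, pb = adjacencies(a), adjacencies(b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:      # alternate between adjacencies of a and of b
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(a) - cycles

print(dcj_distance([1, 2, 3, 4], [1, 2, 3, 4]))    # identical gene order
print(dcj_distance([1, 2, 3, 4], [1, -2, 3, 4]))   # one inverted gene
```

Handling linear chromosomes or unequal gene content requires additional bookkeeping that this sketch omits.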

Phylogenetic analyses

To construct phylogenetic trees, available plastid genomes in chlorophytes (Ulvophyceae, Trebouxiophyceae, Chlorophyceae, Pedinophyceae, and Prasinophyceae) were collected from the GenBank database (Online Resource 1). Plastid gene sequences of Tetraselmis subcordiformis were identified from transcriptome data deposited in GenBank (accession number GANN00000000.1). Plastid genomes of two species in Streptophyta, Mesostigma viride and Chlorokybus atmophyticus, were used as the outgroup. The final dataset was composed of 55 plastid-encoded proteins, excluding highly divergent genes (e.g., rpl19, rps9, ycf1, and maturase-like), collected from 70 taxa. Their amino acid sequences were aligned using MAFFT 7.164b with the L-INS-i option (Katoh and Toh 2008). Poorly aligned regions were removed manually using MEGA 6.0 (Tamura et al. 2013). The final concatenated sequences consisted of 9,876 amino acid positions. Maximum likelihood (ML) analyses were carried out using RAxML v.8.0.20 (Stamatakis 2014) with the LG+GAMMA+F model, which was selected as the best-fit model by IQ-TREE multicore version 1.3.2 (Nguyen et al. 2015) using the Bayesian information criterion (BIC). The best-scoring ML tree was determined in multiple searches using 20 distinct randomized maximum-parsimony trees, and statistical support (BP) was evaluated by 500 rapid bootstrap replicates. Bayesian analyses were performed using MrBayes v3.2.6-svn (Ronquist et al. 2012) and the LG+GAMMA+F model. The inference consisted of 1,000,000 generations with sampling every 1,000 generations, starting from a random starting tree and using four Metropolis-coupled Markov chain Monte Carlo (MCMCMC) simulations. Two separate runs were performed, and Bayesian posterior probabilities (BPP) were calculated from the majority rule consensus of the trees sampled after the initial 250 burn-in trees.
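The sampling settings above imply a simple relationship between chain length, sampling interval, and burn-in. The short Python sketch below works through that arithmetic for one run; the function name is hypothetical, and the count assumes that the starting tree at generation 0 is not included among the samples, which may differ from how a given program reports its tree files.

```python
# Arithmetic sketch for one MCMC run (hypothetical helper, not a MrBayes call).
def mcmc_sample_summary(generations, sample_every, burnin_trees):
    """Return (sampled trees, trees kept after burn-in, burn-in fraction).

    Assumes one sample per `sample_every` generations and that the
    generation-0 tree is not counted.
    """
    sampled = generations // sample_every
    kept = sampled - burnin_trees
    return sampled, kept, burnin_trees / sampled

sampled, kept, fraction = mcmc_sample_summary(1_000_000, 1_000, 250)
print(sampled, kept, fraction)
```

With the settings reported here, each run yields 1,000 sampled trees under this assumption, of which the first 250 (25 %) are discarded and 750 contribute to the consensus.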

Results and discussion

Architecture of plastid genomes in three chlorarachniophytes

We obtained the complete sequences of the circular plastid genomes for Gymnochlora stellata and Partenskyella glossopodia, and the almost complete plastid sequence of Lotharella vacuolata. The sizes of the plastid genomes in G. stellata, P. glossopodia, and L. vacuolata were 67,451, 72,620, and 71,557 bp, respectively (Fig. 1a–c). In the L. vacuolata genome, a short sequence gap between psbE and atpI could not be filled by our PCR-based sequencing analyses, and the corresponding region of Bigelowiella natans could not be sequenced either, presumably due to their secondary structures (Rogers et al. 2007). In the other two plastid genomes, the corresponding regions consisted of two pairs of inverted repeats (IRs), likely forming stem-loop structures (71 and 132 bp in G. stellata and 176 and 206 bp in P. glossopodia). All three plastid genomes had a canonical quadripartite structure consisting of two IRs, dividing the circular genome into a small and a large single-copy (SSC and LSC) region (Fig. 1a–c). Each of the IRs encoded three ribosomal RNA genes (rns, rnl, and rrn5), the same set of 4 tRNAs, and two or four proteins: the IRs of G. stellata and L. vacuolata carried psbM and petB, whereas the P. glossopodia IRs carried psbM, petA, petB, and petD. A duplicated psbM is pseudogenized in L. vacuolata and P. glossopodia (Fig. 1a–c). The three genomes were predicted to encode the same set of 59 plastid proteins, 6 rRNAs, and almost the same number of tRNAs (29 for L. vacuolata and 31 for G. stellata and P. glossopodia), and some genes had duplicated copies in IRs (Fig. 1d; Table 1). Although trnH (GUG) and trnG (GCC) were not found in L. vacuolata, trnH (GUG) is expected to be present in the unsequenced region between psbE and atpI because it was detected in the corresponding region in the other plastid genomes.

Fig. 1

Genome map of the plastid genomes of three chlorarachniophytes. Plastid genomes of Gymnochlora stellata (a), Partenskyella glossopodia (b), and Lotharella vacuolata (c). Genes on the outside are transcribed in the clockwise direction, and inner genes are transcribed in the counterclockwise direction. Genes are colored according to their function as follows: photosynthesis (green), transcription/translation (pink), ribosomal/transfer RNAs (blue), and miscellaneous (yellow). Introns are shown by black boxes. Inverted repeats (IR) are indicated by thick lines outside the circle. d Gene conservation and rearrangement of plastid genomes among five chlorarachniophytes. Thick lines indicate IRs, and shaded regions represent the rearranged genes (e.g., insertion/deletion and coding strand switch) among the plastid genomes

Table 1 General features of the plastid genomes of four chlorarachniophytes

The plastid genomes of five chlorarachniophytes lack several genes that are conserved in core chlorophytes, e.g., petL, psaM, psbZ, rpl12, rpl32, rps9, infA, ccsA, cemA, chlB, chlL, chlN, and ftsH (ycf2). Some of those homologs (rpl12, rpl32, rps9, infA, and ftsH) are found in the nuclear genome of B. natans (Curtis et al. 2012), which suggests that some plastid genes were lost or transferred to the nuclear genome through the secondary endosymbiosis in chlorarachniophytes. Multiple gene losses for three subunits of a light-independent protochlorophyllide reductase (chlB, chlL, and chlN) were also reported in plastid genomes of some land plants (Wicke et al. 2011), implying that chl genes somehow tend to disappear from plastid genomes. The ycf1 genes of chlorarachniophyte plastid genomes are homologous to those of chlorophytes, but they have no clear sequence homology to streptophyte ycf1 genes, which have been identified as encoding a component of translocons at the inner envelope membrane of chloroplasts (Kikuchi et al. 2013). Although the function of chlorarachniophyte ycf1 genes remains unknown, it is interesting to note that the ycf1 coding sequences show a large variation in both sequence and length (885–1,695 amino acids) within chlorarachniophytes, and such variation also has been observed among chlorophytes and streptophytes (de Vries et al. 2015).

High conservation of chlorarachniophyte plastid genomes

Genome organization was highly conserved among the plastid genomes of five chlorarachniophytes, including B. natans and Lotharella oceanica. In terms of gene content, almost all genes were shared among the five genomes (Table 1, Online Resource 2). A remarkable difference was the presence/absence of an ORF (maturase-like) that encoded a protein that was roughly similar to bacterial reverse-transcriptase/maturase. No maturase-like genes were found in the B. natans plastid genome, whereas the other genomes had one. The five plastid genomes had slight differences in size, ranging from approximately 69 to 72 kbp. The size differences were mainly due to duplicated genes in IRs and variation in the size of the ycf1 genes in SSC regions (Fig. 1d). The gene order was mostly identical among the five plastid genomes, except for the duplicated genes and a couple of tRNA genes located near IR boundaries (Fig. 1d). The order of petB and petD was inverted between P. glossopodia and L. vacuolata, and coding strand switches were observed in trnS (UGA) among the five genomes (Fig. 1d).

There are extensive rearrangements in plastid genomes in general, even between closely related taxa (Brouard et al. 2011; Leliaert and Lopez-Bautista 2015; Turmel et al. 2015). We estimated rearrangement scenarios with gene translocations, inversions, fissions, and fusions between the plastid genomes of chlorarachniophytes using the double cut and join operation. The estimated number of rearrangement events was 2–8 among five plastid genomes of chlorarachniophytes. We also estimated rearrangement scenarios within chlorophyte groups, and determined that the number of rearrangement events was 22–83 in Chlorophyceae, 38–71 in Ulvophyceae, 1–75 in Trebouxiophyceae, and 4–53 in the group of Pedinophyceae and Chlorellales. Even for closely related species of chlorophytes, Bryopsis hypnoides and Bryopsis plumosa, and Chlorella vulgaris and Chlorella variabilis, the estimated numbers of rearrangements were 42 and 26, respectively. Thus, there were clearly fewer predicted rearrangement events in the chlorarachniophytes than in the chlorophyte groups. This may be explained by the higher gene density of chlorarachniophyte plastid genomes, which apparently increases the risk of gene disruption via frequent rearrangements. The coding regions represented 85.1–87.1 % of the plastid genomes of chlorarachniophytes, and 19.5–81.8 % for those of core chlorophytes. A similar pattern was found in cryptophytes with complex secondary plastids. Based on comparative analyses of plastid genomes in four cryptophytes, there are only a small number of inversion events, and the coding regions account for a high proportion of these genomes, i.e., between 80 and 87 % (Kim et al. 2015). Our findings revealed that chlorarachniophyte plastid genomes were highly conserved in size, gene content, and gene order among species.
This suggests that the current architecture of chlorarachniophyte plastid genomes evolved in a common ancestor, and changed very little during the subsequent diversification of chlorarachniophyte species.
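The proportion of coding sequence referred to above can be computed from annotated gene coordinates. The Python sketch below shows one way to do so, merging overlapping features so that no base is counted twice. It is an illustration under stated assumptions rather than the procedure used in this study: the function name, the half-open coordinate convention, and the toy coordinates are hypothetical, and which feature types count as coding (e.g., rRNAs, tRNAs, introns) is left to the user.

```python
# Hypothetical helper: percentage of a genome covered by annotated features.
def coding_density(genome_len, features):
    """Return the percentage of bases covered by at least one feature.

    `features` is a list of (start, end) pairs, 0-based and half-open.
    Overlapping or adjacent features are merged before summing.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(features):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start   # close the previous block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)          # extend the current block
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_len

# Toy example with made-up coordinates on a 1,000-bp genome.
print(coding_density(1000, [(0, 400), (300, 600), (700, 950)]))
```

For a circular genome, a feature spanning the origin would need to be split into two intervals before being passed to this function.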

Introns in chlorarachniophyte plastid genomes

Based on the in silico prediction by RNAweasel, some putative introns were found in the four plastid genomes of G. stellata, L. vacuolata, L. oceanica, and P. glossopodia. A self-splicing group I intron was predicted in the plastid trnL (UAA) of L. oceanica (212 nucleotides), L. vacuolata (227 nucleotides), and P. glossopodia (187 nucleotides) at identical positions within their tRNA anticodon loops (Fig. 2a). Group I introns have been reported in plastid trnL genes of diverse chlorophytes (Kuhsel et al. 1990; Besendahl et al. 2000), and the positions of trnL introns are conserved among chlorophytes and chlorarachniophytes (Fig. 2a). This suggests that the group I introns of chlorarachniophyte trnL genes were derived from a green algal endosymbiont, and were subsequently lost in the plastid genomes of B. natans and G. stellata during their reductive evolution. We also found that ycf3 and/or psbM of four chlorarachniophytes carry group II introns, which were predicted by their secondary structures, whereas G. stellata lacks the ycf3 intron and B. natans has no introns in either gene (Fig. 2b, c). Intron sizes ranging from 282 to 537 nucleotides were estimated based on alignments of psbM and ycf3 sequences of chlorarachniophytes, including the intron-lacking species (Fig. 2b, c). The intron positions of ycf3 and psbM were conserved among the chlorarachniophytes (Fig. 2b, c). In chlorophyte plastid genomes, group II introns were detected in ycf3 of Picocystis salinarum (Lemieux et al. 2014), Bryopsis hypnoides (Lü et al. 2011), and Bryopsis plumosa (Leliaert and Lopez-Bautista 2015), and in psbM of Oocystis solitaria (Turmel et al. 2009) and Schizomeris leibleinii (Brouard et al. 2011). The positions of ycf3 introns were not conserved between the chlorophytes and the chlorarachniophytes, whereas psbM intron positions were identical between O. solitaria and the chlorarachniophytes.
The ycf3 introns of chlorarachniophytes appear to have been present in their common ancestor, and the two species G. stellata and B. natans lost the intron. Introns of chlorarachniophyte psbM might also be inherited from the common ancestor of chlorarachniophytes for the following reasons. First, introns identical to chlorarachniophyte psbM were not found in chlorophytes, other than O. solitaria. Second, the O. solitaria plastid was phylogenetically distinct from chlorarachniophyte plastids (see following section). Last, the psbM intron of O. solitaria included an ORF coding a putative reverse transcriptase (Turmel et al. 2009), whereas no ORFs were detected in the chlorarachniophyte psbM introns.

Fig. 2

Intron positions of three plastid genes of chlorarachniophytes. a Schematic image and alignment of plastid trnL (UAA) genes in chlorarachniophytes and chlorophytes. The trnL genes of Bigelowiella natans and Gymnochlora stellata lack a group I intron. b Alignment of 5′ partial sequences of chlorarachniophyte ycf3 genes including a group II intron. c Alignment of 5′ sequences of psbM genes in chlorarachniophytes and Oocystis solitaria, showing the conserved position of group II introns. Bn Bigelowiella natans, Gs Gymnochlora stellata, Lo Lotharella oceanica, Lv Lotharella vacuolata, Pg Partenskyella glossopodia, Te Tydemania expeditionis, Bp Bryopsis plumosa, Da Dicloster acuatus, Os Oocystis solitaria, Sh Stigeoclonium helveticum, Ao Acutodesmus obliquus

We found that the plastid genomes of four chlorarachniophytes, G. stellata, L. vacuolata, L. oceanica, and P. glossopodia, possess at least one group II intron, whereas the B. natans plastid genome has no introns. As described above, the plastid genomes of the four chlorarachniophytes other than B. natans contain an ORF encoding a putative reverse transcriptase/intron maturase. Reverse transcriptase/intron maturase proteins are generally encoded within group II introns, and promote splicing by facilitating the formation of the catalytically active structure of the intron RNA (Lambowitz and Zimmerly 2004). This implies that the plastid genome of B. natans discarded its group II introns as well as the splicing-related gene during reductive evolution.

Origin of chlorarachniophyte plastids

The endosymbiotic origin of the chlorarachniophyte secondary plastids has previously been predicted based on molecular phylogenetic analyses (Ishida et al. 1997; Ishida et al. 1999; Van de Peer et al. 1996; Rogers et al. 2007; Takahashi et al. 2007; Tanifuji et al. 2014). Phylogenetic analyses with particular gene types (e.g., the plastid and nucleomorph SSU rRNA, and a nucleus-encoded plastid-targeted protein) have resulted in different inferred trees indicating that chlorarachniophyte plastids are closely related to Trebouxiophyceae (Van de Peer et al. 1996), Ulvophyceae (Ishida et al. 1997; Ishida et al. 1999), and Tetraselmis (Takahashi et al. 2007). Phylogenetic trees reconstructed with approximately 50 plastid-encoded proteins and/or nucleomorph-encoded proteins suggest that chlorarachniophyte plastids are related to the so-called UTC group including Ulvophyceae, Trebouxiophyceae, and Chlorophyceae (Rogers et al. 2007; Tanifuji et al. 2014), whereas the accurate position of the chlorarachniophyte plastids within the UTC group remains unclear owing to poor taxon sampling. To address this issue, we reconstructed phylogenetic trees using 55 plastid-encoded proteins from 63 operational taxonomic units (OTUs) within Chlorophytes, five chlorarachniophyte OTUs, and two OTUs in Streptophytes as the outgroup (Fig. 3). The trees suggested the robust monophyly of the core chlorophytes (the UTC group, Pedinophyceae, and Chlorodendrophyceae) with strong support (BP = 100, BPP = 1.00), and chlorarachniophyte OTUs were included in this clade (Fig. 3). Although the monophyly of each of the Chlorophyceae and the Pedinophyceae was strongly supported (BP = 100, BPP = 1.00), Trebouxiophyceae was divided into two well-supported clades, and one that consisted of the Chlorellales formed a sister clade with the Pedinophyceae (Fig. 3). 
The five chlorarachniophyte OTUs formed a robust monophyletic group (BP = 100, BPP = 1.00), and were predicted to be closely related to the Bryopsidales in Ulvophyceae with 84 % bootstrap support (BPP = 1.00).

Fig. 3

Maximum likelihood (ML) phylogenetic tree of 55 plastid-encoded proteins in chlorarachniophytes and diverse chlorophyte species. The best tree was reconstructed using the concatenated dataset of 9,876 amino acids. The values at nodes represent bootstrap support values higher than 50 %. Bayesian posterior probabilities (BPP) were calculated by MrBayes and those of >0.5 are shown below each node. Thick lines show BP = 100 and BPP = 1.00. Bar represents 0.2 substitutions per site

Our phylogenetic analyses suggest that chlorarachniophyte plastids are derived from a green algal lineage closely related to Bryopsidales, which is composed of filamentous and branched multinucleate marine algae. A previous phylogenetic study based on nucleus-encoded EF-Tu supported the close relationship between chlorarachniophytes and Bryopsidales (Ishida et al. 1997). Interestingly, the chlorarachniophyte Cryptochlora perforans was isolated from a sample of the filamentous green alga Boodleopsis pusilla in Bryopsidales (Calderon-Saenz and Schnetter 1987), and amoeboid cells of C. perforans penetrated the algal filaments and engulfed part of their contents (Calderon-Saenz and Schnetter 1989). This implies that the secondary plastids of chlorarachniophytes might have been acquired by the uptake of a filamentous green alga of Bryopsidales, similar to the feeding behavior of C. perforans. Furthermore, some sea slugs temporarily use the plastids of green algae in Bryopsidales (Clark et al. 1990; de Vries et al. 2014), suggesting that the plastids of this algal group somehow tend to be integrated into diverse organisms.

Conclusion

In this study, we reported three plastid genomes of chlorarachniophytes. Our comparative analyses indicated that the plastid genomes were highly conserved in size, gene content, and gene order among chlorarachniophyte species. The current architecture of chlorarachniophyte plastid genomes was likely present in a common ancestor and changed very little during the evolution of these species. The extreme conservation of the plastid genomes may be explained by their highly compacted genome structures, which are expected to increase the risk of gene disruption by frequent genomic rearrangements. Additionally, our phylogenetic analyses based on multiple plastid genes suggest that chlorarachniophyte plastids originated from a green algal lineage closely related to Bryopsidales.