Introduction

Diatoms are a widespread group of eukaryotic microalgae found in freshwater and marine environments, and are major contributors to global carbon cycling. Furthermore, diatoms play a key role in controlling silicon cycling in the ocean (De La Rocha et al. 1998). Therefore, many studies have examined the ecological and geological significance of diatoms in natural environments. Recent attention has focused on biofuel production by diatoms, which photosynthetically convert carbon dioxide into neutral lipids (Courchesne et al. 2009; Rodolfi et al. 2009; Schenk et al. 2008). Diatoms have relatively high growth rates (approximately 1.0 day−1) and do not compete with food crops.

Diatoms are also important for understanding the evolutionary history of the eukaryotic cell. The endosymbiotic hypothesis suggests that plastids are primarily derived from endosymbiotic cyanobacteria. Chloroplasts, the light-harvesting plastids in diatoms, were derived from a red alga by secondary endosymbiogenesis, rather than directly from prokaryotes as occurred in plants. To date, three diatom chloroplast genomes have been fully sequenced: those of the centric diatoms Odontella sinensis and Thalassiosira pseudonana, and that of the pennate diatom Phaeodactylum tricornutum (Kowallik et al. 1995; Oudot-Le Secq et al. 2007). Furthermore, analysis of the complete plastid genome of the dinoflagellate Kryptoperidinium foliaceum, whose plastid was likely derived from a pennate diatom by tertiary endosymbiosis, revealed that its gene content and genome organization were similar to those of the pennate diatom P. tricornutum. Two exogenous plasmids originating from a pennate diatom, Cylindrotheca fusiformis, might have been incorporated into this plastid genome by lateral gene transfer (Imanian et al. 2010).

Genome sequence analysis of more diverse diatom species is a promising approach for further understanding the evolutionary history of eukaryotic cells. In recent years, several technologies for large-scale DNA sequencing, designated as next-generation sequencing technologies, have been developed and provide faster and more cost-effective sequencing throughput (Hert et al. 2008; Ansorge 2009; Schuster 2008). Pyrosequencing, developed by Roche 454 Life Sciences, has been applied to de novo sequencing of microbial and other small genomes (Hongoh et al. 2008; Argueso et al. 2009).

Here, we describe the chloroplast genome analysis of a newly isolated marine pennate diatom, Fistulifera sp. strain JPCC DA0580, by next-generation sequencing technology using the Genome Sequencer FLX System. Fistulifera sp. (formerly Navicula sp.) strain JPCC DA0580 was identified as the highest neutral lipid producer among 1393 strains by screening a marine microalgal culture collection (Matsumoto et al. 2010). Furthermore, a comparative analysis of the chloroplast genomes of five species (Fistulifera sp. strain JPCC DA0580, T. pseudonana, P. tricornutum, O. sinensis, and K. foliaceum) was performed. This molecular sequence analysis of a novel diatom provides important insights into diatom evolution.

Materials and methods

DNA extractions

Marine diatom strain JPCC DA0580 was used in this study. Strain JPCC DA0580 was isolated from the junction of the Sumiyo River and Yakugachi River in Kagoshima, Japan (Matsumoto et al. 2010). On the basis of phenotypic and genotypic comparisons of this isolate with other strains, JPCC DA0580 was identified as a strain closely related morphologically to Fistulifera saprophila. Strain JPCC DA0580 was cultured in f/2 medium (Guillard 1975) (75 mg NaNO3, 6 mg Na2HPO4·2H2O, 0.5 μg vitamin B12, 0.5 μg biotin, 100 μg thiamine HCl, 10 mg Na2SiO3·9H2O, 4.4 mg Na2-EDTA, 3.16 mg FeCl3·6H2O, 12 μg CoSO4·5H2O, 21 μg ZnSO4·7H2O, 0.18 mg MnCl2·4H2O, 70 μg CuSO4·5H2O, and 7 μg Na2MoO4·2H2O) dissolved in 1 l of artificial seawater. Cultures were bubbled with sterile air at 20°C under 140 μmol/m2/s illumination for 14 days.

Cells at late logarithmic growth phase (4.5 l of culture) were collected by centrifugation at 10,000×g for 10 min at 4°C. Cell pellets were frozen in liquid nitrogen, resuspended in 15 ml of lysis buffer (50 mM Tris–HCl pH 8.0, 10 mM EDTA pH 8.0, 1% SDS, and 10 mM DTT), and incubated at 50°C for 30 min (Bowler et al. 2008). Genomic DNA was stained with Hoechst 33258 dye (Dojindo, Japan) and purified by cesium chloride centrifugation (Armbrust et al. 2004).

Genome sequencing

The chloroplast genome of strain JPCC DA0580 was sequenced using a GS FLX Titanium DNA pyrosequencer (Roche 454 Life Sciences, Branford, CT, USA). The library for the GS FLX Titanium was constructed using the GS FLX Titanium General Library Preparation Kit. The library was amplified onto DNA capture beads by emulsion PCR according to the manufacturer’s instructions. The collected beads were quantified, and the genome was sequenced using a GS Titanium Sequencing Kit XLR70 and Genome Sequencer FLX System.

The nucleotide sequences were assembled using GS De Novo Assembler version 2.3. The chloroplast genome sequence was compared with the reference sequence from the complete chloroplast genome of Phaeodactylum tricornutum (Oudot-Le Secq et al. 2007). Scaffolds containing the chloroplast genome were identified by their similarities to other chloroplast genomes. The remaining gaps were closed by primer walking on gap-spanning PCR products, which were identified using linking information from forward and reverse reads and sequenced with a BigDye Terminator v3.1 Cycle Sequencing Kit.

Genome annotation and analysis

The genome was examined for open reading frames (ORFs) using Artemis software (Rutherford et al. 2000). ORFs were annotated using BLAST 2.2.23 (Camacho et al. 2009) against the NCBI nr and nt databases. tRNAscan-SE v1.23 (Lowe and Eddy 1997) was used to identify transfer RNAs, and BLASTN was used to identify ribosomal RNAs. The inverted repeat regions were defined using BLASTN against the chloroplast genome sequence of Fistulifera sp. itself. Physical maps were generated using GenomeVx (Conant and Wolfe 2008) and further edited manually.
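The idea behind locating inverted repeats by aligning a genome against itself can be illustrated with a small sketch. This is not the BLASTN procedure used in the study: it is a simplified exact k-mer search, and the function names (`revcomp`, `inverted_repeat_seeds`) and the toy sequence are hypothetical. It reports pairs of positions where one window is the reverse complement of another, which is the signature of an inverted repeat.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeat_seeds(seq: str, k: int) -> List[Tuple[int, int]]:
    """Find 0-based position pairs (i, j), i < j, such that the k-mer at i
    equals the reverse complement of the k-mer at j (exact matches only)."""
    # Index every k-mer by its start positions.
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    seeds = []
    for j in range(len(seq) - k + 1):
        rc = revcomp(seq[j:j + k])
        for i in index.get(rc, []):
            if i < j:  # report each pair once, skip self-palindromes
                seeds.append((i, j))
    return seeds


# Toy sequence (hypothetical): one repeat arm, a spacer, then its reverse complement.
arm = "ACGGTCAT"
toy = "TTTT" + arm + "CCCCC" + revcomp(arm) + "GG"
seeds = inverted_repeat_seeds(toy, k=8)
print(seeds)
```

On a real genome, exact seeds like these would need to be extended and merged, and mismatches tolerated, which is what an alignment tool such as BLASTN handles.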

Results and discussion

General features of the chloroplast genome of Fistulifera sp. strain JPCC DA0580

The chloroplast genome of Fistulifera sp. strain JPCC DA0580 (DDBJ Accession: AP011960) was fully sequenced by high-throughput pyrosequencing using GS FLX Titanium. A total of 273,968 sequences were generated, covering 114.5 Mb with an average read length of 418 bases. The sequences were assembled into several contigs using GS De Novo Assembler version 2.3 software to cover the entire chloroplast genome. Two major gaps of 10 and 2.7 kbp were located in the inverted repeats (IRs). Gaps within IRs were also observed in the sequencing results of the mungbean (Vigna radiata) chloroplast genome by high-throughput pyrosequencing (Tangphatsornruang et al. 2010). The remaining gaps were filled through PCR and Sanger sequencing. The chloroplast genome of Fistulifera sp. strain JPCC DA0580 was 134,918 bp in length, containing a large single copy (LSC) region of 62,994 bp and a small single copy (SSC) region of 45,264 bp, separated by two IRs of 13,330 bp each (Fig. 1).
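The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the function names are illustrative. It confirms that the LSC, the SSC, and two IR copies sum to the reported genome length, and that the total bases divided by the number of reads gives the reported average read length.

```python
# Region lengths in bp, as reported in the text.
LSC = 62_994
SSC = 45_264
IR = 13_330  # length of each of the two inverted repeats


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / n_reads


genome_length = quadripartite_length(LSC, SSC, IR)
avg_read = mean_read_length(114.5e6, 273_968)

print(genome_length)    # 134918
print(round(avg_read))  # 418
```

Note that the 114.5 Mb refers to all reads generated in the run, so this check does not by itself give the sequencing depth of the chloroplast genome.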

Fig. 1

A chloroplast genome map of Fistulifera sp. strain JPCC DA0580. Annotated genes are colored according to their functional categories. First circle: gene organization; second circle: tRNA positions; third circle: inverted repeat regions (IR), small single copy (SSC), and large single copy (LSC). Roman numerals (I–IV) mark the locations of four distinct regions in the chloroplast genome of strain JPCC DA0580

The general features of the chloroplast genomes from four diatoms, including strain JPCC DA0580, and one diatom endosymbiont of a dinoflagellate are summarized in Table 1. A previous report examined common features of chloroplast genomes from five chromists, including three diatoms (Phaeodactylum tricornutum, Thalassiosira pseudonana, and Odontella sinensis), one haptophyte (Emiliania huxleyi), and one cryptophyte (Guillardia theta) (Oudot-Le Secq et al. 2007). Among the three diatoms, these features included compact size, complete lack of introns, four identical overlapping gene pairs, and small intergenic spacers (88–116 bp). As expected, the chloroplast genome from strain JPCC DA0580 was also compact, completely lacked introns, and contained the same four overlapping gene pairs: sufC–sufB overlapped by 1 nt, atpD–atpF by 4 nt, rpl4–rpl23 by 8 nt, and psbD–psbC by 53 nt. On the other hand, the chloroplast genome of strain JPCC DA0580 contained much larger intergenic spacers, with an average length of 179.5 bp, the largest among the four diatoms compared in this study. Low gene-density regions (I, II, and IV in Fig. 1), which included unidentified genes, and a long intergenic region (III in Fig. 1) were identified. These regions showed no similarity to other diatom chloroplast genomes. A similar feature was also found in the chloroplast genome of the diatom endosymbiont of the dinoflagellate K. foliaceum (Imanian et al. 2010), in which the average intergenic spacing is 246.7 bp.
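The relationship between gene coordinates, overlaps, and intergenic spacers can be made concrete with a short sketch. The coordinates and gene names below are hypothetical toy values, not taken from the annotated genome, and the source does not state exactly how its average spacer length was computed, so this is only one plausible definition (mean of the positive gaps between consecutive genes, ignoring strand and the wrap-around of the circular map).

```python
from typing import List, Tuple

Gene = Tuple[str, int, int]  # (name, start, end), 1-based inclusive coordinates


def gap_lengths(genes: List[Gene]) -> List[int]:
    """Gap between each pair of consecutive genes; negative values are overlaps."""
    ordered = sorted(genes, key=lambda g: g[1])
    return [nxt[1] - cur[2] - 1 for cur, nxt in zip(ordered, ordered[1:])]


def summarize(genes: List[Gene]) -> Tuple[float, List[int]]:
    """Return (mean intergenic spacer length, list of overlap lengths in nt)."""
    gaps = gap_lengths(genes)
    spacers = [g for g in gaps if g > 0]
    overlaps = [-g for g in gaps if g < 0]
    mean_spacer = sum(spacers) / len(spacers) if spacers else 0.0
    return mean_spacer, overlaps


# Hypothetical toy annotation, for illustration only.
toy_genes = [
    ("geneA", 1, 300),
    ("geneB", 300, 900),    # shares 1 nt with geneA
    ("geneC", 1001, 1500),  # 100-bp spacer after geneB
    ("geneD", 1497, 2000),  # shares 4 nt with geneC
    ("geneE", 2201, 2600),  # 200-bp spacer after geneD
]

mean_spacer, overlaps = summarize(toy_genes)
print(mean_spacer)  # 150.0
print(overlaps)     # [1, 4]
```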

Table 1 Chloroplast genome summary

Gene content in the chloroplast genome

Table 2 lists the gene content of the chloroplast genome of Fistulifera sp. strain JPCC DA0580. The protein-coding genes were found to be nearly identical to those of P. tricornutum, with several exceptions, such as the psbE and psbL genes. These two genes showed the highest similarity to those of O. sinensis. The most common features among the four compared diatoms were three rRNA subunits (rns, rnl, and rrn5) in the IRs and 27 tRNAs (Oudot-Le Secq et al. 2007).

Table 2 Functional classification of Chloroplast-encoded genes in Fistulifera sp. strain JPCC DA0580

Five ORFs lacked significant similarity to any entry in the public sequence databases. These putative genes were named JC032, -033, -034, -081, and -082. Three genes, JC032, -033, and -034, were detected in the IRs and were therefore duplicated. The others are located in the low gene-density regions (I, II, and IV in Fig. 1). In addition, the chloroplast genome encoded a putative serine recombinase gene, serC2.

Comparison of sequence identity

The percentages of sequence identity between the chloroplast genes of strain JPCC DA0580 and those of other diatoms, raphidophyte, pelagophyte, or the diatom endosymbiont of a dinoflagellate are listed in Supplementary Tables S-1 and S-2. These comparisons suggest that the gene complements of the diatom species are almost identical. The chloroplast genome most similar to that of strain JPCC DA0580 is that of P. tricornutum, and high homology was also seen with the chloroplast genome of the dinoflagellate endosymbiont. On the other hand, the tsf (translational elongation factor) gene was retained in only two diatom chloroplasts, those of strain JPCC DA0580 and P. tricornutum (Supplementary Table S-1). The tsf gene is also found in the chloroplast genomes of Guillardia theta and Porphyra purpurea. This result suggests that its loss in T. pseudonana and O. sinensis may be relatively recent. A comparison of the amino acid sequences of the fundamental components of photosystems I and II showed high homology (average 89.3% identity) for each gene among diatom-derived chloroplasts (Supplementary Table S-2). In contrast, the average identity with the raphidophyte and pelagophyte was 77.2%, which suggests an evolutionary divergence between the diatoms and the other chromalveolate species. Furthermore, strain JPCC DA0580 possesses unique sequence differences relative to the consensus of the other diatom chloroplasts (Supplementary Table S-3). The polymorphism appears to be focused on the psbL gene, where amino acid 19 is Phe in the three other diatoms but Tyr in JPCC DA0580. A previous report on the psbL gene in Synechocystis sp. PCC 6803 suggested that amino acid substitutions at this position influenced photoautotrophic doubling time (Luo and Eaton-Rye 2008). These polymorphisms might be related to the characteristic high growth rate and high triglyceride accumulation of Fistulifera sp. JPCC DA0580.
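A percent-identity figure of the kind quoted above can be computed from a pairwise alignment as sketched below. The source does not specify which alignment tool or identity definition was used, so this is an assumption: identity is taken as matching residues divided by aligned columns in which neither sequence has a gap. The two peptide strings are invented toy sequences (not real psbL sequences) that differ by a single Phe-to-Tyr substitution.

```python
from typing import List, Tuple


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def substitutions(aln_a: str, aln_b: str) -> List[Tuple[int, str, str]]:
    """List (1-based alignment column, residue in a, residue in b) for each mismatch."""
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(aln_a, aln_b), start=1)
        if a != b and a != "-" and b != "-"
    ]


# Invented toy peptides, pre-aligned and gap-free, for illustration only.
consensus = "MKTAYIAKQRFISLG"
variant = "MKTAYIAKQRYISLG"

print(round(percent_identity(consensus, variant), 1))  # 93.3
print(substitutions(consensus, variant))               # [(11, 'F', 'Y')]
```

Averaging such per-gene values across the photosystem components would give a summary figure comparable in form to the 89.3% and 77.2% reported above.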

Comparison of gene content and order in a region surrounding serC2

Gene content and order in the 4.5-kb region surrounding serC2, including a low gene-density region (region II in Fig. 1) and a non-coding region (region III in Fig. 1), were compared with those of three other diatoms and the dinoflagellate K. foliaceum (Fig. 2a). In the 4.5-kb region, the gene order between trnR and ycf35 was conserved among all four diatoms, although some deletions or insertions were observed. Regions II and III were located upstream of trnR and downstream of ycf35, respectively. Interestingly, similar tendencies were observed in K. foliaceum; i.e., low gene-density regions were also located upstream of psb28 and downstream of ycf35, respectively. Furthermore, the intergenic regions showed similarity between strain JPCC DA0580 and K. foliaceum. A previous report suggested that the non-coding regions in K. foliaceum showed strong similarity to the pCf1/pCf2 plasmid sequences of the marine diatom C. fusiformis, and could have been incorporated into the chloroplast genome of K. foliaceum by a lateral gene transfer event (Imanian et al. 2010). It is therefore possible that a similar event occurred in the chloroplast genome of Fistulifera sp. strain JPCC DA0580. Region II shares similarity with the pCf1 and pCf2 plasmids of the diatom C. fusiformis (Hildebrand et al. 1992), and includes a putative serine recombinase gene, serC2. The serC2 gene of Fistulifera sp. strain JPCC DA0580 shares 45.2, 39.4, and 42.9% amino acid identity with K. foliaceum serC2, ORF 218 from pCf1, and ORF 217 from pCf2, respectively. This is the first report of a trace of plasmid transfer in a diatom chloroplast genome, and these findings may bring new insights into the diatom ancestral state.

Fig. 2

Gene order surrounding the serC2 gene. Chloroplast genomes of four diatoms and one diatom endosymbiont of K. foliaceum (a), and two plasmids of diatom C. fusiformis (b)

Conclusions

In this study, we sequenced the chloroplast genome of a novel marine pennate diatom, Fistulifera sp. strain JPCC DA0580, using high-throughput pyrosequencing, and presented new information on diatom chloroplast genome architecture. The general features and gene content are broadly similar to those of the three other fully sequenced diatom chloroplast genomes. However, there are some unique sequence variations in photosystem II genes that differ from the consensus of the other diatom chloroplast genomes. Furthermore, several distinct regions that have low gene density and show no similarity to the other diatom chloroplast genomes were identified in the strain JPCC DA0580 chloroplast genome. One of these regions retained a putative serine recombinase gene that was also identified in the chloroplast genome of the dinoflagellate K. foliaceum and in two plasmids from another pennate diatom, C. fusiformis. These results suggest that these DNA sequences could have been incorporated by lateral transfer of exogenous DNA. Currently, only four diatom chloroplast genomes, including that of strain JPCC DA0580, are available. However, the database will continue to grow owing to next-generation DNA sequencing, and these advances will contribute novel insights into diatom evolution.