Introduction

Harmful algal blooms (HABs) are a worldwide phenomenon caused by extreme successions of phytoplankton communities. They have attracted close attention because of their serious consequences for human health, as well as the economic loss and social disruption they cause (Hoagland et al. 2002; Jin et al. 2008; Dyson and Huppert 2010; McPartlin et al. 2017; Crosman et al. 2019; Moore et al. 2020). HAB species produce potent toxins that cause syndromes such as paralytic shellfish poisoning (PSP) from Alexandrium acatenella, diarrhetic shellfish poisoning (DSP) from Dinophysis acuta, and neurotoxic shellfish poisoning (NSP) from Gymnodinium breve, and these toxins reach humans through the food chain (Hallegraeff 1993; Balmer-Hanchey et al. 2003). In 2012 alone, HABs in China resulted in a direct economic loss of more than 2 billion RMB (Chen et al. 2015). As a result of climate change and intensifying human activities, HABs now occur on a larger scale, last longer, have more serious consequences, and show more notable global expansion (Yu and Chen 2019). Studies on HAB species and HABs are therefore becoming increasingly urgent.

Diatoms are a diverse group of unicellular, autotrophic, eukaryotic algae that play an important role in aquatic food webs (Pogoda et al. 2019). Diatoms have the most effective RuBisCO enzyme among autotrophs (Giordano et al. 2005) and play a vital part in CO2 cycling, accounting for ∼ 40% of marine carbon export (Armbrust et al. 2004; Shibl et al. 2020). With approximately 200,000 species, diatoms are the most diverse group of algae (Mann and Droop 1996; Mann 1999), and many of them cause HABs (Qi et al. 2004).

Odontella regia is a HAB species of the genus Odontella, family Odontellaceae, order Eupodiscales, and class Mediophyceae (Hegde et al. 2011). It is a planktonic, bipolar centric diatom that reproduces sexually, producing 64 sperm cells per cell, rather than by sporulation. Odontella regia is distributed in coastal waters of warm temperate and tropical zones and appears in all seas of China (Yang and Dong 2006). It has been reported to form HABs in estuarine and coastal waters (Badylak 2004; Carstensen et al. 2015). In Jiaozhou Bay, China, O. regia HABs have had adverse effects on marine fisheries and coastal aquaculture (Han et al. 2004; Wu et al. 2005). Several other species in the genus Odontella, including Odontella aurita and Odontella sinensis, can also cause HABs (Guo 2004; Ibrahim and Imad 2017). In particular, O. aurita has been found to be rich in eicosapentaenoic acid and has attracted widespread attention (Haimeur et al. 2012; Mimouni et al. 2012; Xia et al. 2013). In contrast, research on O. regia has been limited to morphological observation and a few molecular markers, including the 18S ribosomal DNA (18S rDNA) gene and the ribulose-1,5-bisphosphate carboxylase (rbcL) gene.

Driven by the recent development of DNA sequencing technologies, mitochondrial genomes of numerous species have been sequenced and fully constructed, uncovering valuable knowledge about gene function and evolutionary trajectories (Smith 2016). Furthermore, comparative analysis of mitochondrial genomes has been applied as an effective method to develop high-resolution molecular markers (Chen et al. 2019). However, our knowledge about the mitogenomes of diatoms, especially those that cause HABs, is very limited. To date, only 33 mitogenomes have been published for species in Bacillariophyta, which comprises three main classes: Mediophyceae, Coscinodiscophyceae, and Bacillariophyceae. Comparative analysis of mitogenomes can help us understand the complex evolutionary relationships of algal species.

In this study, we determined the mitogenome of the HAB species O. regia. We also constructed the mitogenome of another Bacillariophyta species, Lithodesmium undulatum, for comparative analysis. Both strains were isolated in Jiaozhou Bay, which is connected to the Yellow Sea by a small opening. We analyzed the morphological features of the two species and determined their taxonomic positions using common molecular markers. For accurate comparison, we re-annotated the 33 published mitogenomes downloaded from GenBank at NCBI. Genes that were missing from the GenBank annotations were annotated, and several annotation errors were corrected. Of the 35 diatom mitogenomes analyzed here, 31 are complete circular-mapping molecules, whereas the remaining four are incomplete.

Materials and methods

Strain isolation, culturing, and phylogeny-based species characterization

Odontella regia strain CNS00380 and Lithodesmium undulatum strain CNS00316 were isolated from Jiaozhou Bay at 120° 10′ 839″ E, 36° 06′ 105″ N and 120° 13′ 971″ E, 36° 04′ 023″ N, respectively. The strains were isolated by the single-cell capillary method and cultured in L1 medium with 1‰ volume fraction Na2SiO3 with H2O added (Guillard and Hargreaves 1994). The culture temperature was 18–20 °C, and the irradiance was 27 to 40 μmol photons m−2 s−1 with a photoperiod of 12 h light:12 h dark. Species identification was based on morphological features and on phylogenetic analyses of the full-length 18S rDNA and ribulose-1,5-bisphosphate carboxylase (rbcL) genes.

The phylogenetic trees were constructed using MEGA7 (Kumar et al. 2016). Phylogenetic relationships were inferred using the neighbor-joining method (Saitou and Nei 1987). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches (Felsenstein 1985). The trees were drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic trees. The evolutionary distances were computed using the maximum composite likelihood method (Tamura et al. 2004) and are in units of the number of base substitutions per site. The analyses involved 41 and 18 nucleotide sequences for the 18S rDNA and rbcL genes, respectively. The codon positions included were 1st + 2nd + 3rd + non-coding. All positions containing gaps and missing data were eliminated. The final datasets contained a total of 1450 and 1289 positions for the 18S rDNA and rbcL genes, respectively.
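The treatment of gaps and the notion of a per-site distance described above can be illustrated with a minimal Python sketch. It is not the MEGA7 procedure: the simple p-distance is used here as a stand-in for the maximum composite likelihood distance, and the taxon names and sequences are hypothetical.

```python
from itertools import combinations


def complete_deletion(alignment):
    """Drop every alignment column that has a gap or ambiguous base in any sequence."""
    names = list(alignment)
    valid = set("ACGT")
    columns = zip(*(alignment[name].upper() for name in names))
    kept = [col for col in columns if all(base in valid for base in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(seq_a, seq_b):
    """Proportion of sites at which two aligned sequences differ (assumption: a simple
    stand-in for the maximum composite likelihood distance used in the study)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(alignment):
    """Pairwise p-distances after complete deletion of gapped or ambiguous columns."""
    cleaned = complete_deletion(alignment)
    return {(a, b): p_distance(cleaned[a], cleaned[b])
            for a, b in combinations(cleaned, 2)}


# Hypothetical toy alignment; real input would be the aligned 18S rDNA or rbcL sequences.
toy = {
    "taxon1": "ACGT-ACGTA",
    "taxon2": "ACGTTACGTT",
    "taxon3": "ACCTTANGTA",
}
cleaned = complete_deletion(toy)
distances = distance_matrix(toy)
```

A matrix of this kind is the input that a neighbor-joining implementation would cluster into a tree; bootstrap support would be obtained by resampling the retained columns with replacement and repeating the analysis.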

DNA library preparation and whole-genome sequencing

Total DNA was extracted with the DNAsecure Plant Kit (Tiangen Biotech, China). The genomic DNA sample was fragmented by sonication (Covaris S220, Covaris, USA) to a size of 350 bp. DNA fragments were then end polished, A-tailed, and ligated with the full-length adapters for Illumina sequencing, followed by PCR (MiniAmp thermal cycler, Thermo Fisher, USA) enrichment using generic adapter P5 and P7 oligos. The PCR products were purified with the AMPure XP system (Beckman Coulter, USA); libraries were analyzed for size distribution by NGS3K/Caliper and quantified by real-time PCR (Qubit 3.0 fluorometer, Invitrogen, USA). Qualified libraries were sequenced on an Illumina platform according to the effective concentration and data volume at Novogene (Beijing, China). Sequencing was conducted using NovaSeq PE150 (Illumina, USA), generating 150-bp paired-end reads. The sequencing results have been submitted to NCBI (BioProject number PRJNA685980).

Construction and annotation of mitochondrial genomes

Raw reads were trimmed using Trimmomatic-0.39 with the parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75 (Bolger et al. 2014). Clean reads were assembled using SPAdes v3.14 with default parameters apart from the thread count (spades/bin/spades.py -t 32) (Bankevich et al. 2012). The mitogenome of Skeletonema marinoi (NC_028615) was used as a query to search the resulting assembly scaffolds of strain CNS00380 and strain CNS00316 for mitogenome sequences using BLAST with default parameters (Johnson et al. 2008). The mitogenome sequences were validated using BWA, SAMtools, and IGV. Briefly, reads were aligned to the draft mitogenome using BWA (0.7.17) (Li and Durbin 2009). SAMtools (1.10) (Li et al. 2009) was used to extract the alignment results, and IGV (Thorvaldsdottir et al. 2013) was used to inspect the alignments for validation and error correction. Annotations (the accession numbers for O. regia and L. undulatum are MW018491 and MW023083, respectively) were made with MFannot (https://github.com/BFL-lab/Mfannot) and NCBI’s ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/) and completed in NCBI’s Sequin 15.10 (https://www.ncbi.nlm.nih.gov/projects/Sequin/) using the Mold, Protozoan, and Coelenterate Mitochondrial; Mycoplasma/Spiroplasma genetic code.
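To make the trimming parameters concrete, the following Python sketch applies the four steps in the order given (LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:75) to a single read. It is a simplified illustration rather than a reimplementation of Trimmomatic: the sliding-window step here cuts at the start of the first window whose mean quality falls below the threshold, and the function names are hypothetical.

```python
def decode_phred33(quality_string):
    """Convert a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(char) - 33 for char in quality_string]


def trim_read(seq, quals, leading=3, trailing=3, window=4, min_avg=15, minlen=75):
    """Simplified single-read trimming; returns (seq, quals) or None if the read is dropped."""
    # LEADING: remove bases from the 5' end while their quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    # TRAILING: remove bases from the 3' end while their quality is below the threshold.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window whose
    # mean quality is below min_avg (simplifying assumption, see lead-in).
    cut = len(quals)
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: discard reads that end up shorter than the minimum length.
    if len(seq) < minlen:
        return None
    return seq, quals


# Hypothetical 100-bp read: two very poor leading bases, then good bases,
# then ten low-quality bases at the 3' end.
read_seq = "A" * 100
read_qual = "#" * 2 + "?" * 88 + "&" * 10
trimmed = trim_read(read_seq, decode_phred33(read_qual))
```

In this toy case the two leading bases are removed and the read is cut where the low-quality tail begins to dominate the window, leaving a read that still passes the 75-bp length filter.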

Phylogenetic analysis using concatenated mitochondrial protein-coding genes

Thirty-two mitochondrial protein-coding genes (atp6, 8, 9; cob; cox1, 2, 3; nad1-7, 4L, 9, 11; rpl2, 5, 6, 14, 16; rps3, 4, 7, 8, 10, 11, 13, 14, 19; and tatC) from each diatom species were extracted and concatenated for phylogenetic analysis. The amino acid sequences of each of the 32 genes from the different diatom mitogenomes were individually aligned using MAFFT with default parameters (Katoh and Standley 2013). Ambiguously aligned regions in each alignment were deleted using trimAl 1.2rev59 (Capella-Gutierrez et al. 2009) with the parameter gt = 1, and all amino acid sequences were concatenated using Phyutility (Smith and Dunn 2008). The concatenated amino acid dataset, which was 6424 characters long, was partitioned by gene position. The incongruence length difference test (ILD, also called the partition homogeneity test) was conducted in PAUP4.0 with the following parameters: number of replications = 100; optimality criterion = maximum parsimony (Swofford 2002). Combined data have been reported to perform worse than the individual partitions only when the ILD test yields p values lower than 0.001 (Cunningham 1997); the p value obtained here (0.01) therefore indicated that combining the data would not affect phylogenetic accuracy. The evolutionary models and partitioning of the amino acid data were determined using ModelFinder (Kalyaanamoorthy et al. 2017). The phylogenetic tree was constructed with IQ-TREE using default parameters (Trifinopoulos et al. 2016). Ultrafast bootstrap analysis with 1000 replicates and the approximate Bayes test were performed to estimate statistical reliability (Anisimova et al. 2011; Minh et al. 2013). Phytophthora ramorum and Saprolegnia ferax in Oomycota were used as out-groups.
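The concatenation and partitioning step can be sketched in Python as follows. This is an illustration under stated assumptions, not the trimAl or Phyutility code: the gap filter simply removes every column containing a gap, which is how gt = 1 is interpreted here, and the gene alignments and species labels are hypothetical.

```python
def drop_gapped_columns(alignment):
    """Keep only the alignment columns in which no sequence has a gap."""
    names = list(alignment)
    columns = [col for col in zip(*(alignment[name] for name in names))
               if "-" not in col]
    return {name: "".join(col[i] for col in columns) for i, name in enumerate(names)}


def concatenate(gene_alignments):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where partitions holds
    (gene, first_position, last_position) with 1-based inclusive coordinates.
    """
    taxa = None
    pieces = {}
    partitions = []
    offset = 0
    for gene, alignment in gene_alignments.items():
        trimmed = drop_gapped_columns(alignment)
        if taxa is None:
            taxa = set(trimmed)
            pieces = {taxon: [] for taxon in trimmed}
        elif set(trimmed) != taxa:
            raise ValueError(f"taxon set of {gene} differs from earlier genes")
        length = len(next(iter(trimmed.values())))
        for taxon in taxa:
            pieces[taxon].append(trimmed[taxon])
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy alignments for two of the 32 genes.
toy_genes = {
    "atp6": {"sp1": "MKL-V", "sp2": "MKLAV"},
    "cob": {"sp1": "MTNI", "sp2": "MSNI"},
}
supermatrix, partitions = concatenate(toy_genes)
```

The partition coordinates produced this way are the kind of per-gene boundaries that a model-selection tool needs in order to fit a separate substitution model to each gene or to merge genes with similar models.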

Synteny analysis of closely related mitochondrial genomes

Synteny analysis of five mitogenome sequences from class Mediophyceae was carried out with Mauve v2.3.1 using progressiveMauve with default parameters (Darling et al. 2010). The comparative illustration of mitochondrial genomes was produced using circos-0.69 (Krzywinski et al. 2009).

Results

Morphological and molecular characterization of the harmful algal species Odontella regia

Cells of Odontella regia strain CNS00380 were generally rectangular, with the middle part of the valve surface flat or slightly concave. Elongated rod-like protrusions extended from the four corners of the cell, and the shell surface inside the protrusions bore obvious small ridges with hollow spines (Fig. 1a). These morphological features were in accordance with those of the species Odontella regia (Ashworth et al. 2013). In the full-length 18S rDNA-based phylogenetic tree, strain CNS00380 clustered with KC309502.1 Odontella regia (Ashworth et al. 2013) and HQ912564.1 Odontella sinensis (Theriot et al. 2010) with PID of 100% and 99.94%, respectively (Fig. 1c). Phylogenetic analysis of ribulose-1,5-bisphosphate carboxylase (rbcL) genes showed that the rbcL gene of CNS00380 clustered with KC309576.1 Odontella regia (Ashworth et al. 2013) with PID of 100% in a monophyletic clade (Fig. 1d). The combined morphological and molecular features of CNS00380 confirmed that this strain was Odontella regia.

Fig. 1

Representative micrographs of diatom species studied in this project. a Odontella regia (strain CNS00380). b Lithodesmium undulatum (strain CNS00316). c Phylogenetic analysis based on 18S ribosomal DNA (18S rDNA) gene. d Phylogenetic analysis based on ribulose-1,5-bisphosphate carboxylase (rbcL) gene. Numbers at the branches represent bootstrap values. Branch lengths are proportional to the genetic distances, which are indicated by the scale bar

Cells of Lithodesmium undulatum strain CNS00316 were short and columnar with a quadrangular valve surface, and adjacent cells were connected into groups by the membranous valve membrane (Fig. 1b). These features were consistent with those of L. undulatum (Karp-Boss et al. 2014). This taxonomic assignment was also confirmed by phylogenetic analysis using the full-length 18S rDNA and rbcL sequences as molecular markers (Fig. 1c, d).

General characteristics of mitochondrial genomes

Diatom mitogenomes vary widely in size, ranging from 32,777 bp in Melosira undulata to 103,605 bp in Halamphora calidilacuna (Pogoda et al. 2019). AT contents vary from 65.0% in Phaeodactylum tricornutum to 78.4% in Melosira undulata (Oudot-Le Secq and Green 2011; Pogoda et al. 2019) (Table 1). The number of tRNA genes ranged from 22 to 28, and the number of introns ranged from 0 to 20. The start codon of most genes was ATG, whereas the start codon of atp8 varied among diatom mitogenomes and included ATT, ATA, and TTG (Table 2). The complete mitochondrial genomes of O. regia and L. undulatum were 37,057 bp (Fig. 2a) and 37,617 bp (Fig. 2b) in size, respectively. The AT contents of O. regia and L. undulatum were 73.4% and 75.3%, respectively, which were higher than those of most other diatom mitogenomes, such as Skeletonema marinoi (70.3%) (An et al. 2017) and Thalassiosira pseudonana (69.9%) (Armbrust et al. 2004) (Table 1).
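For reference, the AT content reported above is the percentage of A and T bases in the genome sequence. A minimal Python sketch of this calculation is given below; the decision to ignore ambiguous bases such as N is an assumption made for illustration, and the example sequences are hypothetical.

```python
def at_content(sequence):
    """Percentage of A and T among unambiguous bases (A, C, G, T) in a sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(base in "AT" for base in bases)
    return 100.0 * at_count / len(bases)


# Hypothetical short sequences; real input would be a full mitogenome sequence.
example = at_content("ATATGCAT")
example_with_n = at_content("aattnnGC")
```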

Table 1 Genome features of 35 mitogenomes from the phylum Bacillariophyta
Table 2 Mitochondrial gene content of 35 mitogenomes from the phylum Bacillariophyta
Fig. 2

Complete mitogenomes of a Odontella regia (strain CNS00380) and b Lithodesmium undulatum (strain CNS00316). Genes shown on the inside of the map are transcribed in the clockwise direction, whereas those on the outside of the map are transcribed counterclockwise. The assignment of genes into different functional groups is indicated by colors. The ring of bar graphs on the inner circle shows the GC content in dark gray

Typical diatom mitogenomes contain a set of 35 core genes, including 33 protein-coding genes (PCGs) and two non-coding rRNA genes (rns and rnl), that are completely conserved throughout the taxa (Table 2). The genomes of O. regia and L. undulatum contained 24 and 25 tRNA genes, respectively. The start codons of atp8 in O. regia and L. undulatum were ATG and ATT, respectively. The rn5 gene, which is missing from many other diatoms (Guillory et al. 2018), was present in both O. regia and L. undulatum. A split nad11 coding region is present in many diatoms: the nad11 gene has undergone fission into two separate subunits, each with its own start and stop codons, which correspond to the iron-sulfur binding (nad11a) and molybdopterin-binding (nad11b) domains (Imanian et al. 2012; Guillory et al. 2018). However, this gene was absent from both O. regia and L. undulatum. The loss of rps2 is deep and ancient within angiosperms (Adams and Palme 2003). Although the loss of rps2 is uncommon in diatoms, the gene was absent from both O. regia and L. undulatum.

Phylogenetic analysis of O. regia and other diatom species

To explore the evolutionary relationship between O. regia and other diatom species, we constructed a maximum likelihood phylogenetic tree using the amino acid sequences of 32 mitochondrial protein-coding genes shared by many species of Bacillariophyta and Oomycota (Fig. 3). The phylogenetic tree revealed three distinct classes: Bacillariophyceae, Mediophyceae, and Coscinodiscophyceae. Our results indicated that Bacillariophyceae was non-monophyletic (Fig. 3), which is inconsistent with a previous report of a monophyletic relationship based on 13 conserved mitochondrial protein-coding genes (Pogoda et al. 2019). The difference may be caused by the different numbers of genes used for constructing the phylogenetic trees. Our results indicated that Mediophyceae was a sister group of Bacillariophyceae, which is consistent with the result based on the amino acid dataset of 34 mitochondrial PCGs (An et al. 2017). Odontella regia and Toxarium undulatum, which are both Mediophyceae, each formed an independent clade (Fig. 3). The close relationship between L. undulatum and Thalassiosira pseudonana was consistent with a previous report that Thalassiosirales and Lithodesmiales (which included Lithodesmium) were sisters (Williams 2007). Thalassiosira pseudonana was closely related to Skeletonema marinoi, as reported previously (An et al. 2017). Previous phylogenetic analyses using amino acid sequences of 30 mitochondrial PCGs suggested that Melosira undulata and T. pseudonana were sister species (Liu et al. 2019). This discrepancy may be caused by the different numbers of genes used in constructing the phylogenetic trees and by the lack of a sufficiently large number of representative mitogenomes for phylogenetic analysis in Bacillariophyta.

Fig. 3

Maximum likelihood (ML) phylogenetic tree based on concatenated amino acid sequences of 32 mitochondrial protein-coding genes (atp6, 8, 9; cob; cox1, 2, 3; nad1-7, 4L, 9, 11; rpl2, 5, 6, 14, 16; rps3, 4, 7, 8, 10, 11, 13, 14, 19; and tatC). Mitochondrial protein-coding genes of Phytophthora ramorum and Saprolegnia ferax were used as out-group taxa. Numbers on the left and right sides at the branches represent Bayesian posterior probabilities and bootstrap values, respectively. Branch lengths are proportional to the amount of sequence change, which is indicated by the scale bar below the tree

Synteny analysis revealed extensive rearrangement events

The mitogenomes of the closely related species L. undulatum, T. pseudonana, S. marinoi, T. undulatum, and O. regia exhibited a high level of gene rearrangement (Fig. 4a), which was consistent with previous studies showing that mitogenomes in class Bacillariophyceae have undergone a variety of rearrangements (Pogoda et al. 2019). Lithodesmium undulatum and O. regia displayed a new architecture that differed from those of other lineages in Mediophyceae as a result of a series of gene translocation and inversion events.

Fig. 4

a Synteny relationships among five mitogenomes based on Mauve analysis. Rectangular blocks of the same color indicate collinear regions. b Mitogenome gene arrangements of five diatom species. Blocks with the same color represent the same type of genes

The gene arrangements of the five mitogenomes are shown in Fig. 4b. Notably, all genes of O. regia were located on a single strand, in contrast to the other four mitogenomes, which encode genes on both strands. This gene arrangement in O. regia was similar to that in Trachydiscus minutus, Nannochloropsis oceanica, and Nannochloropsis oculata, in which all mitochondrial genes except tatC are located on a single strand (Wei et al. 2013; Starkenburg et al. 2014; Sevcikova et al. 2016).

Although gene arrangements varied, many conserved multi-gene blocks could be identified. For example, the adjacent arrangement of the tatC and nad1 genes is shared by the mitogenomes of Bacillariophyceae, Eustigmatophyceae, Raphidophyceae, Phaeophyceae, and Chrysophyceae (Synura synuroidea) (Oudot-Le Secq et al. 2001; Goer and Olsen 2006; Oudot-Le Secq and Green 2011; Burger and Nedelcu 2012; Liu et al. 2014). In this study, tatC-nad1 was shared by four of the five mitogenomes, the exception being O. regia, in which the nad1 gene has been translocated. The conserved back-to-back arrangement trnH-rnl-rns-trnM was present in all five mitogenomes. Among diatom mitochondrial genomes, rps7 is missing from Synedra acus and Melosira undulata (Guillory et al. 2018; Pogoda et al. 2019). Here, the block rpl5-rpl14-trnR-rps7-rps12 was shared by four of the five mitogenomes, the exception again being O. regia, in which the trnR gene has been translocated.

In P. tricornutum, the two genes nad9 and rps14 have been found to be fused by an in-frame insertion and are cotranscribed (Oudot-Le Secq and Green 2011). However, these two genes were not adjacent in O. regia. In the mitogenomes of L. undulatum, T. pseudonana, S. marinoi, and T. undulatum, these two genes were adjacent but were not fused together.

The gene blocks rps8-rpl6-rps2-rps4 and rpl2-rps19-rps3-rpl16 were conserved in previously reported algae including Bacillariophyceae but were interrupted in Eustigmatophyceae (Wei et al. 2013; Sevcikova et al. 2016). In this study, the conserved block rps8-rpl6-rps2-rps4 was absent from O. regia and L. undulatum because of the missing rps2 gene, which may have been transferred to their nuclear genomes (Ravin et al. 2010). The gene block trnP-trnY-rps11-rpl2-rps19-rps3-rpl16-atp9 was shared by all five mitogenomes. Odontella regia possessed the least conserved gene arrangement, suggesting that the evolutionary rate of its mitogenome may differ from those of the other four mitogenomes in Mediophyceae.

The comparative analysis of L. undulatum and T. pseudonana mitochondrial genomes showed that there were eight conserved back-to-back arrangements of genes including trnH-rnl-rns-trnM, rps10-trnF-rps8-rpl6, rps4-trnN, trnL-1-trnL-2-rps12-rps7-rpl14-rpl5-trnG, tatC-nad1, tatA-trnW, trnP-trnY-rps11-rpl2-rps19-rps3-rpl16-atp9-trnK-nad4L-trnD-nad11, and nad7-nad9-rps14 (Fig. 5a).
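The idea of a conserved back-to-back arrangement can be illustrated with a small Python sketch that finds runs of genes that are adjacent in both of two gene orders. This is not the Mauve algorithm used in the study: it ignores strand, treats the second genome as circular and the first as a linear list, and the two gene orders shown are hypothetical and serve only to demonstrate the approach.

```python
def adjacencies(order):
    """Set of unordered neighbouring gene pairs in a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def conserved_blocks(order_a, order_b, min_len=2):
    """Maximal runs of genes in order_a whose consecutive pairs are also adjacent in order_b.

    Assumes each gene name occurs once per genome (e.g. trnL-1 and trnL-2 are distinct).
    order_a is walked linearly from its first gene, so a block spanning the
    arbitrary start point of a circular map is reported as two pieces.
    """
    if not order_a:
        return []
    shared = adjacencies(order_b)
    blocks = []
    current = [order_a[0]]
    for previous, gene in zip(order_a, order_a[1:]):
        if frozenset((previous, gene)) in shared:
            current.append(gene)
        else:
            if len(current) >= min_len:
                blocks.append(current)
            current = [gene]
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders using gene names from this study.
genome_a = ["trnH", "rnl", "rns", "trnM", "cox1", "tatC", "nad1", "cob"]
genome_b = ["nad1", "tatC", "cob", "trnM", "rns", "rnl", "trnH", "cox1"]
blocks = conserved_blocks(genome_a, genome_b)
```

Because adjacency is recorded as an unordered pair, a block that is inverted in the second genome is still reported as conserved, which matches the way such blocks are described here irrespective of orientation.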

Fig. 5

a The comparative analysis of Lithodesmium undulatum and Thalassiosira pseudonana mitochondrial genomes. b The comparative analysis of Skeletonema marinoi and Thalassiosira pseudonana mitochondrial genomes. The assignment of genes into different functional groups is indicated by colors. The conserved back-to-back arrangement of genes is all shown with the same color

The highest synteny conservation was identified between the mitochondrial genomes of S. marinoi and T. pseudonana, which belong to two different families, Skeletonemataceae and Thalassiosiraceae, respectively, with rearrangements involving only the five genes cox2, cox3, trnW-2, trnM-3, and trnV (Fig. 5b). This high synteny conservation suggested that the evolutionary distance between S. marinoi and T. pseudonana may be smaller than their current taxonomic positions imply.

Discussion

In this study, we analyzed the mitochondrial genome characteristics of the HAB species O. regia and of L. undulatum, which provides useful information for understanding the phylogenetic relationships among the mitogenomes of diatoms, especially HAB species. Phylogenetic analysis based on the protein-coding genes of 35 diatom strains improved our understanding of their complex genetic relationships. Previous studies suggested that increasing the number of taxonomic samples is a suitable way to improve the results of phylogenetic analyses (Pollock et al. 2002). Therefore, further studies to obtain more mitochondrial genomes of diatoms, and especially of HAB species, would be beneficial.

Rearrangements of DNA include events such as inversion, translocation, deletion, and duplication (Pogoda et al. 2019). Synteny comparison of five mitogenomes in Mediophyceae suggested that the gene arrangements were not conserved among different classes. All genes of the O. regia mitochondrial genome were on the same strand, which may represent a more economic transcription mode, as suggested previously in Ulva pertusa (Liu et al. 2017). Further studies are needed to explore whether this transcription mode plays a role in the formation of O. regia blooms. It is not clear whether the genes in the conserved blocks shared by all five Mediophyceae mitogenomes function together. Further studies are needed to help us understand the function of these blocks.

Mitochondrial genomes usually show fast evolutionary rates (Ward et al. 1981; Palmer et al. 2000; Chen et al. 2014), which makes them an appropriate platform for developing molecular markers with high resolution. The availability of the O. regia mitochondrial genome will enable development of molecular markers for tracking the genetic diversity of O. regia, as done recently (Song et al. 2020).

Conclusion

Odontella regia is the first species in the family Odontellaceae to have its complete mitogenome sequenced. Among all sequenced Bacillariophyta mitogenomes, the location of all mitochondrial genes on the same strand has so far been found only in O. regia. The O. regia mitogenome is genetically distant from the other four mitogenomes in Mediophyceae, and its genes have experienced the most rearrangements. These results provide insight into the evolution of mitochondrial genomes in Bacillariophyta, especially class Mediophyceae, and improve our understanding of the HAB species O. regia. The O. regia mitochondrial genome can be used to develop high-resolution molecular markers for tracking O. regia genetic diversity and O. regia HAB development.