Introduction

Sargassum C.Agardh is a genus in the order Fucales, comprising 360 brown algal species (Mattio and Payri 2011; Guiry and Guiry 2018). Most species are distributed within intertidal and subtidal regions of temperate and tropical oceans, forming important ecological structures known as marine forests that provide food and shelter to a diverse range of invertebrates, fishes, sea turtles, and mammals (Laffoley et al. 2011; Witherington et al. 2012; Duncan 2013; Milledge and Harvey 2017). Some Sargassum species (e.g., Sargassum horneri, Sargassum natans, and Sargassum fluitans) can maintain high growth rates while in a floating state within the pelagic zone, forming harmful macroalgal blooms known as “golden tides” (e.g., Komatsu et al. 2008; Smetacek and Zingone 2013; Komatsu et al. 2014). The reoccurrence of large-scale drifting Sargassum biomass, or golden tides, in the Yellow Sea of China in recent years has caused considerable damage to the local environment and economy (China Xinhua News 2017). The Sargassum biomass mainly consisted of drifting S. horneri thalli, with a small fraction composed of Sargassum confusum thalli (Liu et al. 2018).

Organelle genomes carry important genetic information crucial for comparative genomics and understanding the evolutionary history of brown algae (e.g., Le Corguillé et al. 2009; Yotsukura et al. 2010; Liu et al. 2017a, b). Known brown algal mitochondrial and chloroplast genomes (mtDNAs and cpDNAs) typically consist of single circular molecules. An ever-increasing number of mitochondrial (e.g., cox3, cox1, nad6, atp9, nad1, nad4, rns, and rnl) and plastid (e.g., rbcL, rbcLS spacer, psbA, and psaA) DNA markers from these genomes have been investigated and used in phylogenetic analyses of brown algae (e.g., Phillips et al. 2005; Lane et al. 2007; Engel et al. 2008; Boo et al. 2011; Silberfeld et al. 2014). Previous work has determined several organelle genome sequences in the order Fucales. The sequenced mtDNAs in Fucales range in size from 34.6 to 36.4 kb and the cpDNAs from 124.1 to 125.0 kb (Supplementary material: Table S1). These mtDNAs contain a total of 65–67 genes including 35 protein-coding genes (PCGs), 25–26 tRNA genes, three rRNA genes, and two to three conserved open reading frames (ORFs) (Oudot-Le Secq et al. 2006; Liu et al. 2015; Liu and Pang 2016b, c; Liu et al. 2016a, b; Bi and Zhou 2016; Hughey and Gabrielson 2017), whereas the cpDNAs carry a total of 173 genes including 137 PCGs, 28 tRNA genes, six rRNA genes, and two conserved ORFs (Le Corguillé et al. 2009; Liu and Pang 2016a; Yang et al. 2016; Bi et al. 2017; Graf et al. 2017).

Thus far, the complete mitochondrial genome has been sequenced for 12 species of Sargassum, and 3 species have had their plastid genomes sequenced (Supplementary material: Table S1). Sargassum horneri was the first Sargassum species to have its mtDNA and cpDNA sequenced (Liu et al. 2015; Liu and Pang 2016a). Comparison of the fully sequenced mitochondrial and plastid genomes showed that genome size, gene content, and architecture are highly conserved within the genus Sargassum (Liu et al. 2017a, b). An important advantage of mtDNA and cpDNA over the diploid nuclear genome in brown algae is that organelle genomes are effectively haploid, with no heterozygosity. This suggests that organelle genomic information could be used as effective molecular markers to track the source of the drifting Sargassum biomass and help manage golden tides (Amaral-Zettler et al. 2017; Liu et al. 2017b). Furthermore, the richness of the organelle genomic data can greatly promote our understanding of the evolutionary history and phylogenetics of brown algae (Liu et al. 2017a, b).

In this work, we report the complete mitochondrial and chloroplast genomes of the brown alga S. confusum and perform comparative genomic and phylogenomic analyses in the ecologically important genus Sargassum.

Materials and methods

Sampling and DNA extraction

Drifting adult plants of Sargassum confusum C.Agardh were collected in the Yellow Sea (32° 40′–34° 00′ N, 120° 30′–121° 45′ E) during a cruise on the R/V Sutongyu-01026 from 4 March to 12 March 2017 that surveyed the abundance of drifting Sargassum horneri biomass. Sargassum confusum was a minor constituent of the golden tide biomass in the Yellow Sea in spring 2017 (Liu et al. 2018). Algal thalli were transported to the laboratory in coolers (5–8 °C) after collection. Frozen tissue from the original algal samples was used for DNA extraction. Algal tissue was ground to a fine powder in liquid nitrogen. Total DNA was extracted using a Plant Genomic DNA Kit (Tiangen Biotech, China) according to the manufacturer’s instructions. The quality and concentration of the isolated DNA were evaluated by electrophoresis on a 1.0% agarose gel.

Genome sequencing and assembly

The whole mitochondrial and chloroplast genome sequences of S. confusum were determined using long PCR and primer walking sequencing techniques (Cheng et al. 1994). Six primer sets were used to amplify the complete S. confusum mitochondrial genome in six large fragments, and 15 primer sets were used to amplify the chloroplast genome in 15 fragments, according to Liu et al. (2015) and Liu and Pang (2016a), respectively. Long PCRs were performed in 50 μL reaction mixtures containing 10 μL of 5× PrimeSTAR GXL buffer (5 mM Mg2+ plus, Takara, Japan), 4 μL of dNTP mixture (2.5 mM each), 1 μL of each primer (10 μM), 1 μL of PrimeSTAR GXL DNA polymerase (1.25 units μL−1, Takara, Japan), 1 μL of DNA template (approximately 50 ng), and 32 μL of sterile distilled H2O. PCR amplification was carried out on a TC1000-G Thermal Cycler (Scilogex, USA) with an initial denaturation at 94 °C for 3 min, followed by 30–35 cycles of denaturation at 94 °C for 20 s, annealing at 50–52 °C for 50 s, and extension at 68 °C for 1 min kb−1, and a final extension at 68 °C for 10 min. PCR products were purified using a QIAquick Gel Extraction Kit (Qiagen, Germany). Sequencing reactions were performed using ABI 3730 XL automated sequencers (Applied Biosystems, USA). The DNA sequences were manually edited and assembled using BioEdit v7.1.9 (Hall 1999) and Geneious 7.1 software (Biomatters, http://www.geneious.com). DNA sequences of the complete mitochondrial and chloroplast genomes were determined by comparison with published mtDNA and cpDNA sequences of Sargassum species. This resulted in a 34,721-bp scaffold for the mtDNA and a 124,375-bp scaffold for the cpDNA.
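As a quick consistency check on the reaction set-up described above, the component volumes can be summed and the extension time scaled with fragment length. The following Python sketch uses the volumes listed in the text and assumes that each primer set is a forward/reverse pair; the function name extension_seconds and the 6-kb example fragment are illustrative assumptions and are not part of the protocol.

```python
# Component volumes (uL) of one long-PCR reaction, as listed in the text.
# Assumption: "1 uL of each primer" means one forward and one reverse primer.
reaction_ul = {
    "5x PrimeSTAR GXL buffer": 10.0,
    "dNTP mixture (2.5 mM each)": 4.0,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "PrimeSTAR GXL DNA polymerase": 1.0,
    "DNA template (~50 ng)": 1.0,
    "sterile distilled H2O": 32.0,
}

total_ul = sum(reaction_ul.values())


def extension_seconds(fragment_kb: float, rate_min_per_kb: float = 1.0) -> float:
    """Extension time at 68 C for a fragment, at 1 min per kb by default."""
    return fragment_kb * rate_min_per_kb * 60.0


print(f"Total reaction volume: {total_ul:.0f} uL")
# Hypothetical 6-kb fragment, used only to illustrate the scaling.
print(f"Extension time for a 6-kb fragment: {extension_seconds(6.0):.0f} s")
```

Under these assumptions the listed components add up to the stated 50 μL reaction volume.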

Genome annotation and analysis

The locations of the protein-coding genes (PCGs) were determined with DOGMA (Wyman et al. 2004), Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/orfig.cgi), and Blastx (GenBank). Transfer RNA (tRNA) genes were identified by reconstructing their cloverleaf structures using the tRNAscan-SE 1.21 software with default parameters (Lowe and Chan 2016). Ribosomal RNA (rRNA) genes were identified by BLAST searches (Altschul et al. 1997) of the nr database at the National Center for Biotechnology Information (NCBI). The mtDNA and cpDNA of S. confusum have been deposited in GenBank under accession numbers MG459430 and MG459429, respectively.

Base composition and codon usage of mtDNAs and cpDNAs were examined using MEGA 7.0 software (Kumar et al. 2016). Sequence datasets composed of the newly sequenced genomes and previously reported data available in GenBank were subjected to concatenated alignments using ClustalW in MEGA 7.0. So far, only four Sargassum species, S. confusum, S. thunbergii, S. horneri, and S. vachellianum, have had both their complete mtDNAs and cpDNAs sequenced (Supplementary material: Table S1). Single-gene sequences (nt and aa) in mtDNAs and cpDNAs as well as genome sequences from these four species were separately aligned and manually adjusted using MEGA 7.0. The identity percentages of mitochondrial and chloroplast gene or genome sequences were evaluated using the BioEdit v7.1.9 software (Hall 1999). The number of base substitutions per site, averaged over all sequence pairs, was estimated using the maximum composite likelihood model with a bootstrap procedure (1000 replicates) in MEGA 7.0.
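The pairwise identity percentages referred to above can be illustrated with a minimal Python sketch. It is not the BioEdit implementation: the function name percent_identity, the choice to skip alignment columns containing a gap, and the toy sequences are all assumptions made for illustration only.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not real Sargassum data).
toy_alignment = {
    "sp1": "ATGCATGCAT",
    "sp2": "ATGCATGAAT",
    "sp3": "ATG-ATGCAT",
}

for name_a, name_b in combinations(toy_alignment, 2):
    identity = percent_identity(toy_alignment[name_a], toy_alignment[name_b])
    print(f"{name_a} vs {name_b}: {identity:.2f}%")
```

Averaging such pairwise values over the six pairs formed by four species gives a per-gene mean identity of the kind reported in this study; how gaps are treated can shift the values slightly, so the gap rule above should be read as one possible convention.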

Phylogenomic analysis

Phylogenies were inferred from whole mitochondrial and plastid genomes in Fucales using the maximum likelihood (ML) method with 1000 bootstrap replicates. The ML tree was obtained based on the Kimura two-parameter model (Kimura 1980). The datasets were further analyzed by Bayesian inference (BI) using MrBayes v.3.2 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). Four Markov chains were run for 1,000,000 generations to allow adequate time for convergence. Every 1000th generation was saved and the first 100 sampled trees were discarded as burn-in. The remaining trees were used to estimate the 50% majority rule consensus tree and the Bayesian posterior probability values. Fucus species were selected as out-group taxa. Positions containing gaps and missing data were eliminated from the DNA sequences.
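For readers unfamiliar with the Kimura two-parameter model, the distance between two aligned sequences is d = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences (Kimura 1980). The Python sketch below illustrates this formula only; it is not the code used in this study, the function name k2p_distance and the toy sequences are hypothetical, and positions with gaps or ambiguous bases are simply skipped, mirroring the elimination of such positions described above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    term_1 = 1.0 - 2.0 * p - q
    term_2 = 1.0 - 2.0 * q
    if term_1 <= 0.0 or term_2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_1 * math.sqrt(term_2))


# Toy example (hypothetical): one transition and one transversion in 20 sites.
toy_a = "AAAAAAAAAACCCCCCCCCC"
toy_b = "GCAAAAAAAACCCCCCCCCC"
print(f"K2P distance: {k2p_distance(toy_a, toy_b):.4f}")
```

Because the model corrects for multiple substitutions at the same site, the resulting distance is slightly larger than the raw proportion of differing sites.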

Results and discussion

Genome features of mtDNA and cpDNA

The circular-mapping mitochondrial genome of S. confusum is 34,721 bp in size and carries 35 PCGs, 25 tRNA genes, three rRNA genes, and two conserved ORFs (Supplementary material: Fig. S1). The tRNA gene situated between nad4 and nad5 was annotated as trnI(uau), which differs from that in Sargassum aquifolium [trnK(uuu)] (Liu et al. 2017b). The 124,375-bp cpDNA of S. confusum has the conserved canonical quadripartite structure, with a 73,552-bp large single-copy (LSC) region separated from the 39,941-bp small single-copy (SSC) region by two 5441-bp inverted repeats (IRs), each of which encodes three rRNA genes and two tRNA genes. The S. confusum cpDNA contains 137 PCGs, 28 tRNA genes, six rRNA genes, and two conserved ORFs (Supplementary material: Fig. S2). Only one intron, 220 bp in size, was detected in the cpDNA, in the trnL(uaa) gene. The genome size, gene content, and genome architecture of the mtDNA and cpDNA of S. confusum closely resemble those of previously reported Sargassum species (Supplementary material: Table S1).
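The region sizes and gene counts given above can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text; the labels IRa and IRb for the two inverted repeats and the variable names are assumptions introduced for illustration.

```python
# Sizes (bp) of the four regions of the S. confusum cpDNA, from the text.
# "IRa"/"IRb" are illustrative labels for the two 5441-bp inverted repeats.
cp_regions_bp = {"LSC": 73_552, "SSC": 39_941, "IRa": 5_441, "IRb": 5_441}
cp_total_bp = sum(cp_regions_bp.values())

# Gene counts per category, from the text.
mt_genes = {"PCGs": 35, "tRNA genes": 25, "rRNA genes": 3, "conserved ORFs": 2}
cp_genes = {"PCGs": 137, "tRNA genes": 28, "rRNA genes": 6, "conserved ORFs": 2}

print(f"cpDNA size from regions: {cp_total_bp} bp")
print(f"mtDNA gene total: {sum(mt_genes.values())}")
print(f"cpDNA gene total: {sum(cp_genes.values())}")
```

The four regions sum to the reported 124,375 bp, and the category counts sum to 65 mitochondrial and 173 chloroplast genes, matching the dataset sizes used in the gene-level comparison.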

The A+T contents of mtDNA and cpDNA in S. confusum are 63.43 and 69.65%, respectively. The cpDNAs tend to be much richer in A+T than mtDNAs. This is not unique to Sargassum species, with other brown macroalgae exhibiting similar A+T contents (Supplementary material: Table S1). The A+T content ranges from 68.81% (in Dictyopteris divaricata) to 71.06% (in Fucus vesiculosus) in cpDNAs (Liu et al. 2017a), and from 62.01% (in Pylaiella littoralis) to 68.01% (in Colpomenia peregrina) in mtDNAs (Liu and Pang 2015). The intergenic spacer content in mtDNA (4.59%) is markedly lower than that in cpDNA (13.80%) (Table 1), indicating that the mtDNA is more compact than the cpDNA.
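A+T content is simply the share of A and T among the unambiguous bases of a sequence. The following Python sketch shows one way to compute it; the function name at_content and the toy sequence are hypothetical, and the values reported in this study were obtained with MEGA 7.0, not with this code.

```python
def at_content(sequence: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["A"] + counts["T"]) / total


# Toy sequence (hypothetical); ambiguous bases such as N are ignored.
toy_sequence = "ATTAGCATNTAA"
print(f"A+T content: {at_content(toy_sequence):.2f}%")
```

Applied to a complete genome sequence read from a FASTA file, the same function would give a genome-wide value comparable to those listed above.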

Table 1 Comparison of general features of mtDNA and cpDNA in Sargassum confusum

Nearly all the PCGs and ORFs in S. confusum mtDNA and cpDNA have an ATG start codon, with the only exception being the plastid psbF, which starts with a GTG codon, a conserved phenomenon within all known Sargassum cpDNAs (Liu and Pang 2016a; Yang et al. 2016; Bi et al. 2017). The TAA stop codon is the most commonly used in both mtDNA and cpDNA (67.57 vs 77.70%), but the usage of the TAG and TGA stop codons differs between the two genomes. The mtDNA shows the same usage frequency for TAG (16.22%) and TGA (16.22%), while the cpDNA has a tendency towards TAG (18.71%) compared with TGA (3.60%) (Table 1).
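The stop codon percentages can be related to gene counts. The Python sketch below assumes that the percentages are calculated over PCGs plus conserved ORFs (37 genes in mtDNA, 139 in cpDNA); the per-codon counts are inferred from the reported TAG and TGA percentages and are not taken from Table 1, and the function name usage_percent is hypothetical.

```python
def usage_percent(counts: dict) -> dict:
    """Convert stop codon counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    return {codon: round(100.0 * n / total, 2) for codon, n in counts.items()}


# Inferred counts (assumption): PCGs + conserved ORFs = 35 + 2 and 137 + 2.
mt_stop_counts = {"TAA": 25, "TAG": 6, "TGA": 6}    # 37 genes
cp_stop_counts = {"TAA": 108, "TAG": 26, "TGA": 5}  # 139 genes

print("mtDNA:", usage_percent(mt_stop_counts))
print("cpDNA:", usage_percent(cp_stop_counts))
```

Under this assumption, TAA terminates 25 of the 37 mitochondrial genes and 108 of the 139 chloroplast genes, and the TAG and TGA shares reproduce the reported values.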

The cpDNAs are more conserved than mtDNAs

The mtDNA and cpDNA sequence pairs in four Sargassum species (S. confusum, S. thunbergii, S. horneri, and S. vachellianum) were compared. These are the only Sargassum species to have complete genomic data from both compartments. The identity values in mtDNAs (88.2–92.6%) were much lower than those in cpDNAs (94.7–97.6%) (Table 2), indicating that the mtDNAs in the four Sargassum species are more variable in nucleotide sequence than the cpDNAs. The average evolutionary divergence over mtDNA sequence pairs among the four Sargassum species was 10.57 ± 0.13%, which is higher than the value over cpDNA pairs (3.31 ± 0.02%). Proportionally more base substitutions have accumulated in the mtDNAs than in the cpDNAs: a total of 5991 substitution sites accounted for 17.14% of Sargassum mtDNAs, whereas the 7574 substitution sites detected in cpDNAs accounted for only 6.04%.
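The relative magnitudes of these values can be made explicit with a short calculation. The Python sketch below uses only the figures reported above; the variable names are illustrative, and the alignment lengths are back-calculated from the substitution-site counts and percentages, so they are approximations and not reported values.

```python
# Average pairwise divergence (%) among the four species, from the text.
mt_divergence = 10.57
cp_divergence = 3.31
divergence_ratio = mt_divergence / cp_divergence

# Substitution sites and their share (%) of each alignment, from the text.
mt_sites, mt_share = 5991, 17.14
cp_sites, cp_share = 7574, 6.04

# Approximate alignment lengths implied by the figures above (bp).
mt_alignment_bp = mt_sites / (mt_share / 100.0)
cp_alignment_bp = cp_sites / (cp_share / 100.0)

print(f"mtDNA/cpDNA divergence ratio: {divergence_ratio:.2f}")
print(f"Implied mtDNA alignment length: {mt_alignment_bp:.0f} bp")
print(f"Implied cpDNA alignment length: {cp_alignment_bp:.0f} bp")
print(f"Ratio of substitution-site shares: {mt_share / cp_share:.2f}")
```

Although the cpDNAs contain more substitution sites in absolute terms, their alignment is roughly 3.6 times longer, so the per-site divergence is about three-fold higher in the mtDNAs.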

Table 2 The comparison of identity percentages (%) in mitochondrial and chloroplast sequence pairs of genome sequences, tRNA genes, rRNA genes, and PCGs (including the conserved ORFs, nt, and aa) within four Sargassum species including S. confusum (Sco), S. thunbergii (Sth), S. horneri (Sho), and S. vachellianum (Sva)

The tRNA and rRNA functional groups displayed higher average identity values than the PCGs (including conserved ORFs), a feature shared by Sargassum mtDNAs and cpDNAs (Table 2). For each functional gene group, the average identity values of cpDNA genes were much higher than those of mtDNA genes, indicating that Sargassum chloroplast genes were more conserved than mitochondrial genes in each group. A total of 15 chloroplast tRNA gene sequences were identical among the four Sargassum species, whereas only one mitochondrial tRNA gene, trnP(ugg), was completely identical. In the four Sargassum cpDNAs, rrn5 was the most conserved rRNA gene with an identity value of 100%, followed by rns (99.48%) and rnl (99.02%). In contrast, rrn5 was the most variable rRNA gene in mtDNAs with an average identity value of 89.58%, compared to rnl (95.17%) and rns (95.53%).

Two chloroplast PCGs involved in photosystem II, psbF and psbJ, are the most conserved, with average identity values of 99.20 and 99.15%, respectively. The two most conserved PCGs in mtDNAs are atp9 (96.88%) and atp8 (95.20%), both of which encode ATP synthase subunits. The average identity values of nearly all chloroplast PCGs are > 92.40%, with the exception of rpl32 (85.33%) (Fig. 1). The rpl32 gene in S. horneri cpDNA is 51 bp shorter than those in the other three Sargassum species, owing to a premature TAA termination codon. Only four mitochondrial PCGs (atp8, atp9, nad4L, and nad3) have identity values of > 92.40%. Two mitochondrial PCGs, rpl31 and orf131, have the lowest identity values of 83.83 and 83.25%, respectively (Fig. 1).

Fig. 1 The comparison of average identity percentages (%) in single mitochondrial (left) and chloroplast (right) gene sequences (nt) in four Sargassum species. The mtDNA and cpDNA datasets included 65 and 173 genes, respectively

Although the gene content and genome architecture of mtDNAs and cpDNAs are conserved in Sargassum (Liu et al. 2017a, b), it was previously unknown which of the two genomes is more conserved in this lineage. Mitogenomes were originally thought to be more conserved than chloroplast genomes. In contrast to similar studies on terrestrial plants and dinoflagellates, which showed mtDNA to be more conserved than cpDNA (Wolfe et al. 1987; Smith and Keeling 2015), a few species from various algal lineages (e.g., glaucophytes, red algae, green algae, and haptophytes) have been shown to have higher mutation rates in the mitochondrion than in the chloroplast (Smith 2015). The brown algae are classified within the stramenopiles and contain plastids derived from red algae as the result of a secondary endosymbiotic event (Keeling 2010; Yang et al. 2012). Most lineages with red algal-derived plastids have an mtDNA/ptDNA mutation rate ratio of > 1, but the ratio varies drastically among lineages (Smith 2015). In Sargassum species, and in brown algae more broadly, the mitochondrial genomes have an approximately three-fold greater mutation rate than the chloroplast genomes (data not shown). This feature could be exploited to develop novel molecular markers for evolutionary and phylogenetic analyses, and even for tracking the source of the golden tides.

Phylogenomic analysis

Based on the mtDNA and cpDNA datasets, our phylogenomic analysis showed that S. confusum, together with the other reported Sargassum species, formed the Sargassum clade with high support values (ML/BI = 100%) (Fig. 2). It is worth noting that mitochondrial phylogenomics positioned S. confusum in a subclade with S. muticum, supporting the close relationship between S. muticum and S. confusum. The genetic distance between the mtDNAs of these two species was only 2.66% (Supplementary material: Table S2). Previously, we found that the cox3-atp6 spacer region displayed apparent variation in size among Sargassum species (Liu et al. 2017b). However, this spacer region in S. confusum was 105 bp in size and identical to that in S. muticum. The two species shared 13 tRNA genes with identical nt sequences and four PCGs (atp8, atp9, nad4L, and orf39) with identical aa sequences (Table 3). A total of 60 genes with identity values of > 96% were detected in the mitochondrial genomes of S. confusum and S. muticum.

Fig. 2 Phylogenomic trees constructed from analyses of the known mitochondrial (a) and chloroplast (b) genomes in Sargassum. The trees were rooted with Fucus species. The numbers at internal nodes (ML/BI) indicate maximum likelihood (ML) bootstrap values and Bayesian inference (BI) posterior probability values, respectively. The scale bar represents nucleotide substitutions

Table 3 Identity percentages (%) of specific mtDNA sequences from S. confusum and S. muticum

Our genomic data underscore the close relationship between S. confusum and S. muticum, which is consistent with observations of their morphology and reproductive strategy. Sargassum confusum and S. muticum are two common seaweeds found along the Yellow Sea coasts and share similar morphological characteristics in terms of stem structure, spherical vesicles, and leaf shape (Tseng 2009). Both species are monoecious, which sets them apart from other Sargassum species that are dioecious (Liu et al. 2013).

Conclusion

The genome size, gene content, and genome architecture of mtDNA and cpDNA were highly conserved among Sargassum species. However, the two organelle genomes displayed distinctly different evolutionary features in terms of A+T content, spacer content, and stop codon usage. A key finding of this study was that the mtDNAs of Sargassum species are more variable in nucleotide sequence than the cpDNAs. This result provides concrete evidence to guide the use of mtDNA and cpDNA sequences in evolutionary and phylogenetic analyses. Furthermore, given the higher evolutionary rate of mtDNAs compared with cpDNAs in Sargassum species, mtDNA is likely to provide more effective molecular markers and could aid in tracking the source of the golden tides.