Introduction

One of the significant halophytes found both inland and in the intertidal zone is Suaeda monoica (family Amaranthaceae), a perennial succulent halophytic herb. It typically grows in dry shrublands or deserts and throughout east- and west-coast mangroves in Asia, Europe and North America [1]. It has also been recorded from hypersaline soils extending from Syria to South Africa, across the Arabian Peninsula, and from Pakistan to India and Sri Lanka [2]. The genus Suaeda comprises about 100 species, five of which are commonly distinguished: S. glauca, S. japonica, S. australis, S. maritima and S. malacosperma [3]. The coastal halophyte S. monoica thrives in marine environments and has been extensively studied [4]. Its ability to take up sodium chloride has also made it useful for reclaiming salt-affected agricultural land [5]. In traditional medicine, the leaves of S. monoica are considered effective against hepatitis; they have also been used as a wound-healing ointment and reported to have antiviral properties [6]. More broadly, the plant has been reported to possess antiviral, antimicrobial, antioxidant, wound-healing and phytoremediation activities [7, 8, 9]. In addition, S. monoica has been reported to be effective in treating rheumatism, paralysis, asthma and snakebites in Saudi Arabia [10].

The chloroplast is a semi-autonomous organelle of plants and algae, derived from an endosymbiotic cyanobacterium, that performs photosynthesis [11,12,13]. The chloroplast genomes of higher plants generally consist of double-stranded circular DNA with a highly conserved structure and gene content; they range from 120 to 170 kb in size and contain 120 to 130 genes [14, 15]. The genome is typically composed of a large single-copy (LSC) region and a small single-copy (SSC) region separated by two inverted repeat (IR) regions [16]. It is characterized by uniparental (in most angiosperms, maternal) inheritance, a small size and a low mutation rate [17]. Genetic information from chloroplast genomes has been widely used to develop DNA barcodes and markers for classifying medicinal plants, and in species identification, population genetics, genome evolution and phylogenetics [18, 19].

Hence, this study aimed to sequence, annotate and report the complete chloroplast genome of S. monoica, which has not been reported previously. The new chloroplast genome was also compared with the other available chloroplast genomes of Suaeda species, namely S. japonica [3], S. salsa [20], S. glauca [21], S. malacosperma [22] and S. physophora [23], to highlight the genetic diversity and evolutionary dynamics within the genus Suaeda.

Materials and methods

Sample collection and DNA extraction

The S. monoica plant materials were collected from the Southern Corniche (Jeddah, Saudi Arabia) (Fig. 1), a saline sandy area with very sparse vegetation (GPS: latitude 21° 13′ 5.518″ N; longitude 39° 10′ 32.264″ E). Total genomic DNA was extracted using the WizPrep™ gDNA Mini Kit (Cell/Tissue; WIZBIO, Seoul, South Korea) according to the manufacturer’s instructions. DNA quality was assessed on 1% TBE agarose gels, and DNA was quantified with a dsDNA quantification kit on a Quantus™ Fluorometer (Promega, USA). Extracted gDNA was stored at −20 °C until further processing.

Fig. 1 Map of the S. monoica sampling location in Jeddah, Saudi Arabia

Library construction, chloroplast genome assembly and annotation

The DNA library was constructed with a 350 bp insert size following the standard TruSeq protocol (Illumina, San Diego, California, USA). The library was sequenced on an Illumina HiSeq 4000 in paired-end mode with 150 bp reads (Novogene, China). The high-quality clean reads were assembled de novo using the single-contig approach [24,25,26], and a remapping approach with 25 iterations was then applied [24]. Assembly of the S. monoica chloroplast (cp) genome and confirmation of the coding sequences were performed in Geneious Prime [27]. The chloroplast genome was annotated with GeSeq [28], and the tRNAscan-SE 2.0 web server was used to validate the anticodon sequences and the typical cloverleaf secondary structures of all tRNAs [29].

Tandem repeats and codon usage bias analysis

Long repeat sequences were predicted with REPuter using a minimum repeat size of 30 bp and the Hamming distance model. Simple sequence repeats (SSRs) were detected with the microsatellite identification tool MISA [30]. Codon preference, expressed as relative synonymous codon usage (RSCU), was calculated in MEGA X [31] from the protein-coding sequences of the six Suaeda species. A heatmap of codon frequencies was generated with the online tool Heatmapper (www.heatmapper.ca, accessed on 10 November 2022).
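MISA reports perfect SSRs whose motif is repeated at least a minimum number of times for each motif length. The sketch below illustrates that idea in plain Python; it is not MISA itself, and the threshold table (10 copies for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) is an assumption, since the thresholds used in this study are not reported. Unlike MISA, it does not merge compound SSRs, and longer motifs are only searched if they are added to the table.

```python
# Assumed minimum copy numbers per motif length (hypothetical thresholds;
# the study does not state the MISA settings it used).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs (1-based starts)."""
    seq = seq.upper()
    hits = []
    for unit, min_copies in sorted(min_repeats.items()):
        i = 0
        while i + unit <= len(seq):
            motif = seq[i:i + unit]
            copies = 1
            # Extend while the next block of `unit` bases repeats the motif.
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            if (copies >= min_copies and is_primitive(motif)
                    and set(motif) <= set("ACGT")):
                hits.append((i + 1, motif, copies))
                i += copies * unit  # skip past the reported repeat
            else:
                i += 1
    return sorted(hits)


example = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GGC"
print(find_ssrs(example))  # [(3, 'A', 12), (17, 'AT', 6)]
```

Rejecting non-primitive motifs keeps a mononucleotide run from also being reported as "AA" or "AAA" repeats.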

Chloroplast genomes comparative analysis and phylogenetics

IRscope (https://irscope.shinyapps.io/irapp/, accessed on 12 November 2022) was used to compare the IR/SSC and IR/LSC boundaries and junctions among the six Suaeda species. mVISTA (https://genome.lbl.gov/vista/mvista/submit.shtml, accessed on 12 November 2022) was used to compare the complete chloroplast genomes of the six species. To explore the phylogenetic relationships among the six Suaeda species, the complete chloroplast genomes, as well as each cp region (IR, SSC and LSC), were aligned with the Mauve genome aligner [32], and phylogenies were constructed with FastTree v2 [33] using the software's default parameters.

Results

Characteristics of S. monoica chloroplast genome

The cp genomes of S. monoica and the other five Suaeda species had the conventional quadripartite structure characteristic of most land plants (Table 1). The cp genome of S. monoica was 151,789 bp long, comprising an 83,404 bp LSC, an 18,007 bp SSC and two 25,189 bp IRs. It contained 130 genes, including 87 protein-coding genes, 37 tRNA genes, eight rRNA genes and one pseudogene (Fig. 2). The GC content of the S. monoica cp genome was 36.4%.
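The reported region sizes can be checked against the total length: a quadripartite plastome contains one LSC, one SSC and two copies of the IR, so 83,404 + 18,007 + 2 × 25,189 = 151,789 bp. The short sketch below reproduces that arithmetic and shows one way GC content can be computed from a sequence. The helper names are illustrative, and excluding ambiguous bases (e.g. N) from the denominator is an assumption about how such bases should be treated.

```python
# Region lengths (bp) reported for the S. monoica plastome.
LSC, SSC, IR = 83_404, 18_007, 25_189


def quadripartite_length(lsc, ssc, ir):
    """Total length of a plastome with one LSC, one SSC and two identical IRs."""
    return lsc + ssc + 2 * ir


def gc_content(seq):
    """GC percentage computed over unambiguous A/C/G/T bases only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(quadripartite_length(LSC, SSC, IR))  # 151789
print(gc_content("GGCATN"))  # 60.0
```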

Table 1 Basic characteristics of the cp genomes of S. monoica and the other available Suaeda species

Fig. 2 The complete chloroplast genome map of S. monoica (total length 151,789 bp). The inner circle represents the GC content and delimits the cp regions (LSC, SSC, IRa and IRb). The outer circle represents the complete cp sequence; genes drawn outside the circle are transcribed in the forward direction and genes drawn inside it in the reverse direction. All genes are colored according to their functional groups

SSR and long repeat sequence analysis

Using MISA, we analyzed the number, type and spatial distribution of simple sequence repeats in the S. monoica cp genome. As in most land plants, mononucleotide repeats were the most abundant SSRs in the plastid genome. SSR loci ranged from 7 to 42 bp in length, and a total of 940 SSRs, from mono- to deca-nucleotide repeats, were found. A single deca-nucleotide repeat occurred in the LSC region of the S. monoica cp genome, whereas nona-nucleotide (9-nt) repeats were found in all regions except the SSC (Table 2). The LSC was the most common location for SSRs (59%), followed by the SSC (15.4%) and the IRs (12.3%). We also analyzed the repeat sequences of the complete cp genomes of the six Suaeda species and found similar repeat types. However, S. glauca had more hexanucleotide repeats than the other species, whereas S. physophora had fewer pentanucleotide repeats than the other Suaeda species (Fig. 3).
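Regional percentages such as these follow from assigning each SSR to a region by its coordinate. The sketch below assumes the conventional order LSC, IRb, SSC, IRa starting at position 1 and derives the boundaries from the region lengths reported for S. monoica. It assigns each SSR by its start position, so a repeat that straddles a junction is counted in the region where it begins. The SSR positions in the usage line are invented for illustration.

```python
from bisect import bisect_left
from collections import Counter

# Boundaries derived from the reported lengths, assuming the conventional
# order LSC, IRb, SSC, IRa starting at position 1 (1-based, inclusive ends).
LSC, SSC, IR = 83_404, 18_007, 25_189
REGION_ENDS = [LSC, LSC + IR, LSC + IR + SSC, LSC + 2 * IR + SSC]
REGION_NAMES = ["LSC", "IRb", "SSC", "IRa"]


def region_of(position):
    """Name of the region containing a 1-based genome position."""
    if not 1 <= position <= REGION_ENDS[-1]:
        raise ValueError("position lies outside the genome")
    return REGION_NAMES[bisect_left(REGION_ENDS, position)]


def regional_percentages(ssr_starts):
    """Percentage of SSRs starting in each region, rounded to one decimal."""
    counts = Counter(region_of(p) for p in ssr_starts)
    total = len(ssr_starts)
    return {name: round(100 * counts[name] / total, 1) for name in REGION_NAMES}


# Hypothetical SSR start positions, for illustration only.
print(regional_percentages([10, 50_000, 90_000, 110_000]))
# {'LSC': 50.0, 'IRb': 25.0, 'SSC': 25.0, 'IRa': 0.0}
```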

Table 2 Repeated sequences in the S. monoica chloroplast genome, including repeat class, repeat abundances and percentage abundance

Fig. 3 Total SSR counts in the chloroplast genomes of six Suaeda species. The highest count of 6-nt repeats is marked with an up-arrow, and the lowest count of 5-nt repeats with a down-arrow

Codon usage bias of Suaeda species

Based on the translation of the protein-coding genes (PCGs), the genes of S. monoica encode 20 amino acids. Codons encoding leucine (L) were the most frequent (10.5%), followed by isoleucine (8.7%), whereas codons encoding cysteine (C) accounted for only 1.10%. In terms of codon preference, 29 codons had RSCU values greater than 1.00. RSCU values ranged from 0.34 for CGC to 1.98 for UUA. Moreover, codons with A or U (T) at the third position were the most preferred in S. monoica (Fig. 4).
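RSCU compares the observed count of a codon with the count expected if all synonymous codons for its amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) × Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. A value above 1.00 therefore means the codon is used more often than expected under uniform synonymous usage. The sketch below computes RSCU from a list of coding sequences using the standard genetic code (plastids use translation table 11, which has the same amino-acid assignments). It is an illustration, not MEGA X's implementation, and the toy input is invented.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1, same amino-acid assignments as the
# plastid table 11), with codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(coding_sequences):
    """RSCU (keys in RNA notation) for codons of amino acids present in the input."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)

    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c.replace("T", "U")] = counts[c] / expected
    return values


toy = ["AUG" + "UUA" * 3 + "UUG" + "UAA"]  # invented example
result = rscu(toy)
print(round(result["UUA"], 2), round(result["UUG"], 2), result["CUU"])  # 4.5 1.5 0.0
```

Leucine has six synonymous codons, so four leucine codons give an expected count of 4/6 per codon, and three UUA codons give an RSCU of 4.5.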

Fig. 4 RSCU values of codons in the chloroplast genome of S. monoica. Codons with values above 1.00 are highlighted in orange; the highest RSCU value, for UUA, is highlighted in red

When the six Suaeda species were compared, the codons clustered into two major groups based on their occurrence (frequency) in each genome. One cluster contained codons of very low abundance in S. glauca relative to all other species; the other contained codons of high abundance in S. physophora relative to all other species. These two species clustered apart from the other four, while S. monoica and S. salsa showed similar codon occurrence values, close to those observed in S. japonica and S. malacosperma (Fig. 5).
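Heatmapper was used for the transformation and clustering shown in Fig. 5, but its settings are not reported. As an illustration only, the sketch below assumes a common choice: each codon's frequencies are z-scored across species, species are compared by Euclidean distance on those scaled profiles, and they are joined by average linkage (UPGMA-style agglomeration). These settings are assumptions, and the toy matrix is invented, not the study's data.

```python
import math
from itertools import combinations


def zscore_rows(matrix):
    """Scale each row (one codon across species) to mean 0 and SD 1."""
    scaled = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(dist, members_a, members_b):
    """Average linkage: mean distance over all cross-cluster member pairs."""
    total = sum(dist[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def average_linkage(vectors, labels):
    """Agglomerative clustering with average linkage; returns a nested tuple."""
    dist = [[euclidean(a, b) for b in vectors] for a in vectors]
    clusters = [(label, [i]) for i, label in enumerate(labels)]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(dist, clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


# Invented codon counts: rows are codons, columns are species A-D.
counts = [[10, 11, 30, 34], [5, 6, 20, 16], [40, 41, 10, 6]]
species_profiles = list(zip(*zscore_rows(counts)))  # one vector per species
print(average_linkage(species_profiles, ["A", "B", "C", "D"]))
# (('A', 'B'), ('C', 'D'))
```

Row scaling makes the clustering reflect each codon's relative pattern across species rather than its absolute abundance.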

Fig. 5 Codon distribution and occurrences among six Suaeda species. Data are transformed and clustered by codon and species

IR expansion and contraction

The gene content and synteny around the boundaries between the IRs and the LSC and SSC regions were analyzed in the six Suaeda species, together with the expansion and contraction of the IR/LSC and IR/SSC boundaries. The cp genomes were relatively conserved across Suaeda species in gene arrangement, structure and the number of genes affected at each boundary (Fig. 6). The genes affected by the LSC/IRb (JLB) boundary were rpl22, rps19 and rpl2. The ycf1 and ndhF genes were affected by the IRb/SSC (JSB) boundary, and the SSC/IRa (JSA) boundary was located within ycf1. Similarly, rpl2, rps19, trnH and psbA were located around the IRa/LSC (JLA) boundary.

Fig. 6 Comparison of the LSC, SSC and IR junctions among the Suaeda species. Colored boxes represent genes, and arrows show the coordinate positions of each gene near the junctions. JLA (IRa/LSC), JLB (IRb/LSC), JSA (SSC/IRa) and JSB (IRb/SSC) denote the junction sites of the plastid genome

In terms of boundary position, however, the LSC/IRb boundary fell within the 279 bp rps19 gene, dividing it into segments of 114 and 165 bp in S. physophora, 116 and 163 bp in S. glauca, 131 and 148 bp in S. monoica and S. salsa, and 130 and 149 bp in S. japonica and S. malacosperma. A similar pattern was observed at the IRb/SSC boundary, where the ndhF gene overlapped the ycf1 pseudogene across the boundary in all Suaeda species except S. physophora and S. glauca; ycf1 was not fully annotated in S. monoica compared with the other species. Another notable difference was observed at the IRa/LSC boundary, which crossed the rps19 gene in S. monoica. In addition, the SSC/IRa junction lay at different positions within the ycf1 pseudogene in the cp genomes of all six species. These variations in the IR/SC boundaries resulted in differences in the lengths of the four regions among the six Suaeda species.
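The segment lengths above are simply the parts of a gene lying on either side of a junction, and each pair sums to the 279 bp length of rps19 (e.g. 131 + 148 = 279). The sketch below computes such a split from 1-based, inclusive coordinates. The rps19 coordinates in the example are hypothetical: they were chosen so that the gene crosses the S. monoica LSC/IRb junction (position 83,404, the end of the LSC) with 131 bp on the LSC side. That orientation is an assumption, not a value read from Fig. 6.

```python
def split_at_junction(gene_start, gene_end, junction):
    """Bases of a gene on each side of a junction.

    Coordinates are 1-based and inclusive; `junction` is the last base of the
    upstream region, so the downstream region starts at junction + 1.
    """
    if not gene_start <= junction < gene_end:
        raise ValueError("gene does not cross this junction")
    upstream = junction - gene_start + 1
    downstream = gene_end - junction
    return upstream, downstream


JLB = 83_404  # end of the S. monoica LSC, from the reported LSC length
rps19_start, rps19_end = 83_274, 83_552  # hypothetical coordinates (279 bp)
print(split_at_junction(rps19_start, rps19_end, JLB))  # (131, 148)
```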

Comparative chloroplast genome analysis of Suaeda species

Sequence similarity between the S. monoica chloroplast genome and those of the other five Suaeda species was analyzed with mVISTA. The analysis revealed high overall similarity, with S. physophora and S. glauca consistently the most divergent (Fig. 7). The IR regions were less divergent than the SSC and LSC regions, and the PCGs showed a high level of similarity (more than 85%). In land plants, the IR regions are generally more conserved than the LSC and SSC regions. Although the overall organization of the cp genomes was similar, nucleotide variation was found in both coding and non-coding regions, including introns and intergenic spacers. The coding regions with the highest nucleotide variation were accD and ycf1. The most divergent non-coding regions were the rps16 intron and the trnH-psbA, psaA-pafI, pafI-trnS, pafII-cemA and ndhF-rpl32 intergenic spacers, all of which differed markedly across species. Based on its hypervariability, ndhF-rpl32 appears to be the most suitable region for developing DNA barcodes and molecular markers to identify species and assess genetic variability within Suaeda.
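In this study, hypervariable regions were identified visually with mVISTA. A complementary approach often used for the same purpose is sliding-window nucleotide diversity (π, the average number of pairwise differences per site) computed on a multiple alignment, as programs such as DnaSP do. This technique is swapped in here purely for illustration. The sketch below assumes the genomes are already aligned to equal length. The 600 bp window and 200 bp step are hypothetical defaults, and sites with gaps or ambiguous bases are skipped pair by pair, which differs slightly from tools that delete such columns entirely.

```python
from itertools import combinations


def window_pi(alignment, window=600, step=200):
    """Sliding-window nucleotide diversity for a list of aligned sequences.

    Returns (1-based window start, pi) tuples; pi pools differences over all
    sequence pairs and divides by the number of comparable (A/C/G/T) sites.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [s.upper() for s in alignment]
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        diffs = sites = 0
        for s1, s2 in combinations(seqs, 2):
            for a, b in zip(s1[start:end], s2[start:end]):
                if a in "ACGT" and b in "ACGT":
                    sites += 1
                    diffs += a != b
        results.append((start + 1, diffs / sites if sites else 0.0))
    return results


toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]  # invented alignment
print([(s, round(p, 3)) for s, p in window_pi(toy, window=5, step=5)])
# [(1, 0.0), (6, 0.267)]
```

Windows with the highest π would be the candidates for hypervariable markers such as the spacers listed above.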

Fig. 7 Comparison of six Suaeda chloroplast genomes using mVISTA. The S. monoica cp genome was used as the reference and compared with the other five species (1: S. physophora, 2: S. salsa, 3: S. glauca, 4: S. japonica and 5: S. malacosperma). Red shaded areas represent high similarity, grey arrows represent annotated genic regions, and highlighted blue zones represent exons of intron-containing genes. The genome is shown in three sections: left (1–50 kbp), middle (> 50–100 kbp) and right (> 100–151.789 kbp)

Phylogenetic relationships of Suaeda species

Phylogenetic trees were built from the complete cp genome and from the IR, SSC and LSC regions separately (Fig. 8). In each tree, most nodes received high support values (> 0.7). The unrooted tree based on the complete cp genome showed three clear clades, consistent with the previous results: S. monoica and S. salsa formed a well-supported clade (0.84), as did S. glauca with S. physophora and S. japonica with S. malacosperma. Except for the SSC tree, which failed to recover this grouping of the six Suaeda species, the unrooted trees showed the same topology as the complete cp genome tree. The support for the S. monoica and S. salsa clade differed, however: it was lower (0.76) in the IR-based tree and higher (1.00) in the LSC-based tree. S. monoica and S. salsa were therefore the most closely related of the analyzed Suaeda species, a relationship reflected by the complete cp sequence, IR and LSC trees.
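FastTree infers approximately-maximum-likelihood trees, so the node support values reported above come from that model-based analysis. A much simpler check on which taxa are least divergent is the uncorrected p-distance (the proportion of differing sites) between aligned sequences. The sketch below is that swapped-in, distance-only illustration, not the tree-building method used in the study; the taxon names and sequences are invented.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites among positions where both have A/C/G/T."""
    compared = diffs = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def closest_pair(alignment):
    """Return (taxon1, taxon2, p-distance) for the least divergent pair."""
    pairs = (
        (n1, n2, p_distance(s1, s2))
        for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)
    )
    return min(pairs, key=lambda t: t[2])


toy = {  # invented aligned sequences
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAACTTAT",
}
print(closest_pair(toy))  # ('taxon_A', 'taxon_B', 0.1)
```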

Fig. 8 Unrooted molecular phylogenetic trees of Suaeda species based on the complete chloroplast genome and the IR, SSC and LSC regions

Discussion

In this study, the genetic relationships and evolutionary characteristics of Suaeda species were examined more closely. The complete cp genome of S. monoica was sequenced for the first time using next-generation sequencing (NGS), assembled and gap-filled, and compared with the other Suaeda species available in the NCBI genomic database. The Suaeda species share a similar genome structure, GC content and gene composition, indicating a stable genome structure and a low rate of evolutionary change. Because of its highly conserved nature and low evolutionary rate, the cp genome has become a focus of phylogenetic studies [34]; the complete cp genome sequence is therefore a valuable resource for molecular phylogenetics and ecology research [35].

It has been reported that the cp genomes of most land plants contain two identical IR regions with lower nucleotide substitution rates and fewer indels than the SSC and LSC regions [36]. The cp genome structure in the genus Suaeda was likewise conserved, as in most land plants. As reported for most other land plants, length variation was higher in the LSC region than in the IR and SSC regions, suggesting that the SSC and IR regions are more conserved [37]. The GC content was very similar among Suaeda species: 36.4% in S. monoica, as in S. japonica, S. salsa and S. malacosperma, and 36.5% in S. glauca and S. physophora [21,22,23].

A total of 942 SSRs were detected in the S. monoica chloroplast genome. Mononucleotides and hexanucleotides were the most abundant repeat types in the plastid genome, and similar patterns have been reported in other species [38]. SSRs are widely used as molecular markers because they are abundant, reproducible, codominant, uniparentally inherited (in the case of chloroplast SSRs) and relatively conserved, making them ideal for identifying species and assessing genetic variation at both the population and individual levels [39]. In addition, most SSRs were located in the LSC and SSC regions, consistent with other plastid genomes. Long repeat sequences were also analyzed in this study.
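REPuter finds long repeats efficiently and reports forward, reverse, complement and palindromic types. The naive sketch below covers only the forward (direct) case, to show what a Hamming-distance criterion means: two windows of fixed length count as a repeat pair if they differ at no more than a set number of positions. The default window length of 30 matches the minimum repeat size stated in the Methods, but the mismatch limit is a hypothetical value because the study does not report it. The quadratic search is only practical for short sequences, and it reports overlapping, shifted hits rather than maximal repeats.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def forward_repeats(seq, length=30, max_mismatches=3):
    """Naive search for forward repeat pairs of a fixed length.

    Returns (start1, start2, mismatches) tuples with 1-based starts. The
    default max_mismatches is a hypothetical setting.
    """
    seq = seq.upper()
    n_windows = len(seq) - length + 1
    hits = []
    for i in range(n_windows):
        first = seq[i:i + length]
        for j in range(i + 1, n_windows):
            d = hamming(first, seq[j:j + length])
            if d <= max_mismatches:
                hits.append((i + 1, j + 1, d))
    return hits


print(forward_repeats("ACGTTGCAACGTTGCA", length=8, max_mismatches=0))
# [(1, 9, 0)]
```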

As a species evolves, it develops adaptive codon usage patterns, and similar codon usage bias can indicate similar living environments or close genetic relationships between species [40, 41]. Codon preference refers to the unequal use of synonymous codons encoding the same amino acid [34]; it arises through long-term evolution and is shaped by a complex set of mechanisms [42]. Codon usage bias analyses have been used extensively to investigate the evolution and phylogeny of land plants [43]. An RSCU value greater than 1 indicates that a codon is used more often than expected if all synonymous codons were used equally. Moreover, codons with A or U (T) at the third position showed higher RSCU values in S. monoica, consistent with the enrichment of such codons reported in several other plant species [19].
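The A/U preference at the third codon position can be quantified directly from the coding sequences. The sketch below counts the base at the third position of every complete codon and reports the A/U (written T in DNA) and G/C percentages. It is illustrative, includes stop codons, and the toy sequence is invented.

```python
def third_position_composition(coding_sequences):
    """Percent A/T(U) and G/C at the third position of complete codons."""
    at3 = gc3 = 0
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            base = cds[i + 2]
            if base in "AT":
                at3 += 1
            elif base in "GC":
                gc3 += 1  # ambiguous bases are ignored
    total = at3 + gc3
    if total == 0:
        raise ValueError("no complete codons")
    return {"AT3": 100 * at3 / total, "GC3": 100 * gc3 / total}


print(third_position_composition(["AUGUUAAAGUAA"]))  # {'AT3': 50.0, 'GC3': 50.0}
```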

Structural variation in cp genomes arises primarily from the expansion and contraction of the IRs, which contributes significantly to genome diversity; the specific positions of the IR boundaries play an important evolutionary role [34]. In the cp genomes of the Suaeda species, the SSC/IR and LSC/IR boundaries differed. Although the LSC/IRb, IRa/LSC and IRb/SSC border regions were more conserved, variation was observed in the SSC/IRa border region, indicating expansion or contraction of the outer regions [39]. The length of the plastid genome and of its genes may change during contraction and expansion of the IR region [44, 45]. Previous research has shown that the extent to which genes span the IR/SSC and IR/LSC boundaries is associated with systematic features among plant species [46]. In Suaeda species, the genes ndhF, ycf1 and psbA varied in length and in their distance from the IR/LSC and IR/SSC boundaries; the LSC region was the most divergent and the IR regions the most conserved [47].

Intergenic spacers are generally more divergent than introns and protein-coding sequences [16]. Pseudogenes, however, share the fate of intergenic spacers: lacking functional constraint, they are less conserved. Pseudogenization has been common in chloroplast genome evolution, as in the accD, ccsA, ycf1, rps19 and psbB pseudogenes [36, 48, 49]. The most divergent regions among the six Suaeda species were the two pseudogenes accD and ycf1, the rps16 intron and the ndhF-rpl32 intergenic spacer. These regions can be used to develop molecular markers for species identification and population studies [24].

Our phylogenetic analysis strongly supported the monophyly of Suaeda species, consistent with prior research [3, 21,22,23, 54]. It also indicated a close relationship between S. monoica and S. salsa, which clustered in one branch in the phylogenetic analyses based on the complete cpDNA, IR and LSC [50].

The chloroplast genomes of halophytes, including Suaeda monoica, are likely to harbor genes essential for the plant’s adaptation to saline environments. Because the primary role of the chloroplast is photosynthesis, many of these genes are expected to be related to this function. Efficient photosynthesis is paramount in halophytes, especially under high salinity, which can restrict CO2 availability through stomatal closure [51]. Halophytes such as Suaeda monoica might also possess chloroplast genes associated with osmotic stress responses that facilitate survival under high salt concentrations [52]. Furthermore, the synthesis of compatible solutes, which act as osmoprotectants in cellular protection and osmotic adjustment, may be regulated by specific genes within the chloroplast [53]. Oxidative stress is another significant challenge in saline habitats, so genes responsible for the synthesis of antioxidants and other protective molecules might be present in the chloroplast genome [54]. Because protein damage can be prevalent under salt stress, genes related to protein turnover might play a critical role in halophyte chloroplasts [55]. Lastly, ion transport and sequestration are pivotal in halophytes for managing excess salt ions and avoiding cellular damage, suggesting that genes linked to this process may also be present in the chloroplast [56,57,58].

Conclusions

In the current study, the structure of the S. monoica cp genome, including its basic features, repeat sequences and codon preferences, was investigated in detail. The genome was also compared with the other five available Suaeda cp genomes in terms of IR/LSC and IR/SSC boundaries, codon usage bias and phylogenetic relationships. The S. monoica cp genome was quadripartite and contained 130 genes. In genome size and content, it is similar to other Suaeda species, and the structure of the chloroplast genome was highly conserved in the genus Suaeda. The tandem repeats and SSRs detected in this study may be useful for population genetic analyses. Two pseudogenes, one intron and three intergenic spacers were identified as highly variable regions suitable for developing DNA barcodes and molecular markers for species identification and for assessing genetic variability in population studies of Suaeda. The phylogenetic analyses indicated that S. monoica is more closely related to S. salsa than to the other species. Our study sheds light on cp genomics and the genetic diversity of Suaeda, paving the way for future cp genome editing, DNA barcoding, molecular marker development, phylogenetic and population studies.