Introduction

The genus Rheum (Polygonaceae) is widely distributed throughout the temperate and subtropical areas of the Asian interior and contains about 60 species (Losina-Losinskaya 1936), 38 of which are found in China, especially on the Qinghai–Tibetan Plateau (QTP) (Bao and Grabovskaya-Borodina 2003). Rheum species are usually perennial herbs that grow in mountainous areas at altitudes of 2000 to 4000 m. Species of this genus have evolved diverse morphological traits in response to selection pressures in these harsh environments (Sun et al. 2012). The relationships among Rheum species based on morphological characters are disputed, and the variations in pollen ornamentation of this genus are inconsistent with the morphological classification (Wang et al. 2005; Yang et al. 2001). Although the phylogeny of some Rheum species has been inferred from chloroplast (cp) DNA fragments, species delimitation in this genus remains challenging because too few informative genetic markers are available (Sun et al. 2012; Wang et al. 2005). Therefore, more specific genetic markers are needed to infer the phylogeny of Rheum species and to discriminate them from related species.

Eight sections have been established and acknowledged under Rheum according to morphological traits (Bao and Grabovskaya-Borodina 2003). Half of the Rheum species are endemic to China, and many possess important medicinal properties. For example, three species of Sect. Palmata, namely Rheum palmatum Linn., Rheum officinale Baill. and Rheum tanguticum (Maxim. ex Regel) Maxim. ex Balf., are the source plants of rhubarb, which is known as the “lord or king of herbs” in China and has been used in traditional Chinese medicine for more than 2000 years for functions including cooling blood, detoxification, removal of blood stasis, removing dampness, abating jaundice, etc. (Chinese Pharmacopoeia Committee 2015). Our previous studies revealed that these three species could be treated as one species based on genetic and morphological data, and that their morphological traits are frequently plastic under the influence of environmental conditions (Wang et al. 2014, 2018a). Other relatives, such as Rheum franzenbachii Munt., Rheum hotaoense C. Y. Cheng and T. C. Kao, Rheum wittrockii C. E. Lundstr., Rheum racemiferum Maxim., Rheum pumilum Maxim., Rheum acuminatum J. D. Hook. and Thomson and Rheum przewalskyi Los. are closely related to Sect. Palmata, and Rheum franzenbachii, Rheum hotaoense and Rheum wittrockii are usually regarded as adulterants of rhubarb (Xiao 1981). A previous study indicated that Rheum species originated from a common radiation in the QTP (Sun et al. 2012), which resulted in complex interspecific relationships among these species. Therefore, accurate species delimitation of Rheum may not only help to resolve the phylogenetic relationships, but also facilitate reliable genetic authentication of medicinal herbs in this genus.

The chloroplast is an essential organelle in plant cells that plays a significant role in photosynthesis and carbon fixation. It is presumed to have originated from cyanobacteria according to endosymbiosis theory (Raven and Allen 2003) and has its own circular genome, which usually encodes 110–130 unique genes (Palmer 1985). The cp genome is uniparentally inherited and generally has a quadripartite structure consisting of two copies of inverted repeat (IR) regions divided by one small single-copy (SSC) region and one large single-copy (LSC) region (Bendich 2004). The cp genome is highly conserved compared to nuclear and mitochondrial genomes in terms of gene structure and composition (Asaf et al. 2017). Therefore, a large number of cpDNA markers have been selected for phylogenetic analysis or DNA barcoding. However, commonly used cpDNA markers may have limited resolution in species discrimination or phylogenetic analyses of closely related taxa (Dong et al. 2017). Complete cp genomes, which contain more hotspot regions with single nucleotide polymorphisms and insertions/deletions (InDels), have been proven to be more informative than cpDNA fragments in inferring evolutionary relationships at different taxonomic levels (Caron et al. 2000; Cho et al. 2015; Wang et al. 2018b; Yang et al. 2016; Yao et al. 2019; Zhou et al. 2018). With the advent of next generation sequencing in recent years, it has become comparatively easy to sequence the complete cp genome of non-model taxa and infer phylogenetic relationships based on whole cp genomes (Guo et al. 2017; Ruhsam et al. 2015; Saarela et al. 2018).

In this study, we characterized the complete cp genomes of eight Rheum species, one Rumex and one Oxyria species. We compared the plastome differences of these species and inferred the phylogeny of Polygonaceae based on the available cp genomes. Our results will be useful for marker development, species discrimination, and the inference of phylogenetic relationships in the genus Rheum.

Materials and methods

Plant materials and DNA extraction

Ten samples, including eight Rheum species, Rumex crispus L. and Oxyria digyna (Linn.) Hill, were collected from Shaanxi, Gansu, Qinghai, Sichuan, Tibet, Yunnan and Shanxi provinces, China (Table S1). Young leaves were dried in silica gel for DNA extraction. Voucher specimens were deposited at the herbarium of Xi’an Jiaotong University, Xi’an, China. Total genomic DNA was isolated using a cetyl trimethyl ammonium bromide protocol (Doyle 1987), and the quantity and quality of the extracted DNA were determined by both agarose gel electrophoresis and a NanoDrop 2000 Spectrophotometer.

Plastome sequencing, assembly, annotation and validation

The DNA library with an insert size of 270 bp was prepared as described by Zhou et al. (2018) and then sequenced on an Illumina HiSeq X Ten platform. The raw reads were filtered to obtain high-quality reads by removing adapters, low-quality sequences (reads with unknown bases “N”), and reads with more than 50% low-quality bases (quality value ≤ 10) with the NGS QC Toolkit v2.3.3 (Patel and Jain 2012). The clean reads were first aligned to the plastome sequences of Rheum palmatum (NCBI accession NC_027728), Rheum wittrockii (NCBI accession NC_035950) and Rumex acetosa Linn. (NCBI accession KC817303) to obtain chloroplast-like sequences using Bowtie v2.2.6 with default parameters (Langmead and Salzberg 2012). The matched paired-end clean reads were initially de novo assembled by SPAdes v3.6.0 (Bankevich et al. 2012). The longest resulting contig was selected as the seed sequence for further assembly using NOVOPlasty v2.6.2 (Dierckxsens et al. 2017). To validate the accuracy of the assembled plastome, all the clean reads were mapped against the unannotated cp genome in Geneious v10.1 with the bowtie2 algorithm (Biomatters Ltd., Auckland, New Zealand). Some ambiguous regions with low coverage were confirmed by PCR-based Sanger sequencing using primers designed for the gap-flanking regions (Table S2). PCR amplifications were performed in a reaction volume of 25 μL with 12.5 μL 2 × Taq PCR Master Mix, 0.4 μM of each primer, 2 μL template DNA and 10.1 μL ddH2O. All amplifications were performed in a SimpliAmp™ Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) as follows: denaturation at 94 °C for 5 min, followed by 35 cycles of 94 °C for 50 s, the primer-specific annealing temperature for 40 s and 72 °C for 90 s, and a final extension at 72 °C for 7 min. PCR products were visualized on a 1.5% agarose gel, and the DNA fragments were sequenced by Sangon Biotech (Shanghai, China).
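
The two read-level filters described above can be sketched as follows. This is a minimal illustration of the stated criteria only, not the NGS QC Toolkit implementation; the function names `phred33` and `keep_read` are hypothetical, and Phred+33 quality encoding is assumed.

```python
from typing import Sequence


def phred33(qual_string: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual_string]


def keep_read(seq: str, quals: Sequence[int],
              q_cutoff: int = 10, max_low_frac: float = 0.5) -> bool:
    """Return True if a read passes both filters named in the text.

    A read is dropped if it contains an unknown base "N", or if more
    than `max_low_frac` of its bases have quality <= `q_cutoff`.
    """
    if "N" in seq.upper():
        return False
    low = sum(1 for q in quals if q <= q_cutoff)
    return low <= max_low_frac * len(quals)


if __name__ == "__main__":
    print(keep_read("ACGT", [30, 30, 30, 30]))  # high-quality read
    print(keep_read("ACNT", [30, 30, 30, 30]))  # contains an unknown base
    print(keep_read("ACGT", [5, 5, 5, 30]))     # 75% low-quality bases
```

A read with exactly 50% low-quality bases is kept here, because the text removes only reads with more than 50%.
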
The cp genome was aligned to its reverse complement to determine the inverted repeat regions, and the boundaries between the inverted repeats and single-copy regions were also verified by Sanger sequencing (Table S3, Fig. S1). Annotation was primarily conducted using the automatic annotator DOGMA (Wyman et al. 2004). The draft annotations were subsequently inspected and adjusted manually in Geneious v10.1 according to the annotation of the reference species. Gene boundaries were manually checked to preserve reading frames and start/stop codons. The circular map of each plastid genome was drawn using the online program OrganellarGenomeDRAW (Lohse et al. 2013). The complete plastomes have been submitted to GenBank with accession numbers MN564922-MN564931.
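
The manual check of reading frames and start/stop codons can be illustrated with a small script. It is a hedged sketch rather than the procedure used in Geneious: the function name `check_cds` is hypothetical, the input is assumed to be an already spliced coding sequence (exons concatenated), and only ATG is accepted as a start codon unless alternatives are passed explicitly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq, start_codons=("ATG",)):
    """Return a list of problems found in a spliced coding sequence.

    An empty list means the sequence has a length divisible by three,
    an accepted start codon, a terminal stop codon and no internal stops.
    """
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    if seq[:3] not in start_codons:
        problems.append("unexpected start codon")
    # Split into complete codons only; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")
    return problems


if __name__ == "__main__":
    print(check_cds("ATGAAATAA"))     # well-formed toy CDS
    print(check_cds("ATGTAAAAATAA"))  # premature stop
```

Plastid genes sometimes use non-ATG start codons, so a flagged start codon is a prompt for manual inspection, not necessarily an annotation error.
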

Genome comparison, repeat structure analysis and markers identification

Ten newly sequenced plastomes and two previously published plastomes of Rheum palmatum and Rheum wittrockii were compared and visualized using mVISTA (Frazer et al. 2004) in Shuffle-LAGAN mode with the annotation of Rheum palmatum as a reference. The alignment of the 12 cp genomes was generated with MAFFT v7.402 (Katoh and Standley 2013). Sliding window analysis was conducted to analyze DNA polymorphism and nucleotide diversity (Pi) with a 200 bp step size and a 600 bp window length using DnaSP v6.0 (Rozas et al. 2017). After comparison of these plastomes, some of the highly divergent non-coding regions were selected for primer design and validated by PCR-based Sanger sequencing (using the PCR protocol described above). IR expansion/contraction and gene distribution at the IR/SC borders of these plastomes were also compared. Tandem repeat sequences were identified by Tandem Repeats Finder v4.09 (Benson 1999) with the following parameters: 2 for the alignment parameter match and 7 for mismatch and InDels. The REPuter program was used to identify dispersed and palindromic repeats with a minimum repeat size of 30 bp and sequence identity of no less than 90% (Hamming distance equal to 3) (Kurtz et al. 2001). Simple sequence repeats (SSRs) were searched using the MISA Perl script (Thiel et al. 2003) with the following minimum numbers of repeats: ten for mono-, five for di-, four for tri-, and three for tetra-, penta- and hexanucleotide SSRs.
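
To make the sliding window statistic concrete, the sketch below computes nucleotide diversity (Pi) as the mean number of pairwise differences per site, using the window length and step size given above. It is an illustration under simplifying assumptions, not a reimplementation of DnaSP: gaps are treated as ordinary characters, whereas DnaSP's handling of gaps and missing data is not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0]) if seqs else 0
    if n < 2 or length == 0:
        return 0.0
    diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b)
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return diffs / (n_pairs * length)


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) tuples along an alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    toy = ["AAAAAAAA", "AAATAAAA", "AATTAAAA"]  # toy alignment, not real data
    for midpoint, pi in sliding_pi(toy, window=4, step=2):
        print(midpoint, round(pi, 4))
```

Windows with high Pi values correspond to the candidate hotspot regions that are later screened as markers.
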

Estimation of substitution and selective pressure analysis

To detect selective pressures on the plastid genes, non-synonymous (dN) and synonymous (dS) substitution rates as well as the dN/dS ratio of each protein-coding gene were calculated using the yn00 program in PAML v4.9 (Yang 2007). The site-specific model was further used to detect signatures of natural selection with EasyCodeML (Gao et al. 2019), which is based on the CODEML algorithm (Yang 2007). Six substitution models, namely M0, M1a, M2a, M3, M7 and M8, were used to test each plastid gene. The site-specific model assumes that the ω (dN/dS) ratio is the same across branches of the phylogeny but varies among sites (Gao et al. 2019). The log-likelihood values of each model were compared against the neutral model by means of a likelihood-ratio test to assess statistical significance.
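
The likelihood-ratio test can be sketched in a few lines. The example assumes a comparison with two additional free parameters (as in the commonly used M1a versus M2a and M7 versus M8 pairs), for which the chi-square survival function has the closed form exp(-x/2). The log-likelihood values are hypothetical and the function name `lrt_df2` is illustrative.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by two parameters.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    under a chi-square distribution with 2 degrees of freedom.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, df = 2
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a neutral and a selection model.
    stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-997.0)
    print(round(stat, 3), round(p, 4))
```

Comparisons with a different number of extra parameters need the general chi-square distribution, which this sketch does not cover.
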

Phylogenetic inference

Phylogenetic analysis was conducted based on 28 taxa (Table S4), including 25 Polygonaceae species and three Plumbaginaceae species (Plumbago auriculata, Ceratostigma willmottianum and Limonium tenellum) that were set as outgroups. Ten cp genomes were obtained in the present study, while the others were retrieved from NCBI GenBank (accession numbers are summarized in Table S4). All the cp genome sequences were aligned using MAFFT v7.402 with default parameters (Katoh and Standley 2013). The ambiguous and most variable sites in the multiple alignments were removed using Gblocks v0.91b (Talavera and Castresana 2007). To obtain robust phylogenetic relationships, three methods, namely maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI), were used to construct phylogenetic trees. MP analysis was performed in PAUP v4.0b10 with 1000 bootstrap replicates (Swofford 1998), with the addition sequence set to 1000 replicates for the heuristic search. The ML analysis was conducted using IQ-TREE (Nguyen et al. 2015) with the best-fit model selected by ModelFinder (Kalyaanamoorthy et al. 2017) (Table S5) and 1000 bootstrap replicates. MP/ML bootstrap support (BS) values are shown at each node. BI analysis was conducted using MrBayes v3.2.6 (Ronquist et al. 2012) with the nucleotide substitution model inferred from Modeltest v3.7 (Table S5). The Markov Chain Monte Carlo (MCMC) algorithm was run for one million generations and sampled every 100 generations with one cold chain and three incrementally heated chains. The first 25% of trees were discarded as burn-in, and the remaining trees were used to build a majority-rule consensus tree with posterior probability values for each node. Likewise, the three phylogenetic inference methods were used with the same settings to infer the phylogenetic tree from the 58 shared cp genes (Table S6).
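
The number of trees that enter the Bayesian consensus follows directly from the MCMC settings. The arithmetic below is a simple illustration for a single run; it ignores the starting state and any additional independent runs, and the function name is hypothetical.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sample counts for one MCMC run."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


if __name__ == "__main__":
    # Settings stated in the text: 1,000,000 generations, sampling every
    # 100 generations, first 25% of trees discarded as burn-in.
    print(retained_samples(1_000_000, 100, 0.25))
```
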

The estimation of divergence time

Divergence time was estimated using BEAST v2.4.5 under an uncorrelated lognormal relaxed clock and a birth–death model of speciation (Bouckaert et al. 2014). For the BEAST analysis, all the InDel sites in the multiple alignments were removed using Gblocks v0.91b (Talavera and Castresana 2007). The GTR + G + I model of nucleotide substitution selected by Modeltest v3.7 was used for tree construction. Owing to the lack of specific fossil records for Rheum species, we used fossils from other Polygonaceae species and the divergence times estimated in previous studies as calibration points (Manchester and O’Leary 2010; Yao et al. 2019). All calibration priors were treated as normal distributions with a mean and a wide SD. The Polygonaceae crown group was constrained to a mean age of 75.7 million years ago (Ma) and an SD of 1.07 Ma (Yao et al. 2019). The clade comprising Rumiceae, Polygoneae and Fagopyreae was constrained to a mean age of 66 Ma and an SD of 0.5 Ma (Manchester and O’Leary 2010). The MCMC simulation was run for 2.0 × 10^10 generations with 10% discarded as burn-in. Tracer v1.7.1 was used to check the convergence of the chains through sufficient effective sample size (ESS) values (> 200) (Rambaut et al. 2018).
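
For orientation, the central 95% interval implied by each normal calibration prior can be computed from its mean and SD. This is a minimal sketch of the prior ranges only (using the usual 1.96 multiplier); it says nothing about the posterior node ages, and the function name is hypothetical.

```python
def normal_interval(mean, sd, z=1.96):
    """Central interval (about 95% for z = 1.96) of a normal prior."""
    return mean - z * sd, mean + z * sd


if __name__ == "__main__":
    # Calibration priors stated in the text (ages in Ma).
    calibrations = {
        "Polygonaceae crown group": (75.7, 1.07),
        "Rumiceae + Polygoneae + Fagopyreae": (66.0, 0.5),
    }
    for node, (mean, sd) in calibrations.items():
        low, high = normal_interval(mean, sd)
        print(f"{node}: {low:.2f} to {high:.2f} Ma")
```

The first prior therefore spans roughly 73.6 to 77.8 Ma and the second roughly 65.0 to 67.0 Ma.
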

Results

Plastome features and genome composition

A total of 7,087,845–8,586,533 paired-end reads with a sequence length of 150 bp were retrieved. After filtering and mapping, 255,572–1,088,699 clean reads were successfully mapped to the reference plastome (Table 1). Average sequencing depth ranged from about 224 × (Rheum przewalskyi) to 1,007 × (Rheum hotaoense) (Table 1). With a combination of de novo and reference-guided assemblies, the plastomes of eight Rheum species, Rumex crispus and O. digyna were obtained. The size of the ten newly sequenced cp genomes ranged from 160,698 bp (O. digyna) to 162,048 bp (Rheum hotaoense) (Table 1). All the plastomes possess the typical quadripartite structure, consisting of a pair of IRs (30,534–31,023 bp) separated by the LSC (86,145–86,997 bp) and SSC regions (12,784–13,172 bp) (Table 1). The overall GC content of the ten newly sequenced plastomes and the two previously published plastomes of Rheum palmatum and Rheum wittrockii was very similar (37.3–37.6%). Each plastome encoded the same 131 predicted genes, including 79 protein-coding genes (PCGs), 30 tRNA genes and four rRNA genes; 17 genes in the IR regions were found to be duplicated (Fig. S2, Table 1). Across these cp genomes, rpl23 was found to be a pseudogene with one copy in each IR region. Three PCGs (ycf3, clpP and rps12) harbored two introns, while nine PCGs (rps16, petD, atpF, rpl16, petB, rpl2, ndhA, ndhB, rpoC1) and six tRNA genes (trnV-UAC, trnL-UAA, trnG-UCC, trnA-UGC, trnI-GAU, trnK-UUU) harbored one intron. No introns were detected in the remaining plastid genes.
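
Average depth and total plastome length can both be related to the quantities reported above by simple arithmetic. The sketch below is illustrative: it assumes every mapped read contributes its full 150 bp, pairs the largest mapped-read count with the Rheum hotaoense genome size (an assumption, since Table 1 is not reproduced here), and uses hypothetical region lengths for the quadripartite sum.

```python
def mean_depth(mapped_reads, read_length, genome_size):
    """Approximate average depth as total mapped bases over genome length."""
    return mapped_reads * read_length / genome_size


def plastome_size(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    # Assumed pairing of the maximum mapped-read count with R. hotaoense.
    print(round(mean_depth(1_088_699, 150, 162_048), 1))
    # Hypothetical region lengths chosen within the reported ranges.
    print(plastome_size(lsc=86_500, ssc=13_000, ir=30_800))
```

Under these assumptions the first value falls close to the reported maximum depth, which suggests that depth was derived from mapped bases divided by plastome length.
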

Table 1 Comparison of chloroplast genomic characteristics in 10 Rheum species, Rumex crispus and Oxyria digyna, including aspects of genome size, G-C content, gene number and sequence reads (raw and mapped)

Comparative analyses of plastomes

To infer the cp genome divergence among these 12 species, overall sequence alignments were compared using the annotation of Rheum palmatum as a reference (Fig. 1). The alignment showed that the cp genome sequences of these species are relatively conserved with the same gene order. In particular, the three Sect. Palmata species showed highly similar sequences (Fig. 1). No gene relocation or inversion was detected among these plastomes. However, some highly variable hotspots were found in the intergenic regions. The sliding window analysis also indicated that divergence in intergenic regions is higher than that in genic regions, and that the LSC/SSC regions are more variable than the IR regions (Fig. 2). The most divergent non-coding regions among these cp genomes were rps16-trnQ(UUG), trnS(GCU)-trnG(UCC), trnR(UCU)-atpA, atpF-atpH, psbM-trnD(GUC), trnE(UUC)-trnT(GGU), trnT(GGU)-psbD, ycf3-trnS(GGA), rps4-trnT(UGU), trnL(UAA) intron, trnT(UGU)-trnL(UAA), psaJ-rpl33, rpl33-rps18, rps18-rpl20 and clpP intron1. Primers were designed for some of these variable regions (Table S2) to validate their discriminatory power, and the results indicated that these variable hotspots could be used as potential genetic markers to discriminate Rheum species (Fig. S3).

Fig. 1

Percentages of identity comparing 12 plastomes of 10 Rheum species, Rumex crispus and Oxyria digyna with the R. palmatum as the reference (mVISTA). The y axis represents the percent identity within 50–100%. Genome regions are color-coded as protein-coding (purple), rRNA or tRNA coding genes (blue), and non-coding sequences (pink)

Fig. 2

Nucleotide diversity (Pi) in the complete cp genomes of 10 Rheum species, Rumex crispus and Oxyria digyna. Sliding window analysis with a window length of 600 bp and a step size of 200 bp

Additionally, we found that the IR/SC boundary regions were still relatively conserved among these species (Fig. 3). Comparison of these 12 plastomes in Rumiceae revealed only a few expansions and contractions of the IRs. Genes including rps19, ndhF, rps15, ycf1, rpl2 and trnH(GUG) were found at the LSC/IR and SSC/IR borders. Of these genes, rps15 was located 64 bp, 49 bp and 80 bp away from the SSC/IRa border in Rheum wittrockii, O. digyna and Rumex crispus, respectively (Fig. 3), whereas its position was similar among the remaining species. Apart from rps15, the remaining five genes showed similar positions at the SC/IR borders in all 12 plastomes. In particular, none of the six genes at the SC/IR borders was shifted among the three Sect. Palmata species.

Fig. 3

The IR/SSC borders in the plastomes of 10 Rheum species, Rumex crispus and Oxyria digyna

Identification of SSRs and repeat sequences

A total of 809 SSRs were identified across the cp genomes of 12 Rumiceae species by MISA analysis. The number of SSRs per plastome ranged from 53 (Rheum przewalskyi) to 77 (Rheum palmatum). The majority of SSRs were mononucleotide repeats (423), followed by dinucleotide (178), trinucleotide (111), tetranucleotide (84), pentanucleotide (12) and hexanucleotide (1) repeats (Fig. 4). The mononucleotide repeat motif (A/T), dinucleotide repeat motif (AT/TA), trinucleotide repeat motif (AAT/ATT) and tetranucleotide repeat motif (mostly AAAT/ATTT or rarely ACAG/CTGT) were present in each plastome (Fig. 4). Most SSRs were located in non-coding regions, but a few were distributed in genic regions (Table S7). Over half of the SSRs were found in the LSC region, whereas low proportions of SSRs were found in the SSC or IR regions.
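
The SSR search criteria given in the Methods (at least ten repeats for mono-, five for di-, four for tri-, and three for tetra-, penta- and hexanucleotide motifs) can be illustrated with a regular-expression scan. This is a simplified sketch and not the MISA script: compound SSRs are not handled, a true SSR overlapping a discarded redundant match may be missed, and the function name `find_ssrs` is hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the Methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif,
            # e.g. "AA" or "ATAT", which belong to a shorter SSR class.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "CG" + "A" * 12 + "CG" + "AT" * 6 + "CC"  # toy sequence, not real data
    print(find_ssrs(toy))
```

Start positions are zero-based; counting hits per motif length across a plastome yields the kind of class totals summarized above.
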

Fig. 4

Statistics of repeat elements in the plastomes of 10 Rheum species, Rumex crispus and Oxyria digyna. a Number of forward, palindromic, complement and reverse repeats. b Number of tandem repeats. c Number of different simple sequence repeat (SSR) types detected in 12 plastomes. d Frequency of identified SSR motifs in 12 plastomes

The analysis of the 12 plastomes using REPuter recognized 523 repeats. The number of repeats per species ranged from 34 (O. digyna) to 49 (Rheum racemiferum and Rheum przewalskyi). Forward repeats were the most abundant type (260), followed by palindromic (235) and reverse (22) repeats, the last being rare in these 12 cp genomes (Table S8). The search with Tandem Repeats Finder indicated that the number of tandem repeats in the 12 plastomes ranged from 3 (O. digyna) to 25 (Rumex crispus), with a total of 116 (Table S9). Most repeats were located in intergenic or intron regions, and only a few were distributed in protein-coding regions (ycf1, ycf2, psaA, psaB and rpl14) (Tables S8, S9).

Estimation of substitution and selective pressure analysis

The dS values of 76 PCGs in pairwise comparisons among the 12 Rumiceae species ranged from 0 to 1.4488 with an average value of 0.0361, and the dN values ranged from 0 to 3.0185 with an average value of 0.0161 (Table S10). A total of 3709 paired dN/dS values were recovered, most of which were less than 1, indicating that the majority of cp genes were under purifying selection. Twelve cp genes, namely matK, ndhB, ndhD, ndhF, ndhG, psaJ, psbL, rpoA, cemA, rpl32, ycf1 and ycf2, were detected with dN/dS values > 1, indicating that these genes had undergone positive selection. Based on the site-specific model of CODEML, nine genes (atpE, ndhD, psaJ, psbC, rpl16, rpoC2, rps3, ycf1 and ycf2) with positively selected sites were identified (Table S11). By combining the results of the yn00 and CODEML analyses, we found that a total of 17 cp genes were under positive selection.
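
The total of 17 genes follows from taking the union of the two gene lists reported above, four of which were detected by both approaches. The set arithmetic is shown here as a brief check; the variable names are illustrative.

```python
# Genes with pairwise dN/dS > 1 (yn00), as listed in the text.
yn00_genes = {
    "matK", "ndhB", "ndhD", "ndhF", "ndhG", "psaJ",
    "psbL", "rpoA", "cemA", "rpl32", "ycf1", "ycf2",
}

# Genes with positively selected sites (CODEML site models), as listed.
codeml_genes = {
    "atpE", "ndhD", "psaJ", "psbC", "rpl16",
    "rpoC2", "rps3", "ycf1", "ycf2",
}

shared = yn00_genes & codeml_genes       # detected by both approaches
combined = yn00_genes | codeml_genes     # detected by at least one approach

if __name__ == "__main__":
    print(len(yn00_genes), len(codeml_genes), len(shared), len(combined))
    print(sorted(shared))
```
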

Phylogenetic inference

The three phylogenetic methods (MP/ML/BI) yielded almost identical topologies with generally high support values. The monophyly of Polygonaceae was strongly supported based on the available cp genome dataset (MP/ML, BS = 100; BI, PP = 1) (Fig. 5a). Species belonging to each of the three tribes Rumiceae, Polygoneae and Fagopyreae formed their own clades. Except for Rheum wittrockii, which clustered with Rumex species, the Rheum species formed a monophyletic group with high BS and PP values. This result also confirmed the monophyly of Sect. Palmata. The phylogenetic analysis based on 58 shared CDS regions produced similar results with the three methods (Fig. 5b).

Fig. 5

Phylogenetic tree of 28 taxa using maximum likelihood, maximum parsimony and Bayesian inference based on dataset of the 58 shared genes and complete plastome sequence. a The phylogenetic tree inferred from complete plastome sequence dataset. b The phylogenetic tree inferred from the dataset of 58 shared genes. ML topology was shown with ML bootstrap value/MP bootstrap value /Bayesian posterior probability at each node. The pentastar indicates that the support rate of branch is 100/100/1.0

Estimation of divergence times

Using three Plumbaginaceae species as outgroups and the phylogeny generated here, divergence times of the major clades within Polygonaceae were estimated based on the dataset of 28 plastome sequences (Fig. 6). Our analyses based on the two combined MCMC runs produced sufficient effective sample sizes (> 200) for all relevant parameters and suggested adequate sampling of the posterior distribution. The results suggest that the splits in crown group Polygonaceae date to the Upper Cretaceous (especially the Campanian, 73.86–77.99 Ma). The divergence between Rumiceae and Polygoneae was estimated at c. 54.23 Ma (41.30–64.15 Ma). The divergences within the Rumiceae clade were dated to the Oligocene (32.01 Ma, 15.35–48.97 Ma). The divergence of the Sect. Palmata clade was estimated at c. 1.60 Ma.

Fig. 6

BEAST-derived chronograms of Polygonaceae based on the plastome sequences with two calibration points (red pentastar) derived from previous study. Blue bars indicate the 95% highest posterior density (HPD) credibility intervals for node ages (Ma). Posterior probabilities are sequentially labelled above nodes. Mean divergence dates for major nodes (a–e) are labeled

Discussion

The ten newly obtained cp genomes in the present study not only offered an opportunity to comprehensively compare cp genome sequences of Rumiceae, but also provided sufficient genetic resources for species discrimination and phylogenetic analysis in Rheum. The cp genomes display the typical quadripartite structure with an LSC and an SSC region separated by two IRs. All the analyzed Rheum species exhibit 131 predicted genes and share the same gene content and gene synteny. Major structural changes such as gene loss and genome rearrangement have been reported in several angiosperm lineages (Cai et al. 2008; Weng et al. 2013; Wicke et al. 2016), but such changes were not found in the plastomes of Rheum. In particular, the overall cp genome structure of Sect. Palmata species was extremely conserved, which, to some extent, confirmed the close relationships of these three species.

The sizes of plastid genomes vary considerably among angiosperms (63–242 kb) and can differ greatly even within the same family (46–190 kb in Orobanchaceae) (Frailey et al. 2018; Gruenstaeudl et al. 2017; Wicke et al. 2013). In contrast, Rheum species show only minor differences in plastome size, from 159,051 bp in Rheum wittrockii to 162,048 bp in Rheum hotaoense. It has been reported that increases in plastome size are usually caused by expansions of the IR regions (Weng et al. 2013; He et al. 2017). Of the compared Rheum plastomes, Rheum wittrockii had the shortest IRs, while Rheum hotaoense had the longest. It is similarly presumed that the differences in plastome size recorded in the present study are mainly the result of expansions or contractions of the IR regions. Genes located at the SC/IR borders of Rheum plastomes were identical, and only minor length differences were detected between these genes and the SSC/IR border. As Rheum species with close interspecific relationships are mainly derived from a rapid radiation, we deduced that the highly conserved nature of the cp genome resulted in the similar gene distributions at the SC/IR junctions.

Although the cp genome is highly conserved, some variable hotspots that include InDels have been detected (Aldrich et al. 1988). These hotspots in plastomes may provide several highly variable cpDNA markers. Species of Sect. Palmata are important in traditional Chinese medicine; therefore, accurate species discrimination in Rheum is vital for the utilization of rhubarb. However, morphological similarities have hampered the authentication of rhubarb species. Molecular identification based on DNA markers will thus provide a more effective way to overcome these problems. The plastome has a conserved sequence length of 110 to 160 kb, which far exceeds the length of commonly used molecular markers and can provide more variation to distinguish closely related species (Li et al. 2015; Nguyen et al. 2017). Therefore, some mutation hotspot regions could be tested as Rheum-specific DNA markers and used to discriminate rhubarb from its adulterants. These regions might also provide sufficient genetic variation for resolving the phylogenetic relationships of Rheum species.

Repeat elements detected in plastomes have been proven to be correlated with rearrangement, sequence divergence and recombination (Asano et al. 2004; Timme et al. 2007; Weng et al. 2013). The numbers and distributions of tandem, dispersed (forward and reverse), and palindromic repeats were surveyed in this study. Our results indicated that the distribution of repeats is similar among the cp genomes of Rheum. Interestingly, most repeats were found in intergenic or intron regions, and a few repeats were distributed in the same gene regions (ycf1, ycf2) or in genes with similar functions (psaA, psaB). Chloroplast simple sequence repeats (cpSSRs) are usually variable within the same species and have been proven to be important molecular markers for species discrimination and population genetics at lower taxonomic levels (Provan et al. 2001; Ruhsam et al. 2015; Xue et al. 2012). The SSRs distributed in Rheum plastomes were similar, with the mononucleotide (A/T) being the most abundant repeat type. Poly (A)/(T) SSRs are pervasively found in plant cp genomes (Wang et al. 2018b; Yang et al. 2016; Zhou et al. 2019). Most cpSSRs were found in non-coding regions, and only a few were detected in coding regions. CpSSRs located in non-coding regions usually show high intraspecific variation in repeat numbers (Eguiluz et al. 2017). Therefore, the cpSSRs will provide valuable genetic resources for species identification and population genetics of Rheum.

Previous studies have revealed signatures of natural (purifying or positive) selection in some cp gene regions (e.g. psbA, matK, rbcL) encoding proteins involved in photosynthesis (Carbonell-Caballero et al. 2015; Ye et al. 2018). We found 17 protein-coding genes (atpE, matK, ndhB, ndhD, ndhF, ndhG, psaJ, psbL, psbC, rpoA, rpoC2, cemA, rpl16, rpl32, rps3, ycf1 and ycf2) to be under positive selection. Of these, matK with sufficient variant sites has been used as a standard DNA barcode for species discrimination, and it was highly divergent in Caryophyllales (Cuénoud et al. 2002). Eleven ndh genes in the plant cp genome encode the NAD(P)H dehydrogenase (NDH) complex, which is essential for photosystem I cyclic electron transport and chlororespiration (Kofer et al. 1998). As the NDH monomer is sensitive to high light intensity, we inferred that the ndh genes might have changed greatly to generate new functions for stress resistance, and previous studies showed similar results (Peng et al. 2011; Wang et al. 2018b; Yang et al. 2016). Psa and psb genes are primary members of the photosystems and may evolve rapidly in some Rumiceae species. Plastid genes, including rpoA and rpoC2, encoding proteins involved in transcription and post-transcriptional modification have been found to evolve under positive selection (Piot et al. 2018). The cemA gene is related to the synthesis of PPR7 protein and may have coevolved with nuclear genes (Jalal et al. 2015). Therefore, we presumed that the cemA gene may have a fast evolutionary rate in Rumiceae species, and some previous studies on other green plants obtained similar results (Xu et al. 2015; Zhou et al. 2016). It has been suggested that rpl and rps encode ribosomal proteins that have more divergent sequences than proteins related to photosynthesis (Xu et al. 2015). Ycf1 and ycf2 are two of the largest genes encoding a putative membrane protein (Cuénoud et al. 2002; Kikuchi et al. 2013) and have rapidly evolved in several species (Cho et al. 2015; Park et al. 2018; Wang et al. 2018b; Yang et al. 2016; Zhou et al. 2019).

The plastome has been proven to be the most important genetic resource for inferring the phylogeny of green plants and disentangling the phylogenetic relationships of species that have experienced rapid radiations (Davis et al. 2014; Li et al. 2019; Ma et al. 2014; Yao et al. 2019). Rheum species originated from a common radiation in the QTP (Sun et al. 2012) and share similar morphological traits. Phylogenetic relationships of Rheum species were inferred based on the available plastomes using MP, ML and Bayesian methods. Most Rheum species clustered in the same clade with a high BS value, which was compatible with the previous phylogeny inferred from cpDNA fragments (Sun et al. 2012). However, Rheum wittrockii clustered with Rumex species. Unexpectedly, the overall genomic structure of Rheum wittrockii differed considerably from that of the congeneric Rheum species. Therefore, the high sequence divergence of its cp genome resulted in a discordant phylogenetic position. The phylogenetic result also showed that Rheum palmatum, Rheum officinale and Rheum tanguticum consistently clustered in the same clade with high support, and the plastome sequences of these three species also showed high similarity. Previous molecular and morphological data indicated that these three species should be treated as one species complex (Wang et al. 2018a). Therefore, our study further confirmed the close relationships of species belonging to Sect. Palmata. In addition, based on the phylogenetic trees, species from the three tribes (Rumiceae, Polygoneae and Fagopyreae) separately formed monophyletic groups with high support values, indicating that cp genomes are suitable for resolving the phylogeny of Polygonaceae. However, limited taxon sampling may provide insufficient phylogenetic information, which may lead to a discrepant tree topology (Eguiluz et al. 2017; Leebens-Mack et al. 2005). Therefore, more taxa should be sampled to obtain a more reliable inference of the phylogenetic relationships of Polygonaceae.

Based on the available plastome dataset, we estimated the divergence times of these Polygonaceae species. The ages of the major Polygonaceae splits are in agreement with the previously published result (Yao et al. 2019). Most Rheum species diverged in the middle Miocene to Pliocene. It has been proven that the uplift of the eastern edge of the QTP during the Miocene and Late Pliocene facilitated the radiations of species (Sun et al. 2011; Wen et al. 2014). Therefore, the divergence and diversification of Rheum species might have been affected by such important tectonic events, which is consistent with the previous phylogenetic study (Sun et al. 2012). According to our results, the earliest divergence of the Sect. Palmata group was dated to the Pleistocene. It can be inferred that the repeated climatic fluctuations in the Quaternary promoted the divergence of these three species, which is compatible with previous studies based on plastid and nuclear markers (Sun et al. 2012; Wang et al. 2018a).

Conclusions

The complete plastomes of eight Rheum species, one Rumex and one Oxyria species were sequenced and compared. All the plastomes showed identical structure with the same gene order, and no gene relocation or inversion was detected among these plastomes. However, some highly variable hotspots were found in the intergenic regions, which provide candidate genetic markers for species authentication and phylogenetic analysis in Rheum. In addition, repeat elements were detected in the plastomes through comparative cp genome analyses, and the abundant SSRs could be used for species discrimination and population genetics of Rheum. We found that most plastid genes have undergone purifying selection, and 17 genes were subjected to positive selection. The phylogenetic analyses further confirmed the monophyly of Polygonaceae based on the available cp genomes, and also indicated that plastomes could facilitate the reconstruction of the phylogeny of Polygonaceae. According to the molecular dating based on plastome sequences, the divergence time of three tribes of Polygonaceae was estimated. We confirmed that the diversification of Sect. Palmata was caused by the fluctuating climate in the Quaternary.