Introduction

The plastid genome, known as the plastome, is an ideal genetic system for evolutionary genomic studies because of its crucial role in the photosynthesis of green plants (Krause 2008; Wicke et al. 2013; Wicke and Naumann 2018). In general, plastome organization is highly conserved among angiosperms owing to purifying selection (Young and dePamphilis 2005; Banerjee and Stefanović 2019). Most plastid genes encode key components of the photosynthetic apparatus (light and dark reactions) or of the plastid genetic apparatus (transcription and translation). Parasitic plants, which obtain the nutrients needed for survival from host plants through the haustorium, form a special group characterized by significant morphological, physiological, ecological, and genomic changes (Westwood et al. 2010; Heide-Jørgensen 2013; Wicke and Naumann 2018). They comprise 292 genera and 4750 species across 12 lineages of flowering plants and have evolved independently 12 or 13 times (Westwood et al. 2010; Nickrent 2020). The transition from autotrophy to heterotrophy usually involves structural and functional variations in the plastome (Krause 2008; Wicke et al. 2016), making parasitic plants an important system for investigating plastome evolution under relaxed selective pressure and the adaptation of plants to lifestyle changes (Young and dePamphilis 2005; McNeal et al. 2007; Barrett et al. 2019).

Plastome degradation is a common phenomenon in the evolutionary history of parasitic plants (Wolfe et al. 1992; Wicke and Naumann 2018), leading to genome size reduction, gene loss, and structural variation (Krause 2008; Wicke et al. 2016; Wicke and Naumann 2018). For instance, hemiparasitic Pedicularis has a 146 kb plastome with 11 pseudogenes and IR boundary shifts (Li et al. 2021), while holoparasitic Rafflesia and Sapria have completely lost their plastid sequences (Molina et al. 2014; Cai et al. 2021). Furthermore, plastome changes have been found to be correlated with relaxed selection: nucleotide substitution in plastid genes occurs faster in parasitic plants than in autotrophic plants (Wicke et al. 2016; Wicke and Naumann 2018). Some retained genes in parasitic plants are nevertheless still under purifying selection (Wolfe et al. 1992; Cusimano and Wicke 2016; Liu et al. 2019; Li et al. 2021). Researchers have found that plastome degeneration is positively associated with the degree of parasitism (Barrett and Davis 2012; Barrett et al. 2014; Wicke et al. 2016; Graham et al. 2017; Banerjee 2022) and can be divided into five stages of plastomic degradation, marked by the loss of: (1) NAD(P)H complex genes (ndh genes); (2) photosynthesis-related genes (psa/b, pet, ycf3/4, cemA, and ccsA) and plastid-encoded polymerase genes (rpo genes); (3) genes with prolonged or alternative functions (e.g., atp and rbcL); (4) photosynthesis-unrelated metabolic genes (e.g., accD, clpP, and ycf1/2); and (5) a core group of housekeeping genes (e.g., matK, rpl, rps, rrn, and trn). However, this model was based on taxon sampling that included both hemi- and holoparasites. Whether it is suitable for the sole holoparasitic lineage in Convolvulaceae and for other groups warrants further investigation.

Structural variations of the plastome have been found in several groups of angiosperms, such as Geraniaceae (Guisinger et al. 2011), Aristolochiaceae (Sinn et al. 2018), Ericaceae (Graham et al. 2017), and Fabaceae (Cai et al. 2008), and can be classified into three main types: shifts of IR boundaries, genomic rearrangements, and translocations. Studies have shown that structural variations in the plastome are highly associated with IR regions, which can both maintain and disrupt plastome stability (Knox 2014; Sinn et al. 2018), and can be induced by short DNA repeats, tRNA genes, and palindromic sequences (Sinn et al. 2018). The transition to parasitism is thought to lead to a decrease in GC content, which may trigger structural variations (Wicke et al. 2013; Wicke and Naumann 2018). However, this hypothesis is difficult to test (Wicke and Naumann 2018).

Cuscuta, the sole parasitic genus in Convolvulaceae, is a diverse lineage of nearly 200 species with a global distribution and considerable agricultural and ecological importance (Costea et al. 2015). It is characterized by scale-like leaves, slender twining stems, and the absence of roots (Yunker 1932; Costea et al. 2015; Banerjee and Stefanovic 2020). It represents one of the 12 angiosperm lineages that have transitioned independently from autotrophy to parasitism (Westwood et al. 2010; Nickrent 2020). The genus contains only holoparasitic species, making it a unique system for studying plastome evolution resulting from lifestyle changes (Yunker 1932; Costea et al. 2015; Banerjee and Stefanovic 2020; Pan et al. 2023). Costea et al. (2015) proposed a classification system for Cuscuta based on phylogenetic and morphological evidence, dividing the genus into four subgenera: Monogynella, Cuscuta, Pachystigma, and Grammica. To date, the evolutionary associations between plastomic variations and the change from an autotrophic to a parasitic lifestyle in Convolvulaceae have not been well investigated. To fill this knowledge gap, we sampled and assembled additional plastomes from seven Cuscuta species and nine species of autotrophic Convolvulaceae. Our objectives were: (1) to examine whether the change from an autotrophic to a parasitic lifestyle is associated with structural variations in the plastome of Convolvulaceae; (2) to infer evolutionary patterns of plastid gene changes in Cuscuta; and (3) to identify whether plastome characteristics and selection are associated with plastomic degradation in Cuscuta.

Material and methods

Sampling and DNA sequencing

In this study, we assembled 29 new plastomes de novo, comprising 20 samples from seven Cuscuta species and nine autotrophic species of Convolvulaceae (Table 1), which together represent seven major clades of Convolvulaceae (Chen et al. 2022). Fresh vines of Cuscuta were collected from the field, preserved, and subsequently dried in silica gel. Voucher specimens were deposited at the Herbarium of Xishuangbanna Tropical Botanical Garden (HITBC) and the Herbarium of Kunming Institute of Botany (KUN), Chinese Academy of Sciences. We also downloaded 20 plastome sequences of Cuscuta spp., as well as the plastome sequences of Ipomoea nil (L.) Roth and Nicotiana tabacum L., from GenBank. In total, we sampled 40 individuals of Cuscuta, representing 23 species from four subgenera: subgen. Monogynella (16 individuals, 3 species), subgen. Grammica (17 individuals, 14 species), subgen. Cuscuta (5 individuals, 4 species), and subgen. Pachystigma (2 individuals, 2 species) (Table 1 and Supplementary Table S1).

Table 1 Basic physical properties of newly assembled plastomes of parasitic and non-parasitic Convolvulaceae plants.

We extracted total genomic DNA from the silica-gel-dried materials using the CTAB method (Doyle and Dickson 1987) and randomly fragmented the purified genomic DNA. We then indexed the fragments with tags and constructed a short-insert (350–500 bp) library using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®. We generated 150 bp paired-end raw reads for each sample on an Illumina sequencing platform and obtained 4.0–5.0 Gb of raw reads per sample.

Plastome assembly and genome annotation

To assemble the complete plastome de novo from the raw reads, we used the GetOrganelle toolkit (Jin et al. 2020b), which enables automatic export of the complete, circular plastome sequence. We checked the assembled FASTG graphs in Bandage (Wick et al. 2015) and annotated the complete plastome sequences using GeSeq (Tillich et al. 2017) with I. nil (AP017304) as the reference. We then manually adjusted the annotated plastomes in Geneious by verifying the start and stop codons of the coding sequences and comparing them with the reference. The average base coverage of each plastome is presented in Supplementary Table S4.

We used two criteria to check the annotations against the reference. First, if a gene had an intact open reading frame (Fleischmann et al. 2011) but its sequence was 60% shorter than that of the reference gene, or if 70% of its sequence differed from the reference gene because of long insertions or deletions, we treated the gene as a pseudogene. Second, if the gene was not found, or was found only as homologous contiguous stretches of fewer than 20 nucleotides, we treated the gene as absent. We also verified and detected putative tRNAs using tRNAscan-SE with default parameters (Chan et al. 2019). Finally, we detected locally collinear blocks using the MAUVE alignment plugin (Darling et al. 2010) in Geneious (https://www.geneious.com/).
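The two annotation criteria above can be sketched as a small decision rule. This is only an illustrative reading of the thresholds, not the procedure actually scripted for this study: the function name, argument names, and the interpretation of "60% shorter" as "at most 40% of the reference length" are assumptions, as is treating a gene without an intact open reading frame as a pseudogene.

```python
# Hypothetical sketch of the gene-status criteria; names and threshold
# interpretations are assumptions made for illustration only.
MIN_HOMOLOGY_NT = 20      # shortest homologous stretch counted as "found"
MAX_LENGTH_RATIO = 0.40   # "60% shorter" read as <= 40% of reference length
MAX_DIVERGENCE = 0.70     # fraction of sequence differing from the reference


def classify_gene(orf_intact, length, ref_length, frac_divergent, longest_match_nt):
    """Return 'absent', 'pseudogene', or 'functional' for one annotated gene."""
    # Criterion 2: no homologous stretch of at least 20 nt means the gene is absent.
    if longest_match_nt < MIN_HOMOLOGY_NT:
        return "absent"
    # Assumption: a disrupted reading frame is also scored as a pseudogene.
    if not orf_intact:
        return "pseudogene"
    # Criterion 1: intact ORF, but strongly shortened or strongly divergent.
    too_short = length <= MAX_LENGTH_RATIO * ref_length
    too_divergent = frac_divergent >= MAX_DIVERGENCE
    return "pseudogene" if (too_short or too_divergent) else "functional"
```

For example, under these assumptions a 300 bp intact reading frame compared with a 1,000 bp reference gene would be scored as a pseudogene, whereas a 950 bp copy with 5% divergence would be scored as functional.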

Phylogenetic inferences and molecular dating

A total of 66 plastome sequences, including 34 from Chen et al. (2022), were used to reconstruct a robust phylogenetic tree of Convolvulaceae. The 79 coding DNA sequences (CDS) were extracted using the “get_annotated_regions_from_gb.py” script (https://github.com/Kinggerm/PersonalUtilities), and each gene matrix was then aligned using MACSE 2.07 with the amino acid translation option using the genetic code (Ranwez et al. 2011). Outlier nucleotide sites in the CDS alignments were identified and trimmed using Spruceup with default parameters (Borowiec 2019). All CDS alignments were concatenated into a super-matrix using the “concatenate_fasta.py” script (https://github.com/Kinggerm/PersonalUtilities). Because substitution rates in plastid genes varied among the subgenera of Cuscuta, six matrices were used to assess the stability of the phylogenetic topology. Matrix I included all sampled individuals of Cuscuta; Matrix II included all four subgenera of Cuscuta with one individual representing each species (as in the following matrices); Matrix III included three subgenera, excluding C. subgen. Grammica; Matrix IV included two subgenera, C. subgen. Monogynella and subgen. Cuscuta; Matrix V included C. subgen. Monogynella alone; and Matrix VI excluded all species of Cuscuta.

Two approaches, super-matrix and multispecies coalescent (MSC) (Vogel et al. 2018), were used to reconstruct phylogenies of Convolvulaceae. For the super-matrix approach, the six CDS super-matrices were used to infer phylogenetic trees with IQ-TREE2 using the GHOST model with four classes in conjunction with the GTR substitution model (Crotty et al. 2020), and with RAxML using the GTR + Γ + I substitution model (Stamatakis 2014); 1,000 bootstrap replicates were performed to obtain node support values. For the MSC approach, each CDS gene tree was inferred with RAxML using the GTR + Γ + I substitution model, and all CDS gene trees were then used to estimate coalescent-based species trees with ASTRAL-III (Zhang et al. 2018). In all of the above phylogenetic analyses, representative species from other Solanales families were selected as outgroups. In comparison with previous phylogenetic studies (Stefanović et al. 2003; Chen et al. 2022; Simões 2022), IQ-TREE2 with the GHOST model provided a more robust phylogenetic framework of Convolvulaceae (see Results for details). Consequently, Matrix I was employed for the subsequent analyses.

Divergence times were estimated with MCMCTree 4.10 from the PAML package (Yang 2007). Because valid fossil records are lacking for Cuscuta, we estimated divergence times using the ages of two external nodes (Convolvulaceae + Solanaceae, 61–88 Mya; Convolvulaceae, 50–60 Mya) (Magallón et al. 2015; Srivastava et al. 2018). Analyses were carried out using the main tree topology after removing duplicate individuals of the same species (Fig. 1). We used the 65 CDS genes encoded by all species for the dating analyses. The GTR model and the independent-rates molecular clock model (clock = 2) were used. The first 160,000 generations were discarded as burn-in, and the Markov chain was then sampled every 10 generations until 400,000 samples were collected.
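As a quick check of the chain length implied by these settings, the following sketch computes the total number of generations, assuming the chain runs for the burn-in plus the sampling interval multiplied by the number of retained samples. The function name is hypothetical and is used only for illustration.

```python
def total_generations(burnin, sampling_interval, n_samples):
    """Total MCMC generations: burn-in plus all generations spanned by sampling."""
    return burnin + sampling_interval * n_samples


# Settings reported in the text: 160,000 burn-in, every 10th generation, 400,000 samples.
chain_length = total_generations(160_000, 10, 400_000)
print(chain_length)  # 4160000
```

Under this assumption, the reported settings correspond to 4,160,000 generations per run, of which the burn-in makes up just under 4%.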

Fig. 1

Ancestral state reconstruction of functional or physical gene loss in Cuscuta based on Dollop analyses. The plastid genes were classified into three types, each with two states shown as solid or hollow boxes. Black indicates that a functional gene has become nonfunctional; pink indicates that a complete-length gene has become truncated; grey indicates gene loss. The node numbers represent divergence times

Detecting relaxed selective constraints

To detect the selective pressure on CDS genes within Convolvulaceae, we performed a RELAX analysis (Wertheim et al. 2015) with HyPhy 2.5 based on the dated tree, with the genus Cuscuta as the test branches and the other taxa as the reference branches, by calculating the ratio of nonsynonymous to synonymous substitution rates (ω or dN/dS). In addition, we used the selection intensity parameter K to examine whether the test branches experienced relaxed selection compared with the reference branches. The test branches underwent intensified selection when K > 1, relaxed selection when K < 1, and no change in selection intensity when K = 1. Relative values of dN and dS were calculated in HyPhy using custom batch scripts and the MG94 × GTR_3 × 4 codon model (Kosakovsky et al. 2020), and R v.4.3.2 was then used to visualize the results. All analyses were run via the command line, and the .json result files were parsed with a Python script.
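The interpretation of K described above can be sketched as a small helper. This is not the parsing script used in the study; the function name, the return format, and the 0.05 significance threshold as a default argument are assumptions for illustration, and the K and P values would in practice be read from the RELAX output.

```python
def interpret_relax(k, p_value, alpha=0.05):
    """Return (direction, significant) for one gene or gene group.

    direction follows the K thresholds in the text; significance is reported
    separately so that non-significant trends are still visible.
    """
    if k > 1:
        direction = "intensified"
    elif k < 1:
        direction = "relaxed"
    else:
        direction = "unchanged"
    return direction, p_value < alpha


# Hypothetical input resembling a group with K = 1.78 and P < 0.01.
print(interpret_relax(1.78, 0.009))  # ('intensified', True)
```

Keeping the direction and the significance flag separate mirrors how the results are reported: a group can show K < 1 without the relaxation being statistically supported.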

Ancestral reconstruction of gene losses and genomic structure

To infer the degradation history of plastid genes in Cuscuta, we reconstructed ancestral states based on the dated tree using Dollop (http://evolution.genetics.washington.edu/phylip.html) (Supplementary Fig. 1B) and the ‘phytools’ package in R with the ‘ER’ likelihood model (Revell 2012). We used the status of 67 functional genes as discrete characters and classified each plastid gene into one of four states: functional, which is the ancestral status (0); nearly complete but nonfunctional (1); truncated (2); and absent (3). For Dollop, we recoded these states into three binary groupings: (I) functional (0) → nonfunctional (1, 2, 3); (II) intact structure (0, 1) → damaged or lost structure (2, 3); and (III) retained (0, 1, 2) → absent (3) (Fig. 1).
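The recoding of the four gene states into three binary characters can be sketched as follows. The grouping of states is taken from the text, but the dictionary and function names are hypothetical, and the choice to write the derived state as 1 is an assumption; a Dollo parsimony analysis may require the opposite polarity (for example, presence coded as 1).

```python
# States counted as "derived" under each binary grouping described in the text.
# Which side is written as 1 is an assumption for illustration.
DERIVED_STATES = {
    "I": {1, 2, 3},   # functional -> nonfunctional
    "II": {2, 3},     # intact structure -> damaged or lost structure
    "III": {3},       # retained -> absent
}


def encode_gene(status):
    """Map a four-state gene status (0-3) to three binary characters."""
    if status not in (0, 1, 2, 3):
        raise ValueError(f"unknown gene status: {status}")
    return {group: int(status in derived) for group, derived in DERIVED_STATES.items()}


print(encode_gene(2))  # {'I': 1, 'II': 1, 'III': 0}
```

A truncated gene (state 2) is therefore derived under groupings I and II but still counts as retained under grouping III, which is what allows functional and physical loss to be placed on different nodes.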

For plastid genomic reconfigurations, we inferred the evolutionary history using the ‘ER’ likelihood model in the ‘phytools’ package, with the plastid genomic structure of N. tabacum as the ancestral character. We identified 13 regions of genomic rearrangement and coded them as follows: (1) atpI–rpoB, (2) trnC-GCA–trnT-GGU, (3) psbD–rps4, (4) trnT-UGU, (5) trnL-UAA–trnF-GAA, (6) ndhJ–ndhC, (7) trnV-UAC–atpB, (8) rbcL–cemA, (9) rps8–rpl2, (10) petA–psbE, (11) ycf1, (12) ccsA–trnL-UAG, and (13) rpl32. We classified genomic structure variations into two types: typical and variant (inversion + translocation + loss) (Figs. 3 and 4).

For complex IR boundary variations, we focused on the direction of variation in four plastome regions. With the plastid genomic structure of N. tabacum as the ancestral and typical type, we recognized seven types of IR variation: no contraction or expansion (I); IRa 5′ contraction (II); IRa 5′ contraction and IRb 3′ contraction (III); IRa 5′ contraction and IRa 3′ expansion (IV); IRa 5′ contraction and IRa 3′ contraction (V); IRa 5′ contraction and IRb 3′ expansion (VI); and IRa 5′ contraction with IRa 3′ and IRb 3′ expansion (VII). Each type was coded and analyzed using the ‘ER’ likelihood model in the ‘phytools’ package (Fig. 4).

Phylogeny-based statistical analysis

We analyzed the relative synonymous codon usage (RSCU) using MEGA (Kumar et al. 2018) and visualized it using the ‘phyheatmap’ package in R. RSCU > 1.0 indicated positive codon usage bias, RSCU = 1.0 indicated no bias, and RSCU < 1.0 indicated negative bias. We used the RepeatMasker website (http://www.repeatmasker.org) to detect repetitive sequences in each plastome after removing one copy of the IR.
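For readers unfamiliar with the statistic, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates the calculation for a single amino acid; it is not the MEGA implementation, the function name is hypothetical, and the lysine counts are invented for the example.

```python
def rscu(codon_counts):
    """Compute RSCU for the synonymous codons of one amino acid.

    codon_counts maps each synonymous codon to its observed count.
    RSCU = observed count / (total count / number of synonymous codons).
    """
    total = sum(codon_counts.values())
    n_synonymous = len(codon_counts)
    if total == 0:
        # Amino acid not observed: no usage information available.
        return {codon: 0.0 for codon in codon_counts}
    return {codon: count * n_synonymous / total for codon, count in codon_counts.items()}


# Invented counts for the two lysine codons.
print(rscu({"AAA": 30, "AAG": 10}))  # {'AAA': 1.5, 'AAG': 0.5}
```

In this invented example AAA has RSCU > 1.0 (positive bias) and AAG has RSCU < 1.0 (negative bias), and the values for an amino acid always sum to its number of synonymous codons.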

We computed the strength of the phylogenetic signal in different plastid genomic traits using the phylo.d function in the ‘geiger’ package (Fritz and Purvis 2010), with D > 1.0 indicating a random phylogenetic pattern and D < 1.0 indicating a clumped phylogenetic pattern. Finally, we employed Fisher’s exact test in R to examine the significance of group differences in plastome traits.
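To make the group comparison concrete, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test for a 2 × 2 table (for example, trait present or absent in parasitic versus autotrophic species). It is an illustration rather than the R code used in the study; the function name and the example table are hypothetical, and the two-sided P value is taken as the sum of all table probabilities no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables at least as extreme as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical table: trait present in 3 of 3 parasites and 0 of 3 autotrophs.
print(round(fisher_exact_two_sided(3, 0, 0, 3), 3))  # 0.1
```

Even a perfectly separated table of six species gives P = 0.1 here, which illustrates why small groups can yield non-significant results despite apparent differences.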

Results

General features of plastid genome in Convolvulaceae

The general characteristics of the 29 newly assembled plastome sequences and the 22 plastome sequences downloaded from NCBI are presented in Table 1 and Supplementary Table S1, respectively. Plastome size in autotrophic Convolvulaceae varied from 146,436 to 164,112 bp, with the LSC being 82,334–90,593 bp, the SSC 11,483–23,361 bp, and the IR 18,129–34,945 bp. In contrast, plastome size in Cuscuta spp. was reduced to 85,263–121,020 bp, with the LSC being 50,388–79,528 bp, the SSC 6,727–8,678 bp, and the IR 14,074–16,569 bp. All plastid genes were functional in autotrophic Convolvulaceae, except the rpl23 gene, which was nonfunctional in Dichondra, Dinetus, and Poranopsis (Tables 1, S1). In plastomes of parasitic Cuscuta, gene content varied from 87 to 98 unique genes, including 59–65 CDS, 24–29 tRNA genes, and 4 rRNA genes (Table 1; Supplementary Fig. S2). The GC content of Cuscuta plastomes was 38.2–38.3% in subgen. Monogynella, 37.6–37.9% in subgen. Grammica, and 35.2% in subgen. Cuscuta, with an average of 37.9%, which was lower than the 38.3% found in autotrophs (Table 1).

The plastome of Cuscuta exhibited the highest frequency for the AAA codon encoding lysine (4.6%) and the lowest frequency for the stop codon UAG (0.3%) (Supplementary Fig. S3). Overall, Cuscuta spp. showed a similar pattern of codon usage preference. A total of 23 codons showed a significant bias (RSCU > 1), with 91% of them having A/U in the third position, indicating that Cuscuta is generally biased toward A/U in the third position. We also detected DNA repeats in all Convolvulaceae species (Fig. 2). Plastomes of autotrophic Convolvulaceae species and of the two early-diverging subgenera of Cuscuta had a greater number of long repeats (≥ 100 bp). In Cuscuta subgen. Grammica, the number of repeats was significantly decreased, and long repeats (≥ 100 bp) were found only in C. boldinghii, C. erosa, and C. mexicana (Fig. 2). Genomic rearrangements were found in both IR-lacking and IR-retaining clades of Cuscuta (Figs. 3 and 4).

Fig. 2

Statistical analyses. A Heatmap of GC content in different genes or gene groups; B the number of repeats in different species

Fig. 3

The physical plastome map of structural rearrangements among Convolvulaceae. The plastome of Nicotiana tabacum is depicted as a line, and genes are depicted as boxes. Among Convolvulaceae, the syntenic regions are highlighted by arrows of different colors. The direction of each arrow indicates whether the region has the same or the reverse orientation relative to the reference plastome of N. tabacum

Fig. 4

Variations of the IR region boundaries (A) and ancestral state reconstruction of genomic reconfiguration variations in the plastome (B). A, the plastome is depicted as a line and genes are depicted as boxes. The colors of the boxes indicate different functional groups of genes, and box length does not reflect gene length. Genes above the black line are oriented from 5′ to 3′, while those below the line are oriented from 3′ to 5′. The purple dashed lines indicate the boundary of each partition. B, the small pie charts represent the results of the ancestral state reconstruction of IR boundary variations, and the numbers on the branches indicate the ancestral state of genomic reconfiguration variations

Phylogenetic relationship and molecular dating

Phylogenetic topologies of the major clades in Convolvulaceae were inconsistent among the six datasets when inferred with RAxML (Supplementary Fig. S1A). In contrast, the topologies of the six datasets were nearly identical when inferred with IQ-TREE (Supplementary Fig. S1A). Phylogenetic analyses strongly supported the clade of Erycibeae and Cardiochlamyeae as the earliest-diverging group and sister to the other tribes in Convolvulaceae, followed by Cuscuteae, and then Dichondreae and Cresseae. All analyses strongly supported the monophyly of Cuscuta spp. [BS (bootstrap support value) = 100; Supplementary Fig. S1]. Within Cuscuta, the four subgenera were fully resolved as monophyletic groups (Supplementary Fig. S1B). Of these, C. subgen. Monogynella was the first-diverging clade, followed by subgen. Cuscuta, while subgen. Pachystigma and subgen. Grammica formed a clade (BS = 100; Supplementary Fig. S1B).

Divergence time estimation indicated that the divergence between Convolvulaceae and Solanaceae occurred at 71.77 Mya (95% highest posterior density (HPD): 61.11–83.99 Mya), and that the diversification of Convolvulaceae began at 61.55 Mya (95% HPD 58.55–65.19 Mya) (Fig. 1). The crown age of Cuscuta was 53.69 Mya (95% HPD 48.52–58.27 Mya). Cuscuta subgen. Cuscuta diverged from the remaining two subgenera at 48.92 Mya (95% HPD 42.68–54.70 Mya), and these two then diverged from each other at 44.21 Mya (95% HPD 37.29–50.81 Mya). The diversification time of the four subgenera varied from 20.12 Mya (95% HPD 15.29–42.04 Mya) in C. subgen. Monogynella to 35.57 Mya (95% HPD 27.77–43.23 Mya) in subgen. Grammica (Fig. 1).

Variations in plastomic structure and organization

The plastomes of Convolvulaceae species underwent structural reconfiguration through inversion, relocation, and deletion, and showed greater structural reconfiguration than the typical plastome of the outgroup N. tabacum (Fig. 3). Collinearity analysis identified nine types of structural reconfigurations in 13 regions of Convolvulaceae plastomes. Of these regions, eight were found in autotrophic Convolvulaceae (regions 1, 2, 3, 4, 5, 6, 7, and 11), and eleven in Cuscuta spp. (regions 2, 3, 4, 5, 7, 8, 9, 10, 12, and 13) (Fig. 3), with five regions shared by the two lineages. Of the nine types of structural reconfigurations, seven occurred in the plastomes of Cuscuta (Fig. 3). Specifically, C. exaltata exhibited a ~ 12 kb inversion in the LSC region and a ~ 2 kb inversion in the SSC region. C. reflexa and C. japonica each had a ~ 3 kb inversion in the LSC region. Relocation of the rpl32 gene and two inversions (trnF-GAA–trnT-UGU and rpl32) occurred only in subgen. Cuscuta, whereas the rpl32 gene was completely lost in subgen. Pachystigma (Fig. 3). Interestingly, only two species of subgen. Grammica exhibited inversions: C. bonafortunae with a ~ 22 kb inversion in the LSC region, and C. boldinghii with a ~ 5 kb inversion in the LSC region. The ancestral plastome of Cuscuta experienced a loss of region 6 (ndhJ–ndhC) and an inversion of region 13 (rpl32). Moreover, the ancestor of C. subgen. Monogynella underwent reconfiguration of region 12 (ccsA–trnL-UAG), and the ancestor of C. subgen. Cuscuta experienced various structural variations, including the inversion of regions 4 (trnT-UGU) and 5 (trnL-UAA–trnF-GAA), the loss of the IR, and the relocation of region 13 (Fig. 4B).

As hotspots for structural reconfiguration of the plastome, the IR regions displayed expansion, contraction, or even complete loss (Fig. 4), and the IR boundaries exhibited dramatic positional shifts at both IR borders. Notably, the LSC/IR boundary had shifted to incorporate rpl2, rpl23, trnI-CAU, and part of ycf2 into the IR, while the IR/LSC junction involved trnH-GUG, psbA, trnK-UUU, or an intergenic region. Contraction of the IRa 5′ boundary was prevalent in all Convolvulaceae plastomes, while expansion of the IRa 5′ and IRb 3′ boundaries was prevalent in Cuscuta species (Fig. 4A). Expansion and contraction at the IR/SSC boundary influenced the inversion of the ycf1 gene in Poranopsis and Dinetus. Ancestral state analysis indicated that the ancestor of Convolvulaceae had experienced an LSC/IRa boundary shift, and that IR/SSC contraction had occurred early in the history of C. sect. Ceratophorae (Fig. 4B).

Pseudogenization and gene loss

The rpl23 gene was fragmented in three autotrophic genera of Convolvulaceae (Dichondra, Dinetus, and Poranopsis) and was fragmented or even completely lost in parasitic Cuscuta (Supplementary Fig. S2). There were 21 pseudogenes in Cuscuta spp., with a subgenus-specific pattern of gene reduction (Supplementary Table S1; Fig. S2). Nine ndh genes encoding subunits of the NAD(P)H dehydrogenase complex were completely deleted, leaving only a relict ndhB in 21 species (all except C. boldinghii and C. erosa). Additionally, the rpl23, rps16, and trnK-UUU genes were pseudogenized in C. subgen. Monogynella. The ndhD and trnV-UAC genes were lost in C. subgen. Cuscuta. The rpo, psbZ, and trnA-UGC genes were fragmented in C. subgen. Pachystigma. The psaI, matK, rpl32, trnI-GAU, trnR-ACG, trnG-UCC, and trnK-UUU genes were lost in C. subgen. Grammica. More than 90% of the plastid genes encoding the photosystems (psa and psb) were pseudogenized or completely lost in C. erosa, C. boldinghii, and C. strobilacea (Supplementary Table S1; Fig. S2).

The physical or functional loss of plastid genes mainly resulted from deletions, premature stop codons, and loss of the 5′ start region (Supplementary Figs. S4–S6). Functional loss of the atpE gene in five species of C. subgen. Grammica was caused by a single-nucleotide (T) insertion that induced a premature stop codon (Supplementary Fig. S4). The psbZ gene was pseudogenized only in C. africana, C. erosa, and C. boldinghii (Supplementary Fig. S5): a 36 bp deletion in the 5′ end region left only a fragment in C. africana, and a single mutation [tTa (leucine) → tAa (stop codon)] introduced a premature stop codon in C. erosa (Supplementary Fig. S5). All rpl23 genes in Cuscuta were pseudogenized through premature stop codons or large deletions (Supplementary Fig. S6).

Evolutionary analysis of physical and functional gene loss

Both the ‘ER’ likelihood model and the Dollop analyses revealed that nine ndh genes (ndhA, ndhC, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, and ndhK) had already been lost, with all ndh genes showing pseudogenization, at the 3rd node (Fig. 1 and Supplementary Fig. S7). The ‘ER’ model analysis placed the functional loss of ndhD at the 3rd node with 57.3% probability, while the Dollop analysis inferred the loss at the 4th node (Fig. 1 and Supplementary Fig. S7). Both analyses revealed that the rpl23 gene was physically lost at the 7th node, yet its functional loss could have occurred at the 1st node (‘ER’ model) or the 2nd node (Dollop analysis) (Fig. 1 and Supplementary Fig. S7). The functional and physical losses of rps16 were placed at different nodes by the two methods (‘ER’: functional loss at the 3rd node with 57% probability, and physical loss at the 9th node with 86.4% probability; Dollop: functional loss at the 3rd node, and physical loss at the 6th node) (Supplementary Fig. S7). Meanwhile, a partial sequence of the rps16 gene was recovered in eight species (Fig. 1). All analyses found that four tRNA genes (trnG-UCC, trnR-ACG, trnI-GAU, and trnK-UUU) were physically lost at the 6th node.

Both analyses showed that the rbcL gene became a pseudogene at the 10th node, that the matK gene was completely lost at the 6th node, and that the rpo genes had diverse histories of pseudogenization (Fig. 1 and Supplementary Fig. S7). Specifically, functional loss of the rpoA gene occurred at the 5th node, and the Dollop and ‘ER’ likelihood model analyses showed that rpoA, as a pseudogene of nearly complete length, transitioned to a truncated gene at the 5th and 9th nodes, respectively. In addition, physical loss of rpoB was placed at the 6th node by the ‘ER’ model and at the 7th node by the Dollop analysis. All analyses estimated that rpoC2 lost its function at the 5th node and was then completely lost at the 5th and 8th nodes in the ‘ER’ model and the Dollop analyses, respectively (Fig. 1 and Supplementary Fig. S7).

Evolutionary selection of retained genes in Cuscuta

As shown in Fig. 5, ten plastid genes or gene groups in parasitic Cuscuta spp. underwent relaxed selection (K < 1), nine of which (accD, ccsA, clpP, pet, psa, psb, rbcL, rpo, and ycf1 + 2) showed significant relaxation (P < 0.05). Five plastid genes or gene groups (atp, rpl, rps, matK, and ycf3 + 4) experienced intensified selection (K > 1).

Fig. 5

Results of the RELAX analysis of genes and gene groups in Convolvulaceae, with Cuscuta as the test group

Among the photosynthesis-related gene groups, the ycf3 + 4 genes underwent significant intensified selection (K > 1, P < 0.05), whereas four groups (ccsA, pet, psa, and psb) experienced significant relaxed selection (K < 1, P < 0.05). With the exception of ten genes (petG/L/N, psaJ, psbD/H/I/M, rbcL, ycf4), the other genes of the pet, psa, and psb gene groups underwent relaxed selection (K < 1) (Supplementary Fig. S10). The ATP synthase gene group exhibited relaxed selection, but half of the atp genes experienced intensified selection (Fig. 5, Supplementary Fig. S10). In contrast, the protein synthesis gene groups showed intensified selection (rps: K = 1.78, P < 0.01; rpl: K = 1.36, P < 0.01).

Comparative analyses of nucleotide substitution rates showed that the dN and dS values of Cuscuta spp. were higher than those of autotrophic Convolvulaceae in 30 genes and 40 genes, respectively (Supplementary Fig. S9). Within Convolvulaceae, 65 genes in each species exhibited higher dS values than dN values. In addition, Dichondra showed the highest dN and dS values in the entire family (Supplementary Fig. S9).

Phylogeny-based statistical analyses

Phylogeny-based analyses revealed that all structural variations exhibited strong phylogenetic signals (D < 0.0, P < 0.05) (Supplementary Table S2). Variations of IR boundaries also exhibited a strong phylogenetic signal (D < 0.0, P < 0.05), except for type V (D < 1.0, P > 0.05). Most pseudogenes or gene losses in Cuscuta exhibited strong phylogenetic signals (D < -1.0, P < 0.01), except for three genes (atpF, psbM, and rps14), which exhibited an over-dispersed phylogenetic pattern (D > 1.0). Fisher’s exact tests comparing structural variations and IR boundary shifts of the plastome between autotrophic and parasitic Convolvulaceae showed significant differences for all structural variations and five types of IR boundaries (P < 0.05), while other genomic inversions, translocations, and one type of IR boundary shift did not differ significantly (P > 0.05) (Supplementary Table S3). Physical or functional gene loss among the four subgenera of Cuscuta showed the same pattern in 26 genes (P > 0.05) and differed significantly in 29 genes (P < 0.05) (Supplementary Table S3).

Discussion

Diversity of plastid genomic structure in Convolvulaceae

The plastome of angiosperms typically has a highly conserved quadripartite structure similar to that of Nicotiana (Mower and Vickrey 2018). However, structural variations, including gene order changes (inversion and translocation) and IR boundary shifts, have been found in several groups of angiosperms. This study found that structural variations and IR boundary shifts are present in both parasitic Cuscuta and autotrophic Convolvulaceae (Funk et al. 2007; Banerjee and Stefanovic 2019; Park et al. 2019; Lin et al. 2022), and that the IR has been lost in Cuscuta subgen. Cuscuta (Figs. 3 and 4) (Banerjee and Stefanovic 2020). In agreement with a previous study (Lin et al. 2022), we found foreign DNA insertions of around 1.5 and 2.2 kb in Dinetus and Dichondra, respectively. Non-parasitic Convolvulaceae species displayed two different types of structural variations (Fig. 3), with half of the species maintaining the ancestral plastome structure and the other half showing inversions of regions 1–7 and 11 or of regions 5–7 (Fig. 4B). The common ancestor of Cuscuta also underwent an inversion of region 13 and completely lost region 6. Within Cuscuta, the common ancestors of the different subgenera have unique inversions or translocations (Fig. 4). The strong phylogenetic signal in these regions confirmed that these variations are closely tied to phylogeny, emphasizing the need for a comprehensive phylogenetic framework in the analysis of structural variations in Convolvulaceae. Plastome reorganization is believed to be driven mainly in three ways. The first is ebb-and-flow shifts in the IR boundaries, which are thought to be an important means of rearrangement in Geraniaceae (Chumley et al. 2006; Guisinger et al. 2011), Pedicularis (Li et al. 2021), and Convolvulaceae (Fig. 4). For instance, the ycf1 gene of some green Convolvulaceae was inverted through IRa/SSC expansion followed by contraction (Fig. 4A). The second is transposition-mediated plastomic change, as in Trifolium (Cai et al. 2008) and Trachelium (Haberle et al. 2008). The third is homologous or nonhomologous recombination, a primary way of triggering inversions (Palmer 1991). The endpoints of most rearranged regions in Convolvulaceae plastomes are flanked by repeated sequences, which are related to the abundant short and long repeats (Fig. 2). These repeats arise mainly through slippage and mispairing during DNA replication or repair (Palmer 1991). Notably, the plastids of a few Convolvulaceae are paternally inherited, which might lead to intermolecular recombination (Palmer 1991; Yuan et al. 1998). Therefore, more complex modes of plastome evolution, rather than any single mechanism mentioned above, may explain the occurrence of rearrangements.

Consistent with a previous report (Banerjee and Stefanović 2020), we found that the ancestor of C. subgen. Cuscuta lost the IR region (Fig. 4B). The loss of one copy of the IR has also been observed in legumes (Palmer et al. 1987; Wojciechowski et al. 2004; Cai et al. 2008), members of Orobanchaceae (Downie and Palmer 1992; Wicke et al. 2011), Cassytha L. (Song et al. 2017; Yu et al. 2023), Drypetes Vahl (Jin et al. 2020a), Erodium L'Hér. ex Aiton (Guisinger et al. 2011), Lophopyxis Hook. f. (Jin et al. 2020a), and the saguaro cactus (Sanderson et al. 2015). However, the reason for the IR loss in these groups remains unknown. Previous studies of legumes suggested that the IR promotes genome structural stability largely by impeding rearrangement events (Palmer et al. 1987; Palmer 1991; Cai et al. 2008). However, plastomes of C. subgen. Cuscuta showed fewer rearrangements than those of the IR-retaining subgenera of Cuscuta and several autotrophic Convolvulaceae (Fig. 3). Similar findings have also been reported in Geraniaceae (Guisinger et al. 2011), Campanulaceae (Knox 2014), and Lobeliaceae (Knox and Palmer 1999).

Additionally, our analyses revealed that Convolvulaceae exhibited a complex IR boundary shift pattern with six types of IR regions (Fig. 4A). Of these, IRa 5′ contraction is an ancestral trait of Convolvulaceae (Fig. 4B), while IRa 5′ expansion was detected in Erycibe (Lin et al. 2022). Autotrophic Convolvulaceae species exhibited different IR/SSC boundary shifts (Fig. 4), indicating that the family has undergone complex IR boundary variation and unprecedented patterns of IR evolution. Furthermore, IRb 3′ contraction occurred in the common ancestor of Cuscuta sect. Ceratophorae in the subgenus Grammica, excluding C. costaricensis (Banerjee and Stefanović 2019). The IR boundary shifts showed a strong phylogenetic signal, indicating that IR boundary variations carry a significant evolutionary signature in Convolvulaceae (Supplementary Table S2).

How are structural variations triggered in Convolvulaceae? It is believed that the lifestyle change from autotrophy to parasitism is the main factor, triggering genomic structural instability via functional relaxation (Wicke et al. 2013; Wicke and Naumann 2018). Several genes in the inverted fragments (regions 2, 3, 7, 9, 10, and 12) showed significantly relaxed selection and lower GC content (Fig. 2; Supplementary Fig. S10), suggesting that organellar and nuclear genes involved in repairing DNA breaks, mis-incorporated bases, and illegitimate recombinants may also have undergone relaxed selection during coevolution (Barrett et al. 2019; Zhang et al. 2018). Therefore, relaxed selection is likely the driving force behind these fragment variations, acting by introducing destabilizing factors such as large repeats, low GC content, and DNA breaks in Cuscuta.

It is noteworthy that Fisher’s exact tests found no significant difference between autotrophic and parasitic species in Convolvulaceae for inversions or translocations considered separately (P > 0.05, Supplementary Table S3), but a significant difference for all structural variations combined (inversion + translocation) (P < 0.05). This suggests that the lifestyle change is associated with structural variations of the plastome in Convolvulaceae. IR boundary shifts accelerated in Convolvulaceae after the lifestyle transition (Fig. 4). Furthermore, Convolvulaceae showed a distinctive evolutionary history of the plastome and tended toward complex plastome organization. Thus, a comprehensive phylogenetic framework of the family could provide insights into the evolutionary reconstruction of plastome structural variations.
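
The Fisher’s exact test comparison described above can be illustrated with a minimal sketch. The function name and the counts in the 2 × 2 table are hypothetical and are not taken from Supplementary Table S3; the sketch assumes a two-sided test on presence/absence counts of a rearrangement type in parasitic versus autotrophic species, which may differ from the exact procedure used in this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    p = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p)


# Hypothetical counts (illustration only):
# rows = parasitic / autotrophic species
# columns = with / without a given structural variation
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"P = {p_value:.4f}")
```

With small counts such as these, an exact test is preferable to a chi-square approximation, which is presumably why it suits comparisons across a limited number of sampled species.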

Evolutionary history of plastid genes in Cuscuta

Massive gene loss was detected in Cuscuta, consistent with previous studies (McNeal et al. 2007; McNeal et al. 2009; Banerjee and Stefanović 2019, 2020) (Supplementary Fig. S2). Ancestral state reconstruction revealed that the trajectory of plastid gene loss in Cuscuta can be described as five stages: NAD(P)H complex (ndh genes) → PEP complex (rpo genes) → Photosynthesis-related (psa/psb, pet, ycf3/4, cemA, and ccsA genes) → Ribosomal protein subunits (rps and rpl) → ATP synthase complex (atp genes), which is similar to the findings of Banerjee (2022). In addition, housekeeping genes also experienced concomitant degradation in Cuscuta (Fig. 1). The trajectories of physical and functional gene loss appeared to be similar to those of other heterotrophic plants (Barrett and Davis 2012; Wicke and Naumann 2018), supporting the models proposed by Wicke et al. (2016) and Graham et al. (2017). However, Cuscuta displayed some differences in the trajectory. Photosynthesis-related genes appeared to be retained longer, although there were small losses of peripheral genes before the full-scale loss of photosynthetic genes. A similar pattern of gene loss has also been found in Corallorhiza (Barrett and Davis 2012; Barrett et al. 2014).

Plastid genes in Cuscuta exhibited an increasing disruption of evolutionary stasis compared with their autotrophic counterparts in Convolvulaceae. In autotrophic Convolvulaceae, only the rpl23 gene lost its function, in the plastomes of three autotrophic genera in this study (Fig. 1) as well as in the study of Lin et al. (2022). Ancestral state reconstruction suggested that the common ancestor of Convolvulaceae may have lost the function of the rpl23 gene (Fig. 1 and Supplementary Fig. S7). This study found that IRb 3′ expansion disrupts the open reading frame of the rpl23 gene in autotrophic Convolvulaceae, and the ancestral state reconstruction showed that the occurrence of IRb 3′ expansion was associated with functional loss of the rpl23 gene (Fig. 4). These results indicate that loss of plastid genes might be linked to IR expansion or contraction (Li et al. 2021). Pseudogenization and gene loss were progressive through the evolutionary divergence of subgenera in Cuscuta. Fisher’s exact tests showed differences among subgenera in the rpo and pet gene groups and the tRNA complex, amounting to around one-third of the tested genes (Supplementary Table S3). Combined with ancestral state reconstruction, these results show that these genes were progressively and repeatedly lost in the common ancestor of each subgenus (Fig. 1). Notably, the clade C. erosa–C. strobilacea in C. subgen. Grammica explosively lost a series of photosynthesis-related genes, suggesting that this subgenus is undergoing continuous evolutionary change, with increased disruption of evolutionary stasis (Braukmann et al. 2013; Banerjee and Stefanović 2019). Banerjee and Stefanović (2023) reported an evolutionary model in 12 sections of C. subgen. Grammica and proposed a refined model specific to Cuscuta. However, the finer details of gene reduction in C. subgen. Grammica remain unclear because species-level sampling is insufficient to resolve the lineage-specific evolution of plastome reduction, which calls for broad and comprehensive taxon sampling across the whole genus Cuscuta (Barrett et al. 2019). Moreover, the rpl32 and rps16 genes were frequently lost in other heterotrophic lineages (Barrett et al. 2014; Cusimano and Wicke 2016; Chen et al. 2020). Because they are essential components of the plastid translation apparatus, they were likely transferred to the nucleus or replaced by nuclear gene copies (Fleischmann et al. 2011; Park et al. 2015; Shrestha et al. 2020). Therefore, the loss of photosynthesis and the relaxed requirement for plastid translation may trigger frequent transfer of ribosomal protein genes from plastids to the nucleus.

Most gene loss and/or pseudogenization events showed significant phylogenetic signals (Supplementary Table S2), which indicates that gene loss and pseudogenization are strongly structured by relatedness among species. Hence, related Cuscuta species shared the same pattern of gene loss and plastome degradation (Fritz and Purvis 2010). For example, members of C. subgen. Grammica shared pseudogenization and gene loss in the same gene groups (Supplementary Fig. S2), and ancestral state reconstruction showed that these genes were lost in the common ancestor of C. subgen. Grammica. In addition, our findings support the hypothesis that different evolutionary processes can produce similar phylogenetic signals (Revell et al. 2008). In other words, genes that exhibit strong phylogenetic signals may have undergone different degradation processes. For example, the petA gene, which exhibited a strong phylogenetic signal, followed two degradation pathways. In the first, the petA gene lost its function in the common ancestor of Cuscuta, then recovered repeatedly, and finally was completely lost in the clade C. erosa–C. strobilacea of C. subgen. Grammica (Fig. 1). In the second, the functional loss of the petA gene occurred independently in C. japonica (Supplementary Fig. S7), rather than in the common ancestor of Cuscuta. These results indicate that using phylogenetic signals to directly infer evolutionary process has limitations (Revell et al. 2008). Furthermore, our results support the idea that a strong phylogenetic signal is generally uncorrelated with evolutionary rate (Revell et al. 2008). For example, both petB and petD exhibited a strong phylogenetic signal (Supplementary Table S2), even though petD underwent relaxed selection while petB did not (Supplementary Fig. S10). Therefore, caution is necessary when inferring evolutionary rate or process from measures of phylogenetic signal.

Mystery of plastomic degradation

Plastome degradation involves more than just genome size reduction and gene loss (Wicke and Naumann 2018; Barrett et al. 2019). It also includes rearrangement, IR boundary shifts, nucleotide changes, increased repeat number, and altered evolutionary rates (Wicke et al. 2013; Wicke and Naumann 2018; Barrett et al. 2019). To understand these changes, it is necessary to consider the common ancestor of Cuscuta rather than the extant taxa alone. Our study revealed that the divergence time of Cuscuta (53.69 Mya) was similar to the emergence of parasitism in Orobanchaceae (49.68 Mya) (Yu et al. 2018); however, Cuscuta has lost its lineages with a hemiparasitic lifestyle. This suggests that Cuscuta could have had more time to accumulate mutations (Barrett and Davis 2012). Previous studies have consistently shown that more genes are under relaxed selection in Cuscuta than in autotrophic lineages of angiosperms (McNeal et al. 2007; Banerjee and Stefanović 2020).

This study showed that 25 out of 65 genes in Cuscuta underwent relaxed selection (Supplementary Fig. S10), and nearly half of the gene groups showed significantly relaxed selection (Fig. 5), which coincided with the degree of plastomic degradation. Specifically, genes with significantly relaxed selection were lost in only a few species. Interestingly, we observed microstructural changes and base substitutions in these genes (Supplementary Figs. S4–S6). For instance, internal stop codons, large deletions, and fragmented reading frames marked the physical or functional loss of rpl23 genes in Cuscuta (Supplementary Fig. S6). The psbZ gene lost its function through a 36-bp deletion in the 5′ end region in C. africana and through a point mutation [tTa (Leucine) → tAa (stop codon)] in C. erosa and C. boldinghii, leading to its pseudogenization (Supplementary Fig. S5). This suggests that small indels may accumulate under relaxed selection (Barrett and Davis 2012; Wicke and Naumann 2018). In addition, codon usage biased toward stop codons may also indicate micro-mutations resulting in functional or physical gene loss (Supplementary Fig. S3).
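
As an illustration of how a single substitution such as tTa → tAa introduces a premature stop codon, the following minimal sketch scans a reading frame for internal stop codons. The toy sequence and the helper names are hypothetical and do not represent the real psbZ sequence or the screening procedure used in this study; the stop codons TAA, TAG and TGA are those of the standard genetic code, which is assumed here to apply to plastid genes.

```python
# Stop codons of the standard genetic code (assumed to apply to plastid genes)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_internal_stop(cds):
    """Return the 0-based codon index of the first premature stop, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore trailing incomplete codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is the expected terminator, so it is not scanned
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def substitute(cds, position, base):
    """Return a copy of cds with the base at a 0-based position replaced."""
    return cds[:position] + base + cds[position + 1:]


# Hypothetical toy reading frame (illustration only): ATG TTA GGT TAA
intact = "ATGTTAGGTTAA"
# Second base of the second codon: TTA (Leu) -> TAA (stop)
mutant = substitute(intact, 4, "A")

print(first_internal_stop(intact))
print(first_internal_stop(mutant))
```

A reading frame flagged in this way would be a candidate pseudogene, although confirming loss of function would still require inspecting the alignment, as in Supplementary Fig. S5.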

We hypothesize that the plastome of Cuscuta has been undergoing functional decline since the transition to parasitism, which relaxes purifying selection on genes and allows point mutations to accumulate, with consequences such as decreased GC content, increased repeats, and gene loss. The accumulation of small mutations may in turn lead to deletions of genic or intergenic regions and to genomic structural variations (Krause 2008; Barrett and Davis 2012; Wicke and Naumann 2018). This may explain why the common ancestor of Cuscuta experienced severe plastome degradation and an accelerated disruption of evolutionary stasis. Although many parasitic angiosperms diverged earlier than Cuscuta, there are fewer genomic evolutionary studies of these parasites. Therefore, our study may provide insight into the plastome evolution of basal parasites and holoparasitic lineages. Furthermore, Cuscuta has a smaller plastome than free-living Convolvulaceae species, whose plastomes are larger and carry insertions. Thus, our findings support the hypothesis that heterotrophic plants tend to shrink their plastid genomes because of the bioenergetic costs of maintaining DNA (Barrett and Davis 2012). This genomic degradation may help the parasite synchronize its physiology with that of its host, thereby optimizing parasite fitness (Shen et al. 2020).

Conclusions

Our study demonstrated that the lifestyle change to parasitism is associated with structural variations of the plastome in Convolvulaceae, and showed that the evolution of plastid genes in Cuscuta can be described as five stages. Moreover, our findings highlighted both the value and the limitations of using phylogenetic signals to explain evolutionary history and rates. Additionally, we observed that the plastomes of Convolvulaceae break the evolutionary stasis, with one clade exhibiting an accelerated degradation rate. Based on our results and previous studies (Wicke and Naumann 2018; Barrett et al. 2019; Shen et al. 2020), we can speculate about the degradation process of the plastome and its driving force. Overall, our study offers a unique perspective on molecular evolution and genomic structural variation.