Introduction

Photosynthesis is the hallmark of green-plant evolution; however, this process has been lost repeatedly, and such losses are now most often observed in flowering plants. As a consequence, parasitic plants rely on photosynthesis to a reduced degree or not at all and instead depend on autotrophs to meet their carbon budget. True parasitism, which requires the forging of direct connections between the vascular systems of the host and parasite via haustoria, has evolved independently at least 13 times across angiosperm lineages and is associated with sweeping morphological, physiological and genomic changes brought about by a decreased reliance on, or complete lack of, photosynthesis (Barkman et al. 2007; Colwell 1994; Kuijt 1969; Westwood et al. 2010). This so-called ‘parasitic reduction syndrome’ defines a set of convergent evolutionary changes shared amongst distinct groups of heterotrophs (including both haustorial parasites and mycotrophic plants) and provides a fertile system to study genomic changes caused by a relaxation of erstwhile strong selective pressures.

The genomic changes observed as a result of a transition to a parasitic lifestyle are expected to be particularly striking in the plastid genomes (plastomes) of heterotrophic plants given the importance of plastids as the site of photosynthesis in the cell. Photosynthetic land plants typically have plastomes that are subjected to strong purifying selection and are consequently highly conserved in terms of their size, gene content, and sequence order (Wolfe et al. 1992). They are usually 140 to 160 kbp in length (e.g., Nicotiana tabacum: 155,939 bp, Arabidopsis thaliana: 154,478 bp), coding for c. 113 genes and contain four major structural components: the large and small single copy regions (LSC and SSC respectively) and two inverted repeat regions (IRs) (Downie and Palmer 1992; Wicke et al. 2011). Most plastome genes code for key portions of the photosynthetic apparatus or for plastid housekeeping functions, although there are a few genes with other or unknown functions (Braukmann et al. 2017; Wicke et al. 2011). Hence, plastomes in heterotrophic plants are often found to have undergone drastic reductions in size and gene composition. A series of recent studies highlight heterotrophic plastomes of severely reduced length and content (Barrett et al. 2018; Bellot and Renner 2015; Braukmann et al. 2017; Naumann et al. 2016), and there is even indication that plants may have lost their plastid genomes entirely (e.g., Rafflesia lagascae) (Molina et al. 2014).

Comparative analyses of plastomes along the full trophic spectrum, from autotrophy to mixotrophy to full heterotrophy, would allow for assessment of the degree to which genomic changes take place prior to complete loss of photosynthesis, and the dissection of the evolutionary pressures on plastomes exerted by their other metabolic functions. Over time, as ever greater numbers of parasitic and mycoheterotrophic plant plastomes are being reported, the general pattern of sequence loss linked to heterotrophy is emerging across the angiosperms: ndh genes (primarily responsible for mitigating the effects of photo-oxidative stress) are usually the first system of genes to be lost, followed by groups of photosynthesis-related genes (psa, psb, pet etc.), while a core group of non-bioenergetic genes are generally retained even in severely reduced genomes (Barrett and Davis 2012; Graham et al. 2017). However, these initial observations still need broader support from plastomes in heterotrophic lineages which are underrepresented in current research. Furthermore, the currently available plastome data are mainly coarse in the scale of their sampling as far as phylogenetic distance between species is concerned, with notable exceptions being provided by the recent work on Corallorhiza (Orchidaceae; Barrett et al., 2018) and on Aphyllon (Orobanchaceae; Schneider et al., 2018). Fine-scale comparisons of plastomes amongst closely related species are more likely to fully capture the sequence of evolutionary events as plants change along the gradient from photosynthetic to holoparasitic.

Owing to its species richness, broad distribution, and the fact that it represents one of the 13 independent transitions to haustorial parasitism within the angiosperms, Cuscuta (dodders, Convolvulaceae) is one of the most studied lineages of parasitic plants. While these plants have a nearly world-wide distribution, c. 75% of the generic diversity is in the Americas (Garcia et al. 2014; Yuncker 1932). This lineage is comprised of c. 200 branch parasites characterized by scale-like leaves, twining, pale, slender stems and an absence of roots. Their unusual colour is attributed to absent or reduced accumulation of chlorophylls (van der Kooij et al. 2000), and species in this genus are all unable to meet their food requirements on their own (i.e., they are obligate parasites). Most Cuscuta species are thus nearly completely dependent upon their hosts and forge direct xylem and phloem connections with green plants for sourcing water and nutrients (Heide-Jorgensen 2008). Some dodders are, however, capable of limited and localized photosynthesis (Dawson et al. 1994; Hibberd et al. 1998) and have been referred to as ‘cryptically photosynthetic’ (McNeal et al. 2007a). This feature makes Cuscuta a heterogeneous group from a trophic point of view (Westwood et al. 2010), containing both hemi- and holoparasitic representatives. Also, phylogenetically, this genus is well-understood and circumscribed in four subgenera (Grammica, Pachystigma, Cuscuta, and Monogynella) and 19 sections (Costea et al. 2015).

Despite the intensity of scrutiny on the dodders over the years and the relatively large size of the genus, only four plastomes from Cuscuta have been published to date (Funk et al. 2007; McNeal et al. 2007b), two each from the subgenera Monogynella and Grammica (essentially representing two data points, owing to phylogenetic proximity). The plastomes from subg. Monogynella are ~ 120 kbp long (with c. 103 genes) while those from subg. Grammica are ~ 85 kbp in length (with c. 92 genes), demonstrating a great degree of variation in plastome length and gene composition in the genus. However, the tempo and mode of plastid genome evolution across the diversity of the group remain largely unknown. Recognizing the potential importance of Cuscuta as a system of closely-related plants that seem to exhibit fine-scale variation in a typically conserved and slow-evolving genome, Braukmann et al. (2013) surveyed the genus for the presence or absence of 48 protein-coding plastome genes across a phylogenetically diverse sample of 112 species through a Southern hybridization slot-blot assay. They confirmed trends of gene loss across the dodders (e.g., all ndh genes, rpl23, rpl32, rps16) but were also able to pinpoint two sections, both within the largest subgenus Grammica, that exhibited more varied and extensive reductions as particularly interesting for further studies (Braukmann et al. 2013).

One of these groups, section Ceratophorae, is remarkable for the variation in plastome content apparent among the five species sampled initially (Braukmann et al. 2013). This complex seemed to contain members that have ancestral plastomes (i.e., similar to previously reported Grammica samples) as well as those that appeared highly modified. In other words, sect. Ceratophorae presents a gradient of plastome degradation in what is phylogenetically a group of very closely related species. This pattern appeared to be congruent with a progressive gradual change, as described by the Evolutionary Transition Series hypothesis (Mayr 1982; Young et al. 1999), and offers an opportunity to study the evolution of parasitic plastomes ‘in action’. The broad survey by Braukmann et al. presented interesting trends but was incomplete in terms of taxonomic sampling and gene coverage, contained no sequence data, and was thus unable to provide information at the most basic, sequence level, including none on elements like ribosomal and transfer RNA genes, introns, promoters, etc. Our current investigation builds on their work and comprehensively examines the evolution of the plastid genome in sect. Ceratophorae through the assembly of plastomes from all eight known species in this clade in order to learn how sequence changes accumulate in Cuscuta. We also compare these trends to the patterns of plastome evolution observed in other lineages of heterotrophic plants.

Materials and methods

Taxon sampling, DNA extraction, and sequencing

Cuscuta section Ceratophorae was essentially exhaustively sampled; all eight species confirmed to belong to this clade (Costea et al. 2015, 2011) were used for our study: C. boldinghii Urb., C. bonafortunae Costea and I. Garcia, C. carnosa Costea and Stefanovic, C. chapalana Yunck., C. costaricensis Yunck., C. erosa Yunck., C. mexicana Yunck., and C. strobilacea Liebm. (voucher information available on NCBI; accession numbers listed in Table 1). A potential ninth species (C. ortegana) has been previously placed in Ceratophorae but there are doubts as to whether it actually belongs in this section, and it has therefore been excluded here. Total genomic DNA was isolated from fresh, herbarium, or silica-dried tissue using the modified cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle 1987) and checked for quantity and quality using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific). Extractions were sent for high-throughput sequencing on either an Illumina HiSeq 2500 platform (2 × 126 bp paired-end reads; The Centre for Applied Genomics, SickKids Hospital, Toronto, Ontario) or an Illumina HiSeq 2000 platform (2 × 100 bp paired-end reads; McGill University and Genome Quebec, Montreal, Quebec). Demultiplexing of raw reads and the removal of indexing barcodes were performed at the sequencing facilities.

Table 1 Plastid genome size and structure information for the eight species assembled from the Cuscuta section Ceratophorae

Plastome assembly, annotation, and correlation with Southern hybridization results

Reads were trimmed using Sickle v1.33 (Joshi and Fass 2011) with minimum post-trim read lengths set at 71 bp for the 2 × 100 bp reads and 99 bp for the 2 × 126 bp reads, and the threshold for quality set at a minimum PHRED score between 25 and 27 at each nucleotide. Several separate assemblies were conducted in the native Geneious (R9 and R10) (Biomatters, Auckland, New Zealand) de novo assembler using 10–15% of total paired-end reads (between 35,110,054 and 81,302,878 reads for each species) with the ‘produce scaffolds’ and ‘don’t merge variants’ boxes unchecked. The derived contigs were aligned and joined, with remaining gaps closed manually on Geneious.
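
The trimming criteria above can be illustrated with a minimal sketch. It is a simplified sliding-window trimmer written for illustration only and is not Sickle's implementation; the function names, the window size of 10, and the 3'-only trimming are assumptions, while the quality threshold (25) and minimum lengths (71 bp and 99 bp) are taken from the text.

```python
def trim_read(quals, min_qual=25, window=10):
    """Return how many leading bases to keep from a read.

    Simplified illustration (NOT Sickle's algorithm): scan windows from
    the 5' end and cut at the start of the first window whose mean PHRED
    score falls below min_qual. The window size is an assumed value.
    """
    n = len(quals)
    if n < window:
        # Reads shorter than one window are kept or dropped as a whole.
        return n if n and sum(quals) / n >= min_qual else 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_qual:
            return start
    return n


def passes_length_filter(kept_length, min_length):
    """True if the trimmed read is at least min_length bases long."""
    return kept_length >= min_length


# A 100 bp read whose last 20 bases are of low quality.
read_100 = [35] * 80 + [10] * 20
kept = trim_read(read_100)
keep_read = passes_length_filter(kept, 71)  # 71 bp minimum for 2 x 100 bp reads
```

Under these assumptions, a read is retained only if enough high-quality sequence survives trimming, which is the role played by the 71 bp and 99 bp minimum lengths in the text.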

Initial annotations for Cuscuta costaricensis were conducted on Geneious using Cuscuta obtusiflora (McNeal et al. 2007b) as a reference and then refined and confirmed using BLASTx searches (Altschul et al. 1990). The fully annotated C. costaricensis plastome was then used as a reference for the annotation of the remaining seven species in this study. Cuscuta costaricensis was chosen for this role because it has been shown to be sister to the rest of section Ceratophorae in previously published and well-supported phylogenies (Costea et al. 2011). BLASTx was used to establish all open reading frames and to identify pseudogenization events while tRNAscan-SE 2.0 was used to identify and determine the boundaries of tRNAs (Lowe and Chan 2016). The genomes were aligned natively on Geneious for comparative analyses regarding size and composition, and using the progressiveMauve system (Darling et al. 2010) for structural comparisons.

The results generated by the whole plastome assemblies produced in this research were correlated with the results of the Southern hybridization survey conducted for the same species complex in Braukmann et al. 2013. Their survey had included five of the eight species included here. The slot-blot probe results were classified either as full strength ‘positive signal’ or as ‘reduced or no signal’ while the genes from the whole plastome data were denoted ‘present’ as open reading frames and considered functional, ‘pseudogenized’ and considered non-functional, or entirely ‘absent’. A goodness-of-fit (Chi square) analysis of independence (with one degree of freedom) was conducted for each species to compare the results from the two studies.
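
The comparison described above can be sketched as a two-category chi-square goodness-of-fit test with one degree of freedom. This is a minimal illustration under stated assumptions, not the authors' script: the three plastome categories are assumed to be collapsed into 'present' versus 'pseudogenized or absent', the slot-blot counts are treated as the expected values, and the example counts are hypothetical rather than taken from Table 3.

```python
import math


def chi_square_gof_two_categories(observed, expected):
    """Chi-square goodness-of-fit test for two categories (1 d.f.).

    observed, expected: pairs of counts, e.g. (present, not_present).
    Returns (statistic, p_value). For one degree of freedom the upper
    tail probability equals erfc(sqrt(statistic / 2)).
    """
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("exactly two categories are required")
    if sum(observed) != sum(expected):
        raise ValueError("observed and expected totals must match")
    if min(expected) <= 0:
        raise ValueError("expected counts must be positive")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for one species (NOT data from this study):
# plastome assembly -> 30 genes present, 25 pseudogenized or absent
# slot-blot survey  -> 33 positive signal, 22 reduced or no signal
statistic, p_value = chi_square_gof_two_categories((30, 25), (33, 22))
```

A p value above the chosen significance level means the test cannot reject the null hypothesis that the two datasets agree.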

Phylogenetic analyses and computational methods

All protein coding regions greater than 150 bp in length present across the eight Ceratophorae plastomes sampled were aligned and used for the whole-plastome phylogenetic reconstruction. This amounted to 27,587 bp in the dataset. Cuscuta obtusiflora, also in subgenus Grammica and whose plastome has been previously published (McNeal et al. 2007b), was included as an outgroup. jModelTest 2 (Darriba et al. 2012; Guindon and Gascuel 2003) was used to infer the best model for DNA substitution rates (identified to be GTR + G + I for these data). Given the moderate number of operational taxonomic units, a branch-and-bound search was conducted for the best tree in PAUP v4.0 (Wilgenbusch and Swofford 2003) ensuring the recovery of all optimal trees. Both maximum likelihood (ML) and parsimony were used as optimality criteria. The software was allowed to estimate all parameters of the model, including the frequency of state changes and the rate of site-to-site variations. To provide statistical support, bootstrapping was conducted under ML, with a full heuristic search, tree-bisection and reconnection branch swapping, and 10,000 replicates.
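
The resampling step that underlies nonparametric bootstrapping can be sketched as follows. This is a generic illustration of how a single pseudoreplicate alignment is drawn, not a description of PAUP's internals; the function name and the toy alignment are hypothetical, and the tree search performed on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw one bootstrap pseudoreplicate from an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so results are reproducible.
    Columns are sampled with replacement, and the same columns are used
    for every taxon so that site patterns stay intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"taxon_a": "ACGTACGT", "taxon_b": "ACGTACGA"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

In a full analysis, a tree would be inferred from each replicate and the support for a clade would be the proportion of replicate trees that contain it.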

The dN and dS branch lengths for seven protein coding genes (accD, ycf1, ycf2, atpB, atpH, clpP and rpl20), chosen as representatives from the major functional groups of genes that are retained in the plastomes of all eight species sampled, were estimated based on the same model of DNA evolution and using the fixed topology obtained from the above whole-plastome ML analysis. For dN trees, only bases occupying the first and second codon positions were used to calculate branch lengths because substitutions in these bases generally lead to nonsynonymous amino acid changes. Only third codon bases were used to calculate branch lengths for the dS trees because substitutions involving the wobble position usually cause synonymous change. The type of selection was estimated on a codon-by-codon basis using a fast, unconstrained Bayesian approximation for inferring selection (FUBAR) (Murrell et al. 2013) on the platform HyPhy (Pond et al. 2005) using 5 independent MCMC chains to obtain posterior samples of grid point weights based on 2,000,000 total steps and sampling every 10,000 steps. The phylogenetic relationships used for the FUBAR analysis were inferred using plastome data in this research (as described previously).
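
The codon-position partitioning described above can be sketched as follows. This is a minimal illustration that assumes an in-frame coding sequence whose length is a multiple of three; the function name and the toy sequence are hypothetical.

```python
def split_codon_positions(cds):
    """Split an in-frame coding sequence by codon position.

    Returns (pos12, pos3): first and second codon positions concatenated
    (the proxy used here for nonsynonymous change) and third (wobble)
    positions (the proxy used here for synonymous change).
    """
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    pos12 = "".join(cds[i:i + 2] for i in range(0, len(cds), 3))
    pos3 = cds[2::3]
    return pos12, pos3


# Toy sequence with three codons: ATG GCC TAA
pos12, pos3 = split_codon_positions("ATGGCCTAA")
```

Applying the same split to every sequence in a gene alignment yields the two data partitions from which the dN and dS branch lengths are estimated on the fixed topology.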

Results

Plastome reductions in Cuscuta sect. Ceratophorae

Closed circular plastomes were assembled for each of the eight species sampled from section Ceratophorae, ranging from 86,691 bp to 60,959 bp in length (Table 1, Fig. 1). Three species (Cuscuta boldinghii, C. strobilacea, and C. erosa) are substantially more reduced than the other five ingroup species, and represent a 60–62% sequence reduction compared to the photosynthetic Ipomoea nil, also belonging to the morning glory family.

Fig. 1

Annotated plastid genomes from eight species belonging to Cuscuta section Ceratophorae created using OrganellarGenomeDRAW v1.3.1 (Greiner et al. 2019). For legibility, only one of the two inverted repeat regions is represented. Pseudogenes are represented by the ψ symbol beside the gene labels. Sequence inversions are shown by the gray dashed lines. The phylogenetic relationships are based on the whole-plastome phylogenetic analysis (compare with tree in Fig. 3)

In terms of gene composition, plastomes in this group contain between 32 and 59 protein coding genes, and between 21 and 25 tRNA genes. Cuscuta costaricensis is the largest plastome with 59 protein coding and 25 tRNA genes, and shows a 21% composition reduction compared to Ipomoea nil (whose plastome contains 78 protein coding and 30 tRNA genes) (Park et al. 2018). The most diminished genome, Cuscuta boldinghii, contains 32 protein coding and 25 tRNA genes, a reduction of 46% (Table 1). These results confirm the expectation of substantial variation in terms of genome size and composition within this group of very closely related plants.
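
The composition reductions quoted above can be reproduced with a short calculation. The text does not state how the percentages were tallied; one reading that yields 21% and 46% is to count protein coding, tRNA, and rRNA genes together. The value of four rRNA genes per plastome is an assumption made for this sketch and is not given in this passage, and the function names are hypothetical.

```python
def percent_reduction(reference_count, focal_count):
    """Percentage by which focal_count falls short of reference_count."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    return 100.0 * (reference_count - focal_count) / reference_count


# ASSUMPTION: four rRNA genes in each plastome (not stated in the text).
RRNA_GENES = 4


def total_genes(protein_coding, trna, rrna=RRNA_GENES):
    """Unique gene count under the assumed tallying scheme."""
    return protein_coding + trna + rrna


ipomoea_nil = total_genes(78, 30)      # photosynthetic reference
c_costaricensis = total_genes(59, 25)  # largest Ceratophorae plastome
c_boldinghii = total_genes(32, 25)     # most reduced plastome

reduction_costaricensis = percent_reduction(ipomoea_nil, c_costaricensis)
reduction_boldinghii = percent_reduction(ipomoea_nil, c_boldinghii)
```

Under this assumption the two values round to 21% and 46%; counting protein coding and tRNA genes alone would instead give approximately 22% and 47%.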

Similar to other sequenced Cuscuta species (Funk et al. 2007; McNeal et al. 2007b), the whole ndh family of genes has been lost from all Ceratophorae plastomes along with the rpo genes. The maturase-encoding matK, photosystem genes psaD and psaI, ribosomal protein coding rpl23, rpl32 and rps16 genes, and ycf15 of unknown function are also missing from the eight plastomes sampled (Fig. 2). In addition to these reductions, a subclade containing Cuscuta strobilacea, C. boldinghii and C. erosa has also lost most of the cytochrome b6/f (pet) and photosystem I and II (psa and psb) genes, pivotal to the functioning of the photosynthetic apparatus (Figs. 2 and 4). In contrast, all ATP synthase (atp) genes (except for atpF in C. boldinghii) and most ribosomal protein coding genes have been retained even in the smallest Ceratophorae plastomes.

Fig. 2

Heat map showing plastome sequence content in the eight species assembled from Cuscuta section Ceratophorae. Genes represented in black are present as open reading frames and considered functional, those in gray are present as pseudogenes, and those in white are absent. For legibility, the four rpo genes have been excluded from this figure because they are all absent from these species. The phylogenetic relationships are based on the whole-plastome phylogenetic analysis (compare with tree in Fig. 3)

Table 2 shows presence and absence data for tRNA genes in the genomes assembled. trnG-UCC, trnI-GAU, trnK-UUU, trnR-ACG and trnV-UAC have been lost completely in sect. Ceratophorae, as is the case for C. campestris and C. obtusiflora in sect. Cleistogrammica (24 tRNA genes). However, sect. Ceratophorae maintains trnA-UGC in seven of its eight species (all except for C. bonafortunae), a gene absent from both known sect. Cleistogrammica plastomes. Cuscuta mexicana is also missing trnR-UCU and trnS-GCU, while C. carnosa has lost trnC-GCA, trnD-GUC, trnE-UUC, and trnY-GUA (Table 2).

Table 2 tRNA gene presence (+) and absence (−) in the eight species assembled from Cuscuta section Ceratophorae

Correlation with the Southern hybridization survey results

As mentioned before, Braukmann et al. (2013) conducted a comprehensive Southern hybridization survey of protein coding genes, encompassing nearly 60% of the known species diversity in Cuscuta and including five sect. Ceratophorae species. Table 3 compares the results of that study with the whole-plastome analysis conducted here. In general, there is good correlation between the presence and absence information generated by Braukmann et al. and the next-generation sequencing (NGS) approach used here, with 239 of 275 results in agreement. Pairwise Chi squared analyses comparing the two datasets found the differences between them to be non-significant for each of the five comparable species with p values ranging from 0.21 to 0.62 (Table 3).

Table 3 Correlations between the whole plastome data generated in this study and the results of the Southern hybridization survey conducted for the same species complex by Braukmann et al. 2013

Pseudogenes

Ten pseudogenization events are recorded in these plastomes (Figs. 2 and 4), based on observations of premature stop codons present early on in the ORFs, high sequence divergence, or missing start codons. atpF is pseudogenized in Cuscuta boldinghii due to multiple stop codons beginning approximately two-thirds of the way through the gene, caused by a frameshift mutation. The same is true of ccsA in C. carnosa (approximately in the first quarter of the sequence). Single stop codons, likely due to single substitutions, have caused pseudogenization events in cemA and rbcL in C. mexicana, as well as in rpl36 along the branch leading to C. bonafortunae, C. strobilacea, C. boldinghii and C. erosa. High sequence divergence has caused the absence of recognizable ORFs at the DNA- as well as the protein-level for petG, psaJ and psbT in the three most reduced species, with petG and psbT eventually being completely lost in C. boldinghii and C. erosa, respectively. Finally, missing start codons have led to the losses of ORFs for psbL in C. bonafortunae, psbM in C. carnosa and psbZ in C. strobilacea and C. erosa, followed by its total removal from C. boldinghii.
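
Two of the criteria above, missing start codons and premature stop codons, lend themselves to a simple check. The sketch below is an illustration only and is not the BLASTx-based procedure used in this study; it assumes an ATG start codon and the stop codons TAA, TAG, and TGA, it ignores alternative start codons and RNA editing, and its function names and category labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Index (in codons) of the first in-frame stop codon, or None."""
    seq = cds.upper()
    for k in range(len(seq) // 3):
        if seq[3 * k:3 * k + 3] in STOP_CODONS:
            return k
    return None


def classify_orf(cds):
    """Classify a candidate coding sequence by simple ORF criteria.

    Assumes an ATG start codon; alternative starts and RNA editing,
    both known from plastids, are deliberately ignored in this sketch.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    if not seq.startswith("ATG"):
        return "missing start codon"
    stop = first_stop_index(seq)
    if stop is None:
        return "no stop codon"
    if stop < n_codons - 1:
        return "premature stop"
    return "intact"


# Toy sequences (hypothetical, for illustration only).
intact = classify_orf("ATGAAAGGGTAA")      # stop is the final codon
truncated = classify_orf("ATGTAGGGGTAA")   # stop at the second codon
no_start = classify_orf("CTGAAAGGGTAA")    # start codon lost
```

The position returned by first_stop_index, divided by the number of codons, gives the relative location of a premature stop along the gene, comparable to the 'two-thirds' and 'first quarter' descriptions above.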

Structural changes

Two sequence inversions have been found, as marked in Fig. 1: one between cemA and trnC-GCA (~ 23,100 bp) in Cuscuta bonafortunae, and the other between rpl2 and rps8 (~ 3900 bp) in C. boldinghii. Aside from these two cases, there are no other major plastome rearrangements or structural mutations in section Ceratophorae (as shown in supplementary Figure S1).

The basic four-part structure (IR-SSC-IR-LSC) of the plastome molecule is maintained in the species assembled, with no major sequence ebb-and-flow between these components. The only minor change noted is the movement of tRNA-Ile (CAU) from the IR to the LSC in all Ceratophorae plastomes other than C. costaricensis, which retains this tRNA gene in the IR like the other Cuscuta genomes published thus far and like photosynthetic members of the Convolvulaceae.

Phylogenetic relationships and selection on genes

Branch-and-bound searches using the whole-plastome sequence data produced the same single best topology regardless of which optimality criterion (ML or parsimony) was employed. The phylogeny created using whole-plastome alignments under maximum likelihood is shown in Fig. 3 and has been used throughout this paper and in all figures.

Fig. 3

Maximum likelihood tree obtained through a branch and bound search under the GTR + I + G model generated using 27,587 bp aligned from the whole plastomes dataset assembled in this study. All genes common to the eight Cuscuta section Ceratophorae species (at least 150 bp in length) were used to populate the dataset. ML bootstrap support values have been provided at nodes. Numbers following species names correspond to internal DNA accessions

Figure 5 shows the branch lengths of trees generated for the protein coding genes accD, ycf1 and ycf2 for sites that are assumed to accrue more nonsynonymous substitutions (i.e., the first two codon positions, dN in Fig. 5) vs. those that are expected to accrue more synonymous substitutions (the wobble position, dS). For each of the three genes, the branch lengths, and thus the amount of evolutionary change, accumulated in the dS tree is greater than that in the dN tree indicating synonymous changes are more prevalent than nonsynonymous ones in Ceratophorae plastomes. This analysis was also conducted on four additional genes (atpB, atpH, clpP, and rpl20) and results, showing the same trends, are depicted in the supplementary Figure S2.

Table 4 shows the number of codons under positive and purifying selection (with greater than/equal to 90% probability) for each of the seven genes mentioned previously based on alignments constrained by the inferred phylogeny in Fig. 3. The number of codons under purifying selection is greater than those under positive selection for every gene. This difference is most stark for atpB and atpH for which 25.1% and 11.1% of codons are under purifying selection respectively.
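
The tallies of the kind reported in Table 4 can be sketched as a threshold count over per-codon posterior probabilities. This is a minimal illustration that assumes each site is summarized by two values, the posterior probability of purifying selection and that of positive selection; the function name, the input layout, and the example values are hypothetical and do not reproduce HyPhy's output format or the data of this study.

```python
def count_selected_sites(site_posteriors, threshold=0.9):
    """Count codons supported as under purifying or positive selection.

    site_posteriors: list of (p_purifying, p_positive) pairs, one per codon.
    A site is counted when the relevant posterior is >= threshold.
    """
    n_sites = len(site_posteriors)
    if n_sites == 0:
        raise ValueError("at least one site is required")
    purifying = sum(1 for p_neg, _ in site_posteriors if p_neg >= threshold)
    positive = sum(1 for _, p_pos in site_posteriors if p_pos >= threshold)
    return {
        "sites": n_sites,
        "purifying": purifying,
        "positive": positive,
        "purifying_pct": 100.0 * purifying / n_sites,
        "positive_pct": 100.0 * positive / n_sites,
    }


# Toy posteriors for four codons (hypothetical values).
toy_sites = [(0.95, 0.01), (0.50, 0.30), (0.02, 0.93), (0.90, 0.05)]
summary = count_selected_sites(toy_sites)
```

The percentages express each count relative to the number of codons in the gene, which is how values such as those for atpB and atpH can be compared across genes of different length.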

Table 4 Analysis of selection on selected plastome genes in Cuscuta sect. Ceratophorae conducted using FUBAR (a fast, unconstrained Bayesian approximation for inferring selection; Murrell et al. 2013) on HyPhy (Pond et al. 2005)

Discussion

Plastid genomes in Cuscuta sect. Ceratophorae

The results of the whole-plastome assembly research reported here document in-depth the fine-scale differences in plastome size and gene composition within Cuscuta sect. Ceratophorae, a complex of closely related species previously predicted to harbour a substantial heterogeneity in their plastid genomes (Braukmann et al. 2013). Furthermore, we distinguish between two categories of plastomes in this lineage. In one category we see plastomes that are fully comparable in size and gene content to those of other species in the subgenus Grammica; these plastomes are reduced when compared to autotrophs but still contain a large number of photosynthetic genes/ORFs, compatible with their ‘cryptic’ photosynthetic status. The other category contains plastomes with a further strong reduction in size and gene content, including the loss of most genes involved in photosynthesis; these plants are likely not capable of even residual (cryptic) amounts of photosynthesis and are the first truly holoparasitic plastomes described for Cuscuta (Table 1, Fig. 1).

Cuscuta costaricensis has been shown to be sister to the rest of sect. Ceratophorae both in previous research (Costea et al. 2011) and currently (Fig. 3). The plastid genome of this species seems to be of the ancestral type for subgenus Grammica. It is nearly identical in terms of sequence length and protein-coding gene content to the previously published Cuscuta obtusiflora and C. campestris genomes (Funk et al. 2007; McNeal et al. 2007b) from the relatively closely related section Cleistogrammica (Stefanovic et al. 2007). It also retains trnI-CAU in the inverted repeat (IR) region, a structural similarity with C. obtusiflora and C. campestris that none of the other Ceratophorae species share. Cuscuta chapalana, likewise, is identical to C. obtusiflora when it comes to protein-coding gene composition and has only lost four tRNA genes (Tables 1, 2, Fig. 2). The six other species have accumulated a number of losses and/or pseudogenization events in protein-coding genes, as shown in Fig. 4, although most of these changes occur on the branch leading to the clade containing C. boldinghii, C. erosa, and C. strobilacea. It is on this branch that a complete loss of photosynthesis appears to have evolved in Cuscuta.

Fig. 4

Modified plastome tree from Fig. 3 mapping functional and actual protein-coding gene loss events in Cuscuta section Ceratophorae. Genes functionally lost (i.e., pseudogenes) are shown below branches and physical gene losses are shown above. Numbers following species names correspond to internal DNA accessions

As in four previously sequenced species of Cuscuta (Funk et al. 2007; McNeal et al. 2007b), and as suggested by the Southern hybridization survey across the genus (Braukmann et al. 2013), plastomes in sect. Ceratophorae do not contain any NADH dehydrogenase (ndh) or plastid-encoded polymerase (PEP, rpo) genes. The loss of ndh genes is expected in heterotrophic plants because they are not essential in normal, non-stress environments (Krause 2011). The products of these genes have been implicated in photo-oxidative stress response through the regulation of electron flow (Peltier et al. 2016) but have even been lost from autotrophic plastomes, as in the case of some photosynthetic orchids (Chang et al. 2006; Wu et al. 2010). In dodders, the absence of ndh genes is further expected because of their limited gas exchange and consequent high internal levels of carbon dioxide, which mitigates photo-oxidative stress (Braukmann et al. 2013; Hibberd et al. 1998). Similarly, the absence of rpo genes from heterotrophic plastomes has been noted regularly in previous studies (Graham et al. 2017; Wicke et al. 2013) and is considered to be an early event in the transition away from a photosynthetic lifestyle. While the loss of PEP may seem unexpected at first, because plastomes without rpo genes are still often intact and functioning, there is evidence that the responsibility for the expression of plastid genes is subsumed by the nucleus and nuclear-encoded polymerase (NEP) in these cases (Krause et al. 2003).

Mirroring results found previously in Cuscuta sect. Cleistogrammica (Funk et al. 2007; McNeal et al. 2007b), plastomes in sect. Ceratophorae have also all lost the intron maturase gene matK (McNeal et al. 2009), which is usually responsible for splicing group IIA introns in the plastid genome. Correspondingly, seven of the eight group IIA introns are also missing in these species, all except for the one in clpP (second intron) which is capable of splicing itself (Zoschke et al. 2010). Five group IIB introns are retained in the five larger Ceratophorae plastomes (in rpl16, petD, petB, clpP (first intron), and trnL-UAA) while the ones in petD and petB are lost in the three reduced genomes along with the genes that contain them.

The clade containing Cuscuta boldinghii, C. erosa, and C. strobilacea has incurred substantial further losses in genes encoding key portions of the photosynthetic apparatus. Sequences responsible for photosystems I and II (psa and psb), as well as those coding for components of the cytochrome b6/f complex (pet) are mostly absent in these three plastomes, and those that remain largely do so as pseudogenes (as described above). The gene rbcL, which codes for the large subunit of RUBISCO, has been lost in these species (and pseudogenized in C. mexicana, Fig. 2). This is a particularly interesting finding, because rbcL has been known to often remain intact even in greatly reduced plastomes, such as those of Aphyllon (Orobanchaceae; Schneider et al., 2018) or various lineages of mycoheterotrophs (Graham et al. 2017) and is thought to have a supplementary function in lipid synthesis pathways (Schwender et al. 2004). In addition, two genes encoding membrane proteins involved in photosynthesis, cemA and ccsA, are also absent from this clade, as are the photosystem I assembly factors ycf3 and ycf4 (Fig. 2).

The only genes with a photosynthetic function that are still retained in this clade, otherwise containing species with highly reduced plastomes, are the ATP synthase (atp) family of genes of which all six still remain intact and presumably functional (except for atpF in C. boldinghii, Fig. 2). These genes are frequently found intact in heterotrophs, in particular in the Orobanchaceae and in several mycotrophic orchids (Barrett et al. 2018; Wicke et al. 2013) and hence presumably have a secondary non-photosynthetic function. They are known to have a role in protein transport across thylakoid membranes in autotrophic plants (Kohzuma et al. 2012), but it is not known if they maintain this particular function in heterotrophs (Graham et al. 2017).

As shown in Table 2, most tRNA genes that seem to be ancestral for subg. Grammica are preserved in sect. Ceratophorae, with five of the eight plastomes in the species complex containing the same 25 genes. The tRNA loss events in the remaining three species are all idiosyncratic and display no discernable phylogenetic pattern. There is no relationship between the number of protein-coding and tRNA sequences lost, with the three members of the clade containing more diminished plastomes possessing the full complement of these genes.

The correlation analysis conducted between the whole-plastome sequences from this research and the Southern hybridization survey conducted by Braukmann et al. (2013) concluded that there was a high degree of agreement in the results of the two studies (Table 3). A Chi squared goodness-of-fit test (with one degree of freedom) was unable to reject the null hypothesis that there are no significant differences between the two datasets for any of the five species. We can thus conclude that, for these species at least, the findings of the Southern hybridization probing in Braukmann et al. (2013) are accurate and can be used for analyses based on gene presence-and-absence data.

Selection on plastome genes

In order to further assess the state of plastomes in Cuscuta sect. Ceratophorae, we examined the type of selection acting upon plastid genes that are retained by all eight species. Figure 5 approximates the amount of nonsynonymous and synonymous substitutions (as represented by branch lengths derived from nonsynonymous and synonymous changes mapped on the constrained organismal topology) in three core non-bioenergetic genes (accD, ycf1, ycf2) known to retain vital functions even in strongly reduced plastomes (Bellot and Renner 2015; Graham et al. 2017; Wicke et al. 2013).

Fig. 5. dN versus dS branch lengths for three genes retained in all Cuscuta section Ceratophorae plastomes, generated using maximum likelihood and the fixed topology presented in Fig. 3. dN trees were created using the bases occupying the first and second codon positions, while the dS trees were created using just the third codon (wobble) positions.

Across the board, branch lengths generated by nonsynonymous substitutions (i.e., on the trees derived from the first and second codon positions) were shorter than those generated by synonymous substitutions (on the trees derived from the third codon position only), indicating that these genes are under purifying selection because they accumulate more synonymous changes than nonsynonymous ones. This trend is particularly strong for the ycf1 and ycf2 genes (Fig. 5). These results suggest that these core essential genes are not in danger of impending pseudogenization or loss in any of these species and will probably be retained as functional in sect. Ceratophorae. The results of the codon-by-codon analysis of selection conducted with FUBAR (Table 4) further substantiate these conclusions, with all three genes having more codons under purifying selection than under positive selection. For ycf1 in particular, 122 codons (6.7% of the total number) were found to be under purifying selection compared to just 1 codon (< 0.1%) under positive selection. Twelve codons (1.9%) were found to be under purifying selection in accD compared to four (0.6%) under positive selection, and 31 codons (1.6%) were found to be under purifying selection in ycf2 compared to 16 (0.8%) under positive selection.
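The codon-position partitioning that underlies the dN and dS trees can be sketched as follows. This is an illustrative example only: the sequences are made up, the function name is hypothetical, and the input is assumed to be an in-frame, codon-aligned coding sequence. Note that this partitioning is an approximation, because some first-position changes are synonymous and some third-position changes are nonsynonymous.

```python
def split_codon_positions(cds):
    """Return (first+second position bases, third position bases) for an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    # Take the first two bases of every codon for the dN-like partition.
    pos12 = "".join(cds[i:i + 2] for i in range(0, len(cds), 3))
    # Take every third base for the dS-like (wobble) partition.
    pos3 = cds[2::3]
    return pos12, pos3

# Toy in-frame alignment (made-up sequences, not data from this study).
alignment = {
    "taxon_A": "ATGGCTAAA",
    "taxon_B": "ATGGCCAAG",
}

partitions = {name: split_codon_positions(seq) for name, seq in alignment.items()}
for name, (pos12, pos3) in partitions.items():
    print(name, pos12, pos3)
```

In this toy alignment the two taxa differ only at third codon positions, so the first-and-second-position partition is identical between them while the third-position partition is not. Each partition would then be passed separately to a tree-building program with the topology held fixed.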

These analyses were also conducted for four more genes, representative of additional functional gene groups retained by all species (supplementary Figure S2). For atpB, atpH, and clpP, the synonymous branch lengths are much longer than the nonsynonymous ones, indicating that they are also under strong purifying selection. There is no clear trend with regard to the branch lengths for rpl20, suggesting that this gene may be evolving closer to neutrality. Mirroring these results, 124 (25.1%) and 9 (11.1%) codons in atpB and atpH, respectively, were found to be under purifying selection compared to 0 codons under positive selection for both genes (Table 4). The gene clpP also has 0 codons under positive selection, with 6 (3.0%) under purifying selection. Echoing the conclusions drawn from Figure S2, Table 4 shows that rpl20 seems to be evolving closer to neutrality than the other genes observed, with 4 codons (3.1%) under positive selection and 5 codons (3.9%) under purifying selection. The gene clpP, which codes for a proteolytic subunit of a complex responsible for the breakdown of abnormally folded proteins, is also considered an important non-bioenergetic gene. As discussed earlier, though, the role of atp genes in heterotrophic plants is not well understood, and the fact that they appear to be under strong purifying selection in sect. Ceratophorae adds to the growing body of evidence indicating that this gene family is generally retained even in advanced cases of heterotrophy, supporting the possibility that their products have an additional vital function in the plant cell.

The whole-plastome phylogeny for Cuscuta sect. Ceratophorae

The maximum likelihood phylogeny presented in Fig. 3 is entirely compatible with the plastid trees for section Ceratophorae generated in the past. The previous plastid tree, based solely on trnL-F, a short non-coding region (~ 620 bp aligned sequences), had multiple individuals sampled for a number of species. However, it suffered from a lack of support and resolution, especially along the backbone of the phylogeny, and from the absence of plastid sequence data for two species. All of these issues are rectified here: we present comprehensive sampling, with full resolution and very strong support across the board (Fig. 3). On the other hand, our plastome-derived topology differs from previous trees produced using nuclear ribosomal sequences (ITS and partial 26S rDNA) in several respects: the relative placement of Cuscuta mexicana and C. bonafortunae, as well as the relationship between C. chapalana and C. carnosa (Costea et al. 2011). All of these discrepancies received only weak to moderate support with the ribosomal DNA data (Costea et al. 2011). Therefore, because the research described here focuses on plastid evolution and has the benefit of more comprehensive taxon sampling and a much larger dataset, the phylogeny reconstructed here can be considered the best current plastome tree for sect. Ceratophorae.

Patterns of plastome reduction

The fine-scale system of parasitic plants sampled in the research reported here allows for comparisons of sequence composition between very closely related (and in some cases sister) species. This permits us to phylogenetically triangulate the location of gene loss and pseudogenization events in sect. Ceratophorae very precisely and to present a step-by-step description of plastome evolution. This is particularly interesting because Cuscuta is one of two clades of parasitic angiosperms, along with the Orobanchaceae, that are known to span the continuum from hemiparasitic to holoparasitic (Westwood et al. 2010; their Fig. 1). However, Cuscuta (and specifically its subgenus Grammica) achieves this trophic diversity at a lower phylogenetic level (i.e., species level) compared to Orobanchaceae. This makes the Cuscuta model described here more tractable because substantially less interference is expected from confounding historical factors, which are usually difficult to avoid at higher phylogenetic levels.

Figure 4 maps the location of changes in protein-coding genes in this species complex and has been generated based on the assumption of irreversibility, i.e., the premise that plastomes can lose function (and corresponding genes), but once lost, the function cannot be regained. The broad sequence of reduction supports the models of plastome degradation summarized earlier in the Introduction (Barrett and Davis 2012; Graham et al. 2017) whereby the ndh genes are lost first (in all Cuscuta), followed by rpo genes (apparently across subg. Grammica). The five species forming the basal grade in sect. Ceratophorae that exhibit limited plastid genome reduction are missing ndh and rpo genes but not much more and seem to fit solidly in the category of hemiparasites (i.e., “cryptically” photosynthetic species). The plastomes in the more reduced clade, however, have gone much further on the scale of degradation and have retained only the housekeeping genes, genes with other non-bioenergetic functions, and atp genes.
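The irreversibility assumption used for this kind of mapping can be illustrated with a short sketch. The tree and the gene presence sets below are toy placeholders (labels sp1 to sp5), not the topology of Fig. 3 or the data of Fig. 4, and the function names are hypothetical. The sketch assumes the gene was present at the root and places a single loss on the stem branch of every maximal clade in which no tip retains the gene. Pseudogenization as an intermediate state is not modelled.

```python
def tips(node):
    """Return the set of tip labels under a node (a tip is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result

def map_losses(node, present):
    """List the clades whose stem branch carries a loss, assuming the gene was
    present at the root and cannot be regained once lost."""
    node_tips = tips(node)
    if not (node_tips & present):
        # No descendant retains the gene: a single loss on this branch suffices.
        return [sorted(node_tips)]
    if isinstance(node, str):
        return []  # a tip that retains the gene
    losses = []
    for child in node:
        losses.extend(map_losses(child, present))
    return losses

# Toy rooted tree written as nested tuples (hypothetical, for illustration only).
tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))

# Hypothetical presence data: the set of tips in which each gene is intact.
gene_presence = {
    "geneX": {"sp1", "sp2", "sp3"},         # absent from the sp4 + sp5 clade
    "geneY": {"sp1", "sp3", "sp4", "sp5"},  # absent from sp2 only
}

for gene, present in gene_presence.items():
    print(gene, map_losses(tree, present))
```

Under these assumptions a gene missing from every member of a clade is counted as one loss on the branch leading to that clade, whereas a gene missing from a single species is counted as a loss on that terminal branch, which is how shared and idiosyncratic losses are distinguished.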

Braukmann et al. (2013), when analyzing the results of their Southern hybridization survey, speculated that plastid genomes in sect. Ceratophorae may exemplify gradual evolution according to the Evolutionary Transition Series hypothesis where changes accumulate phylogenetically ‘progressively’ (Mayr 1982; Young et al. 1999). Based on the results of the whole-genome approach in this research, it seems that plastome change in this species complex is somewhere in between gradual and punctuated (whereby short bursts of evolutionary change are separated by long periods of relative stasis). There is indeed an intense burst of gene loss along the branch leading to the more reduced clade in sect. Ceratophorae, but three of the other species in the group have undergone idiosyncratic sequence losses in protein coding genes as well (Fig. 4) and therefore cannot be described as simply being ‘in stasis’.