
12.1 Plastid Structure, Mutational Rates, and Inheritance in Angiosperms

Palmer (1985) provided an early review of plastid structure and gene content, documenting, in angiosperms, (1) the relatively small size of the plastid genome (generally 120–160 knt); (2) its high copy number (as many as 1000 per cell); (3) its quadripartite circular structure comprising two inverted repeats (IR), flanking a large single-copy (LSC) region and a small single-copy (SSC) region; (4) the labile structure of the IR region, which variously shrinks and expands in different lineages, with the junction between the inverted repeat and the large single-copy region located in a generally fixed position within the 276-nt rps19 gene; (5) a repertoire comprising a complete set of rRNA, tRNA, and protein-encoding genes (Fig. 12.1); and (6) only rare modifications of this basic structure, such as reduced gene content in parasitic plants, deletion of the IR region in the Fabaceae, or extensive gene rearrangements in the Geraniaceae. In summary, most of the over 200 angiosperm chloroplast genomes examined at that time were overwhelmingly similar in size, conformation, repeat structure, gene content, and gene order and arrangement, with the predominant mode of structural evolution consisting of small deletions and insertions occurring in intergenic spacers, 5′ and 3′ untranslated regions, and in the few introns found in their genes.

Fig. 12.1

Structure of the carrot mitochondrial and plastid genomes and inter-organelle DNA transfer; genome coordinates every 25 kb are listed inside the figure. a Mitochondrial (top) and plastid (bottom) genomes (visualized using Circos version 0.69-6; Krzywinski et al. 2009) and gene annotations of Daucus carota; these circularized genomes are drawn open to show gene transfers between them. For the plastid, only genes over 300 nt are annotated because of space limitations, but these are collinear with those fully annotated in Ruhlman et al. (2006). Duplications within (blue) and between (red) genomes are shown by connected lines or ribbons. The direction of all duplications between genomes is presumed to be from plastid to mitochondrion, except DcMP, which is from mitochondrion to plastid (Iorizzo et al. 2012a, b), as labeled by the arrow. Organellar sequences and gene annotations were obtained from NCBI accessions NC_017855 (mitochondrion) and NC_008325 (plastid). Duplicated regions were detected using the megablast program of BLAST+ version 2.6.0 (Camacho et al. 2009) with a minimum alignment length of 50, a minimum percentage similarity of 80, and no dust filtering. b Structure of the plastid D. carota DcMP sequence. Open reading frames (ORFs) were detected using Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The sequence is oriented 5′–3′ (indicated by arrows); ORF orientation is opposite to that in the other figures. Thick vertical blue lines indicate the target site duplication (TSD). Thin red vertical lines indicate the relative positions of the P1, P2, and P3 trnV promoters. The red box indicates the region comprising a partial sequence of the cox1 gene. The scheme is drawn to scale
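
The duplicate-detection thresholds given in the caption (minimum alignment length of 50, minimum percentage similarity of 80) can be applied as a post-filter to tabular BLAST output. The following is a minimal Python sketch, assuming the default 12-column tabular layout of BLAST+ (`-outfmt 6`); the function name `filter_hits`, the sequence labels, and the sample rows are hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, List, Tuple

MIN_LENGTH = 50      # minimum alignment length, as stated in the caption
MIN_IDENTITY = 80.0  # minimum percentage similarity, as stated in the caption


def filter_hits(lines: Iterable[str]) -> List[Tuple[str, str, float, int]]:
    """Keep hits from tabular BLAST output that pass both thresholds.

    Assumes the default 12-column layout, in which the first four
    columns are qseqid, sseqid, pident, and length.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        if length >= MIN_LENGTH and pident >= MIN_IDENTITY:
            kept.append((qseqid, sseqid, pident, length))
    return kept


# Hypothetical rows for illustration only (not real alignments).
example = [
    "mito\tplastid\t95.0\t120\t6\t0\t1\t120\t500\t619\t1e-40\t180",
    "mito\tplastid\t78.5\t300\t60\t4\t900\t1199\t70\t369\t1e-20\t110",
    "mito\tplastid\t99.0\t40\t0\t0\t5\t44\t10\t49\t1e-10\t70",
]
hits = filter_hits(example)
```

Only the first hypothetical row passes: the second falls below the identity threshold and the third below the length threshold.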

Palmer (1985) mentioned the maternal inheritance of plastid DNA, documented for most species by Tilney-Bassett (1978). Corriveau and Coleman (1988) developed a rapid cytological screen based on epifluorescence microscopy for maternal inheritance and examined 235 plant species from 80 angiosperm families. They detected putative plastid DNA in the generative and/or sperm cells of pollen from 43 species in 26 genera of 15 families, but not in the generative or sperm cells of pollen from the remaining 192 species (82%), strongly suggesting that they have only maternal inheritance. Their results corroborated most reports of maternal plastid inheritance, and suggested that biparental inheritance of plastids is rare, occurring in about 14% of flowering plant genera, scattered among 19% of the families examined. The carrot plastid genome follows a pattern of maternal inheritance (Vivek et al. 1999). Jansen and Ruhlman (2012) reviewed data on maternal inheritance of plastids in angiosperms and provided a similar figure (80%) for angiosperm species with maternal inheritance, the remaining 20% with biparental inheritance.
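
The species-level and family-level percentages quoted above follow directly from the counts reported for Corriveau and Coleman (1988). A short Python check, using only the counts given in the text (the variable names are ours), is sketched here; the genus-level figure of 14% cannot be reproduced this way because the total number of genera surveyed is not stated.

```python
# Counts as given in the text for Corriveau and Coleman (1988).
species_total = 235
species_with_pollen_ptdna = 43
families_total = 80
families_with_pollen_ptdna = 15

# Species with no plastid DNA detected in generative or sperm cells.
species_maternal_only = species_total - species_with_pollen_ptdna

pct_maternal_species = 100 * species_maternal_only / species_total
pct_biparental_families = 100 * families_with_pollen_ptdna / families_total

print(species_maternal_only)           # 192
print(round(pct_maternal_species))     # 82
print(round(pct_biparental_families))  # 19
```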

Wolfe et al. (1987) compared mutational rates among plant mitochondrial (mtDNA), plastid (cpDNA), and nuclear DNA (nDNA) sequences, and between plant and animal mitochondrial DNA sequences. They documented that (1) in contrast to mammals, where mtDNA evolves at least five times faster than nDNA, angiosperm mtDNA evolves at least five times slower than nDNA; (2) plant mtDNA undergoes much more frequent rearrangements and is larger and more variable in size than mammalian mtDNA; (3) cpDNA evolves much more slowly than plant nDNA; and (4) the cpDNA IR region evolves much more slowly than the LSC or SSC regions. The relative structural conservatism and slower rate of evolution of cpDNA made it an ideal molecule for plant phylogenetic studies.

Early plastid phylogenetic studies were based partly on DNA restriction site procedures, but these were largely replaced by next-generation DNA sequencing, which stimulated the rapid accumulation of whole plastid DNA sequences. For example, Jansen and Ruhlman (2012) reported the public availability of 200 plastid genomes, a number that had grown to over 3000 by June 2018 (https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid), allowing for finer comparisons of plastid DNA sequences. Raubeson and Jansen (2005) documented varying rates of change in different regions of the plastid genome, favoring phylogenetic studies at different taxonomic levels. Plastid DNA analyses (first DNA restriction site studies, and then DNA sequences from portions of the genome) dominated much of the molecular phylogenetic literature in the 1980s and 1990s. Jansen and Ruhlman (2012) documented additional lineages of both gymnosperms and angiosperms (the Campanulaceae) deviating from the stability of plastid architecture, gene and intron content, and gene order seen across seed plants. They documented that highly rearranged plastomes exhibit three general phenomena: (1) highly accelerated rates of nucleotide substitutions, (2) an increase in the number of dispersed repeats, many of which are associated with rearrangement endpoints, and (3) biparental plastid inheritance. They reviewed studies (e.g., Lilly et al. 2001) documenting deviations from the typical circular arrangement of the plastid molecule, including multimeric circles and linear and branched structures.

The phylogenetic analysis of 81 plastid genes in 64 sequenced genomes by Jansen et al. (2007) allowed lineage-specific correlations between rates of nucleotide substitutions. They documented gene and intron content in plastids to be highly conserved among the early diverging angiosperms and basal eudicots, but found 62 independent gene and intron losses limited to the more derived monocot and eudicot clades. They showed that most angiosperm plastid genomes contain 113 different genes, 16 of which are duplicated in the inverted repeat, for a total of 129 genes. Intron content was shown to be highly conserved across angiosperms with most genomes containing 18 genes with introns. Like gene losses, intron losses were shown to be restricted to the more derived monocot and eudicot clades. Their fully resolved and strongly supported phylogenetic tree supported the genus Amborella as the earliest diverging lineage of flowering plants (now estimated to contain over 257,400 species classified into 52 orders and about 450 families; Judd et al. 2016), followed by the angiosperm orders Nymphaeales and Austrobaileyales, and provided strong support for a sister relationship between eudicots and monocots.

12.2 Plastid Structure in the Apiales (Apiaceae and the Sister Family Araliaceae)

Our literature survey of plastid genomes in the Apiales (Table 12.1; data as of May 1, 2018) recovered 79 reports of published genomes in the Apiaceae and 33 reports in the Araliaceae (112 in total). Like the wider survey of the angiosperms by Jansen et al. (2007), our survey of all 112 Apiales plastid genomes from these two families documents a single circular double-stranded DNA molecule, displaying the typical quadripartite structure of angiosperm plastid genomes and containing 111–114 nonduplicated genes. All plastid genomes are collinear, consistent with the rarity of recombination in plant plastomes (Palmer 1985). Total genome lengths varied from 146,512 nt in Angelica nitida to 171,083 nt in Caucalis platycarpos, with a large single-copy region from 83,553 nt in Daucus crinitus to 94,684 nt in Pimpinella rhomboidea; a small single-copy region from 17,139 nt in Crithmum maritimum to 19,117 nt in Schefflera delavayi; and a pair of inverted repeats from 17,217 nt in P. rhomboidea to 27,993 nt in C. maritimum. Average GC contents range from 36.8% in Eleutherococcus gracilistylus to 38.1% in Aralia undulata and Panax notoginseng. The number of nonduplicated genes ranged from 111 in Bupleurum falcatum to 114 in many other species.
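
Because the inverted repeat is present in two copies, the total length of a quadripartite plastome is the sum of the LSC, the SSC, and twice the IR length. A minimal Python sketch of that bookkeeping follows; the class name `Plastome` is ours, and the region lengths are hypothetical round numbers chosen for illustration, not values from Table 12.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plastome:
    """Region lengths (in nt) of a quadripartite plastid genome."""
    lsc: int  # large single-copy region
    ssc: int  # small single-copy region
    ir: int   # length of ONE inverted repeat copy

    @property
    def total_length(self) -> int:
        # The IR occurs twice, so it is counted twice in the total.
        return self.lsc + self.ssc + 2 * self.ir

    @property
    def ir_fraction(self) -> float:
        # Share of the genome occupied by both IR copies together.
        return 2 * self.ir / self.total_length


# Hypothetical region lengths, for illustration only.
example = Plastome(lsc=84_000, ssc=17_500, ir=27_000)
print(example.total_length)  # 155500
```

The same identity is a convenient consistency check when region lengths and a total length are reported separately for one genome.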

Table 12.1 Summary of genome statistics of fully sequenced plastids of members of the Apiaceae and sister family Araliaceae

12.3 Plastid Structure in Daucus Sensu Lato

All reports of Daucus in its expanded sense (sensu lato; Banasiak et al. 2016; see Chap. 2) likewise documented a typical quadripartite circular chloroplast genome, with a total length varying from 155,441 nt in Daucus involucratus to 157,336 nt in Daucus setulosus; a large single-copy region from 83,553 nt in D. crinitus to 84,444 nt in Rouya polygama; a small single-copy region from 17,314 nt in R. polygama to 17,887 nt in Daucus tenuisectus; and a pair of inverted repeats from 26,924 nt in Daucus bicolor to 27,741 nt in Daucus aureus. Spooner et al. (2017) did not report average GC contents, but they documented an inverse relationship between read coverage and GC content, most notably in the second half of the inverted repeat region, as seen in the coverage plots (Fig. 12.2). This observation likely reflects the coverage bias that the Illumina platform introduces in regions with high GC content (Ross et al. 2013). All reports documented 113 unique genes consisting of 80 protein-coding genes, 29 tRNA genes, and 4 rRNA genes.

Fig. 12.2

Read coverage and percent GC plots spanning the plastid genome of Daucus carota subsp. carota PI 274297; inverted repeat regions highlighted in gray

The inverted repeat junctions flanking the LSC were identical in all genotypes examined by Spooner et al. (2017), while those flanking the SSC were variable (Fig. 12.3). These variations form six distinct classes (A–F), with the out-group Oenanthe virgata (class F) having the largest fraction of the ycf1 gene included in the inverted repeat, including a 9-nt insertion unique to this species. Relative to Oenanthe, class A consists of 15 accessions, which includes D. carota, and has a 326-nt contraction (reduction in the size of the inverted repeat); class B consisting of only D. aureus has the largest contraction, 422 nt; class C consisting of five accessions has a 318-nt contraction; class D consisting of 15 accessions has a 319-nt contraction; and class E consisting of only C. platycarpos has a 50-nt contraction. Relative to the plastid phylogeny of Spooner et al. (2017), there is a direct cladistic relationship of these inverted repeat junction classes with all accessions of D. carota and its immediate sister species Pseudorlaya pumila and Rouya polygama having class A; D. aureus class B; D. muricatus, D. tenuisectus, and D. crinitus class C; D. conchitae, D. crinitus, D. glochidiatus, D. littoralis, D. pusillus, D. setulosus, class D; out-group Caucalis platycarpos class E; and out-group O. virgata class F.
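
The six junction classes can be summarized as a small lookup table. The Python sketch below records the contraction sizes and accession counts given in the text; class F, the Oenanthe reference, is assigned a contraction of 0 nt by definition and a single accession, which is our reading of the text. The names `IR_CLASSES` and `rank_by_contraction` are illustrative.

```python
# Contraction of the inverted repeat relative to Oenanthe virgata (class F),
# with the number of accessions in each class as given in the text.
IR_CLASSES = {
    "A": {"contraction_nt": 326, "accessions": 15},
    "B": {"contraction_nt": 422, "accessions": 1},
    "C": {"contraction_nt": 318, "accessions": 5},
    "D": {"contraction_nt": 319, "accessions": 15},
    "E": {"contraction_nt": 50, "accessions": 1},
    "F": {"contraction_nt": 0, "accessions": 1},  # reference class (assumed 0)
}


def rank_by_contraction(classes: dict) -> list:
    """Return class labels ordered from largest to smallest contraction."""
    return sorted(classes, key=lambda c: classes[c]["contraction_nt"], reverse=True)


ranking = rank_by_contraction(IR_CLASSES)
print(ranking)  # ['B', 'A', 'D', 'C', 'E', 'F']
```

Laid out this way, it is easy to see that classes C and D differ by a single nucleotide, whereas class E retains most of the IR extent found in the out-group.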

Fig. 12.3

Junctions of the inverted repeats and small single-copy plastid regions. Functional genes are represented in blue, tRNA in tan, and pseudogenes in gray. Numbers in the figure represent the number of nucleotides no longer present in inverted repeat B relative to Oenanthe virgata. A (Daucus carota NC_008325.1) is representative of 14 additional genotypes; B (D. aureus 319403) is unique to this genotype; C (D. crinitus 652413) is representative of four additional genotypes; D (D. guttatus 286611) is representative of 14 additional genotypes; E (Caucalis platycarpos 649446) and F (O. virgata Ames 30293) are unique to these genotypes

The plastid genomes of members of D. carota sensu lato have between 13 and 18 repeats (scanned with a minimum length of 30 nt), with a minimum size of 70 nt for R. polygama and a maximum size of 127 nt in D. crinitus. Twenty-five accessions share a maximum repeat size of 88 nt, three accessions 106 nt, and two accessions 109 nt. Species in closely related clades share a larger number of repetitive sequences (Spooner et al. 2017).

12.4 Mitochondrial Structure and Function in Angiosperms

Mitochondrial DNA has the same basic role in plants as it does in other eukaryotes, encoding a small number of essential genes of the mitochondrial electron transfer chain. For the expression of these few genes, the mitochondrion has its own translation system that is also partially encoded by the mtDNA, including rRNAs, tRNAs, and a number of ribosomal proteins that varies across species (Kubo and Newton 2008). A few proteins involved in the assembly of functional respiratory complexes are also encoded by the plant mtDNA. However, all factors required for the maintenance of the mtDNA and the expression of its genes are encoded in the nucleus and imported from the cytosol, thus placing mtDNA replication, structural organization, and gene expression under nuclear control.

Although the number of mitochondrial genes varies little between species, the size of the mtDNA varies more than 100-fold, with land plant mitochondrial genomes by far the largest. Angiosperm mitogenomes are usually in the range of 200–700 kb, but can be as large as 11 Mb in Silene conica (Sloan et al. 2012). Although a few additional genes exist in plant mitogenomes, and several genes contain introns, these features do not contribute significantly to the large size or the size variation of plant mtDNA. Rather, most of the genome consists of noncoding sequences that are not conserved across species. Horizontal transfer seems to be responsible for the acquisition of exogenous sequences (Bergthorsson et al. 2003), and a fraction of plant mitogenomes can be recognized as derived from plastid, nuclear, or viral DNA. However, most noncoding sequences are of unknown origin.

The structure of angiosperm mitochondrial genomes is frequently characterized by repeat sequences (Gualberto et al. 2014). The number and the size of these repeats are important, as they influence the size of the genome, and they are the sites of intragenomic recombination, underlying evolutionary changes in mitochondrial genome organization and structural dynamism in vivo (Guo et al. 2017; Gupta et al. 2013). The repeats have often been classified as large repeats (>500 nucleotides), which can be involved in frequent homologous recombination; intermediate-size repeats (50–500 nucleotides), which are involved in infrequent ectopic homologous recombination; and small repeats (<50 nucleotides), which can promote illegitimate microhomology-mediated recombination (Arrieta-Montiel et al. 2009; Davila et al. 2011; Gualberto et al. 2014). Based on the very active recombination behavior of large repetitive sequences, early studies postulated that the entire genetic content of mtDNA could be assembled into a circular molecule, the so-called master circle, from which multiple subgenomic circular molecules are generated by intramolecular recombination across direct repeats. Although the repetitive sequences themselves are not conserved across species, their organization and structure, which drive the recombination process, are conserved. Recent studies using gel-based approaches or electron microscopy, together with quantitative sequence data from next-generation sequencing, have indicated that circular and linear forms of mtDNA co-exist in vegetative tissue. Sequencing data also revealed the evolution of multichromosomal genomes associated with genome size expansion.
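
The three size classes described above translate directly into a small classifier. The following Python sketch is illustrative only: the function name `classify_repeat` and the example lengths are ours, and the boundaries are taken from the text (large, more than 500 nt; intermediate, 50–500 nt; small, fewer than 50 nt).

```python
def classify_repeat(length_nt: int) -> str:
    """Assign a mitochondrial repeat to a size class by its length in nt.

    Boundaries follow the text: >500 nt is large, 50-500 nt is
    intermediate, and <50 nt is small.
    """
    if length_nt <= 0:
        raise ValueError("repeat length must be a positive integer")
    if length_nt > 500:
        return "large"          # frequent homologous recombination
    if length_nt >= 50:
        return "intermediate"   # infrequent ectopic homologous recombination
    return "small"              # microhomology-mediated recombination


# Example lengths chosen for illustration.
for length in (14_749, 500, 202, 50, 37):
    print(length, classify_repeat(length))
```

Note that both boundary values, 50 and 500 nt, fall in the intermediate class under this reading of the ranges.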

An economically important trait that can result from intraspecific variation promoted by recombination within mitogenomes is cytoplasmic male sterility (CMS), the maternally transmitted inability of a plant to produce viable pollen. CMS is widespread in natural plant populations and is important for the evolution of gynodioecious species, in which females and hermaphrodites co-occur in populations (Dufay et al. 2007). In crop breeding, including that of carrot, it is an economically valuable trait used extensively for the production of hybrid seeds (see Chap. 3). It usually results from the expression of a chimeric gene created de novo by recombination processes, particularly microhomology-mediated recombination events, each of which involves just a few nucleotides of sequence identity. Multiple CMS phenotypes in carrot have been described and are used in breeding programs. A maternal mode of inheritance of the mitochondrial DNA has been observed in carrot CMS plants by several authors, and different genes/ORFs have been proposed to control this important trait (see Chap. 3).

Given the larger size of the mitochondrial genome relative to the plastid genome, the diversity of its repetitive sequences, and its dynamic organization, assembling mitochondrial genomes is challenging, and for this reason the number of mitochondrial genomes available is far lower than the number of plastomes.

12.5 Carrot Mitochondrial Genome, Structure, and Organization

Iorizzo et al. (2012a) assembled and characterized the carrot mitochondrial genome, the first and still the only mitochondrial genome sequenced in the Apiaceae. At 281,132 nt, the carrot mitogenome is among the smallest angiosperm mitochondrial genomes sequenced to date, and its size confirms the previous estimate (255,000 nt) made by Robison and Wolyn (2002) based on restriction digestion mapping. Although the genome could be assembled and represented as a master circle, Southern blot analysis confirmed the presence of two recombinant sub-circles. The overall GC content of the carrot mitogenome (45.4%) is comparable to that of other angiosperms (Alverson et al. 2011; Rodriguez-Moreno et al. 2011).

Annotation of the genome identified 44 protein-coding sequences and three ribosomal RNAs, which confirmed the previous report of Adams et al. (2002), who used Southern hybridization to survey mitochondrial gene presence or loss across 280 angiosperms. Truncated copies of atp1 and atp9 were detected, confirming observations previously reported by Bach et al. (2002). Considering a set of 51 conserved mitochondrial genes, the carrot mitogenome lacks 7 genes (sdh3, sdh4, rpl2, rps2, rps10, rps14, and rps19), three of which were identified in the carrot genome assembly. In addition to coding genes, the carrot mitogenome contains 18 tRNAs that recognize 15 amino acids and is missing tRNA genes for six amino acids, which are likely encoded by the nuclear genome.

As expected, intergenic spacer regions represent the largest part of the genome, 224,526 nt (79.9%), with repetitive sequences occupying the majority of this space (49%). With 74 repeats ranging from 37 to 14,749 nt, the carrot mitochondrial genome has the lowest number of repeats among the sequenced plant mitochondrial genomes, which reflects its small genome size. All but one are dispersed repeats. Most of the repeats (about 90%) are between 20 and 202 nt in length, accounting for just 2.0% of the total genome coverage. Nine large repeats ranging from 4220 to 14,749 nt account for 44.0% of the genome. The insertion of large repeat 1 between repeats 2 and 3 forms a 35 kb super-repeat. After that of wild cabbage (Chang et al. 2011), this is the largest repeat region described in eudicot mitochondrial genomes to date. Other sequences in the intergenic spacer regions include additional open reading frames not associated with any conserved mt genes, and DNA of nuclear or plastid origin, derived from intracellular gene transfer (IGT) or possibly horizontal gene transfer (HGT), a prevalent and ongoing process in plant evolution.
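
The intergenic proportion quoted above can be checked from the two lengths given in the text. A minimal Python sketch (the variable names are ours):

```python
genome_nt = 281_132      # carrot mitogenome length given in the text
intergenic_nt = 224_526  # intergenic spacer length given in the text

pct_intergenic = 100 * intergenic_nt / genome_nt
remainder_nt = genome_nt - intergenic_nt  # sequence outside intergenic spacers

print(f"{pct_intergenic:.1f}%")  # 79.9%
print(remainder_nt)              # 56606
```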

12.6 Intracellular DNA Transfer in Angiosperms

While nuclear and mitochondrial genomes integrate foreign DNA via IGT and HGT, plastid genomes (plastomes) have resisted foreign DNA incorporation, and only recently has IGT been uncovered in the plastomes of a few land plants. The emergence of contemporary genomics has dispelled traditional hypotheses of evolution solely by vertical descent with modification. Drawing on phenotypic data, early investigators could not have predicted the impact of HGT on both the universality of the genetic code and the diversity of organisms found on earth (Vetsigian et al. 2006). Although first recognized among eubacteria (Tatum and Lederberg 1947), HGT occurs across all domains of life and has shifted our views on the phylogeny of organisms from one of bifurcation to a more reticulate, web-like mode of evolution (Soucy et al. 2015).

Just as the sharing of DNA sequences among unrelated organisms has shaped their evolutionary history, so has the transfer of sequences among the genome-bearing compartments of individual cells shaped the evolution of eukaryotic species. Intracellular gene transfer, along with HGT, has played a pivotal role in the evolution of multicellularity and the oxygenation of earth’s atmosphere, facilitating the evolution of plant and animal life (Timmis et al. 2004). The free-living, single-celled organisms that ultimately became mitochondria, and later plastids, of eukaryotic cells through endosymbiosis contained the necessary complement of genetic material for survival in the extracellular environment. Once housed within the host cell, much of that genetic material was transferred to the host nuclear genome. This massive transfer of DNA sequence fully integrated the processes of the organelles with those of the host nucleus.

Since the establishment of the cellular organelles, both mitochondrial and plastid genomes (mitogenomes and plastomes) of plants have continued to divest themselves of both coding and noncoding DNA. While mitogenomes exhibit more variation in overall size and retained gene content (Adams et al. 2002), most plastomes harbor a conserved set of coding sequences within a relatively stable size and configuration, with a small set of genes that tend to be transferred to the nucleus across the plant phylogeny (Jansen and Ruhlman 2012). The transfer of DNA sequence from both organelles to the nucleus is an ongoing process that has contributed to the evolution of the nuclear genome, regardless of whether those sequences were eventually purged from their original location or activated for their ancestral function elsewhere in the cell following nuclear transcription (Timmis et al. 2004). Likewise, plant mitogenomes contain extensive insertions of both plastid and nuclear DNA (nDNA), although, for the most part, these remain nonfunctional (Mower et al. 2012). Plastomes, however, appear to be recalcitrant to the incorporation of foreign DNA either by HGT or IGT, possibly because of the lack of an efficient DNA uptake system within plastids (Bock 2015; Richardson and Palmer 2007; Smith 2011).

Among the >3000 complete angiosperm plastomes now available in GenBank (https://www.ncbi.nlm.nih.gov/genbank/), just a few lineages have been recognized to contain DNA of nonplastome origin. Although a few studies explored putative plastome sequences with high identity to mtDNA, for the most part, the identity was due to the presence of sequences of plastid or nuclear origin in mitogenomes (Chumley et al. 2006; Ohtani et al. 2002).

The notion that land plant plastomes could incorporate foreign DNA sequences without biotechnological intervention was unheard of prior to 2009 (Goremykin et al. 2009). To date, legitimate cases of foreign DNA insertions into the plastome have been reported in four unrelated angiosperm lineages: Daucus (Iorizzo et al. 2012a), Apocynaceae (Straub et al. 2013), Bambusoideae (Ma et al. 2015), and Anacardium (Rabah et al. 2017). Identification of these rare events has been facilitated in part by the availability of complete mitogenome sequences. The wide distribution of these four lineages across four orders of land plants, Apiales (asterid II), Gentianales (asterid I), Sapindales (rosid II), and Poales (commelinid), combined with the lack of informative common features, suggests at least four independent events across all land plants, each of which likely occurred only once within its clade.

12.7 Inter-organelle DNA Transfer in the Apiaceae, a Story of First Discoveries

Goremykin et al. (2009), while analyzing the Vitis vinifera L. (grape) mitochondrial genome, detected two sequences of 74 and 126 nt that were similar to the carrot plastid genome (Ruhlman et al. 2006). The larger sequence has high similarity to the coding region of the mitochondrial cytochrome c oxidase subunit 1 gene (cox1), prompting the authors to suggest that its presence in the Daucus plastome might represent a rare transfer of DNA from the mitochondrion into the plastid. These two sequences are contained within a large 1439-nt fragment of the D. carota inverted repeat at positions 99,309–100,747 and 139,407–140,845 (Ruhlman et al. 2006) that is part of the 3′rps12-trnV-GAC intergenic spacer region. This fragment, however, has no similarity to any other published plastid nucleotide region (Goremykin et al. 2009). Subsequently, Iorizzo et al. (2012a), in characterizing the entire carrot mitochondrial genome, verified the presence of this sequence in both plastid and mitochondrial genomes and designated this site as the D. carota mitochondrial-plastid (DcMP) region (Fig. 12.1a). The DcMP sequence is 1452 nt long in the carrot plastome and is present as three noncontiguous, rearranged sequences in the mitochondrial genome of D. carota (Iorizzo et al. 2012a). In the plastome, however, the DcMP sequence, or a large portion of it, is present only in Daucus (seven species) and its close relative Cuminum L. (cumin), both of Scandiceae subtribe Daucinae. Analysis of the plastid DcMP sequence identified three putative open reading frames (ORFs) with similarity to retrotransposon element domains (gag domain and reverse transcriptase) and a 6-nt direct repeat (CTTGAC) flanking the DcMP sequence, upstream of DcMP1 and downstream of DcMP4 (Fig. 12.1b) (Iorizzo et al. 2012b). These characteristics suggested that the DcMP might be a non-LTR retrotransposon and that the direct repeats represent a target site duplication (TSD) created by DcMP integration following its mobilization from a donor site in the mitochondrial genome. Overall, these two complementary studies demonstrated for the first time that DNA transfer from the mitochondrion to the plastid can occur in flowering plants and provided a hypothesis about its possible mode of integration.
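
Two details of this account lend themselves to a quick check: the length implied by the inclusive coordinates of the two inverted repeat copies, and the logic of recognizing a target site duplication as an identical short motif immediately flanking an insert. The Python sketch below is illustrative only; `span_length` and `has_flanking_tsd` are hypothetical helper names, and the toy sequence is invented, with only the CTTGAC motif and the coordinates taken from the text.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate interval."""
    return end - start + 1


def has_flanking_tsd(seq: str, insert_start: int, insert_end: int, tsd: str) -> bool:
    """Return True if `tsd` sits immediately before and after the insert.

    The insert is seq[insert_start:insert_end], with 0-based, half-open
    coordinates as in Python slicing.
    """
    k = len(tsd)
    left = seq[max(0, insert_start - k):insert_start]
    right = seq[insert_end:insert_end + k]
    return left == tsd and right == tsd


# Both IR copies of the fragment span the same number of nucleotides.
print(span_length(99_309, 100_747))   # 1439
print(span_length(139_407, 140_845))  # 1439

# Toy example: an invented insert flanked by the 6-nt direct repeat.
tsd = "CTTGAC"
insert = "ACGTACGTAA"  # invented stand-in for the inserted element
toy = "GGGG" + tsd + insert + tsd + "TTTT"
start = 4 + len(tsd)
print(has_flanking_tsd(toy, start, start + len(insert), tsd))  # True
```

A real search would also need to tolerate mismatches within the duplication and to consider both strands; this sketch covers only the exact-match case.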

Considering the stability of the plastid genome, it is legitimate to hypothesize that a mt-to-pt insertion within a phylogenetic clade is likely to have originated from a single event in a common ancestor, making this type of insertion useful for tracing ancestry and genetic relationships within the Scandiceae tribe, which includes three subtribes: Daucinae, Torilidinae, and Scandicinae. Analysis of 37 plastid genomes including members of the Daucinae and Torilidinae subtribes indicated that the DcMP region was present in all 36 members of the Daucinae clade and in C. platycarpos, a member of the Torilidinae clade (Spooner et al. 2017). Comparative analysis of the DcMP region across the 37 plastid genomes revealed 21 structural variants (SVs) (insertions or deletions) (Fig. 12.4). Relative to the plastid phylogeny of Spooner et al. (2017), there is a direct cladistic relationship of these SVs with all accessions of Daucus and its immediate sister species P. pumila, R. polygama, and C. platycarpos (Fig. 12.4). To expand the search for the DcMP insertion within the Apiaceae, Downie and Jansen (2015) compared the plastomes of six Apiaceae species (C. maritimum, D. carota, Hydrocotyle verticillata, Petroselinum crispum, Tiedemannia filiformis subsp. greenmani, and Anthriscus cerefolium, the last a member of the Scandicinae subtribe). Although another putative insertion of mtDNA, unrelated to DcMP, is present in the plastid genome of P. crispum, none of the plastid genomes other than that of D. carota contains the DcMP sequence. Overall, these two studies indicated that the DcMP insertion is restricted to the Torilidinae subtribe (C. platycarpos) and the Daucinae subtribe (36 species), which implies that within the Scandiceae tribe these two subtribes are genetically more closely related to each other than to the Scandicinae subtribe, where the insertion has not been detected. This hypothesis is supported by previous systematic and molecular marker work (Lee and Downie 2000; Lee et al. 2001) and confirms our hypothesis that detection of the DcMP sequence can be used as a marker to delineate relationships in this clade.

Fig. 12.4

Phylogenetic distribution and sequence comparison of plastid sequences spanning the mitochondrial-to-plastid (mt-to-pt) insertion designated as DcMP across all species included in Spooner et al. (2017). Green segments represent plastid sequence, and blue segments represent sequence of mitochondrial origin. The green region “F” designates conserved plastid sequences flanking the mt-to-pt insertion. The green region “A” designates a 339-nt region containing the ancestral promoters P4 and P5 (Tohdoh et al. 1981). DcMP1-2-3-4 (blue) designates the regions spanning the original mt-to-pt insertion described in Iorizzo et al. (2012a). CpMP5 and CpMP6 denote the two large insertions (6663 and 360 nt) identified by Spooner et al. (2017) in C. platycarpos; trnV (green) represents the region coding for the trnV-GAC gene in the carrot plastid genome; P1, P2, and P3 indicate the location of the three putative promoters of the D. carota trnV (Manna et al. 1994). The vertical gray lines indicate the location of the 6-nt direct repeat flanking the DcMP insertion in D. carota and described in Iorizzo et al. (2012b). The double slash designates the masked portion of the 6663-nt insertion identified in C. platycarpos; this portion of the CpMP5 insertion was masked to fit the figure in one panel. Vertical black lines indicate single-nucleotide polymorphisms (SNPs). Red lines indicate deletions identified based on the sequence alignment

Sequence analysis of the DcMP regions detected in 36 species (Spooner et al. 2017) revealed other important aspects related to IGT in plants. Within the DcMP region, two large insertions were detected in the C. platycarpos plastid genome, named CpMP5 (6663 nt) and CpMP6 (360 nt). A large portion of the CpMP5 sequence (KX832334 from 102,567 to 105,470) shares a high similarity (91% identity) with DCAR_022437, a nuclear gene located on carrot Chr6 annotated as an auxin response factor (ARF). The alignment covers seven of the 14 DCAR_022437 predicted exons, and none of its flanking nuclear sequences shares similarity with other plastid sequences (Fig. 12.5a). These findings represent the first evidence of a known nuclear sequence inserted in a plastid genome. Either the plastid ARF DNA sequence found in C. platycarpos could be part of the ancestral mitochondrial DcMP sequence, or it could have been transferred directly from the nucleus or mitochondrion into the plastid after the mt-to-pt DcMP insertion occurred. The mechanism of transfer of this nuclear DNA relative to the insertion of DcMP in the plastid genome is unknown. However, the sequence covering the DcMP and CpMP regions documented in C. platycarpos contains an intact cox1 copy and fragments of the ARF gene. Indeed, the CpMP5 3′ end and CpMP6 5′ end are contiguous to the pt-DcMP2 sequence and the carrot mt-DcMP2 flanking sequences and cover the full length of the mitochondrial cox1 gene (Fig. 12.5b). These findings indicate that direct insertion of nDNA into the plastome at the very same locus as the mtDNA insertion is implausible compared with its insertion along with the mtDNA, as mitogenomes of land plants contain abundant foreign DNA from both IGT and HGT events (Knoop 2004; Alverson et al. 2010; Park et al. 2014). In particular, an ARF gene (ARF17) has been transferred to the mitogenome in several genera of Brassicaceae (Qiu et al. 2014).

Fig. 12.5

DcMP comparative analysis. a Comparison between the Daucus carota nuclear genome region containing auxin response factor (ARF) gene DCAR_022437 in the antisense orientation, and Caucalis platycarpos plastid sequence spanning the DcMP region. Gray shading linking sequences indicates regions with >92% nucleotide similarity. b Comparison between the C. platycarpos plastid sequence spanning the DcMP region and D. carota plastid and mitochondrial genomes. Red dashed lines indicate deletions of the sequence in the corresponding genome. Mitochondrial sequences are not directly contiguous, as represented by gaps and blue dashed lines. Regions labeled with single digits 1 through 4 correspond to DcMP regions 1 through 4

In higher plants, horizontally transferred DNA is generally not functional in the recipient genome (Bock 2015; Richardson and Palmer 2007). In contrast, in carrot the DcMP sequence integrated three new functional promoters (P1, P2, and P3) located 105, 41, and 16 nt upstream of trnV, respectively, at the 3′ DcMP insertion junction. According to Manna et al. (1994), all three promoters are expressed in carrot cells and are responsible for the differential expression of trnV during embryogenesis. If all three promoters have a functional role, their sequences would be expected to be conserved. Across the samples harboring the pt-DcMP insertion, however, SVs resulted in the deletion of the P1 or P2 promoter sequences in at least one species (Spooner et al. 2017). In contrast, although multiple independent insertion or deletion events occurred in the DcMP4 region near the P3 promoter, its sequence is conserved across all accessions harboring the DcMP insertion (Fig. 12.4). If the hypothesis proposed by Manna et al. (1994) is correct, namely that the P3 promoter plays a functional and advantageous role in the expression of trnV, the comparative studies suggest that natural selection has maintained its sequence intact, promoting the retention of the ancestral DcMP sequence in the plastid genome after its first integration.