Abstract
Introns and spacers are a rich and well-appreciated information source for evolutionary studies in plants. Compared to coding sequences, the mutational dynamics of introns and spacers is very different, involving frequent microstructural changes in addition to substitutions of individual nucleotides. An understanding of the biology of sequence change is required for correct application of molecular characters in phylogenetic analyses, including homology assessment, alignment coding, and tree inference. The widely used term “indel” is very general, and different kinds of microstructural mutations, such as simple sequence repeats, short tandem repeats, homonucleotide repeats, inversions, inverted repeats, and deletions, need to be distinguished. Noncoding DNA has been indispensable for analyses at the species level because coding sequences usually do not offer sufficient variability. A variety of introns and spacers has been successfully applied for phylogeny inference at deeper levels (major lineages of angiosperms and land plants) in past years, and phylogenetic structure R in intron and spacer data sets usually outperforms that of coding-sequence data sets. In order to fully utilize their potential, the molecular evolution and applicability of the most important noncoding markers (the trnT–trnF region comprising two spacers and a group I intron; the trnS–G region comprising one spacer and a group II intron in trnG; the group II introns in petD, rpl16, rps16, and trnK; and the atpB–rbcL and psbA–trnH spacers) are reviewed. The study argues for the use of noncoding DNA in a spectrum of applications from deep-level phylogenetics to speciation studies and barcoding, and aims at outlining molecular evolutionary principles needed for effective analysis.
Introduction
The application of noncoding chloroplast DNA sequence data in plant molecular systematics has been steadily increasing over the last decade. Sequencing of rapidly evolving spacers and introns was initially proposed for unravelling evolutionary patterns among closely related species (Taberlet et al. 1991; Manen and Natali 1995). The idea was to use universal amplification primers that anneal to conserved genes and thereby span more variable spacers and introns. At about the same time, pronounced differences in mutational dynamics and consequently in levels of variability between coding and noncoding plastid regions were pointed out by Morton and Clegg (1993), Clegg et al. (1994), and others. As compared to coding genes, the sequences of introns and spacers are functionally less constrained. This, however, describes average sequence conservation. Introns in particular possess a well-conserved secondary structure that leads to a mosaic of highly conserved and extremely variable parts (Cech 1988; Michel et al. 1989; Cech et al. 1994; Kelchner 2002; Borsch et al. 2003). For example, in the trnL group I intron, the PQRS elements exhibit hardly any mutations throughout land plants (Quandt et al. 2004), whereas the P6 and P8 stem-loop elements are extremely variable. In group II introns, the different domains exhibit different conservation patterns due to their different roles in intron function (Lehmann and Schmidt 2003; Pyle and Lambowitz 2006). To some extent, such a mosaic of differing sequence conservation can also be found in spacers. Spacers can be fully, partly, or not at all transcribed and contain conserved promoter elements as well as hairpin structures to end transcription (e.g., Quandt et al. 2004; Won and Renner 2005; Štorchová and Olson 2007).
Most important is the growing awareness that DNA evolution is considerably more complex than a mere stochastic replacement of nucleotides with eventually differing rates at different sites. This awareness has led to the recognition of microstructural mutations and their underlying biological patterns as an important feature to be considered in the phylogenetic analysis of sequences (e.g., Thorne et al. 1992; Gu and Li 1995; Benson 1997; Kelchner 2000). Contrary to structural modifications and rearrangements within genomes, which can copy, translocate, or invert whole genes or blocks of genes including spacers and introns, microstructural mutations occur in addition to genomic rearrangements (Graham et al. 2000). Microstructural mutations thus per definition occur within genomic regions, such as genes, introns, and spacers, and on average do not exceed 200 nucleotides.
Borsch et al. (2003, 2005) and Löhne and Borsch (2005) showed that rapidly evolving introns and spacers of the chloroplast genome single-copy regions possess high performance as phylogenetic markers for resolving relationships among major lineages of angiosperms. The recognition of microstructural mutations, most of which are simple sequence repeats, allowed a robust sequence alignment. It was further shown that extreme sequence variability is confined to mutational hotspots in the structurally and functionally least constrained areas of the introns or intergenic spacers. And consequently, these mutational hotspots, in which homology could not be established for more distant sequences, could easily be excluded from phylogenetic analysis. As a result, trees depicting deep nodes in angiosperms that were reconstructed with about one-fifth of the characters coming from noncoding DNA (Borsch et al. 2003) yielded comparable or even higher resolution and overall statistical node support as those data sets derived from multiple genes of all three genomic compartments (Qiu et al. 1999). Several studies have compared the performance of coding versus noncoding markers at the family level (e.g., Richardson et al. 2000; Asmussen and Chase 2001; Sauquet et al. 2003) arriving at the conclusion that trees of spacer and intron sequences usually have higher consistency and retention indices as compared to rbcL or atpB trees.
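The exclusion of mutational hotspots from an alignment can be illustrated with a minimal sketch. It assumes that the alignment is held as a mapping from taxon names to aligned sequences and that the hotspot boundaries have already been determined by the researcher; the function name `exclude_columns` and the column ranges are hypothetical and do not correspond to any published tool.

```python
def exclude_columns(alignment, regions):
    """Return a copy of the alignment without the columns in the given regions.

    alignment: dict mapping taxon name to aligned sequence (equal lengths).
    regions: list of (start, end) column ranges, 0-based, end exclusive
             (hypothetical hotspot boundaries chosen by the researcher).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same aligned length")
    # Collect every excluded column once, so overlapping regions are harmless.
    excluded = set()
    for start, end in regions:
        excluded.update(range(start, end))
    return {
        taxon: "".join(base for i, base in enumerate(seq) if i not in excluded)
        for taxon, seq in alignment.items()
    }
```

The excluded columns are simply dropped from every sequence, so that the remaining positions keep their homology statements unchanged.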
The higher number of variable sites in noncoding DNA was considered to be an explanation of its better phylogenetic performance when compared to coding DNA. Inspired by the possibility that functionally less constrained noncoding DNA of introns and spacers also evolves closer to neutrality than DNA of coding genes, Müller et al. (2006) developed a software package to test the phylogenetic performance per variable site. Their study compared rbcL, matK, and trnT–trnF data sets for early branching angiosperms as a model. The result was positive and showed an increasing quality of phylogenetic signal per informative site from rbcL to matK to noncoding trnT–trnF. Selection of optimal phylogenetic markers for a given range of inclusiveness of the study group (i.e., whether relationships among species, genera, families, or major plant lineages should be inferred) therefore goes beyond selecting simply by amounts of variability, as for example that described by average p-distances, usually found in a genomic region. A simple correlation of genetic distances with organismic distances is further complicated by structurally constrained mutational hotspots (Borsch et al. 2003) with lineage-specific possibilities for homology assessment. There is thus a need for understanding mutational dynamics of genomic regions in relation to their structural and functional constraints in order to optimize their use as phylogenetic markers.
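For orientation, the average p-distance mentioned above is the proportion of differing sites between two aligned sequences, averaged over all sequence pairs. The following sketch assumes that positions with a gap or an ambiguous base in either sequence are skipped; this treatment of gaps, like the function names, is an assumption made for illustration and is not taken from the studies cited.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of differing sites among positions comparable in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: gaps and ambiguous bases are ignored rather than scored.
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(sequences):
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Such a value summarizes the amount of variability only; it says nothing about the phylogenetic structure per variable site discussed above.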
The first chloroplast genome that was completely sequenced is from Nicotiana tabacum (Shinozaki et al. 1986). Due to ever decreasing costs per base pair sequenced and recent methodological innovations (e.g., Jansen et al. 2005; Moore et al. 2006), several complete chloroplast genomes of green plants are now generated every year (e.g., Wolf et al. 2003; Goremykin et al. 2004; Leebens-Mack et al. 2005; Chang et al. 2006; Chumley et al. 2006; Pombert et al. 2006; Jansen et al. 2007; Moore et al. 2007; Haberle et al. 2008). We therefore have a good understanding of the overall architecture of this genomic compartment in plants.
In this review, we focus on the noncoding proportion of the chloroplast genome as phylogenetic information source. Examining the rich literature of the last decade that deals with chloroplast introns and spacers, and evaluating results obtained in our own work, we aim at providing an up-to-date summary on molecular evolution and phylogenetic utility of noncoding chloroplast DNA. First of all, we will characterize the structurally and functionally different noncoding regions of the chloroplast genome. Then we will look at the different known forms of microstructural mutations, and building upon this, we will address selected individual introns and spacers and discuss their molecular evolution and phylogenetic utility. Prospects and pitfalls for their use will be especially considered. Further paragraphs then deal with more general issues: the specificity of evolutionary patterns to the chloroplast genome, limits to analysis imposed by mutational hotspots, the taxonomic spectrum to which spacer and intron sequence data can be applied, the idea to search for markers with the highest phylogenetic structure, and quality measures needed when using noncoding markers. With this paper we want to stimulate both the application of noncoding sequence data in plant evolutionary studies and further research on molecular evolution of introns and spacers.
The plastid genome and its structurally and functionally different noncoding genomic regions
Apart from a few exceptions, such as in the majority of hornworts, each cell of land plants usually contains multiple plastids, the number of which differs among cells of different organs. Similar to mitochondria, plastids are inherited following a uniparental pathway (Birky 1995). While in most cases plastids are inherited maternally, as in the vast majority of flowering plants (e.g., Corriveau and Coleman 1988; Reboud and Zeyl 1994), paternal inheritance is occasionally observed. Examples are Actinidia in Actinidiaceae (Testolin and Cipriani 1997) and gymnosperms (reviewed in Reboud and Zeyl 1994). Each plastid harbors several generally uniform plastid chromosomes that contain a ringlike DNA molecule. Following the general consensus, we call this generally circular molecule the plastid genome. Because of uniparental inheritance as well as the general lack of recombination and heteroplasmy, the polyploid plastid genome is effectively haploid. The number of genomes per plastid differs among lineages and depends on the age, maturity, and distance from the basal meristem (e.g., Baumgartner et al. 1989; Shaver et al. 2006). For example, in Nicotiana tabacum, cells of young leaves contain 190 plastid genomes, whereas the number decreases to 70 in mature leaves (Shaver et al. 2006). Typically, these plastid genomes are not equally distributed throughout the organelle but are concentrated in defined areas where they form aggregates with proteins, so-called nucleoids (Kuroiwa 1991; Sakai et al. 2004). Sticking to our model plant Nicotiana tabacum, Kuroiwa (1991) reports 8–40 nucleoids, each with about 10 plastid genomes, in mature chloroplasts.
Having lost more than 1,700 genes through transfer to the nucleus during endosymbiosis (Martin et al. 2002), contemporary land-plant plastid genomes range in size from 70 to 217 kb. Extreme genome size deviations occur in some lineages, particularly parasitic plants, that show even further reduced genomes, but most genomes are typically 120–160 kb. The usually monomeric circular plastid genomes are characterized by a quadripartite organization in which two single-copy regions, the small single-copy region (SSC region) and the large single-copy region (LSC region), are separated by two inverted repeats (IRA/IRB). However, monomeric circles are not necessarily the rule. For example, Lilly et al. (2001) reported that only 40–50% of the analyzed plastid genomes of Arabidopsis and Nicotiana are found to be circular. The rest of the molecules were either linear (20–25%) or present in other conformations, such as D-loops or lasso-like. In addition, it is known that plastid genomes can exist in two orientations (isomers), differing only in the relative orientation of their single-copy sequences (Palmer 1983). Responsible for these isomers is a so-called flip-flop recombination between the large inverted repeat sequences (Palmer 1985). Although evolving more slowly, with a substitution rate that is three to four times lower than that of the single-copy regions (Wolfe et al. 1987), the inverted repeats account for most of the length variation, ranging in size from 5 to 76 kb (Palmer 1991; Chumley et al. 2006). They can even be lost, such as in some legumes and conifers (Koller and Delius 1980; Wakasugi et al. 1994). The corresponding single-copy regions are similar in size throughout land plants, usually ranging between 16 and 27 kb (SSC) and 80 and 90 kb (LSC).
In Fig. 1, we show the plastid genome of Nicotiana tabacum for reference and illustration because it was the first one completely sequenced (Shinozaki et al. 1986) and because it represents the typical genome structure of the majority of land plants. Generally, the plastid genome encodes 3–5 plastid rRNA genes, about 30 tRNA genes, and usually more than 100 protein-encoding genes (e.g., Sugiura et al. 1998). Due to its origin, it also shares characteristics with cyanobacterial genomes, such as sequence and organization of transcription promoters, mRNAs lacking 5′ caps and eukaryotic 3′ poly (A) tails, 70S-type ribosomes, and the general lack of spliceosomal introns (e.g., Yamaguchi et al. 2000; Zerges 2000). In the plastid genome of Nicotiana, the protein and RNA coding genes make up only 60% of the genome. Thus, 40% of the genome consists of noncoding regions, such as intergenic spacers (IGS; regions separating coding regions) or introns (noncoding DNA within a gene; Fig. 2).
Genes in the plastid genome are usually separated by intergenic spacers. While these spacers are generally considered to be noncoding, they usually harbor important elements for transcription, splicing, and translation as well as maturation of mRNA. For example, plastid transcription promoters are usually located in the IGS upstream of the protein-coding regions (Fig. 2). From the starting point of transcription only a few base pairs of these spacers are transcribed, and they are therefore also termed nontranscribed spacers (such as the trnT–trnL IGS), in contrast to transcribed intergenic spacers such as the trnL–trnF IGS (Fig. 2). Based on such structural and functional constraints, spacers often exhibit mosaic-like patterns of variability. There is a trend of transcribed spacers or transcribed parts of spacers being more conserved. This was impressively shown for the psbA–trnH spacer (Štorchová and Olson 2007), which is rather conserved from the psbA CDS downstream to the putative transcription end for psbA, whereas the nontranscribed part further downstream is extremely variable at the population level within various groups of angiosperms. Intergenic spacers are those noncoding genomic regions that under some circumstances may be affected by chloroplast genomic rearrangements in certain lineages, thus limiting their phylogenetic utility to these lineages (see below). As currently known, the psbA–trnH spacer is also one of the spacers possessing the most complex mutational dynamics, occasionally harboring complete protein-coding genes in some lineages of flowering plants such as monocots. With the expansion and shrinkage of the inverted repeat regions being the most important evolutionary process affecting the structure of the chloroplast genome (Raubeson and Jansen 2005), the position of the psbA–trnH IGS in the large single-copy region close to the inverted repeat explains why it is likely to be affected by structural mutations. Molecular evolutionary patterns of chloroplast genome spacers should therefore be examined across land plants as a basis for assessing their utility.
Several rearrangements in the organization of plastid genomes have been found in different land-plant lineages. Extreme deviations are long known from the parasite Epifagus (Wolfe et al. 1992) and were more recently reported from Pelargonium (Chumley et al. 2006). Among angiosperms large inversions and rearrangements occur in legumes (e.g., Kato et al. 2000) and grasses (e.g., Hiratsuka et al. 1989), but also in almost all other major land-plant lineages, such as gymnosperms (e.g., Wakasugi et al. 1994), ferns (e.g., Wolf et al. 2003), hornworts (Kugita et al. 2003), mosses (Sugiura et al. 2003), and liverworts (Ohyama et al. 1986). Despite the observed variability, the plastid genomes of all land plants exhibit a remarkable similarity in size and overall architecture. Gene content and order as well as intron composition of the charophyte Chaetosphaeridium largely resemble the plastid genome of the liverwort Marchantia polymorpha. It can therefore be assumed that the architecture of plant chloroplast genomes was already gained during the evolution of the common ancestor of charophytes and land plants (Turmel et al. 2002).
Introns make up 12% of the total sequence length of the plastid genome of Nicotiana, representing one-third of its noncoding regions. Apart from the trnLUAA intron, which represents the only group I intron in the plastid genome of land plants, all other plastid introns are classified as group II introns, according to a conserved RNA folding pattern and a particular splicing mechanism (Michel et al. 1989; Kelchner 2002; Haugen et al. 2005). During the last few years, the molecular evolution of “autocatalytic” plastid introns, especially the trnL group I intron (e.g., Borsch et al. 2003; Stech et al. 2003; Quandt et al. 2004; Quandt and Stech 2005) as well as the trnK, rpl16, rps16, and petD group II introns (e.g., Hausner et al. 2006; Kelchner 2002; Löhne and Borsch 2005; Müller and Borsch 2005a, b) received considerable attention from the phylogenetic community.
In concordance with a single origin of chloroplasts (Nelissen et al. 1995; Delwiche and Palmer 1997; McFadden and van Dooren 2004), all trnLUAA genes form a monophyletic group, thus comprising only trnL orthologues that entered the plastid genome in a single event via endosymbiosis of the progenitors of plastids (e.g., Kuhsel et al. 1990; Besendahl et al. 2000). One of the characteristics of the trnLUAA intron, and of group I introns in general, is the mosaic structure of highly conserved elements (P, Q, R, S, and the internal guide sequence, which is also abbreviated IGS in the intron literature and should not be confused with the intergenic spacers above) essential for correct splicing (e.g., Davies et al. 1982; Cech 1988, 1990) that alternate with less constrained regions of variable size (Fig. 3; e.g., Cech et al. 1994; Quandt et al. 2004).
In principle, group II introns share an RNA structure consisting of six domains arranged around a central wheel (Fig. 3; e.g., Michel et al. 1989; Qin and Pyle 1998; Hausner et al. 2006). Each of these domains exhibits a characteristic size, structure, and variability (see Korotkova et al. 2009, this volume, for illustrations). Whereas domains I and IV tend to be large and include highly length-variable AT-rich stem-loop elements in plants (Kelchner 2002; Löhne and Borsch 2005; Hausner et al. 2006; Korotkova et al. 2009, this volume), domains V and VI are small and particularly well conserved. Group II introns thus represent a characteristic mosaic structure attributed to the need to form an active core for splicing. Another peculiar feature of some group II introns is the presence of an open reading frame (ORF) in domain IV. The respective intron-encoded proteins (IEPs) are considered to assist in splicing the host intron (Toor et al. 2001; Hausner et al. 2006). However, most extant organellar group II introns either entirely lack, or harbor severely degraded, ORFs, with the intron in trnK and its IEP matK actually representing the only functional maturase in the plastome (Toor et al. 2001; Hausner et al. 2006). In contrast to the tRNALeu group I introns of cyanobacteria, which readily undergo auto-excision (Xu et al. 1990), there are so far no reports of self-splicing activity of plastid trnLUAA introns, nor of any other plastid intron (Xu et al. 1990; Besendahl et al. 2000). It has, therefore, been argued that plastid group I as well as group II introns depend on splicing co-factors that interact with the pre-mRNA intron and facilitate secondary and perhaps tertiary structure formation (e.g., Akins and Lambowitz 1987; Cech et al. 1992; Besendahl et al. 2000). In the case of plastid group II introns, a universal role of matK in the splicing process has been advocated recently (reviewed in Hausner et al. 2006), a role that evidently predates the origin of vascular plants (Duffy et al. 2009). Several nuclear-encoded splicing factors such as CRS1 (chloroplast RNA splicing 1), required solely for the splicing of the chloroplast atpF intron, have been reported (Jenkins et al. 1997; Vogel et al. 1999; Ostersetzer et al. 2005).
Forms of microstructural mutation
Anyone who has aligned noncoding DNA sequences knows that the alignment of protein-coding regions is much more straightforward. Apart from the increased substitution rates in noncoding compared with coding DNA, this difficulty is largely attributed to more numerous indels and to inversions (i.e., inverted sequence motifs, Figs. 4, 5, and 6). The term indel is widely used for length mutations visible in an alignment and sometimes even for length mutations in general. Whether a length mutation corresponds to an insertion or a deletion of a sequence motif can only be evaluated in a phylogenetic context. Inversions usually have no effect on sequence length. We therefore prefer to use microstructural mutations as the most inclusive term. This better reflects the fact that inversions are, strictly speaking, another kind of mutation in addition to indels. Inversions have just recently been discovered to be rather frequent in many data sets of noncoding sequences, which may explain why they have not held a prominent place in many previous discussions on molecular characters.
In the assessment of microstructural mutations, it is important to consider that spacers and introns, like rRNA and tRNA coding regions, do not show a triplet-like mutation pattern. The few rapidly evolving protein-coding regions, such as matK, sometimes deviate from a triplet pattern in less conserved areas of the gene; such deviations from the reading frame, however, are generally repaired a few nucleotides up- or downstream from the initial indel (e.g., Wanke et al. 2007). Figure 4 illustrates the most important microstructural mutations of noncoding chloroplast genomic regions. For example, the motif “GCGTC” present in two of a set of four sequences (Fig. 4a) could either be inserted in sequences 2 and 3 or deleted in 1 and 4. From the alignment it is not possible to judge which assumption is correct. Nor is there any obvious adjacent sequence motif that could be associated with a particular mutational event. As we will realize further down, the simple situation described here is infrequent in the chloroplast genome. Most of the time when there are no obvious motifs directly adjacent to an indel, reconstruction of sequence evolution in the phylogenetic context will show a deletion. Larger deletions (approximately 200–300 nt) have been reported in some spacers such as the trnL–trnF and trnT–trnF spacers (Yang and Wang 2007; Sánchez del-Pino et al. 2009) and the psbA–trnH spacer (340-nt deletion in Hypericum annulatum, Kosuth et al. unpublished data).
This is different in the case of the simple sequence repeats (SSR) illustrated in Fig. 4b. Here, the same “GACAT” sequence motif appears twice, in two adjacent copies. Empirical data from plastid genomes obtained through various phylogenetic studies in different taxonomic groups revealed that SSRs represent the most common type of indels, whereas indels of random motifs that have no similarity with the adjacent region are rather rare (e.g., Graham et al. 2000; Löhne and Borsch 2005; Müller and Borsch 2005a; Stech and Quandt 2006; Borsch et al. 2007). Recent empirical studies of chloroplast sequence evolution in a phylogenetic context indicate that SSRs of a certain size (i.e., >3 nt) in the vast majority of cases are in fact insertions (Borsch et al. 2007). Such simple sequence repeats thus result from short duplications of sequence that are inserted adjacent to their template. Which of the two motifs (Fig. 4b) represents the ancestral state (= the template) can not be determined by any criterion during the alignment process, nor after testing respective homology statements in the alignment in a phylogenetic context. As long as no substitutions occur in the repeated sequence motifs, this is not problematic. In cases with substitutions, the different positional assignment of an SSR can imply different signals. The effects are, however, minor. In most cases of chloroplast DNA there is only one or rarely very few substitutions between duplicated sequences, also still allowing for easy motif recognition.
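Whether a length mutation qualifies as a simple sequence repeat can be checked mechanically by comparing the inserted motif with the sequence directly adjacent to it. The sketch below assumes that the position of the motif in the unaligned sequence is already known from the alignment; the function name and the `max_mismatches` parameter, which allows for the few substitutions mentioned above, are hypothetical.

```python
def is_simple_sequence_repeat(sequence, start, end, max_mismatches=0):
    """Return True if sequence[start:end] is a tandem copy of the motif
    directly upstream or directly downstream of it.

    start, end: 0-based coordinates of the candidate motif, end exclusive.
    max_mismatches: substitutions tolerated between the two copies
                    (an illustrative parameter, not a published threshold).
    """
    seq = sequence.upper()
    motif = seq[start:end]
    n = len(motif)
    if n == 0:
        raise ValueError("motif must not be empty")

    def matches(candidate):
        # A flanking stretch counts only if it has the full motif length.
        if len(candidate) != n:
            return False
        return sum(x != y for x, y in zip(candidate, motif)) <= max_mismatches

    upstream = seq[start - n:start] if start - n >= 0 else ""
    downstream = seq[end:end + n]
    return matches(upstream) or matches(downstream)
```

For the motif “GACAT” in a sequence such as TTAGACATGACATCCG, the call `is_simple_sequence_repeat("TTAGACATGACATCCG", 8, 13)` reports a repeat, whereas a motif without similarity to its flanks, as in Fig. 4a, does not.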
Basically, a dimeric short (tandem) repeat (STR) is nothing other than a simple sequence repeat (SSR) of two nucleotides (Fig. 4c). However, the number of insertion or deletion events cannot be obtained readily from the alignment because mutations can involve more than one nucleotide dimer in one step. When repeat motifs arise in this way and repeat recognition is obscured, we speak of cryptic simplicity. This phenomenon is mainly attributed to microsatellites (Tautz et al. 1986) where secondary deletions of a previous duplication, secondary duplications of already duplicated elements, etc. might occur. The respective mutational events can only be detected with some certainty in a phylogenetic framework if all sequences that are the result of an evolutionary process in a set of extant taxa are included.
Moreover, the terminology and classification of such repetitive DNA are somewhat confusing in the literature. STRs are usually understood to consist of short sequences, normally not exceeding five or six nucleotides, repeated numerous times in a head-to-tail manner. In that sense, the definition is equivalent to microsatellites. Whereas STRs or microsatellites (especially nuclear microsatellites) represent the most common marker for population-level studies, analyses using plastid microsatellites are rather limited. This is because they show lower mutation rates compared to nuclear microsatellites and are less abundant (e.g., Provan et al. 1999). When the repeat motifs range from 10 to 60 nucleotides, they are referred to as minisatellites. In contrast to nuclear minisatellites, plastid minisatellites are again less commonly used (e.g., Cozzolino et al. 2003), and patterns of variability in plastid minisatellites are poorly understood. They occur for example in stem-loop regions of intron sequences, and mutational rates may differ considerably between lineages and respective genomic regions harboring a minisatellite. Borsch et al. (2007) found no infraspecific variation in satellites located in the P8 stem-loop of the trnL group I intron in Nymphaea, although recent studies reporting extensive duplications of trnF pseudogenes in the plastid genome of Brassicaceae (e.g., Koch et al. 2005; Ansell et al. 2007; Schmickl et al. 2009, this volume) show variability of repeat number at the population level. Technically, the trnF pseudogenes could also be classified as minisatellites, but it remains to be seen if mutational mechanisms compare to the above-described satellites. The recently detected sequential repeats of 30–200 nt at the endpoints of rearranged blocks of genes in the highly restructured plastid genomes of Jasminum, Pelargonium, and Trachelium (Cosner et al. 1997; Chumley et al. 2006; Lee et al. 2007; Haberle et al. 2008) may also fall into the above category. These repeats were considered to be caused by the action of DNA strand repair mechanisms following a large mutation, such as an inversion (Haberle et al. 2008).
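The size classes used above can be made explicit in a short sketch that locates head-to-tail repeats of a given motif length and assigns a class by motif length. The class boundaries follow the ranges given in the text (up to six nucleotides for microsatellites, 10 to 60 for minisatellites); treating the boundaries as strict, leaving the lengths in between unclassified, and the function names are assumptions made for illustration.

```python
import re


def find_tandem_repeats(sequence, motif_length, min_copies=2):
    """Find head-to-tail repeats of motifs with the given length.

    Returns a list of (start, motif, copies) tuples, 0-based start.
    Motifs consisting of a single repeated nucleotide are skipped when
    motif_length > 1, because they are homopolymer runs in disguise.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_length, min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if motif_length > 1 and len(set(motif)) == 1:
            continue
        copies = len(match.group(0)) // motif_length
        hits.append((match.start(), motif, copies))
    return hits


def classify_motif_length(motif_length):
    """Assign a repeat class by motif length (boundaries assumed strict)."""
    if motif_length == 1:
        return "homopolymer"
    if 2 <= motif_length <= 6:
        return "microsatellite"
    if 10 <= motif_length <= 60:
        return "minisatellite"
    return "unclassified"
```

A search with a motif length that is a multiple of the true motif (e.g., four for an AT dimer) will report the same stretch with fewer copies, so the shortest motif length should be tried first.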
A further case is the so-called mononucleotide or homopolymeric “repeats” (Fig. 4d). They are abundant in the plastid genome and are commonly considered as plastid microsatellites (e.g., Weising and Gardner 1999). Ranging in size from approximately 7 to 20 nucleotides, mostly As or Ts, and usually exhibiting size variability, they are a valuable and frequently applied source of sequence change for determining different haplotypes within plant species (e.g., Echt et al. 1998; Weising and Gardner 1999). Recent analyses show that the mutational dynamics in plastid microsatellites is fast compared to sequences outside the satellites and may involve repeated and overlapping insertions and deletions from one to several nucleotides (Tesfaye et al. 2007). The uniformity of nucleotides hinders motif recognition, so that homology assessment is rarely possible once there are more than two different sizes of plastid microsatellites in a data set. Homoplasy of indel characters from microsatellites will likely be very high, so that their utility in phylogenetic analyses is limited.
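Homonucleotide stretches of the size range given above can be located with a simple pattern search. The following sketch assumes a minimum run length of seven nucleotides as the default, taken from the approximate range stated in the text; the helper `homology_is_doubtful` merely encodes the rule of thumb that more than two different sizes in a data set make homology assessment unreliable. Both function names are hypothetical.

```python
import re


def homopolymer_runs(sequence, min_length=7):
    """Return (start, base, length) for each mononucleotide run of at least
    min_length nucleotides (0-based start)."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


def homology_is_doubtful(run_lengths):
    """True if a homologous run occurs in more than two different sizes."""
    return len(set(run_lengths)) > 2
```

The run lengths observed at one locus across all sequences of a data set can be passed to `homology_is_doubtful` to flag regions that are better excluded from indel coding.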
It has been observed that a certain size class of SSRs, those four to six nucleotides in length, has a pronounced frequency of occurrence in plastid genomes (e.g., Graham et al. 2000; Stech and Quandt 2006; Borsch et al. 2007). Our graph (Fig. 5) shows this pattern to be widespread in noncoding chloroplast DNA. This is unexpected because, theoretically, if slipped strand mispairing (SSM) following Levinson and Gutman (1987) is assumed as the mechanism for repeat formation, the frequency of repeats should drop with the length of the repeat motif. In addition, molecular evolutionary analysis shows that there is a strong insertion bias for simple sequence repeats of three and more nucleotides, i.e., once a repeat is gained, it is very unlikely to be lost again (Borsch et al. 2007). Both observations led to the assumption that mechanisms other than SSM sensu Levinson and Gutman (1987) must be responsible for the formation of complex repeats (e.g. Quandt et al. 2006; Borsch et al. 2007). The phylogenetic utility of SSRs located outside of satellite regions is very high across different genomic partitions and plant lineages. It has now been shown in many studies that adding microstructural characters significantly increases resolution and support over the simply substitution-based matrices of chloroplast sequences (e.g., Graham et al. 2000; Simmons et al. 2001; Hamilton et al. 2003; Müller and Borsch 2005a, b; Löhne and Borsch 2005; Borsch et al. 2007). Microstructural characters are often less homoplasious than substitutions, even though most microstructural characters in plastid DNA matrices result from SSRs.
Unlike the many easily identifiable SSRs in organellar genomes, smaller inversions are less frequent and often overlooked during alignment (Fig. 4e). Based on our own experience, we know that inversions are generally not detected if an automatic alignment approach is chosen. Inversions have received more attention during the past 15 years, and empirical studies on noncoding as well as protein-coding organellar DNA reveal that minute to small inversions occur more frequently than previously thought (e.g., Golenberg et al. 1993; Kelchner and Wendel 1996; Graham et al. 2000; Quandt et al. 2003; Quandt and Stech 2004; Kim and Lee 2005; Hernández-Maqueda et al. 2008). Although in organellar genomes these inversions are generally associated with hairpins (Fig. 6), their recognition is seemingly difficult, especially if subsequent mutations or limited sampling size hampers a clear identification (Hernández-Maqueda et al. 2008). Prominent examples are a 4-bp inversion found in the rpl16 intron of Chusquea (Kelchner and Wendel 1996; Kelchner and Clark 1997) or a 7-bp inversion upstream of trnF in mosses (Fig. 6; Quandt and Stech 2004). Inversions are not restricted to spacers and introns, however; they can also be found in coding regions such as matK (Quandt unpublished). In contrast to other microstructural mutations, the recognition of inversions is problematic, as it requires an alignment, depends on the sampling size and density, and is affected by the experience of the researcher. With no automated inversion finder being available at the moment, reports of inversions have been based on an inspection of the alignment by eye and secondary structure estimates (e.g., Kelchner and Wendel 1996; Quandt et al. 2003; Hernández-Maqueda et al. 2008).
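The visual check for a hairpin-associated inversion can be expressed as two simple tests: whether a segment in one sequence is the reverse complement of the corresponding segment in another, and whether the segment is flanked by reverse-complementary stems. The sketch below is not an inversion finder; it assumes that the candidate loop coordinates have already been identified by eye, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a nucleotide string (other symbols are kept)."""
    return seq.translate(COMPLEMENT)[::-1]


def looks_inverted(segment_a, segment_b):
    """True if segment_b is the reverse complement of segment_a.

    Identical segments are rejected: a segment that equals its own reverse
    complement carries no evidence for an inversion.
    """
    a, b = segment_a.upper(), segment_b.upper()
    return a != b and reverse_complement(a) == b


def flanked_by_stem(sequence, loop_start, loop_end, stem_length):
    """True if the stretches of stem_length directly before and after the
    loop (0-based, end exclusive) are perfectly reverse complementary."""
    left = sequence[max(0, loop_start - stem_length):loop_start].upper()
    right = sequence[loop_end:loop_end + stem_length].upper()
    if len(left) != stem_length or len(right) != stem_length:
        return False
    return reverse_complement(right) == left
```

Only perfect stems are accepted here; real hairpins may contain mismatches, so a negative result does not rule out an inversion.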
Given the fact that inversions make up entire loops of putative hairpins (Fig. 6), it seems unlikely that inversions can be explained via multiple independent substitutions. Thus, although the mechanism for the formation of inversions is still unknown, the most likely and commonly assumed way to explain these hairpin-associated inversions (HAIs) is a single mutational event (Kelchner and Wendel 1996; Graham et al. 2000; Kelchner 2000; Quandt et al. 2003; Kim and Lee 2005). In terms of generating a homology hypothesis, a safe approach would be to separate inverted and noninverted sequence elements in different columns of the alignment (Fig. 6), as positional homology is not recovered when the inverted nucleotides are aligned directly with noninverted ones. Different suggestions exist on how to handle inversions in alignment coding, and most widely accepted is the proposal to reverse-complement one inversion type in the data matrix (Fig. 6; Graham and Olmstead 2000; Kelchner 2000; Kelchner and Wendel 1996; Quandt et al. 2003; Löhne and Borsch 2005; Sotiaux et al. 2009). In this case, substitutions in the inverted motif are retained as informative characters. In order to use information on the occurrence of inversions in a data set, they can be coded as a single binary character and appended to the data matrix as discussed in Quandt et al. (2003). Interestingly, evidence is accumulating that in most cases the presence/absence of inversions is highly homoplastic, even at the population level (Quandt et al. 2003; Quandt and Stech 2004; Sotiaux et al. 2009).
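The two coding steps described above (reverse-complementing the inverted motif and appending a presence/absence character) can be sketched as follows. This is an illustrative sketch under stated assumptions: the alignment is a simple dictionary of equal-length strings, the column range of the inversion and the taxa carrying it have already been identified by eye, and the name `recode_inversion` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(segment):
    """Reverse-complement a segment; gap symbols are kept as they are."""
    return segment.translate(COMPLEMENT)[::-1]


def recode_inversion(alignment, start, end, inverted_taxa):
    """Reverse-complement columns start..end-1 in the taxa carrying the
    inversion and return (recoded alignment, binary character)."""
    recoded = {}
    binary = {}
    for taxon, seq in alignment.items():
        if taxon in inverted_taxa:
            seq = seq[:start] + reverse_complement(seq[start:end]) + seq[end:]
            binary[taxon] = "1"  # inversion present
        else:
            binary[taxon] = "0"  # inversion absent
        recoded[taxon] = seq
    return recoded, binary


if __name__ == "__main__":
    aln = {"taxonA": "GGGAACTCCC", "taxonB": "GGGAGTTCCC", "taxonC": "GGGAACTCCC"}
    new_aln, inv_char = recode_inversion(aln, 3, 7, {"taxonB"})
    print(new_aln)
    print(inv_char)
```

After recoding, substitutions inside the motif remain available as ordinary characters, while the inversion event itself is represented once in the appended binary column.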
There are further special cases of small inversions. Occasionally observed are the so-called inverted repeats. These small inverted repeats are not located in a terminal loop of a hairpin but represent an inverted simple sequence repeat (Fig. 4f) and can be treated as such in alignment coding. Graham and Olmstead (2000) found an unusual chloroplast DNA inversion of about 200 nt in the IR of the plastid genome of Laurales and Nymphaeales that is flanked by short highly conserved inverted repeats. The authors considered that these inverted repeats may have played a role in the inversion mechanism. Hupfer et al. (2000) similarly detected a number of adjacent small inverted repeats close to a 54-kb inversion in Oenothera elata. It is likely that we will become aware of more such cases with the availability of more completely sequenced chloroplast genomes.
Common noncoding markers and their phylogenetic utility
During the past years, an increasing number of chloroplast spacers and introns have been used in plant evolutionary studies. The most frequently used and thus most important of these markers are discussed in the following section. For practical reasons, we treat genomic regions as “markers,” i.e., units that are normally co-amplified and then sequenced. To facilitate the combination of data sets via sequence databases (EMBL, GenBank), it is actually necessary and practical that the phylogenetic community focus on certain markers, which can then be routinely applied, rather than if individual researchers all use different markers. Most importantly, genomic regions favored as markers should be well understood in terms of their molecular evolution, in order to make full use of their information content and in order to interpret variability patterns correctly. We therefore first present fundamentals of molecular architecture of each genomic region, then discuss practical aspects of amplification and sequencing, and noteworthy peculiarities in their molecular evolution, which are often lineage-specific. We then report on what is known about the performance of the markers in phylogeny reconstruction, analyses of speciation, and their potential for species identification (barcoding). Beyond obvious general trends in phylogenetic utility caused by common structures of the molecules such as the one present in group II introns, there is evidence for additional, rather specific mutational patterns inherent to individual plastid introns and spacers that we present here.
The group II intron in petD
The psbB operon contains five genes (psbB, psbT, psbH, petB, and petD; Westhoff and Herrmann 1988) and is located in the large single-copy region. The petD group II intron resides in the upstream part of petD, following the 8-bp 5′ exon of the gene, and is one of the medium-sized group II introns in the chloroplast genome.
Universal primers were designed by Löhne and Borsch (2005) to co-amplify the intron with the petB–petD spacer, yielding fragments of 850–1,200 nt in angiosperms and gymnosperms. Both universal primer pairs (PipetB1365F and PipetD738R; PipetB1411F and PipetD738R) have been shown to universally amplify the region in a number of studies across seed plants (Löhne and Borsch 2005; Löhne et al. 2007; Worberg et al. 2007; Kårehed et al. 2008; Groeninckx et al. 2009; Korotkova et al. 2009, this volume). According to our experience, petD amplification works extremely well, even with lower quality DNA from herbarium specimens, and usually produces high yields with hardly any other co-amplified products.
Mutational hotspots are restricted to smaller distal parts of stem-loops in domains I and IV (Löhne and Borsch 2005; Worberg et al. 2007), so that alignment is possible among more distantly related sequences. An independent growth of an AT-rich stem-loop region in domain I must have occurred during the evolution of Euphorbiaceae (Korotkova et al. 2009, this volume), resulting in sequences of 100–200 nt that are unique to this lineage. The petD intron may thus be particularly valuable for future studies in Euphorbiaceae. A similar pattern is known from the P8 stem-loop of the trnL group I intron (Borsch et al. 2003, 2007; Quandt et al. 2004). Sequencing of petD is usually very convenient because large polyA/T homonucleotide strands are rare. Given the ease of labwork, its good alignability, and high phylogenetic structure, petD appears to be an ideal marker for angiosperm phylogenetic analysis.
Comparison of phylogenetic structure for identical taxon sets in Nymphaeales (Löhne et al. 2007) indicated slightly lower phylogenetic structure R for the intron as a whole compared to the introns in trnK and rpl16, and also compared to the matK gene. A reason for this rather unusual result in water lilies may be smaller stem-loop elements in domains I and IV that lead to a higher average ratio of more conserved stem elements in the petD intron as compared to the other introns. Kårehed et al. (2008) and Groeninckx et al. (2009) found the petD region to be by far the most phylogenetically informative chloroplast marker in the tribe Spermacoceae of Rubiaceae, compared to the atpB–rbcL spacer, the rps16 intron, and the trnL–trnF region. The petD marker exhibited about twice as many parsimony informative substitutions and indels and showed the best values in a partition metric analysis (Penny and Hendy 1985) of the petD contribution to a combined plastid and nuclear tree. However, the authors did not distinguish between the petB–petD spacer and the group II intron in petD (both constitute the “petD marker”) and did not provide any explanation for such high performance based on molecular evolution. For inferring deep nodes in angiosperms, petD performed similar to matK (Löhne and Borsch 2005; Worberg et al. 2007). Korotkova et al. (2009, this volume) generated the so far best resolved tree of the order Malpighiales using petD sequences.
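For readers unfamiliar with the partition metric, it counts the bipartitions (splits) that are present in one tree but not in the other. The sketch below is a generic illustration and does not reproduce the analysis of the cited studies. It assumes that trees are given as nested tuples of taxon names over the same taxon set, and the function names are hypothetical.

```python
def _collect_clades(tree, clades):
    """Return the leaf set of a nested-tuple tree and record every clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    members = frozenset().union(*(_collect_clades(child, clades) for child in tree))
    clades.append(members)
    return members


def splits(tree):
    """Nontrivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that lacks a fixed reference taxon,
    so that identical splits compare equal regardless of rooting.
    """
    clades = []
    taxa = _collect_clades(tree, clades)
    reference = min(taxa)
    result = set()
    for clade in clades:
        side = taxa - clade if reference in clade else clade
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)
    return result


def partition_metric(tree1, tree2):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree1) ^ splits(tree2))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(partition_metric(t1, t2))
```

A value of zero means the two topologies are identical; larger values indicate that a single-marker tree departs more strongly from the combined tree.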
First studies within speciose genera such as Campanula (Borsch et al. 2009) further underscore the high utility of petD for tree inference and species identification. The absence of long polyA/T stretches makes this intron very straightforward to sequence. There is so far only limited observation about the variability of petD sequences at the species level. Whereas the marker may not be suitable for population-level studies, it could be promising for DNA-barcoding, considering that it is easier to sequence than other noncoding plastid regions, it is one of the few plastid introns that appears to be present in most plant lineages, and it exhibits higher variability than plastid genes.
The group II intron in rpl16
The rpl16 gene is usually flanked by rpl14 and rps3 in the LSC near the IR border of streptophyte plastid genomes. In contrast, Chumley et al. (2006) report a duplication of rpl16 due to an expanded IR in Pelargonium × hortorum that includes rpl14–rpl16–rps4. The rpl16 intron has been lost in at least some Geraniaceae, Goodeniaceae, and Plumbaginaceae (Campagna and Downie 1998).
Two different strategies for amplification of rpl16 are common in the literature. The early method placed the forward primer (F71) into the only 9-nt-long rpl16 5′ exon, including nucleotides of the rps3–rpl16 IGS as well as the rpl16 intron (Jordan et al. 1996). The reverse primer (R1661) was positioned well inside the rpl16 3′ exon, approximately 100 nt from the intron/exon junction (Jordan et al. 1996). For sequencing, Kelchner and Clark (1997) substituted the reverse primer R1661 with R1516, which is situated closer to the intron/exon junction, and thus avoids 72 nt of the 3′ exon. Probably due to the review of Shaw et al. (2005), which suggested using primer R1516 for amplification as well, the F71/R1516 primer combination is nowadays the most frequently used, including minor primer sequence modifications by, e.g., Campagna and Downie (1998). Degenerate primers by Small et al. (2005) were then also able to amplify rpl16 from lycophytes and monilophytes. Because this primer combination turned out to be problematic for bryophytes, Olsson et al. (2009) designed a new reverse primer between R1661 and R1516 that performs very well in combination with F71. The more recent approach, originally developed for angiosperms, co-amplifies the rpl16 intron with the rps3–rpl16 IGS, substituting F71 with a forward primer situated further upstream in rps3 (e.g., Downie et al. 2000; Tesfaye et al. 2007). This approach allows easy amplification and is highly recommended for two reasons: first, it allows recovery of complete intron sequences that then can also be used for molecular evolutionary studies, and second, it facilitates sequencing.
In several lineages, microsatellites are located in domain I close to rps3. If the overall size of the intron exceeds 800 nt, and a second primer is required for sequencing, it is best to use the forward amplification primer also for sequencing. If this forward primer can be placed more than 200 nt away from the microsatellite in the approach described here, many sequence reads will not show problems related to slippage of the satellite sequence. The rpl16 intron is one of the largest plastid introns, similar in size to the trnK intron. It displays more variability compared to most other group II introns, especially with regard to length mutations (e.g., Kelchner 2002; Löhne et al. 2007; Tesfaye et al. 2007). The reason for this appears to be the large domains I and IV with extensive stem-loop elements.
Data sets of rpl16 intron sequences so far examined have comparatively high phylogenetic structure R, matching or exceeding those of the trnK intron and matK data sets (Löhne et al. 2007; Sánchez del-Pino et al. 2009). Kelchner (2002) reported heterogeneous mutation patterns using Myoporaceae (= Scrophulariaceae) as an example, measured as substitutions across the rpl16 intron, and postulated a heterogeneous mode of mutation to be a general feature of group II introns. Despite the observed mutational heterogeneity, the percentage of potentially informative characters for phylogenetic analysis seemed evenly distributed between domain and structural classes (Kelchner 2002). Among indel characters, heterogeneity was even more obvious, as indel-prone areas were confined to peripheral elements or internal bulges of the intron and thus could be structurally defined. Frequency, distribution, and exact location of indel-prone areas, however, seem to depend on the lineage and might therefore differ considerably among studies based on our experience. Estimated secondary structures for the intron have been published (e.g., Kelchner 2002; Wu et al. 2009). Further work evaluating mutational patterns in the context of intron secondary structures in different plant groups is thus needed. Although the rpl16 intron has been predominantly used for inter- and infrageneric phylogeny inference in plants (compare Kelchner 2002; Shaw et al. 2005), the intron offers some potential both at deeper phylogenetic levels (Barniske et al., unpublished data) as well as at population levels. Similar to other group II introns, mutational dynamics in the rpl16 intron shows a mosaic-like pattern with conserved helical elements (dominating domains II, III, V, and VI) and highly variable AT-rich stem-loop elements (dominating domains I and IV in particular). Mutational hotspots can easily be identified and excluded in phylogenetic analyses. 
The rpl16 intron may be one of the most phylogenetically useful introns, if not the most useful. However, because of the frequency of polyA/T microsatellites with >10 repeat units, use of this marker is a tradeoff between high performance and high sequencing effort due to the need for three or more sequencing primer reads, depending on the study group.
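Whether a given sequence is likely to require additional sequencing primers can be checked in advance by locating long polyA/T runs. The following sketch assumes that a run is problematic once it exceeds 10 identical nucleotides, following the threshold mentioned above. The function name `poly_at_runs` and the default cutoff are illustrative choices rather than an established protocol.

```python
import re


def poly_at_runs(seq, min_len=11):
    """Return (start, nucleotide, length) for each A or T run of at
    least min_len nucleotides (default 11, i.e. more than 10 units)."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [
        (match.start(), match.group()[0], len(match.group()))
        for match in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    example = "GC" + "A" * 12 + "GG" + "T" * 9 + "C" + "T" * 10
    print(poly_at_runs(example))
    print(poly_at_runs(example, min_len=10))
```

The reported start positions indicate where reads are likely to deteriorate, which can help in placing internal sequencing primers on either side of the run.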
Similar to the trnK intron, the rpl16 intron exhibits significant infraspecific variability, and in combination with other markers, is suitable for population studies in plants. This is even true for pleurocarpous mosses that are known for their notorious paucity of molecular variability (e.g., Huttunen et al. 2008; Hedenäs 2009; Sotiaux et al. 2009).
The group II intron in rps16
The rps16 gene including its intron is present in most plastid genomes of streptophytes but not in all. If present, it usually resides in the LSC between chlB and trnK/matK in nonflowering plants, whereas it is located between trnQ and trnK in most angiosperms because of the loss of chlB (Boivin et al. 1996). Multiple losses of rps16 among and within the major land-plant lineages can be assumed based on available plastid genome sequences. For example, the intron is present in algal lineages, such as Chara vulgaris (Turmel et al. 2006) and Chaetosphaeridium globosum (Turmel et al. 2002), but absent in both sequenced liverworts, Aneura and Marchantia (Ohyama et al. 1986; Wickett et al. 2008), and the moss Physcomitrella (Sugiura et al. 2003). However, it is present in the hornwort Anthoceros (Kugita et al. 2003). Among lycophytes, it occurs in Huperzia (Wolf et al. 2005) but is lost again in Selaginella (Tsuji et al. 2007). Among monilophytes, rps16 is present with its intron in Adiantum (Wolf et al. 2003) and Angiopteris (Roper et al. 2007), whereas it was lost in Psilotum (Wakasugi et al. 1998). Among gymnosperms and Gnetales, a similarly patchy scenario emerges. The rps16 gene is absent in the Gnetales (McCoy et al. 2008; Wu et al. 2009) and Keteleeria (Wu et al. 2009) as well as in Pinus (Tsudzuki et al. 1992; Wakasugi et al. 1994), but present in cycads (Wu et al. 2007) as well as in Cryptomeria (Hirao et al. 2008) and two other Cupressaceae (Shaw et al. 2005). Within angiosperms, multiple independent losses in various papilionoid legumes have been reported (e.g., Doyle et al. 1995; Jansen et al. 2008). The rps16 gene including its intron is also entirely absent in various other plastid genomes such as those of Adonis (Ranunculaceae; Johansson 1999), Epifagus (Orobanchaceae; Wolfe et al. 1992), Passiflora (Passifloraceae; Jansen et al. 2007), Dioscorea (Dioscoreaceae; Hansen et al. 2007), and Trachelium (Campanulaceae; Haberle et al. 2008).
Two different primer sets are proposed in the literature for rps16 intron amplification, the first one (rpsF/rpsR2) designed by Oxelman et al. (1997) for a study of Caryophyllaceae. This primer set has subsequently been used by many who study Rubiaceae (e.g., Andersson and Rova 1999; Persson 2000) or Arecaceae (Baker et al. 2000). After introducing a couple of wobble bases in both primers, Shaw et al. (2005) report the region to be easily amplified and sequenced across angiosperms if the intron itself is present. The second set of primers was developed by Downie and Katz-Downie (1999) in a study of southern African Apioideae, and a modification was published to amplify it in the legume tribe Glycininae (Lee and Hymowitz 2001). The Downie and Katz-Downie (1999) primer pair that displays a high degree of wobble bases results in a slightly longer amplicon, as the reverse primer is located approximately 110 bp downstream of the Oxelman et al. (1997) primer rpsR2. The forward primer was moved closer to the 5′ exon border, although still in overlap with rpsF. Thus, both primer pairs can be used in a nested approach in cases of problematic amplifications.
The phylogenetic use of this region is limited by its scattered occurrence in plant lineages, which, in any case, hampers deep-level phylogenetics. Studies using rps16 usually address family- to species-level relationships (e.g., Oxelman et al. 1997; Andersson and Rova 1999; Downie and Katz-Downie 1999; Baker et al. 2000; Lee and Hymowitz 2001). Reported variability among closely related taxa seems to be rather low, and the same appears to be true for phylogenetic structure R in rps16 data sets, although detailed comparisons with other noncoding markers have not been carried out so far. A comparison by Kårehed et al. (2008) in the Rubiaceae tribe Spermacoceae indicated that using the rps16 intron as a marker yielded comparable numbers of parsimony informative characters compared to the trnL–trnF-region and the atpB–rbcL spacer, although the number of parsimony informative indels was smaller. There are very few data on molecular evolutionary patterns in the rps16 intron.
However, in addition to its apparently low interspecific variability, the region has no potential to serve as a universal species identification marker, either in angiosperms or in land plants, due to its scattered occurrence in plastid genomes.
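Comparisons of markers by their number of parsimony informative characters, as cited above, rest on a simple column-wise criterion: a site is informative if at least two character states each occur in at least two taxa. The sketch below illustrates this count for substitutions only. It assumes an alignment given as equal-length strings and treats gaps and the symbols N and ? as missing data; the function name is hypothetical.

```python
from collections import Counter


def parsimony_informative_columns(alignment, missing="-?N"):
    """Return indices of columns in which at least two states are each
    shared by at least two sequences (missing data are ignored)."""
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    informative = []
    for col in range(length):
        counts = Counter(s[col] for s in seqs if s[col] not in missing)
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative.append(col)
    return informative


if __name__ == "__main__":
    aln = ["ACGTA", "ACGTT", "ATGAT", "ATG-A"]
    print(parsimony_informative_columns(aln))
```

Indel characters would have to be coded separately (for example as binary presence/absence columns) before the same criterion could be applied to them.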
The group II intron in trnK
Apart from a few exceptions, such as in the parasitic genera Epifagus and Cuscuta, the lycophyte Selaginella, and the majority of leptosporangiate ferns (e.g., Wolfe et al. 1992; Wolf et al. 2003; Funk et al. 2007; Tsuji et al. 2007; Duffy et al. 2009; own data unpublished), the trnK intron is universally present in land plants. The intron is usually situated in the LSC next to chlB or rps16 (compare rps16 section above) on the 5′ side and psbA on the 3′ side. However, due to occasional plastid reorganizations, the 5′ and 3′ adjacent genes might change, such as in various leptosporangiate ferns or Lotus (Funk et al. 2007; Duffy et al. 2009). In domain IV, the intron hosts an ORF (commonly referred to as matK) encoding a protein with high structural similarity to other group II intron ORFs (Kelchner 2002; Hausner et al. 2006). The matK gene is the only intact ORF within plastid group II introns.
For flowering plants, this marker is already well established and easily accessible with standard PCR primers annealing to the conserved trnK exons (Johnson and Soltis 1995; Hilu and Liang 1997). Due to deletions in primer binding sites and frequent substitutions, these primers have their limits in studies of early diverging land-plant lineages. Thus, Long et al. (2000) suggested a new set of primers suitable for bryophytes. Because amplification attempts were not successful among early land plants, Hausner et al. (2006) as well as Wicke and Quandt (2009) were forced to develop new primer sets and protocols, which later turned out to be universal for land plants and showed high amplification efficiency. Because amplification primers are situated in the trnK exons, the fast evolving gene matK (approximately 1,500 bp) is co-amplified, yielding an amplicon of approximately 2.6 kb and the need for several internal sequencing primers. For more difficult templates (e.g., DNA isolates from herbarium specimens), the amplification in two overlapping halves is advisable. However, lineage-specific primers have to be used due to the variability of the matK CDS. Microsatellites composed of polyA/Ts do not usually exceed 10 or 12 nt and are thus straightforward to read over in cycle sequencing. An up-to-date list with internal amplification and sequencing primers for flowering plants is available from the Eudicot Evolutionary Research Website (www.eudiots.de).
In contrast to the only plastid group I intron, where length variation is largely confined to P8, mutational hotspots with a high degree of length mutations and independent extensions in the trnK group II intron are more evenly distributed across the domains/subdomains, and comprise fewer nucleotides (e.g., Müller and Borsch 2005a; Hausner et al. 2006; own unpublished data). The size restriction of hotspots together with the more even distribution of indels allows confident alignment across a broad range of taxa, even if representatives of all major land-plant lineages are included. Thus, whereas only about 220 bp of the trnL intron core can be used for phylogenetic reconstruction across land plants, the trnK intron is a prime noncoding and plastid candidate for deep-level phylogeny reconstruction in land plants. A large chloroplast genome inversion with an endpoint inside the 3′ part of the trnK intron, shifting the trnK exon 2 about 59 kb away in Menodora (Oleaceae), represents a rare exception (Lee et al. 2007). The only known duplication of the trnK intron in angiosperms occurred in the common ancestor of Nepenthes (Nepenthaceae; Meimberg et al. 2006), where the duplications are easily recognized as nonfunctional because of many substitutions and deletions in normally conserved parts. The location of the pseudogeneous copy is, however, not yet known. There is a gradient of variability within domain I, with about 100–150 nt close to the 3′ trnK exon being very conserved (Müller and Borsch 2005a, b). Mutational hotspots are located largely in terminal stem-loops of domains I and IV and are restricted to microsatellites (except Aristolochiaceae). The presence of frequent inversions within the 3′ intron part should be noted (e.g., Müller and Borsch 2005a; Wanke et al. 2007), but these can be easily identified.
The trnK intron is nearly universally present in land plants and angiosperms, and given the phylogenetic structure observed in most data sets, it represents an ideal marker for phylogenetic reconstruction at all levels. Due to previous difficulties in amplifying trnK in many non-angiosperm land plants, only one study exists using trnK (including matK) to address deep relationships in land plants (Long et al. 2000), whereas the region is frequently applied within various seed plant lineages ranging from species to family level (e.g., Meimberg et al. 2001; Miller and Bayer 2003; Donoghue et al. 2004; Chaw et al. 2005; Müller and Borsch 2005a, b; Rahmanzadeh et al. 2005; Wanke et al. 2006, 2007; Beck et al. 2008). The number of parsimony informative characters as well as phylogenetic structure provided by the trnK intron even outcompetes fast-evolving coding regions, such as matK, which is interesting in that the latter target is twice as long (e.g., Müller and Borsch 2005b; Young and dePamphilis 2000).
The trnK intron has successfully been employed in studies of speciation and phylogeography of various angiosperm genera (e.g., Watanabe et al. 2006). It appears to yield rather high infraspecific variation compared to other markers usually applied for this purpose (trnL–trnF and rpl16 intron), and in some angiosperm lineages specific AT-rich satellite-like stem-loops have evolved that offer a high potential as population markers, such as those reported for Aristolochia (Wanke et al. 2006). Sequences of the trnK intron are thus very promising molecular species identifiers, following a thorough assessment of infraspecific variation.
The trnS–trnG region
In angiosperms, this region consists of an intergenic spacer between trnS(GCU) and trnG(UCC) and a group II intron in trnG(UCC). In the paraphyletic bryophytes, lycophytes, monilophytes, and gymnosperms, additional genes (ycf12 and psaM) are located between trnS(GCU) and trnG(UCC). The latter (psaM) was lost in the gnetophytes as well as in cycads, and in angiosperms both ycf12 and psaM are absent. The trnS–trnG region is generally situated in the LSC in close proximity to atpA on one side and rps16 on the other (but compare section on rps16). In the only published lycophyte plastid genome (Huperzia lucidula), trnG has been lost (Wolf et al. 2005). Additionally, the trnS–trnG spacer is absent in putative algal ancestors of land plants due to a relocation of trnS(GCU) (e.g., Turmel et al. 2002).
Initial studies employing the region aimed solely at sequencing the trnS–trnG IGS. Thus, the first sets of primers by Hamilton (1999) and Xu et al. (2000) were located in trnS and the trnG 5′ exon. At the same time, Pacak and Szweykowska-Kulinska (2000) provided a set of primers to amplify the trnG intron alone, in a study that traced the origin of organelles in the allopolyploid hybrid Pellia borealis. Subsequently both approaches have been employed, such as in Pedersen and Hedenäs (2003), Perret et al. (2003), and Sakai et al. (2003). Considering the amplification of the whole trnS–trnG region including the trnG intron to be more efficient in angiosperms, which yields an amplicon of approximately 1.6 kb, Shaw et al. (2005) designed a new set of primers with a forward primer in trnS (upstream of the Hamilton primer) and the reverse primer in the trnG 3′ exon. Unfortunately, this set of primers has proven to be troublesome for some angiosperm groups (see Shaw et al. 2007) as well as monilophytes (Murdock 2008) due to various mismatches and indels; this led Tesfaye et al. (2007) to publish an improved primer set for complete trnS–trnG region amplification. In addition, Shaw et al. (2007) provided a new set of primers purported to work for most flowering plants, with a completely new forward primer situated between the Hamilton (1999) and the initial Shaw et al. (2005) primer; the reverse primer was just moved a couple of nucleotides upstream in the trnG 3′ exon. Similarly, the primers by Murdock (2008), of which the F1 primer is just a modification of the Shaw et al. (2005) primer, offer a new set that is reported to work in most nonflowering plant lineages (Murdock 2008).
So far, no comparative studies, either for the intron or the spacer, are available, and the exact secondary structure of the group II intron has not been assessed. However, several studies indicate that the trnS–trnG IGS provides more variable and informative sites compared to commonly used spacers such as the atpB–rbcL IGS and the trnL–trnF IGS as well as the trnL intron based on comparisons of sequences from closely related species (Shaw et al. 2005 and references therein). Similarly, the trnG intron in bryophytes seems to harbor at least more variability than, e.g., the whole trnL–trnF region (Pacak and Szweykowska-Kulinska 2000; Pedersen and Hedenäs 2003). Levels of variability appear to be similar to the trnK intron (Tesfaye et al. 2007), but assessments of phylogenetic structure in larger taxon sets have not yet been contrasted to other introns and spacers.
An exceptional rearrangement can be found in the common ancestor of Ecdeiocoleaceae, Joinvilleaceae, and Poaceae (monocots), where a 6-kb inversion occurred with an endpoint in the downstream half of the trnS–trnG spacer (Michelangeli et al. 2003). The intron residing in trnG is not affected; thus, apart from the reported exceptional loss in lycophytes, the trnG intron appears to be uniformly present in land plants. While the loss of ycf12 and psaM would be problematic for deep-level phylogenetic inferences across land plants because it complicates homology assessment, it is unproblematic for analyses within major clades of land plants.
The nearly overall presence together with a high level of sequence variation suggests at least the intron can be used for DNA-based species identification. It remains to be seen whether the disrupted trnS–trnG spacer in Poales results in limited use of the region for studies in grasses.
The trnT–trnF region
This region constitutes the most frequently applied marker in plant molecular systematics (Shaw et al. 2005) and consists of a nontranscribed spacer between trnT(UGU) and trnL(UAA), a group I intron in trnL(UAA), and a transcribed spacer between trnL and trnF(GAA). The trnL and trnF genes are both encoded on the A strand and are transcribed counter-clockwise, whereas trnT is encoded on the B strand and transcribed clockwise (Hiratsuka et al. 1989; Kanno and Hirai 1993). Although the trnL–trnF IGS is a transcribed spacer, a rather conserved putative PEP (plastid-encoded polymerase) promoter similar to the bacterial sigma70-type (TTGACA) occurs approximately 45 nt upstream of trnF in most land-plant lineages (e.g., Quandt and Stech 2004; Quandt et al. 2004; Fig. 2). The function of this promoter still remains to be verified experimentally. The trnL group I intron is an ancient intron (Kuhsel et al. 1990) that comprises highly conserved PQRS elements corresponding to the catalytic core (Cech 1988; Michel and Westhof 1990; Besendahl et al. 2000) plus variable P6 and P8 stem-loop elements (compare Fig. 3; Borsch et al. 2003; Quandt et al. 2004), both of which contain most of the phylogenetically informative characters. Length variability is mostly due to the microstructural mutations in P6 and P8, rendering average intron sizes between 450 and 600 nt in most angiosperms. The trnL–trnF IGS ranges between 350 and 430 nt (Quandt et al. 2004), so that both intron and spacer can easily be co-amplified. The spacer between trnT and trnL exhibits a threefold size variation across angiosperms (about 480 nt in Amborella and most Nymphaeales; Borsch et al. 2003, 2007), with a maximum known size of >1,400 nt in Tofieldia (Tofieldiaceae) and Houttuynia (Saururaceae; Borsch et al. 2003). This points to a disadvantage in that complete sequencing of the larger-sized spacers requires internal, very specific primers.
There is a large mutational hotspot in the upstream half of the spacer, and additional sequence elements (sometimes several hundred nt) in those taxa with a large trnT–trnL spacer seem to be inserted within this hotspot (Borsch et al. 2003). However, the origin of these sequence elements is still unclear. The trnT–trnL spacer shows a higher frequency of microstructural changes compared to the trnL intron and the trnL–trnF IGS (Borsch et al. 2007). Both spacers and the intron are smaller in most nontracheophyte land plants and especially in the three bryophyte lineages (Stech et al. 2003; Quandt and Stech 2004), where the spacers rarely exceed 200 nt. The tandem arrangement of three tRNA genes in the trnT–trnF region is a synapomorphy of land plants (Quandt et al. 2004), providing a prerequisite for application to a broad spectrum of phylogenetic questions in land plants.
Taberlet et al. (1991) designed universal primers annealing to the three tRNA genes. Whereas primers trnTc and trnTf can be routinely and efficiently used to co-amplify the trnL intron and the trnL–trnF spacer, the amplification of the trnT–trnL spacer is more difficult. Amplification of the complete trnT–trnF-region, using primer trnTa, annealing to the trnT gene, and trnTf, annealing to the trnF gene, does not work in many angiosperms, and often also produces multiple bands. Sequence variability in the tRNA genes is limited but present primarily in the trnL(UAA) 3′ exon (Quandt et al. 2004) and to a lesser extent in the 5′ exon. In the anticodon site of the trnL gene of Ginkgo, several leptosporangiate ferns, and the moss Takakia, a change from UAA into CAA is observed, suggesting anticodon editing (Quandt et al. 2004) that has been recently proved at least for Takakia (Miyata et al. 2008). Variability in the exons also implies that trnL–trnF of certain land-plant lineages can be better amplified with modified primers (the Eudicot Evolutionary Research Website provides an up-to-date list). Sauquet et al. (2003) used another strategy to place a forward primer into the upstream rps4 gene and a reverse primer into the highly conserved P-element of the trnL intron (trnL110R; universal primer for angiosperms). This primer combination has easily yielded specific PCR products in Nymphaeales and magnoliids (e.g., Sauquet et al. 2003; Borsch et al. 2007; Löhne et al. 2007) but has not yet been applied more extensively. Both primers also work well for sequencing. Similarly, Quandt et al. (2004) developed an internal primer (trnL_P6/7) annealing to the highly conserved R element in order to amplify and sequence the region for early diverging land plants. Mort et al. (2007) reported problems with amplifying the trnT–trnL spacer in a variety of eudicots (see primers published by Shaw et al. 2005). 
Rather than using the amplification primers, sequencing of the trnL–trnF region can be done best with primers trnTd (Taberlet et al. 1991) and trnL460F (Worberg et al. 2007). Primer trnL460F was developed based on secondary structures of the trnL intron (Borsch et al. 2003; Quandt et al. 2004), anneals to the highly conserved S-element, and is universal at least in angiosperms. Primer trnTd anneals to the 3′ exon of trnL and overlaps more than 120 nt with trnL460F.
The independent growth of the P8 stem-loop in various land-plant lineages (see below; Fig. 6) is a characteristic feature of the trnL intron. Within the respective lineages (e.g., angiosperms and gymnosperms, hornworts, Sphagnum, lycopods, leptosporangiate ferns) most of P8 can be aligned unambiguously, and in angiosperms, it in fact provides a major part of phylogenetically informative sites. In addition, several smaller lineages within angiosperms such as Nuphar or the Nymphaea subg. Hydrocallis + Lotus-clade of water lilies (Borsch et al. 2007) possess further AT-rich, satellite-like elements in P8. Their variability between species is high, although they seem to contain rather limited phylogenetic structure R. In Nymphaea, the AT-rich P8 elements are obviously rather conserved within species. Quandt et al. (2004) and Borsch et al. (2007) expressed the idea that secondary structure formation stabilizes the stem-loop regions and thus limits infraspecific variability. However, this needs to be examined in more taxa to get a representative picture. Cozzolino et al. (2003) present a case where SSM in an AT-rich minisatellite region causes intrapopulational variation in Anacamptis palustris, but the authors did not determine the exact structural position of this satellite DNA within the intron, so it cannot be readily compared with the situation in Nymphaea. Pirie et al. (2007) described a rare case of an ancient duplication of the trnL–trnF region in the common ancestor of Annonaceae with intact exons of the tRNA genes but deviating sequences of the paralogous intron and spacer copy. The trnL–trnF spacer can show large deletions (200 nt or more) affecting most of its length, as observed by Yang and Wang (2007) in Pedicularis (Orobanchaceae) or by Sánchez del-Pino et al. (2009) in two different clades of Amaranthaceae (approximately 163 and 215 nt, respectively). 
Whereas a putative σ-type −35 promoter was still present in Amaranthaceae, a conserved −10 element could not be found. All large deletions occurred independently in unrelated lineages and appear not to be restricted to semiparasitic plants as suggested by Yang and Wang (2007). Large indels were also found in the P6 and P8 stem-loop elements of the trnL intron of certain Rhamnaceae genera (Kellermann and Udovicic 2008), mostly resembling independent losses of about 45 and 125 nt, respectively. Similarly, Hernández-Maqueda et al. (2008) report extensive deletions and complex inversion patterns in the P8 stem-loop element of the Grimmiaceae up to the point that almost the entire P8 is inverted. Molecular evolution of trnT–trnF in Gnetales is special and has produced short and highly deviant sequences (Won and Renner 2005). Promoter elements are missing, which led Won and Renner (2005) to suggest a specific mechanism for trnL–trnF transcription and tRNA processing that basically utilizes hairpin structures. Inversions appear to be relatively rare in the trnT–trnF region. Kocyan et al. (2007) describe an interesting 35- or 40-nt inversion just upstream from the −35 promoter in the trnL–trnF spacer that is synapomorphic for Cucurbitaceae but was re-inverted in two sublineages.
Müller et al. (2006) calculated the phylogenetic structure per informative site of the complete trnT–trnF region in comparison to matK and rbcL using 42-taxon angiosperm data sets with exactly matching species and found trnT–trnF to clearly outperform matK. The spacers and the intron of the trnT–trnF region not only had the highest number of variable and informative characters but also the best signal quality (i.e., highest R value) per informative site. In addition to its frequent use to infer infraspecific and infrageneric relationships (e.g., van Ham et al. 1994; Böhle et al. 1996; Gielly and Taberlet 1996; Small et al. 1998; Bakker et al. 2000; Borsch et al. 2007; Galley and Linder 2007), the region has been very successfully applied in all cases of phylogenetic questions between families (e.g., Renner 1999; Sauquet et al. 2003; Löhne et al. 2007) or major clades of angiosperms (Borsch et al. 2003; Worberg et al. 2007, 2009). The complete trnT–trnF region was employed in a minority of cases (e.g., Böhle et al. 1996; Small et al. 1998; Borsch et al. 2007; Galley and Linder 2007; Hernández-Maqueda et al. 2008). Alignment is straightforward with high variability confined to clearly recognizable hotspots (Borsch et al. 2003), underscoring the high utility of the region as a phylogenetic tool. Despite its known phylogenetic structure, the application of the trnT–trnL IGS as a universal marker is not practical due to difficult amplification in many taxa and the need to exclude a major part of spacer sequences in a mutational hotspot when taxa with a large trnT–trnL IGS are included in data sets. The trnL intron and the trnL–trnF spacer are universally useful markers for application in a broad spectrum of phylogenetic questions.
The downstream part of the region, comprising the trnL intron and the trnL–trnF spacer, has been successfully used in studies of haplotypes in various angiosperms (e.g., McCauley 1994; Hamilton et al. 2003). The insertion of multiple trnF pseudogenes into the downstream part of the trnL–trnF spacer is known from different lineages of Brassicaceae (Koch et al. 2007; Schmickl et al. 2009, this volume) and Asteraceae (Microseris, Vijverberg and Bachmann 1999). The evolution of trnF pseudogenes exhibits high rates of mutation and levels of homoplasy (Ansell et al. 2007) and leads to large numbers of different haplotypes within the respective species. Such excessive variation in the trnL–trnF spacer will probably pose problems to straightforward species identification unless haplotypes are fully sampled for species defined with further biological and morphological characters. Taberlet et al. (2007) designed barcoding primers for the trnL intron based on secondary structure data from Borsch et al. (2003) and successfully used short trnL sequence fragments for identifying plant remains in arctic permafrost soils. Together, the trnL intron and the trnL–trnF spacer appear to be a good barcoding candidate because the region is universally present in land plants and shows relatively few and short homonucleotide repeats. By contrast, the trnT–trnL spacer proposed for angiosperm barcoding by Edwards et al. (2008) cannot be recommended due to the limitations described above.
The psbA–trnH spacer
This region is one of the most variable intergenic spacers of the chloroplast genome (Shaw et al. 2007; Timme et al. 2007) and is located downstream of the trnK intron that includes the matK gene, in the LSC region close to the IRa in angiosperms. The sequence downstream of psbA is transcribed; a TATA-box is followed by a stem-loop in the RNA secondary structure that is believed to function as a transcription stop for psbA (Štorchová and Olson 2007). This untranslated part (UTR) is about 28–70 nt in angiosperms and is followed downstream by a much more variable untranscribed part of extreme length variation, from 200 to 1,077 nt. The longest psbA–trnH spacer known is found in Trillium (Shaw et al. 2005).
Sang et al. (1997) designed the first primers that anneal to the psbA and trnH genes and used them in Paeonia. Hamilton (1999) designed universal primers (PSBA and H) that anneal to psbA and trnH, respectively, and these have been successfully used by many researchers across angiosperms.
The trnH–psbA spacer is close to the IR boundary and is therefore likely to be affected by expansions and contractions of the inverted repeat. In monocots, the trnH–rps19 cluster is located within the inverted repeat, due to its IR expansion. Chang et al. (2006) speculate that the trnH–rps19 cluster was duplicated in the early evolution of monocots, resulting in a translocation of a rps19 paralogue into the trnH–psbA spacer. Because the Acorus genome (Leebens-Mack et al. 2005) also has an rps19 gene within the psbA–trnH spacer, this translocation appears to have occurred in the common ancestor of the monocots. However, it is not present in the Ceratophyllum (Moore et al. 2007) or other nonmonocot angiosperm genomes and cannot provide a hint to the monocots' nearest relatives. In the orchid Phalaenopsis aphrodite, the rps19 gene is relatively large (251 nt; Chang et al. 2006), whereas its size ranges between 35 and 50 nt in the other monocots. It is not yet clear how the actual UTR and untranscribed noncoding spacer sequence elements are affected by these changes in genome structure, and whether orthologous segments are universally present across angiosperms.
Phylogenetic use of the spacer will therefore be restricted to analyses within angiosperm lineages exhibiting orthologous sequence composition. A special, rearranged trnK–psbA region involving a tandem duplication is known from Pinus banksiana and Pinus contorta (Lidholm and Gustafsson 1991; Lidholm et al. 1991) that also affects the sequence composition of the psbAII–trnH spacer. Translocation of sequence elements thus is a potential source of error in this case. Given that double bands result from PCR amplification of psbA–trnH in most cycads (except the genus Cycas; Sass et al. 2007), psbA appears to be present in two copies in this lineage. However, it is currently not known if one of these copies is a pseudogene and if it is located as well in the cp genome, and a detailed analysis of the trnK(matK)–psbA region in gymnosperms has not yet been carried out. The psbA–trnH spacer falls within the IR in leptosporangiate ferns such as Adiantum (Wolf et al. 2003).
The high plasticity of the spacer with frequent indels has resulted in distinct sequences among even closely related species, as first described by Aldrich et al. (1988). Hamilton et al. (2003) also described a high indel mutational rate relative to substitutions in Lecythidaceae. Inversions and inverted repeats were found in the region by Sang et al. (1997). Štorchová and Olson (2007) showed that the RNA loop serving as a putative transcription end of the psbA 3′ UTR is present across angiosperms. Hairpin-associated inversions have been observed frequently (Štorchová and Olson 2004, 2007; Ferrufino et al. unpublished data), and the distal loop elements in fact appear to flip-flop even at the population level, similar to what was described for the trnL–trnF spacer in mosses (Quandt and Stech 2004). Štorchová and Olson (2007) also show extremely high infraspecific variability in four different sectors across psbA–trnH of Silene latifolia and S. vulgaris, not limited to mononucleotide repeats and STRs. There is no explanation yet on the probable mechanism leading to this extreme variability within Silene. A similar situation occurs in Smilax (Ferrufino et al., unpublished data), suggesting that such patterns of infraspecific variability might be inherent in the molecular architecture of psbA–trnH.
The psbA–trnH spacer has been used for phylogeny inference in many different lineages of angiosperms (e.g., Sang et al. 1997; Azuma et al. 1999; Kim et al. 1999; Renner and Chanderbali 2000; Scheen et al. 2004) and mosses (Hedderson and Zander 2007). Sang et al. (1997) noticed considerable homoplasy of indel characters within Paeonia, and a phylogenetic performance of psbA–trnH sequences lower than that of matK gene sequences. Kim et al. (1999) encountered some resolution within Sonchus (Asteraceae) but rather low statistical support of nodes as compared to nrITS data, which were four times more variable and yielded considerably better trees. Scheen et al. (2004) found variability similar to trnL–trnF in Cerastium (Caryophyllaceae) but slightly lower resolution and support in resulting topologies. There are, however, no detailed studies comparing phylogenetic structure in psbA–trnH to other highly variable chloroplast markers. Short sequence length of the psbA–trnH spacer, such as that in Sonchus (385–450 nt), will also limit the power of this marker. Due to its specific molecular architecture, the psbA–trnH spacer can only be used for phylogeny inference among closely related taxa (e.g., species within genera), after excluding more or less large proportions of its sequence due to mutational hotspots. Even then, individual, very homoplastic characters are likely to obscure the historical signal, so that the performance of psbA–trnH sequences will be inferior to other noncoding genomic regions. The structurally more conserved 3′ UTR on the other hand is too short (28–70 nt; Štorchová and Olson 2004) to justify sequencing this spacer.
The psbA–trnH spacer was proposed as a universal barcode for land plants by Kress et al. (2005). As with any other genomic region, however, there are lineage-specific differences in variability, resulting in inconsistent success of species identification. From the few studies so far that examine all species of larger genera to evaluate identification success in a taxonomic setting, Seberg and Petersen (2009) have shown psbA–trnH alone to be able to identify about half of all 86 known Crocus species, a performance similar to that of matK. In Hypericum (Hypericaceae), psbA–trnH identifies nearly all species in some lineages such as in sect. Hypericum but is almost invariable within sect. Ascyreia (Borsch et al. unpublished data). Extremely high intraspecific variability (inversions in the 3′ UTR stem-loop, satellite-like variation in the untranscribed downstream part) that is present outside of clearly delimitable satellites in certain lineages of angiosperms (e.g., Silene, Smilax) poses severe problems to unambiguous species identification. Unless the population genetic variability is fully assessed within phenotypically and biologically delimited taxa, psbA–trnH molecular diversity will likely overestimate species diversity.
The atpB–rbcL spacer
In addition to the trnT–trnF region, the IGS separating the plastid genes atpB and rbcL was one of the first noncoding markers used for phylogeny inference (Ehrendorfer et al. 1994; Manen et al. 1994a). Like trnT–trnF, it is located in the large single-copy region. The molecular evolution of this IGS was studied in detail early on (Golenberg et al. 1993; Manen et al. 1994b; Savolainen et al. 1997) because it contains promoter elements for genes encoding subunits of the chloroplast ATP synthase and the large subunit of ribulose-1,5-bisphosphate carboxylase, involved in energy metabolism and photosynthesis, respectively. Clockwise transcription of atpB occurs in an operon (atpB/E) and is initiated by a series of promoters (Manen 2000), whereas transcription of rbcL is anti-clockwise and initiated by simple −35 (TTGCGC) and −10 (TACAAT) elements (Manen 2000; Crayn and Quinn 2000). The spacer contains two different but highly conserved and obviously functional atpB promoters, suggesting interaction of distinct initiation complexes, as well as two conserved promoters for rbcL (Manen et al. 1994b). Widely separated locations of promoters lead to an overlap of sequence elements transcribed with atpB and rbcL. The largely herbaceous subfamily Rubioideae of Rubiaceae was found to have relaxed functional conservation on atpB transcription, which resulted in deleted or altered PEP promoters (Manen 2000). In any case, functional constraints lead to a mosaic-like structure of differing sequence conservation in the atpB–rbcL IGS. The size of the spacer increases from the bryophyte lineages, where it is normally about 350–400 nt (Chiang and Schaal 2000; Stech and Quandt 2006; Vanderpoorten and Long 2006), to about 600–850 nt in ferns (e.g., Su et al. 2005), lycopods (Hoot et al. 2006), and angiosperms.
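As an illustration of how the simple −35 and −10 elements quoted above could be located in a spacer sequence, the following minimal Python sketch searches for exact matches of both motifs at a plausible spacing. The function names, the toy sequence, and the 15–21 nt spacing window are assumptions made for this example and are not taken from the cited studies; the sequence must be supplied on the strand on which the elements read 5′ to 3′.

```python
# Consensus elements quoted in the text for the rbcL promoter.
MINUS_35 = "TTGCGC"
MINUS_10 = "TACAAT"


def find_all(seq, motif):
    """Return all 0-based start positions of motif in seq (overlaps included)."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def find_promoter_pairs(seq, min_gap=15, max_gap=21):
    """Return (start35, start10) pairs where a -35 element is followed by a
    -10 element with min_gap..max_gap nucleotides between them.

    The spacing window is an assumption for illustration only.
    """
    seq = seq.upper()
    tens = find_all(seq, MINUS_10)
    pairs = []
    for start35 in find_all(seq, MINUS_35):
        end35 = start35 + len(MINUS_35)
        for start10 in tens:
            if min_gap <= start10 - end35 <= max_gap:
                pairs.append((start35, start10))
    return pairs


# Toy sequence (hypothetical): both elements separated by 17 nt.
toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GG"
print(find_promoter_pairs(toy))
```

Exact string matching is deliberately strict; altered promoters of the kind reported for Rubioideae would need a mismatch-tolerant search.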
Primers for amplification are located in the CDS of rbcL and atpB, respectively, as designed by Manen et al. (1994a) and Hoot and Douglas (1998). Sequencing with a single primer is not possible, at least in angiosperms, as the region is slightly too long. The sequencing effort using two primers is therefore similar to what is needed for one of the much more phylogenetically useful group II introns.
Not surprisingly, all earlier studies applying atpB–rbcL spacer in addition to rbcL or atpB coding sequences found better performance of the IGS (e.g., Hoot and Douglas 1998; Crayn and Quinn 2000). Savolainen et al. (1997) provided an alignment of atpB–rbcL spacer sequences spanning Celastrales and other eudicots. Sequences of the atpB–rbcL spacer appear to be alignable across larger taxonomic distances, for example, within angiosperms. This, and the fact that there is a sufficient number of variable sites also between sequences from closely related species make it attractive as a marker to be used widely in plant evolutionary studies. However, there is still no comprehensive comparison of phylogenetic structure R compared to that of other spacers and introns. Nevertheless, some conclusions may be drawn from several studies that have used different gene trees for identical taxon sets. Soejima and Wen (2006) successfully used the atpB–rbcL IGS for tree inference in Vitaceae, resulting in resolution and support similar to a trnL–trnF data set; on the other hand, the rps16 intron performed distinctly better than these. Kadereit et al. (2006) densely sampled Salicornioideae (Chenopodiaceae) and attained atpB–rbcL trees that were well resolved and supported in large parts of the topology. For the bromeliad subfamily Tillandsioideae, known for its difficult to resolve phylogeny, Barfuss et al. (2005) depict trees of seven plastid regions (including the trnL intron, the rps16 intron, and the trnK intron) of which the atpB–rbcL tree is the least resolved. Wissemann and Ritz (2005) recovered only weakly supported trees for the genus Rosa, and Kårehed et al. (2008) revealed it as the worst performing noncoding marker in Spermacoceae (Rubiaceae). Signal of the atpB–rbcL spacer proved to support a different tree topology as compared to six other introns and spacers and the matK gene in the genus Coffea (Rubiaceae; Tesfaye et al. 2007). 
In summary, the atpB–rbcL spacer appears to be of rather inferior phylogenetic utility compared to other markers. This is in line with the high indel homoplasy found in this region, as noted by Golenberg et al. (1993), which is likely to lead to spurious signal when using atpB–rbcL indels as phylogenetic characters.
However, rapid evolution of the highly variable AT-rich stretches and regions outside the homonucleotide stretches makes it easier to distinguish different haplotypes within species, as has been shown in ferns (Su et al. 2005) and angiosperms (Hamilton et al. 2003; Watanabe et al. 2006; Tan et al. 2008). So despite its relatively limited phylogenetic signal, the atpB–rbcL IGS might be a practical marker for species identification, with far fewer problems (long polyA/T stretches and inversions occur rarely, architecture is stable in land plants) than a region such as the psbA–trnH spacer.
Chloroplast genomic regions evolve differently than mitochondrial or nuclear DNA
Differences in mode of inheritance, organization, and structural evolution of the three plant genomic compartments have long been recognized and are undisputed. It is generally appreciated that these differences heavily influence the fate of genomic regions within genomes, whether they are translocated, copied, eventually evolve into gene families, or are inherited as simple orthologues (e.g., Hillis 1994). It is also generally accepted that genes have been transferred between these compartments during evolutionary history, especially from the chloroplast to the nucleus (Martin et al. 1998). But differences in patterns of molecular evolution also exist when we look at mutations within genomic regions such as genes, introns, or spacers. Well known are site-specific probabilities and biases of nucleotide substitutions (e.g., Morton and Clegg 1995). Patterns of substitutions in coding genes are an increasingly well-studied field, especially relating to the topic of neofunctionalization (e.g., Benderoth et al. 2006; Teshima and Innan 2008). This is less the case for microstructural mutations and for introns and spacers.
Empirical observations from a variety of noncoding genomic regions in chloroplast genomes indicate that there are common patterns that may be specific to this compartment. For example, Graham et al. (2000) suggested a characteristic size distribution of microstructural changes in different regions of the inverted repeat. We show in Fig. 5 that there is a near uniform distribution of simple sequence repeat length across chloroplast introns and spacers in both the single-copy and inverted repeat regions, and most likely also across different land-plant lineages. SSRs of four to six nucleotides are generally the most frequent. This pattern suggests a common mutational mechanism at work throughout plastid genomes but not so in other genomic compartments, where this pattern is unknown. Although little is known, the existence of chloroplast-specific mutational mechanisms would not be surprising, considering that plastids and thus plastid genomes all have a common origin (McFadden and van Dooren 2004). All chloroplast genomic regions are likely to be replicated and maintained (i.e., DNA repair) by a similar machinery that is derived from a cyanobacterial DNA replication and repair machinery. Our current understanding of DNA replication and repair in the chloroplast genome is, however, still in its beginning stages. Heinhorst and Cannon (1993), Lugo et al. (2004), and Nikolaou and Almirantis (2006) propose respective mechanisms, including an observation of actual properties of cpDNA with electron microscopical techniques. For more details on DNA replication, recombination, and repair in plastids we refer to the latest review by Day and Madesis (2007). Mitochondrial genomes are similar in being circular and uniparentally inherited but differ through greater variation in size, architecture, and molecular evolution among the different kingdoms of eukaryotes (Backert et al. 1997; Knoop 2004). 
However, among tracheophytes, mitochondrial genomes deviate from the compact circular organization; instead a multipartite organization with a high degree of recombination seems to be the rule (e.g., Ogihara et al. 2005; Grewe et al. 2009).
The nuclear genome is not uniparentally inherited and is recombined in every generation. Most frequently used in molecular systematics is the ribosomal DNA, including the nrITS region with two noncoding, transcribed spacers (see Calonje et al. 2009, this volume). In sequences of ITS, a size distribution of SSRs such as that described above for the chloroplast genome is not evident. Instead, it appears that most insertions and deletions comprise only one nucleotide, which leads to severe problems in primary homology assessment when these short insertions and deletions have occurred repeatedly in the evolutionary history of a lineage. Length mutations involving 1 nt hinder recognition of motifs; in fact, alignments often contain unclear shifts of sequence elements. In any case, empirical studies are needed to precisely reconstruct mutational changes in a phylogenetic context in ITS and in the vast number of other nuclear genomic regions. Whereas recent work (e.g., Biffin et al. 2007) hints at the susceptibility of alignment (and signal obtained) to the consideration of secondary structure constraints, there is hardly any understanding of the history and typical forms of microstructural mutations in ITS. Such data can then be evaluated statistically in comparison to chloroplast data. Different patterns of molecular evolution among different genomic compartments also have an effect on homology assessment (i.e., alignment), and will, for example, imply the use of different alignment criteria for sequences of different genomic compartments.
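The contrast drawn in this section (length mutations of mostly four to six nucleotides in chloroplast introns and spacers versus mostly single-nucleotide indels in ITS) could be explored with a simple tabulation of gap lengths. The Python sketch below is a crude proxy only: it counts runs of gap characters in aligned sequences, whereas the patterns discussed here rest on microstructural mutations reconstructed in a phylogenetic context. The function names and the toy sequences are assumptions for illustration.

```python
import re
from collections import Counter


def gap_run_lengths(aligned_seq):
    """Lengths of consecutive gap ('-') runs in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def length_histogram(aligned_seqs):
    """Counter mapping gap-run length -> number of runs across all sequences."""
    hist = Counter()
    for seq in aligned_seqs:
        hist.update(gap_run_lengths(seq))
    return hist


# Toy aligned sequences (hypothetical data).
hist = length_histogram(["AC--GT", "A-CG--"])
for length in sorted(hist):
    print(length, hist[length])
```

Because a single ancestral indel appears as a gap in every descendant sequence, such raw counts overweight old events; scoring each inferred event once would be the more faithful approach.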
Practical limits to hypothesis testing of sequence evolution: mutational hotspots
The existence of motifs that can be observed in sequence data sets allows us to hypothesize certain mutational events that led to their formation. Such hypotheses about ancestral mutational events are generated during the process of alignment (= primary homology assessment, = putative synapomorphy hypotheses; see Ochoterena 2009, this volume). Theoretically, it should be possible to reconstruct the historical sequence of such mutational events, and thus to explain with precision how sequences have diverged. In real data sets, this process can be obscured in a sequence that has experienced repeated mutational events involving the same nucleotides. Hypotheses of ancestral mutational events can be impossible to construct in such circumstances, and sequence alignment may be similarly confounded. Such regions are defined as mutational hotspots (Borsch et al. 2003). There are two different types of mutational hotspots: complex areas (large hotspots) involving irregular sequence elements and satellites (micro- and minisatellites). Microsatellites are a special kind of mutational hotspot: in many cases and particularly when the hotspot involves mononucleotide strands, they will not contribute any information to phylogeny inference when only substitutions are used because they consist of length-variable STRs. Empirical analysis of the evolution of an STR microsatellite in Coffea by Tesfaye et al. (2007) confirmed high levels of homoplasy in a microsatellite locus compared to microstructural mutations outside the satellite.
Large hotspots are frequently located in distal parts of stem-loop regions of both group I and group II introns (Borsch et al. 2003; Löhne and Borsch 2005; Worberg et al. 2007; Korotkova et al. 2009, this volume) and are AT-rich. Contrary to small microsatellites, they are not necessarily composed of STRs but also contain larger SSRs (>5 nt). Early data sets that included more distant sequences (Borsch et al. 2003; Worberg et al. 2007) suggest that the position of larger hotspots is more or less fixed across sequences from major lineages of flowering plants, whereas microsatellites (particularly homonucleotide strands of polyA/Ts that rarely exceed 20–25 nt in length) appear much more specifically within individual lineages such as families or genera. The birth of microsatellites may be facilitated through slipped strand mispairing once a certain number of identical nucleotides is present in a sequence (Levinson and Gutman 1987). The size of large hotspots is expected to increase with genetic distance of the sequences included in an alignment (Worberg et al. 2007), whereas this is not the case in microsatellite-caused hotspots.
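Homonucleotide stretches of the kind described above are easy to flag automatically before alignment. The following Python sketch lists all runs of a single nucleotide at or above a chosen length; the function name and the default threshold of 8 nt are arbitrary assumptions for this example, not values proposed in the studies cited.

```python
import re


def homonucleotide_runs(seq, min_len=8):
    """Return (start, length, base) for every run of one nucleotide that is
    at least min_len long. Positions are 0-based; gaps should be removed
    from aligned sequences before calling this function."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One captured base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.end() - m.start(), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence (hypothetical): a 10-nt polyA and a 7-nt polyT stretch.
toy = "ACGT" + "A" * 10 + "CG" + "T" * 7 + "G"
print(homonucleotide_runs(toy))
print(homonucleotide_runs(toy, min_len=7))
```

Flagged runs are only candidates for exclusion; whether a stretch actually behaves as a length-variable microsatellite has to be judged from the alignment of the taxa sampled.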
In cases of both large and small hotspots, homology assessment is usually not adequate, and the respective sequence elements need to be excluded from the character matrix used for phylogeny inference as suggested by a majority of authors (e.g., Swofford et al. 1996; Grundy and Naylor 1999; Castresana 2000). Some authors such as Aagesen (2004) propose that a series of alignments should be generated for such regions, which can then be used to extract information from ambiguously aligned sequences in order not to remove any characters from an analysis. In line with Ochoterena (2009, this volume), we argue that a robust alignment is required as a primary homology hypothesis that can then be further tested. Including unalignable sequence elements from hotspots into phylogenetic analysis violates this principle. Simulation studies also point to adverse effects of including ambiguously aligned elements. Talavera and Castresana (2007) found better resolved trees after removing blocks of unclear nucleotide homology from simulated protein alignments. It was shown that despite losing some characters by excluding hotspots, phylogenetic signal increased. Simulation studies have also shown that topological error increases if alignment error increases (Ogden and Rosenberg 2006).
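Excluding hotspots from the character matrix amounts to deleting defined column ranges from the alignment. The Python sketch below shows one way to do this; the dictionary representation of the alignment, the function name, and the 0-based, end-exclusive coordinates are assumptions for illustration. The hotspot boundaries themselves must come from the researcher's own homology assessment.

```python
def exclude_columns(alignment, hotspots):
    """Remove hotspot columns from an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    hotspots:  list of (start, end) column ranges, 0-based, end exclusive.
    Returns a new dict with the listed columns removed.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_cols = lengths.pop()

    drop = set()
    for start, end in hotspots:
        drop.update(range(max(0, start), min(n_cols, end)))

    keep = [i for i in range(n_cols) if i not in drop]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (hypothetical) with a length-variable polyT in columns 3-7.
aln = {"taxon_a": "ACGTTTTTAC", "taxon_b": "ACG--TTTAC"}
print(exclude_columns(aln, [(3, 8)]))
```

Keeping the excluded ranges in a list alongside the matrix documents which positions were removed and lets others reproduce the analysis.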
Taxonomic spectrum for the application of spacer and intron sequence data
It is most important to ensure that a noncoding DNA region that will be included in a phylogenetic data set has evolved in an orthologous manner across the study group. The relatively high structural conservation of the chloroplast genome in land plants with a highly conserved gene order (Raubeson and Jansen 1992; Raubeson et al. 2007) allows for many intergenic spacers and introns to be used for phylogeny inference across distantly related taxa. By contrast, mitochondrial genomes have frequently rearranged throughout their history and show variable gene and intron content in plants (Malek and Knoop 1998; Knoop 2004; Qiu and Palmer 2004) as well as extensive intron mobility. Chloroplast spacers and introns are therefore considerably more important as phylogenetic tools. Occasional rearrangements of the circular plastid genome have occurred in plants, however, and increasing numbers of complete genome sequences now becoming available reveal some highly rearranged genomes such as those of the unrelated eudicot genera Jasminum, Pelargonium, or Trachelium (Chumley et al. 2006; Lee et al. 2007; Haberle et al. 2008). As a result of such rearrangements, spacers will often be partially translocated and thus may not be available as a completely orthologous copy across plastomes of distantly related taxa.
For the most widely used genomic regions, we have presented the current knowledge regarding their evolution within land plants (see above). Whereas few spacers will be strongly affected by structural rearrangements within angiosperms (e.g., the psbA–trnH spacer), this becomes more relevant when looking at a broader sample of land plants. Introns are often mobile elements but not so much in land-plant chloroplast genomes (Raubeson et al. 2007). Important and well known is the process of intron loss for group II introns, residing in genes such as rps16, rpoC1, and rpl16 (reviewed by Kelchner 2002) or atpF (Daniell et al. 2008). These introns are usually lost multiple times within angiosperms, even within single orders such as Malpighiales (Daniell et al. 2008). By contrast, the only group I intron, in trnL, is omnipresent in land-plant chloroplast genomes, located within a structurally conserved trnT–trnF region (Quandt et al. 2004).
How deep, then, can the signal of noncoding regions be pushed in land plants? In those genomic regions that are orthologous in land plants, at least one other phenomenon has to be considered. Quandt et al. (2004) and Korotkova et al. (2009, this volume) recently discovered that AT-rich stem-loops of introns can enter strong lineage-specific evolutionary processes. An SSM-mediated process is hypothesized to elongate AT-rich stem-loop regions, which stabilize through hairpin elements in secondary structure and then become more GC-rich through later substitutions. This often leads to independent growth of stem-loop elements, which can have independent evolutionary histories in different groups of plants. As a consequence, these stem-loop elements can only be fully included in phylogenetic analysis when the sampling is within a respective lineage. The most prominent example is the P8 loop of the trnL group I intron in land plants (Quandt et al. 2004) where stepwise growth of stem-loops led to internal paralogy (Fig. 7). Due to their independent origins, such stem-loop elements will be highly divergent in primary sequence and can be easily recognized during alignment. In the case of the trnL group I intron, only a small proportion (approximately 20%) of nucleotides is orthologous throughout land plants and can thus be used for phylogeny inference at that level. In group II introns, the structure-forming nucleotides will be alignable across land plants. Kelchner (2002), for example, aligned 185 structure-forming nucleotides of the rpl16 group II intron from liverworts to angiosperms without observing major microstructural changes. Such a pattern is also to be expected for other group II introns. On the other hand, the evolution of less conserved stem-loop elements of group II introns is much less known, also regarding their deep signal.
However, major lineages of land plants, such as angiosperms, leptosporangiate ferns, or lycophytes, appear to possess largely orthologous nucleotide content in their introns and spacers, underscoring the utility of group II introns at least within these lineages. Intergenic spacers are more heterogeneous in their molecular evolution, so that the taxonomic spectrum of their utility depends much more on the individual spacer.
In search of markers with the highest phylogenetic structure
The selection of a marker for a given phylogenetic question is very often based on common belief in various groups of plants (Verbruggen and Theriot 2008) rather than on hard criteria. Percentage of variable or informative characters has been a frequently applied criterion for selecting molecular markers, based on assessments of data sets prior to phylogenetic analyses such as implemented in PAUP* (Swofford 2001). For angiosperms, Soltis and Soltis (1998) employed overall variability in providing a table of widely used genomic regions and their putative range of successful application to phylogenetic inference. As a general trend, noncoding introns and spacers were recommended for use at species and genus levels, whereas conserved genes were recommended for deeper levels. This concept has been changed substantially through the advances in our understanding of molecular evolution of introns and spacers made in the last decade (e.g., Kelchner and Clark 1997; Kelchner 2002; Borsch et al. 2003; Quandt et al. 2004; Löhne and Borsch 2005; Müller et al. 2006). It is now generally understood that introns and spacers are highly complex mosaics in terms of sequence evolution, and that their evaluation based on simple p-distances is an oversimplification of the actual patterns. It is therefore not too surprising that intron and spacer phylogenetic signals can be pushed considerably deeper than the species or genus levels.
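The percentage criterion mentioned above can be illustrated with a minimal sketch. It assumes an alignment given as equal-length strings and treats "-", "?", and "N" as missing data; the function name site_stats is hypothetical and does not correspond to a PAUP* command. A site is counted as parsimony-informative when at least two character states each occur in at least two sequences.

```python
from collections import Counter

# Symbols treated as gaps or missing data (an assumption of this sketch).
IGNORED = set("-?N")


def site_stats(alignment):
    """Return (n_sites, n_variable, n_informative) for aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = len(alignment[0])
    n_variable = 0
    n_informative = 0
    for column in zip(*alignment):
        # Count each observed state in this column, skipping gaps/missing data.
        counts = Counter(
            c for c in (ch.upper() for ch in column) if c not in IGNORED
        )
        # Variable: at least two different states are present.
        if len(counts) >= 2:
            n_variable += 1
        # Parsimony-informative: at least two states occur twice or more.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            n_informative += 1
    return n_sites, n_variable, n_informative


toy = ["ACGTA", "ACGTC", "ATGAC", "ATG-A"]
sites, variable, informative = site_stats(toy)
print(f"{100.0 * variable / sites:.0f}% variable, "
      f"{100.0 * informative / sites:.0f}% informative")
```

Such percentages describe a data set before any tree is inferred, which is why they say little about how the variation is distributed across the tree.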
The causes of the better performance of noncoding versus coding molecular markers are debated. Higher amounts of variable and informative sites in noncoding DNA are easily shown (e.g., Asmussen and Chase 2001). Nevertheless, Borsch et al. (2003) considered that a higher number of informative sites only partly explained their generally better phylogenetic performance. Based on the assumption that sites in noncoding DNA on average evolve closer to neutrality than in coding DNA (Jukes and King 1971; Kimura 1983), the authors hypothesized higher signal quality to be a key factor (observed patterns of sequence variability are less obscured by convergences due to selective pressure).
One of the problems in evaluating phylogenetic utility of molecular markers is the uneven information base. A molecular “marker” is defined in practical terms, i.e., spanning a sequence region that can easily be amplified by a single primer pair. Sequencing is then usually carried out using the amplification primers and a few additional internal sequencing primers. Depending on the amplicon, absolute sequence length usually ranges from 500 to 2,500 nt, and so do numbers of variable and informative characters. As a consequence, the phylogenetic utility of an approximately 500-nt trnL–trnF spacer and an approximately 1,100-nt rpl16 intron cannot be compared directly. Taking such different information entities into account, the relative performance of coding and noncoding DNA is also difficult to compare, even if average absolute sequence lengths are equivalent. This is because the percentage of variable and informative sites in noncoding introns and spacers is usually higher than in coding genes. The question of whether the average number of informative sites per nucleotide sequenced in noncoding DNA is higher than that of coding DNA is thus easily, and positively, answered. However, how many informative sites can be gathered per primer read is rather a question of practical “marker” economics (Shaw et al. 2005). If alignable, and if not saturated so that the historical signal is obscured, noncoding markers will usually perform better in phylogenetic analysis than coding markers.
Homoplasy, i.e., multiple transformations of the same character state in the same data set (contrary to apomorphic changes), is not necessarily deleterious to phylogenetic utility. It rather depends on when and where character state changes have occurred during the process of sequence divergence, and thus on how homoplasy is distributed in the data set. Impressive arguments are provided by studies that empirically confirm the important role that selectively less-constrained 3rd codon positions play in the phylogenetic performance of many coding sequences (Björklund 1999; Källersjö et al. 1999; Müller et al. 2006; Simmons et al. 2006). Recent evidence on frequent and widespread parallel evolution of protein sequences (Rokas and Carroll 2008) is in line with these observations, too. For spacers and introns, investigation of site rates and their effects on phylogenetic performance is still limited; however, the discovery that noncoding sequence data sets can outperform coding sequence data sets in average branch support for identical taxon sets indicates that saturation is unlikely to simply obscure phylogenetic structure (see Borsch et al. 2003; Müller et al. 2006; Worberg et al. 2007). Homoplasy may in fact be a very relevant factor that obscures or biases the phylogenetic signal in some coding sequence data sets.
In order to account for the different sizes of the respective markers, and thus for different numbers of phylogenetically informative characters, a Perl script was developed that allows the user to compare equal numbers of informative characters occurring in each genomic region (Müller et al. 2006). Through a resampling approach, equal units of informative characters are assessed for their phylogenetic structure R, i.e., phylogenetic structure can be measured per informative character. The method for calculating R refers to a phylogenetic tree as a hypothesis of common descent (Müller et al. 2006), expressed by the mean statistical support across nodes (theoretically expected nodes). R is 1 when all nodes receive maximum support, whether measured by jackknife, bootstrap, or posterior probability values. In a broader sense, phylogenetic structure has also been determined by calculating means of support values for nodes receiving >50% support (Källersjö et al. 1999) or by adding up Bremer support values (Källersjö et al. 1992).
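The two ideas in this paragraph, drawing equal numbers of informative characters and averaging support over the expected nodes, can be sketched as follows. This is not the Perl script of Müller et al. (2006); the function names, the choice of sampling without replacement, the scaling of support values to the range 0 to 1, and the treatment of unresolved nodes as zero support are all assumptions made for illustration. Tree inference and support estimation are left to external software and are not shown.

```python
import random


def phylogenetic_structure(support_values, n_expected_nodes):
    """Mean support across the theoretically expected nodes.

    support_values: support (0..1) for each node recovered in the tree.
    Expected nodes that were not recovered contribute zero (assumption).
    """
    if n_expected_nodes <= 0:
        raise ValueError("n_expected_nodes must be positive")
    if len(support_values) > n_expected_nodes:
        raise ValueError("more recovered nodes than expected nodes")
    return sum(support_values) / n_expected_nodes


def resample_informative(informative_columns, k, rng):
    """Draw k informative characters without replacement (assumption)."""
    return rng.sample(informative_columns, k)


# Hypothetical column indices of informative characters in two partitions.
partition_a = list(range(0, 120))
partition_b = list(range(0, 45))
k = min(len(partition_a), len(partition_b))  # equal unit size for both

rng = random.Random(1)
subset_a = resample_informative(partition_a, k, rng)
subset_b = resample_informative(partition_b, k, rng)

# After inferring a tree and node support from each subset (not shown),
# R would be computed from the resulting support values, for example:
print(phylogenetic_structure([1.0, 1.0, 0.5], n_expected_nodes=4))
```

Repeating the draw many times and averaging R over the replicates would give a per-partition estimate that is not driven by marker length alone.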
So far, there are only very few studies available with exactly comparable character partitions (i.e., different genomic regions were sequenced and analyzed for identical taxon sets). Müller et al. (2006) compared the trnT–trnF region comprising two spacers and a group I intron with the matK and rbcL genes for a 42-taxon data set of angiosperms using the Perl script. The authors found not only higher phylogenetic structure in the trnT–trnF marker but also a significantly greater phylogenetic structure per informative site in the noncoding partition as compared to the already very highly performing matK gene (see Hilu et al. 2003; Müller and Borsch 2005a). Löhne et al. (2007) compared three group II introns, the trnL group I intron, and several spacers in Nymphaeales and found R for the rpl16 and trnK data sets to be significantly higher than for petD. The trnL intron was comparable in signal to petD, whereas spacers (trnT–trnL, trnL–trnF, petB–petD, trnK–psbA) were comparable or weaker, the latter basically caused by their smaller size. Notably, CI and RC homoplasy indices did not correlate with R. This underscores the above arguments that the minimum possible number of character state changes divided by tree length (CI) is an estimate for the amount of homoplasy but does not sufficiently reflect the distribution of homoplasy that determines phylogenetic structure.
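For reference, the homoplasy indices mentioned here can be written down in a few lines. The sketch uses the standard ensemble definitions, CI = M/S, RI = (G − S)/(G − M), and RC = CI × RI, where M is the minimum possible number of steps, S the number of steps observed on the tree, and G the maximum possible number of steps; the function name and the example numbers are hypothetical.

```python
def homoplasy_indices(min_steps, observed_steps, max_steps):
    """Return (CI, RI, RC) from ensemble step counts M, S, and G."""
    if not (0 < min_steps <= observed_steps <= max_steps):
        raise ValueError("require 0 < M <= S <= G")
    if max_steps == min_steps:
        raise ValueError("RI is undefined when G equals M")
    ci = min_steps / observed_steps  # consistency index
    ri = (max_steps - observed_steps) / (max_steps - min_steps)  # retention index
    rc = ci * ri  # rescaled consistency index
    return ci, ri, rc


# Hypothetical step counts for one data set on one tree.
print(homoplasy_indices(min_steps=10, observed_steps=20, max_steps=30))
```

All three indices depend only on summed step counts, so two data sets with identical values can differ in which branches carry the extra steps. This is consistent with the observation that they need not track R.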
In a series of studies, Small et al. (1998) and Shaw et al. (2005, 2007) examined the utility of cpDNA genomic regions for resolving evolutionary relationships among closely related species. Because genetic distances of genomic regions sequenced from such recently diverged organisms are low, the aim was to screen the chloroplast genome for its most variable parts. Shaw et al. compared so-called PICs (potentially informative characters) based on the variability observed in three-taxon alignments. PIC values do not equate with values for phylogenetic structure (defined as R) determined for a sequence data set of a genomic region after inferring a phylogenetic hypothesis as this would require consideration of the level and distribution of homoplasy based on more representative taxon sets. Even if PIC values can be related to the size of a marker, and thus be of more practical use for selecting markers that exhibit variability, they will not predict phylogenetic utility. The more classical way is to compare pairwise nucleotide divergences (p-distances) in larger data sets. Calviño and Downie (2007) for example found spacers adjacent to the group II intron in rps16 (trnQ–rps16 and rps16–trnK) to evolve twice as fast on average in Apiaceae; Timme et al. (2007) ranked p-distances of genomic regions between Helianthus and Lactuca plastid genomes, resulting in spacers being the most divergent noncoding cpDNA regions. As with phylogenetic structure, PIC values do not correlate directly with p-distances (Timme et al. 2007).
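A pairwise p-distance, as used in such comparisons, is simply the proportion of compared sites at which two aligned sequences differ. The sketch below assumes pairwise deletion, i.e., positions with a gap or missing-data symbol in either sequence are skipped; the function name is hypothetical, and microstructural changes are deliberately ignored.

```python
# Symbols excluded from the comparison (an assumption of this sketch).
SKIPPED = set("-?N")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites comparable in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIPPED or b in SKIPPED:
            continue  # pairwise deletion of gaps and missing data
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


print(p_distance("ACGTACGT", "ACGTTCGA"))
```

Because gapped positions are skipped, a region whose divergence is dominated by length mutations can show a low p-distance despite many microstructural changes.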
Quality measures when working with noncoding markers
Most important when working with DNA sequences that exhibit evolutionary patterns with frequent microstructural mutations is a clear and reproducible putative synapomorphy assessment or alignment [also addressed by Ochoterena (2009) and Morrison (2009b, this volume)]. Alignment rules have been proposed by Graham et al. (2000), Kelchner (2000), Borsch et al. (2003), and Löhne and Borsch (2005), largely based on inferred types of microstructural mutations. The assumptions underlying the rules appear to be reflected by reconstructions of sequence character evolution in an increasing number of data sets (e.g., Borsch et al. 2007), which in turn helps to further establish these criteria.
Nevertheless, the current literature still contains many studies that use ambiguous alignment approaches, stated in formulations such as “alignment was manually adjusted” or “alignments obtained with Clustal X were subsequently corrected by eye.” Putative synapomorphy assessment should be based on criteria that explicitly explain how it was carried out. All respective matrices should be published as the verifiable basis for phylogenetic analysis. This includes not only the alignment (i.e., the DNA character matrix) with and without mutational hotspots but also all additional matrices resulting from alignment coding. A useful addition would be to document hypothesized microstructural mutational events for individual sequences (Löhne and Borsch 2005; Müller and Borsch 2005a, b; Borsch et al. 2007), as this will easily allow their further testing in a phylogenetic context and with additional taxon sampling. Rigorous documentation of matrices is the standard for kinds of characters other than DNA used in phylogenetic analysis. Along the same line, Morrison (2009a, b, this volume) discusses the limits of current computer programs for alignment and possible future needs for automation. In any case, if an alignment is properly done as described in this and other studies, potentially erroneous homology statements for a few individual characters will rarely influence the overall signal in a noncoding DNA data set.
Conclusion and future work
The application of noncoding chloroplast DNA for inferring phylogenetic hypotheses is promising across a broad taxonomic spectrum that ranges from species-level questions to the reconstruction of deep nodes. In recent years, numerous studies have reported well-resolved and statistically supported trees for a variety of plant groups based on sequences of chloroplast introns and spacers. Noncoding markers evidently outperform more conserved coding markers in many cases.
However, when working with noncoding sequences, which evolve by both substitutions and frequent microstructural mutations, certain quality measures need to be considered. The alignment of sequences (and also the coding of microstructural mutations) should follow clearly formulated criteria to designate putative mutational events in a given genomic region. Resulting matrices with aligned sequences as well as hypothesized microstructural changes should be made accessible because they are testable statements of homology of DNA characters and their states. It is important to realize that such criteria are based on our understanding of the molecular evolution of the respective genomic regions. With a wealth of sequence data becoming available, we are beginning to understand molecular evolutionary patterns and their underlying mechanisms acting in different genomic regions.
Empirical analyses of well-sampled sequence characters in the phylogenetic context play an important role and are likely to yield many new insights in the coming years. A greater number of taxon sets for which a variety of different genomic regions have been sequenced from identical organisms will be needed as a comparative basis. So far, studies that reconstruct the historic pathway of sequence evolution in a phylogenetic context, and studies that avoid lineage effects by using a comparative approach of different genomic regions, are relatively rare. However, an extension of knowledge about DNA sequence evolution is likely to lead to further refined approaches in sequence analysis. One result of such studies is already evident: mutational dynamics of different kinds of genomic regions and among the three genomic compartments of plants differ significantly. As a consequence, universal approaches to evolutionary sequence analysis have their limits including, for example, the application of fixed universal gap costs in multiple sequence alignment. This point is also raised by Kjer et al. (2009) who discussed the impact of structural and evolutionary considerations on alignable RNA sequences. A better understanding of molecular evolutionary patterns, which extends beyond assessing pairwise sequence variability, will be beneficial to the effective application of noncoding genomic regions in both phylogenetic analysis and DNA barcoding.
It is obvious that the evolution of the different kinds of microstructural characters described above involves selective constraints that differ considerably from the replacement of single nucleotides in a strand of DNA. Microstructural change may therefore need to be described by different models. Bayesian and likelihood approaches will play an increasingly important role in phylogenetic inference because they allow the use of mixed models or relaxed parameters (Pagel and Meade 2004; Kelchner and Thomas 2007). To fully utilize the information content of noncoding DNA, future work should focus on more realistic models that describe microstructural sequence evolution and the implementation of such models in phylogenetic and evolutionary analyses.
References
Aagesen L (2004) The information content of an ambiguously alignable region, a case study of the trnL intron from the Rhamnaceae. Org Divers Evol 4:35–49
Akins RA, Lambowitz AM (1987) A protein required for splicing group I introns in Neurospora mitochondria is mitochondrial tyrosyl-tRNA synthetase or a derivative thereof. Cell 50:331–345
Aldrich J, Cherney BW, Merlin E, Christopherson L (1988) The role of insertions/deletions in the evolution of the intergenic region between psbA and trnH in the chloroplast genome. Curr Genet 14:137–146
Andersson L, Rova JHE (1999) The rps16 intron and the phylogeny of the Rubioideae (Rubiaceae). Plant Syst Evol 214:161–186
Ansell SW, Schneider H, Pedersen N, Grundmann M, Russell SJ, Vogel JC (2007) Recombination diversifies chloroplast trnF pseudogenes in Arabidopsis lyrata. J Evol Biol 20:2400–2411
Asmussen CB, Chase MW (2001) Coding and noncoding plastid DNA in palm systematics. Am J Bot 88:1103–1117
Azuma H, Thien LB, Kawano S (1999) Molecular phylogeny of Magnolia (Magnoliaceae) inferred from cpDNA sequences and evolutionary divergences of the floral scents. J Plant Res 112:291–306
Backert S, Nielsen BL, Börner T (1997) The mystery of the rings: structure and replication of mitochondrial genomes from higher plants. Trends Plant Sci 2:477–483
Baker WJ, Hedderson TA, Dransfield J (2000) Molecular phylogenetics of subfamily Calamoideae (Palmae) based on nrDNA ITS and cpDNA rps16 intron sequence data. Mol Phylogenet Evol 14:195–217
Bakker FT, Culham A, Gomez-Martinez R, Carvalho J, Compton J, Dawtrey R, Gibby M (2000) Patterns of nucleotide substitution in angiosperm cpDNA trnL(UAA)-trnF(GAA) regions. Mol Biol Evol 17:1146–1155
Barfuss MHJ, Samuel R, Till W, Stuessy TF (2005) Phylogenetic relationships in subfamily Tillandsioideae (Bromeliaceae) based on DNA sequence data from seven plastid regions. Am J Bot 92:337–351
Baumgartner BJ, Rapp JC, Mullet JE (1989) Plastid transcription activity and DNA copy number increase early in barley chloroplast development. Plant Physiol 89:1011–1018
Beck SG, Fleischmann A, Huaylla H, Müller KF, Borsch T (2008) Pinguicula chuquisacensis (Lentibulariaceae), a new species from the Bolivian Andes, and first insights on phylogenetic relationships among South American Pinguicula. Willdenowia 38:201–212
Benderoth M, Textor S, Windsor AJ, Mitchell-Olds T, Gershenzon J, Kroymann J (2006) Positive selection during diversification in plant secondary metabolism. Proc Natl Acad Sci USA 103:9118–9123
Benson G (1997) Sequence alignment with tandem duplication. J Comput Biol 4:351–367
Besendahl A, Qiu YL, Lee J, Palmer JD, Bhattacharya D (2000) The cyanobacterial origin and vertical transmission of the plastid tRNALeu group-I-intron. Curr Genet 37:12–23
Biffin E, Harrington MG, Crisp MD, Craven LA, Gadek PA (2007) Structural partitioning, paired-sites models and evolution of the ITS transcript in Syzygium and Myrtaceae. Mol Phylogenet Evol 43:124–139
Birky CW Jr (1995) Uniparental inheritance of mitochondrial and chloroplast genes: mechanisms and evolution. Proc Natl Acad Sci USA 92:11331–11338
Björklund M (1999) Are third positions really bad? A test using vertebrate cytochrome b. Cladistics 15:191–197
Böhle UR, Hilger HH, Martin FW (1996) Island colonization and evolution of the insular woody habit in Echium L. (Boraginaceae). Proc Natl Acad Sci USA 93:11740–11745
Boivin R, Richard M, Beauseigle D, Bousquet J, Bellemare G (1996) Phylogenetic inferences from chloroplast chlB gene sequences of Nephrolepis exaltata (Filicopsida), Ephedra altissima (Gnetopsida), and diverse land plants. Mol Phylogenet Evol 6:19–29
Borsch T, Hilu KW, Quandt D, Wilde V, Neinhuis C, Barthlott W (2003) Non-coding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J Evol Biol 16:558–576
Borsch T, Löhne C, Müller K, Hilu KW, Wanke S, Worberg A, Barthlott W, Neinhuis C, Quandt D (2005) Towards understanding basal angiosperm diversification: recent insights using rapidly evolving genomic regions. Nova Acta Leopoldina NF 92(342):85–110
Borsch T, Hilu KW, Wiersema JH, Löhne C, Barthlott W, Wilde V (2007) Phylogeny of Nymphaea (Nymphaeaceae): evidence from substitutions and microstructural changes of the chloroplast trnT-trnF region. Int J Plant Sci 168:639–671
Borsch T, Korotkova N, Raus T, Lobin W, Löhne C (2009) The petD group II intron as a species level marker: utility for tree inference and species identification in the diverse genus Campanula (Campanulaceae). Willdenowia 39:7–33
Calonje M, Martín-Bravo S, Dobeš C, Gong W, Jordon-Thaden I, Kiefer C, Kiefer M, Paule J, Schmickl R, Koch MA (2009) Non-coding nuclear DNA markers in phylogenetic reconstruction. Plant Syst Evol (this volume, pp 257–280). doi:10.1007/s00606-008-0031-1
Calviño CI, Downie SR (2007) Circumscription and phylogeny of Apiaceae subfamily Saniculoideae based on chloroplast DNA sequences. Mol Phylogenet Evol 44:175–191
Campagna ML, Downie SR (1998) The intron in chloroplast gene rpl16 is missing from the flowering plant families Geraniaceae, Goodeniaceae, and Plumbaginaceae. Trans Illinois State Acad Sci 91:1–11
Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540–552
Cech TR (1988) Conserved sequences and structures of group I introns: building an active site for RNA catalysis—a review. Gene 73:259–271
Cech TR (1990) Self-splicing of group I introns. Annu Rev Biochem 59:543–568
Cech TR, Herschlag D, Piccirilli JA, Pyle AM (1992) RNA catalysis by a group I ribozyme: developing a model for transition state stabilization. J Biol Chem 267:17479–17482
Cech TR, Damberger SH, Gutell RR (1994) Representation of the secondary and tertiary structure of group I introns. Nat Struct Biol 1:273–280
Chang C-C, Lin H-C, Lin I-P, Chow T-Y, Chen H-H, Chen W-H, Cheng C-H, Lin C-Y, Liu S-M, Chang C-C, Chaw S-M (2006) The chloroplast genome of Phalaenopsis aphrodite (Orchidaceae): comparative analysis of evolutionary rate with that of grasses and its phylogenetic implications. Mol Biol Evol 23:279–291
Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH (2005) A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol 37:214–234
Chiang T-Y, Schaal BA (2000) Molecular evolution and phylogeny of the atpB-rbcL spacer of chloroplast DNA in the true mosses. Genome 43:417–426
Chumley TW, Palmer JD, Mower JP, Fourcade HM, Calie PJ, Boore JL, Jansen RK (2006) The complete chloroplast genome sequence of Pelargonium x hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol Biol Evol 23:2175–2190
Clegg MT, Gaut BS, Learn GH, Morton BR (1994) Rates and patterns of chloroplast DNA evolution. Proc Natl Acad Sci USA 91:6795–6801
Corriveau JL, Coleman AW (1988) Rapid screening method to detect potential biparental inheritance of plastid DNA and results for over 200 angiosperm species. Am J Bot 75:1443–1458
Cosner ME, Jansen RK, Palmer JD, Downie SR (1997) The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families. Curr Genet 31:419–429
Cozzolino S, Cafasso D, Pellegrino G, Musacchio A, Widmer A (2003) Molecular evolution of a plastid tandem repeat locus in an orchid lineage. J Mol Evol 57:S41–S49
Crayn DM, Quinn CJ (2000) The evolution of the atpB-rbcL intergenic spacer in the Epacrids (Ericales) and its systematic and evolutionary implications. Mol Phylogenet Evol 16:238–252
Daniell H, Wurdack KJ, Kanagaraj A, Lee S-B, Saski C, Jansen RK (2008) The complete nucleotide sequence of the cassava (Manihot esculenta) chloroplast genome and the evolution of atpF in Malpighiales: RNA editing and multiple losses of a group II intron. Theor Appl Genet 116:723–737
Davies RW, Waring RB, Brown TA, Scazzocchio C (1982) Making ends meet—a model for RNA splicing in fungal mitochondria. Nature 300:719–724
Day A, Madesis P (2007) DNA replication, recombination, and repair in plastids. In: Bock R (ed) Cell and molecular biology of plastids. Topics in current genetics, vol 19. Springer, Heidelberg, pp 65–119
Delwiche CF, Palmer JD (1997) The origin of plastids and their spread via secondary symbiosis. Plant Syst Evol 11:53–86
Donoghue MJ, Baldwin BG, Li J, Winkworth RC (2004) Viburnum phylogeny based on chloroplast trnK intron and nuclear ribosomal ITS DNA sequences. Syst Bot 29:188–198
Downie SR, Katz-Downie DS (1999) Phylogenetic analysis of chloroplast rps16 intron sequences reveals relationships within the woody southern African Apiaceae subfamily Apioideae. Can J Bot 77:1120–1135
Downie SR, Katz-Downie D, Watson MF (2000) A phylogeny of the flowering plant family Apiaceae based on chloroplast DNA rpl16 and rpoC1 intron sequences: towards a suprageneric classification of subfamily Apioideae. Am J Bot 87:273–292
Doyle JJ, Doyle JL, Palmer JD (1995) Multiple independent losses of two genes and one intron from legume chloroplast genomes. Syst Bot 20:272–294
Duffy AM, Kelchner SA, Wolf PG (2009) Conservation of selection on matK following an ancient loss of its flanking intron. Gene 438:17–25
Echt CS, Deverno LL, Anzidei M, Vendramin GG (1998) Chloroplast microsatellites reveal population genetic diversity in red pine, Pinus resinosa Ait. Mol Ecol 7:307–316
Edwards D, Horn A, Taylor D, Savolainen V, Hawkins JA (2008) DNA barcoding of a large genus, Aspalathus L. (Fabaceae). Taxon 57:1317–1327
Ehrendorfer F, Manen J-F, Natali A (1994) cpDNA intergene sequences corroborate restriction site data for reconstructing Rubiaceae phylogeny. Plant Syst Evol 190:245–248
Funk HT, Berg S, Krupinska K, Maier UG, Krause K (2007) Complete DNA sequences of the plastid genomes of two parasitic flowering plant species, Cuscuta reflexa and Cuscuta gronovii. BMC Plant Biol 7:45
Galley C, Linder HP (2007) The phylogeny of the Pentaschistis clade (Danthonioideae, Poaceae) based on chloroplast DNA, and the evolution and loss of complex characters. Evolution 61:864–884
Gielly L, Taberlet P (1996) A phylogeny of European gentians inferred from chloroplast trnL (UAA) intron sequences. Bot J Linn Soc 120:57–75
Golenberg EM, Clegg MT, Durbin ML, Doebley J, Ma DP (1993) Evolution of a noncoding region of the chloroplast genome. Mol Phylogenet Evol 2:52–64
Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH (2004) The chloroplast genome of Nymphaea alba: whole-genome analyses and the problem of identifying the most basal angiosperm. Mol Biol Evol 21:1445–1454
Graham SW, Olmstead RG (2000) Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr Genet 37:183–188
Graham SW, Reeves PA, Burns ACE, Olmstead RG (2000) Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int J Plant Sci 161:S83–S96
Grewe F, Viehoever P, Weisshaar B, Knoop V (2009) A trans-splicing group I intron and the tRNA-hyperediting in the mitochondrial genome of the lycophyte Isoetes engelmannii. Nucl Acids Res. doi:10.1093/nar/gkp532
Groeninckx I, Dessein S, Ochoterena H, Persson C, Motley TJ, Kårehed J, Bremer B, Huysmans S, Smets E (2009) Phylogeny of the herbaceous tribe Spermacoceae (Rubiaceae) based on plastid DNA data. Ann Missouri Bot Gard 96:109–132
Grundy WN, Naylor GJ (1999) Phylogenetic inference from conserved sites alignments. J Exp Zool 285:128–139
Gu X, Li WH (1995) The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J Mol Evol 40:464–473
Haberle RC, Fourcade HM, Boore JL, Jansen RK (2008) Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats in tRNA genes. J Mol Evol 66:350–361
Hamilton MB (1999) Four primer pairs for the amplification of chloroplast intergenic regions with intraspecific variation. Mol Ecol 8:521–523
Hamilton MB, Braverman JM, Soria-Hernanz DF (2003) Patterns and relative rates of nucleotide and insertion/deletion evolution at six chloroplast intergenic regions in new world species of the Lecythidaceae. Mol Biol Evol 20:1710–1721
Hansen DR, Dastidar SG, Cai Z, Penaflor C, Kuehl JV, Boore JL, Jansen RK (2007) Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol Phylogenet Evol 45:547–563
Haugen P, Simon DM, Bhattacharya D (2005) The natural history of group I introns. Trends Genet 21:111–119
Hausner G, Olson R, Simon D, Johnson I, Sanders ER, Karol KG, McCourt RM, Zimmerly S (2006) Origin and evolution of the chloroplast trnK (matK) intron: a model for evolution of group II intron RNA structures. Mol Biol Evol 23:380–391
Hedderson TA, Zander RH (2007) Triquetrella mxinwana, a new moss species from South Africa, with a phylogenetic and biogeographic hypothesis for the genus. J Bryol 29:151–160
Hedenäs L (2009) Relationships among arctic and non-arctic haplotypes of the moss species Scorpidium cossonii and Scorpidium scorpioides (Calliergonaceae). Plant Syst Evol. doi:10.1007/s00606-008-0131-y
Heinhorst S, Cannon GC (1993) DNA replication in chloroplasts. J Cell Sci 104:1–9
Hernández-Maqueda R, Quandt D, Werner O, Muñoz J (2008) Phylogenetic relationships and generic classification of the Grimmiaceae. Mol Phylogenet Evol 46:863–877
Hillis DM (1994) Homology in molecular biology. In: Hall B (ed) Homology—the hierarchical basis of comparative biology. Academic Press, San Diego, pp 339–369
Hilu KW, Liang H (1997) The matK gene: sequence variation and application in plant systematics. Am J Bot 84:830–839
Hilu KW, Borsch T, Müller K, Soltis DE, Soltis PS, Savolainen V, Chase MW, Powell M, Alice MA, Evans R, Sauquet H, Neinhuis C, Slotta TA, Rohwer JG, Campbell CS, Chatrou L (2003) Angiosperm phylogeny based on matK sequence information. Am J Bot 90:1758–1776
Hirao T, Watanabe A, Kurita M, Kondo T, Takata K (2008) Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species. BMC Plant Biol 8:70
Hiratsuka J, Shimada H, Whittier R, Ishibashi T, Sakamoto M, Mori M, Kondo C, Honji Y, Sun CR, Meng BY, Li YQ, Kanno A, Nishizawa Y, Hirai A, Shinozaki K, Sugiura M (1989) The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol Gen Genet 217:185–194
Hoot SB, Douglas AW (1998) Phylogeny of the Proteaceae based on atpB and atpB-rbcL intergenic spacer region sequences. Aust Syst Bot 11:301–320
Hoot SB, Taylor WC, Napier NS (2006) Phylogeny and biogeography of Isoëtes (Isoëtaceae) based on nuclear and chloroplast DNA sequence data. Syst Bot 31:449–460
Hupfer H, Swiatek M, Hornung S, Herrmann RG, Maier RM, Chiu WL, Sears B (2000) Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable Euoenothera plastomes. Mol Gen Genet 263:581–585
Huttunen S, Hedenäs L, Ignatov MS, Devos N, Vanderpoorten A (2008) Origin and evolution of the northern hemisphere disjunction in the moss genus Homalothecium (Brachytheciaceae). Am J Bot 95:720–730
Jansen RK, Raubeson LA, Boore JL, DePamphilis CW, Chumley TW, Haberle RC, Wyman SK, Alverson AJ, Peery R, Herman SJ, Fourcade HM, Kuehl JV, McNeal JR, Leebens-Mack J, Cui L (2005) Methods for obtaining and analyzing whole chloroplast genome sequences. Methods Enzymol 395:348–384
Jansen RK, Cai Z, Raubeson LA, Daniell H, DePamphilis CW, Leebens-Mack J, Müller KF, Guisinger-Bellian M, Haberle RC, Hansen AK, Chumley TW, Lee SB, Peery R, McNeal JR, Kuehl JV, Boore JL (2007) Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA 104:19369–19374
Jansen RK, Wojciechowski MF, Sanniyasi E, Lee SB, Daniell H (2008) Complete plastid genome sequence of the chickpea (Cicer arietinum) and the phylogenetic distribution of rps12 and clpP intron losses among legumes (Leguminosae). Mol Phylogenet Evol 48:1204–1217
Jenkins B, Kulhanek D, Barkan A (1997) Nuclear mutations that block group II RNA splicing in maize chloroplasts reveal several intron classes with distinct requirements for splicing factors. Plant Cell 9:283–296
Johansson JT (1999) Three large inversions in the chloroplast genomes and one loss of the chloroplast gene rps16 suggest an early evolutionary split in the genus Adonis (Ranunculaceae). Plant Syst Evol 218:133–143
Johnson LA, Soltis DE (1995) Phylogenetic inference in Saxifragaceae sensu stricto and Gilia (Polemoniaceae) using matK sequences. Ann Missouri Bot Gard 82:149–175
Jordan WC, Courtney MW, Neigel JE (1996) Low levels of intraspecific genetic variation at a rapidly evolving chloroplast DNA locus in North American duckweeds (Lemnaceae). Am J Bot 83:430–439
Jukes TH, King JL (1971) Deleterious mutations and neutral substitutions. Nature 231:114–115
Kadereit G, Mucina L, Freitag H (2006) Phylogeny of Salicornioideae (Chenopodiaceae): diversification, biogeography, and evolutionary trends in leaf and flower morphology. Taxon 55:617–642
Källersjö M, Farris JS, Kluge AG, Bull C (1992) Skewness and permutation. Cladistics 8:275–287
Källersjö M, Albert V, Farris JS (1999) Homoplasy increases phylogenetic structure. Cladistics 15:91–93
Kanno A, Hirai A (1993) A transcription map of the chloroplast genome from rice (Oryza sativa). Curr Genet 23:166–174
Kårehed J, Groeninckx I, Dessein S, Motley TJ, Bremer B (2008) The phylogenetic utility of chloroplast and nuclear DNA markers and the phylogeny of Rubiaceae tribe Spermacoceae. Mol Phylogenet Evol 49:843–866
Kato T, Kaneko T, Sato S, Nakamura Y, Tabata S (2000) Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res 7:323–330
Kelchner SA (2000) The evolution of noncoding chloroplast DNA and its application in plant systematics. Ann Missouri Bot Gard 87:482–498
Kelchner SA (2002) Group II introns as phylogenetic tools: structure, function, and evolutionary constraints. Am J Bot 89:1651–1669
Kelchner SA, Clark LG (1997) Molecular evolution and phylogenetic utility of the chloroplast rpl16 intron in Chusquea and the Bambusoideae (Poaceae). Mol Phylogenet Evol 8:385–397
Kelchner SA, Thomas MA (2007) Model use in phylogenetics: nine key questions. Trends Ecol Evol 22:87–94
Kelchner SA, Wendel JF (1996) Hairpins create minute inversions in noncoding regions of chloroplast DNA. Curr Genet 30:259–262
Kellermann J, Udovicic F (2008) Large indels obscure phylogeny in analysis of chloroplast DNA trnL-F sequence data: Pomaderreae (Rhamnaceae) revisited. Telopea 12:1–22
Kim K-J, Lee HL (2005) Widespread occurrence of small inversions in the chloroplast genomes of land plants. Mol Cells 19:104–113
Kim S-C, Crawford DJ, Jansen RK, Santos-Guerra A (1999) The use of a noncoding region of chloroplast DNA in phylogenetic studies of the subtribe Sonchinae (Asteraceae: Lactuceae). Plant Syst Evol 215:85–99
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
Kjer KM, Roshan U, Gillespie JJ (2009) Structural and evolutionary considerations for multiple sequence alignment of RNA, and the challenges for algorithms that ignore them. In: Rosenberg MS (ed) Sequence alignment: methods, models, concepts, and strategies. University of California Press, Berkeley, pp 105–149
Knoop V (2004) The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet 46:123–139
Koch M, Dobeš C, Matschinger M, Bleeker W, Vogel J, Kiefer M, Mitchell-Olds T (2005) Evolution of the plastidic trnF(GAA) gene in Arabidopsis relatives and the Brassicaceae family: monophyletic origin and subsequent diversification of a plastidic pseudogene. Mol Biol Evol 22:1032–1043
Koch M, Dobeš C, Kiefer M, Schmickl R, Klimes L, Lysak MA (2007) Supernetwork identifies multiple events of plastidic trnF(GAA) pseudogene evolution in the Brassicaceae. Mol Biol Evol 24:63–73
Kocyan A, Zhang L-B, Schaefer H, Renner SS (2007) A multi-locus chloroplast phylogeny for the Cucurbitaceae and its implications for character evolution and classification. Mol Phylogenet Evol 44:553–577
Koller B, Delius H (1980) Vicia faba chloroplast DNA has only one set of ribosomal RNA genes as shown by partial denaturation mapping and R-loop analysis. Mol Gen Genet 178:261–269
Korotkova N, Schneider JV, Quandt D, Worberg A, Zizka G, Borsch T (2009) Phylogeny of the eudicot order Malpighiales—analysis of a recalcitrant clade with sequences of the petD group II intron. Plant Syst Evol (this volume, pp 201–228). doi:10.1007/s00606-008-0099-7
Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102:8369–8374
Kugita M, Kaneko A, Yamamoto Y, Takeya Y, Matsumoto T, Yoshinaga K (2003) The complete nucleotide sequence of the hornwort (Anthoceros formosae) chloroplast genome: insight into the earliest land plants. Nucl Acids Res 31:716–721
Kuhsel MG, Strickland R, Palmer JD (1990) An ancient group I intron shared by eubacteria and chloroplasts. Science 250:1570–1573
Kuroiwa T (1991) The replication, differentiation, and inheritance of plastids with emphasis on the concept of organelle nuclei. Int Rev Cytol 128:1–62
Lee J, Hymowitz T (2001) A molecular phylogenetic study of the subtribe Glycininae (Leguminosae) derived from the chloroplast DNA rps16 intron sequences. Am J Bot 88:2064–2073
Lee H-J, Jansen RK, Chumley TW, Kim K-J (2007) Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions. Mol Biol Evol 24:1161–1180
Leebens-Mack J, Raubeson LA, Cui L, Kuehl JV, Fourcade MH, Chumley TM, Boore JL, Jansen RK, dePamphilis CW (2005) Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein Zone. Mol Biol Evol 22:1948–1963
Lehmann K, Schmidt U (2003) Group II introns: structure and catalytic versatility of large natural ribozymes. Crit Rev Biochem Mol 38:249–303
Levinson G, Gutman GA (1987) Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203–221
Lidholm J, Gustafsson P (1991) A three-step model for the rearrangement of the chloroplast trnK-psbA region of the gymnosperm Pinus contorta. Nucl Acids Res 19:2881–2887
Lidholm J, Szmidt A, Gustafsson P (1991) Duplication of the psbA gene in the chloroplast genome of two Pinus species. Mol Gen Genet 226:345–352
Lilly JW, Havey MJ, Jackson SA, Jiang JM (2001) Cytogenomic analyses reveal the structural plasticity of the chloroplast genome in higher plants. Plant Cell 13:245–254
Löhne C, Borsch T (2005) Molecular evolution and phylogenetic utility of the petD group II intron: a case study in basal angiosperms. Mol Biol Evol 22:317–332
Löhne C, Borsch T, Wiersema JH (2007) Phylogenetic analysis of Nymphaeales using fast-evolving and noncoding chloroplast markers. Bot J Linn Soc 154:141–163
Long DG, Möller M, Preston J (2000) Phylogenetic relationships of Asterella (Aytoniaceae, Marchantiopsida) inferred from chloroplast DNA sequences. Bryologist 103:625–644
Lugo SK, Kunnimalaiyaan M, Singh NK, Nielsen BL (2004) Required sequence elements for chloroplast DNA replication activity in vitro and in electroporated chloroplasts. Plant Sci 166:151–161
Malek O, Knoop V (1998) Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA 4:1599–1609
Manen JF (2000) Relaxation of evolutionary constraints in promoters of the plastid gene atpB in a particular Rubiaceae lineage. Plant Syst Evol 224:235–241
Manen JF, Natali A (1995) Comparison of the evolution of ribulose-1,5-biphosphate carboxylase (rbcL) and atpB-rbcL noncoding spacer sequences in a recent plant group, the tribe Rubieae (Rubiaceae). J Mol Evol 41:920–927
Manen JF, Natali A, Ehrendorfer F (1994a) Phylogeny of Rubiaceae-Rubieae inferred from the sequence of a cpDNA intergene region. Plant Syst Evol 190:195–211
Manen J-F, Savolainen V, Simon P (1994b) The atpB and rbcL promoters in plastid DNAs of a wide dicot range. J Mol Evol 38:577–582
Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:162–165
Martin W, Rujan T, Richly E, Hansen A, Cornelsen S, Lins T, Leister D, Stoebe B, Hasegawa M, Penny D (2002) Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA 99:12246–12251
McCauley DE (1994) Contrasting the distribution of chloroplast DNA and allozyme polymorphism among local populations of Silene alba—implications for studies of gene flow in plants. Proc Natl Acad Sci USA 91:8127–8131
McCoy SR, Kuehl JV, Boore JL, Raubeson LA (2008) The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates. BMC Evol Biol 8:130
McFadden GI, van Dooren GG (2004) Evolution: red algal genome affirms a common origin of all plastids. Curr Biol 14:R514–R516
Meimberg H, Wistuba A, Dittrich P, Heubl G (2001) Molecular phylogeny of Nepenthaceae based on cladistic analysis of plastid trnK intron sequence data. Plant Biol 3:164–175
Meimberg H, Thalhammer S, Brachmann A, Heubl G (2006) Comparative analysis of a translocated copy of the trnK intron in carnivorous family Nepenthaceae. Mol Phylogenet Evol 39:478–490
Michel F, Westhof E (1990) Modeling of the 3-dimensional architecture of group-I catalytic introns based on comparative sequence analysis. J Mol Biol 216:585–610
Michel F, Umesono K, Ozeki H (1989) Comparative and functional anatomy of group II catalytic introns—a review. Gene 82:5–30
Michelangeli FA, Davis JI, Stevenson DW (2003) Phylogenetic relationships among Poaceae and related families as inferred from morphology, inversions in the plastid genome, and sequence data from the mitochondrial and plastid genomes. Am J Bot 90:93–106
Miller JT, Bayer RJ (2003) Molecular phylogenetics of Acacia subgenera Acacia and Aculeiferum (Fabaceae: Mimosoideae), based on the chloroplast matK coding sequence and flanking trnK intron spacer regions. Aust Syst Bot 16:27–33
Miyata Y, Sugita C, Maruyama K, Sugita M (2008) RNA editing in the anticodon of tRNALeu (CAA) occurs before group I intron splicing in plastids of a moss Takakia lepidozioides S. Hatt. & Inoue. Plant Biol 10:250–255
Moore MJ, Dhingra A, Soltis PS, Shaw R, Farmerie WG, Folta KM, Soltis DE (2006) Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol 6:17
Moore MJ, Bell CD, Soltis PS, Soltis DE (2007) Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA 104:19363–19368
Morrison DA (2009a) Why would phylogeneticists ignore computerized sequence alignment? Syst Biol 58:150–158
Morrison DA (2009b) A framework for phylogenetic sequence alignment. Plant Syst Evol (this volume, pp 127–149). doi:10.1007/s00606-008-0072-5
Mort ME, Archibald JK, Randle CP, Levsen ND, O’Leary TR, Topalov K, Wiegand CM, Crawford DJ (2007) Inferring phylogeny at low taxonomic levels: utility of rapidly evolving cpDNA and nuclear ITS loci. Am J Bot 94:173–183
Morton BR, Clegg MT (1993) A chloroplast DNA mutational hotspot and gene conversion in a noncoding region near rbcL in the grass family (Poaceae). Curr Genet 24:357–365
Morton BR, Clegg MT (1995) Neighboring base composition is strongly correlated with base substitution bias in a region of the chloroplast genome. J Mol Evol 41:597–603
Müller K, Borsch T (2005a) Phylogenetics of Utricularia (Lentibulariaceae) and molecular evolution of the trnK intron in a lineage with high mutational rates. Plant Syst Evol 250:39–67
Müller K, Borsch T (2005b) Phylogenetics of Amaranthaceae based on matK/trnK sequence data: evidence from Parsimony, Likelihood, and Bayesian analyses. Ann Missouri Bot Gard 92:66–102
Müller KF, Borsch T, Hilu KW (2006) Phylogenetic utility of rapidly evolving DNA at high taxonomical levels: contrasting matK, trnT-F and rbcL in basal angiosperms. Mol Phylogenet Evol 41:99–117
Murdock AG (2008) Phylogeny of marattioid ferns (Marattiaceae): inferring a root in the absence of a closely related outgroup. Am J Bot 95:626–641
Nelissen B, van de Peer Y, Wilmotte A, De Wachter R (1995) An early origin of plastids within the cyanobacterial divergence is suggested by evolutionary trees based on complete 16S rRNA sequences. Mol Biol Evol 12:1166–1173
Nikolaou C, Almirantis Y (2006) Deviations from Chargaff’s second parity rule in organellar DNA—insights into the evolution of organellar genomes. Gene 381:34–41
Ochoterena H (2009) Homology in coding and noncoding DNA sequences: a parsimony perspective. Plant Syst Evol (this volume, pp 151–168). doi:10.1007/s00606-008-0095-y
Ogden TH, Rosenberg MS (2006) Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol 55:314–328
Ogihara Y, Yamazaki Y, Murai K, Kanno A, Terachi T, Shiina T, Miyashita N, Nasuda S, Nakamura C, Mori N, Takumi S, Murata M, Futo S, Tsunewaki K (2005) Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucl Acids Res 33:6235–6250
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Chloroplast gene organization deduced from complete sequence of the liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572–574
Olsson S, Buchbender V, Huttunen S, Enroth J, Hedenäs L, Quandt D (2009) Evolution of the Neckeraceae (Bryophyta): resolving the backbone phylogeny and identifying ancestral character states. Syst Biodiv 7 (in press)
Ostersetzer O, Cooke AM, Watkins KP, Barkan A (2005) CRS1, a chloroplast group II intron splicing factor, promotes intron folding through specific interactions with two intron domains. Plant Cell 17:241–255
Oxelman B, Liden M, Berglund D (1997) Chloroplast rps16 intron phylogeny of the tribe Sileneae (Caryophyllaceae). Plant Syst Evol 206:393–410
Pacak A, Szweykowska-Kulinska Z (2000) Molecular data concerning alloploid character and the origin of chloroplast and mitochondrial genomes in the liverwort species Pellia borealis. J Plant Biotechnol 2:101–108
Pagel M, Meade A (2004) A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst Biol 53:571–581
Palmer JD (1983) Chloroplast DNA exists in two orientations. Nature 301:92–93
Palmer JD (1985) Comparative organisation of chloroplast genomes. Annu Rev Genet 19:325–354
Palmer JD (1991) Plastid chromosomes: structure and evolution. In: Bogorad L, Vasil IK (eds) The molecular biology of plastids. Academic Press, San Diego, pp 5–53
Pedersen N, Hedenäs L (2003) Phylogenetic investigations of a well supported clade within the acrocarpous moss family Bryaceae: evidence from seven chloroplast DNA sequences and morphology. Plant Syst Evol 240:115–132
Penny D, Hendy MD (1985) The use of tree comparison metrics. Syst Zool 34:75–82
Perret M, Chautems A, Spichiger R, Kite G, Savolainen V (2003) Systematics and evolution of tribe Sinningieae (Gesneriaceae): evidence from phylogenetic analyses of six plastid DNA regions and nuclear ncpGS1. Am J Bot 90:445–460
Persson C (2000) Phylogeny of Gardenieae (Rubiaceae) based on chloroplast DNA sequences from the rps16 intron and trnL(UAA)-F(GAA) intergenic spacer. Nord J Bot 20:257–270
Pirie MD, Vargas MPB, Botermans M, Bakker FT, Chatrou LW (2007) Ancient paralogy in the cpDNA trnL-F region in Annonaceae: implications for plant molecular systematics. Am J Bot 94:1003–1016
Pombert JF, Lemieux C, Turmel M (2006) The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes. BMC Biol 4:3
Provan J, Soranzo N, Wilson NJ, Goldstein DB, Powell W (1999) A low mutation rate for chloroplast microsatellites. Genetics 153:943–947
Pyle AM, Lambowitz AM (2006) Group II introns: ribozymes that splice RNA and invade DNA. In: Gesteland RF, Cech TR, Atkins JF (eds) The RNA world. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp 449–505
Qin PZ, Pyle AM (1998) The architectural organization and mechanistic function of group II intron structural elements. Curr Opin Struct Biol 8:301–308
Qiu Y-L, Palmer JD (2004) Many independent origins of trans splicing of a plant mitochondrial group II intron. J Mol Evol 59:722–724
Qiu Y-L, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407
Quandt D, Stech M (2004) Molecular evolution and phylogenetic utility of the chloroplast trnT-trnF region in bryophytes. Plant Biol 6:545–554
Quandt D, Stech M (2005) Molecular evolution of the trnL(UAA) intron in bryophytes. Mol Phylogenet Evol 36:429–443
Quandt D, Müller K, Huttunen S (2003) Characterisation of the chloroplast DNA psbT-H region and the influence of dyad symmetrical elements on phylogenetic reconstructions. Plant Biol 5:400–410
Quandt D, Müller K, Stech M, Hilu KW, Frey W, Frahm JP, Borsch T (2004) Molecular evolution of the chloroplast trnL-F region in land plants. Monogr Syst Bot Missouri Bot Gard 98:13–37
Quandt D, Wanke S, Müller K, Hernández-Maqueda R, Stech M, Löhne C, Hilu KW, Borsch T (2006) The role of hairpins in molecular evolution. Abstract no. 738. Presented at “Botany 2006,” BSA international conference, Chico, CA, USA, July 28–August 2, 2006
Rahmanzadeh R, Müller K, Fischer E, Bartels D, Borsch T (2005) Linderniaceae and Gratiolaceae are further lineages distinct from Scrophulariaceae (Lamiales). Plant Biol 7:67–78
Raubeson LA, Jansen RK (1992) Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255:1697–1699
Raubeson LA, Jansen RK (2005) Chloroplast genomes of plants. In: Henry RJ (ed) Plant diversity and evolution: genotypic and phenotypic variation in higher plants. CAB International, Cambridge, pp 45–68
Raubeson LA, Peery R, Chumley TW, Dziubek C, Fourcade HM, Boore JL, Jansen RK (2007) Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genomics 8:174
Reboud X, Zeyl C (1994) Organelle inheritance in plants. Heredity 72:132–140
Renner SS (1999) Circumscription and phylogeny of the Laurales: evidence from molecular and morphological data. Am J Bot 86:1301–1315
Renner SS, Chanderbali AS (2000) What is the relationship among Hernandiaceae, Lauraceae, and Monimiaceae, and why is this question so difficult to answer? Int J Plant Sci 161:S109–S119
Richardson JE, Fay MF, Cronk QCB, Bowman D, Chase MW (2000) A phylogenetic analysis of Rhamnaceae using rbcL and trnL-F plastid DNA sequences. Am J Bot 87:1309–1324
Rokas A, Carroll SB (2008) Frequent and widespread parallel evolution of protein sequences. Mol Biol Evol 25:1943–1953
Roper JM, Hansen SK, Wolf PG, Karol KG, Mandoli DF, Everett KDE, Kuehl J, Boore JL (2007) The complete plastid genome sequence of Angiopteris evecta (G. Forst.) Hoffm. (Marattiaceae). Am Fern J 97:95–106
Sakai M, Kanazawa A, Fujii A, Thseng FS, Abe J, Shimamoto Y (2003) Phylogenetic relationships of the chloroplast genomes in the genus Glycine inferred from four intergenic spacer sequences. Plant Syst Evol 239:29–54
Sakai A, Takano H, Kuroiwa T (2004) Organelle nuclei in higher plants: structure, composition, function, and evolution. Int Rev Cytol 238:59–118
Sánchez del-Pino I, Borsch T, Motley TJ (2009) trnL-F and rpl16 sequence data and dense taxon sampling reveal monophyly of unilocular anthered Gomphrenoideae (Amaranthaceae), and an improved picture of their internal relationships. Syst Bot 34:57–67
Sang T, Crawford D, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). Am J Bot 84:1120–1136
Sass C, Little DP, Stevenson DW, Specht CD (2007) DNA barcoding markers for species identification of Cycads. PLoS ONE 2(11):e1154. doi:10.1371/journal.pone.0001154
Sauquet H, Doyle JA, Scharaschkin T, Borsch T, Hilu KW, Chatrou LW, Le Thomas A (2003) Phylogenetic analysis of Magnoliales and Myristicaceae based on multiple data sets: implications for character evolution. Bot J Linn Soc 142:125–186
Savolainen V, Spichiger R, Manen JF (1997) Polyphyletism of Celastrales deduced from a chloroplast noncoding DNA region. Mol Phylogenet Evol 7:145–157
Scheen A-C, Brochmann C, Brysting AK, Elven R, Morris A, Soltis DE, Soltis PS, Albert VA (2004) Northern hemisphere biogeography of Cerastium (Caryophyllaceae): insights from phylogenetic analysis of noncoding plastid nucleotide sequences. Am J Bot 91:943–952
Schmickl R, Kiefer C, Dobeš C, Koch MA (2009) Evolution of trnF(GAA) pseudogenes in cruciferous plants. Plant Syst Evol (this volume, pp 229–240). doi:10.1007/s00606-008-0030-2
Seberg O, Petersen G (2009) How many loci does it take to DNA barcode a crocus? PLoS ONE 4(2):e4598. doi:10.1371/journal.pone.0004598
Shaver JM, Oldenburg DJ, Bendich AJ (2006) Changes in chloroplast DNA during development in tobacco, Medicago truncatula, pea, and maize. Planta 224:72–82
Shaw J, Lickey E, Beck JT, Farmer SB, Liu W, Miller J, Siripun KC, Winder CT, Schilling EE, Small RL (2005) The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. Am J Bot 92:142–166
Shaw J, Lickey EB, Schilling EE, Small RL (2007) Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am J Bot 94:275–288
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T, Yamada K, Kusuda J, Takaiwa F, Kato A, Tohdoh N, Shimada H, Sugiura M (1986) The complete nucleotide sequence of the tobacco chloroplast genome—its gene organization and expression. EMBO J 5:2043–2049
Simmons MP, Ochoterena H, Carr TG (2001) Incorporation, relative homoplasy, and effect of gap characters in sequence-based phylogenetic analyses. Syst Biol 50:454–462
Simmons MP, Zhang LB, Webb CT, Reeves A (2006) How can third codon positions outperform first and second codon positions in phylogenetic inference? An empirical example from the seed plants. Syst Biol 55:245–258
Small RL, Ryburn JA, Cronn RC, Seelanan T, Wendel JF (1998) The tortoise and the hare: choosing between noncoding plastome and nuclear Adh sequences for phylogenetic reconstruction in a recently diverged plant group. Am J Bot 85:1301–1315
Small RL, Lickey EB, Shaw J, Hauk WD (2005) Amplification of noncoding chloroplast DNA for phylogenetic studies in lycophytes and monilophytes with a comparative example of relative phylogenetic utility from Ophioglossaceae. Mol Phylogenet Evol 36:509–522
Soejima A, Wen J (2006) Phylogenetic analysis of the grape family (Vitaceae) based on three chloroplast markers. Am J Bot 93:278–287
Soltis DE, Soltis PS (1998) Choosing an approach and appropriate gene for phylogenetic analysis. In: Soltis DE, Soltis PS, Doyle JJ (eds) Molecular systematics of plants II. DNA sequencing. Kluwer, London, pp 1–42
Sotiaux A, Enroth J, Olsson S, Quandt D, Vanderpoorten A (2009) When morphology and molecules tell us different stories: a case-in-point with Leptodon corsicus, a new and unique endemic moss species from Corsica. J Bryol (in press)
Stech M, Quandt D (2006) Molecular evolution and phylogenetic utility of the chloroplast atpB-rbcL spacer in bryophytes. In: Sharma AK, Sharma A (eds) Plant genome: biodiversity and evolution, vol 2B. Science Publishers, Enfield, pp 409–431
Stech M, Quandt D, Frey W (2003) Molecular evolution of the chloroplast DNA trnL-trnF region in the hornworts (Anthocerotophyta) and its phylogenetic implications. J Plant Res 116:389–398
Štorchová H, Olson MS (2004) Comparison between mitochondrial and chloroplast DNA variation in the native range of Silene vulgaris. Mol Ecol 13:2909–2910
Štorchová H, Olson MS (2007) The architecture of the chloroplast psbA-trnH noncoding region in angiosperms. Plant Syst Evol 268:235–256
Su Y-J, Wang T, Zheng B, Jiang Y, Ouyang P-Y, Chen G-P (2005) Genetic variation and phylogeographical patterns in Alsophila podophylla from southern China based on cpDNA atpB-rbcL sequence data. Am Fern J 95:68–97
Sugiura M, Hirose T, Sugita M (1998) Evolution and mechanisms of translation in chloroplasts. Annu Rev Genet 32:437–459
Sugiura C, Kobayashi Y, Aoki S, Sugita C, Sugita M (2003) Complete chloroplast DNA sequence of the moss Physcomitrella patens: evidence for the loss and relocation of rpoA from the chloroplast to the nucleus. Nucl Acids Res 31:5324–5331
Swofford DL (2001) PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland
Swofford DL, Olsen GJ, Waddell PJ, Hillis DM (1996) Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK (eds) Molecular systematics. Sinauer Associates, Sunderland, pp 407–514
Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal primers for amplification of 3 noncoding regions of chloroplast DNA. Plant Mol Biol 17:1105–1109
Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, Vermat T, Corthier G, Brochmann C, Willerslev E (2007) Power and limitations of the chloroplast trnL(UAA) intron for plant DNA barcoding. Nucl Acids Res 35:e14
Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564–577
Tan B, Liu K, Yue XL, Liu F, Chen JM, Wang QF (2008) Chloroplast DNA variation and phylogeographic patterns in the Chinese endemic marsh herb Sagittaria potamogetifolia. Aquat Bot 89:372–378
Tautz D, Trick M, Dover GA (1986) Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656
Tesfaye K, Borsch T, Govers K, Bekele E (2007) Characterization of Coffea chloroplast microsatellites and evidence for the recent divergence of C. arabica and C. eugenioides chloroplast genomes. Genome 50:1112–1129
Teshima KM, Innan H (2008) Neofunctionalization of duplicated genes under the pressure of gene conversion. Genetics 178:1385–1398
Testolin R, Cipriani G (1997) Paternal inheritance of chloroplast DNA and maternal inheritance of mitochondrial DNA in the genus Actinidia. Theor Appl Genet 94:897–903
Thorne JL, Kishino H, Felsenstein J (1992) Inching towards reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16
Timme TE, Kuehl JV, Boore JL, Jansen RK (2007) A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: identification of divergent regions and categorization of shared repeats. Am J Bot 94:301–312
Toor N, Hausner G, Zimmerly S (2001) Coevolution of group II intron RNA structures with their intron-encoded reverse transcriptases. RNA 7:1142–1152
Tsudzuki J, Nakashima K, Tsudzuki T, Hiratsuka J, Shibata M, Wakasugi T, Sugiura M (1992) Chloroplast DNA of black pine retains a residual inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK, psbA, trnI and trnH and the absence of rps16. Mol Gen Genet 232:206–214
Tsuji S, Ueda K, Nishiyama T, Hasebe M, Yoshikawa S, Konagaya A, Nishiuchi T, Yamaguchi K (2007) The chloroplast genome from a lycophyte (microphyllophyte), Selaginella uncinata, has a unique inversion, transpositions and many gene losses. J Plant Res 120:281–290
Turmel M, Otis C, Lemieux C (2002) The chloroplast and mitochondrial genome sequences of the charophyte Chaetosphaeridium globosum: insights into the timing of the events that restructured organelle DNAs within the green algal lineage that led to land plants. Proc Natl Acad Sci USA 99:11275–11280
Turmel M, Otis C, Lemieux C (2006) The chloroplast genome sequence of Chara vulgaris sheds new light into the closest green algal relatives of land plants. Mol Biol Evol 23:1324–1338
Van Ham RCHJ, ’t Hart H, Mes THM, Sandbrink JM (1994) Molecular evolution of noncoding regions of the chloroplast genome in the Crassulaceae and related species. Curr Genet 25:558–566
Vanderpoorten A, Long DG (2006) Budding speciation and neotropical origin of the Azorean endemic liverwort, Leptoscyphus azoricus. Mol Phylogenet Evol 40:73–83
Verbruggen H, Theriot E (2008) Building trees of algae: some advances in phylogenetic and evolutionary analysis. Eur J Phycol 43:229–252
Vijverberg K, Bachmann K (1999) Molecular evolution of a tandemly repeated trnF(GAA) gene in the chloroplast genome of Microseris (Asteraceae) and the use of structural mutations in phylogenetic analysis. Mol Biol Evol 16:1329–1340
Vogel J, Börner T, Hess W (1999) Comparative analysis of splicing of the complete set of chloroplast group II introns in three higher plant mutants. Nucl Acids Res 27:3866–3874
Wakasugi T, Tsudzuki J, Ito S, Nakashima K, Tsudzuki T, Sugiura M (1994) Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Natl Acad Sci USA 91:9794–9798
Wakasugi T, Nishikawa A, Yamada K, Sugiura M (1998) Complete nucleotide sequence of the plastid genome from a fern, Psilotum nudum. Endocytobiosis Cell Res 13(Suppl):147
Wanke S, Quandt D, Neinhuis C (2006) Universal primers for a large cryptically simple cpDNA microsatellite region in Aristolochia. Mol Ecol Notes 6:1051–1053
Wanke S, Jaramillo MA, Borsch T, Samain MS, Quandt D, Neinhuis C (2007) Evolution of the Piperales—matK and trnK intron sequence data reveals a lineage specific resolution contrast. Mol Phylogenet Evol 42:477–497
Watanabe K, Kajita T, Murata J (2006) Chloroplast DNA variation and geographical structure of the Aristolochia kaempferi group (Aristolochiaceae). Am J Bot 93:442–453
Weising K, Gardner RC (1999) A set of universal PCR primers for the analysis of simple sequence repeat polymorphisms in chloroplast genomes of dicotyledonous angiosperms. Genome 42:9–19
Westhoff P, Herrmann RG (1988) Complex RNA maturation in the chloroplast: the psbB operon from spinach. Eur J Biochem 171:551–564
Wicke S, Quandt D (2009) Universal primers for amplification of the trnK/matK region in land plants. Anales Jard Bot Madrid (in press)
Wickett NJ, Zhang Y, Hansen SK, Roper JM, Kuehl JV, Plock SA, Wolf PG, DePamphilis CW, Boore JL, Goffinet B (2008) Functional gene losses occur with minimal size reduction in the plastid genome of the parasitic liverwort Aneura mirabilis. Mol Biol Evol 25:393–401
Wissemann V, Ritz CM (2005) The genus Rosa (Rosoideae, Rosaceae) revisited: molecular analysis of nrITS-1 and atpB-rbcL intergenic spacer (IGS) versus conventional taxonomy. Bot J Linn Soc 147:275–290
Wolf PG, Rowe CA, Sinclair RB, Hasebe M (2003) Complete nucleotide sequence of the chloroplast genome from a leptosporangiate fern, Adiantum capillus-veneris L. DNA Res 10:59–65
Wolf PG, Karol KG, Mandoli DF, Kuehl J, Arumuganathan K, Ellis MW, Mishler BD, Kelch DG, Olmstead RG, Boore JL (2005) The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae). Gene 350:117–128
Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 84:9054–9058
Wolfe KH, Morden CW, Palmer JD (1992) Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc Natl Acad Sci USA 89:10648–10652
Won H, Renner SS (2005) The chloroplast trnT-trnF region in the seed plant lineage Gnetales. J Mol Evol 61:425–436
Worberg A, Quandt D, Barniske A-M, Löhne C, Hilu KW, Borsch T (2007) Phylogeny of basal eudicots: insights from noncoding and rapidly evolving DNA. Org Divers Evol 7:55–77
Worberg A, Alford MH, Quandt D, Borsch T (2009) Huerteales sister to Brassicales plus Malvales, and newly circumscribed to include Dipentodon, Gerrardina, Huertea, Perrottetia, and Tapiscia. Taxon 58:468–478
Wu CS, Wang YN, Liu SM, Chaw SM (2007) Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol 24:1366–1379
Wu CH, Lai YT, Lin CP, Wang YN, Chaw SM (2009) Evolution of reduced and compact chloroplast genomes (cpDNAs) in gnetophytes: selection toward a lower-cost strategy. Mol Phylogenet Evol. doi:10.1016/j.ympev.2008.12.026
Xu MQ, Kathe SD, Goodrich-Blair H, Nierzwicki-Bauer SA, Shub DA (1990) Bacterial origin of a chloroplast intron: conserved self-splicing group-I introns in cyanobacteria. Science 250:1566–1570
Xu DH, Sakai AJ, Kanazawa A, Shimamoto A, Shimamoto Y (2000) Sequence variation of noncoding regions of chloroplast DNA of soybean and related wild species and its implications for the evolution of different chloroplast haplotypes. Theor Appl Genet 101:724–732
Yamaguchi K, von Knoblauch K, Subramanian AR (2000) The plastid ribosomal proteins: identification of all the proteins in the 30 S subunit of an organelle ribosome (chloroplast). J Biol Chem 275:28455–28465
Yang F-S, Wang X-Q (2007) Extensive length variation in the cpDNA trnT-trnF region of hemiparasitic Pedicularis and its phylogenetic implications. Plant Syst Evol 264:251–264
Young ND, dePamphilis CW (2000) Purifying selection detected in the plastid gene matK and flanking ribozyme regions within a group II intron of nonphotosynthetic plants. Mol Biol Evol 17:1933–1941
Zerges W (2000) Translation in chloroplasts. Biochimie 82:583–601
Acknowledgments
This work was supported by grants from the Deutsche Forschungsgemeinschaft to T.B. (BO1815/2) and D.Q. (QU153/2) for the project “Mutational dynamics of noncoding genomic regions and their potential for reconstructing eudicot evolution.” Karsten Salomo (Dresden) provided unpublished data for Fig. 5, and Kim Govers (Berlin) helped prepare Fig. 1. We appreciate the various kinds of support given by Susi Wicke (Vienna/Bonn) and Markus Ackermann (Berlin). We gratefully acknowledge helpful comments from Scot Kelchner and David Morrison on an earlier version of this manuscript.
Additional information
An erratum to this article can be found at http://dx.doi.org/10.1007/s00606-009-0233-1
Cite this article
Borsch, T., Quandt, D. Mutational dynamics and phylogenetic utility of noncoding chloroplast DNA. Plant Syst Evol 282, 169–199 (2009). https://doi.org/10.1007/s00606-009-0210-8
DOI: https://doi.org/10.1007/s00606-009-0210-8