Introduction

Chloroplasts are organelles of prokaryotic origin located in the cytoplasm of most green plants and algae (Margulis 1970). These organelles have several features that make them uniquely useful for studies of population and evolutionary genetics. Chloroplasts contain a genome approximately 115–165 kb in size that codes for ~130 genes, comprising about 90 proteins (including proteins essential for photosynthesis), 30 tRNAs, and four rRNAs; the genome is therefore large and complex enough to carry both structural and point mutations (Raubeson and Jansen 2005). Strong purifying selection on photosynthesis-related genes constrains the rate at which nucleotide and structural mutations accumulate, leading to a high degree of conservation of chloroplast sequence content and gene order across, and especially within, taxa. In most angiosperms, chloroplasts are inherited maternally, whereas in gymnosperms inheritance is mostly paternal (Petit et al. 2005). The haploid state and uniparental transmission of chloroplasts give chloroplast genes and genomes an effective population size one-fourth that of a nuclear locus (Birky 1978; Birky et al. 1983), making them more susceptible than nuclear genes and genomes to stochastic processes such as drift and founder events. As a result, chloroplast genes display much higher genetic differentiation than nuclear loci in the same taxa. A review of the comparative organization of organelle and nuclear diversity in plants, covering 152 studies and 144 species, reported an average GST of 0.637 ± 0.002 for maternally inherited (i.e., chloroplast) genes versus 0.184 ± 0.002 for biparentally inherited (nuclear) genes in angiosperms (Petit et al. 2005). 
This contrasting pattern of nuclear and chloroplast differentiation has commonly been exploited to test hypotheses about seed and pollen dispersal, migration and colonization routes, intraspecific differentiation, and interspecific introgression (Petit 2004; Petit et al. 2001, 2004, 2005; Magni et al. 2005; Tovar-Sanchez et al. 2008).
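The GST values quoted above describe how haplotype diversity is partitioned within versus among populations. As a minimal sketch, assuming Nei's classic definition GST = (HT − HS)/HT with populations weighted equally and no sample-size correction (the estimators used in the cited studies may differ), the statistic can be computed from per-population haplotype labels as follows. The function names and toy data are hypothetical.

```python
from collections import Counter


def gene_diversity(freqs):
    """Nei's gene (haplotype) diversity h = 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(populations):
    """GST = (HT - HS) / HT from the haplotype labels sampled in each population.

    populations: list of lists of haplotype labels (one label per individual).
    Populations are weighted equally and no small-sample correction is applied.
    """
    haplotypes = sorted({h for pop in populations for h in pop})
    freqs = []
    for pop in populations:
        counts = Counter(pop)
        freqs.append([counts[h] / len(pop) for h in haplotypes])
    # HS: mean within-population diversity
    h_s = sum(gene_diversity(f) for f in freqs) / len(freqs)
    # HT: diversity of the mean haplotype frequencies across populations
    mean_freqs = [sum(f[i] for f in freqs) / len(freqs) for i in range(len(haplotypes))]
    h_t = gene_diversity(mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical toy data: two populations of four trees each.
print(nei_gst([["A", "A", "A", "B"], ["A", "B", "B", "B"]]))  # 0.25
```

Fixed, non-overlapping haplotypes give GST = 1, and identical frequencies in every population give GST = 0; chloroplast values near 0.64 and nuclear values near 0.18 sit between these extremes.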

Oaks are angiosperm trees of the genus Quercus in the family Fagaceae; the genus comprises over 500 species worldwide. Quercus contains three major groups, sections Lobatae, Protobalanus, and Cerris, all of which evolved at middle latitudes of the Americas (Nixon 1993), but only the Lobatae are unambiguously monophyletic (Manos et al. 1999). The Lobatae, or red oak group, contains about 195 species, as many as 19 of which are sympatric with northern red oak (Quercus rubra L.). The taxonomy of oaks in general, and red oaks in particular, presents a considerable challenge because closely related, interfertile oak species are not well differentiated (Manos et al. 1999; Moran et al. 2012), and even more distantly related congeners may share traits associated with habitat specialization through convergent evolution (Cavender-Bares et al. 2004). For the same reasons, however, oaks are good models for studies of speciation, population genetics, and gene flow: they are primarily outcrossing, often have wide geographical ranges, and hybridize freely with congeners. Although ecophysiological studies indicate that sympatric red oaks may specialize into niches associated with soil moisture or fire regimes, sharing the landscape while maintaining diversity (Cavender-Bares and Holbrook 2001), molecular studies of communities occupied by multiple oak species reveal a complex relationship between hybridization and adaptation (Moran et al. 2012; Lind and Gailing 2013).

Northern red oak is an economically and ecologically important forest tree of North America. It is a dominant hardwood species with a wide native range extending from Nebraska east to the Atlantic coast (60° to 96° W longitude) and from northern Ontario to southern Alabama (32° to 47° N latitude). Northern red oak is adapted to a wide range of mean annual temperature and rainfall. It is often the dominant oak species on lower slopes and north-facing slopes of the eastern deciduous forest (Abrams 2002), and it usually co-occurs with several closely related congeners. Intraspecific variation in adaptive traits such as height growth, phenology, drought resistance, and cold hardiness has been reported along altitudinal, latitudinal, and longitudinal clines across the species' range (Gall and Taft 1973; McGee 1974; Schlarbaum and Bagley 1981; Kriebel 1993). Compared with other oak species, however, northern red oak has remarkably low neutral-marker population differentiation (GST) and chloroplast haplotype diversity (Sork et al. 1993; Hokanson et al. 1993; Magni et al. 2005; Feng et al. 2008). White oaks from the eastern United States had a mean chloroplast GST of 0.87 ± 0.07 (Whittemore and Schaal 1991), while six species of European white oaks exhibited coefficients of chloroplast differentiation ranging from 0.78 in Q. robur to 0.96 in Q. pyrenaica (Petit et al. 2002). Magni et al. (2005) found that cpDNA differentiation in Q. rubra (GST = 0.46) was lower than the mean for 22 European forest trees and shrubs (GST = 0.54; Petit et al. 2005) and the overall mean for angiosperms (GST = 0.76; Dumolin-Lapegue et al. 1999), though Tovar-Sanchez et al. (2008) reported an RST (an analogue of GST) of 0.398 for Quercus crassipes in an introgression zone.

As in most angiosperms, oak chloroplasts are inherited maternally (Dumolin et al. 1995; Romero-Severson et al. 2003). Chloroplast intergenic markers have been used extensively to examine genetic differentiation of European white oak populations (Quercus subgenus Quercus, Petit et al. 2002), to map the northward colonization of white oaks in Europe following the last glacial maximum approximately 21,000–18,000 years ago (Kremer et al. 2010; Petit et al. 2001, 2004), and to elucidate the invasion-by-hybridization model of gene flow in the Quercus robur–Quercus petraea forests of southwestern Europe (Petit 2004).

Q. rubra chloroplasts show strikingly low differentiation among populations and low haplotype diversity. This has been attributed to short post-glacial migration routes and particularly high seed-mediated gene flow (Magni et al. 2005; Birchenko et al. 2009). Whatever the historical causes of low haplotype diversity in red oak, detecting statistically robust population differentiation requires larger amounts of chloroplast DNA sequence than of nuclear DNA sequence, both in the number of individuals sampled and in the number of bases sequenced (Small et al. 1998). Large amounts of sequence have traditionally been generated by cloning and Sanger sequencing or by primer walking (Jansen et al. 2005; Petit and Vendramin 2006), approaches that can be laborious and time-consuming. Recently introduced second-generation sequencing technologies make it possible to acquire entire genomes, or many small genomes, at a fraction of the cost and time of traditional approaches. Multiplex tagging can spread the capacity of high-throughput sequencers across many genomes, striking an acceptable balance among coverage, throughput, and cost (Cronn et al. 2008). Second-generation sequencing platforms have proven useful for sequencing chloroplast genomes (Moore et al. 2006, 2007; Parks et al. 2010; Cronn et al. 2012; Straub et al. 2012), and as a consequence the number of chloroplast genomes in public databases has increased rapidly in recent years (Ravi et al. 2008; Parks et al. 2012).

Given that the intraspecific chloroplast diversity detected in northern red oak is low, more powerful tools are needed to characterize Q. rubra chloroplast diversity accurately. A fully sequenced and annotated northern red oak chloroplast genome would serve as a reference for assembling the chloroplast genomes of other red oak species and for comparative genomics within the Fagaceae. Detection of intra- and interspecific polymorphisms is essential for studying population differentiation, phylogeography, and evolutionary history of North American oak species. Tools for detecting intraspecific variation are also needed for management goals such as monitoring reintroduced populations, tracking wood products, and certifying seed lots and forests (Deguilloux et al. 2003). To this end, we report the sequencing, assembly, and annotation of the chloroplast genome of northern red oak. Our objectives were to 1) annotate and characterize the genome, including the location and description of conserved regions, genes, and pseudogenes, and 2) locate polymorphisms between Quercus species and within northern red oak.

Materials and methods

Branches bearing fresh leaves were collected from 17 trees in the Cherokee National Forest Watauga northern red oak seed orchard in Unicoi County, TN (36.33°N, 82.03°W; Table 1). Branches were stored in the dark at 4 °C for 7–10 days to allow starch to break down. Chloroplast isolation and cpDNA extraction followed the methods of Michaud et al. (1995) for broad-leaved tree species. The method has two major components: isolation of chloroplasts in non-aqueous solutions, which avoids the activity of oxidative and phenolic compounds, and chloroform/isoamyl alcohol extraction of chloroplast DNA. Briefly, young leaves were ground in liquid nitrogen and freeze-dried in vacuo. Dry leaf powder was blended with n-hexane/carbon tetrachloride (CCl4), and the homogenate was filtered through nylon cloth. The filtrate was overlaid with n-hexane/CCl4 and centrifuged. The chloroplast fraction, visible as a dark green band, was extracted, washed twice by pelleting in 12 ml of n-hexane/CCl4 mixtures of ρ = 1.25 and 1.12, successively, pelleted in n-hexane, and desiccated until use. For cpDNA extraction, chloroplast pellets were resuspended in a lysing solution and centrifuged. Supernatant from the lysate was incubated with pronase and CTAB buffer, and the mixture was extracted two or three times with chloroform/isoamyl alcohol. Nucleic acids were precipitated from the aqueous phase of the final extraction with 2-propanol, washed with cold ethanol, pelleted by centrifugation, and stored at −20 °C until use.

Table 1 Identification and origin information for 17 Quercus rubra ramets in the Watauga northern red oak clonal seed orchard (Watauga, TN, USA) that contributed leaf samples for chloroplast isolation

Genome Sequencer 20 (Roche Life Sciences) library construction and sequencing were performed as described in Margulies et al. (2005), with slight modifications as specified by Roche Life Sciences. High-molecular-weight DNA from the chloroplast DNA extraction was sheared by nebulization into a size range of 300–800 bp. DNA fragment ends were repaired and phosphorylated using T4 DNA polymerase and T4 polynucleotide kinase. Adaptor oligonucleotides "A" and "B" supplied with the 454 Life Sciences sequencing reagent kit were ligated to the DNA fragments using T4 DNA ligase. Purified DNA fragments were hybridized to DNA capture beads and amplified by emulsion PCR (emPCR). DNA capture beads containing amplified DNA were deposited onto a 40 × 75 mm PicoTiterPlate fitted with an eight-lane gasket. Four pools of chloroplast DNA were assigned two lanes of a single plate (a quarter-run). Each pool received a unique multiplex identifier (MID) adapter (Roche Life Sciences) so that reads from each pool could be identified in downstream analyses.
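Sorting reads back into pools by their MID adapter is normally handled by the platform software. As a minimal sketch of the underlying idea, assuming the tag sits at the 5' end of each read and allowing at most one mismatch, reads can be binned as below. The tag sequences, pool names, and function names are placeholders, not the adapters actually used.

```python
# Placeholder 10-nt tags standing in for the four MID adapters; substitute the
# sequences actually supplied with the kit (these values are assumptions).
MID_TAGS = {
    "pool1": "ACGAGTGCGT",
    "pool2": "ACGCTCGACA",
    "pool3": "AGACGCACTC",
    "pool4": "AGCACTGTAG",
}


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_pool(read, tags, max_mismatches=1):
    """Return (pool, read_without_tag) for the unique best tag match at the 5' end.

    Returns (None, read) if no tag matches within max_mismatches or if two tags tie.
    """
    hits = []
    for pool, tag in tags.items():
        if len(read) >= len(tag):
            d = hamming(read[: len(tag)], tag)
            if d <= max_mismatches:
                hits.append((d, pool, len(tag)))
    if not hits:
        return None, read
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None, read  # ambiguous: equally good matches to two tags
    _, pool, tag_len = hits[0]
    return pool, read[tag_len:]


def demultiplex(reads, tags, max_mismatches=1):
    """Bin reads by pool; reads without a confident tag match go to 'unassigned'."""
    bins = {pool: [] for pool in tags}
    bins["unassigned"] = []
    for read in reads:
        pool, trimmed = assign_pool(read, tags, max_mismatches)
        bins[pool if pool is not None else "unassigned"].append(trimmed)
    return bins
```

Rejecting ties keeps a sequencing error from silently moving a read into the wrong pool; reads that fall into "unassigned" can be inspected separately.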

Read trimming, read assembly, reference mapping, and sequence analysis were performed in CLC Bio Genomics Workbench (Finlandsgade, Denmark) and the Newbler sequence assembly software. Read trimming, read assembly, and reference mapping used the manufacturer's default settings, except that the similarity threshold for long-read reference mapping was increased from 0.8 to 0.9. A modified four-step assembly method following Cronn et al. (2008) was used to generate the Q. rubra chloroplast consensus sequence. First, reads were assembled de novo into contigs. Second, reads were mapped against the chloroplast genomes of water oak (Quercus nigra L., section Lobatae; Moore et al. 2010) and pedunculate oak (Quercus robur L., section Quercus; Kremer et al. 2012). Third, contigs from the de novo assembly were mapped to the Q. nigra-based consensus sequence. Q. nigra was chosen as the reference because it is more closely related to Q. rubra than is Q. robur; accordingly, more de novo contigs from Q. rubra mapped to the Q. nigra genome than to the Q. robur genome. The new contigs were aligned to the Q. nigra-based consensus sequence, and gaps between contigs were filled with the Q. nigra-based consensus sequence to generate the "pseudoreference". Fourth, a final reference-guided assembly (RGA) was performed using the pseudoreference and the trimmed reads. This iterative method (also used by Parks et al. 2010 and Straub et al. 2012) yields more accurate assemblies than RGA alone when the reference and target are different species (Straub et al. 2012). To the authors' knowledge, this is the first application of this assembly method to 454 short-read data.
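The third step, building the pseudoreference, amounts to splicing target-derived contigs into the reference backbone and letting reference bases fill the gaps between them. A minimal sketch follows, assuming the contigs have already been placed on reference coordinates (the alignment itself was done in CLC Genomics Workbench and by manual inspection) and that placements do not overlap; the function name and toy sequences are hypothetical.

```python
def build_pseudoreference(reference, placed_contigs):
    """Splice de novo contigs into a reference backbone.

    placed_contigs: (start, end, contig_seq) tuples in 0-based, half-open reference
    coordinates; each contig replaces reference[start:end]. Reference bases between
    contigs fill the gaps. A contig shorter than its interval drops the "extra"
    reference bases, mirroring a deletion in the target relative to the reference.
    """
    pieces = []
    cursor = 0
    for start, end, contig in sorted(placed_contigs):
        if start < cursor:
            raise ValueError("placed contigs overlap")
        pieces.append(reference[cursor:start])  # gap filled from the reference
        pieces.append(contig)                   # target-derived sequence
        cursor = end
    pieces.append(reference[cursor:])           # reference tail after the last contig
    return "".join(pieces)


# Toy example (hypothetical sequences): the second contig lacks two reference bases.
ref = "AAAACCCCGGGGTTTT"
print(build_pseudoreference(ref, [(0, 4, "AAAT"), (8, 16, "GGTT")]))  # AAATCCCCGGTT
```

Because each contig replaces a whole reference interval, the same operation covers both gap filling and the removal of "extra" reference sequence where contigs indicate a deletion.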

CLC Genomics Workbench employs the neighborhood quality standard (NQS) model for SNP calling (Altshuler et al. 2000), which incorporates the quality of the central base (the putative SNP) and the quality of bases in a surrounding neighborhood, or window (here, 11 bp). SNP and deletion/insertion polymorphism (DIP) detection was first carried out with the manufacturer's default settings and then repeated with stricter settings: minimum coverage increased from 4 to 6, minimum average quality of surrounding bases increased from 20 to 30, and minimum quality of the central base increased from 30 to 40. The minimum variant frequency for SNP and DIP detection was 35 %; i.e., a SNP or DIP was accepted when the minority base at a polymorphic position was present at a frequency of at least 35 %. Primers were designed by visual examination of each SNP and its flanking sequence using the following guidelines: a length of at least 16 bp; a GC content of 40 %, with no more than 1 % difference in GC content between forward and reverse primers; a melting temperature between 52 °C and 58 °C, with no more than 2 °C difference between forward and reverse primers; and at least 40 bp between the SNP and each primer.
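The NQS filtering described above can be illustrated with a minimal sketch. It is not a reproduction of CLC Genomics Workbench's implementation; it assumes that each read covering a site is checked individually (central base quality, mean quality of the other bases in an 11-bp window), that only passing reads count toward coverage and variant frequency, and that the defaults equal the stricter second-pass thresholds. Function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def nqs_read_passes(quals, pos, window=11, min_central=40, min_neighborhood=30):
    """Neighborhood quality check for one read at one candidate site.

    quals: per-base Phred scores of the read; pos: index of the candidate base.
    The central base must reach min_central and the other bases in the window
    (clipped at read ends) must average at least min_neighborhood.
    """
    half = window // 2
    lo, hi = max(0, pos - half), min(len(quals), pos + half + 1)
    neighbors = [q for i, q in enumerate(quals[lo:hi], start=lo) if i != pos]
    return (quals[pos] >= min_central and len(neighbors) > 0
            and mean(neighbors) >= min_neighborhood)


def call_variant(observations, ref_base, min_coverage=6, min_variant_freq=0.35, **nqs):
    """Return an accepted non-reference base at one site, or None.

    observations: (base, quals, pos) tuples, one per read covering the site.
    Only reads passing the neighborhood check count toward coverage and frequency.
    """
    passing = [b for b, q, p in observations if nqs_read_passes(q, p, **nqs)]
    if len(passing) < min_coverage:
        return None
    for base, n in Counter(passing).most_common():
        if base != ref_base and n / len(passing) >= min_variant_freq:
            return base
    return None


good = [40] * 11
print(call_variant([("T", good, 5)] * 3 + [("A", good, 5)] * 3, ref_base="A"))  # T
```

Passing min_coverage=4, min_central=30, and min_neighborhood=20 reproduces the first, default-threshold pass described in the text.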

Results

The GS 20 quarter-run produced a total of 331,364 reads comprising 115,735,252 bases. MID tagging allowed 16 individuals to be sequenced in four pools; however, three of the four pools had heavy (>95 %) contamination from non-target DNA and were excluded from downstream analysis. The remaining pool of four genotypes produced 159,338 reads comprising 54,304,612 bases, with an average read length of 365.2 bp. Reads were trimmed for length, quality, and adapters using the manufacturer's default settings; after trimming, 156,024 reads with an average length of 348.1 bp remained for de novo assembly and reference mapping. Of the trimmed reads, 54,921 assembled into 10,441 de novo contigs with a mean length of 538 bp, comprising a total of 5,628,135 bases. In total, 19,348 reads comprising 7,065,243 bases mapped to the Q. nigra reference, more reads than mapped to any other reference species. Removing inverted repeat region B (IRB) from the Q. nigra-based reference allowed de novo contigs to be aligned to it. Thirty-nine contigs, with a mean length of 9,412 ± 4,236 bp and an average coverage of 19.2 ± 27.0, aligned to the Q. nigra-based reference. These contigs were merged into two large "supercontigs": the first covered reference bases 1–36,000 and the second bases 36,011–135,233 (i.e., to the end of the reference small single copy [SSC] region). The 11-base gap in the large single copy (LSC) region was filled with the Q. nigra-based reference. No conflicts were found between contigs and the reference sequence within IRA; however, six areas of misalignment or incorrect base calls were found in the SSC and LSC regions, all associated with gaps of >10 bp in the contigs relative to the reference. In all six cases, the "extra" sequence in the Q. nigra-based reference was removed and the de novo contigs were joined, creating the 161,306-bp pseudoreference sequence. 
RGA based on the pseudoreference and the 156,024 trimmed reads from MID pool 4 yielded the Q. rubra chloroplast consensus sequence. The entire pseudoreference sequence was covered, with a minimum coverage of 2× and an average coverage of 43.75× (Table 2, Online Resource 1).
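Coverage statistics such as those in Table 2 (minimum and mean depth, the share of positions below 30×, and the number of zero-coverage areas) can be derived from per-base depths. A minimal sketch, assuming depths are available as one integer per reference position (for example, parsed from a per-base depth export of the mapping; the export format is an assumption), with a hypothetical function name:

```python
def coverage_summary(depths, low_threshold=30):
    """Summarize per-base read depth along an assembly.

    depths: one integer depth per reference position.
    Returns minimum and mean depth, the percentage of positions below
    low_threshold, and the number of contiguous zero-coverage regions.
    """
    zero_regions = 0
    in_zero = False
    for d in depths:
        if d == 0 and not in_zero:
            zero_regions += 1  # start of a new zero-coverage run
        in_zero = d == 0
    n = len(depths)
    return {
        "min": min(depths),
        "mean": sum(depths) / n,
        "pct_below_threshold": 100.0 * sum(d < low_threshold for d in depths) / n,
        "zero_coverage_regions": zero_regions,
    }


# Hypothetical depths for an 8-bp stretch.
print(coverage_summary([0, 0, 5, 40, 40, 0, 50, 60]))
```

Counting contiguous runs rather than individual positions matches how zero-coverage "areas" are described in the text.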

Table 2 Reference mapping coverage statistics for a Quercus nigra-based reference-guided assembly of the Q. rubra chloroplast genome via the Genome Sequencer 20 pyrosequencing platform

The 161,304-bp Q. rubra chloroplast genome consensus sequence (GenBank JX970937) was annotated using DOGMA (Wyman et al. 2004), an online tool for annotating chloroplast genomes. During annotation, each putative coding sequence (hereafter, gene) was visually inspected, start and stop codons were selected manually, and putative exons were examined and joined. Annotation results, confirmed by BLAST searches of Q. rubra chloroplast genes against other Viridiplantae plastid genomes, revealed 11 major functional classes among the 138 genes of the Q. rubra chloroplast: ribosomal RNA (eight genes), transfer RNA (41), photosystem I (five), photosystem II (15), cytochrome b6/f (six), ATP synthase (seven), NADH dehydrogenase (12), ribosomal small and large subunits (13 and 11), RNA polymerase (four), hypothetical open reading frames (eight), and miscellaneous proteins (eight) (Fig. 1, Online Resource 2). The ribosomal small subunit gene rps19 spans the IRB/LSC border, so its duplicate at the IRA/LSC border is truncated and appears to be a pseudogene. The LSC and SSC regions of the Q. rubra chloroplast genome are 91,121 and 20,150 bp long, respectively, and the IR is 25,051 bp long. Nearly 60 % (57.4 %) of the Q. rubra chloroplast genome consists of coding sequence (Table 3).
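Two simple checks can accompany an annotation like this one: the per-class counts should sum to the reported gene total, and a coding percentage such as the 57.4 % in Table 3 can be computed from annotated feature intervals, merging overlaps so that shared bases count once. The sketch below assumes 0-based, half-open intervals; which features count as "coding" (for example, whether introns or RNA genes are included) is not specified in the text and is left to the caller. The intervals in the example are hypothetical.

```python
# Gene counts per functional class as listed in the annotation.
GENE_CLASSES = {
    "ribosomal RNA": 8, "transfer RNA": 41, "photosystem I": 5,
    "photosystem II": 15, "cytochrome b6/f": 6, "ATP synthase": 7,
    "NADH dehydrogenase": 12, "ribosomal small subunit": 13,
    "ribosomal large subunit": 11, "RNA polymerase": 4,
    "hypothetical open reading frames": 8, "miscellaneous proteins": 8,
}
total_genes = sum(GENE_CLASSES.values())


def coding_fraction(genome_length, intervals):
    """Fraction of positions covered by at least one annotated feature.

    intervals: (start, end) pairs, 0-based half-open; overlapping or adjacent
    intervals are merged so that shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


print(total_genes)                                              # 138
print(coding_fraction(100, [(0, 10), (5, 20), (50, 60)]))       # 0.3
```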

Fig. 1

Circular representation of the Quercus rubra chloroplast genome. Black bars in the inner circle denote boundaries between the large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions. Genes are denoted by boxes scaled to size; boxes outside and inside the circle represent genes on the plus and minus strands, respectively. Introns are indicated by asterisks (*) either within the gene or next to the gene name. Dotted lines show the distribution of six single nucleotide polymorphisms (SNPs) among the chloroplast genomes of four Q. rubra individuals, detected via reference-guided assembly of GS 20 chloroplast reads

Table 3 Basic characteristics of the Quercus rubra chloroplast genome

SNPs were detected using standard and high-quality (HQ) settings, and each SNP and DIP was visually examined for accuracy. A total of 737 SNPs (99 HQ) and 214 DIPs were detected between Q. robur and Q. rubra; 23 SNPs (5 HQ) and 163 DIPs between Q. nigra and Q. rubra; and eight SNPs (6 HQ) and 45 DIPs among the genomes of the four sequenced Q. rubra individuals. All polymorphisms were in the LSC and SSC; none were found in the inverted repeat regions (Fig. 2). Of the 23 SNPs detected between Q. nigra and Q. rubra, nine were located within genes. Of the six HQ SNPs detected within Q. rubra, three were located in coding regions and three in noncoding regions; ten of the 45 DIPs were in coding regions. The Q. rubra chloroplast genes rpoC2 and ycf1 showed the most variability. Including mononucleotide repeats, 233 repeat regions of 9 to 108 bp were detected in the chloroplast genome (Table 4).
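Simple sequence repeats of the kind tallied in Table 4 can be located with a short scan. The sketch below is an illustration only, assuming perfect tandem repeats of 1–6-nt motifs and a 9-bp minimum length; the tool actually used for Table 4 is not stated, and longer repeats in that table (up to 108 bp) may be of types this scan does not detect. The function name is hypothetical.

```python
def find_tandem_repeats(seq, min_length=9, max_motif=6):
    """Greedy scan for perfect tandem repeats (microsatellites).

    At each position, every motif length 1..max_motif is extended copy by copy;
    the longest run of at least two copies is kept, with ties going to the
    shorter motif. Runs of at least min_length bp are reported as
    (start, motif, length) and the scan resumes after them.
    """
    seq = seq.upper()
    repeats = []
    i = 0
    while i < len(seq):
        best = None  # (run_length, motif)
        for k in range(1, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end of the sequence
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            run = j - i
            if run >= 2 * k and (best is None or run > best[0]):
                best = (run, motif)
        if best is not None and best[0] >= min_length:
            repeats.append((i, best[1], best[0]))
            i += best[0]
        else:
            i += 1
    return repeats


print(find_tandem_repeats("GGAAAAAAAAAGG"))  # [(2, 'A', 9)]
```

An explicit scan is used instead of a lazy regular expression because a pattern like `([ACGT]{1,6}?)\1+` accepts the first (shortest) motif that repeats and can report "AA" inside an "AAT" repeat.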

Fig. 2

Overview of chloroplast genome single nucleotide polymorphism (SNP) and deletion/insertion polymorphism (DIP) distribution between Quercus rubra and Q. robur (left; 737 SNPs, 214 DIPs), Q. rubra and Q. nigra (center; 23 SNPs, 163 DIPs) and within four Q. rubra individuals (right; six SNPs, 45 DIPs). Polymorphisms are clustered in the large and small single copy regions and notably absent in inverted repeat regions

Table 4 Repeat motifs in the Quercus rubra chloroplast genome

Discussion

The most common method of isolating plant plastid DNA involves sucrose-based chloroplast isolation followed by rolling-circle amplification (RCA). This approach is significantly less expensive than BAC library construction or primer walking (Cronn et al. 2008). However, like our method, RCA-based isolation leads to contamination with non-chloroplast DNA, with approximately 10–40 % of the resulting product being non-target (mostly nuclear) DNA. In Sanger-based sequencing, this contamination penalty must be overcome by increasing sequencing effort, which partially offsets the savings that reduced sequence coverage would otherwise bring.

The same contamination reduces plastid genome coverage in GS 20 sequencing runs, but it does not prevent the recovery of essentially complete, highly accurate plastid genomes, as demonstrated by the chloroplast genome sequencing of Platanus occidentalis and Nandina domestica (Moore et al. 2006). Moore et al. (2006) also noted that, although the RCA product of Platanus contained a higher percentage of non-cpDNA contamination (44 %) than that of Nandina (18 %) and therefore yielded lower average coverage, the Platanus sequence did not have a higher sequencing error rate. Although the contamination rate in our sequencing project was much higher (>80 %), we recovered an essentially complete chloroplast genome at an average coverage of 43× and found high-quality structural and point polymorphisms for further analysis.

In theory, multiplex tags would have allowed us to distribute the capacity of the pyrosequencing platform across 16 genomes, striking a favorable balance among throughput, coverage, and cost and including more individuals in polymorphism discovery. In practice, high levels of non-target (primarily nuclear) DNA contamination limited our sequencing to one pool of four individuals. The method we used to isolate and lyse whole chloroplasts for cpDNA extraction relies on separating organelles in non-aqueous solutions of differing specific gravities. Leaves with low chloroplast yields require solutions of higher specific gravity than high-yielding leaves to obtain a given amount of chloroplasts; however, solutions of higher specific gravity (e.g., the ρ = 1.33 used here) also attract more nuclear DNA (Bowman and Dyer 1982). New, partially expanded leaves would likely have a higher concentration of chloroplasts, although Michaud et al. (1995) noted that cork oak (Quercus suber) leaves have chloroplast concentrations low enough to make this method unsuitable at any stage of leaf growth. Another approach, such as chloroplast primer walking, may be more efficient for sequencing Quercus cpDNA, though primer walking does not guarantee complete coverage of the chloroplast genome (Ferris et al. 1993).

Recently published methods for chloroplast genome sequencing, such as "genome skimming" of shotgun sequencing data (Straub et al. 2012) and targeted enrichment of chloroplast sequence, e.g., by hybridization (Cronn et al. 2012), sidestep the contamination problem: genome skimming relies on the high copy number of plastid DNA in total genomic libraries rather than on organelle isolation, and targeted enrichment removes non-target DNA before sequencing. For example, Straub et al. (2012) reported a mean of 11.4 % cpDNA sequence in genomic libraries of Apocynaceae species. In the present study, 19,348 of 156,024 reads (about 12 %) mapped to the chloroplast reference. Thus, our method did not enrich chloroplast sequence beyond what would likely be present in a genomic library. Even so, despite heavy nuclear DNA contamination of our plastid preparation, the 43× coverage depth we obtained exceeded the 30× coverage recommended for plastome assembly (Straub et al. 2012). Chloroplast genomes are being added to public databases at a rapid rate (Ravi et al. 2008; Straub et al. 2012). As of January 2013, the NCBI organelle genomes database contained 311 complete chloroplast genomes, 19 of them from managed forest tree species (http://www.ncbi.nlm.nih.gov/genomes). Many partially complete chloroplast genomes are also available; for example, Parks et al. (2012) assembled over 75 Pinus chloroplast genomes estimated to be >98 % complete. Because such sequences are housed in the general NCBI nucleotide database, it is difficult to assess accurately how many functionally complete chloroplast genomes are currently available.
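The on-target and coverage figures discussed above follow from simple ratios. A minimal sketch, using the read and base counts reported in the Results and the 161,306-bp pseudoreference length, reproduces the roughly 12 % on-target rate and gives a mean depth of about 43.8×, close to the 43.75× in Table 2. The final projection is hypothetical: it assumes the whole quarter-run had been split evenly across 16 individually tagged trees, all with the same on-target rate as the retained pool. Function names are illustrative.

```python
def on_target_fraction(mapped_reads, total_reads):
    """Share of reads that map to the chloroplast reference."""
    return mapped_reads / total_reads


def mean_coverage(mapped_bases, target_length):
    """Mean depth = bases mapped to the target / target length."""
    return mapped_bases / target_length


def projected_coverage(run_bases, on_target, target_length, n_libraries):
    """Mean depth per library if run output and on-target rate are split evenly."""
    return run_bases * on_target / (target_length * n_libraries)


frac = on_target_fraction(19_348, 156_024)      # about 0.124 (about 12 %)
cov = mean_coverage(7_065_243, 161_306)         # about 43.8x
# Hypothetical: the whole quarter-run split across 16 individually tagged trees.
proj = projected_coverage(115_735_252, frac, 161_306, 16)
print(f"{frac:.3f} {cov:.1f} {proj:.1f}")
```

Under these assumptions each tree would receive only about 5.6×, well short of the 30× recommendation, which illustrates why contamination limits how far multiplexing can be pushed.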

Water oak (Q. nigra) is native to the USA and is a congener of Q. rubra; the two species often co-occur and hybridize freely (Solomon 1983), making Q. nigra the most closely related reference sequence available. Following Cronn et al. (2008), we mapped trimmed GS 20 reads to Q. nigra and then mapped de novo contigs back to the Q. nigra-based consensus to create a pseudoreference sequence. Eleven base positions (36,001–36,011) were filled in using the Q. nigra-based reference, while 87 base positions were deleted by manually joining contigs rather than using reference sequence. Relative to mapping to the Q. nigra reference, mapping to this pseudoreference increased the number of reads used, raised minimum coverage from 0 to 2, and reduced the number of zero-coverage areas from 174 to 0. This approach is preferable to de novo assembly or RGA alone when an exact reference is not available, especially in difficult repeat regions, because comparing de novo contigs with the reference can reveal putative insertions, deletions, and rearrangements that would otherwise be incorrectly assembled in RGA (Parks et al. 2010). In this case, a Q. rubra chloroplast consensus sequence generated by reference mapping alone had 99.4 % sequence similarity to the Q. nigra chloroplast genome. Including de novo contigs in the pseudoreference decreased the similarity between the genomes to 98.8 %, indicating that 0.6 % (~970) more base positions in the final consensus sequence were derived from Q. rubra reads rather than from the reference species. This iterative method therefore represents an improvement over reference mapping alone, particularly when searching for polymorphisms between closely related species.
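The similarity figures above can be checked with a short calculation. The sketch below assumes a simple percent-identity definition over pre-aligned, equal-length sequences that skips gap columns; CLC Genomics Workbench may compute "sequence similarity" differently, so this is one convention among several. It also verifies the arithmetic that a 0.6-point drop in identity across the 161,304-bp consensus corresponds to roughly 970 positions. The function name is hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped; this is one of
    several conventions and is an assumption here.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Worked arithmetic from the text: identity to Q. nigra fell from 99.4 % to 98.8 %
# across the 161,304-bp consensus.
extra_rubra_positions = round((99.4 - 98.8) / 100 * 161_304)
print(extra_rubra_positions)  # 968
```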

Pine chloroplast genomes assembled by Parks et al. (2012) and Cronn et al. (2012) using an Illumina platform and assembly methods similar to those used here were estimated to be 92 % and 98 % complete, respectively. Using the GS 20 platform, Moore et al. (2006) produced assemblies estimated to be >99.6 % complete. Although no resequencing or in-group/out-group analysis is available to quantify the completeness of our assembly, we expect it to fall within the published range of 92–99 % because of the close relationship between the reference species Q. nigra and Q. rubra and the resulting high coverage of the reference genome (only 0.04 % of base positions had less than 30× coverage, and there were no zero-coverage areas). To complete the assembly of the Q. rubra chloroplast genome, the IR boundaries should be sequenced using available universal chloroplast primers.

The northern red oak chloroplast genome consensus sequence is 161,304 bp, the second smallest oak chloroplast genome reported to date (Kremer et al. 2012); only the chloroplast genome of Q. robur (161,295 bp) is smaller. Chloroplast structure and gene order are conserved, as in most plant chloroplasts, with a 25-kb IR region separating the large (91 kb) and small (20 kb) single-copy regions (Palmer 1991; Raubeson and Jansen 2005). The 138 chloroplast genes represent 118 distinct genes (32 tRNA genes, four rRNA genes, and 82 protein-coding genes), compared with an average of 90 for green plants (Ravi et al. 2008), 113 for Nandina and American sycamore (Moore et al. 2006), and 127 for Castanea (Jansen et al. 2010). Unlike many angiosperm taxa, northern red oak appears to contain a functional ycf1 gene; a functional ycf1 gene has also been reported for Prunus persica and Castanea mollissima (Jansen et al. 2010). Four genes have been reported to vary within the Fagaceae: infA, accD, rps18, and petA (Kremer et al. 2012). In Q. rubra, infA appears functional, as in other oak species. The accD gene comprises 542 codons, fewer than the 553 reported for most oaks but more than the same gene in Fagus and Castanea. The rps18 gene reportedly shows a length reduction of about 20 amino acids compared with other Quercus species; we found the rps18 gene to encode 102 amino acids. We found no evidence of the large direct repeat in the petA gene that has been reported for other Fagaceae (Kremer et al. 2012).

In a rangewide study of 290 individuals from 66 natural populations, Magni et al. (2005) used five chloroplast primer/enzyme combinations to evaluate population subdivision in northern red oak. They found 12 haplotypes in total, one of which was present in 75 % of individuals, and estimated population subdivision (GST) at 0.46 ± 0.08. No spatial structure of chloroplast genetic diversity was detected. The estimate of population subdivision was higher (GST = 0.53) for 35 populations in the northwestern part of the native range; however, a series of three studies covering the northwestern range of northern red oak found only five haplotypes (Romero-Severson et al. 2003; Feng et al. 2008). We found a promising number of polymorphisms among Quercus species and within Q. rubra (among four northern red oak individuals), and primers were designed for the Q. rubra intraspecific SNPs for future assays of these and other populations.
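The primer-design guidelines listed in the Materials and methods were applied by visual examination, but they can also be checked programmatically when screening candidate primers for these SNP assays. A minimal sketch follows. It reads the GC rule as a 40 % minimum and interprets the strand comparisons as forward versus reverse primer, both of which are assumptions, and it estimates melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation that may differ from the method actually used. Function names and example primers are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * sum(b in "GC" for b in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (degrees C) by the Wallace rule: 2 per A/T, 4 per G/C.
    Only an approximation for short oligos; nearest-neighbor models are more accurate."""
    seq = seq.upper()
    return 2 * sum(b in "AT" for b in seq) + 4 * sum(b in "GC" for b in seq)


def primer_pair_failures(fwd, rev, fwd_gap, rev_gap, min_len=16, min_gc=40.0,
                         max_gc_diff=1.0, tm_range=(52, 58), max_tm_diff=2, min_gap=40):
    """List the guidelines a primer pair violates (an empty list means all pass).

    fwd_gap / rev_gap: bp between the SNP and the forward / reverse primer.
    """
    failures = []
    if min(len(fwd), len(rev)) < min_len:
        failures.append("length")
    gc_f, gc_r = gc_percent(fwd), gc_percent(rev)
    if min(gc_f, gc_r) < min_gc:
        failures.append("gc_content")
    if abs(gc_f - gc_r) > max_gc_diff:
        failures.append("gc_difference")
    tm_f, tm_r = wallace_tm(fwd), wallace_tm(rev)
    if not all(tm_range[0] <= t <= tm_range[1] for t in (tm_f, tm_r)):
        failures.append("tm")
    if abs(tm_f - tm_r) > max_tm_diff:
        failures.append("tm_difference")
    if min(fwd_gap, rev_gap) < min_gap:
        failures.append("snp_distance")
    return failures


print(primer_pair_failures("AATTGGCCAATTGGCCATAT", "TTAACCGGTTAACCGGTATA", 45, 50))  # []
```

Returning the list of failed rules, rather than a single yes/no, makes it easier to see which guideline to relax when few candidates pass.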

The annotated chloroplast sequence of northern red oak is a valuable genetic tool with interspecific utility, providing new insights into the population and evolutionary genetics of oaks. The Q. rubra chloroplast map provides an ordered list of genes and their sequences that will be valuable as a reference for constructing other Fagaceae chloroplast genomes. It will also make it easier to identify SNPs and to carry out population genetic analyses and studies of interspecific variability. In combination with nuclear markers, these polymorphisms should help clarify the complex interactions among ecophysiology, morphology, and adaptation in oak hybrid zones (Moran et al. 2012). The map can also help elucidate the significance of hybridization and introgression in species evolution, which requires knowing the precise patterns of gene exchange within a species complex. Quantifying cytoplasmic exchange using sequence divergence, rather than the presence or absence of a few haplotypes, will increase the resolution for detecting introgression and thus help determine whether the long-term evolution and management of oak species should be considered independently (Dumolin-Lapegue et al. 1999). Similarly, calculations of chloroplast genetic distance (CGD) are limited in resolution by the low number of RFLP sites that differ between species (Kremer et al. 2012). CGD measures based on sequence divergence should improve the partial correlation coefficients used to describe the relative influences of biogeography and natural selection on population differentiation. Forest fragmentation threatens the genetic diversity and viability of forest communities worldwide, and a warming climate may exacerbate this threat in many areas. Forest managers can combine information on forest area and structure with high-resolution chloroplast sequence data to understand how fragmentation affects oak species and species complexes. 
In Europe, genetic variation in several oak species increased with fragmentation, likely because humans introduced seedlings of varying genotypes (Petit et al. 2002). This relationship may not hold in the range of northern red oak, where the record of human impact is less intense and species diversity is much higher than in Europe (Sauer 1988). Forest managers may also use annotated chloroplast sequences for identification purposes: monitoring reintroduced populations (Worthen et al. 2010a,b), tracking wood products, and certifying oak seed lots and forest stands should all be aided by this resource.