Introduction

Chromosomal DNA in the eukaryotic nucleus is comprised of linear molecules with specific end sequences (telomeres) that protect against degradation by exonucleases and the potential loss of DNA at the 5′ end with each round of replication (Watson 1972). Whereas the telomerase mechanism is employed in the eukaryotic nucleus, alternative strategies are used to protect and maintain the ends of linear chromosomes for viruses, bacteria, and mitochondria, including covalently bound proteins, hairpin structures, and terminal inverted repeats (Chaconas and Kobryn 2010; Nosek et al. 2006; Smith and Keeling 2013). The conceptual “ends problem” does not apply to circular DNA forms, such as bacterial plasmids and mammalian mitochondrial DNA (mtDNA) (Pohjoismaki and Goffart 2011). Until recently, a circular form was also assumed for plastid DNA (ptDNA), and the linear forms that were observed were dismissed as artifacts from breakage of circular molecules (Bendich 2004). Our objective was to determine the sequences at the ends of ptDNA molecules for two reasons. First, this information may be used to elucidate the replication, inheritance, and maintenance (or lack thereof) of ptDNA throughout leaf and chloroplast development (Bendich 2004, 2007; Oldenburg et al. 2014). And second, the form of chromosomal DNA molecules within plastids may influence the design of plastid transformation vectors (Oldenburg and Bendich 2009) so as to improve plastid transformation in cereals.

Plastid genomes typically vary from 115–165 kb and in most, but not all, green plants there are four defined regions: long single copy (LSC), short single copy (SSC) and two inverted repeats (IRa and IRb) (Ravi et al. 2008). The convention for sequenced plastid genomes is to define the first nucleotide (nt = 1) as the beginning of the LSC or its equivalent in genomes without IRs. The ptDNA origins of replication (oris) have been mapped for several plant species and two closely spaced oris are often found. For maize ptDNA three origins had been identified: oriA and oriB by sequence similarity to Oenothera ptDNA; and the third, designated here as oriC, by similarity to Chlamydomonas reinhardtii (Heinhorst and Cannon 1993; Kunnimalaiyaan and Nielsen 1997; Oldenburg and Bendich 2004a). The size of the plastid genome in maize, Zea mays L., is 140,384 bp (Maier et al. 1995), although the extracted ptDNA is found in various structural forms, including branched-linear multigenomic molecules, unit-genome-sized linear isomers and head-to-tail concatemers, and less-than-genome-sized fragments (Oldenburg and Bendich 2004a). The relative abundance of these forms changes during development and depending on growth conditions: branched-linear and monomer ptDNA molecules are prominent at the base of the stalk in green maize seedlings, whereas the ptDNA from a green leaf blade is mostly or entirely degraded to subgenomic fragments (Oldenburg and Bendich 2004b; Oldenburg et al. 2006). In dark-grown maize the genomic monomer, dimer, and higher concatemers are found (Oldenburg et al. 2006), and these forms are abundant in green leaves for tobacco, Nicotiana tabacum L. (Lilly et al. 2001), barrel medic, Medicago truncatula Gaertn. (Shaver et al. 2006), and other dicots (Deng et al. 1989; Rowan et al. 2010).

We previously identified discrete termini (ends) for maize ptDNA by restriction enzyme digestion and blot hybridization (Oldenburg and Bendich 2004a). This method showed that there are two major genomic isomers with ends located at nucleotides (nts) 88,000 and 100,000, with a standard deviation of ±5000 bp, and three minor end isomers; the approximate location of only one of these (nt 78,000) was determined (Oldenburg and Bendich 2009). Using the same methods, ends have also been identified for ptDNA of tobacco (Scharff and Koop 2006, 2007), and barrel medic, a species without an IR (Shaver et al. 2008). This method, however, only provided the approximate terminal locations within the plastid genome and no information about the terminal structure.

We have now determined the precise location for two terminal regions of maize ptDNA by direct sequencing. We find that the ends are near oris and may form stem-loop structures. Cumulative GC skew analysis may be used to identify ends and oris in ptDNA from other species. The structure and mechanism of replication for maize ptDNA molecules are discussed with reference to the linear DNA molecules of herpes simplex virus (HSV).

Materials and methods

Seeds of Zea mays L. inbred line B73 and commercial hybrid Mycogen 2722 were soaked in tap water overnight, sown in soil (Sunshine mix #4) and grown in controlled temperature chambers at 24 °C with 16/8 h light/dark cycles under fluorescent lights or in continuous darkness. Juvenile maize seedlings (8–12 days old) were harvested and washed for ~3 min in 0.5 % sarkosyl (w/v), then rinsed exhaustively in water to minimize microbial contamination. The entire seedling or the first and second leaf blades (L1 and L2) were used for plastid and ptDNA preparations (Oldenburg and Bendich 2004a; Shaver et al. 2006). For plastid isolation, we used high-salt buffer [HSB; 1.25 M NaCl, 40 mM HEPES pH 7.6, 2 mM EDTA, 0.1 % BSA (w/v)] and dilution buffer [DB; 0.33 M D-sorbitol, 20 mM HEPES pH 7.6, 2 mM EDTA, 1 mM MgCl2, 0.1 % BSA (w/v)]. In the initial grinding step, 0.1 % β-mercaptoethanol (v/v) was added to HSB, and the tissue was homogenized in HSB (plus 0.5–1 mL of antifoam) using a Waring blender with 3 × 5-s bursts. The homogenate was filtered through 1–3 layers of Miracloth and plastids were pelleted by centrifugation at 10,000×g for 5 min. The plastids were then washed once each with HSB and DB. The plastids were layered onto a cushion of 80 % Percoll in DB and centrifuged at 14,000×g for 10 min. Any nuclei and starch remaining after differential centrifugation pellet to the bottom of the microfuge tube. The intact plastids, which remain at the top of the Percoll layer, were recovered using a wide-bore pipet and washed 2 or 3 times with DB. The final plastid pellet was resuspended in a small volume of DB.

Plastids were embedded in 0.7 % low-melt agarose (w/v) and lysed overnight at 48 °C in 40 mM EDTA pH 8, 1 % sarkosyl (w/v), 200 µg/mL proteinase K. Phenylmethylsulfonyl fluoride (PMSF) was used to inactivate the proteinase K. In-gel ptDNA samples were washed extensively in TE (10 mM Tris pH 8, 1 mM EDTA) and stored at 4 °C in TE. In-gel tobacco ptDNA was prepared by the same procedure, using the leaves of greenhouse-grown plants. Alternative lysis conditions were used for in-gel ptDNAs intended for exonuclease digestion (given in Table 3), and the digestions were performed as described for liverwort mtDNA (Oldenburg and Bendich 2001). Note that the in-gel ptDNAs used for end-cloning (Fig. 1) were prepared using 40 mM EDTA, rather than the more commonly used 470 mM EDTA. We found that lysis conditions using either high EDTA or low EDTA plus 1 M NaCl led to increased DNA migration from the well (see Online Resource 3). However, these high-salt concentrations can also induce the formation of G-quartet structures at the ends of double-stranded DNA (Bochman et al. 2012; Sharma 2011) that would probably interfere with the end-cloning process. In fact, in silico analysis using QGRS Mapper showed that the End1 (but not End2) sequence could form G-quartet structures (Kikin et al. 2006). For this reason, we avoided high ionic strength in the buffer used to lyse the plastids in-gel.

Fig. 1

Pulsed-field gel electrophoresis of maize and tobacco ptDNAs. Maize and tobacco ptDNAs were prepared by in-gel lysis of plastids and fractionated by pulsed-field gel electrophoresis (PFGE). Prior to PFGE, the in-gel ptDNA was treated with end repair enzymes, followed by ligation to EcoRV-digested pBluescript. After PFGE, the well-bound and monomer fractions were excised (corresponding to Steps 1–4 in Fig. 2). Lane 1 Tobacco ptDNA from mature leaves of greenhouse-grown plants. The red arrows indicate the position of the 155-kb genomic monomer and dimer bands. Lane 2 Maize ptDNA from 11-day seedlings grown in the dark. Most of the ptDNA is found as multigenomic branched-linear molecules and is retained in the well-bound fraction. Lanes 3 and 5 Maize ptDNA from seedlings grown for 7 days in light followed by 1 day in dark. The blue arrow indicates the position of the 140-kb genomic monomer band, but most of the ptDNA is found as a smear of less-than-genome-size fragments. The original gel images are presented for Lanes 1–4, and in Lane 3 the maize ptDNA monomer band is very faint and may not be obvious in this gel image, but was easily discernible by eye. Therefore, to enhance visualization of the maize ptDNA monomer band, Adobe Photoshop was used to uniformly adjust the brightness and contrast for the entire image (Lanes 5, 6). Lanes 1–4 are from the same gel, so that the DNA size standards shown in Lane 4 apply to both the left and middle gel panels. Lanes 4 and 6 DNA size standards: lambda DNA concatemers. Compression zone cz, ethidium bromide EtBr.
There are four main regions recognized following fractionation of ptDNA by PFGE: the well-bound fraction contains complex DNA molecules unable to migrate through the gel matrix; supercoiled circular molecules would migrate to a region between the well and the cz (this region is “forbidden” to linear molecules); the cz fraction is comprised of linear molecules that are not size-separated; and in the region below the cz, linear molecules are size-separated where the maximum separated length of DNA depends on the electrophoresis conditions (Bendich and Smith 1990; Bendich 1996). For ptDNAs the linear region typically shows one or more bands representing an oligomeric series of unit-genome linear molecules (monomer, dimer, etc) and a smear of less than genome-sized linear molecules (Bendich and Smith 1990; Deng et al. 1989; Oldenburg and Bendich 2004b; Oldenburg et al. 2006). Imaging of EtBr-stained maize ptDNA (DNA movies) from the well-bound fraction reveals that most molecules consist of branched-linear forms with <4 % in open circular forms (Oldenburg and Bendich 2004a, b). The complex forms are also found for ptDNAs from tobacco and other plants (Bendich and Smith 1990; Rowan et al. 2004; Shaver et al. 2006). Note that, as is typical for PFGE fractionation of ptDNAs, for the maize and tobacco preparations shown here, there are no bands of ptDNA in the expected migration zone for the supercoiled circular form (Circular molecules region). The faint band below the cz in Lanes 1, 3, and 5 is probably the religated relaxed-circular form of the pBluescript plasmid (~3 kb)

The ptDNA ends were ligated to a linearized plasmid, either prior to or after fractionation of in-gel preparations of intact maize ptDNA molecules by pulsed-field gel electrophoresis (PFGE). The in-gel ptDNA was treated with end repair mixture (T4 DNA polymerase, Klenow fragment, and dNTPs only, or with the addition of T4 polynucleotide kinase and ATP) and then ligated to a blunt-end linearized plasmid (pBluescriptIIKS+ digested with EcoRV). The in-gel plasmid-ligated ptDNA was fractionated by PFGE, and the well-bound fraction (multigenomic, branched-linear form) and linear monomer fraction were excised from the gel. The next step was to digest the plasmid-ligated ptDNA with a restriction enzyme (BamHI or SacI) to give compatible ends in the plasmid polylinker and the ptDNA, followed by ligation of the compatible ends. The plasmid-ptDNA ligation mixture was used for Escherichia coli transformation followed by blue/white colony screening, selection of white colonies and isolation of plasmids. Plasmid preparations were assessed for the ptDNA End insert by restriction enzyme digestion and agarose gel electrophoresis. Sequencing of the ptDNA End insert from both directions was performed using universal primers [M13(-20) forward and M13 reverse] (GeneWiz, Seattle, Washington, USA). This end-cloning process was performed using ptDNA from several different maize ptDNA preparations and on both branched-linear (well-bound) and linear monomer molecules. The terminal sequences were identified by alignment to the maize plastid genome sequence [GenBank: NC_001666] using MacVector.

The similarity to the maize End sequences was assessed using the Align-to-Reference and ClustalW features of MacVector for the plastid genomes given in Fig. 4 and in Figs. S1–S4 and Tables S1–S4 of Online Resource 1. For Align-to-Reference the settings were: residue scoring match = 2, mismatch = −3, ambiguous match = 0, gap penalty = 4 and alignment parameters hash value = 4, sensitivity = 4, score threshold = 50, X dropoff = 15. For ClustalW the settings were: gap penalty = 5, window = 4, open gap penalty = 10, extend gap penalty = 5, delay divergent = 40 %, and weighted transitions. The Align-to-Reference tool, used to compare the cloned-end sequencing reads to the maize plastid genome reference sequence, provides directional information about the ptDNA End insert, indicating that all five End1 inserts were from IRb, whereas one of the End2 inserts was from IRb and the other two were from IRa.
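As an illustration of how the residue-scoring settings listed above translate into an alignment score and a percent identity, the following minimal sketch scores two pre-aligned sequences. It is not MacVector's algorithm: the function name `score_alignment` is hypothetical, the gap penalty is assumed to be charged once per gapped column, and percent identity is assumed to be computed over all aligned columns.

```python
def score_alignment(a, b, match=2, mismatch=-3, ambiguous=0, gap=4):
    """Score two pre-aligned sequences of equal length ('-' marks a gap).

    Assumptions (not taken from MacVector): the gap penalty is applied per
    gapped column, 'N' is the only ambiguity code, and percent identity is
    the number of identical columns divided by the alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("alignment is empty")
    score = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            score -= gap            # gapped column
        elif x == "N" or y == "N":
            score += ambiguous      # ambiguous match scores 0 by default
        elif x == y:
            score += match
            identical += 1
        else:
            score += mismatch
    return score, 100.0 * identical / len(a)


score, identity = score_alignment("ACGT-A", "ACCTTA")
print(score, round(identity, 1))
```

Different programs use different denominators for percent identity (for example, excluding gapped columns), so values from this sketch need not match those reported in Online Resource 1.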

Fig. 2

Process used to clone and sequence ends of maize ptDNA. All ptDNA preparation and modification steps were in-gel to avoid shearing of the large molecules (unit-genome-size monomer and branched multigenomic forms). Step 1 digest plasmid pBluescriptIIKS+ with EcoRV to give blunt ends, treat with alkaline phosphatase (to reduce plasmid religation), and heat-inactivate enzymes. Polylinker is shown in red. The red arrows indicate the location and direction for the primer binding sites, M13 (-20) forward [M13F] and M13 reverse [M13R]. S SacI, N NotI, B BamHI, E EcoRV, K KpnI. Step 2 treat in-gel ptDNA with end repair mix. The dark and light gray arrows indicate two isomers with different ends. Step 3 ligation using T4 DNA ligase to join linearized pBluescript to ends of ptDNA. For clarity, the genome dimers are not depicted. Step 4 PFGE fractionation and excision of well-bound fraction and monomer band (Fig. 1). This step also removes excess plasmid not ligated to the ptDNA. Step 5 in-gel restriction digestion to create compatible ends between pBluescript and ptDNA, followed by proteinase K (PK) treatment and PMSF to inactivate PK. Digestion was performed with BamHI and the number of BamHI sites in the maize plastid genome is 64 (represented by vertical lines). Step 6 T4 DNA ligase used to create pBluescript-ptDNA End plasmids. Step 7 transformation of E. coli with the pBluescript-ptDNA End ligation reaction, followed by blue/white selection, plasmid isolation from single colonies, and sequencing. Restriction digestion of the plasmids was also performed to show the presence or absence of the ptDNA End insert. The ptDNA inserts were sequenced using both M13F and M13R. The sequence adjacent to the M13R primer corresponds to the ptDNA End. The ptDNA and plasmid are not drawn to scale; the size of the plasmid (~3 kb) is exaggerated relative to the 140-kb maize plastid genome size

The cumulative GC skew (Grigoriev 1998) profiles were generated with window size 500 and step size 50, using http://gcat.davidson.edu/dgpb/gc_skew/. Both the GC for each 500-nt window [(G−C)/(G+C)], and the cumulative GC [sum of (G−C)/(G+C)] are shown in each skew diagram. The GC skew profiles are shown in Fig. 5 for maize and in Online Resource 2 for other species. The secondary stem-loop structures were produced using Mfold for DNA (Zuker 2003), http://www.bioinfo.rpi.edu/applications/Mfold. The predicted secondary structures and the folding energies (dG values) are shown in Online Resource 4.
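The windowed and cumulative GC skew described above can be sketched as follows. This is a minimal illustration rather than the code behind the web tool cited above; the function name `gc_skew_windows` is hypothetical, windows with no G or C are assumed to contribute a skew of zero, and the genome is treated as a linear string without wrap-around.

```python
def gc_skew_windows(seq, window=500, step=50):
    """Return window start positions (1-based), per-window GC skew
    (G - C) / (G + C), and the running (cumulative) sum of those skews."""
    seq = seq.upper()
    positions, skews, cumulative = [], [], []
    running = 0.0
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Assumption: a window without G or C contributes zero skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running += skew
        positions.append(start + 1)
        skews.append(skew)
        cumulative.append(running)
    return positions, skews, cumulative


# Toy sequence: a G-only block followed by a C-only block.
pos, skew, cum = gc_skew_windows("G" * 500 + "C" * 500, window=500, step=250)
print(list(zip(pos, skew, cum)))
```

Local minima and maxima of the cumulative list are the features that are read off the skew diagrams.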

GenBank accession nos. for plastid genomes: Arabidopsis thaliana L. NC_000932; Chlamydomonas reinhardtii P.A.Dang. BK000554; Euglena gracilis NC_001603; Marchantia polymorpha L. (liverwort) X04465; Medicago truncatula Gaertn. (barrel medic) NC_003119; Mesostigma viride Lauterborn. NC_002186; Nicotiana tabacum L. (tobacco) NC_001879; Oryza nivara S.D.Sharma and Shastry. (wild rice) AP006728; Oryza sativa L. indica (Indica rice) AY522329; Oryza sativa L. japonica (Japonica rice) X15901; Pisum sativum L. (pea) NC_014057; Spinacia oleracea L. (spinach) NC_002202; Triticum aestivum L. (wheat) NC_002762; Zea mays L. (maize) NC_001666. GenBank accession no. for viral genome of Herpes simplex Z86099.

Results and discussion

Total ptDNA of maize is found in three major molecular forms (multigenomic branched complexes, linear unit-genome-sized monomers, and linear fragments of subgenomic size) that can be separated by pulsed-field gel electrophoresis (PFGE) (Fig. 1). We previously reported that both the complex forms (well-bound fraction) and monomers have the same defined ends as determined by restriction digestion and PFGE (Oldenburg and Bendich 2004a), whereas the subgenomic fragments end at any point along the 140-kb genome sequence. Furthermore, simple, linear molecules comprise >90 % of all ptDNA molecules from light-grown maize seedlings, most of which are subgenomic fragments (Lanes 3 and 5 in Fig. 1 and references Oldenburg and Bendich 2004a, b). Therefore, in a preparation of total ptDNA, the number of moles of random ends greatly outnumbers the moles of defined ends found in the complex and monomer molecules. To ensure recovery and identification of the terminal regions, we used an end-cloning procedure.

In-gel prepared total maize ptDNA was blunt-end ligated to a linearized plasmid followed by PFGE fractionation and excision of the multigenomic complex (well-bound ptDNA) and genomic monomer fractions (Fig. 1). Following this end-cloning process (Fig. 2), nine plasmids with ptDNA inserts were recovered and sequenced (Table 1). Two major terminal regions, End1 and End2, were identified by alignment to the maize plastid genome. An additional minor terminal region was also indicated, but needs further characterization.

Table 1 Summary of results from plasmid sequencing

The ptDNA end-cloning process outlined in Fig. 2 may not be very efficient as indicated by the low number (9) of plasmids with end sequences compared to the total (85) analyzed by restriction digestion. Two factors may contribute to this inefficiency. (1) It is necessary to perform all of the end-cloning steps in-gel. The ptDNA end ligation (Step 3 in Fig. 2) requires diffusion of the EcoRV-linearized pBluescript plasmid into the gel to encounter a ptDNA end. Although excess non-end-ligated plasmid would be removed in the next step (PFGE fractionation), some plasmid could be re-ligated around the gel matrix and become trapped in the well-bound fraction. (2) The end-ligation step was performed using total in-gel ptDNA, leading to competition between plasmid ligation to the defined ends of the monomer and branched complex forms and ligation to the more numerous random ends of the degraded, subgenomic ptDNA fragments. Nonetheless, the ptDNA end-cloning process yielded eight plasmids containing the End1 and End2 inserts and one plasmid with an End3 insert (Table 1).

The locations of both End1 and End2 are within the IRs, leading to four genomic isomers (Fig. 3). The position of End1 was determined as nt 94,973 in IRb and 127,764 in IRa, and is located in the intergenic region between the trnV and 16S rRNA genes on the maize map, and in fact aligns with the 16S rRNA promoter. In addition, End1 is located 5′-upstream of the origin of replication oriB2 in IRb. The position of End2 was determined as nt 93,859 in IRb and 128,877 in IRa; it is located in the intergenic region between the rps12 and orf85 genes and is 3′-downstream of oriA2 in IRb. Previously, restriction fragment and blot hybridization results suggested that maize ptDNA consists of a population of molecular isomers with five different terminal regions, and mapping predicted two major isomers with ends at approximately 88,000 and 100,000 (map d and map c, respectively, Fig. 1 in reference Oldenburg and Bendich 2004a). In addition, the amount of each isomer appeared to vary in relative abundance based on the ethidium-DNA fluorescence intensity of the different restriction fragments. To determine whether the ends sequenced here correspond to map c or map d, an in silico analysis of the predicted fragments and hybridization patterns for End1 and End2 was performed (Table 2). For both End1 and End2, the smaller AscI fragment would be expected to hybridize to the rbcL and 3′rps12/rps7 probes, therefore correlating with map c (for map d, the smaller fragment hybridizes only to rbcL). This finding suggests that the End1 and End2 AscI fragments would co-migrate to the same position following PFGE fractionation and likely accounts for the higher ethidium-DNA fluorescence intensity found with the map c fragments. Although the end-cloning and sequencing process resulted in identification of the End1 and End2 isoforms, it is likely there are other end isomers (such as corresponding to map d) that were not found. 
Surprisingly, we were unable to clone the ends of tobacco ptDNA, despite starting with a much larger amount of the tobacco genomic monomer ptDNA (Lane 1 in Fig. 1). As described below, the different terminal structures for maize and tobacco ptDNAs probably account for our inability to ligate the linearized plasmid to the termini of tobacco ptDNA.
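The in silico prediction of terminal AscI fragments mentioned above can be illustrated with a minimal sketch that locates recognition sites in a linear sequence and lists the resulting fragment lengths. The helper names are hypothetical, fragment boundaries are approximated at the first base of each recognition site rather than at the exact cleavage position, and the probe-hybridization step of the analysis is not modeled. AscI recognizes the palindromic site GGCGCGCC, so searching one strand is sufficient.

```python
def site_starts(seq, site):
    """Return 0-based start indices of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    hits = []
    i = seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)   # allow overlapping occurrences
    return hits


def linear_fragments(seq, site):
    """Fragment lengths for a linear molecule, splitting at each site start.

    Simplification: the true cut position inside the recognition site is
    ignored, so lengths are approximate by a few nucleotides.
    """
    bounds = [0] + site_starts(seq, site) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


ASCI = "GGCGCGCC"
toy = "AAAA" + ASCI + "TTTT" + ASCI + "AA"   # toy sequence, not ptDNA
frags = linear_fragments(toy, ASCI)
print(frags, "terminal fragments:", frags[0], frags[-1])
```

For a linear isomer, the first and last elements of the list are the terminal fragments whose sizes depend on where the molecule ends.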

Fig. 3

Maps of the maize plastid genome showing the terminal regions. The positions for the two sequenced ends, End1 and End2, and the origins of replication, oriA and oriB (yellow circles) are shown for the four genomic isomers of maize ptDNA. The long single copy (LSC) region is shown in dark blue, short single copy (SSC) in light blue, and the inverted repeats: IRb in dark green and IRa in light green. The scale at the bottom represents more than one-genome length of the 140-kb maize plastid DNA sequence. The vertical black arrow marks the position 140,384/1 between IRa and LSC, which is the ending/beginning given for the maize plastid genome reference sequence [GenBank: NC_001666]. The terminal positions are also indicated: nt 94,973 in IRb for End1-Isomer1; nt 127,764 in IRa for End1-Isomer2; nt 93,859 in IRb for End2-Isomer1; and nt 128,877 in IRa for End2-Isomer2. Scale bar is 10,000 nts

Table 2 Restriction fragments and blot hybridization for maize ptDNA

Sequence comparisons for the End1 and End2 regions were performed to assess similarity to other plastid genomes: four cereals, five dicots, one bryophyte, and two algal species. For each end, two sequences of 120 nts were used for comparison and these correspond to the regions located 5′-upstream and 3′-downstream of nt 94,973 for End1 and nt 93,859 for End2. These sequences are shown in Fig. 4, along with rice and wheat ptDNA alignments. Sequence alignments and similarities for all the species comparisons are shown in Online Resource 1. For the cereals, a high degree of similarity (≥97 % identity) was found with three of the maize plastid end sequences. Less similarity (51–57 %) was found with the fourth sequence (End2-5′IRb), although the terminal ~90 nts were highly similar. Three of the dicots (Arabidopsis, spinach, and tobacco) showed high sequence identity to the End1 regions (80–93 %) and End2-3′IRb (83–94 %), but again low similarity was found with End2-5′IRb (27–46 %). Slightly lower similarity was found in the two legumes (pea and barrel medic) for End1 (61–77 %) and End2-3′IRb (68–70 %). The lower similarity to End1-5′IRb is not surprising since a divergence in the 16S rRNA promoter sequence has been described for some legumes relative to other higher plants (Suzuki et al. 2003). For End2-5′IRb, 46–47 % similarity was found with pea and barrel medic. For liverwort, similarity was greatest with End1-3′IRb (74 %), with lower similarity (40–53 %) for the other three regions. For the algae, similarity was only 38–56 % for all four sequences. These results suggest that the termini of linear molecules are highly conserved among cereals. Although similarity to the maize End sequences was found for the five dicots, it is unclear from sequence comparisons alone whether these truly represent terminal regions.
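The coordinates of the 120-nt comparison windows follow directly from each terminal position. The sketch below reproduces the IRb ranges listed for the maize reference sequence in the Fig. 4 legend; the function name `end_flanks` and the dictionary keys are hypothetical labels, and the assignment of the "5′" label to the window that starts at the End nucleotide simply follows the naming used in that legend.

```python
def end_flanks(end_nt, size=120):
    """Return 1-based inclusive coordinate ranges of the two windows
    flanking a terminal position in the reference sequence.

    "5prime" starts at the End nucleotide itself; "3prime" is the window
    immediately preceding it (labels follow the Fig. 4 naming).
    """
    return {
        "5prime": (end_nt, end_nt + size - 1),
        "3prime": (end_nt - size, end_nt - 1),
    }


for name, nt in (("End1", 94973), ("End2", 93859)):
    print(name, end_flanks(nt))
```

With 1-based inclusive coordinates, a Python slice for a window `(start, stop)` would be `genome[start - 1:stop]`.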

Fig. 4

Sequences for End1 and End2 regions of maize ptDNA and comparisons to rice and wheat ptDNAs. Maize ptDNA terminal 120-nt sequences and alignments to predicted ends for rice (Oryza sativa japonica) and wheat (Triticum aestivum): a End1-5′IRb, nt 94,973–95,092; b End1-3′IRb, nt 94,853–94,972; c End2-5′IRb, nt 93,859–93,978; d End2-3′IRb, nt 93,739–93,858. The nt (nucleotide) values correspond to maize plastid genome [GenBank: NC_001666]

DNA strand asymmetry exists when the leading strand of DNA replication contains more G than C as compared to the lagging strand. Such asymmetry has been analyzed using GC skew algorithms to provide information about DNA replication mechanisms for bacterial and organellar genomes (Gerhold et al. 2014; Grigoriev 1998; Xia 2012). We therefore generated GC skew plots for the plastid genomes of maize (Fig. 5a, b shows IRb) and other plant species and HSV DNA, previously considered as a model for ptDNA replication (Oldenburg and Bendich 2004a) (see Online Resource 2).

Fig. 5

GC skew plots for the maize plastid genome. a Maize plastid genome analyzed using GC skew (Grigoriev 1998). Dark green line along abscissa is IRb; light green is IRa. b Expanded view for IRb. Red line is GC skew for each 500-nt window; blue line is cumulative GC skew. Positions indicated by: black arrows (End1, End2) and yellow arrows (oriA, oriB, oriC). OriA and oriB are analogous to origins mapped in Oenothera species and oriC in Chlamydomonas reinhardtii

Cumulative GC skew plots were originally used to predict ori and termination (ter) sites for circular bacterial chromosomes, showing the ori at a single minimum, the ter at a single maximum, and assuming bidirectional replication (Grigoriev 1998; Xia 2012). Subsequently, more complex GC plots (with multiple peaks and valleys) were found for archaebacterial chromosomes where multiple origins were identified (Xia 2012). In addition, for the linear mitochondrial genome of the yeast Candida parapsilosis, the GC skew profile showed local minima near both ends, suggesting the ends function as oris for a recombination-dependent replication (RDR) mechanism (Gerhold et al. 2014). The GC skew plots for the plastid genomes of the protist Euglena gracilis (Morton 1999) (see Online Resource 2) and two green algal species (Belanger et al. 2006; Brouard et al. 2011) conform to the patterns found for E. coli, Bacillus subtilis and other circular-mapping bacterial genomes. Thus, a similar profile would be expected if maize ptDNA existed as a circular chromosome and employed a D-loop-to-bidirectional replication mechanism (theta or Cairns model) as initially proposed (Kolodner and Tewari 1975). The maize ptDNA GC plot, however, is complex with multiple minima and maxima, and the oris are at slight inflection points (bumps) with the two Ends at GC minima between oriA and oriB. The GC skew profiles for ptDNAs of rice and wheat are very similar to that of maize, with maize-like Ends at local minima between the two origins (see Online Resource 2). The same pattern is also found with the GC skew profiles for tobacco and barrel medic (see Online Resource 2). These findings are inconsistent with the Cairns model and instead support the RDR mechanism proposed for linear ptDNA molecules (Bendich 2004; Marechal and Brisson 2010; Oldenburg and Bendich 2009).
Furthermore, the GC plot for HSV DNA is also complex (see Online Resource 2), and this viral genome has other features similar to maize ptDNA: a linear molecule ~150-kb in size with single-copy regions flanked by inverted repeats, three oris, and the ends are found in the IR regions (Oldenburg and Bendich 2004a; Weller and Sawitzke 2014). A specific type of RDR mechanism, single-strand annealing (SSA), was proposed for HSV DNA replication (Weller and Sawitzke 2014). SSA may initiate replication, produce concatemers, and generate branched-linear multigenomic forms—all features shared by HSV and ptDNA. We propose that ptDNA replication involves activation of an internal ori within a linear genomic monomer and strand-invasion of an end to generate a replication fork (Bendich 2004; Oldenburg and Bendich 2009).

The terminal regions were assessed for sequence or structural properties that could be important for ptDNA function and integrity. The susceptibility of linear DNAs to exonuclease digestion can be used to infer whether the ends are “open” or “blocked”, such as with a 5′-protein or hairpin loop. Resistance to exonuclease digestion may be affected by the methods used for preparing in-gel DNA, but this property can also yield information about the terminal structure (Oldenburg and Bendich 2001). Maize ptDNA was prepared in-gel using four different lysis conditions: low or high EDTA and with or without proteinase K before treatment with λ exonuclease or Exonuclease III and fractionation by PFGE (see Online Resource 3). Maize ptDNA was digested by both exonucleases under all four lysis conditions (Table 3), indicating that the ends are open. In contrast, for ptDNA from tobacco, exonuclease sensitivity was dependent on the lysis conditions, suggesting a protein at the 5′-end and an open 3′-end, as previously proposed for liverwort mitochondrial DNA (Oldenburg and Bendich 2001). The amount of ptDNA first increases and then decreases during leaf development, although the magnitude of the decline is greater for maize than tobacco (Oldenburg et al. 2014; Shaver et al. 2006). The greater stability and integrity of tobacco ptDNA in mature leaf tissue may, in part, be due to protection at the ends of the linear molecules, whereas the unprotected maize ptDNA would be more susceptible to degradation in vivo. This blockage at the ends of tobacco ptDNA would also account for our inability to clone and sequence these ends using the same methods as used here for maize ptDNA.

Table 3 Exonuclease digestion of maize and tobacco ptDNA and liverwort mtDNA

Secondary DNA structures, including stem-loop and cruciform, have been associated with DNA replication by recruitment of replication proteins and activation of oris (Boulikas 1996; Pearson et al. 1996), with HSV DNA as a relevant example (Muylaert et al. 2011; Weller and Sawitzke 2014). To determine whether the ends of maize ptDNA exhibit such structural motifs, we assessed the potential to form secondary structures within a single strand of DNA at the terminal regions using Mfold for DNA (Zuker 2003) (see Online Resource 4). For End1-5′IRb, four stem-loops were predicted, with the ‘−35’ and ‘−10’ elements of the 16S rRNA promoter (Suzuki et al. 2003) located at the base of two of the stem-loops. The complex structure of End1-3′IRb corresponds to that expected for folding of the trnV gene. Three small stem-loops were predicted for End2-5′IRb, and for End2-3′IRb one large and three small stem-loops were predicted. Similar patterns of stem-loops were also predicted for the analogous end sequences in ptDNA of rice and wheat (see Online Resource 4). Although the types of predicted stem-loop structures differed between the maize ptDNA 120-nt sequences of End1 and End2, such terminal structures may function in ori activation and/or RDR. The propensity for stem-loop formation by short IR sequences, close proximity to rRNA genes, and AT-richness are all features associated with DNA replication origins in viruses, bacteria, and the eukaryotic nucleus (Aslani et al. 2000; Boulikas 1996; Pearson et al. 1996), and these features are also found in plastid genomes (Heinhorst and Cannon 1993). Interestingly, maize End1-5′IRb is adjacent to the 16S rRNA gene, and End2-5′IRb is AT-rich (A+T = 82 %, whereas it is ~50 % for the other three 120-nt end sequences). For HSV, the UL9 origin-binding protein (OBP) association with the stem-loop promotes opening of its AT-rich segment to allow access for other DNA replication proteins (Aslani et al. 2000; Muylaert et al. 2011; Weller and Coen 2012). One of these, ICP8, is a single-strand binding protein (SSBP) that has been implicated in the SSA replication process that generates branched, multigenomic molecules (Weller and Sawitzke 2014). Proteins involved in ptDNA (and plant mtDNA) replication and recombination include TWINKLE helicase, SSBPs WHY and RPA, and an OBP (Deng et al. 2015; Lassen et al. 2011; Majeran et al. 2012; Marechal and Brisson 2010; Moriyama and Sato 2014; Sakaguchi et al. 2009). A replication mechanism analogous to that of HSV may operate for maize ptDNA, whereby an OBP interacts with an internal or end-located ori to facilitate DNA unwinding and binding of SSBP to promote SSA replication (Oldenburg and Bendich 2015).
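The A+T percentages quoted above for the 120-nt end sequences are simple base-composition values. A minimal sketch of the calculation is shown below; the function name `at_percent` is hypothetical, and ambiguous bases are assumed to be excluded from both numerator and denominator.

```python
def at_percent(seq):
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    at = seq.count("A") + seq.count("T")
    return 100.0 * at / counted


# Toy sequence, not an actual ptDNA end: 8 of 10 bases are A or T.
print(at_percent("AATTAATTGC"))
```

Applied to a 120-nt window extracted from the reference sequence, the function returns the A+T value for that window.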

Conclusions

For years the plastid chromosome was modeled as a unit-genome-sized circle and proposed to replicate by D-loop-to-theta and rolling-circle mechanisms. Similarly, a circular replication model was initially suggested for the linear genome of HSV, although its branched-linear replicative forms are more likely the product of the recently proposed single-strand-annealing RDR mechanism (Weller and Sawitzke 2014). For maize ptDNA, a linear structure is now established by sequencing and characterization of two terminal regions, and an end-mediated recombination-dependent mode of ptDNA replication is proposed.

Nevertheless, our understanding of ptDNA replication and maintenance is still minimal (Marechal and Brisson 2010; Oldenburg and Bendich 2015) and progress toward both objectives could be facilitated by focusing on the ends of ptDNA molecules, end-binding proteins (Majeran et al. 2012), and the roles of recombination and strand-invasion in ptDNA replication. The introduction of DNA into the plastid genome has led to agronomic improvement for some crop plants, although plastid transformation for cereals has not yet been successful (Clarke et al. 2011; Hanson et al. 2013; Maliga and Bock 2011). Designing transformation vectors with maize ptDNA terminal regions may increase the chances for success (Bendich and Oldenburg 2013; Oldenburg and Bendich 2009).