Introduction

Ever since the first physical map of chloroplast DNA (maize 1976) and the elucidation of the first full-length chloroplast nucleotide sequence (Nicotiana 1986), the chloroplast genome has become increasingly valuable for sketching out the evolution of photosynthetic taxa, especially in land plants (Clegg 1993; Chu et al. 2003; Sugiura 2003; Givnish et al. 2010). Studying chloroplast DNA offers many advantages over other genomes for a variety of reasons including small size and high copy number (Raubeson and Jansen 2005). Much of the phylogenetic significance of chloroplast sequence data is based on the finding that all chloroplasts derive from a common ancestor and thus are monophyletic (Bhattacharya and Medlin 1995). The common ancestor is the endosymbiont (Margulis 1981) that evolved into the chloroplast organelle, and its genome has undergone massive gene number reduction from several thousand protein-encoding genes (as seen in the putative modern day ancestral relatives, cyanobacteria) down to between 50 and 250 genes (Melkonian 2001; Martin 2003). It appears that there existed a common subset of genes in all chloroplasts with probable positive selection acting to sustain their usage in modern organisms (Palmer 1985; Barbrook et al. 2006). The conservation of this subset of genes as well as a slow rate of nucleotide substitution relative to either nuclear or mitochondrial genomes are significant because they lend themselves well to evolutionary studies (Wolfe et al. 1987).

Molecular rearrangements in the chloroplast, including insertions, deletions, inversions, and transpositions, are important because their fixation in the genome occurs rarely throughout evolution (Downie and Palmer 1992; Lee et al. 2007). Magee et al. (2010) examined 103 different genomes and found that 27 genes have been lost in at least one lineage. As these events can be marked, the placement of taxa into two groups according to the presence or absence of the rearrangement may define a monophyletic origin. Here, we used loss events of the chloroplast gene acetyl-CoA carboxylase subunit D (accD) as such a genome marker to define a monophyletic lineage. In this paper, accD will always refer to the chloroplast-encoded gene.

The accD gene encodes one of four subunits of the acetyl-CoA carboxylase enzyme of the type found in prokaryotes and in most chloroplasts. Acetyl-CoA carboxylase (ACC) is a rate-limiting, catalytic enzyme leading to the formation of malonyl-CoA from acetyl-CoA, the first committed step in fatty acid synthesis (Schulte et al. 1997; Cronan and Waldrop 2002).

The accD gene has been lost either partially or completely from certain monocots (i.e., some members of the Poales and Acoraceae) (Katayama and Ogihara 1996). A multifunctional, nuclear-encoded ACC enzyme has replaced the plastidic enzyme inside the chloroplasts of these organismal groups (Sasaki and Nagano 2004; Cai et al. 2008). Complete sequencing of the nuclear and plastid genomes of some Poaceae species has provided direct evidence of chloroplast accD loss in this family, some of the most derived of all monocots (Chase et al. 2000).

Though the accD gene appears to have been lost in five distinct lineages in dicots, in other dicot lineages, accD has been shown to be essential (Jansen et al. 2007; Magee et al. 2010). For example, in tobacco (a dicot), knockout studies show that accD is essential for proper leaf development and maintenance of plastid structure (Kode et al. 2005). In Arabidopsis, accD was found to be essential for fatty acid biosynthesis, and a non-functional gene results in embryo lethality (Bryant et al. 2011). The gene is even preserved in the extremely reduced plastid genomes of the non-photosynthetic, parasitic plant Epifagus virginiana (a dicot) (Kode et al. 2005) and of the underground orchid Rhizanthella gardneri (a monocot) (Delannoy et al. 2011), revealing its probable importance to these organisms.

Martin et al. (1998) and Lee et al. (2007) showed that the accD gene was lost independently in at least five chloroplast genomes, Cyanophora (a glaucophyte sister to red and green algae), Odontella (a diatom in the Stramenopiles sister to the Alveolates), Euglena (protozoa Euglenozoa sister to the Heterolobosea), Oleaceae (a eudicot), and Zea (a monocot). However, more systematic sampling is necessary to discern whether each loss event was a single occurrence defining a respective lineage or if accD was lost multiple times in each lineage.

This study sought to uncover the point of loss of the accD gene in the Poaceae through more systematic sampling near the putative loss point. Southern hybridization work by Katayama and Ogihara (1996) supported the point of loss of accD in monocots in the progenitor of Poales and Cyperales (based on familial relationships according to Dahlgren et al. 1985). Their data showed no hybridization of an accD probe in 30 species of Poaceae as well as no signal for many species of monocot families very closely related to Poaceae (Katayama and Ogihara 1996). Based on this study alone, accD gene loss appeared to have occurred somewhere before the divergence of Poales from the Commelinales. Another study by Konishi et al. (1996) produced data conflicting with those of Katayama and Ogihara (1996), showing the presence of accD in families Eriocaulaceae, Cyperaceae, and Juncaceae. Our data are consistent with Konishi et al. (1996) but refine the region of loss to a later point of divergence among the Poales.

Most of the monocot families lying in the probable vicinity of the accD loss have no reported gene sequences or are known only from partial sequence data. Therefore, the study presented here was designed to sample more extensively in this phylogenetic neighborhood to clarify the gene loss point in the Poaceae by using Southern hybridization techniques supported by sequence data analysis. As this region of the chloroplast genome (between ribulose–bisphosphate carboxylase (rbcL) and photosystem I (psaI) sandwiching accD) seems to be prone to mutation, it has been dubbed a “hotspot” (Morton and Clegg 1993; Maier et al. 1995; Ogihara et al. 2002). Taking a more detailed look at this region in members related to Poaceae may help define both loss point(s) and potential rearrangement events.

Materials and Methods

Chloroplast Isolation and DNA Extraction

Species were chosen for DNA extraction to include monocot families thought to border the putative point of loss of the accD gene, and comprised 24 representative species (Electronic Supplementary Materials (ESM), Table 1). Living specimens were obtained for 23 taxa. Data for the taxon Joinvillea were obtained using total DNA extractions donated by Dr. Cliff Morden of University of Hawaii at Manoa and by Dr. Mel Duvall of Northern Illinois University. All living specimens were maintained at the CSUN Botanic Gardens. DNA was processed from 24 monocot species representing 6 orders and 17 families (APG 2009) (ESM, Table 1). Eleven species from ten families were sampled across the Poales, and six species were sampled across the Commelinales including five from the family Commelinaceae, two from each major subfamily and one basal representative from the family Haemodoraceae (Burns et al. 2011).

Between 1 and 12 g of leaf tissue were collected as starting material. Fresh material was processed within 30 min of collecting. Isolation of chloroplasts was performed by modification of the protocol by Nobel (1967) with filtered 0.4 M sucrose–Tris–Cl buffer (pH 7.5). Sterile techniques were utilized to avoid bacterial genome contamination as some bacteria carry an accD gene. After chloroplast isolation, each sample was scanned by light microscopy to verify presence of chloroplasts. This homogenate was filtered and centrifuged to remove whole cells and cellular debris. The remaining suspension of chloroplasts was used in DNA extraction using the Qiagen™ DNeasy Plant Mini Kit. The concentration and purity of extracted DNA were assessed by spectrophotometry.

Polymerase Chain Reaction and Sequencing

Polymerase chain reaction (PCR) amplification primers were designed against conserved portions of the rbcL and accD gene sequences. Primer sequences are shown in ESM Table 2 and priming sites are diagrammed in ESM Fig. 4. The rbcL primers were designed for this study by our lab as a control and yielded a product of ~673 bp. Various primers were designed to amplify not only regions within accD but also regions up- and downstream ranging from rbcL to ycf4. PCR amplifications were carried out using a gradient thermocycler (Eppendorf Mastercycler Gradient). Reaction components were generally as follows: both buffer and Taq polymerase from Promega, 2 mM MgCl2, 0.05–1.0 μg DNA, 0.42–1.7 μM primers, 0.125 mM dNTPs, and 2 % DMSO. The cycler program was 1 cycle for 2 min at 94°C, 45 cycles (30 s at 94°C, 45 s at 47–55°C, 45 s to 4 min at 72°C), and 1 cycle for 7 min at 72°C. PCR products were separated by electrophoresis on 0.8 % TAE agarose.
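As a rough planning aid, the cycler program described above can be written down as data and summed to estimate the programmed hold time per run. This is a minimal sketch, not part of the original protocol: the names `PROGRAM` and `hold_time_seconds` are hypothetical, and ramp times between temperatures are ignored as a simplifying assumption.

```python
# Cycler program from the text: (stage, cycles, [(step, min_seconds, max_seconds), ...]).
# Where the text gives a range (extension of 45 s to 4 min), both ends are kept.
PROGRAM = [
    ("initial denaturation", 1, [("94C", 120, 120)]),
    ("amplification", 45, [
        ("denature 94C", 30, 30),
        ("anneal 47-55C", 45, 45),
        ("extend 72C", 45, 240),
    ]),
    ("final extension", 1, [("72C", 420, 420)]),
]


def hold_time_seconds(program):
    """Return (shortest, longest) total programmed hold time in seconds.

    Assumption: ramping between temperatures is not counted.
    """
    shortest = 0
    longest = 0
    for _stage, cycles, steps in program:
        shortest += cycles * sum(lo for _name, lo, _hi in steps)
        longest += cycles * sum(hi for _name, _lo, hi in steps)
    return shortest, longest


if __name__ == "__main__":
    lo, hi = hold_time_seconds(PROGRAM)
    print(f"programmed hold time: {lo} s to {hi} s")
```

Under these assumptions the shortest extension setting gives 5,940 s (99 min) of hold time and the longest gives 14,715 s, so the choice of extension time dominates run length.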

Products were subcloned using the TOPO TA® Cloning Kit with One Shot® chemically competent Top10 cells (Invitrogen). Plasmids were isolated using Wizard® Plus SV Minipreps DNA Purification System (Promega, no. 1460) and presence of an insert verified by restriction enzyme digestion and gel electrophoresis visualization. Inserts were amplified using M13 primers and sequenced with BigDye™ v.3.1 chemistry on an ABI 377 DNA Sequencer platform located at the CSUN DNA Sequencing Facility. Some PCR products were sequenced directly without the cloning step following gel purification using the MinElute kit (Qiagen, no. 28606).

When internal accD primers failed to yield a product, a larger overlapping region was amplified and product size analysis and/or sequencing was utilized to discern one of three conditions; the presence of the accD gene, the partial presence (i.e., similarity to the truncated rice-like accD pseudogene), or the total absence of the gene.
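The three-way call described above can be summarized as a small decision rule. The sketch below is illustrative only: the function name, its inputs, and the extra `UNDETERMINED` outcome (for cases where no spanning product was obtained at all) are assumptions introduced here, not the procedure actually used in the laboratory.

```python
from enum import Enum


class AccDStatus(Enum):
    PRESENT = "accD gene present"
    PARTIAL = "partial presence (pseudogene-like)"
    ABSENT = "accD totally absent"
    UNDETERMINED = "no spanning product; status cannot be called"


def call_accd_status(spanning_product: bool,
                     accd_homology_bp: int,
                     truncated_or_interrupted: bool) -> AccDStatus:
    """Classify accD status from a larger overlapping amplicon.

    spanning_product: a product covering the accD region was obtained and sequenced.
    accd_homology_bp: length of accD-like sequence found in it (hypothetical input).
    truncated_or_interrupted: the accD-like sequence is truncated or disrupted.
    """
    if not spanning_product:
        return AccDStatus.UNDETERMINED
    if accd_homology_bp == 0:
        return AccDStatus.ABSENT
    if truncated_or_interrupted:
        return AccDStatus.PARTIAL
    return AccDStatus.PRESENT


if __name__ == "__main__":
    # Example: a 415-bp truncated accD-like fragment would be called partial.
    print(call_accd_status(True, 415, True).value)
```

Keeping an explicit undetermined outcome mirrors the conservative choice, described under Sequence Analysis, to code species without any PCR product as missing data rather than as verified deletions.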

Sequence Analysis

Identification of the sequences generated in this study was confirmed by BLAST analysis (Altschul et al. 1990). These new sequences were then aligned with previously sequenced genes downloaded from the NCBI database (ESM, Table 1). Sequences for both genes (accD and rbcL) were concatenated and aligned by ClustalW (Thompson et al. 1994) using the free software BioEdit version 7.0.0 (Hall 1999) and edited by eye. The data matrix consisted of 1,164 nucleotides for 43 species (39 sequences for accD (527 bp + 14 binary) and 43 sequences for rbcL (637 bp)). Of the four missing accD sequences, one was coded as a deletion verified by sequence data and the others were conservatively coded as missing data because no PCR products were obtained in repeated attempts, preventing us from independently verifying the absence of the gene in these species. Gaps were coded as binary data at the end of each sequence block for Bayesian analysis. Two approaches were taken for sequence analysis: maximum likelihood using PAUP* 4.0 (Swofford 2002) and Bayesian inference using MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003; Huelsenbeck and Ronquist 2005). The bootstrapping technique was applied using PAUP* with “fast” stepwise addition for 500 replicates to test node support. For Bayesian analysis, 3 million generations were run and the best gene tree was constructed after discarding the first 25 % of samples as burn-in (Fig. 3).
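The matrix dimensions and the effect of the burn-in can be checked with simple arithmetic. The sketch below is a worked calculation rather than part of the analysis pipeline; it uses the sampling interval of every 100th generation given in the Fig. 3 caption, ignores the initial state of the chain, and the function names are hypothetical.

```python
ACCD_BP = 527          # aligned accD nucleotides
ACCD_GAP_BINARY = 14   # binary gap characters appended to the accD block
RBCL_BP = 637          # aligned rbcL nucleotides

GENERATIONS = 3_000_000
SAMPLE_EVERY = 100     # from the Fig. 3 caption
BURN_IN_FRACTION = 0.25


def matrix_columns():
    """Return (nucleotide columns, total columns including binary gap characters)."""
    nucleotides = ACCD_BP + RBCL_BP
    return nucleotides, nucleotides + ACCD_GAP_BINARY


def mcmc_samples():
    """Return (samples drawn, discarded as burn-in, retained) for one chain.

    Assumption: the starting state is not counted as a sample.
    """
    drawn = GENERATIONS // SAMPLE_EVERY
    discarded = int(drawn * BURN_IN_FRACTION)
    return drawn, discarded, drawn - discarded


if __name__ == "__main__":
    print(matrix_columns())
    print(mcmc_samples())
```

The nucleotide total reproduces the 1,164 columns reported above (527 + 637), with the 14 binary gap characters counted separately, and under these assumptions 22,500 of 30,000 sampled trees remain after burn-in.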

Southern Blotting

Approximately 10-μg template DNA from each of 21 species was digested with either EcoRI or HindIII. Digestions were cleaned and concentrated by ethanol precipitation and then run on a 0.7 % TAE agarose gel. Blotting was done by upward capillary alkaline (NaOH) transfer onto a positively charged nylon membrane (Boehringer Mannheim, no. 1417240) and set by baking 1 h at 99°C.

To produce DNA clean enough to be labeled for use as probes for Southern blotting, PCR products were obtained from amplification of Callisia fragrans (Commelinaceae) DNA using the accDf2′/R primers and amplification of Dichorisandra thyrsiflora (Commelinaceae) DNA using rbcLR1/F1 primers. These products were then subcloned, extracted, and purified by gel extraction with a MinElute gel extraction kit (Qiagen, no. 28606). The purified products were quantified by spectrophotometry. This probe template was labeled with biotin in a one-step process using the LabelIT kit (MIRUS, MIR3400). Hybridization and detection of genomic DNA were performed with the North2South® Chemiluminescent Hybridization and Detection Kit (Pierce, no. 17097). Hybridization time varied between 8 and 12 h. Stringency washes consisted of 0.1 % SDS/ 0.1× SSC and 10–30 % formamide at 65°C. Blot exposure to X-ray film (Kodak X-Omat FS-1) varied from 10 min to 1 h. Blots were first probed with the accD probe then stripped and re-probed with the rbcL probe. Stripping was accomplished by incubation at 37°C with gentle shaking in 0.2 M NaOH/0.1 % SDS solution for 30 min. Blots were stored at 4°C in 2× SSC between hybridizations.

Results

Evidence for the presence/absence of the accD gene was shown by PCR amplification, sequencing and/or by Southern blot hybridization (Fig. 1). AccD in these species lies in a chloroplast segment between the rbcL (ribulose carboxylase large subunit) and the ycf4 gene (an essential gene for photosystem I assembly) (Ozawa et al. 2009). The order of genes in this region of the chloroplast is rbcL, rpl23, ORF133, accD, psaI, and ycf4 (Fig. 2). Variations in this pattern were used in developing a model for accD loss (see “Discussion”).

Fig. 1

Results of Southern blot, PCR and sequence analyses conducted in this study and used to determine accD gene status. Boxes containing minus sign (−) denote no PCR band/no sequence of a PCR product or no Southern blot signal. Boxes containing plus sign (+) denote presence of a PCR product of expected size, product was sequenced or a positive signal in Southern blotting. Boxes containing letter x denote unreadable Southern blot data. Shaded boxes denote tests not performed

Fig. 2

Overview of the accD gene presence in monocot families from Commelinaceae through Poaceae. Gene layout is not to scale. Numbers in parentheses for accD pseudogenes indicate known number of basepairs either published or from this study. Dashed lines indicate uncertain gene presence. The figure assumes the psaI and ycf4 genes are present in all families shown although we did not confirm this in all groups. The presence of a full copy of accD is assumed for Commelinaceae, Juncaceae, Eriocaulaceae, and Xyridaceae based on high identity mid-gene coverage data obtained in the current study. Numbers at nodes represent the ancestral states and are explained in text. The model tree presented to the left represents an amalgam of family relationship understanding from published analyses (Givnish et al. 2010; Morris and Duvall 2010) and the gene loss events portrayed here

Southern Blotting

Southern blot hybridization provided positive/negative scoring of taxa for both the accD and rbcL genes. Hybridization was positive for accD in all species tested except Joinvillea and Bambusa, consistent with our expectation that the gene is absent or truncated in these two groups (see “Polymerase Chain Reaction” section). Furthermore, the rbcL-positive control probe hybridized both of these species. Oryza and Bambusa carry only pseudogenes of accD (Hiratsuka et al. 1989; Wu et al. 2009) that differed in probe specificity. The accD probe was able to bind sufficiently to Oryza but could not detect the Bambusa pseudogene. The rbcL probe produced a much stronger signal in Oryza compared to Bambusa indicating that lack of signal may have been due to low concentration of Bambusa DNA.

Species that gave positive PCR and sequencing results for both genes also showed clear hybridization with the probes with few exceptions (Fig. 1). In these cases, gene sequencing was deemed sufficient to determine presence in these genomes. The Southern data for Chondropetalum, Typha, and Anigozanthos were not reliable and so are not considered here. Typha and Anigozanthos produced PCR products and accD sequences as expected. Chondropetalum, however, produced neither PCR product for accD in repeated amplification attempts (as evaluated by gel electrophoresis) nor sequence data of the gene. PCR with rbcL primers was the only test to give positive results for Chondropetalum indicating the presence of rbcL and that the DNA was reliable.

Polymerase Chain Reaction

Amplification of accD was attempted for all species and the product was successfully subcloned and sequenced for most. Taxa for which exhaustive PCR never resulted in the accD product were tested using rbcL primers to ensure that the DNA template was adequate for amplification. The rbcL primers produced a band of the expected size for all species attempted (Fig. 1). Most rbcL PCR products were not sequenced because the rbcL sequences for those species were available through Genbank. Taxa Baloskion, Chondropetalum (both of family Restionaceae), and Joinvillea (family Joinvilleaceae), which are closely related to Poaceae (APG 2009), never amplified accD in repeated attempts. All species producing an accD PCR product produced readable accD sequence except for Strelitzia nicolai (family Strelitziaceae, order Zingiberales), and there is no reason to suspect that the accD gene does not reside in Strelitzia.

Primers designed to amplify the sequence between rbcL and any of the following genes accD, psaI, or ycf4 worked well in both control species: C. fragrans (family Commelinaceae, contains whole accD gene) and Oryza sativa (family Poaceae, contains truncated, accD pseudogene). By comparison of test products to control products, we were able to distinguish between absent genes, putative pseudogenes, and probable full-length genes. Figure 2 shows a summary map of the rbcL to ycf4 gene region in the monocot families from the Commelinaceae through the Poaceae outlining gene status.

All monocot families outside of the Flagellariaceae possessed full-length accD copies. Although our accDf2′/accDR primers amplified only a part of the entire gene, both Eriocaulon and Xyris likely carry the complete accD gene based on the high level of conservation of the sequence data obtained for each species. Interestingly, while the relative positioning of genes from rbcL to ycf4 is the same as in later lineages, the rpl23 and ORF133 sequences present in many Poaceae are absent here. The gene region between rbcL and accD is known only from Typha (Typhaceae) (Jansen et al. 2007; Guisinger et al. 2010), but this region is assumed to be the same (except in Joinvilleaceae; see below) in this part of the lineage for all the Poales basal to the Poaceae. The rpl23 translocation has been studied by Katayama and Ogihara (1996), who report rpl23 only in the Poaceae; here, we add the novel presence of rpl23 adjacent to rbcL in the Joinvilleaceae.

Flagellaria (Flagellariaceae) carries a fragmented version of accD that is likely a pseudogene. In the sequence between the primers accDf2′ and accDR, Flagellaria contains one insertion of 403 nucleotides (nt) not found in any other species tested so far. In addition, there is an early stop codon in the coding region just before the insertion. Sequences homologous to rpl23 or ORF133 were not found immediately downstream of rbcL in Flagellaria. It should be noted that some sequence upstream of accD was not fully sequenced, but based on the size of the PCR product, it does not seem likely that rpl23 or ORF133 sequences are present.

Repeated attempts to amplify accD from representatives of the Restionaceae (Baloskion and Chondropetalum) produced no PCR products even though the rbcL gene was present and amplified. Southern blot data for these species yielded unusable results. We concluded from these efforts the likely absence of accD in the Restionaceae. While Baloskion and Chondropetalum produced rbcL PCR products and/or sequences, extensive PCR/sequencing covering the region between rbcL and ycf4 did not produce any sequences homologous to rpl23, ORF133, or psaI, a scenario similar to the missing accD. However, like Flagellaria, there remain some unsequenced sections.

In Joinvillea (Joinvilleaceae), complete sequencing across the region between rbcL and ycf4 showed a full-length rpl23 gene but no evidence for either ORF133 or accD, not even a pseudogene. We conclude that accD is completely lost from this group.

The gene region spanning the accD in members of the Poaceae is more complicated. The purported basal subfamily Anomochlooideae (Givnish et al. 2010; Morris and Duvall 2010) in thorough sequence analysis showed the complete absence of rpl23, ORF133, and accD despite the fact that more derived members of the Poaceae possess either complete copies or fragments of these genes.

Sequencing Bambusa (Bambusoideae) DNA across the entire region between rbcL and ycf4 revealed a pseudogene of accD 415 bp in length. In addition, no sequence evidence for rpl23 or ORF133 was detected. Bambusa is part of the well-supported BEP clade (subfamilies Bambusoideae + Ehrhartoideae + Pooideae) (GPWG 2001) which is sister to the PACMAD clade (Panicoideae + Arundinoideae + Chloridoideae + Micrairoideae + Aristidoideae + Danthonioideae) within the Poaceae (originally, PACCMAD, Sanchez-Ken et al. 2007; Duvall et al. 2007; Sanchez-Ken and Clark 2010).

Comparison of published or submitted sequences in the rbcL and ycf4 region from members of the two other BEP subfamilies also showed the presence of truncated accD pseudogenes and variable presence of rpl23 and ORF133 genes. The accD pseudogene from Oryza (Ehrhartoideae) is similar to that in Bambusa but has experienced the loss of an additional 78-nt stretch in the middle of the fragment and a 20-nt stretch at the 3′ end of the fragment (Hiratsuka et al. 1989). Several representative members of the Pooideae (e.g., wheat, rye and barley) show variation in partial accD gene presence, ranging from 0-nt (absent) to a 349-nt pseudogene fragment (Ogihara et al. 2000, 2002; Aagesen et al. 2005; GI accession nos. 164453499 and 118201020). Within the BEP clade, the region upstream of the accD pseudogene region shows a wide range of conditions. Members of the Pooideae appear to have full-length rpl23 and ORF133 sequences whereas Oryza (Ehrhartoideae) bears a rpl23 pseudogene with a full-length ORF133.

Among the known sequences in members of the PACMAD clade, Eragrostis (Chloridoideae) has a known accD pseudogene (Kress and Erickson 2007) while members of the Arundinoideae and Panicoideae show no sign of accD (Chu et al. 2011). The rpl23 gene is present in most PACMAD groups but ORF133 seems to be absent in many.

Phylogenetic Trees

Phylogenetic analyses were conducted on the dataset with accD + rbcL sequences combined using two methods, maximum likelihood and Bayesian analysis, yielding the phylogenetic tree shown in Fig. 3. The gene tree distinguished the outgroups (hornwort Anthoceros and whisk-fern Psilotum), and the clustering of species broadly agreed with the traditional taxonomy of angiosperms, including dicot and monocot groupings. Clade support values among the tree nodes are generally low (below 95 %) with some exceptions as noted. Low support was expected here given the small data set and the specific goal of examining accD relationships in the order Poales. The gymnosperm clade was well supported in both analyses and the magnoliid clade was conserved but not given significant support in either analysis. The separation between monocot and dicot clades was also preserved (Fig. 3). Only the Bayesian posterior probabilities support the delineation between the Liliopsida (monocots) and eudicots with the magnoliids basal to both. The order Poales separated as expected from the order Commelinales with Brahea (order Arecales) as basal to the Poales. The only exception to this was Juncus (a member of the Poales) positioned inside the Commelinales group. Within the Poales, there are a few supported arrangements, notably Joinvillea as basal to the PACMAD representative Eragrostis and the BEP clade (Calamagrostis, Thinopyrum, Oryza, and Bambusa). Our results place Chondropetalum (Restionaceae) outside of Joinvillea but inside Flagellaria, an unusual placement relative to published work (see “Discussion”) (Bremer 2002; Givnish et al. 2010).

Fig. 3

Best tree constructed from dataset accD + rbcL by two phylogenetic estimation methods. Numbers near nodes indicate support (percent of 500 bootstrap replicates by Maximum Likelihood (top number) and posterior probability values based on 3 million generations sampled every 100th generation by Bayesian analysis (bottom number)). The solid line box encloses the dicot clade. The dashed box encloses species associated with accD gene loss (see “Discussion”)

Discussion

Gene Loss

While the ideal method to identify a loss point for accD would be to analyze the whole-genome chloroplast (cp) sequences of all the Poales, lack of whole-genome cp sequence data covering all Poales restricts this study to alternative methods including Southern blot analysis and PCR sequencing to determine loss events among a wide array of monocots. Our Southern blot results were intended as the first line of evidence for gene presence or absence. Negative Southern results using the accD probe were further investigated by PCR and verified by sequencing where possible of the accD gene and/or the region between the genes rbcL and psaI, flanking accD in 13 representatives of the Poales and six related plant orders (Fig. 1).

To our knowledge, there are three reported occurrences of accD pseudogenes, all occurring in the Ehrhartoideae (Oryza, Potamophila and Microlaena) (Nock et al. 2011). Here, we report an accD pseudogene in Bambusa, similar to that in Oryza but longer (Fig. 2). Also, in analyzing the deposited whole-chloroplast genome sequences from the BEP clade of Poaceae, the presence of the Oryza-type accD pseudogene was revealed in the Pooideae, ranging from 0 bp in at least one species of Triticeae (Ogihara et al. 2002) up to 349 bp in Secale of the Triticeae (Aagesen et al. 2005) (Fig. 2). We also report the absence of accD from Restionaceae and Joinvilleaceae and the presence of an accD pseudogene in the Flagellariaceae. The only PACMAD representative known to have a portion of accD is Eragrostis (in subfamily Chloridoideae), which carries a 263-bp pseudogene (Kress and Erickson 2007).

Our data clarify the two published reports for loss points of accD and point to a node of initial loss. Katayama and Ogihara (1996) placed the accD loss point prior to the divergence of the Poales. Konishi et al. (1996) found accD in Eriocaulaceae, Cyperaceae and Juncaceae. Here, we add accD gene presence to the Xyridaceae and partial gene presence in Flagellariaceae. Our new data combined with additional sequences from Genbank deposits have allowed us to propose a model for accD loss at a point in the evolution of Poales past the Eriocaulaceae/Xyridaceae split (see Bremer 2002 and Givnish et al. 2010).

Model of accD Loss

Our data show several important mutational event positions that inform us about accD loss as indicated by node numbers on the model presented on the left in Fig. 2. The common ancestor to the Flagellariaceae and later families in the monocots reveals that a first mutational event corresponding to a loss of function in accD occurred at node one. All known members past this point have no possible plastid accD function even if sequence is present. The next two nodes (2 and 3) lead to (in our representation) the families Restionaceae and Joinvilleaceae. These two groups (along with the Flagellariaceae) are clearly basal to the Poaceae (Bremer 2002; Givnish et al. 2010) but this position also represents a place of potential ambiguity. Our loss model moves the Restionaceae inside of the Flagellariaceae only for the purpose of defining the node of loss (discussed later). None of the sequence matching that of the accD probe was found in Joinvillea (Joinvilleaceae), Chondropetalum, and Baloskion (Restionaceae) by us, meaning that the ancestral pseudogene was lost in these lineages. These data correspond with phylogenetic data that clusters the Joinvilleaceae with the Poaceae, the grass family, and preserves the Graminid clade which also lacks the accD gene (APG 2009; Bremer 2002). Our phylogenetic analyses agree with this too (see below).

If we assume a functional loss of accD by insertion or truncation or both, creating a pseudogene in the common ancestor at node one, the variable presence of a pseudogene in the BEP clade and some members of the PACMAD clade (Chloridoideae) is easily understood, and strongly supports the argument that the common ancestor at nodes 2–8 probably possessed a pseudogene as well. The extent, by nucleotide count, to which the accD gene is present in each of the three subfamilies of BEP implies gradual loss, with the most advanced loss in Pooideae and the least in Bambusoideae, and with the retention of a very truncated version in one subfamily of PACMAD. This correlates with other findings that among the BEP, Pooideae has the closest relationship to the other major subclade of Poales, the PACMAD clade (GPWG 2000), most of which do not carry any recognizable remnants of accD.

This model leads us to propose that the remnant pseudogene originating at node one and present through nodes 2–8 was subsequently lost several independent times: along the lineages for the Restionaceae, Joinvilleaceae, and Anomochlooideae, and in the ancestor at node eight in the PACMAD clade. The alternative hypothesis to our model would necessarily invoke a complete loss in three lineages past node one (perhaps at node two) with a reinsertion of a pseudogene in the ancestor to the BEP clade and another loss event at node eight. While this may appear to be a viable option, we find the mechanism and source of a pseudogene reinsertion problematic. Within the BEP clade and an early member of the PACMAD clade, we can see evidence of both gradual and complete loss of accD following three lineages of absence, which is very possible if all had an ancestral pseudogene. Our proposal here of an initial functional loss in the ancestor at node one followed by a varied pattern of complete and partial additional losses of sequences at later node points seems logical. We know that this region of the plastid genome is a hotspot for mutational activity (Ogihara et al. 1991; Morton and Clegg 1993; Maier et al. 1995), causing us to believe that multiple independent and sometimes confusing loss/rearrangement patterns might exist.

A confusion of independent losses occurs in the BEP/PACMAD clades of the Poaceae because the Ehrhartoideae include both the rpl23 and accD pseudogene sequences each interrupted by missing segments while Bambusoideae (sister group to the Ehrhartoideae) does not contain any of the rpl23 sequence but does contain a non-segmented and lengthier sequence of accD (Fig. 2). Triticum aestivum is the only member of the BEP (specifically Pooideae) found so far not to carry any sequence of the accD gene while Triticum monococcum is found to contain a truncated accD gene. The available data to date show all Pooideae species possessing rpl23. In addition, rpl23 exists in members of the PACMAD which (with one known exception) are completely missing accD. So, deletions in the BEP subfamilies (Fig. 2) may or may not have been independent events. Clearer understanding will depend on whether or not the Pooideae are sister to Bambusoideae and Ehrhartoideae or to the PACMAD, a yet uncertain relationship (Kellogg 2000; Saarela and Graham 2010).

The serial loss of accD in Poales described here shows similar characteristics to the serial loss of accD in the dicot tribe Jasmineae (Lee et al. 2007). In the Jasmineae, the 5′ and 3′ ends were lost before the middle of the gene. Our data suggest a similar scenario in bamboo, rice, the Pooideae, and the PACMAD, indicating that loss began at the extremities of the gene sequence (Fig. 2).

Phylogenetic Implications of the accD Loss Model

Our accD data were combined with rbcL data, generated here and obtained from database information, for use in a phylogenetic analysis that tests our loss model. Clearly, resolution of the true point of loss of the accD gene will require more substantial sampling than our dataset provides; however, our tree results appear to support our model (Fig. 3).
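Combining two genes for a joint analysis amounts to joining their alignments end to end, taxon by taxon. The following is a minimal sketch of that step under stated assumptions: the taxon names and short sequences are invented placeholders, and padding a taxon that lacks accD with a missing-data symbol (`?`) is one common convention, not necessarily the treatment used for the data set analyzed here.

```python
def concatenate(partitions, missing="?"):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end.

    A taxon absent from a partition is padded with `missing` for the full
    width of that partition, so every row of the result has the same length.
    """
    taxa = sorted({taxon for part in partitions for taxon in part})
    matrix = {taxon: "" for taxon in taxa}
    for part in partitions:
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a partition must have equal aligned length")
        width = lengths.pop()
        for taxon in taxa:
            matrix[taxon] += part.get(taxon, missing * width)
    return matrix

# Placeholder alignments (hypothetical taxa and sequences, for illustration only).
rbcl = {
    "Taxon_full": "ATGTCACCAC",
    "Taxon_pseudo": "ATGTCACCAC",
    "Taxon_absent": "ATGTCGCCAC",
}
accd = {
    "Taxon_full": "ATGACTATTC",
    "Taxon_pseudo": "ATGA--ATTC",   # internal deletion shown as alignment gaps
}                                    # Taxon_absent has no accD sequence at all

supermatrix = concatenate([rbcl, accd])
```

How the padded positions are scored (as missing data or as a gap state) is a choice made in the downstream tree-building program, and it can influence where taxa lacking accD are placed.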

Our phylogenetic analysis using the rbcL and accD data mirrors the model arrangement of basal monocot orders. The orders Asparagales, Zingiberales, and Commelinales cluster in two loose groups, but the Poales are clearly defined and are correctly derived from the Arecales (Givnish et al. 2010), where a chloroplast gene order similar to that of the basal Commelinids is preserved (Khan et al. 2011). Our branching, not unexpectedly, follows the pattern of proposed loss of accD. This pattern reflects the possession of either a full-length gene, a pseudogene, or absence of the gene. Only two oddities arise. The first is the positioning of Carex (Cyperaceae) and Juncus (Juncaceae), members of sister families (Bremer 2002; Givnish et al. 2010) that both possess probable full-length accD genes. These two split to odd positions near Flagellariaceae (Carex) and within the Commelinales (Juncus). We can only assume that these positions are artifacts of sequence identity in the rbcL or accD genes. Because both of these species lie outside of the node point of initial accD loss, their improper positioning does not affect our refinement of the accD loss point in the Poales.

The second oddity in our tree is the relative position of the Flagellariaceae and Restionaceae (Fig. 3). Our data, tree, and model place Flagellariaceae outside of the Restionaceae, whereas phylogenetic analyses using multiple plastid gene data sets (Saarela and Graham 2010; Givnish et al. 2010) reverse these positions. We are not refuting that arrangement here. The data that switch these positions in our small data set almost certainly lie within the accD portion of the sequences, which naturally would draw the Flagellariaceae pseudogene data closer to the full-length functional sequences in the other Poales than to the Restionaceae, which lack any accD sequence. Our tree positioning would in fact be predicted by a model in which the Flagellariaceae bears the closest relationship to the basal ancestor of both the restiids and graminids. We propose that this ancestor bore the new accD pseudogene.

Loss of a gene is not always a reliable indication of phylogenetic relationships as seen in the apparent multiple independent losses of chloroplast genes rpl22, infA, rps16 (Jansen et al. 2006) and even the gene of our study, accD. The accD gene has been deleted independently in many chloroplast-carrying lineages other than the Poales (Martin et al. 2002; Chumley et al. 2006; Maier et al. 1995; Goremykin et al. 2005; Lee et al. 2007; Cosner et al. 2004). Also, independent partial deletions have occurred in non-angiosperms, such as in liverwort (a non-vascular plant), and black pine (gymnosperm). One explanation for the common occurrence of deletions is that accD lies in a region of high mutation rates, termed a “hot spot” (Ogihara et al. 1991; Morton and Clegg 1993; Maier et al. 1995). Its position in the chloroplast genome also appears to be associated with a rearrangement endpoint as found in the dicots Pelargonium, Trachelium, and the Lobeliaceae (Chumley et al. 2006). These findings support the probability of independent partial deletions among the Poales as described above.

As pointed out by the Angiosperm Phylogeny Group III (APG 2009), the order Poales, although accepted as being monophyletic, still lacks clarity of family interrelationships. For instance, Joinvilleaceae is tentatively found by APG III to be sister to Poaceae while Bremer (2002) found strong support for Ecdeiocoleaceae as sister. Our data suggest that Joinvilleaceae is closest to Poaceae among our set of tested families, which unfortunately did not include Ecdeiocoleaceae. Ecdeiocoleaceae, therefore, is a family that should be tested for the presence of the accD and rpl23 genes and may help in the resolution of the true sister to Poaceae.