Introduction

Arguably the most vexing outstanding question for the seed plant phylogeny remains the placement of Gnetales, a small group of gymnosperms currently circumscribed in three genera (Ephedra, Gnetum, and Welwitschia), despite intensive cladistic investigations over the past two decades. Multiple permutations for the relationship of Gnetales to other extant lineages of spermatophytes (otherwise comprising Ginkgo, cycads, Pinaceae, cupressophyte conifers and flowering plants) have been proposed by different studies, using different types of data and/or different analytical methods (reviewed by Magallón and Sanderson 2002; Soltis et al. 2002; Burleigh and Mathews 2004, 2007a; Mathews 2009). However, because of their repeated recovery across many studies, six particular hypotheses on the placement of Gnetales stand out as the most frequently invoked (Fig. 1). A series of early cladistic analyses based on morphological data and including relevant fossil taxa suggested the placement of Gnetales close to angiosperms (Crane 1985; Doyle and Donoghue 1986; Loconte and Stevenson 1990; Nixon et al. 1994), in accordance with the “anthophyte” hypothesis (Wettstein 1907). Later, some molecular studies recovered the same results, albeit only with weak support (Stefanović et al. 1998; Rydin et al. 2002). However, most early analyses of molecular data resulted in so-called “gnetales-sister” hypotheses, featuring Gnetales either sister to the rest of gymnosperms (“gnetales-sister I”; Hasebe et al. 1992; Goremykin et al. 1996; Samigullin et al. 1999; Frolich and Parker 2000; Mathews and Donoghue 2000) or sister to the rest of seed plants, including angiosperms (“gnetales-sister II”; Hamby and Zimmer 1992; Albert et al. 1994; Rai et al. 2003, 2008). Subsequent analyses, based mainly on sequences from multiple genes and genomes, suggested a “gnetifer” hypothesis, in which Gnetales are found as sister to all conifers (Chaw et al. 1997, 2000).
But one of the most surprising results of molecular plant systematics placed Gnetales within conifers, as sister to Pinaceae, in what became known as the “gnepine” hypothesis (Bowe et al. 2000; Chaw et al. 2000; Gugerli et al. 2001; Qiu et al. 2007). More recently, another hypothesis rendering conifers paraphyletic has gained some momentum. According to the “gnecup” hypothesis, Gnetales are also placed within conifers, but as sister to cupressophytes, not Pinaceae (see Nickrent et al. 2000; Rydin and Källersjö 2002; Doyle 2006; Chumley et al. 2008).

Fig. 1

Correspondence between competing rooted and unrooted seed plant phylogenetic hypotheses. a Six most prominent rooted hypotheses of relationships among major lineages of seed plants. Arrows indicate alternative root placements as in b, 1–6 Parsimony reconstructions of gene losses for the ndh gene suite are mapped (oval) under the assumption of irreversibility. b Unrooted trees showing that the underlying topologies for each of the two major sets of hypotheses are identical when their respective outgroups, the other land plants (1–3) or other seed plants (4–6) are excluded. Note that these two unrooted trees are incompatible. Taxon abbreviations: ANG angiosperms, CUP cupressophytes, CYC cycads, GIN Ginkgo, GNE Gnetales, GYM− gymnosperms minus Gnetales, PIN Pinaceae

One major cause of ambiguity for spermatophyte phylogeny inference can be attributed to ambiguous rooting. While these six competing hypotheses seem significantly different from one another when polarized with outgroups (i.e., rooted; Fig. 1a), they fall into only two categories, each with an identical underlying unrooted tree (Fig. 1b). For example, it becomes apparent that there are no topological differences among the first three major phylogenetic hypotheses (anthophyte and gnetales-sister, I and II; Fig. 1a, 1–3) when the other land plant outgroups (represented by grey arrows in Fig. 1) are pruned off and the remainder is taken as an unrooted tree. Similarly, the only difference between the remaining three alternative scenarios within seed plants (gnetifer, gnepine, and gnecup hypotheses; Fig. 1a, 4–6) is the placement of the root for the clade comprising conifers and Gnetales. Besides rooting issues, another important source of ambiguity is the substantial sequence divergence among living lineages of seed plants and their nearest outgroups (ferns and lycopods). In seed plants, this is particularly evident in long branches leading to angiosperms, Gnetales, and Pinaceae as seen in most molecular trees, regardless of whether derived from plastid, nuclear, or mitochondrial sequences (Chaw et al. 1997, 2000; Bowe et al. 2000; Rydin et al. 2002; Graham and Iles 2009). In conjunction, these two phenomena frequently lead to strongly supported yet spurious tree rooting due to long-branch attraction or related artifacts (Felsenstein 1983; Hendy and Penny 1989). Further complicating the issue, different methods of analysis and different optimality criteria frequently support alternative topologies, even when based on the same sequences (Bowe et al. 2000; Rydin et al. 2002; Burleigh and Mathews 2004, 2007b).
Surprisingly, the removal of the most rapidly evolving sequences or sites was shown to have little to no effect on the inferences of higher-order seed plant relationships (Burleigh and Mathews 2004; Rai et al. 2008), raising the question of the limits of nucleotide data (Mathews 2009).

Molecular evidence is not limited to primary sequence data. Additional sources of molecular data relevant for a number of open questions of seed plant phylogeny can be sought from so-called “rare genomic markers” (e.g., Raubeson and Jansen 1992; Rokas and Holland 2000; Moreira and Philippe 2000; Gugerli et al. 2001). In this regard, the plastid (pt) genome seems to be particularly promising. In seed plants, this genome is highly conserved in size, structure, content, and synteny (Palmer 1991; Downie et al. 1991; Clegg et al. 1994). Because of their relatively infrequent evolutionary occurrence, major structural mutations in the pt genome, such as inversions, gene/intron losses, and contractions/expansions of the inverted repeat (IR), are often considered to be more reliable phylogenetic markers than sequences (e.g., Downie et al. 1991; Downie and Palmer 1992; Raubeson and Jansen 1992; Doyle et al. 1995; Doyle et al. 1996; Bailey et al. 1997; Graham and Olmstead 2000a; Plunkett and Downie 2000; Jansen et al. 2007; but see McPherson et al. 2004; Palmer et al. 2004 for cautionary views).

The loss of ndh genes from the pt genome of some seed plants represents one such potentially informative structural change. The ndh genes encode subunits of the plastid NAD(P)H-dehydrogenase (Ndh) complex, a homologue of mitochondrial complex I (Shinozaki et al. 1986). In plastids, the Ndh complex seems to be primarily involved in the transfer of electrons from stromal reductants to a plastoquinone pool, a process commonly known as “chlororespiration” (Bennoun 2002; Peltier and Cournac 2002). In addition, involvement of this complex in photooxidative stress reduction in high light intensity, regulation of photosynthesis by modulating the activity of cyclic electron flow around photosystem I, and/or leaf senescence regulation have also been suggested (Casano et al. 2001; Bukhov and Carpentier 2004; Zapata et al. 2005; Diaz et al. 2007; Tallon and Quiles 2007; Romeau et al. 2007; Endo et al. 2008).

While the precise role of the Ndh complex is still uncertain, the ndh genes are known to be widespread in the autotrophic seed plants and remain highly conserved over large evolutionary distances, indicating the presence of strong selection pressure for their retention. Results of entire (or extensive) pt genome sequencing from a number of individual studies indicate that the complete suite of ndh genes is present in Ginkgo (Leebens-Mack et al. 2005; Jansen et al. 2007) and selected representatives of cycads (Wu et al. 2007) and cupressophytes (Hirao et al. 2008) as well as in ~60 species of diverse angiosperms (summarized most recently by Jansen et al. 2007). As suggested by Bungard (2004), the loss of ndh genes in flowering plants seems to be confined only to parasitic plant lineages (dePamphilis and Palmer 1990; Olmstead et al. 2001; Stefanović and Olmstead 2005; Funk et al. 2007; McNeal et al. 2007). This link with heterotrophy is further supported by the loss of ndh genes in a non-photosynthetic liverwort (Wickett et al. 2008) and a green but potentially mycotrophic orchid (Chang et al. 2006). The absence of functional ndh genes from the pt genomes of fully autotrophic seed plants is presently reported only from Gnetales (Wu et al. 2007, 2009; McCoy et al. 2008) and several genera of Pinaceae (Wakasugi et al. 1994; Cronn et al. 2008; Rai et al. 2008; Wu et al. 2009). If inferred to have happened concurrently, this loss could represent a strong synapomorphy for Gnetales and Pinaceae (Chaw et al. 2000; Burleigh and Mathews 2004; Wu et al. 2007). Despite the potential of this rare structural genomic character to bear significantly on the seed plant relationships and help choose among alternative phylogenetic hypotheses, the full extent of presence or absence of ndh genes among living gymnosperms is unknown.

In the present study, we gathered data using a comprehensive slot-blot hybridization survey of the complete suite of plastid ndh genes with a dense sampling of gymnosperms, the most extensive data matrix applied to this issue to date, in order to: (1) ascertain the extent and distribution of ndh gene losses across gymnosperms; and (2) assess the utility of these losses as phylogenetic markers for seed plant phylogeny.

Materials and methods

Taxon sampling

In total, 70 of the 85 genera and 162 of the ~1,070 species of extant gymnosperms were sampled in this study (Table 1), corresponding to 82% of their generic- and 15% of their species-richness, respectively. Our sampling encompasses all four major lineages of living gymnosperms; however, the percentage of diversity coverage differs among these groups (compare with Table 1). Ginkgo biloba is the sole living representative of Ginkgoales. Conifers, including both Pinaceae and cupressophytes, are represented by a total of 131 species (out of 680; 20%), grouped into 59 genera (out of 70; 85%), from all seven currently recognized families. The species-richness of cycads is represented to a significantly lesser degree, by 14 out of 305 species (5%), but our sampling covers 64% of genera (7 out of 11). Finally, 16 species of Gnetales are sampled in total (out of the 92 species; 17%), including all three recognized genera/families. Representatives of four genera of autotrophic angiosperms (Table 1), the last remaining lineage of extant seed plants, were also included in our surveys as positive controls.
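The coverage figures above follow from simple ratios of sampled to total taxa. As a minimal sketch, the counts quoted in this paragraph can be tabulated and the percentages recomputed as below. The dictionary layout and function names are illustrative assumptions, not part of the study, and computed values may differ slightly from the rounded percentages quoted in the text. The per-lineage species totals sum to 1,078, consistent with the approximate total of ~1,070.

```python
# (sampled, total) counts per lineage, as quoted in the text above.
# The data structure and helper names are hypothetical, for illustration only.
SAMPLING = {
    "Ginkgo":   {"species": (1, 1),     "genera": (1, 1)},
    "conifers": {"species": (131, 680), "genera": (59, 70)},
    "cycads":   {"species": (14, 305),  "genera": (7, 11)},
    "Gnetales": {"species": (16, 92),   "genera": (3, 3)},
}


def coverage(sampled, total):
    """Return the sampled fraction as a percentage."""
    return 100.0 * sampled / total


def totals(level):
    """Sum sampled and total counts over all lineages for 'species' or 'genera'."""
    sampled = sum(counts[level][0] for counts in SAMPLING.values())
    total = sum(counts[level][1] for counts in SAMPLING.values())
    return sampled, total


if __name__ == "__main__":
    for lineage, counts in SAMPLING.items():
        sp = coverage(*counts["species"])
        ge = coverage(*counts["genera"])
        print(f"{lineage}: {sp:.1f}% of species, {ge:.1f}% of genera")
    print("all gymnosperms (genera):", f"{coverage(*totals('genera')):.1f}%")
```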

Table 1 Seed plant taxa surveyed for the presence/absence of plastid ndh(A-K) genes

DNA extraction and hybridization

Total genomic DNA was isolated using the modified 2× CTAB method (Doyle and Doyle 1987). Fresh leaf material was used where available, but approximately two-thirds of the samples were either from silica-gel dried tissue or herbarium specimens. Quality of DNA varied significantly, from high molecular weight to considerably degraded (from some herbarium material). Because of the poor quality of a number of samples, frequently accompanied also by limited quantity, the standard restriction endonuclease digestion followed by electrophoretic separation approach to Southern hybridization (Sambrook et al. 1989) could not be used. Instead, the slot-blot hybridization method was used, as described in detail by Doyle et al. (1995).

In brief, a slot-blot apparatus (Bio-Rad) was used to make seven sets of pseudoreplicate filter-blots, following the manufacturer’s protocol. Approximately 500–800 ng of total DNA (per sample and per set) was bound to Immobilon-Ny+ nylon membrane (Millipore). DNAs from several species with sequenced pt genomes (e.g., Nicotiana, Acorus, Amborella, Ginkgo, or Welwitschia) were included on each membrane as positive or negative controls. Membranes were prehybridized and hybridized at 60°C–62°C in 5× standard saline citrate (SSC), 0.1% SDS, 50 mM Tris (pH 8.0), 10 mM EDTA, 2× Denhardt’s solution, and 5% dextran sulfate. After hybridization, filters were washed twice for 30–45 min in 0.5% SDS and 2× SSC at the hybridization temperature. Probes were labeled with 32P using random oligonucleotide primers (Invitrogen). Autoradiography was carried out using intensifying screens at −80°C for 18–48 h. Filters were stripped of probe between hybridizations by boiling twice for 5–10 min in 0.1% SSC. The absence of carryover signal from previous hybridizations was assured by an overexposure (3–5 days) prior to new rounds of hybridization.

Hybridization probes for all 11 plastid-encoded ndh genes and the small plastid ribosomal subunit (16S rDNA; used as a control probe) were derived from tobacco via polymerase chain reaction (PCR). Primer names and sequences are provided in Supplementary Table 1. A total of 17 probes were constructed and their relative positions are indicated in Supplementary Fig. 1. For the two ndh genes usually interrupted by introns (ndhA and ndhB), two probes were used, each covering one exon. Two additional longer ndh genes (ndhD and ndhH) were surveyed with two probes situated at the 5′- and 3′-ends, respectively. In addition, to estimate the nonspecific background hybridization levels, an initial negative hybridization control was performed under the same stringency conditions (see above) and with the same amount of 32P, but without probe added.

Results

Interpretation of slot-blots

The slot-blot data ranged from no diminution to complete absence of signal and were for the most part readily interpretable. The presence or absence of ndh genes was determined by eye, by comparison of hybridization signal to the corresponding 16S probe, which is used as a control to establish the presence of significant amounts of ptDNA on the blots. For each blot set and probe combination, the strength of signal was estimated by comparison with a number of positive and negative controls; namely, the species known to contain functional ndh genes (e.g., Ginkgo, Amborella, Acorus, etc.) or to lack these genes (e.g., Welwitschia, Pinus spp., etc.), based on previously available entire ptDNA sequence data.

Representative hybridization results, arranged phylogenetically, are shown in Fig. 2, and the scores for all surveyed species and probes are summarized in Table 1. For every probe, the relative presence or absence of signal was scored for each taxon as showing either full (++), diminished (+), or absent (−) hybridization in comparison to their respective 16S positive controls (Table 1). Full hybridization strength is assumed to indicate that the gene is present and functional. For genes that have two probes (i.e., two exons or 5′- and 3′-end) full hybridization to both probes is required to indicate that the gene is functional. Diminished signals, where hybridization is weaker than the control but there is definite signal presence, can be interpreted in two different ways. It can indicate that the gene is divergent with respect to tobacco but still present and functional or that the gene is present but pseudogenized (i.e., rendered nonfunctional). Absence was scored if no detectable hybridization to the probe was observed. Under our experimental conditions, plants in which a gene has been transferred to the nucleus would fail to produce a detectable hybridization signal when compared to a plant that retains the gene in its plastid genome, due to significant reduction in copy number and an increase in nucleotide substitution rates (Wolfe et al. 1987). Hence, the absence of signal implies either outright loss of genes or their functional transfer to the nucleus. Given the generally conservative substitution rates of ptDNA, it is less likely that the absence of signal represents a highly divergent yet functional gene. Lastly, in certain cases we were unable to determine the presence or absence of signal and we scored these taxa as unknown (“?”; see Table 1). These ambiguities are due to insufficient amounts or poor quality of ptDNA for a given pseudoreplicate.
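The scoring rules described above can be summarized as a small decision function. This is a minimal sketch rather than the procedure actually used, since signals were scored by eye. The function name and return labels are hypothetical, an ASCII hyphen stands in for the minus sign used in Table 1, and the handling of mixed scores for two-probe genes (for example one full and one absent probe) is an assumption, as the text only states that full hybridization to both probes is required to infer a functional gene.

```python
# Score symbols as used in Table 1 ("-" stands in for the minus sign).
FULL, DIMINISHED, ABSENT, UNKNOWN = "++", "+", "-", "?"


def interpret_gene(probe_scores):
    """Interpret the hybridization scores of one gene's probes (one or two).

    Returns one of four hypothetical labels:
      'unknown'    - at least one probe could not be scored
      'functional' - full signal for every probe (gene presumed present and functional)
      'lost'       - no signal for any probe (outright loss or transfer to the nucleus)
      'ambiguous'  - anything else (divergent but functional, or pseudogenized)
    """
    if not probe_scores:
        raise ValueError("at least one probe score is required")
    if UNKNOWN in probe_scores:
        return "unknown"
    if all(score == FULL for score in probe_scores):
        return "functional"
    if all(score == ABSENT for score in probe_scores):
        return "lost"
    # Assumption: any diminished or mixed pattern is left unresolved.
    return "ambiguous"


if __name__ == "__main__":
    print(interpret_gene([FULL, FULL]))        # two-probe gene, both exons strong
    print(interpret_gene([FULL, DIMINISHED]))  # one exon weaker than the control
    print(interpret_gene([ABSENT]))            # single-probe gene, no signal
```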

Fig. 2

Autoradiographs showing slot-blot hybridization results of probes derived from ndh(A-K) genes for 31 selected species representing seed plants (out of 166 surveyed; compare with Table 1) presented in a phylogenetic context. Small plastid ribosomal subunit (16S rDNA) was used as positive control (shown here is one representative out of seven filter-sets). The topology shown is a composite tree depicting current understanding of relationships derived from several published phylogenetic analyses (see text for references). Note that the absence or near absence of hybridization for the ndh probes is restricted only to the surveyed members of Gnetales and Pinaceae. Taxon abbreviations are the same as in Fig. 1

Altogether, these assumptions on the presence or absence of genes can lead to potential underestimates or overestimates of gene losses. For example, signals that appear present could potentially represent relatively recently pseudogenized genes, while significantly diminished signals might be due to divergent but functional genes. Nevertheless, despite these caveats, a hybridization approach remains a cost-effective and efficient method for surveying numerous and diverse samples (Doyle et al. 1995; Adams et al. 2002).

Distribution of ndh gene losses

As expected, the full hybridization signal was observed for all taxa used as positive controls. Based on results from whole pt genome sequencing, the entire ndh gene suite is known to be present in autotrophic angiosperms, indicating that its presence is the shared ancestral condition for this group of plants (Jansen et al. 2007). The presence of all 11 ndh genes is confirmed here across the representatives chosen to span the basal nodes of flowering plant diversity (Fig. 2; Table 1). Also, for Ginkgo, the hybridization to all probes derived from the ndh genes was similar in strength to the control DNA (Fig. 2; Table 1). This was expected as well, given the known presence of these genes based on extensive sequencing of its pt genome (Leebens-Mack et al. 2005; Jansen et al. 2007). In addition, the relative strength of the hybridization of tobacco-derived probes to Ginkgo illustrates the conserved nature of the ndh genes across large phylogenetic distances, including the angiosperm-gymnosperm divergence (>325 Mya; Judd et al. 2002; Palmer et al. 2004).

As a group, cycads strongly hybridized to seven of 11 ndh genes. For a few taxa (Table 1), the signal was diminished with probes for ndhB (both exons), ndhH 3′-end, ndhI, and ndhJ. Although some of the hybridizations are weaker than the positive controls, they do not necessarily indicate loss of function of these ndh genes but rather that these genes are divergent to some degree in these taxa. The presence of functional ndh genes in cycads is expected given that they are found as open reading frames (ORFs) in the sequenced ptDNA of Cycas taitungensis (Wu et al. 2007).

In contrast to the previous lineages, the evidence for loss of ndh genes from pt genomes is widespread in Gnetales and Pinaceae. In Gnetales, there was no significant hybridization signal for most of the probes (Fig. 2; Table 1). For some taxa, weak signal was present for ndhA (both exons), ndhC, and ndhH 5′-end. This pattern indicates that the loss of the ndh genes is common to all Gnetales (Fig. 2), in accordance with the results of entire ptDNA sequences from a small number of select representatives of this group (McCoy et al. 2008; Wu et al. 2009). Similarly, there was a generally weak to absent hybridization signal observed for most probes across Pinaceae. In particular, ndhI and ndhJ are absent from all surveyed taxa. Substantially diminished signal was common for probes derived from ndhA (both exons), ndhD, ndhE, ndhH, and ndhI genes. However, ndhB (most notably, the 5′ end), ndhC, and ndhK appeared present in many (but not all) taxa. Given what is known from the entire plastid genome sequences of several Pinus spp. (Wakasugi et al. 1994; Cronn et al. 2008), Picea sitchensis (Cronn et al. 2008), and Keteleeria davidiana (Wu et al. 2009), it can be deduced that the presence of weak to moderate hybridization signal observed for these ndh genes (Fig. 2; Table 1) corresponds to pseudogenes. Compared to Pinaceae, Gnetales appear to have fewer remnants of ndh genes (Fig. 2; Table 1), which is expected, given the highly elevated rates of molecular evolution observed in Gnetales plastids generally (Rydin et al. 2002; Burleigh and Mathews 2007a; McCoy et al. 2008).

Unlike Gnetales and Pinaceae, the general trend across cupressophytes was a strong hybridization to almost all ndh probes, indicating that the entire suite of ndh genes is present and conserved within this group (Fig. 2; Table 1). This is fully in agreement with the only published entirely sequenced ptDNA from cupressophytes, Cryptomeria japonica (Cupressaceae s. lat.; Hirao et al. 2008). However, some members of Araucariaceae, Podocarpaceae, Taxaceae, and Cephalotaxaceae hybridized weakly to ndhB (one or both exons) and ndhI, while Cupressaceae s. lat. exhibit diminished hybridization signal to ndhG but not ndhI (see Table 1). In these cases, diminished signal is most likely due to the elevated sequence divergence of ndh genes, as evidenced by the presence of ndhB and ndhF ORFs in the few representative species from these families that are currently sequenced (Rai et al. 2008).

Discussion

Implications of ndh losses for spermatophyte relationships

The ndh genes comprise about one-tenth of the ~120 genes retained in plastids of most photosynthetic seed plants. Based on numerous entirely sequenced pt genomes of angiosperms (see Jansen et al. 2007 for the most recent summary) as well as a limited number of gymnosperms (Wakasugi et al. 1994; Wu et al. 2007, 2009; Cronn et al. 2008; McCoy et al. 2008; Hirao et al. 2008), it appeared that the loss of the ndh genes was restricted to Gnetales (McCoy et al. 2008; Wu et al. 2009) and Pinaceae (Wakasugi et al. 1994; Cronn et al. 2008; Wu et al. 2009). The results of our survey extend the previous inferences of the ndh gene absence to be common to all of Gnetales and Pinaceae, but not to other gymnosperms (nor to autotrophic angiosperms). While the loss of ndh genes from plastids is rare in autotrophic plants, their absence has been observed repeatedly in heterotrophic angiosperms (Olmstead et al. 2001; Stefanović and Olmstead 2005). Extrapolating from those cases, it seems that the ndh genes are generally lost as a suite (Bungard 2004; Krause 2008). Hence, from a phylogenetic point of view, the entire suite should be considered as a single loss (Stefanović and Olmstead 2005) and not as 11 independent losses.

Given the extent and distribution of presence and absence of the ndh genes among spermatophytes, the most parsimonious solution suggests that the loss of these genes is a synapomorphy for Gnetales and Pinaceae, a shared derived character inherited from their common ancestor (Fig. 1), supporting the gnepine hypothesis. Each of the five alternatives for the Gnetales relationships with the other seed plants (i.e., anthophyte, gnetales-sister I and II, gnetifer, and gnecup hypotheses) would require a minimum of two independent losses of ndh genes, one in Gnetales and one in Pinaceae (Fig. 1). While the possibility that ndh genes were lost more than once in gymnosperms cannot be positively excluded, the low frequency of loss of these genes, and in particular the near-absence of loss among autotrophic seed plants, argues against such a scenario.
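The loss counts in this argument can be reproduced with a short sketch that treats the ndh suite as a single character, assumes it was present in the seed plant ancestor, and assumes losses are irreversible. The six nested-tuple topologies below are simplified, hypothetical stand-ins for the hypotheses in Fig. 1a: only the position of Gnetales follows the text, and the arrangement of the remaining lineages is an illustrative assumption that does not affect the counts. Taxon abbreviations follow Fig. 1.

```python
# True = ndh suite present, False = absent, per lineage (as inferred in this survey).
NDH_PRESENT = {"ANG": True, "CYC": True, "GIN": True,
               "CUP": True, "PIN": False, "GNE": False}

# Simplified rooted topologies (assumed for illustration; see lead-in).
TREES = {
    "anthophyte":         (("ANG", "GNE"), ("CYC", ("GIN", ("PIN", "CUP")))),
    "gnetales-sister I":  ("ANG", ("GNE", ("CYC", ("GIN", ("PIN", "CUP"))))),
    "gnetales-sister II": ("GNE", ("ANG", ("CYC", ("GIN", ("PIN", "CUP"))))),
    "gnetifer":           ("ANG", ("CYC", ("GIN", ("GNE", ("PIN", "CUP"))))),
    "gnepine":            ("ANG", ("CYC", ("GIN", ("CUP", ("PIN", "GNE"))))),
    "gnecup":             ("ANG", ("CYC", ("GIN", ("PIN", ("CUP", "GNE"))))),
}


def _scan(node):
    """Return (losses inside the subtree, True if every leaf lacks the suite)."""
    if isinstance(node, str):
        return 0, not NDH_PRESENT[node]
    results = [_scan(child) for child in node]
    if all(absent for _, absent in results):
        # Whole clade lacks the suite: one loss on its stem, counted by the parent.
        return 0, True
    losses = sum(n for n, _ in results)
    losses += sum(1 for _, absent in results if absent)
    return losses, False


def min_losses(tree):
    """Minimum number of irreversible losses, given ancestral presence."""
    losses, all_absent = _scan(tree)
    return 1 if all_absent else losses


if __name__ == "__main__":
    for name, tree in TREES.items():
        print(f"{name}: {min_losses(tree)} loss(es)")
```

Under these assumptions only the gnepine topology is explained by a single loss; each of the other five requires two.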

Additional lines of evidence supporting the gnepine hypothesis are provided by two other plastid structural characters, each with putatively the same phylogenetic distribution among seed plants as that observed for the loss of ndh genes. First, an expansion of the inverted repeat (IR) that includes the duplication of trnI-CAU and a partial duplication of the psbA gene region situated at the end of the large single copy (LSC) is found in several sequenced members of Gnetales and Pinaceae but is not known from any other land plant (Wu et al. 2007, 2009; McCoy et al. 2008; Hirao et al. 2008). Second, both Gnetales and Pinaceae appear to share a common loss of rps16, to the exclusion of other gymnosperms and basal angiosperm lineages from which the presence of this gene is ascertained (Wakasugi et al. 1994; Wu et al. 2007, 2009; Jansen et al. 2007; McCoy et al. 2008). However, contrary to the above examples, one particular structural genomic marker does not support the gnepine phylogeny. The loss (or a significant reduction) of the IR is reported from both cupressophytes and Pinaceae, but not from Gnetales (Raubeson and Jansen 1992; Wu et al. 2007, 2009; McCoy et al. 2008; Hirao et al. 2008). This ptDNA feature favors the phylogenetic interpretation according to which the loss of the ndh genes occurred independently in Gnetales and Pinaceae.

Future directions

Although the preponderance of genomic structural changes currently supports the gnepine hypothesis, caution is still warranted because most of these features have only been observed in a subset of exemplar taxa. Hence, their full distribution across seed plants and their evolutionary significance remain poorly understood. To determine their relative importance, all of these underexamined markers require further in-depth surveys across a broader taxon sample, as was done here for the ndh genes. In particular, additional ptDNA sequences are needed from the representatives of Araucariaceae and Podocarpaceae, to help triangulate the ancestral conditions for many of these potentially phylogenetically important characters in cupressophytes. It would also be valuable to survey for ndh genes in Parasitaxus usta (Podocarpaceae), the only known mycoheterotrophic conifer (Feild and Brodribb 2005). We predict the functional absence of all ndh genes in this highly derived podocarp species (Sinclair et al. 2002), which would represent an independent loss of the Ndh complex in conifers, related to its shift to a fully heterotrophic nutritional mode.