Introduction

Extant gymnosperms are considered the most ancient group of seed-bearing plants, having first appeared approximately 300 million years ago (Murray 2013). They consist of four major groups: gnetophytes, conifers, cycads, and ginkgo. The Podocarpaceae family is considered the most diverse family of conifers, comprising 173 species in 18 genera, which are mainly distributed in the Southern Hemisphere, extending also to the north in subtropical China, Japan, Mexico, and the Caribbean (Farjon 1998; Biffin et al. 2011). The genus Retrophyllum comprises five species: Retrophyllum comptonii, R. minor, R. piresii, R. rospigliosii, and R. vitiense. R. piresii, a species endemic to the highlands of Pacaás Novos National Park, Brazil, was classified in 1976 by João Murça Pires, who collected seeds, which were germinated, and the resulting plants are maintained in the Botanical Garden of the Museu Paraense Emílio Goeldi, Belém, Pará, Brazil. Currently, very few data on the physiological, ecological, and genetic characteristics of this species are available.

Plastid genome (plastome) sequencing is an efficient tool for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses (Besnard et al. 2011; Dexter et al. 2012; López et al. 2012; Rogalski et al. 2015). It has been used to resolve enigmatic and basal phylogenetic relationships at different taxonomic levels, and it is a source of structural and functional information about the evolution of different plant groups (Jansen et al. 2007; Moore et al. 2007; Parks et al. 2009; Moore et al. 2010; Wu et al. 2011; Yi et al. 2013; Vieira et al. 2014a). Plastome sequences are available for all families of conifers: Cephalotaxaceae (Yi et al. 2013), Cupressaceae (Hirao et al. 2008), Pinaceae (Wakasugi et al. 1994; Cronn et al. 2008; Lin et al. 2010), Taxaceae (Zhang et al. 2014), Araucariaceae (Wu and Chaw 2014; Ruhsam et al. 2015), and Podocarpaceae (Wu and Chaw 2014; Vieira et al. 2014a). For the Podocarpaceae family, plastome sequences have recently been reported for three species: the New Zealand endemic Podocarpus totara G. Benn. ex Don (NC_020361.1); Podocarpus lambertii, a species from the Araucaria forest, a biodiversity hotspot of South America (Vieira et al. 2014a); and the Asiatic species Nageia nagi (Wu and Chaw 2014).

Usually, the plastome of photosynthetic land plants is 120–220 kb in size, with two copies of an inverted repeat (IR) separating the small and large single-copy (SSC and LSC) regions (Palmer 1983; Knox 2014). The size of the IRs in land plant plastomes is highly variable and depends on the plant group, family, genus, or species (Wicke et al. 2011; Jansen and Ruhlman 2012; Guo et al. 2014; Gurdon and Maliga 2014; Vieira et al. 2014a). The IR copies recombine with each other and are thought to maintain or confer stability on the remainder of the plastome (Palmer 1983; Stein et al. 1986; Knox 2014).

In gymnosperms, IR size ranges from large to completely absent. Taxonomic orders such as Gnetales, Cycadales, and Ginkgoales have retained the classical IRs, which range from 17.3 to 25.1 kb (Wu et al. 2007, 2009; Lin et al. 2012; Guo et al. 2014). Conifers have short IR regions that contain different genes, principally transfer RNA (tRNA) genes, or part of another gene sequence. Recently, short IRs containing two full copies of trnQ-UUG were observed in species of the genus Juniperus (Guo et al. 2014). These short IR sequences (~250 bp) were shown to recombine and create different plastome isoforms, which have been found in different individual plants and in different tissues of the same plant (Guo et al. 2014). Gurdon and Maliga (2014) reported the unprecedented presence of two plastome configurations, differing by a ~45 kb inversion produced by recombination between short imperfect inverted sequences of 20–24 bp, in different Medicago truncatula ecotypes.

In transgenic plastids, the presence of inverted or direct repeats produced by using short endogenous plastid 5′- or 3′-UTRs as signals for expression cassettes was demonstrated to generate different plastome isoforms (Rogalski et al. 2006; 2008a; 2008b; Gray et al. 2009; Alkatib et al. 2012). The genome rearrangement depends on the relative orientation of the two repeated sequences (i.e., directed repeats or inverted repeats). If the sequences are present as directed repeats, the sequence between them and one of the repeat copies are deleted from the plastome (Rogalski et al. 2008b; Alkatib et al. 2012), whereas if the sequences are present as inverted repeats, they recombine and induce an inversion of the sequence between them (Rogalski et al. 2006; 2008a).
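The two outcomes described above can be illustrated on a toy sequence. The following is only a minimal sketch: the 5-nt repeat and the flanking sequences are hypothetical, and real recombination acts on double-stranded DNA through the plastid recombination machinery rather than on strings. It shows that inverted repeats flip the intervening segment (and that a second recombination event restores the original arrangement), whereas directed repeats remove the intervening segment together with one repeat copy.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def invert_between(seq: str, repeat: str) -> str:
    """IR-mediated inversion: flip the segment lying between `repeat`
    and the first downstream copy of its reverse complement."""
    start = seq.index(repeat) + len(repeat)
    end = seq.index(reverse_complement(repeat), start)
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def delete_between(seq: str, repeat: str) -> str:
    """DR-mediated deletion: remove the segment between two directed copies
    of `repeat` together with the second copy, leaving a single copy."""
    first = seq.index(repeat)
    second = seq.index(repeat, first + len(repeat))
    return seq[: first + len(repeat)] + seq[second + len(repeat):]


# Hypothetical toy molecules: flank + repeat + segment + repeat copy + flank.
REPEAT = "GATTC"
toy_ir = "AA" + REPEAT + "CCCTTT" + reverse_complement(REPEAT) + "GG"
toy_dr = "AA" + REPEAT + "CCCTTT" + REPEAT + "GG"

inverted = invert_between(toy_ir, REPEAT)   # same length, segment flipped
restored = invert_between(inverted, REPEAT) # flip-flop back to the start
deleted = delete_between(toy_dr, REPEAT)    # segment and one copy removed
```

The inversion preserves sequence length and gene content, which is why the two inversion isoforms are expected to be functionally equivalent, while the deletion is irreversible within the affected molecule.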

Here, we demonstrated the presence of recombinationally active repeated sequences, consisting of different copies of tRNA genes, one as inverted and the other as directed repeat in the same plastome. These repeated sequences produce an IR-mediated inversion and a directed repeat (DR)-mediated deletion, resulting in different plastome arrangements. However, the isoform created by DR-mediated deletion may produce an unviable plastome, with deletion of photosynthetic genes and other genes involved in plastid gene expression machinery.

Results and discussion

Retrophyllum piresii plastome size and gene content

R. piresii plastome size was determined to be 133,291 bp, only 443 bp smaller than that of P. lambertii (133,734 bp; NC_023805) and 431 bp smaller than that of N. nagi (133,722 bp; NC_023120). The plastome size of Podocarpaceae species is consistent with that of other non-Pinaceae conifers (conifer clade II), whose plastomes range from 127,311 bp in Calocedrus formosana (NC_023121) to 145,625 bp in Agathis dammara (NC_023119). In contrast, Podocarpaceae plastomes are larger than the sequenced plastomes of Pinaceae species (conifer clade I), which range from 107,122 bp in Cathaya argyrophylla (NC_014589) to 124,168 bp in Picea morrisonicola (NC_016069), and smaller than those of the cycads Cycas taitungensis (163,403 bp; NC_009618) and Cycas revoluta (162,489 bp; NC_020319). The GC content of the R. piresii plastome is 37.25 %, very similar to that of the other Podocarpaceae species, P. lambertii (37.10 %) and N. nagi (37.26 %).
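As a hedged illustration of how such summary values can be derived, the sketch below computes the size differences from the lengths quoted above and defines a simple GC-content function. The function name and the toy input are assumptions made for illustration; the text does not state which software was used to obtain the GC values.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence.
    Ambiguous bases (e.g. N) count towards the length but not towards GC."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Plastome lengths (bp) as reported in the text.
plastome_size = {
    "R. piresii": 133_291,
    "P. lambertii": 133_734,
    "N. nagi": 133_722,
}

# Size difference of each compared species relative to R. piresii.
size_difference = {
    species: length - plastome_size["R. piresii"]
    for species, length in plastome_size.items()
    if species != "R. piresii"
}

toy_gc = gc_content("AATTGGCC")  # hypothetical 8-nt input with 4 G/C bases
```

Applied to the full sequence deposited under the accession given in the data archiving statement, `gc_content` would be expected to give a value close to the reported 37.25 %.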

A total of 120 genes were identified in the R. piresii plastome, of which 118 were single copy and two genes, trnN-GUU and trnD-GUC, were found to be duplicated and occurring as inverted and directed repeat sequences, respectively. The following genes were identified and are listed in Fig. 1 and Table 1: 4 ribosomal RNA genes, 31 unique transfer RNA genes, 20 genes encoding large and small ribosomal subunits, 1 translational initiation factor, 4 genes encoding DNA-dependent RNA polymerases, 50 genes encoding photosynthesis-related proteins, 8 genes encoding other proteins, including the unknown function gene ycf2, and 1 pseudogene, ycf68.

Fig. 1

Gene map of Retrophyllum piresii plastome. Genes drawn inside the circle are transcribed clockwise, and genes drawn outside are counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content, and the lighter gray to AT content. The location of the IR-mediated inversion and the DR-mediated deletion are highlighted on the outer circle by blue and red bars, respectively

Table 1 List of genes identified in Retrophyllum piresii plastome

Among the 118 single-copy genes, 13 contain introns (Table 1). Although the gene content of R. piresii and N. nagi closely matches that of P. lambertii, both have lost the rpoC1 intron (Wu and Chaw 2014; Vieira et al. 2014a). In addition, the duplicated trnD-GUC is present only in the R. piresii plastome. The rps16 gene is also absent from the R. piresii plastome, indicating that the Podocarpaceae and Araucariaceae families lost this gene during evolution, while other non-Pinaceae species did not (Wu et al. 2007; Hirao et al. 2008; Wu et al. 2011; Yi et al. 2013; Wu and Chaw 2014; Vieira et al. 2014a). Although this gene was shown to be essential for cell survival in tobacco, an angiosperm species (Fleischmann et al. 2011), it is also absent or nonfunctional in other gymnosperms, such as Pinaceae and Gnetophyte species (Tsudzuki et al. 1992; Wu et al. 2007; 2009), and in some angiosperms, such as species from the Fabaceae (Guo et al. 2007; Tangphatsornruang et al. 2009), Dioscoreaceae (Hansen et al. 2007), and Melanthiaceae (Do et al. 2014) families.

Repeat sequence analysis

Plant population genetic studies may be greatly facilitated by the use of chloroplast DNA (cpDNA) markers because of their nonrecombinant, uniparentally inherited nature in most plant species and the low mutation rates observed in the plastome (Powell et al. 1995; Provan et al. 2001). The plastome has a conserved gene set and generally lacks heteroplasmy and recombination, which makes it an attractive tool for plant phylogenetic studies. Furthermore, because its mode of inheritance differs from that of nuclear markers, cpDNA may be applied to studies of the genetic structure of natural populations (Provan et al. 2001).

Hence, chloroplast simple sequence repeats (SSRs) have been widely used for high-resolution phylogeographic studies (Ahmed et al. 2013; Tomar et al. 2014). Other applications include the characterization of alloplasmic lines in wheat (Tomar et al. 2014), support for the theory of sweet potato domestication (Roullier et al. 2011), the distribution of genetic diversity in Pinus pinaster (Vendramin et al. 1998), gene flow and hybridization among almond tree species (Delplancke et al. 2012), and studies of population genetic structure in different species (Kato et al. 2011; 2013; Roullier et al. 2013; Baskauf et al. 2014).

In the present study, we analyzed the occurrence and type of SSRs, consisting of tandemly repeated motifs of 6 bp or less, in the R. piresii plastome. In total, 168 SSRs were identified. Among them, mono- and dipolymers were the most common, with 96 and 62 occurrences, respectively, whereas tri- (2) and tetrapolymers (8) occurred at lower frequency (Table 2). Among the mono- and dipolymers identified, only 4 mono- and 1 dipolymer presented more than 15 repeats (Table 2), in accordance with the observation that chloroplast microsatellites generally consist of <15 mononucleotide repeats (Provan et al. 2001). Penta- and hexapolymers were not identified in R. piresii, which differs from the P. lambertii plastome, in which one penta- and one hexapolymer were identified (Vieira et al. 2014a), and from N. nagi, in which one pentapolymer was identified using the same parameters described in the “Material and methods” section (data not shown).
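For readers who wish to re-derive the relative frequencies, the counts quoted above can be tallied as follows. This is a plain arithmetic sketch; the dictionary keys are labels chosen here for illustration and the per-motif breakdown remains in Table 2.

```python
# SSR counts per repeat-unit class, as reported in the text.
ssr_counts = {"mono": 96, "di": 62, "tri": 2, "tetra": 8}

total_ssrs = sum(ssr_counts.values())

# Share of each class as a percentage of all SSRs, rounded to two decimals.
ssr_share = {
    unit: round(100.0 * count / total_ssrs, 2)
    for unit, count in ssr_counts.items()
}
```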

Table 2 List of simple sequence repeats identified in Retrophyllum piresii plastome

The monopolymers were mostly composed of A/T sequences (91.66 %), whereas only 56.45 % of the dipolymers were composed exclusively of A and T bases. In Colocasia spp., the complete plastome sequence was used to identify polymorphic microsatellites suitable for high-resolution phylogeographic studies (Ahmed et al. 2013). The intraspecific sequence alignments revealed that polymorphic microsatellites were mostly mononucleotide A/T repeats, and only one polymorphic dinucleotide microsatellite (AT/TA) was found (Ahmed et al. 2013). Similarly, in wheat, 24 of the 25 polymorphic cpSSRs were mononucleotide A/T repeats, and only one was a C/G repeat (Tomar et al. 2014).

In this study, we identified 158 SSRs with a one- or two-nucleotide repeat unit, accounting for approximately 94 % of all SSRs identified, most of them consisting of A/T sequences. These results reveal several SSR sites in the R. piresii plastome that can be assessed for intraspecific polymorphism, enabling highly sensitive phylogeographic and population genetic studies of this species. Such studies may help to describe the conservation status of this species in its endemic region, Pacaás Novos National Park, Brazil.

Plastome structure

In land plants, most plastomes consist of a large single-copy region (LSC), a small single-copy region (SSC), and two inverted repeat regions (IR) (Palmer 1983; Shinozaki et al. 1986; Knox 2014). This plastome organization is highly conserved in angiosperms, with very few exceptions (Guo et al. 2007; Hansen et al. 2007; Tangphatsornruang et al. 2009; Do et al. 2014; Gurdon and Maliga 2014). In gymnosperms, the loss of the large IR has been reported in several species, mainly in conifers (Hirao et al. 2008; Wu and Chaw 2014; Yi et al. 2013). In addition, many rearrangements are observed in these plastomes, and such rearrangements appear to play an important role in their evolution (Wu and Chaw 2014; Yi et al. 2013; Vieira et al. 2014a). As in other species of the Podocarpaceae family (Wu and Chaw 2014; Vieira et al. 2014a), the plastome of R. piresii lacks one of the IRs (Fig. 1).

Comparing the plastome of R. piresii with P. lambertii and N. nagi by dot-plot analyses (Fig. 2), we noted that the structure of the R. piresii plastome differs from the other two species by one large inversion (~56 kb) flanked by a short IR region containing the trnN-GUU gene. Thus, we investigated if these short IR sequences were a recombinationally active site, leading to an IR-mediated inversion. The presence of these arrangements occurring between the short IR containing the trnN-GUU gene was confirmed by specific PCR primers suitable to amplify all recombination products (Fig. 3). PCR amplification with several primer combinations confirmed that indeed, this IR-mediated inversion produced two different isoforms of the R. piresii plastome (Fig. 3). The plastid DNA used for the PCR amplification was isolated from the same plant and revealed that both isomers coexist in a single R. piresii plant (Fig. 3). The presence of the different plastome isoforms in needle tissues of the same plant was also confirmed by mapping of paired-end reads (Electronic Supplementary Material 1).
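The logic of the diagnostic PCR can be sketched in silico. The example below is a simplified illustration under stated assumptions: all sequences and primer names are hypothetical (the primers actually used are listed in Table 4), primers are required to match exactly, and the template is treated as linear. A primer anchored in the flank outside the short IR gives a product with a primer at one end of the invertible segment in one isoform and with a primer at the other end of the segment in the inverted isoform.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def amplicon_length(template: str, primer_a: str, primer_b: str):
    """Length of the product if the two primers face each other on the
    template (exact matches only, either primer may be the forward one);
    None if no product is expected."""
    for fwd, rev in ((primer_a, primer_b), (primer_b, primer_a)):
        i = template.find(fwd)                      # forward primer site
        j = template.find(reverse_complement(rev))  # reverse primer site
        if i != -1 and j != -1 and j >= i + len(fwd):
            return j + len(rev) - i
    return None


# Hypothetical toy plastome: flank, short IR, invertible segment, IR copy, flank.
left, right = "CCATCCAACC", "ACACAGAGTA"
ir = "GGATTCAG"
seg_start, seg_mid, seg_end = "TTTGTGTCTC", "AAAAA", "TATATGGGCG"
segment = seg_start + seg_mid + seg_end

isoform_1 = left + ir + segment + reverse_complement(ir) + right
isoform_2 = left + ir + reverse_complement(segment) + reverse_complement(ir) + right

# Hypothetical primers: one in the left flank, one at each end of the segment.
p_left = left
p_seg_start = reverse_complement(seg_start)  # faces the left junction in isoform 1
p_seg_end = seg_end                          # faces the left junction in isoform 2

products = {
    ("isoform_1", "left + seg_start"): amplicon_length(isoform_1, p_left, p_seg_start),
    ("isoform_1", "left + seg_end"): amplicon_length(isoform_1, p_left, p_seg_end),
    ("isoform_2", "left + seg_start"): amplicon_length(isoform_2, p_left, p_seg_start),
    ("isoform_2", "left + seg_end"): amplicon_length(isoform_2, p_left, p_seg_end),
}
```

Under these assumptions, obtaining products from both primer combinations with DNA from a single plant is what indicates that both isoforms coexist.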

Fig. 2

Dot-plot analyses of Podocarpus lambertii and Nageia nagi plastome sequences against Retrophyllum piresii. A positive slope denotes that the two compared sequences are in the same orientation, whereas a negative slope indicates that the compared sequences can be aligned but their orientations are opposite. Graphs represent comparisons between R. piresii (X axis) and P. lambertii (Y axis) (a) and R. piresii (X axis) and N. nagi (Y axis) (b)

Fig. 3

PCR analysis of recombinant genomes. a PCR amplification products for IR-mediated inversion with 100 bp ladder; b PCR amplification products for DR-mediated deletion with 100 bp ladder; c PCR amplification products for IR-mediated inversion with 1 kb ladder; d PCR amplification products for DR-mediated deletion with 1 kb ladder; e PCR primer combination designed to amplify genome IR-mediated inversion, isoform 1; f PCR primer combination designed to amplify genome IR-mediated inversion, isoform 2; g PCR primer combination designed to amplify genome DR-mediated deletion, isoform 1; h PCR primer combination designed to amplify genome DR-mediated deletion, isoform 3. IR indicates the short inverted repeat formed in the position of trnN-GUU. DR indicates the short directed repeat formed in the position of trnD-GUC. In h, one copy of the directed repeat is deleted and only one copy remains

Although the two isoforms differ in the orientation of a 56-kb segment of the plastome (Fig. 1), they are functionally equivalent, given that both carry the same gene content and the inversion does not affect the integrity of other chloroplast genes (Fig. 1). The different isoforms were readily detectable by PCR, and it is highly unlikely that PCR artifacts are involved, since the recombination was also observed in the sequencing data (Electronic Supplementary Material 1).

In M. truncatula, the unprecedented presence of two stable alternative plastome configurations was reported (Gurdon and Maliga 2014). The two configurations, found in different ecotypes, differ by a ~45 kb inversion between short (20–24 nt) imperfect repeats. Shortly after, Guo et al. (2014) described multiple genomic isoforms coexisting within individual plants: in Juniperus, two plastome configurations that differ by a large ~36-kb inversion between 250-bp inverted repeats containing two copies of trnQ-UUG coexist in the same plant. Different isoforms are not always present in similar amounts because homologous recombination is a random physical mechanism and the isoforms are distributed by random segregation during cell and organelle division (Rogalski et al. 2006, 2008b; Guo et al. 2014).

Analyzing different gymnosperm plastome sequences available in GenBank, it is possible to detect the presence of different tRNA genes repeated in direct or inverted copies (Table 3). In general, conifer clade I present trnI-CAU, trnS-GCU, and trnH-GUG in inverted repeat, and trnT-GGU in direct repeat. The conifer clade II, families Cupressaceae and Taxaceae, present trnI-CAU and trnQ-UUG in inverted repeat, while Cephalotaxaceae presents the trnQ-UUG, and Podocarpaceae presents the trnN-GUU. Cycadidae, Gnetidae, and Ginkgoidae did not lose the large IRs; therefore, they have several tRNAs in inverted repeats.

Table 3 List of repeated tRNA in sequenced gymnosperms plastomes

The trnI-CAU copies in conifer clade I have not been reported to recombine and generate an inversion of the sequence between them (Lin et al. 2010; Wu et al. 2011). However, in conifer clade II species, recombinational activity of the short IR containing trnQ-UUG (544 bp) was found in C. oliveri (Cephalotaxaceae) but not in C. japonica (Cupressaceae) and T. cryptomerioides (Cupressaceae). The last two species have trnQ-UUG-containing short IRs of approximately 280 bp (Yi et al. 2013). Two species of the Podocarpaceae family showed short IRs composed of two copies of trnN-GUU (Vieira et al. 2014a; Wu and Chaw 2014), but it remains to be assayed whether they are recombinationally active. In the conifer clade II genus Juniperus (Cupressaceae), short IRs (~250 bp) containing trnQ-UUG were shown to recombine and create a large 36 kb inversion (Guo et al. 2014). More recently, a triplication of trnI-CAU was observed in an angiosperm species, Paris verticillata, although initial analyses revealed no rearrangements (Do et al. 2014).

We also identified in R. piresii plastome a short DR of 173 bp containing the trnD-GUC gene. This DR is separated by several tRNA genes and genes encoding proteins related to photosynthesis, chlororespiration, and translation (~25 kb) (Fig. 1). We investigated whether this DR could recombine and cause the deletion of its internal content. PCR data containing amplified products with suitable primers (Fig. 3) confirmed the presence of the two plastome isoforms, one containing the DR and the other one with a single copy of trnD-GUC gene and the deletion of the previous internal gene content (Fig. 3). This hypothesis was confirmed by mapping the paired-end reads with both plastome isoforms (Electronic Supplementary Material 2).

In transgenic plastids, the appearance of unexpected plastome conformations was observed when endogenous regulatory sequences were used (Rogalski et al. 2006; 2008a; 2008b; Fleischmann et al. 2011; Alkatib et al. 2012). The use of endogenous sequences (promoters, 5′- and 3′-UTRs) to control transgene expression duplicates relatively short sequences in the plastome, and these recombinationally active duplicated sequences can be positioned by chance as IRs or DRs. If they are positioned as DRs, they induce deletion of the sequence between them (Rogalski et al. 2008a); if they are positioned as IRs, they mediate flip-flop recombination (Rogalski et al. 2006; 2008b).

Deletion of plastome sequences via genetic engineering of directly repeated sequences is a precise method already used successfully for elimination of the selectable marker gene (Iamtham and Day 2000; Day et al. 2005) and targeted disruption of a plastid gene (Kode et al. 2006). The two mechanisms in transgenic plastids, deletion or inversion, mediated by repeated sequences were demonstrated to be a totally random process considering that the different isoforms were found in the same plastids, cells, and/or tissues with different predominance (Rogalski et al. 2006; 2008a; 2008b; Fleischmann et al. 2011; Alkatib et al. 2012).

The results of the present work comprise the first report of a naturally occurring DR-mediated deletion in the plastome of untransformed plants. As in the previous analysis, DNA from only one plant was used, confirming that these isoforms coexist within a single plant. Given that no abnormal or variegated needles were observed in the R. piresii plant used for plastome sequencing, several interesting questions remain: Is this a peculiarity of the R. piresii plastome or a more common phenomenon present in other plastomes that has been overlooked? What is the evolutionary advantage of this recombination, given that photosynthetic and housekeeping genes are deleted? Considering that plastomes have a high ploidy level, is there a mix of viable and unviable plastome isoforms that suffices for gene expression, providing a sufficient amount of tRNAs and proteins related to plastid gene expression and photosynthesis? If selection pressure exerted by plastid gene expression and photosynthesis acts on the plastome to eliminate the unviable isoforms and prevent aberrant growth in conifers, how does it work? Further investigation of these and other questions may help unravel important aspects of the adaptive evolution of conifers.

Material and methods

Plant material and cpDNA purification

Chloroplasts were isolated from fresh leaf material of a single R. piresii individual kindly provided by the Goeldi Museum (Museu Paraense Emílio Goeldi), Brazil. Chloroplasts and plastid DNA from young needles were obtained according to Vieira et al. (2014b).

Plastome sequencing, assembling, and annotation

Approximately 50 ng of cpDNA was used to prepare sequencing libraries with the Nextera DNA Sample Prep Kit (Illumina Inc., San Diego, CA) according to the manufacturer’s instructions. The resulting library was sequenced using Illumina MiSeq (Illumina Inc., San Diego, CA). The paired-end reads (2 × 300 bp) were used for de novo assembly with Newbler 2.6v and CLC Genomics Workbench 6.5v. The plastome coverage was estimated using the CLC Genomics Workbench 6.5v software. With this approach, a total of 165,080 paired-end reads were mapped, resulting in ~480-fold plastome coverage. Initial annotation of the R. piresii plastome was performed using Dual Organellar GenoMe Annotator (DOGMA) (Wyman et al. 2004). From this initial annotation, putative starts, stops, and intron positions were determined based on comparisons to homologous genes in other plastomes. The tRNA genes were further verified using tRNAscan-SE (Schattner et al. 2005). The physical map of the circular plastome was drawn using OrganellarGenomeDRAW (OGDRAW) (Lohse et al. 2013). The physical maps of the circular plastome isoforms 2 and 3 are supplied in Electronic Supplementary Material 3.

Repeat sequence analysis and IR identification

Simple sequence repeats (SSRs) were detected using MISA perl script, available at http://pgrc.ipk-gatersleben.de/misa/, with thresholds of eight repeat units for mononucleotide SSRs, four repeat units for di- and trinucleotide SSRs, and three repeat units for tetra-, penta-, and hexanucleotide SSRs. REPuter (Kurtz et al. 2001) was used to visualize the remaining IRs in R. piresii by forward vs. reverse complement (palindromic) alignment. The minimal repeat size was set to 30 bp and the identity of repeats ≥90 %.
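The SSR thresholds stated above can be expressed as a short regular-expression search. The sketch below is not the MISA script itself; it is a simplified re-implementation under the assumption of perfect (uninterrupted) repeats, it does not report compound SSRs, and the function names and toy sequence are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as given in the text:
# 8 for mono-, 4 for di- and tri-, 3 for tetra-, penta-, and hexanucleotides.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit
    (for example, 'AA' and 'ATAT' are not primitive)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # avoid reporting (A)8 again as (AA)4
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical toy sequence containing (A)9, (AT)5 and (AGCT)3.
toy_sequence = "GC" + "A" * 9 + "GC" + "AT" * 5 + "GG" + "AGCT" * 3 + "C"
toy_ssrs = find_ssrs(toy_sequence)
```

Filtering out non-primitive motifs is a design choice made here so that each repeat is counted once, in the class of its shortest repeat unit.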

Comparative analysis of plastome structure

We used the PROtein MUMmer (PROmer) Perl script in MUMmer 3.0 (Kurtz et al. 2004), available at http://mummer.sourceforge.net/, to visualize gene order conservation (dot-plot analyses) between R. piresii and the Podocarpaceae conifer representatives P. lambertii and N. nagi.
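To illustrate how orientation appears in a dot-plot, the following minimal sketch collects exact k-mer matches between two sequences in both orientations. It is a substitute technique chosen for brevity and not the PROmer algorithm, which compares six-frame translations; the sequences, the value of k, and the function names are hypothetical. Matches in the same orientation fall on a line of positive slope, whereas matches to the reverse complement fall on a line of negative slope, as expected for an inverted segment.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_dots(x: str, y: str, k: int = 6):
    """Return (forward, reverse) lists of (i, j) dot coordinates, where the
    k-mer starting at y[j] equals the k-mer at x[i] (forward) or its
    reverse complement (reverse)."""
    index = {}
    for i in range(len(x) - k + 1):
        index.setdefault(x[i:i + k], []).append(i)
    forward, reverse = [], []
    for j in range(len(y) - k + 1):
        word = y[j:j + k]
        forward.extend((i, j) for i in index.get(word, []))
        reverse.extend((i, j) for i in index.get(reverse_complement(word), []))
    return forward, reverse


# Hypothetical toy sequences: y carries the middle segment of x inverted.
left, segment, right = "ACCACAAC", "GATTGCATGTCG", "CAACCCAA"
seq_x = left + segment + right
seq_y = left + reverse_complement(segment) + right

forward_dots, reverse_dots = kmer_dots(seq_x, seq_y, k=6)
```

In this toy case the forward dots lie on the main diagonal within the shared flanks, and the reverse dots lie on an anti-diagonal spanning the inverted segment.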

Plastome recombination analysis

The presence of the three plastome isoforms produced by DR- and IR-mediated recombinations was confirmed by PCR amplification using the combinations of primers indicated in Table 4. In 25 μl reactions, 25 ng of cpDNA was amplified in a reaction mixture containing 200 mM of each dNTP (Sigma-Aldrich), 2.0 mM MgCl2 (Sigma-Aldrich), 5 pmol of each primer (Sigma-Aldrich), and 1 U Taq DNA polymerase (Sigma-Aldrich). The standard PCR program was 40 cycles of 30 s at 94 °C, 30 s at 63 °C, and 60 s at 72 °C with a 3-min extension of the first cycle at 94 °C and a 10-min final extension at 72 °C. PCR products were analyzed by electrophoretic separation in 1 % agarose gel.

Table 4 Set of primers used in PCR amplification

The presence of the different plastome isoforms was also confirmed by mapping of Illumina paired-end reads, which indicated the presence of the three plastome products of recombination. The mapping results were analyzed using the software CLC Genomics Workbench 6.5v.

Data archiving statement

The complete nucleotide sequence of R. piresii plastome sequenced in this study is available in GenBank database under accession number KJ617081.