1 Introduction

The chloroplast genome (plastome) is circular and reduced, ranging from 107 to 218 kb in length and containing about 120 genes (Daniell et al. 2016). In angiosperms, except for some Fabaceae, the plastome has a quadripartite structure: two inverted repeats (IRs) separate the large single-copy (LSC) region from the small single-copy (SSC) region (Wicke et al. 2011; Zhu et al. 2016). With the emergence of next-generation sequencing technologies, genome sequencing has become more accessible and widely used in evolutionary and phylogenetic studies because it provides a much larger amount of data than markers based on specific DNA fragments (Yang et al. 2013; Rogalski et al. 2015; Daniell et al. 2016).

The general structure of plastomes has been described and compared in order to find more informative regions for phylogenetic analyses and to infer the evolutionary dynamics of botanical families (Niu et al. 2017) and lower taxonomic groups (Yang et al. 2013; Luo et al. 2014; Perini et al. 2015). In Orchidaceae, plastome sequences have also been used to study the evolution of the ndh genes, which encode NADH dehydrogenase (Kim et al. 2015; Lin et al. 2015, 2017), and plastome evolution in heterotrophic species (Barrett et al. 2014; Schelkunov et al. 2015; Feng et al. 2016; Graham et al. 2017).

After heterotrophic orchids, some groups of the subfamily Epidendroideae, mainly the subtribes Cymbidiinae and Dendrobiinae, have been the focus of genomic studies in Orchidaceae because of their large number of species and their economic importance. However, much remains to be studied in this subfamily. One example is the absence of genomic studies focused on the subtribe Pleurothallidinae (Epidendroideae, Epidendreae), a group of ca. 5000 species (Karremans 2016) that occurs exclusively in the Neotropics (Pridgeon 2005) and represents about 20% of the species richness of Orchidaceae. To date, there are only two plastomes of representatives of this subtribe in the NCBI database (GenBank), both of the genus Masdevallia Lindl.

The genus Anathallis Barb.Rodr. belongs to subtribe Pleurothallidinae. It comprises more than 150 species distributed from the Greater Antilles and southern Mexico to Brazil, Bolivia and Argentina (Pridgeon 2005), with A. obovata (Lindl.) Pridgeon and M.W. Chase as type species. Although no genome-based phylogenetic studies have been performed, studies using nrITS and plastid matK with a moderate number of species show that the genus is polyphyletic (Chiron et al. 2012; Karremans et al. 2013; Karremans 2014); further systematic studies with more markers are needed (Mauad et al. in prep.).

In this study, we present the complete chloroplast genome sequence of A. obovata and describe its structure and gene composition. In addition, we compare it with the available plastomes of the genus Masdevallia and of Cattleya crispata (Thunb.) Van den Berg, from subtribe Laeliinae, the sister group of Pleurothallidinae (Chase et al. 2015), in order to identify patterns of structural variation.

2 Materials and methods

Fresh leaf material of A. obovata was collected from an individual in a greenhouse at the Federal University of Paraná (UFPR) (voucher: M.C. Santos 22, UPCB). Chloroplast isolation was performed according to the protocol developed by Vieira et al. (2014), adapted for small amounts of tissue (Sakaguchi et al. 2017). Chloroplast DNA (cpDNA) was extracted according to Doyle and Doyle (1987), scaled to 2 mL. Purification of cpDNA was performed with the DNA Clean and Concentrator kit (Zymo Research, Orange, CA).

Approximately 1 ng of purified cpDNA was used for library preparation with the Nextera XT DNA Sample Prep Kit (Illumina Inc., San Diego, CA), following the manufacturer’s instructions. Sequencing was performed on the Illumina MiSeq platform (Illumina Inc., San Diego, CA).

The 1,378,072 paired-end reads (2 × 250 bp) obtained from sequencing were imported as Illumina FASTQ files into CLC Genomics Workbench v.11.0 (http://www.qiagenbioinformatics.com). The reads had a mean length of 131.7 bp and were trimmed using a quality threshold of 0.05, leaving 1,267,373 reads with a mean length of 139.2 bp.
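The trimming step can be illustrated with a minimal sketch. It assumes that the 0.05 threshold is applied as an error-probability limit in a modified-Mott running-sum scheme, which is how CLC describes its quality trimming; the function name `mott_trim` and the toy quality values are hypothetical and are not taken from this study.

```python
def mott_trim(quals, limit=0.05):
    """Return the half-open interval (start, end) of bases to keep.

    quals: Phred quality scores of one read.
    limit: error-probability limit (0.05 in this study).
    Sketch of a modified-Mott trimmer; an illustration, not CLC's own code.
    """
    running = 0.0      # running sum of (limit - error probability)
    best = 0.0         # highest running sum seen so far
    last_zero = 0      # index just after the most recent reset to zero
    start, end = 0, 0  # retained interval; (0, 0) means discard the read
    for i, q in enumerate(quals):
        p_err = 10 ** (-q / 10)          # Phred score -> error probability
        running += limit - p_err
        if running <= 0:
            running = 0.0                # low-quality stretch: reset the sum
            last_zero = i + 1
        elif running > best:
            best = running
            start, end = last_zero, i + 1
    return start, end


# Toy read: two poor bases, three good bases, two poor bases.
print(mott_trim([2, 2, 30, 30, 30, 2, 2]))
```

Reads whose retained interval is empty or very short would then be discarded. This may explain why the mean read length can rise after trimming, as reported above.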

A hybrid reference-guided de novo assembly approach was used to obtain the complete genome sequence, with Masdevallia coccinea Linden ex Lindl. (KP205432) as the reference genome. In this approach, a de novo assembly that generates contigs is run concurrently with a mapping of the reads to the reference sequence, so that the consensus sequence from the reference-guided assembly guides the manual assembly of the contigs into a genome.

The final consensus sequence was imported into the online Dual Organellar Genome Annotation (DOGMA) program (Wyman et al. 2004) for preliminary gene annotation. The correct positions of start codons, stop codons and introns were determined by comparison with homologous genes from other plastomes available in GenBank. The plastome map was drawn with the online program Organellar Genome DRAW (OGDRAW) (Lohse et al. 2007, 2013), and the plastome nucleotide sequence was submitted to GenBank under accession number MH979332.

A Mauve alignment was performed in Geneious R7 (Kearse et al. 2012) with the progressive Mauve algorithm (Darling et al. 2004) to verify structural differences between the plastomes of Anathallis obovata, Cattleya crispata (KP168671), Masdevallia coccinea and M. picturata Rchb.f. (KJ566305). The plastomes were aligned with only one copy of the inverted repeat to avoid Mauve errors (Karnkowska et al. 2018). The correct size and position of the IRs in the complete plastomes were determined with the online platform REPuter (Kurtz et al. 2001), and the IR borders were analysed and compared visually in Geneious R7.
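Removing one IR copy before alignment can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in Geneious: the sequence is assumed to be linearised in the order LSC, IR, SSC, IR, the two IR copies are assumed to be exact reverse complements, and the function names and toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def drop_second_ir(seq, lsc_len, ir_len, ssc_len):
    """Return the plastome without its second IR copy.

    Assumes the linear order LSC - IR - SSC - IR (an assumption of this sketch).
    """
    if len(seq) != lsc_len + 2 * ir_len + ssc_len:
        raise ValueError("region lengths do not add up to the sequence length")
    first_ir = seq[lsc_len:lsc_len + ir_len]
    second_ir = seq[lsc_len + ir_len + ssc_len:]
    # Sanity check: the copy being removed must mirror the copy that is kept.
    if second_ir.upper() != reverse_complement(first_ir).upper():
        raise ValueError("IR copies are not exact reverse complements")
    return seq[:lsc_len + ir_len + ssc_len]


# Toy plastome (not real data): 8 bp LSC, 7 bp IR, 5 bp SSC.
lsc, ir, ssc = "AAAACCCC", "GATTACA", "TTTGG"
toy = lsc + ir + ssc + reverse_complement(ir)
trimmed = drop_second_ir(toy, len(lsc), len(ir), len(ssc))
print(trimmed)
```

The reverse-complement check guards against wrong IR coordinates, which would otherwise silently remove single-copy sequence.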

3 Results

The reference-guided assembly mapped 204,053 reads (16.1%) to the M. coccinea genome sequence, generating an initial consensus sequence with an average coverage of 161.84× that guided the manual contig assembly. This manual approach successfully ordered the contigs that filled low-coverage portions and joined the start and end of the initial consensus by sequence overlap, generating the final consensus sequence.

The Anathallis obovata plastome is a circular molecule of 155,515 bp arranged in the typical quadripartite structure: the LSC has 83,694 bp, each IR has 26,949 bp, and the SSC has 17,923 bp. It encodes 113 genes: 79 protein-coding, 30 tRNA and 4 rRNA genes (Table 1). Some of these genes are duplicated in the IRs (Fig. 1): all rRNA genes, 8 tRNA genes and 20 protein-coding genes, of which ycf1 is only partially duplicated (Table 1). Most genes have a single exon; 15 are composed of two exons (9 protein-coding and 6 tRNA genes), and clpP, rps12 and ycf3 have three exons (Table 1).
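The reported lengths and gene counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the region order LSC, IR, SSC, IR, the labels `IR_1` and `IR_2`, and the resulting coordinates are assumptions made for illustration and may differ from the orientation of the GenBank record.

```python
# Region lengths in bp, as reported for A. obovata.
regions = {"LSC": 83_694, "IR_1": 26_949, "SSC": 17_923, "IR_2": 26_949}
genes = {"protein_coding": 79, "tRNA": 30, "rRNA": 4}

total_length = sum(regions.values())
total_genes = sum(genes.values())


def region_coordinates(region_lengths):
    """1-based inclusive coordinates, assuming regions follow dict order."""
    coords, start = {}, 1
    for name, length in region_lengths.items():
        coords[name] = (start, start + length - 1)
        start += length
    return coords


coords = region_coordinates(regions)
print(total_length, total_genes)
for name, (start, end) in coords.items():
    print(f"{name}: {start}..{end}")
```

The four regions add up to the 155,515 bp of the complete molecule, and the three gene classes add up to 113 genes.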

Table 1 List of genes identified in the plastome of Anathallis obovata
Fig. 1 Map of the complete chloroplast genome sequence of Anathallis obovata. The IRs are represented by the thick lines and the genes by coloured rectangles. Genes inside the circle are transcribed clockwise, and those outside the circle are transcribed counterclockwise

From the reference-based annotation of the plastid genes of A. obovata, we verified the presence of premature stop codons in the accD and ndhF genes, in the larger copy of ycf1 and in the first exon of the ndhA and ndhB genes (in both copies of ndhB). In accD, ndhF and ycf1, this results in a reduction of approximately 380, 230 and 270 aa, respectively. Similarly, the premature stop codons in the first exon of ndhA and ndhB cause a decrease of about 100 aa in both protein products. These cases may be related to a pseudogenization process and may result in non-functional proteins.
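Detecting a premature stop codon and estimating the resulting truncation can be sketched as below. The sketch assumes the three standard stop codons, a coding sequence that starts in frame, and an expected protein length taken from a homologous gene; the function names and the toy sequence are hypothetical, not data from this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_inframe_stop(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def lost_residues(cds, expected_aa):
    """Amino acids lost relative to a homologue of expected_aa residues."""
    stop = first_inframe_stop(cds)
    if stop is None:
        return 0
    # Codons before the stop are translated; the remainder is lost.
    return max(expected_aa - stop, 0)


# Toy CDS: ATG GCT TAA GGG TGA, with a homologue of 4 aa as the expectation.
toy_cds = "ATGGCTTAAGGGTGA"
print(first_inframe_stop(toy_cds), lost_residues(toy_cds, expected_aa=4))
```

Applied to an annotated gene, the difference between the homologue length and the position of the first in-frame stop gives a reduction of the kind reported above in amino acids.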

The Mauve alignment indicated that there are no major structural differences between the analysed plastomes (Fig. 2). However, we observed length variations in the LSC, IR and SSC regions (Table 2) and differences in the position of some genes at the IR borders (Fig. 3) among the four plastomes.

Fig. 2 Mauve alignment of the four plastomes with one copy of the IR removed. The monochromatic alignment indicates that there are no structural variations between the plastomes

Table 2 General features of plastid genomes analysed in this study
Fig. 3 Comparison scheme of the IR borders in the four plastomes analysed. The numbers indicate the lengths of genes, intergenic spacers and distances between the IR/LSC and IR/SSC junctions in base pairs

4 Discussion

The percentage of reads mapped in the reference-guided assembly (16.1%) is similar to the 14.7% found by Sakaguchi et al. (2017) when establishing their adaptation of the chloroplast isolation protocol proposed by Vieira et al. (2014). The adapted protocol that we followed requires only 2 g of leaf material instead of 20 g, which is very advantageous given the small size of individuals of the subtribe Pleurothallidinae, and it results in a high-quality chloroplast isolation. Although most of the reads obtained in sequencing corresponded to non-plastid sequences, the percentage of reads belonging to cpDNA was sufficient for the successful assembly of the A. obovata plastome with a high depth of coverage (~160×).

The SSC region of Cattleya crispata is about 5 kb shorter than that of the other three plastomes because of the deletion of the ndhA, ndhF and ndhI genes. Furthermore, all other ndh genes except ndhE were annotated as pseudogenes in the C. crispata plastome (Perini et al. 2015). The other three plastomes retain all ndh genes, although in A. obovata the genes ndhA, ndhB and ndhF have premature stop codons. This observation is corroborated by the results of Luo et al. (2014), who compared different plastomes of the subfamily Epidendroideae and found that closely related species show a similar pattern of variation in ndh gene content.

The ndh family comprises 11 genes that act in electron transport in photosystem I (Martín and Sabater 2010). However, in Orchidaceae there is great variation in the retention and deletion of ndh genes among lineages, suggesting multiple independent losses of these genes during the evolution of orchids (Kim et al. 2015; Lin et al. 2015; Niu et al. 2017). Because these genes are important for photosynthesis, functional copies may exist in the nuclear genome (Chang et al. 2006). However, transcriptome analyses of species lacking ndh genes in the plastome found no traces of functional copies of these genes in the other genomes (Johnson et al. 2012; Lin et al. 2015).

It is believed that ndh genes are dispensable in contemporary plants because no deleterious effects are observed in species that lack them (Ruhlman et al. 2015). In addition, the existence of an alternative route for electron transport in photosystem I, the PGR5-dependent cyclic electron transport pathway, makes the ndh family redundant in the genome (Munekage et al. 2002, 2004; DalCorso et al. 2008; Niu et al. 2017).

Kim et al. (2015) suggested that there is a strong relationship between the loss of ndh genes and instability at the IR/SSC boundaries, based on the great structural variation of the plastomes of mycoheterotrophic orchids (Delannoy et al. 2011; Logacheva et al. 2011; Barrett and Davis 2012). In addition, IR expansion accompanied by total or partial deletion of the ndh gene set has been observed in plastomes of gnetophytes and conifers (Braukmann et al. 2009; Wu et al. 2009, 2011), of orchids (Chang et al. 2006; Wu et al. 2010; Kim et al. 2015; Lin et al. 2015; Niu et al. 2017) and of Najas flexilis (Willd.) Rostk. and Schmidt (Peredo et al. 2013).

In this study, we also observed variations in the IR/LSC borders between the four plastomes analysed (Fig. 3). The rpl22 gene has 366 bp and lies entirely within the LSC region in Masdevallia picturata, but in M. coccinea 102 bp of the gene are in IRA, producing a copy of this size in IRB. The same occurs in A. obovata and Cattleya crispata, which have 28 and 38 bp of rpl22 in IRA, respectively. Conversely, the rps19 gene has 279 bp and is present in two copies in the IRs, but in M. picturata 207 bp of the gene are in the LSC, so that the copy in IRB is truncated to only 72 bp.
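The border arithmetic above can be made explicit with a small helper that splits a gene at a region junction. The lengths come from the text; the coordinates are hypothetical and counted from the first base of each gene on the LSC side, because absolute positions are not given here.

```python
def split_at_junction(gene_start, gene_end, junction):
    """Split a gene at a region junction.

    Coordinates are 1-based and inclusive; junction is the last position of
    the first region. Returns (bp in the first region, bp in the second).
    """
    length = gene_end - gene_start + 1
    before = min(max(junction - gene_start + 1, 0), length)
    return before, length - before


# rps19 in M. picturata: 279 bp, of which 207 bp lie in the LSC.
rps19 = split_at_junction(1, 279, junction=207)

# rpl22 (366 bp) with 102, 28 and 38 bp extending into the IR.
rpl22 = {
    "M. coccinea": split_at_junction(1, 366, junction=366 - 102),
    "A. obovata": split_at_junction(1, 366, junction=366 - 28),
    "C. crispata": split_at_junction(1, 366, junction=366 - 38),
}
print(rps19, rpl22)
```

The portion that falls inside the IR is the portion duplicated in the other IR copy, which is why the truncated rps19 copy in M. picturata has 279 - 207 = 72 bp.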

Other differences between the plastomes were observed in the IR/SSC borders (Fig. 3). The ycf1 gene is located in the SSC with its 3′ portion occupying IRA, producing a shorter copy in IRB. In A. obovata, this gene has a premature stop codon in the large copy, and in Cattleya crispata it lies almost entirely in the SSC, producing a copy in IRB that is much smaller than in the other plastomes: 152 bp, against 1026 bp in Masdevallia and 1062 bp in A. obovata. Also at the IRB/SSC boundary, the ndhF gene is absent in C. crispata and has a premature stop codon in A. obovata, while in both Masdevallia plastomes this gene is complete and overlaps ycf1 by 73 bp, of which 64 bp are in IRB.

Even from the comparison of only four plastomes, three of them from subtribe Pleurothallidinae, it was possible to observe considerable structural variation. This indicates that future studies focused on the subtribe should sequence the chloroplast genomes of more genera to understand the molecular evolution of the group and, using the large amounts of data generated by next-generation sequencing, explore variable molecular markers for phylogenetic inference and phylogeographic studies.