1 Introduction

The Apocynaceae family consists of the subfamilies Rauvolfioideae, Apocynoideae, Periplocoideae, Secamonoideae and Asclepiadoideae (Simões et al. 2007); the Rauvolfioideae includes the species Hancornia speciosa Gomes. The genus Hancornia is monotypic, and four botanical varieties of H. speciosa have been described: H. speciosa var. cuyabensis Malme, H. speciosa var. gardneri (A. DC.) Muell. Arg., H. speciosa var. speciosa Gomes and H. speciosa var. pubescens (Nees. and Martius) Muell. Arg. (Collevatti et al. 2018). Hancornia speciosa, known in the vernacular as mangabeira, is of economic, ecological and social importance and is used in agriculture and in industry for human food (Silva-Júnior and Lêdo 2006; Silva et al. 2011). The phylogeny of the Apocynaceae is robust and contains monophyletic and paraphyletic subfamilies; however, some tribes and the paraphyletic subfamilies Rauvolfioideae and Apocynoideae require further study to better understand their paraphyly (Fishbein et al. 2018).

Chloroplasts are essential to plants: they are important for photosynthesis, biosynthesis, and carbon sequestration, and their genomes have been used for phylogenetic analysis with robust branch support in other taxa (Barrett et al. 2016). These cytoplasmic organelles have a genome that is independent of the nuclear genome and is generally inherited through the maternal line only. The chloroplast genome is organized as a quadripartite circular structure of 100–200 kbp, composed primarily of two large inverted repeats (IRs), which contain the ribosomal RNA genes and other plastid genes, separated by a large single copy (LSC) region and a small single copy (SSC) region. The coding regions include the 16S, 23S, and 5S rRNA genes and 27–31 tRNA genes, which are sufficient to translate all of the amino acids, as well as three genes for RNA polymerase subunits (similar to the situation in prokaryotes) and most of the genes for photosystem I, photosystem II, cytochrome, and ATP synthesis (reviewed by Green 2011), totaling approximately 80 proteins (Huang et al. 2013).

Chloroplast genomes are haploid and highly conserved in gene content and quadripartite genomic structure. They have been widely used to study evolutionary relationships at different taxonomic levels in plants (Ravi et al. 2008). The present study investigated three questions: (1) What is the structure of the H. speciosa chloroplast genome? (2) Is a phylogeny based on complete chloroplast genomes more robust than one based on partial chloroplast sequences? (3) Has the structure of the IRs, LSC, and SSC evolved differently within the subfamily Rauvolfioideae? To answer these questions, we obtained the complete H. speciosa chloroplast genome and performed an evolutionary analysis for the family using complete chloroplast genomes.

2 Materials and methods

Plant material, DNA isolation, and high-throughput DNA sequencing

Hancornia speciosa var. speciosa plant material was collected in the state of Pernambuco, Brazil, and total DNA (including nuclear, chloroplast, and mitochondrial DNA) was extracted from the leaves using the cetyltrimethylammonium bromide (CTAB) extraction method (Doyle and Doyle 1987). The quality and quantity of the extracted DNA were verified by visualization on a 1% agarose gel and by spectrophotometry, respectively. The DNA sample was fragmented into 400–500 bp fragments to construct the sequencing library. The fragments were ligated with adapters using the “Nextera DNA Sample Preparation” kit (Illumina), and 100 nt paired-end reads were sequenced. The sequencing was performed at the Central Laboratory for High Performance Technologies in Life Sciences (LacTad-Laboratório Central de Tecnologias de Alto Desempenho em Ciências da Vida) at the State University of Campinas-UNICAMP, São Paulo.

Chloroplast genome assembly and annotation

Twenty-four million reads were used for de novo assembly with the Ray software (Boisvert et al. 2012). The de novo contigs for the chloroplast genome were then manually merged using paired-end reads in the Geneious R9 software. The genome was annotated in Geneious using Catharanthus roseus (L.) (Rauvolfioideae) as the reference, with a minimum identity cutoff of 70% between the genomes. The annotations were checked individually and, where necessary, the start and stop codons were corrected manually. The annotated genome was checked with cpGavas (Liu et al. 2012). A graphic representation of the plastome was created using Geneious R9.
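To make the 70% identity cutoff concrete, the sketch below computes percent identity between two aligned sequences and applies a threshold. It is only an illustration: the function names are hypothetical, and the way gaps are counted here (a gap opposite a base counts as a mismatch, gap-gap columns are ignored) is an assumption that may differ from how Geneious computes similarity when transferring annotations.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: gap-vs-base columns count as mismatches; gap-vs-gap columns
# are skipped. Geneious may use a different definition.


def percent_identity(aln_a, aln_b):
    """Return percent identity (0-100) of two equal-length aligned strings."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_cutoff(aln_a, aln_b, cutoff=70.0):
    """True if the pair reaches the identity cutoff (70% by default)."""
    return percent_identity(aln_a, aln_b) >= cutoff


# Toy example (hypothetical sequences).
print(percent_identity("ATGC", "ATGA"))
print(passes_cutoff("AAAA", "AATT"))
```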

Phylogenetic analysis and genome comparison

The complete chloroplast genome sequences were aligned using the program MAFFT v7.017 (Katoh and Standley 2013), implemented as the “Multiple align” tool in Geneious R9. The GTR model was selected using the Bayesian Information Criterion, as implemented in the MEGA7 software (Kumar et al. 2016). The evolutionary history was inferred using the maximum likelihood method, and branch support was assessed with 1000 bootstrap replicates in MEGA7. To compare branch support, phylogenetic trees were constructed using the rbcL gene, the rbcL + matK genes, rbcL + matK + trnH-psbA, and the complete chloroplast genome sequences. The comparison among chloroplast genomes of the Rauvolfioideae was performed using BLAST Ring Image Generator with the default software settings (Alikhan et al. 2011), and the number of interspecific SNPs was identified using Geneious.
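The idea of counting interspecific SNPs from an alignment and tallying them by region can be sketched as follows. This is not the Geneious procedure used here; it is a simplified stand-in in which a SNP is any alignment column where all sequences carry an unambiguous base and at least two differ. The function names, region coordinates, and toy alignment are hypothetical.

```python
# Minimal sketch: find variable (SNP) columns in a multiple alignment and
# count them per region. Columns with gaps or ambiguous bases are skipped.
# Assumption: regions are given as half-open (start, end) alignment columns.


def snp_positions(alignment):
    """Return 0-based columns where unambiguous bases differ."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    positions = []
    for i, column in enumerate(zip(*alignment)):
        bases = {base.upper() for base in column}
        if bases <= set("ACGT") and len(bases) > 1:
            positions.append(i)
    return positions


def count_by_region(positions, regions):
    """Count SNP positions falling inside each named region."""
    counts = {name: 0 for name in regions}
    for pos in positions:
        for name, (start, end) in regions.items():
            if start <= pos < end:
                counts[name] += 1
                break
    return counts


# Toy alignment of three hypothetical sequences.
toy_alignment = [
    "ATGCATGCAT",
    "ATGAATGCAT",
    "ATGCAT-CTT",
]
toy_snps = snp_positions(toy_alignment)
toy_counts = count_by_region(toy_snps, {"LSC": (0, 5), "SSC": (5, 10)})
print(toy_snps, toy_counts)
```

Skipping gapped columns keeps indels out of the SNP tally, which matters when comparing regions that differ in indel content.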

3 Results and discussion

Chloroplast genome of H. speciosa

Genetic studies based on NGS analyses can unlock information about the genome, transcriptome, and epigenome of any organism (Chaitankar et al. 2016). In this context, the Illumina paired-end reads from H. speciosa var. speciosa were assembled de novo, producing contigs that allowed the entire genome to be included in the draft. The complete genome was finished by mapping the reads to the draft genome, which corrected the errors. After the final genome was obtained, the 24 million reads were mapped to it, resulting in an average coverage of 340×. The chloroplast genome of H. speciosa consisted of 155,357 bp (NCBI accession number MG049918) and had the typical conserved “quadripartite” structure, with a pair of inverted repeats (IRA and IRB) separated by the LSC and SSC regions (Fig. 1). The IRA and IRB consisted of 25,755 bp and 25,654 bp, respectively, separated by 85,702 bp of LSC and 18,229 bp of SSC. The genome included 127 genes, including 83 protein-coding genes, 36 tRNA genes, and 4 rRNA genes (Table 1 and Fig. 1), and all protein-coding genes used AUG as the start codon. The chloroplast genome of H. speciosa showed a typical structure, with the expected content of LSC, SSC, and two IRs, corroborating several other studies that have described the chloroplast genome as conserved and have attributed differences in genome size largely to indels in the LSC region (Barrett et al. 2016; Chen et al. 2017). In the Rauvolfioideae subfamily, only three chloroplast genomes are available: Rhazya stricta Decne (Park et al. 2014), Catharanthus roseus (Ku et al. 2013), and Carissa macrocarpa (Eckl.) A. DC. (Jo et al. 2017). The genome sequence produced in the present study will contribute to future phylogenetic analyses using complete chloroplast genomes.
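The relationship between read count and average coverage can be written as depth = mapped reads × read length / genome length. The sketch below applies it under the assumption that every mapped read aligns over its full 100 nt. The function names are illustrative, and because the number of reads that actually mapped to the plastome is not reported here, the read count it returns is only a back-of-envelope estimate.

```python
import math

# Minimal sketch: relate mean sequencing depth to the number of mapped reads.
# Assumption: each mapped read contributes its full length to the coverage.


def mean_coverage(mapped_reads, read_length, genome_length):
    """Mean depth = total mapped bases / genome length."""
    return mapped_reads * read_length / genome_length


def reads_for_coverage(coverage, read_length, genome_length):
    """Smallest whole number of reads that reaches the requested mean depth."""
    return math.ceil(coverage * genome_length / read_length)


# Values taken from the text: 340x coverage, 100 nt reads, 155,357 bp genome.
estimated_reads = reads_for_coverage(340, 100, 155357)
print(estimated_reads)  # 528214
```

Under that assumption, roughly half a million of the 24 million reads would account for the reported depth, which is consistent with the library having been built from total DNA rather than from isolated chloroplasts.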

Fig. 1

Chloroplast genome map of H. speciosa. Annotated genes are colored according to the functional categories described in the legend. Genes are transcribed in a clockwise direction and in a counterclockwise direction. LSC, large single copy region; SSC, small single copy region; IRA, inverted repeat region A; IRB, inverted repeat region B. The inside gray circle represents the C + G content

Table 1 List of genes present in the Hancornia speciosa chloroplast genome

Comparative chloroplast genome analysis in Rauvolfioideae

The complete chloroplast genomes available for the Rauvolfioideae subfamily were used in the comparative analyses, and the IRA and IRB regions contained fewer SNPs than the LSC and SSC regions (Fig. 2). The genomes have similar gene annotations, and the arrangement of these regions is entirely collinear (Fig. 3). The phylogenetic analysis showed that H. speciosa is close to Catharanthus roseus with high support (100%), while Rhazya stricta and Carissa macrocarpa were in another clade, also with high support (Fig. 3). When partial chloroplast genome sequences were used (rbcL gene, rbcL + matK genes and rbcL + matK + trnH-psbA), support was lower than with complete chloroplast genome sequences (Fig. S1). A previous phylogenetic analysis of Apocynaceae indicated 100% support for many branches; however, many other branches had 90% support or less (Fishbein et al. 2018). In the present study, all branches showed 100% bootstrap support, suggesting that a phylogeny based on complete chloroplast genomes is more robust than one based on partial chloroplast sequences. In the simple sequence repeat (SSR) analysis, we observed di-, tri-, and tetra-nucleotide motifs; however, tetra-nucleotide motifs were found only in H. speciosa (Fig. 4). Among these SSR motifs, di-nucleotides were the most abundant in the four species, followed by tri-nucleotides: the number of di-nucleotide repeats ranged from 8 to 14 among genomes and the number of tri-nucleotide repeats from 1 to 5 (Fig. 4).
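A search for di-, tri-, and tetra-nucleotide SSRs of the kind summarized above can be sketched with regular expressions. The SSR detection tool and repeat thresholds used in this study are not stated, so the minimum repeat counts below (5, 4, and 3) and the function names are assumptions chosen only for illustration.

```python
import re

# Minimal sketch: find di-, tri- and tetra-nucleotide microsatellites.
# Assumption: minimum repeat counts of 5 (di), 4 (tri) and 3 (tetra);
# these thresholds are illustrative, not taken from the study.


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for each SSR found."""
    if min_repeats is None:
        min_repeats = {2: 5, 3: 4, 4: 3}
    seq = seq.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # A captured unit followed by at least (min_count - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (hypothetical) with one SSR of each class.
toy_seq = "GGC" + "AT" * 6 + "CCGA" + "TTC" * 4 + "GA" + "AGAT" * 3 + "CC"
toy_ssrs = find_ssrs(toy_seq)
print(toy_ssrs)
```

The `is_primitive` filter prevents a run such as ATATATATATAT from being reported a second time as a tetra-nucleotide repeat of ATAT, and keeps homopolymer runs out of all three classes.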

Fig. 2

Distribution of SNPs among chloroplast genomes from the Rauvolfioideae subfamily. LSC Large single copy region, IRA inverted repeat region A, SSC small single copy, IRB inverted repeat region B

Fig. 3

Circular map of the content of the chloroplast genomes from the Rauvolfioideae subfamily. LSC and SSC (dark blue) and IRA and IRB (gray). The black ring represents the C + G content. The chloroplast genomes are represented in red (H. speciosa), purple (Catharanthus roseus), yellow (Rhazya stricta), and green (Carissa macrocarpa). Gene annotations are represented in dark green. The inside circle represents the molecular phylogenetic analysis by the maximum likelihood method, with support values estimated by bootstrap

Fig. 4

Comparative analysis of microsatellites in the chloroplast genome of Rauvolfioideae

The Apocynaceae family contains 370 genera, organized into 25 tribes in five subfamilies. Its phylogeny has been analyzed based on morphological characteristics and DNA sequences, and these phylogenetic analyses have shown that Rauvolfioideae is the first-diverging lineage, while the other subfamilies are derived (Lahaye et al. 2007; Simões et al. 2007; Livshultz 2010; Simões et al. 2016; Fishbein et al. 2018; Ollerton et al. 2018). The phylogenetic tree based on complete chloroplast genomes corroborates the findings of prior studies (Simões et al. 2007; Livshultz et al. 2007; Fishbein et al. 2018), in which Rauvolfioideae is basal in the Apocynaceae; however, two clades were observed, suggesting that Rauvolfioideae is paraphyletic, as described in previous studies (Simões et al. 2007; Fishbein et al. 2018). The paraphyly (Rauvolfioideae and Apocynoideae) and monophyly (Periplocoideae, Secamonoideae and Asclepiadoideae) in the family arose mainly from its wide distribution in the tropics around the world, which resulted in different modifications and adaptations among species. The paraphyly within Rauvolfioideae has not yet been analyzed with respect to chloroplast genome structure, but a comparison among the genomes has shown that their gene order and content are highly conserved.