Introduction

The Adinandra genus was established in 1822 by W. Jack with the species A. dumosa from Sumatra and the Malay Islands. The genus comprises about 85 species, distributed mainly in southern Asia and the tropical forests of Africa (Min and Bruce 2007a, b). Eleven species of the Adinandra genus occur in Vietnam: A. dongnaiensis Gagnep., A. annamensis Gagnep., A. caudata Gagnep., A. poilanei Gagnep., A. hainanensis Hay., A. microcarpa Gagnep., A. integerrima T. And. ex Dyer, A. millettii Benth. & Hook. f. ex Hance, A. rubropunctata Merr. & Chun, A. megaphylla Hu, and A. glischroloma Hand-Mazz. (Ho 1999; Nguyen 2003). Several species of the Adinandra genus have been used in traditional medicine (Chen et al. 1997; Yu and Chen 1997) and in phylogenetic analyses (Rosea et al. 2018; Wu et al. 2007a, b; Yu et al. 2017; Nguyen et al. 2021a, b). A. bockiana was first described in the journal Bot. Jahrb. Syst. 29: 474 (1900). However, information on this species is still limited to morphological descriptions (Wu et al. 2007a, b) and a bioactivity assay (Nguyen et al. 2020). The lack of genomic studies of A. bockiana, covering the nuclear, mitochondrial, and chloroplast genomes, therefore severely limits the research, utilization, and conservation of this plant species.

The chloroplast (cp) genome is characterized by haploid inheritance (Bendich 2004). The chloroplast, a plastid characterized by thylakoid structures and a double-layer membrane, plays a vital role in photosynthesis and several essential biochemical processes of the plant (Mustárdy et al. 2008; Schattat et al. 2011). Along with the mitochondrion, it is one of two organelles that contain their own genomes. In a typical cp genome, a pair of inverted repeats (IRs) separates the large single-copy region (LSC) and the small single-copy region (SSC) (Bendich 2004). Generally, the majority of proteins involved in photosynthesis are encoded by cp genes; in addition, some genes encode transfer RNAs and ribosomal RNAs (Daniell et al. 2016). Although the chloroplast genome is smaller than the nuclear genome in size and gene content, it is crucial for investigating plant evolution and molecular phylogeny (Wang et al. 2018). Because cp genomes are conserved in organization, structure, and DNA sequence compared with nuclear and mitochondrial genomes, they have been widely used in plant taxonomic studies and phylogenetic analyses. For example, the ndhF, matK, and trnS-trnG regions of the cp genome have been amplified for barcoding, species identification, and phylogenies (Dong et al. 2012; Shaw et al. 2014). Recently, the advent of next-generation sequencing (NGS) technology has made sequencing of the whole cp genome efficient and practical.

Our goal in building DNA barcodes is to identify some pharmacologically important species of the genus Adinandra and to analyze their molecular evolution. In a previous report, the complete cp genome of A. megaphylla Hu, a medicinal plant, was sequenced and annotated. The results also showed that the matK gene is a better candidate for phylogenetic analysis (Nguyen et al. 2021a, b). Here, we report the assembly and annotation of the complete cp genome sequence of Adinandra bockiana E. Pritz. ex Diels. Additionally, we compared the A. bockiana cp genome with those of other Adinandra species to assess sequence divergence and molecular evolution, inferred their phylogenetic relationships, and reconstructed the phylogeny of species within the family Pentaphylacaceae.

Material and Methods

A. bockiana Samples

The leaf samples and specimens of A. bockiana (Voucher NHQ 02) were collected in August 2019 at an altitude of 800 m, at the coordinates 21°59′15″N; 104°19′28″E (Fig. 1), in Hoang Lien-Van Ban Nature Reserve (Lao Cai, Vietnam). The voucher specimens were stored at the Institute of Ecology and Biological Resources, Vietnam. The samples were collected with the permission of the Ministry of Agriculture and Rural Development of Vietnam and in accordance with the regulations of the Nature Reserve and the relevant legislation.

Fig. 1

Morphology of A. bockiana E. Pritz. ex Diels (Voucher NHQ 02). A Abaxial surface of leaf, B adaxial surface of leaf, C branch with buds, D bud; all photos by Huu Quan Nguyen

DNA Extraction from Leaves and cp Genome Sequencing of A. bockiana

A modified CTAB method (Souza et al. 2012) was used to extract genomic DNA from young leaves. The purity of the DNA sample was evaluated by absorption spectrum and by 0.8% agarose gel electrophoresis. A DNA library was prepared from total genomic DNA using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA), followed by adapter ligation, according to the manufacturer’s protocol for genomic DNA above 20 kb (Pacific Biosciences). SMRTbell libraries were loaded on one chip and sequenced on a PacBio Sequel system at the Key Laboratory for Gene Technology, Institute of Biotechnology (Hanoi, Vietnam).

Assembly and Annotation of A. bockiana cp Genome

The total genomic DNA sequence was obtained by a resequencing method on the PacBio platform. The raw reads belonging to the cp genome were then sorted out by mapping to the reference cp genome of Adinandra angustifolia (MF179491) (Yu et al. 2017) using pbmm2 version 1.2.0 (https://github.com/PacificBiosciences/pbmm2). The A. bockiana cp genome was assembled with HGAP4 (Chin et al. 2013); subsequently, the contigs were analyzed using BLAST (Johnson et al. 2008). The protein-coding, tRNA, and rRNA genes were annotated with the CpGAVAS2 pipeline (Liu et al. 2012). The tRNA genes were confirmed using tRNAscan-SE version 1.21 (Lowe and Eddy 1997) with default parameters. The circular map of the A. bockiana cp genome was constructed using the OrganellarGenomeDRAW tool (OGDRAW) version 1.3.1 (Lohse et al. 2007). Repeat elements were identified using two approaches. Simple sequence repeats (SSRs) were identified in the A. bockiana chloroplast genome using the online software MIcroSAtellite (MISA) (Beier et al. 2017) with the following thresholds: ten repeat units for mononucleotides, five for dinucleotides, four for trinucleotides, and three for tetra-, penta-, and hexanucleotide SSR motifs. The size and location of long repeats (palindromic, forward, reverse, and complement) in the Adinandra plastome were investigated using REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) (Kurtz and Schleiermacher 1999) with the following parameters: a Hamming distance of 3, a minimal repeat size of 20 bp, and 90% or greater sequence identity.
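The SSR thresholds above can be illustrated with a short Python sketch. This is not the MISA implementation; it is a minimal regular-expression search for perfect repeats, assuming the minimum repeat counts given in the text. The function name find_ssrs and the example sequence are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text
# (mono: 10, di: 5, tri: 4, tetra/penta/hexa: 3).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative sketch only; MISA itself may treat overlapping or
    compound repeats differently.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # The group captures one motif; the backreference demands at
        # least (min_rep - 1) further identical copies right after it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # since those are already reported under the shorter motif.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical example: a 12-bp poly-A run and six AT units.
example = "CCG" + "A" * 12 + "GCC" + "AT" * 6 + "GG"
ssrs = find_ssrs(example)
```

Positions are 0-based in this sketch. A run of nine A residues would not be reported, because it falls below the ten-unit threshold for mononucleotides.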

Cp Genome Comparison Among Several Adinandra Species

The structure, size, gene content, and repeats of the A. bockiana cp genome were compared with those of the cp genomes of A. megaphylla (GenBank: MW697901) (Nguyen et al. 2021a, b), A. millettii (GenBank: MF179492), and A. angustifolia (GenBank: MF179491) (Yu et al. 2017). The whole cp genome sequences of the four Adinandra species were aligned with the MAFFT server (Katoh et al. 2019) and visualized using LAGAN mode in mVISTA (Mayor et al. 2000). For the mVISTA plot, the annotated cp genome of A. bockiana was used as the reference. IRscope was employed to display and compare the borders of the LSC, SSC, and IR regions among the four Adinandra species (Amiryousefi et al. 2018). DnaSP version 6.12.03 was used to analyze the relative synonymous codon usage and the sequence divergence (Pi) among the four species (Rozas et al. 2017). For the sequence divergence analysis, a window size of 600 bp with a step size of 200 bp was applied.
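The sliding-window divergence analysis described above can be sketched in Python as follows. This is not DnaSP's implementation: it assumes an already aligned set of equal-length sequences, computes Pi as the mean pairwise proportion of differing sites, and skips gap positions pair by pair, which may differ from how DnaSP handles gaps. The function names are hypothetical; the window and step defaults follow the text.

```python
from itertools import combinations

def pairwise_pi(seqs):
    """Mean proportion of differing sites over all sequence pairs.

    Positions where either sequence of a pair has a gap ('-') are skipped
    (an assumption of this sketch).
    """
    values = []
    for a, b in combinations(seqs, 2):
        compared = mismatches = 0
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue
            compared += 1
            mismatches += x != y
        values.append(mismatches / compared if compared else 0.0)
    return sum(values) / len(values)

def sliding_pi(alignment, window=600, step=200):
    """Return (window_midpoint, Pi) pairs along an alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        columns = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, pairwise_pi(columns)))
    return results

# Toy alignment of four sequences (hypothetical data), scanned with a
# 4-bp window and 2-bp step so the result is easy to follow.
toy = ["AAAAAAAA", "AAAAAAAA", "AAAAAAAT", "AAAAAAAT"]
profile = sliding_pi(toy, window=4, step=2)
```

With real data, the four aligned plastomes would be passed in with the default 600-bp window and 200-bp step, and windows with high Pi would mark candidate variable regions.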

Analysis of Phylogeny

The taxonomic position of A. bockiana was determined using complete cp genome sequences from species of the Adinandra genus and some other species of the Pentaphylacaceae family. These sequences were aligned using the MUSCLE mode in Unipro UGENE software v36.0 (Okonechnikov et al. 2012); a maximum likelihood (ML) phylogenetic tree was then constructed using MEGA X software (Kumar et al. 2018) with rapid bootstrap analysis (1000 replicates). The methods were selected based on a previous study of this genus (Yu et al. 2017). However, because few cp genome sequences of the genus Adinandra are available in the literature, other commonly used barcoding genes, such as matK and trnL, were also aligned using the ClustalW method, and the trees were built with the same strategy as described above.

Results

Characteristics of A. bockiana cp Genome

The complete cp genome of A. bockiana is 156,284 bp in length (Fig. 2). It has a typical quadripartite structure consisting of four regions: a pair of IRs of 26,090 bp each, an LSC of 85,693 bp, and an SSC of 18,411 bp. The A. bockiana cp genome contains 129 genes, and its GC content is 37.4%.
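As a quick consistency check, the region lengths reported above should add up to the total genome length, with the IR counted twice. The following Python sketch does this arithmetic; the gc_percent helper is a hypothetical illustration of how a GC content figure could be computed from a sequence string and is not taken from the study's pipeline.

```python
# Region lengths in bp, as reported in the text.
LSC = 85_693
SSC = 18_411
IR = 26_090  # length of one inverted repeat; the genome carries two copies

def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

def gc_percent(seq):
    """GC content in percent, counting only unambiguous A, C, G, T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

total = plastome_length(LSC, SSC, IR)
print(total)  # 156284, matching the reported genome length
```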

Fig. 2

Chloroplast genome map of A. bockiana E. Pritz. ex Diels in Vietnam. Genes shown inside the circle are transcribed clockwise, whereas genes outside are transcribed counterclockwise. In the inner circle, the AT content is shown in light gray and the GC content in dark gray. The annotated A. bockiana chloroplast genome sequence was deposited in GenBank under accession MW699853 (Nguyen et al. 2021b)

Gene composition analysis of the A. bockiana cp genome identified 129 genes, including 84 protein-coding genes (PCGs), 37 tRNA genes, and 8 rRNA genes (Suppl. 1). These 129 genes were distributed in 18 groups: the photosynthesis-related gene category (6 groups), the transcription and translation gene category (6 groups), the other gene category (5 groups), and conserved open reading frames. A total of 45 genes belong to the photosynthesis-related gene category, 65 genes are involved in transcription and translation, and 20 genes contain one or two introns. The remaining coding genes are highly conserved, and some have unknown functions. In the IR regions, 17 genes are annotated with two copies (Table 1). The annotated A. bockiana cp genome sequence was deposited in GenBank under accession MW699853 (Nguyen et al. 2021a, b).

Table 1 The gene groups of A. bockiana cp genome

Datasets of Repeat Sequences and Protein-Coding Genes in A. bockiana

A total of 51 SSRs were identified in the cp genome of A. bockiana, all of the mononucleotide repeat type (A, T, or C) and 10 to 19 bp in length (Suppl. 2A). No SSRs with 2-, 3-, 4-, 5-, or 6-bp repeat units were found in A. bockiana. The results in Suppl. 2B show that the majority of SSRs were located in the LSC region (35 SSRs), whereas the SSC and IR regions contained far fewer, with 6 and 4 SSRs, respectively.

Seventy long repeat sequences were identified in the cp genome of A. bockiana: 48 palindromic repeats, 20 forward repeats, and 2 reverse repeats; no complement repeats were found (Suppl. 3). The repeat units were 22 to 56 bp in size, and repeats longer than 30 bp accounted for 66%.
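The four long-repeat types reported here (forward, reverse, complement, and palindromic) differ only in how the second copy is oriented relative to the first. The Python sketch below illustrates that classification for a pair of equal-length sequences, allowing up to three mismatches as in the Hamming distance setting described in the methods. It is not REPuter's algorithm, which searches the whole genome efficiently; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def classify_repeat(a, b, max_dist=3):
    """Classify how copy b relates to copy a, or return None if unrelated.

    forward:     b matches a directly
    reverse:     b read backwards matches a
    complement:  the complement of b matches a
    palindromic: the reverse complement of b matches a
    """
    if len(a) != len(b):
        return None
    orientations = {
        "forward": b,
        "reverse": b[::-1],
        "complement": b.translate(COMPLEMENT),
        "palindromic": b.translate(COMPLEMENT)[::-1],
    }
    for kind, oriented in orientations.items():
        if hamming(a, oriented) <= max_dist:
            return kind
    return None

# Hypothetical 20-bp unit, matching the minimal repeat size in the methods.
unit = "A" * 10 + "C" * 10
revcomp = unit.translate(COMPLEMENT)[::-1]
kind = classify_repeat(unit, revcomp)
```

In this toy case the second copy is the reverse complement of the first, so the pair is labeled palindromic, the most frequent type reported for A. bockiana.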

Analysis of codon usage frequency in A. bockiana showed that the coding regions of the protein-coding genes contain 52,057 codons (Table 2). In protein-coding triplets, A and U endings are more common than G and C endings. Table 2 also shows that leucine is the most frequently encoded amino acid (5338 codons, 10.25%), followed by serine (5067 codons, 9.73%), whereas tryptophan is the least frequent (693 codons, about 1.33%). Of the 64 codons, 30 are used more frequently than expected, with relative synonymous codon usage (RSCU) greater than 1, and 29 are used less frequently, with RSCU < 1. AUG (methionine) and UGG (tryptophan) showed no usage bias (RSCU = 1). The stop codons are UAA, UGA, and UAG.
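RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, that is, count × (number of synonymous codons) / (total count for the amino acid). The Python sketch below shows this calculation under the standard codon assignments; it is an illustration, not the DnaSP procedure used in the study, and the function name and example sequence are hypothetical.

```python
from collections import Counter

# Standard codon table built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

def rscu(cds_list):
    """Relative synonymous codon usage for a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1
    values = {}
    for amino_acid in set(CODON_TABLE.values()):
        family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values

# Hypothetical mini-CDS: Met, Leu (TTA), Leu (TTA), Leu (TTG), Trp, stop.
usage = rscu(["ATGTTATTATTGTGGTAA"])
```

Because methionine and tryptophan each have a single codon, their RSCU is always 1, which is why AUG and UGG show no bias in Table 2.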

Table 2 The codons and relative synonymous codon usage (RSCU) for protein-coding genes in A. bockiana (*stop codon)

Comparing the cp Genome Between A. bockiana, A. megaphylla, A. millettii, and A. angustifolia

Comparison of the four cp genomes revealed very high similarity among A. bockiana, A. megaphylla, A. millettii, and A. angustifolia (Fig. 3). Across the four cp genome datasets, most of the nucleotide sequence is conserved, and only a few regions show variation. The LSC and SSC regions are more divergent than the IR regions. Coding regions tend to be more conserved than non-coding regions, and variants were detected mainly in conserved non-coding regions. The exons in the four cp genomes have nearly identical nucleotide sequences, although the matK, psaA, ndhK, ndhG, and rbcL genes contain fragments that differ among the four species.

Fig. 3

Chloroplast genome comparison plot of A. bockiana, A. megaphylla, A. millettii, and A. angustifolia

Nucleotide diversity analysis of the Adinandra cp genomes gave average Pi values of 0.001447 for the LSC, 0.001436 for the SSC, and 0.000265 for the IRs. The LSC and SSC regions thus contain more variation than the IRs (Suppl. 4). The average Pi value across the four species, A. bockiana, A. megaphylla, A. millettii, and A. angustifolia, was 0.00105.

Comparison of the cp genomes of A. bockiana, A. megaphylla, A. millettii, and A. angustifolia showed that the IR was 26,090 bp in A. bockiana, 26,093 bp in A. megaphylla, 26,091 bp in A. angustifolia, and 26,096 bp in A. millettii. In the cp genomes of all four species, the ndhF gene belongs to the LSC region but overlaps the IRa by 5 bp. In the cp genomes of A. megaphylla, A. millettii, and A. angustifolia, a 1067-bp tail of the ycf1 gene was located in the IRa region, so that ycf1 spanned the IRa/SSC border; in the A. bockiana cp genome, however, the ycf1 gene was not present in this region (Fig. 4). The IR analysis showed no expansion or contraction of the IR regions in the cp genomes of any of the four species.

Fig. 4

The junction positions of LSC, IR, and SSC and distribution of genes located in/near the border of two IR regions in the chloroplast genome of A. bockiana, A. megaphylla, A. millettii, and A. angustifolia

Phylogenetic Analysis Based on the cp Genome Sequences

Phylogenetic analysis based on complete cp genome sequences was performed with several species belonging to three families of the order Ericales: Pentaphylacaceae, Theaceae, and Styracaceae. The complete cp genome sequences of these species have been published in GenBank. The Pentaphylax genus of the Pentaphylacaceae split off into a separate clade, whereas the clade of the Adinandra, Eurya, and Euryodendron genera was sister to the clade of the two genera Ternstroemia and Anneslea. In addition, all four species, A. bockiana, A. megaphylla, A. millettii, and A. angustifolia, formed one clade with 100% support (Fig. 5A). These results agree with the previous study (Yu et al. 2017). The trees based on matK sequences and on the cp genome had similar topologies. For the genus Adinandra, the novel sequence of A. bockiana E. Pritz. ex Diels and those of other Adinandra species were placed in one branch with a 96.8% bootstrap value (Fig. 5B). In contrast to the matK gene, the trnL sequence gave less phylogenetic resolution at the branch of the Adinandra genus (bootstrap value of 51%). In Fig. 5C, the Adinandra genus was separated into two subclades, one formed by A. bockiana and the other by A. megaphylla and A. angustifolia. In summary, the complete cp genome sequences of plants belonging to Pentaphylacaceae have potential for taxonomic identification, and the matK and trnL genes in particular are recommended as DNA barcoding candidates for phylogenetic analysis of this plant family.

Fig. 5

Phylogenetic trees of A. bockiana and related species based on cp genome sequences (A), the matK gene (B), and the trnL gene (C). Bootstrap values are shown above the nodes of branches (1000 replicates). The bar (bottom left) indicates 0.02 and 0.05 changes per nucleotide position

Discussion

Three hundred and forty-five species in 12 genera of the Pentaphylacaceae family have been found worldwide (Steven 2017). To date, apart from the cp sequences of A. bockiana and A. megaphylla, only 2 Adinandra cp genomes, out of 8 cp genomes belonging to 6 genera of the Pentaphylacaceae family, have been available in GenBank. Eighty-five species of the Adinandra genus have been found, mainly in southern Japan, China, Western Asia, New Guinea, Southeast Asia, and the tropical forests of Africa (Min and Bruce 2007a, b). Extracts of many Adinandra species have been investigated, and some bioactive compounds have been isolated (Brad and Zhang 2018; Chen et al. 2015; Gao et al. 2010; Liu et al. 2010; Yuan et al. 2019; Zuo et al. 2010).

In this study, the cp genome of one Vietnamese A. bockiana specimen was sequenced and compared with the cp genomes of A. megaphylla, A. millettii, and A. angustifolia to establish the position of A. bockiana in the Pentaphylacaceae family. The gene organization and codon usage patterns were highly conserved, which can be valuable for phylogenetic and population genetics studies. This feature is consistent with the observation that plastomes of flowering plants have a highly conserved structure and gene content (Daniell et al. 2016; Palmer 1985).

Analysis of the Adinandra plastomes confirmed the typical quadripartite structure and the expected size (~156 kbp) for angiosperms. In previous studies, angiosperm plastomes were reported to harbor 129 genes, 18 of which contain introns, with conserved gene content (Jansen et al. 2007; Palmer 1985). The cp genome of A. bockiana similarly contains 129 genes, twenty of which contain introns. The cp genomes of all four species, A. bockiana, A. megaphylla, A. millettii, and A. angustifolia, contain 70 small repeats in both the coding and non-coding regions. This number of repeat sequences is higher than in other taxa (32–49 long repeats in Curcuma, 49 in Hibiscus syriacus, 42 in S. adstringens, and 49 in Papaver spp.) (Cheng et al. 2020; Gao et al. 2018; Liang et al. 2020; Souza et al. 2019; Nguyen et al. 2021a, b). In several angiosperm taxa, repeats are particularly related to plastome rearrangement and can be regarded as a signal of recombination (Jansen and Ruhlman 2012). Because they can form secondary structures, repeated sequences can serve as recognition signals during the recombination process (Kawata et al. 1997). Owing to the predominance of uniparental inheritance, recombination is thought to occur rarely in angiosperms. Nevertheless, homologous recombination at the molecular level has been identified in some angiosperms (Medgyesy et al. 1985; Sullivan et al. 2017). So far, plastome recombination in these taxa has not received much attention, and it has not been found in the Pentaphylacaceae family. In this study, the higher number of repeats alone does not provide strong evidence of inter- or intra-specific plastome recombination. Complete chloroplast genomes provide sufficient information to reconstruct phylogenetic relationships of plants, especially for classification at lower taxonomic levels (He et al. 2012; Zhang et al. 2016).
Among plant DNA barcodes, matK is one of the most common candidates (Dong et al. 2014). Nevertheless, the phylogenetic analysis showed that trees built from different single genes can give different species classifications. Using more than one barcode can therefore provide better phylogenetic results.

Conclusion

In this study, the gene content and organization, distribution of sequence variation, repeats, and structural characteristics of the complete cp genome of A. bockiana were revealed. Overall, the cp genomes of the four species, A. bockiana, A. megaphylla, A. millettii, and A. angustifolia, have a similar structure and gene content and a high degree of conservation. Structural variations were identified in the four species, but no expansion or contraction of the IR regions was found. The cp genome information of A. bockiana from this study can be useful for cp genome studies of species belonging to the Adinandra genus and for further exploration of phylogenetic relationships in the Pentaphylacaceae family.