Introduction

Albizia julibrissin Durazz., called “He Huan” in traditional Chinese medicine, is a large tree that grows on hillsides or is cultivated in fields. It is naturally distributed from the northeast to the south of China, including Southwest China (Han et al. 2011), and also occurs in Korea, India, other Asian regions, Africa, and North America (Li and Yang 2020). The medicinal parts of A. julibrissin are the flowers and bark (China-Pharmacopoeia-Committee 2020). The flowers of A. julibrissin may ameliorate memory loss induced by sleep deprivation in the Drosophila model (Chang et al. 2019), and the bark of A. julibrissin could reduce the risk of developing dementia (Chen et al. 2021).

The chloroplast (Cp) of plants carries its own circular genome, separate from the nuclear and mitochondrial genomes, and plays a key role in photosynthesis, development, and physiology (Neuhaus and Emes 2000; Howe et al. 2003). The Cp genome is generally 120–160 kb in size and contains about 110–130 unique genes (Jarvis and Lopez-Juez 2013). In most angiosperms it has a quadripartite circular structure, with a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of inverted repeat (IR) regions of the same length (Wicke et al. 2011; Shetty et al. 2016). The Cp genome evolves much more slowly than the nuclear genome (Wolfe et al. 1987), owing to its low mutation rate, low nucleotide substitution rate, and maternal inheritance. Because Cp genomes are generally more conserved than nuclear genes, they are of vital importance to plant phylogeny and species identification (Wang et al. 2020). Therefore, the Cp genome is used not only to analyse genes and their encoded proteins but also to examine the evolutionary relationships among lineages of different species (Moore et al. 2010; Shaw et al. 2014; Zhu et al. 2019). At present, about 30 thousand records of complete Cp genomes are accessible in the database of the National Centre for Biotechnology Information (NCBI).

Although studies have reported transcriptome sequencing of A. julibrissin, the coding sequences (CDS) of the genes in its Cp (Wojciechowski et al. 2004; Wang et al. 2009) are not available. In this study, the complete Cp genome of A. julibrissin was de novo assembled from short Illumina reads, and the detailed features of the Cp genome structure and organization were analysed. In addition, a phylogenetic analysis of A. julibrissin, five previously annotated Cp genomes of the Leguminosae family, and 21 Cp genomes of other related species from GenBank was performed to determine genealogical relationships. Our results could provide valuable basic information for the conservation and sustainable utilization of A. julibrissin.

Materials and methods

Plant materials

Fresh and healthy leaves of A. julibrissin were collected from a single individual at Anhui University of Chinese Medicine, Anhui, China (N31°56′17″; E117°23′25″). Species identity was confirmed by experts in our lab. All tissues were collected from different parts of the same plant. A voucher specimen of this plant is stored in the Center of Herbarium, Anhui University of Chinese Medicine (Hefei, China) under the depository number AHTCM2020yxy04ZJ.

DNA extraction and library construction

Genomic DNA was extracted with a commercial DNAsecure Plant Kit (TIANGEN Biotech Co., Ltd, Beijing, China) following an optimized protocol previously used by our team (Meng et al. 2021; Yao et al. 2021). The integrity and quality of the total DNA of A. julibrissin were checked with a BioPhotometer Plus (Nucleic Acid and Protein Detector, Eppendorf, Germany) and 1.0% agarose gel electrophoresis, and high-quality DNA was used to construct sequencing libraries. DNA libraries were constructed using the VAHTS™ Universal DNA Library Prep Kit for Illumina® V3 (Vazyme Biotech Co., Ltd, Nanjing, China) following the manufacturer’s protocol (Slatko et al. 2018). The total DNA was fragmented by sonication to < 500 bp (Covaris S220), end-repaired, 5'-phosphorylated, and dA-tailed with End Prep, followed by T-A ligation of adaptors to both ends. Fragments of ~470 bp were then recovered for PCR amplification. PCR products were purified and validated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), and quantified by a Qubit 3.0 Fluorometer (Invitrogen, Carlsbad, CA, USA). Libraries with different indexes were multiplexed and loaded onto the Illumina HiSeq instrument according to the manufacturer's instructions (Illumina, San Diego, CA, USA).

Illumina sequencing and genome assembly

To obtain the raw sequence data, Genewiz Biotechnology Co. Ltd. (Suzhou, China) was commissioned to perform next-generation sequencing on the Illumina HiSeq platform (Illumina, San Diego, CA, USA) with a 2 × 150 bp paired-end (PE) configuration. Image analysis and base calling were executed by HiSeq Control Software (HCS) + OLB + GAPipeline-1.6 (Illumina) on the HiSeq instrument, and raw reads for the Cp genome of A. julibrissin were subsequently obtained. Trimmomatic (version 0.39) (Bolger et al. 2014) was used to filter low-quality reads from the raw data, with quality (Q) values on the Sanger scale. Read ends were trimmed where the base quality was below Q30, and reads were excluded from further analysis if they were shorter than 100 bp or had an average Q value below 30. The reads that passed QC were assembled into contigs using Velvet (version 1.2.10) (Zerbino and Birney 2008), and gap-filling was performed by SSPACE (version 3.0) (Boetzer et al. 2011) and GapFiller (version 1-10) (Boetzer and Pirovano 2012). The Cp genome of A. julibrissin was assembled from all the contigs using NOVOPlasty 2.7.2 (Dierckxsens et al. 2017) with the auxiliary software SPAdes (Bankevich et al. 2012), and the complete Cp sequence of A. odoratissima (NC_034987) was used as the reference genome. The assembled complete Cp genome sequence of A. julibrissin was submitted to NCBI, with the accession number MW539046.
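The read-filtering rule described above (trim read ends below Q30, then discard reads shorter than 100 bp or with a mean quality below 30) can be sketched in a few lines of Python. This is only an illustration of the rule, not Trimmomatic's algorithm or command line; the function name `filter_read` and the simple end-trimming loop are assumptions made for the example.

```python
from statistics import mean

MIN_Q = 30      # Phred threshold for end trimming and for mean read quality
MIN_LEN = 100   # minimum read length (bp) kept after trimming


def filter_read(seq, quals, min_q=MIN_Q, min_len=MIN_LEN):
    """Trim low-quality ends, then apply the length and mean-quality filters.

    `seq` is the read sequence and `quals` the per-base Phred scores.
    Returns the trimmed (seq, quals) pair, or None if the read is rejected.
    """
    start, end = 0, len(seq)
    # Drop bases from the 5' end until one reaches the threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Drop bases from the 3' end until one reaches the threshold.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]
    if len(seq) < min_len or not quals or mean(quals) < min_q:
        return None
    return seq, quals
```

For a 150 bp read whose first and last five bases are below Q30, the function returns the central 140 bp; a read that falls under 100 bp after trimming is rejected.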

Genome annotation

Based on the assembled clean data, the coding genes, tRNAs, rRNAs, and other ncRNAs were predicted with GeSeq (version 1.78) (Tillich et al. 2017), together with the programs tRNAscan-SE (Lowe and Eddy 1997) and RNAmmer (Lagesen et al. 2007). The predicted protein sequences of the coding genes were searched against databases (NR, KEGG, and GO) using BLAST (version 2.2.31 +) (Wixon and Kell 2000; Harris et al. 2004; Boratyn et al. 2013; Yu and Zhang 2013), and the best matching result was selected as the gene annotation.

Comparative genomics

The Cp genome map was drawn using Organellar Genome DRAW (Lohse et al. 2013) (http://ogdraw.mpimp-golm.mpg.de/index.shtml), with manual adjustment when necessary. The sequence divergence and the LSC/IRb/SSC/IRa junction regions were compared among A. julibrissin, Arabidopsis thaliana, and four other Leguminosae species by mVISTA (Frazer et al. 2004) (http://genome.lbl.gov/vista/mvista/submit.shtml) and IRscope (Amiryousefi et al. 2018) (https://irscope.shinyapps.io/irapp/).

Repeat sequence analysis

The REPuter program (Kurtz et al. 2001) (https://bibiserv.cebitec.uni-bielefeld.de/reputer) was used to identify four types of repetitive sequences in the Cp genome: forward, reverse, palindromic, and complement repeats. The minimum repeat length and sequence identity were set to ≥ 30 bp and > 90%, respectively (Vieira Ldo et al. 2014; Chen et al. 2015). The Cp SSRs were detected using MISA (Thiel et al. 2003) with the minimum number of repeats for mono-, di-, tri-, tetra-, penta-, and hexanucleotides set to 10, 5, 4, 3, 3, and 3, respectively.
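To make the SSR thresholds concrete, the following sketch scans a sequence for perfect SSRs using the minimum repeat numbers given above (10, 5, 4, 3, 3, and 3 for motif lengths 1 to 6). It is a simplified stand-in written for illustration, not MISA itself: the names `find_ssrs` and `is_primitive` are hypothetical, compound SSRs are not handled, and motifs that are themselves repetitions of a shorter unit (for example "ATAT") are skipped so that each repeat is reported once.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for size, min_rep in min_repeats.items():
        # A motif of `size` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)
```

Calling `find_ssrs` on a toy sequence containing an A-run of 12, six AT units, and four AAG units reports exactly those three loci, while a run of only nine A's is ignored because it is below the mononucleotide threshold.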

Codon usage analysis

The program CodonW1.4.2 (http://downloads.fyxm.net/CodonW-76666.html) (Du et al. 2020) was used to analyse the synonymous codon usage of 93 protein-coding genes (PCGs) in the Cp genome of A. julibrissin and to calculate two related parameters: the effective number of codons (Nc) and the relative synonymous codon usage (RSCU). Nc is often used to evaluate codon bias at the level of the individual gene (Frank 1990). RSCU is the observed frequency of a codon divided by the frequency expected if all synonymous codons for that amino acid were used equally. An RSCU value close to 1.0 indicates that the deviation is not significant (Sharp et al. 1986). The amino acid (AA) frequency was calculated as the number of codons encoding a given amino acid divided by the total number of codons, expressed as a percentage.
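As an illustration of the RSCU definition, the sketch below counts codons in a set of coding sequences and divides each codon's count by the mean count of its synonymous family. It is not CodonW's implementation; the function name `rscu` is hypothetical, and the codon table is the standard genetic code, whose amino acid assignments are assumed here to apply to plastid genes.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon whose amino acid is observed."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Walk the sequence in frame, ignoring a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            result[c] = counts[c] / expected
    return result
```

Within each observed synonymous family the RSCU values sum to the number of synonymous codons, so a value above 1 marks a codon used more often than expected under equal usage.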

Prediction of RNA editing sites

Prep-Cp (Mower 2009) (http://prep.unl.edu/) and CURE (Du et al. 2009) (http://bioinfo.au.tsinghua.edu.cn/pure/) were used to predict the RNA editing sites of the 93 PCGs, and the threshold (cut-off value) was set to 0.8 to ensure the accuracy of the prediction.

Phylogenomic analyses

To understand the phylogenetic relationships between A. julibrissin and related species, 26 available Cp genome records (in addition to that of A. julibrissin) were downloaded from NCBI, with four species, Penthorum chinense, Nicotiana tabacum, Schizophragma hydrangeoides, and Hydrangea petiolaris, as outgroups. The 27 plastome sequences were aligned, and Maximum Likelihood (ML) phylogenetic analysis was conducted with MEGA-11 (version: 11) (Tamura et al. 2021). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap (BS) test (1000 replicates) is shown next to the branches (Fig. 7).
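The bootstrap test rests on resampling alignment columns with replacement and rebuilding the tree from each pseudo-alignment. MEGA performs this internally; the sketch below only illustrates the resampling step, under the assumption that the alignment is held as a dictionary of equal-length strings. The function name `bootstrap_columns` is hypothetical, and tree inference itself is not shown.

```python
import random


def bootstrap_columns(alignment, rng=None):
    """Build one bootstrap replicate of a multiple sequence alignment.

    `alignment` maps taxon name -> aligned sequence. Column indices are drawn
    with replacement, and the same indices are applied to every taxon.
    """
    rng = rng or random.Random()
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n = lengths.pop()
    cols = [rng.randrange(n) for _ in range(n)]
    return {
        name: "".join(seq[c] for c in cols) for name, seq in alignment.items()
    }
```

Repeating this 1000 times and recording how often a clade appears among the resulting trees gives the bootstrap percentage reported at each branch.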

Statistical analysis

All experimental tissues were collected from different parts of the same plant. In the DNA extraction experiment, three repetitions of independent biological experiments were used to ensure the reproducibility of the results. In the sequencing process, a sufficiently high threshold was set to eliminate unreliable sequencing results. In the subsequent analysis, at least three calculations were conducted to prevent result deviation caused by uncontrollable reasons, such as network or computer performance.

Results and discussion

Raw data analysis

Raw data of the complete Cp genome of A. julibrissin from Illumina sequencing contained 24,165,368 reads representing 3,624,805,200 bp, which were deposited at https://www.ncbi.nlm.nih.gov/sra/SRX11034648. The sequencing quality distribution (Fig. S1 A), base type percentage (Fig. S1 B), and mean quality distribution (Fig. S1 C) of raw reads were evaluated. These assessments indicated that the sequencing quality met standard requirements and was suitable for further analysis. After optimizing the raw data, the effective ratios of Q20 and Q30 were as high as 99.82% and 99.31%, respectively.

Basic characteristics of the chloroplast genome of A. julibrissin

The complete Cp genome of A. julibrissin was 175,922 bp in size according to the de novo genome assembly (Fig. S1 D). Like other reported Cp genomes (Howe et al. 2003; Vieira Ldo et al. 2014), the A. julibrissin Cp genome contained four distinct regions: the LSC region, the SSC region, and a pair of IR regions, with lengths of 91,327 bp, 5145 bp, 39,725 bp, and 39,725 bp, respectively (Fig. 1). The GC content of the A. julibrissin Cp genome was 35.48%, with 8.34% GC bias.
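A quick consistency check on a quadripartite genome is that the LSC, the SSC, and the two IR copies add up to the assembled length. The sketch below runs that check with the LSC length of 91,327 bp given in the Conclusions, and adds a minimal GC-content helper; both function names are illustrative and are not part of any tool used in this study.

```python
def regions_match_total(lsc, ssc, ir, total):
    """True if LSC + SSC + two IR copies equal the total genome length."""
    return lsc + ssc + 2 * ir == total


def gc_content(seq):
    """GC percentage of a sequence, counting only A, C, G, and T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Region lengths reported for A. julibrissin (bp).
print(regions_match_total(lsc=91_327, ssc=5_145, ir=39_725, total=175_922))
```

With these lengths the check holds, since 91,327 + 5,145 + 2 × 39,725 = 175,922.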

Fig. 1 Plastid genome map of A. julibrissin. Genes inside the circle are transcribed clockwise, genes outside are transcribed counter-clockwise. Genes are color-coded to indicate functional groups. The dark grey area in the inner circle corresponds to GC content while the light grey corresponds to the adenine–thymine (AT) content of the genome. The small (SSC) and large (LSC) single-copy regions and inverted repeat (IRa and IRb) regions are noted in the inner circle

The Cp genome of A. julibrissin encoded a total of 127 unique genes, comprising 93 PCGs and 34 ncRNAs (30 tRNA genes and four rRNA genes: rrn5, rrn4.5, rrn16, rrn23), as annotated with the software Prodigal (version 3.02) (Hyatt et al. 2010) and the Rfam database (version 14) (Kalvari et al. 2021) (Table 1). Eighteen genes contained introns, of which 15 (six tRNA genes and nine protein-coding genes) contained one intron, while the other three (rps12, clpP1, and ycf3) contained two (Table 2). The rps12 gene is trans-spliced, with three exons located in the LSC and IR regions. The average length of the 93 PCGs was 959 bp (N50 was 1533 bp) with a 36.90% GC content. The A. julibrissin Cp genome was submitted to NCBI GenBank under the accession number MW539046.

Table 1 A list of genes found in the plastid genome of A. julibrissin
Table 2 Genes with introns within the A. julibrissin Cp genome and the length of exons and introns

Only the complete Cp genomes of A. odoratissima and A. bracteata have previously been reported for the genus Albizia of Leguminosae (Wang et al. 2017a, b; Zhang et al. 2020), with Cp genome sizes of 174,816 bp and 176,054 bp and GC contents of 36% and 35.4%, respectively. Similar to A. julibrissin, four different regions were characterized in A. odoratissima: the SSC region (4928 bp), LSC region (90,169 bp), and a pair of IR regions (each 39,882 bp). There were also 93 coding genes in the A. odoratissima Cp genome, and the numbers of total ncRNA, rRNA, and tRNA genes were 45, 8, and 37, respectively. The complete Cp genome of A. bracteata also had four different regions, the SSC region (5033 bp), LSC region (91,245 bp), and a pair of IR regions (each 39,888 bp), with the numbers of coding genes, total ncRNA, rRNA, and tRNA genes being 92, 46, 9, and 37, respectively. According to these data, the Cp genomes of A. julibrissin, A. bracteata, and A. odoratissima are highly similar.

The gene function annotation of A. julibrissin

Searches against the NR database (Non-Redundant Protein Sequence Database) revealed 18 protein-coding genes that are similar to those in Acacia ligulata. In addition, 8, 11, 10, and 4 genes were homologous to those in A. odoratissima, Inga leiocalycina, S. saman, and Staphylococcus aureus, respectively (Table S1). The annotation suggested that 82 of the 93 coding genes in the A. julibrissin Cp genome matched Cp genes of Leguminosae, and the remaining 11 matched those of other related families.

Based on the KEGG (Kyoto Encyclopedia of Genes and Genomes) biological pathway annotation, the Cp genes of A. julibrissin were assigned to four primary classes (cellular processes, genetic information processing, metabolism, organismal systems). Among the four primary classes, metabolism contained the largest number of genes, especially genes related to energy metabolism (Fig. 2). This reflects the role of the Cp in energy synthesis. The detailed gene annotation is presented in Table S2.

Fig. 2 The KEGG pathway categories of A. julibrissin. The table of contents within the diagram is the second classification of biological pathways; the numbers are the number of genes; the different colors are used to distinguish the primary classification of biological pathways. KEGG divides biological metabolic pathways into six categories, and each category is divided into secondary classifications by the system and each of the statistical secondary classifications

The distribution of target genes across second-level GO (Gene Ontology) terms in the Cp genome of A. julibrissin is shown in Fig. 3. The annotations indicated that the encoded proteins are involved in the regulation of a variety of cellular biological processes. Within the biological process category, metabolic process contained the most genes (63), followed by cellular process. The detailed gene annotations are in Table S3.

Fig. 3 Histogram of target gene distribution of A. julibrissin in GO terms

Codon usage analysis

Codon usage bias is affected by a variety of factors, including mutation and selection (Ermolaeva 2001; Wong et al. 2002). To check the codon usage, we calculated the Nc values of the 93 PCGs from the complete Cp genome of A. julibrissin (Table S4). The Nc values of the 93 PCGs ranged from 23.41 (rpl36) to 61 (petG, petL), with an average of 46.21; rpl36, with the lowest Nc value, had the most biased codon usage. Figure 4 and Table S5 show the codon usage and RSCU of the A. julibrissin Cp genome, where 31 codons in the 93 PCGs showed codon usage preference. Twenty-nine of these 31 preferred codons end in A or T. Similar conclusions have been reached in forsythia, rice, and other plants (Liu and Xue 2004; Wang et al. 2017a, b; Li et al. 2019). Correspondingly, the G- or C-ending codons did not show codon preference (RSCU value < 1), while the stop codons were biased towards TAA.

Fig. 4 The RSCU values of the 20 amino acids in the 93 PCGs of the A. julibrissin Cp genome and their different codon usages

Our results indicated that the GC content and the preference for A- or T-ending codons might be the main factors that determine the codon usage of the A. julibrissin Cp genes. The 93 unique PCGs contain 89,190 bp and encode 29,730 codons. The amino acid frequency of the A. julibrissin Cp genome was also calculated. Among all the codons used in the PCGs, 3133 (10.54%) encode leucine, which is the most commonly used amino acid in the A. julibrissin Cp genome. Cysteine was the rarest, encoded by only 367 (1.23%) codons.
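The codon total and the amino acid percentages above follow directly from the reported counts, as the short check below shows. The helper name `codon_percentage` is introduced only for this example.

```python
def codon_percentage(codon_count, total_codons):
    """Share of all codons that encode a given amino acid, in percent."""
    return round(100.0 * codon_count / total_codons, 2)


total_bp = 89_190              # combined length of the 93 PCGs (bp)
total_codons = total_bp // 3   # three bases per codon

print(total_codons)                            # 29730
print(codon_percentage(3133, total_codons))    # leucine: 10.54
print(codon_percentage(367, total_codons))     # cysteine: 1.23
```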

SSRs and LSRs of the A. julibrissin Cp genome

The software REPuter (Kurtz et al. 2001) was used to analyse the repeat sequences in the A. julibrissin Cp genome. In total, 30 forward repeats, 18 palindromic repeats, and one reverse repeat were found with lengths ≥ 30 bp (identity > 90%), and no complement repeats were detected (Table S6). Among the 49 repeats, 10 (20.4%) were 30–39 bp, 23 (46.9%) were 40–49 bp, 11 (22.4%) were 50–59 bp, 2 (4.1%) were 60–69 bp, and 3 (6.1%) were 70–81 bp, with the longest being 81 bp. Generally, repeats occur in non-coding regions (Nazareno et al. 2015; Yao et al. 2015), but 36.7% of the repeats were found in coding regions of the A. julibrissin Cp genome (Table S6).
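The four repeat classes differ in how the second copy relates to the first: identical (forward), read backwards (reverse), base-complemented (complement), or reverse-complemented (palindromic). The sketch below derives the expected partner sequence for each class; it is a didactic helper with a hypothetical name, `repeat_partner`, and does not reproduce REPuter's search.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_partner(unit, kind):
    """Return the sequence that pairs with `unit` for a given repeat class."""
    unit = unit.upper()
    if kind == "forward":
        return unit
    if kind == "reverse":
        return unit[::-1]
    if kind == "complement":
        return unit.translate(COMPLEMENT)
    if kind == "palindrome":
        # Reverse complement: the copy lies on the opposite strand.
        return unit.translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown repeat class: {kind}")
```

For the unit AACG, the partners are AACG (forward), GCAA (reverse), TTGC (complement), and CGTT (palindromic).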

Playing an important role in gene mutations, simple sequence repeats (SSRs) are widely distributed in the genomes of most species (Cavalier-Smith 2002). Previous studies have used them as valuable molecular markers for the study of species polymorphisms and population genetics (Xue et al. 2012; Hu et al. 2015). In this study, we analysed the occurrence, type, and distribution of SSRs in the Cp genome of A. julibrissin. The results showed that there were 149 SSRs in the complete genome (Table S7), which accounts for 4131 bp (2.35%) of the total sequence.

A large proportion of these SSRs were mononucleotide repeats, found in 98 cases. Di- (7), tri- (4), tetra- (3), and pentanucleotide (2) repeats occurred at lower levels. There were also 35 compound SSRs. Most of the repetitive sequences consisted of A and T nucleotides rather than tandem G or C repeats. Other studies have obtained similar results (Kuang et al. 2011; Qian et al. 2013). A total of 31 SSRs occurred in gene regions, including ndhA, ndhF, petB, petD, rpl16, rpoC1, rpoC2, rps16, rps19, trnV-UAC, ycf1, ycf3, and ycf4. We found that ycf1 (12) had the most SSRs. The SSRs were distributed across the entire A. julibrissin Cp genome, including the SSC, LSC, and both IR regions. These SSRs could be used as lineage-specific markers, which would guide evolution and genetic diversity studies.

Prediction of RNA editing sites

Among the 93 PCGs of the A. julibrissin Cp genome, we predicted 55 RNA editing sites in 20 of them (Table 3). The ndhB gene contained the most editing sites (13), a finding similar to previous studies (Freyer et al. 1995; Kahlau et al. 2006; Chateigner et al. 2007; Wang et al. 2017a, b). The ndhD gene was predicted to have eight editing sites; ndhF had five; matK and rpoB had four each; rpoC2 and ndhG had three each; rps14 and clpP1 had two each; and accD, atpA, atpF, ndhA, petB, psbE, psbF, rpoA, rpoC1, rps2, and rps16 had one each. All predicted editing sites involve the conversion from C to U. RNA editing is also common in the Cps and mitochondria of Spermatophyta (Bock 2000). The predicted editing sites were located only in the first position of the codon, with none found in the second or third positions. The amino acid changes at 32 editing sites, involving residues such as serine, phenylalanine, leucine, and tyrosine, can alter the acidity and polarity of the proteins. The conversion from serine to leucine was the most abundant among the predicted editing sites. In addition, we compared the predicted RNA editing sites of A. julibrissin with those of two other species of the same genus (A. bracteata and A. odoratissima). Their features were highly similar, including both the predicted site locations and the amino acid changes (Table S8). Previous studies have also reported this characteristic of RNA editing, a form of post-transcriptional regulation of gene expression (Jiang et al. 2011).

Table 3 The predicted RNA editing site in the A. julibrissin Cp genes

Comparative genome and phylogenetic analysis of A. julibrissin

To detect potential divergence between the A. julibrissin complete Cp genome and those of related species, we downloaded five previously reported Cp genome sequences of the Leguminosae family (A. odoratissima, S. saman, L. trichandra, Pyrus flexicaule, A. bracteata), as well as that of the model plant A. thaliana, from the NCBI. As shown in Table S9, these Cp genome sequences ranged from 154,478 bp (A. thaliana) to 178,887 bp (P. flexicaule) in length, and each part of the quadripartite structure is comparable among the selected Cp genomes. The total GC content of these Cp genomes was also similar (35–36%).

The highly conserved IR region plays an essential role in stabilizing the structure of the Cp genome (Maréchal and Brisson 2010; Fu et al. 2016). The expansion and contraction of the IR and SC boundary regions are usually viewed as the chief mechanisms behind the variation of Cp genome length in angiosperms (Chumley et al. 2006; Lei et al. 2016). The adjacent genes and the LSC/IRb/SSC/IRa boundaries of the A. julibrissin Cp genome were compared with those of the five other species in the Leguminosae family and the model plant A. thaliana (Fig. 5). Moreover, the Cp genome composition of the seven species was compared (Table S9), and expansions and contractions in the IR boundary regions were observed.

Fig. 5 Comparison of the borders of the LSC, SSC, and IR regions in Cp genomes of six species of Leguminosae and Arabidopsis thaliana. The number above the gene represents the distance between the ends of the genes and the junction sites. The arrows indicate the location of the distance

Unlike in A. thaliana, the rps19 gene of the five legume species (S. saman, L. trichandra, A. odoratissima, A. julibrissin, A. bracteata) was located at the LSC/IRb boundary, and all five had the rpl2 gene in the IRb region. The ndhF gene was expanded at the IRb/SSC boundary in A. julibrissin, and the ccsA gene was 188 bp away from the SSC/IRa boundary. This was similar to S. saman but differed from A. odoratissima and A. bracteata, indicating that the IRa region of A. julibrissin is partially contracted. In addition, the rps19 gene was found in A. bracteata, S. saman, and A. odoratissima, but not at the same position as in A. julibrissin. It is speculated that the presence of the rps19 gene in other genomes may be the result of gene duplication. The trnH gene was located in the LSC region in all six Leguminosae species, but not in A. thaliana. The distance from trnH to the LSC/IRa boundary ranged from 0 to 30 bp among these six species and was smallest in S. saman, A. bracteata, A. odoratissima, and A. julibrissin. In general, the IR/SC junctions among these five species in the family Leguminosae were similar, but there were certain differences when compared with A. thaliana. Our results again suggested that the Cp genomes of related species are conserved, whereas greater diversity may exist between species of different families (Fig. 5).

To further detect the differences in the Cp genome among related species and to identify whether gene rearrangements are present in the A. julibrissin Cp genome, we compared the homology of entire Cp sequences in five species from the family Leguminosae and A. thaliana using mVISTA (Frazer et al. 2004), with the A. julibrissin Cp genome as the reference genome (Fig. 6). The results indicated no genomic structural rearrangements in the selected Cp genomes; except for A. thaliana, the similarity among the other six Cp genomes was higher than 90%, indicating that they are highly conserved.

Fig. 6 Sequence alignment of five Cp genomes of Leguminosae and one Cp genome of Arabidopsis by mVISTA, with the annotation of A. julibrissin as the reference. The vertical scale indicates the percentage of identity, ranging from 50 to 100%. The horizontal axis indicates the coordinates within the chloroplast genome. Genome regions are color-coded as protein-coding, rRNA, tRNA, intron, and conserved non-coding sequences

To understand the phylogenetic relationships among A. julibrissin and 26 other related species (Table S10), the Cp genome sequences were aligned in MEGA-11 (version 11) (Tamura et al. 2021). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap (BS) test (1000 replicates) is shown next to the branches. As expected, A. julibrissin was placed in the genus Albizia, the Leguminosae subfamily, and was closely related to A. bracteata and A. odoratissima, with a 100% BS value (Fig. 7).

Fig. 7 Maximum likelihood phylogenetic tree of A. julibrissin with 26 Cp genomes of previously reported species as constructed by Cp genome sequences. Numbers on the nodes are bootstrap values from 1000 replicates

Conclusions

In this research, we constructed the complete Cp genome of A. julibrissin using Illumina HiSeq reads and other technical approaches. The results show that the Cp genome of A. julibrissin had a classical quadripartite structure of 175,922 bp in length, consisting of one LSC of 91,327 bp, one SSC of 5,145 bp, and two copies of IR regions of 39,725 bp each. Overall, the GC content of the Cp genome in this plant was 35.48%. The Cp genome contains 127 unique genes: 93 PCGs, 30 tRNA genes, and 4 rRNA genes.

A total of 93 coding genes of the A. julibrissin Cp genome were classified as Cp genes in the Leguminosae family. In the codon usage analysis, most of the Nc values were greater than 44, suggesting that codon usage bias in the A. julibrissin Cp genome is weak. In addition, we detected 149 SSRs, which could be valuable for studying genome evolution and diversity in other species if they can be used as lineage-specific markers. Most of the repetitive sequences consist of A and T nucleotides, while tandem G or C repeats are uncommon. Among the coding genes, the amino acid changes at 32 editing sites can lead to changes in the acidity and polarity of the proteins. The overall GC content, the boundaries of LSC/IRb/SSC/IRa, and the homology of the entire Cp sequence were also similar among the different Cp genomes included in the current research (A. odoratissima, S. saman, L. trichandra, P. flexicaule, and A. thaliana). The phylogenetic analysis of A. julibrissin and another 26 related species shows that A. julibrissin was placed within the genus Albizia, the Leguminosae subfamily, and was closely related to A. bracteata and A. odoratissima in the same genus, with a 100% BS value.

These results not only offer valuable evidence for clarifying the evolutionary history of A. julibrissin at the genetic level but also support further exploration of its genetic information and improved breeding of A. julibrissin.

Author Contributions Statement

SX, JZ, HH, CQ, XM, and BH designed the project and/or conducted aspects of the experimental work. JZ, HH, FM, XY, JW, and XG conducted the experiments and the collection of electronic resources. SX and HH supported this work financially and participated in its planning. JZ, SX, and HH wrote the manuscript. All authors edited the manuscript and approve its submission in the current form.