Introduction

The chloroplast genome is a central component of the plant plastid genetic system. Its highly conserved, circular, quadripartite double-stranded structure consists of a large single-copy region (LSC; 80−90 kb) and a small single-copy region (SSC; 16−27 kb), separated by two inverted repeat regions (IRs) of 20–28 kb each. This configuration is associated with a low mutation rate during plant evolution. Because of its stable gene content, simple structure, lack of recombination, and mostly maternal inheritance, the chloroplast genome contains a great deal of valuable biological information and is an ideal material for phylogenetic and evolutionary studies [1]. With the rapid development of high-throughput sequencing technology in recent years, researchers have been able to extract and sequence chloroplast genomes from plants efficiently, greatly accelerating chloroplast genome sequencing. Chloroplast genome sequences have been widely used as the basis of phylogenetic analysis, and the evolutionary history of many plant groups has been explored in depth with their support [2].

Bignoniaceae comprises a total of 650 species in 120 genera, including Catalpa, Campsis, Adenocalymma, Amphilophium, and Anemopaegma [3]. Plants of this family, which are mainly trees, shrubs, or woody vines, are widely distributed in the tropics and subtropics and are important tropical plants. The vast majority of Bignoniaceae species have very large and beautiful flowers and variously shaped exotic fruits, and they are cultivated in botanical gardens around the world as ornamental, landscape, and street trees and as ideal shade pergola plants for the tropics [4]. Campsis grandiflora is a climbing vine of the genus Campsis, family Bignoniaceae. Unlike Campsis radicans, a plant of the same genus that originates from North America, C. grandiflora is mainly distributed in China and Japan and is cultivated in Vietnam, India, and Pakistan [5]. C. grandiflora is used for ornamental and medicinal purposes. Pharmacological studies have shown that it has antibacterial, antithrombotic, and antitumor effects [6]. According to the Chinese Pharmacopoeia (2015 Edition) [7], C. grandiflora promotes blood circulation, and its flower is used as a diuretic, in meridian-based treatment, and to treat injuries from falls [8].

Although the family Bignoniaceae has numerous species, only slightly more than 40 chloroplast genomes have been recorded [9]. The chloroplast genome of the genus Campsis, an important branch of Bignoniaceae, has not yet been studied. In the present study, we obtained the chloroplast genome sequence of C. grandiflora by high-throughput sequencing; characterized gene content, gene loss, IR borders, and genome rearrangements within the family Bignoniaceae; and obtained phylogenetic information about C. grandiflora and its closely related species within the family. In summary, the results of this study provide valuable information for elucidating the evolutionary history of species in Bignoniaceae.

Materials and methods

Plant material, DNA purification, and genome sequencing

The C. grandiflora sample was collected in Huazhong Medicinal Botanical Garden, China (109.76 E, 30.18 N), with the voucher sample ID implad201808016 (IMPLAD, China). Whole-genome DNA of C. grandiflora was extracted using a plant genomic DNA kit (Tiangen Biotech, Beijing, China). Library construction and genome sequencing were completed on the HiSeq 2500 platform (Illumina, San Diego, CA, USA) [10].

Chloroplast genome assembly and annotation

The raw sequence data were assembled into a complete chloroplast genome with NOVOPlasty (ver. 4.0.1) [11].

Genome annotation and repeat analysis were conducted using CPGAVAS2, DB 2 [12]. For the annotation of tRNA genes, both tRNAscan-SE and ARAGORN were first used to predict tRNA genes. The tRNAscan-SE predictions were kept for genes without introns, and the ARAGORN predictions were kept for genes with introns. The retained tRNA genes were then searched against tRNAdb on the basis of sequence similarity (http://trna.bioinf.uni-leipzig.de/DataOutput/Search), and each tRNA gene was named after its best BLAST hit. As a result, trnE-UUC, trnS-CGA, and trnM-CAU were curated as trnI-GAU, trnG-UCC, and trnI-CAU, respectively.

Phylogenetic analysis

To determine the phylogenetic position of C. grandiflora in Bignoniaceae, we used the maximum likelihood method [13] to construct an evolutionary tree with the cpREV model in IQ-TREE [14] from 56 shared protein sequences. The data set comprised 45 species of the family Bignoniaceae, including the genera Adenocalymma [15], Neojobertia [16], Pleonotoma [16], Amphilophium [17], Anemopaegma [18], Tanaecium [19], Dolichandra [20], Oroxylum [21], Catalpa [22, 23], Incarvillea [24,25,26], and Spathodea [27], and two outgroups (Paulownia tomentosa [28] and Arabidopsis thaliana [29]). For phylogenetic tree construction, we used PhyloSuite (version 1.2.2) [30] to extract the shared protein-coding gene sequences from the GenBank files of the 47 species. We then conducted multiple sequence alignment of the shared protein-coding genes using MAFFT (v7.313). The MAFFT alignments were concatenated, and the conserved blocks of the multiple alignment were selected with Gblocks (v0.91b) for phylogenetic analysis. After obtaining the contree file, we visualized the evolutionary tree using iTOL (Interactive Tree of Life) [31].
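
The concatenation step in this workflow can be illustrated with a minimal Python sketch. It is not the PhyloSuite implementation; the function name, the dictionary-based input format, and the toy sequences are assumptions made only for illustration. Each per-gene alignment maps a species name to its aligned sequence, and only species present in every gene are kept.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Each alignment maps species name -> aligned sequence. Only species found
    in every alignment are retained, so that all rows have the same length.
    """
    if not gene_alignments:
        return {}
    shared = set(gene_alignments[0])
    for aln in gene_alignments[1:]:
        shared &= set(aln)
    supermatrix = {name: "" for name in sorted(shared)}
    for aln in gene_alignments:
        # All retained sequences of one gene must share the alignment length.
        lengths = {len(aln[name]) for name in shared}
        if len(lengths) > 1:
            raise ValueError("sequences in one alignment must have equal length")
        for name in supermatrix:
            supermatrix[name] += aln[name]
    return supermatrix


# Toy data, not taken from the study.
genes = [
    {"sp1": "MK-L", "sp2": "MKAL", "sp3": "MKAL"},
    {"sp1": "AT", "sp2": "A-"},
]
print(concatenate_alignments(genes))
```

In this toy example, sp3 is dropped because it is missing from the second gene, which mirrors the restriction to genes and species shared across the data set.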

Simple sequence repeat (SSR) and repeat analysis

SSR loci and their distribution were identified using MISA (MIcroSAtellite identification tool) [32]. Long tandem repeats (matching parameter = 2, mismatching and indel parameter = 7, minimum identity score = 50, maximum repeat period = 500, minimum repeat size = 30 bp, repeat unit similarity ≥ 90%) were identified using Tandem Repeats Finder [33]. Long interspersed repeats (repeat length ≥ 30 bp, Hamming distance = 3) were identified using Vmatch (large-scale sequence analysis software) [34].
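
To make the notion of an SSR search concrete, the following is a minimal regular-expression sketch of perfect microsatellite detection. It is not MISA itself, and the minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for illustration; they are not the thresholds used in this study.

```python
import re
from typing import List, Tuple

# Hypothetical minimum repeat counts per motif length (not from the paper).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(MIN_REPEATS.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA".
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a poly-A run and an AT dinucleotide repeat.
toy = "GC" + "A" * 12 + "GCGG" + "AT" * 7 + "GG"
print(find_ssrs(toy))
```

The sketch only finds perfect repeats; compound or interrupted microsatellites, which dedicated tools can report, are outside its scope.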

Synteny analysis

In this study, we compared 45 Bignoniaceae species with A. thaliana to perform gene scale dot-plot analysis with Gepard (ver. 1.40 final) [35].

Genome rearrangements were identified between the chloroplast genome of A. thaliana and those of A. oligoneuron (NC_037232.1), A. gnaphalanthum (NC_042903.1), T. tetragonolobum (NC_027955.1), A. paniculatum (NC_042918.1), I. compacta (NC_050666.1), I. sinensis (NC_051523.1), N. candolleana (NC_036503.1), A. allamandiflorum (NC_036494.1), A. biternatum (NC_036496.1), A. marginatum (NC_037457.1), and C. grandiflora (MW430049), using BLASTN with an E-value cutoff of 1e-10. The homologous regions and gene annotations were visualized using the genome comparison visualizer Easyfig (ver. win2.1) [36].
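
As an illustration of how inverted blocks can be read from pairwise BLASTN hits, the sketch below parses tabular output and keeps hits that pass the E-value cutoff of 1e-10. It assumes the default 12-column tabular format (`-outfmt 6`), which the text does not specify, and relies on the BLASTN convention that minus-strand hits are reported with subject start greater than subject end. The function names and the toy rows are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(text: str, max_evalue: float = 1e-10) -> List[Hit]:
    """Parse default 12-column BLAST tabular text and filter by E-value."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 7-11 (1-based): qstart, qend, sstart, send, evalue.
        hit = Hit(int(fields[6]), int(fields[7]), int(fields[8]),
                  int(fields[9]), float(fields[10]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


def inverted_blocks(hits: List[Hit]) -> List[Hit]:
    """Hits on the minus strand of the subject, i.e. candidate inversions."""
    return [h for h in hits if h.sstart > h.send]


# Toy rows, not real results: one forward hit, one reverse hit, one weak hit.
rows = [
    ["q", "s", "99.0", "1000", "5", "0", "1", "1000", "1", "1000", "0.0", "1800"],
    ["q", "s", "98.0", "500", "10", "0", "2000", "2499", "3499", "3000", "1e-150", "900"],
    ["q", "s", "80.0", "30", "6", "0", "5000", "5029", "7000", "7029", "0.5", "20"],
]
text = "\n".join("\t".join(r) for r in rows)
print(inverted_blocks(parse_outfmt6(text)))
```

A reverse-strand hit alone does not prove a rearrangement, because the IR regions naturally produce reverse matches; such hits are only candidates for closer inspection.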

Junction sites analysis

From the 45 Bignoniaceae species, we selected 11 representative species with genomic structural variations for detailed analysis and used their GenBank files to obtain the gene distribution at the LSC, SSC, IRa, and IRb borders. The locations of genes on the boundaries were visualized using IRscope [37].

Non-synonymous substitution (Ka)/synonymous substitution (Ks) analysis

We used the adaptive branch-site random effects likelihood (aBSREL) model of the HyPhy Vision software to perform the selective pressure analysis [38] among 45 species in Bignoniaceae. We first retrieved the corresponding chloroplast genome GenBank and FASTA files from NCBI by accession number. Then, 63 clusters of orthologous genes were obtained among these species to calculate Ka/Ks. The output was saved in aBSREL.json format. In the present study, we selected genes with a P value < 0.05. The detailed results can be viewed in the web version of aBSREL.

Results

Genome organization and compositions

The chloroplast genome of C. grandiflora (GenBank accession no.: MW430049) is a typical circular DNA molecule with a total length of 154,303 bp. It has a conserved quadripartite structure consisting of an LSC region, an SSC region, and a pair of IR regions, with lengths of 85,064, 18,009, and 25,615 bp, respectively (Fig. 1). The G/C content of the chloroplast genome is 38.09%. The G/C content of the IR regions (43.17%) is higher than that of the SSC (32.74%) and LSC (36.16%) regions.
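
The region lengths and G/C contents reported above can be checked against each other with a short calculation. The sketch below is only a consistency check on the published numbers; the function names are illustrative, and `gc_content` shows how the percentage would be computed from a sequence string.

```python
from typing import Iterable, Tuple


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome with one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


def weighted_gc(regions: Iterable[Tuple[int, float]]) -> float:
    """Length-weighted mean G/C percentage over (length_bp, gc_percent) pairs."""
    regions = list(regions)
    total = sum(length for length, _ in regions)
    return sum(length * gc for length, gc in regions) / total


def gc_content(seq: str) -> float:
    """G/C percentage of a nucleotide sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


total_bp = quadripartite_length(85_064, 18_009, 25_615)
overall_gc = weighted_gc([(85_064, 36.16), (18_009, 32.74), (2 * 25_615, 43.17)])
print(total_bp, round(overall_gc, 2))
```

The sum of the four regions reproduces the reported total of 154,303 bp, and the length-weighted mean of the regional G/C contents agrees with the reported genome-wide value of 38.09% after rounding.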

Fig. 1

Map of the chloroplast genome of Campsis grandiflora. Four rings are observed in the figure: from the center outwards, the red and green arcs in the first circle represent the forward and reverse repeating sequence, respectively. The short bars in the second circle represent tandem repeats. The short bar in the third circle represents the microsatellite repetition sequence. The fourth circle is the genetic structure and location map of the chloroplast genome. Genes with different functions are shown in different colors. (Color figure online)

Gene content

The chloroplast genome of C. grandiflora encodes 110 unique genes, including 79 protein-coding genes, 27 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes (Table S1). Among these genes, eight protein-coding genes (rps12, ndhB, rpl2, rpl23, rps7, ycf1, ycf2, and ycf15), seven tRNA genes (trnA-UGC, trnE-UUC, trnL-CAA, trnM-CAU, trnN-GUU, trnR-ACG, and trnV-GAC), and four rRNA genes (rrn16S, rrn23S, rrn5S, and rrn4.5S) are located in the IR regions. Twelve protein-coding genes (rps16, atpF, rpoC1, petB, petD, rpl16, rpl2(+), rpl2(−), ndhB(+), ndhB(−), and ndhA) contain one intron, and one protein-coding gene (ycf3) contains two introns. Eight tRNA genes (trnK-UUU, trnG-UUC, trnL-UAA, trnV-UAC, trnI-GAU(−), trnI-GAU(+), trnA-UGC(−), and trnA-UGC(+)) contain one intron (Table S2). We also found that the clpP gene has become a pseudogene that cannot encode a complete protein.

The coding sequences (CDS) in the chloroplast genome of C. grandiflora total 79,170 bp, accounting for 51.31% of the genome length. The rRNA genes total 9,388 bp (6.08% of the genome length), and the tRNA genes total 2,811 bp (1.82%). The non-coding regions of the C. grandiflora chloroplast genome mainly include introns and intergenic spacers, which account for the remaining 40.79% of the genome length.
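
The proportions above follow directly from the feature lengths and the genome length of 154,303 bp. The following sketch recomputes them; it assumes, as the reported figures imply, that the CDS, rRNA, and tRNA lengths do not overlap, so that the non-coding share is the remainder. The function name is illustrative.

```python
from typing import Dict

GENOME_BP = 154_303
FEATURE_BP = {"CDS": 79_170, "rRNA": 9_388, "tRNA": 2_811}


def percent_shares(feature_bp: Dict[str, int], genome_bp: int) -> Dict[str, float]:
    """Percentage of the genome covered by each feature class and by the rest."""
    shares = {name: 100 * bp / genome_bp for name, bp in feature_bp.items()}
    # Assumption: feature classes do not overlap, so the remainder is non-coding.
    shares["non-coding"] = 100 - sum(shares.values())
    return {name: round(value, 2) for name, value in shares.items()}


print(percent_shares(FEATURE_BP, GENOME_BP))
```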

SSR and repeat sequences analysis

Repeat sequences are sequence units that occur in multiple copies in the genome. These repeats might play a significant role in the evolution of the chloroplast genome and can be used as molecular markers for species identification and molecular breeding. Repeat sequences are classified into three forms according to their length and arrangement: SSRs, long tandem repeats, and long interspersed repeats [1].

An SSR, also called a microsatellite, is a stretch of DNA that consists of multiple copies of a basic repeat unit of 1–6 nucleotides. SSRs are widely distributed throughout the genome, and their length is usually below 200 bp. We analyzed the number, type, size, and location of SSRs in the chloroplast genome of C. grandiflora. In total, 59 SSRs were identified. These SSRs consist of mononucleotide and dinucleotide repeat units (Table S3); no tri-, tetra-, penta-, or hexanucleotide repeat units were found. Most of the SSRs were found in intergenic spacers (35 SSRs), 9 SSRs were located in coding sequences, and 7 SSRs were located in the introns of particular genes (Table S4).

Long tandem repeats are sequences in which a unit is repeated consecutively at one locus. A total of 40 tandem repeats were found that satisfy two conditions: a total length over 20 bp and a similarity between repeat units of at least 90% (Table S5). Their properties are also listed in the table. More than half (22) of the long tandem repeats were located in intergenic spacers (IGS), 16 were located in CDS, and the remainder were located in gene introns.

Interspersed repeats are another kind of repeat sequence, distinct from tandem repeats, and include palindromic and direct repeats. With an e-value below 1E-4 as the threshold, 49 interspersed repeats were identified in the C. grandiflora chloroplast genome. Notably, all of them are of the D type (direct repeats). These interspersed repeats all fall within positions 62,500–63,700 around the accD gene, and almost all of them are located in the non-coding region, except for one repeat whose repeat unit I lies in the CDS of accD (Table S6).

Phylogenetic analysis

To obtain the phylogenetic information of C. grandiflora and make valid hypotheses about the homology between different lineages of Bignoniaceae, we used 45 Bignoniaceae species and 2 outgroup species chloroplast genomes to construct the phylogenetic tree of Bignoniaceae (Fig. 2).

Fig. 2

Evolutionary tree of the family Bignoniaceae. The phylogenetic analysis included 45 species within the family and 2 outgroup species. N. candolleana and P. albiflora, interspersed among 13 Adenocalymma species, converged into a large clade together with 11 Amphilophium species, 8 Anemopaegma species, T. tetragonolobum, and D. cynanchoides. This large clade subsequently converged with two species of the genus Catalpa and eventually joined, at the base of the evolutionary tree, with C. grandiflora, three species of the genus Incarvillea, and T. capensis. According to the evolutionary tree, C. grandiflora diverged relatively early and has a close genetic relationship with Incarvillea. The structure type of each species, as defined in Fig. 4 and S3, is shown on the right of the panel

The tree shows that two primary branches diverged from the root. Fifteen species from the genera Adenocalymma, Neojobertia, and Pleonotoma formed one branch. Eleven species of the genus Amphilophium formed a branch, as did eight species of the genus Anemopaegma. The genera Amphilophium, Anemopaegma, Tanaecium, and Dolichandra then joined Adenocalymma, Neojobertia, and Pleonotoma in a larger branch. This larger branch was in turn joined by the genus Oroxylum and then by the genus Spathodea. Two species of the genus Catalpa formed a branch. Together, the genera mentioned above make up the upper major branch of the evolutionary tree of the family Bignoniaceae. In the remaining part of the tree, three species of the genus Incarvillea formed a branch, which was then joined by Tecomaria. Finally, the genera Campsis, Incarvillea, and Tecomaria converged into the other major branch of the tree. These results indicate that the closest sister genera of Campsis in Bignoniaceae are Incarvillea and Tecomaria.

In the phylogenetic tree of the family Bignoniaceae, the bootstrap support values of all branches were at least 47%, which we took as support for the reliability of the tree topology. The results of the phylogenetic analysis are consistent.

Synteny analysis

To identify genome rearrangements in Bignoniaceae, we selected the chloroplast genome sequences of C. grandiflora and 44 other species belonging to Bignoniaceae for synteny analyses (Table 1). These 44 species belong to Adenocalymma (13), Anemopaegma (8), Amphilophium (11), Catalpa (2), Dolichandra (1), Oroxylum (1), Pleonotoma (1), Spathodea (1), Incarvillea (3), Tanaecium (1), Tecomaria (1), and Neojobertia (1) (Table 1). According to whether the structure was inverted and whether the IR region was expanded relative to A. thaliana, these genomes were classified into 11 types.

The first group includes Anemopaegma acutifolium, Anemopaegma arvense, Anemopaegma glaucum, Anemopaegma foetidum, Anemopaegma album, Anemopaegma chamberlaynii, Anemopaegma prostratum, and Anemopaegma oligoneuron, all of which belong to Anemopaegma. Compared with A. thaliana, the chloroplast genomes of this group have an inversion in the LSC region, which results in the ycf2 gene being transcribed counterclockwise. In addition, the IR region has expanded, so that the IRs contain truncated rps15, ycf1, the genes of the ancestral angiosperm IR region (trnR, trnN, rrn5, rrn4.5, rrn23, trnA, trnI, rrn16, trnV, rps12, rps7, ndhB, trnL, ycf2, trnI, rpl23, rpl2), rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, rpoA, petD, and truncated petB (Fig. 3A). The second group includes Adenocalymma acutissimum, Adenocalymma trifoliatum, Adenocalymma aurantiacum, Adenocalymma bracteatum, Adenocalymma divaricatum, Adenocalymma peregrinum, Adenocalymma cristicalyx, Pleonotoma albiflora, Adenocalymma pedunculatum, Amphilophium gnaphalanthum, Amphilophium lactiflorum, Amphilophium chocoense, Amphilophium cuneifolium, Dolichandra cynanchoides, Oroxylum indicum, Spathodea campanulata, Catalpa bungei, and Catalpa ovata (Fig. 3B), whose chloroplast genome structure is similar to that of A. thaliana except for the duplication of truncated rps15 and ycf1 in the IR region. The second group contains the most species of Bignoniaceae. The third group includes Amphilophium carolinae, Amphilophium dolichoides, Amphilophium steyermarkii, Amphilophium dusenianum, Amphilophium ecuadorense, Amphilophium paniculatum, and Amphilophium pilosum, whose chloroplast genome structure is similar to that of the first group but without the small inversion in the LSC region (Fig. 3C). The fourth group includes Incarvillea compacta, whose IR region contains truncated rps15, ycf1, and the ancestral angiosperm IR region and whose LSC region contains a large inversion (Fig. 3D). The fifth group includes Incarvillea sinensis, whose IR region contains ndhA, ndhH, rps15, ycf1, and the ancestral angiosperm IR region (Fig. 3E).

The sixth group includes Adenocalymma hatschbachii and Neojobertia candolleana, whose genomes contain structural variation in the IR region (Fig. S2F). The seventh group includes Adenocalymma allamandiflorum, whose genome includes an inversion in the LSC region (Fig. S2G). The eighth group includes Adenocalymma biternatum and Adenocalymma nodosum, whose genomes contain an inversion in the LSC region (Fig. S2H). The tenth group includes Adenocalymma marginatum, whose genome contains the 50 kb inversion in the LSC region (Fig. S2I). The IR regions of the seventh, eighth, and tenth groups are similar to the IR region of the second group. The ninth group includes C. grandiflora, whose genome contains an inversion in the LSC region (Fig. 4A). We next performed a genome comparison using Gepard (ver. 1.40 final). The visualization shows that the rearrangement occurred at 48,772–73,286 bp in the C. grandiflora chloroplast genome (Figure S1), which results in an incomplete clpP gene (Fig. 4B). The eleventh group includes Tanaecium tetragonolobum, Incarvillea arguta, and Tecomaria capensis, whose genomes contain no inversion (Fig. S2J). These results suggest that inversions occurred frequently in the evolution of Bignoniaceae.

Fig. 3

Comparative genomic analyses of A. thaliana and five representative species of the family Bignoniaceae. The chloroplast genome of A. thaliana was aligned with those of the five species. Each horizontal black line represents one genome. The species names, accession numbers, and structure types are shown to the right of the corresponding line. The conserved regions are bridged by lines. Panels A to E show the five types of genome structure, represented by A. oligoneuron, A. gnaphalanthum, A. paniculatum, I. compacta, and I. sinensis, respectively. The blue and red bars represent the identity of forward and reverse matches, respectively

Fig. 4

Synteny analysis of C. grandiflora and A. thaliana. In panel A, each horizontal black line represents a genome. The species names are shown to the right of the corresponding line. The green arrows represent genes, and the direction of each arrow indicates where the gene starts and ends on the genome. In the alignment of the two sequences, the conserved regions are bridged by lines: matching genes in the same direction are connected by blue lines, and matching genes in the reverse direction are connected by red lines. The darker the color, the better the match. Panel B shows the details of the rearranged region of the C. grandiflora genome

Table 1 Comparison of IR contents in the chloroplast genomes of the structurally variant species in family Bignoniaceae


Comparative analysis of gene loss in family Bignoniaceae

We examined the correlation between gene loss and rearrangement of genome structure by compiling detailed statistics on protein-coding gene loss in the Bignoniaceae species included in the phylogenetic tree (Fig. 2). The number of genes in the eight species of the genus Anemopaegma was highly conserved and consistent. The accD gene was lost in the genus Incarvillea. The clpP gene was lost in I. arguta and T. tetragonolobum and had an incomplete structure in C. grandiflora. The ycf15 gene was found only in T. tetragonolobum, D. cynanchoides, S. campanulata, C. bungei, C. ovata, C. grandiflora, I. compacta, and T. capensis. In general, most of the gene loss occurred in the genera Incarvillea and Tanaecium.

Ka/Ks selective pressure analysis

In genetics, Ka/Ks (or dN/dS) is the ratio of the non-synonymous substitution rate (Ka) to the synonymous substitution rate (Ks). This ratio can be used to determine whether selective pressure acts on a protein-coding gene [39]. Nucleotide variations that do not lead to amino acid changes are called synonymous mutations, whereas those that change the amino acid are called non-synonymous mutations. Generally, synonymous mutations are not subject to natural selection, whereas non-synonymous mutations are. Evolutionary analysis therefore requires estimating the rates at which synonymous and non-synonymous mutations occur [39].
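
The distinction between synonymous and non-synonymous changes can be illustrated with a small sketch that classifies codon differences between two aligned coding sequences. This is a deliberately simplified count, not the aBSREL model and not a full Ka/Ks estimator, since it does not normalize by the number of synonymous and non-synonymous sites. It assumes the standard genetic code; the function name and the toy sequences are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count differing codons that are synonymous vs non-synonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        # Identical codons and codons with gaps or ambiguity codes are skipped.
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Toy example: CTT -> CTC keeps leucine; AAA -> AGA changes lysine to arginine.
print(classify_codon_differences("ATGCTTAAA", "ATGCTCAGA"))
```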

In the present study, we used the phylogenetic tree (Fig. 2) as species reference and utilized the aBSREL model of software Hyphy for the selection pressure analysis of protein-coding genes (Table S7). Six genes were positively selected, including ndhG, rbcL, rpl22, rpl23, rps12, and rps15. In species A. bracteatum, the ndhG gene is positively selected. In species A. glaucum and A. divaricatum, the rbcL gene was positively selected. The rpl22 gene was positively selected in species A. steyermarkii and D. cynanchoides. In species A. allamandiflorum and A. chamberlaynii, rpl23 gene was positively selected. In C. ovata, rps12 and rps15 were positively selected. In species C. grandiflora, rps15 was positively selected.

IR expansion and contraction

To unravel the gene distribution at the junction sites and compare C. grandiflora with other species with rearranged genome structures in the family Bignoniaceae, we visualized the gene distribution with IRscope (Fig. S3).

In the visualization, four vertical bars divide the complete genome into five parts: LSC, IRb, SSC, IRa, and LSC. In most species of Bignoniaceae, except for T. tetragonolobum, C. grandiflora, I. arguta, I. sinensis, and T. capensis, the rps15 gene crosses the JSA between SSC and IRa (Table 1 and Fig. S3). In I. sinensis, ndhF crosses the JSB between the IRb and SSC regions. Notably, significant differences were observed in the lengths of the SSC and IR regions among species of the genus Incarvillea. The SSC region in I. sinensis was only 8,666 bp in length, and the IR region was 35,394 bp in length, whereas in I. compacta the SSC region reached 21,925 bp. The lengths of genomic regions also differed within the genus Amphilophium. In species of the third structure type, the gene that crosses the border between IRb and LSC is petD, whereas in A. chocoense and A. cuneifolium the corresponding gene is rpl2. The expansion and contraction of the IR region led to differences in IR length. For example, the IR regions of A. paniculatum and A. oligoneuron reach 37,372 and 39,614 bp, respectively, which are much longer than that of A. cuneifolium (27,814 bp). In the genus Anemopaegma, the expansion of the IR region to the petB gene leads to the longest IR region in the order Lamiales. These results suggest that the contraction and expansion of the IR region accompanied the evolution of the family.

Discussion

In the current study, we extracted and sequenced the chloroplast genome of C. grandiflora. The raw data were assembled and annotated with the relevant tools, and the complete chloroplast genome sequence was obtained. Furthermore, a phylogenetic analysis of C. grandiflora was performed. We identified a rearranged structure in the genome of C. grandiflora compared with that of A. thaliana (Fig. 4). Synteny analyses between species of the family Bignoniaceae and A. thaliana were also conducted. This information could provide a new direction for chloroplast genome research on C. grandiflora.

Special distribution of interspersed repeated sequences in accD gene

The analysis of repeat sequences revealed a distinctive pattern of interspersed repeats. In comparison with other species in this family, the interspersed repeats in the C. grandiflora chloroplast genome are markedly concentrated and uniform. All of the interspersed repeats were distributed in the coding region of the accD gene, concentrated within 62,000–64,000 bp. In addition, all of them are direct repeats; no palindromic repeats were found (Table S6).

The acetyl-CoA carboxylase subunit gene accD is present in the plastids of most flowering plants, including non-photosynthetic parasites. It encodes the β-carboxyltransferase subunit of acetyl-CoA carboxylase and thereby participates in plant metabolism. Previous studies on tobacco have shown that if the accD gene is knocked out or disrupted and cannot be expressed in plastids, leaf development is severely affected: the loss of tissue cells halts leaf division and differentiation, causing the failure of photosynthesis and the death of the plant. Therefore, accD is an essential gene in plants. In the present study, the special distribution of interspersed repeats raises the possibility of developing molecular markers from this unique sequence in the gene coding region. In addition, statistics and analysis of the locations of different repeat sequence families in different genes may reveal new interspecies relationships or evolutionary processes. These directions are expected to be pursued in future research.

Phylogenetic tree

Based on the distribution of species in the phylogenetic tree, the genus Adenocalymma is distantly related to the genus Campsis, whereas the genera Incarvillea, Tecomaria, and Catalpa are more closely related to it. Because C. grandiflora is located at the base of the whole tree, its divergence appears to have occurred early in the evolution of Bignoniaceae.

IR expansion and contraction

The results showed that the identity and location of the boundary genes varied with the lengths of the genomic regions (Fig. S3). Therefore, variation in the length of genomic regions leads to differences in the genes located at the boundaries. In both C. grandiflora and T. tetragonolobum, the ycf1 gene was located at JSB and JSA, whereas rps19 was located in the LSC region in C. grandiflora but crossed the JLB in T. tetragonolobum.

Systematic analysis of genome rearrangement that occurred in Bignoniaceae

We examined whether other species in Bignoniaceae had undergone genome rearrangement by analyzing the 44 other species from the phylogenetic tree with Gepard (ver. 1.40 final). In total, we identified 11 genomic structure types in the chloroplast genomes of 45 species of Bignoniaceae and used Easyfig to visualize them (Fig. 3 and Fig. S2). Eight species of the genus Anemopaegma share the same genomic rearrangement [18]. Combining the gene content of the IR region with the results of the synteny analysis (Fig. 3, Fig. S3 and Table 1), the 8 Anemopaegma species had the same genome structure and a highly conserved gene number. This property can be considered a characteristic of the genus Anemopaegma. The second structure type contains the most species of Bignoniaceae, and the IR regions of the seventh, eighth, and tenth groups are similar to that of the second group. We therefore propose that the second structure type is located at the base node of evolution. The chloroplast genome of C. grandiflora contains an inversion in the LSC region (Fig. 4A). The rearrangement occurred at 48,772–73,286 bp in the C. grandiflora chloroplast genome (Fig. S1), which results in an incomplete clpP gene (Fig. 4B). Among species of the genus Incarvillea, the gene content of the IR region also differed significantly, which indicates rapid variation within the genus.

Conclusions

In the present study, we sequenced, assembled, and annotated the complete chloroplast genome of C. grandiflora, filling the gap in chloroplast genome information for the genus Campsis. The phylogenetic analysis provides phylogenetic information on Bignoniaceae and on the overall evolutionary history of 45 species of the family, and it will provide important phylogenetic information on C. grandiflora. The repeat sequence analysis revealed the characteristics of the repeats in the genome. The Ka/Ks analysis identified genes under positive selection in Bignoniaceae. Through synteny analysis, we found that the chloroplast genome of C. grandiflora has an inverted rearrangement structure, and we identified and sorted out 11 rearrangement structure types among the available chloroplast genomes of Bignoniaceae. Gene loss analysis was used to examine the relationship between rearrangement structure and variation in gene number.

The Bignoniaceae family includes many species, but limited information is currently available. The results of this study are based on all the released chloroplast genome sequences available so far. With the acceleration of sequencing progress, the database of Bignoniaceae will be enriched day by day in the future, and more information will be discovered.