Introduction

The genus Actinidia, commonly known as kiwifruit or ‘the king of fruits’, includes economically important horticultural species, such as A. chinensis and A. chinensis var. deliciosa, that are extensively cultivated worldwide (Testolin et al. 2016). Actinidia, together with its two sister genera, Clematoclethra and Saurauia, belongs to the family Actinidiaceae in the order Ericales, an early-diverging lineage of the asterids. Based on the morphological characteristics of fruit, pith and hair, Actinidia has been classified into four infrageneric sections, Leiocarpae (Lei), Maculatae (Mac), Stellatae (Ste), and Strigosae (Str) (Chat et al. 2004; Testolin et al. 2016). Because this traditional classification poorly reflects the actual relationships among Actinidia species, more accurate approaches are needed to define their evolutionary lineages (Liu et al. 2017; Tang et al. 2019b).

Molecular approaches have also been employed to establish phylogenetic relationships among Actinidia taxa, using RAPD (randomly amplified polymorphic DNA) markers (Huang et al. 2002), ITS (internal transcribed spacer) sequences (Li et al. 2002), and/or sequence fragments from the chloroplast and mitochondrial genomes (Chat et al. 2004). Because these studies relied on a few markers carrying limited nucleotide information rather than on genome-wide sequences, the reconstructed relationships remained incompletely resolved or weakly supported. Following the public release of the nuclear genome of A. chinensis “Hongyang” (Huang et al. 2013), an improved phylogenetic tree of 26 Actinidia species was reconstructed using genome-wide SNPs (single nucleotide polymorphisms); it resolved five main groups: the A. chinensis complex, the A. arguta complex, A. polygama, the A. rufa clade, and other taxa with hairy and/or spotted fruit (Liu et al. 2017). More recently, a comprehensive phylogeny of 59 Actinidia taxa was reconstructed from four noncoding intergenic sequences of the chloroplast genome alone (Tang et al. 2019b).

However, the subdivisions of Actinidia based on molecular phylogenies conflict with the morphological classification. Molecular phylogenetic reconstructions showed that the four morphologically defined infrageneric sections are not monophyletic, probably because of natural interspecific hybridization and introgression facilitated by the sympatric distributions of Actinidia species (Chat et al. 2004; Liu et al. 2017).

Compared with nuclear genomes, plant chloroplast genomes are more suitable for resolving phylogenetic relationships in complex plant families, owing to their mode of inheritance, conserved genome structure and small size (Martin et al. 2005; Daniell et al. 2016). Land plant chloroplast genomes are mainly maternally inherited and have a highly conserved quadripartite structure: a large single-copy (LSC) region and a small single-copy (SSC) region separated by two inverted repeat regions (IRa and IRb) (Daniell et al. 2016).

In this study, we used the chloroplast genome sequences of 137 Ericales species downloaded from the NCBI genome database (http://www.ncbi.nlm.nih.gov/genome), including those of 25 Actinidia taxa: A. zhejiangensis (Ai and Liu 2019), A. callosa var. henryi (Wu et al. 2019), A. callosa var. strigillosa (Liu et al. 2020), A. chinensis (Yao et al. 2015), A. chinensis var. deliciosa (Yao et al. 2015), A. chinensis var. setosa (Lin et al. 2019), A. lanceolata (Zhang and Liu 2019), A. arguta (Li et al. 2018; Lin et al. 2018), A. arguta var. giraldii (Ding et al. 2021), A. eriantha (Tang et al. 2019a), A. kolomikta (Lan et al. 2017), A. polygama (Wang et al. 2016), A. tetramera (Wang et al. 2016), A. rufa (Kim et al. 2018), A. valvata (Chen et al. 2020; Lin et al. 2020), A. cylindrica var. cylindrica, A. cylindrica var. reticulata, A. styracifolia (Yang et al. 2020), A. macrosperma (Chen et al. 2019), A. fulvicoma (Zhang et al. 2019), A. hubeiensis, A. hemsleyana (Xiaoqiong et al. 2021), A. indochinensis, A. latifolia, and A. rubus (Xu et al. 2020). Chloroplast genome characteristics, divergent regions and evolutionary lineages were explored through comprehensive genome-wide comparative analyses of genome structure, gene organization, boundaries between the IR, SSC and LSC regions, SSRs (simple sequence repeats), long repeat sequences, and sequence synteny and diversity. In particular, the apparently widespread loss of the clpP gene reported previously (Yao et al. 2015; Wang et al. 2016) was re-examined, and the presence and expression of clpP were reassessed. Phylogenetic relationships among the 25 Actinidia taxa were reconstructed separately from the LSC region, the combined LSC and SSC regions, and the complete chloroplast genome sequences. Our findings provide insights for refining evolutionary relationships among Actinidia taxa, and potential molecular markers to further resolve the complicated phylogenetic lineages of the genus Actinidia.

Materials and methods

The chloroplast genome data sets

The complete chloroplast genome sequences of 137 Ericales species, including 25 Actinidia species, were downloaded from the NCBI genome database. Detailed information on these chloroplast genomes is listed in Table S1.

Genome structure, gene organization and repeat sequences

Genes in each chloroplast genome sequence were re-annotated using PGA (Plastid Genome Annotator) (Qu et al. 2019), GeSeq (Tillich et al. 2017) and CPGAVAS2 (Shi et al. 2019), and the annotations from the three programs were then merged. Gene organization, including total gene number, gene copy number and the number of introns in each gene, was analyzed using our own Python scripts.
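The gene organization summary described above can be sketched in a few lines of Python. This is not the authors' script: it assumes, hypothetically, that the merged annotation has been reduced to one (gene name, exon count) record per annotated gene copy, and it takes intron number as exon count minus one, which does not treat trans-spliced genes such as rps12 specially.

```python
from collections import defaultdict


def summarize_genes(features):
    """Summarize gene organization from (gene_name, exon_count) records.

    Hypothetical input: one record per annotated gene copy in the merged annotation.
    """
    copies = defaultdict(int)
    intron_numbers = defaultdict(set)
    for name, exon_count in features:
        copies[name] += 1
        # introns = exons - 1; trans-spliced genes (e.g. rps12) are not special-cased
        intron_numbers[name].add(exon_count - 1)
    return {
        "unique_genes": len(copies),
        "total_genes": sum(copies.values()),
        "multi_copy": sorted(g for g, n in copies.items() if n > 1),
        "intron_counts": {g: sorted(v) for g, v in intron_numbers.items()},
    }
```

For example, two records for ndhB with two exons each would report ndhB as a two-copy gene with one intron per copy; differing intron numbers between copies would show up as more than one value in `intron_counts`.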

SSRs were detected using the MISA Perl script, with minimum repeat numbers of 10, 6, 5, 5, 5 and 5 for mono-, di-, tri-, tetra-, penta- and hexanucleotide SSRs, respectively. Long repeats (forward, reverse, palindromic and complementary) in the Actinidia chloroplast genomes were identified using REPuter (Kurtz et al. 2001). For all repeat types, the Hamming distance was set to 3, so that two copies of a repeat of the minimum length (30 bp) share at least 90% identity. The minimum repeat length was 30 bp, and the maximum number of repeats reported was 1,000.
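A simplified, MISA-like SSR scan using the thresholds above can be sketched as follows. It is an illustrative reimplementation, not MISA itself: it reports only perfect repeats on one strand, skips motifs that are themselves repeats of a shorter unit (so a poly-A run is not also reported as an "AA" dinucleotide SSR), and does not merge compound SSRs or resolve every overlap the way MISA does. The function names are hypothetical.

```python
import re

# Minimum repeat numbers per motif length, as listed in the Methods above
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a tandem copy of a shorter unit ("ATAT" is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return (1-based start, motif length, motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # a k-mer followed by at least n - 1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start() + 1, k, motif, len(m.group(0)) // k))
    return sorted(hits)
```

A run of nine A's is ignored because it falls below the mononucleotide threshold of 10, whereas seven tandem "AT" units pass the dinucleotide threshold of 6.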

Comparative analyses of boundaries between LSC, SSC and IR regions

MUMmer 3.0 (Delcher et al. 2003) was used to align each chloroplast genome sequence against itself to confirm the boundaries between the LSC, SSC and IR regions. Where the two IR copies were not 100% identical, their positions were adjusted manually based on the MUMmer alignments. The boundaries between the LSC, SSC and IR regions were visualized using the Perl SVG module.
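The idea behind the self-alignment step can be illustrated with a toy exact-match version. This sketch is a swapped-in simplification, not what MUMmer does internally: it assumes the linear sequence is ordered LSC-IRb-SSC-IRa (so it ends with IRa), grows IRa from the 3' end for as long as its reverse complement still occurs upstream, and reports 0-based, end-exclusive coordinates. Real IR copies with mismatches would still need the manual adjustment described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_quadripartite(genome):
    """Locate LSC, IRb, SSC and IRa in a linear plastome assumed to read LSC-IRb-SSC-IRa.

    Coordinates are 0-based and end-exclusive. Only exact inverted repeats are found.
    """
    ir_len = 0
    # Extend IRa leftwards from the 3' end while its reverse complement
    # still occurs in the sequence upstream of it.
    while ir_len < len(genome) // 2 and revcomp(genome[-(ir_len + 1):]) in genome[:-(ir_len + 1)]:
        ir_len += 1
    ira_start = len(genome) - ir_len
    irb_start = genome.find(revcomp(genome[ira_start:]))
    return {
        "LSC": (0, irb_start),
        "IRb": (irb_start, irb_start + ir_len),
        "SSC": (irb_start + ir_len, ira_start),
        "IRa": (ira_start, len(genome)),
    }
```

On a toy genome built as eight A's, a 10 bp IR, six C's and the IR's reverse complement, the function recovers the four regions at 0–8, 8–18, 18–24 and 24–34.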

Identification of hypervariable regions

The aligned chloroplast genome sequences of the Actinidia species were imported into DnaSP (Rozas et al. 2017) to calculate nucleotide diversity (Pi). For the sliding-window analysis, the window length and step size were set to 600 bp and 200 bp, respectively. The multiple sequence alignment of the complete chloroplast genomes of the 25 Actinidia species was also visualized using mVISTA (Frazer et al. 2004).
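Nucleotide diversity (Pi) is the average number of nucleotide differences per site between two sequences. A minimal sliding-window version with the 600 bp window and 200 bp step used here might look like the following. It is a sketch, not DnaSP: it assumes an already aligned list of equal-length sequences, simply drops alignment columns that contain gaps or ambiguous bases, and labels each window by its midpoint.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site; columns with gaps or ambiguous bases are dropped."""
    valid = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    if not valid:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in valid) for a, b in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) for each window along an alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        block = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(block)))
    return results
```

For three aligned sequences AAAA, AAAT and AATT, the three pairs differ at 1, 2 and 1 sites, so Pi over the four sites is 4 / (3 × 4) = 1/3.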

Analyses of clpP gene sequence and its surrounding syntenic region

Using clpP protein sequences from other Ericales species as queries, tBLASTn searches were performed against the chloroplast genome sequences of the 25 Actinidia species. ORFs (open reading frames) were then predicted within the matching nucleotide regions of each species. The predicted clpP protein sequences of the 25 Actinidia species and of two other Actinidiaceae species, Saurauia tristyla (genus Saurauia) and Clematoclethra scandens (genus Clematoclethra), were aligned using MAFFT (Katoh et al. 2019).
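The ORF prediction step can be pictured with a small forward-strand scanner. This is a hypothetical helper, not the tool the authors used: it looks only for ATG-initiated frames ending at TAA, TAG or TGA in the three forward frames, and it ignores introns, the reverse strand and the alternative start codons allowed by the plastid genetic code (NCBI translation table 11). The minimum length is an arbitrary parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end) of ATG...stop ORFs on the forward strand (0-based, end-exclusive).

    min_codons counts sense codons, excluding the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)
```

In the toy sequence CCATGAAATTTGGGTAACC, the only ORF runs from the ATG at index 2 through the TAA stop, encoding four amino acids.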

The syntenic regions surrounding the clpP gene were retrieved from the chloroplast genome sequences of the 25 Actinidia species and compared. Genes in these syntenic regions were visualized using the Perl SVG module.

Phylogenetic relationship reconstruction

A phylogenetic tree of the 25 Actinidia species, represented by 27 independent chloroplast genome sequences, was reconstructed using two Actinidiaceae species, S. tristyla (genus Saurauia) and C. scandens (genus Clematoclethra), as outgroups (Table S1). ML (maximum likelihood) trees were constructed from the whole chloroplast genome sequences, the LSC region alone, and the combined LSC and SSC regions, respectively.

Additionally, three further phylogenetic trees were reconstructed from 29 independent chloroplast genome sequences of the 25 Actinidia species, which included two additional sequences from A. chinensis “AC017” (tetraploid) (GenBank accession number: KP297243) and A. chinensis var. deliciosa “AD019” (hexaploid) (GenBank accession number: KP297245). These ML trees were again constructed from the whole chloroplast genome sequences, the LSC region alone, and the combined LSC and SSC regions, respectively.

In each phylogenetic analysis, the nucleotide sequences were aligned with MAFFT and then trimmed with trimAl (Capella-Gutierrez et al. 2009). Trees were constructed with IQ-TREE 1.6.12 (Nguyen et al. 2015) with 1000 bootstrap replicates, and the best-fit substitution model for each tree was selected by ModelFinder (Kalyaanamoorthy et al. 2017) as implemented in IQ-TREE 1.6.12.
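The alignment, trimming and tree-building steps can be chained in a small Python driver. This is a sketch under stated assumptions, not the authors' workflow. The trimAl mode (`-automated1`), the use of IQ-TREE's ultrafast bootstrap (`-bb`) rather than the standard bootstrap (`-b`), the file naming and the outgroup labels are all assumptions, because the text does not give them. The `-m MFP` option asks IQ-TREE to run ModelFinder before the tree search.

```python
import subprocess


def pipeline_commands(fasta, prefix, outgroups, bootstrap=1000):
    """Build (argv, stdout_file) pairs for MAFFT -> trimAl -> IQ-TREE 1.6 (assumed settings)."""
    aln = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trim.fasta"
    return [
        # MAFFT prints the alignment to stdout, so it is redirected to a file
        (["mafft", "--auto", fasta], aln),
        # -automated1 is an assumed trimming mode
        (["trimal", "-in", aln, "-out", trimmed, "-automated1"], None),
        # -m MFP: ModelFinder then tree search; -bb: ultrafast bootstrap (assumed)
        (["iqtree", "-s", trimmed, "-m", "MFP", "-bb", str(bootstrap),
          "-o", ",".join(outgroups), "-pre", prefix], None),
    ]


def run_pipeline(commands):
    """Run each command in order, stopping on the first non-zero exit status."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)
```

Calling `run_pipeline(pipeline_commands("lsc.fasta", "lsc", ["Saurauia_tristyla", "Clematoclethra_scandens"]))` would run the three tools in order, provided `mafft`, `trimal` and `iqtree` are on the PATH. The outgroup labels are hypothetical and must match the sequence names in the FASTA file.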

Results

The summary of chloroplast genomes in Actinidia species

The 29 independent chloroplast genome sequences of 25 Actinidia species were downloaded from the NCBI genome database, covering A. zhejiangensis, A. callosa var. henryi, A. callosa var. strigillosa, A. chinensis, A. chinensis var. deliciosa, A. chinensis var. setosa, A. lanceolata, A. arguta, A. arguta var. giraldii, A. eriantha, A. kolomikta, A. polygama, A. tetramera, A. rufa, A. valvata, A. cylindrica var. cylindrica, A. cylindrica var. reticulata, A. styracifolia, A. macrosperma, A. fulvicoma, A. hubeiensis, A. hemsleyana, A. indochinensis, A. latifolia, and A. rubus (Table 1, Table S1). Four independent chloroplast genomes of A. chinensis were included, from three diploids, “AC011”, “Hongyang” and “Jinguo”, and one tetraploid, “AC017”. Two independent chloroplast genomes of A. chinensis var. deliciosa were also included, from “AD006” (tetraploid) and “AD019” (hexaploid). Table 1 gives detailed information on the chloroplast genomes of the 25 Actinidia species, including species name, GenBank accession number, sizes of the LSC, SSC and IR regions and of the whole chloroplast genome, numbers of protein-coding, tRNA and rRNA genes, and GC content.

Table 1 Summary of the chloroplast genomes from 25 Actinidia species in this study

The Actinidia chloroplast genomes consist of four regions, i.e., LSC, SSC, IRa and IRb (Fig. 1). Across the 25 Actinidia species, the average genome size is 156,673.38 bp and the average GC content is 37.20%. A. indochinensis has the smallest genome (155,931 bp), while A. tetramera has the largest (157,659 bp) and also the lowest GC content (37.03%) (Table 1, Fig. 1).
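The genome size and GC content summaries in Table 1 reduce to simple counts. The sketch below assumes each genome is held as a plain sequence string keyed by taxon name, which is a hypothetical input since the values in Table 1 come from the GenBank records. It computes GC content over A, C, G and T only, so ambiguous bases do not lower the percentage.

```python
def gc_content(seq):
    """GC percentage over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize(genomes):
    """Mean size and GC content, plus the extreme taxa, for {name: sequence} input."""
    sizes = [len(s) for s in genomes.values()]
    gcs = {name: gc_content(s) for name, s in genomes.items()}
    return {
        "mean_size": sum(sizes) / len(sizes),
        "mean_gc": sum(gcs.values()) / len(gcs),
        "smallest": min(genomes, key=lambda n: len(genomes[n])),
        "largest": max(genomes, key=lambda n: len(genomes[n])),
        "lowest_gc": min(gcs, key=gcs.get),
    }
```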

Fig. 1

Genome map of the chloroplast genome structure in Actinidia species. Protein-coding genes (PCGs) of different functional categories, tRNAs and rRNAs are shown in boxes of different colors. Genes in the inner and outer circles are located on the plus and minus DNA strands, respectively. Red characters indicate gene copy number variation; bold italics indicate variation in intron number

Although the genomes of A. zhejiangensis (156,717 bp) and A. callosa var. henryi (156,826 bp) are close in size to those of most other Actinidia species, these two genomes encode the fewest genes (128). There thus appears to be no association between genome size and gene number among the chloroplast genomes of the 25 Actinidia species (Table 1).

Gene content and exon–intron structure in Actinidia species’ chloroplast genomes

The genes encoded by the chloroplast genome fall into three types: PCGs (protein-coding genes), tRNAs and rRNAs (Fig. 1). Except for A. rubus, A. styracifolia and A. zhejiangensis, which each have 82 PCGs, the other 22 Actinidia species have 83, 84 or 85 PCGs. The number of tRNA genes varies from 37 to 41 (Table 1). As illustrated in Fig. 1 and Table 2, gene duplication occurred in all four rRNA genes and in a number of tRNA genes and PCGs, including rps12, ndhB, psbA, ycf2, ycf15, trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, trnV-GAC, trnH-GUG, trnM-CAU and trnfM-CAU (Table 2). Of these, rps12, psbA, ycf15, trnH-GUG, trnM-CAU and trnfM-CAU were duplicated only in some Actinidia species. These analyses suggest that a considerable portion of the variation in total gene number may have arisen from gene duplication in the chloroplast genomes of the 25 Actinidia species (Table 2).

Table 2 Gene lists in the chloroplast genomes of 25 Actinidia species

Exon–intron analysis showed that most PCGs, tRNAs and rRNAs consist of a single exon without introns, while a few genes have one or two introns (Table 2). Seven tRNAs (trnA-UGC, trnI-GAU, trnL-UAA, trnV-UAC, trnG-UCC, trnG-GCC and trnK-UUU) and 11 PCGs (rps12, rps16, rpl2, rpl16, rpoC1, ndhA, ndhB, petB, petD, atpF and ycf2) have one intron, whereas the PCG ycf3 has two introns and a more complex exon–intron structure (Table 2). However, the exon–intron structure of some orthologs differs among Actinidia species, as in two PCGs, rps16 (Fig. S1) and petB (Fig. S2). For example, petB in A. callosa var. henryi has no intron, whereas its orthologs in the other 24 Actinidia taxa contain one intron (Fig. S2).

Gene duplication also appears to be associated with variation in exon–intron structure. The ycf2 gene is duplicated in all 25 Actinidia species; both copies contain an intron in A. kolomikta (Fig. S3), whereas neither copy has an intron in the other 24 species. Twenty-one of the 25 Actinidia species have two copies of rps12, each containing an intron (Fig. S4). Interestingly, the cultivars ‘Hongyang’ and ‘Jinguo’ have two copies of rps12, whereas ‘AC011’ and ‘AC017’ have a single copy, although all four belong to A. chinensis.

Exon merger and intron loss in the clpP gene of Actinidia species’ chloroplast genomes

The clpP gene, which encodes the proteolytic subunit of the Clp protease, has been reported to be completely lost from the chloroplast genomes of Actinidia and other Actinidiaceae species, and was proposed to have been transferred to the nucleus in A. chinensis during chloroplast evolution (Yao et al. 2015).

To test whether clpP loss is a synapomorphy of the genus Actinidia or of Actinidiaceae more broadly, the clpP gene sequence was searched for in the chloroplast genomes of the 25 Actinidia species and two other Actinidiaceae species, S. tristyla (genus Saurauia) and C. scandens (genus Clematoclethra). In the NCBI genome annotation files of these 27 Actinidiaceae species, clpP was annotated only in S. tristyla, with two exons and one intron. However, tBLASTn searches using clpP protein sequences from other Ericales species as queries identified highly similar DNA fragments in the 25 Actinidia species and C. scandens. ORF (open reading frame) analyses showed that these fragments contain two exons (Fig. 2, Table S2) encoding clpP proteins of 196–208 aa (Fig. 3). Furthermore, BLAST searches using the predicted clpP exon sequences of A. chinensis “AC011” as the query identified matching Illumina reads from fruit and leaf transcriptomes in the SRA database (Table S3), suggesting that clpP is transcribed and may be constitutively expressed in Actinidia taxa.
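How two predicted exons yield a protein length like those reported here can be checked with a short helper. This is an illustrative sketch on a toy sequence, not the authors' pipeline: exon coordinates are assumed to be 0-based, end-exclusive, on the forward strand and listed in transcript order, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_cds(genome, exons):
    """Concatenate exon intervals (0-based, end-exclusive, forward strand, in order)."""
    return "".join(genome[s:e] for s, e in exons)


def protein_length(cds):
    """Encoded length in amino acids, or None if the CDS is not a clean ATG...stop ORF."""
    if len(cds) % 3 or not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        return None
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    if any(c in STOP_CODONS for c in codons):  # premature in-frame stop
        return None
    return len(codons)
```

In the toy example used for testing, a 7 bp intron separates a 6 bp first exon from a 9 bp second exon: the unspliced sequence is out of frame (22 bp), while the spliced 15 bp CDS encodes four amino acids.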

Fig. 2

The syntenic genomic regions including clpP and surrounding genes in 27 Actinidiaceae species

Fig. 3

Multiple sequence alignment of clpP protein sequences in 27 Actinidiaceae species

To track evolutionary variation in clpP, its gene structure was compared among the 27 Actinidiaceae taxa and 106 other Ericales species with chloroplast genomes available in NCBI GenBank. Multiple sequence alignment of the 137 clpP protein sequences from Ericales species showed that amino acid variation in the upstream (N-terminal) region of clpP occurred only in Actinidiaceae, namely the 25 Actinidia species and C. scandens, in which the 19–31 upstream residues varied (Fig. S5).

In contrast to the 27 Actinidiaceae species, the clpP genes of the other 104 Ericales species have three exons and two introns, except those of Huodendron biaristatum (one exon) and Alniphyllum pterospermum (two exons) (Table S2). Compared with these three-exon, two-intron clpP genes, additional BLAST analyses showed that the second and third exons are merged, with loss of the intervening intron, in the 27 tested Actinidiaceae species.

Notably, sequences homologous to the first intron of these 104 clpP genes could also be traced in the intron regions of clpP in the 25 Actinidia species and C. scandens (Fig. 2, Fig. S6, Table S4), but not in S. tristyla. BLAST searches against the NCBI nt and nr databases showed that the two-exon clpP variant is specific to Actinidiaceae, implying that the exon merger and intron loss in clpP may have occurred after Actinidiaceae diverged from other Ericales lineages.

Boundaries between IR, SSC and LSC regions in Actinidia species’ chloroplast genomes

Overall, three types of boundaries between the IR, LSC and SSC regions, differing only slightly, were found in Actinidia species (Fig. 4, Fig. S7). Type I was found in A. arguta, A. arguta var. giraldii, A. chinensis “AC011”, A. chinensis “AC017”, A. chinensis “Jinguo”, A. chinensis “Hongyang”, A. chinensis var. deliciosa “AD006”, A. chinensis var. deliciosa “AD019”, A. cylindrica var. cylindrica, A. indochinensis, A. eriantha, A. hemsleyana, A. kolomikta, A. polygama, A. rubus, A. rufa, A. styracifolia, A. valvata, A. macrosperma, A. tetramera, A. zhejiangensis and A. fulvicoma (red in Fig. 4, Fig. S7). In Type I members, the two copies of trnN are located in IRa and IRb, respectively, close to the SSC region, and ycf1 spans the SSC/IRa boundary (Fig. 4, Fig. S7). Type II was detected in A. chinensis var. setosa, A. latifolia and A. valvata (green in Fig. 4, Fig. S7) and is characterized by ycf1 lying within the SSC region.

Fig. 4

Comparison of boundaries between LSC, SSC and IR regions in representative Actinidia species’ chloroplast genomes. The representative Actinidia species in Type I, II, and III are labeled in red, green and blue, respectively

In Type III members (blue in Fig. 4, Fig. S7), such as A. callosa var. strigillosa, A. hubeiensis, A. cylindrica var. reticulata and A. lanceolata, the boundary composition is similar to that of Type II members, but trnI and trnH lie beside the boundaries, in the IRa/b and LSC regions, respectively.

Four species complexes were included in this study: the A. callosa complex (A. callosa var. henryi, A. callosa var. strigillosa), the A. arguta complex (A. arguta, A. arguta var. giraldii), the A. cylindrica complex (A. cylindrica var. cylindrica, A. cylindrica var. reticulata) and the A. chinensis complex (A. chinensis, A. chinensis var. deliciosa, A. chinensis var. setosa). Except for the A. arguta complex, each complex shows clear boundary divergence among its members. A. callosa var. henryi, A. cylindrica var. cylindrica, A. chinensis and A. chinensis var. deliciosa belong to Type I (Fig. 4, Fig. S7), whereas A. chinensis var. setosa belongs to Type II, and A. callosa var. strigillosa and A. cylindrica var. reticulata belong to Type III.

SSRs in Actinidia species’ chloroplast genomes

SSRs of the mono-, di-, tri-, tetra-, penta- and hexanucleotide types were analyzed in the chloroplast genomes of the 25 Actinidia species, and between 24 (325 bp in total) and 46 (536 bp) SSRs were identified per genome (Fig. 5a, Table S5). A. callosa var. henryi has the largest number (46 SSRs) and A. arguta var. giraldii the smallest (24 SSRs).

Fig. 5

SSRs and long repeat sequences in Actinidia species’ chloroplast genomes. a Total number and size of SSRs in Actinidia species. b Total number and size of long repeat sequences. c Distribution of long repeat sequences in the 70–80 kb range of the chloroplast genomes of A. tetramera, A. kolomikta and A. callosa var. henryi

Four SSR types were detected in Actinidia species: mono-, di-, tri- and hexanucleotide. Mononucleotide SSRs predominated, accounting for 87.50–95.56% of all SSRs (Fig. S8, Table S5), and among them A/T repeats were the most abundant, with C/G repeats making up a very small proportion (Table S5). Hexanucleotide SSRs were found only in A. tetramera and A. callosa var. henryi, with one in each (Table S5). These results agree with previous reports that most SSRs in land plant chloroplast genomes are mono- and/or dinucleotide repeats, with few tri-, tetra-, penta- and hexanucleotide repeats (Cui et al. 2019; Nie et al. 2019; Park et al. 2019; Huang et al. 2020; Tyagi et al. 2020). However, SSRs make up a notably smaller fraction of the whole chloroplast genome in Actinidia species (0.21–0.34%) than in many other sequenced plant chloroplast genomes (Cui et al. 2019).
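As a quick consistency check on these percentages, the SSR bases can be divided by genome size. For the upper value, A. callosa var. henryi (536 bp of SSRs) has a 156,826 bp genome. For the lower value, the genome size of A. arguta var. giraldii (325 bp of SSRs) is not stated in this section, so the average Actinidia genome size of 156,673.38 bp is used here as an approximation:

$$\frac{536\ \text{bp}}{156{,}826\ \text{bp}} \times 100\% \approx 0.34\%, \qquad \frac{325\ \text{bp}}{156{,}673.38\ \text{bp}} \times 100\% \approx 0.21\%$$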

Long repeat sequences in Actinidia species’ chloroplast genomes

A large number of long repeats (forward, reverse, palindromic and complementary) were identified in the chloroplast genomes of Actinidia species, ranging from 115 (5,148 bp) to 482 (29,010 bp) per genome (Fig. 5b, Table S6), with forward and palindromic repeats accounting for the largest share. Complementary repeats were detected only in A. callosa var. henryi, A. lanceolata and A. chinensis var. setosa, with a single copy in each (Fig. S9, Table S6). In every Actinidia species, both the number and the total length of long repeats greatly exceeded those of SSRs (Fig. 5a, b). Similar observations have been reported in Pterocarpus (Hong et al. 2020) and Aristolochia (Li et al. 2019).
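The four repeat classes and the mismatch threshold from the Methods can be made concrete by comparing two equal-length segments. This sketch illustrates the definitions only and is not REPuter's search algorithm. A forward repeat matches the second copy as is, a reverse repeat matches it reversed, a complementary repeat matches its complement, and a palindromic repeat matches its reverse complement, each allowing up to three mismatches (Hamming distance 3). The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def repeat_types(copy1, copy2, max_mismatch=3):
    """Classify how copy2 relates to copy1 in REPuter's four repeat classes."""
    variants = {
        "forward": copy2,
        "reverse": copy2[::-1],
        "complement": copy2.translate(COMPLEMENT),
        "palindromic": copy2.translate(COMPLEMENT)[::-1],
    }
    return sorted(t for t, v in variants.items() if hamming(copy1, v) <= max_mismatch)


def min_identity(length, max_mismatch):
    """Lowest identity allowed between two copies of a repeat of the given length."""
    return (length - max_mismatch) / length
```

`min_identity(30, 3)` gives 0.9, the 90% figure quoted in the Methods; for longer repeats, the same three mismatches correspond to a higher identity.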

A. tetramera has far more long repeats than the other 24 Actinidia species (Table S6): 482 in total, including 427 forward, 49 palindromic and 6 reverse repeats. By contrast, A. lanceolata has the fewest, 115 in total, including 84 forward, 28 palindromic, 2 reverse and 1 complementary repeat (Table S6). Notably, the 261 long repeats identified in A. chinensis var. deliciosa “AD019” far exceed the numbers in the other genomes of the A. chinensis complex (A. chinensis, A. chinensis var. deliciosa and A. chinensis var. setosa).

In all 25 Actinidia species, most long repeats are shorter than 100 bp (Table S7), predominantly 30–40 bp long (Table S8). Among long repeats exceeding 100 bp, 15 of the 25 Actinidia species have fewer than 15 (Table S7), whereas the following taxa have 15 or more such repeats, mostly 100–300 bp in size (Table S8): A. kolomikta (40), A. fulvicoma (30), A. hubeiensis (19), A. arguta var. giraldii (18), A. cylindrica var. reticulata (16), A. latifolia (15), A. chinensis “AC011” (28), A. chinensis “AC017” (34), A. chinensis var. deliciosa “AD006” (34), A. chinensis var. deliciosa “AD019” (64), A. chinensis “Jinguo” (32), A. chinensis “Hongyang” (24), A. chinensis var. setosa (33) and A. rubus (39).

The distribution of long repeats also showed species-specific enrichment in Actinidia taxa (Fig. S10). Counting repeats in 10 kb bins revealed three peaks, at 50–60 kb, 70–80 kb and 130–140 kb. In A. tetramera, most long repeats lie in the 50–60 kb and 70–80 kb bins (Fig. S10). Within the 70–80 kb bin, most long repeats come from three species, A. tetramera, A. kolomikta and A. callosa var. henryi, and syntenic sequence analyses indicated that these repeats are mainly located in the intergenic region between rps12 and psbB (Fig. 5c).
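The 10 kb binning used above can be sketched with a `Counter`. The sketch assumes, hypothetically, that each long repeat is represented by a single 1-based start coordinate; the text does not state which coordinate was binned.

```python
from collections import Counter


def bin_counts(positions, bin_size=10_000):
    """Count 1-based repeat positions per bin; each key is the bin's 0-based start in bp."""
    return Counter(((p - 1) // bin_size) * bin_size for p in positions)


def peak_bins(positions, bin_size=10_000, top=3):
    """Return (bin_start, bin_end, count) for the most repeat-rich bins."""
    counts = bin_counts(positions, bin_size)
    return [(start, start + bin_size, n) for start, n in counts.most_common(top)]
```

Positions 1 to 10,000 fall in the first bin and 10,001 starts the second, matching the "50–60 kb"-style ranges used in the text.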

Divergent sequence regions in Actinidia species’ chloroplast genomes

To characterize sequence divergence, the chloroplast genome alignments of the Actinidia species were visualized with mVISTA, using A. chinensis “AC011” as the reference. The sequence identity plots revealed high sequence similarity among the 25 Actinidia species (Fig. S11). Most sequence variation lies in intergenic regions, whereas PCGs, rRNAs and tRNAs are comparatively conserved. The most divergent coding regions are in the genes accD and ycf1 (Fig. S11).

To locate variable sites, especially hotspots possibly involved in evolution, nucleotide diversity (Pi) was calculated across the 25 Actinidia species. The average Pi was 0.00559, and the average Pi values of the LSC (0.00664) and SSC (0.00814) regions were much higher than that of the IR regions (0.00249).

The Pi values showed that most variable regions lie in the LSC and SSC regions, whereas the IR regions remain relatively conserved across the genus (Fig. 6a). The LSC, SSC and IR regions contain nine, two and zero DNA fragments, respectively, with relatively high nucleotide diversity (Pi > 0.016) (Table S9); in the IR regions, the most divergent sequence has a Pi value of 0.0105. Within the LSC and SSC regions, four highly divergent regions, rps16 ~ trnQ-UUG, rps4 ~ trnT-UGU, petA ~ psbJ and rps12 ~ psbB, have markedly higher Pi values (> 0.02) (Fig. 6a). Of these, rps12 ~ psbB, which contains only the clpP gene and its upstream and downstream noncoding sequences, is the most divergent, with Pi > 0.03 (Fig. 6a). Three of these divergent regions, rps4 ~ trnT-UGU, petA ~ psbJ and rps12 ~ psbB, lie within a single syntenic region between 46,390 and 75,519 bp of the chloroplast genomes of the 25 Actinidia species. This 29 kb region contains 31 genes, 24 PCGs and 7 tRNAs (Fig. 6b). Its abundant variable sites could provide suitable molecular markers for further phylogenetic studies of Actinidia species.
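Turning sliding-window Pi values into candidate hotspot regions, as with the Pi > 0.016 threshold above, can be sketched by merging overlapping above-threshold windows. The input format (window midpoint, Pi), the 600 bp default window and the function name are assumptions for illustration; this is not DnaSP output handling.

```python
def hotspot_regions(windows, threshold=0.016, window=600):
    """Merge overlapping sliding windows whose Pi exceeds a threshold.

    windows: (midpoint_bp, pi) pairs sorted by midpoint.
    Returns (start_bp, end_bp, max_pi) for each merged region.
    """
    regions = []
    half = window // 2
    for mid, pi in windows:
        if pi <= threshold:
            continue
        start, end = mid - half, mid + half
        if regions and start <= regions[-1][1]:
            # overlaps the previous hot region: extend it and keep the peak Pi
            s, e, best = regions[-1]
            regions[-1] = (s, max(e, end), max(best, pi))
        else:
            regions.append((start, end, pi))
    return regions
```

With 600 bp windows and a 200 bp step, neighbouring windows overlap, so a run of hot windows collapses into one region whose reported Pi is the highest window value.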

Fig. 6

Divergent hotspots in Actinidia species’ chloroplast genomes. a Nucleotide diversity (Pi) in the chloroplast genomes of 25 Actinidia species; the four most divergent hotspots (Pi > 0.02) are labeled. b Gene distribution in the genomic region spanning three highly divergent regions among 25 Actinidia species

Phylogenetic reconstruction in Actinidia

Using two Actinidiaceae species, S. tristyla and C. scandens, as outgroups, phylogenetic relationships among the 25 Actinidia species were reconstructed. The ML tree of the 27 Actinidiaceae species based on the whole chloroplast genome sequences is shown in Fig. 7a.

Fig. 7

ML (maximum likelihood) phylogenetic trees of 25 Actinidia species based on the whole chloroplast genome sequences (a) and the combined LSC and SSC regions (b). The four infrageneric sections of Actinidia, Leiocarpae (Lei), Maculatae (Mac), Stellatae (Ste) and Strigosae (Str), are labeled beside each species

In the phylogenetic tree, the 25 Actinidia species fell into three main groups: Group I (7 species), Group II (11) and Group III (7). Groups II and III are more closely related to each other than to Group I, which occupies the outermost position (Fig. 7a). In Group III, A. chinensis, A. chinensis var. deliciosa, A. chinensis var. setosa, A. indochinensis and A. callosa var. strigillosa clustered together, whereas A. zhejiangensis and A. rufa formed a separate cluster.

Group II comprised two clades. A. cylindrica var. cylindrica, A. rubus, A. hubeiensis and A. callosa var. henryi clustered in one clade, while A. styracifolia, A. eriantha, A. fulvicoma, A. cylindrica var. reticulata, A. hemsleyana and A. latifolia formed the other (Fig. 7a).

Given the abundant variable sites in the LSC and SSC regions (11 regions with Pi > 0.016) (Table S9), two further ML trees of the 25 Actinidia species were constructed from the combined LSC and SSC regions (Fig. 7b) and from the LSC region alone (Fig. S12); both are consistent with the tree based on the whole chloroplast genome sequences. The relationships recovered from the whole chloroplast genome (Fig. 7a), the combined LSC and SSC regions (Fig. 7b) and the LSC region alone (Fig. S12) also largely agree with previous reports on Actinidia species (Liu et al. 2017; Tang et al. 2019b).

In addition, three further phylogenetic trees were reconstructed from the whole chloroplast genome sequences (Fig. S13a), the combined LSC and SSC regions (Fig. S13b) and the LSC region alone (Fig. S14) of 31 independent chloroplast genomes from 25 Actinidia species, after adding two chloroplast genomes from polyploid accessions, A. chinensis “AC017” (tetraploid) and A. chinensis var. deliciosa “AD019” (hexaploid). All three trees have topologies consistent with our other trees based on the whole chloroplast genome, the combined LSC and SSC regions, or the LSC region alone.
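Whether two trees have consistent topologies can be checked by comparing their clades. The sketch below computes a clade-based Robinson–Foulds distance for rooted trees written as nested tuples, a hypothetical representation chosen for brevity; real trees would first be parsed from the IQ-TREE Newick output. A distance of 0 means identical rooted topologies, and branch lengths and bootstrap values are ignored.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree given as nested tuples of names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(tree1, tree2):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree1) ^ clades(tree2))
```

Swapping the order of sister taxa does not change the distance, whereas regrouping ((A,B),(C,D)) as ((A,C),(B,D)) changes two clades in each tree, giving a distance of 4.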

Discussion

In this study, genome-wide comparative analyses were performed on the chloroplast genomes of 25 Actinidia species. A clpP gene with merged exons and intron loss was identified in all 29 chloroplast genomes from the 25 Actinidia species. Four highly divergent sequence regions, rps16 ~ trnQ-UUG, rps4 ~ trnT-UGU, petA ~ psbJ and rps12 ~ psbB, were identified. Phylogenies based on the LSC region, the combined LSC and SSC regions, or the whole chloroplast genome sequences were congruent and resolved the relationships among the 25 Actinidia taxa more clearly than before.

The chloroplast genomes of Actinidia species appear to show genus-specific evolutionary features. Three of the four highly divergent regions, rps4 ~ trnT-UGU, petA ~ psbJ and rps12 ~ psbB, lie within a single syntenic region spanning 46,390 to 75,519 bp. To compare these hypervariable regions with those of other Ericales, nucleotide diversity was also calculated for the chloroplast genomes of species in the families Balsaminaceae (4 species), Ebenaceae (11), Pentaphylacaceae (3), Primulaceae (31), Sapotaceae (5), Styracaceae (22) and Theaceae (29) (Fig. S15). The highly divergent regions in these families differ from those in Actinidia, suggesting a distinct evolutionary process in the genus Actinidia.

Furthermore, the rps12 ~ psbB region may be the most important evolutionary hotspot in the genus Actinidia. This region contains only the variant clpP gene and its upstream and downstream noncoding sequences (Fig. 2). Our comprehensive analyses indicate that the variant clpP sequence, comprising two exons, is specific to Actinidiaceae (Fig. 3, Fig. S5). Its nucleotide diversity (Pi > 0.3) shows that rps12 ~ psbB is the most divergent region in the chloroplast genomes of Actinidia species (Fig. 6). This region is also enriched in long repeats, derived mainly from A. tetramera, A. kolomikta and A. callosa var. henryi (Fig. 5c).
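To make the Pi statistic concrete, the sketch below shows one common way to scan an alignment for hotspots: Nei's nucleotide diversity (the mean number of pairwise differences per site) computed in sliding windows. It is an illustrative example under stated assumptions, not necessarily the procedure or software used in this study. It assumes the chloroplast genomes are already aligned to equal length, excludes any column containing a gap or ambiguous base, and uses a 600 bp window with a 200 bp step purely as placeholder settings. The function names (`nucleotide_diversity`, `sliding_window_pi`, `hotspots`) are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def nucleotide_diversity(seqs):
    """Nei's pi: mean number of pairwise differences per site.

    Columns with a gap or ambiguous base in any sequence are excluded
    (an assumption; programs differ in how they treat gaps).
    """
    seqs = [s.upper() for s in seqs]
    if len(seqs) < 2:
        return 0.0
    valid = [i for i in range(len(seqs[0]))
             if all(s[i] in VALID_BASES for s in seqs)]
    if not valid:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Count mismatches at every valid column for every pair of sequences
    diffs = sum(sum(a[i] != b[i] for i in valid) for a, b in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_window_pi(seqs, window=600, step=200):
    """Return (0-based alignment start, pi) for each window.

    The window and step defaults are illustrative placeholders only.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(chunk)))
    return results


def hotspots(results, threshold):
    """Keep windows whose pi exceeds the chosen threshold (e.g. 0.3)."""
    return [(start, pi) for start, pi in results if pi > threshold]
```

For example, `hotspots(sliding_window_pi(alignment), 0.3)` would list the windows above the Pi > 0.3 level cited for rps12 ~ psbB, where `alignment` is a hypothetical list of aligned chloroplast genome strings. Windows are reported in alignment coordinates, so they would still need to be mapped back to a reference genome's annotation to name regions such as rps12 ~ psbB.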

In previous studies, chloroplast genome sequences were scarce, so phylogenetic analyses of Actinidia species relied mainly on variable sites in a limited number of sequence fragments from nuclear, chloroplast and mitochondrial genomes (Huang et al. 2002; Li et al. 2002; Chat et al. 2004). Here, phylogenetic analyses of 25 Actinidia species were performed at the whole chloroplast genome level, which provides sufficient nucleotide polymorphism for reconstructing phylogenetic relationships. Most bootstrap values on the tree branches are 100 (Fig. 7). Notably, our trees of the 25 Actinidia species based on the whole chloroplast genome, LSC plus SSC, or LSC alone showed consistent phylogenetic relationships, further supporting the reliability of our method and results (Fig. 7, Figs. S12, S13, S14).

On morphological grounds, Actinidia is divided into four infrageneric sections: Leiocarpae (Lei), Maculatae (Mac), Stellatae (Ste) and Strigosae (Str) (Chat et al. 2004; Testolin et al. 2016). Our phylogenetic tree is largely consistent with these four sections. The 25 taxa sampled comprise nine species from section Ste, eight from Lei, six from Mac and two from Str (Fig. 7).

All members of sections Ste and Mac cluster in the neighboring Groups II and III, indicating relatively close phylogenetic relationships. Specifically, five Ste members, four Mac members and the two Str members cluster together in Group II, whereas Group III contains the remaining four Ste members, two Mac members and one Lei member. Notably, seven of the eight section Lei members (A. kolomikta, A. valvata, A. polygama, A. macrosperma, A. arguta, A. arguta var. giraldii and A. tetramera) are placed consecutively in Group I, supporting a basal position for most Lei species in the genus Actinidia (Fig. 7). The main exception is A. rufa, a member of section Lei, which clusters with A. zhejiangensis to form a separate clade within Group III. However, this exception does not appear to conflict with two other studies based on either nuclear genome SNPs (Liu et al. 2017) or four polymorphic intergenic spacer sequences from chloroplast genomes (Tang et al. 2019b).

Two recent phylogenetic studies were based on genome-wide SNPs (Liu et al. 2017) and on four chloroplast intergenic spacer sequences (Tang et al. 2019b), respectively. Our phylogenetic tree is largely consistent with the genome-wide SNP tree (Liu et al. 2017), supporting the improved resolution of interspecific relationships among Actinidia species obtained here with whole chloroplast genome sequences. One exception is A. zhejiangensis. In the genome-wide SNP tree (Liu et al. 2017), it clusters closely with A. latifolia, A. eriantha, A. fulvicoma, A. cylindrica, A. callosa var. henryi and A. lanceolata. In our tree, it is instead closely related to the A. chinensis complex (A. chinensis, A. chinensis var. deliciosa and A. chinensis var. setosa), A. callosa var. strigillosa, A. indochinensis and A. rufa, with which it forms a monophyletic clade (Fig. 7).

A. valvata clusters closely with A. polygama both in our tree (Fig. 7) and in the nuclear SNP tree (Liu et al. 2017), whereas in the tree based on four chloroplast intergenic spacer sequences it is most closely related to A. tetramera (Tang et al. 2019b). In addition, our data and both previous studies (Liu et al. 2017; Tang et al. 2019b) place A. macrosperma, together with other section Lei species, in the basal clade of Actinidia (Fig. 7). Interestingly, A. macrosperma has a different sister species in each of the three studies: A. kolomikta (Liu et al. 2017), A. polygama (Tang et al. 2019b) and A. arguta (this study).

For the A. chinensis complex (A. chinensis, A. chinensis var. deliciosa and A. chinensis var. setosa), the A. arguta complex (A. arguta and A. arguta var. giraldii) and the A. cylindrica complex (A. cylindrica var. cylindrica and A. cylindrica var. reticulata), both our tree and the nuclear SNP tree (Liu et al. 2017) support close relationships among the members of each complex, which cluster within the same main clade in both trees (Fig. 7). Interestingly, in both trees A. indochinensis, rather than another member of the A. chinensis complex, is sister to A. chinensis (Fig. 7; Liu et al. 2017). Both studies show similar patterns for the A. arguta and A. cylindrica complexes. This suggests that the members of each Actinidia species complex may have undergone substantially divergent evolution.

We suggest that these discrepancies may arise from repeated natural interspecific hybridization and/or introgression events, which have produced combinations of distantly related cytoplasmic and nuclear genomes and reticulate evolution in Actinidia (Chat et al. 2004; Testolin et al. 2016). The independent evolutionary trajectories of the chloroplast and nuclear genomes may also contribute.

Conclusion

In this study, chloroplast genome-wide comparative analyses were performed on 25 Actinidia species. The average chloroplast genome size is 156,673.38 bp, with an average GC content of 37.20%. Variation in total gene number resulted mainly from gene copy number variation and gene loss. Long repeat sequences, rather than SSRs, are the main repeats contributing to genome size expansion. The most hypervariable region, an evolutionary hotspot in Actinidia species, is rps12 ~ psbB. Within this region, a clpP gene sequence with merged exons and a lost intron was discovered, and this variant is implicated in the differentiation of Actinidiaceae. The phylogenetic relationships of the 25 Actinidia taxa were also refined.