Introduction

Two copies of an inverted repeat (IR) separated by the large and small single-copy (LSC and SSC) regions are a typical feature of most plastomes (Jansen and Ruhlman 2012). The plastome of land plants is typically 120–160 kb in length (Bock 2007), but its size varies considerably among some photosynthetic seed plants (Jansen and Ruhlman 2012). An extreme example comes from the family Geraniaceae, in which plastome size ranges from 116,935 bp in Erodium carvifolium to 217,942 bp in Pelargonium × hortorum (Blazier et al. 2011; Guisinger et al. 2011). Expansion, contraction, or loss of the IR is one of the most evident causes of plastome size variation (Jansen and Ruhlman 2012). The IR of land plants is typically 15–30 kb (Zhu et al. 2016) but ranges from 10 to 76 kb when extreme examples are considered (Palmer 1991). Small IR expansions and contractions (<100 bp) are very frequent in angiosperm plastomes (Goulding et al. 1996), whereas large IR expansions (>1000 bp) occur less often, and large contractions are even rarer (Raubeson and Jansen 2004). Large IR expansions have been reported from plastomes of Pelargonium × hortorum, Berberidaceae, Ranunculaceae, and some other plants (Chumley et al. 2006; Downie and Jansen 2015; Goulding et al. 1996; Hoot and Palmer 1994; Kim and Jansen 1994; Luo et al. 2016; Palmer et al. 1987a; Plunkett and Downie 2000; Shen et al. 1982). Large IR contractions have been reported in the fern genera Osmunda (Stein et al. 1986) and Ophioglossum (Grewe et al. 2013) and in some members of Myoporaceae, Loganiaceae, and Apiaceae (Plunkett and Downie 2000). In the most extreme cases, the IR has contracted almost completely or has been lost altogether, as observed in conifers (Guo et al. 2014; Wu and Chaw 2014), the inverted-repeat-lacking clade (IRLC) legumes (Doyle et al. 1996; Palmer et al. 1987b), some species of Erodium (Guisinger et al. 2011), and some broomrapes (Blazier et al. 2016; Wicke et al. 2013).
In addition, repeat content has also been considered an important contributor to plastome size variation (Dugas et al. 2015; Jo et al. 2011). The size of the four published mimosoid plastomes varies significantly with IR length. Inga leiocalycina, which has an expanded IR, has the largest plastome (175,489 bp) among the legumes studied so far (Dugas et al. 2015). Tandem repeat proportion has also been inferred to be associated with mimosoid plastome size (Dugas et al. 2015). The size variation of mimosoid plastomes should therefore be addressed systematically by including more representative taxa.

Previous studies have revealed significant structural variation among legume plastomes, most of it in subfamily Papilionoideae. A 50-kb inversion in the LSC region occurs in the plastomes of most papilionoids except a few early-diverging lineages (Doyle et al. 1996). A further 78-kb inversion including the 50-kb segment occurs in Vigna radiata and Phaseolus vulgaris (Palmer et al. 1988), and a further 36-kb inversion within the 50-kb segment occurs in Lupinus luteus (Martin et al. 2014). Plastomes of the IRLC have lost their IRs (Doyle et al. 1996; Palmer et al. 1987b; Wojciechowski et al. 2000). Significant plastome structural rearrangements have been detected in many species of the IRLC. For example, the plastomes of the “core Trifolium” have undergone extensive genomic reconfiguration, including many gene duplications, losses, and order changes attributable to inversions (Cai et al. 2008; Sveinsson and Cronk 2014); the plastome of Cicer arietinum exhibits many gene and intron losses (Jansen et al. 2008); and the plastomes of Astragalus species present several gene losses and large inversions (Choi et al. 2015; Lei et al. 2016). However, only a few studies have focused on the other two subfamilies. Within Mimosoideae, a ca. 13-kb IR-SSC boundary shift was detected in the plastomes of Acacia and Inga, and a loss of clpP intron 1 was detected in the plastome of Inga (Dugas et al. 2015). Five plastomes of Caesalpinioideae are available in GenBank (National Center for Biotechnology Information), but the relevant results have not been published. More samples should be included in comparative analyses to detect new plastome structural variation in these two subfamilies.

Mimosoideae is the second largest subfamily of Fabaceae and is characterized by regular flowers that often have increased stamen numbers (Käss and Wink 1996). It comprises 83 genera and approximately 3270 species in four recognized tribes: Acacieae, Ingeae, Mimoseae, and Mimozygantheae (Lewis et al. 2005; LPWG 2013). Mimosoids have a pantropical distribution and form ecologically abundant elements in all major tropical biomes (Lewis et al. 2005; LPWG 2013). Many mimosoid species are important as horticultural plants (species of Albizia and Calliandra), tropical forage crops (Leucaena leucocephala), timbers (species of Acacia, Anadenanthera, and Prosopis), and sources of glues and food thickeners (species of Acacia) (Lewis et al. 2005).

Despite the economic and ecological importance of the group, the classification and phylogenetic relationships of Mimosoideae remain unclear. The monophyly of the subfamily is strongly supported by previous molecular studies (LPWG 2013). Nevertheless, analyses of a few plastid and nuclear gene regions indicate that all traditionally recognized tribes (excluding the monotypic Mimozygantheae) are non-monophyletic, and relationships among many major lineages remain unresolved (LPWG 2013). Phylogenomics applies genomic data to reconstruct the evolutionary history of organisms (Delsuc et al. 2005; Eisen 1998; Eisen and Fraser 2003) and has been used to tackle rapidly radiating clades (Barrett et al. 2014; Bewick et al. 2012; Jian et al. 2008; Ruhfel et al. 2014; Wickett et al. 2014; Zhou et al. 2012). A plastid phylogenomic approach has successfully resolved many enigmatic relationships within angiosperms and across all green plants (Jansen et al. 2007; Moore et al. 2007; Moore et al. 2010; Xi et al. 2012; Zhong et al. 2010). To fully resolve phylogenetic relationships within this subfamily, more variable plastid gene fragments and even whole plastomes should be included in the analysis.

In this study, we integrated 11 newly sequenced plastomes representing three of four traditionally recognized mimosoid tribes with three mimosoid plastomes available in GenBank. The main objectives of this study were (1) to reveal plastome structural and size variation in mimosoids, (2) to identify synonymous substitution rate shifts of genes relocated into or out of the IR, (3) to explore sequence divergence of plastome regions for further evolutionary and systematic study of mimosoids, and (4) to test the applicability of plastid phylogenomics in resolving phylogenetic relationships of mimosoids and trace the evolutionary pattern of IR expansion/contraction.

Materials and methods

Taxa sampling

Fresh leaves and silica-gel-dried materials were sampled from 11 species representing 11 genera of three tribes of the subfamily Mimosoideae and one species (Cadellia pentastylis) of Surianaceae (unpublished). The voucher specimens for the ten freshly sampled plants collected from China and Australia were deposited at the Herbarium of Kunming Institute of Botany (KUN). The two silica-gel-dried materials from South Africa were obtained from the University of Johannesburg (JRAU). Details of the collections are given in Online Resource 1. Three other mimosoid plastomes obtained from GenBank were also included in the analyses. We also included 27 other legume plastomes available in GenBank (Table 1) as closely related outgroups and the newly sequenced C. pentastylis as an outgroup of Fabaceae to reconstruct phylogenetic relationships and to infer the evolutionary pattern of IR expansion/contraction.

Table 1 Accession number and reference of sampled legume plastomes obtained from GenBank

Chloroplast DNA extraction and sequencing

Two different methods were used to obtain plastomes, depending on the availability of fresh leaf material. For the four species with more than 50 g of fresh leaves collected from the Kunming Botanical Garden (KBG) and Xishuangbanna Tropical Botanical Garden (XTBG), chloroplast DNA (cpDNA) was extracted using the protocol described in Zhang et al. (2011). For the remaining seven species obtained from Australia and South Africa, total genomic DNA was first isolated with a modified cetyl trimethyl ammonium bromide (CTAB) method (Doyle 1987), in which 4% CTAB with approximately 1% polyvinyl polypyrrolidone (PVP) and 0.2% dl-dithiothreitol (DTT) was included (Yang et al. 2014); cpDNA was subsequently amplified using long-range polymerase chain reaction (PCR) with 15 universal primer pairs and the methods described in Zhang et al. (2016). The isolated and amplified cpDNA was fragmented to construct short-insert (500 bp) libraries following the manufacturer’s manual (Illumina). Paired-end (PE) sequencing was performed on an Illumina HiSeq 2500 instrument at Beijing Genomics Institute (BGI, Shenzhen, Guangdong, China) and a HiSeq 2000 instrument at the Plant Germplasm and Genomics Center (Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China).

Plastome assembly and annotation

Quality control checks on the short reads were performed using NGS QC Tool Kit (Patel and Jain 2012) with default parameters. High-quality short reads were de novo assembled into contigs using CLC Genomics Workbench version 8.5.1 (evaluation version, CLC Bio), with a k-mer of 63 and a minimum contig length of 1000 bp. Subsequently, all contigs were aligned with reference plastomes (plastomes of subfamily Mimosoideae; Dugas et al. 2015; Williams et al. 2015) using Basic Local Alignment Search Tool (BLAST; Altschul et al. 1990) with default search parameters. Then, we determined the proper orders of the aligned contigs according to the reference plastomes, and the gaps between the de novo assembled contigs were filled with consensus sequences of raw reads mapped to the reference plastomes. At this point, the target contigs were assembled into complete plastomes. Finally, we designed primer pairs (see Online Resource 2) to verify the four junctions between the single copy (SC) and IR regions of the plastomes. PCRs were performed in 25.5 μL volume using 12.5 μL Taq DNA polymerase, 0.5 μL each of forward and reverse primers (dissolved in 10× ddH2O), and 1 μL (30 ng/μL) template DNA under the following conditions: 3 min at 94 °C, followed by 35 cycles of 50 s at 94 °C, 2 min at 50 °C, and 1 min at 72 °C, and then a final 8-min extension at 72 °C. PCR products were sequenced at the Kunming Sequencing Department of Biosune Biotechnology Limited Company (Shanghai, China). Sanger sequences and assembled plastomes were aligned using Geneious version 8.1.4 (Kearse et al. 2012) to determine if there were any differences.

Annotation of the plastomes was performed using DOGMA (Wyman et al. 2004), coupled with manual correction of start and stop codons and intron/exon boundaries. The online tRNAscan-SE service (Schattner et al. 2005) was used to further verify tRNA genes. The final complete plastomes were deposited in GenBank with accession numbers KX852435–KX852445 (see Online Resource 1). OrganellarGenomeDRAW (Lohse et al. 2013) was used to draw physical maps of all newly sequenced plastomes; the maps of ten species are shown in Online Resources 3 and 4. To determine the number of mapped paired-end reads and the depth of coverage, paired reads were remapped to the assembled genome using Bowtie 2 (Langmead and Salzberg 2012) implemented in Geneious version 8.1.4.

Repeat analysis

Tandem repeat composition and distribution were identified in 14 mimosoid plastomes using the Tandem Repeat Finder (TRF) web interface (Benson 1999) with default parameters. Overlapping repeats were treated as a single repeat when counting repeat size and number. REPuter (Kurtz et al. 2001) was used to identify dispersed repeats, including forward, reverse, and palindromic repeats. Dispersed repeats with a minimum size of 30 bp were detected using a Hamming distance of 3, corresponding to a sequence identity of at least 90%. The IRA of each plastome was removed before dispersed repeat detection, and the locations of repeats in IRA were then determined manually from those detected in IRB. Because REPuter overestimates the number of repetitive elements in a given sequence by recognizing nested or overlapping repeats within a region containing multiple repeats (Curci et al. 2015; Downie and Jansen 2015), all overlapping repeats were manually identified and then adjusted or excluded according to the following criteria: a repeat was excluded when it was nested wholly within another repeat, or when it was nested in or mostly overlapped with tandem repeats found by TRF; and if the motifs of repeats within one region were all recognized as repetitive with those of another region, only the first motif within each of the two regions was retained. Correlation between repeat proportion (total repeat length/plastome size) and plastome size was analyzed using R v3.3.2 (R Development Core Team 2016). To compare results on a common basis, we divided the sampled mimosoids into IR-expanded species and IR-canonical species according to their IRs. Acacia dealbata and Pithecellobium flexicaule were excluded from the analysis of IR-expanded species because of additional large IR-LSC junction shifts in their plastomes (see “Results”). We also counted the repeats, measured their length, and calculated their proportion in each of the LSC, SSC, and IR regions.
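
The counting rules above (overlapping repeats merged into one, repeat proportion as total repeat length divided by plastome size, and the identity implied by a Hamming distance of 3 on a 30-bp repeat) can be illustrated with a minimal Python sketch. This is not the TRF or REPuter implementation; the function names and the example coordinates are hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals; coordinates are 1-based, inclusive."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous repeat: extend it instead of counting twice.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_proportion(intervals, plastome_size):
    """Return (repeat count, total repeat length, proportion of plastome)."""
    merged = merge_intervals(intervals)
    total = sum(end - start + 1 for start, end in merged)
    return len(merged), total, total / plastome_size


def min_identity(repeat_length, hamming_distance):
    """Lowest identity allowed for a repeat of a given length and mismatch limit."""
    return 1 - hamming_distance / repeat_length


# Hypothetical repeat hits on a hypothetical 160,000-bp plastome.
hits = [(100, 140), (130, 160), (500, 529)]
count, total, proportion = repeat_proportion(hits, 160_000)
print(count, total, round(proportion * 100, 4), min_identity(30, 3))
```

With a fixed mismatch limit, longer repeats have higher minimum identity, so 90% is the lower bound reached only by the shortest (30-bp) repeats.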

Sequence divergences

Alignment and comparison of all 14 sampled mimosoid plastomes were performed using mVISTA (Frazer et al. 2004) in LAGAN mode. Pithecellobium flexicaule was used as the reference because it has the largest plastome and the most genes in the IR.

To identify sequence divergence in mimosoid plastomes, 111 coding regions (including protein-coding genes, tRNAs, and rRNAs) and 128 noncoding regions (including intergenic spacers and introns) from 14 plastome sequences were extracted and then aligned using MEGA6 (Tamura et al. 2013). The 421-bp rps18-containing inversion in A. dealbata (see “Results”) was manually reversed (boundaries of this inversion were delimited by comparing the plastome of this species with that of Acacia podalyriifolia, an unpublished plastome). The duplicated clpP copy located between psbZ and trnG GCC of A. dealbata was removed from the alignment. Comparisons of the extracted coding and noncoding regions, including the numbers of constant sites, parsimony-uninformative sites, and parsimony-informative sites (PIS), and the range of sequence divergence in pairwise comparisons (uncorrected “p” distance), were conducted using PAUP* 4.0a147 (Swofford 2002) with “gap” states treated as “missing data.”
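
As an illustration of the statistics named above, the following Python sketch computes the uncorrected p-distance between two aligned sequences and classifies alignment columns as constant, parsimony-uninformative, or parsimony-informative, skipping gaps as missing data. It is a simplified stand-in for PAUP*, not a reimplementation: treating "N" and "?" as missing in the same way as gaps is an assumption, IUPAC ambiguity codes are not handled, and the function names and toy alignment are hypothetical.

```python
from collections import Counter

MISSING = set("-?N")  # assumption: gaps, '?' and 'N' are all treated as missing


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among sites present in both sequences."""
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else float("nan")


def classify_sites(alignment):
    """Count (constant, parsimony-uninformative, parsimony-informative) columns."""
    constant = uninformative = informative = 0
    for column in zip(*alignment):
        counts = Counter(c for c in (x.upper() for x in column) if c not in MISSING)
        if len(counts) <= 1:
            constant += 1
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            # At least two states, each shared by at least two sequences.
            informative += 1
        else:
            uninformative += 1
    return constant, uninformative, informative


toy = ["ACGTA-", "ACGTAC", "ATGCAC", "ATGCTC"]  # hypothetical aligned region
print(classify_sites(toy), p_distance(toy[0], toy[3]))
```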

Synonymous substitution rate shifts of relocated genes

Fourteen protein-coding genes in mimosoid plastomes have been relocated into or out of the IR (see “Results”). Nucleotide synonymous substitution rates were estimated for ten of them (ndhA, ndhD, ndhE, ndhG, ndhH, ndhI, psaC, rpl2, rps3, and rps15); ndhF, rpl23, rps19, and ycf1, which span IR-SC boundaries, were excluded from the analyses following Zhu et al. (2016). Sequences for each group of species (out-IR and in-IR) were aligned using MUSCLE (Edgar 2004). Adenanthera microsperma was used as the reference for estimating the synonymous substitution rate in each species using CodeML in PAML v4.8 (Yang 2007) with the options runmode = -2, seqtype = 1, and CodonFreq = 2. Average synonymous substitution rates were then obtained for each data matrix. A significance test was conducted using the cor.test function in R v3.3.2 (R Development Core Team 2016).
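
For readers unfamiliar with CodeML, the three options quoted above are set in a plain-text control file. The Python sketch below writes a minimal control file containing only those options plus the input and output file names; the file names are hypothetical, the helper function is not part of PAML, and all other CodeML settings are assumed to be left at their defaults, which may not match the settings actually used in this study.

```python
def pairwise_codeml_ctl(seqfile, outfile):
    """Return the text of a minimal codeml control file for pairwise estimation."""
    options = {
        "seqfile": seqfile,   # codon alignment (hypothetical file name)
        "outfile": outfile,   # main result file (hypothetical file name)
        "runmode": -2,        # pairwise comparison of sequences
        "seqtype": 1,         # codon sequences
        "CodonFreq": 2,       # F3x4 codon frequency model
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"


print(pairwise_codeml_ctl("ndhA_inIR.phy", "ndhA_inIR.out"))
```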

Phylogenetic analysis and the evolutionary pattern of IR expansion/contraction

All 77 protein-coding genes were extracted and aligned using MAFFT version 7 (Katoh and Standley 2013) implemented in Geneious version 8.1.4 with default parameters. These matrices were then concatenated to a final data matrix (deposited at the TreeBASE repository, http://purl.org/phylo/treebase/phylows/study/TB2:S20385). The maximum likelihood (ML) phylogeny was reconstructed using RAxML version 8.1.1179 (Stamatakis 2014) at the XSEDE Teragrid of the CIPRES science Gateway (Miller et al. 2010), including tree robustness assessment using 1000 replicates of rapid bootstrap with the GTRGAMMA substitution model.
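
The concatenation step can be sketched in Python as follows. This is an illustrative helper rather than the procedure carried out in Geneious: the function name and the toy data are hypothetical, and filling a taxon that lacks a gene with "?" is an assumption about how missing genes would be handled.

```python
def concatenate(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions); partitions hold 1-based inclusive
    (gene, start, end) coordinates within the concatenated matrix.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    position = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences of {gene} are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is filled with '?'.
            pieces[taxon].append(aln.get(taxon, "?" * length))
        partitions.append((gene, position, position + length - 1))
        position += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


toy = {"rbcL": {"A": "ATG", "B": "ATA"}, "matK": {"A": "CC"}}  # hypothetical data
print(concatenate(toy))
```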

Fitch parsimony (FP, a character optimization method) was performed using Mesquite version 3.10 (build 765; Maddison and Maddison 2016) to trace the evolutionary pattern of IR expansion/contraction in legumes. The IR was treated as one character with five coded character states (0, IR-lacked; 1, IR-canonical; 2, IR-expanded into SSC; 3, IR-expanded into LSC; 4, IR-contracted out of LSC). The state scores for each sampled species are listed in Online Resource 5. The newly reconstructed ML phylogeny was used for tracing the ancestral states of the IR.
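
The bottom-up pass of Fitch parsimony for a single unordered character can be sketched in a few lines of Python. This is an illustration only, not Mesquite's implementation: it returns the candidate state set at the root and the minimum number of changes, and a second (top-down) pass would be needed to assign states to every internal node. The toy tree, the tip names, and the tip scores are hypothetical and use the state codes defined above.

```python
def fitch(tree, tip_states):
    """Bottom-up Fitch pass on a binary tree given as nested 2-tuples of tip names.

    Returns (set of most-parsimonious states at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: keep the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-tip tree; states: 0 IR-lacked, 1 IR-canonical, 2 IR-expanded into SSC.
toy_tree = ((("Ingeae_sp", "Acacia_sp"), "Mimoseae_sp"), ("IRLC_sp", "Caesalpinioid_sp"))
toy_states = {"Ingeae_sp": 2, "Acacia_sp": 2, "Mimoseae_sp": 1,
              "IRLC_sp": 0, "Caesalpinioid_sp": 1}
print(fitch(toy_tree, toy_states))
```

On this toy tree the root is reconstructed as IR-canonical with two changes, one on the branch leading to the two IR-expanded tips and one on the branch leading to the IR-lacking tip.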

Results

Plastome organization

All newly sequenced mimosoid plastomes displayed the typical quadripartite structure, conserved gene content, and a similar gene order. The mean coverage of the 11 newly sequenced mimosoid plastomes ranged from 1011.8× (Pararchidendron pruinosum) to 2521.6× (Parkia javanica). The overall GC content was relatively conserved at around 36% (Table 2). Each plastome contained 111 different genes, including 77 protein-coding, 30 tRNA, and 4 ribosomal RNA (rRNA) genes (Fig. 1, Table 3 and Online Resources 3 and 4). Eighteen genes (six of which are tRNAs) contained introns. The protein-coding genes rps12 and ycf3 each contained two introns. The rps12 gene was trans-spliced, with exon 1 in the LSC and exons 2 and 3 in the IR. The clpP gene also contained two introns in all sampled species except Faidherbia albida and Acacia dealbata, both of which lacked both clpP introns. In addition, a duplicated clpP copy was found between psbZ and trnG GCC in A. dealbata. A 421-bp inversion containing rps18 was also found in A. dealbata.

Table 2 Plastome information and characteristics of sampled mimosoids
Fig. 1

Gene maps of the plastome of Acacia dealbata. Genes are indicated by boxes on the inside (clockwise transcription) and outside (counterclockwise transcription) of the outermost circle. The inner circle identifies the major structural components of the plastome (LSC, IR, and SSC). Genes belonging to different functional groups are color-coded. The dashed area in the inner circle indicates the GC content of the plastome

Table 3 Genes identified in the 11 newly sequenced mimosoid plastomes

IR expansion and contraction in mimosoid plastomes

Within the sampled mimosoids, six species of Adenanthera, Dichrostachys, Leucaena, Parkia, Piptadenia, and Prosopis, all belonging to tribe Mimoseae, had canonical IRs (ranging from 26,007 bp in P. javanica to 26,142 bp in Dichrostachys cinerea; Table 2), which contained 16 complete genes (five protein-coding genes, seven tRNAs, and all four rRNAs) and parts of the 5′ ends of ycf1 (692–779 bp) and rps19 (91–105 bp) (Fig. 2). The remaining eight species, belonging to tribe Ingeae and Acacia s.s., had much longer IRs, ranging from 38,254 bp (A. dealbata) to 41,503 bp (Pithecellobium flexicaule), because all of them had experienced a ca. 13-kb IR expansion into the SSC. Accordingly, their IRs included nine more protein-coding genes. The IRB/SSC junction (JSB) occurred within ndhF, resulting in the duplication of the 3′ end (6–239 bp) of this gene in the IRs, and the SSC/IRA junction (JSA) occurred between ccsA and ndhD in all eight species except F. albida, in which the last nucleotide of the stop codon of ccsA was included in the IR. The LSC/IRB junction (JLB) of most of these eight IR-expanded species was also located in rps19, placing 98–109 bp of the 5′ end of this gene in the IR. However, a ca. 1.7-kb IR expansion into the LSC was detected in P. flexicaule, placing its JLB between rpl16 and rps3 and two more genes (rps3 and rps19) in the IR. In contrast, a ca. 1.9-kb IR contraction out of the LSC was found in A. dealbata, placing its JLB within rpl23, so that rpl2, rps19, and 199 bp of the 3′ end of rpl23 were located in the LSC. Accordingly, the length of the LSC in the sampled mimosoid plastomes showed relatively moderate variation, ranging from 88,577 bp (Adenanthera microsperma) to 93,690 bp (Leucaena trichandra), whereas the length of the SSC varied significantly, ranging from 4573 bp (Archidendron lucyi) to 18,941 bp (Piptadenia communis).
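
Because the IR is present in two copies, sequence transferred from a single-copy region into the IR is counted twice in the total plastome length. The short Python sketch below works through this arithmetic; the region lengths are hypothetical round numbers chosen for illustration, not values from Table 2, and the function names are likewise illustrative.

```python
def plastome_size(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def expand_ir_into_ssc(lsc, ssc, ir, shift):
    """Move `shift` bp from the SSC into each IR copy (an IR-SSC boundary shift)."""
    if shift > ssc:
        raise ValueError("shift cannot exceed the SSC length")
    return lsc, ssc - shift, ir + shift


# Hypothetical region lengths (bp) for an IR-canonical plastome.
before = (90_000, 18_000, 26_000)
after = expand_ir_into_ssc(*before, shift=13_000)
print(plastome_size(*before), plastome_size(*after))
```

The SSC shrinks by the shifted length while each of the two IR copies grows by it, so the plastome as a whole grows by exactly the shifted length (13 kb in this toy case).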

Fig. 2

Comparison of the IR-SC boundaries among sampled mimosoid plastomes. The JSB, JSA and JLA refer to junctions of SSC/IRB, SSC/IRA, and LSC/IRA, respectively. Ψ indicates a pseudogene resulting from partial duplication of the gene in the IR

Repeat analyses

Tandem repeats were detected at 54 (A. microsperma) to 96 (L. trichandra) sites among IR-canonical species and at 64 (Albizia odoratissima) to 104 (P. flexicaule) sites among IR-expanded species (a detailed list of repeat numbers and lengths is given in Online Resource 6). The total length of tandem repeats ranged from 2902 bp (A. microsperma) to 7018 bp (P. flexicaule), and their proportion ranged from 1.82% (A. microsperma) to 3.92% (A. dealbata and P. flexicaule). After excluding the overlapping repeats detected by REPuter and TRF and taking one IR copy into account, 11 (A. odoratissima) to 57 (L. trichandra) pairs of dispersed repeats were identified: 12 (A. microsperma) to 57 (L. trichandra) within IR-canonical species and 11 (A. odoratissima) to 45 (P. flexicaule) within IR-expanded species. In the majority of these species, forward repeats were the most numerous dispersed repeats, followed by palindromic repeats, and reverse repeats were the least numerous. The length and proportion of the dispersed repeats ranged from 813 bp and 0.46% (A. odoratissima) to 3222 bp and 1.85% (A. dealbata). A significant correlation (p < 0.05) was found between both tandem and dispersed repeat proportion and plastome size in both IR-expanded and IR-canonical species (Table 4). The LSC regions of all studied mimosoid plastomes possessed the highest proportion of both tandem and dispersed repeats (Fig. 3 and Online Resource 6). Within IR-canonical species, the SSC contained a higher repeat proportion than the IR. Within IR-expanded species, the SSC had a higher repeat proportion than the IR in some species and a lower proportion in others.

Table 4 Correlation analyses between repeat proportion and plastome size
Fig. 3

Repeat proportion in different regions of mimosoid plastomes. a Tandem repeats. b Dispersed repeats. Species are ordered by plastome size from small to large. LSC large single copy, SSC small single copy, IR inverted repeat. The yellow-dashed area refers to IR-canonical species, while the gray-dashed area refers to IR-expanded species

Identification of sequence divergence

Comparison of sampled mimosoid plastomes using mVISTA revealed that noncoding regions were more diverged than coding regions. Generally, noncoding regions located in SCs were more divergent than those in IRs (Fig. 4).

Fig. 4

Identity plot comparing the mimosoid plastomes. Pithecellobium flexicaule was used as reference. Coding and noncoding regions are colored blue and red, respectively

The aligned length of the 111 coding and 128 noncoding regions ranged from 9 bp (psbF-psbE) to 6957 bp (ycf2). The number of PIS ranged from 0 (for 45 loci) to 302 (trnS GCU-trnG UCC). The number of variable sites ranged from 0 (for 17 loci) to 948 (ycf1). The five regions with relatively high sequence divergence were ccsA-ndhD (3.56–24.80%), clpP (0.68–29.25%), psaA-ycf3 (0.83–28.95%), rps12-clpP (1.49–26.04%), and trnS GCU-trnG UCC (2.77–25.41%), while those with no sequence divergence were mostly tRNAs and rrn5. Detailed comparative results are given in Online Resource 7. Percentages of variable sites and PIS in coding and noncoding regions are shown in Fig. 5. Among coding regions, accD, clpP, rpl32, rps3, and ycf1 had high percentages of variable sites and PIS, with clpP having an especially high PIS percentage (19.59%) compared with the other four regions (all less than 5%) (Fig. 5a). The percentages of both variable sites and PIS in noncoding regions were much higher than those in coding regions. The five most variable regions were accD-psaI, ccsA-ndhD, psaA-ycf3, psbI-trnS GCU, and trnS GCU-trnG UCC. The five regions with the highest PIS percentages were accD-psaI, ccsA-ndhD, psbI-trnS GCU, rpl33-rps18, and trnS GCU-trnG UCC (Fig. 5b).

Fig. 5

Percentages of variable and parsimony-informative sites in homologous regions among mimosoid plastomes. a Coding regions; b noncoding regions. The homologous regions are oriented according to their locations in the plastomes

Synonymous substitution rate shifts of relocated genes

Nucleotide synonymous substitution rate shifts for genes relocated into and out of the IR are shown in Online Resource 8. Most of these genes had higher synonymous substitution rates in the SC regions than in the IR (Fig. 6). Linear regression analysis showed decreased synonymous substitution rates for genes relocated from the SSC into the IR (y = 0.8899x). However, the synonymous substitution rate of rps3 increased following relocation from the LSC into the IR.
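
A regression line constrained to pass through the origin, such as y = 0.8899x, has the least-squares slope sum(x*y)/sum(x*x); a slope below 1 means that in-IR rates (y) are on average lower than out-IR rates (x). The Python sketch below shows this calculation. The rate values are hypothetical and are not the estimates in Online Resource 8, and the function name is illustrative.

```python
def slope_through_origin(x, y):
    """Least-squares slope b of y = b * x with no intercept term."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sum_xx = sum(a * a for a in x)
    if sum_xx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(a * b for a, b in zip(x, y)) / sum_xx


# Hypothetical synonymous substitution rates: out-IR (x) versus in-IR (y).
out_ir = [0.020, 0.035, 0.050, 0.065]
in_ir = [0.017, 0.032, 0.044, 0.058]
print(round(slope_through_origin(out_ir, in_ir), 4))
```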

Fig. 6

Synonymous substitution rate shifts of relocated genes. Hollow circles refer to the substitution rates of ndhA, ndhD, ndhE, ndhG, ndhH, ndhI, psaC, rpl2, rps3, and rps15 in-IR (ordinate) and out-IR (abscissa). The red line shows the best-fit regression (constrained to pass through the origin); the black diagonal line refers to the y = x line

Phylogenetic relationships and evolutionary pattern of IR expansion/contraction in mimosoids

The phylogenetic analysis based on 77 protein-coding genes resolved relationships among the sampled species from three subfamilies with high bootstrap support (BS) (Fig. 7). Mimosoideae and Papilionoideae were each strongly supported as monophyletic (BS = 100) and nested within the paraphyletic Caesalpinioideae. Phylogenetic relationships among the sampled mimosoids were resolved with high support. The clade comprising Ingeae and Acacia s.s. (nested within Ingeae) was strongly supported (BS = 100). The tribe Mimoseae was recovered as paraphyletic.

Fig. 7

The maximum likelihood (ML) tree of sampled mimosoids based on 77 protein coding genes. Numbers at the left of nodes are bootstrap support values, and bootstrap values of 100% are not shown

The state IR-canonical was reconstructed as the plesiomorphic state of Fabaceae (Fig. 8). The states IR-expanded into SSC and IR-lacked were synapomorphies of the Ingeae + Acacia s.s. clade and the IRLC, respectively. The states IR-expanded into LSC and IR-contracted out of LSC evolved separately in different species within the Ingeae + Acacia s.s. clade and are therefore autapomorphies of those species.

Fig. 8

The evolutionary pattern of IR expansion/contraction

Discussion

This study revealed moderate variation in mimosoid plastomes that is consistent with, and adds to, the findings of Dugas et al. (2015). The previously reported ca. 13-kb IR-SSC boundary shift (Dugas et al. 2015) was detected in the plastomes of all sampled species within the clade formed by Ingeae and Acacia s.s. and led to a major increase in plastome size. In addition, several structural variations were found for the first time in this subfamily, including a 421-bp inversion containing rps18 and a duplicated copy of clpP in the plastome of Acacia dealbata, and the loss of both clpP introns in the plastomes of A. dealbata and Faidherbia albida (Fig. 1 and Online Resources 3 and 4).

Structural and size variation of mimosoid plastomes

Shifts of the IR-SC boundaries played an important role in mimosoid plastome size variation (Dugas et al. 2015). The ca. 13-kb IR expansion into the SSC resulted in significantly larger plastomes in the Ingeae + Acacia s.s. clade. The plastome of Pithecellobium flexicaule experienced a further IR expansion into the LSC, which led to the largest known legume plastome (Fig. 2). In contrast, the plastome of A. dealbata, which experienced an IR contraction out of the LSC, was the shortest within this clade.

Goulding et al. (1996) proposed a double-strand break (DSB) model, which starts with a double-strand break followed by strand invasion and recombination, to explain the IR expansion in Nicotiana acuminata. This model was applied by Wang et al. (2008) to explain IR expansions in many angiosperm lineages. Previous studies also suggested that regions with a high content of short repeats or “poly A tracts” are closely associated with the dynamics of IR-LSC junctions and expansions of the IR (Dugas et al. 2015; Wang et al. 2008). In all eight mimosoid plastomes that exhibit the ca. 13-kb IR expansion into the SSC, the ca. 100-bp regions upstream of the IR-SSC junctions are extremely AT-rich (>90%) and contain many poly A tracts and short repeats. A process that initiates with a DSB and terminates in AT-rich or poly A-tract regions could therefore explain the large IR expansion in mimosoid plastomes. Large IR contractions have rarely been reported; illegitimate recombination has been considered the most plausible explanation (Blazier et al. 2016; Downie and Jansen 2015; Goulding et al. 1996).

A duplicated copy of clpP was found between psbZ and trnG GCC in A. dealbata (Fig. 1). Gene duplications caused by the IR are common in plastomes, and gene duplication is believed to be an important driving force in genome evolution, leading to the creation of new genes and new gene functions (Xiong et al. 2009). Gene duplication has previously been reported in multiple angiosperm lineages, and most of the duplicated genes are tRNAs (Drabkova et al. 2004; Haberle et al. 2008; Hipkins et al. 1995; Koch et al. 2005; Lee et al. 2007; Lin et al. 2012; Vijverberg and Bachmann 1999). Duplication of protein-coding genes outside the IR is rare but has been reported in Pinus thunbergii (psaM; Wakasugi et al. 1994), Jasminum and Poaceae (rpl23; Lee et al. 2007; Xiong et al. 2009), Silene and Lychnis (clpP; Erixon and Oxelman 2008), Trachelium (psbJ; Haberle et al. 2008) and Pelargonium (many genes; Chumley et al. 2006). Haberle et al. (2008) suggested that duplicative transposition may be responsible for the three complete copies of psbJ in the LSC region of Trachelium. The same mechanism could also explain the duplication of clpP in A. dealbata. As yet, however, the mechanisms underlying gene duplication in the SC regions of plastomes remain unclear (Lee et al. 2007).

Effect of repeats on mimosoid plastome

Plastome size variation has previously been attributed to tandem repeats (Dugas et al. 2015; Jo et al. 2011) and dispersed repeats (Cosner et al. 1997; Guisinger et al. 2011; Haberle et al. 2008; Ogihara et al. 1988; Weng et al. 2014). Congruent with Dugas et al. (2015), our analyses showed that both tandem and dispersed repeats contributed to the plastome size of mimosoids (p < 0.05; Table 4), and that the LSC contained the highest repeat proportion and contributed most to plastome size (Fig. 3 and Online Resource 6). Repeats have previously been inferred to be associated with plastome structural variation (Cai et al. 2008; Cosner et al. 1997; Greiner et al. 2008). In this study, the plastome of A. dealbata possessed an extremely high repeat proportion and showed the most structural variation. Together, these results suggest that repeats may affect both the structure and the size of mimosoid plastomes.

Identification of plastome sequence divergence

Plastomes supply valuable markers for reconstructing phylogenetic relationships. Many fragments of coding regions, introns, and intergenic spacers, including atpB, atpB-rbcL, matK, ndhF, rbcL, rpl16, rps4-trnS, rps16, trnH-psbA, trnL-F, trnS-G, etc., have been used for phylogenetic reconstructions at various taxonomic levels (Gao et al. 2008; Hilu et al. 2008; Kim and Jansen 1995; Li 2008; Peterson et al. 2010; Wilson 2009). Some plastome regions, such as atpF-H, matK, psbK-I, rbcL, rpoB, rpoC1, trnH-psbA, etc., have been relied upon heavily in the development of candidate markers for plant DNA barcoding (Chase et al. 2007; Dong et al. 2012; Hollingsworth et al. 2011; Kress et al. 2005; Newmaster et al. 2006). The noncoding trnH-psbA and coding matK have been used to distinguish the sister species complex of pantropical Acacia (Newmaster and Ragupathy 2009). Mimosoid plastomes contain multiple variable regions. The noncoding regions located in the SC regions show higher divergence than those in the IRs (Fig. 4), which is consistent with previous studies (reviewed by Zhu et al. 2016). Many of the moderately evolving loci could be used to resolve deep and intergeneric relationships in this subfamily. Multiple fast-evolving loci such as psaA-ycf3, ccsA-ndhD, trnS GCU-trnG UCC, psbI-trnS GCU, and accD-psaI could be used to resolve inter- or intraspecific relationships (Fig. 5 and Online Resource 7).

Substitution rate shift of relocated genes

Nucleotide substitution rates of plastid coding genes have been shown to decrease after relocation from SC regions into the IR in many plant lineages (Li et al. 2016; Lin et al. 2012; Perry and Wolfe 2002; Zhu et al. 2016). In this study, we again found decreased substitution rates for genes shifted into the IR (Fig. 6). One exception is rps3, which had a higher substitution rate after relocation into the IR. The relocation of rps3 into the IR may have occurred too recently for the rate to have decreased to a detectable level; a similar phenomenon was detected for ycf2 in Ginkgo biloba (Lin et al. 2012).

Phylogenetic analysis and evolutionary pattern of IR expansion/contraction

The deep relationships of mimosoids were poorly resolved by phylogenetic studies that used only a few plastid markers (Bouchenak-Khelladi et al. 2010; Luckow et al. 2003; Luckow et al. 2000; Miller et al. 2003; Miller and Seigler 2012). Plastid phylogenomics has proved efficient in resolving difficult relationships at the family level, as in Orchidaceae (Givnish et al. 2015), and at lower taxonomic levels, as in subfamilies Bambusoideae (Wysocki et al. 2015; Zhang et al. 2011) and Chloridoideae (Duvall et al. 2016) of Poaceae. In this study, plastid phylogenomics proved to be an efficient way to resolve relationships within Mimosoideae and Fabaceae (Fig. 7), and with systematic sampling it could be used to reconstruct a robust mimosoid or even legume phylogeny.

Large structural rearrangements of the plastome have been shown to carry a strong phylogenetic signal and have been used to define monophyletic lineages in many plant groups (Downie and Palmer 1992; Raubeson and Jansen 2004). For instance, the IRLC of Papilionoideae lacks the IR (Lavin et al. 1990; Wojciechowski et al. 2000), and the “Aegopodium group” and “Apium group” of Apiaceae are characterized by large IR expansion and contraction, respectively (Downie and Jansen 2015). In Mimosoideae, tribe Ingeae and Acacia s.s. (nested within Ingeae) formed a strongly supported clade, and “IR-expanded into SSC” was inferred to be a synapomorphy of the Ingeae + Acacia s.s. clade (Fig. 8). This clade could be named the “inverted-repeat-expanding clade (IREC)”, by analogy with the IRLC in Papilionoideae.