Introduction

Terrestrial habitat islands are patches of unique ecosystems on land that resemble oceanic islands in that they are spatially isolated from each other and gene flow between organisms on different islands is restricted. These islands have therefore been suggested as model systems for ecological and evolutionary studies (Barbara et al. 2008; Barbara et al. 2007; Porembski and Barthlott 2000). Recent studies of different types of terrestrial habitat islands have greatly improved our knowledge of the biogeography and evolution of species on these islands (Barbara et al. 2008; Barbara et al. 2007; DeChaine and Martin 2005; Fan et al. 2012; Hughes and Eastwood 2006; Knowles 2001; Liu et al. 2012c; Schönswetter et al. 2005; Smith and Farrell 2005; Yu et al. 2013). However, the study areas are mainly located in temperate mountains of North America (Browne and Ferree 2007; DeChaine and Martin 2004; DeChaine and Martin 2005; Knowles 2001), the Qinghai-Tibet Plateau and adjacent areas in Asia (Fan et al. 2012; He and Jiang 2014; Liu et al. 2012c), and Europe (Schönswetter et al. 2005). Little is known about the species restricted to terrestrial habitat islands of subtropical China (Sun et al. 2014b; Yan et al. 2012).

Montane cloud forests (MCFs) are a type of terrestrial habitat island composed of evergreen mountain forest in tropical and subtropical regions, usually found between 1000 and 3000 m above sea level (Cayuela et al. 2006; Chu et al. 2014; van de Weg et al. 2014). MCFs are frequently enveloped by trade wind-derived orographic clouds and mist (Still et al. 1999), and are highly fragmented because of their rare microclimatic envelope (Cruz-Cardenas et al. 2012). The isolation and uniqueness of MCFs promote explosive speciation and high endemism of biota, while making them one of the most vulnerable ecosystems to climate change and human disturbance (Foster 2001; Oliveira et al. 2014). Most studies of MCFs to date have concentrated on species from the Neotropics and African tropics, with little attention paid to MCFs of subtropical Asia (Anderson et al. 2012; Anderson and Ashe 2000; Bradford and Jaffre 2004; Patterson et al. 1998).

The Chinese subtropics harbor higher plant diversity than other areas in the region (Wu 1980; Ying 2001; Ying et al. 1993) and are considered to be among the most important refugia for lineages that evolved prior to the late Pliocene and Pleistocene glaciations (Axelrod et al. 1996). In China, the subtropical zone ranges from 34° N to 22° N and has experienced complex climatic changes and shifting species distributions during the glacial-interglacial epochs. Paleoclimatic data indicate that the climate of subtropical China was cooler (by c. 4–6 °C) and drier (by c. 400–600 mm/yr of precipitation) at the Last Glacial Maximum (LGM) than at present (Lu et al. 2013; Sun and Chen 1991; Zheng et al. 2003; Zhou et al. 1991). During the LGM, the warm-temperate evergreen forests retreated southward and tropical forests disappeared from the southern Chinese mainland (Harrison et al. 2001; Yu et al. 2000). However, the diverse habitats in the mountain regions may have buffered these extreme climatic effects. The wide mountain ranges (e.g. Nanling, Lingnan, Xuefeng, and Wuyi Mountains) in this region would have provided suitable refugial habitats for species to survive and maintain high levels of genetic diversity during the LGM.

Phylogeographic studies on plant taxa endemic to southern China have revealed limited dispersal and migration, weaker effects of Quaternary glaciation (as compared to species of the Qinghai-Tibetan Plateau), multiple localized refugia, and geographic barriers to gene flow (e.g. mountains and drainage systems), all of which influence the patterns and levels of genetic diversity of plant populations (Gao et al. 2007; Sun et al. 2014a; Wang et al. 2009; Xu et al. 2015; Yan et al. 2012; Zhang et al. 2013). Interestingly, these studies showed a range of contrasting phylogeographic patterns between plant species that were related to differences in the species’ growth habitats, adaptive capacity, breeding systems, and seed dispersal abilities. Generally, the subtropical-adapted species lost suitable habitat during the LGM and showed some range expansion afterwards (Liu et al. 2012a; Sun et al. 2014a; Xu et al. 2015). Conversely, cold-tolerant temperate and semi-arid taxa experienced range expansion during the LGM (Lu et al. 2015; Wu et al. 2010; Yan et al. 2012). These phylogeographic studies provide profound insights into the diversification pattern of plants in subtropical Asia. However, whether these patterns hold true for MCF endemics of subtropical China is unknown.

Threatened species with a narrow niche (e.g. those confined to terrestrial habitat islands) are likely to be sensitive to the effects of global climate change and human disturbance because of their restricted distribution range, small population size, and limited gene flow among populations (Robin et al. 2010; Zhao et al. 2013). How to conserve the rare species restricted to subtropical MCFs is largely unknown. Therefore, studies on the spatial patterns of genetic diversity and historical evolution of MCF species are essential to support and inform much-needed conservation strategies.

Quercus arbutifolia, belonging to Quercus subg. Cyclobalanopsis, is a small, shrubby tree native to MCFs of subtropical China and southern Vietnam, characterized by limestone habitats on mountain tops, or open slopes close to the peaks, at 1600–1800 m elevation (Deng et al. 2011a; Deng et al. 2011b; Huang et al. 1999). Currently, only six populations of Q. arbutifolia are known, all of which are highly restricted and fragmented (130–1000 km between populations). Several populations have experienced severe habitat degradation, such as the population on Daqin Mountain, Fujian province, whose habitat has been mostly taken over by bamboos and fewer than 100 individuals remain. According to the International Union for Conservation of Nature Red List Categories & Criteria (IUCN 2011), this species qualifies as Vulnerable (Deng et al. 2011a). Despite this threat listing, little is known of the patterns of genetic diversity and gene flow between the few remaining populations of Q. arbutifolia. Considering the restricted distribution area, small population size, uniqueness of ecological niche, severe degradation of habitat, and impending global climate change, Q. arbutifolia is likely actually Critically Endangered, but more information is needed to confirm this threat status change. Understanding its pattern of genetic diversity and gene flow is crucial for conservation management.

In this study, we used cpDNA and nrDNA sequences to investigate the spatial genetic structure of the rare oak Q. arbutifolia. The aim of this study was to address the following questions: (1) What is the population divergence pattern of Q. arbutifolia and does the observed genetic pattern reflect modern or historical phylogeography? (2) How does the phylogeographical pattern compare to other closely related species that are more widespread, or to other plants in general, and why? (3) Are the observed levels of genetic diversity and population structure compatible with the view of ‘terrestrial habitat islands’? (4) What are the implications of these results for conservation management of this threatened species?

Materials and methods

Sampling and laboratory procedures

A total of five wild populations of Q. arbutifolia were sampled during 2010 and 2012, covering all known localities of this rare species in China (Fig. S1). Only one other population of this species is known, located in Vietnam, which we were not able to sample. Eighty-one individual trees from five populations were sampled. In an ideal situation, the trees were sampled at least 20 m away from each other. However, in some populations, where the population size was too small, such as the CH population (in Guangdong province) and the DM population (in Guangxi province), we sampled all the trees we could find regardless of the distance between them. Two to three healthy leaves were sampled from each individual and dried in silica gel until DNA extraction. Voucher specimens of each tree were collected and deposited in the herbarium of Shanghai Chenshan Botanical Garden (CSH). Provenance data on populations and their abbreviations are summarized in Table 1.

Table 1 Localities of sampled Quercus arbutifolia populations

Total genomic DNA was extracted from silica-dried leaves using the modified CTAB method (Doyle 1987). Four cpDNA regions, atpB-rbcL (Terachi 1993), atpI-atpH (Grivet et al. 2001), petG-trnP (Huang et al. 2002) and the rpl16 intron (Small et al. 1998), together with the internal transcribed spacer (ITS) of nrDNA (Bellarosa et al. 2005), were sequenced for all samples. All primer sequences used in this study are given in Table S1. PCR reactions were performed using the TaKaRa Taq PCR Amplification Kit (TaKaRa, Inc., China) in 20 μL volumes containing 1× PCR buffer, 2.0 mM MgCl2, 250.0 μM each dNTP, 0.2 μM each primer, 1.5 U Taq DNA polymerase and 50.0 ng genomic DNA. Typical conditions for PCR were 4 min at 94 °C, followed by 35 cycles of 45 s at 95 °C, 60 s at 52 °C, and 60 s or 120 s at 72 °C, with a final extension at 72 °C for 10 min. The amplified PCR products were verified on a 1.0 % agarose gel and bidirectionally sequenced by a professional laboratory (Shanghai Sangon Biological Engineering Technology & Service, Shanghai, China) following standard sequencing protocols. Sequences obtained in this study have been deposited in GenBank (Table S2).

Double-stranded sequences were assembled and checked by eye in SEQUENCHER 4.1.4 (Gene Codes Corp., Ann Arbor, MI, U.S.A.). Seven heterozygous loci were detected for ITS. IUPAC ambiguity codes were used to represent polymorphic ITS sites of heterozygous individuals. Insertions and deletions (indels) were coded as single mutations. The four cpDNA sequences were concatenated into one matrix for analysis. The cpDNA and nrDNA ITS sequences were analyzed separately. The PHASE 2.1 Bayesian method (Stephens and Donnelly 2003; Stephens et al. 2001) as implemented in DnaSP v5 (Librado and Rozas 2009) was used to resolve the gametic phase of heterozygotes for ITS. The number of iterations and the threshold of haplotype probability were 100 and 0.9, respectively. Phased sequences with weak statistical support (probability < 0.9) were removed. This phasing procedure can provide accurate allelic data for subsequent analyses (Librado and Rozas 2009). Recombination tests for ITS were performed in RDP v4 (Martin and Rybicki 2000) with default parameters.
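The IUPAC coding step described above can be illustrated with a minimal sketch. It only shows how two alleles at a heterozygous site collapse to one standard two-base ambiguity symbol; the function names and the example sequences are hypothetical and are not taken from the SEQUENCHER or DnaSP workflows used in this study.

```python
# Standard IUPAC symbols for two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def encode_site(base_a: str, base_b: str) -> str:
    """Return the IUPAC symbol for the two bases observed at one site."""
    a, b = base_a.upper(), base_b.upper()
    if a == b:
        return a  # homozygous site: keep the base itself
    return IUPAC_TWO_BASE[frozenset((a, b))]


def encode_genotype(allele_1: str, allele_2: str) -> str:
    """Collapse two aligned alleles into one sequence with ambiguity codes."""
    if len(allele_1) != len(allele_2):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(encode_site(x, y) for x, y in zip(allele_1, allele_2))


def heterozygous_positions(genotype: str) -> list:
    """Return 0-based positions that carry a two-base ambiguity symbol."""
    symbols = set(IUPAC_TWO_BASE.values())
    return [i for i, s in enumerate(genotype.upper()) if s in symbols]


# Hypothetical alleles that differ at two C/T sites.
genotype = encode_genotype("ACGTA", "ATGCA")
print(genotype, heterozygous_positions(genotype))
```

Phasing software such as PHASE works in the opposite direction: it starts from the ambiguity-coded genotype and infers the most probable pair of alleles.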

Genetic diversity and population genetic structure

Haplotypes and polymorphic sites of Q. arbutifolia obtained from the four cpDNA sequences and ITS sequences, respectively, were calculated using DnaSP v5 (Librado and Rozas 2009). Population genetic indexes [haplotype diversity (h), nucleotide diversity (π)] were computed in Arlequin v3.5 (Excoffier and Lischer 2010; available at http://cmpg.unibe.ch/Software/arlequin35/).
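For readers unfamiliar with the two indexes, the sketch below computes haplotype diversity as h = n/(n − 1) × (1 − Σp²) and nucleotide diversity as the mean proportion of differing sites over all pairs of sequences. These are the textbook definitions; Arlequin's treatment of gaps, ambiguity codes and substitution models may differ, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Haplotype diversity h = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site (pi)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical aligned sequences from one population.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(sample), 4))
print(round(nucleotide_diversity(sample), 4))
```

In this sketch each distinct sequence string is treated as one haplotype, so indels would need to be recoded as single characters beforehand to match the coding used in this study.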

Spatial analysis of molecular variance (SAMOVA) of cpDNA and ITS sequences was performed in SAMOVA v1.0 (Dupanloup et al. 2002) to identify population genetic structure. Analyses were run repeatedly with the number of groups K (inferred population clusters) ranging from 2 to 4 in order to maximize the percentage of variance among groups. All data were also analyzed with the Bayesian inference model in BAPS 6.0 (Corander and Marttinen 2006; Corander et al. 2008; Corander et al. 2006) using 10 runs with maximum K set to 5. Results of the mixture analyses were further used in the admixture analysis. We conducted a hierarchical analysis of molecular variance (AMOVA) to further quantify genetic differentiation between groups (as defined by SAMOVA), as well as between populations within groups and among individuals within populations. AMOVA analyses were performed with the program Arlequin v3.5 (Excoffier and Lischer 2010). Total haplotype diversity (H T), within-population diversity (H S), population differentiation (G ST) and an estimate of population subdivision for phylogenetically ordered alleles (N ST) were calculated using the program HAPLONST (Pons and Petit 1996). An N ST value significantly greater than G ST indicates the existence of phylogeographic structure among populations. Corresponding estimates of gene flow (Nm), i.e. the average number of migrants exchanged among populations per generation, were calculated from G ST according to the equation Nm ≈ (1 − G ST)/(4 G ST), as modified from Wright (1949) and applicable for nuclear genes (Qiu et al. 2007).
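The gene flow equation above is simple enough to check by hand; the following minimal sketch does so. The function name and the `ploidy_factor` parameter are hypothetical. The default of 4 follows the equation given here for nuclear genes, whereas a factor of 2 is a convention often used for haploid organelle markers and is mentioned only as an assumption.

```python
def gene_flow_nm(g_st: float, ploidy_factor: int = 4) -> float:
    """Estimate Nm from G_ST as (1 - G_ST) / (ploidy_factor * G_ST)."""
    if not 0.0 < g_st <= 1.0:
        raise ValueError("G_ST must lie in the interval (0, 1]")
    return (1.0 - g_st) / (ploidy_factor * g_st)


# Rounded ITS value of G_ST reported in the Results of this study.
print(round(gene_flow_nm(0.31), 3))
```

With the rounded ITS value G ST = 0.31, the estimate is about 0.56, close to the Nm of 0.549 reported in the Results; the small difference presumably reflects rounding of G ST. Values of Nm below 1 are conventionally read as gene flow too weak to counteract drift.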

To evaluate the phylogenetic relationships among haplotypes, a median-joining network (Bandelt et al. 1999) of cpDNA and ITS sequences was constructed using Network v4.6.1.2 (available at http://www.fluxus-engineering.com/sharenet_rn.htm). To examine the effect of geographic distance on genetic structure and the relative contribution of gene flow and drift to genetic structure (Hutchison and Templeton 1999), isolation-by-distance (IBD) analyses were performed using GENALEX v6 (Peakall and Smouse 2006; available at http://biology.anu.edu.au/GenAlEx/Welcome.html).
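The isolation-by-distance analysis rests on a Mantel test, which correlates two distance matrices and assesses significance by permuting population labels. The sketch below is a plain-Python illustration of that idea, not a reproduction of the GENALEX implementation; the function names, the number of permutations, the seed and the toy matrices are all hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-sided p) for the correlation of two distance matrices."""
    n = len(genetic)
    rng = random.Random(seed)
    geo = upper_triangle(geographic)
    observed = pearson(upper_triangle(genetic), geo)
    hits = 1  # the observed arrangement counts as one permutation
    labels = list(range(n))
    for _ in range(permutations):
        rng.shuffle(labels)
        # Permute rows and columns of the genetic matrix together.
        permuted = [
            [genetic[labels[i]][labels[j]] for j in range(n)]
            for i in range(n)
        ]
        if pearson(upper_triangle(permuted), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical example: five populations placed along a line (km),
# with genetic distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo_matrix = [[abs(a - b) for b in positions] for a in positions]
gen_matrix = [[0.1 * d for d in row] for row in geo_matrix]
r, p = mantel_test(gen_matrix, geo_matrix)
print(r, p)
```

Note that with only five populations there are just 120 distinct label permutations, so the smallest attainable P value is limited; this is worth keeping in mind when interpreting Mantel results for a species with so few populations.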

Haplotype divergence time estimations

The divergence times of all cpDNA haplotypes were calculated in BEAST v1.8.0 (Drummond et al. 2012). The uncorrelated lognormal relaxed clock model was selected because likelihood ratio tests of cpDNA and ITS in PAUP* 4.0b10 (Swofford 2002) rejected the hypothesis of a strict molecular clock (Table S3). The GTR model of DNA sequence evolution was chosen based on Akaike information criterion (AIC) results from Modeltest v3.7 (Posada and Crandall 1998). Q. aliena, Q. variabilis, Lithocarpus litseifolius, L. longinux, and Castanea seguinii from Fagaceae were chosen as outgroups. The GenBank accession numbers of the five species were KM21618-KM21622 and KU312175-KU312189 (Table S2). The constant-size coalescent was applied as the tree prior. Four fossil taxa were used to assign minimum age constraints on four internal stem nodes: Castanea (fossilized pistillate flowers from Castanopsoidea sp., 47.8–59.2 Ma; Crepet and Nixon 1989), Q. aliena (fossilized acorns from Q. rhenana [Kräusel et Weyland] Knobloch & Z. Kvaček of section Lobatae, 23.03–33.9 Ma), Q. variabilis (fossil taxa of Q. cerrisaecarpa of sect. Cerris, 15–17.5 Ma; Song et al. 2000) and Lithocarpus (fossilized leaves with cuticle from Lithocarpus saxonicus H. Walther & Kvacek, 23.0–33.9 Ma; Kvacek and Teodoridis 2007; Sauquet et al. 2012). For the BEAST analysis, MCMC runs were performed for 1 × 10⁸ generations, with sampling every 1000th generation, following a burn-in of the initial 20 % of cycles. MCMC samples were inspected in TRACER 1.6 (http://tree.bio.ed.ac.uk/software/tracer/) to confirm sampling adequacy and convergence of the chains to a stationary distribution. Resulting chronograms were visualized in FigTree 1.4 (http://tree.bio.ed.ac.uk/software/figtree/).

Demographic analyses

Tajima’s D test (Tajima 1989) and Fu’s Fs test (Fu 1997) were performed to examine whether Q. arbutifolia underwent historic and/or recent demographic expansion. Positive D values may be due to balancing selection or population contraction, while negative D values can represent positive selection or population expansion. A significantly negative Fs value indicates a recent demographic expansion. Mismatch distribution analysis (MDA; Schneider and Excoffier 1999) was also used to infer the demographic history of Q. arbutifolia. Unimodal pairwise mismatch distributions indicate that populations have experienced recent demographic expansion, whereas multimodal distributions are related to demographic equilibrium or decline (Rogers and Harpending 1992; Slatkin and Hudson 1991). The sum of squared deviations (SSD) between the observed and expected distributions was computed. The raggedness index (H Rag) and its P value were estimated to test the significance of the population expansion model (Harpending 1994). If sudden expansions were detected, the equation t = τ/2u (Rogers and Harpending 1992) was used to estimate the expansion time of Q. arbutifolia. The value τ (tau) was obtained from the mismatch distribution analysis (τ = 2ut, where t is the number of generations elapsed since the sudden expansion episode). The value u (mutation rate per generation for the total length of the analyzed sequence) was calculated as u = μkg, where μ is the mutation rate per nucleotide site per year (s/s/y), k is the average sequence length used for analysis and g is the generation time in years. For μ, we used the mean substitution rate of ITS in perennial woody plants (2.15 × 10⁻⁹ s/s/y) (Kay et al. 2006). All the aforementioned demographic analyses were performed in Arlequin v3.5 (Excoffier and Lischer 2010).
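The expansion-time calculation can be sketched in a few lines. The code below applies t = τ/2u with u = μkg and converts generations to years by multiplying by g. The function names are hypothetical, the value of τ is a made-up placeholder because the estimate is not given in the text, and k = 558 bp is an assumed midpoint of the ITS length range reported in the Results; only μ = 2.15 × 10⁻⁹ s/s/y and the 20-year generation time come from this study.

```python
import math


def mutation_rate_per_generation(mu_per_site_per_year, sequence_length_bp,
                                 generation_time_years):
    """u = mu * k * g: mutation rate per generation for the whole sequence."""
    return mu_per_site_per_year * sequence_length_bp * generation_time_years


def expansion_time_years(tau, mu_per_site_per_year, sequence_length_bp,
                         generation_time_years):
    """Time since expansion in years, from t = tau / (2u) generations."""
    u = mutation_rate_per_generation(
        mu_per_site_per_year, sequence_length_bp, generation_time_years
    )
    generations = tau / (2.0 * u)
    return generations * generation_time_years


MU_ITS = 2.15e-9        # s/s/y, from Kay et al. (2006) as cited above
K_ASSUMED = 558         # bp, assumed midpoint of the 538-578 bp ITS range
G_YEARS = 20            # generation time used in this study
TAU_PLACEHOLDER = 0.25  # hypothetical tau, NOT the value estimated here

years = expansion_time_years(TAU_PLACEHOLDER, MU_ITS, K_ASSUMED, G_YEARS)
print(f"{years / 1000:.1f} ka")
```

One consequence of these definitions is that, when the result is expressed in years, the generation time cancels out (years = τ/(2μk)), so the choice of g affects the number of generations but not the calendar age under this simple conversion.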

Results

Sequence diversity and haplotype distribution

The lengths of the cpDNA loci petG-trnP, atpB-rbcL, rpl16, and atpI-atpH were 444 bp, 795–802 bp, 965–966 bp, and 1076–1109 bp, respectively. The concatenated length of all four cpDNA sequences was 3278–3313 bp. The length of nrDNA ITS sequences was 538–578 bp. The recombination test in RDP detected no recombination signals in ITS. Based on transition/transversion and indel variation, a total of 16 cpDNA haplotypes and 26 phased ITS haplotypes were obtained. The variable sites of each haplotype are summarized in Table S4 and Table S5.

Total haplotype diversity (H T) and within-population diversity (H S) for cpDNA sequences were estimated to be 0.962 and 0.511, respectively, and those for ITS sequences were 0.891 and 0.612, respectively. Haplotype diversity indexes are given in Table 1, and the spatial distributions of cpDNA and ITS haplotypes are illustrated in Figs. 1 and 2, respectively. For most populations, haplotype diversity and nucleotide diversity were higher for ITS than for cpDNA loci. Only population SH exhibited higher haplotype and nucleotide diversity for cpDNA. Population YZ had the highest nucleotide diversity for both types of marker. It is notable that population DM, the most geographically isolated population, had the second highest intra-population haplotype diversity, but the lowest nucleotide diversity for cpDNA loci (Table 1). For ITS, population DM had the lowest diversity index values of all populations.

Fig. 1 Distribution of 5 populations of Quercus arbutifolia (abbreviations listed in Table 1) and 16 cpDNA haplotypes. The blue dashed lines delimit the four population groups detected by SAMOVA

Fig. 2 Distribution of 26 nrDNA (ITS) haplotypes identified in 5 populations of Quercus arbutifolia (abbreviations listed in Table 1). The red dashed lines delimit the two population groups detected by SAMOVA

High proportions of unique haplotypes were found within populations for cpDNA and phased ITS sequences (Tables S6 and S7). Of the cpDNA haplotypes, 13 out of 16 were private to one population. The common cpDNA haplotype H1 was shared by populations SH, YZ and ZZ. Haplotype H2 was shared by populations SH and ZZ, and haplotype H3 was shared by populations SH and YZ (Fig. 1). For nrDNA ITS sequences, 20 out of 26 haplotypes were private. The most common nrDNA haplotype, H5, was shared by populations SH, CH, YZ and ZZ (Fig. 2). No haplotype was common to all populations for either type of marker.

Population genetic structure and phylogeographic structure

The phylogenetic relationships among haplotypes are revealed in the median-joining networks (Fig. 3). Only one missing (i.e. extinct or unsampled) haplotype was inferred by the cpDNA network (Fig. 3a), while the nrDNA network hypothesized four missing haplotypes (Fig. 3b). The maximum clade credibility (MCC) chronogram of cpDNA haplotypes showed a distinct genetic difference between the DM population and the other four populations (Fig. 4).

Fig. 3 Networks of (a) 16 cpDNA haplotypes and (b) 26 nrDNA (ITS) haplotypes identified in Quercus arbutifolia. Circle size is proportional to the frequency of a haplotype over all populations. Each line between haplotypes represents a mutational step. Solid diamonds represent hypothetical missing haplotypes

Fig. 4 Bayesian analysis chronogram of Quercus arbutifolia cpDNA haplotypes and five outgroup species from Quercus s.l., Castanea and Lithocarpus. Fossil calibration points are marked with letters in circles and correspond to the date ranges listed in the upper left table. Numbers above branches are posterior probabilities (PP). The divergence time (in Ma) and 95 % highest posterior density (HPD) for the most recent common ancestor (TMRCA) of sampled Q. arbutifolia populations are shown below the node. Colored circles correspond to the 16 haplotypes as shown in Fig. 1. The vertical line indicates the Miocene-Pliocene boundary. Pli: Pliocene; Ple: Pleistocene

A significant phylogeographic signal was detected both in cpDNA (N ST = 0.68 > G ST = 0.47; P < 0.05) and ITS sequences (N ST = 0.36 > G ST = 0.31; P < 0.05) (Table 2). SAMOVA results recovered an optimal grouping of K = 4 inferred population clusters for cpDNA loci (SH + ZZ, CH, YZ, DM; F CT = 0.59), and K = 2 for ITS (SH + ZZ + CH + YZ, DM; F CT = 0.55). BAPS tests revealed the existence of four clusters for both cpDNA and ITS datasets (Fig. S2). In BAPS tests, the most distinct populations were DM and YZ when analyzing cpDNA (Fig. S2a), while ITS analysis identified DM as the most distinct population (Fig. S2b), consistent with the SAMOVA results. Based on the groupings by SAMOVA, AMOVA analysis showed that variation among groups was greater than 50 %, variation within populations was greater than 30 %, and variation among populations within groups was less than 10 % for both cpDNA and ITS analyses (Table 3). The G ST-derived Nm value of 0.549 is indicative of restricted gene flow among wild populations. Mantel tests conducted on the genetic and geographical distance matrices revealed significant IBD for cpDNA (R² = 0.224, P = 0.048) and ITS sequences (R² = 0.711, P = 0.020).

Table 2 Results of phylogeographical signal analysis, neutral tests and mismatch analysis
Table 3 Hierarchical analyses of molecular variance (AMOVA) for both marker datasets (cpDNA and nrDNA) for all populations and SAMOVA-derived groups of Q. arbutifolia

Divergence time dating

The divergence time of the most recent common ancestor (TMRCA) of Q. arbutifolia populations revealed by cpDNA haplotypes was c. 10.25 Ma (95 % HPD: 14.89–5.89 Ma). The maximum clade credibility chronogram generated by BEAST revealed that the DM population diverged early from the remaining populations, around 10 Ma (the clade of H7 + H8 in Fig. 4). The remaining lineages began to diversify around the Miocene-Pliocene boundary and diversification intensified during the Pleistocene. The average substitution rate of cpDNA for Q. arbutifolia was 0.54 × 10⁻⁹ s/s/y, which was similar to the rate previously reported for Q. glauca (0.69–0.96 × 10⁻⁹ s/s/y; Xu et al. 2015). Although average substitution rates generally reported for non-coding regions of the chloroplast genome from other plants are 1.2–1.7 × 10⁻⁹ s/s/y (Graur and Li 2000), the substitution rate calculated for Q. arbutifolia is consistent with the comparatively low evolution rate of the chloroplast genome reported in Fagaceae (Frascaria et al. 1993; Simeone et al. 2013).

Historical demography

Tests for demographic expansion and mismatch analysis detected dissimilar patterns between cpDNA and ITS sequences. The values of D and Fs were positive for cpDNA sequences (D = 0.807, P = 0.258; Fs = 1.192, P = 0.351), but significantly negative for ITS sequences (D = −1.253, P < 0.05; Fs = −6.860, P < 0.01) (Table 2). The mismatch distribution for cpDNA haplotypes was multimodal, but the mismatch distribution for ITS haplotypes of Q. arbutifolia was unimodal (Fig. 5). The values of the sum of squared deviations (SSD) between observed and expected distributions and raggedness index (H Rag) were not significant for ITS (Table 2). This evidence shows that ITS data, but not cpDNA data, indicate historical population expansion. Using 20 years as the generation time, we estimated that the time of population expansion based on the ITS dataset was 98.1 ka (159.1–25.5 ka), during the last interglacial of the Quaternary.

Fig. 5 Mismatch distribution for all populations of Quercus arbutifolia based on (a) cpDNA data and (b) ITS data. Black and gray lines represent the expected and observed mismatch distributions, respectively; R represents the raggedness index and p represents the significance of a simulation value greater than the observed value

Discussion

Genetic diversity

Rare and endangered species with limited distributions are generally expected to exhibit low levels of genetic variation due to stochastic events, such as genetic drift and inbreeding (Hedrick and Godt 1989; Spielman et al. 2004). Instead, we observed very high genetic diversity based on the cpDNA sequences of Q. arbutifolia across the 5 populations in southern China. The cpDNA total genetic diversity (H T = 0.962) among all populations of Q. arbutifolia was unexpectedly high when compared with other angiosperm species, which exhibit a mean value of H T = 0.670 (Petit et al. 2005). Comparatively high cpDNA diversity has been detected in several other East Asian species of Fagaceae, such as Q. variabilis (H T = 0.888 for 26 haplotypes based on 3 cpDNA intron regions; Chen et al. 2012), Castanopsis fargesii (H T = 0.973 for 49 haplotypes based on 8 cpSSR loci; Sun et al. 2014a) and Q. glauca (H T = 0.902 for 33 haplotypes based on 3 cpDNA intron regions; Xu et al. 2015), but these species are much more common and widespread than Q. arbutifolia. Two possible reasons for the unexpectedly high genetic diversity found in Q. arbutifolia may be its long evolutionary history and rare extinction events. The long evolutionary history, dating back to the late Miocene (Fig. 4), would have allowed the species to accumulate many mutations, and as evidenced by the relatively few missing haplotypes in the cpDNA and ITS networks (Fig. 3), very little of that genetic diversity has been lost over time.

Furthermore, the complex geomorphology in southern China, especially the large mountain chains with high climatic and habitat heterogeneity, may have offered refugia for Q. arbutifolia populations to survive historic climatic extremes and thus maintain high genetic diversity. Moreover, terrestrial habitat islands, like MCFs, can act as generators of genetic diversity over both recent and historical timescales (Knowles 2001; Masta 2000; McCormack et al. 2008; Sullivan 1994). Although the ecological niche of Q. arbutifolia seems very narrow, it could have survived during the Quaternary glaciation/interglaciation periods by altitudinal shifts up and down the mountains that maintained sufficiently high effective population sizes and preserved genetic diversity (Hewitt 1996). Such distributional shifts caused by climatic fluctuations may have occasionally resulted in different populations or subpopulations coming in contact during glacial periods, and isolating during interglacial periods. This process might have contributed to diversification at the population level for this species (Carstens and Knowles 2007; McCormack et al. 2009). The unexpectedly high level of genetic diversity in Q. arbutifolia also suggests that habitat fragmentation and population isolation might have occurred recently, as factors such as genetic drift and inbreeding would not have had enough time to homogenize the genetic diversity of this long-lived tree species.

Phylogeographic structure

AMOVA analyses showed that the proportion of genetic variation among population groups was greater than 50 %, and significant isolation by distance (IBD) and phylogeographic structure were detected in both cpDNA (N ST = 0.68, G ST = 0.46, P < 0.05) and ITS (N ST = 0.36, G ST = 0.31, P < 0.05) analyses. Interestingly, the phylogeographic substructure and IBD results of Q. arbutifolia are inconsistent with previous studies on other Fagaceae species in East Asia, which recovered no obvious phylogeographic structure based on cpDNA sequences for Q. glauca (Xu et al. 2015), Q. variabilis (Chen et al. 2012), Castanopsis eyrei (Shi et al. 2014), Fagus lucida and F. longipetiolata (Zhang et al. 2013). This lack of phylogeographic structure was mainly due to these species having large, interconnected populations and lowland distributions that were less affected by climatic changes during the Quaternary. Being an MCF species, Q. arbutifolia occupies very small and isolated areas and inhabits a unique ecological niche, which makes it unlikely to colonize wide regions. Phylogeographic studies on plant species from terrestrial habitat islands of southwest China (e.g. Sedum lanceolatum) also showed significant phylogeographic structure with IBD (He and Jiang 2014). Although the topographic complexity in southeast China is not as high as that in southwest China, the vast continental mountainous areas in southeast China can be effective barriers to gene flow for those species with narrow distributional ranges and restricted ecological niches. Thus, Q. arbutifolia exhibits population dynamics more similar to distantly related plant species of other terrestrial habitat islands, like Sedum lanceolatum, than to species that are phylogenetically and geographically much closer, such as Q. glauca.

The pattern of genetic barriers among populations of Q. arbutifolia detected by SAMOVA analyses differed for cpDNA and ITS loci. The cpDNA analysis identified significant gene flow barriers between four of the five populations, whereas the ITS analysis revealed just one significant genetic barrier between DM and the remaining four populations. This difference is likely due to the contrasting modes of dispersal and inheritance of chloroplast versus nuclear markers. Chloroplast DNA is maternally inherited (Morris et al. 2010) and is thus only transferred through seeds, i.e. the large, heavy acorns of Quercus species, which are dispersed mostly by gravity and rarely end up far from the mother tree. Although rodents and certain birds (e.g. jays and woodpeckers) can move acorns some distance because of their hoarding behavior, such a dispersal mechanism is rather inefficient and cached acorns are usually restricted to within half a kilometer of the mother tree (Pons and Pausas 2007; Ramos-Palacios et al. 2014; Scofield et al. 2011; Scofield et al. 2010; Steele et al. 2001; Takahashi et al. 2006; Xiao et al. 2005; Zhang et al. 2006). Thus, the maternally inherited cpDNA genome is difficult to exchange between populations growing even a short distance apart. On the other hand, oak pollen grains, released by the millions from male catkins, are able to disperse very long distances on the wind. Interpopulation pollen flow was estimated to be 200 times greater than interpopulation seed flow in oaks (Ennos 1994). Therefore, compared to biparentally and paternally inherited markers, maternally inherited markers generally exhibit much higher population subdivision, as seen in the SAMOVA results presented here.

The genetic differentiation among populations was higher for cpDNA than for ITS (Table 2), an expected result based on the outcomes of previous studies and the characteristics of Fagaceae reproductive biology. Several factors can lead to higher genetic diversity for cytoplasmic than for nuclear markers. One is historic introgression of chloroplasts from closely related species. The phenomenon of chloroplast capture through hybridization is common in Fagaceae (Petit et al. 2003; Shi et al. 2014). Although we did not find evidence of hybridization of Q. arbutifolia with other Quercus species in our field observations, ancient introgression and chloroplast capture from other oaks might be possible, especially as species’ ranges were shifting during glacial/interglacial transitions (Simeone et al. 2016). Another is the mode of dispersal of cytoplasmically inherited genomes compared with nuclear genomes, as outlined earlier in this section. The different gene flow barriers detected by cpDNA and ITS markers reflect the relative levels of pollen and seed migration among populations of Q. arbutifolia. Similarly, levels of population differentiation under drift-migration equilibrium are expected to differ for these markers due to the relative inability of cpDNA genomes to mix and homogenize between populations.

Evolutionary history of Q. arbutifolia

Our analyses suggest that early intraspecific divergence of Q. arbutifolia occurred during the late Miocene (10.25 Ma; 95 % HPD: 14.89–5.89 Ma) and that lineages started to diversify around the Miocene-Pliocene boundary, with intensified lineage expansion during the Pleistocene (Fig. 4). The estimated age of almost all the recovered haplotypes is younger than 2.6 Ma (except for H4 and H6). These estimates of the TMRCA and of the timing of haplotype diversification are similar to those reported for another widespread evergreen oak in the southeast Asian subtropical region, Q. glauca (Xu et al. 2015), for four Chinese beeches (Fagus spp.) (Zhang et al. 2013), and for two East Asian relict tree taxa, Tetracentron sinense (Sun et al. 2014b) and Cercidiphyllum (Qi et al. 2012). These results support the view that geomorphological and climatic changes both before and during the Quaternary have had great impacts on the evolutionary history of tree species in subtropical East Asia.

The late Miocene was a key period of diversification for temperate and subtropical woody plants in East Asia (Sun et al. 2014b). The significant uplift of the Tibetan-Himalayan Plateau during the late Miocene (10–8 Ma) resulted in aridification in Central Asia (8–7 Ma) (Guo et al. 2002; Miao et al. 2012; Tang et al. 2011) and intensified Indian and East Asian monsoons (9–8 Ma) (An et al. 2005; An et al. 2001), which are widely seen as the main factors triggering the Neogene rapid diversification of plants in East Asia (Fan et al. 2013; Gao et al. 2007; Li et al. 2008; Yao et al. 2011). The speciation event and early diversification of Q. arbutifolia lineages follow this timeline.

The two earliest-derived cpDNA haplotypes in Q. arbutifolia were found only in the DM population in southwest China (Figs. 3 and 4), and the SAMOVA analyses and BAPS analyses also indicate there was a significant genetic barrier between the DM population and the others. The DM population may have split from the others around the Miocene-Pliocene boundary (Fig. 4), and then evolved independently. Mountains, rivers, ravines, and anthropogenic barriers can hinder gene flow via seed and pollen dispersal among populations. The Pearl River drainage system developed during the Mid-Miocene (Shao et al. 2008), which coincides with the divergence time between the DM population and the others. Therefore, the Pearl River may have acted as the main barrier causing the early isolation of DM, to the west of the Pearl River, from the other populations to the east. Further field surveys are needed to identify and sample any additional unknown populations from west of the Pearl River valley to confirm this hypothesis.

The influence of more recent Quaternary climatic oscillations on the current geographical distributions and population genetic structure of plant species in subtropical China has been well documented (Liu et al. 2012b; Qiu et al. 2011; Wang et al. 2015). Haplotype diversification of Q. arbutifolia intensified during the Pleistocene with almost all haplotypes dating to younger than 2.6 Ma (Fig. 4), which suggests that recent Quaternary climate changes had great impacts on shaping its current genetic diversity pattern. Indeed, a similar scenario has been detected in other subtropical Fagaceae species in China (Zhang et al. 2013; Xu et al. 2015). Additionally, a star-like pattern was detected in the ITS haplotype network (Fig. 3b), which is generally considered to indicate relatively recent range-expansion events (e.g. Matthews et al. 2007; Teixeira et al. 2011). Neutrality tests and mismatch analysis of ITS data also indicate that Q. arbutifolia populations recently underwent population expansion (Table 2). The expansion time predicted by ITS was 98.1 ka (95 % HPD 25.5–159.1 ka), which coincides with the end of the last interglacial (c. 116–130 ka) (Otto-Bliesner et al. 2006). In contrast to the ITS data, no population expansion was detected based on cpDNA data. As described above, the different modes of inheritance of cpDNA and nuclear DNA are likely the primary reason for the incongruous results, with ITS being a more sensitive marker to distributional shifts and expansions.
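For readers unfamiliar with how a mismatch distribution yields an expansion date, the conversion can be sketched as follows. It uses the standard relation τ = 2ut, where τ is the mode of the mismatch distribution, u is the mutation rate for the whole sequence per generation, and t is the time since expansion in generations. All parameter values below are hypothetical placeholders; the substitution rate, alignment length, and generation time actually used for Q. arbutifolia are not restated in this section, so the sketch is not intended to reproduce the 98.1 ka estimate.

```python
def expansion_time_years(tau: float,
                         mu_per_site_per_year: float,
                         seq_length_bp: int,
                         generation_time_years: float) -> float:
    """Convert a mismatch-distribution tau into years since expansion.

    Uses tau = 2 * u * t, with u the per-sequence, per-generation mutation rate
    and t the time since expansion in generations.
    """
    if min(tau, mu_per_site_per_year, seq_length_bp, generation_time_years) <= 0:
        raise ValueError("all parameters must be positive")
    # Per-sequence mutation rate per generation.
    u = mu_per_site_per_year * seq_length_bp * generation_time_years
    t_generations = tau / (2.0 * u)
    # Note: with a per-year rate the generation time cancels out here.
    return t_generations * generation_time_years


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    years = expansion_time_years(tau=2.0,
                                 mu_per_site_per_year=2e-9,
                                 seq_length_bp=500,
                                 generation_time_years=50.0)
    print(f"estimated time since expansion: {years / 1000:.0f} ka")
```

Because the estimate scales inversely with the assumed substitution rate, uncertainty in that rate carries over directly to the inferred expansion date.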

The highly heterogeneous habitats of large montane regions enable species to migrate vertically during climate fluctuations to altitudes with a more suitable climate (Hewitt 1996). During glacial periods when temperatures dropped, high altitude plants migrated to lower altitudes, facilitating population mixing. When temperatures increased during the interglacial periods, plants would likely have moved to higher altitudes to establish the isolated populations typical of terrestrial habitat island species. Contrary to our expectations, Q. arbutifolia populations did not expand in the lowlands during glacial periods, as evidenced by our mismatch distribution analysis, which instead showed that population expansion occurred during the last interglacial of the Quaternary. This may be due to a lack of sufficient moisture in the lowlands during glacial periods, which led to a reduction in suitable habitat. Additionally, the relatively slow maturation rate and limited dispersal ability of Q. arbutifolia acorns, and the species' slow growth rate, would have hindered massive range expansion of populations across the lowlands during glacial periods. Conversely, the subtropical-adapted Q. arbutifolia would have been better suited to the wetter climate during the last interglacial of the Quaternary, leading to population expansion during this time period.

Implications for conservation

Evaluating levels of genetic diversity and understanding population dynamics of threatened species are essential criteria for the development and management of effective conservation strategies. Our results show that although Q. arbutifolia has only six populations and likely fewer than 2000 individuals remaining in the wild, it still maintains unexpectedly high levels of genetic diversity. Considering the limited distribution, small number of isolated populations, small total population size, limited gene flow, and severe ongoing degradation of its unique montane cloud forest habitat, Q. arbutifolia is at high risk of extinction.

Our results indicate that populations YZ and ZZ had higher haplotype and nucleotide diversity, and more haplotype uniqueness, than the other three populations (CH, SH, and DM). Therefore, these two populations are valuable because they represent the deepest gene pool for the species. However, the private haplotypes present in the other three populations should not be undervalued, as populations that have unique haplotypes may harbor valuable adaptive genetic diversity and need to be carefully protected (Liao et al. 2007). Population DM had the second highest intra-population haplotype diversity, but the lowest nucleotide diversity (π) index for cpDNA analyses. Low levels of nucleotide diversity with high haplotype diversity may signal past genetic bottleneck events, during which most haplotypes became extinct followed by a population expansion event (Avise 2000; e.g., Ojeda 2010). A lack of genetic diversity makes populations more susceptible to deleterious genetic processes (e.g. genetic drift, harmful genetic mutations and inbreeding), pests and diseases, and climate fluctuations. Considering the early isolation and unique (albeit narrow) genetic composition of population DM, this population is likely to be the most vulnerable to extinction. Given the rarity and isolation of Q. arbutifolia populations and the unique gene pool harbored within each population, it is our recommendation to protect all populations of this threatened species.
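The contrast between high haplotype diversity and low nucleotide diversity can be illustrated with a small calculation. The sketch below uses the usual sample-size-corrected haplotype diversity, h = n/(n − 1) · (1 − Σ p_i²), and nucleotide diversity computed as the mean number of pairwise differences per site. The four sequences are invented toy data, not Q. arbutifolia haplotypes, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences: list[str]) -> float:
    """Sample-size-corrected haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(sequences)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(sequences: list[str]) -> float:
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2))
                      for s1, s2 in combinations(sequences, 2))
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length


if __name__ == "__main__":
    # Toy sample: every individual carries a different haplotype,
    # but each haplotype is only one mutation away from the first.
    sample = ["AAAAAAAAAA", "CAAAAAAAAA", "ACAAAAAAAA", "AACAAAAAAA"]
    print("h  =", round(haplotype_diversity(sample), 3))
    print("pi =", round(nucleotide_diversity(sample), 3))
```

In such a sample every individual is distinct, so h is at its maximum, yet the haplotypes differ by very few substitutions, so π stays small. This is the signature that is often read as a bottleneck followed by expansion.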

A national network of in situ conservation sites for Q. arbutifolia, in which long-term active conservation and monitoring are carried out, should be implemented as soon as possible. Populations YZ and SH have the largest population sizes and are located in existing nature reserves, which makes these two populations more secure than the others. It is our recommendation, therefore, that new conservation and monitoring sites for Q. arbutifolia be established to protect populations ZZ (harboring high genetic diversity) and DM (harboring unique haplotypes). A strong in situ conservation program is the most practical and efficient way to conserve both Q. arbutifolia and its unique habitat.

As part of the in situ conservation program, it is important to ensure that conditions support increased seed set and germination, and ultimately gene flow, within populations. The population sizes of DM, ZZ, and CH are comparatively small (<100 trees). Severe habitat degradation has occurred in populations ZZ and CH, as their habitats have been invaded by bamboos. The low seed germination rate and the lack of fruit production in populations DM, ZZ, and CH in 2010–2014 make both the transplanting of individuals within the same population and the propagation of seedlings impractical. Q. arbutifolia is a long-lived and slow-growing species, which is not likely to colonize degraded regions in the habitat. Therefore, ecosystem restoration measures, e.g. removing invasive fast-growing species (mostly bamboos), facilitating canopy closure, and controlling seed predators, are important for the long-term reproduction and survival of these three populations.

In situ conservation management plans for populations DM, ZZ, and CH should focus on enlarging population sizes and restoring the regeneration ability of the populations. The transplantation of plants or propagules from other populations into populations DM, ZZ, and CH is a potential conservation strategy. However, because information on the adaptive differences between populations of Q. arbutifolia is lacking, it is better that transplantations are carried out between populations that are already exchanging genes, such as populations SH and ZZ. There was no gene flow barrier detected between populations SH and ZZ, which may be attributed to their close geographic distance, and population SH had the highest cpDNA haplotype diversity. Considering that population ZZ is no longer producing viable fruits, introduction of individuals from population SH to ZZ might be a solution to enlarge the population size and genetic diversity of ZZ. Population DM was the most genetically distinct population, but it is also low in nucleotide diversity and might have experienced a severe genetic bottleneck. Therefore, transplantation of individuals from other populations is likely to be an effective tool for the genetic enhancement of population DM, as the benefits of added genetic diversity are likely to outweigh any potential outbreeding depression effects.

In addition to these in situ conservation recommendations, ex situ conservation methods are also necessary to act as an insurance policy against extinction in the short term, and to aid the recovery and support of self-sustaining wild populations in the long term. Collecting seeds from the field and propagating them in nurseries could significantly improve germination rate and seedling survival. Eventually, self-sustaining wild populations can be re-established via population reinforcement in existing sites and through new introductions to sites that harbor suitable MCF habitat within the historic range of the species (Frankham et al. 2002). These sites could be in existing nearby nature reserves (e.g. Chebaling, Daiyunshan) or in newly designated conservation sites within MCFs. Considering the high level of genetic diversity and/or unique haplotypes in all Q. arbutifolia populations, if ex situ conservation is put into practice, samples should be collected from as many populations as possible. If more extensive resources are available, strategies such as controlled breeding, vegetative propagation, plant tissue culture, and micro-cuttings could be implemented as additional complementary techniques to prevent extinction. These strategies may be very effective to overcome the limited fruit production and low germination rate in many Q. arbutifolia populations.

Given the uniqueness of MCF habitat, it is possible that Q. arbutifolia might have very specific physiological requirements during growth and reproduction, which require further investigation to fully understand. Additional studies that thoroughly document the phenology, reproductive biology, pollination, seed viability, and seed dispersal mechanism(s) of Q. arbutifolia would greatly inform propagation and conservation strategies.

Conclusions

Q. arbutifolia exhibits an unexpectedly high level of genetic diversity, considering how small and fragmented its few remaining populations are. We relate this finding to the species’ long evolutionary history, infrequent haplotype extinction events, limited seed dispersal ability, and recent isolation of populations. Repeated vertical range shifts during the glacial/interglacial periods of the Quaternary likely led to iterative merging and isolation scenarios for different populations/subpopulations, which may have acted to promote high genetic diversity in Q. arbutifolia. Significant phylogeographic structure was detected, and genetic differentiation between populations was strongly related to their geographical distance. The TMRCA of all haplotypes dated to the late Miocene. Subsequent rapid diversification of haplotypes occurred during the Quaternary. Population expansion of Q. arbutifolia occurred during the last interglacial of the Quaternary. The westernmost population (DM) was the first to diverge and become isolated from the rest of the populations during the late Miocene, which was likely due to the formation of the Pearl River drainage system. These ancient and more recent population diversification events correlate well with significant geological phenomena, such as the uplift of the Tibetan-Himalayan Plateau. In summary, the combined effects of paleogeomorphological and climatic changes of the Cenozoic have had profound impacts that shaped the modern distribution and genetic diversity pattern of Q. arbutifolia. Considering its high risk of extinction and the unique gene pools harbored by each population, all extant populations of Q. arbutifolia and its unique montane cloud forest habitat should be protected for genetic and ecological conservation.