Introduction

Forest trees are major components of ecosystems across large geographical regions. Consequently, they are essential for maintaining both mountain and lowland biodiversity, because many organisms depend on habitats provided or shaped by woody plants. Knowledge of past population dynamics can be used to improve the conservation and management of genetic resources, and numerous studies have focused on past and present populations of tree species in southern Europe, where the main European glacial refugia of biodiversity are believed to have been located. These refugia are hotspots that appear to have allowed many species to survive, with strong persisting effects on the distribution and diversity of current flora and fauna (Taberlet et al. 1998; Petit et al. 2003; Provan & Bennett 2008).

Conifers are major components of many ecosystems, and many conifer species also have considerable economic importance (Farjon 2008; Eckenwalder 2009). Hence, the population demographics, genetic structure, migration patterns and systematic relationships of conifers have been intensively studied, especially in the Northern Hemisphere (Naydenov et al. 2007; Godbout et al. 2008; Jaramillo-Correa et al. 2008).

In conifers, mitochondrial DNA (mtDNA) is maternally inherited and chloroplast DNA (cpDNA) is generally paternally inherited (Powell et al. 1995a, b; Vendramin et al. 1996). This difference makes the two genomes highly valuable for tracing gene flow via seeds and pollen, respectively, and mtDNA and cpDNA have been widely used to analyse the genetic structure of conifer populations (Naydenov et al. 2007; Bonavita et al. 2015). Because wind-dispersed seeds are thought to have a smaller effective dispersal radius than pollen (Wright 1979; Grant 1980), markers in the two genomes experience widely differing rates of gene flow. Additionally, syntheses of multiple analyses of plants, birds and insects suggest that, in the absence of selection (i.e. under a neutral null model), rates of intraspecific gene flow are generally negatively correlated with rates of introgression (Petit & Excoffier 2009). This negative correlation implies that species (or subspecies) complexes will be delimited more effectively with markers that combine low mutation rates (i.e. a long memory of population history) and high levels of variation, such as cpDNA in conifers (Provan et al. 1999). This is particularly important for analyses of the genetic structure of collective species with (i) incomplete reproductive barriers, (ii) retained ancestral polymorphisms, (iii) distributions fragmented, e.g. by mountains, and (iv) a long history of anthropogenic pressure. According to Fukarek (1958a, b), Dobrinov et al. (1982) and Vidakovic (1991), the European Black Pine is such a case. As used by these authors, the term “collective species” is close to the present-day concepts of a species complex and cryptic species.

The European Black Pine (Pinus nigra Arn., subgenus Diploxylon) is one of the most intensively studied conifers after the Scots Pine, Maritime Pine and Norway Spruce. It is considered a Tertiary relict (Tertiary: 66.0–2.6 Ma; Mirov 1967) and has been one of the most economically important forest species in the Mediterranean region for at least 200 years. Its natural range covers southern Europe, northwestern Africa and Asia Minor. It is present in all of the main European glacial refugia, including the Iberian, Apennine and Balkan Peninsulas, as well as in Turkey and on some Mediterranean islands. Its distribution is currently highly fragmented in its marginal regions and west of the Balkans (i.e. west of the Adriatic and Ionian Sea basins), but more continuous in the Balkans and Asia Minor (Turkey). According to Dobrinov et al. (1982), it occupies exposed, south-facing sites with well-drained, poor soils at altitudes of 100 to 1800 m above sea level.

The species was first taxonomically described by Miller (1768), and since then several authors have identified more than 20 subspecies on the basis of morphological and anatomical data (Fukarek 1958a, b; Vidakovic 1991). This delineation is of particular interest because the long-term fragmentation and persistence of individual populations have given rise to extreme adaptation to local environments, which is expressed as high morphological variability among populations. Consequently, several authors consider P. nigra an example of a collective species (Fukarek 1958a, b; Dobrinov et al. 1982; Vidakovic 1991). During the last 30–40 years, several attempts have also been made to analyse the genetic and chemical-phenotypic structure of Black Pine populations and subspecies in geographic regions of varying size using karyological, terpene and allozyme analyses (Fineschi 1984; Bojović 1995; Naydenov et al. 2003). The first molecular analysis of the species’ genetic structure was based on cpSSR analyses of 324 individuals from nine Black Pine populations in Bulgaria (Naydenov et al. 2006). It was followed by further studies using the same molecular markers in populations from the western Mediterranean part of Europe (Raffi & Dodd 2007; Soto et al. 2010; Bonavita et al. 2015). The proposed structures have generally been linked with the species’ geographic distribution, but the full natural range of the species has not been analysed. Thus, the global genetic dynamics and structure of the Black Pine remain unclear, despite multiple publications on its biogeography and population genetic variation (Dobrinov et al. 1982; Vidakovic 1991; Bojović 1995).

Therefore, in the present study, we analysed sequence variation in 13 regions of the cpDNA of P. nigra individuals sampled across the species’ entire natural range. We then evaluated the data using several statistical methods to elucidate Black Pine diversity and genetic structure more thoroughly. We hypothesised that the data would reveal a strong phylogeographic structure across all studied populations and a high level of among-population differentiation.

Material and methods

Sampling and laboratory analysis

A total of 3154 individuals from 106 natural populations (29.7 trees per population on average) were sampled across the entire natural range of the species, from 5.1° N to 48.1° N latitude (from Morocco to Austria) and from 4.7° W to 39.1° E longitude (from Morocco to Turkey; Fig. 1). Total DNA was extracted from megagametophytes or green needle tissue using the Plant Elute Genomic Mini Kit (Sigma-Aldrich, USA) following the manufacturer’s instructions. Tailed, modified cpSSR primer pairs, fluorescently labelled with FAM-6, NED or VIC as described by Vendramin et al. (1996) and synthesized by Applied Biosystems, were initially used to determine size variation at six microsatellite loci following the PCR protocol of Naydenov et al. (2006). MapMarker 1000-ROX (Bio Ventures) size standards, an ABI-3130 Genetic Analyzer, and the Run3130 Data Collection v. 3.0 and GeneMapper v.3.7 software (Applied Biosystems) were used. The six tailed, modified cpSSR primer pairs were Pt-30204, Pt-36480, Pt-45002, Pt-71936, Pt-79951 and Pt-87268. The length variants of each chloroplast fragment were classified by size and by the geographic origin of the population. Then, to ensure accuracy, both strands of the amplicons from one third of the individuals (i.e. 1051) were sequenced after purification with CleanSEQ and AMPure (Agencourt Bioscience) SPRI paramagnetic beads, using the same six primer pairs (non-labelled and non-modified), a BigDye Terminator v. 1.1 Sequencing Kit (Applied Biosystems) and the same Genetic Analyzer. The sequences were assembled with SEQUENCHER v.4.8 (Gene Codes Corp.). Variation was investigated in 13 regions: seven SSRs (simple sequence repeats) and six flanking regions with single nucleotide polymorphisms (SNPs). The seven SSRs came from six primer pairs because the region sequenced with Pt-30204 contained two independent repetitive sequences.
The sequencing was performed to minimize homoplasy and false-positive signals in the basic data and to substantially increase the power of the statistical analysis.

Fig. 1

The studied natural populations of Pinus nigra (Arn.) and the principal edges of genetic diversity determined by the BARRIER program (dashed line), and the model-based method of STRUCTURE/TESS programs (solid line)

Statistical analysis of the population structure

The sequences of the six chloroplast regions obtained for the 1051 sequenced individuals were interpolated to fragments of the same size from the same population (i.e. same-length fragments from sequenced and non-sequenced individuals were considered identical if they came from the same population); sequencing one third of the samples thus served as an additional control. In this way, we obtained sequences of the six chloroplast regions for all 3154 analysed individuals, with a reduced risk of homoplasy (i.e. electromorph size homoplasy) owing to the intensive sequencing and to restricting interpolation to within each population (Table S2 in the Appendix). These interpolated sequences for the 3154 individuals were used in all statistical analyses. Chloroplast DNA diversity was calculated for each natural population as the number of haplotypes, expressed as Nhap/Nind (the number of haplotypes detected divided by the total number of individuals analysed in the population), and as nucleotide diversity. These results were compared with the chlorotype diversity Hcp (expected heterozygosity) estimated from 10³ random permutations using ARLEQUIN v. 3.1 (Excoffier et al. 2002). We estimated unbiased numbers of size variants at the variable chloroplast SSR regions for each population with HP-RARE v.1.0 (Kalinowski 2005).
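The per-population diversity measures described above can be illustrated with a minimal Python sketch. It assumes that haplotypes are given as labels and aligned sequences as equal-length strings; the function names and the example data are hypothetical, and the estimators are standard textbook forms (the ratio Nhap/Nind, Nei’s unbiased gene diversity and mean pairwise nucleotide diversity) that may differ in detail from the ARLEQUIN implementation.

```python
from collections import Counter
from itertools import combinations


def haplotype_ratio(haplotypes):
    """Nhap/Nind: number of distinct haplotypes divided by individuals analysed."""
    return len(set(haplotypes)) / len(haplotypes)


def gene_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


# Hypothetical population of four individuals
haps = ["H1", "H1", "H2", "H3"]
print(haplotype_ratio(haps))                            # 0.75
print(round(gene_diversity(haps), 4))                   # 0.8333
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))   # 0.3333333333333333
```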

The population structure was studied using both distance-based (I) and model-based (II, i.e. frequency-based) approaches. We used two distance-based methods, whose results were treated as the primary results; the model-based methods were used as a complementary approach. For the distance-based approach (I), we calculated pairwise genetic distances according to Rogers (1972), which assumes no mutation and no selection. Because mutation models may differ among populations and groups of populations across the large natural range of the Black Pine, we chose this mutation-free distance for the among-population analyses. The unweighted pair group method with arithmetic mean (UPGMA) tree-building algorithm was preferred to neighbour joining (NJ) after preliminary testing with TreeFIT v.1.0 (R²av = 0.94; Kalinowski 2009). Kalinowski’s R²av is the average proportion of the variation in the matrix of pairwise genetic distances between populations that is explained by the tree, across scenarios of migration rate (m: 0.01, 0.001, 0.0001, 0.00001, 0.000001) and of divergence time in generations before sampling (t: 100, 500, 1000, 5000, 10,000, 50,000, 100,000) (data not shown). The distances were calculated using PowerMarker v. 3.0 (Liu & Muse 2005) with 10³ bootstrap replicates. Distance-based methods using pairwise genetic distances have been applied successfully to cpSSRs in multiple species (Terrab et al. 2006; Wang et al. 2011; Vinceti et al. 2013). As a second strategy, Monmonier’s maximum difference algorithm implemented in BARRIER v. 2.2 (Manni et al. 2004), with 10³ bootstrap replicates and 1000 Rogers (1972) distance matrices, was used to study the structure of the natural populations by combining genetic distances with geographic data.
The BARRIER program has often been used successfully for uniparental markers (Dzialuk et al. 2009; Hohn et al. 2009; Hodel & Gonzales 2013).
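The first distance-based step can be sketched in Python, assuming that each population is summarised by its chloroplast haplotype frequencies treated as a single locus. At one locus the Rogers (1972) distance is D = sqrt(½ Σ (p_i − q_i)²); UPGMA then repeatedly joins the closest pair of clusters and recomputes distances as size-weighted averages. The function names and the three hypothetical populations are illustrative only, and neither the bootstrapping done in PowerMarker nor the Monmonier step of BARRIER is reproduced.

```python
import math
from itertools import combinations


def rogers_distance(p, q):
    """Rogers (1972) distance at one locus; p and q map allele -> frequency."""
    alleles = set(p) | set(q)
    return math.sqrt(0.5 * sum((p.get(a, 0.0) - q.get(a, 0.0)) ** 2 for a in alleles))


def upgma(names, dist):
    """Naive UPGMA clustering.

    dist maps frozenset({x, y}) -> distance for every pair of names.
    Returns the tree as nested tuples and the list of merges (a, b, distance).
    """
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, size)
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merges.append((a, b, d[frozenset((a, b))]))
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        label = f"({a},{b})"
        # Distance to the new cluster is the size-weighted arithmetic mean
        for other in clusters:
            d[frozenset((label, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[label] = ((tree_a, tree_b), size_a + size_b)
    [(tree, _)] = clusters.values()
    return tree, merges


# Hypothetical haplotype frequencies for three populations
freqs = {
    "P1": {"H1": 1.0},
    "P2": {"H1": 0.8, "H2": 0.2},
    "P3": {"H2": 1.0},
}
names = list(freqs)
dist = {frozenset(pair): rogers_distance(freqs[pair[0]], freqs[pair[1]])
        for pair in combinations(names, 2)}
tree, merges = upgma(names, dist)
print(tree)  # ('P3', ('P1', 'P2'))
```

The naive search over all cluster pairs is quadratic per merge, which is adequate for about a hundred populations.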

The geographic distribution of haplotype frequencies was analysed with a Bayesian clustering algorithm (BYM) in two different programs for additional verification: STRUCTURE v.2.2 (Pritchard and Wen 2003; Falush et al. 2003a) and TESS v. 2.1 (Chen et al. 2007). STRUCTURE has been successfully applied to linked loci to define bacterial populations (Falush et al. 2003b) and to plastid DNA (uniparental markers) to assess the population structure of plants (Pico et al. 2008; Delplancke et al. 2012; Perdereau et al. 2014). We used chloroplast haplotype frequencies for the model-based methods (i.e. a single locus with multiple alleles), as described in Coart et al. (2006) and Bonavita et al. (2015). The results of the model-based methods were retained only when they agreed with those of the distance-based methods.

Initially, we expected STRUCTURE to be better suited to the cpDNA data; however, TESS proved the better choice for the spatial population genetic analyses because it uses a hierarchical Markov random field (MRF) model based on Voronoi (Dirichlet) tessellation (François et al. 2006), which is suitable for species with large, fragmented natural ranges such as the European Black Pine. We applied two models (without admixture and with admixture) to natural populations and individuals. For the no-admixture model, the maximum number of clusters (Kmax) was initially set to range from 2 to 33 (i.e. approximately one third of the number of populations). At higher numbers of clusters (Kmax > 33), the likelihood history was unstable and the results were considered unusable. For the admixture model, Kmax was initially set to range from 2 to 8 (i.e. close to the result of the BARRIER program). At higher numbers of clusters (Kmax > 6), the likelihood profile was unstable and the results were considered unusable. The true number of clusters (Ktrue) was determined by two methods: (1) estimating the posterior probability of the data for a given K by calculating Pr(K|X) and Pr(X|K) as described by Pritchard et al. (2000); and (2) the statistic ΔK = m(|L″(K)|)/s[L(K)], i.e. the mean absolute second-order rate of change of the likelihood L(K) divided by the standard deviation of L(K), as recommended by Evanno et al. (2005), calculated for three values of the interaction parameter ψ (0.6, 1.2 and 2.4) from 100 runs of 6 × 10⁴ MCMC sweeps with 10⁴ burn-in sweeps. Ktrue was compared with the results of the BARRIER program and the Rogers (1972) distance, as described by Pritchard et al. (2000).
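The two criteria for choosing Ktrue can be sketched in Python from replicate ln Pr(X|K) values. As simplifying assumptions, L′(K) and L″(K) are computed here from the likelihoods averaged across runs (Evanno et al. average |L″(K)| over runs), and Pr(K|X) assumes a uniform prior over K; the function names and likelihood values are hypothetical, and the TESS interaction parameter ψ is not modelled.

```python
import math
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Delta K after Evanno et al. (2005).

    lnp maps K -> list of ln Pr(X|K) values from replicate runs (at least two per K).
    L'(K) and L''(K) are computed from the run-averaged likelihoods (a simplification).
    """
    means = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|
            second = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second / stdev(lnp[k])
    return delta


def posterior_k(lnp):
    """Pr(K|X) under a uniform prior on K (after Pritchard et al. 2000)."""
    means = {k: mean(v) for k, v in lnp.items()}
    top = max(means.values())  # subtract the maximum to avoid underflow in exp
    weights = {k: math.exp(m - top) for k, m in means.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical ln Pr(X|K) values from two runs per K
lnp = {1: [-1000.0, -1002.0], 2: [-900.0, -902.0],
       3: [-880.0, -884.0], 4: [-878.0, -886.0]}
delta = evanno_delta_k(lnp)
print(max(delta, key=delta.get))  # 2
```

In this example the likelihood plateaus after K = 3, so Pr(K|X) splits between K = 3 and K = 4, whereas ΔK peaks at K = 2, where the largest change in slope occurs; this is why the two criteria can disagree and were cross-checked against the distance-based results.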

Additionally, the Nst and Gst indices were calculated from 10⁴ permutations with SPAGEDI v. 1.2 (Hardy & Vekemans 2002) and GENETIX v. 4.05.2 (Belkhir 2000). The comparison of Nst with Gst was used to detect phylogeographic structure (indicated by Nst > Gst) in the SSR data for the different groups of natural populations defined by the no-admixture and admixture models. The population structure analyses were completed by hierarchical AMOVA tests (10⁵ permutations) in ARLEQUIN. To detect the spatial edges between differentiated groups of populations, we used BARRIER v. 2.2 (Manni et al. 2004) with 10³ bootstrap replicates.
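To make the Nst versus Gst comparison concrete, the following Python sketch computes uncorrected forms of both statistics in the spirit of Pons & Petit (1996): Gst treats all distinct haplotypes as equally different, whereas Nst weights pairs of haplotypes by a distance between them, so Nst > Gst indicates that related haplotypes tend to occur in the same populations. The function names, the one-sided test that permutes haplotype labels within the distance, and the use of squared SSR length differences as the distance are assumptions for illustration; SPAGEDI and GENETIX use sample-size-corrected estimators and their own permutation schemes.

```python
import random


def gst_nst(pops, distance):
    """Uncorrected Gst and Nst.

    pops: list of dicts mapping haplotype -> relative frequency in each population.
    distance: function giving the distance between two haplotypes (0 if identical).
    """
    haps = sorted({h for p in pops for h in p})
    k = len(pops)
    pooled = {h: sum(p.get(h, 0.0) for p in pops) / k for h in haps}

    def diversity(freq, dist):
        return sum(freq.get(a, 0.0) * freq.get(b, 0.0) * dist(a, b)
                   for a in haps for b in haps)

    def unordered(a, b):
        return 0.0 if a == b else 1.0

    hs = sum(diversity(p, unordered) for p in pops) / k  # mean within-population diversity
    ht = diversity(pooled, unordered)                    # total diversity
    vs = sum(diversity(p, distance) for p in pops) / k   # same, weighted by distances
    vt = diversity(pooled, distance)
    return 1 - hs / ht, 1 - vs / vt


def nst_vs_gst_test(pops, distance, n_perm=999, seed=1):
    """One-sided permutation test of Nst > Gst by shuffling haplotype labels in the distance."""
    haps = sorted({h for p in pops for h in p})
    gst, nst = gst_nst(pops, distance)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = haps[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(haps, shuffled))
        perm_nst = gst_nst(pops, lambda a, b: distance(relabel[a], relabel[b]))[1]
        exceed += perm_nst >= nst
    return gst, nst, (exceed + 1) / (n_perm + 1)


# Hypothetical SSR size variants: similar sizes cluster within populations
pops = [{10: 0.5, 11: 0.5}, {20: 0.5, 21: 0.5}]
gst, nst, p = nst_vs_gst_test(pops, lambda a, b: (a - b) ** 2, n_perm=199)
print(round(gst, 3), round(nst, 3))  # 0.333 0.99
```

Permuting labels leaves Gst unchanged and only reshuffles which haplotypes are considered similar, which is why it serves as a null distribution for Nst.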

Coalescence analysis

The coalescence process was used to model the ancestry of the samples and variability-reducing events (Kingman 1982; Hudson 1990). Two pairs of groups (AA-BB and BB-CC) were analysed with BATWING v.1.03 (Wilson et al. 2003); the principal Black Pine formations, AA (western Mediterranean), BB (Balkan Peninsula) and CC (Asia Minor), were those determined by the admixture model of the spatial population genetic analyses (BYM algorithm). BATWING uses a Bayesian algorithm adapted to haplotype data that assumes the effect of selection is negligible. The population size, time to the most recent common ancestor (TMRCA) and divergence times among groups of populations were estimated with a Markov chain Monte Carlo (MCMC) algorithm that generates random samples from the posterior distribution of the population genetic parameters. BATWING has often been used successfully for uniparental markers (Kayser et al. 2006; Raffi & Dodd 2007; Balaresque et al. 2010; Duminil et al. 2010). The prior for the mutation rate per generation (μ) ranged from 3.2 × 10⁻⁵ to 8 × 10⁻⁵ (average 5.6 × 10⁻⁵) according to Provan et al. (1999), and the prior for the effective population size was set with v1 = 4 and v2 = 250,000 under a pure linear-growth population size model, with an MCMC burn-in of 10⁵ iterations and 10⁴ sampled output iterations.

The generation time of forest trees is very difficult to determine and depends on many factors, of which reproductive activity, the period of intensive growth and longevity are the most important. These factors are strongly interrelated and determine the effective capacity of an individual to transmit its genome to the next generation (Pianka 1978; Lande et al. 1999, 2003; Spellman & Klicka 2006). For the European Black Pine, we calculated the TMRCA and the time of the most recent split with three generation times (20, 68 and 100 years). The first (20 years per generation) corresponds to the average age at the onset of reproductive activity. The second (68 years per generation) corresponds to the average age at the end of the period of intensive growth, taken from timber-harvest growth tables. The third (100 years per generation) corresponds to the average lifespan of the Black Pine. We acknowledge that this choice is debatable: generation time depends on many ecological conditions and can vary over time and across the natural range. For this reason, the time calibration in the present study should be treated with caution.

The results of the coalescence analysis are presented for three quantiles (5, 50 and 95 %), three mutation rates per generation (3.2 × 10⁻⁵, 8 × 10⁻⁵ and their average, 5.6 × 10⁻⁵) and three generation times (20, 68 and 100 years). This yields nine scenarios for the posterior ancestral population size (Na) and the relative sub-cluster size, and 27 scenarios for the TMRCA and the time of the most recent split. To select the most realistic scenario, we calibrated the results by comparing the times obtained under the different scenarios with dated climatic events from the same region: the Last Glacial Maximum (LGM; 0.01–0.02 Ma), the Early–Middle Pleistocene Transition (0.892–1.2 Ma) and the Messinian Salinity Crisis (5.33–5.96 Ma). Studies of the tectonic history of the Mediterranean region (southern Europe, northern Africa and Asia Minor) were used as a complementary source. The scenario selected by calibrating the TMRCA and the time of the most recent split was then used to calculate the posterior ancestral population size (Na) and the relative sub-cluster size. This selection followed three guiding principles: (1) focus on the 95 % quantile, i.e. the high-confidence scenario; (2) the most recent split must date to the same period as (or after) a major dated climatic event in the same region; and (3) if there are no matches, the analysis must be repeated with a modified sample size and mutation rate (μ).
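The scenario grid and the calibration check can be sketched in Python. The sketch assumes that coalescence times are available in generations, so that a time in Ma equals generations × years per generation / 10⁶; the split time of 13,676 generations is hypothetical, the mutation rate only labels the scenarios here (in BATWING it enters through the prior, so the estimated times themselves change), and the “or after” clause of principle (2) is not modelled.

```python
from itertools import product

QUANTILES = (5, 50, 95)                      # posterior quantiles (%)
MUTATION_RATES = (3.2e-5, 5.6e-5, 8.0e-5)    # per generation
YEARS_PER_GENERATION = (20, 68, 100)

# Dated events used for calibration, as (youngest, oldest) bounds in Ma
CALIBRATION_WINDOWS_MA = {
    "LGM": (0.01, 0.02),
    "Early-Middle Pleistocene Transition": (0.892, 1.2),
    "Messinian Salinity Crisis": (5.33, 5.96),
}


def time_scenarios():
    """27 combinations used for the TMRCA and the time of the most recent split."""
    return list(product(QUANTILES, MUTATION_RATES, YEARS_PER_GENERATION))


def size_scenarios():
    """9 combinations for population sizes, which do not depend on generation time."""
    return list(product(QUANTILES, MUTATION_RATES))


def generations_to_ma(generations, years_per_generation):
    """Convert a time in generations to millions of years."""
    return generations * years_per_generation / 1e6


def matching_events(time_ma):
    """Names of calibration events whose window contains time_ma."""
    return [name for name, (young, old) in CALIBRATION_WINDOWS_MA.items()
            if young <= time_ma <= old]


# Hypothetical split time of 13,676 generations under each generation time;
# only the 68-year generation time falls inside a calibration window here
for years in YEARS_PER_GENERATION:
    t = generations_to_ma(13_676, years)
    print(years, round(t, 3), matching_events(t))
```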

Results

A total of 272 distinct haplotypes were detected within the 13 examined cpDNA microregions (seven SSRs and six SNP regions; Table S1 in the Appendix). We used the maximum number of chloroplast markers comparable with those of other Black Pine studies. The region sequenced with Pt-30204 contained two independent repetitive sequences, whereas the regions of the remaining primer pairs contained one. The GenBank accession numbers of the sequences are given in Data Accessibility-1 (doi:10.1007/s11295-016-1022-y). The SNPs were detected by chance in the flanking regions of the SSRs. Because of their low mutation rates, we integrated the SNPs to increase the power of the statistical analysis: two haplotypes with the same SSR lengths but different SNPs in the flanking regions were treated as two different haplotypes (Table S2 in the Appendix). The missed-interpolation error of the method applied here is 33–50 % lower than that of published results obtained without sequencing control.
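The haplotype definition above, together with the within-population interpolation described in the methods, can be sketched in Python. The data layout, the function names and the rule of refusing to interpolate when one population carries the same SSR lengths with different SNP states are assumptions for illustration, not the exact procedure used; the population labels, SSR lengths and SNP states are hypothetical.

```python
def haplotype_key(ssr_lengths, snp_states):
    """A haplotype is the combination of SSR lengths and flanking-SNP states."""
    return (tuple(ssr_lengths), tuple(snp_states))


def build_reference(sequenced):
    """Map (population, SSR lengths) -> SNP states from sequenced individuals.

    sequenced: iterable of (population, ssr_lengths, snp_states).
    If one population has the same SSR lengths with different SNP states,
    the entry is set to None so that no interpolation is made (an assumption).
    """
    reference = {}
    for population, lengths, snps in sequenced:
        key = (population, tuple(lengths))
        if key in reference and reference[key] != tuple(snps):
            reference[key] = None
        else:
            reference.setdefault(key, tuple(snps))
    return reference


def interpolate(population, ssr_lengths, reference):
    """Full haplotype for a genotyped-only individual, or None if no unambiguous match."""
    snps = reference.get((population, tuple(ssr_lengths)))
    return None if snps is None else haplotype_key(ssr_lengths, snps)


# Hypothetical sequenced individuals: (population, SSR lengths, SNP states)
sequenced = [
    ("POP_A", (76, 98), ("A", "G")),
    ("POP_A", (77, 98), ("A", "T")),
    ("POP_B", (76, 98), ("C", "G")),
]
ref = build_reference(sequenced)
print(interpolate("POP_A", (76, 98), ref))  # ((76, 98), ('A', 'G'))
print(interpolate("POP_B", (76, 98), ref))  # ((76, 98), ('C', 'G'))
```

Because the lookup key includes the population, identical SSR lengths in different populations can resolve to different haplotypes, which mirrors the restriction of interpolation to within each population.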

A total of 48.9 % of the haplotypes were detected in fewer than three individuals. The number of distinct haplotypes specific to different geographic regions ranged from 4 to 136 (Fig. S1 in the Appendix). Gene diversity at these size-variant loci ranged from 0.123 (Pt-36480) to 0.716 (Pt-71936). Rarefaction of size-variant richness per population showed that 30 (p ≤ 0.05) and 50 (p ≤ 0.01) individuals per population (or group of populations) were sufficient to estimate size-variant frequencies accurately (Data Accessibility-2). These parameters varied among natural populations and regions, but there was no evidence of a clear east-west divide. The average number of distinct haplotypes per population was 12.8 (range 4–27), with a mean diversity index (Hcp) of 0.288 (0.192–0.442) and an average nucleotide diversity of 0.029 (Table 1). The average level of association between the number of haplotypes (Nhap) and the total number of analysed individuals per population (Nind) was 0.42. According to a t test, Nhap and Nind are independent (t = 504.6, 208 degrees of freedom, p < 0.001), with a weak correlation between them (r² = 0.187; p = 0.05).
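The rarefaction step can be illustrated with the standard rarefaction formula, which gives the expected number of distinct size variants in a random subsample of g copies: E = Σ_i [1 − C(N − N_i, g) / C(N, g)], where N_i is the count of variant i and N the total. This Python sketch assumes haploid chloroplast data (one copy per individual) and that the formula matches the approach of HP-RARE; the function name and counts are hypothetical.

```python
from math import comb


def rarefied_richness(counts, g):
    """Expected number of distinct size variants in a random subsample of g copies.

    counts: number of copies of each size variant in the full sample (N = sum).
    """
    n = sum(counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the sample size")
    total = comb(n, g)
    # Each variant contributes the probability that it appears at least once
    return sum(1 - comb(n - c, g) / total for c in counts)


# Hypothetical population: one variant in two copies, two singletons
counts = [2, 1, 1]
print([round(rarefied_richness(counts, g), 4) for g in range(1, 5)])  # [1.0, 1.8333, 2.5, 3.0]
```

Plotting such a curve against g shows where richness levels off, which is the kind of evidence behind the sample-size thresholds reported above.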

Table 1 The studied natural populations of Pinus nigra (Arn.) along with details on genetic diversity

Present population structure

The present population structure was investigated using two approaches based on (1) genetic distances, i.e. Rogers (1972) geometric distances used both without geographic information (UPGMA) and combined with geographic distances in the BARRIER program, and (2) haplotype frequencies combined with geographic data in the model-based methods (STRUCTURE and TESS programs).

The first approach, UPGMA clustering of Rogers (1972) geometric distances, indicated three large clusters (with ≥50 % bootstrap support) and an association of haplotypes with geographical regions (Figs. S1 and S2 in the Appendix). Each of the large UPGMA clusters contained at least four sizeable sub-clusters with bootstrap support. Finer geographical structuring was also apparent within clusters or sub-clusters in several regions. The first UPGMA cluster included most natural populations from west of the Adriatic and Ionian Sea basins (France, Italy, Morocco and Spain: FR, IT, MA and ES) and three natural populations located in the western Balkan Peninsula (BA-11 and GR-2) and Asia Minor (TR-24). The second and third clusters mainly represented natural populations from the Balkan Peninsula and Asia Minor, respectively. Misclassified populations in the three principal clusters were most frequent in the Asia Minor formation, followed by the Balkan Peninsula and Western Europe. These misclassified populations indicate a limitation of this algorithm for large numbers of populations as well as the presence of admixture (i.e. a mixture of a substantial number of local and widespread haplotypes; Figs. S1 and S2 in the Appendix); it was therefore necessary to complement the analysis with other methods. Hierarchical AMOVA also showed substantial genetic variation among groups (clusters), with ΦCT = 0.081, and among populations within groups, with ΦSC = 0.066 (Table S3 in the Appendix). The calculated Fst and Gst values suggested that migration and genetic drift contributed significantly to the genetic structure of the natural populations.

The BARRIER results indicated six barriers with more than 50 % bootstrap support that were clearly consistent with the UPGMA, no-admixture and admixture clusters (Fig. 2). The highest rates of change were found at barriers ii-iv and iii-iv (0.80), between admixture clusters AA and BB, and at barrier v-v (0.76), between admixture clusters BB and CC, followed by barrier i-i, which coincides with the Rhône valley (0.65). The remaining barrier (vi-vi) appeared less important (0.61) and was located in the extreme eastern part of the range of P. nigra. The AMOVA results confirmed the relative importance of the barriers, with ΦCT of 0.018–0.020 for the strong barriers vs. 0.003 for the less pronounced edges of the combined genetic/geographic rates of change (Table S3 in the Appendix).

Fig. 2

Bayesian clustering (coloured polygons) of European Black Pine populations based on the “no admixture” clustering algorithm implemented in STRUCTURE v.2.2 (Kmax = 21 and Ktrue = 22) and TESS v.2.1 (Kmax = 20 and Ktrue = 21). The bold lines are the genetic barriers detected with Monmonier’s maximum difference algorithm (Manni et al. 2004; BARRIER v. 2.2 program)

The UPGMA distance analysis of the population structure did not use prior geographic data. We therefore complemented it with a Bayesian analysis of haplotype frequencies at the group, individual and population levels combined with spatial geographic data, under two scenarios: without admixture (no admixture) and with admixture (STRUCTURE and TESS programs). Under the admixture scenario, the optimal number of clusters (Ktrue) was three for all trends (data not shown). The numbers of natural populations in the admixture clusters were approximately proportional to the present natural distribution area of the Black Pine, with 19, 53 and 34 populations in the three formations (designated AA, western Mediterranean; BB, Balkan Peninsula; and CC, Asia Minor; Fig. 3). Population and individual membership was determined by estimating admixture proportions with 90 % probability intervals.

Fig. 3

Map of the natural distribution range of P. nigra (top) according to Critchfield & Little (1966). Dashed black lines indicate the two large edges of “admixture”, AB (left) and BC (right), according to the algorithm implemented in STRUCTURE v.2.2 (Ktrue = 3) and TESS v.2.1 (Ktrue = 3). The average proportion of individual membership (bottom) for each “admixture” cluster AA (brown; Western Mediterranean Formation), BB (green; Balkan Peninsula Formation) and CC (rose; Asia Minor Formation) and the regions of admixture AB (brown/green; IT-W.Balkan) and BC (green/rose; Balkan-CY-TR-CP-RU); ΦCT, among formations; ΦST, within formations

Two large edges of admixture were identified: AB in the Adriatic and Ionian Sea basins and BC in the basins of the Aegean, Marmara and Black Seas. The first admixture cluster (AA) consisted predominantly of individuals from regions west of the Adriatic and Ionian Sea basins. The second (BB) included individuals predominantly from the Balkan Peninsula and neighbouring islands, whereas the third (CC) included a high proportion of individuals from Asia Minor, the eastern Black Sea coast and the eastern Mediterranean islands (Fig. 3 and Table 2). Genetic variance was partitioned as ΦCT = 0.054 among formations and ΦSC = 0.119 within formations. The optimum number of clusters (i.e. groups) detected assuming no admixture (BYM algorithm) was 22 (Ktrue), with a median interaction coefficient (ψ) of 1.2 and an estimated posterior marginal probability (tau) > 88 % (data not shown). Each no-admixture cluster contained between one and 15 natural populations (Fig. 2). Six no-admixture clusters were located in the region west of the Adriatic and Ionian Sea basins (clusters 1, 2, 3, 4, 5 and 6); 11 in the Balkan Peninsula (clusters 8 to 17) and on neighbouring islands in the Aegean and Adriatic Seas, such as Thasos (cluster 21); one in the region bordering the eastern Black Sea coast in Russia, Turkey and the Crimean Peninsula (cluster 7); one on the eastern Mediterranean islands, Cyprus and Turkey (clusters 18, 19 and 20); and three in Asia Minor (clusters 18, 19 and 20). Hierarchical AMOVA attributed ΦCT = 0.079 of the variation to differences among groups and ΦSC = 0.081 to differences among populations within groups. The partitioning of genetic variance among formations was less pronounced under the admixture hierarchical model than under the no-admixture model. Strong phylogeographic patterns were detected in clusters 1, 2, 3, 5, 11, 20 and 22 of the no-admixture model (i.e. Nst > Gst(nc) at p < 0.05 with 2 × 10⁴ permutations; Table 2). This phylogeographic structure was observed across physical barriers such as the Adriatic Sea (IT/HR), the Strait of Gibraltar (ES/MA), the western Alps (FR/IT) and lowlands with highly competitive flora in different regions. No significant differences were observed between Fst (RH or RH′) and Nst for any of the clusters identified by the admixture model (AA, BB and CC formations) or for some of the natural populations identified by the no-admixture model (p < 0.05; Table 2). These findings indicate that the effect of mutation is negligible compared with that of migration and genetic drift.

Table 2 Genetic diversity parameters and population differentiation estimates from 2 × 10⁴ permutations using SPAGEDI v.1.2 (Hardy & Vekemans 2002) and GENETIX v.4.05.2 (Belkhir 2000)

The differences among the results of the various analytical strategies reflect differences in the underlying algorithms, i.e. population- vs. individual-based assessment, distance-based vs. model-based methods, and non-model vs. model-based methods. The results of the model-based methods (STRUCTURE and TESS) agreed fully or partially with those of the distance-based methods (Rogers’ geometric distances and the BARRIER program). Approximately 25 % of the no-admixture clusters did not match Rogers’ geometric distances. All BARRIER clusters matched no-admixture clusters, with one partial exception (cluster 3). All clusters, and approximately 80 % of their individuals, from Rogers’ geometric distances and the BARRIER program matched clusters of the admixture model (Figs. 1, 2 and 3; Fig. S2 in the Appendix). The misclassified populations in the Rogers’ geometric distance analysis showed the limitation of this method for large numbers of populations. The best-performing distance-based method was Monmonier’s algorithm in the BARRIER program.

Analysis of historic population demographics

The most recent coalescence events, i.e. the time of the most recent split and the time to the most recent common ancestor (TMRCA), were analysed for the populations of the three main admixture clusters (AA, BB and CC formations) determined by the Bayesian assignment test (Table 3). None of the 27 coalescent scenarios was associated with the LGM (i.e. less than 20–25 thousand years ago). The best fit was obtained for one scenario: the 95 % quantile, the average mutation rate of μ = 5.6 × 10⁻⁵ per generation and a generation time of 68 years, which calibrates with the timing of the Early–Middle Pleistocene Transition and the Messinian Salinity Crisis. All data interpretation and the following discussion are based on this scenario. We interpret the timing of the palaeoclimatic and tectonic events with caution because the methods used to date palaeoclimatic and tectonic events may vary to some extent. The most recent splits appear to have occurred between the natural populations of the Balkan Peninsula and Asia Minor (BC: 0.93 Ma, during the Early–Middle Pleistocene Transition) and between those of the Balkan and Apennine Peninsulas (BA: 4.37 Ma, after the Messinian Salinity Crisis). The TMRCA follows the same pattern: 10.72 Ma for BB/CC and 14.75 Ma for AA/BB (i.e. from the early Pleistocene–Pliocene–late Miocene, the former Tertiary period; Table 3). The posterior ancestral population size (Na) is 26.0–26.5 × 10⁴ individuals, and the relative sub-cluster size varies between 17.5 and 20.0 × 10⁴ individuals.

Table 3 Posterior ancestral population size (Nₐ) in 10⁴ individuals; time to the most recent common ancestor (TMRCA) in 10⁴ years at generation times of 20, 68 and 100 years; relative sub-cluster size in 10⁴ individuals; and time of the most recent split in 10⁴ years at generation times of 20, 68 and 100 years, at prior mutation rates (μ) of 3.2 and 8.0 × 10⁻⁵ per generation and sample sizes of 4 up to 250,000 individuals, for the coalescent analysis with the splitting model (11 × 10⁴ iterations) performed following Wilson et al. (2003). AA, BB and CC are the formation (“admixing”) clusters from the Bayesian method based on the geographical distribution of haplotype frequencies (BYM)
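The arithmetic linking the coalescent estimates to calendar time can be sketched in Python. The sketch assumes that split times are fixed in generations for a given mutation rate per generation, so that converting them to years under the 20-, 68- and 100-year generation times of Table 3 is a linear rescaling; it also checks that μ = 5.6 × 10⁻⁵ is the mean of the two prior rates. The function names are hypothetical, this is not the BATWING procedure itself, and the rescaled values are illustrative rather than the entries of Table 3.

```python
def generations_from_years(years, generation_time):
    """Number of generations spanned by a time interval."""
    return years / generation_time

def rescale_years(years, from_generation_time, to_generation_time):
    """Re-express a time estimate in years under another generation time.

    Assumes the estimate is fixed in generations (mutation rate per generation
    unchanged), so years scale linearly with the generation time.
    """
    return years * to_generation_time / from_generation_time

# Mean of the two prior mutation rates listed in the Table 3 caption
mu_mean = (3.2e-5 + 8.0e-5) / 2  # 5.6e-5 per generation

split_ab = 4.37e6  # years, estimated at 68 years per generation
print(round(generations_from_years(split_ab, 68)))  # 64265
for g in (20, 68, 100):
    print(g, round(rescale_years(split_ab, 68, g) / 1e6, 2), "Ma")
```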

Discussion

Population demographic history

Our study found a moderate ratio of haplotypes to analysed individuals per population (Nhap/Nind = 0.42), close to the values obtained with the same chloroplast microsatellites for Mediterranean pines (Pinus pinea and Pinus halepensis) by Gomez et al. (2002, 2005), Pinus pinaster by Bucci et al. (2007), Pinus mugo by Heuertz et al. (2010) and P. nigra by Bonavita et al. (2015) and Naydenov et al. (2006). This ratio is generally ≤0.50 for many pines worldwide, including Pinus banksiana and Pinus resinosa from North America (Walter & Epperson 2005; Naydenov et al. 2005) and Pinus krempfii from Asia (Wang et al. 2014). In the Iberian Peninsula, a slightly higher than average ratio (Nhap/Nind = 0.61; Table 2; nhp15 = 9.16) was reported for P. nigra by Soto et al. (2010). Neither we nor other authors using the same chloroplast microsatellites could confirm the high number of Black Pine haplotypes per population (av. 0.91; Table 1) reported by Raffi & Dodd (2007).
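The haplotype-to-individual ratio can be computed as in the Python sketch below. It assumes that Nhap/Nind is the mean of per-population ratios (a pooled ratio of totals would be an alternative reading); the input format, population names and haplotype labels are hypothetical.

```python
def haplotype_ratios(populations):
    """Per-population ratio of distinct haplotypes to sampled individuals.

    populations: dict mapping population name -> list of haplotype labels,
    one label per sampled tree (hypothetical input format).
    """
    ratios = {name: len(set(haps)) / len(haps) for name, haps in populations.items()}
    mean_ratio = sum(ratios.values()) / len(ratios)
    return ratios, mean_ratio

# Toy data: haplotype labels invented for illustration
samples = {
    "pop1": ["H1", "H1", "H2", "H3"],
    "pop2": ["H1", "H1", "H1", "H2", "H2"],
}
ratios, mean_ratio = haplotype_ratios(samples)
print(ratios)      # {'pop1': 0.75, 'pop2': 0.4}
print(mean_ratio)  # 0.575
```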

Homoplasy is similarity in character states that results from causes other than common ancestry. In evolutionary studies, it takes two main forms, convergence and reversal, and it is contrasted with homology. In molecular analyses, homoplasy is the presence of identical microsatellite electromorphs with different underlying DNA sequences (Doyle et al. 1998; Hale et al. 2004). According to Navascués & Emerson (2005), homoplasy has not been a substantial problem in phylogeographic studies of microsatellites with low mutation rates (μ < 10⁻⁴), such as chloroplast microsatellites; its impact depends on the species, the molecular marker type, the sample size and the statistical method used (Estoup et al. 2002; Adams et al. 2004). The population genetic analyses presented here are less affected by homoplasy than studies without sequencing control.
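The dependence of size homoplasy on mutation rate can be illustrated with a small Python simulation. It assumes a symmetric stepwise mutation model, two lineages that each evolve for a fixed number of generations after splitting, and a Poisson approximation for the number of mutations; a pair counts as homoplasious when the two allele sizes are identical although at least one mutation occurred. The model, function names and parameter values are assumptions for illustration and do not reproduce the analyses of Navascués & Emerson (2005).

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson random variate via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def size_homoplasy_rate(mu, generations, reps=20000, seed=1):
    """Fraction of lineage pairs with identical allele sizes despite >= 1 mutation.

    Two lineages descend from a common ancestor and each evolves for
    `generations` generations; under a symmetric stepwise mutation model every
    mutation adds or removes one repeat unit with equal probability.
    """
    rng = random.Random(seed)
    lam = 2 * generations * mu  # expected mutations on both branches combined
    homoplasious = 0
    for _ in range(reps):
        n_mut = poisson_draw(rng, lam)
        # with symmetric steps, a mutation on either branch shifts the size
        # difference between the lineages by +1 or -1 with equal odds
        net_difference = sum(rng.choice((-1, 1)) for _ in range(n_mut))
        if n_mut > 0 and net_difference == 0:
            homoplasious += 1
    return homoplasious / reps

for mu in (1e-5, 1e-4, 1e-3):
    print(mu, size_homoplasy_rate(mu, generations=1000))
```

Under these assumptions the homoplasious fraction stays very small at μ = 10⁻⁵ and grows markedly at μ = 10⁻³, in line with the argument that markers with low mutation rates are less affected.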

The level of differentiation among populations found in our study (ΦSC = 0.07–0.13) was similar to that reported for the same geographic region for the Stone and Aleppo Pines by Gomez et al. (2002, 2005), P. pinaster by Bucci et al. (2007), the Mountain Pine by Heuertz et al. (2010) and the Black Pine by Bonavita et al. (2015). In general, studies of paternally inherited chloroplast DNA (cpSSR) in conifers (Abies, Picea and Pinus) have found ≤0.15 among-population variation, including Picea abies in Europe (Scotti et al. 2006), Pinus sylvestris in Siberia-Asia (Semerikov et al. 2014) and Abies balsamea in North America (Cinget et al. 2015). Our comparisons suggest that the best strategy is to use the BARRIER program (a distance-based method) as the principal method, complemented by the model-based STRUCTURE and TESS programs as necessary.
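ΦSC is one of the Φ-statistics of a hierarchical analysis of molecular variance (AMOVA). Assuming the standard definitions in terms of variance components (among groups, among populations within groups, within populations), the Python sketch below derives ΦCT, ΦSC and ΦST; the variance components are hypothetical values, not results of this study.

```python
def amova_phi(sigma2_a, sigma2_b, sigma2_c):
    """Phi-statistics from hierarchical AMOVA variance components.

    sigma2_a: among groups; sigma2_b: among populations within groups;
    sigma2_c: within populations.
    """
    total = sigma2_a + sigma2_b + sigma2_c
    phi_ct = sigma2_a / total                   # groups relative to the total
    phi_sc = sigma2_b / (sigma2_b + sigma2_c)   # populations within groups
    phi_st = (sigma2_a + sigma2_b) / total      # populations relative to the total
    return phi_ct, phi_sc, phi_st

# Hypothetical variance components, not values from this study
phi_ct, phi_sc, phi_st = amova_phi(0.2, 0.1, 0.9)
print(round(phi_ct, 3), round(phi_sc, 3), round(phi_st, 3))  # 0.167 0.1 0.25
```

The three statistics are linked by (1 − ΦST) = (1 − ΦSC)(1 − ΦCT), which offers a quick consistency check on reported values.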

Genetic analyses identified three differentiated genetic formations that were consistent with the European Black Pine’s natural geographic distribution. The ancestors of the principal formations began diverging relatively recently (14.75 Ma for the AB edge and 10.72 Ma for the BC edge) compared with the TMRCA of the Haploxylon and Diploxylon sections of pines (89–90 Ma; He et al. 2012). None of the scenarios in Table 3 showed a demographic signal from the LGM. Paleobotanical research has confirmed the presence of P. nigra (i.e. its ancestor) in the Balkan Peninsula during the Tertiary (Stefanov 1941/42, 1943; Palamarev 1989). Between 10 and 35 Ma, the tectonic structure of the Mediterranean region differed from the present one (Gealey 1989; Stampfli et al. 2002): a significant part of the Balkan Peninsula and Asia Minor, and probably a very small part of Calabria in southern Italy, lay on a large, elongated topographic structure known as the Balkan-Pontides microplate, between the Western Tethys and the Paratethys Sea (Fig. 4). The region had multiple moderate elevations, including mountains, a range of latitudes and a tectonic structure that remained stable for 25 My, conditions favourable for speciation. We speculate that the principal Black Pine ancestral populations were located in this area during this period (Fig. 4). In contrast to the Balkan Peninsula, the Iberian Peninsula and Asia Minor, with their relatively stable tectonic histories, the Apennine Peninsula’s tectonic history is a puzzle (i.e. non-monolithic, unstable and composite over the last 25 My). This period was not long enough for the Black Pine ancestor to evolve into multiple distinct species, which probably explains the lack of reproductive barriers between the different Black Pine sub-species. In contrast to P. nigra, some ancestors of the Pinus species of Mexico and the western USA have evolved into multiple separate species (more than 44 according to Critchfield & Little 1966) in regions that have been tectonically stable for a very long period (>80 Ma). This stability has resulted in more pronounced isolation between populations along the large ecological gradients created by mountain tops and wide latitudinal ranges, i.e. species responses to strong glacial/interglacial cycles along oceanic-continental, latitudinal and altitudinal gradients over a very long time.

Fig. 4

The Balkan-Pontides microplate, the most probable distribution area of the Black Pine ancestor (ellipses) during the Paleogene and Neogene periods. Today this microplate is fragmented and shared between the European and Anatolian tectonic plates. The paleogeographic reconstruction maps of the Mediterranean Basin, from the Miocene (left) to the Oligocene epoch (right), follow Gealey (1989) and Stampfli et al. (2002) and were redesigned by Dr. Ron Blakey of Colorado Plateau Geosystems, USA. The Western Tethys and Paratethys Seas (i.e. Mediterranean Tethys and Paratethys) lie in the southern and northern portions of the ellipse, respectively

The population structure analysis separates the natural populations of P. nigra along an edge formed by the Adriatic and Ionian Seas (AB), i.e. between the Apennine and Balkan Peninsulas. This edge (AB) divides the Mediterranean Basin into two topographically and climatically distinct regions. Ancient climatic fluctuations most likely resulted in different migration histories and forest community associations with different successional patterns (i.e. interglacial refugia vs. macro long-term refugia). The refugium terminology and classification used here are based on the theory of Stewart et al. (2010).

Many non-pine tree genera are often present in Black Pine forest associations (including Quercus, Acer, Fraxinus and Carpinus), and these genera, together with their associated fauna, follow similar phylogeographic patterns. The AB and BC edges appear to be complementary zones of east/west admixture for multiple species with large distributions (i.e. not limited to the Mediterranean region). For example, Petit et al. (2002a, b) and Lumaret et al. (2005, 2009) used chloroplast DNA to show the presence of eastern and western lineages in eight different species of oaks; similar results were published for hornbeam by Grivet & Petit (2003) and for three species of European ashes by Heuertz et al. (2006). The BC edge (Aegean and Marmara Seas) between the Balkan Peninsula and Asia Minor (Turkey) identified here was overlooked for many years because these areas have been sampled less intensively. A similar east/west phylogeographic structure has been reported using mtDNA for animals such as Erinaceus europaeus/concolor and Ursus arctos and insects such as Chorthippus parallelus in Europe (Emerson & Hewitt 2005), and for P. sylvestris (Naydenov et al. 2007). Interestingly, similar edges are marked by the distribution limits of other Mediterranean pines, such as P. pinaster and P. brutia (Critchfield & Little 1966).

The Black Pine’s most recent splits and TMRCAs date to the Pleistocene, Pliocene and late Miocene epochs, which saw multiple significant temperature fluctuations not typical of a warm period (Joos & Colin 2004). The older of the two most recent splits (AB, 4.37 Ma) follows the Messinian Salinity Crisis (late Miocene epoch), which affected the Mediterranean Basin between the Apennine and Balkan Peninsulas. This crisis created a natural migration bridge between the Apennine and Balkan Peninsulas (and between continents and islands). The older split (4.37 Ma) between the Apennine and Balkan Peninsulas explains the more pronounced phylogeographic edge (AB) and indicates the beginning of significant fragmentation of the Black Pine’s natural area.

The time of the most recent split and the TMRCA between the Balkan Peninsula and Asia Minor do not predate those of the AB edge. The younger split (0.93 Ma) between the Balkan Peninsula and Asia Minor explains the less pronounced phylogeographic edge (BC) and indicates that significant fragmentation of the Black Pine’s natural area began relatively recently in this region. The younger split coincided with a period of global climatic change and significant ice accumulation at both poles, associated with the increasing duration of climatic cycles, described by Maslin & Ridgwell (2005) as the Early–Middle Pleistocene Transition (0.892–1.2 Ma). These ecological conditions were most likely unfavourable for the Black Pine in this region (eastern Mediterranean) and contributed to a significant reduction in the species’ distribution, to fragmentation and isolation, and to the split between the Balkan Peninsula and Asia Minor. This hypothesis is supported by evidence of severe bottlenecks long before the first human civilizations or agricultural land use developed in the region (Naydenov et al. 2011, 2015). We hope to obtain more information from future complementary mtDNA analyses.

The coalescence analysis (BATWING program) did not show any signal from the LGM, in contrast to the results reported by Raffi & Dodd (2007). These authors reported postglacial recolonisation (i.e. expansion) of the Black Pine; however, multiple independent paleobotanical studies have reported the opposite, i.e. postglacial contraction of the Black Pine in the same region and period (García-Amorena et al. 2011; Roiron et al. 2013). The natural P. nigra populations of the western Mediterranean formation, but not those of the Balkan Peninsula and Asia Minor formations, occupy habitats identified by the LGM fossil record as interglacial refugia (cryptic southern refugia). This hypothesis is supported by recent paleobotanical studies from the western part of the natural distribution of P. nigra, which showed that this species was one of the dominant forest trees in the Iberian Peninsula and southern France during the LGM (García-Amorena et al. 2011; Roiron et al. 2013). A temperature change of only 4–7 °C was sufficient to turn the Black Pine in western Europe from a dominant into an endangered species within a very short period (less than 10 thousand years).

The among-population structure of the Black Pine on the Balkan Peninsula and in Asia Minor observed in the present study was shaped by major topographic features, such as mountains and watersheds, and by the repeated in situ survival of populations during cool periods with limited altitudinal migration, as suggested by Naydenov et al. (2006, 2015). We found no fossil records of significant fluctuations in the size of the P. nigra distribution area in either region during (or before or after) the LGM. Thus, the Balkan Peninsula and Asia Minor may be areas of endemism or macro long-term refugia rather than glacial or post-glacial refugia for the natural Black Pine populations, according to the theory of Stewart (2003) and Stewart & Dalen (2008). Our phylogeographic and coalescence-based results support the hypothesis of distinct spatial and temporal responses of the different Black Pine formations (i.e. interglacial refugia vs. macro long-term refugia). A similar longitudinal adaptation to contrasting climatic and soil conditions was observed for sea buckthorn (Hippophae rhamnoides), which naturally persisted in the Asiatic steppe and on the Atlantic seaboard (Iversen 1973). The phylogeographic structure of P. nigra revealed by cpDNA is the result of a long evolutionary history shaped by very long-term (15–16 Ma) climatic/ecological changes and changes in tectonic structure. Our results show that chloroplast DNA is a very useful tool for phylogeographic studies of species whose present population structure bears long historical imprints.

Perspectives

Generally, isolated populations are believed to be capable of rapid evolutionary change because of the high probability of random genetic drift, which generates variation on which selection can rapidly act (Lehman 1998). Mediterranean vegetation is considered to be very resilient because it has experienced intense and abrupt climatic changes. As a result, its taxa have been very stable over time and have retained their ancestral characteristics. Furthermore, this vegetation tends not to fill its climatic range (Svenning & Skov 2004; Petit et al. 2005). The Black Pine fulfils the principal characteristics of Mediterranean taxa. Consequently, a collective effort (national and international) is needed to protect biodiversity in the region and to manage the ecosystem with a better understanding of its genetic, morphological and physiological characteristics.
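The link between small effective size and strong drift can be illustrated with the expected loss of gene diversity in a Wright-Fisher population. The Python sketch assumes a haploid model (as for uniparentally inherited cpDNA) with pure drift and no mutation, migration or selection, under which expected diversity declines by a factor (1 − 1/Ne) per generation; the effective sizes, the initial diversity of 0.8 and the function names are hypothetical.

```python
import math

def expected_diversity(h0, n_e, generations):
    """Expected gene diversity after `generations` of pure drift.

    Haploid Wright-Fisher model with n_e gene copies; no mutation,
    migration or selection.
    """
    return h0 * (1 - 1 / n_e) ** generations

def diversity_half_life(n_e):
    """Generations until expected diversity falls to half its initial value."""
    return math.log(0.5) / math.log(1 - 1 / n_e)

# Smaller (more isolated) populations lose diversity much faster
for n_e in (50, 500, 5000):
    print(n_e, round(expected_diversity(0.8, n_e, 100), 3), round(diversity_half_life(n_e)))
```

The half-life grows roughly in proportion to Ne (about 0.69 × Ne generations for large Ne), which is why drift acts quickly in small isolated stands and slowly in large continuous ones.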

Much effort has gone into different meta-genomic forest tree projects (e.g. ProCoGen, EvolTree, TreeBreDex and NovelTree) in Europe (EU) over the last 10–15 years, with very good results. New genomics/epigenetics projects and collective effort are necessary to elucidate the sub-species structure of the Black Pine for future selection, management and conservation practices. The present results concerning the phylogeographic structure are complementary and would have been very expensive and difficult to obtain using other techniques. In the near future, we will use mitochondrial DNA to complement the results reported here. This approach will help us develop a better understanding of the natural population structure and history of the Black Pine and confirm the importance of large sampling areas for molecular analyses of the effects of climatic change and anthropogenic pressure.