Introduction

Climate change scenarios predict not only higher annual temperatures, but also changes in precipitation patterns, increasing the risk of extreme events, such as floods and droughts (Trenberth 2011). In Central Europe, a temperature increase of 1.3 °C has been observed during the first decade of the twenty-first century compared to the last half of the nineteenth century. Similarly, the frequency of hot days, tropical nights, and heat waves has increased since the last half of the twentieth century, whereas cold periods and frost days have become less frequent (Kovats et al. 2014). Additionally, an increase in the duration and intensity of summer droughts has been observed, and this trend is expected to continue through the twenty-first century (Beniston and Goyette 2007; Kovats et al. 2014).

Changes in climate will very likely affect the survival of forest trees, altering the composition and distribution of forests (Allen et al. 2010; Crookston et al. 2010; Chmura et al. 2011). European beech (Fagus sylvatica L.) is one of the most important and widely distributed forest tree species in Europe (Ellenberg 1988). In Switzerland, F. sylvatica is the second most important tree species, being predominant in the montane vegetation zone (Weber et al. 2011). Similar to other beech species, its distribution depends mainly on temperature, followed by moisture availability (Fang and Lechowicz 2006). Under climate change, the distribution of European beech is expected to be affected, with a reduction in the south, expansion in the north, and a shift in distribution toward higher elevations (Kramer et al. 2010; Bugmann et al. 2014).

Genetic variation is needed for a species to cope with environmental changes. Genetic studies on beech using isozyme, randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and microsatellite or simple sequence repeat (SSR) markers have found high genetic variation, high gene flow, and low population structure in European beech (Sander et al. 2000; Emiliani et al. 2004; Jump and Peñuelas 2007; Kraj and Sztorc 2009; Pluess and Weber 2012). However, those markers have limited potential to study adaptation. In particular, SSR markers are mainly located in noncoding regions (random genomic SSRs) and, thus, likely represent selectively neutral genetic variation, i.e., are not under natural selection (Holderegger et al. 2006). Instead, single nucleotide polymorphisms (SNPs) represent the most common variation throughout the genome, being found in both noncoding and coding regions, and they are considered to be more suitable markers to study adaptive genetic variation (Morin et al. 2004). Recently, multiple SNP markers have been developed in candidate genes potentially involved in stress response and phenology in European beech (Seifert et al. 2012; Lalagüe et al. 2014; Müller et al. 2015a). These SNPs have been successfully used to detect genetic variation showing signatures of selection (e.g., Csilléry et al. 2014; Müller et al. 2015b; Pluess et al. 2016; Krajmerová et al. 2017).

Different analyses can be used to identify genetic variation under selection. FST outlier analyses rely on the assumption that nonselective processes have the same effect on all loci, while selection affects only certain loci in the genome. Thus, loci with genetic differentiation (measured by the FST parameter) higher or lower than expected under neutrality are considered to be under positive or balancing selection, respectively (Lewontin and Krakauer 1973; Vitti et al. 2013). On the other hand, environmental association analysis (EAA) aims at identifying associations between allele frequencies and environmental variables (Rellstab et al. 2015; Stephan 2016), relying on the assumption that alleles at a locus under selection caused by a particular environmental factor might show a change in allele frequency following environmental change (Holderegger et al. 2010).
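To make the FST parameter concrete, the following minimal sketch computes a simple heterozygosity-based FST = (HT − HS)/HT for one biallelic locus from population allele frequencies. It assumes equal population weights and ignores sample-size corrections; the function name is hypothetical, and the estimators actually implemented in outlier-detection software differ in detail.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based FST for one biallelic locus.

    freqs: frequency of one allele in each population (values in [0, 1]).
    Assumption: populations are weighted equally and no sample-size
    correction is applied, so this is an illustration only.
    """
    if not freqs:
        raise ValueError("at least one population is required")
    n = len(freqs)
    # HS: mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n
    # HT: expected heterozygosity of the pooled (total) population
    p_bar = sum(freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation
    return (h_t - h_s) / h_t
```

A locus with identical frequencies in all populations gives FST = 0, while populations fixed for alternative alleles give FST = 1; outlier tests ask whether a locus lies unusually far from the neutral expectation between these extremes.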

FST outlier analysis and EAA are complementary approaches for the detection of genetic variation under selection (De Mita et al. 2013; de Villemereuil et al. 2014). Both analyses have been applied in different organisms, including forest trees. For instance, associations of genetic variation with temperature and precipitation have been detected in Quercus lobata (Sork et al. 2010), Arabis alpina (Poncet et al. 2010; Manel et al. 2010), Pseudotsuga menziesii (Eckert et al. 2009), Pinus taeda (Eckert et al. 2010a, b), Pinus pinaster, and Pinus halepensis (Grivet et al. 2011). Likewise, in F. sylvatica, genetic variation at AFLP markers has been associated with temperature (Jump et al. 2006) and water availability (Pluess and Weber 2012). More recently, SNPs in candidate genes that might be under climate selection have been found in European beech (Csilléry et al. 2014; Lalagüe et al. 2014), and their associations with environmental variables such as temperature, precipitation, and drought have been determined (Pluess et al. 2016). However, the genetic variation underlying adaptation to different environmental conditions in F. sylvatica remains insufficiently studied.

Precipitation gradients may cause differences in water availability for plants and, thus, reflect differences in selection pressure acting on forest populations. In this study, we investigated the patterns of genetic variation and genetic structure at supposedly neutral genetic markers (SSRs) and potentially adaptive genetic markers (SNPs) in saplings and adults from 12 populations of European beech occurring along two steep precipitation gradients in Switzerland. The studied SNPs were located in candidate genes potentially involved in important traits, such as phenology and stress response, and were used for the identification of signatures of selection. Two different approaches were used for such identification: FST outlier analysis and EAA. FST outlier SNPs were identified by using three different methods implemented in LOSITAN, Arlequin, and BayeScan software. SNPs showing significant association with important environmental variables such as precipitation, temperature, and humidity were discovered and tested using two different methods implemented in Bayenv2 and Samβada, respectively.

Materials and methods

Plant material

Twelve populations of European beech located in the dry inner-alpine Rhone and Rhine valleys in Switzerland were used in this study (six populations per valley). The populations were located at similar elevations (550–850 m above sea level), with a mean annual temperature between 9.8 and 10.1 °C (Arend et al. 2016). The mean annual precipitation ranged between 849 and 1334 mm in the Rhine Valley and between 603 and 1012 mm in the Rhone Valley (Table 1). Leaves from 2 to 4 saplings underneath the same adult tree were collected, for a total of 60–64 saplings sampled per population. Additionally, leaves from 25 adult trees per population were collected. In total, 755 saplings and 300 adult trees were sampled. The leaves were dehydrated with silica gel and stored at room temperature.

Table 1 Environmental characteristics of 12 European beech populations studied in the Rhine and Rhone valleys

DNA isolation

DNA was isolated from dry leaves using the DNeasy™ 96 Plant Kit (Qiagen, Hilden, Germany). The amount and quality of the DNA were examined by electrophoresis in a 1% agarose gel with 1× TAE as running buffer. DNA was stained with Roti®-Safe GelStain (Roth, Karlsruhe, Germany), visualized by UV illumination, and compared with a Lambda DNA size ladder (Roche, Mannheim, Germany).

SSR amplification and genotyping

Individuals were genotyped at 13 SSR loci. Ten SSR loci were random genomic SSRs representing noncoding regions. Six of them were originally developed for F. sylvatica: FS3-04 (Pastorelli et al. 2003), mfs11 (Vornam et al. 2004), csolfagus_06, csolfagus_19 (Lefèvre et al. 2012), Fagsyl_002929, and Fagsyl_003994 (Pluess and Määttänen 2013). The other four markers (sfc0018, sfc0161, sfc1063, and sfc1143) were originally developed for Fagus crenata (Asuka et al. 2004). The remaining three SSR loci (GOT066, FIR065, and FIR004) were EST-linked (EST-SSRs). They were originally developed for Quercus robur (Durand et al. 2010) and were successfully used for F. sylvatica in this study.

The PCR amplifications were performed using fluorescent dye–labeled primers as follows: 6-carboxyfluorescein (FAM) dye for mfs11, sfc0161, sfc1063, csolfagus_06, csolfagus_19, Fagsyl_003994, and FIR004 and 6-hexachlorofluorescein (HEX) dye for sfc0018, sfc1143, Fagsyl_002929, GOT066, FIR065, and FS3-04. This allowed us to assemble four different PCR amplification multiplexes. The first multiplex contained the FS3-04 and mfs11 markers, the second contained all four sfc markers, the third contained the csolfagus and Fagsyl markers, and the fourth contained all three EST-SSR markers. The PCR amplifications were performed in a total volume of 15 μL containing 2 μL of genomic DNA (about 10 ng), 1× reaction buffer (10× stock: 0.8 M Tris-HCl pH 9.0, 0.2 M (NH4)2SO4, 0.2% w/v Tween-20; Solis BioDyne, Tartu, Estonia), 2.5 mM MgCl2, 0.2 mM of each dNTP, 0.3 μM of each forward and reverse primer, and 1 unit of Taq DNA polymerase (HOT FIREPol® DNA Polymerase, Solis BioDyne, Tartu, Estonia). The amplification conditions were as follows: an initial denaturation step at 95 °C for 15 min, followed by 30 cycles consisting of a denaturation step at 94 °C for 1 min, an annealing step at 55 °C (first, second, and third multiplexes) or at 47 °C (EST multiplex) for 30 s, and an extension step at 72 °C for 1 min. After the 30 cycles, a final extension step at 72 °C for 20 min was performed. The PCR fragments were separated and sized on an ABI PRISM® 3100 Genetic Analyzer (Applied Biosystems, Foster City, USA). The GS 500 ROX™ (Applied Biosystems, Foster City, USA) was used as an internal size standard. The genotyping was done using the GeneMapper 4.1® software (Applied Biosystems, Foster City, USA).

Candidate genes and SNPs

SNPs in candidate genes were selected from previously published studies on European beech (Seifert et al. 2012; Lalagüe et al. 2014; Müller et al. 2015a). From these studies, genes were selected that according to the databases uniprot.org (Apweiler et al. 2004) and arabidopsis.org (Lamesch et al. 2012) are very likely involved in stress response and phenology. From these genes, SNPs showing signatures of natural selection in previous studies (Csilléry et al. 2014; Müller et al. 2015b) were further selected and tested to get a final set of 24 candidate genes. For the candidate genes that contained several SNPs, linkage disequilibrium (LD) blocks were identified using the htSNPer 1.0 software (Ding et al. 2005), and a subset of SNPs representing the majority of haplotypes (haplotype tag SNPs) was selected for further genotyping. Finally, 76 SNPs (21 nonsynonymous, 27 synonymous, and 28 noncoding SNPs) were selected for genotyping (Supplementary material 1 Table S1). Nucleotide sequences neighboring selected SNPs were sent to LGC Genomics Ltd. (Hoddesdon, UK) for primer design and SNP genotyping using their PCR-based KASP™ genotyping assays.

Environmental data

Data on climatic variables collected by meteorological stations located near the populations were downloaded from the website of the Federal Office of Meteorology and Climatology MeteoSwiss (http://www.meteoswiss.admin.ch). Climate normals for the reference period 1961–1990 were used as proxies for the climate that imposed selection pressure on the early life stages of adult trees, whereas climate normals for the reference period 1981–2010 were used for the saplings. The environmental variables included data on annual and growing season (May–September) temperature and precipitation, heat days (HD), and summer days (SD), as well as latitude and longitude (Table 2). Three derived climatic variables were additionally calculated: potential annual direct incident solar radiation (ASR), Thornthwaite’s moisture index (Im) (Thornthwaite 1948), and Ellenberg’s climatic quotient (EQ) (Jahn 1991) (Table 2). ASR was calculated using data on latitude, slope, and aspect according to McCune and Keon (2002). To calculate Im, monthly potential evapotranspiration (PET) was first calculated according to Thornthwaite (1948) using the R package SPEI 1.7 (Begueria and Vicente-Serrano 2017) with R software (R Core Team 2016). Then, Im was calculated according to the formula \( I_{\mathrm{m}}=\frac{100s-60d}{n} \), where s is the sum of surplus water for the months when precipitation exceeds PET, d is the sum of water deficiency for the months when PET exceeds precipitation, and n is water need (annual PET) (Thornthwaite 1948; Maliva and Missimer 2012). According to Thornthwaite (1948), moist climates have positive values of Im, and dry climates have negative values. EQ, which is widely used to describe habitats suitable for the genus Fagus, was calculated as \( \mathrm{EQ}=\frac{\mathrm{Temperature\ of\ July}\ (^{\circ}\mathrm{C})}{\mathrm{Annual\ precipitation}\ (\mathrm{mm})}\times 1000 \) (Jahn 1991; Fang and Lechowicz 2006). According to Jahn (1991), regions with values of EQ below 20 represent a pure beech climate, while the beech competitiveness slowly decreases in regions with EQ values between 20 and 30 and disappears in regions with EQ > 30. Information about the environmental variables per population and for the reference periods 1961–1990 and 1981–2010 is presented in Supplementary material 1—Table S2.
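The two derived indices can be illustrated with a short sketch that follows the formulas as stated in the text. The function names are hypothetical, monthly PET is assumed to be already available (for example from the SPEI package), and the handling of an EQ value of exactly 30 is an assumption because the text does not specify it.

```python
def thornthwaite_moisture_index(precip_mm, pet_mm):
    """Im = (100*s - 60*d) / n from 12 monthly precipitation and PET values (mm)."""
    if len(precip_mm) != 12 or len(pet_mm) != 12:
        raise ValueError("12 monthly values are required for each series")
    # s: summed surplus over the months when precipitation exceeds PET
    s = sum(p - e for p, e in zip(precip_mm, pet_mm) if p > e)
    # d: summed deficiency over the months when PET exceeds precipitation
    d = sum(e - p for p, e in zip(precip_mm, pet_mm) if e > p)
    # n: water need, i.e. annual PET
    n = sum(pet_mm)
    return (100.0 * s - 60.0 * d) / n


def ellenberg_quotient(july_temp_c, annual_precip_mm):
    """EQ = July temperature (deg C) / annual precipitation (mm) * 1000."""
    return 1000.0 * july_temp_c / annual_precip_mm


def beech_climate_class(eq):
    """Classes after Jahn (1991) as summarized in the text.

    Assumption: EQ == 30 is assigned to the intermediate class.
    """
    if eq < 20:
        return "pure beech climate"
    if eq <= 30:
        return "decreasing beech competitiveness"
    return "beech not competitive"
```

For example, a hypothetical site with a July temperature of 18 °C and 1000 mm of annual precipitation has EQ = 18 and falls in the pure beech climate class.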

Table 2 Summary of the environmental variables used in the study

Spearman’s rank correlation coefficients between all pairs of environmental variables were calculated. Principal component analysis (PCA) was used to reduce the dimensionality of the environmental variables; variables were standardized to a mean of 0 and a standard deviation of 1 before PCA. Principal components (PCs) with eigenvalues greater than 1 were kept for the environmental association analysis; these PCs are hereafter referred to as environmental PCs. All analyses were conducted using the software Statistica 12 (Dell Inc. 2015). Both the environmental PCs and the individual environmental variables were then tested for association with SNPs.
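As an illustration of the preprocessing steps, the sketch below computes Spearman’s rank correlation (Pearson correlation on average ranks) and standardizes a variable to a mean of 0 and a standard deviation of 1. It is not the Statistica implementation; the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def standardize(values):
    """Scale to mean 0 and SD 1 (assumption: sample SD with n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def kaiser_retained(eigenvalues):
    """Indices of principal components with eigenvalue > 1."""
    return [i for i, ev in enumerate(eigenvalues) if ev > 1.0]
```

Standardizing before PCA places variables measured in different units (°C, mm, days) on a common scale, which is what makes the eigenvalue-greater-than-1 rule interpretable.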

Analysis of genetic data

Tentative neutral genetic variation (SSRs)

Allelic richness was calculated taking into account differences in sample size with the HP-Rare program (Kalinowski 2005) using a sample size of 50 individuals. Additionally, observed (Ho) and expected (He) heterozygosity and the fixation index (FIS) were calculated using the GenAlEx 6.5 software (Peakall and Smouse 2006, 2012). FIS is one of Sewall Wright’s fixation indices (F-statistics); positive values reflect an excess of homozygotes and negative values an excess of heterozygotes relative to the proportions expected under Hardy-Weinberg equilibrium (HWE). If other factors, such as selection and subpopulation structure (Wahlund effect), are excluded, this parameter can be used as a proxy for inbreeding. It gives more specific information than a test of deviation of observed genotypic frequencies from HWE expectations alone. Therefore, it is recommended to test both FIS and HWE. The GENEPOP 4.2 software (Raymond and Rousset 1995; Rousset 2008) was used to test for HWE using the exact probability test and for LD for each pair of SSR loci using 10,000 dememorizations, 1000 batches, and 10,000 iterations per batch for Markov chain parameters. Furthermore, the MICRO-CHECKER software (Van Oosterhout et al. 2004) was used to identify and correct genotyping errors, such as null alleles. Differences in genetic diversity parameters between saplings and adults were tested for significance using the FSTAT 2.9.3.2 software (Goudet 1995).
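The relationship between Ho, He, and the fixation index can be sketched for a single locus as follows. This is a minimal illustration with a hypothetical function name; it uses the uncorrected expected heterozygosity (1 minus the sum of squared allele frequencies) and F = 1 − Ho/He, and does not reproduce any sample-size corrections that the software may apply.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Ho, He, F) for one locus.

    genotypes: list of (allele_a, allele_b) tuples for diploid individuals.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Ho: proportion of heterozygous individuals
    ho = sum(1 for a, b in genotypes if a != b) / n
    # He: 1 - sum of squared allele frequencies (HWE expectation)
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # F > 0: heterozygote deficit; F < 0: heterozygote excess
    f = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, f
```

With genotypes AA, AB, AB, and BB, both Ho and He equal 0.5 and F is 0, as expected under HWE; with only AA and BB homozygotes in equal numbers, Ho is 0 and F is 1.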

To assess genetic differentiation, FST and Hedrick’s standardized G″ST (Meirmans and Hedrick 2011) were calculated with the GenAlEx 6.5 software (Peakall and Smouse 2006, 2012) using 999 permutations. Population structure was inferred using the Bayesian approach implemented in the STRUCTURE 2.3.4 software (Pritchard et al. 2000); the analysis was done for genomic SSRs and EST-SSRs separately and for all SSRs together. The admixture model with correlated allele frequencies was used, with 100,000 iterations for both the MCMC (Markov chain Monte Carlo) burn-in period and the following MCMC. We tested from 1 to 20 possible populations or clusters (K), using 20 iterations for each of them. The analysis was performed in two modes: without prior population information and with sampling locations as prior information (LOCPRIOR option). The LOCPRIOR option assists clustering when there is weak structure (Hubisz et al. 2009). The most likely number of clusters K was determined considering mean posterior probability of the data (LnP(D)) and the ΔK method (Evanno et al. 2005), which is implemented in the STRUCTURE HARVESTER 0.6.94 software (Earl and vonHoldt 2012). The CLUMPAK software (Kopelman et al. 2015) was used for summation and graphical representation of the results obtained by STRUCTURE. In addition, a PCA was performed using a matrix of covariances calculated from population allele frequencies with the GenoDive 2.0b23 software (Meirmans and Tienderen 2004).

Tentative adaptive genetic variation (SNPs)

For SNP markers, the diversity parameters Ho and He, the fixation index FIS, deviations from HWE, and LD between pairs of SNPs were calculated as described above for SSR markers. FST and Hedrick’s standardized G″ST (Meirmans and Hedrick 2011) were calculated as described for SSR markers to assess genetic differentiation. In addition, population structure was analyzed separately for all SNPs, potentially adaptive SNPs (see “Results”), and potentially neutral SNPs, in the same way as described for the SSR markers.

Signatures of natural selection at SNP markers

Two different approaches were used to detect SNPs showing signatures of selection: FST outlier tests and EAA. For the detection of FST outlier SNPs, three different methods with different demographic assumptions were used. The first method was developed by Beaumont and Nichols (1996) and is implemented in the LOSITAN software (Antao et al. 2008). This method determines the expected thresholds for the distribution of FST as a function of He for loci with selectively neutral variation under an island model of migration. The analysis was performed using the infinite allele model with 200,000 simulations, a confidence interval of 95%, and a false discovery rate (FDR) of 0.1. To run LOSITAN, we followed a procedure commonly used in similar studies (e.g., Krutovsky et al. 2009). LOSITAN was run first using all loci to estimate the mean neutral FST. After the first run, all loci outside the 95% confidence interval were removed, and LOSITAN was run again using only the remaining putatively neutral loci to estimate a second mean neutral FST. Finally, a third run was done using all loci and the second mean neutral FST. This procedure lowers the bias when estimating the mean neutral FST by removing, at the end of the first run, the most extreme loci from the estimation (Antao et al. 2008). The LOSITAN analysis was done for the entire set of populations and also for each valley (Rhine or Rhone) separately.

The second method used for the detection of FST outliers is implemented in the Arlequin 3.5 software (Excoffier and Lischer 2010) and is similar to the one implemented in LOSITAN but considers a hierarchical island model of migration, in which populations exchange more migrants within groups than between groups (Excoffier et al. 2009). Populations of saplings and adults were grouped hierarchically according to the region; furthermore, populations of saplings were also grouped according to the groups suggested by the STRUCTURE analysis based on all SSR markers. Then, 50,000 simulations were carried out, using 10 groups of 100 demes as running conditions as recommended by Excoffier et al. (2009). An FDR of 0.1 was applied using the Benjamini and Hochberg (1995) method implemented in the R function “p.adjust” of the stats package (https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/p.adjust).
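The Benjamini and Hochberg (1995) adjustment can be sketched in a few lines. The function below is a hypothetical stand-alone illustration intended to give the same adjusted P values as the step-up procedure; it is not the R code used in the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    # Visit P values from largest to smallest
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = m - steps_from_top  # rank 1 = smallest P value
        # Scale by m / rank and enforce monotonicity with a running minimum
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(pvalues, fdr=0.1):
    """Flags for tests whose adjusted P value does not exceed the FDR level."""
    return [p_adj <= fdr for p_adj in bh_adjust(pvalues)]
```

For the illustrative P values 0.01, 0.04, 0.03, and 0.5, the three smallest remain significant at an FDR of 0.1, whereas the last does not.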

The third FST outlier detection method is implemented in the BayeScan 2.1 software (Foll and Gaggiotti 2008). It assumes that populations diverged from an ancestral gene pool, and their allele frequencies show different degrees of differentiation from it. Running conditions used in BayeScan were as follows: a burn-in period with 50,000 iterations, a thinning interval of 10, a sample size of 5000, and 20 pilot runs with 5000 iterations each, for a total of 100,000 iterations. A locus was considered an outlier if its q value was below an FDR threshold of 0.05 or 0.1. The BayeScan analysis was done for the entire set of populations and also for each region separately.

For EAA, environmental PCs as well as individual environmental variables were used for the detection of associations with SNPs. Two different methods were used—one implemented in the Bayenv2 software (Coop et al. 2010; Günther and Coop 2013; https://gcbias.org/bayenv) and another in the Samβada software (Stucki et al. 2017; https://lasig.epfl.ch/sambada). Bayenv2 uses a Bayesian method that first estimates the covariance in allele frequencies among populations from a set of neutral markers, and then uses this information as a null model to test for associations between allele frequencies at selected SNPs and environmental variables (Coop et al. 2010; Günther and Coop 2013). In this study, the putatively neutral SSR markers were used to estimate the covariance among populations. The covariance matrix was obtained with the JMP® software, version 13.1.0 (SAS Institute Inc., Cary, NC, 1989–2007). The matrix was then used as a null model for testing for associations between the SNPs and the environmental variables using Bayenv2. Bayes factors (BFs) were calculated for each SNP against each environmental variable using 100,000 iterations. As there can be variation across Bayenv2 runs, five independent runs were performed, and the BFs were averaged across runs, as recommended by Blair et al. (2014). SNPs with a BF > 3 were considered significantly associated with the corresponding environmental variable. The Jeffreys’ scale was used to determine the strength of the associations with the following cutoffs: 3 ≤ BF < 10 = substantial; 10 ≤ BF < 30 = strong; and 30 ≤ BF < 100 = very strong (Jeffreys 1961).
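The averaging of Bayes factors across runs and their classification can be sketched as follows. The function names are hypothetical, the arithmetic mean is assumed for averaging, and the "decisive" label for BF ≥ 100 is taken from Jeffreys' scale rather than from the cutoffs listed above.

```python
from statistics import mean


def jeffreys_category(bf):
    """Classify a Bayes factor using the cutoffs given in the text."""
    if bf < 3:
        return "not significant"
    if bf < 10:
        return "substantial"
    if bf < 30:
        return "strong"
    if bf < 100:
        return "very strong"
    # Assumption: category for BF >= 100 follows Jeffreys' scale
    return "decisive"


def summarize_bayes_factors(bf_runs):
    """Average BFs from independent runs and classify the mean."""
    avg = mean(bf_runs)
    return avg, jeffreys_category(avg)
```

For instance, five hypothetical runs with BFs of 2, 4, 6, 8, and 10 average to 6, which would be classified as a substantial association even though one run alone fell below the threshold.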

The Samβada method tests for associations between genotypes and environmental variables using logistic regressions and allows for the inclusion of population structure (Stucki et al. 2017). SNPs were coded as presence/absence of a given genotype in each individual. Given the results obtained with the STRUCTURE 2.3.4 software (Pritchard et al. 2000) for all SSR markers, a multivariate analysis was run in the saplings including population structure as the coefficients of membership (Q) for each individual; the G scores to assess significance were calculated according to the Samβada manual. For the adults, a univariate analysis (without including population structure, see “Results”) was run. The G scores obtained in both multivariate and univariate analyses were used to compute the corresponding P values using a χ2 distribution with 1 degree of freedom. Correction for multiple testing was done by adjustment of P values for an FDR = 0.1 using the Benjamini and Hochberg (1995) method implemented in the R function “p.adjust” (https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/p.adjust). A SNP was considered to be a candidate under selection if at least one of its three genotypes showed significant association with an environmental PC or environmental variable (Stucki et al. 2017). Graphical representation of logistic regression fits was done with the software JMP®, version 13.1.0 (SAS Institute Inc., Cary, NC, 1989–2007).
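The conversion of a G score into a P value can be illustrated with the standard library alone, because the upper tail of a χ2 distribution with 1 degree of freedom equals erfc(√(G/2)). The function names are hypothetical, and the log-likelihoods are assumed to come from fitted logistic models with and without the environmental variable.

```python
import math


def g_score(loglik_model, loglik_null):
    """Likelihood-ratio statistic G = 2 * (lnL_model - lnL_null)."""
    return 2.0 * (loglik_model - loglik_null)


def p_value_chi2_1df(g):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    if g <= 0:
        return 1.0
    # For 1 df, P(X > g) = erfc(sqrt(g / 2))
    return math.erfc(math.sqrt(g / 2.0))
```

A G score of about 3.84 corresponds to P = 0.05; the resulting P values would then be adjusted for multiple testing before interpretation.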

Results of the five different methods (LOSITAN, Arlequin, BayeScan, Bayenv2, and Samβada) were compared across methods, markers, and generations (adult trees and their saplings).

Results

Relationships between environmental variables

Latitude was strongly positively correlated with minimum temperatures, precipitation variables, and the moisture index Im and moderately negatively correlated with maximum temperatures, SD, HD, and EQ based on Spearman’s rank correlation coefficients (Supplementary material 2—Fig. S1). Longitude had either no correlation or weak positive correlations with most of the variables, most of which were not significant. Maximum temperatures were strongly and positively correlated with SD and HD, while negatively correlated with minimum temperatures and precipitation variables. The Thornthwaite’s moisture index Im was strongly negatively correlated with maximum temperatures, SD, and HD and strongly positively correlated with precipitation. In contrast, the EQ index was positively correlated with maximum temperatures and SD and HD and negatively correlated with minimum temperatures and precipitation. ASR had either weak or no correlation with all the environmental variables (Supplementary material 2—Fig. S1).

PCA showed that the first three PCs had eigenvalues greater than 1 and captured most of the overall variance of the environmental variables for both reference periods: 95.54% for 1961–1990 and 95.99% for 1981–2010 (Table 3). To interpret each environmental PC, environmental variables with absolute correlation coefficients greater than 0.8 with a given environmental PC were considered (Supplementary material 1—Table S3). Thus, for both reference periods, the environmental PC1 was strongly and positively correlated with latitude, minimum temperatures, precipitation variables, and the moisture index Im, and strongly negatively correlated with maximum temperatures, SD, HD, and the EQ index (Table 3; Supplementary material 1—Table S3). This indicates that positive values of PC1 represent more humid/colder environments, while negative values indicate drier/warmer environments. The environmental PC2 was strongly correlated only with mean annual temperature, and the environmental PC3 was strongly and positively correlated only with solar radiation (Table 3; Supplementary material 1—Table S3).

Table 3 Eigenvalue and variance explained (VE) for the first three environmental principal components (PCs) for the reference periods 1961–1990 and 1981–2010, and environmental variables that contributed mostly to these PCs

Tentative neutral genetic variation (SSRs)

For the 13 SSR markers, 4–19 alleles were detected in the saplings and 3–17 alleles were detected in the adults. No significant deviations from HWE were found (Supplementary material 1—Table S4). No loci showed evidence of null alleles. In general, EST-SSRs demonstrated lower genetic diversity than genomic SSRs (Supplementary material 1—Table S4). Analysis of genetic diversity revealed no significant differences between saplings and adults: A = 6.36 vs. 6.37 (P = 0.9) and He = 0.649 vs. 0.645 (P = 0.6) (Table 4). Likewise, there were no significant differences between the two regions, neither in the saplings (A = 6.49 vs. 6.23, P = 0.3; He = 0.656 vs. 0.651, P = 0.1) nor in the adults (A = 6.59 vs. 6.14, P = 0.1; He = 0.651 vs. 0.650, P = 0.8) (Table 4). No significant deviations from HWE were found, except for the adult trees in the Saxon population. Significant LD was observed for 15 of the 78 possible pairs (19.2%) of the 13 SSR loci in the populations of saplings (Supplementary material 2—Fig. S2), but only for the sfc0018-FIR065 pair (1.3%) in the populations of adults. This pair was also in LD in the saplings.

Table 4 Mean diversity parameters ± their standard errors based on 13 SSR loci for sapling and adult populations of European beech in the Rhine and Rhone valleys

Genetic differentiation among populations was low but significant for saplings (FST = 0.017, P < 0.001; G″ST = 0.029, P < 0.001) and adults (FST = 0.027, P < 0.001; G″ST = 0.027, P < 0.001). Likewise, the STRUCTURE analysis with and without the LOCPRIOR option based on all SSRs together, genomic SSRs, and EST-SSRs (Fig. 1a, b and Supplementary material 2—Figs. S3–S13) revealed a weak genetic structure among populations. Two clusters (K = 2) were possibly present in the saplings, with Chamoson, likely the most genetically distinct population, forming one cluster and all other populations forming the other (Fig. 1a and Supplementary material 2—Figs. S3–S6). PCA on allele frequency data revealed that Martigny is also possibly genetically different (Supplementary material 2—Fig. S14), which is further supported by its high pairwise FST and G″ST values (Supplementary material 2—Fig. S15). In the adults, K = 1 was the most likely number of genetic groups (Fig. 1b). PCA on allele frequencies confirmed these results, suggesting no strong clustering among adult populations (Supplementary material 2—Fig. S14). Nevertheless, Chamoson showed high pairwise FST and G″ST values in the adults (Supplementary material 2—Fig. S15).

Fig. 1 STRUCTURE analysis based on the 13 SSR markers (a, b) and the 70 SNPs (c, d) for K = 2. Colors in the bar plots for each individual indicate its admixture based on its assignment probability to one of two different clusters (K) in saplings (a, c) and adults (b, d). Population name abbreviations: Fel, Felsberg; Chu, Chur; Mal, Malans; Mas, Mastrils; Sar, Sargans; Mel, Mels; Ard, Ardon; Cha, Chamoson; Sax, Saxon; Mar, Martigny; Col, Collombey; Oll, Ollon

Tentative adaptive genetic variation (SNPs)

Among the 76 SNPs genotyped, six were monomorphic (APX1_2, PhyB, 50_320, 52_1_249, 92_166, 110_1_111). Based on the remaining 70 SNPs, observed and expected heterozygosities were similar to each other and between saplings and adults: Ho = 0.301 vs. 0.311 and He = 0.309 vs. 0.310 for saplings vs. adults, respectively (Table 5 and Supplementary material 1—Table S5). Overall, FIS was close to 0, and no significant deviations from HWE were found, except for the Mastrils, Sargans, and Ollon populations in the saplings and the Mastrils population in the adults (Table 5).

Table 5 Mean diversity parameters ± their standard errors based on 70 SNPs for saplings and adult populations of European beech in the Rhine and Rhone valleys

In both saplings and adults, about half of the SNP pairs in significant LD were located in the same gene. In the saplings, significant LD was observed for 134 of the 2415 possible SNP pairs (5.5%), and 68 of them were between SNPs in the same gene (Supplementary material 2—Fig. S16). Similarly, for populations of adults, 107 pairs (4.4%) of all the possible pairs showed significant LD, and 59 of them were between SNPs in the same gene (Supplementary material 2—Fig. S17).

Genetic differentiation was low but significant for populations of both saplings (FST = 0.020, P < 0.001; G″ST = 0.020, P < 0.001) and adults (FST = 0.028, P < 0.001; G″ST = 0.016, P < 0.001). Likewise, analysis of population structure using the STRUCTURE program without prior population information and with the LOCPRIOR option based on all SNPs, as well as on potentially adaptive SNPs and potentially neutral SNPs separately, revealed a weak population structure in both saplings and adults (Fig. 1c, d and Supplementary material 2—Figs. S19–S28). However, STRUCTURE results based on SNPs should be interpreted with caution because some SNPs may violate the assumption of selective neutrality and be under different forms of selection.

Signatures of natural selection at the SNP markers

In the saplings, no outlier SNPs were identified by LOSITAN when performing the analysis with all populations together and with the populations from the Rhine Valley alone. However, the analysis that included populations from the Rhone Valley detected the SNP ALDH_4 as an outlier, possibly being under balancing selection (Table 6 and Supplementary material 2—Fig. S29).

Table 6 FST outlier SNPs detected in saplings and adults

Arlequin identified the SNPs ERD, CysPro_202, and NAC_962 as outliers that are likely under positive selection (Table 6 and Supplementary material 2—Fig. S29). No significant outlier SNPs were identified by BayeScan.

More outlier SNPs were identified in the adults than in the saplings. In the LOSITAN analysis for adults, 15 SNPs fell outside the 95% confidence interval when analyzing all populations and populations from each valley separately (Table 6 and Supplementary material 2—Fig. S30). In the Arlequin analysis, five SNPs fell outside the 95% interval (Table 6 and Supplementary material 2—Fig. S30), but no significant outliers were detected by BayeScan in the adults. Thus, among the detected outliers, three SNPs (4.3%; CysPro_202, NAC_962, and 92_352) are very likely true outliers under selection, because they were detected by both LOSITAN and Arlequin in the adults (Table 6). The SNPs CysPro_202 and NAC_962 were also detected by Arlequin in the saplings.

EAA carried out with Bayenv2 and Samβada identified additional SNPs showing significant association with the environmental variables and PCs, indicating that they are potentially under selection. Bayenv2 detected 14 (20%) and 5 (7.1%) SNPs in the saplings and adults, respectively (Table 7). Details of the BFs can be found in Supplementary material 1—Tables S6 and S7. Samβada identified 44 (62.9%) and 16 (22.9%) SNPs in the saplings and adults, respectively (Table 7). Details of the genotypes per SNP showing significant associations with the environmental variables as detected by Samβada can be found in Supplementary material 1—Tables S8 and S9. In the saplings, 13 (18.6%) SNPs were identified by both methods, while 4 (5.7%) SNPs were identified by both methods in the adults (Table 7). SNPs detected by both methods showed differences in allele and genotype frequencies along the environmental gradient. For instance, in the APX4_2 SNP, the frequency of the allele C and the homozygote genotype CC increased with positive values of PC1, i.e., in populations with humid/colder environments (Figs. 2a and 3a). Similarly, in the 17_1081 SNP, the allele T and the homozygote genotype TT decreased in frequency with increasing annual precipitation (AP) and decreasing EQ, i.e., in more humid environments (Figs. 2b, c and 3b, c). On the other hand, in the 50_232 SNP, the frequency of allele A and genotype AA decreased with increasing growing season (from May to September) precipitation (GSP) (Figs. 2d and 3d).
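The kind of genotype-environment association described above can be sketched as a univariate logistic regression of genotype presence on one environmental variable, tested against an intercept-only model. This is a generic illustration on synthetic data, not the implementation used by Samβada or Bayenv2; the function names, the simulated variable, and the effect size are all assumptions.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                       # weights from the binomial variance
        grad = X.T @ (y - p)                    # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])           # negative Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def log_likelihood(x, y, beta):
    eta = beta[0] + beta[1] * x
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

# Synthetic data: presence (1) or absence (0) of a genotype per individual,
# with probability increasing along a standardized environmental variable
rng = np.random.default_rng(0)
env = rng.uniform(-2, 2, size=400)
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * env)))
has_genotype = (rng.uniform(size=400) < true_p).astype(float)

beta = fit_logistic(env, has_genotype)

# Likelihood-ratio (G) statistic against the intercept-only model
p0 = has_genotype.mean()
ll_null = float(np.sum(has_genotype * np.log(p0) + (1 - has_genotype) * np.log(1 - p0)))
g_stat = 2 * (log_likelihood(env, has_genotype, beta) - ll_null)
```

With one added parameter, the G statistic is compared with a chi-square distribution with one degree of freedom (critical value 3.84 at the 5% level, before any correction for multiple testing).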

Table 7 SNPs and environmental variables for which significant associations were found using Bayenv2 and Samβada
Fig. 2 a–d Examples of SNP allele frequencies calculated for each population and plotted against PC1, annual precipitation (AP), Ellenberg’s climatic quotient (EQ), and growing season (from May to September) precipitation (GSP) for three SNPs that were identified as being very likely under selection by EAA. Black and open circles denote Rhine and Rhone populations, respectively

Fig. 3 a–d Examples of logistic regression fits of genotype frequencies along the environmental variables PC1, annual precipitation (AP), Ellenberg’s climatic quotient (EQ), and growing season (from May to September) precipitation (GSP) for three SNPs identified as being very likely under selection by EAA

Comparing the results from the five different methods used to detect candidate SNPs under selection (LOSITAN, Arlequin, BayeScan, Bayenv2, and Samβada), it was found that 14 (20%) SNPs in the saplings and 9 (12.8%) SNPs in the adults were detected by at least two methods, and thus, they were considered as very likely true candidates under selection (Table 8). These SNPs are located in 12 (50%) and 7 (29.2%) candidate genes in saplings and adults, respectively.
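The consensus criterion used here (detection by at least two of the five methods) amounts to a simple count across method-specific SNP sets. The sketch below is illustrative only: the three SNPs shared by LOSITAN and Arlequin in the adults are taken from the text, whereas `snp_x`, `snp_y`, and `snp_z` are hypothetical placeholders and the sets do not reproduce the complete results in Tables 6 to 8.

```python
from collections import Counter

def consensus_snps(detections, min_methods=2):
    """Return the sorted SNPs reported by at least `min_methods` methods."""
    counts = Counter(snp for snps in detections.values() for snp in set(snps))
    return sorted(snp for snp, k in counts.items() if k >= min_methods)

# Illustrative detection sets per method (placeholders marked snp_*)
detections = {
    "LOSITAN": {"CysPro_202", "NAC_962", "92_352", "snp_x"},
    "Arlequin": {"CysPro_202", "NAC_962", "92_352"},
    "BayeScan": set(),
    "Bayenv2": {"snp_y"},
    "Samβada": {"snp_y", "snp_z"},
}

candidates = consensus_snps(detections)
pct_of_polymorphic = 100 * len(candidates) / 70   # 70 polymorphic SNPs in the study
```

Raising `min_methods` makes the criterion stricter, which trades fewer false positives for a higher risk of missing true candidates.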

Table 8 Summary of the SNPs considered to be very likely under selection based on the results generated by five methods used in this study to search for outlier markers (LOSITAN, Arlequin, and BayeScan) and SNP-environmental variable associations (Bayenv2 and Samβada) in European beech populations

Discussion

Putative neutral genetic variation (SSRs)

Among the existing molecular markers, SSRs are the most commonly used for the study of neutral genetic variation, i.e., variation that is selectively neutral (Holderegger et al. 2006; Kirk and Freeland 2011). In this study, high genetic variation at SSR markers was found in all the studied populations of F. sylvatica (Table 4), similar to the values reported by other studies based on similar sets of SSR loci (Seifert 2012; Müller 2013; Bontemps et al. 2013; Rajendra et al. 2014). Among the SSRs, EST-SSRs showed lower variation (Supplementary material 1—Table S4), which can be attributed to their linkage with coding regions, making them more conserved and, thus, less polymorphic (Varshney et al. 2005; Ellis and Burke 2007). No significant differences in genetic variation between saplings and adults were found, suggesting that the saplings represent the genetic variation of the adult populations.

Of the SSR loci used in this study, seven were transferred from F. crenata and Q. robur. They did not show evidence of null alleles, which is supported by fixation indices (FIS) close to 0 (Supplementary material 1—Table S4). These results confirm the observations from other studies indicating that the transferability of SSR loci among species of the genus Fagus is relatively high (Pastorelli et al. 2003; Lefèvre et al. 2012) and that the transfer of EST-SSRs can be successful even between species from different genera of the same family (Ellis and Burke 2007).

LD between SSR loci was found for 19.2% of all the possible pairs in the saplings. In contrast, 1.3% of all the possible pair combinations were in LD in the adults, which is similar to the low percentage found in a previous study (Lefèvre et al. 2012). The higher percentage of SSR loci in LD in the saplings could be an effect of relatedness, since groups of two to four saplings were collected underneath the same adult tree. In fact, those saplings had higher pairwise relatedness coefficients than saplings collected under different trees (data not shown). Furthermore, since there are no genetic linkage data for the studied loci, it is not possible to determine whether the observed LD is due to close linkage.

The low FST and G″ST values and the STRUCTURE analysis demonstrated that population differentiation was very weak in the studied populations of European beech (Fig. 1). These findings are in agreement with other studies in European beech that also report low genetic differentiation in Germany (Sander et al. 2000; Rajendra et al. 2014; Müller et al. 2015b), Italy (Paffetti et al. 2012), France (Csilléry et al. 2014), Switzerland (Pluess and Weber 2012; Pluess et al. 2016), and other parts of Europe (Buiteveld et al. 2007). High gene flow may explain the low differentiation even in populations from different valleys, since F. sylvatica is an outcrossing wind-pollinated tree species with potentially long-distance pollen flow (Belmonte et al. 2008; Oddou-Muratorio et al. 2011; Piotti et al. 2012). However, despite the low genetic differentiation in general, STRUCTURE analysis with SSRs identified Chamoson as a genetically distinct population (Fig. 1a); additionally, Chamoson had the highest pairwise population differentiation in the adults. The PCA suggested that Martigny could also be genetically different. Hypothetically, this distinctiveness could be due to seed transfer or planting of genetically different beech material, but we see no reason why beech would have been planted in these areas. We have no information about the long-term history of these two stands; intense forest management is nevertheless very unlikely, given the steepness of the forest area in Chamoson and Martigny and the poor soil conditions (very shallow rocky soil) in Martigny. Still, some past forest management cannot be completely ruled out as one explanation for the distinctiveness of these two populations.

Tentative adaptive genetic variation (SNPs)

Due to their location in coding regions, SNPs in candidate genes are potentially under natural selection and, thus, may not behave as one would expect for neutral markers. They might represent adaptive genetic variation, which is essential for a species to cope with environmental changes. Therefore, the investigation of SNPs under natural selection provides insight into the genetic adaptive potential of a species. In this study, high genetic variation in SNP markers was found in the studied populations of European beech (Table 5). Similar levels of heterozygosity have been found in other studies using SNP markers (Seifert et al. 2012; Müller et al. 2015a), indicating a great adaptive potential for this species. No significant deviations from HWE were found for nearly all the SNPs (Supplementary material 1—Table S5), indicating that their variation is in mutation–selection balance, which can be expected for an open-pollinated, highly outcrossing species with a large effective population size, such as European beech (Jump et al. 2006; Aitken et al. 2008).

Furthermore, high levels of gene flow are expected in European beech (Belmonte et al. 2008; Oddou-Muratorio et al. 2011; Piotti et al. 2012). Gene flow leads to the homogenization of allele frequencies in both neutral and adaptive loci and, consequently, to a decrease in population differentiation (Holderegger et al. 2006). In fact, we found a weak population structure not only for SSR markers and potentially neutral SNPs, but also for the SNPs that are very likely under selection (Supplementary material 2—Figs. S19–S28). Indeed, although gene flow can counteract the effects of selection, it has been suggested that adaptation in forest trees can be maintained despite high gene flow (Kremer et al. 2012; Tigano and Friesen 2016). Thus, the mechanisms maintaining adaptation in this species remain to be explored (Tigano and Friesen 2016).

In addition, low LD is also expected for a highly outcrossing, wind-pollinated tree species, such as European beech (Jump et al. 2006; Aitken et al. 2008). LD analysis revealed that 5.5 and 4.4% of all the possible SNP pairs were found to be in LD in the saplings and adults, respectively. These values are similar to the percentage (5.01%) reported by Pluess et al. (2016) for populations in Switzerland, but considerably lower than 18.45% reported by Müller et al. (2015b) for populations in Germany.

Signatures of natural selection at the SNP markers

Outlier detection tests can produce false outliers due to confounding factors (Schoville et al. 2012; Vitti et al. 2013). To address this problem, tests with different demographic assumptions can be used and compared, and loci detected in consensus are more likely to be real targets of selection (Li et al. 2012). In this study, the three outlier methods detected different but partly overlapping sets of outlier SNPs (Table 6). Such discrepancies are common and have also been reported in other studies (Russello et al. 2012; Tsumura et al. 2014; Konijnendijk et al. 2015). They are attributed to the different demographic assumptions underlying each outlier method and the different rates of type I and type II errors (Narum and Hess 2011). Interestingly, no SNPs were identified as outliers by BayeScan. Indeed, BayeScan is considered more conservative in identifying outlier SNPs than other methods (Narum and Hess 2011). In total, only three SNPs (4.3%) were detected as outliers under positive selection by at least two methods in the adults: CysPro_202, NAC_962, and 92_352 (Table 6). We consider them likely true outlier SNPs under selection. The first two were also detected as outliers in the saplings. The small proportion of outlier loci detected is in line with other studies carried out in forest trees, such as boreal black spruce (Prunier et al. 2011), Cryptomeria japonica (Tsumura et al. 2014), and Quercus petraea (Alberto et al. 2013). This may be due to the limited sensitivity of outlier methods to subtle changes in allele frequencies, as occurs when selection is weak or when high gene flow counteracts selection (Narum and Hess 2011; Rellstab et al. 2015; Stephan 2016). The power of FST outlier tests also depends largely on sample size and the number of sampled populations; a higher proportion of outliers can be identified with a larger number of individuals and populations (Lotterhos and Whitlock 2015; Ahrens et al. 2018).

Unlike FST outlier tests, EAA are more sensitive to subtle changes in allele frequencies and also more robust to small sample sizes (De Mita et al. 2013; Stephan 2016; Ahrens et al. 2018). This could explain the higher number of SNPs potentially under selection detected by EAA (Table 7). However, EAA approaches can be prone to false positives, especially if hidden population structure is unaccounted for (Rellstab et al. 2015). In this study, by using SSR markers, weak neutral population structure was found in saplings and adults, although there are possibly two clusters in the saplings (Fig. 1a). Thus, the potential confounding effect of neutral genetic structure was accounted for in the analysis with Bayenv2 and Samβada. Bayenv2 detected 20 and 7.1% of the SNPs as significantly associated with environmental variables in saplings and adults, respectively, while Samβada detected 62.9 and 22.9% in saplings and adults, respectively. In general, the two methods detected different sets of SNPs as potential candidates under selection (Table 7), which is expected given the different statistical frameworks of the methods (Coop et al. 2010; Günther and Coop 2013; Lotterhos and Whitlock 2015; Stucki et al. 2017). However, 5.7 and 18.6% of the 70 SNPs were consistently identified by both EAA methods in adults and saplings, respectively (Table 7), and they showed differences in allele and genotype frequencies in contrasting environments, as demonstrated for some example SNPs in Figs. 2 and 3. These percentages are moderately high compared to other studies that used candidate genes (3.6–11.1% SNPs showing significant association, e.g., Alberto et al. 2013; Pluess et al. 2016; Rellstab et al. 2016).

Including correlated environmental variables in EAA may inflate associations (Cushman and Landguth 2010). To avoid this, two general approaches can be used: (1) reduction of the number of variables by removing the highly correlated ones or (2) integration of the highly correlated variables by using PCA (Rellstab et al. 2015). In this study, the PCA approach was used to generate synthetic variables. Nevertheless, to avoid missing important associations, single environmental variables were also used. Although false positives could result from using both individual correlated environmental variables and the synthetic variables derived from them by PCA, it is unlikely that they would severely affect our results, since most of the SNPs showing association with an environmental PC also showed association with the variables contributing to that PC (Tables 3 and 7). Furthermore, only SNPs detected by both Bayenv2 and Samβada were considered as very likely under selection, which helped to decrease the risk of false positives. Notably, in support of this, it has been found that the number of environmental variables did not affect the proportion of SNPs detected as potentially adaptive (Ahrens et al. 2018).
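The PCA approach mentioned above can be sketched as follows: correlated environmental variables are standardized and rotated into uncorrelated principal components, whose scores then serve as synthetic variables. The data below are simulated and the variable names, number of populations, and effect sizes are assumptions made for illustration; they do not reproduce the environmental data or the loadings in Table 3.

```python
import numpy as np

def pca_scores(env):
    """PCA on standardized variables; returns (scores, loadings, explained variance ratio)."""
    z = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)   # z-scores per variable
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # share of total variance per component
    scores = u * s                       # population scores on each component
    return scores, vt, explained

# Simulated data for 20 hypothetical populations along one humidity gradient
rng = np.random.default_rng(1)
gradient = rng.normal(size=20)
temperature = 8.0 - 2.0 * gradient + 0.3 * rng.normal(size=20)         # warmer where drier
precipitation = 1000.0 + 200.0 * gradient + 30.0 * rng.normal(size=20) # wetter where colder
elevation_noise = rng.normal(size=20)                                  # unrelated variable
env = np.column_stack([temperature, precipitation, elevation_noise])

scores, loadings, explained = pca_scores(env)
pc1, pc2 = scores[:, 0], scores[:, 1]
```

Because the component scores are mutually uncorrelated, testing a SNP against PC1 avoids counting the same underlying gradient twice, at the cost of a synthetic variable that is harder to interpret than a single climatic measurement.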

Studies using a low number of SNPs might have a greater proportion of false positives due to an inadequate characterization of neutral genetic structure (Ahrens et al. 2018). Although the number of SNPs used in this study was low, it is unlikely that the proportion of SNPs significantly associated was severely affected by a poor characterization of neutral genetic structure, since such characterization was done with SSR markers, which are considered to represent neutral genetic variation and have high power to estimate population structure due to their high polymorphism (Selkoe and Toonen 2006; Holderegger et al. 2006, but see Fischer et al. 2017).

In total, 20 and 12.8% of the SNPs were detected by at least two of the five methods (LOSITAN, Arlequin, BayeScan, Bayenv2, and Samβada) in saplings and adults, respectively, and were considered as the most likely true candidates under selection in the studied populations. The comparative aspects of this study add particular value to the obtained results, because loci detected by two or more methods and in both generations can very likely be considered true candidates under selection. Some of these SNPs have shown evidence of selection also in other studies of European beech; for example, the CysPro_728 SNP has been associated with bud burst (Müller et al. 2015b), a known adaptive trait, and the 50_232, 52_1_235, and 52_1_368 SNPs have shown evidence of epistatic selection (Csilléry et al. 2014). Although the rest of the SNPs very likely under selection have not been reported as such by other studies on European beech, those studies showed that other SNPs from the same genes could be under selection (Csilléry et al. 2014; Müller et al. 2015b; Pluess et al. 2016; Krajmerová et al. 2017), stressing the importance of the studied candidate genes in the adaptation of European beech to different environmental conditions. Indeed, we found potentially adaptive SNPs in 50% of the candidate genes in the saplings and in 29.2% of the candidate genes in the adults. These genes are involved in a wide range of cellular functions and represent oxidoreductases, hydrolases, oxidases, transferases, transporters, chaperones, and transcription factors, which could be expected since many traits in plants are polygenic, involving complex interactions among several genes (Ingvarsson and Street 2011). Besides, SNPs in these genes have also shown signatures of selection in other plant species. For example, SNPs in the NAC gene have been detected as potentially under selection by outlier analyses in white and black spruce (Namroud et al. 2008; Prunier et al. 2011), and SNPs in the DAG and PP2C genes have been associated with environmental variables such as temperature and water availability in Dodonaea viscosa (Christmas et al. 2016). Thus, further investigation including additional SNPs and other candidate genes might help confirm the results found in this study.

Partly different SNPs showing signatures of selection were detected in saplings and adults (Tables 6, 7, and 8). Different sample sizes between saplings and adults could result in different detection power (Lotterhos and Whitlock 2015). Thus, a possible failure to identify the same potentially adaptive SNPs in adult trees could explain the partly different and larger numbers of SNPs under selection found in the saplings. In addition, it should be taken into account that the environment can impose different selection pressures at different life stages, and as a result, the set of genes controlling the same trait might differ (Petit and Hampe 2006; Prunier et al. 2013). Moreover, due to high competition and mortality, only a small fraction of seeds survive until the adult stage (Petit and Hampe 2006), which means that adult trees have passed through different selection pressures during their life. This could also be reflected in the different sets of SNPs showing signatures of selection in saplings and adults.

Not only nonsynonymous SNPs showed signatures of selection, but also synonymous and noncoding SNPs. Since nonsynonymous SNPs represent amino acid replacements and, thus, a change in protein sequence, they have been usually thought to be the main target of natural selection. However, several studies indicated that synonymous substitutions may affect mRNA splicing, stability, and translation kinetics (Chamary et al. 2006; Komar 2007), affecting also the production of the final protein (Pagani et al. 2005). Similarly, SNPs in noncoding regions may also be involved in the regulation of gene expression (Barrett et al. 2012). Therefore, synonymous and noncoding SNPs can show signatures of selection not only due to a tight linkage with selective loci, but also because they can be under natural selection directly (Morin et al. 2004; Fyon et al. 2015).

Even though the SNPs identified by only one method could be considered false positives, they should not be disregarded for further investigation, especially since some of them have been found to be associated with important climate-related traits and environmental variables in other studies (Müller et al. 2015b; Pluess et al. 2016). Thus, to determine their participation in the adaptation of European beech populations to different environmental conditions, other approaches could be used. For example, haplotype-based analysis can have a substantial advantage over single-SNP analysis for the detection of adaptive genetic variation (Balding 2006; Rajora et al. 2016), as can models incorporating polygenic and epistatic selection (Pritchard and Di Rienzo 2010; Fu and Akey 2013; Csilléry et al. 2014).

Finally, although the use of individual environmental variables and synthetic variables derived from PCA in our EAA allowed us to reduce the risk of missing important SNPs showing associations with the environment, some of the environmental variables were highly correlated, which makes it difficult to determine their relative importance in the selection process (Rellstab et al. 2015). In addition, it is possible that other environmental factors that were not accounted for could also exert selection pressure on the studied populations. In this study, climate data were taken from stations less than 10 km away from the actual populations. However, the Alps have high variation in topography, and climatic factors such as temperature and precipitation can vary over short distances (Baruck et al. 2016). Therefore, small-scale heterogeneity and microclimatic conditions specific to a given population that were not accounted for could explain some of the differences in allele frequencies. Furthermore, although precipitation and temperature are the main climatic factors influencing plant distribution, which is supported by several studies that showed their association with potential adaptive genetic variation in the Alps (Poncet et al. 2010; Manel et al. 2012; Pluess et al. 2016), soil properties also affect plant distribution, because water availability depends on the interaction between climatic variables and soil characteristics (Piedallu et al. 2013). For example, Gärtner et al. (2008) found that lower humidity can be compensated for by a greater available soil water storage capacity (ASWSC) that allows the growth of beech. Furthermore, soil properties not only affect the present distribution of plants in the Alps but also determined the migration pathways during the post-glacial recolonization (Alvarez et al. 2009). Thus, the identification of adaptive genetic variation might be improved by including not only climatic variables but also microclimatic conditions and soil characteristics. However, this task remains challenging, since characteristics of alpine soils vary considerably over short spatial ranges, and soil information is still limited (Baruck et al. 2016).

In this study, a candidate gene approach was used to investigate adaptive genetic variation in European beech. By combining genetic variation in SNPs in candidate genes, outlier detection tests, and environmental association analysis, it was possible to identify loci showing signatures of selection. This opens new perspectives for understanding the genetic basis of adaptation of F. sylvatica to different environmental conditions.