Introduction

Habitat fragmentation may strongly affect plant population genetics by resulting in decreased effective population sizes (Ellstrand and Elam 1993) and reduced gene flow among populations (Schaal and Leverich 1996; Couvet 2002), thereby potentially causing inbreeding effects and the loss of genetic diversity (Keller and Waller 2002). Ultimately, this can put species survival at risk and is therefore considered one of the major threats to biodiversity (Young et al. 1996). Fragmentation has a higher impact on organisms with low dispersal ability and organisms that are obligate outcrossing. Autogamous plants are expected to be little affected by fragmentation, while exclusively cross pollinated plants (self-incompatible dioecious plants) would be highly sensitive to fragmentation effects (Berge et al. 1998; Larson and Barrett 2000; Lennartsson 2002). In this group, wind pollinated plants are less affected than animal pollinated plants (Berge et al. 1998; Weidema et al. 2000).

The Mediterranean basin is a hotspot of biodiversity, despite the millennial influence of human populations (Cowling et al. 1996; Blondel et al. 2010). Since the introduction of agriculture, continuous expanses of natural forests were transformed into a mosaic of cultivated fields and forest patches. With time, many species adapted to this kind of conditions, which allowed them to persist in highly fragmented cultural landscapes (Blondel et al. 2010). After the middle of the twentieth century, with modernization and intensification of agriculture, cultivation of arable fields that were not suitable for high productivity and machinery have been progressively abandoned (Sluiter and de Jong 2007). This resulted in the recolonization of trees and shrubs from the surrounding areas which changed the habitats significantly (Chauchard et al. 2007). A number of studies have evaluated the effect of this development on the biodiversity of those regions (e.g. Sluiter and de Jong 2007; Porto et al. 2011; Santana et al. 2011). Nevertheless, past fragmentation and recolonization may also have an effect on the genetic diversity of tree and shrub populations especially, but to the best of our knowledge, this has never been analyzed (but for comparable studies in other regions see Jacquemyn et al. 2009; Leite et al. 2014).

The Sabor region from Northeast Portugal is a good example of past fragmentation caused by agriculture (Hoelzer 2003). Human population and agriculture strongly expanded in the area from the mid-nineteenth century to the 1950s. Then a process of human population decline and land abandonment started and it is ongoing until today. Until the mid-twentieth century, almost all the area was cultivated, with remnants of native woody vegetation being confined to rock outcrops and the steepest slopes of the river valley. In the second half of the last century, there was a progressive recovery of shrub land and forest vegetation that now covers vast areas across the region. A peculiarity of this region is two hydroelectric dams built between 2009 and 2013 which started flooding the valley in the winter of 2013/2014. As a consequence, the dams will contribute to destroying at least some of the forest patches near the river that may be the origin of the colonization of the abandoned farmlands. This makes our study of high priority because understanding the spatial genetic structures of populations destroyed by the dams will help to understand the impacts of these constructions in the future.

In this study, we investigate the genetic structure of stands of Juniperus oxycedrus L. (prickly juniper, Cupressaceae). We expect that the structure patterns found would reflect the impact of the past fragmentation followed by land abandonment and subsequent expansion of woody species. The species was considered particularly suitable to study this process, because its expansion in the region is occurring through natural processes, whereas afforestation has facilitated the expansion of other dominant trees such as oaks Quercus spp. J. oxycedrus are small trees or big bushes with a distribution from Portugal (West) to Iran (East) and from Morocco (South) to France (North) (Amaral 1986). It is a drought resistant, light demanding species that together with its low soil quality requirements makes it a common pioneer of areas affected by fire and deforestation (Bondi 1990; Cano et al. 2007). These characteristics make it more likely to establish quickly after abandonment of agricultural fields. Sluiter and de Jong (2007) found that when the use of agriculture fields stopped, pioneer vegetation occupies it immediately. J. oxycedrus is a dioecious and wind pollinated plant with low fertility (Ortiz et al. 1998). This means it may not be as affected by fragmentation effects like the other exclusive cross-pollinated plants, though its low reproductive success may make it more susceptible. J. oxycedrus has a low production of viable seeds (Juan et al. 2003), and studies with its close relatives Juniperus macrocarpa and Juniperus communis showed seed viability values of 12 and 3 % (Juan et al. 2003; Verheyen et al. 2005). The seeds are mainly dispersed by small mammals and birds, making J. oxycedrus a long dispersal species (Flynn et al. 2006). In a J. oxycedrus population in Italy, Baldoni et al (2004) found that these trees achieve maturity between 17 to 21 years of age, and the oldest individual was 83 years old.

Former studies on J. oxycedrus had mainly a morphological (e.g. Klimko et al. 2007; Brus et al. 2011), phytosociological (e.g. Cano et al. 2007), and demographic (e.g. Baldoni et al. 2004) focus. The only molecular studies were made at the European scale, focusing on species and subspecies differentiation (e.g. Adams et al. 2005; Boratyński et al. 2014). Population genetics studies have been made mainly on other juniper species like J. communis and J. macrocarpa. Most of them used chloroplast (Provan et al. 2008; Juan et al. 2012) and dominant markers like AFLP and isozymes (Lewandowski et al. 1996; Van Der Merwe et al. 2000; Oostermeijer and De Knegt 2004; Vanden-Broeck et al. 2011). Michalczyk et al. (2006) was able to develop five microsatellite loci for J. communis that were later used by Provan et al. (2008) in J. communis from populations from Ireland, and by Boratyński et al. (2014) for genetic differentiation of several juniper species. Although the use of microsatellites is an improvement in relation to the other markers, using such a low number of markers is not powerful enough to detect structure in small geographical scales.

In this study, we identified a set of microsatellite loci suitable to identify small scale genetic structure in J. oxycedrus. We then investigated genetic structure and diversity patterns at these loci in J. oxycedrus trees in the Sabor region to determine the main source of genetic structure in the area, in order to estimate signs of past and present impacts of the agriculture land use on the vegetation. Specifically, we aim to answer the following questions: 1) Is there a signature of past fragmentation in the current genetic structure of the population? 2) Does genetic structure depend on geographical distance? 3) Does it depends rather on the age structure of the trees and thus reflect former recolonization pathways? The study is part of a Long Term Ecological Research (LTER) project designed to determine the impact of fragmentation caused by agriculture and infrastructure development. Monitoring of J. oxycedrus populations in this region will be made during the lifetime of the hydroelectric dams (about 65 years). This way it will be possible to observe the development of genetic structure after the construction event.

Material and methods

Study area

Field sampling was carried out in 2011 in the valley of the lower reaches of the Sabor River. Climate of the region is Mediterranean with a subcontinental character; in the river valley and slopes, the mean annual temperature is 14–16 °C and annual precipitation is 400–600 mm, whereas in the surrounding plateau, the mean annual temperature is 12–14 °C and the annual precipitation is 500–800 mm (Hoelzer 2003). The precipitation is concentrated in the wet and cold semester (October–March), and it is virtually nil in summer. The bedrock of the study area is dominated by schists, whereas the soil types prevailing are leptosols and anthrosols (Hoelzer 2003). Human population is low and concentrated in a few scattered villages, and the declining agriculture is dominated by olive and almond groves, and extensive livestock grazing (mainly sheep and goats). Natural vegetation is recovering after the peak of agricultural expansion in the 1950s, and includes forest patches dominated by cork oak Quercus suber or holm oak Quercus rontundifolia, sometimes in combination with prickly juniper, tall scrubs often dominated by prickly juniper, and shrublands with species of, for instance, Cistus, Genista, and Cytisus (Costa et al. 1998; Hoelzer 2003).

Sampling strategy and DNA isolation

We collected leaves from 126 individuals corresponding to five populations defined a priori according to the geographic clustering of the individuals (Online Resource 1; Fig. 1). This corresponds to a subset of the existing J. oxycedrus in the region, though it is not possible to estimate what portion of the total population was sampled because no census study is currently available. This study is part of a broader project aiming to study the influence of the construction of the two dams on the genetic diversity of J. oxycedrus in the region. Because of that, the sampling was made in a way to cover localities alongside the river on both sides of the dam construction sites (populations II to V). One population (I) was sampled to represent regions away from the river. We measured tree trunk perimeter for 55 individuals (Table 1), which was used as a proxy to determine the relation between tree age and genetic structure. Because of high vegetation density and other accessibility problems in some regions, the trunk perimeter could not be measured for all individuals sampled. Some of the unmeasured trees were estimated visually to have trunk perimeters >100 cm, and were thus classified as “>100 cm” and included in all the analyses using trunk perimeter (Table 1; Online Resource 1). A sample of J. oxycedrus from the UTAD (Universidade de Trás-os-Montes e Alto Douro) botanical garden was collected for marker discovery (Reference: D7D8J7).

Fig. 1
figure 1

Map of the study area, showing sampling localities and the area to be flooded by the Sabor dam. The background gray pallet corresponds to the altitudinal gradient. The rectangles correspond to the populations defined a priori

Table 1 Number of individuals sampled for genetic analysis per a priori population. For each population, the table also shows the number of individuals measured for trunk perimeter, minimum and maximum measures, the average and standard deviation of the trunk perimeter, and the number of unmeasured individuals with trunk perimeter visually estimated to be >100 cm

For DNA isolation, leaves were stored and dried in silica gel. Twenty milligrams of leaf tissue was ground by stainless steel beads in 2 mL tubes using a Retsch Mill (MM400), at maximum force (30 Hz). The remaining procedure was performed using the protocol described by Alexander et al. (2006) using a CTAB-based Lysis buffer and a 96 well format like described earlier (Curto et al. 2013).

Marker discovery, screening, and data production

Microsatellite discovery was performed by pyrosequencing of a genomic library enriched for microsatellite motifs in a Roche 454 GS-FLX platform, subsequent marker screening, and primer design as a service by Genoscreen (Lille Cedex, France). To do so, the company used 1 μg of DNA for library construction. This library was subsequently enriched for the following motifs: TG, TC, AAC, AAG, AGG, ACG, ACAT, and ACTC. The result from this step was sequenced using only 1/12th of the 454 platform capacity. The resulting sequences were filtered so that they contained microsatellite motifs that were suitable for primer design. Forward primers were tagged with an extra oligonucleotide sequence on their 5′ end according to the M13-tailed primer method (Oetting et al. 1995) as described earlier (Curto et al. 2013). Those corresponded to four universal primer tails with complementary sequence to a third primer that had a sequence specific florescent dye: 6-FAM (TGTAAAACGACGGCCAGT), VIC (TAATACGACTCACTATAGGG), NED (TTTCCCAGTCACGACGTTG), and PET (GATAACAATTTCACACAGG). A GTTT tail was added to the 5′ end of the reverse primer to minimize the polymerase stuttering effect. In this three primer assay, the forward and reverse primers were used to amplify the region of interest and the tail primer to tag the resulting fragment with a fluorescent chromatophore. The different universal primers allowed multiplexing in the PCR.

All primers were first tested in a simplex on their ability to amplify the sample used for marker discovery. With this step, primers that had mismatches presumably due to sequencing errors were sorted out. Amplification was done using QIAGEN Multiplex PCR Master Mix (Qiagen, Valencia, CA, U.S.A.) in 10 μL reactions containing 5 μL of QIAGEN Multiplex PCR Master Mix, 3.3 μL of water, 0.5 μL of DNA, 0.4 μL of each primer solution with the following concentrations: 10 mM for the reverse and the universal florescent primer, and 1 mM for the forward primer. The temperature profile is described below. All primers that showed positive results were multiplexed in four combinations of 50 μL solutions containing 4 nmol of each forward primer, and 40 nmol of each reverse and florescent universal primers. Each multiplex contained a maximum of four primer pairs corresponding to the four dyes used. Those mixtures were then used to amplify all 126 samples using the PCR protocol described above with 1 μL of primer mix and adjusted amount of water.

All PCR reactions were executed according to the following temperature profile: initial denaturation/activation step of 15 min; 95 °C for 30 s; touchdown starting at 62 °C to 56 °C, decreasing 0.5 °C per cycle for 60 s; extension at 72 °C for 30 s; followed by 20 cycles at 54 °C and eight cycles at 53 °C. Amplification success was evaluated by electrophoreses on a 2 % agarose gel. Nevertheless, all amplification results were also genotyped with an internal size standard (Genescan-500 LIZ; Applied Biosystems, Inc., Foster City, CA, U.S.A.) in an ABI3130xl automatic sequencer (Applied Biosystems, Inc.). Alleles were called using GeneMapper ver. 4.0 (Applied Biosystems, Inc.).

Variability and genetic structure detection

The markers with positive amplification for most of the samples were screened for variability and information content. Variability was assessed by the number of alleles per marker and information content by calculating average polymorphism information content (PIC) and expected heterozygosity (He) per marker using the program Cervus ver. 3.0.3 (Kalinowski et al. 2007). Monomorphic loci were not included in further analyses. Deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) estimations were tested using the software Genepop v 4.2 (Rousset 2008) and FSTAT (Goudet 1995), respectively. Frequency of null alleles was estimated using the program FreeNA (Chapuis and Estoup 2007).

Genetic structure patterns were assessed with and without location coordinates of the individuals as prior information using the software Geneland v.4.0.4 (Guillot et al. 2005) and STRUCTURE v. 2.3.4 (Hubisz et al. 2009), respectively. Since the populations studied are in a small area and gene flow among them is expected, STRUCTURE analysis was performed using the admixture model and correlated allele frequencies among populations. To find the best value of K, the program was run using K values of 1 to 20 (ten replicates each) for 200,000 replications, and eliminating the first 100,000. The best value of K was defined by Evanno et al. (2005) method as implemented on Structure Harvester v0.6.93 (Earl and vonHoldt 2012). A STRUCTURE analysis with the best K was performed with the same parameters but letting the program run for 1,000,000 replications excluding the first 500,000. All the analyses made with STRUCTURE were performed with and without using populations as a prior. Initially, Geneland was run for 100,000 replications for a maximum number of populations ranging from 2 to 20. The results were recorded every 100th iteration and the first 50,000 replications were excluded. The optimal number of populations was defined by choosing the higher number of clusters where most of the individuals showed an assignment larger than 0.5 to one of the clusters. As in STRUCTURE analyses, allele frequencies were considered to be correlated. Tests with and without the spatial and null allele model were performed as described above with the best maximum number of populations found. The model combination that resulted in the least ambiguous cluster assignment was used in an additional run, where 1,000,000 iterations were performed and results were recorded every 100 replicate. As in the previous tests, the first half of the results were excluded.

Genetic structure was estimated for populations circumscribed as explained above and two age classes. Trunk perimeter was used to separate “younger” from “older” individuals. Studies from J. oxycedrus in Italy indicate that a trunk perimeter of 60 cm corresponds to trees about 70 years old (Baldoni et al 2004). This value was used as threshold to divide the samples in a group of individuals that were likely present before the peak of land use abandonment (about 1940-50) and a group that established later. This threshold also resulted in highest genetic differentiation as estimated from AMOVA when testing groupings with thresholds set at 10 cm intervals from perimeters 30 to 100 cm.

An AMOVA analysis was also used to evaluate which grouping explained the highest amount of difference among groups. AMOVA was done using the software Genalex v. 6.5 (Peakall and Smouse 2012) with the following grouping: populations, trunk perimeter, and STRUCTURE and Geneland clusters. Groups according to STRUCTURE and Geneland, corresponded to the three clusters indicated by these analyses, respectively. Hereby, samples with assignment probability higher than 0.5 for a certain cluster were considered. The AMOVA analyses was performed using F ST instead of R ST because it has been shown that this measure is more efficient in detecting genetic structure in population with high degree of admixture (Balloux and Goudet 2002), which is expected due to the relatively small distance among individuals.

Genetic diversity variation among populations and tree size categories was also determined using the software FSTAT by calculating the average expected heterozygosity (He) and allelic richness (Ar) per grouping. As an additional measure of differentiation, populations’ pairwise F ST was calculated using the program FreeNA. This software was used because it can incorporate a null allele correction. The software Colony (Wang 2004) was used to find possible family relationships in the dataset containing trunk perimeter information. Trees with perimeter above 60 cm were assumed as potential parents and trees with perimeter below 60 cm as potential offspring. The mating system was considered to be polygamous and because we are studying a confound area that was previously fragmented, we allowed inbreeding to be possible. A medium length, full likelihood run was performed with medium precision allowing allele frequencies to be updated. Because there is no knowledge of family structure, no siblingship prior was used. Since information about the gender of the individuals was not collected, the potential parents were considered to be both potential fathers and mothers.

Results from BOTTLENECK v. 1.2.02 (Cornuet and Luikart 1997) were used to evaluate the possibility that demographic expansions or declines or founding effect led to the current structure. This was done assuming the Stepwise Mutation Model (SMM) since all the alleles were separated according to their repeat motif. Significant deviations from the mutation-drift equilibrium were calculated using Wilcoxon signed-rank test because it is the most reliable when low number of loci are used (Piry et al. 1999). As a complementary measure to this analysis, HWE deviations for all populations and trunk perimeter categories were estimated using the program Genepop v 4.2.

Results

Pyrosequencing of the enriched library resulted in 32,883 sequences from which 6293 contained microsatellite motifs. From all the potential primer pairs analyzed, 241 passed Genoscreen bioinformatics validation. From those, 42 primer pairs were constructed and tested for amplification success, resulting in 18 markers that could be used for genotyping all samples (Online resource 2).

All primers amplified most of the samples and only two of them (Joxy28 and Joxy35) had an amplification failure larger than 10 %. Joxy11 was the only monomorphic marker and for that reason was the only marker excluded from further analyses. The remaining markers had between two and eleven alleles (Table 2). The marker with lowest PIC and He values (Joxy13; PIC = 0.08; He = 0.09) was not the least polymorphic with three alleles. The most polymorphic and informative marker (Joxy8) had a PIC and He values of 0.83 and 0.85, respectively. According to the LD deviation test using a Bonferroni correction, none of the markers were linked. Six markers deviated significantly from Hardy-Weinberg equilibrium due to heterozygotes deficiency (F IS of these markers ranged from 0.27 to 0.89). In addition, these six markers were the only ones presenting a frequency of null alleles greater than 0.1. To test the utility of those markers, the analyses evaluating population genetic diversity and structure were performed with two datasets. The first one was composed by all markers that were variable (17 marker dataset) and the second one was composed by all markers that did not deviated from HWE (11 marker dataset).

Table 2 Markers variability, amplification success and Hardy-Weinberg deviation statistics.

STRUCTURE analysis indicated K = 3 as optimal according to Evanno’s method. The analyses with and without a priori knowledge of populations had similar results. A priori knowledge of population assignment improved the AMOVA, but the structure found did not correspond to any geographical pattern, with individuals from the same populations being assigned to different clusters (results not shown). The analysis without the markers deviating from HWE showed the same assignment probability to all the clusters for all individuals.

In Geneland analysis, the optimal number of populations was also three. However, when the non-spatial and null allele models were used, all the samples were assigned to the same cluster and so these models were not used in further analyses. No structure among individuals was found for the analyses using 11 markers, so only results with the 17 markers are shown. For the analysis using the spatial model, all the populations showed assignment to more than one cluster (Fig. 2a). Cluster 1 (white in Fig. 2a) was more frequently assigned in the south and cluster 3 (black) in the north, resulting in a slight signature of geographic structure. Nevertheless, all populations contained individuals assigned to at least two clusters. When Geneland’s cluster assignment probability was plotted according to trunk perimeter as proxy for tree age class, a similar result was obtained (Fig. 2b). Although all age classes show some degree of assignment to all clusters, trees larger than 60 cm showed more assignment to the northern cluster while smaller trees to the southern cluster.

Fig. 2
figure 2

Plots from the cluster assignment probability obtained in Geneland analyses for a maximum number of three groups. The plots were sorted according three different criteria: a a priori populations, b trunks perimeter and a priori populations, and c separate analyses of trees below 60 cm and trees above 60 cm

Two additional Geneland runs were performed using either “younger” or “older” trees to evaluate whether there was any spatial pattern within age classes. In these new GENELAND analyses, the best grouping was two for “younger” trees, and three for “older” trees. For “younger” trees, there was almost no structure with just a very small tendency for a North to South gradient in the relative prevalence of the two groups in individuals genotyped (Fig. 2c). For older trees, we see a clear differentiation among populations, with one of the groups dominating in population V and III, another represented primarily in population IV, and another dominating in population II but also occurring in population IV (Fig. 2c).

AMOVA showed low levels of variation among groups and low values of F ST, from which only the ones with the 17 marker dataset were significant (Table 3). The analyses considering different populations showed both slightly higher F ST and slightly higher percentage of variation explained by differences among groups (F ST = 0.018 and1.8 %, respectively) when comparing to the division made with trunk perimeter (F ST = 0.014 and 1.4 %, respectively). The STRUCTURE test showed higher F ST values and higher variation among groups (11 % and 0.108, respectively) than the Geneland analyses (4 % and 0.045, respectively). However, for the STRUCTURE analyses, fewer individuals where used because there was more ambiguity in the assignment of the individuals.

Table 3 Summary results of AMOVA analyses for individuals grouped according to the sampled population, trunk perimeter (<60 cm vs > 60 cm), STRUCTURE analyses with and without population priors, and Geneland (See main text for details)

Expected heterozygosity and allelic richness per a priori population only varied a little, ranging between 0.38 to 0.48, and 2.03 to 2.28, respectively (Table 4). For both estimates, the least diverse was population III, which may be a consequence of its small sampling size. A similar result was found for the 11 marker set (exclusion of Markers with significant deviation from HWE). Overall, there was no clear geographical pattern in diversity. The He and AR estimates were slightly higher for older trees in both datasets (Table 4). Using BOTTLENECK, no population or trunk perimeter category defined showed significant deviations from the mutation-drift equilibrium (p > 0.01), meaning that there was no genetic signature of the major population expansion occurring since about the 1950s. Deviations from HWE were only verified for the 17 markers dataset (p > 0.01; see Table 4) for populations I and III and all trunk perimeter categories. Pairwise F ST values for the 11 marker dataset were smaller than the ones from 17 marker dataset ranging from −0.014 to 0.017 and from −0.004 to 0.089, respectively (Table 5). These values only corresponded to geographical patterns for the 17 marker dataset, where population I was more distant in relation to the others and, with exception of the comparison between population III and V, F ST values seemed to be related with geographical proximity when compared to the other populations. For example, populations II and III and populations V and IV seem to be more related to each other than to the others. Several negative values were observed which can be a consequence of continuous gene flow among these groups.

Table 4 Genetic diversity and HWE deviation test results for the populations defined and categories of trunks perimeter, computed using the data set either with (17) or without (11) markers under Hardy-Weinberg equilibrium
Table 5 Pairwise F ST for all pair of a priori populations, computed using the data set either with (17) or without (11) markers under Hardy-Weinberg equilibrium

The family structure analyses performed with Colony found eight potential relationships for the 17 marker dataset and 13 for the 11 marker dataset, corresponding only to a small portion of the individuals analyzed (Online resource 3). Most of the relationships were between population IV and V and within them. Besides those, there was only one relationship between population II and V. Each of the relationship found corresponded to a different potential offspring individual and to four potential parents from populations IV and V.

Discussion

Marker development

In this study, we were able to develop 17 variable microsatellite markers for J. oxycedrus. This improvement, compared with previous microsatellite studies on Juniperus (five markers for J. communis, Michalczyk et al. 2006), is likely to be a consequence of the high throughput sequencing technology used. The higher number of sequences generated, several thousand compared to several hundred sequences using a subcloning strategy, facilitates the process of microsatellite motifs discovery. Thus, using Next Generation Sequencing is becoming rapidly the method of choice for marker discovery (Ekblom and Galindo 2011). With pyrosequencing (the 454 technology), typically around 100,000 sequences can be produced (Loman et al. 2012) and, when an enrichment step is applied, most of the microsatellite loci from the genome can be represented on those sequences. In our case, although we did not use the full capacity of a run, we produced 32,883 sequences from which 6293 contained microsatellite motifs and 241 were good enough for primer design. Even when an approach without enrichment is used, a high number of potential loci are obtained. For example, Csencsics et al. (2010) and Curto et al. (2013) obtained 307 and 866, respectively, good quality microsatellite containing sequences from a total of 76,692 and 65,401 initial sequences. Nevertheless, using an enrichment step resulted in 19 % of the loci containing a motif, compared to 1.3 % in our earlier study (Curto et al. 2013). In this study, there was very small number of sequences suitable for primer design when compared to the microsatellite containing sequences (only 19 %). The reads excluded in this step did not contain sufficiently long flanking regions for primer design. This may be a consequence of the presence of some fragmented DNA that is preferably amplified in the emulsion PCR step of the library construction due to its smaller size. This may be creating overrepresentation of sequences containing microsatellite motifs in the beginning or in the end of the reads. Recently, alternative sequencing technologies like Illumina (Castoe et al. 2010), Ion torrent (Elliott et al. 2014), and PacBio (Grohme et al. 2013) platforms have been used for microsatellite discovery with similar or better results than pyrosequencing. Illumina and Ion torrent platforms produce smaller reads (maximum of 250 and 300 bp, respectively) than 454; however, they have higher throughput making them generally cheaper methods. PacBio produces longer reads (around 2000 bp) increasing the likelihood of obtaining a microsatellite containing read with flanking regions big enough for primer design.

The high number of sequences obtained by NGS allows pooling different samples during sequencing and screening for markers that are variable already within the sequencing step. We used only one individual mainly because we used only a comparably small subset of the 454 run for marker discovery, and we were interested in variation on a relatively small spatial scale. For both reasons, the amount of duplicated loci in the group of polymorphic markers might be increased if multiple non-barcoded individuals are used. Nevertheless, from the primers designed, 18 amplified most of the samples and only one of them was monomorphic, but in general the markers presented a low number of alleles. Only one of them had more than 10 alleles and most of them had less than five alleles. The sampling used was geographically very restricted (the most distant sampling localities were around 35 km apart) so low variation is expected. For future studies using a larger study area, the variability may be higher and the monomorphic locus may be variable in that situation. In addition, less variable markers may be especially useful when different species are used. Theoretically, if they are less variable in their motif, their primer binding regions might be more conserved and thus more likely to be cross amplified in different species. In Boratyński et al. (2014) work, the authors had a problem in cross-amplifying microsatellite markers in multiple Juniperus species, ending up with a dataset of only three markers. The more conserved markers described here may contribute for improvements in this kind of studies.

Six markers deviated from Hardy-Weinberg equilibrium because their heterozygosity was lower than expected. Of those, only one amplified in all the samples, and all of them had a high frequency of null alleles, in terms of missing data or determined by the program FreeNA. The genetic diversity measures could therefore be underestimated in our analysis. For the STRUCTURE and Geneland analyses, when these markers were discarded, the differentiation signal was lost. Excess of null alleles in microsatellite analysis is generally attributed to mutations in the primers’ binding sites. The variability in the primer binding site may reflect a general higher variability of these loci which is ultimately contributing to its higher information content. This might explain why their inclusion in the analyses lead to detection of a weak genetic structure patterns which cannot be resolved using only the 11 marker dataset. No structure was obtained using the complete dataset in Geneland when the null allele model was used. Geneland tends to overestimate null allele frequencies in case missing data are present (Guillot 2012) which could have influenced the inference.

Genetic structure

The dataset contains a weak signal of genetic structure, which was determined using Bayesian clustering methods as implemented in the programs STRUCTURE and Geneland. For the STRUCTURE analysis, the assignment within each population was very heterogeneous and there was no differentiation among the populations studied. For the Geneland analysis, the assignment was more homogeneous within each population, and there were clear differences among populations. AMOVA supported higher differentiation among groups for the STRUCTURE analyses. Geneland analyses uses spatial information as a prior, and the algorithm is not sensitive to loci deviating from Hardy-Weinberg equilibrium (Guillot et al. 2005). Contrary, STRUCTURE assigns individuals by optimizing HWE within each cluster (Hubisz et al. 2009), and the use of loci not in agreement with this assumption can result in a wrong inference. In the case of Geneland, none of the assumptions were violated, making it more suitable for this particular dataset. Moreover, due to the low variability found in the data, the spatial information may have been valuable to detect any weak signal of structure. The Geneland analyses implementing the spatial model recovered better results than when this model was not used. The non-spatial model does not consider geographical information, thus making the analyses less reliable in recovering structural patterns for populations with weak genetic structure (Guillot et al. 2008).

Similar to our analysis, other Juniperus species in larger geographical scales showed high variability within populations and low genetic differentiation. This was, for example, the case of J. communis populations from Northwest Europe (Van Der Merwe et al. 2000; Oostermeijer and De Knegt 2004; Michalczyk et al. 2006; Provan et al. 2008; Vanden-Broeck et al. 2011) and J. macrocarpa from Southern Iberia (Juan et al. 2012). Juniperus are dioecious wind pollinated plants, making gene flow among fairly distant populations likely. This pattern was found in populations that were way more distant from each other than the ones analyzed in here. Thus, the gene flow in the populations analyzed may be extremely high making it likely for those five populations to be in fact one. This would explain the low F ST values, the AMOVA results, and the fact that we found paternity relationships between distant populations (i.e. between populations IV and I). Moreover, our results show that genetic structure found was generally weak and it can be explained by geographical patterns and by tree age. This is mainly supported by the AMOVA and Geneland analyses. In our AMOVA result, the division by populations provided only slightly better results than the division by trunk perimeter. In the Geneland results, one of the clusters was more frequently assigned in the south and to younger trees, and the other in the north and to older trees. This can be a signal of a weak isolation by distance pattern and a week differentiation among young and old trees. The isolation by distance pattern is also supported by the pairwise F ST results, for which the more distant population (population I) is also the one more genetically dissimilar, whereas geographically close populations like IV and V or populations II and III showed the lowest F ST values. The weak genetic structure found has also been reported for other Juniperus populations (e.g. Van der Merwe et al. 2000; Oostermeijer and De Knegt 2004; Juan et al. 2012).

When Geneland analysis was ran separately for young and old trees, we found a clear geographical structure in older trees. No structure was found in younger trees showing that there is no bias created by our sampling. This analysis is furthermore congruent with a scenario of past fragmentation followed by recolonization. Genetic structure between populations of older trees indicates some restrictions to gene flow among these populations in the past. More recently, these barriers ceased to exist allowing for a population expansion. As it is shown from the results of Colony, some individuals appeared to have contributed more for the expansion of the J. oxycedrus than others, resulting in some alleles to be more frequent in younger than older trees due to founding effects. This ultimately resulted in the lack of structure in younger trees, which contributed to the differentiation observed between old and young trees, and the weakening of spatial structure. Because all the parental individuals are from populations IV and V, we could assume that those are the major contributors for the genetic pool of younger trees. However, we were only able to find family relationships in a small set of the samples used, especially when populations I and II are considered. Thus, younger trees may have been originated from populations that were not sampled and a more intensive sampling of the region and the surrounding areas is necessary in order to take conclusions about this matter. Nevertheless, the fact that a structure according to age exists is an indication that some populations are contributing more for the recolonization than others. This can be a consequence of demographic reasons, where some populations were larger and older when expansion started, making them numerically more likely to reproduce successfully. Selection of pioneering features may be also behind this finding. Some individuals may have presented characters that were advantageous to the establishment of pioneer vegetation being favorably selected. To test these hypotheses, a study assessing pioneering features such as seed production and viability is necessary. If no difference in these features is found among populations, the demographic hypotheses would be the most likely explanation.

This past fragmentation scenario followed by expansion of the populations is congruent with the land use history of the region. After arable fields were abandoned, the expansion of natural vegetation could take place and the population of J. oxycedrus increased. A similar pattern was found by Vellend (2004) in the forest associated herb Trillium grandiflorum, where individuals from younger secondary forest were genetically more divergent from individuals from older primary forest. Other studies using forest species found a loss of genetic diversity in recent forests that colonized abandoned agriculture fields (e. g. Jacquemyn et al. 2009; Leite et al. 2014). This loss of genetic diversity of younger plants is expected because populations undergone a genetic bottleneck due to the fragmentation and recolonization event. However, Juniperus plants are dioecious making them particularly sensitive to barriers to gene flow. Vanden-Broeck et al. (2011) found that younger trees had lower genetic diversity in fragmented populations of J. communis. Although we found a greater genetic diversity in older trees, the difference in relation to younger trees was small. In addition, no signal of deviation from HWE and mutation-drift equilibrium was found for both groups. This is an indication that the founding effect was not very severe, which may be a consequence of how recent the land abandonment began (around 60 years ago). Alternatively, as described above, the genetic diversity of Juniperus populations is highly heterogeneous being possible for one population to retain the same degree of genetic diversity when compared with a larger group of populations. The high variability within populations plus how recent the event is make in this case the founding effects not to create a drastic reduction of genetic diversity.

We found that the best trunk perimeter threshold to assess genetic differentiation between younger and older trees was 60 cm. This corresponds to trees about 70 years old, as inferred from a regression equation between trunk diameter and tree age for a J. oxycedrus populations from Italy (Baldoni et al 2004), considering that a perimeter of 60 cm corresponds to 19 cm of diameter in a perfectly circular section of trunk. Taking into account that the maturity age for J. oxycedrus is around 20 years, and that land abandonment started at around 60 year ago, it is likely that the trees with current perimeter above 60 cm were the only ones able to reproduce at the time. This made them the major contributors of the population’s expansion, which explains why the value of 60 cm was obtained as the best trunk perimeter division. Nevertheless, to be sure of this conclusion, a dendrochronological study would be needed to have a more accurate estimate of tree ages and the local relationship between trunk diameter and age.

Ideally, a fine-scale genetic structure analyses would have been performed, focusing not only on genetic diversity and structure patterns, but also on gene flow. Only a small set of the populations and localities present in the region were sampled, therefore some other populations may be contributing for the genetic poll of the region. For that reason, we did not want to take conclusions in the matter of gene flow patterns and relations among populations. Debout et al. (2011) did a spatial genetic structure analyses (SGS) for Distemonanthus benthamianus, a tree from Western Africa rain forest, which consisted in finding correlations between the coefficient of kinship and the spatial distance, allowing to find small scale genetic structure patterns. Using that analyses, they were able to estimate seed dispersal distances. The application of this analysis in J. oxycedrus populations would be mainly advantageous in a larger geographical scale. This way it would be possible to study the main barriers to gene flow in this species and not only in the current study area. In addition, it would also be interesting to see if similar patterns are found in other populations that were and were not fragmented in the past. That particular study would help to validate the findings in this paper. However, due to the high intensity of agricultural and pastoral activities in the Mediterranean region, it would be impossible or nearly impossible to find an area suitable for this study.

The finding that genetic structure in J. oxycedrus in the Sabor region is best explained by age has several implications: the recovery of the population after the past fragmentation might still be ongoing. Trees that established during the recolonization are not yet fully admixed with older trees, even though we would expect this to happen in the future. This has some implications for the impact of the dam, because it has caused the loss of part of the remnant trees that survived the peak of agricultural expansion in steep slopes of the river valley. Therefore, these trees will not be available as source of genetic variation in the future, and the current pattern of genetic structure might be stabilized. The data presented here are a first step in our attempt to study this context.