Introduction

Black poplar (Populus nigra L.) has a wide geographical distribution ranging from Central and South Europe to Central Asia and North Africa (Zsuffa 1974). This pioneer tree species of riparian ecosystems is dioecious; its pollen is dispersed by wind and its seeds are disseminated by wind and water. P. nigra also readily propagates vegetatively from broken branches and cuttings; it plays a central role in poplar breeding programs and has contributed to many successful interspecific hybrids (Frison et al. 1995). Most of the P. × euramericana (Dode) Guinier hybrids are planted commercially for timber, whereas P. nigra ‘Italica’ is used as a windbreak or for landscaping purposes. During the last centuries, large areas of the natural habitat of P. nigra have been lost because of changed management of riverbanks, involving drainage, more intensive grazing, and more frequent tree felling (Lefèvre et al. 1998). As a consequence, P. nigra is threatened with extinction. The introduction of a small number of P. × euramericana clones and P. nigra varieties, which will likely intercross with wild P. nigra trees, is another concern for the genetic diversity of P. nigra germplasm (Cagelli and Lefèvre 1995; Frison et al. 1995; Lefèvre et al. 1998), even though recent results suggest this may not be the major threat for the species (Tabbener and Cottrell 2002).

Currently, efforts are being made to restore the natural borders of rivers and their associated ecosystems. In softwood flood plain forests, P. nigra is a keystone species because it is highly adapted to water dynamics and sediment movement. Together with Salix species, P. nigra plays an important role in the initial phase of flood plain forest development. The remaining Black poplar stands as well as the germplasm collected in gene banks provide the genotypes for establishing new populations of P. nigra. Efforts should be made to optimize the level of genetic diversity in these new populations to enable them to adapt to changing environmental conditions, a prerequisite that is considered to be essential for the long-term survival of populations (Booy et al. 2000). Over the past 50 years, gene bank collections of P. nigra have been set up in most European countries, mainly for breeding purposes (Turok et al. 1995, 1998; Lefèvre et al. 1996; Heinze 1997; Vietto 2000; Cottrell et al. 2002; Storme et al. 2002; Table 1). Environmental influences on the morphology and the phenotypic differences between juvenile and mature characters make it difficult to discriminate between genetically different individuals on the basis of morphological traits alone and may result in genetic duplications within a collection, leading to increased space requirement and maintenance time (van Hintum et al. 1996; van Treuren et al. 2001). In addition, a number of hybrids with characteristics similar to those of P. nigra may have remained undetected.

Table 1 Overview of the number of accessions analyzed per country, the number of hybrids, the number of duplications, the number of unique genotypes based on SSR data, and the average genetic (dice) similarity between unique genotypes based on microsatellite data

Molecular markers can be used to detect hybrids and duplicated accessions and to establish the extent of genetic variation and the levels of heterozygosity (Karp et al. 1997). In several studies of Black poplar (e.g., Rajora 1990; Legionnet and Lefèvre 1996; Legionnet et al. 1997), isozymes have successfully distinguished the different Populus species and their hybrids (Rajora 1989a; Janssen 1997; Heinze 1998). Amplified fragment length polymorphisms (AFLPs; Vos et al. 1995) produce multi-locus patterns without the need for prior knowledge of the genome. AFLP has been applied to characterize gene bank collections in other species (e.g., Chavarriaga-Aguirre et al. 1999; van Treuren et al. 2001). Microsatellites or simple sequence repeats (SSRs) are stretches of a variable number of tandem repeats with a core repeat of two to six base pairs (Hamada et al. 1982; Tautz and Renz 1984; Chambers and MacAvoy 2000; Schlötterer 2000) that can be amplified as single-locus, multi-allelic, and co-dominant markers using the unique flanking sequences obtained by sequencing genomic DNA (Smulders et al. 1997). Microsatellites are ideal markers for estimating the level of heterozygosity. In addition, because they are robust when used across laboratories, they are the most appropriate markers for establishing databases of germplasm collections in several different countries (Bredemeijer et al. 2002; Röder et al. 2002).

In the investigation reported here we have analyzed 675 P. nigra accessions by molecular markers in order to estimate the number of hybrids, the extent of duplications, and the genetic diversity within and between nine European gene bank collections. In addition, we explored to what extent the information obtained from the gene bank collections can be used to infer the structure and distribution of diversity in natural populations across Europe.

Materials and methods

Plant material

European gene banks of Populus nigra from nine countries were studied: Austria (AT), Belgium (BE), France (FR), Germany (DE), United Kingdom (UK), Hungary (HU), Italy (IT), The Netherlands (NL), and Spain (ES) (Table 1; Fig. 1). The criterion for the establishment of pure gene banks was the plant descriptor for P. nigra (Van Slycken 1995). Leaf tissues were collected for DNA extraction from up to 100 trees per gene bank collection, resulting in a sample of 675 accessions (Table 1). DNA was extracted with the DNeasy Plant Miniprep kit (Qiagen, Hilden, Germany; http://www.qiagen.com).

Fig. 1 Map of the collected trees

Marker systems

AFLP analysis was performed essentially as described by Vos et al. (1995) with modifications (Arens et al. 1998; Smulders et al. 2002). Three primer combinations were used: E32-M36 (EcoRI+AAC/MseI+ACC), E33-M40 (EcoRI+AAG/MseI+AGC), and E44-M40 (EcoRI+ATC/MseI+AGC). The selective amplification with E32-M36 and the gel running conditions were as described in Smulders et al. (2002); with E33-M40 and E44-M40, the selective amplification was as described by Cervera et al. (2001) and the gel running conditions were according to Storme et al. (2002). For each primer pair, one laboratory was responsible for analyzing all samples, thereby avoiding problems associated with consistency between laboratories. The presence of a band was indicated by “1”; its absence by “0”. To estimate the error percentage of the AFLP analysis, the primer combination E33-M40 was repeated on all Belgian accessions, and both analyses were manually scored independently by two persons.
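The error estimate from the repeated E33-M40 analysis can be illustrated with a minimal Python sketch. It assumes that each scoring is stored as a list of 0/1 band calls per accession; the function name `scoring_error_rate` and the toy data are hypothetical and are not taken from the study.

```python
def scoring_error_rate(scores_a, scores_b):
    """Fraction of band calls on which two independent scorings disagree.

    Each argument maps accession name -> list of 0/1 band calls
    (1 = band present, 0 = band absent), in the same marker order.
    """
    compared = 0
    mismatches = 0
    for accession, calls_a in scores_a.items():
        calls_b = scores_b[accession]
        if len(calls_a) != len(calls_b):
            raise ValueError(f"marker count differs for {accession}")
        for a, b in zip(calls_a, calls_b):
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no band calls to compare")
    return mismatches / compared


# Toy data (hypothetical): two accessions, five AFLP bands each.
first_scoring = {"acc1": [1, 0, 1, 1, 0], "acc2": [0, 0, 1, 0, 1]}
second_scoring = {"acc1": [1, 0, 1, 0, 0], "acc2": [0, 0, 1, 0, 1]}
error = scoring_error_rate(first_scoring, second_scoring)
```

In this toy case one of the ten band calls differs between the two scorings, so the estimated error is 10%.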

The conditions for isozyme analysis were essentially according to Hochu and Fady (1998). Seven isozyme systems encoded by 11 loci were surveyed: 6-phosphogluconate dehydrogenase (6-Pgd-a), phosphoglucomutase (Pgm-b), malate dehydrogenase (Mdh-a, -b, -c, and -d), menadione reductase (Mnr-b), isocitrate dehydrogenase (Idh-a), phosphoglucose isomerase (Pgi-a and -b), and leucine amino peptidase (Lap). For each locus, the bands were numbered according to their frequency, with the most common designated “1”, and so on.

Microsatellite analysis was performed as described by Smulders et al. (2002) and Cervera et al. (2001) with a γ-[33P]ATP-labeled primer or with a fluorochrome-labeled primer, respectively. Five loci were analyzed: PMGC14, WPMS09, WPMS14, WPMS16, and WPMS20. PMGC14 (forward primer 5′-TTCAGAATGTGCATGATGG-3′ and reverse primer 5′-GTGATGATCTCACCGTTTG-3′) was developed by the Poplar Molecular Genetics Co-operative, whereas those with prefix WPMS have been described previously (van der Schoot et al. 2000; Smulders et al. 2001). Microsatellite alleles were scored by comparison to 22 reference genotypes that are part of the EUFORGEN Core Collection that had been prepared by the EUFORGEN Populus nigra Network (Turok et al. 1995, 1998; Vietto 2000); these were also run on the gels. This collection aimed at presenting a maximum diversity in a minimum number of genotypes by choosing two morphologically different trees from 18 European gene banks. Alleles were assigned a letter, starting with A for the largest allele (Smulders et al. 2002). Uppercase letters were used first, followed, after 26 allele size classes, by lowercase letters. This format of allele nomenclature makes it possible to enter the data directly, without any transformation, into popgene, the genetic analysis software. This procedure also avoids the need to obtain the exact molecular size of each allele, which is a time-consuming process that may be confusing when, as in this study, different detection systems are used among the laboratories. The band assignment and reproducibility among different laboratories were tested with a ring test (Bredemeijer et al. 2002; Röder et al. 2002) by using plants from the EUFORGEN Core Collection (Turok et al. 1998). The genotypes of the 22 accessions from the EUFORGEN Core Collection are accessible through the EUFORGEN website (http://www.ipgri.cgiar.org/networks/euforgen/Networks/Poplars/PNTechnicalPapers.htm).
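The allele naming scheme can be sketched in Python as follows. This is only an illustration: in the study, alleles were assigned by comparison to reference genotypes on the gel rather than from measured sizes, so the numeric sizes below serve only to rank hypothetical size classes, and the function name `letter_codes` is an assumption.

```python
import string

# 26 uppercase codes first, then 26 lowercase codes.
LETTERS = string.ascii_uppercase + string.ascii_lowercase


def letter_codes(allele_sizes):
    """Map each distinct allele size class to a code, largest allele = 'A'."""
    classes = sorted(set(allele_sizes), reverse=True)
    if len(classes) > len(LETTERS):
        raise ValueError("more than 52 allele size classes")
    return {size: LETTERS[rank] for rank, size in enumerate(classes)}


# Hypothetical size classes observed at one locus.
codes = letter_codes([210, 198, 204, 210, 192])
# A heterozygous tree carrying the 204 and 192 alleles.
genotype = codes[204] + codes[192]
```

With these toy values the largest class (210) becomes "A" and the heterozygote is written "BD", a two-character genotype of the kind that can be entered into the analysis software without further transformation.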

Data analysis

Hybrid detection

The isozyme loci Pgi-b, Pgm-b, and Lap are considered to be diagnostic for differentiating between (F1) hybrids and non-hybrid P. nigra (e.g., Rajora 1989a, 1989b, 1990; Janssen 1997). The hybrids that were detected by these enzyme systems were eliminated from further analyses.

Duplication of genotypes

Accessions that shared the same microsatellite multilocus genotype were identified and subsequently compared for isozyme and AFLP fingerprints. In these essentially pairwise comparisons, loci with missing values were ignored. Duplicated genotypes were reduced to one entry for further analyses. The remaining genets were grouped in two manners: by country and by geographical location (see below).
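The pairwise comparison with missing values ignored can be sketched as follows. This is a minimal illustration, not the procedure actually run in the study; the data layout (locus name mapped to a tuple of allele codes, `None` for a missing value) and the function names are assumptions.

```python
def same_genotype(g1, g2):
    """True if two multilocus genotypes agree at every locus scored in both.

    A genotype maps locus name -> tuple of allele codes, or None if missing.
    Loci with a missing value in either accession are ignored.
    """
    shared = [
        locus for locus in g1
        if g1[locus] is not None and g2.get(locus) is not None
    ]
    if not shared:
        return False  # nothing to compare, so no evidence of identity
    return all(sorted(g1[locus]) == sorted(g2[locus]) for locus in shared)


def duplicate_pairs(accessions):
    """Return all pairs of accession names with matching genotypes."""
    names = sorted(accessions)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if same_genotype(accessions[a], accessions[b])
    ]


# Toy data (hypothetical allele codes at three of the SSR loci).
accessions = {
    "tree1": {"WPMS09": ("A", "C"), "WPMS14": ("B", "B"), "WPMS16": ("D", "F")},
    "tree2": {"WPMS09": ("C", "A"), "WPMS14": None, "WPMS16": ("D", "F")},
    "tree3": {"WPMS09": ("A", "C"), "WPMS14": ("B", "E"), "WPMS16": ("D", "F")},
}
pairs = duplicate_pairs(accessions)
```

In the toy data, tree2 matches both tree1 and tree3 because its WPMS14 value is missing, although tree1 and tree3 differ at that locus. Matching with missing values ignored is therefore not transitive, which is one reason candidate duplicates benefit from confirmation with additional marker data.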

The power to detect duplicated genotypes is usually based on the effective number of alleles (N_e) averaged over loci. Because AFLPs and microsatellite markers differ vastly in this respect (many loci with N_e ranging between one and two for AFLP and only a few loci with N_e usually much greater than two for microsatellites), the average N_e is not a good measure. Therefore, we developed a new measure, N_p, the effective number of polymorphisms, which is the sum of N_e per locus, corrected for the fact that each locus, even when monomorphic, will always have at least one effective allele:

$$N_{p} = \sum\limits_{j = 1}^{n} \left( \frac{1}{\sum\limits_{i = 1}^{k} p^{2}_{i}} - 1 \right)$$
(1)

where n is the number of loci, the outer sum runs over the loci, and p_i is the frequency of the ith allele at a given locus (k alleles per locus). Interestingly, this correction (Eq. 1) is equivalent to multiplying, for each locus, N_e by the polymorphism information content (PIC) value of that locus. Given that

$$N_{e} = \frac{1} {{{\sum\limits_{i = 1}^k {p^{2}_{i} } }}}$$

(N_ep of Weir 1989) for each locus, and that

$$PIC = 1 - {\sum\limits_{i = 1}^k {p^{2}_{i} } }$$

(modified from Botstein et al. 1980) for each locus, a combination of these two formulae gives, on a per-locus basis, the correction for one allele per locus:

$$N_{e} \times PIC = {\left( {\frac{1} {{{\sum\limits_{i = 1}^k {p^{2}_{i} } }}}} \right)} \times {\left( {1 - {\sum\limits_{i = 1}^k {p^{2}_{i} } }} \right)} = \frac{1} {{{\sum\limits_{i = 1}^k {p^{2}_{i} } }}} - 1 = N_{e} - 1$$
(2)

Equation 2 is valid for a single locus; summing over loci gives Eq. 1. The measure N_p can thus be used to compare the discriminating power of two sets of loci and is insensitive both to variation in N_e among loci and to the presence of monomorphic loci.
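As a numerical check of Eqs. 1 and 2, the following minimal Python sketch computes N_e, PIC (as defined above), and N_p from per-locus allele frequencies. The frequencies are hypothetical and the function names are illustrative only.

```python
def effective_alleles(freqs):
    """N_e = 1 / sum(p_i^2) for one locus."""
    return 1.0 / sum(p * p for p in freqs)


def pic(freqs):
    """PIC as defined in the text: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_polymorphisms(loci):
    """N_p = sum over loci of (N_e - 1), as in Eq. 1."""
    return sum(effective_alleles(freqs) - 1.0 for freqs in loci)


# Hypothetical allele frequencies: a monomorphic locus, a biallelic locus
# with equal frequencies, and a locus with four equally common alleles.
loci = [[1.0], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
n_p = effective_polymorphisms(loci)
```

Here N_e is 1, 2, and 4 for the three loci, so N_p = 0 + 1 + 3 = 4: the monomorphic locus contributes nothing, whereas it would have raised the sum of N_e, and lowered the mean N_e, without adding any discriminating power.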

Estimation of diversity by country

The dataset of the 465 unique P. nigra genotypes of known geographical origin was subdivided by country. Multi-population descriptive statistics were calculated for all marker systems with popgene 1.31 (http://www.ualberta.ca/~fyeh/info.htm). All loci were considered for the microsatellite and the AFLP analyses, whereas for the isozyme analysis only the polymorphic loci Pgm-b, Mdh-a, Mnr-b, Idh-a, and Lap were considered. The following statistics were calculated: percentage of polymorphic loci, observed (N_a) and effective (N_e) number of alleles, observed (H_o) and expected (H_e) heterozygosity (based on SSR and isozymes), gene diversity (H, based on AFLP), and F-statistics. Dendrograms were inferred from distance matrices based on Nei’s genetic distance (Nei 1978). Principal component analysis (PCA), based on the frequencies of the five most frequent alleles for each SSR locus, was performed with the SAS Enterprise Guide Software (SAS Institute, Cary, N.C.) to indicate the genetic variation between countries. To measure the degree of relationship between geographical and genetic distance, the product-moment correlation between the matrix of geographical interpoint distances and the genetic similarity matrix (calculated with the Dice coefficient) and the Mantel test statistic were computed with the mxcomp module of the ntsys-pc software package, version 2.02i (Rohlf 1998; http://www.exetersoftware.com/).
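The idea behind the matrix comparison can be sketched with a generic permutation-based Mantel test in plain Python. This is not the mxcomp implementation; the two-sided comparison on the absolute correlation, the number of permutations, and the fixed seed are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def lower_triangle(matrix, order):
    """Below-diagonal entries after reordering rows and columns by `order`."""
    return [
        matrix[order[i]][order[j]]
        for i in range(len(order))
        for j in range(i)
    ]


def mantel(geo, gen, permutations=999, seed=0):
    """Return (observed correlation, permutation P value)."""
    identity = list(range(len(geo)))
    geo_vals = lower_triangle(geo, identity)
    observed = pearson(geo_vals, lower_triangle(gen, identity))
    rng = random.Random(seed)
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of one matrix together
        r = pearson(geo_vals, lower_triangle(gen, order))
        if abs(r) >= abs(observed):
            at_least += 1
    return observed, at_least / (permutations + 1)


# Toy matrices (hypothetical): four sites on a line, and a second matrix
# that is exactly proportional to the geographical distances.
geo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
gen = [[2 * d for d in row] for row in geo]
r_obs, p_value = mantel(geo, gen)
```

Rows and columns are permuted together because the pairwise entries of a distance matrix are not independent observations, which is why an ordinary significance test for a correlation coefficient is not appropriate here.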

Estimation of diversity by topographic regions

The sampling area was divided into discrete regions based on topography, taking care that each region contained sufficient genotypes. The dataset of the 465 unique P. nigra genotypes, for which the original locations were available, was divided into 19 regions (Table 2). With the exception of four regions, each region contained 12 or more genotypes. The analyses were done as described for the country divisions. For the combined SSR-isozyme analysis, the Fst value was also estimated with the Arlequin software program (http://lgb.unige.ch/arlequin/). The corresponding significance was calculated with 1,023 permutations. For the PCA based on SSR data, only the frequencies of the five most frequent alleles for each SSR locus were taken into consideration. To visualize the differentiation within the 19 regions, we carried out principal co-ordinate analysis (PCO) based on the SSR data from the individual trees. For this purpose, the microsatellite data were transformed into a binary data matrix, with “1” indicating the presence and “0” the absence of an allele, regardless of whether the locus was homozygous or heterozygous, and the Dice coefficient was calculated. For the PCO, the ntsys-pc software package, version 2.02i (Rohlf 1998) was used.
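The binary transformation and the Dice coefficient can be illustrated with a short Python sketch. The genotypes and allele codes are hypothetical, and the helper names are assumptions; the coefficient itself is the standard 2a / (2a + b + c), where a is the number of alleles shared by both trees and b and c are the numbers present in only one of them.

```python
def presence_vector(genotype, allele_index):
    """Binary vector: 1 if an allele occurs in the genotype, else 0.

    Homozygous and heterozygous occurrences are both recorded as 1.
    """
    vector = [0] * len(allele_index)
    for locus, alleles in genotype.items():
        for allele in alleles:
            vector[allele_index[(locus, allele)]] = 1
    return vector


def dice(u, v):
    """Dice similarity 2a / (2a + b + c) for two binary vectors."""
    shared = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    total = sum(u) + sum(v)  # equals 2a + b + c
    return 2 * shared / total if total else 0.0


# Toy genotypes (hypothetical allele codes at two SSR loci).
tree1 = {"WPMS09": ("A", "C"), "WPMS14": ("B", "B")}
tree2 = {"WPMS09": ("A", "D"), "WPMS14": ("B", "E")}

all_alleles = sorted(
    {
        (locus, allele)
        for tree in (tree1, tree2)
        for locus, alleles in tree.items()
        for allele in alleles
    }
)
allele_index = {key: i for i, key in enumerate(all_alleles)}
similarity = dice(
    presence_vector(tree1, allele_index),
    presence_vector(tree2, allele_index),
)
```

In this example tree1 carries three distinct alleles and tree2 four, with two in common, so the similarity is 2 × 2 / (3 + 4) = 4/7. The homozygous locus of tree1 contributes a single "1", as described in the text.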

Table 2 Geographical subdivisions

Results

Hybrid detection

Based on their typical morphology, putative non-hybrid P. nigra trees were chosen from nine gene bank collections. The 675 trees obtained in this manner were genotyped with AFLP, microsatellites, and isozymes. The isozyme loci Pgi-b, Pgm-b, and Lap are considered to be diagnostic for differentiating F1 P. nigra × P. deltoides Marsh hybrids from non-hybrid P. nigra (e.g. Rajora 1989a, 1989b, 1990; Janssen 1997). By using these markers in combination it should also be possible to detect most of the F2 backcrosses. Only 21 hybrid trees (3.1%) were found, of which 11 were derived from the Belgian collection (Table 1). All of these were F1 hybrids—i.e., heterozygous at each of the three loci. The morphological plant descriptor was thus very efficient at distinguishing P. nigra from hybrids. These hybrids were eliminated from further analyses.

Duplication of genotypes

Based on N_p (see Materials and methods), the power of AFLP (based on 108 markers) in detecting duplicates was 1.7-fold larger than that of microsatellites (based on five loci), whereas the power of isozymes (based on five loci) was much lower (Table 3). In practice, both AFLP and SSR analyses identified the same duplicated accessions. However, the process of scoring the AFLP bands across all 675 accessions was more time-consuming, with a scoring error of up to 8%. All cases in which a small genetic distance was obtained between pairs (or larger groups) of trees (similarity above 0.92) needed to be verified by manual re-comparison of the original traces, thereby making the AFLP method for identifying genotypes inefficient in terms of time. Therefore, the microsatellite analysis was by far the most efficient method.

Table 3 Differentiating power of the methods (n=465 accessions)

Most duplicated samples were found in the gene banks of countries with a dense human population and with relatively few natural populations of Black poplar. In Belgium and the UK, more than 50% of the trees in the gene bank were duplicates, while in The Netherlands almost 50% of the trees were duplicates (Table 1). A total of 38 duplicated genotypes were detected among the 675 trees, varying from a twofold to 30-fold duplication. In Belgium, Hungary, The Netherlands, and the UK, single clones were duplicated many times, but in Austria, Spain, and Italy this occurred only once or twice. The distance between duplicated genotypes varied from 0 to 288 km. Surprisingly, considering the number of duplicates within some gene bank collections, no duplicated genotypes were found between countries. Overall, 26% of the 675 trees in European gene banks were duplicated genotypes; thus, the collections could be reduced by about a quarter merely by removing the duplicates. For further analyses, only the unique genotypes with known geographical origin were retained (Table 1).

Estimation of diversity by country

The 465 accessions were analyzed by AFLP, isozymes, and microsatellites to compare the genetic diversity across the gene bank collections. A total of 108 AFLP bands were scored, of which 106 were polymorphic. The number of alleles for the isozyme loci varied between two and four, whereas those for the microsatellite loci varied from 9 to 25. Descriptive statistics of each gene bank are presented in Table 4. No collection had the highest score for every parameter. The gene banks in France, Austria, Spain, and Italy contain the highest number of unique genotypes (103, 92, 91 and 65, respectively). France shows a high diversity in terms of percentage polymorphic loci (AFLP, isozyme, and SSR), N_a (AFLP, isozyme, and SSR), N_e (AFLP and SSR), and H (AFLP). Austria shows high diversity in terms of percentage polymorphic loci (isozyme and SSR), N_a (isozyme and SSR), N_e (SSR), and H_e (SSR). Spain shows high diversity in terms of percentage polymorphic loci (AFLP and SSR), N_a (AFLP and SSR), N_e (AFLP), and H (AFLP) but low diversity in terms of percentage polymorphic loci (isozyme), N_a (isozyme), N_e (isozyme), and H_e (isozyme). Italy shows high diversity in terms of percentage polymorphic loci (AFLP, isozyme, and SSR), N_a (AFLP and isozyme), N_e (AFLP), and H or H_e (AFLP and SSR, respectively). The UK, with 16 unique genotypes, shows the lowest diversity for all of the described statistics, followed by Belgium and The Netherlands. Belgium, with only 18 unique genotypes, shows low diversity in terms of percentage polymorphic loci (isozyme), N_a (isozyme and SSR), N_e (SSR), and H_e (SSR). The Netherlands, with 37 unique genotypes, shows low diversity in terms of percentage polymorphic loci (isozyme), N_a (isozyme), N_e (AFLP), and H (AFLP). Hungary and Germany, with 21 and 22 unique genotypes, respectively, show medium diversity.

Table 4 Descriptive statistics for the molecular data of the gene bank collections, subdivided by country. The three highest values in each column are in bold; the two lowest are in italics

Dendrograms inferred from genetic distances between populations using AFLP or SSR analysis did not indicate grouping according to proximity of countries, except that France and Spain were always in the same cluster (data not shown). An overview of the genetic similarities between gene banks may be obtained by PCA analysis. PCA analysis clearly separates the eastern, the northern, and the southern countries, and groups countries that are geographically close to each other (Fig. 2). The UK is distinct from the other countries.

Fig. 2 PCA analysis based on SSR data, subdivided by country. Variation explained by the first principal component (Z1) is 32%; by the second principal component (Z2), 22%. The ellipses indicate the separation between northern, eastern, and southern countries and the island of the UK

For each country, we have analyzed the relationship between the absolute geographical interpoint distance (calculated using the geographical coordinates of the accessions) and the genetic similarity based on data from microsatellites, AFLPs, and isozymes. Mantel tests indicated no significant correlation. This result is consistent with the notion that the genetic diversity in P. nigra is not structured according to absolute geographical distance because dispersal of propagules may be hindered by mountain ranges. In France, for example, the Loire and the Rhône are close to each other, but mountain ranges separate the populations that occur along these two rivers.

Estimation of diversity by region

To what extent can the information obtained from the gene bank collections be used to infer the structure and distribution of diversity in natural populations across Europe? We have classified the 465 accessions according to river systems or mountain regions. Within the same river systems the material from the different countries was retained as separate groups in order to be able to make comparisons along the course of a river. Some natural regions or river systems that were spatially close to one another had to be combined in order to provide sufficient numbers of samples per region. The number of samples partly reflected the differences in sampling activity in the respective countries and the density of the remaining trees. As a result, 19 regions were discerned, each of which largely contained a single river system (Table 2).

Unique SSR alleles were found in regions with the highest number of accessions, such as Italy (the Po and tributaries), the Danube, the Alpes, the Rhône, and the Ebro (Table 5), but it is unclear which region has the highest diversity. The results differ according to the marker system used. Based on AFLP, the highest diversity is found in Belgium, Italy, and the Ebro region, whereas based on SSR, it is highest in the Danube, the Mur, and Italy. Based on isozymes, the Loire, the DanubeH, and Italy scored highest in terms of polymorphic loci and observed number of alleles. However, based on N_e and the expected heterozygosity, the regions of the Rhine, the DanubeH, and the Dutch coast had the highest score. The differences between the marker systems could be due to the fact that AFLP analysis with 108 markers allows a more genome-wide analysis than the limited number of loci analyzed by SSR and isozyme analyses.

Table 5 Descriptive statistics for molecular data of the gene bank collections, subdivided by region (Table 2). The three highest values in each column are in italics

Observed heterozygosity values, calculated for SSRs and isozymes, were nearly always higher than those of expected heterozygosity. The slight excess in heterozygotes is also reflected in the negative Fis value (Table 6). Observed heterozygosity values could not be calculated from AFLP data because of their dominant character. Instead, gene diversity values were calculated.
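The link between a heterozygote excess and a negative Fis follows from the usual definition of the inbreeding coefficient, shown here as a worked example; the numerical values are hypothetical and are not taken from Table 6.

$$F_{is} = \frac{H_{e} - H_{o}}{H_{e}} = 1 - \frac{H_{o}}{H_{e}}$$

For example, with H_o = 0.80 and H_e = 0.75, Fis = 1 - 0.80/0.75 ≈ -0.07, so any observed heterozygosity above the expected value gives a negative Fis.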

Table 6 Distribution of diversity within and among the 19 regions

The population differentiation Fst was 0.11 for microsatellites, 0.16 for isozymes, and 0.28 for AFLPs (Table 6). The Fst value of the combined SSR and isozyme dataset differed only slightly (6.5%) from the Fst value of the SSR data alone. The Fst value for the combined SSR and isozyme dataset calculated with Arlequin was in the same range (0.08, P=0.01). To check whether these values were biased by an unequal sample size between regions, Pearson correlation coefficients were calculated between the number of accessions and N_a, N_e, H_e (or H for AFLP data), and H_o (data not shown). With the exception of the correlation between the number of accessions and N_a, correlation coefficients were low, indicating no bias due to the number of accessions.

Dendrograms, based on AFLP data and on the combined dataset of SSRs and isozymes, show the genetic distance between regions (Fig. 3). The two dendrograms largely agree. It is clear from the dendrogram based on AFLPs that all regions from the same country cluster together. Major clusters are formed according to geography. The four French and the two Spanish regions cluster together. Belgium clusters together with the Austrian and Dutch regions. Italy and the UK cluster separately, which makes sense because these countries are isolated by the Alps and the sea, respectively. The position of DanubeH is slightly biased due to the fact that only two AFLP primer combinations were analyzed for this region. The dendrogram based on SSRs and isozymes showed an even stronger contrast between south and north-northeast than the dendrogram based on AFLP results. In contrast to the dendrogram based on AFLP, clusters based on SSRs and isozymes better reflected the groupings based on the river systems. The Rhine regions of Germany and The Netherlands cluster together. The same cluster comprises the IJssel, which is a branch of the Rhine. The Danube regions of Austria and Hungary are also highly similar but are separated from the Danube in Germany. The region of the Dutch coast clusters together with Belgium. These two regions include accessions from the rivers Schelde and Maas.

Fig. 3 Dendrograms based on Nei’s genetic distance. Left: based on AFLP data; right: based on isozyme and SSR data

PCA analysis based on AFLP data (Fig. 4, upper panel) reveals only genetic differentiation between Spain (Ebro, Tajo-Duero), Italy, and the other regions. The PCA analysis of the regions based on SSR data (Fig. 4, lower panel) is consistent with the result of the dendrogram. The regions of the Rhine (red arrow) are closely grouped together. Regions from the same country cluster together. Mediterranean countries have small or negative y-values (Z1 axis), whereas the more eastern countries (Austria and Hungary) have positive y-values. The three regions of the Danube located in three different countries also group together (indicated by blue arrows). The region of the Pyrénées in France is close to the region of the Ebro, south of the Pyrénées in Spain, and the region of the Alpes is close to Italy. PCO analysis based on the SSR data from the individual trees clearly shows that genetic differentiation is largest in the region of the Danube in Austria, in both regions of Spain, in all the French regions, and in Italy.

Fig. 4 PCA plots. Upper panel: based on AFLP data [variation explained by the first principal component (Z1) is 23%; by the second principal component (Z2), 14%]. Lower panel: based on SSR data [variation explained by the first principal component (Z1) is 22%; by the second principal component (Z2), 17%]

Discussion

Duplication of genotypes

We have studied the genetic diversity in nine gene bank collections of Black poplar. A quarter of the samples are represented by duplicated genotypes. The levels of genotype duplication in the Belgian, UK, and Dutch gene banks were 78% (63/81), 76% (50/66), and 43% (30/70), respectively (Table 1). These countries have experienced a particularly steep decline in the number of P. nigra populations due to the loss of riparian sites. In addition, due to the canalization of rivers, the relict populations have become less exposed to river dynamics, which may have resulted in a greater proportion of vegetative than sexual reproduction (van Splunder et al. 1995; Arens et al. 1998). In Belgium, where very few relict populations remain, virtually all of the existing trees were sampled. In the UK, the gene bank is a combination of three separate collections. One of them, the Talybont gene bank, is largely based on trees from Wales (Cottrell et al. 2002) and had very low diversity, in that only three different genotypes were found in 30 trees. The other two British collections, which were taken from a larger area, consisted of approximately 50% duplications.

The high percentage of duplication in certain gene banks may also reflect the intended purpose of the gene bank when it was established. In The Netherlands, Hungary, and Germany, the gene bank material was collected for breeding purposes. The gene banks in Austria, Italy, and Belgium had the sole purpose of creating a collection that was as diverse as possible for conservation purposes. In Spain, the Ebro collection was mainly established for breeding purposes, whereas the samples from the Tajo and Duero region were intended for genetic conservation. The collection of material from superior trees for breeding purposes may have led unconsciously to the repeated selection of the same genotype.

The fact that trees belonging to the same genotype grew up to 142 km apart in The Netherlands and up to 288 km from each other in the UK probably indicates that cuttings of superior genotypes have been selected, transported, and planted long before the gene bank collection was made. This observation indicates that wide geographic sampling for gene bank material does not guarantee that multiple sampling of the same genotype will be avoided. The results demonstrate the practical usefulness of testing prospective trees with molecular markers prior to inclusion into gene banks in order to avoid duplications of genotypes.

Comparison of the different marker systems

Microsatellites proved to be the best markers both for the assessment of genetic diversity and for the identification of clonal duplication. Moreover, once the SSR markers are developed, the analysis is very simple. This is especially so when samples are analyzed on an ABI sequencer: three different microsatellite combinations, each labeled with a different fluorescent dye, can be multiplexed and run simultaneously. The sensitivity of AFLP polymorphisms was higher, but this marker system is technically more demanding. Moreover, because many more bands have to be scored, this study shows that errors in interpreting the presence and absence of bands hamper the identification of identical genotypes. On the other hand, three isozyme loci (Pgi-b, Pgm-b, and Lap) proved to be very useful for the detection of hybrids. Only 3% (21/675) of the accessions were hybrids that had not been identified as such by morphological analysis. The low number of hybrids in the gene banks indicates that the morphological descriptions used to identify pure P. nigra specimens are successful in discriminating non-hybrid P. nigra trees from hybrids. We do not know, however, to what extent the descriptors were too specific: because we did not include trees classified as hybrids on the basis of the morphological descriptions, some non-hybrid P. nigra trees might have been incorrectly classified as hybrids and, therefore, not collected.

Estimation of diversity

Descriptive statistics indicated that the genetic diversity of P. nigra is largest in the collections of the southern countries (France, Italy, and Spain). Several factors may explain this observation: the topography of the country, the variety of climate and soil characteristics, the number of different river systems, the (human) population density, and the criteria on which the gene bank was established. France, Italy, and Spain have a larger surface area, a larger variety in climatic regions, and a lower (human) population density than the northern countries. In these three countries, many rivers remain uncanalized, and this would promote the survival of natural populations of P. nigra. The river systems in Belgium, Germany and The Netherlands are partly connected, which facilitates gene flow and leads to a higher similarity among populations (Figs. 3, right and 4, lower panel). In addition, sampling criteria obviously had a great influence on the genetic diversity found, just as on the number of identical clones.

Because of the above factors, a subdivision based on country of origin is not the most appropriate approach for understanding the distribution of the diversity of P. nigra trees in the natural populations from which the samples were taken. Indeed, when genetic similarity was analyzed in relation to geographic distance, the Mantel test showed no correlation between the two, indicating that factors other than geographical distance influence genetic distance. Such factors may include the structure of the river catchments and the existence of mountain barriers such as the Alps, the Pyrénées and the Massif Central. When the accessions are grouped on the basis of smaller geographic regions, the genetic differentiation between populations increases relative to the subdivision based on the country of origin. The Fst/Gst value shows a moderate level of genetic differentiation between the regions (0.12 for SSRs + isozymes, 0.28 for AFLP). The Fst value calculated from the combined SSR and isozyme data, compared to the Fst value calculated from the SSR data alone, indicates that the isozyme analysis is less useful for genetic differentiation studies. A slight excess of heterozygotes, calculated for SSRs and isozymes, was observed. This result is probably not an artifact of the collection activity: because the trees were collected from different sites, they cannot be considered a sample from one continuous population, and the Wahlund effect would be expected to cause a shortage of heterozygotes rather than an excess. The excess is possibly due to selection (more heterozygous trees being fitter than less heterozygous trees).
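As an illustration of the kind of test referred to above, the following is a minimal sketch of a permutation-based Mantel test. It is not the procedure or software used in this study: the choice of Pearson's r as the statistic, the two-sided p-value, the number of permutations, and the names mantel, geo, and gen are all assumptions made for the example. Both inputs are assumed to be symmetric square matrices over the same trees, in the same order.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant matrix")
    return sxy / (sxx * syy) ** 0.5


def mantel(geo, gen, permutations=999, seed=0):
    """Correlate two distance matrices and assess significance by
    jointly permuting the rows and columns of the second matrix.

    geo, gen: symmetric square matrices (lists of lists), same order.
    Returns (observed r, two-sided permutation p-value).
    """
    n = len(geo)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    geo_vec = [geo[i][j] for i, j in pairs]

    def corr(order):
        # Relabel the individuals of `gen` according to `order`.
        return pearson(geo_vec, [gen[order[i]][order[j]] for i, j in pairs])

    order = list(range(n))
    observed = corr(order)

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if abs(corr(order)) >= abs(observed):
            hits += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value
```

Permuting rows and columns together, rather than shuffling individual matrix entries, keeps the dependence among distances that involve the same tree, which is the reason a Mantel test is used instead of an ordinary correlation test. A p-value that is not small, as reported here, means that the observed correlation is no stronger than what random relabelling of the trees would produce.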

A PCA analysis showed that the variation between regions conforms to natural geography. Descriptive statistics and a PCO analysis based on SSR data of individual regions (Fig. 5) both indicated that the genetic diversity within the southern regions was larger than that within the northern ones. However, Austria showed diversity levels almost equal to those found in the southern regions, perhaps due to the low (human) population density and an appropriate sampling strategy. Alternatively, the relatively high diversity may be indicative of an ice age refugium in the Balkans, next to putative ones in Italy and Spain. Direct evidence for such refugia in Black poplar was obtained in a study on the cpDNA haplotype distribution across Europe (J.E. Cottrell, personal communication).

Fig. 5
PCO analysis based on SSR data of individual trees. For clarity, the results are plotted by country. The first and second principal coordinates, Z1 and Z2, explain 8% and 6% of the variation, respectively

Sampling effort

Can we assess which regions are underrepresented in the current gene bank collections? Two measures may be considered indicative of the relative degree of sampling effort in a country: (1) the number of duplicated trees in the collection, i.e., trees of the same genotype, but not necessarily sampled at the same location; and (2) the average genetic similarity between the trees that have unique genotypes. A high number of duplicates usually means that trees growing close to each other were collected, although this was not always the case in The Netherlands and the UK. Low genetic similarities could indicate that large parts of the country were not sampled. One could argue that the optimum sampling effort (not the most cost-efficient one) is obtained when genetic distances among trees in the collection are small but duplicates are absent. Sampling efforts in Belgium, The Netherlands, and the UK were clearly not optimal, with many duplicates. The genetic similarity between unique genotypes is largest in Belgium and in the UK (Table 1). Spain and France come closest to the optimum sampling effort, with hardly any duplicated trees and a genetic similarity among the trees of 0.43 and 0.39, respectively. Other countries, notably Austria and Germany, have lower average genetic similarities, so a significant part of the diversity may not have been sampled. Italy falls in between. It is also useful to take into account how the collections were made. The gene banks were not assembled according to a rational sampling strategy and, importantly, large areas in several countries remain unsampled, for example Northern France, Central England (and the whole of Ireland), and Western Spain (Fig. 1), the latter deserving special attention.
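The two sampling-effort measures described above can be sketched as follows. This is an illustration only: the data layout (one pair of allele sizes per locus), the use of a Jaccard index over locus-allele combinations as the similarity measure, and the function name sampling_summary are assumptions for the example and are not taken from the methods of this study.

```python
from collections import Counter
from itertools import combinations


def sampling_summary(genotypes):
    """Summarise sampling effort for one collection.

    genotypes: dict mapping a tree identifier to a tuple of per-locus
    allele pairs, e.g. ((180, 184), (210, 210)) for two SSR loci.
    """
    # Sort alleles within each locus so (184, 180) equals (180, 184).
    norm = {
        tree: tuple(tuple(sorted(locus)) for locus in g)
        for tree, g in genotypes.items()
    }
    unique = list(Counter(norm.values()))
    # Measure 1: trees that repeat a genotype already in the collection.
    duplicates = len(norm) - len(unique)

    def allele_set(g):
        return {(i, allele) for i, locus in enumerate(g) for allele in locus}

    # Measure 2: mean pairwise similarity among unique genotypes
    # (Jaccard index, an assumed choice of similarity coefficient).
    sims = []
    for a, b in combinations(unique, 2):
        sa, sb = allele_set(a), allele_set(b)
        sims.append(len(sa & sb) / len(sa | sb))
    mean_similarity = sum(sims) / len(sims) if sims else None

    return {
        "trees": len(norm),
        "unique": len(unique),
        "duplicates": duplicates,
        "mean_similarity": mean_similarity,
    }


# Hypothetical four-tree collection; t2 is a duplicate of t1.
example = {
    "t1": ((180, 184), (210, 210)),
    "t2": ((184, 180), (210, 210)),
    "t3": ((180, 180), (210, 214)),
    "t4": ((176, 188), (206, 214)),
}
summary = sampling_summary(example)
```

Under the reasoning in the text, a collection with few duplicates and a comparatively high mean similarity among unique genotypes would be read as densely but not redundantly sampled, whereas many duplicates or a low mean similarity would point to redundant or patchy sampling, respectively.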

Extrapolation of the results to understand the structure and distribution of diversity in the in-situ populations of Black poplar in Europe is difficult for several reasons. Firstly, many of the gene banks tended to concentrate sampling in one particular area. For example, in Spain almost 80% of the samples were collected from the Ebro valley and, in France, most of the sampling effort was concentrated in the south of the country. Secondly, no rational strategy was applied to capture the maximum diversity held within a country at the time of gene bank construction, partly because some of the gene banks had been set up for breeding rather than conservation purposes. Thirdly, information on the original location of many of the samples in the gene banks has been lost, making it impossible to link their molecular data to their geographical origin. Finally, the high incidence of duplicated genotypes in some countries impaired comparisons between countries, because the numbers of unique genotypes differed among countries. The occurrence of duplicates that were collected at different sites within a country indicates that not all samples are from natural populations untouched by man. Nevertheless, despite these difficulties, a careful analysis of the data has extended our understanding of the distribution of the natural genetic diversity in this species.

Knowledge of the unique genotypes contained in a gene bank is necessary when these gene banks are to act as sources for the establishment of new in-situ populations in the future. A moderate genetic differentiation is observed between the regions, with the highest diversity found in southern Europe. Our results suggest that additional sampling should be carried out in these countries so that the collections are representative of the genetic differentiation within the respective countries.