Introduction

Genebanks conserve the genetic diversity of crop species, which forms the raw material of plant breeding. If possible, genetic diversity is conserved in the form of accessions: batches of seed sampled from wild populations, traditional landraces, modern cultivars, genetic stock or other research material.

In genebank collections of self-fertilising species the genetic diversity is largely distributed between accessions. The exception may be landraces of a self-fertilising species which may be composed of mixtures of pure lines. Between accessions, genetic diversity has been studied for many crop species. A survey of methods for the analysis of this form of genetic diversity is given by Mohammadi and Prasanna (2003). Depending on the species, considerable diversity can be found within accessions. If it concerns a cross-fertilising species propagated using open pollination, the within-accession diversity will generally be large. If it concerns a strictly self-fertilising species the within-accession diversity, especially of modern varieties, will be small or even absent. Within-accession diversity has great implications for both the conservation and the use of accessions.

Homogeneous accessions of self-fertilising species may be considered to consist of one genotype. All plants of the accession are identical and, in principle, one plant could suffice to regenerate an accession. In genebank practice, however, more plants are always used, e.g. to guarantee enough seed. If within-accession diversity is present, a greater number of plants is required to regenerate an accession, since the variants occurring in the accession must be represented in the sample that is used for regeneration. In addition, if redundancy is to be reduced in a collection, within-accession diversity can cause many complications. For example, accessions will no longer be either completely identical or entirely different, but may partly overlap (van Treuren et al. 2004). It is generally assumed that within-accession diversity occurs mainly in cross-fertilising species and, to a lesser extent, in wild populations and landraces of self-fertilising species.

When molecular markers became available, it appeared that some residual diversity could be found even in modern varieties of self-fertilising species. For example, van Treuren and van Hintum (2001) showed, using AFLP markers, that even accessions of modern barley varieties contain diversity, although this crop species is usually considered completely self-fertilising. This diversity, less than 2% of the polymorphisms, can be expected to be either point mutations or remnants of the original diversity in the cross that yielded the variety.

The next logical question is how to determine the amount of diversity present within accessions. Morphological characterization can only show very few characteristics and variants, and often suffers from environmental influences. Therefore, biochemical or molecular markers are more appropriate. For genotyping plant material, various marker systems have been used: isozymes, RAPDs, RFLPs and currently also AFLPs and microsatellites. The use of molecular markers for genebank management has been discussed in many papers such as Bretting and Widrlechner (1995) and van Hintum and van Treuren (2002). Marker systems yield data sets with varying numbers of markers. In addition, reliability and reproducibility vary between marker systems. After the choice of the marker system, the remaining question is how to sample accessions and how to quantify diversity within accessions.

In this paper, six crop types of Lactuca sativa L. (lettuce) will be investigated and compared with regard to genetic diversity within accessions. In the set of data that could be used for this study, each accession was represented by two individuals. The data used in this study were generated in a much larger project aimed at characterising the entire lettuce collection of the Centre for Genetic Resources, the Netherlands (CGN) with molecular markers (van Hintum 2003). In this paper a measure of within-accession diversity is proposed, which is closely related to the gene diversity measure of Nei (1987). Statistical properties of this measure are investigated. Comparison of crop types requires determination of the inaccuracy of the estimate of within-accession diversity. The measure is used to compare six crop types of lettuce.

Materials and methods

Plant material

The plant material consisted of the complete CGN collection of L. sativa (as present at the time of sampling). Table 1 lists the numbers of accessions per lettuce crop type that were analysed in the current study. Two plants of each accession were analysed.

Table 1 Numbers of accessions of different types of L. sativa

DNA extraction

All accessions were grown in the greenhouse, one accession per pot. When the plants had two mature leaves, 1.5 cm² of leaf tissue was sampled from the third, immature leaf. Most accessions reached this stage 2 weeks after sowing. DNA was extracted from the sampled leaf tissue. For DNA extraction, a protocol based on van der Beek et al. (1992) was used.

AFLP analysis

The AFLP protocol followed the procedures described by Vos et al. (1995). An AFLP pre-screening was carried out in which 4 EcoRI/MseI +3/+3 and 20 EcoRI/MseI +3/+4 AFLP primer combinations were screened on a DNA pool containing 10 randomly chosen accessions. Based on the number of AFLP markers per lane and the distribution of the AFLP markers in a lane, the combinations of primer EcoRI + ACA with primers MseI + CTAT, MseI + CTGT and MseI + CTTG were selected. All individuals were genotyped on the basis of presence/absence of AFLP bands for these three primer combinations.

Measuring within-accession diversity

Diversity measure

The diversity measure that will be considered is based on the probability p(a,m) that two random individuals from an accession a are different with regard to a marker m. This probability is directly related to the band frequency f(a,m) of marker m in accession a. The relationship takes the form:

$$ \begin{aligned} p(a,m) &= 1 - f(a,m)^{2} - \left(1 - f(a,m)\right)^{2} \\ &= 2f(a,m)\left(1 - f(a,m)\right). \end{aligned} $$
(1)

According to Eq. 1, the probability p(a,m) as a function of the band frequency f(a,m) has a parabolic form. It is equal to zero if the band frequency f(a,m) is either zero or unity, and 0.5 if the band frequency f(a,m) is equal to 0.5. This formula is identical to the one given by Nei (1987) for heterozygosity. However, it should be noticed that in Nei’s formula f refers to allele frequencies rather than band frequencies.

The band frequency may vary between accessions and, perhaps more importantly, between lettuce types. The average probability \( \bar{p}(m) \) that two random individuals from one accession are different with regard to marker m is given by

$$ \bar{p}(m) = 2\bar{f}(m)\left(1 - \bar{f}(m)\right) - 2\,\text{var}_{a}\left(f(a,m)\right), $$
(2)

in which \( \bar{f}(m) \) is the average band frequency of marker m across accessions and \( \text{var}_{a}(f(a,m)) \) is the variance of the band frequency of marker m between accessions. As a consequence, variation between accessions in band frequency will reduce the average probability that two random individuals from one accession are different.
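As a numerical check of Eqs. 1 and 2, the following minimal Python sketch compares the average of the per-accession probabilities with the right-hand side of Eq. 2. The band frequencies are made-up illustrative values, not data from this study, and the function names are hypothetical. The variance is taken as the population variance (divisor equal to the number of accessions), which is an assumption of the sketch.

```python
def prob_different(f):
    """Eq. 1: probability that two random plants differ for band frequency f."""
    return 2.0 * f * (1.0 - f)


def average_prob_direct(freqs):
    """Average of Eq. 1 over the accessions."""
    return sum(prob_different(f) for f in freqs) / len(freqs)


def average_prob_eq2(freqs):
    """Right-hand side of Eq. 2, with the population variance between accessions."""
    n = len(freqs)
    f_bar = sum(freqs) / n
    var_a = sum((f - f_bar) ** 2 for f in freqs) / n
    return 2.0 * f_bar * (1.0 - f_bar) - 2.0 * var_a


# Hypothetical band frequencies of one marker in four accessions
example_freqs = [0.0, 0.0, 1.0, 0.5]
direct = average_prob_direct(example_freqs)
via_eq2 = average_prob_eq2(example_freqs)
print(direct, via_eq2)
```

In this toy example only the fourth accession is heterogeneous, so both routes give the same small average, well below the value that the pooled band frequency alone would suggest.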

The type of statistical distribution, which can be used to describe variation in band frequency between accessions or markers in a general way, is the beta distribution (Mood and Graybill 1963). According to this distribution the between-accession variance of the band frequency of marker m may be written as

$$ \text{var}_{a}\left(f(a,m)\right) = \phi_{a}(m)\,\bar{f}(m)\left(1 - \bar{f}(m)\right), $$
(3)

in which \( \phi_{a}(m) \) is a positive constant. The variance of the band frequency between accessions is proportional to the variance as expected from the average band frequency. As a consequence, by combining Eqs. 2 and 3, the average probability \( \bar{p}(m) \) that two random individuals from one accession are different with regard to marker m may be written as

$$ \bar{p}(m) = 2\left(1 - \phi_{a}(m)\right)\bar{f}(m)\left(1 - \bar{f}(m)\right). $$
(4)

The average probability that two random individuals from one accession are different as a function of the band frequency will follow a more or less quadratic shape. The parameter φ is closely related to the correlation of genes within individuals or inbreeding coefficient as defined by Weir and Cockerham (1984). It should be noticed again that the parameter φ refers to bands rather than alleles.
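Eq. 4 can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the analysis in this paper: it draws band frequencies from a beta distribution whose mean and variance satisfy Eq. 3, using the standard result that a beta distribution with shape parameters α and β has variance equal to mean × (1 − mean)/(α + β + 1), so that φ = 1/(α + β + 1). The function names, the seed and the parameter values are hypothetical.

```python
import random


def beta_parameters(mean, phi):
    """Shape parameters of a beta distribution with the given mean and
    variance phi * mean * (1 - mean); requires 0 < phi < 1."""
    scale = 1.0 / phi - 1.0  # equals alpha + beta
    return mean * scale, (1.0 - mean) * scale


def simulate_average_p(mean, phi, n_accessions, seed=1):
    """Monte Carlo average of 2f(1 - f) over beta-distributed band frequencies."""
    rng = random.Random(seed)
    alpha, beta = beta_parameters(mean, phi)
    total = 0.0
    for _ in range(n_accessions):
        f = rng.betavariate(alpha, beta)
        total += 2.0 * f * (1.0 - f)
    return total / n_accessions


# Hypothetical values: average band frequency 0.3 and phi = 0.5
expected = 2.0 * (1.0 - 0.5) * 0.3 * (1.0 - 0.3)  # right-hand side of Eq. 4
simulated = simulate_average_p(0.3, 0.5, 100_000)
print(expected, simulated)
```

The simulated average should agree with Eq. 4 up to Monte Carlo error; as φ approaches one, the accessions become internally uniform and the average probability approaches zero.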

Statistical analysis (1): estimation of \( \phi_{a}(m) \)

For marker m the total variance of the marker data, i.e. the variance ignoring the influence of classifying factors such as accession, is equal to

$$ V_{\text{T}}(m) = \hat{f}(m)\left(1 - \hat{f}(m)\right), $$

in which \( \hat{f}(m) \) represents the observed band frequency of marker m. This variance can be split into two components, one component representing the variation between accessions, \( V_{\text{B}}(m) \), the other component representing the variation within accessions, \( V_{\text{W}}(m) \). The parameter \( \phi_{a}(m) \) can be estimated by

$$ \hat{\phi}_{a}(m) = \frac{V_{\text{B}}(m)}{V_{\text{T}}(m)} = \frac{V_{\text{B}}(m)}{V_{\text{B}}(m) + V_{\text{W}}(m)}. $$
(5)

For each marker m, the variance components \( V_{\text{B}}(m) \) and \( V_{\text{W}}(m) \) can be estimated using residual maximum likelihood (REML) (Paterson and Thompson 1971). The calculations can be carried out using the REML facilities of Genstat (Genstat Committee 2003).

It should be noted that the estimate \( \hat{\bar{p}}(m) \) of the average probability \( \bar{p}(m) \) that two random individuals from one accession are different with regard to marker m may be written as twice the estimate \( \hat{V}_{\text{W}}(m) \) of the within-accession variance component for marker m, i.e.

$$ \hat{\bar{p}}(m) = 2\hat{V}_{\text{W}}(m). $$
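The sketch below illustrates Eq. 5 and the relation between the estimated average probability and the within-accession variance component for the balanced case of two plants per accession. It uses simple moment (ANOVA) estimators of the variance components rather than the REML fit in Genstat described above, so it is a stand-in under that assumption; the function name and the band scores are hypothetical.

```python
def variance_components(pairs):
    """Moment estimates (V_B, V_W) for one marker.

    pairs: list of (x1, x2) band scores (0 or 1) of the two plants per accession.
    """
    a = len(pairs)
    grand_mean = sum(x1 + x2 for x1, x2 in pairs) / (2 * a)
    # Within-accession mean square: one degree of freedom per accession
    ms_within = sum((x1 - x2) ** 2 / 2 for x1, x2 in pairs) / a
    # Between-accession mean square: a - 1 degrees of freedom, 2 plants each
    ms_between = sum(
        2 * ((x1 + x2) / 2 - grand_mean) ** 2 for x1, x2 in pairs
    ) / (a - 1)
    v_within = ms_within
    v_between = max((ms_between - ms_within) / 2, 0.0)  # truncate at zero
    return v_between, v_within


# Hypothetical marker: 6 accessions with the band in both plants, 3 without
# the band, and 1 accession in which the two plants differ
scores = [(1, 1)] * 6 + [(0, 0)] * 3 + [(1, 0)]
v_b, v_w = variance_components(scores)
phi_hat = v_b / (v_b + v_w)  # Eq. 5
p_hat = 2 * v_w              # estimated average probability of a difference
print(phi_hat, p_hat)
```

With two plants per accession, twice the within-accession variance component equals the proportion of accessions in which the two plants differ (here 1 out of 10).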

Statistical analysis (2): comparison of lettuce types

In order to be able to interpret the difference between lettuce types, two different analyses were carried out. First, for each combination of lettuce type and AFLP marker the average band frequency was calculated. Using these average band frequencies a principal component analysis (Chatfield and Collins 1980) was carried out to show diversification between lettuce types with regard to band frequencies of AFLP markers.

Secondly, for each accession the number of AFLP markers for which the two plants of the accession were different was determined. At the same time, for each accession the number of AFLP markers for which the two plants had marker observations was determined. Using these numbers, a comparison of lettuce types with regard to average within-accession diversity was made using a generalised linear model for overdispersed binomial data (McCullagh and Nelder 1989; Williams 1982).
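The two counts per accession described above can be obtained as in the following sketch. It covers only the data preparation and a pooled proportion per lettuce type; the generalised linear model for overdispersed binomial data is not reproduced here. Representing a missing marker observation by `None` is an assumption of the sketch, and the function names and scores are hypothetical.

```python
def count_differences(plant1, plant2):
    """Return (differing, scored) for the band scores of two plants.

    Scores are 1 (band present), 0 (band absent) or None (no observation).
    """
    scored = [
        (s1, s2) for s1, s2 in zip(plant1, plant2)
        if s1 is not None and s2 is not None
    ]
    differing = sum(1 for s1, s2 in scored if s1 != s2)
    return differing, len(scored)


def pooled_proportion(counts):
    """Pooled proportion of differing markers over the accessions of one type."""
    total_differing = sum(d for d, _ in counts)
    total_scored = sum(n for _, n in counts)
    return total_differing / total_scored


# Hypothetical scores for two accessions of the same lettuce type
accession_a = count_differences([1, 0, 1, None, 0], [1, 1, 1, 0, 0])
accession_b = count_differences([0, 0, 1, 1], [0, 0, 1, 1])
proportion = pooled_proportion([accession_a, accession_b])
print(accession_a, accession_b, proportion)
```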

All calculations were carried out using Genstat (Genstat Committee 2003).

Effects of errors in the data

Estimates of diversity measures may be affected by errors in the data. Suppose \( p_{0} \) denotes the true probability that two individuals within an accession are different with regard to a marker. Furthermore, suppose that errors occur at random at a rate equal to r. Then, the ‘realised’ probability \( p_{\text{r}} \) that individuals within an accession are different is equal to

$$ \begin{aligned} p_{\text{r}} &= 2r(1 - r) + p_{0}\left(1 - 4r(1 - r)\right) \\ &= 2r(1 - r) + 2f(1 - f)\left(1 - 4r(1 - r)\right), \end{aligned} $$

in which f denotes the band frequency. The relative error expressed as a percentage,

$$ 100(p_{{\text{r}}} - p_{0} )/p_{0} , $$

can be used to gauge the effect of random errors on the gene diversity. It should be noticed that if f=0.5, random errors have no effect on the probability that two individuals within one accession are identical.
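The effect of the error rate can be explored numerically with the short sketch below, which evaluates the realised probability and the relative error defined above. The function names and the chosen band frequencies are illustrative assumptions.

```python
def realised_probability(f, r):
    """Realised probability of a difference for band frequency f and error rate r."""
    p0 = 2.0 * f * (1.0 - f)
    flip = 2.0 * r * (1.0 - r)  # exactly one of the two scores is in error
    return flip + p0 * (1.0 - 2.0 * flip)


def relative_error_percent(f, r):
    """Relative error 100 * (p_r - p_0) / p_0; undefined for f = 0 or f = 1."""
    p0 = 2.0 * f * (1.0 - f)
    return 100.0 * (realised_probability(f, r) - p0) / p0


# Error rate of 1% at a rare band and at an intermediate band frequency
error_rare = relative_error_percent(0.01, 0.01)
error_mid = relative_error_percent(0.5, 0.01)
print(error_rare, error_mid)
```

For a band frequency of 0.01 and an error rate of 1%, the realised probability is almost twice the true probability, whereas at a band frequency of 0.5 the relative error vanishes.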

Results

Band frequencies

In total, 151 polymorphic AFLP markers were scored: 54, 32 and 65 for the three combinations of primer EcoRI + ACA with primers MseI + CTAT, MseI + CTGT and MseI + CTTG, respectively. Figure 1a shows a histogram of the band frequencies of all 151 polymorphic AFLP markers. Most AFLP markers have a band frequency either close to zero or close to unity (39 AFLP markers have a band frequency smaller than 0.01 and 28 AFLP markers have a band frequency larger than 0.99). Figure 1b shows that variation in band frequency between accessions is much smaller than that between markers. This may be because the majority of AFLP markers have a band frequency close to zero or unity and, as a consequence, do not vary across accessions.

Fig. 1 Histogram of band frequencies: (a) AFLP markers, (b) accessions

Diversity within accessions

For each combination of accession and AFLP marker it was determined whether the two individuals were different. For each AFLP marker the proportion of accessions, for which the two plants were different, was calculated. A histogram of these proportions is shown in Fig. 2a. For each accession the proportion of AFLP markers for which the two plants were different was calculated. A histogram of these proportions is shown in Fig. 2b. For 14 AFLP markers, plants were identical within accessions for all 1,390 accessions. For 938 accessions, no difference between the two plants was found with regard to all AFLP markers.

Fig. 2 (a) Histogram of the proportion of accessions for which the two plants were different. (b) Histogram of the proportion of AFLP markers for which the two plants were different

Relationship between within-accession diversity and band frequency

For each marker the value of the between-accession variance with regard to band frequency has been plotted against the total variance in Fig. 3a. This figure shows that the parameter φ, i.e. the ratio of the between-accession variance and the total variance, is more or less constant across markers. The estimate \( \hat{\phi} \) of φ [obtained by linear regression with a fixed intercept at (0,0)] is 0.955 (SE=0.0014).
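A regression through the origin of this kind can be sketched as follows. The code uses the ordinary least-squares formulas for a no-intercept model; the function name and the three data points are hypothetical and do not reproduce the values of this study.

```python
from math import sqrt


def slope_through_origin(x, y):
    """Least-squares slope and standard error for the model y = b * x."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance with n - 1 degrees of freedom (one fitted parameter)
    se = sqrt(rss / (len(x) - 1) / sxx)
    return b, se


# Hypothetical total and between-accession variances for three markers
total_variance = [0.10, 0.20, 0.25]
between_variance = [0.095, 0.19, 0.24]
phi_slope, phi_se = slope_through_origin(total_variance, between_variance)
print(phi_slope, phi_se)
```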

Fig. 3 (a) The variance in band frequency between accessions versus the total variance. (b) The proportion of accessions for which the two plants are different versus the band frequency

For each marker the proportion of accessions for which the two plants are different has been plotted against the band frequency in Fig. 3b. This figure shows that, in general, the larger proportions of accessions for which the two plants are different are found if the band frequency is intermediate. The quadratic shape of the relationship shown in Fig. 3b is in agreement with the properties derived above. The solid curve is a graphical representation of Eq. 4 using \( \hat{\phi} = 0.955 \).

Principal component analysis of lettuce types with regard to band frequency

A principal component analysis was carried out on the band frequencies obtained for combinations of L. sativa types and AFLP markers. Figure 4 shows the results for the first two axes. These account for 55.3 and 23.1% of the variance, respectively.

Fig. 4 Results of a principal component analysis on average band frequencies: (a) scores of the lettuce types (indicated by abbreviated names), (b) loadings of the markers (indicated by marker numbers)

Figure 4a shows that the first axis mainly represents a difference between type Stalk lettuce and the others, while the second axis represents a difference between Butterhead lettuce and the other types. A major conclusion is that AFLP markers that have very different band frequencies in different lettuce types are present. As a consequence these markers will contribute differently in different lettuce types to the within-accession diversity measures.

Figure 4b shows the loadings of the AFLP markers for the first two principal axes. The loadings are the contributions of the individual AFLP markers to the first and second principal axes. Figure 4b shows that a large number of AFLP markers do not contribute to differences between the lettuce types; the loadings of these markers are found close to the origin (0,0). Some AFLP markers show large differences in band frequency between Stalk lettuce and the other lettuce types (e.g. AFLP markers with numbers 72 and 42 and AFLP markers close to these markers). Other markers show large differences in band frequency between Butterhead lettuce and the other lettuce types (e.g. AFLP markers with numbers 49 and 103 and AFLP markers close to these markers).

Comparison of lettuce types

For each lettuce type the value of the average proportion of accessions with a difference within accession is shown in Table 2. Each value is accompanied by the corresponding standard error. The type Stalk lettuce has the highest within-accession diversity, whereas type Crisp lettuce has the smallest within-accession diversity.

Table 2 Estimates of the average percentage of accessions with a difference within accession for the different lettuce types accompanied by standard errors

The effect of errors on within-accession diversity

Figure 5 shows the relative percent error as a function of the band frequency if the error rate is 1%. It shows that errors have far more effect if band frequencies are close to zero and unity. They have very little or no effect if the band frequency is close to 0.5.

Fig. 5 The relative percent error as a function of the band frequency if the error rate is 1%

Discussion

The measure of within-accession diversity considered in this paper is very closely related to the gene diversity measure of Nei (1987). Nei’s measure refers to allele frequencies rather than band frequencies. Band frequencies could be transformed into allele frequencies by assuming the accessions are completely homozygous in the case of a self-fertilising species or in Hardy–Weinberg equilibrium in the case of an out-breeding species. However, this is unnecessary for the purpose of this analysis or, in general, for maintaining genebank material.

The probability that two individuals of one accession are different with respect to the set of AFLP markers obtained in this study is on average approximately 1%. Results (see Table 2) were obtained using a generalised linear model for overdispersed binomial data. The differences between the lettuce types can be traced back to the different origins of the accessions. The accessions of Crisp and Butterhead lettuce are mainly modern breeders’ varieties; in order to obtain breeders’ rights protection the varieties have to be homogeneous. The accessions of Stalk and Cutting lettuce are mainly landraces. They are the opposite of Crisp and Butterhead lettuce with regard to domestication and crop improvement (I.W. Boukema, personal communication).

A large proportion of accessions shows no within-accession diversity. In such a case it is not sensible to try to estimate the within-accession diversity of each accession separately with great accuracy. In most cases the within-accession diversity will be estimated as zero even if the number of individuals is increased. A sensible strategy would be to estimate the within-accession diversity by the proportion of accessions in which the individuals show differences, on the basis of a small number of individuals per accession. Two individuals per accession would be a reasonable choice.

Cross-fertilising species naturally exhibit a much larger within-accession diversity than found in this study. In such a case, estimation of the within-accession diversity may require another approach, with more than two plants per accession and based on the proportion of pairs of different individuals per accession. The proportion P of pairs of individuals that are different with regard to a marker is equal to

$$ P = 2F{\left( {N - F} \right)}/N^{2} , $$

where F and N denote the number of individuals carrying a band and the total number of individuals in an accession, respectively.
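As a small illustration of this formula, the sketch below computes P for a hypothetical accession; the function name and the counts are assumptions chosen for the example. With band frequency f = F/N the expression reduces to 2f(1 − f), the form of Eq. 1.

```python
def proportion_different_pairs(n_band, n_total):
    """P = 2F(N - F) / N^2 for F plants carrying the band out of N plants."""
    return 2.0 * n_band * (n_total - n_band) / n_total ** 2


# Hypothetical accession: 3 of 10 plants carry the band
pairs_proportion = proportion_different_pairs(3, 10)
print(pairs_proportion)
```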

In this study, the majority of AFLP markers have band frequencies close to zero or unity. An intrinsic property of these markers is that they contribute very little to the within-accession diversity. Moreover, they are more vulnerable to typing errors than markers with band frequencies around 0.5. The effect of errors can be reduced by selecting markers with a band frequency between 0.3 and 0.7 within the group of accessions considered.

The principal component analysis shows that lettuce types are different with regard to band frequency for a number of AFLP markers, i.e. the markers with high absolute loadings. Since within-accession diversity is directly linked to band frequency, this means that markers may contribute differently to the average within-accession diversity of different lettuce types. It should be noticed that usually markers with loadings close to zero are markers with average band frequencies close to zero or unity in the entire group of accessions.

The methods described in this paper can be applied using standard statistical software. However, general application of the methods would require further testing of the model on data from other self-fertilising species, e.g. barley.

The accessions of lettuce considered in this paper exhibit very little within-accession diversity. An important problem for genebank curators is whether they want to maintain these low levels of within-accession diversity in future generations. Regeneration of accessions would inevitably require large numbers of plants.