Introduction

Since the 1990s, a large number of DNA markers has been available for sugar beet (e.g., Barzen et al. 1992, 1995; Pillen et al. 1992, 1993). These markers have been used to map major genes for bolting (El-Mezawy et al. 2002), fertility restoration (Pillen et al. 1993), hypocotyl color (Barzen et al. 1992), and resistance against nematodes (Cai et al. 1997). However, the majority of traits of economic interest in sugar beet breeding programs such as beet yield, sugar yield, as well as quality parameters, are quantitatively inherited (Weber et al. 2000).

In sugar beet, linkage mapping was employed to dissect quantitative traits into underlying genetic factors, called quantitative trait loci (QTL). Weber et al. (1999, 2000) detected QTL for sugar yield based on two segregating populations grown in different locations. Schneider et al. (2002) identified QTL for sugar yield and quality parameters based on one segregating population using expressed sequence tag related markers.

The major limitations of linkage mapping approaches are the poor resolution in detecting QTL and that only two alleles at any given locus can be studied in biparental crosses of inbred lines (Flint-Garcia et al. 2003). Association mapping methods, which were successfully applied in human genetics to detect genes coding for human diseases (e.g., Ozaki et al. 2002), promise to overcome these limitations (Kraakman et al. 2004). Therefore, several attempts have been made in plant genetics to detect QTL with such methods (e.g., Breseghello and Sorrells 2006; Wilson et al. 2004).

In comparison with linkage mapping approaches, however, association mapping approaches have a low power to detect QTL in genome-wide scans (Yu and Buckler 2006). Yu et al. (2008) proposed the nested association mapping (NAM) strategy for plants, which combines the high power to detect QTL in genome-wide approaches of linkage mapping with the high mapping resolution of association mapping approaches. This strategy requires the establishment of segregating populations derived from several crosses of parental inbreds in a systematic manner. For plant breeders, however, it is an appealing idea to exploit information on the individuals routinely derived from multiple, related crosses in plant breeding programs for QTL detection using a similar approach called association mapping in multiple segregating populations (AMMSP). To our knowledge, no study has investigated the applicability of this mapping strategy in a plant breeding context based on empirical data.

The objectives of our research were to (1) examine the applicability of AMMSP in a plant breeding context based on sugar beet segregating populations of various sizes, (2) compare different biometric approaches for AMMSP, and (3) detect markers with significant main effect across locations for nine traits in sugar beet.

Materials and methods

Plant materials

Our study was based on 765 F2, two F3, and one F4 sugar beet (Beta vulgaris L.) genotypes which were randomly derived from 19 crosses among diploid elite sugar beet clones. The number of progenies from each cross ranged from 3 to 72 (Table 1). Testcross progenies were produced by crossing the 768 \(F_n\) genotypes (n = 2, 3, 4), for which pedigree information was available up to eight generations back (Fig. 1), to one diploid tester. All plant material used in this study was provided by the breeding company Strube–Dieckmann.

Table 1 Description of the 19 segregating populations underlying our study
Fig. 1 Pedigree relatedness of the 19 segregating populations of sugar beet underlying our study

Field experiments

In 2004, all 768 testcross progenies were evaluated in a series of 24 field trials that included a set of four common checks. Two of the 24 trials were grown in 12 locations and the others in 11 locations. Not all series of trials were performed in the same locations and, thus, the total number of locations was 21 with trials located in France (10 locations), Germany (7 locations), Belgium (2 locations), and the Netherlands (2 locations). The number of trials per location ranged from 6 to 24. The experimental design of each trial was a 6 × 6 lattice design with two replicates per location. The 32 testcross progenies and four checks of each trial were grown in three-row plots of 7 m length and a plant density of 92 063 plants ha−1.

Data were recorded for beet yield (BY; Mg ha−1). Contents of potassium (K; mmol kg−1), sodium (Na; mmol kg−1), α-amino nitrogen (N; mmol kg−1), and sugar (SC; %) were measured as described by Burba and Puscz (1976). Impurity content (IC; %) was calculated as [0.343 × (K + Na) + (0.094 × N) + 0.29]/SC × 100. White sugar content (WSC; %), sugar yield (SY; Mg ha−1), and white sugar yield (WSY; Mg ha−1) were calculated as SC − IC, BY × SC, and BY × WSC, respectively.

Molecular marker analyses

A subset of the 768 \(F_n\) entries comprising 369 F2 progenies from nine crosses (Table 1) was fingerprinted by Strube–Dieckmann according to standard protocols with 49 simple sequence repeat markers and nine restriction fragment length polymorphism markers. These markers were randomly distributed across the sugar beet genome with an average marker distance of 8.4 cM. Map positions of all markers were based on the linkage map of Strube–Dieckmann (unpublished data).

Statistical analyses

The entries of our study were randomly derived from 19 crosses and, thus, main effects for entries as well as their interaction with the locations were regarded as random in the one-step joint linkage association analyses. Furthermore, we also performed two-step AMMSP in which entries were regarded as fixed in the first step but as random in the second step (cf., Piepho and Möhring 2007).

Phenotypic data analysis

The phenotypic data of each of the 21 locations were first analyzed separately based on the statistical model:

$$ y_{ikno} = \mu + g_i + t_{k} + r_{nk} + b_{onk} + e_{ikno}, $$
(1)

where \(y_{ikno}\) was the phenotypic observation for the ith sugar beet genotype in the oth incomplete block of the nth replication of the kth trial, μ was an intercept term, \(g_i\) was the genetic effect of the ith genotype, \(t_k\) was the effect of the kth trial, \(r_{nk}\) was the effect of the nth replication of the kth trial, \(b_{onk}\) was the effect of the oth incomplete block of the nth replication of the kth trial, and \(e_{ikno}\) was the residual. Except \(g_i\), all effects were regarded as random. For the genotypes (entries and checks), adjusted entry means were calculated for the jth location according to:

$$ M_{ij} = \widehat{\mu_j} + \widehat{g_{ij}}, $$
(2)

where \(\widehat{\mu_j}\) was the estimate for the intercept at the jth location and \(\widehat{g_{ij}}\) was the estimate of the genetic effect of the ith sugar beet genotype at the jth location, both estimated based on the statistical model in Eq. (1). A hierarchical cluster analysis was performed on the correlation coefficients of adjusted entry means \(M_{ij}\) among all pairs of locations.

A combined analysis of plot-level phenotypic data across locations was performed based on the statistical model:

$$ y_{ijkno} = \mu + c_i + d_{i}g_i + l_j + (cl)_{ij} + d_{i}(gl)_{ij} + t_{kj} + r_{njk} + b_{onjk} + e_{ijkno}, $$
(3)

where \(y_{ijkno}\) was the phenotypic observation for the ith sugar beet genotype at the jth location in the oth incomplete block of the nth replication of the kth trial, μ was an intercept term, \(c_i\) was a factor with a single level for each check and a single level for all entries, \(d_i\) was an indicator variable with \(d_i = 0\) for checks and \(d_i = 1\) for entries, \(g_i\) was the genetic effect of the ith genotype, \(l_j\) was the effect of the jth location, \((cl)_{ij}\) was the interaction effect of the ith check and the jth location, \((gl)_{ij}\) was the interaction effect of the ith entry and the jth location, \(t_{kj}\) was the effect of the kth trial at the jth location, \(r_{njk}\) was the effect of the nth replication of the kth trial at the jth location, \(b_{onjk}\) was the effect of the oth incomplete block of the nth replication of the kth trial at the jth location, and \(e_{ijkno}\) was the residual. For details regarding the use of dummy coding for separating checks and entries see Piepho et al. (2006). Because locations were purposefully selected, \(l_j\) was regarded as fixed. For estimation of adjusted entry means over all trials and locations, we regarded \(c_i\) and \(d_i g_i\) as fixed and all other effects as random. Error variances were assumed to be heterogeneous among locations. For each of the 768 entries, an adjusted entry mean \(M_i\) over all trials and locations was calculated as:

$$ M_i = \hat{\mu} + \widehat{g_i}, $$
(4)

where \(\hat{\mu}\) was the estimate for the intercept and \(\widehat{g_i}\) was the estimate of the genetic effect of the ith entry, both estimated based on the statistical model in Eq. (3).

For estimation of variance components, the statistical model in Eq. (3) was used, where \(l_j\) and \(c_i\) were regarded as fixed and all other effects as random. In order to consider the relatedness among entries with (\(\mathbf{g}_1\)) and without (\(\mathbf{g}_2\)) available marker data, we assumed:

$${{\mathbf{g}}} = \left( \begin{array}{l} {{\mathbf{g}}}_1\\ {{\mathbf{g}}}_2\\ \end{array} \right) \sim {\rm MVN} \left[\left( \begin{array}{l} {{\mathbf{0}}}\\{{\mathbf{0}}}\\ \end{array} \right), \left( \begin{array}{ll} {{\mathbf{A}}}_{11}&{{\mathbf{A}}}_{12}\\ {{\mathbf{A}}}_{21}&{{\mathbf{A}}}_{22}\\ \end{array} \right) \sigma_{g}^2 \right]$$
(5)

where \(\mathbf{A}_{11}\) and \(\mathbf{A}_{22}\) are the matrices of coancestry coefficients that define the degree of genetic covariance among entries with (\(\mathbf{A}_{11}\)) and without (\(\mathbf{A}_{22}\)) available marker data, \(\mathbf{A}_{12}\) is the matrix of coancestry coefficients between entries with and entries without available marker data, and \(\mathbf{A}_{21} = \mathbf{A}_{12}^\prime\). The coancestry coefficients were estimated based on the available pedigree records, according to the rules described by Falconer and Mackay (1996) and using PROC INBREED in SAS (SAS Institute 2004).
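The recursive rules for coancestry can be illustrated with a minimal sketch. It is not the PROC INBREED computation used in the study; it assumes that founders are unrelated and non-inbred, that parents are listed before their offspring, and all names (`coancestry_matrix`, `example_pedigree`) are hypothetical.

```python
from functools import lru_cache


def coancestry_matrix(pedigree):
    """Return (names, matrix) of coancestry coefficients.

    pedigree: list of (name, parent1, parent2); founders have (None, None)
    and every parent must be listed before its offspring (assumption).
    """
    order = {name: i for i, (name, _, _) in enumerate(pedigree)}
    parents = {name: (p1, p2) for name, p1, p2 in pedigree}

    @lru_cache(maxsize=None)
    def f(x, y):
        # Unknown parents of founders contribute no identity by descent.
        if x is None or y is None:
            return 0.0
        if x == y:
            # Coancestry with itself: (1 + F_x) / 2, where the inbreeding
            # coefficient F_x is the coancestry of the two parents.
            p, q = parents[x]
            return 0.5 * (1.0 + f(p, q))
        # Recurse through the younger individual so that it can never be
        # an ancestor of the other one.
        if order[x] < order[y]:
            x, y = y, x
        p, q = parents[x]
        return 0.5 * (f(p, y) + f(q, y))

    names = [name for name, _, _ in pedigree]
    return names, [[f(a, b) for b in names] for a in names]


# Hypothetical pedigree: two founders, their F1, and two F2 plants
# obtained by selfing the F1.
example_pedigree = [
    ("A", None, None),
    ("B", None, None),
    ("F1", "A", "B"),
    ("S1", "F1", "F1"),
    ("S2", "F1", "F1"),
]
names, coancestry = coancestry_matrix(example_pedigree)
```

In this toy pedigree the two F2 plants have a coancestry of 0.5 with each other and 0.75 with themselves, which shows how selfing generations raise the coefficients within a cross.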

To avoid excessive computing time, we neglected the relatedness of the entries with respect to genotype × location interaction and assumed:

$$({{\mathbf{gl}}}) = \left( \begin{array}{l} ({{\mathbf{g}}}_1{{\mathbf{l}}})\\ ({{\mathbf{g}}}_2 {{\mathbf{l}}})\\ \end{array} \right) \sim {\rm MVN}\left[\left( \begin{array}{l} {{\mathbf{0}}}\\{{\mathbf{0}}}\\ \end{array} \right), \left( \begin{array}{ll} {{\mathbf{I}}}&{{\mathbf{0}}}\\ {{\mathbf{0}}}&{{\mathbf{I}}}\\ \end{array} \right) \sigma_{gl}^2 \right], $$
(6)

where I was an identity matrix. Heritability on an adjusted entry mean basis \(h_g^2\) was calculated according to Emrich et al. (2008). The correlation \(\rho_p\) was calculated between the plot-level phenotypic data across locations of two traits, and \(\rho_g\) between the adjusted entry means \(M_i\) of two traits.

One-step approach for AMMSP

In the current study, the AMMSP approach included not only the entries with available marker data but also the entries without available marker data. Under a statistical model with random entries that models the relatedness among the entries of both sets, the phenotypic observations of the entries without available marker data improve the estimation of the genotypic value of the entries with available marker data. Thereby, the power for detection of marker-phenotype associations is increased.

In studies based on testcross progenies with a common tester, no dominance effects can be estimated, because the effects of allele substitution \(({\varvec{\alpha}})\) comprise also the dominance effects between parental alleles and those of the tester (Melchinger 1988). Therefore, we used the following statistical model for one-step AMMSP:

$$ \begin{aligned} y_{ijkno} &= \mu + c_i + l_j + ({\varvec{\alpha}} + {\varvec{\alpha}}{{\mathbf{l}}}_j)^\prime {{\mathbf{x}}}_i\\ &\quad + d_{i} \check{g}_i + (cl)_{ij} + d_{i}(\check{g}l)_{ij} + t_{kj} + r_{njk} + b_{onjk} + e_{ijkno}, \end{aligned} $$
(7)

where \({\varvec{\alpha}}{{\mathbf{l}}}_j\) were the interaction effects of the allele substitution effects with the jth location, \({{\mathbf{x}}}_i\) was a column vector with the number of copies of the corresponding alleles, and \(\check{g}_i\) was the residual genetic effect of the ith entry except for the effect of the marker locus under consideration. We regarded \(d_{i}\check{g}_i, (cl)_{ij}, d_{i}(\check{g}l)_{ij}, t_{kj}, r_{njk},\) and \(b_{onjk}\) as random and all other effects as fixed. However, the expected value of the genotypic effects of entries without available marker data is no longer zero when conditioning on the markers. To overcome this problem, we used the following definition of the variance-covariance matrix (for derivation see Appendix):

$$ \left( \begin{array}{l} {\check{{\mathbf{g}}}}_1\\ {\check{{\mathbf{g}}}}_2\\ \end{array} \right) \sim {\rm MVN}\left[ \left( \begin{array}{l} {{\mathbf{X}}}_1{\varvec{\alpha}}\\ {{\mathbf{X}}}_2{\varvec{\alpha}}\\ \end{array} \right), \left( \begin{array}{ll} (1-p_g){{\mathbf{A}}}_{11}&(1-p_g){{\mathbf{A}}}_{12}\\ (1-p_g){{\mathbf{A}}}_{21}&{{\mathbf{A}}}_{22}-p_g {{\mathbf{A}}}_{21}{{\mathbf{A}}}^{-1}_{11}{{\mathbf{A}}}_{12}\\ \end{array} \right) \sigma_{\check{g}}^2 \right], $$
(8)

where \(\mathbf{X}_1\) was a 369 × (p − 1) matrix with p being the number of alleles of the marker locus under consideration (i.e., the ith row of \(\mathbf{X}_1\) equalled \(\mathbf{x}_i\)). The mth column of this matrix was calculated as the number of copies of the (m + 1)th allele minus the number of copies of the first allele observed for the corresponding genotypes. The matrix \(\mathbf{X}_2\) was calculated as \(\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{X}_1\), and \(p_g\) was the proportion of the genotypic variance explained by the marker locus under consideration.
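The column rule for the marker design matrix can be sketched as follows. This is only an illustration of the coding described above; the function names, allele labels, and genotypes are hypothetical, and each genotype is assumed to be given as a pair of allele labels.

```python
def allele_design_row(genotype, alleles):
    """Code one diploid genotype against the first (reference) allele.

    genotype: tuple with the two allele labels of an entry.
    alleles:  ordered list of the p alleles at the locus; alleles[0] is the
              reference allele (assumption: the order is fixed beforehand).
    Returns p - 1 values: copies of the (m + 1)th allele minus copies of
    the first allele.
    """
    n_first = genotype.count(alleles[0])
    return [genotype.count(allele) - n_first for allele in alleles[1:]]


def build_design_matrix(genotypes, alleles):
    """Stack the coded rows for all genotyped entries."""
    return [allele_design_row(g, alleles) for g in genotypes]


# Hypothetical locus with p = 3 alleles and four genotyped entries.
example_alleles = ["a1", "a2", "a3"]
example_genotypes = [("a1", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "a3")]
x1 = build_design_matrix(example_genotypes, example_alleles)
# x1 == [[-2, -2], [0, -1], [1, 1], [0, 2]]
```

With p = 3 alleles the matrix has p − 1 = 2 columns, so the first allele acts as the reference against which the other allele substitution effects are expressed.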

From Eq. (8) we have \(\hbox{Var}({\check{{\mathbf{g}}}}) = {{\mathbf{K}}}_1 \sigma_{\check{g}}^2 + {{\mathbf{K}}}_2 \sigma_{\check{g}_{p_g}}^2,\) where \({\check{{\mathbf{g}}}} = ({\check{{\mathbf{g}}}}_1, {\check{{\mathbf{g}}}}_2),\)

$$ {{\mathbf{K}}}_1= \left(\begin{array}{ll} {{\mathbf{A}}}_{11}&{{\mathbf{A}}}_{12}\\ {{\mathbf{A}}}_{21}&{{\mathbf{A}}}_{22}\\ \end{array}\right), \quad{{\mathbf{K}}}_2= \left( \begin{array}{ll} -{{\mathbf{A}}}_{11}&-{{\mathbf{A}}}_{12}\\ -{{\mathbf{A}}}_{21}&-{{\mathbf{A}}}_{21}{{\mathbf{A}}}^{-1}_{11}{{\mathbf{A}}}_{12} \\ \end{array} \right),$$

and \(\sigma_{\check{g}_{p_g}}^2 =p_g \sigma_{\check{g}}^2,\) such that \(p_g = \sigma_{\check{g}_{p_g}}^2/\sigma_{\check{g}}^2.\) The proportion of the genotypic variance explained by the marker locus under consideration (\(\widehat{p_g}\)) can thus be calculated from variance component estimates as

$$ {\widehat{p_g}} = {\frac{\widehat{\sigma_{\check{g}_{p_g}}^2}} {\widehat{\sigma_{\check{g}}^2}}}. $$
(9)
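As a consistency check that uses only the definitions of \({{\mathbf{K}}}_1\), \({{\mathbf{K}}}_2\), and \(\sigma_{\check{g}_{p_g}}^2 = p_g \sigma_{\check{g}}^2\) given above, the two-component form can be expanded block by block, which recovers the variance-covariance matrix of Eq. (8):

$$ {{\mathbf{K}}}_1 \sigma_{\check{g}}^2 + {{\mathbf{K}}}_2\, p_g \sigma_{\check{g}}^2 = \left( \begin{array}{ll} (1-p_g){{\mathbf{A}}}_{11}&(1-p_g){{\mathbf{A}}}_{12}\\ (1-p_g){{\mathbf{A}}}_{21}&{{\mathbf{A}}}_{22}-p_g {{\mathbf{A}}}_{21}{{\mathbf{A}}}^{-1}_{11}{{\mathbf{A}}}_{12}\\ \end{array} \right) \sigma_{\check{g}}^2. $$

Written this way, the covariance structure is linear in the two variance components, so \(\widehat{p_g}\) follows directly from their estimates.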

Two-step approaches for AMMSP

Approach based on adjusted entry means for each location: This approach, which was designated as two-step approach A, was based on the adjusted entry means \(M_{ij}\) calculated for each of the 21 locations. These adjusted entry means were then used in a second step for joint linkage and association analyses based on the statistical model:

$$ M_{ij} = \mu + c_i + l_j + ({\varvec{\alpha}} + {{\varvec{\alpha}}{{\mathbf{l}}}}_j)^\prime {{\mathbf{x}}}_i + d_{i}\check{g}_i + e_{ij}, $$
(10)

where all effects except \(d_{i}\check{g}_i\) were regarded as fixed. The assumptions made in the one-step approach for AMMSP concerning \(\hbox{Var}({\check{{\mathbf{g}}}})\) were also made in this approach.

Approach based on adjusted entry means over all locations: This approach, which was designated as two-step approach B, was based on the adjusted entry means \(M_i\) calculated over all locations. The \(M_i\) of the 369 entries with available marker data were then used in a second step for joint linkage and association analyses based on the statistical model:

$$ M_i = \mu + {\varvec{\alpha}}^\prime {{\mathbf{x}}}_i + \check{g}_{1i} + e_i, $$
(11)

where \(\check{g}_{1i},\) the residual genetic effect of the ith entry with marker data except for the effect of the marker locus under consideration, was regarded as random and all other effects were regarded as fixed. We assumed \(\hbox{Var}({\check{{\mathbf{g}}}}_1) = 2{{{\mathbf{A}}}_{11}}\sigma_{\check{g}_1}^2.\) In contrast to the two-step approach A, the two-step approach B does not allow inferences about the interaction effects of the allele substitution effects with the locations.

Weighting method for two-step AMMSP: The covariance matrix for the vector of residuals e is denoted here as R (Lynch and Walsh 1998). In two-step mixed-model procedures, the matrix R may be chosen in different ways to approximate the actual variance–covariance matrix of adjusted entry means denoted here as V.

We used for both two-step AMMSP approaches the weighting method proposed by Smith et al. (2001). For the two-step approach A, this method is based on the V matrix calculated for each location (\(\mathbf{V}_j\)) with \(\mathbf{R}_j^{-1} = D(\mathbf{V}_j^{-1})\), where \(D(\mathbf{V}_j^{-1})\) was a diagonal matrix with diagonal elements equal to those of \(\mathbf{V}_j^{-1}\). The matrix \(\mathbf{R}^{-1}\) was then calculated as: \({\mathop \oplus\limits_{j=1}^{21}} {\bf R}_j^{-1}.\) For the two-step approach B, \(\mathbf{R}^{-1} = D(\mathbf{V}^{-1})\), where V was the actual variance–covariance matrix of adjusted entry means calculated over all locations.
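The weighting rule can be sketched in a few lines. This is an illustration under stated assumptions, not the ASReml computation of the study: the variance–covariance matrices are small made-up examples, the helper names are hypothetical, and the inverse is obtained with a plain Gauss-Jordan elimination.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def diagonal_weights(v):
    """Diagonal of V^-1, i.e. the diagonal elements of R^-1 = D(V^-1)."""
    v_inv = invert(v)
    return [v_inv[i][i] for i in range(len(v))]


def stacked_weights(v_by_location):
    """Direct sum over locations: the diagonal of R^-1 is the concatenation
    of the per-location diagonals of V_j^-1."""
    weights = []
    for v_j in v_by_location:
        weights.extend(diagonal_weights(v_j))
    return weights


# Made-up V_j matrices for two locations with two adjusted means each.
example_v = [[[2.0, 1.0], [1.0, 2.0]], [[4.0, 0.0], [0.0, 1.0]]]
weights = stacked_weights(example_v)  # approximately [2/3, 2/3, 0.25, 1.0]
```

Because only the diagonal of each inverse is kept, every adjusted entry mean receives a single weight in the second step, which approximates the full matrix V while discarding the covariances among the means.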

In a linkage mapping context, the use of the false discovery rate to overcome the multiple test problem is dubious (Chen and Storey 2006). Therefore, for all abovementioned AMMSP approaches we applied the Bonferroni–Holm procedure (Holm 1979) to detect markers with significant (P < 0.05) (1) main effects across locations and (2) marker × location interactions. The total proportion of the genotypic variance explained by all markers with significant main effect was obtained by fitting a model including all these markers simultaneously. All mixed-model calculations were performed with ASReml release 2.0 (Gilmour et al. 2006).
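The step-down logic of the Bonferroni–Holm procedure can be sketched as follows. The function name and the P values are hypothetical, and the sketch makes no claim about which family of tests (for example, the markers tested per trait) was corrected jointly in the study.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is
    rejected by the Bonferroni-Holm step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with
        # alpha / (m - k + 1); testing stops at the first failure.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


# Hypothetical P values for four markers.
example_p = [0.001, 0.04, 0.03, 0.012]
decisions = holm_reject(example_p)
# decisions == [True, False, False, True]
```

In the example, 0.03 exceeds 0.05/2, so testing stops there and the larger P value 0.04 is not examined even though it is below 0.05.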

Linkage disequilibrium (LD) was assessed by the composite linkage disequilibrium measure Δ (Weir 1996). Significance of Δ was tested with χ² tests. LD computations were performed with the GDA 1.0 software (Lewis and Zaykin 1999).
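For orientation, the composite measure can be illustrated for one pair of alleles. The sketch below uses the estimator as it is commonly given for two biallelic loci, Δ = n_AB/n − 2 p_A p_B with n_AB = 2 n_AABB + n_AABb + n_AaBB + n_AaBb/2; it is not the GDA implementation, it omits the χ² test, and the function name and genotype coding are hypothetical.

```python
def composite_ld(genotypes):
    """Composite LD for alleles A and B from unphased genotypes.

    genotypes: list of (copies_of_A, copies_of_B) per individual, each
    count being 0, 1, or 2 (assumed coding).
    """
    n = len(genotypes)
    p_a = sum(a for a, _ in genotypes) / (2 * n)
    p_b = sum(b for _, b in genotypes) / (2 * n)
    # a * b / 2 gives 2 for AABB, 1 for AABb and AaBB, and 0.5 for AaBb,
    # which reproduces the weighted genotype count n_AB.
    n_ab = sum(a * b / 2 for a, b in genotypes)
    return n_ab / n - 2 * p_a * p_b


# Hypothetical samples: complete association versus no association.
associated = [(2, 2), (2, 2), (0, 0), (0, 0)]
independent = [(2, 2), (2, 0), (0, 2), (0, 0)]
delta_associated = composite_ld(associated)    # 0.5
delta_independent = composite_ld(independent)  # 0.0
```

The measure needs only unphased genotype counts, which is why it suits marker data for which gametic phase is unknown.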

Results

The coancestry coefficient calculated from pedigree records (Fig. 1) ranged for the 768 entries from 0.02 to 0.98 with an average of 0.39. The total number of alleles detected for the 58 molecular markers was 155, with the number of alleles per locus ranging from 2 to 6. The allele frequency of the 155 alleles varied between 0.01 and 0.98. Across the nine segregating populations, the percentage of SSR loci pairs with significant (P < 0.05) LD was 76.5% (Fig. 2). Within the segregating populations, the percentage of SSR loci pairs with significant (P < 0.05) LD was considerably lower and ranged from 2.2 to 48.6%.

Fig. 2 Linkage disequilibrium (LD; Δ) between pairs of SSR markers (above the diagonal) and its significance (P < 0.05) based on χ² tests (below the diagonal). Dark-gray coloring indicates high Δ values and significant LD. White coloring indicates Δ = 0 and no significant LD. The thin horizontal and vertical lines mark off the chromosomes

For all traits, the cluster analysis based on the correlation coefficient of adjusted entry means among all pairs of locations revealed the absence of distinct subgroups of locations (data not shown). Significant (P < 0.01) genotypic variance and significant (P < 0.01) variance of genotype × location interaction were observed for all nine traits (Table 2). Heritability on an adjusted entry mean basis was high for the nine traits and ranged from 0.82 (WSY) to 0.96 (K, IC). Correlations \(\rho_p\) ranged from −0.56 (Na/WSC) to 0.99 (SC/WSC, SY/WSY) (Table 3). Likewise, correlations \(\rho_g\) varied between −0.46 (Na/WSC) and 0.98 (SC/WSC, SY/WSY).

Table 2 First- and second-degree statistics for sugar beet testcross progenies for potassium content (K), sodium content (Na), α-amino nitrogen content (N), proportion of impurities (IC), sugar content (SC), white sugar content (WSC), beet yield (BY), sugar yield (SY), and white sugar yield (WSY)
Table 3 Correlation \(\rho_p\) between the plot-level phenotypic data (above diagonal) and correlation \(\rho_g\) between the adjusted entry means across all locations (below diagonal) for potassium content (K), sodium content (Na), α-amino nitrogen content (N), proportion of impurities (IC), sugar content (SC), white sugar content (WSC), beet yield (BY), sugar yield (SY), and white sugar yield (WSY) based on 768 sugar beet entries

Using the one-step AMMSP approach, the total number of significant (P < 0.05) marker-phenotype associations was 44 (Table 4). The number of markers with significant main effect varied from three for BY, SY, and WSY to nine for Na. The proportion of the genotypic variance explained simultaneously by all markers with significant main effect was lowest for SY (4.4%) and highest for Na (36.8%). The proportion of the genotypic variance explained by the individual markers ranged from 1.5 (Na) to 11.8% (Na). For all markers identified based on their significant main effect, significant (P < 0.05) marker × location interactions were observed, which explained between 7.9 (BY) and 20.8% (SC) of the variance of genotype × location interactions (\(\sigma_{gl}^2\)).

Table 4 Marker loci significantly (P < 0.05) associated with potassium content (K), sodium content (Na), α-amino nitrogen content (N), proportion of impurities (IC), sugar content (SC), white sugar content (WSC), beet yield (BY), sugar yield (SY), and white sugar yield (WSY), the explained proportion of the genotypic variance \((\widehat{p_g}),\) the explained proportion of the genotype × location interaction \((\widehat{p_{g\times{l}}})\) by the marker × location interaction, and the allele substitution effects \((\widehat{\alpha})\)

For the nine traits examined, Spearman rank correlations between the P values calculated for the 58 marker loci by using the one-step AMMSP approach and the P values estimated based on the two-step approach A varied between 0.972 and 0.996 (Table 5). The correlations observed between the P values of the one-step procedure and those of the two-step procedure B were slightly lower and ranged from 0.942 to 0.996 for the nine traits. The same trend was observed for the Spearman rank correlation between the allele substitution effects calculated by using the one-step approach and those using the two-step procedures for AMMSP. Two-step approach A failed to detect two markers with significant (P < 0.05) main effects, which were identified using the one-step approach (Fig. 3). In contrast, the two-step approach B failed to detect five markers previously identified using the one-step approach, but detected three marker-phenotype associations, which were not identified using the one-step approach.

Table 5 Spearman rank correlation between P values (\(\rho_P\)) and allele substitution effects (\(\rho_\alpha\)) calculated using the one-step AMMSP approach and the two-step AMMSP approaches A and B
Fig. 3 Number of markers with significant (P < 0.05) main effect detected for the nine traits under consideration using the (1) one-step joint linkage and association mapping (AMMSP) approach and (2) two-step AMMSP approaches A and B. For a detailed description of the different methods see “Materials and methods”

Discussion

The current study was based on 19 segregating populations which were derived from connected crosses of parental genotypes (Fig. 1). In previous studies, such populations were analyzed using linkage mapping methods (e.g., Rebai and Goffinet 1993). These methods, however, use only the LD within each segregating population. Yu et al. (2008) proposed the NAM strategy which in addition to the LD within each segregating population exploits also the LD present in the set of parental genotypes.

When genotypes representing global genetic diversity are used as parents of the segregating populations, NAM approaches promise to result in a mapping resolution considerably higher than that obtained with linkage mapping approaches (Stich et al. 2007). This is due to the fact that LD decays in such germplasm over a short physical distance (Wilson et al. 2004). For AMMSP approaches in a plant breeding context, however, elite genotypes must be used as parental genotypes in order to obtain QTL information of direct use in elite breeding programs (Crepieux et al. 2004). This leads to a reduction in the mapping resolution in comparison with that in the studies of Stich et al. (2007) and Yu et al. (2008), because LD is expected to decay in a set of elite genotypes over a longer physical distance than in a set of diverse genotypes (cf., Palaisa et al. 2004). Nevertheless, even in this case exploitation of the LD present in the set of elite genotypes gives AMMSP approaches a mapping resolution that is higher than that of linkage mapping approaches.

Statistical approaches for AMMSP

In experiments in which the genome sequence of all entries is available, the statistical models for AMMSP differ from those for multi-cross linkage mapping in that they neither comprise fixed population effects (e.g., Blanc et al. 2006) nor regard markers as nested within populations. The results of Yu et al. (2008) suggested that approaches neglecting population structure provide a higher power for QTL detection and a lower proportion of false positives than approaches taking population structure into account. This observation might be explained by the fact that differences among the mean performance of segregating populations derived from different crosses of parental genotypes are caused by QTL. In statistical approaches which take population structure into account by fixed population effects, the effects of these QTL are absorbed in the population effect and, thus, these QTL cannot be detected.

In the current study, however, no genome sequence was available for the entries and, thus, detection of marker-phenotype associations is based on LD between markers and trait coding polymorphisms. Therefore, population structure of the entries must be considered in order to adhere to the nominal α level. In an AMMSP context, this can be achieved by regarding entries as random and defining covariances among the entries based on a numerator relationship matrix.

Stich et al. (2008) proposed the use of marker-based estimates of the numerator relationship matrix optimally adapted to the phenotypic data for association mapping approaches. This way of calculating the numerator relationship matrix also seems to provide the best estimates of pair-wise coancestry coefficients in the context of AMMSP with respect to adherence to the nominal α level as well as the power for QTL detection (Stich et al. 2008). In the current study, however, the estimation of a marker-based numerator relationship matrix was not possible as marker data were not available for all entries. Therefore, we calculated the numerator relationship matrix from pedigree records.

Comparison of one- and two-step approaches for AMMSP

In all types of genetic mapping experiments, the one-step approach, in which the phenotypic and genotypic data analysis is performed in one step, is the only fully efficient analysis (Stich et al. 2008). Consequently, P values and allele substitution effects calculated for the marker loci under consideration based on such a statistical model are the reference values (Piepho and Pillen 2004). However, AMMSP data sets which are currently under development are of such a size that a one-step analysis of phenotypic and genotypic data might be either impossible or impractical due to excessive computing time. As this problem can be overcome by applying two-step procedures, we examined the consistency of the results of one- and two-step approaches for AMMSP.

Spearman rank correlations observed in the current study between the P values of the two-step approach B and those of the one-step approach (Table 5) were similar to those found by Stich et al. (2008) in an association mapping approach for wheat. Despite the rather high correlation, the two-step approach B failed to detect five marker-phenotype associations which were identified using the one-step procedure (Fig. 3) but identified three markers which were not detected using the one-step procedure. In addition, a rather low correlation was observed between the allele substitution effects of the two-step procedure B and the one-step procedure (Table 5). These observations suggested that for AMMSP, use of the two-step approach B, frequently applied in linkage mapping (e.g., Schön et al. 2004) and association mapping (e.g., Breseghello and Sorrells 2006) experiments, is questionable.

Spearman rank correlations between P values calculated using the two-step approach A and the one-step approach were only slightly higher than the correlation observed between two-step approach B and the one-step approach (Table 5). Nevertheless, the two-step approach A failed to detect only two marker-phenotype associations which were identified using the one-step approach and detected only marker-phenotype associations which were also identified based on the one-step approach. These observations indicated that the two-step approach A has a considerably higher power for QTL detection and results in a lower proportion of false positives than the two-step approach B. Therefore, for data sets in which the one-step analysis of phenotypic and genotypic data might be impossible or impractical, we suggest using the two-step approach A. Our further discussion, however, is restricted to results from the one-step approach due to its more desirable properties.

Detected marker-phenotype associations

In our study, the lowest number of markers with significant (P < 0.05) main effect was detected for BY, SY, and WSY, whereas a considerably higher number of markers were associated with K, Na, and N (Table 4). This observation is in accordance with findings of Schneider et al. (2002) and might be explained by the fact that BY, SY, and WSY are complex traits for which no QTL with major effects were expected but rather a large number of QTL with small individual effects (Moreau et al. 2004). However, the power to detect such QTL is low. In contrast, for less complex traits influenced by a low number of QTL, such as K, Na, and N, the individual QTL explain a high proportion of the variance and, thus, a high power for QTL detection is anticipated.

Proportion of the genotypic variance explained by markers: In the current study, the proportion of the genotypic variance explained by the markers with significant main effect ranged from 1.5 to 11.8% and the proportion explained by all significant markers together varied for the nine traits from 4.4 to 36.8% (Table 4). These estimates of the proportion of the explained variance were considerably lower than those observed by Schneider et al. (2002) for the same traits in a linkage mapping experiment of sugar beet. This finding might be explained by the different concepts in choosing the parental genotypes of the segregating populations underlying these studies. In the present study, current sugar beet elite genotypes were used as parents of the segregating populations while in the study of Schneider et al. (2002) the parents of the mapping population were chosen in such a way that they maximally differed for the traits under consideration. The latter approach increases the probability of detecting QTL explaining a large proportion of the genotypic variance (Lander and Botstein 1989). However, in contrast to the current study, this procedure leads to QTL information which might be worthless for marker-assisted selection in elite plant breeding programs because the favorable QTL allele might already be fixed in the elite germplasm pool.

A further reason for the considerable discrepancy in the proportion of the explained variance observed in our study and the biparental cross of Schneider et al. (2002) might be the difference in allele frequencies expected for the germplasm in both studies. This is due to the fact that the proportion of genotypic variance explained by a marker is a function of the allele frequency and the allele substitution effect. Under the assumption of a fixed allele substitution effect, the maximum of the proportion of genotypic variance explained by a marker is observed for an allele frequency of 0.5, as expected for the entries derived from a biparental cross. In contrast, for progenies derived from several crosses of parental genotypes, like those examined in our study, the allele frequencies might be considerably different from 0.5 and, thus, the proportion of the genotypic variance explained by a marker might be notably lower despite the same underlying allele substitution effect. Because allele frequencies of plant breeding populations are expected to be more similar to those of populations derived from several crosses of parental inbreds than to those of a population derived from a biparental cross (Crepieux et al. 2004), we suggest that in comparison to the latter approach the former approach leads to a more representative estimate of the variance accounted for by a marker in the breeding population.
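The dependence on allele frequency can be made concrete with a small sketch. It assumes a biallelic locus in a randomly mating population, for which the variance attributable to the locus is commonly written as 2p(1 − p)α²; the scaling constant may differ for testcross progenies, and the function names and numerical values are purely illustrative.

```python
def marker_variance(p, alpha):
    """Variance attributable to a biallelic marker with allele frequency p
    and allele substitution effect alpha (assumed form: 2p(1 - p)alpha^2)."""
    return 2.0 * p * (1.0 - p) * alpha ** 2


def explained_proportion(p, alpha, genotypic_variance):
    """Proportion of the genotypic variance explained by the marker."""
    return marker_variance(p, alpha) / genotypic_variance


# Illustrative values: the same substitution effect at two frequencies.
balanced = explained_proportion(0.5, 1.0, 5.0)  # 0.10
skewed = explained_proportion(0.1, 1.0, 5.0)    # 0.036

# The maximum over a grid of frequencies is found at p = 0.5.
grid = [i / 10 for i in range(1, 10)]
best_p = max(grid, key=lambda p: marker_variance(p, 1.0))
```

With an identical substitution effect, the marker explains 10% of the genotypic variance at a frequency of 0.5 but only 3.6% at a frequency of 0.1, which mirrors the argument made above.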

Furthermore, the differing results of our study and that of Schneider et al. (2002) with respect to the proportion of the variance explained by markers might be attributable to the fact that the number of genotypes underlying the study of Schneider et al. (2002) was considerably lower than that of the current study. With a limited sample size, model selection generally causes an overestimation of the genotypic variance explained by markers (Beavis 1994; Utz and Melchinger 1994; Schön et al. 2004).
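
The upward bias caused by selecting significant markers can be sketched with a small simulation. The example below is a deliberately simplified illustration and not the model used in this study: a single marker coded −1/+1 explains a known fraction of the phenotypic variance, only replicates passing a significance threshold are retained, and the explained variance is averaged over the retained replicates. The function name, the threshold `t_crit`, and all parameter values are hypothetical.

```python
import math
import random


def selected_r2(n, true_r2, reps=1000, t_crit=3.0, seed=1):
    """Mean estimated R^2 among replicates in which the marker is significant.

    Assumptions (illustrative only): one marker coded -1/+1 with equal
    probability, normally distributed residuals, total variance of 1, and
    a simple t-test threshold standing in for model selection.
    """
    rng = random.Random(seed)
    beta = math.sqrt(true_r2)          # marker variance is 1, so beta^2 = R^2
    sd_e = math.sqrt(1.0 - true_r2)    # residual standard deviation
    kept = []
    for _ in range(reps):
        x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        syy = sum((yi - my) ** 2 for yi in y)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        if sxx == 0.0 or syy == 0.0:
            continue                   # degenerate sample, skip
        r2 = sxy * sxy / (sxx * syy)
        if r2 >= 1.0:
            continue
        t = math.sqrt(r2 * (n - 2) / (1.0 - r2))
        if t > t_crit:                 # keep only "detected" markers
            kept.append(r2)
    return sum(kept) / len(kept) if kept else float("nan")


if __name__ == "__main__":
    for n in (50, 500):
        print(f"n = {n:3d}  true R^2 = 0.05  "
              f"mean R^2 of significant replicates = {selected_r2(n, 0.05):.3f}")
```

In this sketch, any replicate retained at n = 50 must have an estimated R² above t_crit² / (t_crit² + n − 2) ≈ 0.16, more than three times the simulated value of 0.05, so the average over significant replicates necessarily overstates the true explained variance. With n = 500 the threshold drops to about 0.02 and the bias largely disappears.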

Markers associated with multiple traits: Our results suggested that several markers were associated not with a single trait only but with two to five traits (Table 4). This observation is in accordance with the results of Schneider et al. (2002) and can be explained by a single gene with pleiotropic effects on several traits. The same phenomenon can result from tightly linked genes that affect different traits but, owing to the limited mapping resolution, are detected as one locus. For some markers (e.g., M28), this distinction is of less interest to plant breeders because the same allele affects the phenotype for all traits in the desired direction. However, for most markers associated with several traits in our study (e.g., M5), the allele with a positive effect on one trait had a negative effect on the other trait. In this case, whether the colocalization of marker-phenotype associations for different traits is caused by pleiotropy or by linkage determines whether marker-assisted selection can be used to improve both traits concurrently. Based on our results, however, this distinction cannot be made.

The high extent of LD observed for the markers of our study across the nine segregating populations suggested that the marker density should be sufficient for genome-wide QTL detection. Nevertheless, colocalization of QTL for different traits might be further analyzed by genotyping the germplasm of our study with a higher marker density in the identified genome regions and applying multivariate association mapping approaches. Subsequently, experiments for a direct proof of the allele function might be performed, because AMMSP studies provide only statistical, i.e., indirect, evidence for the function of the identified genome regions (Andersen and Lübberstedt 2003). Such proof can be obtained by comparing isogenic genotypes, which can be produced by recurrent backcrossing or by targeting induced local lesions in genomes (Jung and Hohmann 2006).

Marker × location interactions: For all markers with significant main effects identified in the current study, significant marker × location interactions were also observed (Table 4). This observation is in contrast to results from most linkage mapping studies reported in the literature (e.g., Cockerham and Zeng 1996; Melchinger et al. 1998), which rarely found significant marker × location interactions despite the presence of significant genotype × location interactions. This difference might be explained by the fact that, in contrast to previous studies, the entries of our study were grown in a large number of locations, allowing a more reliable estimation of marker × location interactions. Furthermore, the high number of entries examined in our study increases the power for detection of marker × location interactions (cf. Boer et al. 2007).

Marker × location interactions cannot be directly used in marker-assisted selection programs. Rather, they contribute to the instability of QTL effects across multiple environments (Piepho 2000) and thereby reduce the efficiency of marker-assisted selection with respect to a broad adaptation of the germplasm (e.g., Bouchez et al. 2002). One way to use marker × location or genotype × location interactions for breeding progress is the development of specific varieties for specific groups of locations (Curnow 1988). In contrast to the results of Moreau et al. (2004), however, a clustering approach did not reveal the presence of distinct subgroups of locations for any trait. Therefore, we did not further examine the possibility of resolving marker × location interactions into marker × mega-environment interactions.

Conclusions

We proposed an approach for AMMSP which is based not only on entries with available marker data but also on related entries without available marker data. Thereby, the estimation of the genotypic value of the entries with available marker data is improved and the power for detection of marker-phenotype associations is increased. Based on this approach, we identified several genome regions associated with the traits under consideration, which are promising for marker-assisted selection. This observation indicated that QTL detection is possible not only with segregating populations derived from crosses of parental genotypes in a systematic manner, as suggested by Yu et al. (2008), but also with populations routinely derived in plant breeding programs from multiple, related crosses. Furthermore, our results indicate that data sets whose size does not permit analysis by a one-step AMMSP approach might be analyzed with the two-step approach A without a substantial loss of power for detection of marker-phenotype associations. By contrast, the commonly employed two-step approach B suffers from a more severe loss of power.