Introduction

Technology for creating maize (Zea mays L.) doubled haploids reduces the time required to produce inbreds. Whereas six or more generations of selfing are needed to produce recombinant inbreds (RI) from F1 plants, doubled haploids are produced in only two generations. The routine production of maize doubled haploids has been made possible by the development of inducer lines that, when used in crosses, lead to a high frequency of haploids (e.g., >8%, Röber et al. 2005; Seitz 2005). Putative haploids are identified through morphological markers and haploid seedlings are treated with colchicine to double their chromosomes so that their selfed seeds are diploid (Chase 1951; Bordes et al. 1997; Melchinger et al. 2005; Seitz 2005).

A key question in applying doubled haploid technology in maize breeding is the generation during which haploids should be induced. Production of doubled haploids will be the quickest if haploids are induced among F1 plants. In practice, F1-derived doubled haploids (denoted by DH) are typically produced in maize (Röber et al. 2005; Melchinger et al. 2005; Smith et al. 2008). Studies have shown, however, that the reduced recombination in DH lines may decrease the response to a single cycle (Riggs and Snape 1977) or to multiple cycles (Jannink and Abadie 1999) of selection.

Because DH lines are produced after only one meiosis, the number of underlying crossover events is expected to be smaller among DH lines than among RI lines. Likewise, the number of entire chromosomes, and the size of chromosome segments, passed intact from a parental inbred are expected to be larger among DH lines than among RI lines. In maize, Murigneux et al. (1993) found that the percentage of recombinants was twice as high among RI lines as among DH lines. Smith et al. (2008) found an average of 15 recombinations per RI genome versus ten recombinations per DH genome. The results of both Murigneux et al. (1993) and Smith et al. (2008) were consistent with theory indicating that the ratio between recombination frequencies among RI and DH lines approaches 2.0 when loci are tightly linked and decreases to 1.0 when loci are unlinked (Haldane and Waddington 1931). Furthermore, Smith et al. (2008) found that the percentage of lines with four or more intact parental chromosomes (N = 10 chromosomes in maize) was 37% among DH lines and 13% among RI lines.
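The Haldane and Waddington (1931) expectation can be made concrete. For RI lines derived by selfing, the expected recombinant fraction is $R = 2r/(1 + 2r)$, whereas an F1-derived DH line carries recombinants at the single-meiosis frequency r, so the RI:DH ratio is $2/(1 + 2r)$. The short Python sketch below (function names are illustrative, not from the study) evaluates that ratio:

```python
def ri_recombination(r: float) -> float:
    """Expected recombinant fraction among selfed RI lines (Haldane and Waddington 1931)."""
    return 2.0 * r / (1.0 + 2.0 * r)


def ri_to_dh_ratio(r: float) -> float:
    """Ratio of RI to F1-derived DH recombinant fractions; a DH line keeps r itself.

    r must be > 0 (the ratio is undefined for r = 0).
    """
    return ri_recombination(r) / r


for r in (0.001, 0.05, 0.18, 0.5):
    print(f"r = {r:<5}  RI fraction = {ri_recombination(r):.3f}  RI:DH = {ri_to_dh_ratio(r):.3f}")
```

The ratio falls monotonically from about 2 for tight linkage to exactly 1 for unlinked loci (r = 0.5).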

Maize breeders typically develop new and improved inbreds from crosses among the most elite inbreds (Bauman 1981; Mikel and Dudley 2006; Nelson et al. 2008). Suppose two DH lines both inherit an intact copy of chromosome 1 from the same parental inbred. If the two DH lines are crossed to form a new F1 for inbred development, the F1 is homozygous for chromosome 1, and all of the alleles on chromosome 1 from the other parental inbred are lost. The limited recombination in DH lines, coupled with selection of only a few improved inbreds per cross, may therefore lead to bottlenecks that would reduce the selection gain when DH lines are used continuously in selection (Jannink and Abadie 1999). On the other hand, the limited recombination in DH lines could preserve superior epistatic combinations of alleles (Mayr 1954, p 165) that may have accumulated in elite germplasm.

Induction of DH lines among F2 plants would allow an increased amount of recombination in the resulting DHF2 lines without substantially increasing the amount of time needed for generating inbreds. The responses, however, to continuous selection among DH, DHF2, and RI lines have not been compared. My objective in this study was to determine if, for sustaining long-term response, doubled haploids should be induced in F1 or F2 plants in maize inbred development.

Materials and methods

Genetic models, founder inbreds, and Cycle 0 base population

I simulated a trait controlled by L = 20, 100, or 200 quantitative trait loci (QTL), each with four alleles. The size of each of the ten maize chromosomes corresponded to that in the Senior et al. (1996) linkage map. The QTL were located at random on the ten chromosomes according to a uniform distribution across the entire genome. QTL1 had the largest effect, QTL2 had the second largest effect, and the Lth QTL had the smallest effect. The QTL effects were defined as the genotypic values when the lines were testcrossed to an unrelated inbred tester.
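Uniform placement across the whole genome means that each chromosome receives QTL in proportion to its map length. A minimal Python sketch follows; the chromosome lengths used in the example are hypothetical placeholders (ten equal chromosomes summing to the 1,749-cM map size cited in the Discussion), not the actual Senior et al. (1996) values, and `place_qtl` is an illustrative name.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def place_qtl(n_qtl, chrom_lengths_cm, rng):
    """Place n_qtl loci uniformly over the whole genome.

    Returns (chromosome, position_cM) tuples in generation order, so effect ranks
    (QTL1, QTL2, ...) assigned by list index are unrelated to map position.
    """
    cumulative = list(accumulate(chrom_lengths_cm))
    total = cumulative[-1]
    placed = []
    for _ in range(n_qtl):
        x = rng.random() * total  # genome-wide coordinate in [0, total)
        # Index of the chromosome containing x (clamped to guard float rounding).
        idx = min(bisect_right(cumulative, x), len(cumulative) - 1)
        start = cumulative[idx - 1] if idx > 0 else 0.0
        placed.append((idx + 1, x - start))
    return placed


# Hypothetical lengths: ten equal chromosomes summing to 1,749 cM.
lengths = [174.9] * 10
qtl = place_qtl(20, lengths, random.Random(1))
```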

In the additive model, the magnitudes of QTL effects followed a geometric series (Lande and Thompson 1990) in which the top 10% of the QTL with the largest effects accounted for 34% of the total genetic variance. At the kth QTL, the testcross genotypic values of the four homozygotes were $1.8a^k$ for Q1Q1, $0.6a^k$ for Q2Q2, $-0.6a^k$ for Q3Q3, and $-1.8a^k$ for Q4Q4, where $a = (1 - L)/(1 + L)$ as specified by Lande and Thompson (1990). The genotypic values were scaled in this manner so that half the pairwise differences in genotypic values among the four homozygotes had a mean of $a^k$. Given that testcross means behave in a purely additive manner (Hallauer and Miranda 1989, p 28), dominance was assumed absent and the testcross genotypic value of each of the six heterozygotes was equal to the mean value of the corresponding homozygotes. The testcross genotypic value of a line was equal to the sum of the corresponding testcross genotypic values across all QTL.

In the epistatic model, the testcross genotypic values were modeled as the flux in a linear metabolic pathway of L enzymes, each coded by a QTL (Kacser and Burns 1981). The flux, in turn, was modeled by adapting the biallelic model of Bost et al. (1999) to a four-allele model at each QTL. First, the mean activity of the kth enzyme was modeled as $m_k = 1 - a^{vk}$, where $v = 1$ with 20 QTL, $v = 3$ with 100 QTL, and $v = 4$ with 200 QTL. These values of $v$ and $m_k$ were chosen so that, at the population level, the sum of enzyme control coefficients (Bost et al. 1999) of the top 10% of QTL with the largest effects was equal to 34% of the sum of enzyme control coefficients across all QTL. Next, the scale factor for the variation in enzyme activity at the kth locus was $b_k = m_k c\sqrt{2}$ (Bost et al. 1999), where $c = 0.15$ was the coefficient of variation. Lastly, the testcross enzyme activities at the kth locus ($E_k$) were $m_k + 1.8b_k$ for Q1Q1, $m_k + 0.6b_k$ for Q2Q2, $m_k - 0.6b_k$ for Q3Q3, and $m_k - 1.8b_k$ for Q4Q4. Values of $E_k$ were always positive. Dominance for enzyme activity was assumed absent. The testcross genotypic value of a line was then calculated from the corresponding enzyme activities across all QTL as $1/\sum_{k=1}^{L}(1/E_k)$ (Bost et al. 1999).
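A minimal Python sketch of the metabolic-flux calculation, following the formulas above with $a = (1 - L)/(1 + L)$. Treating a heterozygote's activity as the mean of its two homozygotes is this sketch's reading of "dominance for enzyme activity was assumed absent"; the names `enzyme_activities` and `metabolic_flux` are illustrative.

```python
import math

# Offsets of the Q1Q1..Q4Q4 homozygotes, in units of b_k.
MULTIPLIER = {1: 1.8, 2: 0.6, 3: -0.6, 4: -1.8}
# Exponent v for each number of QTL, as given in the text.
V_FOR_L = {20: 1, 100: 3, 200: 4}


def enzyme_activities(genotype, n_qtl, c=0.15):
    """Testcross activities E_k; genotype[k-1] = (i, j) allele indices at QTL k."""
    a = (1 - n_qtl) / (1 + n_qtl)
    v = V_FOR_L[n_qtl]
    activities = []
    for k, (i, j) in enumerate(genotype, start=1):
        m_k = 1 - a ** (v * k)          # mean activity of enzyme k
        b_k = m_k * c * math.sqrt(2)    # scale of variation in activity
        # No dominance: the heterozygote takes the mean of its two homozygotes.
        activities.append(m_k + b_k * (MULTIPLIER[i] + MULTIPLIER[j]) / 2)
    return activities


def metabolic_flux(activities):
    """Flux through a linear pathway: 1 / sum(1 / E_k)."""
    return 1.0 / sum(1.0 / e for e in activities)


best = enzyme_activities([(1, 1)] * 20, 20)
worst = enzyme_activities([(4, 4)] * 20, 20)
print(metabolic_flux(best), metabolic_flux(worst))
```

Because the flux is a harmonic-type combination, it is always smaller than the least active enzyme, so raising one enzyme's activity matters less when another enzyme is limiting. This is the source of the epistasis in the model.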

At a given QTL, each of four founder inbreds (I1, I2, I3, I4) carried a unique QTL allele. So that the founder inbreds complemented each other, four configurations were considered between founder inbreds and QTL alleles. First, I1 had Q1, I2 had Q2, I3 had Q3, and I4 had Q4 at the following QTL: k = 1, 5, 9,…, (L − 3). Second, I1 had Q4, I2 had Q3, I3 had Q2, and I4 had Q1 at the following QTL: k = 2, 6, 10,…, (L − 2). Third, I1 had Q3, I2 had Q1, I3 had Q4, and I4 had Q2 at the following QTL: k = 3, 7, 11,…, (L − 1). Finally, I1 had Q2, I2 had Q4, I3 had Q1, and I4 had Q3 at the following QTL: k = 4, 8, 12,…, L. Given that the QTL were randomly located in the genome, repulsion and coupling linkages were generated at random.
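The four founder configurations cycle with period 4 along the QTL index k. The sketch below encodes them as a lookup on k mod 4 (assuming, as for L = 20, 100, and 200, that L is a multiple of 4); the table name and function are illustrative.

```python
# Allele carried by each founder inbred, keyed by k mod 4.
PATTERNS = {
    1: {"I1": 1, "I2": 2, "I3": 3, "I4": 4},  # k = 1, 5, 9, ..., L - 3
    2: {"I1": 4, "I2": 3, "I3": 2, "I4": 1},  # k = 2, 6, 10, ..., L - 2
    3: {"I1": 3, "I2": 1, "I3": 4, "I4": 2},  # k = 3, 7, 11, ..., L - 1
    0: {"I1": 2, "I2": 4, "I3": 1, "I4": 3},  # k = 4, 8, 12, ..., L
}


def founder_allele(founder, k):
    """Index of the QTL allele (1 = Q1, ..., 4 = Q4) carried by a founder at QTL k."""
    return PATTERNS[k % 4][founder]


L = 20
for f in ("I1", "I2", "I3", "I4"):
    print(f, [founder_allele(f, k) for k in range(1, L + 1)])
```

Each founder carries every allele at exactly one quarter of the QTL, so no founder is uniformly superior and the founders complement one another.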

Cycle 0 initially comprised equal proportions of three double crosses among the founder inbreds: (I1 × I2) × (I3 × I4); (I1 × I3) × (I2 × I4); and (I1 × I4) × (I2 × I3). For each type of inbred (DH, DHF2, and RI), a total of N = 500 or 1,000 lines were then simulated to form Cycle 0. The DH lines were created by simulating meiosis in a Cycle 0 plant (i.e., double cross) and doubling the chromosomes of a random gamete. The Kosambi mapping function was used to relate map distances and recombination frequencies. The DHF2 lines were created by selfing the Cycle 0 plants for one generation and then doubling a random gamete from each resulting F2 plant. The RI lines were created by single-seed descent with seven generations of selfing.
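The three line types differ only in how many meioses precede fixation. The Python sketch below illustrates this under simplifying assumptions that are not part of the study: a biparental F1 between two inbreds coded 0 and 1 (instead of double crosses), a single hypothetical 100-cM chromosome with markers every 1 cM, and crossovers drawn independently in each interval with the Kosambi recombination frequency (so Kosambi interference is only approximated). The RI line is taken as one of the two haplotypes of the plant after seven selfings.

```python
import math
import random


def kosambi_r(d_cm):
    """Kosambi mapping function: map distance (cM) -> recombination frequency."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


def gamete(plant, r, rng):
    """One meiotic product of a diploid plant = (hap0, hap1); r[i] is the interval r."""
    current = rng.randrange(2)
    out = [plant[current][0]]
    for i, ri in enumerate(r):
        if rng.random() < ri:  # crossover in interval i switches the source haplotype
            current = 1 - current
        out.append(plant[current][i + 1])
    return out


def self_plant(plant, r, rng):
    """Selfed progeny: two independent gametes from the same plant."""
    return (gamete(plant, r, rng), gamete(plant, r, rng))


def breakpoints(hap):
    """Number of switches in parental origin along a haplotype."""
    return sum(1 for a, b in zip(hap, hap[1:]) if a != b)


def make_line(kind, f1, r, rng, ssd_generations=7):
    if kind == "DH":    # double a gamete from the F1
        return gamete(f1, r, rng)
    if kind == "DHF2":  # self once, then double a gamete from the F2 plant
        return gamete(self_plant(f1, r, rng), r, rng)
    if kind == "RI":    # single-seed descent, then keep one haplotype
        plant = f1
        for _ in range(ssd_generations):
            plant = self_plant(plant, r, rng)
        return plant[rng.randrange(2)]
    raise ValueError(kind)


def mean_breakpoints(kind, n_lines=2000, length_cm=100.0, n_markers=101, seed=1):
    rng = random.Random(seed)
    r = [kosambi_r(length_cm / (n_markers - 1))] * (n_markers - 1)
    f1 = ([0] * n_markers, [1] * n_markers)
    total = sum(breakpoints(make_line(kind, f1, r, rng)) for _ in range(n_lines))
    return total / n_lines


for kind in ("DH", "DHF2", "RI"):
    print(kind, round(mean_breakpoints(kind), 2))
```

Under these assumptions the mean number of breakpoints per 100 cM is about 1.0 for DH lines, about 1.5 for DHF2 lines, and close to 2 for RI lines, in line with the recombination arguments in the Introduction.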

To estimate the total genetic variance (V G), which was needed to simulate phenotypic values and calculate standardized responses to selection, a large (N = 20,000), random-mated Cycle 0 population was created from equal proportions of the three double crosses. For the additive and epistatic genetic models, testcross genotypic values of S0 plants were calculated and V G was obtained as the variance among testcross genotypic values. This large population was not used in selection.
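Once V G is known, the nongenetic variance needed for a target heritability follows from $H = V_G/(V_G + V_E)$, i.e., $V_E = V_G(1 - H)/H$. A minimal sketch, assuming that heritability is defined this way on a testcross-mean basis; the function names are illustrative.

```python
from statistics import pvariance


def genetic_variance(testcross_genotypic_values):
    """V_G: variance among testcross genotypic values of the large S0 population."""
    return pvariance(testcross_genotypic_values)


def nongenetic_variance(v_g, h):
    """V_E that yields heritability H = V_G / (V_G + V_E)."""
    return v_g * (1.0 - h) / h


for h in (0.20, 0.50, 0.80):
    print(h, nongenetic_variance(1.0, h))
```

With V G scaled to 1, V E is 4.0, 1.0, and 0.25 for H = 0.20, 0.50, and 0.80.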

Selection among DH, DHF2, and RI lines

The overall selection procedure (Fig. 1) involved 15 cycles of inbred development. For each of the three types of inbreds, the best N Sel = 10 (for N = 500) or 20 (for N = 1,000) lines were selected in Cycle 0. Selection was based on testcross phenotypic values, which were obtained by adding a random nongenetic effect to the testcross genotypic value of each line. The nongenetic effects were normally distributed with a mean of zero and a variance of VE. The VE corresponded to a heritability, on a testcross-mean basis, of H = 0.20, 0.50, or 0.80 in a random-mated Cycle 0 S0 population. Pairs of the N Sel lines from Cycle 0 were crossed to form 5 (for N Sel = 10) or 10 (for N Sel = 20) Cycle 1 F1’s. The crossing scheme among the N Sel lines was as follows: [1 × (N Sel − 1)], [2 × (N Sel − 2)], [3 × (N Sel − 3)],…, [(N Sel/2 − 1) × (N Sel/2 + 1)], and [(N Sel/2) × (N Sel)] (Fig. 1). This crossing scheme was used to minimize crosses between related N Sel lines after Cycle 1 (i.e., each F1 plant in Cycles 2–15 had four unique grandparents).
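One cycle of truncation selection and the rank-based crossing scheme can be sketched as follows. Phenotypes are testcross genotypic values plus normal noise with variance V E; ranks are 1-based (rank 1 is the best line), so rank r corresponds to `selected[r - 1]`. Function names are illustrative, not from the study.

```python
import math
import random


def select_lines(genotypic_values, v_e, n_sel, rng):
    """Indices of the n_sel lines with the highest testcross phenotypic values, best first."""
    sd_e = math.sqrt(v_e)
    phenotypes = [g + rng.gauss(0.0, sd_e) for g in genotypic_values]
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    return ranked[:n_sel]


def crossing_scheme(n_sel):
    """Pair ranks as 1 x (n_sel - 1), 2 x (n_sel - 2), ..., (n_sel/2) x n_sel."""
    if n_sel % 2:
        raise ValueError("n_sel must be even")
    pairs = [(i, n_sel - i) for i in range(1, n_sel // 2)]
    pairs.append((n_sel // 2, n_sel))
    return pairs


print(crossing_scheme(10))  # [(1, 9), (2, 8), (3, 7), (4, 6), (5, 10)]
```

Every selected line is used exactly once, giving N Sel/2 new F1's per cycle.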

Fig. 1

Development of DH, RI, and DHF2 populations from four founder inbreds (I1, I2, I3, and I4), illustrated with a total number of lines equal to 500 and the N Sel = 10 best lines selected in each cycle

For each Cycle 1 F1, a total of 100 DH, DHF2, or RI lines were then simulated so that the total population size across all Cycle 1 F1’s remained constant at N = 500 or 1,000 in the next cycle. The selection and recombination procedures in Cycle 0 were repeated in Cycle 1 and in the succeeding cycles until Cycle 15 was obtained.

Data analysis

Each simulation experiment comprised a combination of the type of inbred (DH, DHF2, or RI), number of QTL, genetic model (additive or metabolic flux), N and N Sel, and H. Each simulation experiment was repeated 1,000 times and the results were averaged across repeats. Each repeat differed in the location of QTL, the large Cycle 0 population generated to estimate V G, the genotypes of the lines generated, and their phenotypic values.

For each cycle, standardized responses to selection were obtained by subtracting the mean of Cycle 0 from the mean of the particular cycle and dividing the difference by the square root of V G. The statistical significance (P = 0.05) of differences in selection response was determined with z tests, using the variances of the selection response across the 1,000 repeats of an experiment.
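The standardized response and a two-sided z test across repeats can be sketched as below. The z statistic here assumes that the 1,000 repeats of the two experiments being compared are independent samples; it is one plausible form of the test described, not necessarily the exact computation used. Function names are illustrative.

```python
import math
from statistics import NormalDist, fmean, variance


def standardized_response(cycle_mean, cycle0_mean, v_g):
    """(Mean of cycle t - mean of Cycle 0) / sqrt(V_G)."""
    return (cycle_mean - cycle0_mean) / math.sqrt(v_g)


def z_test(responses_a, responses_b):
    """Two-sided z test for a difference in mean response across independent repeats."""
    diff = fmean(responses_a) - fmean(responses_b)
    se = math.sqrt(variance(responses_a) / len(responses_a)
                   + variance(responses_b) / len(responses_b))
    z = diff / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```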

Results

Across cycles of selection for testcross performance, the ratio between the response among RI lines and the response among DH lines (R RI:DH) was largest with 200 QTL, intermediate with 100 QTL, and smallest with 20 QTL. Consider the additive model, a heritability of H = 0.50, and the best N Sel = 20 lines being selected in each cycle. When 200 QTL controlled the trait, R RI:DH was 105% in Cycle 5 and increased to a maximum of 108% in Cycle 15 (Fig. 2). These R RI:DH values corresponded to selection responses that were 0.37–1.11 genetic standard deviations higher among RI lines than among DH lines (LSD0.05 = 0.07–0.10; results not shown). When the number of QTL decreased from 200 to 100, R RI:DH decreased to 103% in Cycle 5 and 105% in Cycle 15. The R RI:DH was 100–101% across different cycles when 20 QTL controlled the trait.

Fig. 2

Ratio between selection response among RI lines and DH lines (R RI:DH, solid squares), ratio between selection response among DHF2 lines and DH lines (R DHF2:DH, solid triangles), and ratio between selection response among RI lines and DHF2 lines (R RI:DHF2, open triangles) for different genetic models. Responses are for a trait with a heritability of H = 0.50 and with N Sel = 20 lines selected out of 1,000 lines in each cycle

Differences were small between the selection responses among RI and DHF2 lines. With 200 QTL, H = 0.50, and N Sel = 20, the ratio between the response among RI lines and the response among DHF2 lines (R RI:DHF2) in the additive model was less than 103% across cycles (Fig. 2). These R RI:DHF2 values corresponded to selection responses that were 0.11–0.36 genetic standard deviation higher among RI lines than among DHF2 lines (significant at P = 0.05). The values of R RI:DHF2 decreased when fewer QTL controlled the trait.

The responses were larger among DHF2 lines than among DH lines. With 200 QTL, H = 0.50, and N Sel = 20, the ratio between the response among DHF2 lines and the response among DH lines (R DHF2:DH) in the additive model was less than 103% in Cycle 5 and about 106% in Cycle 15 (Fig. 2). These R DHF2:DH values corresponded to selection responses that were 0.26–0.74 genetic standard deviation higher among DHF2 lines than among DH lines (significant at P = 0.05). As with R RI:DH and R RI:DHF2, R DHF2:DH decreased when fewer QTL controlled the trait. When the number of QTL decreased from 200 to 20, the R DHF2:DH values under the additive model corresponded to selection responses (across cycles) that were only 0.05–0.09 genetic standard deviation higher among DHF2 lines than among DH lines (LSD0.05 = 0.06–0.07).

Metabolic-flux epistasis slightly reduced the differences among the responses with RI, DH, and DHF2 lines. With 200 QTL, H = 0.50, and N Sel = 20, the R DHF2:DH in Cycle 15 decreased from 106% in the additive model to 104% in the epistatic model (Fig. 2). The R DHF2:DH values in Cycles 5–15 in the epistatic model corresponded to selection responses that were 0.10–0.36 genetic standard deviation higher among DHF2 lines than among DH lines (significant at P = 0.05). With 20 QTL, where R DHF2:DH was barely greater than 100% in the additive model, R DHF2:DH remained 100–102% in the epistatic model.

While increases in H and N Sel led to increases in the response to selection, both H and N Sel had little effect on R RI:DH, R RI:DHF2, and R DHF2:DH. Consider the additive genetic model with 100 QTL. The R DHF2:DH in Cycle 15 was 103–104% when N Sel was 20 and when H varied from 0.20 to 0.80 (Fig. 3a). When N Sel decreased from 20 to 10, the R DHF2:DH in Cycle 15 was likewise 103–104% regardless of H (Fig. 3b).

Fig. 3

Ratio between selection response among DHF2 lines and DH lines (R DHF2:DH) for the additive genetic model with 100 QTL controlling a trait and with a heritability of H = 0.20 (solid squares), 0.50 (open triangles), and 0.80 (solid circles). Responses are for (a) N Sel = 20 out of 1,000 lines and (b) N Sel = 10 out of 500 lines selected in each cycle

The variation in the response increased as the number of cycles of selection increased and as H and N Sel decreased. Again consider the additive genetic model with 100 QTL. With H = 0.20 and N Sel = 20, the standard deviation (in units of the genetic standard deviation) of the responses among DHF2 testcrosses was 0.65 in Cycle 1 and increased to 0.74 in Cycle 15. With N Sel = 20, increasing H from 0.20 to 0.80 decreased the standard deviation in Cycle 1 from 0.65 to 0.59. With H = 0.20, decreasing N Sel from 20 to 10 increased the standard deviation in Cycle 1 from 0.65 to 0.70.

Discussion

Factors affecting selection response among DH, DHF2, and RI lines

The strength of linkage among QTL was the key factor that affected the relative responses to selection among RI, DHF2, and DH lines. For a trait controlled by 100 or more QTL, the cumulative responses to testcross selection were up to 4–6% larger among DHF2 lines than among DH lines. The cumulative responses were up to 5–8% larger among RI lines than among DH lines. Jannink and Abadie (1999) likewise found higher responses among RI lines than among DH lines for a genetic model involving 30 QTL, recombination frequencies of 0.05 or 0.15, and no epistasis. With a finite genome, the strength of linkage among QTL is largely determined by the number of QTL. If the size of the maize linkage map is 1,749 cM (Senior et al. 1996) and 200 randomly located QTL control the trait, the mean distance between adjacent QTL is about 9 cM. But if only 20 QTL control the trait, the QTL would on average be unlinked with a mean distance of 87 cM between adjacent QTL.
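The spacing argument can be checked with the Kosambi mapping function used in the simulations. Like the text, this sketch simply divides the map length by the number of QTL, ignoring chromosome ends and the random variation in spacing; function names are illustrative.

```python
import math

GENOME_CM = 1749.0  # maize map length (Senior et al. 1996), as cited in the text


def kosambi_r(d_cm):
    """Kosambi mapping function: map distance (cM) -> recombination frequency."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


def mean_spacing_cm(n_qtl, genome_cm=GENOME_CM):
    """Mean distance between adjacent QTL if n_qtl QTL are spread over the map."""
    return genome_cm / n_qtl


for n in (200, 20):
    d = mean_spacing_cm(n)
    print(f"{n} QTL: spacing ~ {d:.1f} cM, Kosambi r ~ {kosambi_r(d):.3f}")
```

With 200 QTL the adjacent recombination frequency is below 0.09, whereas with 20 QTL it is above 0.45, i.e., close to the r = 0.5 of unlinked loci.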

The number of QTL controlling quantitative traits is generally unknown. Nevertheless, traits such as kernel oil and protein in maize are likely controlled by more than 50 QTL (Dudley 1977; Laurie et al. 2004). Furthermore, maize breeders select for multiple traits at a time (Bauman 1981; Hallauer 1990). If a trait such as kernel oil is controlled by at least 50 QTL, then selecting for multiple complex traits such as grain yield, moisture, and stalk and root lodging strongly suggests that more than 100 QTL underlie the genetic gain in maize inbred development. Also, regardless of the number of QTL, linkage among QTL may be tight if the QTL are clustered in families of genes that are similar in structure and function (Li et al. 2007; Onishi et al. 2007).

The smaller advantage of RI lines over DH and DHF2 lines under the epistatic model than under the additive model suggested that the increased frequency of recombinants in RI lines may have been partially offset by the disruption of favorable allele combinations found in each parent. Riggs and Snape (1977) found that complementary gene action increases the genetic variance among DH lines when coupling linkages are present and decreases it when repulsion linkages are present. The current study involved a random mixture of coupling and repulsion linkages together with metabolic-flux epistasis, which leads to an L-shaped distribution of gene effects that is consistent with empirical estimates of QTL effects from mapping experiments (Bost et al. 1999). However, the nature of epistatic interactions in maize remains unknown, particularly in elite inbreds, in which continuous selection may have led to coadapted gene complexes (Mayr 1954, p 165) that comprise favorable combinations of epistatic genes.

Empirical experiments, perhaps with elite versus older germplasm, are therefore needed to determine the possible influence of epistasis on the usefulness of RI versus DH or DHF2 lines. Furthermore, the RI, DH, and DHF2 lines were obtained at random in this study. Molecular markers, however, may be used to select RI, DH, or DHF2 lines that have either increased or decreased frequencies of recombinants (Smith et al. 2008). This procedure would then allow the subsequent phenotypic evaluation of a subset of lines in which either the breakup of repulsion linkages or the preservation of favorable gene complexes has been enhanced.

Additional generations of selfing before inducing doubled haploids will increase the number of recombinants, so selection responses among DHF3 lines (i.e., derived from F3 plants) or DHF4 lines may be greater than the selection response among DHF2 lines. However, delaying the induction of doubled haploids until the F3 or F4 generation is likely impractical for two reasons. First, inducing doubled haploids after the F2 generation will further increase the time required for inbred development and could defeat the main purpose of producing doubled haploids in the first place. Second, the opportunity for increased recombination due to an additional meiosis quickly declines with each additional selfing generation.

In particular, linkage affects the gametic output of double heterozygotes only. An Aabb single heterozygote, for example, will produce equal frequencies of Ab and ab gametes regardless of whether or not the two loci are linked. With 100 QTL and a linkage map of 1,749 cM (Senior et al. 1996), the mean recombination rate between adjacent QTL is r = 0.18 based on the Kosambi mapping function. The frequency of double heterozygotes is then multiplied by a factor of $[(1/2 - r)^2 + 1/4]$ with each selfing generation (Karlin 1968). This means that in the cross between two inbreds, the frequencies of double heterozygotes with r = 0.18 are 100% in the F1, 35% in the F2, 12% in the F3, and 4% in the F4. With 200 QTL and r = 0.09, the frequencies of double heterozygotes are 100% in the F1, 42% in the F2, 17% in the F3, and 7% in the F4. The frequency of double heterozygotes in which a crossover event will lead to a recombinant genotype therefore drastically decreases after the F2 generation.
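The per-generation factor can be verified by enumerating the gametes of a coupling-phase double heterozygote AB/ab: parental gametes AB and ab each have frequency (1 − r)/2, recombinant gametes Ab and aB each r/2, and the double heterozygotes among selfed progeny total $[(1 - r)^2 + r^2]/2 = (1/2 - r)^2 + 1/4$. The repulsion phase Ab/aB gives the same factor by symmetry, which is why the factor can be applied repeatedly. A short Python sketch (names illustrative) reproduces the percentages quoted above:

```python
def double_het_factor(r):
    """Fraction of a double heterozygote's selfed progeny that are double heterozygous."""
    return (0.5 - r) ** 2 + 0.25


def double_het_factor_by_enumeration(r):
    """Same quantity from explicit gametes of a coupling-phase AB/ab plant."""
    gametes = {"AB": (1 - r) / 2, "ab": (1 - r) / 2, "Ab": r / 2, "aB": r / 2}
    total = 0.0
    for g1, p1 in gametes.items():
        for g2, p2 in gametes.items():
            if g1[0] != g2[0] and g1[1] != g2[1]:  # heterozygous at both loci
                total += p1 * p2
    return total


def double_het_frequencies(r, generations=("F1", "F2", "F3", "F4")):
    """Double-heterozygote frequency by generation, starting from 100% in the F1."""
    f = double_het_factor(r)
    return {g: f ** i for i, g in enumerate(generations)}


for r in (0.18, 0.09):
    print(r, {g: round(100 * p) for g, p in double_het_frequencies(r).items()})
```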

Choice of generation for inducing doubled haploids

Overall, the results suggested that DHF2 lines are an ideal compromise between DH lines and RI lines. Selection responses among DHF2 lines were at most 3% lower than the responses among RI lines and were up to 6% higher than the responses among DH lines. The variation in selection responses indicated that the advantage of DHF2 over DH lines may not always be realized in each biparental cross. The advantage of DHF2 over DH lines should be realized, however, when averaged across the large numbers of biparental crosses that are evaluated each year in large (e.g., commercial) breeding programs for major crops. While this simulation study used maize as a model species, the results were consistent with expectations from quantitative genetic theory and should be generally applicable to other species for which doubled haploids can be routinely generated.

Selection is meaningless among the homogeneous F1 plants obtained by crossing two inbreds, and the advantage of DHF2 over DH lines may further increase if selection is practiced among the F2 plants from which DHF2 lines are produced. Selection among F2 plants may be based on plant type and disease and insect resistance in a nursery (Bauman 1981; Hallauer 1990) or on marker-trait associations determined from prior information (Johnson 2001; Eathington et al. 2007; Bernardo 2008). Simulation studies have also suggested that selection gains may be higher if testcrosses of F2 plants are first evaluated in yield trials and doubled haploids are subsequently developed only from the F2 plants (or their F3 families) with the best performance (Longin et al. 2007).

This study focused on the response after a fixed number of cycles of testcross selection rather than on the gain per unit time with RI, DH, and DHF2 lines. As previously mentioned, doubled haploid technology allows the development of DH lines in only two generations and DHF2 lines in three generations. Compared with RI lines, the reduction in the time required per cycle due to the use of DH or DHF2 lines will vary according to how year-round nurseries or greenhouses are used in maize inbred development. But the percentage of reduction in the time per cycle with DH or DHF2 lines will be much greater than the 5–8% increase in selection response due to the continuous use of RI lines. In other words, while the long-term per cycle response is expected to be greater with RI lines, the response per unit time will be much greater with DH or DHF2 lines.
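The trade-off between per-cycle response and time per cycle reduces to a ratio: the gain per unit time of scheme A relative to scheme B is $(R_A/R_B)/(T_A/T_B)$. With R RI:DH at most about 1.08, RI lines keep an advantage per unit time only if an RI cycle takes no more than about 8% longer than a DH cycle. In the sketch below, the time ratio of 2.0 is a hypothetical value chosen for illustration, not a figure from this study.

```python
def relative_gain_per_unit_time(response_ratio, time_per_cycle_ratio):
    """Gain per unit time of scheme A relative to scheme B.

    response_ratio       = R_A / R_B (per-cycle responses)
    time_per_cycle_ratio = T_A / T_B (time needed per cycle)
    Values above 1 favor scheme A.
    """
    return response_ratio / time_per_cycle_ratio


# RI vs. DH: per-cycle advantage of up to 8% for RI lines.
# Hypothetical: an RI cycle takes twice as long as a DH cycle.
print(relative_gain_per_unit_time(1.08, 2.0))
```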

If year-round nurseries or greenhouses are used and new F1 crosses for inbred development are created on a speculative basis, the use of DHF2 lines instead of DH lines should not cause a delay in inbred development. Suppose that experimental inbreds are evaluated in hybrid combination during the summer of year 1 (Table 1). The breeder then waits for the yield trial data to decide which experimental inbreds should be used as parents of new F1 crosses for inbred development. The use of year-round nurseries or greenhouses will allow the testcross evaluation of DH lines in the summer of year 3. In contrast, waiting for the yield trial data before making F1 crosses for inbred development will delay the testcross evaluation of DHF2 lines until year 4. This delay will be avoided if, during the summer of year 1, the breeder creates a larger set of speculative F1 crosses involving experimental inbreds at the same time that hybrids of the experimental inbreds are evaluated in yield trials (Table 1). When the yield trial data become available, the best F1 crosses can then be selected and selfed to obtain F2 plants from which DHF2 lines can be generated. In this scheme, the testcrosses of the DHF2 lines can be evaluated in year 3. For DH lines, creating speculative F1 crosses in year 1 will not speed up the breeding process because the yield trials of DH testcrosses will still have to be conducted in year 3. The use of year-round nurseries and the creation of speculative crosses in breeding with DHF2 lines will therefore prevent any delay in the inbred development.

Table 1 Timeline for development and evaluation of DH and DHF2 maize lines