An analytical formula to estimate confidence interval of QTL location with a saturated genetic map as a function of experimental design

Weller, Joel Ira; Soller, Morris

doi:10.1007/s00122-004-1664-2

Original Paper
Published: 23 September 2004

Volume 109, pages 1224–1229, (2004)

Theoretical and Applied Genetics


Abstract

Analytical formulae are derived for the confidence interval for location of a quantitative trait locus (QTL) using a saturated genetic map, as a function of the experimental design, the QTL allele substitution effect, and the number of individuals genotyped and phenotyped. The formulae are derived assuming evenly spaced recombination events, rather than the actual unevenly spaced distribution. The formulae are useful for determining desired sample size when designing a wide variety of QTL mapping experiments, and for evaluating a priori the potential of a given mapping population for defining the location of a QTL. The formulae do not take into account the finite number of recombination events in a given sample.


Introduction

Many studies have shown that individual quantitative trait loci (QTL) can be detected and mapped with the aid of genetic markers. With multiple linked genetic markers, the approximate map location of QTL for a given set of experimental data can be determined by single marker or interval mapping. Both analytical (e.g., Lander and Botstein 1989) and empirical methods (Visscher et al. 1996) have been proposed to obtain confidence intervals (CI) for estimated QTL location, based on a given set of experimental results. While it is possible to derive empirical formulas for various specific situations by extensive simulation (e.g., Darvasi and Soller 1997; Darvasi et al. 1993; Ronin et al. 2003), it is clearly preferable to have a generally applicable simple formula for the CI of QTL map locations that enables the mapping potential of complex designs to be evaluated without the need for simulations. This is becoming particularly important with the availability of physical maps and complete genome sequences, since the width of the CI of a given QTL map location will determine the potential population of candidate genes for the QTL.

With many tightly linked markers, the limiting factors in locating a QTL are the number of recombination events in the sample and the magnitude of the QTL effect relative to the residual standard deviation (VanRaden and Weller 1994). This enables the map resolution attainable in a given experiment to be estimated by simply estimating the expected number of recombinants in a given interval. The objective of this study is to derive analytical formulas to predict the CI of any QTL map location within a saturated genetic map as a function of sample size, QTL effect, and experimental design.

Theory

A saturated genetic map consisting of many evenly spaced completely informative markers is assumed. We will further assume that the number of recombination events in a finite sample of individuals is a continuous variable, even though it is in fact discrete. The consequences of these assumptions will be considered in the Discussion. QTL location can then be estimated by single marker mapping, involving a t-test at each marker, and the most likely QTL location will be at the marker with the greatest estimated effect. With single markers, Simpson (1989) proposed that linkage of a segregating QTL to a marker could be detected by a likelihood ratio test, with the null hypothesis that the recombination frequency between the QTL and genetic marker is 0.5. Simpson (1992) showed for the backcross (BC) design that the statistical power for this test is equal to that obtained by a t-test with the null hypothesis of equal means for the two marker genotypes. It follows that for single markers, the (1−α) CI for QTL location, CI_(1-α), can be determined from the CI_(1-α) for the QTL effect. Consequently, with single marker mapping, given that the marker with the greatest estimated QTL effect is M₁, the CI for QTL location will include marker M₂ if the CI for the QTL effect at marker M₁ also includes the effect estimated at marker M₂. Thus, the CI for QTL location can be derived from the CI for the difference of the expected QTL effect for a marker at the QTL and the expected effect for a marker at some other chromosomal location. Clearly, this difference will be due solely to those individuals that are recombinant in the interval between the two markers. Therefore, given that the marker with the greatest estimated QTL effect is M₁, the CI for QTL location will include marker M₂ if the CI for QTL effect at marker M₁, considering recombinant individuals only, also includes the effect estimated at marker M₂.

Assuming a normal distribution of the estimated marker-associated effects and considering recombinant individuals only, the probability that the CI for the QTL effect at marker M₁ also includes the effect estimated at marker M₂ is equal to the probability, α/2, of obtaining the value:

$$Z_{{\alpha /2}} = D/{\text{SE}}(D)$$

(1)

where Z_α/2 is the value of the standard normal variable corresponding to a probability of α/2. The “contrast”, D= E(M₁)−E(M₂), where E(M₁) is the expected QTL effect evaluated at M₁, and E(M₂) is the expected QTL effect evaluated at M₂; SE(D) is the standard error of D.

D and SE(D) are functions of the experimental design. Their derivation is now exemplified for a BC design initiated by a cross between two parental lines. The two QTL alleles are denoted Q and q, the QTL is assumed to be located at marker M₁, and the parental genotypes are denoted M₁QM₂/M₁QM₂ and m₁qm₂/m₁qm₂. Relative to the genetic markers, there are two recombinant genotypes in the BC₁ generation: M₁m₂/m₁m₂ and m₁M₂/m₁m₂, with expected mean values denoted $\underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} $ and $\underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} $. Then $E({\text{M}}_{1} ) = \underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} - \underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} $ and $E({\text{M}}_{2} ) = \underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} - \underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} ,$ giving:

$$D = E({\text{M}}_{1} ) - E({\text{M}}_{2} ) = 2\underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} - 2\underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} = 2(\underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} - \underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} )$$

(2)

Letting the phenotypic variance within the marker genotypes equal 1.0, standardized effects at the QTL are: QQ=+d, Qq=h, and qq=−d. Defining E(M₁)=δ=d+h, and R=the number of individuals carrying a recombinant chromosome in each marker genotype group, we have:

$$D = 2(d + h) = 2\delta $$

(3)

$${\text{Var}}(\underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} ) = {\text{Var}}(\underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} ) = 1/R$$

(4)

To derive SE(D), recall that when X and Y are independent, Var[b(X−Y)]=b²(VarX+VarY). Applying this to (2) yields

$${\text{SE}}^{2} (D) = 4[{\text{Var}}(\underline{{\text{M}}} _{1} \underline{{\text{m}}} _{2} ) + {\text{Var}}(\underline{{\text{m}}} _{1} \underline{{\text{M}}} _{2} )] = 8/R$$

(5)

Substituting (3) and (5) in (1), gives

$$Z_{{\alpha /2}} = 2\delta /{\left( {8/R} \right)}^{{0.5}} $$

(6)

Defining k as the proportion of the mapping population included in each marker genotype group, r as the proportion of recombination between M₁ and M₂, and N as the population size, we have: R=rkN. For the BC design, k=0.5. Substituting rkN for R in (6) gives:

$$Z_{{\alpha /2}} = 2\delta /(8/rkN)^{{0.5}} = \delta /(2/rkN)^{{0.5}} $$

(7)

Note that the interval between markers M₁ and M₂ defines half of the CI_(1-α). Assuming a chromosome of infinite length, the CI will be symmetrical, so that r=CI_(1-α)/2, with the CI of QTL map location in units of proportion of recombination. Generally, the CI of map location is given in cM. To convert cM to proportion of recombination, cM are first converted to percent recombination using an appropriate mapping function, such as the Haldane mapping function, and then to proportion of recombination by dividing by 100. Thus, r=CI*_(1-α)/200, where CI*_(1-α) is the CI expressed as percent recombination. Since R=rkN with k=0.5 for the BC design, r=2R/N, and substituting r=CI*_(1-α)/200 gives:

$$R = {\text{CI}}^{*}_{{(1 - \alpha )}} N/400$$

(8)

Substituting (8) in (6) gives Z_α/2=2δ/(3,200/CI*_(1-α)N)^0.5, and solving for CI*_(1-α) and N yields:

$${\text{CI}}^{*}_{{(1 - \alpha )}} = 800Z^{2}_{{\alpha /2}} /\delta ^{2} N$$

(9)

and

$$N = 800Z^{2}_{{\alpha /2}} /\delta ^{2} {\text{CI}}^{*}_{{(1 - \alpha )}} $$

(10)
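As an illustration of Eqs. 9 and 10, the following minimal Python sketch evaluates the BC formulae for a chosen α. The function names (`z_half_alpha`, `bc_ci_percent`, `bc_sample_size`) are hypothetical and not part of the original derivation; the sketch assumes the CI is expressed in percent recombination and that δ is in units of the within-genotype standard deviation.

```python
from statistics import NormalDist


def z_half_alpha(alpha: float) -> float:
    """Standard normal value Z_(alpha/2) with upper-tail probability alpha/2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def bc_ci_percent(delta: float, n: float, alpha: float = 0.05) -> float:
    """Eq. 9: CI* in percent recombination for a backcross of n individuals."""
    return 800.0 * z_half_alpha(alpha) ** 2 / (delta ** 2 * n)


def bc_sample_size(delta: float, ci_percent: float, alpha: float = 0.05) -> float:
    """Eq. 10: backcross population size needed for a target CI*."""
    return 800.0 * z_half_alpha(alpha) ** 2 / (delta ** 2 * ci_percent)


if __name__ == "__main__":
    # Example: standardized contrast delta = 0.25, target 95% CI* of 10.
    n_needed = bc_sample_size(delta=0.25, ci_percent=10.0)
    print(f"N = {n_needed:.0f}")
    print(f"CI* at that N = {bc_ci_percent(0.25, n_needed):.2f}")
```

Because Eqs. 9 and 10 are the same relation solved for different unknowns, feeding the output of one function into the other should return the starting value.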

In a similar, but more complex derivation (see the Appendix) for the F₂ design, the contrast between the appropriate marker genotype groups, D′, is: 2δ/(2−r), where E(M₁)=δ=2d; and SE²(D′)=32/(2−r)²rN (Eqs. 18 and 19). Thus for the F₂ design:

$$Z_{{\alpha /2}} = 2\delta /(32/rN)^{{0.5}} = \delta /(2/rkN)^{{0.5}} $$

(11)

For the F₂ design only homozygotes for alternative marker alleles are used to construct the contrast (Appendix). Therefore, k=0.25 for this design. Letting r=CI*_(1-α)/200, substituting in (11), and solving for CI*_(1-α) and N yields:

$${\text{CI}}^{*}_{{(1 - \alpha )}} = 1,600Z^{2}_{{\alpha /2}} /\delta ^{2} N,$$

(12)

and

$$N = 1,600Z^{2}_{{\alpha /2}} /\delta ^{2} {\text{CI}}^{*}_{{(1 - \alpha )}} $$

(13)

Taking α=0.05, so that Z_α/2=1.96 and substituting δ=d+h and δ=2d in Eqs. 9 and 12 for CI*_(1-α) yields CI*_(1-α)=3,073/(d+h)²N for a BC design, and CI*_(1-α)=1,537/d²N for an F₂ design. These equations are virtually identical to those obtained by extensive simulation in Darvasi and Soller (1997).
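The two constants quoted above follow directly from Eqs. 9 and 12. A short arithmetic check, written as a sketch with hypothetical variable names:

```python
# Z_(alpha/2) for alpha = 0.05, as used in the text.
z = 1.96

# Eq. 9 with delta = d + h: CI* = 800 Z^2 / ((d + h)^2 N).
bc_constant = 800 * z ** 2

# Eq. 12 with delta = 2d, so delta^2 = 4 d^2: CI* = (1,600 Z^2 / 4) / (d^2 N).
f2_constant = 1600 * z ** 2 / 4

print(round(bc_constant), round(f2_constant))  # prints "3073 1537"
```

The F₂ constant is half the BC constant because the doubling from k=0.25 is more than offset by the factor of four from δ²=(2d)².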

Equations 7 and 11 can readily be generalized to other mapping designs according to the corresponding values for δ, the expectation of the contrast for the marker M₁ located at the QTL, and k, the proportion of the mapping population in each marker genotype group making up the contrasts for the markers M₁ and M₂. More complex mapping designs that accumulate recombination events, such as advanced intercross lines (AIL, Darvasi and Soller 1995), full-sib intercross lines (FSIL, Song et al. 1999), and recombinant inbred lines (RIL, Soller and Beckmann 1990) differ from the BC and F₂ designs in the proportion of recombination per cM. To take this into account, Eq. 7 must be modified as follows to convert the proportion of recombination, r, which is the proportion of recombination in an F₂ or BC generation, into the effective accumulated proportion of recombination obtained in generation g:

$$Z_{{\alpha /2}} = \delta _{{\text{D}}} /({\text{2}}/t_{{\text{D}}} k_{{\text{D}}} rN)^{{0.5}} $$

(14)

where δ_D and k_D are the appropriate δ and k values for the given design, and t_D is a factor that converts the proportion of recombination in an F₂ or BC generation into the effective accumulated proportion of recombination obtained in generation g. Substituting r=CI*_(1-α)/200 in (14) gives the general expressions:

$${\text{CI}}^{*}_{{(1 - \alpha )}} = 400Z^{2}_{{\alpha /2}} /{\left( {\delta ^{2}_{{\text{D}}} t_{{\text{D}}} k_{{\text{D}}} N} \right)}$$

(15)

$$N = 400Z^{2}_{{\alpha /2}} /{\left( {\delta ^{2}_{{\text{D}}} t_{{\text{D}}} k_{{\text{D}}} {\text{CI}}^{*}_{{(1 - \alpha )}} } \right)}$$

(16)
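Equations 15 and 16 can be sketched as two small Python functions. The names `ci_percent` and `sample_size` and their argument names are illustrative assumptions. Setting t_D=1 with k_D=0.5 or k_D=0.25 should recover the BC and F₂ expressions (Eqs. 9 and 12).

```python
def ci_percent(delta_d: float, t_d: float, k_d: float, n: float,
               z: float = 1.96) -> float:
    """Eq. 15: CI* (percent recombination) for a design with parameters
    delta_D (contrast), t_D (recombination accumulation factor) and
    k_D (proportion of the population per marker genotype group)."""
    return 400.0 * z ** 2 / (delta_d ** 2 * t_d * k_d * n)


def sample_size(delta_d: float, t_d: float, k_d: float, ci: float,
                z: float = 1.96) -> float:
    """Eq. 16: population size needed for a target CI* (percent recombination)."""
    return 400.0 * z ** 2 / (delta_d ** 2 * t_d * k_d * ci)


if __name__ == "__main__":
    # BC special case: t_D = 1, k_D = 0.5 gives 800 Z^2 / (delta^2 N).
    print(ci_percent(delta_d=0.5, t_d=1.0, k_d=0.5, n=1000))
    # F2 special case: t_D = 1, k_D = 0.25 gives 1,600 Z^2 / (delta^2 N).
    print(ci_percent(delta_d=0.5, t_d=1.0, k_d=0.25, n=1000))
```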

Results

The predicted CI with a saturated genetic map for various experimental designs are given in Table 1. BC, F₂, and AIL designs are assumed to be initiated from fully inbred parental lines. Half-sib and full-sib designs in outcrossing populations (Soller and Genizi 1978) are the equivalent of BC and F₂ designs respectively, assuming that the parents of the families are heterozygous at the QTL and that the family size is sufficiently large so that the marker-QTL phase can be determined virtually without error. The FSIL design is a variant of the AIL design, adapted to outcrossing populations. It is initiated as a large F₁ family produced by a mating between two individuals, and is maintained by continued random mating within the families of each generation. In the cumulative AIL and FSIL designs (CAIL and CFSIL), progeny are phenotyped and genotyped in each generation from the F₂ generation on, to build a cumulating mapping population consisting of individuals from all of the generations. The mapping resolution of such a population will depend on the accumulated recombinants over all generations.

Table 1 Confidence interval of QTL location with a saturated genetic map for various experimental designs. δ represents the contrast value between marker genotype groups for the quantitative trait (codominance is assumed), t_D the effective proportion of recombination per cM, k_D the proportion of the mapping population in each of the marker genotype groups forming the mapping contrast, CI_1-α the length of the CI of 1-α in percent recombination, CI_(0.95) the length of the 95% CI in percent recombination, and N(10, 0.25) the required total population size for CI_(0.95)=10 cM with d=0.25. The codes representing the design of the population groups are as follows: BC backcross, AIL(g) advanced intercross line carried to generation g, FSIL(g) full-sib intercross line carried to generation g, CAIL(g) and CFSIL(g) cumulative AIL and FSIL respectively carried to generation g, RIL(n) recombinant inbred lines with n individuals phenotyped per line. The additive effect at the QTL is represented as d. σ_f =(2h²+(1−h²)/n)^0.5, where h² equals heritability in the narrow sense and h²=0.25 is assumed. Z_α/2 gives the value of the standard normal variable with a probability of α/2. For the CAIL and CFSIL designs the number of individuals per generation is given in parenthesis, and for the RIL designs the total number of individuals phenotyped is given in parenthesis


In the AIL design, from the F₂ generation on, only half of the chromosomal regions in any generation will be in the heterozygous state. Thus recombination accumulates at a rate of t_D=0.5g, relative to the BC and F₂ designs, where g is the number of generations. In the FSIL design, three-quarters of the descendants of any one of the four parental chromosomes will be in the heterozygous state. Thus recombination accumulates at a rate of t_D=0.75g, relative to the BC and F₂ designs. In the RIL design the final proportion of recombination over small distances is twice that in the F₂ generation, so that t_D=2.0 (Soller and Beckmann 1990).

Assuming codominance at the QTL, the contrast values will be δ_D=d for the BC design and δ_D=2d for the F₂ and AIL designs. The effect for the FSIL designs approaches δ_D=2d, depending on the specific configuration of the marker and the QTL alleles (Song et al. 1999). The contrast value for the RIL design depends on the number of individuals phenotyped in each line, and will be δ_D=2d/σ_f, where σ²_f is the variance among means of inbred lines: σ²_f=2h²+(1−h²)/n, where h² is the heritability in the narrow sense and n is the number of individuals scored for the quantitative trait from each RIL (Soller and Beckmann 1990).

As in the BC design, k_D=0.5 for the RIL design, for which one half of all lines are homozygous for one allele at each marker, and the other half are homozygous for the alternative allele. In the F₂ and AIL designs one quarter of the population are in each of the two contrasted marker genotype groups, so that k_D=0.25. In the FSIL population, at the optimal configuration of marker and QTL alleles, the proportions will be the equivalent of k_D=0.25 in each marker genotype group (Song et al. 1999).

Table 1 also shows the general expressions for CI_(1-α) as a function of Z_α/2, δ_D, t_D, k_D and N, and specific values for CI_(0.95). These expressions demonstrate that the CI is inversely proportional to population size and the square of the contrast value. Thus, methods such as multi-trait analysis that increase the contrast value (Korol et al. 1995) can markedly reduce the CI for given population size. General expressions for N as a function of Z_α/2, δ_D, t_D, k_D, and CI_(1-α), as derived from Eq. 16, are also presented.

The required numbers of individuals genotyped to obtain a CI_(0.95) of 10 cM with d=0.25 are listed in the right-hand column. g=6 is assumed for the AIL and FSIL designs, and h²=0.25 for the RIL design. Sample sizes are quite large for the BC and F₂ designs, about 5,000 and 2,500 respectively. The numbers of genotyped individuals required by the AIL and FSIL designs are one-third and one-fifth respectively those for the F₂. Values for the FSIL assume an optimal configuration of marker and QTL alleles. Cumulative AIL and cumulative FSIL with g=6 require just twice the total numbers required for a single generation, but the numbers per generation are only 40% of the total. Thus, these designs are useful when total facilities are limited and not elastic. For the RIL design, the number of individuals genotyped is the total number of lines, while the number of individuals that must be scored for the quantitative trait is the number of lines multiplied by the number of individuals per line. This value is given in Table 1 in parenthesis. When a single individual is phenotyped for each line, 768 RIL have mapping power equivalent to an AIL (g=6); but 330 RIL, each evaluated on 20 individuals (6,600 individuals scored for the quantitative trait), have the same mapping power as a BC sample of 5,000 individuals. The formulas given in Table 1 demonstrate that very large samples are required for high resolution mapping. Achieving a 1 cM CI_(0.95) would require 50,000 BC individuals, 8,000 AIL (g=6), or 3,300 RIL with 20 individuals scored per line for a total of 66,000 phenotypes.
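The sample sizes discussed above can be recomputed from Eq. 16 using the δ_D, t_D, and k_D values given in the text. The sketch below is illustrative only: the `Design` class and function names are hypothetical, Z_α/2=1.96 is assumed, the target CI of 10 is treated as percent recombination, the FSIL row assumes the optimal marker-QTL configuration, and the cumulative designs (CAIL, CFSIL) are omitted because their parameters are not stated in the text.

```python
from dataclasses import dataclass
from math import sqrt

Z_95 = 1.96  # Z_(alpha/2) for alpha = 0.05


@dataclass(frozen=True)
class Design:
    name: str
    delta: float  # contrast value delta_D
    t: float      # recombination accumulation factor t_D
    k: float      # proportion per marker genotype group k_D


def sigma_f(h2: float, n: int) -> float:
    """SD among RIL means: sqrt(2 h^2 + (1 - h^2) / n)."""
    return sqrt(2.0 * h2 + (1.0 - h2) / n)


def required_n(design: Design, ci_percent: float, z: float = Z_95) -> float:
    """Eq. 16 for a given design and target CI* (percent recombination)."""
    return 400.0 * z ** 2 / (design.delta ** 2 * design.t * design.k * ci_percent)


def table1_designs(d: float = 0.25, g: int = 6, h2: float = 0.25) -> list:
    """Design parameters as described in the Results (codominance assumed)."""
    return [
        Design("BC", d, 1.0, 0.5),
        Design("F2", 2 * d, 1.0, 0.25),
        Design(f"AIL({g})", 2 * d, 0.5 * g, 0.25),
        Design(f"FSIL({g})", 2 * d, 0.75 * g, 0.25),
        Design("RIL(1)", 2 * d / sigma_f(h2, 1), 2.0, 0.5),
        Design("RIL(20)", 2 * d / sigma_f(h2, 20), 2.0, 0.5),
    ]


if __name__ == "__main__":
    for design in table1_designs():
        print(f"{design.name:8s} N = {required_n(design, 10.0):7.0f}")
```

For the RIL rows, the result is the number of lines; the number of phenotypes is that value multiplied by the number of individuals scored per line.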

The equations for the BC and F₂ designs are virtually identical to Eqs. 1 and 2 of Darvasi and Soller (1997). The only differences are that the values of the numerator constant are 3,073 and 1,537, instead of 3,000 and 1,500 respectively, and that the CI is measured in percent recombination, rather than in cM. The value of 3,073 is well within the CI for the empirical estimate of the corresponding parameter derived by Darvasi and Soller (1997) and denoted “k” by them. Furthermore, Darvasi and Soller (1997) slightly underestimated “k”, because they simulated a chromosome of 100 cM. This imposes an artificial upper limit on the CI. This problem can also be noted in Fig. 2 of Darvasi and Soller (1997), where they used only a range of 1–20 cM to estimate “k”.

The analytical formula derived can be used for any CI up to 50% recombination on either side of the estimated QTL location. The expressions in the sixth column of Table 1 can also be used to derive the minimum value of d²N for which a valid QTL CI can be derived. A CI_(0.95)≥100 will include every point along the chromosome with up to 50% recombination relative to the point with the maximum test statistic, which means that a CI_(0.95)>100 is essentially infinite. For the BC design, a valid CI_(0.95) is obtained only if d²N>30.73. For example, if N=1,000, then a valid CI_(0.95) can be obtained only for d>0.175. If the estimated QTL position is near one end of the chromosome, a “one-tailed” CI would be more appropriate than a “two-tailed” CI that includes nonexistent DNA.
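The validity threshold in this paragraph amounts to requiring CI_(0.95)<100 in the BC expression CI_(0.95)=3,073/d²N. A minimal sketch, with a hypothetical function name and the constant 3,073 taken from the text:

```python
from math import sqrt


def min_effect_for_valid_ci95(n: float, constant: float = 3073.0,
                              max_ci: float = 100.0) -> float:
    """Smallest d for which constant / (d^2 N) stays below max_ci
    (BC design with codominance, CI in percent recombination)."""
    return sqrt(constant / (max_ci * n))


if __name__ == "__main__":
    # With N = 1,000 the bound is d^2 > 0.03073, i.e. d just above 0.175.
    print(f"{min_effect_for_valid_ci95(1000):.3f}")
```

Passing `constant=1537.0` would give the corresponding bound for the F₂ design under the same assumptions.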

Discussion

The formulas we have derived here are only asymptotically correct. During analysis of an actual data set, the likelihood profile across the chromosome may be far from symmetrical, because chromosomes are of finite length, information content differs among markers, and marker spacing is not uniform. This would be taken into account when deriving CI from the data. We emphasize, however, that the purpose of this note is to derive a priori expectations for the CI for design purposes, or for general evaluation of the overall mapping resolution of an experiment, and not to derive CIs for specific QTL from an actual data set. In estimating CI for an actual experimental data set, empirical methods, such as bootstrap analysis (Visscher et al. 1996), can be used to obtain the CI of estimated QTL map locations.

Although this study assumed that an infinite number of markers were genotyped, Darvasi and Soller (1994) demonstrated that for most experimental designs if genotyping costs are large compared to phenotyping costs, the power to detect QTL is economically optimized by phenotyping many individuals for fewer markers. For crosses between inbred lines or half-sib designs the optimum marker spacing is 80 cM, provided that unlimited numbers of individuals are available for phenotyping. Even if phenotyping costs are large relative to genotyping costs, the optimum marker spacing is no less than 20 cM if all markers are completely informative (Darvasi et al. 1993).

Percent recombination is close to cM for small values, but underestimates cM for larger values for most commonly used mapping functions. Measuring the CI in cM, rather than percent recombination, would not affect the relationship between the simulated and the predicted CI if both are given in the same units, as long as the CI_(0.95) measured in percent recombination is <100.
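To make the relationship between the two units concrete, the following sketch converts between cM and percent recombination using the Haldane mapping function, r=0.5(1−e^(−2x)) with x in Morgans. The function names are hypothetical, and the conversion of a whole CI assumes a symmetrical interval, so each half is converted separately and the result doubled.

```python
from math import exp, log


def cm_to_percent_recombination(cm: float) -> float:
    """Haldane: percent recombination for a map distance in cM."""
    return 50.0 * (1.0 - exp(-cm / 50.0))


def percent_recombination_to_cm(percent: float) -> float:
    """Inverse Haldane: map distance in cM for a percent recombination < 50."""
    if not 0.0 <= percent < 50.0:
        raise ValueError("percent recombination must be in [0, 50)")
    return -50.0 * log(1.0 - percent / 50.0)


def ci_percent_to_cm(ci_percent: float) -> float:
    """Convert a symmetrical CI* (percent recombination, < 100) to cM
    by converting each half-interval separately."""
    return 2.0 * percent_recombination_to_cm(ci_percent / 2.0)


if __name__ == "__main__":
    for ci in (1.0, 10.0, 40.0):
        print(f"CI* = {ci:5.1f}%  ->  {ci_percent_to_cm(ci):6.2f} cM")
```

For short intervals the two units nearly coincide, while for longer intervals the cM value exceeds the percent recombination value, in line with the statement above.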

The equations derived in this study do not account for the fact that the number of events of recombination in a finite sample is also finite. Although the expectation of the number of recombinants will be equal to 2R, the number of recombinants in a finite sample will have a binomial distribution, which will increase the variance of the standard error of the contrast. As noted by Kruglyak and Lander (1995) in the BC design, N events of recombination per Morgan are expected in a sample of N individuals. If in the sample of N individuals, there were no events of recombination between point x₁ and x₂ on the chromosome, then the probability of QTL location will be equal across the chromosomal segment x₁–x₂. [In this case the likelihood function is completely flat between x₁ and x₂, and has first and second derivatives of zero. This is one reason why generic software, such as Proc NLIN (SAS 1999), often has difficulty obtaining QTL CI.] Thus even for a gene with complete heritability, the mean length of the “critical interval” for gene location will be 2/N in units of Morgans, or 200/N in cM. The critical interval, as defined by Kruglyak and Lander (1995), differs from the CI in that with complete heritability there is zero probability that the gene is outside this interval.

The effect of the finite distribution of events of recombination will be negligible, unless the QTL effect is very large relative to the phenotypic standard deviation. For example even if δ=0.88, then for the BC design, CI_(0.95)=20 for N=200. In this case, 2R=20, and ten recombinants are expected in each marker class. When simulating, taking into account the binomial distribution of the number of recombinants, the standard error of the contrast was increased by 3.7%. As the effect of the QTL decreases, the discrepancy from the analytical formula will also decrease.
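The size of this effect can be explored without simulation by averaging SE(D) over the binomial distribution of recombinant counts. The sketch below rests on assumptions that are not in the original text: each marker genotype group contains exactly kN=100 individuals, the recombinant counts in the two groups are independent Binomial(100, 0.1) variables, and samples with zero recombinants in a group are excluded. It is not claimed to reproduce the simulated figure quoted above.

```python
from math import comb, sqrt


def binomial_pmf(n: int, p: float) -> list:
    """Probabilities of 0..n successes for a Binomial(n, p) variable."""
    return [comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def mean_se_inflation(group_size: int, r: float) -> float:
    """Ratio of E[SE(D)] under binomial recombinant counts to the
    fixed-count value sqrt(8 / R), with R = group_size * r (Eq. 5)."""
    pmf = binomial_pmf(group_size, r)
    p_positive = 1.0 - pmf[0]  # condition on at least one recombinant per group
    expected_se = 0.0
    for r1 in range(1, group_size + 1):
        for r2 in range(1, group_size + 1):
            weight = pmf[r1] * pmf[r2] / p_positive ** 2
            # SE^2(D) = 4 (1/R1 + 1/R2) when the two counts differ.
            expected_se += weight * sqrt(4.0 * (1.0 / r1 + 1.0 / r2))
    fixed_se = sqrt(8.0 / (group_size * r))
    return expected_se / fixed_se


if __name__ == "__main__":
    ratio = mean_se_inflation(group_size=100, r=0.1)
    print(f"SE inflation: {100.0 * (ratio - 1.0):.1f}%")
```

Because 1/R is a convex function of R, the averaged standard error is expected to exceed the fixed-count value, and the excess shrinks as the expected number of recombinants grows.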

We were informed by a reviewer that Visscher and Goddard (2004), using somewhat different methodology, also derived the same formulas to predict CI_(0.95) for the BC and F₂ designs.

References

Darvasi A, Soller M (1994) Optimum spacing of genetic markers for determining linkage between marker loci and quantitative trait loci. Theor Appl Genet 89:351–357
Darvasi A, Soller M (1995) Advanced intercross lines, an experimental population for fine genetic mapping. Genetics 141:1199–1207
Darvasi A, Soller M (1997) A simple method to calculate resolving power and confidence interval of QTL map location. Behav Genet 27:125–132
Darvasi A, Vinreb A, Minke V, Weller JI, Soller M (1993) Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics 134:943–951
Korol AB, Ronin YI, Kirzhner VM (1995) Interval mapping of quantitative trait loci employing correlated trait complexes. Genetics 140:1137–1147
Kruglyak L, Lander ES (1995) High-resolution genetic mapping of complex traits. Am J Hum Genet 57:1212–1223
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199
Ronin Y, Korol A, Shtemberg M, Nevo E, Soller M (2003) High resolution mapping of quantitative trait loci by selective recombinant genotyping. Genetics 174:1757–1777
SAS/STAT User’s Guide, Version 8 (1999) SAS Institute, Cary
Simpson SP (1989) Detection of linkage between quantitative trait loci and restriction fragment length polymorphisms using inbred lines. Theor Appl Genet 77:815–819
Simpson SP (1992) Correction: detection of linkage between quantitative trait loci and restriction fragment length polymorphisms using inbred lines. Theor Appl Genet 85:110–111
Soller M, Beckmann JS (1990) Marker-based mapping of quantitative trait loci using replicated progenies. Theor Appl Genet 80:205–208
Soller M, Genizi A (1978) The efficiency of experimental designs for the detection of linkage between a marker locus and a locus affecting a quantitative trait in segregating populations. Biometrics 34:47–55
Song JZ, Soller M, Genizi A (1999) The full-sib intercross line (FSIL) design: a QTL mapping design for outcrossing species. Genet Res 73:61–73
VanRaden PM, Weller JI (1994) A simple method to locate and estimate effects of individual genes with a saturated genetic marker map. J Dairy Sci 77[Suppl 1]:249
Visscher PM, Goddard ME (2004) Prediction of the confidence interval of QTL location. Behav Genet 34:477–482
Visscher PM, Thompson R, Haley CS (1996) Confidence intervals in QTL mapping by bootstrapping. Genetics 143:1013–1020

Acknowledgements

This research was supported by a grant from the Israel Milk Marketing Board, the US-Israel Binational Agricultural Research and Development fund (BARD) and FP5 program of the EU under the BovMAS proposal. We thank A. Genizi for useful discussions, and the reviewers for their comments.

Author information

Authors and Affiliations

Institute of Animal Sciences, Agricultural Research Organization, The Volcani Center, P.O. Box 6, Bet Dagan, 50250, Israel
Joel Ira Weller
Department of Genetics, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
Morris Soller

Corresponding author

Correspondence to Joel Ira Weller.

Additional information

Communicated by H.C. Becker

Appendix

Derivation of a formula for CI of QTL location for an F₂ mapping population using only the recombinant progeny

The contrast for an F₂ mapping population is based on individuals homozygous for alternative marker alleles. Thus, to be included in the recombinant F₂ mapping population, an individual must be recombinant for at least one chromosome, and homozygous for at least one of the marker alleles. Three of the nine possible F₂ marker genotypes do not meet these criteria. These are the two homozygous parental types (M₁M₂/M₁M₂ and m₁m₂/m₁m₂) and the double recombinant type (M₁m₂/m₁M₂). The remaining six F₂ marker genotypes that meet these criteria are listed in Table 2. The genotypic value of each genotype, assuming that the QTL is located at marker M₁, and the expected number of individuals having that genotype in the mapping population are also listed in the table.

Table 2 Composition of the F₂ including only recombinant progeny


The contrast for an F₂ mapping population is composed of the expected mean value of marker genotype groups that are homozygous for the alternative markers M₁, m₁, M₂, and m₂. Each of these, however, is composed of two recombinant marker genotype groups. For example, the marker genotype group M₂M₂ is composed of the recombinant genotype groups: m₁M₂/M₁M₂ (class E in Table 2) with genotypic value h and frequency 2(1−r)r/4 in the entire F₂ population, and marker genotype group m₁M₂/m₁M₂ (class D in Table 2) with genotypic value −d, and frequency r²/4. The mean genotypic value of the M₂M₂ group including recombinants only is the mean of the genotypic values of the classes E and D, weighted by their relative frequency in the M₂M₂ recombinant group i.e., 2(1−r)r/[2(1−r)r+r²] and r²/[2(1−r)r+r²], which simplifies to 2(1−r)/(2−r) and r/(2−r), respectively. These relative frequencies are the “weighting factors” listed in column three of Table 2. The contrast for the F₂ mapping population, D′, is computed as follows:

$$D' = (\underline{{\text{M}}} _{1} \underline{{\text{M}}} _{1} - \underline{{\text{m}}} _{1} \underline{{\text{m}}} _{1} ) - (\underline{{\text{M}}} _{2} \underline{{\text{M}}} _{2} - \underline{{\text{m}}} _{2} \underline{{\text{m}}} _{2} )$$

(17)

Letting A, B, C, D, E, and F represent their respective genotypic values, and letting k₁=2(1−r)/(2−r), and k₂=r/(2−r), so that k₁+k₂=1, we have:

$$\begin{array}{ll} \underline{{\text{M}}} _{1} \underline{{\text{M}}} _{1} & = k_{1} {\text{A}} + k_{2} {\text{B}} \\ \underline{{\text{m}}} _{1} \underline{{\text{m}}} _{1} & = k_{1} {\text{C}} + k_{2} {\text{D}} \\ \underline{{\text{M}}} _{2} \underline{{\text{M}}} _{2} & = k_{1} {\text{E}} + k_{2} {\text{D}} \\ \underline{{\text{m}}} _{2} \underline{{\text{m}}} _{2} & = k_{1} {\text{F}} + k_{2} {\text{B}} \\ \end{array} $$

Substituting in (17) gives

$$\begin{array}{*{20}l} {{D'} \hfill} & {{ = {\left[ {{\left( {k_{1} {\text{A}} + k_{2} {\text{B}}} \right)} - {\left( {k_{1} {\text{C}} + k_{2} {\text{D}}} \right)}} \right]} - {\left[ {{\left( {k_{1} {\text{E}} + k_{2} {\text{D}}} \right)} - {\left( {k_{1} {\text{F}} + k_{2} {\text{B}}} \right)}} \right]}} \hfill} \\ {{} \hfill} & {{ = k_{1} {\text{A}} + k_{1} {\text{F}} - k_{1} {\text{C}} - k_{1} {\text{E}} + {\text{2}}k_{2} {\text{B}} - {\text{2}}k_{2} {\text{D}}} \hfill} \\ {{} \hfill} & {{ = k_{1} {\left( {{\text{A}} + {\text{F}} - {\text{C}} - {\text{E}}} \right)} + 2k_{2} {\left( {{\text{B}} - {\text{D}}} \right)}} \hfill} \\ {{} \hfill} & {{ = {\left[ {1/{\left( {2 - r} \right)}} \right]}{\left\{ {{\left[ {2{\left( {1 - r} \right)}} \right]}{\left[ {d + h + d - h} \right]} + 2r{\left[ {d + d} \right]}} \right\}}} \hfill} \\ {{} \hfill} & {{ = 4d/{\left( {2 - r} \right)}} \hfill} \\ \end{array} $$

(18)

To calculate SE(D′), note that:

$$\begin{array}{*{20}l} {{\sigma ^{2}_{{\text{A}}} = \sigma ^{2}_{{\text{F}}} = \sigma ^{2}_{{\text{C}}} = \sigma ^{2}_{{\text{E}}} = 1/[2(1 - r)rN/4] = 4/2(1 - r)rN} \hfill} \\ {{\sigma ^{2}_{{\text{B}}} = \sigma ^{2}_{{\text{D}}} = 4/r^{2} N} \hfill} \\ \end{array} $$

Thus:

$$\begin{array}{*{20}l} {{{\text{SE}}^{{\text{2}}} {\left( {D'} \right)}} \hfill} & {{ = 4{\left[ {2{\left( {1 - r} \right)}/{\left( {2 - r} \right)}} \right]}^{2} 4/{\left[ {2{\left( {1 - r} \right)}rN} \right]} + 2{\left[ {2r/{\left( {2 - r} \right)}} \right]}^{2} {\left[ {4/{\left( {r^{2} N} \right)}} \right]}} \hfill} \\ {{} \hfill} & {{ = 4{\left[ {2{\left( {1 - r} \right)}} \right]}/{\left( {2 - r} \right)}^{2} {\left[ {4/rN} \right]} + 2{\left[ {4/{\left( {2 - r} \right)}^{2} } \right]}{\left[ {4/N} \right]}} \hfill} \\ {{} \hfill} & {{ = 32{\left( {1 - r} \right)}/{\left( {2 - r} \right)}^{2} rN + 32/{\left( {2 - r} \right)}^{2} N = {\left[ {32{\left( {1 - r} \right)} + 32r} \right]}/{\left[ {{\left( {2 - r} \right)}^{2} rN} \right]}} \hfill} \\ {{} \hfill} & {{ = 32/{\left( {2 - r} \right)}^{2} rN} \hfill} \\ \end{array} $$

(19)
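Equations 18 and 19 can be checked with exact rational arithmetic. In the sketch below, the genotypic values of classes A to F (A=B=d, C=D=−d, E=F=h) are inferred from the expansion in Eq. 18 and from the description of classes D and E in the text, since Table 2 itself is not reproduced here; the function name and the numerical inputs are arbitrary illustrations.

```python
from fractions import Fraction


def f2_contrast_and_variance(d, h, r, n):
    """Return (D', SE^2(D')) built class by class, as in the Appendix."""
    k1 = 2 * (1 - r) / (2 - r)  # weight of single-recombinant classes
    k2 = r / (2 - r)            # weight of double-recombinant homozygotes
    # Genotypic values inferred from Eq. 18 (assumption, see lead-in).
    a, b, c, dd, e, f = d, d, -d, -d, h, h
    contrast = k1 * (a + f - c - e) + 2 * k2 * (b - dd)
    var_single = 4 / (2 * (1 - r) * r * n)  # variance of classes A, C, E, F
    var_double = 4 / (r ** 2 * n)           # variance of classes B, D
    variance = k1 ** 2 * 4 * var_single + (2 * k2) ** 2 * 2 * var_double
    return contrast, variance


if __name__ == "__main__":
    d, h = Fraction(1, 4), Fraction(1, 10)
    r, n = Fraction(1, 10), Fraction(1000)
    contrast, variance = f2_contrast_and_variance(d, h, r, n)
    print(contrast == 4 * d / (2 - r))               # Eq. 18
    print(variance == 32 / ((2 - r) ** 2 * r * n))   # Eq. 19
```

The dominance value h cancels from the contrast, which is why the F₂ result depends on d only.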


Cite this article

Weller, J.I., Soller, M. An analytical formula to estimate confidence interval of QTL location with a saturated genetic map as a function of experimental design. Theor Appl Genet 109, 1224–1229 (2004). https://doi.org/10.1007/s00122-004-1664-2


Received: 25 December 2003
Accepted: 20 March 2004
Published: 23 September 2004
Issue Date: October 2004
DOI: https://doi.org/10.1007/s00122-004-1664-2
