
Many disease resistance traits in agricultural crops are measured in ordered categories. The generalized linear model (GLM) methodology (McCullagh and Nelder 1999, Nelder and Wedderburn 1972, Wedderburn 1974) is an ideal tool for analyzing these traits. Ordinal traits are usually controlled by the segregation of multiple QTL and environmental factors. The genetic architecture of such traits can be studied using linkage analysis. One can analyze the association of each marker with the disease phenotype. If the marker information is fully observable, i.e., marker genotypes can be observed, the standard GLM methodology can be applied directly to the association study by screening markers of the entire genome for their association with the disease trait. Many statistical software packages, e.g., SAS (SAS Institute 2008b), have built-in functions or procedures to perform the standard GLM analysis. One can simply execute the built-in procedures many times, once for each marker, to scan the entire genome without developing a new computer program. In any genetic experiment, missing marker genotypes are unavoidable. In addition, interval mapping requires detection of association between the trait phenotype and loci that are not necessarily located at marker positions. Genotypes of these additional loci are never observed. Therefore, GLM with missing values must be applied. There is a rich literature on missing-value GLM analysis (Horton and Laird 1999, Ibrahim 1990, Ibrahim et al. 2002, 2005). The most popular method is the maximum likelihood (ML) method implemented via the EM algorithm (Horton and Laird 1999). Other methods are also available, such as multiple imputation (MI, Rubin [1987]), fully Bayesian (FB, Ibrahim et al. [2002]), and weighted estimating equations (WEE, Ibrahim et al. [2005]). A complete review of these methods can be found in Ibrahim et al. [2005]. Hackett and Weller [1995] first applied the ML method to mapping ordinal trait QTL. They took advantage of an existing software package named GeneStat for the standard GLM analysis (without missing covariates) and modified the software by incorporating a weight variable. The modified GLM for missing data duplicates the data by the number of genotypes per locus, e.g., two for a backcross population and three for an F2 population. The weight variable is simply the posterior probabilities of the missing genotypes. The weight variable is updated iteratively until the iteration converges. The modified GLM program is not necessarily simpler than a program written anew. Furthermore, the variance–covariance matrix of the estimated parameters is not available from the modified GLM algorithm. Xu et al. [2003] developed an explicit EM algorithm using the posterior probability of missing covariates as the weight variable and further provided the variance–covariance matrix of the estimated parameters by using Louis' [1982] adjustment for the information matrix. Standard deviations (square roots of the variances) of estimated parameters represent the precisions of the estimates, which are required in the final report for publication. The variance–covariance matrix of the estimated QTL effects can also be used to calculate the Wald test statistic (Wald 1943), an alternative that can replace the likelihood ratio test statistic.
Although the large-sample distribution of the likelihood ratio test gives a more accurate approximation for small and moderate-sized samples, the Wald test has a computational advantage because it does not require calculation of the likelihood function under the null model (McCulloch and Searle 2001). A missing QTL genotype usually carries partial information, which can be extracted from linked markers. This information can be used to infer the QTL genotype in several different ways. In QTL mapping for continuously distributed traits, the mixture model (Lander and Botstein 1989) is the most efficient way to take advantage of marker information. The least-squares method of Haley and Knott [1992] is the simplest way to incorporate linked markers. The performances of the weighted least-squares method of Xu [1998a,b] and the estimating equations (EE) algorithm of Feenstra et al. [2006] usually lie between those of the least-squares and mixture model methods. These methods have been successfully applied to QTL mapping for continuous traits, but they have not been investigated for ordinal trait QTL mapping. This chapter introduces several alternative GLM methods for mapping quantitative trait loci of ordinal traits.

1 Generalized Linear Model

Suppose that the disease phenotype of individual j (j = 1, …, n) is measured by an ordinal variable denoted by S j  = 1, …, p + 1, where p + 1 is the total number of disease classes and n is the sample size. Let Y j  = { Y jk },  ∀k = 1, …, p + 1, be a (p + 1) × 1 vector indicating the disease status of individual j. The kth element of Y j is defined as

$${Y}_{jk} = \left\{\begin{array}{ll} 1 & \mathrm{if\ }{S}_{j} = k \\ 0 & \mathrm{if\ }{S}_{j}\neq k \end{array}\right.$$
(10.1)

Using the probit link function, the expectation of Y jk is defined as

$${\mu }_{jk} = E({Y }_{jk}) = \Phi ({\alpha }_{k} + {X}_{j}\beta + {Z}_{j}\gamma ) - \Phi ({\alpha }_{k-1} + {X}_{j}\beta + {Z}_{j}\gamma )$$
(10.2)

where α k (with α 0 = −∞ and α p+1 = +∞) is the intercept, β is a q × 1 vector of systematic effects (not related to the effects of quantitative trait loci), and γ is an r × 1 vector of the effects of a quantitative trait locus. The symbol Φ(·) is the standardized cumulative normal function. The design matrix X j is assumed to be known, but Z j may not be fully observable because it is determined by the genotype of individual j at the locus of interest. Because the link function is probit, this type of analysis is called probit analysis. Let μ j  = { μ jk } be a (p + 1) × 1 vector. The expectation of vector Y j is E(Y j ) = μ j , and the variance matrix of Y j is

$${V }_{j} = \mathrm{var}({Y }_{j}) = {\psi }_{j} - {\mu }_{j}{\mu }_{j}^{T}$$
(10.3)

where ψ j  = diag(μ j ). The method to be developed requires the inverse of matrix V j . However, V j is not of full rank. We can use a generalized inverse of V j , such as V j − = ψ j −1, in place of V j −1. The parameter vector is θ = { α, β, γ} with a dimensionality of (p + q + r) × 1. Binary data are a special case of ordinal data with p = 1, so that there are only two categories, S j  = { 1, 2}. The expectation of Y jk is

$${\mu }_{jk} = \left\{\begin{array}{ll} \Phi ({\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma ) - \Phi ({\alpha }_{0} + {X}_{j}\beta + {Z}_{j}\gamma ) & \mathrm{for}\ k = 1 \\ \Phi ({\alpha }_{2} + {X}_{j}\beta + {Z}_{j}\gamma ) - \Phi ({\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma ) & \mathrm{for}\ k = 2 \end{array}\right.$$
(10.4)

Because α 0 = −∞ and α 2 = +∞ in the binary case, we have

$${\mu }_{jk} = \left\{\begin{array}{ll} \Phi ({\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma ) & \mathrm{for}\ k = 1 \\ 1 - \Phi ({\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma ) & \mathrm{for}\ k = 2 \end{array}\right.$$
(10.5)

We can see that μ j2 = 1 − μ j1 and

$${\Phi }^{-1}({\mu }_{ j1}) = {\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma $$
(10.6)

The link function is Φ −1(·), and thus it is called the probit link function. Once we take the probit transformation, the model becomes a linear model. Therefore, this type of model is called a generalized linear model (GLM). The ordinary linear model we learned before for continuous traits is a special case of the GLM in which the link function is simply the identity, i.e.,

$${I}^{-1}({\mu }_{ j1}) = {\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma $$
(10.7)

or simply

$${\mu }_{j1} = {\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma $$
(10.8)

Most techniques we learned for the linear model apply to the generalized linear model.
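As a concrete illustration of (10.1)–(10.3), the following minimal Python sketch computes the category probabilities μ j and the singular variance matrix V j for one individual. The parameter values, and the use of NumPy/SciPy, are our own illustrative assumptions, not part of the original derivation.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example: p + 1 = 3 ordered categories, one systematic
# effect (q = 1), and one QTL effect (r = 1).
alpha = np.array([-np.inf, -0.5, 0.8, np.inf])  # alpha_0, ..., alpha_{p+1}
beta, gamma = np.array([0.3]), np.array([0.6])
X_j, Z_j = np.array([1.0]), np.array([1.0])     # Z_j observed here

eta = X_j @ beta + Z_j @ gamma
# Eq. (10.2): mu_jk = Phi(alpha_k + eta) - Phi(alpha_{k-1} + eta)
mu_j = np.diff(norm.cdf(alpha + eta))
# Eq. (10.3): V_j = psi_j - mu_j mu_j^T with psi_j = diag(mu_j);
# V_j is singular, so psi_j^{-1} serves as a generalized inverse.
V_j = np.diag(mu_j) - np.outer(mu_j, mu_j)
print(mu_j, mu_j.sum())  # the probabilities sum to 1
```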

2 ML Under Homogeneous Variance

Let us first assume that the genotypes of the QTL are observed for all individuals. In this case, variable Z j is not missing. The log likelihood function under the probit model is

$$L(\theta ) ={ \sum }_{j=1}^{n}{L}_{ j}(\theta )$$
(10.9)

where

$${L}_{j}(\theta ) ={ \sum }_{k=1}^{p+1}{Y }_{ jk}\ln [\Phi ({\alpha }_{k} + {X}_{j}\beta + {Z}_{j}\gamma ) - \Phi ({\alpha }_{k-1} + {X}_{j}\beta + {Z}_{j}\gamma )]$$
(10.10)

and θ = { α, β, γ} is the vector of parameters. This is the simplest GLM problem, and the classical iteratively reweighted least-squares approach for GLM (Nelder and Wedderburn 1972, Wedderburn 1974) can be used without any modification. The iterative equation under the classical GLM is given below:

$${\theta }^{(t+1)} = {\theta }^{(t)} + {I}^{-1}({\theta }^{(t)})S({\theta }^{(t)})$$
(10.11)

where θ(t) is the parameter value in the current iteration, I(θ(t)) is the information matrix, and S(θ(t)) is the score vector, both evaluated at θ(t). We can interpret

$$\Delta \theta = {I}^{-1}({\theta }^{(t)})S({\theta }^{(t)})$$
(10.12)

in (10.11) as the adjustment to θ(t) that moves the solution in the direction of the ultimate maximum likelihood estimate of θ. Equation (10.3) shows that the variance of Y j is a function of the expectation of Y j . This special relationship leads to a convenient way to calculate the information matrix and the score vector, as given by Wedderburn [1974],

$$I(\theta ) ={ \sum \nolimits }_{j=1}^{n}{D}_{ j}^{T}{W}_{ j}{D}_{j}$$
(10.13)

and

$$S(\theta ) ={ \sum \nolimits }_{j=1}^{n}{D}_{ j}^{T}{W}_{ j}({Y }_{j} - {\mu }_{j})$$
(10.14)

where W j  = ψ j −1. Therefore, the increment (adjustment) of the parameter can be estimated using the following iteratively reweighted least-squares formula:

$$\Delta \theta ={ \left [{\sum \nolimits }_{j=1}^{n}{D}_{ j}^{T}{W}_{ j}{D}_{j}\right ]}^{-1}\left [{\sum \nolimits }_{j=1}^{n}{D}_{ j}^{T}{W}_{ j}({Y }_{j} - {\mu }_{j})\right ]$$
(10.15)

where D j is a (p + 1) × (p + q + r) matrix of the first partial derivatives of μ j with respect to the parameters, and W j  = V j − = ψ j −1 is the weight matrix. Matrix D j can be partitioned into three blocks,

$${ D}_{j} = \frac{\partial {\mu }_{j}} {\partial {\theta }^{T}} = \left [\begin{array}{*{20}c} \frac{\partial {\mu }_{j}} {\partial {\alpha }^{T}} & \frac{\partial {\mu }_{j}} {\partial {\beta }^{T}} & \frac{\partial {\mu }_{j}} {\partial {\gamma }^{T}}\\ \end{array} \right ]$$
(10.16)

The first block, ∂μ j /∂α T = {∂μ jk /∂α l }, is a (p + 1) × p matrix with

$$\begin{array}{rlrlrl} \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k-1}} & = -\phi ({\alpha }_{k-1} + {X}_{j}\beta + {Z}_{j}\gamma ) & & \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k}} & = \phi ({\alpha }_{k} + {X}_{j}\beta + {Z}_{j}\gamma ) & & \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{l}} & = 0,\;\forall l\neq \{k - 1\;,k\} &\end{array}$$
(10.17)

The second block, ∂μ j /∂β T = {∂μ jk /∂β}, is a (p + 1) × q matrix with

$$\frac{\partial {\mu }_{jk}} {\partial \beta } = {X}_{j}^{T}[\phi ({\alpha }_{ k} + {X}_{j}\beta + {Z}_{j}\gamma ) - \phi ({\alpha }_{k-1} + {X}_{j}\beta + {Z}_{j}\gamma )]$$
(10.18)

The third block, ∂μ j /∂γ T = {∂μ jk /∂γ}, is a (p + 1) × r matrix with

$$\frac{\partial {\mu }_{jk}} {\partial \gamma } = {Z}_{j}^{T}[\phi ({\alpha }_{ k} + {X}_{j}\beta + {Z}_{j}\gamma ) - \phi ({\alpha }_{k-1} + {X}_{j}\beta + {Z}_{j}\gamma )]$$
(10.19)

In all the above partial derivatives, the range of k is k = 1, …, p + 1. The sequence of parameter values during the iteration process converges to a local maximum likelihood estimate, denoted by \({\hat{\theta}}\). The variance–covariance matrix of \({\hat{\theta}}\) is approximately var(\({\hat{\theta}}\)) = I −1(\({\hat{\theta}}\)), which is a by-product of the iteration process. Here, we are actually dealing with a situation where the QTL overlaps with a fully informative marker because the observed marker genotypes represent the genotypes of the disease locus. If the QTL of interest does not overlap with any marker, the genotype of the QTL is not observable, i.e., Z j is missing. The classical GLM does not apply directly to such a situation. The missing value Z j still carries some information due to its linkage with markers. Again, we use an F2 population as an example to show how to handle the missing value of Z j . The ML estimate of the parameters under the homogeneous variance model is obtained simply by substituting Z j with the conditional expectation of Z j given flanking marker information. Let

$${p}_{j}(2 - g) =\Pr ({Z}_{j} = {H}_{g}\vert \mathrm{marker}),\forall g = 1,2,3$$
(10.20)

be the conditional probability of the QTL genotype given marker information, where the marker information can be drawn either from two flanking markers (interval mapping, Lander and Botstein 1989) or from multiple markers (multipoint analysis, Jiang and Zeng 1997). Note that p j (2 − g) is not p j multiplied by (2 − g); rather, it is a notation for the probabilities of the three genotypes. For g = 1, 2, 3, we have p j (+1), p j (0), and p j (−1), respectively, where p j (−1), etc., are defined earlier in Chap. 9. Vector H g for g = 1, 2, 3 is also defined in Chap. 9 as the genotype indicator variable.

Using marker information, we can calculate the expectation of Z j , which is

$${U}_{j} = E({Z}_{j}) ={ \sum }_{g=1}^{3}{p}_{ j}(2 - g){H}_{g}$$
(10.21)

The method is called ML under the homogeneous residual variance because when we substitute Z j by U j , the residual error variance is no longer equal to unity; rather, it is inflated, and the inflation varies across individuals. The homogeneous variance model, however, assumes that the residual variance is constant across individuals. This method is the ordinal trait analog of the Haley and Knott [1992] method of QTL mapping.
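To make the procedure concrete, here is a sketch (in Python; the function name and data layout are ours) of one iteratively reweighted least-squares step, eq. (10.15), with the missing genotype Z j replaced by U j from (10.21). It assumes each row of `H` is a genotype code H g and `P` holds the conditional probabilities p j (2 − g).

```python
import numpy as np
from scipy.stats import norm

def irls_step(theta, X, P, H, Y):
    """One step of eq. (10.15) under the homogeneous variance model.
    theta = (alpha_1..alpha_p, beta, gamma); X: (n,q) covariates;
    P: (n,3) conditional genotype probabilities; H: (3,r) genotype
    codes; Y: (n,p+1) category indicators."""
    n, pp1 = Y.shape
    p, q, r = pp1 - 1, X.shape[1], H.shape[1]
    a = np.concatenate(([-np.inf], theta[:p], [np.inf]))
    beta, gamma = theta[p:p + q], theta[p + q:]
    U = P @ H                                  # U_j = E(Z_j), eq. (10.21)
    lhs = np.zeros((p + q + r, p + q + r))
    rhs = np.zeros(p + q + r)
    for j in range(n):
        eta = X[j] @ beta + U[j] @ gamma
        pdf = norm.pdf(a + eta)                # phi(alpha_k + eta), k = 0..p+1
        mu = np.diff(norm.cdf(a + eta))        # eq. (10.2) with U_j for Z_j
        D = np.zeros((p + 1, p + q + r))       # eqs. (10.16)-(10.19)
        for k in range(p + 1):
            if k >= 1:
                D[k, k - 1] = -pdf[k]          # wrt alpha_{k-1}
            if k <= p - 1:
                D[k, k] = pdf[k + 1]           # wrt alpha_k
            D[k, p:p + q] = X[j] * (pdf[k + 1] - pdf[k])
            D[k, p + q:] = U[j] * (pdf[k + 1] - pdf[k])
        W = np.diag(1.0 / np.maximum(mu, 1e-12))   # W_j = psi_j^{-1}
        lhs += D.T @ W @ D                     # information, eq. (10.13)
        rhs += D.T @ W @ (Y[j] - mu)           # score, eq. (10.14)
    return theta + np.linalg.solve(lhs, rhs)   # eq. (10.11)
```

Iterating `theta = irls_step(theta, X, P, H, Y)` until the change in `theta` is negligible gives the ML estimate; the final `lhs` then approximates the information matrix I(\({\hat{\theta}}\)).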

3 ML Under Heterogeneous Variance

The homogeneous variance model is only a first-moment approximation because the uncertainty of the estimated Z j has been ignored. Let

$${\Sigma }_{j} = \mathrm{var}({Z}_{j}) ={ \sum }_{g=1}^{3}{p}_{ j}(2 - g){H}_{g}^{T}{H}_{ g} - {U}_{j}^{T}{U}_{ j}$$
(10.22)

be the conditional covariance matrix for Z j . Note that model (10.2) with Z j substituted by U j is

$${\mu }_{jk} = E({Y }_{jk}) = \Phi ({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma ) - \Phi ({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma )$$
(10.23)

An underlying assumption of this probit model is that the residual error variance for the underlying liability of the disease trait is unity across individuals. Once U j is used in place of Z j , the residual error variance becomes

$${\sigma }_{j}^{2} = {\gamma }^{T}{\Sigma }_{ j}\gamma + 1$$
(10.24)

This is an inflated variance, and it is heterogeneous across individuals. In order to apply the probit model, we need to rescale the model effects as follows (Xu and Hu 2010):

$${\mu }_{jk} = \Phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )\right ] - \Phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma )\right ]$$
(10.25)

This modification leads to a change in the partial derivatives of μ j with respect to the parameters. Corresponding changes in the derivatives are given below.

$$\begin{array}{rcl} \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k-1}} & = & -\frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma )\right ] \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k}} & = & \frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )\right ] \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{l}} & = & 0,\;\forall l\neq \{k - 1,k\} \end{array}$$
(10.26)
$$\frac{\partial {\mu }_{jk}} {\partial \beta } = \frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )\right ]{X}_{j}^{T} - \frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma )\right ]{X}_{j}^{T}$$
(10.27)

and

$$\begin{array}{rcl} \frac{\partial {\mu }_{jk}} {\partial \gamma } & =& \frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )\right ]\left [{U}_{j}^{T} - \frac{1} {{\sigma }_{j}^{2}} ({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma ){\Sigma }_{j}\gamma \right ] \\ & & -\frac{1} {{\sigma }_{j}}\phi \left [ \frac{1} {{\sigma }_{j}}({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma )\right ]\left [{U}_{j}^{T} - \frac{1} {{\sigma }_{j}^{2}} ({\alpha }_{k-1} + {X}_{j}\beta + {U}_{j}\gamma ){\Sigma }_{j}\gamma \right ] \\ & & \end{array}$$
(10.28)

The iteration formula remains the same as (10.11) except that the modified weight and partial derivatives are used under the heterogeneous residual variance model.
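A minimal sketch of the heterogeneous variance rescaling follows (Python; the function name is ours). It assumes `alpha` holds the p finite thresholds and `Sigma_j` is the conditional covariance from (10.22).

```python
import numpy as np
from scipy.stats import norm

def hetero_mu(alpha, beta, gamma, X_j, U_j, Sigma_j):
    """Category probabilities under the heterogeneous variance model.
    Every linear predictor in eq. (10.23) is divided by sigma_j, where
    sigma_j^2 = gamma^T Sigma_j gamma + 1, eqs. (10.24)-(10.25)."""
    sigma_j = np.sqrt(gamma @ Sigma_j @ gamma + 1.0)
    eta = X_j @ beta + U_j @ gamma
    a = np.concatenate(([-np.inf], alpha, [np.inf]))
    return np.diff(norm.cdf((a + eta) / sigma_j))
```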

4 ML Under Mixture Distribution

The mixture model approach defines a genotype-specific expectation, variance matrix, and set of derivatives for each individual. Let

$${\mu }_{jk}(g) = E({Y }_{jk}) = \Phi ({\alpha }_{k} + {X}_{j}\beta + {H}_{g}\gamma ) - \Phi ({\alpha }_{k-1} + {X}_{j}\beta + {H}_{g}\gamma )$$
(10.29)

be the expectation of Y jk if individual j takes the gth genotype, for g = 1, 2, 3. The corresponding variance–covariance matrix is

$${V }_{j}(g) = {\psi }_{j}(g) - {\mu }_{j}(g){\mu }_{j}^{T}(g)$$
(10.30)

where ψ j (g) = diag[μ j (g)]. Let D j (g) be the partial derivatives of the expectation with respect to the parameters. The corresponding values of D j (g) are

$$\begin{array}{rcl} \frac{\partial {\mu }_{jk}(g)} {\partial {\alpha }_{k-1}} & = & -\phi ({\alpha }_{k-1} + {X}_{j}\beta + {H}_{g}\gamma ) \\ \frac{\partial {\mu }_{jk}(g)} {\partial {\alpha }_{k}} & = & \phi ({\alpha }_{k} + {X}_{j}\beta + {H}_{g}\gamma ) \\ \frac{\partial {\mu }_{jk}(g)} {\partial {\alpha }_{l}} & = & 0,\;\forall l\neq \{k - 1,k\} \end{array}$$
(10.31)
$$\frac{\partial {\mu }_{jk}(g)} {\partial \beta } = {X}_{j}^{T}[\phi ({\alpha }_{k} + {X}_{j}\beta + {H}_{g}\gamma ) - \phi ({\alpha }_{k-1} + {X}_{j}\beta + {H}_{g}\gamma )]$$
(10.32)

and

$$\frac{\partial {\mu }_{jk}(g)} {\partial \gamma } = {H}_{g}^{T}[\phi ({\alpha }_{ k} + {X}_{j}\beta + {H}_{g}\gamma ) - \phi ({\alpha }_{k-1} + {X}_{j}\beta + {H}_{g}\gamma )]$$
(10.33)

Let us define the posterior probability of QTL genotype after incorporating the disease phenotype for individual j as

$$ p_j^* (2 - g) = \frac{{p_j (2 - g)Y_j^T \mu _j (g)}}{{\sum\nolimits_{g' = 1}^3 {p_j (2 - g')Y_j^T \mu _j (g')} }} $$
(10.34)

The increment for parameter updating under the mixture model is

$$ \Delta \theta = \left[ {\sum\nolimits_{j = 1}^n {E\left( {D_j^T W_j D_j } \right)} } \right]^{ - 1} \left[ {\sum\nolimits_{j = 1}^n {E\left( {D_j^T W_j (Y_j - \mu _j )} \right)} } \right] $$
(10.35)

where

$$E\left ({D}_{j}^{T}{W}_{j}{D}_{j}\right ) = {\sum }_{g=1}^{3}{p}_{j}^{{_\ast}}(2 - g)\,{D}_{j}^{T}(g){W}_{j}(g){D}_{j}(g)$$
(10.36)
$$E\left ({D}_{j}^{T}{W}_{j}({Y }_{j} - {\mu }_{j})\right ) = {\sum }_{g=1}^{3}{p}_{j}^{{_\ast}}(2 - g)\,{D}_{j}^{T}(g){W}_{j}(g)({Y }_{j} - {\mu }_{j}(g))$$
(10.37)

and

$$ W_j (g) = \psi _j^{ - 1} (g) $$
(10.38)

This is actually an EM algorithm: calculating the posterior probabilities of the QTL genotypes and using them to compute E(D j T W j D j ) and E(D j T W j (Y j  − μ j )) constitute the E-step, and calculating the increment of the parameters using the weighted least-squares formula makes up the M-step. A problem with this EM algorithm is that var(\({\hat{\theta}}\)) is not a by-product of the iteration process. For simplicity, if the markers are sufficiently close to the trait locus of interest, we can use

$$\mathrm{var}(\hat{\theta }) \approx {\left [{\sum \nolimits }_{j=1}^{n}E\left ({D}_{ j}^{T}{W}_{ j}{D}_{j}\right )\right ]}^{-1}$$
(10.39)

to approximate the covariance matrix of the estimated parameters. This matrix, however, underestimates the variance. A more precise method to calculate var(\({\hat{\theta}}\)) is to adjust the above equation for the information loss due to the uncertainty of the QTL genotype. Let

$$S(\hat{\theta }\vert Z) ={ \sum }_{j=1}^{n}{D}_{ j}^{T}{W}_{ j}({Y }_{j} - {\mu }_{j})$$
(10.40)

be the score vector as if Z were observed. Louis [1982] showed that the information loss equals the variance–covariance matrix of the score vector, which is

$$\mathrm{var}[S(\hat{\theta }\vert Z)] ={ \sum }_{j=1}^{n}\mathrm{var}\left [{D}_{ j}^{T}{W}_{ j}({Y }_{j} - {\mu }_{j})\right ]$$
(10.41)

The variance is taken with respect to the missing value Z using the posterior probability of QTL genotype. The information matrix after adjusting for the information loss is

$$I(\hat{\theta }) ={ \sum }_{j=1}^{n}E\left ({D}_{ j}^{T}{W}_{ j}{D}_{j}\right ) -{\sum }_{j=1}^{n}\mathrm{var}\left [{D}_{ j}^{T}{W}_{ j}({Y }_{j} - {\mu }_{j})\right ]$$
(10.42)

The variance–covariance matrix of the estimated parameters is then approximated by var(\({\hat{\theta}}\)) = I −1(\({\hat{\theta}}\)). Details of var[D j T W j (Y j  − μ j )] are given by Xu and Hu [2010].
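The E-step of the mixture model can be sketched as follows (Python; function names are ours): the posterior probabilities of eq. (10.34) weight the genotype-specific quantities that enter eqs. (10.36) and (10.37).

```python
import numpy as np

def posterior_probs(p_j, mu_g, Y_j):
    """Posterior genotype probabilities, eq. (10.34). p_j: (3,) prior
    probabilities p_j(2-g); mu_g: (3,p+1) genotype-specific means from
    eq. (10.29); Y_j: (p+1,) indicator vector. Y_j^T mu_j(g) simply
    picks out the probability of the observed category."""
    w = p_j * (mu_g @ Y_j)
    return w / w.sum()

def estep_terms(pstar, D_g, W_g, mu_g, Y_j):
    """Posterior-weighted terms of eqs. (10.36)-(10.37); summing these
    over individuals gives the increment of eq. (10.35)."""
    A = sum(pstar[g] * D_g[g].T @ W_g[g] @ D_g[g] for g in range(3))
    b = sum(pstar[g] * D_g[g].T @ W_g[g] @ (Y_j - mu_g[g]) for g in range(3))
    return A, b
```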

5 ML via the EM Algorithm

The EM algorithm introduced here is different from the EM under the mixture model described in the previous section. We now use a liability model (Xu et al. 2003) to derive the EM algorithm. Xu et al. [2003] hypothesized that there is an underlying liability that controls the observed phenotype. The liability is a continuous variable and behaves exactly like a quantitative trait. The only difference is that the liability is not observable, whereas a quantitative trait can be measured in experiments. The observed ordinal trait phenotype is connected with the liability by a series of thresholds, as demonstrated in Fig. 10.1. In the generalized linear model under the mixture distribution, the EM algorithm treats the QTL genotype as a missing value. Here, we treat the liability as a missing value as well. Let y j be the liability of the jth individual. This is different from Y j  = { Y jk }, the multivariate representation of the ordered categorical phenotype in the generalized linear model. The liability can be described by the following linear model:

Fig. 10.1 Connection between the unobserved continuous liability and the observed discrete phenotype. The top panel shows the connection for an ordinal trait with two categories, and the bottom panel shows the connection for an ordinal trait with three categories

$${y}_{j} = {X}_{j}\beta + {Z}_{j}\gamma + {\epsilon }_{j}$$
(10.43)

where ε j  ∼ N(0, σ2) is assumed. Under the liability model, σ2 cannot be estimated, and thus we set σ2 = 1. This arbitrary scale does not affect the significance test because the estimated parameters θ = { α, β, γ} are defined relative to σ2. The connection between y j and the observed phenotype is

$${S}_{j} = k\ ,\mathrm{for}\ {\alpha }_{k-1} < {y}_{j} \leq {\alpha }_{k}$$
(10.44)

where k = 1, …, p + 1. The thresholds α do not appear in the linear model explicitly but serve as converters from y j to S j . Xu et al. [2003] developed an EM algorithm for ordinal trait QTL mapping using this liability model. They used a three-step approach, where the first step is to estimate the non-QTL effects (β), the second step is to estimate the QTL effects (γ), and the third step is to estimate the thresholds (α). The method does not have a simple way to calculate the variance–covariance matrix of the estimated parameters. Xu and Xu [2006] extended the method using a multivariate version of the GLM, which does provide the variance–covariance matrix of the estimated parameters. Both methods (Xu and Xu 2006, Xu et al. 2003) are quite complicated in the E-step. When the number of categories is two (the binary case), both methods can be simplified. This section deals with the simplified binary trait QTL mapping where only one threshold is applied. In this case, the single threshold is set to zero so that it is not a parameter for estimation, and thus we only estimate β and γ. In the binary situation, S j  = { 1, 2} and

$${Y }_{j1} = \left\{\begin{array}{ll} 1 & \mathrm{for}\ {S}_{j} = 1 \\ 0 & \mathrm{for}\ {S}_{j} = 2 \end{array}\right.$$
(10.45)

and

$${Y }_{j2} = \left\{\begin{array}{ll} 0 & \mathrm{for}\ {S}_{j} = 1 \\ 1 & \mathrm{for}\ {S}_{j} = 2 \end{array}\right.$$
(10.46)

The liability model remains the same as that given in  (10.43). The derivation of the EM algorithm starts with the complete-data situation. If both Z j and y j were observed, the ML estimates of β and γ would be

$$\left [\begin{array}{*{20}c} \beta \\ \gamma \\ \end{array} \right ] ={ \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{X}_{j}&{\sum }_{j=1}^{n}{X}_{j}^{T}{Z}_{j} \\ {\sum }_{j=1}^{n}{Z}_{j}^{T}{X}_{j} & {\sum }_{j=1}^{n}{Z}_{j}^{T}{Z}_{j}\\ \end{array} \right ]}^{-1}\left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{y}_{j} \\ {\sum }_{j=1}^{n}{Z}_{j}^{T}{y}_{j}\\ \end{array} \right ]$$
(10.47)

This is simply the ordinary least-squares estimate of the parameters. The EM algorithm takes advantage of this explicit solution in the maximization step. If we had observed y j but not Z j , the maximization step of the EM algorithm would be

$$\left [\begin{array}{*{20}c} \beta \\ \gamma \\ \end{array} \right ] ={ \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{X}_{j} &{\sum }_{j=1}^{n}{X}_{j}^{T}E({Z}_{j}) \\ {\sum }_{j=1}^{n}E({Z}_{j}^{T}){X}_{j}& {\sum }_{j=1}^{n}E({Z}_{j}^{T}{Z}_{j})\\ \end{array} \right ]}^{-1}\left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{y}_{j} \\ {\sum }_{j=1}^{n}E({Z}_{j}^{T}){y}_{j}\\ \end{array} \right ]$$
(10.48)

The problem here is that we observe neither Z j nor y j . Intuitively, the maximization step of the EM should be

$$\left [\begin{array}{*{20}c} \beta \\ \gamma \\ \end{array} \right ] ={ \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{X}_{j} &{\sum }_{j=1}^{n}{X}_{j}^{T}E({Z}_{j}) \\ {\sum }_{j=1}^{n}E({Z}_{j}^{T}){X}_{j}& {\sum }_{j=1}^{n}E({Z}_{j}^{T}{Z}_{j})\\ \end{array} \right ]}^{-1}\left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}E({y}_{j}) \\ {\sum }_{j=1}^{n}E({Z}_{j}^{T}{y}_{j})\\ \end{array} \right ]$$
(10.49)

where the expectations are taken with respect to both Z j and y j using the posterior probabilities of QTL genotypes. We now present the method for calculating these expectation terms. We first address E(Z j ) and E(Z j T Z j ) using the posterior probabilities of the QTL genotypes.

$$\begin{array}{rcl}{ p}_{j}^{{_\ast}}(2 - g) = \frac{{p}_{j}(2 - g){\left [\Phi ({X}_{j}\beta + {H}_{g}\gamma )\right ]}^{{Y }_{j1}}{\left [1 - \Phi ({X}_{ j}\beta + {H}_{g}\gamma )\right ]}^{{Y }_{j2}}} {{\sum \nolimits }_{g^{\prime}=1}^{3}{p}_{j}(2 - g^{\prime}){\left [\Phi ({X}_{j}\beta + {H}_{g^{\prime}}\gamma )\right ]}^{{Y }_{j1}}{\left [1 - \Phi ({X}_{j}\beta + {H}_{g^{\prime}}\gamma )\right ]}^{{Y }_{j2}}} & & \\ & &\end{array}$$
(10.50)

Given the posterior probabilities, we have

$$E({Z}_{j}) ={ \sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){H}_{ g}$$
(10.51)

and

$$E({Z}_{j}^{T}{Z}_{ j}) ={ \sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){H}_{ g}^{T}{H}_{ g}$$
(10.52)

The expectations for terms that involve y j can be expressed as

$$E({y}_{j}) ={ E }_{Z}\left [{E }_{y}({y}_{j}\vert {Z}_{j})\right ] ={ \sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){E }_{ y}({y}_{j}\vert {H}_{g})$$
(10.53)

and

$$E({Z}_{j}^{T}{y}_{ j}) ={ E }_{Z}\left [{Z}_{j}^{T}{ E }_{ y}({y}_{j}\vert {Z}_{j})\right ] ={ \sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){H}_{ g}^{T}{ E }_{ y}({y}_{j}\vert {H}_{g})$$
(10.54)

where

$$\begin{array}{rcl} {E }_{y}({y}_{j}\vert {H}_{g}) = {X}_{j}\beta + {H}_{g}\gamma + \frac{({Y }_{j2} - {Y }_{j1})\phi ({X}_{j}\beta + {H}_{g}\gamma )} {{\left [\Phi ({X}_{j}\beta + {H}_{g}\gamma )\right ]}^{{Y }_{j1}}{\left [1 - \Phi ({X}_{j}\beta + {H}_{g}\gamma )\right ]}^{{Y }_{j2}}} & & \\ & &\end{array}$$
(10.55)

Therefore, the EM algorithm can be summarized as follows (a code sketch is given after the list):

  1. Initialize parameters θ(0) = { β(0), γ(0)}.

  2. Calculate E(Z j ), E(Z j T Z j ), E(y j ), and E(Z j T y j ).

  3. Update β and γ using (10.49).

  4. Repeat Steps 2 and 3 until convergence is reached.
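Below is a compact Python sketch of these four steps for the binary case (variable names and the convergence rule are our own). It implements the posterior probabilities of (10.50), the truncated normal mean of (10.55), the expectations (10.51)–(10.54), and the update (10.49).

```python
import numpy as np
from scipy.stats import norm

def em_binary(X, P, H, Y1, n_iter=200, tol=1e-8):
    """EM for binary trait QTL mapping under the liability model.
    X: (n,q) covariates; P: (n,3) prior genotype probabilities
    p_j(2-g); H: (3,r) genotype codes; Y1: (n,) with 1 if S_j = 1
    and 0 if S_j = 2."""
    n, q = X.shape
    r = H.shape[1]
    beta, gamma = np.zeros(q), np.zeros(r)
    XtX = X.T @ X
    for _ in range(n_iter):
        eta = (X @ beta)[:, None] + (H @ gamma)[None, :]   # (n,3)
        Phi = norm.cdf(eta)
        # Eq. (10.50): Phi^{Y_j1} (1 - Phi)^{Y_j2} for each genotype
        like = np.clip(np.where(Y1[:, None] == 1, Phi, 1.0 - Phi), 1e-12, None)
        pstar = P * like
        pstar /= pstar.sum(axis=1, keepdims=True)
        # Eq. (10.55): E(y_j | H_g), with sign = Y_j2 - Y_j1
        sign = np.where(Y1[:, None] == 1, -1.0, 1.0)
        Ey_g = eta + sign * norm.pdf(eta) / like
        EZ = pstar @ H                                     # eq. (10.51)
        EZZ = np.einsum('jg,ga,gb->ab', pstar, H, H)       # sum of (10.52)
        Ey = (pstar * Ey_g).sum(axis=1)                    # eq. (10.53)
        EZy = np.einsum('jg,ga,jg->a', pstar, H, Ey_g)     # sum of (10.54)
        lhs = np.block([[XtX, X.T @ EZ], [EZ.T @ X, EZZ]])
        rhs = np.concatenate([X.T @ Ey, EZy])              # eq. (10.49)
        new = np.linalg.solve(lhs, rhs)
        done = np.max(np.abs(new - np.concatenate([beta, gamma]))) < tol
        beta, gamma = new[:q], new[q:]
        if done:
            break
    return beta, gamma
```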

Once the EM algorithm converges, we obtain the estimated parameters and are ready to calculate the Louis [1982] information matrix. The variance–covariance matrix of the estimated parameters is simply the inverse of the information matrix. Let

$$H(\theta ,Z,y) = -\left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{X}_{j}&{\sum }_{j=1}^{n}{X}_{j}^{T}{Z}_{j} \\ {\sum }_{j=1}^{n}{Z}_{j}^{T}{X}_{j} & {\sum }_{j=1}^{n}{Z}_{j}^{T}{Z}_{j}\\ \end{array} \right ]$$
(10.56)

be the Hessian matrix of the complete-data log likelihood function and

$$S(\theta ,Z,y) = \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ) \\ {\sum }_{j=1}^{n}{Z}_{j}^{T}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma )\\ \end{array} \right ]$$
(10.57)

be the score vector of the complete-data log likelihood function. The Louis information matrix is

$$I(\theta ) = -E\left [H(\theta ,Z,y)\right ] - E\left [S(\theta ,Z,y){S}^{T}(\theta ,Z,y)\right ]$$
(10.58)

where the expectations are taken with respect to the missing values of Z and y. Note that

$$\begin{array}{rcl} \mathrm{var}\left [S(\theta ,Z,y)\right ] = E\left [S(\theta ,Z,y){S}^{T}(\theta ,Z,y)\right ] - E\left [S(\theta ,Z,y)\right ]E\left [{S}^{T}(\theta ,Z,y)\right ]& & \\ & &\end{array}$$
(10.59)

and E[S(θ, Z, y)] = 0 at θ = \({\hat{\theta}}\). This leads to

$$E\left [S(\theta ,Z,y){S}^{T}(\theta ,Z,y)\right ] = \mathrm{var}\left [S(\theta ,Z,y)\right ]$$
(10.60)

Therefore, the Louis information matrix can also be expressed as

$$I(\theta ) = -E\left [H(\theta ,Z,y)\right ] -\mathrm{var}\left [S(\theta ,Z,y)\right ]$$
(10.61)

The first term is easy to obtain, as shown below:

$$-E\left [H(\theta ,Z,y)\right ] = \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}{X}_{j}^{T}{X}_{j} &{\sum }_{j=1}^{n}{X}_{j}^{T}E({Z}_{j}) \\ {\sum }_{j=1}^{n}E({Z}_{j}^{T}){X}_{j}& {\sum }_{j=1}^{n}E({Z}_{j}^{T}{Z}_{j})\\ \end{array} \right ]$$
(10.62)

The second term can be expressed as

$$\mathrm{var}\left [S(\theta ,Z,y)\right ] ={ \sum }_{j=1}^{n}\mathrm{var}\left [{S}_{ j}(\theta ,Z,y)\right ]$$
(10.63)

where

$${S}_{j}(\theta ,Z,y) = \left [\begin{array}{c} {X}_{j}^{T}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ) \\ {Z}_{j}^{T}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ) \end{array} \right ]$$
(10.64)

An explicit form of var[S j (θ, Z, y)] can be derived. This matrix is a 2 × 2 block matrix, denoted by

$$\mathrm{var}\left [{S}_{j}(\theta ,Z,y)\right ] = \left [\begin{array}{*{20}c} {\Sigma }_{11} & {\Sigma }_{12} \\ {\Sigma }_{21} & {\Sigma }_{22}\\ \end{array} \right ]$$
(10.65)

We now provide detailed expressions for the blocks; by symmetry, Σ 21 = Σ 12 T.

$$\begin{array}{l} {\Sigma }_{11} = E\left [{X}_{j}^{T}\mathrm{var}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ){X}_{j}\right ] \\ {\Sigma }_{22} = E\left [{Z}_{j}^{T}\mathrm{var}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ){Z}_{j}\right ] \\ {\Sigma }_{12} = E\left [{X}_{j}^{T}\mathrm{var}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma ){Z}_{j}\right ]\\ \end{array}$$
(10.66)

where var(y j  − X j β − Z j γ) is the variance of a truncated normal variable (the truncation point being zero) conditional on Y j  = { Y j1, Y j2} and Z j . Let

$$\varphi ({Z}_{j}) = \mathrm{var}({y}_{j} - {X}_{j}\beta - {Z}_{j}\gamma )$$
(10.67)

be shorthand for the variance of the truncated normal variable. With some manipulation of Cohen's [1991] formula, we get

$$\varphi ({Z}_{j}) = 1 - \psi ({X}_{j}\beta + {Z}_{j}\gamma )\left [\psi ({X}_{j}\beta + {Z}_{j}\gamma ) - ({Y }_{j1} - {Y }_{j2})({X}_{j}\beta + {Z}_{j}\gamma )\right ]$$
(10.68)

where

$$\psi ({X}_{j}\beta + {Z}_{j}\gamma ) = \frac{\phi ({X}_{j}\beta + {Z}_{j}\gamma )} {{\left [1 - \Phi ({X}_{j}\beta + {Z}_{j}\gamma )\right ]}^{{Y }_{j1}}{\left [\Phi ({X}_{j}\beta + {Z}_{j}\gamma )\right ]}^{{Y }_{j2}}}$$
(10.69)

Therefore,

$$\begin{array}{rcl}{ \Sigma }_{11}& =& {\sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){X}_{ j}^{T}\varphi ({H}_{ g}){X}_{j} \\ {\Sigma }_{12}& =& {\sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){X}_{ j}^{T}\varphi ({H}_{ g}){H}_{g} \\ {\Sigma }_{22}& =& {\sum }_{g=1}^{3}{p}_{ j}^{{_\ast}}(2 - g){H}_{ g}^{T}\varphi ({H}_{ g}){H}_{g}\end{array}$$
(10.70)

After further manipulation of the information matrix, we get

$$I(\theta ) = \left [\begin{array}{*{20}c} {\sum }_{j=1}^{n}E\left [{X}_{j}^{T}\left (1 - \varphi ({Z}_{j})\right ){X}_{j}\right ],&{\sum }_{j=1}^{n}E\left [{X}_{j}^{T}\left (1 - \varphi ({Z}_{j})\right ){Z}_{j}\right ] \\ {\sum }_{j=1}^{n}E\left [{Z}_{j}^{T}\left (1 - \varphi ({Z}_{j})\right ){X}_{j}\right ], & {\sum }_{j=1}^{n}E\left [{Z}_{j}^{T}\left (1 - \varphi ({Z}_{j})\right ){Z}_{j}\right ]\\ \end{array} \right ]$$
(10.71)

which is a 2 × 2 block matrix.
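A sketch of how (10.68)–(10.71) translate into code follows (Python; function names are ours). The expectation over the missing genotype uses the posterior probabilities p j *(2 − g).

```python
import numpy as np
from scipy.stats import norm

def varphi(eta, Y1):
    """Variance of the truncated liability, eqs. (10.68)-(10.69);
    eta = X_j beta + Z_j gamma, Y1 = 1 if S_j = 1 else 0."""
    denom = np.where(Y1 == 1, 1.0 - norm.cdf(eta), norm.cdf(eta))
    psi = norm.pdf(eta) / denom
    sign = 1.0 if Y1 == 1 else -1.0            # Y_j1 - Y_j2
    return 1.0 - psi * (psi - sign * eta)

def louis_information(X, H, pstar, beta, gamma, Y1):
    """Adjusted information matrix, eq. (10.71): posterior-weighted
    sums of (1 - varphi) times outer products of (X_j, H_g)."""
    n, q = X.shape
    r = H.shape[1]
    I = np.zeros((q + r, q + r))
    for j in range(n):
        for g in range(3):
            eta = X[j] @ beta + H[g] @ gamma
            v = np.concatenate([X[j], H[g]])
            I += pstar[j, g] * (1.0 - varphi(eta, Y1[j])) * np.outer(v, v)
    return I
```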

Xu and Xu [2003] proposed an alternative method to calculate the Louis information matrix via Monte Carlo simulation. The method does not involve the above complicated derivation; instead, it simply simulates the QTL genotype (Z j ) from the posterior distribution for each individual and the liability (y j ) conditional on the genotype from the truncated normal distribution. The method directly uses the following information matrix:

$$I(\theta ) = -E\left [H(\theta ,Z,y)\right ] - E\left [S(\theta ,Z,y){S}^{T}(\theta ,Z,y)\right ]$$
(10.72)

with E[S(θ, Z, y)S T(θ, Z, y)] obtained via Monte Carlo simulation. Let Z (t) and y (t) be the simulated Z and y in the tth sample so that S(θ, Z (t), y (t)) is the score vector given Z (t), y (t), and θ = \({\hat{\theta}}\). The Monte Carlo approximation of E[S(θ, Z, y)S T(θ, Z, y)] is

$$E\left [S(\theta ,Z,y){S}^{T}(\theta ,Z,y)\right ] \approx \frac{1} {T}{\sum }_{t=1}^{T}S(\theta ,{Z}^{(t)},{y}^{(t)}){S}^{T}(\theta ,{Z}^{(t)},{y}^{(t)})$$
(10.73)

where T is a large number, say 10,000. The liability of the jth individual, y j , is simulated from a truncated normal distribution. We adopt the inverse transformation method, which has an acceptance rate of 100 % (Rubinstein 1981). With this method, we first define

$$v = 1 - \Phi ({X}_{j}\beta + {Z}_{j}\gamma )$$
(10.74)

and then simulate a variable u from U(0, 1). Finally, we take the inverse of the standardized normal distribution function to obtain

$${y}_{j} = {Y }_{j1}{\Phi }^{-1}(u\,v) + {Y }_{ j2}{\Phi }^{-1}[v + u(1 - v)]$$
(10.75)

Intrinsic functions for both Φ(·) and Φ −1(·) are available in many software packages. For example, in the SAS package (SAS Institute 2008a), Φ(x) is coded as probnorm(x), and Φ −1(u) is coded as probit(u). The Monte Carlo approximation is time consuming, so we cannot calculate the information matrix at every point of the genome scanned. Instead, we only calculate the information matrix at the points where the evidence of a QTL is strong.
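The inverse transformation of (10.74)–(10.75) can be sketched as follows (Python; `norm.cdf` and `norm.ppf` play the roles of probnorm and probit, and the function name is ours).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def sample_liability(eta, Y1):
    """One inverse-transform draw of the liability, following
    eqs. (10.74)-(10.75); eta = X_j beta + Z_j gamma."""
    v = 1.0 - norm.cdf(eta)              # eq. (10.74)
    u = rng.uniform()                    # u ~ U(0, 1)
    if Y1 == 1:                          # S_j = 1
        return norm.ppf(u * v)
    return norm.ppf(v + u * (1.0 - v))   # S_j = 2
```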

6 Logistic Analysis

Similar to the probit link function, we may also use the logit link function to perform the generalized linear model analysis. Let

$${\zeta }_{jk} = \frac{\exp ({\alpha }_{k} + {X}_{j}\beta + {Z}_{j}\gamma )} {1 +\exp ({\alpha }_{k} + {X}_{j}\beta + {Z}_{j}\gamma )}$$
(10.76)

be the logistic cumulative distribution function evaluated at α k  + X j β + Z j γ. Under the logistic model, the mean of Y jk is modeled by

$${\mu }_{jk} = E({Y }_{jk}) = {\zeta }_{jk} - {\zeta }_{j(k-1)}$$
(10.77)

The logistic model for the binary data is

$${\mu }_{jk} = \left\{\begin{array}{ll} {\zeta }_{j1} & \mathrm{for}\ k = 1 \\ 1 - {\zeta }_{j1} & \mathrm{for}\ k = 2 \end{array}\right.$$
(10.78)

From μ j1 = ζ j1, we obtain

$$\mathrm{logit}({\mu }_{j1}) =\ln \left ( \frac{{\mu }_{j1}} {1 - {\mu }_{j1}}\right ) = {\alpha }_{1} + {X}_{j}\beta + {Z}_{j}\gamma $$
(10.79)

Both the probit and the logit transformations of the expectation of Y j1 lead to a linear model. Note that the linear models obtained here only show the property of the transformation. In the actual theory development and data analysis, the linear transformations in (10.6) and (10.79) are never used. Showing the linear transformations may potentially confuse students because, by intuition, they may try to transform the ordinal data (Y jk ) first and then conduct the usual linear regression on the transformed data, which is not appropriate and certainly not the intention of the GLM developers. The maximum likelihood analyses under the homogeneous variance, heterogeneous variance, and mixture models and the EM algorithm described previously for the probit analysis all apply to the logistic analysis. We only show the logistic analysis under the homogeneous variance model as an example. Note that under this model, we only need to substitute Z j by U j to define the expectation, i.e.,

$${\zeta }_{jk} = \frac{\exp ({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )} {1 +\exp ({\alpha }_{k} + {X}_{j}\beta + {U}_{j}\gamma )}$$
(10.80)

and

$${\mu }_{jk} = E({Y }_{jk}) = {\zeta }_{jk} - {\zeta }_{j(k-1)}$$
(10.81)

Once μ j is defined, the weight W j is also defined. The only item left is D j , which is

$${ D}_{j} = \frac{\partial {\mu }_{j}} {\partial {\theta }^{T}} = \left [\begin{array}{*{20}c} \frac{\partial {\mu }_{j}} {\partial {\alpha }^{T}} & \frac{\partial {\mu }_{j}} {\partial {\beta }^{T}} & \frac{\partial {\mu }_{j}} {\partial {\gamma }^{T}}\\ \end{array} \right ]$$
(10.82)

The first block, ∂μ j /∂α T = {∂μ jk /∂α l }, is a (p + 1) × p matrix with

$$\begin{array}{rlrlrl} \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k-1}} & = -{\zeta }_{j(k-1)}(1 - {\zeta }_{j(k-1)}) & & \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{k}} & = {\zeta }_{jk}(1 - {\zeta }_{jk}) & & \\ \frac{\partial {\mu }_{jk}} {\partial {\alpha }_{l}} & = 0,\;\forall l\neq \{k - 1\;,k\} &\end{array}$$
(10.83)

The second block, ∂μ j /∂β T = {∂μ jk /∂β}, is a (p + 1) × q matrix with

$$\frac{\partial {\mu }_{jk}} {\partial \beta } = {X}_{j}^{T}{\zeta }_{ jk}(1 - {\zeta }_{jk}) - {X}_{j}^{T}{\zeta }_{ j(k-1)}(1 - {\zeta }_{j(k-1)})$$
(10.84)

The third block, ∂μ j /∂γ T = {∂μ jk /∂γ}, is a (p + 1) × r matrix with

$$\frac{\partial {\mu }_{jk}} {\partial \gamma } = {U}_{j}^{T}{\zeta }_{ jk}(1 - {\zeta }_{jk}) - {U}_{j}^{T}{\zeta }_{ j(k-1)}(1 - {\zeta }_{j(k-1)})$$
(10.85)

In the above partial derivatives, the range of k is k = 1, …, p + 1.
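For completeness, here is a sketch of the logistic-link quantities (10.80)–(10.85) under the homogeneous variance model (Python; the function name is ours). The derivative of the logistic CDF is ζ(1 − ζ), which is what the blocks below use.

```python
import numpy as np
from scipy.special import expit   # logistic CDF

def logit_blocks(alpha, beta, gamma, X_j, U_j):
    """Mean vector mu_j, eq. (10.81), and derivative matrix D_j,
    eqs. (10.82)-(10.85), for the logit link with Z_j replaced by U_j.
    alpha holds the p finite thresholds."""
    p, q, r = len(alpha), len(X_j), len(U_j)
    eta = X_j @ beta + U_j @ gamma
    a = np.concatenate(([-np.inf], alpha, [np.inf]))
    zeta = expit(a + eta)              # zeta_j0 = 0, zeta_j(p+1) = 1
    d = zeta * (1.0 - zeta)            # derivative of zeta wrt its argument
    mu = np.diff(zeta)                 # eq. (10.81)
    D = np.zeros((p + 1, p + q + r))
    for k in range(p + 1):
        if k >= 1:
            D[k, k - 1] = -d[k]                  # eq. (10.83), wrt alpha_{k-1}
        if k <= p - 1:
            D[k, k] = d[k + 1]                   # eq. (10.83), wrt alpha_k
        D[k, p:p + q] = X_j * (d[k + 1] - d[k])  # eq. (10.84)
        D[k, p + q:] = U_j * (d[k + 1] - d[k])   # eq. (10.85)
    return mu, D
```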

7 Example

The experiment was conducted by Dou et al. [2009]. A female sterile line of wheat, XND126, and an elite wheat cultivar with normal fertility, Gaocheng 8901, were crossed for genetic analysis of female sterility measured as the number of seeded spikelets per plant. The parents and their F1 and F2 progeny were planted at the Huaian experimental station in China during the 2006–2007 growing season under the normal autumn sowing condition. The mapping population was an F2 family consisting of 243 individual plants. About 84 % of the F2 progeny had seeded spikelets, and the remaining 16 % of the plants did not have any seeds at all. Among the plants with seeded spikelets, the number of seeded spikelets varied from 1 to as many as 31. The phenotype is a count and can be modeled using the Poisson distribution. The phenotype can also be treated as binary and analyzed using the Bernoulli distribution. In this example, we treated the phenotype as binary (seed presence or absence) and analyzed it using the Bernoulli distribution. A total of 28 SSR markers were used in this experiment. These markers covered five chromosomes of the wheat genome with an average marker interval of 15.5 cM. The five chromosomes are only part of the wheat genome. These chromosomes were scanned for QTL of the binary trait. Let A 1 and A 2 be the alleles carried by Gaocheng 8901 and XND126, respectively. Let A 1 A 1, A 1 A 2, and A 2 A 2 be the three genotypes of the QTL of interest, numerically coded as 1, 0, and −1, respectively. The genome was scanned at 1-cM increments. All three methods described in this chapter were used for the interval mapping: the homogeneous variance model (HOMOGENEOUS), the heterogeneous variance model (HETEROGENEOUS), and the mixture model (MIXTURE). The LOD score profiles are depicted in Fig. 10.2. When LOD = 3 is used as the threshold, all three methods detected two major QTL on chromosome 2. The LOD score of the mixture model appears to be higher than those of the other two models, but the difference is very small and can be safely ignored.

Fig. 10.2 The LOD test statistic profiles for three methods of interval mapping (HOMOGENEOUS, HETEROGENEOUS, and MIXTURE). The data were obtained from Dou et al. (2009). The trait investigated is the female fertility of wheat measured as a binary trait (seed presence and absence). The five chromosomes (part of the wheat genome) are separated by the vertical dotted lines. The unevenly distributed black ticks on the horizontal axis indicate the marker locations

The estimated QTL effect profiles are given in Fig. 10.3. Again, the three methods give almost the same estimated QTL effects, except that the mixture model and the heterogeneous model give slightly higher estimates than the homogeneous model. In practice, we recommend the heterogeneous model because it produces almost the same result as the mixture model with much less computing time.

Fig. 10.3 The QTL effect profiles for three methods of interval mapping (HOMOGENEOUS, HETEROGENEOUS, and MIXTURE). The data were obtained from Dou et al. (2009). The trait investigated is the female fertility of wheat measured as a binary trait (seed presence and absence). The five chromosomes (part of the wheat genome) are separated by the vertical dotted lines. The unevenly distributed black ticks on the horizontal axis indicate the marker locations