So far we have learned the least-squares method, the weighted least squares method, and the maximum likelihood method for QTL mapping. These methods share a common problem in handling multiple QTL, namely the problem of multicollinearity. Therefore, a model can include only a few QTL. Recently, Bayesian methods have been developed for mapping multiple QTL (Satagopan et al. 1996; Heath 1997; Sillanpää and Arjas 1998; Sillanpää and Arjas 1999; Xu 2003; Yi 2004; Wang et al. 2005b; Yi and Shriner 2008). Under the Bayesian framework, the model can tolerate a much higher level of multicollinearity than the maximum likelihood method. As a result, the Bayesian method can handle a highly saturated model. This chapter focuses on the Bayesian method implemented via the Markov chain Monte Carlo (MCMC) algorithm. Before introducing the methods of Bayesian mapping, it is necessary to briefly review the background knowledge of Bayesian statistics.

1 Bayesian Regression Analysis

We will learn the basic principle and method of Bayesian analysis using a simple regression model as an example. The simple regression model has the following form:

$${y}_{j} = {X}_{j}\beta + {\epsilon }_{j},\forall j = 1,\ldots ,n$$
(15.1)

where y_j is the response (dependent) variable, X_j is the regressor (independent variable), β is the regression coefficient, and ε_j is the residual error with an assumed N(0, σ²) distribution. This model is a special case of

$${y}_{j} = \alpha + {X}_{j}\beta + {\epsilon }_{j},\forall j = 1,\ldots ,n$$
(15.2)

with α = 0, i.e., regression through the origin. We use this special model to derive the Bayesian estimates of the parameters. In subsequent sections, we will extend the model to the usual regression with a nonzero intercept and to regression with multiple explanatory variables (multiple regression). The log likelihood function is

$$L(\theta ) = -\frac{n} {2} \log ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}{({y}_{ j} - {X}_{j}\beta )}^{2}$$
(15.3)

where θ = {β, σ²}. The MLEs of θ are

$$\hat{\beta } ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}\left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}{y}_{j}\right )$$
(15.4)

and

$$\hat{{\sigma }}^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{({y}_{ j} - {X}_{j}\hat{\beta })}^{2}$$
(15.5)

In the maximum likelihood analysis, parameters are estimated from the data. Sometimes investigators have prior knowledge of the parameters. This prior knowledge can be incorporated into the analysis to improve the estimation of the parameters; this is the primary purpose of Bayesian analysis. The prior knowledge is formulated as a prior distribution of the parameters. Let p(β, σ²) be the joint prior density of θ. Usually, we assume that β and σ² are independent so that

$$p(\beta ,{\sigma }^{2}) = p(\beta )p({\sigma }^{2})$$
(15.6)

The choice of p(β) and p(σ²) depends on the investigator's knowledge of the problem and on mathematical convenience. In the simple regression analysis, the following priors are both legitimate and attractive:

$$p(\beta ) = N(\beta \vert {\mu }_{\beta },{\sigma }_{\beta }^{2})$$
(15.7)

and

$$p({\sigma }^{2}) = \text{ Inv} - {\chi }^{2}({\sigma }^{2}\vert \tau ,\omega )$$
(15.8)

where N(β | μ_β, σ_β²) is the notation for the normal density of variable β with mean μ_β and variance σ_β², and Inv-χ²(σ² | τ, ω) is the probability density of the scaled inverse chi-square distribution of variable σ² with τ degrees of freedom and scale parameter ω. The notation for a distribution and the notation for the probability density of that distribution are used consistently here. For example, x ∼ N(μ, σ²) means that x is normally distributed with mean μ and variance σ², which is equivalently described as p(x) = N(x | μ, σ²). The exact forms of these densities are

$$p(\beta ) = N(\beta \vert {\mu }_{\beta },{\sigma }_{\beta }^{2}) = \frac{1} {\sqrt{2\pi {\sigma }_{\beta }^{2}}}\exp \left [- \frac{1} {2{\sigma }_{\beta }^{2}}{(\beta - {\mu }_{\beta })}^{2}\right ]$$
(15.9)

and

$$\begin{array}{rcl} p({\sigma }^{2}) = \text{ Inv} - {\chi }^{2}({\sigma }^{2}\vert \tau ,\omega ) = \frac{{(\tau \omega /2)}^{\tau /2}} {\Gamma (\tau /2)} {({\sigma }^{2})}^{-(\tau /2+1)}\exp \left (-\frac{\tau \omega } {2{\sigma }^{2}}\right )& &\end{array}$$
(15.10)

where Γ(τ/2) is the gamma function with argument τ/2. Conditional on the parameters θ, the data vector y has a normal distribution with probability density

$$\begin{array}{rcl} p(y\vert \theta ) ={ \prod \nolimits }_{j=1}^{n}N({y}_{ j}\vert {X}_{j}\beta ,{\sigma }^{2}) \propto \frac{1} {{({\sigma }^{2})}^{n/2}}\exp \left [- \frac{1} {2{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}{({y}_{ j} - {X}_{j}\beta )}^{2}\right ]& &\end{array}$$
(15.11)

We now have the probability density of the data and the density of the prior distribution of the parameters. We treat both the data and the parameters as random variables and formulate the joint distribution of the data and the parameters,

$$\begin{array}{rcl} p(y,\theta ) = p(y\vert \theta )p(\theta )& &\end{array}$$
(15.12)

where p(θ) = p(β)p(σ²). The purpose of Bayesian analysis is to infer the conditional distribution of the parameters given the data and to draw conclusions about the parameters from this conditional distribution. The conditional distribution of the parameters has the form

$$\begin{array}{rcl} p(\theta \vert y) = \frac{p(y,\theta )} {p(y)} \propto p(y,\theta )& &\end{array}$$
(15.13)

which is also called the posterior distribution of the parameters. The denominator, p(y), is the marginal density of the data, which does not involve the parameters and can be ignored because we are only interested in the estimation of the parameters. Note that the above conditional density can be rewritten as

$$\begin{array}{rcl} p(\beta ,{\sigma }^{2}\vert y) = \frac{p(y,\beta ,{\sigma }^{2})} {p(y)} \propto p(y,\beta ,{\sigma }^{2})& &\end{array}$$
(15.14)

which is still a joint posterior density with respect to the two components of the parameter vector. The ultimate purpose of the Bayesian analysis is to infer the marginal posterior distribution of each component of the parameter vector. The marginal posterior density of β is obtained by integrating the joint posterior distribution over σ²,

$$p(\beta \vert y) ={ \int \nolimits \nolimits }_{0}^{\infty }p(\beta ,{\sigma }^{2}\vert y)\mathrm{d}{\sigma }^{2}$$
(15.15)

The integration has an explicit form, which turns out to be the kernel of a t-distribution with \(n + \tau - 1\) degrees of freedom (Sorensen and Gianola 2002). Note that β itself is not a t-distributed variable; it is \((\beta -\tilde{ \beta })/{\sigma }_{\tilde{\beta }}\) that has a t-distribution, where

$$\text{ E}(\beta \vert y) =\tilde{ \beta } ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1}\left ( \frac{\hat{\beta }} {{\sigma }_{\hat{\beta }}^{2}} + \frac{{\mu }_{\beta }} {{\sigma }_{\beta }^{2}}\right )$$
(15.16)

is the marginal posterior mean of β and

$$\text{ var}(\beta \vert y) = {\sigma }_{\tilde{\beta }}^{2} ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1}$$
(15.17)

is the marginal posterior variance of β. Both the mean and the variance contain \(\hat{\beta }\) and \(\hat{{\sigma }}^{2}\), the MLEs of β and σ², respectively. The role that \(\hat{{\sigma }}^{2}\) plays in the above equations is through

$${\sigma }_{\hat{\beta }}^{2} ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}\hat{{\sigma }}^{2}$$
(15.18)

The density of the t-distributed variable with mean \(\tilde{\beta }\) and variance \({\sigma }_{\tilde{\beta }}^{2}\) is denoted by

$$p(\beta \vert y) = {t}_{n+\tau -1}(\beta \vert \tilde{\beta },{\sigma }_{\tilde{\beta }}^{2})$$
(15.19)

The marginal posterior density for σ² is obtained by integrating the joint posterior over β,

$$p({\sigma }^{2}\vert y) ={ \int \nolimits \nolimits }_{-\infty }^{\infty }p(\beta ,{\sigma }^{2}\vert y)\mathrm{d}\beta $$
(15.20)

which happens to be a scaled inverse chi-square distribution with

$${\tau }^{{_\ast}} = n + \tau - 1$$
(15.21)

degrees of freedom and a scale parameter (Sorensen and Gianola 2002)

$${\omega }^{{_\ast}} = \frac{\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\tilde{\beta })}^{2}} {\tau + n - 1}$$
(15.22)

The density of the new scaled inverse chi-square variable is denoted by

$$p({\sigma }^{2}\vert y) = \text{ Inv} - {\chi }^{2}({\sigma }^{2}\vert {\tau }^{{_\ast}},{\omega }^{{_\ast}})$$
(15.23)

The mean and variance of the above distribution are

$$\text{ E}({\sigma }^{2}\vert y) =\tilde{ {\sigma }}^{2} = \frac{\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\tilde{\beta })}^{2}} {\tau + n - 3}$$
(15.24)

and

$$\text{ var}({\sigma }^{2}\vert y) = \frac{2{[\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\tilde{\beta })}^{2}]}^{2}} {{(\tau + n - 3)}^{2}(\tau + n - 5)}$$
(15.25)

respectively (Sorensen and Gianola 2002).

The marginal posterior distribution of each parameter contains all the information we have gathered for that parameter. The Bayesian estimate of that parameter can be the posterior mean, the posterior mode, or the posterior median, depending on the preference of the investigator. The marginal posterior distribution of a parameter itself can also be treated as an estimate of the parameter. Assume that the marginal posterior mean of a parameter is taken as the Bayesian estimate of that parameter. The Bayesian estimates of β and σ² are then \(\tilde{\beta }\) and \(\tilde{{\sigma }}^{2}\), respectively.

The simple regression analysis (regression through the origin) discussed above is the simplest case of Bayesian analysis, where the marginal posterior distribution of each parameter is known. In most situations, especially when the dimensionality of the parameter vector θ is high, the marginal posterior distribution of a single parameter involves high-dimensional multiple integration, and often the integration does not have an explicit expression. Therefore, the posterior distribution of a parameter often has an unknown form, which makes Bayesian inference difficult. Thanks to ever-growing computing power, we can perform multiple numerical integrations very efficiently. We can even use Monte Carlo integration by repeatedly simulating multivariate random variables. For extremely high-dimensional problems, Monte Carlo integration is perhaps the only way to implement the Bayesian method.

Let us now discuss the relationship between the joint distribution and the marginal distribution. Let \(\theta =\{ {\theta }_{1},{\theta }_{2},\ldots ,{\theta }_{m}\}\) be an m-dimensional vector of variables. Let \(p(\theta \vert y) = p({\theta }_{1},\ldots ,{\theta }_{m}\vert y)\) be the joint posterior distribution. The marginal posterior distribution for the kth component is

$$p({\theta }_{k}\vert y) = \int \nolimits \nolimits \ldots \int \nolimits \nolimits p({\theta }_{1},\ldots ,{\theta }_{m}\vert y)\mathrm{d}{\theta }_{1}\ldots \mathrm{d}{\theta }_{k-1}\mathrm{d}{\theta }_{k+1}\ldots \mathrm{d}{\theta }_{m}$$
(15.26)

If the multiple integration has an explicit form and we can recognize the marginal distribution of θ_k, i.e., p(θ_k | y) is the density of a well-known distribution, then the expectation (or mode) of this distribution is what we want to know in the Bayesian analysis. Suppose that we know neither the joint posterior distribution nor the marginal posterior distribution, but somehow we have a joint posterior sample of the multivariate θ with size N. In other words, we are given N joint observations of θ. The sample is denoted by \(\{{\theta }^{(1)},{\theta }^{(2)},\ldots ,{\theta }^{(N)}\}\). We can imagine that the data in the sample are arranged in an N × m matrix, where each row represents an observation and each column represents a variable. What is the estimated marginal expectation of θ_k drawn from this sample? Remember that this sample is supposed to be generated from the joint posterior distribution. The answer is simple: we only need to calculate the arithmetic mean of variable θ_k from this sample, i.e.,

$$\bar{{\theta }}_{k} = \frac{1} {N}{\sum \nolimits }_{j=1}^{N}{\theta }_{ k}^{(j)}$$
(15.27)

This average value of θ_k is an empirical marginal posterior mean of θ_k, i.e., a Bayesian estimate of θ_k. We can see that as long as we have a joint posterior sample of θ, we can infer the marginal mean of a single component of θ simply by calculating the mean of that component from the sample. While calculating the mean only requires knowledge learned in elementary school, generating the joint sample of θ is the main focus of the Bayesian analysis.
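As a minimal illustration of (15.27), the following sketch (Python with NumPy; the sample itself is a simulated placeholder, since no real posterior sample is given here) computes the empirical marginal posterior mean of every component from an N × m posterior sample.

```python
import numpy as np

# Hypothetical joint posterior sample: N observations (rows) of an
# m-dimensional parameter vector theta (columns).  A placeholder sample is
# simulated here purely for demonstration.
N, m = 10_000, 3
theta_sample = np.random.default_rng(1).normal(size=(N, m))

# Empirical marginal posterior mean of each component, as in (15.27):
# simply the column mean of the sample.
theta_bar = theta_sample.mean(axis=0)
print(theta_bar)  # one Bayesian (posterior mean) estimate per component
```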

2 Markov Chain Monte Carlo

There are many different ways to generate a sample of θ from the joint distribution. The classical method is to use the following sequential approach to generate the first observation, denoted by θ(1):

  • Simulate \({\theta }_{1}^{(1)}\) from \(p({\theta }_{1}\vert y)\)

  • Simulate \({\theta }_{2}^{(1)}\) from \(p({\theta }_{2}\vert {\theta }_{1}^{(1)},y)\)

  • Simulate \({\theta }_{3}^{(1)}\) from \(p({\theta }_{3}\vert {\theta }_{1}^{(1)},{\theta }_{2}^{(1)},y)\)

  • \(\ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \)

  • Simulate θ m (1) from \(p({\theta }_{m}\vert {\theta }_{1}^{(1)},\ldots ,{\theta }_{m-1}^{(1)},y)\)

The process is simply repeated N times to simulate an entire sample of θ. Observations generated this way are independent. We can see that we still need the marginal distribution for θ1 and various levels of marginality for the other components. Only θm is generated from a fully conditional posterior, which does not involve any integration. Therefore, this sequential approach of generating a random sample is not what we want.
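When the required marginals and conditionals are known, this sequential scheme is easy to carry out. The toy sketch below (Python/NumPy) uses a standard bivariate normal with an assumed correlation ρ = 0.8 as a stand-in "posterior": θ1 is drawn from its marginal and θ2 from its conditional given θ1, producing independent joint observations.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10_000
rho = 0.8    # assumed correlation of the standard bivariate normal "posterior"

sample = np.empty((N, 2))
for t in range(N):
    theta1 = rng.standard_normal()                        # from p(theta1 | y)
    # conditional of theta2 given theta1 for a standard bivariate normal
    theta2 = rho * theta1 + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    sample[t] = (theta1, theta2)

print(sample.mean(axis=0), np.corrcoef(sample.T)[0, 1])  # approx. (0, 0) and 0.8
```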

The MCMC approach draws all variables from their fully conditional posterior distributions. To draw a variable from a conditional distribution, we must have values for the variables that are conditioned on. For example, to draw y from p(y | x), the value of x must be known. Let θ(0) be the initial value of the multivariate θ. The first observation of θ is drawn using the following process:

  • Simulate \({\theta }_{1}^{(1)}\) from \(p({\theta }_{1}\vert {\theta }_{-1}^{(0)},y)\)

  • Simulate \({\theta }_{2}^{(1)}\) from \(p({\theta }_{2}\vert {\theta }_{-2}^{(0)},y)\)

  • Simulate \({\theta }_{3}^{(1)}\) from \(p({\theta }_{3}\vert {\theta }_{-3}^{(0)},y)\)

  • \(\ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \ldots \)

  • Simulate \({\theta }_{m}^{(1)}\) from \(p({\theta }_{m}\vert {\theta }_{-m}^{(0)},y)\)

where \({\theta }_{-k}^{(0)}\) is a subset of vector θ(0) that excludes the kth element, i.e.,

$${\theta }_{-k}^{(0)} =\{ {\theta }_{ 1}^{(0)},\ldots ,{\theta }_{ k-1}^{(0)},{\theta }_{ k+1}^{(0)},\ldots ,{\theta }_{ m}^{(0)}\}$$

This special notation (negative subscript) greatly simplifies the expressions of the MCMC sampling algorithm. The above process concludes the simulation of the first observation. The process is repeated N times to generate a sample of θ with size N. The sampled θ(t) depends on θ(t − 1), i.e., the θ sampled in the current cycle only depends on the θ of the previous cycle. Therefore, the sequence

$$\{{\theta }^{(0)} \rightarrow {\theta }^{(1)} \rightarrow \cdots \rightarrow {\theta }^{(N)}\}$$

forms a Markov chain, which explains why the method is called Markov chain Monte Carlo. Because of the Markov chain property, the observations are not independent, and the first few hundred (or even thousand) observations depend heavily on the initial value θ(0) used to start the chain. Once the chain has stabilized, i.e., the sampled θ no longer depends on the initial value, we say that the chain has reached its stationary distribution. The period from the beginning to the time when the stationary distribution is reached is called the burn-in period. Observations in the burn-in period should be deleted. After the burn-in period, the observations are presumably sampled from the joint distribution. The observations may still be correlated; such correlation is called serial correlation or autocorrelation. We can save one observation in every sth cycle to remove the serial correlation, where s = 20 or s = 50 or any other integer, depending on the particular problem. This process is called trimming or thinning the Markov chain. After burn-in deletion and chain trimming, we collect N* observations out of the total of N observations simulated. The sample of θ with N* observations is the posterior sample (sampled from the p(θ | y) distribution). Any Bayesian statistic can be inferred empirically from this posterior sample.
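The burn-in deletion and chain thinning described above amount to simple array slicing. A sketch is given below (Python/NumPy); the chain itself, the burn-in length, and the thinning interval are arbitrary placeholders.

```python
import numpy as np

# Suppose mcmc_chain is an (N x m) array holding all N sampled values of the
# m-dimensional parameter vector theta, in the order they were drawn.
rng = np.random.default_rng(2)
mcmc_chain = rng.normal(size=(501_000, 2))   # placeholder chain

burn_in = 1_000   # observations discarded as burn-in
thin = 50         # keep one observation every `thin` cycles

# Delete the burn-in period, then keep every `thin`-th observation.
posterior_sample = mcmc_chain[burn_in::thin]

# Empirical Bayesian estimates from the retained posterior sample.
posterior_mean = posterior_sample.mean(axis=0)
posterior_var = posterior_sample.var(axis=0, ddof=1)
print(posterior_sample.shape, posterior_mean, posterior_var)
```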

Recall that the marginal posterior for β is a t-distribution and the marginal posterior for σ² is a scaled inverse chi-square distribution. Both distributions have complicated expressions. The MCMC sampling process only requires the conditional posterior distributions, not the marginal posteriors. Let us now look at the conditional posterior distribution of each parameter of the simple regression analysis.

As previously shown, the MLE of β is

$$\hat{\beta } ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}\left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{}{y}_{j}\right )$$
(15.28)

and the variance of the estimate is

$${\sigma }_{\hat{\beta }}^{2} ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}{\sigma }^{2}$$
(15.29)

Note that \({\sigma }_{\hat{\beta }}^{2}\) differs from that defined in (15.18) in that σ² is used here in place of \(\hat{{\sigma }}^{2}\). So, just from the data and without any prior information, we can infer β. The estimate of β is itself a variable, which follows a normal distribution denoted by

$$\beta \sim {N}_{1}(\hat{\beta },{\sigma }_{\hat{\beta }}^{2})$$
(15.30)

The subscript 1 means that this is an estimate drawn from the first source of information. Before observing the data, the prior information about β is considered the second source of information, which is denoted by

$$\beta \sim {N}_{2}({\mu }_{\beta },{\sigma }_{\beta }^{2})$$
(15.31)

The posterior distribution of β is obtained by combining the two sources of information (Box and Tiao 1973), which remains normal and is denoted by

$$\beta \sim N(\bar{\beta },{\sigma }_{\bar{\beta }}^{2})$$
(15.32)

where

$$\bar{\beta } ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1}\left ( \frac{\hat{\beta }} {{\sigma }_{\hat{\beta }}^{2}} + \frac{{\mu }_{\beta }} {{\sigma }_{\beta }^{2}}\right )$$
(15.33)

and

$${\sigma }_{\bar{\beta }}^{2} ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1}$$
(15.34)

We now have the conditional posterior distribution for β denoted by

$$p(\beta \vert {\sigma }^{2},y) = N(\beta \vert \bar{\beta },{\sigma }_{\bar{ \beta }}^{2})$$
(15.35)

from which a random β is sampled.

Given β, we now evaluate the conditional posterior distribution of σ². The prior for σ² is a scaled inverse chi-square distribution with τ degrees of freedom and a scale parameter ω, denoted by

$$p({\sigma }^{2}) =\mathrm{ Inv} - {\chi }^{2}({\sigma }^{2}\vert \tau ,\omega )$$
(15.36)

The posterior distribution remains a scaled inverse chi-square with a modified degree of freedom and a modified scale parameter, denoted by

$$p({\sigma }^{2}\vert \beta ,y) =\mathrm{ Inv} - {\chi }^{2}({\sigma }^{2}\vert {\tau }^{{_\ast}},{\omega }^{{_\ast}})$$
(15.37)

where

$${\tau }^{{_\ast}} = \tau + n$$
(15.38)

and

$${\omega }^{{_\ast}} = \frac{\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\beta )}^{2}} {\tau + n}$$
(15.39)

Note that ω* defined here differs from that defined in (15.22) in that β is used here while \(\tilde{\beta }\) is used in (15.22). The conditional posterior of β is normal, which belongs to the same distribution family as its prior. Similarly, the conditional posterior of σ² remains a scaled inverse chi-square, again the same type of distribution as its prior. Such priors are called conjugate priors because they lead to conditional posterior distributions of the same type.

The MCMC sampling process is summarized as:

  1. Initialize \(\beta = {\beta }^{(0)}\) and \({\sigma }^{2} = {\sigma }^{2(0)}\).

  2. Simulate \({\beta }^{(1)}\) from \(N(\beta \vert \bar{\beta },{\sigma }_{\bar{\beta }}^{2})\).

  3. Simulate \({\sigma }^{2(1)}\) from \(\text{Inv-}{\chi }^{2}({\sigma }^{2}\vert {\tau }^{{_\ast}},{\omega }^{{_\ast}})\).

  4. Repeat Steps (2) and (3) until N observations of the posterior sample are collected.

It can be seen that the MCMC sampling-based regression analysis only involves two distributions: a normal distribution and a scaled inverse chi-square distribution. Most software packages have built-in functions to generate random variables from simple distributions, e.g., N(0, 1) and χ²(τ). Let Z ∼ N(0, 1) be a realized value drawn from the standardized normal distribution and X ∼ χ²(τ*) be a realized value drawn from a chi-square distribution with τ* degrees of freedom. To sample β from \(N(\bar{\beta },{\sigma }_{\bar{\beta }}^{2})\), we sample Z first and then take

$$\beta = {\sigma }_{\bar{\beta }}Z +\bar{ \beta }$$
(15.40)

To sample σ² from Inv-χ²(τ*, ω*), we first sample X and then take

$${\sigma }^{2} = \frac{{\tau }^{{_\ast}}\ {\omega }^{{_\ast}}} {X}$$
(15.41)
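Putting the two sampling steps together, a minimal Gibbs sampler for the regression-through-the-origin model might look like the sketch below (Python/NumPy). The simulated data, the prior values, and the chain settings are placeholders chosen for illustration; they are not the data of Table 15.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data for y = X*beta + e (regression through the origin).
n = 20
X = rng.normal(size=n)
y = 2.5 * X + rng.normal(scale=1.5, size=n)

# Priors: beta ~ N(mu_beta, s2_beta), sigma2 ~ Inv-chi2(tau, omega).
mu_beta, s2_beta = 0.1, 1.0
tau, omega = 3.0, 3.5

n_iter, burn_in, thin = 51_000, 1_000, 50
beta, sigma2 = 0.0, 1.0          # initial values
keep = []

sum_x2 = np.sum(X ** 2)
for it in range(n_iter):
    # --- sample beta from its conditional posterior N(beta_bar, s2_bar) ---
    beta_hat = np.sum(X * y) / sum_x2            # MLE of beta
    s2_hat = sigma2 / sum_x2                     # variance of the MLE (15.29)
    s2_bar = 1.0 / (1.0 / s2_hat + 1.0 / s2_beta)                 # (15.34)
    beta_bar = s2_bar * (beta_hat / s2_hat + mu_beta / s2_beta)   # (15.33)
    beta = beta_bar + np.sqrt(s2_bar) * rng.standard_normal()     # (15.40)

    # --- sample sigma2 from its conditional posterior Inv-chi2 ---
    tau_star = tau + n                                            # (15.38)
    omega_star = (tau * omega + np.sum((y - X * beta) ** 2)) / tau_star  # (15.39)
    sigma2 = tau_star * omega_star / rng.chisquare(tau_star)      # (15.41)

    if it >= burn_in and (it - burn_in) % thin == 0:
        keep.append((beta, sigma2))

posterior = np.array(keep)
print(posterior.mean(axis=0), posterior.var(axis=0, ddof=1))
```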

In summary, the MCMC process requires sampling each parameter only from its fully conditional posterior distribution, which usually has a simple form, e.g., normal or chi-square, and it draws one variable at a time. This type of MCMC sampling is also called Gibbs sampling (Geman and Geman 1984). With the MCMC procedure, we turn ourselves into experimentalists. Like plant breeders who plant seeds, let the seeds grow into plants, and measure the average plant yield, we plant the seeds of the parameters in silico, let the parameters “grow,” and measure the average of each parameter. The Bayesian posterior mean of a parameter is simply the algebraic mean of that parameter in the posterior sample collected from the in silico experiment. Once the Bayesian method is implemented via the MCMC algorithm, it is no longer owned by a few “Bayesians”; rather, it has become a popular tool that can be used by people in all areas, including engineers, biologists, plant and animal breeders, social scientists, and so on.

Before we move on to the next section, let us demonstrate the MCMC sampling process using the simple regression as an example. The values of x and y for 20 observations are given in Table 15.1.

Table 15.1 Data used in the text to demonstrate the MCMC sampling process

The model is

$${y}_{j} = {X}_{j}\beta + {\epsilon }_{j},\ \ \forall j = 1,\ldots ,20$$

The sample size is n = 20. Before introducing the prior distributions, we provide the MLEs of the parameters, which are

$$\begin{array}{rlrlrl} \hat{\beta } & ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}{ \sum \nolimits }_{j=1}^{n}{X}_{ j}{y}_{j} = 2.5115 & & \\ \hat{{\sigma }}^{2} & = \frac{1} {n}{\sum }_{j=1}^{n}{({y}_{ j} - {X}_{j}\hat{\beta })}^{2} = 2.3590 & & \end{array}$$

The variance of \(\hat{\beta }\) is

$${\sigma }_{\hat{\beta }}^{2} ={ \left ({\sum \nolimits }_{j=1}^{n}{X}_{ j}^{2}\right )}^{-1}\hat{{\sigma }}^{2} = 0.1180$$

Let us choose the following prior distributions:

$$p(\beta ) = N(\beta \vert {\mu }_{\beta },{\sigma }_{\beta }^{2}) = N(\beta \vert 0.1,1.0)$$

and

$$p({\sigma }^{2}) = \text{ Inv} - {\chi }^{2}({\sigma }^{2}\vert \tau ,\omega ) = \text{ Inv} - {\chi }^{2}({\sigma }^{2}\vert 3,3.5)$$

The marginal posterior mean and posterior variance of β are

$$\text{ E}(\beta \vert y) =\tilde{ \beta } ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1}\left ( \frac{\hat{\beta }} {{\sigma }_{\hat{\beta }}^{2}} + \frac{{\mu }_{\beta }} {{\sigma }_{\beta }^{2}}\right ) = 2.2571$$

and

$$\text{ var}(\beta \vert y) = {\sigma }_{\tilde{\beta }}^{2} ={ \left ( \frac{1} {{\sigma }_{\hat{\beta }}^{2}} + \frac{1} {{\sigma }_{\beta }^{2}}\right )}^{-1} = 0.1055$$

respectively. The marginal posterior mean and posterior variance of σ² are

$$\text{ E}({\sigma }^{2}\vert y) =\tilde{ {\sigma }}^{2} = \frac{\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\tilde{\beta })}^{2}} {\tau + n - 3} = 2.8308$$

and

$$\text{ var}({\sigma }^{2}\vert y) = \frac{2{[\tau \omega +{ \sum \nolimits }_{j=1}^{n}{({y}_{j} - {X}_{j}\tilde{\beta })}^{2}]}^{2}} {{(\tau + n - 3)}^{2}(\tau + n - 5)} = 0.8904$$

respectively.
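The posterior mean and variance of β quoted above can be reproduced directly from the reported MLEs and the chosen prior. A quick check is shown below (Python); only the β formulas are verified because the σ² formulas require the raw data of Table 15.1.

```python
# Reported quantities from the example.
beta_hat = 2.5115        # MLE of beta
s2_beta_hat = 0.1180     # variance of the MLE, sigma^2 of beta-hat
mu_beta, s2_beta = 0.1, 1.0   # prior mean and variance of beta

# Marginal posterior variance and mean of beta, (15.17) and (15.16).
post_var = 1.0 / (1.0 / s2_beta_hat + 1.0 / s2_beta)
post_mean = post_var * (beta_hat / s2_beta_hat + mu_beta / s2_beta)

# Approximately 2.257 and 0.1055; the small difference from the 2.2571
# quoted in the text comes from rounding of the reported inputs.
print(round(post_mean, 4), round(post_var, 4))
```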

We now use the MCMC sampling approach to generate the joint posterior sample for β and σ² and calculate the empirical marginal posterior means and posterior variances for the two parameters. For a problem as simple as this, the burn-in period can be very short or even omitted. Figure 15.1 shows the first 500 cycles of the MCMC sampler (including the burn-in period) for the two parameters, β and σ². The chains converge immediately to the stationary distribution. To be absolutely sure that we actually collect samples from the stationary distribution, we set the burn-in period to 1,000 iterations (very safe), and the chain was subsequently trimmed to save one observation in every 50 iterations after the burn-in. The posterior sample size was 10,000. The total number of MCMC cycles was \(1,000 + 50 \times 10,000 = 501,000\). The empirical marginal posterior means and marginal posterior variances for β and σ² are given in Table 15.2; they are very close to the theoretical values given above.

Table 15.2 Empirical marginal posterior means and posterior variances for the two parameters, β and σ²
Fig. 15.1

Changes of the sampled parameters over the number of iterations since the start of the MCMC. The top panel shows the change for β, and the bottom panel shows that for σ²

3 Mapping Multiple QTL

Although interval mapping (under the single-QTL model) can detect multiple QTL by evaluating the number of peaks in the test statistic profile, it cannot provide accurate estimates of QTL effects. The best way to handle multiple QTL is to use a multiple-QTL model. Such a model requires knowledge of the number of QTL. Most QTL mappers consider the number of QTL an important parameter that should be estimated in QTL mapping experiments. Therefore, model selection is often conducted to determine the number of QTL (Broman and Speed 2002). Under the Bayesian framework, model selection is implemented through the reversible jump MCMC algorithm (Sillanpää and Arjas 1998). Xu [2003] and Wang et al. [2005b] held a quite different opinion, in which the number of QTL is not considered an important parameter. According to Wang et al. [2005b], we can propose a model that includes as many QTL as the model can handle. Such a model is called an oversaturated model. Some of the proposed QTL may be real, but most of them are spurious. As long as we can force the spurious QTL to have zero or close-to-zero estimated effects, the oversaturated model is considered satisfactory. The selective shrinkage Bayesian method generates exactly the result of QTL mapping that we expect, that is, spurious QTL effects are shrunken to zero while true QTL effects are subject to no shrinkage.

3.1 Multiple QTL Model

Themultiple QTL model can be described as

$${y}_{j} ={ \sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k} + {\epsilon }_{j}$$
(15.42)

where y_j is the phenotypic value of a trait for individual j for \(j = 1,\ldots ,n\), and n is the sample size. The non-QTL effects are included in vector \(\beta =\{ {\beta }_{1},\ldots ,{\beta }_{q}\}\), with \({X}_{j} =\{ {X}_{j1},\ldots ,{X}_{jq}\}\) being the design matrix connecting β and y_j. The effect of the kth QTL is denoted by γ_k for \(k = 1,\ldots ,p\), where p is the proposed number of QTL in the model. Vector \({Z}_{j} =\{ {Z}_{j1},\ldots ,{Z}_{jp}\}\) is determined by the genotypes of the proposed QTL in the model. The residual error ε_j is assumed to be i.i.d. N(0, σ²). Let us use a BC population as an example. For the kth QTL, Z_jk = 1 for one genotype and Z_jk = −1 for the alternative genotype. Extension to an F2 population and the addition of dominance effects are straightforward (they only require adding more QTL effects and increasing the model dimension). The proposed number of QTL is p, which must be larger than the true number of QTL to make sure that large QTL are not missed. The optimal strategy is to put one QTL in every d cM of the genome, where d can be any value between 5 and 50. If d < 5, the model will be ill-conditioned due to multicollinearity. If d > 50, some genome regions may not be visited by the proposed QTL even if true QTL are located in those regions. Of course, a larger sample size is required to handle a larger model (more QTL).
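As a concrete illustration of this setup (not taken from the original text), the sketch below places one proposed QTL every d cM on each chromosome and codes BC genotypes as Z_jk = ±1. The chromosome lengths, the spacing d, and the genotype data are all placeholders.

```python
import numpy as np

d = 10.0                         # spacing of proposed QTL in cM (assumption)
chrom_lengths = [120.0, 95.0]    # placeholder chromosome lengths in cM

# Propose QTL positions every d cM on each chromosome.
positions = [(c, pos)
             for c, L in enumerate(chrom_lengths)
             for pos in np.arange(0.0, L + 1e-9, d)]
p = len(positions)               # proposed number of QTL in the model

# BC genotype coding: one genotype -> +1, the alternative genotype -> -1.
n = 200
rng = np.random.default_rng(3)
genotypes = rng.integers(0, 2, size=(n, p))   # placeholder 0/1 genotypes
Z = 2 * genotypes - 1                         # Z_jk in {-1, +1}
print(p, Z.shape)
```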

3.2 Prior, Likelihood, and Posterior

The data involved in QTL mapping include the phenotypic values of the trait and the marker genotypes for all individuals in the mapping population. Unlike Wang et al. [2005b], who expressed the marker genotypes explicitly as data in the likelihood, here we suppress the marker genotypes from the data to simplify the notation. The linkage map of the markers and the marker genotypes only affect the way the QTL genotypes are calculated. We first use the multipoint method to calculate the genotype probabilities for all putative loci of the genome. These probabilities are then treated as the prior probabilities of the QTL genotypes, from which the posterior probabilities are calculated by incorporating the phenotype and the current parameter values. Therefore, the data used to construct the likelihood are represented by \(y =\{ {y}_{1},\ldots ,{y}_{n}\}\). The vector of parameters is denoted by θ, which consists of the positions of the proposed QTL denoted by \(\lambda =\{ {\lambda }_{1},\ldots ,{\lambda }_{p}\}\), the effects of the QTL denoted by \(\gamma =\{ {\gamma }_{1},\ldots ,{\gamma }_{p}\}\), the non-QTL effects denoted by \(\beta =\{ {\beta }_{1},\ldots ,{\beta }_{q}\}\), and the residual error variance σ². Therefore, θ = {λ, β, γ, ψ, σ²}, where \(\psi =\{ {\sigma }_{1}^{2},\ldots ,{\sigma }_{p}^{2}\}\) will be defined later. The QTL genotypes \({Z}_{j} =\{ {Z}_{j1},\ldots ,{Z}_{jp}\}\) are not parameters but missing values. The missing genotypes can be redundantly expressed as \({\delta }_{j} =\{ {\delta }_{j1},\ldots ,{\delta }_{jp}\}\), where

$${\delta }_{jk} = \delta ({G}_{jk},\kappa )$$

is the δ function: if G_jk = κ, then δ(G_jk, κ) = 1; otherwise δ(G_jk, κ) = 0, where G_jk is the genotype of the kth QTL for individual j and κ = 1, 2, 3 for an F2 population (three genotypes per locus). The probability density of δ is

$$p({\delta }_{j}\vert \lambda ) ={ \prod \nolimits }_{k=1}^{p}p({\delta }_{ jk}\vert {\lambda }_{k})$$
(15.43)

The QTL genotypes are independent across loci because these are conditional probabilities given the marker information; the marker information therefore enters here to infer the QTL genotypes. The prior for β is

$$p(\beta ) ={ \prod \nolimits }_{i=1}^{q}p({\beta }_{ i}) = \text{ constant}$$
(15.44)

This is a uniform prior or, more appropriately, an uninformative prior. The reason for choosing an uninformative prior for β is that the dimensionality of β is usually very low, so that β can be precisely estimated from the data alone without resorting to any prior knowledge. The prior for the QTL effects is

$$p(\gamma \vert \psi ) ={ \prod \nolimits }_{k=1}^{p}p({\gamma }_{ k}\vert {\sigma }_{k}^{2}) ={ \prod \nolimits }_{k=1}^{p}N({\gamma }_{ k}\vert 0,{\sigma }_{k}^{2})$$
(15.45)

where σ_k² is the variance of the prior distribution for the kth QTL effect. Collectively, these variances are denoted by \(\psi =\{ {\sigma }_{1}^{2},\ldots ,{\sigma }_{p}^{2}\}\). This is a highly informative prior because of the zero expectation of the prior distribution. The variance of the prior distribution determines the relative weights of the prior information and the data. If σ_k² is very small, the prior will dominate the data, and thus the estimated γ_k will be shrunken toward the prior expectation, that is, zero. If σ_k² is large, the data will dominate the prior, so that the estimated γ_k will be largely unaltered (subject to no shrinkage). The key difference between this prior and the prior commonly used in Bayesian regression analysis is that each regression coefficient has a different prior variance and thus a different level of shrinkage. Therefore, this method is also called the selective shrinkage method (Wang et al. 2005b). The classical Bayesian regression method, however, often uses a common prior for all regression coefficients, i.e., \({\sigma }_{1}^{2} = {\sigma }_{2}^{2} = \cdots = {\sigma }_{p}^{2} = {\sigma }_{\gamma }^{2}\), which is also called ridge regression (Hoerl and Kennard 1970). The problem with the selective shrinkage method is that there are too many prior variances, and it is hard to choose appropriate values for them. There are two approaches to choosing the prior variances, empirical Bayesian (Xu 2007) and hierarchical modeling (Gelman 2006). The empirical Bayesian approach estimates the prior variances under the mixed model methodology by treating each regression coefficient as a random effect. The hierarchical modeling approach treats the prior variances as parameters and assigns a higher-level prior to each variance component. By treating the variances as parameters, rather than as hyperparameters, we can estimate the variances along with the regression coefficients. Here, we take the hierarchical model approach and assign each σ_k² a prior distribution. The empirical Bayesian method will be discussed in the next chapter. The scaled inverse chi-square distribution is chosen for each variance component,

$$p({\sigma }_{k}^{2}) = \text{ Inv} - {\chi }^{2}({\sigma }_{ k}^{2}\vert \tau ,\omega ),\ \ \forall k = 1,\ldots ,p$$
(15.46)

The degrees of freedom τ and the scale parameter ω are hyperparameters, and their influence on the estimated regression coefficients is much weaker because it acts through the σ_k²'s. It is therefore easy to choose τ and ω. The degree of freedom τ is also called the prior belief. Although a proper prior requires τ > 0 and ω > 0, our past experience has shown that an improper prior works better than a proper one. Therefore, we choose \(\tau = \omega = 0\), which leads to

$$p({\sigma }_{k}^{2}) \propto \frac{1} {{\sigma }_{k}^{2}},\ \ \forall k = 1,\ldots ,p$$
(15.47)

The joint prior for all the σ_k² is

$$p(\psi ) ={ \prod \nolimits }_{k=1}^{p}p({\sigma }_{ k}^{2})$$
(15.48)

The residual error variance is also assigned an improper prior,

$$p({\sigma }^{2}) \propto \frac{1} {{\sigma }^{2}}$$
(15.49)

The positions of the QTL depend on the number of QTL proposed, the number of chromosomes, and the size of each chromosome. Based on the average coverage per QTL (e.g., 30 cM per QTL), the number of QTL allocated to each chromosome can be calculated. Let p_c be the number of QTL proposed for the cth chromosome. These p_c QTL should be placed evenly along the chromosome. We can keep the positions fixed throughout the entire MCMC process so that the positions are simply constants (not parameters of interest). In this case, more QTL should be proposed to make sure that the genome is well covered by the proposed QTL. The alternative, and also more efficient, approach is to allow the QTL positions to move along the genome during the MCMC process. There is a restriction on the moving range of each QTL: the positions are disjoint along the chromosome. The first QTL must move between the first marker and the second QTL. The last QTL must move between the last marker and the second-to-last QTL. Every other QTL must move between the QTL to its left and the QTL to its right, i.e., the QTL that flank the current QTL. Based on this search strategy, the joint prior probability is

$$p(\lambda ) = p({\lambda }_{1})p({\lambda }_{2}\vert {\lambda }_{1})\ldots p({\lambda }_{{p}_{c}}\vert {\lambda }_{{p}_{c}-1})$$
(15.50)

Given the positions of all other QTL, the conditional probability of the position of QTL k is

$$p({\lambda }_{k}) = \frac{1} {{\lambda }_{k+1} - {\lambda }_{k-1}}$$
(15.51)

If QTL k is located at either end of a chromosome, the above prior needs to be modified by replacing either λ_{k−1} or λ_{k+1} with the position of the nearest end marker. We now have a situation where the prior probability of one variable depends on the values of other variables. This type of prior is called an adaptive prior.

Since the marker information has been used to calculate the prior probabilities of the QTL genotypes, the markers are no longer expressed as data. The only data appearing explicitly in the model are the phenotypic values of the trait. Conditional on all parameters and the missing values, the probability density of y_j is normal. Therefore, the joint probability density of all the y_j's (called the likelihood) is

$$\begin{array}{rlrlrl} p(y\vert \theta ,\delta ) & ={ \prod }_{j=1}^{n}p({y}_{ j}\vert \theta ,{\delta }_{j}) & & \\ & ={ \prod }_{j=1}^{n}N\left ({y}_{ j}\left \vert {\sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k},{\sigma }^{2}\right.\right ) &\end{array}$$
(15.52)

The fully conditional posterior of each variable is defined as

$$p({\theta }_{i}\vert {\theta }_{-i},\delta ,y) \propto p({\theta }_{i},{\theta }_{-i},\delta ,y)$$
(15.53)

where θ_i is a single element of the parameter vector θ and θ_{−i} is the collection of the remaining elements. The symbol ∝ means that a constant factor (not a function of the parameter θ_i) has been ignored. The joint probability density \(p({\theta }_{i},{\theta }_{-i},\delta ,y) = p(\theta ,\delta ,y)\) is expressed as

$$\begin{array}{rlrlrl} p(\theta ,\delta ,y) \propto &p(y\vert \theta ,\delta )p(\delta \vert \theta )p(\theta ) & & \\ = &p(y\vert \theta ,\delta )p(\gamma \vert \psi )p(\psi )p(\delta \vert \lambda )p(\lambda )p({\sigma }^{2}) &\end{array}$$
(15.54)

The fully conditional posterior probability density for each variable is simply derived by treating all other variables as constants and comparing the kernel of the density with a standard distribution. After some algebraic manipulation, we obtain the fully conditional distribution for most of the unknown variables (including parameters and missing values).

The fully conditional posterior for the non-QTL effect is

$$p({\beta }_{i}\vert \ldots \,) = N({\beta }_{i}\vert \hat{{\beta }}_{i},{\sigma }_{\hat{{\beta }}_{i}}^{2})$$
(15.55)

The special notation \(p({\beta }_{i}\vert \ldots \,)\) is used to express the fully conditional probability density. The three dots (\(\ldots \)) after the vertical bar mean everything else except the variable of interest. The posterior mean and posterior variance are calculated using (15.56) and (15.57) given below:

$$\hat{{\beta }}_{i} ={ \left ({\sum }_{j=1}^{n}{X}_{ ji}^{2}\right )}^{-1}{ \sum }_{j=1}^{n}{X}_{ ji}\left ({y}_{j} -{\sum }_{i^{\prime}\neq i}^{q}{X}_{ ji^{\prime}}{\beta }_{i^{\prime}} -{\sum }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k}\right )$$
(15.56)

and

$${\sigma }_{\hat{{\beta }}_{i}}^{2} ={ \left ({\sum }_{j=1}^{n}{X}_{ ji}^{2}\right )}^{-1}{\sigma }^{2}$$
(15.57)

The fully conditional posterior for the kth QTL effect is

$$p({\gamma }_{k}\vert \ldots \,) = N({\gamma }_{k}\vert \hat{{\gamma }}_{k},{\sigma }_{\hat{{\gamma }}_{k}}^{2})$$
(15.58)

where

$$\hat{{\gamma }}_{k} ={ \left ({\sum \nolimits }_{j=1}^{n}{Z}_{ jk}^{2} + \frac{{\sigma }^{2}} {{\sigma }_{k}^{2}}\right )}^{-1}{ \sum \nolimits }_{j=1}^{n}{Z}_{ jk}\left ({y}_{j} -{\sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} -{\sum \nolimits }_{k^{\prime}\neq k}^{p}{Z}_{ jk^{\prime}}{\gamma }_{k^{\prime}}\right )$$
(15.59)

and

$${\sigma }_{\hat{{\gamma }}_{k}}^{2} ={ \left ({\sum \nolimits }_{j=1}^{n}{Z}_{ jk}^{2} + \frac{{\sigma }^{2}} {{\sigma }_{k}^{2}}\right )}^{-1}{\sigma }^{2}$$
(15.60)

Comparing the conditional posterior distributions of β_i and γ_k, we notice the difference between a normal prior and a uniform prior with respect to their effects on the posterior distributions. When a normal prior is used, a shrinkage factor, σ²/σ_k², is added to Σ_{j=1}^{n} Z_jk². If σ_k² is very large, the shrinkage factor disappears, meaning no shrinkage. On the other hand, if σ_k² is small, the shrinkage factor will dominate Σ_{j=1}^{n} Z_jk², and in the extreme, the denominator becomes infinitely large, leading to zero expectation and zero variance for the conditional posterior distribution of γ_k. In that case, the estimated γ_k is completely shrunken to zero. The conditional posterior distribution for each variance component σ_k² is a scaled inverse chi-square with probability density

$$p({\sigma }_{k}^{2}\vert \ldots \,) = \text{ Inv} - {\chi }^{2}\left ({\sigma }_{ k}^{2}\left \vert \tau + 1, \frac{\tau \omega + {\gamma }_{k}^{2}} {\tau + 1} \right.\right )$$
(15.61)

where \(\tau = \omega = 0\). The conditional posterior density for the residual error variance is

$$p({\sigma }^{2}\vert \ldots \,) = \text{ Inv} - {\chi }^{2}\left ({\sigma }^{2}\left \vert \tau + n, \frac{\tau \omega + n{S}_{e}^{2}} {\tau + n} \right.\right )$$
(15.62)

where

$${S}_{e}^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{\left ({y}_{ j} -{\sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} -{ \sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k}\right )}^{2}$$
(15.63)
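One sweep of these conditional updates can be sketched as follows (Python/NumPy). The design matrices X and Z, the response y, and the current parameter values are assumed to be available with the dimensions used in the text, and τ = ω = 0 as chosen above; this is an illustrative sketch, not published code.

```python
import numpy as np

def shrinkage_gibbs_sweep(y, X, Z, beta, gamma, sigma2_k, sigma2, rng):
    """One Gibbs sweep for the Bayesian shrinkage model (tau = omega = 0)."""
    n, q = X.shape
    p = Z.shape[1]

    # Non-QTL effects beta_i, (15.55)-(15.57).
    for i in range(q):
        resid = y - X @ beta + X[:, i] * beta[i] - Z @ gamma
        var_i = sigma2 / np.sum(X[:, i] ** 2)
        mean_i = np.sum(X[:, i] * resid) / np.sum(X[:, i] ** 2)
        beta[i] = mean_i + np.sqrt(var_i) * rng.standard_normal()

    # QTL effects gamma_k, (15.58)-(15.60), with shrinkage factor sigma2/sigma2_k.
    for k in range(p):
        resid = y - X @ beta - Z @ gamma + Z[:, k] * gamma[k]
        denom = np.sum(Z[:, k] ** 2) + sigma2 / sigma2_k[k]
        mean_k = np.sum(Z[:, k] * resid) / denom
        var_k = sigma2 / denom
        gamma[k] = mean_k + np.sqrt(var_k) * rng.standard_normal()
        # Variance component sigma2_k, (15.61) with tau = omega = 0
        # (a tiny floor is added only to avoid an exact numerical zero).
        sigma2_k[k] = max(gamma[k] ** 2, 1e-12) / rng.chisquare(1)

    # Residual variance, (15.62)-(15.63) with tau = omega = 0.
    S2e = np.mean((y - X @ beta - Z @ gamma) ** 2)
    sigma2 = n * S2e / rng.chisquare(n)
    return beta, gamma, sigma2_k, sigma2
```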

The next step is to sample the QTL genotypes, which determine the values of the Z_j variables. Let us again use a BC population as an example and consider sampling the kth QTL genotype given that every other variable is known. There are two sources of information available to infer the probability of each of the two genotypes of the QTL. One source of information comes from the markers, denoted by p_j(+1) and p_j(−1) for the two genotypes, where p_j(+1) + p_j(−1) = 1. These two probabilities are calculated with the multipoint method (Jiang and Zeng 1997). The other source of information comes from the phenotypic value. The connection between the phenotypic value and the QTL genotype is through the probability density of y_j given the QTL genotype. For the two alternative genotypes of the QTL, i.e., Z_jk = +1 and Z_jk = −1, the two probability densities are

$$\begin{array}{rlrlrl} p({y}_{j}\vert {Z}_{jk} = +1) & = N\left ({y}_{j}\left \vert {\sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum \nolimits }_{k^{\prime}\neq k}^{p}{Z}_{ jk^{\prime}}{\gamma }_{k^{\prime}} + {\gamma }_{k},{\sigma }^{2}\right.\right ) & & \\ p({y}_{j}\vert {Z}_{jk} = -1) & = N\left ({y}_{j}\left \vert {\sum \nolimits }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum \nolimits }_{k^{\prime}\neq k}^{p}{Z}_{ jk^{\prime}}{\gamma }_{k^{\prime}} - {\gamma }_{k},{\sigma }^{2}\right.\right ) &\end{array}$$
(15.64)

Therefore, the conditional posterior probabilities for the two genotypes of the QTL are

$$\begin{array}{rlrlrl} {p}_{j}^{{_\ast}}(+1) & = \frac{{p}_{j}(+1)p({y}_{j}\vert {Z}_{jk} = +1)} {{p}_{j}(+1)p({y}_{j}\vert {Z}_{jk} = +1) + {p}_{j}(-1)p({y}_{j}\vert {Z}_{jk} = -1)} & & \\ {p}_{j}^{{_\ast}}(-1) & = \frac{{p}_{j}(-1)p({y}_{j}\vert {Z}_{jk} = -1)} {{p}_{j}(+1)p({y}_{j}\vert {Z}_{jk} = +1) + {p}_{j}(-1)p({y}_{j}\vert {Z}_{jk} = -1)} &\end{array}$$
(15.65)

where \({p}_{j}^{{_\ast}}(+1) = p({Z}_{jk} = +1\vert \ldots \,)\) and \({p}_{j}^{{_\ast}}(-1) = p({Z}_{jk} = -1\vert \ldots \,)\) are the posterior probabilities of the two genotypes. The genotype of the QTL is Z_jk = 2u − 1, where u is sampled from a Bernoulli distribution with probability p_j*(+1), as sketched below. So far we have completed the sampling process for all variables except the QTL positions. If we place a large number of QTL evenly distributed along the genome, say one QTL in every 10 cM, we can keep the positions fixed (not moving) across the entire MCMC process. Although this fixed-position approach does not generate accurate results, it does provide general information about the ranges where the QTL are located. Suppose that the trait of interest is controlled by only 5 QTL and we place 100 QTL evenly distributed on the genome; then the majority of the assumed QTL are spurious. The Bayesian shrinkage method allows the spurious QTL to be shrunken to zero. This is why the Bayesian shrinkage method does not need variable selection. A QTL with a close-to-zero estimated effect is equivalent to one excluded from the model. When the assumed QTL positions are fixed, investigators actually prefer to put the QTL at marker positions because marker positions contain the maximum information. This multiple-marker analysis is recommended before conducting a detailed fully Bayesian analysis with moving QTL positions. The result of the detailed analysis is more or less the same as that of the multiple-marker analysis. A further detailed analysis is only conducted after the investigators get a general picture of the result.
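The genotype-sampling step in (15.64) and (15.65) for one QTL in a BC population can be sketched as follows (Python/NumPy). The quantities passed in, including the multipoint prior probabilities, are assumed to be available from the rest of the sampler; the common normal normalizing constant cancels in the ratio and is therefore omitted.

```python
import numpy as np

def sample_bc_genotypes(y, mean_without_k, gamma_k, sigma2, prior_plus, rng):
    """Sample Z_jk (+1/-1) for one QTL from its conditional posterior.

    mean_without_k : X @ beta + Z @ gamma - Z[:, k] * gamma[k] (model mean
                     excluding the kth QTL), one value per individual.
    prior_plus     : multipoint probability Pr(Z_jk = +1 | markers) per individual.
    """
    sd = np.sqrt(sigma2)
    # Phenotype densities under the two genotypes, (15.64); the common
    # 1/sqrt(2*pi*sigma2) factor cancels and is dropped.
    dens_plus = np.exp(-0.5 * ((y - (mean_without_k + gamma_k)) / sd) ** 2)
    dens_minus = np.exp(-0.5 * ((y - (mean_without_k - gamma_k)) / sd) ** 2)
    # Posterior probability of genotype +1, (15.65).
    num = prior_plus * dens_plus
    post_plus = num / (num + (1.0 - prior_plus) * dens_minus)
    # Z_jk = 2u - 1 with u ~ Bernoulli(post_plus).
    u = rng.random(len(y)) < post_plus
    return 2 * u.astype(int) - 1
```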

We now discuss several different ways to allow the QTL positions to move across the genome. If our purpose of QTL mapping is to find the regions of the genome that most likely carry QTL, the number of QTL is irrelevant and so are the QTL identities. If we allow the QTL positions to move, the most important information we want to capture is how many times a particular segment (position) of the genome is hit or visited by nonspurious QTL. A position can be visited many times by different QTL, but if all these QTL have negligible effects, such a position is not of interest. We are interested in positions that are visited repeatedly by QTL with large effects. Keeping this in mind, we propose the first strategy of QTL moving, the random walk strategy. We start with a “sufficient” number of QTL evenly placed on the genome. How many is sufficient? This perhaps depends on the marker density and the sample size of the mapping population. Putting one QTL in every 10 cM seems to work well. Each QTL is allowed to travel freely between the QTL to its left and the QTL to its right, i.e., the QTL are distributed along the genome in a disjoint manner. The positions of the QTL move, but the order of the QTL is preserved. This is the simplest method of QTL movement. Take the kth QTL for example; the current position of the QTL is denoted by λ_k. The new position can be sampled from the following distribution:

$${\lambda }_{k}^{{_\ast}} = {\lambda }_{k} \pm \Delta \lambda $$
(15.66)

where Δλ ∼ U(0, δ) and δ is the maximum distance (in cM) that the QTL is allowed to move away from its current position. The restriction \({\lambda }_{k-1} < {\lambda }_{k}^{{_\ast}} < {\lambda }_{k+1}\) is enforced to preserve the current order of the QTL. Empirically, δ = 2 cM seems to work well. The new position is always accepted, regardless of whether it is more likely or less likely to carry a true QTL than the current position. The Markov chain should be sufficiently long to make sure that all putative positions are visited a number of times. Theoretically, there is no need to enforce the disjoint distribution of the QTL positions; the only reason for such a restriction is the convenience of programming when the order is preserved. With the random walk strategy of QTL moving, the frequency of hits by QTL at a position is not of interest; instead, the average effect of all the QTL hitting that position is the important information. The random walk approach does not distinguish “hot regions” (regions containing QTL) from “cold regions” (regions without QTL) of the genome. All regions are visited with equal frequency. The hot regions, however, should be visited more often than the cold regions to get a more accurate estimate of the average QTL effects for those regions. The random walk approach does not discriminate against the cold regions and thus needs a very long Markov chain to ensure that the hot regions are sufficiently visited for accurate estimation of the QTL effects.

The optimal strategy for QTL moving is to allow the QTL to visit the hot regions more often than the cold regions. This sampling strategy cannot be accomplished with the Gibbs sampler because the conditional posterior of the position of a QTL does not have a well-known distributional form. Therefore, the Metropolis–Hastings algorithm (Hastings 1970; Metropolis et al. 1953) is adopted here to sample the QTL positions. Again, a new position is randomly generated in the neighborhood of the old position using the same approach as in the random walk, but the new position λ_k* is only accepted with a certain probability. The acceptance probability is determined by the Metropolis–Hastings rule and is denoted by \(\min \left [1,\alpha ({\lambda }_{k}^{{_\ast}},{\lambda }_{k})\right ]\). The new position λ_k* has a \(1 -\min \left [1,\alpha ({\lambda }_{k}^{{_\ast}},{\lambda }_{k})\right ]\) chance of being rejected, where

$$\alpha ({\lambda }_{k}^{{_\ast}},{\lambda }_{ k}) = \frac{{\prod \nolimits }_{j=1}^{n}\left [{\sum \nolimits }_{l=-1,+1}\Pr ({Z}_{jk} = l\vert {\lambda }_{k}^{{_\ast}})p({y}_{j}\vert {Z}_{jk} = l)\right ]} {{\prod \nolimits }_{j=1}^{n}\left [{\sum \nolimits }_{l=-1,+1}\Pr ({Z}_{jk} = l\vert {\lambda }_{k})p({y}_{j}\vert {Z}_{jk} = l)\right ]} \frac{q({\lambda }_{k}^{}\vert {\lambda }_{k}^{{_\ast}})} {q({\lambda }_{k}^{{_\ast}}\vert {\lambda }_{k}^{})}$$
(15.67)

If the new position is rejected, the QTL remains at the current position, i.e., λ_k* = λ_k. If the new position is accepted, the old position is replaced by the new position, i.e., λ_k* = λ_k ± Δλ. Whether the new position is accepted or not, all other variables are updated based on the information from position λ_k*. Here \(\Pr ({Z}_{jk} = -1\vert {\lambda }_{k})\) and \(\Pr ({Z}_{jk} = +1\vert {\lambda }_{k})\) are the conditional probabilities that \({Z}_{jk} = -1\) and \({Z}_{jk} = +1\), respectively, calculated from the multipoint method. These probabilities depend on position λ_k. Previously, these probabilities were denoted by \({p}_{j}(-1) =\Pr ({Z}_{jk} = -1\vert {\lambda }_{k})\) and \({p}_{j}(+1) =\Pr ({Z}_{jk} = +1\vert {\lambda }_{k})\), respectively. For the new position λ_k*, these probabilities are \(\Pr ({Z}_{jk} = -1\vert {\lambda }_{k}^{{_\ast}})\) and \(\Pr ({Z}_{jk} = +1\vert {\lambda }_{k}^{{_\ast}})\), respectively. The proposal probabilities q(λ_k* | λ_k) and q(λ_k | λ_k*) are usually both equal to \(\frac{1} {2\delta }\) and thus cancel each other out. However, when λ_k or λ_k* is near a boundary, the two probabilities may differ. Since the new position is always restricted to the interval where the old position occurs, the proposal density q(λ_k* | λ_k) and its reverse partner q(λ_k | λ_k*) may be different. Let us denote the positions of the left and right QTL by λ_{k−1} and λ_{k+1}, respectively. If λ_k is close to the left QTL so that \({\lambda }_{k} - {\lambda }_{k-1} < \delta \), then the new position must be sampled from \({\lambda }_{k}^{{_\ast}}\sim U({\lambda }_{k-1},{\lambda }_{k} + \delta )\) to make sure that the new position is within the required sample space. Similarly, if λ_k is close to the right QTL so that \({\lambda }_{k+1} - {\lambda }_{k} < \delta \), then the new position must be sampled from \({\lambda }_{k}^{{_\ast}}\sim U({\lambda }_{k} - \delta ,{\lambda }_{k+1})\). In either case, the proposal density should be modified. The general formula of the proposal density after incorporating the modification is

$$q({\lambda }_{k}\vert {\lambda }_{k}^{{_\ast}}) = \left \{\begin{array}{c} \frac{1} {\delta +({\lambda }_{k}-{\lambda }_{k-1})} \\ \frac{1} {\delta +({\lambda }_{k+1}-{\lambda }_{k})} \\ \frac{1} {2\delta } \end{array} \right.\begin{array}{c} \text{ if }{\lambda }_{k} - {\lambda }_{k-1} < \delta \\ \text{ if }{\lambda }_{k+1} - {\lambda }_{k} < \delta \\ \text{ otherwise} \end{array}$$
(15.68)

The assumption of using the above proposal density is that the distance between any two QTL must be larger than δ. The reverse partner of this proposal density is

$$q({\lambda }_{k}^{{_\ast}}\vert {\lambda }_{ k}) = \left \{\begin{array}{c} \frac{1} {\delta +({\lambda }_{k}^{{_\ast}}-{\lambda }_{k-1})} \\ \frac{1} {\delta +({\lambda }_{k+1}-{\lambda }_{k}^{{_\ast}})} \\ \frac{1} {2\delta } \end{array} \right.\begin{array}{c} \text{ if }{\lambda }_{k}^{{_\ast}}- {\lambda }_{k-1} < \delta \\ \text{ if }{\lambda }_{k+1} - {\lambda }_{k}^{{_\ast}} < \delta \\ \text{ otherwise} \end{array}$$
(15.69)

The differences between sampling λ_k and sampling the other variables are the following: (1) the proposed new position may or may not be accepted, while the new values of all other variables are always accepted, and (2) when calculating the acceptance probability for a new position, the likelihood does not depend on the QTL genotype, while the conditional posterior probabilities of all other variables depend on the sampled QTL genotypes.
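A Metropolis–Hastings update of a single QTL position along the lines of (15.66)-(15.69) might be sketched as follows (Python/NumPy). The helper multipoint_prob_plus(pos), which returns the multipoint probability Pr(Z_jk = +1 | markers) at a given position for every individual, is a hypothetical function standing in for the multipoint calculation described in the text.

```python
import numpy as np

def mh_update_position(y, mean_without_k, gamma_k, sigma2,
                       lam, lam_left, lam_right, delta,
                       multipoint_prob_plus, rng):
    """Metropolis-Hastings update of one QTL position (illustrative sketch).

    `multipoint_prob_plus(pos)` is a user-supplied (hypothetical) function
    returning Pr(Z_jk = +1 | markers) at position `pos` for every individual.
    """
    def proposal_width(pos):
        # Width of the uniform proposal centred at `pos`, truncated by the
        # flanking QTL positions (cf. 15.68/15.69).
        return min(pos + delta, lam_right) - max(pos - delta, lam_left)

    def log_lik(pos):
        # Likelihood with the QTL genotype summed out, as in (15.67).
        p_plus = multipoint_prob_plus(pos)
        sd = np.sqrt(sigma2)
        d_plus = np.exp(-0.5 * ((y - (mean_without_k + gamma_k)) / sd) ** 2)
        d_minus = np.exp(-0.5 * ((y - (mean_without_k - gamma_k)) / sd) ** 2)
        return np.sum(np.log(p_plus * d_plus + (1 - p_plus) * d_minus))

    # Propose a new position uniformly within +/- delta, kept between the
    # flanking QTL so that the order of the QTL is preserved.
    lam_new = rng.uniform(max(lam - delta, lam_left), min(lam + delta, lam_right))

    # Acceptance ratio (15.67): likelihood ratio times proposal-density ratio.
    log_alpha = (log_lik(lam_new) - log_lik(lam)
                 + np.log(1.0 / proposal_width(lam_new))   # q(lam | lam_new)
                 - np.log(1.0 / proposal_width(lam)))      # q(lam_new | lam)
    if np.log(rng.random()) < min(0.0, log_alpha):
        return lam_new
    return lam
```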

3.3 Summary of the MCMC Process

The MCMC process is summarized as follows:

  1. Choose the number of QTL to be placed in the model, p.

  2. Initialize the parameters and missing values, \(\theta = {\theta }^{(0)}\) and \({Z}_{j} = {Z}_{j}^{(0)}\).

  3. Sample β_i from \(N({\beta }_{i}\vert \hat{{\beta }}_{i},{\sigma }_{\hat{{\beta }}_{i}}^{2})\).

  4. Sample γ_k from \(N({\gamma }_{k}\vert \hat{{\gamma }}_{k},{\sigma }_{\hat{{\gamma }}_{k}}^{2})\).

  5. Sample σ_k² from \(\text{Inv-}{\chi }^{2}({\sigma }_{k}^{2}\vert 1,{\gamma }_{k}^{2})\).

  6. Sample σ² from \(\text{Inv-}{\chi }^{2}({\sigma }^{2}\vert n,{S}_{e}^{2})\).

  7. Sample Z_jk from its conditional posterior distribution.

  8. Sample λ_k using the Metropolis–Hastings algorithm.

  9. Repeat Steps (3) through (8) until the chain reaches the desired length.

The length of the chain should be sufficient to ensure that, after burn-in deletion and chain trimming, the posterior sample size is large enough to allow accurate estimation of the posterior means (modes or medians) of all QTL parameters. Methods and computer programs are available to check whether the chain has converged to the stationary distribution (Gelfand et al. 1990; Gilks et al. 1996). Our past experience has shown that the burn-in period may only need to contain a few thousand observations. A trimming frequency of saving one in every 20 observations is sufficient. A posterior sample size of 1,000 usually works well. However, if the model is not very large, it is always good practice to delete more observations for the burn-in and to trim more observations to make the chain thinner.

3.4 Post-MCMC Analysis

The MCMC process is much like conducting an experiment. It only generates data for further analysis. The Bayesian estimates become available only after summarizing the data (the posterior sample). The parameter vector θ is very long, but not all parameters are of interest. Unlike other methods in which the number of QTL is an important parameter, the Bayesian shrinkage method uses a fixed number of QTL, and thus p is not a parameter of interest. Although the variance component for the kth QTL, σ_k², is a parameter, it is also not a parameter of interest; it only serves as a factor to shrink the estimated QTL effect. Since the marginal posterior of σ_k² does not exist, the empirical posterior mean or mode of σ_k² does not have any biological meaning. In some observations, the sampled σ_k² can be very large, and in others, it may be very small. The residual error variance σ² is meaningful only if the number of QTL placed in the model is small to moderate. When p is very large, the residual error variance will be absorbed by the very large number of spurious QTL. The only parameters of interest are the QTL effects and QTL positions. However, the QTL identity, k, is also not something of interest. Since the kth QTL may move all over the chromosome on which it was originally placed, the average effect γ_k does not have any meaningful biological interpretation. The only things left are the positions of the genome that are hit frequently by QTL with large effects. Let us consider a fixed position of the genome. A position of the genome is only a point or a locus. Since the QTL position is a continuous variable, the probability that a particular point of the genome is hit by a QTL is zero. Therefore, we define a genome position by a bin with a width of d cM, where d can be 1 or 2 or any other suitable value. The middle point of the bin represents the genome location. For example, if d = 2 cM, the genome location 15 cM actually represents the bin covering the region of the genome from 14 cM to 16 cM, where \(14 = 15 -\frac{1} {2}d\) and \(16 = 15 + \frac{1} {2}d\). Once we define the bin width of a genome location, we can count the number of QTL that hit the bin. For each hit, we record the effect of that hit. The same location may be hit many times by QTL with the same or different identities. The average effect of the QTL hitting the bin is the most important parameter in the Bayesian shrinkage analysis. Each and every bin of the genome has an average QTL effect. We can then plot the effect against the genome location to form a QTL (effect) profile. This profile represents the overall result of the Bayesian mapping. In the BC example of the Bayesian analysis, the kth QTL effect is denoted by γ_k. Since the QTL identity k is irrelevant, it is now replaced by the average QTL effect at position λ, which is a continuous variable. The λ without a subscript indicates a genome location. The average QTL effect at position λ can be expressed as γ(λ) to indicate that the effect is a function of the genome location. The QTL effect profile is now represented by γ(λ). If we use γ(λ) to denote the posterior mean of the QTL effect at position λ, we may use σ²(λ) to denote the posterior variance of the QTL effect at position λ. If QTL moving is not random but guided by the Metropolis–Hastings rule, the posterior sample size at position λ is a useful piece of information indicating how often position λ is hit by a QTL.
Let n(λ) be the posterior sample size at λ; the standard error of the QTL effect at λ should be \(\sigma (\lambda )/\sqrt{n(\lambda )}\). Therefore, another useful profile is the so-called t-test statistic profile expressed as

$$t(\lambda ) = \sqrt{n(\lambda )}\frac{\gamma (\lambda )} {\sigma (\lambda )}$$
(15.70)

The corresponding F-test statistic profile is

$$F(\lambda ) = n(\lambda )\frac{{\gamma }^{2}(\lambda )} {{\sigma }^{2}(\lambda )}$$
(15.71)

The t-test statistic profile is more informative than the F-test statistic profile because it also indicates the direction of the QTL effect (positive or negative), whereas the F-test statistic is always positive. On the other hand, the F-test statistic can be extended to multiple effects per locus, e.g., additive and dominance effects in an F 2 design. Both the t-test and F-test statistic profiles can be interpreted as kinds of weighted QTL effect profiles because they incorporate the posterior frequency of the genome location.
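To make the construction concrete, the following sketch (all names hypothetical) bins posterior draws of QTL positions and effects into d-cM bins and computes γ(λ), σ(λ), n(λ), the t(λ) profile of (15.70), and the F(λ) profile of (15.71). It assumes the post-burn-in sample has already been reduced to two arrays of sampled positions (in cM) and the corresponding sampled effects.

```python
import numpy as np

def qtl_profiles(positions, effects, genome_length, d=2.0):
    """Bin posterior draws of (QTL position, QTL effect) into bins of width d cM and
    return the effect profile gamma(lambda), its standard deviation sigma(lambda),
    the hit count n(lambda), and the t and F statistic profiles of (15.70)-(15.71)."""
    positions = np.asarray(positions)
    effects = np.asarray(effects)
    edges = np.arange(0.0, genome_length + d, d)
    mids = 0.5 * (edges[:-1] + edges[1:])      # bin midpoints = genome locations
    gamma = np.zeros(len(mids))
    sigma = np.zeros(len(mids))
    n_hit = np.zeros(len(mids), dtype=int)
    idx = np.digitize(positions, edges) - 1    # bin index of each sampled QTL
    for b in range(len(mids)):
        hits = effects[idx == b]
        n_hit[b] = hits.size
        if hits.size > 1:
            gamma[b] = hits.mean()
            sigma[b] = hits.std(ddof=1)
        elif hits.size == 1:
            gamma[b] = hits[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(sigma > 0, np.sqrt(n_hit) * gamma / sigma, 0.0)
        F = np.where(sigma > 0, n_hit * gamma ** 2 / sigma ** 2, 0.0)
    return mids, gamma, sigma, n_hit, t, F
```

Plotting gamma, t, or F against mids gives the corresponding effect or test statistic profile.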

Before moving on to the next section, let us use a simulated example to demonstrate the behavior of the Bayesian shrinkage mapping and its difference from maximum likelihood interval mapping. The mapping population was a simulated BC family with 500 individuals. A single chromosome of 2,400 cM in length was evenly covered by 121 markers (20 cM per marker interval). The positions and effects of the 20 simulated QTL are shown in Fig. 15.2 (top panel). In the Bayesian model, we placed one QTL every 25 cM to start the search, and the QTL positions moved constantly according to the Metropolis–Hastings rule. The burn-in period was set at 2,000 iterations, and one observation was saved every 50 iterations after the burn-in. The posterior sample size was 1,000. We also analyzed the same data set using the maximum likelihood interval mapping procedure. The QTL effect profiles for both the Bayesian and ML methods are also shown in Fig. 15.2 (middle and bottom panels). The Bayesian shrinkage estimates of the QTL effects are indeed smaller than the true values, but the resolution of the signal is much clearer than that of the maximum likelihood estimates. The Bayesian method has separated closely linked QTL in several places of the genome very well, in clear contrast to the maximum likelihood method. The ML interval mapping provides exaggerated estimates of the QTL effects across the entire genome.

Fig. 15.2

Plots of QTL effect against genome location (QTL effect profiles) for the simulated BC population. The top panel shows the true locations and effects of the simulated QTL. The panel in the middle shows the Bayesian shrinkage estimates of the QTL effects. The panel at the bottom gives the maximum likelihood estimates of the QTL effects

4 Alternative Methods of Bayesian Mapping

4.1 Reversible Jump MCMC

Reversible jump Markov chain Monte Carlo (RJMCMC) was originally developed by Green (1995) for model selection. It allows the model dimension to change during the MCMC sampling process. Most people believe that QTL mapping is a model selection problem because the number of QTL is not known a priori. Sillanpää and Arjas (1998, 1999) were the first to apply the RJMCMC algorithm to QTL mapping. They treated the number of QTL, denoted by p, as an unknown parameter and inferred its posterior distribution. The assumption is that p is a small number for a quantitative trait and thus can be assigned a Poisson prior distribution with mean ρ. Sillanpää and Arjas (1998) used the Metropolis–Hastings algorithm to sample all parameters, even though most QTL parameters have known forms of fully conditional posterior distributions. The justification for using the M–H sampling strategy is that it is a general sampling approach, the Gibbs sampler being only a special case of the M–H sampler. The M–H sampler does not require derivation of the conditional posterior distribution of a parameter. However, the acceptance probability for a proposed new value of a parameter is usually less than unity because the proposal distribution from which the new value is sampled is a uniform distribution in the neighborhood of the old value rather than the conditional posterior distribution. Therefore, the M–H sampler is computationally less efficient. Yi and Xu (1999, 2000, 2001) extended RJMCMC to QTL mapping for binary traits in line crosses and random mating populations, using the Gibbs sampler for all parameters except the number and locations of QTL. In this section, we only introduce the RJMCMC for sampling the number of QTL. All other variables are sampled using the same methods as described in the Bayesian shrinkage analysis. Another difference between the RJMCMC and the Bayesian shrinkage method is that γ k is assigned a uniform prior distribution in RJMCMC, whereas a N(0, σ k 2) prior is chosen for the shrinkage method. The conditional posterior distribution of γ k remains normal but with mean and variance defined as

$$\hat{{\gamma }}_{k} ={ \left ({\sum }_{j=1}^{n}{Z}_{jk}^{2}\right )}^{-1}{\sum }_{j=1}^{n}{Z}_{jk}\left ({y}_{j} -{\sum }_{i=1}^{q}{X}_{ji}{\beta }_{i} -{\sum }_{k^{\prime}\neq k}^{p}{Z}_{jk^{\prime}}{\gamma }_{k^{\prime}}\right )$$
(15.72)

and

$${\sigma }_{\hat{{\gamma }}_{k}}^{2} ={ \left ({\sum }_{j=1}^{n}{Z}_{ jk}^{2}\right )}^{-1}{\sigma }^{2}$$
(15.73)

respectively.
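For illustration, a draw of γ k from this conditional normal posterior (mean (15.72), variance (15.73)) might look like the following sketch, assuming NumPy arrays y (length n), X (n × q), Z (n × p), current values beta, gamma, and sigma2; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gamma_k(y, X, beta, Z, gamma, k, sigma2, rng=rng):
    """Draw gamma_k from its conditional normal posterior under the flat prior,
    with mean (15.72) and variance (15.73)."""
    # partial residual: remove non-QTL effects and all other QTL effects
    r = y - X @ beta - Z @ gamma + Z[:, k] * gamma[k]
    zz = Z[:, k] @ Z[:, k]
    mean = (Z[:, k] @ r) / zz
    var = sigma2 / zz
    return rng.normal(mean, np.sqrt(var))
```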

We now introduce thereversible jump MCMC. The prior distribution for p is assumed to be a truncated Poisson with mean ϕ and maximum P. The probability distribution function of p is

$$\Pr (p) ={ \left (\frac{\Gamma (P + 1,\phi )} {P!} \right )}^{-1}\left (\frac{{\phi }^{p}{\mathrm{e}}^{-\phi }} {p!} \right ) \propto \frac{{\phi }^{p}{\mathrm{e}}^{-\phi }} {p!}$$
(15.74)

where Γ(P + 1, ϕ) is anincomplete Gamma function and

$$\frac{\Gamma (P + 1,\phi )} {P!} ={ \sum }_{p=0}^{P}\frac{{\phi }^{p}{\mathrm{e}}^{-\phi }} {p!}$$
(15.75)

is the cumulative Poisson probability up to P, which does not depend on p and is therefore a constant. We make a random choice among three move types for the dimensionality change: (1) do not change the dimension but update all other parameters except p, with probability p 0; (2) add a QTL to the model, with probability p a ; and (3) delete a QTL from the model, with probability p d . The three move-type probabilities sum to one, i.e., \({p}_{0} + {p}_{a} + {p}_{d} = 1\); for example, one may choose \({p}_{0} = {p}_{a} = {p}_{d} = \frac{1} {3}\). If no dimension change is proposed, all other parameters are sampled from their conditional posterior distributions. If adding a QTL is proposed, we choose a chromosome on which to place the QTL, with the probability of each chromosome being chosen proportional to its length. Once a chromosome is chosen, we place the proposed new QTL randomly on that chromosome. All parameters associated with the new QTL are sampled from their prior distributions. The new QTL is then accepted with probability min[1, α(p + 1, p)], where

$$\alpha (p + 1,p) = \frac{{\prod \nolimits }_{j=1}^{n}p({y}_{j}\vert p + 1)} {{\prod \nolimits }_{j=1}^{n}p({y}_{j}\vert p)} \times \frac{\phi } {p + 1} \times \frac{{p}_{d}} {(p + 1){p}_{a}}$$
(15.76)

There are three ratios occurring in the above equation. The first ratio is thelikelihood ratio, the second one is theprior ratio of the number of QTL, and the third ratio is theproposal ratio. The likelihood is defined as

$$p({y}_{j}\vert p + 1) = N\left ({y}_{j}\left \vert {\sum }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k} + {Z}_{j(p+1)}{\gamma }_{(p+1)},{\sigma }^{2}\right.\right )$$
(15.77)

and

$$p({y}_{j}\vert p) = N\left ({y}_{j}\left \vert {\sum }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k},{\sigma }^{2}\right.\right )$$
(15.78)

The prior probability for p is

$$\Pr (p) = \frac{{\phi }^{p}{\mathrm{e}}^{-\phi }} {p!}$$
(15.79)

and the prior probability for p + 1 is

$$\Pr (p + 1) = \frac{{\phi }^{p+1}{\mathrm{e}}^{-\phi }} {(p + 1)!}$$
(15.80)

Therefore, the prior ratio is

$$\frac{\Pr (p + 1)} {\Pr (p)} = \frac{{\phi }^{p+1}{\mathrm{e}}^{-\phi }} {(p + 1)!} \frac{p!} {{\phi }^{p}{\mathrm{e}}^{-\phi }} = \frac{\phi } {p + 1}$$
(15.81)

The proposal probability for adding a QTL is \(\xi (p + 1,p) = {p}_{a}\). The reverse partner is \(\xi (p,p + 1) = \frac{{p}_{d}} {p+1}\). It is easy to see that \(\xi (p + 1,p) = {p}_{a}\) because p a was defined as the probability of proposing to add a QTL. The reverse partner, however, is not p d but \({p}_{d}/(p + 1)\), which is hard to understand without Hastings' adjustment of the proposal probability. Suppose the model currently contains p + 1 QTL and a deletion is proposed (with probability p d ); because each QTL has an equal chance of being deleted, the probability that the newly added QTL (and not any other QTL) is the one deleted is \(1/(p + 1)\). Therefore, the probability of the reverse move that deletes the newly added QTL is \({p}_{d}/(p + 1)\). As a result, the proposal ratio is

$$\frac{\xi (p,p + 1)} {\xi (p + 1,p)} = \frac{{p}_{d}/(p + 1)} {{p}_{a}} = \frac{{p}_{d}} {(p + 1){p}_{a}}$$
(15.82)

Note that the proposal ratio is the probability of deleting a QTL to the probability of adding a QTL, not the other way around. This Hastings’ adjustment is important to prevent the Markov chain from being trapped at a particular QTL number. This is the very reason for the name “reversible jump.” The dimension of the model can jump in either direction without being stuck at a local value of p.

If deleting a QTL is proposed, we randomly select one of the p QTL to be deleted. Suppose that the kth QTL happens to be the unlucky one. The number of QTL would change from p to p − 1. The reduced model with p − 1 QTL is accepted with probability min[1, α(p − 1, p)], where

$$\alpha (p - 1,p) = \frac{{\prod \nolimits }_{j=1}^{n}p({y}_{j}\vert p - 1)} {{\prod \nolimits }_{j=1}^{n}p({y}_{j}\vert p)} \times \frac{p} {\phi } \times \frac{{p}_{a}p} {{p}_{d}}$$
(15.83)

where

$$p({y}_{j}\vert p - 1) = N\left ({y}_{j}\left \vert {\sum }_{i=1}^{q}{X}_{ ji}{\beta }_{i} +{ \sum }_{k^{\prime}\neq k}^{p}{Z}_{ jk^{\prime}}{\gamma }_{k^{\prime}},{\sigma }^{2}\right.\right )$$
(15.84)

The prior ratio is

$$\frac{\Pr (p - 1)} {\Pr (p)} = \frac{{\phi }^{p-1}{\mathrm{e}}^{-\phi }} {(p - 1)!} \frac{p!} {{\phi }^{p}{\mathrm{e}}^{-\phi }} = \frac{p} {\phi }$$
(15.85)

The proposal ratio is

$$\frac{\xi (p,p - 1)} {\xi (p - 1,p)} = \frac{{p}_{a}} {{p}_{d}/p} = \frac{{p}_{a}p} {{p}_{d}}$$
(15.86)
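A schematic sketch (not the authors' code) of how the two acceptance ratios (15.76) and (15.83) could be evaluated on the log scale is given below. It assumes the log-likelihoods of the proposed and current models, \(\sum _{j}\log p({y}_{j}\vert \cdot )\), have already been computed; the function names are hypothetical, and the position proposal itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def accept_add(loglik_new, loglik_old, p, phi, p_a, p_d, rng=rng):
    """Metropolis-Hastings decision for adding a QTL, based on alpha(p+1, p) in (15.76)."""
    log_alpha = (loglik_new - loglik_old            # likelihood ratio
                 + np.log(phi / (p + 1))            # prior ratio (15.81)
                 + np.log(p_d / ((p + 1) * p_a)))   # proposal ratio (15.82)
    return np.log(rng.uniform()) < min(0.0, log_alpha)

def accept_delete(loglik_new, loglik_old, p, phi, p_a, p_d, rng=rng):
    """Metropolis-Hastings decision for deleting a QTL, based on alpha(p-1, p) in (15.83)."""
    log_alpha = (loglik_new - loglik_old            # likelihood ratio
                 + np.log(p / phi)                  # prior ratio (15.85)
                 + np.log(p_a * p / p_d))           # proposal ratio (15.86)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```

Here loglik_new refers to the proposed model (p + 1 or p − 1 QTL) and loglik_old to the current model with p QTL.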

The reversible jump MCMC requires more cycles of simulation because of the frequent changes of model dimension. When a QTL is deleted, all parameters associated with that QTL are gone; the chain does not memorize them. If a new QTL is later added in the neighborhood of the deleted QTL, the parameters associated with the added QTL must be sampled anew from the prior distribution. Even if the newly added QTL occupies exactly the same location as a previously deleted QTL, the information about the previously deleted QTL is permanently lost and cannot be reused. An improved RJMCMC may be developed to memorize the information associated with deleted QTL: if the position of a deleted QTL is sampled again later in the MCMC process (i.e., a new QTL is added at the position of a previously deleted QTL), the parameters associated with that deleted QTL can be reused to facilitate sampling for the newly added QTL. Such an improvement can substantially improve the mixing of the Markov chain and speed up the MCMC process, at the cost of an increased computer memory requirement.

With the RJMCMC, the QTL number is a very important parameter, and its posterior distribution is always reported. Each QTL occurring in the model is deemed important and is counted. In addition, the positions of QTL are usually determined by the so-called QTL intensity profile, which is simply the plot of the (scaled) posterior sample size at a particular location, n(λ), against the genome location λ.

4.2 Stochastic Search Variable Selection

Stochastic search variable selection (SSVS) is a variable selection strategy for large models. The method was originally developed by George and McCulloch (1993, 1997) and applied to QTL mapping for the first time by Yi et al. (2003). The difference between this method and many other methods of model selection is that the model dimension is fixed at a predetermined value, just as in the Bayesian shrinkage analysis. Model selection is actually conducted by introducing a series of binary variables, one for each model effect, i.e., each QTL effect. For p QTL effects, p indicator variables are required. Let η k be the indicator variable for the kth QTL. If η k  = 1, the QTL is effectively included in the model, and its effect is not shrunken. If η k  = 0, the effect is forced to take a value close to, but not exactly equal to, zero. Essentially, the prior distribution of the kth QTL effect takes one of two normal forms, and the switch between them is the variable η k , as given below:

$$p({\gamma }_{k}) = {\eta }_{k}N({\gamma }_{k}\vert 0,\Delta ) + (1 - {\eta }_{k})N({\gamma }_{k}\vert 0,\delta )$$
(15.87)

where δ is a small positive number close to zero, say 0.0001, and Δ is a large positive value, say 1,000. The two variances (δ and Δ) are constant hyperparameters. The indicator variable is unknown, and thus, the above distribution is a mixture of two normal distributions. Let \(p({\eta }_{k} = 1) = \rho \) be the probability that γ k comes from the first distribution; the mixture distribution is

$$p({\gamma }_{k}) = \rho N({\gamma }_{k}\vert 0,\Delta ) + (1 - \rho )N({\gamma }_{k}\vert 0,\delta )$$
(15.88)

The mixture proportion ρ is unknown and is treated as a parameter. When the indicator variable (η k ) is known, the posterior distribution of γ k is \(p({\gamma }_{k}\vert \cdots \,) = N({\gamma }_{k}\vert \hat{{\gamma }}_{k},{\sigma }_{\hat{{\gamma }}_{k}}^{2})\). The mean and variance of this normal are

$$\hat{{\gamma }}_{k} ={ \left ({\sum }_{j=1}^{n}{Z}_{ jk}^{2} + \frac{{\sigma }^{2}} {{\upsilon }_{k}}\right )}^{-1}{ \sum }_{j=1}^{n}{Z}_{ jk}\left ({y}_{j} -{\sum }_{i=1}^{q}{X}_{ ji}{\beta }_{i} -{\sum }_{k^{\prime}\neq k}^{p}{Z}_{ jk^{\prime}}{\gamma }_{k^{\prime}}\right )$$
(15.89)

and

$${\sigma }_{\hat{{\gamma }}_{k}}^{2} ={ \left ({\sum }_{j=1}^{n}{Z}_{ jk}^{2} + \frac{{\sigma }^{2}} {{\upsilon }_{k}}\right )}^{-1}{\sigma }^{2}$$
(15.90)

respectively, where

$${\upsilon }_{k} = {\eta }_{k}\Delta + (1 - {\eta }_{k})\delta $$
(15.91)

is the prior variance actually used for γ k , which depends on the value of η k . Let the prior distribution for η k  be

$$p({\eta }_{k}) =\mathrm{ Bernoulli}({\eta }_{k}\vert \rho )$$
(15.92)

The conditional posterior distribution of η k  = 1 is

$$p({\eta }_{k} = 1\vert \cdots \,) = \frac{\rho N({\gamma }_{k}\vert 0,\Delta )} {\rho N({\gamma }_{k}\vert 0,\Delta ) + (1 - \rho )N({\gamma }_{k}\vert 0,\delta )}$$
(15.93)

There is another parameter, ρ, involved in the conditional posterior distribution. Yi et al. (2003) treated ρ as a hyperparameter and set \(\rho = \frac{1} {2}\). This prior works well for small models but often fails for large models. The optimal strategy is to assign another prior to ρ so that ρ can be estimated from the data. Xu (2007) took a beta prior for ρ, i.e.,

$$p(\rho ) =\mathrm{ Beta}(\rho \vert {\zeta }_{0},{\zeta }_{1}) = \frac{\Gamma ({\zeta }_{0} + {\zeta }_{1})} {\Gamma ({\zeta }_{0})\Gamma ({\zeta }_{1})}{\rho }^{{\zeta }_{1}-1}{(1 - \rho )}^{{\zeta }_{0}-1}$$
(15.94)

Under this prior, the conditional posterior distribution for ρ remains beta,

$$p(\rho \vert \cdots \,) =\mathrm{ Beta}\left (\rho \left \vert {\zeta }_{0} + p -{\sum \nolimits }_{k=1}^{p}{\eta }_{ k},{\zeta }_{1} +{ \sum \nolimits }_{k=1}^{p}{\eta }_{ k}\right.\right )$$
(15.95)

The values of the hyperparameters were chosen by Xu (2007) as ζ0 = 1 and ζ1 = 1, leading to an uninformative prior for ρ, i.e.,

$$p(\rho ) =\mathrm{ Beta}(\rho \vert 1,1) =\mathrm{ constant}$$
(15.96)

The Gibbs sampler for σ k 2 in the Bayesian shrinkage analysis is replaced by sampling η k from

$$p({\eta }_{k}\vert \cdots \,) =\mathrm{ Bernoulli}\left ({\eta }_{k}\left \vert \frac{\rho N({\gamma }_{k}\vert 0,\Delta )} {\rho N({\gamma }_{k}\vert 0,\Delta ) + (1 - \rho )N({\gamma }_{k}\vert 0,\delta )}\right.\right )$$
(15.97)

and sampling ρ from

$$p(\rho \vert \cdots \,) =\mathrm{ Beta}\left (\rho \left \vert 1 + p -{\sum \nolimits }_{k=1}^{p}{\eta }_{ k},1 +{ \sum \nolimits }_{k=1}^{p}{\eta }_{ k}\right.\right )$$
(15.98)

in the SSVS analysis.
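A minimal sketch of the two Gibbs steps that distinguish SSVS from the Bayesian shrinkage analysis, i.e., sampling η k from (15.97) and then ρ from (15.98), is given below. The function and variable names are hypothetical; gamma is assumed to be a NumPy array holding the current values of the p QTL effects, and Delta and delta are the two prior variances.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

def sample_eta_and_rho(gamma, rho, Delta=1000.0, delta=1e-4, rng=rng):
    """Sample the SSVS indicators eta_k from (15.97), then rho from (15.98)."""
    p = len(gamma)
    num = rho * norm.pdf(gamma, 0.0, np.sqrt(Delta))
    den = num + (1.0 - rho) * norm.pdf(gamma, 0.0, np.sqrt(delta))
    prob = num / den                        # Pr(eta_k = 1 | ...), eq. (15.93)
    eta = rng.binomial(1, prob)             # Bernoulli draws, eq. (15.97)
    s = eta.sum()
    # eq. (15.98); note that NumPy's Beta(a, b) pairs a with rho, which matches
    # the text's Beta(rho | 1 + p - sum(eta), 1 + sum(eta)) convention
    rho_new = rng.beta(1 + s, 1 + p - s)
    return eta, rho_new
```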

The additional information extracted from SSVS is a probabilistic statement about each QTL. If the marginal posterior mean of η k  is large, say Pr(η k  = 1 | data) > 0.95, the evidence that locus k is a QTL is strong. If the QTL position is allowed to move, η k  itself does not have any particular meaning. Instead, the number of hits at a particular genome location by QTL with η(λ) = 1 is more informative.

4.3 Lasso and Bayesian Lasso

4.3.1 Lasso

Lasso refers to a method called least absolute shrinkage and selection operator (Tibshirani 1996). The method can handle extremely large models by minimizing the residual sum of squares subject to a predetermined constraint: the sum of the absolute values of all regression coefficients must be smaller than a predetermined shrinkage factor. Mathematically, the solution for the regression coefficients is obtained by

$${ \min }_{\gamma }{ \sum \nolimits }_{j=1}^{n}{\left ({y}_{ j} -{\sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k}\right )}^{2}$$
(15.99)

subject to constraint

$${\sum \nolimits }_{k=1}^{p}\left \vert {\gamma }_{ k}\right \vert \leq t$$
(15.100)

where t > 0. When t = 0, all regression coefficients must be zero. As t increases, the number of nonzero regression coefficients progressively increases. As t → ∞, the Lasso estimates of the regression coefficients become equivalent to the ordinary least-squares estimates. Another expression of the problem is

$${ \min }_{\gamma }\left [{\sum \nolimits }_{j=1}^{n}{\left ({y}_{ j} -{\sum \nolimits }_{k=1}^{p}{Z}_{ jk}{\gamma }_{k}\right )}^{2} + \lambda {\sum \nolimits }_{k=1}^{p}\left \vert {\gamma }_{ k}\right \vert \right ]$$
(15.101)

where λ ≥ 0 is a Lagrange multiplier (unknown) that relates implicitly to the bound t and controls the degree of shrinkage. The effect of λ on the level of shrinkage is opposite to that of t, with λ = 0 corresponding to no shrinkage and λ → ∞ to the strongest shrinkage, where all γ k are shrunken down to zero. Note that the Lasso model does not involve X j β, the non-QTL effect described earlier in the chapter; the non-QTL effect in the original Lasso refers to the population mean. For simplicity, Tibshirani (1996) centered y j and all the independent variables. The centered y j is simply the original y j minus \(\bar{y}\), the population mean, and the corresponding centered independent variables are obtained by subtracting \(\bar{{Z}}_{k}\) from Z jk . The Lasso estimates of the regression coefficients can be computed efficiently via quadratic programming with linear constraints. An efficient algorithm called LARS (least angle regression) was developed by Efron et al. (2004) to implement the Lasso method. The Lagrange multiplier λ (or the original bound t) is called the Lasso parameter. The original Lasso estimates λ using the fivefold cross validation approach; one can also use any other fold of cross validation, for example, n-fold (leave-one-out) cross validation. Under each λ value, the fivefold cross validation is used to calculate the prediction error (PE),

$$\mathrm{PE} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{\left ({y}_{ j} -{\sum \nolimits }_{k=1}^{p}{Z}_{ jk}\hat{{\gamma }}_{k}\right )}^{2}$$
(15.102)

This formula appears to be the same as the estimated residual error variance. However, the prediction error differs from the residual error in that the individuals being predicted do not contribute to parameter estimation. With fivefold cross validation, we use \(\frac{4} {5}\) of the sample to estimate γ k and then use the estimated γ k to predict the errors for the remaining \(\frac{1} {5}\) of the sample. In other words, when we calculate \({\left ({y}_{j} -{\sum \nolimits }_{k=1}^{p}{Z}_{jk}\hat{{\gamma }}_{k}\right )}^{2}\), the γ k are estimated from the \(\frac{4} {5}\) of the sample that excludes y j . Under each λ, the PE is calculated and denoted by PE(λ). We vary λ from 0 to a large value; the λ value that minimizes PE(λ) is the optimal value of λ.
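For illustration only, the cross-validated choice of the penalty can be carried out with an off-the-shelf Lasso solver such as scikit-learn's LassoCV. Note that scikit-learn parameterizes the penalty as alpha, which corresponds roughly to λ/(2n) relative to (15.101), and the data below are simulated placeholders rather than a real mapping data set.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical data: y (n,) phenotypes and Z (n, p) genotype codes.
rng = np.random.default_rng(4)
n, p = 200, 50
Z = rng.integers(0, 2, size=(n, p)).astype(float)
y = 1.5 * Z[:, 3] - 1.0 * Z[:, 10] + rng.normal(0.0, 1.0, n)

# Fivefold cross validation over a grid of penalty values.
fit = LassoCV(cv=5, fit_intercept=True).fit(Z, y)
print("selected penalty:", fit.alpha_)
print("nonzero effects at loci:", np.flatnonzero(fit.coef_))
```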

4.3.2 Bayesian Lasso

Lasso can be interpreted as Bayesian posterior mode estimation of the regression coefficients when each regression coefficient is assigned an independent double-exponential prior (Park and Casella 2008; Tibshirani 1996; Yuan and Lin 2005). However, Lasso provides neither an estimate of the residual error variance nor an interval estimate for a regression coefficient. These deficiencies of Lasso can be overcome by the Bayesian Lasso (Park and Casella 2008). The double-exponential prior for γ k is

$$p({\gamma }_{k}\vert \lambda ) = \frac{\lambda } {2}\exp (-\lambda \vert {\gamma }_{k}\vert )$$
(15.103)

where λ is the Lagrange multiplier in the classical Lasso method (see (15.101)). This prior can be derived from a two-level hierarchical model. The first level is

$$p({\gamma }_{k}\vert {\sigma }_{k}^{2}) = N({\gamma }_{ k}\vert 0,{\sigma }_{k}^{2})$$
(15.104)

and the second level is

$$p({\sigma }_{k}^{2}\vert \lambda ) = \frac{{\lambda }^{2}} {2} \exp \left (-{\sigma }_{k}^{2}\frac{{\lambda }^{2}} {2} \right )$$
(15.105)

Therefore,

$$p({\gamma }_{k}\vert \lambda ) ={ \int \nolimits \nolimits }_{0}^{\infty }p({\gamma }_{ k}\vert {\sigma }_{k}^{2})p({\sigma }_{ k}^{2}\vert \lambda )\mathrm{d}{\sigma }_{ k}^{2} = \frac{\lambda } {2}\exp (-\lambda \vert {\gamma }_{k}\vert )$$
(15.106)
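As a quick numerical check (the values of λ and γ k below are arbitrary), the identity (15.106) can be verified by direct integration of the normal-times-exponential mixture:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

lam, gamma_k = 1.5, 0.7   # arbitrary test values

def integrand(s2):
    # N(gamma_k | 0, s2) * Exp(s2 | rate = lam^2 / 2), guarded at s2 <= 0
    if s2 <= 0:
        return 0.0
    return norm.pdf(gamma_k, 0.0, np.sqrt(s2)) * (lam ** 2 / 2) * np.exp(-s2 * lam ** 2 / 2)

mixture, _ = quad(integrand, 0, np.inf)
laplace = lam / 2 * np.exp(-lam * abs(gamma_k))   # right-hand side of (15.106)
print(mixture, laplace)   # the two values agree to numerical precision
```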

The Bayesian Lasso method uses the same model as the Lasso method. However, centering of the independent variables is not required, although it is still recommended. The model is described as follows:

$${y}_{j} ={ \sum }_{i=1}^{q}{X}_{ji}{\beta }_{i} +{ \sum }_{k=1}^{p}{Z}_{jk}{\gamma }_{k} + {\epsilon }_{j}$$
(15.107)

where β i remains in the model and can be estimated along with the residual variance σ2 and all QTL effects. The Bayesian Lasso provides posterior distributions for all parameters. The marginal posterior mean of each parameter is the Bayesian Lasso estimate, which differs from the posterior mode estimate obtained from the Lasso analysis. The Bayesian Lasso differs from the Bayesian shrinkage analysis only in the prior distribution for σ k 2. Under the Bayesian Lasso, the prior for σ k 2 is

$$p({\sigma }_{k}^{2}\vert \lambda ) = \frac{{\lambda }^{2}} {2} \exp \left (-{\sigma }_{k}^{2}\frac{{\lambda }^{2}} {2} \right )$$
(15.108)

The Lasso parameter λ needs a prior distribution so that it can be estimated from the data rather than being chosen arbitrarily a priori. Park and Casella (2008) chose the following gamma prior for λ2 (not λ):

$$p({\lambda }^{2}\vert a,b) =\mathrm{ Gamma}({\lambda }^{2}\vert a,b) = \frac{{b}^{a}} {\Gamma (a)}{({\lambda }^{2})}^{a-1}\exp \left (-b{\lambda }^{2}\right )$$
(15.109)

The reason for choosing such a prior is to enjoy the conjugate property. The hyperparameters a and b are sufficiently remote from σ k 2 and γ k in the hierarchy, and thus their values can be chosen rather arbitrarily. Yi and Xu (2008) used several different sets of values for a and b and found no significant differences among them. For convenience, we may simply set \(a = b = 1\), which is sufficiently different from 0; note that \(a = b = 0\) produces an improper prior for λ2. Once the values of a and b are chosen, everything else can be estimated from the data.

The fully conditional posterior distributions of most variables remain the same as in the Bayesian shrinkage analysis, except that the following variables must be sampled from posterior distributions derived under the Bayesian Lasso prior. For the kth QTL variance, it is more convenient to work with \({\alpha }_{k} = \frac{1} {{\sigma }_{k}^{2}}\). The conditional posterior of α k is an inverse Gaussian distribution,

$$p({\alpha }_{k}\vert \cdots \,) =\mathrm{ Inv - Gaussian}\left ({\alpha }_{k}\left \vert \sqrt{\frac{{\lambda }^{2 } {\sigma }^{2 } } {{\gamma }_{k}^{2}}} ,{\lambda }^{2}\right.\right )$$
(15.110)

Algorithms for sampling a random variable from an inverse Gaussian distribution are available. Once α k is sampled, σ k 2 is simply the inverse of α k . The fully conditional posterior distribution of λ2 remains gamma because of the conjugate property of the gamma prior,

$$p({\lambda }^{2}\vert \cdots \,) =\mathrm{ Gamma}\left ({\lambda }^{2}\left \vert p + a, \frac{1} {2}{\sum \nolimits }_{k=1}^{p}{\sigma }_{ k}^{2} + b\right.\right )$$
(15.111)

The Bayesian Lasso can potentially improve the estimation of the regression coefficients for the following reasons: (1) it assigns an exponential prior distribution, rather than a scaled inverse chi-square prior, to σ k 2, and (2) it adds another level to the prior hierarchy so that the hyperparameters do not have a strong influence on the Bayesian estimates of the regression coefficients.
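A minimal sketch of the two Gibbs updates specific to the Bayesian Lasso, drawing α k  = 1/σ k 2 from the inverse Gaussian (15.110) via NumPy's Wald sampler and λ2 from the gamma distribution (15.111), is given below. The function and variable names are hypothetical; gamma is assumed to be a NumPy array of the current QTL effects with no element exactly zero.

```python
import numpy as np

rng = np.random.default_rng(5)

def update_bayesian_lasso(gamma, sigma2, lam2, a=1.0, b=1.0, rng=rng):
    """Sample alpha_k = 1/sigma_k^2 from the inverse Gaussian (15.110) and then
    lambda^2 from the gamma distribution (15.111)."""
    p = len(gamma)
    mean = np.sqrt(lam2 * sigma2 / gamma ** 2)   # inverse Gaussian mean
    alpha = rng.wald(mean, lam2)                 # NumPy's Wald = inverse Gaussian
    sigma_k2 = 1.0 / alpha
    # gamma conditional for lambda^2: shape p + a, rate 0.5 * sum(sigma_k2) + b
    lam2_new = rng.gamma(p + a, 1.0 / (0.5 * sigma_k2.sum() + b))
    return sigma_k2, lam2_new
```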

5 Example: Arabidopsis Data

The first example is the recombinant inbred line data of Arabidopsis (Loudet et al. 2002), where the two parents initiating the line cross were Bay-0 and Shahdara, with Bay-0 as the female parent. The recombinant inbred lines were actually F 7 progeny derived by single-seed descent from the F 2 plants. Flowering time was recorded for each line in two environments: long day (16-h photoperiod) and short day (8-h photoperiod). We used the short-day flowering time as the quantitative trait for QTL mapping. The two parents had very little difference in short-day flowering time. The sample size (number of recombinant inbred lines) was 420. A couple of lines did not have phenotypic records, and their phenotypic values were replaced by the population mean for convenience of data analysis. A total of 38 microsatellite markers were used for the QTL mapping. These markers are more or less evenly distributed along the five chromosomes, with an average of 10.8 cM per marker interval. The marker names and positions are given in the original article (Loudet et al. 2002). We inserted a pseudomarker every 5 cM of the genome. Including the inserted pseudomarkers, the total number of loci subject to analysis was 74 (38 true markers plus 36 pseudomarkers). All 74 putative loci were evaluated simultaneously in a single model. Therefore, the model for the short-day flowering time trait is

$$y = X\beta +{ \sum \nolimits }_{k=1}^{74}{Z}_{ k}{\gamma }_{k} + \epsilon $$

where X is a 420 ×1 vector of unity and Z k is coded as 1 for one genotype and 0 for the other genotype at locus k. If locus k is a pseudomarker, \({Z}_{k} =\Pr (\text{ genotype} = 1)\), the conditional probability that locus k is of genotype 1. Finally, γ k is the QTL effect of locus k. For the original data analysis, the burn-in period was 1,000, the thinning rate was 10, and the posterior sample size was 10,000, so the total number of iterations was \(1,000 + 10,000 \times 10 = 101,000\). We also performed a permutation analysis (Che and Xu 2010) to generate empirical quantiles of the QTL effects under the null model. The posterior sample size in the permutation analysis was 80,000, and the total number of iterations was \(1,000 + 80,000 \times 10 = 801,000\). The estimated QTL effects and the permutation-generated 0.5 % and 99.5 % quantiles (corresponding to a type I error of 0.01) and 2.5 % and 97.5 % quantiles (corresponding to a type I error of 0.05) are shown in Fig. 15.3. Based on the 0.01 criterion, a total of five QTL were detected on four chromosomes (1, 3, 4, and 5).
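For illustration, a generic permutation scheme for obtaining per-locus null quantiles (not necessarily identical to the procedure of Che and Xu 2010) could be sketched as follows, assuming a hypothetical function bayesian_shrinkage(y, Z) that returns the estimated effect of each locus:

```python
import numpy as np

rng = np.random.default_rng(6)

def permutation_quantiles(y, Z, bayesian_shrinkage, n_perm=1000):
    """Collect per-locus effect estimates under the null by shuffling phenotypes,
    then return the 0.5/2.5/97.5/99.5 percentiles for each locus."""
    null_effects = np.empty((n_perm, Z.shape[1]))
    for i in range(n_perm):
        y_perm = rng.permutation(y)                 # break the genotype-phenotype link
        null_effects[i] = bayesian_shrinkage(y_perm, Z)
    return np.percentile(null_effects, [0.5, 2.5, 97.5, 99.5], axis=0)
```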

Fig. 15.3

The estimated QTL effects (black) and the permutation-generated 1 % (blue) and 5 % (red) confidence intervals for the Arabidopsis short-day flowering time trait. The dotted reference lines separate the five chromosomes