
Quantitative traits, by definition, are controlled by the segregation of multiple genes. However, the continuous distribution of a quantitative trait does not necessarily require the segregation of many genes. Segregation of just a few genes, or even a single gene, may be sufficient to generate a continuously distributed phenotype, provided that the environmental variance contributes a substantial amount of the trait variation. It is often postulated that a quantitative trait may be controlled by one or a few “major genes” plus multiple modifier genes (genes with very small effects). Such a model is called the oligogenic model, in contrast to the so-called polygenic model, where multiple genes with small and equal effects are assumed.

In this chapter, we will discuss a method to test the hypothesis that a quantitative trait is controlled by a single major gene even without observing the genotypes of the major gene. The method is called segregation analysis of quantitative traits. Although segregation analysis belongs to major gene detection, we discuss this topic separately from the previous topic to emphasize a slight difference between segregation analysis and the major gene detection discussed earlier. Here, we define major gene detection as an association study between a single-locus genotype and a quantitative trait where genotypes of the major gene are observed for all individuals. Segregation analysis, however, refers to a single-locus association study where genotypes of the major gene are not observed at all. Another reason for separating major gene detection from segregation analysis is that the statistical method and hypothesis test for segregation analysis can be quite different from those of major gene detection.

1 Gaussian Mixture Distribution

We will use an F2 population as an example to discuss the segregation analysis. Consider the three genotypes in the following order: A 1 A 1, A 1 A 2, and A 2 A 2. Let k = 1, 2, 3 indicate the three ordered genotypes. The means of individuals bearing the three ordered genotypes are denoted by μ1, μ2, and μ3, respectively. Let y j be the phenotypic value of individual j for j = 1, …, n, where n is the sample size. Given that individual j has the kth genotype, the linear model for y j  is

$${y}_{j} = {\mu }_{k} + {\epsilon }_{j},$$
(7.1)

where ε j  ∼ N(0, σ2) and σ2 is the residual error variance. The probability density of y j conditional on the kth genotype is

$${f}_{k}({y}_{j}) = \frac{1} {\sqrt{2\pi }\sigma }\exp \left [- \frac{1} {2{\sigma }^{2}}{({y}_{j} - {\mu }_{k})}^{2}\right ].$$
(7.2)

In reality, the genotype of an individual is not observable, and thus, a mixture distribution is needed to describe the probability density of y j . Let π k ,  ∀k = 1, 2, 3, be the proportion of genotype k (also called the mixing proportion). Without any prior knowledge, π k may be described by the Mendelian segregation ratio, i.e., \({\pi }_{1} = {\pi }_{3} = \frac{1} {2}{\pi }_{2} = \frac{1} {4}\). Therefore, under the assumption of Mendelian segregation, the π k ’s are constants, not parameters. The distribution of y j is a mixture of three normal distributions, each weighted by its Mendelian mixing proportion. The mixture distribution is illustrated in Fig. 7.1. The probability density of y j is

$$ f({y}_{j}) ={ \sum \nolimits }_{k=1}^{3}{\pi }_{ k}{f}_{k}({y}_{j}). $$
(7.3)

The overall observed log likelihood function for parameters θ = { μ1, μ2, μ3, σ2} is

$$ L(\theta ) ={ \sum \nolimits }_{j=1}^{n}\ln f({y}_{ j}) ={ \sum \nolimits }_{j=1}^{n}\ln \left [{\sum \nolimits }_{k=1}^{3}{\pi }_{ k}{f}_{k}({y}_{j})\right ]. $$
(7.4)

Any numerical algorithm may be used to estimate the parameters. However, the EM algorithm (Dempster et al. 1977) appears to be the most convenient method for such a mixture model problem and thus will be introduced in this chapter.

Fig. 7.1 Gaussian mixture with three components. The solid line represents the mixture distribution, while the three dashed lines represent the three components
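To make the mixture density (7.3) and the observed log likelihood (7.4) concrete, the short Python sketch below evaluates them for a vector of phenotypes. The function names and the use of NumPy/SciPy are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.stats import norm

# Mendelian mixing proportions for an F2 population: pi_1 = pi_3 = 1/4, pi_2 = 1/2
PI = np.array([0.25, 0.50, 0.25])

def mixture_density(y, mu, sigma2):
    """Evaluate f(y_j) of (7.3) for every phenotype in y.

    y      : (n,) array of phenotypic values
    mu     : (3,) array of genotypic means (mu_1, mu_2, mu_3)
    sigma2 : residual error variance
    """
    comp = norm.pdf(y[:, None], loc=mu[None, :], scale=np.sqrt(sigma2))  # f_k(y_j), (7.2)
    return comp @ PI                                                     # sum_k pi_k f_k(y_j)

def observed_loglik(y, mu, sigma2):
    """Observed log likelihood L(theta) of (7.4)."""
    return np.sum(np.log(mixture_density(y, mu, sigma2)))

# Hypothetical example values, for illustration only
y = np.array([1.2, 0.3, 2.1, 1.8, 0.7])
print(observed_loglik(y, mu=np.array([0.5, 1.0, 2.0]), sigma2=0.4))
```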

2 EM Algorithm

The expectation-maximization (EM) algorithm was developed by Dempster et al. [1977] as a special numerical algorithm for finding the maximum likelihood estimates (MLE) of parameters. In contrast to the Newton–Raphson algorithm, the EM algorithm is not a general algorithm for MLE; rather, it can only be applied to some special problems. If the following two conditions hold, then we should consider using the EM algorithm. The first condition is that the maximum likelihood problem can be formulated as a missing value problem. The second condition is that if the missing values were not missing, the MLE would have a closed form solution or, at least, a mathematically attractive form of the solution. We now evaluate the mixture model problem to see whether the two conditions apply.

2.1 Closed Form Solution

We introduce a label η j to indicate the genotype of individual j. The definition of η j is

$${ \eta }_{j} = \left \{\begin{array}{c} 1\\ 2 \\ 3 \end{array} \begin{array}{c} \text{ for }{A}_{1}{A}_{1} \\ \text{ for }{A}_{1}{A}_{2} \\ \text{ for }{A}_{2}{A}_{2} \end{array} \right.$$
(7.5)

Since the genotype of an individual is not observable, the label η j is missing. Therefore, we can formulate the problem as a missing value problem. The missing values are the genotypes of the major gene and are denoted by the variable η j for \(j = 1,\ldots ,n\). Therefore, the first condition for using the EM algorithm is met. If η j were not missing, would we have a closed form solution for the parameters? Let us now define three more variables as functions of η j . These three variables are denoted by δ(η j , 1), δ(η j , 2), and δ(η j , 3), and their values are defined as

$$\delta ({\eta }_{j},k) = \left \{\begin{array}{c} 1\\ 0 \end{array} \begin{array}{c} \text{ if }{\eta }_{j} = k \\ \text{ if }{\eta }_{j}\neq k \end{array} \right.$$
(7.6)

for k = 1, 2, 3. We now use δ(η j , k) to represent the missing values. If δ(η j , k) were not missing, the linear model would be described by

$${y}_{j} = \delta ({\eta }_{j},1){\mu }_{1} + \delta ({\eta }_{j},2){\mu }_{2} + \delta ({\eta }_{j},3){\mu }_{3} + {\epsilon }_{j}.$$
(7.7)

Let us define δ j  = [δ(η j , 1) δ(η j , 2) δ(η j , 3)] as a 1 ×3 vector and β = [μ1 μ2 μ3]T as a 3 ×1 vector. The linear model can be rewritten as

$${y}_{j} = {\delta }_{j}\beta + {\epsilon }_{j}.$$
(7.8)

When ε j  ∼ N(0, σ2) is assumed, the maximum likelihood estimates of parameters are

$$\hat{\beta } ={ \left [{\sum \nolimits }_{j=1}^{n}{\delta }_{ j}^{T}{\delta }_{ j}\right ]}^{-1}\left [{\sum \nolimits }_{j=1}^{n}{\delta }_{ j}^{T}{y}_{ j}\right ]$$
(7.9)

for the means and

$$\hat{{\sigma }}^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{({y}_{ j} - {\delta }_{j}\beta )}^{2}$$
(7.10)

for the residual variance. We see that if the missing variables were not missing, the MLEs of the parameters do have an attractive closed form solution. Since both requirements of the EM algorithm are met, we can adopt the EM algorithm to search for the MLE of the parameters.
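As a concrete illustration of the closed form solutions (7.9) and (7.10), the sketch below computes them under the assumption that the genotype labels η j are observed. The function name and the NumPy usage are our own assumptions, not part of the original text.

```python
import numpy as np

def complete_data_mle(y, eta):
    """Closed-form MLEs (7.9) and (7.10) when the genotype labels are observed.

    y   : (n,) phenotypic values
    eta : (n,) genotype codes in {1, 2, 3}, playing the role of eta_j
    Assumes every genotype class is represented at least once.
    """
    n = len(y)
    delta = np.zeros((n, 3))                        # delta[j, k-1] = delta(eta_j, k)
    delta[np.arange(n), np.asarray(eta) - 1] = 1.0
    # beta_hat = [sum_j delta_j' delta_j]^{-1} [sum_j delta_j' y_j]: the genotype means
    beta_hat = np.linalg.solve(delta.T @ delta, delta.T @ y)
    sigma2_hat = np.mean((y - delta @ beta_hat) ** 2)   # equation (7.10)
    return beta_hat, sigma2_hat
```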

2.2 EM Steps

Before we derive the EM algorithm, let us show the expectation and maximization steps of the EM algorithm. The E-step involves calculating the expectations of all terms containing the missing variables δ j . The M-step simply estimates β and σ2 using the closed form solutions given above, with the terms containing the missing variables replaced by the expectations obtained in the E-step, as shown below:

$$\beta ={ \left [{\sum \nolimits }_{j=1}^{n}E({\delta }_{ j}^{T}{\delta }_{ j})\right ]}^{-1}\left [{\sum \nolimits }_{j=1}^{n}E({\delta }_{ j}^{T}){y}_{ j}\right ]$$
(7.11)

and

$${\sigma }^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}E[{({y}_{ j} - {\delta }_{j}\beta )}^{2}].$$
(7.12)

We can see that the EM algorithm is better described by introducing the M-step first and then describing the E-step (in reverse order). The details of the E-step are given below:

$$E({\delta }_{j}^{T}{\delta }_{ j}) = \left [\begin{array}{ccc} E[\delta ({\eta }_{j},1)]& 0 & 0 \\ 0 &E[\delta ({\eta }_{j},2)]& 0 \\ 0 & 0 &E[\delta ({\eta }_{j},3)]\\ \end{array} \right ]\!\!,$$
(7.13)
$$E({\delta }_{j}^{T}){y}_{ j} = \left [\begin{array}{c} E[\delta ({\eta }_{j},1)]{y}_{j} \\ E[\delta ({\eta }_{j},2)]{y}_{j} \\ E[\delta ({\eta }_{j},3)]{y}_{j}\\ \end{array} \right ]$$
(7.14)

and

$$E[{({y}_{j} - {\delta }_{j}\beta )}^{2}] ={ \sum \nolimits }_{k=1}^{3}E[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2}.$$
(7.15)

Here, we only need to calculate E[δ(η j , k)], which is the conditional expectation of δ(η j , k) given the parameter values and the phenotypic value. The full expression of the conditional expectation should be E[δ(η j , k) | y j , β, σ2], but we use E[δ(η j , k)] as a short notation.

$$\text{ E}[\delta ({\eta }_{j},k)] = \frac{{\pi }_{k}{f}_{k}({y}_{j}\vert \theta )} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j}\vert \theta )}.$$
(7.16)

where \({\pi }_{1} = {\pi }_{3} = \frac{1} {2}{\pi }_{2} = \frac{1} {4}\) is the Mendelian segregation ratio and f k (y j  | θ) = N(y j  | μ k , σ2) is the normal density. In summary, the EM algorithm is described by

  • Initialization: set t = 0 and assign an initial value to θ(t).

  • E-step: calculate E[δ(η j , k) | y j , θ(t)].

  • M-step: update β(t + 1) and σ2(t + 1).

  • Iteration: set \(t = t + 1\) and iterate between the E-step and the M-step.

The convergence criterion is

$$\vert \vert {\theta }^{(t+1)} - {\theta }^{(t)}\vert \vert = \sqrt{{({\theta }^{(t+1) } - {\theta }^{(t) } )}^{{\prime} } ({\theta }^{(t+1) } - {\theta }^{(t) } )/\text{ dim} (\theta )} \leq \epsilon ,$$
(7.17)

where dim(θ) = 4 is the dimension of the parameter vector and ε is an arbitrarily small positive number, say 10^{-8}.
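The complete iteration can be sketched in a few lines of Python. This is a minimal sketch, assuming NumPy/SciPy and user-supplied starting values; it implements the E-step (7.16), the M-step (7.11)–(7.12), and the convergence criterion (7.17).

```python
import numpy as np
from scipy.stats import norm

PI = np.array([0.25, 0.50, 0.25])        # Mendelian mixing proportions

def em_segregation(y, beta0, sigma2_0, tol=1e-8, max_iter=1000):
    """EM algorithm for the three-component F2 mixture (a sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta, sigma2 = np.asarray(beta0, dtype=float).copy(), float(sigma2_0)
    for _ in range(max_iter):
        # E-step: posterior expectations E[delta(eta_j, k)] of (7.16)
        comp = PI * norm.pdf(y[:, None], loc=beta[None, :], scale=np.sqrt(sigma2))
        E_delta = comp / comp.sum(axis=1, keepdims=True)             # (n, 3)
        # M-step: update beta via (7.11) and sigma2 via (7.12)
        beta_new = (E_delta * y[:, None]).sum(axis=0) / E_delta.sum(axis=0)
        sigma2_new = np.sum(E_delta * (y[:, None] - beta_new[None, :]) ** 2) / n
        # Convergence criterion (7.17): RMS change of the parameter vector
        diff = np.append(beta_new - beta, sigma2_new - sigma2)
        beta, sigma2 = beta_new, sigma2_new
        if np.sqrt(np.mean(diff ** 2)) <= tol:
            break
    return beta, sigma2, E_delta
```

The returned matrix of posterior expectations can also be reused later, e.g., for the variance calculations and the mixing proportion updates discussed in the sections that follow.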

Once the three genotypic values are estimated, the additive and dominance effects are estimated using linear contrasts of the genotypic values, e.g.,

$$\left \{\begin{array}{l} \hat{a} =\hat{ {\beta }}_{1} -\frac{1} {2}(\hat{{\beta }}_{1} +\hat{ {\beta }}_{3}) \\ \hat{d} =\hat{ {\beta }}_{2} -\frac{1} {2}(\hat{{\beta }}_{1} +\hat{ {\beta }}_{3})\\ \end{array} \right.\!\!.$$
(7.18)
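As a small companion to (7.18), the hypothetical helper below converts the three estimated genotypic values into additive and dominance effects; the function name is ours.

```python
def additive_dominance(beta_hat):
    """Additive and dominance effects from the estimated genotypic values, (7.18)."""
    b1, b2, b3 = beta_hat
    a_hat = b1 - 0.5 * (b1 + b3)      # equals (b1 - b3) / 2
    d_hat = b2 - 0.5 * (b1 + b3)
    return a_hat, d_hat
```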

2.3 Derivation of the EM Algorithm

The observed log likelihood function is given in (7.4). The MLE of θ is the (vector) value that maximizes this log likelihood function. The EM algorithm, however, does not directly maximize this likelihood function; instead, it maximizes the expectation of the complete-data log likelihood function, with the expectation taken with respect to the missing variable δ(η j , k). The complete-data log likelihood function is

$$\begin{array}{rlrlrl} {L}_{c}(\theta ) = & -\frac{n} {2} \ln ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\delta ({\eta }_{ j},k){({y}_{j} - {\mu }_{k})}^{2} & & \\ & +{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\delta ({\eta }_{ j},k)\ln ({\pi }_{k}) &\end{array}$$
(7.19)

The expectation of the complete-data log likelihood is \({\text{ E}}_{{ \theta }^{(t)}}[{L}_{c}(\theta )\vert y,{\theta }^{(t)}]\), which is denoted in short by L(θ | θ(t)) and is defined as

$$\begin{array}{rlrlrl} L(\theta \vert {\theta }^{(t)}) = - &\frac{n} {2} \ln ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\text{ E}[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2} & & \\ + &{\sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\text{ E}[\delta ({\eta }_{ j},k)]\ln ({\pi }_{k}) &\end{array}$$
(7.20)

With the EM algorithm, the target likelihood function for maximization is neither the complete-data log likelihood function (7.19) nor the observed log likelihood function (7.4); rather, it is the expected complete-data log likelihood function (7.20). An alternative expression of the above equation is

$$\begin{array}{rlrlrl} L(\theta \vert {\theta }^{(t)}) = & -\frac{n} {2} \ln ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}\text{ E}[{({y}_{ j} - {\delta }_{j}\beta )}^{2}] & & \\ & +{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\text{ E}[\delta ({\eta }_{ j},k)]\ln ({\pi }_{k}). &\end{array}$$
(7.21)

The partial derivatives of L(θ | θ(t)) with respect to β and σ2 are

$$\frac{\partial } {\partial \beta }L(\theta \vert {\theta }^{(t)}) = \frac{1} {{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}\text{ E}({\delta }_{j}^{T}){y}_{ j} - \frac{1} {{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}\text{ E}({\delta }_{ j}^{T}{\delta }_{ j})\beta $$
(7.22)

and

$$\frac{\partial } {\partial {\sigma }^{2}}L(\theta \vert {\theta }^{(t)}) = - \frac{n} {2{\sigma }^{2}} + \frac{1} {2{\sigma }^{4}}{ \sum \nolimits }_{j=1}^{n}E[{({y}_{ j} - {\delta }_{j}\beta )}^{2}],$$
(7.23)

respectively. Setting \(\frac{\partial } {\partial \beta }L(\theta \vert {\theta }^{(t)}) = \frac{\partial } {\partial {\sigma }^{2}} L(\theta \vert {\theta }^{(t)}) = 0\), we get

$$\beta ={ \left [{\sum \nolimits }_{j=1}^{n}E({\delta }_{ j}^{T}{\delta }_{ j})\right ]}^{-1}\left [{\sum \nolimits }_{j=1}^{n}E({\delta }_{ j}^{T}){y}_{ j}\right ]$$
(7.24)

and

$${\sigma }^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}E[{({y}_{ j} - {\delta }_{j}\beta )}^{2}].$$
(7.25)

This concludes the derivation of the EM algorithm.

2.4 Proof of the EM Algorithm

The target likelihood function for maximization in the EM algorithm is the expectation of the complete-data log likelihood function. However, the actual MLE of θ is obtained by maximization of the observed log likelihood function. To prove that the EM solution of the parameters is indeed the MLE, we only need to show that the partial derivative of the expected complete-data likelihood is identical to the partial derivative of the observed log likelihood, i.e., \(\frac{\partial } {\partial \theta }L(\theta \vert {\theta }^{(t)}) = \frac{\partial } {\partial \theta }L(\theta )\). If the two partial derivatives are the same, then the solutions must be the same because they both solve the same equation system, i.e., \(\frac{\partial } {\partial \theta }L(\theta ) = 0\).

Recall that the partial derivative of the expected complete-data log likelihood function with respect to β is

$$\frac{\partial } {\partial \beta }L(\theta \vert {\theta }^{(t)}) = \frac{1} {{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}E({\delta }_{j}^{T}){y}_{ j} - \frac{1} {{\sigma }^{2}}{ \sum \nolimits }_{j=1}^{n}E({\delta }_{j}^{T}{\delta }_{ j})\beta ,$$
(7.26)

which is a 3 ×1 vector as shown below:

$$\frac{\partial } {\partial \beta }L(\theta \vert {\theta }^{(t)}) ={ \left [\begin{array}{*{20}c} \frac{\partial } {\partial {\mu }_{1}} L(\theta \vert {\theta }^{(t)})& \frac{\partial } {\partial {\mu }_{2}} L(\theta \vert {\theta }^{(t)})& \frac{\partial } {\partial {\mu }_{3}} L(\theta \vert {\theta }^{(t)})\\ \end{array} \right ]}^{T}.$$

The kth component of this vector is

$$\begin{array}{rlrlrl} \frac{\partial } {\partial {\mu }_{k}}L(\theta \vert {\theta }^{(t)}) & = \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[\delta ({\eta }_{j},k)]{y}_{j} - \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[{\delta }^{2}({\eta }_{ j},k)]{\mu }_{k} & & \\ & = \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[\delta ({\eta }_{j},k)]{y}_{j} - \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[\delta ({\eta }_{j},k)]{\mu }_{k} & & \\ & = \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[\delta ({\eta }_{j},k)]({y}_{j} - {\mu }_{k}) &\end{array}$$
(7.27)

The equation holds because E[δ(η j , k)] = E[δ2(η j , k)], a property of the Bernoulli distribution. We now evaluate the partial derivative of the expected complete-data log likelihood with respect to σ2,

$$\begin{array}{rlrlrl} \frac{\partial } {\partial {\sigma }^{2}}L(\theta \vert {\theta }^{(t)}) = & - \frac{n} {2{\sigma }^{2}} + \frac{1} {2{\sigma }^{4}}{ \sum }_{j=1}^{n}E[{({y}_{ j} - {\delta }_{j}\beta )}^{2}] & & \\ = & - \frac{n} {2{\sigma }^{2}} + \frac{1} {2{\sigma }^{4}}{ \sum }_{j=1}^{n}{\sum }_{k=1}^{3}E[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2} &\end{array}$$
(7.28)

We now look at the partial derivatives of L(θ) with respect to the parameters. The observed log likelihood function is

$$L(\theta ) ={ \sum }_{j=1}^{n}\ln { \sum }_{k=1}^{3}{\pi }_{ k}{f}_{k}({y}_{j})$$
(7.29)

where

$${f}_{k}({y}_{j}) = \frac{1} {\sqrt{2\pi }\sigma }\exp \left [- \frac{1} {2{\sigma }^{2}}{({y}_{j} - {\mu }_{k})}^{2}\right ].$$
(7.30)

The partial derivatives of L(θ) with respect to β = [μ1 μ2 μ3]T are

$$\frac{\partial } {\partial {\mu }_{k}}L(\theta ) ={ \sum }_{j=1}^{n} \frac{{\pi }_{k}} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})} \frac{\partial } {\partial {\mu }_{k}}{f}_{k}({y}_{j}),$$
(7.31)

where

$$\frac{\partial } {\partial {\mu }_{k}}{f}_{k}({y}_{j}) = {f}_{k}({y}_{j})\left [ \frac{1} {{\sigma }^{2}}({y}_{j} - {\mu }_{k})\right ].$$
(7.32)

Hence,

$$\frac{\partial } {\partial {\mu }_{k}}L(\theta ) = \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n} \frac{{\pi }_{k}{f}_{k}({y}_{j})} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})}({y}_{j} - {\mu }_{k}).$$
(7.33)

Recall that

$$E[\delta ({\eta }_{j},k)] = \frac{{\pi }_{k}{f}_{k}({y}_{j})} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})}.$$
(7.34)

Therefore,

$$\frac{\partial } {\partial {\mu }_{k}}L(\theta ) = \frac{1} {{\sigma }^{2}}{ \sum }_{j=1}^{n}E[\delta ({\eta }_{ j},k)]({y}_{j} - {\mu }_{k}),$$
(7.35)

which is exactly the same as \(\frac{\partial } {\partial {\mu }_{k}}L(\theta \vert {\theta }^{(t)})\) given in  (7.27). Now, let us look at the partial derivative of L(θ) with respect to σ2.

$$\frac{\partial } {\partial {\sigma }^{2}}L(\theta ) ={ \sum }_{j=1}^{n}{\sum }_{k=1}^{3} \frac{{\pi }_{k}} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})} \frac{\partial } {\partial {\sigma }^{2}}{f}_{k}({y}_{j}),$$
(7.36)

where

$$\frac{\partial } {\partial {\sigma }^{2}}{f}_{k}({y}_{j}) = - \frac{1} {2{\sigma }^{2}}{f}_{k}({y}_{j}) + \left [ \frac{1} {2{\sigma }^{4}}{({y}_{j} - {\mu }_{k})}^{2}\right ]{f}_{ k}({y}_{j}).$$
(7.37)

Hence,

$$\begin{array}{rlrlrl} \frac{\partial } {\partial {\sigma }^{2}}L(\theta ) = & - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}{\sum }_{k=1}^{3} \frac{{\pi }_{k}{f}_{k}({y}_{j})} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})} & & \\ & + \frac{1} {2{\sigma }^{4}}{ \sum }_{j=1}^{n}\left [{\sum }_{k=1}^{3} \frac{{\pi }_{k}{f}_{k}({y}_{j})} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})}{({y}_{j} - {\mu }_{k})}^{2}\right ] &\end{array}$$
(7.38)

Note that

$${\sum }_{k=1}^{3} \frac{{\pi }_{k}{f}_{k}({y}_{j})} {{\sum \nolimits }_{k^\prime =1}^{3}{\pi }_{k^\prime }{f}_{k^\prime }({y}_{j})} ={ \sum }_{k=1}^{3}E[\delta ({\eta }_{ j},k)] = 1.$$
(7.39)

Therefore,

$$\begin{array}{rcl} \frac{\partial } {\partial {\sigma }^{2}}L(\theta )& =& - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}{\sum }_{k=1}^{3}E[\delta ({\eta }_{ j},k)] + \frac{1} {2{\sigma }^{4}}{ \sum }_{j=1}^{n}{\sum }_{k=1}^{3}E[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2} \\ & =& - \frac{n} {2{\sigma }^{2}} + \frac{1} {2{\sigma }^{4}}{ \sum }_{j=1}^{n}{\sum }_{k=1}^{3}E[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2} \end{array}$$
(7.40)

which is exactly the same as \(\frac{\partial } {\partial {\sigma }^{2}} L(\theta \vert {\theta }^{(t)})\) given in  (7.28). We now have confirmed that

$$\frac{\partial } {\partial {\sigma }^{2}}L(\theta \vert {\theta }^{(t)}) = \frac{\partial } {\partial {\sigma }^{2}}L(\theta )$$
(7.41)

and

$$\frac{\partial } {\partial {\mu }_{k}}L(\theta \vert {\theta }^{(t)}) = \frac{\partial } {\partial {\mu }_{k}}L(\theta ),\,\forall k = 1,2,3.$$
(7.42)

This concludes the proof that the EM algorithm does lead to the MLE of the parameters.
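The identity proved above can also be checked numerically. The sketch below compares a central finite-difference derivative of the observed log likelihood (7.29) with the expected complete-data derivative (7.35); the simulated data, function names, and NumPy/SciPy usage are our own assumptions, and the sketch is only an illustration of the equality, not part of the original derivation.

```python
import numpy as np
from scipy.stats import norm

PI = np.array([0.25, 0.50, 0.25])

def observed_loglik(y, beta, sigma2):
    """Observed log likelihood L(theta) of (7.29)."""
    comp = PI * norm.pdf(y[:, None], loc=beta[None, :], scale=np.sqrt(sigma2))
    return np.sum(np.log(comp.sum(axis=1)))

def expected_gradient_mu(y, beta, sigma2):
    """Right-hand side of (7.35): (1/sigma^2) sum_j E[delta(eta_j,k)](y_j - mu_k)."""
    comp = PI * norm.pdf(y[:, None], loc=beta[None, :], scale=np.sqrt(sigma2))
    E_delta = comp / comp.sum(axis=1, keepdims=True)
    return (E_delta * (y[:, None] - beta[None, :])).sum(axis=0) / sigma2

rng = np.random.default_rng(1)
y = rng.normal(1.0, 0.8, size=50)                    # simulated phenotypes
beta, sigma2, h = np.array([0.4, 1.0, 1.6]), 0.5, 1e-6
for k in range(3):
    step = np.zeros(3)
    step[k] = h
    numeric = (observed_loglik(y, beta + step, sigma2)
               - observed_loglik(y, beta - step, sigma2)) / (2 * h)
    print(k + 1, numeric, expected_gradient_mu(y, beta, sigma2)[k])
```

The two columns printed for each k should agree to several decimal places, reflecting the equality in (7.42).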

3 Hypothesis Tests

The overall null hypothesis is “no major gene is segregating” denoted by

$${H}_{0} : {\mu }_{1} = {\mu }_{2} = {\mu }_{3} = \mu.$$
(7.43)

The alternative hypothesis is “at least one of the means is different from the others,” denoted by

$${H}_{1} : {\mu }_{1}\neq {\mu }_{3}\text{ or }{\mu }_{2}\neq \frac{1} {2}({\mu }_{1} + {\mu }_{3}).$$
(7.44)

The likelihood ratio test statistic is

$$\lambda = -2[{L}_{0}(\hat{\theta }) - {L}_{1}(\hat{\theta })],$$
(7.45)

where \({L}_{1}(\hat{\theta })\) is the observed log likelihood function evaluated at the MLE of θ for the full model, and

$${L}_{0}(\hat{\theta }) = -\frac{n} {2} \ln (\hat{{\sigma }}^{2}) - \frac{1} {2\hat{{\sigma }}^{2}}{ \sum \nolimits }_{j=1}^{n}{({y}_{ j} -\hat{ \mu })}^{2}$$
(7.46)

is the log likelihood evaluated under the null model, where

$$\hat{\mu } = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{y}_{ j}$$
(7.47)

and

$$\hat{{\sigma }}^{2} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}{({y}_{ j} -\hat{ \mu })}^{2}.$$
(7.48)

Under the null hypothesis, λ will follow approximately a chi-square distribution with two degrees of freedom. Therefore, H 0 will be rejected if \(\lambda > {\chi }_{2,1-\alpha }^{2}\), where α = 0.05 may be chosen as the type I error.
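A sketch of the likelihood ratio test (7.45)–(7.48) is given below. It assumes the full-model MLEs come from the EM algorithm and reuses the observed_loglik function sketched earlier; note that the constant −(n/2)ln(2π), which is omitted in (7.46), is included in both likelihoods here so that it cancels in λ.

```python
import numpy as np
from scipy.stats import chi2, norm

def likelihood_ratio_test(y, beta_hat, sigma2_hat):
    """Likelihood ratio test of H0: mu_1 = mu_2 = mu_3 (a sketch)."""
    y = np.asarray(y, dtype=float)
    mu0 = np.mean(y)                                   # (7.47)
    s2_0 = np.mean((y - mu0) ** 2)                     # (7.48)
    # Null log likelihood (7.46), with the -(n/2)ln(2*pi) constant included
    L0 = np.sum(norm.logpdf(y, loc=mu0, scale=np.sqrt(s2_0)))
    L1 = observed_loglik(y, beta_hat, sigma2_hat)      # full-model log likelihood (7.4)
    lam = -2.0 * (L0 - L1)                             # (7.45)
    p_value = chi2.sf(lam, df=2)                       # chi-square with 2 d.f.
    return lam, p_value
```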

4 Variances of Estimated Parameters

Unlike other iterative methods of parameter estimation, e.g., the Newton–Raphson method, in which the variance–covariance matrix of the estimated parameters is provided automatically as a by-product of the iteration process, the EM algorithm does not provide an easy way to calculate the variance–covariance matrix of the estimated parameters. We now introduce a special method for calculating this variance–covariance matrix. The method was developed by Louis [1982] particularly for calculating the variance–covariance matrix of parameters that are estimated via the EM algorithm. The method requires the first and second partial derivatives of the complete-data log likelihood function (not the observed log likelihood function). The complete-data log likelihood function is

$$L(\theta ,\delta ) ={ \sum }_{j=1}^{n}{L}_{ j}(\theta ,\delta ),$$
(7.49)

where

$${L}_{j}(\theta ,\delta ) = -\frac{1} {2}\ln ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{({y}_{j} - {\delta }_{j}\beta )}^{2}.$$
(7.50)

The first partial derivative of this log likelihood with respect to the parameter is called the score function, which is

$$S(\theta ,\delta ) = \frac{\partial } {\partial \theta }L(\theta ,\delta ) ={ \sum }_{j=1}^{n} \frac{\partial } {\partial \theta }{L}_{j}(\theta ,\delta ) ={ \sum }_{j=1}^{n}{S}_{ j}(\theta ,\delta ),$$
(7.51)

where

$$\begin{array}{rlrlrl} {S}_{j}(\theta ,\delta ) = \frac{\partial } {\partial \theta }{L}_{j}(\theta ,\delta ) = \left [\begin{array}{*{20}c} \frac{\partial } {\partial \beta }{L}_{j}(\theta ,\delta ) \\ \frac{\partial } {\partial {\sigma }^{2}} {L}_{j}(\theta ,\delta )\\ \end{array} \right ] = \left [\begin{array}{*{20}c} \frac{1} {{\sigma }^{2}} {\delta }_{j}^{T}({y}_{j} - {\delta }_{j}\beta ) \\ - \frac{1} {2{\sigma }^{2}} + \frac{1} {2{\sigma }^{4}} {({y}_{j} - {\delta }_{j}\beta )}^{2} \\ \end{array} \right ]. & & \\ & &\end{array}$$
(7.52)

The second partial derivative is called the Hessian matrix H(θ, δ). The negative value of the Hessian matrix is denoted by \(B(\theta ,\delta ) = -H(\theta ,\delta )\),

$$B(\theta ,\delta ) = -\frac{{\partial }^{2}L(\theta ,\delta )} {\partial \theta \,\partial {\theta }^{T}} = -{\sum }_{j=1}^{n}\frac{{\partial }^{2}{L}_{ j}(\theta ,\delta )} {\partial \theta \,\partial {\theta }^{T}} ={ \sum }_{j=1}^{n}{B}_{ j}(\theta ,\delta ),$$
(7.53)

where

$${ B}_{j}(\theta ,\delta ) = -\frac{{\partial }^{2}{L}_{j}(\theta ,\delta )} {\partial \theta \,\partial {\theta }^{T}} = \left [\begin{array}{*{20}c} -\frac{{\partial }^{2}{L}_{ j}(\theta ,\delta )} {\partial \beta \,\partial {\beta }^{T}} & \ \ -\frac{{\partial }^{2}{L}_{ j}(\theta ,\delta )} {\partial \beta \,\partial {\sigma }^{2}} \\ -\frac{{\partial }^{2}{L}_{ j}(\theta ,\delta )} {\partial {\sigma }^{2}\,\partial {\beta }^{T}} & \ \ -\frac{{\partial }^{2}{L}_{ j}(\theta ,\delta )} {\partial {\sigma }^{2}\,\partial {\sigma }^{2}}\\ \end{array} \right ]$$
(7.54)

Detailed expression of B j (θ, δ) is given below:

$${ B}_{j}(\theta ,\delta ) = \left [\begin{array}{*{20}c} \frac{1} {{\sigma }^{2}} {\delta }_{j}^{T}{\delta }_{j} & \frac{1} {{\sigma }^{4}} {\delta }_{j}^{T}({y}_{j} - {\delta }_{j}\beta ) \\ \frac{1} {{\sigma }^{4}} {({y}_{j} - {\delta }_{j}\beta )}^{T}{\delta }_{j}& \frac{1} {{\sigma }^{6}} {({y}_{j} - {\delta }_{j}\beta )}^{2} - \frac{1} {2{\sigma }^{2}}\\ \end{array} \right ].$$
(7.55)

Louis [1982] gave the following information matrix:

$$\begin{array}{rcl} I(\theta )& =& \text{ E}[B(\theta ,\delta )] -\text{ var}[S(\theta ,\delta )] \\ & =& {\sum \nolimits }_{j=1}^{n}\text{ E}[{B}_{ j}(\theta ,\delta )] -{\sum \nolimits }_{j=1}^{n}\text{ var}[{S}_{ j}(\theta ,\delta )],\end{array}$$
(7.56)

where the expectation and variance are taken with respect to the missing variable δ j using the posterior probability of δ j . Detailed expressions of E[B j (θ, δ)] and var[S j (θ, δ)] are given at the end of this section. Readers may also refer to Han and Xu [2008] and Xu and Hu [2010] for the derivation and the results. Replacing θ by \(\hat{\theta }\) and taking the inverse of the information matrix, we get the variance–covariance matrix of the estimated parameters,

$$\text{ var}(\hat{\theta }) = {I}^{-1}(\hat{\theta }) =\{ \text{ E}[B(\hat{\theta },\delta )] -\text{ var}{[S(\hat{\theta },\delta )]\}}^{-1}.$$
(7.57)

This is a 4 ×4 variance–covariance matrix, as shown below:

$$\mathrm{var}(\hat{\theta }) = \left [\begin{array}{*{20}c} \mathrm{var}(\hat{\beta }) &\mathrm{cov}(\hat{\beta },\hat{{\sigma }}^{2}) \\ \mathrm{cov}(\hat{{\sigma }}^{2},\hat{{\beta }}^{T})& \mathrm{var}(\hat{{\sigma }}^{2})\\ \end{array} \right ],$$
(7.58)

where \(\text{ var}(\hat{\beta })\) is a 3 ×3 variance matrix for the estimated genotypic values.

The additive and dominance effects can be expressed as linear functions of β, as demonstrated below:

$$\left [\begin{array}{*{20}c} a\\ d\\ \end{array} \right ] = \left [\begin{array}{*{20}c} { 1 \over 2} & 0&-{ 1 \over 2} \\ -{ 1 \over 2} & 1&-{ 1 \over 2} \\ \end{array} \right ]\left [\begin{array}{*{20}c} {\beta }_{1} \\ {\beta }_{2} \\ {\beta }_{3}\\ \end{array} \right ] = {L}^{T}\beta ,$$
(7.59)

where

$$L ={ \left [\begin{array}{*{20}c} { 1 \over 2} & 0&-{ 1 \over 2} \\ -{ 1 \over 2} & 1&-{ 1 \over 2} \\ \end{array} \right ]}^{T}.$$
(7.60)

The variance–covariance matrix for the estimated major gene effects is

$$\mathrm{var}\left [\begin{array}{*{20}c} \hat{a}\\ \hat{d}\\ \end{array} \right ] = {L}^{T}\mathrm{var}(\hat{\beta })L = \left [\begin{array}{*{20}c} \mathrm{var}(\hat{a}) &\mathrm{cov}(\hat{a},\hat{d}) \\ \mathrm{cov}(\hat{a},\hat{d})& \mathrm{var}(\hat{d})\\ \end{array} \right ]\!\!.$$
(7.61)

The variance–covariance matrix of the estimated major gene effects also facilitates an alternative method for testing the hypothesis \({H}_{0} : a = d = 0\). This test is called the Wald test (Wald 1943),

$$W =\hat{ {\beta }}^{T}L{[{L}^{T}\mathrm{var}(\hat{\beta })L]}^{-1}{L}^{T}\hat{\beta } = \left [\begin{array}{*{20}c} \hat{a}&\hat{d}\\ \end{array} \right ]{\left [\begin{array}{*{20}c} \mathrm{var}(\hat{a}) &\mathrm{cov}(\hat{a},\hat{d}) \\ \mathrm{cov}(\hat{a},\hat{d})& \mathrm{var}(\hat{d})\\ \end{array} \right ]}^{-1}\left [\begin{array}{*{20}c} \hat{a}\\ \hat{d}\\ \end{array} \right ]\!\!.$$
(7.62)

The Wald test statistic behaves much like the likelihood ratio test statistic. Under the null model, W follows approximately a χ2 distribution with 2 degrees of freedom. However, the Wald test is usually considered inferior to the likelihood ratio test, especially when the sample size is small.
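A sketch of the Wald test (7.59)–(7.62) is shown below. It takes the estimated genotypic values and the 3 ×3 block var(β̂) of (7.58) as inputs; the latter can be obtained, for example, from the Louis information matrix sketched at the end of this chapter's Sect. 4. The function name and NumPy/SciPy usage are our assumptions.

```python
import numpy as np
from scipy.stats import chi2

# Transpose of the contrast matrix L in (7.60): maps beta to (a, d)
L_T = np.array([[ 0.5, 0.0, -0.5],
                [-0.5, 1.0, -0.5]])

def wald_test(beta_hat, var_beta):
    """Wald statistic (7.62) for H0: a = d = 0 (a sketch)."""
    ad_hat = L_T @ beta_hat                        # (a_hat, d_hat), equation (7.59)
    var_ad = L_T @ var_beta @ L_T.T                # equation (7.61)
    W = ad_hat @ np.linalg.solve(var_ad, ad_hat)   # equation (7.62)
    p_value = chi2.sf(W, df=2)
    return W, p_value
```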

Before exiting this section, we now provide the derivation of E[B j (θ, δ)] and var[S j (θ, δ)]. Recall that δ j is a 1 ×3 multinomial variable with sample size 1 and is defined as

$${ \delta }_{j} = \left [\begin{array}{*{20}{c}} \delta ({\eta }_{j},1)&\delta ({\eta }_{j},2)&\delta ({\eta }_{j},3) \end{array} \right ]$$
(7.63)

This variable has the following properties:

$${\delta }^{2}({\eta }_{ j},k) = \delta ({\eta }_{j},k)$$
(7.64)

and

$$\delta ({\eta }_{j},k)\delta ({\eta }_{j},k^\prime ) = \left \{\begin{array}{*{20}{c}} \delta ({\eta }_{j},k)&\mathrm{for}\ k = k^\prime \\ 0 &\mathrm{for}\ k\neq k^\prime \end{array} \right.$$
(7.65)

Therefore, the expectation of δ j is

$$E({\delta }_{j}) = \left [\begin{array}{*{20}{c}} E\left [\delta ({\eta }_{j},1)\right ]&E\left [\delta ({\eta }_{j},2)\right ]&E\left [\delta ({\eta }_{j},3)\right ] \end{array} \right ]$$
(7.66)

The expectation of its quadratic form is

$$E({\delta }_{j}^{T}{\delta }_{ j}) = \mathrm{diag}\left [E({\delta }_{j})\right ] = \left [\begin{array}{*{20}{c}} E\left [\delta ({\eta }_{j},1)\right ]& 0 & 0 \\ 0 &E\left [\delta ({\eta }_{j},2)\right ]& 0 \\ 0 & 0 &E\left [\delta ({\eta }_{j},3)\right ] \end{array} \right ]$$
(7.67)

The variance–covariance matrix of δ j is

$$\mathrm{var}({\delta }_{j}) = E({\delta }_{j}^{T}{\delta }_{ j}) - E({\delta }_{j}){E}^{T}({\delta }_{ j})$$
(7.68)

To derive the observed information matrix, we need the first and second partial derivatives of the complete-data log likelihood with respect to the parameter vector \(\theta = {[\begin{array}{*{20}{c}} {\beta }^{T}&{\sigma }^{2} \end{array} ]}^{T}\). The score vector is rewritten as

$${ S}_{j}(\theta ,\delta ) = \left [\begin{array}{*{20}{c}} { 1 \over {\sigma }^{2}} {\delta }_{j}^{T}({y}_{j} - {\delta }_{j}\beta ) \\ { 1 \over 2{\sigma }^{4}} { ({y}_{j} - {\delta }_{j}\beta )}^{2} \end{array} \right ]+\left [\begin{array}{*{20}{c}} {0}_{3\times 1} \\ -{ 1 \over 2{\sigma }^{2}} \end{array} \right ]$$
(7.69)

where 03 ×1 is a 3 ×1 vector of zeros, and thus, the score is a 4 ×1 vector. The negative of the second partial derivative is

$${ B}_{j}(\theta ,\delta ) = \left [\begin{array}{*{20}{c}} { 1 \over {\sigma }^{2}} {\delta }_{j}^{T}{\delta }_{j} &{ 1 \over {\sigma }^{4}} {\delta }_{j}^{T}({y}_{j} - {\delta }_{j}\beta ) \\ { 1 \over {\sigma }^{4}} { ({y}_{j} - {\delta }_{j}\beta )}^{T}{\delta }_{j}^{}& { 1 \over {\sigma }^{6}} { ({y}_{j} - {\delta }_{j}\beta )}^{2} \end{array} \right ]+\left [\begin{array}{*{20}{c}} {0}_{3\times 3} & {0}_{3\times 1} \\ {0}_{1\times 3} & -{ 1 \over 2{\sigma }^{2}} \end{array} \right ]$$
(7.70)

where 03 ×3 is a 3 ×3 matrix of zeros, and thus, B j (θ, δ) is a 4 ×4 matrix. The expectation of B j (θ, δ) is easy to derive, but derivation of the variance–covariance matrix of the score vector is very difficult. Xu and Xu [2003] used a Monte Carlo approach to approximate the expectation and the variance–covariance matrix. They simulated multiple (e.g., 5,000) samples of δ j from the posterior distribution and then took the sample mean of B j (θ, δ) and the sample variance–covariance matrix of S j (θ, δ) as approximations of the corresponding terms. Here, we take a theoretical approach and provide explicit expressions for the expectation and the variance–covariance matrix. We can express the score vector as a linear function of δ j and the B j (θ, δ) matrix as a quadratic function of δ j . By trial and error, we found that

$$\begin{array}{rcl}{ S}_{j}(\theta ,\delta )& =& \left [\begin{array}{*{20}{c}} { 1 \over {\sigma }^{2}} ({y}_{j} - {\beta }_{1}) & 0 & 0 \\ 0 & { 1 \over {\sigma }^{2}} ({y}_{j} - {\beta }_{2}) & 0 \\ 0 & 0 & { 1 \over {\sigma }^{2}} ({y}_{j} - {\beta }_{3}) \\ { 1 \over 2{\sigma }^{4}} { ({y}_{j} - {\beta }_{1})}^{2} & { 1 \over 2{\sigma }^{4}} { ({y}_{j} - {\beta }_{2})}^{2} & { 1 \over 2{\sigma }^{4}} { ({y}_{j} - {\beta }_{3})}^{2} \end{array} \right ]\left [\begin{array}{*{20}{c}} \delta ({\eta }_{j}, 1) \\ \delta ({\eta }_{j}, 2) \\ \delta ({\eta }_{j}, 3) \end{array} \right ] + \left [\begin{array}{*{20}{c}} 0\\ 0 \\ 0 \\ -{ 1 \over 2{\sigma }^{2}} \end{array} \right ] \\ & =& {A}_{j}^{T}{\delta }_{ j}^{T} + C \end{array}$$
(7.71)

where A j T is the 4 ×3 coefficient matrix and C is the 4 ×1 vector of constants. Let us define a 4 ×1 vector H j T as

$${ H}_{j}^{T} = {T}_{ j}^{T}{\delta }_{ j}^{T} = \left [\begin{array}{*{20}{c}} { 1 \over \sigma } & 0 & 0 \\ 0 & { 1 \over \sigma } & 0 \\ 0 & 0 & { 1 \over \sigma } \\ { 1 \over {\sigma }^{3}} ({y}_{j} - {\beta }_{1})&{ 1 \over {\sigma }^{3}} ({y}_{j} - {\beta }_{2})&{ 1 \over {\sigma }^{3}} ({y}_{j} - {\beta }_{3}) \end{array} \right ]\left [\begin{array}{*{20}{c}} \delta ({\eta }_{j},1) \\ \delta ({\eta }_{j},2) \\ \delta ({\eta }_{j},3) \end{array} \right ]$$
(7.72)

where T j T is the 4 ×3 coefficient matrix. We can now express matrix B j (θ, δ) as

$${B}_{j}(\theta ,\delta ) = {H}_{j}^{T}{H}_{ j} + D = {T}_{j}^{T}{\delta }_{ j}^{T}{\delta }_{ j}{T}_{j} + D$$
(7.73)

where

$$D = \mathrm{diag}(C) = \left [\begin{array}{*{20}{c}} {0}_{3\times 3} & {0}_{3\times 1} \\ {0}_{1\times 3} & -{ 1 \over 2{\sigma }^{2}} \end{array} \right ]$$
(7.74)

is a 4 ×4 constant matrix. The expectation of B j (θ, δ) is

$$E\left [{B}_{j}(\theta ,\delta )\right ] = {T}_{j}^{T}E({\delta }_{ j}^{T}{\delta }_{ j}){T}_{j} + D$$
(7.75)

The expectation vector and the variance–covariance matrix of S j (θ, δ) are

$$E\left [{S}_{j}(\theta ,\delta )\right ] = {A}_{j}^{T}E({\delta }_{ j}^{T}) + C$$
(7.76)

and

$$\mathrm{var}\left [{S}_{j}(\theta ,\delta )\right ] = {A}_{j}^{T}\mathrm{var}({\delta }_{ j}){A}_{j} = {A}_{j}^{T}\left [E({\delta }_{ j}^{T}{\delta }_{ j}) - E({\delta }_{j}^{T})E({\delta }_{ j})\right ]{A}_{j}$$
(7.77)

respectively. Expressing S j (θ, δ) and B j (θ, δ) as linear and quadratic functions of the missing vector δ j has significantly simplified the derivation of the information matrix.
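Putting the pieces of this section together, the sketch below evaluates Louis' information matrix (7.56) at the EM estimates and inverts it to obtain (7.57); E[B j (θ, δ)] is taken directly from the expectation of (7.70), and var[S j (θ, δ)] uses the linear representation (7.71) together with (7.77). The function name and the NumPy/SciPy usage are our own assumptions.

```python
import numpy as np
from scipy.stats import norm

PI = np.array([0.25, 0.50, 0.25])

def louis_covariance(y, beta, sigma2):
    """var(theta_hat) of (7.57)-(7.58) for theta = (beta_1, beta_2, beta_3, sigma^2)."""
    y = np.asarray(y, dtype=float)
    s2, s4, s6 = sigma2, sigma2 ** 2, sigma2 ** 3
    comp = PI * norm.pdf(y[:, None], loc=beta[None, :], scale=np.sqrt(s2))
    E_delta = comp / comp.sum(axis=1, keepdims=True)      # posterior E[delta(eta_j, k)]
    info = np.zeros((4, 4))
    for yj, ed in zip(y, E_delta):
        r = yj - beta                                     # (y_j - beta_k) for k = 1, 2, 3
        # E[B_j(theta, delta)]: expectation of (7.70), element by element
        E_B = np.zeros((4, 4))
        E_B[:3, :3] = np.diag(ed) / s2
        E_B[:3, 3] = ed * r / s4
        E_B[3, :3] = ed * r / s4
        E_B[3, 3] = np.sum(ed * r ** 2) / s6 - 0.5 / s2
        # var[S_j(theta, delta)] = A_j' var(delta_j) A_j, equations (7.71) and (7.77)
        A_T = np.vstack([np.diag(r / s2), (r ** 2) / (2 * s4)])   # 4 x 3 coefficient matrix
        var_delta = np.diag(ed) - np.outer(ed, ed)                # equation (7.68)
        info += E_B - A_T @ var_delta @ A_T.T                     # equation (7.56)
    return np.linalg.inv(info)                                    # equation (7.57)
```

The upper-left 3 ×3 block of the returned matrix corresponds to var(β̂) in (7.58) and can be passed to the Wald test sketch given earlier.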

5 Estimation of the Mixing Proportions

We used an F2 population as an example for segregation analysis. Extension of the segregation analysis to other populations is straightforward and will not be discussed here. For the F2 population, we assumed that the major gene follows the Mendelian segregation ratio, i.e., \({\pi }_{1} = {\pi }_{3} = \frac{1} {2}{\pi }_{2} = \frac{1} {4}\). Therefore, π k is a constant, not a parameter for estimation. The method can be extended to a situation where the major gene does not follow the Mendelian segregation ratio. In this case, the values of π k are also parameters for estimation. This section will introduce a method to estimate the π k ’s. These π k ’s are called the mixing proportions.

We simply add one more step in the EM algorithm to estimate π k , ∀k = 1, 2, 3. Again, we maximize the expected complete-data log likelihood function. To enforce the restriction that \({\sum \nolimits }_{k=1}^{3}{\pi }_{k} = 1\), we introduce a Lagrange multiplier λ. Therefore, the actual function to be maximized is

$$\begin{array}{rlrlrl} L(\theta \vert {\theta }^{(t)}) = & -\frac{n} {2} \ln ({\sigma }^{2}) - \frac{1} {2{\sigma }^{2}}{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\text{ E}[\delta ({\eta }_{ j},k)]{({y}_{j} - {\mu }_{k})}^{2} & & \\ & +{ \sum }_{j=1}^{n}{ \sum }_{k=1}^{3}\text{ E}[\delta ({\eta }_{ j},k)]\ln ({\pi }_{k}) + \lambda \left (1 -{\sum }_{k=1}^{3}{\pi }_{ k}\right ). &\end{array}$$
(7.78)

The partial derivatives of L(θ | θ(t)) with respect to π k and λ are

$$\frac{\partial } {\partial {\pi }_{k}}L(\theta \vert {\theta }^{(t)}) = \frac{1} {{\pi }_{k}}{ \sum \nolimits }_{j=1}^{n}\text{ E}[\delta ({\eta }_{ j},k)] - \lambda ,\forall k = 1,2,3$$
(7.79)

and

$$\frac{\partial } {\partial \lambda }L(\theta \vert {\theta }^{(t)}) = 1 -{\sum \nolimits }_{k=1}^{3}{\pi }_{ k},$$
(7.80)

respectively. Let \(\frac{\partial } {\partial {\pi }_{k}}L(\theta \vert {\theta }^{(t)}) = \frac{\partial } {\partial \lambda }L(\theta \vert {\theta }^{(t)}) = 0\), and solve for π k ’s and λ. The solution for π k is

$${\pi }_{k} = \frac{1} {\lambda }{\sum \nolimits }_{j=1}^{n}E[\delta ({\eta }_{ j},k)],\forall k = 1,2,3.$$
(7.81)

The solution for λ is obtained by

$${\sum \nolimits }_{k=1}^{3}{\pi }_{ k} = \frac{1} {\lambda }{\sum \nolimits }_{k=1}^{3}{ \sum \nolimits }_{j=1}^{n}E[\delta ({\eta }_{ j},k)] = \frac{1} {\lambda }{\sum \nolimits }_{j=1}^{n}{ \sum \nolimits }_{k=1}^{3}E[\delta ({\eta }_{ j},k)] = \frac{n} {\lambda } = 1.$$
(7.82)

This is because \({\sum \nolimits }_{k=1}^{3}E[\delta ({\eta }_{j},k)] = 1\) and \({\sum \nolimits }_{j=1}^{n}1 = n\). As a result, λ = n, and thus

$${\pi }_{k}^{(t+1)} = \frac{1} {n}{\sum \nolimits }_{j=1}^{n}E[\delta ({\eta }_{ j},k)],\forall k = 1,2,3.$$
(7.83)
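Under the notation used earlier, this extra M-step amounts to one line of code: the updated mixing proportions are the column means of the posterior expectation matrix produced by the E-step. A minimal sketch, assuming the E_delta matrix returned by the EM sketch given earlier:

```python
import numpy as np

def update_mixing_proportions(E_delta):
    """Extra M-step (7.83): pi_k^(t+1) = (1/n) sum_j E[delta(eta_j, k)]."""
    return np.asarray(E_delta).mean(axis=0)
```

Note that when the π k ’s are treated as parameters in this way, the E-step (7.16) would use the current values π k (t) in place of the Mendelian constants.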