Abstract
This work introduces a new method for estimating the variance components in mixed linear models. The approach is first developed for models with three variance components, and then attention is devoted to the general case of models with an arbitrary number of variance components. In our approach, we construct and apply a finite sequence of orthogonal matrices to the variance-covariance structure of the mixed linear model in order to produce a set of Gauss–Markov sub-models, which are then used to create pooled estimators for the variance components. Numerical results are given, comparing the performance of our proposed estimator with that of a likelihood-based procedure.
1 Introduction
Mixed linear models (MLM) arise from the need to assess the amount of variation caused by certain sources in statistical designs with fixed effects (see Khuri [7]), for example, the amount of variation that is not controlled by the experimenters and that whose levels are selected at random. The variances of such sources of variation, currently referred to as variance components, were widely investigated during the second half of the last century (see Khuri and Sahai [8], Searle [13, 14], among others), and during the period ranging roughly from the early 1960s to 1990, owing to the proliferation of research on genetics and animal breeding as well as on industrial quality control and improvement (for more details, see Anderson [1,2,3], Anderson and Crump [4], Searle [13], among others), several estimation techniques were proposed. Among those techniques we highlight the ANOVA and the maximum likelihood based methods (see, for example, Searle et al. [15] and Casella and Berger [5]). Nevertheless, although the ANOVA method adapts readily to mixed models with balanced data and preserves unbiasedness, it does not adapt well to situations with unbalanced data (mostly because it uses computations derived from fixed-effects models rather than mixed models). In turn, the maximum likelihood based methods, notably ML and restricted ML (REML), provide estimators with several optimal statistical properties, such as consistency and asymptotic normality, for models with balanced as well as unbalanced data. For these optimal properties we recommend Miller [9], and for some details on applications of such methods we recommend, for example, Anderson [2] and Hartley and Rao [6].
This paper is organized as follows. In Sect. 2 (notation and basic concepts on matrix theory) we review some needed notions and results on matrix theory, mainly on matrix diagonalization. A new method to estimate the variance components in the MLM is presented in Sect. 3, and numerical results assessing its performance are given in Sect. 4.
2 Notation and Basic Concepts on Matrix Theory
In this section we summarize a few needed notions and results on matrix diagonalization. The proofs for the results can be found in Schott [12].
Let \(\mathscr {M}^{n\times m}\) and \(\mathscr {S}^n = \{A: A\in \mathscr {M}^{n\times n}, A=A^\top \}\) stand for the set of matrices with n rows and m columns and the set of \(n \times n\) symmetric matrices, respectively. The range and the rank of a matrix A will be denoted by R(A) and r(A), respectively, and the projection matrix onto the range space of A by \(P_{R(A)}\) (see Schott [12, Chap. 2, Sect. 7] for the notion of projection matrix). We will denote by tr(A) the trace of A.
If the eigenvalues \(\lambda _1,\ldots ,\lambda _r\) of the matrix \(M \in \mathscr {M}^{r\times r}\) are all distinct, it follows from Theorem 3.6 of Schott [12] that the matrix X, whose columns are the eigenvectors associated with those eigenvalues, is non-singular. Thus, by the eigenvalue–eigenvector equation \(MX =XD\) or, equivalently, \(X^{-1}MX=D\), with \(D=diag(\lambda _1 \ldots \lambda _r)\), and Theorem 3.2.(d) of Schott [12], the eigenvalues of D are the same as those of M. Moreover, since M can be transformed into a diagonal matrix by postmultiplication by the non-singular matrix X and premultiplication by its inverse \(X^{-1}\), it is said to be diagonalizable.
If the matrix M is symmetric, the eigenvectors associated with its distinct eigenvalues are orthogonal (see Schott [12]). Indeed, if we consider two distinct eigenvalues \(\lambda _i\) and \(\lambda _j\) whose associated eigenvectors are \(\mathbf {x}_i\) and \(\mathbf {x}_j\), respectively, we see that, since M is symmetric,
So, since \(\lambda _i \ne \lambda _j\), we must have \(\mathbf {x}_i^\top \mathbf {x}_j = 0\).
According to Theorem 3.10 of Schott [12], without loss of generality, the columns of the matrix X can be taken to be orthonormal, so that X is an orthogonal matrix. Thus, the eigenvalue–eigenvector equation can now be written as
which is known as the spectral decomposition of M.
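As a quick numerical illustration (a minimal sketch using NumPy; the matrix M below is arbitrary and purely illustrative), the spectral decomposition of a symmetric matrix can be verified directly:

```python
import numpy as np

# An arbitrary symmetric matrix M (illustrative only).
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns the eigenvalues in ascending order and orthonormal eigenvectors.
eigvals, X = np.linalg.eigh(M)
D = np.diag(eigvals)

assert np.allclose(X.T @ X, np.eye(3))   # X is orthogonal
assert np.allclose(X @ D @ X.T, M)       # spectral decomposition M = X D X^T
```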
Definition 1
Let
be a blockwise diagonal matrix. We say that a matrix T sub-diagonalizes A if applying T to A produces a blockwise matrix whose diagonal blocks are all diagonal matrices, that is, T diagonalizes the matrices \(A_{11}, \ldots , A_{nn}\) in the diagonal of A.
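A minimal sketch of this notion (the blocks below are hypothetical; the sub-diagonalizing matrix is applied here as the orthogonal congruence \(TAT^\top \), which is how such transformations are used in the sequel, e.g. \(P_2\varGamma P_2^\top \)):

```python
import numpy as np

# Hypothetical symmetric diagonal blocks of the blockwise matrix A.
A11 = np.array([[2.0, 1.0],
                [1.0, 2.0]])
A22 = np.array([[3.0, 0.0, 1.0],
                [0.0, 2.0, 0.0],
                [1.0, 0.0, 3.0]])
A = np.block([[A11, np.zeros((2, 3))],
              [np.zeros((3, 2)), A22]])

# T stacks, block by block, the transposed eigenvector matrices of A11 and A22.
T1 = np.linalg.eigh(A11)[1].T
T2 = np.linalg.eigh(A22)[1].T
T = np.block([[T1, np.zeros((2, 3))],
              [np.zeros((3, 2)), T2]])

B = T @ A @ T.T  # each diagonal block T_i A_ii T_i^T is now diagonal
# Here B is even fully diagonal, because the off-diagonal blocks of A are zero.
assert np.allclose(B, np.diag(np.diag(B)))
```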
3 Inference
Variance components estimation in linear models (with mixed and/or fixed effects) has been widely investigated, and consequently several estimation methods with important properties have been derived. Some of these methods are summarized in Searle et al. [15].
In this section we will sub-diagonalize the variance-covariance matrix
in the Normal MLM
with \(\gamma _d > 0\), \(d=1,\ldots ,r\), unknown parameters, \(N_d =X_d X_d^\top \in \mathscr {S}^m\), \(X_d \in \mathscr {M}^{m\times s}\) known matrices, and \(N_{r+1}=I_m\), and develop optimal estimators for the variance components \(\gamma _1, \ldots , \gamma _{r+1}\).
Since the components we want to estimate depend only on the random effects part, it is of interest to remove the dependence of the distribution of z on the fixed effects part. With \(P_o = P_{R(X)}\) denoting the projection matrix onto the column space of the matrix X, so that \(I_m - P_o\) is the projection matrix onto its orthogonal complement, there is a matrix \(B_o\) whose columns are the eigenvectors associated with the null eigenvalues of \(P_o\) such that
Thus, instead of the model (1) we will approach the restricted model:
with \(M_d = B_o^\top N_d B_o\), \(n= m -r(P_o)\), and \(\mathbf{0 }_n\) denotes an \(n \times 1\) vector of zeros; that is, we will diagonalize the variance-covariance matrix
instead of V.
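A small sketch of this restriction step (the design matrix X below is hypothetical): \(B_o\) collects orthonormal eigenvectors of \(P_o\) associated with the null eigenvalue, so that \(B_o^\top X = 0\) and the fixed effects drop out of the restricted model.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 8
# Hypothetical fixed-effects design matrix X (intercept plus one covariate).
X = np.column_stack([np.ones(m), rng.standard_normal(m)])

Po = X @ np.linalg.pinv(X)                 # P_o: orthogonal projection onto R(X)
w, vecs = np.linalg.eigh(Po)
Bo = vecs[:, np.isclose(w, 0)]             # eigenvectors for the null eigenvalue of P_o

n = m - np.linalg.matrix_rank(Po)          # n = m - r(P_o)
assert Bo.shape == (m, n)
assert np.allclose(Bo.T @ X, 0)            # the fixed-effects part vanishes
assert np.allclose(Bo @ Bo.T, np.eye(m) - Po)  # B_o B_o^T = I_m - P_o
```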
3.1 The Case \(r=2\)
In this subsection we will sub-diagonalize the variance-covariance matrix in the MLM for \(r=2\) (recall the general model in (2)), that is
There exists (see Schott [12, Chap. 4, Sects. 3 and 4]) an orthogonal matrix \(P_1= \begin{bmatrix} A_{11} \\ \vdots \\ A_{1h_1} \end{bmatrix}\in \mathscr {M}^{\left( \sum _{i=1}^{h_1}g_i\right) \times n}\), with \(A_{1i} \in \mathscr {M}^{g_i\times n}\) (\(\sum _{i=1}^{h_1}g_i =n\)), such that \(M_1=P_1^\top D_1P_1\), or equivalently \(P_1M_1P_1^\top = D_1\), where
is a diagonal matrix whose diagonal entries \(\theta _{1i}\), \(i=1,\ldots , h_1\), are the eigenvalues of the matrix \(M_1\) with corresponding multiplicities \(g_i=r(A^{\top }_{1i})\), \(i=1,\ldots , h_1\). It must be noted that the columns of each matrix \(A_{1i}^\top \) form a set of \(g_i\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{1i}\) of the matrix \(M_1\) (Theorem 3.10. of Schott [12] guarantees the existence of such a matrix \(A_{1i}^\top \)), so that \(A_{1i}A_{1i}^\top = I_{g_i}\) and \(A_{1i}^\top A_{1i} = P_{R(A_{1i}^\top )}\). Hence \(P_1P_1^\top = I_n\), and
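This first-stage construction can be sketched as follows (the matrix M1 below is a hypothetical stand-in for \(B_o^\top N_1B_o\); eigenvalues are grouped by a rounding tolerance):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Z = rng.standard_normal((n, 2))
M1 = Z @ Z.T   # hypothetical M1 = B_o^T N_1 B_o; rank 2, so 0 is a repeated eigenvalue

w, V = np.linalg.eigh(M1)
distinct = np.unique(np.round(w, 8))               # the h_1 distinct eigenvalues theta_{1i}
A1 = [V[:, np.isclose(w, t)].T for t in distinct]  # blocks A_{1i}, each of size g_i x n
P1 = np.vstack(A1)                                 # stack the blocks to form P_1

assert np.allclose(P1 @ P1.T, np.eye(n))           # P_1 is orthogonal
D1 = P1 @ M1 @ P1.T
assert np.allclose(D1, np.diag(np.diag(D1)))       # P_1 M_1 P_1^T = D_1 is diagonal
```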
With
and \(cov(\nu )\) denoting the variance-covariance matrix of a random vector \(\nu \), we will have that
where
Of the three matrices \(D(\theta _1I_{g_1} \ldots \theta _{h1}I_{g_{h_1}})\), \(D(I_{g_1} \ldots I_{g_{h_1}})\), and \(\varGamma \) appearing in (7), the blockwise matrix \(\varGamma \) is clearly the only one which is not a diagonal matrix.
Next we diagonalize the symmetric matrices \(M_{ii}^2\), \(i=1,\ldots , h_1\), appearing in the diagonal of the matrix \(\varGamma \); i.e., we sub-diagonalize the matrix \(\varGamma \).
Since \(M_{ii}^2\) is symmetric there exists (see Schott [12, Chap. 4, Sects. 3 and 4]) an orthogonal matrix \(P_{2i}=\begin{bmatrix} A_{2i1} \\ \vdots \\ A_{2ih_{2i}} \end{bmatrix} \in \mathscr {M}^{\left( \sum _{j=1}^{h_{2i}}g_{ij}\right) \times g_i}\), where \(A_{2ij} \in \mathscr {M}^{g_{ij}\times g_i}\) (\(\sum _{j=1}^{h_{2i}}g_{ij}=g_i\)), such that
It must be noted that the columns of the matrix \(A^{\top }_{2ij}\), \(i=1, \ldots ,h_1\), \(j=1,\ldots , h_{2i}\), form a set of \(g_{ij} =r(A^{\top }_{2ij})\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{2ij}\) of the matrix \(M_{ii}^2\); that is, \(g_{ij}\) is the multiplicity of the eigenvalue \(\theta _{2ij}\), and \(A^\top _{2ij}A_{2ij} = P_{R\left( A^\top _{2ij}\right) }\) and \(A_{2ij}A^\top _{2ij} =I_{g_{ij}}\).
Thus, with
the new model \(w_2=P_2P_1y\) will have variance-covariance matrix
where
and, with \(i \ne s\),
The matrix \(D_{ii}^2= P_{2i}M_{ii}^2P_{2i}^\top \), \(i=1,\ldots , h_1\), appearing in the diagonal on the right-hand side of (9) is defined in (8).
Note that
The distribution of the sub-models
is summarized in the following result.
Proposition 1
where \(\lambda _{ij}= \gamma _1\theta _{1i} + \gamma _2\theta _{2ij} + \gamma _3\).
Proof
Recalling that \(A_{2ij}A_{1i} \in \mathscr {M}^{g_{ij}\times n}\) and \(g_{ij} \le n\), according to Moser [10, Theorem 2.1.2] we will have that
The portions \(\sum _{d=1}^2\gamma _dA_{2ij}A_{1i}M_dA_{1i}^\top A_{2ij}^\top \) and \(\gamma _3A_{2ij}A_{1i}A_{1i}^\top A_{2ij}^\top \) in the variance-covariance matrix yield:
and
which, clearly, completes the proof. \(\square \)
With \(\mathbf 0 \) denoting a null matrix of adequate dimensions and \(cov(\nu , \upsilon )\) denoting the cross-covariance between the random vectors \(\nu \) and \(\upsilon \), from (9) one may note that the cross-covariance matrix between the sub-models \(y_{ij}=A_{2ij}A_{1i}y\) and \(y_{sk}=A_{2sk}A_{1s}y\), \(i,s =1,\ldots ,h_1\), \(j,k=1,\ldots ,h_{2i}\), is given by
with \(i\le s\), \(j\le k\) (symmetry applies), so that, for \(i \ne s\), the sub-models \(y_{ij}\) and \(y_{sk}\) are correlated and for \(i=s\) they are not.
3.2 Estimation for \(r=2\)
From Sect. 3.1 we see that (with i and j replaced by \(i_1\) and \(i_2\), respectively, for convenience) \(w_2=P_2P_1y\) produces the following sub-models
of the model \(y \sim \mathscr {N}_n(\mathbf{0 }_n, \ \gamma _1M_1 + \gamma _2M_2 + \gamma _3I_n)\), where
An unbiased estimator of \(\lambda _{i_1i_2}\) for model (11), based on its maximum likelihood estimator \(\hat{\lambda }_{i_1i_2}\), is
Indeed (see Rencher and Schaalje [11, Theorem 5.2a]),
Thus
so that, with \(S = \begin{bmatrix}S_{11}^2 \\ \ldots \\ S_{1h_{21}}^2 \\ S_{21}^2 \\ \ldots \\ S_{2h_{22}}^2\\ \ldots \\ \ldots \\ S_{h_11}^2\\ \ldots \\ S_{h_1h_{2h_1}}^2\end{bmatrix}\), \(\varTheta = \begin{bmatrix} \theta _{11}&\theta _{211}&1 \\ \ldots&\ldots&\ldots \\ \theta _{11}&\theta _{21h_{21}}&1 \\ \theta _{12}&\theta _{221}&1 \\ \ldots&\ldots&\ldots \\ \theta _{12}&\theta _{22h_{22}}&1 \\ \ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_11}&1\\ \ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_1h_{2h_1}}&1 \end{bmatrix}\), and \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3 \end{bmatrix}\), we will have
Thus, for \(i_1=1,\ldots , h_1, \ i_2=1,\ldots , h_{2i_1}\), equating the variances \(\lambda _{i_1i_2}\) to the corresponding estimators \(S_{i_1i_2}^2\) yields the following system of equations:
which in matrix notation becomes
Since by construction \(\theta _{1i_1} \ne \theta _{1i_1^{'}}, \ i_1 \ne i_1^{'} = 1, \ldots , h_1\) (they are the distinct eigenvalues of \(M_1\)) and \( \theta _{2i_1i_2} \ne \theta _{2i_1i_2^{'}}, \ i_2 \ne i_2^{'} = 1, \ldots , h_{2i_1}\) (they are the distinct eigenvalues of \(M_{i_1i_1}^2 = A_{1i_1}M_2A_{1i_1}^\top \)), it is easily seen that the matrix \(\varTheta \) has full column rank; that is, \(r(\varTheta )=3\).
By Rencher and Schaalje [11, Theorem 2.6d] the matrix
is positive-definite, and by Rencher and Schaalje [11, Corollary 1], \(\varTheta ^\top \varTheta \) is non-singular; we denote its inverse by \((\varTheta ^\top \varTheta )^{-1}\).
Now, premultiplying both sides of the system (14) by \(\varTheta ^\top \), the resulting system of equations will be
whose unique solution (and therefore an estimator of \(\gamma \)) is
\(\hat{\gamma } = \begin{bmatrix} \hat{\gamma _1} \\ \hat{\gamma _2} \\ \hat{\gamma _3}\end{bmatrix}\) will be referred to as the Sub-D estimator and the underlying method as the Sub-D method.
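Putting the pieces of Sects. 3.1 and 3.2 together, the Sub-D computation for \(r=2\) can be sketched as follows (all matrices below are hypothetical, and eigenvalues are grouped by a rounding tolerance):

```python
import numpy as np

rng = np.random.default_rng(2)

def eig_blocks(M):
    # Group the orthonormal eigenvectors of a symmetric M by distinct eigenvalue.
    w, V = np.linalg.eigh(M)
    distinct = np.unique(np.round(w, 8))
    return distinct, [V[:, np.isclose(w, t)].T for t in distinct]

# Hypothetical reduced model: y ~ N(0, g1*M1 + g2*M2 + g3*I_n), with n = 12.
n = 12
Z1, Z2 = rng.standard_normal((n, 3)), rng.standard_normal((n, 4))
M1, M2 = Z1 @ Z1.T, Z2 @ Z2.T
gamma = np.array([2.0, 1.5, 1.0])
V = gamma[0] * M1 + gamma[1] * M2 + gamma[2] * np.eye(n)

y = np.linalg.cholesky(V) @ rng.standard_normal(n)  # one draw of the model

# Two-stage sub-diagonalization: P1 from M1, then P2i from M_{ii}^2 = A1i M2 A1i^T.
rows, s2 = [], []
th1, A1 = eig_blocks(M1)
for t1, A1i in zip(th1, A1):
    th2, A2 = eig_blocks(A1i @ M2 @ A1i.T)
    for t2, A2ij in zip(th2, A2):
        yij = A2ij @ A1i @ y                 # sub-model y_{ij}, of size g_{ij}
        s2.append(yij @ yij / len(yij))      # S^2_{ij}, unbiased for lambda_{ij}
        rows.append([t1, t2, 1.0])           # row (theta_{1i}, theta_{2ij}, 1) of Theta

Theta, S = np.array(rows), np.array(s2)
# Sub-D estimator: unique solution of Theta^T Theta gamma = Theta^T S.
gamma_hat = np.linalg.solve(Theta.T @ Theta, Theta.T @ S)
print(gamma_hat)  # unbiased for gamma, though a single draw is noisy
```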
Proposition 2
\(\hat{\gamma }\) is an unbiased estimator of \(\gamma \), with \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3\end{bmatrix}\).
Proof
Indeed, \(E(\hat{\gamma }) =E\left( (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\right) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top E(S) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top \varTheta \gamma = \gamma \). \(\square \)
Proposition 3
With \(i\le i^*\), \(j\le j^*\) (symmetry applies),
where \(\varOmega =\nabla _{ij}M_2\nabla _{i^*j^*}\), with \(\nabla _{ij} = \frac{A_{1i}^\top A_{2ij}^\top A_{2ij}A_{1i}}{g_{ij}}\).
Proof
We have that
For the case (a), that is \(i=i^*; j \ne j^*\), we have that
Therefore, (17)–(19) together with Schott [12, Theorem 1.3.(d)] prove case (a).
For case (c), that is \(i\ne i^*\), the desired result becomes clear if we use Theorem 1.3.(d) of Schott [12] and note that
Finally, for case (b), that is \(i=i^*; j = j^*\), recalling that \(y_{ij} \sim \mathscr {N}_n\left( \mathbf{0 }_{g_{ij}}, \ \lambda _{ij}I_{g_{ij}}\right) \), it holds that
and therefore the proof is complete. \(\square \)
The next result introduces the variance-covariance matrix of the sub-diagonalization estimator:
Proposition 4
In order to simplify the notation, let \(\varSigma _{S_{ij}S_{kl}}\) denote \(cov({S_{ij}^2,\ S_{kl}^2})\). Then,
where \(cov(S) = \begin{bmatrix} D_1&\varLambda _{12}&\varLambda _{13}&\ldots&\varLambda _{1h_1} \\ \varLambda _{21}&D_2&\varLambda _{23}&\ldots&\varLambda _{2h_1}\\ \varLambda _{31}&\varLambda _{32}&D_3&\ldots&\varLambda _{3h_1}\\ \vdots&\vdots&\vdots&\ddots&\vdots \\ \varLambda _{h_11}&\varLambda _{h_12}&\varLambda _{h_13}&\ldots&D_{h_1} \end{bmatrix}\), with \(D_i= 2\begin{bmatrix} \frac{\lambda _{i1}^2}{g_{i1}}&0&\ldots&0\\ 0&\frac{\lambda _{i2}^2}{g_{i2}}&\dots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\frac{\lambda _{ih_{2i}}^2}{g_{ih_{2i}}} \end{bmatrix}\) and \(\varLambda _{ks} = \begin{bmatrix} \varSigma _{S_{k1}S_{s1}}&\varSigma _{S_{k1}S_{s2}}&\ldots&\varSigma _{S_{k1}S_{sh_{2s}}} \\ \varSigma _{S_{k2}S_{s1}}&\varSigma _{S_{k2}S_{s2}}&\ldots&\varSigma _{S_{k2}S_{sh_{2s}}} \\ \vdots&\vdots&\ddots&\vdots \\ \varSigma _{S_{kh_{2k}}S_{s1}}&\varSigma _{S_{kh_{2k}}S_{s2}}&\ldots&\varSigma _{S_{kh_{2k}}S_{sh_{2s}}} \end{bmatrix}\).
Proof
The proof is a direct consequence of Proposition 3. \(\square \)
3.3 The General Case: \(r \ge 1\)
Now, without loss of generality, let us consider the general MLM in (2):
One may note that \(y= \sum _{d=1}^{r+1}B_o^\top X_d\beta _d\), where \( \beta _d \sim \mathscr {N}_{}(0, \ \gamma _dI)\), \(d=1,\ldots ,r\), \(\beta _{r+1} \sim \mathscr {N}_{}(0, \ \gamma _{r+1}I_n)\), and \(\beta _1, \ldots , \beta _{r+1}\) are uncorrelated.
With \(i_1=1,\ldots ,h_1\), \(i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}\), consider the finite sequence of r matrices \(P_1\), \(P_2\), ..., \(P_r\) defined as follows:
Thus, for \(r\ge 2\), the matrix \(P_{r}\) is given by (\(P_1\) is given in (22)):
where
Theorem 1
Let the matrices \(P_1, P_2, \ldots , P_r\) defined above be such that:
- (\(c_1\)): The columns of \(A_{1i_1}^\top \), \(i_1=1,\ldots ,h_1\), form a set of \(g_{i_1} = r(A_{1i_1}^\top )\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{1i_1}\) of the matrix \(M_1\) (\(\theta _{1i_1}\) has multiplicity \(g_{i_1}\));
- (\(c_2\)): The columns of \(A_{2i_1i_2}^\top \), \(i_2=1,\ldots ,h_{2,i_1}\), form a set of \(g_{i_1i_2} = r(A_{2i_1i_2}^\top )\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{2i_1i_2}\) of the matrix \(M_{i_1i_1}^2=A_{1i_1}M_2A_{1i_1}^\top \) (\(\theta _{2i_1i_2}\) has multiplicity \(g_{i_1i_2}\));
- (\(c_3\)): The columns of \(A_{3i_1i_2i_3}^\top \), \(i_3=1,\ldots ,h_{3,i_1,i_2}\), form a set of \(g_{i_1i_2i_3} = r(A_{3i_1i_2i_3}^\top )\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{3i_1i_2i_3}\) of the matrix
$$A_{2i_1i_2}M_{i_1i_1}^3A_{2i_1i_2}^\top =A_{2i_1i_2}A_{1i_1}M_3A_{1i_1}^\top A_{2i_1i_2}^\top $$
(\(\theta _{3i_1i_2i_3}\) has multiplicity \(g_{i_1i_2i_3}\));
............
- (\(c_r\)): The columns of \(A_{ri_1\ldots i_r}^\top \), \(i_r=1,\ldots ,h_{r,i_1,\ldots ,i_{r-1}}\), form a set of \(g_{i_1\ldots i_r} = r(A_{ri_1\ldots i_r}^\top )\) orthonormal eigenvectors associated with the eigenvalue \(\theta _{ri_1\ldots i_r}\) of the matrix
$$A_{(r-1)i_1\ldots i_{(r-1)}}\ldots A_{1i_1}M_rA_{1i_1}^\top \ldots A_{(r-1)i_1\ldots i_{(r-1)}}^\top $$
(\(\theta _{ri_1\ldots i_r}\) has multiplicity \(g_{i_1\ldots i_r}\)).
Then each matrix \(P_d\), \(d=1,\ldots , r\), in the finite sequence of matrices \(P_1, P_2,\ldots ,P_r\) will be an orthogonal matrix.
Proof
By the way \(P_d\) is defined (see (23)), since
and according to condition \(c_d\), we see that the matrices \(P_{di_1\dots i_{(d-1)}}\) are orthogonal. Thus, the desired result follows if we see that \(P_d^\top P_d\) will be a blockwise diagonal matrix whose diagonal entries are \(P_{di_1}^\top P_{di_1}\), \(i_1=1,\ldots ,h_1\). The diagonal entries \(P_{di_1}^\top P_{di_1}\) will be blockwise diagonal matrices whose diagonal entries will be \(P_{di_1i_2}^\top P_{di_1i_2}\), \(i_2=1,\ldots ,h_{2,i_1}\). Proceeding this way \(d-2\) times, we will find that the diagonal entries of the blockwise matrices \(P_{di_1\ldots i_{(d-2)}}^\top P_{di_1\ldots i_{(d-2)}}\), \(i_{(d-2)}=1,\ldots ,h_{(d-2),i_1,\ldots ,i_{d-3}}\), will be
reaching, therefore, the desired result. Proceeding in the same way, we would also see that \(P_{di_1\dots i_{(d-1)}}P_{di_1\dots i_{(d-1)}}^\top \) is a blockwise diagonal matrix whose diagonal entries are \(A_{di_1\ldots i_{(d-1)}j}A_{di_1\ldots i_{(d-1)}j}^\top \), \(j=1,\ldots , h_{d,i_1,\ldots ,i_{d-1}}\), so that \(P_dP_d^\top \) is an identity matrix. \(\square \)
The model \(w_r=P_r\ldots P_2P_1y\) produces the following sub-models:
We summarize the distribution of each sub-model \(y_{i_1\ldots i_r}\) in the following result.
Proposition 5
where \(\lambda _{i_1\ldots i_r}=\sum _{d=1}^r\gamma _d\theta _{di_1\ldots i_d} + \gamma _{r+1}\).
Proof
The proof follows the same lines as the proof of Proposition 1. \(\square \)
From the results on cross-covariance in the preceding sections, we easily conclude that the cross-covariance matrix between the sub-models \(y_{i_1\ldots i_r}\) and \(y_{i_1^*\ldots i_r^*}\), with \(i_1,i^*_1=1,\ldots ,h_1\); \(i_j,i_j^*=1,\ldots , h_{j,i_1,\ldots ,i_{j-1}}\), is given by
so that, for \(i_1 \ne i^*_1\), the sub-models \(y_{i_1\ldots i_r}\) and \(y_{i_1^*\ldots i_r^*}\) are correlated and for \(i_1=i^*_1\) they are not.
3.4 Estimation for the General Case: \(r \ge 1\)
Recall that, for the MLM in (2), \(P_r\ldots P_2P_1y\) produces the following sub-models
where
The matrices \(P_d\), \(d=1,\ldots , r\), are defined in Sect. 3.3.
An unbiased estimator of \(\lambda _{i_1i_2\ldots i_r}\) in the sub-model (24), based on its maximum likelihood estimator \(\hat{\lambda }_{i_1i_2\ldots i_r}\), is
Indeed (see Rencher and Schaalje [11, Theorem 5.2(a)], and the explanation for (12)),
For convenience, in what follows, instead of \(S_{i_1i_2\ldots i_r}^2\), we may sometimes use the notation \(S_{i_1i_2\ldots i_{(r-1)}i_r}^2\).
Thus
so that, with \(S = \begin{bmatrix}S_{11\ldots 11}^2 \\ S_{11\ldots 12}^2 \\ \ldots \\ S_{11\ldots 1h_{r,1,\ldots ,1}}^2\\ S_{11\ldots 21}^2 \\ \ldots \\ S_{11\ldots 2h_{r,1,\ldots ,2}}^2\\ \ldots \\ \ldots \\ \ldots \\ S_{h_11\ldots 11}^2\\ \ldots \\ \ldots \\ \ldots \\ S_{h_1h_{2,h_1}\ldots h_{r,h_1,\ldots ,h_{r-1}}}^2\end{bmatrix}\),
\(\varTheta = \begin{bmatrix} \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 11}&1 \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 12}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 1h_{r,1,\ldots ,1,h_{r-1}}}&1 \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 21}&1 \\ \ldots&\ldots&\ldots&\ldots&\dots&\ldots \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 2h_{r,1,\ldots ,2,h_{r-1}}}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_11}&\theta _{3h_111}&\ldots&\theta _{rh_11\ldots 11}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_1h_{2,h_1}}&\theta _{3h_1h_{2,h_1}h_{3,h_1,h_2}}&\ldots&\theta _{rh_1h_{2,h_1}\ldots h_{(r-1),h_1,\ldots ,h_{r-2}}h_{r,h_1,\ldots ,h_{r-1}}}&1 \end{bmatrix}\),
and \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3 \\ \ldots \\ \ldots \\ \gamma _{r} \\ \gamma _{(r+1)} \end{bmatrix}\), we will have
Thus, for \(i_1=1,\ldots ,h_1\), \(i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}\), \(j>1\), equating the variances \(\lambda _{i_1i_2\ldots i_{r}}\) to the corresponding estimators \(S_{i_1i_2\ldots i_{r}}^2\) yields the following system of equations (in matrix notation):
Since by construction \(\theta _{1i_1} \ne \theta _{1i_1^{'}}\) (they are the distinct eigenvalues of \(M_1\)), \(\theta _{2i_1i_2} \ne \theta _{2i_1i_2^{'}}\) (they are the distinct eigenvalues of \(M_{i_1i_1}^2 = A_{1i_1}M_2A_{1i_1}^\top \)), \(\theta _{3i_1i_2i_3} \ne \theta _{3i_1i_2i_3^{'}}\) (they are the distinct eigenvalues of \(A_{2i_1i_2}A_{1i_1}M_3A_{1i_1}^\top A_{2i_1i_2}^\top )\), ..., \(\theta _{ri_1i_2\ldots i_{(r-1)}i_r} \ne \theta _{ri_1i_2\ldots i_{(r-1)}i_r^{'}}\) (they are the distinct eigenvalues of \(A_{(r-1)i_1i_2\ldots i_{(r-1)}}\ldots A_{1i_1}M_rA_{1i_1}^\top \ldots A_{(r-1)i_1i_2\ldots i_{(r-1)}}^\top \)), where \(i_j \ne i_j^{'}, \ j=1,\ldots ,r\), it is easily seen that the matrix \(\varTheta \) is of full column rank; that is, \(r(\varTheta )=r +1\).
According to Theorem 2.6d of Rencher and Schaalje [11], with \(\sum \) denoting \( \sum _{i_1}^{h_1}\sum _{i_2}^{h_{2,i_1}}\ldots \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}\), the matrix
is positive-definite, and according to Corollary 1 of Rencher and Schaalje [11, p. 27], \(\varTheta ^\top \varTheta \) is non-singular; that is, it is invertible. We denote its inverse by \((\varTheta ^\top \varTheta )^{-1}\).
Now, premultiplying both sides of the system (27) by \(\varTheta ^\top \), the resulting system of equations will be
whose unique solution (and therefore an estimator of \(\gamma \)) will be the Sub-D estimator
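The general-\(r\) computation can be sketched recursively (all matrices below are hypothetical; eigenvalues are grouped by a rounding tolerance, and the nested sub-diagonalizations are carried out by composing the blocks \(A_{di_1\ldots i_d}\)):

```python
import numpy as np

def eig_blocks(M):
    # Group the orthonormal eigenvectors of a symmetric M by distinct eigenvalue.
    w, V = np.linalg.eigh(M)
    return [(t, V[:, np.isclose(w, t)].T) for t in np.unique(np.round(w, 8))]

def sub_d_general(y, Ms):
    """Sub-D for y ~ N(0, sum_d g_d * Ms[d] + g_{r+1} * I), via recursive sub-diagonalization."""
    rows, s2 = [], []

    def recurse(A, thetas, depth):
        # A maps y into the current block; thetas collects (theta_1i1, ..., theta_d...).
        if depth == len(Ms):
            z = A @ y
            s2.append(z @ z / len(z))             # S^2 for this innermost sub-model
            rows.append(list(thetas) + [1.0])     # corresponding row of Theta
            return
        for t, Anext in eig_blocks(A @ Ms[depth] @ A.T):
            recurse(Anext @ A, thetas + (t,), depth + 1)

    recurse(np.eye(len(y)), (), 0)
    Th = np.array(rows)
    return np.linalg.solve(Th.T @ Th, Th.T @ np.array(s2))

# Hypothetical r = 3 example.
rng = np.random.default_rng(4)
n = 14
Zs = [rng.standard_normal((n, k)) for k in (2, 3, 4)]
Ms = [Z @ Z.T for Z in Zs]
g = np.array([1.0, 0.5, 2.0, 1.0])
V = sum(gi * Mi for gi, Mi in zip(g, Ms)) + g[3] * np.eye(n)
y = np.linalg.cholesky(V) @ rng.standard_normal(n)
gamma_hat = sub_d_general(y, Ms)
print(gamma_hat)  # single-draw estimate of (g1, g2, g3, g4)
```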
Proposition 6
\(\hat{\gamma } =(\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\) is an unbiased estimator of
Indeed, \(E(\hat{\gamma }) =E\left( (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\right) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top E(S) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top \varTheta \gamma = \gamma \). \(\square \)
4 Numerical Results
In this section we carry out numerical tests of the sub-diagonalization method for the case \(r=2\), that is, for a model with three variance components. For this case we pick the particular model \(z \sim \mathscr {N}_{21}\left( X\beta , \gamma _1N_1 + \gamma _2N_2 + \gamma _3I_{21}\right) \), where \(N_j = X_jX_j^\top \), \(j=1,2\), with design matrices
and \(X=1_{21}\), where \(1_k\) and \(0_k\) denote, respectively, \(k \times 1\) vectors of ones and zeros.
Let \(B_o\) be a matrix whose columns are the eigenvectors associated with the null eigenvalues of \(\frac{1}{21}J_{21}\). Then \(B_oB_o^\top = I_{21} - \frac{1}{21}J_{21}\) and \(B_o^\top B_o = I_{20}\), and so the new model will be
where \(M_d= B_o^\top N_dB_o\).
Since \(r(N_1)=3\), we have (see Schott [12, Theorem 2.10c]) that \(r(M_1)=\) \(r(B_o^\top N_1B_o)=3\). The eigenvalues of \(M_1\) are \(\theta _{11}=7.979829\), \(\theta _{12}= 5.639219\), and \(\theta _{13}=0\) (\(\theta _{13}\) with multiplicity 18). Thus \(M_{11}^2 = A_{11}M_2A_{11}^\top = 5.673759\) and \(M_{22}^2 = A_{12}M_2A_{12}^\top = 0.6246537\) will be \(1 \times 1\) matrices, and \(M_{33}^2 = A_{13}M_2A_{13}^\top \) an \(18 \times 18\) matrix.
We have the following: \(M_{11}^2\) has eigenvalue \(\theta _{211} = 5.673759\); \(M_{22}^2\) has eigenvalue \( \theta _{221} = 0.6246537\); \(M_{33}^2\) has three eigenvalues: \(\theta _{231}=6.390202\), \(\theta _{232}=1.216148\), and \(\theta _{233}= 0\) (\(\theta _{233}\) with multiplicity 16).
Finally we found that
and \(\varTheta =\begin{bmatrix} 7.979829&5.6737590&1\\ 5.639219&0.6246537&1\\ 0&6.3902016&1 \\ 0&1.2161476&1 \\ 0&0&1 \end{bmatrix}\).
With \(\beta _k \sim \mathscr {N}_{3}\left( \mathbf{0 }_{3}, \gamma _kI_{3}\right) \), \(k=1,2\), and \(e \sim \mathscr {N}_{21}(\mathbf{0 }_{21}, \gamma _3I_{21})\), and taking \(\gamma _3=1\), the model can be rewritten as \(y = B_o^\top X_1\beta _1 + B_o^\top X_2\beta _2 + B_o^\top e\).
We consider \(\gamma _1\) and \(\gamma _2\) taking values in \(\{0.1, 0.25, 0.5, 0.75, 1, 2, 5, 10\}\). For each possible combination of \(\gamma _1\) and \(\gamma _2\), the model y is observed 1000 times, and the sub-diagonalization method is applied to estimate the variance components for each observed y. Tables 1 and 3 present the averages of the estimated values of \(\gamma _1\) and \(\gamma _2\), respectively. In order to compare the performance of the sub-diagonalization method with that of REML, the REML method is applied to the same 1000 observations of y, with the results presented in Tables 2 and 4.
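A Monte Carlo sketch of this kind of experiment (with small hypothetical design matrices in place of the paper's \(X_1\) and \(X_2\), which are not reproduced here; averaging the Sub-D estimates over many replicates should approach the true \(\gamma \)):

```python
import numpy as np

rng = np.random.default_rng(3)

def eig_blocks(M):
    # Group the orthonormal eigenvectors of a symmetric M by distinct eigenvalue.
    w, V = np.linalg.eigh(M)
    return [(t, V[:, np.isclose(w, t)].T) for t in np.unique(np.round(w, 8))]

def sub_d(y, M1, M2):
    # Sub-D estimate of (g1, g2, g3) for y ~ N(0, g1*M1 + g2*M2 + g3*I).
    rows, s2 = [], []
    for t1, A1i in eig_blocks(M1):
        for t2, A2ij in eig_blocks(A1i @ M2 @ A1i.T):
            yij = A2ij @ A1i @ y
            s2.append(yij @ yij / len(yij))
            rows.append([t1, t2, 1.0])
    Th = np.array(rows)
    return np.linalg.solve(Th.T @ Th, Th.T @ np.array(s2))

m, q1, q2 = 15, 3, 4
X1, X2 = rng.standard_normal((m, q1)), rng.standard_normal((m, q2))  # hypothetical designs
g = np.array([0.5, 2.0, 1.0])  # true variance components

# Remove the fixed effect X = 1_m, as in the paper's example: Bo Bo^T = I - J/m.
Bo = np.linalg.eigh(np.eye(m) - np.ones((m, m)) / m)[1][:, 1:]
M1, M2 = Bo.T @ X1 @ X1.T @ Bo, Bo.T @ X2 @ X2.T @ Bo

reps = 1000
avg = np.zeros(3)
for _ in range(reps):
    y = Bo.T @ (X1 @ rng.normal(0, np.sqrt(g[0]), q1)
                + X2 @ rng.normal(0, np.sqrt(g[1]), q2)
                + rng.normal(0, np.sqrt(g[2]), m))
    avg += sub_d(y, M1, M2)
avg /= reps
print(avg)  # the averages should be close to g, since Sub-D is unbiased
```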
Comparing the average estimated values from the sub-diagonalization method with those of the REML method (see Tables 1, 2, 3, and 4), the reader may easily conclude that the results provided by the sub-diagonalization method are in general slightly more realistic. On the other hand, the variability of the sub-diagonalization averages is relatively higher than that of the REML method (see Tables 5, 6, 7, and 8); this is because of the correlation between the sub-models. This gap will be addressed in future work.
5 Concluding Remarks
Besides its simple and fast computational implementation, since it depends only on the information retained in the eigenvalues of the design matrices and on the quadratic errors of the model, Sub-D provides centered estimates for both balanced and unbalanced designs, which is not the case for estimators based on ANOVA methods. As seen in Sect. 4, Sub-D provides slightly more realistic estimates than the REML estimator, but with more variability (when the model is balanced the two have comparable variability). Moreover, whenever we intend to share code, create a package, or use it repeatedly, efficiency must be considered, and code run-time is a good starting point. To compute the estimates and the corresponding variances for each pair \(\gamma _1\) and \(\gamma _2\) taking values in \(\{0.25,\ 0.5,\ 1,\ 2,\ 5,\ 10\}\), for 1000 observations of the model, we found that the Sub-D run-time is about 0.25 s while the REML run-time is about 35.53 s, which means that the code for Sub-D is more than 70 times faster than the one for REML. The code was run using the R software.
The slightly higher variability of Sub-D compared to the REML estimator seems to be due to the correlation between the sub-models (for the case of models with three variance components, for example) \(y_{ij}\), \(i=1,\ldots ,h_1\), \(j=1, \ldots , h_{2i}\). From (10) we see that the variance-covariance matrix of the model \(w_2 = P_2P_1y\) is a blockwise matrix whose diagonal blocks are \(D_1\), \(\ldots \), \(D_{h_1}\), where \(D_i = diag(\lambda _{i1},\ldots ,\lambda _{ih_{2i}})\), corresponding to \(cov(y_{ij},y_{sk})\) for \(i=s\), \(j=k\), while the off-diagonal blocks are the non-null matrices \(\gamma _2A_{2ij}W_{is}^2A_{2sk}^\top \), corresponding to \(cov(y_{ij},y_{sk})\) for \(i\ne s\). This problem will be handled in future work, in which confidence regions will be obtained and hypothesis tests for the variance components will be derived.
References
Anderson, R.L.: Use of variance component analysis in the interpretation of biological experiments. Bull. Int. Stat. Inst. 37, 71–90 (1960)
Anderson, R.L.: Designs and estimators for variance components. Statistical Design and Linear Model, pp. 1–30. North-Holland, Amsterdam (1975)
Anderson, R.L.: Recent developments in designs and estimators for variance components. Statistics and Related Topics, pp. 3–22. North-Holland, Amsterdam (1981)
Anderson, R.L., Crump, P.P.: Comparisons of designs and estimation procedures for estimating parameters in a two-stage nested process. Technometrics 9, 499–516 (1967)
Casella, G., Berger, R.L.: Statistical Inference. Duxbury, Pacific Grove (2002)
Hartley, H.O., Rao, J.K.: Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93–108 (1967)
Khuri, A.I.: Design for variance components estimation: past and present. Int. Stat. Rev. 68, 311–322 (2000)
Khuri, A.I., Sahai, H.: Variance components analysis: a selective literature survey. Int. Stat. Rev. 53, 279–300 (1985)
Miller, J.J.: Asymptotic properties and computation of maximum likelihood estimates in the mixed model of the analysis of variance. Technical report 12, Department of Statistics, Stanford University, Stanford, California (1973)
Moser, B.: Linear Models: A Mean Model Approach. Elsevier, New York (1996)
Rencher, A.C., Schaalje, G.B.: Linear Models in Statistics. Wiley, New York (2008)
Schott, J.R.: Matrix Analysis for Statistics. Wiley, New York (1997)
Searle, S.: Topics in variance component estimation. Biometrics 27, 1–76 (1971)
Searle, S.: An overview of variance component estimation. Metrika 42, 215–230 (1995)
Searle, S., Casella, G., McCulloch, C.: Variance Components. Wiley, New York (2009)
Acknowledgements
This work was partially supported by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through PEst-OE/MAT/UI0297/2011 (CMA), and by the Fundação Calouste Gulbenkian through a PhD grant. It was also partially supported by the Universidade de Cabo Verde.
© 2017 Springer International Publishing AG
Silva, A., Fonseca, M., Mexia, J. (2017). Variance Components Estimation in Mixed Linear Model—The Sub-diagonalization Method. In: Bebiano, N. (eds) Applied and Computational Matrix Analysis. MAT-TRIAD 2015. Springer Proceedings in Mathematics & Statistics, vol 192. Springer, Cham. https://doi.org/10.1007/978-3-319-49984-0_21