1 Introduction

Mixed linear models (MLM) arise from the need to assess the amount of variation caused by certain sources in statistical designs with fixed effects (see Khuri [7]), for example, sources of variation that are not controlled by the experimenters and those whose levels are selected at random. The variances of such sources of variation, currently referred to as variance components, have been widely investigated in the second half of the last century (see Khuri and Sahai [8], Searle [13, 14], among others), and during the period from roughly the early 1960s to 1990, driven by the proliferation of research on genetics and animal breeding as well as on industrial quality control and improvement (for more details, see Anderson [1,2,3], Anderson and Crump [4], Searle [13], among others), several estimation techniques were proposed. Among those techniques we highlight the ANOVA and the maximum likelihood-based methods (see, for example, Searle et al. [15] and Casella and Berger [5]). Nevertheless, although the ANOVA method adapts readily to mixed models with balanced data and preserves unbiasedness, it does not adapt well to situations with unbalanced data (mostly because it uses computations derived from fixed effects models rather than mixed models). In turn, the maximum likelihood-based methods, notably ML and restricted ML (REML), provide estimators with several optimal statistical properties, such as consistency and asymptotic normality, both for models with balanced data and for those with unbalanced data. For these optimal properties we refer the reader to Miller [9], and for details on applications of such methods see, for example, Anderson [2] and Hartley and Rao [6].

This paper is organized as follows. In Sect. 2 we review some needed notions and results on matrix theory, mainly on matrix diagonalization. A new method to estimate the variance components in the MLM is presented in Sect. 3, and numerical results assessing its performance are given in Sect. 4.

2 Notation and Basic Concepts on Matrix Theory

In this section we summarize a few needed notions and results on matrix diagonalization. The proofs for the results can be found in Schott [12].

Let \(\mathscr {M}^{n\times m}\) and \(\mathscr {S}^n = \{A: A\in \mathscr {M}^{n\times n}, A=A^\top \}\) stand for the set of matrices with n rows and m columns and the set of \(n \times n\) symmetric matrices, respectively. The range and the rank of a matrix A will be denoted by R(A) and r(A), respectively, and the projection matrix onto the range space of A by \(P_{R(A)}\) (see Schott [12, Chap. 2, Sect. 7] for the notion of projection matrix). We will denote by tr(A) the trace of A.

If the eigenvalues \(\lambda _1,\ldots ,\lambda _r\) of the matrix \(M \in \mathscr {M}^{r\times r}\) are all distinct, it follows from Theorem 3.6 of Schott [12] that the matrix X, whose columns are the eigenvectors associated to those eigenvalues, is non-singular. Thus, by the eigenvalue-eigenvector equation \(MX =XD\) or, equivalently, \(X^{-1}MX=D\), with \(D=diag(\lambda _1 \ldots \lambda _r)\), and Theorem 3.2.(d) of Schott [12], the eigenvalues of D are the same as those of M. Moreover, since M can be transformed into a diagonal matrix by postmultiplication by the non-singular matrix X and premultiplication by its inverse \(X^{-1}\), it is said to be diagonalizable.

If the matrix M is symmetric, the eigenvectors associated to its different eigenvalues will be orthogonal (see Schott [12]). Indeed, if we consider two different eigenvalues \(\lambda _i\) and \(\lambda _j\) whose associated eigenvectors are \(\mathbf {x}_i\) and \(\mathbf {x}_j\), respectively, we see that, since M is symmetric,

$$ \lambda _i\mathbf {x}_i^\top \mathbf {x}_j = (M\mathbf {x}_i)^\top \mathbf {x}_j = \mathbf {x}_i^\top (M\mathbf {x}_j)= \lambda _j\mathbf {x}_i^\top \mathbf {x}_j.$$

So, since \(\lambda _i \ne \lambda _j\), we must have \(\mathbf {x}_i^\top \mathbf {x}_j = 0\).

According to Theorem 3.10 of Schott [12], without loss of generality, the columns of the matrix X can be taken to be orthonormal, so that X is an orthogonal matrix. Thus, the eigenvalue-eigenvector equation can now be written as

$$X^\top MX=D\ \text{ or, } \text{ equivalently, } \ M=XDX^\top ,$$

which is known as the spectral decomposition of M.
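As a quick numerical illustration, the spectral decomposition can be obtained with any standard eigen-solver; the following minimal R sketch (with an arbitrary symmetric matrix, not one from the paper) checks that \(M=XDX^\top \).

# Minimal R sketch of the spectral decomposition M = X D X^t of a symmetric
# matrix; the matrix below is an arbitrary illustrative example.
M <- matrix(c(4, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)
eig <- eigen(M, symmetric = TRUE)  # orthonormal eigenvectors and eigenvalues
X <- eig$vectors                   # orthogonal: t(X) %*% X = I
D <- diag(eig$values)              # D = diag(lambda_1, ..., lambda_r)
max(abs(M - X %*% D %*% t(X)))     # ~ 0, i.e. M = X D X^t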

Definition 1

Let

$$A = \begin{bmatrix} A_{11}&\ldots&A_{1n}\\ \vdots&\ddots&\vdots \\ A_{n1}&\ldots&A_{nn} \end{bmatrix}$$

be a blockwise matrix. We say that a matrix T sub-diagonalizes A if \(TAT^\top \) is a blockwise matrix whose diagonal blocks are all diagonal matrices, that is, T diagonalizes the matrices \(A_{11}, \ldots , A_{nn}\) in the diagonal of A.
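A small R sketch of this definition, with hypothetical \(2\times 2\) blocks, follows: T is block-diagonal, each block being the transposed eigenvector matrix of the corresponding \(A_{ii}\), so that the diagonal blocks of \(TAT^\top \) come out diagonal (the off-diagonal blocks need not).

# Sub-diagonalization of a block matrix A with symmetric diagonal blocks:
# T is block-diagonal, built from the eigenvectors of A11 and A22.
A11 <- matrix(c(2, 1, 1, 2), 2, 2)
A22 <- matrix(c(5, 2, 2, 3), 2, 2)
A12 <- matrix(c(0.3, -0.1, 0.2, 0.4), 2, 2)   # arbitrary off-diagonal block
A   <- rbind(cbind(A11, A12), cbind(t(A12), A22))
T1  <- t(eigen(A11, symmetric = TRUE)$vectors)
T2  <- t(eigen(A22, symmetric = TRUE)$vectors)
Tm  <- rbind(cbind(T1, matrix(0, 2, 2)), cbind(matrix(0, 2, 2), T2))
B   <- Tm %*% A %*% t(Tm)
round(B[1:2, 1:2], 10)                        # diagonal block, now diagonal
round(B[3:4, 3:4], 10)                        # diagonal block, now diagonal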

3 Inference

Variance components estimation in linear models (with mixed and/or fixed effects) has been widely investigated and, consequently, several estimation methods with important properties have been derived. Some of these methods are summarized in Searle et al. [15].

In this section we will sub-diagonalize the variance-covariance matrix

$$V= \sum _{d=1}^{r+1}\gamma _d N_d $$

in the Normal MLM

$$\begin{aligned} z \sim \mathscr {N}_m\left( X\beta ,\ V\right) , \end{aligned}$$
(1)

with \(\gamma _d > 0\), \(d=1,\ldots ,r\), unknown parameters, \(N_d =X_d X_d^\top \in \mathscr {S}^m\), \(X_d \in \mathscr {M}^{m\times s}\) known matrices, and \(N_{r+1}=I_m\), and develop optimal estimators for the variance components \(\gamma _1, \ldots , \gamma _{r+1}\).

Since the components we want to estimate depend only on the random effects part, it is of interest to remove the dependence of the distribution of z on the fixed effects part. With \(P_o = P_{R(X)}\) denoting the projection matrix onto the column space of the matrix X, so that \(I_m - P_o\) is the projection matrix onto its orthogonal complement, there is a matrix \(B_o\), whose columns are the eigenvectors associated to the null eigenvalues of \(P_o\), such that

$$B_o^\top B_o =I_{m - r(P_o)}\ \ \text{ and } \ \ B_oB_o^\top = I_m - P_o.$$

Thus, instead of the model (1) we will approach the restricted model:

$$\begin{aligned} y = B_o^\top z \sim \mathscr {N}_n\left( \mathbf{0 }_n,\ \sum _{d=1}^{r+1}\gamma _{d} M_{d}\right) , \end{aligned}$$
(2)

with \(M_d = B_o^\top N_d B_o\), \(n= m -r(P_o)\), and \(\mathbf{0 }_n\) denotes an \(n \times 1\) vector of zeros; that is, we will diagonalize the variance-covariance matrix

$$V^* = \sum _{d=1}^{r+1}\gamma _{d} M_{d}$$

instead of V.
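For concreteness, this reduction can be carried out numerically; the R sketch below uses hypothetical design matrices (not those of Sect. 4) and builds \(B_o\) from the eigenvectors associated to the null eigenvalues of \(P_o\).

# A sketch of the reduction from model (1) to the restricted model (2), with
# hypothetical design matrices; B_o spans the orthogonal complement of R(X).
m  <- 12
X  <- matrix(1, m, 1)                      # fixed-effects design (overall mean)
X1 <- kronecker(diag(3), rep(1, 4))        # hypothetical random-effects designs
X2 <- kronecker(rep(1, 3), diag(4))
N1 <- X1 %*% t(X1); N2 <- X2 %*% t(X2)     # N_d = X_d X_d^t
Po <- X %*% solve(crossprod(X), t(X))      # P_o = projection onto R(X)
eo <- eigen(diag(m) - Po, symmetric = TRUE)
Bo <- eo$vectors[, eo$values > 0.5]        # eigenvectors of the null eigenvalues of P_o
M1 <- t(Bo) %*% N1 %*% Bo                  # M_d = B_o^t N_d B_o
M2 <- t(Bo) %*% N2 %*% Bo
max(abs(Bo %*% t(Bo) - (diag(m) - Po)))    # ~ 0; and t(Bo) %*% Bo ~ I_{m - r(P_o)}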

3.1 The Case \(r=2\)

In this subsection we will sub-diagonalize the variance-covariance matrix in the MLM for \(r=2\) (recall the general model in (2)), that is

$$\begin{aligned} y \sim \mathscr {N}_n\left( \mathbf{0 }_n,\ \gamma _1M_1 + \gamma _2 M_2 + \gamma _3 I_n\right) . \end{aligned}$$
(3)

There exists (see Schott [12, Chap. 4, Sects. 3 and 4]) an orthogonal matrix \(P_1= \begin{bmatrix} A_{11} \\ \vdots \\ A_{1h_1} \end{bmatrix}\in \mathscr {M}^{\left( \sum _{i=1}^{h_1}g_i\right) \times n}\), with \(A_{1i} \in \mathscr {M}^{g_i\times n}\) (\(\sum _{i=1}^{h_1}g_i =n\)), such that \(M_1=P_1^\top D_1P_1\), or equivalently \(P_1M_1P_1^\top = D_1\), where

$$\begin{aligned} D_1 = \begin{bmatrix} \theta _{11}I_{g_1}&0&\ldots&0 \\ 0&\theta _{12}I_{g_2}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\theta _{1h_1}I_{g_{h_1}} \end{bmatrix} \end{aligned}$$
(4)

is a diagonal matrix whose diagonal entries \(\theta _{1i}\), \(i=1,\ldots , h_1\), are the eigenvalues of the matrix \(M_1\) with corresponding multiplicities (roots) \(g_i=r(A^{\top }_{1i})\), \(i=1,\ldots , h_1\). It must be noted that the columns of each matrix \(A_{1i}^\top \) form a set of \(g_i\) orthonormal eigenvectors associated to the eigenvalue \(\theta _{1i}\) of the matrix \(M_1\) (Theorem 3.10 of Schott [12] guarantees the existence of such a matrix \(A_{1i}^\top \)), so that \(A_{1i}A_{1i}^\top = I_{g_i}\) and \(A_{1i}^\top A_{1i} = P_{R(A_{1i}^\top )}\). Hence \(P_1P_1^\top = I_n\), and

$$\begin{aligned} P_1^\top P_1= & {} A_{11}^\top A_{11} + \cdots + A_{1h_1}^\top A_{1h_1}\nonumber \\= & {} P_{R(A_{11}^\top )} + \cdots + P_{R(A_{1h_1}^\top )}\nonumber \\= & {} I_n. \end{aligned}$$
(5)
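Continuing the previous sketch, the blocks \(A_{1i}\), and hence \(P_1\), can be obtained by grouping the orthonormal eigenvectors of \(M_1\) by distinct eigenvalue; the rounding tolerance used for the grouping is an implementation choice, not part of the method.

# Continuing the previous sketch: build the blocks A_{1i} of P_1 by grouping
# the eigenvectors of M1 by distinct eigenvalue.
e1     <- eigen(M1, symmetric = TRUE)
vals1  <- round(e1$values, 8)                            # numerical tolerance choice
theta1 <- unique(vals1)                                  # distinct eigenvalues theta_{1i}
A1     <- lapply(theta1, function(th)
            t(e1$vectors[, vals1 == th, drop = FALSE]))  # A_{1i}: g_i x n
P1     <- do.call(rbind, A1)
D1     <- diag(rep(theta1, times = sapply(A1, nrow)))
max(abs(P1 %*% M1 %*% t(P1) - D1))                       # ~ 0, cf. (4)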

With

$$\begin{aligned} A_{1i} M_2A_{1s}^\top =\left\{ \begin{array}{cc} M_{ii}^2 &{}\quad i=s\\ W_{is}^2 &{}\quad i\ne s \end{array} \right. \end{aligned}$$
(6)

and \(cov(\nu )\) denoting the variance-covariance matrix of a random vector \(\nu \), we will have that

$$\begin{aligned} cov(P_1y)= & {} \gamma _1P_1M_1P_1^\top + \gamma _2P_1M_2P_1^\top + \gamma _3 P_1P_1^\top \nonumber \\= & {} \gamma _1 \begin{bmatrix} \theta _{11}I_{g_1}&0&\ldots&0 \\ 0&\theta _{12}I_{g_2}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\theta _{1h_1}I_{g_{h_1}} \end{bmatrix} + \gamma _2 \begin{bmatrix} M_{11}^2&W_{12}^2&\ldots&W_{1h_1}^2 \\ W_{21}^2&M_{22}^2&\ldots&W_{2h_1}^2 \\ \vdots&\vdots&\ddots&\vdots \\ W_{h_11}^2&W_{h_12}^2&\ldots&M_{h_1h_1}^2 \end{bmatrix} \nonumber \\&+\, \gamma _3 \begin{bmatrix} I_{g_1}&0&\ldots&0 \\ 0&I_{g_2}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&I_{g_{h_1}} \end{bmatrix} \nonumber \\= & {} \gamma _1 D(\theta _1I_{g_1} \ldots \theta _{h1}I_{g_{h_1}}) + \gamma _2\varGamma + \gamma _3D(I_{g_1} \ldots I_{g_{h_1}}), \end{aligned}$$
(7)

where

$$\varGamma = \begin{bmatrix}M_{11}^2&W_{12}^2&\ldots&W_{1h_1}^2 \\ W_{21}^2&M_{22}^2&\ldots&W_{2h_1}^2 \\ \vdots&\vdots&\ddots&\vdots \\ W_{h_11}^2&W_{h_12}^2&\ldots&M_{h_1h_1}^2 \end{bmatrix}.$$

It is clear that, of the three matrices \(D(\theta _1I_{g_1} \ldots \theta _{h1}I_{g_{h_1}})\), \(D(I_{g_1} \ldots I_{g_{h_1}})\) and \(\varGamma \) appearing in (7), the blockwise matrix \(\varGamma \) is the only one which is not diagonal.

Next we diagonalize the symmetric matrices \(M_{ii}^2\), \(i=1,\ldots , h_1\), that appear in the diagonal of the matrix \(\varGamma \); i.e., we sub-diagonalize the matrix \(\varGamma \).

Since \(M_{ii}^2\) is symmetric there exists (see Schott [12, Chap. 4, Sects. 3 and 4]) an orthogonal matrix \(P_{2i}=\begin{bmatrix} A_{2i1} \\ \vdots \\ A_{2ih_{2i}} \end{bmatrix} \in \mathscr {M}^{\left( \sum _{j=1}^{h_{2i}}g_{ij}\right) \times g_i}\), where \(A_{2ij} \in \mathscr {M}^{g_{ij}\times g_i}\) (\(\sum _{j=1}^{h_{2i}}g_{ij}=g_i\)), such that

$$\begin{aligned} D_{ii}^2 = P_{2i}M_{ii}^2P_{2i}^\top = \begin{bmatrix} \theta _{2i1}I_{g_{i1}}&0&\ldots&0 \\ 0&\theta _{2i2}I_{g_{i2}}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\theta _{2ih_{2i}}I_{g_{ih_{2i}}} \end{bmatrix}, \ i=1,\ldots , h_1.\end{aligned}$$
(8)

It must be noted that the columns of the matrix \(A^{\top }_{2ij}\), \(i=1, \ldots ,h_1\), \(j=1,\ldots , h_{2i}\), form a set of \(g_{ij} =r(A^{\top }_{2ij})\) orthonormal eigenvectors associated to the eigenvalue \(\theta _{2ij}\) of the matrix \(M_{ii}^2\); that is, \(g_{ij}\) is the multiplicity of the eigenvalue \(\theta _{2ij}\), and \(A^T_{2ij}A_{2ij} = P_{R\left( A^T_{2ij}\right) }\) and \(A_{2ij}A^T_{2ij} =I_{g_{ij}}\).

Thus, with

$$P_2 =\begin{bmatrix} P_{21}&0&\ldots&0 \\ 0&P_{22}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{2h_1} \end{bmatrix}\ \in \mathscr {M}^{\left( \sum _{i=1}^{h_1}\sum _{j=1}^{h_{2i}}g_{ij}\right) \times \left( \sum _{i=1}^{h_1}g_i\right) }, $$

the new model \(w_2=P_2P_1y\) will have variance-covariance matrix

$$\begin{aligned} cov(w_2)=\varSigma (P_2P_1y)= & {} \gamma _1P_2D(\theta _{11}I_{g_1} \ldots \theta _{1h1}I_{g_{h_1}})P_2^\top + \gamma _2P_2\varGamma P_2^\top + \gamma _3 P_2D(I_{g_1} \ldots I_{g_{h_1}})P_2^\top \nonumber \\= & {} \gamma _1 \begin{bmatrix} \theta _{11}P_{21}P_{21}^\top&0&\ldots&0 \\ 0&\theta _{12}P_{22}P_{22}^\top&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\theta _{1h_1}P_{2h_1}P_{2h_1}^\top \end{bmatrix} \nonumber \\&+\, \gamma _2 \begin{bmatrix} D_{11}^2&P_{21}W_{12}^2P_{22}^\top&\ldots&P_{21}W_{1h_1}^2P_{2h_1}^\top \\ P_{22}W_{21}^2P_{21}^\top&D_{22}^2&\ldots&P_{22}W_{2h_1}^2P_{2h_1}^\top \\ \vdots&\vdots&\ddots&\vdots \\ P_{2h_1}W_{h_11}^2P_{21}^\top&P_{2h_1}W_{h_12}^2P_{22}^\top&\ldots&D_{h_1h_1}^2 \end{bmatrix} \nonumber \\&+\, \gamma _3 \begin{bmatrix} P_{21}P_{21}^\top&0&\ldots&0 \\ 0&P_{22}P_{22}^\top&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{2h_1}P_{2h_1}^\top \end{bmatrix}, \end{aligned}$$
(9)

where

$$\begin{aligned} P_{2i}P_{2i}^\top = \begin{bmatrix} A_{2i1}A_{2i1}^\top&0&\ldots&0 \\ 0&A_{2i2}A_{2i2}^\top&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&A_{2ih_{2i}}A_{2ih_{2i}}^\top \end{bmatrix} = \begin{bmatrix} I_{g_{i1}}&0&\ldots&0 \\ 0&I_{g_{i2}}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&I_{g_{ih_{2i}}} \end{bmatrix}, \end{aligned}$$

and, with \(i \ne s\),

$$\begin{aligned} P_{2i}W_{is}^2P_{2s}^\top = \begin{bmatrix} A_{2i1}W_{is}^2A_{2s1}^\top&A_{2i1}W_{is}^2A_{2s2}^\top&\ldots&A_{2i1}W_{is}^2A_{2sh_{2s}}^\top \\ A_{2i2}W_{is}^2A_{2s1}^\top&A_{2i2}W_{is}^2A_{2s2}^\top&\ldots&A_{2i2}W_{is}^2A_{2sh_{2s}}^\top \\ \vdots&\vdots&\ddots&\vdots \\ A_{2ih_{2i}}W_{is}^2A_{2s1}^\top&A_{2ih_{2i}}W_{is}^2A_{2s2}^\top&\ldots&A_{2ih_{2i}}W_{is}^2A_{2sh_{2s}}^\top \end{bmatrix}. \nonumber \end{aligned}$$

The matrices \(D_{ii}^2= P_{2i}M_{ii}^2P_{2i}^\top \), \(i=1,\ldots , h_1\), appearing in the diagonal on the right-hand side of (9) are defined in (8).
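Continuing the previous sketches (which provide the hypothetical objects M1, M2 and A1), the second-stage blocks \(A_{2ij}\) are obtained in the same way from each \(M_{ii}^2 = A_{1i}M_2A_{1i}^\top \), and the sub-models \(y_{ij}=A_{2ij}A_{1i}y\) follow.

# Continuing the previous sketches: the blocks A_{2ij} group the eigenvectors
# of M_{ii}^2 = A_{1i} M2 A_{1i}^t by distinct eigenvalue; the sub-models are
# then y_{ij} = A_{2ij} A_{1i} y.
stage2 <- lapply(A1, function(A1i) {
  e  <- eigen(A1i %*% M2 %*% t(A1i), symmetric = TRUE)
  v  <- round(e$values, 8)
  th <- unique(v)
  list(theta  = th,                                        # theta_{2ij}
       blocks = lapply(th, function(t0)
         t(e$vectors[, v == t0, drop = FALSE])))           # A_{2ij}
})
set.seed(1)
y   <- rnorm(nrow(M1))                                     # an illustrative data vector
y11 <- stage2[[1]]$blocks[[1]] %*% A1[[1]] %*% y           # e.g. the sub-model y_{11}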

Note that

$$w_2 = P_2P_1y= \begin{bmatrix} A_{211}A_{11}y\\ \vdots \\ A_{21h_{21}}A_{11}y \\ A_{221}A_{12}y\\ \vdots \\ A_{22h_{22}}A_{12}y \\ \vdots \\ \vdots \\ A_{2h_11}A_{1h_1}y\\ \vdots \\ A_{2h_1h_{2h_1}}A_{1h_1}y \end{bmatrix}.$$

The distribution of the sub-models

$$y_{ij} = A_{2ij}A_{1i}y, \ i=1,\ldots ,h_1, \ j=1,\ldots , h_{2i}$$

is summarized in the following result.

Proposition 1

$$y_{ij} \sim \mathscr {N}_{g_{ij}}\left( \mathbf{0 }_{g_{ij}},\ \lambda _{ij}I_{g_{ij}}\right) , \ i=1,\ldots ,h_1;\ j=1,\ldots ,h_{2i},$$

where \(\lambda _{ij}= \gamma _1\theta _{1i} + \gamma _2\theta _{2ij} + \gamma _3\).

Proof

Recalling that \(A_{2ij}A_{1i} \in \mathscr {M}^{g_{ij}\times n}\) and \(g_{ij} \le n\), according to Moser [10, Theorem 2.1.2] we will have that

$$\begin{aligned} y_{ij} \sim \mathscr {N}_{g_{ij}}\left( \mathbf{0 }_{g_{ij}}, \ \sum _{d=1}^2\gamma _dA_{2ij}A_{1i}M_dA_{1i}^\top A_{2ij}^\top + \gamma _3A_{2ij}A_{1i}A_{1i}^\top A_{2ij}^\top \right) . \end{aligned}$$

The portions \(\sum _{d=1}^2\gamma _dA_{2ij}A_{1i}M_dA_{1i}^\top A_{2ij}^\top \) and \(\gamma _3A_{2ij}A_{1i}A_{1i}^\top A_{2ij}^\top \) in the variance-covariance matrix yield:

$$\begin{aligned} \sum _{d=1}^2\gamma _dA_{2ij}A_{1i}M_dA_{1i}^\top A_{2ij}^\top= & {} \gamma _1A_{2ij}\left( \theta _{1i}I_{g_i}\right) A_{2ij}^\top + \gamma _2A_{2ij}M_{ii}^2A_{2ij}^\top \nonumber \\= & {} \gamma _1\theta _{1i}I_{g_{ij}} + \gamma _2\theta _{2ij}I_{g_{ij}}; \nonumber \end{aligned}$$

and

$$\begin{aligned} \gamma _3A_{2ij}A_{1i}A_{1i}^\top A_{2ij}^\top = \gamma _3A_{2ij}I_{g_i}A_{2ij}^\top = \gamma _3I_{g_{ij}} \nonumber \end{aligned}$$

which, clearly, completes the proof. \(\square \)

With \(\mathbf 0 \) denoting an adequate null matrix and \(cov(\nu , \upsilon )\) denoting the cross-covariance between the random vectors \(\nu \) and \(\upsilon \), from (9) one may note that the cross-covariance matrix between the sub-models \(y_{ij}=A_{2ij}A_{1i}y\) and \(y_{sk}=A_{2sk}A_{1s}y\), \(i,s =1,\ldots ,h_1\), \(j,k=1,\ldots ,h_{2i}\), is given by

$$\begin{aligned} cov(y_{ij},\ y_{sk}) = \gamma _2A_{2ij}A_{1i}M_2A_{1s}^\top A_{2sk}^\top =\left\{ \begin{array}{ccc} \mathbf 0 &{}\quad i=s; j \ne k \\ \lambda _{ij} &{}\quad i=s; j=k \\ \gamma _2A_{2ij}W_{is}^2A_{2sk}^\top &{}\quad i\ne s \end{array} \right. \end{aligned}$$
(10)

with \(i\le s\), \(j\le k\) (symmetry applies), so that, for \(i \ne s\), the sub-models \(y_{ij}\) and \(y_{sk}\) are correlated and for \(i=s\) they are not.

3.2 Estimation for \(r=2\)

From Sect. 3.1 we see that (with i and j replaced by \(i_1\) and \(i_2\), respectively, for convenience) \(w_2=P_2P_1y\) produces the following sub-models

$$\begin{aligned} y_{i_1i_2} \sim \mathscr {N}_{g_{i_1i_2}}(\mathbf{0 }_{g_{i_1i_2}}, \ \lambda _{i_1i_2}I_{g_{i_1i_2}}), \ i_1=1,\ldots ,h_1,\ i_2=1,\ldots , h_{2i_1}, \end{aligned}$$
(11)

of the model \(y \sim \mathscr {N}_n(\mathbf{0 }_n, \ \gamma _1M_1 + \gamma _2M_2 + \gamma _3I_n)\), where

$$\lambda _{i_1i_2} =\gamma _1\theta _{1i_1} + \gamma _2\theta _{2i_1 i_2} + \gamma _3.$$

An unbiased estimator of \(\lambda _{i_1i_2}\) for model (11), based on its maximum likelihood estimator \(\hat{\lambda }_{i_1i_2}\), is

$$\begin{aligned} S_{i_1i_2}^2= & {} \frac{y_{i_1i_2}^\top y_{i_1i_2}}{g_{i_1i_2}},\nonumber \\&i_1=1,\ldots ,h_1,\ i_2=1,\ldots , h_{2i_1}. \nonumber \end{aligned}$$

Indeed (see Rencher and Schaalje [11, Theorem 5.2a]),

$$\begin{aligned} E(S_{i_1i_2}^2)= & {} \frac{1}{g_{i_1i_2}}tr\left\{ \lambda _{i_1i_2}I_{g_{i_1i_2}}\right\} \nonumber \\= & {} \lambda _{i_1i_2}. \end{aligned}$$
(12)

Thus

$$E(S_{i_1i_2}^2) = \lambda _{i_1i_2}= \gamma _1\theta _{1i_1} + \gamma _2\theta _{2i_1i_2} + \gamma _3, \ \ i_1=1,\ldots ,h_1,\ i_2=1,\ldots , h_{2i_1}$$

so that, with \(S = \begin{bmatrix}S_{11}^2 \\ \ldots \\ S_{1h_{21}}^2 \\ S_{21}^2 \\ \ldots \\ S_{2h_{22}}^2\\ \ldots \\ \ldots \\ S_{h_11}^2\\ \ldots \\ S_{h_1h_{2h_1}}^2\end{bmatrix}\), \(\varTheta = \begin{bmatrix} \theta _{11}&\theta _{211}&1 \\ \ldots&\ldots&\ldots \\ \theta _{11}&\theta _{21h_{21}}&1 \\ \theta _{12}&\theta _{221}&1 \\ \ldots&\ldots&\ldots \\ \theta _{12}&\theta _{22h_{22}}&1 \\ \ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_11}&1\\ \ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_1h_{2h_1}}&1 \end{bmatrix}\), and \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3 \end{bmatrix}\), we will have

$$\begin{aligned} E(S) = \varTheta \gamma . \end{aligned}$$
(13)

Thus, for \(i_1=1,\ldots , h_1\), \(i_2=1,\ldots , h_{2i_1}\), equating the variances \(\lambda _{i_1i_2}\) to the corresponding estimators \(S_{i_1i_2}^2\) yields the following system of equations:

$$\begin{aligned} S_{11}^2= & {} \gamma _1\theta _{11} + \gamma _2\theta _{211} + \gamma _3;\nonumber \\ \dots&\ \ \ldots \ldots \ldots \ldots \ldots ; \nonumber \\ S_{1h_{21}}^2= & {} \gamma _1\theta _{11} + \gamma _2\theta _{21h_{21}} + \gamma _3;\nonumber \\ S_{21}^2= & {} \gamma _1\theta _{12} + \gamma _2\theta _{221} + \gamma _3;\nonumber \\ \dots&\ \ \ldots \ldots \ldots \ldots \ldots \nonumber \\ S_{2h_{22}}^2= & {} \gamma _1\theta _{12} + \gamma _2\theta _{22h_{22}} + \gamma _3;\nonumber \\ \ldots&\ \ \ldots \ldots \ldots \ldots \ldots ; \nonumber \\ \ldots&\ \ \ldots \ldots \ldots \ldots \ldots ; \nonumber \\ S_{h_11}^2= & {} \gamma _1\theta _{1h_1} + \gamma _2\theta _{2h_11} + \gamma _3;\nonumber \\ \ldots&\ \ \ldots \ldots \ldots \ldots \ldots ; \nonumber \\ S_{h_1h_{2h_1}}^2= & {} \gamma _1\theta _{1h_1} + \gamma _2\theta _{2h_1h_{2h_1}} + \gamma _3;\nonumber \end{aligned}$$

which in matrix notation becomes

$$\begin{aligned} S =\varTheta \gamma . \end{aligned}$$
(14)

Since by construction \(\theta _{1i_1} \ne \theta _{1i_1^{'}}\), \(i_1 \ne i_1^{'} = 1, \ldots , h_1\) (they are the distinct eigenvalues of \(M_1\)), and \(\theta _{2i_1i_2} \ne \theta _{2i_1i_2^{'}}\), \(i_2 \ne i_2^{'} = 1, \ldots , h_{2i_1}\) (they are the distinct eigenvalues of \(M_{i_1i_1}^2 = A_{1i_1}M_2A_{1i_1}^\top \)), it is easily seen that the matrix \(\varTheta \) has full rank; that is, \(r(\varTheta )=3\).

By Rencher and Schaalje [11, Theorem 2.6d] the matrix

$$\varTheta ^\top \varTheta = \begin{bmatrix} {\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{1i_1}^2&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{1i_1}\theta _{2i_1i_2}&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{1i_1} \\ \\ {\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{1i_1}\theta _{2i_1i_2}&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{2i_1i_2}^2&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{2i_1i_2} \\ \\ {\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{1i_1}&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}}\theta _{2i_1i_2}&{\sum }_{i_1}^{h_1}{\sum }_{i_2}^{h_{2i_1}} \end{bmatrix}$$

is positive-definite, and by Rencher and Schaalje [11, Corollary 1] \(\varTheta ^\top \varTheta \) is non-singular; its inverse will be denoted by \((\varTheta ^\top \varTheta )^{-1}\).

Now, premultiplying both sides of the system (14) by \(\varTheta ^\top \), the resulting system of equations will be

$$\begin{aligned} \varTheta ^\top S =\varTheta ^\top \varTheta \gamma , \end{aligned}$$
(15)

whose unique solution (and therefore an estimator of \(\gamma \)) is

$$\begin{aligned} \hat{\gamma } =(\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S. \end{aligned}$$
(16)

\(\hat{\gamma } = \begin{bmatrix} \hat{\gamma _1} \\ \hat{\gamma _2} \\ \hat{\gamma _3}\end{bmatrix}\) will be referred to as the Sub-D estimator, and the underlying method as the Sub-D method.
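The whole procedure for \(r=2\) can be condensed into a short routine; the R sketch below is a self-contained illustration (sub_d_r2 and split_by_eigen are hypothetical names, not taken from the code used in Sect. 4) that assembles \(S\) and \(\varTheta \) from the two-stage decomposition and solves the normal equations (15).

# A condensed sketch of Sub-D for r = 2 (hypothetical helper names): given y,
# M1 and M2 of the restricted model (3), it returns hat(gamma) as in (16).
split_by_eigen <- function(M) {              # eigenvector blocks grouped by
  e  <- eigen(M, symmetric = TRUE)           # distinct eigenvalue
  v  <- round(e$values, 8)
  th <- unique(v)
  list(theta  = th,
       blocks = lapply(th, function(t0) t(e$vectors[, v == t0, drop = FALSE])))
}
sub_d_r2 <- function(y, M1, M2) {
  s1 <- split_by_eigen(M1)                                   # first stage: M1
  S <- c(); Theta <- NULL
  for (i in seq_along(s1$blocks)) {
    A1i <- s1$blocks[[i]]
    s2  <- split_by_eigen(A1i %*% M2 %*% t(A1i))             # second stage: M_{ii}^2
    for (j in seq_along(s2$blocks)) {
      yij   <- s2$blocks[[j]] %*% A1i %*% y                  # sub-model y_{ij}
      S     <- c(S, sum(yij^2) / length(yij))                # S_{ij}^2
      Theta <- rbind(Theta, c(s1$theta[i], s2$theta[j], 1))  # (theta_{1i}, theta_{2ij}, 1)
    }
  }
  drop(solve(crossprod(Theta), crossprod(Theta, S)))         # hat(gamma), cf. (16)
}

Applied to the objects of the earlier sketches, sub_d_r2(y, M1, M2) returns the three estimated variance components.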

Proposition 2

\(\hat{\gamma }\) is an unbiased estimator of \(\gamma \), with \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3\end{bmatrix}\).

Proof

Indeed, \(E(\hat{\gamma }) =E\left( (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\right) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top E(S) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top \varTheta \gamma = \gamma \). \(\square \)

Proposition 3

With \(i\le i^*\), \(j\le j^*\) (symmetry applies),

$$\begin{aligned}&cov\left( S_{ij}^2,\ S_{i^*j^*}^2 \right) = \left\{ \begin{array}{ccc} (\mathbf{a }) \ i=i^*; j \ne j^*: &{}\quad 0, \\ (\mathbf{b })\ i=i^*; j = j^*: &{}\quad 2\frac{\lambda _{ij}^2}{g_{ij}},\\ (\mathbf{c })\ i\ne i^*: &{}\quad 2\gamma _2^2tr(\varOmega M_2), \nonumber \end{array} \right. \nonumber \end{aligned}$$

where \(\varOmega =\nabla _{ij}M_2\nabla _{i^*j^*}\), with \(\nabla _{ij} = \frac{A_{1i}^\top A_{2ij}^\top A_{2ij}A_{1i}}{g_{ij}}\).

Proof

We have that

$$\begin{aligned} cov\left( S_{ij}^2,S_{i^*j^*}^2\right)= & {} cov\left( \frac{y_{ij}^\top y_{ij}}{g_{ij}},\ \frac{y_{i^*j^*}^\top y_{i^*j^*}}{g_{i^*j^*}}\right) \nonumber \\= & {} cov\left( y^\top \left( \frac{A_{1i}^\top A_{2ij}^\top A_{2ij}A_{1i}}{g_{ij}}\right) y,\ y^\top \left( \frac{A_{1i^*}^\top A_{2i^*j^*}^\top A_{2i^*j^*}A_{1i^*}}{g_{i^*j^*}}\right) y\right) \nonumber \\= & {} cov\left( y^\top \nabla _{ij}y,\ y^\top \nabla _{i^*j^*}y \right) \nonumber \\= & {} 2tr\left( \nabla _{ij}V\nabla _{i^*j^*}V\right) \nonumber \\= & {} 2\gamma _1^2tr(\nabla _{ij}M_1\nabla _{i^*j^*}M_1) + 2\gamma _1\gamma _2tr(\nabla _{ij}M_1\nabla _{i^*j^*}M_2) + 2\gamma _1\gamma _3tr(\nabla _{ij}M_1\nabla _{i^*j^*})\nonumber \\+ & {} 2\gamma _2\gamma _1tr(\nabla _{ij}M_2\nabla _{i^*j^*}M_1) + 2\gamma _2^2tr(\nabla _{ij}M_2\nabla _{i^*j^*}M_2)+ 2\gamma _2\gamma _3tr(\nabla _{ij}M_2\nabla _{i^*j^*})\nonumber \\+ & {} 2\gamma _3\gamma _1tr(\nabla _{ij}\nabla _{i^*j^*}M_1) + 2\gamma _3\gamma _2tr(\nabla _{ij}\nabla _{i^*j^*}M_2) + 2\gamma _3^2tr(\nabla _{ij}\nabla _{i^*j^*}) \nonumber \\= & {} \left\{ \begin{array}{ccc} i=i^*; j \ne j^*: &{}\quad 0, \\ i=i^*; j = j^*: &{}\quad 2\frac{\lambda _{ij}^2}{g_{ij}},\\ i\ne i^*: &{}\quad 2\gamma _2^2tr(\nabla _{i j}M_2\nabla _{i^*j^*}M_2). \nonumber \end{array} \right. \nonumber \end{aligned}$$

For the case (a), that is \(i=i^*; j \ne j^*\), we have that

$$\begin{aligned} \nabla _{ij}M_1\nabla _{ij^*}= & {} \frac{1}{g_{ij}g_{ij^*}}A_{1i}^\top A_{2ij}^\top A_{2ij}A_{1i}M_1A_{1i}^\top A_{2ij^*}^\top A_{2ij^*}A_{1i}\nonumber \\ {}= & {} \frac{1}{g_{ij}g_{ij^*}}A_{1i}^\top A_{2ij}^\top A_{2ij}\left( \theta _{1i}I_{g_i}\right) A_{2ij^*}^\top A_{2ij^*}A_{1i}\nonumber \\= & {} \mathbf{0 }_{g_{i}\times g_{i}}\ (\text{ see }\ (4) \ \text{ for } \text{ the } \text{ explanation }); \end{aligned}$$
(17)
$$\begin{aligned} \nabla _{ij}M_2\nabla _{ij^*}= & {} \frac{1}{g_{ij}g_{ij^*}}A_{1i}^\top A_{2ij}^\top A_{2ij}A_{1i}M_2A_{1i}^\top A_{2ij^*}^\top A_{2ij^*}A_{1i}\nonumber \\ {}= & {} \frac{1}{g_{ij}g_{ij^*}}A_{1i}^\top A_{2ij}^\top A_{2ij}\left( M_{ii}^2\right) A_{2ij^*}^\top A_{2ij^*}A_{1i}\nonumber \\= & {} \mathbf{0 }_{g_{i}\times g_{i}} \ (\text{ see }\ (8)\ \text{ for } \text{ the } \text{ explanation }); \end{aligned}$$
(18)
$$\begin{aligned} \nabla _{ij}\nabla _{ij^*}= & {} \frac{1}{g_{ij}g_{ij^*}}A_{1i}^\top A_{2ij}^\top \left( \mathbf{0 }_{g_{ii}\times g_{ij^*}}\right) A_{2ij^*}A_{1i}\nonumber \\= & {} \mathbf{0 }_{g_{i}\times g_{i}}. \end{aligned}$$
(19)

Therefore, (17)–(19), together with Schott [12, Theorem 1.3.(d)], prove case (a).

For case (c), that is \(i\ne i^*\), the desired result becomes clear if we use Theorem 1.3.(d) of Schott [12] and note that

$$A_{1i}M_1A_{1i^*}^\top =\theta _{1i^*}A_{1i}A_{1i^*}^\top = \mathbf{0 }_{g_{i}\times g_{i^*}}.$$

Finally, for case (b), that is \(i=i^*; j = j^*\), recalling that \(y_{ij} \sim \mathscr {N}_{g_{ij}}\left( \mathbf{0 }_{g_{ij}}, \ \lambda _{ij}I_{g_{ij}}\right) \), it holds that

$$\begin{aligned} cov\left( S_{ij}^2\right)= & {} \varSigma \left( \frac{y_{ij}^\top y_{ij}}{g_{ij}}, \frac{y_{ij}^\top y_{ij}}{g_{ij}}\right) = 2tr\left\{ \frac{\lambda _{ij}}{g_{ij}}I_{g_{ij}} \frac{\lambda _{ij}}{g_{ij}}I_{g_{ij}}\right\} = 2\frac{\lambda _{ij}^2}{g_{ij}^2}tr\left\{ I_{g_{ij}}\right\} \nonumber \\= & {} 2\frac{\lambda _{ij}^2}{g_{ij}}, \end{aligned}$$
(20)

and therefore the proof is complete. \(\square \)

The next result introduces the variance-covariance matrix of the sub-diagonalization estimator:

$$\hat{\gamma }=(\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S.$$

Proposition 4

In order to simplify the notation, let \(\varSigma _{S_{ij}S_{kl}}\) denote \(cov({S_{ij}^2,\ S_{kl}^2})\). Then,

$$\begin{aligned} cov(\hat{\gamma }) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top cov(S)\varTheta (\varTheta ^\top \varTheta )^{-1}, \end{aligned}$$
(21)

where \(cov(S) = \begin{bmatrix} D_1&\varLambda _{12}&\varLambda _{13}&\ldots&\varLambda _{1h_1} \\ \varLambda _{21}&D_2&\varLambda _{23}&\ldots&\varLambda _{2h_1}\\ \varLambda _{31}&\varLambda _{32}&D_3&\ldots&\varLambda _{3h_1}\\ \vdots&\vdots&\vdots&\ddots&\vdots \\ \varLambda _{h_11}&\varLambda _{h_12}&\varLambda _{h_13}&\ldots&D_{h_1} \end{bmatrix}\), with \(D_i= 2\begin{bmatrix} \frac{\lambda _{i1}^2}{g_{i1}}&0&\ldots&0\\ 0&\frac{\lambda _{i2}^2}{g_{i2}}&\dots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&\frac{\lambda _{ih_{2i}}^2}{g_{ih_{2i}}} \end{bmatrix}\) and \(\varLambda _{ks} = \begin{bmatrix} \varSigma _{S_{k1}S_{s1}}&\varSigma _{S_{k1}S_{s2}}&\ldots&\varSigma _{S_{k1}S_{sh_{2s}}} \\ \varSigma _{S_{k2}S_{s1}}&\varSigma _{S_{k2}S_{s2}}&\ldots&\varSigma _{S_{k2}S_{sh_{2s}}} \\ \vdots&\vdots&\ddots&\vdots \\ \varSigma _{S_{kh_{2k}}S_{s1}}&\varSigma _{S_{kh_{2k}}S_{s2}}&\ldots&\varSigma _{S_{kh_{2k}}S_{sh_{2s}}} \end{bmatrix}\).

Proof

The proof is a consequence of the Proposition 3. \(\square \)

3.3 The General Case: \(r \ge 1\)

Now, without loss of generality, let us consider the general MLM in (2):

$$y \sim \mathscr {N}_n \left( \mathbf{0 }_n, \sum _{d=1}^{r+1}\gamma _{d} M_{d}\right) , \ \text{ with } \ M_d=B_o^\top N_d B_o \in \mathscr {S}^n \ \text{ and }\ M_{r+1}=I_n.$$

One may note that \(y= \sum _{d=1}^{r+1}B_o^\top X_d\beta _d\), where \( \beta _d \sim \mathscr {N}_{}(0, \ \gamma _dI)\), \(d=1,\ldots ,r\), \(\beta _{r+1} \sim \mathscr {N}_{}(0, \ \gamma _{r+1}I_m)\), and \(\beta _1, \ldots , \beta _{r+1}\) are uncorrelated.

With \(i_1=1,\ldots ,h_1\), \(i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}\), consider the finite sequence of r matrices \(P_1\), \(P_2\), ..., \(P_r\) defined as follows:

$$\begin{aligned}&\qquad P_1 = \begin{bmatrix} A_{11}\\ A_{12} \\ \vdots \\ A_{1h_1}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_1}^{h_1}g_{i_1}\right) \times n},\ \text{ with } \ A_{1i_1} \in \mathscr {M}^{(g_{i_1}) \times n} \ \left( \text{ note: } \sum \limits _{i_1}^{h_1}g_{i_1} = n\right) ; \end{aligned}$$
(22)
$$\begin{aligned} P_{2}&= \begin{bmatrix} P_{21}&0&\ldots&0 \\ 0&P_{22}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{2h_1}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_1}^{h_1}\sum _{i_2}^{h_{2,i_1}}g_{i_1i_2}\right) \times \left( \sum _{i_1}^{h_1}g_{i_1}\right) },\ \text{ where } \nonumber \\ P_{2i_1}&= \begin{bmatrix} A_{2i_11}\\ A_{2i_12} \\ \vdots \\ A_{2i_1h_{2i_1}}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_2}^{h_{2,i_1}}g_{i_1i_2}\right) \times g_{i_1}},\ \text{ with } \sum _{i_2}^{h_{2,i_1}}g_{i_1i_2} = g_{i_1}\ \text{ and }\ \ A_{2i_1i_2} \in \mathscr {M}^{g_{i_1i_2} \times g_{i_1}} ;\nonumber \\&P_{3} = \begin{bmatrix} P_{31}&0&\ldots&0 \\ 0&P_{32}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{3h_1}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_1}^{h_1}\sum _{i_2}^{h_{2,i_1}}\sum _{i_3}^{h_{3,i_1,i_2}}g_{i_1i_2i_3}\right) \times \left( \sum _{i_1}^{h_1}\sum _{i_2}^{h_{2,i_1}}g_{i_1i_2}\right) },\nonumber \\&\text{ where }\; P_{3i_1} =\begin{bmatrix} P_{3i_11}&0&\ldots&0 \\ 0&P_{3i_12}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{3i_1h_{2,i_1}}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_2}^{h_{2,i_1}}\sum _{i_3}^{h_{3,i_1,i_2}}g_{i_1i_2i_3}\right) \times \left( \sum _{i_2}^{h_{2,i_1}}g_{i_1i_2}\right) }\ \text{ and }\nonumber \\&\!\!\!\!P_{3i_1i_2} = \begin{bmatrix} A_{3i_1i_21}\\ A_{3i_1i_22} \\ \vdots \\ A_{3i_1i_2h_{3,i_1,i_2}}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_3}^{h_{3,i_1,i_2}}g_{i_1i_2i_3}\right) \times g_{i_1i_2}}, \ \text{ with }\ \sum _{i_3}^{h_{3,i_1,i_2}}g_{i_1i_2i_3} = g_{i_1i_2}\ \text{ and } \nonumber \\ \nonumber&\,\quad \quad A_{3i_1i_2i_3} \in \mathscr {M}^{g_{i_1i_2i_3}\times g_{i_1i_2}};\nonumber \end{aligned}$$

Thus, for \(r\ge 2\), each matrix \(P_{r}\) will be given by (\(P_1\) is given in (22)):

$$\begin{aligned} P_r= & {} \begin{bmatrix} P_{r1}&0&\ldots&0 \\ 0&P_{r2}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{rh_1}\end{bmatrix} \\ \nonumber&\in \mathscr {M}^{\left( \sum _{i_1}^{h_1}\ldots \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}g_{i_1\ldots i_r}\right) \times \left( \sum _{i_1}^{h_1}\ldots \sum _{i_{(r-1)}}^{h_{(r-1),i_1,\ldots ,i_{r-2}}}g_{i_1\ldots i_{(r-1)}}\right) }, \\ \nonumber \end{aligned}$$
(23)

where

$$\begin{aligned} P_{ri_1}&= \begin{bmatrix} P_{ri_11}&0&\ldots&0 \\ 0&P_{ri_12}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{ri_1h_{2,i_1}}\end{bmatrix} \\ \nonumber \\&\quad \,\, \in \mathscr {M}^{\left( \sum _{i_2}^{h_{2,i_1}}\ldots \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}g_{i_1\ldots i_r}\right) \times \left( \sum _{i_2}^{h_{2,i_1}}\ldots \sum _{i_{(r-1)}}^{h_{(r-1),i_1,\ldots ,i_{r-2}}}g_{i_1\ldots i_{(r-1)}}\right) },\nonumber \\ \nonumber \\ \nonumber \\ \nonumber&\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ldots \ \ldots \ \ldots \ \ldots \ \ldots \nonumber \\ \nonumber \\ \nonumber \\ \nonumber&\!\!\!\!\!\! P_{ri_1\dots i_{(r-2)}} = \begin{bmatrix} P_{ri_1\dots i_{(r-2)}1}&0&\ldots&0 \\ 0&P_{ri_1\dots i_{(r-2)}2}&\ldots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\ldots&P_{ri_1\dots i_{(r-2)}h_{r - 1,i_1,\ldots ,i_{r-2}}}\end{bmatrix} \nonumber \\ \qquad \qquad \quad&\in \mathscr {M}^{\left( \sum _{i_{(r-1)}}^{h_{(r-1),i_1,\ldots ,i_{r-2}}}\sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}g_{i_1\ldots i_r}\right) \times \left( \sum _{i_{(r-1)}}^{h_{(r-1),i_1,\ldots ,i_{r-2}}}g_{i_1\ldots i_{(r-1)}}\right) },\nonumber \\ \text{ and }&P_{ri_1\dots i_{(r-1)}} = \begin{bmatrix} A_{ri_1\ldots i_{(r-1)}1}\\ A_{ri_1\ldots i_{(r-1)}2} \\ \vdots \\ A_{ri_1\ldots i_{(r-1)}h_{r,i_1,\ldots ,i_{r-1}}}\end{bmatrix} \in \mathscr {M}^{\left( \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}g_{i_1\ldots i_r}\right) \times g_{i_1\ldots i_{(r-1)}}},\nonumber \\ \nonumber \text{ with }&\ \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}g_{i_1\ldots i_r} = g_{i_1\ldots i_{(r-1)}}, \sum _{i_1}^{h_1}g_{i_1} =n, \ A_{ri_1\ldots i_r} \in \mathscr {M}^{g_{i_1\ldots i_r}\times g_{i_1\ldots i_{(r-1)}}};\nonumber \end{aligned}$$

Theorem 1

Let the matrices \(P_1, P_2, \ldots , P_r\) defined above be such that:

(\(c_1\)):

The columns of \(A_{1i_1}^\top \), \(i_1=1,\ldots ,h_1\), form a set of \(g_{i_1} = r(A_{1i_1}^\top )\) orthonormal eigenvectors associated to the eigenvalues \(\theta _{1i_1}\) of the matrix \(M_1\) (\(\theta _{1i_1}\) has multiplicity \(g_{i_1}\));

(\(c_2\)):

The columns of \(A_{2i_1i_2}^\top \), \(i_2=1,\ldots ,h_{2,i_1}\), form a set of \(g_{i_1i_2} = r(A_{2i_1i_2}^\top )\) orthonormal eigenvectors associated to the eigenvalues \(\theta _{2i_1i_2}\) of the matrix \(M_{i_1i_1}^2=A_{1i_1}M_2A_{1i_1}^\top \) (\(\theta _{2i_1i_2}\) has multiplicity \(g_{i_1i_2}\));

(\(c_3\)):

The columns of \(A_{3i_1i_2i_3}^\top \), \(i_3=1,\ldots ,h_{3,i_1,i_2}\), form a set of \(g_{i_1i_2i_3} = r(A_{3i_1i_2i_3}^\top )\) orthonormal eigenvectors associated to the eigenvalues \(\theta _{3i_1i_2i_3}\) of the matrix

$$A_{2i_1i_2}M_{i_1i_1}^3A_{2i_1i_2}^\top =A_{2i_1i_2}A_{1i_1}M_3A_{1i_1}^\top A_{2i_1i_2}^\top $$

(\(\theta _{3i_1i_2i_3}\) has multiplicity \(g_{i_1i_2i_3}\));

                                        ............

(\(c_r\)):

The columns of \(A_{ri_1\ldots i_r}^\top \), \(i_r=1,\ldots ,h_{r,i_1,\ldots ,i_{r-1}}\), form a set of \(g_{i_1\ldots i_r} = r(A_{ri_1\ldots i_r}^\top )\) orthonormal eigenvectors associated to the eigenvalues \(\theta _{ri_1\ldots i_r}\) of the matrix

$$A_{(r-1)i_1\ldots i_{(r-1)}}\ldots A_{1i_1}M_rA_{1i_1}^\top \ldots A_{(r-1)i_1\ldots i_{(r-1)}}^\top $$

(\(\theta _{ri_1\ldots i_r}\) has multiplicity \(g_{i_1\ldots i_r}\)).

Then each matrix \(P_d\), \(d=1,\ldots , r\), in the finite sequence of matrices \(P_1, P_2,\ldots ,P_r\) will be an orthogonal matrix.

Proof

Given the way \(P_d\) is defined (see (23)), since

$$P_{di_1\dots i_{(d-1)}} = \begin{bmatrix} A_{di_1\ldots i_{(d-1)}1}\\ A_{di_1\ldots i_{(d-1)}2} \\ \vdots \\ A_{di_1\ldots i_{(d-1)}h_{d,i_1,\ldots ,i_{d-1}}}\end{bmatrix},\ i_{(d-1)}=1,\ldots ,h_{(d-1),i_1,\ldots ,i_{d-2}},$$

and according to condition \(c_d\) we see that the matrices \(P_{di_1\dots i_{(d-1)}}\) are orthogonal. Thus, the desired result follows once we note that \(P_d^\top P_d\) will be a blockwise diagonal matrix whose diagonal blocks are \(P_{di_1}^\top P_{di_1}\), \(i_1=1,\ldots ,h_1\). The diagonal blocks \(P_{di_1}^\top P_{di_1}\) will be blockwise diagonal matrices whose diagonal blocks will be \(P_{di_1i_2}^\top P_{di_1i_2}\), \(i_2=1,\ldots ,h_{2,i_1}\). Proceeding this way \(d-2\) times, we will find that the diagonal blocks of the blockwise matrices \(P_{di_1\ldots i_{(d-2)}}^\top P_{di_1\ldots i_{(d-2)}}\), \(i_{(d-2)}=1,\ldots ,h_{(d-2),i_1,\ldots ,i_{d-3}}\), will be

$$\begin{aligned} P_{di_1\dots i_{(d-1)}}^\top P_{di_1\dots i_{(d-1)}}= & {} A_{di_1\ldots i_{(d-1)}1}^\top A_{di_1\ldots i_{(d-1)}1} \nonumber \\&+ \cdots + A_{di_1\ldots i_{(d-1)}h_{d,i_1,\ldots ,i_{d-1}}}^\top A_{di_1\ldots i_{(d-1)}h_{d,i_1,\ldots ,i_{d-1}}} \nonumber \\ {}= & {} I_{g_{i_1\ldots i_{(d-1)}}},\nonumber \end{aligned}$$

reaching, therefore, the desired result. Proceeding in the same way we would also see that \(P_{di_1\dots i_{(d-1)}}P_{di_1\dots i_{(d-1)}}^\top \) is a blockwise diagonal matrix whose diagonal blocks are \(A_{di_1\ldots i_{(d-1)}j}A_{di_1\ldots i_{(d-1)}j}^\top = I_{g_{i_1\ldots i_{(d-1)}j}}\), \(j=1,\ldots , h_{d,i_1,\ldots ,i_{d-1}}\), so that \(P_dP_d^\top \) is an identity matrix. \(\square \)

The model \(w_r=P_r\ldots P_2P_1y\) produces the following sub-models:

$$\begin{aligned} y_{i_1\ldots i_r}= A_{ri_1\ldots i_r}A_{(r-1)i_1\ldots i_{(r-1)}}\ldots A_{2i_1i_2}A_{1i_1}y,\nonumber \\ i_1=1, \ldots , h_1, i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}.\nonumber \end{aligned}$$

We summarize the distribution of each sub-model \(y_{i_1\ldots i_r}\) in the following result.

Proposition 5

$$y_{i_1\ldots i_r} \sim \mathscr {N}_{g_{i_1\ldots i_r}}\left( 0_{g_{i_1\ldots i_r}},\ \lambda _{i_1\ldots i_r}I_{g_{i_1\ldots i_r}}\right) ,$$

where \(\lambda _{i_1\ldots i_r}=\sum _{d=1}^r\gamma _d\theta _{di_1\ldots i_d} + \gamma _{r+1}\).

Proof

The proof follows along the same lines as the proof of Proposition 1. \(\square \)

From the results on cross-covariance in the preceding subsections we easily conclude that the cross-covariance matrix between the sub-models \(y_{i_1\ldots i_r}\) and \(y_{i_1^*\ldots i_r^*}\), with \(i_1,i^*_1=1,\ldots ,h_1\); \(i_j,i_j^*=1,\ldots , h_{j,i_1,\ldots ,i_{j-1}}\), is given by

$$\begin{aligned} cov(y_{i_1\ldots i_r},\ y_{i_1^*\ldots i_r^*})= & {} \left\{ \begin{array}{ccc} \mathbf 0 &{}\quad i_1=i^*_1;\ (i_2,\ldots ,i_r) \ne (i_2^*,\ldots ,i_r^*) \\ \lambda _{i_1\ldots i_r}I_{g_{i_1\ldots i_r}} &{}\quad i_j = i^*_j,\ j=1,\ldots ,r \\ \sum _{d=2}^r\gamma _dA_{ri_1\ldots i_r}\ldots A_{1i_1}M_dA_{1i_1^*}^\top \ldots A_{ri_1^*\ldots i_r^*}^\top &{}\quad i_1\ne i^*_1 \end{array} \right. \nonumber \end{aligned}$$

so that, for \(i_1 \ne i^*_1\), the sub-models \(y_{i_1\ldots i_r}\) and \(y_{i_1^*\ldots i_r^*}\) are correlated and for \(i_1=i^*_1\) they are not.

3.4 Estimation for the General Case: \(r \ge 1\)

Recall that, for the MLM in (1), \(P_r\ldots P_2P_1y\) produces the following sub-models

$$\begin{aligned} y_{i_1i_2\ldots i_r}\sim & {} \mathscr {N}_{g_{i_1i_2\ldots i_r}}(0_{g_{i_1\ldots i_r}}, \ \lambda _{i_1i_2\ldots i_r}I_{g_{i_1i_2\ldots i_r}}), \nonumber \\&i_1=1,\ldots ,h_1, i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}} \end{aligned}$$
(24)

where

$$\lambda _{i_1i_2\ldots i_r} = \sum _{d=1}^r\gamma _d\theta _{di_1\ldots i_d} + \gamma _{r+1}.$$

The matrices \(P_d\), \(d=1,\ldots , r\), are defined in Sect. 3.3.

An unbiased estimator of \(\lambda _{i_1i_2\ldots i_r}\) in the sub-model (24), based on its maximum likelihood estimator \(\hat{\lambda }_{i_1i_2\ldots i_r}\), is

$$\begin{aligned} S_{i_1i_2\ldots i_r}^2= & {} \frac{1}{g_{i_1i_2\ldots i_r}}y_{i_1i_2\ldots i_r}^\top y_{i_1i_2\ldots i_r}\nonumber \end{aligned}$$

Indeed (see Rencher and Schaalje [11], Theorem 5.2(a), and the explanation for (12)),

$$\begin{aligned} E\left( S_{i_1i_2\ldots i_r}^2\right)= & {} \frac{\lambda _{i_1i_2\ldots i_r}}{g_{i_1i_2\ldots i_r}}tr\left[ I_{g_{i_1i_2\ldots i_r}}\right] \nonumber \\= & {} \lambda _{i_1i_2\ldots i_r}. \end{aligned}$$
(25)

For convenience, in what follows, instead of \(S_{i_1i_2\ldots i_r}^2\), we may sometimes use the notation \(S_{i_1i_2\ldots i_{(r-1)}i_r}^2\).

Thus

$$\begin{aligned} E(S_{i_1i_2\ldots i_{(r-1)}i_r}^2)= & {} \sum _{d=1}^r\gamma _d\theta _{di_1\ldots i_d} + \gamma _{r+1}\nonumber \\= & {} \gamma _1\theta _{1i_1} + \gamma _2\theta _{2i_1i_2} + \cdots + \gamma _r\theta _{ri_1i_2\ldots i_{(r-1)}i_r} + \gamma _{r+1},\nonumber \\ \nonumber \\&i_1=1,\ldots ,h_1; i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}\nonumber \end{aligned}$$

so that, with \(S = \begin{bmatrix}S_{11\ldots 11}^2 \\ S_{11\ldots 12}^2 \\ \ldots \\ S_{11\ldots 1h_{r,1,\ldots ,1}}^2\\ S_{11\ldots 21}^2 \\ \ldots \\ S_{11\ldots 2h_{r,1,\ldots ,2}}^2\\ \ldots \\ \ldots \\ \ldots \\ S_{h_11\ldots 11}^2\\ \ldots \\ \ldots \\ \ldots \\ S_{h_1h_{2,h_1}\ldots h_{r,h_1,\ldots ,h_{r-1}}}^2\end{bmatrix}\),

\(\varTheta = \begin{bmatrix} \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 11}&1 \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 12}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 1h_{r,1,\ldots ,1,h_{r-1}}}&1 \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 21}&1 \\ \ldots&\ldots&\ldots&\ldots&\dots&\ldots \\ \theta _{11}&\theta _{211}&\theta _{3111}&\ldots&\theta _{r11\ldots 2h_{r,1,\ldots ,2,h_{r-1}}}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_11}&\theta _{3h_111}&\ldots&\theta _{rh_11\ldots 11}&1 \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \ldots&\ldots&\ldots&\ldots&\ldots&\ldots \\ \theta _{1h_1}&\theta _{2h_1h_{2,h_1}}&\theta _{3h_1h_{2,h_1}h_{3,h_1,h_2}}&\ldots&\theta _{rh_1h_{2,h_1}\ldots h_{(r-1),h_1,\ldots ,h_{r-2}}h_{r,h_1,\ldots ,h_{r-1}}}&1 \end{bmatrix}\),

and \(\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3 \\ \ldots \\ \ldots \\ \gamma _{r} \\ \gamma _{(r+1)} \end{bmatrix}\), we will have

$$\begin{aligned} E(S) = \varTheta \gamma . \end{aligned}$$
(26)

Thus, for \(i_1=1,\ldots ,h_1\), \(i_j=1,\ldots ,h_{j,i_1,\ldots ,i_{j-1}}\), \(j>1\), equating the variances \(\lambda _{i_1i_2\ldots i_{r}}\) to the corresponding estimators \(S_{i_1i_2\ldots i_{r}}^2\) yields the following system of equations (in matrix notation)

$$\begin{aligned} S =\varTheta \gamma . \end{aligned}$$
(27)

Since by construction \(\theta _{1i_1} \ne \theta _{1i_1^{'}}\) (they are the distinct eigenvalues of \(M_1\)), \(\theta _{2i_1i_2} \ne \theta _{2i_1i_2^{'}}\) (they are the distinct eigenvalues of \(M_{i_1i_1}^2 = A_{1i_1}M_2A_{1i_1}^\top \)), \(\theta _{3i_1i_2i_3} \ne \theta _{3i_1i_2i_3^{'}}\) (they are the distinct eigenvalues of \(A_{2i_1i_2}A_{1i_1}M_3A_{1i_1}^\top A_{2i_1i_2}^\top \)), ..., \(\theta _{ri_1i_2\ldots i_{(r-1)}i_r} \ne \theta _{ri_1i_2\ldots i_{(r-1)}i_r^{'}}\) (they are the distinct eigenvalues of \(A_{(r-1)i_1i_2\ldots i_{(r-1)}}\ldots A_{1i_1}M_rA_{1i_1}^\top \ldots A_{(r-1)i_1i_2\ldots i_{(r-1)}}^\top \)), where \(i_j \ne i_j^{'}\), \(j=1,\ldots ,r\), it is easily seen that the matrix \(\varTheta \) has full rank; that is, \(r(\varTheta )=r +1\).

According to Theorem 2.6d of Rencher and Schaalje [11], with \(\sum \) denoting \( \sum _{i_1}^{h_1}\sum _{i_2}^{h_{2,i_1}}\ldots \sum _{i_r}^{h_{r,i_1,\ldots ,i_{r-1}}}\), the matrix

$$\varTheta ^\top \varTheta = \begin{bmatrix} \sum \theta _{1i_1}^2&\sum \theta _{1i_1}\theta _{2i_1i_2}&\sum \theta _{1i_1}\theta _{3i_1i_2i_3}&\ldots&\sum \theta _{1i_1}\theta _{ri_1\ldots ir}&\sum \theta _{1i_1} \\ \\ \sum \theta _{1i_1}\theta _{2i_1i_2}&\sum \theta _{2i_1i_2}^2&\theta _{2i_1i_2}\theta _{3i_1i_2i_3}&\ldots&\sum \theta _{2i_1i_2}\theta _{ri_1\ldots ir}&\sum \theta _{2i_1i_2} \\ \\ \sum \theta _{1i_1}\theta _{3i_1i_2i_3}&\sum \theta _{2i_1i_2}\theta _{3i_1i_2i_3}&\sum \theta _{3i_1i_2i_3}^2&\ldots&\sum \theta _{3i_1i_2i_3}\theta _{ri_1\ldots ir}&\sum \theta _{3i_1i_2i_3} \\ \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots \\ \\ \sum \theta _{1i_1}\theta _{ri_1\ldots ir}&\sum \theta _{2i_1i_2}\theta _{ri_1\ldots ir}&\sum \theta _{3i_1i_2i_3}\theta _{ri_1\ldots ir}&\ldots&\sum \theta _{ri_1\ldots ir}^2&\sum \theta _{ri_1\ldots ir} \\ \\ \sum \theta _{1i_1}&\sum \theta _{2i_1i_2}&\sum \theta _{3i_1i_2i_3}&\ldots&\sum \theta _{ri_1\ldots ir}&\sum \end{bmatrix}$$

is positive-definite, and according to Corollary 1 of Rencher and Schaalje [11, p. 27], \(\varTheta ^\top \varTheta \) is non-singular, that is, invertible. We denote its inverse by \((\varTheta ^\top \varTheta )^{-1}\).

Now, premultiplying both sides of the system (27) by \(\varTheta ^\top \), the resulting system of equations will be

$$\begin{aligned} \varTheta ^\top S =\varTheta ^\top \varTheta \gamma , \end{aligned}$$
(28)

whose unique solution (and therefore an estimator of \(\gamma \)) will be the Sub-D estimator

$$\begin{aligned} \hat{\gamma } =(\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S. \end{aligned}$$
(29)

Proposition 6

\(\hat{\gamma } =(\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\) is an unbiased estimator of

$$\gamma = \begin{bmatrix} \gamma _1 \\ \gamma _2 \\ \gamma _3 \\ \ldots \\ \ldots \\ \gamma _{r} \\ \gamma _{(r+1)} \end{bmatrix}, \ \text{ where } \ \hat{\gamma } = \begin{bmatrix} \hat{\gamma _1} \\ \hat{\gamma _2} \\ \hat{\gamma _3} \\ \ldots \\ \ldots \\ \hat{\gamma _r} \\ \hat{\gamma _{(r+1)}} \end{bmatrix}.$$

Indeed, \(E(\hat{\gamma }) =E\left( (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top S\right) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top E(S) = (\varTheta ^\top \varTheta )^{-1}\varTheta ^\top \varTheta \gamma = \gamma \).
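The general construction lends itself to a recursive implementation. The R sketch below (sub_d is a hypothetical name, and the eigenvalue-grouping tolerance is an implementation choice) follows the stages \(c_1,\ldots ,c_r\) of Theorem 1: each leaf of the recursion yields one sub-model, one entry of \(S\) and one row \((\theta _{1i_1},\ldots ,\theta _{ri_1\ldots i_r},1)\) of \(\varTheta \).

# A recursive sketch of the general Sub-D estimator (hypothetical helper name);
# M_list = list(M_1, ..., M_r) are the matrices of the restricted model (2),
# with M_{r+1} = I implicit.
sub_d <- function(y, M_list) {
  r <- length(M_list)
  S <- c(); Theta <- NULL
  recurse <- function(Cmat, theta, d) {        # Cmat = A_{(d-1)...} ... A_{1i_1}
    if (d > r) {                               # a leaf: one sub-model y_{i_1...i_r}
      z <- Cmat %*% y
      S     <<- c(S, sum(z^2) / nrow(Cmat))    # S_{i_1...i_r}^2
      Theta <<- rbind(Theta, c(theta, 1))      # (theta_{1i_1}, ..., theta_{ri_1...i_r}, 1)
      return(invisible(NULL))
    }
    e <- eigen(Cmat %*% M_list[[d]] %*% t(Cmat), symmetric = TRUE)
    v <- round(e$values, 8)
    for (t0 in unique(v)) {                    # one branch per distinct eigenvalue
      A <- t(e$vectors[, v == t0, drop = FALSE])
      recurse(A %*% Cmat, c(theta, t0), d + 1)
    }
  }
  recurse(diag(length(y)), numeric(0), 1)
  list(gamma = drop(solve(crossprod(Theta), crossprod(Theta, S))),
       S = S, Theta = Theta)
}

For \(r=2\), sub_d(y, list(M1, M2)) reproduces the estimator of Sect. 3.2.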

4 Numerical Results

In this section we carry out numerical tests of the sub-diagonalization method for the case \(r=2\), that is, for a model with 3 variance components. For this case we pick the particular model \(z \sim \mathscr {N}_{21}\left( X\beta , \gamma _1N_1 + \gamma _2N_2 + \gamma _3I_{21}\right) \), where \(N_j = X_jX_j^\top \), \(j=1,2\), with design matrices

$$X_1 = \begin{bmatrix} 1_5&0_5&0_5\\ 0_9&1_9&0_9\\ 0_7&0_7&1_7\end{bmatrix} \ \text{, } \ X_2 = \begin{bmatrix} 1_2&0_2&0_2 \\ 0_4&1_4&0_4\\ 0_8&0_8&1_8 \\ 1_4&0_4&0_4 \\ 0_3&1_3&0_3 \end{bmatrix},$$

and \(X=1_{21}\), where \(1_k\) and \(0_k\) denote, respectively, \(k \times 1\) vectors of ones and zeros.

Let \(B_o\) be a matrix whose columns are the eigenvectors associated to the null eigenvalues of \(\frac{1}{21}J_{21}\). Then \(B_oB_o^\top = I_{21} - \frac{1}{21}J_{21}\) and \(B_o^\top B_o = I_{20}\), and so the new model will be

$$y = B_o^\top z \sim \mathscr {N}_{20}\left( \mathbf{0 }_{20}, \ \gamma _1M_1 + \gamma _2M_2 + \gamma _3 I_{20}\right) ,$$

where \(M_d= B_o^\top N_dB_o\).

Since \(r(N_1)=3\), we have that (see Schott [12, Theorem 2.10c]) \(r(M_1)=\) \(r(B_0^\top N_1B_0)\le 3\). The eigenvalues of \(M_1\) are \(\theta _{11}=7.979829\), \(\theta _{12}= 5.639219\), and \(\theta _{13}=0\) (\(\theta _{13}\) with multiplicity (root) equal to 18). Thus we have that \(M_{11}^2 = A_{11}M_2A_{11}^\top = 5.673759\) and \(M_{22}^2 = A_{12}M_2A_{12}^\top = 0.6246537\) will be \(1 \times 1\) matrices, and \(M_{33}^2 = A_{13}M_2A_{13}^\top \) an \(18 \times 18\) matrix.

We have the following: \(M_{11}^2\) has eigenvalue \(\theta _{211} = 5.673759\); \(M_{22}^2\) has eigenvalue \(\theta _{221} = 0.6246537\); and \(M_{33}^2\) has 3 distinct eigenvalues: \(\theta _{231}=6.390202\), \(\theta _{232}=1.216148\), and \(\theta _{233}= 0\) (\(\theta _{233}\) with multiplicity equal to 16).

Finally we found that

$$S^\top = [190.779246 \ \ 8.866357\ \ 5.234293 \ \ 53.654627 \ \ 1.334877]$$

and \(\varTheta =\begin{bmatrix} 7.979829&5.6737590&1\\ 5.639219&0.6246537&1\\ 0&6.3902016&1 \\ 0&1.2161476&1 \\ 0&0&1 \end{bmatrix}\).

With \(\beta _k \sim \mathscr {N}_{3}\left( \mathbf{0 }_{3}, \gamma _kI_{3}\right) \), \(k=1,2\), and \(e \sim \mathscr {N}_{21}(\mathbf{0 }_{21}, \gamma _3I_{21})\), and taking \(\gamma _3=1\), the model can be rewritten as \(y = B_o^\top X_1\beta _1 + B_o^\top X_2\beta _2 + B_o^\top e\).

We consider \(\gamma _1\) and \(\gamma _2\) taking values in \(\{0.1, 0.25, 0.5, 0.75, 1, 2, 5, 10\}\). Thus, for each possible combination of \(\gamma _1\) and \(\gamma _2\), the model y is observed 1000 times, and for each observation the sub-diagonalization method is applied and the variance components are estimated. Tables 1 and 3 present the averages of the estimated values of \(\gamma _1\) and \(\gamma _2\), respectively. In order to compare the performance of the sub-diagonalization method with that of REML, the REML method is applied to the same 1000 observations of y and the results are presented in Tables 2 and 4.
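The simulation just described can be sketched as follows for a single pair \((\gamma _1,\gamma _2)\), using the hypothetical sub_d routine sketched in Sect. 3.4 and \(\gamma _3=1\); it illustrates the procedure only and is not meant to reproduce the exact table entries.

# A sketch of the simulation of this section for one (gamma1, gamma2) pair:
# 1000 replicates of y, each estimated with the hypothetical sub_d routine.
ones <- function(k) rep(1, k); zeros <- function(k) rep(0, k)
X1 <- rbind(cbind(ones(5), zeros(5), zeros(5)),
            cbind(zeros(9), ones(9), zeros(9)),
            cbind(zeros(7), zeros(7), ones(7)))
X2 <- rbind(cbind(ones(2), zeros(2), zeros(2)),
            cbind(zeros(4), ones(4), zeros(4)),
            cbind(zeros(8), zeros(8), ones(8)),
            cbind(ones(4), zeros(4), zeros(4)),
            cbind(zeros(3), ones(3), zeros(3)))
m  <- 21
Po <- matrix(1 / m, m, m)                       # projection onto R(1_21)
eo <- eigen(diag(m) - Po, symmetric = TRUE)
Bo <- eo$vectors[, eo$values > 0.5]             # 21 x 20
M1 <- t(Bo) %*% X1 %*% t(X1) %*% Bo
M2 <- t(Bo) %*% X2 %*% t(X2) %*% Bo
gamma1 <- 0.5; gamma2 <- 2; gamma3 <- 1         # one of the simulated pairs
set.seed(123)
est <- replicate(1000, {
  y <- t(Bo) %*% (X1 %*% rnorm(3, sd = sqrt(gamma1)) +
                  X2 %*% rnorm(3, sd = sqrt(gamma2)) +
                  rnorm(m, sd = sqrt(gamma3)))
  sub_d(y, list(M1, M2))$gamma
})
rowMeans(est)      # averages of hat(gamma1), hat(gamma2), hat(gamma3) over replicates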

Looking at the tables and comparing the average estimated values from the sub-diagonalization method with those of the REML method (see Tables 1, 2, 3, and 4), the reader may easily conclude that the results provided by the sub-diagonalization method are in general slightly more realistic. On the other hand, the variability of the sub-diagonalization estimates is relatively higher than that of the REML estimates (see Tables 5, 6, 7, and 8); this is because of the correlation between the sub-models. This gap will be addressed in future work.

5 Concluding Remarks

Besides its simple and fast computational implementation, since it depends only on the information retained in the eigenvalues of the design matrices and on the quadratic errors of the model, Sub-D provides centered (unbiased) estimates for both balanced and unbalanced designs, which is not the case for estimators based on ANOVA methods. As seen in Sect. 4, Sub-D provides slightly more realistic estimates than the REML estimator, but with more variability (when the model is balanced they have comparable variability). Moreover, in any computational program (source code) that we intend to share, package, or use repeatedly, efficiency must be considered, and for this matter the code run-time constitutes a good starting point. To compute the estimates and the corresponding variances for each pair \(\gamma _1\) and \(\gamma _2\) taking values in \(\{0.25,\ 0.5,\ 1,\ 2,\ 5,\ 10\}\), for 1000 observations of the model, we found that the Sub-D run-time is about 0.25 s while the REML run-time is about 35.53 s, which means that the code for Sub-D is more than 70 times faster than the one for REML. The code was run using the R software.

It seems that the slightly higher variability of Sub-D compared to the REML estimator is due to the correlation between the sub-models (for the case of models with three variance components, for example) \(y_{ij}\), \(i=1,\ldots ,h_1\), \(j=1, \ldots , h_{2i}\). From (10) we see that the variance-covariance matrix of the model \(w_2 = P_2P_1y\) is a blockwise matrix whose diagonal matrices are \(D_1,\ldots , D_{h_1}\), where \(D_i = diag(\lambda _{i1},\ldots ,\lambda _{ih_{2i}})\), corresponding to \(cov(y_{ij},y_{sk})\) for \(i=s\), \(j=k\), and whose off-diagonal matrices are the non-null matrices \(\gamma _2A_{2ij}W_{is}^2A_{2sk}^\top \), corresponding to \(cov(y_{ij},y_{sk})\) for \(i\ne s\). This problem will be handled in future work, where confidence regions will be obtained and hypothesis tests for the variance components will be derived.