1 Introduction

For linear mixed models with one variance component beyond the usual error, Wald (1947) proposed the first exact F test for the hypothesis that the variance component is zero. Lu and Zhang (2010) established the equivalence between the generalized likelihood ratio test and the traditional F test for no group effects in the balanced one-way random effects model; see also Herbach (1959). We extend this result to a much broader family of linear mixed effects models with one random effect, compound symmetry (CS) covariance structure, and other constraints, namely the generalized split-plot (GSP) models introduced in Christensen (1987). GSPs assume a CS covariance structure, i.e., homogeneous variances and constant correlations among repeated measures. GSP models include standard split-plot models with nearly any experimental design for the whole plot treatments and include some ability to incorporate covariates into split-plot models. Further, the standard multivariate growth curve models (cf. a standard multivariate text such as Johnson and Wichern) with compound symmetry covariance matrices are special cases of generalized split-plot models, as shown by Christensen (2019, Section 11.7.1). The balanced one-way random effects model is the simplest GSP model.

A general linear mixed model that specifies both fixed effects and random effects can be written as

$$\begin{aligned} Y={\tilde{X}}\beta +X_1\gamma +\epsilon , \end{aligned}$$
(1)

where Y is a vector of observations, \({\tilde{X}}\) is a known model matrix for fixed effects, \(\beta \) is an unobservable parameter vector of fixed effects, \(X_1\) is a known model matrix for random effects, \(\gamma \) is an unobservable vector of random effects, and \(\epsilon \) is a vector of residual errors with \({\mathbb {E}}(\epsilon )=0\), \({\text {Cov}}(\epsilon )=R\), \({\mathbb {E}}(\gamma )=0\), \({\text {Cov}}(\gamma )=D\), \({\text {Cov}}(\epsilon ,\gamma )=0\), and therefore \({\text {Cov}}(Y)=X_1DX_1^{'}+R\).

1.1 Notation

For the rest of this paper, we use the notation C(A) and r(A) to denote the column space and rank of the matrix A, respectively. In addition, the perpendicular projection operator (PPO) onto C(A) is denoted by \(M_A\) unless otherwise specified. In matrix notation, \(J_r^c\) and \(0_{r\times c}\) denote a matrix of ones and a matrix of zeros, respectively, each of size \(r\times c\). When \(c=1\), for simplicity we suppress c so that \(J_r\) and \(0_r\) denote a column vector of ones and a column vector of zeros, respectively, of length r. \(I_n\) is the identity matrix of size n while I, with no subscripts, has size that can be inferred from context. We use \(\text {diag}(V_1,\dots ,V_{N})\) to denote a block diagonal matrix with square matrices \(V_1, \dots , V_N\) on its diagonal and \(\text {Bdiag}(V)\) to denote a block diagonal matrix whose diagonal blocks are all V. Vertical lines denote the determinant when enclosing a matrix and absolute value when enclosing a number. L(.) and \(\ell (.)\) denote the likelihood and \(-2\) log-likelihood functions, respectively, while \(\ell _*(.)\equiv -\ell (.)\).

1.2 A class of linear mixed models of interest

Our class of linear mixed models of interest is of the form

$$\begin{aligned} Y=X_{*}\delta +X_2\gamma +(X_1\eta +\epsilon ), \end{aligned}$$
(2)

with \(\eta \sim N(0,\sigma _w^2 I_N)\), \(\epsilon \sim N(0,\sigma _s^2 I_n)\), \({\text {Cov}}(\epsilon ,\eta )=0_{n\times N}\), \(n=Nm\), \(\delta \) and \(\gamma \) fixed effects, \(C(X_{*})\subset C(X_1)\), and \(C([X_{*},X_2])= C(X_{*}, (I-M_1)X_2)\) where \(M_1\) is the PPO onto \(C(X_1)\). If \(V={\text {Cov}}(Y)\), then for this class of models it is immediate that \(C(VX)\subset C(X)\), given that \(X_1\) is a matrix of indicator variables for the clusters with equal group sizes. GSPs are the class of such linear models for which \(V=\left( \sigma _w^2+\sigma _s^2\right) \left[ (1-\rho )I_n+\rho \,\text {Bdiag}(J_mJ_{m}') \right] \) where \(\rho =\frac{\sigma _w^2}{\sigma _w^2+\sigma _s^2}\), \(\sigma _w^2\) is the variance between clusters or blocks, and \(\sigma _s^2\) is the variance within clusters or blocks.

1.2.1 Generalized split plot (GSP) models

We summarize the definition and analysis of GSP models from Christensen (1984, 1987 and 2011). Consider a two-stage cluster sampling model

$$\begin{aligned} Y=X\beta +\xi , \end{aligned}$$
(3)

with n observations, \(m_i\) subjects in the ith cluster, and fixed effects for each cluster. Let \(X=[X_1, X_2]\) where the columns of \(X_1\) are indicators for the clusters and \(X_2\) contains the other effects. Write \(\beta '=[\alpha ', \gamma ']\) so that \(\alpha \) is a vector of fixed cluster effects and \(\gamma \) is a vector of fixed non-cluster effects. Because it is a two-stage cluster sampling model, the error vector \(\xi \) has uncorrelated clusters and intraclass correlation structure and can be written with random effects as

$$\begin{aligned} \xi =X_1\eta +\epsilon , \end{aligned}$$
(4)

where \(\eta \) contains random cluster effects and \(\epsilon \) is a random error. Assume that \(\eta \) and \(\epsilon \) are independent with \(\eta \sim N(0,\sigma _w^2 I_N)\), \(\epsilon \sim N(0,\sigma _s^2 I_n)\), and \({\text {Cov}}(\epsilon ,\eta )=0_{n\times N}\), where \(n=\sum _{i=1}^N m_i\). We then get the mixed model

$$\begin{aligned} Y=X_1\alpha +X_2\gamma +(X_1\eta +\epsilon ), \end{aligned}$$
(5)

with

$$\begin{aligned} V \equiv {\text {Cov}}(Y)=\sigma _{w}^2X_1X_1'+\sigma _{s}^2I_{n}. \end{aligned}$$
(6)

The GSP models are obtained by imposing additional structure on the fixed cluster effects within the cluster sampling model (5). To model the whole plot (cluster) effects we put a constraint on \(C(X_1)\) by considering a reduced model

$$\begin{aligned} Y= & {} X_{*}\delta +X_2\gamma +(X_1\eta +\epsilon ), \qquad C(X_{*})\subset C(X_1)\nonumber \\\equiv & {} {\tilde{X}}\beta _{*}+\xi , \end{aligned}$$
(7)

where \({\tilde{X}}=[X_{*}, X_2]\) and \(\beta _{*}'=[\delta ^{'}, \gamma ^{'}]\), and the covariance matrix remains as in (6). The \(\delta _i\)s are the whole plot fixed effects and the \(\gamma _i\)s are the subplot fixed effects. In this model, \(\eta \) is the whole plot error and \(\epsilon \) is the subplot error.

To develop a traditional split-plot analysis for GSP models, we need three assumptions:

(a) \(m_i=m\), i.e., all whole plots are of the same size,

(b) \(C(X_{*})\subset C(X_1)\), i.e., \(\delta \) contains whole plot effects,

(c) \(C({\tilde{X}})= C(X_{*}, (I-M_1)X_2)\) where \(M_1\) is the PPO onto \(C(X_1)\), i.e., subplot effects are orthogonal to whole plots (not just whole plot effects).

In particular, with these three conditions, the least-squares estimates (LSEs) for model (7) are best linear unbiased estimates (BLUEs) and standard split-plot F and t statistics have null hypothesis F and t distributions, cf., Christensen (2011, Chapter 11).

We consider a GSP model with n total observations, \(r(M_1)=N\) whole plots and m subplots in each whole plot. In model (7), with covariance structure (6) and assumption (a) we can rewrite (6) as

$$\begin{aligned} V= & {} \sigma _{w}^2\text {Bdiag}(J_mJ_{m}')+\sigma _{s}^2I_{n}= \sigma _s^2 I_n + \sigma _w^2 m M_1. \end{aligned}$$
(8)

Also,

$$\begin{aligned} V= & {} \text {Bdiag}({\tilde{V}})=I_N \otimes {\tilde{V}}, \end{aligned}$$
(9)

where

$$\begin{aligned} {\tilde{V}}=\sigma _{w}^2J_mJ_{m}'+\sigma _{s}^2I_{m}=\left[ \begin{matrix} \sigma _{w}^2+\sigma _{s}^2 &{} \sigma _{w}^2 &{} \ldots &{} \sigma _{w}^2\\ \sigma _{w}^2 &{} \sigma _{w}^2+\sigma _{s}^2 &{} \ldots &{} \sigma _{w}^2\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \sigma _{w}^2 &{} \sigma _{w}^2 &{}\ldots &{} \sigma _{w}^2+\sigma _{s}^2 \end{matrix}\right] . \end{aligned}$$
(10)
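As a quick numerical sanity check of (8)-(10) and of the CS form given in Sect. 1.2, the following minimal sketch builds V as \(\sigma _s^2 I_n+m\sigma _w^2M_1\), as \(I_N \otimes {\tilde{V}}\), and in its compound symmetry form, and confirms the three agree; the values of N, m, \(\sigma _w^2\) and \(\sigma _s^2\) are arbitrary choices for illustration.

```python
# Sketch: verify that (8), (9) and the CS form of V coincide.
import numpy as np

N, m = 4, 3                                   # illustrative sizes; n = N*m
sig_w2, sig_s2 = 2.0, 1.5                     # illustrative variance components
n = N * m

X1 = np.kron(np.eye(N), np.ones((m, 1)))      # cluster indicators
M1 = X1 @ X1.T / m                            # PPO onto C(X1), as in (13)

V8 = sig_s2 * np.eye(n) + sig_w2 * m * M1     # form (8)
Vt = sig_w2 * np.ones((m, m)) + sig_s2 * np.eye(m)   # tilde V in (10)
V9 = np.kron(np.eye(N), Vt)                   # form (9)

rho = sig_w2 / (sig_w2 + sig_s2)
Vcs = (sig_w2 + sig_s2) * ((1 - rho) * np.eye(n)
                           + rho * np.kron(np.eye(N), np.ones((m, m))))

assert np.allclose(V8, V9) and np.allclose(V8, Vcs)
```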

In the general mixed model (1), exact statistical inference typically cannot be performed, although approximations to exact calculations can be made (Lavine et al. 2015). However, there are special cases, such as Wald's tests for variance components and split-plot models, where exact inferences are available. The analysis of split-plot designs is complicated by having two different errors in the model. If \(\sigma _w^2=0\), the model (7) reduces to the standard linear model

$$\begin{aligned} Y= & {} X_{*}\delta +X_2\gamma +\epsilon , \qquad {\mathbb {E}}(\epsilon )=0, {\text {Cov}}(\epsilon ) = \sigma ^2_e I_n \nonumber \\= & {} {\tilde{X}}\beta _{*}+\epsilon . \end{aligned}$$
(11)

In particular, if the whole plot model is a one-way ANOVA, i.e., the whole plot design is a completely randomized design (CRD), and the subplot effects involve only subplot main effects and whole plot by subplot interaction, then the model reduces to a two-way ANOVA with interaction. Further, if the whole plot one-way model is unbalanced, this yields an interesting two-way ANOVA model in which the number of observations on each factor pair ij is indexed by \(k=1,\ldots ,mN_i\) instead of the usual \(k=1,\ldots ,N_{ij}\), which makes whole plot treatments and subplot treatments orthogonal (Christensen 2011, Section 7.4). To test the appropriateness of the standard linear model (11), the hypothesis of interest is whether or not the covariance component for whole plots lies on the boundary of the parameter space, i.e.,

$$\begin{aligned} H_0: \sigma _{w}^2=0 \text { vs. } H_1: \sigma _{w}^2>0. \end{aligned}$$
(12)

There are three competing tests for the null hypothesis in (12): the likelihood ratio test (LRT), the traditional exact F-test (Wald's test), and the likelihood ratio test based on the residual likelihood (RLRT). These tests have been studied extensively over the past few decades (see Crainiceanu and Ruppert 2004; Greven et al. 2008; Wiencierz et al. 2011; Molenberghs and Verbeke 2007; Scheipl et al. 2008); however, studies of their equivalence in the context of split-plot designs are limited.

1.3 Objectives

In this work, we aim to show the inadmissibility of the LRT for testing the hypothesis that the between-cluster variance component in a linear mixed model with CS covariance structure and other constraints is zero (i.e. among GSP models). First, we derive explicit expressions for the maximum likelihood estimates (MLEs) of the variance components of the GSP model. Further, we show that the LRT and F tests are equivalent for testing (12) in GSP models when the size of the test, \(\alpha \), is reasonably small, and we derive the exact distribution of the LRT statistic under both \(H_0\) and \(H_1\). In particular, we show that the two tests are equivalent when \(\alpha \le 1-p\), where p is the probability that the LRT statistic is zero, and give a proof that the F test has larger power when \(\alpha >1-p\).

1.4 Additional notations

Let M, \(M_1\), \({\tilde{M}}\) and \(M_{*}\) be the PPOs onto C(X), \(C(X_1)\), \(C({\tilde{X}})\) and \(C(X_{*})\), respectively, so that

$$\begin{aligned} M_1=X_1(X_1^{'}X_1)^{-1}X_1^{'}=\text {Bdiag}\left( \frac{1}{m}J_mJ_{m}'\right) =\frac{1}{m} X_1 X_1', \end{aligned}$$
(13)

and define

$$\begin{aligned} M_2=(I-M_1)X_2\left( X_2^{'}(I-M_1)X_2\right) ^{-}X_2^{'}(I-M_1) \end{aligned}$$
(14)

to be the PPO onto \(C(X_1)^\perp _{C(X)}\) which under our assumptions is also the PPO onto \(C(X_*)^\perp _{C({\tilde{X}})}\). From Christensen (2011, Chapter 11) the projection operators satisfy

$$\begin{aligned} {\tilde{M}}=M_{*}+M_2,\quad M_{*}M_1=M_{*},\quad M_1M_2=0 \quad \text {and}\quad M=M_1+M_2. \end{aligned}$$
(15)

For completeness, these properties are reproven in Appendix 1. The sum of squares for whole plot error and subplot error are, respectively,

$$\begin{aligned} SSE(w) \equiv Y^{'}(M_1-M_{*})Y \quad \text {and}\quad SSE(s) \equiv Y^{'}(I-M)Y , \end{aligned}$$
(16)

such that

$$\begin{aligned} SSE(s)=Y^{'}(I-M)Y =Y^{'}(I-{\tilde{M}})Y-Y^{'}(M_1-M_{*})Y. \end{aligned}$$
(17)

The F statistic for testing (12) is defined as

$$\begin{aligned} F=\frac{MSE(w)}{MSE(s)}=\frac{SSE(w)/[r(X_1)-r(X_{*})]}{SSE(s)/[n-r(X)]}=\frac{SSE(w)/[N-r(X_{*})]}{SSE(s)/[n-r(X)]}. \end{aligned}$$
(18)
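To make the projection algebra concrete, the following sketch computes SSE(w), SSE(s) and the F statistic (18) for a small simulated GSP design; the design matrices (two whole-plot treatments in \(X_*\), subplot indicators in \(X_2\)) and the data are hypothetical choices satisfying assumptions (a)-(c), used here only for illustration.

```python
# Sketch: SSE(w), SSE(s) and F in (18) for a toy GSP design.
import numpy as np
from numpy.linalg import pinv, matrix_rank

rng = np.random.default_rng(0)
N, m = 4, 3
n = N * m

proj = lambda A: A @ pinv(A)                          # PPO onto C(A)
X1 = np.kron(np.eye(N), np.ones((m, 1)))              # whole plot indicators
Xs = np.kron([[1, 0], [1, 0], [0, 1], [0, 1]],
             np.ones((m, 1)))                         # X_*, with C(X_*) in C(X1)
X2 = np.kron(np.ones((N, 1)), np.eye(m))              # subplot indicators

M1, Ms = proj(X1), proj(Xs)
M = proj(np.hstack([X1, X2]))
Y = rng.standard_normal(n)                            # placeholder data

SSEw = Y @ (M1 - Ms) @ Y                              # (16)
SSEs = Y @ (np.eye(n) - M) @ Y
df1 = N - matrix_rank(Xs)                             # N - r(X_*)
df2 = n - matrix_rank(np.hstack([X1, X2]))            # n - r(X)
F = (SSEw / df1) / (SSEs / df2)                       # (18)
print(F, df1, df2)
```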

2 The equivalence between the LRT and F-test

2.1 Maximum likelihood estimators (MLEs)

The likelihood function for model (7) is:

$$\begin{aligned} L(\beta _{*},\sigma _w^2, \sigma _s^2|Y)=\frac{1}{(2\pi )^{n/2}\left| V \right| ^{1/2}}e^{-1/2(Y-{\tilde{X}}\beta _{*})'V^{-1}(Y-{\tilde{X}}\beta _{*})}, \end{aligned}$$
(19)

where

$$\begin{aligned} V=V(\sigma _w^2, \sigma _s^2) =m\sigma _w^2M_1+\sigma _s^2I_n. \end{aligned}$$
(20)

We provide three lemmas:

Lemma 1

The inverse of \(aI_n+bP\), where P is a PPO and a and b are real numbers such that \(a\ne 0\) and \(a\ne -b\), is

$$\begin{aligned} (aI_n+bP)^{-1}=\frac{1}{a}\left( I_n-\frac{b}{a+b}P \right) . \end{aligned}$$
(21)

Proof of Lemma 1

See Proposition 12.11.1 in Christensen (2011, p.322). \(\square \)

Lemma 2

The determinant of \(aI_n+bP\), where P is a PPO and a and b are real numbers, is

$$\begin{aligned} \left| aI_n+bP \right| =a^{n-r(P)} (a+b)^{r(P)}. \end{aligned}$$
(22)

Proof of Lemma 2

Any nonzero vector in \(C(P)^\perp \) is an eigenvector for the eigenvalue a, so a has multiplicity \(n-r(P)\). Similarly, any nonzero vector in C(P) is an eigenvector for the eigenvalue \(a+b\), so \(a+b\) has multiplicity r(P). The determinant is the product of the eigenvalues and hence \(\left| aI_n+bP \right| =a^{n-r(P)} (a+b)^{r(P)}.\) Appendix 2 contains an illustration. \(\square \)
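Both lemmas are easy to check numerically; the sketch below uses a random PPO and arbitrary illustrative values of a and b.

```python
# Sketch: numerical check of Lemmas 1 and 2 for a random PPO.
import numpy as np
from numpy.linalg import pinv, inv, det, matrix_rank

rng = np.random.default_rng(0)
n, a, b = 6, 2.0, 3.0
A = rng.standard_normal((n, 3))
P = A @ pinv(A)                                    # a PPO with r(P) = 3
r = matrix_rank(P)

lhs = inv(a * np.eye(n) + b * P)
rhs = (np.eye(n) - b / (a + b) * P) / a            # (21)
assert np.allclose(lhs, rhs)

assert np.isclose(det(a * np.eye(n) + b * P),
                  a ** (n - r) * (a + b) ** r)     # (22)
```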

Lemma 3

For \(q_1, q_2>0\), maximizing the function

$$\begin{aligned} g(x_1, x_2)=-\,\left[ \text {constant}+q_1\log (x_1)+q_2\log (x_2)+q_1\left( \frac{Q_1}{x_1}\right) +q_2\left( \frac{Q_2}{x_2}\right) \right] \nonumber \\ \end{aligned}$$
(23)

subject to the constraint \(x_2\ge x_1> 0\) gives a maximum at \((x_1, x_2)=(Q_1, Q_2)\) when \(Q_2>Q_1>0\) (i.e. when the maximizer is in the interior of the constraint region) or a maximum at \((x_1, x_2)=\left( \frac{q_1Q_1+q_2Q_2}{q_1+q_2}, \frac{q_1Q_1+q_2Q_2}{q_1+q_2}\right) \) otherwise (i.e. when the maximizer is on the boundary \(x_2=x_1\) of the constraint).

The proof is in Appendix 3.

Applying Lemmas 1 and 2 to V in (20) gives the following determinant and inverse covariance matrix:

$$\begin{aligned} \left| V \right| =\left| \sigma _s^2+m\sigma _w^2 \right| ^{N}\left| \sigma _s^2 \right| ^{n-N}, \end{aligned}$$
(24)

and

$$\begin{aligned} V^{-1}=\frac{1}{\sigma _s^2}I_{n}-\left( \frac{\sigma _w^2}{\sigma _s^2}\right) \frac{m}{\sigma _s^2+m\sigma _w^2}M_1. \end{aligned}$$
(25)

Substituting (24) in (19) and taking \(-2\) times the natural logarithm leads to

$$\begin{aligned} \ell (\beta _{*}, \sigma _w^2,\sigma _s^2|Y)&= n\log (2\pi )+N\log \left( \sigma _s^2+m\sigma _w^2\right) +N(m-1)\log (\sigma _s^2) \nonumber \\&\quad +(Y-{\tilde{X}}\beta _{*})'V^{-1}(Y-{\tilde{X}}\beta _{*}). \end{aligned}$$
(26)

Proposition 1

The Maximum Likelihood estimators for \(\beta _{*}\), \(\sigma _{w}^2\) and \(\sigma _{s}^2\) of model (7) are

$$\begin{aligned} \hat{\beta _{*}}= & {} \left( {\tilde{X}}'V^{-1}{\tilde{X}} \right) ^{-}{\tilde{X}}'V^{-1}Y=\left( {\tilde{X}}'{\tilde{X}} \right) ^{-}{\tilde{X}}'Y, \end{aligned}$$
(27)
$$\begin{aligned} {\hat{\sigma }}^2_w= & {} \frac{1}{m} max\left\{ 0, \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right\} , \text { and } \end{aligned}$$
(28)
$$\begin{aligned} {\hat{\sigma }}^2_s= & {} min\left\{ \frac{SSE(s)}{n-N}, \frac{SSE(s)+SSE(w)}{n} \right\} , \end{aligned}$$
(29)

such that the pair \({\hat{\sigma }}_w^2=0\) and \({\hat{\sigma }}_s^2=\frac{SSE(s)+SSE(w)}{n}\) occurs when \(\frac{SSE(w)}{N}\le \frac{SSE(s)}{n-N}\) and the other pair \({\hat{\sigma }}_w^2=\frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right] \) and \({\hat{\sigma }}_s^2=\frac{SSE(s)}{n-N}\) occurs otherwise.

Proof of Proposition 1

Differentiating (26) with respect to \(\beta _{*}\) and setting the partial derivative to zero, leads to

$$\begin{aligned} \hat{\beta _{*}}=\left( {\tilde{X}}'V^{-1}{\tilde{X}} \right) ^{-}{\tilde{X}}'V^{-1}Y. \end{aligned}$$
(30)

It is well known, for the fixed effects in mixed models, that the maximum likelihood estimates (MLEs) are also the best linear unbiased estimates (BLUEs), and Christensen (2011, Chapter 11) has shown, since \(C(V{\tilde{X}})\subset C({\tilde{X}})\), that the ordinary least squares estimates (OLSEs) are the BLUEs for \(\beta _{*}\), so they are also the maximum likelihood estimates. Therefore, \(\hat{\beta _{*}}=\left( {\tilde{X}}'V^{-1}{\tilde{X}} \right) ^{-}{\tilde{X}}'V^{-1}Y=\left( {\tilde{X}}'{\tilde{X}} \right) ^{-}{\tilde{X}}'Y\) and subsequently \({\tilde{X}}\hat{\beta _{*}}\) can be computed, using the OLSEs, as

$$\begin{aligned} {\tilde{X}}\hat{\beta _{*}}={\tilde{M}}Y. \end{aligned}$$
(31)

Note that the least squares estimates do not depend on the variance parameters so to find \({\hat{\sigma }}_{w}^2\) and \({\hat{\sigma }}_{s}^2\), we need to maximize

$$\begin{aligned} \ell _*({\hat{\beta }}_*, \sigma _w^2,\sigma _s^2|Y) \equiv -\ell (\hat{\beta _{*}}, \sigma _w^2,\sigma _s^2|Y)= & {} -[n\log (2\pi )+N(m-1)\log (\sigma _s^2)\nonumber \\&+\,N\log \left( \sigma _s^2+m\sigma _w^2\right) +{\varPsi }(\sigma _w^2,\sigma _s^2)],\nonumber \\ \end{aligned}$$
(32)

where \({\varPsi }(\sigma _w^2,\sigma _s^2) \equiv (Y-{\tilde{M}}Y)'V^{-1}(Y-{\tilde{M}}Y)\). To find closed form maximizers of \(\ell _*({\hat{\beta }}_*, \sigma _w^2,\sigma _s^2|Y)\), we re-express \({\varPsi }\) in terms of the sums of squares for error, which gives

$$\begin{aligned} \ell _*({\hat{\beta }}_*,\sigma _w^2,\sigma _s^2|Y)&= -\left[ n\log (2\pi )+N(m-1)\log (\sigma _s^2)+N\log \left( \sigma _s^2+m\sigma _w^2\right) \right. \nonumber \\&\quad \left. +\frac{SSE(s)}{\sigma _s^2}+\frac{SSE(w)}{\sigma _s^2+m\sigma _w^2}\right] . \end{aligned}$$
(33)

To apply Lemma 3 to \(\ell _*\), let \(q_1=N(m-1)=n-N\), \(q_2=N\), \(x_1=\sigma _s^2\), \(x_2=\sigma _s^2+m\sigma _w^2\), \(Q_1=\frac{SSE(s)}{q_1}\) and \(Q_2=\frac{SSE(w)}{q_2}\). A key point is \(x_2\ge x_1\) so our maximization has to be done subject to that constraint. Lemma 3 gives the maximizers

$$\begin{aligned}&\left( {\hat{\sigma }}_s^2, {\hat{\sigma }}_s^2+m{\hat{\sigma }}_w^2\right) =\left( \frac{SSE(s)}{n-N}, \frac{SSE(w)}{N}\right) \nonumber \\&\iff \left( {\hat{\sigma }}_s^2, {\hat{\sigma }}_w^2\right) =\left( \frac{SSE(s)}{n-N}, \frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N} \right] \right) \end{aligned}$$
(34)

when \(\frac{SSE(w)}{N}>\frac{SSE(s)}{n-N}\) and

$$\begin{aligned}&\left( {\hat{\sigma }}_s^2, {\hat{\sigma }}_s^2+m{\hat{\sigma }}_w^2\right) =\left( \frac{SSE(s)+SSE(w)}{n}, \frac{SSE(s)+SSE(w)}{n}\right) \nonumber \\&\iff \left( {\hat{\sigma }}_s^2, {\hat{\sigma }}_w^2\right) =\left( \frac{SSE(s)+SSE(w)}{n}, 0\right) \end{aligned}$$
(35)

when \(\frac{SSE(w)}{N}\le \frac{SSE(s)}{n-N}\).

Now, suppose that the MLE of \(\sigma _w^2\) is \({\hat{\sigma }}^2_w=\frac{1}{m} \left( \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right) \). Then \(\frac{SSE(s)}{n-N}\le \frac{SSE(w)}{N}\) so that

$$\begin{aligned} \frac{SSE(s)}{n-N}\le \left( \frac{n-N}{n}\right) \frac{SSE(s)}{n-N}+\left( \frac{N}{n}\right) \frac{SSE(w)}{N}=\frac{SSE(s)+SSE(w)}{n} \end{aligned}$$
(36)

where the inequality in (36) holds because the middle term is a weighted average of \(\frac{SSE(s)}{n-N}\) and \(\frac{SSE(w)}{N}\) and so is at least as large as the smaller of the two quantities being averaged; therefore the MLE of \(\sigma _s^2\) is the smaller of \(\frac{SSE(s)}{n-N}\) and \(\frac{SSE(s)+SSE(w)}{n}\). That is, taking the larger of 0 and \(\frac{1}{m} \left( \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right) \) for \({\hat{\sigma }}_w^2\) forces \({\hat{\sigma }}_s^2\) to be the smaller of \(\frac{SSE(s)}{n-N}\) and \(\frac{SSE(s)+SSE(w)}{n}\), and vice versa. Hence (34) and (35) can be written via max and min as

$$\begin{aligned} {\hat{\sigma }}^2_w=\frac{1}{m} max\left\{ 0, \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right\} \end{aligned}$$

and

$$\begin{aligned} {\hat{\sigma }}^2_s=min\left\{ \frac{SSE(s)}{n-N}, \frac{SSE(s)+SSE(w)}{n} \right\} \end{aligned}$$

\(\square \)

Note that the partial derivatives of (33) are

$$\begin{aligned} \frac{\partial \ell _*}{\partial \sigma _w^2}=-\,\left[ \frac{n}{\sigma _s^2+m\sigma _w^2}-\frac{m SSE(w)}{(\sigma _s^2+m\sigma _w^2)^2}\right] , \end{aligned}$$
(37)

and

$$\begin{aligned} \frac{\partial \ell _*}{\partial \sigma _s^2}=-\,\left[ \frac{N}{\sigma _s^2+m\sigma _w^2}+\frac{N(m-1)}{\sigma _s^2}-\frac{SSE(s)}{(\sigma _s^2)^2}-\frac{SSE(w)}{(\sigma _s^2+m\sigma _w^2)^2}\right] . \end{aligned}$$
(38)

So, for verification purposes, plugging the pair \({\hat{\sigma }}_w^2=0\) and \({\hat{\sigma }}_s^2=\frac{SSE(s)+SSE(w)}{n}\) into (38) and the other pair \({\hat{\sigma }}_w^2=\frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right] \) and \({\hat{\sigma }}_s^2=\frac{SSE(s)}{n-N}\) into (37) gives zero, as desired.
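The closed forms in Proposition 1 and the derivative check above are also easy to verify numerically; the sketch below uses illustrative values of SSE(w), SSE(s), N, n and m chosen so that the interior case applies.

```python
# Sketch: MLEs (28)-(29) and a check that (37)-(38) vanish at them.
import numpy as np

def mle_var(SSEw, SSEs, N, n, m):
    sw2 = max(0.0, SSEw / N - SSEs / (n - N)) / m    # (28)
    ss2 = min(SSEs / (n - N), (SSEs + SSEw) / n)     # (29)
    return sw2, ss2

def d_dw2(sw2, ss2, SSEw, n, m):                     # (37)
    t = ss2 + m * sw2
    return -(n / t - m * SSEw / t ** 2)

def d_ds2(sw2, ss2, SSEw, SSEs, N, m):               # (38)
    t = ss2 + m * sw2
    return -(N / t + N * (m - 1) / ss2 - SSEs / ss2 ** 2 - SSEw / t ** 2)

N, m, n = 5, 4, 20
SSEw, SSEs = 30.0, 15.0                              # interior case here
sw2, ss2 = mle_var(SSEw, SSEs, N, n, m)              # (1.25, 1.0)
print(sw2, ss2,
      d_dw2(sw2, ss2, SSEw, n, m),                   # 0.0
      d_ds2(sw2, ss2, SSEw, SSEs, N, m))             # 0.0
```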

3 Monotonic relationship between the LRT and F-test statistics

We show that the LRT statistic \({\varLambda }\) is a monotone function of the F-test statistic F for testing the null hypothesis in (12). When \({\varLambda }\) is not 0, the monotone relationship is strict, so whenever the size of the test \(\alpha \) is smaller than the probability \(1-p_m\) that \({\varLambda }\ne 0\), the tests are equivalent. We also examine the behavior of the tests when they are not equivalent (i.e. when \(\alpha >1-p_m\)). To establish this monotone relationship we need to distinguish between the sum of squared errors and model parameters under the reduced model in (11) versus the full model in (7). In particular, the sum of squares for errors under the reduced model is

$$\begin{aligned} SSE(e) \equiv SSE(w) + SSE(s). \end{aligned}$$
(39)

Similarly, we use \(\sigma _{e}^2\) to denote the variance parameter corresponding to the reduced model while \(\sigma _{w}^2\) and \(\sigma _{s}^2\) denote the variance parameters of the full model. Since the reduced and full models are nested, we present the equivalence between the LRT and F-test under the full model. Since there are two cases for the MLEs of \(\sigma _w^2\) and \(\sigma _s^2\), the relationship between the statistics \({\varLambda }\) and F decomposes into two cases as well.

Proposition 2

The LRT statistic \({\varLambda }\), for (12), is a monotone function of the F statistic. In particular, (case-I) when \({\hat{\sigma }}_{w}^2=0\) and \({\hat{\sigma }}_{s}^2=\frac{SSE(s)+SSE(w)}{n}\) we have

$$\begin{aligned} {\varLambda }=0, \end{aligned}$$
(40)

and (case-II) when \({\hat{\sigma }}_{w}^2=\frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N} \right] \) and \({\hat{\sigma }}_{s}^2=\frac{SSE(s)}{n-N}\) we have

$$\begin{aligned} {\varLambda }= & {} n\log \left( \frac{m-1}{m}\right) +N\log \left( \frac{N(n-r(X))}{(n-N)(N-r(X_{*}))}\right) \nonumber \\&+\,n\log \left( 1+\frac{N-r(X_{*})}{n-r(X)}F\right) +N\log \left( \frac{1}{F}\right) \end{aligned}$$
(41)

such that case-I occurs when \(F\le \kappa \) and case-II occurs when \(F>\kappa \) where

$$\begin{aligned} \kappa =\frac{n-r(X)}{(m-1)(N-r(X_{*}))}. \end{aligned}$$
(42)

In Case-II, the relationship is strictly monotone, so the tests are equivalent as long as the \(\alpha \) level is smaller than \(P(F >\kappa )\). The plot of \({\varLambda }\) as a function of F is presented in Fig. 1.

Fig. 1 A plot of the LRT statistic \({\varLambda }\) versus the F statistic according to (40) and (41). When \(F\le \kappa \), \({\varLambda }\) is constant and equal to 0

Proof of Proposition 2

To examine the variation between plots, we test the whole plot error as in (12). Note that, under \(H_0\) we get the reduced model (11) where \(\epsilon \sim N(0,\sigma _{e}^2I)\) with

$$\begin{aligned} L(\beta _{*},\sigma _{e}^2|Y)=\frac{1}{(2\pi \sigma _{e}^2)^{n/2}}e^{-\frac{1}{2\sigma _{e}^2}(Y-{\tilde{X}}\beta _{*})'(Y-{\tilde{X}}\beta _{*})}, \end{aligned}$$
(43)

and

$$\begin{aligned} \ell ({\hat{\beta }}_*,\sigma _{e}^2|Y)=n\log (2\pi )+n\log (\sigma _{e}^2)+\frac{SSE(e)}{\sigma _{e}^2}, \end{aligned}$$
(44)

so that the MLE of \(\sigma _{e}^2\) is

$$\begin{aligned} {\hat{\sigma }}_{e}^2=\frac{SSE(e)}{n}. \end{aligned}$$
(45)

In our case, (7) is the full model and (11) is the reduced one so that

$$\begin{aligned} {\varLambda }= & {} \ell _R ({\hat{\beta }}_*,{\hat{\sigma }}^2_e|Y)-\ell _F ({\hat{\beta }}_*,{\hat{\sigma }}^2_w,{\hat{\sigma }}^2_s|Y), \end{aligned}$$
(46)

where \(\ell _R ({\hat{\beta }}_*,{\hat{\sigma }}^2_e|Y)\equiv \inf \ell (\beta _*,\sigma ^2_e|Y)\) and \(\ell _F ({\hat{\beta }}_*,{\hat{\sigma }}^2_w,{\hat{\sigma }}^2_s|Y)\equiv \inf \ell (\beta _*,\sigma ^2_w,\sigma ^2_s|Y)\) (recall that \(\ell \) is \(-2\) times the log-likelihood, so its infimum corresponds to the supremum of the likelihood), and the R and F subscripts refer to the reduced and full models respectively. Using (39), then plugging (45) into (44) gives

$$\begin{aligned} \ell _R ({\hat{\beta }}_*,{\hat{\sigma }}^2_e|Y)=n\log \left( 2\pi \right) +n-n\log \left( n\right) +n\log \left( SSE(s)+SSE(w)\right) . \end{aligned}$$
(47)

Further, if we let \({\hat{\sigma }}_{s}^2\) and \({\hat{\sigma }}_{w}^2\) be the MLEs for \(\sigma _{s}^2\) and \(\sigma _{w}^2\) and plug them into (33) we get

$$\begin{aligned} \ell _F ({\hat{\beta }}_*,{\hat{\sigma }}^2_w,{\hat{\sigma }}^2_s|Y)= & {} n\log \left( 2\pi \right) +N(m-1)\log ({\hat{\sigma }}_{s}^2)+N\log ({\hat{\sigma }}_{s}^2+m{\hat{\sigma }}_{w}^2)\nonumber \\&+\frac{SSE(s)}{{\hat{\sigma }}_{s}^2}+\frac{SSE(w)}{{\hat{\sigma }}_{s}^2+m{\hat{\sigma }}_{w}^2}. \end{aligned}$$
(48)

Case-I: If \({\hat{\sigma }}_{w}^2=0\) and \({\hat{\sigma }}_{s}^2=\frac{SSE(s)+SSE(w)}{n}\) then

$$\begin{aligned} {\varLambda }= & {} n-n\log \left( n\right) +n\log \left( SSE(s)+SSE(w)\right) \nonumber \\&-n\log \left( \frac{SSE(s)+SSE(w)}{n}\right) -n=0. \end{aligned}$$
(49)

Case-II: If \({\hat{\sigma }}_{w}^2=\frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right] \) and \({\hat{\sigma }}_{s}^2=\frac{SSE(s)}{n-N}\) then, by (18), we get

$$\begin{aligned} {\varLambda }= & {} n\log \left( \frac{m-1}{m}\right) +N\log \left( \frac{N(n-r(X))}{(n-N)(N-r(X_{*}))}\right) \nonumber \\&+\,n\log \left( 1+\frac{N-r(X_{*})}{n-r(X)}F\right) +N\log \left( \frac{1}{F}\right) . \end{aligned}$$
(50)

We note that this case holds only when \({\hat{\sigma }}_{w}^2>0\) so F must be larger than \(\kappa \) since

$$\begin{aligned}&\frac{1}{m}\left[ \frac{SSE(w)}{N}-\frac{SSE(s)}{n-N}\right]> 0 \iff F>\frac{n-r(X)}{(m-1)(N-r(X_{*}))}=\kappa . \end{aligned}$$
(51)

The expression for \({\varLambda }\) in Case II, given in (50), is strictly increasing in F since

$$\begin{aligned}&\frac{\partial {\varLambda }}{\partial F}>0 \iff \frac{n\frac{N-r(X_{*})}{n-r(X)}}{1+\frac{N-r(X_{*})}{n-r(X)}F}-\frac{N}{F}>0 \iff F>\kappa . \end{aligned}$$
(52)

\(\square \)
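Proposition 2 translates directly into code; the sketch below evaluates \({\varLambda }\) as a function of F via (40) and (41) for illustrative values of N, n, \(r(X_*)\), \(r(X)\) and m, and confirms both the continuity \({\varLambda }\rightarrow 0\) as \(F\downarrow \kappa \) and the strict increase for \(F>\kappa \).

```python
# Sketch: Lambda as a function of F per (40)-(41).
import numpy as np

N, n, rXs, rX, m = 6, 24, 3, 15, 4           # illustrative; df1 = 3, df2 = 9
kappa = (n - rX) / ((m - 1) * (N - rXs))     # (42); equals 1 here

def lam(F):
    if F <= kappa:                           # case I, (40)
        return 0.0
    return (n * np.log((m - 1) / m)          # case II, (41)
            + N * np.log(N * (n - rX) / ((n - N) * (N - rXs)))
            + n * np.log(1 + (N - rXs) / (n - rX) * F)
            + N * np.log(1 / F))

# continuity at kappa, then strictly increasing values beyond it
print(lam(kappa + 1e-9), lam(2.0), lam(5.0))
```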

4 The distribution of the F-test and LRT statistics

The form of the relationship between \({\varLambda }\) and F helps us better understand the distribution of \({\varLambda }\). In particular, this relationship implies an important result (Lemma 4) on the distribution of \({\varLambda }\) for (7).

Proposition 3

The distribution of the F ratio in (18) for the model in (7) is a constant multiple of a central F distribution,

$$\begin{aligned} F=\frac{MSE(w)}{MSE(s)}\sim \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}F(N-r(X_{*}), n-r(X)). \end{aligned}$$
(53)

Proof of Proposition 3

It has been shown in Chapter 11.2 of Christensen (2011) that

$$\begin{aligned} W_1&{:=}&\frac{SSE(w)}{\sigma _s^2+m\sigma _w^2}=\frac{Y'(M_1-M_{*})Y}{\sigma _s^2+m\sigma _w^2}\sim \chi _{N-r(X_{*})}^2, \end{aligned}$$
(54)
$$\begin{aligned} W_2&{:=}&\frac{SSE(s)}{\sigma _s^2}=\frac{Y'(I-M)Y}{\sigma _s^2}\sim \chi _{n-r(X)}^2, \end{aligned}$$
(55)

and SSE(w) is independent of SSE(s). Thus, since the F-distribution arises from the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom, we have

$$\begin{aligned} F= & {} \frac{SSE(w)/[N-r(X_{*})]}{SSE(s)/[n-r(X)]}\nonumber \\= & {} \left( \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}\right) \frac{W_1/[N-r(X_{*})]}{W_2/[n-r(X)]}\sim \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}F(N-r(X_{*}), n-r(X)).\nonumber \\ \end{aligned}$$
(56)

\(\square \)

Now, if we let \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \) then \({\hat{\sigma }}_w^2=0 \iff F \le \kappa \iff W\le \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\), where the first equivalence is an exact relationship and the second holds only in distribution.

Lemma 4

The distribution of the LRT statistic \({\varLambda }\) for the model in (7) is determined by the relationship in (57), where \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \) and \(p_m\), a, and \(\tau \) are given in (58), (59) and (60).

$$\begin{aligned} {\varLambda }\sim \left\{ \begin{array}{ll} 0 &{} \quad \text {if}\quad W\le \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2} \\ \tau +N\log \left( \frac{(1+aW)^m}{W} \right) &{} \quad \text {if}\quad W> \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2} \end{array} \right. \end{aligned}$$
(57)

such that

$$\begin{aligned} p_m=Pr\left( W\le \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\right) , \end{aligned}$$
(58)

where

$$\begin{aligned} a=\frac{N-r(X_{*})}{n-r(X)}\frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}, \end{aligned}$$
(59)

and

$$\begin{aligned} \tau =N\log \left( \frac{(m-1)^{m-1}}{m^m}\frac{1}{a}\right) . \end{aligned}$$
(60)

Proof of Lemma 4

From case-I of Proposition 2 we know that \({\varLambda }=0\) iff \(F\le \kappa \) so

$$\begin{aligned} p_m\equiv & {} P({\varLambda }=0)= P\left( F\le \kappa \right) \nonumber \\= & {} P\left( \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}W\le \kappa \right) =P\left( W\le \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\right) . \end{aligned}$$
(61)

The equality in the second line of (61) holds due to Proposition 3.

Now, from case-II of Proposition 2 we also know that, if \(F> \kappa \) (i.e. when \(W> \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\)),

$$\begin{aligned} {\varLambda }= & {} n\log \left( \frac{m-1}{m}\right) +N\log \left( \frac{N(n-r(X))}{(n-N)(N-r(X_{*}))}\right) \nonumber \\&+\,n\log \left( 1+\frac{N-r(X_{*})}{n-r(X)}F\right) +N\log \left( \frac{1}{F}\right) . \end{aligned}$$
(62)

So, if we let \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \) then substituting F from (53) into (62) gives

$$\begin{aligned} {\varLambda }= & {} n\log \left( \frac{m-1}{m}\right) +N\log \left( \frac{N(n-r(X))}{(n-N)(N-r(X_{*}))}\right) \nonumber \\&+\,n\log \left( 1+\frac{N-r(X_{*})}{n-r(X)}F\right) +N\log \left( \frac{1}{F}\right) \nonumber \\\sim & {} \tau +N\log \left[ \frac{(1+aW)^m}{W}\right] , \end{aligned}$$
(63)

where \(a=\frac{N-r(X_{*})}{n-r(X)}\frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}\) and \(\tau =N\log \left( \frac{(m-1)^{m-1}}{m^m}\frac{1}{a}\right) \). \(\square \)

We note that \(\lim _{W \rightarrow \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}}{\varLambda }=0\). This can be verified by plugging \(W=\frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\) into the last line of (63). In practice, the probability mass at zero for the likelihood ratio test in (58) can be estimated numerically by plugging in the MLEs, as follows.

$$\begin{aligned} p_m\equiv & {} P({\varLambda }=0)= P\left( F\le \kappa \right) \nonumber \\= & {} P\left( W\le \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\right) \approx P\left( W\le \frac{\kappa \hat{\sigma _{s}}^2}{\hat{\sigma _{s}}^2+m{\hat{\sigma }}_{w}^2}\right) . \end{aligned}$$
(64)

Providing an exact distribution of the test statistic for the LRT in these models will help in calculating a more accurate p-value than the traditionally used p-value from the large sample chi-square mixture approximations introduced by Self and Liang (1987). The distribution of \({\varLambda }\) under the null hypothesis is obtained by substituting zero for \(\sigma _w^2\) in Lemma 4. This implies that, under \(H_0\), the probability that \({\varLambda }\) is zero does not depend on the parameter \(\sigma _s^2\) and equals

$$\begin{aligned} p_m=P\left( W\le \kappa \right) . \end{aligned}$$
(65)
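Under the stated F distribution, (64) and (65) are one-line computations; the sketch below uses the illustrative values \(df_1=3\), \(df_2=9\) and \(m=4\) that also appear in the worked examples of Sect. 5.

```python
# Sketch: probability mass at zero, (65) under H0 and the plug-in (64).
from scipy.stats import f

df1, df2, m = 3, 9, 4
kappa = df2 / ((m - 1) * df1)

p_null = f.cdf(kappa, df1, df2)                      # (65)

sig_s2_hat, sig_w2_hat = 3.0, 7.0                    # plug in the MLEs in practice
p_alt = f.cdf(kappa * sig_s2_hat / (sig_s2_hat + m * sig_w2_hat),
              df1, df2)                              # (64)
print(p_null, p_alt)                                 # approx 0.5637 and 0.0401
```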

GSP models offer remarkable advantages in the ability to perform exact calculations. For example, Crainiceanu and Ruppert (2004) derive the probability mass at zero for the likelihood ratio test in linear mixed models (LMM) with one variance component as

$$\begin{aligned} p_c=P\left( \frac{\sum _{s=1}^K\mu _{s,n}w_s^2}{\sum _{s=1}^{n-{\tilde{p}}}w_s^2}\le \frac{1}{n}\sum _{s=1}^K\xi _{s,n}\right) , \end{aligned}$$
(66)

where \(\mu _{s,n}\) and \(\xi _{s,n}\) are the K eigenvalues of the \(K\times K\) matrices \(X_1^{'}P_{0}X_1\) and \(X_1^{'}X_1\) respectively, the \(w_s\sim N(0,1)\) are independent, \(P_{0}=I_n-{\tilde{X}}\left( {\tilde{X}}^{'}{\tilde{X}}\right) ^{-1}{\tilde{X}}^{'}\) and \({\tilde{p}}\) is the dimensionality of the vector \(\beta _{*}\) in (7). We note that \({\tilde{p}}\) is the rank of the design matrix \({\tilde{X}}\), rather than the dimensionality of the vector \(\beta _{*}\), when a generalized inverse is used. We present the equivalence between the two formulas (65) and (66) in the discussion that follows.

Considering the model in (7), one can show that the eigenvalues \(\xi _{s,n}\), of \(X_1^{'}X_1\), are m with multiplicity N, and the eigenvalues \(\mu _{s,n}\), of \(X_1^{'}P_{0}X_1\), are m with multiplicity \(N-r(X_{*})\) and zero with multiplicity \(r(X_{*})\). Further, from (15), it is immediate that \(C(M_{*})\perp C(M-M_1)\) and \(C(M_1)\subset C(M)\), so

$$\begin{aligned} r({\tilde{X}})= & {} r({\tilde{M}})=r(M_{*}+M_2)=r(M_{*}+(M-M_1))\nonumber \\= & {} r(M_{*})+r(M-M_1)=r(M_{*})+r(M)-r(M_1)=r(X_{*})+r(X)-N.\nonumber \\ \end{aligned}$$
(67)

Thus, if we let \(w_s\sim N(0,1)\) then according to (66), the probability mass at zero for the likelihood ratio test in testing (12) is

$$\begin{aligned}&p_c=P\left( \frac{\sum _{s=1}^K\mu _{s,n}w_s^2}{\sum _{s=1}^{n-{\tilde{p}}}w_s^2}\le \frac{1}{n}\sum _{s=1}^K\xi _{s,n}\right) =P\left( \frac{\sum _{s=1}^{N-r(X_{*})}mw_s^2}{\sum _{s=1}^{n-(r(X_{*})+r(X)-N)}w_s^2}\le \frac{1}{n}\sum _{s=1}^{N}m\right) \nonumber \\\iff & {} p_c=P\left( \frac{\sum _{s=1}^{N-r(X_{*})}w_s^2}{\sum _{s=N-r(X_{*})+1}^{n-(r(X_{*})+r(X)-N)}w_s^2}\le \frac{1}{m-1}\right) =P\left( \frac{W_1}{W_2}\le \frac{1}{m-1}\right) \nonumber \\\iff & {} p_c=P\left( W\le \kappa \right) , \end{aligned}$$
(68)

which is the same as (65).
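The agreement can also be checked by Monte Carlo; the sketch below builds the eigenvalues of \(X_1'P_0X_1\) and \(X_1'X_1\) for a small, hypothetical GSP design (N = 4 whole plots of size m = 3, two whole-plot treatments, subplot indicators) and compares the simulated \(p_c\) from (66) with the closed form (65).

```python
# Sketch: Monte Carlo comparison of (66) with (65) for a toy design.
import numpy as np
from scipy.stats import f
from numpy.linalg import pinv, eigvalsh, matrix_rank

N, m = 4, 3
n = N * m
X1 = np.kron(np.eye(N), np.ones((m, 1)))             # cluster indicators
Xs = np.kron(np.kron(np.eye(2), np.ones((2, 1))),
             np.ones((m, 1)))                        # two whole-plot treatments
X2 = np.kron(np.ones((N, 1)), np.eye(m))             # subplot indicators
Xt = np.hstack([Xs, X2])                             # tilde X

P0 = np.eye(n) - Xt @ pinv(Xt)
mu = eigvalsh(X1.T @ P0 @ X1)                        # K = N eigenvalues
xi = eigvalsh(X1.T @ X1)

rng = np.random.default_rng(1)
w2 = rng.standard_normal((200_000, n - matrix_rank(Xt))) ** 2
num = w2[:, :N] @ np.sort(mu)[::-1]                  # nonzero mu's paired first
p_c = np.mean(num / w2.sum(axis=1) <= xi.sum() / n)  # (66) by simulation

df1 = N - matrix_rank(Xs)
df2 = n - matrix_rank(np.hstack([X1, X2]))
kappa = df2 / ((m - 1) * df1)
print(p_c, f.cdf(kappa, df1, df2))                   # the two should agree
```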

The formula in (65) is an easier way to obtain the probability mass at zero for the likelihood ratio test under \(H_0\) than (66). In fact, our result is notably compact relative to the derivation of Crainiceanu and Ruppert (2004); in general, it is not possible to give such a short and straightforward expression for the probability mass at zero of the LRT statistic. Also, to the best of our knowledge, this is the first time that an explicit mathematical form, Lemma 4, has been presented for the distribution of the LRT statistic for a variance component in a linear mixed model under the full model. This allows us to compute the power of the test through a formula instead of a Monte Carlo simulation.

5 Power comparison

This section includes four subsections. In Sect. 5.1, we illustrate the steps for computing the critical value and power of the F-test and give a concrete example. In Sect. 5.2, we illustrate the steps for computing the critical value and power of the LRT when \(\alpha \le 1-p_m\) (i.e. when no randomized test is needed) and give a concrete example; we use the very same example as for the F-test to show the equivalence in power numerically. These illustrative steps and examples do not use the relationship between the two test statistics \({\varLambda }\) and F, so Sect. 5.2 concludes with a short result on the equivalence between the two tests when \(\alpha \le 1-p_m\). Sect. 5.3 gives a detailed proof, without numerical examples, that the F-test has larger power than the LRT when \(\alpha > 1-p_m\). Sect. 5.4 discusses the practicality of the case \(\alpha > 1-p_m\).

5.1 The power of the F-test

The F-test statistic, by Proposition 3, has a distribution that is a constant multiple of an F distribution as \(F\sim \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}F(N-r(X_{*}), n-r(X))\). Since the F-test statistic is denoted by F and the F-distribution is traditionally known by the symbol F, to eliminate ambiguities, we let \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \). Then, at a given significance level \(\alpha \), the critical value C is computed under \(H_0\) as

$$\begin{aligned}&\alpha =P\left( F\ge C | H_0 \text { is true }\right) \iff \alpha =P\left( \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}W\ge C | \sigma _w^2=0\right) \nonumber \\&\quad \iff \alpha =P\left( W\ge W_{\alpha } \right) \nonumber \\&\quad \iff W_{\alpha }=G^{-1}(1-\alpha ), \end{aligned}$$
(69)

where G is the CDF of the \(F(N-r(X_{*}), n-r(X))\) distribution and, under \(H_0\), \(C=W_{\alpha }\). For example, if we let \(N-r(X_*)=3\) and \(n-r(X)=9\) then, for \(\alpha =0.05\), C is found as \(C=G^{-1}(1-\alpha )=G^{-1}(0.95)=3.86255.\)

If \(m=4\), \(\sigma _s^2=3\) and \(\sigma _w^2=7\), then the power of a size \(\alpha \) F-test is

$$\begin{aligned}&{\varXi }_F=P\left( F\ge C | H_a \text { is true }\right) \iff {\varXi }_F=P\left( \frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}W\ge W_{\alpha } \right) \nonumber \\&\quad \iff {\varXi }_F=1-G\left( \frac{\sigma _s^2}{\sigma _s^2+m\sigma _w^2}W_{\alpha } \right) . \end{aligned}$$
(70)

For example, if we let \(N-r(X_*)=3\), \(n-r(X)=9\), \(m=4\), \(\sigma _s^2=3\) and \(\sigma _w^2=7\) then, for \(\alpha =0.05\), the critical value is 3.86255 and the power is \({\varXi }_F=1-G\left( \frac{3}{31}\times 3.86255\right) =0.7741\).
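The two computations above amount to one quantile and one tail probability; a minimal sketch reproducing this worked example, assuming scipy is available, is as follows.

```python
# Sketch: critical value (69) and power (70) of the F-test for the example.
from scipy.stats import f

df1, df2 = 3, 9                      # N - r(X_*) and n - r(X)
m, sig_s2, sig_w2 = 4, 3.0, 7.0
alpha = 0.05

C = f.ppf(1 - alpha, df1, df2)       # critical value, 3.86255...
k = (sig_s2 + m * sig_w2) / sig_s2   # variance ratio, 31/3
power = f.sf(C / k, df1, df2)        # Xi_F = 1 - G(C/k), approx 0.7741
print(C, power)
```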

5.2 The power of the LRT when \(\alpha \le 1-p_m\)

The LRT statistic, by Lemma 4, has a mixture distribution as

$$\begin{aligned} {\varLambda }\sim \left\{ \begin{array}{ll} 0 &{} \quad \text {w.p}\quad p_m \\ \tau +N\log \left( \frac{(1+aW)^m}{W} \right) &{} \quad \text {w.p}\quad 1-p_m \end{array} \right. \end{aligned}$$
(71)

such that \(p_m\), \(\tau \) and a are defined in Lemma 4. Thus, at a given significance level \(\alpha \), the critical value \(C'\) is computed under \(H_0\) as

$$\begin{aligned}&\alpha =P\left( {\varLambda }\ge C' | H_0 \text { is true }\right) \nonumber \\&\iff (1-p_m) P\left( \tau +N\log \left( \frac{(1+aW)^m}{W} \right) \ge C' | \sigma _w^2=0, W>\kappa \right) =\alpha \nonumber \\&\iff C'=G'^{-1}\left( 1-\frac{\alpha }{1-p_m}\right) , \end{aligned}$$
(72)

where \(G'\) is the CDF of the transformed random variable \(\tau +N\log \left( \frac{(1+aW)^m}{W} \right) \) for \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \) when \(W>\kappa \) and \(\sigma _w^2=0\). For example, if we let \(N-r(X_*)=3\) and \(n-r(X)=9\) then, for \(\alpha =0.05\) under \(H_0\), \(\kappa =1\), \(p_m=0.56371\) and

$$\begin{aligned} {\varLambda }= \left\{ \begin{array}{ll} 0 &{} \quad \text {w.p}\quad 0.56371 \\ 6\log \left[ \frac{81}{256}\frac{\left( 1+\frac{W}{3}\right) ^4}{W} \right] &{} \quad \text {w.p}\quad 0.43629 \end{array} \right. \end{aligned}$$
(73)

so that \(C'\) is found, by numerical simulation or numerical integration after transformation, as \(C'=G'^{-1}(1-\frac{\alpha }{1-p_m})=G'^{-1}(0.8853973)=4.848.\) See Castellacci (2012) for more details on computing the quantiles of mixture distributions.

If \(m=4\), \(\sigma _s^2=3\) and \(\sigma _w^2=7\) then the power of a size \(\alpha \) LRT is

$$\begin{aligned}&{\varXi }_{LRT}=P\left( {\varLambda }\ge C' | H_a \text { is true }\right) \nonumber \\&\iff {\varXi }_{LRT}\nonumber \\&=(1-p_m)P\left( \tau +N\log \left( \frac{(1+aW)^m}{W} \right) \ge C' | \sigma _w^2>0, W>\frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\right) \nonumber \\&\iff {\varXi }_{LRT}=(1-p_m)\left[ 1-G^{''}(C')\right] \end{aligned}$$
(74)

where \(G^{''}\) is the CDF of the transformed random variable \(\tau +N\log \left( \frac{(1+aW)^m}{W} \right) \) for \(W\sim F\left( N-r(X_{*}), n-r(X)\right) \) when \(W> \frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}\) and \(\sigma _w^2>0\). For example, if we let \(N-r(X_*)=3\), \(n-r(X)=9\), \(m=4\), \(\sigma _s^2=3\) and \(\sigma _w^2=7\) then, for \(\alpha =0.05\) under \(H_1\), \(\kappa =1\), \(\frac{\kappa \sigma _s^2}{\sigma _s^2+m\sigma _w^2}=\frac{3}{31}\), \(p_m=0.04014\) and

$$\begin{aligned} {\varLambda }\sim \left\{ \begin{array}{ll} 0 &{} \quad \text {w.p}\quad 0.04014 \\ 6\log \left[ \frac{243}{7936}\frac{\left( 1+\frac{31W}{9}\right) ^4}{W} \right] &{} \quad \text {w.p}\quad 0.95986 \end{array} \right. \end{aligned}$$
(75)

so that \({\varXi }_{LRT}\) is found, by numerical simulation or numerical integration after transformation, as \({\varXi }_{LRT}=(1-p_m)\left[ 1-G^{''}(C')\right] =0.95986\left[ 1-G^{''}(4.848)\right] =0.7741.\)
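Both worked examples can be reproduced with a short Monte Carlo sketch that simulates the mixture in Lemma 4; the design constants below (N = 6, n = 24, so \(df_1=3\) and \(df_2=9\)) are the illustrative values implicit in (73) and (75).

```python
# Sketch: Monte Carlo critical value and power of the LRT per Lemma 4.
import numpy as np
from scipy.stats import f

df1, df2, m, N = 3, 9, 4, 6
sig_s2, sig_w2, alpha = 3.0, 7.0, 0.05
kappa = df2 / ((m - 1) * df1)                  # = 1 here

def lam(W, k):
    # Lambda as a function of W ~ F(df1, df2) per (57)-(60); k is the
    # variance ratio (sigma_s^2 + m*sigma_w^2)/sigma_s^2.
    a = (df1 / df2) * k
    tau = N * np.log((m - 1) ** (m - 1) / m ** m / a)
    out = tau + N * np.log((1 + a * W) ** m / W)
    return np.where(W > kappa / k, out, 0.0)   # point mass at zero

rng = np.random.default_rng(1)
W = f.rvs(df1, df2, size=1_000_000, random_state=rng)

lam0 = lam(W, k=1.0)                           # null case, sigma_w^2 = 0
Cp = np.quantile(lam0, 1 - alpha)              # critical value, approx 4.848

k1 = (sig_s2 + m * sig_w2) / sig_s2            # 31/3 under the H1 example
power = np.mean(lam(W, k1) >= Cp)              # approx 0.7741
print(Cp, power)
```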

Note that both test statistics \({\varLambda }\) and F are nonnegative, and whenever \({\varLambda } \ne 0\) there is a strict monotonic relationship; thus, when the LRT critical region does not include 0, the tests are the same. In fact, when \(\alpha \le 1-p_m\), the critical region consists of positive values on which \({\varLambda }\) is a strictly increasing function of F, and thus we have

Proposition 4

Let \(\alpha \) be the size of the test. If \(\alpha \le 1-p_m\) where \(p_m=P({\varLambda }=0| \sigma _w^2=0)\) then the F-test and LRT are equivalent and hence have the same power.

Proof of Proposition 4

Since \({\varLambda }\) can be written as

$$\begin{aligned} {\varLambda }\sim \left\{ \begin{array}{ll} 0 &{} \quad \text {w.p}\quad p_m \\ g(F) &{} \quad \text {w.p}\quad 1-p_m \end{array} \right. \end{aligned}$$
(76)

where g(.) is a strictly increasing function, the critical region of the LRT when \(\alpha \le 1-p_m\) does not involve 0 and hence the power can be calculated as

$$\begin{aligned}&{\varXi }_{LRT}=P\left( {\varLambda }\ge C' | H_a \text { is true }\right) \iff {\varXi }_{LRT}=P\left( g(F)\ge C' | \sigma _w^2>0\right) \nonumber \\&\quad \iff {\varXi }_{LRT}=P\left( F\ge C'' | \sigma _w^2>0\right) = {\varXi }_{F}. \end{aligned}$$
(77)

\(\square \)

Fig. 2 A plot of the LRT statistic versus the F-ratio showing their equivalence whenever \(\alpha \le 1-p_m\). \(W_{1-p_m}=\kappa \) is the minimal critical value at which the two tests are equivalent

Figure 2 illustrates the equivalence of the F-test and LRT whenever \(\alpha \le 1-p_m\) where \(p_m=P({\varLambda }=0| \sigma _w^2=0)\). Further, it clarifies why the two tests are equivalent as long as the critical value \(W_{\alpha }\) of the F-test is larger than \(W_{1-p_m}\); the minimal critical value at which the two tests are equivalent. In fact, under the null hypothesis of \(\sigma _w^2=0\) we have \(\lim _{W \rightarrow \kappa }{\varLambda }=0\).

5.3 Power comparison when \(\alpha > 1-p_m\)

In the case when \(\alpha >1-p_m\), the critical region of the LRT involves \({\varLambda }=0\) and hence involves randomization. We show mathematically, for this case, that the power of the F-test is larger than that of the LRT. Let \(k:=\frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}\). Firstly, we rewrite the power of a size \(\alpha \) F-test in terms of \(p_m\), the probability that \({\varLambda }=0\) under \(H_0\), as follows.

$$\begin{aligned} {\varXi }_F= & {} P\left( F\ge W_{\alpha }\right) =P(F\ge W_{1-p_m})+P(W_{\alpha }\le F\le W_{1-p_m})\nonumber \\= & {} P\left( kW\ge W_{1-p_m}\right) +P\left( W_{\alpha }\le kW\le W_{1-p_m}\right) \nonumber \\= & {} P\left( W\ge \frac{1}{k}W_{1-p_m}\right) +P\left( W\le \frac{1}{k}W_{1-p_m}\right) -P\left( W\le \frac{1}{k}W_{\alpha }\right) . \end{aligned}$$
(78)

Note that the first equality in (78) splits the rejection region at \(W_{1-p_m}\) (which requires \(W_{\alpha }\le W_{1-p_m}\), i.e. \(\alpha > 1-p_m\)), the second follows from Proposition 3, and the third uses \(P\left( a\le W\le b\right) =P\left( W\le b\right) -P\left( W\le a\right) \). Secondly, we rewrite the randomized LRT in terms of the F-test, according to the monotonic relationship between their test statistics and the smallest critical value, \(W_{1-p_m}=\kappa \), at which the F and LRT tests are equivalent, as follows.

$$\begin{aligned} \phi ({\varLambda })= \left\{ \begin{array}{lll} 1 &{} \quad \text {if}\quad {\varLambda }>0\\ \gamma &{} \quad \text {if}\quad {\varLambda }=0\\ 0 &{} \quad \text {if}\quad {\varLambda }<0 \end{array} \right. \iff \phi (F)= \left\{ \begin{array}{lll} 1 &{} \quad \text {if}\quad F>W_{1-p_m}\\ \gamma &{} \quad \text {if}\quad F\le W_{1-p_m} \end{array} \right. \end{aligned}$$
(79)

where \(\gamma \) is determined according to the size of the test as

$$\begin{aligned} \alpha= & {} E_{H_0}\phi ({\varLambda })\iff \alpha =P({\varLambda }>0|\sigma _w^2=0)+\gamma P({\varLambda }=0|\sigma _w^2=0)\nonumber \\\iff & {} \alpha =(1-p_m)+\gamma p_m\nonumber \\\iff & {} \gamma =\frac{\alpha -(1-p_m)}{p_m}. \end{aligned}$$
(80)

Hence, the power of the LRT is

$$\begin{aligned} {\varXi }_{LRT}= & {} P\left( F\ge W_{1-p_m}\right) +\frac{\alpha -(1-p_m)}{p_m}P\left( F\le W_{1-p_m}\right) \nonumber \\= & {} P\left( W\ge \frac{1}{k}W_{1-p_m}\right) +\frac{\alpha -(1-p_m)}{p_m}P\left( W\le \frac{1}{k}W_{1-p_m}\right) . \end{aligned}$$
(81)

Proposition 5

Let \(p_m=P({\varLambda }=0|\sigma _w^2=0)\). For GSP models with finite whole plot size m, if \(\alpha > 1-p_m\) then the power of the size \(\alpha \) F-test is larger than that of the LRT for testing \(\sigma _w^2=0\).

Table 1 The maximal values of \(\alpha \) satisfying the inequality \(\alpha \le P\left( W>\frac{n-r(X)}{(m-1)(N-r(X_{*}))}\right) \) for different combinations of the degrees of freedom \(df_1=N-r(X_*)\) and \(df_2=n-r(X)\) when \(m=2\)
Table 2 The maximal values of \(\alpha \) satisfying the inequality \(\alpha \le P\left( W>\frac{n-r(X)}{(m-1)(N-r(X_{*}))}\right) \) for different combinations of the degrees of freedom \(df_1=N-r(X_*)\) and \(df_2=n-r(X)\) when \(m=4\)

Proof of Proposition 5

It’s sufficient to show that

$$\begin{aligned}&P\left( W\le \frac{1}{k}W_{1-p_m}\right) -P\left( W\le \frac{1}{k}W_{\alpha }\right)>\frac{\alpha -(1-p_m)}{p_m}P\left( W\le \frac{1}{k}W_{1-p_m}\right) \nonumber \\&\quad \iff \frac{1}{p_m}P\left( W\le \frac{1}{k}W_{1-p_m}\right) >\frac{1}{1-\alpha }P\left( W\le \frac{1}{k}W_{\alpha }\right) . \end{aligned}$$
(82)

Since \(k=\frac{\sigma _s^2+m\sigma _w^2}{\sigma _s^2}>1\), this follows from Qeadan and Christensen (2014). \(\square \)

5.4 Is \(\alpha > 1-p_m\) practical?

The LRT and F-test are equivalent as long as the level of the test is smaller than or equal to \(P(W>\kappa )\) where \(\kappa =\frac{n-r(X)}{(m-1)(N-r(X_{*}))}\) and \(W\sim F_{N-r(X_*), n-r(X)}\). That is, the two tests are equivalent for all \(\alpha \)'s satisfying the inequality \(\alpha \le P\left( W>\frac{n-r(X)}{(m-1)(N-r(X_{*}))}\right) \). Table 1 presents the maximal values of \(\alpha \) satisfying this inequality for different combinations of the degrees of freedom \(df_1=N-r(X_*)\) and \(df_2=n-r(X)\) when \(m=2\). Since increasing m decreases \(\frac{n-r(X)}{(m-1)(N-r(X_{*}))}\) and thereby increases the maximal value of \(\alpha \) satisfying the inequality, a single additional table (Table 2), for \(m=4\), suffices to show how those maximal values behave as a function of m. The italicized cells of Table 1 represent the combinations of degrees of freedom for which the F-test has larger power than the LRT at the \(5\%\) significance level when \(m=2\); the same holds for Table 2 when \(m=4\). We observed from simulation, and prove below, that as m increases the power of the LRT approaches that of the F-test. Typically, in GSP models, the degrees of freedom for subplot error \(df_2\) are much larger than the degrees of freedom for whole plot error \(df_1\), so the case \(\alpha > 1-p_m\) is quite practical. A sketch for reproducing the tabulated maximal values is given below.
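The following minimal sketch computes the quantity tabulated in Tables 1 and 2, namely \(P(W>\kappa )\); the \((df_1, df_2)\) pairs shown are illustrative choices, not the tables' actual grid.

```python
# Sketch: maximal alpha for equivalence, P(W > kappa), per Tables 1 and 2.
from scipy.stats import f

def max_alpha(df1, df2, m):
    kappa = df2 / ((m - 1) * df1)
    return f.sf(kappa, df1, df2)            # P(W > kappa)

for m in (2, 4):
    for df1, df2 in ((3, 9), (5, 20), (10, 100)):
        print(m, df1, df2, round(max_alpha(df1, df2, m), 5))
```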

Proposition 6

For GSP models, if \(\alpha > 1-p_m\) then, for a size \(\alpha \) test, \({\varXi }_{LRT}\uparrow {\varXi }_{F}\) in testing \(\sigma _w^2=0\) as the whole plots size m approaches infinity.

Proof of Proposition 6

Recall that

$$\begin{aligned} {\varXi }_F=P\left( W\ge \frac{1}{k}W_{1-p_m}\right) +P\left( W\le \frac{1}{k}W_{1-p_m}\right) -P\left( W\le \frac{1}{k}W_{\alpha }\right) , \end{aligned}$$

and

$$\begin{aligned} {\varXi }_{LRT}=P\left( W\ge \frac{1}{k}W_{1-p_m}\right) +\frac{\alpha -(1-p_m)}{p_m}P\left( W\le \frac{1}{k}W_{1-p_m}\right) . \end{aligned}$$

From Proposition 5, we have established that, for a finite whole plot size m,

$$\begin{aligned} P\left( W\le \frac{1}{k}W_{1-p_m}\right) -P\left( W\le \frac{1}{k}W_{\alpha }\right) >\frac{\alpha -(1-p_m)}{p_m}P\left( W\le \frac{1}{k}W_{1-p_m}\right) . \end{aligned}$$

If we let \(m\uparrow \infty \) then \(k\uparrow \infty \) so that

$$\begin{aligned} \lim _{k\rightarrow \infty }P\left( W\le \frac{1}{k}W_{1-p_m}\right) =\lim _{k\rightarrow \infty }P\left( W\le \frac{1}{k}W_{\alpha }\right) =0, \end{aligned}$$

and thus the inequality becomes an equality, and as a result \({\varXi }_{LRT}\uparrow {\varXi }_{F}\). In fact, for \(m=\infty \) we have \({\varXi }_{LRT}={\varXi }_{F}=1\) since \(\lim _{k \rightarrow +\infty }P\left( W\ge \frac{1}{k}W_{1-p_m}\right) =1\). \(\square \)
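The convergence in Proposition 6 can be seen numerically; the sketch below fixes \(df_1=3\), \(df_2=9\), \(\sigma _s^2=3\), \(\sigma _w^2=7\) and, to remain in the \(\alpha > 1-p_m\) regime for every m, picks \(\alpha \) just above \(1-p_m\) at each m (a hypothetical choice made only for illustration), then evaluates the power gap via (78) and (81).

```python
# Sketch: the power gap Xi_F - Xi_LRT shrinks as m grows (Proposition 6).
from scipy.stats import f

df1, df2 = 3, 9
sig_s2, sig_w2 = 3.0, 7.0

for m in (2, 4, 8, 16, 32):
    kappa = df2 / ((m - 1) * df1)            # = W_{1-p_m}
    p_m = f.cdf(kappa, df1, df2)
    alpha = (1 - p_m) + 0.01                 # just above 1 - p_m (illustrative)
    k = (sig_s2 + m * sig_w2) / sig_s2
    W_a = f.ppf(1 - alpha, df1, df2)
    xi_F = f.sf(W_a / k, df1, df2)                                    # (78)
    gamma = (alpha - (1 - p_m)) / p_m
    xi_LRT = f.sf(kappa / k, df1, df2) + gamma * f.cdf(kappa / k, df1, df2)  # (81)
    print(m, round(xi_F - xi_LRT, 6))        # positive and decreasing to 0
```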

6 Conclusion

For a class of linear mixed models with CS covariance structure, we show the equivalence between the F-test and LRT when the level of the test, \(\alpha \), is less than or equal to one minus the probability, p, that the LRT statistic is zero. However, when \(\alpha > 1 - p\), we show that the F-test has larger power than the LRT, thus proving that the LRT is inadmissible. Further, we derive the distribution of the LRT statistic under both the null and alternative hypotheses \(H_0\) and \(H_1\), where \(H_0\) is the hypothesis that the between-cluster variance component is zero. This work proves an important property of the relationship between the F-test and the LRT for a large class of mixed effects models. Further, providing an exact distribution of the LRT statistic in these models allows calculating a more accurate p-value than the traditionally used p-value from the large-sample chi-square mixture approximations introduced by Self and Liang (1987).