1 Introduction

The Kenward–Roger (KR) test is widely used for testing linear hypotheses about fixed effects in normal mixed linear models. Following its introduction in 1997 (Kenward and Roger 1997), it has been cited in the literature more than 2500 times according to Google Scholar. Since 1999 (Stroup 1999) it has been incorporated in the SAS statistical package as the \(\hbox {DDFM}=\hbox {KR}\) option in the MODEL statement of its MIXED procedure (SAS Institute Inc 2015). The KR test has also been made available in the R package ‘pbkrtest’ (Halekoh and Højsgaard 2014) and can be found at Matlab Central as function ‘ddfmixed.m’ written by Witkovsky (2012). Simulation studies have shown that it performs well in a variety of mixed linear models (Schaalje et al. 2002; Guiard et al. 2003; Kowalchuk et al. 2004; Spilke et al. 2005; Chen 2006; Wimmer and Witkovsky 2007; Arnau et al. 2009; Wulff and Robinson 2009; Livacic-Rojas et al. 2010).

Consider a data vector \(\mathbf {y}\) whose distribution can be assumed to be multivariate normal with mean vector \(\mathbf {X}{\varvec{\beta }}\), where \({\varvec{\beta }}\) is a vector of fixed-effect parameters, and with an invertible variance–covariance matrix \({\varvec{\Sigma }}\) depending on a vector \({\varvec{\theta }}\) of variance–covariance parameters. We will assume the variance–covariance structure is linear. Suppose we want to test a linear hypothesis \(\mathrm {H}_0:\mathbf {L}'{\varvec{\beta }}= \mathbf {0}\) where \(\mathbf {L}\) is a matrix whose \(\ell \) columns are linearly independent. In developing their test, Kenward and Roger (1997) begin with the idea of a Wald test statistic of the form

$$\begin{aligned} T = {\hat{{\varvec{\beta }}}}'\mathbf {L}[\mathbf {L}'{\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}})\mathbf {L}]^{-1}\mathbf {L}'{\hat{{\varvec{\beta }}}} \end{aligned}$$
(1.1)

where \({\hat{{\varvec{\beta }}}}\) is an estimator of \({\varvec{\beta }}\) and \({\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}})\) is an estimator of \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\). For large samples it is often true that a Wald test statistic like T has approximately a chi-squared distribution with \(\ell \) degrees of freedom under the null hypothesis. For small samples, however, the null distribution of T may not be well approximated by this distribution. Kenward and Roger improve on the approximation in three ways:

  (a) They use an improved estimator \({\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}})\) based on results of Harville and coworkers (Kackar and Harville 1984; Harville and Jeske 1992).

  (b) They allow the approximating null distribution of T to be a scaled F distribution.

  (c) They modify the approximating null distribution to be exact in two special cases.

Improvements (a) and (b) mainly involve Taylor series approximations and a few convenient assumptions. The resulting formulas are somewhat complicated but the outline of their derivation in Kenward and Roger (1997) makes the formulas seem reasonable. The modified formulas used to achieve improvement (c), however, are more mysterious. Below we provide details of a derivation that justifies these formulas (see Sect. 7). We show that similar but different derivations lead to different formulas that also produce exact null distributions in the two special cases. The two alternative procedures in Sects. 8 and 9 have formulas and derivations that are somewhat simpler than the Kenward–Roger procedure. The simulation study reported in Sect. 11 suggests that the three procedures perform similarly, at least when testing the equality of treatment effects in a block-design model with random blocks.

Section 2 presents basic notation and assumptions; Sect. 3 contains notation and formulas for improvement (a); Sect. 4 describes improvement (b); a description and justification of improvement (c) are given in Sects. 5–7. The justification leads us to two variations on improvement (c) that are derived in Sects. 8 and 9. Section 10 gives some theoretical results about when the three modifications produce the same formulas. Simulation results are presented in Sect. 11. Details are provided in the “Appendix”.

2 The testing problem

Consider a random vector distributed according to a multivariate normal distribution:

$$\begin{aligned} \mathbf {y}{\mathop {=}\limits ^{\mathrm {d}}}\mathrm {N}_n(\mathbf {X}{\varvec{\beta }},{\varvec{\Sigma }}) \end{aligned}$$
(2.1)

with mean vector \(\mathrm {E}(\mathbf {y}) = \mathbf {X}{\varvec{\beta }}\) and variance–covariance matrix \({{\mathrm{Var}}}(\mathbf {y}) = {\varvec{\Sigma }}= {\varvec{\Sigma }}({\varvec{\theta }})\) where \(\mathbf {X}\) is a known \(n \times p\) matrix of full column rank, \({\varvec{\beta }}\) is a \(p \times 1\) vector of unknown fixed-effect parameters, and \({\varvec{\Sigma }}\) is an \(n \times n\) positive-definite matrix depending on an \(r \times 1\) vector \({\varvec{\theta }}\) of unknown variance–covariance parameters. We will assume the model includes an intercept term. Let \({\varvec{\Omega }}\) denote the set of allowable values of \({\varvec{\theta }}\). We will assume that one of the allowable variance–covariance matrices is the identity matrix \(\mathbf {I}_n\) (which is true for most models). We will assume that the variance–covariance structure is intrinsically linear, so that, perhaps after reparameterization,

$$\begin{aligned} {\varvec{\Sigma }}= \theta _1\mathbf {G}_1 + \cdots + \theta _r\mathbf {G}_r \end{aligned}$$

for known symmetric matrices \(\mathbf {G}_i\). Types of variance–covariance structures that are linear include variance–components, random-coefficient, Toeplitz, Huynh–Feldt, and banded structures, as well as the unstructured covariance matrix. Two technical assumptions (which are satisfied for most models) are that the matrices \(\mathbf {G}_i\) are linearly independent and that the set \({\varvec{\Omega }}\) contains a nonempty open subset of \(\mathbb {R}^r\).

Consider the problem of testing a linear hypothesis \(\mathrm {H}_0:\mathbf {L}'{\varvec{\beta }}= \mathbf {0}\) where \(\mathbf {L}\) is a known \(p \times \ell \) matrix of full column rank. A general approach to testing such a hypothesis is to form a Wald-type test statistic of the form

$$\begin{aligned} T = (\mathbf {L}'{\hat{{\varvec{\beta }}}})'[{\hat{\mathrm{V}}}\mathrm{ar}(\mathbf {L}'{\hat{{\varvec{\beta }}}})]^{-1}(\mathbf {L}'{\hat{{\varvec{\beta }}}}) = {\hat{{\varvec{\beta }}}}'\mathbf {L}[\mathbf {L}'{\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}})\mathbf {L}]^{-1}\mathbf {L}'{\hat{{\varvec{\beta }}}} \end{aligned}$$

where \({\hat{{\varvec{\beta }}}}\) is an estimator of \({\varvec{\beta }}\) and \({\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}})\) is an estimator of \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\). According to asymptotic theory, if the sample size is large (and suitable assumptions are met), one can test the null hypothesis by rejecting it if T is greater than a critical value of the \(\chi ^2(\ell )\) distribution. This may not be a good test, however, when the sample size is small. As stated in Sect. 1, Kenward and Roger (1997) introduced improvements (a), (b) and (c) in order to make the test perform better in small samples.

3 Choosing an estimator \({\hat{{\varvec{\beta }}}}\) and an estimator of its variance–covariance matrix

Kenward and Roger (1997) choose \({\hat{{\varvec{\beta }}}}\) to be an estimated generalized least-squares estimator (EGLSE), that is,

$$\begin{aligned} {\hat{{\varvec{\beta }}}} = (\mathbf {X}'{\hat{{\varvec{\Sigma }}}}^{-1}\mathbf {X})^{-1}\mathbf {X}'{\hat{{\varvec{\Sigma }}}}^{-1}\mathbf {y}\end{aligned}$$

where \({\hat{{\varvec{\Sigma }}}} = {\varvec{\Sigma }}({\hat{{\varvec{\theta }}}})\) and \({\hat{{\varvec{\theta }}}}\) is an estimator of \({\varvec{\theta }}\). They choose \({\hat{{\varvec{\theta }}}}\) to be the residual maximum likelihood estimator (REMLE) of \({\varvec{\theta }}\).

Let \({\tilde{{\varvec{\beta }}}} = {\tilde{{\varvec{\beta }}}}({\varvec{\theta }}) = (\mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {X})^{-1}\mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {y}\), which is often called a generalized least-squares estimator (GLSE). (See the remark below.) Note that the EGLSE can be written as \({\hat{{\varvec{\beta }}}} = {\tilde{{\varvec{\beta }}}}({\hat{{\varvec{\theta }}}})\). Kackar and Harville (1984) expressed

$$\begin{aligned} {{\mathrm{Var}}}({\hat{{\varvec{\beta }}}}) = {\varvec{\Phi }}+ {\varvec{\Lambda }}\end{aligned}$$

where

$$\begin{aligned} {\varvec{\Phi }}= {\varvec{\Phi }}({\varvec{\theta }}) = (\mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {X})^{-1} = {{\mathrm{Var}}}({\tilde{{\varvec{\beta }}}}) \quad \text {and}\quad {\varvec{\Lambda }}= {{\mathrm{Var}}}({\hat{{\varvec{\beta }}}} - {\tilde{{\varvec{\beta }}}}) \end{aligned}$$

and they approximated \({\varvec{\Lambda }}\) by

$$\begin{aligned} {\tilde{{\varvec{\Lambda }}}} = {\tilde{{\varvec{\Lambda }}}}({\varvec{\theta }}) = {\varvec{\Phi }}\biggl [\,\sum _{i=1}^r\sum _{j=1}^r w_{ij}(\mathbf {Q}_{ij}-\mathbf {P}_i{\varvec{\Phi }}\mathbf {P}_j)\biggr ]{\varvec{\Phi }}\end{aligned}$$

where \(\mathbf {W}= [w_{ij}]_{r\times r} = {{\mathrm{Var}}}({\hat{{\varvec{\theta }}}})\) and

$$\begin{aligned} \mathbf {P}_i&= \mathbf {P}_i({\varvec{\theta }}) = -\mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {G}_i{\varvec{\Sigma }}^{-1}\mathbf {X}\\ \mathbf {Q}_{ij}&= \mathbf {Q}_{ij}({\varvec{\theta }}) = \mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {G}_i{\varvec{\Sigma }}^{-1}\mathbf {G}_j{\varvec{\Sigma }}^{-1}\mathbf {X}. \end{aligned}$$

Remark

In general, the GLSE depends on the value of the unknown parameter vector \({\varvec{\theta }}\) and therefore is not a true estimator because it cannot be calculated from the observed data. A true estimator of \({\varvec{\beta }}\) is obtained by substituting an estimator \({\hat{{\varvec{\theta }}}}\) for \({\varvec{\theta }}\), thus producing an EGLSE. For special models, the GLSE does not depend on the value of \({\varvec{\theta }}\). This happens if and only if the model satisfies the condition that, for all allowable \({\varvec{\Sigma }}\), the column space of \({\varvec{\Sigma }}\mathbf {X}\) is contained in the column space of \(\mathbf {X}\) (Zyskind 1967, Theorem 2). For such models, the GLSE of \({\varvec{\beta }}\) coincides with the LSE and is a uniformly best linear unbiased estimator. Zyskind’s condition is met by models such as balanced mixed-effects classification models that are ‘proper’ as defined by VanLeeuwen et al. (1999). Most, if not all, mixed-effects classification models used in practice are proper.

The REMLE \({\hat{{\varvec{\theta }}}}\) is derived from a residual model not involving \({\varvec{\beta }}\). Let \({\tilde{\mathbf {W}}} = {\tilde{\mathbf {W}}}({\varvec{\theta }})\) denote the inverse of the expected information matrix for the residual model. We can approximate \(\mathbf {W}\) by \({\hat{\mathbf {W}}} = {\tilde{\mathbf {W}}}({\hat{{\varvec{\theta }}}})\). Also define \({\hat{{\varvec{\Phi }}}} = {\varvec{\Phi }}({\hat{{\varvec{\theta }}}})\), \({\hat{{\varvec{\Lambda }}}} = {\tilde{{\varvec{\Lambda }}}}({\hat{{\varvec{\theta }}}})\), \({\hat{\mathbf {P}}}_i = \mathbf {P}_i({\hat{{\varvec{\theta }}}})\), \({\hat{\mathbf {Q}}}_{ij} = \mathbf {Q}_{ij}({\hat{{\varvec{\theta }}}})\).

The matrix \({\hat{{\varvec{\Phi }}}}\) has traditionally been used as a convenient estimator for \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\), but it tends to underestimate this quantity. First, although \({\hat{{\varvec{\Phi }}}}\) is a sensible estimator for \({\varvec{\Phi }}\), \({\varvec{\Phi }}\) is not the same as \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\) unless \({\varvec{\Lambda }}= \mathbf {0}\) (which happens if and only if the model satisfies the condition of Zyskind (1967) mentioned in the remark above). The formula \({\varvec{\Phi }}+ {\tilde{{\varvec{\Lambda }}}}\) from Kackar and Harville (1984) is a more accurate approximation for \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\) than \({\varvec{\Phi }}\) is, and correspondingly, \({\hat{{\varvec{\Phi }}}} + {\hat{{\varvec{\Lambda }}}}\) is a better estimator of \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\) than \({\hat{{\varvec{\Phi }}}}\) is. But \({\hat{{\varvec{\Phi }}}} + {\hat{{\varvec{\Lambda }}}}\) still tends to underestimate \({{\mathrm{Var}}}({\hat{{\varvec{\beta }}}})\). The bias is reduced by using an adjusted estimator

$$\begin{aligned} {\hat{{\varvec{\Phi }}}}_\mathrm {A}= {\hat{{\varvec{\Phi }}}} + 2{\hat{{\varvec{\Lambda }}}} \end{aligned}$$

(see Harville and Jeske 1992, Sect. 4.2).

Kenward and Roger (1997) use the test statistic T with \({\hat{\mathrm{V}}}\mathrm{ar}({\hat{{\varvec{\beta }}}}) = {\hat{{\varvec{\Phi }}}}_\mathrm {A}\) and rescale it by dividing by \(\ell \):

$$\begin{aligned} F_{\mathrm {KR}} = \frac{1}{\ell }{\hat{{\varvec{\beta }}}}'\mathbf {L}(\mathbf {L}'{\hat{{\varvec{\Phi }}}}_\mathrm {A}\mathbf {L})^{-1}\mathbf {L}'{\hat{{\varvec{\beta }}}}. \end{aligned}$$
(3.1)

4 Approximating the null distribution of the KR test statistic

Kenward and Roger (1997) approximate the null distribution of \(F_{\mathrm {KR}}\) by supposing that there are positive numbers m and \(\lambda \) such that the null distribution of \(\lambda F_{\mathrm {KR}}\) is approximately an F distribution with \(\ell \) numerator degrees of freedom and m denominator degrees of freedom. It is not required that m be an integer. The values of \(\lambda \) and m are determined by matching moments.

Generally there are no exact formulas for the moments \(\mathrm {E}(F_{{\mathrm {KR}}})\) and \({{\mathrm{Var}}}(F_{{\mathrm {KR}}})\), but by using Taylor series expansions, Kenward and Roger (1997) obtain the following approximate formulas:

$$\begin{aligned} \mathrm {E}(F_{{\mathrm {KR}}}) \approx E^\# = 1 + \frac{1}{\ell }A_2 \quad \text {and}\quad {{\mathrm{Var}}}(F_{{\mathrm {KR}}}) \approx V^\# = \frac{2}{\ell }(1 + B) \end{aligned}$$
(4.1)

where

$$\begin{aligned} A_2&= \sum _{i=1}^r\sum _{j=1}^r {\hat{w}}_{ij}{{\mathrm{tr}}}({\hat{{\varvec{\Psi }}}}{\hat{\mathbf {P}}}_i{\hat{{\varvec{\Psi }}}}{\hat{\mathbf {P}}}_j)\\ B&= \frac{1}{2\ell }(A_1 + 6A_2)\\ A_1&= \sum _{i=1}^r\sum _{j=1}^r {\hat{w}}_{ij}{{\mathrm{tr}}}({\hat{{\varvec{\Psi }}}}{\hat{\mathbf {P}}}_i){{\mathrm{tr}}}({\hat{{\varvec{\Psi }}}}{\hat{\mathbf {P}}}_j)\\ {\varvec{\Psi }}&= {\varvec{\Psi }}({\varvec{\theta }}) = {\varvec{\Phi }}\mathbf {L}(\mathbf {L}'{\varvec{\Phi }}\mathbf {L})^{-1}\mathbf {L}'{\varvec{\Phi }},\quad {\hat{{\varvec{\Psi }}}} = {\varvec{\Psi }}({\hat{{\varvec{\theta }}}}). \end{aligned}$$

We can approximate the null distribution of \(F_{{\mathrm {KR}}}\) by

$$\begin{aligned} \lambda ^\# F_{{\mathrm {KR}}} {\mathop {\approx }\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m^\#) \end{aligned}$$
(4.2)

where \(\lambda ^\#\) and \(m^\#\) are chosen so that the approximate moments of \(\lambda ^\#F_{{\mathrm {KR}}}\) match the exact moments of \(\mathrm {F}(\ell ,m^\#)\):

$$\begin{aligned} \lambda ^\# E^\#&= \mathrm {E}[\mathrm {F}(\ell ,m^\#)] = \frac{m^\#}{m^\# - 2}\\ (\lambda ^\#)^2 V^\#&= {{\mathrm{Var}}}[\mathrm {F}(\ell ,m^\#)] = 2\left( \frac{m^\#}{m^\# - 2}\right) ^{2}\frac{(\ell + m^\#- 2)}{\ell (m^\# - 4)}. \end{aligned}$$

Solve for \(\lambda ^\#\) and \(m^\#\):

$$\begin{aligned} m^\# = 4 + \frac{\ell + 2}{\ell \rho ^\# - 1} \quad \text {and}\quad \lambda ^\# = \frac{m^\#}{(m^\# - 2)E^\#} \end{aligned}$$
(4.3)

where

$$\begin{aligned} \rho ^\# = \frac{V^\#}{2(E^\#)^2}. \end{aligned}$$

5 First special case: balanced ANOVA model

Consider a balanced one-way ANOVA fixed-effects model,

$$\begin{aligned} y_{ij} = \mu _i + e_{ij} \end{aligned}$$

for \(i = 1,\ldots ,t\), \(j = 1,\ldots ,v\). The quantities \(\mu _i\) are unknown fixed effects and the \(e_{ij}\) are unobservable i.i.d. random variables from a \(\mathrm {N}(0,\sigma ^2)\) population. Let us test \(\mathrm {H}_0:\mu _1 =\cdots = \mu _t\). This is a special case of the testing problem in Sect. 2 with \(n = tv\), \(p = t\), \(r = 1\) and \(\ell = t - 1\) (see A.1 in the “Appendix”). One can calculate:

$$\begin{aligned} {\hat{{\varvec{\Sigma }}}}&= {\hat{\sigma }}^2\mathbf {I}_n,\quad {\hat{\sigma }}^2 ={\hat{\sigma }}_{\text {REML}}^2 =\frac{\sum _{i=1}^{t}\sum _{j=1}^{v}(y_{ij}-\bar{y}_{i\cdot })^2}{n-t}\\ {\hat{\mu }}_i&=\bar{y}_{i\cdot },\quad {\hat{{\varvec{\Phi }}}}_{\mathrm {A}}={\hat{{\varvec{\Phi }}}}=\left( \frac{{\hat{\sigma }}^2}{v}\right) \mathbf {I}_t\\ F_{{\mathrm {KR}}}&= \frac{v\sum _{i=1}^{t}(\bar{y}_{i\cdot }-\bar{y}_{\cdot \cdot })^{2}/(t-1)}{{\hat{\sigma }}^2}. \end{aligned}$$

It is well known that in a normal balanced one-way ANOVA fixed-effects model, the null distribution of the statistic \(F_{{\mathrm {KR}}}\) above is exactly the \(\mathrm {F}(t-1,n-t)\) distribution (Kuehl 2000, p. 57). That is,

$$\begin{aligned} \lambda F_{{\mathrm {KR}}} {\mathop {=}\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m) \quad \text {with}\quad m = n - t \quad \text {and}\quad \lambda = 1. \end{aligned}$$
(5.1)

To see whether the values of \(m^\#\) and \(\lambda ^\#\) (defined in Sect. 4 above) are equal to the “ideal” values \(n-t\) and 1, one can calculate:

$$\begin{aligned} E^\#&= 1 + \frac{2}{n - t},\quad V^\# = \frac{2}{t - 1}\left( 1 + \frac{t + 5}{n - t}\right) \end{aligned}$$
(5.2)
$$\begin{aligned} m^\#&= 4 + (n - t)\frac{[1 + 2/(n-t)]^2}{1 - 4/[(t + 1)(n - t)]} > n - t \end{aligned}$$
(5.3a)
$$\begin{aligned} \lambda ^\#&= \frac{m^\#(n - t)}{(m^\# - 2)(n - t + 2)} = \left( 1 + \frac{2}{m^\# - 2}\right) \left( 1 - \frac{2}{n - t + 2}\right) < 1. \end{aligned}$$
(5.3b)

Neither \(m^\#\) nor \(\lambda ^\#\) has the ideal value in this case.

6 Second special case: Hotelling T-squared test

Suppose \(\mathbf {y}_1,\ldots ,\mathbf {y}_v\) are v independent and identically distributed random \(p \times 1\) vectors with

$$\begin{aligned} \mathbf {y}_k{\mathop {=}\limits ^{\mathrm {d}}}\mathrm {N}_p({\varvec{\mu }},{\varvec{\Sigma }}_p) \end{aligned}$$

for \(k = 1,\ldots ,v\). Let us test \(\mathrm {H}_0:{\varvec{\mu }}= \mathbf {0}\). This is a special case of the testing problem in Sect. 2 with \(n = vp\), \(p = p\), \(r = p(p + 1)/2\) and \(\ell = p\) (see A.5 in the “Appendix”). One can calculate:

$$\begin{aligned} {\hat{{\varvec{\Sigma }}}}&= \mathbf {I}_v \otimes \mathbf {S},\quad \mathbf {S}= {\hat{{\varvec{\Sigma }}}}_{p\text {REML}} =\frac{\sum _{k=1}^{v}(\mathbf {y}_k-\bar{\mathbf {y}}_{\cdot })(\mathbf {y}_k-\bar{\mathbf {y}}_{\cdot })'}{v-1}\\ {\hat{{\varvec{\mu }}}}&= \bar{\mathbf {y}}_{\cdot },\quad {\hat{{\varvec{\Phi }}}}_{\mathrm {A}} = {\hat{{\varvec{\Phi }}}} =\frac{1}{v}\mathbf {S}\\ F_{{\mathrm {KR}}}&=\frac{v}{p}\bar{\mathbf {y}}_{\cdot }'\mathbf {S}^{-1}\bar{\mathbf {y}}_{\cdot }. \end{aligned}$$

In this setting it is common to apply the one-sample Hotelling T-squared test, in which the test statistic is \(T^2 = pF_{{\mathrm {KR}}}\). It is known (Mardia et al. 1979, Section 5.2.1b) that the null distribution of \([(v - p)/[(v - 1)p]]T^2 = [(v - p)/(v - 1)]F_{{\mathrm {KR}}}\) is exactly \(\mathrm {F}(p,v - p)\). That is,

$$\begin{aligned} \lambda F_{{\mathrm {KR}}} {\mathop {=}\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m) \quad \text {with}\quad m = v - p \quad \text {and}\quad \lambda = \frac{v - p}{v - 1}. \end{aligned}$$
(6.1)

(We require \(v > p\).) To see whether the values of \(m^\#\) and \(\lambda ^\#\) are equal to the ideal values \(v - p\) and \((v - p)/(v - 1)\), one can calculate:

$$\begin{aligned} E^\#&= 1 + \frac{p + 1}{v - 1},\quad V^\# = \frac{2}{p}\left( 1 + \frac{3p + 4}{v - 1}\right) \end{aligned}$$
(6.2)
$$\begin{aligned} m^\#&= 4 + (v - p)\frac{[1 + 2p/(v - p)]^2}{1 - (p + 3)/[(p + 2)(v - p)]} > v - p \end{aligned}$$
(6.3a)
$$\begin{aligned} \lambda ^\#&= \frac{m^\#(v - 1)}{(m^\# - 2)(v + p)}. \end{aligned}$$
(6.3b)

Again in this second special case we see that \(m^\#\) does not have the ideal value. And for most (if not all) choices of p and v, neither does \(\lambda ^\#\). For example, for \(p = 2\) and \(v = 10\) we get \(\lambda ^\# = 57/70 \ne 8/9 = (v - p)/(v - 1)\).

7 Kenward and Roger’s modification of the approximate null distribution

We see that in the two special cases the formulas (4.1) for \(E^\#\) and \(V^\#\), when plugged into formulas (4.3) for \(m^\#\) and \(\lambda ^\#\), do not achieve the ideal values of m and \(\lambda \). Kenward and Roger (1997) modified formulas (4.1) to obtain approximations \(E^*\) and \(V^*\) with the desirable property that the KR test in the two special cases coincides with the exact test.

In the ANOVA special case,

$$\begin{aligned} \mathrm {E}(F_{{\mathrm {KR}}}) = \mathrm {E}[\mathrm {F}(t - 1,n - t)] = \frac{n - t}{n - t - 2} = \frac{1}{1 - \frac{2}{n - t}} = \frac{1}{1 - \frac{A_2}{\ell }}. \end{aligned}$$

In the Hotelling special case,

$$\begin{aligned} \mathrm {E}(F_{{\mathrm {KR}}}) = \mathrm {E}\left[ \left( \frac{v - 1}{v - p}\right) \mathrm {F}(p,v - p)\right] = \frac{v - 1}{v - p - 2} = \frac{1}{1 - \frac{p + 1}{v - 1}} = \frac{1}{1 - \frac{A_2}{\ell }}. \end{aligned}$$

Thus we are led to the formula

$$\begin{aligned} E^* = \frac{1}{1 - \frac{A_2}{\ell }}, \end{aligned}$$
(7.1)

which Kenward and Roger apply to all models in the class described in Sect. 2. Note that formula (7.1) makes sense from an asymptotic viewpoint, because for large samples the quantity \(\varepsilon = A_2/\ell \) becomes small, and for small \(\varepsilon \) we have \(E^\# = 1 + \varepsilon \approx 1/(1 - \varepsilon ) = E^*\).

Next consider \({{\mathrm{Var}}}(F_{{\mathrm {KR}}})\). The formula for \(V^\#\) is a function of \(\ell \) and B, so let us choose \(V^*\) to have this same feature. We will look at the exact values of \({{\mathrm{Var}}}(F_{{\mathrm {KR}}})\) under the null hypothesis in the two special cases and express them in terms of \(\ell \) and B.

In the ANOVA special case, (5.1) states that, under the null hypothesis, \(F_{{\mathrm {KR}}} {\mathop {=}\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m)\) with \(m = n - t\), so:

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}}) = {{\mathrm{Var}}}[\mathrm {F}(\ell ,m)] = 2\left( \frac{m}{m - 2}\right) ^{2}\frac{(\ell + m - 2)}{\ell (m - 4)}. \end{aligned}$$

This formula is in terms of \(\ell \) and m, but we can express m in terms of \(\ell \) and B. Calculate \(B = (t + 5)/(n - t) = (\ell + 6)/m\) (see A.2 in the “Appendix”) and write \(1/m = B/(\ell + 6)\). Now

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}})&= \frac{2}{\ell }\left( \frac{1}{1 - 2/m}\right) ^{2}\frac{[1 + (\ell - 2)/m]}{(1 - 4/m)}\\&= \frac{2}{\ell }\frac{\left( 1 + \frac{\ell - 2}{\ell + 6}B\right) }{\left( 1 - \frac{2}{\ell + 6}B\right) ^{2}\left( 1 - \frac{4}{\ell + 6}B\right) }. \end{aligned}$$

In the Hotelling special case, (6.1) implies that, under the null hypothesis, \(F_{{\mathrm {KR}}} {\mathop {=}\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m)/\lambda \) with \(m = v - p\) and \(\lambda = (v - p)/(v - 1)\), so:

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}}) = {{\mathrm{Var}}}[\mathrm {F}(\ell ,m)/\lambda ] = 2\left( \frac{m}{m - 2}\right) ^{2}\frac{(\ell + m - 2)}{\ell (m - 4)} \biggl /\lambda ^2. \end{aligned}$$

Recall \(p = \ell \), so that \(\lambda = m/(m + \ell - 1)\) and

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}}) = \frac{2}{\ell }\left( \frac{m + \ell - 1}{m - 2}\right) ^{2} \frac{(m + \ell - 2)}{(m - 4)}. \end{aligned}$$

To bring B into the formula, calculate \(B = (3p + 4)/(v - 1) = (3\ell + 4)/(m + \ell - 1)\) (see A.6 in the “Appendix”). Let \(k = m + \ell - 1\) and write \(1/k = B/(3\ell + 4)\). Now

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}})&= \frac{2}{\ell }\left( \frac{k}{k - \ell - 1}\right) ^{2}\frac{(k - 1)}{(k - \ell - 3)} = \frac{2}{\ell }\left[ \frac{1}{1 - (\ell + 1)/k}\right] ^2 \frac{(1 - 1/k)}{[1 - (\ell + 3)/k]}\\&= \frac{2}{\ell }\frac{\left( 1 + \frac{-1}{3\ell + 4}B\right) }{\left( 1 - \frac{\ell + 1}{3\ell + 4}B\right) ^2\left( 1 - \frac{\ell + 3}{3\ell + 4}B\right) }. \end{aligned}$$

In both cases, the formula for the exact value of the variance of \(F_{{\mathrm {KR}}}\) has the form:

$$\begin{aligned} {{\mathrm{Var}}}(F_{{\mathrm {KR}}}) = \frac{2}{\ell }\frac{(1 + d_1 B)}{(1 - d_2 B)^2(1 - d_3 B)}. \end{aligned}$$

Let \(\mathbf {d}= (d_1,d_2,d_3)\).

  • Case 1:

    $$\begin{aligned}&\mathbf {d}= \left( \frac{\ell - 2}{\ell + 6},\frac{2}{\ell + 6},\frac{4}{\ell + 6}\right) \end{aligned}$$
  • Case 2:

    $$\begin{aligned}&\mathbf {d}= \left( \frac{-1}{3\ell + 4},\frac{\ell + 1}{3\ell + 4},\frac{\ell + 3}{3\ell + 4}\right) \end{aligned}$$

We need general formulas for these coefficients that reduce to the desired values in the two cases. Write \(d_1 = g/h\).

  • Case 1:

    $$\begin{aligned} g = \ell - 2,\quad h = \ell + 6 \end{aligned}$$
  • Case 2:

    $$\begin{aligned} g = - 1,\quad h = 3\ell + 4 \end{aligned}$$

Looking at the numerators of the ratios \(d_i\), we see that in both cases

$$\begin{aligned} \mathbf {d}= \left( \frac{g}{h},\frac{\ell - g}{h},\frac{\ell - g + 2}{h}\right) . \end{aligned}$$
(7.2)

Formulas (4.1) for \(E^\#\) and \(V^\#\) derived by Kenward and Roger (1997) as initial approximations of \(\mathrm {E}(F_{{\mathrm {KR}}})\) and \({{\mathrm{Var}}}(F_{{\mathrm {KR}}})\) follow naturally from Taylor expansions and from certain simplifying assumptions. The formulas that they constructed for g and h appear to be somewhat more improvised. Just like formulas (4.1), the formulas for g and h are functions of the quantities \(\ell \), \(A_1\), and \(A_2\). Kenward and Roger essentially chose to express g and h as linear functions of the ratio \(A_1/A_2\) with coefficients that are functions of \(\ell \). That is,

$$\begin{aligned} g = a_0 + a_1\frac{A_1}{A_2} \quad \text {and}\quad h = b_0 + b_1\frac{A_1}{A_2} \end{aligned}$$

for coefficients determined so that g and h have the desired values in the two special cases. It is equivalent if one expresses \(h = c_0 + c_1g\) as a linear function of g, and this leads to simpler coefficients.

  • Case 1:

    $$\begin{aligned} \frac{A_1}{A_2} = \ell ,\quad \ell - 2 = a_0 + a_1\ell ,\quad \ell + 6 = c_0 + c_1(\ell - 2) \end{aligned}$$
  • Case 2:

    $$\begin{aligned} \frac{A_1}{A_2} = \frac{2}{\ell + 1},\quad - 1 = a_0 + a_1\frac{2}{\ell + 1},\quad 3\ell + 4 = c_0 + c_1(-1) \end{aligned}$$

These equations can be solved to obtain \(a_0\), \(a_1\), \(c_0\), \(c_1\):

$$\begin{aligned} g = \frac{-(\ell + 4) + (\ell + 1)(A_1/A_2)}{\ell + 2} \quad \text {and}\quad h = 3\ell + 2 - 2g. \end{aligned}$$
(7.3)

Thus we arrive at the formula for the Kenward–Roger approximation of the variance of their test statistic:

$$\begin{aligned} V^* = \frac{2}{\ell }\frac{(1 + d_1B)}{(1 - d_2B)^2(1 - d_3B)} \end{aligned}$$
(7.4)

where \(d_1\), \(d_2\), \(d_3\) are given by formulas (7.2) and (7.3). The modified approximate null distribution of the Kenward–Roger test statistic is given by

$$\begin{aligned} \lambda ^*F_{{\mathrm {KR}}} {\mathop {\approx }\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m^*) \end{aligned}$$
(7.5)

where \(m^*\) and \(\lambda ^*\) are calculated as in (4.3), replacing \(E^\#\) and \(V^\#\) by \(E^*\) and \(V^*\). Approximation (7.5) is preferable to approximation (4.2) in so far as it reproduces the exact test in the two special cases. Moreover, in the simulation study reported in Sect. 11 below it is seen that, even in situations where no exact test is available, the modified approximation (7.5) does better than approximation (4.2).

8 An alternative modification

As will be seen in Sect. 11, the modification described in Sect. 7 is an important step in the development of the KR test. The essential idea of the modification is to find approximations \(E^*\) and \(V^*\) that lead to values \(m^*\) and \(\lambda ^*\) such that (7.5) holds approximately under the null hypothesis and is exact in the two special cases. Some of the formulas in the modification, particularly (7.2), (7.3) and (7.4) that are used to calculate \(V^*\), might appear to be somewhat arbitrary. Indeed there are alternative modifications that achieve the same goal. In this section we derive \(m^\dag \) and \(\lambda ^\dag \), different from \(m^*\) and \(\lambda ^*\), such that the null distribution of the KR test statistic is given approximately by

$$\begin{aligned} \lambda ^\dag F_{{\mathrm {KR}}} {\mathop {\approx }\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m^\dag ) \end{aligned}$$
(8.1)

with exact equality in distribution for the two special cases.

We continue to use the approximation \(E^*\) shown in (7.1). Rather than develop a formula for approximating \({{\mathrm{Var}}}(F_{{\mathrm {KR}}})\), we will obtain a formula for \(m^\dag \) directly.

First, in the two special cases let us express m as a function of \(\ell \) and B. Formulas for B are given in Sect. 7 and can be rearranged to obtain:

$$\begin{aligned} m = \frac{\ell + 6}{B} \text { in case 1} \quad \text {and}\quad m = -(\ell - 1) + \frac{3\ell + 4}{B} \text { in case 2}. \end{aligned}$$

In both cases, m equals a linear function of \(1/B\) with coefficients involving \(\ell \), so we choose \(m^\dag \) to be such a function: \(m^\dag = e_0 + e_1/B\).

  • Case 1:

    $$\begin{aligned} m = n - t,\quad e_0 = 0,\quad e_1 = \ell + 6 \end{aligned}$$
  • Case 2:

    $$\begin{aligned} m = v - p,\quad e_0 = 1 - \ell ,\quad e_1 = 3\ell + 4 \end{aligned}$$

The coefficients \(e_0\) and \(e_1\) depend on the model and to account for this we express them as functions of \(A_1/A_2\). For simplicity we choose linear functions:

$$\begin{aligned} e_0 = f_0 + f_1\frac{A_1}{A_2} \quad \text {and}\quad e_1 = g_0 + g_1\frac{A_1}{A_2} \end{aligned}$$

in which the coefficients are functions of \(\ell \) and are determined so that \(e_0\) and \(e_1\) have the desired values in the two special cases. It is equivalent if one expresses \(e_1 = h_0 + h_1e_0\) as a linear function of \(e_0\), and the coefficients are simpler.

  • Case 1:

    $$\begin{aligned} \frac{A_1}{A_2} = \ell ,\quad 0 = f_0 + f_1\ell ,\quad \ell + 6 = h_0 + h_1(0) \end{aligned}$$
  • Case 2:

    $$\begin{aligned} \frac{A_1}{A_2} = \frac{2}{\ell + 1},\quad 1 - \ell = f_0 + f_1\frac{2}{\ell + 1},\quad 3\ell + 4 = h_0 + h_1(1 - \ell ) \end{aligned}$$

Solving for \(f_0\), \(f_1\), \(h_0\), \(h_1\), we obtain

$$\begin{aligned} e_0 = \frac{\ell + 1}{\ell + 2}\left( \frac{A_1}{A_2} - \ell \right) \quad \text {and}\quad e_1 = \ell + 6 - 2e_0. \end{aligned}$$
(8.2)

Therefore, our alternative modified approximation for the null distribution of the Kenward–Roger test statistic is given by (8.1) where

$$\begin{aligned} m^\dag = e_0 + \frac{e_1}{B} \quad \text {and}\quad \lambda ^\dag = \frac{m^\dag }{(m^\dag - 2)E^\dag }, \end{aligned}$$
(8.3)

with \(e_0\) and \(e_1\) given in (8.2), and \(E^\dag = E^*\) in (7.1).

This alternative modification is still an improvisation but its formulas (see (7.1), (8.2), (8.3)) are simpler than those appearing in the modification derived by Kenward and Roger (1997); see formulas (7.1)–(7.4) and (4.3) above. Simpler formulas are more appealing, but a more important finding is that our alternative procedure performs very nearly the same as the original modification in the simulation study reported in Sect. 11 below. This reassures us that, even though some of the Kenward–Roger formulas may seem arbitrary and though equally justifiable alternative formulas exist, the particular choice of modification seems to have little effect on the performance of the KR test.

9 Another alternative modification

In this section we present another pair of values \(m^\ddag \) and \(\lambda ^\ddag \) such that

$$\begin{aligned} \lambda ^\ddag F_{{\mathrm {KR}}}{\mathop {\approx }\limits ^{\mathrm {d}}}\mathrm {F}(\ell ,m^\ddag ) \end{aligned}$$
(9.1)

under the null hypothesis, with exact equality in distribution for the two special cases. The initial, unmodified version of the KR test in Sect. 4 above uses \(\mathrm {F}(\ell ,m^\#)/\lambda ^\#\) as the approximate null distribution of \(F_{{\mathrm {KR}}}\) where \(m^\#\) and \(\lambda ^\#\) are calculated from \(E^\#\) and \(V^\#\), which are Taylor approximations to the mean and variance of \(F_{{\mathrm {KR}}}\). The improved, modified version of the KR test in Sect. 7 above uses \(\mathrm {F}(\ell ,m^*)/\lambda ^*\) where \(m^*\) and \(\lambda ^*\) are calculated from the modified quantities \(E^*\) and \(V^*\), which are extrapolated from the exact values of the mean and variance of \(F_{{\mathrm {KR}}}\) in the two special cases. The alternative modification in Sect. 8 uses \(\mathrm {F}(\ell ,m^\dag )/\lambda ^\dag \) where \(m^\dag \) is obtained directly by extrapolating from the exact values of m in the two special cases, and \(\lambda ^\dag \) is calculated from \(E^\dag = E^*\) and \(m^\dag \). Now in this section, a second alternative modification is described that uses \(\mathrm {F}(\ell ,m^\ddag )/\lambda ^\ddag \) where \(m^\ddag \) is again obtained directly by extrapolating from the exact values of m in the two special cases but using a different extrapolation procedure.

In Sect. 8, \(m^\dag \) is expressed as a linear function of \(1/B\) whose coefficients are linear functions of \(A_1/A_2\), whose coefficients in turn are functions of \(\ell \):

$$\begin{aligned} m^\dag = \left( \frac{\ell + 1}{\ell + 2}\right) \left( \frac{A_1}{A_2} - \ell \right) + \left[ (\ell + 6) - 2\left( \frac{\ell + 1}{\ell + 2}\right) \left( \frac{A_1}{A_2} - \ell \right) \right] \frac{1}{B}, \end{aligned}$$

which can be rewritten as a ratio of two quadratic functions of \(A_1\) and \(A_2\) with coefficients that are functions of \(\ell \):

$$\begin{aligned} m^\dag = \frac{2\ell (\ell + 2)(\ell + 6)A_2 + (\ell + 1)(A_1 - \ell A_2)(A_1 + 6A_2 - 4\ell )}{(\ell + 2)(A_1 + 6A_2)A_2}. \end{aligned}$$

For the alternative modification presented in this section, \(m^\ddag \) will be expressed as a ratio of two linear functions of \(A_1\) and \(A_2\) with coefficients that are functions of \(\ell \).

Formulas for the exact values of m in the two special cases are displayed in Sect. 8 in terms of \(\ell \) and B. Note that each of the two formulas can be written as a ratio of two linear functions of \(A_1\) and \(A_2\) with coefficients that are functions of \(\ell \) and with 1 as the constant term in the numerator function and 0 as the constant term in the denominator function:

  • Case 1:

    $$\begin{aligned} m = \frac{\ell + 6}{B} = \frac{1}{\frac{1}{2\ell (\ell + 6)}A_1 + \frac{3}{\ell (\ell + 6)}A_2} \end{aligned}$$
  • Case 2:

    $$\begin{aligned} m = -(\ell - 1) + \frac{3\ell + 4}{B} = \frac{1 - \frac{\ell - 1}{2\ell (3\ell + 4)}A_1 - \frac{3(\ell - 1)}{\ell (3\ell + 4)}A_2}{\frac{1}{2\ell (3\ell + 4)}A_1 + \frac{3}{\ell (3\ell + 4)}A_2} \end{aligned}$$

Both of these formulas have the form:

$$\begin{aligned} m = \frac{1 + c_1A_1 + c_2A_2}{d_1A_1 + d_2A_2}. \end{aligned}$$

We can extrapolate from the two special cases by choosing \(c_1\), \(c_2\), \(d_1\), \(d_2\) such that:

$$\begin{aligned} c_1A_1 + c_2A_2&= 0 \end{aligned}$$
(9.2a)
$$\begin{aligned} d_1A_1 + d_2A_2&= \frac{1}{2\ell (\ell + 6)}A_1 + \frac{3}{\ell (\ell + 6)}A_2 \end{aligned}$$
(9.2b)

in case 1, that is, when \(A_1 = 2\ell ^2/m\) and \(A_2 = 2\ell /m\) (see A.2 in the “Appendix”), and such that:

$$\begin{aligned} c_1A_1 + c_2A_2&= -\frac{\ell - 1}{2\ell (3\ell + 4)}A_1 - \frac{3(\ell - 1)}{\ell (3\ell + 4)}A_2 \end{aligned}$$
(9.2c)
$$\begin{aligned} d_1A_1 + d_2A_2&= \frac{1}{2\ell (3\ell + 4)}A_1 + \frac{3}{\ell (3\ell + 4)}A_2 \end{aligned}$$
(9.2d)

in case 2, that is, when \(A_1 = 2\ell /(m + \ell - 1)\) and \(A_2 = \ell (\ell + 1)/(m + \ell - 1)\) (see A.6 in the “Appendix”). It is convenient to divide each of the equations in (9.2) by \(A_2\). Equations (9.2a) and (9.2c) become:

$$\begin{aligned} \ell c_1 + c_2 = 0,\quad \frac{2}{\ell + 1}c_1 + c_2 = -\frac{\ell - 1}{\ell (\ell + 1)} \end{aligned}$$

which implies \(c_1 = 1/[\ell (\ell + 2)]\) and \(c_2 = -1/(\ell + 2)\). Equations (9.2b) and (9.2d) become:

$$\begin{aligned} \ell d_1 + d_2 = \frac{1}{2\ell },\quad \frac{2}{\ell + 1}d_1 + d_2 = \frac{1}{\ell (\ell + 1)} \end{aligned}$$

which implies \(d_1 =1/[2\ell (\ell + 2)]\) and \(d_2 = 1/[\ell (\ell + 2)]\). Thus we take the denominator degrees of freedom in approximation (9.1) to be:

$$\begin{aligned} m^\ddag = \frac{2\ell (\ell + 2) + 2(A_1 - \ell A_2)}{A_1 + 2A_2}. \end{aligned}$$
(9.3)

Let

$$\begin{aligned} \lambda ^\ddag = \frac{m^\ddag }{(m^\ddag -2)E^\ddag } \end{aligned}$$

where \(E^\ddag = E^*\) in (7.1).

10 When the three modifications are identical

Under certain conditions, the KR modification in Sect. 7 above and the two alternative modifications in Sects. 8 and 9 are identical. For proofs of the results in this section, see the “Appendix”.

Lemma 1

  (a) If \(A_1/A_2 = \ell \), then the three modifications are identical:

    $$\begin{aligned} m^* = m^\dag = m^\ddag = 2\ell /A_2 \quad \text {and}\quad \lambda ^* = \lambda ^\dag = \lambda ^\ddag = 1. \end{aligned}$$
  (b) If \(A_1/A_2 = 2/(\ell + 1)\), then the three modifications are identical:

    $$\begin{aligned} m^* = m^\dag = m^\ddag = \frac{\ell (\ell + 1)}{A_2} - (\ell - 1) \quad \text {and}\quad \lambda ^* = \lambda ^\dag = \lambda ^\ddag = 1 - \frac{\ell - 1}{\ell (\ell + 1)}A_2. \end{aligned}$$

Lemma 2

If \(\ell = 1\), then \(A_1 = A_2\).

Theorem 1

When \(\ell = 1\), the three modifications are identical, with denominator degrees of freedom \(m^* = 2/A_2\) and scale factor \(\lambda ^* = 1\).

The fact that \(\ell = 1\) implies \(\lambda ^* = 1\) is stated in Kenward and Roger (1997, p. 988).

Suppose the design matrix of a model is partitioned as \(\mathbf {X}= [\begin{array}{*{2}{c}} \mathbf {X}_1&\mathbf {X}_2 \end{array}] \) so that \(\mathrm {E}(\mathbf {y}) = \mathbf {X}_1{\varvec{\beta }}_1 + \mathbf {X}_2{\varvec{\beta }}_2\). Note that

$$\begin{aligned} \mathbf {X}'{\varvec{\Sigma }}^{-1}\mathbf {X}= \begin{bmatrix} \mathbf {X}'_1{\varvec{\Sigma }}^{-1}\mathbf {X}_1&\quad \mathbf {X}'_1{\varvec{\Sigma }}^{-1}\mathbf {X}_2\\ \mathbf {X}'_2{\varvec{\Sigma }}^{-1}\mathbf {X}_1&\quad \mathbf {X}'_2{\varvec{\Sigma }}^{-1}\mathbf {X}_2 \end{bmatrix}. \end{aligned}$$

Lemma 3

In a model partitioned as above, suppose:

  (a) the null hypothesis involves only the parameters in \({\varvec{\beta }}_2\) so that \(\mathbf {L}'{\varvec{\beta }}= \mathbf {L}'_2{\varvec{\beta }}_2\);

  (b) \(\mathbf {X}'_1{\varvec{\Sigma }}^{-1}\mathbf {X}_2 = \mathbf {0}\);

  (c) \(\mathbf {X}'_2{\varvec{\Sigma }}^{-1}\mathbf {X}_2 = f({\varvec{\theta }})\mathbf {C}\)

where f is a scalar-valued function and \(\mathbf {C}\) is a matrix not depending on \({\varvec{\theta }}\). Then \(A_1/A_2 = \ell \).

Consider a balanced incomplete block design (BIBD) with s blocks, each containing k plots, and with t treatments, each applied to r plots. For each pair of treatments, the number of blocks in which the two treatments appear together is the same number, say g, for all pairs. Suppose the treatment effects are fixed and the block effects are random. The model can be written as \(y_{iju} = \mu + \tau _i + b_j + e_{iju}\) for \(i = 1,\ldots ,t\), \(j = 1,\ldots ,s\), \(u = 1,\ldots ,n_{ij}\) (all \(n_{ij}\) are either 0 or 1) where the \(b_j\)’s and \(e_{iju}\)’s are independent of one another and are normally distributed with \(\mathrm {E}(b_j) = \mathrm {E}(e_{iju}) = 0\), \({{\mathrm{Var}}}(b_j) = \sigma _b^2\), \({{\mathrm{Var}}}(e_{iju}) = \sigma _e^2\). In matrix notation, \(\mathbf {y}= \mathbf {1}_n\mu + \mathbf {T}{\varvec{\tau }}+ \mathbf {B}\mathbf {b}+ \mathbf {e}\) where \(\mathbf {1}_n\) is an \(n \times 1\) vector of 1’s, \(n = n_{\cdot \cdot } = tr = sk\), \({\varvec{\tau }}= (\tau _1,\ldots ,\tau _t)'\), \(\mathbf {b}= (b_1,\ldots ,b_s)'\), \(\mathbf {T}'\mathbf {T}= r\mathbf {I}_t\), \(\mathbf {B}'\mathbf {B}= k\mathbf {I}_s\), \(\mathbf {T}'\mathbf {B}= \mathbf {N}= [n_{ij}]_{t \times s}\). The design matrix for the fixed effects is \( [\begin{array}{*{2}{c}} \mathbf {1}_n&\mathbf {T}\end{array}] \), which does not have full column rank, and so to achieve the assumptions of model (2.1) we reparameterize by setting \(\mu ^* = \mu + {\bar{\tau }}_{\cdot }\) and \(\tau _i^* = \tau _i - {\bar{\tau }}_{\cdot }\) for \(i = 1,\ldots ,t - 1\). Then \(\mu + \tau _i = \mu ^* + \tau _i^*\) for \(i = 1,\ldots ,t - 1\) and \(\mu + \tau _t = \mu ^* - \tau _1^* - \cdots - \tau _{t - 1}^*\). This is an instance of model (2.1), with a full-column-rank design matrix that is partitioned as

$$\begin{aligned} \mathbf {X}= \begin{bmatrix} \mathbf {1}_n&\mathbf {T}^* \end{bmatrix}, \quad {\varvec{\beta }}= \begin{bmatrix} \mu ^*\\ {\varvec{\tau }}^* \end{bmatrix} \end{aligned}$$

and with \({\varvec{\Sigma }}= \sigma _b^2\mathbf {B}\mathbf {B}' + \sigma _e^2\mathbf {I}_n\).

A common null hypothesis for a block design is \(\mathrm {H}_0:\tau _1 = \cdots = \tau _t\) or, in terms of the reparameterization, \(\mathrm {H}_0:\tau _1^* = \cdots = \tau _{t - 1}^* = 0\) or \(\mathrm {H}_0:{\varvec{\tau }}^* = \mathbf {0}\). This testing problem satisfies the conditions of Lemma 3 with \(\mathbf {X}_1 = \mathbf {1}_n\), \(\mathbf {X}_2 = \mathbf {T}^*\), \(\mathbf {L}_2 = \mathbf {I}_{t - 1}\).

Theorem 2

For testing the equality of the treatment effects in a BIBD model with random blocks, the three modifications are identical, with denominator degrees of freedom \(m^* = 2(t - 1)/A_2\) and scale factor \(\lambda ^* = 1\).

11 Simulation study

Simulations were run in order to compare the performance of the KR test with the two alternative KR-type tests as well as with the unmodified version of the KR test (see Sect. 4). The models we chose to simulate have incomplete block designs that are not balanced: complete block designs with missing observations, partially balanced incomplete block designs (PBIBDs), BIBDs with missing observations, and PBIBDs with missing observations.

The models can be written as \(y_{iju} = \mu + \tau _i + b_j + e_{iju}\) for \(i = 1,\ldots ,t\), \(j = 1,\ldots ,s\), \(u = 1,\ldots ,{n_{ij}}\) where all \(n_{ij}\) are either 0 or 1. The incidence matrix \(\mathbf {N} = [n_{ij}]\) specifies the design. As in the preceding section, we can re-express the model as \(y_{iju} = \mu ^* + \tau _i^* + b_j + e_{iju}\) for \(i = 1,\ldots ,t - 1\) and \(y_{tju} = \mu ^* - \tau _1^* - \cdots - \tau _{t - 1}^* + b_j + e_{tju}\) for \(i = t\). The hypothesis of interest is \(\mathrm {H}_0:\tau _1^* = \cdots = \tau _{t - 1}^* = 0\), so the null model is \(y_{iju} = \mu ^* + b_j + e_{iju}\) for all \(i,j,u\). This model can be simulated by generating the \(b_j\) as an i.i.d. sample from the \(\mathrm {N}(0,\sigma _b^2)\) distribution and generating the \(e_{iju}\) as an i.i.d. sample from the \(\mathrm {N}(0,\sigma _e^2)\) distribution. For each design we considered five values of the ratio \(\rho = \sigma _b/\sigma _e\), namely \(\rho = 0.25,0.5,1,2,4\).

The following lemma implies there is no loss of generality in setting \(\mu ^* = 0\) and \(\sigma _e = 1\) when simulating the null distribution of the KR test statistic. To make explicit the dependence of \(F_{{\mathrm {KR}}}\) on the data let us write \(F_{{\mathrm {KR}}} = F_{{\mathrm {KR}}}(\mathbf {y})\).

Lemma 4

For the null model, \(F_{{\mathrm {KR}}}(\mathbf {y}) {\mathop {=}\limits ^{\mathrm {d}}}F_{{\mathrm {KR}}}(\mathbf {y}^\S )\) where the components of \(\mathbf {y}^\S \) are \(y_{iju}^\S = b_j^\S + e_{iju}^\S \) and can be simulated by generating the \(b_j^\S \) as an i.i.d. sample from the \(\mathrm {N}(0,{\rho ^2})\) distribution and generating the \(e_{iju}^\S \) as an i.i.d. sample from the \(\mathrm {N}(0,1)\) distribution.

Lemma 4 is an application of Lemma 5 below, which holds for the general testing problem described in Sect. 2.

Lemma 5

\(F_{{\mathrm {KR}}}(c\mathbf {y}+ \mathbf {X}\mathbf {b}_0) = F_{{\mathrm {KR}}}(\mathbf {y})\) for any \(c \ne 0\) and any \(\mathbf {b}_0\) satisfying \(\mathbf {L}'\mathbf {b}_0 = \mathbf {0}\).

In other words, for any vector \(\mathbf {a}\) in the space \(\{\mathbf {X}\mathbf {b}_0:\mathbf {L}'\mathbf {b}_0 = \mathbf {0}\}\) of possible mean vectors in the null model, \(F_{{\mathrm {KR}}}(c\mathbf {y}+ \mathbf {a}) = F_{{\mathrm {KR}}}(\mathbf {y})\). For an incomplete block design as described above, let \(c = \sigma _e^{-1}\), \(\mathbf {a}= -\sigma _e^{-1}\mu ^*\mathbf {1}\), and \(\mathbf {y}^\S = c\mathbf {y}+ \mathbf {a}\). Then \(y_{iju}^\S = \sigma _e^{-1}y_{iju} - \sigma _e^{-1}\mu ^* = \sigma _e^{-1}b_j + \sigma _e^{-1}e_{iju}\) and the \(b_j^\S = \sigma _e^{-1}b_j\) are i.i.d. \(\mathrm {N}(0,{\rho ^2})\) and the \(e_{iju}^\S = \sigma _e^{-1}e_{iju}\) are i.i.d. \(\mathrm {N}(0,1)\).

The block designs we studied are listed below (n = number of observations, t = number of treatments, s = number of blocks, k = maximum block size):

  • D96 = a PBIBD with n = 96, t = 16, s = 48, k = 2 (Green 1974, p. 65).

  • D60 = a PBIBD with n = 60, t = 15, s = 15, k = 4 (Cochran and Cox 1957, p. 456).

  • D40 = a design with n = 40, t = 6, s = 7, k = 6, obtained from a complete block design with t = 6 and s = 7 by deleting two observations from different blocks and different treatments.

  • D21a = a design with n = 21, t = 4, s = 9, k = 3, obtained from a BIBD in Kuehl (2000, p. 317) by deleting run 10 and treatment 550.

  • D21b = a design with n = 21, t = 9, s = 7, k = 3, obtained from a PBIBD in Kuehl (2000, p. 329) by deleting blocks 8 and 9.

  • D18a = a design with n = 18, t = 4, s = 5, k = 4, obtained from a complete block design with t = 4 and s = 5 by deleting two observations from different blocks and different treatments.

  • D18b = a design with n = 18, t = 7, s = 6, k = 3, obtained from a BIBD in John (1971, p. 219) by deleting the last block.

  • D12 = a design with n = 12, t = 6, s = 4, k = 3, obtained from a cyclic design in Kuehl (2000, p. 346) by deleting blocks 5 and 6.

The test procedures we studied were:

  • KRU \(=\) the unmodified precursor to the KR test that uses the approximation developed in Sect. 4 and ignores the modification derived in Sect. 7.

  • KR \(=\) the test procedure presented in Kenward and Roger (1997).

  • KRA1 \(=\) a variation of the KR test using the alternative modification in Sect. 8.

  • KRA2 \(=\) a variation of the KR test using the alternative modification in Sect. 9.

For each of the 40 models (8 designs \(\times \) 5 values of \(\rho \)), we generated 10,000 independent data vectors \(\mathbf {y}\) under the null distribution. For each well-behaved (defined in Sect. 11.1 below) data vector we calculated \(F_{{\mathrm {KR}}}\), \(m^\#\), \(\lambda ^\#\), \(m^*\), \(\lambda ^*\), \(m^\dag \), \(\lambda ^\dag \), \(m^\ddag \), \(\lambda ^\ddag \). For each of the four tests, the p value of the test was approximated to be \(p = {\mathrm {Prob}}\{\mathrm {F}(\ell ,m) > \lambda F_{{\mathrm {KR}}} \mid \mathbf {y}\}\) using the appropriate values of m and \(\lambda \). We measure the adequacy of the approximation by the proportion of these p values that are smaller than 0.05. Ideally we want this proportion to be close to 0.05.

11.1 Computational issues

The REML algorithm for computing estimates of the variance components sometimes failed to converge. The algorithm we used found a solution to the REML equations by iteratively applying equation (90) on p. 252 of Searle et al. (1992). For each of the 40 models, the percentage of data vectors for which the REML algorithm converged is shown in Table 1. For the three largest designs, convergence was achieved almost 100% of the time for all values of \(\rho \). For the four smaller designs with n = 21 or 18, convergence was achieved almost 100% of the time when \(\rho = 2\) or 4, that is, when the variability of the block effects was high relative to the variability of the noise in the model. Convergence could possibly be improved by using suitable alternative numerical methods.

Table 1 The percentage of data vectors for which the REML algorithm converged, among 10,000 data vectors generated from each of 40 models (8 designs \(\times \) 5 values of \(\rho \)) under the null hypothesis. (100.0 denotes a percentage between 99.95 and 99.99% that has been rounded to one decimal place, whereas 100 denotes exactly 100%.)

Another computational problem that can occur is numerical instability due to division by numbers close to zero. For example, one data set generated from the model with design D21a and \(\rho =0.25\) yielded \(\lambda ^*=-79700\) (to three significant digits). The value of \(A_2\) for this data set is a very large positive number, 823000, so that \(E^*=1/(1-A_2/\ell )=-0.00000365\) is a very small negative number, which leads to a very large negative value of \(\lambda ^*=[m^*/(m^*-2)]/E^*\) due to division by \(E^*\). Similarly, application of the two alternative KR-type procedures to this data set produced values of \(\lambda ^\dag \) and \(\lambda ^\ddag \) that were very large and negative.

The data sets generated from this model had a typical estimated scale factor \(\lambda ^*\) of about 1, so the value \(\lambda ^*=-79700\) is definitely an outlier. The mere fact that it is negative conflicts with our objective of finding values of \(\lambda ^*\) and \(m^*\) such that (7.5) is a good approximation, because an F distribution assumes only positive values. In the context of KR tests, let us say a data vector is well-behaved with regard to the KR procedure if (1) the REML algorithm converges, (2) the estimated expectation \(E^*\) is positive, and (3) the estimated denominator degrees of freedom \(m^*\) is \(> 4\). Condition (2) is appropriate because the test statistic \(F_{{\mathrm {KR}}}\) is a quadratic form (see (3.1)) whose matrix \({\hat{{\varvec{\Phi }}}}_\mathrm {A}= {\hat{{\varvec{\Phi }}}} + 2{\hat{{\varvec{\Lambda }}}}\) is approximately positive definite: \({\hat{{\varvec{\Phi }}}}\) is positive definite and \({\hat{{\varvec{\Lambda }}}}\) is approximately equal to \({\varvec{\Lambda }}\), which is positive definite. Therefore it is reasonable to expect \(\mathrm {E}(F_{{\mathrm {KR}}})\) to be positive. According to (7.1), condition (2) is equivalent to \(A_2<\ell \). Condition (3) is based on the fact that the derivation of the KR test uses the second moment of an F distribution, which requires the denominator degrees of freedom to be \(> 4\). From formula (4.3) we see that conditions (2) and (3) imply that \(\lambda ^*\) is positive, which, as indicated in the discussion of the example above, is a sensible requirement. Replacing \(m^*\) in condition (3) by \(m^\dag \) (or by \(m^\ddag \)) gives the definition of well-behaved with regard to the KRA1 (or KRA2) procedure. The prevalence of good behavior with regard to the KR procedure is shown in Table 2; ill behavior was a problem only for the 25 smaller models (5 smallest designs \(\times \) 5 values of \(\rho \)).

Table 2 The percentage of data vectors that were well-behaved with regard to the KR procedure, among 10,000 data vectors generated from each of the 25 smaller models (5 designs \(\times \) 5 values of \(\rho \)) under the null hypothesis. (100.0 denotes a percentage between 99.95 and 99.99% that has been rounded to one decimal place, whereas 100 denotes exactly 100%.)

The percentages of data vectors that were well-behaved with regard to the KRA1 procedure are exactly the same as the percentages for the KR procedure, for all 25 smaller models. For designs D21a and D18a, the percentages of data vectors that were well-behaved with regard to the KRA2 procedure are the same as the percentages for the KR procedure. For designs D21b and D18b, the KRA2 percentages are a little bit higher than the KR percentages but by no more than about 1 percentage point. For design D12, the KRA2 percentages are greater than the KR percentages by at most 3 percentage points.

We see that most data vectors generated from design D12 are ill-behaved. We also looked at a smaller design D10 with n = 10, t = 5, s = 3, k = 4, obtained from a PBIBD in Kuehl (2000, p. 323) by deleting treatment 6. For all five values of \(\rho \), all 10,000 generated data vectors were ill-behaved. It could be worthwhile investigating other definitions of “well-behaved” in hope of including a larger proportion of “well-behaved” data vectors from smaller models.

11.2 Comparison of p values

For each of the 40 models (8 designs \(\times \) 5 values of \(\rho \)), we generated 10,000 independent data vectors \(\mathbf {y}\) under the null distribution. For each well-behaved data vector we calculated, for each of the four tests, the p value of the test. Table 3 displays, for each model and each test, the observed percentage (restricting attention to well-behaved data sets) of p values that were less than 0.05. Ideally we want this percentage to be close to 5%. Of course we should keep in mind that for any given entry (that is, any given model and any given test procedure) in Table 3, if it were true that the long-run percentage of p values less than 0.05 was exactly 5%, we nevertheless could not expect the observed percentage to be exactly 5%, because of simulation error. The simulation standard error is \(22\%/\sqrt{n}\), where n is the number of well-behaved data sets, because \(\sqrt{0.05 \times 0.95} \approx 0.22\).

Table 3 The observed percentages of p values that were less than 0.05 for four test procedures calculated from the well-behaved data sets among 10,000 data sets generated from each of 40 models (8 designs \(\times \) 5 values of \(\rho \))

In Table 3 we see that the three modified tests KR, KRA1 and KRA2 performed exactly the same (after rounding to two decimal places) in 22 out of the 40 models. In every one of the 40 models the percentages for the three modified KR tests were within 0.20 percentage points of one another. In many practical settings such a difference could be regarded as unimportant, and we could say that, in the models that were studied, when testing the equality of treatment effects in unbalanced incomplete block models with random blocks, the three modified KR tests gave very similar results. Moreover, recall from Sect. 10 that the three modified tests coincide when testing a one-dimensional hypothesis and when testing the equality of treatment effects in a BIBD model with random blocks.

It is clear from Tables 2 and 3 that none of the four test procedures performs adequately for models with design D12, so we will omit D12 from further discussion.

Table 3 shows that the modification step in Sect. 7 (or an alternative modification as in Sects. 8 or 9) is worth doing. The modifications in Sects. 7, 8 and 9 are all motivated by considering two special cases which can be regarded as “balanced” and admit well-known exact tests. The modifications change the formulas of KRU so as to produce these exact tests in the two special cases. In the “unbalanced” models that we simulated, the modifications do not produce exact tests but they do achieve an improvement in performance over the unmodified test. The unmodified KRU test is highly significantly liberal (i.e., its percentage in Table 3 is highly significantly larger than 5%) for 30 of the 35 models (omitting design D12). For all 35 models the modified tests are more conservative than the unmodified test. For 23 of the 35 models the percentages for the modified tests are not significantly different from 5%, and for 11 models the modified tests are significantly liberal but less so than the unmodified test.

As would be expected, the similarity in performance of the KR, KRA1 and KRA2 tests is due in large part to the similarity of the values of \(m^*\), \(m^\dag \) and \(m^\ddag \) and the values of \(\lambda ^*\), \(\lambda ^\dag \) and \(\lambda ^\ddag \). See Tables 4 and 5 in A.14 in the “Appendix”. For each of the 35 models (omitting design D12), the average values of the scale factor shown in Table 5 for the three modified KR procedures are close to 1 (hence close to one another), specifically between 0.958 and 1.000 (when rounded to three decimal places). Recall from Sect. 10 that for all BIBD designs all three modified KR procedures have scale factor equal to 1. For each of the designs D96, D60, D40, D21a and D18a and each value of \(\rho \), the average values of the scale factor for all three modified KR procedures are very close to 1, namely between 0.995 and 1.000. For each of the designs D21b and D18b and each value of \(\rho \), the average scale factors for the three modified KR procedures are within 0.030 of one another. The average value of \(\lambda ^\#\) in the unmodified test KRU, which ranged between 0.813 and 0.995, was noticeably different from the average values of \(\lambda ^*\), \(\lambda ^\dag \) and \(\lambda ^\ddag \) in the three modified tests for most of the 35 models.

For all of the 35 models, the average values of \(m^*\), \(m^\dag \) and \(m^\ddag \) shown in Table 4 for the three modified KR procedures were relatively close to one another. For 20 out of the 35 models, the average values of \(m^*\), \(m^\dag \) and \(m^\ddag \) were close enough to one another to coincide when rounded to one decimal place. The average value of \(m^\#\) in the unmodified test KRU was noticeably different from the average values of \(m^*\), \(m^\dag \) and \(m^\ddag \) in the three modified tests for most of the 35 models.

The simulations show that, for the models we studied, the modification step is an important part of the derivation of Kenward and Roger’s (1997) test and that the three modification methods presented in Sects. 7, 8 and 9 give very similar results. Our simulations focused on block-design models; more simulation studies are needed to determine whether similar statements can be made about Kenward–Roger-type tests of linear hypotheses on fixed effects in other normal mixed linear models.