1 Introduction

The \(u\times u\) covariance matrix \({\varvec{\varTheta }}\) has a compound symmetric structure, with diagonal elements \(\sigma _0\) and off-diagonal elements \(\sigma _1\), if it is a positive-definite matrix that can be written as

$$\begin{aligned} {\varvec{\varTheta }} = \left[ \begin{array}{ccccc} \sigma _0 & \sigma _1 & \sigma _1 & \ldots & \sigma _1 \\ \sigma _1 & \sigma _0 & \sigma _1 & \ldots & \sigma _1 \\ \sigma _1 & \sigma _1 & \sigma _0 & \ldots & \sigma _1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma _1 & \sigma _1 & \sigma _1 & \ldots & \sigma _0 \end{array} \right] = \sigma _0{\mathbf {I}}_u+\sigma _1\left( {\mathbf {J}}_u-{\mathbf {I}}_u\right) = (\sigma _0-\sigma _1){\mathbf {I}}_u+\sigma _1{\mathbf {J}}_u \end{aligned}$$
(1)

where \(-\sigma _0/(u-1)<\sigma _1<\sigma _0\) and \(\sigma _0>0\), and where \({\mathbf {I}}_u\) denotes the identity matrix of order u and \({\mathbf {J}}_u\) denotes a \(u\times u\) matrix of 1’s.
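
To make the structure concrete, the following minimal sketch (in Python with NumPy; the function name cs_matrix and the parameter values are our own, purely illustrative, choices) builds \({\varvec{\varTheta }}\) from the last expression in (1) and confirms that the constraints on \(\sigma _1\) are exactly what keeps the two distinct eigenvalues of \({\varvec{\varTheta }}\), namely \(\sigma _0-\sigma _1\) and \(\sigma _0+(u-1)\sigma _1\), positive:

```python
import numpy as np

def cs_matrix(u, sigma0, sigma1):
    """Compound symmetric matrix of (1): (sigma0 - sigma1) I_u + sigma1 J_u."""
    return (sigma0 - sigma1) * np.eye(u) + sigma1 * np.ones((u, u))

u, sigma0, sigma1 = 4, 2.0, 0.5    # satisfies -sigma0/(u-1) < sigma1 < sigma0
Theta = cs_matrix(u, sigma0, sigma1)

# Theta has eigenvalue sigma0 + (u-1)*sigma1 once (eigenvector proportional
# to a vector of 1's) and eigenvalue sigma0 - sigma1 with multiplicity u-1,
# so the constraints on sigma1 are precisely a positive-definiteness condition.
print(np.sort(np.linalg.eigvalsh(Theta)))   # [1.5 1.5 1.5 3.5]
```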

Compound symmetry (CS) is a widely used or assumed covariance structure (Timm 2002, Sect. 3.8). As a result of its wide application in many different statistical models, it is also known under a few other designations. Directly from (1), the CS structure is also called the equivariance-equicovariance or equivariance-equicorrelation structure (Vonesh and Chinchilli 1997, Sect. 3.2), and it is sometimes referred to simply as the equicorrelation structure, which may be a somewhat misleading designation, since we may have equicorrelation without having equal variances, and hence without CS. Morrison (1976) addresses, mainly in Chapter 8, a number of examples based on real data for which equicorrelation may be, or may seem to be, a plausible model for covariances, or at least one that one might be interested in testing. However, for some of these examples, CS may not be a plausible model, given that the variances are not equal. As such, the alternative designation of exchangeable covariance or exchangeable correlation for the CS structure is indeed more adequate (Demidenko 2004, Sects. 2.4, 7.10).

Verbeke and Molenberghs (2000) give a quite thorough assessment of the application and usefulness of CS as a covariance structure in linear mixed models and in repeated measures or longitudinal data models. In Chapter 1, they stress the fact that “the marginal model corresponding to a random-intercept model” is a model with a CS covariance structure [see also Demidenko (2004, Sects. 2.4, 7.2)], and in Sect. 3.3 they show how the general linear mixed-effects model is indeed a model with a CS covariance structure, where

$$\begin{aligned} \rho =\frac{\sigma _1}{\sigma _0} \end{aligned}$$
(2)

is commonly called the intraclass correlation—see also Timm (2002, Sect. 3.9.d) and Kutner et al. (2005, Sect. 25.5). The CS covariance structure is thus also sometimes called the intraclass correlation structure, since \({\varvec{\varTheta }}\) in (1) may be written as

$$\begin{aligned} {\varvec{\varTheta }} = \sigma _0\left\{ (1-\rho ){\mathbf {I}}_u+\rho {\mathbf {J}}_u\right\} = \sigma _0\left\{ {\mathbf {I}}_u+\rho ({\mathbf {J}}_u-{\mathbf {I}}_u)\right\} . \end{aligned}$$

Other names given to \(\rho \) in (2) are familial, intrablock or intracluster correlation (Rao 1945, 1953; King and Evans 1986).

Vonesh and Chinchilli (1997, Sect. 3.2) establish the equivalence between the univariate random effects model and a multivariate (manova) model with a CS covariance matrix, and state in Sect. 7.3 that CS “arises naturally from split-plot type designs”.

Timm (2002, Sect. 3.9.d) also brings to our attention that the CS structure for the covariance matrix in a univariate mixed anova model is a sufficient condition for the existence of an exact F test for the null hypothesis of equality of the treatment effects in this model. This is so because the CS structure is a particular case of the type H matrices introduced by Huynh and Feldt (1970), who proved that the type H structure is the necessary covariance structure for the existence of such exact F tests in univariate repeated measures designs.

Two other interesting models where the CS covariance structure is used are brought to us by Matos et al. (2016) and Zimmerman and Núñez-Antón (2001). Matos et al. (2016) use a CS covariance structure to model censored data collected irregularly over time with mixed-effects models. These authors also bring to our attention that the CS structure is a particular case of the damping exponential correlation structure proposed by Muñoz et al. (1992). Zimmerman and Núñez-Antón (2001) use CS as a plausible covariance structure in “models for unbalanced data having some kind of dependence structure, all within the context of having a real continuous response variable and real explanatory variables”.

Qu and Li (2006) also use the CS structure as a model for the so-called “working correlation”, introduced by Liang and Zeger (1986) for longitudinal data models, and these authors also bring up the importance of testing this structure (Qu and Li 2006, p. 381) when they state that “If the working correlation R is misspecified, the estimator of the regression parameter is still consistent, but is not efficient within the same class of estimating functions”. Li and Wong (2010) also note, in the realm of longitudinal data models, that CS is one of the most commonly used covariance structures “to model the correlations among the repeated observations from the same subject”, given its simple form and good interpretability, and they suggest that the likelihood ratio test would be an adequate testing procedure for this covariance structure, even in the domain of semi-parametric models.

CS is a very parsimonious covariance structure which, as seen from (1), describes the whole covariance structure with only two parameters. The assumption of this structure may improve estimation and the power of tests, as stated by Vonesh and Chinchilli (1997, Sects. 3.2, 7.3), besides allowing estimation to be carried out adequately with smaller sample sizes, since the covariance matrix is modeled with a smaller number of parameters. Even in nonlinear models, as Malott (1990) states, “by incorporating the compound symmetric structure into the model, substantial improvements in the estimation of the covariance matrix for the parameter estimates are obtained”.

King and Evans (1986) bring up the importance of testing for CS covariance structures when they cite Scott and Holt (1982) as having proved that ignoring such correlation structures may lead “to seriously misleading confidence intervals and hypothesis tests based on inefficient ordinary least squares estimates”. A similar issue is also brought up by Vonesh and Chinchilli (1997, Sect. 7.3) who use the CS structure for linear and also nonlinear models, when these authors say that “ignoring compound symmetry in favor of a general covariance structure leads to significantly inflated Type I errors while correctly assuming compound symmetry leads to improved Type I errors”.

While all these considerations strengthen the need for adequate testing procedures for the CS covariance structure, in all these cases the assumption of the CS structure goes along with the assumption of normality; see also Jones (1993, Sect. 1.5). The likelihood ratio test (LRT) for the CS structure in (1), under the normality assumption, was developed by Wilks (1946).

In the present paper, the authors will address the multivariate or block CS (BCS) structure, where a set of m variables is measured at u time points, and where \({\varvec{\varTheta }}\) may be written as

$$\begin{aligned} {\varvec{\varTheta }} =\left[ \begin{array}{cccc} {\varvec{\varSigma }}_{0} & {\varvec{\varSigma }}_{1} & \ldots & {\varvec{\varSigma }}_{1} \\ {\varvec{\varSigma }}_{1} & {\varvec{\varSigma }}_{0} & \ldots & {\varvec{\varSigma }}_{1} \\ \vdots & & \ddots & \vdots \\ {\varvec{\varSigma }}_{1} & {\varvec{\varSigma }}_{1} & \ldots & {\varvec{\varSigma }}_{0} \end{array}\right] = {\mathbf {I}}_{u}\otimes \left( {\varvec{\varSigma }}_{0}- {\varvec{\varSigma }}_{1}\right) + {\mathbf {J}}_{u}\otimes {\varvec{\varSigma }}_{1}, \end{aligned}$$
(3)

where \({\varvec{\varSigma }}_{0}\) is a positive-definite symmetric \(m\times m\) matrix and \({\varvec{\varSigma }}_{1}\) is a symmetric \(m\times m\) matrix, subject to the constraints \(-\frac{1}{u-1}{\varvec{\varSigma }}_{0}<{\varvec{\varSigma }}_{1}\) and \({\varvec{\varSigma }}_{1}<{\varvec{\varSigma }}_{0}\), meaning that \({\varvec{\varSigma }}_{0}-{\varvec{\varSigma }}_{1}\) and \({\varvec{\varSigma }}_{0}+(u-1){\varvec{\varSigma }}_{1}\) are positive-definite matrices, so that the \(mu\times mu\) matrix \({\varvec{\varTheta }}\) is also positive-definite (for a proof, see Lemma 2.1 of Roy and Leiva (2011)). A BCS structure such as the one in (3) arises whenever m response variables are measured and modeled at each site or time point and one would use, for each single response variable, a CS covariance matrix.

The \({m \times m}\) diagonal blocks \({\varvec{\varSigma }}_{0}\) in \({\varvec{\varTheta }}\) represent the variance–covariance matrix of the m response variables at any given site or time point, whereas the \({m \times m}\) off-diagonal blocks \({\varvec{\varSigma }}_{1}\) in \({\varvec{\varTheta }}\) represent the covariance matrix of the m response variables between any two different sites or time points. \({\varvec{\varSigma }}_{0}\) is assumed to be constant across all sites or time points, and \({\varvec{\varSigma }}_{1}\) is also assumed to be the same for any two different sites or time points.
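
As a hedged illustration of (3) and of Lemma 2.1 of Roy and Leiva (2011), the following sketch (the helper names and the matrices \({\varvec{\varSigma }}_0\) and \({\varvec{\varSigma }}_1\) are our own, purely illustrative, choices) builds \({\varvec{\varTheta }}\) via Kronecker products and checks its positive-definiteness through \({\varvec{\varSigma }}_0-{\varvec{\varSigma }}_1\) and \({\varvec{\varSigma }}_0+(u-1){\varvec{\varSigma }}_1\):

```python
import numpy as np

def bcs_matrix(u, Sigma0, Sigma1):
    """Block compound symmetric matrix of (3):
    I_u kron (Sigma0 - Sigma1) + J_u kron Sigma1."""
    return (np.kron(np.eye(u), Sigma0 - Sigma1)
            + np.kron(np.ones((u, u)), Sigma1))

def bcs_is_pd(u, Sigma0, Sigma1):
    """Lemma 2.1 of Roy and Leiva (2011): Theta is positive-definite iff
    Delta1 = Sigma0 - Sigma1 and Delta2 = Sigma0 + (u-1) Sigma1 are."""
    return (np.linalg.eigvalsh(Sigma0 - Sigma1).min() > 0
            and np.linalg.eigvalsh(Sigma0 + (u - 1) * Sigma1).min() > 0)

u = 3
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])   # arbitrary illustrative choice
Sigma1 = np.array([[0.4, 0.1], [0.1, 0.3]])
Theta = bcs_matrix(u, Sigma0, Sigma1)
print(bcs_is_pd(u, Sigma0, Sigma1),
      np.linalg.eigvalsh(Theta).min() > 0)    # True True
```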

If \(Y_{tj}\) \({(t=1,\ldots ,u;j=1,\ldots ,m)}\) denotes the j-th variable measured on site or time t, once the BCS structure is assumed, we will have

$$\begin{aligned} \mathrm{Var}(Y_{tj})=\mathrm{Var}(Y_{sj})\quad \mathrm{and}\quad \mathrm{Cov}(Y_{tj},Y_{tk})=\mathrm{Cov}(Y_{sj},Y_{sk}) \end{aligned}$$

for all \({t, s\in \{1,\ldots ,u\}}\) and \({j, k\in \{1,\ldots ,m\}}\), that is,

$$\begin{aligned} \mathrm{Var}(\underline{Y}_t)=\mathrm{Var}(\underline{Y}_s)={\varvec{\varSigma }}_{0}\quad {\text { for all}}\quad {t, s\in \{1,\ldots ,u\}} \end{aligned}$$

where \(\mathrm{Var}(\underline{Y}_t)\) denotes the covariance matrix for the subvector \(\underline{Y}_t=[Y_{t1},\ldots ,Y_{tm}]'\) \({(t=1,\ldots ,u)}\), and also

$$\begin{aligned} \mathrm{Cov}(Y_{tj},Y_{sk})=\mathrm{Cov}(Y_{tk},Y_{sj})\quad {\text {for any}}\quad t, s\in \{1,\ldots ,u\}\quad {\text {and}}\quad j, k\in \{1,\ldots ,m\}, \end{aligned}$$

or, equivalently,

$$\begin{aligned} \mathrm{Cov}(\underline{Y}_t,\underline{Y}_s)={\varvec{\varSigma }}_{1},\quad {\text {for any}}\quad t, s\in \{1,\ldots ,u\},\quad {\text {with}}\quad t\ne s, \end{aligned}$$

where \({\varvec{\varSigma }}_{1}\) is a symmetric matrix.

Examples of multivariate models with this covariance structure are the multivariate repeated measurement or growth curve models used by Reinsel (1982) and the models used by Arnold (1979), Timm (1980) and Timm (2002, Sect. 6.5), Roy (2006) and Roy and Fonseca (2012).

As such, the need to test for the BCS structure arises in many situations, namely those in which it is assumed as a structure for the covariance matrices involved in further analyses, as in much biomedical and medical research. Indeed, one has to be very careful when assuming this structure for two-level multivariate data, since an incorrect assumption may result in wrong conclusions. Thus, testing the validity of the BCS structure before assuming it is of vital importance for any statistical analysis, and a few authors have marginally addressed this topic. Timm (2002, Sects. 6.5, 6.6), following Krishnaiah and Lee (1974, 1980), takes BCS as a particular case of the so-called linear structure, where \({\varvec{\varTheta }}\) can be written as

$$\begin{aligned} {\varvec{\varTheta }}=\sum _{i=1}^k \varvec{G_i}\otimes \varvec{\varSigma _i} \end{aligned}$$

where \(\varvec{G_1},\ldots ,\varvec{G_k}\) are known \(u\times u\) matrices which commute, and \(\varvec{\varSigma _1},\ldots ,\varvec{\varSigma _k}\) are unknown \(m\times m\) matrices. Then, he follows the testing procedure outlined by Krishnaiah and Lee (1974, 1980) and ends up recommending a chi-square approximation for the distribution of the LRT statistic. However, although this is a valid result in terms of convergence in distribution, it is of little practical use unless the sample size is huge. As shown, for example, by Coelho et al. (2016), the chi-square approximation only works for quite large sample sizes, and even then only when the overall number of variables involved is rather small. In the present situation, although the BCS covariance structure is quite parsimonious in terms of the number of parameters used to model the whole covariance matrix, we will still be dealing with quite large numbers of variables, so the chi-square approximation would only work for extremely large sample sizes, and even in those cases it would give a much worse approximation than the one obtained in the present paper. Krishnaiah and Lee (1974, 1980) address the test for the BCS structure only in general terms, encompassed in a general testing scheme for the linear structure, and recommend the use of the Box (1949) approximation for the distribution of the LRT statistic, without specifically addressing the test for the BCS structure. Moreover, Coelho and Marques (2012) show how, in situations where the number of variables is moderately large to large, the asymptotic distributions obtained using Box's approximation may give quantiles and p-values which fall quite far from the exact ones, since in these situations such asymptotic distributions commonly are not even proper distributions, with p.d.f.'s and c.d.f.'s that may assume values below zero.

Thus, our goal is to develop an approach which not only yields an easy way to obtain the LRT statistic to test BCS and a full characterization of its exact distribution, but which furthermore allows for very sharp, yet very manageable, near-exact approximations to the distribution of the LRT statistic. All this is done in order to make the test easy to implement in practice, since its practical application has been hindered by the complexity of the exact distribution of its LRT statistic. Moreover, the approach followed, which expresses the overall LRT statistic as the product of the LRT statistic for testing independence of groups of variables and the LRT statistic for testing equality of covariance matrices, also allows for an immediate extension of the results obtained to populations with elliptically contoured distributions. In Chapters 8–10 of Anderson (2003), it is shown that, under the corresponding null hypotheses, the distributions of these two LRT statistics remain the same for normally distributed and for elliptically contoured populations. As such, although the distribution of the BCS LRT statistic is derived under the multivariate normality assumption, both the exact and the near-exact distributions obtained remain valid for elliptically contoured distributions, thus considerably widening the scope of the results obtained.

The remainder of the paper is organized as follows. In Sect. 2, the null hypothesis is formulated in two equivalent ways, the second of which opens the way to an easy derivation of the LRT statistic to test BCS and to two equivalent characterizations of its exact distribution; the second of these characterizations leads to Sect. 3, where sharp near-exact distributions are obtained for this statistic. In Sect. 4, numerical studies are carried out to show how sharp the near-exact distributions developed are, even for very small sample sizes and for large numbers of variables, situations in which the chi-square approximation is shown to perform poorly. In Sect. 5, a simple real-data example illustrates how the near-exact approximations may be used, and a simulation study shows that simulating the Beta random variables involved in the exact distribution of the LRT statistic does not yield sharp p-values and quantiles, even for quite long simulations, given the large number and variety of parameters of those Beta random variables. Finally, in Sect. 6, some conclusions are drawn.

2 Formulation of the hypothesis and the likelihood ratio test

Let us assume that \(\underline{Y}=[\underline{Y}'_1,\ldots ,\underline{Y}'_u]'\sim N({\varvec{\mu }},{\varvec{\varSigma }})\) and that we are interested in testing the hypothesis

$$\begin{aligned} H_0:{\varvec{\varSigma }}={\varvec{\varTheta }}, \end{aligned}$$
(4)

where \({\varvec{\varTheta }}\) is defined in (3), versus the alternative hypothesis that \({\varvec{\varSigma }}\) is only positive-definite.

In Lemma 3.1 by Roy and Fonseca (2012), it is shown that we may write

$$\begin{aligned} {\varvec{\varGamma }} {\varvec{\varTheta }} {\varvec{\varGamma ^\prime }}= \left[ \begin{array}{cc} {\varvec{\Delta }}_{2} &{} {\varvec{0}} \\ {\varvec{0}} &{} {\varvec{I}}_{u-1}\otimes {\varvec{\Delta }}_{1}\\ \end{array} \right] , \end{aligned}$$

where

$$\begin{aligned} {\varvec{\Delta }}_{1}={\varvec{\varSigma }}_{0}-{\varvec{\varSigma }}_{1}, \qquad {\varvec{\Delta }}_{2}={\varvec{\varSigma }}_{0}+\left( u-1\right) {\varvec{\varSigma }}_{1}, \end{aligned}$$

and \({\varvec{\varGamma }}=\underset{u\times u}{{\varvec{C}}^{*\prime }} \otimes {\varvec{I}}_{m}\), with \({\varvec{C}}^*\) an orthogonal Helmert matrix whose first column is proportional to a vector of 1’s.
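
The block-diagonalization above is easy to verify numerically. The sketch below (with a hand-rolled Helmert construction; helmert_t and the example \({\varvec{\varSigma }}_0\), \({\varvec{\varSigma }}_1\) are our own illustrative choices) checks that \({\varvec{\varGamma }}{\varvec{\varTheta }}{\varvec{\varGamma }}'\) is block-diagonal with first block \({\varvec{\Delta }}_2\) and remaining blocks \({\varvec{\Delta }}_1\):

```python
import numpy as np

def helmert_t(u):
    """Orthogonal u x u matrix H whose first row is proportional to a vector
    of 1's; its transpose is the C* of Lemma 3.1, so Gamma = H kron I_m."""
    H = np.zeros((u, u))
    H[0, :] = 1.0 / np.sqrt(u)
    for i in range(1, u):
        H[i, :i] = 1.0 / np.sqrt(i * (i + 1))
        H[i, i] = -i / np.sqrt(i * (i + 1))
    return H

u, m = 3, 2
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative, as before
Sigma1 = np.array([[0.4, 0.1], [0.1, 0.3]])
Theta = (np.kron(np.eye(u), Sigma0 - Sigma1)
         + np.kron(np.ones((u, u)), Sigma1))
Gamma = np.kron(helmert_t(u), np.eye(m))

D = Gamma @ Theta @ Gamma.T
Delta1, Delta2 = Sigma0 - Sigma1, Sigma0 + (u - 1) * Sigma1
expected = np.zeros_like(D)
expected[:m, :m] = Delta2                     # first diagonal block
for t in range(1, u):                         # remaining diagonal blocks
    expected[t*m:(t+1)*m, t*m:(t+1)*m] = Delta1
print(np.allclose(D, expected))               # True
```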

Since \({\varvec{\varGamma }}\) is not a function of either \({\varvec{\varSigma }}_0\), or \({\varvec{\varSigma }}_1\), to test \(H_0\) in (4) is equivalent to test

$$\begin{aligned} H_0:{\varvec{\varSigma }}^{*}={\varvec{\varOmega }} \end{aligned}$$
(5)

where

$$\begin{aligned} {\varvec{\varSigma }}^{*}={\varvec{\varGamma }} {\varvec{\varSigma }} {\varvec{\varGamma }}^{\prime } ~~~~~~\mathrm{and}~~~~~~ {\varvec{\varOmega }}={\varvec{\varGamma }} {\varvec{\varTheta }} {\varvec{\varGamma }}^{\prime }. \end{aligned}$$

The null hypothesis in (5) may be split as

$$\begin{aligned} H_0\equiv H_{0b|a}\circ H_{0a}, \end{aligned}$$

where '\(\circ \)' means 'after', and where

$$\begin{aligned} H_{0a}:{\varvec{\varSigma }}^*={\text {block-diag}}({\varvec{\varSigma }}^*_i,\,i=1,\ldots ,u), \end{aligned}$$
(6)

is the hypothesis of independence of the u diagonal blocks of size \(m\times m\) of \({\varvec{\varSigma }}^*\), and

$$\begin{aligned} H_{0b|a}:{\varvec{\varSigma }}^*_2=\cdots ={\varvec{\varSigma }}^*_u,\quad {\text {assuming }}H_{0a}, \end{aligned}$$
(7)

is the null hypothesis corresponding to the test of equality of the \({u-1}\) covariance matrices \({\varvec{\varSigma }}^*_{2},\ldots ,{\varvec{\varSigma }}^*_{u}\), assuming \(H_{0a}\).

The LRT statistic to test \(H_{0a}\) in (6) is, for a sample of size n, given by Anderson (2003, Sect. 9.2) as

$$\begin{aligned} \varLambda _a=\left( \frac{|{\varvec{A}}|}{\prod _{j=1}^{u} |{\varvec{A}}_j|}\right) ^{n/2}, \end{aligned}$$

where \({{\varvec{A}}={\varvec{\varGamma }}{{\varvec{\hat{\varSigma }}}}{\varvec{\varGamma }}'}\) is the maximum likelihood estimator of \({\varvec{\varSigma }}^*\), and \({\varvec{A}}_j\) its \(m\times m\) j-th diagonal block (\({\varvec{\hat{\varSigma }}}\) being the maximum likelihood estimator of \({\varvec{\varSigma }}\)).

The LRT statistic to test \(H_{0b|a}\) in (7) is (Anderson 2003, Sect. 10.2)

$$\begin{aligned} \varLambda _b=\left( (u-1)^{m(u-1)}\frac{\prod _{j=2}^{u}|{\varvec{A}}_{j}|}{|{\varvec{A}}^*|^{u-1}}\right) ^{n/2}, \end{aligned}$$
(8)

where

$$\begin{aligned} {\varvec{A}}^*=\sum _{j=2}^{u} {\varvec{A}}_{j}. \end{aligned}$$

Then, the LRT statistic to test \(H_0\) in (5) will be

$$\begin{aligned} \varLambda = \varLambda _a\varLambda _b = \left( (u-1)^{m(u-1)}\frac{|{\varvec{A}}|}{|{\varvec{A}}_1| |{\varvec{A}}^*|^{u-1}}\right) ^{n/2}, \end{aligned}$$
(9)

with the h-th moment of \(\varLambda \), under \(H_0\) in (4) or (5), given by

$$\begin{aligned} E\left( \varLambda ^h\right) =E\left( \varLambda _a^h\right) E\left( \varLambda _b^h\right) , \end{aligned}$$
(10)

since, under \(H_{0a}\), \(\varLambda _a\) is independent of \({\varvec{A}}_1,\ldots ,{\varvec{A}}_u\) (Marques and Coelho 2012; Coelho and Marques 2013), which makes \(\varLambda _a\) independent of \(\varLambda _b\), given that the latter is a function only of \({\varvec{A}}_2,\ldots ,{\varvec{A}}_u\). Since the range of \(\varLambda \) is bounded, from this expression for the h-th moment of \(\varLambda \) under \(H_0\) in (4) or (5) we will be able to obtain a characterization of the distribution of \(\varLambda \) under this null hypothesis; the second version of this characterization, obtained at the end of this section, will then enable us to obtain, in the next section, very sharp near-exact distributions for \(\varLambda \).
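
For readers who wish to reproduce the statistic, the following sketch computes \(\varLambda \) in (9) from a data matrix; the function names and the column-layout convention are our own assumptions, and helmert_t repeats the Helmert construction sketched in Sect. 2:

```python
import numpy as np

def helmert_t(u):
    # as before: orthogonal u x u matrix, first row proportional to 1'
    H = np.zeros((u, u))
    H[0, :] = 1.0 / np.sqrt(u)
    for i in range(1, u):
        H[i, :i] = 1.0 / np.sqrt(i * (i + 1))
        H[i, i] = -i / np.sqrt(i * (i + 1))
    return H

def bcs_lrt_lambda(Y, u, m):
    """LRT statistic Lambda of (9) for an n x (m*u) data matrix Y whose
    columns t*m, ..., t*m + m - 1 hold the m variables of site/time t."""
    n = Y.shape[0]
    Sigma_hat = np.cov(Y, rowvar=False, bias=True)      # MLE: divisor n
    Gamma = np.kron(helmert_t(u), np.eye(m))
    A = Gamma @ Sigma_hat @ Gamma.T                     # A = Gamma Sigma_hat Gamma'
    blocks = [A[t*m:(t+1)*m, t*m:(t+1)*m] for t in range(u)]
    A_star = sum(blocks[1:])                            # A* = sum_{j>=2} A_j
    ld = lambda M: np.linalg.slogdet(M)[1]              # log |M|
    log_lam = (n / 2) * (m * (u - 1) * np.log(u - 1)
                         + ld(A) - ld(blocks[0]) - (u - 1) * ld(A_star))
    return np.exp(log_lam)
```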

In (10) we have, under \(H_{0a}\) in (6), (Marques et al. 2011)

(11)

where

$$\begin{aligned} k^*=\left\{ \begin{array}{ll} \displaystyle \left\lfloor u/2\right\rfloor , &{} \quad m {\text { odd}}\\ 0, &{}\quad m {\text { even}}, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} r_j=\left\{ \begin{array}{ll} h_{j-2}+(-1)^j k^*, &{}\quad j=3,4\\ r_{j-2}+h_{j-2}, &{} \quad j=5,\ldots ,mu \end{array} \right. \end{aligned}$$
(12)

with

$$\begin{aligned} h_j=\left\{ \begin{array}{cl} uv-1, &{}\quad j=1,\ldots ,m\\ -1, &{}\quad j=m+1,\ldots ,mu-2, \end{array} \right. \end{aligned}$$
(13)

while for \(\varLambda _b\) we have, under \(H_{0b|a}\) in (7),

(14)

where \(s_j\) \({(j=2,\ldots ,m)}\) are given in Appendix 1 and where is the remainder of the integer division of m by 2.

Since the supports of \(\varLambda _a\) and \(\varLambda _b\) are bounded, their distributions are determined by their moments and, as such, from the first expression in (11) we may write, under \(H_{0a}\),

$$\begin{aligned} \varLambda _a\sim \prod _{k=1}^{u-1}\prod _{j=1}^m (X_{jk})^{\frac{n}{2}},\quad \mathrm{with}\quad X_{jk}\sim Beta\left( \frac{n-(u-k)m-j}{2},\frac{(u-k)m}{2}\right) , \end{aligned}$$
(15)

where \(X_{jk}\) \({(j=1,\ldots ,m;k=1,\ldots ,u-1)}\) are independent, while from the first expression in (14), under \(H_{0b|a}\),

$$\begin{aligned} \varLambda _b\sim \prod _{j=1}^m\prod _{k=1}^{u-1} (X^*_{jk})^{\frac{n}{2}},\quad \mathrm{with}\quad X^*_{jk}\sim Beta\left( \frac{n-j}{2},\frac{2k+(u-2)j-u}{2}\right) , \end{aligned}$$
(16)

where \(X^*_{jk}\) \({(j=1,\ldots ,m;k=1,\ldots ,u-1)}\) are independent, so that, under \(H_0\) in (4) or (5),

$$\begin{aligned} \varLambda \sim \prod _{j=1}^m\left\{ \left( \prod _{k=1}^{u-1}X_{jk}\right) \left( \prod _{k=1}^{u-1}X^*_{jk}\right) \right\} ^{\frac{n}{2}}, \end{aligned}$$
(17)

where all random variables are independent.
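
The representation in (15)–(17) makes it straightforward to sample from the exact null distribution of \(\varLambda \). The sketch below is our own illustration (note that Beta factors whose second parameter is zero are degenerate at one and must be skipped; for \(u=2\) this removes all the \(X^*_{jk}\), in agreement with \(\varLambda _b\) being identically one in that case, and the shape parameters in (15) require \(n>mu\), as in Table 1, where \(n\ge mu+2\)); as discussed in Sect. 5, such simulated p-values are less precise than the near-exact ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_null_lambda(n, m, u, size=100_000):
    """Draws from the exact null distribution of Lambda via the independent
    Beta factors of (15)-(17)."""
    log_prod = np.zeros(size)
    for j in range(1, m + 1):
        for k in range(1, u):
            a = (n - (u - k) * m - j) / 2               # X_{jk}, see (15)
            log_prod += np.log(rng.beta(a, (u - k) * m / 2, size))
            b = (2 * k + (u - 2) * j - u) / 2           # X*_{jk}, see (16)
            if b > 0:                                   # skip degenerate factors
                log_prod += np.log(rng.beta((n - j) / 2, b, size))
    return np.exp((n / 2) * log_prod)

lam = simulate_null_lambda(n=25, m=3, u=2)
print(np.mean(lam <= 0.0227794))    # approx. 0.28, cf. the example in Sect. 5
```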

On the other hand, based on the results in Appendix 2 and from the second expressions in (11) and (14) we may write, for \(\varLambda _a\),

$$\begin{aligned} \varLambda _a\sim \left( \prod _{j=3}^{mu}e^{-Z_j}\right) \left( \prod _{j=1}^{k^*}(W_j)^{\frac{n}{2}}\right) \end{aligned}$$
(18)

where

$$\begin{aligned} Z_j\sim \varGamma \left( r_j,\frac{n-j}{n}\right) ~~\mathrm{and}~~W_j\sim Beta\left( \frac{n-2}{2},\frac{1}{2}\right) \end{aligned}$$

are all independent r.v.’s (random variables), while for \(\varLambda _b\) it is possible to write

(19)

where

$$\begin{aligned} Z_j^*\sim \varGamma \left( s_j,\frac{n-j}{n}\right) ,~~W_{1jk}^*\sim Beta\left( n+\left\lfloor \frac{k-2j}{u-1}-1\right\rfloor ,\frac{k-2j}{u-1}-\left\lfloor \frac{k-2j}{u-1}\right\rfloor \right) , \end{aligned}$$

and

$$\begin{aligned}&\displaystyle W_{2k}^*\sim Beta\left( \frac{n-m}{2}+\left\lfloor \frac{m(u-1)-u-m+2k}{2(u-1)}\right\rfloor , \frac{m(u-1)-u-m+2k}{2(u-1)}\right. \\&\qquad \qquad \qquad \qquad -\left. \left\lfloor \frac{m(u-1)-u-m+2k}{2(u-1)}\right\rfloor \right) \end{aligned}$$

are all independent r.v.’s.

From (18) and (19), one may thus write, under \(H_0\) in (4) or (5),

(20)

where

$$\begin{aligned} T_j\sim \varGamma \left( \gamma _j,\frac{n-j}{n}\right) ,~~(j=2,\ldots ,mu) \end{aligned}$$

with

$$\begin{aligned} \gamma _j=r_j^++s_j^+ \qquad (j=2,\ldots ,mu), \end{aligned}$$
(21)

where

$$\begin{aligned} r_j^+=\left\{ \begin{array}{ll} 0, & j=2\\ r_j, & j=3,\ldots ,mu \end{array} \right. \qquad \mathrm{and}\qquad s_j^+=\left\{ \begin{array}{ll} s_j, & j=2,\ldots ,m\\ 0, & j=m+1,\ldots ,mu \end{array} \right. \end{aligned}$$
(22)

where \(r_j\) are given by (12) and (13), \(s_j\) are given by (28)–(32) in Appendix 1, and all the other variables are defined as above.

The form of the distribution of \(\varLambda \) in (20), although it may look more complicated than the one in (17), is more useful for the development of the near-exact distributions, as will be shown in the next section.

It should also be brought to the attention of the reader that, given the results stated at the end of Chapters 8–10 of Anderson (2003), the form of the distribution of \(\varLambda \) in (20) remains valid in case we consider for \(\underline{Y}\) any elliptically contoured distribution.

3 The characteristic function of \(W=-\log \,\varLambda \) and the near-exact approximation

From the developments in the previous section and the expression for \(E(\varLambda ^h)\), the characteristic function (c.f.) of \({W=-\log \,\varLambda }\) may be written as

(23)

where \(\gamma _j\) is given by (21) and \(\varPhi ^{}_{a,2}(\,\cdot \,)\) and \(\varPhi ^{}_{b,2}(\,\cdot \,)\) are defined in (11) and (14), while \(\varPhi ^{}_{W,1}(t)\) is actually equal to \(\varPhi ^{}_{a,1}(-{\mathrm {i}}t)\varPhi ^{}_{b,1}(-{\mathrm {i}}t)\), with these two functions also defined in (11) and (14).

Then, in building the near-exact distributions, \(\varPhi ^{}_{W,1}(t)\) will be kept untouched while \(\varPhi ^{}_{W,2}(t)\) will be asymptotically approximated by the c.f. of a finite mixture of Gamma distributions.

While \(\varPhi ^{}_{W,1}(t)\) is the c.f. of a GIG (Generalized Integer Gamma) distribution (Coelho 1998) of depth \(mu-1\), which is the distribution of the sum of \(mu-1\) independent Gamma distributed random variables, all with integer shape parameters, \(\varPhi ^{}_{W,2}(t)\) is the c.f. of a sum of independent Logbeta distributed random variables. For \({u=2}\) and even m, \(\varPhi ^{}_{W,1}(t)\) indeed yields the exact c.f. of W, which means that in this case we have the exact p.d.f. and c.d.f. of W and \(\varLambda \) in simple closed form: for W, in the form of the p.d.f. and c.d.f. of a GIG distribution of depth \(2m-1\), with shape parameters \(\gamma _j\) given by (21) and rate parameters \((n-j)/n\) \({(j=2,\ldots ,2m)}\), and, for \(\varLambda \), in the form of the p.d.f. and c.d.f. of an EGIG (Exponentiated Generalized Integer Gamma) distribution (Arnold et al. 2013).

Based on the results in Sects. 5 and 6 of Tricomi and Erdélyi (1951), which show that the c.f. of a Logbeta(a, b) distribution may be asymptotically approximated by the c.f. of an infinite mixture of \(\varGamma (b+j,a)\) \({(j=0,1,\ldots )}\) distributions, we will replace \(\varPhi ^{}_{W,2}(t)\) by

$$\begin{aligned} \varPhi ^{}_2(t)=\sum _{k=0}^{m^*} \pi _k\, \lambda ^{r+k}(\lambda -{\mathrm {i}}t)^{-(r+k)}, \end{aligned}$$
(24)

which is the c.f. of a finite mixture of Gamma distributions, all with the same rate parameter \(\lambda \). See Appendix 3 for further details on the approximation of \(\varPhi _{W,2}(t)\) by \(\varPhi _2(t)\). In (24), \(\lambda \) will be taken to be the rate parameter in

$$\begin{aligned} \varPhi ^*(t)=\theta \lambda ^{\tau _1}(\lambda -{\mathrm {i}}t)^{-\tau _1}+(1-\theta )\lambda ^{\tau _2}(\lambda -{\mathrm {i}}t)^{-\tau _2} \end{aligned}$$

where \(\theta \), \(\lambda \), \(\tau _1\) and \(\tau _2\) are determined in such a way that

$$\begin{aligned} \left. \frac{\partial ^h}{\partial t^h}\varPhi ^*(t)\right| _{t=0}=\left. \frac{\partial ^h}{\partial t^h}\varPhi ^{}_{W,2}(t)\right| _{t=0},\quad h=1,\ldots ,4, \end{aligned}$$

and

$$\begin{aligned} r= & {} \displaystyle \frac{k^*}{2}+\sum _{j=1}^{\lfloor m/2\rfloor }\sum _{k=1}^{u-1}\frac{k-2j}{u-1}-\left\lfloor \frac{k-2j}{u-1}\right\rfloor \nonumber \\&\displaystyle +\sum _{k=1}^{u-1}\frac{m(u-1)-u-m+2k}{2(u-1)}-\left\lfloor \frac{m(u-1)-u-m+2k}{2(u-1)}\right\rfloor \nonumber \\= & {} \displaystyle \left\{ \begin{array}{ll}\frac{m}{4}(u-2), &{} \quad ~m\quad {\text {even}}\\ \left\lfloor \frac{u}{2}\right\rfloor +\frac{m+1}{4}(u-2), &{} \quad ~m \quad {\text {odd}} \end{array}\right. \quad (u\ge 2), \end{aligned}$$
(25)

which is the sum of the second parameters of all the Beta r.v.’s in (20). Then, the weights \(\pi _0,\ldots ,\pi _{m^*-1}\) in (24) will be determined in such a way that

$$\begin{aligned} \left. \frac{\partial ^h}{\partial t^h}\varPhi ^{}_2(t)\right| _{t=0}=\left. \frac{\partial ^h}{\partial t^h}\varPhi ^{}_{W,2}(t)\right| _{t=0},\quad h=1,\ldots ,m^*, \end{aligned}$$

with \(\pi _{m^*}=1-\sum _{k=0}^{m^*-1}\pi _k\).
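
Since whether r is an integer decides between the GNIG forms given next and the simpler (E)GIG forms mentioned at the end of this section, a tiny helper implementing the closed form in (25) may be useful (a sketch; the function name is ours):

```python
def r_parameter(m, u):
    """Closed form for r in (25), the sum of the second parameters of all
    the Beta random variables in (20)."""
    if m % 2 == 0:
        return m * (u - 2) / 4
    return u // 2 + (m + 1) * (u - 2) / 4

# r non-integer -> near-exact mixtures of GNIG distributions;
# r integer -> mixtures of the simpler (E)GIG distributions.
print(r_parameter(3, 2), r_parameter(2, 3))   # 1.0 -> (E)GIG; 0.5 -> GNIG
```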

The near-exact distributions built in this way will match the first \(m^*\) exact moments of W and will have c.f.

$$\begin{aligned} \varPhi _W^*(t)=\varPhi ^{}_{W,1}(t)\varPhi ^{}_2(t), \end{aligned}$$
(26)

which, for non-integer r, is the c.f. of a finite mixture, with weights \(\pi _k\) \({(k=0,\ldots ,m^*)}\), of Generalized Near-Integer Gamma (GNIG) distributions of depth mu, with integer shape parameters \(\gamma _j\) given by (21) and (22), non-integer shape parameter r given by (25), and corresponding rate parameters \((n-j)/n\) \({(j=2,\ldots ,mu)}\) and \(\lambda \). See Coelho (2004) and Coelho and Marques (2012, Appendix 1) for the expressions for the p.d.f. and c.d.f. of the GNIG distribution. Using the notation from Appendix 1 in Coelho and Marques (2012), these near-exact distributions yield, for \(W=-\log \,\varLambda \), p.d.f.'s and c.d.f.'s of the form

$$\begin{aligned}&\displaystyle f^{}_{W}(w)=\sum ^{m^*}_{k=0} \pi _k\, f^\mathrm{{\scriptscriptstyle GNIG}}\left( w\,\Bigl |\,\gamma _2,\ldots ,\gamma _{mu},r+k; \frac{n-2}{n},\ldots ,\frac{n-mu}{n},\lambda ;mu\right) ,\\&\displaystyle \qquad (w>0) \end{aligned}$$

and

$$\begin{aligned}&\displaystyle F^{}_{W}(w)=\sum ^{m^*}_{k=0} \pi _k\, F^\mathrm{{\scriptscriptstyle GNIG}}\left( w\,\Bigl |\,\gamma _2,\ldots ,\gamma _{mu},r+k; \frac{n-2}{n},\ldots ,\frac{n-mu}{n},\lambda ;mu\right) ,\\&\displaystyle \quad (w>0), \end{aligned}$$

while the near-exact p.d.f. and c.d.f. for \(\varLambda \) are, respectively, given by

$$\begin{aligned}&\displaystyle f^{}_{\varLambda }(z)=\sum ^{m^*}_{k=0} \pi _k\, f^\mathrm{{\scriptscriptstyle GNIG}}\left( -\log \,z\,\Bigl |\,\gamma _2,\ldots ,\gamma _{mu},r+k; \frac{n-2}{n},\ldots ,\frac{n-mu}{n},\lambda ;mu\right) \frac{1}{z}\\&\quad \displaystyle (0<z<1) \end{aligned}$$

and

$$\begin{aligned}&\displaystyle F^{}_{\varLambda }(z)=\sum ^{m^*}_{k=0} \pi _k \left( 1-F^\mathrm{{\scriptscriptstyle GNIG}}\left( -\log \,z\,\Bigl |\,\gamma _2,\ldots ,\gamma _{mu},r+k; \frac{n-2}{n},\ldots ,\frac{n-mu}{n},\lambda ;mu\right) \right) ,\\&\displaystyle \quad (0<z<1). \end{aligned}$$

For integer r, the above GNIG distributions of depth mu become GIG distributions of depth mu (Coelho 1998; Arnold et al. 2013, App. B), which have even simpler and more manageable expressions, and in this case the near-exact distributions for \(\varLambda \) will be mixtures of what Arnold et al. (2013) call EGIG distributions.

From these near-exact distributions, one can easily compute near-exact p-values and quantiles, as illustrated in Sect. 5, and the results in Sect. 4 assure that these lie extremely close to the exact ones, even for very small sample sizes and very large numbers of variables. As such, even in cases where one may want to compute the power of the LRT for a specific covariance matrix \({\varvec{\varSigma }}\) that somehow violates the null hypothesis of BCS, one will preferably (i) use the near-exact quantile for the null distribution of the LRT statistic for the given values of n, m and u, then simulate at least \(10^5\) or \(10^6\) pseudo-random samples from a multivariate normal distribution with that covariance matrix \({\varvec{\varSigma }}\), compute for each sample the value of the LRT statistic \(\varLambda \) using (9), and take as the simulated value of the power the proportion of cases in which the null hypothesis of BCS is rejected, rather than (ii) use the non-null distribution of \(\varLambda \), which, given the already rather complicated form of the null distribution of \(\varLambda \), would be far too complicated to compute.

It may be noted that for \({m=1}\) this test yields the equivariance–equicorrelation or compound symmetry test in Wilks (1946). Moreover, since, as stated at the end of Sect. 2, the form in (20) for the exact distribution of \(\varLambda \) remains valid when we assume for the underlying variables an elliptically contoured distribution, the near-exact distributions developed in this section also remain valid in this situation.

4 Numerical studies

To assess the performance of the near-exact distributions developed, that is, their closeness to the corresponding exact distribution, we use the measure

$$\begin{aligned} \varDelta =\frac{1}{2\pi }\int _{-\infty }^{+\infty }\left| \frac{\varPhi ^{}_W(t)-\varPhi ^*_W(t)}{t}\right| \mathrm{{d}}t, \end{aligned}$$
(27)

with

$$\begin{aligned} \max _{w>0}\left| F^{}_W(w)-F^*_W(w)\right| =\max _{0<z<1}\left| F^{}_\varLambda (z)-F^*_\varLambda (z)\right| \le \varDelta , \end{aligned}$$

where \(\varPhi ^{}_W(t)\) is the exact c.f. of W in (23), \(\varPhi ^*_W(t)\) is the near-exact c.f. of W in (26), \(F^{}_W(\,\cdot \,)\) and \(F^*_W(\,\cdot \,)\) are the corresponding exact and near-exact c.d.f.'s of W, and \(F^{}_\varLambda (\,\cdot \,)\) and \(F^*_\varLambda (\,\cdot \,)\) are the corresponding c.d.f.'s for \(\varLambda \). That \(\varDelta \) in (27) always yields a finite value is shown in Appendix 4.

Table 1 shows values of the measure \(\varDelta \) for the common chi-square approximation to the distribution of the logarithm of the LRT statistic, which says that \(-2\log \,\varLambda \mathop {\sim }\limits ^{a}\chi ^2_\nu \), with \({\nu = mu(mu+1)/2-m(m+1)}\), and for the near-exact distributions developed in the previous section. In this table, different values of u (number of locations or time points), m (number of variables) and n (sample size) are used, and also different values of \(m^*\), the number of exact moments of W matched by the near-exact distributions.

Table 1 Values of \(\varDelta \) for the chi-square and near-exact distributions, for different values of m and u and sample sizes \(n=mu+2,30,100\)

Values for \(\varDelta \) in Table 1 were computed using the numerical integration module NIntegrate from Mathematica®, version 9, and using \(\varPhi _W(t)\) in (23) and \(\varPhi ^*_W(t)\) in (26) for the near-exact distributions, and

$$\begin{aligned} \varPhi ^*_W(t)=\left( \frac{1}{2}\right) ^{f/2}\left( \frac{1}{2}-{\mathrm {i}}\frac{t}{2}\right) ^{-f/2}=\left( 1-{\mathrm {i}}t\right) ^{-f/2} \end{aligned}$$

with \({f=mu(mu+1)/2-m(m+1)}\) for the chi-square approximation for W. Because of numerical stability issues, \(\varDelta \) in (27) is usually computed by integrating between zero and plus infinity and then multiplying the result by \(1/\pi \). In the cases where the upper limit of plus infinity still gives problems, a finite numerical limit such as \(3\times 10^4\) or \(5\times 10^4\) is used instead, after checking for the stability of the numerical value obtained for the integral.
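
A sketch of this computation in Python, using SciPy's quad in place of Mathematica's NIntegrate, could read as follows (the function name and the toy characteristic functions at the end are our own illustrations, not the c.f.'s of the paper):

```python
import numpy as np
from scipy.integrate import quad

def delta_measure(phi_w, phi_w_star, upper=3e4):
    """Delta of (27), folded onto (0, upper] and multiplied by 1/pi, as
    described above; phi_w and phi_w_star are callables returning the two
    characteristic functions at t."""
    integrand = lambda t: abs((phi_w(t) - phi_w_star(t)) / t) if t else 0.0
    # break points force the adaptive rule to look near the origin, where
    # essentially all of the mass of the integrand lies
    val, _ = quad(integrand, 0.0, upper,
                  points=(1.0, 10.0, 100.0, 1000.0), limit=500)
    return val / np.pi

# toy illustration with two Gamma c.f.'s
phi1 = lambda t: (1 - 1j * t) ** (-4.5)           # Gamma(9/2, rate 1)
phi2 = lambda t: (1 - 1j * t / 0.95) ** (-4.5)    # Gamma(9/2, rate 0.95)
print(delta_measure(phi1, phi2))
```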

As expected, as \(m^*\) increases, the values of \(\varDelta \) for the near-exact distributions decrease markedly, showing an increasing closeness to the exact distribution. We may also see from Table 1 that the near-exact distributions exhibit a very good performance for very small sample sizes, as well as a very good asymptotic behavior not only for increasing sample sizes, but also for increasing values of both u and m, which is a highly desirable feature. For all values of u and m, the values of \(\varDelta \), which are upper bounds on the difference between the exact and the near-exact c.d.f.'s, are extremely low. One may also note that, for larger values of u and m, the asymptotic behavior for increasing n only becomes visible for larger values of n.

From Table 1, it also becomes clear that the chi-square asymptotic distribution may only yield somewhat sensible approximations for very large sample sizes and small numbers of variables, and that the performance of this approximation worsens considerably as the number of variables increases, that is, as either u or m increases. Indeed, the measure \(\varDelta \) in (27) gives a very sharp upper bound on the difference between the exact and the approximate c.d.f.'s when the approximation is rather good, while it may give too large values in the opposite case.

This is the reason why, for the chi-square approximation, we get some values of \(\varDelta \) above one for the smaller sample sizes in a number of the combinations of larger values of u and m, even though the quantity that \(\varDelta \) bounds, the maximum difference between the two c.d.f.'s, always lies between zero and one. This indicates that in these cases the classical chi-square approximation has a really very poor performance.

5 A real-data example and a simulation study

In this section, the authors show how to implement the new hypothesis testing procedure, using the block-diagonalization of the BCS structure resulting from the application of Lemma 3.1 of Roy and Fonseca (2012), with a real data set taken from Johnson and Wichern (2007, p. 43). A researcher measured the mineral content of bones (radius, humerus and ulna) by photon absorptiometry to examine whether dietary supplements would slow bone loss in 25 older women. Measurements were recorded for the three bones on the dominant and non-dominant sides. As such, the data have a two-level multivariate structure, with \({u=2}\) and \({m=3}\). If we rearrange the variables in the data set by grouping together the mineral contents of the dominant sides of the radius, humerus and ulna as a first set of three variables, that is, the variables in the first location (\(t=1\) for the dominant side), and then the mineral contents for the non-dominant sides of the same bones as the second set of three variables (\(t=2\) for the non-dominant side), the resulting maximum likelihood estimate of \({\varvec{\varSigma }}\) is (rounded to five decimal places)

Not only do the sample variance–covariance matrices of the three mineral contents for the dominant and non-dominant sides appear very similar, but the covariance matrix of the mineral content for the three bones between the dominant and non-dominant sides also suggests the possibility of an underlying symmetric population matrix. We may thus hypothesize that the population covariance matrix has a BCS structure.

To carry out the test, according to the procedure outlined in Sect. 2, one needs to compute the matrix

$$\begin{aligned} {\varvec{A}}={{\varvec{\hat{\varSigma }}}}^* ={{\varvec{\varGamma }}}{{\varvec{\hat{\varSigma }}}}{{\varvec{\varGamma }}}', \end{aligned}$$

where, for \({u=2}\),

$$\begin{aligned} {\varvec{\varGamma }}={\varvec{C}}^{*\prime }\otimes {\mathbf {I}}_{3}=\frac{1}{\sqrt{2}}\left[ \begin{array}{rr} 1 & 1 \\ 1 & -1 \end{array} \right] \otimes {\mathbf {I}}_{3}. \end{aligned}$$

Then, from (9), the computed value for \(\varLambda \) is obtained as 0.0227794, for which, using the near-exact distributions developed in Sect. 3, we obtain the p-values in Table 2.

Table 2 p-values from the near-exact approximations for different values of \(m^*\) (the number of exact moments matched) for the hypothesis test on bone mineral data

Table 2 gives each p-value to the number of decimal places that already agree with the p-value corresponding to the next value of \(m^*\). Comparing the p-values for \({m^*=1}\) and \({m^*=2}\), we see that the p-value for \({m^*=1}\) is already accurate to four decimal places. By the way the near-exact distributions are built, the precision of the p-values increases with \(m^*\), the number of exact moments of W matched by the corresponding near-exact distribution. Thus, the null hypothesis that the covariance structure is of the BCS type should not be rejected, with a p-value of 0.2792, which is much lower than the p-value of 0.5786 obtained when the asymptotic \(\chi ^2_\nu \) approximation for \(-2 \log \,\varLambda \), with \(\nu ={mu(mu+1)}/{2}-m(m+1)=9\) degrees of freedom, is used.
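
The asymptotic p-value quoted above is easy to reproduce (a sketch using SciPy; the near-exact p-values in Table 2 require the full machinery of Sect. 3 and are not reproduced here):

```python
import numpy as np
from scipy.stats import chi2

lam, m, u = 0.0227794, 3, 2
nu = m * u * (m * u + 1) // 2 - m * (m + 1)   # = 9 degrees of freedom
print(chi2.sf(-2 * np.log(lam), nu))          # approx. 0.5786
```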

The non-rejection of the BCS structure indicates that the population covariance matrices for the mineral content of the three bones (radius, humerus and ulna) on the dominant and the non-dominant sides may be considered equal, and also that the population covariance matrix for the mineral content between the dominant and the non-dominant sides may be considered a symmetric matrix, with

$$\begin{aligned} \mathrm{Cov}(Y_{1j},Y_{2k})=\mathrm{Cov}(Y_{2j},Y_{1k}),\quad j,k\in \{1,2,3\}. \end{aligned}$$

This means that, for example, the population covariance between the mineral content of the dominant side of the radius and the mineral content of the non-dominant side of humerus is the same as that of the mineral content of the non-dominant side of the radius and the mineral content of the dominant side of the humerus, and that this happens for any pair of two different bones.

In Fig. 1, we have, for \({W=-\log \,\varLambda }\), the plots of the p.d.f.’s and c.d.f.’s for the near-exact distribution for \({m^*=1}\) and for the asymptotic Gamma distribution with shape parameter 9/2 and rate parameter 1, which corresponds to the chi-square asymptotic distribution with nine degrees of freedom for \({-2\log \,\varLambda }\).

Fig. 1

Plots of the p.d.f.'s and c.d.f.'s of the near-exact distribution (for \(m^*=1\)) and of the asymptotic \(\varGamma (9/2,1)\) distribution for \(W=-\log \,\varLambda \) (the latter corresponding to the \(\chi ^2_9\) asymptotic distribution for \({-2\log \,\varLambda }\))

That even p-values obtained from simulation may not be sharp enough was shown by a simulation study in which 100,000 pseudo-random samples with BCS structure for \({u=2}\), \({m=3}\) and \({n=25}\) were generated. The p-value obtained from this simulation study for the computed value \(\varLambda =0.0227794\) was 0.28163, which, compared with the near-exact p-values in Table 2, shows that p-values obtained from simulation, even from quite long simulations, may not be that precise.
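
A sketch of such a data-level simulation is given below (our own illustration; since, by (15)–(17), the null distribution of \(\varLambda \) depends only on n, m and u, any \({\varvec{\varSigma }}_0\) and \({\varvec{\varSigma }}_1\) satisfying the BCS constraints may be used, and the ones chosen here are arbitrary, not the bone-mineral estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
u, m, n, n_rep = 2, 3, 25, 100_000            # lower n_rep for a quick check
Sigma0 = np.eye(m) + 0.3                      # diag 1.3, off-diagonal 0.3
Sigma1 = 0.4 * np.eye(m)                      # satisfies the BCS constraints
Theta = (np.kron(np.eye(u), Sigma0 - Sigma1)
         + np.kron(np.ones((u, u)), Sigma1))
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Helmert, u = 2
Gamma = np.kron(H, np.eye(m))
ld = lambda M: np.linalg.slogdet(M)[1]

lam_obs, hits = 0.0227794, 0
for _ in range(n_rep):
    Y = rng.multivariate_normal(np.zeros(m * u), Theta, size=n)
    A = Gamma @ np.cov(Y, rowvar=False, bias=True) @ Gamma.T
    # for u = 2, (9) reduces to (|A| / (|A_1| |A_2|))^{n/2}
    log_lam = (n / 2) * (ld(A) - ld(A[:m, :m]) - ld(A[m:, m:]))
    hits += np.exp(log_lam) <= lam_obs
print(hits / n_rep)      # approx. 0.28, in line with the 0.28163 above
```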

6 Conclusions

As described in the Introduction, the block compound symmetric (BCS) covariance structure may be used as, or may arise as, the underlying covariance structure for many multivariate models, making the test for a BCS covariance structure a very necessary and desirable one. However, testing for a BCS structure of the covariance matrix may seem at first sight a not so easy task, not only because the likelihood ratio statistic is expected to have a rather complicated derivation, but mainly because its exact distribution is expected to have an extremely complicated structure and expression. In this paper, the authors show how, by using an adequate decomposition of the BCS null hypothesis based on Lemma 3.1 of Roy and Fonseca (2012), it is possible to easily derive the expression for the likelihood ratio statistic and to obtain the expression for its moments under the BCS hypothesis. The approach followed also enabled the derivation of simple expressions for the p.d.f. and c.d.f. of the likelihood ratio statistic in some simpler particular cases, as well as, most importantly, the development of very sharp but highly manageable near-exact distributions for the test statistic, which in turn enable an easy computation of quantiles and p-values. These near-exact distributions lie remarkably close to the exact distribution for very small samples and also show very good asymptotic behavior not only for increasing sample sizes, but also for increasing numbers of variables and of locations or time points. This asymptotic behavior for an increasing number of variables is a highly desirable feature which common asymptotic distributions do not have. The authors also show that the common chi-square asymptotic approximation for \(-2\log \,\varLambda \) may only work in practice for very large sample sizes when the number of variables involved is quite small, and that it may not work at all when the number of variables involved is rather large.

The approach followed in this paper may be extended to address more complicated covariance structures arising for multi-level multivariate data, and it also allows for an immediate extension of the results obtained, in terms of both exact and near-exact distributions, to underlying elliptically contoured distributions.