1 Introduction

In modern experiments, large datasets are typically collected, consisting of measurements of many features observed repeatedly over time, at various locations and depths, and taken from many individuals. This paper deals with doubly multivariate data that can be stored in a three-dimensional tensor of observations of size \(n\times m \times u\), where n is the number of individuals (sample size) and m and u are the numbers of repeated measurements of, for example, different features and locations, respectively. In such experiments, especially in genetics or medicine, the sample size is often too small to estimate all the unknown parameters of the model. One way to avoid this problem is to regularize the estimators; another is to consider a patterned covariance matrix.

One of the common patterns for a doubly multivariate model is the block compound symmetry (BCS) structure, a direct extension of the compound symmetry structure common in multivariate models. BCS was introduced in Rao (1945, 1953), where the problem of discriminating genetically different groups is studied. Note that the BCS structure is common in experiments where the covariance matrix does not change when vectors from different repeated measures are interchanged; hence, in the literature, it is also called an exchangeable structure.

A general method to test means and covariance matrices in models with the BCS covariance structure was proposed in Arnold (1973, 1979) as a way of reducing the number of unknown parameters to be estimated. A similar problem has also been studied in Szatrowski (1976, 1978, 1982). Furthermore, Perlman (1987) showed that if the collected data exhibit symmetries, a more accurate estimate of the covariance matrix can be obtained. More recently, Leiva (2007) formulated the generalized Fisher linear discrimination method under the BCS covariance structure and derived the maximum likelihood estimators of BCS. The estimation of the BCS structure, as well as the circular Toeplitz structure, was considered in Liang et al. (2012), while optimal estimators of the BCS structure were proposed in Roy et al. (2016) and Kozioł et al. (2018). The application of the BCS structure to multivariate interval data problems is shown in Hao et al. (2015). Tests for the mean structure under the model with a BCS covariance matrix were proposed in Zmyślony et al. (2018) and Žežula et al. (2018), while the likelihood ratio test (LRT) and the Rao score test (RST) for the BCS covariance structure were presented in Roy and Leiva (2011), Roy et al. (2018) and Filipiak and Klein (2021). The asymptotic normal distribution of the LRT, under the assumption that the size of each block and the sample size tend to infinity, was presented in Sun and Xie (2020). Very recently, Liang et al. (2021) derived the LRT for testing simultaneously the mean and particular structures of the blocks of the BCS covariance matrix.

The aim of this paper is to test independence between features measured repeatedly, e.g., over time or across locations, under the normal model with the BCS covariance structure. The LRT for such a hypothesis was studied, for example, in Fonseca et al. (2018), where a new, F-distributed test statistic, called the FT statistic here, is also proposed. In the same year, Tsukada (2018) compared the power of the LRT, the modified LRT (using the Bartlett correction), the Wald test (WT) and the RST, but only for selected types of alternative hypotheses. It should be noted that the Wald test statistic is given without proof in Tsukada (2018). In this paper we revise this test and show that the form presented in Tsukada (2018) is not in line with the definition of the Wald test given by Rao (2005). Very recently, Kozioł et al. (2021) gave a review of testing hypotheses under BCS structures and also introduced Roy’s test statistic, which follows the largest root distribution.

The main goals of this paper are to determine the RST and WT statistics and to derive the exact distribution of the LRT. Moreover, using simulation studies, we compare the asymptotic properties of the likelihood-ratio-based tests mentioned above and verify their robustness to non-Gaussian data. Finally, we recall the FT and Roy’s largest root tests and study the power of each considered test. For this purpose, we introduce the entropy loss function (the Kullback-Leibler divergence between two distributions) as a measure of discrepancy between the null and alternative hypotheses. We show that this approach allows the power of the tests to be compared across various alternatives, in contrast to the approach usually considered in the literature, where particular structures of alternative hypotheses are studied; cf. Fonseca et al. (2018), Tsukada (2018). Note, finally, that the presented results can be applied in many areas of science, e.g., genetics, medicine, dietetics, agriculture, physics, image processing or engineering. In this paper we use a real horticultural data example to compare all considered tests.

The paper is organized as follows. In Sect. 2 the model and hypotheses of interest, as well as the maximum likelihood estimators (MLEs) of the unknown parameters, are presented. The RST, WT, LRT, FT and Roy’s test statistics are formulated in Sect. 3, together with their properties: the convergence of the distributions of the RST and WT statistics to the limiting chi-square distribution, the exact distribution of the LRT statistic, and the fact that the distributions of all test statistics do not depend on the true values of the unknown parameters. The powers of all tests are analyzed in Sect. 4. Finally, to illustrate the presented methods, the independence of the petal lengths between any two flowers of Kalanchoe plants is tested in Sect. 5. The article is summarized in the Discussion section.

2 Model and hypothesis

We consider an experiment performed on n individuals in which m features are repeatedly measured u times, where these repeated measurements could be time points, locations, depths, etc. Let \({\mathbf{X}}_i=({\mathbf{x}}^\prime _{i1},\dots ,{\mathbf{x}}^\prime _{iu})^\prime \), \(i = 1,\dots ,n\), be independent and identically distributed um-dimensional vectors of observations, where \({\mathbf{x}}_{ij}\), \(j=1,\dots ,u\), are m-dimensional vectors of measurements of the jth feature on the ith individual (at each of the u repetitions).

A normal matrix model is assumed here, in which the observation vectors for all individuals are placed in rows one below the other, that is,

$$\begin{aligned} {\mathbf{X}}=({\mathbf{X}}_1, {\mathbf{X}}_2, ..., {\mathbf{X}}_n)^{'} \sim N_{n, um}({\mathbf{1}}_n {\varvec{\mu }}', {\mathbf{I}}_n ,{\varvec{\Omega }}), \end{aligned}$$
(1)

where \({\mathbf{1}}_n\) is an n-dimensional vector of ones, \({\varvec{\mu }}\) is a um-dimensional general mean (the same for every individual), \({\mathbf{I}}_n\) is an identity matrix of order n, and \({\varvec{\Omega }}\) is an unknown symmetric positive definite covariance matrix of order um. Model (1) can also be presented in a vectorized form as

$$\begin{aligned} {\text {vec}}{\mathbf{X}} \sim N_{num}(({\mathbf{I}}_{um}\otimes {\mathbf{1}}_n){\varvec{\mu }}, {\varvec{\Omega }} \otimes {\mathbf{I}}_n), \end{aligned}$$

where \({\text {vec}}\) is the operator stacking the columns of a matrix one below the other and \(\otimes \) is a Kronecker product.
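As an illustration, model (1) can be simulated directly, since its rows are independent \(N_{um}({\varvec{\mu }},{\varvec{\Omega }})\) vectors. The following NumPy sketch is ours (the function name and the chosen parameter values are illustrative, not part of the study):

```python
import numpy as np

def sample_model(n, mu, Omega, rng):
    """Draw X ~ N_{n,um}(1_n mu', I_n, Omega): rows are iid N(mu, Omega)."""
    L = np.linalg.cholesky(Omega)              # Omega = L L'
    Z = rng.standard_normal((n, mu.size))      # iid standard normal entries
    return np.outer(np.ones(n), mu) + Z @ L.T  # 1_n mu' + Z L'

rng = np.random.default_rng(0)
u, m, n = 3, 2, 20000
um = u * m
mu = np.arange(um, dtype=float)
A = rng.standard_normal((um, um))
Omega = A @ A.T + um * np.eye(um)              # an arbitrary symmetric p.d. covariance
X = sample_model(n, mu, Omega, rng)
```

For large n, the row mean and the sample covariance of such an X recover \({\varvec{\mu }}\) and \({\varvec{\Omega }}\).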

It is known that if \({\varvec{\Omega }}\) is unstructured, its MLE is of the form

$$\begin{aligned} {\mathbf{S}}= \tfrac{1}{n}{\mathbf{X}}' {\mathbf{Q}}_{n} {\mathbf{X}}, \end{aligned}$$
(2)

where \({\mathbf{Q}}_{n}={\mathbf{I}}_n - \frac{1}{n}{\mathbf{1}}_n{\mathbf{1}}'_n\) is the orthogonal projector onto the orthocomplement of the column space of \({\mathbf{1}}_n\), while an unbiased estimator has the form

$$\begin{aligned} {\mathbf{S}}^{*}= \tfrac{1}{n-1}{\mathbf{X}}' {\mathbf{Q}}_n {\mathbf{X}}. \end{aligned}$$
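Both estimators arise from the same centered cross-product matrix, as in the following NumPy sketch (the function name is ours and the data are illustrative):

```python
import numpy as np

def sample_covariances(X):
    """Return the MLE S = X'Q_n X / n of (2) and the unbiased S* = X'Q_n X / (n-1)."""
    n = X.shape[0]
    Q_n = np.eye(n) - np.ones((n, n)) / n      # centering projector Q_n
    G = X.T @ Q_n @ X
    return G / n, G / (n - 1)

rng = np.random.default_rng(1)
n, um = 12, 6
X = rng.standard_normal((n, um))
S, S_star = sample_covariances(X)
```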

Note that, if um is close to the sample size, both estimators are ill-conditioned. Furthermore, if \(um > n\), the estimators are singular. To avoid these problems, one may impose an appropriate structure on the covariance matrix, which reduces the number of unknown parameters. In this paper we consider the BCS structure

$$\begin{aligned} {\varvec{\Omega }}_{{\text {{BCS}}}}= \left( \begin{array}{cccc} {\varvec{\Gamma }}_0&{}{\varvec{\Gamma }}_1&{}\ldots &{}{\varvec{\Gamma }}_1\\ {\varvec{\Gamma }}_1&{}{\varvec{\Gamma }}_0&{}\ldots &{}{\varvec{\Gamma }}_1\\ \vdots &{} &{}\ddots &{}\vdots \\ {\varvec{\Gamma }}_1&{}{\varvec{\Gamma }}_1&{}\ldots &{}{\varvec{\Gamma }}_0\\ \end{array}\right) ={\mathbf{I}}_u \otimes {\varvec{\Gamma }}_0+({\mathbf{J}}_u-{\mathbf{I}}_u)\otimes {\varvec{\Gamma }}_1:={\varvec{\Gamma }} \end{aligned}$$
(3)

where \({\mathbf{J}}_u={\mathbf{1}}_u{\mathbf{1}}'_u\), with a symmetric positive definite (p.d.) matrix \({\varvec{\Gamma }}_0\) of order m and a symmetric matrix \({\varvec{\Gamma }}_1\) of order m such that \({\varvec{\Gamma }}\) is p.d. The matrix \({\varvec{\Gamma }}_0\) is the variance-covariance matrix of the m features at any given repeated measurement, while \({\varvec{\Gamma }}_1\) is the covariance matrix of the m features between any two repeated measurements.

After reparameterization one obtains an equivalent representation of the BCS structure,

$$\begin{aligned} {\varvec{\Gamma }} = {\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1+ {\mathbf{P}}_u \otimes {\varvec{\Delta }}_2, \end{aligned}$$
(4)

where \({\mathbf{P}}_u =\frac{1}{u} {\mathbf{1}}_u{\mathbf{1}}'_u\) is the orthogonal projector onto the column space of \({\mathbf{1}}_u\) and \({\mathbf{Q}}_u={\mathbf{I}}_u-{\mathbf{P}}_u\). This form is more convenient from a computational point of view. Since \({\mathbf{P}}_u {\mathbf{Q}}_u = {\mathbf{0}}\), to ensure the positive definiteness of \({\varvec{\Gamma }}\) it is enough to assume that \({\varvec{\Delta }}_i\), \(i=1,2\), are symmetric positive definite matrices. The relationship between (3) and (4) can be represented as

$$\begin{aligned} \left\{ \begin{array}{lll} {\varvec{\Delta }}_1&{}=&{}{\varvec{\Gamma }}_0-{\varvec{\Gamma }}_1\\ {\varvec{\Delta }}_2&{}=&{}{\varvec{\Gamma }}_0+(u-1){\varvec{\Gamma }}_1. \end{array} \right. \end{aligned}$$
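The equivalence of parameterizations (3) and (4) is easy to verify numerically; the sketch below (NumPy; function names and the generated matrices are ours, for illustration only) builds \({\varvec{\Gamma }}\) both ways.

```python
import numpy as np

def bcs_from_blocks(G0, G1, u):
    """Parameterization (3): I_u x G0 + (J_u - I_u) x G1 (Kronecker products)."""
    I, J = np.eye(u), np.ones((u, u))
    return np.kron(I, G0) + np.kron(J - I, G1)

def bcs_spectral(D1, D2, u):
    """Parameterization (4): Q_u x D1 + P_u x D2."""
    P = np.ones((u, u)) / u                    # projector onto span(1_u)
    Q = np.eye(u) - P
    return np.kron(Q, D1) + np.kron(P, D2)

u, m = 3, 2
rng = np.random.default_rng(2)
A = rng.standard_normal((m, m))
G0 = A @ A.T + m * np.eye(m)                   # p.d. within-measurement block
G1 = 0.1 * (A + A.T)                           # small symmetric between-measurement block
D1, D2 = G0 - G1, G0 + (u - 1) * G1            # the reparameterization
```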

Note that the BCS structure is also called exchangeable, since the vector \({\mathbf{x}}_{ij}\) can be interchanged with \({\mathbf{x}}_{ij'}\), \(j,j'=1,\dots ,u\), without changing the covariance matrix.

Since the space of BCS structures is a quadratic subspace, that is, the square of any BCS matrix belongs to the same space, cf. Seely (1971), the MLE of \({\varvec{\Gamma }}\) is the projection of \({\mathbf{S}}\) given in (2) onto the space of BCS structures, that is,

$$\begin{aligned} \widehat{{\varvec{\Gamma }}}={\mathbf{Q}}_u\otimes \widehat{{\varvec{\Delta }}}_1+{\mathbf{P}}_u\otimes \widehat{{\varvec{\Delta }}}_2, \end{aligned}$$
(5)

with

$$\begin{aligned} \widehat{{\varvec{\Delta }}}_1=\tfrac{1}{u-1} {\text {BTr}}_m{[({\mathbf{Q}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}]}, \qquad \widehat{{\varvec{\Delta }}}_2={\text {BTr}}_m{[({\mathbf{P}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}]}; \end{aligned}$$
(6)

cf. Filipiak et al. (2020). Alternatively, if \({\varvec{\Gamma }}\) is expressed as (3),

$$\begin{aligned} \widehat{{\varvec{\Gamma }}}_0=\tfrac{1}{u} {\text {BTr}}_m{\mathbf{S}}, \qquad \widehat{{\varvec{\Gamma }}}_1=\tfrac{1}{u(u-1)}({\text {BSum}}_m{\mathbf{S}}-{\text {BTr}}_m{\mathbf{S}}), \end{aligned}$$
(7)

where \({\text {BTr}}_m({\mathbf{A}})=\sum \nolimits _{i=1}^u {\mathbf{A}}_{ii}\) is a block trace operator defined on the partitioned matrix \({\mathbf{A}}=({\mathbf{A}}_{ij})\), \(i,j=1,\dots ,u\), with blocks of order m; cf. Filipiak et al. (2018), and \({\text {BSum}}_m({\mathbf{A}})=\sum \nolimits _{i=1}^u\sum \nolimits _{j=1}^u{\mathbf{A}}_{ij}\).
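The estimators (6) and (7) can be transcribed directly into NumPy (operator names follow the text; the data below are illustrative):

```python
import numpy as np

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def bsum(A, m, u):
    """Block sum: sum of all u*u m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, j*m:(j+1)*m] for i in range(u) for j in range(u))

def bcs_mles(S, m, u):
    """MLEs (6) and (7) of the BCS components from the unstructured MLE S."""
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    G0 = btr(S, m, u) / u
    G1 = (bsum(S, m, u) - btr(S, m, u)) / (u * (u - 1))
    return D1, D2, G0, G1

rng = np.random.default_rng(3)
m, u, n = 2, 3, 15
X = rng.standard_normal((n, u * m))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n                              # unstructured MLE (2)
D1, D2, G0, G1 = bcs_mles(S, m, u)
```

The two parameterizations of \(\widehat{{\varvec{\Gamma }}}\) then agree, mirroring the relationship between (3) and (4).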

We are interested in testing the hypothesis of independence of the features between any two repeated measurements. This means that we test the block diagonality of the covariance matrix, which can be presented as

$$\begin{aligned} H_0: \ {\varvec{\Omega }}={\varvec{\Gamma }} \ \text{ and } \ {\varvec{\Gamma }}_1={\mathbf{0}} \qquad \text{ vs } \quad H_1: \ {\varvec{\Omega }}={\varvec{\Gamma }} \end{aligned}$$
(8)

or, equivalently, using parameterization (4),

$$\begin{aligned} H_0: \ {\varvec{\Delta }}_1={\varvec{\Delta }}_2 \quad \text{ vs } \quad H_1: \ {\varvec{\Omega }}={\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1+ {\mathbf{P}}_u \otimes {\varvec{\Delta }}_2. \end{aligned}$$
(9)

The spectral form of the BCS structure given in (9) leads to simpler algebraic manipulations than parameterization (8), and thus it will usually be the one considered in the forthcoming sections.

Let us denote \({\varvec{\Delta }}_1={\varvec{\Delta }}_2\) in (9) by \({\varvec{\Delta }}\). Then, the null hypothesis can be written as \( {\varvec{\Omega }}={\mathbf{I}}_u \otimes {\varvec{\Delta }}\). Since the space of block diagonal matrices is a quadratic subspace, the MLE of \({\varvec{\Delta }}\) is a projection of \({\mathbf{S}}\) onto the space of block diagonal matrices, that is,

$$\begin{aligned} \widehat{{\varvec{\Delta }}}=\tfrac{1}{u} {\text {BTr}}_m{{\mathbf{S}}}. \end{aligned}$$
(10)
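Since \({\text {BTr}}_m{\mathbf{S}}={\text {BTr}}_m[({\mathbf{Q}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}]+{\text {BTr}}_m[({\mathbf{P}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}]\), the estimator (10) is a weighted combination of the estimators in (6), namely \(\widehat{{\varvec{\Delta }}}=\frac{1}{u}\big ((u-1)\widehat{{\varvec{\Delta }}}_1+\widehat{{\varvec{\Delta }}}_2\big )\). A NumPy sketch (function name ours, data illustrative):

```python
import numpy as np

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

rng = np.random.default_rng(4)
m, u, n = 3, 4, 20
X = rng.standard_normal((n, u * m))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n                              # unstructured MLE (2)
D_hat = btr(S, m, u) / u                       # MLE (10) under the null
```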

The MLEs given in (6) and (10) will be used for determining the test statistics in the next section.

3 Test statistics

In this section we give an overview of the tests for the considered hypothesis and determine the RST and WT statistics. Note that the form of the RST statistic has been stated by Tsukada (2018); in this paper, however, we present an alternative proof of its form. Moreover, in Tsukada (2018) the WT statistic is given without any proof; therefore, in this paper we prove that the WT statistic has a more complex form. We also verify the convergence of the likelihood-ratio-based tests to the limiting chi-square distribution, using the empirical distributions of RST and WT and the exact distribution of the LRT, formulated in Theorem 3 and proved in Appendix C.

Another test for (9), the FT, which follows an F distribution with respective degrees of freedom, was introduced by Fonseca et al. (2018). Its generalization is Roy’s test, which follows the largest root distribution; cf. Mardia et al. (1979). In this paper we recall their forms and verify the robustness of all tests to non-Gaussian data.

We start by determining the RST statistic. The proof of the following theorem can be found in Appendix A.

Theorem 1

Under hypothesis (9) the Rao score test statistic can be expressed as

$$\begin{aligned} {\text {RS}}=\tfrac{n}{2}{\text {tr}}\Big \{ \left[ {\mathbf{I}}_{um}-\widehat{{\varvec{\Gamma }}}({\mathbf{I}}_u\otimes \widehat{{\varvec{\Delta }}})^{-1} \right] ^2 \Big \}, \end{aligned}$$

where \(\widehat{{\varvec{\Gamma }}}\) and \(\widehat{{\varvec{\Delta }}}\) are given in, respectively, (5) and (10).

Denoting the MLEs of the covariance matrix under the alternative and null hypotheses by \(\widehat{{\varvec{\Omega }}}_{H_1}\) and \(\widehat{{\varvec{\Omega }}}_{H_0}\), respectively, we may represent the above RS test statistic as

$$\begin{aligned} {\text {RS}}=\tfrac{n}{2}{\text {tr}}\Big \{ \left[ {\mathbf{I}}_{um}-\widehat{{\varvec{\Omega }}}_{H_1}\widehat{{\varvec{\Omega }}}_{H_0}^{-1} \right] ^2 \Big \}, \end{aligned}$$

which is in line with the RS for testing various covariance structures in Filipiak and Klein (2021).
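Both forms of the RS statistic, as well as the reparameterized form in Corollary 1 below, can be cross-checked numerically. The following NumPy sketch is ours (function names and data are illustrative):

```python
import numpy as np

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def bsum(A, m, u):
    """Block sum: sum of all u*u m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, j*m:(j+1)*m] for i in range(u) for j in range(u))

def rao_score(S, m, u, n):
    """RS = (n/2) tr{[I - Omega_H1 Omega_H0^{-1}]^2} for hypothesis (9)."""
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    D0 = btr(S, m, u) / u                      # MLE (10) under H0
    Gam_hat = np.kron(Q, D1) + np.kron(P, D2)  # MLE (5) under H1
    M = np.eye(u * m) - Gam_hat @ np.kron(np.eye(u), np.linalg.inv(D0))
    return 0.5 * n * np.trace(M @ M)

rng = np.random.default_rng(5)
m, u, n = 2, 3, 25
X = rng.standard_normal((n, u * m))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
RS = rao_score(S, m, u, n)
```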

It is worth noting that under hypothesis (8), which is obviously equivalent to (9), we may formulate the following corollary, which can be proven directly from Theorem 1 by expressing \(\widehat{{\varvec{\Gamma }}}\) and \(\widehat{{\varvec{\Delta }}}\) in terms of \(\widehat{{\varvec{\Gamma }}}_0\) and \(\widehat{{\varvec{\Gamma }}}_1\).

Corollary 1

Under hypothesis (8) the Rao score test statistic can be expressed as

$$\begin{aligned} {\text {RS}}=\tfrac{nu(u-1)}{2}{\text {tr}}\left[ (\widehat{{\varvec{\Gamma }}}_0^{-1}\widehat{{\varvec{\Gamma }}}_1)^2\right] , \end{aligned}$$

where \(\widehat{{\varvec{\Gamma }}}_0\) and \(\widehat{{\varvec{\Gamma }}}_1\) are given in (7).

Note that the RST statistic presented in Corollary 1 can also be expressed as formula (3.20) in Tsukada (2018).

Finally, recall that, due to Rao (2005), under the considered null hypothesis the presented RST statistic is asymptotically (as the sample size \(n\rightarrow \infty \)) \(\chi ^2\) distributed with \(m(m+1)/2\) degrees of freedom. The same limiting distribution applies to the second well-known test, the Wald test, presented in the next theorem, with the proof in Appendix B.

Theorem 2

Under hypothesis (9) the Wald test statistic can be expressed as

$$\begin{aligned} {\text {W}}=\tfrac{n}{2}{\text {vec}}^\prime \left( \widehat{{\varvec{\Delta }}}_1-\widehat{{\varvec{\Delta }}}_2\right) \left[ \frac{1}{u-1}(\widehat{{\varvec{\Delta }}}_1\otimes \widehat{{\varvec{\Delta }}}_1 ) +(\widehat{{\varvec{\Delta }}}_2\otimes \widehat{{\varvec{\Delta }}}_2 ) \right] ^{-1} {\text {vec}}\left( \widehat{{\varvec{\Delta }}}_1-\widehat{{\varvec{\Delta }}}_2\right) \end{aligned}$$

with \(\widehat{{\varvec{\Delta }}}_1\) and \(\widehat{{\varvec{\Delta }}}_2\) given in (6).
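A NumPy transcription of Theorem 2 (function names ours, data illustrative); note that W vanishes whenever \(\widehat{{\varvec{\Delta }}}_1=\widehat{{\varvec{\Delta }}}_2\):

```python
import numpy as np

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def wald(S, m, u, n):
    """Wald statistic of Theorem 2, built from the H1 estimates (6)."""
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    d = (D1 - D2).reshape(-1, order='F')       # vec of the difference
    V = np.kron(D1, D1) / (u - 1) + np.kron(D2, D2)
    return 0.5 * n * d @ np.linalg.solve(V, d)

rng = np.random.default_rng(6)
m, u, n = 2, 3, 25
X = rng.standard_normal((n, u * m))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
W = wald(S, m, u, n)
```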

We should note that, to determine the Wald test statistic under hypothesis (8), it is enough to replace the matrices \({\varvec{\Delta }}_1\), \({\varvec{\Delta }}_2\) by the respective combinations of \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\); however, the form of W will then be much more complex. It is also possible to determine W starting directly from null hypothesis (8); nevertheless, in such a case the Fisher information matrix given in Appendix A cannot be applied directly.

Finally, observe that plugging the MLE of the covariance matrix under the null hypothesis, instead of the one under the alternative, into the form of the WT yields the respective test statistic presented by Tsukada (2018). Note, however, that such an approach is not in line with the definition of the Wald test given by Rao (2005).

The Rao score test is based on the MLE of the vector of parameters under the null hypothesis, while the Wald test uses the MLE under the alternative. The third test of Rao’s “holy trinity” is the likelihood ratio test, based on a comparison of the MLEs under the null and alternative hypotheses. When testing (9), the likelihood ratio \(\Lambda \) has the form

$$\begin{aligned} \Lambda = \left( \frac{ \mid \widehat{{\varvec{\Delta }}}_1\mid ^{u-1} \mid \widehat{{\varvec{\Delta }}}_2\mid }{ \mid \widehat{{\varvec{\Delta }}}\mid ^u } \right) ^{n/2}. \end{aligned}$$
(11)
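Numerically, (11) is best evaluated via log-determinants; a NumPy sketch (function names ours, data illustrative), together with the asymptotic chi-square p-value with \(m(m+1)/2\) degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def lr_statistic(S, m, u, n):
    """LR = -2 ln Lambda with Lambda from (11), via log-determinants for stability."""
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    D0 = btr(S, m, u) / u
    logdet = lambda A: np.linalg.slogdet(A)[1]
    log_lambda = 0.5 * n * ((u - 1) * logdet(D1) + logdet(D2) - u * logdet(D0))
    return -2.0 * log_lambda

rng = np.random.default_rng(7)
m, u, n = 2, 3, 30
X = rng.standard_normal((n, u * m))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n
LR = lr_statistic(S, m, u, n)
p_approx = chi2.sf(LR, df=m * (m + 1) // 2)    # asymptotic chi-square p-value
```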

It is well known (Rao 2005) that under the null hypothesis, \({\text {LR}}=-2\ln \Lambda \) is approximately distributed as \(\chi ^2\) with \(m(m+1)/2\) degrees of freedom. It should be noted that if the covariance parameters fall on the boundary of their parameter space, then the asymptotic distribution of \({\text {LR}}\) becomes a mixture of \(\chi ^2\) distributions, as discussed in Self and Liang (1987). Instead of an approximate distribution, which works well only for relatively large sample sizes, one can use the exact distribution of the LR presented in the following theorem, with the proof given in Appendix C.

Theorem 3

The characteristic function of \({\text {LR}}=-2\ln \Lambda \), with \(\Lambda \) being the likelihood ratio test statistic given in (11), is of the form

$$\begin{aligned} \begin{array}{rcl} \varphi (t)&{}=&{} \displaystyle \frac{u^{-numit}}{(u-1)^{-n(u-1)mit}} \cdot \\ &{}&{}\displaystyle \prod _{j=1}^m \textstyle \left[ \frac{\Gamma \left( \frac{(n-1)(u-1)+1-j}{2}-itn(u-1)\right) }{\Gamma \left( \frac{(n-1)(u-1)+1-j}{2}\right) }\cdot \frac{\Gamma \left( \frac{n-j}{2}-itn\right) }{\Gamma \left( \frac{n-j}{2}\right) }\cdot \frac{\Gamma \left( \frac{(n-1)u+1-j}{2}\right) }{\Gamma \left( \frac{(n-1)u+1-j}{2}-itnu\right) } \right] . \end{array} \end{aligned}$$

The exact distribution of the LRT statistic can be computed from the above characteristic function using the R package CharFunToolR, developed in Gajdoš (2018) on the basis of the Matlab package CharFunTool provided in Witkovský (2018).
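For readers without access to these packages, the characteristic function itself is straightforward to evaluate, e.g., with SciPy’s complex log-gamma; the exact distribution can then be recovered by standard numerical inversion, as those packages do. A sketch (function name ours); here \(\varphi (0)=1\) and \(|\varphi (t)|\le 1\) serve as sanity checks:

```python
import numpy as np
from scipy.special import loggamma  # supports complex arguments

def lrt_cf(t, n, m, u):
    """Characteristic function of LR = -2 ln Lambda from Theorem 3, via log-gammas."""
    it = 1j * t
    # log of the prefactor u^{-num*it} / (u-1)^{-n(u-1)m*it}
    val = -n * u * m * it * np.log(u) + n * (u - 1) * m * it * np.log(u - 1)
    for j in range(1, m + 1):
        a = ((n - 1) * (u - 1) + 1 - j) / 2
        b = (n - j) / 2
        c = ((n - 1) * u + 1 - j) / 2
        val += loggamma(a - it * n * (u - 1)) - loggamma(a)
        val += loggamma(b - it * n) - loggamma(b)
        val += loggamma(c) - loggamma(c - it * n * u)
    return np.exp(val)
```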

We should also mention that some modifications of the LRT are studied in the literature. One example is the multiplication of the LR statistic by a constant equal to \(1-(u^2-u+1)(2m^2+3m-1)/(6(n-1)u(u-1)(m+1))\); cf. Tsukada (2018). By the general theory of asymptotic expansions in Anderson (2003), such a modified test statistic converges faster to the respective \(\chi ^2\) distribution. Nevertheless, since in this paper we give the exact distribution of the LRT, we do not consider this modification as a separate test.

Finally, note that an important advantage of all the presented test statistics is the following property, with the proof given in Appendix D.

Proposition 1

The distributions of RST, WT and LRT statistics under the null hypothesis in (9) do not depend on the true values of \({\varvec{\mu }}\) and \({\varvec{\Delta }}\).

Using simulations, we now compare the behavior of the empirical distributions of the RST and WT statistics with respect to their convergence to the limiting distribution, and we collate them with the exact distribution of the LRT.

Recall that all proposed tests can be performed if \(n>m\). Moreover, as \(n\rightarrow \infty \), the distributions of the RS, W and LR test statistics tend to the \(\chi ^2\) distribution with \(m(m+1)/2\) degrees of freedom. Thus, in Figs. 1, 2, 3, 4 and 5 we present the empirical null distributions of RST and WT, the exact distribution of LRT, and the limiting \(\chi ^2\) distribution with respect to the sample size, for \(u=3\) with \(m\in \{3,6,9\}\) and for \(m=3\) with \(u\in \{6,9\}\). It can be seen that the distributions of all test statistics tend to the limiting distribution as n increases; however, the convergence of the RST is the quickest, and even for relatively small sample sizes its distribution does not differ significantly from the limiting one, which is not the case for the WT or the LRT.

Fig. 1

Empirical null distribution of RST (blue) and WT (red) and exact distribution of LRT (green) along with the \(\chi ^2_6\) distribution (black dashed) for \(m=3, u=3\). (Colour figure online)

Fig. 2

Empirical null distribution of RST (blue) and WT (red) and exact distribution of LRT (green) along with the \(\chi ^2_6\) distribution (black dashed) for \(m=3, u=6\). (Colour figure online)

Fig. 3

Empirical null distribution of RST (blue) and WT (red) and exact distribution of LRT (green) along with the \(\chi ^2_6\) distribution (black dashed) for \(m=3, u=9\). (Colour figure online)

Fig. 4

Empirical null distribution of RST (blue) and WT (red) and exact distribution of LRT (green) along with the \(\chi ^2_{21}\) distribution (black dashed) for \(m=6, u=3\). (Colour figure online)

Fig. 5

Empirical null distribution of RST (blue) and WT (red) statistics and exact distribution of LRT (green) statistic along with the \(\chi ^2_{45}\) distribution (black dashed) for \(m=9, u=3\). (Colour figure online)

The next two tests, the FT and Roy’s test, are based on unbiased estimators of the unknown parameters instead of MLEs. Roy et al. (2016) presented such unbiased estimators of \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) in terms of multiple sums of vector products. Recall that, since the space of BCS structures is a quadratic subspace, these estimators can also be obtained by projecting the sample covariance matrix \({\mathbf{S}}^*\) onto the space of BCS matrices; cf. Filipiak et al. (2020). Thus, the unbiased estimators given in Roy et al. (2016) can also be represented as

$$\begin{aligned} \widetilde{{\varvec{\Delta }}}_1=\tfrac{1}{u-1} {\text {BTr}}_m{[({\mathbf{Q}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}^{*}]},\qquad \widetilde{{\varvec{\Delta }}}_2={\text {BTr}}_m{[({\mathbf{P}}_{u}\otimes {\mathbf{I}}_m){\mathbf{S}}^{*}]}. \end{aligned}$$

The FT statistic, introduced by Fonseca et al. (2018), has the form

$$\begin{aligned} \mathrm{F}=\frac{{\mathbf{v}}'\widetilde{{\varvec{\Delta }}}_2{\mathbf{v}}}{{\mathbf{v}}'\widetilde{{\varvec{\Delta }}}_1{\mathbf{v}}}, \end{aligned}$$

and is F distributed with \(n-1\) and \((n-1)(u-1)\) degrees of freedom; cf. Fonseca et al. (2018, Lemma 3.1). Since the unbiased estimators of \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) differ from the respective MLEs only by a constant, we can also represent the FT statistic in terms of MLEs, that is,

$$\begin{aligned} \mathrm{F}=\frac{{\mathbf{v}}'\widehat{{\varvec{\Delta }}}_2{\mathbf{v}}}{{\mathbf{v}}'\widehat{{\varvec{\Delta }}}_1{\mathbf{v}}}. \end{aligned}$$

Noting that

$$\begin{aligned} \widehat{{\varvec{\Delta }}}_1 \sim W_m\left( \tfrac{1}{n(u-1)}{\varvec{\Delta }}_1, (n-1)(u-1) \right) , \quad \widehat{{\varvec{\Delta }}}_2 \sim W_m\left( \tfrac{1}{n}{\varvec{\Delta }}_2, n-1 \right) \end{aligned}$$

are independent (cf. Roy et al. 2015), the F distribution of the above FT statistic also follows.
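A NumPy/SciPy sketch of the FT procedure (function names ours, data illustrative; the choice \({\mathbf{v}}={\mathbf{1}}_m\) follows Fonseca et al. (2018)):

```python
import numpy as np
from scipy.stats import f as f_dist

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def ft_statistic(X, m, u, v):
    """F = v'D2 v / v'D1 v, F-distributed with (n-1, (n-1)(u-1)) df under H0."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    return (v @ D2 @ v) / (v @ D1 @ v)

rng = np.random.default_rng(8)
m, u, n = 3, 3, 10
X = rng.standard_normal((n, u * m))
F = ft_statistic(X, m, u, np.ones(m))          # v = 1_m
p_value = f_dist.sf(F, n - 1, (n - 1) * (u - 1))
```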

Observe that, according to (D.1), the FT statistic can be expressed as

$$\begin{aligned} \mathrm{F}=\frac{{\mathbf{v}}'{\varvec{\Delta }}^{1/2}\widehat{{\varvec{\Upsilon }}}_2{\varvec{\Delta }}^{1/2}{\mathbf{v}}}{{\mathbf{v}}'{\varvec{\Delta }}^{1/2}\widehat{{\varvec{\Upsilon }}}_1{\varvec{\Delta }}^{1/2}{\mathbf{v}}}= \frac{{\mathbf{w}}'\widehat{{\varvec{\Upsilon }}}_2 {\mathbf{w}}}{{\mathbf{w}}'\widehat{{\varvec{\Upsilon }}}_1{\mathbf{w}}}, \end{aligned}$$

where \(\widehat{{\varvec{\Upsilon }}}_1\) and \(\widehat{{\varvec{\Upsilon }}}_2\) are given in (D.2). Thus, even though under the null hypothesis the distribution of the above FT statistic does not depend on the true value of \({\varvec{\Delta }}\), the vector \({\mathbf{v}}\) should be chosen appropriately. Note that if \({\mathbf{v}}\) is a column of the identity matrix, then in fact the hypothesis that a specific entry of the covariance matrix equals zero is tested, which coincides with (9) only if \(m=1\). Furthermore, if \({\mathbf{v}}={\mathbf{1}}_m\) (as assumed in, e.g., Fonseca et al. (2018)), then the hypothesis that the sum of all elements of \({\varvec{\Gamma }}_1\) equals zero is tested. Thus, the proposed test statistic is appropriate for testing (9) if all the entries of \({\varvec{\Gamma }}_1\) are of the same sign (or some of them, but not all, are zeros). In fact, choosing \({\mathbf{v}}\) as a vector with nonnegative (nonpositive) components corresponds to testing the value of a weighted sum of the elements of \({\varvec{\Gamma }}_1\). In conclusion, for testing (9) it would be natural not to fix a single \({\mathbf{v}}\), but to choose an optimal vector for the quadratic forms in FT.

To achieve higher power it is natural to choose a vector \({\mathbf{v}}\) maximizing the value of the test statistic. Maximizing \(\mathrm{F}\) over all \({\mathbf{v}}\in {\mathbb {R}}^m\) we obtain the test statistic \(\mathrm{F}_m=\lambda _{max}\left( \widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\right) \); cf. Kozioł et al. (2021), where \(\lambda _{max}(\cdot )\) is the largest eigenvalue of the matrix in parentheses. The inference can be made using Roy’s test statistic of the form

$$\begin{aligned} \mathrm{R}=\frac{\tfrac{1}{u-1}\mathrm{F}_m}{1+\tfrac{1}{u-1}\mathrm{F}_m}, \end{aligned}$$

which follows the largest root distribution with parameters m, \((n-1)(u-1)\), and \(n-1\); for more details see, e.g., Mardia et al. (1979). For simplicity, we abbreviate this distribution as RLR (Roy’s largest root).

Notice that the vector \({\mathbf{v}}\) in \(\mathrm{F}_m\) is the eigenvector corresponding to the largest eigenvalue of \(\widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\); thus it is no longer fixed, but depends on the data. As a result, as mentioned in Kozioł et al. (2021), Roy’s test does not necessarily have higher power than the F-test; it is advantageous only when the largest eigenvalue is substantially larger than the remaining ones.
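A sketch of \(\mathrm{F}_m\) and \(\mathrm{R}\) in NumPy (function names ours, data illustrative); by construction \(\mathrm{R}\in (0,1)\):

```python
import numpy as np

def btr(A, m, u):
    """Block trace: sum of the u diagonal m-by-m blocks of A."""
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def roy_statistic(X, m, u):
    """R = (F_m/(u-1)) / (1 + F_m/(u-1)), with F_m the largest eigenvalue of D2 D1^{-1}."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    P = np.ones((u, u)) / u
    Q = np.eye(u) - P
    Im = np.eye(m)
    D1 = btr(np.kron(Q, Im) @ S, m, u) / (u - 1)
    D2 = btr(np.kron(P, Im) @ S, m, u)
    # eigenvalues of a product of two p.d. matrices are real and positive
    Fm = np.max(np.real(np.linalg.eigvals(D2 @ np.linalg.inv(D1))))
    return (Fm / (u - 1)) / (1 + Fm / (u - 1))

rng = np.random.default_rng(9)
m, u, n = 3, 3, 12
R = roy_statistic(rng.standard_normal((n, u * m)), m, u)
```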

In order to check the robustness of the considered tests with respect to perturbations from normality, for various combinations of m, u and n, we generated data from the following non-Gaussian distributions: the multivariate \(t_3\) and \(t_5\), the gamma distribution with parameters (2, 1.5), and the uniform distribution on the interval (0, 1). The results for \(m=9\), \(u=3\) and \(n\in \{10,25\}\) are given in Fig. 6. It can be seen that for small sample sizes the distributions of all test statistics under non-normality are quite close to the relevant empirical null distributions; however, as the sample size increases, for the considered t distributions the test statistics appear to tend to some distribution other than the chi-square. Similar behavior was observed for other sets of parameters (results not presented in this paper). Therefore, the robustness of the test statistics, especially for multivariate t distributions, will be a topic of future research.

Fig. 6

Empirical null distributions of test statistics (presented in rows) under normality (black dashed), the multivariate \(t_3\) (red), \(t_5\) (green), gamma G(2, 1.5) (blue), and uniform U(0, 1) (purple) distributions, for \(m=9\), \(u=3\), \(n=10\) (left panel), \(n=25\) (right panel). (Colour figure online)

4 Power study of considered tests

For power comparison purposes, Fonseca et al. (2018) considered a covariance structure under the alternative constructed by choosing a block diagonal matrix with \({\varvec{\Gamma }}_0\) on the diagonal and setting a scaled, randomly generated matrix \({\varvec{\Gamma }}_1\) as the off-diagonal blocks, that is,

$$\begin{aligned} {\varvec{\Gamma }}={\mathbf{I}}_u\otimes {\varvec{\Gamma }}_0+ ({\mathbf{J}}_u-{\mathbf{I}}_u)\otimes \lambda \;{\varvec{\Gamma }}_1, \end{aligned}$$

where \(\lambda \) is a parameter ensuring positive definiteness of \({\varvec{\Gamma }}\). Note that for different matrices \({\varvec{\Gamma }}_1\) the parameter \(\lambda \) belongs to different domains (ensuring positive definiteness), and hence the discrepancies \(\mid \!\lambda \!\mid \) are not comparable. Moreover, such an approach allows only very specific types of alternatives (and null hypotheses) to be considered, with the same \({\varvec{\Gamma }}_0\) chosen under both the null and the alternative. Thus, in this paper, as a measure of discrepancy between a given alternative, \({\varvec{\Gamma }}\), and the set of block-diagonal matrices \({\mathbf{I}}_u\otimes {\varvec{\Delta }}\), we minimize the Kullback-Leibler divergence between two distributions that differ in the covariance matrix, that is,

$$\begin{aligned} \begin{array}{rcl} \zeta= & {} \displaystyle \min _{{\varvec{\Delta }}}\left\{ {\text {tr}}\left[ {\varvec{\Gamma }}^{-1} ({\mathbf{I}}_u \otimes {\varvec{\Delta }})\right] -\ln \mid {\varvec{\Gamma }}^{-1}\left( {\mathbf{I}}_u \otimes {\varvec{\Delta }}\right) \mid -um\right\} , \end{array} \end{aligned}$$
(12)

where the symmetric p.d. matrix \({\varvec{\Gamma }}\) has the BCS structure with some given symmetric matrices \({\varvec{\Gamma }}_0\) (p.d.) and \({\varvec{\Gamma }}_1\), while \({\varvec{\Delta }}\) is a symmetric p.d. matrix for which the minimum is attained. Using the same differentiation rules as in Appendix A, it can be shown that the minimum in (12) is attained at

$$\begin{aligned} {\varvec{\Delta }}=\left( \tfrac{u-1}{u}{\varvec{\Delta }}_1^{-1}+\tfrac{1}{u}{\varvec{\Delta }}_2^{-1} \right) ^{-1} \end{aligned}$$
(13)

with \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) being the spectral components of \({\varvec{\Gamma }}\) in (4). It should be noted that (13) determines the block diagonal structure closest to \({\varvec{\Gamma }}\) in the sense of (12), which need not coincide with the diagonal blocks used in the alternative hypothesis. Observe, moreover, that since the value of \(\zeta \) is not bounded above, we use the transformation \(\eta = 1-\frac{1}{1+\zeta }\), which shrinks \(\zeta \) into the interval [0, 1). Note that, in contrast to the method used by Fonseca et al. (2018), for arbitrary randomly generated \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\) the minimizer \({\varvec{\Delta }}\) and the discrepancy \(\eta \) can be determined and compared across alternatives.

Summing up, for various values of u and m, we first randomly generate matrices \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\), for which we determine the discrepancy \(\eta \). For example, for \(u=m=3\), we choose the matrices

$$\begin{aligned} {\varvec{\Gamma }}_0\!=\!\left( \begin{array}{rrr} 88.910 &{} -13.002 &{} 14.855 \\ -13.002 &{} 84.921 &{} 5.285 \\ 14.855 &{} 5.285 &{} 120.934 \end{array} \right) \!\!, \ {\varvec{\Gamma }}_1\!=\! \left( \begin{array}{rrr} 26.195 &{} -0.231 &{} -4.579\\ -0.231 &{} 2.357 &{} -1.647 \\ -4.579 &{} -1.647 &{} 3.495 \end{array} \right) \!\!, \end{aligned}$$
(14)

for which the minimum in (12) is attained at

$$\begin{aligned}{\varvec{\Delta }}= \left( \begin{array}{rrr} 75.995 &{} -13.2497 &{} 17.5422 \\ -13.2497 &{} 84.7433 &{} 5.49956 \\ 17.5422&{} 5.49956 &{} 120.196 \end{array} \right) \end{aligned}$$

giving \(\eta =0.2012\).
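To make the computation of (12)-(13) concrete, it can be sketched in NumPy. This is a sketch under our assumptions: the function names are ours, and we take \({\varvec{\Delta }}_1={\varvec{\Gamma }}_0-{\varvec{\Gamma }}_1\) and \({\varvec{\Delta }}_2={\varvec{\Gamma }}_0+(u-1){\varvec{\Gamma }}_1\) as the spectral blocks referred to in (2).

```python
import numpy as np

def bcs(G0, G1, u):
    """Assemble the BCS matrix: diagonal blocks G0, off-diagonal blocks G1."""
    I, J = np.eye(u), np.ones((u, u))
    return np.kron(I, G0 - G1) + np.kron(J, G1)

def closest_block_diagonal(G0, G1, u):
    """Minimizer (13): Delta = ((u-1)/u * D1^{-1} + 1/u * D2^{-1})^{-1},
    with D1 = G0 - G1 and D2 = G0 + (u-1)*G1 (assumed forms of (2))."""
    inv = np.linalg.inv
    D1, D2 = G0 - G1, G0 + (u - 1) * G1
    return inv((u - 1) / u * inv(D1) + inv(D2) / u)

def discrepancy(G0, G1, u):
    """KL-based discrepancy (12), shrunk to [0, 1) via eta = 1 - 1/(1 + zeta)."""
    m = G0.shape[0]
    D = closest_block_diagonal(G0, G1, u)
    # M = Gamma^{-1} (I_u otimes Delta); slogdet avoids overflow in the determinant
    M = np.linalg.solve(bcs(G0, G1, u), np.kron(np.eye(u), D))
    zeta = np.trace(M) - np.linalg.slogdet(M)[1] - u * m
    return 1.0 - 1.0 / (1.0 + zeta)

# Matrices (14):
G0 = np.array([[ 88.910, -13.002,  14.855],
               [-13.002,  84.921,   5.285],
               [ 14.855,   5.285, 120.934]])
G1 = np.array([[26.195, -0.231, -4.579],
               [-0.231,  2.357, -1.647],
               [-4.579, -1.647,  3.495]])
eta = discrepancy(G0, G1, 3)  # the text reports eta = 0.2012 for this pair
```

Under these assumed forms of \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) the sketch reproduces the minimizer printed above; if (2) defines the blocks differently, the two lines defining them should be adjusted accordingly.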

In the power study, we start by verifying the power of all the tests mentioned for \(m=u=3\) and \(n=5\), with \({\varvec{\Gamma }}_0\) given in (14) and \({\varvec{\Gamma }}_1\) randomly generated 300 times. For each case giving a positive definite \({\varvec{\Gamma }}\) (237 of the 300 cases), we generate the data matrix \({\mathbf{X}} \sim N_{n, um}({\mathbf{0}}, {\mathbf{I}}_n, {\varvec{\Gamma }})\), on which all the tests are then performed. To reject the null hypothesis we use, respectively, the empirical null distributions of RST and WT, the exact distribution of LRT (presented in Theorem 3 and computed using the CharFunToolR package), the F distribution with \(n-1\) and \((n-1) (u-1)\) degrees of freedom for FT, and the RLR distribution with m, \((n-1)(u-1)\), and \(n-1\) degrees of freedom (computed with the algorithm of Chiani (2016)) for Roy's test. As in Fonseca et al. (2018) and Kozioł et al. (2021), to perform FT we choose \({\mathbf{v}}={\mathbf{1}}_m\). In all comparisons the significance level 0.05 is used, and the empirical power is calculated as the ratio of the number of rejections to the number of trials performed. The results of the simulations are presented in Fig. 7.
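The data-generation step of this study can be sketched as follows; `rmatnorm` and `empirical_power` are our own illustrative names, and the test itself is passed in as a placeholder callable, since the test statistics are defined earlier in the paper.

```python
import numpy as np

def rmatnorm(n, Gam, rng):
    """Draw X ~ N_{n,um}(0, I_n, Gam): n independent rows, each N(0, Gam)."""
    L = np.linalg.cholesky(Gam)  # Gam must be positive definite
    return rng.standard_normal((n, Gam.shape[0])) @ L.T

def empirical_power(rejects, n, Gam, trials=1000, seed=0):
    """Empirical power: share of simulated datasets on which the test rejects.
    `rejects` is any callable X -> bool implementing a level-0.05 test."""
    rng = np.random.default_rng(seed)
    return float(np.mean([rejects(rmatnorm(n, Gam, rng)) for _ in range(trials)]))
```

In the simulations above, `Gam` would be the BCS matrix built from \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\), and `trials` the number of generated data matrices.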

Fig. 7

Empirical powers of RST, WT, LRT, FT, and Roy’s test statistics for \(m=u=3\), \(n=5\), with respect to discrepancy \(\eta \). (Colour figure online)

It can be seen that the powers of RST, LRT, and Roy's test increase with the discrepancy, in contrast to FT, whose power differs significantly for two equally distant BCS structures, and to WT, whose power often falls below the nominal significance level (the Wald test is biased). The different behavior of these latter tests may also be caused by an inappropriate choice of discrepancy; this topic will be studied in future research. Hence, in the remainder of this section, we compare the powers of RST, LRT, and Roy's test only.

For the power comparison, we consider \(m\in \{3,6\}\) with \(u\in \{3,6,9\}\), and \(m=9\) with \(u\in \{3,6\}\). For each pair \((u,m)\) we choose \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\) in such a way that the discrepancy \(\eta \) equals 0.2 or 0.4. The forms of all chosen matrices, except those given in (14), are available from the authors on request. Then, as in the previous case, for selected sample sizes ensuring the existence of all three tests (\(n>m\)), we generate 50,000 data matrices, from which the empirical powers are computed. The simulation results are presented in Figs. 8, 9 and 10. It can be seen that for both values of \(\eta \) the powers of RST (blue line) and Roy's test (purple line) exceed the power of LRT (green line) for every sample size considered. Moreover, for \(\eta =0.4\), the powers of RST and Roy's test are indistinguishable.

Fig. 8

Empirical powers of RST (blue), LRT (green) and Roy’s test (purple) statistics depending on n for \(m = 3\), \(u\in \{3, 6, 9\}\) and for \(\eta =0.2\) (left panel) and \(\eta =0.4\) (right panel). (Colour figure online)

Fig. 9

Empirical powers of RST (blue), LRT (green) and Roy’s test (purple) statistics depending on n for \(m = 6\), \(u\in \{3, 6, 9\}\) and for \(\eta =0.2\) (left panel) and \(\eta =0.4\) (right panel). (Colour figure online)

Fig. 10

Empirical powers of RST (blue), LRT (green) and Roy’s test (purple) statistics depending on n for \(m = 9\), \(u\in \{3, 6\}\) and for \(\eta =0.2\) (left panel) and \(\eta =0.4\) (right panel). (Colour figure online)

5 Real data example

In this section we consider an example originally presented in Liang et al. (2015, Table 1), where a hierarchical model with a block circular (in particular, BCS) covariance structure was studied.

The data consist of petal-length measurements on 11 Kalanchoe plants from the same greenhouse. From each plant, 3 flowers were randomly chosen; each flower has 4 petals. We assume that the covariance between every two flowers is the same, which leads to the BCS structure of the dispersion matrix. It is worth noting that, since the arrangement of the petals in each flower is circular, Liang et al. (2015) additionally assumed a circular covariance structure between the petals within each flower; this assumption is not required in this paper. To summarize, in this experiment we have \(n=11\) individuals and \(m=4\) petals on each of \(u=3\) flowers.

Our aim is to verify the hypothesis of independence of petal lengths between any two flowers; hence, hypothesis (9) is suitable here. We use the RST, WT, and LRT statistics, with both exact and approximate quantiles of their distributions, to make the decision. Note that in the case of RST and WT the empirical null distributions are used as the exact distributions, while the quantiles (and thus also the p-value) of the exact LRT distribution are computed using the R package CharFunToolR. In all three cases, the \(\chi ^2_{10}\) distribution is used as the limiting one. We also compute the F test statistic for three different choices of \({\mathbf{v}}\): \({\mathbf{1}}_m\); \({\mathbf{v}}_{\max }\), the eigenvector corresponding to the maximal eigenvalue of \(\widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\); and a randomly generated \({\mathbf{v}}_{\mathrm{g}}=\left( 0.859853, 0.175291, 0.011513, 0.405039\right) ^\prime \); as well as Roy's test statistic, and we determine the p-values based on the F and RLR distributions with the respective degrees of freedom. To calculate the p-value of Roy's test, we use the algorithm presented in Chiani (2016). The values of the test statistics, together with the respective p-values, are given in Table 1.
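The choice \({\mathbf{v}}_{\max }\) can be computed directly; the following is a minimal sketch in which the function name is ours and the arguments are assumed to be the estimates \(\widehat{{\varvec{\Delta }}}_1\) and \(\widehat{{\varvec{\Delta }}}_2\).

```python
import numpy as np

def v_max(D1_hat, D2_hat):
    """Eigenvector of D2_hat @ D1_hat^{-1} for its largest eigenvalue.
    The product of two p.d. matrices has real positive eigenvalues,
    so taking real parts only strips numerical noise."""
    vals, vecs = np.linalg.eig(D2_hat @ np.linalg.inv(D1_hat))
    return np.real(vecs[:, np.argmax(np.real(vals))])
```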

Table 1 Values of RST, LRT, WT, F(\({\mathbf{1}}_m\)), F(\({\mathbf{v}}_{\max }\)), F(\({\mathbf{v}}_{\mathrm{g}}\)), and Roy’s test statistics together with respective p-values for real data example

The p-values computed from the exact distributions of all the tests lead to the same decision: at the significance level 0.05, the hypothesis of independence is not rejected. Furthermore, the decision based on the limiting distributions of RST, WT, and LRT remains the same. However, in the case of RST and WT the exact and chi-square p-values are close to each other, which is not the case for LRT. This observation is quite general, as can be seen from the significant discrepancies between the exact and limiting distributions of LRT, especially for small n, shown in Figs. 1, 2, 3, 4 and 5.

Note that different choices of \({\mathbf{v}}\) in the F test yield different values of the test statistic and, obviously, different p-values. In the example considered the decision remains the same; however, for other datasets different decisions could be made. Thus, from a practical point of view, Roy's test should be preferred to the F test, as it does not depend on the choice of \({\mathbf{v}}\). However, the determination of the RLR distribution, as well as of the exact distribution of LRT, involves complex computations requiring special packages, while the (empirical) exact distributions of RST and WT do not differ significantly from the limiting \(\chi ^2\) distribution, even if the normality assumption is not fulfilled. Hence, the RST or WT procedure with the p-value taken from the limiting distribution seems the most useful for practitioners. Finally, because of the biasedness of WT shown in the previous section, its type II error can be much higher than one can accept; therefore, RST is the test suggested for practitioners.

6 Discussion

In this article we derived the RST and WT statistics and obtained the characteristic function of the LRT statistic for testing the independence of features between repeated measurements under the BCS covariance structure. For all of these test statistics we proved that their null distributions do not depend on the true parameters; for FT and Roy's test this conclusion is obvious. The robustness analysis performed for selected distributions showed that all the tests mentioned are relatively robust; however, further research is needed under non-normality, especially for a multivariate t distribution of the data. Nevertheless, since WT is biased and since the value of the F test statistic strongly depends on the choice of the vector \({\mathbf{v}}\), it is difficult to assess the power of these tests, and thus they were not included in the power comparison. In the power analysis we showed that the powers of RST and Roy's test do not differ significantly and usually exceed the power of LRT.

Summing up, the F test would be well suited for testing hypotheses about the values of specific elements of the covariance matrix, in which case the vector \({\mathbf{v}}\) should be chosen according to the hypothesis tested. Because of the biasedness of WT, and since the determination of the exact distribution of LRT and of the RLR distribution is relatively complex, the RST with its limiting \(\chi ^2\) distribution can be recommended for use by researchers.