Abstract
The goal of this article is to test the hypothesis of independence of features between any two repeated measures under a block compound symmetry covariance structure in the doubly multivariate normal model. The Rao score and Wald test statistics are determined and the characteristic function of the likelihood ratio test statistic is presented. For all of these test statistics, the asymptotic distributional properties are compared using simulation studies, and the robustness of the empirical distributions is considered. Furthermore, for power analysis purposes, the Kullback-Leibler divergence is proposed as a measure of discrepancy between hypotheses, and the power of each of the mentioned tests, as well as of the F-test and Roy's largest root test, is studied. Finally, all the mentioned tests are applied to a real data example.
1 Introduction
In modern experiments, large datasets are usually collected, consisting of measurements of many features observed repeatedly over time, at various locations or depths, on many individuals. This paper deals with doubly multivariate data that can be stored in a three-dimensional tensor of observations of size \(n\times m \times u\), where n is the number of individuals (sample size) and m and u are the numbers of repeated measurements of, for example, different features and locations, respectively. In such experiments, especially in genetics or medicine, the sample size is often too small to estimate all the unknown parameters of the model. To avoid this problem, one solution is regularization of the estimators; another is to consider a patterned covariance matrix.
One of the common patterns for a doubly multivariate model is a block compound symmetry (BCS) structure, which is a direct extension of compound symmetry, common for multivariate models. BCS has been introduced in Rao (1945, 1953), where the problem of discriminating genetically different groups is studied. Note that the BCS structure is common for experiments, where the covariance matrix does not change when vectors from different repeated measures are interchanged, and hence, in the literature, it is also called an exchangeable structure.
A general method to test means and covariance matrices related to models with the BCS covariance structure has been proposed in Arnold (1973, 1979) as a way of reducing the number of unknown parameters for estimation. A similar problem has also been studied in Szatrowski (1976, 1978, 1982). Furthermore, Perlman (1987) found that if the collected data has symmetries, it is possible to obtain a more accurate estimate of the covariance matrix. Recently, Leiva (2007) formulated the generalized Fisher linear discrimination method under the BCS covariance structure and derived the maximum likelihood estimators of BCS. The estimation of BCS, as well as of the circular Toeplitz structure, was considered in Liang et al. (2012), while optimal estimators of the BCS structure were proposed in Roy et al. (2016); Kozioł et al. (2018). The application of the BCS structure in multivariate interval data problems is shown in Hao et al. (2015). Tests for the mean structure under the model with the BCS covariance matrix were proposed in Zmyślony et al. (2018); Žežula et al. (2018), while the likelihood ratio test (LRT) and the Rao score test (RST) for testing the BCS covariance structure were presented in Roy and Leiva (2011); Roy et al. (2018); Filipiak and Klein (2021). The asymptotic normal distribution of the LRT, under the assumption that the size of each block and the sample size tend to infinity, was presented in Sun and Xie (2020). Very recently, Liang et al. (2021) derived the LRT for simultaneously testing the mean and particular structures of the blocks of the BCS covariance matrix.
The aim of this paper is to test independence between features measured repeatedly, e.g., over time or locations, under the normal model with the BCS covariance structure. The LRT for such a hypothesis was studied, for example, in Fonseca et al. (2018), where a new F-distributed test statistic, say the FT statistic, is also proposed. In the same year Tsukada (2018) compared the power of the LRT, the modified LRT (using the Bartlett correction), the Wald test (WT) and the RST, but only for selected types of alternative hypothesis. It should be noted that the Wald test statistic is given without proof in Tsukada (2018). In this paper we revise this test and show that the form presented in Tsukada (2018) is not in line with the definition of the Wald test given by Rao (2005). Very recently Kozioł et al. (2021) gave a review of testing hypotheses under BCS structures and also introduced Roy's test statistic, which follows the largest root distribution.
The main goals of this paper are to determine the RST and WT statistics and to derive the exact distribution of the LRT. Moreover, using simulation studies we compare the asymptotic properties of the mentioned likelihood-ratio-based tests and verify their robustness for non-Gaussian data. Finally, we recall the FT and Roy's largest root tests and study the power of each considered test. For this purpose, we introduce the entropy loss function (the Kullback-Leibler divergence between two distributions) as a measure of discrepancy between the null and alternative hypotheses. We show that such an approach makes it possible to compare the power of the tests for various alternatives, in contrast to the approach usually considered in the literature, where particular structures of alternative hypotheses are studied; cf. Fonseca et al. (2018), Tsukada (2018). Note, finally, that the presented results can be applied in many areas of science, e.g. genetics, medicine, dietetics, agriculture, physics, image processing or engineering. In this paper we use a real horticultural data example to compare all considered tests.
The paper is organized as follows. In Sect. 2 the model and hypotheses of interest, as well as the maximum likelihood estimators (MLEs) of the unknown parameters, are presented. The RST, WT, LRT, FT and Roy's test statistics are formulated in Sect. 3, together with their properties, such as the convergence of the distributions of the RST and WT statistics to the limiting chi-square distribution, the exact distribution of the LRT statistic, and the independence of the distributions of all test statistics from the true values of the unknown parameters. The powers of all tests are analyzed in Sect. 4. Finally, to illustrate the presented methods, the independence of the petal lengths between any two flowers of Kalanchoe plants is tested in Sect. 5. The article is summarized in the Discussion section.
2 Model and hypothesis
We consider an experiment performed on n individuals in which m features are repeatedly measured u times, where these repeated measurements could be time points, locations, depths, etc. Let \({\mathbf{X}}_i=({\mathbf{x}}^\prime _{i1},\dots ,{\mathbf{x}}^\prime _{iu})^\prime \), \(i = 1,\dots ,n\), be independent and identically distributed um-dimensional vectors of observations, where \({\mathbf{x}}_{ij}\), \(j=1,\dots ,u\), is the m-dimensional vector of measurements of all m features on the ith individual at the jth repetition.
A normal matrix model is assumed here, in which the observation vectors for all individuals are placed in rows one below the other, that is,
where \({\mathbf{1}}_n\) is an n-dimensional vector of ones, \({\varvec{\mu }}\) is a um-dimensional general mean (the same for every individual), \({\mathbf{I}}_n\) is an identity matrix of order n, and \({\varvec{\Omega }}\) is an unknown symmetric positive definite covariance matrix of order um. Model (1) can also be presented in a vectorized form as
where \({\text {vec}}\) is the operator stacking the columns of a matrix one below the other and \(\otimes \) is a Kronecker product.
It is known that if \({\varvec{\Omega }}\) is unstructured, its MLE is of the form
where \({\mathbf{Q}}_{n}={\mathbf{I}}_n - \frac{1}{n}{\mathbf{1}}_n{\mathbf{1}}'_n\) is the orthogonal projector onto the orthocomplement of the column space of \({\mathbf{1}}_n\), while an unbiased estimator has the form
Note that if um is close to the sample size, both estimators are ill-conditioned. Furthermore, if \(um > n\) the estimators are singular. To avoid these problems, one may impose an appropriate structure on the covariance matrix, which reduces the number of unknown parameters. In this paper we consider the BCS structure
with symmetric positive definite (p.d.) matrix \({\varvec{\Gamma }}_0\) of order m, and with symmetric matrix \({\varvec{\Gamma }}_1\) of order m such that \({\varvec{\Gamma }}\) is p.d. Matrix \({\varvec{\Gamma }}_0\) is a variance-covariance matrix of m features at any given repeated measurement, while \({\varvec{\Gamma }}_1\) is a covariance matrix of m features between any two repeated measurements.
After reparameterization one can get an equivalent form of BCS structure of the form
where \({\mathbf{P}}_u =\frac{1}{u} {\mathbf{1}}_u{\mathbf{1}}'_u\) is the orthogonal projector onto the column space of \({\mathbf{1}}_u\). This form is more useful from a computational point of view. Since \({\mathbf{P}}_u {\mathbf{Q}}_u = {\mathbf{0}}\), to ensure the positive definiteness of \({\varvec{\Gamma }}\) it is enough to assume that \({\varvec{\Delta }}_i\), \(i=1,2\), are symmetric positive definite matrices. The relationship between (3) and (4) can be represented as
Note that the BCS structure is also called exchangeable, since the vector \({\mathbf{x}}_{ij}\) can be interchanged with \({\mathbf{x}}_{ij'}\), \(j,j'=1,\dots ,u\), without changing the covariance matrix.
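The equivalence of the two parameterizations follows from \({\mathbf{1}}_u{\mathbf{1}}'_u = u{\mathbf{P}}_u\) and \({\mathbf{I}}_u={\mathbf{P}}_u+{\mathbf{Q}}_u\). The following minimal numeric sketch checks this identity and the positive-definiteness criterion; note that which of the two resulting blocks the text denotes \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) depends on the displayed form (4), which is not reproduced here, so the block names below are generic.

```python
import numpy as np

u, m = 3, 2
G0 = np.array([[2.0, 0.5], [0.5, 1.5]])   # within-measurement covariance
G1 = np.array([[0.4, 0.1], [0.1, 0.3]])   # between-measurement covariance

I_u, J_u = np.eye(u), np.ones((u, u))
P = J_u / u          # projector onto span(1_u)
Q = I_u - P          # projector onto its orthocomplement

# Form (3):  Gamma = I_u x G0 + (J_u - I_u) x G1
Gamma = np.kron(I_u, G0) + np.kron(J_u - I_u, G1)

# Spectral form: the blocks attached to P and Q are
# G0 + (u-1) G1  and  G0 - G1, respectively
A = G0 + (u - 1) * G1
B = G0 - G1
assert np.allclose(Gamma, np.kron(P, A) + np.kron(Q, B))

# Gamma is p.d. exactly when both spectral blocks are p.d.
assert np.all(np.linalg.eigvalsh(A) > 0)
assert np.all(np.linalg.eigvalsh(B) > 0)
assert np.all(np.linalg.eigvalsh(Gamma) > 0)
```

Because \({\mathbf{P}}_u {\mathbf{Q}}_u = {\mathbf{0}}\), the eigenvalues of \({\varvec{\Gamma }}\) are exactly the eigenvalues of the two blocks, which is why checking the blocks suffices.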
Since the space of BCS structures is a quadratic subspace, that is, the square of any BCS matrix also belongs to the same space, cf. Seely (1971), the MLE of \({\varvec{\Gamma }}\) is the projection of \({\mathbf{S}}\) given in (2) onto the space of BCS structures, that is,
with
cf. Filipiak et al. (2020). Alternatively, if \({\varvec{\Gamma }}\) is expressed as (3),
where \({\text {BTr}}_m({\mathbf{A}})=\sum \nolimits _{i=1}^u {\mathbf{A}}_{ii}\) is a block trace operator defined on the partitioned matrix \({\mathbf{A}}=({\mathbf{A}}_{ij})\), \(i,j=1,\dots ,u\), with blocks of order m; cf. Filipiak et al. (2018), and \({\text {BSum}}_m({\mathbf{A}})=\sum \nolimits _{i=1}^u\sum \nolimits _{j=1}^u{\mathbf{A}}_{ij}\).
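The two operators defined above can be sketched directly from their definitions; the helper names below are ours, and the sanity check uses the BCS form \({\mathbf{I}}_u\otimes {\varvec{\Gamma }}_0 + ({\mathbf{1}}_u{\mathbf{1}}'_u-{\mathbf{I}}_u)\otimes {\varvec{\Gamma }}_1\), for which \({\text {BTr}}_m\) returns \(u{\varvec{\Gamma }}_0\) and \({\text {BSum}}_m\) returns \(u{\varvec{\Gamma }}_0+u(u-1){\varvec{\Gamma }}_1\).

```python
import numpy as np

def btr(A, m):
    """Block trace BTr_m(A): sum of the u diagonal m x m blocks of A."""
    u = A.shape[0] // m
    return sum(A[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))

def bsum(A, m):
    """Block sum BSum_m(A): sum of all u^2 blocks of order m of A."""
    u = A.shape[0] // m
    return sum(A[i*m:(i+1)*m, j*m:(j+1)*m]
               for i in range(u) for j in range(u))

# Sanity check on a BCS matrix I_u x G0 + (J_u - I_u) x G1:
u, m = 3, 2
G0 = np.array([[2.0, 0.5], [0.5, 1.5]])
G1 = np.array([[0.4, 0.1], [0.1, 0.3]])
Gamma = np.kron(np.eye(u), G0) + np.kron(np.ones((u, u)) - np.eye(u), G1)
assert np.allclose(btr(Gamma, m), u * G0)
assert np.allclose(bsum(Gamma, m), u * G0 + u * (u - 1) * G1)
```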
We are interested in testing the hypothesis related to the independence of features between two repeated measurements. This means that we are testing the block diagonality of the covariance matrix, which can be presented as
or, equivalently, using parameterization (4),
The spectral form of the BCS structure given in (9) provides simpler algebraic transformations than the previous one, and thus will usually be considered in the forthcoming sections.
Let us denote \({\varvec{\Delta }}_1={\varvec{\Delta }}_2\) in (9) by \({\varvec{\Delta }}\). Then, the null hypothesis can be written as \( {\varvec{\Omega }}={\mathbf{I}}_u \otimes {\varvec{\Delta }}\). Since the space of block diagonal matrices is a quadratic subspace, the MLE of \({\varvec{\Delta }}\) is a projection of \({\mathbf{S}}\) onto the space of block diagonal matrices, that is,
The MLEs given in (6) and (10) will be used for determining the test statistics in the next section.
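The estimators of this section can be illustrated numerically. The following is a minimal sketch assuming the standard forms \({\mathbf{S}}=\frac{1}{n}{\mathbf{X}}'{\mathbf{Q}}_n{\mathbf{X}}\) and \(\widehat{{\varvec{\Delta }}}=\frac{1}{u}{\text {BTr}}_m({\mathbf{S}})\); these are consistent with the projection arguments above, but since the displayed formulas (2) and (10) are not reproduced here, the exact forms (and all names in the code) should be read as our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u, m = 20, 3, 2
X = rng.standard_normal((n, u * m))       # rows: individuals

# Q_n = I_n - (1/n) 1_n 1_n' centers each column of X
Q = np.eye(n) - np.ones((n, n)) / n
S = X.T @ Q @ X / n                        # MLE-type sample covariance

# Projection of S onto {I_u x Delta}: average of its diagonal blocks
Delta_hat = sum(S[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u)) / u

# I_u x Delta_hat is the closest block diagonal matrix to S in
# Frobenius norm; any other candidate in the subspace fits worse:
fit = np.linalg.norm(S - np.kron(np.eye(u), Delta_hat))
other = np.linalg.norm(S - np.kron(np.eye(u), Delta_hat + 0.1 * np.eye(m)))
assert fit <= other
```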
3 Test statistics
In this section we give an overview of the tests for the considered hypothesis and determine the RST and WT statistics. Note that the form of the RST statistic has been stated by Tsukada (2018); however, in this paper we present an alternative proof of its form. Moreover, in Tsukada (2018) the WT statistic has been given without any proof; therefore, in this paper we prove that the WT statistic has a more complex form. We also verify the convergence of the likelihood-ratio-based tests to the limiting chi-square distribution, using empirical distributions for the RST and WT, and the exact distribution of the LRT, formulated in Theorem 3 and proved in Appendix C.
Another test for (9), the FT, which is F-distributed with the respective degrees of freedom, has been introduced by Fonseca et al. (2018). Its generalization is Roy's test, which follows the largest root distribution; cf. Mardia et al. (1979). In this paper we recall their forms and verify the robustness of all tests for non-Gaussian data.
We start with determining the RST statistic. The proof of the following theorem can be found in Appendix A.
Theorem 1
Under hypothesis (9) the Rao score test statistic can be expressed as
where \(\widehat{{\varvec{\Gamma }}}\) and \(\widehat{{\varvec{\Delta }}}\) are given in, respectively, (5) and (10).
Denoting the MLEs of covariance matrix under alternative and null hypothesis by, respectively, \(\widehat{{\varvec{\Omega }}}_{H_1}\) and \(\widehat{{\varvec{\Omega }}}_{H_0}\), we may represent the above RS test statistic as
which is in line with the RST statistics for testing various covariance structures in Filipiak and Klein (2021).
It is worth noting that under hypothesis (8), which is obviously equivalent to (9), we may formulate the following corollary, that can be proven directly from Theorem 1 by considering the parameterization of \(\widehat{{\varvec{\Gamma }}}\) and \(\widehat{{\varvec{\Delta }}}\) through \(\widehat{{\varvec{\Gamma }}}_0\) and \(\widehat{{\varvec{\Gamma }}}_1\).
Corollary 1
Under hypothesis (8) the Rao score test statistic can be expressed as
where \(\widehat{{\varvec{\Gamma }}}_0\) and \(\widehat{{\varvec{\Gamma }}}_1\) are given in (7).
Note that the RST statistic presented in Corollary 1 can also be expressed as formula (3.20) in Tsukada (2018).
Finally, recall that, according to Rao (2005), under the considered null hypothesis and as the sample size \(n\rightarrow \infty \), the presented RST statistic is \(\chi ^2\) distributed with \(m(m+1)/2\) degrees of freedom. The same limiting distribution applies to the second well-known test, the Wald test, presented in the next theorem, with the proof in Appendix B.
Theorem 2
Under hypothesis (9) the Wald test statistic can be expressed as
with \(\widehat{{\varvec{\Delta }}}_1\) and \(\widehat{{\varvec{\Delta }}}_2\) given in (6).
We note that, to determine the Wald test statistic under hypothesis (8), it is enough to replace the matrices \({\varvec{\Delta }}_1\), \({\varvec{\Delta }}_2\) by the respective transformations given in (2); however, the form of W will then be much more complex. It is also possible to determine W directly from the null hypothesis (8); nevertheless, in that case the Fisher information matrix given in Appendix A cannot be applied directly.
Finally, observe that plugging the MLE of the covariance matrix under the null hypothesis, instead of the alternative, into the form of WT yields the test statistic presented by Tsukada (2018). Note, however, that such an approach is not in line with the definition of the Wald test given by Rao (2005).
The Rao score test is based on the MLE of the parameter vector under the null hypothesis, while the Wald test uses the MLE under the alternative. The third test of Rao's "holy trinity" is the likelihood ratio test, based on a comparison of the MLEs under the null and alternative hypotheses. When testing (9), the likelihood ratio \(\Lambda \) has the form
It is well known (Rao 2005) that under the null hypothesis, \({\text {LR}}=-2\ln \Lambda \) is approximately distributed as \(\chi ^2\) with \(m(m+1)/2\) degrees of freedom. It should be noted that if the covariance parameters fall on the boundary of their parameter space, then the asymptotic distribution of \({\text {LR}}\) becomes a mixture of \(\chi ^2\) distributions, as discussed in Self and Liang (1987). Instead of an approximate distribution, which works well only for relatively large sample sizes, one can use the exact distribution of the LR presented in the following theorem, with the proof given in Appendix C.
Theorem 3
The characteristic function of \({\text {LR}}=-2\ln \Lambda \), with \(\Lambda \) being the likelihood ratio test statistic given in (11), is of the form
The exact distribution of the LRT statistic can be computed from the above characteristic function using the R package CharFunToolR, developed in Gajdoš (2018) on the basis of the Matlab package CharFunTool provided in Witkovský (2018).
We also mention that some modifications of the LRT have been studied in the literature. One example is the multiplication of LR by the constant \(1-(u^2-u+1)(2m^2+3m-1)/(6(n-1)u(u-1)(m+1))\); cf. Tsukada (2018). By the general theory of asymptotic expansions in Anderson (2003), this modified test statistic converges faster to the respective \(\chi ^2\) distribution. Nevertheless, since in this paper we give the exact distribution of the LRT, we do not consider this modification as a separate test.
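For reference, the limiting rejection rule shared by the RST, WT and LR statistics can be sketched as follows (SciPy is used only for the \(\chi ^2\) quantile; the function name is ours):

```python
from scipy.stats import chi2

m = 3
df = m * (m + 1) // 2          # m(m+1)/2 degrees of freedom
crit = chi2.ppf(0.95, df)      # asymptotic critical value at level 0.05

def asymptotic_decision(statistic, m, alpha=0.05):
    """Reject H0 when the statistic exceeds the chi-square quantile."""
    return statistic > chi2.ppf(1 - alpha, m * (m + 1) // 2)
```

For \(m=3\) this gives \(df=6\) and a critical value of about 12.59; for small n the exact LRT distribution of Theorem 3 should be preferred over this approximation.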
Finally, note that a major advantage of all the presented test statistics is the following property, with the proof given in Appendix D.
Proposition 1
The distributions of RST, WT and LRT statistics under the null hypothesis in (9) do not depend on the true values of \({\varvec{\mu }}\) and \({\varvec{\Delta }}\).
Using simulations, we now compare the behavior of the empirical distributions of the RST and WT statistics with respect to their convergence to the limiting distribution, and we collate them with the exact distribution of the LRT.
Recall that all proposed tests can be performed if \(n>m\). Moreover, for \(n\rightarrow \infty \), the distributions of the RS, W and LR test statistics tend to the \(\chi ^2\) distribution with \(m(m+1)/2\) degrees of freedom. Thus, in Figs. 1, 2, 3, 4 and 5 we present the empirical null distributions of RST and WT, the exact distribution of LRT, and the limiting \(\chi ^2\) distribution with respect to the sample size, for \(u=3\) with \(m\in \{3,6,9\}\) and for \(m=3\) with \(u\in \{6,9\}\). It can be seen that the distributions of all test statistics tend to the limiting distribution as n increases; however, the convergence of RST is the quickest, and even for relatively small sample sizes its distribution does not differ significantly from the limiting one, which is not the case for WT or LRT.
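Proposition 1 makes such empirical null distributions cheap to generate, since one may simulate under \({\varvec{\mu }}={\mathbf{0}}\) and \({\varvec{\Delta }}={\mathbf{I}}_m\). The sketch below does this for the LR statistic, assuming the standard Gaussian form \(-2\ln \Lambda = n(\ln \det \widehat{{\varvec{\Omega }}}_{H_0}-\ln \det \widehat{{\varvec{\Omega }}}_{H_1})\) together with the Frobenius-projection MLEs implied by the quadratic-subspace argument of Sect. 2; formula (11) is not reproduced above, and the labelling of the two spectral blocks as well as all function names are ours.

```python
import numpy as np

def lr_statistic(X, u, m):
    """-2 ln Lambda for H0: Omega = I_u x Delta  vs  H1: Omega is BCS."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # unstructured MLE
    blk = [[S[i*m:(i+1)*m, j*m:(j+1)*m] for j in range(u)] for i in range(u)]
    btr = sum(blk[i][i] for i in range(u))  # block trace of S
    bsum = sum(blk[i][j] for i in range(u) for j in range(u))
    A = bsum / u                            # spectral block attached to P_u
    B = (btr - A) / (u - 1)                 # spectral block attached to Q_u
    D = btr / u                             # H0 projection onto {I_u x Delta}
    ld = lambda M: np.linalg.slogdet(M)[1]
    # det of the BCS MLE factors as det(A) * det(B)^(u-1)
    return n * (u * ld(D) - ld(A) - (u - 1) * ld(B))

# Empirical null distribution: by Proposition 1 it does not depend on
# the true mu and Delta, so standard normal data suffice.
rng = np.random.default_rng(1)
u, m, n = 3, 3, 30
lr = np.array([lr_statistic(rng.standard_normal((n, u * m)), u, m)
               for _ in range(500)])
```

Since the block diagonal structures \({\mathbf{I}}_u\otimes {\varvec{\Delta }}\) form a subset of the BCS structures, the statistic is nonnegative by construction.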
The next two tests, the FT and Roy's test, are based on unbiased estimators of the unknown parameters instead of MLEs. Roy et al. (2016) presented such unbiased estimators of \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) in terms of multiple sums of vector products. Recall that, since the space of BCS structures is a quadratic subspace, these estimators can also be obtained by projecting the sample covariance matrix \({\mathbf{S}}^*\) onto the space of BCS matrices; cf. Filipiak et al. (2020). Thus, the unbiased estimators given in Roy et al. (2016) can also be represented as
The FT statistic, introduced by Fonseca et al. (2018), has the form
and is F-distributed with \(n-1\) and \((n-1)(u-1)\) degrees of freedom; cf. Fonseca et al. (2018, Lemma 3.1). Since the unbiased estimators of \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) differ from the respective MLEs only by a constant, we can also represent the FT statistic in terms of MLEs, that is
Noting that
are independent (cf. Roy et al. 2015), the F distribution of the above FT statistic also follows.
Observe that, according to (D.1), the FT statistic can be expressed as
where \(\widehat{{\varvec{\Upsilon }}}_1\) and \(\widehat{{\varvec{\Upsilon }}}_2\) are given in (D.2). Thus, even though under the null hypothesis the distribution of the above FT statistic does not depend on the true value of \(\varvec{\Delta }\), the vector \({\mathbf{v}}\) should be chosen appropriately. Note that if \({\mathbf{v}}\) is a column of the identity matrix, then in fact the hypothesis that a specific entry of the covariance matrix equals zero is tested, which coincides with (9) only if \(m=1\). Furthermore, if \({\mathbf{v}}={\mathbf{1}}_m\) (as assumed in, e.g., Fonseca et al. (2018)), then the hypothesis that the sum of all elements of \({\varvec{\Gamma }}_1\) equals zero is tested. Thus, the proposed test statistic is appropriate for testing (9) if all the entries of \({\varvec{\Gamma }}_1\) are of the same sign (or some of them, but not all, are zero). In fact, choosing \({\mathbf{v}}\) with nonnegative (nonpositive) components corresponds to testing the value of a weighted sum of the elements of \({\varvec{\Gamma }}_1\). In conclusion, for testing (9) it would be natural not to fix a single \({\mathbf{v}}\), but to choose an optimal vector for the quadratic forms in FT.
To achieve higher power it is natural to choose a vector \({\mathbf{v}}\) maximizing the value of the test statistic. Thus, maximizing \(\mathrm{F}\) over all \({\mathbf{v}}\in {\mathbb {R}}^m\) we obtain the test statistic \(\mathrm{F}_m=\lambda _{max}\left( \widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\right) \); cf. Kozioł et al. (2021), where \(\lambda _{max}(\cdot )\) denotes the largest eigenvalue of the matrix in parentheses. The inference can be made using Roy's test of the form
which has the largest root distribution with parameters m, \((n-1)(u-1)\), and \(n-1\); for more details see e.g. Mardia et al. (1979). For simplicity we will abbreviate this distribution as RLR (Roy’s largest root).
Notice that the vector \({\mathbf{v}}\) in \(\mathrm{F}_m\) is the eigenvector corresponding to the maximal eigenvalue of \(\widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\); thus it is no longer fixed, but depends on the data. As a result, as mentioned in Kozioł et al. (2021), Roy's test does not necessarily have higher power than the F-test; it is advantageous only when the largest eigenvalue is substantially larger than the remaining ones.
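A minimal numeric sketch of the statistic \(\mathrm{F}_m\) follows; the example matrices are arbitrary stand-ins for \(\widehat{{\varvec{\Delta }}}_1\) and \(\widehat{{\varvec{\Delta }}}_2\), and the function name is ours.

```python
import numpy as np

def roy_statistic(D2, D1):
    """lambda_max(D2 inv(D1)): largest root of the generalized problem
    D2 v = t D1 v; real-valued for symmetric D2 and symmetric p.d. D1."""
    return float(np.max(np.linalg.eigvals(np.linalg.solve(D1, D2)).real))

D1 = np.diag([1.0, 2.0, 4.0])       # stand-in for Delta_1-hat
D2 = np.diag([2.0, 2.0, 2.0])       # stand-in for Delta_2-hat
lam = roy_statistic(D2, D1)         # -> 2.0 for this diagonal pair

# lam bounds every quadratic-form ratio v'D2v / v'D1v, i.e. it is
# the F statistic maximized over all directions v:
rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.standard_normal(3)
    assert v @ D2 @ v / (v @ D1 @ v) <= lam + 1e-9
```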
In order to check the robustness of the considered tests with respect to departures from normality, for various combinations of m, u and n we generated data from the following non-Gaussian distributions: multivariate \(t_3\), \(t_5\), the gamma distribution with parameters (2, 1.5), and the uniform distribution on the interval (0, 1). The results for \(m=9\), \(u=3\) and \(n\in \{10,25\}\) are given in Fig. 6. It can be seen that for small sample sizes the distributions of all test statistics under non-normality are quite close to the relevant empirical null distributions; however, as the sample size increases, for the considered t distributions the test statistics appear to tend to a distribution other than chi-square. A similar observation was made for other sets of parameters (results not presented in this paper). Therefore, the robustness of the test statistics, especially for multivariate t distributions, will be the topic of future research.
4 Power study of considered tests
For power comparison purposes, Fonseca et al. (2018) considered the covariance structure under the alternative constructed by choosing a block diagonal matrix with \({\varvec{\Gamma }}_0\) on the diagonal and taking a scaled, randomly generated matrix \({\varvec{\Gamma }}_1\) as the off-diagonal blocks, that is,
where \(\lambda \) is a parameter ensuring positive definiteness of \({\varvec{\Gamma }}\). Note that for various matrices \({\varvec{\Gamma }}_1\) the parameter \(\lambda \) belongs to different domains (ensuring positive definiteness), and hence the discrepancies \(\mid \!\!\lambda \!\!\mid \) are not comparable. Moreover, such an approach allows only very specific types of alternatives (and null hypotheses) to be considered, with the same \({\varvec{\Gamma }}_0\) chosen under both the null and the alternative. Thus, in this paper, as a measure of discrepancy between a given alternative, \({\varvec{\Gamma }}\), and the set of block-diagonal matrices \({\mathbf{I}}_u\otimes {\varvec{\Delta }}\), we minimize the Kullback-Leibler divergence between two distributions that differ in the covariance matrix, that is
where the symmetric p.d. matrix \({\varvec{\Gamma }}\) has BCS structure with some given symmetric matrices \({\varvec{\Gamma }}_0\) (p.d.) and \({\varvec{\Gamma }}_1\), while \({\varvec{\Delta }}\) is a symmetric p.d. matrix for which the minimum is attained. Using the same differentiation rules as in Appendix A, it can be shown that the minimum (12) is obtained for
with \({\varvec{\Delta }}_1\) and \({\varvec{\Delta }}_2\) defined in (2). It should be noted that (13) determines the block diagonal structure which is closest in the sense of (12), and it need not be the same as the diagonal blocks used in the alternative hypothesis. Moreover, since the value of \(\zeta \) is not bounded above, we use the transformation \(\eta = 1-\frac{1}{1+\zeta }\), which shrinks \(\zeta \) into the interval [0, 1). Note that, in contrast to the method used by Fonseca et al. (2018), for arbitrary, randomly generated \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\), the minimizing \({\varvec{\Delta }}\) and the discrepancy \(\eta \) can be determined and compared.
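This discrepancy can be computed numerically from the standard zero-mean Gaussian Kullback-Leibler formula. Since the displayed formulas (12) and (13) are not reproduced above, the direction of the divergence and the closed-form minimizer in the sketch below are our assumptions: taking \(\zeta =\min _{{\varvec{\Delta }}}\,\mathrm{KL}\big (N({\mathbf{0}},{\mathbf{I}}_u\otimes {\varvec{\Delta }})\,\Vert \,N({\mathbf{0}},{\varvec{\Gamma }})\big )\), stationarity gives \({\varvec{\Delta }}^* = u\,{\text {BTr}}_m({\varvec{\Gamma }}^{-1})^{-1}\), which is consistent with the remark that the minimizer need not coincide with the diagonal blocks of the alternative.

```python
import numpy as np

def kl_gauss(S1, S2):
    """Kullback-Leibler divergence KL( N(0,S1) || N(0,S2) )."""
    d = S1.shape[0]
    M = np.linalg.solve(S2, S1)
    return 0.5 * (np.trace(M) - d - np.linalg.slogdet(M)[1])

u, m = 3, 2
G0 = np.array([[2.0, 0.5], [0.5, 1.5]])
G1 = np.array([[0.4, 0.1], [0.1, 0.3]])
Gamma = np.kron(np.eye(u), G0) + np.kron(np.ones((u, u)) - np.eye(u), G1)

# Closed-form minimizer (our derivation): Delta* = u * inv(BTr(Gamma^-1))
Ginv = np.linalg.inv(Gamma)
btr_inv = sum(Ginv[i*m:(i+1)*m, i*m:(i+1)*m] for i in range(u))
D_star = u * np.linalg.inv(btr_inv)

zeta = kl_gauss(np.kron(np.eye(u), D_star), Gamma)
eta = 1 - 1 / (1 + zeta)        # shrink the discrepancy into [0, 1)

# The objective is convex in Delta (linear trace term plus -log det),
# so the stationary point is the global minimum; nearby symmetric
# candidates cannot do better:
rng = np.random.default_rng(0)
for _ in range(20):
    E = 0.05 * rng.standard_normal((m, m))
    D = D_star + (E + E.T) / 2
    assert kl_gauss(np.kron(np.eye(u), D), Gamma) >= zeta - 1e-10
```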
Summing up, for various values of u and m, we first randomly generate matrices \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\), for which we determine the discrepancy \(\eta \). For example, for \(u=m=3\), we choose the matrices
for which the minimum in (12) is attained at
giving \(\eta =0.2012\).
In the power study, we start by verifying the power of all the mentioned tests for \(m=u=3\) and \(n=5\), with \({\varvec{\Gamma }}_0\) given in (14) and \({\varvec{\Gamma }}_1\) randomly generated 300 times. For each case giving a positive definite \({\varvec{\Gamma }}\) (237 cases), we generate the data matrix \({\mathbf{X}} \sim N_{n, um}({\mathbf{0}}, {\mathbf{I}}_n, {\varvec{\Gamma }})\), on which all tests are then performed. To reject the null hypothesis we use, respectively, the empirical null distributions of RST and WT, the exact distribution of LRT (presented in Theorem 3 and computed using the CharFunToolR package), the F distribution with \(n-1\) and \((n-1) (u-1)\) degrees of freedom for FT, and the RLR distribution with parameters m, \((n-1)(u-1)\), and \(n-1\) (computed with the algorithm of Chiani (2016)) for Roy's test. Similarly to Fonseca et al. (2018) and Kozioł et al. (2021), to perform FT we choose \({\mathbf{v}}={\mathbf{1}}_m\). In all comparisons the significance level 0.05 is used. The empirical power is calculated as the ratio of the number of rejections to the total number of trials. The results of the simulations are presented in Fig. 7.
It can be seen that the powers of the RST, LRT, and Roy's tests increase with the discrepancy, in contrast to FT, whose power differs significantly for two equally distant BCS structures, and to WT, whose power often falls below the nominal significance level (the Wald test is biased). The different behavior of these latter tests may also be caused by an inappropriate choice of discrepancy; however, this topic will be studied in future research. Consequently, in the remainder of this section we compare the powers of the RST, LRT and Roy's test only.
For power comparison, we consider \(m\in \{3,6\}\), \(u\in \{3,6,9\}\), and \(m=9\), \(u\in \{3,6\}\). For each pair (u, m) we choose \({\varvec{\Gamma }}_0\) and \({\varvec{\Gamma }}_1\) in such a way that the discrepancy \(\eta \) equals 0.2 or 0.4. The forms of all chosen matrices, except those given in (14), are available from the authors on request. Then, similarly to the previous case, for selected sample sizes that ensure the existence of all three tests (\(n>m\)), we generate 50,000 data matrices, from which the empirical powers are computed. The simulation results are presented in Figs. 8, 9 and 10. It can be seen that for both values of \(\eta \) the powers of the RST (blue line) and Roy's test (purple line) exceed the power of the LRT (green line) for each considered sample size. Moreover, for \(\eta =0.4\), the powers of the RST and Roy's test are indistinguishable.
5 Real data example
In this section we consider an example originally presented in Liang et al. (2015, Table 1), where a hierarchical model with block circular structure (in particular, the BCS structure) has been studied.
We analyze data consisting of measurements of petal length made on 11 Kalanchoe plants from the same greenhouse. From each plant 3 flowers were randomly chosen; each flower has 4 petals. We assume that the covariance between every two flowers is the same, which leads to the BCS structure of the dispersion matrix. It is worth noting that, since in each flower the arrangement of the petals is circular, Liang et al. (2015) additionally assumed a circular structure of the covariance between the petals within each flower. This assumption is not required in this paper. For clarity, in this experiment we have \(n=11\) individuals and \(m=4\) petals on each of \(u=3\) flowers.
Our aim is to verify the hypothesis of independence of petal lengths between any two flowers; hence, hypothesis (9) is suitable here. We use the RST, WT, and LRT statistics, and both exact and approximate quantiles of their distributions, to make the decision. Note that in the case of RST and WT the empirical null distributions are used as the exact distributions, while the quantiles (and thus also the p-value) of the exact LRT distribution are computed using the R package CharFunToolR. In all three cases, the \(\chi ^2_{10}\) distribution is used as the limiting one. We also compute the F test statistic for three different choices of \({\mathbf{v}}\): \({\mathbf{1}}_m\), \({\mathbf{v}}_{\max }\) being the eigenvector related to the maximal eigenvalue of \(\widehat{{\varvec{\Delta }}}_2\widehat{{\varvec{\Delta }}}_1^{-1}\), and a randomly generated \({\mathbf{v}}_{\mathrm{g}}=\left( 0.859853, 0.175291, 0.011513, 0.405039\right) ^\prime \), as well as Roy's test statistic, and we determine the p-values based on, respectively, the F and RLR distributions with the respective parameters. To calculate the p-value of Roy's test, we use the algorithm presented in Chiani (2016). The values of the test statistics together with the respective p-values are given in Table 1.
The p-values computed from the exact distributions of all the tests suggest the same decision: at the significance level 0.05, the hypothesis of independence is not rejected. Furthermore, the decision based on the limiting distributions of RST, WT and LRT remains the same. However, in the case of RST and WT the exact and chi-square p-values are close to each other, which is not the case for LRT. This observation is quite general, as can be seen from the significant discrepancies between the exact and limiting distributions of LRT, especially for small n, shown in Figs. 1, 2, 3, 4 and 5.
Note that for different choices of \({\mathbf{v}}\) in the F test we obtain different values of the test statistic and, obviously, different p-values. In the example considered the decision remains the same; however, for other datasets different decisions could be made. Thus, from a practical point of view, Roy's test should be preferred to the F-test, as it does not depend on the choice of \({\mathbf{v}}\). However, the determination of the RLR distribution, as well as of the exact distribution of the LRT, involves complex computations with special packages, while the (empirical) exact distributions of RST and WT do not differ significantly from the limiting \(\chi ^2\) distribution, even if the normality assumption is not fulfilled; hence the RST or WT procedure with the p-value taken from the limiting distribution seems the most useful for practitioners. Finally, because of the biasedness of WT shown in the previous section, its type II error can be much higher than acceptable. Therefore, the RST is suggested for use by practitioners.
6 Discussion
In this article we determined the RST and WT statistics and derived the characteristic function of the LRT statistic for testing the independence of features between repeated measurements under the BCS covariance structure. For all of these test statistics we proved that their null distributions do not depend on the true parameters; for FT and Roy's test this conclusion is obvious. The robustness analysis performed for selected distributions showed that all the mentioned tests are relatively robust; however, further research is needed under non-normality, especially for the multivariate t distribution of the data. Moreover, since WT is biased, and since the value of the F test statistic strongly depends on the choice of the vector \({\mathbf{v}}\), it is difficult to assess the power of these tests, and thus they were not included in the power comparison. In the power analysis we showed that the powers of the RST and Roy's tests do not differ significantly and usually exceed the power of the LRT.
Summing up, the F-test is well suited to testing hypotheses about the values of specific elements of the covariance matrix, in which case the vector \({\mathbf{v}}\) should be chosen according to the hypothesis tested. Because of the biasedness of WT, and since determining the exact distribution of LRT and the RLR distribution is relatively complex, the RST with its limiting \(\chi ^2\) distribution can be recommended for use by researchers.
References
Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley, Hoboken
Arnold SF (1973) Application of the theory of products of problems to certain patterned covariance matrices. Ann Stat 1(4):682–699
Arnold SF (1979) Linear models with exchangeably distributed errors. J Am Stat Assoc 74:194–199
Chiani M (2016) Distribution of the largest root of a matrix for Roy’s test in multivariate analysis of variance. J Multivar Anal 143:467–471
Fackler PL (2005) Notes on matrix calculus. https://media.gradebuddy.com/documents/1897145/1ad5e235-824d-4bbf-81d6-ffd2040e37ec.pdf
Filipiak K, John M, Markiewicz A (2020) Comments on maximum likelihood estimation and projections under multivariate statistical models. In: Holgersson T, Singull M (eds) Recent developments in multivariate and random matrix analysis. Springer, Berlin, pp 51–66
Filipiak K, Klein D (2021) Estimation and testing of the covariance structure of doubly multivariate data. In: Filipiak K, Markiewicz A, von Rosen D (eds) Multivariate, multilinear and mixed linear models. Springer, Berlin
Filipiak K, Klein D, Roy A (2016) Score test for a separable covariance structure with the first component as compound symmetric correlation matrix. J Multivar Anal 150:105–124
Filipiak K, Klein D, Vojtková E (2018) The properties of partial trace and block trace operators of partitioned matrices. Electron J Linear Algebra 33:3–15
Fonseca M, Kozioł A, Zmyślony R (2018) Testing hypotheses of covariance structure in multivariate data. Electron J Linear Algebra 33:53–62
Gajdoš A (2018) CharFunToolR: the characteristic functions toolbox (R). https://github.com/gajdosandrej/CharFunToolR
Ghazal AG, Neudecker H (2000) On second-order and fourth-order moments of jointly distributed random matrices: a survey. Linear Algebra Appl 321:61–93
Hao C, Liang Y, Roy A (2015) Equivalency between vertices and centers-coupled-with-radii principal component analyses for interval data. Stat Probab Lett 106:113–120
Kollo T, von Rosen D (2005) Advanced multivariate statistics with matrices. Springer, Dordrecht
Kozioł A, Roy A, Zmyślony R, Leiva R, Fonseca M (2018) Free coordinate estimation for doubly multivariate data. Linear Algebra Appl 547:217–239
Kozioł A, Roy A, Zmyślony R, Žežula I, Fonseca M (2021) Estimation and testing hypotheses in two-level and three-level multivariate data with block compound symmetric covariance structure. In: Filipiak K, Markiewicz A, von Rosen D (eds) Multivariate, multilinear and mixed linear models. Springer, Berlin, pp 203–232
Leiva R (2007) Linear discrimination with equicorrelated training vectors. J Multivar Anal 98:384–409
Liang Y, Coelho CA, von Rosen T (2021) Hypothesis testing in multivariate normal models with block circular covariance structures. Biom J 64:557–576
Liang Y, von Rosen D, von Rosen T (2012) On estimation in multilevel models with block circular symmetric covariance structure. Acta Comment Univ Tartu Math 16(1):83–96
Liang Y, von Rosen D, von Rosen T (2015) On estimation in hierarchical models with block circular covariance structures. Ann Inst Stat Math 67:773–791
Magnus J, Neudecker H (1986) Symmetry, 0–1 matrices and Jacobians, a review. Econ Theory 2:157–190
Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, Cambridge
Perlman MD (1987) Group symmetry covariance models. Stat Sci 2:421–425
Rao CR (1945) Familial correlations or the multivariate generalizations of the intraclass correlation. Curr Sci 14:66–67
Rao CR (1953) Discriminant functions for genetic differentiation and selection. Sankhyā 12:229–246
Rao CR (2005) Score test: historical review and recent developments. Advances in ranking and selection, multiple comparisons, and reliability. Springer, Berlin, pp 3–20
Roy A, Leiva R (2011) Estimating and testing a structured covariance matrix for three-level multivariate data. Commun Stat Theory Methods 40:1945–1963
Roy A, Zmyślony R, Fonseca M, Leiva R (2016) Optimal estimation for doubly multivariate data in blocked compound symmetric covariance structure. J Multivar Anal 144:81–90
Roy A, Filipiak K, Klein D (2018) Testing a block exchangeable covariance matrix. Statistics 52:393–408
Roy A, Leiva R, Žežula I, Klein D (2015) Testing the equality of mean vectors for paired doubly multivariate observations in blocked compound symmetric covariance matrix setup. J Multivar Anal 137:50–60
Seely J (1971) Quadratic subspaces and completeness. Ann Math Stat 42:710–721
Self SG, Liang K (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
Sun G, Xie J (2020) Asymptotic normality and moderate deviation principle for high-dimensional likelihood ratio statistic on block compound symmetry covariance structure. Statistics 54:114–134
Szatrowski TH (1976) Estimation and testing for block compound symmetry and other patterned covariance matrices with linear and non-linear structure, Technical Report Number OLK NSF 107. Department of Statistics, Stanford University, Stanford
Szatrowski TH (1978) Explicit solutions, one iteration convergence and averaging in the multivariate normal estimation problem for patterned means and covariance. Ann Inst Stat Math 30A:81–88
Szatrowski TH (1982) Testing and estimation in the block compound symmetry problem. J Educ Stat 7(1):3–18
Tsukada S (2018) Hypothesis testing for independence under blocked compound symmetric covariance structure. Commun Math Stat 6:163–184
Witkovský V (2018) CharFunTool: the characteristic functions toolbox (MATLAB). https://github.com/witkovsky/CharFunTool
Zmyślony R, Žežula I, Kozioł A (2018) Application of Jordan algebra for testing hypotheses about structure of mean vector in model with block compound symmetric covariance structure. Electron J Linear Algebra 33:41–52
Žežula I, Klein D, Roy A (2018) Testing of multivariate repeated measures data with block exchangeable covariance structure. TEST 27:360–378
Acknowledgements
The authors thank the Mathematical Research and Conference Center of the Polish Academy of Sciences, Będlewo, Poland, for providing the opportunity and support for this paper. This research is partially supported by Statutory Activities No. 0213/SBAD/0110 (Mateusz John) and by the Slovak Research and Development Agency under the contract no. APVV-17-0568 (Daniel Klein).
Appendices
Appendix A: Proof of Theorem 1
Following Rao (2005), the Rao score test statistic is a function of the score vector \({\mathbf{s}}\) and the Fisher information matrix \({\mathbf{F}}\), that is, \(RS={\mathbf{s}}'(\widehat{{\varvec{\theta }}})\,{\mathbf{F}}^{-1}(\widehat{{\varvec{\theta }}})\,{\mathbf{s}}(\widehat{{\varvec{\theta }}})\), with both evaluated at the MLE \(\widehat{{\varvec{\theta }}}\) of \({\varvec{\theta }}\) under the null hypothesis.
Note that the score vector \({\mathbf{s}}({\varvec{\theta }})\) is the vector of first derivatives of the log-likelihood function with respect to the vector of parameters under the alternative, that is \({\varvec{\theta }}=\left( {\varvec{\mu }}',{\text {vech}}'{\varvec{\Delta }}_1,{\text {vech}}'{\varvec{\Delta }}_2\right) ^\prime \), where \({\text {vech}}{\mathbf{A}}\) is the vector obtained from \({\text {vec}}{\mathbf{A}}\) by eliminating all elements arranged above the main diagonal of \({\mathbf{A}}\); cf. Magnus and Neudecker (1986). Observe that it is easy to recover all the entries of \({\text {vec}}{\mathbf{A}}\) from \({\text {vech}}{\mathbf{A}}\) by duplication of the respective elements, that is \( {\text {vec}}{\mathbf{A}}={\mathbf{D}}_m{\text {vech}}{\mathbf{A}}\), where \({\mathbf{D}}_m\) is an \(m^2 \times \frac{1}{2}m(m+1)\) duplication matrix; cf. Magnus and Neudecker (1986).
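The vec–vech relation and the duplication matrix \({\mathbf{D}}_m\) described above can be checked numerically. The following sketch (an illustration, not the authors' code) also verifies the reverse recovery \({\text {vech}}{\mathbf{A}}={\mathbf{D}}_m^+{\text {vec}}{\mathbf{A}}\) for a symmetric \({\mathbf{A}}\), where \({\mathbf{D}}_m^+\) is the Moore–Penrose inverse used later in the proof.

```python
# Sketch of the vec/vech relation and the duplication matrix D_m
# (notation as in Magnus and Neudecker 1986); illustration only.
import numpy as np

def vech(A):
    """Stack the lower-triangular part (including diagonal) column-wise."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])

def duplication_matrix(m):
    """D_m of size m^2 x m(m+1)/2 with vec(A) = D_m vech(A) for symmetric A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for i in range(j, m):
            D[j * m + i, col] = 1.0   # position of A[i, j] in vec(A)
            D[i * m + j, col] = 1.0   # position of A[j, i] = A[i, j]
            col += 1
    return D

m = 3
A = np.array([[2.0, 0.5, 1.0],
              [0.5, 3.0, 0.25],
              [1.0, 0.25, 4.0]])          # symmetric example matrix
D = duplication_matrix(m)
assert np.allclose(A.flatten('F'), D @ vech(A))             # vec = D vech
assert np.allclose(np.linalg.pinv(D) @ A.flatten('F'), vech(A))  # vech = D+ vec
```

Here `A.flatten('F')` is the column-wise vectorization \({\text {vec}}{\mathbf{A}}\).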
Following Magnus and Neudecker (1986), by the derivative of an arbitrary function \({\mathbf{G}}({\mathbf{A}})\) with respect to \({\mathbf{A}}\) we mean \(\displaystyle \frac{\mathrm{d}{\text {vec}}{\mathbf{G}}({\mathbf{A}})}{\mathrm{d}{\text {vec}}'{\mathbf{A}}}\). If \({\mathbf{A}}\) is a symmetric matrix, the derivative is computed with respect to \({\text {vech}}'{\mathbf{A}}\). It is easy to see that \(\displaystyle \frac{\mathrm{d}{\text {vec}}{\mathbf{A}}}{\mathrm{d}{\text {vech}}'{\mathbf{A}}}={\mathbf{D}}_m\), and thus, according to the chain rule, \(\displaystyle \frac{\mathrm{d}{\text {vec}}{\mathbf{G}}({\mathbf{A}})}{\mathrm{d}{\text {vech}}'{\mathbf{A}}}=\frac{\mathrm{d}{\text {vec}}{\mathbf{G}}({\mathbf{A}})}{\mathrm{d}{\text {vec}}'{\mathbf{A}}}\cdot {\mathbf{D}}_m\). Since the log-likelihood function is a scalar function, in the considered case the resulting score vector is of dimension \(um+m(m+1)\).
Noting that \(\widehat{{\varvec{\theta }}}\) is the MLE of \({\varvec{\theta }}\) under the null hypothesis, for (9) we obtain \(\widehat{{\varvec{\theta }}}=\left( \widehat{{\varvec{\mu }}}',{\text {vech}}'\widehat{{\varvec{\Delta }}},{\text {vech}}'\widehat{{\varvec{\Delta }}}\right) ^\prime \) with \(\widehat{{\varvec{\mu }}}=\frac{1}{n}{\mathbf{X}}^\prime {\mathbf{1}}_n\) and \(\widehat{{\varvec{\Delta }}}\) presented in (10). Observe, however, that since the considered hypothesis does not impose any restrictions on \({\varvec{\mu }}\), the first entry of the score vector (the first derivative with respect to \({\varvec{\mu }}\)) reduces to \({\mathbf{0}}\) when \({\varvec{\mu }} \) is replaced by its MLE. Thus, without loss of generality, we consider the score vector of dimension \(m(m+1)\). Similarly, the Fisher information matrix is of order \(m(m+1)\).
Considering \({\varvec{\Gamma }}\) given by (4), and since \(\mid {\varvec{\Gamma }}\mid =\mid {\varvec{\Delta }}_1\mid ^{u-1}\mid {\varvec{\Delta }}_2\mid \), the log-likelihood function under \(H_1\) in (9) can be presented as \(\ln L=-\tfrac{num}{2}\ln (2\pi )-\tfrac{n(u-1)}{2}\ln \mid {\varvec{\Delta }}_1\mid -\tfrac{n}{2}\ln \mid {\varvec{\Delta }}_2\mid -\tfrac{1}{2}{\text {tr}}\left( {\varvec{\Gamma }}^{-1}{\mathbf{Y}}'{\mathbf{Y}}\right) \)
with \({\mathbf{Y}}:={\mathbf{Y}}({\varvec{\mu }})= {\mathbf{X}}- {\mathbf{1}}_n{\varvec{\mu }}^\prime \). In order to obtain the score vector, we differentiate the above log-likelihood function with respect to \({\varvec{\theta }}\). Using the chain rule as described in Magnus and Neudecker (1986), the differentiation formulas given in Fackler (2005), and Corollary 2.10 of Filipiak et al. (2018), we obtain
where \({\mathbf{K}}_{m,u}\) is an \(um\times um\) commutation matrix that transforms an \(m \times u\) matrix \({\mathbf{A}}\) as \({\mathbf{K}}_{m,u}{\text {vec}}{\mathbf{A}} = {\text {vec}}{\mathbf{A}}'\). A similar result can be obtained for \(\displaystyle \frac{\partial \ln L}{\partial {\varvec{\Delta }}_2}\) with \({\varvec{\Delta }}_1\) replaced by \({\varvec{\Delta }}_2\), with the projection matrix \({\mathbf{Q}}_u\) replaced by \({\mathbf{P}}_u\), and with \({\text {tr}}{\mathbf{Q}}_u=u-1\) replaced by \({\text {tr}}{\mathbf{P}}_u=1\). Plugging \(\widehat{{\varvec{\theta }}}\) under \(H_0\) into the above formulas, and observing that
we get
which, due to the formulas (6) for MLEs of BCS structure, can be simplified to
To compute the Fisher information matrix, second-order partial derivatives and their expected values are calculated. Detailed computations are presented here only for the parameter \({\varvec{\Delta }}_1\), as the derivatives with respect to \({\varvec{\Delta }}_2\) follow from the same arguments.
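As an aside, the commutation matrix \({\mathbf{K}}_{m,u}\) appearing in the score derivative above admits a simple explicit construction, sketched below for illustration (not the authors' code):

```python
# Sketch of the commutation matrix K_{m,u} with K_{m,u} vec(A) = vec(A')
# for an m x u matrix A; illustration only.
import numpy as np

def commutation_matrix(m, u):
    """K of size mu x mu mapping vec(A) (A being m x u) to vec(A')."""
    K = np.zeros((m * u, m * u))
    for i in range(m):
        for j in range(u):
            # vec(A) places A[i, j] at position j*m + i;
            # vec(A') places the same entry at position i*u + j.
            K[i * u + j, j * m + i] = 1.0
    return K

m, u = 2, 3
A = np.arange(m * u, dtype=float).reshape(m, u)
K = commutation_matrix(m, u)
assert np.allclose(K @ A.flatten('F'), A.T.flatten('F'))  # K vec(A) = vec(A')
```

Since \({\mathbf{K}}_{m,u}\) is a permutation matrix, \({\mathbf{K}}_{m,u}^{-1}={\mathbf{K}}_{m,u}'={\mathbf{K}}_{u,m}\).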
Using derivatives from Fackler (2005) and formula (1.4.23) from Kollo and von Rosen (2005) we get
To compute the expectation of the above, we use \({\text {E}}({\mathbf{Y}}'{\mathbf{Y}})={\varvec{\Gamma }}\) (cf. Kollo and von Rosen (2005)[Th. 2.2.9(i)]), and hence, from the orthogonality of \({\mathbf{P}}_u\) and \({\mathbf{Q}}_u\), we obtain
Noting that the block-diagonal elements of \({\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1\) are equal to \(\frac{u-1}{u}{\varvec{\Delta }}_1\), it is easy to see that \({\text {BTr}}_m({\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1)=(u-1){\varvec{\Delta }}_1\). Furthermore, since for a symmetric matrix \({\mathbf{A}}\)
cf. Filipiak et al. (2016)[Lemma 1], we obtain
Plugging in the MLEs of the unknown parameters under \(H_0\), we finally obtain
and, from Filipiak et al. (2016)[Prop. 1(iv)],
where \({\mathbf{D}}_m^+\) is the Moore–Penrose inverse of \({\mathbf{D}}_m\). Denoting the score vector (A.1) as
we get
Using Magnus and Neudecker (1986)[formulas (54) and (36)] we may further simplify the above RS to
Finally, from idempotency and orthogonality of \({\mathbf{Q}}_u\) and \({\mathbf{P}}_u\) we obtain
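The block trace identity \({\text {BTr}}_m({\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1)=(u-1){\varvec{\Delta }}_1\) used in this proof can be verified numerically. In the sketch below (our illustration, not the authors' code) we take \({\mathbf{Q}}_u\) to be the centering projection \({\mathbf{I}}_u-\frac{1}{u}{\mathbf{J}}_u\), which is consistent with \({\text {tr}}{\mathbf{Q}}_u=u-1\) above but is an assumption here, since \({\mathbf{Q}}_u\) is defined earlier in the paper; \({\varvec{\Delta }}_1\) is an arbitrary symmetric matrix.

```python
# Numerical check of BTr_m(Q_u x Delta_1) = (u-1) Delta_1, where BTr_m is
# the block trace (sum of the m x m diagonal blocks). Q_u is assumed to be
# the centering projection I_u - (1/u) J_u; Delta_1 is illustrative.
import numpy as np

def block_trace(M, m):
    """Sum of the m x m diagonal blocks of a (um) x (um) matrix."""
    u = M.shape[0] // m
    return sum(M[k * m:(k + 1) * m, k * m:(k + 1) * m] for k in range(u))

m, u = 2, 4
Q = np.eye(u) - np.ones((u, u)) / u        # tr(Q) = u - 1
Delta1 = np.array([[2.0, 0.5], [0.5, 1.0]])
assert np.allclose(block_trace(np.kron(Q, Delta1), m), (u - 1) * Delta1)
```

The identity holds because each \(m \times m\) diagonal block of \({\mathbf{Q}}_u \otimes {\varvec{\Delta }}_1\) equals \(\frac{u-1}{u}{\varvec{\Delta }}_1\), and there are \(u\) such blocks.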
Appendix B: Proof of Theorem 2
Following Rao (2005), the Wald test statistic for testing the composite null hypothesis \({\mathbf{h}}({\varvec{\theta }})={\mathbf{c}}\) can be represented as \(WT=\left( {\mathbf{h}}(\widehat{{\varvec{\theta }}})-{\mathbf{c}}\right) ^\prime {\mathbf{A}}^{-1}(\widehat{{\varvec{\theta }}})\left( {\mathbf{h}}(\widehat{{\varvec{\theta }}})-{\mathbf{c}}\right) \),
where \({\mathbf{A}}({\varvec{\theta }})={\mathbf{H}}({\varvec{\theta }}){\mathbf{F}}^{-1}({\varvec{\theta }}){\mathbf{H}}'({\varvec{\theta }})\), with \({\mathbf{H}}({\varvec{\theta }})\) being the matrix of derivatives of \({\mathbf{h}}\) with respect to the components of \({\varvec{\theta }}\), \({\mathbf{F}}({\varvec{\theta }})\) is the Fisher information matrix and \(\widehat{{\varvec{\theta }}}\) is the MLE of \({\varvec{\theta }}\) under alternative. Considering hypothesis (9) we may note that the null hypothesis can be written as \({\text {vech}}({\varvec{\Delta }}_1-{\varvec{\Delta }}_2)={\mathbf{0}}_{m(m+1)/2}\) which is equivalent to \(\left( {\mathbf{I}}_{m(m+1)/2},-{\mathbf{I}}_{m(m+1)/2}\right) {\varvec{\theta }}={\mathbf{0}}_{m(m+1)/2}\), where \({\varvec{\theta }}=\left( {\text {vech}}'{\varvec{\Delta }}_1,{\text {vech}}'{\varvec{\Delta }}_2\right) '\). It follows that \({\mathbf{h}}({\varvec{\theta }})=\left( {\mathbf{I}},-{\mathbf{I}}\right) {\varvec{\theta }}\) with identity matrices of order \(m(m+1)/2\), and hence \({\mathbf{H}}({\varvec{\theta }})=\left( {\mathbf{I}},-{\mathbf{I}}\right) \), while \({\mathbf{F}}({\varvec{\theta }})\) is a block diagonal matrix with diagonal blocks given in (A.2) and equal to \(\frac{n}{2}{\mathbf{D}}'_m({\varvec{\Delta }}^{-1}_2 \otimes {\varvec{\Delta }}^{-1}_2){\mathbf{D}}_m\). We then obtain
and, plugging the inverse of \({\mathbf{A}}(\widehat{{\varvec{\theta }}})\) into the formula for Wald test statistic, we get
From the definition of the duplication matrix we have \({\mathbf{D}}_m {\text {vech}}\left( \widehat{{\varvec{\Delta }}}_1-\widehat{{\varvec{\Delta }}}_2\right) ={\text {vec}}\left( \widehat{{\varvec{\Delta }}}_1-\widehat{{\varvec{\Delta }}}_2\right) \), and the assertion follows.
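The structure of the Wald statistic in this proof, \(WT={\mathbf{h}}(\widehat{{\varvec{\theta }}})'{\mathbf{A}}^{-1}(\widehat{{\varvec{\theta }}}){\mathbf{h}}(\widehat{{\varvec{\theta }}})\) with \({\mathbf{A}}={\mathbf{H}}{\mathbf{F}}^{-1}{\mathbf{H}}'\) and \({\mathbf{H}}=({\mathbf{I}},-{\mathbf{I}})\), can be sketched generically as follows. This is an illustration only: the values of `h`, `F1`, `F2` below are arbitrary stand-ins, not the paper's Fisher information blocks.

```python
# Generic sketch of the Wald statistic W = h' A^{-1} h with
# A = H F^{-1} H' and H = (I, -I), as in the proof above.
# F1, F2 stand for the diagonal blocks of the Fisher information.
import numpy as np

def wald_statistic(h, F1, F2):
    """h = vech(Delta1_hat - Delta2_hat); F1, F2 = information blocks."""
    p = len(h)
    H = np.hstack([np.eye(p), -np.eye(p)])
    Finv = np.block([[np.linalg.inv(F1), np.zeros_like(F1)],
                     [np.zeros_like(F2), np.linalg.inv(F2)]])
    A = H @ Finv @ H.T            # A = F1^{-1} + F2^{-1} for this H
    return h @ np.linalg.inv(A) @ h

p = 3                             # p = m(m+1)/2 restrictions for m = 2
rng = np.random.default_rng(0)
h = rng.standard_normal(p)        # illustrative restriction values
F1 = F2 = np.eye(p) * 5.0         # illustrative information blocks
print(wald_statistic(h, F1, F2))
```

For \({\mathbf{H}}=({\mathbf{I}},-{\mathbf{I}})\) and block-diagonal \({\mathbf{F}}\), the matrix \({\mathbf{A}}\) reduces to the sum of the inverted blocks, which is exactly the simplification exploited in the proof.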
Appendix C: Proof of Theorem 3
Denoting \({\mathbf{A}}_1=n(u-1) \widehat{{\varvec{\Delta }}}_1\), \({\mathbf{A}}_2=n\widehat{{\varvec{\Delta }}}_2\) and observing that
we may write (11) as
where \({\mathbf{A}}_1 \sim W_m({\varvec{\Delta }}_1,(n-1)(u-1))\), \({\mathbf{A}}_2 \sim W_m({\varvec{\Delta }}_2,n-1)\), and \({\mathbf{A}}_1\) and \({\mathbf{A}}_2\) are independent; cf. Roy et al. (2015) and Filipiak and Klein (2021).
Since the probability density function of any \(m \times m\) matrix \({\mathbf{W}}\sim W_m({\varvec{\Sigma }}, \nu )\) can be expressed as \(f({\mathbf{W}})=\left( 2^{\nu m/2}\mid {\varvec{\Sigma }}\mid ^{\nu /2}\Gamma _m\left( \tfrac{\nu }{2}\right) \right) ^{-1}\mid {\mathbf{W}}\mid ^{(\nu -m-1)/2}\exp \left( -\tfrac{1}{2}{\text {tr}}({\varvec{\Sigma }}^{-1}{\mathbf{W}})\right) \) for positive definite \({\mathbf{W}}\),
with the multivariate gamma function of order m given as \(\Gamma _m(a)=\pi ^{m(m-1)/4}\prod _{i=1}^{m}\Gamma \left( a-\tfrac{i-1}{2}\right) \),
cf. Kollo and von Rosen (2005)[Th. 2.4.6], we may express the h-th moment of \(\mid {\mathbf{W}}\mid \) by the formula \({\text {E}}\mid {\mathbf{W}}\mid ^h=\int _{{\mathcal {W}}}\mid {\mathbf{W}}\mid ^h f({\mathbf{W}})\,\mathrm{d}{\mathbf{W}}=2^{mh}\mid {\varvec{\Sigma }}\mid ^h\,\frac{\Gamma _m\left( \frac{\nu }{2}+h\right) }{\Gamma _m\left( \frac{\nu }{2}\right) },\)
where \({\mathcal {W}}\) is the space of symmetric positive-definite matrices. Hence, the h-th moment of \(\Lambda _*\) can be written as
where \({\mathbf{A}}={\mathbf{A}}_1+{\mathbf{A}}_2 \sim W_m({\varvec{\Delta }},(n-1+2h)u)\). Now, using (C.1), we can write
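The determinant-moment identity \({\text {E}}\mid {\mathbf{W}}\mid ^h=2^{mh}\mid {\varvec{\Sigma }}\mid ^h\,\Gamma _m(\nu /2+h)/\Gamma _m(\nu /2)\) underlying this proof can be checked by Monte Carlo simulation. The sketch below (our illustration; the dimensions and scale matrix are arbitrary) uses SciPy's Wishart sampler and log multivariate gamma.

```python
# Monte Carlo check of the h-th moment of |W| for W ~ W_m(Sigma, nu):
# E|W|^h = 2^{mh} |Sigma|^h * Gamma_m(nu/2 + h) / Gamma_m(nu/2),
# a standard Wishart fact (cf. Kollo and von Rosen 2005, Th. 2.4.6).
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln   # log multivariate gamma function

m, nu, h = 2, 10, 1.0
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

# Exact value, computed on the log scale for numerical stability.
exact = np.exp(m * h * np.log(2.0) + h * np.log(np.linalg.det(Sigma))
               + multigammaln(nu / 2 + h, m) - multigammaln(nu / 2, m))

rng = np.random.default_rng(1)
samples = wishart.rvs(df=nu, scale=Sigma, size=20000, random_state=rng)
mc = np.mean([np.linalg.det(W) ** h for W in samples])
print(exact, mc)   # the two values should be close
```

For \(h=1\) the identity reduces to the familiar \({\text {E}}\mid {\mathbf{W}}\mid =\mid {\varvec{\Sigma }}\mid \,\nu (\nu -1)\cdots (\nu -m+1)\).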
Appendix D: Proof of Proposition 1
It is enough to show that the distributions of the RS, LR and Wald test statistics, presented in Theorem 1, (11) and Theorem 2, respectively, do not depend on the true value of \({\varvec{\Delta }}\) under the null hypothesis given in (9).
The observation matrix \({\mathbf{X}}\) can be written as \({\mathbf{X}}={\mathbf{1}}_n{\varvec{\mu }}'+{\mathbf{E}}{\varvec{\Omega }}^{1/2}\), where \({\varvec{\Omega }}^{1/2}={\mathbf{I}}_u\otimes {\varvec{\Delta }}^{1/2}\) with \({\varvec{\Delta }}^{1/2}{\varvec{\Delta }}^{1/2}={\varvec{\Delta }}\) and \({\mathbf{E}}\sim N_{n,um}({\mathbf{0}},{\mathbf{I}}_n,{\mathbf{I}}_{um})\), therefore
Hence, due to (10) and using Filipiak et al. (2018)[Lemma 2.11], we have
where \(\widehat{{\varvec{\Upsilon }}}_0\) does not depend on the true values of the unknown parameters. Similarly we find that the estimators in (6) can be expressed as
where both \(\widehat{{\varvec{\Upsilon }}}_1\) and \(\widehat{{\varvec{\Upsilon }}}_2\) do not depend on the true values of the unknown parameters, since
Therefore, for the estimator \(\widehat{{\varvec{\Gamma }}}\) given in (5) it holds
and, after substituting into the RST statistic given in Theorem 1, it can be presented as
The LRT statistic (11) and the Wald test statistic given in Theorem 2 can be expressed as
respectively.
Cite this article
Filipiak, K., John, M. & Klein, D. Testing independence under a block compound symmetry covariance structure. Stat Papers 64, 677–704 (2023). https://doi.org/10.1007/s00362-022-01335-7