1 Introduction

Covariance separability is an attractive property of covariance matrices, which can improve and simplify many multivariate procedures. A separable covariance matrix is defined by its representation as the Kronecker product of two covariance matrices. The most common multivariate procedures involving a separable covariance matrix are those based on the matrix normal distribution or transposable data (Dawid 1981; Gupta and Nagar 1999; Wang and West 2009). For example, Viroli (2010) and Glanz and Carvalho (2013) consider the mixture model and an EM procedure to estimate it. Yin and Li (2012) study the sparse Gaussian graphical model under the matrix normal assumption. Allen and Tibshirani (2012) and Tan and Witten (2014) study various inferential issues on transposable data, where transposability means that both the rows and the columns of the matrix data are correlated. Extensions to three-level multivariate data are also presented in the literature (e.g., Roy and Leiva 2008, 2011).

Due to its importance in inferential procedures, testing covariance separability has received considerable attention from previous researchers. Lu and Zimmerman (2005) and Mitchell et al. (2006) consider repeatedly measured multivariate data. They independently study the likelihood ratio (LR) statistic under the normality assumption and propose an approximation to its quantiles under the null hypothesis. Fuentes (2006) studies separability in spatio-temporal processes. She uses a spectral representation of the process and reformulates the testing problem as a simple two-way ANOVA problem. Li et al. (2007) work with stationary spatio-temporal random fields. Using the asymptotic normality of the estimated covariance functions, they build a unified framework for various testing problems. Recently, Filipiak et al. (2016, 2017) propose Rao’s score test (RST) under normality for repeatedly measured multivariate data, and use the asymptotic chi-square distribution as its null distribution.

Normality is a common assumption in most previous work on testing covariance separability, but it often fails in practice. In this paper, we propose permutation based testing procedures to resolve this difficulty. More specifically, we rewrite the null hypothesis of a separable covariance matrix \(\Sigma \) as an intersection of many individual sub-hypotheses. The sub-hypotheses concern the separability of specially structured sub-matrices of \(\Sigma \), for which the LR statistic is invariant to the permutation of (groups of) variables. Thus, the p value for each sub-hypothesis can be approximated numerically. The final decision is obtained by combining the p values of the individual sub-hypotheses using the Bonferroni and multi-stage additive procedures.

The remainder of this paper is organized as follows. In Sect. 2, we briefly introduce the LRT under the normality assumption by Mitchell et al. (2006) and the RST by Filipiak et al. (2016, 2017). In Sect. 3, we propose permutation based procedures. We then apply our methods, as well as the two existing procedures (the LRT and RST), to simulated data, and compare their sizes and powers in Sect. 4. We analyze two real data examples in Sect. 5. Concluding remarks and discussions are given in Sect. 6.

2 Existing procedures for testing separability

We consider a multivariate random variable \(\mathbf{X}_{p\times q}\),

$$\begin{aligned} \mathbf{X}_{p\times q}=\begin{pmatrix} X_{11} &amp; X_{12} &amp; \cdots &amp; X_{1q}\\ X_{21} &amp; X_{22} &amp; \cdots &amp; X_{2q}\\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ X_{p1} &amp; X_{p2} &amp; \cdots &amp; X_{pq} \end{pmatrix}, \end{aligned}$$

and its vectorized variable

$$\begin{aligned} \mathbf{vX}=\big (X_{11},X_{12},\ldots ,X_{1q},X_{21},X_{22},\ldots ,X_{2q},\ldots ,X_{p1},X_{p2},\ldots ,X_{pq}\big )^{\mathrm{T}}, \end{aligned}$$

which has mean \(\mu =\big (\mu _{11},\mu _{12},\ldots ,\mu _{1q},\mu _{21},\mu _{22},\ldots ,\mu _{2q},\ldots ,\mu _{p1},\mu _{p2},\ldots ,\mu _{pq}\big )^{\mathrm{T}}\) and covariance matrix \(\Sigma \) (a \(pq\times pq\) matrix). Suppose we have n repeatedly measured observations (independently and identically distributed copies) of \(\mathbf{X}\) and \(\mathbf{vX}\), which are \(\mathbf{X}_{1},\mathbf{X}_{2},\ldots ,\mathbf{X}_{n}\) and \(\mathbf{vX}_{1},\mathbf{vX}_{2},\ldots ,\mathbf{vX}_{n}\), respectively. The goal of this paper is to test the separability of \(\Sigma \), i.e., \(\Sigma =\mathrm{U}\otimes \mathrm{V}\) for two covariance matrices \(\mathrm{U}=\big (u_{ij},1\le i,j\le p\big )\) (a \(p\times p\) matrix) and \(\mathrm{V}=\big (v_{kl},1\le k,l\le q\big )\) (a \(q\times q\) matrix) using these observations.

A popular procedure to test the separability of \(\Sigma \) is the likelihood ratio (LR) test under normality. Suppose \(\mathbf{vX}\) is normal and has the covariance matrix \(\Sigma =\mathrm{U}\otimes \mathrm{V}\). The maximum likelihood estimators (MLEs) of \(\mathrm{U}\) and \(\mathrm{V}\) are the solution to the following equations (Dutilleul 1999),

$$\begin{aligned} \widehat{\mathrm{V}}&amp;=\frac{1}{pn}\sum _{h=1}^{n}(\mathbf{X}_{h}-\widehat{\mathbf{M}})^{\mathrm{T}}\widehat{\mathrm{U}}^{-1}(\mathbf{X}_{h}-\widehat{\mathbf{M}}),\\ \widehat{\mathrm{U}}&amp;=\frac{1}{qn}\sum _{h=1}^{n}(\mathbf{X}_{h}-\widehat{\mathbf{M}})\widehat{\mathrm{V}}^{-1}(\mathbf{X}_{h}-\widehat{\mathbf{M}})^{\mathrm{T}}, \end{aligned}$$

where \(\mathbf{M}\) is the \(p\times q\) matrix formed from \(\mu \) and \(\widehat{\mathbf{M}}\) is the sample average of \(\mathbf{X}_{1},\ldots ,\mathbf{X}_{n}\). The LR statistic is then found to be

$$\begin{aligned} \mathrm{lrt}=nq\log \big |\widehat{\mathrm{U}}\big |+np\log \big |\widehat{\mathrm{V}}\big |-n\log \big |\mathbf{S}\big |, \end{aligned}$$
(1)

where \(\mathbf{S}\) is the sample covariance matrix

$$\begin{aligned} \mathbf{S}=\frac{1}{n}\sum _{h=1}^{n}(\mathbf{vX}_{h}-\widehat{\mu })(\mathbf{vX}_{h}-\widehat{\mu })^{\mathrm{T}} \end{aligned}$$

and \(\widehat{\mu }\) is the vectorization of \(\widehat{\mathbf{M}}\).
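The coupled likelihood equations above have no closed form, but they can be solved by iterating between them (Dutilleul's "flip-flop" iteration), after which the LR statistic (1) is a few log-determinants. The following is a minimal Python/NumPy sketch; the function and variable names are ours, and we start the iteration from the identity matrix.

```python
import numpy as np

def flip_flop_mle(X, max_iter=200, tol=1e-9):
    """MLEs of U (p x p) and V (q x q) under a separable covariance,
    given data X of shape (n, p, q), by alternating the two
    likelihood equations of Dutilleul (1999)."""
    n, p, q = X.shape
    Xc = X - X.mean(axis=0)                      # subtract M-hat
    U = np.eye(p)
    for _ in range(max_iter):
        Ui = np.linalg.inv(U)
        V = sum(x.T @ Ui @ x for x in Xc) / (p * n)
        Vi = np.linalg.inv(V)
        U_new = sum(x @ Vi @ x.T for x in Xc) / (q * n)
        done = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if done:
            break
    return U, V

def lrt_separability(X):
    """LR statistic (1): nq log|U| + np log|V| - n log|S|."""
    n, p, q = X.shape
    U, V = flip_flop_mle(X)
    vX = (X - X.mean(axis=0)).reshape(n, p * q)  # row-major vectorization
    S = vX.T @ vX / n                            # sample covariance
    return (n * q * np.linalg.slogdet(U)[1]
            + n * p * np.linalg.slogdet(V)[1]
            - n * np.linalg.slogdet(S)[1])

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4, 3))              # n = 50 > pq = 12
t = lrt_separability(X)
```

Note that the row-major reshape matches the vectorization of \(\mathbf{vX}\) above, and that \(\mathrm{U}\) and \(\mathrm{V}\) are identified only up to a scalar; the LR statistic is invariant to this scaling.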

For the null distribution of the LR statistic in (1), Mitchell et al. (2006) propose to match its first moment to that of a scaled chi-square distribution. They first show that the null distribution of the LR statistic depends only on \(p\), \(q\), and \(n\), not on the specific form of \(\mathrm{U}\) and \(\mathrm{V}\). Thus, all separable covariance models \(\Sigma =\mathrm{U}\otimes \mathrm{V}\) yield the same null distribution for given \(p\), \(q\), and \(n\). For each combination of \((p,q,n)\), they approximate the null distribution with a scaled chi-square distribution:

$$\begin{aligned} \mathrm{lrt}\approx k\cdot \chi ^{2}(\xi ), \end{aligned}$$
(2)

where \(\xi =pq(pq+1)/2-p(p+1)/2-q(q+1)/2+1\) is the number of degrees of freedom of the asymptotic null distribution; and k is determined by matching the first moment of \(\mathrm{lrt}\) to that of the \(k\cdot \chi ^{2}(\xi )\), given by

$$\begin{aligned} k=\frac{1}{\xi }\Bigg \{-n\Big (pq\log 2+\sum _{j=1}^{pq}\psi \big (0.5(n-j)\big )-pq\log {n}\Big )-\frac{n}{n-1}\big (p(p+1)/2+q(q+1)/2+pq-1\big )\Bigg \}, \end{aligned}$$

where \(\psi \) is the digamma function. They further approximate the critical value (denoted by \(\mathrm{lrt}_{\alpha }\)) with that of the approximate scaled chi-square distribution as

$$\begin{aligned} \mathrm{lrt}_{\alpha }\simeq k\cdot \chi _{\alpha }^{2}(\xi ), \end{aligned}$$
(3)

where \(\text {lrt}_{\alpha }\) is the \(100\times (1-\alpha )\)-th quantile of the null distribution of \(\mathrm{lrt}\) and \(\chi _{\alpha }^{2}(\xi )\) is that of the chi-square distribution \(\chi ^{2}(\xi )\).
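For concreteness, \(\xi \) and \(k\) are straightforward to compute. The Python sketch below is our own (the digamma function is implemented with a standard recurrence-plus-asymptotic-series approximation to stay dependency-free); it also illustrates that \(k\rightarrow 1\) as \(n\) grows, recovering the asymptotic \(\chi ^{2}(\xi )\) null distribution.

```python
import math

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x+1) - 1/x and an
    asymptotic expansion; accurate enough for this illustration."""
    acc = 0.0
    while x < 8.0:
        acc -= 1.0 / x
        x += 1.0
    return acc + math.log(x) - 1/(2*x) - 1/(12*x**2) + 1/(120*x**4)

def lrt_df(p, q):
    # xi = pq(pq+1)/2 - p(p+1)/2 - q(q+1)/2 + 1
    return p*q*(p*q + 1)//2 - p*(p + 1)//2 - q*(q + 1)//2 + 1

def lrt_scale_k(p, q, n):
    """Moment-matching constant k of Mitchell et al. (2006)."""
    xi = lrt_df(p, q)
    inner = (p*q*math.log(2.0)
             + sum(digamma(0.5*(n - j)) for j in range(1, p*q + 1))
             - p*q*math.log(n))
    return (-n*inner - (n/(n - 1.0))*(p*(p+1)/2 + q*(q+1)/2 + p*q - 1)) / xi

xi = lrt_df(4, 3)            # 63 degrees of freedom
k20 = lrt_scale_k(4, 3, 20)  # noticeably above 1 for small n
```

For \(p=4\), \(q=3\), \(n=20\) this gives \(k\approx 1.5\), so the finite-sample critical value is roughly 50% larger than the plain \(\chi ^{2}(\xi )\) quantile.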

This approximation by Mitchell et al. (2006) relies on the multivariate normality assumption in evaluating the expectation of \(\mathrm{lrt}\). In particular, they evaluate the expectation of the generalized variance \(\log |\mathbf{S}|\) under multivariate normality. As shown in our numerical study in Sect. 4, if the data come from a non-normal distribution (in our study, a multivariate t-distribution with 5 degrees of freedom and a multivariate version of the chi-square distribution with 3 degrees of freedom), the critical value in (3) is severely biased, and accordingly the size of the LRT is much larger than the nominal level.

As an alternative to the LRT, Filipiak et al. (2016, 2017) recently consider the RST under normality, whose test statistic is

$$\begin{aligned} \text {rst}&amp;=\frac{npq}{2} - \mathrm{tr} \Big ( (\widehat{\mathrm{U}}^{-1}\otimes \widehat{\mathrm{V}}^{-1}) \mathbf{Z}^\mathrm{T} \mathrm{P}^{\perp } \mathbf{Z} \Big ) \nonumber \\&amp;\quad + \frac{1}{2n} \mathrm{tr} \Big ( (\widehat{\mathrm{U}}^{-1}\otimes \widehat{\mathrm{V}}^{-1}) \mathbf{Z}^\mathrm{T} \mathrm{P}^{\perp } \mathbf{Z} (\widehat{\mathrm{U}}^{-1}\otimes \widehat{\mathrm{V}}^{-1}) \mathbf{Z}^\mathrm{T} \mathrm{P}^{\perp } \mathbf{Z} \Big ), \end{aligned}$$
(4)

where \(\mathbf{Z} = \big (\mathbf{vX}_{1}, \ldots , \mathbf{vX}_{n}\big )^\mathrm{T}\) is the \(n\times pq\) matrix obtained by stacking the transposed \(\mathbf{vX}_h\), \(h=1,\ldots , n\); \(\mathrm{P}^{\perp } = \mathrm{I}_n - n^{-1} 1_n 1_n^\mathrm{T}\) is the projection matrix onto the orthogonal complement of the column space of the \(n\)-dimensional vector \(1_n=(1,\ldots ,1)^\mathrm{T}\); and \(\mathrm{tr}(\mathrm{A})\) is the trace of the matrix \(\mathrm{A}\).

The RST statistic has an advantage in its applicability to small-sized data, since it requires fewer samples (\(n > \max (p,q)\)) than the LRT (\(n > pq\)). Also, it is well known that the RST statistic is asymptotically distributed as the chi-square distribution with \(\xi \) degrees of freedom (defined above) under the null hypothesis. However, as pointed out in Filipiak et al. (2016, 2017), the finite sample null distribution of (4) is unknown and quite different from the asymptotic one. The authors numerically approximate its critical values using Monte Carlo samples from the normal distribution. Nonetheless, like the LRT, the performance of the RST depends on the normality assumption, as shown later in Fig. 1 and the tables in Sect. 4.

3 Permutation based procedures

All existing procedures for testing covariance separability, including Mitchell et al. (2006) and Filipiak et al. (2016, 2017), strongly depend on the normality assumption or large-sample asymptotics, which are often not valid in practice. In this section, we propose permutation based procedures that are free of any distributional assumption on the data. To do so, we first rewrite the null hypothesis (the separability of \(\Sigma \)) as the intersection of many small individual hypotheses on (a specific form of) sub-matrices of \(\Sigma \), and test the individual hypotheses via permutation.

Suppose, as defined earlier, \(\mathbf{vX}\) is a pq dimensional random vector of

$$\begin{aligned} \big (X_{11},X_{12},\ldots ,X_{1q},X_{21},X_{22},\ldots ,X_{2q},\ldots ,X_{p1},X_{p2},\ldots ,X_{pq}\big )^{\mathrm{T}} \end{aligned}$$

with covariance matrix \(\Sigma =\big (\sigma _{st},1\le s,t\le pq\big )\) (a \(pq\times pq\) matrix). The covariance matrix \(\Sigma \) is defined to be separable, if it can be written as the Kronecker product of two covariance matrices \(\mathrm{U}=\big (u_{ij},1\le i,j\le p\big )\) (a \(p\times p\) matrix) and \(\mathrm{V}=\big (v_{kl},1\le k,l\le q\big )\) (a \(q\times q\) matrix):

$$\begin{aligned} \Sigma =\mathrm{U}\otimes \mathrm{V}=\begin{pmatrix}u_{11}\mathrm{V} &amp; \cdots &amp; u_{1p}\mathrm{V}\\ \vdots &amp; \ddots &amp; \vdots \\ u_{p1}\mathrm{V} &amp; \cdots &amp; u_{pp}\mathrm{V} \end{pmatrix}. \end{aligned}$$
(5)

Let

$$\begin{aligned} \Sigma _{ij}^{\mathrm{r}}&amp;=\begin{pmatrix} u_{ii} &amp; u_{ij}\\ u_{ji} &amp; u_{jj} \end{pmatrix}\otimes \mathrm{V}=\begin{pmatrix} u_{ii}\mathrm{V} &amp; u_{ij}\mathrm{V}\\ u_{ji}\mathrm{V} &amp; u_{jj}\mathrm{V} \end{pmatrix}\\ &amp;=\begin{pmatrix}u_{ii}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} &amp; u_{ij}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix}\\ u_{ji}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} &amp; u_{jj}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} \end{pmatrix}, \end{aligned}$$

which is the submatrix of \(\Sigma \) corresponding to the subvector

$$\begin{aligned} \mathbf{X}_{ij}^{\mathrm{r}}=\big (X_{i1},X_{i2},\ldots ,X_{iq},X_{j1},X_{j2},\ldots ,X_{jq}\big )^{\mathrm{T}}. \end{aligned}$$

Similarly, we define \(\Sigma _{kl}^{\mathrm{c}}\) as

$$\begin{aligned} \Sigma _{kl}^{\mathrm{c}}&amp;=\begin{pmatrix} v_{kk} &amp; v_{kl}\\ v_{lk} &amp; v_{ll} \end{pmatrix}\otimes \mathrm{U}=\begin{pmatrix} v_{kk}\mathrm{U} &amp; v_{kl}\mathrm{U}\\ v_{lk}\mathrm{U} &amp; v_{ll}\mathrm{U} \end{pmatrix}\\ &amp;=\begin{pmatrix}v_{kk}\begin{pmatrix} u_{11} &amp; \cdots &amp; u_{1p}\\ \vdots &amp; \ddots &amp; \vdots \\ u_{p1} &amp; \cdots &amp; u_{pp} \end{pmatrix} &amp; v_{kl}\begin{pmatrix} u_{11} &amp; \cdots &amp; u_{1p}\\ \vdots &amp; \ddots &amp; \vdots \\ u_{p1} &amp; \cdots &amp; u_{pp} \end{pmatrix}\\ v_{lk}\begin{pmatrix} u_{11} &amp; \cdots &amp; u_{1p}\\ \vdots &amp; \ddots &amp; \vdots \\ u_{p1} &amp; \cdots &amp; u_{pp} \end{pmatrix} &amp; v_{ll}\begin{pmatrix} u_{11} &amp; \cdots &amp; u_{1p}\\ \vdots &amp; \ddots &amp; \vdots \\ u_{p1} &amp; \cdots &amp; u_{pp} \end{pmatrix} \end{pmatrix}. \end{aligned}$$

Then, the hypothesis “\(\Sigma \) is separable” is equivalent to “\(\Sigma _{ij}^{\mathrm{r}}\) and \(\Sigma _{kl}^{\mathrm{c}}\) are separable for all \(1\le i<j\le p\) and \(1\le k<l\le q\).” We let \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) and \(\mathcal {H}_{0,kl}^{\mathrm{c}}\) be the separability hypotheses on \(\Sigma _{ij}^{\mathrm{r}}\) and \(\Sigma _{kl}^{\mathrm{c}}\), respectively, for each choice of (ij) and (kl). Below we propose a permutation based procedure to test the individual hypotheses \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) and \(\mathcal {H}_{0,kl}^{\mathrm{c}}\).
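Concretely, \(\Sigma _{ij}^{\mathrm{r}}\) is obtained by picking the two \(q\)-blocks of rows and columns of \(\Sigma \) indexed by \(i\) and \(j\). The short NumPy sketch below (our own; 0-based indices) extracts this sub-matrix and verifies the displayed identity on a randomly generated separable \(\Sigma \).

```python
import numpy as np

def row_submatrix(Sigma, i, j, q):
    """Covariance of X^r_ij = (X_i1, ..., X_iq, X_j1, ..., X_jq),
    extracted from the pq x pq matrix Sigma (0-based indices i, j)."""
    idx = np.r_[i * q:(i + 1) * q, j * q:(j + 1) * q]
    return Sigma[np.ix_(idx, idx)]

# verification on a random separable Sigma = U kron V
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
U = A @ A.T                                   # p = 3
B = rng.standard_normal((4, 4))
V = B @ B.T                                   # q = 4
Sigma = np.kron(U, V)
sub = row_submatrix(Sigma, 0, 2, q=4)         # pair (i, j) = (0, 2)
target = np.kron(U[np.ix_([0, 2], [0, 2])], V)
```

The column-wise sub-matrix \(\Sigma _{kl}^{\mathrm{c}}\) is analogous after exchanging the roles of the row and column indices.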

We consider the test \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) which is the hypothesis on the covariance matrix of the sub-vector

$$\begin{aligned} \mathbf{X}_{ij}^{\mathrm{r}}=\big (X_{i1},X_{i2},\ldots ,X_{iq},X_{j1},X_{j2},\ldots ,X_{jq}\big )^{\mathrm{T}}. \end{aligned}$$

Here, we assume \(u_{ii}=u_{jj}\) (say, both equal to 1); before implementing the procedure below, we standardize the data so that this assumption holds. Under this assumption, \(\mathbf{X}_{ij}^{\mathrm{r}}\) has the covariance matrix

$$\begin{aligned} \Sigma _{ij}^{\mathrm{r}}=\begin{pmatrix}\mathrm{V} &amp; u_{ij}\mathrm{V}\\ u_{ji}\mathrm{V} &amp; \mathrm{V} \end{pmatrix}=\begin{pmatrix}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} &amp; u_{ij}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix}\\ u_{ji}\begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} &amp; \begin{pmatrix} v_{11} &amp; \cdots &amp; v_{1q}\\ \vdots &amp; \ddots &amp; \vdots \\ v_{q1} &amp; \cdots &amp; v_{qq} \end{pmatrix} \end{pmatrix}. \end{aligned}$$
(6)

In addition,

$$\begin{aligned} \mathbf{X}_{ji}^{\mathrm{r}}=\big (X_{j1},X_{j2},\ldots ,X_{jq},X_{i1},X_{i2},\ldots ,X_{iq}\big )^{\mathrm{T}} \end{aligned}$$

has the same covariance matrix as \(\mathbf{X}_{ij}^{\mathrm{r}}\). Thus, if the distribution of \(\mathbf{X}\) is specified solely by its mean and covariance matrix (for example, an elliptical distribution; Anderson 2003), the distributions of \(\mathbf{X}_{ij}^{\mathrm{r}}\) and \(\mathbf{X}_{ji}^{\mathrm{r}}\) are equal whenever they have common means. Details on permutation of multivariate data can be found in Li et al. (2010, 2012) and Klingenberg et al. (2009).

The above allows us to construct a permutation based testing procedure for the sub-hypotheses \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) and also for \(\mathcal {H}_{0,kl}^{\mathrm{c}}\). To be specific, let \(\mathbf{Y}^{h},h=1,2,\ldots ,n,\) be independent copies of \((\mathbf{X}_{ij}^{\mathrm{r}})^\mathrm{T}\) and let the likelihood ratio test (LRT) statistic for \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) using \(\mathbf{Y}^{h}\)’s be \(\mathbf{T}\), that is,

$$\begin{aligned} \mathbf{T}=nq \log \big |\widehat{\mathrm{U}}_{ij}\big |+2n \log \big |\widehat{\mathrm{V}}\big |- n\log \big |\mathbf{S}_{[i,j]} \big |, \end{aligned}$$
(7)

where \(\widehat{\mathrm{U}}_{ij}\) and \(\widehat{\mathrm{V}}\) are the MLEs of \(\mathrm{U}_{ij}=[1,u_{ij};u_{ji}, 1]\) and \(\mathrm{V}\), and \(\mathbf{S}_{[i,j]}\) is the sample covariance matrix obtained from the \(\mathbf{Y}^{h}\)’s. Note that we assume \(p=2\) for the time being, so that \(\mathbf{Y}^{h}=\big (\mathbf{Y}_{1}^{h},\mathbf{Y}_{2}^{h}\big )\), where \(\mathbf{Y}_{1}^{h}=\big (X_{i1}^{h},\ldots ,X_{iq}^{h}\big )\) and \(\mathbf{Y}_{2}^{h}=\big (X_{j1}^{h},\ldots ,X_{jq}^{h}\big )\). Suppose \(\pi =\big (\pi (h),h=1,2,\ldots ,n\big )\) is a vector of i.i.d. random variables taking the values 0 and 1 with probability 1/2 each. The permutation of \(\big \{\mathbf{Y}^{h}=\big (\mathbf{Y}_{1}^{h},\mathbf{Y}_{2}^{h}\big ),h=1,2,\ldots ,n\big \}\) for \(\pi \) is defined as \(\big \{\mathbf{Y}^{h}(\pi )=\big (\mathbf{Y}_{1}^{h}(\pi ),\mathbf{Y}_{2}^{h}(\pi )\big ),h=1,2,\ldots ,n\big \}\), where

$$\begin{aligned} \mathbf{Y}^{h}(\pi )=\big (\mathbf{Y}_{1}^{h}(\pi ),\mathbf{Y}_{2}^{h}(\pi )\big )=\begin{cases} \big (\mathbf{Y}_{1}^{h},\mathbf{Y}_{2}^{h}\big ) &amp; \text{ if } \pi (h)=0,\\ \big (\mathbf{Y}_{2}^{h},\mathbf{Y}_{1}^{h}\big ) &amp; \text{ if } \pi (h)=1. \end{cases} \end{aligned}$$

The permuted LR statistic corresponding to \(\pi \) is then computed as

$$\begin{aligned} \mathbf{T} ({ \pi })=nq \log \big |\widehat{\mathrm{U}}_{ij}( { \pi })\big |+2n \log \big |\widehat{\mathrm{V}}({\pi })\big |- n\log \big |\mathbf{S}_{[i,j]} ({\pi }) \big |, \end{aligned}$$
(8)

where \(\widehat{\mathrm{U}}_{ij}(\pi )\), \(\widehat{\mathrm{V}}(\pi )\), and \(\mathbf{S}_{[i,j]}(\pi )\) are the estimators computed from the permuted samples \(\big \{\mathbf{Y}^{h}(\pi )=\big (\mathbf{Y}_{1}^{h}(\pi ),\mathbf{Y}_{2}^{h}(\pi )\big ),h=1,2,\ldots ,n\big \}\) (each a \(1 \times 2q\) vector) defined above. We approximate the null distribution of \(\mathbf{T}\) by the empirical distribution function of the permuted statistics \(\big \{\mathbf{T}(\pi ),\pi \in \Pi \big \}\), where \(\Pi \) is the collection of all possible permutations, and then evaluate the p value for the hypothesis \(\mathcal {H}_{0,ij}^{\mathrm{r}}\). The permutation algorithm is summarized below. For the column-wise permutation algorithm, we replace “rows” with “columns” and “\(p\)” with “\(q\)”.

[Algorithm box: row-wise permutation procedure]
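A compact Python sketch of this procedure for one pair \((i,j)\) follows. It is our own illustration: the LR statistic recomputes the flip-flop MLE internally, and a modest number of random sign permutations is drawn rather than enumerating all of \(\Pi \).

```python
import numpy as np

def lrt_stat(Y):
    """LR statistic (7) for an (n, 2, q) array Y of paired rows."""
    n, p, q = Y.shape
    Yc = Y - Y.mean(axis=0)
    U = np.eye(p)
    for _ in range(200):                         # flip-flop MLE
        Ui = np.linalg.inv(U)
        V = sum(y.T @ Ui @ y for y in Yc) / (p * n)
        Vi = np.linalg.inv(V)
        U_new = sum(y @ Vi @ y.T for y in Yc) / (q * n)
        done = np.max(np.abs(U_new - U)) < 1e-9
        U = U_new
        if done:
            break
    vY = Yc.reshape(n, p * q)
    S = vY.T @ vY / n                            # sample covariance
    return (n * q * np.linalg.slogdet(U)[1] + n * p * np.linalg.slogdet(V)[1]
            - n * np.linalg.slogdet(S)[1])

def perm_pvalue(Y, n_perm=200, seed=0):
    """Approximate the permutation p value of H0r_{ij} by swapping the
    two q-blocks of each subject independently with probability 1/2."""
    rng = np.random.default_rng(seed)
    t_obs = lrt_stat(Y)
    hits = 0
    for _ in range(n_perm):
        swap = rng.integers(0, 2, size=len(Y)).astype(bool)
        Yp = Y.copy()
        Yp[swap] = Yp[swap][:, ::-1, :]          # (Y1, Y2) -> (Y2, Y1)
        hits += lrt_stat(Yp) >= t_obs
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
Y = rng.standard_normal((40, 2, 5))              # exchangeable null data
pval = perm_pvalue(Y, n_perm=100)
```

The "+1" in numerator and denominator is the usual correction that keeps the estimated p value strictly positive.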

Finally, we do the same procedure for each individual hypothesis \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) and \(\mathcal {H}_{0,kl}^{\mathrm{c}}\) and obtain their p values \(p_{ij}^{\mathrm{r}}\) and \(p_{kl}^{\mathrm{c}}\).

Our next step is to combine the p values of the individual sub-hypotheses into a final decision on the separability of \(\Sigma \), given the overall significance level \(\alpha \). Here, we consider two combining procedures, both based on the Bonferroni correction. The first procedure, named “m-perm”, considers only the sub-hypotheses along the smaller dimension of the data matrix. If p is smaller than q, we consider the sub-hypotheses \(\mathcal {H}_{0,ij}^{\mathrm{r}}\), \(i,j=1,2,\ldots ,p\) with \(i<j\). The \(\left( {\begin{array}{c}p\\ 2\end{array}}\right) \) row-wise pairs of hypotheses yield the p values \(\{p_{12}^{\mathrm{r}},p_{13}^{\mathrm{r}},\ldots ,p_{p-1,p}^{\mathrm{r}}\}\), and the Bonferroni correction compares each individual p value \(p_{ij}^{\mathrm{r}}\) with the adjusted significance level \(\alpha /\left( {\begin{array}{c}p\\ 2\end{array}}\right) \). Equivalently, the combined p value is \(\left( {\begin{array}{c}p\\ 2\end{array}}\right) \min _{i<j}p_{ij}^{\mathrm{r}}\). The second procedure, named “two-s”, combines the row-wise and column-wise Bonferroni procedures through the idea of multi-stage additive testing (Sheng and Qiu 2007). We first test the separability through the row-wise sub-hypotheses (here, \(p<q\) is assumed) using the Bonferroni procedure at level \(\gamma _{1}\). If separability is not rejected at the first stage, we further test the column-wise sub-hypotheses using the Bonferroni procedure at level \(\gamma _{2}\). The p value of the two-stage procedure is \(\left( {\begin{array}{c}p\\ 2\end{array}}\right) \min _{i<j}p_{ij}^{\mathrm{r}}\) if \(\left( {\begin{array}{c}p\\ 2\end{array}}\right) \min _{i<j}p_{ij}^{\mathrm{r}}\le \gamma _{1}\), and \(\gamma _{1}+(1-\gamma _{1})\left( {\begin{array}{c}q\\ 2\end{array}}\right) \min _{k<l}p_{kl}^{\mathrm{c}}\) otherwise.
In this paper, we set the significance levels \(\gamma _{1}\) and \(\gamma _{2}\) to be equal, i.e., \(\gamma =1-\sqrt{1-\alpha }\), following Sheng and Qiu (2007).
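The two combining rules can be sketched as follows (our own illustration; the p values are toy numbers, with the \(\binom{p}{2}\) row-pair and \(\binom{q}{2}\) column-pair permutation p values assumed precomputed).

```python
from math import sqrt

def m_perm_pvalue(p_row):
    """Bonferroni combination along the smaller dimension:
    p_row holds the C(p, 2) row-wise permutation p values."""
    return min(1.0, len(p_row) * min(p_row))

def two_s_pvalue(p_row, p_col, alpha=0.05):
    """Two-stage additive combination (Sheng and Qiu 2007):
    row-wise Bonferroni at level gamma, then column-wise."""
    gamma = 1.0 - sqrt(1.0 - alpha)       # gamma_1 = gamma_2 = gamma
    p1 = len(p_row) * min(p_row)
    if p1 <= gamma:                       # rejected (or not) at stage 1
        return min(1.0, p1)
    return min(1.0, gamma + (1.0 - gamma) * len(p_col) * min(p_col))

# toy example: p = 3 rows -> 3 row pairs; q = 4 columns -> 6 column pairs
p_row = [0.30, 0.12, 0.45]
p_col = [0.60, 0.02, 0.33, 0.50, 0.70, 0.41]
pm = m_perm_pvalue(p_row)                 # 3 * 0.12 = 0.36
p2 = two_s_pvalue(p_row, p_col)           # stage 2 applies here
```

With \(\alpha =0.05\), \(\gamma \approx 0.0253\); since \(3\times 0.12=0.36>\gamma \), the two-stage p value is \(\gamma +(1-\gamma )\times 6\times 0.02\approx 0.142\).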

Despite the additional effort of combining p values, the proposed procedures have at least two advantages over the existing LRT and RST. First, the LRT and RST strongly depend on the normality assumption, which is often not true in practice. In addition, even under normality, the null distributions of the LR and RST statistics are still not fully characterized; the literature resorts to approximate formulas or Monte Carlo approximation. Unlike the existing procedures, our procedures are distribution free and can be used under minimal distributional assumptions. Second, our procedures are applicable to small-sized data. For example, the required sample size for our procedures is \(n>2\min (p,q)\), while the LRT and RST require \(n>pq\) and \(n > \max (p,q)\), respectively.

4 A numerical study

We numerically investigate the sizes and powers of the proposed permutation based procedures and compare their performances to those of the LRT and the RST. Here, we use the linear model as in Mitchell et al. (2006); that is

$$\begin{aligned} \mathbf{X}=\mathbf{M}+\mathbf{E}, \end{aligned}$$
(9)

where \(\mathbf{M}\) is the \(p\times q\) matrix corresponding to the mean vector \(\mu \); \(\mathbf{E}\) is the \(p\times q\) error matrix, whose (ij)-th element is \(e_{ij}\), and let

$$\begin{aligned} \mathbf{vE}=\big (e_{11},e_{12},\ldots ,e_{1q},e_{21},\ldots ,e_{2q},\ldots ,e_{p1},\ldots ,e_{pq}\big )^{\mathrm{T}}. \end{aligned}$$

We consider three distributions for \(\mathbf{vE}\) with mean 0 and covariance matrix \(\Sigma \): (i) the multivariate normal distribution, (ii) the multivariate t-distribution with 5 degrees of freedom, and (iii) the “multivariate” chi-square distribution with 3 degrees of freedom. Unlike the first two, the last distribution is asymmetric; it is generated by

$$\begin{aligned} \mathbf{vE} = \Sigma ^{1/2}(\mathbf{vF} - \mu _F), \end{aligned}$$

where each component of \(\mathbf{vF}\) independently follows a univariate chi-square distribution with 3 degrees of freedom, and the mean vector \(\mu _F\) of \(\mathbf{vF}\) centers the error at zero. It is worth pointing out that these multivariate distributions are characterized by the mean and covariance matrix alone once the degrees of freedom, if relevant, are fixed, which ensures the applicability of our procedures for testing the covariance matrix. We also assume that \(\mathbf{M}\) is the zero matrix for simplicity. Following Mitchell et al. (2006), we set \(p=4\) (the row size), \(q=3,5,10\) (the column size), and \(n=20,25,50,75\) (the number of replicated samples). The covariance matrix is assumed to have the form

$$\begin{aligned} C\Big [(i,t+k),(j,t)\Big ]:=\mathrm{cov}\big (X_{i(t+k)},X_{jt}\big )=\sigma ^{2}\big [\gamma \mathrm{I}(i\ne j)+\mathrm{I}(i=j)\big ]\frac{\rho _{i}^{k}}{1-\rho _{i}\rho _{j}}, \end{aligned}$$
(10)

where \(1\le i,j\le 4\), \(1\le t\le t+k\le q\), and \(\mathrm{I}(A)\) is an indicator function for the event A. For instance, the model (10) with \(p=4,q=3\) is written as

$$\begin{aligned} \Sigma =\begin{pmatrix}\Sigma _{11} &amp; \Sigma _{12} &amp; \Sigma _{13} &amp; \Sigma _{14}\\ \Sigma _{21} &amp; \Sigma _{22} &amp; \Sigma _{23} &amp; \Sigma _{24}\\ \Sigma _{31} &amp; \Sigma _{32} &amp; \Sigma _{33} &amp; \Sigma _{34}\\ \Sigma _{41} &amp; \Sigma _{42} &amp; \Sigma _{43} &amp; \Sigma _{44} \end{pmatrix}, \end{aligned}$$

where each block matrix is defined by

$$\begin{aligned} \Sigma _{ij}=\sigma ^{2}*\gamma ^{\mathrm{I}(i\ne j)}*\frac{1}{1-\rho _{i}\rho _{j}}*\begin{pmatrix}1 &amp; \rho _{j} &amp; \rho _{j}^{2}\\ \rho _{i} &amp; 1 &amp; \rho _{j}\\ \rho _{i}^{2} &amp; \rho _{i} &amp; 1 \end{pmatrix}, \end{aligned}$$
(11)

where \(*\) indicates scalar multiplication of the matrix. In the study below, we set \(\sigma =1\) and \(\gamma =0.7\).
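The simulation design above can be sketched end-to-end as follows. The code is our own; note that the t and chi-square generators are rescaled to unit component variance so that \(\mathrm{cov}(\mathbf{vE})=\Sigma \), a normalization the displays leave implicit.

```python
import numpy as np

def make_sigma(rho, q, sigma2=1.0, gamma=0.7):
    """Covariance (10)-(11): p x p grid of q x q blocks, where block
    (i, j) has entry (s, t) equal to
    sigma2 * gamma^{I(i != j)} * rho_i^{s-t} / (1 - rho_i rho_j) for s >= t
    (and rho_j^{t-s} in place of rho_i^{s-t} for s < t)."""
    p = len(rho)
    Sigma = np.zeros((p * q, p * q))
    for i in range(p):
        for j in range(p):
            c = sigma2 * (gamma if i != j else 1.0) / (1.0 - rho[i] * rho[j])
            B = np.empty((q, q))
            for s in range(q):
                for t in range(q):
                    B[s, t] = rho[i]**(s - t) if s >= t else rho[j]**(t - s)
            Sigma[i*q:(i+1)*q, j*q:(j+1)*q] = c * B
    return Sigma

def draw_errors(n, Sigma, dist="normal", rng=None):
    """Draw n rows vE with mean 0 and covariance Sigma for the three
    designs: normal, t (df = 5), and 'multivariate' chi-square (df = 3)."""
    rng = np.random.default_rng(rng)
    w, Q = np.linalg.eigh(Sigma)                 # symmetric square root
    root = Q @ np.diag(np.sqrt(w)) @ Q.T
    d = Sigma.shape[0]
    if dist == "normal":
        return rng.standard_normal((n, d)) @ root
    if dist == "t":                              # rescaled so cov = Sigma
        z = rng.standard_normal((n, d)) @ root
        g = rng.chisquare(5, size=n) / 5.0
        return np.sqrt(3.0 / 5.0) * z / np.sqrt(g)[:, None]
    if dist == "chisq":                          # standardized chi2(3) mix
        f = (rng.chisquare(3, size=(n, d)) - 3.0) / np.sqrt(6.0)
        return f @ root
    raise ValueError(dist)

Sigma_null = make_sigma([0.6, 0.6, 0.6, 0.6], q=3)   # separable null "N"
Sigma_alt = make_sigma([0.6, 0.65, 0.7, 0.75], q=3)  # alternative "A1"
E = draw_errors(500, Sigma_null, dist="chisq", rng=0)
```

Under the null (all \(\rho _{i}\) equal), the constructed \(\Sigma \) factors exactly as \(\mathrm{U}\otimes \mathrm{V}\) with \(\mathrm{U}\) having unit diagonal and off-diagonal \(\gamma \), which gives a direct check on the construction.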

We first examine the magnitude of the bias in the approximate critical values, namely Mitchell et al. (2006)’s approximation for the LRT and the asymptotic chi-square approximation for the RST, when normality is violated. To do so, we generate samples from the multivariate t-distribution with a separable covariance (\((\rho _{1},\rho _{2},\rho _{3},\rho _{4})=(0.6,0.6,0.6,0.6)\)). We fix the row and column sizes at \(p=4,q=3\) and compute 10,000 LRT and RST statistics for sample sizes \(n=20,25,50,75\) and degrees of freedom \(\mathrm{df}=5,10,30,\infty \) (\(\infty \) corresponds to the multivariate normal distribution). To understand the approximation errors of Mitchell et al. (2006) and Filipiak et al. (2016, 2017), we consider the differences between the empirical 95-th percentile and the approximation (3) for the LRT, and between the empirical 95-th percentile and the asymptotic chi-square quantile for the RST. Figure 1 shows that the approximate critical values of both methods are biased; the magnitude of the bias grows as the non-normality grows (i.e., as the degrees of freedom decrease), and it grows with the sample size when the degrees of freedom are small but shrinks with the sample size when they are large.

Fig. 1 The difference between the true critical value and its approximation from the multivariate t-distribution under separability

Next, we compare the empirical sizes and powers of the permutation based procedures with those of the LRT based on (3) and the RST. To evaluate them, we consider three hypotheses: (i) the null (“N”) hypothesis: \((\rho _{1},\rho _{2},\rho _{3},\rho _{4})=(0.6,0.6,0.6,0.6)\); (ii) the first alternative (“A1”) hypothesis: \((\rho _{1},\rho _{2},\rho _{3},\rho _{4})=(0.6,0.65,0.7,0.75)\); and (iii) the second alternative (“A2”) hypothesis: \((\rho _{1},\rho _{2},\rho _{3},\rho _{4})=(0.9,0.7,0.7,0.45)\). We generate 500 data sets from the model (9) for each combination of \(p\), \(q\), and \(n\). For each data set, we use 2000 permuted samples to calculate the p value. The significance level \(\alpha \) is set to 0.05. The empirical sizes and powers are reported in Table 1.

Table 1 Empirical sizes and powers: the “m-perm” indicates the Bonferroni test along with smaller dimension, “two-s” implies the two-stage additive test having the Bonferroni correction at each stage, “lrt” indicates the LRT under normality given by Mitchell et al. (2006), and “rst” presents the RST
Table 2 The case of \(p=4,q=3\): “p-perm” and “q-perm” denote empirical powers using row-wise and column-wise permutation, respectively

Table 1 first shows that the proposed permutation based procedures work better than both the LRT with the approximation in (3) and the RST in controlling the size at the nominal level 0.05. The sizes of both the LRT with (3) and the RST exceed 0.05 when the underlying distribution has heavier tails (e.g., the t-distribution with smaller degrees of freedom). In addition, when the data come from an asymmetric distribution (e.g., the chi-square distribution), the same type of upward bias occurs in the size.

Second, the performance of the m-perm depends on the permutation direction (row-wise or column-wise) as well as on the true covariance \(\Sigma \). Under both alternatives “A1” and “A2”, the covariance matrix of the column-wise selection (for example, \(\mathbf{X}_{12}^{\mathrm{c}}\)) is conjectured to be less separable than that of the row-wise selection (for example, \(\mathbf{X}_{12}^{\mathrm{r}}\)), and thus the column-wise permutation (the cases \(p>q\)) shows more power than the row-wise permutation. Table 2 reports the empirical powers (for the case \(p=4\) and \(q=3\)) of the Bonferroni procedure (at level \(\alpha \)) based on the row-wise permutation, together with those based on the column-wise permutation. It shows that the Bonferroni test for the sub-hypotheses along the row-wise direction (p-perm) has lower power than the column-wise permutation (q-perm); the p-perm often has lower power than both the size-corrected LRT and RST.

Third, the two-s, our second permutation based procedure, considers both row-wise and column-wise permutations and has higher empirical power than the LRT in all but one case (the normal distribution with \(q=10\), \(n=75\) under A1). Compared to the RST, it tends to have higher power when \((p,q)=(4,3)\) and (4, 5), but lower power when \((p,q)=(4,10)\) (the case where p and q are very unequal). It is interesting that it has higher empirical power than both the LRT and RST even in some cases with data from the normal distribution. We conjecture this is because the sample size n is relatively small compared to the dimension pq of the data. Here, we remark that neither the theoretical reference distribution of the LRT nor that of the RST is available in practice.

Finally, we conclude the section with a short report on the computation, which was done with R ver. 3.4.2 on a PC with an Intel Core i7 3.0 GHz processor. The average CPU time without parallel computing for the case \(p=4, q=5, n=50\) using 2000 permutation steps is 32.04 seconds, with a standard deviation of 1.89 seconds over 100 repetitions.

5 Data examples

5.1 Tooth size data

We now apply our method to testing the covariance separability of the tooth size data, obtained as part of the Korean National Occlusion Study conducted from 1997 to 2005 (Wang et al. 2006; Lee et al. 2007). Among the 15,836 respondents recorded in this dataset, we use the tooth sizes of the 179 young Korean men who passed predefined selection criteria. The observation for each subject is a \(2\times 14\) matrix, where the first row consists of the sizes of the 14 teeth in the maxilla and the second row consists of those in the mandible. We write the size matrix of the h-th subject as

$$\begin{aligned} \mathbf{X}_{h}=\begin{pmatrix}X_{h1}^{\mathrm{X}} &amp; \cdots &amp; X_{h7}^{\mathrm{X}} &amp; X_{h8}^{\mathrm{X}} &amp; \cdots &amp; X_{h14}^{\mathrm{X}}\\ X_{h1}^{\mathrm{N}} &amp; \cdots &amp; X_{h7}^{\mathrm{N}} &amp; X_{h8}^{\mathrm{N}} &amp; \cdots &amp; X_{h14}^{\mathrm{N}} \end{pmatrix}, \end{aligned}$$

where the superscript “\(\mathrm{X}\)” is short for maxilla and “\(\mathrm{N}\)” for mandible. In this section, we are interested in testing whether the covariance matrix of \(\mathbf{X}\) (its vectorized version), say \(\Sigma \), is separable as the Kronecker product of \(\mathrm{U}_{2\times 2}\) (the common covariance matrix between the sizes of the upper and lower teeth at the same location) and \(\mathrm{V}_{14\times 14}\) (the covariance matrix among the sizes of the teeth within the maxilla (or mandible)). In short, we test the hypothesis \(\mathcal {H}_{0}:\Sigma =\mathrm{U}_{2\times 2}\otimes \mathrm{V}_{14\times 14}\).

Before testing the hypothesis on separability, we check the normality assumption on which the existing procedures rely. We apply three popular testing procedures for normality available from the “MVN” R package by Korkmaz et al. (2014): Mardia’s (1970), Henze–Zirkler’s (1990), and Royston’s tests (Shapiro and Wilk 1964; Royston 1983, 1992). All of the results clearly indicate non-normality of the tooth sizes (p value = 0), and the univariate Q–Q plots (Fig. 2) confirm this, especially for “X4R” and “N2R”.
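As a rough illustration of what Mardia’s (1970) test measures, the following sketch (Python/NumPy, for exposition only; the paper uses the MVN R package) computes the multivariate skewness and kurtosis statistics; under multivariate normality, \(n b_{1,p}/6\) is asymptotically chi-square with \(p(p+1)(p+2)/6\) degrees of freedom and the standardized \(b_{2,p}\) is asymptotically standard normal:

```python
import numpy as np

def mardia_statistics(X):
    """Mardia's multivariate skewness b1 and kurtosis b2 for an (n, p) sample.
    Under multivariate normality, n*b1/6 ~ chi-square(p(p+1)(p+2)/6) and
    (b2 - p(p+2)) / sqrt(8p(p+2)/n) ~ N(0, 1), asymptotically."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                    # ML covariance estimate
    D = Xc @ np.linalg.inv(S) @ Xc.T     # n x n matrix of generalized inner products
    b1 = (D ** 3).sum() / n ** 2         # multivariate skewness
    b2 = (np.diag(D) ** 2).sum() / n     # multivariate kurtosis
    return b1, b2

rng = np.random.default_rng(1)
b1_norm, _ = mardia_statistics(rng.standard_normal((2000, 3)))
b1_exp, _ = mardia_statistics(rng.exponential(size=(2000, 3)))
# The right-skewed exponential sample yields a much larger skewness statistic.
```

A heavy-tailed or skewed sample, like the exponential one above, inflates these statistics, which is the behavior the Q–Q plots in Fig. 2 suggest for the tooth sizes.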

Fig. 2

Univariate Q–Q plots for men’s right-side tooth sizes after centering. The vertical and horizontal axes represent the sample and theoretical quantiles, respectively

To test separability, we center the data by subtracting the mean vector. The cross-covariance matrix between the maxillary and mandibular regions is estimated from the unstructured sample covariance matrix \(\mathbf{S}\) and from the maximum likelihood estimator under separability \(\widehat{\mathrm{U}}\otimes \widehat{\mathrm{V}}\), respectively, as:

$$\begin{aligned} \mathbf{S}[1:7,15:21]=\begin{pmatrix} 0.082&{}\quad 0.106 &{}\quad 0.085 &{}\quad 0.067 &{}\quad 0.058 &{}\quad 0.083 &{}\quad 0.107\\ 0.075 &{}\quad 0.104 &{}\quad 0.102 &{}\quad 0.087 &{}\quad 0.091 &{}\quad 0.067 &{}\quad 0.107 \\ 0.050 &{}\quad 0.088 &{}\quad 0.124 &{}\quad 0.103 &{}\quad 0.097 &{}\quad 0.092 &{}\quad 0.113 \\ 0.046 &{}\quad 0.069 &{}\quad 0.086 &{}\quad 0.115 &{}\quad 0.106 &{}\quad 0.104 &{}\quad 0.115 \\ 0.047 &{}\quad 0.060 &{}\quad 0.061 &{}\quad 0.107 &{}\quad 0.118 &{}\quad 0.116 &{}\quad 0.122 \\ 0.061 &{}\quad 0.067 &{}\quad 0.072 &{}\quad 0.106 &{}\quad 0.126 &{}\quad 0.202 &{}\quad 0.180 \\ 0.061 &{}\quad 0.069 &{}\quad 0.089 &{}\quad 0.129 &{}\quad 0.125 &{}\quad 0.197 &{}\quad 0.269 \\ \end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} (\widehat{\mathrm{U}}\otimes \widehat{\mathrm{V}})[1:7,15:21]=\begin{pmatrix} 0.033 &{}\quad 0.017 &{}\quad 0.015 &{}\quad 0.011 &{}\quad 0.010 &{}\quad 0.013 &{}\quad 0.014 \\ 0.017 &{}\quad 0.042 &{}\quad 0.017 &{}\quad 0.016 &{}\quad 0.012 &{}\quad 0.012 &{}\quad 0.017 \\ 0.015 &{}\quad 0.017 &{}\quad 0.033 &{}\quad 0.018 &{}\quad 0.015 &{}\quad 0.014 &{}\quad 0.017 \\ 0.011 &{}\quad 0.016 &{}\quad 0.018 &{}\quad 0.034 &{}\quad 0.022 &{}\quad 0.018 &{}\quad 0.023 \\ 0.010 &{}\quad 0.012 &{}\quad 0.015 &{}\quad 0.022 &{}\quad 0.034 &{}\quad 0.023 &{}\quad 0.028 \\ 0.013 &{}\quad 0.012 &{}\quad 0.014 &{}\quad 0.018 &{}\quad 0.023 &{}\quad 0.082 &{}\quad 0.038 \\ 0.014 &{}\quad 0.017 &{}\quad 0.017 &{}\quad 0.023 &{}\quad 0.028 &{}\quad 0.038 &{}\quad 0.090 \end{pmatrix} \end{aligned}$$

where \(M[a:b, c:d]\) denotes the submatrix of M from the a-th row to the b-th row and from the c-th column to the d-th column, producing a \((b-a+1)\times (d-c+1)\) matrix. From the above, we find a substantial difference between \(\mathbf{S}[1:7,15:21]\) and \((\widehat{\mathrm{U}}\otimes \widehat{\mathrm{V}})[1:7,15:21]\). We approximate the p value using the procedure in Sect. 3; the two-s procedure gives an approximate p value of 0.025, where the number of permuted data sets used in the approximation is 10,000. On the other hand, the LRT statistic in (1) is evaluated as 921.91, while the critical value under normality approximated by Mitchell et al. (2006) is 368.26 at the significance level \(\alpha =0.01\). The RST statistic is evaluated as 741.45, with asymptotic critical value \(\chi ^2_{.01}(299) = 358.81\) and empirical critical value 358.27 at \(\alpha =0.01\).
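The submatrix notation above maps directly onto array slicing. The helper below (an illustrative Python/NumPy sketch with 1-indexed arguments matching the notation in the text; the matrices are random stand-ins, not the tooth-size estimates) extracts \(M[a:b, c:d]\) and measures the discrepancy between two such blocks by the Frobenius norm:

```python
import numpy as np

def submat(M, a, b, c, d):
    """1-indexed submatrix M[a:b, c:d]; returns a (b-a+1) x (d-c+1) array."""
    return M[a - 1:b, c - 1:d]

# Illustrative 28 x 28 covariance-like matrices standing in for S and
# the fitted separable estimate U-hat kron V-hat (28 = 2 * 14).
rng = np.random.default_rng(0)
A = rng.standard_normal((28, 28)); S = A @ A.T / 28
B = rng.standard_normal((28, 28)); F = B @ B.T / 28

block_S = submat(S, 1, 7, 15, 21)   # the 7 x 7 cross-covariance block
block_F = submat(F, 1, 7, 15, 21)
discrepancy = np.linalg.norm(block_S - block_F)   # Frobenius norm of the difference
```

A large discrepancy between the unstructured and fitted blocks, as observed in the data, is what the permutation test quantifies.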

5.2 Corpus callosum thickness

Our second example concerns two-year longitudinal MRI scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). According to Lee et al. (2016), the corpus callosum (CC) thickness profile is calculated from the CC segmentation at equally spaced intervals. Specifically, the CC thicknesses of 135 subjects are measured at 99 points in each year. The separability hypothesis to be tested is \(\mathcal {H}_{0}:\Sigma =\mathrm{U}_{2\times 2}\otimes \mathrm{V}_{99\times 99}\), where we expect \(\mathrm{U}\) and \(\mathrm{V}\) to explain the covariance structure of the CC thickness across the repeated measurements and among the measurement points within a subject, respectively.

The LRT based on normality is not applicable to this dataset for two reasons. First, the multivariate normality tests provided by the MVN R package reveal that the data are not multivariate normal. This can also be observed in Fig. 3, in which each displayed variable has a heavy right tail. Second, the sample size (\(n=135\)) is smaller than the number of measurement points (\(pq=99\times 2=198\)), making the LR statistic undefined.

Fig. 3

Univariate Q–Q plots for the MRI data, where each column is centered

We apply the proposed permutation procedures to the data. More precisely, we apply the permutation test to the sub-hypotheses for all \(\left( {\begin{array}{c}99\\ 2\end{array}}\right) \) column-wise pairs. The permutation test for each sub-hypothesis is for bivariate paired data of size \(135\times 4\); for example, if the column pair (k, l) is chosen, the bivariate paired data are \(\big \{\big (X_{hk}^1,X_{hl}^1,X_{hk}^2,X_{hl}^2 \big ),h=1,2,\ldots ,135\big \}\). We use 100,000 random permutations to evaluate the p value of each sub-hypothesis, and the Bonferroni adjusted p value is given by \(\left( {\begin{array}{c}99\\ 2\end{array}}\right) \min _{k<l}p_{kl}^{\mathrm{c}}\), where \(p_{kl}^{\mathrm{c}}\) is the p value for the sub-hypothesis \(\mathcal {H}_{kl}^{\mathrm{c}}\) of the (k, l)-th column pair. The evaluated p value is less than 0.0001. The RST statistic is evaluated as 16,346.75, with asymptotic and empirical critical values \(\chi ^2_{.01}(14749) = 15{,}151.49\) and 15,095.09, respectively.
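The Bonferroni adjustment above is a one-liner. The sketch below (Python; the minimum raw p value is a hypothetical placeholder, not the value from the analysis) computes the number of column-wise pairs and the adjusted p value, capped at 1:

```python
from math import comb

m = comb(99, 2)              # number of column-wise pairs: 4851 sub-hypotheses
p_min = 2e-8                 # hypothetical smallest raw p value, for illustration
p_adj = min(1.0, m * p_min)  # Bonferroni adjusted p value
```

With 100,000 permutations per sub-test, the smallest attainable raw p value is bounded below, so a reported adjusted p value below 0.0001 indicates that many pairs reject strongly.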

6 Discussion

In this paper, we propose permutation-based procedures to test the separability of covariance matrices. The procedures divide the null hypothesis of a separable covariance matrix into many sub-hypotheses, which are testable via a permutation method. Compared to the existing LRT and RST under normality, the proposed procedures are distribution free and robust to non-normality of the data. In addition, they are applicable to small samples, whose size is smaller than the dimension of the covariance matrix under test. The numerical study and data examples show that the proposed permutation procedures are more powerful when the data are non-normal and the dimension is high.

Theory on permutation procedures has been well developed for linear permutation test statistics (Strasser and Weber 1999; Finos and Salmaso 2005; Pesarin and Salmaso 2010; Bertoluzzo et al. 2013), and its computational tool is publicly available (the R package “coin” by Hothorn et al. 2017). In our procedure, permutation is applied to testing the sub-hypotheses \(\mathcal {H}_{0,ij}^{\mathrm{r}}\) and \(\mathcal {H}_{0,kl}^{\mathrm{c}}\) for each choice of (i, j) and (k, l); the sub-hypotheses concern the separability of the \(\Sigma _{ij}^{\mathrm{r}}\hbox {s}\) and \(\Sigma _{kl}^{\mathrm{c}}\hbox {s}\). Here, we use the LR statistic, one of the few known statistics for testing covariance separability. The LR statistic is non-linear, and thus the CRAN package and the asymptotic results of Strasser and Weber (1999) cannot be directly applied to it. However, we conjecture that, for an appropriately chosen permutation test statistic, we may embed our problem in the existing conditional inference framework and achieve the proven optimality.

Our procedures in this paper use the Bonferroni rule to combine the p values from testing many sub-hypotheses. In our problem, the p values of the individual sub-hypotheses are, by nature, conjectured to be strongly dependent on each other. For this reason, we adopt the Bonferroni rule, which ensures the size of the combined test is at most the nominal level despite its conservativeness. In addition, an additional numerical study not reported here shows that the well-known Fisher’s omnibus and Lipták’s rules (derived under the assumption of independent p values) are severely biased in size for non-normal data. The same reasoning applies to the direct aggregation of the individual LR statistics: the null distribution of the aggregated statistic cannot be specified due to the dependency among the individual statistics, which makes this route difficult to pursue.
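For reference, the two combination rules mentioned above can be written in a few lines. This is a generic Python sketch (standard library only) of Fisher’s omnibus and Lipták’s (Stouffer-type) rules; both derivations assume independent p values, which is the very assumption that fails in our setting:

```python
from math import exp, factorial, log, sqrt
from statistics import NormalDist

def fisher_combine(pvals):
    """Fisher's omnibus: T = -2*sum(log p) ~ chi-square with 2m df under
    independence. For even df the survival function has the closed form
    exp(-t/2) * sum_{k=0}^{m-1} (t/2)^k / k!."""
    m = len(pvals)
    t = -2.0 * sum(log(p) for p in pvals)
    return exp(-t / 2) * sum((t / 2) ** k / factorial(k) for k in range(m))

def liptak_combine(pvals):
    """Liptak's rule: Z = sum(Phi^{-1}(1 - p)) / sqrt(m) ~ N(0, 1) under
    independence; reject for large Z."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / sqrt(len(pvals))
    return 1.0 - nd.cdf(z)
```

Under strong positive dependence among the p values, the effective number of independent tests is smaller than m, so both rules can be anti-conservative, consistent with the size bias reported above.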

We finally conclude the paper with the remark that the procedure of this paper can easily be applied to testing more complexly structured covariance matrices. Suppose we consider repeatedly measured spatial data (on a lattice system) or image data. The observation of a single subject has the form of a three-way array \(\mathbf{X}_{ijk}\), \(i=1,2,\ldots ,a\), \(j=1,2,\ldots ,b\), and \(k=1,2,\ldots ,c\) (with dimension \(a \times b \times c\)), and its separable covariance matrix has the form \(\Sigma =\mathrm{A} \otimes \mathrm{B} \otimes \mathrm{C}\) (here, \(\mathrm{A}\), \(\mathrm{B}\), and \(\mathrm{C}\) are \(a \times a\), \(b \times b\), and \(c \times c\) covariance matrices, respectively). To test the hypothesis \(\Sigma =\mathrm{A} \otimes \mathrm{B} \otimes \mathrm{C}\), we read the data in matrix form as \(\mathbf{Y}_{s,k} = \mathbf{X}_{[ij]k}\) with \(s=1,2,\ldots ,ab\), \(k=1,2,\ldots ,c\), and test the sub-hypothesis \(\mathcal {H}_{0,\{[12],3\}}: \Sigma =\mathrm{U}_1 \otimes \mathrm{C} \Big (=\big ( \mathrm{A} \otimes \mathrm{B} \big ) \otimes \mathrm{C}\Big )\) at the level \(\alpha _1\). We repeat the same procedure for the other two sub-hypotheses \(\mathcal {H}_{0,\{1,[23]\}}: \Sigma = \mathrm{A} \otimes \mathrm{U}_2 \Big (= \mathrm{A} \otimes \big ( \mathrm{B} \otimes \mathrm{C} \big ) \Big )\) and \(\mathcal {H}_{0,\{2,[13]\}}: \Sigma =\mathrm{B} \otimes \mathrm{U}_3 \Big (= \mathrm{B} \otimes \big ( \mathrm{A} \otimes \mathrm{C} \big )\Big )\) at the levels \(\alpha _2\) and \(\alpha _3\), respectively. Finally, we combine the results of the three sub-hypotheses with the Bonferroni or multi-stage additive procedure discussed in Sect. 3.
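The matricization step above is a plain reshape. The sketch below (Python/NumPy, with small hypothetical dimensions) forms \(\mathbf{Y}_{s,k} = \mathbf{X}_{[ij]k}\) and verifies the Kronecker identity \((\mathrm{A}\otimes \mathrm{B})\otimes \mathrm{C} = \mathrm{A}\otimes (\mathrm{B}\otimes \mathrm{C})\) that underlies the first sub-hypothesis:

```python
import numpy as np

a, b, c = 2, 3, 4
rng = np.random.default_rng(0)

# One subject's three-way observation, dimension a x b x c
X = rng.standard_normal((a, b, c))

# Row-major reshape groups the first two indices: Y[s, k] = X[[ij], k],
# with s = i*b + j running over 1, ..., ab in the paper's notation.
Y = X.reshape(a * b, c)

# Hypothetical component covariance matrices (identity plus a constant)
A = np.eye(a) + 0.1 * np.ones((a, a))
B = np.eye(b) + 0.1 * np.ones((b, b))
C = np.eye(c) + 0.1 * np.ones((c, c))

# Associativity of the Kronecker product behind H_{0,{[12],3}}:
assert np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))
```

The other two sub-hypotheses correspond to grouping the indices (j, k) and (i, k) instead, which amounts to permuting the axes of X before reshaping.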