1 Introduction

In this paper, we are interested in the one-sample problem for high-dimensional data. Here “high dimension” means that the data dimension is close to or even much larger than the sample size. High-dimensional data are encountered when many measurements are taken on only a few subjects. For example, in DNA microarray data, thousands of gene expression levels are often measured on relatively few subjects. With the rapid development of data-collecting technologies, high-dimensional data have become rather common and nowadays attract much research effort. Many new methods have been proposed in recent years for high-dimensional hypothesis testing problems about mean vectors or covariance matrices; see, for example, Li et al. (2020), Bai et al. (2021), Zhang et al. (2021), and Silva et al. (2021), among others. The canonical one-sample problem aims to test whether the population mean vector of a sample is the zero vector, and many interesting and more complicated hypotheses can be converted to it by simple transformations, such as those arising in one-group repeated measures designs (Ahmad et al. 2008), in the mean matrix structure of transposable data (Touloumis et al. 2015), in the two-sample problem (Chen and Qin 2010), and in the multi-sample problem (Schott 2007).

The classical solution to the multivariate one-sample problem is Hotelling’s \(T^{2}\) test. However, Hotelling’s \(T^{2}\) test does not apply to high-dimensional data when the data dimension is larger than the sample size because in this case the sample covariance matrix is not invertible. To overcome this problem, many alternative tests have been proposed for the one-sample hypothesis in high-dimensional settings. Srivastava and Du (2008) proposed a scale-invariant test. Park and Ayyala (2013) proposed a leave-one-out scale-invariant test. Wang et al. (2015) proposed a nonparametric one-sample test based on the multivariate spatial sign transformation for elliptically distributed data. Feng and Sun (2016) proposed a scale-invariant nonparametric test based on spatial ranks and inner standardization which can also take the scale differences of variables into account. Other tests include the random-permutation-based test of Shen and Lin (2015), the randomization test of Wang and Xu (2019), the block-diagonal test of Zhao (2017), the diagonal likelihood ratio test of Hu et al. (2019), the sign test of Paindaveine and Verdebout (2016), the composite \(T^{2}\) test of Feng et al. (2017), the shrinkage-based regularization tests of Chen et al. (2011), Shen et al. (2011) and Dong et al. (2016), and the empirical likelihood test of Peng et al. (2014), among others.

Many existing tests, such as those of Srivastava and Du (2008) and Wang et al. (2015), use the normal approximation for their null distributions. However, for most tests, the normal approximation is valid only under very strong conditions on the underlying covariance matrix, as noted by Katayama et al. (2013). One of the key conditions requires that the high-dimensional data be weakly or nearly uncorrelated. To relax the assumptions on the underlying covariance matrix, Zhang and Xu (2009) proposed an \(L^{2}\)-norm one-sample test for normal data based on the two-cumulant (2-c) matched Welch–Satterthwaite \(\chi ^{2}\)-approximation. For one-group normally distributed repeated measures designs, Ahmad et al. (2008) proposed a test with the 2-c matched \(\chi ^{2}\)-approximation, and Pauly et al. (2015) proposed a test with the three-cumulant (3-c) matched \(\chi ^{2}\)-approximation of Zhang (2005).

In this paper, we propose and study a normal reference test with the 3-c matched \(\chi ^{2}\)-approximation for a general one-sample problem with non-normal high-dimensional data. We show that under some regularity conditions, when the null hypothesis is true, the proposed test statistic and a \(\chi ^2\)-type mixture have the same normal or non-normal limiting distributions. It is then justifiable to approximate the null distribution of the test statistic using that of the \(\chi ^2\)-type mixture. The distribution of the \(\chi ^2\)-type mixture, which has both positive and negative unknown coefficients, can be well approximated by the 3-c matched \(\chi ^2\)-approximation with the approximation parameters consistently estimated from the data. Since the \(\chi ^2\)-type mixture is obtained from the test statistic when the null hypothesis holds and the data are normally distributed, the resulting test is termed a normal reference test with the 3-c matched \(\chi ^2\)-approximation.

The proposed test is closely related to the test of Pauly et al. (2015), but the two tests differ in several aspects. First, our test is investigated for general non-normal data, and one-sample tests with other types of data can be reduced to our one-sample test via some simple transformations, while their test is studied only for normally distributed repeated measures designs. Second, the test of Pauly et al. (2015) is based on a nonnegative squared \(L^2\)-norm statistic, and their approximation essentially follows Hall (1983) by matching the third moment of normalized variables; our statistic, on the other hand, is a centered squared \(L^2\)-norm statistic, and our approximation is formulated as in Zhang (2005) for a \(\chi ^2\)-type mixture with both positive and negative coefficients. Third, our approximation parameter estimators are constructed directly without using U-statistics and are ratio-consistent under the null or any alternative hypothesis, while their approximation parameter estimators are constructed using U-statistics, which are often time- and space-consuming, and are ratio-consistent only under the null hypothesis; in practice, one does not know whether the null hypothesis holds. Fourth, the asymptotic power of our test is established, the effect of data non-normality on our test is discussed, and a sufficient and necessary condition is found for the asymptotic normality of our test statistic; none of these issues is discussed in Pauly et al. (2015).

The rest of the paper is organized as follows. Our main results are presented in Sect. 2. A simulation study is presented in Sect. 3. Applications of our test to one-sample problems with other types of data are presented in Sect. 4. Some concluding remarks are given in Sect. 5. The technical proofs of the main results are outlined in the Appendix.

2 Main results

Our study is motivated by a multivariate analysis of variance (MANOVA) problem for dependent samples. Suppose we have n independent, identically distributed (i.i.d.) \(q\times k\) matrix-variate observations \(\varvec{{X}}_{i}=(\varvec{{x}}_{i1},\ldots ,\varvec{{x}}_{ik}),\ i=1,\ldots ,n\). The k columns of the observation matrix \(\varvec{{X}}_{i}\) correspond to matched multivariate observations from k different samples. Unlike the usual MANOVA problem for independent samples, we assume the observations of the k samples are matched, and we allow possible dependence between matched observations from different samples. Moreover, as frequently encountered in many practical problems, such as the analysis of time profiles (Ahmad et al. 2008; Pauly et al. 2015), we allow k (or q) to be large, even proportional to the sample size n. The problem of interest is whether the mean vectors of the k samples are the same, i.e., to test

$$\begin{aligned} H_{0}:{\text {E}}(\varvec{{x}}_{11})=\cdots ={\text {E}}(\varvec{{x}}_{1k}) \text { versus }H_{1}:H_{0}\text { is not true}. \end{aligned}$$
(1)

In this paper, instead of trying to solve the above specific problem directly, we treat it as a special case of the following one-sample problem. Suppose we have one high-dimensional sample:

$$\begin{aligned} \varvec{{y}}_{1},\ldots ,\varvec{{y}}_{n} \text{ are } \text{ i.i.d. } p \text{-dimensional } \text{ random } \text{ vectors }, \end{aligned}$$
(2)

with \({\text {E}}(\varvec{{y}}_1)=\varvec{{\mu }}\) and \({\text {Cov}}(\varvec{{y}}_1)=\varvec{{\varSigma }}\), where the dimension p is large and may be much larger than the sample size n. Consider the following hypotheses:

$$\begin{aligned} H_{0}:\ \ \varvec{{\mu }}=\varvec{{0}}\ \ \text{ versus } \ \ H_{1}:\ \ \varvec{{\mu }}\ne \varvec{{0}}. \end{aligned}$$
(3)

In many situations, one may be interested in testing the hypotheses \(H_0: \varvec{{\mu }}=\varvec{{\mu }}_0\) versus \(H_1: \varvec{{\mu }}\ne \varvec{{\mu }}_0\) for some known constant vector \(\varvec{{\mu }}_0\). This general one-sample problem can be reduced to the one-sample problem (3) based on the induced sample \(\varvec{{y}}_i-\varvec{{\mu }}_0,\ i=1,\ldots ,n\), with \(\varvec{{\mu }}\) replaced by \(\varvec{{\mu }}-\varvec{{\mu }}_0\). To see the connection between hypotheses (3) and (1), let \(\varvec{{P}}=\varvec{{I}}_{k}-k^{-1}\varvec{{J}}_{k}\), where \(\varvec{{I}}_{k}\) is a \(k\times k\) identity matrix and \(\varvec{{J}}_{k}\) is a \(k\times k\) matrix of ones. The hypothesis \(H_{0}\) in (1) is equivalent to \({\text {vec}}[{\text {E}}(\varvec{{X}}_{1})\varvec{{P}}]={\text {E}}[{\text {vec}}(\varvec{{X}}_{1}\varvec{{P}})]=\varvec{{0}}\), where \({\text {vec}}\) denotes the matrix vectorization by column operator. Thus, to test the hypothesis \(H_{0}\) in (1) for the original sample \(\varvec{{X}}_{i},\ i=1,\ldots ,n\), we can simply test the hypothesis \(H_{0}\) in (3) for the induced sample \(\varvec{{y}}_{i}={\text {vec}}(\varvec{{X}}_{i}\varvec{{P}})\), \(i=1,\ldots ,n\).
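In code, this reduction is a one-liner. The following minimal Python sketch (the function name is ours) maps a matrix observation to the induced vector; note that `vec` stacks columns, which corresponds to column-major flattening.

```python
import numpy as np

def vec_centered(X):
    """Map a q x k matrix observation X to y = vec(X P) with
    P = I_k - J_k / k, so that testing (1) reduces to testing (3).
    A minimal sketch; the function name is ours."""
    q, k = X.shape
    P = np.eye(k) - np.ones((k, k)) / k   # column-centering matrix
    return (X @ P).flatten(order="F")     # vec() stacks columns
```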

2.1 Asymptotic null distribution

Let

$$\begin{aligned} \bar{\varvec{{y}}}=n^{-1}\sum _{i=1}^n \varvec{{y}}_i, \; \text{ and } \; \hat{\varvec{{\varSigma }}}=(n-1)^{-1}\sum _{i=1}^n (\varvec{{y}}_i-\bar{\varvec{{y}}})(\varvec{{y}}_i-\bar{\varvec{{y}}})^{\top } \end{aligned}$$
(4)

denote the sample mean vector and covariance matrix, respectively. Inspired by the two-sample test of Bai and Saranadasa (1996), the test statistic for testing the one-sample problem (3) can be constructed as

$$\begin{aligned} T_{n,p}=n\Vert \bar{\varvec{{y}}}\Vert ^{2}-{\text {tr}}(\hat{\varvec{{\varSigma }}}), \end{aligned}$$
(5)

where \(\Vert \cdot \Vert \) denotes the usual \(L^{2}\)-norm of a vector. We can write

$$\begin{aligned} T_{n,p}=T_{n,p,0}+2S_{n,p}+n\Vert \varvec{{\mu }}\Vert ^{2}, \end{aligned}$$
(6)

where

$$\begin{aligned} T_{n,p,0}=n\Vert \bar{\varvec{{y}}}-\varvec{{\mu }}\Vert ^{2}-{\text {tr}}(\hat{\varvec{{\varSigma }}}),\;\;S_{n,p}=n\varvec{{\mu }}^{\top }(\bar{\varvec{{y}}}-\varvec{{\mu }}). \end{aligned}$$
(7)

Note that \(T_{n,p,0}\) has the same distribution as \(T_{n,p}\) under the null hypothesis.

When the sample (2) is normally distributed, it is easy to see that for any given n and p, \(T_{n,p,0}\) has the same distribution as the following \(\chi ^2\)-type mixture

$$\begin{aligned} T_{n,p,0}^*=\sum _{r=1}^p \lambda _{p,r} [A_r-B_r/(n-1)],\; A_r{\mathop {\sim }\limits ^{\text {i.i.d.}}}\chi _1^2,\; B_r{\mathop {\sim }\limits ^{\text {i.i.d.}}}\chi _{n-1}^2, \end{aligned}$$
(8)

where \(\chi _{v}^{2}\) denotes the central chi-square distribution with v degrees of freedom and \(\lambda _{p,r},\ r=1,\ldots , p\), are the eigenvalues of the covariance matrix \(\varvec{{\varSigma }}\). The first three cumulants of \(T_{n,p,0}^*\) are given by \({\text {E}}(T_{n,p,0}^*)=0\),

$$\begin{aligned} {\text {Var}}(T_{n,p,0}^*)=\frac{2n}{n-1}{\text {tr}}(\varvec{{\varSigma }}^2),\;\; \text{ and } {\text {E}}(T_{n,p,0}^{*3})=\frac{8n(n-2)}{(n-1)^{2}}{\text {tr}}(\varvec{{\varSigma }}^3). \end{aligned}$$
(9)
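As a quick numerical sanity check, the mixture (8) and the cumulant formulas (9) can be simulated directly. The following Python sketch (the eigenvalues and Monte Carlo size are arbitrary choices of ours) draws from \(T_{n,p,0}^*\) and compares the empirical moments with (9).

```python
import numpy as np

def simulate_T_star(eigvals, n, n_rep=100_000, seed=0):
    """Draw n_rep realizations of the chi^2-type mixture (8):
    T* = sum_r lambda_r [A_r - B_r/(n-1)], A_r ~ chi^2_1, B_r ~ chi^2_{n-1}."""
    rng = np.random.default_rng(seed)
    p = len(eigvals)
    A = rng.chisquare(1, size=(n_rep, p))
    B = rng.chisquare(n - 1, size=(n_rep, p))
    return (A - B / (n - 1)) @ np.asarray(eigvals)

# Check the cumulant formulas (9) for some hypothetical eigenvalues:
lam = 0.8 ** np.arange(20)
n = 30
T = simulate_T_star(lam, n)
print(T.mean())                                        # ~ 0
print(T.var(), 2*n/(n - 1) * np.sum(lam**2))           # variance in (9)
m3 = ((T - T.mean())**3).mean()
print(m3, 8*n*(n - 2)/(n - 1)**2 * np.sum(lam**3))     # third cumulant in (9)
```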

Now we study the asymptotic properties of \(T_{n,p}\) as both n and p tend to infinity. Although the limiting regime described by this kind of high-dimensional asymptotics is never attained in reality, the high-dimensional properties of \(T_{n,p}\) give a hint about how it behaves in the practical scenario where both the sample size and the data dimension are large, or where the data dimension is comparable to the sample size. More importantly, the limiting behavior of \(T_{n,p}\) provides guidance for properly approximating its null distribution and the p value of the corresponding test when both n and p are large.

Set \(\rho _{p,r}=\lambda _{p,r}/\sqrt{{\text {tr}}(\varvec{{\varSigma }}^2)},\ r=1,\ldots ,p\). The following conditions are convenient for the theoretical study:

  1. C1

    We have \(\varvec{{y}}_{i}=\varvec{{\mu }}+\varvec{{\varGamma }}\varvec{{z}}_{i},\ i=1,\ldots ,n\), where \(\varvec{{\varGamma }}\) is a \(p\times p\) matrix such that \(\varvec{{\varGamma }}\varvec{{\varGamma }}^{\top }=\varvec{{\varSigma }}\) and \(\varvec{{z}}_{i}\)’s are i.i.d. p-vectors with \({\text {E}}(\varvec{{z}}_{i})=\varvec{{0}}\) and \({\text {Cov}}(\varvec{{z}}_{i})=\varvec{{I}}_{p}\), the \(p\times p\) identity matrix.

  2. C2

    We have \({\text {E}}(z_{ir}^{4})=3+\varDelta <\infty \) where \(z_{ir}\) is the r-th component of \(\varvec{{z}}_{i}\), \(\varDelta \) is some constant, and \({\text {E}}(z_{ir_1}^{\alpha _{1}}\cdots z_{ir_q}^{\alpha _{q}})={\text {E}}(z_{ir_1}^{\alpha _1})\cdots {\text {E}}(z_{ir_q}^{\alpha _q})\) for a positive integer q such that \(\sum _{r=1}^q\alpha _r\le 8\) and \(r_1\ne \cdots \ne r_q\).

  3. C3

    We have \(\lim _{p\rightarrow \infty } \rho _{p,r}=\rho _{r},\ r=1,2,\ldots \), uniformly and \(\lim _{p\rightarrow \infty } \sum _{r=1}^p \rho _{p,r}=\sum _{r=1}^{\infty } \rho _r<\infty \).

  4. C4

    As \(n,p\rightarrow \infty \), we have \(p/n^2\longrightarrow 0\).

  5. C5

    As \(p\rightarrow \infty \), we have \(\rho _{p,\max }\rightarrow 0\) where \(\rho _{p,\max }=\max _{r=1}^p \rho _{p,r}\).

Conditions C1 and C2 are also imposed by Bai and Saranadasa (1996) and Chen and Qin (2010), respectively. They specify a factor model for high-dimensional data analysis. Condition C3 is also imposed by Zhang et al. (2020); it ensures the existence of the limits of \(\rho _{p,r}\) as \(p\rightarrow \infty \) and the exchangeability of the limit and summation operations in the expression \(\lim _{p\rightarrow \infty }\sum _{r=1}^p\rho _{p,r}\). Condition C3 implies that \(\sum _{r=q+1}^{p}\rho _{p,r} \longrightarrow \sum _{r=q+1}^{\infty }\rho _{r}\) as \(p\rightarrow \infty \) for any fixed \(q<p\), and \(\sum _{r=q+1}^{\infty }\rho _{r}\longrightarrow 0\) as \(q\rightarrow \infty \). It is used to ensure that the limiting distributions of the normalized versions of \(T_{n,p,0}\) and \(T_{n,p,0}^*\), namely,

$$\begin{aligned} \tilde{T}_{n,p,0}=\frac{T_{n,p,0}}{\sqrt{\frac{2n}{(n-1)}{\text {tr}}(\varvec{{\varSigma }}^2)}}, \text{ and } \tilde{T}_{n,p,0}^*=\frac{T_{n,p,0}^*}{\sqrt{\frac{2n}{(n-1)}{\text {tr}}(\varvec{{\varSigma }}^2)}}, \end{aligned}$$
(10)

are non-normal. Condition C4 is needed by Lemma 1 in the Appendix, which establishes the ratio-consistency of the estimator (20) of \({\text {tr}}(\varvec{{\varSigma }}^3)\); it is also needed by Theorems 4 and 5. This condition is weaker than the condition “\(p/n\longrightarrow c\in (0,\infty )\) as \(n, p\rightarrow \infty \)” imposed by Bai and Saranadasa (1996): it allows \(p/n\longrightarrow \infty \) as \(n, p\rightarrow \infty \) but requires p to diverge at a slower rate than \(n^2\). Condition C5 is also imposed by Bai and Saranadasa (1996) and is used to ensure that the limiting distributions of \(\tilde{T}_{n,p,0}\) and \(\tilde{T}_{n,p,0}^*\) are normal. Conditions C3 and C5 impose two mutually exclusive constraints on the eigenvalues of the covariance matrix \(\varvec{{\varSigma }}\), under which the limiting distributions of \(\tilde{T}_{n,p,0}\) and \(\tilde{T}_{n,p,0}^*\) are non-normal and normal, respectively. Theoretically speaking, when the eigenvalues of \(\varvec{{\varSigma }}\) are of the same order (e.g., under a non-spiked covariance model where no eigenvalue of \(\varvec{{\varSigma }}\) dominates the others), Condition C5 is satisfied, so that \(\tilde{T}_{n,p,0}\) and \(\tilde{T}_{n,p,0}^*\) are asymptotically normally distributed; when the decreasingly ordered eigenvalues of \(\varvec{{\varSigma }}\) tend to 0 quickly (e.g., under a spiked covariance model where a finite number of eigenvalues asymptotically dominate the remaining ones) such that \({\text {tr}}^2(\varvec{{\varSigma }})/{\text {tr}}(\varvec{{\varSigma }}^2)\) tends to a finite limit, Condition C3 is satisfied. Broadly speaking, in real data analysis, Condition C5 is approximately satisfied when the p components of an observation are nearly uncorrelated, and Condition C3 is approximately satisfied when they are moderately or highly correlated. Let \({\mathop {=}\limits ^{d}}\) denote equality in distribution and \({\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\) denote convergence in distribution. We have the following useful theorem.

Theorem 1

  1. (a)

    Under Conditions C1, C2 and C3, as \(n, p\rightarrow \infty \), we have

    $$\begin{aligned} \tilde{T}_{n,p,0}{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\zeta , \quad \text{ and }\quad \tilde{T}_{n,p,0}^*{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\zeta , \end{aligned}$$
    (11)

    where \(\zeta {\mathop {=}\limits ^{d}}\sum _{r=1}^{\infty } \rho _{r} (A_r-1)/\sqrt{2}, \; A_r{\mathop {\sim }\limits ^{\text {i.i.d.}}}\chi _1^2\).

  2. (b)

    Under Conditions C1, C2 and C5, as \(n, p\rightarrow \infty \), we have

    $$\begin{aligned} \tilde{T}_{n,p,0}{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\mathcal {N}(0,1), \quad \text{ and } \quad \tilde{T}_{n,p,0}^*{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\mathcal {N}(0,1). \end{aligned}$$
    (12)

    Then under the conditions of (a) or (b), we always have

    $$\begin{aligned} \sup _{x}|\Pr (T_{n,p,0}\le x)-\Pr (T_{n,p,0}^*\le x )|\longrightarrow 0. \end{aligned}$$
    (13)

For the one-sample test in normally distributed repeated measures designs, a theorem comparable to Theorem 1 was proved by Pauly et al. (2015); however, the authors did not extend it to non-normal repeated measures designs in their paper. Theorem 1 provides a theoretical justification for using the distribution of \(T_{n,p,0}^*\) to approximate that of \(T_{n,p,0}\). Notice that \(T_{n,p,0}^*\) is obtained when the data (2) are normally distributed. Thus, we term the distribution of \(T_{n,p,0}^*\) the normal-reference distribution of \(T_{n,p,0}\).

2.2 Implementation

To implement the proposed test, we approximate the null distribution of \(T_{n,p}\) using that of \(T_{n,p,0}^*\). Unlike the \(L^{2}\)-norm test studied in Zhang and Xu (2009), whose null distribution coincides with that of a \(\chi ^{2}\)-type mixture with only positive coefficients, \(T_{n,p,0}^*\) is distributed as a \(\chi ^{2}\)-type mixture with both positive and negative coefficients. For such a \(\chi ^{2}\)-type mixture, Zhang (2013) showed, with some simulation studies, that the 2-c matched \(\chi ^{2}\)-approximation method (Welch 1947; Satterthwaite 1946; Box 1954) adopted by Zhang and Xu (2009) should not be used to approximate the distribution of \(T_{n,p,0}^*\); rather, the 3-c matched \(\chi ^{2}\)-approximation method of Zhang (2005) should be used.

One obvious advantage of the 3-c matched \(\chi ^{2}\)-approximation for the distribution of \(T_{n,p,0}^*\), over the normal approximation suggested by Bai and Saranadasa (1996) and the 2-c matched \(\chi ^{2}\)-approximation used by Zhang and Xu (2009), is that the former matches the first three cumulants while the latter two match only the first two. It is therefore expected that, in terms of size control, the 3-c matched \(\chi ^{2}\)-approximation is more accurate than the normal approximation and the 2-c matched \(\chi ^{2}\)-approximation. In fact, Zhang (2005) showed, theoretically via an upper bound on the density approximation error and via simulation studies, that the 3-c matched \(\chi ^{2}\)-approximation has much better accuracy than the normal approximation even when the normal approximation is adequate.

By the 3-c matched \(\chi ^{2}\)-approximation method of Zhang (2005), we approximate the distribution of \(T_{n,p,0}^*\) using the distribution of the random variable

$$\begin{aligned} R=\beta _{0}+\beta _{1}\chi _{d}^{2}, \end{aligned}$$
(14)

where the parameters \(\beta _{0},\beta _{1}\) and d are determined via matching the first three cumulants of \(T_{n,p,0}^*\) and R. The first three cumulants of \(T_{n,p,0}^*\) are given in (9) while by (14), the first three cumulants of R are given by \(\beta _{0}+\beta _{1}d,\; 2\beta _{1}^{2}d,\) and \(8\beta _{1}^{3}d\), respectively. Matching the first three cumulants of \(T_{n,p,0}^*\) and R then leads to

$$\begin{aligned} \beta _{0}=-\frac{n{\text {tr}}^{2}(\varvec{{\varSigma }}^{2})}{(n-2){\text {tr}}(\varvec{{\varSigma }}^{3})},\;\;\beta _{1} =\frac{(n-2){\text {tr}}(\varvec{{\varSigma }}^{3})}{(n-1){\text {tr}}(\varvec{{\varSigma }}^{2})},\;\; d=\frac{n(n-1)}{(n-2)^{2}} \frac{{\text {tr}}^{3}(\varvec{{\varSigma }}^{2})}{{\text {tr}}^{2}(\varvec{{\varSigma }}^{3})}. \end{aligned}$$
(15)

The parameter d is usually called the approximate degrees of freedom of the 3-c matched \(\chi ^{2}\)-approximation to \(T_{n,p,0}^*\). Note that since \(\varvec{{\varSigma }}\) is always nonnegative definite, we always have \(\beta _0<0\), \(\beta _1>0\), and \(d>0\). This is reasonable since \(T_{n,p,0}^*\) is a \(\chi ^2\)-type mixture with both positive and negative coefficients. In terms of d defined above, the skewness of \(T_{n,p,0}^*\) is given by

$$\begin{aligned} {\text {E}}(T_{n,p,0}^{*3})/{\text {Var}}^{3/2}(T_{n,p,0}^*)=\left( 8/d\right) ^{1/2}. \end{aligned}$$
(16)

To implement the proposed test in real data analysis, we need to estimate \({\text {tr}}(\varvec{{\varSigma }}^{2})\) and \({\text {tr}}(\varvec{{\varSigma }}^{3})\) consistently. Let their ratio-consistent estimators be denoted by \(\widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}\) and \(\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}\), respectively. Then the ratio-consistent estimators of \(\beta _{0},\beta _{1}\) and d are given by

$$\begin{aligned} \hat{\beta }_{0}=-\frac{n[\widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}]^{2}}{(n-2)\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}},\; \hat{\beta }_{1}=\frac{(n-2)\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}}{(n-1)\widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}},\; \hat{d}=\frac{n(n-1)}{(n-2)^{2}}\frac{[\widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}]^{3}}{[\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}]^{2}}. \end{aligned}$$
(17)

For any nominal significance level \(\alpha >0\), let \(\chi _{v}^{2}(\alpha )\) denote the upper \(100\alpha \) percentile of \(\chi _{v}^{2}\). Then, by (17), the proposed test for the one-sample problem (3) using \(T_{n,p}\) with the 3-c matched \(\chi ^{2}\)-approximation is conducted using the approximate critical value \(\hat{\beta }_{0}+\hat{\beta }_{1}\chi _{\hat{d}}^{2}(\alpha )\) or the approximate p value \(\Pr \left[ \chi _{\hat{d}}^{2}\ge (T_{n,p}-\hat{\beta }_{0})/\hat{\beta }_{1}\right] \).

In practice, one often uses the following normalized version of \(T_{n,p}\):

$$\begin{aligned} \tilde{T}_{n,p}=\frac{T_{n,p}}{\sqrt{\frac{2n}{n-1} \widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}}}. \end{aligned}$$
(18)

Then approximating the distribution of \(T_{n,p}\) by that of \(\hat{\beta }_0+\hat{\beta }_1\chi _{\hat{d}}^2\) is equivalent to approximating the distribution of \(\tilde{T}_{n,p}\) by that of \((\chi _{\hat{d}}^2-\hat{d})/\sqrt{2\hat{d}}\). In this case, the proposed test for the one-sample problem (3) using \(\tilde{T}_{n,p}\) with the 3-c matched \(\chi ^{2}\)-approximation can also be conducted using the approximate critical value \([\chi _{\hat{d}}^{2}(\alpha )-\hat{d}]/\sqrt{2\hat{d}}\) or the approximate p value \(\Pr \left( \chi _{\hat{d}}^{2}\ge \hat{d}+\sqrt{2\hat{d}}\tilde{T}_{n,p}\right) \).

We now consider the ratio-consistent estimators of \({\text {tr}}(\varvec{{\varSigma }}^2)\) and \({\text {tr}}(\varvec{{\varSigma }}^3)\). By Lemma S.3 of Zhang et al. (2020), a ratio-consistent estimator of \({\text {tr}}(\varvec{{\varSigma }}^{2})\) is given by

$$\begin{aligned} \widehat{{\text {tr}}(\varvec{{\varSigma }}^{2})}=\frac{(n-1)^{2}}{(n-2)(n+1)}\left[ {\text {tr}}(\hat{\varvec{{\varSigma }}}^{2})-\frac{{\text {tr}}^{2}(\hat{\varvec{{\varSigma }}})}{n-1}\right] , \end{aligned}$$
(19)

where \(\hat{\varvec{{\varSigma }}}\) is the sample covariance estimator of \(\varvec{{\varSigma }}\) as given in (4). When the data (2) are normally distributed, we have \(\hat{\varvec{{\varSigma }}}\sim W_p(n-1,\varvec{{\varSigma }}/(n-1))\), a Wishart distribution with \(n-1\) degrees of freedom and covariance matrix \(\varvec{{\varSigma }}/(n-1)\). Then under Condition C4, by Lemma 1 given in the Appendix, an unbiased and ratio-consistent estimator of \({\text {tr}}(\varvec{{\varSigma }}^{3})\) is given by

$$\begin{aligned} \widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}=\frac{(n-1)^{4}}{(n^2+n-6)(n^2-2n-3)}\left[ {\text {tr}}(\hat{\varvec{{\varSigma }}}^{3})-\frac{3{\text {tr}}(\hat{\varvec{{\varSigma }}}){\text {tr}}(\hat{\varvec{{\varSigma }}}^{2})}{(n-1)}+\frac{2{\text {tr}}^{3}(\hat{\varvec{{\varSigma }}})}{(n-1)^{2}}\right] . \end{aligned}$$
(20)

We conjecture that when Conditions C1, C2 and C4 are satisfied, \(\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}\) is also ratio-consistent for \({\text {tr}}(\varvec{{\varSigma }}^{3})\) for non-normal data. This is partially confirmed by the simulation results presented in Sect. 3 and in the Supplementary Material, where the proposed test works well in terms of size control regardless of whether the data are nearly uncorrelated, moderately correlated or highly correlated, and regardless of whether the data are normally or non-normally distributed. A theoretical justification of the ratio-consistency of \(\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}\) without the normality assumption, like the one given in Lemma 1 for normal data, is theoretically interesting and mathematically possible, but is expected to be rather laborious because the evaluation of the mean and variance of \(\widehat{{\text {tr}}(\varvec{{\varSigma }}^{3})}\) for non-normal data is much more involved than in the proof of Lemma 1 for normal data. Further research in this direction is interesting and warranted. It is worth mentioning that a U-statistic-based estimator of \({\text {tr}}(\varvec{{\varSigma }}^3)\) is given in Theorem 8.2 of Pauly et al. (2015). However, that estimator is often time-consuming to compute, especially when both n and p are large; moreover, its ratio-consistency is proved only under the null hypothesis and the normality assumption.
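To make the implementation concrete, the following self-contained Python sketch assembles the statistic (5), the plug-in estimators (19) and (20), the parameter estimators (17), and the approximate p value. Function and variable names are our own choices; this is a sketch of the procedure described above, not the authors’ software.

```python
import numpy as np
from scipy import stats

def nr_one_sample_test(Y):
    """Normal reference one-sample test of H0: mu = 0 with the 3-c matched
    chi^2-approximation, assembling (5), (17), (19) and (20).
    Y is an (n x p) data matrix; a sketch with hypothetical names."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)                  # hat{Sigma} in (4)

    T = n * np.sum(ybar**2) - np.trace(S)        # test statistic (5)

    tr1 = np.trace(S)
    S2 = S @ S
    tr2 = np.trace(S2)                           # tr(hat{Sigma}^2)
    tr3 = np.trace(S2 @ S)                       # tr(hat{Sigma}^3)

    # Ratio-consistent trace estimators (19) and (20)
    trS2 = (n - 1)**2 / ((n - 2) * (n + 1)) * (tr2 - tr1**2 / (n - 1))
    trS3 = (n - 1)**4 / ((n**2 + n - 6) * (n**2 - 2*n - 3)) * (
        tr3 - 3 * tr1 * tr2 / (n - 1) + 2 * tr1**3 / (n - 1)**2)

    # Parameter estimators (17)
    beta0 = -n * trS2**2 / ((n - 2) * trS3)
    beta1 = (n - 2) * trS3 / ((n - 1) * trS2)
    d = n * (n - 1) / (n - 2)**2 * trS2**3 / trS3**2

    # Approximate p value: P[chi^2_d >= (T - beta0) / beta1]
    pval = stats.chi2.sf((T - beta0) / beta1, df=d)
    return T, d, pval
```

For instance, `nr_one_sample_test(Y - mu0)` would test \(H_0: \varvec{{\mu }}=\varvec{{\mu }}_0\) via the reduction described at the beginning of Sect. 2.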

2.3 Asymptotic power

In this subsection, we investigate the asymptotic power of \(T_{n,p}\). By (6), we have the expansion \(T_{n,p}{\mathop {=}\limits ^{d}}T_{n,p,0}+2S_{n,p}+n\Vert \varvec{{\mu }}\Vert ^{2}\), where \(T_{n,p,0}\) has the same distribution as \(T_{n,p}\) under the null hypothesis and \({\text {Var}}(S_{n,p})=n\varvec{{\mu }}^{\top }\varvec{{\varSigma }}\varvec{{\mu }}\). Following Bai and Saranadasa (1996), let us consider the power of \(T_{n,p}\) under the following local alternative:

$$\begin{aligned} \text{ as } {n,p\rightarrow \infty },\;\;n\varvec{{\mu }}^{\top } \varvec{{\varSigma }}\varvec{{\mu }}=o[{\text {tr}}(\varvec{{\varSigma }}^{2})]. \end{aligned}$$
(21)

This is the case when \({\text {Var}}(S_{n,p})=o[{\text {Var}}(T_{n,p,0})]\) so that \(T_{n,p}=T_{n,p,0}+n\Vert \varvec{{\mu }}\Vert ^{2}+o_{p}\left[ \sqrt{{\text {Var}}(T_{n,p,0})}\right] \) since \({\text {E}}(S_{n,p})=0\).

Theorem 2

Assume that \(\hat{\beta }_0, \hat{\beta }_1\) and \(\hat{d}\) are the ratio-consistent estimators of \(\beta _0,\beta _1\) and d as \(n,p\rightarrow \infty \), respectively. Then, (a) Under Conditions C1, C2, C3, and the local alternative (21), as \(n,p\rightarrow \infty \), we have

$$\begin{aligned} \Pr \left[ T_{n,p}>\hat{\beta }_0+\hat{\beta }_1\chi _{\hat{d}}^{2}(\alpha )\right] =\Pr \left[ \zeta \ge \frac{\chi _{d}^{2}(\alpha )-d}{\sqrt{2d}}-\frac{n\Vert \varvec{{\mu }}\Vert ^2}{\sqrt{2{\text {tr}}(\varvec{{\varSigma }}^2)}}\right] [1+o(1)], \end{aligned}$$

where \(\zeta \) is defined in Theorem 1(a).

(b) Under Conditions C1, C2, C4, C5 and the local alternative (21), as \(n,p\rightarrow \infty \), we have

$$\begin{aligned} \Pr \left[ T_{n,p}>\hat{\beta }_0+\hat{\beta }_1\chi _{\hat{d}}^{2}(\alpha )\right] =\varPhi \left[ -z_{\alpha }+\frac{n\Vert \varvec{{\mu }}\Vert ^2}{\sqrt{2{\text {tr}}(\varvec{{\varSigma }}^2)}}\right] [1+o(1)], \end{aligned}$$

where \(z_{\alpha }\) denotes the upper \(100\alpha \)-percentile of \(\mathcal {N}(0,1)\) and \(\varPhi (\cdot )\) denotes the cumulative distribution function of \(\mathcal {N}(0,1)\).

For any \(d\ge 1\) and small \(\alpha \), it is easy to check that we always have \(z_{\alpha }< [\chi _d^2(\alpha )-d]/\sqrt{2d}\). This shows that under Conditions C1–C3 and the local alternative (21), the asymptotic size and power of the proposed test with the normal approximation are expected to be “artificially” larger than those of the proposed test with the 3-c matched \(\chi ^2\)-approximation. This is consistent with what we observe from the simulation results presented in Sect. 3.
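The inequality \(z_{\alpha }< [\chi _d^2(\alpha )-d]/\sqrt{2d}\) is also easy to verify numerically; a minimal Python sketch for \(\alpha =0.05\) and a few values of d:

```python
import numpy as np
from scipy import stats

# Check z_alpha < [chi^2_d(alpha) - d] / sqrt(2 d) for alpha = 0.05:
alpha = 0.05
z_alpha = stats.norm.isf(alpha)          # upper 100*alpha percentile of N(0,1)
for d in (1, 2, 5, 10, 50, 200):
    q = (stats.chi2.isf(alpha, df=d) - d) / np.sqrt(2 * d)
    print(d, round(q, 3), z_alpha < q)   # True in every case
```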

2.4 Effect of data non-normality

The validity of the proposed normal reference test is guaranteed by Theorem 1. In this subsection, we further investigate the effect of data non-normality on the proposed test: how does data non-normality affect its performance? To answer this question, we study how to approximate the distribution of \(T_{n,p,0}\) directly using the 3-c matched \(\chi ^2\)-approximation. To this end, we compute the first three cumulants of \(T_{n,p,0}\) in the following theorem.

Theorem 3

The first three cumulants of \(T_{n,p,0}\) are given by \({\text {E}}(T_{n,p,0})=0\),

$$\begin{aligned} \quad {\text {Var}}(T_{n,p,0})=\frac{2n}{n-1}{\text {tr}}(\varvec{{\varSigma }}^{2}),\;\text{ and }\; {\text {E}}(T_{n,p,0}^3)=\frac{8n(n-2)}{(n-1)^{2}}{\text {tr}}(\varvec{{\varSigma }}^{3})+\frac{4n\varUpsilon }{(n-1)^{2}}, \end{aligned}$$

where \(\varUpsilon ={\text {E}}[(\varvec{{y}}_{1}-\varvec{{\mu }})^{\top }(\varvec{{y}}_{2}-\varvec{{\mu }})]^{3}\).

It is seen from Theorem 3 that data non-normality affects only the third moment of \(T_{n,p,0}\). To approximate the distribution of \(T_{n,p,0}\) directly using that of \(W=b_0+b_1\chi _f^2\) via matching the first three cumulants of \(T_{n,p,0}\) and W, the parameters \(b_{0},b_{1}\) and f are obtained as

$$\begin{aligned} b_{0}=\beta _0/\delta ,\;\;b_1=\beta _1\delta ,\; \text{ and } \;f=d/\delta ^2,\; \text{ where } \delta =1+\varUpsilon /[2(n-2){\text {tr}}(\varvec{{\varSigma }}^{3})], \end{aligned}$$
(22)

and \(\beta _0, \beta _1\) and d are given in (15). Note that the skewness of \(T_{n,p,0}\) is given by

$$\begin{aligned} {\text {E}}(T_{n,p,0}^{3})/{\text {Var}}^{3/2}(T_{n,p,0})=\left( 8/f\right) ^{1/2}. \end{aligned}$$
(23)

The quantity \(\varUpsilon \) can be seen as a non-invariant measure of multivariate normality based on skewness (see, e.g., Sect. 3.1 of Henze 2002). When the data (2) are normal, it is easy to show that \(\varUpsilon =0\), so that \(\delta =1\), \(b_0=\beta _0\), \(b_1=\beta _1\), \(f=d\), and the skewness (23) of \(T_{n,p,0}\) reduces to the skewness (16) of \(T_{n,p,0}^*\), as expected. However, when the data (2) are non-normal, we may not have \(\varUpsilon =0\), and hence the approximation parameters \(b_0, b_1\), f and the skewness of \(T_{n,p,0}\) are all affected by the data non-normality. Fortunately, we can show the following result.

Theorem 4

(a) Under Conditions C1 and C2, we have \(\varUpsilon \le (\varDelta ^2+6\varDelta +9)^{3/4}{\text {tr}}^{3/2}(\varvec{{\varSigma }}^2)\) where \(\varDelta \) is given in Condition C2; and (b) Under either Conditions C1, C2 and C3 or Conditions C1, C2 and C4, we have \(\delta =1+o(1)\) as \(n,p\rightarrow \infty \).

Theorem 4 says that under Conditions C1, C2 and C3, or Conditions C1, C2 and C4, the effect of data non-normality on the proposed normal reference test is asymptotically ignorable, so that \(b_0=\beta _0[1+o(1)]\), \(b_1=\beta _1[1+o(1)]\), \(f=d[1+o(1)]\), and the skewness of \(T_{n,p,0}\) and that of \(T_{n,p,0}^*\) are asymptotically equal. The following theorem gives a sufficient and necessary condition for the asymptotic normality of \(\tilde{T}_{n,p,0}\).

Theorem 5

Under Conditions C1, C2 and C4, as \(n,p\rightarrow \infty \), \(\tilde{T}_{n,p,0}{\mathop {\longrightarrow }\limits ^{\mathcal {L}}}\mathcal {N}(0,1) \) if and only if \(d\longrightarrow \infty \) where d is given in (15).

Theorem 5 indicates that when d is small, the normal approximation to the distribution of \(\tilde{T}_{n,p,0}\) is unlikely to be adequate.

3 Simulation study

In this section, we conduct a simulation study to compare the proposed normal reference test with the 3-c matched \(\chi ^{2}\)-approximation (denoted as \(T_{new}\)) against the \(L^{2}\)-norm based test with the 2-c matched \(\chi ^{2}\)-approximation proposed by Zhang and Xu (2009) (denoted as \(T_{ZX}\)), and the tests proposed by Bai and Saranadasa (1996), Chen and Qin (2010) and Srivastava and Du (2008) (denoted as \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\), respectively). The original \(T_{BS}\) and \(T_{CQ}\) are two-sample tests, and the corresponding one-sample versions adopted here are given by (1.2) and (1.5) of Zhou et al. (2019), respectively. Note that the null distributions of \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) are all approximated using the normal approximation.

In each run, we generate the high-dimensional data (2) using \(\varvec{{y}}_{i}=\varvec{{\mu }}+\varvec{{\varSigma }}^{1/2}\varvec{{z}}_{i},\ i=1,\ldots ,n\), where \(\varvec{{\mu }}=\delta \varvec{{h}}\) and the components of \(\varvec{{z}}_{i}\) are generated i.i.d. from the following three models:

  • Model 1: \(z_{ir},\ r=1,\ldots ,p{\mathop {\sim }\limits ^{\text {i.i.d.}}}\mathcal {N}(0,1)\).

  • Model 2: \(z_{ir}=w_{ir}/\sqrt{2},\ r=1,\ldots ,p\) with \(w_{ir},\ r=1,\ldots ,p{\mathop {\sim }\limits ^{\text {i.i.d.}}}{\mathrm{t}}_{4}\).

  • Model 3: \(z_{ir}=(w_{ir}-1)/\sqrt{2},\ r=1,\ldots ,p\) with \(w_{ir},\ r=1,\ldots ,p{\mathop {\sim }\limits ^{\text {i.i.d.}}}\chi _{1}^{2}\).

Under the above three models, the resulting data are normal, symmetric but non-normal, and skewed and non-normal, respectively. The covariance matrix is specified as \(\varvec{{\varSigma }}=\sigma ^{2}\left[ (1-\rho )\varvec{{I}}_{p}+\rho \varvec{{J}}_{p}\right] \). Some additional simulation results with different covariance structures are presented in the Supplementary Material; the conclusions are similar to those presented in this section.

Note that the tuning parameters \(\delta \) and \(\varvec{{h}}\) control the mean vector, while \(\rho \) controls the data correlation. Note also that the power of a test increases with \(\delta \), and the data correlation increases with \(\rho \). For simplicity and without loss of generality, we set \(\varvec{{h}}=\varvec{{u}}/\Vert \varvec{{u}}\Vert \) with \(\varvec{{u}}=(1,\ldots ,p)^{\top }\) and set \(\sigma ^{2}=1\). To compare the performance of the tests under various settings, we consider three dimensions with \(p=50,500,1000\), three sample sizes with \(n=30,60,120\), and three levels of data correlation with \(\rho =0.1,0.5\) and 0.9.
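The following Python sketch of this data-generating scheme (function and argument names are ours) exploits the closed-form square root of the compound-symmetry matrix \(\varvec{{\varSigma }}=\sigma ^2[(1-\rho )\varvec{{I}}_p+\rho \varvec{{J}}_p]\), which avoids forming and factorizing a \(p\times p\) matrix.

```python
import numpy as np

def generate_sample(n, p, rho, delta, model=1, sigma2=1.0, rng=None):
    """Generate y_i = mu + Sigma^{1/2} z_i under Models 1-3 with
    Sigma = sigma^2 [(1-rho) I_p + rho J_p]; a sketch of the design
    in this section, with hypothetical function/argument names."""
    rng = rng or np.random.default_rng()
    if model == 1:
        Z = rng.standard_normal((n, p))                       # Model 1: N(0,1)
    elif model == 2:
        Z = rng.standard_t(4, size=(n, p)) / np.sqrt(2)       # Model 2: t_4 / sqrt(2)
    else:
        Z = (rng.chisquare(1, size=(n, p)) - 1) / np.sqrt(2)  # Model 3: centered chi^2_1
    u = np.arange(1, p + 1.0)
    mu = delta * u / np.linalg.norm(u)                        # mu = delta * h
    # Sigma^{1/2} = sqrt(a) I_p + [(sqrt(a + p b) - sqrt(a)) / p] J_p,
    # with a = sigma^2 (1 - rho) and b = sigma^2 rho.
    a, b = sigma2 * (1 - rho), sigma2 * rho
    rowsum = Z.sum(axis=1, keepdims=True)                     # Z J_p has equal columns
    return mu + np.sqrt(a) * Z + (np.sqrt(a + p * b) - np.sqrt(a)) / p * rowsum
```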

In the simulations, the empirical size and power of a test are calculated as the proportion of rejections (i.e., the proportion of runs in which the p value of the associated test is smaller than the nominal level \(\alpha =5\%\)) out of 10,000 runs. The empirical sizes are calculated with \(\delta =0\) so that the null hypothesis \(H_0\) in (3) is true, and the empirical powers are calculated with \(\delta >0\). Different values of \(\delta \) (see Table 2) are carefully selected for different combinations of n and p so that all the tests largely have non-trivial powers for \(\rho =0.1, 0.5\) and 0.9. To assess the performance of a test in maintaining the type I error, we define the average relative error as \(\text{ ARE }=100M^{-1}\sum _{j=1}^{M}|\hat{\alpha }_{j}-\alpha |/\alpha \), where \(\alpha \) is the nominal size (\(5\%\) here) and \(\hat{\alpha }_{j},\ j=1,\ldots ,M\), denote the empirical sizes under consideration. A smaller ARE value indicates an overall better performance of the associated test in terms of maintaining the nominal size.

Table 1 Empirical sizes (in \(\%\)) of the tests under various settings
Table 2 Empirical powers (in \(\%\)) of the tests under various settings
Table 3 Estimated approximate degrees of freedom of \(T_{new}\) and \(T_{ZX}\) under various settings

Table 1 displays the empirical sizes of the tests under various settings, with the last row presenting the ARE values of the tests for the three values of \(\rho \). It is seen that under each setting, the empirical size of \(T_{new}\) is generally much closer to \(5\%\) than those of the other tests, showing that in terms of size control, our new test significantly outperforms the others. This conclusion is also supported by the ARE values: from the last row of the table, the ARE values of \(T_{new}\) are much smaller than those of the other tests for \(\rho =0.1, 0.5\) and 0.9. From Table 1, we also see that in terms of size control, (a) \(T_{ZX}\) generally outperforms \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\); (b) \(T_{BS}\) and \(T_{CQ}\) are generally comparable, and they are generally very liberal, with most of their empirical sizes close to \(7\%\); and (c) \(T_{SD}\) performs quite well under Models 1 and 2 for \(\rho =0.1\) but is very conservative for \(\rho =0.5\) and 0.9, with most of its empirical sizes much smaller than \(5\%\). This implies that \(T_{SD}\) cannot work well for highly skewed or highly correlated high-dimensional data.

Table 2 displays the empirical powers of the tests under various settings. First of all, it is seen that \(T_{new}\) and \(T_{ZX}\) have comparable empirical powers, with those of \(T_{ZX}\) slightly larger than those of \(T_{new}\). This is possibly because, as shown in Table 1, the empirical sizes of \(T_{ZX}\) are generally larger than those of \(T_{new}\); this observation is consistent with the conclusion drawn from Theorem 2. Second, \(T_{BS}\) and \(T_{CQ}\) have comparable empirical powers which are slightly larger than those of \(T_{new}\) and \(T_{ZX}\); again, this is because the former tests generally have larger empirical sizes than the latter. Third, \(T_{SD}\) has empirical powers comparable with the other tests when \(\rho =0.1\) under Models 1 and 2, but lower empirical powers when \(\rho =0.5,0.9\) or under Model 3. This again shows that \(T_{SD}\) does not work well for highly skewed or highly correlated high-dimensional data. Finally, under the various settings, the empirical powers of all the tests decrease as \(\rho \) increases. This is reasonable since the data variation increases with \(\rho \).

Table 3 displays the estimated approximate degrees of freedom of \(T_{new}\) and \(T_{ZX}\) under various settings. First of all, it is seen that under the same setting, the estimated approximate degrees of freedom of \(T_{new}\) is smaller than that of \(T_{ZX}\) in most cases. Secondly, it is seen that as \(\rho \) increases, the estimated approximate degrees of freedom of \(T_{new}\) and \(T_{ZX}\) become smaller, showing that the normal approximation becomes less adequate as the data correlation increases. This explains why, in terms of size control, \(T_{BS}\) and \(T_{CQ}\) perform worse as the data correlation increases.

In summary, the simulation results presented in this section show that in terms of size control, \(T_{new}\) outperforms the other tests significantly; \(T_{ZX}\) outperforms \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\); \(T_{BS}\) and \(T_{CQ}\) are generally comparable and generally liberal; and \(T_{SD}\) performs well for symmetric and weakly correlated high-dimensional data but is very conservative when the high-dimensional data are highly skewed or highly correlated.

4 Some interesting applications

4.1 Paired two-sample problem

One important application of the one-sample test considered in this paper is testing the mean difference of two paired samples. Suppose we have n i.i.d. paired observations \((\varvec{{x}}_{11},\varvec{{x}}_{12}),\ldots ,(\varvec{{x}}_{n1},\varvec{{x}}_{n2})\). We are interested in testing the following hypotheses:

$$\begin{aligned} H_{0}:\ {\text {E}}(\varvec{{x}}_{11})={\text {E}}(\varvec{{x}}_{12}), \text{ versus } H_{1}:\ {\text {E}}(\varvec{{x}}_{11})\ne {\text {E}}(\varvec{{x}}_{12}). \end{aligned}$$
(24)

Testing (24) is equivalent to testing (3) based on the induced i.i.d. sample \(\varvec{{y}}_{i}=\varvec{{x}}_{i1}-\varvec{{x}}_{i2}\), \(i=1,\ldots ,n\), with \(\varvec{{\mu }}={\text {E}}(\varvec{{y}}_1)\). Therefore, the one-sample tests discussed previously can be used to test the hypotheses (24).
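In code, the reduction is immediate; a minimal sketch (the array and function names are ours):

```python
import numpy as np

def paired_differences(X1, X2):
    """Paired two-sample reduction: the induced one-sample data for
    testing (24) are the row-wise differences y_i = x_{i1} - x_{i2}.
    X1 and X2 are matched (n x p) arrays of paired observations."""
    return np.asarray(X1) - np.asarray(X2)
```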

As a real data example, we consider the colon dataset provided by Alon et al. (1999). The colon dataset contains 22 normal colon tissues and 40 tumor colon tissues from 40 colon-cancer patients, with each observation consisting of 2000 gene expression levels. It is of interest to check whether the mean gene expression levels of the normal and tumor colon tissues are the same. For simplicity, we remove the unpaired colon tissues and keep only the \(n=22\) paired colon tissues.

As an application, we apply the tests \(T_{new}\), \(T_{ZX}\), \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) to the colon dataset to test whether the normal colon tissues and the tumor colon tissues have significantly different mean gene expression levels.

Table 4 Results for testing if the mean gene expression levels of the normal colon tissues and the tumor colon tissues are the same

Table 4 presents the results based on the 22 paired colon tissues only. It is seen that all the tests except \(T_{SD}\) strongly reject the null hypothesis. The estimated degrees of freedom of \(T_{new}\) and \(T_{ZX}\) are small, showing that the normal approximation used in \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) is not adequate for the respective null distributions; therefore, the p values of \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) are less reliable. The p value of \(T_{SD}\) indicates that \(T_{SD}\) fails to detect the difference between the gene expression levels of the normal and tumor colon tissues at the \(5\%\) significance level, showing that \(T_{SD}\) is conservative in this example. This result is consistent with what we observed from the simulation results presented in Sect. 3.

4.2 One-sample problem for transposable data

In many applications, the measurements of a subject can be naturally organized in a matrix, especially when the rows and columns correspond to two different sets of variables. Such data are called transposable data in Allen and Tibshirani (2010). Given n i.i.d. transposable \(q\times k\) random matrices \(\varvec{{X}}_{1},\ldots ,\varvec{{X}}_{n}\), Touloumis et al. (2015) considered the following testing problem on the structure of the mean matrix:

$$\begin{aligned} H_{0}:\ \varvec{{M}}=\left( \varvec{{\mu }}_{1}\varvec{{1}}_{k_{1}}^{\top },\ldots ,\varvec{{\mu }}_{g}\varvec{{1}}_{k_{g}}^{\top }\right) , \text{ versus } H_{1}:\ \varvec{{M}}\ne \left( \varvec{{\mu }}_{1}\varvec{{1}}_{k_{1}}^{\top },\ldots ,\varvec{{\mu }}_{g}\varvec{{1}}_{k_{g}}^{\top }\right) , \end{aligned}$$
(25)

where \(\varvec{{M}}={\text {E}}(\varvec{{X}}_{1})\), \(k_{1},\ldots ,k_{g}\) are positive integers such that \(\sum _{i=1}^{g}k_{i}=k\) with at least one \(k_{i}\ge 2\), and \(\varvec{{\mu }}_{1},\ldots ,\varvec{{\mu }}_{g}\) are g unknown \(q\times 1\) vectors. For each \(i=1,\ldots ,g\), set \(\varvec{{P}}_{k_i}=\varvec{{I}}_{k_i}-\varvec{{J}}_{k_i}/k_i\), a centering matrix of size \(k_i\times k_i\). Note that the MANOVA hypothesis (1) for dependent samples can be seen as a special case of (25). Set \(\varvec{{P}}={\text {diag}}(\varvec{{P}}_{k_{1}},\ldots ,\varvec{{P}}_{k_{g}})\), a \(k\times k\) block-diagonal matrix. Then testing the null hypothesis in (25) is equivalent to testing \(\text{ vec }(\varvec{{M}}\varvec{{P}})=\varvec{{0}}\). Set

$$\begin{aligned} \varvec{{y}}_{i}=\text{ vec }(\varvec{{X}}_{i}\varvec{{P}}),\ i=1,\ldots ,n, \end{aligned}$$
(26)

which are i.i.d. \((qk)\times 1\) random vectors. Then testing (25) based on the i.i.d. random matrices \(\varvec{{X}}_{1},\ldots ,\varvec{{X}}_{n}\) is equivalent to testing (3) with the induced i.i.d. random vectors (26) and with \(\varvec{{\mu }}={\text {E}}(\varvec{{y}}_1)=\text{ vec }(\varvec{{M}}\varvec{{P}})\). Therefore, our normal reference one-sample test described in Sect. 2 can be applied to test (25) via the induced i.i.d. random vectors (26); a sketch of this reduction is given below. Similar structural hypotheses on the rows of the mean matrix can be tested accordingly. Besides, the technical Conditions C1–C5 can be easily adapted to the original transposable data as in Touloumis et al. (2015), so the asymptotic results derived in Sect. 2 also apply here. To test (25), Touloumis et al. (2015) constructed a test using U-statistics as in the \(T_{CQ}\) test of Chen and Qin (2010). Like \(T_{CQ}\), their test requires some strong assumptions so that a normal approximation to the null distribution of the test statistic is valid.
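The following Python sketch (names are ours) builds the block-diagonal centering matrix \(\varvec{{P}}\) and the induced sample (26) from a list of matrix observations.

```python
import numpy as np
from scipy.linalg import block_diag

def transposable_to_one_sample(X_list, group_sizes):
    """Reduce the hypothesis (25) to the one-sample problem (3):
    y_i = vec(X_i P) with P = diag(P_{k_1}, ..., P_{k_g}).
    X_list holds (q x k) arrays and sum(group_sizes) = k."""
    blocks = [np.eye(k) - np.ones((k, k)) / k for k in group_sizes]
    P = block_diag(*blocks)                          # k x k block-diagonal matrix
    return np.vstack([(X @ P).flatten(order="F") for X in X_list])

# For the hypothesis (27): group_sizes = [1, 1, 5]; for (28): [2, 5].
```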

As a real data example, we consider the following mean matrix structure hypothesis studied by Touloumis et al. (2015) on the glioblastoma (GB) transposable dataset provided by Sottoriva et al. (2013):

$$\begin{aligned} H_{0}:\ \varvec{{M}}=(\varvec{{\mu }}_{1},\varvec{{\mu }}_{2},\varvec{{\mu }}_{3}\varvec{{1}}_{5}^{\top }), \; \text {versus}\; H_{1}: H_{0}\; \text { is not true,} \end{aligned}$$
(27)

where the columns of \(\varvec{{M}}\) represent the mean gene expression patterns of different brain compartments, with \(\varvec{{\mu }}_{1}\) corresponding to the tumor margin (MA), \(\varvec{{\mu }}_{2}\) to the sub-ventricular zone (SVZ, normal brain tissue that surrounds the tumor mass), and \(\varvec{{\mu }}_{3},\ldots ,\varvec{{\mu }}_{7}\) to 5 different fragments in the tumor mass such that earlier fragments are closer to MA and later fragments closer to SVZ. The null hypothesis in (27) corresponds to the biological hypothesis of the conservation of the mean vectors of gene expression levels across the tumor mass. The GB dataset consists of \(n=8\) patients and \(k=7\) mRNA samples (column variables), with each sample having \(q=16,810\) gene expression levels (row variables) measured. We apply the test \(T_{TTM}\) proposed by Touloumis et al. (2015), together with \(T_{new}\), \(T_{ZX}\), \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\), to the transformed data (26) to test the null hypothesis in (27). The associated p values are given in the left panel of Table 5. It is seen that all the p values are comparable and suggest that there is not enough evidence to reject the null hypothesis in (27). Since we do not reject the null hypothesis in (27), it is of interest to further test the following hypotheses:

$$\begin{aligned} H_{0}:\ \varvec{{M}}=(\varvec{{\mu }}_{1}\varvec{{1}}_{2}^{\top },\varvec{{\mu }}_{2}\varvec{{1}}_{5}^{\top }), \ \text {versus}\ H_{1}: \; H_{0}\ \text {is not true,} \end{aligned}$$
(28)

where the null hypothesis corresponds to the biological hypothesis that MA and SVZ have a common mean gene expression pattern and that the 5 different fragments in the tumor mass also have a common mean gene expression pattern. The testing results are given in the right panel of Table 5. It is seen that all the tests reject the null hypothesis in (28). From Table 5, it is also seen that the estimated degrees of freedom of \(T_{new}\) and \(T_{ZX}\) are quite large, showing that the normal approximation to the respective null distributions of \(T_{BS}\), \(T_{CQ}\), \(T_{TTM}\) and \(T_{SD}\) may be adequate.

Table 5 Testing the null hypotheses in (27) and (28) for mean gene expression levels of the glioblastoma data

4.3 Two-sample problem and MANOVA

In this subsection, we show how to use the proposed one-sample test to solve problems with two or more independent samples, e.g., the two-sample problem and MANOVA, by transforming them into a one-sample problem. There is an abundant literature on the high-dimensional two-sample problem and MANOVA; see Dempster (1958), Bai and Saranadasa (1996), Srivastava and Du (2008), Chen and Qin (2010), Schott (2007), Yamada and Himeno (2015), Hu et al. (2017) and the references therein. One advantage of using the transformation method to solve k-sample problems as a one-sample problem is that heteroscedasticity is automatically overcome, so there is no need to assume a common covariance matrix for the different samples (Zhang and Xu 2009; Nishiyama et al. 2013).

Given k independent normal samples \(\varvec{{x}}_{ij},\ i=1,\ldots ,n_{j}{\mathop {\sim }\limits ^{\text {i.i.d.}}}\mathcal {N}(\varvec{{\mu }}_{j},\varvec{{\varSigma }}_{j}),\ j=1,\ldots ,k\), where we suppose \(n_{1}\le \cdots \le n_{k}\), we first consider testing the simple linear hypotheses

$$\begin{aligned} H_{0}:\sum _{j=1}^{k}c_{j}\varvec{{\mu }}_{j}=\varvec{{0}},\; \text {versus}\; H_{1}: \sum _{j=1}^{k}c_{j}\varvec{{\mu }}_{j}\ne \varvec{{0}}, \end{aligned}$$
(29)

where \(c_{1},\ldots ,c_{k}\) are some given scalars. To apply the proposed one-sample test to the above problem, we can transform the k samples into one sample by the following transformation (Anderson 2003, Sect. 5.5): for \(i=1,\ldots ,n_{1}\),

$$\begin{aligned} \varvec{{y}}_{i}=c_{1}\varvec{{x}}_{i1}+\sum _{j=2}^{k}c_{j}(n_{1}/n_{j})^{1/2} \left[ \varvec{{x}}_{ij}-n_{1}^{-1}\sum _{\ell =1}^{n_{1}}\varvec{{x}}_{\ell j}+(n_{1}n_{j})^{-1/2}\sum _{\ell =1}^{n_{j}}\varvec{{x}}_{\ell j}\right] . \end{aligned}$$
(30)

Then we have \( \varvec{{y}}_{1},\ldots ,\varvec{{y}}_{n_{1}}{\mathop {\sim }\limits ^{\text {i.i.d.}}}\mathcal {N}(\sum _{i=1}^{k}c_{i}\varvec{{\mu }}_{i}, \sum _{i=1}^{k}c_{i}^{2}n_{1}n_{i}^{-1}\varvec{{\varSigma }}_{i})\), and applying the proposed one-sample test to the induced sample tests the hypotheses (29); a sketch of this transformation in code is given below. In particular, with \(k=2\), \(c_{1}=1\) and \(c_{2}=-1\), the hypotheses (29) reduce to the two-sample problem studied in Chen and Qin (2010), and the transformation (30) reduces to the multivariate version of Scheffé’s (1943) transformation, also known as Bennett’s (1950) transformation.
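The following Python sketch implements (30) term by term; the function and argument names are our own choices.

```python
import numpy as np

def linear_combination_sample(samples, c):
    """Transformation (30): reduce k independent samples to one sample of
    size n_1 for testing (29). samples[j] is an (n_j x p) array with
    n_1 <= ... <= n_k, and c holds the scalars c_1, ..., c_k."""
    n1 = samples[0].shape[0]
    Y = c[0] * np.asarray(samples[0], dtype=float)
    for j in range(1, len(samples)):
        Xj = np.asarray(samples[j], dtype=float)
        nj = Xj.shape[0]
        head_mean = Xj[:n1].mean(axis=0)   # n_1^{-1} sum_{l=1}^{n_1} x_{lj}
        total = Xj.sum(axis=0)             # sum_{l=1}^{n_j} x_{lj}
        Y += c[j] * np.sqrt(n1 / nj) * (Xj[:n1] - head_mean
                                        + total / np.sqrt(n1 * nj))
    return Y
```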

For the MANOVA problem, i.e., testing

$$\begin{aligned} H_{0}:\varvec{{\mu }}_{1}=\cdots =\varvec{{\mu }}_{k},\; \text {versus}\; H_{1}: H_{0}\; \text { is not true,} \end{aligned}$$
(31)

where \(k\ge 3\), we can use the “dimension stacking” trick described by Anderson (1963). By applying the transformation (30) \(k-1\) times, where in the j-th application we set \(c_{1}=1\), \(c_{j+1}=-1\) and all other coefficients to zero, we obtain \(k-1\) samples \(\varvec{{y}}_{ij},\ i=1,\ldots ,n_{1};\ j=1,\ldots ,k-1\). By stacking the \(k-1\) observations from the different samples into a single observation, i.e., defining \(\varvec{{y}}_{i}=(\varvec{{y}}_{i1}^{\top },\ldots ,\varvec{{y}}_{i(k-1)}^{\top })^{\top },\ i=1,\ldots ,n_{1}\), the original k samples are transformed into one sample with mean vector \((\varvec{{\mu }}_{1}^{\top }-\varvec{{\mu }}_{2}^{\top },\ldots ,\varvec{{\mu }}_{1}^{\top }-\varvec{{\mu }}_{k}^{\top })^{\top }\), and the MANOVA problem (31) for the original k samples is converted to the one-sample problem for the induced sample; a sketch is given below. See also Zhang and Xu (2009) for more details of this approach.
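A minimal Python sketch of this stacking step, reusing the hypothetical `linear_combination_sample` helper from the sketch after (30):

```python
import numpy as np

def manova_to_one_sample(samples):
    """MANOVA reduction for (31): apply (30) k-1 times with c_1 = 1,
    c_{j+1} = -1, then stack the induced observations; reuses the
    linear_combination_sample sketch given after (30)."""
    k = len(samples)
    parts = []
    for j in range(1, k):
        c = [0.0] * k
        c[0], c[j] = 1.0, -1.0
        parts.append(linear_combination_sample(samples, c))
    # y_i = (y_{i1}^T, ..., y_{i(k-1)}^T)^T, i = 1, ..., n_1
    return np.hstack(parts)
```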

As a real data example, we consider the peripheral blood mononuclear cells (PBMC) data provided by Burczynski et al. (2006), a microarray dataset containing 22,283 gene expression levels of 42 normal, 26 ulcerative colitis (UC), and 59 Crohn’s disease (CD) tissues. We apply the different one-sample tests based on the transformation method to check whether the three PBMC tissue types have the same mean expression levels. The testing results are given in Table 6, where all the tests reject the null hypothesis that the mean gene expression levels of the three PBMC tissue types are the same. This result is consistent with the result reported in Table 7 of Zhang et al. (2017), and the testing result given by the one-sample test \(T_{ZX}\) is very similar to that given by the MANOVA test proposed by Zhang et al. (2017). It is seen that the estimated degrees of freedom of \(T_{new}\) and \(T_{ZX}\) are quite small, indicating that the normal approximation to the respective null distributions of \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) is not adequate.

Table 6 Testing if the mean gene expression levels of the three PBMC tissues are the same

4.4 One-sample problem for heavy tailed data

Direct applications of \(T_{new}\), \(T_{ZX}\), \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) to the one-sample problem for heavy-tailed high-dimensional data are often less powerful. To overcome this difficulty, one may apply these tests to the induced sample yielded by the following multivariate spatial sign transformation:

$$\begin{aligned} \varvec{{u}}_{i}=U(\varvec{{y}}_{i})={\left\{ \begin{array}{ll} \frac{\varvec{{y}}_{i}}{\Vert \varvec{{y}}_{i}\Vert }, &{} \varvec{{y}}_{i}\ne \varvec{{0}},\\ \varvec{{0}}, &{} \varvec{{y}}_{i}=\varvec{{0}}, \end{array}\right. }\quad i=1,\ldots ,n. \end{aligned}$$
(32)

For example, Wang et al. (2015) and Zhou et al. (2019) successfully applied \(T_{CQ}\) and \(T_{ZX}\), respectively, to the induced sample (32) for elliptically distributed high-dimensional data.
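A minimal Python sketch of the transformation (32), applied row-wise to a data matrix:

```python
import numpy as np

def spatial_sign(Y):
    """Multivariate spatial sign transformation (32): each row y_i is
    mapped to y_i / ||y_i||, with the zero vector mapped to itself."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    U = np.zeros_like(Y)
    nonzero = norms[:, 0] > 0
    U[nonzero] = Y[nonzero] / norms[nonzero]
    return U
```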

To compare the performance of \(T_{new}\), \(T_{ZX}\), \(T_{BS}\), \(T_{CQ}\) and \(T_{SD}\) on the induced sample (32) for heavy-tailed high-dimensional data, we conduct the following simulation study. We generate a heavy-tailed high-dimensional sample using \(\varvec{{y}}_i=\varvec{{\mu }}+\varvec{{\varSigma }}^{1/2}\varvec{{z}}_i,\ i=1,\ldots , n\), where \(\varvec{{\mu }}\) and \(\varvec{{\varSigma }}\) are specified as in Sect. 3 and \(\varvec{{z}}_{i},\ i=1,\ldots ,n\), are generated using the following two models:

  • Model 4: \(z_{ir},\ r=1,\ldots ,p\) i.i.d. follow the Gaussian mixture \(0.9\mathcal {N}(0,1)+0.1\mathcal {N}(0,9)\).

  • Model 5: \(\varvec{{z}}_{i}=\varvec{{w}}_{i}/\sqrt{0.3}\), with \(\varvec{{w}}_{i}\) following a p-dimensional multivariate \(\mathrm{t}\)-distribution with 3 degrees of freedom, mean \(\varvec{{0}}\) and covariance matrix \(\varvec{{I}}_p\).

Table 7 Empirical sizes (in \(\%\)) of the tests for heavy-tailed distributions
Table 8 Empirical powers (in \(\%\)) of the tests for heavy-tailed distributions
Table 9 Estimated approximate degrees of freedom when \(\delta =0\) for heavy-tailed distributions

Tables 7, 8 and 9 present the empirical sizes, the empirical powers, and the estimated approximate degrees of freedom of \(T_{new}\) and \(T_{ZX}\), respectively. As expected, the conclusions drawn from these three tables are similar to those drawn from Tables 1, 2 and 3 in Sect. 3. In particular, in terms of size control, \(T_{new}\) again outperforms the other tests significantly.

5 Concluding remarks

In this paper, we propose and study a normal reference test with the three-cumulant matched \(\chi ^2\)-approximation for the one-sample problem with high-dimensional data. A simulation study shows that in terms of size control, the proposed test outperforms several existing competitors. The proposed test can also be applied to one-sample problems with other types of data via some simple transformations. When the data are normally distributed, the estimated approximation parameters are known to be ratio-consistent; whether they are also ratio-consistent for non-normal high-dimensional data is an interesting open question that warrants further research. Since the normal reference test with the 3-c matched \(\chi ^2\)-approximation for one-sample problems with high-dimensional data has much better size control than several existing tests, it is also interesting and warranted to extend this normal reference test to other high-dimensional hypothesis testing problems.