Abstract
We present a systematic theory for tests for means of high-dimensional data. Our testing procedure is based on an invariance principle which provides distributional approximations of functionals of non-Gaussian vectors by those of Gaussian ones. Unlike the widely used Bonferroni approach, our procedure is dependence-adjusted and has asymptotically correct size and power. To obtain cutoff values for our test, we propose a half-sampling method which avoids estimating the underlying covariance matrix of the random vectors. Extensive simulations show that the latter method has excellent performance.
Keywords
- Gaussian approximation
- Goodness-of-Fit Test
- Half-sampling
- High-dimensional data
- Hypothesis testing
- Large p small n
- Rademacher weighted differencing
1 Introduction
With the advance of modern data collection techniques, high-dimensional data appear in various fields including physics, biology, healthcare, finance, marketing, social networks, and engineering, among others. A common feature of such datasets is that the data dimension or the number of involved parameters can be quite large. As a fundamentally important problem in the study of such data, one would like to perform statistical inference on those parameters, such as multiple testing or the construction of confidence regions. This allows one to answer the question of whether there is signal in the dataset, or whether the dataset consists only of random noise. Due to the high dimensionality, inferential procedures developed for low-dimensional problems may no longer be valid in the high-dimensional setting, and different approaches must be designed to account for it. There exists a huge literature on multiple testing; see, for example, Dudoit and van der Laan (2008), Efron (2010) and Dickhaus (2014).
We now introduce the setting of our testing problem. Assume that X 1, X 2, …, are independent and identically distributed (i.i.d.) p-dimensional random vectors, with mean vector μ = (μ 1, …, μ p)T = E(X i) and covariance matrix Σ = cov(X i) = (σ jk)j,k≤p. We are testing the hypothesis of existence of a signal
based on the sample X 1, …, X n. This formulation is actually very general and its solution can be applied to many other problems; see Sect. 8.2. We can estimate μ by the sample mean vector \(\hat \mu = \bar X_n = n^{-1} \sum _{i=1}^n X_i\). The classical Hotelling’s T-squared test has the form
where
is the sample covariance matrix estimate of Σ. If p is small and fixed, by the Central Limit Theorem (CLT),
By the Law of Large Numbers, if Σ is non-singular,
Clearly (8.4) and (8.5) imply that under H 0, the Hotelling’s T-squared statistic \(n T \Rightarrow \chi ^2_p\) (χ 2 distribution with degrees of freedom p). Thus we can reject H 0 at level 0 < α < 1 if \(n T > \chi ^2_{p, 1-\alpha }\), the (1 − α)th quantile of \(\chi ^2_p\).
In the high-dimensional situation in which p can be much larger than n, the CLT (8.4) is no longer valid; see Portnoy (1986). Furthermore, \(\hat \Sigma _n\) is singular and thus T is not well-defined. Also the matrix convergence (8.5) may not hold, see Marčenko and Pastur (1967). In this chapter we shall apply a testing functional approach that does not use \(\hat \Sigma _n^{-1}\) or the precision matrix Σ−1. A function \(g: \mathbb {R}^p \to [0, \infty )\) is said to be a testing functional if the following requirements are satisfied: (1) (monotonicity) for any \(x = (x_1, \ldots, x_p)^T \in \mathbb {R}^p\) and 0 < c < 1, g(cx) ≤ g(x); (2) (identifiability) g(x) = 0 if and only if x = 0. We shall consider the test statistic
Examples of g include the L 2-based test with \(g(x) = \sum _{j=1}^p x_j^2\), the L ∞-based test with g(x) =maxj≤p|x j|, the weighted empirical process \(g(x) = \sup _{u \ge 0} ( \sum _{j=1}^p \mathbf {1} _{|x_j| \ge u} h(u) )\), where h(⋅) is a nonnegative-valued non-decreasing function, among others. We reject H 0 in (8.1) if T n is too big.
As a theoretical foundation, we base our testing procedure on the following invariance principle result
where Z, Z 1, Z 2, … are i.i.d. N(0, Σ) random vectors and \(\bar Z_n = n^{-1} \sum _{i=1}^n Z_i =_{\mathcal {D}} n^{-1/2} Z\). Interestingly, though the CLT (8.4) does not generally hold in the high-dimensional setting, the testing functional form (8.7) may still be valid. Chernozhukov et al. (2014) proved (8.7) with the L ∞ norm g(x) =maxj≤p|x j|, while Xu et al. (2014) considered the L 2 based test with \(g(x) = \sum _{j=1}^p x_j^2\). In Sect. 8.5 we shall provide a sufficient condition so that (8.7) holds for certain testing functionals.
In applying (8.7) for testing (8.1), one needs to know the distribution of \(g(\sqrt {n} \bar Z_n) =_{\mathcal {D}} g(Z)\) so that a suitable cutoff value can be obtained. The latter problem is highly nontrivial since the covariance matrix Σ, which is viewed as a nuisance parameter here, is typically not known and the associated estimation issue can be quite challenging. In Sect. 8.5 we shall propose a half-sampling technique which can avoid estimating the nuisance covariance matrix Σ.
2 Applications
Our paradigm (8.1) is actually quite general: it can be applied to testing of high-dimensional covariance matrices, testing of independence of high-dimensional data, and analysis of variance with non-normal and heteroscedastic errors.
2.1 Testing of Covariance Matrices
There is a huge literature on testing covariance matrices for uncorrelatedness, sphericity, or other patterns. For Gaussian data, tests for Σ = σ 2Ip, where Ip is the identity matrix, can be found in Ahmad (2010), Birke and Dette (2005), Chen et al. (2010), Fisher et al. (2010) and Ledoit and Wolf (2002). Tests for equality of covariance matrices are studied in Bai et al. (2009) and Jiang et al. (2012), and tests for sphericity in Onatski et al. (2013). Minimax properties are considered in Cai and Ma (2013). For other contributions, see Qu and Chen (2012), Schott (2005, 2007), Srivastava (2005), Xiao and Wu (2013) and Zhang et al. (2013).
Assume that we have data matrix Y n = (Y i,j)1≤i≤n,1≤j≤p, where \((Y_{i, j})_{j=1}^p\), i = 1, …, n, are i.i.d. p-dimensional random vectors. Let
be the covariance function. Consider testing hypothesis for uncorrelatedness:
For simplicity assume that E(Y i,j) = 0. For a pair a = (j, k) write X i,a = Y i,j Y i,k, and \(\bar X_a = n^{-1} \sum _{i=1}^n X_{i, a}\) and the (p 2 − p)-dimensional vector \(\bar X = (\bar X_a)_{a \in {\mathcal {A}}}\), where \({\mathcal {A}} = \{(j, k): \, j\neq k, \, j \le p, k \le p\}\). The hypothesis H 0 in (8.9) can be tested by using the test statistics \(T= g(\sqrt {n} \bar X)\). Xiao and Wu (2013) considered the L ∞ based test with g(x) =maxi|x i|, generalizing the result in Jiang (2004) which concerns the special case for i.i.d. vectors with independent entries. Han and Wu (2017) performed an L 2 based test for patterns of covariances with the test statistic
With slight modifications, one can also test the sphericity hypothesis
where Ip is the p × p identity matrix. Let \({\mathcal {A}}_0 = \{(j, k): \, j, k \le p\}\) be \({\mathcal {A}}\) with the diagonal pairs added. For \(a = (j, j) \in {\mathcal {A}}_0\), let \(X_{i, a} = Y_{i, j}^2 - \sigma ^2\). If σ 2 is known, then H 0 in (8.11) can be rejected at level α ∈ (0, 1) if \(T = g(\sqrt {n} \bar X) > t_{1-\alpha }\), where t 1−α is the (1 − α)th quantile of g(Z) and Z is a centered Gaussian vector with covariance structure cov(Z a, Z b) = E(X i,a X i,b), \(a, b \in {\mathcal {A}}_0\). If σ 2 is not known, we can plug in an estimate, for example \(\hat \sigma ^2 = p^{-1} \sum _{j=1}^p \hat \sigma _{j j}\), and consider \(X_{i, a}^\circ = Y_{i, j}^2 - \hat \sigma ^2\) for a = (j, j); let \(X_{i, a}^\circ = X_{i, a}\) if a = (j, k) with j ≠ k. The hypothesis H 0 in (8.11) can then be tested by the statistic \(T^\circ = g(\sqrt {n} \bar X^\circ )\).
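As an illustration of how the (p 2 − p)-dimensional vector \(\bar X\) of Sect. 8.2.1 is formed from the data matrix, here is a short Python sketch (function names are ours; the L ∞ functional is used as in Xiao and Wu 2013):

```python
import numpy as np

def offdiag_products(Y):
    # For each observation i, form X_{i,a} = Y_{i,j} * Y_{i,k} over all
    # ordered pairs a = (j, k) with j != k.
    n, p = Y.shape
    prods = Y[:, :, None] * Y[:, None, :]   # (n, p, p) array of Y_ij * Y_ik
    mask = ~np.eye(p, dtype=bool)           # keep the off-diagonal pairs only
    return prods[:, mask]                   # shape (n, p*p - p)

def uncorrelatedness_stat(Y, g):
    # T = g(sqrt(n) * Xbar) with Xbar the mean vector of the products
    X = offdiag_products(Y)
    n = X.shape[0]
    return g(np.sqrt(n) * X.mean(axis=0))
```

For the sphericity test, the diagonal pairs (j, j) would be appended with \(X_{i,a} = Y_{i,j}^2 - \sigma^2\) as described above.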
2.2 Testing of Independence
Let \(Y_i = (Y_{i, j})_{j=1}^p\), i = 1, …, n, be i.i.d. p-dimensional random vectors with joint cumulative distribution function
Consider the problem of testing whether entries of Y i are independent. Assume that the marginal distributions are standard uniform[0, 1]. For j = (j 1, …, j d), write \(F_{\mathbf {j}} (y_{\mathbf {j}}) = F_{j_1, \ldots, j_d} (y_{j_1}, \ldots, y_{j_d})\). For fixed d, the hypothesis of d-wise independence is
where \({\mathcal {A}}_d = \{ \mathbf {j} = (j_1, \ldots, j_d): \, j_1 < \cdots < j_d \le p \}\). Pairwise and triple-wise independence correspond to d = 2 and d = 3, respectively. We estimate F j(y j) by the empirical cdf
where the notation Y i,j ≤ y j means \(Y_{i, j_h} \le y_{j_h}\) for all h = 1, …, d. Let \(y_{\mathbf { m}_1}, \ldots, y_{{\mathbf {m}}_N}\), N →∞, form a dense subset of [0, 1]d. For example, we can choose them to be the lattice set {1∕K, …, (K − 1)∕K}d with N = (K − 1)d. Let X i, 1 ≤ i ≤ n, be the Np!∕(d!(p − d)!)-dimensional vector with the (ℓ, j)th component being \({\mathbf {1}}_{ Y_{i, \mathbf {j}} \le y_{{\mathbf {m}}_\ell }} - \prod _{h \in {\mathbf {m}}_\ell } y_h \), 1 ≤ ℓ ≤ N, \(\mathbf {j} \in {\mathcal {A}}_d\). Then the L 2-based test for (8.13) on the dense set \((y_{{\mathbf {m}}_\ell } )_{\ell =1}^N\) has the form \(n | \bar X |{ }_2^2\).
2.3 Analysis of Variance
Consider the following two-way ANOVA model
where μ is the grand mean, α i and β j are the main effects of the first and the second factors, respectively, and δ ij are the interaction effects. Assume that (Y ijk)i≤I,j≤J, k = 1, …, K, are i.i.d. Consider the hypothesis of no interaction:
In the classical ANOVA procedure, one assumes that ε ijk, i ≤ I, j ≤ J, are i.i.d. N(0, σ 2) and makes use of the fact that the sum of squares
is distributed as \(\sigma ^2 \chi ^2_{(I-1)(J-1)}\). Here \(\bar Y_{i j \cdot } = K^{-1} \sum _{k=1}^K Y_{i j k}\) and other sample averages \(\bar Y_{i \cdot \cdot }\), \(\bar Y_{\cdot j \cdot }\) and \(\bar Y_{\cdot \cdot \cdot }\) are similarly defined. The null hypothesis H 0 is rejected at level α ∈ (0, 1) if
where F (I−1)(J−1),IJ(K−1),1−α is the (1 − α)th quantile of the F-distribution F (I−1)(J−1),IJ(K−1) and
is an estimate of σ 2.
The classical ANOVA procedure can be invalid when the assumption that ε ijk, i ≤ I, j ≤ J, are i.i.d. N(0, σ 2) is violated. In the latter case SS I may no longer have a χ 2 distribution. However, we can still approximate the distribution of SS I in terms of (8.7). For a = (i, j) let \(X_{a k} = Y_{i j k} - \bar Y_{i \cdot k} - \bar Y_{\cdot j k} + \bar Y_{\cdot \cdot k}\), where \(\bar Y_{i \cdot k} = J^{-1} \sum _{j=1}^J Y_{i j k}\) and \(\bar Y_{\cdot j k}\), \(\bar Y_{\cdot \cdot k}\) are defined similarly. Then \(SS_I = \sum _{a \in {\mathcal {A}}} \bar X_{a \cdot }^2\), where \(\bar X_{a \cdot } = K^{-1} \sum _{k=1}^K X_{a k}\).
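The decomposition in the last display is easy to vectorize. The following sketch (our own naming, with the responses stored as an I × J × K array) computes SS I via the \(X_{ak}\):

```python
import numpy as np

def interaction_ss(Y):
    # Y has shape (I, J, K). Form
    #   X_{ak} = Y_{ijk} - Ybar_{i.k} - Ybar_{.jk} + Ybar_{..k},  a = (i, j),
    # and return SS_I = sum_a (K^{-1} sum_k X_{ak})^2.
    X = (Y
         - Y.mean(axis=1, keepdims=True)        # Ybar_{i.k}
         - Y.mean(axis=0, keepdims=True)        # Ybar_{.jk}
         + Y.mean(axis=(0, 1), keepdims=True))  # Ybar_{..k}
    Xbar = X.mean(axis=2)                       # average over replicates k
    return float(np.sum(Xbar ** 2))
```

For a purely additive model (no interaction), the statistic is exactly zero, as the double-centering cancels both main effects.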
3 Tests Based on L ∞ Norms
Fan et al. (2007) considered the L ∞ norm based test of (8.1) with the form
Assume that the dimension p satisfies
and the uniform bounded third moment condition
Let Φ be the standard normal cumulative distribution function and z α = Φ−1(α). Then
Namely, if we perform the test by rejecting H 0 of (8.1) whenever M n ≥ z 1−α∕(2p), the familywise type I error of the latter test is asymptotically bounded by α. As a finite sample correction, the cutoff value z 1−α∕(2p) in (8.23) can be replaced by the t-distribution quantile t n−1,1−α∕(2p) with n − 1 degrees of freedom, noting that \((n-1)^{1/2} \hat \mu _j / \hat \sigma _j \sim t_{n-1}\) if the X ij are Gaussian. Due to the Bonferroni correction, the test of Fan et al. (2007) can be quite conservative if the dependence among entries of X i is strong. For example, if X i1 = X i2 = ⋯ = X ip, then instead of the cutoff value z 1−α∕(2p) one should use z 1−α∕2, since z 1−α∕(2p) leads to the extremely conservative type I error α∕(2p). If the entries of X i are independent and X i is Gaussian, then the type I error is 1 − (1 − α∕p)^p → 1 − e −α, which is only slightly conservative; for example, when α = 0.05, 1 − e −α = 0.04877058.
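For concreteness, the Bonferroni-corrected test just described can be coded as follows (a sketch; the function name is ours, and the standard normal quantile \(z_{1-\alpha/(2p)}\) is obtained from the standard library):

```python
import numpy as np
from statistics import NormalDist

def fhy_test(X, alpha=0.05):
    # Bonferroni-corrected L_inf test of H0: mu = 0 in the spirit of (8.20)/(8.23):
    # reject when M_n = max_j sqrt(n)|muhat_j|/sigmahat_j >= z_{1-alpha/(2p)}.
    n, p = X.shape
    muhat = X.mean(axis=0)
    sigmahat = X.std(axis=0, ddof=1)
    M_n = np.max(np.sqrt(n) * np.abs(muhat) / sigmahat)
    cutoff = NormalDist().inv_cdf(1 - alpha / (2 * p))
    return M_n, cutoff, M_n >= cutoff
```

Replacing `NormalDist().inv_cdf` by the t n−1 quantile gives the finite-sample correction mentioned above.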
Liu and Shao (2013) obtained Gumbel convergence of M n under the following conditions: (1) for some r > 3, the uniform bounded rth moment condition maxj≤p E|X ij − μ j|r = O(1) holds, which is slightly stronger than (8.22), and (2) weak dependence among entries of X i. For Σ = (σ jk)j,k≤p, assume that the correlation matrix R = (r jk)j,k≤p with \(r_{jk} = \sigma _{j k} / (\sigma _{j j}^{1/2} \sigma _{k k}^{1/2})\) has the property that, for some γ > 0,
holds for all ρ > 0. Then under (8.21), Theorem 3.1 in Liu and Shao (2013) asserts the Gumbel convergence
where \({\mathcal {G}}\) follows the Gumbel distribution \(P({\mathcal {G}} \le y) = \exp (-e^{-y/2}/\pi ^{1/2})\). By (8.25), one can reject H 0 in (8.1) at level α ∈ (0, 1) based on the L ∞ norm test
where g 1−α is chosen such that \(P({\mathcal {G}} \le g_{1-\alpha }) = 1 - \alpha \). Clearly the latter test has an asymptotically correct size.
Applying Theorem 2.2 in Chernozhukov et al. (2014), we have the following Gaussian approximation result. Assume that there exist constants c 1, c 2 > 0 such that c 1 ≤ E(X ij − μ j)2 ≤ c 2 holds for all j ≤ p and assume that u = u n,p satisfies
Let m k =maxj≤p(E|X 1j − μ j|k)1∕k and further assume that
Let Z ∼ N(0, R). Then we have the Gaussian approximation result: as n →∞
Let t 1−α be the (1 − α)th quantile of |Z|∞. The Gaussian approximation (8.29) leads to the L ∞ norm based test: H 0 is rejected at level α if \(\max _{j \le p} \sqrt {n} |\hat \mu _j| / \hat \sigma _j \ge t_{1-\alpha }\). In comparison with the result in Fan et al. (2007), the latter test has an asymptotically correct size and is dependence-adjusted. To obtain an estimate of the cutoff value t 1−α, Chernozhukov et al. (2014) proposed a Gaussian Multiplier Bootstrap (GMB) method. Given X 1, …, X n, let \(\hat t_{1-\alpha }\) be such that
where e i are i.i.d. N(0, 1) random variables independent of (X ij)i≥1,j≥1. Note that \( \hat t_{1-\alpha }\) can be numerically calculated by extensive Monte Carlo simulations. In Sect. 8.5 we shall propose Hadamard matrix and Rademacher weighting approaches. The simulation study in Sect. 8.6 shows that, in terms of finite-sample performance, the latter approaches give a more accurate size than the method based on the Gaussian Multiplier Bootstrap (8.30).
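A minimal Monte Carlo implementation of the multiplier bootstrap cutoff might look as follows (a sketch under our reading of (8.30); the studentization and the number of multiplier draws B are our choices):

```python
import numpy as np

def gmb_cutoff(X, alpha=0.05, B=1000, seed=0):
    # Gaussian multiplier bootstrap estimate of the (1-alpha)th quantile of the
    # studentized L_inf statistic. Each draw replaces X_i - Xbar by
    # e_i * (X_i - Xbar) with e_i iid N(0,1), holding the data fixed.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)             # centered data
    sig = X.std(axis=0, ddof=1)
    stats = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)
        stats[b] = np.max(np.abs(e @ Xc) / (np.sqrt(n) * sig))
    return np.quantile(stats, 1 - alpha)
```

The returned \(\hat t_{1-\alpha}\) is then compared with \(\max_{j\le p}\sqrt{n}|\hat\mu_j|/\hat\sigma_j\).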
Chen et al. (2016) generalized Fan, Hall and Yao’s L ∞ norm to high-dimensional dependent vectors. Assume that \((X_i)_{i \in \mathbb {Z}}\) is a p-dimensional stationary process of the form
where ε t, \(t\in \mathbb {Z}\), are i.i.d. random variables, \({\mathcal {F}}_t = (\ldots, \varepsilon _{t-1}, \varepsilon _t)\) and G(⋅) is a measurable function such that X t is well-defined. Assume that the long-run covariance matrix
exists. Let \(\varepsilon _i^\ast, \varepsilon _j, i, j\in \mathbb {Z}\), be i.i.d. random variables. Assume that X t has finite rth moment, r > 2. Define the functional dependence measures (see, Wu 2005, 2011) as
If the X i are i.i.d., then Σ∞ = Σ and θ r(m) = 0 for m ≥ 1. We say that (X t) satisfies the geometric moment contraction (GMC; see Wu and Shao 2004) property if there exist ρ ∈ (0, 1) and a 1 > 0 such that
Let μ = EX t. To test the hypothesis H 0 in (8.1), Chen et al. (2016) introduced the following dependence-adjusted versions of Fan, Hall, and Yao’s M n. Let n = mk, where m ≍ n 1∕4 and blocks B l = {i : m(l − 1) + 1 ≤ i ≤ ml}. Let \(Y_{l j} = \sum _{i \in B_l} X_{i j}\), 1 ≤ j ≤ p, 1 ≤ l ≤ k, be the block sums. Define the block-normalized sum
and the interlacing normalized sum: let k ∗ = k∕2, \(\mu ^\dagger _j = (m k^*)^{-1} \sum _{l=1}^{k^*} Y_{2 l j}\) and
By Chen et al. (2016), we have the following result. Assume that there exists a constant ζ > 0 such that the long-run variance ω jj ≥ ζ for all j ≤ p, that (8.34) holds with r = 3, and that
Then (8.23) holds for both the block-normalized sum \(M_n^\circ \) and the interlacing normalized sum \(M_n^\dagger \). Note that, while (8.37) still allows ultra high dimensions, due to dependence, the allowed dimension p in condition (8.37) is smaller than the one in (8.21). Additionally, if the GMC (8.34) holds with some r > 3, (8.24) holds with the long-run correlation matrix R = D −1∕2 Σ∞ D −1∕2, where D = diag( Σ∞), and for some 0 < τ < 1∕4,
then we have the Gumbel convergence for the interlacing normalized sum:
where \({\mathcal {G}}\) is given in (8.25). Similarly as (8.26), one can perform the following test which has an asymptotically correct size: we reject H 0 in (8.1) at level α ∈ (0, 1) if
4 Tests Based on L 2 Norms
In this section we shall consider the test which is based on the L 2 functional with \(g(x) = \sum _{j=1}^p x_j^2\). Let λ 1 ≥⋯ ≥ λ p ≥ 0 be the eigenvalues of Σ. For Z ∼ N(0, Σ), we have the distributional equality \(g(Z) = Z^T Z =_{\mathcal {D}} \sum _{j=1}^p \lambda _j \eta _j^2\), where η j are i.i.d. standard N(0, 1) random variables. Let \(f_k = (\sum _{j=1}^p \lambda _j^k)^{1/k}\), k > 0, and f = f 2. Then Eg(Z) = f 1 = tr( Σ) and var(g(Z)) = 2f 2. Xu et al. (2014) provide a sufficient condition for the invariance principle (8.7) with the quadratic functional g. For some 0 < δ ≤ 1 let q = 2 + δ.
Condition 1
Let δ > 0. Assume EX 1 = 0, E|X 1|2q < ∞ and let
Observe that Condition 1, (8.41) and (8.42) are Lyapunov-type conditions. Assume that
Then (8.7) holds (cf Xu et al. 2014). Consequently we have
In the literature, researchers primarily focus on developing the central limit theorem
or its modified version; see, for example, Bai and Saranadasa (1996), Chen and Qin (2010) and Srivastava (2009). Xu et al. (2014) clarified an important issue on the CLT of R n. By the Lindeberg–Feller central limit theorem, V ⇒ N(0, 2) as p →∞ holds if and only if λ 1∕f → 0. The distributional approximation (8.44) indicates that, if λ 1∕f does not go to 0, then the central limit theorem cannot hold for R n.
Let t 1−α be the (1 − α)th quantile of g(Z) = |Z|2 = Z T Z. By (8.7) we can reject (8.1) at level α ∈ (0, 1) if
To calculate t 1−α, one needs to know the eigenvalues λ 1, …, λ p. However, estimation of those eigenvalues is a very challenging problem, in particular if one does not impose certain structural assumptions on Σ. In Sect. 8.5.2 we shall propose a half-sampling based approach which does not need estimation of the covariance matrix Σ.
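When Σ (hence its spectrum) is known, t 1−α can be obtained by simulating the weighted χ 2 representation \(\sum_{j=1}^p \lambda_j \eta_j^2\) above; a sketch (function name ours):

```python
import numpy as np

def l2_null_quantile(Sigma, alpha=0.05, B=4000, seed=0):
    # g(Z) = Z^T Z  =_D  sum_j lambda_j * eta_j^2 with eta_j iid N(0,1);
    # simulate this weighted chi-square mixture and return the empirical
    # (1-alpha)th quantile as an estimate of t_{1-alpha}.
    rng = np.random.default_rng(seed)
    lam = np.linalg.eigvalsh(Sigma)                      # eigenvalues of Sigma
    draws = (rng.standard_normal((B, len(lam))) ** 2) @ lam
    return np.quantile(draws, 1 - alpha)
```

This is exactly the step that becomes infeasible when Σ is unknown, which motivates the half-sampling approach of Sect. 8.5.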
The L ∞ based tests discussed in Sect. 8.3 have good power when the alternative consists of a few large signals. If the signals are small and of similar magnitude, then the L 2 test is more powerful. To see this, assume that there exist a constant c > 0 and a small δ > 0 such that cδ ≤ μ j ≤ δ∕c holds for all j = 1, …, p. We can interpret δ as the departure parameter (from the null H 0 with μ = 0). For the L ∞-based test to have power approaching 1, one necessarily requires \(\sqrt {n} \delta \to \infty \). An elementary calculation shows that, under the much weaker condition np 1∕2 δ 2 →∞, the power of the L 2 based test, namely the probability that the event in (8.46) occurs, goes to one. In the latter condition, a larger dimension p is actually a blessing, as it requires a smaller departure δ.
5 Asymptotic Theory
In Sects. 8.3 and 8.4, we discussed the classical L ∞ and L 2 functionals, respectively. For a general testing functional, we have the following invariance principle (cf Theorem 1), which asserts that functionals of sample means of non-Gaussian random vectors X 1, X 2, … can be approximated by those of Gaussian vectors Z 1, Z 2, … with the same covariance structure. Assume \(g \in \mathbb {C}^3(\mathbb {R}^p)\). For x = (x 1, …, x p)T write g j = g j(x) = ∂g(x)∕∂x j. Similarly we define the partial derivatives g jk and g jkl. For all j, k, l = 1, …, p, assume that
For Z 1 ∼ N(0, Σ) write Z 1 = (Z 11, …, Z 1p)T. Define
For \(g(Z_1) = _{\mathcal {D}} g(\sqrt {n} \bar Z_n)\), we assume that its c.d.f. F(t) = P[g(Z) ≤ t] is Hölder continuous: there exist ℓ p > 0 and an index α > 0 such that, for all ψ > 0, the concentration function
Theorem 1 (Lou and Wu (2018))
Assume (8.47), (8.49) and \(\mathcal {K}_p \ell _p^{3/\alpha } = o(\sqrt {n})\) . Then
To apply Theorem 1 for hypothesis testing, we need to know the c.d.f. F(t) = P[g(Z) ≤ t]. Note that F(⋅) depends on g and the covariance matrix Σ. Thus we can also write F(⋅) = F g,Σ(⋅). If Σ is known, the distribution of g(Z) is completely known and its cdf F(t) = P[g(Z) ≤ t] can be calculated either analytically or by extensive Monte Carlo simulations. Let t 1−α, 0 < α < 1, be the (1 − α)th quantile of g(Z). Namely
Then the null hypothesis H 0 in (8.1) is rejected at level α if the test statistic \(T_n = g(\sqrt {n} \bar X_n) > t_{1-\alpha }\). This test has asymptotically correct size α. Additionally, the (1 − α) confidence region for μ can be constructed as
If Σ is not known, as a straightforward way to approximate F(t) = F g,Σ(t), one may use an estimate \(\tilde \Sigma \) so that F g,Σ(t) can be approximated by \(F_{g, \tilde \Sigma }(t)\). Here we do not adopt this approach for the following two reasons. First, it can be quite difficult to consistently estimate Σ without assuming sparseness or other structural conditions. The latter assumptions are widely used in the literature; see, for example, Bickel and Levina (2008a), Bickel and Levina (2008b), Cai et al. (2011) and Fan et al. (2013). Second, it is difficult to quantify the difference \(F_{g, \tilde \Sigma }(\cdot ) - F(\cdot )\) based on operator norm or other type of matrix convergence of the estimate \(\tilde \Sigma \). Xu et al. (2014) argued that, for the L 2 test with \(g(x) = \sum _{j=1}^p x_j^2\), one needs to use the normalized consistency of \(\tilde \Sigma \), instead of the widely used operator norm consistency. We propose using half-sampling and balanced Rademacher schemes.
5.1 Preamble: i.i.d. Gaussian Data
In practice, however, the covariance matrix Σ is typically unknown. Assume at the outset that X 1, …, X n are i.i.d. N(μ, Σ) vectors and that n = 4m, where m is a positive integer. Then we can estimate the cumulative distribution function F(t) = P[g(Z) ≤ t] by using Hadamard matrices (see Georgiou et al. 2003; Hedayat and Wallis 1978; Yarlagadda and Hershey 1997). We say that H is an n × n Hadamard matrix if its first row consists entirely of 1s and all its entries take values 1 or − 1 such that
where I n is the n × n identity matrix. Let
By (8.53), we have \(\sum _{i=1}^n H_{j i} = 0\) for 2 ≤ j ≤ n and \(\sum _{i=1}^n H_{j i} H_{j' i}= 0\) if j ≠ j′. Since X 1, …, X n are i.i.d. N(μ, Σ), it is clear that Y 2, …, Y n are also i.i.d. N(0, Σ) vectors. Hence the random variables g(Y 2), …, g(Y n) are independent and identically distributed as g(Z). Therefore we can construct the empirical cumulative distribution function
which converges uniformly to F(t) as n →∞, and t 1−α can be estimated by \(\hat t_{1-\alpha } = \hat F_n^{-1}(1-\alpha ) \), the (1 − α)th empirical quantile of \(\hat F_n(\cdot )\). As an important feature of the latter method, one does not need to estimate the nuisance covariance matrix Σ. In combinatorial experimental design, however, it is highly nontrivial to construct Hadamard matrices. If n is a power of 2, one can simply apply Sylvester’s construction. The Hadamard conjecture states that a Hadamard matrix of order n exists whenever 4 divides n; this conjecture is still open. For example, it is unclear whether a Hadamard matrix exists for n = 668 (see Brent et al. 2015).
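For n a power of 2, Sylvester’s construction and the resulting empirical quantile estimate can be sketched as follows (our own function names; recall that the first row of H is dropped because it carries the nonzero mean):

```python
import numpy as np

def sylvester_hadamard(n):
    # Sylvester's construction; valid when n is a power of 2.
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_ecdf_quantile(X, g, alpha=0.05):
    # Y_j = n^{-1/2} sum_i H_{ji} X_i, j = 2..n, are iid N(0, Sigma) for
    # Gaussian data; estimate t_{1-alpha} by the empirical (1-alpha)th
    # quantile of g(Y_2), ..., g(Y_n).
    n = X.shape[0]
    assert n & (n - 1) == 0, "n must be a power of 2 for this construction"
    H = sylvester_hadamard(n)
    Y = (H[1:] @ X) / np.sqrt(n)   # drop the all-ones first row
    vals = np.array([g(y) for y in Y])
    return np.quantile(vals, 1 - alpha)
```

The orthogonality of the rows of H is what makes the g(Y j) exactly independent in the Gaussian case.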
5.2 Rademacher Weighted Differencing
To circumvent the existence problem of Hadamard matrices in Sect. 8.5.1, we shall construct asymptotically independent realizations by using Rademacher random variables. Let \(\varepsilon _{j k}, j, k \in \mathbb {Z}\), independent of (X i)i≥1, be i.i.d. Bernoulli random variables with P(ε jk = 1) = P(ε jk = −1) = 1∕2. Define the Rademacher weighted differences
where the random set
When defining Y j, we require that A j satisfies |A j|≠ 0 and |A j|≠ n. By the Hoeffding inequality, |A j| concentrates around n∕2 in the sense that, for u ≥ 0, \(P( ||A_j| -n/2| \ge u) \le 2 \exp (-2 u^2 / n)\). Alternatively, we consider the balanced Rademacher weighted differencing: let \(A_1^\circ, A_2^\circ, \ldots \) be simple random samples drawn equally likely from \({\mathcal {A}}_m = \{ A\subset \{1, \ldots, n \}: \, |A| = m\}\), where m = ⌊n∕2⌋. Similarly to Y j in (8.56), we define
Clearly, given A j (resp. \(A_j^\circ \)), Y j (resp. \(Y_j^\circ \)) has mean 0 and covariance matrix Σ. Based on Y j in (8.56) (resp. \(Y_j^\circ \) in (8.58)), define the empirical distribution functions
where N →∞ and
For sets A, B ⊂{1, …, n}, let A c = {1, …, n}− A, B c = {1, …, n}− B and
If A, B are chosen according to a Hadamard matrix, then d(A, B) = 0. Assume that
Then there exists an absolute constant c > 0 such that
Again by the Hoeffding inequality, if we choose A 1, A 2 according to (8.57), there exist absolute constants c 1, c 2 > 0 such that \(P( d(A_1, A_2) \ge u) \le c_1 \exp (-c_2 u^2 / n)\), indicating that (8.61) holds with probability close to 1, that d(A 1, A 2) = O P(n 1∕2), and hence that weak orthogonality holds with δ(A 1, A 2) = O P(n −1∕2).
Theorem 2 (Lou and Wu (2018))
Under conditions of Theorem 1 , we have \(\sup _t |\hat F^\circ _N(t) - F(t)| \to 0\) in probability as N →∞.
5.3 Calculating the Power
The asymptotic power expression is
Given the sample X 1, …, X n whose mean vector μ may not necessarily be 0, based on the estimated \(\hat t_{1-\alpha }\) from the empirical cumulative distribution functions (8.59) and (8.60), we can actually estimate the power function by the following:
5.4 An Algorithm with General Testing Functionals
For ease of application, we shall in this section provide details of testing the hypothesis H 0 in (8.1) using the Rademacher weighting scheme described in Sect. 8.5.2.
Algorithm 1: Rademacher weighted testing procedure
_________________
-
1.
Input X 1, …, X n;
-
2.
Compute the average \(\bar X_n\) and the test statistic \(T = g(\sqrt {n} \bar X_n)\);
-
3.
Choose a large N in (8.60) and obtain the empirical quantile \(\hat t^\circ _{1-\alpha }\);
-
4.
Reject H 0 at level α if \(T > \hat t^\circ _{1-\alpha }\);
-
5.
Report the p-value as \(1 - \hat F^\circ _N(T)\).
To construct a confidence region for μ, one can use (8.52) with t 1−α therein replaced by the empirical quantile \(\hat t^\circ _{1-\alpha }\).
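Putting the pieces together, Algorithm 1 admits the following compact implementation (a sketch: the exact normalization of \(Y_j^\circ\) in (8.58) is our assumption, chosen so that, given \(A_j^\circ\), the difference has mean 0 and covariance Σ for any μ):

```python
import numpy as np

def balanced_halfsample_quantile(X, g, alpha=0.05, N=1000, seed=0):
    # Balanced Rademacher weighted differencing. For each j draw A_j uniformly
    # from {A : |A| = m}, m = floor(n/2), and form the scaled difference of the
    # two half-sample means,
    #   Y_j = (1/|A_j| + 1/|A_j^c|)^(-1/2) * (Xbar_{A_j} - Xbar_{A_j^c}),
    # then estimate t_{1-alpha} by the empirical quantile of g(Y_1),...,g(Y_N).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = n // 2
    scale = (1.0 / m + 1.0 / (n - m)) ** -0.5
    vals = np.empty(N)
    for j in range(N):
        mask = np.zeros(n, dtype=bool)
        mask[rng.choice(n, size=m, replace=False)] = True
        vals[j] = g(scale * (X[mask].mean(axis=0) - X[~mask].mean(axis=0)))
    return np.quantile(vals, 1 - alpha), vals

def rademacher_test(X, g, alpha=0.05, N=1000, seed=0):
    # Steps 1-5 of Algorithm 1: statistic, empirical cutoff, decision, p-value.
    n = X.shape[0]
    T = g(np.sqrt(n) * X.mean(axis=0))
    cutoff, vals = balanced_halfsample_quantile(X, g, alpha, N, seed)
    pvalue = np.mean(vals >= T)   # fraction of g(Y_j) exceeding T
    return T, cutoff, T > cutoff, pvalue
```

Note that no estimate of Σ appears anywhere: the half-sample differences play the role of the Gaussian replicates g(Y 2), …, g(Y n) of Sect. 8.5.1.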
6 Numerical Experiments
In this section, we shall perform a simulation study and evaluate the finite-sample performance of our Algorithm 1 with \(\hat {F}_N^\circ (t)\) defined in (8.60). Tests for mean vectors and covariance matrices are considered in Sects. 8.6.1 and 8.6.2, respectively. Section 8.6.3 contains a real data application on testing correlations between different pathways of a pancreatic ductal adenocarcinoma dataset.
6.1 Test of Mean Vectors
We consider three different testing functionals: for \(x=(x_1,\ldots,x_p)^\top \in \mathbb {R}^p\), let
For the L ∞ form g 1(x), four different testing procedures are compared: the procedure using our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) replaced by \(\hat {F}_N (\cdot )\); cf (8.59); or by
and ε ji are i.i.d. Bernoulli(1∕2) independent of (X ij); the test of Fan et al. (2007) (FHY, see (8.20) and (8.23)) and the Gaussian Multiplier Bootstrap method in Chernozhukov et al. (2014) (CCK, see (8.30)).
For g 2(x), we compare the performance of our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\), \(\hat {F}_N(\cdot )\) and \(\hat {F}_N^\dagger (\cdot )\), and also the CLT-based procedure of Chen and Qin (2010) (CQ), which is a variant of (8.45) with the numerator \(n \bar X_n^T \bar X_n - f_1\) therein replaced by \(n^{-1} \sum _{i\neq j}X_i^\top X_j\).
The portmanteau testing functional g 3(x) is a marked weighted empirical process.
For our Algorithm 1 and the Gaussian Multiplier Bootstrap method, we calculate the empirical cutoff values with N = 4000. For each functional, we consider two models and use n = 40, 80 and p = 500, 1000. The empirical sizes for each case are calculated based on 1000 simulations.
Example 1 (Factor Model)
Let Z ij be i.i.d. N(0, 1) and consider
Then the X i are i.i.d. N(0, Σ) with \(\Sigma = I_p + p^{2\delta} \mathbf {1} \mathbf {1}^\top \), where 1 = (1, …, 1)⊤. A larger δ implies stronger correlation among the entries X i1, …, X ip.
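One way to realize this factor model (our reading of (8.66); the latent factor z i and the function name are ours) is:

```python
import numpy as np

def simulate_factor_model(n, p, delta, seed=0):
    # X_ij = Z_ij + p^delta * z_i with Z_ij, z_i iid N(0,1), so that the rows
    # X_i are iid N(0, Sigma) with Sigma = I_p + p^(2*delta) * 1 1^T.
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p))
    z = rng.standard_normal(n)          # common latent factor
    return Z + (p ** delta) * z[:, None]
```

The off-diagonal entries of Σ all equal p 2δ, which is why increasing δ strengthens the dependence.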
Table 8.1 reports empirical sizes for the factor model with g 1(⋅) at the 5% significance level. For each choice of p, n, and δ, our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N (\cdot )\) perform reasonably well, while the empirical sizes using \(\hat {F}_{N}^\dagger (\cdot )\) are generally slightly larger than 5%. The empirical sizes using Chernozhukov et al.’s (8.30) or Fan et al.’s (8.23) are substantially different from the nominal level 5%. For large δ, as expected, the procedure of Fan, Hall, and Yao can be very conservative.
The empirical sizes for the factor model using g 2(⋅) are summarized in Table 8.2. Our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N(\cdot )\) perform quite well. The empirical sizes for Chen and Qin’s procedure deviate significantly from 5%. This can be explained by the fact that CLT of type (8.45) is no longer valid for model (8.66); see the discussion following (8.45) and Theorem 2.2 in Xu et al. (2014).
When using functional g 3(x), our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N (\cdot )\) perform slightly better than \(\hat {F}_N^\dagger (\cdot )\) and approximate the nominal 5% level well (Table 8.3).
Example 2 (Multivariate t-Distribution)
Consider the multivariate t ν vector
where the degrees of freedom ν = 4, \(\Sigma =(\sigma _{jk})_{j,k=1}^p\), σ jj = 1 for j = 1, …, p and
and Y i ∼ N(0, Σ), \(W_i \sim \chi _{\nu }^2\) are independent. The above covariance structure allows long-range dependence among X i1, …, X ip; see Veillette and Taqqu (2013).
We summarize the simulated sizes for model (8.67) in Tables 8.4, 8.5, and 8.6. As in Example 1, similar conclusions apply here. Due to long-range dependence, the procedure of Fan, Hall, and Yao appears conservative. The Gaussian Multiplier Bootstrap (8.30) yields empirical sizes that are quite different from 5%. The CLT-based procedure of Chen and Qin is severely affected by the dependence. In practice we suggest using Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) which has a good size accuracy.
6.2 Test of Covariance Matrices
6.2.1 Size Accuracy
We first consider testing for H 0a : Σ = I for the following model:
where the ε ij are i.i.d. (1) standard normal; (2) centralized Gamma(4,1); or (3) Student t 5 random variables. We then study the second test H 0b : Σ1,2 = 0 by partitioning the entire random vector X i = (X i1, …, X ip)T equally into two subvectors of dimensions p 1 = p∕2 and p 2 = p − p 1. In the simulation, we generate the samples of the two subvectors independently according to model (8.68). We shall use Algorithm 1 with the L 2 functional. Tables 8.7 and 8.8 report the simulated sizes based on 1000 replications with N = 1000 half-sampling implementations; they are reasonably close to the nominal level 5%.
6.2.2 Power Curve
To assess the power for testing H 0 : Σ = Ip using the L 2 test, we consider the model
where ε ij and ζ i are i.i.d. Student t 5 and ρ is chosen to be 0, 0.02, 0.04, …, 0.7. The power curve is shown in Fig. 8.1. As expected, the power increases with n.
6.3 A Real Data Application
We now apply our testing procedures to a pancreatic ductal adenocarcinoma (PDAC) dataset, preprocessed from NCBI’s Gene Expression Omnibus and accessible through GEO Series accession number GSE28735 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28735). The dataset consists of two classes of gene expression levels, from 45 pancreatic tumor patients and 45 pancreatic normal patients, over a total of 28,869 genes. We shall test the existence of correlations between two subvectors, which can be useful for identifying sets of genes that are significantly correlated.
We consider genetic pathways of the PDAC dataset. Pathways can be highly significantly associated with the disease even if they harbor only a small number of individually significant genes. According to the KEGG database, the pathway “hsa05212” is relevant to pancreatic cancer. Among the 28,869 genes, 66 are mapped to this pathway. We are interested in testing whether the pathway for pancreatic cancer is correlated with some common pathways: “hsa04950” (21 genes, “Maturity onset diabetes of the young”), “hsa04940” (59 genes, “Type I diabetes mellitus”), and “hsa04972” (87 genes, “Pancreatic secretion”). Let W i, X i, Y i, and Z i be the expression levels of individual i from the tumor group for pathways “hsa05212,” “hsa04950,” “hsa04940,” and “hsa04972,” respectively. The null hypotheses are \(H^T_{0 1}: \mathrm {cov}(W_i, X_i) = 0_{66 \times 21}\), \(H^T_{0 2}: \mathrm {cov}(W_i, Y_i) = 0_{66 \times 59}\) and \(H^T_{0 3}: \mathrm {cov}(W_i, Z_i) = 0_{66 \times 87}\). Similar null hypotheses \(H^N_{0 1}, H^N_{0 2}, H^N_{0 3}\) can be formulated for the normal group. Our L 2 test of Algorithm 1 is compared with the Gaussian multiplier bootstrap (8.30). The results are summarized in Table 8.9. The CCK test is not able to reject the null hypothesis \(H^T_{0 3}\) at the 5% level since it gives a p-value of 0.063291. Using the L 2 test, however, \(H^T_{0 3}\) is rejected, suggesting that there is a substantial correlation between pathways “hsa05212” and “hsa04972.” Similar claims can be made for the other cases. The L 2 test also suggests that, at the 0.1% level, the hypotheses \(H^T_{0 2}\) and \(H^T_{0 3}\) are rejected for the tumor group, while the hypotheses \(H^N_{0 2}\) and \(H^N_{0 3}\) are not rejected for the normal group.
References
Ahmad MR (2010) Tests for covariance matrices, particularly for high dimensional data. Technical Reports, Department of Statistics, University of Munich. http://epub.ub.uni-muenchen.de/11840/1/tr091.pdf. Accessed 3 Apr 2018
Bai ZD, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6:311–329
Bai ZD, Jiang DD, Yao JF, Zheng SR (2009) Corrections to LRT on large-dimensional covariance matrix by RMT. Ann Stat 37:3822–3840
Bickel PJ, Levina E (2008a) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
Bickel PJ, Levina E (2008b) Covariance regularization by thresholding. Ann Stat 36:2577–2604
Birke M, Dette H (2005) A note on testing the covariance matrix for large dimension. Stat Probab Lett 74:281–289
Brent RP, Osborn JH, Smith WD (2015) Probabilistic lower bounds on maxima determinants of binary matrices. Available at http://arxiv.org/pdf/1501.06235. Accessed 3 Apr 2018
Cai Y, Ma ZM (2013) Optimal hypothesis testing for high dimensional covariance matrices. Bernoulli 19:2359–2388
Cai T, Liu WD, Luo X (2011) A constrained l 1 minimization approach to sparse precision matrix estimation. J Am Stat Assoc 106:594–607
Chen SX, Qin Y-L (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38:808–835
Chen SX, Zhang L-X, Zhong P-S (2010) Tests for high-dimensional covariance matrices. J Am Stat Assoc 105:810–819
Chen XH, Shao QM, Wu WB, Xu LH (2016) Self-normalized Cramér type moderate deviations under dependence. Ann Stat 44:1593–1617
Chernozhukov V, Chetverikov D, Kato K (2013) Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann Stat 41:2786–2819
Dickhaus T (2014) Simultaneous statistical inference: with applications in the life sciences. Springer, Heidelberg
Dudoit S, van der Laan M (2008) Multiple testing procedures with applications to genomics. Springer, New York
Efron B (2010) Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Cambridge University Press, Cambridge
Fan J, Hall P, Yao Q (2007) To how many simultaneous hypothesis tests can normal, Student’s t or bootstrap calibration be applied. J Am Stat Assoc 102:1282–1288
Fan J, Liao Y, Mincheva M (2013) Large covariance estimation by thresholding principal orthogonal complements. J R Stat Soc Ser B Stat Methodol 75:603–680
Fisher TJ, Sun XQ, Gallagher CM (2010) A new test for sphericity of the covariance matrix for high dimensional data. J Multivar Anal 101:2554–2570
Georgiou S, Koukouvinos C, Seberry J (2003) Hadamard matrices, orthogonal designs and construction algorithms. In: Designs 2002: further computational and constructive design theory. Kluwer, Boston, pp 133–205
Han YF, Wu WB (2017) Test for high dimensional covariance matrices. Submitted to Ann Stat
Hedayat A, Wallis WD (1978) Hadamard matrices and their applications. Ann Stat 6:1184–1238
Jiang TF (2004) The asymptotic distributions of the largest entries of sample correlation matrices. Ann Appl Probab 14:865–880
Jiang DD, Jiang TF, Yang F (2012) Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. J Stat Plann Inference 142:2241–2256
Ledoit O, Wolf M (2002) Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann Stat 30:1081–1102
Liu WD, Shao QM (2013) A Cramér moderate deviation theorem for Hotelling’s T 2-statistic with applications to global tests. Ann Stat 41:296–322
Lou ZP, Wu WB (2018) Construction of confidence regions in high dimension (Paper in preparation)
Marčenko VA, Pastur LA (1967) Distribution of eigenvalues for some sets of random matrices. Math USSR Sb 1:457–483
Onatski A, Moreira MJ, Hallin M (2013) Asymptotic power of sphericity tests for high-dimensional data. Ann Stat 41:1204–1231
Portnoy S (1986) On the central limit theorem in \(\mathbb {R}^p\) when p →∞. Probab Theory Related Fields 73:571–583
Qu YM, Chen SX (2012) Test for bandedness of high-dimensional covariance matrices and bandwidth estimation. Ann Stat 40:1285–1314
Schott JR (2005) Testing for complete independence in high dimensions. Biometrika 92:951–956
Schott JR (2007) A test for the equality of covariance matrices when the dimension is large relative to the sample size. Comput Stat Data Anal 51:6535–6542
Srivastava MS (2005) Some tests concerning the covariance matrix in high-dimensional data. J Jpn Stat Soc 35:251–272
Srivastava MS (2009) A test for the mean vector with fewer observations than the dimension under non-normality. J Multivar Anal 100:518–532
Veillette MS, Taqqu MS (2013) Properties and numerical evaluation of the Rosenblatt distribution. Bernoulli 19:982–1005
Wu WB (2005) Nonlinear system theory: another look at dependence. Proc Natl Acad Sci USA 102:14150–14154 (electronic)
Wu WB (2011) Asymptotic theory for stationary processes. Stat Interface 4:207–226
Wu WB, Shao XF (2004) Limit theorems for iterated random functions. J Appl Probab 41:425–436
Xiao H, Wu WB (2013) Asymptotic theory for maximum deviations of sample covariance matrix estimates. Stoch Process Appl 123:2899–2920
Xu M, Zhang DN, Wu WB (2014) L2 asymptotics for high-dimensional data. Available at http://arxiv.org/pdf/1405.7244v3. Accessed 3 Apr 2018
Yarlagadda RK, Hershey JE (1997) Hadamard matrix analysis and synthesis. Kluwer, Boston
Zhang RM, Peng L, Wang RD (2013) Tests for covariance matrix with fixed or divergent dimension. Ann Stat 41:2075–2096
© 2018 Springer International Publishing AG, part of Springer Nature

Wu, W.B., Lou, Z., Han, Y. (2018). Hypothesis Testing for High-Dimensional Data. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_8

Print ISBN: 978-3-319-18283-4. Online ISBN: 978-3-319-18284-1.