
1 Introduction

With the advance of modern data collection techniques, high-dimensional data appear in various fields including physics, biology, healthcare, finance, marketing, social networks, and engineering, among others. A common feature of such datasets is that the data dimension, or the number of parameters involved, can be quite large. As a fundamentally important problem in the study of such data, one would like to perform statistical inference on those parameters, such as multiple testing or the construction of confidence regions. One can thereby answer the question whether there is signal in the dataset, or whether the dataset consists only of random noise. Due to the high dimensionality, inferential procedures developed for low-dimensional problems may no longer be valid in the high-dimensional setting, and different approaches must be designed to account for it. There exists a huge literature on multiple testing; see, for example, Dudoit and van der Laan (2008), Efron (2010) and Dickhaus (2014).

We now introduce the setting of our testing problem. Assume that X 1, X 2, …, are independent and identically distributed (i.i.d.) p-dimensional random vectors, with mean vector μ = (μ 1, …, μ p)T = E(X i) and covariance matrix \(\Sigma = \mathrm{cov}(X_i) = (\sigma_{jk})_{1 \le j, k \le p}\). We are testing the hypothesis of existence of a signal

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H_0: \mu = 0 \mbox{ vs } H_A: \mu \neq 0 \end{array} \end{aligned} $$
(8.1)

based on the sample X 1, …, X n. This formulation is actually very general and its solution can be applied to many other problems; see Sect. 8.2. We can estimate μ by the sample mean vector \(\hat \mu = \bar X_n = n^{-1} \sum _{i=1}^n X_i\). The classical Hotelling’s T-squared test has the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} T = \bar X_n^T \hat \Sigma_n^{-1} \bar X_n, \end{array} \end{aligned} $$
(8.2)

where

$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat \Sigma_n = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar X_n) (X_i - \bar X_n)^T \end{array} \end{aligned} $$
(8.3)

is the sample covariance matrix estimate of Σ. If p is small and fixed, by the Central Limit Theorem (CLT),

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sqrt{n} (\bar X_n - \mu) \Rightarrow N(0, \Sigma). \end{array} \end{aligned} $$
(8.4)

By the Law of Large Numbers, if Σ is non-singular,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat \Sigma_n^{-1} \to \Sigma^{-1} \mbox{ almost surely.} \end{array} \end{aligned} $$
(8.5)

Clearly (8.4) and (8.5) imply that under H 0, the Hotelling’s T-squared statistic \(n T \Rightarrow \chi ^2_p\) (χ 2 distribution with degrees of freedom p). Thus we can reject H 0 at level 0 < α < 1 if \(n T > \chi ^2_{p, 1-\alpha }\), the (1 − α)th quantile of \(\chi ^2_p\).

In the high-dimensional situation in which p can be much larger than n, the CLT (8.4) is no longer valid; see Portnoy (1986). Furthermore, \(\hat \Sigma _n\) is singular when p > n and thus T is not well-defined. Also the matrix convergence (8.5) may not hold; see Marčenko and Pastur (1967). In this chapter we shall apply a testing functional approach that does not use \(\hat \Sigma _n^{-1}\) or the precision matrix \(\Sigma^{-1}\). A function \(g: \mathbb {R}^p \to [0, \infty )\) is said to be a testing functional if the following requirements are satisfied: (1) (monotonicity) for any \(x = (x_1, \ldots, x_p)^T \in \mathbb {R}^p\) and 0 < c < 1, g(cx) ≤ g(x); (2) (identifiability) g(x) = 0 if and only if x = 0. We shall consider the test statistic

$$\displaystyle \begin{aligned} \begin{array}{rcl} T_n = g(\sqrt{n} \bar X_n). \end{array} \end{aligned} $$
(8.6)

Examples of g include the L 2-based test with \(g(x) = \sum _{j=1}^p x_j^2\), the L ∞-based test with \(g(x) = \max_{j \le p} |x_j|\), and the weighted empirical process \(g(x) = \sup _{u \ge 0} ( \sum _{j=1}^p \mathbf {1} _{|x_j| \ge u} h(u) )\), where h(⋅) is a nonnegative-valued non-decreasing function, among others. We reject H 0 in (8.1) if T n is too large.
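As a concrete illustration, the following minimal sketch (in Python with NumPy; the function names are ours) evaluates these three testing functionals. For a non-decreasing h, the supremum in the weighted empirical process is attained at an order statistic of the |x j|, which the code exploits.

```python
import numpy as np

def g_l2(x):
    # L2 testing functional: sum of squared coordinates.
    return np.sum(x ** 2)

def g_sup(x):
    # L-infinity testing functional: largest absolute coordinate.
    return np.max(np.abs(x))

def g_wep(x, h=lambda u: u ** 2):
    # Weighted empirical process: sup_{u >= 0} h(u) * #{j : |x_j| >= u}.
    # For non-decreasing h the supremum is attained at some u = |x|_(k),
    # the k-th largest absolute coordinate, where the count equals k.
    a = np.sort(np.abs(x))[::-1]          # |x|_(1) >= |x|_(2) >= ...
    counts = np.arange(1, a.size + 1)
    return np.max(counts * h(a))

# toy usage: T_n = g(sqrt(n) * Xbar_n) under H0 with Sigma = I_p
n, p = 40, 500
X = np.random.randn(n, p)
v = np.sqrt(n) * X.mean(axis=0)
print(g_l2(v), g_sup(v), g_wep(v))
```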

As a theoretical foundation, we base our testing procedure on the following invariance principle result

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sup_{t \in \mathbb{R}} |P[ g(\sqrt{n} (\bar X_n-\mu)) \le t] - P[ g(\sqrt{n} \bar Z_n) \le t]| \to 0, \end{array} \end{aligned} $$
(8.7)

where Z, Z 1, Z 2, … are i.i.d. N(0, Σ) random vectors and \(\bar Z_n = n^{-1} \sum _{i=1}^n Z_i =_{\mathcal {D}} n^{-1/2} Z\). Interestingly, though the CLT (8.4) does not generally hold in the high-dimensional setting, the testing functional form (8.7) may still be valid. Chernozhukov et al. (2014) proved (8.7) with the L ∞ norm \(g(x) = \max_{j \le p} |x_j|\), while Xu et al. (2014) considered the L 2 based test with \(g(x) = \sum _{j=1}^p x_j^2\). In Sect. 8.5 we shall provide a sufficient condition so that (8.7) holds for certain testing functionals.

In applying (8.7) for testing (8.1), one needs to know the distribution of \(g(\sqrt {n} \bar Z_n) =_{\mathcal {D}} g(Z)\) so that a suitable cutoff value can be obtained. The latter problem is highly nontrivial since the covariance matrix Σ, which is viewed as a nuisance parameter here, is typically not known and the associated estimation issue can be quite challenging. In Sect. 8.5 we shall propose a half-sampling technique which can avoid estimating the nuisance covariance matrix Σ.

2 Applications

Our paradigm (8.1) is actually quite general and it can be applied to testing of high-dimensional covariance matrices, testing of independence of high-dimensional data, and analysis of variance with non-normal and heteroscedastic errors.

2.1 Testing of Covariance Matrices

There is a huge literature on testing covariance structures such as uncorrelatedness, sphericity, or other patterns. For Gaussian data, tests for Σ = σ 2Ip, where Ip is the identity matrix, can be found in Ahmad (2010), Birke and Dette (2005), Chen et al. (2010), Fisher et al. (2010) and Ledoit and Wolf (2002). Tests for equality of covariance matrices are studied in Bai et al. (2009) and Jiang et al. (2012), and for sphericity in Onatski et al. (2013). Minimax properties are considered in Cai and Ma (2013). For other contributions, see Qu and Chen (2012), Schott (2005, 2007), Srivastava (2005), Xiao and Wu (2013) and Zhang et al. (2013).

Assume that we have a data matrix \(Y_n = (Y_{i,j})_{1 \le i \le n, 1 \le j \le p}\), where \((Y_{i, j})_{j=1}^p\), i = 1, …, n, are i.i.d. p-dimensional random vectors. Let

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sigma_{ j k} = \mathrm{cov}(Y_{1,j}, \, Y_{1, k}), \quad 1 \le j, k \le p, \end{array} \end{aligned} $$
(8.8)

be the covariance function. Consider testing hypothesis for uncorrelatedness:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H_0: \sigma_{j k} = 0 \mbox{ for all } j\neq k. \end{array} \end{aligned} $$
(8.9)

For simplicity assume that E(Y i,j) = 0. For a pair a = (j, k) write X i,a = Y i,j Y i,k, \(\bar X_a = n^{-1} \sum _{i=1}^n X_{i, a}\), and let \(\bar X = (\bar X_a)_{a \in {\mathcal {A}}}\) be the (p 2 − p)-dimensional vector with index set \({\mathcal {A}} = \{(j, k): \, j\neq k, \, j \le p, k \le p\}\). The hypothesis H 0 in (8.9) can be tested by using the test statistic \(T= g(\sqrt {n} \bar X)\). Xiao and Wu (2013) considered the L ∞ based test with \(g(x) = \max_i |x_i|\), generalizing the result in Jiang (2004) which concerns the special case of i.i.d. vectors with independent entries. Han and Wu (2017) performed an L 2 based test for patterns of covariances with the test statistic

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} T = \sum_{a \in {\mathcal{A}}} \bar X_a^2 = \sum_{j \neq k} \hat \sigma_{j k}^2. \end{array} \end{aligned} $$
(8.10)
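Since \(\bar X_a = \hat \sigma_{jk}\) for mean-zero data, the statistic (8.10) can be computed directly from the sample covariance matrix. A minimal sketch, assuming as in the text that the columns of the data matrix have mean zero (the function name is ours):

```python
import numpy as np

def cov_l2_stat(Y):
    # L2-type statistic (8.10): sum of squared off-diagonal sample
    # covariances, with sigma_hat_{jk} = n^{-1} sum_i Y_{ij} Y_{ik}.
    n = Y.shape[0]
    S = Y.T @ Y / n                 # sample covariance for mean-zero data
    off = S - np.diag(np.diag(S))   # zero out the diagonal entries
    return np.sum(off ** 2)

# toy usage: under H0 (8.9) all off-diagonal covariances are zero
Y = np.random.randn(100, 50)
print(cov_l2_stat(Y))
```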

With slight modifications, one can also test the sphericity hypothesis

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H_0: \Sigma = \sigma^2 \mathrm{I}_p \mbox{ for some } \sigma^2 > 0, \end{array} \end{aligned} $$
(8.11)

where Ip is the p × p identity matrix. Let \({\mathcal {A}}_0 = \{(j, k): \, j, k \le p\}\) be \({\mathcal {A}}\) with the diagonal pairs added. For \(a = (j, j) \in {\mathcal {A}}_0\), let \(X_{i, a} = Y_{i, j}^2 - \sigma ^2\). If σ 2 is known, then H 0 in (8.11) can be rejected at level α ∈ (0, 1) if \(T = g(\sqrt {n} \bar X) > t_{1-\alpha }\), where t 1−α is the (1 − α)th quantile of g(Z) and Z is a centered Gaussian vector with covariance structure cov(Z a, Z b) = E(X i,a X i,b), \(a, b \in {\mathcal {A}}_0\). In the case that σ 2 is not known, we shall use an estimate. For example, we can let \(\hat \sigma ^2 = p^{-1} \sum _{j=1}^p \hat \sigma _{j j}\), and consider \(X_{i, a}^\circ = Y_{i, j}^2 - \hat \sigma ^2\) for a = (j, j). Let \(X_{i, a}^\circ = X_{i, a}\) if a = (j, k) with j ≠ k. The hypothesis H 0 in (8.11) can be tested by the statistic \(T^\circ = g(\sqrt {n} \bar X^\circ )\).

2.2 Testing of Independence

Let \(Y_i = (Y_{i, j})_{j=1}^p\), i = 1, …, n, be i.i.d. p-dimensional random vectors with joint cumulative distribution function

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} F_{j_1, \ldots, j_d} (y_{j_1}, \ldots, y_{j_d}) = P(Y_{i, j_1} \le y_{j_1}, \ldots, Y_{i, j_d} \le y_{j_d}). \end{array} \end{aligned} $$
(8.12)

Consider the problem of testing whether the entries of Y i are independent. Assume that the marginal distributions are standard uniform on [0, 1]. For j = (j 1, …, j d), write \(F_{\mathbf {j}} (y_{\mathbf {j}}) = F_{j_1, \ldots, j_d} (y_{j_1}, \ldots, y_{j_d})\). For fixed d, the hypothesis of d-wise independence is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H_0: \,\, F_{\mathbf{j}} (y_{\mathbf{j}}) = y_{j_1} \ldots y_{j_d} \mbox{ holds for all } y_{j_1}, \ldots, y_{j_d} \in (0, 1) \mbox{ and } \mathbf{j} \in {\mathcal{A}}_d, \end{array} \end{aligned} $$
(8.13)

where \({\mathcal {A}}_d = \{ \mathbf {j} = (j_1, \ldots, j_d): \, j_1 < \cdots < j_d \le p \}\). Pairwise and triple-wise independence correspond to d = 2 and d = 3, respectively. We estimate F j(y j) by the empirical cdf

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat F_{\mathbf{j}}(y_{\mathbf{j}}) = {1\over n} \sum_{i=1}^n {\mathbf{1}}_{ Y_{i, \mathbf{j}} \le y_{\mathbf{j}}}, \end{array} \end{aligned} $$
(8.14)

where the notation Y i,j ≤ y j means \(Y_{i, j_h} \le y_{j_h}\) for all h = 1, …, d. Let \(y_{\mathbf { m}_1}, \ldots, y_{{\mathbf {m}}_N}\), N → ∞, be a dense subset of [0, 1]d. For example, we can choose the lattice set {1∕K, …, (K − 1)∕K}d with N = (K − 1)d. Let X i, 1 ≤ i ≤ n, be the Np!∕(d!(p − d)!)-dimensional vector with the (ℓ, j)th component being \({\mathbf {1}}_{ Y_{i, \mathbf {j}} \le y_{{\mathbf {m}}_\ell }} - \prod _{h \in {\mathbf {m}}_\ell } y_h \), 1 ≤ ℓ ≤ N, \(\mathbf {j} \in {\mathcal {A}}_d\). Then the L 2-based test for (8.13) on the dense set \((y_{{\mathbf {m}}_\ell } )_{\ell =1}^N\) has the form \(n | \bar X |{ }_2^2\).
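For d = 2 this construction is easy to code. A sketch (names ours) that assembles the L 2 statistic \(n|\bar X|_2^2\) over the lattice {1∕K, …, (K − 1)∕K}2 for all pairs, assuming uniform [0, 1] margins:

```python
import numpy as np
from itertools import combinations

def pairwise_indep_l2_stat(Y, K=5):
    # L2 statistic n * |Xbar|_2^2 for pairwise (d = 2) independence:
    # empirical joint cdf (8.14) minus the product of the margins,
    # evaluated on the lattice {1/K, ..., (K-1)/K}^2.
    n, p = Y.shape
    grid = np.arange(1, K) / K
    stat = 0.0
    for j, k in combinations(range(p), 2):
        for yj in grid:
            for yk in grid:
                xbar = np.mean((Y[:, j] <= yj) & (Y[:, k] <= yk)) - yj * yk
                stat += xbar ** 2
    return n * stat

Y = np.random.rand(200, 10)   # independent uniforms, so H0 (8.13) holds
print(pairwise_indep_l2_stat(Y))
```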

2.3 Analysis of Variance

Consider the following two-way ANOVA model

$$\displaystyle \begin{aligned} \begin{array}{rcl} Y_{i j k} = \mu + \alpha_i + \beta_j + \delta_{i j} + \varepsilon_{i j k}, i = 1, \ldots, I, j = 1, \ldots, J, k = 1, \ldots, K,\quad \end{array} \end{aligned} $$
(8.15)

where μ is the grand mean, α i and β j are the main effects of the first and the second factors, respectively, and δ ij are the interaction effects. Assume that \((Y_{ijk})_{i \le I, j \le J}\), k = 1, …, K, are i.i.d. Consider the hypothesis of no interaction:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H_0: \, \delta_{i j} = 0 \mbox{ for all } i = 1, \ldots, I, \,\, j = 1, \ldots, J. \end{array} \end{aligned} $$
(8.16)

In the classical ANOVA procedure, one assumes that ε ijk, i ≤ I, j ≤ J, k ≤ K, are i.i.d. N(0, σ 2) and makes use of the fact that the sum of squares

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} SS_I = \sum_{i=1}^I \sum_{j=1}^J (\bar Y_{i j \cdot} - \bar Y_{i \cdot \cdot} - \bar Y_{\cdot j \cdot} + \bar Y_{\cdot \cdot \cdot})^2 \end{array} \end{aligned} $$
(8.17)

is such that \(K \cdot SS_I \sim \sigma ^2 \chi ^2_{(I-1)(J-1)}\) under H 0. Here \(\bar Y_{i j \cdot } = K^{-1} \sum _{k=1}^K Y_{i j k}\) and the other sample averages \(\bar Y_{i \cdot \cdot }\), \(\bar Y_{\cdot j \cdot }\) and \(\bar Y_{\cdot \cdot \cdot }\) are similarly defined. The null hypothesis H 0 is rejected at level α ∈ (0, 1) if

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} { {K \cdot SS_I} \over {(I-1)(J-1)}} > SS_E F_{(I-1)(J-1), I J (K-1), 1-\alpha} \end{array} \end{aligned} $$
(8.18)

where F (I−1)(J−1),IJ(K−1),1−α is the (1 − α)th quantile of the F-distribution F (I−1)(J−1),IJ(K−1) and

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} SS_E = {{ \sum_{i=1}^I \sum_{j=1}^J \sum_{k=1}^K (Y_{i j k} - \bar Y_{i j \cdot})^2 }\over {I J (K-1)}} \end{array} \end{aligned} $$
(8.19)

is an estimate of σ 2.

The classical ANOVA procedure can be invalid when the assumption that ε ijk, i ≤ I, j ≤ J, k ≤ K, are i.i.d. N(0, σ 2) is violated. In the latter case SS I may no longer have a χ 2 distribution. However we can still approximate the distribution of SS I in terms of (8.7). For a = (i, j) let \(X_{a k} = Y_{i j k} - \bar Y_{i \cdot k} - \bar Y_{\cdot j k} + \bar Y_{\cdot \cdot k}\), where the bar denotes averaging over the omitted index. Then \(SS_I = \sum _{a \in {\mathcal {A}}} \bar X_{a \cdot }^2\), where \(\bar X_{a \cdot } = K^{-1} \sum _{k=1}^K X_{a k}\) and \({\mathcal{A}} = \{(i, j): \, i \le I, \, j \le J\}\).
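The reformulation above is straightforward to vectorize. A sketch for data stored as an I × J × K array (the function name is ours):

```python
import numpy as np

def ss_interaction(Y):
    # SS_I of (8.17) computed as sum_a (Xbar_{a.})^2, with
    # X_{ak} = Y_{ijk} - Ybar_{i.k} - Ybar_{.jk} + Ybar_{..k}, a = (i, j).
    X = (Y
         - Y.mean(axis=1, keepdims=True)        # Ybar_{i.k}
         - Y.mean(axis=0, keepdims=True)        # Ybar_{.jk}
         + Y.mean(axis=(0, 1), keepdims=True))  # Ybar_{..k}
    Xbar = X.mean(axis=2)                       # Xbar_{a.}, averaging over k
    return np.sum(Xbar ** 2)

Y = np.random.randn(4, 5, 10)   # I = 4, J = 5, K = 10; no interaction
print(ss_interaction(Y))
```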

3 Tests Based on L ∞ Norms

Fan et al. (2007) considered the L ∞ norm based test of (8.1) of the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} M_n = \max_{j \le p} { {\sqrt{n} |\hat \mu_j - \mu_j|} \over \hat \sigma_j }, \mbox{ where } \hat \sigma_j^2 = {1\over n} \sum_{i=1}^n (X_{i j} - \hat \mu_j)^2. \end{array} \end{aligned} $$
(8.20)

Assume that the dimension p satisfies

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \log p = o(n^{1/3}) \end{array} \end{aligned} $$
(8.21)

and the uniformly bounded third moment condition

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \max_{j \le p} E|X_{i j} - \mu_j|{}^3 = O(1). \end{array} \end{aligned} $$
(8.22)

Let Φ be the standard normal cumulative distribution function and z α = Φ−1(α). Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} P(M_n \ge z_{1-\alpha / (2 p)}) \le \alpha + o(1). \end{array} \end{aligned} $$
(8.23)

Namely, if we perform the test by rejecting H 0 of (8.1) whenever M n ≥ z 1−α∕(2p), the familywise type I error of the latter test is asymptotically bounded by α. As a finite sample correction, the cutoff value z 1−α∕(2p) in (8.23) can be replaced by the t-distribution quantile t n−1,1−α∕(2p) with n − 1 degrees of freedom, noting that \((n-1)^{1/2} \hat \mu _j / \hat \sigma _j \sim t_{n-1}\) if the X ij are Gaussian. Due to the Bonferroni correction, the test by Fan et al. (2007) can be quite conservative if the dependence among the entries of X i is strong. For example, if X i1 = X i2 = ⋯ = X ip, then instead of using the cutoff value z 1−α∕(2p), one should use z 1−α∕2, since the cutoff value z 1−α∕(2p) leads to the extremely conservative type I error α∕p. If the entries of X i are independent and X i is Gaussian, then the type I error is \(1 - (1 - \alpha/p)^p \to 1 - e^{-\alpha}\), which is only slightly conservative. For example, when α = 0.05, \(1 - e^{-\alpha} = 0.04877058\).
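A minimal sketch of this max-type test with the t-quantile finite-sample correction (using SciPy for the quantile; the function name is ours):

```python
import numpy as np
from scipy import stats

def fhy_max_test(X, alpha=0.05):
    # Max statistic (8.20) with the Bonferroni cutoff of (8.23), using
    # the t_{n-1} quantile as the finite-sample correction.
    n, p = X.shape
    mu_hat = X.mean(axis=0)
    sigma_hat = X.std(axis=0)        # 1/n normalization, as in (8.20)
    M_n = np.sqrt(n) * np.max(np.abs(mu_hat) / sigma_hat)
    cutoff = stats.t.ppf(1 - alpha / (2 * p), df=n - 1)
    return M_n, cutoff, M_n >= cutoff

X = np.random.randn(40, 500)         # H0: mu = 0
print(fhy_max_test(X))
```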

Liu and Shao (2013) obtained Gumbel convergence of M n under the following conditions: (1) for some r > 3, the uniformly bounded rth moment condition \(\max_{j \le p} E|X_{ij} - \mu_j|^r = O(1)\) holds, which is slightly stronger than (8.22); and (2) weak dependence among the entries of X i. For \(\Sigma = (\sigma_{jk})_{j,k \le p}\), assume that the correlation matrix \(R = (r_{jk})_{j,k \le p}\) with \(r_{jk} = \sigma _{j k} / (\sigma _{j j}^{1/2} \sigma _{k k}^{1/2})\) has the following property: for some γ > 0,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \max_{k \le p} \#\{j \le p: \, |r_{jk}| \ge (\log p)^{-1-\gamma}\} = O(p^\rho) \end{array} \end{aligned} $$
(8.24)

holds for all ρ > 0. Then under (8.21), Theorem 3.1 in Liu and Shao (2013) asserts the Gumbel convergence

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} M_n^2 - 2 \log p + \log \log p \Rightarrow {\mathcal{G}}, \end{array} \end{aligned} $$
(8.25)

where \({\mathcal {G}}\) follows the Gumbel distribution \(P({\mathcal {G}} \le y) = \exp (-e^{-y/2}/\pi ^{1/2})\). By (8.25), one can reject H 0 in (8.1) at level α ∈ (0, 1) based on the L ∞ norm test

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \max_{j \le p} { {n \hat \mu_j^2} \over \hat \sigma_j^2} > 2 \log p - \log \log p + g_{1-\alpha}, \end{array} \end{aligned} $$
(8.26)

where g 1−α is chosen such that \(P({\mathcal {G}} \le g_{1-\alpha }) = 1 - \alpha \). Clearly the latter test has an asymptotically correct size.

Applying Theorem 2.2 in Chernozhukov et al. (2014), we have the following Gaussian approximation result. Assume that there exist constants c 1, c 2 > 0 such that \(c_1 \le E(X_{ij} - \mu_j)^2 \le c_2\) holds for all j ≤ p, and assume that u = u n,p satisfies

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} P\left[\max_{j \le p} |X_{1 j} - \mu_j| \ge u \right] = o(n^{-1}). \end{array} \end{aligned} $$
(8.27)

Let \(m_k = \max_{j \le p} (E|X_{1j} - \mu_j|^k)^{1/k}\) and further assume that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} n^{-1/8} (m_3^{3/4} + m_4^{1/2} ) (\log (p n))^{7/8} + n^{-1/2} (\log (p n))^{3/2} u \to 0. \end{array} \end{aligned} $$
(8.28)

Let Z ∼ N(0, R). Then we have the Gaussian approximation result: as n → ∞,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sup_t|P(M_n \ge t) - P(|Z|{}_\infty \ge t)| \to 0. \end{array} \end{aligned} $$
(8.29)

Let t 1−α be the (1 − α)th quantile of \(|Z|_\infty\). The Gaussian approximation (8.29) leads to the following L ∞ norm based test: H 0 is rejected at level α if \(\max _{j \le p} \sqrt {n} |\hat \mu _j| / \hat \sigma _j \ge t_{1-\alpha }\). In comparison with the result in Fan et al. (2007), the latter test has an asymptotically correct size and it is dependence adjusted. To obtain an estimate for the cutoff value t 1−α, Chernozhukov et al. (2014) proposed a Gaussian Multiplier Bootstrap (GMB) method. Given X 1, …, X n, let \(\hat t_{1-\alpha }\) be such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} P\left(\max_{j \le p} n^{-1/2} | \sum_{i=1}^n X_{i j} e_i | \ge \hat t_{1-\alpha} | X_1, \ldots, X_n\right) = \alpha, \end{array} \end{aligned} $$
(8.30)

where the e i are i.i.d. N(0, 1) random variables independent of (X ij)i≥1,j≥1. Note that \( \hat t_{1-\alpha }\) can be numerically calculated by extensive Monte Carlo simulations. In Sect. 8.5 we shall propose a Hadamard matrix approach and a Rademacher weighted approach. The simulation study in Sect. 8.6 shows that, in terms of finite-sample performance, the latter approaches give more accurate sizes than the Gaussian Multiplier Bootstrap method (8.30).
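A sketch of the GMB cutoff computation (8.30) (the function name is ours; under H 0 the X ij already have mean zero, so no centering is applied, matching the display above):

```python
import numpy as np

def gmb_cutoff(X, alpha=0.05, N=4000, rng=None):
    # Gaussian multiplier bootstrap: resample the distribution of
    # max_j |n^{-1/2} sum_i X_{ij} e_i| with e_i i.i.d. N(0, 1), cf. (8.30).
    rng = np.random.default_rng(rng)
    n, p = X.shape
    vals = np.empty(N)
    for b in range(N):
        e = rng.standard_normal(n)
        vals[b] = np.max(np.abs(X.T @ e)) / np.sqrt(n)
    return np.quantile(vals, 1 - alpha)

X = np.random.randn(40, 500)
print(gmb_cutoff(X))
```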

Chen et al. (2016) generalized Fan, Hall and Yao's L ∞ norm test to high-dimensional dependent vectors. Assume that \((X_i)_{i \in \mathbb {Z}}\) is a p-dimensional stationary process of the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} X_t = G({\mathcal{F}}_t) = (G_1({\mathcal{F}}_t), \ldots, G_p({\mathcal{F}}_t))^T, \end{array} \end{aligned} $$
(8.31)

where ε t, \(t\in \mathbb {Z}\), are i.i.d. random variables, \({\mathcal {F}}_t = (\ldots, \varepsilon _{t-1}, \varepsilon _t)\) and G(⋅) is a measurable function such that X t is well-defined. Assume that the long-run covariance matrix

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Sigma_\infty = \sum_{i=-\infty}^\infty \mathrm{cov}(X_0, X_i) = (\omega_{jl})_{j,l \le p} \end{array} \end{aligned} $$
(8.32)

exists. Let \(\varepsilon _i^\ast, \varepsilon _j, i, j\in \mathbb {Z}\), be i.i.d. random variables. Assume that X t has finite rth moment, r > 2. Define the functional dependence measures (see, Wu 2005, 2011) as

$$\displaystyle \begin{aligned} \theta_r(m) = \max_{j \le p} \| X_{i j} - G_j(\ldots, \varepsilon_{i-m-2}, \varepsilon_{i-m-1}, \varepsilon^\ast_{i-m}, \varepsilon_{i-m+1}, \ldots, \varepsilon_i) \|{}_r. \end{aligned} $$
(8.33)

If the X i are i.i.d., then \(\Sigma_\infty = \Sigma\) and θ r(m) = 0 for m ≥ 1. We say that (X t) is geometric moment contracting (GMC; see Wu and Shao 2004) if there exist ρ ∈ (0, 1) and a 1 > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \theta_r(m) \le a_1 \rho^m = a_1 e^{-a_2 m} ~ {\mathrm{with}}~ a_2 = - \log \rho. \end{array} \end{aligned} $$
(8.34)

Let μ = EX t. To test the hypothesis H 0 in (8.1), Chen et al. (2016) introduced the following dependence-adjusted versions of Fan, Hall, and Yao's M n. Let n = mk, where m ≍ n 1∕4, and define the blocks B l = {i :  m(l − 1) + 1 ≤ i ≤ ml}. Let \(Y_{l j} = \sum _{i \in B_l} X_{i j}\), 1 ≤ j ≤ p, 1 ≤ l ≤ k, be the block sums. Define the block-normalized sum

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} M_n^\circ = \max_{j \le p} { {\sqrt{n} |\hat \mu_j - \mu_j|} \over \hat \sigma_j^\circ }, \mbox{ where } (\hat \sigma^\circ_j)^2 = {1\over m k} \sum_{l=1}^k (Y_{l j} - m \hat \mu_j)^2, \end{array} \end{aligned} $$
(8.35)

and the interlacing normalized sum: let \(k^* = k/2\), \(\mu ^\dagger _j = (m k^*)^{-1} \sum _{l=1}^{k^*} Y_{2 l j}\) and

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} M_n^\dagger = \max_{j \le p} { {\sqrt {n/2} |\mu^\dagger_j - \mu_j|} \over \hat \sigma_j^\dagger }, \mbox{ where } (\hat \sigma^\dagger_j)^2 = {1\over m k^*} \sum_{l=1}^{k^*} (Y_{2 l j} - m \mu_j^\dagger)^2. \end{array} \end{aligned} $$
(8.36)
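Both statistics are simple to compute from block sums. A sketch (names ours; the remainder when m does not divide n is trimmed, and k is taken even for the interlacing version):

```python
import numpy as np

def block_max_stats(X, mu=0.0):
    # Block-normalized statistic (8.35) and interlacing statistic (8.36),
    # with m of order n^{1/4} and block sums Y_{lj} = sum_{i in B_l} X_{ij}.
    n, p = X.shape
    m = max(1, int(round(n ** 0.25)))
    k = n // m
    Y = X[:m * k].reshape(k, m, p).sum(axis=1)          # Y_{lj}
    mu_hat = X[:m * k].mean(axis=0)
    s2 = ((Y - m * mu_hat) ** 2).sum(axis=0) / (m * k)  # (sigma_j^circ)^2
    M_circ = np.sqrt(m * k) * np.max(np.abs(mu_hat - mu) / np.sqrt(s2))
    k_star = k // 2
    Y_even = Y[1::2][:k_star]                           # Y_{2lj}
    mu_dag = Y_even.sum(axis=0) / (m * k_star)
    s2_dag = ((Y_even - m * mu_dag) ** 2).sum(axis=0) / (m * k_star)
    M_dag = np.sqrt(m * k_star) * np.max(np.abs(mu_dag - mu) / np.sqrt(s2_dag))
    return M_circ, M_dag

X = np.random.randn(1024, 200)   # i.i.d. case; H0: mu = 0
print(block_max_stats(X))
```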

By Chen et al. (2016), we have the following result: Assume that there exists a constant ζ > 0 such that the long-run variances satisfy ω jj ≥ ζ for all j ≤ p, that (8.34) holds with r = 3, and that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \log p = o(n^{1/4}). \end{array} \end{aligned} $$
(8.37)

Then (8.23) holds for both the block-normalized sum \(M_n^\circ \) and the interlacing normalized sum \(M_n^\dagger \). Note that, while (8.37) still allows ultra high dimensions, due to dependence, the allowed dimension p in condition (8.37) is smaller than the one in (8.21). Additionally, if the GMC (8.34) holds with some r > 3, (8.24) holds with the long-run correlation matrix \(R = D^{-1/2} \Sigma_\infty D^{-1/2}\), where \(D = \mathrm{diag}(\Sigma_\infty)\), and for some 0 < τ < 1∕4,

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \log p = o(n^{\tau}), \end{array} \end{aligned} $$
(8.38)

then we have the Gumbel convergence for the interlacing normalized sum:

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} (M_n^\dagger)^2 - 2 \log p + \log \log p \Rightarrow {\mathcal{G}}, \end{array} \end{aligned} $$
(8.39)

where \({\mathcal {G}}\) is given in (8.25). Similarly to (8.26), one can perform the following test, which has an asymptotically correct size: we reject H 0 in (8.1) at level α ∈ (0, 1) if

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \max_{j \le p} { {(n/2) (\mu^\dagger_j)^2} \over (\hat \sigma_j^\dagger)^2 } > 2 \log p - \log \log p + g_{1-\alpha}. \end{array} \end{aligned} $$
(8.40)

4 Tests Based on L 2 Norms

In this section we shall consider the test based on the L 2 functional \(g(x) = \sum _{j=1}^p x_j^2\). Let λ 1 ≥⋯ ≥ λ p ≥ 0 be the eigenvalues of Σ. For Z ∼ N(0, Σ), we have the distributional equality \(g(Z) = Z^T Z =_{\mathcal {D}} \sum _{j=1}^p \lambda _j \eta _j^2\), where the η j are i.i.d. standard N(0, 1) random variables. Let \(f_k = (\sum _{j=1}^p \lambda _j^k)^{1/k}\), k > 0, and f = f 2. Then \(E g(Z) = f_1 = \mathrm{tr}(\Sigma)\) and \(\mathrm{var}(g(Z)) = 2 f^2\). Xu et al. (2014) provided a sufficient condition for the invariance principle (8.7) with the quadratic functional g. For some 0 < δ ≤ 1 let q = 2 + δ.

Condition 1

Let δ > 0. Assume EX 1 = 0, \(E|X_1|_2^{2q} < \infty\), and let

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} K_\delta(X) ^q:= {E}\left|{|X_1|{}_2^2- f_1 \over f}\right|{}^q < \infty \end{array} \end{aligned} $$
(8.41)
$$\displaystyle \begin{aligned} \begin{array}{rcl} {} D_\delta(X) ^q:= {E}\left| {{ X_1^T X_2} \over f}\right|{}^q < \infty. \end{array} \end{aligned} $$
(8.42)

Observe that (8.41) and (8.42) in Condition 1 are Lyapunov-type conditions. Assume that

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} { K_{0}(X)^2 \over n} + { K_{\delta}(X)^q \over n^{q-1}} + { {E} (X_1^T \Sigma X_1)^{q/2} \over { n^{\delta/2} f^q} } + { D_\delta(X)^q \over n^{\delta}} \to 0 \mbox{ as } n \to \infty. \end{array} \end{aligned} $$
(8.43)

Then (8.7) holds (cf Xu et al. 2014). Consequently we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sup_{t \in \mathbb{R}} |P( (n |\bar X_n|{}_2^2 - f_1) / f \le t) - P(V \le t)| \to 0, \mbox{ where } V = \sum_{j=1}^p f^{-1} \lambda_j (\eta_j^2 - 1).\qquad \end{array} \end{aligned} $$
(8.44)

In the literature, researchers primarily focus on developing the central limit theorem

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} R_n := { {n |\bar X_n|{}_2^2 - f_1} \over f} = { {n \bar X_n^T \bar X_n - f_1} \over f} \Rightarrow N(0, 2) \end{array} \end{aligned} $$
(8.45)

or its modified versions; see, for example, Bai and Saranadasa (1996), Chen and Qin (2010) and Srivastava (2009). Xu et al. (2014) clarified an important issue on the CLT of R n. By the Lindeberg–Feller central limit theorem, V ⇒ N(0, 2) as p → ∞ holds if and only if λ 1∕f → 0. The distributional approximation (8.44) indicates that, if λ 1∕f does not go to 0, then the central limit theorem cannot hold for R n.
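A quick Monte Carlo sketch of this dichotomy (the two spectra are our own illustrative choices): with a spiked spectrum λ 1∕f stays bounded away from 0 and V remains visibly skewed, while a flat spectrum gives λ 1∕f = p −1∕2 → 0 and an approximately normal V.

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 200, 20000
for lam in (np.r_[0.5 * p, np.ones(p - 1)],  # spiked: lambda_1 comparable to f
            np.ones(p)):                     # flat: lambda_1 / f -> 0
    f = np.sqrt(np.sum(lam ** 2))
    eta2 = rng.chisquare(1, size=(reps, p))  # eta_j^2, chi-square(1)
    V = (lam * (eta2 - 1)).sum(axis=1) / f   # V as in (8.44)
    skew = np.mean((V - V.mean()) ** 3) / V.std() ** 3
    print(f"lambda_1/f = {lam.max() / f:.3f}, skewness of V = {skew:.2f}")
```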

Let t 1−α be the (1 − α)th quantile of \(g(Z) = |Z|_2^2 = Z^T Z\). By (8.7) we can reject (8.1) at level α ∈ (0, 1) if

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} n |\bar X_n|{}_2^2 > t_{1-\alpha}. \end{array} \end{aligned} $$
(8.46)

To calculate t 1−α, one needs to know the eigenvalues λ 1, …, λ p. However, estimation of those eigenvalues is a very challenging problem, in particular if one does not impose certain structural assumptions on Σ. In Sect. 8.5.2 we shall propose a half-sampling based approach which does not need estimation of the covariance matrix Σ.

The L ∞ based tests discussed in Sect. 8.3 have good power when the alternative consists of a few large signals. If the signals are small and have similar magnitudes, then the L 2 test is more powerful. To this end, assume that there exist a constant c ≥ 1 and a small δ > 0 such that \(\delta / c \le \mu_j \le c \delta\) holds for all j = 1, …, p. We can interpret δ as the departure parameter (from the null H 0 with μ = 0). For the L ∞-based test to have power approaching 1, one necessarily requires that \(\sqrt {n} \delta \to \infty \). Elementary calculation shows that, under the much weaker condition \(n p^{1/2} \delta^2 \to \infty\), the power of the L 2 based test, i.e., the probability that the event in (8.46) occurs, goes to one. In the latter condition, a larger dimension p is actually a blessing as it requires a smaller departure δ.

5 Asymptotic Theory

In Sects. 8.3 and 8.4, we discussed the classical L ∞ and L 2 functionals, respectively. For a general testing functional, we have the following invariance principle (cf. Theorem 1), which asserts that functionals of sample means of non-Gaussian random vectors X 1, X 2, … can be approximated by those of Gaussian vectors Z 1, Z 2, … with the same covariance structure. Assume \(g \in \mathbb {C}^3(\mathbb {R}^p)\). For x = (x 1, …, x p)T write g j = g j(x) = ∂g(x)∕∂x j. Similarly we define the partial derivatives g jk and g jkl. For all j, k, l = 1, …, p, assume that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \kappa_{jkl} := \sup_{\mathbf{x}\in\mathbb{R}^p} (|g_jg_kg_l|+|g_{jk}g_l|+|g_{jl}g_k|+|g_{kl}g_j|+|g_{jkl}|) < \infty. \end{array} \end{aligned} $$
(8.47)

For Z 1 ∼ N(0, Σ) write Z 1 = (Z 11, …, Z 1p)T. Define

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathcal{K}_p=\sum_{j,k,l=1}^p \kappa_{jkl} ({E}|X_{1j}X_{1k}X_{1l}|+{E}|Z_{1j}Z_{1k}Z_{1l}|). \end{array} \end{aligned} $$
(8.48)

For \(g(Z_1) = _{\mathcal {D}} g(\sqrt {n} \bar Z_n)\), we assume that its c.d.f. F(t) = P[g(Z) ≤ t] is Hölder continuous: there exist \(\ell_p > 0\) and an index α > 0 such that, for all ψ > 0, the concentration function satisfies

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sup_{t\in\mathbb{R}} P(t\leq g(Z_1)\leq t+\psi) \leq \ell_p \psi^\alpha. \end{array} \end{aligned} $$
(8.49)

Theorem 1 (Lou and Wu (2018))

Assume (8.47), (8.49) and \(\mathcal {K}_p \ell _p^{3/\alpha } = o(\sqrt {n})\) . Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \sup_{t \in \mathbb{R}} |P[ g(\sqrt{n} (\bar X_n-\mu)) \le t] - P[ g(\sqrt{n} \bar Z_n) \le t]| = O(\ell_p^3 \mathcal{K}_p^\alpha n^{-\alpha / 2}) \to 0.\quad \end{array} \end{aligned} $$
(8.50)

To apply Theorem 1 for hypothesis testing, we need to know the c.d.f. F(t) = P[g(Z) ≤ t]. Note that F(⋅) depends on g and the covariance matrix Σ. Thus we can also write F(⋅) = F g(⋅). If Σ is known, the distribution of g(Z) is completely known and its cdf F(t) = P[g(Z) ≤ t] can be calculated either analytically or by extensive Monte Carlo simulations. Let t 1−α, 0 < α < 1, be the (1 − α)th quantile of g(Z). Namely

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} P[ g(Z) > t_{1-\alpha}] = \alpha. \end{array} \end{aligned} $$
(8.51)

Then the null hypothesis H 0 in (8.1) is rejected at level α if the test statistic \(T_n = g(\sqrt {n} \bar X_n) > t_{1-\alpha }\). This test has asymptotically correct size α. Additionally, the (1 − α) confidence region for μ can be constructed as

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \{ \mu \in \mathbb{R}^p: \, g(\sqrt{n} (\bar X_n-\mu)) \le t_{1-\alpha} \} = \{ \bar X_n + \nu \in \mathbb{R}^p: \, g(\sqrt{n} \nu) \le t_{1-\alpha} \}.\quad \end{array} \end{aligned} $$
(8.52)

If Σ is not known, as a straightforward way to approximate F(t) = F g(t), one may use an estimate \(\tilde \Sigma \) so that F g(t) can be approximated by \(F_{g, \tilde \Sigma }(t)\). Here we do not adopt this approach, for the following two reasons. First, it can be quite difficult to consistently estimate Σ without assuming sparseness or other structural conditions. The latter assumptions are widely used in the literature; see, for example, Bickel and Levina (2008a), Bickel and Levina (2008b), Cai et al. (2011) and Fan et al. (2013). Second, it is difficult to quantify the difference \(F_{g, \tilde \Sigma }(\cdot ) - F(\cdot )\) based on the operator norm or other types of matrix convergence of the estimate \(\tilde \Sigma \). Xu et al. (2014) argued that, for the L 2 test with \(g(x) = \sum _{j=1}^p x_j^2\), one needs the normalized consistency of \(\tilde \Sigma \) instead of the widely used operator norm consistency. We propose instead using half-sampling and balanced Rademacher schemes.

5.1 Preamble: i.i.d. Gaussian Data

In practice, however, the covariance matrix Σ is typically unknown. Assume at the outset that X 1, …, X n are i.i.d. N(μ, Σ) vectors and that n = 4m, where m is a positive integer. Then we can estimate the cumulative distribution function F(t) = P[g(Z) ≤ t] by using Hadamard matrices (see Georgiou et al. 2003; Hedayat and Wallis 1978; Yarlagadda and Hershey 1997). We say that H is an n × n Hadamard matrix if its first row consists of all 1s and all its entries take values 1 or − 1 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H H^T = n I_n, \end{array} \end{aligned} $$
(8.53)

where I n is the n × n identity matrix. Let

$$\displaystyle \begin{aligned} \begin{array}{rcl} Y_j = {1\over \sqrt{n}} \sum_{i=1}^n H_{j i} X_i, \,\, j=1, \ldots, n. \end{array} \end{aligned} $$
(8.54)

By (8.53), we have \(\sum _{i=1}^n H_{j i} = 0\) for 2 ≤ j ≤ n and \(\sum _{i=1}^n H_{j i} H_{j' i}= 0\) if j ≠ j′. Since X 1, …, X n are i.i.d. N(μ, Σ), it is clear that Y 2, …, Y n are also i.i.d. N(0, Σ) vectors. Hence the random variables g(Y 2), …, g(Y n) are independent and identically distributed as g(Z). Therefore we can construct the empirical cumulative distribution function

$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat F_n(t) = {1\over {n-1}} \sum_{j=2}^n {\mathbf{1}}_{g(Y_j) \le t}, \end{array} \end{aligned} $$
(8.55)

which converges uniformly to F(t) as n → ∞, and t 1−α can be estimated by \(\hat t_{1-\alpha } = \hat F_n^{-1}(1-\alpha ) \), the (1 − α)th empirical quantile of \(\hat F_n(\cdot )\). As an important feature of the latter method, one does not need to estimate the nuisance parameter, the covariance matrix Σ. In combinatorial experiment design, however, it is highly nontrivial to construct Hadamard matrices. If n is a power of 2, then one can simply apply Sylvester's construction. The Hadamard conjecture states that a Hadamard matrix of order n exists whenever 4|n; this conjecture is still open. For example, it is unclear whether a Hadamard matrix exists when n = 668 (see Brent et al. 2015).
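When n is a power of 2, the whole construction fits in a few lines. A sketch (function names ours):

```python
import numpy as np

def sylvester_hadamard(n):
    # Hadamard matrix of order n (n a power of 2) by Sylvester's doubling.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def hadamard_quantile(X, g, alpha=0.05):
    # Estimate t_{1-alpha} by the empirical quantile of g(Y_2), ..., g(Y_n),
    # where Y_j = n^{-1/2} sum_i H_{ji} X_i as in (8.54); n a power of 2.
    n = X.shape[0]
    Y = sylvester_hadamard(n) @ X / np.sqrt(n)
    vals = np.array([g(y) for y in Y[1:]])   # drop the all-ones row j = 1
    return np.quantile(vals, 1 - alpha)

X = np.random.randn(64, 100)                 # n = 64 is a power of 2
print(hadamard_quantile(X, lambda x: np.sum(x ** 2)))
```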

5.2 Rademacher Weighted Differencing

To circumvent the existence problem of Hadamard matrices in Sect. 8.5.1, we shall construct asymptotically independent realizations by using Rademacher random variables. Let \(\varepsilon _{j k}, j, k \in \mathbb {Z}\), independent of (X i)i≥1, be i.i.d. Bernoulli random variables with P(ε jk = 1) = P(ε jk = −1) = 1∕2. Define the Rademacher weighted differences

$$\displaystyle \begin{aligned} Y_j = D(A_j), \mbox{ where } D(A) = { {|A|{}^{1/2} (n-|A|)^{1/2}} \over n^{1/2} } \left( { {\sum_{i \in A} X_i} \over {|A|}} - { {\sum_{i \in \{1,\ldots, n\} - A} X_i} \over {n-|A|}} \right), \end{aligned} $$
(8.56)

where the random set

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} A_j = \{1 \le i \le n: \, \varepsilon_{j i} = 1\}. \vspace{3pt}\end{array} \end{aligned} $$
(8.57)

When defining Y j, we require that A j satisfies |A j|≠ 0 and |A j|≠ n. By the Hoeffding inequality, |A j| concentrates around n∕2 in the sense that, for u ≥ 0, \(P( ||A_j| -n/2| \ge u) \le 2 \exp (-2 u^2 / n)\). Alternatively, we consider the balanced Rademacher weighted differencing: let \(A_1^\circ, A_2^\circ, \ldots \) be a simple random sample drawn equally likely from \({\mathcal {A}}_m = \{ A\subset \{1, \ldots, n \}: \, |A| = m\}\), where m = ⌊n∕2⌋. Similarly to Y j in (8.56), we define

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} Y_j^\circ = D(A_j^\circ). \vspace{3pt}\end{array} \end{aligned} $$
(8.58)

Clearly, given A j (resp. \(A_j^\circ \)), Y j (resp. \(Y_j^\circ \)) has mean 0 and covariance matrix Σ. Based on Y j in (8.56) (resp. \(Y_j^\circ \) in (8.58)), define the empirical distribution functions

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat F_N(t) = {1\over N} \sum_{j=1}^N {\mathbf{1}}_{g(Y_j) \le t}, \end{array} \end{aligned} $$
(8.59)

where N → ∞ and

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat F^\circ_N(t) = {1\over N} \sum_{j=1}^N {\mathbf{1}}_{g(Y_j^\circ) \le t}. \end{array} \end{aligned} $$
(8.60)

For sets A, B ⊂{1, …, n}, let A c = {1, …, n}− A, B c = {1, …, n}− B and

$$\displaystyle \begin{aligned} \begin{array}{rcl} d(A, B) = \max \left \{ ||A \cap B| - {n \over 4}|, \, ||A^c \cap B| - {n \over 4}|, \, ||A \cap B^c| - {n \over 4}|, \, ||A^c \cap B^c| - {n \over 4}| \right \}. \end{array} \end{aligned} $$

If A, B are chosen according to a Hadamard matrix, then d(A, B) = 0. Assume that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} d(A, B) \le 0.1 n. \end{array} \end{aligned} $$
(8.61)

Then there exists an absolute constant c > 0 such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathrm{cov} (D(A), D(B)) = \delta \Sigma, \mbox{ where } |\delta| \le c { {d(A, B)}\over n }. \end{array} \end{aligned} $$
(8.62)

Again by the Hoeffding inequality, if we choose A 1, A 2 according to (8.57), there exist absolute constants c 1, c 2 > 0 such that \(P( d(A_1, A_2) \ge u) \le c_1 \exp (-c_2 u^2 / n)\), indicating that (8.61) holds with probability close to 1, that \(d(A_1, A_2) = O_P(n^{1/2})\), and hence that we have the weak orthogonality \(\delta = O_P(n^{-1/2})\) in (8.62).

Theorem 2 (Lou and Wu (2018))

Under the conditions of Theorem 1, we have \(\sup _t |\hat F^\circ _N(t) - F(t)| \to 0\) in probability as N → ∞.
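A sketch of the balanced half-sampling estimator \(\hat F^\circ_N\) of (8.60) and its quantile (function names ours; each \(A_j^\circ\) has size m = ⌊n∕2⌋):

```python
import numpy as np

def balanced_half_sample_values(X, g, N=1000, rng=None):
    # Realizations g(Y_1^o), ..., g(Y_N^o), where Y_j^o = D(A_j^o) as in
    # (8.56)-(8.58) is the scaled difference of the two half-sample means.
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    m = n // 2
    scale = np.sqrt(m * (n - m) / n)
    vals = np.empty(N)
    for j in range(N):
        idx = rng.permutation(n)              # uniform half-sample A_j^o
        diff = X[idx[:m]].mean(axis=0) - X[idx[m:]].mean(axis=0)
        vals[j] = g(scale * diff)
    return vals

def half_sample_quantile(X, g, alpha=0.05, N=1000, rng=None):
    # Empirical (1 - alpha)th quantile of F_hat_N^o in (8.60).
    return np.quantile(balanced_half_sample_values(X, g, N, rng), 1 - alpha)
```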

5.3 Calculating the Power

In view of (8.7), the asymptotic power of the test at the alternative μ is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} B(\mu) = P[ g(Z + \sqrt{n} \mu) \ge t_{1-\alpha} ]. \end{array} \end{aligned} $$
(8.63)

Given the sample X 1, …, X n, whose mean vector μ is not necessarily 0, and the estimate \(\hat t_{1-\alpha }\) obtained from the empirical cumulative distribution functions (8.59) and (8.60), we can estimate the power function by

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat B(\nu) &\displaystyle =&\displaystyle \hat P( g(D(A_j^\circ) + \sqrt{n} \nu) \ge \hat t_{1-\alpha} | X_1, \ldots, X_n) \\ &\displaystyle =&\displaystyle {1\over N} \sum_{j=1}^N {\mathbf{1}}_{g(D(A_j^\circ) + \sqrt{n} \nu) \ge \hat t_{1-\alpha}}. \end{array} \end{aligned} $$
(8.64)

5.4 An Algorithm with General Testing Functionals

For ease of application, we shall in this section provide details of testing the hypothesis H 0 in (8.1) using the Rademacher weighting scheme described in Sect. 8.5.2.

Algorithm 1: Rademacher weighted testing procedure

1. Input X 1, …, X n;

2. Compute the average \(\bar X_n\) and the test statistic \(T = g(\sqrt {n} \bar X_n)\);

3. Choose a large N in (8.60) and obtain the empirical quantile \(\hat t^\circ _{1-\alpha }\);

4. Reject H 0 at level α if \(T > \hat t^\circ _{1-\alpha }\);

5. Report the p-value as \(1 - \hat F^\circ _N(T)\).

To construct a confidence region for μ, one can use (8.52) with t 1−α therein replaced by the empirical quantile \(\hat t^\circ _{1-\alpha }\).
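Putting the pieces together, a self-contained sketch of Algorithm 1 (names ours; the half-sampling loop repeats the balanced construction of Sect. 8.5.2):

```python
import numpy as np

def rademacher_weighted_test(X, g, alpha=0.05, N=2000, rng=None):
    # Algorithm 1: test H0: mu = 0 with T = g(sqrt(n) * Xbar_n) against the
    # empirical quantile of balanced Rademacher differences (8.58)-(8.60).
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    T = g(np.sqrt(n) * X.mean(axis=0))                       # step 2
    m = n // 2
    scale = np.sqrt(m * (n - m) / n)
    vals = np.empty(N)
    for j in range(N):                                       # step 3
        idx = rng.permutation(n)
        vals[j] = g(scale * (X[idx[:m]].mean(axis=0)
                             - X[idx[m:]].mean(axis=0)))
    t_hat = np.quantile(vals, 1 - alpha)
    return {"T": T, "cutoff": t_hat,
            "reject": T > t_hat,                             # step 4
            "p_value": float(np.mean(vals >= T))}            # step 5

X = np.random.randn(40, 500)                                 # H0 holds
print(rademacher_weighted_test(X, lambda x: np.sum(x ** 2)))
```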

6 Numerical Experiments

In this section, we shall perform a simulation study and evaluate the finite-sample performance of our Algorithm 1 with \(\hat {F}_N^\circ (t)\) defined in (8.60). Tests for mean vectors and covariance matrices are considered in Sects. 8.6.1 and 8.6.2, respectively. Section 8.6.3 contains a real data application on testing correlations between different pathways of a pancreatic ductal adenocarcinoma dataset.

6.1 Test of Mean Vectors

We consider three different testing functionals: for \(x=(x_1,\ldots,x_p)^\top \in \mathbb {R}^p\), let

$$\displaystyle \begin{aligned} \begin{array}{rcl} g_1(x)=\max\limits_{j\leq p}|x_j|,\ \ g_2(x)=\sum_{j=1}^p|x_j|{}^2,\ \ g_3(x)=\sup_{c\geq 0}\left\{c^2\sum_{j=1}^p|x_j|{}^2{\mathbf{1}}_{|x_j|\geq c}\right\}. \end{array} \end{aligned} $$

For the L ∞ form g 1(x), five testing procedures are compared: our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\); its variants with \(\hat {F}_N^\circ (\cdot )\) replaced by \(\hat {F}_N (\cdot )\) of (8.59) or by

$$\displaystyle \begin{aligned} \hat{F}_N^\dagger(t)=\frac{1}{N}\sum_{j=1}^N {\mathbf{1}}_{g(Y_j^\dagger)\leq t}, \mbox{ where } Y_j^\dagger=\frac{1}{\sqrt{n}}\sum_{i=1}^n\varepsilon_{ji}(X_i-\bar{X}) \end{aligned} $$
(8.65)

where the ε ji are i.i.d. Rademacher (symmetric ± 1) random variables independent of (X ij); the test of Fan et al. (2007) (FHY, see (8.20) and (8.23)); and the Gaussian Multiplier Bootstrap method in Chernozhukov et al. (2014) (CCK, see (8.30)).

For g 2(x), we compare the performance of our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\), \(\hat {F}_N(\cdot )\) and \(\hat {F}_N^\dagger (\cdot )\), and also the CLT-based procedure of Chen and Qin (2010) (CQ), which is a variant of (8.45) with the numerator \(n \bar X_n^T \bar X_n - f_1\) therein replaced by \(n^{-1} \sum _{i\neq j}X_i^\top X_j\).

The portmanteau testing functional g 3(x) is a marked weighted empirical process.

For our Algorithm 1 and the Gaussian Multiplier Bootstrap method, we calculate the empirical cutoff values with N = 4000. For each functional, we consider two models and use n = 40, 80 and p = 500, 1000. The empirical sizes for each case are calculated based on 1000 simulations.

Example 1 (Factor Model)

Let Z ij, 0 ≤ j ≤ p, be i.i.d. N(0, 1) and consider

$$\displaystyle \begin{aligned} X_i=(Z_{i1},\ldots,Z_{ip})^\top+p^{\delta}(Z_{i0},\ldots,Z_{i0})^\top,\ \ i=1,\ldots,n. \end{aligned} $$
(8.66)

Then the X i are i.i.d. N(0, Σ) with \(\Sigma = \mathrm{I}_p + p^{2\delta} \mathbf{1} \mathbf{1}^\top\), where \(\mathbf{1} = (1, \ldots, 1)^\top\). Larger δ implies stronger correlation among the entries X i1, …, X ip.
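Generating from this model is immediate; a sketch (the choice δ = −0.25 is ours, purely for illustration):

```python
import numpy as np

def factor_model_sample(n, p, delta, rng=None):
    # Model (8.66): independent N(0,1) entries plus a common factor Z_{i0}
    # scaled by p^delta, so that Sigma = I_p + p^(2*delta) * ones * ones^T.
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal((n, p))
    Z0 = rng.standard_normal((n, 1))
    return Z + p ** delta * Z0

X = factor_model_sample(40, 500, delta=-0.25)
```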

Table 8.1 reports empirical sizes for the factor model with g 1(⋅) at the 5% significance level. For each choice of p, n, and δ, our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N (\cdot )\) perform reasonably well, while the empirical sizes using \(\hat {F}_{N}^\dagger (\cdot )\) are generally slightly larger than 5%. The empirical sizes using Chernozhukov et al.’s (8.30) or Fan et al.’s (8.23) are substantially different from the nominal level 5%. For large δ, as expected, the procedure of Fan, Hall, and Yao can be very conservative.

Table 8.1 Empirical sizes for the factor model (8.66) with g 1(⋅)

The empirical sizes for the factor model using g 2(⋅) are summarized in Table 8.2. Our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N(\cdot )\) perform quite well. The empirical sizes for Chen and Qin’s procedure deviate significantly from 5%. This can be explained by the fact that CLT of type (8.45) is no longer valid for model (8.66); see the discussion following (8.45) and Theorem 2.2 in Xu et al. (2014).

Table 8.2 Empirical sizes for the factor model (8.66) using functional g 2(x)

When using functional g 3(x), our Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\) and \(\hat {F}_N (\cdot )\) perform slightly better than \(\hat {F}_N^\dagger (\cdot )\) and approximate the nominal 5% level well (Table 8.3).

Table 8.3 Empirical sizes for the factor model (8.66) using functional g 3(x)

Example 2 (Multivariate t-Distribution)

Consider the multivariate t ν vector

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} X_i=(X_{i1},\ldots,X_{ip})^\top = Y_i \sqrt{\nu/W_i} \sim t_\nu(0,\Sigma),\ \ i=1,\ldots,n \end{array} \end{aligned} $$
(8.67)

where the degrees of freedom ν = 4, \(\Sigma =(\sigma _{jk})_{j,k=1}^p\), σ jj = 1 for j = 1, …, p and

$$\displaystyle \begin{aligned} \sigma_{jk}=c|j-k|{}^{-d}, \ \ 1\leq j\neq k\leq p, \end{aligned}$$

and Y i ∼ N(0, Σ), \(W_i \sim \chi _{\nu }^2\) are independent. The above covariance structure allows long-range dependence among X i1, …, X ip; see Veillette and Taqqu (2013).

We summarize the simulated sizes for model (8.67) in Tables 8.4, 8.5, and 8.6. Conclusions similar to those of Example 1 apply here. Due to long-range dependence, the procedure of Fan, Hall, and Yao appears conservative. The Gaussian Multiplier Bootstrap (8.30) yields empirical sizes that are quite different from 5%. The CLT-based procedure of Chen and Qin is severely affected by the dependence. In practice we suggest using Algorithm 1 with \(\hat {F}_N^\circ (\cdot )\), which has good size accuracy.

Table 8.4 Empirical sizes for multivariate t-distribution using functional g 1(x)
Table 8.5 Empirical sizes for multivariate t-distribution using functional g 2(x)
Table 8.6 Empirical sizes for multivariate t-distribution using functional g 3(x)

6.2 Test of Covariance Matrices

6.2.1 Size Accuracy

We first consider testing H 0a :  Σ = Ip for the following model:

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} X_{ij}=\varepsilon_{i,j}\varepsilon_{i,j+1}, \, 1\leq i \leq n, \, 1\leq j\leq p, \end{array} \end{aligned} $$
(8.68)

where the ε ij are i.i.d. (1) standard normal; (2) centralized Gamma(4,1); and (3) Student t 5 random variables. We then study a second test, H 0b : Σ1,2 = 0, by partitioning the random vector X i = (X i1, …, X ip)T equally into two subvectors of dimensions p 1 = p∕2 and p 2 = p − p 1. In the simulation, we generate the samples of the two subvectors independently according to model (8.68). We shall use Algorithm 1 with the L 2 functional. Tables 8.7 and 8.8 report the simulated sizes based on 1000 replications with N = 1000 half-sampling implementations; they are reasonably close to the nominal level 5%.

Table 8.7 Simulated sizes of the L 2 test for H 0a
Table 8.8 Simulated sizes of the L 2 test for H 0b

6.2.2 Power Curve

To assess the power for testing H 0 :  Σ = Ip using the L 2 test, we consider the model

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} X_{ij}=\varepsilon_{i,j}\varepsilon_{i,j+1}+\rho\zeta_i, \, 1\leq i \leq n, \, 1\leq j\leq p, \end{array} \end{aligned} $$
(8.69)

where ε ij and ζ i are i.i.d. Student t 5 and ρ is chosen to be 0, 0.02, 0.04, …, 0.7. The power curve is shown in Fig. 8.1. As expected, the power increases with n.

Fig. 8.1 Power curve for testing H 0 :  Σ = Ip with model (8.69), for n = 20, 50, using the L 2 test

6.3 A Real Data Application

We now apply our testing procedures to a pancreatic ductal adenocarcinoma (PDAC) dataset, preprocessed from NCBI's Gene Expression Omnibus and accessible through GEO Series accession number GSE28735 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28735). The dataset consists of two classes of gene expression levels, from 45 pancreatic tumor patients and 45 pancreatic normal patients, with a total of 28,869 genes. We shall test the existence of correlations between two subvectors, which can be useful for identifying sets of genes that are significantly correlated.

We consider genetic pathways of the PDAC dataset. Pathways can be highly significantly associated with the disease even if they harbor only a small number of individually significant genes. According to the KEGG database, the pathway “hsa05212” is relevant to pancreatic cancer. Among the 28,869 genes, 66 are mapped to this pathway. We are interested in testing whether the pathway to pancreatic cancer is correlated with some common pathways: “hsa04950” (21 genes, “Maturity onset diabetes of the young”), “hsa04940” (59 genes, “Type I diabetes mellitus”), and “hsa04972” (87 genes, “Pancreatic secretion”). Let W i, X i, Y i, and Z i be the expression levels of individual i from the tumor group for pathways “hsa05212,” “hsa04950,” “hsa04940,” and “hsa04972,” respectively. The null hypotheses are \(H^T_{0 1}: \mathrm {cov}(W_i, X_i) = 0_{66 \times 21}\), \(H^T_{0 2}: \mathrm {cov}(W_i, Y_i) = 0_{66 \times 59}\) and \(H^T_{0 3}: \mathrm {cov}(W_i, Z_i) = 0_{66 \times 87}\). Similar null hypotheses \(H^N_{0 1}, H^N_{0 2}, H^N_{0 3}\) can be formulated for the normal group. Our L 2 test of Algorithm 1 is compared with the Gaussian multiplier bootstrap (8.30). The results are summarized in Table 8.9. The CCK test is not able to reject the null hypothesis \(H^T_{0 3}\) at the 5% level since it gives a p-value of 0.063291. However, using the L 2 test, \(H^T_{0 3}\) is rejected, suggesting that there is a substantial correlation between pathways “hsa05212” and “hsa04972.” Similar claims can be made for the other cases. The L 2 test also suggests that, at the 0.1% level, for the tumor group the hypotheses \(H^T_{0 2}\) and \(H^T_{0 3}\) are rejected, while for the normal group the hypotheses \(H^N_{0 2}\) and \(H^N_{0 3}\) are not rejected.

Table 8.9 Estimated p-values of tests for covariances between pathway “pancreatic cancer” and other different pathways, based on N = 106 half-sampling implementations