Abstract
A unified testing framework is presented for large-dimensional mean vectors of one or several populations which may be non-normal with unequal covariance matrices. Beginning with the one-sample case, the construction of the tests, the underlying assumptions and the asymptotic theory are systematically extended to the multi-sample case. The tests are defined in terms of U-statistics-based consistent estimators, and their limits are derived under a few mild assumptions. The accuracy of the tests is demonstrated through simulations. Real data applications, including a five-sample unbalanced MANOVA analysis of count data, are also given.
1 Introduction
Let \(\mathbf{X}_k = (X_{k1}, \ldots , X_{kp})' \sim {\mathcal {F}}\), \(k = 1, \ldots , n\) be iid random vectors, where \({\mathcal {F}}\) denotes a p-variate distribution, with \(E(\mathbf{X}_k) = {\varvec{\mu }} \in {\mathbb {R}}^p\) and \({{\mathrm{Cov}}}(\mathbf{X}_k) = {\varvec{\Sigma }} \in {\mathbb {R}}^{p\times p}_{> 0}\). A hypothesis of foremost interest to be tested in this setup is \(H_0: {\varvec{\mu }} = \mathbf{0}\) against an appropriate alternative, say \(H_1:\) Not \(H_0\). For an extension to \(g \ge 2\) samples, let \(\mathbf{X}_{ik} = (X_{ik1}, \ldots , X_{ikp})' \sim {\mathcal {F}}_i\) be iid random vectors with \(E(\mathbf{X}_{ik}) = {\varvec{\mu }}_i \in {\mathbb {R}}^p\), \({{\mathrm{Cov}}}(\mathbf{X}_{ik}) = {\varvec{\Sigma }}_i \in {\mathbb {R}}^{p\times p}_{>0}\), \(k = 1, \ldots , n_i\), \(i = 1, \ldots , g\). The corresponding hypothesis of interest is \(H_{0g}:{\varvec{\mu }}_1 = \cdots = {\varvec{\mu }}_g\) vs. \(H_{1g}:\) Not \(H_{0g}\).
Our objective here is to present test statistics for the aforementioned one- and multi-sample hypotheses when \(p > n_i\), the \({\mathcal {F}}_i\) are not necessarily normal, and the \({\varvec{\Sigma }}_i\), likewise the \(n_i\), may be unequal in the multi-sample case. The proposed tests are thus valid for high-dimensional, non-normal, unbalanced data under the Behrens–Fisher problem. In particular, for \(g \ge 3\), this amounts to testing the high-dimensional one-way MANOVA hypothesis under non-normality and the multi-sample Behrens–Fisher problem.
When \(p < n_i\), tests of \(H_0\) or \(H_{0g}\) are most often carried out by Hotelling’s \(T^2\) or Wilks’ Lambda statistic which are uniformly most powerful invariant likelihood ratio tests. They, however, collapse for high-dimensional case, particularly due to singularity of the empirical covariance matrix involved (see Sects. 2, 3). A number of proposals have recently been put forth in the literature on the modification of these classical tests for high-dimensional data.
Whereas most modifications assume normality, some are based on a more flexible model, and still others offer a completely nonparametric solution to the problem. The same holds for the homoscedasticity assumption, \({\varvec{\Sigma }}_i = {\varvec{\Sigma }}\)\(~\forall ~i = 1, \ldots , g\), \(g \ge 2\). For details, see e.g., Dempster (1958), Bai and Saranadasa (1996), Läuter et al. (1998), Läuter (2004), Fujikoshi (2004), Schott (2007), Chen and Qin (2010), Aoshima and Yata (2011, 2015), Katayama and Kano (2014), Wang et al. (2015), Feng et al. (2016) and Hu et al. (2017). For a review, see Hu and Bai (2015) and Fujikoshi et al. (2010).
We present a coherent testing theory encompassing one- and multi-sample cases. The construction of the tests, the assumptions, and the strategy of obtaining limiting distribution of the test statistics is succinctly threaded together via a common approach, initiating with the one-sample case and extending systematically to the multi-sample cases. The main distinguishing feature of the proposed tests is that we simultaneously relax commonly adopted linear model assumptions such as normality and homoscedasticity, for all cases up to one-way MANOVA. Further, all tests are defined in terms of U-statistics with simple, bivariate, product kernels composed of bilinear forms of independent vectors. This helps us determine the limits of the test statistics for a general multivariate model. These limits are derived under \((n_i, p)\)- or high-dimensional asymptotics, i.e., \(n_i, p \rightarrow \infty \), using only a few mild assumptions.
The basic idea is introduced in detail for one-sample case in the next section, with an extension to two-sample case. Multi-sample extension follows in Sect. 3. Sections 4 and 5 deal with simulations and applications. Proofs and technical details are deferred to “Appendix”.
2 The one- and two-sample tests
2.1 The one-sample case
For the one-sample data setup in Sect. 1, let the unbiased estimators of \({\varvec{\mu }}\) and \({\varvec{\Sigma }}\) be defined as \(\overline{\mathbf{X}} = \sum _{k = 1}^n\mathbf{X}_k/n\) and \(\widehat{\varvec{\Sigma }} = \sum _{k = 1}^n(\mathbf{X}_k - \overline{\mathbf{X}})(\mathbf{X}_k - \overline{\mathbf{X}})'/(n - 1)\). If \(n > p\) and \({\mathcal {F}}\) is multivariate normal, then \(H_0: {\varvec{\mu }} = \mathbf{0}\) can be tested using Hotelling’s statistic \(T^2 = n\overline{\mathbf{X}}'\widehat{\varvec{\Sigma }}^{-1}\overline{\mathbf{X}} = \overline{\mathbf{X}}'[\widehat{\varvec{\Sigma }}/n]^{-1}\overline{\mathbf{X}}\) where \(\widehat{\varvec{\Sigma }}/n\) estimates \({\varvec{\Sigma }}/n = {{\mathrm{Cov}}}(\overline{\mathbf{X}})\). When \(p > n\), \(\widehat{\varvec{\Sigma }}\) is singular and \(T^2\) collapses, requiring a careful modification that can provide valid inference when \(p \rightarrow \infty \), possibly along with \(n \rightarrow \infty \). The two indices may sometimes be assumed to grow at the same rate so that \(p/n \rightarrow c \in (0, \infty )\). Alternatively, a sequential asymptotics, first letting \(p \rightarrow \infty \) followed by \(n \rightarrow \infty \), may be considered under which conditions like \(p/n \rightarrow c\) may be dispensed with.
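The singularity of \(\widehat{\varvec{\Sigma }}\) for \(p > n\) can be seen directly: the sample covariance matrix has rank at most \(n - 1\). A minimal numerical sketch (the Gaussian setup and all names are illustrative, not from the paper):

```python
import numpy as np

# Sketch: the sample covariance matrix has rank at most n - 1, so
# Hotelling's T^2 cannot be formed when p > n.
rng = np.random.default_rng(0)
n, p = 20, 50                      # high-dimensional: p > n
X = rng.standard_normal((n, p))    # rows are iid p-variate observations

S = np.cov(X, rowvar=False)        # unbiased estimator, divisor n - 1
rank = np.linalg.matrix_rank(S)
print(rank, rank < p)              # rank is n - 1 = 19, far below p = 50
```

Since centering removes one degree of freedom, the rank deficit persists no matter how the inverse is attempted, which motivates the modifications discussed next.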
Note that \(\widehat{\varvec{\Sigma }}\) may be ill-conditioned even for \(p < n\), in which case \((\cdot )^{-1}\) can be replaced with the Moore–Penrose inverse (see e.g., Duchesne and Francq 2015). For \(p \gg n\), this approach is unreliable and inefficient. An alternative is to remove \(\widehat{\varvec{\Sigma }}^{-1}\) from \(T^2\) and consider the Euclidean distance \(Q = \overline{\mathbf{X}}'\overline{\mathbf{X}} = \Vert \overline{\mathbf{X}}\Vert ^2\). An interesting consequence of this can be witnessed by a simple split of Q as
where \(Q_1 = (E - U_n)/n\) with \(E = \sum _{k = 1}^n\mathbf{X}'_k\mathbf{X}_k/n\) and \(U_n = \sum _{k \ne r}^n\mathbf{X}'_k\mathbf{X}_r/n(n - 1)\). Note that E is an average of quadratic forms, and \(U_n\) is an average of bilinear forms composed of independent components. It is shown below that the limiting distribution of the statistic mainly follows from \(U_n\), whereas \(Q_1\) converges in probability to a constant. With \(E(\mathbf{X}'_k\mathbf{X}_k) = {{\mathrm{tr}}}({\varvec{\Sigma }}) + {\varvec{\mu }}'{\varvec{\mu }}\), \(E(\mathbf{X}'_k\mathbf{X}_r) = {\varvec{\mu }}'{\varvec{\mu }}\), we get \(E(Q_1) = {{\mathrm{tr}}}({\varvec{\Sigma }})/n\), \(E(U_n) = {\varvec{\mu }}'{\varvec{\mu }}\). Thus,
which is \({{\mathrm{tr}}}({\varvec{\Sigma }})/n\) under \(H_0\). We observe a few salient features of this bifurcation of Q. First, \(E(Q_1) = {{\mathrm{tr}}}({\varvec{\Sigma }})/n = {{\mathrm{tr}}}({{\mathrm{Cov}}}(\overline{\mathbf{X}}))\) implies that the removal of the inverse of the estimator of \({{\mathrm{Cov}}}(\overline{\mathbf{X}})\) results in a bias term composed of the trace of the same estimator, since it can be verified that \(Q_1 = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})/n\) or \(\mathbf{E}_1 - \mathbf{Q}_0 = \widehat{\varvec{\Sigma }}\) with \(\mathbf{E}_1 = \sum _{k = 1}^n\mathbf{X}_k\mathbf{X}'_k/n\) and \(\mathbf{Q}_0 = \sum _{k \ne r}^n\mathbf{X}_k\mathbf{X}'_r/n(n - 1)\) as matrix versions of E and \(U_n\). Note also that \(E(Q_1)\) does not depend on \({\varvec{\mu }}\), and \(E(U_n)\) does not depend on \({\varvec{\Sigma }}\), under both \(H_0\) and \(H_1\). Now, \(E(U_n) = \Vert {\varvec{\mu }}\Vert ^2\), which is 0 under \(H_0\). Together, the last two facts imply that \(U_n\) can be used to construct the modified test statistic for \(H_0\), whereas \(Q_1\) can help compensate for the removal of the estimator from the original test statistic. For this, write \(Q = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})/n + U_n = Q_1 + U_n\) and, by a simple scaling and re-writing, consider the statistic
where \(Q_0 = U_n/p\) is \(U_n\), but with kernel normed by p, \(h(\mathbf{x}_k, \mathbf{x}_r) = \mathbf{X}'_k\mathbf{X}_r/p\). \(T_1\) is the proposed modified statistic for \(H_0: {\varvec{\mu }} = \mathbf{0}\) when \(p \gg n\) and \({\mathcal {F}}\) may be non-normal.
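A minimal numerical sketch of the split \(Q = Q_1 + U_n\), under Gaussian data with \({\varvec{\mu }} = \mathbf{0}\); since Eq. (3) is not reproduced here, the statistic is illustrated in the form \(T_1 = 1 + nU_n/{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})\) suggested by Remark 7 (an assumption of this sketch, as the p-scaling of the kernel cancels in the ratio):

```python
import numpy as np

# Sketch: split Q = ||Xbar||^2 into Q1 = tr(Sigma_hat)/n and the
# U-statistic U_n, then verify both identities stated in the text.
rng = np.random.default_rng(1)
n, p = 15, 200
X = rng.standard_normal((n, p))    # H0 holds: mu = 0

xbar = X.mean(axis=0)
Q = xbar @ xbar                           # squared Euclidean norm
E = np.mean(np.einsum('kj,kj->k', X, X))  # average of quadratic forms
G = X @ X.T                               # Gram matrix of bilinear forms
U_n = (G.sum() - np.trace(G)) / (n * (n - 1))
Q1 = (E - U_n) / n

S = np.cov(X, rowvar=False)
assert np.isclose(Q1, np.trace(S) / n)    # identity Q1 = tr(Sigma_hat)/n
assert np.isclose(Q, Q1 + U_n)            # the split Q = Q1 + U_n
T1 = 1 + n * U_n / np.trace(S)            # illustrative form, Remark 7
print(T1)                                 # close to 1 under H0
```

The two assertions confirm that the bias term removed from Q is exactly the trace of the covariance estimator, as claimed above.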
For the limit of \(T_1\), \(nQ_1/p\) is first shown to converge in probability to a constant as \(n, p \rightarrow \infty \). Then, \(nQ_0\) is shown to converge weakly to a normal limit. Under \(H_0\), the kernel of \(U_n\) degenerates so that the null limit follows through a weighted sum of independent \(\chi ^2\) variables. The limit of \(T_1\) then follows by Slutsky's lemma. As the same scheme will later be extended to \(g \ge 2\), we treat the one-sample case in detail. Let \(\lambda _s\), \(s = 1, \ldots , p\), be the eigenvalues of \({\varvec{\Sigma }}\), so that \(\nu _s = \lambda _s/p\) are the eigenvalues of \({\varvec{\Sigma }}/p\). We need the following assumptions.
Assumption 1
\(E(X^4_{ks}) = \gamma _s \le \gamma < \infty ~\forall ~s = 1, \ldots , p\), \(\gamma \in {\mathbb {R}}^+\).
Assumption 2
\(\lim _{p \rightarrow \infty }\sum _{s = 1}^p\nu _s = \nu _0 \in {\mathbb {R}}^+\).
Assumption 3
\(\lim _{n, p \rightarrow \infty } p/n = c = O(1)\).
Assumption 4
\(\lim _{p \rightarrow \infty } {\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}/p = \phi = O(1)\).
Assumption 1 helps us relax normality. By Assumption 2, \(\sum _{s = 1}^p\nu ^2_s = O(1)\), as \(p \rightarrow \infty \). Assumption 4 is only required under \(H_1\). We have the following theorem, proved in “Appendix B.1.”
Theorem 5
For \(T_1\) in Eq. (3), \((T_1 - E(T_1))/\sqrt{{{\mathrm{Var}}}(T_1)}\xrightarrow {{\mathcal {D}}} N(0, 1)\) under Assumptions 1–4, as \(n, p \rightarrow \infty \), where \(E(T_1)\) and \({{\mathrm{Var}}}(T_1)\) denote the mean and variance of \(T_1\).
From the proof of Theorem 5, \(E(T_1)\) and \({{\mathrm{Var}}}(T_1)\) approximate 1 and \(2{{\mathrm{tr}}}({\varvec{\Xi }}^2)/[{{\mathrm{tr}}}({\varvec{\Xi }})]^2\), respectively, \({\varvec{\Xi }} = n{\varvec{\Sigma }}/p\). As the limit follows from a weighted sum of \(\chi ^2_1\) variables, the moments in fact approximate those of a scaled Chi-square variable, say \(\chi ^2_f/f\) with moments 1 and \(2/f\), where \(f = f_1/f_2\), \(f_1 = [{{\mathrm{tr}}}({\varvec{\Xi }})]^2\), \(f_2 = {{\mathrm{tr}}}({\varvec{\Xi }}^2)\). Thus, to estimate \({{\mathrm{Var}}}(T_1)\), we need consistent estimators of \({{\mathrm{tr}}}({\varvec{\Sigma }}^2)\) and \([{{\mathrm{tr}}}({\varvec{\Sigma }})]^2\). Define \(Q = \sum _{k = 1}^n(\widetilde{\mathbf{X}}'_k\widetilde{\mathbf{X}}_k)^2/(n - 1)\), \(\widetilde{\mathbf{X}}_k = \mathbf{X}_k - \overline{\mathbf{X}}\), \(\eta = (n - 1)/[n(n - 2)(n - 3)]\). Then, \(E_2 = \eta \{(n - 1)(n - 2)\text {tr}(\widehat{\varvec{\Sigma }}^2) + [\text {tr}(\widehat{\varvec{\Sigma }})]^2 - nQ\}\) and \(E_3 = \eta \{2\text {tr}(\widehat{\varvec{\Sigma }}^2) + (n^2 - 3n + 1)[\text {tr}(\widehat{\varvec{\Sigma }})]^2 - nQ\}\) are unbiased and consistent estimators of \({{\mathrm{tr}}}({\varvec{\Sigma }}^2)\) and \([{{\mathrm{tr}}}({\varvec{\Sigma }})]^2\), respectively. Then \({\widehat{f}} = {\widehat{f}}_1/{\widehat{f}}_2\), with \({\widehat{f}}_1\), \({\widehat{f}}_2\) obtained by plugging \(E_3\), \(E_2\) into \(f_1\), \(f_2\) (the factor \(n^2/p^2\) cancels in the ratio), is a consistent estimator of f, hence \(\widehat{{{\mathrm{Var}}}}(T_1)\) of \({{\mathrm{Var}}}(T_1)\) such that \(\widehat{{{\mathrm{Var}}}}(T_1)/{{\mathrm{Var}}}(T_1) \rightarrow 1\); see Ahmad (2017b) and the end of Sect. 3. We have the following corollary.
Corollary 6
Theorem 5 remains valid when \({{\mathrm{Var}}}(T_1)\) is replaced with \({\widehat{{{\mathrm{Var}}}}}(T_1)\).
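The estimators \(E_2\) and \(E_3\) can be checked numerically. The sketch below, under the illustrative assumption \({\varvec{\Sigma }} = \mathbf{I}\) (so that \({{\mathrm{tr}}}({\varvec{\Sigma }}^2) = p\), \([{{\mathrm{tr}}}({\varvec{\Sigma }})]^2 = p^2\) and \(f = p\)), averages both estimators over Monte Carlo replications:

```python
import numpy as np

# Sketch of the unbiased trace estimators E2, E3 from the text,
# checked by Monte Carlo against tr(Sigma^2) = p and [tr(Sigma)]^2 = p^2
# for the illustrative choice Sigma = I with Gaussian data.
def E2_E3(X):
    n = X.shape[0]
    S = np.cov(X, rowvar=False)                        # Sigma_hat
    Xc = X - X.mean(axis=0)
    Q = np.sum(np.einsum('kj,kj->k', Xc, Xc) ** 2) / (n - 1)
    eta = (n - 1) / (n * (n - 2) * (n - 3))
    tr_S2, tr_S_sq = np.trace(S @ S), np.trace(S) ** 2
    E2 = eta * ((n - 1) * (n - 2) * tr_S2 + tr_S_sq - n * Q)
    E3 = eta * (2 * tr_S2 + (n ** 2 - 3 * n + 1) * tr_S_sq - n * Q)
    return E2, E3      # estimate tr(Sigma^2) and [tr(Sigma)]^2

rng = np.random.default_rng(6)
n, p, reps = 60, 40, 200
E2s, E3s = zip(*(E2_E3(rng.standard_normal((n, p))) for _ in range(reps)))
print(np.mean(E2s) / p, np.mean(E3s) / p ** 2)   # both near 1
f_hat = np.mean(E3s) / np.mean(E2s)              # f = p = 40 for Sigma = I
```

The averaged ratios settling near 1 reflect the unbiasedness of \(E_2\) and \(E_3\), and the ratio \(E_3/E_2\) recovers f without any p-scaling.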
Power of \(T_1\) Let \(z_{\alpha }\) be the \(100\alpha \%\hbox {th}\) quantile of N(0, 1) and \(\beta ({\varvec{\theta }})\) the power function of \(T_1\) with \({\varvec{\theta }} \in {\varvec{\Theta }}_0\) or \({\varvec{\theta }} \in {\varvec{\Theta }}_1\), where \({\varvec{\Theta }}_0 = \{\mathbf{0}\}\) and \({\varvec{\Theta }}_1 = {\varvec{\Theta }}\backslash \{\mathbf{0}\}\) are the respective parameter spaces under \(H_0\) and \(H_1\), with \({\varvec{\Theta }} = {\varvec{\Theta }}_0\cup {\varvec{\Theta }}_1\), \({\varvec{\Theta }}_0\cap {\varvec{\Theta }}_1 = \emptyset \). By Theorem 5, \(\beta ({\varvec{\theta }}) = \text {P}(z_1 \ge z_{\alpha })\) with \(\beta ({\varvec{\theta }}|H_0) = \alpha \), \(\beta ({\varvec{\theta }}|H_1) = 1 - \beta \), as \(n, p \rightarrow \infty \), where \(z_1 = (T_1 - E(T_1))/\sqrt{{{\mathrm{Var}}}(T_1)}\). Then, \(1 - \beta = P(z_1 \ge z_\alpha - n\delta )\), \(\delta = \delta _1/\delta _2\), \(\delta _1 = {\varvec{\mu }}'{\varvec{\mu }}/p\), \(\delta ^2_2 = \sum _{s = 1}^p\nu ^2_s\). By the convergence of \(nQ_1/p\), and as \(\delta _1\), \(\delta _2\) are uniformly bounded under the assumptions, \(1 - \beta \rightarrow 1\) as \(n, p \rightarrow \infty \).
Remark 7
A remark on the structure of \(T_1\) is in order. With \([nQ_1/p]/{{\mathrm{tr}}}({\varvec{\Xi }})\) converging in probability to 1, consider \(T_1 = 1 + nU_n/{{\mathrm{tr}}}({\varvec{\Sigma }})\), also ignoring p for convenience. Then, \(E(T_1) = 1 + n\Vert {\varvec{\mu }}\Vert ^2/{{\mathrm{tr}}}({\varvec{\Sigma }}) = 1 + E(\overline{\mathbf{X}})'E(\overline{\mathbf{X}})/{{\mathrm{tr}}}({{\mathrm{Cov}}}(\overline{\mathbf{X}}))\), where \(E(T_1) = 1\) under \(H_0\). In this sense, \(T_1\) is similar to an F-statistic: it is close to 1 under \(H_0\) and moves away from 1 as \({\varvec{\mu }}\) deviates from \(\mathbf{0}\). Since \({{\mathrm{tr}}}({{\mathrm{Cov}}}(\overline{\mathbf{X}})) + E(\overline{\mathbf{X}})'E(\overline{\mathbf{X}}) = E(\overline{\mathbf{X}}'\overline{\mathbf{X}})\), the partitioning used to define \(T_1\) not only helps adjust for the bias term but also makes the resulting statistic computationally much simpler, particularly under non-normality. A similar argument holds for the multi-sample tests presented in the next sections.
2.2 The two-sample case
For the multi-sample setup in Sect. 1, let \(g = 2\). We are interested to test \(H_{02}: {\varvec{\mu }}_1 = {\varvec{\mu }}_2\) versus \(H_{12}:\) Not \(H_{02}\). Let \(\overline{\mathbf{X}}_i = \sum _{k = 1}^{n_i}{} \mathbf{X}_{ik}/n_i\) and \(\widehat{\varvec{\Sigma }}_i = \sum _{k = 1}^{n_i}(\mathbf{X}_{ik} - \overline{\mathbf{X}}_i)(\mathbf{X}_{ik} - \overline{\mathbf{X}}_i)'/(n_i - 1)\) be unbiased estimators of \({\varvec{\mu }}_i\) and \({\varvec{\Sigma }}_i\). Denote \(n = n_1 + n_2\). Assuming normality, \({\varvec{\Sigma }}_i = {\varvec{\Sigma }} \forall ~i\) and \(n - 2 > p\), \(H_{02}\) is usually tested by two-sample \(T^2\), \(T^2 = [n_1n_2/n](\overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2)'\widehat{\varvec{\Sigma }}^{-1}(\overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2)\), where \(\widehat{\varvec{\Sigma }} = \sum _{i = 1}^2(n_i - 1)\widehat{\varvec{\Sigma }}_i/\sum _{i = 1}^2(n_i - 1)\) is an estimator of common \({\varvec{\Sigma }}\). For \(p > n - 2\) or more generally for \(p > n_i\), \(T^2\) is invalid by the same token as for its one-sample counterpart. We consider a likewise partition of \(Q = \Vert \overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2\Vert ^2 = \overline{\mathbf{X}}'_{1}\overline{\mathbf{X}}_{1} + \overline{\mathbf{X}}'_{2}\overline{\mathbf{X}}_{2} - 2\overline{\mathbf{X}}'_{1}\overline{\mathbf{X}}_{2}\) as
with \(Q_1 = \sum _{i = 1}^2Q_{i1} = \sum _{i = 1}^2{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i)/n_i = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_0)\), \(Q_{i1} = (E_i - U_{n_i})/n_i = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i)/n_i\), \(U_0 = \sum _{i = 1}^2U_{n_i} - 2U_{n_1n_2}\), where \(E_i = \sum _{k = 1}^{n_i}\mathbf{X}'_{ik}{} \mathbf{X}_{ik}/n_i\) and
are one- and two-sample U-statistics, respectively, with symmetric kernels as bilinear forms of independent vectors. As in the one-sample case, \(E(Q_{i1}) = {{\mathrm{tr}}}({\varvec{\Sigma }}_i)/n_i \Rightarrow E(Q_1) = {{\mathrm{tr}}}({\varvec{\Sigma }}_0)\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^2{\varvec{\Sigma }}_i/n_i\) and \(E(U_0) = \Vert {\varvec{\mu }}_1 - {\varvec{\mu }}_2\Vert ^2\) which vanishes under \(H_{02}\). Thus,
Again, \(E(Q_1)\) is independent of \({\varvec{\mu }}_i\), and \(E(U_0)\) is independent of \({\varvec{\Sigma }}_i\), under \(H_{02}\) and \(H_{12}\). Further, \(E(Q_1) = {{\mathrm{tr}}}({\varvec{\Sigma }}_0)\), \({\varvec{\Sigma }}_0 = {{\mathrm{Cov}}}(\overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2)\). We thus extend \(T_1\) in Eq. (3) for \(H_{02}\) as
where \(Q_0 = U_0/p\) is \(U_0\) with the kernels of \(U_{n_i}\) and \(U_{n_1n_2}\) scaled by p, i.e., \(h(\mathbf{x}_k, \mathbf{x}_r) = \mathbf{X}'_{ik}{} \mathbf{X}_{ir}/p\) and \(h(\mathbf{x}_k, \mathbf{x}_l) = \mathbf{X}'_{1k}{} \mathbf{X}_{2l}/p\), respectively. The following assumptions extend those of the one-sample case, where \(\nu _{is} = \lambda _{is}/p\) are the eigenvalues of \({\varvec{\Xi }}_i = {\varvec{\Sigma }}_i/p\), \(i = 1, 2\).
Assumption 8
\(E(X^4_{iks}) = \gamma _{is} \le \gamma < \infty \forall ~s = 1, \ldots , p\), \(i = 1, \ldots , g\), \(\gamma \in {\mathbb {R}}^+\).
Assumption 9
\(\lim _{p \rightarrow \infty }\sum _{s = 1}^p\nu _{is} = \sum _{s = 1}^\infty \nu _{is} = \nu _{i0} \in {\mathbb {R}}^+\), \(i = 1, \ldots , g\).
Assumption 10
\(\lim _{n_i, p \rightarrow \infty } p/n_i = c_i = O(1)\), \(i = 1, \ldots , g\).
Assumption 11
\(\lim _{n_i \rightarrow \infty } n/n_i = \rho _i = O(1)\), \(i = 1, \ldots , g\).
Assumption 12
\(\lim _{p \rightarrow \infty } {\varvec{\mu }}'_i{\varvec{\Sigma }}_k{\varvec{\mu }}_j/p = \phi _{ijk} \le \phi = O(1)\), \(i, j, k = 1, \ldots , g\).
As the same assumptions will be used in Sect. 3, they are stated for \(g \ge 2\). Assumption 11 is additional to those for the one-sample case. It is needed to keep the limit non-degenerate when \(n_i \rightarrow \infty \), \(n = \sum _{i = 1}^gn_i\). Assumption 12 is again needed only under \(H_{12}\). The following theorem, proved in “Appendix B.2”, extends Theorem 5 to the two-sample case.
Theorem 13
For \(T_2\) in Eq. (7), \((T_2 - E(T_2))/\sqrt{{{\mathrm{Var}}}(T_2)} \xrightarrow {{\mathcal {D}}} N(0, 1)\) under Assumptions 8–12, as \(n_i, p \rightarrow \infty \), where \(E(T_2)\) and \({{\mathrm{Var}}}(T_2)\) denote the mean and variance of \(T_2\).
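To make the two-sample construction concrete, the following sketch verifies the partition \(Q = Q_1 + U_0\) on simulated data; since Eq. (7) is not reproduced here, the statistic is illustrated in the assumed form \(T_2 = 1 + U_0/{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_0)\), which has mean 1 under \(H_{02}\) (the p-scaling of the kernels cancels in this ratio):

```python
import numpy as np

# Sketch of the two-sample split Q = Q1 + U0 under H_{02} with
# Gaussian data; sizes and seed are illustrative.
rng = np.random.default_rng(5)
n1, n2, p = 12, 24, 150
X1, X2 = rng.standard_normal((n1, p)), rng.standard_normal((n2, p))

def U_one(X):
    # one-sample U-statistic with kernel X_k'X_r, k != r
    n, G = X.shape[0], X @ X.T
    return (G.sum() - np.trace(G)) / (n * (n - 1))

xb1, xb2 = X1.mean(axis=0), X2.mean(axis=0)
Q = (xb1 - xb2) @ (xb1 - xb2)
U0 = U_one(X1) + U_one(X2) - 2 * (xb1 @ xb2)   # xb1'xb2 = U_{n1 n2}
Q1 = sum(np.trace(np.cov(X, rowvar=False)) / X.shape[0] for X in (X1, X2))
assert np.isclose(Q, Q1 + U0)   # the partition displayed above
T2 = 1 + U0 / Q1                # illustrative rescaling, mean 1 under H_{02}
print(T2)
```

The assertion confirms that \(Q_1 = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_0)\) is exactly the bias term removed from the squared norm.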
It is interesting to see how the limit for degenerate case sums up. With \(\nu _0\) as the limit of \(nQ_1/p\), it follows from (21) and (22) that (see e.g., Anderson et al. 1994)
with 1 and \(2\sum _{s = 1}^\infty (\rho _1\nu _{1s} + \rho _2\nu _{2s})^2/\nu ^2_0\) as limiting mean and variance, where the variance approximates \(2{{\mathrm{tr}}}({\varvec{\Xi }}^2)/[{{\mathrm{tr}}}({\varvec{\Xi }})]^2\), \({\varvec{\Xi }} = n{\varvec{\Sigma }}_0/p\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^2{\varvec{\Sigma }}_i/n_i\). By the same scaled Chi-square approximation argument as for the one-sample case, the moments correspond to those of \(\chi ^2_f/f\), i.e., 1 and \(2/f\), \(f = f_1/f_2\), \(f_1 = [{{\mathrm{tr}}}({\varvec{\Xi }})]^2\), \(f_2 = {{\mathrm{tr}}}({\varvec{\Xi }}^2)\). Let \(E_{2i} = \eta _i\{(n_i - 1)(n_i - 2)\text {tr}(\widehat{\varvec{\Sigma }}^2_i) + [\text {tr}(\widehat{\varvec{\Sigma }}_i)]^2 - n_iQ_i\}\), \(E_{3i} = \eta _i\{2\text {tr}(\widehat{\varvec{\Sigma }}^2_i) + (n^2_i - 3n_i + 1)[\text {tr}(\widehat{\varvec{\Sigma }}_i)]^2 - n_iQ_i\}\), where \(Q_i = \sum _{k = 1}^{n_i}(\widetilde{\mathbf{X}}'_{ik}\widetilde{\mathbf{X}}_{ik})^2/(n_i - 1)\), \(\widetilde{\mathbf{X}}_{ik} = \mathbf{X}_{ik} - \overline{\mathbf{X}}_i\), \(\eta _i = (n_i - 1)/[n_i(n_i - 2)(n_i - 3)]\). Further, by independence, \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_1\widehat{\varvec{\Sigma }}_2)\) is an unbiased and consistent estimator of \({{\mathrm{tr}}}({\varvec{\Sigma }}_1{\varvec{\Sigma }}_2)\). Plugging these estimators into \(f_1\), \(f_2\) leads to a consistent estimator of f, hence of \({{\mathrm{Var}}}(T_2)\), i.e., \({\widehat{{{\mathrm{Var}}}}}(T_2)\). We have the following corollary.
Corollary 14
Theorem 13 remains valid when \({{\mathrm{Var}}}(T_2)\) is replaced with \({\widehat{{{\mathrm{Var}}}}}(T_2)\).
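A sketch of the plug-in estimator of f for the two-sample case, combining \(E_{2i}\), \(E_{3i}\) and \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_1\widehat{\varvec{\Sigma }}_2)\) as described above; the setup (\({\varvec{\Sigma }}_i = \mathbf{I}\), Gaussian data, so the true value is \(f = [{{\mathrm{tr}}}({\varvec{\Sigma }}_0)]^2/{{\mathrm{tr}}}({\varvec{\Sigma }}^2_0) = 100\)) is illustrative, and the p-scaling cancels in the ratio \(f_1/f_2\):

```python
import numpy as np

# Sketch: plug-in estimator of f = [tr(Sigma0)]^2 / tr(Sigma0^2),
# Sigma0 = Sigma1/n1 + Sigma2/n2, using the estimators in the text.
def trace_estimators(X):
    n = X.shape[0]
    S = np.cov(X, rowvar=False)
    Xc = X - X.mean(axis=0)
    Qi = np.sum(np.einsum('kj,kj->k', Xc, Xc) ** 2) / (n - 1)
    eta = (n - 1) / (n * (n - 2) * (n - 3))
    tr_S2, tr_S_sq = np.trace(S @ S), np.trace(S) ** 2
    E2 = eta * ((n - 1) * (n - 2) * tr_S2 + tr_S_sq - n * Qi)  # tr(Sigma^2)
    E3 = eta * (2 * tr_S2 + (n**2 - 3*n + 1) * tr_S_sq - n * Qi)  # [tr(Sigma)]^2
    return E2, E3, S

rng = np.random.default_rng(3)
n1, n2, p = 30, 60, 100
X1, X2 = rng.standard_normal((n1, p)), rng.standard_normal((n2, p))
E21, E31, S1 = trace_estimators(X1)
E22, E32, S2 = trace_estimators(X2)
cross = np.trace(S1 @ S2)                # unbiased for tr(Sigma1 Sigma2)
f2_hat = E21/n1**2 + E22/n2**2 + 2*cross/(n1*n2)            # tr(Sigma0^2)
f1_hat = (E31/n1**2 + E32/n2**2
          + 2*np.trace(S1)*np.trace(S2)/(n1*n2))            # [tr(Sigma0)]^2
f_hat = f1_hat / f2_hat
print(f_hat)    # near f = 100 in this setup
```

The estimated variance of \(T_2\) then follows as \(2/{\widehat{f}}\), mirroring the scaled Chi-square approximation.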
Remark 15
Due to its special practical value, the two-sample test has been investigated the most, also for high-dimensional case. We briefly discuss three tests, most closely related to \(T_2\). Denote \(\kappa = n_1n_2/n\), \(\omega _1 = (n - 1)/(n - 2)\), \(\omega _2 = (n - 2)^2/n(n - 1)\), \(n = n_1 + n_2\). Let \(\xi = \Vert \overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2\Vert ^2 - {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})/\kappa \), where \(\widehat{\varvec{\Sigma }}\) is the pooled estimator of common \({\varvec{\Sigma }}\) as given in the context of \(T^2\) above.
Dempster (1958) proposed the first two-sample test for high-dimensional data under normality, motivated by a problem put forth by his colleagues (see Sect. 5). The test, in simpler form, is given as \(T_D = \Vert \overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2\Vert ^2/\kappa {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})\). An alternative form of \(T_D\) follows by partitioning the norm in the numerator into several independent quadratic forms using an orthonormal transformation, so that the test follows an approximate F distribution with degrees of freedom estimated using a scaled Chi-square distribution. See also Dempster (1960, 1968) for details, where Bai and Saranadasa (1996) give a detailed evaluation of the approximation and power of Dempster’s test.
Bai and Saranadasa (1996)’s test, \(T_\text {BS} = \kappa \xi /\sqrt{2\omega _1}B\), is a standardization of \(\xi \) under homoscedasticity, where \(B^2 = \omega _2\{{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}^2) - [{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})]^2/n\}\). Chen and Qin (2010)’s test, \(T_\text {CQ}\), is a standardization of \(U_0 = \sum _{i = 1}^2U_{n_i} - 2U_{n_1n_2}\); see (4). \(T_\text {CQ}\) is based on the same model as \(T_\text {BS}\) but relaxes normality and homoscedasticity. From the partition of Q in (4), it follows that, under homoscedasticity, \(T_\text {D}\) divides the norm by the bias term, whereas \(T_\text {BS}\) and \(T_\text {CQ}\) subtract the same bias term from the norm, so that the numerator in both tests is \(U_0\) with \(E(U_0) = \Vert {\varvec{\mu }}_1 - {\varvec{\mu }}_2\Vert ^2 = 0\) under \(H_{02}\); for \({\varvec{\Sigma }}_i = {\varvec{\Sigma }}\), \(i = 1, 2\), the two tests coincide.
The proposed test, \(T_g\), \(g \ge 1\), differs from both in that it uses the removed bias term to rescale the statistic, and it requires neither the normality nor the homoscedasticity assumption. Note that \(T_\text {CQ}\) is also defined without the two assumptions, but the bias adjustment, the assumptions and the computation of the variance of the statistic differ considerably between the two tests.
To get a more precise idea on the comparison of these tests, we did a simulation study to assess their test sizes and power. Two independent random samples of iid vectors of sizes (\(n_1\), \(n_2\)), \(n_1 \in \{10, 20, 50\}\), \(n_2 = 2n_1\), each of dimension \(p \in \{50, 100, 300, 500\}\), are generated from normal, \(t_7\) and Unif[0, 1] distributions with covariance matrices, \({\varvec{\Sigma }}_i\), i = 1, 2, compound symmetry, CS, and autoregressive of order 1, AR(1). The CS and AR(1) are defined, respectively, as \(\kappa \mathbf{I} + \rho \mathbf{J}\) and \(\text {Cov}(X_k, X_l) = \kappa \rho ^{|k - l|}\), \(\forall ~ k, l\), with \(\mathbf{I}\) as identity matrix and \(\mathbf{J}\) a matrix of 1s. For size, we pair \({\varvec{\Sigma }}_i\) for the two populations: both \({\varvec{\Sigma }}_1\) and \({\varvec{\Sigma }}_2\) CS with \(\rho = 0.5\) and \(\rho = 0.8\), respectively; \({\varvec{\Sigma }}_1\) as CS, \({\varvec{\Sigma }}_2\) as AR(1), both with \(\rho = 0.5\). For power, we use CS with \(\rho \) = 0.4 and 0.8. We take \(\kappa = 1\) for all cases. For brevity, power results are only reported for p = 100, for normal and t distributions.
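The covariance structures of the simulation design can be sketched as follows; function names and the Cholesky-based sampler are illustrative, not from the paper:

```python
import numpy as np

# Sketch of the simulation covariances: CS = kappa*I + rho*J and
# AR(1) with Cov(X_k, X_l) = kappa * rho^{|k-l|}, kappa = 1 as in the text.
def cs_cov(p, rho, kappa=1.0):
    return kappa * np.eye(p) + rho * np.ones((p, p))

def ar1_cov(p, rho, kappa=1.0):
    idx = np.arange(p)
    return kappa * rho ** np.abs(idx[:, None] - idx[None, :])

def sample(n, Sigma, rng):
    # rows ~ N(0, Sigma) via a Cholesky factor (normal case only)
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, Sigma.shape[0])) @ L.T

rng = np.random.default_rng(2)
S1, S2 = cs_cov(50, 0.5), ar1_cov(50, 0.5)      # heteroscedastic pair
X1, X2 = sample(10, S1, rng), sample(20, S2, rng)  # n2 = 2*n1, unbalanced
print(X1.shape, X2.shape)
```

The \(t_7\) and Unif[0, 1] cases of the design would replace the Gaussian generator while keeping the same covariance targets.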
Table 1 reports estimated test sizes of \(T_2\), \(T_\text {BS}\) and \(T_\text {CQ}\) for all distributions with both pairs of \({\varvec{\Sigma }}_i\). We observe an accurate performance of \(T_2\) for all parameters, whereas \(T_\text {BS}\) and \(T_\text {CQ}\) prove to be very liberal and very conservative, respectively, with their performance not improving with increasing p or (particularly) increasing n. The inaccuracy of \(T_\text {BS}\) may be attributed to the homoscedasticity assumption on which the test is based and which is violated in the simulations. The performance of \(T_\text {CQ}\), on the other hand, can be ascribed to its assumptions, particularly the vanishing of trace ratios such as \({{\mathrm{tr}}}({\varvec{\Sigma }}^4)/[{{\mathrm{tr}}}({\varvec{\Sigma }}^2)]^2\), \({{\mathrm{tr}}}({\varvec{\Sigma }}^2)/[{{\mathrm{tr}}}({\varvec{\Sigma }})]^2\) and \({{\mathrm{tr}}}({\varvec{\Sigma }}^3)/{{\mathrm{tr}}}({\varvec{\Sigma }}){{\mathrm{tr}}}({\varvec{\Sigma }}^2)\), which are not satisfied for certain covariance structures, e.g., compound symmetry. A discussion of \(T_g\) is deferred to Sect. 4, where it is evaluated in more detail.
From Fig. 1, we also observe that the power of \(T_2\) is higher than that of its competitors, with the curves coming closer as the non-centrality parameter and the sample sizes increase; this pattern is very similar for both distributions. Generally, a similar comparative performance and effect of sample sizes are observed for other values of p; hence, not all results are reported here.
3 Multi-sample test: one-way MANOVA
Here, we extend \(T_2\) to the general case, \(g \ge 2\). As usual, \(\overline{\mathbf{X}}_i\) and \(\widehat{\varvec{\Sigma }}_i\) are unbiased estimators of \({\varvec{\mu }}_i\) and \({\varvec{\Sigma }}_i\), \(i = 1, \ldots , g\). Recall \(T_2\) in (7) as a modification of \(T^2\) using the Euclidean distance \(\Vert \overline{\mathbf{X}}_1 - \overline{\mathbf{X}}_2\Vert ^2\). For \(H_{0g}\), we sum over all pairwise norms, \(\sum _{i< j}\Vert \overline{\mathbf{X}}_i - \overline{\mathbf{X}}_j\Vert ^2 = (g - 1)\sum _{i = 1}^g(E_i - U_{n_i})/n_i + \sum _{i< j}(U_{n_i} + U_{n_j} - 2U_{n_in_j}) = (g - 1)\sum _{i = 1}^g{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i)/n_i + (g - 1)\sum _{i = 1}^gU_{n_i} - 2\sum _{i < j}^gU_{n_in_j}\), and define the MANOVA statistic as
\(Q_1 = \sum _{i = 1}^gQ_{i1}\), \(Q_{i1} = (E_i - U_{n_i})/n_i = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i)/n_i\), \(E_i = \sum _{k = 1}^{n_i}\mathbf{X}'_{ik}{} \mathbf{X}_{ik}/n_i\), \(Q_0 = \sum _{i < j}Q_{0ij}\), \(Q_{0ij} = U_{n_i} + U_{n_j} - 2U_{n_in_j}\), where \(U_{n_i}\), \(U_{n_in_j}\) are as defined in (5) with kernels \(h(\mathbf{x}_{ik}, \mathbf{x}_{ir}) = \mathbf{X}'_{ik}{} \mathbf{X}_{ir}\), \(k \ne r\), \(h(\mathbf{x}_{ik}, \mathbf{x}_{jl}) = \mathbf{X}'_{ik}{} \mathbf{X}_{jl}\), \(i \ne j\), \(k, r, l = 1, \ldots , n_i\), \(i, j = 1, \ldots , g\), \(n = \sum _{i = 1}^gn_i\). Further, \(Q_1 = {{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_0)\), \(\widehat{\varvec{\Sigma }}_0 = \sum _{i = 1}^g\widehat{\varvec{\Sigma }}_i/n_i\), is an unbiased estimator of \({{\mathrm{tr}}}({\varvec{\Sigma }}_0)\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^g{\varvec{\Sigma }}_i/n_i\). We begin with the moments of \(Q_0\). In particular,
where the covariances vanish when \(i \ne i', j \ne j'\). Denoting \({\varvec{\Sigma }}_{0ij} = {\varvec{\Sigma }}_i/n_i + {\varvec{\Sigma }}_j/n_j\), \(i < j\), and using the moments of U-statistics from Sect. A.2, we obtain
Theorem 16 summarizes the moments which reduce to those of two-sample case for \(g = 2\).
Theorem 16
For \(Q_0\) defined above, we have
Now, consider the limit of \(T_g\) under Assumptions 8–12. By the independence of g samples, the convergence of \(Q_1\) follows exactly as for \(g = 2\) so that, as \(n_i, p \rightarrow \infty \),
where \(\nu _0 = \sum _{i = 1}^g\rho _i\nu _{i0} = \sum _{i = 1}^g\sum _{s = 1}^\infty \rho _i\nu _{is}\). For the limit of \(Q_0\), we note from the formulation \((g - 1)\sum _{i = 1}^gU_{n_i} - 2\sum _{i < j}^gU_{n_in_j}\), and by the independence of \(U_{n_i}\), \(U_{n_j}\), \(i \ne j\), that we need the distribution of \(\mathbf{U}_N = (U_{n_i}, U_{n_in_{i + 1}}, \ldots , U_{n_in_g})'\), \(i = 1, \ldots , g - 1\). Alternatively, we can consider \(\mathbf{Q}_0 = (Q_{012}, \ldots , Q_{0g-1,g})'\). For \(g = 2\), \(\mathbf{U}_N = (U_{n_1}, ~U_{n_1n_2})'\), \(\mathbf{Q}_0 = Q_{012}\). We can use either of the two options and proceed as for \(g = 2\). \(\mathbf{Q}_0\) is a \(G\times 1\) vector, \(G = g(g - 1)/2\), with \({{\mathrm{Cov}}}(\mathbf{Q}_0)\) a \(G\times G\) partitioned matrix \({\varvec{\Lambda }} = ({\varvec{\Lambda }}_{ij}/p^2)_{i, j = 1}^G\) where
Thus, \({\varvec{\Lambda }}_{ii}/p^2 = {{\mathrm{Cov}}}(\mathbf{Q}_{0i})\): \((g - i)\times (g - i)\), and \({\varvec{\Lambda }}_{ij}/p^2 = {{\mathrm{Cov}}}(\mathbf{Q}_{0i}, \mathbf{Q}_{0j})\): \((g - i)\times (g - j)\), \({\varvec{\Lambda }}_{ji} = {\varvec{\Lambda }}'_{ij}\), \(i = 1, \ldots , g-1\), \(j = i+1, \ldots , g\). Denote \(a_i = {{\mathrm{tr}}}({\varvec{\Sigma }}^2_i/n^2_i)\), \(a_{0ij} = {{\mathrm{tr}}}({\varvec{\Sigma }}^2_{0ij})\). Then \({\varvec{\Lambda }}_{ii} = 2(\oplus _{j = i + 1}^ga_{0ij} + (\mathbf{J} - \mathbf{I})_{g-i}a_i)/p^2\), \({\varvec{\Lambda }}_{ij} = 2(\mathbf{0}' ~~ \mathbf{1}'_{g-i}a_i ~~ \oplus _{j = i+2}^ga_j)'/p^2\), where \(\mathbf{1}\) is vector of 1s, \(\mathbf{J} = \mathbf{11}'\), \(\mathbf{I}\) is identity matrix, \(\oplus \) is Kronecker sum and \(\mathbf{0}\) in \({\varvec{\Lambda }}_{ij}\) is \((j-i-1)\times (g-j)\) with no \(\mathbf{0}\) if \(j - i - 1 = 0\). For any i, \({\varvec{\Lambda }}_{ii}\) has same off-diagonal element \(a_i\) with diagonal elements \(a_{0ij} = {{\mathrm{tr}}}({\varvec{\Sigma }}^2_{0ij})\), \({\varvec{\Sigma }}_{0ij} = {\varvec{\Sigma }}_i/n_i + {\varvec{\Sigma }}_j/n_j = {{\mathrm{Cov}}}(\overline{\mathbf{X}}_i - \overline{\mathbf{X}}_j)\), \(j = i +1\). Further, most off-diagonals in \({\varvec{\Lambda }}_{ij}\) are 0, and the number of (rows with) zeros increases with j for every i, making \({\varvec{\Lambda }}\) an increasingly sparse matrix.
The weak convergence holds for \(Q_{0ij}\) for any (i, j) in \(\mathbf{Q}_0\), and we only need to take care of the nonzero off-diagonal elements in \({\varvec{\Lambda }}\), i.e., \(a_i/p^2\), which are uniformly bounded under the assumptions; the same holds for Eqs. (9)–(11). The limit of \(n\mathbf{Q}_0\), hence of \(nQ_0\), then follows as that of \(\mathbf{U}_N\) for \(g = 2\). Finally, Slutsky's lemma gives the limit of \(T_g\). For the limit under \(H_{0g}\), \(E(\mathbf{Q}_0) = \mathbf{0}\), all covariances of U-statistics vanish (Sect. A.2) and Eqs. (9)–(11) reduce to \({{\mathrm{Var}}}(Q_{0ij}) = 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2_{0ij})\), \({{\mathrm{Cov}}}(Q_{0ij}, Q_{0ij'}) = 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2_i)/n^2_i\), \({{\mathrm{Cov}}}(Q_{0ij}, Q_{0i'j}) = 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2_j)/n^2_j\), which are independent of \({\varvec{\mu }}_i\), so that we may assume \({\varvec{\mu }}_i = \mathbf{0}~\forall ~i\) without loss of generality. In particular, from Theorem 16, \(E(Q_0) = 0\) under \(H_{0g}\) and
which is \(2{{\mathrm{tr}}}({\varvec{\Sigma }}^2_{012})\) for \(g = 2\); see Eq. (19). The null limit then also follows on the same lines as for \(g = 2\). The following theorem generalizes Theorem 13 for \(g \ge 2\) samples.
Theorem 17
For \(T_g\) in Eq. (8), \((T_g - E(T_g))/\sqrt{{{\mathrm{Var}}}(T_g)}\xrightarrow {{\mathcal {D}}} N(0, 1)\) under Assumptions 8–12, as \(n_i, p \rightarrow \infty \), where \(E(T_g)\) and \({{\mathrm{Var}}}(T_g)\) denote the mean and variance of \(T_g\).
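The building blocks of \(T_g\) can be illustrated for \(g = 3\), verifying the pairwise-norm decomposition stated at the start of this section; since Eq. (8) is not reproduced here, the final rescaling below uses \(\sum _{i < j}\Vert \overline{\mathbf{X}}_i - \overline{\mathbf{X}}_j\Vert ^2/{{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_0)\), an assumed form consistent with \(E(T_g) = g - 1\) under \(H_{0g}\):

```python
import numpy as np

# Sketch of the MANOVA building blocks for g = 3 Gaussian samples
# under H_{0g}; sizes and seed are illustrative.
rng = np.random.default_rng(4)
p, ns = 120, [10, 15, 20]
Xs = [rng.standard_normal((n, p)) for n in ns]
g = len(Xs)

def U_one(X):                      # one-sample U-statistic, kernel X_k'X_r
    n, G = X.shape[0], X @ X.T
    return (G.sum() - np.trace(G)) / (n * (n - 1))

def U_two(X, Y):                   # two-sample U-statistic, kernel X_k'Y_l
    return X.mean(axis=0) @ Y.mean(axis=0)

xbars = [X.mean(axis=0) for X in Xs]
Q_pair = sum((xbars[i] - xbars[j]) @ (xbars[i] - xbars[j])
             for i in range(g) for j in range(i + 1, g))

tr_Sig0 = sum(np.trace(np.cov(X, rowvar=False)) / X.shape[0] for X in Xs)
rhs = ((g - 1) * tr_Sig0 + (g - 1) * sum(U_one(X) for X in Xs)
       - 2 * sum(U_two(Xs[i], Xs[j])
                 for i in range(g) for j in range(i + 1, g)))
assert np.isclose(Q_pair, rhs)     # decomposition of the pairwise norms
T_g = Q_pair / tr_Sig0             # assumed rescaling, mean g - 1 under H_{0g}
print(T_g)
```

For \(g = 2\) the same code reduces to the two-sample sketch, with \(Q_{\text {pair}} = Q_1 + U_0\).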
For the moments of \(T_g\), note that the general distribution follows from the projection \({\widehat{Q}}_0 = \sum _{i < j}^g{\widehat{Q}}_{0ij} = \sum _{i < j}^g({\widehat{U}}_{n_i} + {\widehat{U}}_{n_j} - 2{\widehat{U}}_{n_in_j})\) of \({\widetilde{Q}}_0 = \sum _{i < j}^g{\widetilde{Q}}_{0ij}\), where \({\widetilde{Q}}_{0ij} = Q_{0ij} - E(Q_{0ij})\), so that
Likewise, under \(H_{0g}\), the convergence of degenerate \(U_{n_i}\) and \(U_{n_in_j}\) gives
such that the limiting moments \(E(nQ_0) = 0\) and \({{\mathrm{Var}}}(nQ_0) = 2\sum _{i < j}^g\sum _{s = 1}^\infty (\rho _i\nu _{is} + \rho _j\nu _{js})^2\) approximate the exact moments of \(Q_0\) under \(H_{0g}\). Combined with the limit of \(nQ_1/p\), it gives
with \(E(T_g) = g - 1\) and variance \({{\mathrm{Var}}}(T_g) = 2\sum _{i < j}^g\sum _{s = 1}^\infty (\rho _i\nu _{is} + \rho _j\nu _{js})^2/\nu ^2_0\), which approximates \(2{{\mathrm{tr}}}({\varvec{\Xi }}^2)/[{{\mathrm{tr}}}({\varvec{\Xi }})]^2\), \({\varvec{\Xi }} = n{\varvec{\Sigma }}_0/p\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^g{\varvec{\Sigma }}_i/n_i\). Further, \(z_{ijs} = \sqrt{\rho _i\nu _{is}}z_{is} - \sqrt{\rho _j\nu _{js}}z_{js}\) is a linear combination of independent N(0, 1) variables, hence itself normal with mean 0 and variance \(\rho _i\nu _{is} + \rho _j\nu _{js}\). To estimate \({{\mathrm{Var}}}(T_g)\), we note that the set of distinct non-zero elements in \({\varvec{\Lambda }}\) is
with cardinality \(s_0 = \#\{{\mathcal {S}}\} = g(g + 1)/2\), i.e., for any g, we only need to estimate \(s_0\) elements out of \(G(G + 1)/2\) in order to estimate \({\varvec{\Lambda }}\). With the estimators of \({{\mathrm{tr}}}({\varvec{\Sigma }}^2_i)\), \([{{\mathrm{tr}}}({\varvec{\Sigma }}_i)]^2\) and \({{\mathrm{tr}}}({\varvec{\Sigma }}_i{\varvec{\Sigma }}_j)\) the same as given in the two-sample case, a consistent plug-in estimator of \({{\mathrm{Var}}}(T_g)\) follows, leading to the following generalization of Corollary 14.
Corollary 18
Theorem 17 remains valid when \({{\mathrm{Var}}}(T_g)\) is replaced with \({\widehat{{{\mathrm{Var}}}}}(T_g)\).
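The limiting approximation \(2{{\mathrm{tr}}}({\varvec{\Xi }}^2)/[{{\mathrm{tr}}}({\varvec{\Xi }})]^2\) for \({{\mathrm{Var}}}(T_g)\) can be evaluated numerically; the sketch below (not the paper's plug-in estimator, which uses the U-statistic trace estimators discussed in Remark 19) computes it directly from hypothetical population matrices \({\varvec{\Sigma }}_i\) and sample sizes \(n_i\).

```python
import numpy as np

def var_Tg_approx(sigmas, ns, p):
    """Approximate Var(T_g) by 2 tr(Xi^2)/[tr(Xi)]^2, where Xi = n*Sigma0/p
    and Sigma0 = sum_i Sigma_i/n_i, as in the text following Theorem 17."""
    n = sum(ns)
    sigma0 = sum(S / ni for S, ni in zip(sigmas, ns))
    xi = n * sigma0 / p
    return 2.0 * np.trace(xi @ xi) / np.trace(xi) ** 2

# Toy check: for Sigma_i = I_p the ratio tr(Xi^2)/[tr(Xi)]^2 is 1/p,
# so the approximation reduces to 2/p regardless of the n_i.
p = 50
print(var_Tg_approx([np.eye(p)] * 3, [10, 15, 20], p))  # ~ 2/p = 0.04
```

Note how the approximation shrinks with p, reflecting the increasing stability of \(T_g\) in high dimensions.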
Power of \(T_g\) For \(z_{\alpha }\) as before, \(\text {P}(T_g \ge z_{\alpha }\sqrt{{{\mathrm{Var}}}(T_g)} + (g - 1)) = \alpha \). With \(z_g = (T_g - E(T_g))/\sqrt{{{\mathrm{Var}}}(T_g)}\), the power is \(1 - \beta = \text {P}(z_g \ge z_\alpha - n\delta )\), where \(\delta = \delta _1/\delta _2\), \(\delta _1 = \sum _{i < j}^g\Vert {\varvec{\mu }}_i - {\varvec{\mu }}_j\Vert ^2/p\), \(\delta ^2_2 = {{\mathrm{tr}}}({\varvec{\Xi }}_0)\), \({\varvec{\Xi }}_0 = n{\varvec{\Sigma }}_0/p\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^g{\varvec{\Sigma }}_i/n_i\). For \(g = 2\), \(\delta _1 = \Vert {\varvec{\mu }}_1 - {\varvec{\mu }}_2\Vert ^2/p\), \({\varvec{\Sigma }}_0 = \sum _{i = 1}^2{\varvec{\Sigma }}_i/n_i\). A case of particular interest is when the \({\varvec{\mu }}_i\) are mutually orthogonal, \({\varvec{\mu }}'_i{\varvec{\mu }}_j = 0\), \(\forall ~i < j\). The power function remains the same, now with \(\delta _1 = (g - 1)\sum _{i = 1}^g\Vert {\varvec{\mu }}_i\Vert ^2/p\) or, for \(g = 2\), \(\delta _1 = (\Vert {\varvec{\mu }}_1\Vert ^2 + \Vert {\varvec{\mu }}_2\Vert ^2)/p\).
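The power approximation above is straightforward to evaluate: since \(z_g\) is asymptotically standard normal, \(1 - \beta \approx \Phi (n\delta - z_\alpha )\). The sketch below uses hypothetical values of \(\delta _1\) and \(\delta _2\) (recall \(\delta _2 = \sqrt{{{\mathrm{tr}}}({\varvec{\Xi }}_0)}\)) purely for illustration.

```python
from statistics import NormalDist

def approx_power(delta1, delta2, n, alpha=0.05):
    """Asymptotic power 1 - beta = Phi(n*delta - z_alpha), delta = delta1/delta2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(n * (delta1 / delta2) - z_alpha)

# At delta1 = 0 (no shift) the 'power' equals the size alpha;
# it grows with the separation delta1 and the total sample size n.
print(approx_power(0.0, 1.0, 30))   # ~ 0.05
print(approx_power(0.1, 1.0, 30))   # well above alpha
```

The monotone dependence on \(n\delta \) mirrors the consistency of \(T_g\): for any fixed \(\delta > 0\), the power tends to 1 as \(n \rightarrow \infty \).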
Remark 19
This remark pertains to the trace estimators used to define consistent estimators of \({{\mathrm{Var}}}(T_g)\). Consider the one-sample case, where \(E_2\), \(E_3\), as estimators of \({{\mathrm{tr}}}({\varvec{\Sigma }}^2)\), \([{{\mathrm{tr}}}({\varvec{\Sigma }})]^2\) given after Theorem 5, are defined as functions of \(\widehat{\varvec{\Sigma }}\) to keep them simple in formulation and efficient in computation. Alternatively, however, the same estimators can be defined as U-statistics, which helps study their properties, particularly consistency, more conveniently. Let \(\mathbf{D}_{kr} = \mathbf{X}_k - \mathbf{X}_r\), \(k \ne r\), and define \(A_{kr} = \mathbf{D}'_{kr}\mathbf{D}_{kr}\), \(A^2_{krls} = (\mathbf{D}'_{kr}\mathbf{D}_{ls})^2\). Then, we can equivalently write
where \(B_{krls} = A^2_{krls} + A^2_{klrs} + A^2_{ksrl}\), \(C_{krls} = A_{kr}A_{ls} + A_{kl}A_{rs} + A_{ks}A_{lr}\), \(\pi (\cdot )\) means all indices pairwise unequal and \(P(n) = n(n - 1)(n - 2)(n - 3)\). This formulation makes \(E_2\), \(E_3\) readily amenable to the theory of U-statistics. For details, see Ahmad (2016). The form extends directly to multi-sample cases by defining \(E_{2i}\), \(E_{3i}\) for the ith independent sample in the same way, with \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i){{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_j)\) and \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }}_i\widehat{\varvec{\Sigma }}_j)\) estimating \({{\mathrm{tr}}}({\varvec{\Sigma }}_i){{\mathrm{tr}}}({\varvec{\Sigma }}_j)\) and \({{\mathrm{tr}}}({\varvec{\Sigma }}_i{\varvec{\Sigma }}_j)\) as usual, where a U-statistic form of \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})\) is \(\sum _{k \ne r}^nA_{kr}/n(n - 1)\). For details, see Ahmad (2017a, b).
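The U-statistic form of \({{\mathrm{tr}}}(\widehat{\varvec{\Sigma }})\) can be sketched as below; since \(E(A_{kr}) = E\Vert \mathbf{X}_k - \mathbf{X}_r\Vert ^2 = 2\,{{\mathrm{tr}}}({\varvec{\Sigma }})\), a factor 1/2 is included here (one choice of normalization, under which the average over \(k \ne r\) coincides exactly with the trace of the unbiased sample covariance matrix).

```python
import numpy as np

def trace_sigma_ustat(X):
    """U-statistic trace estimator from pairwise differences D_kr = X_k - X_r.
    Averages A_kr = ||D_kr||^2 over k != r; E(A_kr) = 2 tr(Sigma), hence the 1/2."""
    n = X.shape[0]
    # sum_{k,r} ||X_k - X_r||^2 = 2n * sum_k ||X_k||^2 - 2 ||sum_k X_k||^2
    sq = (X ** 2).sum()
    total = 2 * n * sq - 2 * (X.sum(axis=0) ** 2).sum()
    return total / (2 * n * (n - 1))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
# Coincides with tr(S), S the unbiased sample covariance matrix:
print(trace_sigma_ustat(X) - np.trace(np.cov(X, rowvar=False)))  # ~ 0
```

The Gram-matrix identity in the comment avoids forming all \(n(n-1)\) pairwise differences explicitly, keeping the cost at O(np).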
Remark 20
Note that the Chi-square approximation in both one- and multi-sample cases follows through a two-moment approximation of the limit of the test statistics with that of a scaled Chi-square variable. Box (1954a, b) used this approximation to study the violation of the assumptions of homoscedastic and uncorrelated errors in ANOVA settings, later extended and modified by Geisser and Greenhouse (1958), Greenhouse and Geisser (1959) and Huynh and Feldt (1970, 1976).
4 Simulations
We evaluate the accuracy of the tests for size control and power, focusing specifically on violations of the normality and homoscedasticity assumptions. We take g = 1 and 3 and generate data from Normal, Exponential and Uniform distributions with \(n = 10, 20, 50\) for \(T_1\) and \((n_1, n_2, n_3) = (10, 15, 20), (5, 25, 50), (10, 30, 60)\) for \(T_3\), where the last two triplets represent seriously unbalanced designs. For dimension, we take \(p \in \{50, 100, 300, 500, 1000\}\). For covariance structures, we use compound symmetry (CS) and autoregressive of order 1, AR(1), as defined in Sect. 2.2, with \(\mathbf{I}\) as the identity matrix and \(\mathbf{J}\) as the matrix of 1s, and unstructured (UN), defined as \({\varvec{\Sigma }} = (\sigma _{ij})_{i, j = 1}^d\) with \(\sigma _{ij} = 1(1)d\) (\(i = j\)), \(\rho _{ij} = (i - 1)/d\) (\(i > j\)). We use \(\rho = 0.5\), \(\kappa = 1\).
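The covariance structures used in the simulations can be sketched as follows; the CS form \(\kappa \mathbf{I} + \rho \mathbf{J}\) and the reading of the UN correlations as \((\max (i, j) - 1)/d\) are assumptions made here for illustration, since Sect. 2.2 is not reproduced above.

```python
import numpy as np

def cs(d, rho=0.5, kappa=1.0):
    # Compound symmetry: kappa*I + rho*J (assumed form, cf. Sect. 2.2)
    return kappa * np.eye(d) + rho * np.ones((d, d))

def ar1(d, rho=0.5):
    # AR(1): sigma_ij = rho^|i - j|
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def un(d):
    # Unstructured: variances 1, 2, ..., d; off-diagonal correlation (max(i,j) - 1)/d
    i = np.arange(1, d + 1)
    R = (np.maximum.outer(i, i) - 1) / d
    np.fill_diagonal(R, 1.0)
    return R * np.sqrt(np.outer(i, i))

for S in (cs(4), ar1(4), un(4)):
    assert np.allclose(S, S.T)       # all three are symmetric
print(np.diag(un(4)))                # variances 1, 2, 3, 4
```

Each sample is then drawn with its own structure, e.g., CS, AR and UN assigned to the three populations of \(T_3\).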
We use \(\alpha = 0.01, 0.05, 0.10\) and estimate the test size by averaging \(P(T\le T_o|H_0)\) over 1000 simulations, where \(T\) denotes \(T_1\) or \(T_3\) and \(T_o\) is their observed value under \(H_0\). Tables 2 and 3 report the estimated size and power of \(T_1\) for normal and exponential distributions, and Tables 4 and 5 report the same for \(T_3\) for all distributions. For power, we fix \(\alpha = 0.05\) and estimate the power by averaging \(P(T\ge T_o|H_1)\) over 1000 runs, where \(H_1\) is defined as \({\varvec{\mu }} = \delta _r\mathbf{p}_1\), \(\mathbf{p}_1 = (1/p, \ldots , p/p)\), \(\delta _r = 0.2(0.2)1\). Note that \(T_3\) is assessed under a triplet of covariance structures (CS, AR, UN), assigned in this order to the three populations.
We observe accurate size control for normal as well as non-normal distributions and under all covariance structures. The stability of the size control for increasing p, for n as small as 10, is also evident. We observe a similar performance for power, with discernibly better performance under the AR and UN structures than under CS, for all distributions, which might be attributed to the spiky nature of CS. The power, however, also improves reasonably under CS for increasing n and p. For \(g = 3\), we also observe accuracy for unbalanced designs, with a drastic improvement for the last triplet of \(n_i\). Although not reported here, similar results were observed for other \(\rho \) values in CS and AR, for other covariance structures, e.g., Toeplitz, and for other distributions, e.g., t.
We also assessed the power of the proposed tests under possible sparse alternatives. For simplicity, we report results for \(T_1\) for the normal distribution with the same n as used above and \(p \in \{60, 100, 200\}\). We consider three levels of sparsity: small, medium and large, with 25%, 50% and 75% zeros in the mean vector, respectively. Note that 0% sparsity corresponds to the full alternative \(H_1\), whereas 100% sparsity implies the null case. Table 6 reports the results. Generally, the power is high under all parameter settings, indicating the validity of the tests for such alternatives. Further, the power increases with increasing sample size, so that even under sparsity, the test shows a high probability of telling the null from the alternative, particularly as the sample size grows.
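The sparse alternatives can be generated as in the sketch below, combining the mean \({\varvec{\mu }} = \delta _r\mathbf{p}_1\) with a prescribed fraction of zero entries; the placement of the zeros is not specified in the text, so random positions are an assumption here.

```python
import numpy as np

def sparse_mean(p, delta, sparsity, rng):
    """Mean vector delta*(1/p, ..., p/p) with a given fraction of entries zeroed."""
    mu = delta * np.arange(1, p + 1) / p
    mu[rng.choice(p, size=int(sparsity * p), replace=False)] = 0.0
    return mu

rng = np.random.default_rng(0)
mu = sparse_mean(100, 0.4, 0.75, rng)
print((mu == 0).sum())  # 75 of the 100 entries are zero
```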
5 Analyses of real data sets
Figure 2 depicts average counts of macrobenthos observed along an approximately 2000 km long transect of the Norwegian continental shelf. The transect under observation comprised a range of water depths and sediment properties. A total of \(p = 809\) species were observed from \(n = 101\) independent sites in five different regions of the transect, where \(n_1 = 16\), \(n_2 = 21\), \(n_3 = 25\), \(n_4 = 19\), \(n_5 = 20\). Each count is a five-replicate pooled observation, and the data contain a large number of zeros where no species could be recorded. For details, see Ellingsen and Gray (2002).
In our notation, \(\mathbf{X} = (\mathbf{X}'_1, \ldots , \mathbf{X}'_5)' \in {\mathbb {R}}^{n\times p}\) represents the complete data matrix with regionwise data matrices \(\mathbf{X}_i = (\mathbf{X}'_{i1}, \ldots , \mathbf{X}'_{in_i})' \in {\mathbb {R}}^{n_i\times p}\), \(\mathbf{X}_{ik} \in {\mathbb {R}}^p\), where \(n_i\) and p are given above. It is thus an unbalanced one-way MANOVA experiment with \(g = 5\) independent samples, each of \(n_i\) iid vectors of dimension 809, where \(n = \sum _{i = 1}^5n_i = 101\). The linear model can be expressed as
where the vector \(\mathbf{X}_{ik}\) consists of 809 species counts measured for \(k\hbox {th}\) replicate (site) from ith region, \({\varvec{\mu }}_i \in {\mathbb {R}}^p\) is the true average count vector of \(i\hbox {th}\) region, and \({\varvec{\epsilon }}_{ik} \in {\mathbb {R}}^p\) are random error vectors, associated with each \(\mathbf{X}_{ik}\), with \(E({\varvec{\epsilon }}_{ik}) = \mathbf{0}\) and \({{\mathrm{Cov}}}({\varvec{\epsilon }}_{ik}) = {\varvec{\Sigma }}_i\forall ~k\), \(i = 1, \ldots , 5\). The hypothesis of interest can be formulated as \(H_{05}: {\varvec{\mu }}_1 = \ldots = {\varvec{\mu }}_5\) vs \(H_{15}: {\varvec{\mu }}_i \ne {\varvec{\mu }}_j\) for at least one pair \(i \ne j\), i, \(j = 1, \ldots , 5\). We use \(T_g\) in Eq. (8) to test \(H_{05}\).
We also apply the proposed tests to two well-known data sets, referred to here as the alcohol and leukemia data. The alcohol data are a two-group (\(g = 2\)) data set that motivated Dempster to construct the first two-sample high-dimensional test (Dempster 1958); see also Dempster (1960, 1968). The data consist of \(p = 59\) biochemistry measurements on \(n_1 = 8\) alcoholic and \(n_2 = 4\) control individuals aged 16–39 years; see also Beerstecher et al. (1950). The three-group (\(g = 3\)) leukemia data are often also used for classification. They consist of measurements on patients with acute lymphoblastic leukemia (ALL) carrying a chromosomal translocation involving the mixed-lineage leukemia (MLL) gene. A total of \(p = 11225\) gene expression profiles of leukemia cells are taken from patients in the ALL group (\(n_1 = 28\)), B-precursor ALL carrying an MLL translocation (\(n_2 = 24\)) and conventional B-precursor without MLL translocation (\(n_3 = 20\)); see Armstrong et al. (2002) for details.
Model (15) remains the same for the alcohol and leukemia data sets, with \(g = 2\) and 3, respectively, and with the corresponding sample sizes given above. The analyses of all three data sets are reported in Table 7. The first three columns report the data sizes, the next three the Chi-square approximation for \(T_g\), and the final two columns provide the corresponding normal approximation. Only for the alcohol data do the results provide evidence in support of the null hypothesis of no difference of mean vectors, whereas the hypotheses are significantly rejected for both the leukemia and species data. The conclusions for all three data sets are consistent across both approximations. In particular, the results for the species data substantiate what can be roughly witnessed in Fig. 2.
6 Discussion and remarks
Test statistics for high-dimensional mean vectors are presented. A unified strategy is proposed that systematically encompasses one- and multi-sample cases. The tests are constructed as linear combinations of U-statistics-based estimators and are valid for any distribution with finite fourth moment. The limiting distributions of the tests are derived under a few mild assumptions. Simulations are used to show the accuracy of the tests for moderate sample sizes and any large dimension. The tests are location invariant, so that the mean vectors need not be assumed zero. Owing to the singularity of the empirical covariance matrix in the high-dimensional case, an affine-invariant test is not possible, and location invariance is the best that can be achieved.
References
Ahmad, M.R.: A \(U\)-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting. Ann. Inst. Stat. Math. 66, 33–61 (2014)
Ahmad, M.R.: On testing sphericity and identity of a covariance matrix with large dimensions. Math. Methods Stat. 25, 121–132 (2016)
Ahmad, M.R.: Location-invariant multi-sample \(U\)-tests for covariance matrices with large dimension. Scand. J. Stat. 44, 500–523 (2017a)
Ahmad, M.R.: Location-invariant tests of homogeneity of large dimensional covariance matrices. J. Stat. Theory Pract. 11, 731–745 (2017b)
Anderson, N.H., Hall, P., Titterington, D.M.: Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. J. Multivar. Anal. 50, 41–54 (1994)
Aoshima, M., Yata, K.: Two-stage procedures for high-dimensional data. Seq. Anal. 30, 356–399 (2011)
Aoshima, M., Yata, K.: Asymptotic normality for inference on multisample high-dimensional mean vectors under mild conditions. Methodol. Comput. Appl. Probab. 17, 419–439 (2015)
Armstrong, S.A., Staunton, J.E., Silverman, L.B., et al.: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30, 41–47 (2002)
Bai, Z., Saranadasa, H.: Effect of high dimension: by an example of a two sample problem. Stat. Sin. 6, 311–329 (1996)
Beerstecher, E., Sutton, H.E., Berry, H.K., et al.: Biochemical individuality. V. Explorations with respect to the metabolic patterns of compulsive drinkers. Arch. Biochem. 29, 27–40 (1950)
Box, G.E.P.: Some theorems on quadratic forms applied in the study of analysis of variance problems, I: effect of inequality of variance in the one-way classification. Ann. Math. Stat. 25, 290–302 (1954a)
Box, G.E.P.: Some theorems on quadratic forms applied in the study of analysis of variance problems, II: effect of inequality of variance and of correlation between errors in the two-way classification. Ann. Math. Stat. 25, 484–498 (1954b)
Chen, S.X., Qin, Y.-L.: A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 38(2), 808–835 (2010)
Dempster, A.P.: A high dimensional two sample significance test. Ann. Math. Stat. 29, 995–1010 (1958)
Dempster, A.P.: A significance test for the separation of two highly multivariate small samples. Biometrics 16, 41–50 (1960)
Dempster, A.P.: Elements of Continuous Multivariate Analysis. Addison-Wesley, Reading (1968)
Duchesne, P., Francq, C.: Multivariate hypothesis testing using generalized and 2-inverses with applications. Statistics 49, 475–496 (2015)
Ellingsen, K.E., Gray, J.S.: Spatial patterns of benthic diversity: is there a latitudinal gradient along the Norwegian continental shelf? J. Anim. Ecol. 71, 373–389 (2002)
Feng, L., Zou, C., Wang, Z.: Multivariate sign-based high-dimensional test for the two-sample location problem. J. Am. Stat. Assoc. 111, 721–735 (2016)
Fujikoshi, Y.: Multivariate analysis for the case when the dimension is large compared to the sample size. J. Korean Stat. Soc. 33, 1–24 (2004)
Fujikoshi, Y., Ulyanov, V.V., Shimizu, R.: Multivariate Statistics: High-Dimensional and Large-Sample Approximations. Wiley, New York (2010)
Geisser, S., Greenhouse, S.W.: An extension of Box’s results on the use of the \(F\) distribution in multivariate analysis. Ann. Math. Stat. 29, 885–891 (1958)
Greenhouse, S.W., Geisser, S.: On methods in the analysis of profile data. Psychometrika 24, 95–112 (1959)
Hájek, J., Šidák, Z., Sen, P.K.: Theory of Rank Tests. Academic Press, Cambridge (1999)
Hu, J., Bai, Z.: A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices. Sci. China Math. 55, 1–19 (2015)
Hu, J., Bai, Z., Wang, C., Wang, W.: On testing equality of high dimensional mean vectors with unequal covariance matrices. Ann. Inst. Stat. Math. 69, 365–387 (2017)
Huynh, H., Feldt, L.S.: Conditions under which mean square ratios in repeated measurements designs have exact \(F\)-distributions. J. Am. Stat. Assoc. 65, 1582–1589 (1970)
Huynh, H., Feldt, L.S.: Estimation of the Box correction for the degrees of freedom from sample data in randomized block and split-plot designs. J. Educ. Stat. 1, 69–82 (1976)
Jiang, J.: Large Sample Techniques for Statistics. Springer, New York (2010)
Katayama, S., Kano, Y.: A new test on high-dimensional mean vectors without any assumption on population covariance matrix. Commun. Stat. Theory Methods 43, 5290–5304 (2014)
Koroljuk, V.S., Borovskich, Y.V.: Theory of \(U\)-Statistics. Kluwer, Dordrecht (1994)
Läuter, J.: Two new multivariate tests, in particular for a high dimension. Acta et Comment. Univ. Tartu. Math. 8, 179–186 (2004)
Läuter, J., Glimm, E., Kropf, S.: Multivariate tests based on left-spherically distributed linear scores. Ann. Stat. 26, 1972–1988 (1998). Corrections: 27, 1441
Lee, A.J.: U-Statistics: Theory and Practice. CRC, Boca Raton (1990)
Lehmann, E.L.: Elements of Large-Sample Theory. Springer, New York (1999)
Reed, M., Simon, B.: Methods of Modern Mathematical Physics: Functional Analysis, vol. I. Academic Press, Cambridge (1980)
Schott, J.R.: Some high-dimensional tests for a one-way MANOVA. J. Multivar. Anal. 98, 1825–1839 (2007)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, Weinheim (1980)
Shao, J.: Mathematical Statistics, 2nd edn. Springer, New York (2003)
van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
Wang, L., Peng, B., Li, R.: A high-dimensional nonparametric multivariate test for mean vector. J. Am. Stat. Assoc. 110, 1658–1669 (2015)
Acknowledgements
The author is thankful to the editor, the associate editor and anonymous referee for their constructive comments which helped improve the original version of the manuscript. Kari Ellingsen’s kind permission to use species data is also duly acknowledged.
Appendices
Some miscellaneous results
1.1 U-statistics
First, we set some notation. For details, see e.g., Serfling (1980), Koroljuk and Borovskich (1994), van der Vaart (1998) and Lehmann (1999). For iid \(X_i\), let \(h(X_1, \ldots , X_m): {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) denote the kernel of an mth order U-statistic, \(U_n\), with \(E(U_n) = \theta = E[h(\cdot )]\), its projections \(h_c(x_1, \ldots , x_c) = E[h(\cdot )|x_1, \ldots , x_c]\), \(h_m(\cdot ) = h(\cdot )\), and \(\xi _c = {{\mathrm{Var}}}[h_c(\cdot )]\), \(c = 1, \ldots , m\), so that \({{\mathrm{Var}}}(U_n) = \sum _{c = 1}^m\left( {\begin{array}{c}m\\ c\end{array}}\right) \left( {\begin{array}{c}n - m\\ m - c\end{array}}\right) \xi _{c}/\left( {\begin{array}{c}n\\ m\end{array}}\right) \). If \(0< \xi _c < \infty ~\forall ~c\), then \((U_n - E(U_n))/\sqrt{{{\mathrm{Var}}}(U_n)} \xrightarrow {{\mathcal {D}}} N(0, 1)\). For two U-statistics, \(U_{ni}\), of orders \(m_i\), kernels \(h_i(\cdot )\), projections \(h_{ic}(\cdot )\), \(i = 1, 2\), let \(\xi _{cc} = {{\mathrm{Cov}}}[h_{1c}(\cdot ), h_{2c}(\cdot )]\), \(c = 1, \ldots , m_1 \le m_2\). Then, \({{\mathrm{Cov}}}(U_{n1}, U_{n2}) = \sum _{c = 1}^{m_1}\left( {\begin{array}{c}m_2\\ c\end{array}}\right) \left( {\begin{array}{c}n - m_2\\ m_1 - c\end{array}}\right) \xi _{cc}/\left( {\begin{array}{c}n\\ m_1\end{array}}\right) \). Let \(U_{n_1n_2}\) be a U-statistic of two independent samples, with kernel \(h(X_{11}, \ldots , X_{1m_1}, X_{21}, \ldots , X_{2m_2})\), symmetric in each sample, projections \(h_{c_1c_2} = E[h(\cdot )|X_{11}, \ldots , X_{1c_1}; X_{21}, \ldots , X_{2c_2}]\), \(\xi _{c_1c_2} = {{\mathrm{Cov}}}[h(\cdot ), h_{c_1c_2}(\cdot )]\), \(\xi _{00} = 0\), \(c_i = 0, 1, \ldots , m_i\).
Then, \({{\mathrm{Var}}}(U_{n_1n_2}) = \sum _{c_1 = 0}^{m_1}\sum _{c_2 = 0}^{m_2}\left( {\begin{array}{c}m_1\\ c_1\end{array}}\right) \left( {\begin{array}{c}n_1 - m_1\\ m_1 - c_1\end{array}}\right) \left( {\begin{array}{c}m_2\\ c_2\end{array}}\right) \left( {\begin{array}{c}n_2 - m_2\\ m_2 - c_2\end{array}}\right) \xi _{c_1c_2}/\left( {\begin{array}{c}n_1\\ m_1\end{array}}\right) \left( {\begin{array}{c}n_2\\ m_2\end{array}}\right) \). If \(0 \le n_i/n \le 1\), \(n = n_1 + n_2\), \(0< \xi _{c_1c_2} < \infty \)\(\forall ~c_i\), then \((U_{n_1n_2} - E(U_{n_1n_2}))/\sqrt{{{\mathrm{Var}}}(U_{n_1n_2})}\)\(\xrightarrow {{\mathcal {D}}} N(0, 1)\).
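The one-sample variance formula can be checked numerically. The sketch below evaluates \({{\mathrm{Var}}}(U_n) = \sum _c \binom{m}{c}\binom{n-m}{m-c}\xi _c/\binom{n}{m}\) for the scalar kernel \(h(x_1, x_2) = x_1x_2\) (the \(p = 1\) version of the kernels in Sect. A.2) and compares it with the closed form given there; the values of \(\mu \) and \(\sigma ^2\) are arbitrary.

```python
from math import comb

def var_Un(n, m, xis):
    """Var(U_n) = sum_c C(m,c) C(n-m, m-c) xi_c / C(n,m), xi_c = Var[h_c]."""
    return sum(comb(m, c) * comb(n - m, m - c) * xis[c - 1]
               for c in range(1, m + 1)) / comb(n, m)

# Check against the closed form for h(x1, x2) = x1*x2 with scalar data,
# E(x) = mu, Var(x) = s2: xi_1 = mu^2 s2 and
# xi_2 = Var(x1*x2) = (s2 + mu^2)^2 - mu^4 = s2^2 + 2 mu^2 s2.
n, mu, s2 = 12, 1.5, 2.0
xi1 = mu**2 * s2
xi2 = s2**2 + 2 * mu**2 * s2
lhs = var_Un(n, 2, [xi1, xi2])
rhs = 2 * (2 * (n - 1) * mu**2 * s2 + s2**2) / (n * (n - 1))  # Sect. A.2 form
print(abs(lhs - rhs))  # ~ 0
```

Expanding the combinatorial sum gives \((4(n-2)\xi _1 + 2\xi _2)/n(n-1)\), which matches the Sect. A.2 expression term by term.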
Lemma 21
(Jiang 2010, p. 183; Hájek et al. 1999, p. 184) Let \(Y_1, Y_2, \ldots \) be iid random variables, \(E(Y_i) = 0\), \({{\mathrm{Var}}}(Y_i) = 1\). Let \(b_{ni}\) be a sequence of constants, \(1 \le i \le n\). Then \(\sum _{i = 1}^nb_{ni}Y_i \xrightarrow {{\mathcal {D}}} N(0, 1)\) given \(\max _ib^2_{ni} \rightarrow 0\), as \(n \rightarrow \infty \).
1.2 Basic moments of U-statistics
For \(U_{n_i}\), \(h(\mathbf{X}_{ik}, \mathbf{X}_{ir}) = \mathbf{X}'_{ik}\mathbf{X}_{ir}\), \(m = 2\), \(h_1(\mathbf{X}_{ik}) = {\varvec{\mu }}'_i\mathbf{X}_{ik}\), \(\xi _1 = {\varvec{\mu }}'_i{\varvec{\Sigma }}_i{\varvec{\mu }}_i\), \(\xi _2 = {{\mathrm{tr}}}({\varvec{\Sigma }}^2_i) + 2{\varvec{\mu }}'_i{\varvec{\Sigma }}_i{\varvec{\mu }}_i\). For \(U_{n_in_j}\), \(h(\mathbf{X}_{ik}, \mathbf{X}_{jl})= \mathbf{X}'_{ik}\mathbf{X}_{jl}\), \(m_1 = 1 = m_2\), \(h_{10} = {\varvec{\mu }}'_j\mathbf{X}_{ik}\), \(h_{01} = {\varvec{\mu }}'_i\mathbf{X}_{jl}\), \(h_{11}(\cdot ) = h(\cdot )\), \(\xi _{10} = {\varvec{\mu }}'_j{\varvec{\Sigma }}_i{\varvec{\mu }}_j\), \(\xi _{01} = {\varvec{\mu }}'_i{\varvec{\Sigma }}_j{\varvec{\mu }}_i\), \(\xi _{11} = {\varvec{\mu }}'_i{\varvec{\Sigma }}_j{\varvec{\mu }}_i + {\varvec{\mu }}'_j{\varvec{\Sigma }}_i{\varvec{\mu }}_j + {{\mathrm{tr}}}({\varvec{\Sigma }}_i{\varvec{\Sigma }}_j)\). Then, for \(i \ne j\), \(i \ne j'\), \(i' \ne j\), \({{\mathrm{Var}}}(U_{n_i}) = 2[2(n_i - 1){\varvec{\mu }}'_i{\varvec{\Sigma }}_i{\varvec{\mu }}_i + {{\mathrm{tr}}}({\varvec{\Sigma }}^2_i)]/n_i(n_i - 1)\), \({{\mathrm{Var}}}(U_{n_in_j}) = [n_i{\varvec{\mu }}'_i{\varvec{\Sigma }}_j{\varvec{\mu }}_i + n_j{\varvec{\mu }}'_j{\varvec{\Sigma }}_i{\varvec{\mu }}_j + {{\mathrm{tr}}}({\varvec{\Sigma }}_i{\varvec{\Sigma }}_j)]/n_in_j\), \({{\mathrm{Cov}}}(U_{n_i}, U_{n_in_j}) = 2{\varvec{\mu }}'_j{\varvec{\Sigma }}_i{\varvec{\mu }}_i/n_i\), \({{\mathrm{Cov}}}(U_{n_j}, U_{n_in_j}) = 2{\varvec{\mu }}'_i{\varvec{\Sigma }}_j{\varvec{\mu }}_j/n_j\), \({{\mathrm{Cov}}}(U_{n_in_j}, U_{n_in_{j'}}) = {\varvec{\mu }}'_j{\varvec{\Sigma }}_i{\varvec{\mu }}_{j'}/n_i\), \({{\mathrm{Cov}}}(U_{n_in_j}, U_{n_{i'}n_j}) = {\varvec{\mu }}'_i{\varvec{\Sigma }}_j{\varvec{\mu }}_{i'}/n_j\). See Sect. A.1 for basic notations and general moment expressions.
Main proofs
1.1 Proof of Theorem 5
First, \(E(Q_1/p) = {{\mathrm{tr}}}({\varvec{\Sigma }})/p = \sum _{s = 1}^p\nu _s\), bounded by \(\nu _0\), under Assumption 2, as \(p \rightarrow \infty \). Now
With \({{\mathrm{Var}}}(\mathbf{X}'_k\mathbf{X}_k) \le \gamma p^2\) under Assumption 1, \({{\mathrm{Var}}}(E/p) \le \gamma /n = O(1/n)\). From Sect. A.2, \({{\mathrm{Var}}}(U_n/p) = \gamma /n + 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2)/n(n - 1)p^2 + 4{\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}/(n - 1)p^2 = O(1/n)\) under the assumptions. Finally, \({{\mathrm{Cov}}}(E/p, U_n/p) = 0\) for \({\varvec{\mu }} = \mathbf{0}\), which can be assumed without loss of generality since \(E(Q_1/p)\) does not depend on \({\varvec{\mu }}\). Alternatively, by the Cauchy–Schwarz inequality, \({{\mathrm{Cov}}}(E/p, U_n/p) \le \left[ {{\mathrm{Var}}}(E){{\mathrm{Var}}}(U_n)\right] ^{1/2}\), which simplifies to O(1/n). This proves the consistency of \(nQ_1/p\). Note that this consistency holds both under simultaneous and sequential (n, p)-asymptotics, where in the latter case the last term vanishes with p, before the limit over n is carried out. Further, the limit is the same under \(H_0\) and \(H_1\).
Now, consider \(nU_n\) with \(h(\mathbf{x}_k, \mathbf{x}_r) = \mathbf{X}'_k\mathbf{X}_r/p\) so that \(E[h(\cdot )] = \Vert {\varvec{\mu }}\Vert ^2 = E(U_n)\). Define \({\widetilde{U}}_n = U_n - E(U_n)\) with corresponding kernel \({\widetilde{h}}(\cdot ) = h(\cdot ) - E[h(\cdot )]\). Let \({\widehat{U}}_n\) denote the projection of \({\widetilde{U}}_n\). As \(U_n\) is a second-order U-statistic with product kernel (a bilinear form of independent components), \(h(\cdot )\), following the notation in Sect. A.1,
with \(E({\widehat{U}}_n) = 0 = E({\widetilde{U}}_n)\) and \({{\mathrm{Cov}}}({\widehat{U}}_n, {\widetilde{U}}_n) = 4{\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}/n = {{\mathrm{Var}}}({\widehat{U}}_n)\) so that, with \({{\mathrm{Var}}}({\widetilde{U}}_n)\) as given above (see Sect. A.2), it follows that \({{\mathrm{Var}}}(n{\widehat{U}}_n)\) and \({{\mathrm{Var}}}(n{\widetilde{U}}_n)\) are uniformly bounded under Assumptions 2 and 4, such that \({{\mathrm{Var}}}(n{\widehat{U}}_n)/{{\mathrm{Var}}}(n{\widetilde{U}}_n) \rightarrow 1\); see e.g., Lehmann (1999, Ch.6), Serfling (1980, Ch.5) or van der Vaart (1998, Ch.12). This, along with the convergence of \(nQ_1/p\), gives normal limit of \(nU_n/[nQ_1/p]\), hence of T, by Slutsky’s theorem.
Some remarks concerning the aforementioned limit will help us extend it further under the null. To begin with, the first-order projection of \(h(\cdot )\), \(h_1(\mathbf{x}_k) = E[h(\cdot )|\mathbf{x}_k] = {\varvec{\mu }}'\mathbf{X}_k/p\), along with its variance, \(\xi _1 = {{\mathrm{Var}}}[h_1(\mathbf{x}_k)] = {\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}/p^2\), vanishes exactly under \(H_0: {\varvec{\mu }} = \mathbf{0}\), making the kernel (first-order) degenerate under \(H_0\). Note that, for the limit under \(H_1\) above, the term involving this projection, \(4{\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}/np^2\), is eventually bounded under Assumption 4, for simultaneous (n, p)-asymptotics, when used for \(nU_n\). Under sequential asymptotics, if \(p \rightarrow \infty \) first, then the projection vanishes asymptotically; the limit under \(H_1\) nevertheless still holds since the total variance \({{\mathrm{Var}}}(U_n)\) remains bounded under the assumptions. In fact, an additional advantage under sequential asymptotics is that the power of T then does not depend on any specific \({\varvec{\mu }}\).
Under \(H_0\), however, the projection and its variance \(\xi _1\) are exactly zero and the limit needs to be derived differently. Since \(E[h^2(\cdot )] = {{\mathrm{tr}}}({\varvec{\Sigma }}^2)/p^2 < \infty \) as a consequence of Assumption 2, the kernel is square integrable. As we shall see in the sequel, \(h(\cdot )\) being a product kernel makes it further convenient to derive the limit. Without loss of generality, we can assume that the data \(\mathbf{X}_k\) are generated by a separable (Hilbert) space \({\mathcal {L}}_2({\mathcal {X}}, {\mathcal {A}}, {\mathcal {P}})\). By symmetry and square integrability of \(h(\cdot )\), the map \(T: {\mathcal {L}}_2({\mathcal {X}}, {\mathcal {A}}, {\mathcal {P}}) \rightarrow {\mathcal {L}}_2({\mathcal {X}}, {\mathcal {A}}, {\mathcal {P}})\), being a (bounded, linear) integral operator, i.e.,
is self-adjoint and Hilbert–Schmidt. With \(\lambda \)’s and \(\nu \)’s introduced just before the assumptions, let \((\nu _s, f_s)\) form its orthonormal eigendecomposition, i.e., \(h(\mathbf{x}_k, \mathbf{x}_r) = \sum _{s = 0}^\infty \lambda _sf_s(\mathbf{x}_k)f_s(\mathbf{x}_r)\), where \(\sum _s\nu ^2_s < \infty \) and \(f_0 = 1\) corresponds to \(\lambda _0 = 0\). For details, see e.g., van der Vaart (1998) and Koroljuk and Borovskich (1994). By the Hilbert–Schmidt Theorem (Reed and Simon 1980, p. 203), the convergence of the kernel to its basis is in \({\mathcal {L}}_2\), i.e.,
A general theorem on the limit of a degenerate U-statistic under this setup is given in van der Vaart (1998, Theorem 12.10, p. 169) or Lee (1990, Theorem 1, p. 90). The limit holds for \(n^{c/2}U_{n,c}\) with variance \(c!E[h^2_c(\cdot )]\), where \(U_{n,c}\) is a U-statistic with (projected) kernel \(h_c(\cdot )\) and c is the least value for which \(h_c(\cdot )\) is non-degenerate (see Sect. A.1).
Thus, in the present context with \(m = 2\), \(c = 2\), \(nU_n\) has a finite limit with variance approximating \(2\xi _2 = 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2)/p^2 = 2\sum _{s = 1}^p\nu ^2_s\). Specifically, for first-order degeneracy, the limit is \([m(m - 1)/2]\sum _{s = 1}^\infty \nu _s(z^2_s - 1)\), where \(z_s\) are independent N(0, 1) variables; see Koroljuk and Borovskich (1994, Ch. ) and Shao (2003, Ch. 3). With \(m = 2\), we thus have, for \(n, p \rightarrow \infty \),
with \(z^2_s \sim \chi ^2_1\) iid, where the limiting mean is 0 and variance is \(2\sum _{s = 1}^\infty \nu ^2_s\) which approximates \(2{{\mathrm{tr}}}({\varvec{\Sigma }}^2)/p^2\). Combined with the limit of \(nQ_1/p\) by Slutsky’s theorem, we have
Now write \(\omega _s = \nu _s/\sum _s\nu _s\) such that \(\sum _s\omega _s = 1\) and \(\max \omega ^2_s \rightarrow 0\). Also let \(Y_s = (z^2_s - 1)/\sqrt{2}\) so that \(E(Y_s) = 0\), \({{\mathrm{Var}}}(Y_s) = 1\). Then, the normal limit follows by the Hájek–Šidák Lemma (Lemma 21).
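A seeded Monte Carlo sketch of the degenerate limit \(\sum _{s}\nu _s(z^2_s - 1)\), truncated to a few illustrative eigenvalues \(\nu _s = 2^{-s}\) (an assumption made here; any summable sequence works), confirms the limiting mean 0 and variance \(2\sum _s\nu ^2_s\).

```python
import numpy as np

rng = np.random.default_rng(42)
nu = 0.5 ** np.arange(1, 11)       # illustrative eigenvalues nu_s, s = 1..10
reps = 200_000
z2 = rng.chisquare(1, size=(reps, nu.size))
W = (z2 - 1) @ nu                  # draws of sum_s nu_s (z_s^2 - 1)
print(W.mean(), W.var())           # mean ~ 0, variance ~ 2*sum(nu_s^2) ~ 0.667
```

Normalizing \(W\) by \(\sum _s\nu _s\), as in the construction of \(\omega _s\) above, then yields the centered limit whose asymptotic normality follows from Lemma 21.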
1.2 Proof of Theorem 13
With \(Q_1\) composed of two independent components, the probability convergence of \(nQ_1/p\) follows exactly as in the one-sample case, so that
where \(\nu _0 = \sum _{i = 1}^2\rho _i\nu _{i0} = \sum _{i = 1}^2\sum _{s = 1}^\infty \rho _i\nu _{is}\), as \(n_i, p \rightarrow \infty \). Now consider \(Q_0\), which we first write as \(Q_0 = \mathbf{a}'\mathbf{U}_N\), where \(\mathbf{a} = (1~1~-2)'\) and \(\mathbf{U}_N = (U_{n_1}~U_{n_2}~U_{n_1n_2})'\), so that the limit of \(Q_0\) follows from that of \(\mathbf{U}_N\). Obviously \(E(Q_0) = \Vert {\varvec{\mu }}_1 - {\varvec{\mu }}_2\Vert ^2\) where, from “Appendix A.2”,
so that \({{\mathrm{Var}}}(Q_0) = \mathbf{a}'{{\mathrm{Cov}}}(\mathbf{U}_N)\mathbf{a}\) results in
Note that, as \(n_i, p \rightarrow \infty \), the terms involving \({\varvec{\mu }}'{\varvec{\Sigma }}{\varvec{\mu }}\) are finite under the assumptions, making \({{\mathrm{Cov}}}(n\mathbf{U}_N)\), hence \({{\mathrm{Var}}}(nQ_0)\), uniformly bounded, implying in turn that \(nQ_0\) may have a finite limit. Let \({\widehat{Q}}_0 = \sum _{i = 1}^2{\widehat{U}}_{n_i} - 2{\widehat{U}}_{n_1n_2}\) be the projection of \({\widetilde{Q}}_0 = \sum _{i = 1}^2{\widetilde{U}}_{n_i} - 2{\widetilde{U}}_{n_1n_2}\), where \({\widetilde{U}}_{n_i} = U_{n_i} - E(U_{n_i})\), with kernel \({\widetilde{h}}(\mathbf{x}_{ik}, \mathbf{x}_{ir}) = \mathbf{X}'_{ik}\mathbf{X}_{ir}/p - \Vert {\varvec{\mu }}_i\Vert ^2/p\), and similarly \({\widetilde{U}}_{n_1n_2}\) with kernel \({\widetilde{h}}(\mathbf{x}_{1k}, \mathbf{x}_{2l})\). Further, \({\widehat{U}}_{n_i}\) are as defined for the one-sample case, whereas
where \({\widetilde{h}}_{10}(\mathbf{x}_{1k}) = E[{\widetilde{h}}(\mathbf{x}_{1k}, \mathbf{x}_{2l})|\mathbf{x}_{1k}]\), and similarly \({\widetilde{h}}_{01}(\mathbf{x}_{2l})\); see Sect. A.1 for notation. Thus
is the projection of \({\widetilde{Q}}_0\) with \(E({\widehat{Q}}_0) = 0\), \({{\mathrm{Var}}}({\widehat{Q}}_0) = 4({\varvec{\mu }}_1 - {\varvec{\mu }}_2)'{\varvec{\Sigma }}_0({\varvec{\mu }}_1 - {\varvec{\mu }}_2)/p^2\). The term within brackets in Eq. (20) is the sum of two independent components, a direct extension of the one-sample case. By the same procedure, it follows that \({{\mathrm{Cov}}}({\widetilde{Q}}_0, {\widehat{Q}}_0) = {{\mathrm{Var}}}({\widehat{Q}}_0)\), so that \({{\mathrm{Var}}}(n{\widehat{Q}}_0)/{{\mathrm{Var}}}(n{\widetilde{Q}}_0) \rightarrow 1\). Under the assumptions, \(n{\widetilde{Q}}_0 = n{\widehat{Q}}_0 + o_P(1)\), with the limit of \(n{\widehat{Q}}_0\) following by the central limit theorem, which leads to the limit of \(n{\widetilde{Q}}_0\), hence of \(T_2\), by Slutsky's theorem.
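As an illustration of how \(Q_0\) is computed in practice, the following sketch implements the component U-statistics and their combination \(Q_0 = U_{n_1} + U_{n_2} - 2U_{n_1n_2}\). This is a minimal sketch with ad hoc names; for clarity the \(1/p\) scaling of the kernels is omitted, so that \(Q_0\) here is unbiased for \(\Vert {\varvec{\mu }}_1 - {\varvec{\mu }}_2\Vert ^2\) itself.

```python
import numpy as np

def u_stat_one(X):
    """One-sample U-statistic: sum over k != r of x_k' x_r, divided by n(n - 1)."""
    n = X.shape[0]
    s = X.sum(axis=0)
    # sum_{k,r} x_k' x_r = ||s||^2; subtract the diagonal terms k = r
    return (s @ s - np.einsum('ij,ij->', X, X)) / (n * (n - 1))

def u_stat_two(X1, X2):
    """Two-sample U-statistic: sum over all pairs (k, l) of x_{1k}' x_{2l} / (n1 n2)."""
    return (X1.sum(axis=0) @ X2.sum(axis=0)) / (X1.shape[0] * X2.shape[0])

def Q0(X1, X2):
    """Q_0 = a' U_N with a = (1, 1, -2)'; unbiased for ||mu_1 - mu_2||^2."""
    return u_stat_one(X1) + u_stat_one(X2) - 2.0 * u_stat_two(X1, X2)
```

The sum-based identities avoid the \(O(n_i^2p)\) double loops over pairs, which matters when \(p > n_i\) but the \(n_i\) are still moderately large.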
Now consider \(H_0\), under which the projection \({\widehat{Q}}_0 = 0\), making the component U-statistics, hence \(Q_0\), degenerate, so that the normal limit above is no longer valid under the null. To simplify matters, assume without loss of generality that \({\varvec{\mu }}_i = {\varvec{\mu }} = \mathbf{0}\), \(i = 1, 2\). Then
and \({{\mathrm{Var}}}(Q_0) = 2{{\mathrm{tr}}}({\varvec{\Sigma }}^2_0)/p^2\). This is again a direct extension of the one-sample case under \(H_0\), so that we can proceed similarly to obtain the limit, except that here we deal with a three-dimensional vector instead of a scalar. The limits of \(nU_{n_i}\), \(i = 1, 2\), then follow from (17) (see also Ahmad 2014) as
as \(n_i, p \rightarrow \infty \), where \(n = n_1 + n_2\). Writing \(n/n_1n_2 = [\sqrt{n/n_1}\sqrt{n/n_2}][1/\sqrt{n_1n_2}]\), the corresponding limit for \(nU_{n_1n_2}\) follows from Koroljuk and Borovskich (1994, Ch. 4) as
as \(n_i, p \rightarrow \infty \), where \(z_{is}\) are iid N(0, 1) variables in both limits and \(z_{1s}\), \(z_{2s}\) are also independent of each other. To combine the three limits, define \(w^2_{is} = \rho _i\nu ^2_{is}/\sum _s\rho _i\nu ^2_{is}\), \(i = 1, 2\) such that \(\lim _{p \rightarrow \infty }\max _sw^2_{is} = 0\). Then, a multivariate extension of Lemma 21 gives the normal limit
where \(\widetilde{\mathbf{U}}_N = (U_{n_1}/\sqrt{{{\mathrm{Var}}}(U_{n_1})}, ~U_{n_2}/\sqrt{{{\mathrm{Var}}}(U_{n_2})}, ~U_{n_1n_2}/\sqrt{{{\mathrm{Var}}}(U_{n_1n_2})})'\) is the standardized form of \(\mathbf{U}_N\), with each component having mean zero. Finally, under Assumption 2, with the covariance matrix diagonal, the limit extends by Slutsky's theorem to \(nQ_0/[nQ_1/p]\), and hence to \(T_2\), as a linear combination of the three components.
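The covariance structure used above can be checked by simulation. The sketch below is illustrative only (ad hoc names, \(1/p\) kernel scaling omitted): it generates data under \(H_0\) with \({\varvec{\mu }}_i = \mathbf{0}\), estimates \({{\mathrm{Cov}}}(\mathbf{U}_N)\) over replications, and verifies that the three component U-statistics are essentially uncorrelated, that \(E(Q_0) = 0\), and that \({{\mathrm{Var}}}(Q_0) = \mathbf{a}'{{\mathrm{Cov}}}(\mathbf{U}_N)\mathbf{a}\) with \(\mathbf{a} = (1~1~-2)'\).

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, p, reps = 30, 40, 50, 4000

def u_stats(X1, X2):
    """Return (U_{n1}, U_{n2}, U_{n1 n2}), without the 1/p kernel scaling."""
    s1, s2 = X1.sum(axis=0), X2.sum(axis=0)
    U1 = (s1 @ s1 - (X1 * X1).sum()) / (n1 * (n1 - 1))
    U2 = (s2 @ s2 - (X2 * X2).sum()) / (n2 * (n2 - 1))
    U12 = (s1 @ s2) / (n1 * n2)
    return U1, U2, U12

# Replicate under H_0: both samples from N_p(0, I)
U = np.array([u_stats(rng.standard_normal((n1, p)),
                      rng.standard_normal((n2, p))) for _ in range(reps)])
a = np.array([1.0, 1.0, -2.0])
C = np.cov(U.T)                       # estimated Cov(U_N); off-diagonals near 0
q0 = U @ a                            # Q_0 = a' U_N in each replication
print(np.round(C, 4))
print(q0.mean())                      # near 0: E(Q_0) = 0 under H_0
print(q0.var(ddof=1) - a @ C @ a)     # essentially 0: Var(Q_0) = a' Cov(U_N) a
```

The last identity holds exactly in-sample, since the sample variance of a linear combination equals the same quadratic form in the sample covariance matrix; the near-diagonal \({{\mathrm{Cov}}}(\mathbf{U}_N)\) is what lets the three limits be combined componentwise.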
Ahmad, M.R. A unified approach to testing mean vectors with large dimensions. AStA Adv Stat Anal 103, 593–618 (2019). https://doi.org/10.1007/s10182-018-00343-z