Abstract
Hotelling’s \(T^2\)-test for the mean of a multivariate normal distribution is one of the triumphs of classical multivariate analysis. It is uniformly most powerful among invariant tests and, among all tests, it is admissible, proper Bayes, and locally and asymptotically minimax. Nonetheless, investigators often prefer non-invariant tests, especially those obtained by selecting only a small subset of variables from which the \(T^2\)-statistic is to be calculated, because such reduced statistics are more easily interpretable for their specific application. Thus it is relevant to ask to what extent power is lost when variable selection is limited to very small subsets of variables, e.g. of size one (yielding univariate Student-\(t^2\) tests) or size two (yielding bivariate \(T^2\)-tests). This study presents preliminary evidence suggesting that in some cases no power need be lost, and in fact power may be gained, over a wide range of alternatives.
1 Introduction
This study is motivated by a re-examination of the variable-selection problem for Hotelling’s \(T^2\)-test (closely related to variable selection for linear discriminant analysis). After some notational preliminaries in §1.1, Hotelling’s \(T^2\) is reviewed in §1.2. The variable-selection problem is described in §1.3, where the substance of this investigation is described.
1.1 The Noncentral f-distribution
Let \(\chi _m^2(\lambda )\) denote a noncentral chi-square random variable with m degrees of freedom and noncentrality parameter \(\lambda >0\). The noncentral \(f_{m,n}(\lambda )\) distribution (nonnormalized) with m and n degrees of freedom and noncentrality parameter \(\lambda >0\) is the distribution of the ratio \(\chi _m^2(\lambda )/\chi _n^2\) (also denoted by \(f_{m,n}(\lambda )\)), where the numerator and denominator are independent chi-square random variables and \(\chi _n^2\equiv \chi _n^2(0)\). The upper \(\alpha \)-quantile of \(f_{m,n}\equiv f_{m,n}(0)\) is denoted by \(f_{m,n}^\alpha \), so that
The noncentral \(f_{m,n}\)-test of size \(\alpha \ge 0\) for the problem of testing \(\lambda =0\) vs. \(\lambda >0\) has power function given by
see Das Gupta and Perlman (1974), eqn.(2.1). Clearly \(\pi _\alpha (\lambda ;m,n)\) is decreasing in \(\alpha \), with \(\pi _0(\lambda ;m,n)=0\). Because \(f_{m,n}(\lambda )\) has strictly monotone likelihood ratio in \(\lambda \), \(\pi _\alpha (\lambda ;m,n)\) is strictly increasing in \(\lambda \).
It will be convenient to work with the (central) beta distribution \(b_{m,n}\):
whose probability density function (pdf) is given by
Clearly \(b_{m,n}= 1-b_{n,m}\). The upper and lower \(\alpha \)-quantiles of \(b_{m,n}\) are denoted by \(b_{m,n}^\alpha \) and \(b_{m,n;\alpha }\), respectively, so that
Later we shall need the following relation, obtained from Eqs. 4, 5, 6 and 7:
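The omitted display is presumably the quantile identity below, obtained from \(b_{m,n}=\chi _m^2/(\chi _m^2+\chi _n^2)\), so that \(f_{m,n}=b_{m,n}/(1-b_{m,n})\) (a reconstruction consistent with the surrounding definitions, not a quotation):

```latex
b_{m,n;\alpha} \;=\; 1-b_{n,m}^{\alpha},
\qquad
f_{m,n}^{\alpha}
  \;=\; \frac{b_{m,n}^{\alpha}}{1-b_{m,n}^{\alpha}}
  \;=\; \frac{1-b_{n,m;\alpha}}{b_{n,m;\alpha}} .
```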
1.2 Hotelling’s \(T^2\)-test
Let \(X_i:p\times 1\), \(i=1,\dots ,N\) (\(N\ge p+1\)) be a random sample from the p-dimensional multivariate normal distribution \(N_p(\mu ,\Upsigma )\), where \(\mu \ (p\times 1)\equiv (\mu _1,\dots ,\mu _p)' \in \mathbb {R}^p\) and \(\Upsigma \ (p\times p)\equiv (\sigma _{ij})\) is positive definite. The problem of testing
with \(\Upsigma \) unknown is invariant under the group action \(X_i\rightarrow A X_i\), \(i=1,\dots ,N\), where \(A\in GL(p)\), the group of all nonsingular \(p\times p\) matrices. A maximal invariant statistic under GL(p) is given by Hotelling’s \(T^2\) statistic:
where \(\bar{X}=N^{-1}\sum _{i=1}^NX_i\) and \(S=\sum _{i=1}^N(X_i-\bar{X})(X_i-\bar{X})'\). Its distribution is
where
is a maximal invariant parameter. Therefore the uniformly most powerful invariant size-\(\alpha \) test rejects \(H_0\) if \(T^2>f_{p,N-p}^\alpha \), with power function \(\pi _\alpha (\Uplambda ;p,N-p)\); cf. Anderson (2003, Theorem 5.6.1).Footnote 1
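For reference, the omitted displays for the statistic and its distribution presumably read as follows, consistent with the nonnormalized \(f_{m,n}\) of §1.1 (a reconstruction):

```latex
T^2 \;=\; N\,\bar{X}' S^{-1} \bar{X}
  \;\sim\; f_{p,\,N-p}(\Uplambda),
\qquad
\Uplambda \;=\; N\,\mu' \Upsigma^{-1} \mu .
```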
It is informative to express \(\Uplambda \) in terms of scale-free parameters, that is,
where \(R\equiv (\rho _{ij})\) is the \(p\times p\) correlation matrix determined by \(\Upsigma \) and
The testing problem (9) can be stated equivalently as that of testing
with R unknown.
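In symbols, the scale-free reparametrization presumably reads (a reconstruction consistent with the definitions of R and \(\Uplambda \) above):

```latex
\Uplambda \;=\; \Uplambda(\gamma,R) \;=\; N\,\gamma' R^{-1}\gamma,
\qquad
\gamma \;\equiv\; (\gamma_1,\dots,\gamma_p)',
\quad
\gamma_j \;=\; \mu_j/\sqrt{\sigma_{jj}} ,
```

so that the hypothesis \(\mu =0\) becomes \(\gamma =0\) with R unknown.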
1.3 The \(T^2\) Variable-selection Problem
Denote the components of \(\bar{X}\) by \(\bar{X}_j\), \(j=1,\dots ,p\), and those of S by \(s_{jk}\), \(j,k=1,\dots ,p\). Let \(\Upomega _p\) be the collection of all nonempty subsets of the index set \(I:=\{1,\dots ,p\}\). For \(\omega \in \Upomega _p\) denote the \(\omega \)-subvector of \(\bar{X}\) by \(\bar{X}_\omega \), the \(\omega \)-submatrix of S by \(S_\omega \), and similarly define \(\mu _\omega \), \(\gamma _\omega \), \(\Upsigma _\omega \), and \(R_\omega \). The \(T^2\)-statistic based on \((\bar{X}_\omega ,S_\omega )\) is given by
(\(T_I^2=T^2\), \(\Uplambda _I=\Uplambda \equiv \Uplambda (\gamma ,R)\).) The test that rejects \(H_0\) if \(T_\omega ^2>f_{|\omega |,N-|\omega |}^\alpha \) has size \(\alpha \) for \(H_0\), and its power function is given by
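Written out, the subset statistic, its distribution, and its noncentrality parameter are presumably (a reconstruction paralleling the full-dimensional \(T^2\)):

```latex
T_\omega^2 \;=\; N\,\bar{X}_\omega' S_\omega^{-1} \bar{X}_\omega
  \;\sim\; f_{|\omega|,\,N-|\omega|}(\Uplambda_\omega),
\qquad
\Uplambda_\omega \;=\; N\,\gamma_\omega' R_\omega^{-1}\gamma_\omega .
```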
This \(T_\omega ^2\)-test is not invariant under GL(p) but it is admissible for testing \(H_0\) vs. K, being a unique proper Bayes test for a prior distribution under which \(\{\mu \mid \mu _{I\setminus \omega }=0\}\) has prior probability 1; cf. Kiefer and Schwartz (1965); Marden and Perlman (1980).
This paper addresses the feasibility of finding a parsimonious subset \(\omega \) such that the \(T_\omega ^2\)-test maintains high power over a substantial portion of the alternative K. Because \((\gamma ,R)\) is unknown, variable selection in practice is traditionally approached by forward and/or backward selection procedures based on a preliminary sample that yields estimates of \((\gamma ,R)\); see the Appendix. At worst, all \(2^p-1\) nonempty subsets \(\omega \) must be considered.
Recently I consulted on such a variable-selection problem. The investigator, a research and development engineer, had observed 20 physiological variables (blood pressure, temperature, heart rate, etc.) on each of 100 subjects (the numbers are approximate). He wished to compare their responses to a new product design with their responses to the current design. The overall \(T^2\)-statistic, based on all 20 variables, indicated a significant difference between the two sets of responses. However, the client wished to find a more readily interpretable measure of difference, namely a \(T_\omega ^2\)-statistic based on a very small subset \(\omega \) of the 20 variables, hopefully with \(|\omega |=1\) or 2.
Such a desire is not atypical of investigators presented with a multivariate data analysis. This led me to wonder how much power would be lost by restricting variable selection to small subsets \(\omega \), for example to single variables or pairs of variables.
In fact some power might be gained. It is well known (e.g., Das Gupta and Perlman, 1974) that \(\pi _\alpha (\Uplambda _\omega ;\,|\omega |,\,N-|\omega |)\) is decreasing in \(|\omega |\) while increasing in \(\Uplambda _\omega \). Might the decreasing effect outweigh the increasing effect over a significant portion of the sample space? If so then restricting attention to small variable subsets might be desirable.
To state this more precisely, define
Thus \(\hat{\omega }_\alpha (\gamma ,R)\) is the (not necessarily unique) subset \(\omega \) of variables that maximizes the power of the size-\(\alpha \) \(T_\omega ^2\)-test to detect the alternative \((\gamma ,R)\) if the actual value of \((\gamma ,R)\) were revealed by an oracle. Whereas the admissibility of the overall size-\(\alpha \) \(T^2\)-test dictates that its power cannot be everywhere dominated by that of the size-\(\alpha \) \(T_\omega ^2\)-test when \(\omega \ne I\), might it happen that \(|\hat{\omega }_\alpha (\gamma ,R)|\) is small, perhaps 1 or 2, over a fairly wide range of parameter values \((\gamma ,R)\)? If so, then might one, with some confidence, limit variable selection to consideration of single variables (univariate \(t^2\)-tests) or pairs of variables (bivariate \(T^2\)-tests), as an alternative to simply applying the overall (p-variate) \(T^2\)-test?
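The oracle \(\hat{\omega }_\alpha (\gamma ,R)\) is simple to compute by brute force for moderate p. A minimal sketch (function and variable names are mine; exact powers via the noncentral F as in §1.1):

```python
# Brute-force oracle subset search: maximize the exact power of the
# size-alpha T_omega^2-test over all nonempty subsets omega,
# with Lambda_omega = N * gamma_omega' R_omega^{-1} gamma_omega.
from itertools import combinations

import numpy as np
from scipy.stats import f, ncf

def oracle_subset(alpha, gamma, R, N):
    p = len(gamma)
    best, best_power = None, -1.0
    for size in range(1, p + 1):
        for omega in combinations(range(p), size):
            sub = list(omega)
            lam = N * gamma[sub] @ np.linalg.solve(R[np.ix_(sub, sub)], gamma[sub])
            m, n = size, N - size
            pw = ncf.sf(f.isf(alpha, m, n), m, n, lam)   # exact power
            if pw > best_power:
                best, best_power = omega, pw
    return best, best_power

gamma, R = np.array([1.0, 0.0]), np.eye(2)
print(oracle_subset(0.05, gamma, R, 10))   # the singleton subset wins here
```

With all the signal in one coordinate and \(R=I\), the singleton carries the full noncentrality with fewer numerator degrees of freedom, so the oracle picks it over the full set.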
Of course, corrections for multiple testing must be considered for any variable-selection procedure before definitive conclusions can be drawn and procedures implemented; see §5 for a brief example. However, restriction to small variable subsets has another desirable property: there are relatively few such subsets compared to the \(2^p-1\) nonempty subsets in \(\Upomega _p\), which greatly reduces any correction factor. For example, when \(p=20\) as in the consulting problem cited above, there are 20 univariate subsets and \({20\atopwithdelims ()2}=190\) bivariate subsets, compared to \(2^{20}-1=1,048,575\) total nonempty subsets.
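The subset counts above, and the resulting reduction in a Bonferroni-style correction factor, are easy to verify:

```python
# Subset counts for p = 20: restricting selection to singletons and pairs
# shrinks a Bonferroni-style correction from over a million tests to 210.
from math import comb

p = 20
print(comb(p, 1), comb(p, 2))     # 20 univariate, 190 bivariate subsets
print(2**p - 1)                   # 1048575 nonempty subsets in all
print(comb(p, 1) + comb(p, 2))    # 210 tests if only small subsets are kept
```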
Such a radical suggestion flies in the face of 100 years of multivariate statistical theory, of which I have been but one of many proponents. This report presents preliminary evidence indicating that limitation of variable selection to low-dimensional tests may not be entirely inappropriate.
In Sections 2, 3, and 4, several examples are considered where tractable algebraic expressions for the asymptotic (\(\Uplambda _\omega \rightarrow \infty \)), local (\(\Uplambda _\omega \rightarrow 0\)), and/or exact values of \(\pi _\alpha (\Uplambda _\omega ;|\omega |, N-|\omega |)\) are available. These in turn can be utilized to compare the powers of \(T_\omega ^2\) and \(T^2\). These examples include both sparse and non-sparse mean-vector configurations, and the results may be the first that are based on algebraically-explicit power function comparisons of the low-dimensional and full-dimensional tests.
Examples 2.1 and 3.1 treat only the simplest possible case: the bivariate case (\(p=2\)) with \(N=3\).Footnote 2 Here it is shown that \(|\hat{\omega }_\alpha (\gamma ,R)|=1\) over large portions of the asymptotic and local regions of the alternative hypothesis K. This implies that the power of at least one of the two univariate Student \(t^2\)-tests (\(|\omega |=1\)) exceeds that of the overall (bivariate) \(T^2\)-test for most alternatives \((\gamma ,R)\) in these regions.
In Example 4.4 this result is extended to the entire alternative hypothesis K, both for \(N=3\) and \(N=5\), but only under the highly restrictive and vague condition that \(\alpha \) be sufficiently small, with “sufficiently small” determined by the value of the unknown noncentrality parameter; see Section 4.
Examples 2.2 and 3.2 go beyond the bivariate case. Here \(p\ge 3\), \(N=p+2\), and the powers of all possible bivariate \(T_\omega ^2\)-tests (\(|\omega |=2\)) are compared to the power of the overall (p-variate) \(T^2\) test, again only for asymptotic and local alternatives and only for very special configurations of \(\gamma \) and R. In these cases, admittedly highly restrictive, the bivariate \(T_\omega ^2\)-tests dominate the p-variate \(T^2\)-test over a substantial portion of the alternative hypothesis K. This does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\) but again suggests that variable selection might be limited to small variable subsets \(\omega \).
Together, the preliminary findings in this paper indicate the feasibility and potential benefit of limiting variable selection to small subsets, in particular to univariate or bivariate subsets. Further study will be needed to implement this approach to variable selection and to confirm its efficacy. See §5 and the Appendix for related comments.
2 Some Asymptotic Power Comparisons
The power function of the \(T_\omega ^2\)-test is
(recall (16) and (17)). It follows from eqn. (3.4) in Marden and Perlman (1980) that as \(\Uplambda _\omega \rightarrow \infty \),
Thus for two subsets \(\omega ,\,\omega '\) with \(\omega \subset \omega '\), there exists \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^*>0\) such that
Therefore power comparisons of \(T_\omega ^2\) and \(T_{\omega '}^2\) for distant alternativesFootnote 3 require determination of the lower quantiles \(b_{n,m;\alpha }\). This can be done explicitly in Examples 2.1 and 2.2 below. Although these examples are of very limited scope,Footnote 4 they begin to suggest that variable subset selection sometimes can be limited to very small subsets \(\omega \in \Upomega _p\), e.g., singletons in the bivariate Example 2.1, or pairs (including singletons) in Example 2.2.
To simplify the notation, set
The quantile \(b_{n,m;\alpha }\) satisfies
For the simple cases \(n=2\) or \(m=2\),
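With \(b_{m,n}\) read as a \(\textrm{Beta}(m/2,\,n/2)\) law (per the density in §1.1), the omitted closed forms are presumably \(b_{m,2;\alpha }=\alpha ^{2/m}\) and \(b_{2,n;\alpha }=1-(1-\alpha )^{2/n}\). A numerical check of this reading:

```python
# Closed-form lower alpha-quantiles of b_{m,n} ~ Beta(m/2, n/2) when one
# degrees-of-freedom parameter equals 2, checked against SciPy.
from scipy.stats import beta

def b_lower_n2(alpha, m):
    return alpha ** (2.0 / m)                 # cdf of Beta(m/2, 1) is x^{m/2}

def b_lower_m2(alpha, n):
    return 1.0 - (1.0 - alpha) ** (2.0 / n)   # cdf of Beta(1, n/2) is 1-(1-x)^{n/2}

assert abs(b_lower_n2(0.05, 3) - beta.ppf(0.05, 1.5, 1.0)) < 1e-9
assert abs(b_lower_m2(0.05, 3) - beta.ppf(0.05, 1.0, 1.5)) < 1e-9
```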
Example 2.1
In the bivariate case \(p=2\), abbreviate the singleton subsets \(\{1\}\) and \(\{2\}\) of \(\Upomega _2\) by 1 and 2 respectively. We shall compare the powers \(\pi _\alpha (\Uplambda _1;\,1,\,2)\) and \(\pi _\alpha (\Uplambda _2;\,1,\,2)\) of the two univariate size-\(\alpha \) \(t^2\)-tests to the power \(\pi _\alpha (\Uplambda ;\,2,\,1)\) of the overall (bivariate) size-\(\alpha \) \(T^2\)-test for distant alternatives.
Assume that \(\gamma _1\ne 0\) (recall (14)) and set
where \(-1<\rho <1\), so by Eqs. 13 and 17,
Without loss of generality we can assume that \(|\gamma _1|\ge |\gamma _2|\), so \(0\le \eta ^2\le 1\) and
The alternative hypotheses K can be represented as
while \(\hat{\omega }_\alpha (\gamma ,R)\) can be re-expressed as \(\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )\).
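Under this parametrization (with \(\eta =\gamma _2/\gamma _1\) and \(\rho =\rho _{12}\), my reading of the omitted displays) the noncentrality parameters specialize to:

```latex
\Uplambda_1 \;=\; N\gamma_1^2,
\qquad
\Uplambda_2 \;=\; N\gamma_1^2\eta^2,
\qquad
\Uplambda \;=\; N\gamma_1^2\,\frac{1-2\eta\rho+\eta^2}{1-\rho^2} .
```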
Because \(\max (\Uplambda _1,\Uplambda _2)\le \Uplambda \), it follows from Eqs. 22, 27, and 28 that
In the simplest case \(N=3\), Eq. 25 yields the explicit expression
while the inequality in Eq. 30 is equivalent to
Note that
The quadratic function \(h_{\alpha ,\eta }(\rho )\) (\(-1\le \rho \le 1\)) satisfies
It is easily seen that if \(\alpha \le \frac{2}{3}\) then \(Q_{1,2,3;\alpha }\le \frac{1}{2}\), so \(h_{\alpha ,\eta }(0)\le 0\) for all \(\eta \in [-1,1]\). Thus if \(\alpha \le \frac{2}{3}\) then \(h_{\alpha ,\eta }(\rho )\) must have one root in \([-1,0]\) and one root in [0, 1]. The two roots are given by
note that \(\hat{\rho }_{\alpha ,-\eta }^{\pm }=-\hat{\rho }_{\alpha ,\eta }^{\mp }\).
It follows that if \(\alpha \le \frac{2}{3}\) then for sufficiently large \(\gamma _1^2\), i.e., \(\gamma _1^2\ge \frac{1}{3}\Uplambda _{1,2,3;\alpha }^*\),
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. Specifically, when \(\gamma _1^2\ge \frac{1}{3}\Uplambda _{1,2,3;\alpha }^*\), \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\) in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 1. From this it is seen that for \(p=2\), \(N=3\), and the common (small) values of \(\alpha \), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests for most of the distant alternative hypothesis K, i.e., for sufficiently large \(\gamma _1^2\). In fact, for most cases this domination occurs over almost the entire range \((-1,1)\) of \(\rho \). \(\square \)
Example 2.2
Suppose that \(p\ge 3\) and \(N=p+2\). The powers of the \({p\atopwithdelims ()2}\) bivariate size-\(\alpha \) \(T^2\)-tests and the overall (p-variate) size-\(\alpha \) \(T^2\)-test will be compared for distant alternatives, which requires comparison of the powers
From Eq. 22,
Therefore for sufficiently large values of \(\Uplambda ^{(2)}\), namely \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-testFootnote 5 provided that
From Eq. 25 we obtain the explicit expression
If we set \(\nu _p=\frac{2}{p}\) and \(U_{p;\alpha }=\frac{\nu _p}{Q_{2,p,p+2;\alpha }}\) then
Table 2 shows that \(Q_{2,p,p+2;\alpha }\) decreases rapidly to 0 as \(p\rightarrow \infty \), which suggests that Eq. 40 might hold over substantial regions of the alternative hypothesis K. We proceed to exhibit several such regions.
Case 1: \(\gamma _1=\cdots =\gamma _p=:\delta \) and R has the intraclass form
where \(\textbf{1}_p=(1,\dots ,1)':p\times 1\) and the allowable range of \(\rho \) is \((-\frac{1}{p-1},\,1)\). Then
By symmetry, all bivariate tests have the same power, and by Eqs. 13 and 17,
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\). Also, since \(\nu _p=\frac{2}{p}\le \frac{2}{3}\), Eq. 40 is equivalent to each of the inequalities
Because \((p-1)U_{p;\alpha }>1\) for common (small) values of \(\alpha \) (see Eq. 44 and Table 2), in such cases Eq. 49 is equivalent to
Table 2 shows that in Case 1, \(\tilde{\psi }_{p;\alpha }^-\) is close to the lower limit of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus by Eq. 50, all of the bivariate size-\(\alpha \) \(T^2\)-tests are more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the distant alternative hypothesis specified in Case 1, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
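For the intraclass configuration of Case 1, with every bivariate subset carrying \(\gamma _\omega =(\delta ,\delta )'\), the noncentrality parameters presumably work out to (a reconstruction):

```latex
\Uplambda^{(2)} \;=\; \frac{2N\delta^2}{1+\rho},
\qquad
\Uplambda \;=\; \frac{pN\delta^2}{1+(p-1)\rho},
\qquad N=p+2 .
```

Taking the worst allowable case \(\rho \rightarrow 1\) in \(\Uplambda ^{(2)}\) recovers the stated condition \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\).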
Case 2 (\(\gamma \) sparse): \(\gamma _i=\gamma _j=:\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \) in Eq. 45
so again \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\). Abbreviating \(Q_{2,p,p+2;\alpha }\) by Q, Eq. 40 is equivalent to each of the inequalities
Since \(h_{p;\alpha }(0)=Q-1<0\) for common (small) values of \(\alpha \) (see Eqs. 42-43 and Table 2), \(h_{p;\alpha }(\rho )\) has two real roots \(\tilde{\rho }_{p;\alpha }^-<0<\tilde{\rho }_{p;\alpha }^+\) (found numerically). Therefore \(0>h_{p;\alpha }(\rho )\) for \(\tilde{\rho }_{p;\alpha }^-<\rho <\tilde{\rho }_{p;\alpha }^+\).
Table 2 shows that in Case 2, the interval \((\tilde{\rho }_{p;\alpha }^-,\tilde{\rho }_{p;\alpha }^+)\) covers almost all of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the distant alternative hypothesis specified in Case 2, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
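In Case 2 the signal pair \(\omega =\{i,j\}\) gives, from the intraclass \(R_\rho \) (a reconstruction):

```latex
\Uplambda_{\{i,j\}} \;=\; \frac{2N\delta^2}{1+\rho},
\qquad
\Uplambda \;=\; \frac{2N\delta^2\,[1+(p-3)\rho]}{(1-\rho)\,[1+(p-1)\rho]} .
```

The factor \((1-\rho )[1+(p-1)\rho ]/[1+(p-3)\rho ]\) appearing in the local analysis of Case 2 in §3 is consistent with this expression for \(\Uplambda \).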
Case 3 (\(\gamma \) sparse): \(\gamma _i=\delta \) and \(\gamma _j=-\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \)
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) again holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\), while Eq. 40 is equivalent to \(1>Q_{2,p,p+2;\alpha }\), which holds for most \(p,\alpha \) (see Eqs. 42-43 and Table 2). Again at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test over the entire distant alternative hypothesis in Case 3, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
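In Case 3 the signal pair itself is the natural bivariate subset; because the grand-mean term of \(R_\rho ^{-1}\) cancels for \(\gamma _\omega =(\delta ,-\delta )'\), one presumably has (a reconstruction):

```latex
\Uplambda^{(2)} \;=\; \Uplambda \;=\; \frac{2N\delta^2}{1-\rho} ,
```

which is why the asymptotic comparison reduces to \(1>Q_{2,p,p+2;\alpha }\).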
Case 4: \(p=:2l\) is even, \(\gamma _i=\delta \) for l indices in \(\{1,\dots ,p\}\), \(\gamma _i=-\delta \) for the remaining l indices, and R has the intraclass form \(R_\rho \)
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) again holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\), while Eq. 40 is equivalent to the inequality
Because \(\frac{1-|\rho |}{1-\rho }\le 1\), while \(U_{p;\alpha }>1\) holds for most \(p,\alpha \) (see Eq. 43 and Table 2), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test over the entire distant alternative hypothesis in Case 4, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\). \(\square \)
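In Case 4 the best bivariate subset pairs two coordinates of equal sign when \(\rho <0\) and of opposite sign when \(\rho >0\), giving (a reconstruction):

```latex
\max_{|\omega|=2}\Uplambda_\omega \;=\; \frac{2N\delta^2}{1-|\rho|},
\qquad
\Uplambda \;=\; \frac{pN\delta^2}{1-\rho} ,
```

so the ratio \(\frac{2}{p}\cdot \frac{1-\rho }{1-|\rho |}\) produces exactly the factor \(\frac{1-|\rho |}{1-\rho }\) that is compared with \(U_{p;\alpha }\) above.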
3 Some Local Power Comparisons
From Eqs. 2-4, as \(\Uplambda _\omega \downarrow 0\) the power function \(\pi _\alpha (\Uplambda _\omega )\equiv \pi _\alpha (\Uplambda _\omega ;|\omega |,N-|\omega |)\) of the \(T_\omega ^2\)-test satisfies
Thus for two subsets \(\omega ,\,\omega '\) with \(\omega \subset \omega '\), there exists \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^{**}>0\) such that
where, from (2.2) and (2.3) in Das Gupta and Perlman (1974),
Therefore power comparisons of \(T_\omega ^2\) and \(T_{\omega '}^2\) for local alternativesFootnote 6 require determination of the lower tail probabilities \(c_{m,n;k}^\alpha \), which in turn require the lower quantiles \(b_{n,m;\alpha }\) (see Eq. 8).
In parallel with Section 2, this is done explicitly in Examples 3.1 and 3.2. As in Examples 2.1 and 2.2, these examples begin to suggest that variable selection might be limited to very small subsets \(\omega \in \Upomega _p\), e.g., singletons in the bivariate Example 3.1, or pairs (plus singletons) in Example 3.2.
Example 3.1
As in Example 2.1 consider the bivariate case \(p=2\). Repeat the first two paragraphs from Example 2.1 verbatim, except replace “distant alternatives” by “local alternatives”. Because \(\max (\Uplambda _1,\Uplambda _2)\le \Uplambda \), it follows from Eqs. 27 and 57 that
In the simplest case \(N=3\), it follows from Eqs. 8, 6, and 25 that
First note that in Eq. 59,
The quadratic function \(h_{\alpha ,\eta }(\rho )\) (\(-1\le \rho \le 1\)) satisfies
It is easily seen that if \(\alpha \le \frac{1}{2}\) then \(Z_{1,2,3;\alpha }\ge 2\), so \(h_{\alpha ,\eta }(0)\le 0\) for all \(\eta \in [-1,1]\). Therefore, if \(\alpha \le \frac{1}{2}\) then \(h_{\alpha ,\eta }(\rho )\) must have one root in \([-1,0]\) and one root in [0, 1]. The two roots are given by
again \(\check{\rho }_{\alpha ,-\eta }^{\pm }=-\check{\rho }_{\alpha ,\eta }^{\mp }\). Thus, if \(\alpha \le \frac{1}{2}\) and \(\rho \in (\check{\rho }_{\alpha ,\eta }^-,\,\check{\rho }_{\alpha ,\eta }^+)\) then Eq. 63 must hold.
To conclude that \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\), \(\gamma _1^2\) must be sufficiently small, i.e.,
Because
for fixed \(\eta \), Eq. 66 will be satisfied provided that
It is straightforward to show that Eq. 68 holds for \(|\eta |<1\) but not for \(|\eta |=1\).
Thus, if \(\alpha \le \frac{1}{2}\), \(|\eta |<1\), and Eqs. 67, 68, and 69 are satisfied then \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\), in which case at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. This occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 3, provided that \(\gamma _1^2\textstyle <\frac{1}{6}(1-\check{m}_{\alpha ,\eta })\Uplambda _{1,2,3;\alpha }^{**}\). Thus, for \(p=2\), \(N=3\), and the common (small) values of \(\alpha \), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests over much of the local alternative hypothesis space. Compared to Table 1, this effect seems somewhat less than for distant alternatives.
Example 3.2
Suppose that \(p\ge 3\) and \(N=p+2\ge 3\). We shall compare the powers of the \({p\atopwithdelims ()2}\) bivariate size-\(\alpha \) \(T^2\)-tests and the p-variate size-\(\alpha \) \(T^2\)-test for local alternatives, which requires comparison of the powers
From Eq. 57,
Therefore for sufficiently small values of \(\Uplambda \), namely \(\Uplambda \le \Uplambda _{2,p,p+2;\alpha }^{**}\), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-testFootnote 7 whenever
From Eqs. 58, 8, 6, 25, and some algebra, the explicit expression
is obtained. Setting \(\nu _p=\frac{2}{p}\ (\le \frac{2}{3})\) and \(V_{p;\alpha }:=\nu _p Z_{2,p,p+2;\alpha }\), we have
Table 4 shows that \(Z_{2,p,p+2;\alpha }\) increases rapidly to \(\infty \) as \(p\rightarrow \infty \), which suggests that Eq. 73 might hold over substantial regions of the alternative hypothesis. Several such regions are now exhibited.
Case 1: \(\gamma _1=\cdots =\gamma _p=:\delta \) and R has the intraclass form Eq. 45
Here \(-\frac{1}{p-1}<\rho <1\) and as in Eq. 48,
Here Eq. 73 is equivalent to each of the inequalities
Because \((p-1)V_{p;\alpha }>1\) for common (small) values of \(\alpha \) (see Eq. 77 and Table 4), in such cases Eq. 79 in turn is equivalent to
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
However, \(\rho >\breve{\psi }_{p;\alpha }^-\) implies that
Therefore Eq. 81 will be satisfied provided that
If p is large and \(\alpha \) is small, Table 4 shows that in Case 1, \(\breve{\psi }_{p;\alpha }^-\) is close to the lower limit of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Then by Eq. 83, at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the local alternative hypothesis covered by Case 1, i.e., for \(\delta ^2\textstyle <\breve{m}_{p,\alpha }\Uplambda _{2,p,p+2;\alpha }^{**}\).
Case 2 (\(\gamma \) sparse): \(\gamma _i=\gamma _j\equiv \delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \) in Eq. 45
As in Eq. 51,
Abbreviating \(Z_{2,p,p+2;\alpha }\) by Z, Eq. 73 is equivalent to each of the inequalities
Since \(h_{p;\alpha }(0)=1-Z<0\) (cf. Eq. 58), \(h_{p;\alpha }(\rho )\) has real roots \(\breve{\rho }_{p;\alpha }^-<0<\breve{\rho }_{p;\alpha }^+\) (found numerically). Therefore \(0>h_{p;\alpha }(\rho )\) for \(\breve{\rho }_{p;\alpha }^-<\rho <\breve{\rho }_{p;\alpha }^+\).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
Because \(\frac{(1-\rho )[1+(p-1)\rho ]}{1+(p-3)\rho }\) is decreasing in \(\rho \), \(\rho <\breve{\rho }_{p;\alpha }^+\) implies that
Therefore Eq. 84 will be satisfied provided that
Table 4 shows that in Case 2, the interval \((\breve{\rho }_{p;\alpha }^-,\breve{\rho }_{p;\alpha }^+)\) covers almost all of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the local alternative hypothesis determined by Case 2, i.e., for \(\delta ^2<\breve{m}_{p,\alpha }'\Uplambda _{2,p,p+2;\alpha }^{**}\).
Case 3 (\(\gamma \) sparse): \(\gamma _i=\delta \) and \(\gamma _j=-\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \)
Here Eq. 73 is equivalent to \(Z_{2,p,p+2;\alpha }>1\), which holds for all \(p,\alpha \) (see Eq. 58).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
for all \(\rho \in (-\frac{1}{p-1},1)\). This requires that \(\rho \) be bounded below 1, that is, \(\rho <1-\epsilon \) for some \(\epsilon >0\), whence Eq. 87 will be satisfied if
Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test if \(\rho <1-\epsilon \), which covers almost all of the local region Eq. 88 in the alternative hypothesis determined by Case 3.
Case 4: \(p=:2l\) is even, \(\gamma _i=\delta \) for l indices in \(\{1,\dots ,p\}\), \(\gamma _i=-\delta \) for the remaining l indices, and R has the intraclass form \(R_\rho \)
Here Eq. 73 is equivalent to the inequality
Because \(\frac{1-|\rho |}{1-\rho }\le 1\), while \(V_{p;\alpha }>1\) holds for most \(p,\alpha \) (see Eq. 76 and Table 4), Eq. 89 is satisfied for most \(p,\alpha \).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
for all \(\rho \in (-\frac{1}{p-1},1)\). This again requires that \(\rho \) be bounded below 1, that is, \(\rho <1-\epsilon \) for some \(\epsilon >0\), whence Eq. 90 will be satisfied if
Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test if \(\rho <1-\epsilon \), which covers almost all of the local region Eq. 91 in the alternative hypothesis determined by Case 4. \(\square \)
Remark 3.3
The mean-vector and covariance matrix configurations in Examples 2.1 and 2.2 are the same as those in Examples 3.1 and 3.2 respectively, so the results for distant alternatives in the former can be compared to those for local alternatives in the latter. Because both sets of results support the feasibility of limiting variable selection to small subsets, this suggests that this feasibility may extend to intermediate alternatives as well. Furthermore, both sparse cases (Cases 2 and 3) and non-sparse cases (Cases 1 and 4) exhibit this feasibility in §2 and §3. Of course, more extensive comparisons will be needed to confirm this conclusion.
4 Some Exact Power Comparisons for the Bivariate Case
The results in Sections 2 and 3 compare the power of the overall (p-variate) \(T^2\)-test with those of univariate or bivariate \(T^2\)-tests based on the original variates. However these power comparisons are asymptotic or local, and are relevant only for noncentrality parameters \(\Uplambda \) that approach \(\infty \) or 0. In this section we consider the bivariate case \(p=2\) and attempt to compare the exact power functions of the \(T^2\)-test and the two univariate \(t^2\)-tests for all values of \(\Uplambda \). Two conjectures are presented; the first of these is confirmed in Proposition 4.3 and applied in Example 4.4 for only two simple cases.
Conjecture 4.1
(weak) Suppose that \(p=2\) and N is odd: \(N=2l+1\). Then for each \(\lambda >0\), there exists \(\alpha _l^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _l^*(\lambda )\). \(\square \)
Conjecture 4.1 is established below for \(l=1,2\) and we expect it to hold for all \(l\ge 3\) as well. However, it is unsatisfactory in that if \(\alpha _l^*(\lambda )\) depends nontrivially on \(\lambda \) then we cannot conclude that, at least for small \(\alpha \), one or both of the two univariate size-\(\alpha \) \(t^2\)-tests dominate the bivariate size-\(\alpha \) \(T^2\) test in a large region of the alternative hypothesis. For this the following stronger result would be needed.
Conjecture 4.2
(strong) Conjecture 4.1 holds with \(\alpha _l^*(\lambda )\) not depending on \(\lambda \), i.e., \(\alpha _l^*(\lambda )=\alpha _l^*\). \(\square \)
At this time we do not have evidence either for or against Conjecture 4.2. If valid, it would be essential to determine or approximate the values of \(\alpha _l^*\).
Proposition 4.3
Conjecture 4.1 is valid for \(l=1\) and 2.
Proof
By Eq. 3,
if and only if
From Eqs. 8 and 6 we find that
Set \(u=1-b\) in Eqs. 97-98, then differentiate with respect to \(\alpha \) to obtain
Next,
Therefore a sufficient condition that \(\frac{d}{d\alpha }s_{\delta ,\lambda }^{(l)}(\alpha )> \frac{d}{d\alpha }t_{\delta ,\lambda }^{(l)}(\alpha )\) is that for all \(k\ge 0\),
with strict inequality for at least one k.
Thus for \(\alpha =0\), a sufficient condition that \(\frac{d}{d\alpha }s_{\delta ,\lambda }^{(l)}(\alpha =0)> \frac{d}{d\alpha }t_{\delta ,\lambda }^{(l)}(\alpha =0)\) is that for all \(k\ge 0\),
with strict inequality for at least one k. After some algebra, Eq. 113 can be written equivalently as
where \(R_{k,\delta }\sim \textrm{Binomial}(k,\frac{1}{1+\delta })\). Because \(R_{k,\delta }\) is (strictly) stochastically decreasing in \(\delta \) (for \(k\ge 1\)) while \(\frac{\Upgamma (l+\frac{1}{2}+R_{k,\delta })}{\Upgamma (\frac{1}{2}+R_{k,\delta })}\) is (strictly) increasing in \(R_{k,\delta }\), the left side of Eq. 114 (\(\equiv \) Eq. 113) is (strictly) decreasing in \(\delta \) (for \(k\ge 1\)).
For \(k=0\), both sides of Eq. 113 equal \(\frac{\Upgamma (l+\frac{1}{2})}{\Upgamma (\frac{1}{2})}\). For \(k=1\), Eq. 113 is equivalent to the inequality
which is equivalent to \(\delta \le \frac{1}{2l-1}\). Therefore the sufficient condition Eq. 113 for
will be satisfied for all \(\delta \le \frac{1}{2l-1}\) if Eq. 113 holds for \(\delta =\frac{1}{2l-1}\) for all \(k\ge 2\), with strict inequality for at least one \(k\ge 2\).
Because \(s_{\delta ,\lambda }^{(l)}(0)=t_{\delta ,\lambda }^{(l)}(0)=0\), it follows from Eqs. 93-94 that Eq. 113, with strict inequality for some \(k\ge 2\), is a sufficient condition that for each \(\lambda >0\), there exists \(\alpha _l^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _l^*(\lambda )\).
For the simplest case \(l=1\) (\(N=3\)), Eq. 113 with \(\delta =\frac{1}{2l-1}=1\) becomes
which by Eq. 114 can be reduced to the equivalent form
It is straightforward to verify Eq. 117 by induction on k, with strict inequality holding for large k because \(\frac{\Upgamma (\frac{3}{2}+k)}{\Upgamma (1+k)}=O(k^{\frac{1}{2}})\). Therefore Eq. 115 holds for \(l=1\); that is, for each \(\lambda >0\), there exists \(\alpha _1^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _1^*(\lambda )\).
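The growth rate invoked in the induction, \(\frac{\Upgamma (\frac{3}{2}+k)}{\Upgamma (1+k)}=O(k^{\frac{1}{2}})\), is easy to verify numerically; the sketch below (function name ours) checks that the normalized ratio tends to 1:

```python
import math

def gamma_log_ratio(k):
    """log( Gamma(3/2 + k) / Gamma(1 + k) ), computed via lgamma to avoid overflow."""
    return math.lgamma(1.5 + k) - math.lgamma(1.0 + k)

# Gamma(3/2+k)/Gamma(1+k) grows like sqrt(k), so the normalized ratio -> 1.
for k in (10, 1000, 10**6):
    print(k, math.exp(gamma_log_ratio(k)) / math.sqrt(k))
```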
Next consider the case \(l=2\) (\(N=5\)). With \(\delta =\frac{1}{2l-1}=\frac{1}{3}\), Eq. 113 becomes
which can be reduced to the equivalent form
Interestingly, Eq. 120 holds with equality for \(k=2\) as well as for 0 and 1. Rewrite Eq. 120 in the equivalent form
To verify Eq. 121 by induction on k, it suffices to show that for \(k\ge 2\),
After simplification, this is equivalent to the inequality
which holds for all \(k\ge 1\), with strict inequality for \(k+1\ge 3\). Thus Eq. 115 holds for \(l=2\): for each \(\lambda >0\), there exists \(\alpha _2^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _2^*(\lambda )\).
Example 4.1
Return to the bivariate Example 2.1, where \(p=2\) and
(recall Eq. 27). For \(N=3\) it follows from Eqs. 118 and 125 that for each \(\gamma _1^2>0\),
Furthermore,
The two roots of \(h_\eta (\rho )\) are \(\textstyle \hat{\rho }_\eta ^\pm =\frac{\eta \pm \sqrt{12-3\eta ^2}}{4}\); note that \(\hat{\rho }_{-\eta }^{\pm }=-\hat{\rho }_{\eta }^{\mp }\). Some values appear in Table 5. Thus, if \(\alpha <\alpha _1^*(3\gamma _1^2)\) then
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. This occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 5, which constitute a substantial part of the alternative hypothesis.
Similarly, for \(N=5\) it follows from Eqs. 124 and 125 that for each \(\gamma _1^2>0\),
Furthermore,
The two roots of \(\tilde{h}_\eta (\rho )\) are \(\textstyle \tilde{\rho }_\eta ^\pm =\frac{3\eta \pm \sqrt{40-15\eta ^2}}{8}\); note that \(\tilde{\rho }_{-\eta }^{\pm }=-\tilde{\rho }_{\eta }^{\mp }\). Some values appear in Table 5. Thus, if \(\alpha <\alpha _2^*(5\gamma _1^2)\) then
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the bivariate \(T^2\)-test. Again this occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 5.
Thus for \(p=2\), \(N=3\) or 5, and sufficiently small \(\alpha \) (but depending on \(\gamma _1^2\)), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests over a fairly large portion of the entire alternative hypothesis, comprising local, intermediate, and distant alternatives.
5 Concluding Remarks
For the purpose of encouraging future research, the questions raised in this report are stated formally as follows:
The Oracular Variable-Selection Problem (OVSP)
is that of determining the function \(\hat{\omega }_\alpha (\gamma ,R)\), as defined in Eq. 19, and using this to determine the regions
The Parsimonious Variable-Selection Problem (PVSP)
asks if \(A_\alpha (i)\) comprises a substantial portion of the alternative hypothesis K for small values of i, e.g., \(i=1,2\).
If the answer to the PVSP is positive, then variable selection in some applied investigations can be limited to small, easily interpretable subsets of variables.
Finally, Example 5.1 illustrates the gain in power that ideally can be attained by variable selection limited to univariate subsets, even after the crude Bonferroni correction for multiple testing is applied.
Example 5.1
For \(p=10\), \(N=12, 22, 41\), and \(\alpha =.05\), Table 6 shows the gain in power obtained by the Bonferroni-corrected test \(T_{\hat{\omega }}^2\) with the oracular subset \(\hat{\omega }\equiv \hat{\omega }_\alpha (\gamma ,R)\) when \(|\hat{\omega }|=1\) and \(\Uplambda =\Uplambda _{\hat{\omega }}=18\). The gain in power can be substantial unless N is very large. (The powers are from Tiku, 1967.)
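The comparison in Example 5.1 can be reproduced directly rather than from Tiku's tables. A minimal sketch (function name ours; it assumes the size-\(\alpha \) \(T^2\)-test on \(p\) variables reduces to an \(F_{p,N-p}\)-test, and gives the oracular univariate subset the full noncentrality \(\Uplambda =18\) with the Bonferroni level \(\alpha /p\)):

```python
from scipy.stats import f, ncf

def f_test_power(m, n, alpha, lam):
    """Power of the size-alpha F_{m,n}-test at noncentrality lam."""
    return ncf.sf(f.isf(alpha, m, n), m, n, lam)

# Example 5.1 setting: p = 10 candidate variables, alpha = .05, Lambda = 18,
# oracular univariate subset carrying the full noncentrality.
p, alpha, lam = 10, 0.05, 18.0
for N in (12, 22, 41):
    full = f_test_power(p, N - p, alpha, lam)       # T^2 on all p variables
    uni = f_test_power(1, N - 1, alpha / p, lam)    # Bonferroni-corrected t^2
    print(N, round(full, 3), round(uni, 3))
```

For small \(N\) the Bonferroni-corrected univariate test is markedly more powerful, matching the qualitative message of Table 6.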
Notes
However, Giri et al. (1963) also began their study of Hotelling’s \(T^2\) test by considering only this simplest case \(p=2\), \(N=3\).
We do not claim to know the value of \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^*\), even approximately.
But see Footnote 2.
Note that by itself this does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\).
We do not claim to know the values of \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^{**}\), even approximately.
As in Example 2.2, this does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\).
In line 2 of the second column on p. 179 of Das Gupta and Perlman (1974), "conclude" should be "include". In the line following the third display in the second column on p. 179, "\(j\)" should be "\(f\)". In Remark 4.1 on p. 180, "increasing in \(m\)" should be "decreasing in \(m\)". In the next line, "\(m\rightarrow \infty \)" should be "\(n\rightarrow \infty \)".
References
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd edition, Wiley & Sons, New York.
Das Gupta, S. and Perlman, M. D. (1974). On the power of the noncentral \(F\)-test: effect of additional variates on Hotelling’s \(T^2\)-test. J. Amer. Stat. Assoc. 69 174-180.
Giri, N. and Kiefer, J. (1964). Local and asymptotic minimax properties of multivariate tests. Ann. Math. Stat. 35 21-35.
Giri, N., Kiefer, J., and Stein, C. (1963). Minimax character of Hotelling’s \(T^2\)-test in the simplest case. Ann. Math. Stat. 34 1524-1535.
Kiefer, J. and Schwartz, R. (1965). Admissible Bayes character of \(T^2\)-, \(R^2\), and other fully invariant tests for classical multivariate normal testing problems. Ann. Math. Stat. 36 747-770.
Marden, J. and Perlman, M. D. (1980). Invariant tests for means with covariates. Ann. Stat. 8 25-63.
Stein, C. (1956). The admissibility of Hotelling’s \(T^2\)-test. Ann. Math. Stat. 27 616-623.
Tiku, M. L. (1967). Tables of the power of the \(F\)-test. J. Amer. Stat. Assoc. 62 525-539.
Acknowledgements
This work owes much to the late Somesh Das Gupta, my colleague, teacher, and friend. I am also grateful to David Perlman for raising the questions addressed here and providing supporting data, and to an anonymous referee who provided many insightful comments and suggestions.
Ethics declarations
Conflicts of interest
No funding was received to assist with the preparation of this manuscript. The author has no financial or proprietary interests in any material discussed in this article.
Appendices
Appendix A Testing for additional information.
Variable selection for the \(T^2\)-test and related linear discriminant analysis was thoroughly studied in the 1970s and 1980s, an era of limited computer power, and subsequently by several authors with greater ability to consider all-subsets methods; a list of references appears below. Almost all of these studies were based on testing for additional information (= increased Mahalanobis distance), as now described.
For any two nested subsets \(\omega \subset \omega '\) in \(\Upomega _p\), in general \(\Uplambda _\omega \le \Uplambda _{\omega '}\). The question of whether the power of the \(T_{\omega '}\)-test exceeds that of the \(T_{\omega }\)-test for the testing problem Eq. 9 was usually formulated as the problem of testing for additional information (TAI), namely, testing
based on a preliminary sample; see Rao (1973), §8c.4. This formulation was adopted by many researchers, even while citing the following result of Das Gupta and Perlman (1974), which implies that this standard formulation of TAI is inappropriate.
It was shown in Das Gupta and Perlman (1974, Theorem 2.1) that for fixed \(\lambda >0\), the power function \(\pi _\alpha (\lambda ;m,n)\) (recall Eq. 2) of the noncentral \(f\)-test is strictly decreasing in \(m\) and strictly increasing in \(n\) (Footnote 8). Therefore for any integer \(1\le q\le n-1\) there exists a unique real number
such that
Here \(g_\alpha (0)=0\) and \(g_\alpha (\lambda )\) is strictly increasing in \(\lambda \); cf. Das Gupta and Perlman (1974, Theorem 3.1). Thus the power is increased only if
Therefore Das Gupta and Perlman (1974, Section 4) introduced the problem of testing for increased power (TIP), namely, testing
and proposed several (approximate) tests.
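With modern software, computing \(g_\alpha (\cdot )\) for a given pair of tests is a one-line root-finding problem. The sketch below (function names ours) solves the defining power-matching condition \(\pi _\alpha (g_\alpha (\lambda );m+q,n-q)=\pi _\alpha (\lambda ;m,n)\) numerically:

```python
from scipy.optimize import brentq
from scipy.stats import f, ncf

def power(m, n, alpha, lam):
    """Power of the size-alpha noncentral F_{m,n}-test at noncentrality lam."""
    return ncf.sf(f.isf(alpha, m, n), m, n, lam)

def g_alpha(lam, m, n, q, alpha):
    """Noncentrality g_alpha(lam) at which the F_{m+q,n-q}-test matches the
    power of the F_{m,n}-test at lam (the Das Gupta-Perlman condition)."""
    target = power(m, n, alpha, lam)
    # g_alpha(lam) > lam because power is decreasing in m and increasing in n,
    # so the root is bracketed between lam and a generous upper bound.
    return brentq(lambda L: power(m + q, n - q, alpha, L) - target,
                  lam, 100 * lam + 100)

print(g_alpha(5.0, 1, 4, 1, 0.05))  # adding one variable when (m, n) = (1, 4)
```

This suggests that, at least for the very small subsets advocated here, replacing the TAI by the TIP is computationally routine.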
This proposal was noted by subsequent authors but never implemented for variable selection, possibly because of the difficulty of computing the functions \(g_\alpha (\cdot )\), especially when many pairs \((\omega ,\omega ')\) must be considered. However, if, as suggested above, variable selection is limited to very small subsets of variables in practical applications, then replacing the TAI by the TIP might be feasible.
Remark A1
The relation Eq. 92 in Conjecture 4.1 can be stated equivalently in terms of \(g_\alpha \):
with equality when \(\alpha =\alpha _l^*(\lambda )\). Thus the relations Eqs. 118 and 124 in Proposition 4.3 also can be stated equivalently in terms of \(g_\alpha \):
with equality when \(\alpha =\alpha _1^*(\lambda )\);
with equality when \(\alpha =\alpha _2^*(\lambda )\).
Additional References for the Appendix
Hand, D. J. (1981). Discrimination and Classification, Wiley & Sons, New York. [Chapter 6]
Hawkins, D. M. (1976). The subset problem in multivariate analysis of variance. J. Royal Statist. Soc, Series B 38 132-139.
Jain, A. K. and Waller, W. G. (1978). On the optimal number of features in the classification of multivariate Gaussian data. Pattern Recognition 10 103-109.
Jiang, W., Wang, K., and Tsung, F. (2012). A variable-selection-based multivariate EWMA chart for process monitoring and diagnosis. J. Quality Technology 44 209-230.
McCabe, G. P. Jr. (1975). Computations for variable selection in discriminant analysis. Technometrics 17 259-263.
McKay, R. J. (1976). A graphical aid to selection of variables in two-group discriminant analysis. Appl. Statist. 27 259-263.
McLachlan, G. J. (1976). On the relationship between the F test and the overall error rate for variable selection in two-group discriminant function. Biometrics 36 501-510.
McLachlan, G. J. (1980). A criterion for selecting variables for the linear discriminant function. Biometrics 32 529-534.
McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition, Wiley & Sons, New York. [Chapter 12]
Murray, G. D. (1977). A cautionary note on selection of variables in discriminant analysis. Appl. Statist. 26 246-250.
Nobuo, S. and Takahisa, I. (2016). A variable selection method for detecting abnormality based on the \(T^2\) test. Comm. Statist. - Theory and Methods 46 501-510.
Rao, C. R. (1973). Linear Statistical Inference and its Applications, 2nd edition, Wiley & Sons, New York.
Schaafsma, W. (1982). In Handbook of Statistics, Vol. 2, P. R. Krishnaiah and L. N. Kanal, eds., 857-881. North Holland, Amsterdam.
Perlman, M.D. On the Feasibility of Parsimonious Variable Selection for Hotelling’s \(T^2\)-test. Sankhya A (2024). https://doi.org/10.1007/s13171-024-00357-7
Keywords
- Multivariate normal distribution
- Hotelling’s \(T^2\) test
- Student’s \(t^2\)
- variable selection
- test for additional information
- Mahalanobis distance