Abstract
Hotelling’s \(T^2\)-test for the mean of a multivariate normal distribution is one of the triumphs of classical multivariate analysis. It is uniformly most powerful among invariant tests and, among all tests, it is admissible, proper Bayes, and locally and asymptotically minimax. Nonetheless, investigators often prefer non-invariant tests, especially those obtained by selecting only a small subset of variables from which the \(T^2\)-statistic is to be calculated, because such reduced statistics are more easily interpretable for their specific application. Thus it is relevant to ask to what extent power is lost when variable selection is limited to very small subsets of variables, e.g. of size one (yielding univariate Student-\(t^2\) tests) or size two (yielding bivariate \(T^2\)-tests). This study presents preliminary evidence suggesting that in some cases no power need be lost, and in fact power may be gained, over a wide range of alternatives.
1 Introduction
This study is motivated by a re-examination of the variable-selection problem for Hotelling’s \(T^2\)-test (closely related to variable selection for linear discriminant analysis). After some notational preliminaries in §1.1, Hotelling’s \(T^2\) is reviewed in §1.2. The variable-selection problem is described in §1.3, where the substance of this investigation is described.
1.1 The Noncentral f-distribution
Let \(\chi _m^2(\lambda )\) denote a noncentral chi-square random variable with m degrees of freedom and noncentrality parameter \(\lambda >0\). The noncentral \(f_{m,n}(\lambda )\) distribution (nonnormalized) with m and n degrees of freedom and noncentrality parameter \(\lambda >0\) is the distribution of the ratio \(\chi _m^2(\lambda )/\chi _n^2\) (also denoted by \(f_{m,n}(\lambda )\)), where the numerator and denominator are independent chi-square random variables and \(\chi _n^2\equiv \chi _n^2(0)\). The upper \(\alpha \)-quantile of \(f_{m,n}\equiv f_{m,n}(0)\) is denoted by \(f_{m,n}^\alpha \), so that
The noncentral \(f_{m,n}\)-test of size \(\alpha \ge 0\) for the problem of testing \(\lambda =0\) vs. \(\lambda >0\) has power function given by
see Das Gupta and Perlman (1974), eqn.(2.1). Clearly \(\pi _\alpha (\lambda ;m,n)\) is decreasing in \(\alpha \), with \(\pi _0(\lambda ;m,n)=0\). Because \(f_{m,n}(\lambda )\) has strictly monotone likelihood ratio in \(\lambda \), \(\pi _\alpha (\lambda ;m,n)\) is strictly increasing in \(\lambda \).
It will be convenient to work with the (central) beta distribution \(b_{m,n}\):
whose probability density function (pdf) is given by
Clearly \(b_{m,n}= 1-b_{n,m}\). The upper and lower \(\alpha \)-quantiles of \(b_{m,n}\) are denoted by \(b_{m,n}^\alpha \) and \(b_{m,n;\alpha }\), respectively, so that
Later we shall need the following relation, obtained from Eqs. 4, 5, 6 and 7:
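The omitted display is presumably the quantile identity below, obtained from \(b_{m,n}=\chi _m^2/(\chi _m^2+\chi _n^2)\), so that \(f_{m,n}=b_{m,n}/(1-b_{m,n})\) (a reconstruction consistent with the surrounding definitions, not a quotation):

```latex
b_{m,n;\alpha} \;=\; 1-b_{n,m}^{\alpha},
\qquad
f_{m,n}^{\alpha}
  \;=\; \frac{b_{m,n}^{\alpha}}{1-b_{m,n}^{\alpha}}
  \;=\; \frac{1-b_{n,m;\alpha}}{b_{n,m;\alpha}} .
```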
1.2 Hotelling’s \(T^2\)-test
Let \(X_i:p\times 1\), \(i=1,\dots ,N\) (\(N\ge p+1\)) be a random sample from the p-dimensional multivariate normal distribution \(N_p(\mu ,\Upsigma )\), where \(\mu \ (p\times 1)\equiv (\mu _1,\dots ,\mu _p)' \in \mathbb {R}^p\) and \(\Upsigma \ (p\times p)\equiv (\sigma _{ij})\) is positive definite. The problem of testing
with \(\Upsigma \) unknown is invariant under the group action \(X_i\rightarrow A X_i\), \(i=1,\dots ,N\), where \(A\in GL(p)\), the group of all nonsingular \(p\times p\) matrices. A maximal invariant statistic under GL(p) is given by Hotelling’s \(T^2\) statistic:
where \(\bar{X}=N^{-1}\sum _{i=1}^NX_i\) and \(S=\sum _{i=1}^N(X_i-\bar{X})(X_i-\bar{X})'\). Its distribution is
where
is a maximal invariant parameter. Therefore the uniformly most powerful invariant size-\(\alpha \) test rejects \(H_0\) if \(T^2>f_{p,N-p}^\alpha \), with power function \(\pi _\alpha (\Uplambda ;p,N-p)\); cf. Anderson (2003, Theorem 5.6.1).Footnote 1
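For reference, the omitted displays for the statistic and its distribution presumably read as follows, consistent with the nonnormalized \(f_{m,n}\) of §1.1 (a reconstruction):

```latex
T^2 \;=\; N\,\bar{X}' S^{-1} \bar{X}
  \;\sim\; f_{p,\,N-p}(\Uplambda),
\qquad
\Uplambda \;=\; N\,\mu' \Upsigma^{-1} \mu .
```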
It is informative to express \(\Uplambda \) in terms of scale-free parameters, that is,
where \(R\equiv (\rho _{ij})\) is the \(p\times p\) correlation matrix determined by \(\Upsigma \) and
The testing problem (9) can be stated equivalently as that of testing
with R unknown.
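In symbols, the scale-free reparametrization presumably reads (a reconstruction consistent with the definitions of R and \(\Uplambda \) above):

```latex
\Uplambda \;=\; \Uplambda(\gamma,R) \;=\; N\,\gamma' R^{-1}\gamma,
\qquad
\gamma \;\equiv\; (\gamma_1,\dots,\gamma_p)',
\quad
\gamma_j \;=\; \mu_j/\sqrt{\sigma_{jj}} ,
```

so that the hypothesis \(\mu =0\) becomes \(\gamma =0\) with R unknown.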
1.3 The \(T^2\) Variable-selection Problem
Denote the components of \(\bar{X}\) by \(\bar{X}_j\), \(j=1,\dots ,p\), and those of S by \(s_{jk}\), \(j,k=1,\dots ,p\). Let \(\Upomega _p\) be the collection of all nonempty subsets of the index set \(I:=\{1,\dots ,p\}\). For \(\omega \in \Upomega _p\) denote the \(\omega \)-subvector of \(\bar{X}\) by \(\bar{X}_\omega \), the \(\omega \)-submatrix of S by \(S_\omega \), and similarly define \(\mu _\omega \), \(\gamma _\omega \), \(\Upsigma _\omega \), and \(R_\omega \). The \(T^2\)-statistic based on \((\bar{X}_\omega ,S_\omega )\) is given by
(\(T_I^2=T^2\), \(\Uplambda _I=\Uplambda \equiv \Uplambda (\gamma ,R)\).) The test that rejects \(H_0\) if \(T_\omega ^2>f_{|\omega |,N-|\omega |}^\alpha \) has size \(\alpha \) for \(H_0\), and its power function is given by
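Written out, the subset statistic, its distribution, and its noncentrality parameter are presumably (a reconstruction paralleling the full-dimensional \(T^2\)):

```latex
T_\omega^2 \;=\; N\,\bar{X}_\omega' S_\omega^{-1} \bar{X}_\omega
  \;\sim\; f_{|\omega|,\,N-|\omega|}(\Uplambda_\omega),
\qquad
\Uplambda_\omega \;=\; N\,\gamma_\omega' R_\omega^{-1}\gamma_\omega .
```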
This \(T_\omega ^2\)-test is not invariant under GL(p) but it is admissible for testing \(H_0\) vs. K, being a unique proper Bayes test for a prior distribution under which \(\{\mu \mid \mu _{I\setminus \omega }=0\}\) has prior probability 1; cf. Kiefer and Schwartz (1965); Marden and Perlman (1980).
This paper addresses the feasibility of finding a parsimonious subset \(\omega \) such that the \(T_\omega ^2\)-test maintains high power over a substantial portion of the alternative K. Because \((\gamma ,R)\) is unknown, variable selection in practice is traditionally approached by forward and/or backward selection procedures based on a preliminary sample that yields estimates of \((\gamma ,R)\); see the Appendix. At worst, all \(2^p-1\) nonempty subsets \(\omega \) must be considered.
Recently I consulted on such a variable-selection problem. The investigator, a research and development engineer, had observed 20 physiological variables (blood pressure, temperature, heart rate, etc.) on each of 100 subjects (the numbers are approximate). He wished to compare their responses to a new product design with their responses to the current design. The overall \(T^2\)-statistic, based on all 20 variables, indicated a significant difference between the two sets of responses. However, the client wished to find a more readily interpretable measure of difference, namely a \(T_\omega ^2\)-statistic based on a very small subset \(\omega \) of the 20 variables, hopefully with \(|\omega |=1\) or 2.
Such a desire is not atypical of investigators presented with a multivariate data analysis. This led me to wonder how much power would be lost by restricting variable selection to small subsets \(\omega \), for example to single variables or pairs of variables.
In fact some power might be gained. It is well known (e.g., Das Gupta and Perlman, 1974) that \(\pi _\alpha (\Uplambda _\omega ;\,|\omega |,\,N-|\omega |)\) is decreasing in \(|\omega |\) while increasing in \(\Uplambda _\omega \). Might the decreasing effect outweigh the increasing effect over a significant portion of the sample space? If so then restricting attention to small variable subsets might be desirable.
To state this more precisely, define
Thus \(\hat{\omega }_\alpha (\gamma ,R)\) is the (not necessarily unique) subset \(\omega \) of variables that maximizes the power of the size-\(\alpha \) \(T_\omega ^2\)-test to detect the alternative \((\gamma ,R)\) if the actual value of \((\gamma ,R)\) were revealed by an oracle. Whereas the admissibility of the overall size-\(\alpha \) \(T^2\)-test dictates that its power cannot be everywhere dominated by that of the size-\(\alpha \) \(T_\omega ^2\)-test when \(\omega \ne I\), might it happen that \(|\hat{\omega }_\alpha (\gamma ,R)|\) is small, perhaps 1 or 2, over a fairly wide range of parameter values \((\gamma ,R)\)? If so, then might one, with some confidence, limit variable selection to consideration of single variables (univariate \(t^2\)-tests) or pairs of variables (bivariate \(T^2\)-tests), as an alternative to simply applying the overall (p-variate) \(T^2\)-test?
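The oracle \(\hat{\omega }_\alpha (\gamma ,R)\) is simple to compute by brute force for moderate p. A minimal sketch (function and variable names are mine; exact powers via the noncentral F as in §1.1):

```python
# Brute-force oracle subset search: maximize the exact power of the
# size-alpha T_omega^2-test over all nonempty subsets omega,
# with Lambda_omega = N * gamma_omega' R_omega^{-1} gamma_omega.
from itertools import combinations

import numpy as np
from scipy.stats import f, ncf

def oracle_subset(alpha, gamma, R, N):
    p = len(gamma)
    best, best_power = None, -1.0
    for size in range(1, p + 1):
        for omega in combinations(range(p), size):
            sub = list(omega)
            lam = N * gamma[sub] @ np.linalg.solve(R[np.ix_(sub, sub)], gamma[sub])
            m, n = size, N - size
            pw = ncf.sf(f.isf(alpha, m, n), m, n, lam)   # exact power
            if pw > best_power:
                best, best_power = omega, pw
    return best, best_power

gamma, R = np.array([1.0, 0.0]), np.eye(2)
print(oracle_subset(0.05, gamma, R, 10))   # the singleton subset wins here
```

With all the signal in one coordinate and \(R=I\), the singleton carries the full noncentrality with fewer numerator degrees of freedom, so the oracle picks it over the full set.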
Of course, corrections for multiple testing must be considered for any variable-selection procedure before definitive conclusions can be drawn and procedures implemented; see §5 for a brief example. However, restriction to small variable subsets has another desirable property: there are relatively few such subsets compared to the \(2^p-1\) nonempty subsets in \(\Upomega _p\), which greatly reduces any correction factor. For example, when \(p=20\) as in the consulting problem cited above, there are 20 univariate subsets and \({20\atopwithdelims ()2}=190\) bivariate subsets, compared to \(2^{20}-1=1,048,575\) total nonempty subsets.
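The subset counts above, and the resulting reduction in a Bonferroni-style correction factor, are easy to verify:

```python
# Subset counts for p = 20: restricting selection to singletons and pairs
# shrinks a Bonferroni-style correction from over a million tests to 210.
from math import comb

p = 20
print(comb(p, 1), comb(p, 2))     # 20 univariate, 190 bivariate subsets
print(2**p - 1)                   # 1048575 nonempty subsets in all
print(comb(p, 1) + comb(p, 2))    # 210 tests if only small subsets are kept
```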
Such a radical suggestion flies in the face of 100 years of multivariate statistical theory, of which I have been but one of many proponents. This report presents preliminary evidence indicating that limitation of variable selection to low-dimensional tests may not be entirely inappropriate.
In Sections 2, 3, and 4, several examples are considered where tractable algebraic expressions for the asymptotic (\(\Uplambda _\omega \rightarrow \infty \)), local (\(\Uplambda _\omega \rightarrow 0\)), and/or exact values of \(\pi _\alpha (\Uplambda _\omega ;|\omega |, N-|\omega |)\) are available. These in turn can be utilized to compare the powers of \(T_\omega ^2\) and \(T^2\). These examples include both sparse and non-sparse mean-vector configurations, and the results may be the first that are based on algebraically-explicit power function comparisons of the low-dimensional and full-dimensional tests.
Examples 2.1 and 3.1 treat only the simplest possible case: the bivariate case (\(p=2\)) with \(N=3\).Footnote 2 Here it is shown that \(|\hat{\omega }_\alpha (\gamma ,R)|=1\) over large portions of the asymptotic and local regions of the alternative hypothesis K. This implies that the power of at least one of the two univariate Student \(t^2\)-tests (\(|\omega |=1\)) exceeds that of the overall (bivariate) \(T^2\)-test for most alternatives \((\gamma ,R)\) in these regions.
In Example 4.4 this result is extended to the entire alternative hypothesis K, both for \(N=3\) and \(N=5\), but only under the highly restrictive and vague condition that \(\alpha \) be sufficiently small, with “sufficiently small” determined by the value of the unknown noncentrality parameter; see Section 4.
Examples 2.2 and 3.2 go beyond the bivariate case. Here \(p\ge 3\), \(N=p+2\), and the powers of all possible bivariate \(T_\omega ^2\)-tests (\(|\omega |=2\)) are compared to the power of the overall (p-variate) \(T^2\) test, again only for asymptotic and local alternatives and only for very special configurations of \(\gamma \) and R. In these cases, admittedly highly restrictive, the bivariate \(T_\omega ^2\)-tests dominate the p-variate \(T^2\)-test over a substantial portion of the alternative hypothesis K. This does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\) but again suggests that variable selection might be limited to small variable subsets \(\omega \).
Together, the preliminary findings in this paper indicate the feasibility and potential benefit of limiting variable selection to small subsets, in particular to univariate or bivariate subsets. Further study will be needed to implement this approach to variable selection and to confirm its efficacy. See §5 and the Appendix for related comments.
2 Some Asymptotic Power Comparisons
The power function of the \(T_\omega ^2\)-test is
(recall (16) and (17)). It follows from eqn. (3.4) in Marden and Perlman (1980) that as \(\Uplambda _\omega \rightarrow \infty \),
Thus for two subsets \(\omega ,\,\omega '\) with \(\omega \subset \omega '\), there exists \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^*>0\) such that
Therefore power comparisons of \(T_\omega ^2\) and \(T_{\omega '}^2\) for distant alternativesFootnote 3 require determination of the lower quantiles \(b_{n,m;\alpha }\). This can be done explicitly in Examples 2.1 and 2.2 below. Although these examples are of very limited scope,Footnote 4 they begin to suggest that variable subset selection sometimes can be limited to very small subsets \(\omega \in \Upomega _p\), e.g., singletons in the bivariate Example 2.1, or pairs (including singletons) in Example 2.2.
To simplify the notation, set
The quantile \(b_{n,m;\alpha }\) satisfies
For the simple cases \(n=2\) or \(m=2\),
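With \(b_{m,n}\) read as a \(\textrm{Beta}(m/2,\,n/2)\) law (per the density in §1.1), the omitted closed forms are presumably \(b_{m,2;\alpha }=\alpha ^{2/m}\) and \(b_{2,n;\alpha }=1-(1-\alpha )^{2/n}\). A numerical check of this reading:

```python
# Closed-form lower alpha-quantiles of b_{m,n} ~ Beta(m/2, n/2) when one
# degrees-of-freedom parameter equals 2, checked against SciPy.
from scipy.stats import beta

def b_lower_n2(alpha, m):
    return alpha ** (2.0 / m)                 # cdf of Beta(m/2, 1) is x^{m/2}

def b_lower_m2(alpha, n):
    return 1.0 - (1.0 - alpha) ** (2.0 / n)   # cdf of Beta(1, n/2) is 1-(1-x)^{n/2}

assert abs(b_lower_n2(0.05, 3) - beta.ppf(0.05, 1.5, 1.0)) < 1e-9
assert abs(b_lower_m2(0.05, 3) - beta.ppf(0.05, 1.0, 1.5)) < 1e-9
```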
Example 2.1
In the bivariate case \(p=2\), abbreviate the singleton subsets \(\{1\}\) and \(\{2\}\) of \(\Upomega _2\) by 1 and 2 respectively. We shall compare the powers \(\pi _\alpha (\Uplambda _1;\,1,\,2)\) and \(\pi _\alpha (\Uplambda _2;\,1,\,2)\) of the two univariate size-\(\alpha \) \(t^2\)-tests to the power \(\pi _\alpha (\Uplambda ;\,2,\,1)\) of the overall (bivariate) size-\(\alpha \) \(T^2\)-test for distant alternatives.
Assume that \(\gamma _1\ne 0\) (recall (14)) and set
where \(-1<\rho <1\), so by Eqs. 13 and 17,
Without loss of generality we can assume that \(|\gamma _1|\ge |\gamma _2|\), so \(0\le \eta ^2\le 1\) and
The alternative hypotheses K can be represented as
while \(\hat{\omega }_\alpha (\gamma ,R)\) can be re-expressed as \(\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )\).
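Under this parametrization (with \(\eta =\gamma _2/\gamma _1\) and \(\rho =\rho _{12}\), my reading of the omitted displays) the noncentrality parameters specialize to:

```latex
\Uplambda_1 \;=\; N\gamma_1^2,
\qquad
\Uplambda_2 \;=\; N\gamma_1^2\eta^2,
\qquad
\Uplambda \;=\; N\gamma_1^2\,\frac{1-2\eta\rho+\eta^2}{1-\rho^2} .
```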
Because \(\max (\Uplambda _1,\Uplambda _2)\le \Uplambda \), it follows from Eqs. 22, 27, and 28 that
In the simplest case \(N=3\), Eq. 25 yields the explicit expression
while the inequality in Eq. 30 is equivalent to
Note that
The quadratic function \(h_{\alpha ,\eta }(\rho )\) (\(-1\le \rho \le 1\)) satisfies
It is easily seen that if \(\alpha \le \frac{2}{3}\) then \(Q_{1,2,3;\alpha }\le \frac{1}{2}\), so \(h_{\alpha ,\eta }(0)\le 0\) for all \(\eta \in [-1,1]\). Thus if \(\alpha \le \frac{2}{3}\) then \(h_{\alpha ,\eta }(\rho )\) must have one root in \([-1,0]\) and one root in [0, 1]. The two roots are given by
note that \(\hat{\rho }_{\alpha ,-\eta }^{\pm }=-\hat{\rho }_{\alpha ,\eta }^{\mp }\).
It follows that if \(\alpha \le \frac{2}{3}\) then for sufficiently large \(\gamma _1^2\), i.e., \(\gamma _1^2\ge \frac{1}{3}\Uplambda _{1,2,3;\alpha }^*\),
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. Specifically, when \(\gamma _1^2\ge \frac{1}{3}\Uplambda _{1,2,3;\alpha }^*\), \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\) in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 1. From this it is seen that for \(p=2\), \(N=3\), and the common (small) values of \(\alpha \), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests for most of the distant alternative hypothesis K, i.e., for sufficiently large \(\gamma _1^2\). In fact, for most cases this domination occurs over almost the entire range \((-1,1)\) of \(\rho \). \(\square \)
Example 2.2
Suppose that \(p\ge 3\) and \(N=p+2\). The powers of the \({p\atopwithdelims ()2}\) bivariate size-\(\alpha \) \(T^2\)-tests and the overall (p-variate) size-\(\alpha \) \(T^2\)-test will be compared for distant alternatives, which requires comparison of the powers
From Eq. 22,
Therefore for sufficiently large values of \(\Uplambda ^{(2)}\), namely \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-testFootnote 5 provided that
From Eq. 25 we obtain the explicit expression
If we set \(\nu _p=\frac{2}{p}\) and \(U_{p;\alpha }=\frac{\nu _p}{Q_{2,p,p+2;\alpha }}\) then
Table 2 shows that \(Q_{2,p,p+2;\alpha }\) decreases rapidly to 0 as \(p\rightarrow \infty \), which suggests that Eq. 40 might hold over substantial regions of the alternative hypothesis K. We proceed to exhibit several such regions.
Case 1: \(\gamma _1=\cdots =\gamma _p=:\delta \) and R has the intraclass form
where \(\textbf{1}_p=(1,\dots ,1)':p\times 1\) and the allowable range of \(\rho \) is \((-\frac{1}{p-1},\,1)\). Then
By symmetry, all bivariate tests have the same power, and by Eqs. 13 and 17,
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\). Also, since \(\nu _p=\frac{2}{p}\le \frac{2}{3}\), Eq. 40 is equivalent to each of the inequalities
Because \((p-1)U_{p;\alpha }>1\) for common (small) values of \(\alpha \) (see Eq. 44 and Table 2), in such cases Eq. 49 is equivalent to
Table 2 shows that in Case 1, \(\tilde{\psi }_{p;\alpha }^-\) is close to the lower limit of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus by Eq. 50, all of the bivariate size-\(\alpha \) \(T^2\)-tests are more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the distant alternative hypothesis specified in Case 1, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
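For the intraclass configuration of Case 1, with every bivariate subset carrying \(\gamma _\omega =(\delta ,\delta )'\), the noncentrality parameters presumably work out to (a reconstruction):

```latex
\Uplambda^{(2)} \;=\; \frac{2N\delta^2}{1+\rho},
\qquad
\Uplambda \;=\; \frac{pN\delta^2}{1+(p-1)\rho},
\qquad N=p+2 .
```

Taking the worst allowable case \(\rho \rightarrow 1\) in \(\Uplambda ^{(2)}\) recovers the stated condition \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\).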
Case 2 (\(\gamma \) sparse): \(\gamma _i=\gamma _j=:\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \) in Eq. 45
so again \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\). Abbreviating \(Q_{2,p,p+2;\alpha }\) by Q, Eq. 40 is equivalent to each of the inequalities
Since \(h_{p;\alpha }(0)=Q-1<0\) for common (small) values of \(\alpha \) (see Eqs. 42-43 and Table 2), \(h_{p;\alpha }(\rho )\) has two real roots \(\tilde{\rho }_{p;\alpha }^-<0<\tilde{\rho }_{p;\alpha }^+\) (found numerically). Therefore \(0>h_{p;\alpha }(\rho )\) for \(\tilde{\rho }_{p;\alpha }^-<\rho <\tilde{\rho }_{p;\alpha }^+\).
Table 2 shows that in Case 2, the interval \((\tilde{\rho }_{p;\alpha }^-,\tilde{\rho }_{p;\alpha }^+)\) covers almost all of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the distant alternative hypothesis specified in Case 2, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
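In Case 2 the signal pair \(\omega =\{i,j\}\) gives, from the intraclass \(R_\rho \) (a reconstruction):

```latex
\Uplambda_{\{i,j\}} \;=\; \frac{2N\delta^2}{1+\rho},
\qquad
\Uplambda \;=\; \frac{2N\delta^2\,[1+(p-3)\rho]}{(1-\rho)\,[1+(p-1)\rho]} .
```

The factor \((1-\rho )[1+(p-1)\rho ]/[1+(p-3)\rho ]\) appearing in the local analysis of Case 2 in §3 is consistent with this expression for \(\Uplambda \).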
Case 3 (\(\gamma \) sparse): \(\gamma _i=\delta \) and \(\gamma _j=-\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \)
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) again holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\), while Eq. 40 is equivalent to \(1>Q_{2,p,p+2;\alpha }\), which holds for most \(p,\alpha \) (see Eqs. 42-43 and Table 2). Again at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test over the entire distant alternative hypothesis in Case 3, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\).
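In Case 3 the signal pair itself is the natural bivariate subset; because the grand-mean term of \(R_\rho ^{-1}\) cancels for \(\gamma _\omega =(\delta ,-\delta )'\), one presumably has (a reconstruction):

```latex
\Uplambda^{(2)} \;=\; \Uplambda \;=\; \frac{2N\delta^2}{1-\rho} ,
```

which is why the asymptotic comparison reduces to \(1>Q_{2,p,p+2;\alpha }\).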
Case 4: \(p=:2l\) is even, \(\gamma _i=\delta \) for l indices in \(\{1,\dots ,p\}\), \(\gamma _i=-\delta \) for the remaining l indices, and R has the intraclass form \(R_\rho \)
Thus \(\Uplambda ^{(2)}\ge \Uplambda _{2,p,p+2;\alpha }^*\) again holds for all allowable \(\rho \) if \(\delta ^2\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*\), while Eq. 40 is equivalent to the inequality
Because \(\frac{1-|\rho |}{1-\rho }\le 1\), while \(U_{p;\alpha }>1\) holds for most \(p,\alpha \) (see Eq. 43 and Table 2), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test over the entire distant alternative hypothesis in Case 4, i.e., for sufficiently large \(\delta ^2\ (\ge (p+2)^{-1}\Uplambda _{2,p,p+2;\alpha }^*)\). \(\square \)
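In Case 4 the best bivariate subset pairs two coordinates of equal sign when \(\rho <0\) and of opposite sign when \(\rho >0\), giving (a reconstruction):

```latex
\max_{|\omega|=2}\Uplambda_\omega \;=\; \frac{2N\delta^2}{1-|\rho|},
\qquad
\Uplambda \;=\; \frac{pN\delta^2}{1-\rho} ,
```

so the ratio \(\frac{2}{p}\cdot \frac{1-\rho }{1-|\rho |}\) produces exactly the factor \(\frac{1-|\rho |}{1-\rho }\) that is compared with \(U_{p;\alpha }\) above.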
3 Some Local Power Comparisons
From Eqs. 2-4, as \(\Uplambda _\omega \downarrow 0\) the power function \(\pi _\alpha (\Uplambda _\omega )\equiv \pi _\alpha (\Uplambda _\omega ;|\omega |,N-|\omega |)\) of the \(T_\omega ^2\)-test satisfies
Thus for two subsets \(\omega ,\,\omega '\) with \(\omega \subset \omega '\), there exists \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^{**}>0\) such that
where, from (2.2) and (2.3) in Das Gupta and Perlman (1974),
Therefore power comparisons of \(T_\omega ^2\) and \(T_{\omega '}^2\) for local alternativesFootnote 6 require determination of the lower tail probabilities \(c_{m,n;k}^\alpha \), which in turn require the lower quantiles \(b_{n,m;\alpha }\) (see Eq. 8).
In parallel with Section 2, this is done explicitly in Examples 3.1 and 3.2. As in Examples 2.1 and 2.2, these examples begin to suggest that variable selection might be limited to very small subsets \(\omega \in \Upomega _p\), e.g., singletons in the bivariate Example 3.1, or pairs (plus singletons) in Example 3.2.
Example 3.1
As in Example 2.1 consider the bivariate case \(p=2\). Repeat the first two paragraphs from Example 2.1 verbatim, except replace “distant alternatives” by “local alternatives”. Because \(\max (\Uplambda _1,\Uplambda _2)\le \Uplambda \), it follows from Eqs. 27 and 57 that
In the simplest case \(N=3\), it follows from Eqs. 8, 6, and 25 that
First note that in Eq. 59,
The quadratic function \(h_{\alpha ,\eta }(\rho )\) (\(-1\le \rho \le 1\)) satisfies
It is easily seen that if \(\alpha \le \frac{1}{2}\) then \(Z_{1,2,3;\alpha }\ge 2\), so \(h_{\alpha ,\eta }(0)\le 0\) for all \(\eta \in [-1,1]\). Therefore, if \(\alpha \le \frac{1}{2}\) then \(h_{\alpha ,\eta }(\rho )\) must have one root in \([-1,0]\) and one root in [0, 1]. The two roots are given by
again \(\check{\rho }_{\alpha ,-\eta }^{\pm }=-\check{\rho }_{\alpha ,\eta }^{\mp }\). Thus, if \(\alpha \le \frac{1}{2}\) and \(\rho \in (\check{\rho }_{\alpha ,\eta }^-,\,\check{\rho }_{\alpha ,\eta }^+)\) then Eq. 63 must hold.
To conclude that \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\), \(\gamma _1^2\) must be sufficiently small, i.e.,
Because
for fixed \(\eta \), Eq. 66 will be satisfied provided that
It is straightforward to show that Eq. 68 holds for \(|\eta |<1\) but not for \(|\eta |=1\).
Thus, if \(\alpha \le \frac{1}{2}\), \(|\eta |<1\), and Eqs. 67, 68, and 69 are satisfied then \(|\hat{\omega }_\alpha (\gamma _1,\eta ,\rho )|=1\), in which case at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. This occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 3, provided that \(\gamma _1^2\textstyle <\frac{1}{6}(1-\check{m}_{\alpha ,\eta })\Uplambda _{1,2,3;\alpha }^{**}\). Thus, for \(p=2\), \(N=3\), and the common (small) values of \(\alpha \), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests over much of the local alternative hypothesis space. Compared to Table 1, this effect seems somewhat less than for distant alternatives.
Example 3.2
Suppose that \(p\ge 3\) and \(N=p+2\ge 3\). We shall compare the powers of the \({p\atopwithdelims ()2}\) bivariate size-\(\alpha \) \(T^2\)-tests and the p-variate size-\(\alpha \) \(T^2\)-test for local alternatives, which requires comparison of the powers
From Eq. 57,
Therefore for sufficiently small values of \(\Uplambda \), namely \(\Uplambda \le \Uplambda _{2,p,p+2;\alpha }^{**}\), at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-testFootnote 7 whenever
From Eqs. 58, 8, 6, 25, and some algebra, the explicit expression
is obtained. Setting \(\nu _p=\frac{2}{p}\ (\le \frac{2}{3})\) and \(V_{p;\alpha }:=\nu _p Z_{2,p,p+2;\alpha }\), we have
Table 4 shows that \(Z_{2,p,p+2;\alpha }\) increases rapidly to \(\infty \) as \(p\rightarrow \infty \), which suggests that Eq. 73 might hold over substantial regions of the alternative hypothesis. Several such regions are now exhibited.
Case 1: \(\gamma _1=\cdots =\gamma _p=:\delta \) and R has the intraclass form Eq. 45
Here \(-\frac{1}{p-1}<\rho <1\) and as in Eq. 48,
Here Eq. 73 is equivalent to each of the inequalities
Because \((p-1)V_{p;\alpha }>1\) for common (small) values of \(\alpha \) (see Eq. 77 and Table 4), in such cases Eq. 79 in turn is equivalent to
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
However, \(\rho >\breve{\psi }_{p;\alpha }^-\) implies that
Therefore Eq. 81 will be satisfied provided that
If p is large and \(\alpha \) is small, Table 4 shows that in Case 1, \(\breve{\psi }_{p;\alpha }^-\) is close to the lower limit of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Then by Eq. 83, at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the local alternative hypothesis covered by Case 1, i.e., for \(\delta ^2\textstyle <\breve{m}_{p,\alpha }\Uplambda _{2,p,p+2;\alpha }^{**}\).
Case 2 (\(\gamma \) sparse): \(\gamma _i=\gamma _j\equiv \delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \) in Eq. 45
As in Eq. 51,
Abbreviating \(Z_{2,p,p+2;\alpha }\) by Z, Eq. 73 is equivalent to each of the inequalities
Since \(h_{p;\alpha }(0)=1-Z<0\) (cf. Eq. 58), \(h_{p;\alpha }(\rho )\) has real roots \(\breve{\rho }_{p;\alpha }^-<0<\breve{\rho }_{p;\alpha }^+\) (found numerically). Therefore \(0>h_{p;\alpha }(\rho )\) for \(\breve{\rho }_{p;\alpha }^-<\rho <\breve{\rho }_{p;\alpha }^+\).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
Because \(\frac{(1-\rho )[1+(p-1)\rho ]}{1+(p-3)\rho }\) is decreasing in \(\rho \), \(\rho <\breve{\rho }_{p;\alpha }^+\) implies that
Therefore Eq. 84 will be satisfied provided that
Table 4 shows that in Case 2, the interval \((\breve{\rho }_{p;\alpha }^-,\breve{\rho }_{p;\alpha }^+)\) covers almost all of the allowable range \((-\frac{1}{p-1},1)\) for \(\rho \). Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests will be more powerful than the p-variate size-\(\alpha \) \(T^2\)-test for most of the local alternative hypothesis determined by Case 2, i.e., for \(\delta ^2<\breve{m}_{p,\alpha }'\Uplambda _{2,p,p+2;\alpha }^{**}\).
Case 3 (\(\gamma \) sparse): \(\gamma _i=\delta \) and \(\gamma _j=-\delta \) for some \(\{i,j\}\subset \{1,\dots ,p\}\), \(\gamma _k=0\) for \(k\ne i,j\), and R has the intraclass form \(R_\rho \)
Here Eq. 73 is equivalent to \(Z_{2,p,p+2;\alpha }>1\), which holds for all \(p,\alpha \) (see Eq. 58).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
for all \(\rho \in (-\frac{1}{p-1},1)\). This requires that \(\rho \) be bounded below 1, that is, \(\rho <1-\epsilon \) for some \(\epsilon >0\), whence Eq. 87 will be satisfied if
Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test if \(\rho <1-\epsilon \), which covers almost all of the local region Eq. 88 in the alternative hypothesis determined by Case 3.
Case 4: \(p=:2l\) is even, \(\gamma _i=\delta \) for l indices in \(\{1,\dots ,p\}\), \(\gamma _i=-\delta \) for the remaining l indices, and R has the intraclass form \(R_\rho \)
Here Eq. 73 is equivalent to the inequality
Because \(\frac{1-|\rho |}{1-\rho }\le 1\), while \(V_{p;\alpha }>1\) holds for most \(p,\alpha \) (see Eq. 76 and Table 4), Eq. 89 is satisfied for most \(p,\alpha \).
To conclude that Eqs. 71-72 hold, \(\delta ^2\) must be sufficiently small, i.e.,
for all \(\rho \in (-\frac{1}{p-1},1)\). This again requires that \(\rho \) be bounded below 1, that is, \(\rho <1-\epsilon \) for some \(\epsilon >0\), whence Eq. 90 will be satisfied if
Thus at least one of the bivariate size-\(\alpha \) \(T^2\)-tests is more powerful than the p-variate size-\(\alpha \) \(T^2\)-test if \(\rho <1-\epsilon \), which covers almost all of the local region Eq. 91 in the alternative hypothesis determined by Case 4. \(\square \)
Remark 3.3
The mean-vector and covariance matrix configurations in Examples 2.1 and 2.2 are the same as those in Examples 3.1 and 3.2 respectively, so the results for distant alternatives in the former can be compared to those for local alternatives in the latter. Because both sets of results support the feasibility of limiting variable selection to small subsets, this suggests that this feasibility may extend to intermediate alternatives as well. Furthermore, both sparse cases (Cases 2 and 3) and non-sparse cases (Cases 1 and 4) exhibit this feasibility in §2 and §3. Of course, more extensive comparisons will be needed to confirm this conclusion.
4 Some Exact Power Comparisons for the Bivariate Case
The results in Sections 2 and 3 compare the power of the overall (p-variate) \(T^2\)-test with those of univariate or bivariate \(T^2\)-tests based on the original variates. However these power comparisons are asymptotic or local, and are relevant only for noncentrality parameters \(\Uplambda \) that approach \(\infty \) or 0. In this section we consider the bivariate case \(p=2\) and attempt to compare the exact power functions of the \(T^2\)-test and the two univariate \(t^2\)-tests for all values of \(\Uplambda \). Two conjectures are presented; the first of these is confirmed in Proposition 4.3 and applied in Example 4.4 for only two simple cases.
Conjecture 4.1
(weak) Suppose that \(p=2\) and N is odd: \(N=2l+1\). Then for each \(\lambda >0\), there exists \(\alpha _l^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _l^*(\lambda )\). \(\square \)
Conjecture 4.1 is established below for \(l=1,2\) and we expect it to hold for all \(l\ge 3\) as well. However, it is unsatisfactory in that if \(\alpha _l^*(\lambda )\) depends nontrivially on \(\lambda \) then we cannot conclude that, at least for small \(\alpha \), one or both of the two univariate size-\(\alpha \) \(t^2\)-tests dominate the bivariate size-\(\alpha \) \(T^2\) test in a large region of the alternative hypothesis. For this the following stronger result would be needed.
Conjecture 4.2
(strong) Conjecture 4.1 holds with \(\alpha _l^*(\lambda )\) not depending on \(\lambda \), i.e., \(\alpha _l^*(\lambda )=\alpha _l^*\). \(\square \)
At this time we do not have evidence either for or against Conjecture 4.2. If valid, it would be essential to determine or approximate the values of \(\alpha _l^*\).
Proposition 4.3
Conjecture 4.1 is valid for \(l=1\) and 2.
Proof
By Eq. 3,
if and only if
From Eqs. 8 and 6 we find that
Set \(u=1-b\) in Eqs. 97-98, then differentiate with respect to \(\alpha \) to obtain
Next,
Therefore a sufficient condition that \(\frac{d}{d\alpha }s_{\delta ,\lambda }^{(l)}(\alpha )> \frac{d}{d\alpha }t_{\delta ,\lambda }^{(l)}(\alpha )\) is that for all \(k\ge 0\),
with strict inequality for at least one k.
Thus for \(\alpha =0\), a sufficient condition that \(\frac{d}{d\alpha }s_{\delta ,\lambda }^{(l)}(\alpha =0)> \frac{d}{d\alpha }t_{\delta ,\lambda }^{(l)}(\alpha =0)\) is that for all \(k\ge 0\),
with strict inequality for at least one k. After some algebra, Eq. 113 can be written equivalently as
where \(R_{k,\delta }\sim \textrm{Binomial}(k,\frac{1}{1+\delta })\). Because \(R_{k,\delta }\) is (strictly) stochastically decreasing in \(\delta \) (for \(k\ge 1\)) while \(\frac{\Upgamma (l+\frac{1}{2}+R_{k,\delta })}{\Upgamma (\frac{1}{2}+R_{k,\delta })}\) is (strictly) increasing in \(R_{k,\delta }\), the left side of Eq. 114 (\(\equiv \) Eq. 113) is (strictly) decreasing in \(\delta \) (for \(k\ge 1\)).
For \(k=0\), both sides of Eq. 113 equal \(\frac{\Upgamma (l+\frac{1}{2})}{\Upgamma (\frac{1}{2})}\). For \(k=1\), Eq. 113 is equivalent to the inequality
which is equivalent to \(\delta \le \frac{1}{2l-1}\). Therefore the sufficient condition Eq. 113 for
will be satisfied for all \(\delta \le \frac{1}{2l-1}\) if Eq. 113 holds for \(\delta =\frac{1}{2l-1}\) for all \(k\ge 2\), with strict inequality for at least one \(k\ge 2\).
Because \(s_{\delta ,\lambda }^{(l)}(0)=t_{\delta ,\lambda }^{(l)}(0)=0\), it follows from Eqs. 93-94 that Eq. 113, with strict inequality for some \(k\ge 2\), is a sufficient condition that for each \(\lambda >0\), there exists \(\alpha _l^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _l^*(\lambda )\).
For the simplest case \(l=1\) (\(N=3\)), Eq. 113 with \(\delta =\frac{1}{2l-1}=1\) becomes
which by Eq. 114 can be reduced to the equivalent form
It is straightforward to verify Eq. 117 by induction on k, with strict inequality holding for large k because \(\frac{\Upgamma (\frac{3}{2}+k)}{\Upgamma (1+k)}=O(k^{\frac{1}{2}})\). Therefore Eq. 115 holds for \(l=1\); that is, for each \(\lambda >0\), there exists \(\alpha _1^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _1^*(\lambda )\).
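The growth rate invoked in the induction, \(\frac{\Upgamma (\frac{3}{2}+k)}{\Upgamma (1+k)}=O(k^{\frac{1}{2}})\), is easy to verify numerically; the sketch below (function name ours) checks that the normalized ratio tends to 1:

```python
import math

def gamma_log_ratio(k):
    """log( Gamma(3/2 + k) / Gamma(1 + k) ), computed via lgamma to avoid overflow."""
    return math.lgamma(1.5 + k) - math.lgamma(1.0 + k)

# Gamma(3/2+k)/Gamma(1+k) grows like sqrt(k), so the normalized ratio -> 1.
for k in (10, 1000, 10**6):
    print(k, math.exp(gamma_log_ratio(k)) / math.sqrt(k))
```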
Next consider the case \(l=2\) (\(N=5\)). With \(\delta =\frac{1}{2l-1}=\frac{1}{3}\), Eq. 113 becomes
which can be reduced to the equivalent form
Interestingly, Eq. 120 holds with equality for \(k=2\) as well as for 0 and 1. Rewrite Eq. 120 in the equivalent form
To verify Eq. 121 by induction on k, it suffices to show that for \(k\ge 2\),
After simplification, this is equivalent to the inequality
which holds for all \(k\ge 1\), with strict inequality for \(k+1\ge 3\). Thus Eq. 115 holds for \(l=2\): for each \(\lambda >0\), there exists \(\alpha _2^*(\lambda )\in (0,1)\) such that
with equality when \(\alpha =\alpha _2^*(\lambda )\).
Example 4.1
Return to the bivariate Example 2.1, where \(p=2\) and
(recall Eq. 27). For \(N=3\) it follows from Eqs. 118 and 125 that for each \(\gamma _1^2>0\),
Furthermore,
The two roots of \(h_\eta (\rho )\) are \(\textstyle \hat{\rho }_\eta ^\pm =\frac{\eta \pm \sqrt{12-3\eta ^2}}{4}\); note that \(\hat{\rho }_{-\eta }^{\pm }=-\hat{\rho }_{\eta }^{\mp }\). Some values appear in Table 5. Thus, if \(\alpha <\alpha _1^*(3\gamma _1^2)\) then
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the overall (bivariate) \(T^2\)-test. This occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 5, which constitute a substantial part of the alternative hypothesis.
Similarly, for \(N=5\) it follows from Eqs. 124 and 125 that for each \(\gamma _1^2>0\),
Furthermore,
The two roots of \(\tilde{h}_\eta (\rho )\) are \(\textstyle \tilde{\rho }_\eta ^\pm =\frac{3\eta \pm \sqrt{40-15\eta ^2}}{8}\); note that \(\tilde{\rho }_{-\eta }^{\pm }=-\tilde{\rho }_{\eta }^{\mp }\). Some values appear in Table 5. Thus, if \(\alpha <\alpha _2^*(5\gamma _1^2)\) then
that is, at least one of the two univariate \(t^2\)-tests is more powerful than the bivariate \(T^2\)-test. Again this occurs in the \((\eta ,\rho )\)-regions of the parameter space indicated in Table 5.
Thus for \(p=2\), \(N=3\) or 5, and sufficiently small \(\alpha \) (but depending on \(\gamma _1^2\)), the bivariate size-\(\alpha \) \(T^2\)-test is dominated by at least one of the two univariate size-\(\alpha \) \(t^2\)-tests over a fairly large portion of the entire alternative hypothesis, comprising local, intermediate, and distant alternatives.
5 Concluding Remarks
For the purpose of encouraging future research, the questions raised in this report are stated formally as follows:
The Oracular Variable-Selection Problem (OVSP)
is that of determining the function \(\hat{\omega }_\alpha (\gamma ,R)\), as defined in Eq. 19, and using this to determine the regions
The Parsimonious Variable-Selection Problem (PVSP)
asks if \(A_\alpha (i)\) comprises a substantial portion of the alternative hypothesis K for small values of i, e.g., \(i=1,2\).
If the answer to the PVSP is positive, then variable selection in some applied investigations can be limited to small, easily interpretable subsets of variables.
Finally, Example 5.1 illustrates the gain in power that ideally can be attained by variable selection limited to univariate subsets, even after the crude Bonferroni correction for multiple testing is applied.
Example 5.1
For \(p=10\), \(N=12, 22, 41\), and \(\alpha =.05\), Table 6 shows the gain in power obtained by the Bonferroni-corrected test \(T_{\hat{\omega }}^2\) with the oracular subset \(\hat{\omega }\equiv \hat{\omega }_\alpha (\gamma ,R)\) when \(|\hat{\omega }|=1\) and \(\Uplambda =\Uplambda _{\hat{\omega }}=18\). The gain in power can be substantial unless N is very large. (The powers are from Tiku, 1967.)
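The comparison in Example 5.1 can be reproduced directly rather than from Tiku's tables. A minimal sketch (function name ours; it assumes the size-\(\alpha \) \(T^2\)-test on \(p\) variables reduces to an \(F_{p,N-p}\)-test, and gives the oracular univariate subset the full noncentrality \(\Uplambda =18\) with the Bonferroni level \(\alpha /p\)):

```python
from scipy.stats import f, ncf

def f_test_power(m, n, alpha, lam):
    """Power of the size-alpha F_{m,n}-test at noncentrality lam."""
    return ncf.sf(f.isf(alpha, m, n), m, n, lam)

# Example 5.1 setting: p = 10 candidate variables, alpha = .05, Lambda = 18,
# oracular univariate subset carrying the full noncentrality.
p, alpha, lam = 10, 0.05, 18.0
for N in (12, 22, 41):
    full = f_test_power(p, N - p, alpha, lam)       # T^2 on all p variables
    uni = f_test_power(1, N - 1, alpha / p, lam)    # Bonferroni-corrected t^2
    print(N, round(full, 3), round(uni, 3))
```

For small \(N\) the Bonferroni-corrected univariate test is markedly more powerful, matching the qualitative message of Table 6.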
Notes
However, Giri et al. (1963) also began their study of Hotelling’s \(T^2\) test by considering only this simplest case \(p=2\), \(N=3\).
We do not claim to know the value of \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^*\), even approximately.
But see Footnote 2.
Note that by itself this does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\).
We do not claim to know the values of \(\Uplambda _{|\omega |,|\omega '|,N;\alpha }^{**}\), even approximately.
As in Example 2.2, this does not establish that \(|\hat{\omega }_\alpha (\gamma ,R)|=2\).
In line 2 of the second column on p. 179 of Das Gupta and Perlman (1974), "conclude" should be "include". In the line following the third display in the second column on p. 179, "\(j\)" should be "\(f\)". In Remark 4.1 on p. 180, "increasing in \(m\)" should be "decreasing in \(m\)". In the next line, "\(m\rightarrow \infty \)" should be "\(n\rightarrow \infty \)".
References
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd edition, Wiley & Sons, New York.
Das Gupta, S. and Perlman, M. D. (1974). On the power of the noncentral \(F\)-test: effect of additional variates on Hotelling’s \(T^2\)-test. J. Amer. Stat. Assoc. 69 174-180.
Giri, N. and Kiefer, J. (1964). Local and asymptotic minimax properties of multivariate tests. Ann. Math. Stat. 35 21-35.
Giri, N., Kiefer, J., and Stein, C. (1963). Minimax character of Hotelling’s \(T^2\)-test in the simplest case. Ann. Math. Stat. 34 1524-1535.
Kiefer, J. and Schwartz, R. (1965). Admissible Bayes character of \(T^2\)-, \(R^2\), and other fully invariant tests for classical multivariate normal testing problems. Ann. Math. Stat. 36 747-770.
Marden, J. and Perlman, M. D. (1980). Invariant tests for means with covariates. Ann. Stat. 8 25-63.
Stein, C. (1956). The admissibility of Hotelling’s \(T^2\)-test. Ann. Math. Stat. 27 616-623.
Tiku, M. L. (1967). Tables of the power of the \(F\)-test. J. Amer. Stat. Assoc. 62 525-539.
Acknowledgements
This work owes much to the late Somesh Das Gupta, my colleague, teacher, and friend. I am also grateful to David Perlman for raising the questions addressed here and providing supporting data, and to an anonymous referee who provided many insightful comments and suggestions.
Ethics declarations
Conflicts of interest
No funding was received to assist with the preparation of this manuscript. The author has no financial or proprietary interests in any material discussed in this article.
Appendices
Appendix A Testing for additional information.
Variable selection for the \(T^2\)-test and related linear discriminant analysis was thoroughly studied in the 1970s and 1980s, an era of limited computer power, and subsequently by several authors with greater ability to consider all-subsets methods; a list of references appears below. Almost all of these studies were based on testing for additional information (= increased Mahalanobis distance), as now described.
For any two nested subsets \(\omega \subset \omega '\) in \(\Upomega _p\), in general \(\Uplambda _\omega \le \Uplambda _{\omega '}\). The question of whether the power of the \(T_{\omega '}\)-test exceeds that of the \(T_{\omega }\)-test for the testing problem Eq. 9 was usually formulated as the problem of testing for additional information (TAI), namely, testing
based on a preliminary sample; see Rao (1973), §8c.4. This formulation was adopted by many researchers, even while citing the following result of Das Gupta and Perlman (1974), which implies that this standard formulation of TAI is inappropriate.
It was shown in Das Gupta and Perlman (1974, Theorem 2.1) that for fixed \(\lambda >0\), the power function \(\pi _\alpha (\lambda ;m,n)\) (recall Eq. 2) of the noncentral \(f\)-test is strictly decreasing in \(m\) and strictly increasing in \(n\) (Footnote 8). Therefore for any integer \(1\le q\le n-1\) there exists a unique real number
such that
Here \(g_\alpha (0)=0\) and \(g_\alpha (\lambda )\) is strictly increasing in \(\lambda \); cf. Das Gupta and Perlman (1974, Theorem 3.1). Thus the power is increased only if
Therefore Das Gupta and Perlman (1974, Section 4) introduced the problem of testing for increased power (TIP), namely, testing
and proposed several (approximate) tests.
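With modern software, computing \(g_\alpha (\cdot )\) for a given pair of tests is a one-line root-finding problem. The sketch below (function names ours) solves the defining power-matching condition \(\pi _\alpha (g_\alpha (\lambda );m+q,n-q)=\pi _\alpha (\lambda ;m,n)\) numerically:

```python
from scipy.optimize import brentq
from scipy.stats import f, ncf

def power(m, n, alpha, lam):
    """Power of the size-alpha noncentral F_{m,n}-test at noncentrality lam."""
    return ncf.sf(f.isf(alpha, m, n), m, n, lam)

def g_alpha(lam, m, n, q, alpha):
    """Noncentrality g_alpha(lam) at which the F_{m+q,n-q}-test matches the
    power of the F_{m,n}-test at lam (the Das Gupta-Perlman condition)."""
    target = power(m, n, alpha, lam)
    # g_alpha(lam) > lam because power is decreasing in m and increasing in n,
    # so the root is bracketed between lam and a generous upper bound.
    return brentq(lambda L: power(m + q, n - q, alpha, L) - target,
                  lam, 100 * lam + 100)

print(g_alpha(5.0, 1, 4, 1, 0.05))  # adding one variable when (m, n) = (1, 4)
```

This suggests that, at least for the very small subsets advocated here, replacing the TAI by the TIP is computationally routine.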
This proposal was noted by subsequent authors but never implemented for variable selection, possibly because of the difficulty of computing the functions \(g_\alpha (\cdot )\), especially when many pairs \((\omega ,\omega ')\) must be considered. However, if, as suggested above, variable selection is limited to very small subsets of variables in practical applications, then replacing the TAI by the TIP might be feasible.
Remark A1
The relation Eq. 92 in Conjecture 4.1 can be stated equivalently in terms of \(g_\alpha \):
with equality when \(\alpha =\alpha _l^*(\lambda )\). Thus the relations Eqs. 118 and 124 in Proposition 4.3 also can be stated equivalently in terms of \(g_\alpha \):
with equality when \(\alpha =\alpha _1^*(\lambda )\);
with equality when \(\alpha =\alpha _2^*(\lambda )\).
Additional References for the Appendix
Hand, D. J. (1981). Discrimination and Classification, Wiley & Sons, New York. [Chapter 6]
Hawkins, D. M. (1976). The subset problem in multivariate analysis of variance. J. Royal Statist. Soc, Series B 38 132-139.
Jain, A. K. and Waller, W. G. (1978). On the optimal number of features in the classification of multivariate Gaussian data. Pattern Recognition 10 103-109.
Jiang, W., Wang, K., and Tsung, F. (2012). A variable-selection-based multivariate EWMA chart for process monitoring and diagnosis. J. Quality Technology 44 209-230.
McCabe, G. P. Jr. (1975). Computations for variable selection in discriminant analysis. Technometrics 17 259-263.
McKay, R. J. (1976). A graphical aid to selection of variables in two-group discriminant analysis. Appl. Statist. 27 259-263.
McLachlan, G. J. (1976). On the relationship between the F test and the overall error rate for variable selection in two-group discriminant function. Biometrics 36 501-510.
McLachlan, G. J. (1980). A criterion for selecting variables for the linear discriminant function. Biometrics 32 529-534.
McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition, Wiley & Sons, New York. [Chapter 12]
Murray, G. D. (1977). A cautionary note on selection of variables in discriminant analysis. Appl. Statist. 26 246-250.
Nobuo, S. and Takahisa, I. (2016). A variable selection method for detecting abnormality based on the \(T^2\) test. Comm. Statist. - Theory and Methods 46 501-510.
Rao, C. R. (1973). Linear Statistical Inference and its Applications, 2nd edition, Wiley & Sons, New York.
Schaafsma, W. (1982). In Handbook of Statistics, Vol. 2, P. R. Krishnaiah and L. N. Kanal, eds., 857-881. North Holland, Amsterdam.
Perlman, M.D. On the Feasibility of Parsimonious Variable Selection for Hotelling’s \(T^2\)-test. Sankhya A (2024). https://doi.org/10.1007/s13171-024-00357-7
Keywords
- Multivariate normal distribution
- Hotelling’s \(T^2\) test
- Student’s \(t^2\)
- variable selection
- test for additional information
- Mahalanobis distance