Abstract
It is assumed that the readers are familiar with the concept of testing statistical hypotheses on the parameters of a real scalar normal distribution or of independent real scalar normal distributions. The likelihood ratio criterion is employed for testing various hypotheses on the parameters of one or more real multivariate Gaussian (or normal) distributions. The tests are based on a simple random sample from a multivariate nonsingular Gaussian distribution. The corresponding test criteria for the complex Gaussian case are also provided for the hypotheses considered.
6.1. Introduction
It is assumed that the readers are familiar with the concept of testing statistical hypotheses on the parameters of a real scalar normal density or of independent real scalar normal densities. Those who are not, or who require a refresher, may consult the textbook Mathai and Haubold (2017) on basic “Probability and Statistics” [De Gruyter, Germany, 2017, free download]. Initially, we will only employ the likelihood ratio criterion for testing hypotheses on the parameters of one or more real multivariate Gaussian (or normal) distributions. All of our tests will be based on a simple random sample of size n from a p-variate nonsingular Gaussian distribution, that is, the p × 1 vectors X 1, …, X n constituting the sample are iid (independently and identically distributed) as X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, when a single real Gaussian population is involved. The corresponding test criterion for the complex Gaussian case will also be mentioned in each section.
In this chapter, we will utilize the following notations. Lower-case letters such as x, y will be used to denote real scalar mathematical or random variables. No distinction will be made between mathematical and random variables. Capital letters such as X, Y will denote real vector/matrix-variate variables, whether mathematical or random. A tilde placed on a letter as for instance \(\tilde {x}, \tilde {y}, \tilde {X}\) and \( \tilde {Y}\) will indicate that the variables are in the complex domain. No tilde will be used for constant matrices unless the point is to be stressed that the matrix concerned is in the complex domain. The other notations will be identical to those utilized in the previous chapters.
First, we consider certain problems related to testing hypotheses on the parameters of a p-variate real Gaussian population. Only the likelihood ratio criterion, also referred to as λ-criterion, will be utilized. Let L denote the joint density of the sample values in a simple random sample of size n, namely, X 1, …, X n, which are iid N p(μ, Σ), Σ > O. Then, as was previously established,
where \(S=\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'\) is the sample sum of products matrix and \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) is the sample average, n being the sample size. As well, we have already determined that the maximum likelihood estimators (MLE’s) of μ and Σ are \(\hat {\mu }=\bar {X}\) and \(\hat {\varSigma }=\frac {1}{n}S,\) the sample covariance matrix. Consider the parameter space
The maximum value of L within Ω is obtained by substituting the MLE’s of the parameters into L, and since \((\bar {X}-\hat {\mu })=(\bar {X}-\bar {X})=O\) and \({\mathrm{tr}}(\hat {\varSigma }^{-1}S)={\mathrm{tr}}(nI_p)=np,\)
Under any given hypothesis on μ or Σ, the parameter space is reduced to a subspace ω in Ω or ω ⊂ Ω. For example, if H o : μ = μ o where μ o is a given vector, then the parameter space under this null hypothesis reduces to ω = {(μ, Σ)| μ = μ o, Σ > O}⊂ Ω, “null hypothesis” being a technical term used to refer to the hypothesis being tested. The alternative hypothesis against which the null hypothesis is tested, is usually denoted by H 1. If μ = μ o specifies H o, then a natural alternative is H 1 : μ≠μ o. One of two things can happen when considering the maximum of the likelihood function under H o. The overall maximum may occur in ω or it may be attained outside of ω but inside Ω. If the null hypothesis H o is actually true, then ω and Ω will coincide and the maxima in ω and in Ω will agree. If there are several local maxima, then the overall maximum or supremum is taken. The λ-criterion is defined as follows:
If the null hypothesis is true, then λ = 1. Accordingly, an observed value of λ that is close to 0 in a testing situation indicates that the null hypothesis H o is incorrect and should then be rejected. Hence, the test criterion under the likelihood ratio test is to “reject H o for 0 < λ ≤ λ o”, that is, for small values of λ, so that, under H o, the coverage probability over this interval is equal to the significance level α or the probability of rejecting H o when H o is true, that is, Pr{0 < λ ≤ λ o | H o} = α for a pre-assigned α, which is also known as the size of the critical region or the size of the type-1 error. However, rejecting H o when it is not actually true or when the alternative H 1 is true is a correct decision whose probability is known as the power of the test and written as 1 − β where β is the probability of committing a type-2 error or the error of not rejecting H o when H o is not true. Thus we have
When we preassign α = 0.05, we are allowing a tolerance of 5% for the probability of committing the error of rejecting H o when it is actually true and we say that we have a test at the 5% significance level. Usually, we set α as 0.05 or 0.01. Alternatively, we can allow α to vary and calculate what is known as the p-value when carrying out a test. Such is the principle underlying the likelihood ratio test, the resulting test criterion being referred to as the λ-criterion.
In the complex case, a tilde will be placed above λ and L, (6.1.3) and (6.1.4) remaining essentially the same:
and
where α is the size or significance level of the test and 1 − β, the power of the test.
6.2. Testing H o : μ = μ 0 (Given) When Σ is Known, the Real N p(μ, Σ) Case
When Σ is known, the only parameter to estimate is μ, its MLE being \(\bar {X}\). Hence, the maximum in Ω is the following:
In this case, μ is also specified under the null hypothesis H o, so that there is no parameter to estimate. Accordingly,
Thus,
and small values of λ correspond to large values of \(\frac {n}{2}(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\). When X j ∼ N p(μ, Σ), Σ > O, it has already been established that \(\bar {X}\sim N_p(\mu ,~\frac {1}{n}\varSigma ),~\varSigma >O\). As well, \(n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\) is the exponent in a p-variate real normal density under H o, which has already been shown to have a real chisquare distribution with p degrees of freedom or
Hence, the test criterion is
Under the alternative hypothesis, the distribution of the test statistic is a noncentral chisquare with p degrees of freedom and non-centrality parameter \(\lambda =\frac {n}{2}(\mu -\mu _o)'\varSigma ^{-1}(\mu -\mu _o)\).
Example 6.2.1
For example, suppose that we have a sample of size 5 from a population that has a trivariate normal distribution and let the significance level α be 0.05. Let μ o, the hypothesized mean value vector specified by the null hypothesis, the known covariance matrix Σ, and the five observation vectors X 1, …, X 5 be the following:
the inverse of Σ having been evaluated via elementary transformations. The sample average, \(\frac {1}{5}(X_1+\cdots +X_5)\) denoted by \(\bar {X}\), is
and
For testing H o, the following test statistic has to be evaluated:
As per our criterion, H o should be rejected if \(8\ge \chi ^2_{p,\alpha }\). Since \(\chi ^2_{p,\alpha }=\chi ^2_{3,~0.05}=7.81,\) this critical value being available from a chisquare table, H o : μ = μ o should be rejected at the specified significance level. Moreover, in this case, the p-value is \(Pr\{\chi ^2_3\ge 8\}\approx 0.046,\) which can be evaluated by interpolation from the percentiles provided in a chisquare table or by making use of statistical packages such as R.
6.2.1. Paired variables and linear functions
Let Y 1, …, Y k be p × 1 vectors having their own p-variate distributions which are not known. However, suppose that a certain linear function X = a 1 Y 1 + ⋯ + a k Y k is known to have a p-variate real Gaussian distribution with mean value vector E[X] = μ and covariance matrix Cov(X) = Σ, Σ > O, that is, X = a 1 Y 1 + ⋯ + a k Y k ∼ N p(μ, Σ), Σ > O, where a 1, …, a k are fixed known scalar constants. An example of this type is X = Y 1 − Y 2 where Y 1 consists of measurements on p attributes before subjecting those attributes to a certain process, such as administering a drug to a patient, and Y 2 consists of the measurements on the same attributes after the process is completed. We would like to examine the difference Y 1 − Y 2 to study the effect of the process on these characteristics. If it is reasonable to assume that this difference X = Y 1 − Y 2 is N p(μ, Σ), Σ > O, then we could test hypotheses on E[X] = μ. When Σ is known, the general problem reduces to that discussed in Sect. 6.2. Assuming that we have iid observations on Y 1, …, Y k, we would evaluate the corresponding values of X, which produces iid observations on X, that is, a simple random sample of size n from X = a 1 Y 1 + ⋯ + a k Y k. Thus, when Σ is known, letting \(u=n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\sim \chi _p^2\,\) where \(\bar {X}\) denotes the sample average, the test would be carried out as follows at significance level α:
the non-null distribution of the test statistic u being a non-central chisquare.
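The paired reduction described above can be sketched as follows in Python (an illustration of ours; the function name and the before/after argument names are assumptions). The paired readings are differenced and the chisquare test of Sect. 6.2 is then applied to the differences:

```python
import numpy as np
from scipy.stats import chi2

def paired_mean_test(Y_before, Y_after, mu_o, Sigma, alpha=0.05):
    """Form the differences X = Y_before - Y_after and test
    H_o: E[X] = mu_o with Sigma known; u ~ chi-square_p under H_o."""
    X = Y_before - Y_after              # n x p matrix of iid difference vectors
    n, p = X.shape
    d = X.mean(axis=0) - mu_o
    u = float(n * d @ np.linalg.solve(Sigma, d))
    return u, bool(u >= chi2.ppf(1 - alpha, df=p))
```

The returned pair gives the observed statistic and the rejection decision at level α.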
Example 6.2.2
Three variables x 1 = systolic pressure, x 2 = diastolic pressure and x 3 = weight are monitored after administering a drug for the reduction of all these p = 3 variables. Suppose that a sample of n = 5 randomly selected individuals are given the medication for one week. The following five pairs of observations on each of the three variables were obtained before and after the administration of the medication:
Let X denote the difference, that is, X is equal to the reading before the medication was administered minus the reading after the medication could take effect. The observation vectors on X are then
In this case, X 1, …, X 5 are observations on iid variables. We are going to assume that these iid variables are coming from a population whose distribution is N 3(μ, Σ), Σ > O, where Σ is known. Let the sample average \(\bar {X}=\frac {1}{5}(X_1+\cdots +X_5)\), the hypothesized mean value vector specified by the null hypothesis H o : μ = μ o, and the known covariance matrix Σ be as follows:
Let us evaluate \(\bar {X}-\mu _o\) and \( n(\bar {X}-\mu _o)'\varSigma ^{-1}(\bar {X}-\mu _o)\) which are needed for testing the hypothesis H o : μ = μ o:
Let us test H o at the significance level α = 0.05. The critical value which can readily be found in a chisquare table is \(\chi ^2_{p,~\alpha }=\chi ^2_{3,~0.05}=7.81\). As per our criterion, we reject H o if \(8.4\ge \chi ^2_{p,~\alpha }\); since 8.4 > 7.81, we reject H o. The p-value in this case is \(Pr\{\chi ^2_p\ge 8.4\}=Pr\{\chi ^2_3\ge 8.4\}\approx 0.04\).
6.2.2. Independent Gaussian populations
Let Y j ∼ N p(μ (j), Σ j), Σ j > O, j = 1, …, k, and let these k populations be independently distributed. Assume that a simple random sample of size n j from Y j is available for j = 1, …, k; then these samples can be represented by the p-vectors Y jq, q = 1, …, n j, which are iid as Y j1, for j = 1, …, k. Consider a given linear function X = a 1 Y 1 + ⋯ + a k Y k where X is p × 1 and the Y j’s are taken in a given order. Let \(U=a_1\bar {Y}_1+\cdots +a_k\bar {Y}_k\) where \(\bar {Y}_j=\frac {1}{n_j}\sum _{q=1}^{n_j}Y_{jq}\) for j = 1, …, k. Then E[U] = a 1 μ (1) + ⋯ + a k μ (k) = μ (say), where a 1, …, a k are given real scalar constants. The covariance matrix of U is \({\mathrm{Cov}}(U)=\frac {a_1^2}{n_1}\varSigma _1+\cdots +\frac {a_k^2}{n_k}\varSigma _k=\frac {1}{n}\varSigma \) (say), where n is merely a symbol. Consider the problem of testing hypotheses on μ when Σ is known or when a j, Σ j, j = 1, …, k, are known. Let H o : μ = μ o (specified), in the sense that μ (j) is a known vector for j = 1, …, k, when Σ is known. Then, under H o, all the parameters are known and the standardized U is observable, the test statistic being
where \(\chi _p^{2(j)},~ j=1,\ldots ,k,\) denote independent chisquare random variables, each having p degrees of freedom. However, since this is a linear function of independent chisquare variables, even the null distribution is complicated. Thus, only the case of two independent populations will be examined.
Consider the problem of testing the hypothesis μ 1 − μ 2 = δ (a given vector) when there are two independent normal populations sharing a common covariance matrix Σ (known). Then U is \(U=\bar {Y}_1-\bar {Y}_2\) with E[U] = μ 1 − μ 2 = δ (given) under H o and \({\mathrm{Cov}}(U)=(\frac {1}{n_1}+\frac {1}{n_2})\varSigma =\frac {n_1+n_2}{n_1n_2}\varSigma ,\) the test statistic, denoted by v, being
The resulting test criterion is
Example 6.2.3
Let Y 1 ∼ N 3(μ (1), Σ) and Y 2 ∼ N 3(μ (2), Σ) represent independently distributed normal populations having a known common covariance matrix Σ. The null hypothesis is H o : μ (1) − μ (2) = δ where δ is specified. Denote the observation vectors on Y 1 and Y 2 by Y 1j, j = 1, …, n 1 and Y 2j, j = 1, …, n 2, respectively, and let the sample sizes be n 1 = 4 and n 2 = 5. Let those observation vectors be
and the common covariance matrix Σ be
Let the hypothesized vector under H o : μ (1) − μ (2) = δ be δ′ = (1, 1, 2). In order to test this null hypothesis, the following quantities must be evaluated:
They are
Then,
Let us test H o at the significance level α = 0.05. The critical value which is available from a chisquare table is \(\chi ^2_{p,~\alpha }=\chi ^2_{3,~0.05}=7.81\). As per our criterion, we reject H o if \(2.95\ge \chi ^2_{p,~\alpha }\); however, since 2.95 < 7.81, we cannot reject H o. The p-value in this case is \(Pr\{\chi ^2_p\ge 2.95\}=Pr\{\chi ^2_3\ge 2.95\}\approx 0.40,\) which can be determined by interpolation.
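The two-sample computation with a known common covariance matrix can be sketched as follows (our own Python illustration; the function name is an assumption). The statistic \(v=\frac {n_1n_2}{n_1+n_2}(\bar {Y}_1-\bar {Y}_2-\delta )'\varSigma ^{-1}(\bar {Y}_1-\bar {Y}_2-\delta )\) is distributed as \(\chi ^2_p\) under H o:

```python
import numpy as np
from scipy.stats import chi2

def two_sample_known_cov(Y1, Y2, delta, Sigma, alpha=0.05):
    """Test H_o: mu_(1) - mu_(2) = delta for two independent samples with a
    known common covariance matrix Sigma; v ~ chi-square_p under H_o."""
    (n1, p), n2 = Y1.shape, Y2.shape[0]
    d = Y1.mean(axis=0) - Y2.mean(axis=0) - delta
    # Cov(Ybar1 - Ybar2) = (1/n1 + 1/n2) Sigma, whence the factor n1 n2/(n1+n2)
    v = float((n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(Sigma, d))
    return v, bool(v >= chi2.ppf(1 - alpha, df=p))
```

Unequal sample sizes n 1 and n 2 are handled automatically through the shapes of the two data matrices.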
6.2a. Testing H o : μ = μ o (given) When Σ is Known, Complex Gaussian Case
The derivation of the λ-criterion in the complex domain is parallel to that provided for the real case. In the parameter space,
and under H o : μ = μ o, a given vector,
Accordingly,
Here as well, small values of \(\tilde {\lambda }\) correspond to large values of \(\tilde {y}\equiv n(\bar {\tilde {X}}-\mu _o)^{*}\varSigma ^{-1}(\bar {\tilde {X}}-\mu _o),\) which has a real gamma distribution with the parameters (α = p, β = 1) or a chisquare distribution with p degrees of freedom in the complex domain as described earlier so that \(2\tilde {y}\) has a real chisquare distribution having 2p degrees of freedom. Thus, a real chisquare table can be utilized for testing the null hypothesis H o, the criterion being
The test criteria as well as the decisions are parallel to those obtained for the real case in the situations of paired values and in the case of independent populations. Accordingly, such test criteria and associated decisions will not be further discussed.
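For the complex case just described, a minimal numerical sketch in Python follows (an illustration of ours, not from the text; the function name is an assumption). The quadratic form \(\tilde {y}=n(\bar {\tilde {X}}-\mu _o)^{*}\varSigma ^{-1}(\bar {\tilde {X}}-\mu _o)\) is real, and \(2\tilde {y}\) is referred to a real chisquare distribution with 2p degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

def complex_mean_test(X, mu_o, Sigma, alpha=0.05):
    """Complex Gaussian case: y = n (xbar - mu_o)* Sigma^{-1} (xbar - mu_o)
    is real, and 2y ~ chi-square_{2p} under H_o."""
    n, p = X.shape
    d = X.mean(axis=0) - mu_o                   # complex p-vector
    y = float(np.real(n * np.conj(d) @ np.linalg.solve(Sigma, d)))
    return 2 * y, bool(2 * y >= chi2.ppf(1 - alpha, df=2 * p))
```

Note that `np.conj(d) @ ...` implements the conjugate transpose product, so the statistic comes out real up to rounding.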
Example 6.2a.1
Let p = 2 and the 2 × 1 complex vector \(\tilde {X}\sim \tilde {N}_2(\tilde {\mu },~\tilde {\varSigma }),~ \tilde {\varSigma }=\tilde {\varSigma }^{*}>O\), with \(\tilde {\varSigma }\) assumed to be known. Consider the null hypothesis \(H_o:\tilde {\mu }=\tilde {\mu }_o\) where \(\tilde {\mu }_o\) is specified. Let the known \(\tilde {\varSigma }\) and the specified \(\tilde {\mu }_o\) be the following where \(i=\sqrt {-1}\):
Let the general \(\tilde {\mu }\) and general \(\tilde {X}\) be represented as follows for p = 2:
so that, for the given \(\tilde {\varSigma }\),
The exponent of the general density for p = 2, excluding − 1, is of the form \((\tilde {X}-\tilde {\mu })^{*}\tilde {\varSigma }^{-1}(\tilde {X}-\tilde {\mu })\). Further,
since both \(\tilde {\varSigma }\) and \(\tilde {\varSigma }^{-1}\) are Hermitian. Thus, the exponent, which is 1 × 1, is real and negative definite. The explicit form, excluding − 1, for p = 2 and the given covariance matrix \(\tilde {\varSigma }\), is the following:
and the general density for p = 2 and this \(\tilde {\varSigma }\) is of the following form:
where Q is as previously given. Let the following be an observed sample of size n = 4 from a \(\tilde {N}_2(\tilde {\mu }_o,~\tilde {\varSigma })\) population whose associated covariance matrix \(\tilde {\varSigma }\) is as previously specified:
Then,
Let us test the stated null hypothesis at the significance level α = 0.05. Since \(\chi ^2_{2p,\,\alpha }=\chi ^2_{4,\,0.05}=9.49\) and 46.5 > 9.49, we reject H o. In this case, the p-value is \(Pr\{\chi ^2_{2p}\ge 46.5\}=Pr\{\chi ^2_4\ge 46.5\}\approx 0\).
6.2.3. Test involving a subvector of a mean value vector when Σ is known
Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j’s be independently distributed. Let the joint density of X j, j = 1, …, n, be denoted by L. Then, as was previously established,
where \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and, letting X = (X 1, …, X n) of dimension p × n and \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\), \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). Let \(\bar {X}\), Σ −1 and μ be partitioned as follows:
where \(\bar {X}^{(1)}\) and μ (1) are r × 1, r < p, and Σ 11 is r × r. Consider the hypothesis \(\mu ^{(1)}=\mu ^{(1)}_o\) (specified) with Σ known. Thus, this hypothesis concerns only a subvector of the mean value vector, the population covariance matrix being assumed known. In the entire parameter space Ω, μ is estimated by \(\bar {X}\) where \(\bar {X}\) is the maximum likelihood estimator (MLE) of μ. The maximum of the likelihood function in the entire parameter space is then
Let us now determine the MLE of μ (2), which is the only unknown quantity under the null hypothesis. To this end, we consider the following expansion:
Noting that there are only two terms involving μ (2) in (iii), we have
Then, substituting this MLE \(\hat {\mu }^{(2)}\) in the various terms in (iii), we have the following:
since, as established in Sect. 1.3, \(\varSigma _{11}^{-1}=\varSigma ^{11}-\varSigma ^{12}(\varSigma ^{22})^{-1}\varSigma ^{21}\). Thus, the maximum of L under the null hypothesis is given by
and the λ-criterion is then
Hence, we reject H o for small values of λ or for large values of \(n(\bar {X}^{(1)}-\mu ^{(1)}_o)'\varSigma _{11}^{-1}(\bar {X}^{(1)}-\mu ^{(1)}_o)\sim \chi ^2_{r}\) since, under H o, the expected value and covariance matrix of \(\bar {X}^{(1)}\) are respectively \(\mu ^{(1)}_o\) and Σ 11∕n. Accordingly, the criterion can be enunciated as follows:
\(\mbox{with}\ Pr\{\chi ^2_r\ge \chi ^2_{r,~\alpha }\}=\alpha \). In the complex Gaussian case, the corresponding \(2\tilde {u}\) will be distributed as a real chisquare random variable having 2r degrees of freedom; thus, the criterion will consist of rejecting the corresponding null hypothesis whenever the observed value of \(2\tilde {u}\ge \chi ^2_{2r,\,\alpha }\).
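The subvector test reduces to the full-vector test applied to the first r coordinates together with the corresponding r × r block Σ 11 of Σ. A short Python sketch (ours; the function name and argument layout are assumptions):

```python
import numpy as np
from scipy.stats import chi2

def subvector_mean_test(X, mu1_o, Sigma, r, alpha=0.05):
    """Test H_o: mu^(1) = mu1_o on the first r components, Sigma known:
    u = n (xbar^(1) - mu1_o)' Sigma_11^{-1} (xbar^(1) - mu1_o) ~ chi-square_r."""
    n = X.shape[0]
    d = X[:, :r].mean(axis=0) - mu1_o           # xbar^(1) - mu_o^(1)
    u = float(n * d @ np.linalg.solve(Sigma[:r, :r], d))
    return u, bool(u >= chi2.ppf(1 - alpha, df=r))
```

Only the leading r × r block of Σ enters the statistic, in agreement with the simplification of the λ-criterion derived above.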
Example 6.2.4
Let the 4 × 1 vector X have a real normal distribution N 4(μ, Σ), Σ > O. Consider the hypothesis that part of μ is specified. For example, let the hypothesis H o and Σ be the following:
Since we are specifying the first two parameters in μ, the hypothesis can be tested by making use of the statistic \(u=n(\bar {X}^{(1)}-\mu ^{(1)}_o)'\varSigma _{11}^{-1}(\bar {X}^{(1)}-\mu ^{(1)}_o),\) which is distributed as \(\chi ^2_2\) under H o. Observe that X (1) ∼ N 2(μ (1), Σ 11), Σ 11 > O where
Let the observed vectors from the original N 4(μ, Σ) population be
Then the observations corresponding to the subvector X (1), denoted by \(X_j^{(1)}\), are the following:
In this case, the sample size n = 5 and the sample mean, denoted by \(\bar {X}^{(1)}\), is
Therefore
If \(9.73>\chi ^2_{2,\,\alpha },\) then we would reject \(H_o^{(1)}\,: \mu ^{(1)}=\mu _o^{(1)}\). Let us test this hypothesis at the significance level α = 0.01. Since \(\chi ^2_{2,\,0.01}=9.21,\) we reject the null hypothesis. In this instance, the p-value, which can be determined from a chisquare table, is \(Pr\{\chi ^2_2\ge 9.73\}\approx 0.007\).
6.2.4. Testing μ 1 = ⋯ = μ p, with Σ known, real Gaussian case
Let X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j be independently distributed. Letting μ′ = (μ 1, …, μ p), consider the hypothesis
where ν, the common μ j is unknown. This implies that μ i − μ j = 0 for all i and j. Consider the p × 1 vector J of unities, J′ = (1, …, 1) and then take any non-null vector that is orthogonal to J. Let A be such a vector so that A′J = 0. Actually, p − 1 linearly independent such vectors are available. For example, if p is even, then take 1, −1, …, 1, −1 as the elements of A and, when p is odd, one can start with 1, −1, …, 1, −1 and take the last three elements as 1, −2, 1, or the last element as 0, that is,
When the last element of the vector A is zero, we are simply ignoring the last element in X j. Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, and the X j’s be independently distributed. Let the scalar y j = A′X j and the 1 × n vector Y = (y 1, …, y n) = (A′X 1, …, A′X n) = A′(X 1, …, X n) = A′ X, where the p × n matrix X = (X 1, …, X n). Let \(\bar {y}=\frac {1}{n}(y_1+\cdots +y_n)=A'\frac {1}{n}(X_1+\cdots +X_n)=A'\bar {X}\). Then, \(\sum _{j=1}^n(y_j-\bar {y})(y_j-\bar {y})' =A'\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'A\) where \(\sum _{j=1}^n(X_j-\bar {X})(X_j-\bar {X})'=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'=S=\) the sample sum of products matrix in the X j’s, where \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\) is the p × n matrix whose columns are all equal to \(\bar {X}\). Thus, one has \(\sum _{j=1}^n(y_j-\bar {y})^2=A'SA\). Consider the hypothesis μ 1 = ⋯ = μ p = ν. Then, A′μ = νA′J = ν ⋅ 0 = 0 under H o. Since X j ∼ N p(μ, Σ), Σ > O, we have y j ∼ N 1(A′μ, A′ΣA), A′ΣA > 0. Under H o, y j ∼ N 1(0, A′ΣA), j = 1, …, n, the y j’s being independently distributed. Consider the joint density of y 1, …, y n, denoted by L:
Since Σ is known, the only unknown quantity in L is μ. Differentiating \(\ln L\) with respect to μ and equating the result to a null vector, we have
However, since A is a fixed known vector and the equation holds for arbitrary \(\bar {X}\), \(\hat {\mu }=\bar {X}\). Hence the maximum of L in the entire parameter space Ω, which here involves only μ since Σ is known, is the following:
Now, noting that under H o, A′μ = 0, we have
From (i) to (iii), the λ-criterion is as follows, observing that \(A'(\sum _{j=1}^nX_jX_j^{\prime })A=\sum _{j=1}^nA'(X_j-\bar {X})(X_j-\bar {X})'A+nA'\bar {X}\bar {X}'A=A'SA+nA'\bar {X} \bar {X}'A\):
But since \(\sqrt {\frac {n}{A'\varSigma A}}A'\bar {X}\sim N_1(0,~1)\) under H o, we may test this null hypothesis either by using the standard normal variable or a chisquare variable as \(\frac {n}{A'\varSigma A}A'\bar {X}\bar {X}'A\sim \chi ^2_1\) under H o. Accordingly, the criterion consists of rejecting H o
or
Example 6.2.5
Consider a 4-variate real Gaussian vector X ∼ N 4(μ, Σ), Σ > O with Σ as specified in Example 6.2.4 and the null hypothesis that the individual components of the mean value vector μ are all equal, that is,
Let L be a 4 × 1 constant vector such that L′ = (1, −1, 1, −1). Then, under H o, L′μ = 0 and u = L′X is univariate normal; more specifically, u ∼ N 1(0, L′ΣL) where
Let the observation vectors be the same as those used in Example 6.2.4 and let u j = L′X j, j = 1, …, 5. Then, the five independent observations from u ∼ N 1(0, 7) are the following:
the average \(\bar {u}=\frac {1}{5}(u_1+\cdots +u_5)=\frac {1}{5}(-1-3-3-3-1)\) being equal to \(-\frac {11}{5}.\) Then, the standardized sample mean \(z=\frac {\sqrt {n}}{{\sigma _u}}(\bar {u}-0)\sim N_1(0,~1)\). Let us test the null hypothesis at the significance level α = 0.05. Referring to a N 1(0, 1) table, the required critical value, denoted by \(z_{\frac {\alpha }{2}}=z_{0.025}\), is 1.96. Therefore, we reject H o in favor of the alternative hypothesis that at least two components of μ are unequal at significance level α if the observed value of
Since the observed value of |z|, namely \(|\frac {\sqrt {5}}{\sqrt {7}}(-\frac {7}{5}-0)|=\sqrt {1.4}=1.18,\) is less than 1.96, we do not reject H o at the 5% significance level. Letting z ∼ N 1(0, 1), the p-value in this case is Pr{|z|≥ 1.18} = 0.238, this probability being available from a standard normal table.
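The contrast-based procedure of this subsection can be sketched in Python as follows (our own illustration; the function name is an assumption). For a contrast vector A with A′J = 0, the standardized statistic \(z=\sqrt {\frac {n}{A'\varSigma A}}\,A'\bar {X}\) is N 1(0, 1) under H o:

```python
import numpy as np
from scipy.stats import norm

def contrast_test(X, A, Sigma, alpha=0.05):
    """Test H_o: mu_1 = ... = mu_p through a single contrast A with A'J = 0:
    under H_o, z = sqrt(n/(A' Sigma A)) * A' xbar ~ N(0, 1)."""
    n = X.shape[0]
    z = float(np.sqrt(n / (A @ Sigma @ A)) * (A @ X.mean(axis=0)))
    # equivalently, z**2 ~ chi-square_1 under H_o
    return z, bool(abs(z) >= norm.ppf(1 - alpha / 2))
```

One may equally compare z² with \(\chi ^2_{1,\,\alpha }\); the two forms of the criterion are equivalent.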
In the complex case, proceeding in a parallel manner to the real case, the lambda criterion will be the following:
where an asterisk indicates the conjugate transpose. Letting \(\tilde {u}=\frac {2n}{A^{*}\varSigma A}(A^{*}\bar {\tilde {X}}\bar {\tilde {X}}^{*}A),\) it can be shown that under H o, \(\tilde {u}\) is distributed as a real chisquare random variable having 2 degrees of freedom. Accordingly, the criterion will be as follows:
Example 6.2a.2
When p > 2, the computations become quite involved in the complex case. Thus, we will let p = 2 and consider the bivariate complex \(\tilde {N}_2(\tilde {\mu },\tilde {\varSigma })\) distribution that was specified in Example 6.2a.1, assuming that \(\tilde {\varSigma }\) is as given therein, the same set of observations being utilized as well. In this case, the null hypothesis is \(H_o: \tilde {\mu }_1=\tilde {\mu }_2\), the parameters and sample average being
Letting L′ = (1, −1), \(L'\tilde {\mu }=0\) under H o, and
The criterion consists of rejecting H o if the observed value of \(v\ge \chi ^2_{2,\,\alpha }\). Letting the significance level of the test be α = 0.05, the critical value is \(\chi ^2_{2,\,0.05}=5.99\), which is readily available from a chisquare table. The observed value of v being \(\frac {5}{6}<5.99\), we do not reject H o. In this case, the p-value is \(Pr\{\chi ^2_2\ge \frac {5}{6}\}={\mathrm{e}}^{-\frac {5}{12}}\approx 0.66\).
6.2.5. Likelihood ratio criterion for testing H o : μ 1 = ⋯ = μ p , Σ known
Consider again X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, with the X j’s being independently distributed and Σ assumed known. Letting the joint density of X 1, …, X n be denoted by L, we have, as determined earlier,
where n is the sample size and S is the sample sum of products matrix. In the entire parameter space
the MLE of μ is \(\bar {X}= \) the sample average. Then
Consider the following hypothesis on μ′ = (μ 1, …, μ p):
Then, the MLE of μ under H o is \(\hat {\mu }=J\hat {\nu }=J\frac {1}{p}J'\bar {X},~ J'=(1,\ldots ,1)\). This \(\hat {\nu }\) is in fact the sum of all observations on all components of X j, j = 1, …, n, divided by np, which is identical to the sum of all the coordinates of \(\bar {X}\) divided by p, so that \(\hat {\mu }=\frac {1}{p}JJ'\bar {X}\). In order to evaluate the maximum of L under H o, it suffices to substitute \(\hat {\mu }\) for μ in (i). Accordingly, the λ-criterion is
Thus, we reject H o for small values of λ or for large values of \(w\equiv n(\bar {X}-\hat {\mu })'\varSigma ^{-1}(\bar {X}-\hat {\mu })\). Let us determine the distribution of w. First, note that
and let
since \(J'(I-\frac {1}{p}JJ')=O\), μ = νJ being the true mean value of the N p(μ, Σ) distribution. Observe that \(\sqrt {n}(\bar {X}-\mu )\sim N_p(O,~\varSigma ),~ \varSigma >O\), and that \(\frac {1}{p}JJ'\) is idempotent. Since \(I-\frac {1}{p}JJ'\) is also idempotent and its rank is p − 1, there exists an orthonormal matrix P, PP′ = I, P′P = I, such that
Letting \(U=P\sqrt {n}(\bar {X}-\hat {\mu }),\) with U′ = (u 1, …, u p−1, u p), U ∼ N p(O, PΣP′). Now, on noting that
we have
B being the covariance matrix associated with U 1, so that U 1 ∼ N p−1(O, B), B > O. Thus, \(U_1^{\prime }B^{-1}U_1\sim \chi ^2_{p-1}\), a real scalar chisquare random variable having p − 1 degrees of freedom. Hence, upon evaluating
one would reject H o : μ 1 = ⋯ = μ p = ν, ν unknown, whenever the observed value of
Observe that the degrees of freedom of this chisquare variable, that is, p − 1, coincides with the number of parameters being restricted by H o.
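The likelihood ratio computation just derived can be sketched numerically (a Python illustration of ours; the function name is an assumption). Under H o, with \(\hat {\mu }=\frac {1}{p}JJ'\bar {X}\), the statistic \(w=n(\bar {X}-\hat {\mu })'\varSigma ^{-1}(\bar {X}-\hat {\mu })\) is referred to a chisquare distribution with p − 1 degrees of freedom:

```python
import numpy as np
from scipy.stats import chi2

def equal_means_test(X, Sigma, alpha=0.05):
    """Likelihood ratio test of H_o: mu_1 = ... = mu_p = nu (nu unknown,
    Sigma known): w = n (xbar - muhat)' Sigma^{-1} (xbar - muhat)
    ~ chi-square_{p-1} under H_o, where muhat = (1/p) J J' xbar."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    d = xbar - np.full(p, xbar.mean())          # xbar - (1/p) J J' xbar
    w = float(n * d @ np.linalg.solve(Sigma, d))
    return w, bool(w >= chi2.ppf(1 - alpha, df=p - 1))
```

The replicated coordinate mean `xbar.mean()` plays the role of \(\hat {\nu }\), and the degrees of freedom p − 1 match the number of restrictions imposed by H o.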
Example 6.2.6
Consider the trivariate real Gaussian population X ∼ N 3(μ, Σ), Σ > O, as already specified in Example 6.2.1 with the same Σ and the same observed sample vectors for testing H o : μ′ = (ν, ν, ν), namely,
The following test statistic has to be evaluated for p = 3:
We have to evaluate the following quantities in order to determine the value of w:
Thus,
We reject H o whenever \(w\ge \chi ^2_{p-1,\alpha }\). Letting the significance level be α = 0.05, the tabulated critical value is \(\chi ^2_{p-1,\,\alpha }=\chi ^2_{2,\,0.05}=5.99\), and since 0.38 < 5.99, we do not reject the null hypothesis. In this instance, the p-value is \(Pr\{\chi ^2_2\ge 0.38\}={\mathrm{e}}^{-0.19}\approx 0.83\).
6.3. Testing H o : μ = μ o (given) When Σ is Unknown, Real Gaussian Case
In this case, both μ and Σ are unknown in the entire parameter space Ω; however, in the subspace ω, μ = μ o is known while Σ remains unknown. The maximum of L under Ω is the same as that obtained in Sect. 6.1, that is,
When μ = μ o, Σ is estimated by \(\hat {\varSigma }=\frac {1}{n}\sum _{j=1}^n(X_j-\mu _o)(X_j-\mu _o)'\). As shown in Sect. 3.5, \(\hat \varSigma \) can be reexpressed as follows:
Then, under the null hypothesis, we have
Thus,
On applying results on the determinants of partitioned matrices which were obtained in Sect. 1.3, we have the following equivalent representations of the denominator:
that is,
which yields the following simplified representation of the likelihood ratio statistic:
Small values of λ correspond to large values of \(u\equiv n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), which is connected to Hotelling’s \(T_n^2\) statistic. Hence the criterion is the following: “Reject H o for large values of u”. The distribution of u can be derived by making use of the independence of the sample mean and sample sum of products matrix and the densities of these quantities. An outline of the derivation is provided in the next subsection.
6.3.1. The distribution of the test statistic
Let us examine the distribution of \(u=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\). We have already established in Theorem 3.5.3 that S and \(\bar {X}\) are independently distributed in the case of a real p-variate nonsingular Gaussian N p(μ, Σ) population. It was also determined in Corollary 3.5.2 that the distribution of the sample average \(\bar {X}\) is p-variate real Gaussian with the parameters μ and \(\frac {1}{n}\varSigma ,~ \varSigma >O\), and in the discussion that follows, it is shown that the distribution of S is matrix-variate Wishart with m = n − 1 degrees of freedom, n being the sample size, and parameter matrix Σ > O. Hence the joint density of S and \(\bar {X}\), denoted by \(f(S,\bar {X})\), is the product of the marginal densities. Letting Σ = I, this joint density is given by
Note that it is sufficient to consider the case Σ = I. Due to the presence of S −1 in \(u=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\), the effect of any scaling matrix on X j will disappear. If X j goes to \(A^{\frac {1}{2}}X_j\) for any constant positive definite matrix A, then S −1 will go to \(A^{-\frac {1}{2}}S^{-1}A^{-\frac {1}{2}}\) and thus u will be free of A. Letting \(Y=S^{-\frac {1}{2}}(\bar {X}-\mu )\) for fixed S, Y ∼ N p(O, S −1∕n), so that the conditional density of Y , given S, is
Thus, the joint density of S and Y , denoted by f 1(S, Y ), is
On integrating out S from (ii) by making use of a matrix-variate gamma integral, we obtain the following marginal density of Y , denoted by f 2(Y ):
However, |I + nY Y ′| = 1 + nY ′Y, which can be established by considering two representations of the determinant
similarly to what was done in Sect. 6.3 to obtain the likelihood ratio statistic given in (6.3.3). As well, it can easily be shown that
by expanding the matrix-variate gamma functions. Now, letting s = Y ′Y , it follows from Theorem 4.2.3 that \({\mathrm{d}}Y=\frac {\pi ^{\frac {p}{2}}}{\varGamma (\frac {p}{2})}s^{\frac {p}{2}-1}{\mathrm{d}}s\). Thus, the density of s, denoted by f 3(s), is
for n = p + 1, p + 2, …, 0 ≤ s < ∞, and zero elsewhere. It can then readily be seen from (6.3.5) that \(ns=nY'Y=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )=u\) is distributed as a real scalar type-2 beta random variable whose parameters are \((\frac {p}{2},~ \frac {n}{2}-\frac {p}{2}),~ n=p+1,\ldots \ \). Thus, the following result:
Theorem 6.3.1
Consider a real p-variate normal population N p(μ, Σ), Σ > O, and a simple random sample of size n from this normal population, X j ∼ N p(μ, Σ), j = 1, …, n, the X j ’s being independently distributed. Let the p × n matrix X = (X 1, …, X n) be the sample matrix and the p-vector \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) denote the sample average. Let \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\) be a p × n matrix whose columns are all equal to \(\bar {X},\) and \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\) be the sample sum of products matrix. Then, \(u=n(\bar {X}-\mu )'S^{-1}(\bar {X}-\mu )\) has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2},~ \frac {n}{2}-\frac {p}{2})\) , so that \(u\sim \frac {p}{n-p}F_{p,\,n-p}\) where F p,n−p denotes a real F random variable whose degrees of freedom are p and n − p.
Hence, in order to test the hypothesis H o : μ = μ o, the likelihood ratio statistic gives the test criterion: Reject H o for large values of \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), which is equivalent to rejecting H o for large values of an F-random variable having p and n − p degrees of freedom where \(F_{p,\,n-p}=\frac {n-p}{p}\,u=\frac {n-p}{p}\,n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), that is,
at a given significance level α where \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\sim \frac {p}{n-p}F_{p,\,n-p}\,\), n being the sample size.
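As a numerical sketch of this criterion in Python (the 3 × 5 sample matrix below is hypothetical, not the data of Example 6.3.1), one may compute u and the equivalent F-value as follows:

```python
import numpy as np
from scipy.stats import f

# Hypothetical 3-variate sample of size n = 5 (illustrative data only)
X = np.array([[2., 1., 0., 1., 2.],
              [1., 2., 1., 0., 1.],
              [0., 1., 2., 1., 1.]])          # p x n sample matrix
p, n = X.shape
mu_o = np.array([1., 1., 1.])                 # hypothesized mean under H_o

Xbar = X.mean(axis=1)                         # sample average
D = X - Xbar[:, None]                         # deviation matrix X - Xbar_bold
S = D @ D.T                                   # sample sum of products matrix

u = n * (Xbar - mu_o) @ np.linalg.solve(S, Xbar - mu_o)
F_obs = (n - p) / p * u                       # since u ~ (p/(n-p)) F_{p, n-p} under H_o
F_crit = f.ppf(0.95, p, n - p)
reject = F_obs >= F_crit
```

Rejecting H o for large values of u is thus carried out on the F scale, using tabulated or computed F p,n−p percentage points.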
Example 6.3.1
Consider a trivariate real Gaussian vector X ∼ N 3(μ, Σ), Σ > O, where Σ is unknown. We would like to test the following hypothesis on μ: H o : μ = μ o, with \( \mu _o^{\prime }=(1,1,1)\). Consider the following simple random sample of size n = 5 from this N 3(μ, Σ) population:
Let X = [X 1, …, X 5] be the 3 × 5 sample matrix and \({\bar {\mathbf {X}}}=[\bar {X},\bar {X},\ldots ,\bar {X}]\) be the 3 × 5 matrix of sample means. Then,
Let \(S=\frac {1}{5^2}A\). In order to evaluate the test statistic, we need S −1 = 25A −1. To obtain the exact inverse without any approximation, we will use the transpose of the cofactor matrix divided by the determinant. The determinant of A, |A|, as obtained by expanding along the first row in terms of the corresponding cofactors, is equal to 531250. The matrix of cofactors, denoted by Cof(A), which is symmetric in this case, is the following:
The null hypothesis is H o : μ = μ o = (1, 1, 1)′, so that
the observed value of the test statistic being
The test statistic w under the null hypothesis is F-distributed, that is, w ∼ F p,n−p. Let us test H o at the significance level α = 0.05. Since the critical value as obtained from an F-table is F p,n−p,α = F 3,2,0.05 = 19.2 and 2.35 < 19.2, we do not reject H o.
Note 6.3.1
If S is replaced by \(\frac {1}{n-1}S\), an unbiased estimator for Σ, then the test statistic becomes \(\frac {1}{n-1}n(\bar {X}-\mu _o)'[\frac {1}{n-1}S]^{-1}(\bar {X}-\mu _o)=\frac {T_n^2}{n-1}\) where \(T_n^2\) denotes Hotelling’s T 2 statistic, which for p = 1 corresponds to the square of a Student-t statistic having n − 1 degrees of freedom.
Since u as defined in Theorem 6.3.1 is distributed as a type-2 beta random variable with the parameters \((\frac {p}{2},~ \frac {n-p}{2}),\) we have the following results: \(\frac {1}{u}\) is type-2 beta distributed with the parameters \((\frac {n-p}{2},~ \frac {p}{2})\), \(\frac {u}{1+u}\) is type-1 beta distributed with the parameters \((\frac {p}{2},~ \frac {n-p}{2})\), and \(\frac {1}{1+u}\) is type-1 beta distributed with the parameters \((\frac {n-p}{2},~ \frac {p}{2})\), n being the sample size.
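These distributional relationships can be checked numerically; the following Python sketch (the parameter values p = 3, n = 10 and the point x = 0.7 are arbitrary) verifies that the corresponding distribution functions agree:

```python
import numpy as np
from scipy.stats import beta, betaprime, f

p, n = 3, 10
a, b = p / 2, (n - p) / 2      # type-2 beta parameters of u in Theorem 6.3.1
x = 0.7

# u ~ type-2 beta(a, b) is equivalent to ((n-p)/p) u ~ F_{p, n-p}
cdf_u = betaprime.cdf(x, a, b)          # type-2 beta is scipy's "betaprime"
cdf_F = f.cdf((n - p) / p * x, p, n - p)

# u/(1+u) ~ type-1 beta(a, b), so P(u <= x) = P(u/(1+u) <= x/(1+x))
cdf_t1 = beta.cdf(x / (1 + x), a, b)
```

The three computed probabilities coincide, illustrating the equivalence of the type-2 beta, type-1 beta and F formulations of the test.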
6.3.2. Paired values or linear functions when Σ is unknown
Let Y 1, …, Y k be p × 1 vectors having their own distributions, which are unknown. However, suppose that it is known that a certain linear function X = a 1 Y 1 + ⋯ + a k Y k has a p-variate real Gaussian N p(μ, Σ) distribution with Σ > O. We would like to test hypotheses of the type \(E[X]=a_1\mu _{(1)}^{(o)}+\cdots +a_k\mu _{(k)}^{(o)}\) where the \(\mu _{(j)}^{(o)}\)’s, j = 1, …, k, are specified. Since we do not know the distributions of Y 1, …, Y k, let us convert the iid variables on the Y j’s, j = 1, …, k, into iid variables X 1, …, X n, X j ∼ N p(μ, Σ), Σ > O, where Σ is unknown. First, the observations on Y 1, …, Y k are transformed into observations on the X j’s. The problem then involves a single normal population whose covariance matrix is unknown. An example of this type is Y 1 representing a p × 1 vector before a certain process, such as administering a drug to a patient; in this instance, Y 1 could consist of measurements on p characteristics observed in a patient. Observations on Y 2 will then be the measurements on the same p characteristics after the process, such as after administering the drug to the patient. Then Y 2q − Y 1q = X q will represent the variable corresponding to the difference in the measurements on the q-th characteristic. Let the hypothesis be H o : μ = μ o (given), Σ being unknown. Note that once the observations on the X j’s are available, the individual μ (j)’s are no longer relevant. Once the X j’s are determined, one can compute the sum of products matrix S in the X j’s. In this case, the test statistic is \(u=n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o),\) which is distributed as a type-2 beta with parameters \((\frac {p}{2},\, \frac {n-p}{2})\). Then, \(u\sim \frac {p}{n-p}F\) where F is an F random variable having p and n − p degrees of freedom, that is, an F p,n−p random variable, n being the sample size.
Thus, the test criterion is applied as follows: Determine the observed value of u and the corresponding observed value of F p,n−p that is, \( \frac {n-p}{p} \, u\), and then
Example 6.3.2
Five motivated individuals were randomly selected and subjected to an exercise regimen for a month. The exercise program promoters claim that the subjects can expect a weight loss of 5 kg as well as a 2-in. reduction in lower stomach girth by the end of the month. Let Y 1 and Y 2 denote the two-component vectors of weight and girth measurements before starting the routine and at the end of the exercise program, respectively. The following are the observations on the five individuals:
Obviously, Y 1 and Y 2 are dependent variables having a joint distribution. We will assume that the difference X = Y 1 − Y 2 has a real Gaussian distribution, that is, X ∼ N 2(μ, Σ), Σ > O. Under this assumption, the observations on X are
Let X = [X 1, X 2, …, X 5] and \(\mathbf {X}-\bar {\mathbf {X}}=[X_1-\bar {X},\ldots ,X_5-\bar {X}]\), both being 2 × 5 matrices. The observed sample average \(\bar {X}\), the claim of the exercise routine promoters μ = μ o as well as other relevant quantities are as follows:
The test statistic being \(w=\frac {(n-p)}{p}n(\bar {X}-\mu _o)'S^{-1}(\bar {X}-\mu _o)\), its observed value is
Letting the significance level of the test be α = 0.05, the critical value is F p,n−p,α = F 2,3,0.05 = 9.55. Since 22.84 > 9.55, H o is thus rejected.
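A sketch of this paired-comparison test in Python; the before/after measurements and the claimed effect μ o′ = (5, 2) below are hypothetical, not the data of Example 6.3.2:

```python
import numpy as np
from scipy.stats import f

# Hypothetical paired (before, after) weight/girth measurements for 5 subjects
before = np.array([[80., 75., 90., 70., 85.],     # weight (kg)
                   [40., 38., 44., 36., 42.]])    # girth (in.)
after_ = np.array([[74., 71., 84., 66., 79.],
                   [38., 37., 41., 35., 41.]])
X = before - after_                 # differences X_j, a p x n matrix
p, n = X.shape
mu_o = np.array([5., 2.])           # claimed mean reduction under H_o

Xbar = X.mean(axis=1)
D = X - Xbar[:, None]
S = D @ D.T                         # sum of products matrix of the differences
u = n * (Xbar - mu_o) @ np.linalg.solve(S, Xbar - mu_o)
F_obs = (n - p) / p * u             # observed F_{p, n-p} value
reject = F_obs >= f.ppf(0.95, p, n - p)
```

The test thus reduces entirely to the one-sample procedure applied to the differences.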
6.3.3. Independent Gaussian populations
Consider k independent p-variate real Gaussian populations whose individual distribution is N p(μ (j), Σ j), Σ j > O, j = 1, …, k. Given simple random samples of sizes n 1, …, n k from these k populations, we may wish to test a hypothesis on a given linear function of the mean values, that is, H o : a 1 μ (1) + ⋯ + a k μ (k) = μ o where a 1, …, a k are known constants and μ o is a given quantity under the null hypothesis. We have already discussed this problem for the case of known covariance matrices. When the Σ j’s are all unknown or some of them are known while others are not, the MLE’s of the unknown covariance matrices turn out to be the respective sample sums of products matrices divided by the corresponding sample sizes. This will result in a linear function of independent Wishart matrices whose distribution proves challenging to determine, even for the null case.
Special case of two independent Gaussian populations
Consider the special case of two independent real Gaussian populations having identical covariance matrices, that is, let the populations be Y 1q ∼ N p(μ (1), Σ), Σ > O, the Y 1q’s, q = 1, …, n 1, being iid, and Y 2q ∼ N p(μ (2), Σ), Σ > O, the Y 2q’s, q = 1, …, n 2, being iid. Let the sample p × n 1 and p × n 2 matrices be denoted as \({\mathbf {Y}}_1=(Y_{11},\ldots ,Y_{1n_1})\) and \({\mathbf {Y}}_2=(Y_{21},\ldots ,Y_{2n_2})\) and let the sample averages be \(\bar {Y}_j=\frac {1}{n_j}(Y_{j1}+\cdots +Y_{jn_j}),\ j=1,2\). Let \(\bar {\mathbf {Y}}_j=(\bar {Y}_j,\ldots ,\bar {Y}_j)\), a p × n j matrix whose columns are equal to \(\bar {Y}_j\), j = 1, 2, and let
be the corresponding sample sum of products matrices. Then, S 1 and S 2 are independently distributed as Wishart matrices having n 1 − 1 and n 2 − 1 degrees of freedom, respectively. As the sum of two independent p × p real or complex matrices having matrix-variate gamma distributions with the same scale parameter matrix is again gamma distributed with the shape parameters summed up and the same scale parameter matrix, we observe that since the two populations are independently distributed, S 1 + S 2 ≡ S has a Wishart distribution having n 1 + n 2 − 2 degrees of freedom. We now consider a hypothesis of the type μ (1) = μ (2). In order to do away with the unknown common mean value, we may consider the real p-vector \(U= \bar {Y}_1-\bar {Y}_2,\) so that, under this hypothesis, E(U) = O and \({\mathrm{Cov}}(U)=\frac {1}{n_1}\varSigma +\frac {1}{n_2}\varSigma =(\frac {1}{n_1}+\frac {1}{n_2})\varSigma =\frac {n_1+n_2}{n_1n_2}\varSigma \). The MLE of this pooled covariance matrix is \(\frac {1}{n_1+n_2}S\) where S is Wishart distributed with n 1 + n 2 − 2 degrees of freedom. Then, following through the steps included in Sect. 6.3.1 with the parameter m now being n 1 + n 2 − 2, the power of S will become \(\frac {(n_1+n_2-2+1)}{2}-\frac {p+1}{2}\) when integrating out S. Letting the null hypothesis be H o : E[Y 1] − E[Y 2] = δ (specified), such as δ = 0, the function resulting from integrating out S is
where c is the normalizing constant, so that \(w=\frac {n_1n_2}{n_1+n_2}(\bar {Y}_1-\bar {Y}_2-\delta )'S^{-1}(\bar {Y}_1-\bar {Y}_2-\delta )\) is distributed as a type-2 beta with the parameters \((\frac {p}{2},~ \frac {(n_1+n_2-1-p)}{2})\). Writing \(w=\frac {p}{n_1+n_2-1-p}F_{p,n_1+n_2-1-p}\), this F is seen to be an F statistic having p and n 1 + n 2 − 1 − p degrees of freedom. We will state these results as theorems.
Theorem 6.3.2
Let the p × p real positive definite matrices X 1 and X 2 be independently distributed as real matrix-variate gamma random variables with densities
j=1,2, and zero elsewhere. Then, as can be seen from ( 5.2.6 ), the Laplace transform associated with X j or that of f j , denoted as \(L_{X_j}({{ }_{*}T})\) , is
Accordingly, U 1 = X 1 + X 2 has a real matrix-variate gamma density with the parameters (α 1 + α 2, B) whose associated Laplace transform is
and U 2 = a 1 X 1 + a 2 X 2 has the Laplace transform
whenever I + a j B −1 ∗ T > O, j = 1, 2, where a 1 and a 2 are real scalar constants.
It follows from (ii) that X 1 + X 2 is also real matrix-variate gamma distributed. When a 1≠a 2, it is very difficult to invert (iii) in order to obtain the corresponding density. This can be achieved by expanding one of the determinants in (iii) in terms of zonal polynomials, say the second one, after having first taken \(|I+a_1B^{-1}{{ }_{*}T}|{ }^{-(\alpha _1+\alpha _2)}\) out as a factor in this instance.
Theorem 6.3.3
Let Y j ∼ N p(μ (j), Σ), Σ > O, j = 1, 2, be independent p-variate real Gaussian distributions sharing the same covariance matrix. Given a simple random sample of size n 1 from Y 1 and a simple random sample of size n 2 from Y 2 , let the sample averages be denoted by \(\bar {Y}_1\) and \(\bar {Y}_2\) and the sample sums of products matrices, by S 1 and S 2 , respectively. Consider the hypothesis H o : μ (1) − μ (2) = δ (given). Letting S = S 1 + S 2 and
where \(F_{p,\, n_1+n_2-1-p}\) denotes an F distribution with p and n 1 + n 2 − 1 − p degrees of freedom, or equivalently, w is distributed as a type-2 beta variable with the parameters \((\frac {p}{2}, \frac {(n_1+n_2-1-p)}{2})\) . We reject the null hypothesis H o if \(\frac {n_1+n_2-1-p}{p}w\ge F_{p,\, n_1+n_2-1-p,\ \alpha }\) with
at a given significance level α.
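The two-sample criterion of Theorem 6.3.3 may be sketched as follows in Python, with small hypothetical samples; the multiplier n 1 n 2∕(n 1 + n 2) is the reciprocal of the variance factor of \(\bar {Y}_1-\bar {Y}_2\):

```python
import numpy as np
from scipy.stats import f

# Hypothetical bivariate samples of sizes n1 = 4 and n2 = 5 (illustrative data)
Y1 = np.array([[1., 2., 1., 0.],
               [1., 0., -1., 0.]])
Y2 = np.array([[0., 1., 2., 1., 1.],
               [1., 0., 1., 2., 1.]])
p, n1, n2 = Y1.shape[0], Y1.shape[1], Y2.shape[1]
delta = np.zeros(p)                        # H_o: mu_(1) - mu_(2) = 0

Y1bar, Y2bar = Y1.mean(axis=1), Y2.mean(axis=1)
S1 = (Y1 - Y1bar[:, None]) @ (Y1 - Y1bar[:, None]).T
S2 = (Y2 - Y2bar[:, None]) @ (Y2 - Y2bar[:, None]).T
S = S1 + S2                                # Wishart with n1 + n2 - 2 df

d = Y1bar - Y2bar - delta
w = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
F_obs = (n1 + n2 - 1 - p) / p * w          # observed F_{p, n1+n2-1-p} value
reject = F_obs >= f.ppf(0.95, p, n1 + n2 - 1 - p)
```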
Theorem 6.3.4
Let w be as defined in Theorem 6.3.3 . Then \(w_1=\frac {1}{w}\) is a real scalar type-2 beta variable with the parameters \((\frac {n_1+n_2-1-p}{2},\, \frac {p}{2})\); \(w_2=\frac {w}{1+w}\) is a real scalar type-1 beta variable with the parameters \((\frac {p}{2},\, \frac {(n_1+n_2-1-p)}{2})\); \(w_3=\frac {1}{1+w}\) is a real scalar type-1 beta variable with the parameters \((\frac {n_1+n_2-1-p}{2},\, \frac {p}{2})\).
These last results follow from the connections between real scalar type-1 and type-2 beta random variables. Results parallel to those appearing in (i) to (vi) and stated in Theorems 6.3.1–6.3.4 can similarly be obtained for the complex case.
Example 6.3.3
Consider two independent populations whose respective distributions are N 2(μ (1), Σ 1) and N 2(μ (2), Σ 2), Σ j > O, j = 1, 2, and samples of sizes n 1 = 4 and n 2 = 5 from these two populations, respectively. Let the population covariance matrices be identical with Σ 1 = Σ 2 = Σ, the common covariance matrix being unknown, and let the observed sample vectors from the first population, X j ∼ N 2(μ (1), Σ), be
Denoting the sample mean from the first population by \(\bar {X}\) and the sample sum of products matrix by S 1, we have
the observations on these quantities being the following:
Let the sample vectors from the second population denoted as Y 1, …, Y 5 be
Then, the sample average and the deviation vectors are the following:
Letting the null hypothesis be
Thus, the test statistic is \(u \sim F_{p,n_1+n_2-1-p}\) where
Let us test H o at the 5% significance level. Since the required critical value is \(F_{p,\,n_1+n_2-1-p,~\alpha }=F_{2,\,6,~0.05}=5.14\) and 0.91 < 5.14, the null hypothesis is not rejected.
6.3.4. Testing μ 1 = ⋯ = μ p when Σ is unknown in the real Gaussian case
Let the p × 1 vector X j ∼ N p(μ, Σ), Σ > O, j = 1, …, n, the X j’s being independently distributed. Let the p × 1 vector of unities be denoted by J or J′ = (1, …, 1), and let A be a vector that is orthogonal to J so that A′J = 0. For example, we can take
If the last component of A is zero, we are then ignoring the last component of X j. Let y j = A′X j, j = 1, …, n; since the X j’s are independently distributed, so are the y j’s. Then y j ∼ N 1(A′μ, A′ΣA), A′ΣA > O, is a univariate normal variable with mean value A′μ and variance A′ΣA. Consider the p × n sample matrix comprising the X j’s, that is, X = (X 1, …, X n). Let the sample average of the X j’s be \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\). Then, the sample sum of products matrix is \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). Consider the 1 × n vector \(Y=(y_1,\ldots ,y_n)=(A'X_1,\ldots ,A'X_n)=A'\mathbf {X},~ \bar {y}=\frac {1}{n}(y_1+\cdots +y_n)=A'\bar {X}\), \(\sum _{j=1}^n(y_j-\bar {y})^2=A'(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'A=A'SA\). Let the null hypothesis be H o : μ 1 = ⋯ = μ p = ν, where ν is unknown, μ′ = (μ 1, …, μ p). Thus, under H o, A′μ = νA′J = 0. The joint density of y 1, …, y n, denoted by L, is then
where
Let us determine the MLE’s of μ and Σ. We have
On differentiating \(\ln L\) with respect to μ 1 and equating the result to zero, we have
since the equation holds for each μ j, j = 1, …, p, and A′ = (a 1, …, a p), a j≠0, j = 1, …, p, A being fixed. As well, \((\bar {X}-\mu )(\bar {X}-\mu )'=\bar {X}\bar {X}'-\bar {X}\mu '-\mu \bar {X}'+\mu \mu '\). Now, consider differentiating \(\ln L\) with respect to an element of Σ, say, σ 11, at \(\hat {\mu }=\bar {X}\):
for each element in Σ and hence \(\hat {\varSigma }=\frac {1}{n}S\). Thus,
Under H o, A′μ = 0 and consequently the maximum under H o is the following:
Accordingly, the λ-criterion is
We would reject H o for small values of λ or for large values of \(u\equiv nA'\bar {X}\bar {X}'A/A'SA\) where \(\bar {X}\) and S are independently distributed. Observing that S ∼ W p(n − 1, Σ), Σ > O and \(\bar {X}\sim N_p(\mu ,~\frac {1}{n}\varSigma ),~ \varSigma >O\), we have
Hence, (n − 1)u is an F statistic with 1 and n − 1 degrees of freedom, and the null hypothesis is to be rejected whenever
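A numerical sketch in Python; the bivariate data below are hypothetical, chosen so that the observed A′X j values are (1, −2, −1, 0) as in Example 6.3.4, and the statistic depends only on these differences:

```python
import numpy as np
from scipy.stats import f

# Illustrative bivariate data with A'X_j = (1, -2, -1, 0)
X = np.array([[1., 0., 1., 2.],
              [0., 2., 2., 2.]])
p, n = X.shape
A = np.array([1., -1.])                 # A'J = 0, so A'mu = 0 under H_o: mu_1 = mu_2

Xbar = X.mean(axis=1)
D = X - Xbar[:, None]
S = D @ D.T
u = n * (A @ Xbar) ** 2 / (A @ S @ A)   # u = n A'Xbar Xbar'A / A'SA
v = (n - 1) * u                         # v ~ F_{1, n-1} under H_o
reject = v >= f.ppf(0.95, 1, n - 1)
```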
Example 6.3.4
Consider a real bivariate Gaussian N 2(μ, Σ) population where Σ > O is unknown. We would like to test the hypothesis H o : μ 1 = μ 2, μ′ = (μ 1, μ 2), so that μ 1 − μ 2 = 0 under this null hypothesis. Let the sample be X 1, X 2, X 3, X 4, as specified in Example 6.3.3. Let A′ = (1, −1) so that A′μ = O under H o. With the same observation vectors as those comprising the first sample in Example 6.3.3, A′X 1 = (1), A′X 2 = (−2), A′X 3 = (−1), A′X 4 = (0). Letting y = A′X j, the observations on y j are (1, −2, −1, 0) or A′ X = A′[X 1, X 2, X 3, X 4] = [1, −2, −1, 0]. The sample sum of products matrix as evaluated in the first part of Example 6.3.3 is
Our test statistic is
Let the significance level be α = 0.05. The observed values of \(A'\bar {X}\bar {X}'A,~A'S_1A\), v, and the tabulated critical value of F 1,n−1,α are the following:
As 0.6 < 10.13, H o is not rejected.
6.3.5. Likelihood ratio test for testing H o : μ 1 = ⋯ = μ p when Σ is unknown
In the entire parameter space Ω of a N p(μ, Σ) population, μ is estimated by the sample average \(\bar {X}\) and, as previously determined, the maximum of the likelihood function is
where S is the sample sum of products matrix and n is the sample size. Under the hypothesis H o : μ 1 = ⋯ = μ p = ν, where ν is unknown, this ν is estimated by \(\hat {\nu }=\frac {1}{np}\sum _{i,j}x_{ij}=\frac {1}{p}J'\bar {X},\) J′ = (1, …, 1), the p × 1 sample vectors \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj}),~ j=1,\ldots ,n\), being independently distributed. Thus, under the null hypothesis H o, the population covariance matrix is estimated by \(\frac {1}{n}(S+n(\bar {X}-\hat {\mu })(\bar {X}-\hat {\mu })')\), and, proceeding as was done to obtain Eq. (6.3.3), the λ-criterion reduces to
Given the structure of u in (6.3.13), we can take the Gaussian population covariance matrix Σ to be the identity matrix I, as was explained in Sect. 6.3.1. Observe that
where \(I-\frac {1}{p}JJ'\) is idempotent of rank p − 1; hence there exists an orthonormal matrix P, PP′ = I, P′P = I, such that
where V 1 is the subvector of the first p − 1 components of V . Then the quadratic form u, which is our test statistic, reduces to the following:
We note that the test statistic u has the same structure as that of u in Theorem 6.3.1 with p replaced by p − 1. Accordingly, \(u=n(\bar {X}-\hat {\mu })'S^{-1}(\bar {X}-\hat {\mu })\) is distributed as a real scalar type-2 beta variable with the parameters \(\frac {p-1}{2}\) and \(\frac {n-(p-1)}{2}\), so that \(\frac {n-p+1}{p-1}u\sim F_{p-1,\,n-p+1}\). Thus, the test criterion consists of
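The following Python sketch applies this criterion to a hypothetical bivariate sample (illustrative data, not those of Example 6.3.5):

```python
import numpy as np
from scipy.stats import f

# Hypothetical bivariate sample for H_o: mu_1 = mu_2 = nu, nu unknown
X = np.array([[1., 0., 1., 2.],
              [0., 2., 2., 2.]])
p, n = X.shape
J = np.ones(p)

Xbar = X.mean(axis=1)
nu_hat = (J @ Xbar) / p                 # MLE of the common mean under H_o
mu_hat = nu_hat * J
D = X - Xbar[:, None]
S = D @ D.T

u = n * (Xbar - mu_hat) @ np.linalg.solve(S, Xbar - mu_hat)
F_obs = (n - p + 1) / (p - 1) * u       # ~ F_{p-1, n-p+1} under H_o
reject = F_obs >= f.ppf(0.95, p - 1, n - p + 1)
```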
Example 6.3.5
Let the population be N 2(μ, Σ), Σ > O, μ′ = (μ 1, μ 2) and the null hypothesis be H o : μ 1 = μ 2 = ν where ν and Σ are unknown. The sample values, as specified in Example 6.3.3, are
The maximum likelihood estimate of μ under H o, is
and
As previously calculated, the sample sum of products matrix is
The test statistic v and its observed value are
At significance level α = 0.05, the tabulated critical value F 1,3,0.05 is 10.13, and since the observed value 0.61 is less than 10.13, H o is not rejected.
6.4. Testing Hypotheses on the Population Covariance Matrix
Let the p × 1 independent vectors X j, j = 1, …, n, have a p-variate real nonsingular N p(μ, Σ) distribution and the p × n matrix X = (X 1, …, X n) be the sample matrix. Denoting the sample average by \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\) and letting \(\bar {\mathbf {X}}=(\bar {X},\ldots ,\bar {X})\), each column of \(\bar {\mathbf {X}}\) being equal to \(\bar {X}\), the sample sum of products matrix is \(S=(\mathbf {X}-\bar {\mathbf {X}})(\mathbf {X}-\bar {\mathbf {X}})'\). We have already established that S is Wishart distributed with m = n − 1 degrees of freedom, that is, S ∼ W p(m, Σ), Σ > O. Letting S μ = (X −M)(X −M)′ where M = (μ, …, μ), each of its columns being the p × 1 vector μ, S μ ∼ W p(n, Σ), Σ > O, where the number of degrees of freedom is n itself whereas the number of degrees of freedom associated with S is m = n − 1. Let us consider the hypothesis H o : Σ = Σ o where Σ o is a given known matrix and μ is unspecified. Then, the MLE’s of μ and Σ in the entire parameter space are \(\hat {\mu }=\bar {X}\) and \( \hat {\varSigma }=\frac {1}{n}S\), and the joint density of the sample values X 1, …, X n, denoted by L, is given by
Thus, as previously determined, the maximum of L in the parameter space Ω = {(μ, Σ)|Σ > O} is
the maximum of L under the null hypothesis H o : Σ = Σ o being given by
Then, the λ-criterion is the following:
Letting \(u=\lambda ^{\frac {2}{n}}\),
and we would reject H o for small values of u since it is a monotonically increasing function of λ, which means that the null hypothesis ought to be rejected for large values of \({\mathrm{tr}}(\varSigma _o^{-1}S)\) as the exponential function dominates the polynomial function for large values of the argument. Let us determine the distribution of \(w={\mathrm{tr}}(\varSigma _o^{-1}S)\) whose Laplace transform with parameter s is
This can be evaluated by integrating out over the density of S which has a Wishart distribution with m = n − 1 degrees of freedom when μ is estimated:
The exponential part is \(-\frac {1}{2}{\mathrm{tr}}(\varSigma ^{-1}S)-s\,{\mathrm{tr}}(\varSigma _o^{-1}S)=-\frac {1}{2}{\mathrm{tr}}[(\varSigma ^{-\frac {1}{2}}S\varSigma ^{-\frac {1}{2}})(I+2s\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}})]\) and hence,
The null case, Σ = Σ o
In this case, \(\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}}=I,\) so that
Thus, the test criterion is the following:
When μ is known, it is used instead of its MLE to determine S μ, and the resulting criterion consists of rejecting H o whenever the observed \(w_{\mu }={\mathrm{tr}}(\varSigma _o^{-1}S_{\mu })\ge \chi _{np,\,\alpha }^2\) where n is the sample size. These results are summarized in the following theorem.
Theorem 6.4.1
Let the null hypothesis be H o : Σ = Σ o (given) and \(w={\mathrm{tr}}(\varSigma _o^{-1}S)\) where S is the sample sum of products matrix. Then, the null distribution of \(w={\mathrm{tr}}(\varSigma _o^{-1}S)\) has a real scalar chisquare distribution with (n − 1)p degrees of freedom when the estimate of μ, namely \(\hat {\mu }=\bar {X}\) , is utilized to compute S; when μ is specified, w has a chisquare distribution having np degrees of freedom where n is the sample size.
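A numerical sketch of Theorem 6.4.1 in Python, with a hypothetical sample and the illustrative choice Σ o = I:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 3-variate sample of size n = 5 (illustrative data)
X = np.array([[2., 1., 0., 1., 2.],
              [1., 2., 1., 0., 1.],
              [0., 1., 2., 1., 1.]])
p, n = X.shape
Sigma_o = np.eye(p)                         # H_o: Sigma = Sigma_o

Xbar = X.mean(axis=1)
D = X - Xbar[:, None]
S = D @ D.T                                 # computed with mu estimated by Xbar
w = np.trace(np.linalg.solve(Sigma_o, S))   # w = tr(Sigma_o^{-1} S)
crit = chi2.ppf(0.95, (n - 1) * p)          # chi-square with (n-1)p df
reject = w >= crit
```

When μ is specified, one would instead compute S μ and compare w μ with the χ² np,α critical value.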
The non-null density of w
The non-null density of w is available from (6.4.4). Let λ 1, …, λ p be the eigenvalues of \(\varSigma ^{\frac {1}{2}}\varSigma _o^{-1}\varSigma ^{\frac {1}{2}}\). Then L w(s) in (6.4.4) can be re-expressed as follows:
This is the Laplace transform of a variable of the form w = λ 1 w 1 + ⋯ + λ p w p where w 1, …, w p are independently distributed real scalar chisquare random variables, each having m = n − 1 degrees of freedom, where λ j > 0, j = 1, …, p. The distribution of linear combinations of chisquare random variables corresponds to the distribution of quadratic forms; the reader may refer to Mathai and Provost (1992) for explicit representations of their density functions.
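This representation can be checked by simulation; in the following Python sketch (the matrix Σ is an arbitrary illustrative choice), the simulated mean of w is compared with E[w] = m(λ 1 + ⋯ + λ p):

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 3, 20
m = n - 1
Sigma = np.array([[2., .5, 0.],             # illustrative positive definite Sigma
                  [.5, 1., .3],
                  [0., .3, 1.5]])
Sigma_o = np.eye(p)

# eigenvalues of Sigma^{1/2} Sigma_o^{-1} Sigma^{1/2}, i.e. of Sigma_o^{-1} Sigma
lam = np.linalg.eigvals(np.linalg.solve(Sigma_o, Sigma)).real

# simulate w = lam_1 w_1 + ... + lam_p w_p, the w_j iid chi-square with m df
w = rng.chisquare(m, size=(100000, p)) @ lam
# E[w] = m * (lam_1 + ... + lam_p)
```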
Note 6.4.1
If the population mean value μ is known, then one can proceed by making use of μ instead of the sample mean to determine S μ, in which case n, the sample size, ought to be used instead of m = n − 1 in the above discussion.
6.4.1. Arbitrary moments of λ
From (6.4.2), the h-th moment of the λ-criterion for testing H o : Σ = Σ o (given) in a real nonsingular N p(μ, Σ) population, is obtained as follows:
for \(\Re (\frac {nh}{2}+\frac {n-1}{2})>\frac {p-1}{2},~ I+h\varSigma \varSigma _o^{-1}>O\). Under H o : Σ = Σ o, we have \(|I+h\varSigma \varSigma _o^{-1}|{ }^{-(\frac {nh}{2}+\frac {n-1}{2})}=(1+h)^{-p(\frac {nh}{2}+\frac {n-1}{2})}\). Thus, the h-th null moment is given by
for \(\Re (\frac {nh}{2}+\frac {n-1}{2})>\frac {p-1}{2}\).
6.4.2. The asymptotic distribution of \(-2\ln \lambda \) when testing H o : Σ = Σ o
Let us determine the asymptotic distribution of \(-2\ln \lambda \) where λ is the likelihood ratio statistic for testing H o : Σ = Σ o (specified) in a real nonsingular N p(μ, Σ) population, as n →∞, n being the sample size. This distribution can be determined by expanding both real matrix-variate gamma functions appearing in (6.4.9) and applying Stirling’s approximation formula as given in (6.5.14) by letting \(\frac {n}{2}(1+h)\to \infty \) in the numerator gamma functions and \(\frac {n}{2}\to \infty \) in the denominator gamma functions. Then, we have
Hence, from (6.4.9)
where \((1+h)^{-\frac {p(p+1)}{4}}\) is the h-th moment of the distribution of e−y∕2 when \(y\sim \chi ^2_{\frac {p(p+1)}{2}}\). Thus, under H o, \(-2\ln \lambda \to \chi ^2_{\frac {p(p+1)}{2}}\) as n →∞ . For general procedures leading to asymptotic normality, see Mathai (1982).
Theorem 6.4.2
Letting λ be the likelihood ratio statistic for testing H o : Σ = Σ o (given) on the covariance matrix of a real nonsingular N p(μ, Σ) distribution, the null distribution of \(-2\ln \lambda \) is asymptotically, as the sample size n tends to ∞, that of a real scalar chisquare random variable having \(\frac {p(p+1)}{2}\) degrees of freedom. This number of degrees of freedom is also equal to the number of parameters restricted by the null hypothesis.
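Numerically, taking the explicit form of the criterion to be the standard one implied by the two maxima above, namely \(\lambda =({\mathrm{e}}/n)^{np/2}\,|\varSigma _o^{-1}S|{ }^{n/2}\,{\mathrm{e}}^{-\frac {1}{2}{\mathrm{tr}}(\varSigma _o^{-1}S)}\) (an assumption here, since the display (6.4.2) is not reproduced), \(-2\ln \lambda \) can be computed and compared with the chisquare critical value as follows (hypothetical data; with n = 5, the asymptotics serve only as illustration):

```python
import numpy as np
from scipy.stats import chi2

def neg2_log_lambda(S, Sigma_o, n):
    # -2 ln(lambda) with lambda = (e/n)^(np/2) |Sigma_o^{-1}S|^(n/2) exp(-tr(Sigma_o^{-1}S)/2)
    p = S.shape[0]
    B = np.linalg.solve(Sigma_o, S)
    _, logdet = np.linalg.slogdet(B)
    return np.trace(B) - n * logdet + n * p * np.log(n) - n * p

# hypothetical 3-variate sample of size n = 5
X = np.array([[2., 1., 0., 1., 2.],
              [1., 2., 1., 0., 1.],
              [0., 1., 2., 1., 1.]])
p, n = X.shape
Xbar = X.mean(axis=1)
S = (X - Xbar[:, None]) @ (X - Xbar[:, None]).T

stat = neg2_log_lambda(S, np.eye(p), n)
crit = chi2.ppf(0.95, p * (p + 1) / 2)    # asymptotic chi-square with p(p+1)/2 df
```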
Note 6.4.2
Sugiura and Nagao (1968) have shown that the test based on the statistic λ as specified in (6.4.2) is biased whereas it becomes unbiased upon replacing n, the sample size, by the degrees of freedom n − 1 in (6.4.2). Accordingly, percentage points are then computed for \(-2\ln \lambda _1\), where λ 1 is the statistic λ given in (6.4.2) wherein n − 1 is substituted for n. Korin (1968), Davis (1971), and Nagarsenker and Pillai (1973) computed 5% and 1% percentage points for this test statistic. Davis and Field (1971) evaluated the percentage points for p = 2(1)10 and n = 6(1)30(5)50, 60, 120 and Korin (1968), for p = 2(1)10.
Example 6.4.1
Let us take the same 3-variate real Gaussian population N 3(μ, Σ), Σ > O and the same data as in Example 6.3.1, so that intermediate calculations could be utilized. The sample size is 5 and the sample values are the following:
the sample average and the sample sum of products matrix being
Let us consider the hypothesis Σ = Σ o where
Let us test the null hypothesis at the significance level α = 0.05. The distribution of the test statistic w and the tabulated critical value are as follows:
As the observed value 12.12 < 21.03, H o is not rejected. The asymptotic distribution of \(-2\ln \lambda \), as n →∞, is \(\chi ^2_{p(p+1)/2}\simeq \chi ^2_6\) where λ is the likelihood ratio criterion statistic. Since \(\chi ^2_{6,~0.05}=12.59\) and 12.59 > 12.12, we still do not reject H o as n →∞.
6.4.3. Tests on Wilks’ concept of generalized variance
The concept of generalized variance was explained in Chap. 5. The sample generalized variance is simply the determinant of S, the sample sum of products matrix. When the population is p-variate Gaussian, it has already been shown in Chap. 5 that S is Wishart distributed with m = n − 1 degrees of freedom, n being the sample size, and parameter matrix Σ > O, which is the population covariance matrix. When the population is multivariate normal, several types of tests of hypotheses involve the sample generalized variance. The first author has given the exact distributions of such tests, see Mathai (1972a,b) and Mathai and Rathie (1971).
6.5. The Sphericity Test or Testing if H o : Σ = σ 2 I, Given a N p(μ, Σ) Sample
When the covariance matrix Σ = σ 2 I, where σ 2 > 0 is a real scalar quantity, the ellipsoid (X − μ)′Σ −1(X − μ) = c > 0, which represents a specific contour of constant density for a nonsingular N p(μ, Σ) distribution, becomes the sphere defined by the equation \(\frac {1}{\sigma ^2}(X-\mu )'(X-\mu ) =c\) or \(\frac {1}{\sigma ^2}((x_1-\mu _1)^2+\cdots +(x_p-\mu _p)^2)=c>0, \) whose center is located at the point μ; hence the test’s name, the sphericity test. Given a N p(μ, Σ) sample of size n, the maximum of the likelihood function in the entire parameter space is
as was previously established. However, under the null hypothesis H o : Σ = σ 2 I, tr(Σ −1 S) = (σ 2)−1(tr(S)) and |Σ| = (σ 2)p. Thus, if we let θ = σ 2 and substitute \(\hat {\mu }=\bar {X}\) in L, under H o the loglikelihood function will be \(\ln L_{\omega }=-\frac {np}{2}\ln (2\pi )-\frac {np}{2}\ln \theta -\frac {1}{2\theta }{\mathrm{tr}}(S). \) Differentiating this function with respect to θ and equating the result to zero produces the following estimator for θ:
Accordingly, the maximum of the likelihood function under H o is the following:
Thus, the λ-criterion for testing
is
In the complex Gaussian case when \(\tilde {X}_j\sim \tilde {N}_p(\tilde {\mu },\varSigma ), ~\varSigma =\varSigma ^{*}>O\) where an asterisk indicates the conjugate transpose, \(\tilde {X}_j=X_{j1}+iX_{j2}\) where X j1 and X j2 are real p × 1 vectors and \(i=\sqrt {(-1)}\). The covariance matrix associated with \(\tilde {X}_j\) is then defined as
where Σ is assumed to be Hermitian positive definite, with Σ 11 = Cov(X j1), Σ 22 = Cov(X j2), Σ 12 = Cov(X j1, X j2) and Σ 21 = Cov(X j2, X j1). Thus, the hypothesis of sphericity in the complex Gaussian case is Σ = σ 2 I where σ is real and positive. Then, under the null hypothesis \(\tilde {H}_o: \varSigma =\sigma ^2 I\), the Hermitian form \(\tilde {Y}^{*}\varSigma \tilde {Y}=c>0\) where c is real and positive, becomes \(\sigma ^2\tilde {Y}^{*}\tilde {Y}=c\Rightarrow |\tilde {y}_1|{ }^2+\cdots +|\tilde {y}_p|{ }^2=\frac {c}{\sigma ^2}>0\), which defines a sphere in the complex space, where \(|\tilde {y}_j|\) denotes the absolute value or modulus of \(\tilde {y}_j\). If \(\tilde {y}_j=y_{j1}+iy_{j2}\) with \(i=\sqrt {(-1)},~ y_{j1},y_{j2}\) being real, then \(|\tilde {y}_j|{ }^2=y_{j1}^2+y_{j2}^2\).
The joint density of the sample values in the real Gaussian case is the following:
where \(\bar {X}=\frac {1}{n}(X_1+\cdots +X_n)\), X j, j = 1, …, n are iid N p(μ, Σ), Σ > O. We have already derived the maximum of L in the entire parameter space Ω, which, in the real case, is
where S is the sample sum of products matrix. Under H o, \(|\varSigma |{ }^{\frac {n}{2}}=(\sigma ^2)^{\frac {np}{2}}\) and \({\mathrm{tr}}(\varSigma ^{-1}S)=\frac {1}{\sigma ^2}(s_{11}+\cdots +s_{pp})=\frac {1}{\sigma ^2}{\mathrm{tr}}(S)\). Thus, the maximum likelihood estimator of σ 2 is \(\frac {1}{np}{\mathrm{tr}}(S)\). Accordingly, the λ-criterion is
in the real case. Interestingly, (u 1)1∕p is the ratio of the geometric mean of the eigenvalues of S to their arithmetic mean. The structure remains the same in the complex domain, in which case \({\mathrm{det}}(\tilde {S})\) is replaced by the absolute value \(|{\mathrm{det}}(\tilde {S})|\) so that
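The statistic u 1 and its geometric-mean/arithmetic-mean interpretation can be illustrated in Python (the matrix S below is a hypothetical sum of products matrix):

```python
import numpy as np

# Hypothetical 3 x 3 sum of products matrix S (positive definite)
S = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
p = S.shape[0]

# u_1 = lambda^(2/n) = p^p |S| / (tr S)^p
u1 = p ** p * np.linalg.det(S) / np.trace(S) ** p

# u_1^(1/p) equals the ratio of the geometric mean of the
# eigenvalues of S to their arithmetic mean
eig = np.linalg.eigvalsh(S)
gm_over_am = eig.prod() ** (1 / p) / eig.mean()
```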
For arbitrary h, the h-th moment of u 1 in the real case can be obtained by integrating out over the density of S, which, as explained in Sect. 5.5, 5.5a, is a Wishart density with n − 1 = m degrees of freedom and parameter matrix Σ > O. However, when the null hypothesis H o holds, Σ = σ 2 I p, so that the h-th moment in the real case is
In order to evaluate this integral, we replace [tr(S)]−ph by an equivalent integral:
Then, substituting (ii) in (i), the exponent becomes \(-\frac {1}{2\sigma ^2}(1+2\sigma ^2 x)({\mathrm{tr}}(S))\). Now, letting \(S_1=\frac {1}{2\sigma ^2}(1+2\sigma ^2 x)S\Rightarrow {\mathrm{d}}S=(2\sigma ^2)^{\frac {p(p+1)}{2}}(1+2\sigma ^2 x)^{-\frac {p(p+1)}{2}}{\mathrm{d}}S_1\), we have
The corresponding h-th moment in the complex case is the following:
By making use of the multiplication formula for gamma functions, one can expand the real gamma function Γ(mz) as follows:
and for m = 2, we have the duplication formula
Then on applying (6.5.6),
Moreover, it follows from the definition of the real matrix-variate gamma functions that
On canceling \(\varGamma (\frac {m}{2}+h)/\varGamma (\frac {m}{2})\) when multiplying (iii) by (iv), we are left with
The corresponding h-th moment in the complex case is the following:
For h = s − 1, one can treat \(E[u_1^{s-1}|H_o]\) as the Mellin transform of the density of u 1 in the real case. Letting this density be denoted by \(f_{u_1}(u_1)\), it can be expressed in terms of a G-function as follows:
and \(f_{u_1}(u_1|H_o)=0\) elsewhere, where
the corresponding density in the complex case being the following:
and \(\tilde {f}_{\tilde {u}_1}(\tilde {u}_1)=0\) elsewhere, where \(\tilde {G}\) is a real G-function whose parameters are different from those appearing in (6.5.9), and
For computable series representations of a G-function with general parameters, the reader may refer to Mathai (1970a, 1993). Observe that u 1 in the real case is structurally a product of p − 1 mutually independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=\frac {m}{2}-\frac {j}{2},~\beta _j= \frac {j}{2}+\frac {j}{p}),~j=1,\ldots ,p-1\). In the complex case, \(\tilde {u}_1\) is structurally a product of p − 1 mutually independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=m-j,~\beta _j=j+\frac {j}{p}),~j=1,\ldots ,p-1\). This observation is stated as a result.
Theorem 6.5.1
Consider the sphericity test statistic for testing the hypothesis H o : Σ = σ 2 I where σ 2 > 0 is an unknown real scalar. Let u 1 and the corresponding complex quantity \(\tilde {u}_1\) be as defined in (6.5.4) and (6.5a.1) respectively. Then, in the real case, u 1 is structurally a product of p − 1 independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=\frac {m}{2}-\frac {j}{2},~ \beta _j=\frac {j}{2}+\frac {j}{p}),~j=1,\ldots ,p-1,\) and, in the complex case, \(\tilde {u}_1\) is structurally a product of p − 1 independently distributed real scalar type-1 beta random variables with the parameters \((\alpha _j=m-j,~ \beta _j=j+\frac {j}{p}),~j=1,\ldots ,p-1,\) where m = n − 1, n = the sample size.
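Theorem 6.5.1 can be checked numerically by comparing the h-th moment of the stated product of independent type-1 beta variables with the standard closed form of the null moment of the sphericity statistic. The following sketch is not part of the original text: it assumes NumPy and SciPy are available, the function names are ours, and the closed form coded below is the standard expression which we take, as an assumption, to correspond to (6.5.8).

```python
import numpy as np
from scipy.special import gammaln

def log_moment_beta_product(m, p, h):
    """log E[u1^h | H_o] from the structural representation of Theorem 6.5.1:
    u1 is a product of p-1 independent type-1 beta variables with
    alpha_j = (m - j)/2 and beta_j = j/2 + j/p (real case)."""
    total = 0.0
    for j in range(1, p):
        a = (m - j) / 2.0
        b = j / 2.0 + j / float(p)
        total += (gammaln(a + h) + gammaln(a + b)
                  - gammaln(a) - gammaln(a + b + h))
    return total

def log_moment_closed_form(m, p, h):
    """Standard closed form of the null h-th moment of the sphericity statistic
    (assumed here to match (6.5.8)):
    p^{ph} Gamma(mp/2)/Gamma(mp/2 + ph) * Gamma_p(m/2 + h)/Gamma_p(m/2)."""
    total = (p * h * np.log(p)
             + gammaln(m * p / 2.0) - gammaln(m * p / 2.0 + p * h))
    for j in range(1, p + 1):
        total += (gammaln(m / 2.0 + h - (j - 1) / 2.0)
                  - gammaln(m / 2.0 - (j - 1) / 2.0))
    return total

lhs = log_moment_beta_product(20, 3, 1.0)
rhs = log_moment_closed_form(20, 3, 1.0)
```

The two representations agree to machine precision for any m > p − 1 and admissible h, which is exactly the content of the theorem in the real case.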
For certain special cases, one can represent (6.5.9) and (6.5a.4) in terms of known elementary functions. Some such cases are now being considered.
Real case: p = 2
In the real case, for p = 2
This means u 1 is a real type-1 beta variable with the parameters \((\alpha =\frac {m}{2}-\frac {1}{2},~\beta =1)\). The corresponding result in the complex case is that \(\tilde {u}_1\) is a real type-1 beta variable with the parameters (α = m − 1, β = 1).
Real case: p = 3
In the real case
so that u 1 is equivalent to the product of two independently distributed real type-1 beta random variables with the parameters \((\alpha _j,\beta _j)=(\frac {m}{2}-\frac {j}{2},~ \frac {j}{2}+\frac {j}{3}),~ j=1,2\). This density can be obtained by treating \(E[u_1^h|H_o]\) for h = s − 1 as the Mellin transform of the density of u 1. The density is then available by taking the inverse Mellin transform. Thus, again denoting it by \(f_{u_1}(u_1)\), we have
where R ν is the residue of the integrand ϕ 3(s) at the poles of \(\varGamma (\frac {m}{2}-\frac {3}{2}+s)\) and \(R_{\nu }^{\prime }\) is the residue of the integrand ϕ 3(s) at the pole of \(\varGamma (\frac {m}{2}-2+s)\). Letting \(s_1=\frac {m}{2}-\frac {3}{2}+s\),
We can replace negative ν in the arguments of the gamma functions with positive ν by making use of the following formula:
where for example, (b)ν is the Pochhammer symbol
so that
The sum of the residues then becomes
It can be similarly shown that
Accordingly, the density of u 1 for p = 3 is the following:
and \(f_{u_1}(u_1|H_o)=0\) elsewhere.
Real case: p = 4
In this case,
where c 4 is the normalizing constant. However, noting that
there is one pole at \(s=-\frac {m}{2}+\frac {3}{2}\). The poles of \(\varGamma (\frac {m}{2}-\frac {5}{2}+s)\) occur at \(s=-\frac {m}{2}+\frac {5}{2}-\nu ,~\nu =0,1,\ldots ,\) and hence at ν = 1, this pole coincides with the earlier one, producing a pole of order 2 at \(s=-\frac {m}{2}+\frac {3}{2}\). Each of the other poles of the integrand is simple, that is, of order 1. The second order pole will bring in a logarithmic function. As all the cases for which p ≥ 4 will bring in poles of higher orders, they will not be herein discussed. The general expansion of a G-function of the type \(G_{m,m}^{m,0}(\cdot )\) is provided in Mathai (1970a, 1993). In the complex case, poles of higher orders arise starting from p ≥ 3, so that the densities can only be written in terms of logarithms, psi and zeta functions; hence, these will not be considered. Observe that \(\tilde {u}_1\) corresponds to a product of independently distributed real type-1 beta random variables, even though the densities are available only in terms of logarithms, psi and zeta functions for p ≥ 3. The null and non-null densities of the λ-criterion in the general case were derived by the first author, and some results obtained under the null distribution can also be found in Mathai and Saxena (1973). Several researchers have contributed to various aspects of the sphericity and multi-sample sphericity tests; for some of the first author’s contributions, the reader may refer to Mathai and Rathie (1970) and Mathai (1977, 1984, 1986).
Gamma products such as those appearing in (6.5.8) and (6.5a.3) are frequently encountered when considering various types of tests on the parameters of a real or complex Gaussian or certain other types of distributions. Structural representations in the form of a product of independently distributed real scalar type-1 beta random variables occur in numerous situations. Thus, a general asymptotic result on the h-th moment of such products of type-1 beta random variables will be derived; it is stated next as a theorem.
Theorem 6.5.2
Let u be a real scalar random variable whose h-th moment is of the form
where Γ p(⋅) is a real matrix-variate gamma function on p × p real positive definite matrices, α is real, γ is bounded, δ is real, 0 < δ < ∞ and h is arbitrary. Then, as \(\alpha \to \infty ,\ -2\ln u\to \chi _{2\,p\,\delta }^2\) , a real chisquare random variable having 2 p δ degrees of freedom, that is, a real gamma random variable with the parameters (α = p δ, β = 2).
Proof
On expanding the real matrix-variate gamma functions, we have the following:
Consider the following form of Stirling’s asymptotic approximation formula for gamma functions, namely,
On applying this asymptotic formula to the gamma functions appearing in (i) and (ii) for α →∞, we have
and
so that
On noting that \(E[u^h]=E[{\mathrm{e}}^{h\ln u}]\to (1+h)^{-p\,\delta }\), it is seen that \(\ln u\) has the mgf (1 + h)−pδ for 1 + h > 0 or \(-2\ln u\) has mgf (1 − 2h)−pδ for 1 − 2h > 0, which happens to be the mgf of a real scalar chisquare variable with 2 p δ degrees of freedom if 2 p δ is a positive integer or a real gamma variable with the parameters (α = p δ, β = 2). Hence the following result.
Corollary 6.5.1
Consider a slightly more general case than that considered in Theorem 6.5.2 . Let the h-th moment of u be of the form
Then as \(\alpha \to \infty ,~ E[u^h]\to (1+h)^{-(\delta _1+\cdots +\delta _p)}\) , which implies that \(-2\ln u\to \chi _{2(\delta _1+\cdots +\delta _p)}^2\) whenever 2(δ 1 + ⋯ + δ p) is a positive integer or, equivalently, \(-2\ln u\) tends to a real gamma variable with the parameters (α = δ 1 + ⋯ + δ p, β = 2) .
Let us examine the asymptotic distribution of the test statistic for the sphericity test in the light of Theorem 6.5.2. It is seen from (6.5.4) that \(\lambda ^h=u^{h\frac {n}{2}}\). Thus, by replacing h by \(\frac {n}{2}h\) in (6.5.8) with m = n − 1, we have
Then, it follows from Corollary 6.5.1 that \(-2\ln \lambda \to \chi ^2_{2\sum _{j=1}^{p-1}(\frac {j}{2}+\frac {j}{p})}\), a chi-square random variable having \(2\sum _{j=1}^{p-1}(\frac {j}{2}+\frac {j}{p})=\frac {p(p-1)}{2}+(p-1)=\frac {(p-1)(p+2)}{2}\) degrees of freedom. Hence the following result:
Theorem 6.5.3
Consider the λ-criterion for testing the hypothesis of sphericity. Then, under the null hypothesis, \(-2\ln \lambda \to \chi ^2_{\frac {(p-1)(p+2)}{2}}\) , as the sample size n →∞. In the complex case, as n →∞, \(-2\ln \lambda \to \chi ^2_{(p-1)(p+1)}\) , a real scalar chisquare variable with \(2[\frac {p(p-1)}{2}+\frac {(p-1)}{2}]=p(p-1)+(p-1)=(p-1)(p+1)\) degrees of freedom.
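A minimal numerical sketch of the resulting large-sample test in the real case (not part of the original text; it assumes NumPy and SciPy, and the function name is illustrative): compute u 1 from a sample, form \(-2\ln \lambda =-n\ln u_1\) since \(\lambda =u_1^{n/2}\), and refer the statistic to a chisquare distribution with (p − 1)(p + 2)∕2 degrees of freedom. The ratio interpretation of \(u_1^{1/p}\) noted after (6.5.4) is also verified.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def sphericity_test(X):
    """LR sphericity test for rows of X iid N_p(mu, Sigma).
    Returns u1, -2 ln lambda = -n ln u1, and the asymptotic p-value
    based on chi-square with (p-1)(p+2)/2 df (Theorem 6.5.3)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc                                  # sample sum of products matrix
    u1 = np.linalg.det(S) / (np.trace(S) / p) ** p
    stat = -n * np.log(u1)                         # since lambda = u1^(n/2)
    return u1, stat, chi2.sf(stat, (p - 1) * (p + 2) // 2)

# data generated under H_o: Sigma = 2 I (sigma^2 unknown to the test)
X = rng.multivariate_normal(np.zeros(3), 2.0 * np.eye(3), size=200)
u1, stat, pval = sphericity_test(X)

# u1^(1/p) equals the geometric mean of the eigenvalues of S divided
# by their arithmetic mean, so 0 < u1 <= 1
S = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))
ev = np.linalg.eigvalsh(S)
gm_over_am = ev.prod() ** (1.0 / 3) / ev.mean()
```

Small values of u 1 (large statistic values) lead to rejection of the hypothesis of sphericity.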
Note 6.5.1
We observe that the number of degrees of freedom of the chisquare variable in the real case is \(\frac {(p-1)(p+2)}{2}\), which is also equal to the number of parameters restricted by the null hypothesis. Indeed, when Σ = σ 2 I, we have σ ij = 0, i≠j, which produces \(\frac {p(p-1)}{2}\) restrictions and, since σ 2 is unknown, requiring that the diagonal elements are such that σ 11 = ⋯ = σ pp produces p − 1 additional restrictions, for a total of \(\frac {(p-1)(p+2)}{2}\) restrictions being imposed. Thus, the number of degrees of freedom of the asymptotic chisquare variable corresponds to the number of restrictions imposed by H o, which is, in fact, a general result.
6.6. Testing the Hypothesis that the Covariance Matrix is Diagonal
Consider the null hypothesis that Σ, the nonsingular covariance matrix of a p-variate real normal distribution, is diagonal, that is,
Since the population is assumed to be normally distributed, this implies that the components of the p-variate Gaussian vector are mutually independently distributed as univariate normal random variables whose respective variances are σ jj, j = 1, …, p. Consider a simple random sample of size n from a nonsingular N p(μ, Σ) population or, equivalently, let X 1, …, X n be independently distributed as N p(μ, Σ) vectors, Σ > O. Under H o, σ jj is estimated by its MLE which is \(\hat {\sigma }_{jj}=\frac {1}{n}s_{jj}\) where s jj is the j-th diagonal element of S = (s ij), the sample sum of products matrix. The maximum of the likelihood function under the null hypothesis is then
the likelihood function being the joint density evaluated at an observed value of the sample. Observe that the overall maximum or the maximum in the entire parameter space remains the same as that given in (6.1.1). Thus, the λ-criterion is given by
where S ∼ W p(m, Σ), Σ > O, and m = n − 1, n being the sample size. Under H o, Σ = diag(σ 11, …, σ pp). Then for an arbitrary h, the h-th moment of u 2 is available by taking the expected value of \(\lambda ^{\frac {2}{n}}\) with respect to the density of S, that is,
where, under H o, |Σ| = σ 11⋯σ pp. As was done in Sect. 6.1.1, we may replace \(s_{jj}^{-h}\) by the equivalent integral,
Thus,
where Y = diag(x 1, …, x p), so that tr(Y S) = x 1 s 11 + ⋯ + x p s pp. Then, (6.6.2) can be reexpressed as follows:
and observing that, under H o,
Thus,
Denoting the density of u 2 as \(f_{u_2}(u_2|H_o)\), we can express it as an inverse Mellin transform by taking h = s − 1. Then,
and zero elsewhere, where
Some special cases of this density are expounded below.
Real and complex cases: p = 2
When p = 2, u 2 has a real type-1 beta density with the parameters \((\alpha =\frac {m}{2}-\frac {1}{2},~\beta =\frac {1}{2})\) in the real case. In the complex case, it has a real type-1 beta density with the parameters (α = m − 1, β = 1).
Real and complex cases: p = 3
In this case, \(f_{u_2}(u_2|H_o)\) is given by
The poles of the integrand are simple. Those coming from \(\varGamma (\frac {m}{2}-\frac {3}{2}+s)\) occur at \(s=-\frac {m}{2}+\frac {3}{2}-\nu ,~ \nu =0,1,\ldots \) . The residue R ν is the following:
Summing the residues, we have
Now, consider the sum of the residues at the poles of \(\varGamma (\frac {m}{2}-2+s)\). Observing that \(\varGamma (\frac {m}{2}-2+s)\) cancels out one of the gamma functions in the denominator, namely \(\varGamma (\frac {m}{2}-1+s)=(\frac {m}{2}-2+s)\varGamma (\frac {m}{2}-2+s)\), the integrand becomes
the residue at the pole \(s=-\frac {m}{2}+2\) being \(\frac {\varGamma (\frac {1}{2})\,u_2^{\frac {m}{2}-2}}{\varGamma (1)}\). Then, noting that \(\varGamma (-\frac {1}{2})=-2\varGamma (\frac {1}{2})=-2\sqrt {\pi }\), the density is the following:
and zero elsewhere.
In the complex case, the integrand is
and hence there is a pole of order 1 at s = −m + 3 and a pole of order 2 at s = −m + 2. The residue at s = −m + 3 is \(\frac {u_2^{m-3}}{(1)^2}=u_2^{m-3}\) and the residue at s = −m + 2 is given by
which gives the residue as \(u_2^{m-2}\ln u_2-u_2^{m-2}\). Thus, the sum of the residues is \(u_2^{m-3}+u_2^{m-2}\ln u_2-u_2^{m-2}\) and the constant part is
so that the density is
and zero elsewhere. Note that as u 2 → 0, the limit of \(\,\frac {u_2^{m-1}}{m-1}\ln u_2\,\) is zero. By integrating over 0 < u 2 ≤ 1 with m ≥ 3, it can be verified that \({f}_{{u}_2}(\cdot )\) is indeed a density function.
Real and complex cases: p ≥ 4
As poles of higher orders are present when p ≥ 4, both in the real and complex cases, the exact density function of the test statistic will not be herein explicitly given for those cases. Actually, the resulting densities would involve G-functions for which general expansions are for instance provided in Mathai (1993). The exact null and non-null densities of \(u=\lambda ^{\frac {2}{n}}\) have been previously derived by the first author. Percentage points accurate to the 11th decimal place are available from Mathai and Katiyar (1979a, 1980) for the null case; as well, various aspects of the distribution of the test statistic are discussed in Mathai and Rathie (1971) and Mathai (1973, 1984, 1985).
Let us now consider the asymptotic distribution of the λ-criterion under the null hypothesis,
Given the representation of the h-th moment of u 2 provided in (6.6.3) and referring to Corollary 6.5.1, it is seen that the sum of the δ j’s is \(\sum _{j=1}^{p-1}\delta _j=\sum _{j=1}^{p-1}\frac {j}{2}=\frac {p(p-1)}{4}\), so that the number of degrees of freedom of the asymptotic chisquare distribution is \(2[\frac {p(p-1)}{4}]=\frac {p(p-1)}{2}\) which, as it should be, is the number of restrictions imposed by H o, noting that when Σ is diagonal, σ ij = 0, i≠j, which produces \(\frac {p(p-1)}{2}\) restrictions. Hence, the following result:
Theorem 6.6.1
Let λ be the likelihood ratio criterion for testing the hypothesis that the covariance matrix Σ of a nonsingular N p(μ, Σ) distribution is diagonal. Then, as n →∞, \(-2\ln \lambda \to \chi ^2_{\frac {p(p-1)}{2}}\) in the real case. In the corresponding complex case, as n →∞, \(-2\ln \lambda \to \chi ^2_{p(p-1)}\) , a real scalar chisquare variable having p(p − 1) degrees of freedom.
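The test of diagonality can be sketched numerically as follows (our own illustration, not part of the original text; it assumes NumPy and SciPy). Note that u 2 = |S|∕(s 11⋯s pp) is precisely the determinant of the sample correlation matrix, which the code also verifies.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def diagonality_test(X):
    """LR test that the covariance matrix is diagonal.
    u2 = |S|/(s_11 ... s_pp), lambda = u2^(n/2), and -2 ln lambda = -n ln u2
    is asymptotically chi-square with p(p-1)/2 df (Theorem 6.6.1)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    u2 = np.linalg.det(S) / np.prod(np.diag(S))
    stat = -n * np.log(u2)
    return u2, stat, chi2.sf(stat, p * (p - 1) // 2)

# independent columns with unequal variances: H_o holds
X = rng.normal(size=(150, 4)) * np.array([1.0, 2.0, 0.5, 3.0])
u2, stat, pval = diagonality_test(X)

# u2 equals the determinant of the sample correlation matrix
detR = np.linalg.det(np.corrcoef(X, rowvar=False))
```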
6.7. Equality of Diagonal Elements, Given that Σ is Diagonal, Real Case
In the case of a p-variate real nonsingular N p(μ, Σ) population, whenever Σ is diagonal, the individual components are independently distributed as univariate normal random variables. Consider a simple random sample of size n, that is, a set of p × 1 vectors X 1, …, X n, that are iid as X j ∼ N p(μ, Σ) where it is assumed that \(\varSigma ={\mathrm{diag}}(\sigma _1^2,\ldots ,\sigma _p^2)\). Letting \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj})\), the joint density of the x rj’s, j = 1, …, n, in the above sample, which is denoted by L r, is given by
Then, on substituting the maximum likelihood estimators of μ r and \(\sigma _r^2\) in L r, its maximum is
Under the null hypothesis H o, \(\sigma _1^2=\cdots =\sigma _p^2\equiv \sigma ^2\) and the MLE of σ 2 is a pooled estimate which is equal to \(\frac {1}{np}(s_{11}+\cdots +s_{pp})\). Thus, the λ-criterion is the following in this case:
If we let
then, for arbitrary h, the h-th moment of u 3 is the following:
Observe that \(\frac {s_{jj}}{\sigma ^2}\overset {iid}{\sim } \chi _{n-1}^2=\chi _m^2,~ m=n-1,\) for j = 1, …, p, the density of s jj being of the form
under H o. Note that (s 11 + ⋯ + s pp)−ph can be replaced by an equivalent integral as
Due to independence of the s jj’s, the joint density of s 11, …, s pp, is the product of the densities appearing in (i), and on integrating out s 11, …, s pp, we end up with the following:
Now, the integral over x can be evaluated as follows:
Thus,
The density of u 3 can be written in terms of an H-function. Since p is a positive integer, we can expand one gamma ratio using Gauss’ multiplication formula:
for p = 1, 2, …, m ≥ p. Accordingly,
Hence, for h = s − 1, (6.7.5) is the Mellin transform of the density of u 3. Thus, denoting the density by \(f_{u_3}(u_3)\), we have
and zero elsewhere.
In the complex case, the h-th moment is the following:
and the corresponding density is given by
and zero elsewhere, G denoting a real G-function.
Real and complex cases: p = 2
It is seen from (6.7.5) that for p = 2, u 3 is a real type-1 beta variable with the parameters \((\alpha =\frac {m}{2},~\beta =\frac {1}{p})\) in the real case. Whenever p ≥ 3, poles of order 2 or more occur, and the resulting density functions, which are expressible in terms of generalized hypergeometric functions, will not be explicitly provided. For a general series expansion of the G-function, the reader may refer to Mathai (1970a, 1993).
In the complex case, when p = 2, \(\tilde {u}_3\) has a real type-1 beta density with the parameters \((\alpha =m, ~\beta =\frac {1}{p})\). In this instance as well, poles of higher orders will be present when p ≥ 3, and hence explicit forms of the corresponding densities will not be herein provided. The exact null and non-null distributions of the test statistic are derived for the general case in Mathai and Saxena (1973), and highly accurate percentage points are provided in Mathai (1979a,b).
An asymptotic result can also be obtained as n →∞ . Consider the h-th moment of λ, which is available from (6.7.5) in the real case and from (6.7a.1) in the complex case. Then, referring to Corollary 6.5.1, \(\delta _j=\frac {j}{p}\) in both the real and the complex situations. Hence, \(2[\sum _{j=1}^{p-1}\delta _j]=2\sum _{j=1}^{p-1}\frac {j}{p}=(p-1)\) in both the real and the complex cases. As well, observe that in the complex case, the diagonal elements are real since \(\tilde {\varSigma }\) is Hermitian positive definite. Accordingly, the number of restrictions imposed by H o in either the real or complex cases is p − 1. Thus, the following result:
Theorem 6.7.1
Consider the λ-criterion for testing the equality of the diagonal elements, given that the covariance matrix is already diagonal. Then, as n →∞, the null distribution of \(-2\ln \lambda \to \chi ^2_{p-1}\) in both the real and the complex cases.
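A sketch of this test in the real case (ours, not from the original text; NumPy and SciPy are assumed, and the function name is illustrative): with \(u_3=(s_{11}\cdots s_{pp})/((\mathrm{tr}(S)/p)^p)=\lambda ^{2/n}\), compute \(-2\ln \lambda =-n\ln u_3\) and refer it to a chisquare distribution with p − 1 degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

def equal_variance_test(X):
    """Given Sigma diagonal, tests sigma_1^2 = ... = sigma_p^2 via
    u3 = (s_11 ... s_pp)/((tr(S)/p)^p); lambda = u3^(n/2), so
    -2 ln lambda = -n ln u3, asymptotically chi-square with p - 1 df
    (Theorem 6.7.1)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    s = np.sum(Xc**2, axis=0)            # diagonal elements s_11, ..., s_pp
    u3 = np.prod(s) / s.mean() ** p      # AM-GM inequality gives 0 < u3 <= 1
    stat = -n * np.log(u3)
    return stat, chi2.sf(stat, p - 1)

X = rng.normal(size=(100, 3)) * np.array([1.0, 1.0, 2.0])
stat, pval = equal_variance_test(X)

# columns with identical sums of squares give u3 = 1, hence a zero statistic
v = rng.normal(size=100)
stat0, _ = equal_variance_test(np.column_stack([v, v[::-1]]))
```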
6.8. Hypothesis that the Covariance Matrix is Block Diagonal, Real Case
We will discuss a generalization of the problem examined in Sect. 6.6, considering again the case of real Gaussian vectors. Let X 1, …, X n be iid as X j ∼ N p(μ, Σ), Σ > O, and
In this case, the p × 1 real Gaussian vector is subdivided into subvectors of orders p 1, …, p k, so that p 1 + ⋯ + p k = p, and, under the null hypothesis H o, Σ is assumed to be a block diagonal matrix, which means that the subvectors are mutually independently distributed p j-variate real Gaussian vectors with corresponding mean value vector μ (j) and covariance matrix Σ jj, j = 1, …, k. Then, the joint density of the sample values under the null hypothesis can be written as \(L=\prod _{r=1}^kL_r\) where L r is the joint density of the sample values corresponding to the subvector X (rj), j = 1, …, n, r = 1, …, k. Letting the p × n general sample matrix be X = (X 1, …, X n), we note that the sample representing the first p 1 rows of X corresponds to the sample from the first subvector \(X_{(1j)}\overset {iid}{\sim } N_{p_1}(\mu _{(1)},~\varSigma _{11}),~\varSigma _{11}>O\), j = 1, …, n. The MLE’s of μ (r) and Σ rr are the corresponding sample mean and sample covariance matrix. Thus, the maximum of L r is available as
Hence,
and
Observe that the covariance matrix Σ = (σ ij) can be written in terms of the matrix of population correlations. If we let D = diag(σ 1, …, σ p) where \(\sigma _t^2=\sigma _{tt}\) denotes the variance associated with the component x tj in \(X_j^{\prime }=(x_{1j},\ldots ,x_{pj})\) where Cov(X j) = Σ, and R = (ρ rs) be the population correlation matrix, where ρ rs is the population correlation between the components x rj and x sj, then Σ = DRD. Consider a partitioning of Σ into k × k blocks as well as the corresponding partitioning of D and R:
where, for example, Σ jj is p j × p j, p 1 + ⋯ + p k = p, and the corresponding partitioning of D and R. Consider a corresponding partitioning of the sample sum of products matrix S = (S ij), D (s) and R (s) where R (s) is the sample correlation matrix and \(D^{(s)}={\mathrm{diag}}(\sqrt {s_{11}},\ldots ,\sqrt {s_{pp}})\), where \(S_{jj},~ D_j^{(s)},~ R_{jj}^{(s)}\) are p j × p j, p 1 + ⋯ + p k = p. Then,
and
An additional interesting property is now pointed out. Consider a linear function of the original p × 1 vector X j ∼ N p(μ, Σ), Σ > O, of the form CX j where C is the diagonal matrix diag(c 1, …, c p). In this case, the product CX j is such that the r-th component of X j is weighted or multiplied by c r. Let C be a block diagonal matrix that is partitioned similarly to D, so that its j-th diagonal block is the p j × p j diagonal submatrix C j. Then,
In other words, u 4 is invariant under linear transformations on \(X_j \overset {iid}{\sim }N_p(\mu ,\varSigma ), ~\varSigma >O,\ j=1,\ldots ,n\). That is, if Y j = CX j + d where d is a constant column vector, then the p × n sample matrix on Y j, namely, Y = (Y 1, …, Y n) = (CX 1 + d, …, CX n + d),
Letting S y be partitioned as S into k × k blocks and S y = (S ijy), we have
Arbitrary moments of u 4 can be derived by proceeding as in Sect. 6.6. The h-th null moment, that is, the h-th moment under the null hypothesis H o, is then
where m = n − 1, n being the sample size, and
where S rr is the r-th diagonal block of S, corresponding to Σ rr of Σ, whose order is p r × p r, r = 1, …, k, p 1 + ⋯ + p k = p. On noting that
where Y r > O is a p r × p r real positive definite matrix, and replacing each |S rr|−h by its integral representation as given in (iii), the exponent of e in (i) becomes
The right-hand side of equation (i) then becomes
It should be pointed out that the non-null moments of u 4 can be obtained by substituting a general Σ for Σ o in (iv). Note that if we replace 2Y by Y , the factor containing 2, namely \(2^{ph}\), will disappear. Further, under H o,
Then, each Y r-integral can be evaluated as follows:
On combining equations (i) to (vi), we have
so that when h = 0, \(E[u_4^h|H_o]=1\). Observe that one set of gamma products can be canceled in (6.8.8) and (6.8.9). When that set is the product of the first p 1 gamma functions, the h-th moment of u 4 is given by
where \(c_{4,p-p_1}\) is such that \(E[u_4^h|H_o]=1\) when h = 0. Since the structure of the expression given in (6.8.10) is that of the h-th moment of a product of p − p 1 independently distributed real scalar type-1 beta random variables, it can be inferred that the distribution of u 4|H o is also that of a product of p − p 1 independently distributed real scalar type-1 beta random variables whose parameters can be determined from the arguments of the gamma functions appearing in (6.8.10).
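The statistic u 4 and the invariance property pointed out earlier can be illustrated numerically as follows (our own sketch, not part of the original text; NumPy is assumed and the function name is ours): u 4 = |S|∕(|S 11|⋯|S kk|) is unchanged under Y j = CX j + d with C block diagonal, since the determinant factors coming from C cancel between numerator and denominator.

```python
import numpy as np

rng = np.random.default_rng(3)

def u4_statistic(X, blocks):
    """u4 = |S| / (|S_11| ... |S_kk|), S being the sample sum of products
    matrix and 'blocks' a list of index arrays partitioning the p components."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    denom = 1.0
    for idx in blocks:
        denom *= np.linalg.det(S[np.ix_(idx, idx)])
    return np.linalg.det(S) / denom

X = rng.normal(size=(60, 5))
blocks = [np.array([0, 1]), np.array([2, 3, 4])]
u4_x = u4_statistic(X, blocks)

# invariance under Y_j = C X_j + d with C block diagonal (here diagonal):
# the centering removes d, and det(C)^2 cancels in the ratio
C = np.diag([2.0, 0.5, 3.0, 1.5, 0.1])
d = np.arange(5.0)
u4_y = u4_statistic(X @ C + d, blocks)
```

The bound 0 < u 4 ≤ 1 reflects the determinant inequality \(|S|\le \prod _r|S_{rr}|\) for a positive definite S.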
Some of the gamma functions appearing in (6.8.10) will cancel out for certain values of p 1, …, p k, thereby simplifying the representation of the moments and enabling one to express the density of u 4 in terms of elementary functions in such instances. The exact null density in the general case was derived by the first author. For interesting representations of the exact density, the reader is referred to Mathai and Rathie (1971) and Mathai and Saxena (1973), some exact percentage points of the null distribution being included in Mathai and Katiyar (1979a). As it turns out, explicit forms are available in terms of elementary functions for the following special cases, see also Anderson (2003): p 1 = p 2 = p 3 = 1; p 1 = p 2 = p 3 = 2; p 1 = p 2 = 1, p 3 = p − 2; p 1 = 1, p 2 = p 3 = 2; p 1 = 1, p 2 = 2, p 3 = 3; p 1 = 2, p 2 = 2, p 3 = 4; p 1 = p 2 = 2, p 3 = 3; p 1 = 2, p 2 = 3, p is even.
6.8.1. Special case: k = 2
Let us consider a certain 2 × 2 partitioning of S, which corresponds to the special case k = 2. When p 1 = 1 and p 2 = p − 1 so that p 1 + p 2 = p, the test statistic is
where r 1.(2…p) is the multiple correlation between x 1 and (x 2, …, x p). As stated in Theorem 5.6.3, \(1-r^2_{1.(2\ldots p)}\) is distributed as a real scalar type-1 beta variable with the parameters \((\frac {n-1}{2}-\frac {p-1}{2},~ \frac {p-1}{2})\). The simplifications in (6.8.11) are achieved by making use of the properties of determinants of partitioned matrices, which are discussed in Sect. 1.3. Since s 11 is 1 × 1 in this case, the numerator determinant is a real scalar quantity. Thus, this yields a type-2 beta distribution for \(w=\frac {u_4}{1-u_4}\), and thereby \(\frac {p-1}{n-p}w\) has an F-distribution, so that the test can be based on an F statistic having (n − 1) − (p − 1) = n − p and p − 1 degrees of freedom.
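In this special case, the test can equivalently be carried out via the more familiar form \(F=\frac {r^2/(1-r^2)}{(p-1)/(n-p)}\sim F_{p-1,\,n-p}\) under H o, with \(r^2=r^2_{1.(2\ldots p)}=1-u_4\). The following sketch is ours (not from the original text), assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(4)

def multiple_correlation_test(X):
    """Tests independence of x_1 and (x_2, ..., x_p) via
    u4 = |S|/(s_11 |S_22|) = 1 - r^2, r being the multiple correlation of
    x_1 on the remaining components; under H_o,
    F = (r^2/(1 - r^2)) (n - p)/(p - 1) follows F_{p-1, n-p}."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    u4 = np.linalg.det(S) / (S[0, 0] * np.linalg.det(S[1:, 1:]))
    r2 = 1.0 - u4
    F = (r2 / (1.0 - r2)) * (n - p) / (p - 1)
    return r2, F, f_dist.sf(F, p - 1, n - p)

X = rng.normal(size=(50, 4))   # independent components: H_o holds
r2, F, pval = multiple_correlation_test(X)
```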
6.8.2. General case: k = 2
Suppose that, in a 2 × 2 partitioning of S, S 11 is of order p 1 × p 1 and S 22 is of order p 2 × p 2 with p 2 = p − p 1. Then u 4 can be expressed as
where U is called the multiple correlation matrix. It will be shown that U has a real matrix-variate type-1 beta distribution when S 11 is of general order rather than being a scalar.
Theorem 6.8.1
Consider u 4 for k = 2. Let S 11 be p 1 × p 1 and S 22 be p 2 × p 2 so that p 1 + p 2 = p. Without any loss of generality, let us assume that p 1 ≤ p 2. Then, under H o : Σ 12 = O, the multiple correlation matrix U has a real matrix-variate type-1 beta distribution with the parameters \((\frac {p_2}{2}, ~\frac {m}{2}-\frac {p_2}{2}),\) with m = n − 1, n being the sample size, and thereby \((I-U)\sim \mathit{\mbox{ type-1 beta }}(\frac {m}{2}-\frac {p_2}{2},~ \frac {p_2}{2})\) , the determinant of I − U being u 4 under the null hypothesis when k = 2.
Proof
Since Σ under H o can readily be eliminated from a structure such as u 4, we will take a Wishart matrix S having m = n − 1 degrees of freedom, n denoting the sample size, and parameter matrix I, the identity matrix. At first, assume that Σ is a block diagonal matrix and make the transformation \(S_1=\varSigma ^{-\frac {1}{2}}S\varSigma ^{-\frac {1}{2}}\). As a result, u 4 will be free of Σ 11 and Σ 22, and so, we may take S ∼ W p(m, I). Now, consider the submatrices S 11, S 22, S 12 so that dS = dS 11 ∧dS 22 ∧dS 12. Let f(S) denote the W p(m, I) density. Then,
However, appealing to a result stated in Sect. 1.3, we have
The joint density of S 11, S 22, S 12 denoted by f 1(S 11, S 22, S 12) is then
Letting \(Y=S_{11}^{-\frac {1}{2}}S_{12}S_{22}^{-\frac {1}{2}},\) it follows from a result on Jacobian of matrix transformation, previously established in Chap. 1, that \({\mathrm{d}}Y=|S_{11}|{ }^{-\frac {p_2}{2}}|S_{22}|{ }^{-\frac {p_1}{2}}{\mathrm{d}}S_{12}\). Thus, the joint density of S 11, S 22, Y , denoted by f 2(S 11, S 22, Y ), is given by
Note that S 11, S 22, Y are independently distributed as f 2(⋅) can be factorized into functions of S 11, S 22, Y . Now, letting U = Y Y ′, it follows from Theorem 4.2.3 that
and the density of U, denoted by f 3(U), can then be expressed as follows:
which is a real matrix-variate type-1 beta density with the parameters \((\frac {p_2}{2},~ \frac {m}{2}-\frac {p_2}{2}),\) where c is the normalizing constant. As a result, I − U has a real matrix-variate type-1 beta distribution with the parameters \((\frac {m}{2}-\frac {p_2}{2}, ~\frac {p_2}{2})\). Finally, observe that u 4 is the determinant of I − U.
Corollary 6.8.1
Consider u 4 as given in (6.8.12) and the determinant |I − U| where U and I − U are defined in Theorem 6.8.1 . Then for k = 2 and an arbitrary h, \(E[u_4^h|H_o]=E[|I-U|{ }^{h}]\).
Proof
On letting k = 2 in (6.8.8), we obtain the h-th moment of u 4|H o as
After canceling p 2 of the gamma functions, the remaining gamma product in the numerator of (i) is
excluding \(\pi ^{\frac {p_1(p_1-1)}{4}}\). The remainder of the gamma product present in the denominator is comprised of the gamma functions coming from \(\varGamma _{p_1}(\frac {m}{2}+h)\), excluding \(\pi ^{\frac {p_1(p_1-1)}{4}}\). The normalizing constant will automatically take care of the factors containing π. Now, the resulting part containing h is \(\varGamma _{p_1}(\frac {m}{2}-\frac {p_2}{2}+h)/\varGamma _{p_1}(\frac {m}{2}+h)\), which is the gamma ratio in the h-th moment of a p 1 × p 1 real matrix-variate type-1 beta distribution with the parameters \((\frac {m}{2}-\frac {p_2}{2}, ~\frac {p_2}{2})\).
Since this happens to be \(E[|I-U|{ }^{h}]\) for I − U distributed as specified in Theorem 6.8.1, the Corollary is established.
An asymptotic result can be established from Corollary 6.5.1 and the λ-criterion for testing block-diagonality or equivalently the independence of subvectors in a p-variate Gaussian population. The resulting chisquare variable will have 2∑j δ j degrees of freedom where δ j is as defined in Corollary 6.5.1 for the second parameter of the real scalar type-1 beta distribution. Referring to (6.8.10), we have
Accordingly, the number of degrees of freedom of the resulting chisquare is \(2[\sum _{j=1}^k\frac {p_j(p-p_j)}{4}]=\sum _{j=1}^k\frac {p_j(p-p_j)}{2}\) in the real case. It can also be observed that the number of restrictions imposed by the null hypothesis H o is obtained by first setting all the off-diagonal elements of Σ = Σ′ equal to zero and then subtracting the number of off-diagonal elements in the k diagonal blocks, which produces \(\frac {p(p-1)}{2}-\sum _{j=1}^k\frac {p_j(p_j-1)}{2}=\sum _{j=1}^k\frac {p_j(p-p_j)}{2}\). In the complex case, the number of degrees of freedom will be twice that obtained for the real case, the chisquare variable remaining a real scalar chisquare random variable. This is now stated as a theorem.
Theorem 6.8.2
Consider the λ-criterion given in (6.8.1) in the real case and let the corresponding λ in the complex case be \(\tilde {\lambda }\) . Then \(-2\ln \lambda \to \chi ^2_{\delta }\) as n →∞ where n is the sample size and \(\delta = \sum _{j=1}^k\frac {p_j(p-p_j)}{2},\) which is also the number of restrictions imposed by H o . Analogously, in the complex case, \(-2\ln \tilde {\lambda }\to \chi ^2_{\tilde {\delta }}\) as n →∞, where the chisquare variable remains a real scalar chisquare random variable, \(\tilde {\delta }=\sum _{j=1}^kp_j(p-p_j)\) and n denotes the sample size.
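The large-sample test of block-diagonality in the real case can be sketched as follows (our own illustration, not from the original text; NumPy and SciPy are assumed and the function name is ours): with \(u_4=|S|/(|S_{11}|\cdots |S_{kk}|)\) and \(\lambda =u_4^{n/2}\), refer \(-2\ln \lambda =-n\ln u_4\) to a chisquare distribution with \(\sum _j p_j(p-p_j)/2\) degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)

def block_diagonality_test(X, blocks):
    """LR test that Sigma is block diagonal for the given partition:
    -2 ln lambda = -n ln u4 is asymptotically chi-square with
    sum_j p_j (p - p_j) / 2 degrees of freedom (Theorem 6.8.2, real case)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    denom = 1.0
    for idx in blocks:
        denom *= np.linalg.det(S[np.ix_(idx, idx)])
    u4 = np.linalg.det(S) / denom
    stat = -n * np.log(u4)
    df = sum(len(idx) * (p - len(idx)) for idx in blocks) // 2
    return stat, df, chi2.sf(stat, df)

X = rng.normal(size=(80, 5))   # independent components: H_o holds
blocks = [np.array([0, 1]), np.array([2, 3, 4])]
stat, df, pval = block_diagonality_test(X, blocks)   # here df = 6
```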
6.9. Hypothesis that the Mean Value and Covariance Matrix are Given
Consider a real p-variate Gaussian population X j ∼ N p(μ, Σ), Σ > O, and a simple random sample X 1, …, X n of iid observations from this population. Let the sample mean and the sample sum of products matrix be denoted by \(\bar {X}\) and S, respectively. Consider the hypothesis H o : μ = μ o, Σ = Σ o where μ o and Σ o are specified. Let us examine the likelihood ratio test for testing H o and obtain the resulting λ-criterion. Let the parameter space be Ω = {(μ, Σ)|Σ > O, −∞ < μ j < ∞, j = 1, …, p, μ′ = (μ 1, …, μ p)}. Let the joint density of X 1, …, X n be denoted by L. Then, as previously obtained, the maximum value of L is
and the maximum under H o is
Thus,
We reject H o for small values of λ. Since the exponential part dominates the polynomial part, small values of λ correspond to large values of the exponent, excluding the factor (−1), that is, to large values of \(\sum _{j=1}^n(X_j-\mu _o)'\varSigma _o^{-1}(X_j-\mu _o)\sim \chi ^2_{np}\) since \((X_j-\mu _o)'\varSigma _o^{-1}(X_j-\mu _o)\overset {iid}{\sim } \chi ^2_p\) for each j. Hence the criterion consists of
with
Let us determine the h-th moment of λ for an arbitrary h. Note that
Since λ contains S and \(\bar {X}\) and these quantities are independently distributed, we can integrate out the part containing S over a Wishart density having m = n − 1 degrees of freedom and the part containing \(\bar {X}\) over the density of \(\bar {X}\). Thus, for m = n − 1,
Under H o, the integral over \(\bar {X}\) gives
From (i) and (ii), we have
The inversion of this expression is quite involved due to branch points. Let us examine the asymptotic case as n →∞. On expanding the gamma functions by making use of the version of Stirling’s asymptotic approximation formula for gamma functions given in (6.5.14), namely \(\varGamma (z+\eta )\approx \sqrt {2\pi }z^{z+\eta -\frac {1}{2}}{\mathrm{e}}^{-z}\) for |z|→∞ and η bounded, we have
Thus, as n →∞, it follows from (6.9.6) and (iii) that
which implies that, asymptotically, \(-2\ln \lambda \) has a real scalar chisquare distribution with \(p+\frac {p(p+1)}{2}\) degrees of freedom in the real Gaussian case. Hence the following result:
Theorem 6.9.1
Given a N p(μ, Σ), Σ > O, population, consider the hypothesis H o : μ = μ o, Σ = Σ o where μ o and Σ o are specified. Let λ denote the λ-criterion for testing this hypothesis. Then, in the real case, \(-2\ln \lambda \to \chi ^2_{\delta }\) as n →∞ where \(\delta =p+\frac {p(p+1)}{2}\) and, in the corresponding complex case, \(-2\ln \lambda \to \chi ^2_{\delta _1}\) as n →∞ where δ 1 = 2p + p(p + 1), the chisquare variable remaining a real scalar chisquare random variable.
Note 6.9.1
In the real case, observe that the hypothesis H o : μ = μ o, Σ = Σ o imposes p restrictions on the μ parameters and \(\frac {p(p+1)}{2}\) restrictions on the Σ parameters, for a total of \(p+\frac {p(p+1)}{2}\) restrictions, which corresponds to the degrees of freedom for the asymptotic chisquare distribution in the real case. In the complex case, there are twice as many restrictions.
Example 6.9.1
Consider the real trivariate Gaussian distribution N 3(μ, Σ), Σ > O and the hypothesis H o : μ = μ o, Σ = Σ o where μ o, Σ o and an observed sample of size 5 are as follows:
Now,
and
Note that, in this example, n = 5, p = 3 and np = 15. Letting the significance level of the test be α = 0.05, H o is not rejected since \(14.04< \chi ^2_{15,\,0.05}=25\).
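As a numerical sketch of this criterion (NumPy assumed; the sample, μ o and Σ o below are illustrative placeholders, not the data of Example 6.9.1):

```python
import numpy as np

# Sketch of the chisquare criterion of Sect. 6.9 for H_o: mu = mu_o, Sigma = Sigma_o.
# mu_o, Sigma_o and the observations are hypothetical illustrative values.
mu_o = np.array([1.0, 0.0, -1.0])
Sigma_o = np.array([[2.0, 0.5, 0.0],
                    [0.5, 1.0, 0.3],
                    [0.0, 0.3, 1.5]])
X = np.array([[1.2, 0.1, -0.8],
              [0.7, -0.3, -1.1],
              [1.5, 0.4, -0.9],
              [0.9, 0.0, -1.3],
              [1.1, -0.2, -0.7]])   # n = 5 observations on a trivariate vector
n, p = X.shape

Sigma_o_inv = np.linalg.inv(Sigma_o)
D = X - mu_o                          # deviations from the hypothesized mean
# z = sum_j (X_j - mu_o)' Sigma_o^{-1} (X_j - mu_o) ~ chi^2_{np} under H_o
z = float(np.einsum('ij,jk,ik->', D, Sigma_o_inv, D))

crit = 25.0                           # tabulated chi^2_{15, 0.05} quoted in the chapter
reject = z > crit                     # reject H_o for large values of z
```

Here np = 15 as in the example, and 25 is the tabulated value \(\chi ^2_{15,\,0.05}\) quoted above.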
6.10. Testing Hypotheses on Linear Regression Models or Linear Hypotheses
Let the p × 1 real vector X j have an expected value μ and a covariance matrix Σ > O for j = 1, …, n, and the X j’s be independently distributed. Let X j, μ, Σ be partitioned as follows where x 1j, μ 1 and σ 11 are 1 × 1, μ (2), Σ 21 are (p − 1) × 1, \(\varSigma _{12}=\varSigma _{21}^{\prime }\) and Σ 22 is (p − 1) × (p − 1):
If the conditional expectation of x 1j, given X (2)j is linear in X (2)j, then omitting the subscript j since the X j’s are iid, it was established in Eq. (3.3.5) that
When the regression is linear, the best linear predictor of x 1 in terms of X (2) will be of the form
Then, by appealing to properties of the conditional expectation and conditional variance, it was shown in Chap. 3 that \(\beta '=\varSigma _{12}\varSigma _{22}^{-1}\). Whether X (2) is random or the predictor is regarded as a function of preassigned values of X (2), hypothesizing that the best predictor does not actually depend on X (2), that is, that β = O, amounts to testing whether \(\varSigma _{12}\varSigma _{22}^{-1}=O\). Noting that Σ 22 > O since Σ > O, the null hypothesis thus reduces to H o : Σ 12 = O. If the original population X is p-variate real Gaussian, this hypothesis is then equivalent to testing the independence of x 1 and X (2). Actually, this has already been discussed in Sect. 6.8.2 for the case of k = 2, and is also tantamount to testing whether the population multiple correlation ρ 1.(2,…,p) = 0. Assuming that the population is Gaussian and letting \(u=\lambda ^{\frac {2}{n}}\) where λ is the lambda criterion, \(u\sim \mbox{type-1 beta}(\frac {n-p}{2},~\frac {p-1}{2})\) under the null hypothesis; this was established in Theorem 6.8.1 for p 1 = 1 and p 2 = p − 1. Then, \(v=\frac {u}{1-u}\sim \mbox{ type-2 beta }(\frac {n-p}{2}, ~\frac {p-1}{2})\), that is, \(v\sim \frac {n-p}{p-1}F_{n-p,~p-1}\) or \(\frac {(p-1)}{(n-p)}\frac {u}{1-u}\sim F_{n-p,~p-1}\). Hence, in order to test H o : β = O,
The test statistic u is of the form
where S is partitioned as Σ is, the submatrix s 11 being 1 × 1 and S 22, (p − 1) × (p − 1). Observe that the number of parameters being restricted by the hypothesis Σ 12 = O is p 1 p 2 = 1(p − 1) = p − 1. Hence as n →∞, the null distribution of \(-2\ln \lambda \) is a real scalar chisquare having p − 1 degrees of freedom. Thus, the following result:
Theorem 6.10.1
Let the p × 1 vector X j be partitioned into the subvectors x 1j of order 1 and X (2)j of order p − 1. Let the regression of x 1j on X (2)j be linear in X (2)j , that is, E[x 1j|X (2)j] − E(x 1j) = β′(X (2)j − E(X (2)j)). Consider the hypothesis H o : β = O. Let X j ∼ N p(μ, Σ), Σ > O, for j = 1, …, n, the X j ’s being independently distributed, and let λ be the λ-criterion for testing this hypothesis. Then, as n →∞, \(-2\ln \lambda \to \chi ^2_{p-1}\).
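The statistics u, v and the associated F-form can be sketched as follows (NumPy assumed; the sample below is illustrative, not the chapter's). The last line also verifies numerically the identity u = 1 − r² discussed in Note 6.10.1:

```python
import numpy as np

# Sketch of the u-statistic of Theorem 6.10.1 for H_o: beta = O (equivalently
# Sigma_12 = O); the sample is an illustrative placeholder.
rng = np.random.default_rng(1)
n, p = 12, 4
X = rng.standard_normal((n, p))

Xbar = X.mean(axis=0)
S = (X - Xbar).T @ (X - Xbar)        # sample sum of products matrix

s11 = S[0, 0]                        # 1 x 1 block of S
S22 = S[1:, 1:]                      # (p-1) x (p-1) block of S
u = np.linalg.det(S) / (s11 * np.linalg.det(S22))   # type-1 beta((n-p)/2, (p-1)/2) under H_o
v = u / (1.0 - u)                    # type-2 beta((n-p)/2, (p-1)/2) under H_o
F = ((p - 1) / (n - p)) * v          # ~ F_{n-p, p-1} under H_o, as in Sect. 6.10

# squared sample multiple correlation r^2_{1.(2,...,p)} = S_12 S_22^{-1} S_21 / s_11
r2 = float(S[0:1, 1:] @ np.linalg.inv(S22) @ S[1:, 0:1] / s11)
```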
Example 6.10.1
Let the population be N 3(μ, Σ), Σ > O, and the observed sample of size n = 5 be
The resulting sample average \(\bar {X}\) and deviation vectors are then
Letting
so that the test statistic is
Let us test H o at the significance level α = 0.05. In that case, the critical value, which is available from F tables, is F n−p,p−1,α = F 2,2,0.05 = 19. Since the observed value of v is 0.164 < 19, H o is not rejected.
Note 6.10.1
Observe that
where r 1.(2,3) is the sample multiple correlation between the first component of X j ∼ N 3(μ, Σ), Σ > O, and the other two components of X j. If the population covariance matrix Σ is similarly partitioned, that is,
then, the population multiple correlation coefficient is ρ 1.(2,3) where
Thus, if \(\varSigma _{12}=\varSigma _{21}^{\prime }=O\), ρ 1.(2,3) = 0 and conversely, since σ 11 > 0 and \(\varSigma _{22}>O\Rightarrow \varSigma _{22}^{-1}>O\). The regression coefficient β being equal to the transpose of \(\varSigma _{12}\varSigma _{22}^{-1}\), Σ 12 = O also implies that the regression coefficient β = O and conversely. Accordingly, the hypothesis that the regression coefficient vector β = O is equivalent to hypothesizing that the population multiple correlation ρ 1.(2,…,p) = 0, which also implies the hypothesis that the two subvectors are independently distributed in the multivariate normal case, or that the covariance matrix Σ 12 = O. The only difference is that the test on regression coefficients is carried out in the conditional space, whereas testing the independence of the subvectors or whether the population multiple correlation equals zero is carried out in the entire space. The numerical example included in this section also illustrates the main result presented in Sect. 6.8.1 in connection with testing whether a population multiple correlation coefficient is equal to zero.
6.10.1. A simple linear model
Consider a linear model of the following form where a real scalar variable y is estimated by a linear function of pre-assigned real scalar variables z 1, …, z q:
where y 1, …, y n are n observations on y, z i1, z i2, …, z in, i = 1, …, q, are preassigned values on z 1, …, z q, and β o, β 1, …, β q are unknown parameters. The random components e j, j = 1, …, n, are the corresponding sum total contributions coming from all unknown factors. There are two possibilities with respect to this model: β o = 0 or β o≠0. If β o = 0, β o is omitted in model (i) and we let y j = x j. If β o≠0, the model is modified by taking \(x_j=y_j-\bar {y}\) where \( \bar {y}=\frac {1}{n}(y_1+\cdots +y_n)\), then becoming
for some error term 𝜖 j, where \(\bar {z}_i=\frac {1}{n}(z_{i1}+\cdots +z_{in}), \ i=1,\ldots ,q\). Letting \(Z_j^{\prime }=(z_{1j},\ldots ,z_{qj})\) if β o = 0 and \(Z_j^{\prime }=(z_{1j}-\bar {z}_1,\ldots ,z_{qj}-\bar {z}_q)\), otherwise, equation (ii) can be written in vector/matrix notation as follows:
Letting
The least squares minimum is thus available by differentiating 𝜖′𝜖 with respect to β, equating the resulting expression to a null vector and solving, which will produce a single critical point that corresponds to the minimum as the maximum occurs at + ∞:
that is,
Since the z ij’s are preassigned quantities, it can be assumed without any loss of generality that (ZZ′) is nonsingular, and thereby that (ZZ′)−1 exists, so that the least squares minimum, usually denoted by s 2, is available by substituting \(\hat {\beta }\) for β in 𝜖′𝜖. Then, at \(\beta =\hat {\beta }\),
where I − Z′(ZZ′)−1 Z is idempotent and of rank (n − 1) − q. Observe that if β o ≠ 0 in (i) and we had proceeded without eliminating β o, then β would have been of order (q + 1) × 1 and I − Z′(ZZ′)−1 Z, of rank n − (q + 1) = n − 1 − q, whereas if β o≠0 and we had eliminated β o from the model, then the rank of I − Z′(ZZ′)−1 Z would have been (n − 1) − q, that is, unchanged, since \(\sum _{j=1}^n(x_j-\bar {x})^2=X'[I-\frac {1}{n}JJ']X,~ J'=(1,\ldots ,1)\), and the rank of \(I-\frac {1}{n}JJ'\) is n − 1.
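The least squares solution \(\hat {\beta }=(ZZ')^{-1}ZX\) and the minimum s² can be sketched as follows (NumPy assumed; the preassigned values, the true β and the error scale below are illustrative):

```python
import numpy as np

# Minimal least squares sketch for X = Z' beta + eps, with Z of order q x n
# holding the centered preassigned values; all numbers are illustrative.
rng = np.random.default_rng(7)
q, n = 2, 10
Z = rng.standard_normal((q, n))
Z = Z - Z.mean(axis=1, keepdims=True)          # centered rows, as when beta_o != 0
beta_true = np.array([1.5, -0.5])              # hypothetical true coefficients
X = Z.T @ beta_true + 0.1 * rng.standard_normal(n)
X = X - X.mean()                               # x_j = y_j - ybar

ZZt_inv = np.linalg.inv(Z @ Z.T)
beta_hat = ZZt_inv @ Z @ X                     # hat{beta} = (ZZ')^{-1} Z X
P = Z.T @ ZZt_inv @ Z                          # projector Z'(ZZ')^{-1}Z, idempotent, trace q
M = np.eye(n) - P                              # residual projector
s2 = float(X @ M @ X)                          # least squares minimum s^2 = X'(I - P)X
```

On the centered data, s²∕σ² carries (n − 1) − q degrees of freedom, in line with the rank discussion above.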
Some distributional assumptions on 𝜖 j are required in order to test hypotheses on β. Let 𝜖 j ∼ N 1(0, σ 2), σ 2 > 0, j = 1, …, n, be independently distributed. Then the x j ∼ N 1(β′Z j, σ 2), j = 1, …, n, are independently but not identically distributed, since the mean value depends on j. Under the normality assumption for the 𝜖 j’s, it can readily be seen that the least squares estimators of β and σ 2 coincide with the maximum likelihood estimators. It can also be observed that σ 2 is estimated by \(\frac {s^2}{n}\) where n is the sample size. In this simple linear regression context, the parameter space Ω = {(β, σ 2)|σ 2 > 0}. Thus, under the normality assumption, the maximum of the likelihood function L is given by
Under the hypothesis H o : β = O or β 1 = 0 = ⋯ = β q, the least squares minimum, usually denoted as \(s_o^2\), is X′X and, assuming normality, the maximum of the likelihood function under H o is the following:
Thus, the λ-criterion is
where
with the matrices Z′(ZZ′)−1 Z and I − Z′(ZZ′)−1 Z being idempotent, mutually orthogonal, and of ranks q and (n − 1) − q, respectively. We can interpret \(s_o^2-s^2\) as the sum of squares due to the hypothesis and s 2 as the residual part. Under the normality assumption, \(s_o^2-s^2\) and s 2 are independently distributed in light of the independence of quadratic forms discussed in Sect. 3.4.1; moreover, their representations as quadratic forms in idempotent matrices of ranks q and (n − 1) − q imply that \(\frac {s_o^2-s^2}{\sigma ^2}\sim \chi _q^2\) and \(\frac {s^2}{\sigma ^2}\sim \chi _{(n-1)-q}^2\). Accordingly, under the null hypothesis,
that is, an F-statistic having q and n − 1 − q degrees of freedom. Thus, we reject H o for small values of λ or equivalently for large values of u 2 or large values of F q,n−1−q. Hence, the following criterion:
A detailed discussion of the real scalar variable case is provided in Mathai and Haubold (2017).
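The F-criterion for H o : β = O described above can be sketched as follows (NumPy assumed; the data are illustrative, and in practice the critical value would be read from F q,n−1−q tables):

```python
import numpy as np

# Sketch of the F-criterion for H_o: beta = O in the model X = Z' beta + eps;
# all numbers are illustrative placeholders.
rng = np.random.default_rng(3)
q, n = 2, 10
Z = rng.standard_normal((q, n))
Z = Z - Z.mean(axis=1, keepdims=True)   # centered preassigned values
X = rng.standard_normal(n)
X = X - X.mean()

P = Z.T @ np.linalg.inv(Z @ Z.T) @ Z
s2 = float(X @ (np.eye(n) - P) @ X)     # residual sum of squares (least squares minimum)
so2 = float(X @ X)                      # minimum under H_o: beta = O
u2 = (so2 - s2) / s2                    # sum of squares due to the hypothesis over residual
F = ((n - 1 - q) / q) * u2              # ~ F_{q, n-1-q} under H_o; reject for large F
```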
6.10.2. Hypotheses on individual parameters
Denoting the expected value of (⋅) by E[(⋅)], it follows from (6.10.4) that
Under the normality assumption on x j, we have \(\hat {\beta }\sim N_q(\beta ,\sigma ^2(ZZ')^{-1})\). Letting the (r, r)-th diagonal element of (ZZ′)−1 be b rr, then \(\hat {\beta }_r\), the estimator of the r-th component of the parameter vector β, is distributed as \(\hat {\beta }_r\sim N_1(\beta _r,\sigma ^2b_{rr})\), so that
where t n−1−q denotes a Student-t distribution having n − 1 − q degrees of freedom and \(\hat {\sigma }^2=\frac {s^2}{n-1-q}\) is an unbiased estimator for σ 2. On writing s 2 in terms of 𝜖, it is easily seen that E[s 2] = (n − 1 − q)σ 2 where s 2 is the least squares minimum in the entire parameter space Ω. Thus, one can test hypotheses on β r and construct confidence intervals for that parameter by means of the Student-t statistic specified in (6.10.9) or its square \(t^2_{n-1-q}\) which has an F distribution having 1 and n − 1 − q degrees of freedom, that is, \(t^2_{n-1-q}\sim F_{1,n-1-q}\).
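The Student-t statistic of (6.10.9) can be sketched as follows (NumPy assumed; the data and the hypothetical true coefficients are illustrative):

```python
import numpy as np

# Sketch of the t-statistic (6.10.9) for H_o: beta_r = 0 on a single coefficient;
# the data and beta_true are illustrative placeholders.
rng = np.random.default_rng(5)
q, n = 2, 12
Z = rng.standard_normal((q, n))
Z = Z - Z.mean(axis=1, keepdims=True)          # centered preassigned values
beta_true = np.array([2.0, 0.0])               # hypothetical true coefficients
X = Z.T @ beta_true + 0.2 * rng.standard_normal(n)
X = X - X.mean()

B = np.linalg.inv(Z @ Z.T)
beta_hat = B @ Z @ X                           # least squares / ML estimator of beta
s2 = float(X @ (np.eye(n) - Z.T @ B @ Z) @ X)  # least squares minimum
sigma2_hat = s2 / (n - 1 - q)                  # unbiased estimator of sigma^2

r = 0                                          # test H_o: beta_r = 0 for the first coefficient
b_rr = B[r, r]                                 # (r, r)-th diagonal element of (ZZ')^{-1}
t = (beta_hat[r] - 0.0) / np.sqrt(sigma2_hat * b_rr)   # ~ t_{n-1-q} under H_o
```

As noted above, t² is then an F 1,n−1−q statistic.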
Example 6.10.2
Let us consider a linear model of the following form:
where the z ij’s are preassigned numbers of the variable z i. Let us take n = 5 and q = 2, so that the sample is of size 5 and, excluding β o, the model has two parameters. Let the observations on y and the preassigned values on the z i’s be the following:
The averages on y, z 1 and z 2 are then
and, in terms of deviations, the model becomes
That is,
When minimizing 𝜖′𝜖 = (X − Z′β)′(X − Z′β), we determined that \(\hat {\beta }\), the least squares estimate of β, the least squares minimum s 2 and the sum of squares due to β, namely \(s_o^2-s^2\), could be expressed as
Let us evaluate those quantities:
Then,
The test statistics u 1 and u 2 and their observed values are the following:
Letting the significance level be α = 0.05, the required tabulated critical value is F q,n−1−q,α = F 2,2,0.05 = 19. Since 0.72 < 19, the hypothesis H o : β = O is not rejected. Thus, we will not proceed to test individual hypotheses on the regression coefficients β 1 and β 2. For tests on general linear models, refer for instance to Mathai (1971).
6.11. Problem Involving Two or More Independent Gaussian Populations
Consider k independent p-variate real normal populations X j ∼ N p(μ (j), Σ), Σ > O, j = 1, …, k, having the same nonsingular covariance matrix Σ but possibly different mean values. We consider the problem of testing hypotheses on linear functions of the mean values. Let b = a 1 μ (1) + ⋯ + a k μ (k) where a 1, …, a k are real scalar constants, and let the null hypothesis be H o : b = b o (given), which means that b o and the a j’s, j = 1, …, k, are specified. It is also assumed that Σ is known. Suppose that simple random samples of sizes n 1, …, n k from these k independent normal populations can be secured, and let the sample values be X jq, q = 1, …, n j, where \(X_{j1},\ldots ,X_{jn_j}\) are iid as N p(μ (j), Σ), Σ > O. Let the sample averages be denoted by \(\bar {X}_j=\frac {1}{n_j}\sum _{q=1}^{n_j}X_{jq},~ j=1,\ldots ,k\). Consider the test statistic \(U_k=a_1\bar {X}_1+\cdots +a_k\bar {X}_k\). Since the populations are independent and U k is a linear function of independent vector normal variables, U k is normally distributed with the mean value b = a 1 μ (1) + ⋯ + a k μ (k) and covariance matrix \(\frac {1}{n}\varSigma \), where \(\frac {1}{n}=(\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\), so that \(\sqrt {n}\,\varSigma ^{-\frac {1}{2}}(U_k-b)\sim N_p(O,I)\). Then, under the hypothesis H o : b = b o (given), which is being tested against the alternative H 1 : b≠b o, the test criterion is obtained by proceeding as was done in the single population case. Thus, the test statistic is \(z=n(U_k-b_o)'\varSigma ^{-1}(U_k-b_o)\sim \chi _{p}^2\) and the criterion will be to reject the null hypothesis for large values of z. Accordingly, the criterion is
In particular, suppose that we wish to test the hypothesis H o : δ = μ (1) − μ (2) = δ o, such as δ o = 0 as is often the case, against the natural alternative. In this case, when δ o = 0, the null hypothesis is that the mean value vectors are equal, that is, μ (1) = μ (2), and the test statistic is \( z=n(\bar {X}_1-\bar {X}_2)'\varSigma ^{-1}(\bar {X}_1-\bar {X}_2)\sim \chi _{p}^2\) with \(\frac {1}{n}=\frac {1}{n_1}+\frac {1}{n_2}\), the test criterion being
For a numerical example, the reader is referred to Example 6.2.3. One can also determine the power of the test, that is, the probability of rejecting H o : δ = μ (1) − μ (2) = δ o, under an alternative hypothesis, in which case z follows a noncentral chisquare distribution with p degrees of freedom and non-centrality parameter \(\lambda =\frac {1}{2}\frac {n_1n_2}{n_1+n_2}(\delta -\delta _o)'\varSigma ^{-1}(\delta -\delta _o),~ \delta =\mu _{(1)}-\mu _{(2)}\), where n 1 and n 2 are the sample sizes. Under the null hypothesis, the non-centrality parameter λ is equal to zero. The power is given by
When the population covariance matrices are identical and the common covariance matrix is unknown, one can also construct a statistic for testing hypotheses on linear functions of the mean value vectors by making use of steps parallel to those employed in the single population case, with the resulting criterion being based on Hotelling’s T 2 statistic for testing H o : μ (1) = μ (2).
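The z-statistic and the non-centrality parameter for the two-sample case can be sketched as follows (NumPy assumed; Σ, the sample sizes and the alternative δ below are illustrative):

```python
import numpy as np

# Sketch of the test of Sect. 6.11 for H_o: mu_(1) - mu_(2) = delta_o with Sigma
# known; all numbers are illustrative placeholders.
Sigma = np.array([[2.0, 0.4], [0.4, 1.0]])   # known common covariance matrix
n1, n2, p = 8, 10, 2
rng = np.random.default_rng(11)
X1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=n1)
X2 = rng.multivariate_normal([0.0, 0.0], Sigma, size=n2)

Xbar1, Xbar2 = X1.mean(axis=0), X2.mean(axis=0)
n = 1.0 / (1.0 / n1 + 1.0 / n2)              # 1/n = 1/n1 + 1/n2
delta_o = np.zeros(p)
d = (Xbar1 - Xbar2) - delta_o
z = float(n * d @ np.linalg.inv(Sigma) @ d)  # ~ chi^2_p under H_o; reject for large z

# Non-centrality parameter under a hypothetical alternative delta = mu_(1) - mu_(2):
delta = np.array([1.0, 0.0])
lam = 0.5 * (n1 * n2 / (n1 + n2)) * float(
    (delta - delta_o) @ np.linalg.inv(Sigma) @ (delta - delta_o))
```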
6.11.1. Equal but unknown covariance matrices
Let us consider the same procedure as in Sect. 6.11 to test a hypothesis on a linear function b = a 1 μ (1) + ⋯ + a k μ (k) where a 1, …, a k are known real scalar constants and μ (j), j = 1, …, k, are the population mean values. We wish to test the hypothesis H o : b = b o (given), in the sense that b o and a 1, …, a k are specified. Let \(U_k=a_1\bar {X}_1+\cdots +a_k\bar {X}_k\) as previously defined. Then, E[U k] = b and \({\mathrm{Cov}}(U_k)=(\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\varSigma \), where \((\frac {a_1^2}{n_1}+\cdots +\frac {a_k^2}{n_k})\equiv \frac {1}{n}\) for some symbol n. The common covariance matrix Σ has the MLE \(\frac {1}{n_1+\cdots +n_k}(S_1+\cdots +S_k)\) where S j is the sample sum of products matrix for the j-th Gaussian population. It has been established that S = S 1 + ⋯ + S k has a Wishart distribution with (n 1 − 1) + ⋯ + (n k − 1) = N − k, N = n 1 + ⋯ + n k, degrees of freedom, that is,
Then, when Σ is unknown, it follows from a derivation parallel to that provided in Sect. 6.3 for the single population case that
that is, w has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2},~ \frac {N-k-p}{2})\). Letting \(w=\frac {p}{N-k-p}F\), this F is an F-statistic with p and N − k − p degrees of freedom.
Theorem 6.11.1
Let U k, n, N, b, S be as defined above. Then w = n(U k − b)′S −1 (U k − b) has a real scalar type-2 beta distribution with the parameters \((\frac {p}{2}, ~\frac {N-k-p}{2})\) . Letting \(w=\frac {p}{N-k-p}F\) , this F is an F-statistic with p and N − k − p degrees of freedom.
Hence for testing the hypothesis H o : b = b o (given), the criterion is the following:
Note that by exploiting the connection between type-1 and type-2 real scalar beta random variables, one can obtain a number of properties of this F-statistic.
This situation has already been covered in Theorem 6.3.4 for the case k = 2.
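For k = 2 with a 1 = 1, a 2 = −1, that is, for H o : μ (1) = μ (2) with a common unknown Σ, Theorem 6.11.1 can be sketched as follows (NumPy assumed; the samples are illustrative):

```python
import numpy as np

# Sketch of Theorem 6.11.1 for k = 2, a_1 = 1, a_2 = -1, common Sigma unknown;
# the samples are illustrative placeholders.
rng = np.random.default_rng(2)
p, n1, n2 = 3, 10, 12
N, k = n1 + n2, 2
X1 = rng.standard_normal((n1, p))
X2 = rng.standard_normal((n2, p))

Xbar1, Xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S = ((X1 - Xbar1).T @ (X1 - Xbar1)) + ((X2 - Xbar2).T @ (X2 - Xbar2))  # ~ W_p(N-k, Sigma)
n = 1.0 / (1.0 / n1 + 1.0 / n2)
U = Xbar1 - Xbar2                            # U_k with a_1 = 1, a_2 = -1
w = float(n * U @ np.linalg.inv(S) @ U)      # type-2 beta(p/2, (N-k-p)/2) under H_o
F = ((N - k - p) / p) * w                    # ~ F_{p, N-k-p}; reject for large F
```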
6.12. Equality of Covariance Matrices in Independent Gaussian Populations
Let X j ∼ N p(μ (j), Σ j), Σ j > O, j = 1, …, k, be independently distributed real p-variate Gaussian populations. Consider simple random samples of sizes n 1, …, n k from these k populations, whose sample values, denoted by X jq, q = 1, …, n j, are iid as X j, j = 1, …, k. The sample sums of products matrices, denoted by S 1, …, S k, respectively, are independently distributed as Wishart matrix random variables with n j − 1, j = 1, …, k, degrees of freedom. The joint density of all the sample values is then given by
the MLE’s of μ (j) and Σ j being \(\hat {\mu _{(j)}}=\bar {X}_j\) and \( \hat {\varSigma }_j=\frac {1}{n_j}S_j\). The maximum of L in the entire parameter space Ω is
Let us test the hypothesis of equality of covariance matrices:
where Σ is unknown. Under this null hypothesis, the MLE of μ (j) is \(\bar {X}_j\) and the MLE of the common Σ is \(\frac {1}{N}(S_1+\cdots +S_k)=\frac {1}{N}S, ~N=n_1+\cdots +n_k,~ S=S_1+\cdots +S_k\). Thus, the maximum of L under H o is
and the λ-criterion is the following:
Let us consider the h-th moment of λ for an arbitrary h. Letting \(c=\frac {N^{\frac {Np}{2}}}{\Big \{\prod _{j=1}^kn_j^{\frac {n_jp}{2}}\Big \}},\)
The factor causing a difficulty, namely \(|S|{ }^{-\frac {Nh}{2}}\), will be replaced by an equivalent integral. Letting Y > O be a real p × p positive definite matrix, we have the identity
where
Thus, once (6.12.4) is substituted in (6.12.3), λ h splits into products involving S j, j = 1, …, k; this enables one to integrate out over the densities of S j, which are Wishart densities with m j = n j − 1 degrees of freedom. Noting that the exponent involving S j is \(-\frac {1}{2}{\mathrm{tr}}(\varSigma _j^{-1}S_j)-{\mathrm{tr}}(YS_j)=-\frac {1}{2}{\mathrm{tr}}[S_j(\varSigma _j^{-1}+2Y)]\), the integral over the Wishart density of S j gives the following:
Thus, on substituting (iv) in E[λ h], we have
which is the non-null h-th moment of λ. The h-th null moment is available when Σ 1 = ⋯ = Σ k = Σ. In the null case,
Then, substituting (v) in (6.12.5) and integrating out over Y produces
Observe that when h = 0, E[λ h|H o] = 1. For h = s − 1 where s is a complex parameter, we have the Mellin transform of the density of λ, denoted by f(λ), which can be expressed as follows in terms of an H-function:
and zero elsewhere, where the H-function is defined in Sect. 5.4.3, more details being available from Mathai and Saxena (1978) and Mathai et al. (2010). Since the coefficients of \(\frac {h}{2}\) in the gammas, that is, n 1, …, n k and N, are all positive integers, one can expand all gammas by using the multiplication formula for gamma functions, and then, f(λ) can be expressed in terms of a G-function as well. It may be noted from (6.12.5) that for obtaining the non-null moments, and thereby the non-null density, one has to integrate out Y in (6.12.5). This has not yet been worked out for a general k. For k = 2, one can obtain a series form in terms of zonal polynomials for the integral in (6.12.5). The rather intricate derivations are omitted.
6.12.1. Asymptotic behavior
We now investigate the asymptotic behavior of \(-2\ln \lambda \) as n j →∞, j = 1, …, k, N = n 1 + ⋯ + n k. On expanding the real matrix-variate gamma functions in the gamma ratio involving h in (6.12.6), we have
excluding the factor containing π. Letting \(\frac {n_j}{2}(1+h)\to \infty ,~ j=1,\ldots ,k,\) and \(\frac {N}{2}(1+h)\to \infty \) as n j →∞, j = 1, …, k, with N →∞, we now express all the gamma functions in (i) in terms of Stirling’s asymptotic formula. For the numerator, we have
and the denominator in (i) has the following asymptotic representation:
Now, expanding the gammas in the constant part \(\varGamma _p(\frac {N-k}{2})/\prod _{j=1}^k\varGamma _p(\frac {n_j-1}{2})\) and then taking care of c h, we see that the factors containing π, the n j’s and N disappear leaving
Hence \(-2\ln \lambda \to \chi ^2_{(k-1)\frac {p(p+1)}{2}}\), and we have the following result:
Theorem 6.12.1
Consider the λ-criterion in (6.12.2) or the null density in (6.12.7). When n j →∞, j = 1, …, k, the asymptotic null density of \(-2\ln \lambda \) is a real scalar chisquare with \((k-1)\frac {p(p+1)}{2}\) degrees of freedom.
Observe that the number of parameters restricted by the null hypothesis H o : Σ 1 = ⋯ = Σ k = Σ, where Σ is unknown, is (k − 1) times the number of distinct parameters in Σ, namely \(\frac {p(p+1)}{2}\); this total of \((k-1)\frac {p(p+1)}{2}\) coincides with the number of degrees of freedom of the asymptotic chisquare distribution under H o.
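On the log scale, the λ-criterion of (6.12.2), namely \(\lambda =c\,\prod _{j=1}^k|S_j|^{\frac {n_j}{2}}/|S|^{\frac {N}{2}}\) with \(c=N^{\frac {Np}{2}}/\prod _{j=1}^kn_j^{\frac {n_jp}{2}}\), can be sketched as follows (NumPy assumed; the samples are illustrative):

```python
import numpy as np

# Sketch of the lambda-criterion of Sect. 6.12 for H_o: Sigma_1 = ... = Sigma_k,
# computed on the log scale for numerical stability; illustrative samples.
rng = np.random.default_rng(4)
p, sizes = 2, [15, 20, 18]                 # k = 3 populations
k, N = len(sizes), sum(sizes)

log_lam = 0.5 * N * p * np.log(N)          # log of the constant c, first part
S = np.zeros((p, p))
for nj in sizes:
    Xj = rng.standard_normal((nj, p))
    Xbar = Xj.mean(axis=0)
    Sj = (Xj - Xbar).T @ (Xj - Xbar)       # sample sum of products matrix S_j
    log_lam += 0.5 * nj * np.linalg.slogdet(Sj)[1] - 0.5 * nj * p * np.log(nj)
    S += Sj
log_lam -= 0.5 * N * np.linalg.slogdet(S)[1]

stat = -2.0 * log_lam                      # -> chi^2 with (k-1)p(p+1)/2 df as n_j -> infinity
df = (k - 1) * p * (p + 1) // 2
```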
6.13. Testing the Hypothesis that k Independent p-variate Real Gaussian Populations are Identical and Multivariate Analysis of Variance
Consider k independent p-variate real Gaussian populations X ij ∼ N p(μ (i), Σ i), Σ i > O, i = 1, …, k, and j = 1, …, n i, where the p × 1 vector X ij is the j-th sample value belonging to the i-th population, these samples (iid variables) being of sizes n 1, …, n k from these k populations. The joint density of all the sample values, denoted by L, can be expressed as follows:
where \(\bar {X}_i=\frac {1}{n_i}(X_{i1}+\cdots +X_{in_i}),~ i=1,\ldots ,k,\) and E[X ij] = μ (i), j = 1, …, n i. Then, letting N = n 1 + ⋯ + n k,
Consider the hypothesis H o : μ (1) = ⋯ = μ (k) = μ, Σ 1 = ⋯ = Σ k = Σ, where μ and Σ are unknown. This corresponds to the hypothesis of equality of these k populations. Under H o, the maximum likelihood estimator (MLE) of μ, denoted by \(\hat {\mu }\), is given by \(\hat {\mu }=\frac {1}{N}[n_1\bar {X}_1+\cdots +n_k\bar {X}_k]\) where N and \(\bar {X}_i\) are as defined above. As for the common Σ, its MLE is
where S i is the sample sum of products matrix for the i-th sample, observing that
Hence the maximum of the likelihood function under H o is the following:
where S = S 1 + ⋯ + S k. Therefore the λ-criterion is given by
6.13.1. Conditional and marginal hypotheses
For convenience, we may split λ into the product λ 1 λ 2 where λ 1 is the λ-criterion for the conditional hypothesis H o1 : μ (1) = ⋯ = μ (k) = μ given that Σ 1 = ⋯ = Σ k = Σ and λ 2 is the λ-criterion for the marginal hypothesis H o2 : Σ 1 = ⋯ = Σ k = Σ where μ and Σ are unknown. The conditional hypothesis H o1 is actually the null hypothesis that is usually tested in the multivariate analysis of variance (MANOVA) procedure. We will only consider H o1 since the marginal hypothesis H o2 has already been discussed in Sect. 6.12. When the Σ i’s are assumed to be equal, the common Σ is estimated by the MLE \(\frac {1}{N}(S_1+\cdots +S_k)\) where S i is the sample sum of products matrix in the i-th population. The common μ is estimated by \(\frac {1}{N}(n_1\bar {X}_1+\cdots +n_k\bar {X}_k)\). Accordingly, the λ-criterion for this conditional hypothesis is the following:
where \(S=S_1+\cdots +S_k,~ \hat {\mu }=\frac {1}{N}(n_1\bar {X}_1+\cdots +n_k\bar {X}_k),~ N=n_1+\cdots +n_k\). Note that the S i’s are independently Wishart distributed with n i − 1 degrees of freedom, that is, \(S_i\overset {ind}{\sim } W_p(n_i-1,~\varSigma )\), i = 1, …, k, and hence S ∼ W p(N − k, Σ). Let
Since Q only contains sample averages and the sample averages and the sample sum of products matrices are independently distributed, Q and S are independently distributed. Moreover, since we can write \(\bar {X}_i-\hat {\mu }\) as \((\bar {X}_i-\mu )-(\hat {\mu }-\mu )\), where μ is the common true mean value vector, without any loss of generality we can deem the \(\bar {X}_i\)’s to be independently \(N_p(O,\frac {1}{n_i}\varSigma )\) distributed, i = 1, …, k, and letting \(Y_i=\sqrt {n_i}\bar {X}_i\), one has \(Y_i\overset {iid}{\sim } N_p(O,\varSigma )\) under the hypothesis H o1. Now, observe that
where \(\sqrt {n_1}Y_1+\cdots +\sqrt {n_k}Y_k=(Y_1,\ldots ,Y_k)DJ\) with J being the k × 1 vector of unities, J′ = (1, …, 1), and \(D={\mathrm{diag}}(\sqrt {n_1},\ldots ,\sqrt {n_k})\). Thus, we can express Q as follows:
Let \(B=\frac {1}{N}DJJ'D\) and A = I − B. Then, observing that J′D 2 J = N, both B and A are idempotent matrices, where B is of rank 1 since the trace of B or equivalently the trace of \(\frac {1}{N}J'D^2J\) is equal to one, so that the trace of A, which is also its rank, is k − 1. Then, there exists an orthonormal matrix P, PP′ = I k, P′P = I k, such that \(PBP'=\begin{bmatrix}1&O'\\ O&O\end{bmatrix}\) where O is a (k − 1) × 1 null vector, O′ being its transpose. Letting (U 1, …, U k) = (Y 1, …, Y k)P′, the U i’s are still independently N p(O, Σ) distributed under H o1, so that
Thus, \(S+\sum _{i=1}^kn_i(\bar {X}_i-\hat {\mu })(\bar {X}_i-\hat {\mu })'\sim W_p(N-1,~\varSigma )\), which clearly is not distributed independently of S, as may be seen from the ratio in (6.13.2).
6.13.2. Arbitrary moments of λ 1
Given (6.13.2), we have
where \(|S+Q|{ }^{-\frac {Nh}{2}}\) will be replaced by the equivalent integral
with the p × p matrix T > O. Hence, the h-th moment of λ 1, for arbitrary h, is the following expected value:
We now evaluate (vii) by integrating out over the Wishart density of S and over the joint multinormal density for U 1, …, U k−1:
The integral over S is evaluated as follows:
for I + 2ΣT > O, \(\Re (\frac {N-k}{2}+\frac {Nh}{2})>\frac {p-1}{2}\). The integral over U 1, …, U k−1 is the following, denoted by δ:
where
since \(U_i^{\prime }TU_i\) is scalar; thus, the exponent becomes \(-\frac {1}{2}\sum _{i=1}^{k-1}U_i^{\prime }[\varSigma ^{-1}+2T]U_i\) and the integral simplifies to
Now the integral over T is the following:
Therefore,
for \(\Re (\frac {N-k}{2}+\frac {Nh}{2})>\frac {p-1}{2}\).
6.13.3. The asymptotic distribution of \(-2\ln \lambda _1\)
An asymptotic distribution of \(-2\ln \lambda _1\) as N →∞ can be derived from (6.13.3). First, on expanding the real matrix-variate gamma functions in (6.13.3), we obtain the following representation of the h-th null moment of λ 1:
Let us now express all the gamma functions in terms of Stirling’s asymptotic formula by taking \(\frac {N}{2}\to \infty \) in the constant part and \(\frac {N}{2}(1+h)\to \infty \) in the part containing h. Then,
Hence,
Thus, the following result:
Theorem 6.13.1
For the test statistic λ 1 given in (6.13.2), \(-2\ln \lambda _1\to \chi ^2_{(k-1)p}\) , that is, \(-2\ln \lambda _1 \) tends to a real scalar chisquare variable having (k − 1)p degrees of freedom as N →∞ with N = n 1 + ⋯ + n k , n j being the sample size of the j-th p-variate real Gaussian population.
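The statistic \(\lambda _1=(|S|/|S+Q|)^{\frac {N}{2}}\) of (6.13.2) and the chisquare approximation of Theorem 6.13.1 can be sketched as follows (NumPy assumed; the samples are illustrative):

```python
import numpy as np

# Sketch of the MANOVA criterion lambda_1 = (|S| / |S + Q|)^{N/2} of (6.13.2)
# for H_o1: mu_(1) = ... = mu_(k) given a common Sigma; illustrative samples.
rng = np.random.default_rng(9)
p, sizes = 3, [12, 15, 14]
k, N = len(sizes), sum(sizes)

samples = [rng.standard_normal((nj, p)) for nj in sizes]
xbars = [X.mean(axis=0) for X in samples]
S = sum((X - xb).T @ (X - xb) for X, xb in zip(samples, xbars))  # within: ~ W_p(N-k, Sigma)
mu_hat = sum(nj * xb for nj, xb in zip(sizes, xbars)) / N        # pooled mean estimate

# between-group matrix Q = sum_i n_i (xbar_i - mu_hat)(xbar_i - mu_hat)'
Q = sum(nj * np.outer(xb - mu_hat, xb - mu_hat) for nj, xb in zip(sizes, xbars))
log_lam1 = 0.5 * N * (np.linalg.slogdet(S)[1] - np.linalg.slogdet(S + Q)[1])
stat = -2.0 * log_lam1                     # -> chi^2_{(k-1)p} as N -> infinity
df = (k - 1) * p
```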
Under the marginal hypothesis H o2 : Σ 1 = ⋯ = Σ k = Σ where Σ is unknown, the λ-criterion is denoted by λ 2, and its h-th moment, which is available from (6.12.6) of Sect. 6.12, is given by
for \(\Re (\frac {n_j-1}{2}+\frac {n_jh}{2})>\frac {p-1}{2}, ~j=1,\ldots ,k\). Hence the h-th null moment of the λ criterion for testing the hypothesis H o of equality of the k independent p-variate real Gaussian populations is the following:
for \(\Re (\frac {n_j-1}{2}+\frac {n_jh}{2})>\frac {p-1}{2}, ~j=1,\ldots ,k, ~N=n_1+\cdots +n_k\), where c is the constant associated with the h-th moment of λ 2. Combining Theorems 6.13.1 and 6.12.1, the asymptotic distribution of \(-2\ln \lambda \) of (6.13.6) is a real scalar chisquare with \((k-1)p+(k-1)\frac {p(p+1)}{2}\) degrees of freedom. Thus, the following result:
Theorem 6.13.2
For the λ-criterion for testing the hypothesis of equality of k independent p-variate real Gaussian populations, \(-2\ln \lambda \to \chi ^2_{\nu }, ~\nu =(k-1)p+(k-1)\frac {p(p+1)}{2}\) as n j →∞, j = 1, …, k.
Note 6.13.1
Observe that for the conditional hypothesis H o1 in (6.13.2), the degrees of freedom of the asymptotic chisquare distribution of \(-2\ln \lambda _1\) is (k − 1)p, which is also the number of parameters restricted by the hypothesis H o1. For the hypothesis H o2, the corresponding degrees of freedom of the asymptotic chisquare distribution of \(-2\ln \lambda _2\) is \((k-1)\frac {p(p+1)}{2}\), which is likewise the number of parameters restricted by H o2. For the hypothesis H o of equality of the k independent p-variate Gaussian populations, the degrees of freedom of the asymptotic chisquare distribution of \(-2\ln \lambda \) is the sum of these two quantities, that is, \((k-1)p+(k-1)\frac {p(p+1)}{2}=p(k-1)\frac {(p+3)}{2}\), which also coincides with the number of parameters restricted under the hypothesis H o.
Exercises
6.1.
Derive the λ-criteria for the following tests in a real univariate Gaussian population N 1(μ, σ 2), assuming that a simple random sample of size n, namely x 1, …, x n, which are iid as N 1(μ, σ 2), is available: (1): μ = μ o (given), σ 2 is known; (2): μ = μ o, σ 2 unknown; (3): \(\sigma ^2=\sigma _o^2\) (given). You may also refer to Mathai and Haubold (2017).
In all the following problems, it is assumed that a simple random sample of size n is available. The alternative hypotheses are the natural alternatives.
6.2.
Repeat Exercise 6.1. for the corresponding complex Gaussian.
6.3.
Construct the λ-criteria in the complex case for the tests discussed in Sects. 6.2–6.4.
6.4.
In the real p-variate Gaussian case, consider the hypotheses (1): Σ is diagonal or the individual components are independently distributed; (2): The diagonal elements are equal, given that Σ is diagonal (which is a conditional test). Construct the λ-criterion in each case.
6.5.
Repeat Exercise 6.4. for the complex Gaussian case.
6.6.
Let the population be real p-variate Gaussian N p(μ, Σ), Σ = (σ ij) > O, μ′ = (μ 1, …, μ p). Consider the following tests and compute the λ-criteria: (1): σ 11 = ⋯ = σ pp = σ 2, σ ij = ν for all i and j, i≠j. That is, all the variances are equal and all the covariances are equal; (2): In addition to (1), μ 1 = μ 2 = ⋯ = μ or all the mean values are equal. Construct the λ-criterion in each case. The first one is known as the L vc criterion and the second one as the L mvc criterion. Repeat the same exercise for the complex case. Some distributional aspects are examined in Mathai (1970b) and numerical tables are available in Mathai and Katiyar (1979b).
6.7.
Let the population be real p-variate Gaussian N p(μ, Σ), Σ > O. Consider the hypothesis (1):
(2):
where a 1≠0, a 2≠0, b 1≠0, b 2≠0, a 1≠b 1, a 2≠b 2, \(\varSigma _{12}=\varSigma _{21}^{\prime }\) and all the elements in Σ 12 and Σ 21 are each equal to c≠0. Construct the λ-criterion in each case. (These are hypotheses on patterned matrices).
6.8.
Repeat Exercise 6.7. for the complex case.
6.9.
Consider k independent real p-variate Gaussian populations with different parameters, distributed as \(N_p(M_j,\varSigma _j),~\varSigma _j>O,~ M_j^{\prime }=(\mu _{1j},\ldots ,\mu _{pj}),~ j=1,\ldots ,k\). Construct the λ-criterion for testing the hypothesis Σ 1 = ⋯ = Σ k or the covariance matrices are equal. Assume that simple random samples of sizes n 1, …, n k are available from these k populations.
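The standard form of this criterion (see, e.g., Anderson, 2003) is \(\lambda =\prod _j|S_j/n_j|^{n_j/2}\big /|S/n|^{n/2}\), where S j is the sample sum of products matrix of the j-th sample, S = S 1 + ⋯ + S k and n = n 1 + ⋯ + n k. An illustrative sketch on simulated data:

```python
import numpy as np

# Sketch of the lambda-criterion for H_o: Sigma_1 = ... = Sigma_k (standard
# form, e.g. Anderson 2003). S_j is the j-th within-sample sum of products
# matrix, S = S_1 + ... + S_k, n = n_1 + ... + n_k.
def lrt_equal_covariances(samples):
    """samples: list of (n_j x p) data matrices."""
    Sjs, njs = [], []
    for X in samples:
        Xc = X - X.mean(axis=0)
        Sjs.append(Xc.T @ Xc)
        njs.append(X.shape[0])
    S, n = sum(Sjs), sum(njs)
    log_lam = sum(0.5 * nj * np.linalg.slogdet(Sj / nj)[1]
                  for Sj, nj in zip(Sjs, njs))
    log_lam -= 0.5 * n * np.linalg.slogdet(S / n)[1]
    # asymptotically chisquare with (k-1) p(p+1)/2 degrees of freedom
    return -2.0 * log_lam

rng = np.random.default_rng(1)
stat = lrt_equal_covariances([rng.standard_normal((80, 3)) for _ in range(3)])
print(stat)
```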
6.10.
Repeat Exercise 6.9. for the complex case.
6.11.
For the second part of Exercise 6.6., which is also known as Wilks’ L mvc criterion, show that if \(u=\lambda ^{\frac {2}{n}}\) where λ is the likelihood ratio criterion and n is the sample size, then
where S = (s ij) is the sample sum of products matrix, \(s=\frac {1}{p}\sum _{i=1}^ps_{ii},\) \( s_1=\frac {1}{p(p-1)}\sum _{i\ne j=1}^ps_{ij}\), \(\bar {x}=\frac {1}{p}\sum _{i=1}^p\bar {x}_i,\) \(\bar {x}_i=\frac {1}{n}\sum _{k=1}^nx_{ik}\). For the statistic u in (i), show that the h-th null moment, that is, the h-th moment when the null hypothesis is true, is given by the following:
Write down the conditions for the existence of the moment in (ii). [For the null and non-null distributions of Wilks’ L mvc criterion, see Mathai (1978).]
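The sample quantities s, s 1 and \(\bar {x}\) defined above can be assembled directly from a data matrix; an illustrative sketch (arbitrary simulated data):

```python
import numpy as np

# Ingredients of Wilks' L_mvc statistic, computed from an n x p data matrix
# whose rows are the observations; illustrative sketch only.
def lmvc_ingredients(X):
    n, p = X.shape
    xbar_i = X.mean(axis=0)                      # \bar{x}_i, i = 1, ..., p
    xbar = xbar_i.mean()                         # \bar{x} = (1/p) sum_i \bar{x}_i
    Xc = X - xbar_i
    S = Xc.T @ Xc                                # sample sum of products matrix
    s = np.trace(S) / p                          # s  = (1/p) sum_i s_ii
    s1 = (S.sum() - np.trace(S)) / (p * (p - 1)) # average off-diagonal s_ij
    return S, s, s1, xbar

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))
S, s, s1, xbar = lmvc_ingredients(X)
print(s, s1, xbar)
```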
6.12.
Let the (p + q) × 1 vector X have a (p + q)-variate nonsingular real Gaussian distribution, X ∼ N p+q(μ, Σ), Σ > O. Let
where Σ 1 is p × p with all its diagonal elements equal to σ aa and all other elements equal to \(\sigma _{aa'}\), Σ 2 has all elements equal to σ ab, Σ 3 has all diagonal elements equal to σ bb and all other elements equal to \(\sigma _{bb'}\) where \(\sigma _{aa},~ \sigma _{aa'},~ \sigma _{bb},~ \sigma _{bb'}\) are unknown. Then, Σ is known as bipolar. Let λ be the likelihood ratio criterion for testing the hypothesis that Σ is bipolar. Then show that the h-th null moment is the following:
where n is the sample size. Write down the conditions for the existence of this h-th null moment.
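For concreteness, the patterned matrix Σ described in this exercise can be assembled as follows (the block sizes and numerical values are arbitrary illustrations):

```python
import numpy as np

# Construction of the patterned ("bipolar") covariance matrix of Exercise 6.12:
# Sigma_1 is p x p with diagonal sigma_aa and off-diagonal sigma_aa',
# Sigma_2 is p x q with every element sigma_ab, and
# Sigma_3 is q x q with diagonal sigma_bb and off-diagonal sigma_bb'.
def bipolar_sigma(p, q, s_aa, s_aap, s_ab, s_bb, s_bbp):
    Sig1 = s_aap * np.ones((p, p)) + (s_aa - s_aap) * np.eye(p)
    Sig2 = s_ab * np.ones((p, q))
    Sig3 = s_bbp * np.ones((q, q)) + (s_bb - s_bbp) * np.eye(q)
    return np.block([[Sig1, Sig2], [Sig2.T, Sig3]])

Sig = bipolar_sigma(p=3, q=2, s_aa=2.0, s_aap=0.5, s_ab=0.3, s_bb=1.5, s_bbp=0.4)
print(Sig.shape)                 # (5, 5)
assert np.allclose(Sig, Sig.T)   # the pattern is symmetric
```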
6.13.
Let X be an m × n real matrix having the matrix-variate Gaussian density
Letting S = XX′, S is a non-central Wishart matrix. Derive the density of S and show that this density, denoted by f s(S), is the following:
where \(\Omega =\frac {1}{2}MM'\varSigma ^{-1}\) is the non-centrality parameter and \({}_0F_1\) is a Bessel function of matrix argument.
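As an illustrative sketch (with arbitrary choices of m, n, M and Σ), the non-centrality parameter and one simulated realization of S = XX′ can be produced as follows, taking the columns of X to be independent real Gaussian vectors with covariance Σ and means given by the columns of M:

```python
import numpy as np

# Sketch: columns of X independent N_m(., Sigma) with means the columns of M;
# S = X X' is then a non-central Wishart matrix with non-centrality parameter
# Omega = (1/2) M M' Sigma^{-1}.
def noncentrality(M, Sigma):
    return 0.5 * (M @ M.T) @ np.linalg.inv(Sigma)

rng = np.random.default_rng(3)
m, n = 2, 5                               # arbitrary illustrative dimensions
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
M = np.ones((m, n))                       # each column mean equal to (1, 1)'
Omega = noncentrality(M, Sigma)

L = np.linalg.cholesky(Sigma)
X = M + L @ rng.standard_normal((m, n))   # one draw of the Gaussian matrix
S = X @ X.T                               # a non-central Wishart realization
print(Omega)
print(S.shape)
```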
6.14.
Show that the h-th moment of the determinant of S, the non-central Wishart matrix specified in Exercise 6.13., is given by
where \({}_1F_1\) is a hypergeometric function of matrix argument and Ω is the non-centrality parameter defined in Exercise 6.13.
6.15.
Letting \(v=\lambda ^{\frac {2}{n}}\) in Eq. (6.3.10), show that, under the null hypothesis H o, v is distributed as a real scalar type-1 beta with the parameters \((\frac {n-1}{2},~\frac {1}{2})\) and that \(\frac {n A'\bar {X}\bar {X}'A}{A'SA}\) is real scalar type-2 beta distributed with the parameters \((\frac {1}{2}, ~\frac {n-1}{2})\).
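The second claim can be probed by simulation: a real scalar type-2 beta with parameters \((\frac {1}{2},\frac {n-1}{2})\) has mean 1∕(n − 3), so the empirical mean of the statistic should be close to that value. A Monte Carlo sketch (all numerical choices of p, n, A and Σ are arbitrary illustrations, with mean vector taken as the null value 0):

```python
import numpy as np

# Monte Carlo sketch: with iid N_p(0, Sigma) observations, the statistic
# w = n (A' xbar)^2 / (A' S A) should be type-2 beta (1/2, (n-1)/2),
# whose mean is 1/(n-3).
rng = np.random.default_rng(4)
p, n, reps = 2, 10, 10000
A = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
L = np.linalg.cholesky(Sigma)

w = np.empty(reps)
for r in range(reps):
    X = (L @ rng.standard_normal((p, n))).T      # n observations as rows
    xbar = X.mean(axis=0)
    Xc = X - xbar
    S = Xc.T @ Xc                                # sample sum of products matrix
    w[r] = n * (A @ xbar) ** 2 / (A @ S @ A)

print(w.mean())   # should be near 1/(n-3) = 1/7
```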
6.16.
Show that for an arbitrary h, the h-th null moment of the test statistic λ specified in Eq. (6.3.10) is
References
T.W. Anderson (2003): An Introduction to Multivariate Statistical Analysis, Third Edition, Wiley, New York.
A.W. Davis (1971): Percentile approximations for a class of likelihood ratio criteria, Biometrika, 58, 349–356.
A.W. Davis and J.B. Field (1971): Tables of some multivariate test criteria, Technical Report No. 32, Division of Mathematical Statistics, CSIRO, Canberra, Australia.
B.P. Korin (1968): On the distribution of a statistic used for testing a covariance matrix, Biometrika, 55, 171–178.
A.M. Mathai (1970a): An expansion of Meijer’s G-function in the logarithmic case with applications, Mathematische Nachrichten, 48, 129–139.
A.M. Mathai (1970b): The exact distribution of Bartlett’s criterion, Publ. L’ISUP, Paris, 19, 1–15.
A.M. Mathai (1971): On the distribution of the likelihood ratio criterion for testing linear hypotheses on regression coefficients, Annals of the Institute of Statistical Mathematics, 23, 181–197.
A.M. Mathai (1972a): The exact distribution of three tests associated with Wilks’ concept of generalized variance, Sankhya Series A, 34, 161–170.
A.M. Mathai (1972b): The exact non-central distribution of the generalized variance, Annals of the Institute of Statistical Mathematics, 24, 53–65.
A.M. Mathai (1973): A review of the various methods of obtaining the exact distributions of multivariate test criteria, Sankhya Series A, 35, 39–60.
A.M. Mathai (1977): A short note on the non-null distribution for the problem of sphericity, Sankhya Series B, 39, 102.
A.M. Mathai (1978): On the non-null distribution of Wilks’ Lmvc, Sankhya Series B, 40, 43–48.
A.M. Mathai (1979a): The exact distribution and the exact percentage points for testing equality of variances in independent normal populations, Journal of Statistical Computation and Simulation, 9, 169–182.
A.M. Mathai (1979b): On the non-null distributions of test statistics connected with exponential populations, Communications in Statistics (Theory & Methods), 8(1), 47–55.
A.M. Mathai (1982): On a conjecture in geometric probability regarding asymptotic normality of a random simplex, Annals of Probability, 10, 247–251.
A.M. Mathai (1984): On multi-sample sphericity (in Russian), Investigations in Statistics (Russian), Tom 136, 153–161.
A.M. Mathai (1985): Exact and asymptotic procedures in the distribution of multivariate test statistics, In Statistical Theory and Data Analysis, K. Matusita, Editor, North-Holland, pp. 381–418.
A.M. Mathai (1986): Hypothesis of multisample sphericity, Journal of Mathematical Sciences, 33(1), 792–796.
A.M. Mathai (1993): A Handbook of Generalized Special Functions for Statistical and Physical Sciences, Oxford University Press, Oxford.
A.M. Mathai and H.J. Haubold (2017): Probability and Statistics: A Course for Physicists and Engineers, De Gruyter, Germany.
A.M. Mathai and R.S. Katiyar (1979a): Exact percentage points for testing independence, Biometrika, 66, 353–356.
A.M. Mathai and R. S. Katiyar (1979b): The distribution and the exact percentage points for Wilks’ Lmvc criterion, Annals of the Institute of Statistical Mathematics, 31, 215–224.
A.M. Mathai and R.S. Katiyar (1980): Exact percentage points for three tests associated with exponential populations, Sankhya Series B, 42, 333–341.
A.M. Mathai and P.N. Rathie (1970): The exact distribution of Votaw’s criterion, Annals of the Institute of Statistical Mathematics, 22, 89–116.
A.M. Mathai and P.N. Rathie (1971): Exact distribution of Wilks’ criterion, The Annals of Mathematical Statistics, 42, 1010–1019.
A.M. Mathai and R.K. Saxena (1973): Generalized Hypergeometric Functions with Applications in Statistics and Physical Sciences, Springer-Verlag, Lecture Notes No. 348, Heidelberg and New York.
A.M. Mathai and R.K. Saxena (1978): The H-function with Applications in Statistics and Other Disciplines, Wiley Halsted, New York and Wiley Eastern, New Delhi.
A.M. Mathai, R.K. Saxena and H.J. Haubold (2010): The H-function: Theory and Applications, Springer, New York.
B.N. Nagarsenker and K.C.S. Pillai (1973): The distribution of the sphericity test criterion, Journal of Multivariate Analysis, 3, 226–235.
N. Sugiura and H. Nagao (1968): Unbiasedness of some test criteria for the equality of one or two covariance matrices, Annals of Mathematical Statistics, 39, 1686–1692.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Mathai, A., Provost, S., Haubold, H. (2022). Chapter 6: Hypothesis Testing and Null Distributions. In: Multivariate Statistical Analysis in the Real and Complex Domains. Springer, Cham. https://doi.org/10.1007/978-3-030-95864-0_6
Print ISBN: 978-3-030-95863-3
Online ISBN: 978-3-030-95864-0