
In the preceding chapter, the theoretical basis of estimation theory was presented. Now we turn our interest towards testing issues: we want to test the hypothesis H 0 that the unknown parameter θ belongs to some subspace of \(\mathbb {R}^{q}\). This subspace is called the null set and will be denoted by \(\Omega_{0} \subset \mathbb {R}^{q}\).

In many cases, this null set corresponds to restrictions which are imposed on the parameter space: H 0 corresponds to a “reduced model”. As we have already seen in Chapter 3, the solution to a testing problem is in terms of a rejection region R which is a set of values in the sample space which leads to the decision of rejecting the null hypothesis H 0 in favour of an alternative H 1, which is called the “full model”.

In general, we want to construct a rejection region R which controls the size of the type I error, i.e. the probability of rejecting the null hypothesis when it is true. More formally, a solution to a testing problem is of predetermined size α if:

$$\mathrm {P}(\hbox{Rejecting } H_0 \;|\; H_0\hbox{ is true})= \alpha.$$

In fact, since H 0 is often a composite hypothesis, this is achieved by finding R such that

$$\sup_{\theta \in \Omega_0} \mathrm {P}({\mathcal{X}}\in R \;|\; \theta)= \alpha.$$

In this chapter we will introduce a tool which allows us to build a rejection region in general situations; it is based on the likelihood ratio principle. This is a very useful technique because it allows us to derive a rejection region with an asymptotically appropriate size α. The technique will be illustrated through various testing problems and examples. We concentrate on multinormal populations and linear models where the size of the test will often be exact even for finite sample sizes n.

Section 7.1 gives the basic ideas and Section 7.2 presents the general problem of testing linear restrictions. This allows us to propose solutions to frequent types of analyses (including comparisons of several means, repeated measurements and profile analysis). Each case can be viewed as a specific case of testing linear restrictions. Special attention is devoted to confidence intervals and confidence regions for means and for linear restrictions on means in a multinormal setup.

1 Likelihood Ratio Test

Suppose that the distribution of \(\{x_{i}\}^{n}_{i=1}\), \(x_{i}\in \mathbb {R}^{p}\), depends on a parameter vector θ. We will consider two hypotheses:

$$H_0:\theta \in \Omega_0,\qquad H_1:\theta \in \Omega_1.$$

The hypothesis H 0 corresponds to the “reduced model” and H 1 to the “full model”. This notation was already used in Chapter 3.

Example 7.1

Consider a multinormal \(N_{p}(\theta,{\mathcal{I}})\). To test if θ equals a certain fixed value θ 0 we construct the test problem:

$$H_0:\theta =\theta_0\quad \mbox{versus}\quad H_1:\ \mbox{no constraints},$$

or, equivalently, Ω0={θ 0}, \(\Omega_{1}=\mathbb {R}^{p}\).

Define \(L^{*}_{j}=\max_{\theta \in \Omega _{j}}L({\mathcal{X}};\theta )\), the maxima of the likelihood for each of the hypotheses. Consider the likelihood ratio (LR)

$$\lambda ({\mathcal{X}})=\frac { L_0^* }{L^*_1 }.$$
(7.1)

One tends to favour H 0 if the LR is high and H 1 if the LR is low. The likelihood ratio test (LRT) tells us when exactly to favour H 0 over H 1. A likelihood ratio test of size α for testing H 0 against H 1 has the rejection region

$$R = \{{\mathcal{X}}:\lambda ({\mathcal{X}})<c\}$$

where c is determined so that \(\sup_{\theta \in \Omega _{0}}\mathrm {P}_{\theta}({\mathcal{X}}\in R)=\alpha \). The difficulty here is to express c as a function of α, because \(\lambda ({\mathcal{X}})\) might be a complicated function of \({\mathcal{X}}\).

Instead of λ we may equivalently use the log-likelihood ratio statistic

$$-2\log\lambda =2(\ell ^*_1-\ell ^*_0),$$

where \(\ell^*_j=\log L^*_j\) is the maximised log-likelihood under hypothesis j.

In this case the rejection region will be \(R=\{{\mathcal{X}}:-2\log \lambda ({\mathcal{X}})>k\}\). What is the distribution of λ or of −2logλ from which we need to compute c or k?

Theorem 7.1

(Wilks Theorem)

If \(\Omega _{1}\subset \mathbb {R}^{q}\) is a q-dimensional space and if Ω0⊂Ω1 is an r-dimensional subspace, then under regularity conditions

$$\forall \ \theta \in \Omega_0:\quad -2\log\lambda \stackrel{\mathcal{L}}{\longrightarrow} \chi^2_{q-r}\quad \mbox{as }n\to\infty.$$

An asymptotic rejection region can now be given by simply computing the 1−α quantile \(k=\chi^{2}_{1-\alpha ;q-r}\). The LRT rejection region is therefore

$$R=\{{\mathcal{X}}: -2\log\lambda({\mathcal{X}}) >\chi^2_{1-\alpha ;q-r}\}.$$

Theorem 7.1 is thus very helpful: it gives a general way of building rejection regions in many problems. Unfortunately, it is only an asymptotic result, meaning that the size of the test is only approximately equal to α, although the approximation becomes better when the sample size n increases. The question is “how large should n be?”. There is no definite rule: we encounter here the same problem that was already discussed with respect to the Central Limit Theorem in Chapter 4.
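For illustration, the asymptotic rejection rule can be written down in a few lines. The following is a minimal sketch in Python with scipy (it is not one of the MVA quantlets referenced in this book); the function name and interface are our own conventions, and the statistic value, q and r must be supplied by the user.

```python
from scipy import stats

def wilks_rejects(neg2_log_lambda, q, r, alpha=0.05):
    """Asymptotic LRT decision (Theorem 7.1): reject H0 when
    -2 log(lambda) exceeds the (1 - alpha) quantile of chi^2_{q-r}."""
    k = stats.chi2.ppf(1 - alpha, df=q - r)
    return neg2_log_lambda > k, k

# e.g. q = 6 free parameters under H1 and a simple H0 (r = 0):
print(wilks_rejects(12.1, q=6, r=0))   # k = chi2_{0.95;6} = 12.592
```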

Fortunately, in many standard circumstances, we can derive exact tests even for finite samples because the test statistic \(-2 \log \lambda({\mathcal{X}})\) or a simple transformation of it turns out to have a simple form. This is the case in most of the following standard testing problems. All of them can be viewed as an illustration of the likelihood ratio principle.

Test Problem 1 is an amuse-bouche: in testing the mean of a multinormal population with a known covariance matrix the likelihood ratio statistic has a very simple quadratic form with a known distribution under H 0.

Test Problem 1

Suppose that X 1,…,X n is an i.i.d. random sample from a N p (μ,Σ) population.

$$H_0:\mu =\mu _0, \ \Sigma\ \mbox{known versus}\ H_1:\ \mbox{no constraints.}$$

In this case H 0 is a simple hypothesis, i.e., Ω0={μ 0} and therefore the dimension r of Ω0 equals 0. Since we have imposed no constraints in H 1, the space Ω1 is the whole \(\mathbb {R}^{p}\) which leads to q=p. From (6.6) we know that

$$\ell _0^*=\ell (\mu _0,\Sigma )=-\frac{n }{ 2}\log|2\pi \Sigma |-\frac{1}{2 }n\mathop {\mathrm {tr}}(\Sigma ^{-1}{\mathcal{S}})-\frac{1 }{2 }n(\overline{x}-\mu _0)^{\top}\Sigma^{-1}(\overline{x}-\mu _0) .$$

Under H 1 the maximum of ℓ(μ,Σ) is

$$\ell ^*_1=\ell (\overline{x},\Sigma )=-\frac{n }{2 }\log |2\pi \Sigma|-\frac{ 1}{2 }n\mathop {\mathrm {tr}}(\Sigma ^{-1}{\mathcal{S}}).$$

Therefore,

$$ -2\log\lambda =2(\ell ^*_1-\ell ^*_0)=n(\overline{x}-\mu _0)^{\top}\Sigma ^{-1}(\overline{x}-\mu _0)$$
(7.2)

which, by Theorem 4.7, has a \(\chi ^{2}_{p}\)-distribution under H 0.
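The statistic (7.2) is straightforward to compute. Below is an illustrative numpy/scipy sketch (not the book's quantlet), assuming the sample is stored row-wise in an array X; names are our own.

```python
import numpy as np
from scipy import stats

def test_mean_known_cov(X, mu0, Sigma):
    """Test H0: mu = mu0 with Sigma known, via (7.2):
    -2 log(lambda) = n (xbar - mu0)' Sigma^{-1} (xbar - mu0) ~ chi^2_p."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    stat = n * d @ np.linalg.solve(Sigma, d)
    return stat, stats.chi2.sf(stat, df=p)
```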

Example 7.2

Consider the bank data again. Let us test whether the population mean of the forged bank notes is equal to

$$\mu _0=(214.9,129.9,129.7,8.3,10.1,141.5)^{\top}.$$

(This is in fact the sample mean of the genuine bank notes.) The sample mean of the forged bank notes is

$$\overline{x}=(214.8,130.3,130.2,10.5,11.1,139.4)^{\top}.$$

Suppose for the moment that the estimated covariance matrix \({\mathcal{S}}_{f}\) given in (3.5) is the true covariance matrix Σ. We construct the likelihood ratio test statistic (7.2) and compare it with the quantile \(k=\chi ^{2}_{0.95;6}=12.592\). The rejection region consists of all values in the sample space which lead to values of the likelihood ratio test statistic larger than 12.592. The observed value of −2logλ exceeds this quantile and is therefore highly significant. Hence, the true mean of the forged bank notes is significantly different from μ 0!

Test Problem 2 is the same as the preceding one but in a more realistic situation where the covariance matrix is unknown; here Hotelling’s T 2-distribution will be useful to determine an exact test and a confidence region for the unknown μ.

Test Problem 2

Suppose that X 1,…,X n is an i.i.d. random sample from a N p (μ,Σ) population.

$$H_0:\mu =\mu _0,\ \Sigma\ \mbox{unknown versus}\ H_1:\ \mbox{no constraints.}$$

Under H 0 it can be shown that

$$\ell_0^* =\ell (\mu_0,{\mathcal{S}}+dd^{\top}),\qquad d=(\overline{x}-\mu_0),$$
(7.3)

and under H 1 we have

$$\ell^*_1 = \ell (\overline{x},{\mathcal{S}}).$$

This leads after some calculation to

$$-2\log\lambda =2(\ell ^*_1-\ell ^*_0)= n\log\left(\frac{|{\mathcal{S}}+dd^{\top}|}{|{\mathcal{S}}|}\right).$$

By using the result for the determinant of a partitioned matrix, this equals

$$-2\log\lambda = n\log \left\{1+d^{\top} {\mathcal{S}}^{-1}d\right\}.$$
(7.4)

This statistic is a monotone function of \((n-1)d^{\top} {\mathcal{S}}^{-1}d\). This means that −2logλ>k if and only if \((n-1)d^{\top} {\mathcal{S}}^{-1}d>k'\). The latter statistic has by Corollary 5.3, under H 0, a Hotelling’s T 2-distribution. Therefore,

$$ (n-1)(\bar{x}-\mu_0)^{\top} {\mathcal{S}}^{-1}(\bar{x}-\mu_0) \sim T^2(p,n-1),$$
(7.5)

or equivalently

$$ \left (\frac{n-p }{p }\right )(\bar{x}-\mu_0)^{\top} {\mathcal{S}}^{-1}(\bar{x}-\mu_0) \sim F_{p,n-p}.$$
(7.6)

In this case an exact rejection region may be defined as

$$\left (\frac{n-p }{p }\right )(\bar{x}-\mu_0)^{\top} {\mathcal{S}}^{-1}(\bar{x}-\mu_0)>F_{1-\alpha;p,n-p}.$$

Alternatively, we have from Theorem 7.1 that under H 0 the asymptotic distribution of the test statistic is

$$-2 \log\lambda \stackrel{\mathcal{L}}{\longrightarrow} \chi^2_{p},\quad\mbox{as}\ n\rightarrow\infty $$

which leads to the (asymptotically valid) rejection region

$$n\log \{1+(\bar{x}-\mu_0)^{\top}{\mathcal{S}}^{-1}(\bar{x}-\mu_0)\}>\chi^2_{1-\alpha ;p},$$

but of course, in this case, we would prefer to use the exact F-test provided just above.
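As a sketch of the exact test (7.6) in Python (numpy/scipy, illustrative only): note that the S below is the MLE covariance with divisor n, matching the \({\mathcal{S}}\) used in the text; with the divisor n−1 the constant in front would change accordingly.

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(X, mu0):
    """Exact test of H0: mu = mu0 with Sigma unknown, via (7.6)."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False, bias=True)        # MLE covariance, divisor n
    F = (n - p) / p * d @ np.linalg.solve(S, d)   # ~ F_{p, n-p} under H0
    return F, stats.f.sf(F, p, n - p)
```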

Example 7.3

Consider the problem of Example 7.2 again. We know that \({\mathcal{S}}_{f}\) is the empirical analogue for Σ f , the covariance matrix for the forged banknotes. The test statistic (7.5) has the value 1153.4 or its equivalent for the F distribution in (7.6) is 182.5 which is highly significant (F 0.95;6,94=2.1966) so that we conclude that \(\mu_{f} \not= \mu_{0}\).

1.1 Confidence Region for μ

When estimating a multidimensional parameter \(\theta \in \mathbb {R}^{k}\) from a sample, we saw in Chapter 6 how to determine the estimator \(\widehat{\theta}= \widehat{\theta}({{\mathcal{X}}})\). For the observed data we end up with a point estimate, which is the corresponding observed value of \(\widehat{\theta}\). We know \(\widehat{\theta}({{\mathcal{X}}})\) is a random variable and we often prefer to determine a confidence region for θ. A confidence region (CR) is a random subset of \(\mathbb {R}^{k}\) (determined by appropriate statistics) such that we are “confident”, at a certain given level 1−α, that this region contains θ:

$$\mathrm {P}(\theta\in \textrm{CR}) = 1-\alpha.$$

This is just a multidimensional generalisation of the basic univariate confidence interval. Confidence regions are particularly useful when a hypothesis H 0 on θ is rejected, because they eventually help in identifying which component of θ is responsible for the rejection.

There are only a few cases where confidence regions can be easily constructed; they include most of the testing problems on means presented in this section.

Corollary 5.3 provides a pivotal quantity which allows confidence regions for μ to be constructed. Since \((\frac{n-p }{p })(\bar{x}-\mu)^{\top} {\mathcal{S}}^{-1}(\bar{x}-\mu) \sim F_{p,n-p}\), we have

$$\mathrm {P}\left\{\left (\frac{n-p }{p }\right )(\mu-\bar{x})^{\top} {\mathcal{S}}^{-1}(\mu-\bar{x}) <F_{1-\alpha;p,n-p}\right \}= 1-\alpha.$$

Then,

$$\textrm{CR}=\left\{\mu\in \mathbb {R}^p\mid(\mu-\bar{x})^{\top} {\mathcal{S}}^{-1}(\mu-\bar{x})\le\frac{p}{n-p}F_{1-\alpha;p,n-p}\right\}$$

is a confidence region at level (1−α) for μ. It is the interior of an iso-distance ellipsoid in \(\mathbb {R}^{p}\) centred at \(\bar{x}\), with a scaling matrix \({\mathcal{S}}^{-1}\) and a distance constant \((\frac{p }{n-p })F_{1-\alpha;p,n-p}\). When p is large, ellipsoids are not easy to handle for practical purposes. One is thus interested in finding confidence intervals for μ 1,μ 2,…,μ p so that the simultaneous confidence over all the intervals reaches the desired level of, say, 1−α.

Below, we consider a more general problem. We construct simultaneous confidence intervals for all possible linear combinations \(a^{\top}\mu\), \(a \in \mathbb {R}^{p}\), of the elements of μ.

Suppose for a moment that we fix a particular projection vector a. We are back to a standard univariate problem of finding a confidence interval for the mean \(a^{\top}\mu\) of a univariate random variable \(a^{\top}X\). We can use the t-statistics and an obvious confidence interval for \(a^{\top}\mu\) is given by the values \(a^{\top}\mu\) such that

$$\left|\frac{\sqrt{n-1}(a^{\top}\mu -a^{\top}\bar{x})}{\sqrt{a^{\top} {\mathcal{S}}a}}\right|\le t_{1-\frac{\alpha}{2};n-1}$$

or equivalently

$$t^2(a)=\frac{(n-1)\{a^{\top}(\mu -\bar{x})\}^2}{a^{\top} {\mathcal{S}}a}\le F_{1-\alpha ;1,n-1}.$$

This provides the (1−α) confidence interval for \(a^{\top}\mu\):

$$\left(a^{\top}\bar{x}-\sqrt{F_{1-\alpha ;1,n-1}\frac{a^{\top} {\mathcal{S}}a}{n-1}}\le a^{\top}\mu \le a^{\top}\bar{x}+\sqrt{F_{1-\alpha ;1,n-1}\frac{a^{\top} {\mathcal{S}}a}{n-1}}\,\right).$$

Now it is easy to prove (using Theorem 2.5) that:

$$\max_at^2(a)=(n-1)(\bar{x}-\mu)^{\top} {\mathcal{S}}^{-1}(\bar{x}-\mu)\sim T^2(p,n-1).$$

Therefore, simultaneously for all \(a\in \mathbb {R}^{p}\), the interval

$$ \left(a^{\top}\bar{x}-\sqrt{K_\alpha a^{\top} {\mathcal{S}}a},\ a^{\top}\bar{x}+\sqrt{K_\alpha a^{\top} {\mathcal{S}}a}\,\right)$$
(7.7)

where \(K_{\alpha}= \frac{p}{n-p}F_{1-\alpha ;p,n-p}\), will contain \(a^{\top}\mu\) with probability (1−α).

Particular choices of a are the columns of the identity matrix \({{\mathcal{I}}}_{p}\), providing simultaneous confidence intervals for μ 1,…,μ p . We therefore have with probability (1−α) for j=1,…,p

$$ \bar{x}_j-\sqrt{\frac{p}{n-p}F_{1-\alpha ;p,n-p}s_{jj}}\le \mu_j \le \bar{x}_j+\sqrt{\frac{p}{n-p}F_{1-\alpha ;p,n-p}s_{jj}}.$$
(7.8)

It should be noted that these intervals define a rectangle inscribing the confidence ellipsoid for μ given above. They are particularly useful when a null hypothesis H 0 of the type described above is rejected and one would like to see which component(s) are mainly responsible for the rejection.
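A minimal sketch of the intervals (7.8) in Python (numpy/scipy; illustrative, not the book's quantlet), assuming row-wise data and the MLE covariance as in the text:

```python
import numpy as np
from scipy import stats

def simultaneous_ci(X, alpha=0.05):
    """Simultaneous (1 - alpha) intervals (7.8) for mu_1, ..., mu_p,
    with K_alpha = p/(n-p) F_{1-alpha; p, n-p}."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    s_jj = np.cov(X, rowvar=False, bias=True).diagonal()
    K = p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    half = np.sqrt(K * s_jj)
    return np.column_stack([xbar - half, xbar + half])
```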

Example 7.4

The 95% confidence region for μ f , the mean of the forged banknotes, is given by the ellipsoid:

$$\left\{\mu \in \mathbb {R}^6 \mid (\mu-\bar{x}_f)^{\top}S_f^{-1}(\mu-\bar{x}_f)\le \frac{6}{94}F_{0.95;6,94}\right\}.$$

The 95% simultaneous confidence intervals are given by (we use F 0.95;6,94=2.1966)

Comparing the inequalities with \(\mu_{0}=(214.9,129.9,129.7,8.3,10.1,141.5)^{\top}\) shows that almost all components (except the first one) are responsible for the rejection of μ 0 in Examples 7.2 and 7.3.

In addition, the method can provide other confidence intervals. We have at the same level of confidence (choosing \(a^{\top} =(0,\ 0,\ 0,\ 1,\ -1,\ 0)\))

$$-1.211 \le \mu_4 - \mu_5 \le 0.005$$

showing that for the forged bills, the lower border is essentially smaller than the upper border.

Remark 7.1

It should be noted that the confidence region is an ellipsoid whose characteristics depend on the whole matrix \({\mathcal{S}}\). In particular, the orientation of the axes depends on the eigenvectors of \({\mathcal{S}}\) and therefore on the covariances s ij . However, the rectangle inscribing the confidence ellipsoid provides the simultaneous confidence intervals for μ j , j=1,…,p. They do not depend on the covariances s ij , but only on the variances s jj (see (7.8)). In particular, it may happen that a tested value μ 0 is covered by the confidence ellipsoid but not covered by the intervals (7.8). In this case, μ 0 is rejected by a test based on the simultaneous confidence intervals but not rejected by a test based on the confidence ellipsoid. The simultaneous confidence intervals are easier to handle than the full ellipsoid but we have lost some information, namely the covariance between the components (see Exercise 7.14).

The following problem concerns the covariance matrix in a multinormal population: in this situation the test statistic has a slightly more complicated distribution. We will therefore invoke the approximation of Theorem 7.1 in order to derive a test of approximate size α.

Test Problem 3

Suppose that X 1,…,X n is an i.i.d. random sample from a N p (μ,Σ) population.

$$H_0:\Sigma =\Sigma _0,\ \mu\ \mbox{unknown versus}\ H_1:\ \mbox{no constraints}.$$

Under H 0 we have \(\widehat{\mu}= \overline{x}\), and Σ=Σ0, whereas under H 1 we have \(\widehat{\mu}= \overline{x}\), and \(\widehat{\Sigma}={\mathcal{S}}\). Hence

$$\ell^*_0 =\ell (\overline{x},\Sigma_0)=-\frac{n}{2}\log|2\pi \Sigma_0|-\frac{n}{2}\mathop {\mathrm {tr}}(\Sigma_0^{-1}{\mathcal{S}}),\qquad \ell^*_1 =\ell (\overline{x},{\mathcal{S}})=-\frac{n}{2}\log|2\pi {\mathcal{S}}|-\frac{n}{2}p,$$

and thus

$$-2\log\lambda =2(\ell ^*_1-\ell ^*_0)= n\mathop {\mathrm {tr}}(\Sigma_0^{-1}{\mathcal{S}})-n\log|\Sigma_0^{-1}{\mathcal{S}}|-np.$$

Note that this statistic is a function of the eigenvalues of \(\Sigma ^{-1}_{0}{\mathcal{S}}\). Unfortunately, the exact finite sample distribution of −2logλ is very complicated. Asymptotically, we have under H 0

$$-2\log\lambda \stackrel{\mathcal{L}}{\to} \chi ^2_m\quad \mbox{as}\ n\to\infty$$

with \(m=\frac{1 }{2 }\left\{p(p+1)\right\}\), since a (p×p) covariance matrix has only these m parameters as a consequence of its symmetry.
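The statistic above is easy to evaluate numerically. Here is an illustrative numpy/scipy sketch (not the MVAusenergy quantlet); the function name and data layout are our own assumptions.

```python
import numpy as np
from scipy import stats

def test_cov_equals(X, Sigma0):
    """Test H0: Sigma = Sigma0 (mu unknown):
    -2 log(lambda) = n tr(Sigma0^{-1} S) - n log|Sigma0^{-1} S| - n p,
    approximately chi^2 with m = p(p+1)/2 degrees of freedom."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)   # MLE covariance
    A = np.linalg.solve(Sigma0, S)           # Sigma0^{-1} S
    _, logdet = np.linalg.slogdet(A)
    stat = n * (np.trace(A) - logdet - p)
    return stat, stats.chi2.sf(stat, df=p * (p + 1) // 2)
```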

Example 7.5

Consider the US companies data set (Table B.5) and suppose we are interested in the companies of the energy sector, analysing their assets (X 1) and sales (X 2). The sample is of size 15 and provides the value of \({\mathcal{S}}=10^7\left(\begin{array}{c@{\quad}c}1.6635 & 1.2410\\1.2410 & 1.3747\end{array}\right)\). We want to test if \(\Sigma =\Sigma_0= 10^7\left(\begin{array}{c@{\quad}c}1.2248 & 1.1425\\1.1425 & 1.5112\end{array}\right)\). (Σ0 is in fact the empirical variance matrix for X 1 and X 2 for the manufacturing sector, see Example 7.13.) The test statistic ( MVAusenergy) turns out to be −2logλ=5.4046 which is not significant for \(\chi_{3}^{2}\) (\(\mbox{$p$-value}=0.1445\)). So we cannot conclude that \(\Sigma \not= \Sigma_{0}\).

In the next testing problem, we address a question that was already stated in Chapter 3, Section 3.6: testing a particular value of the coefficients β in a linear model. The presentation is carried out in general terms so that it can be built on in the next section where we will test linear restrictions on β.

Test Problem 4

Suppose that Y 1,…,Y n are independent r.v.’s with \(Y_{i} \sim N_{1}(\beta^{\top}x_{i}, \sigma^{2}), x_{i}~\in~\mathbb {R}^{p}\).

$$H_0:\beta = \beta_0, \ \sigma^2\ \mbox{unknown versus} \ H_1:\ \mbox{no constraints}.$$

Under H 0 we have \(\beta = \beta_{0}\), \(\widehat{\sigma}^{2}_{0} = \frac{1}{n}||y-{\mathcal{X}}\beta_{0}||^{2}\) and under H 1 we have \(\hat{\beta} =({\mathcal{X}}^{\top} {\mathcal{X}})^{-1} {\mathcal{X}}^{\top}y\), \(\hat{\sigma}^{2} = \frac{1}{n}||y-{\mathcal{X}}\hat{\beta}||^{2}\) (see Example 6.3). Hence by Theorem 7.1

$$-2\log \lambda =2(\ell ^*_1-\ell ^*_0)= n\log\left(\frac{\widehat{\sigma}^2_0}{\widehat{\sigma}^2}\right) \stackrel{\mathcal{L}}{\longrightarrow}\chi^2_p.$$
We draw upon the result (3.45) which gives us

$$F = \frac{(n-p)}{p}\left(\frac{||y-{\mathcal{X}}\beta_{0}||^2}{||y-{\mathcal{X}}\hat{\beta}||^2}-1 \right) \sim F_{p,n-p}, $$

so that in this case we again have an exact distribution.
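For concreteness, a minimal Python sketch of the exact F-test above (numpy/scipy; illustrative, not the book's quantlet), assuming the design matrix \({\mathcal{X}}\) is stored as a 2-D array X with a column of ones if an intercept is wanted:

```python
import numpy as np
from scipy import stats

def test_beta_equals(X, y, beta0):
    """Exact F test of H0: beta = beta0 in y = X beta + eps:
    F = (n-p)/p * (||y - X beta0||^2 / ||y - X betahat||^2 - 1) ~ F_{p,n-p}."""
    n, p = X.shape
    betahat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss1 = np.sum((y - X @ betahat) ** 2)   # unconstrained residuals
    rss0 = np.sum((y - X @ beta0) ** 2)     # residuals under H0
    F = (n - p) / p * (rss0 / rss1 - 1)
    return F, stats.f.sf(F, p, n - p)
```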

Example 7.6

Let us consider our “classic blue” pullovers again. In Example 3.11 we tried to model the dependency of sales on prices. As we have seen in Figure 3.5 the slope of the regression curve is rather small, hence we might ask if \((\alpha,\beta)^{\top} =(211, 0)^{\top}\). Here

$$y = \left( \begin{array}{c} y_{1} \\ \vdots \\ y_{10} \end{array}\right) = \left( \begin{array}{c} x_{1,1} \\ \vdots \\ x_{10,1} \end{array}\right), \qquad {\mathcal{X}} = \left( \begin{array}{c@{\quad}c} 1 & x_{1,2} \\ \vdots & \vdots \\ 1 & x_{10,2} \end{array} \right). $$

The test statistic for the LR test is

$$-2\log \lambda = 9.10 $$

which under the \(\chi_{2}^{2}\) distribution is significant. The exact F-test statistic

$$F = 5.93 $$

is also significant under the F 2,8 distribution (F 2,8;0.95=4.46).


2 Linear Hypothesis

In this section, we present a very general procedure which allows a linear hypothesis to be tested, i.e., a linear restriction, either on a vector mean μ or on the coefficient β of a linear model. The presented technique covers many of the practical testing problems on means or regression coefficients.

Linear hypotheses are of the form \({\mathcal{A}}\mu =a\) with known matrices \({\mathcal{A}} (q\times p)\) and a(q×1) with q≤p.

Example 7.7

Let \(\mu=(\mu_{1},\mu_{2})^{\top}\). The hypothesis that μ 1=μ 2 can be equivalently written as:

$${\mathcal{A}}\mu = \left( \begin{array}{c@{\quad}c} 1&{-1}\\ \end{array}\right)\left( \begin{array}{c} \mu_{1}\\\mu_{2} \end{array}\right) = 0 =a.$$

The general idea is to test a normal population \(H_{0}:\; {\mathcal{A}}\mu=a\) (restricted model) against the full model H 1 where no restrictions are put on μ. Due to the properties of the multinormal, we can easily adapt the Test Problems 1 and 2 to this new situation. Indeed we know, from Theorem 5.2, that \(y_{i}={\mathcal{A}}x_{i} \sim N_{q}(\mu_{y},\Sigma_{y})\), where \(\mu_{y}={\mathcal{A}}\mu\) and \(\Sigma_{y}={\mathcal{A}}\Sigma{\mathcal{A}}^{\top}\).

Testing the null \(H_{0}:\; {\mathcal{A}}\mu=a\), is the same as testing \(H_{0}:\;\mu_{y}=a\). The appropriate statistics are \(\bar{y}\) and \({\mathcal{S}}_{y}\) which can be derived from the original statistics \(\bar{x}\) and \({\mathcal{S}}\) available from \({\mathcal{X}}\):

$$\bar{y} = {\mathcal{A}} \bar{x},\qquad {\mathcal{S}}_y={\mathcal{A}}{\mathcal{S}}{\mathcal{A}}^{\top}.$$

Here the difference between the translated sample mean and the tested value is \(d={\mathcal{A}} \bar{x} -a\). We are now in the situation to proceed to Test Problems 5 and 6.

Test Problem 5

Suppose X 1,…,X n is an i.i.d. random sample from a N p (μ,Σ) population.

$$H_0:{\mathcal{A}}\mu =a,\ \Sigma\ \mbox{known versus}\ H_1:\ \mbox{no constraints.}$$

By (7.2) we have that, under H 0:

$$n({\mathcal{A}}\bar{x}-a)^{\top}({\mathcal{A}}\Sigma {\mathcal{A}}^{\top})^{-1}({\mathcal{A}}\bar{x}-a)\sim \chi^2_q,$$

and we reject H 0 if this test statistic is too large at the desired significance level.
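A short Python sketch of this test (numpy/scipy; illustrative only, names are our own), again with row-wise data:

```python
import numpy as np
from scipy import stats

def test_Amu_known_cov(X, A, a, Sigma):
    """Test H0: A mu = a with Sigma known:
    n (A xbar - a)' (A Sigma A')^{-1} (A xbar - a) ~ chi^2_q."""
    n = X.shape[0]
    q = A.shape[0]
    d = A @ X.mean(axis=0) - a
    stat = n * d @ np.linalg.solve(A @ Sigma @ A.T, d)
    return stat, stats.chi2.sf(stat, df=q)
```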

Example 7.8

We consider hypotheses on partitioned mean vectors \(\mu =\left(\begin{array}{c}\mu_1\\ \mu_2\end{array}\right)\) with \(\mu_1,\mu_2 \in \mathbb {R}^{p}\). Let us first look at

$$H_0:\mu_1=\mu_2,\ \mbox{versus}\ H_1:\mbox{no constraints},$$

for \(X\sim N_{2p}\left(\mu,\left(\begin{array}{c@{\quad}c}\Sigma & 0\\0 & \Sigma\end{array}\right)\right)\) with known Σ. This is equivalent to \({\mathcal{A}}=({\mathcal{I}},-{\mathcal{I}})\), \(a=(0,\dots,0)^{\top}\in \mathbb {R}^{p}\) and leads to

$$-2\log\lambda = n(\overline{x}_1 - \overline{x}_2)^{\top} (2\Sigma)^{-1}(\overline{x}_1 - \overline{x}_2) \sim \chi^2_p.$$

Another example is the test whether μ 1=0, i.e.,

$$H_0:\mu_1=0,\ \mbox{versus}\ H_1:\mbox{no constraints,}$$

for \(X\sim N_{2p}(\mu,\Sigma)\) with known Σ, where \(\Sigma_{11}\) denotes the covariance matrix of the first p components. This is equivalent to \({\mathcal{A}}\mu =a\) with \({\mathcal{A}}=({\mathcal{I}},0)\), and \(a=(0,\dots,0)^{\top}\in \mathbb {R}^{p}\). Hence

$$-2\log\lambda =n\overline{x}_1^{\top}\Sigma_{11}^{-1}\overline{x}_1\sim\chi^2_p.$$

Test Problem 6

Suppose X 1,…,X n is an i.i.d. random sample from a N p (μ,Σ) population.

$$H_0:{\mathcal{A}}\mu =a, \ \Sigma\ \mbox{unknown versus}\ H_1:\ \ \mbox{no constraints.}$$

From Corollary 5.4 and under H 0 it follows immediately that

$$(n-1)({\mathcal{A}}\overline{x}-a)^{\top}({\mathcal{A}}{\mathcal{S}}{\mathcal{A}}^{\top})^{-1}({\mathcal{A}}\overline{x}-a)\sim T^2(q,n-1)$$
(7.9)

since indeed under H 0,

$${\mathcal{A}}\overline{x}\sim N_q(a,n^{-1}{\mathcal{A}}\Sigma {\mathcal{A}}^{\top})$$

is independent of

$$n{\mathcal{ASA}}^{\top}\sim W_q({\mathcal{A}}\Sigma {\mathcal{A}}^{\top},n-1).$$

Example 7.9

Let’s come back again to the bank data set and suppose that we want to test if μ 4=μ 5, i.e., the hypothesis that the lower border mean equals the larger border mean for the forged bills. In this case:

$${\mathcal{A}}=(0\ 0\ 0\ 1\ {-1}\ 0),\qquad a=0.$$

The test statistic is:

$$99 ({\mathcal{A}}\bar{x})^{\top}({\mathcal{A}}S_f{\mathcal{A}}^{\top})^{-1}({\mathcal{A}}\bar{x}) \sim T^2(1,99)=F_{1,99}.$$

The observed value is 13.638 which is significant at the 5% level.

2.1 Repeated Measurements

In many situations, n independent sampling units are observed at p different times or under p different experimental conditions (different treatments, …). So here we repeat p one-dimensional measurements on n different subjects. For instance, we observe the results from n students taking p different exams. We end up with an (n×p) matrix. We can thus consider the situation where we have X 1,…,X n i.i.d. from a normal distribution N p (μ,Σ) when there are p repeated measurements. The hypothesis of interest in this case is that there are no treatment effects, H 0:μ 1=μ 2=⋯=μ p . This hypothesis is a direct application of Test Problem 6. Indeed, introducing an appropriate matrix transform on μ we have

$$ H_0:\ {\mathcal{C}}\mu=0\quad \mbox{where}\ {\mathcal{C}} ((p-1)\times p) =\left(\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}1 & -1 & 0 &\cdots & 0 \\0 & 1 & -1 &\cdots & 0 \\\vdots & \vdots & \vdots & \vdots &\vdots \\0 & \cdots & 0 &1 & -1\end{array}\right).$$
(7.10)

Note that in many cases one of the experimental conditions is the “control” (a placebo, standard drug or reference condition). Suppose it is the first component. In that case one is interested in studying differences to the control variable. The matrix \({\mathcal{C}}\) has therefore a different form

$${\mathcal{C}} ((p-1)\times p) =\left(\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}1 & -1 & 0 & \cdots & 0 \\1 & 0 & -1 & \cdots & 0 \\\vdots & \vdots & \vdots & \vdots & \vdots \\1 & 0 & 0 & \cdots & -1\end{array}\right). $$

By (7.9) the null hypothesis will be rejected if:

$$\frac{(n-p+1)}{p-1}\bar{x}^{\top}{\mathcal{C}}^{\top}({\mathcal{C}}{\mathcal{S}}{\mathcal{C}}^{\top})^{-1}{\mathcal{C}}\bar{x}>F_{1-\alpha ;p-1,n-p+1}.$$

As a matter of fact, \({\mathcal{C}}\mu\) is the mean of the random variable \(y_{i}={\mathcal{C}}x_{i}\)

$$y_i\sim N_{p-1}({\mathcal{C}}\mu, {\mathcal{C}}\Sigma {\mathcal{C}}^{\top}).$$

Simultaneous confidence intervals for linear combinations of the mean of y i have been derived above in (7.7). For all \(a\in \mathbb {R}^{p-1}\), with probability (1−α) we have

$$a^{\top}{\mathcal{C}}\mu \in a^{\top}{\mathcal{C}}\bar{x} \pm\sqrt{\frac{(p-1)}{n-p+1}F_{1-\alpha ;p-1,n-p+1}a^{\top}{\mathcal{C}}S{\mathcal{C}}^{\top}a}.$$

Due to the nature of the problem here, the row sums of the elements in \({\mathcal{C}}\) are zero: \({\mathcal{C}}1_{p}=0\), therefore \(a^{\top}{\mathcal{C}}\) is a vector whose elements sum to 0. This is called a contrast. Let \(b={\mathcal{C}}^{\top}a\). We have \(b^{\top}1_{p}=\sum_{j=1}^{p}b_{j}=0\). The result above thus provides simultaneous confidence intervals at level (1−α) for all contrasts \(b^{\top}\mu\) of μ

$$b^{\top}\mu \in b^{\top}\bar{x} \pm\sqrt{\frac{(p-1)}{n-p+1}F_{1-\alpha ;p-1,n-p+1}b^{\top}{\mathcal{S}}b}.$$

Examples of contrasts for p=4 are \(b^{\top} =(1\ {-1}\ 0\ 0)\) or \((1\ 0\ 0\ {-1})\) or even \((1\ {-\frac{1}{3}}\ {-\frac{1}{3}}\ {-\frac{1}{3}})\) when the control is to be compared with the mean of 3 different treatments.
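The rejection rule above is mechanical to code. A minimal Python sketch (numpy/scipy; illustrative only, with the successive-difference matrix (7.10) built in):

```python
import numpy as np
from scipy import stats

def repeated_measures_test(X, alpha=0.05):
    """Test H0: mu_1 = ... = mu_p via H0: C mu = 0, C as in (7.10);
    S is the MLE covariance (divisor n), matching the text."""
    n, p = X.shape
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)   # successive differences
    d = C @ X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)
    F = (n - p + 1) / (p - 1) * d @ np.linalg.solve(C @ S @ C.T, d)
    return F, F > stats.f.ppf(1 - alpha, p - 1, n - p + 1)
```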

Example 7.10

Bock (1975) considers the evolution of the vocabulary of children from the eighth through eleventh grade. The data set contains the scores of a vocabulary test of 40 randomly chosen children. This is a repeated measurement situation, (n=40,p=4), since the same children were observed from grades 8 to 11. The statistics of interest are:

Suppose we are interested in the yearly evolution of the children. Then the matrix \({\mathcal{C}}\) providing successive differences of μ j is:

$${\mathcal{C}}=\left(\begin{array}{c@{\quad}c@{\quad}c@{\quad}c}1 & -1 & 0 & 0 \\0 & 1 & -1 & 0 \\0 & 0 & 1 & -1\end{array}\right).$$

The value of the test statistic is \(F_{\mathrm{obs}}=53.134\) which is highly significant for F 3,37. There are significant differences between the successive means. However, the analysis of the contrasts shows the following simultaneous 95% confidence intervals

Thus, the rejection of H 0 is mainly due to the difference between the children’s performances in the first and second year. The confidence intervals for the following contrasts may also be of interest:

They show that μ 1 is different from the average of the 3 other years (the same being true for μ 4) and μ 4 turns out to be higher than μ 2 (and of course higher than μ 1).

Test Problem 7 illustrates how the likelihood ratio can be applied to testing a linear restriction on the coefficient β of a linear model. It is also shown how a transformation of the test statistic leads to an exact F test as presented in Chapter 3.

Test Problem 7

Suppose Y 1,…,Y n are independent with \(Y_{i} \sim N_{1}(\beta^{\top}x_{i}, \sigma^{2})\), and \(x_{i}~\in~\mathbb {R}^{p}\).

$$H_0:{\mathcal{A}}\beta =a,\ \sigma^2\ \mbox{unknown versus}\ H_1:\ \mbox{no constraints.}$$

To get the constrained maximum likelihood estimators under H 0, let \(f(\beta,\lambda)=(y-{\mathcal{X}}\beta)^{\top}(y-{\mathcal{X}}\beta)-\lambda^{\top}({\mathcal{A}}\beta-a)\) where \(\lambda \in \mathbb {R}^{q}\) and solve \(\frac{\partial f(\beta ,\lambda)}{\partial \beta} =0\) and \(\frac{\partial f(\beta ,\lambda)}{\partial \lambda}=0\) (Exercise 3.24), thus we obtain:

$$\tilde{\beta} = \hat{\beta} - ({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{A}}^{\top}\{{\mathcal{A}}({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{A}}^{\top}\}^{-1}({\mathcal{A}}\hat{\beta}-a)$$

for β and \(\tilde{\sigma}^{2} =\frac{1}{n} (y-{\mathcal{X}}\tilde{\beta})^{\top}(y-{\mathcal{X}}\tilde{\beta})\). The estimate \(\hat{\beta}\) denotes the unconstrained MLE as before. Hence, the LR statistic is

$$-2\log \lambda =2(\ell ^*_1-\ell ^*_0)= n\log\left(\frac{\tilde{\sigma}^2}{\hat{\sigma}^2}\right) \stackrel{\mathcal{L}}{\longrightarrow}\chi^2_q,$$

where q is the number of elements of a. This problem also has an exact F-test since

$$\frac{n-p}{q}\left(\frac{\tilde{\sigma}^2}{\hat{\sigma}^2}-1\right)=\frac{n-p}{q}\,\frac{||y-{\mathcal{X}}\tilde{\beta}||^2-||y-{\mathcal{X}}\hat{\beta}||^2}{||y-{\mathcal{X}}\hat{\beta}||^2} \sim F_{q,n-p}.$$
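The constrained estimator and the exact F-test can be sketched in a few lines of Python (numpy/scipy; illustrative, not the MVAlrtest quantlet):

```python
import numpy as np
from scipy import stats

def test_Abeta(X, y, A, a):
    """Exact F test of H0: A beta = a, using the constrained
    estimator betatilde given above."""
    n, p = X.shape
    q = A.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    betahat = XtX_inv @ X.T @ y
    M = A @ XtX_inv @ A.T
    betatilde = betahat - XtX_inv @ A.T @ np.linalg.solve(M, A @ betahat - a)
    rss1 = np.sum((y - X @ betahat) ** 2)      # unconstrained
    rss0 = np.sum((y - X @ betatilde) ** 2)    # constrained
    F = (n - p) / q * (rss0 - rss1) / rss1     # ~ F_{q, n-p} under H0
    return F, stats.f.sf(F, q, n - p)
```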

Example 7.11

Let us continue with the “classic blue” pullovers. We can once more test if β=0 in the regression of sales on prices. It holds that

$$\beta=0 \quad \mbox{iff} \ (\begin{array}{c@{\quad}c}0& 1\end{array})\left(\begin{array}{c}\alpha \\ \beta\end{array}\right)=0.$$

The LR statistic here is

$$-2\log \lambda = 0.284 $$

which is not significant for the \(\chi_{1}^{2}\) distribution. The F-test statistic

$$F = 0.231 $$

is also not significant. Hence, we can assume independence of sales and prices (alone). Recall that this conclusion has to be revised if we consider the prices together with advertising costs and sales manager hours.

Recall the different conclusion that was made in Example 7.6 when we rejected H 0:α=211 and β=0. The rejection there came from the fact that the pair of values was rejected. Indeed, if β=0 the estimator of α would be \(\bar{y}= 172.70\) and this is too far from 211.

Example 7.12

Let us now consider the multivariate regression in the “classic blue” pullovers example. From Example 3.15 we know that the estimated parameters in the model

$$X_{1} = \alpha + \beta_{1}X_{2} + \beta_{2}X_{3} + \beta_{3}X_{4}+ \varepsilon $$

are

$$\hat{\alpha} = 65.670,\qquad \hat{\beta}_{1} = -0.216,\qquad \hat{\beta}_{2} = 0.485,\qquad \hat{\beta}_{3} = 0.844. $$

Hence, we could postulate the approximate relation:

$$\beta_{1} \approx -\frac{1}{2} \beta_{2}, $$

which means in practice that augmenting the price by 20 EUR requires the advertising costs to increase by 10 EUR in order to keep the number of pullovers sold constant. Vice versa, reducing the price by 20 EUR yields the same result as before if we reduced the advertising costs by 10 EUR. Let us now test whether the hypothesis

$$H_0:\ \beta_{1} = -\frac{1}{2} \beta_{2} $$

is valid. This is equivalent to

$$\left(\begin{array}{c@{\quad}c@{\quad}c@{\quad}c}0 & 1 & \frac{1}{2} & 0\end{array}\right)\left( \begin{array}{c} \alpha\\ \beta_{1} \\ \beta_{2} \\ \beta_{3} \end{array} \right) =0.$$

The LR statistic in this case is equal to ( MVAlrtest)

$$-2\log \lambda = 0.012, $$

the F statistic is

$$F = 0.007. $$

Hence, in both cases we will not reject the null hypothesis.

2.2 Comparison of Two Mean Vectors

In many situations, we want to compare two groups of individuals for whom a set of p characteristics has been observed. We have two random samples \(\{x_{i1}\}_{i=1}^{n_{1}}\) and \(\{x_{j2}\}_{j=1}^{n_{2}}\) from two distinct p-variate normal populations. Several testing issues can be addressed in this framework. In Test Problem 8 we will first test the hypothesis of equal mean vectors in the two groups under the assumption of equality of the two covariance matrices. This task can be solved by adapting Test Problem 2.

In Test Problem 9 a procedure for testing the equality of the two covariance matrices is presented. If the covariance matrices differ, the procedure of Test Problem 8 is no longer valid. If the equality of the covariance matrices is rejected, an easy rule for comparing two means with no restrictions on the covariance matrices is provided in Test Problem 10.

Test Problem 8

Assume that \(X_{i1}\sim N_{p}(\mu_{1},\Sigma)\), with i=1,…,n 1 and \(X_{j2}\sim N_{p}(\mu_{2},\Sigma)\), with j=1,…,n 2, where all the variables are independent.

$$H_0:\mu_{1} =\mu_{2},\ \mbox{versus}\ H_1:\ \mbox{no constraints.}$$

Both samples provide the statistics \(\bar{x}_{k}\) and \({\mathcal{S}}_{k}\), k=1,2. Let δ=μ 1−μ 2. We have

$$(\bar{x}_1-\bar{x}_2)\sim N_p\left(\delta,\frac{n_1+n_2}{n_1 n_2}\Sigma\right),$$
(7.11)

$$n_1{\mathcal{S}}_1+n_2{\mathcal{S}}_2\sim W_p(\Sigma,n_1+n_2-2).$$
(7.12)

Let \({\mathcal{S}}=(n_1+n_2)^{-1}(n_1 {\mathcal{S}}_1+n_2 {\mathcal{S}}_2)\) be the weighted mean of \({\mathcal{S}}_{1}\) and \({\mathcal{S}}_{2}\). Since the two samples are independent and since \({\mathcal{S}}_{k}\) is independent of \(\bar{x}_{k}\) (for k=1,2) it follows that \({\mathcal{S}}\) is independent of \((\bar{x}_{1}-\bar{x}_{2})\). Hence, Theorem 5.8 applies and leads to a T 2-distribution:

$$ \frac{n_{1}n_{2}(n_{1}+n_{2}-2)}{(n_{1}+n_{2})^2}\left\{ ( \bar{x}_{1} -\bar{x}_{2})-\delta\right\}^{\top}{\mathcal{S}}^{-1}\left\{ (\bar{x}_{1}-\bar{x}_{2} ) -\delta\right\}\sim T^2(p, n_{1}+n_{2}-2)$$
(7.13)

or

$$\left\{\left(\bar{x}_{1}-\bar{x}_{2}\right)-\delta\right\}^{\top} {\mathcal{S}}^{-1}\left\{\left(\bar{x}_{1}-\bar{x}_{2}\right)-\delta\right\}\sim \frac{p(n_{1}+n_{2})^2}{(n_{1}+n_{2}-p-1)n_{1}n_{2}} F_{p,n_{1}+n_{2}-p-1}.$$

This result, as in Test Problem 2, can be used to test H 0: δ=0 or to construct a confidence region for \(\delta \in \mathbb {R}^{p}\). The rejection region is given by:

$$ \frac{n_{1}n_{2}(n_{1}+n_{2}-p-1)}{p(n_{1}+n_{2})^2}\left(\bar{x}_{1}-\bar{x}_{2}\right)^{\top} {\mathcal{S}}^{-1}\left(\bar{x}_{1}-\bar{x}_{2}\right) \ge F_{1-\alpha ;p,n_1+n_2-p-1}.$$
(7.14)

A (1−α) confidence region for δ is given by the ellipsoid centred at \((\bar{x}_{1}-\bar{x}_{2})\)

$$\{\delta-(\bar{x}_{1}-\bar{x}_{2})\}^{\top}{\mathcal{S}}^{-1}\{\delta-(\bar{x}_{1}-\bar{x}_{2})\}\le \frac{p(n_{1}+n_{2})^2}{(n_{1}+n_{2}-p-1)(n_{1}n_{2})} F_{1-\alpha ;p,n_{1}+n_{2}-p-1},$$

and the simultaneous confidence intervals for all linear combinations a δ of the elements of δ are given by

$$a^{\top}\delta \in a^{\top}(\bar{x}_1-\bar{x}_2) \pm \sqrt{\frac{p(n_{1}+n_{2})^2}{(n_{1}+n_{2}-p-1)(n_{1}n_{2})} F_{1-\alpha ;p,n_{1}+n_{2}-p-1} a^{\top} {\mathcal{S}}a}.$$

In particular we have at the (1−α) level, for j=1,…,p,

$$ \delta_j \in(\bar{x}_{1j}-\bar{x}_{2j}) \pm \sqrt{\frac{p(n_{1}+n_{2})^2}{(n_{1}+n_{2}-p-1)(n_{1}n_{2})} F_{1-\alpha ;p,n_{1}+n_{2}-p-1} s_{jj}}.$$
(7.15)
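The rejection rule (7.14) and the intervals (7.15) are sketched below in Python (numpy/scipy; illustrative, not the MVAsimcidif quantlet). The \({\mathcal{S}}_k\) here are the MLE covariances (divisor n k), matching the pooled \({\mathcal{S}}\) of the text.

```python
import numpy as np
from scipy import stats

def two_sample_mean_test(X1, X2, alpha=0.05):
    """Two-sample test (7.14) with pooled S = (n1 S1 + n2 S2)/(n1 + n2);
    returns F, the decision and the simultaneous intervals (7.15)."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = (n1 * np.cov(X1, rowvar=False, bias=True)
         + n2 * np.cov(X2, rowvar=False, bias=True)) / (n1 + n2)
    c = n1 * n2 * (n1 + n2 - p - 1) / (p * (n1 + n2) ** 2)
    F = c * d @ np.linalg.solve(S, d)
    crit = stats.f.ppf(1 - alpha, p, n1 + n2 - p - 1)
    half = np.sqrt(crit * np.diag(S) / c)     # half-widths of (7.15)
    return F, F > crit, np.column_stack([d - half, d + half])
```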

Example 7.13

Let us come back to the questions raised in Example 7.5. We compare the means of assets (X 1) and of sales (X 2) for two sectors, energy (group 1) and manufacturing (group 2). With n 1=15, n 2=10, and p=2 we obtain the statistics:

$$\bar{x}_1= \left( \begin{array}{l}4084.0 \\ 2580.5 \end{array} \right) ,\qquad \bar{x}_2=\left( \begin{array}{c}4307.2 \\ 4925.2 \end{array} \right)$$

and

$${\mathcal{S}}_1= 10^7\left(\begin{array}{c@{\quad}c}1.6635 & 1.2410\\1.2410 & 1.3747\end{array}\right),\qquad {\mathcal{S}}_2=10^7\left(\begin{array}{c@{\quad}c}1.2248 & 1.1425\\1.1425 & 1.5112\end{array}\right),$$

so that

$${\mathcal{S}}=10^7\left(\begin{array}{c@{\quad}c}1.4880 & 1.2016\\1.2016 & 1.4293\end{array}\right).$$

The observed value of the test statistic (7.14) is F=2.7036. Since F 0.95;2,22=3.4434 the hypothesis of equal means of the two groups is not rejected although it would be rejected at a less severe level (F>F 0.90;2,22=2.5613). By directly applying (7.15), the 95% simultaneous confidence intervals for the differences ( MVAsimcidif) are obtained as:

Example 7.14

In order to illustrate the presented test procedures it is interesting to analyse some simulated data. This simulation will point out the importance of the covariances in testing means. We created 2 independent normal samples in \(\mathbb {R}^{4}\) of sizes n 1=30 and n 2=20 with:

One may consider this as an example of X=(X 1,…,X n ) being the students’ scores from 4 tests, where the 2 groups of students were subjected to two different methods of teaching. First we simulate the two samples with \(\Sigma={{\mathcal{I}}}_{4}\) and obtain the statistics:

The test statistic (7.14) takes the value F=60.65 which is highly significant: the small variance allows the difference to be detected even with these relatively moderate sample sizes. We conclude (at the 95% level) that:

which confirms that the means for X 1 and X 4 are different.

Consider now a different simulation scenario where the standard deviations are 4 times larger: \(\Sigma=16 {\mathcal{I}}_{4}\). Here we obtain:

Now the test statistic takes the value 1.54 which is no longer significant (F 0.95,4,45=2.58). Now we cannot reject the null hypothesis (which we know to be false!) since the increase in variances prohibits the detection of differences of such magnitude.

The following situation illustrates once more the role of the covariances between covariates. Suppose that \(\Sigma=16 {\mathcal{I}}_{4}\) as above but with σ 14=σ 41=−3.999 (this corresponds to a negative correlation r 41=−0.9997). We have:

The value of F is 3.853 which is significant at the 5% level (\(\mbox{$p$-value} = 0.0089\)). So the null hypothesis δ=μ 1μ 2=0 is outside the 95% confidence ellipsoid. However, the simultaneous confidence intervals, which do not take the covariances into account are given by:

They contain the null value (see Remark 7.1 above) although they are very asymmetric for δ 1 and δ 4.

Example 7.15

Let us compare the vectors of means of the forged and the genuine bank notes. The matrices \({\mathcal{S}}_{f}\) and \({\mathcal{S}}_{g}\) were given in Example 3.1 and since here n f =n g =100, \({\mathcal{S}}\) is the simple average of \({\mathcal{S}}_{f}\) and \({\mathcal{S}}_{g}: {\mathcal{S}}=\frac{1}{2}({\mathcal{S}}_{f}+{\mathcal{S}}_{g})\).

The test statistic is given by (7.14) and turns out to be F=391.92 which is highly significant for F 6,193. The 95% simultaneous confidence intervals for the differences δ j =μ gj μ fj , j=1,…,p are:

All of the components (except for the first one) show significant differences in the means. The main effects are taken by the lower border (X 4) and the diagonal (X 6).

The preceding test implicitly uses the fact that the two samples are extracted from two different populations with common variance Σ. In this case, the test statistic (7.14) measures the distance between the two centres of gravity of the two groups w.r.t. the common metric given by the pooled variance matrix \({\mathcal{S}}\). If \(\Sigma_{1}\not= \Sigma_{2}\) no such matrix exists. There are no satisfactory test procedures for the equality of variance matrices which are robust against departures from normality of the populations. The following test extends Bartlett’s test for equality of variances in the univariate case, but it is known to be very sensitive to departures from normality.

Test Problem 9

(Comparison of Covariance Matrices)

Let \(X_{ih}\sim N_{p}(\mu_{h},\Sigma_{h})\), i=1,…,n h , h=1,…,k, be independent random variables,

$$H_0:\Sigma_1 =\Sigma_2 =\cdots=\Sigma_k\ \mbox{versus}\ H_1:\ \mbox{no constraints.}$$

Each sub-sample provides \({\mathcal{S}}_{h}\), an estimator of Σ h , with

$$n_{h}S_{h}\sim W_{p}(\Sigma_{h},n_{h}-1).$$

Under H 0, \(\sum_{h=1}^{k}n_{h}{\mathcal{S}}_{h}\sim W_{p}(\Sigma,n-k)\) (Section 5.2), where Σ is the common covariance matrix of the X ih and \(n=\sum_{h=1}^{k}n_{h}\). Let \({\mathcal{S}}=\frac{n_{1}{\mathcal{S}}_{1}+\cdots+n_{k}{\mathcal{S}}_{k}}{n}\) be the weighted average of the \({\mathcal{S}}_{h}\) (this is in fact the MLE of Σ when H 0 is true). The likelihood ratio test leads to the statistic

$$ -2\log\lambda = n\log| {\mathcal{S}}|-\sum_{h=1}^{k}n_{h}\log|{\mathcal{S}}_{h}|$$
(7.16)

which under H 0 is approximately distributed as a \(\chi_{m}^{2}\) where \(m=\frac{1}{2}(k-1)p(p+1)\).
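A minimal Python sketch of (7.16) for k groups (numpy/scipy; illustrative only; the samples argument is assumed to be a list of row-wise data arrays):

```python
import numpy as np
from scipy import stats

def test_equal_covariances(samples):
    """Statistic (7.16): -2 log(lambda) = n log|S| - sum_h n_h log|S_h|,
    approximately chi^2 with m = (k-1) p (p+1)/2 degrees of freedom."""
    ns = [X.shape[0] for X in samples]
    p = samples[0].shape[1]
    n = sum(ns)
    Ss = [np.cov(X, rowvar=False, bias=True) for X in samples]  # MLE cov
    S = sum(nh * Sh for nh, Sh in zip(ns, Ss)) / n
    stat = n * np.linalg.slogdet(S)[1] - sum(
        nh * np.linalg.slogdet(Sh)[1] for nh, Sh in zip(ns, Ss))
    m = (len(samples) - 1) * p * (p + 1) // 2
    return stat, stats.chi2.sf(stat, df=m)
```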

Example 7.16

Let’s come back to Example 7.13, where the means of assets and sales have been compared for companies from the energy and manufacturing sectors assuming that Σ1=Σ2. The test of H 0:Σ1=Σ2 leads to the value of the test statistic

$$ -2\log\lambda=0.9076$$
(7.17)

which is not significant (the p-value for a \(\chi^{2}_{3}\) distribution is 0.82). We cannot reject H 0 and the comparison of the means performed above is valid.

Example 7.17

Let us compare the covariance matrices of the forged and the genuine bank notes (the matrices S f and S g are shown in Example 3.1). A first look seems to suggest that Σ1≠Σ2. The pooled variance \({\mathcal{S}}\) is given by \({\mathcal{S}}=\frac{1}{2}({\mathcal{S}}_{f}+{\mathcal{S}}_{g})\) since here n f =n g . The test statistic here is −2logλ=127.21, which is highly significant for the χ 2 distribution with 21 degrees of freedom. As expected, we reject the hypothesis of equal covariance matrices, and as a result the procedure for comparing the two means in Example 7.15 is not valid.

What can we do with unequal covariance matrices? When both n 1 and n 2 are large, we have a simple solution:

Test Problem 10

(Comparison of two means, unequal covariance matrices, large samples)

Assume that \(X_{i1}\sim N_{p}(\mu_{1},\Sigma_{1})\), with i=1,…,n 1 and \(X_{j2}\sim N_{p}(\mu_{2},\Sigma_{2})\), with j=1,…,n 2, are independent random variables.

$$H_0:\mu_{1} =\mu_{2}\ \mbox{versus}\ H_1:\ \mbox{no constraints.}$$

Letting δ=μ 1−μ 2, we have

$$(\bar{x}_1-\bar{x}_2)\sim N_p\left(\delta,\frac{\Sigma_1}{n_1}+\frac{\Sigma_2}{n_2}\right).$$

Therefore, by (5.4)

$$(\bar{x}_1-\bar{x}_2)^{\top}\left(\frac{\Sigma_1}{n_1}+\frac{\Sigma_2}{n_2}\right)^{-1}(\bar{x}_1-\bar{x}_2)\sim \chi^2_p.$$

Since \({\mathcal{S}}_{i}\) is a consistent estimator of Σ i for i=1,2, we have

$$ (\bar{x}_1-\bar{x}_2)^{\top}\left(\frac{{\mathcal{S}}_1}{n_1}+\frac{{\mathcal{S}}_2}{n_2}\right)^{-1}(\bar{x}_1-\bar{x}_2) \stackrel{{\mathcal{L}}}{\to} \chi^2_p.$$
(7.18)

This can be used in place of (7.13) for testing H 0, defining a confidence region for δ or constructing simultaneous confidence intervals for δ j ,j=1,…,p.

For instance, the rejection region at the level α will be

$$ (\bar{x}_1-\bar{x}_2)^{\top}\left(\frac{{\mathcal{S}}_1}{n_1}+\frac{{\mathcal{S}}_2}{n_2}\right)^{-1}(\bar{x}_1-\bar{x}_2)> \chi^2_{1-\alpha ;p}$$
(7.19)

and the (1−α) simultaneous confidence intervals for δ j , j=1,…,p are:

$$ \delta_j \in (\bar{x}_{1j}-\bar{x}_{2j}) \pm \sqrt{\chi^2_{1-\alpha ;p}\left(\frac{s^{(1)}_{jj}}{n_1}+\frac{s^{(2)}_{jj}}{n_2}\right)}$$
(7.20)

where \(s^{(i)}_{jj}\) is the (j,j) element of the matrix \({\mathcal{S}}_{i}\). This may be compared to (7.15) where the pooled variance was used.
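A minimal Python sketch of (7.19) and (7.20) (numpy/scipy; illustrative only, names are our own):

```python
import numpy as np
from scipy import stats

def two_sample_large_n(X1, X2, alpha=0.05):
    """Large-sample test (7.19) with unequal covariance matrices,
    plus the simultaneous intervals (7.20)."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    V = (np.cov(X1, rowvar=False, bias=True) / n1
         + np.cov(X2, rowvar=False, bias=True) / n2)
    stat = d @ np.linalg.solve(V, d)
    crit = stats.chi2.ppf(1 - alpha, df=p)
    half = np.sqrt(crit * np.diag(V))
    return stat, stat > crit, np.column_stack([d - half, d + half])
```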

Remark 7.2

We see, by comparing the statistics (7.19) with (7.14), that we measure here the distance between \(\bar{x}_{1}\) and \(\bar{x}_{2}\) using the metric \((\frac{{\mathcal{S}}_{1}}{n_{1}}+\frac{{\mathcal{S}}_{2}}{n_{2}})\). It should be noted that when n 1=n 2, the two methods are essentially the same since then \({\mathcal{S}}=\frac{1}{2}({\mathcal{S}}_{1}+{\mathcal{S}}_{2})\). If the covariances are different but have the same eigenvectors (different eigenvalues), one can apply the common principal component (CPC) technique, see Chapter 10.

Example 7.18

Let us use the last test to compare the forged and the genuine bank notes again (n 1 and n 2 are both large). The test statistic (7.19) turns out to be 2436.8 which is again highly significant. The 95% simultaneous confidence intervals are:

showing that all the components except the first are different from zero, the largest difference coming from X 6 (length of the diagonal) and X 4 (lower border). The results are very similar to those obtained in Example 7.15. This is due to the fact that here n 1=n 2 as we already mentioned in the remark above.

2.3 Profile Analysis

Another useful application of Test Problem 6 is the repeated measurements problem applied to two independent groups. This problem arises in practice when we observe repeated measurements of characteristics (or measures of the same type under different experimental conditions) on the different groups which have to be compared. It is important that the p measures (the “profile”) are comparable, and, in particular, are reported in the same units. For instance, they may be measures of blood pressure at p different points in time, one group being the control group and the other the group receiving a new treatment. The observations may be the scores obtained from p different tests of two different experimental groups. One is then interested in comparing the profiles of each group: the profile being just the vectors of the means of the p responses (the comparison may be visualised in a two dimensional graph using the parallel coordinate plot introduced in Section 1.7).

We are thus in the same statistical situation as for the comparison of two means:

$$X_{i1}\sim N_p(\mu_1,\Sigma),\quad i=1,\dots,n_1,\qquad X_{j2}\sim N_p(\mu_2,\Sigma),\quad j=1,\dots,n_2,$$

where all variables are independent. Suppose the two population profiles look like in Figure 7.1.

Fig. 7.1 Example of population profiles  MVAprofil

The following questions are of interest:

  1. Are the profiles similar in the sense of being parallel (which means no interaction between the treatments and the groups)?

  2. If the profiles are parallel, are they at the same level?

  3. If the profiles are parallel, is there any treatment effect, i.e., are the profiles horizontal (do the profiles remain the same no matter which treatment is received)?

The above questions are easily translated into linear constraints on the means and a test statistic can be obtained accordingly.

2.3.1 Parallel Profiles

Let \({\mathcal{C}}\) be a (p−1)×p matrix defined as

$$\mathcal{C}=\left(\begin{array}{c@{\quad}r@{\quad}r@{\quad}c@{\quad}r}1 &-1 & 0 & \cdots & 0\\0 & 1 &-1 & \cdots & 0\\\vdots & \vdots & \vdots & \vdots & \vdots\\0 &\cdots & 0 &1 &-1\end{array}\right).$$

The hypothesis to be tested is

$$H_0^{(1)}: \, {\mathcal{C}}(\mu_1-\mu_2)=0.$$

From (7.11), (7.12) and Corollary 5.4 we know that under \(H_0^{(1)}\):

$$ \frac{n_1n_2(n_1+n_2-2)}{(n_1+n_2)^2}\left\{{\mathcal{C}}(\bar{x}_1-\bar{x}_2)\right\}^{\top}({\mathcal{C}}{\mathcal{S}}{\mathcal{C}}^{\top})^{-1}{\mathcal{C}}(\bar{x}_1-\bar{x}_2)\sim T^2(p-1,n_1+n_2-2)$$
(7.21)

where \({\mathcal{S}}\) is the pooled covariance matrix. The hypothesis is rejected if

$$\frac{n_1n_2(n_1+n_2-p)}{(n_1+n_2)^2(p-1)}\left\{{\mathcal{C}}(\bar{x}_1-\bar{x}_2)\right\}^{\top}({\mathcal{C}}{\mathcal{S}}{\mathcal{C}}^{\top})^{-1}{\mathcal{C}}(\bar{x}_1-\bar{x}_2)>F_{1-\alpha ;p-1,n_1+n_2-p}.$$

2.3.2 Equality of Two Levels

The question of equality of the two levels is meaningful only if the two profiles are parallel. In the case of interactions (rejection of \(H_{0}^{(1)}\)), the two populations react differently to the treatments and the question of the level has no meaning.

The equality of the two levels can be formalised as

$$H_0^{(2)}: 1_p^{\top}(\mu_1-\mu_2) = 0$$

since

$$1_p^{\top}(\bar{x}_1-\bar{x}_2) \sim N_1\left(1_p^{\top}(\mu_1-\mu_2),\frac{n_1+n_2}{n_1n_2}1_p^{\top}\Sigma 1_p\right)$$

and

$$(n_1+n_2)1_p^{\top} {\mathcal{S}}1_p \sim W_1(1_p^{\top}\Sigma 1_p,n_1+n_2-2).$$

Using Corollary 5.4 we have that:

$$ \frac{n_1n_2(n_1+n_2-2)}{(n_1+n_2)^2}\,\frac{\{1_p^{\top}(\bar{x}_1-\bar{x}_2)\}^2}{1_p^{\top} {\mathcal{S}}1_p}\sim T^2(1,n_1+n_2-2)=F_{1,n_1+n_2-2}.$$
(7.22)

The rejection region is

$$\frac{n_1n_2(n_1+n_2-2)}{(n_1+n_2)^2}\frac{\{1_p^{\top}(\bar{x}_1-\bar{x}_2)\}^2}{1_p^{\top} {\mathcal{S}}1_p} > F_{1-\alpha ; 1,n_1+n_2-2}.$$

2.3.3 Treatment Effect

If it is rejected that the profiles are parallel, then two independent analyses should be done on the two groups using the repeated measurement approach. But if it is accepted that they are parallel, then we can exploit the information contained in both groups (possibly at different levels) to test a treatment effect, i.e., if the two profiles are horizontal. This may be written as:

$$H_0^{(3)}: {\mathcal{C}}(\mu_1+\mu_2)=0.$$

Consider the average profile \(\bar{x}\)

$$\bar{x}=\frac{n_1\bar{x}_1+n_2\bar{x}_2}{n_1+n_2}.$$

Clearly,

$$\bar{x}\sim N_p\left(\frac{n_1\mu_1+n_2\mu_2}{n_1+n_2}, \frac{1}{n_1+n_2}\Sigma\right).$$

Now it is not hard to prove that \(H_{0}^{(3)}\) with \(H_{0}^{(1)}\) implies that

$${\mathcal{C}}\left(\frac{n_1\mu_1+n_2\mu_2}{n_1+n_2}\right)=0.$$

So under parallel, horizontal profiles we have

$$\sqrt{n_1+n_2}\,{\mathcal{C}}\bar{x}\sim N_{p-1}(0, {\mathcal{C}}\Sigma {\mathcal{C}}^{\top}).$$

From Corollary 5.4 we again obtain

$$ (n_1+n_2-2)({\mathcal{C}}\bar{x})^{\top}({\mathcal{C}}{\mathcal{S}}{\mathcal{C}}^{\top})^{-1}{\mathcal{C}}\bar{x}\sim T^2(p-1, n_1+n_2-2).$$
(7.23)

This leads to the rejection region of \(H_{0}^{(3)}\), namely

$$\frac{n_1+n_2-p}{p-1}({\mathcal{C}}\bar{x})^{\top}({\mathcal{C}}{\mathcal{S}}{\mathcal{C}}^{\top})^{-1}{\mathcal{C}}\bar{x}>F_{1-\alpha; p-1,n_1+n_2-p}.$$
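The three profile tests can be collected in one routine. Below is an illustrative Python sketch (numpy/scipy; not the book's quantlet) implementing the rejection statistics for parallelism, equal levels and horizontality as derived above.

```python
import numpy as np
from scipy import stats

def profile_analysis(X1, X2):
    """Returns (F, p-value) for the three profile hypotheses in turn,
    with C the successive-difference matrix and S the pooled MLE covariance."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
    x1, x2 = X1.mean(axis=0), X2.mean(axis=0)
    S = (n1 * np.cov(X1, rowvar=False, bias=True)
         + n2 * np.cov(X2, rowvar=False, bias=True)) / (n1 + n2)
    CSC = C @ S @ C.T
    # (1) parallel profiles: H0: C(mu1 - mu2) = 0
    d = C @ (x1 - x2)
    c1 = n1 * n2 * (n1 + n2 - p) / ((n1 + n2) ** 2 * (p - 1))
    F1 = c1 * (d @ np.linalg.solve(CSC, d))
    # (2) equal levels: H0: 1'(mu1 - mu2) = 0
    one = np.ones(p)
    c2 = n1 * n2 * (n1 + n2 - 2) / (n1 + n2) ** 2
    F2 = c2 * (one @ (x1 - x2)) ** 2 / (one @ S @ one)
    # (3) horizontal profiles: H0: C(mu1 + mu2) = 0, via the average profile
    g = C @ ((n1 * x1 + n2 * x2) / (n1 + n2))
    F3 = (n1 + n2 - p) / (p - 1) * g @ np.linalg.solve(CSC, g)
    return ((F1, stats.f.sf(F1, p - 1, n1 + n2 - p)),
            (F2, stats.f.sf(F2, 1, n1 + n2 - 2)),
            (F3, stats.f.sf(F3, p - 1, n1 + n2 - p)))
```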

Example 7.19

Morrison (1990b) proposed a test in which the results of 4 sub-tests of the Wechsler Adult Intelligence Scale (WAIS) are compared for 2 categories of people: group 1 contains n 1=37 people who do not have a senile factor and group 2 contains n 2=12 people who have a senile factor. The four WAIS sub-tests are X 1 (information), X 2 (similarities), X 3 (arithmetic) and X 4 (picture completion). The relevant statistics are

The test statistic for testing if the two profiles are parallel is F=0.4634, which is not significant (p-value =0.71). Thus it is accepted that the two profiles are parallel. The second test statistic (testing the equality of the levels of the 2 profiles) is F=17.21, which is highly significant (\(\mbox{$p$-value}\approx 10^{-4}\)). The global level of the test for the non-senile people is superior to that of the senile group. The final test (testing the horizontality of the average profile) has the test statistic F=53.32, which is also highly significant (\(\mbox{$p$-value}\approx 10^{-14}\)). This implies that there are substantial differences among the means of the different subtests.


3 Boston Housing

Returning to the Boston housing data set, we are now in a position to test if the means of the variables vary according to their location, for example, when they are located in a district with high valued houses. In Chapter 1, we built 2 groups of observations according to the value of X 14 being less than or equal to the median of X 14 (a group of 256 districts) and greater than the median (a group of 250 districts). In what follows, we use the transformed variables motivated in Section 1.9.

The equality of the means of the two groups can be tested in the multivariate setup developed above, so we restrict the analysis to the variables X 1, X 5, X 8, X 11, and X 13 to see if the differences between the two groups that were identified in Chapter 1 can be confirmed by a formal test. As in Test Problem 8, the hypothesis to be tested is

$$H_0:\ \mu_1=\mu_2,\quad\mbox{where}\ \mu_1\in \mathbb {R}^5, n_1=256,\ \mbox{and}\ n_2=250.$$

Σ is not known. The F-statistic given in (7.14) is equal to 126.30, which is much higher than the critical value F 0.95;5,500=2.23. Therefore, we reject the hypothesis of equal means.

To see which component, X 1, X 5, X 8, X 11, or X 13, is responsible for this rejection, take a look at the simultaneous confidence intervals defined in (7.15):

These confidence intervals confirm that all of the δ j are significantly different from zero (note there is a negative effect for X 8: weighted distances to employment centres)  MVAsimcibh.

We could also check if the factor “being bounded by the river” (variable X 4) has some effect on the other variables. To do this compare the means of (X 5,X 8,X 9,X 12,X 13,X 14). There are two groups: n 1=35 districts bounded by the river and n 2=471 districts not bounded by the river. Test Problem 8 (H 0:μ 1=μ 2) is applied again with p=6. The resulting test statistic, F=5.81, is highly significant (F 0.95;6,499=2.12). The simultaneous confidence intervals indicate that only X 14 (the value of the houses) is responsible for the hypothesis being rejected. At a significance level of 0.95

3.1 Testing Linear Restrictions

In Chapter 3 a linear model was proposed that explained the variations of the price X 14 by the variations of the other variables. Using the same procedure that was shown in Test Problem 7, we are in a position to test a set of linear restrictions on the vector of regression coefficients β.

The model we estimated in Section 3.7 provides the following ( MVAlinregbh):

Variable    \(\hat{\beta}_{j}\)    \(\mathit{SE}(\hat{\beta}_{j})\)    t    p-value
constant    4.1769    0.3790    11.020    0.0000
X 1    −0.0146    0.0117    −1.254    0.2105
X 2    0.0014    0.0056    0.247    0.8051
X 3    −0.0127    0.0223    −0.570    0.5692
X 4    0.1100    0.0366    3.002    0.0028
X 5    −0.2831    0.1053    −2.688    0.0074
X 6    0.4211    0.1102    3.822    0.0001
X 7    0.0064    0.0049    1.317    0.1885
X 8    −0.1832    0.0368    −4.977    0.0000
X 9    0.0684    0.0225    3.042    0.0025
X 10    −0.2018    0.0484    −4.167    0.0000
X 11    −0.0400    0.0081    −4.946    0.0000
X 12    0.0445    0.0115    3.882    0.0001
X 13    −0.2626    0.0161    −16.320    0.0000

Recall that the estimated residuals \(Y-{{\mathcal{X}}}\widehat{\beta}\) did not show a big departure from normality, which means that the testing procedure developed above can be used.

  1.

    First a global test of significance for the regression coefficients is performed,

    $$H_0:\ (\beta_1,\dots,\beta_{13})=0.$$

    This is obtained by defining \({{\mathcal{A}}}=(0_{13},{{\mathcal{I}}}_{13})\) and a=013 so that H 0 is equivalent to \({{\mathcal{A}}}\beta=a\) where \(\beta=(\beta_0,\beta_1,\dots,\beta_{13})^{\top}\). Based on the observed values, the test statistic is F=123.20. This is highly significant (F 0.95;13,492=1.7401), thus we reject H 0. Note that under H 0, \(\widehat{\beta}_{H_{0}}=(3.0345,0,\dots,0)^{\top}\) where \(3.0345=\overline{y}\).

  2.

    Since we are interested in the effect that being located close to the river has on the value of the houses, the second test is H 0:β 4=0. This is done by fixing

    $${{\mathcal{A}}}=(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)^\top $$

    and a=0 to obtain the equivalent hypothesis \(H_{0}: {{\mathcal{A}}}\beta = a\). The result is again significant: F=9.0125 (F 0.95;1,492=3.8604) with a p-value of 0.0028. Note that this is the same p-value obtained in the individual test β 4=0 in Chapter 3, computed using a different setup.

  3.

    A third test addresses the fact that some of the regressors in the full model (3.57) appear to be insignificant (that is, they have high individual p-values). It can be confirmed from a joint test whether the corresponding reduced model, formulated by deleting the insignificant variables, is rejected by the data. We want to test H 0:β 1=β 2=β 3=β 7=0. Hence,

    $${{\mathcal{A}}}=\left(\begin{array}{@{}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{}}0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\end{array}\right)$$

    and a=04. The test statistic is 0.9344, which is not significant for F 4,492. Given that the p-value is equal to 0.44, we cannot reject the null hypothesis nor the corresponding reduced model. The value of \(\widehat{\beta}\) under the null hypothesis is

    A possible reduced model is

    $$X_{14}=\beta_0+\beta_{4}X_{4}+\beta_{5}X_{5}+\beta_{6}X_{6}+\beta_{8}X_{8}+\cdots+\beta_{13}X_{13}+\varepsilon.$$

    Estimating this reduced model using OLS, as was done in Chapter 3, provides the results shown in Table 7.1.

    Table 7.1 Linear regression for Boston housing data set  MVAlinreg2bh

    Note that the reduced model has \(r^2=0.763\) which is very close to \(r^2=0.765\) obtained from the full model. Clearly, including variables X 1,X 2,X 3, and X 7 does not provide valuable information in explaining the variation of X 14, the price of the houses.

4 Exercises

Exercise 7.1

Use Theorem 7.1 to derive a test for testing the hypothesis that a die is balanced, based on n tosses of that die. (Hint: use the multinomial probability function.)

Exercise 7.2

Consider N 3(μ,Σ). Formulate the hypothesis H 0:μ 1=μ 2=μ 3 in terms of \({\mathcal{A}}\mu=a\).

Exercise 7.3

Simulate a normal sample with mean μ and covariance Σ and test \(H_0:2\mu_1-\mu_2=0.2\), first with Σ known and then with Σ unknown. Compare the results.

Exercise 7.4

Derive expression (7.3) for the likelihood ratio test statistic in Test Problem 2.

Exercise 7.5

With the simulated data set of Example 7.14, test the hypothesis of equality of the covariance matrices.

Exercise 7.6

In the U.S. companies data set, test the equality of means between the energy and manufacturing sectors, taking the full vector of observations X 1 to X 6. Derive the simultaneous confidence intervals for the differences.

Exercise 7.7

Let \(X\sim N_{2}(\mu,\Sigma)\) where Σ is known. We have an i.i.d. sample of size n=6 providing \(\bar{x}^{\top} = (1\ \frac{1}{2})\). Solve the following test problems (α=0.05):

$$\begin{array}{l@{\quad}l@{\ }l@{\qquad}l@{\ }l}\mbox{a)} & H_0: & \mu =\bigl(2, \frac{2}{3}\bigr)^{\top} & H_1: & \mu \neq\bigl(2, \frac{2}{3}\bigr)^{\top}\\[4pt]\mbox{b)} & H_0: & \mu_1+\mu_2=\frac{7}{2} & H_1: & \mu_1+\mu_2\neq\frac{7}{2}\\[4pt]\mbox{c)} & H_0: & \mu_1-\mu_2=\frac{1}{2} & H_1: & \mu_1-\mu_2\neq\frac{1}{2}\\[4pt]\mbox{d)} & H_0: & \mu_1=2 & H_1: & \mu_1\neq 2.\end{array}$$

For each case, represent the rejection region graphically (comment).

Exercise 7.8

Repeat the preceding exercise with Σ unknown, using the empirical covariance matrix \({\mathcal{S}}\). Compare the results.

Exercise 7.9

Consider XN 3(μ,Σ). An i.i.d. sample of size n=10 provides:

  a)

    Knowing that the eigenvalues of S are integers, describe a 95% confidence region for μ. (Hint: to compute eigenvalues use \(|S|=\prod_{j=1}^{3}\lambda_{j}\) and \(\mbox{tr}(S)= \sum_{j=1}^{3}\lambda_{j}\).)

  b)

    Calculate the simultaneous confidence intervals for μ 1, μ 2 and μ 3.

  c)

    Can we assert that μ 1 is an average of μ 2 and μ 3?

Exercise 7.10

Consider two independent i.i.d. samples, each of size 10, from two bivariate normal populations. The results are summarised below:

Provide a solution to the following tests:

$$\begin{array}{l@{\quad}l@{\ }l@{\qquad}l@{\ }l}\mbox{a)} & H_0: & \mu_1 = \mu_2 & H_1: & \mu_1 \not= \mu_2 \\[3pt]\mbox{b)} & H_0: & \mu_{11}= \mu_{21} & H_1: & \mu_{11}\not= \mu_{21} \\[3pt]\mbox{c)} & H_0: & \mu_{12}= \mu_{22} & H_1: & \mu_{12}\not= \mu_{22}.\end{array}$$

Compare the solutions and comment.

Exercise 7.11

Prove expression (7.4) in Test Problem 2, starting from the log-likelihoods \(\ell_{0}^{*}\) and \(\ell_{1}^{*}\). (Hint: use (2.29).)

Exercise 7.12

Assume that XN p (μ,Σ) where Σ is unknown.

  a)

    Derive the log likelihood ratio test for testing the independence of the p components, that is H 0:Σ is a diagonal matrix. (Solution: \(-2\log \lambda =-n\log |{\mathcal{R}}|\) where \({\mathcal{R}}\) is the correlation matrix, which is asymptotically a \(\chi^{2}_{\frac{1}{2}p(p-1)}\) under H 0.)

  b)

    Assume that Σ is a diagonal matrix (all the variables are independent). Can an asymptotic test for H 0:μ=μ o against \(H_1:\mu \not= \mu_o\) be derived? How would this compare to p independent univariate t-tests on each μ j ?

  c)

    Show an easy derivation of an asymptotic test for testing the equality of the p means (Hint: use \((C\bar{X})^{\top}(\mathit{CSC}^{\top})^{-1}C\bar{X} \to \chi^{2}_{p-1}\) where \({\mathcal{S}} =\mbox{diag}(s_{11},\ldots ,s_{pp})\) and \(\mathcal{C}\) is defined as in (7.10).) Compare this to the simple ANOVA procedure used in Section 3.5.

Exercise 7.13

The yields of wheat have been measured in 30 parcels that have been randomly attributed to 3 lots prepared by one of 3 different fertilisers A, B and C. The data are

Fertilizer yield

      A   B   C
  1   4   6   2
  2   3   7   1
  3   2   7   1
  4   5   5   1
  5   4   5   3
  6   4   5   4
  7   3   8   3
  8   3   9   3
  9   3   9   2
 10   1   6   2

Using Exercise 7.12,

  a)

    test the independence between the 3 variables.

  b)

    test whether \(\mu^{\top} =(2\ 6\ 4)\) and compare this to the 3 univariate t-tests.

  c)

    test whether μ 1=μ 2=μ 3 using simple ANOVA and the χ 2 approximation.

Exercise 7.14

Consider an i.i.d. sample of size n=5 from a bivariate normal distribution

$$X\sim N_2\left(\mu,\left(\begin{array}{c@{\quad}c}3 & \rho\\\rho & 1\end{array}\right)\right)$$

where ρ is a known parameter. Suppose \(\bar{x}^{\top} =(1\ 0)\). For what value of ρ would the hypothesis \(H_0:\ \mu^{\top} =(0\ 0)\) be rejected in favour of \(H_{1}:\ \mu^{\top} \not= (0\ 0)\) (at the 5% level)?

Exercise 7.15

Using Example 7.14, test the last two cases described there and test the sample number one (n 1=30), to see if they are from a normal population with \(\Sigma =4{\mathcal{I}}_{4}\) (the sample covariance matrix to be used is given by S 1).

Exercise 7.16

Consider the bank data set. For the counterfeit bank notes, we want to know if the length of the diagonal (X 6) can be predicted by a linear model in X 1 to X 5. Estimate the linear model and test if the coefficients are significantly different from zero.

Exercise 7.17

In Example 7.10, can you predict the vocabulary score of the children in eleventh grade, by knowing the results from grades 8–9 and 10? Estimate a linear model and test its significance.

Exercise 7.18

Test the equality of the covariance matrices from the two groups in the WAIS subtest (Example 7.19).

Exercise 7.19

Prove expressions (7.21), (7.22) and (7.23).

Exercise 7.20

Using Theorem 6.3 and expression (7.16), construct an asymptotic rejection region of size α for testing, in a general model f(x,θ), with \(\theta \in \mathbb {R}^{k}\), H 0:θ=θ 0 against \(H_{1}: \theta \not= \theta_{0}\).

Exercise 7.21

Exercise 6.5 considered the pdf \(f(x_{1},x_{2})=\frac{1}{\theta_{1}^{2}\theta_{2}^{2} x_{2}}e^{-(\frac{x_{1}}{\theta_{1}x_{2}}+\frac{x_{2}}{\theta_{1}\theta_{2}})}\), x 1,x 2>0. Solve the problem of testing \(H_{0}:\theta^{\top} =(\theta_{01},\theta_{02})\) from an i.i.d. sample of size n on \(x=(x_{1},x_{2})^{\top}\), where n is large.

Exercise 7.22

In Olkin and Veath (1980), the evolution of citrate concentrations in plasma is observed at 3 different times of day, X 1 (8 am), X 2 (11 am) and X 3 (3 pm), for two groups of patients who follow different diets. (The patients were randomly attributed to each group under a balanced design n 1=n 2=5.) The data are:

Group   X 1 (8 am)   X 2 (11 am)   X 3 (3 pm)
I       125          137           121
        144          173           147
        105          119           125
        151          149           128
        137          139           109
II       93          121           107
        116          135           106
        109           83           100
         89           95            83
        116          128           100

Test if the profiles of the groups are parallel, if they are at the same level and if they are horizontal.