
This chapter serves as a reminder of basic concepts of matrix algebra, which are particularly useful in multivariate analysis. It also introduces the notations used in this book for vectors and matrices. Eigenvalues and eigenvectors play an important role in multivariate techniques. In Sections 2.2 and 2.3, we present the spectral decomposition of matrices and consider the maximisation (minimisation) of quadratic forms given some constraints.

In analyzing the multivariate normal distribution, partitioned matrices appear naturally. Some of the basic algebraic properties are given in Section 2.5. These properties will be heavily used in Chapters 4 and 5.

The geometry of the multinormal and the geometric interpretation of the multivariate techniques (Part III) intensively uses the notion of angles between two vectors, the projection of a point on a vector and the distances between two points. These ideas are introduced in Section 2.6.

1 Elementary Operations

A matrix \({{\mathcal{A}}} \) is a system of numbers with n rows and p columns:

$${{{\mathcal{A}}}} = \left (\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}a_{11}&a_{12}&\ldots&\ldots &\ldots &a_{1p}\\\vdots&a_{22}&&&&\vdots\\\vdots&\vdots&\ddots&&&\vdots\\\vdots&\vdots&&\ddots&&\vdots\\\vdots&\vdots&&&\ddots&\vdots\\a_{n1}&a_{n2}&\ldots &\ldots &\ldots &a_{np}\end{array}\right ).$$

We also write \((a_{ij})\) for \({{\mathcal{A}}}\) and \({{\mathcal{A}}}(n\times p)\) to indicate the numbers of rows and columns. Vectors are matrices with one column and are denoted as x or x(p×1). Special matrices and vectors are defined in Table 2.1. Note that we use small letters for scalars as well as for vectors.

Table 2.1 Special matrices and vectors

1.1 Matrix Operations

Elementary operations are summarised below:

1.2 Properties of Matrix Operations

1.3 Matrix Characteristics

1.3.1 Rank

The rank, \(\mathop {\rm {rank}}({{\mathcal{A}}})\), of a matrix \({\mathcal{A}}(n\times p)\) is defined as the maximum number of linearly independent rows (columns). A set of k rows \(a_{j}\) of \({\mathcal{A}}(n\times p)\) is said to be linearly independent if \(\sum_{j=1}^{k} c_{j} a_{j}=0_{p}\) implies \(c_{j}=0,\ \forall j\), where \(c_{1},\ldots,c_{k}\) are scalars. In other words, no row in this set can be expressed as a linear combination of the (k−1) remaining rows.

1.3.2 Trace

The trace of a square matrix \({\mathcal{A}}(p\times p)\) is the sum of its diagonal elements:

$$\mathop {\rm {tr}}({{\mathcal{A}}}) = \sum ^p_{i=1}a_{ii}.$$

1.3.3 Determinant

The determinant is an important concept of matrix algebra. For a square matrix \({{\mathcal{A}}}\), it is defined as:

$$\mathop{\rm{det}}({{\mathcal{A}}})= |{{\mathcal{A}}}|=\sum (-1)^{|\tau |}\ a_{1\tau (1)} \ldots a_{p\tau (p)},$$

where the summation is over all permutations τ of {1,2,…,p}, and |τ|=0 if the permutation can be written as a product of an even number of transpositions and |τ|=1 otherwise.

Example 2.1

In the case of p=2, we can permute the digits “1” and “2” once or not at all. So,

$$|{{\mathcal{A}}}|=a_{11}\ a_{22}-a_{12}\ a_{21}.$$

1.3.4 Transpose

For \({\mathcal{A}}(n \times p)\) and \({\mathcal{B}}(p \times n)\)

$$({{\mathcal{A}}}^{\top})^{\top}={{\mathcal{A}}},\quad\mbox{and}\quad({{\mathcal{A}}}{{\mathcal{B}}})^{\top}={{\mathcal{B}}}^{\top}{ {\mathcal{A}}}^{\top}.$$

1.3.5 Inverse

If \(|{{\mathcal{A}}}|\not=0\) and \({{\mathcal{A}}}(p\times p)\), then the inverse \({{\mathcal{A}}}^{-1}\) exists:

$${{\mathcal{A}}}\ {{\mathcal{A}}}^{-1} = {{\mathcal{A}}}^{-1}\ {{\mathcal{A}}} = {{\mathcal{I}}}_p.$$

For small matrices, the inverse of \({{\mathcal{A}}}=(a_{ij})\) can be calculated as

$${{\mathcal{A}}}^{-1}=\frac{{{\mathcal{C}}}}{|{\mathcal{A}}|},$$

where \({{\mathcal{C}}}=(c_{ij})\) is the adjoint matrix of \({{\mathcal{A}}}\). The elements \(c_{ji}\) of \({{\mathcal{C}}}^{\top}\) are the co-factors of \({{\mathcal{A}}}\):

$$c_{ji}=(-1)^{i+j}\left|\begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c}a_{1 1}&\dots&a_{1 (j-1)}&a_{1 (j+1)}&\dots&a_{1 p}\\\vdots & & & & & \\a_{(i-1) 1}&\dots&a_{(i-1) (j-1)}&a_{(i-1) (j+1)}&\dots&a_{(i-1) p}\\a_{(i+1) 1}&\dots&a_{(i+1) (j-1)}&a_{(i+1) (j+1)}&\dots&a_{(i+1) p}\\\vdots & & & & & \\a_{p 1}&\dots&a_{p (j-1)}&a_{p (j+1)}&\dots&a_{p p}\\\end{array}\right|.$$
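These characteristics (rank, trace, determinant, inverse) are easy to check numerically. Below is a minimal sketch in Python/NumPy, using as an example the matrix of Exercise 2.4; the code itself is an illustration, not part of the original text.

```python
import numpy as np

# The (3 x 3) matrix from Exercise 2.4 (it is non-singular)
A = np.array([[1., 2., 3.],
              [2., 1., 2.],
              [3., 2., 1.]])

print(np.linalg.matrix_rank(A))          # rank(A) = 3
print(np.trace(A))                       # tr(A) = sum of diagonal elements
print(np.linalg.det(A))                  # |A|, non-zero here
Ainv = np.linalg.inv(A)                  # A^{-1} exists since |A| != 0
print(np.allclose(A @ Ainv, np.eye(3)))  # A A^{-1} = I_3
```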

1.4 G-inverse

A more general concept is the G-inverse (Generalised Inverse) \({\mathcal{A}}^{-}\) which satisfies the following:

$${{\mathcal{A}}}\ {{\mathcal{A}}}^-{{\mathcal{A}}} = {{\mathcal{A}}}.$$

Later we will see that there may be more than one G-inverse.

Example 2.2

The generalised inverse can also be calculated for singular matrices. We have:

$$\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)=\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right),$$

which means that the generalised inverse of \({{\mathcal{A}}}=\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)\) is \({{\mathcal{A}}}\) itself, even though the inverse matrix of \({{\mathcal{A}}}\) does not exist in this case.
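A small numerical illustration (a sketch; note that numpy.linalg.pinv returns the Moore–Penrose inverse, which is one particular G-inverse):

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 0.]])            # singular: |A| = 0

G1 = A                              # A itself is a G-inverse in this example
G2 = np.linalg.pinv(A)              # Moore-Penrose inverse, also a G-inverse

for G in (G1, G2):
    print(np.allclose(A @ G @ A, A))   # A A^- A = A holds for both
```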

1.4.1 Eigenvalues, Eigenvectors

Consider a (p×p) matrix \({\mathcal{A}}\). If there exist a scalar λ and a vector γ such that

$$ {{\mathcal{A}}}\gamma=\lambda \gamma,$$
(2.1)

then we call

$$\begin{array}{l@{\quad}l}\lambda & \mbox{an eigenvalue}\\\gamma &\mbox{an eigenvector.}\end{array}$$

It can be proven that an eigenvalue λ is a root of the p-th order polynomial \(|{\mathcal{A}}-\lambda {\mathcal{I}}_{p}|=0\). Therefore, there are up to p eigenvalues \(\lambda_{1},\lambda_{2},\ldots,\lambda_{p}\) of \({\mathcal{A}}\). For each eigenvalue \(\lambda_{j}\), there exists a corresponding eigenvector \(\gamma_{j}\) given by equation (2.1). Suppose the matrix \({{\mathcal{A}}}\) has the eigenvalues \(\lambda_{1},\ldots,\lambda_{p}\) and let \(\Lambda =\mathop {\rm {diag}}(\lambda_{1},\ldots,\lambda_{p})\).

The determinant \(|{{\mathcal{A}}}|\) and the trace \(\mathop {\rm {tr}}({{\mathcal{A}}})\) can be rewritten in terms of the eigenvalues:

$$|{{\mathcal{A}}}| = \prod^p_{j=1}\lambda_j,$$
(2.2)

$$\mathop {\rm {tr}}({{\mathcal{A}}}) = \sum^p_{j=1}\lambda_j.$$
(2.3)

An idempotent matrix \({{\mathcal{A}}}\) (see the definition in Table 2.1) can only have eigenvalues in {0,1}; therefore, \(\mathop {\rm {tr}}({{\mathcal{A}}})=\mathop {\rm {rank}}({{\mathcal{A}}}) =\) number of eigenvalues different from 0.

Example 2.3

Let us consider the matrix \({{\mathcal{A}}}=\left ( \begin{array}{r@{\quad}r@{\quad}r}1 & 0 & 0 \\0 & \frac{1}{2} & \frac{1}{2}\\[2pt]0 & \frac{1}{2} & \frac{1}{2} \end{array} \right )\). It is easy to verify that \({{\mathcal{A}}} {{\mathcal{A}}} = {{\mathcal{A}}}\), which implies that the matrix \({{\mathcal{A}}}\) is idempotent.

We know that the eigenvalues of an idempotent matrix are equal to 0 or 1. In this case, the eigenvalues of \({{\mathcal{A}}}\) are \(\lambda_{1}=1\), \(\lambda_{2}=1\), and \(\lambda_{3}=0\) since

$$\left ( \begin{array}{r@{\quad}r@{\quad}r}1 & 0 & 0 \\0 & \frac{1}{2} & \frac{1}{2}\\[2pt]0 & \frac{1}{2} & \frac{1}{2} \end{array} \right )\left( \begin{array}{r}1\\0\\0 \end{array} \right)=1 \left( \begin{array}{r}1\\0\\0 \end{array} \right),\qquad \left ( \begin{array}{r@{\quad}r@{\quad}r}1 & 0 & 0 \\0 & \frac{1}{2} & \frac{1}{2}\\[2pt]0 & \frac{1}{2} & \frac{1}{2} \end{array} \right )\left( \begin{array}{r}0\\ \frac{\sqrt{2}}{2}\\[2pt] \frac{\sqrt{2}}{2} \end{array} \right)=1 \left( \begin{array}{r}0\\ \frac{\sqrt{2}}{2}\\[2pt] \frac{\sqrt{2}}{2} \end{array} \right),$$

and

$$\left ( \begin{array}{r@{\quad}r@{\quad}r}1 & 0 & 0 \\0 & \frac{1}{2} & \frac{1}{2}\\[2pt]0 & \frac{1}{2} & \frac{1}{2} \end{array} \right )\left( \begin{array}{r}0\\ \frac{\sqrt{2}}{2}\\[2pt] -\frac{\sqrt{2}}{2} \end{array} \right)=0 \left( \begin{array}{r}0\\ \frac{\sqrt{2}}{2}\\[2pt] -\frac{\sqrt{2}}{2} \end{array} \right).$$

Using formulas (2.2) and (2.3), we can calculate the trace and the determinant of \({{\mathcal{A}}}\) from the eigenvalues: \(\mathop {\rm {tr}}({{\mathcal{A}}})=\lambda_{1}+\lambda_{2}+\lambda_{3}=2\), \(|{{\mathcal{A}}}|= \lambda_{1} \lambda_{2} \lambda_{3} = 0\), and \(\mathop {\rm {rank}}({{\mathcal{A}}})=2\).
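The computations of Example 2.3 can be reproduced numerically; a minimal sketch in NumPy:

```python
import numpy as np

A = np.array([[1., 0.,  0. ],
              [0., 0.5, 0.5],
              [0., 0.5, 0.5]])

print(np.allclose(A @ A, A))                     # A is idempotent
lam, gamma = np.linalg.eigh(A)                   # eigenvalues (ascending) and eigenvectors
print(np.round(lam, 10))                         # 0, 1, 1
print(np.isclose(lam.prod(), np.linalg.det(A)))  # (2.2): |A| = product of eigenvalues
print(np.isclose(lam.sum(), np.trace(A)))        # (2.3): tr(A) = sum of eigenvalues
print(np.linalg.matrix_rank(A))                  # 2 = number of non-zero eigenvalues
```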

1.5 Properties of Matrix Characteristics

\({\mathcal{A}}(n \times n),\;{\mathcal{B}}(n \times n),\; c \in \mathbb {R}\)

(2.4)
(2.5)
(2.6)
(2.7)

\({\mathcal{A}}(n \times p),\; {\mathcal{B}}(p \times n)\)

(2.8)
(2.9)
(2.10)
(2.11)
(2.12)
(2.13)

\({\mathcal{A}}(n \times p),\; {\mathcal{B}}(p \times q),\; {\mathcal{C}}(q \times n)\)

(2.14)
(2.15)

\({\mathcal{A}}( p \times p) \)

(2.16)
(2.17)

2 Spectral Decompositions

The computation of eigenvalues and eigenvectors is an important issue in the analysis of matrices. The spectral decomposition or Jordan decomposition links the structure of a matrix to the eigenvalues and the eigenvectors.

Theorem 2.1

(Jordan Decomposition)

Each symmetric matrix \({{\mathcal{A}}}(p \times p)\) can be written as

$$ {{{\mathcal{A}}}} = \Gamma\ \Lambda \ \Gamma^{\top}= \sum^p_{j=1} \lambda_j \gamma_{_{j}} \gamma_{_{j}}^{\top}$$
(2.18)

where

$$\Lambda = \mathop {\rm {diag}}(\lambda_1, \ldots, \lambda_p) $$

and where

$$\Gamma = (\gamma_{_{1}}, \gamma_{_{2}}, \ldots,\gamma_{_{p}}) $$

is an orthogonal matrix consisting of the eigenvectors \(\gamma_{_{j}}\) of \({{\mathcal{A}}}\).

Example 2.4

Suppose that \({{\mathcal{A}}}=\left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right)\). The eigenvalues are found by solving \(|{{\mathcal{A}}}-\lambda {{\mathcal{I}}}|=0\). This is equivalent to

$$\left |\begin{array}{c@{\quad}c} 1-\lambda &2\\ 2&3-\lambda \end{array}\right |=(1-\lambda )(3-\lambda )-4=0.$$

Hence, the eigenvalues are \(\lambda _{1}=2+\sqrt{5}\) and \(\lambda_{2}=2-\sqrt{5}\). The eigenvectors are \(\gamma_{_{1}}= (0.5257, 0.8506)^{\top}\) and \(\gamma_{_{2}}= (0.8506, -0.5257)^{\top}\). They are orthogonal since \(\gamma_{1}^{\top} \gamma_{2}=0\).

Using spectral decomposition, we can define powers of a matrix \({\mathcal{A}} (p \times p)\). Suppose \({\mathcal{A}}\) is a symmetric matrix with positive eigenvalues. Then by Theorem 2.1

$${\mathcal{A}} = \Gamma\Lambda \Gamma^{\top},$$

and we define for some \(\alpha \in \mathbb {R}\)

$${\mathcal{A}}^\alpha =\Gamma\Lambda ^\alpha \Gamma^{\top},$$
(2.19)

where \(\Lambda ^{\alpha}=\mathop {\rm {diag}}(\lambda ^{\alpha}_{1},\ldots ,\lambda ^{\alpha}_{p})\). In particular, we can easily calculate the inverse of the matrix \({\mathcal{A}}\). Suppose that the eigenvalues of \({\mathcal{A}}\) are positive. Then with α=−1, we obtain the inverse of \({\mathcal{A}}\) from

$${\mathcal{A}}^{-1}= \Gamma \Lambda^{-1} \Gamma^{\top}.$$
(2.20)
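Equations (2.18)–(2.20) translate directly into code. Below is a sketch using numpy.linalg.eigh (intended for symmetric matrices); the test matrix with positive eigenvalues is an arbitrary choice of mine, not one from the text.

```python
import numpy as np

# A symmetric matrix with positive eigenvalues (arbitrary example)
A = np.array([[2., 1.],
              [1., 2.]])

lam, Gamma = np.linalg.eigh(A)                 # spectral decomposition (2.18)
print(np.allclose(Gamma @ np.diag(lam) @ Gamma.T, A))

def matrix_power(A, alpha):
    """A^alpha = Gamma Lambda^alpha Gamma^T, cf. (2.19); assumes positive eigenvalues."""
    lam, Gamma = np.linalg.eigh(A)
    return Gamma @ np.diag(lam ** alpha) @ Gamma.T

print(np.allclose(matrix_power(A, -1), np.linalg.inv(A)))   # (2.20): alpha = -1 gives A^{-1}
B = matrix_power(A, 0.5)                                    # a symmetric square root of A
print(np.allclose(B @ B, A))
```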

Another interesting decomposition which is later used is given in the following theorem.

Theorem 2.2

(Singular Value Decomposition)

Each matrix \({{\mathcal{A}}} (n \times p) \) with rank r can be decomposed as

$${{\mathcal{A}}} = \Gamma\ \Lambda\ \Delta^{\top},$$

where Γ(n×r) and Δ(p×r). Both Γ and Δ are column orthonormal, i.e., \(\Gamma^{\top}\Gamma = \Delta^{\top}\Delta={{\mathcal{I}}}_{r}\) and \(\Lambda=\mathop {\rm {diag}}( \lambda_{1}^{1/2}, \ldots,\lambda_{r}^{1/2} ) \), λ j >0. The values λ 1,…,λ r are the non-zero eigenvalues of the matrices \({\mathcal{A}}{\mathcal{A}}^{\top}\) and \({\mathcal{A}}^{\top} {\mathcal{A}}\). Γ and Δ consist of the corresponding r eigenvectors of these matrices.

This is obviously a generalisation of Theorem 2.1 (Jordan decomposition). With Theorem 2.2, we can find a G-inverse \({{\mathcal{A}}}^{-}\) of \({{\mathcal{A}}}\). Indeed, define \({{\mathcal{A}}}^{-}=\Delta\ \Lambda^{-1}\ \Gamma^{\top}\). Then \({{\mathcal{A}}}\ {{\mathcal{A}}}^{-}\ {{\mathcal{A}}}= \Gamma\ \Lambda\ \Delta^{\top}={{\mathcal{A}}}\). Note that the G-inverse is not unique.
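The construction of a G-inverse from the singular value decomposition can be checked numerically. A sketch follows; note that numpy.linalg.svd returns the singular values themselves, i.e. the \(\lambda_{j}^{1/2}\) in the notation of Theorem 2.2, and the random test matrix is my own example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                        # an arbitrary (n x p) matrix, here of rank r = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = Gamma Lambda Delta^T
print(np.allclose(U @ np.diag(s) @ Vt, A))

# G-inverse A^- = Delta Lambda^{-1} Gamma^T (keep only the non-zero singular values)
r = np.linalg.matrix_rank(A)
A_minus = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A @ A_minus @ A, A))             # A A^- A = A
```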

Example 2.5

In Example 2.2, we showed that the generalised inverse of \({{\mathcal{A}}}=\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)\) is \({{\mathcal{A}}}\) itself. The following also holds

$$\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)\left(\begin{array}{l@{\quad}l}1&0\\0&8\end{array}\right)\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)=\left(\begin{array}{l@{\quad}l}1&0\\0&0\end{array}\right)$$

which means that the matrix \(\left(\begin{array}{l@{\quad}l}1&0\\0&8\end{array}\right)\) is also a generalised inverse of \({{\mathcal{A}}}\).


3 Quadratic Forms

A quadratic form Q(x) is built from a symmetric matrix \({{\mathcal{A}}}(p\times p)\) and a vector \(x\in \mathbb {R}^{p}\):

$$Q(x) = x^{\top}\ {{\mathcal{A}}}\ x = \sum ^p_{i=1}\sum ^p_{j=1} a_{ij}x_ix_j.$$
(2.21)

3.1 Definiteness of Quadratic Forms and Matrices

$$\begin{array}{l@{\qquad}l}Q(x) > 0 \quad\mbox{for all } x\not= 0 & \mathit{positive\ definite}\\Q(x) \ge 0 \quad\mbox{for all } x\not= 0 & \mathit{positive\ semidefinite}\end{array}$$

A matrix \({{\mathcal{A}}}\) is called positive definite (semidefinite) if the corresponding quadratic form Q(.) is positive definite (semidefinite). We write \({\mathcal{A}}> 0 \ (\ge 0)\).

Quadratic forms can always be diagonalized, as the following result shows.

Theorem 2.3

If \({{\mathcal{A}}}\) is symmetric and \(Q({{x}})={{x}}^{\top}{{\mathcal{A}}} {x}\) is the corresponding quadratic form, then there exists a transformation \(x\mapsto \Gamma^{\top} x=y\) such that

$$x^{\top}\ {{\mathcal{A}}}\ x= \sum^p_{i=1} \lambda _iy^2_i,$$

where λ i are the eigenvalues of \({{\mathcal{A}}}\).

Proof

By Theorem 2.1, \({{\mathcal{A}}}=\Gamma\ \Lambda \ \Gamma^{\top}\). Setting \(y=\Gamma^{\top}x\), we have that \(x^{\top}{ {\mathcal{A}}} x=x^{\top} \Gamma \Lambda \Gamma^{\top} x= y^{\top} \Lambda y = \sum^{p}_{i=1}\lambda _{i} y^{2}_{i}\). □

Positive definiteness of quadratic forms can be deduced from positive eigenvalues.

Theorem 2.4

\({{\mathcal{A}}}>0\) if and only if all \(\lambda_{i}>0\), i=1,…,p.

Proof

\(0 < \lambda_{1}y^{2}_{1} + \cdots + \lambda_{p}y^{2}_{p} = x^{\top} {{\mathcal{A}}} x \) for all x≠0 by Theorem 2.3. □

Corollary 2.1

If \({{\mathcal{A}}}>0\), then \({{\mathcal{A}}}^{-1}\) exists and \(|{{\mathcal{A}}}|>0\).

Example 2.6

The quadratic form \(Q(x)=x^{2}_{1}+x^{2}_{2}\) corresponds to the matrix \(\left(\begin{array}{c@{\quad}c}1&0\\0&1\end{array}\right)\) with eigenvalues \(\lambda_{1}=\lambda_{2}=1\) and is thus positive definite. The quadratic form \(Q(x)=(x_{1}-x_{2})^{2}\) corresponds to the matrix \(\left(\begin{array}{r@{\quad}r}1&-1\\-1&1\end{array}\right)\) with eigenvalues \(\lambda_{1}=2\), \(\lambda_{2}=0\) and is positive semidefinite. The quadratic form \(Q(x) = x^{2}_{1}-x^{2}_{2}\) corresponds to the matrix \(\left(\begin{array}{r@{\quad}r}1&0\\0&-1\end{array}\right)\) with eigenvalues \(\lambda_{1}=1\), \(\lambda_{2}=-1\) and is indefinite.
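Theorem 2.4 suggests a simple numerical check of definiteness via the eigenvalues. A sketch for the three matrices of Example 2.6 (the tolerance guards against rounding of zero eigenvalues):

```python
import numpy as np

matrices = {
    "x1^2 + x2^2": np.array([[1.,  0.], [ 0., 1.]]),
    "(x1 - x2)^2": np.array([[1., -1.], [-1., 1.]]),
    "x1^2 - x2^2": np.array([[1.,  0.], [ 0., -1.]]),
}

tol = 1e-12
for name, A in matrices.items():
    lam = np.linalg.eigvalsh(A)          # eigenvalues of the symmetric matrix
    if np.all(lam > tol):
        verdict = "positive definite"
    elif np.all(lam > -tol):
        verdict = "positive semidefinite"
    else:
        verdict = "indefinite"
    print(name, lam, verdict)
```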

In the statistical analysis of multivariate data, we are interested in maximising quadratic forms given some constraints.

Theorem 2.5

If \({{\mathcal{A}}}\) and \({{\mathcal{B}}}\) are symmetric and \({{\mathcal{B}}}>0\), then the maximum of \(\frac{x^{\top} {{\mathcal{A}}} x}{x^{\top}{ {\mathcal{B}}}x}\) is given by the largest eigenvalue of \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\). More generally,

$$\max_x\frac{x^{\top} {{\mathcal{A}}}x}{x^{\top}{ {\mathcal{B}}}x}= \lambda _1\ge \lambda _2\ge \cdots \ge \lambda_p = \min_x\frac{x^{\top} {{\mathcal{A}}}x}{x^{\top}{ {\mathcal{B}}}x},$$

where \(\lambda_{1},\ldots,\lambda_{p}\) denote the eigenvalues of \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\). The vector which maximises (minimises) \(\frac{x^{\top} {{\mathcal{A}}}x}{x^{\top}{ {\mathcal{B}}}x}\) is the eigenvector of \({\mathcal{B}}^{-1}{\mathcal{A}}\) which corresponds to the largest (smallest) eigenvalue of \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\). If \({x^{\top}{ {\mathcal{B}}}x=1}\), we get

$$\max_x{x^{\top} {{\mathcal{A}}} x}=\lambda _1\ge \lambda _2\ge \cdots \ge \lambda_p = \min_x{x^{\top}{{\mathcal{A}}} x}.$$

Proof

By definition, \({{\mathcal{B}}}^{1/2} = \Gamma_{{{\mathcal{B}}}}\ \Lambda_{{{\mathcal{B}}}}^{1/2}\ \Gamma^{\top}_{{{\mathcal{B}}}}\) is symmetric. Then \({x^{\top}{ {\mathcal{B}}}x} =\|x^{\top}{ {\mathcal{B}}}^{1/2}\|^{2} =\|{{\mathcal{B}}}^{1/2}x\|^{2}\). Set \(y =\frac{{{\mathcal{B}}}^{1/2}x}{\left\|{{\mathcal{B}}}^{1/2}x\right\|}\), then

$$ \max_x\frac{x^{\top} {{\mathcal{A}}} x}{x^{\top}{ {\mathcal{B}}}x} =\max_{\{y:y^{\top}y=1\}}y^{\top}{ {\mathcal{B}}}^{-1/2}\ {{\mathcal{A}}}{{\mathcal{B}}}^{-1/2}y.$$
(2.22)

From Theorem 2.1, let

$${{\mathcal{B}}}^{-1/2}\ {{\mathcal{A}}}\ {{\mathcal{B}}}^{-1/2}=\Gamma\ \Lambda \ \Gamma^{\top}$$

be the spectral decomposition of \({{\mathcal{B}}}^{-1/2}\ {{\mathcal{A}}}\ {{\mathcal{B}}}^{-1/2}\). Set

$$z = \Gamma^{\top}y, \quad\textrm{then}\quad z^{\top}z=y^{\top}\Gamma\ \Gamma^{\top}\ y=y^{\top}y.$$

Thus (2.22) is equivalent to

$$\max_{\{z:z^{\top}z=1\}}z^{\top}\ \Lambda \ z=\max_{\{z:z^{\top}z=1\}} \sum^p_{i=1}\lambda_iz^2_i.$$

But

$$\max_z\sum \lambda _iz^2_i\le \lambda _1\underbrace{\max_z\sum z^2_i}_{=1}=\lambda_1.$$

The maximum is thus obtained by z=(1,0,…,0), i.e.,

$$y = \gamma_{_{1}}, \quad \textrm{hence} \quad x = {{\mathcal{B}}}^{-1/2}\gamma_{_{1}}.$$

Since \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\) and \({{\mathcal{B}}}^{-1/2}\ {{\mathcal{A}}}\ {{\mathcal{B}}}^{-1/2}\) have the same eigenvalues, the proof is complete.

To maximise (minimise) \({x^{\top}{ {\mathcal{A}}}x}\) under the constraint \({x^{\top}{ {\mathcal{B}}}x=1}\), here is an alternative argument based on the Lagrange method.

$$\max_x{x^{\top}{ {\mathcal{A}}}x} = \max_x[x^{\top}{ {\mathcal{A}}}x - \lambda (x^{\top}{ {\mathcal{B}}}x-1)].$$

Setting the first derivative with respect to x equal to 0 gives

$$2{{\mathcal{A}}}x-2\lambda{ {\mathcal{B}}}x = 0,$$

so

$${{\mathcal{B}}}^{-1}{{\mathcal{A}}}x = \lambda x.$$

By the definition of eigenvalues and eigenvectors, the maximiser x is an eigenvector of \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\) corresponding to the eigenvalue λ. So

$$\max_{\{x:x^{\top}{ {\mathcal{B}}}x=1\}}x^{\top} {{\mathcal{A}}}x = \max_{\{x:x^{\top}{ {\mathcal{B}}}x=1\}}x^{\top}{ {\mathcal{B}}}{{\mathcal{B}}}^{-1} {{\mathcal{A}}}x =\max_{\{x:x^{\top}{ {\mathcal{B}}}x=1\}}x^{\top}{ {\mathcal{B}}} \lambda x =\max \lambda$$

which is just the maximum eigenvalue of \({{\mathcal{B}}}^{-1} {{\mathcal{A}}}\), and we choose the corresponding eigenvector as our maximiser x. □

Example 2.7

Consider the following matrices

$${{\mathcal{A}}}=\left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right)\quad \mbox{and} \quad {{\mathcal{B}}}=\left(\begin{array}{c@{\quad}c}1&0\\0&1\end{array}\right).$$

We calculate

$${{\mathcal{B}}}^{-1}{{\mathcal{A}}}=\left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right).$$

The largest eigenvalue of the matrix \({{\mathcal{B}}}^{-1}{{\mathcal{A}}}\) is \(2+\sqrt{5}\). This means that the maximum of \(x^{\top} {\mathcal{A}}x\) under the constraint \(x^{\top} {\mathcal{B}}x = 1\) is \(2+\sqrt{5}\).

Notice that the constraint \(x^{\top} {\mathcal{B}}x = 1\) corresponds, with our choice of \({\mathcal{B}}\), to the points which lie on the unit circle \(x_{1}^{2}+x_{2}^{2}=1\).
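Theorem 2.5 can be illustrated numerically with the matrices of Example 2.7; the sketch below simply computes the eigenvalues of \({\mathcal{B}}^{-1}{\mathcal{A}}\) directly, which is adequate for small examples.

```python
import numpy as np

A = np.array([[1., 2.], [2., 3.]])
B = np.eye(2)                                  # constraint metric, B > 0

lam, V = np.linalg.eig(np.linalg.inv(B) @ A)   # eigenvalues of B^{-1} A
i = np.argmax(lam)
x = V[:, i]                                    # maximiser of x'Ax / x'Bx

print(lam[i])                                  # 2 + sqrt(5) ~ 4.236
print((x @ A @ x) / (x @ B @ x))               # the ratio attains the largest eigenvalue
```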


4 Derivatives

For later sections of this book, it will be useful to introduce matrix notation for derivatives of a scalar function of a vector x with respect to x. Consider \(f:\mathbb {R}^{p} \to \mathbb {R}\) and a (p×1) vector x. Then \(\frac{\partial f(x)}{\partial x}\) is the column vector of partial derivatives \(\{\frac{\partial f(x)}{\partial x_{j}}\}, j=1,\ldots ,p\), and \(\frac{\partial f(x)}{\partial x^{\top}}\) is the row vector of the same derivatives (\(\frac{\partial f(x)}{\partial x}\) is called the gradient of f).

We can also introduce second order derivatives: \(\frac{\partial^{2} f(x)}{\partial x\partial x^{\top}}\) is the (p×p) matrix of elements \(\frac{\partial^{2} f(x)}{\partial x_{i}\partial x_{j}}, i=1,\ldots ,p\) and j=1,…,p. (\(\frac{\partial^{2} f(x)}{\partial x\partial x^{\top}}\) is called the Hessian of f.)

Suppose that a is a (p×1) vector and that \({\mathcal{A}}= {\mathcal{A}}^{\top}\) is a (p×p) matrix. Then

$$\frac{\partial a^{\top}x}{\partial x}=\frac{\partial x^{\top}a}{\partial x}=a,$$
(2.23)

$$\frac{\partial x^{\top} {\mathcal{A}} x}{\partial x}=2{\mathcal{A}}x.$$
(2.24)

The Hessian of the quadratic form \(Q(x)= x^{\top} {\mathcal{A}} x\) is:

$$ \frac{\partial^2 x^{\top} {\mathcal{A}} x}{\partial x \partial x^{\top}}= 2\mathcal{A}.$$
(2.25)

Example 2.8

Consider the matrix

$${{\mathcal{A}}}=\left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right).$$

From formulas (2.24) and (2.25) it immediately follows that the gradient of \(Q(x)= x^{\top} {\mathcal{A}} x\) is

$$\frac{\partial x^{\top} {\mathcal{A}} x}{\partial x} = 2 {\mathcal{A}} x=2 \left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right)\left(\begin{array}{c}x_1\\x_2\end{array}\right)=\left(\begin{array}{c}2x_1+4x_2\\4x_1+6x_2\end{array}\right)$$

and the Hessian is

$$\frac{\partial^2 x^{\top} {{\mathcal{A}}} x}{\partial x \partial x^{\top}}= 2{{\mathcal{A}}}= 2 \left(\begin{array}{c@{\quad}c}1&2\\2&3\end{array}\right)=\left(\begin{array}{c@{\quad}c}2&4\\4&6\end{array}\right).$$
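Formulas (2.24) and (2.25) can be verified by finite differences. A sketch; the test point and step size are arbitrary choices of mine:

```python
import numpy as np

A = np.array([[1., 2.], [2., 3.]])
Q = lambda x: x @ A @ x                        # quadratic form x'Ax

x0 = np.array([1.0, -2.0])                     # an arbitrary test point
h = 1e-6
grad_fd = np.array([(Q(x0 + h * e) - Q(x0 - h * e)) / (2 * h) for e in np.eye(2)])

print(grad_fd)                                 # finite-difference gradient
print(2 * A @ x0)                              # analytic gradient 2Ax, cf. (2.24)
print(np.allclose(grad_fd, 2 * A @ x0, atol=1e-4))
# The Hessian of Q is constant and equals 2A, cf. (2.25)
```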

5 Partitioned Matrices

Very often we will have to consider certain groups of rows and columns of a matrix \({\mathcal{A}} (n \times p)\). In the case of two groups, we have

$${\mathcal{A}}=\left(\begin{array}{l@{\quad}l}{\mathcal{A}}_{11} & {\mathcal{A}}_{12}\\{\mathcal{A}}_{21} & {\mathcal{A}}_{22}\end{array}\right)$$

where \({\mathcal{A}}_{ij} (n_{i} \times p_{j}),\ i,j=1,2,\ n_{1}+n_{2}=n\) and \(p_{1}+p_{2}=p\).

If \({\mathcal{B}} (n \times p)\) is partitioned accordingly, we have:

An important particular case is the square matrix \({\mathcal{A}} (p \times p)\), partitioned in such a way that \({\mathcal{A}}_{11}\) and \({\mathcal{A}}_{22}\) are both square matrices (i.e., \(n_{j}=p_{j}\), j=1,2). It can be verified that when \({\mathcal{A}}\) is non-singular (\({\mathcal{A}}{\mathcal{A}}^{-1}={\mathcal{I}}_{p}\)):

$$ {\mathcal{A}}^{-1}=\left(\begin{array}{l@{\quad}l}{\mathcal{A}}^{11} & {\mathcal{A}}^{12}\\{\mathcal{A}}^{21} & {\mathcal{A}}^{22}\end{array}\right)$$
(2.26)

where

$$\left\{\begin{array}{@{}l}{\mathcal{A}}^{11}= ({\mathcal{A}}_{11}- {\mathcal{A}}_{12}{\mathcal{A}}^{-1}_{22}{\mathcal{A}}_{21})^{-1} \stackrel{\mathrm{def}}{=} ({\mathcal{A}}_{11\cdot 2})^{-1}\\[2pt]{\mathcal{A}}^{12}= -({\mathcal{A}}_{11 \cdot 2})^{-1}{\mathcal{A}}_{12}{\mathcal{A}}^{-1}_{22} \\[2pt]{\mathcal{A}}^{21}= -{\mathcal{A}}_{22}^{-1}{\mathcal{A}}_{21}({\mathcal{A}}_{11 \cdot 2})^{-1} \\[2pt]{\mathcal{A}}^{22}= {\mathcal{A}}_{22}^{-1}+{\mathcal{A}}_{22}^{-1}{\mathcal{A}}_{21}({\mathcal{A}}_{11 \cdot 2})^{-1} {\mathcal{A}}_{12}{\mathcal{A}}^{-1}_{22}.\end{array}\right.$$

An alternative expression can be obtained by reversing the positions of \({\mathcal{A}}_{11}\) and \({\mathcal{A}}_{22}\) in the original matrix.

The following results will be useful if \({\mathcal{A}}_{11}\) is non-singular:

$$ |{\mathcal{A}}|=|{\mathcal{A}}_{11}||{\mathcal{A}}_{22}-{\mathcal{A}}_{21}{\mathcal{A}}^{-1}_{11}{\mathcal{A}}_{12}|=|{\mathcal{A}}_{11}||{\mathcal{A}}_{22\cdot 1}|.$$
(2.27)

If \({\mathcal{A}}_{22}\) is non-singular, we have that:

$$ |{\mathcal{A}}|=|{\mathcal{A}}_{22}||{\mathcal{A}}_{11}-{\mathcal{A}}_{12}{\mathcal{A}}^{-1}_{22}{\mathcal{A}}_{21}|=|{\mathcal{A}}_{22}||{\mathcal{A}}_{11\cdot 2}|.$$
(2.28)

A useful formula is derived from the alternative expressions for the inverse and the determinant. For instance let

$${\mathcal{B}} =\left(\begin{array}{l@{\quad}l}1 & b^{\top}\\a & {\mathcal{A}}\end{array}\right)$$

where a and b are (p×1) vectors and \({\mathcal{A}}\) is non-singular. We then have:

$$ |{\mathcal{B}}|=| {\mathcal{A}}-ab^{\top} |= | {\mathcal{A}}|| 1-b^{\top} {\mathcal{A}}^{-1}a |$$
(2.29)

and equating the two expressions for \({\mathcal{B}}^{22}\), we obtain the following:

$$({\mathcal{A}}-ab^{\top})^{-1}={\mathcal{A}}^{-1}+\frac{{\mathcal{A}}^{-1}ab^{\top} {\mathcal{A}}^{-1}}{1-b^{\top} {\mathcal{A}}^{-1}a}.$$
(2.30)
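Formulas (2.29) and (2.30) are easy to verify numerically. A sketch with randomly generated A, a and b (the seed and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p)) + p * np.eye(p)    # a non-singular (p x p) matrix
a = rng.normal(size=p)
b = rng.normal(size=p)

Ainv = np.linalg.inv(A)
# (2.29): |A - a b'| = |A| (1 - b' A^{-1} a)
print(np.isclose(np.linalg.det(A - np.outer(a, b)),
                 np.linalg.det(A) * (1 - b @ Ainv @ a)))
# (2.30): (A - a b')^{-1} = A^{-1} + A^{-1} a b' A^{-1} / (1 - b' A^{-1} a)
lhs = np.linalg.inv(A - np.outer(a, b))
rhs = Ainv + np.outer(Ainv @ a, b @ Ainv) / (1 - b @ Ainv @ a)
print(np.allclose(lhs, rhs))
```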

Example 2.9

Let us consider the matrix

$${{\mathcal{A}}}=\left(\begin{array}{c@{\quad}c}1&2\\2&2\end{array}\right).$$

We can use formula (2.26) to calculate the inverse of a partitioned matrix, i.e., \({{\mathcal{A}}}^{11}=-1,{{\mathcal{A}}}^{12}={{\mathcal{A}}}^{21}=1, {{\mathcal{A}}}^{22}=-1/2\). The inverse of \({{\mathcal{A}}}\) is

$${{\mathcal{A}}}^{-1}=\left(\begin{array}{r@{\quad}c}-1&1\\1&-0.5\end{array}\right).$$

It is also easy to calculate the determinant of \({{\mathcal{A}}}\):

$$|{{\mathcal{A}}}|=|1||2-4|=-2.$$
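The block formulas (2.26) can be coded directly and checked against a full inversion. A sketch; the helper block_inverse and the (3×3) test matrix are my own illustrative choices:

```python
import numpy as np

def block_inverse(A, p1):
    """Inverse of a non-singular partitioned matrix via (2.26); p1 = size of A11."""
    A11, A12 = A[:p1, :p1], A[:p1, p1:]
    A21, A22 = A[p1:, :p1], A[p1:, p1:]
    A22i = np.linalg.inv(A22)
    A11_2 = np.linalg.inv(A11 - A12 @ A22i @ A21)        # (A_{11.2})^{-1}
    top = np.hstack([A11_2, -A11_2 @ A12 @ A22i])
    bot = np.hstack([-A22i @ A21 @ A11_2,
                     A22i + A22i @ A21 @ A11_2 @ A12 @ A22i])
    return np.vstack([top, bot])

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
print(np.allclose(block_inverse(A, 1), np.linalg.inv(A)))
```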

Let \({\mathcal{A}} (n\times p)\) and \({\mathcal{B}} (p\times n)\) be any two matrices and suppose that np. From (2.27) and (2.28) we can conclude that

$$ \left|\begin{array}{c@{\quad}c} -\lambda {\mathcal{I}}_n&-{\mathcal{A}}\\{\mathcal{B}}&{\mathcal{I}}_p\end{array}\right|=(-\lambda)^{n-p}|{\mathcal{B}}{\mathcal{A}}-\lambda {\mathcal{I}}_p|=|{\mathcal{A}}{\mathcal{B}}-\lambda {\mathcal{I}}_n|.$$
(2.31)

Since both determinants on the right-hand side of (2.31) are polynomials in λ, we find that the n eigenvalues of \({\mathcal{A}}{\mathcal{B}}\) consist of the p eigenvalues of \({\mathcal{B}}{\mathcal{A}}\) plus the eigenvalue 0 repeated n−p times.
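A sketch checking this eigenvalue relationship for random matrices with n > p (seed and dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3
A = rng.normal(size=(n, p))
B = rng.normal(size=(p, n))

eig_AB = np.linalg.eigvals(A @ B)              # n eigenvalues
eig_BA = np.linalg.eigvals(B @ A)              # p eigenvalues

# AB has the p eigenvalues of BA plus the eigenvalue 0 with multiplicity n - p
nonzero_AB = eig_AB[np.abs(eig_AB) > 1e-10]
print(np.allclose(np.sort_complex(nonzero_AB), np.sort_complex(eig_BA)))
print(np.sum(np.abs(eig_AB) <= 1e-10))         # n - p numerically zero eigenvalues
```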

The relationship between the eigenvectors is described in the next theorem.

Theorem 2.6

For \({\mathcal{A}} (n\times p)\) and \({\mathcal{B}} (p\times n)\), the non-zero eigenvalues of \({\mathcal{A}}{\mathcal{B}}\) and \({\mathcal{B}}{\mathcal{A}}\) are the same and have the same multiplicity. If x is an eigenvector of \({\mathcal{A}}{\mathcal{B}}\) for an eigenvalue λ≠0, then \(y={\mathcal{B}}x\) is an eigenvector of \({\mathcal{B}}{\mathcal{A}}\).

Corollary 2.2

For \({\mathcal{A}} (n\times p)\), \({\mathcal{B}} (q\times n)\), a(p×1), and b(q×1) we have

$$\mathop {\rm {rank}}({\mathcal{A}}ab^{\top} {\mathcal{B}})\leq 1.$$

The non-zero eigenvalue, if it exists, equals \(b^{\top} {\mathcal{B}}{\mathcal{A}}a\) (with eigenvector \({\mathcal{A}}a\)).

Proof

Theorem 2.6 asserts that the eigenvalues of \({\mathcal{A}}ab^{\top} {\mathcal{B}}\) are the same as those of \(b^{\top} {\mathcal{B}}{\mathcal{A}}a\). Note that the matrix \(b^{\top} {\mathcal{B}}{\mathcal{A}}a\) is a scalar and hence it is its own eigenvalue \(\lambda_{1}\).

Applying \({\mathcal{A}}ab^{\top} {\mathcal{B}}\) to \({\mathcal{A}}a\) yields

$$({\mathcal{A}}ab^{\top} {\mathcal{B}})({\mathcal{A}}a) = ({\mathcal{A}}a) (b^{\top} {\mathcal{B}}{\mathcal{A}}a)=\lambda_1 {\mathcal{A}}a.$$

 □

6 Geometrical Aspects

6.1 Distance

Let \(x,y\in \mathbb {R}^{p}\). A distance d is defined as a function

$$d: \mathbb {R}^{2p}\to \mathbb {R}_+ \quad\! \mbox{which fulfills} \quad\!\!\left \{\begin{array}{@{}l@{\quad}l@{}}d(x,y)>0 & \forall x\neq y\\d(x,y)=0 & \mbox{if and only if } x=y\\d(x,y)\le d(x,z)+d(z,y) & \forall x,y,z.\\\end{array}\right.$$

A Euclidean distance d between two points x and y is defined as

$$d^2(x,y)=(x-y)^{\top} {\mathcal{A}}(x-y) $$
(2.32)

where \({\mathcal{A}}\) is a positive definite matrix \(({\mathcal{A}}>0) \). \({\mathcal{A}}\) is called a metric.

Example 2.10

A particular case is when \({\mathcal{A}}={\mathcal{I}}_{p}\), i.e.,

$$ d^2 (x,y)=\sum_{i=1}^p {(x_i-y_i)}^2.$$
(2.33)

Figure 2.1 illustrates this definition for p=2.

Fig. 2.1 Distance d

Note that the sets \(E_{d}=\{x \in \mathbb {R}^{p} \mid (x-x_{0})^{\top}(x-x_{0})=d^{2} \}\) , i.e., the spheres with radius d and centre x 0, are the Euclidean \({{\mathcal{I}}}_{p}\) iso-distance curves from the point x 0 (see Figure 2.2).

Fig. 2.2 Iso-distance sphere

The more general distance (2.32) with a positive definite matrix \({\mathcal{A}} \ ({\mathcal{A}}>0)\) leads to the iso-distance curves

$$ E_d =\{ x \in \mathbb {R}^p \mid (x-x_0)^{\top} {\mathcal{A}}(x-x_0)=d^2 \},$$
(2.34)

i.e., ellipsoids with centre x 0, matrix \({\mathcal{A}}\) and constant d (see Figure 2.3).

Fig. 2.3 Iso-distance ellipsoid

Let \(\gamma_{1},\gamma_{2},\ldots,\gamma_{p}\) be the orthonormal eigenvectors of \({\mathcal{A}}\) corresponding to the eigenvalues \(\lambda_{1}\ge \lambda_{2}\ge \cdots \ge \lambda_{p}\). The resulting observations are given in the next theorem.

Theorem 2.7

  1. (i)

    The principal axes of \(E_{d}\) are in the direction of \(\gamma_{i}\); i=1,…,p.

  2. (ii)

    The half-lengths of the axes are \(\sqrt{ \frac{d^{2}}{\lambda_{i}} }\); i=1,…,p.

  3. (iii)

    The rectangle surrounding the ellipsoid E d is defined by the following inequalities:

    $$x_{0i} - \sqrt{d^2 a^{ii}} \le x_i \le x_{0i} + \sqrt{d^2 a^{ii}},\quad i=1,\ldots,p,$$

    where \(a^{ii}\) is the (i,i) element of \({\mathcal{A}}^{-1}\). By the rectangle surrounding the ellipsoid E d we mean the rectangle whose sides are parallel to the coordinate axes.

It is easy to find the coordinates of the tangency points between the ellipsoid and its surrounding rectangle parallel to the coordinate axes. Let us find the coordinates of the tangency point that are in the direction of the j-th coordinate axis (positive direction).

For ease of notation, we suppose the ellipsoid is centred around the origin (x 0=0). If not, the rectangle will be shifted by the value of x 0.

The coordinate of the tangency point is given by the solution to the following problem:

$$x=\arg \max_{x^{\top} {\mathcal{A}} x = d^2} e^{\top}_j x$$
(2.35)

where \(e_{j}\) is the j-th column of the identity matrix \({\mathcal{I}}_{p}\). The coordinate of the tangency point in the negative direction would correspond to the solution of the min problem: by symmetry, it is the opposite value of the former.

The solution is computed via the Lagrangian \(L= e^{\top}_{j}x-\lambda(x^{\top} {\mathcal{A}} x - d^{2})\) which by (2.23) leads to the following system of equations:

$$\frac{\partial L}{\partial x}= e_j-2\lambda {\mathcal{A}} x = 0,$$
(2.36)

$$\frac{\partial L}{\partial \lambda}= x^{\top} {\mathcal{A}} x - d^2 = 0.$$
(2.37)

This gives \(x=\frac{1}{2\lambda} {\mathcal{A}}^{-1} e_{j}\), or componentwise

$$x_i=\frac{1}{2\lambda} a^{ij},\quad i=1,\ldots,p $$
(2.38)

where a ij denotes the (i,j)-th element of \({\mathcal{A}}^{-1}\).

Premultiplying (2.36) by \(x^{\top}\), we have from (2.37):

$$x_j=2\lambda d^2.$$

Comparing this to the value obtained by (2.38), for i=j we obtain \(2\lambda=\sqrt{\frac{a^{jj}}{d^{2}}}\). We choose the positive value of the square root because we are maximising \(e_{j}^{\top} x\). A minimum would correspond to the negative value. Finally, we have the coordinates of the tangency point between the ellipsoid and its surrounding rectangle in the positive direction of the j-th axis:

$$x_i = \sqrt{\frac{d^2}{a^{jj}}}\; a^{ij},\quad i=1,\ldots, p.$$
(2.39)

The particular case where i=j provides statement (iii) in Theorem 2.7.
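Theorem 2.7 and formula (2.39) can be illustrated numerically. A sketch; the metric \({\mathcal{A}}\), the constant d and the centre \(x_{0}=0\) are arbitrary choices of mine:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])                     # a positive definite metric
d = 1.5

lam, Gamma = np.linalg.eigh(A)               # eigenvalues / principal axis directions, (i)
half_lengths = np.sqrt(d**2 / lam)           # half-lengths sqrt(d^2 / lambda_i), (ii)
print(half_lengths)

Ainv = np.linalg.inv(A)
# Tangency point with the surrounding rectangle in the positive direction of axis j, (2.39)
j = 0
x_tan = np.sqrt(d**2 / Ainv[j, j]) * Ainv[:, j]
print(x_tan)
print(np.isclose(x_tan @ A @ x_tan, d**2))                 # the point lies on the ellipsoid
print(np.isclose(x_tan[j], np.sqrt(d**2 * Ainv[j, j])))    # statement (iii) of Theorem 2.7
```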

6.2 Remark: Usefulness of Theorem 2.7

Theorem 2.7 will prove to be particularly useful in many subsequent chapters. First, it provides a helpful tool for graphing an ellipse in two dimensions. Indeed, knowing the slope of the principal axes of the ellipse and their half-lengths, and drawing the rectangle inscribing the ellipse, allows one to quickly sketch a rough picture of its shape.

In Chapter 7, it is shown that the confidence region for the vector μ of a multivariate normal population is given by a particular ellipsoid whose parameters depend on sample characteristics. The rectangle inscribing the ellipsoid (which is much easier to obtain) will provide the simultaneous confidence intervals for all of the components in μ.

In addition it will be shown that the contour surfaces of the multivariate normal density are provided by ellipsoids whose parameters depend on the mean vector and on the covariance matrix. We will see that the tangency points between the contour ellipsoids and the surrounding rectangle are determined by regressing one component on the (p−1) other components. For instance, in the direction of the j-th axis, the tangency points are given by the intersections of the ellipsoid contours with the regression line of the vector of (p−1) variables (all components except the j-th) on the j-th component.

6.3 Norm of a Vector

Consider a vector \(x \in \mathbb {R}^{p}\). The norm or length of x (with respect to the metric \({\mathcal{I}}_{p}\)) is defined as

$$\| x \| = d(0,x)=\sqrt{x^{\top}x}. $$

If ∥x∥=1, then x is called a unit vector. A more general norm can be defined with respect to the metric \({\mathcal{A}}\):

$$\| x \|_{{\mathcal{A}}} = \sqrt{x^{\top} {\mathcal{A}}x}.$$

6.4 Angle Between Two Vectors

Consider two vectors x and \(y \in \mathbb {R}^{p}\). The angle θ between x and y is defined by the cosine of θ:

$$ \cos \theta = \frac{x^{\top}y}{\| x \| \ \| y \|},$$
(2.40)

see Figure 2.4. Indeed, for p=2 with \(x=(x_{1},x_{2})^{\top}\) and \(y=(y_{1},y_{2})^{\top}\), we have

$$\begin{array}{rcl@{\qquad}rcl}\| x \| \cos \theta_{1} &=&x_1;& \| y \| \cos \theta_{2} &=&y_1\\[3pt]\| x \| \sin \theta_{1} &=&x_2;& \| y \| \sin \theta_{2} &=&y_2,\end{array}$$
(2.41)

therefore,

$$\cos\theta=\cos\theta_{1}\cos\theta_{2}+\sin\theta_{1}\sin\theta_{2}=\frac{x_1 y_1+x_2 y_2}{\| x \| \ \| y \|}=\frac{x^{\top}y}{\| x \| \ \| y \|}\ .$$
Fig. 2.4 Angle between vectors

Remark 2.1

If \(x^{\top}y=0\), then the angle θ is equal to \(\frac{\pi}{2}\). From trigonometry, we know that the cosine of θ equals the length of the base of a triangle (\(\|p_{x}\|\)) divided by the length of the hypotenuse (\(\|x\|\)). Hence, we have

$$ ||p_x|| = ||x|| | \cos \theta |=\frac{|x^{\top}y|}{\| y \|} ,$$
(2.42)

where \(p_{x}\) is the projection of x on y (which is defined below). It is the coordinate of x on the vector y; see Figure 2.5.

Fig. 2.5 Projection

The angle can also be defined with respect to a general metric \({\mathcal{A}}\)

$$ \cos \theta=\frac{x^{\top} {\mathcal{A}}y}{\| x \|_{{\mathcal{A}}}\ \| y \|_{{\mathcal{A}}}}.$$
(2.43)

If cosθ=0 then x is orthogonal to y with respect to the metric \({\mathcal{A}}\).

Example 2.11

Assume that there are two centred (i.e., zero mean) data vectors. The cosine of the angle between them is equal to their correlation (defined in (3.8)). Indeed for x and y with \(\overline{x} = \overline{y} = 0\) we have

$$r_{XY} = \frac{\sum x_{i}y_{i}}{\sqrt{\sum x_{i}^2\sum y_{i}^2}} = \cos\theta $$

according to formula (2.40).
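A short numerical check that, for centred vectors, the cosine of the angle in (2.40) coincides with the empirical correlation (the data are arbitrary):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
xc, yc = x - x.mean(), y - y.mean()            # centred vectors

cos_theta = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
r_xy = np.corrcoef(x, y)[0, 1]
print(cos_theta, r_xy, np.isclose(cos_theta, r_xy))
```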

6.5 Rotations

When we consider a point \(x \in \mathbb {R}^{p}\), we generally use a p-coordinate system to obtain its geometric representation, like in Figure 2.1 for instance. There will be situations in multivariate techniques where we will want to rotate this system of coordinates by the angle θ.

Consider, for example, the point P with coordinates \(x=(x_{1},x_{2})^{\top}\) in \(\mathbb {R}^{2}\) with respect to a given set of orthogonal axes. Let Γ be a (2×2) orthogonal matrix where

$$\Gamma=\left(\begin{array}{r@{\quad}c}\cos \theta &\sin \theta \\-\sin \theta &\cos \theta \end{array}\right).$$
(2.44)

If the axes are rotated about the origin through an angle θ in a clockwise direction, the new coordinates of P will be given by the vector y

$$y=\Gamma \,x,$$
(2.45)

and a rotation through the same angle in an anti-clockwise direction gives the new coordinates as

$$y=\Gamma^{\top} \,x.$$
(2.46)

More generally, premultiplying a vector x by an orthogonal matrix Γ geometrically corresponds to a rotation of the system of axes, so that the first new axis is determined by the first row of Γ. This geometric point of view will be exploited in Chapters 10 and 11.
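The rotation in (2.44)–(2.46) in code (a sketch; the angle and the point are arbitrary):

```python
import numpy as np

theta = np.pi / 6                              # an arbitrary rotation angle
Gamma = np.array([[ np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])   # (2.44), orthogonal

x = np.array([1.0, 2.0])                       # coordinates of the point P
y_clockwise = Gamma @ x                        # (2.45): axes rotated clockwise
y_anticlockwise = Gamma.T @ x                  # (2.46): axes rotated anti-clockwise

print(np.allclose(Gamma.T @ Gamma, np.eye(2)))                      # Gamma is orthogonal
print(np.isclose(np.linalg.norm(y_clockwise), np.linalg.norm(x)))   # lengths are preserved
```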

6.6 Column Space and Null Space of a Matrix

Define for \({\mathcal{X}}(n\times p)\)

$$\mathit{Im}({\mathcal{X}})\stackrel{\mathrm{def}}{=}{C}({\mathcal{X}}) =\{x\in \mathbb {R}^n \mid \exists a \in \mathbb {R}^p\mbox{ so that } {\mathcal{X}}a=x \},$$

the space generated by the columns of \({\mathcal{X}}\) or the column space of \({\mathcal{X}}\). Note that \({C}({\mathcal{X}}) \subseteq \mathbb {R}^{n} \) and \(\mbox{dim}\{{C}({\mathcal{X}})\}=\mathop {\rm {rank}}({\mathcal{X}})=r\le \min(n,p)\).

$$\mathit{Ker}({\mathcal{X}})\stackrel{\mathrm{def}}{=}{N}({\mathcal{X}})=\{y\in \mathbb {R}^p \mid {\mathcal{X}}y=0 \}$$

is the null space of \({\mathcal{X}}\). Note that \({N}({\mathcal{X}}) \subseteq \mathbb {R}^{p} \) and that \(\mbox{dim}\{{N}({\mathcal{X}})\}=p-r\).

Remark 2.2

\({N}({\mathcal{X}}^{\top})\) is the orthogonal complement of \({C}({\mathcal{X}})\) in \(\mathbb {R}^{n}\), i.e., given a vector \(b \in \mathbb {R}^{n}\), it holds that \(x^{\top}b=0\) for all \(x \in {C}({\mathcal{X}})\) if and only if \(b \in {N}({\mathcal{X}}^{\top})\).

Example 2.12

Consider a (3×3) matrix \({\mathcal{X}}\) whose determinant is different from zero. It follows that \(\mathop {\rm {rank}}({{\mathcal{X}}})=3\). Hence, the column space of \({\mathcal{X}}\) is \({C}({\mathcal{X}})=\mathbb {R}^{3}\). The null space of \({\mathcal{X}}\) contains only the zero vector (0,0,0) and its dimension is equal to \(3-\mathop {\rm {rank}}({{\mathcal{X}}})=0\).

If instead the third column of \({{\mathcal{X}}}\) is a multiple of the first one, the matrix \({{\mathcal{X}}}\) cannot be of full rank. Noticing that the first two columns of \({{\mathcal{X}}}\) are independent, we see that \(\mathop {\rm {rank}}({{\mathcal{X}}})=2\). In this case, the dimension of the column space is 2 and the dimension of the null space is 1.

6.7 Projection Matrix

A matrix \({\mathcal{P}}(n \times n)\) is called an (orthogonal) projection matrix in \(\mathbb {R}^{n}\) if and only if \({\mathcal{P}}={\mathcal{P}}^{\top}={\mathcal{P}}^{2}\) (\({\mathcal{P}}\) is idempotent). Let \(b \in \mathbb {R}^{n}\). Then \(a={\mathcal{P}}b\) is the projection of b on \({C}({\mathcal{P}})\).

6.8 Projection on \({C}({\mathcal{X}})\)

Consider \({\mathcal{X}}(n \times p)\) and let

$$ {\mathcal{P}}={\mathcal{X}} ({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{X}}^{\top}$$
(2.47)

and \({\mathcal{Q}}={\mathcal{I}}_{n}-{\mathcal{P}}\). It is easy to check that \({\mathcal{P}}\) and \({\mathcal{Q}}\) are idempotent and that

$${\mathcal{P}}{\mathcal{X}}={\mathcal{X}}\quad\mbox{and}\quad {\mathcal{Q}}{\mathcal{X}}=0.$$
(2.48)

Since the columns of \({\mathcal{X}}\) are projected onto themselves, the projection matrix \({\mathcal{P}}\) projects any vector \(b \in \mathbb {R}^{n}\) onto \({C}({\mathcal{X}})\). Similarly, the projection matrix \({\mathcal{Q}}\) projects any vector \(b \in \mathbb {R}^{n}\) onto the orthogonal complement of \({C}({\mathcal{X}})\).

Theorem 2.8

Let \({{\mathcal{P}}}\) be the projection (2.47) and \({{\mathcal{Q}}}\) its orthogonal complement. Then:

  1. (i)

    \(x={\mathcal{P}}b\) entails \(x \in {C}({\mathcal{X}})\),

  2. (ii)

    \(y={\mathcal{Q}}b\) means that \(y^{\top}x=0 \ \forall x \in {C}({\mathcal{X}})\).

Proof

(i) holds, since \(x={\mathcal{X}}({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{X}}^{\top}b={\mathcal{X}}a\), where \(a=({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{X}}^{\top}b \in \mathbb {R}^{p}\).

(ii) follows from \(y=b-{\mathcal{P}}b\) and \(x={\mathcal{X}}a\). Hence \(y^{\top}x=b^{\top} {\mathcal{X}}a- b^{\top} {\mathcal{X}}({\mathcal{X}}^{\top} {\mathcal{X}})^{-1}{\mathcal{X}}^{\top} {\mathcal{X}}a=0\). □

Remark 2.3

Let \(x,y \in \mathbb {R}^{n}\) and consider \(p_{x} \in \mathbb {R}^{n}\), the projection of x on y (see Figure 2.5). With \({\mathcal{X}}=y\) we have from (2.47)

$$ p_x=y(y^{\top}y)^{-1}y^{\top}x=\frac{y^{\top}x}{\| y \|^2}\ y$$
(2.49)

and we can easily verify that

$$\|p_x\|=\sqrt{p_x^{\top}p_x}=\frac{|y^{\top}x|}{\|y\|}.$$

See again Remark 2.1.
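A numerical sketch of (2.47)–(2.49); the design matrix \({\mathcal{X}}\) and the vectors used below are arbitrary examples of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))                    # an arbitrary (n x p) matrix of full column rank
P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection on C(X), (2.47)
Q = np.eye(6) - P

print(np.allclose(P, P.T), np.allclose(P @ P, P))     # P is symmetric and idempotent
print(np.allclose(P @ X, X), np.allclose(Q @ X, 0))   # (2.48)

b = rng.normal(size=6)
print(np.allclose(X.T @ (Q @ b), 0))           # Qb is orthogonal to C(X)

# Projection of x on a single vector y, cf. (2.49)
x, y = rng.normal(size=3), rng.normal(size=3)
p_x = (y @ x) / (y @ y) * y
print(np.isclose(np.linalg.norm(p_x), abs(y @ x) / np.linalg.norm(y)))
```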


7 Exercises

Exercise 2.1

Compute the determinant for a (3×3) matrix.

Exercise 2.2

Suppose that \(|{\mathcal{A}}| = 0\). Is it possible that all eigenvalues of \({\mathcal{A}}\) are positive?

Exercise 2.3

Suppose that all eigenvalues of some (square) matrix \({{\mathcal{A}}}\) are different from zero. Does the inverse \({\mathcal{A}}^{-1}\) of \({\mathcal{A}}\) exist?

Exercise 2.4

Write a program that calculates the Jordan decomposition of the matrix

$${\mathcal{A}} = \left(\begin{array}{c@{\quad}c@{\quad}c} 1&2&3\\ 2&1&2\\ 3&2&1\end{array}\right) .$$

Check Theorem 2.1 numerically.

Exercise 2.5

Prove (2.23), (2.24) and (2.25).

Exercise 2.6

Show that a projection matrix only has eigenvalues in {0,1}.

Exercise 2.7

Draw some iso-distance ellipsoids for the metric \({\mathcal{A}}=\Sigma^{-1}\) of Example 3.13.

Exercise 2.8

Find a formula for \(|{\mathcal{A}}+aa^{\top}|\) and for \(({\mathcal{A}}+aa^{\top})^{-1}\). (Hint: use the inverse partitioned matrix with .)

Exercise 2.9

Prove the Binomial inverse theorem for two non-singular matrices \({\mathcal{A}} (p \times p)\) and \({\mathcal{B}} (p \times p)\): \(({\mathcal{A}}+{\mathcal{B}})^{-1}={\mathcal{A}}^{-1}-{\mathcal{A}}^{-1}({\mathcal{A}}^{-1}+{\mathcal{B}}^{-1})^{-1}{\mathcal{A}}^{-1}\). (Hint: use (2.26) with .)