This chapter reviews the linear algebra that we shall assume throughout the book. Proofs of standard results are mostly omitted. The reader can consult a linear algebra text such as  [4] for details. In this book all vector spaces considered will be finite dimensional over the field \(\mathbb{C}\) of complex numbers.

2.1 Basic Definitions and Notation

This section introduces some basic notions from linear algebra. We start with some notation, not all of which belongs to linear algebra. Let V and W be vector spaces.

  • If X is a set of vectors, then \(\mathbb{C}X =\mathrm{ Span}\ X\).

  • \({M}_{mn}(\mathbb{C}) =\{ m \times n\ \text{ matrices with entries in}\ \mathbb{C}\}\).

  • \({M}_{n}(\mathbb{C}) = {M}_{nn}(\mathbb{C})\).

  • Hom(V, W) = { A : V → W ∣ A is a linear map}.

  • End(V ) = Hom(V, V ) (the endomorphism ring of V ).

  • GL(V ) = { A ∈ End(V ) ∣ A is invertible} (known as the general linear group of V ).

  • \({GL}_{n}(\mathbb{C}) =\{ A \in {M}_{n}(\mathbb{C})\mid A\ \text{ is invertible}\}\).

  • The identity matrix/linear transformation is denoted I, or \({I}_{n}\) if we wish to emphasize the dimension n.

  • \(\mathbb{Z}\) is the ring of integers.

  • \(\mathbb{N}\) is the set of non-negative integers.

  • \(\mathbb{Q}\) is the field of rational numbers.

  • \(\mathbb{R}\) is the field of real numbers.

  • \(\mathbb{Z}/n\mathbb{Z} =\{ [0],\ldots ,[n - 1]\}\) is the ring of integers modulo n.

  • \({R}^{\ast }\) denotes the group of units (i.e., invertible elements) of a ring R.

  • \({S}_{n}\) is the group of permutations of {1, …, n}, i.e., the symmetric group on n letters.

  • The identity permutation is denoted Id.

Elements of \({\mathbb{C}}^{n}\) will be written as n-tuples or as column vectors, as is convenient.

If \(A \in {M}_{mn}(\mathbb{C})\), we sometimes write \({A}_{ij}\) for the entry in row i and column j. We may also write \(A = ({a}_{ij})\) to mean the matrix with \({a}_{ij}\) in row i and column j. If k, ℓ, m, and n are natural numbers, then matrices in \({M}_{mk,\ell n}(\mathbb{C})\) can be viewed as m × n block matrices with blocks in \({M}_{k\ell }(\mathbb{C})\). If we view an mk × ℓn matrix A as a block matrix, then we write \({[A]}_{ij}\) for the k × ℓ matrix in the i, j block, for 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Definition 2.1.1 (Coordinate vector). 

If V is a vector space with basis \(B =\{ {b}_{1},\ldots ,{b}_{n}\}\) and \(v = {c}_{1}{b}_{1} + \cdots + {c}_{n}{b}_{n}\) is a vector in V , then the coordinate vector of v with respect to the basis B is the vector \({[v]}_{B} = ({c}_{1},\ldots ,{c}_{n}) \in {\mathbb{C}}^{n}\). The map \(T : V \rightarrow {\mathbb{C}}^{n}\) given by \(Tv = {[v]}_{B}\) is a vector space isomorphism that we sometimes call taking coordinates with respect to B.

Suppose that \(T : V \rightarrow W\) is a linear transformation and B, B′ are bases for V, W, respectively. Let \(B =\{ {v}_{1},\ldots ,{v}_{n}\}\) and \(B' =\{ {w}_{1},\ldots ,{w}_{m}\}\). Then the matrix of T with respect to the bases B, B′ is the m × n matrix \({[T]}_{B,B'}\) whose jth column is \({[T{v}_{j}]}_{B'}\). In other words, if

$$T{v}_{j} ={ \sum \nolimits }_{i=1}^{m}{a}_{ ij}{w}_{i},$$

then \({[T]}_{B,B'} = ({a}_{ij})\). When V = W and B = B′, then we write simply \({[T]}_{B}\) for \({[T]}_{B,B}\).
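As an illustration of this definition (a numerical sketch, not part of the formal development), the following Python/numpy fragment computes \({[T]}_{B,B'}\) column by column for an arbitrarily chosen operator on \({\mathbb{C}}^{2}\) and arbitrarily chosen bases: the jth column is obtained by solving for the coordinates of \(T{v}_{j}\) in the basis B′.

    import numpy as np

    # Arbitrary example (not from the text): T(x, y) = (x + y, 2y) on C^2,
    # with bases B = {v1, v2} and B' = {w1, w2} chosen below.
    T = np.array([[1, 1],
                  [0, 2]], dtype=complex)     # matrix of T in the standard basis
    B = np.column_stack([(1, 1), (1, -1)])    # columns are v1, v2
    Bp = np.column_stack([(1, 0), (1, 1)])    # columns are w1, w2

    # The jth column of [T]_{B,B'} solves Bp @ column = T @ vj.
    M = np.linalg.solve(Bp, T @ B)
    print(M)                                  # [[0, 2], [2, -2]] up to rounding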

The standard basis for \({\mathbb{C}}^{n}\) is the set \(\{{e}_{1},\ldots ,{e}_{n}\}\) where \({e}_{i}\) is the vector with 1 in the ith coordinate and 0 in all other coordinates. So when n = 3, we have

$${e}_{1} = (1,0,0),\quad {e}_{2} = (0,1,0),\quad {e}_{3} = (0,0,1).$$

Throughout we will abuse the distinction between \(\mathrm{End}({\mathbb{C}}^{n})\) and \({M}_{n}(\mathbb{C})\) and the distinction between \(GL({\mathbb{C}}^{n})\) and \({GL}_{n}(\mathbb{C})\) by identifying a linear transformation with its matrix with respect to the standard basis.

Suppose dimV = n and dimW = m. Then by choosing bases for V and W and sending a linear transformation to its matrix with respect to these bases we see that:

$$\begin{array}{rcl} \mathrm{End}(V )& \cong{M}_{n}(\mathbb{C}); & \\ GL(V )& \cong{GL}_{n}(\mathbb{C}); & \\ \mathrm{Hom}(V,W)& \cong{M}_{mn}(\mathbb{C}).& \\ \end{array}$$

Notice that \({GL}_{1}(\mathbb{C})\cong{\mathbb{C}}^{\ast }\) and so we shall always work with the latter. We indicate W is a subspace of V by writing W ≤ V.

If W 1, W 2 ≤ V , then by definition

$${W}_{1} + {W}_{2} =\{ {w}_{1} + {w}_{2}\mid {w}_{1} \in {W}_{1},{w}_{2} \in {W}_{2}\}.$$

This is the smallest subspace of V containing \({W}_{1}\) and \({W}_{2}\). If, in addition, \({W}_{1} \cap {W}_{2} =\{ 0\}\), then \({W}_{1} + {W}_{2}\) is called a direct sum, written \({W}_{1} \oplus {W}_{2}\). As vector spaces, \({W}_{1} \oplus {W}_{2}\cong{W}_{1} \times {W}_{2}\) via the map \({W}_{1} \times {W}_{2}\rightarrow {W}_{1} \oplus {W}_{2}\) given by \(({w}_{1},{w}_{2})\mapsto {w}_{1} + {w}_{2}\). In fact, if V and W are any two vector spaces, one can form their external direct sum by setting \(V \oplus W = V \times W\). Note that

$$\dim ({W}_{1} \oplus {W}_{2}) =\dim {W}_{1} +\dim {W}_{2}.$$

More precisely, if \({B}_{1}\) is a basis for \({W}_{1}\) and \({B}_{2}\) is a basis for \({W}_{2}\), then \({B}_{1} \cup {B}_{2}\) is a basis for \({W}_{1} \oplus {W}_{2}\).

2.2 Complex Inner Product Spaces

Recall that if \(z = a + bi \in \mathbb{C}\), then its complex conjugate is \(\overline{z} = a - bi\). In particular, \(z\overline{z} = {a}^{2} + {b}^{2} = \vert z{\vert }^{2}\). An inner product on V is a map

$$\langle \cdot ,\cdot \rangle : V \times V \rightarrow \mathbb{C}$$

such that, for v, w, v 1, v 2 ∈ V and \({c}_{1},{c}_{2} \in \mathbb{C}\):

  • \(\langle {c}_{1}{v}_{1} + {c}_{2}{v}_{2},w\rangle = {c}_{1}\langle {v}_{1},w\rangle + {c}_{2}\langle {v}_{2},w\rangle\);

  • \(\langle w,v\rangle = \overline{\langle v,w\rangle }\);

  • ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 if and only if v = 0.

A vector space equipped with an inner product is called an inner product space. The norm \(\|v\|\) of a vector v in an inner product space is defined by \(\|v\| = \sqrt{\langle v, v\rangle }\).

Example 2.2.1.

The standard inner product on \({\mathbb{C}}^{n}\) is given by

$$\langle ({a}_{1},\ldots ,{a}_{n}),({b}_{1},\ldots ,{b}_{n})\rangle ={ \sum \nolimits }_{i=1}^{n}{a}_{ i}\overline{{b}_{i}}.$$

Two important properties of inner products are the Cauchy–Schwarz inequality

$$\vert \langle v,w\rangle \vert \leq \| v\| \cdot \| w\|$$

and the triangle inequality

$$\|v + w\| \leq \| v\| +\| w\|.$$
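Both inequalities are easy to check numerically. The following short sketch (an illustration only; the vectors are arbitrary choices) implements the standard inner product of Example 2.2.1 and tests them.

    import numpy as np

    def ip(a, b):
        # the standard inner product <a, b> = sum_i a_i * conj(b_i)
        return np.sum(a * np.conj(b))

    v = np.array([1 + 2j, -1j, 3])
    w = np.array([2, 1 + 1j, -1])
    norm = lambda x: np.sqrt(ip(x, x).real)

    print(abs(ip(v, w)) <= norm(v) * norm(w))       # Cauchy-Schwarz
    print(norm(v + w) <= norm(v) + norm(w))         # triangle inequality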

Recall that two vectors v, w in an inner product space V are said to be orthogonal if ⟨v, w⟩ = 0. A subset of V is called orthogonal if its elements are pairwise orthogonal. If, in addition, the norm of each vector is 1, the set is termed orthonormal. An orthogonal set of non-zero vectors is linearly independent. In particular, any orthonormal set is linearly independent.

Every inner product space has an orthonormal basis. One can obtain an orthonormal basis from an arbitrary basis using the Gram–Schmidt process  [4, Theorem 15.9]. If \(B =\{ {e}_{1},\ldots ,{e}_{n}\}\) is an orthonormal basis for an inner product space V and v ∈ V , then

$$v =\langle v,{e}_{1}\rangle {e}_{1} + \cdots +\langle v,{e}_{n}\rangle {e}_{n}$$

In other words,

$${[v]}_{B} = (\langle v,{e}_{1}\rangle ,\ldots ,\langle v,{e}_{n}\rangle ).$$
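A rough numerical sketch of the Gram–Schmidt process, followed by the coordinate formula above, is given below; the starting basis and the vector v are arbitrary choices, not taken from the text.

    import numpy as np

    def ip(a, b):
        return np.sum(a * np.conj(b))              # <a, b> = sum_i a_i * conj(b_i)

    def gram_schmidt(vectors):
        # Orthonormalize a linearly independent list, one vector at a time.
        basis = []
        for v in vectors:
            u = v - sum(ip(v, e) * e for e in basis)   # remove components along earlier e's
            basis.append(u / np.sqrt(ip(u, u).real))
        return basis

    B = gram_schmidt([np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])])
    v = np.array([2., -1., 3.])
    coords = [ip(v, e) for e in B]                     # [v]_B = (<v, e_1>, ..., <v, e_n>)
    print(np.allclose(sum(c * e for c, e in zip(coords, B)), v))   # reconstructs v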

Example 2.2.2.

For a finite set X, the set \({\mathbb{C}}^{X} =\{ f : X\rightarrow \mathbb{C}\}\) is a vector space with pointwise operations. Namely, one defines

$$\begin{array}{rcl} (f + g)(x)& = f(x) + g(x);& \\ (cf)(x)& = cf(x). & \\ \end{array}$$

For each x ∈ X, define a function \({\delta }_{x}: X\rightarrow \mathbb{C}\) by

$${\delta }_{x}(y) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1\quad &x = y\\ 0\quad &x\neq y. \end{array} \right.$$

There is a natural inner product on \({\mathbb{C}}^{X}\) given by

$$\langle f,g\rangle ={ \sum \nolimits }_{x\in X}f(x)\overline{g(x)}.$$

The set \(\{{\delta }_{x}\mid x \in X\}\) is an orthonormal basis with respect to this inner product. If \(f \in {\mathbb{C}}^{X}\), then its unique expression as a linear combination of the \({\delta }_{x}\) is given by

$$f ={ \sum \nolimits }_{x\in X}f(x){\delta }_{x}.$$

Consequently, \(\mathrm{dim}\ {\mathbb{C}}^{X} = \vert X\vert \).
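A concrete way to picture \({\mathbb{C}}^{X}\) is as an associative array keyed by X. The short sketch below (the set X and the function f are arbitrary choices) verifies the expansion \(f ={ \sum \nolimits }_{x\in X}f(x){\delta }_{x}\) pointwise.

    X = ['a', 'b', 'c']
    delta = {x: {y: (1 if x == y else 0) for y in X} for x in X}   # the functions delta_x
    f = {'a': 2 + 1j, 'b': -3, 'c': 0.5j}

    # f(y) = sum_x f(x) * delta_x(y) for every y in X:
    recon = {y: sum(f[x] * delta[x][y] for x in X) for y in X}
    print(all(abs(recon[y] - f[y]) < 1e-12 for y in X))            # True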

Direct sum decompositions are easy to obtain in inner product spaces. If W ≤ V , then the orthogonal complement of W is the subspace

$${W}^{\perp } =\{ v \in V \mid \langle v,w\rangle = 0\ \text{ for all}\ w \in W\}.$$

Proposition 2.2.3.

Let V be an inner product space and W ≤ V. Then there is a direct sum decomposition \(V = W \oplus {W}^{\perp }\).

Proof.

First, if \(w \in W \cap {W}^{\perp }\), then ⟨w, w⟩ = 0 implies w = 0; so \(W \cap {W}^{\perp } =\{ 0\}\). Let v ∈ V and suppose that \(\{{e}_{1},\ldots ,{e}_{m}\}\) is an orthonormal basis for W. Put \(\hat{v} =\langle v,{e}_{1}\rangle {e}_{1} + \cdots +\langle v,{e}_{m}\rangle {e}_{m}\) and \(z = v -\hat{ v}\). Then \(\hat{v} \in W\). We claim that \(z \in {W}^{\perp }\). To prove this, it suffices to show \(\langle z,{e}_{i}\rangle = 0\) for all i = 1, …, m. To this effect we compute

$$\langle z,{e}_{i}\rangle =\langle v,{e}_{i}\rangle -\langle \hat{ v},{e}_{i}\rangle =\langle v,{e}_{i}\rangle -\langle v,{e}_{i}\rangle = 0$$

because \(\{{e}_{1},\ldots ,{e}_{m}\}\) is an orthonormal set. As \(v =\hat{ v} + z\), it follows that \(V = W + {W}^{\perp }\). This completes the proof.
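Numerically, the construction in this proof looks as follows (a sketch; the orthonormal set spanning W and the vector v are arbitrary choices).

    import numpy as np

    def ip(a, b):
        return np.sum(a * np.conj(b))

    # W = span{e1, e2}, where {e1, e2} is orthonormal; v is any vector in C^3.
    e1 = np.array([1., 0., 0.])
    e2 = np.array([0., 1., 1.]) / np.sqrt(2)
    v = np.array([2., 3., -1.])

    v_hat = ip(v, e1) * e1 + ip(v, e2) * e2            # component of v in W
    z = v - v_hat                                      # claimed component in W-perp
    print(np.allclose([ip(z, e1), ip(z, e2)], 0))      # z is orthogonal to W
    print(np.allclose(v_hat + z, v))                   # v = v_hat + z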

We continue to assume that V is an inner product space.

Definition 2.2.4 (Unitary operator). 

A linear operator U ∈ GL(V ) is said to be unitary if

$$\langle Uv,Uw\rangle =\langle v,w\rangle$$

for all v, w ∈ V.

Notice that if U is unitary and v ∈ kerU, then \(0 =\langle Uv,Uv\rangle =\langle v,v\rangle\) and so v = 0. Thus unitary operators are invertible. The set U(V ) of unitary maps is a subgroup of GL(V ).

If \(A = ({a}_{ij}) \in {M}_{mn}(\mathbb{C})\) is a matrix, then its transpose is the matrix \({A}^{T} = ({a}_{ji}) \in {M}_{nm}(\mathbb{C})\). The conjugate of A is \(\overline{A} = (\overline{{a}_{ij}})\). The conjugate-transpose or adjoint of A is the matrix \({A}^{\ast } = \overline{{A}^{T}}\). One can verify directly the equality \({(AB)}^{\ast } = {B}^{\ast }{A}^{\ast }\). Routine computation shows that if \(v \in {\mathbb{C}}^{n}\) and \(w \in {\mathbb{C}}^{m}\), then

$$\langle Av,w\rangle =\langle v,{A}^{\ast }w\rangle$$
(2.1)

where we use the standard inner product on \({\mathbb{C}}^{m}\) and \({\mathbb{C}}^{n}\). Indeed, viewing vectors as column vectors one has \(\langle {v}_{1},{v}_{2}\rangle = \overline{{v}_{1}^{\ast }{v}_{2}}\) and so \(\langle Av,w\rangle = \overline{{(Av)}^{\ast }w} = \overline{{v}^{\ast }({A}^{\ast }w)} =\langle v,{A}^{\ast }w\rangle\).

With respect to the standard inner product on \({\mathbb{C}}^{n}\), the linear transformation associated to a matrix \(A \in G{L}_{n}(\mathbb{C})\) is unitary if and only if \({A}^{-1} = {A}^{\ast }\)  [4, Theorem 32.7]; such a matrix is thus called unitary. We denote by \({U}_{n}(\mathbb{C})\) the group of all n × n unitary matrices. A matrix \(A \in {M}_{n}(\mathbb{C})\) is called self-adjoint if \({A}^{\ast } = A\). A matrix A is symmetric if \({A}^{T} = A\). If A has real entries, then A is self-adjoint if and only if A is symmetric.

More generally, if T is a linear operator on an inner product space V , then \({T}^{\ast }: V \rightarrow V\) is the unique linear operator satisfying \(\langle Tv,w\rangle =\langle v,{T}^{\ast }w\rangle\) for all v, w ∈ V. It is called the adjoint of T. If B is an orthonormal basis for V , then \({[{T}^{\ast }]}_{B} = {[T]}_{B}^{\ast }\). The operator T is self-adjoint if \(T = {T}^{\ast }\), or equivalently if the matrix of T with respect to some (equivalently, any) orthonormal basis of V is self-adjoint.
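The identity (2.1) and the characterization \({A}^{-1} = {A}^{\ast }\) of unitary matrices are easy to test numerically; in the sketch below the matrix A, the vectors, and the unitary matrix U are arbitrary choices.

    import numpy as np

    def ip(a, b):
        return np.sum(a * np.conj(b))

    A = np.array([[1 + 1j, 2], [0, -3j]])
    A_star = A.conj().T                                 # the adjoint (conjugate-transpose)
    v = np.array([1., 2j])
    w = np.array([-1j, 0.5])
    print(np.isclose(ip(A @ v, w), ip(v, A_star @ w)))  # identity (2.1)

    U = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)       # a sample unitary matrix
    print(np.allclose(np.linalg.inv(U), U.conj().T))    # U^{-1} = U^*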

2.3 Further Notions from Linear Algebra

If X ⊆ End(V ) and W ≤ V , then W is called X-invariant if, for any A ∈ X and any w ∈ W, one has Aw ∈ W, i.e., XW ⊆ W.

A key example comes from the theory of eigenvalues and eigenvectors. Recall that \(\lambda \in \mathbb{C}\) is an eigenvalue of A ∈ End(V ) if λI − A is not invertible, or in other words, if Av = λv for some v≠0. The eigenspace corresponding to λ is the set

$${V }_{\lambda } =\{ v \in V \mid Av = \lambda v\},$$

which is a subspace of V. Note that if \(v \in {V }_{\lambda }\), then \(A(Av) = A(\lambda v) = \lambda Av\), so \(Av \in {V }_{\lambda }\). Thus \({V }_{\lambda }\) is A-invariant. On the other hand, if W ≤ V is A-invariant with \(\dim W = 1\) (that is, W is a line), then \(W \subseteq {V }_{\lambda }\) for some λ. In fact, if \(w \in W \setminus \{ 0\}\), then {w} is a basis for W. Since Aw ∈ W, we have that Aw = λw for some \(\lambda \in \mathbb{C}\). So w is an eigenvector with eigenvalue λ, whence \(w \in {V }_{\lambda }\). Thus \(W \subseteq {V }_{\lambda }\).

The trace of a matrix \(A = ({a}_{ij}) \in {M}_{n}(\mathbb{C})\) is defined by

$$\mathrm{Tr}(A) ={ \sum \nolimits }_{i=1}^{n}{a}_{ ii}.$$

Some basic facts concerning the trace function \(\mathrm{Tr}: {M}_{n}(\mathbb{C})\rightarrow \mathbb{C}\) are that Tr is linear and Tr(AB) = Tr(BA). Consequently, \(\mathrm{Tr}(PA{P}^{-1}) = \mathrm{Tr}({P}^{-1}PA) = \mathrm{Tr}(A)\) for any invertible matrix P. In particular, if T ∈ End(V ), then Tr(T) makes sense: choose any basis for the vector space V and compute the trace of the associated matrix.
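For instance (a numerical sketch with randomly chosen matrices), one can check Tr(AB) = Tr(BA) and the invariance of the trace under conjugation.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    P = rng.standard_normal((3, 3))                                      # invertible with probability 1

    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                  # Tr(AB) = Tr(BA)
    print(np.isclose(np.trace(P @ A @ np.linalg.inv(P)), np.trace(A)))   # Tr(PAP^{-1}) = Tr(A)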

The determinant det A of a matrix \(A = ({a}_{ij}) \in {M}_{n}(\mathbb{C})\) is defined as follows:

$$\det A ={ \sum \nolimits }_{\sigma \in {S}_{n}}\mathrm{sgn}(\sigma ) \cdot {a}_{1\sigma (1)}\cdots {a}_{n\sigma (n)}.$$

We recall that

$$\mathrm{sgn}(\sigma ) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1 \quad &\sigma \ \text{ is even}\\ -1\quad &\sigma \ \text{ is odd.} \end{array} \right.$$

The key properties of the determinant that we shall use are:

  • detA≠0 if and only if \(A \in { GL}_{n}(\mathbb{C})\);

  • det(AB) = detA ⋅detB;

  • \(\det ({A}^{-1}) = {(\det A)}^{-1}\).

In particular, one has \(\det (PA{P}^{-1}) =\det A\) and so we can define, for any T ∈ End(V ), the determinant by choosing a basis for V and computing the determinant of the corresponding matrix for T.
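The permutation-sum formula above can be evaluated directly, though inefficiently. The following sketch (with an arbitrary 3 × 3 matrix) compares it with numpy's built-in determinant.

    import numpy as np
    from itertools import permutations

    def det_by_permutations(A):
        # det A = sum over sigma in S_n of sgn(sigma) * a_{1 sigma(1)} ... a_{n sigma(n)}
        n = A.shape[0]
        total = 0.0
        for sigma in permutations(range(n)):
            inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
            sgn = -1 if inversions % 2 else 1            # sgn(sigma) from the parity of sigma
            total += sgn * np.prod([A[i, sigma[i]] for i in range(n)])
        return total

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
    print(np.isclose(det_by_permutations(A), np.linalg.det(A)))      # True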

The characteristic polynomial \({p}_{A}(x)\) of a linear operator A on an n-dimensional vector space V is given by \({p}_{A}(x) =\det (xI - A)\). This is a monic polynomial of degree n (i.e., it has leading coefficient 1) and the roots of \({p}_{A}(x)\) are precisely the eigenvalues of A. The classical Cayley–Hamilton theorem says that any linear operator is a zero of its characteristic polynomial  [4, Corollary 24.7].

Theorem 2.3.1 (Cayley–Hamilton). 

Let p A (x) be the characteristic polynomial of A. Then p A (A) = 0.
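A quick numerical illustration (the matrix is an arbitrary choice): numpy's poly routine returns the coefficients of the characteristic polynomial, which can then be evaluated at A itself by Horner's rule with matrix products.

    import numpy as np

    A = np.array([[1., 2., 0.], [0., 3., 1.], [4., 0., 1.]])
    coeffs = np.poly(A)                  # coefficients of p_A(x) = det(xI - A), leading term first

    p_of_A = np.zeros_like(A)
    for c in coeffs:                     # Horner's rule: p_A(A) built with matrix multiplication
        p_of_A = p_of_A @ A + c * np.eye(3)
    print(np.allclose(p_of_A, 0))        # p_A(A) = 0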

If A ∈ End(V ), the minimal polynomial of A, denoted m A (x), is the smallest degree monic polynomial f(x) such that f(A) = 0.

Proposition 2.3.2.

If q(A) = 0 then m A (x)∣q(x).

Proof.

Write \(q(x) = {m}_{A}(x)f(x) + r(x)\) with either r(x) = 0, or deg(r(x)) < deg(m A (x)). Then

$$0 = q(A) = {m}_{A}(A)f(A) + r(A) = r(A).$$

By minimality of m A (x), we conclude that r(x) = 0.

Corollary 2.3.3.

If p A (x) is the characteristic polynomial of A, then m A (x) divides p A (x).

The relevance of the minimal polynomial is that it provides a criterion for diagonalizability of a matrix, among other things. A standard result from linear algebra is the following characterization of diagonalizable matrices, cf.  [4, Theorem 23.11].

Theorem 2.3.4.

A matrix \(A \in {M}_{n}(\mathbb{C})\) is diagonalizable if and only if m A (x) has no repeated roots.

Example 2.3.5.

The diagonal matrix

$$A = \left [\begin{array}{*{10}c} 3 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right ]$$

has \({m}_{A}(x) = (x - 1)(x - 3)\), whereas \({p}_{A}(x) = {(x - 1)}^{2}(x - 3)\). On the other hand, the matrix

$$B = \left [\begin{array}{*{10}c} 1 & 1\\ 0 & 1 \end{array} \right ]$$

has \({m}_{B}(x) = {(x - 1)}^{2} = {p}_{B}(x)\) and so is not diagonalizable.
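The minimal polynomials in this example can be confirmed by direct matrix arithmetic (a short numerical check):

    import numpy as np

    A = np.diag([3., 1., 1.])
    I3 = np.eye(3)
    print(np.allclose((A - I3) @ (A - 3 * I3), 0))       # (x-1)(x-3) kills A, so m_A(x) divides it

    B = np.array([[1., 1.], [0., 1.]])
    I2 = np.eye(2)
    print(np.allclose(B - I2, 0))                        # False: (x-1) does not kill B
    print(np.allclose((B - I2) @ (B - I2), 0))           # True: (x-1)^2 = m_B(x) = p_B(x)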

One of the main results from linear algebra is the spectral theorem for self-adjoint matrices. We sketch a proof since it is indicative of several proofs later in the text.

Theorem 2.3.6 (Spectral Theorem). 

Let \(A \in {M}_{n}(\mathbb{C})\) be self-adjoint. Then there is a unitary matrix \(U \in {U}_{n}(\mathbb{C})\) such that \({U}^{\ast }AU\) is diagonal. Moreover, the eigenvalues of A are real.

Proof.

First we verify that the eigenvalues are real. Let λ be an eigenvalue of A with corresponding eigenvector v. Then

$$\lambda \langle v,v\rangle =\langle Av,v\rangle =\langle v,{A}^{\ast }v\rangle =\langle v,Av\rangle = \overline{\langle Av,v\rangle } = \overline{\lambda }\langle v,v\rangle$$

and hence \(\lambda = \overline{\lambda }\) because ⟨v, v⟩ > 0. Thus λ is real.

To prove the remainder of the theorem, it suffices to show that \({\mathbb{C}}^{n}\) has an orthonormal basis B of eigenvectors for A. One can then take U to be a matrix whose columns are the elements of B. We proceed by induction on n, the case n = 1 being trivial. Assume the theorem is true for all dimensions smaller than n. Let λ be an eigenvalue of A with corresponding eigenspace \({V }_{\lambda }\). If \({V }_{\lambda } = {\mathbb{C}}^{n}\), then A is already diagonal and there is nothing to prove. So we may assume that \({V }_{\lambda }\) is a proper subspace; it is of course non-zero. Then \({\mathbb{C}}^{n} = {V }_{\lambda } \oplus {V }_{\lambda }^{\perp }\) by Proposition 2.2.3. We claim that \({V }_{\lambda }^{\perp }\) is A-invariant. Indeed, if \(v \in {V }_{\lambda }\) and \(w \in {V }_{\lambda }^{\perp }\), then

$$\langle Aw,v\rangle =\langle w,{A}^{\ast }v\rangle =\langle w,Av\rangle =\langle w,\lambda v\rangle = 0$$

and so \(Aw \in {V }_{\lambda }^{\perp }\). Note that \({V }_{\lambda }^{\perp }\) is an inner product space in its own right by restricting the inner product on V , and moreover the restriction of A to \({V }_{\lambda }^{\perp }\) is still self-adjoint. Since \(\dim {V }_{\lambda }^{\perp } < n\), an application of the induction hypothesis yields that \({V }_{\lambda }^{\perp }\) has an orthonormal basis B′ of eigenvectors for A. Let B be any orthonormal basis for \({V }_{\lambda }\). Then \(B \cup B'\) is an orthonormal basis for \({\mathbb{C}}^{n}\) consisting of eigenvectors for A, as required.
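In practice the conclusion of the spectral theorem is available numerically: numpy's eigh routine returns real eigenvalues and an orthonormal eigenbasis for a self-adjoint matrix. The matrix below is an arbitrary self-adjoint example, used only as an illustration.

    import numpy as np

    A = np.array([[2, 1 - 1j, 0],
                  [1 + 1j, 3, 2j],
                  [0, -2j, 1]])
    print(np.allclose(A, A.conj().T))                              # A is self-adjoint

    eigenvalues, U = np.linalg.eigh(A)                             # columns of U form an orthonormal eigenbasis
    print(np.allclose(U.conj().T @ U, np.eye(3)))                  # U is unitary
    print(np.allclose(U.conj().T @ A @ U, np.diag(eigenvalues)))   # U^* A U is diagonal with real entries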

2.4 Exercises

Exercise 2.1.

Suppose that \(A,B \in {M}_{n}(\mathbb{C})\) are commuting matrices, i.e., AB = BA. Let V λ be an eigenspace of A. Show that V λ is B-invariant.

Exercise 2.2.

Let V be an n-dimensional vector space and B a basis. Prove that the map \(F : \mathrm{End}(V )\rightarrow {M}_{n}(\mathbb{C})\) given by F(T) = [T] B is an isomorphism of unital rings.

Exercise 2.3.

Let V be an inner product space and let W ≤ V be a subspace. Let v ∈ V and define \(\hat{v} \in W\) as in the proof of Proposition 2.2.3. Prove that if w ∈ W with \(w\neq \hat{v}\), then \(\|v -\hat{ v}\| <\| v - w\|\). Deduce that \(\hat{v}\) is independent of the choice of orthonormal basis for W. It is called the orthogonal projection of v onto W.

Exercise 2.4.

Prove that \({(AB)}^{\ast } = {B}^{\ast }{A}^{\ast }\).

Exercise 2.5.

Prove that Tr(AB) = Tr(BA).

Exercise 2.6.

Let V be an inner product space and let T : VV be a linear transformation. Show that T is self-adjoint if and only if V has an orthonormal basis of eigenvectors of T and the eigenvalues of T are real. (Hint: one direction is a consequence of the spectral theorem.)

Exercise 2.7.

Let V be an inner product space. Show that U ∈ End(V ) is unitary if and only if \(\|Uv\| =\| v\|\) for all vectors v ∈ V. (Hint: use the polarization formula \(\langle v,w\rangle = \frac{1} {4}\left [\|v + w{\|}^{2} -\| v - w{\|}^{2} + i\|v + iw{\|}^{2} - i\|v - iw{\|}^{2}\right ]\).)

Exercise 2.8.

Prove that if \(A \in {M}_{n}(\mathbb{C})\), then there is an upper triangular matrix T and an invertible matrix P such that \({P}^{-1}AP = T\). (Hint: use induction on n. Look at the proof of the spectral theorem for inspiration.)

Exercise 2.9.

This exercise sketches a proof of the Cayley–Hamilton Theorem using a little bit of analysis.

  1. Use Exercise 2.8 to reduce to the case when A is an upper triangular matrix.

  2. Prove the Cayley–Hamilton theorem for diagonalizable operators.

  3. Identifying \({M}_{n}(\mathbb{C})\) with \({\mathbb{C}}^{{n}^{2} }\), show that the mapping \({M}_{n}(\mathbb{C})\rightarrow {M}_{n}(\mathbb{C})\) given by \(A\mapsto {p}_{A}(A)\) is continuous. (Hint: the coefficients of \({p}_{A}(x)\) are polynomials in the entries of A.)

  4. Prove that every upper triangular matrix is a limit of matrices with distinct eigenvalues (and hence diagonalizable).

  5. Deduce the Cayley–Hamilton theorem.