The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory

Barata, João Carlos Alves; Hussein, Mahir Saleh

doi:10.1007/s13538-011-0052-z

The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory

General and Applied Physics
Published: 16 December 2011

Volume 42, pages 146–165, (2012)
Cite this article

Download PDF

Access provided by CONRICYT-eBooks

Brazilian Journal of Physics Aims and scope Submit manuscript

The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory

Download PDF

João Carlos Alves Barata¹ &
Mahir Saleh Hussein¹

4279 Accesses
209 Citations
4 Altmetric
Explore all metrics

Abstract

In the last decades, the Moore–Penrose pseudoinverse has found a wide range of applications in many areas of science and became a useful tool for physicists dealing, for instance, with optimization problems, with data analysis, with the solution of linear integral equations, etc. The existence of such applications alone should attract the interest of students and researchers in the Moore–Penrose pseudoinverse and in related subjects, such as the singular value decomposition theorem for matrices. In this note, we present a tutorial review of the theory of the Moore–Penrose pseudoinverse. We present the first definitions and some motivations, and after obtaining some basic results, we center our discussion on the spectral theorem and present an algorithmically simple expression for the computation of the Moore–Penrose pseudoinverse of a given matrix. We do not claim originality of the results. We rather intend to present a complete and self-contained tutorial review, useful for those more devoted to applications, for those more theoretically oriented, and for those who already have some working knowledge of the subject.

Riemannian inexact Newton method for structured inverse eigenvalue and singular value problems

Article 06 May 2019

Definitions and Motivations

Methods for Computing Weighted Pseudoinverses and Weighted Normal Pseudosolutions with Singular Weights

Article 24 May 2018

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction, Motivation, and Notation

In this paper, we present a self-contained review of some of the basic results on the so-called Moore–Penrose pseudoinverse of matrices, a concept that generalizes the usual notion of inverse of a square matrix, but that is also applicable to singular square matrices or even to non-square matrices. This notion is particularly useful in dealing with certain linear least squares problems, as we shall discuss in Section 6, i.e., problems where one searches for an optimal approximation for solutions of linear equations like Ax = y, where A is a given m×n matrix, y is a given column vector with m components, and the unknown x, a column vector with n components, is the searched solution. In many situations, a solution is non-existing or non-unique, but one asks for a vector x such that the norm of the difference Ax − y is the smallest possible (in terms of least squares).

Let us be a little more specific. Let A ∈ Mat (ℂ, m, n) (the set of all complex m×n matrices) and y ∈ ℂ^m be given and consider the problem of finding x ∈ ℂⁿ satisfying the linear equation

$$ Ax \; = \; y. \label{eq:LuybiUYtvtRytfr-00} $$

(1)

If m = n and A has an inverse, the (unique) solution is, evidently, x = A ^− 1 y. In other cases the solution may not exist or may not be unique. We can, however, consider the alternative problem of finding the set of all vectors x′ ∈ ℂⁿ such that the Euclidean norm ||Ax′ − y|| reaches its least possible value. This set is called the minimizing set of the linear problem (1). Such vectors x′ ∈ ℂⁿ would be the best approximants for the solution of (1) in terms of the Euclidean norm, i.e., in terms of “least squares.” As we will show in Theorem 6.1, the Moore–Penrose pseudoinverse provides this set of vectors x′ that minimize ||Ax′ − y||: It is the set

$$ \Big\{ A^+y +\big( \mathbb{1}_n -A^+A\big) z, \; z\in{\mathbb{C}}^n \Big\} \; , \label{eq:conjuntominimizante-deoptimizacaolinear-00} $$

(2)

where A ⁺ ∈ Mat (ℂ, n, m) denotes the Moore–Penrose pseudoinverse of A. An important question for applications is to find a general and algorithmically simple way to compute A ⁺. The most common approach uses the singular values decomposition and is described in “Appendix 2.” Using the spectral theorem and Tikhonov’s regularization method, we show that A ⁺ can be computed by the algorithmically simpler formula

$$ \begin{array}{rll}\label{eq:reppreudoespectraldeAMAIS-2-00} A^+ \; &=& \; \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s} \;\; \frac{1}{\beta_b} \left( \prod\limits_{{l=1}\atop {l\neq b}}^s \big(\beta_b-\beta_l\big)^{-1} \right) \\&&\times \left[ \prod\limits_{{l=1}\atop {l\neq b}}^s \Big( A^*A-\beta_l\mathbb{1}_n \Big) \right] A^* \; , \end{array} $$

(3)

where A* denotes the adjoint matrix of A and β _k, k = 1, ..., s, are the distinct eigenvalues of A*A. See Theorem 5.1 for a more detailed statement. One of the aims of this paper is to present a proof of (3) by combining the spectral theorem with the a regularization procedure due to Tikhonov [1, 2].

1.1 Some Applications of the Moore–Penrose Pseudoinverse

Problems involving the determination of the minimizing set of (1) are always present when the number of unknowns exceeds the number of values provided by measurements. Such situations occur in many areas of Applied Mathematics, Physics, and Engineering, ranging from imaging methods, such as magnetic resonance imaging (MRI) [3–5], functional MRI [7, 6], positron emission tomography [8, 9], and magnetic source imaging [10–12] to seismic inversion problems [13, 14].

The Moore–Penrose pseudoinverse and/or the singular values decomposition (SVD) of matrices (discussed in “Appendix 2”) are also employed in data analysis, as in the treatment of electroencephalographic source localization [15] and in the so-called principal component analysis. Applications of this last method to astronomical data analysis can be found in [16–19], and applications to gene expression analysis can be found in [20, 21]. Image compression algorithms using SVD are known at least since [22], and digital image restoration using the Moore–Penrose pseudoinverse has been studied in [23, 24].

Problems involving the determination of the minimizing set of (1) also occur, for instance, in certain numerical algorithms for finding solutions of linear Fredholm integral equations of the first kind:

$$ \int_a^b k(x, \; y)\, u(y)\;{\rm d}y \; = \; f(x) \;, $$

where − ∞ < a < b < ∞ and where k and f are given functions. See Section 4 for a further discussion of this issue. For an introductory account on integral equations, rich in examples and historical remarks, see [25].

Even this short list of applications should convince a student of Physics or Applied Mathematics of the relevance of the Moore–Penrose pseudoinverse and related subjects, and our main objective is to provide a self- contained introduction to the required theory.

1.2 Organization

In Section 2, we present the definition of the Moore–Penrose pseudoinverse and obtain its basic properties. In Section 3, we further develop the theory of the Moore–Penrose pseudoinverses. In Section 4, we describe Tikhonov’s regularization method for the computation of Moore–Penrose pseudoinverses and present a first proof of existence. Section 5 collects the previous results and derives expression (3), based on the spectral theorem, for the computation of Moore–Penrose pseudoinverses. This expression is algorithmically simpler than the usual method based on the singular values decomposition (described in “Appendix 2”). In Section 6, we show the relevance of the Moore–Penrose pseudoinverse for the solution of linear least squares problems, its main motivation. In “Appendix 1,” we present a self-contained review of the results on linear algebra and Hilbert space theory, not all of them elementary, that we need in the main part of this paper. In “Appendix 2,” we approach the existence problem of the Moore–Penrose pseudoinverse by using the usual singular values decomposition method.

1.3 Notation and Preliminary Definitions

In the following, we fix the notation utilized throughout the paper. We denote ℂⁿ the vector space of all n-tuples of complex numbers: ${\mathbb{C}}^n\mathrel{\mathop:}=\left\{ \left(\begin{smallmatrix}z_1\\ \vdots\\z_n\end{smallmatrix}\right), \; \mbox{with}\; z_k\in\right.$ $\left.{\vphantom{\left(\begin{smallmatrix}z_1\\\vdots\\z_n\end{smallmatrix}\right)}}{\mathbb{C}} \mbox{ for all } k=1, \; \ldots, \; n\right\}$. We denote the usual scalar product in ℂⁿ by ${\langle }\cdot, \; \cdot{\rangle }_{\mathbb{C}}$ or simply by ${\langle }\cdot, \; \cdot{\rangle }$, where for $z= \left(\begin{smallmatrix}z_1\\ \vdots\\z_n\end{smallmatrix}\right)\in{\mathbb{C}}^n$ and $w= \left(\begin{smallmatrix}w_1\\ \vdots\\w_n\end{smallmatrix}\right)\in{\mathbb{C}}^n$, we have

$$ {\langle } z, \; w {\rangle }_{\mathbb{C}} \; \equiv \; {\langle } z, \; w {\rangle } \; \mathrel{\mathop:}= \; \sum\limits_{k=1}^n\overline{z_k}w_k \;. $$

Note that this scalar product is linear in the second argument and anti-linear in the first, in accordance with the convention adopted in Physics. Two vectors u and v ∈ ℂⁿ are said to be orthogonal according to the scalar product ${\langle } \cdot , \; \cdot {\rangle }$ if ${\langle } u , \; v {\rangle }=0$. If W ⊂ ℂⁿ is a subspace of ℂⁿ, we denote by W ^⊥ the subspace of ℂⁿ composed by all vectors orthogonal to all vectors of W. The usual norm of a vector z ∈ ℂⁿ will be denoted by ||z||_ℂ or simply by ||z|| and is defined by $\|z\|_{\mathbb{C}}\equiv\|z\|=\sqrt{{\langle } z, \; z{\rangle }}$. It is well known that ℂⁿ is a Hilbert space with respect to the usual scalar product.

The set of all complex m×n matrices (m rows and n columns) will be denoted by Mat (ℂ , m, n). The set of all square n×n matrices with complex entries will be denoted by Mat (ℂ , n).

The identity matrix will be denoted by $\mathbb{1}$. Given A ∈ Mat (ℂ , m, n), we denote by A ^T the element of Mat (ℂ , n, m) whose matrix elements are $(A^T)_{ij} = A_{ji}$ for all i ∈ {1, ..., n}, j ∈ {1, ..., m}. The matrix A ^T is said to be the transpose of A. It is evident that (A ^T)^T = A and that (AB)^T = B ^T A ^T for all A ∈ Mat (ℂ , m, n) and B ∈ Mat (ℂ , n, p).

If A ∈ Mat (ℂ, m, n), then its adjoint A* ∈ Mat (ℂ, n, m) is defined as the matrix whose matrix elements $(A^*)_{ij}$ are given by $\overline{A_{ji}}$ for all 0 ≤ i ≤ n and 0 ≤ j ≤ m.

Given a set α ₁, ..., α _n of complex numbers, we denote by diag (α ₁, ..., α _n ) ∈ Mat (ℂ, n) the diagonal matrix whose k-th diagonal entry is α _k:

$$ \big(\mathrm{diag}\, (\alpha_1, \; \ldots , \; \alpha_n )\big)_{ij} \; = \; \left\{ \begin{array}{ll} \alpha_i, & \mbox{for } i=j \; , \\ 0, & \mbox{for } i\neq j \; . \end{array} \right. $$

The spectrum of a square matrix A ∈ Mat (ℂ, n) coincides with the set of its eigenvalues (see the definitions in “Appendix 1”) and will be denoted by σ(A).

We denote by $\mathbb{0}_{a, \; b}\in \mathrm{Mat}\, ({\mathbb{C}} , \; a, \;b)$ the a × b matrix whose matrix elements are all zero. We denote by $\mathbb{1}_{l}\in \mathrm{Mat}\, ({\mathbb{C}} ,$ l) the l×l identity matrix. If no danger of confusion is present, we will simplify the notation and write $\mathbb{0}$ and $\mathbb{1}$ instead of $\mathbb{0}_{a, \; b}$ and $\mathbb{1}_{l}$, respectively. We will also employ the following definitions: for m, n ∈ ℕ, let I _{m, m + n} ∈ Mat (ℂ, m, m + n) and J _{m + n, n} ∈ Mat (ℂ, m + n, n) be given by

$$ I_{m, \; m+n} \; \mathrel{\mathop:}= \; \begin{pmatrix} \mathbb{1}_m & \,\, \mathbb{0}_{m, \; n} \end{pmatrix} \quad \mbox{ and } \quad J_{m+n, \; n} \; \mathrel{\mathop:}= \; \begin{pmatrix} \mathbb{1}_n \\ \mathbb{0}_{m, \; n} \end{pmatrix} \; . \label{eq:MatrizesIJeAlinha-1} $$

(4)

The corresponding transpose matrices are

$$ \begin{array}{rll} \big(I_{m, \; m+n}\big)^T \; \mathrel{\mathop:}&=& \; \begin{pmatrix} \mathbb{1}_m \\ \mathbb{0}_{n, \; m} \end{pmatrix} \; = \; J_{m+n, \; m} \quad \mbox{ and } \quad\\ \big(J_{ m+n , \; n}\big)^T \; \mathrel{\mathop:}&=& \; \begin{pmatrix} \mathbb{1}_n & \mathbb{0}_{n, \; m} \end{pmatrix} \; = \; I_{n, \; m+n} \; . \label{eq:MatrizesIJeAlinha-2} \end{array} $$

(5)

The following useful identities will be used below:

$$I_{m, \; m+n} \, \big(I_{m, \; m+n}\big)^T = I_{m, \; m+n} J_{m+n, \; m} \; = \; \mathbb{1}_m \;, \label{eq:MatrizesIJeAlinha-3} $$

(6)

$${\kern-6pt} \big(J_{ m+n , \; n}\big)^T J_{ m+n , \; n} = I_{n, \; m+n } J_{ m+n , \; n} \; = \; \mathbb{1}_n \;, \label{eq:MatrizesIJeAlinha-4} $$

(7)

For each A ∈ Mat (ℂ, m, n), we can associate a square matrix A′ ∈ Mat (ℂ, m + n) given by

$$ \begin{array}{rll} A' \; \mathrel{\mathop:}&=& \; \big(I_{m, \; m+n}\big)^T A \big(J_{ m+n , \; n}\big)^T\\ \; &=& \; J_{m+n, \; m} A I_{ n, \; m+n} \; = \; \begin{pmatrix} A & \mathbb{0}_{m, \; m} \\ \mathbb{0}_{n, \; n} & \mathbb{0}_{n, \; m} \end{pmatrix} \;. \label{eq:MatrizesIJeAlinha-5} \end{array} $$

(8)

As one easily checks, we get from (6)–(7) the useful relation

$$ A \; = \; I_{m, \; m+n} A' J_{ m+n , \; n} \; . \label{eq:MatrizesIJeAlinha-6} $$

(9)

The canonical basis of vectors in ℂⁿ is

$${\mathbf e}_1 \; = \; \begin{pmatrix} 1 \\ 0 \\ 0\\ \vdots \\ 0 \end{pmatrix} , \quad {\mathbf e}_2 \; = \; \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , \quad \ldots , \quad {\mathbf e}_n \; = \; \begin{pmatrix} 0 \\0\\ \vdots \\ 0 \\1 \end{pmatrix} \; , \label{eq:ABaseCanonica-1} $$

(10)

Let x ¹, ..., x ⁿ be vectors, represented in the canonical basis as

$$ x^a \; = \; \begin{pmatrix} x^a_1 \\ \vdots \\ x^a_n \end{pmatrix} \; . $$

We will denote by $ {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} $ the n × n matrix constructed in such a way that its a-th column is the vector x ^a, that means,

$${{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \; = \; \begin{pmatrix} x^1_1 & \cdots & x^n_1 \\ \vdots & \ddots & \vdots \\ x^1_n & \cdots & x^n_n \end{pmatrix} \; . \label{eq:notacaoparamatrizgeradaporcolunas} $$

(11)

It is obvious that $ \mathbb{1} = {{\Big[\!\! \Big[}} {\mathbf e}_1 , \; \ldots , \; {\mathbf e}_n {{\Big]\!\! \Big]}} $. With this notation, we write

$$ \label{BgerV} B {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \; = \; {{\Big[\!\! \Big[}} Bx^1, \; \ldots , \; Bx^n {{\Big]\!\! \Big]}} \; , $$

(12)

for any B ∈ Mat (ℂ, m, n), as one easily checks. Moreover, if D is a diagonal matrix D = diag (d ₁, ..., d _n ), then

$$ \label{eq:gerVmalD} {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \, D \; = \; {{\Big[\!\! \Big[}} d_1 x^1, \; \ldots , \; d_n x^n {{\Big]\!\! \Big]}}\; . $$

(13)

If v ₁ , ..., v _k are elements of a complex vector space V, we denote by [v ₁ , ..., v _k] the subspace generated v ₁ , ..., v _k, i.e., the collection of all linear combinations of the v ₁ , ..., v _k: $ [v_1 , \; \ldots , \; v_k] \mathrel{\mathop:}= \Big\{ \alpha_1 v_1 + \cdots + \alpha_k v_k, \; \; \alpha_1, \; \ldots , \; \alpha_k \in {\mathbb{C}} \Big\} $. More definitions and general results can be found in “Appendix 1.”

2 The Moore–Penrose Pseudoinverse: Definition and First Properties

In this section, we define the notion of a Moore–Penrose pseudoinverse and study its uniqueness. The question of the existence of the Moore–Penrose pseudoinverse of a given matrix is analyzed in other sections.

2.1 Generalized Inverses, or Pseudoinverses

Let m, n ∈ ℕ and let A ∈ Mat (ℂ, m ,n) be a m×n matrix (not necessarily a square matrix). A matrix B ∈ Mat (ℂ, n, m) is said to be a generalized inverse, or a pseudoinverse, of A if it satisfies the following conditions:

1.
ABA = A,
2.
BAB = B.

If A ∈ Mat (ℂ, n) is a non-singular square matrix, its inverse A ^− 1 satisfies trivially the defining properties of the generalized inverse above. We will prove later that every matrix A ∈ Mat (ℂ, m ,n) has at least one generalized inverse, namely the Moore–Penrose pseudoinverse. The general definition above is not enough to guarantee uniqueness of the generalized inverse of any matrix A ∈ Mat (ℂ, m ,n).

The definition above is too wide to be useful, and it is convenient to narrow it in order to deal with certain specific problems. In what follows, we will discuss the specific case of the Moore–Penrose pseudoinverse and its application to optimization of linear least squares problems.

2.2 Defining the Moore–Penrose Pseudoinverse

Let m, n ∈ ℕ and let A ∈ Mat (ℂ, m ,n). A matrix A ⁺ ∈ Mat (ℂ, n, m) is said to be a Moore–Penrose pseudoinverse of A if it satisfies the following conditions:

1.
AA ⁺ A = A,
2.
A ⁺ AA ⁺ = A ⁺,
3.
AA ⁺ ∈ Mat (ℂ, m) and A ⁺ A ∈ Mat (ℂ, n) are self-adjoint.

It is easy to see again that if A ∈ Mat (ℂ, n) is non-singular, then its inverse satisfies all defining properties of a Moore–Penrose pseudoinverse.

The notion of Moore–Penrose pseudoinverse was introduced by E. H. Moore [26] in 1920 and rediscovered by R. Penrose [27, 28] in 1955. The Moore–Penrose pseudoinverse is a useful concept in dealing with optimization problems, as the determination of a “least squares” solution of linear systems. We will treat such problems later (see Theorem 6.1), after dealing with the question of uniqueness and existence of the Moore–Penrose pseudoinverse.

2.3 The Uniqueness of the Moore–Penrose Pseudoinverse

We will first show the uniqueness of the Moore–Penrose pseudoinverse of a given matrix A ∈ Mat (ℂ, m, n), assuming its existence. Let A ⁺ ∈ Mat (ℂ, n, m) be a Moore–Penrose pseudoinverse of A ∈ Mat (ℂ, m, n) and let B ∈ Mat (ℂ,n, m) be another Moore–Penrose pseudoinverse of A, i.e., such that ABA = A, BAB = B with AB and BA self-adjoint. Let $M_1\mathrel{\mathop:}= AB-$ $AA^+=A\big(B-A^+\big)\in \mathrm{Mat}\,({\mathbb{C}}, \; m)$. By the hypothesis, M ₁ is self-adjoint (since it is the difference of two self- adjoint matrices) and $(M_1)^2=\big(AB-AA^+\big)A\big(B-$ $A^+\big)\!=\! \big(ABA\!-\!AA^+A\big)\big(B\!-\!A^+\big)\!=\!(A\!-\!A)\big(B\!-\!A^+\big) =$ 0. Since M ₁ is self-adjoint, the fact that $(M_1)^2=0$ implies that M ₁ = 0, since for all x ∈ ℂ^m one has $\|M_1x\|^2\!=\!{\langle} M_1x, \, M_1x{\rangle}\!=\!{\big\langle } x, \, (M_1)^2x{\big\rangle}\!=\!0$, implying M ₁ = 0. This showed that AB = AA ⁺. Following the same steps, we can prove that BA = A ⁺ A (consider the self-adjoint matrix $M_2\mathrel{\mathop:}= BA-A^+A\in \mathrm{Mat}\,({\mathbb{C}}, \; n)$ and proceed as above). Now, all this implies that $A^+ \,=\, A^+AA^+=A^+\big(AA^+\big)=A^+AB=\big(A^+A\big)B=$ BAB = B, thus establishing uniqueness.

As we have already commented, if A ∈ Mat (ℂ, n) is a non-singular square matrix, its inverse A ^− 1 trivially satisfies the defining conditions of the Moore–Penrose pseudoinverse, and therefore, we have in this case A ⁺ = A ^− 1 as the unique Moore–Penrose pseudoinverse of A. It is also evident from the definition that for $\mathbb{0}_{mn}$, the m×n identically zero matrix, one has $(\mathbb{0}_{mn})^+=\mathbb{0}_{nm}$.

2.4 Existence of the Moore–Penrose Pseudoinverse

We will present two proofs of the existence of the Moore–Penrose pseudoinverse A ⁺ for an arbitrary matrix A ∈ Mat (ℂ, m, n). Both proofs produce algorithms for the explicit computation of A ⁺. The first one will be presented in Section 4 (Theorems 4.3 and 5.1) and will follow from results presented below. Expressions (39) and (40) furnish explicit expressions for the computation of A ⁺ in terms of A, A* and the eigenvalues of AA* or A* A (i.e., the singular values of A).

The second existence proof will be presented in “Appendix 2” and relies on the singular values decomposition presented in Theorem A.16. For this proof, the preliminary results presented below are not required. This second proof is the one more frequently found in the literature, but we believe that expressions (39) and (40) provide an algorithmically simpler way for the determination of the Moore–Penrose pseudoinverse of a given matrix.

2.5 Computing the Moore–Penrose Pseudoinverse in Some Special Cases

If A ∈ Mat (ℂ, m, 1), $A=\left(\begin{smallmatrix}a_1 \\ \vdots \\ a_m\end{smallmatrix}\right)$, a non-zero column vector, then one can easily verify that $A^+=\frac{1}{\|A\|^2}A^* = \frac{1}{\|A\|^2}\left(\begin{smallmatrix}\overline{a_1}\;, \quad\ldots , \quad \overline{a_m}\end{smallmatrix}\right)$, where $\|A\|=\sqrt{|a_1|^2+ \cdots +|a_m|^2}$. In particular, if z ∈ ℂ, then $(z)^+=\left\{\begin{array}{ll}0, \quad z=0\\ \frac{1}{z}, \quad z\neq 0\end{array}\right.$, by taking z as an element of Mat (ℂ, 1, 1).

This can be further generalized. If A ∈ Mat (ℂ, m, n) and (AA*)^− 1 exist, then

$$ A^+ \; = \; A^*\big(AA^*\big)^{-1} \; , \label{eq:pseudoinversa-AAeEXTinversivel-1} $$

(14)

because we can readily verify that the right-hand side satisfies the defining conditions of A ⁺. Analogously, if (A*A)^− 1 exists, one has

$$ A^+ \; = \; \big(A^*A\big)^{-1}A^* \; . \label{eq:pseudoinversa-AAeEXTinversivel-2} $$

(15)

For instance, for $A=\left(\begin{smallmatrix} 2 \quad 0 \quad i \\ 0 \quad i \quad 1 \end{smallmatrix}\right)$, one can check that AA* is invertible, but A* A is not, and we have $A^+ = A^*\big(AA^*\big)^{-1}=\frac{1}{9} \left(\begin{smallmatrix} 4 \quad -2i \\ 1 \quad -5i\\ -i \quad 4 \end{smallmatrix}\right)$. Similarly, for $A=\left(\begin{smallmatrix} 1 \quad 2 \\ 0 \quad i \\ 0 \quad 3 \end{smallmatrix}\right)$, AA* is singular, but A* A is invertible and we have $A^+ = \big(A^*A\big)^{-1}A^* = \frac{1}{10} \left(\begin{smallmatrix} 10 \quad 2i \quad -6 \\ 0 \quad -i \quad 3 \end{smallmatrix}\right)$.

The relations (14) and (15) are significant because they will provide an important hint to find the Moore–Penrose pseudoinverse of a general matrix, as we will discuss later. In Proposition 3.2, we will show that one has in general $A^+ = A^* \big(AA^*\big)^+ = \big(A^* A\big)^+ A^*$, and in Theorem 4.3, we will discuss what can be done in the cases when A*A or A*A are not invertible.

3 Further Properties of the Moore–Penrose Pseudoinverse

The following properties of the Moore–Penrose pseudoinverse follow immediately from its definition and from uniqueness. The proofs are elementary and left to the reader: For any A ∈ Mat (ℂ, m, n), one has

1.
$\big(A^+\big)^+=A$,
2.
$\big(A^+\big)^T=\big(A^T\big)^+$, $\overline{A^+}=\left(\overline{A}\right)^+$ and, consequently $\big(A^+\big)^*=\big(A^*\big)^+$,
3.
(z A)⁺ = z ^− 1 A ⁺ for all z ∈ ℂ, z ≠ 0.

It is, however, important to remark that for A ∈ Mat (ℂ, m, n) and B ∈ Mat (ℂ, n, p), the Moore–Penrose pseudoinverse (AB)⁺ is not always equal to B ⁺ A ⁺, in contrast to what happens with the usual inverse in the case m = n = p. A relevant exception will be found in Proposition 3.2.

The next proposition lists some important properties that will be used below.

Proposition 3.1

The Moore–Penrose pseudoinverse satisfies the following relations:

$$ A^+ = A^+ \, \big(A^+\big)^* \, A^* \;, \label{eq:ident-pseudoinversa-a} $$

(16)

$$ A = A \, A^* \, \big(A^+\big)^* \; , \label{eq:ident-pseudoinversa-b} $$

(17)

$$ A^* = A^* \, A \, A^+ \; , \label{eq:ident-pseudoinversa-c} $$

(18)

$$ A^+ = A^* \, \big(A^+\big)^* \, A^+ \;, \label{eq:ident-pseudoinversa-d} $$

(19)

$$ A = \big(A^+\big)^* \, A^* \, A \;, \label{eq:ident-pseudoinversa-e} $$

(20)

$$ A^* = A^+ \, A \, A^* \; , \label{eq:ident-pseudoinversa-f} $$

(21)

valid for all A ∈ Mat (ℂ, m, n).

For us, the most relevant of the relations above is relation (18), since we will make use of it in the proof of Proposition 6.1 when deal with optimization of least squares problems.

Proof of Proposition 3.1

Since AA ⁺ is self-adjoint, one has $AA^+=\big(AA^+\big)^* = \big(A^+\big)^*A^* $. Multiplying to the left by A ⁺, we get $A^+=A^+ \big(A^+\big)^* A^*$, proving (16). Replacing A→A ⁺ and using the fact that $A=\big(A^+\big)^+$, one gets from (16) $A=A A^* \big(A^+\big)^*$, which is relation (17). Replacing A→A* and using the fact that $\big(A^*\big)^+=\big(A^+\big)^*$, we get from (17) that A* = A* A A ⁺, which is relation (18).

Relations (19)–(21) can be obtained analogously from the fact that A ⁺ A is also self-adjoint, but they follow more easily by replacing A→A* in (16)–(18) and by taking the adjoint of the resulting expressions.□

From Proposition 3.1, other interesting results can be obtained, some of which are listed in the following proposition:

Proposition 3.2

For all A ∈ Mat (ℂ, m, n), one has

$$ \big(AA^*\big)^+ \; = \; \big(A^*\big)^+ A^+ \;. \label{eq:poiynOUIybiyutuyGUg} $$

(22)

From this we get

$$ A^+ \; = \; A^* \big(AA^*\big)^+ \; = \; \big(A^* A\big)^+ A^* \;, \label{eq:LniuybuyTUYTygfytgHGv} $$

(23)

also valid for all A ∈ Mat (ℂ, m, n).

Expression (23) generalizes (14) and (15) and can be employed to compute A ⁺ provided $\big(AA^*\big)^+ $ or $\big(A^* A\big)^+$ were previously known.

Proof of Proposition 3.2

Let $B=\big(A^*\big)^+ A^+$. One has

$$ \begin{array}{rll} AA^* &\!\stackrel{\text{(17)}}{=}\!& A \, A^* \, (A^+)^* \, A^* \!\stackrel{\text{(21)}}{=}\! A \, A^* \, (A^+)^* \, A^+ \, A \, A^*\\ &=& (AA^*)B(AA^*) \;, \end{array} $$

where we use that $\big(A^*\big)^+=\big(A^+\big)^*$. One also has

$$ \begin{array}{rll} B \!&=&\! \big(A^*\big)^+ A^+ \stackrel{\text{(16)}}{=} (A^+)^* \, A^+ \, A \, A^+\\ &\stackrel{\text{(19)}}{=}& (A^+)^* \, A^+ \, A \,A^* \, (A^+)^* \, A^+ = B \big( A \,A^*\big)B \;. \end{array} $$

Notice that

$$ \big( A \,A^*\big)B \; = \; \Big( A \,A^* (A^+)^* \Big) A^+ \! \stackrel{\text{(18)}}{=}\! AA^+ $$

which is self-adjoint, by definition. Analogously,

$$ B\big( A \,A^*\big) \; = \; (A^+)^* \Big( A^+ A \,A^* \Big) \!\stackrel{\text{(20)}}{=}\! (A^*)^+ A^* \; , $$

which is also self-adjoint. The facts exposed in the lines above prove that B is the Moore–Penrose pseudoinverse of AA*, establishing (22). Replacing A→A* in (22), one also gets

$$ \big(A^*A\big)^+ \; = \; A^+ \big(A^*\big)^+ \; . \label{eq:poiynOUIybiyutuyGUg-2} $$

(24)

Notice now that

$$ A^* \big(AA^*\big)^+ \! \stackrel{\text{(22)}}{=}\! A^* \big(A^*\big)^+ A^+ \! \stackrel{\text{(19)}}{=}\! A^+ $$

and that

$$ \big(A^*A\big)^+ A^* \! \stackrel{\text{(24)}}{=}\! A^+ \big(A^*\big)^+ A^* \! \stackrel{\text{(16)}}{=}\! A^+ \;, $$

establishing (23).□

3.1 The Kernel and the Range of a Matrix and the Moore–Penrose Pseudoinverse

The kernel and the range (or image) of a matrix A ∈ Mat (ℂ, m, n) are defined by $\mathrm{Ker}\,(A)\mathrel{\mathop:}= \{ u \in {\mathbb{C}}^n|\; Au=0\}$ and $\mathrm{Ran}\,(A)\mathrel{\mathop:}=\{Au, \; u \in {\mathbb{C}}^n\}$, respectively. It is evident that Ker (A) is a linear subspace of ℂⁿ and that Ran (A) is a linear subspace of ℂ^m.

The following proposition will be used below, but is interesting by itself.

Proposition 3.3

Let A ∈ Mat (ℂ, m, n) and let us define $P_1 \mathrel{\mathop:}= \mathbb{1}_n - A^+ A \in\mathrm{Mat}\,({\mathbb{C}}, \; n)$ and $P_2 \mathrel{\mathop:}= \mathbb{1}_m - A A^+ \in\mathrm{Mat}\,({\mathbb{C}}, \; n)$ . Then, the following claims are valid:

1.
P ₁ and P ₂ are orthogonal projectors, that means, they satisfy $(P_k)^2=P_k$ and $P_k^* = P_k$ , k = 1, 2.
2.
$\mathrm{Ker}\,(A) \!=\! \mathrm{Ran}\,(P_1)$ , $\mathrm{Ran}\,(A)\!=\!\mathrm{Ker}\,(P_2)$ , $\mathrm{Ker}\,(A^+)\!=$ Ran (P ₂) and $\mathrm{Ran}\,\big(A^+\big)= \mathrm{Ker}\,(P_1)$ .
3.
$\mathrm{Ran}\,(A)=\mathrm{Ker}\,\big(A^+\big)^\perp$ and $\mathrm{Ran}\,\big(A^+\big)=\mathrm{Ker}\,(A)^\perp$ .
4.
$\mathrm{Ker}\,(A)\!\oplus\! \mathrm{Ran}\,\big(A^+\big)\!=\!{\mathbb{C}}^n$ and $\mathrm{Ker}\,\big(A^+\big)\!\oplus\! \mathrm{Ran}\,(A)\!=$ ℂ^m , both being direct sums of orthogonal subspaces.

Proof

Since AA ⁺ and A ⁺ A are self-adjoint, so are P ₁ and P ₂. One also has $(P_1)^2= \mathbb{1} - 2 A^+ A + A^+ AA^+ A \!=\! \mathbb{1} - 2 A^+ A + A^+ A\!=\!\mathbb{1} - A^+ A\!=\!P_1$ and analogously for P ₂. This proved item 1.

Let x ∈ Ker (A). Since Ran (P ₁) is a closed linear subspace of ℂⁿ, the “Best Approximant Theorem,” Theorem A.1, and the Orthogonal Decomposition Theorem, Theorem A.3, guarantee the existence of a unique z ₀ ∈ Ran (P ₁) such that $\|x-z_0\|=\min\big\{ \|x-z\|, \; z\in \mathrm{Ran}\,(P_1)\big\}$. Moreover, x − z ₀ is orthogonal to Ran (P ₁). Hence, there exists at least one $y_0\in{\mathbb{C}}^m$ such that x − P ₁ y ₀ is orthogonal to every element of the form P ₁ y, i.e., ${\langle } x-P_1 y_0, \; P_1y{\rangle }=0$ for all y ∈ ℂ^m, what implies ${\langle } P_1(x-P_1 y_0), \; y{\rangle }=0$ for all y ∈ ℂ^m what, in turn, implies P ₁(x − P ₁ y ₀) = 0. This, however, says that P ₁ x = P ₁ y ₀. Since x ∈ Ker (A), one has P ₁ x = x (by the definition of P ₁). We therefore proved that if x ∈ Ker (A), then x ∈ Ran (P ₁), establishing that Ker (A) ⊂ Ran (P ₁). On the other hand, the fact that $AP_1= A\big(\mathbb{1} - A^+A\big)=A-A=0$ implies Ran (P ₁) ⊂ Ker (A), establishing that Ran (P ₁) = Ker (A).

If z ∈ Ker (P ₁), then z = A ⁺ Az, proving that $z\in \mathrm{Ran}\,\big(A^+\big)$. This established that $\mathrm{Ker}\,(P_1)\subset \mathrm{Ran}\,\big(A^+\big)$. On the other hand, if $u\in \mathrm{Ran}\,\big(A^+\big)$, then there exists v ∈ ℂ^m such that u = A ⁺ v. Therefore, $P_1u= \big(\mathbb{1}_n - A^+ A\big)A^+v=\big(A^+ - A^+AA^+\big)v=0$, proving that u ∈ Ker (P ₁) and that $\mathrm{Ran}\,\big(A^+\big)\subset\mathrm{Ker}\,(P_1)$. This establishes that $\mathrm{Ker}\,(P_1) = \mathrm{Ran}\,\big(A^+\big)$.

P ₂ is obtained from P ₁ by the substitution A→A ⁺ (recalling that $\big(A^+\big)^+ =A$). Hence, the results above imply that $\mathrm{Ran}\,(P_2) = \mathrm{Ker}\,\big(A^+\big)$ and that Ker (P ₂) = Ran (A). This proves item 2.

If M ∈ Mat (ℂ, p) (with p ∈ ℕ, arbitrary) is self-adjoint, then ${\langle } y, \; Mx{\rangle }={\langle } My, \; x{\rangle }$ for all x, y ∈ ℂ^p. This relation makes evident that Ker (M) = Ran (M)^⊥. Therefore, item 3 follows from item 2 by taking M = P ₁ and M = P ₂. Item 4 is evident from item 3.□

4 Tikhonov’s Regularization and Existence Theorem for the Moore–Penrose Pseudoinverse

In (14) and (15), we saw that if $\big(AA^*\big)^{-1}$ exists, then $A^+ = A^*\big(AA^*\big)^{-1}$, and that if $\big(A^*A\big)^{-1}$ exists, then $A^+ = \big(A^*A\big)^{-1}A^*$. If those inverses do not exist, there is an alternative procedure to obtain A ⁺. We know from Proposition A.4 that even if $\big(AA^*\big)^{-1}$ does not exist, the matrix $AA^* +\mu\mathbb{1}$ will be invertible for all non-vanishing μ ∈ ℂ with |μ| small enough. Hence, we could conjecture that the expressions $A^*\big(AA^*+\mu\mathbb{1}\big)^{-1}$ and $ \big(A^*A+\mu\mathbb{1}\big)^{-1}A^*$ are well-defined for μ ≠ 0 and |μ| small enough and converge to A ⁺ when the limit μ→0 is taken. As will now show, this conjecture is correct.

The provisional replacement of the singular matrices AA* or A*A by the non-singular ones $AA^*+\mu\mathbb{1}$ or $A^*A+\mu\mathbb{1}$ (with μ ≠ 0 and |μ| “small”) is a regularization procedure known as Tikhonov’s regularization. This procedure was introduced by Tikhonov in [1] (see also [2] and, for historical remarks, [25]) in his search for uniform approximations for the solutions of Fredholm’s equation of the first kind

$$ \int_a^b k(x, \; y)\, u(y)\;{\rm d}y \; = \; f(x) \;, \label{eq:Fredholm-of-the-first-kind} $$

(25)

where − ∞ < a < b < ∞ and where k and f are given functions satisfying adequate smoothness conditions. In operator form, (25) becomes Ku = f and K is well-known to be a compact operator (see, e.g., [29]) if k is a continuous function. By using the method of finite differences or by using expansions in terms of orthogonal functions, the inverse problem (25) can be replaced by an approximating inverse matrix problem Ax = y, like (1). By applying A* to the left, one gets A*Ax = A*y. Since the inverse of A*A may not exist, one first considers a solution x _μ of the regularized equation $\big(A^*A+\mu\mathbb{1}\big)x_\mu = A^*y$, with some adequate μ ∈ ℂ, and asks whether the limit $\lim_{|\mu|\to 0}\big(A^*A+\mu\mathbb{1}\big)^{-1}A^*y$ can be taken. As we will see, the limit exists and is given precisely by A ⁺ y. In Tikhonov’s case, the regularized equation $\big(A^*A+\mu\mathbb{1}\big)x_\mu = A^*y$ can be obtained from a related Fredholm’s equation of the second kind, namely $K^*Ku_{\mu} + \mu u_{\mu}=K^*f$, for which the existence of solutions, i.e., the existence of the inverse $(K^*K+\mu\mathbb{1})^{-1}$, is granted by Fredholm’s alternative theorem (see, e.g., [29]) for all μ in the resolvent set of K*K and, therefore, for all μ > 0 (since K*K is a positive compact operator)^{Footnote 1}. It is then a technical matter to show that the limit $\displaystyle \lim_{ {\mu\to 0}\atop {\mu>0} }u_{\mu}$ exists and provides a uniform approximation to a solution of (25).

Tikhonov, however, does not point to the relation of his ideas to the theory of the Moore–Penrose inverse. This will be described in what follows. Our first result, presented in the next two lemmas, establishes that the limits $\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}$ and $\displaystyle \lim_{\mu\to 0} \big(A^*A +$ $\mu\mathbb{1}_n\big)^{-1}A^*$, described above, indeed exist and are equal.

Lemma 4.1

Let A ∈ Mat (ℂ, m, n) and let μ ∈ ℂ be such that $AA^* +\mu\mathbb{1}_m$ and $A^*A +\mu\mathbb{1}_n$ are non-singular (that means $\mu\not\in\sigma\big(AA^*\big)\cup\sigma\big(A^*A\big)$ , a finite set). Then, $A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} = \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*$ .

Recall that, by Proposition A.7, $\sigma\big(AA^*\big)$ and $\sigma\big(A^*$ $A\big)$ differ at most by the element 0.

Proof of Lemma 4.1

Let $B_\mu \mathrel{\mathop:}= A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}$ and $C_\mu \mathrel{\mathop:}= \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*$. We have

$$ \begin{array}{rll} A^*AB_\mu &=& A^*\big[AA^*\big]\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\\ &=& A^*\big[AA^* +\mu\mathbb{1}_m-\mu\mathbb{1}_m\big]\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \\ &=& A^*\Big(\mathbb{1}_m-\mu \big(AA^* +\mu\mathbb{1}_m\big)^{-1}\Big) = A^*-\mu B_\mu \, . \end{array} $$

Therefore, $\big(A^*A +\mu\mathbb{1}_n\big)B_\mu=A^*$, what implies $ B_\mu=\big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^* =C_\mu $.□

Lemma 4.2

For all A ∈ Mat (ℂ, m, n) the limits $\displaystyle \lim_{\mu\to 0} A^*\big(AA^* \!+\!\mu\mathbb{1}_m\big)^{-1}$ and $\displaystyle \lim_{\mu\to 0} \big(A^*A \!+\!\mu\mathbb{1}_n\big)^{-1}A^*$ exist and are equal (by Lemma 4.1), defining an element of Mat (ℂ, n, m).

Proof

Notice first that A is an identically zero matrix iff AA* or A*A are zero matrices. In fact, if, for instance, A*A = 0, then for any vector x one has $0={\langle } x, \; A^*Ax{\rangle } = {\langle } Ax, \;Ax{\rangle }=\|Ax\|^2$, proving that A = 0. Hence, we will assume that AA* and A*A are non-zero matrices.

The matrix AA* ∈ Mat (ℂ, m) is evidently self-adjoint. Let α ₁, ..., α _r be its distinct eigenvalues. By the spectral theorem for self-adjoint matrices (see Theorems A.9 and A.13), we may write

$$ AA^* \; = \; \sum\limits_{a=1}^{r}\alpha_a E_a \; , \label{eq:wihnpiuYBItryutf} $$

(26)

where E _a are the spectral projectors of AA* and satisfy E _a E _b = δ _ab E _a, $E_a^*=E_a$, and $\sum\limits_{a=1}^{r} E_a =\mathbb{1}_m$. Therefore,

$$ AA^* +\mu\mathbb{1}_m \; = \; \sum\limits_{a=1}^{r}(\alpha_a + \mu) E_a $$

and, hence, for $\mu\not\in\{\alpha_1, \; \ldots , \; \alpha_r\}$, one has, by (50),

$$ \begin{array}{rll} \big(AA^* +\mu\mathbb{1}_m\big)^{-1} &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a + \mu} E_a \quad \mbox{ and }\quad\\ A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a + \mu} A^*E_a \;. \label{eq:YOIUybyutuygc} \end{array} $$

(27)

There are now two cases to be considered: (1) zero is not an eigenvalue of AA* and (2) zero is eigenvalue of AA*.

In case 1, it is clear from (27) that the limit $\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}$ exists and

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:expressaoparaBnocasoAAestnaotemautovalornulo} $$

(28)

In case 2, let us have, say, α ₁ = 0. The corresponding spectral projector E ₁ projects on the kernel of AA*: $\mathrm{Ker}\,\big(AA^*\big)\mathrel{\mathop:}= \{ u \in {\mathbb{C}}^n|\; AA^*u=0\}$. If $x\in \mathrm{Ker}\,\big(AA^*\big)$, then A* x = 0 because $0={\big\langle } x, \; AA^*x{\big\rangle }={\langle } A^*x, \; A^*x{\rangle }=\big\|A^*x\big\|^2$. Therefore,

$$ A^*E_1 \; = \; 0 \label{eq:yfvuyTbihpuybtrdcyr87} $$

(29)

and, hence, we may write,

$$ A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a + \mu} A^*E_a \; , $$

from which we get

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:expressaoparaBnocasoAAesttemautovalornulo} $$

(30)

This proves that $\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}$ always exists. By Lemma 4.1, the limit $\displaystyle\lim_{\mu\to 0} \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*$ also exists and coincides with $\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}$.□

The main consequence is the following theorem, which contains a general proof for the existence of the Moore–Penrose pseudoinverse:

Theorem 4.3

(Tikhonov’s Regularization) For all A ∈ Mat (ℂ, m, n) one has

$$ A^+ \; = \; \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \label{eq:regTichinovparaAMAIS-1} $$

(31)

and

$$ A^+ \; = \; \lim_{\mu\to 0} \big(A^*A+\mu\mathbb{1}_n\big)^{-1}A^* \; . \label{eq:regTichinovparaAMAIS-2} $$

(32)

Proof

The statements to be proven are evident if $A=\mathbb{0}_{mn}$ because, as we already saw, $(\mathbb{0}_{mn})^+=\mathbb{0}_{nm}$. Hence, we will assume that A is a non-zero matrix. This is equivalent (by the comments found in the proof o Lemma 4.2) to assume that AA* and A*A are non-zero matrices.

By Lemmas 4.1 and 4.2, it is enough to prove (31). There are two cases to be considered: (1) zero is not an eigenvalue of AA* and (2) zero is an eigenvalue of AA*. In case (1), we saw in (28) that

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \; =\mathrel{\mathop:} \; B \;. $$

Notice now that

$$ \begin{array}{rll} \label{eq:KLhniUYiuyYtfygkugfyhygfv-1} AB &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} AA^*E_a = \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} \left( \sum\limits_{b=1}^{r} \alpha_b E_b \right)E_a\\ &=& \sum\limits_{a=1}^{r} \sum\limits_{b=1}^{r} \frac{1}{\alpha_a}\alpha_b\;\delta_{ab}E_a = \sum\limits_{a=1}^{r} E_a = \mathbb{1}_{m} , \end{array} $$

(33)

which is self-adjoint and that

$$ BA \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a A \;, \label{eq:KLhniUYiuyYtfygkugfyhygfv-2} $$

(34)

which is also self-adjoint because α _a ∈ ℝ for all a and because $(A^*E_a A)^* = A^*E_a A$ for all a, since $E_a^*=E_a$.

From (33), it follows that ABA = A. From (34), it follows that

$$ \begin{array}{rll} BAB &=& \left(\sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a A\right) \left(\sum\limits_{b=1}^{r}\frac{1}{\alpha_b} A^*E_b \right)\\ &=& \sum\limits_{a=1}^{r}\sum\limits_{b=1}^{r}\frac{1}{\alpha_a\alpha_b} A^*E_a (AA^*)E_b \;. \end{array} $$

Now, by the spectral decomposition (26) for AA*, it follows that $(AA^*)E_b=\alpha_b E_b$. Therefore,

$$ \begin{array}{rll} BAB &=& \sum\limits_{a=1}^{r}\sum\limits_{b=1}^{r}\frac{1}{\alpha_a} A^*E_a E_b\\ &=& \left( \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \right) \bigg(\underbrace{ \sum\limits_{b=1}^{r} E_b}_{\mathbb{1}_m}\bigg) = B \;. \end{array} $$

This proves that A = A ⁺ when 0 is not an eigenvalue of AA*.

Let us now consider the case when AA* has a zero eigenvalue, say, α ₁. As we saw in (30),

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \; =\mathrel{\mathop:} \; B \;. $$

Using the fact that $(AA^*)E_a=\alpha_a E_a$ (what follows from the spectral decomposition (26) for AA*), we get

$$ \begin{array}{rll}\label{eq:KLhniUYiuyYtfygkugfyhygfv-1b} AB &=& \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} AA^*E_a = \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} \alpha_a E_a\\ &=& \sum\limits_{a=2}^{r} E_a = \mathbb{1}_{m} -E_1 , \end{array} $$

(35)

which is self-adjoint, since E ₁ is self-adjoint. We also have

$$ BA \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a A \;, \label{eq:KLhniUYiuyYtfygkugfyhygfv-2b} $$

(36)

which is also self-adjoint.

From (35), it follows that ABA = A − E ₁ A. Notice now that $(E_1 A)^*=A^*E_1=0$, by (29). This establishes that E ₁ A = 0 and that ABA = A. From (36), it follows that

$$ \begin{array}{rll} BAB &=& \left(\sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a A\right) \left(\sum\limits_{b=2}^{r}\frac{1}{\alpha_b} A^*E_b \right)\\ &=& \sum\limits_{a=2}^{r}\sum\limits_{b=2}^{r}\frac{1}{\alpha_a\alpha_b} A^*E_a (AA^*)E_b \;. \end{array} $$

Using again $(AA^*)E_b=\alpha_b E_b$, we get

$$ \begin{array}{rll} BAB \!&=&\! \sum\limits_{a=2}^{r}\sum\limits_{b=2}^{r}\frac{1}{\alpha_a} A^*E_a E_b \!=\! \left( \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \right)\! \underbrace{\left( \sum\limits_{b=2}^{r} E_b\right)}_{\mathbb{1}_m-E_1}\\ &=& B - \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_aE_1 = B , \end{array} $$

since E _a E ₁ = 0 for a ≠ 1. This shows that BAB = B. Hence, we established that A = A ⁺ also in the case when AA* has a zero eigenvalue, completing the proof of (31).□

5 The Moore–Penrose Pseudoinverse and the Spectral Theorem

The proof of Theorem 4.3 also establishes the following facts:

Theorem 5.1

Let A ∈ Mat (ℂ, m, n) be a non-zero matrix and let $AA^*= \sum\limits_{a=1}^r \alpha_a E_a $ be the spectral representation of AA*, where {α ₁ , ..., α _r} ⊂ ℝ is the set of distinct eigenvalues of AA* and E _a are the corresponding self-adjoint spectral projections. Then, we have

$$ A^+ \; = \; \sum\limits_{{a=1}\atop{\alpha_a\neq 0}}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:representacaoquaseespectaldeAMAIS-1} $$

(37)

Analogously, let $A^*A= \sum\limits_{b=1}^s \beta_b F_b $ be the spectral representation of A*A, where {β ₁ , ..., β _s} ⊂ ℝ is the set of distinct eigenvalues of A*A and F _b the corresponding self-adjoint spectral projections. Then, we also have

$$ A^+ \; = \; \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s}\frac{1}{\beta_b} F_b A^* \; . \label{eq:representacaoquaseespectaldeAMAIS-2} $$

(38)

Is it worth mentioning that, by Proposition A.7, the sets of non-zero eigenvalues of AA* and of A*A coincide: {α ₁ , ..., α _r} ∖ {0} = {β ₁ , ..., β _s} ∖ {0}).

From (37) and (38) it follows that for a non-zero matrix A , we have

$$ \begin{array}{rll} A^+ & = & \sum\limits_{{a=1}\atop{\alpha_a\neq 0}}^{r} \;\; \frac{1}{\alpha_a} \left( \prod\limits_{{l=1}\atop {l\neq a}}^r \big(\alpha_a-\alpha_l\big)^{-1} \right)\\ &&\times A^* \left[ \prod\limits_{{l=1}\atop {l\neq a}}^r \Big( AA^*-\alpha_l\mathbb{1}_m \Big)\right] \;, \label{eq:reppreudoespectraldeAMAIS-1} \end{array} $$

(39)

$$ \begin{array}{rll} A^+ & = & \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s} \;\; \frac{1}{\beta_b} \left( \prod\limits_{{l=1}\atop {l\neq b}}^s \big(\beta_b-\beta_l\big)^{-1} \right) \\ &&\times \left[ \prod\limits_{{l=1}\atop {l\neq b}}^s \Big( A^*A-\beta_l\mathbb{1}_n \Big) \right] A^* \; . \label{eq:reppreudoespectraldeAMAIS-2} \end{array} $$

(40)

Expression (39) or (40) provides a general algorithm for the computation of the Moore–Penrose pseudoinverse for any non-zero matrix A. Its implementation requires only the determination of the eigenvalues of AA* or of A* A and the computation of polynomials on AA* or A* A.

Proof of Theorem 5.1

Equation (37) was established in the proof of Theorem 4.3 (see (28) and (30)). Relation (38) can be proven analogously, but it also follows easier (see (37)), by replacing A→A* and taking the adjoint of the resulting expression. Relations (39) and (40) follow from Proposition A.11, particularly from the explicit formula for the spectral projector given in (52).□

6 The Moore–Penrose Pseudoinverse and Least Squares

Let us now consider one of the main applications of the Moore–Penrose pseudoinverse, namely to optimization of linear least squares problems. Let A ∈ Mat (ℂ, m, n) and y ∈ ℂ^m be given and consider the problem of finding x ∈ ℂⁿ satisfying the linear equation

$$ Ax \; = \; y \; . \label{eq:LuybiUYtvtRytfr} $$

(41)

If m = n and A has an inverse, the (unique) solution is, evidently, x = A ^− 1 y. In the other cases, the solution may not exist or may not be unique. We can, however, consider the alternative problem of finding the set of all vectors x′ ∈ ℂⁿ such that the Euclidean norm ||Ax′ − y|| reaches its least possible value. This set is called the minimizing set of the linear problem (41). Such vectors x′ ∈ ℂⁿ would be the best approximants for the solution of (41) in terms of the Euclidean norm, i.e., in terms of “least squares.” As we will show, the Moore–Penrose pseudoinverse provides this set of vectors x′ that minimize ||Ax′ − y||. The main result is condensed in the following theorem:

Theorem 6.1

Let A ∈ Mat (ℂ, m, n) and y ∈ ℂ^m be given. Then, the set of all vectors of ℂⁿ for which the map $ {\mathbb{C}}^n \ni x \mapsto \|Ax-y\|\in[0, \; \infty)$ assumes a minimum coincides with the set

$$ \begin{array}{rll} A^+y &+& \mathrm{Ker}\,(A) \\ &=& \Big\{ A^+y +\big( \mathbb{1}_n -A^+A\big) z, \; z\in{\mathbb{C}}^n \Big\} \; . \label{eq:conjuntominimizante-deoptimizacaolinear} \end{array} $$

(42)

By Proposition 3.3, we also have $A^+y + \mathrm{Ker}\,(A)=A^+y + \mathrm{Ran}\,\big(A^+\big)^{\perp}$ .

Theorem 6.1 says that the minimizing set of the linear problem (41) consists of all vectors obtained by adding to the vector A ⁺ y an element of the kernel of A, i.e., of all vectors obtained adding to A ⁺ y a vector annihilated by A. Notice that for the elements x′ of the minimizing set of the linear problem (41), one has $\big\|Ax'-y\big\| = \Big\|\big(AA^+-\mathbb{1}_m\big)y\Big\| = \|P_2y\|$, which vanishes if and only if y ∈ Ker (P ₂) = Ran (A) (by Proposition 3.3), a rather obvious fact.

Proof of Theorem 6.1

The image of A, Ran (A), is a closed linear subspace of ℂ^m. The best approximant theorem and the orthogonal decomposition theorem guarantee the existence of a unique y ₀ ∈ Ran (A) such that ||y ₀ − y|| is minimal and that this y ₀ is such that y ₀ − y is orthogonal to Ran (A).

Hence, there exists at least one $x_0\in {\mathbb{C}}^n$ such that ||Ax ₀ − y|| is minimal. Such x ₀ is not necessarily unique, and as one easily sees, $x_1\in {\mathbb{C}}^n$ has the same properties if and only if x ₀ − x ₁ ∈ Ker (A) (since Ax ₀ = y ₀ and Ax ₁ = y ₀, by the uniqueness of y ₀). As we already observed, Ax ₀ − y is orthogonal to Ran (A), i.e., ${\langle } (Ax_0 -y), \; Au{\rangle }=0$ for all u ∈ ℂⁿ. This means that ${\Big\langle } \big(A^*Ax_0-A^*y\big), \; u{\Big\rangle }=0$ for all u ∈ ℂⁿ and, therefore, x ₀ satisfies

$$ A^*Ax_0 \; = \; A^*y\;. \label{eq:m5645gouinOIUyuyguy} $$

(43)

Now, relation (18) shows us that $x_0= A^+ y$ satisfies (43) because $A^*AA^+ y \stackrel{\text{(18)}}{=}A^*y$. Therefore, we conclude that the set of all x ∈ ℂⁿ satisfying the condition of ||Ax − y|| being minimal is composed by all vectors of the form $A^+y + x_1$ with x ₁ ∈ Ker (A). By Proposition 3.3, x ₁ is of the form $x_1 = \big(\mathbb{1}_n -A^+A\big)z$ for some z ∈ ℂⁿ, completing the proof.□

Notes

Tikhonov’s argument in [1] is actually more complicated, since he does not consider the regularized equation $\big(K^*K + \mu\mathbb{1} \big)u_{\mu}=K^*f$, but a more general version where the identity operator $\mathbb{1}$ is replaced by a Sturm–Liouville operator.

References

A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method, Sov. Math., Dokl. 4, 1035–1038 (1963). English translation of Dokl. Akad. Nauk USSR 151, 501–504 (1963)
Google Scholar
A.N. Tikhonov, V.A. Arsenin, Solution of Ill-Posed Problems (Winston, Washington, DC, 1977)
D.K. Sodickson, C.A. McKenzie, A generalized approach to parallel magnetic resonance imaging, Med. Phys. 28, 1629 (2001). doi:10.1118/1.1386778
Article Google Scholar
R. Van De Walle, H.H. Barrett, K.J. Myers, M.I. Aitbach, B. Desplanques, A. F. Gmitro, J. Cornelis, I. Lemahieu, Reconstruction of MR images from data acquired on a general nonregular grid by pseudoinverse calculation, IEEE Trans. Med. Imag. 19 1160–1167 (2000). doi:10.1109/42.897806
Article Google Scholar
H. Ammari, An Introduction to Mathematics of Emerging Biomedical Imaging. Mathématiques et Applications, vol. 62 (Springer, New York, 2008). ISBN 978-3-540-79552-0
Google Scholar
G. Lohmann, K. Müller, V. Bosch, H. Mentzel, S. Hessler, L. Chen, S. Zysset, D.Y. von Cramon. Lipsia—a new software system for the evaluation of functional magnetic resonance images of the human brain, Comput. Med. Imaging Graph. 25, 449–457 (2001). doi:10.1016/S0895-6111(01)00008-8
Article Google Scholar
X. Li, D. Coyle, L. Maguire, T.M. McGinnity, D.R. Watson, H. Benali, A least angle regression method for fMRI activation detection in phase-encoded experimental designs, NeuroImage 52, 1390–1400 (2010). doi:10.1016/j.neuroimage.2010.05.017
Article Google Scholar
V. Stefanescu, D.A. Stoichescu, C. Faye, A. Florescu, Modeling and simulation of cylindrical PET 2D using direct inversion method, in 16th International Symposium for Design and Technology in Electronic Packaging (SIITME), 2010 (IEEE, Piscataway, 2010), pp. 107–112. doi:10.1109/SIITME.2010.5653649
M. Bertero, P. Boccacci, Introduction to Inverse Problems in Imaging (Taylor & Francis, Bristol, 1998). ISBN: 978-0750304351
J.-Z. Wang, S.J. Williamson, L. Kaufman, Magnetic source imaging based on the minimum-norm least-squares inverse, Brain Topogr. 5(4), 365–371 (1993). doi:10.1007/BF01128692
Article Google Scholar
N.G. Gençer, S.J. Williamson, Magnetic source images of human brain functions, Behav. Res. Meth. 29(1), 78–83 (1997). doi:10.3758/BF03200570
Article Google Scholar
J.-Z. Wang, S.J. Willianson, L. Kaufman, Spatio-temporal model of neural activity of the human brain based on the MNLS inverse, in Biomagnetism: Fundamental Research and Clinical Applications. Studies in Applied Electromagnetics and Mechanics, Vol. 7. International Conference on Biomagnetism 1995 (University of Vienna), ed. by C. Baumgartner, L. Deecke, G. Stroink, S.J. Williamson (Elsevier, Amsterdam, 1995). ISBN: 978-9051992335
M.T. Page, S. Custódio, R.J. Archuleta, J.M. Carlson, Constraining earthquake source inversions with GPS data 1: Resolution based removal of artifacts, J. Geophys. Res. 114, B01314 (2009). doi:10.1029/2007JB005449
Article Google Scholar
S. Atzori, A. Antonioli, Optimal fault resolution in geodetic inversion of coseismic data, Geophys. J. Int. 185, 529–538 (2011). doi:10.1111/j.1365-246X.2011.04955.x
Article ADS Google Scholar
R.D. Pascual-Marqui, Standardized low resolution brain electromagnetic tomography (sLORETA): technical details, Methods Find. Exp. Clin. Pharmacol. 24D, 5–12 (2002)
Google Scholar
C. Hennig, N. Christlieb, Validating visual clusters in large datasets: fixed point clusters of spectral features, Comput. Stat. Data Anal. 40, 723–739 (2002). doi:10.1016/S0167-9473(02)00077-4
Article MathSciNet MATH Google Scholar
L. Bedini, D. Herranz, E. Salerno, C. Baccigalupi, E.E. Kuruoǧlu, A. Tonazzini, Separation of correlated astrophysical sources using multiple-lag data covariance matrices, EURASIP J. Appl. Signal Process. 15, 2400–2412 (2005)
ADS Google Scholar
T.V. Ricci, J.E. Steiner, R.B. Menezes, NGC 7097: the active galactic nucleus and its mirror, revealed by principal component analysis tomography, Astrophys. J. 734, L10 (2011). doi:10.1088/2041-8205/734/1/L10
Article ADS Google Scholar
J.E. Steiner, R.B. Menezes, T.V. Ricci, A.S. Oliveira, PCA tomography: how to extract information from data cubes, in Monthly Notices of the Royal Astronomical Society, vol. 395 (2009), p. 64
M.E. Wall, A. Rechtsteiner, L.M. Rocha, Singular values decomposition and principal component analysis, in A Practical Approach to Microarray Data Analysis, ed. by D.P. Berrar, W. Dubitzky, M. Granzow (Kluwer, Norwell, 2003), pp. 91–109. LANL LA-UR-02-4001
O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, R.B. Altman, Missing value estimation methods for DNA microarrays, Bioinformatics 17(6), 520–525 (2001)
Article Google Scholar
H. Andrews, C. Patterson III, Singular values decomposition (SVD) image coding, IEEE Trans. Commun. 24(4), 425–432 (1976)
Article Google Scholar
S. Chountasis, V.N. Katsikis, D. Pappas, Applications of the Moore–Penrose inverse in digital image restoration. Math. Probl. Eng. 2009, Article ID 170724, 12 pp. (2009). doi:10.1155/2009/170724
S. Chountasis, V.N. Katsikis, D. Pappas, Digital image reconstruction in the spectral domain utilizing the Moore–Penrose inverse. Math. Probl. Eng. 2010, Article ID 750352, 14 pp. (2010). doi:10.1155/2010/750352
C.W. Groetsch, Integral equations of the first kind, inverse problems and regularization: a crash course, J. Phys.: Conf. Ser. 73, 012001 (2007). doi:10.1088/1742-6596/73/1/012001
Article ADS Google Scholar
E.H. Moore, On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 26, 394–395 (1920)
Google Scholar
R. Penrose, A generalized inverse for matrices, Proc. Camb. Philos. Soc. 51, 406–413 (1955)
Article MathSciNet ADS MATH Google Scholar
R. Penrose, On best approximate solution of linear matrix equations, Proc. Camb. Philos. Soc. 52, 17–19 (1956)
Article MathSciNet ADS MATH Google Scholar
M. Reed, B. Simon, Methods of Modern Mathematical Physics. Vol. 1: Functional Analysis (Academic, New York, 1972–1979)
O. Bratteli, D.W. Robinson, Operator Algebras and Quantum Statistical Mechanics I (Springer, New York, )
Google Scholar

Download references

Acknowledgments

We are specially grateful to Prof. Nestor Caticha for providing us with some references on applications of the Moore–Penrose pseudoinverse and of the singular values decomposition. This work was supported in part by the Brazilian Agencies CNPq and FAPESP.

Author information

Authors and Affiliations

Instituto de Física, Universidade de São Paulo, C.P. 66318, 05314-970, São Paulo, São Paulo, Brazil
João Carlos Alves Barata & Mahir Saleh Hussein

Authors

João Carlos Alves Barata
View author publications
You can also search for this author in PubMed Google Scholar
Mahir Saleh Hussein
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to João Carlos Alves Barata.

Appendices

Appendix 1: A Brief Review of Hilbert Space Theory and Linear Algebra

In this appendix, we collect the more important definitions and results on linear algebra and Hilbert space theory that we used in the main part of this paper. For the benefit of the reader, especially of students, we provide all results with proofs.

1.1 Hilbert Spaces: Basic Definitions

A scalar product in a complex vector space ${\mathcal V}$ is a function ${\mathcal V}\times{\mathcal V}\to{\mathbb{C}}$, denoted here by ${\langle }\cdot,\;\cdot{\rangle }$, such that the following conditions are satisfied: (1) For all $u\in{\mathcal V}$, one has ${\langle } u, \; u{\rangle }\geq 0$ and ${\langle } u, \; u{\rangle }=0$ if and only if u = 0; (2) for all $u, \; v_1, \; v_2\in{\mathcal V}$ and all α ₁, α ₂ ∈ ℂ, one has ${\big\langle } u, \; (\alpha_1v_1+\alpha_2v_2){\big\rangle } = \alpha_1 {\langle } u, \; v_1{\rangle }+\alpha_2 {\langle } u, \; v_2{\rangle } $ and ${\big\langle } (\alpha_1v_1+\alpha_2v_2), \; u{\big\rangle } = \overline{\alpha_1} {\langle } v_1, \; u {\rangle }+\overline{\alpha_2} {\langle } v_2, \; u {\rangle } $; and (3) $\overline{{\langle } u, \; v{\rangle }}={\langle } v, \; u{\rangle }$ for all $u, \; v\in{\mathcal V}$.

The norm associated to the scalar product ${\langle }\cdot,\;\cdot{\rangle }$ is defined by $\|u\|\mathrel{\mathop:}=\sqrt{{\langle } u,\;u{\rangle }}$, for all $u\in{\mathcal V}$. As one easily verifies using the defining properties of a scalar product, this norm satisfies the so-called parallelogram identity: For all $ a, \; b \in {\mathcal V}$, one has

$$ \| a + b\|^2 + \|a-b\|^2 \; = \; 2 \|a\|^2 + 2 \|b\|^2 \; . \label{sldkjfbnhv} $$

(44)

We say that a sequence $\{v_n\in{\mathcal V}, \;n\in{\mathbb{N}} \}$ of vectors in ${\mathcal V}$ converges to an element $v\in{\mathcal V}$ if for all ε > 0 there exists a N(ε) ∈ ℕ such that ||v _n − v|| ≤ ε for all n ≥ N(ε). In this case, we write v = lim_n→∞ v _n. A sequence $\{v_n\in{\mathcal V}, \;n\in{\mathbb{N}} \}$ of vectors in ${\mathcal V}$ is said to be a Cauchy sequence if for all ε > 0 there exists a N(ε) ∈ ℕ such that ||v _n − v _m|| ≤ ε for all n, m ∈ ℕ such that n ≥ N(ε) and m ≥ N(ε). A complex vector space ${\mathcal V}$ is said to be a Hilbert space if it has a scalar product and if it is complete, i.e., if all Cauchy sequences in ${\mathcal V}$ converge to an element of ${\mathcal V}$.

1.2 The Best Approximant Theorem

A subset A of a Hilbert space ${\mathcal H}$ is said to be convex if for all u, v ∈ A and all μ ∈ [0, 1] one has μu + (1 − μ)v ∈ A. A subset A of a Hilbert space ${\mathcal H}$ is said to be closed if every sequence {u _n ∈ A, n ∈ ℕ } of elements of A that converges in ${\mathcal H}$ converges to an element of A. The following theorem is of fundamental importance in the theory of Hilbert spaces.

Theorem A.1

(Best Approximant Theorem) Let A be a convex and closed subset of a Hilbert space ${\mathcal H}$ . Then, for all $ x\in {\mathcal H}$ there exists a unique y ∈ A such that || x − y|| equals the smallest possible distance between x and A, that means, $ \| x - y\| = \inf_{y'\in A} \big\|x - y'\big\| $ .

Proof

Let D ≥ 0 be defined by $ D = \inf_{y'\in A} \|x - y'\| $. For each n ∈ ℕ, let us choose a vector y _n ∈ A with the property that $ \| x - y_n \|^2 < D^2 + \frac{1}{n}$. Such a choice is always possible, by the definition of the infimum of a set of real numbers bounded from below.

Let us now prove that the sequence y _n, n ∈ ℕ is a Cauchy sequence in $ {\mathcal H}$. Let us take a = x − y _n and b = x − y _m in the parallelogram identity (44). Then, $ \big\|2 x - (y_m + y_n)\big\|^2 + \|y_m - y_n\|^2 = 2 \|x - y_n\|^2 + 2 \|x-y_m\|^2 $. This can be written as $ \|y_m - y_n\|^2 = 2 \|x - y_n\|^2 + 2 \|x-y_m\|^2 - 4 \big\|x - (y_m + y_n)/2\big\|^2 $. Now, using the fact that $\| x - y_n \|^2 < D^2 + \frac{1}{n}$ for each n ∈ ℕ, we get

$$ \begin{array}{rll} \|y_m - y_n\|^2 &\leq& 4D^2 + 2\left( \frac{1}{n}+\frac{1}{m}\right)\\ &&-\, 4 \big\|x - (y_m + y_n)/2\big\|^2 . \end{array} $$

Now (y _m + y _n)/2 ∈ A, since left-hand side is a convex linear combination of elements of the convex set A. Hence, by the definition of D, $ \big\|x - (y_m + y_n)/2\big\|^2 \geq D^2 $. Therefore, we have

$$ \begin{array}{rll} \|y_m - y_n\|^2 & \leq& 4D^2 + 2\left( \frac{1}{n}+\frac{1}{m}\right) - 4D^2\\ &=& 2\left( \frac{1}{n}+\frac{1}{m}\right) . \end{array} $$

The right-hand side can be made arbitrarily small, by taking both m and n large enough, proving that {y _n}_{n ∈ ℕ} is a Cauchy sequence. Since A is a closed subspace of the complete space $ {\mathcal H}$, the sequence {y _n}_{n ∈ ℕ} converges to y ∈ A.

Now we prove that ||x − y || = D. In fact, for all n ∈ ℕ, one has

$$ \begin{array}{rll} \|x-y \| &=& \big\| (x - y_n) - (y - y_n) \big\|\\ & \leq& \| x - y_n \|+\| y - y_n \|\\ &<& \sqrt{D^2 + \frac{1}{n}} + \| y - y_n \| . \end{array} $$

Taking n→ ∞ and using the fact that y _n converges to y, we conclude that ||x − y || ≤ D . On the other hand, ||x − y || ≥ D by the definition of D and we must have ||x − y || = D.

At last, it remains to prove the uniqueness of y. Assume that there is another y′ ∈ A such that $ \big\|x-y' \big\| = D$. Using again the parallelogram identity (44), but now with a = x − y and b = x − y′, we get

$$ \begin{array}{rll} \big\|2 x &-& (y + y')\big\|^2 + \big\|y - y'\big\|^2\\ &=& 2 \big\|x - y\big\|^2 + 2 \big\|x-y'\big\|^2 = 4D^2 , \end{array} $$

which means that

$$ \begin{array}{rll} \big\|y - y'\big\|^2 &=& 4D^2 - \big\|2 x - (y + y')\big\|^2\\ &=& 4D^2 - 4 \Big\| x - \big(y + y'\big)/2 \Big\|^2 . \end{array} $$

Since (y + y′)/2 ∈ A (for A being convex), it follows that $ \big\| x - (y + y')/2\big\|^2 \geq D^2 $ and, hence, $ \big\|y - y'\big\|^2 \leq 0 $, proving that y = y′.□

1.3 Orthogonal Complements

If E is a subset of a Hilbert space $ {\mathcal H}$, we define its orthogonal complement E ^⊥ as the set of vectors in $ {\mathcal H}$ orthogonal to all vectors in E: $ E^\perp = \Big\{ y \in {\mathcal H} \big| \;\; {\langle } y, \; x {\rangle } =0 \mbox{ for all } x\in E \Big\} $. The following proposition is of fundamental importance:

Proposition A.2

The orthogonal complement E ^⊥ of a subset E of a Hilbert space $ {\mathcal H}$ is a closed linear subspace of ${\mathcal H}$ .

Proof

If x, y ∈ E ^⊥, then, for any α, β ∈ ℂ, one has $ {\langle } \alpha x + \beta y, \; z {\rangle } = \overline{\alpha} {\langle } x, \; z{\rangle } + \overline{\beta} {\langle } y, \; z{\rangle } = 0 $ for any z ∈ E, showing that αx + βy ∈ E ^⊥. Hence, E ^⊥ is a linear subspace of ${\mathcal H}$. If x _n is a sequence in E ^⊥ converging to $ x \in {\mathcal H}$, then, for all z ∈ E, one has $\displaystyle {\langle } x, \; z {\rangle } = \left\langle \lim_{n\to \infty}x_n, \; z \right\rangle = \lim_{n\to \infty} {\langle } x_n, \; z {\rangle } = 0 $, since ${\langle } x_n, \; z {\rangle } = 0 $ for all n. Hence, x ∈ E ^⊥, showing that E ^⊥ is closed. Above, in the first equality, we used the continuity of the scalar product.□

1.4 The Orthogonal Decomposition Theorem

Theorem A.3

(Orthogonal Decomposition Theorem) Let $ {\mathcal M} $ be a closed and linear (and therefore convex) subspace of a Hilbert space $ {\mathcal H}$ . Then every $x \in {\mathcal H}$ can be written in a unique way in the form x = y + z , with $ y\in {\mathcal M}$ and $ z \in {\mathcal M}^\perp$ . The vector y is such that $ \|x -y\| = \inf_{y'\in {\mathcal M}}\big\|x - y'\big\|$ , i.e., is the best approximant of x in $ {\mathcal M} $ .

Proof

Let x be an arbitrary element of ${\mathcal H}$. Since ${\mathcal M}$ is convex and closed, let us evoke Theorem 6.2 and choose y as the (unique) element of $ {\mathcal M}$ such that $ \|x -y\| = \inf_{y'\in {\mathcal M}}\big\|x - y'\big\|$. Defining $z \mathrel{\mathop:}= x-y$, all we have to do is to show that $ z\in {\mathcal M}^\perp$ and to show uniqueness of y and z. Let us first prove that $z\in {\mathcal M}^\perp$. By the definition of y, one has $ \| x - y\|^2 \leq \big\| x - y -\lambda y'\big\|^2 $ for all λ ∈ ℂ and all $y'\in {\mathcal M}$. By the definition of z, it follows that $ \| z\|^2 \; \leq \; \big\| z -\lambda y'\big\|^2 $ for all λ ∈ ℂ. Writing the right-hand side as $ {\big\langle } z -\lambda y' , \; z -\lambda y'{\big\rangle } $, we get $ \| z\|^2 \leq \| z\|^2 - 2 \mathrm{Re}\big(\lambda {\langle } z, \; y'{\rangle } \big) + |\lambda|^2 \big\|y'\big\|^2 $. Hence,

$$ 2 \mathrm{Re}\big(\lambda {\langle } z, \; y'{\rangle } \big) \; \leq \; |\lambda|^2 \big\|y'\big\|^2 \; .\label{sdrvbwert} $$

(45)

Now, write $\big\langle z, \; y'\big\rangle = \big|{\langle } z, \; y'{\rangle }\big|e^{i\alpha} $, for some α ∈ ℝ. Since (45) holds for all λ ∈ ℂ, we can pick λ in the form λ = te ^− iα , t > 0, and (45) becomes $ 2 t \big|{\langle } z, \; y'{\rangle }\big| \leq t^2 \big\|y'\big\|^2 $. Hence, $ \big|{\langle } z, \; y'{\rangle }\big| \leq \frac{t}{2} \big\|y'\big\|^2 $, for all t > 0. But this is only possible if the left-hand side vanishes: $\big|{\langle } z, \; y'{\rangle }\big| =0 $. Since y′ is an arbitrary element of $ {\mathcal M}$, this shows that $ z\in {\mathcal M}^\perp$.

To prove uniqueness, assume that x = y′ + z′ with $ y'\in {\mathcal M}$ and $ z'\in {\mathcal M}^\perp$. We would have y − y′ = z′ − z. But $y - y'\in {\mathcal M}$ and $z'- z \in {\mathcal M}^\perp$. Hence, both belong to ${\mathcal M}\cap {\mathcal M}^\perp=\{0\}$, showing that y − y′ = z′ − z = 0.□

1.5 The Spectrum of a Matrix

The spectrum of a matrix A ∈ Mat (ℂ, n), denoted by σ(A), is the set of all λ ∈ ℂ for which the matrix $\lambda\mathbb{1} -A $ has no inverse.

The characteristic polynomial of a matrix A ∈ Mat (ℂ, n) is defined by $ p_A(z) \mathrel{\mathop:}= \det (z\mathbb{1} -A) $. It is clearly a polynomial of degree n on z. It follows readily from these definitions that σ(A) coincides with the roots of p _A. The elements of σ(A) are said to be the eigenvalues of A. If λ is an eigenvalue of A, the matrix $A -\lambda\mathbb{1}$ has no inverse, and therefore, there exists at least one non-vanishing vector v ∈ ℂⁿ such that $(A -\lambda\mathbb{1})v=0$, that means, such that Av = λv. Such a vector is said to be an eigenvector of A with eigenvalue λ. The set of all eigenvectors associated to a given eigenvalues (plus the null vector) is a linear subspace of ℂⁿ, as one easily sees.

The multiplicity of a root λ of the characteristic polynomial of a matrix A ∈ Mat (ℂ, n) is called the algebraic multiplicity of the eigenvalue λ. The dimension of the subspace generated by the eigenvectors associated to the eigenvalues λ is called the geometric multiplicity of the eigenvalue λ. The algebraic multiplicity of an eigenvalue is always larger than or equal to its geometric multiplicity.

1.6 The Neighborhood of Singular Matrices

Proposition A.4

Let A ∈ Mat (ℂ, n) be arbitrary and let B ∈ Mat (ℂ, n) be a non-singular matrix. Then, there exist constants M ₁ and M ₂ (depending on A and B ) with 0 < M ₁ ≤ M ₂ such that A + μB is invertible for all μ ∈ ℂ with 0 < |μ| < M ₁ and for all μ ∈ ℂ with |μ| > M ₂ .

Proof

Since B has an inverse, we may write $A+\mu B=\left(\mu\mathbb{1} + AB^{-1}\right)B$. Hence, A + μB has an inverse if and only if $\mu\mathbb{1} + AB^{-1}$ is non-singular.

Let C ≡ − AB ^− 1 and let {λ ₁ , ..., λ _n} ⊂ ℂ be the n not necessarily distinct roots of the characteristic polynomial p _C of C. If all roots vanish, we take M ₁ = M ₂ > 0, arbitrary. Otherwise, let us define $M_1\mathrel{\mathop:}=\min\{ |\lambda_k|, \; \lambda_k\neq 0\}$ and $M_2\mathrel{\mathop:}=\max\{ |\lambda_k|, \; k=1, \, \ldots , \, n\}$. Then, the sets {μ ∈ ℂ| 0 < |μ| < M ₁} and {μ ∈ ℂ| |μ| > M ₂} do not contain roots of p _C, and therefore, for μ in these sets, the matrix $\mu\mathbb{1}-C=\mu\mathbb{1} + AB^{-1}$ is non-singular. □

1.7 Similar Matrices

Two matrices A ∈ Mat (ℂ, n) and B ∈ Mat (ℂ, n) are said to be similar if there is a non-singular matrix P ∈ Mat (ℂ, n) such that P ^− 1 AP = B. One has the following elementary fact:

Proposition A.5

Let A and B ∈ Mat (ℂ, n) be two similar matrices. Then their characteristic polynomials coincide, p _A = p _B , and therefore, their spectra also coincide, σ(A) = σ(B), as well as the geometric multiplicities of their eigenvalues

Proof

Let P ∈ Mat (ℂ, n) be such that P ^− 1 AP = B. Then, $ p_A(z) = \det(z \mathbb{1} -A) = \det\Big(P^{-1}(z \mathbb{1} -A)P\Big) = \det\big(z \mathbb{1} -P^{-1}AP\big) = \det(z\mathbb{1} -B) = p_B(z) $, for al z ∈ ℂ.□

1.8 The Spectrum of Products of Matrices

The next proposition contains a non-evident consequence of Propositions A.4 and A.5.

Proposition A.6

Let A, B ∈ Mat (ℂ, n). Then, the characteristic polynomials of the matrices AB and BA coincide: p _AB = p _BA . Therefore, their spectra also coincide, σ(AB) = σ(BA), as well as the geometric multiplicities of their eigenvalues.

Proof

If A or B (or both) are non-singular, then AB and BA are similar. In fact, in the first case, we can write AB = A(BA)A ^− 1 and in the second one has AB = B ^− 1(BA)B. In both cases, the claim follows from Proposition A.5. Let us now consider the case where neither A nor B are invertible. We know from Proposition A.4 that there exists M > 0 such that $A+\mu \mathbb{1}$ is non-singular for all μ ∈ ℂ with 0 < |μ| < M. Hence, for such values of μ, we have by the argument above that $p_{(A+\mu\mathbb{1})B}=p_{B(A+\mu\mathbb{1})}$. Now the coefficient of the polynomials $p_{(A+\mu\mathbb{1})B}$ and $p_{B(A+\mu\mathbb{1})}$ are polynomials in μ and, therefore, are continuous. Hence, the equality $p_{(A+\mu\mathbb{1})B}=p_{B(A+\mu\mathbb{1})}$ remains valid by taking the limit μ→0, leading to p _AB = p _BA. □

Proposition A.6 can be extended to products of non-square matrices:

Proposition A.7

Let A ∈ Mat (ℂ, m, n) and B ∈ Mat (ℂ, n, m). Clearly, AB ∈ Mat (ℂ, m) and BA ∈ Mat (ℂ, n). Then, one has $x^n p_{AB}(x)=x^m p_{BA}(x)$ . Therefore, σ(AB) ∖ {0} = σ(BA) ∖ {0}, i.e., the set of non-zero eigenvalues of AB coincide with the set of non-zero eigenvalues of BA .

Proof

Consider the two (m + n)×(m + n) matrices defined by

$$ A' \; \mathrel{\mathop:}= \; \begin{pmatrix} A & \mathbb{0}_{m, \; m} \\ \mathbb{0}_{n, \; n} & \mathbb{0}_{n, \; m} \end{pmatrix} \quad\mbox{ and }\quad B' \; \mathrel{\mathop:}= \; \begin{pmatrix} B & \mathbb{0}_{n, \; n} \\ \mathbb{0}_{m, \; m} & \mathbb{0}_{m, \; n} \end{pmatrix} \; \; . $$

See (8). It is easy to see that

$$ A'B' \!=\! \begin{pmatrix} AB & \mathbb{0}_{m, \; n} \\ \mathbb{0}_{n, \; m} & \mathbb{0}_{n, \; n} \end{pmatrix} \;\;\mbox{ and\;that }\;\; B'A' \!=\! \begin{pmatrix} BA & \mathbb{0}_{n, \; m} \\ \mathbb{0}_{m, \; n} & \mathbb{0}_{m, \; m} \end{pmatrix} . $$

From this, it is now easy to see that $p_{A'B'}(x)=x^n p_{AB}(x)$ and that $p_{B'A'}(x)=x^m p_{BA}(x)$. By Proposition A.6, one has p _A′B′(x) = p _B′A′(x), completing the proof.□

1.9 Diagonalizable Matrices

A matrix A ∈ Mat (ℂ, n) is said to be diagonalizable if it is similar to a diagonal matrix. Hence, A ∈ Mat (ℂ, n) is diagonalizable if there exists a non-singular matrix A ∈ Mat (ℂ, n) such that P ^− 1 AP is diagonal. The next theorem gives a necessary and sufficient condition for a matrix to be diagonalizable:

Theorem A.8

A matrix A ∈ Mat (ℂ, n) is diagonalizable if and only if it has n linearly independent eigenvectors, i.e., if the subspace generated by its eigenvectors is n-dimensional.

Proof

Let us assume that A has n linearly independent eigenvectors {v ¹, ..., v ⁿ }, whose eigenvalues are { d ₁ , ..., d _n }, respectively. Let P ∈ Mat (ℂ, n) be defined by $ P \; = \; {{\Big[\!\! \Big[}} v^1, \; \ldots , \; v^n {{\Big]\!\! \Big]}} $. By (12), one has

$$ A P \; = \; {{\Big[\!\! \Big[}} Av^1, \; \ldots , \; Av^n {{\Big]\!\! \Big]}} \; = \; {{\Big[\!\! \Big[}} d_1 v^1, \; \ldots , \; d_n v^n {{\Big]\!\! \Big]}} $$

and by (13), one has $ {{\Big[\!\! \Big[}} d_1 v^1, \; \ldots , \; d_n v^n {{\Big]\!\! \Big]}} = PD $. Therefore, AP = PD. Since the columns of P are linearly independent, P is non-singular and one has P ^− 1 AP = D, showing that A is diagonalizable.

Let us now assume that A is diagonalizable and that there is a non-singular P ∈ Mat (ℂ, n) such that $ P^{-1} A P \; = \; D \; = \; \mathrm{diag}\,\big(d_1, \; \ldots , \; d_n\big) $. It is evident that the vectors of the canonical base (10) are eigenvectors of D, with D e _a = d _a e _a . Therefore, v _a = P e _a are eigenvectors of A, since $ Av_a = A P{\mathbf e}_a = PD {\mathbf e}_a = P\big(d_a{\mathbf e}_a\big) = d_a P{\mathbf e}_a = d_a v_a $. To show that these vectors v _a are linearly independent, assume that there are complex numbers α ₁, ..., α _n such that α ₁ v ₁ + ⋯ + α _n v _n = 0 . Multiplying by P ^− 1 from the left, we get α ₁ e ₁ + ⋯ + α _n e _n = 0 , implying α ₁ = ⋯ = α _n = 0, since the elements e _a of the canonical basis are linearly independent.□

The spectral theorem is one of the fundamental results of functional analysis, and its version for bounded and unbounded self-adjoint operators in Hilbert spaces is of fundamental importance for the so-called probabilistic interpretation of quantum mechanics. Here we prove its simplest version for square matrices.

Theorem A.9

(Spectral Theorem for Matrices) A matrix A ∈ Mat (ℂ, n) is diagonalizable if and only if there exist r ∈ ℕ, 1 ≤ r ≤ n, scalars α ₁, ..., α _r ∈ ℂ, and non-zero distinct projectors E ₁ , ..., E _r ∈ Mat (ℂ, n) such that

$$ A \; = \; \sum\limits_{a=1}^r \alpha_a E_a \; , \label{decomposicaoespectral} $$

(46)

and

$$ \mathbb{1} \; = \; \sum\limits_{a=1}^r E_a \;, \label{OIUHYPoOprimo} $$

(47)

with E _i E _j = δ _{i, j} E _j . The numbers α ₁, ..., α _r are the distinct eigenvalues of A.

The projectors E _a in (46) are called the spectral projectors of A. The decomposition (46) is called spectral decomposition of A. In Proposition A.11, we will show how the spectral projections E _a can be expressed in terms of polynomials in A. In Proposition A.12, we establish the uniqueness of the spectral decomposition of a diagonalizable matrix.

Proof of Theorem A.9

If A ∈ Mat (ℂ, n) is diagonalizable, there exists P ∈ Mat (ℂ, n) such that $P^{-1}AP = D = \mathrm{diag}\, (\lambda_1 , \; \ldots , \; \lambda_n)$, where λ ₁ , ..., λ _n are the eigenvalues of A. Let us denote by { α ₁ , ..., α _r}, 1 ≤ r ≤ n, the set of all distinct eigenvalues of A.

One can clearly write $ D = \sum_{a=1}^{r} \alpha_a K_a $, where K _a ∈ Mat (ℂ, n) are diagonal matrices having 0 or 1 as diagonal elements, so that

$$ (K_a)_{ij} \; = \; \left\{ \begin{array}{ll} 1 \; , & \mbox{if }\; i=j \;\mbox{ and }\; (D)_{ii} = \alpha_a \; ,\\ 0 \; , & \mbox{if }\; i=j \;\mbox{ and }\; (D)_{ii} \neq \alpha_a \; ,\\ 0 \; , & \mbox{if }\; i\neq j \; . \end{array} \right. $$

Hence, (K _a)_ij = 1 if i = j and (D)_ii = α _a and (K _a)_ij = 0 otherwise. It is trivial to see that

$$ \sum\limits_{a=1}^{r} K_a \; = \; \mathbb{1} \label{PoiOujjuJ} $$

(48)

and that

$$ K_a K_b \; = \; \delta_{a, \, b}\: K_a . \label{seifybvOIni} $$

(49)

Since A = PDP ^− 1, one has $ A = \sum_{a=1}^{r} \alpha_a E_a \ $, where $E_a \mathrel{\mathop:}= PK_a P^{-1}$. It is easy to prove from (48) that $ \mathbb{1} = \sum_{a=1}^r E_a $ and it is easy to prove from (48) that E _i E _j = δ _{i, j} E _j .

Reciprocally, let us now assume that A has a representation like (46), with the E _a’s having the above-mentioned properties. Let us first notice that for any vector x and for k ∈ {1, ..., r}, one has by (46)

$$ AE_k x \; = \; \sum\limits_{j=1}^{r}\alpha_j E_j E_k x \; = \; \alpha_k E_k x \; . $$

Hence, E _k x is either zero or is an eigenvalue of A. Therefore, the subspace ${\mathcal S}$ generated by all vectors $\{E_k x, \; x\in{\mathbb{C}}^n, \; k=1, \; \ldots , \; r\}$ is a subspace of the space ${\mathcal A}$ generated by all eigenvectors of A. However, from (47), one has, for all x ∈ ℂⁿ, $ x = \mathbb{1} x = \sum_{k=1}^{r} E_k x $, and this reveals that ${\mathbb{C}}^n={\mathcal S}\subset {\mathcal A}$. Hence, $ {\mathcal A}={\mathbb{C}}^n$, and by Theorem A.8, A is diagonalizable. □

The spectral theorem has the following corollary, known as the functional calculus:

Theorem A.10

(Functional Calculus) Let A ∈ Mat (ℂ, n) be diagonalizable and let $\displaystyle A = \sum\limits_{a=1}^r \alpha_a E_a $ be its spectral decomposition. Then, for any polynomial p , one has $\displaystyle p(A) = \sum\limits_{a=1}^r p(\alpha_a) E_a $ .

Proof

By the properties of the spectral projectors E _a, one sees easily that $\displaystyle A^2 = \sum\limits_{a, \; b=1}^r \alpha_a\alpha_b E_a E_b = \sum\limits_{a, \; b=1}^r \alpha_a\alpha_b \delta_{a, \; b }E_a = \sum\limits_{a=1}^r (\alpha_a)^2 E_a $. It is then easy to prove by induction that $\displaystyle A^m = \sum\limits_{a=1}^r (\alpha_a)^m E_a $, for all m ∈ ℕ₀ (by adopting the convention that $A^0=\mathbb{1}$, the case m = 0 is simply (47)). From this, the rest of the proof is elementary.□

One can also easily show that for a non-singular diagonalizable matrix A ∈ Mat (ℂ, n) , one has

$$ A^{-1} \; = \; \sum\limits_{a=1}^r \frac{1}{\alpha_a} E_a \; . \label{eq:representacaoespectraldeinvesa} $$

(50)

1.10 Getting the Spectral Projections

One of the most useful consequences of the functional calculus is an explicit formula for the spectral projections of a diagonalizable matrix A in terms of a polynomial on A.

Proposition A.11

Let A ∈ Mat (ℂ, n) be non-zero and diagonalizable and let A = α ₁ E ₁ + ⋯ + α _r E _r be its spectral decomposition. Let the polynomials p _j , j = 1, ..., r, be defined by

$$ p_j(x) \; \mathrel{\mathop:}= \; \prod\limits_{{l=1}\atop {l\neq j}}^r \left(\frac{x-\alpha_l}{\alpha_j-\alpha_l}\right) \; . \label{eq:definicaodospolinoomiospj-ytivuytivTYRyr} $$

(51)

Then,

$$ E_j \; = \; p_j(A) \; = \; \left( \prod\limits_{{k=1}\atop {k\neq j}}^r \frac{1}{\alpha_j-\alpha_k} \right) \prod\limits_{{l=1}\atop {l\neq j}}^r \Big( A-\alpha_l\mathbb{1} \Big) \label{eq:obtendoosprojetoresespectrais-YTvTRVytrvUYtv} $$

(52)

for all j = 1, ..., r.

Proof

By the definition of the polynomials p _j, it is evident that p _j(α _k) = δ _{j, k}. Hence, by Theorem A.10, $ p_j(A) = \sum\limits_{k=1}^r p_j(\alpha_k) E_k = E_j $.□

1.11 Uniqueness of the Spectral Decomposition

Proposition A.12

The spectral decomposition of a diagonalizable matrix A ∈ Mat (ℂ, n) is unique.

Proof

Let $\displaystyle A= \sum\limits_{k=1}^r \alpha_k E_k$ be the spectral decomposition of A as described in Theorem A.9, where α _k, k = 1, ..., r, with 1 ≤ r ≤ n are the distinct eigenvalues of A, Let $\displaystyle A= \sum\limits_{k=1}^{s} \beta_k F_k$ be a second representation of A, where the β _k’s are distinct and where the F _k’s are non-vanishing and satisfy F _j F _l = δ _{j, l} F _l and $\displaystyle \mathbb{1}= \sum\limits_{k=1}^{s} F_k$. For a vector x ≠ 0, it holds $x=\sum\limits_{k=1}^{s} F_kx$, so that not all vectors F _k x vanish. Let $F_{k_0}x\neq 0$. One has $AF_{k_0}x=\sum\limits_{k=1}^{s} \beta_k F_kF_{k_0}x=\beta_{k_0}F_{k_0}x$. This shows that $\beta_{k_0}$ is one of the eigenvalues of A and, hence, { β ₁, ..., β _s} ⊂ { α ₁, ..., α _r} and we must have s ≤ r. Let us order both sets such that β _k = α _k for all 1 ≤ k ≤ s. Hence,

$$ A \; = \; \sum\limits_{k=1}^r \alpha_k E_k \; = \; \sum\limits_{k=1}^{s} \alpha_k F_k \; . \label{eq:JboiuyBIuytvIUtyuiyt} $$

(53)

Now, consider the polynomials p _j, j = 1, ..., r, defined in (51), for which p _j(α _j) = 1 and p _j(α _k) = 0 for all k ≠ j. By the functional calculus, it follows from (53) that, for 1 ≤ j ≤ s,

$$ p_j(A) = \underbrace{\sum\limits_{k=1}^r p_j(\alpha_k) E_k}_{= E_j} = \underbrace{\sum\limits_{k=1}^{s} p_j(\alpha_k) F_k}_{= F_j} , \;\;\; \therefore \;\;\; E_j = F_j . $$

(The equality $p_j(A)=\sum_{k=1}^{s} p_j(\alpha_k) F_k$ follows from the fact that the E _k’s and the F _k’s satisfy the same algebraic relations and, hence, the functional calculus also holds for the representation of A in terms of the F _k’s). Since $\displaystyle \mathbb{1} = \sum\limits_{k=1}^r E_k = \sum\limits_{k=1}^{s} E_k $ and E _j = F _j for all 1 ≤ j ≤ s, one has $\displaystyle\sum\limits_{k=s+ 1}^r E_k=\mathbb{0}$. Hence, multiplying by E _l, with s + 1 ≤ l ≤ r, it follows that $E_l = \mathbb{0}$ for all s + 1 ≤ l ≤ r. This is only possible if r = s, since the E _k’s are non-vanishing. This completes the proof.□

1.12 Self-Adjointness and Diagonalizability

Let A ∈ Mat (ℂ, m, n). The adjoint matrix A* ∈ Mat (ℂ, n, m) is defined as the unique matrix for which the equality

$$ {\big\langle } u, \; Av {\big\rangle } \; = \; {\big\langle } A^*u, \; v {\big\rangle } $$

holds for all u ∈ ℂ^m and all v ∈ ℂⁿ. If A _ij are the matrix elements of A in the canonical basis, it is an easy exercise to show that $\big(A^*\big)_{ij}=\overline{A_{ji}}$, where the bar denotes complex conjugation. It is trivial to prove that the following properties hold: 1. $\big(\alpha_1 A_1 + \alpha_2 A_2\big)^*=\overline{\alpha_1}A_1^*+ \overline{\alpha_2}A_2^*$ for all A ₁, A ₂ ∈ Mat (ℂ, m, n) and all α ₁, α ₂ ∈ ℂ; 2. $\big(AB\big)^*=B^*A^*$ for all A ∈ Mat (ℂ, m, n) and B ∈ Mat (ℂ, p, m); 3. $A^{**}\equiv\big(A^*\big)^*=A$ for all A ∈ Mat (ℂ, m, n).

A square matrix A ∈ Mat (ℂ, n) is said to be self-adjoint if A = A*. A square matrix U ∈ Mat (ℂ, n) is said to be unitary if U ^− 1 = U*. Self-adjoint matrices have real eigenvalues. In fact, if A is self-adjoint, λ ∈ σ(A), and v ∈ ℂⁿ is a normalized (i.e., ||v|| = 1) eigenvector of A with eigenvalue λ, then $\lambda=\lambda{\langle } v, \; v{\rangle }= {\langle } v, \; \lambda v{\rangle }={\langle } v, \; Av{\rangle }= {\langle } Av, \; v{\rangle }={\langle } \lambda v, \; v{\rangle }= \overline{\lambda}{\langle } v, \; v{\rangle }= \overline{\lambda} $, showing that λ ∈ ℝ.

1.13 Projectors and Orthogonal Projectors

A matrix E ∈ Mat (ℂ, n) is said to be a projector if E ² = E, and it is said to be a orthogonal projector if it is a self-adjoint projector: E ² = E and E* = E. An important example of an orthogonal projector is the following. Let v ∈ ℂⁿ be such that ||v|| = 1 and define,

$$ P_v u \; \mathrel{\mathop:}= \; {\langle } v, \; u {\rangle }\: v\; , \label{eq:defdePv-IUyboBYtibuyg} $$

(54)

for each u ∈ ℂⁿ. In the canonical basis, the matrix elements of P _v are given by $\big(P_v\big)_{ij}=\overline{v_j}v_i$, where the v _k’s are the components of v. One has,

$$ P_v^2 u = {\langle } v, \; u {\rangle }\: P_v v = {\langle } v, \; u {\rangle }\: {\langle } v, \; v {\rangle }\: v = {\langle } v, \; u {\rangle }\: v = P_v u \; , $$

proving that $P_v^2 = P_v$. On the other hand, for any a, b ∈ ℂⁿ, we get

$$ \begin{array}{rll} {\langle } a , \; P_v b {\rangle } &=& {\big\langle } a, \; {\langle } v, \; b {\rangle }\: v {\big\rangle } = {\langle } v, \; b {\rangle }\: {\langle } a, \; v {\rangle }\\ &=& \left\langle\overline{{\langle } a, \; v {\rangle }}\: v, \; b \right\rangle = {\big\langle } {\langle } v, \; a {\rangle }\: v, \; b {\big\rangle } = {\langle } P_v a , \; b {\rangle } , \end{array} $$

showing that $P_v^* = P_v$. Another relevant fact is that if v ₁ and v ₂ are orthogonal unit vectors, i.e., ${\langle } v_i, \; v_j {\rangle } =\delta_{ij}$, then $P_{v_1} P_{v_2} = P_{v_2} P_{v_1} =0$. In fact, for any a ∈ ℂⁿ, one has

$$ \begin{array}{rll} P_{v_1} \big(P_{v_2} a\big) &=& P_{v_1} \big({\langle } v_2, \; a {\rangle }\: v_2\big) = {\langle } v_2, \; a {\rangle }\: P_{v_1} v_2\\ &=& {\langle } v_2, \; a {\rangle }\: {\langle } v_1, \; v_2 {\rangle }\: v_1 = 0 . \end{array} $$

This shows that $P_{v_1} P_{v_2}=0$ and, since both are self-adjoint, one has also $P_{v_2} P_{v_1}=0$.

1.14 Spectral Theorem for Self-Adjoint Matrices

The following theorem establishes a fundamental fact about self-adjoint matrices.

Theorem A.13

(Spectral Theorem for Self-Adjoint Matrices) If A ∈ Mat (ℂ, n) is self-adjoint, one can find an orthonormal set {v ₁, ..., v _n} of eigenvectors of A with real eigenvalues λ ₁, ..., λ _n , respectively, and one has the spectral representation

$$ \label{eq:decomposicaoespectralparaautoadjuntos0} A \; = \; \lambda_1 P_{v_1} + \cdots + \lambda_n P_{v_n} \; , $$

(55)

where $P_{v_k}u\mathrel{\mathop:}= {\langle } v_k, \; u{\rangle } v_k$ satisfy $P_{v_k}^*=P_{v_k}$ and $P_{v_j}P_{v_k}=\delta_{jk}P_{v_k}$ and one has $\sum\limits_{k=1}^nP_{v_k}=\mathbb{1}$ .

Therefore, if A ∈ Mat (ℂ, n) is a self-adjoint matrix, it is diagonalizable. Moreover, there is a unitary P ∈ Mat (ℂ, n) such that $P^{-1}AP=\mathrm{diag}\,\big(\lambda_1, \; \ldots , \; \lambda_n\big)$ .

Proof

Let λ ₁ ∈ ℝ be an eigenvalue of A and let v ₁ be a corresponding eigenvector. Let us choose ||v ₁|| = 1. Define A ₁ ∈ Mat (ℂ, n) by $ A_1 \mathrel{\mathop:}= A - \lambda_1 P_{v_1} $. Since both A and $P_{v_1}$ are self-adjoint, so is A ₁, since λ ₁ is real.

It is easy to check that A ₁ v ₁ = 0. Moreover, $[v_1]^\perp$, the subspace orthogonal to v ₁, is invariant under the action of A ₁. In fact, for $w \in [v_1]^\perp$, one has $ {\langle } A_1 w , \; v_1 {\rangle } = {\langle } w , \;A_1 v_1 {\rangle } = 0 $, showing that $A_1 w\in[v_1]^\perp$.

It is therefore obvious that the restriction of A ₁ to $[v_1]^\perp$ is also a self-adjoint operator. Let $v_2 \in [v_1]^\perp$ be an eigenvector of this self-adjoint restriction with eigenvalues λ ₂ and choose ||v ₂|| = 1. Define

$$ A_2 \; \mathrel{\mathop:}= \; A_1 - \lambda_2 P_{v_2} \; = \; A - \lambda_1 P_{v_1} - \lambda_2 P_{v_2} \; . $$

Since λ ₂ is real, A ₂ is self-adjoint. Moreover, A ₂ annihilates the vectors in the subspace [v ₁ , v ₂] and keeps $[v_1 , \; v_2]^\perp$ invariant. In fact, $A_2 v_1=A v_1 - \lambda_1 P_{v_1} v_1 -$ $\lambda_2 P_{v_2}v_1\!=\! \lambda_1 v_1 \!-\! \lambda_1 v_1 \!-\! \lambda_2 {\langle} v_2 , \; v_1{\rangle } v_2 \!=\! 0$, since ${\langle } v_2 , \; v_1{\rangle } =$ 0. Analogously, $A_2 v_2 \!=\! A_1 v_2 \!-\! \lambda_2 P_{v_2} v_2 \!=\! \lambda_2 v_2 \!-\! \lambda_2 v_2 =$ 0. Finally, for any α, β ∈ ℂ and $w \in [v_1 , \; v_2]^\perp$, one has $ {\big\langle } A_2 w , \; (\alpha v_1 + \beta v_2){\big\rangle } = {\big\langle } w , \; A_2 (\alpha v_1 + \beta v_2){\big\rangle } = 0 $, showing that $[v_1 , \; v_2]^\perp$ is invariant by the action of A ₂.

Proceeding inductively, we find a set of vectors {v ₁ , ..., v _n}, with ||v _k|| = 1 and with v _a ∈ [v ₁ , ..., $ \; v_{a-1}]^\perp $ for 2 ≤ a ≤ n, and a set of real numbers { λ ₁ , ..., λ _n } such that $ A_n = A - \lambda_1 P_{v_1} - \cdots -\lambda_n P_{v_n} $ annihilates the subspace [v ₁ , ..., v _n]. But, since {v ₁ , ..., v _n} is an orthonormal set, one must have $ [v_1 , \; \ldots , \; v_{n}]={\mathbb{C}}^n$, and therefore, we must have A _n = 0, meaning that

$$ \label{decomposicaoespectralparaautoadjuntos} A \; = \; \lambda_1 P_{v_1} + \cdots + \lambda_n P_{v_n} \; . $$

(56)

One has $P_{v_k}P_{v_l} = \delta_{k, \, l}\: P_{v_k}$, since ${\langle } v_k, \; v_l{\rangle }=\delta_{kl}$. Moreover, since {v ₁ , ..., v _n} is a basis in ℂⁿ, one has

$$ \label{osiPIUYBfO} x \; = \; \alpha_1 v_1 + \cdots + \alpha_n v_n $$

(57)

for all x ∈ ℂⁿ. By taking the scalar product with v _k, one gets that $\alpha_k={\langle } v_k, \; x{\rangle }$ and, hence,

$$ \begin{array}{rll} x &=& {\langle } v_1 , \; x {\rangle } v_1 + \cdots + {\langle } v_n , \; x {\rangle } v_n\\ &=& P_{v_1}x + \cdots + P_{v_n}x = \left( P_{v_1} + \cdots + P_{v_n} \right)x . \end{array} $$

Since x was an arbitrary element of ℂⁿ, we established that $ P_{v_1} + \cdots + P_{v_n} = \mathbb{1} $.

It follows from (56) that Av _a = λ _a v _a . Hence, each v _k is an eigenvector of A with eigenvalue λ _k. By Theorem A.8, A is diagonalizable: There is P ∈ Mat (ℂ, n) such that $P^{-1}AP=\mathrm{diag}\,\big(\lambda_1, \; \ldots , \; \lambda_n\big)$. As we saw in the proof of Theorem A.8, we can choose $ P = {{\Big[\!\! \Big[}} v^1, \; \ldots , \; v^n {{\Big]\!\! \Big]}} $. This is, however, a unitary matrix, since, as one easily checks,

$$ P^* P \; = \; \begin{pmatrix} {\langle } v_1 , \; v_1 {\rangle } & \cdots & {\langle } v_1 , \; v_n {\rangle } \\ \vdots & \ddots & \vdots \\ {\langle } v_n , \; v_1 {\rangle } & \cdots & {\langle } v_n , \; v_n {\rangle } \end{pmatrix} \; = \; \mathbb{1} \;, $$

because ${\langle } v_a , \; v_b {\rangle } = \delta_{a, \, b}$.□

1.15 The Polar Decomposition Theorem for Square Matrices

It is well known that every complex number z can be written in the so-called polar form z = |z|e ^iθ, where |z| ≥ 0 and θ ∈ [ − π, π), with $|z|\mathrel{\mathop:}=\sqrt{\overline{z}z}$ and $e^{i\theta} \mathrel{\mathop:}= z|z|^{-1}$. There is an analogous claim for square matrices A ∈ Mat (ℂ, n). This is the content of the so-called Polar Decomposition Theorem, Theorem A.14, below. Let us make some preliminary remarks.

Let A ∈ Mat (ℂ, n) and consider A* A. One has (A* A)* = A* A ^** = A* A , and hence, A* A is self-adjoint. By Theorem A.13, we can find an orthonormal set {v _k, k = 1, ..., n} of eigenvectors of A* A, with eigenvalues d _k, k = 1, ..., n, respectively, with the matrix

$$ P\; \mathrel{\mathop:}= \; {{\Big[\!\! \Big[}} v_1, \; \ldots , \; v_n {{\Big]\!\! \Big]}} \label{eq:iuybuytrTYFGJhddd} $$

(58)

being unitary and such that $P^*\big(A^* A\big)P = D\mathrel{\mathop:}=$ diag (d ₁, ..., d _n). One has d _k ≥ 0 since $ d_k \|v_k\|^2 =$ $d_k {\langle } v_k, \; v_k{\rangle } \!=\! {\langle } v_k, \; Bv_k{\rangle } \!= {\langle } v_k, \; A^*Av_k{\rangle } = {\langle } Av_k, \; Av_k{\rangle } =$ $ \|Av_k\|^2 $ and, hence, $d_k=\|Av_k\|^2/\|v_k\|^2 \geq 0$.

Define $ D^{1/2} \mathrel{\mathop:}= \mathrm{diag}\, \left(\sqrt{d_1}, \; \ldots , \; \sqrt{d_n}\right)$. One has $\left(D^{1/2}\right)^2 = D$. Moreover, $\left(D^{1/2}\right)^*= D^{1/2}$, since every $\sqrt{d_k}$ is real. The non-negative numbers $\sqrt{d_1}, \; \ldots , \; \sqrt{d_n}$ are called the singular values of A.

Define the matrix $\sqrt{A^*A}\in\mathrm{Mat}\,({\mathbb{C}}, \; n)$ by

$$ \sqrt{A^*A} \; \mathrel{\mathop:}= \; PD^{1/2}P^* \; . \label{eq:definicaoderaizdeAstarA} $$

(59)

The matrix $\sqrt{A^*A}$ is self-adjoint, since $\left(\sqrt{A^*A}\right)^*=\left(PD^{1/2}P^* \right)^* =PD^{1/2}P^*= \sqrt{A^*A}$. Notice that $\left(\sqrt{A^*A}\right)^2 = P(D^{1/2})^2P^* =PDP^* =A^*A$. From this, it follows that

$$ \begin{array}{rll} \left(\det\left(\sqrt{A^*A}\right)\right)^2 &=& \det\left(\left(\sqrt{A^*A}\right)^2\right)\\ &=& \det(A^*A) = \det(A^*)\det(A)\\ &=& \overline{\det(A)}\det(A) = |\det(A)|^2 . \end{array} $$

Hence, $\det\left(\sqrt{A^*A}\right)=|\det(A)|$ and, therefore, $\sqrt{A^*A}$ is invertible if and only if A is invertible.

We will denote $\sqrt{A^*A}$ by |A|, following the analogy suggested by the complex numbers. Now we can formulate the polar decomposition theorem for matrices:

Theorem A.14

(Polar Decomposition Theorem) If A ∈ Mat (ℂ, n), there is a unitary matrix U ∈ Mat (ℂ, n) such that

$$ A \; = \; U\sqrt{A^*A} \; . \label{eq:adecomposicaopolardematrizes} $$

(60)

If A is non-singular, then U is unique. The representation (60) is called the polar representation of A .

Proof

As above, let d _k, k = 1, ..., n be the eigenvalues of A*A and let v _k, k = 1, ..., n be a corresponding orthonormal set of eigenvalues: $A^*Av_k=d_kv_k$ and ${\langle } v_k, \; v_l{\rangle }=\delta_{k\, l}$ (see Theorem A.13).

Since d _k ≥ 0 we order them in a way that d _k > 0 for all k = 1, ..., r and d _k = 0 for all k = r + 1, ..., n. Hence,

$$ Av_k \; = \; 0 \mbox{ for all } k\; = \; r+1, \; \ldots , \; n \; , \label{eq:sidpfoD3susudygs} $$

(61)

because $A^*Av_k=0$ implies $0={\langle } v_k, \; A^*Av_k{\rangle }={\langle } Av_k,$ $ \; Av_k{\rangle } =\|Av_k\|^2$.

For k = 1, ..., r, let w _k be the vectors defined by

$$ w_k \; \mathrel{\mathop:}= \; \frac{1}{\sqrt{d_k}}Av_k \; , \quad k\; =\; 1, \, \ldots , \, r \; . \label{eq:nkhbfcrtrdtfRR} $$

(62)

It is easy to see that

$$ \begin{array}{rll} {\langle } w_k, \; w_l{\rangle } &=& \frac{1}{\sqrt{d_kd_l}}{\langle } A v_k, \; Av_l{\rangle } = \frac{1}{\sqrt{d_kd_l}}{\langle } A^*A v_k, \; v_l{\rangle }\\ &=& \frac{d_k}{\sqrt{d_kd_l}}{\langle } v_k, \; v_l{\rangle } = \frac{d_k}{\sqrt{d_kd_l}}\delta_{k\, l} = \delta_{k\, l} , \end{array} $$

for all k, l = 1, ..., r. Hence, {w _k, k = 1, ..., r} is an orthonormal set. We can add to this set an additional orthonormal set {w _k, k = r + 1, ..., n} in the orthogonal complement of the set generated by {w _k, k = 1, ..., r} and get a new orthonormal set {w _k, k = 1, ..., n} as a basis for ℂⁿ.

Let P ∈ Mat (ℂ, n) be defined as in (58) and let Q and U be elements of Mat (ℂ, n) defined by

$$ Q \; \mathrel{\mathop:}= \; {{\Big[\!\! \Big[}} w_1, \; \ldots , \; w_n {{\Big]\!\! \Big]}} \; , \qquad U \; \mathrel{\mathop:}= \; QP^* \;. $$

Since $\big\{v_k,\; k =1, \, \ldots , \, n\big\}$ and $\big\{w_k,\; k =1, \, \ldots , \, n\big\}$ are orthonormal sets, one easily sees that P and Q are unitary and, therefore, U is also unitary.

It is easy to show that AP = QD ^1/2, where $D^{1/2}\mathrel{\mathop:}=\mathrm{diag}\,\left(\sqrt{d_1}, \; \ldots , \; \sqrt{d_n}\right)$, In fact,

$$ \begin{array}{rll} AP &\stackrel{\text{(58)}}{=}& A{{\Big[\!\! \Big[}} v_1, \; \ldots , \; v_n {{\Big]\!\! \Big]}} \stackrel{\text{(12)}}{=} {{\Big[\!\! \Big[}} Av_1, \; \ldots , \; Av_n {{\Big]\!\! \Big]}}\\ &\stackrel{\text{(61)}}{=}& {{\Big[\!\! \Big[}} Av_1, \; \ldots , \; Av_r\; 0, \; \ldots , \;0 {{\Big]\!\! \Big]}} \\ &\stackrel{\text{(62)}}{=}& {{\Big[\!\! \Big[}} \sqrt{d_1}w_1, \; \ldots , \; \sqrt{d_r}w_r\; 0, \; \ldots , \;0 {{\Big]\!\! \Big]}}\\ &\stackrel{\text{(13)}}{=}& {{\Big[\!\! \Big[}} w_1, \; \ldots , \; w_n {{\Big]\!\! \Big]}} D^{1/2} = QD^{1/2} . \end{array} $$

Now, since AP = QD ^1/2, it follows that $A=QD^{1/2}P^* = UPD^{1/2}P^* \stackrel{\text{(59)}}{=}U\sqrt{A^*A}$, as we wanted to show.

To show that U is uniquely determined if A is invertible, assume that there exists U′ such that $A=U\sqrt{A^*A}=U'\sqrt{A^*A}$. We noticed above that $\sqrt{A^*A}$ is invertible if and only if A is invertible. Hence, if A is invertible, the equality $U\sqrt{A^*A}=U'\sqrt{A^*A}$ implies U = U′. If A is not invertible, the arbitrariness of U lies in the choice of the orthonormal set {w _k, k = r + 1, ..., n}.□

The following corollary is elementary:

Theorem A.15

Let A ∈ Mat (ℂ, n). Then, there exists a unitary matrix V ∈ Mat (ℂ, n) such that

$$ A \; = \; \sqrt{AA^*}\, V \; . \label{eq:adecomposicaopolardematrizes-II} $$

(63)

If A is non-singular, then V is unique.

Proof

For the matrix A*, relation (60) says that $A^*=U_0\sqrt{(A^*)^*A^*}=U_0\sqrt{AA^*}$ for some unitary U ₀. Since $\sqrt{AA^*}$ is self-adjoint, one has $A=\sqrt{AA^*}\, U_0^*$. Identifying $V\equiv U_0^*$, we get what we wanted.□

The polar decomposition theorem can be generalized to bounded or closed unbounded operators acting on Hilbert spaces and even to C*-algebras. See, e.g., [29] and [30].

1.16 Singular Values Decomposition

The polar decomposition theorem, Theorem A.14, has a corollary of particular interest.

Theorem A.16

(Singular Values Decomposition Theorem) Let A ∈ Mat (ℂ, n). Then, there exist unitary matrices V and W ∈ Mat (ℂ, n) such that

$$ A \; = \; VSW^* \; , \label{eq:adecomposicaoemvaloressingularesdematrizes} $$

(64)

where S ∈ Mat (ℂ, n) is a diagonal matrix whose diagonal elements are the singular values of A , i.e., the eigenvalues of $\sqrt{A^*A}$ .

Proof

The claim follows immediately from (60) and from (59) by taking V = UP, W = P, and S = D ^1/2.□

Theorem A.16 can be generalized to rectangular matrices. In what follows, m, n ∈ ℕ and we will use definitions (4), (8), and relation (9) that allow to injectively map rectangular matrices into certain square matrices.

Theorem A.17

(Singular Values Decomposition Theorem. General Form) Let A ∈ Mat (ℂ, m, n). Then, there exist unitary matrices V and W ∈ Mat (ℂ, m + n) such that

$$ A \; = \; I_{m, \; m+n}VSW^*J_{ m+n , \; n} \; , \label{eq:adecomposicaoemvaloressingularesdematrizes-GERAL} $$

(65)

where S ∈ Mat (ℂ, m + n) is a diagonal matrix whose diagonal elements are the singular values of A′ (defined in (8)), i.e., are the eigenvalues of $\sqrt{(A')^*A'}$ .

Proof

The matrix A′ ∈ Mat (ℂ, m + n) is a square matrix, and by Theorem A.16, it can be written in terms of a singular value decomposition A′ = VSW* with V and W ∈ Mat (ℂ, m + n), both unitary, and S ∈ Mat (ℂ, m + n) being a diagonal matrix whose diagonal elements are the singular values of A′. Therefore, (65) follows from (9). □

Appendix 2: Singular Values Decomposition and Existence of the Moore–Penrose Pseudoinverse

We will now present a second proof of the existence of the Moore–Penrose pseudoinverse of a general matrix A ∈ Mat (ℂ, m , n) making use of Theorem A.16. We first consider square matrices and later consider general rectangular matrices.

2.1 The Moore–Penrose Pseudoinverse of Square Matrices

Let us first consider square diagonal matrices. If D ∈ Mat (ℂ, n) is a diagonal matrix, its Moore–Penrose pseudoinverse is given by D ⁺ ∈ Mat (ℂ, n), where, for i = 1, ..., n, one has

$$ \big(D^+\big)_{ii} \; = \; \left\{ \begin{array}{cl} \big(D_{ii}\big)^{-1} \;, & \mbox{if } D_{ii}\neq 0\;, \\ 0 \; , & \mbox{if } D_{ii} = 0\;. \end{array} \right. $$

It is elementary to check that DD ⁺ D = D, D ⁺ DD ⁺ = D ⁺ and that DD ⁺ and D ⁺ D are self-adjoint. Actually, DD ⁺ = D ⁺ D, a diagonal matrix whose diagonal elements are either 0 or 1:

$$ \big(DD^+\big)_{ii}=\big(D^+D\big)_{ii} \; = \; \left\{ \begin{array}{cl} 1 \; , & \mbox{if } D_{ii}\neq 0\;, \\ 0 \; , & \mbox{if } D_{ii} = 0\;. \end{array} \right. $$

Now, let A ∈ Mat (ℂ, n) and let A = VSW* be its singular values decomposition (Theorem A.16). We claim that its Moore–Penrose pseudoinverse A ⁺ is given by

$$ A^+ \; = \; WS^+V^* \; . $$

(66)

In fact, $AA^+A=\big(VSW^*\big)\big(WS^+V^*\big)\big(VSW^*\big)=VSS^+$ SW ⁺ = VSW* = A and

$$ \begin{array}{rll} A^+AA^+ &=& \big(WS^+V^*\big)\big(VSW^*\big)\big(WS^+V^*\big)\\ &=& WS^+SS^+V^* = WS^+V^* = A^+ . \end{array} $$

Moreover, $AA^+=\big(VSW^*\big)\big(WS^+V^*\big)=V\big(SS^+\big)V^*$ is self-adjoint, since SS ⁺ is a diagonal matrix with diagonal elements 0 or 1. Analogously, $A^+A=\big(WS^+V^*\big)$ $\big(VSW^*\big)=W\big(S^+S\big)W^*$ is self-adjoint.

2.2 The Moore–Penrose Pseudoinverse of Rectangular Matrices

Consider now A ∈ Mat (ℂ, m, n) and let A′ ∈ Mat (ℂ, m + n) be the (m + n)×(m + n) defined in (8). Since A′ is a square matrix, it has, by the comments above, a unique Moore–Penrose pseudoinverse ( A′)⁺ satisfying

1.
$A'\big(A'\big)^+A'=A'$,
2.
$\big(A'\big)^+A'\big(A'\big)^+=\big(A'\big)^+$,
3.
$A'\big(A'\big)^+$ and $\big(A'\big)^+A'$ are self-adjoint.

In what follows, we will show that A ⁺ ∈ Mat (ℂ, n, m) is given by

$$ A^+ \; \mathrel{\mathop:}= \; I_{n,\; m+n } \big(A'\big)^+ J_{m+n, \; m} \;, \label{eq:apseudoinversaparamatrizesretangulares} $$

(67)

with the definitions (4) and (5), i.e.,

$$ A^+ \; = \; I_{ n, \; m+n} \Big( J_{m+n, \; m} A I_{ n, \; m+n } \Big)^+ J_{m+n, \; m} \;. \label{eq:apseudoinversaparamatrizesretangulares-2} $$

(68)

The starting point is the existence of the Moore–Penrose pseudoinverse of the square matrix A′. Relation $A'\big(A'\big)^+A'=A'$ means, using definition (8), that $ J_{ m+n, \; m} A \Big[ I_{ n, \; m+n } \big(A'\big)^+ J_{m+n, \; m}\Big] A I_{ n, \; m+n } = J_{m+n, \; m} A I_{ n, \; m+n } $ and from (6) and (7) it follows, by multiplying to the left by I _{m, m + n} and to the right by J _{m + n , n}, that A A ⁺ A = A , one of the relations we wanted to prove.

Relation $\big(A'\big)^+A'\big(A'\big)^+=\big(A'\big)^+$ means, using definition (8), that $ \big(A'\big)^+ J_{m+n, \; m} A I_{ n, \; m+n } \big(A'\big)^+ = \big(A'\big)^+ $. Multiplying to the left by I _{n, m + n} and to the right by J _{m + n, m}, this establishes that A ⁺ A A ⁺ = A ⁺ .

Since $A'\big(A'\big)^+$ is self-adjoint, it follows from the definition (8) that $J_{ m+n, \; m} A I_{ n, \; m+n } \big(A'\big)^+ $ is self- adjoint, i.e.,

$$ J_{m+n, \; m} A I_{ n, \; m+n } \big(A'\big)^+ \; = \; \left( A I_{ n, \; m+n } \big(A'\big)^+\right)^* I_{m, \; m+n} \; . $$

Therefore, multiplying to left by I _{m, m + n} and to the right by J _{m + n, m}, it follows from (6) that

$$ \begin{array}{rll} A I_{ n, \; m+n } \big(A'\big)^+ J_{m+n, \; m} &=& I_{m, \; m+n}\Big( A I_{ n, \; m+n } (A')^+\Big)^*\\ &=& \left( A I_{ n, \; m+n } \big(A'\big)^+ J_{m+n, \; m} \right)^* , \end{array} $$

proving that A A ⁺ is self-adjoint

Finally, since $\big(A'\big)^+A'$ is self-adjoint, it follows from definition (8) that $\big(A'\big)^+ J_{m+n, \; m} A I_{ n, \; m+n }$ is self- adjoint, i.e.,

$$ \big(A'\big)^+ J_{m+n, \; m} A I_{ n, \; m+n } \; = \; J_{ m+n , \; n} \left( \big(A'\big)^+ J_{ m+n, \; m} A \right)^* \;. $$

Hence, multiplying to the left by I _{n, m + n} and to the right by J _{m + n , n}, if follows from (7) that

$$ \begin{array}{rll} I_{n, \; m+n}(A')^+ J_{m+n, \; m} A &=& \left( \big(A'\big)^+ J_{m+n, \; m} A \right)^*J_{ m+n , \; n}\\ &=& \left( I_{ n, \; m+n} \big(A'\big)^+ J_{m+n, \; m} A \right)^* , \end{array} $$

establishing that A ⁺ A is self-adjoint. This proves that A ⁺ given in (67) is the Moore–Penrose pseudoinverse of A.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Barata, J.C.A., Hussein, M.S. The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory. Braz J Phys 42, 146–165 (2012). https://doi.org/10.1007/s13538-011-0052-z

Download citation

Received: 27 October 2011
Published: 16 December 2011
Issue Date: April 2012
DOI: https://doi.org/10.1007/s13538-011-0052-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The Moore–Penrose Pseudoinverse: A Tutorial Review of the Theory

Abstract

Similar content being viewed by others

Riemannian inexact Newton method for structured inverse eigenvalue and singular value problems

Definitions and Motivations

Methods for Computing Weighted Pseudoinverses and Weighted Normal Pseudosolutions with Singular Weights

1 Introduction, Motivation, and Notation

1.1 Some Applications of the Moore–Penrose Pseudoinverse

1.2 Organization

1.3 Notation and Preliminary Definitions

2 The Moore–Penrose Pseudoinverse: Definition and First Properties

2.1 Generalized Inverses, or Pseudoinverses

2.2 Defining the Moore–Penrose Pseudoinverse

2.3 The Uniqueness of the Moore–Penrose Pseudoinverse

2.4 Existence of the Moore–Penrose Pseudoinverse

2.5 Computing the Moore–Penrose Pseudoinverse in Some Special Cases

3 Further Properties of the Moore–Penrose Pseudoinverse

Proposition 3.1

Proof of Proposition 3.1

Proposition 3.2

Proof of Proposition 3.2

3.1 The Kernel and the Range of a Matrix and the Moore–Penrose Pseudoinverse

Proposition 3.3

Proof

4 Tikhonov’s Regularization and Existence Theorem for the Moore–Penrose Pseudoinverse

Lemma 4.1

Proof of Lemma 4.1

Lemma 4.2

Proof

Theorem 4.3

Proof

5 The Moore–Penrose Pseudoinverse and the Spectral Theorem

Theorem 5.1

Proof of Theorem 5.1

6 The Moore–Penrose Pseudoinverse and Least Squares

Theorem 6.1

Proof of Theorem 6.1

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix 1: A Brief Review of Hilbert Space Theory and Linear Algebra

1.1 Hilbert Spaces: Basic Definitions

1.2 The Best Approximant Theorem

Theorem A.1

Proof

1.3 Orthogonal Complements

Proposition A.2

Proof

1.4 The Orthogonal Decomposition Theorem

Theorem A.3

Proof

1.5 The Spectrum of a Matrix

1.6 The Neighborhood of Singular Matrices

Proposition A.4

Proof

1.7 Similar Matrices

Proposition A.5

Proof

1.8 The Spectrum of Products of Matrices

Proposition A.6

Proof

Proposition A.7

Proof

1.9 Diagonalizable Matrices

Theorem A.8

Proof

Theorem A.9

Proof of Theorem A.9

Theorem A.10

Proof

1.10 Getting the Spectral Projections

Proposition A.11

Proof

1.11 Uniqueness of the Spectral Decomposition

Proposition A.12

Proof