1 Introduction, Motivation, and Notation

In this paper, we present a self-contained review of some of the basic results on the so-called Moore–Penrose pseudoinverse of matrices, a concept that generalizes the usual notion of inverse of a square matrix and that is also applicable to singular square matrices or even to non-square matrices. This notion is particularly useful in dealing with certain linear least squares problems, as we shall discuss in Section 6, i.e., problems where one searches for an optimal approximation for solutions of linear equations like Ax = y, where A is a given m×n matrix, y is a given column vector with m components, and the unknown x, a column vector with n components, is the sought solution. In many situations, a solution does not exist or is not unique, but one may still ask for a vector x such that the norm of the difference Ax − y is the smallest possible (in the sense of least squares).

Let us be a little more specific. Let A ∈ Mat (ℂ, m, n) (the set of all complex m×n matrices) and y ∈ ℂm be given and consider the problem of finding x ∈ ℂn satisfying the linear equation

$$ Ax \; = \; y. \label{eq:LuybiUYtvtRytfr-00} $$
(1)

If m = n and A has an inverse, the (unique) solution is, evidently, x = A  − 1 y. In other cases the solution may not exist or may not be unique. We can, however, consider the alternative problem of finding the set of all vectors x′ ∈ ℂn such that the Euclidean norm ||Ax′ − y|| reaches its least possible value. This set is called the minimizing set of the linear problem (1). Such vectors x′ ∈ ℂn would be the best approximants for the solution of (1) in terms of the Euclidean norm, i.e., in terms of “least squares.” As we will show in Theorem 6.1, the Moore–Penrose pseudoinverse provides this set of vectors x′ that minimize ||Ax′ − y||: It is the set

$$ \Big\{ A^+y +\big( \mathbb{1}_n -A^+A\big) z, \; z\in{\mathbb{C}}^n \Big\} \; , \label{eq:conjuntominimizante-deoptimizacaolinear-00} $$
(2)

where A  +  ∈ Mat (ℂ, n, m) denotes the Moore–Penrose pseudoinverse of A. An important question for applications is to find a general and algorithmically simple way to compute A  + . The most common approach uses the singular values decomposition and is described in “Appendix 2.” Using the spectral theorem and Tikhonov’s regularization method, we show that A  +  can be computed by the algorithmically simpler formula

$$ \begin{array}{rll}\label{eq:reppreudoespectraldeAMAIS-2-00} A^+ \; &=& \; \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s} \;\; \frac{1}{\beta_b} \left( \prod\limits_{{l=1}\atop {l\neq b}}^s \big(\beta_b-\beta_l\big)^{-1} \right) \\&&\times \left[ \prod\limits_{{l=1}\atop {l\neq b}}^s \Big( A^*A-\beta_l\mathbb{1}_n \Big) \right] A^* \; , \end{array} $$
(3)

where A* denotes the adjoint matrix of A and β_k, k = 1, ..., s, are the distinct eigenvalues of A*A. See Theorem 5.1 for a more detailed statement. One of the aims of this paper is to present a proof of (3) by combining the spectral theorem with a regularization procedure due to Tikhonov [1, 2].

1.1 Some Applications of the Moore–Penrose Pseudoinverse

Problems involving the determination of the minimizing set of (1) are always present when the number of unknowns exceeds the number of values provided by measurements. Such situations occur in many areas of Applied Mathematics, Physics, and Engineering, ranging from imaging methods, such as magnetic resonance imaging (MRI) [3–5], functional MRI [6, 7], positron emission tomography [8, 9], and magnetic source imaging [10–12], to seismic inversion problems [13, 14].

The Moore–Penrose pseudoinverse and/or the singular values decomposition (SVD) of matrices (discussed in “Appendix 2”) are also employed in data analysis, as in the treatment of electroencephalographic source localization [15] and in the so-called principal component analysis. Applications of this last method to astronomical data analysis can be found in [16–19], and applications to gene expression analysis can be found in [20, 21]. Image compression algorithms using SVD are known at least since [22], and digital image restoration using the Moore–Penrose pseudoinverse has been studied in [23, 24].

Problems involving the determination of the minimizing set of (1) also occur, for instance, in certain numerical algorithms for finding solutions of linear Fredholm integral equations of the first kind:

$$ \int_a^b k(x, \; y)\, u(y)\;{\rm d}y \; = \; f(x) \;, $$

where − ∞ < a < b < ∞ and where k and f are given functions. See Section 4 for a further discussion of this issue. For an introductory account on integral equations, rich in examples and historical remarks, see [25].

Even this short list of applications should convince a student of Physics or Applied Mathematics of the relevance of the Moore–Penrose pseudoinverse and related subjects, and our main objective is to provide a self-contained introduction to the required theory.

1.2 Organization

In Section 2, we present the definition of the Moore–Penrose pseudoinverse and obtain its basic properties. In Section 3, we further develop the theory of the Moore–Penrose pseudoinverses. In Section 4, we describe Tikhonov’s regularization method for the computation of Moore–Penrose pseudoinverses and present a first proof of existence. Section 5 collects the previous results and derives expression (3), based on the spectral theorem, for the computation of Moore–Penrose pseudoinverses. This expression is algorithmically simpler than the usual method based on the singular values decomposition (described in “Appendix 2”). In Section 6, we show the relevance of the Moore–Penrose pseudoinverse for the solution of linear least squares problems, its main motivation. In “Appendix 1,” we present a self-contained review of the results on linear algebra and Hilbert space theory, not all of them elementary, that we need in the main part of this paper. In “Appendix 2,” we approach the existence problem of the Moore–Penrose pseudoinverse by using the usual singular values decomposition method.

1.3 Notation and Preliminary Definitions

In the following, we fix the notation used throughout the paper. We denote by ℂn the vector space of all n-tuples of complex numbers: \({\mathbb{C}}^n\mathrel{\mathop:}=\left\{ \left(\begin{smallmatrix}z_1\\ \vdots\\z_n\end{smallmatrix}\right), \; \mbox{with } z_k\in{\mathbb{C}} \mbox{ for all } k=1, \; \ldots, \; n\right\}\). We denote the usual scalar product in ℂn by \({\langle }\cdot, \; \cdot{\rangle }_{\mathbb{C}}\) or simply by \({\langle }\cdot, \; \cdot{\rangle }\), where for \(z= \left(\begin{smallmatrix}z_1\\ \vdots\\z_n\end{smallmatrix}\right)\in{\mathbb{C}}^n\) and \(w= \left(\begin{smallmatrix}w_1\\ \vdots\\w_n\end{smallmatrix}\right)\in{\mathbb{C}}^n\), we have

$$ {\langle } z, \; w {\rangle }_{\mathbb{C}} \; \equiv \; {\langle } z, \; w {\rangle } \; \mathrel{\mathop:}= \; \sum\limits_{k=1}^n\overline{z_k}w_k \;. $$

Note that this scalar product is linear in the second argument and anti-linear in the first, in accordance with the convention adopted in Physics. Two vectors u and v ∈ ℂn are said to be orthogonal with respect to the scalar product \({\langle } \cdot , \; \cdot {\rangle }\) if \({\langle } u , \; v {\rangle }=0\). If W ⊂ ℂn is a subspace of ℂn, we denote by W  ⊥  the subspace of ℂn consisting of all vectors orthogonal to all vectors of W. The usual norm of a vector z ∈ ℂn will be denoted by \(\|z\|_{\mathbb{C}}\) or simply by ||z|| and is defined by \(\|z\|_{\mathbb{C}}\equiv\|z\|=\sqrt{{\langle } z, \; z{\rangle }}\). It is well known that ℂn is a Hilbert space with respect to the usual scalar product.

The set of all complex m×n matrices (m rows and n columns) will be denoted by Mat (ℂ , m, n). The set of all square n×n matrices with complex entries will be denoted by Mat (ℂ , n).

The identity matrix will be denoted by \(\mathbb{1}\). Given A ∈ Mat (ℂ , m, n), we denote by A T the element of Mat (ℂ , n, m) whose matrix elements are \((A^T)_{ij} = A_{ji}\) for all i ∈ {1, ..., n}, j ∈ {1, ..., m}. The matrix A T is said to be the transpose of A. It is evident that (A T)T = A and that (AB)T = B T A T for all A ∈ Mat (ℂ , m, n) and B ∈ Mat (ℂ , n, p).

If A ∈ Mat (ℂ, m, n), then its adjoint A* ∈ Mat (ℂ, n, m) is defined as the matrix whose matrix elements \((A^*)_{ij}\) are given by \(\overline{A_{ji}}\) for all 1 ≤ i ≤ n and 1 ≤ j ≤ m.

Given a set α 1, ..., α n of complex numbers, we denote by diag (α 1, ..., α n ) ∈ Mat (ℂ, n) the diagonal matrix whose k-th diagonal entry is α k :

$$ \big(\mathrm{diag}\, (\alpha_1, \; \ldots , \; \alpha_n )\big)_{ij} \; = \; \left\{ \begin{array}{ll} \alpha_i, & \mbox{for } i=j \; , \\ 0, & \mbox{for } i\neq j \; . \end{array} \right. $$

The spectrum of a square matrix A ∈ Mat (ℂ, n) coincides with the set of its eigenvalues (see the definitions in “Appendix 1”) and will be denoted by σ(A).

We denote by \(\mathbb{0}_{a, \; b}\in \mathrm{Mat}\, ({\mathbb{C}} , \; a, \;b)\) the a × b matrix whose matrix elements are all zero. We denote by \(\mathbb{1}_{l}\in \mathrm{Mat}\, ({\mathbb{C}} , \; l)\) the l×l identity matrix. If no danger of confusion is present, we will simplify the notation and write \(\mathbb{0}\) and \(\mathbb{1}\) instead of \(\mathbb{0}_{a, \; b}\) and \(\mathbb{1}_{l}\), respectively. We will also employ the following definitions: for m, n ∈ ℕ, let I m, m + n  ∈ Mat (ℂ, m, m + n) and J m + n, n  ∈ Mat (ℂ, m + n, n) be given by

$$ I_{m, \; m+n} \; \mathrel{\mathop:}= \; \begin{pmatrix} \mathbb{1}_m & \,\, \mathbb{0}_{m, \; n} \end{pmatrix} \quad \mbox{ and } \quad J_{m+n, \; n} \; \mathrel{\mathop:}= \; \begin{pmatrix} \mathbb{1}_n \\ \mathbb{0}_{m, \; n} \end{pmatrix} \; . \label{eq:MatrizesIJeAlinha-1} $$
(4)

The corresponding transpose matrices are

$$ \begin{array}{rll} \big(I_{m, \; m+n}\big)^T \; \mathrel{\mathop:}&=& \; \begin{pmatrix} \mathbb{1}_m \\ \mathbb{0}_{n, \; m} \end{pmatrix} \; = \; J_{m+n, \; m} \quad \mbox{ and } \quad\\ \big(J_{ m+n , \; n}\big)^T \; \mathrel{\mathop:}&=& \; \begin{pmatrix} \mathbb{1}_n & \mathbb{0}_{n, \; m} \end{pmatrix} \; = \; I_{n, \; m+n} \; . \label{eq:MatrizesIJeAlinha-2} \end{array} $$
(5)

The following useful identities will be used below:

$$I_{m, \; m+n} \, \big(I_{m, \; m+n}\big)^T = I_{m, \; m+n} J_{m+n, \; m} \; = \; \mathbb{1}_m \;, \label{eq:MatrizesIJeAlinha-3} $$
(6)
$${\kern-6pt} \big(J_{ m+n , \; n}\big)^T J_{ m+n , \; n} = I_{n, \; m+n } J_{ m+n , \; n} \; = \; \mathbb{1}_n \;, \label{eq:MatrizesIJeAlinha-4} $$
(7)

For each A ∈ Mat (ℂ, m, n), we can associate a square matrix A′ ∈ Mat (ℂ, m + n) given by

$$ \begin{array}{rll} A' \; \mathrel{\mathop:}&=& \; \big(I_{m, \; m+n}\big)^T A \big(J_{ m+n , \; n}\big)^T\\ \; &=& \; J_{m+n, \; m} A I_{ n, \; m+n} \; = \; \begin{pmatrix} A & \mathbb{0}_{m, \; m} \\ \mathbb{0}_{n, \; n} & \mathbb{0}_{n, \; m} \end{pmatrix} \;. \label{eq:MatrizesIJeAlinha-5} \end{array} $$
(8)

As one easily checks, we get from (6)–(7) the useful relation

$$ A \; = \; I_{m, \; m+n} A' J_{ m+n , \; n} \; . \label{eq:MatrizesIJeAlinha-6} $$
(9)
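
For concreteness, the padding identities (6)–(9) can be checked numerically. The short sketch below uses Python with NumPy; the helper names I_mat and J_mat are ours and purely illustrative.

```python
import numpy as np

def I_mat(m, n):
    # I_{m, m+n} = (1_m  0_{m,n}), an m x (m+n) matrix, as in (4)
    return np.hstack([np.eye(m), np.zeros((m, n))])

def J_mat(m, n):
    # J_{m+n, n} = (1_n ; 0_{m,n}), an (m+n) x n matrix, as in (4)
    return np.vstack([np.eye(n), np.zeros((m, n))])

m, n = 2, 3
A = np.arange(1.0, 7.0).reshape(m, n)   # any m x n matrix

I = I_mat(m, n)
J = J_mat(m, n)
A_prime = I.T @ A @ J.T                 # the (m+n) x (m+n) zero-padded matrix of (8)

assert np.allclose(I @ I.T, np.eye(m))  # identity (6)
assert np.allclose(J.T @ J, np.eye(n))  # identity (7)
assert np.allclose(I @ A_prime @ J, A)  # identity (9)
```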

The canonical basis of vectors in ℂn is

$${\mathbf e}_1 \; = \; \begin{pmatrix} 1 \\ 0 \\ 0\\ \vdots \\ 0 \end{pmatrix} , \quad {\mathbf e}_2 \; = \; \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , \quad \ldots , \quad {\mathbf e}_n \; = \; \begin{pmatrix} 0 \\0\\ \vdots \\ 0 \\1 \end{pmatrix} \; , \label{eq:ABaseCanonica-1} $$
(10)

Let x 1, ..., x n be vectors, represented in the canonical basis as

$$ x^a \; = \; \begin{pmatrix} x^a_1 \\ \vdots \\ x^a_n \end{pmatrix} \; . $$

We will denote by \( {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \) the n × n matrix constructed in such a way that its a-th column is the vector x a, that is,

$${{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \; = \; \begin{pmatrix} x^1_1 & \cdots & x^n_1 \\ \vdots & \ddots & \vdots \\ x^1_n & \cdots & x^n_n \end{pmatrix} \; . \label{eq:notacaoparamatrizgeradaporcolunas} $$
(11)

It is obvious that \( \mathbb{1} = {{\Big[\!\! \Big[}} {\mathbf e}_1 , \; \ldots , \; {\mathbf e}_n {{\Big]\!\! \Big]}} \). With this notation, we write

$$ \label{BgerV} B {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \; = \; {{\Big[\!\! \Big[}} Bx^1, \; \ldots , \; Bx^n {{\Big]\!\! \Big]}} \; , $$
(12)

for any B ∈ Mat (ℂ, m, n), as one easily checks. Moreover, if D is a diagonal matrix D = diag (d 1, ..., d n ), then

$$ \label{eq:gerVmalD} {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \, D \; = \; {{\Big[\!\! \Big[}} d_1 x^1, \; \ldots , \; d_n x^n {{\Big]\!\! \Big]}}\; . $$
(13)
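
The identities (12) and (13) can be illustrated with a short NumPy sketch (our own, purely illustrative; np.column_stack assembles the matrix denoted \( {{\Big[\!\! \Big[}} x^1, \; \ldots , \; x^n {{\Big]\!\! \Big]}} \) above):

```python
import numpy as np

x1, x2, x3 = np.array([1.0, 0, 2]), np.array([0.0, 1, 1]), np.array([3.0, 1, 0])
X = np.column_stack([x1, x2, x3])        # the matrix [[x^1, x^2, x^3]] of (11)

B = np.arange(12.0).reshape(4, 3)        # any B in Mat(C, m, n) with n = 3
assert np.allclose(B @ X, np.column_stack([B @ x1, B @ x2, B @ x3]))         # (12)

D = np.diag([2.0, -1.0, 5.0])
assert np.allclose(X @ D, np.column_stack([2.0 * x1, -1.0 * x2, 5.0 * x3]))  # (13)
```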

If v 1 , ..., v k are elements of a complex vector space V, we denote by [v 1 , ..., v k ] the subspace generated by v 1 , ..., v k , i.e., the collection of all linear combinations of the v 1 , ..., v k : \( [v_1 , \; \ldots , \; v_k] \mathrel{\mathop:}= \Big\{ \alpha_1 v_1 + \cdots + \alpha_k v_k, \; \; \alpha_1, \; \ldots , \; \alpha_k \in {\mathbb{C}} \Big\} \). More definitions and general results can be found in “Appendix 1.”

2 The Moore–Penrose Pseudoinverse: Definition and First Properties

In this section, we define the notion of a Moore–Penrose pseudoinverse and study its uniqueness. The question of the existence of the Moore–Penrose pseudoinverse of a given matrix is analyzed in other sections.

2.1 Generalized Inverses, or Pseudoinverses

Let m, n ∈ ℕ and let A ∈ Mat (ℂ, m, n) be an m×n matrix (not necessarily a square matrix). A matrix B ∈ Mat (ℂ, n, m) is said to be a generalized inverse, or a pseudoinverse, of A if it satisfies the following conditions:

  1. \(ABA = A\),

  2. \(BAB = B\).

If A ∈ Mat (ℂ, n) is a non-singular square matrix, its inverse A  − 1 satisfies trivially the defining properties of the generalized inverse above. We will prove later that every matrix A ∈ Mat (ℂ, m ,n) has at least one generalized inverse, namely the Moore–Penrose pseudoinverse. The general definition above is not enough to guarantee uniqueness of the generalized inverse of any matrix A ∈ Mat (ℂ, m ,n).

The definition above is too broad to be useful, and it is convenient to narrow it in order to deal with certain specific problems. In what follows, we will discuss the specific case of the Moore–Penrose pseudoinverse and its application to the optimization of linear least squares problems.

2.2 Defining the Moore–Penrose Pseudoinverse

Let m, n ∈ ℕ and let A ∈ Mat (ℂ, m ,n). A matrix A  +  ∈ Mat (ℂ, n, m) is said to be a Moore–Penrose pseudoinverse of A if it satisfies the following conditions:

  1. \(AA^+A = A\),

  2. \(A^+AA^+ = A^+\),

  3. \(AA^+ \in \mathrm{Mat}\,({\mathbb{C}}, \; m)\) and \(A^+A \in \mathrm{Mat}\,({\mathbb{C}}, \; n)\) are self-adjoint.

It is easy to see again that if A ∈ Mat (ℂ, n) is non-singular, then its inverse satisfies all defining properties of a Moore–Penrose pseudoinverse.

The notion of Moore–Penrose pseudoinverse was introduced by E. H. Moore [26] in 1920 and rediscovered by R. Penrose [27, 28] in 1955. The Moore–Penrose pseudoinverse is a useful concept in dealing with optimization problems, such as the determination of a “least squares” solution of linear systems. We will treat such problems later (see Theorem 6.1), after dealing with the question of uniqueness and existence of the Moore–Penrose pseudoinverse.
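
Before proceeding, it may help to see the three defining conditions at work numerically. The following sketch uses Python with NumPy; np.linalg.pinv computes the Moore–Penrose pseudoinverse through the singular values decomposition (the route taken in “Appendix 2”), and the random test matrix is, of course, only an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

Ap = np.linalg.pinv(A)                          # candidate for A^+, an n x m matrix

assert np.allclose(A @ Ap @ A, A)               # condition 1: A A^+ A = A
assert np.allclose(Ap @ A @ Ap, Ap)             # condition 2: A^+ A A^+ = A^+
assert np.allclose(A @ Ap, (A @ Ap).conj().T)   # condition 3: A A^+ is self-adjoint
assert np.allclose(Ap @ A, (Ap @ A).conj().T)   # condition 3: A^+ A is self-adjoint
```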

2.3 The Uniqueness of the Moore–Penrose Pseudoinverse

We will first show the uniqueness of the Moore–Penrose pseudoinverse of a given matrix A ∈ Mat (ℂ, m, n), assuming its existence. Let A  +  ∈ Mat (ℂ, n, m) be a Moore–Penrose pseudoinverse of A ∈ Mat (ℂ, m, n) and let B ∈ Mat (ℂ, n, m) be another Moore–Penrose pseudoinverse of A, i.e., such that ABA = A, BAB = B with AB and BA self-adjoint. Let \(M_1\mathrel{\mathop:}= AB-AA^+=A\big(B-A^+\big)\in \mathrm{Mat}\,({\mathbb{C}}, \; m)\). By hypothesis, M 1 is self-adjoint (being the difference of two self-adjoint matrices) and \((M_1)^2=\big(AB-AA^+\big)A\big(B-A^+\big)= \big(ABA-AA^+A\big)\big(B-A^+\big)=(A-A)\big(B-A^+\big) = 0\). Since M 1 is self-adjoint, the fact that \((M_1)^2=0\) implies that M 1 = 0, because for all x ∈ ℂm one has \(\|M_1x\|^2={\langle} M_1x, \, M_1x{\rangle}={\big\langle } x, \, (M_1)^2x{\big\rangle}=0\). This shows that AB = AA  + . Following the same steps, we can prove that BA = A  +  A (consider the self-adjoint matrix \(M_2\mathrel{\mathop:}= BA-A^+A\in \mathrm{Mat}\,({\mathbb{C}}, \; n)\) and proceed as above). All this implies that \(A^+ = A^+AA^+=A^+\big(AA^+\big)=A^+AB=\big(A^+A\big)B=BAB = B\), thus establishing uniqueness.

As we have already commented, if A ∈ Mat (ℂ, n) is a non-singular square matrix, its inverse A  − 1 trivially satisfies the defining conditions of the Moore–Penrose pseudoinverse, and therefore, we have in this case A  +  = A  − 1 as the unique Moore–Penrose pseudoinverse of A. It is also evident from the definition that for \(\mathbb{0}_{mn}\), the m×n identically zero matrix, one has \((\mathbb{0}_{mn})^+=\mathbb{0}_{nm}\).

2.4 Existence of the Moore–Penrose Pseudoinverse

We will present two proofs of the existence of the Moore–Penrose pseudoinverse A  +  for an arbitrary matrix A ∈ Mat (ℂ, m, n). Both proofs produce algorithms for the explicit computation of A  + . The first one will be presented in Sections 4 and 5 (Theorems 4.3 and 5.1) and will follow from results presented below. Expressions (39) and (40) furnish explicit formulas for the computation of A  +  in terms of A, A* and the eigenvalues of AA* or A* A (i.e., the squares of the singular values of A).

The second existence proof will be presented in “Appendix 2” and relies on the singular values decomposition presented in Theorem A.16. For this proof, the preliminary results presented below are not required. This second proof is the one more frequently found in the literature, but we believe that expressions (39) and (40) provide an algorithmically simpler way for the determination of the Moore–Penrose pseudoinverse of a given matrix.

2.5 Computing the Moore–Penrose Pseudoinverse in Some Special Cases

If A ∈ Mat (ℂ, m, 1), \(A=\left(\begin{smallmatrix}a_1 \\ \vdots \\ a_m\end{smallmatrix}\right)\), a non-zero column vector, then one can easily verify that \(A^+=\frac{1}{\|A\|^2}A^* = \frac{1}{\|A\|^2}\left(\begin{smallmatrix}\overline{a_1} & \cdots & \overline{a_m}\end{smallmatrix}\right)\), where \(\|A\|=\sqrt{|a_1|^2+ \cdots +|a_m|^2}\). In particular, if z ∈ ℂ, then \((z)^+=\left\{\begin{array}{ll}0, & z=0\\ \frac{1}{z}, & z\neq 0\end{array}\right.\), by taking z as an element of Mat (ℂ, 1, 1).

This can be further generalized. If A ∈ Mat (ℂ, m, n) and (AA*) − 1 exists, then

$$ A^+ \; = \; A^*\big(AA^*\big)^{-1} \; , \label{eq:pseudoinversa-AAeEXTinversivel-1} $$
(14)

because we can readily verify that the right-hand side satisfies the defining conditions of A  + . Analogously, if (A*A) − 1 exists, one has

$$ A^+ \; = \; \big(A^*A\big)^{-1}A^* \; . \label{eq:pseudoinversa-AAeEXTinversivel-2} $$
(15)

For instance, for \(A=\left(\begin{smallmatrix} 2 & 0 & i \\ 0 & i & 1 \end{smallmatrix}\right)\), one can check that AA* is invertible, but A* A is not, and we have \(A^+ = A^*\big(AA^*\big)^{-1}=\frac{1}{9} \left(\begin{smallmatrix} 4 & -2i \\ 1 & -5i\\ -i & 4 \end{smallmatrix}\right)\). Similarly, for \(A=\left(\begin{smallmatrix} 1 & 2 \\ 0 & i \\ 0 & 3 \end{smallmatrix}\right)\), AA* is singular, but A* A is invertible and we have \(A^+ = \big(A^*A\big)^{-1}A^* = \frac{1}{10} \left(\begin{smallmatrix} 10 & 2i & -6 \\ 0 & -i & 3 \end{smallmatrix}\right)\).
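
Both examples (and the column-vector formula from the beginning of this subsection) can be checked with a few lines of NumPy; the sketch below is only a numerical confirmation of the computations above, with np.linalg.pinv and np.linalg.inv used as black boxes.

```python
import numpy as np

# Column vector: A^+ = A^* / ||A||^2
a = np.array([[1.0], [2.0], [2.0]])
assert np.allclose(np.linalg.pinv(a), a.conj().T / np.linalg.norm(a) ** 2)

# First example: A A^* is invertible, A^* A is not; use (14)
A1 = np.array([[2, 0, 1j], [0, 1j, 1]])
Ap1 = A1.conj().T @ np.linalg.inv(A1 @ A1.conj().T)
assert np.allclose(Ap1, np.array([[4, -2j], [1, -5j], [-1j, 4]]) / 9)

# Second example: A^* A is invertible, A A^* is not; use (15)
A2 = np.array([[1, 2], [0, 1j], [0, 3]])
Ap2 = np.linalg.inv(A2.conj().T @ A2) @ A2.conj().T
assert np.allclose(Ap2, np.array([[10, 2j, -6], [0, -1j, 3]]) / 10)
```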

The relations (14) and (15) are significant because they will provide an important hint to find the Moore–Penrose pseudoinverse of a general matrix, as we will discuss later. In Proposition 3.2, we will show that one has in general \(A^+ = A^* \big(AA^*\big)^+ = \big(A^* A\big)^+ A^*\), and in Theorem 4.3, we will discuss what can be done in the cases when AA* or A*A are not invertible.

3 Further Properties of the Moore–Penrose Pseudoinverse

The following properties of the Moore–Penrose pseudoinverse follow immediately from its definition and from uniqueness. The proofs are elementary and left to the reader: For any A ∈ Mat (ℂ, m, n), one has

  1. \(\big(A^+\big)^+=A\),

  2. \(\big(A^+\big)^T=\big(A^T\big)^+\), \(\overline{A^+}=\left(\overline{A}\right)^+\) and, consequently, \(\big(A^+\big)^*=\big(A^*\big)^+\),

  3. \((zA)^+ = z^{-1} A^+\) for all \(z\in{\mathbb{C}}\), \(z\neq 0\).

It is, however, important to remark that for A ∈ Mat (ℂ, m, n) and B ∈ Mat (ℂ, n, p), the Moore–Penrose pseudoinverse (AB) +  is not always equal to B  +  A  + , in contrast to what happens with the usual inverse in the case m = n = p. A relevant exception will be found in Proposition 3.2.
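
A minimal counterexample (our own illustration, not taken from the references) already appears with a row and a column vector: for \(A = \left(\begin{smallmatrix} 1 & 1 \end{smallmatrix}\right)\) and \(B=\left(\begin{smallmatrix} 1 \\ 0 \end{smallmatrix}\right)\), one has \(AB = (1)\), so \((AB)^+ = (1)\), whereas \(B^+A^+ = 1/2\).

```python
import numpy as np

A = np.array([[1.0, 1.0]])    # 1 x 2
B = np.array([[1.0], [0.0]])  # 2 x 1

lhs = np.linalg.pinv(A @ B)                    # (AB)^+ = [[1.0]]
rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)    # B^+ A^+ = [[0.5]]
assert not np.allclose(lhs, rhs)
```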

The next proposition lists some important properties that will be used below.

Proposition 3.1

The Moore–Penrose pseudoinverse satisfies the following relations:

$$ A^+ = A^+ \, \big(A^+\big)^* \, A^* \;, \label{eq:ident-pseudoinversa-a} $$
(16)
$$ A = A \, A^* \, \big(A^+\big)^* \; , \label{eq:ident-pseudoinversa-b} $$
(17)
$$ A^* = A^* \, A \, A^+ \; , \label{eq:ident-pseudoinversa-c} $$
(18)
$$ A^+ = A^* \, \big(A^+\big)^* \, A^+ \;, \label{eq:ident-pseudoinversa-d} $$
(19)
$$ A = \big(A^+\big)^* \, A^* \, A \;, \label{eq:ident-pseudoinversa-e} $$
(20)
$$ A^* = A^+ \, A \, A^* \; , \label{eq:ident-pseudoinversa-f} $$
(21)

valid for all A ∈ Mat (ℂ, m, n).

For us, the most relevant of the relations above is relation (18), since we will make use of it in the proof of Theorem 6.1 when dealing with the optimization of linear least squares problems.

Proof of Proposition 3.1

Since \(AA^+\) is self-adjoint, one has \(AA^+=\big(AA^+\big)^* = \big(A^+\big)^*A^* \). Multiplying on the left by \(A^+\), we get \(A^+=A^+ \big(A^+\big)^* A^*\), proving (16). Making the substitution \(A\to A^+\) in (16) and using the fact that \(A=\big(A^+\big)^+\), one gets \(A=A A^* \big(A^+\big)^*\), which is relation (17). Making the substitution \(A\to A^*\) in (17) and using the fact that \(\big(A^*\big)^+=\big(A^+\big)^*\), we get \(A^* = A^* A A^+\), which is relation (18).

Relations (19)–(21) can be obtained analogously from the fact that \(A^+A\) is also self-adjoint, but they follow more easily by making the substitution \(A\to A^*\) in (16)–(18) and by taking the adjoint of the resulting expressions.□

From Proposition 3.1, other interesting results can be obtained, some of which are listed in the following proposition:

Proposition 3.2

For all A ∈ Mat (ℂ, m, n), one has

$$ \big(AA^*\big)^+ \; = \; \big(A^*\big)^+ A^+ \;. \label{eq:poiynOUIybiyutuyGUg} $$
(22)

From this we get

$$ A^+ \; = \; A^* \big(AA^*\big)^+ \; = \; \big(A^* A\big)^+ A^* \;, \label{eq:LniuybuyTUYTygfytgHGv} $$
(23)

also valid for all A ∈ Mat (ℂ, m, n).

Expression (23) generalizes (14) and (15) and can be employed to compute A  +  provided \(\big(AA^*\big)^+ \) or \(\big(A^* A\big)^+\) is already known.

Proof of Proposition 3.2

Let \(B=\big(A^*\big)^+ A^+\). One has

$$ \begin{array}{rll} AA^* &\!\stackrel{\text{(17)}}{=}\!& A \, A^* \, (A^+)^* \, A^* \!\stackrel{\text{(21)}}{=}\! A \, A^* \, (A^+)^* \, A^+ \, A \, A^*\\ &=& (AA^*)B(AA^*) \;, \end{array} $$

where we use that \(\big(A^*\big)^+=\big(A^+\big)^*\). One also has

$$ \begin{array}{rll} B \!&=&\! \big(A^*\big)^+ A^+ \stackrel{\text{(16)}}{=} (A^+)^* \, A^+ \, A \, A^+\\ &\stackrel{\text{(19)}}{=}& (A^+)^* \, A^+ \, A \,A^* \, (A^+)^* \, A^+ = B \big( A \,A^*\big)B \;. \end{array} $$

Notice that

$$ \big( A \,A^*\big)B \; = \; \Big( A \,A^* (A^+)^* \Big) A^+ \! \stackrel{\text{(18)}}{=}\! AA^+ $$

which is self-adjoint, by definition. Analogously,

$$ B\big( A \,A^*\big) \; = \; (A^+)^* \Big( A^+ A \,A^* \Big) \!\stackrel{\text{(20)}}{=}\! (A^*)^+ A^* \; , $$

which is also self-adjoint. The facts exposed in the lines above prove that B is the Moore–Penrose pseudoinverse of AA*, establishing (22). Making the substitution \(A\to A^*\) in (22), one also gets

$$ \big(A^*A\big)^+ \; = \; A^+ \big(A^*\big)^+ \; . \label{eq:poiynOUIybiyutuyGUg-2} $$
(24)

Notice now that

$$ A^* \big(AA^*\big)^+ \! \stackrel{\text{(22)}}{=}\! A^* \big(A^*\big)^+ A^+ \! \stackrel{\text{(19)}}{=}\! A^+ $$

and that

$$ \big(A^*A\big)^+ A^* \! \stackrel{\text{(24)}}{=}\! A^+ \big(A^*\big)^+ A^* \! \stackrel{\text{(16)}}{=}\! A^+ \;, $$

establishing (23).□
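
The identities (22)–(24) are also easy to confirm numerically for a generic rectangular matrix; the sketch below (Python/NumPy, with np.linalg.pinv used as a reference implementation of the pseudoinverse) is only a sanity check, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
Ah = A.conj().T
pinv = np.linalg.pinv

assert np.allclose(pinv(A @ Ah), pinv(Ah) @ pinv(A))   # (22)
assert np.allclose(pinv(Ah @ A), pinv(A) @ pinv(Ah))   # (24)
assert np.allclose(pinv(A), Ah @ pinv(A @ Ah))         # (23), first equality
assert np.allclose(pinv(A), pinv(Ah @ A) @ Ah)         # (23), second equality
```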

3.1 The Kernel and the Range of a Matrix and the Moore–Penrose Pseudoinverse

The kernel and the range (or image) of a matrix A ∈ Mat (ℂ, m, n) are defined by \(\mathrm{Ker}\,(A)\mathrel{\mathop:}= \{ u \in {\mathbb{C}}^n|\; Au=0\}\) and \(\mathrm{Ran}\,(A)\mathrel{\mathop:}=\{Au, \; u \in {\mathbb{C}}^n\}\), respectively. It is evident that Ker (A) is a linear subspace of ℂn and that Ran (A) is a linear subspace of ℂm.

The following proposition will be used below, but it is also interesting in itself.

Proposition 3.3

Let A ∈ Mat (ℂ, m, n) and let us define \(P_1 \mathrel{\mathop:}= \mathbb{1}_n - A^+ A \in\mathrm{Mat}\,({\mathbb{C}}, \; n)\) and \(P_2 \mathrel{\mathop:}= \mathbb{1}_m - A A^+ \in\mathrm{Mat}\,({\mathbb{C}}, \; m)\) . Then, the following claims are valid:

  1. \(P_1\) and \(P_2\) are orthogonal projectors, that is, they satisfy \((P_k)^2=P_k\) and \(P_k^* = P_k\), k = 1, 2.

  2. \(\mathrm{Ker}\,(A) = \mathrm{Ran}\,(P_1)\), \(\mathrm{Ran}\,(A)=\mathrm{Ker}\,(P_2)\), \(\mathrm{Ker}\,\big(A^+\big)= \mathrm{Ran}\,(P_2)\), and \(\mathrm{Ran}\,\big(A^+\big)= \mathrm{Ker}\,(P_1)\).

  3. \(\mathrm{Ran}\,(A)=\mathrm{Ker}\,\big(A^+\big)^\perp\) and \(\mathrm{Ran}\,\big(A^+\big)=\mathrm{Ker}\,(A)^\perp\).

  4. \(\mathrm{Ker}\,(A)\oplus \mathrm{Ran}\,\big(A^+\big)={\mathbb{C}}^n\) and \(\mathrm{Ker}\,\big(A^+\big)\oplus \mathrm{Ran}\,(A)={\mathbb{C}}^m\), both being direct sums of orthogonal subspaces.

Proof

Since \(AA^+\) and \(A^+A\) are self-adjoint, so are P 1 and P 2. One also has \((P_1)^2= \mathbb{1} - 2 A^+ A + A^+ AA^+ A = \mathbb{1} - 2 A^+ A + A^+ A=\mathbb{1} - A^+ A=P_1\) and analogously for P 2. This proves item 1.

Let x ∈ Ker (A). Since Ran (P 1) is a closed linear subspace of ℂn, the “Best Approximant Theorem,” Theorem A.1, and the Orthogonal Decomposition Theorem, Theorem A.3, guarantee the existence of a unique z 0 ∈ Ran (P 1) such that \(\|x-z_0\|=\min\big\{ \|x-z\|, \; z\in \mathrm{Ran}\,(P_1)\big\}\). Moreover, x − z 0 is orthogonal to Ran (P 1). Hence, there exists at least one \(y_0\in{\mathbb{C}}^n\) such that x − P 1 y 0 is orthogonal to every element of the form P 1 y, i.e., \({\langle } x-P_1 y_0, \; P_1y{\rangle }=0\) for all y ∈ ℂn, which implies \({\langle } P_1(x-P_1 y_0), \; y{\rangle }=0\) for all y ∈ ℂn which, in turn, implies P 1(x − P 1 y 0) = 0. This, however, says that P 1 x = P 1 y 0. Since x ∈ Ker (A), one has P 1 x = x (by the definition of P 1), so that x = P 1 y 0 ∈ Ran (P 1). We have therefore proved that if x ∈ Ker (A), then x ∈ Ran (P 1), establishing that Ker (A) ⊂ Ran (P 1). On the other hand, the fact that \(AP_1= A\big(\mathbb{1} - A^+A\big)=A-A=0\) implies Ran (P 1) ⊂ Ker (A), establishing that Ran (P 1) = Ker (A).

If z ∈ Ker (P 1), then z = A  +  Az, proving that \(z\in \mathrm{Ran}\,\big(A^+\big)\). This established that \(\mathrm{Ker}\,(P_1)\subset \mathrm{Ran}\,\big(A^+\big)\). On the other hand, if \(u\in \mathrm{Ran}\,\big(A^+\big)\), then there exists v ∈ ℂm such that u = A  +  v. Therefore, \(P_1u= \big(\mathbb{1}_n - A^+ A\big)A^+v=\big(A^+ - A^+AA^+\big)v=0\), proving that u ∈ Ker (P 1) and that \(\mathrm{Ran}\,\big(A^+\big)\subset\mathrm{Ker}\,(P_1)\). This establishes that \(\mathrm{Ker}\,(P_1) = \mathrm{Ran}\,\big(A^+\big)\).

P 2 is obtained from P 1 by the substitution \(A\to A^+\) (recalling that \(\big(A^+\big)^+ =A\)). Hence, the results above imply that \(\mathrm{Ran}\,(P_2) = \mathrm{Ker}\,\big(A^+\big)\) and that Ker (P 2) = Ran (A). This proves item 2.

If M ∈ Mat (ℂ, p) (with p ∈ ℕ, arbitrary) is self-adjoint, then \({\langle } y, \; Mx{\rangle }={\langle } My, \; x{\rangle }\) for all x, y ∈ ℂp. This relation makes evident that Ker (M) = Ran (M) ⊥ . Therefore, item 3 follows from item 2 by taking M = P 1 and M = P 2. Item 4 is evident from item 3.□
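
The content of Proposition 3.3 is easy to visualize numerically: the sketch below (Python/NumPy, our own illustration) builds \(P_1\) and \(P_2\) for a random rectangular matrix and checks that they are orthogonal projectors with the stated kernel and range relations.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 6
A = rng.standard_normal((m, n))
Ap = np.linalg.pinv(A)

P1 = np.eye(n) - Ap @ A        # projects onto Ker(A)
P2 = np.eye(m) - A @ Ap        # projects onto Ker(A^+) = Ran(A)^perp

for P in (P1, P2):
    assert np.allclose(P @ P, P) and np.allclose(P, P.conj().T)   # item 1

# item 2 (partially): Ran(P1) lies in Ker(A) and Ran(A) lies in Ker(P2)
assert np.allclose(A @ P1, 0)
assert np.allclose(P2 @ A, 0)
```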

4 Tikhonov’s Regularization and Existence Theorem for the Moore–Penrose Pseudoinverse

In (14) and (15), we saw that if \(\big(AA^*\big)^{-1}\) exists, then \(A^+ = A^*\big(AA^*\big)^{-1}\), and that if \(\big(A^*A\big)^{-1}\) exists, then \(A^+ = \big(A^*A\big)^{-1}A^*\). If those inverses do not exist, there is an alternative procedure to obtain A  + . We know from Proposition A.4 that even if \(\big(AA^*\big)^{-1}\) does not exist, the matrix \(AA^* +\mu\mathbb{1}\) will be invertible for all non-vanishing μ ∈ ℂ with |μ| small enough. Hence, we could conjecture that the expressions \(A^*\big(AA^*+\mu\mathbb{1}\big)^{-1}\) and \( \big(A^*A+\mu\mathbb{1}\big)^{-1}A^*\) are well-defined for μ ≠ 0 and |μ| small enough and converge to A  +  when the limit μ→0 is taken. As we will now show, this conjecture is correct.

The provisional replacement of the singular matrices AA* or A*A by the non-singular ones \(AA^*+\mu\mathbb{1}\) or \(A^*A+\mu\mathbb{1}\) (with μ ≠ 0 and |μ| “small”) is a regularization procedure known as Tikhonov’s regularization. This procedure was introduced by Tikhonov in [1] (see also [2] and, for historical remarks, [25]) in his search for uniform approximations for the solutions of Fredholm’s equation of the first kind

$$ \int_a^b k(x, \; y)\, u(y)\;{\rm d}y \; = \; f(x) \;, \label{eq:Fredholm-of-the-first-kind} $$
(25)

where − ∞ < a < b < ∞ and where k and f are given functions satisfying adequate smoothness conditions. In operator form, (25) becomes Ku = f and K is well known to be a compact operator (see, e.g., [29]) if k is a continuous function. By using the method of finite differences or by using expansions in terms of orthogonal functions, the inverse problem (25) can be replaced by an approximating inverse matrix problem Ax = y, like (1). By applying A* to the left, one gets A*Ax = A*y. Since the inverse of A*A may not exist, one first considers a solution x μ of the regularized equation \(\big(A^*A+\mu\mathbb{1}\big)x_\mu = A^*y\), with some adequate μ ∈ ℂ, and asks whether the limit \(\lim_{|\mu|\to 0}\big(A^*A+\mu\mathbb{1}\big)^{-1}A^*y\) can be taken. As we will see, the limit exists and is given precisely by A  +  y. In Tikhonov’s case, the regularized equation \(\big(A^*A+\mu\mathbb{1}\big)x_\mu = A^*y\) can be obtained from a related Fredholm’s equation of the second kind, namely \(K^*Ku_{\mu} + \mu u_{\mu}=K^*f\), for which the existence of solutions, i.e., the existence of the inverse \((K^*K+\mu\mathbb{1})^{-1}\), is guaranteed by Fredholm’s alternative theorem (see, e.g., [29]) for all −μ in the resolvent set of K*K and, therefore, for all μ > 0 (since K*K is a positive compact operator). It is then a technical matter to show that the limit \(\displaystyle \lim_{ {\mu\to 0}\atop {\mu>0} }u_{\mu}\) exists and provides a uniform approximation to a solution of (25).
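
The matrix version of this regularization is easy to experiment with. The sketch below (Python/NumPy; the rank-deficient test matrix and the sequence of values of μ are our own choices) solves the regularized equation \(\big(A^*A+\mu\mathbb{1}\big)x_\mu = A^*y\) for decreasing μ > 0 and shows x μ approaching A  +  y, in line with the claim just made.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n)) @ np.diag([1.0, 1.0, 0.0])   # rank-deficient: A^* A is singular
y = rng.standard_normal(m)

x_star = np.linalg.pinv(A) @ y          # A^+ y, the expected limit as mu -> 0
for mu in [1e-1, 1e-3, 1e-6, 1e-9]:
    x_mu = np.linalg.solve(A.conj().T @ A + mu * np.eye(n), A.conj().T @ y)
    print(mu, np.linalg.norm(x_mu - x_star))    # the distance to A^+ y decreases with mu
```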

Tikhonov, however, did not point out the relation of his ideas to the theory of the Moore–Penrose pseudoinverse. This relation will be described in what follows. Our first result, presented in the next two lemmas, establishes that the limits \(\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\) and \(\displaystyle \lim_{\mu\to 0} \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*\), described above, indeed exist and are equal.

Lemma 4.1

Let A ∈ Mat (ℂ, m, n) and let μ ∈ ℂ be such that \(AA^* +\mu\mathbb{1}_m\) and \(A^*A +\mu\mathbb{1}_n\) are non-singular (that means \(-\mu\not\in\sigma\big(AA^*\big)\cup\sigma\big(A^*A\big)\) , a finite set). Then, \(A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} = \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*\) .

Recall that, by Proposition A.7, \(\sigma\big(AA^*\big)\) and \(\sigma\big(A^*A\big)\) differ at most by the element 0.

Proof of Lemma 4.1

Let \(B_\mu \mathrel{\mathop:}= A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\) and \(C_\mu \mathrel{\mathop:}= \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*\). We have

$$ \begin{array}{rll} A^*AB_\mu &=& A^*\big[AA^*\big]\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\\ &=& A^*\big[AA^* +\mu\mathbb{1}_m-\mu\mathbb{1}_m\big]\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \\ &=& A^*\Big(\mathbb{1}_m-\mu \big(AA^* +\mu\mathbb{1}_m\big)^{-1}\Big) = A^*-\mu B_\mu \, . \end{array} $$

Therefore, \(\big(A^*A +\mu\mathbb{1}_n\big)B_\mu=A^*\), which implies \( B_\mu=\big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^* =C_\mu \).□

Lemma 4.2

For all A ∈ Mat (ℂ, m, n) the limits \(\displaystyle \lim_{\mu\to 0} A^*\big(AA^* \!+\!\mu\mathbb{1}_m\big)^{-1}\) and \(\displaystyle \lim_{\mu\to 0} \big(A^*A \!+\!\mu\mathbb{1}_n\big)^{-1}A^*\) exist and are equal (by Lemma 4.1), defining an element of Mat (ℂ, n, m).

Proof

Notice first that A is an identically zero matrix iff AA* or A*A are zero matrices. In fact, if, for instance, A*A = 0, then for any vector x one has \(0={\langle } x, \; A^*Ax{\rangle } = {\langle } Ax, \;Ax{\rangle }=\|Ax\|^2\), proving that A = 0. Hence, we will assume that AA* and A*A are non-zero matrices.

The matrix AA* ∈ Mat (ℂ, m) is evidently self-adjoint. Let α 1, ..., α r be its distinct eigenvalues. By the spectral theorem for self-adjoint matrices (see Theorems A.9 and A.13), we may write

$$ AA^* \; = \; \sum\limits_{a=1}^{r}\alpha_a E_a \; , \label{eq:wihnpiuYBItryutf} $$
(26)

where E a are the spectral projectors of AA* and satisfy E a E b  = δ ab E a , \(E_a^*=E_a\), and \(\sum\limits_{a=1}^{r} E_a =\mathbb{1}_m\). Therefore,

$$ AA^* +\mu\mathbb{1}_m \; = \; \sum\limits_{a=1}^{r}(\alpha_a + \mu) E_a $$

and, hence, for \(-\mu\not\in\{\alpha_1, \; \ldots , \; \alpha_r\}\), one has, by (50),

$$ \begin{array}{rll} \big(AA^* +\mu\mathbb{1}_m\big)^{-1} &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a + \mu} E_a \quad \mbox{ and }\quad\\ A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a + \mu} A^*E_a \;. \label{eq:YOIUybyutuygc} \end{array} $$
(27)

There are now two cases to be considered: (1) zero is not an eigenvalue of AA* and (2) zero is an eigenvalue of AA*.

In case 1, it is clear from (27) that the limit \(\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\) exists and

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:expressaoparaBnocasoAAestnaotemautovalornulo} $$
(28)

In case 2, let us have, say, α 1 = 0. The corresponding spectral projector E 1 projects on the kernel of AA*: \(\mathrm{Ker}\,\big(AA^*\big)\mathrel{\mathop:}= \{ u \in {\mathbb{C}}^m|\; AA^*u=0\}\). If \(x\in \mathrm{Ker}\,\big(AA^*\big)\), then A* x = 0 because \(0={\big\langle } x, \; AA^*x{\big\rangle }={\langle } A^*x, \; A^*x{\rangle }=\big\|A^*x\big\|^2\). Therefore,

$$ A^*E_1 \; = \; 0 \label{eq:yfvuyTbihpuybtrdcyr87} $$
(29)

and, hence, we may write,

$$ A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a + \mu} A^*E_a \; , $$

from which we get

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:expressaoparaBnocasoAAesttemautovalornulo} $$
(30)

This proves that \(\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\) always exists. By Lemma 4.1, the limit \(\displaystyle\lim_{\mu\to 0} \big(A^*A +\mu\mathbb{1}_n\big)^{-1}A^*\) also exists and coincides with \(\displaystyle \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1}\).□

The main consequence is the following theorem, which contains a general proof for the existence of the Moore–Penrose pseudoinverse:

Theorem 4.3

(Tikhonov’s Regularization) For all A ∈ Mat (ℂ, m, n) one has

$$ A^+ \; = \; \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \label{eq:regTichinovparaAMAIS-1} $$
(31)

and

$$ A^+ \; = \; \lim_{\mu\to 0} \big(A^*A+\mu\mathbb{1}_n\big)^{-1}A^* \; . \label{eq:regTichinovparaAMAIS-2} $$
(32)

Proof

The statements to be proven are evident if \(A=\mathbb{0}_{mn}\) because, as we already saw, \((\mathbb{0}_{mn})^+=\mathbb{0}_{nm}\). Hence, we will assume that A is a non-zero matrix. This is equivalent (by the comments found in the proof of Lemma 4.2) to assuming that AA* and A*A are non-zero matrices.

By Lemmas 4.1 and 4.2, it is enough to prove (31). There are two cases to be considered: (1) zero is not an eigenvalue of AA* and (2) zero is an eigenvalue of AA*. In case (1), we saw in (28) that

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \; =\mathrel{\mathop:} \; B \;. $$

Notice now that

$$ \begin{array}{rll} \label{eq:KLhniUYiuyYtfygkugfyhygfv-1} AB &=& \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} AA^*E_a = \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} \left( \sum\limits_{b=1}^{r} \alpha_b E_b \right)E_a\\ &=& \sum\limits_{a=1}^{r} \sum\limits_{b=1}^{r} \frac{1}{\alpha_a}\alpha_b\;\delta_{ab}E_a = \sum\limits_{a=1}^{r} E_a = \mathbb{1}_{m} , \end{array} $$
(33)

which is self-adjoint and that

$$ BA \; = \; \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a A \;, \label{eq:KLhniUYiuyYtfygkugfyhygfv-2} $$
(34)

which is also self-adjoint because α a  ∈ ℝ for all a and because \((A^*E_a A)^* = A^*E_a A\) for all a, since \(E_a^*=E_a\).

From (33), it follows that ABA = A. From (34), it follows that

$$ \begin{array}{rll} BAB &=& \left(\sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a A\right) \left(\sum\limits_{b=1}^{r}\frac{1}{\alpha_b} A^*E_b \right)\\ &=& \sum\limits_{a=1}^{r}\sum\limits_{b=1}^{r}\frac{1}{\alpha_a\alpha_b} A^*E_a (AA^*)E_b \;. \end{array} $$

Now, by the spectral decomposition (26) for AA*, it follows that \((AA^*)E_b=\alpha_b E_b\). Therefore,

$$ \begin{array}{rll} BAB &=& \sum\limits_{a=1}^{r}\sum\limits_{b=1}^{r}\frac{1}{\alpha_a} A^*E_a E_b\\ &=& \left( \sum\limits_{a=1}^{r}\frac{1}{\alpha_a} A^*E_a \right) \bigg(\underbrace{ \sum\limits_{b=1}^{r} E_b}_{\mathbb{1}_m}\bigg) = B \;. \end{array} $$

This proves that B = A  +  when 0 is not an eigenvalue of AA*.

Let us now consider the case when AA* has a zero eigenvalue, say, α 1. As we saw in (30),

$$ \lim_{\mu\to 0} A^*\big(AA^* +\mu\mathbb{1}_m\big)^{-1} \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \; =\mathrel{\mathop:} \; B \;. $$

Using the fact that \((AA^*)E_a=\alpha_a E_a\) (which follows from the spectral decomposition (26) for AA*), we get

$$ \begin{array}{rll}\label{eq:KLhniUYiuyYtfygkugfyhygfv-1b} AB &=& \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} AA^*E_a = \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} \alpha_a E_a\\ &=& \sum\limits_{a=2}^{r} E_a = \mathbb{1}_{m} -E_1 , \end{array} $$
(35)

which is self-adjoint, since E 1 is self-adjoint. We also have

$$ BA \; = \; \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a A \;, \label{eq:KLhniUYiuyYtfygkugfyhygfv-2b} $$
(36)

which is also self-adjoint.

From (35), it follows that ABA = A − E 1 A. Notice now that \((E_1 A)^*=A^*E_1=0\), by (29). This establishes that E 1 A = 0 and that ABA = A. From (36), it follows that

$$ \begin{array}{rll} BAB &=& \left(\sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a A\right) \left(\sum\limits_{b=2}^{r}\frac{1}{\alpha_b} A^*E_b \right)\\ &=& \sum\limits_{a=2}^{r}\sum\limits_{b=2}^{r}\frac{1}{\alpha_a\alpha_b} A^*E_a (AA^*)E_b \;. \end{array} $$

Using again \((AA^*)E_b=\alpha_b E_b\), we get

$$ \begin{array}{rll} BAB \!&=&\! \sum\limits_{a=2}^{r}\sum\limits_{b=2}^{r}\frac{1}{\alpha_a} A^*E_a E_b \!=\! \left( \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_a \right)\! \underbrace{\left( \sum\limits_{b=2}^{r} E_b\right)}_{\mathbb{1}_m-E_1}\\ &=& B - \sum\limits_{a=2}^{r}\frac{1}{\alpha_a} A^*E_aE_1 = B , \end{array} $$

since E a E 1 = 0 for a ≠ 1. This shows that BAB = B. Hence, we have established that B = A  +  also in the case when AA* has a zero eigenvalue, completing the proof of (31).□

5 The Moore–Penrose Pseudoinverse and the Spectral Theorem

The proof of Theorem 4.3 also establishes the following facts:

Theorem 5.1

Let A ∈ Mat (ℂ, m, n) be a non-zero matrix and let \(AA^*= \sum\limits_{a=1}^r \alpha_a E_a \) be the spectral representation of AA*, where {α 1 , ..., α r } ⊂ ℝ is the set of distinct eigenvalues of AA* and E a are the corresponding self-adjoint spectral projections. Then, we have

$$ A^+ \; = \; \sum\limits_{{a=1}\atop{\alpha_a\neq 0}}^{r}\frac{1}{\alpha_a} A^*E_a \; . \label{eq:representacaoquaseespectaldeAMAIS-1} $$
(37)

Analogously, let \(A^*A= \sum\limits_{b=1}^s \beta_b F_b \) be the spectral representation of A*A, where {β 1 , ..., β s } ⊂ ℝ is the set of distinct eigenvalues of A*A and F b the corresponding self-adjoint spectral projections. Then, we also have

$$ A^+ \; = \; \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s}\frac{1}{\beta_b} F_b A^* \; . \label{eq:representacaoquaseespectaldeAMAIS-2} $$
(38)

It is worth mentioning that, by Proposition A.7, the sets of non-zero eigenvalues of AA* and of A*A coincide: {α 1 , ..., α r } ∖ {0} = {β 1 , ..., β s } ∖ {0}.

From (37) and (38) it follows that for a non-zero matrix A , we have

$$ \begin{array}{rll} A^+ & = & \sum\limits_{{a=1}\atop{\alpha_a\neq 0}}^{r} \;\; \frac{1}{\alpha_a} \left( \prod\limits_{{l=1}\atop {l\neq a}}^r \big(\alpha_a-\alpha_l\big)^{-1} \right)\\ &&\times A^* \left[ \prod\limits_{{l=1}\atop {l\neq a}}^r \Big( AA^*-\alpha_l\mathbb{1}_m \Big)\right] \;, \label{eq:reppreudoespectraldeAMAIS-1} \end{array} $$
(39)
$$ \begin{array}{rll} A^+ & = & \sum\limits_{{b=1}\atop{\beta_b\neq 0}}^{s} \;\; \frac{1}{\beta_b} \left( \prod\limits_{{l=1}\atop {l\neq b}}^s \big(\beta_b-\beta_l\big)^{-1} \right) \\ &&\times \left[ \prod\limits_{{l=1}\atop {l\neq b}}^s \Big( A^*A-\beta_l\mathbb{1}_n \Big) \right] A^* \; . \label{eq:reppreudoespectraldeAMAIS-2} \end{array} $$
(40)

Expression (39) or (40) provides a general algorithm for the computation of the Moore–Penrose pseudoinverse of any non-zero matrix A. Its implementation requires only the determination of the eigenvalues of AA* or of A* A and the evaluation of polynomials in AA* or A* A.
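
A direct implementation of (40) requires only a Hermitian eigenvalue computation and matrix products. The sketch below (Python/NumPy) is our own reading of the formula: the function name and the tolerance used to group numerically equal eigenvalues and to discard the zero eigenvalue are arbitrary choices, and the result is compared against the SVD-based pseudoinverse for the first example of Section 2.5.

```python
import numpy as np

def pinv_spectral(A, tol=1e-10):
    """Moore-Penrose pseudoinverse via (40): a polynomial in A^* A applied to A^*."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[1]
    AhA = A.conj().T @ A
    # distinct eigenvalues of the self-adjoint matrix A^* A, grouped within tol
    betas = []
    for lam in np.linalg.eigvalsh(AhA):
        if not any(abs(lam - b) <= tol for b in betas):
            betas.append(lam)
    S = np.zeros((n, n), dtype=complex)
    for b in betas:
        if abs(b) <= tol:                 # the sum in (40) runs over non-zero eigenvalues only
            continue
        coeff = 1.0 / b
        poly = np.eye(n, dtype=complex)
        for l in betas:
            if abs(l - b) > tol:          # product over l != b
                coeff /= (b - l)
                poly = poly @ (AhA - l * np.eye(n))
        S += coeff * poly
    return S @ A.conj().T                 # an n x m matrix, as A^+ should be

A = np.array([[2, 0, 1j], [0, 1j, 1]])
assert np.allclose(pinv_spectral(A), np.linalg.pinv(A))
```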

Proof of Theorem 5.1

Equation (37) was established in the proof of Theorem 4.3 (see (28) and (30)). Relation (38) can be proven analogously, but it also follows more easily from (37) by making the substitution \(A\to A^*\) and taking the adjoint of the resulting expression. Relations (39) and (40) follow from Proposition A.11, particularly from the explicit formula for the spectral projector given in (52).□

6 The Moore–Penrose Pseudoinverse and Least Squares

Let us now consider one of the main applications of the Moore–Penrose pseudoinverse, namely to optimization of linear least squares problems. Let A ∈ Mat (ℂ, m, n) and y ∈ ℂm be given and consider the problem of finding x ∈ ℂn satisfying the linear equation

$$ Ax \; = \; y \; . \label{eq:LuybiUYtvtRytfr} $$
(41)

If m = n and A has an inverse, the (unique) solution is, evidently, x = A  − 1 y. In the other cases, the solution may not exist or may not be unique. We can, however, consider the alternative problem of finding the set of all vectors x′ ∈ ℂn such that the Euclidean norm ||Ax′ − y|| reaches its least possible value. This set is called the minimizing set of the linear problem (41). Such vectors x′ ∈ ℂn would be the best approximants for the solution of (41) in terms of the Euclidean norm, i.e., in terms of “least squares.” As we will show, the Moore–Penrose pseudoinverse provides this set of vectors x′ that minimize ||Ax′ − y||. The main result is condensed in the following theorem:

Theorem 6.1

Let A ∈ Mat (ℂ, m, n) and y ∈ ℂm be given. Then, the set of all vectors of ℂn for which the map \( {\mathbb{C}}^n \ni x \mapsto \|Ax-y\|\in[0, \; \infty)\) assumes a minimum coincides with the set

$$ \begin{array}{rll} A^+y &+& \mathrm{Ker}\,(A) \\ &=& \Big\{ A^+y +\big( \mathbb{1}_n -A^+A\big) z, \; z\in{\mathbb{C}}^n \Big\} \; . \label{eq:conjuntominimizante-deoptimizacaolinear} \end{array} $$
(42)

By Proposition 3.3, we also have \(A^+y + \mathrm{Ker}\,(A)=A^+y + \mathrm{Ran}\,\big(A^+\big)^{\perp}\) .

Theorem 6.1 says that the minimizing set of the linear problem (41) consists of all vectors obtained by adding to the vector A  +  y an element of the kernel of A, i.e., of all vectors obtained by adding to A  +  y a vector annihilated by A. Notice that for the elements x′ of the minimizing set of the linear problem (41), one has \(\big\|Ax'-y\big\| = \Big\|\big(AA^+-\mathbb{1}_m\big)y\Big\| = \|P_2y\|\), which vanishes if and only if y ∈ Ker (P 2) = Ran (A) (by Proposition 3.3), a rather obvious fact.

Proof of Theorem 6.1

The image of A, Ran (A), is a closed linear subspace of ℂm. The best approximant theorem and the orthogonal decomposition theorem guarantee the existence of a unique y 0 ∈ Ran (A) such that ||y 0 − y|| is minimal and that this y 0 is such that y 0 − y is orthogonal to Ran (A).

Hence, there exists at least one \(x_0\in {\mathbb{C}}^n\) such that ||Ax 0 − y|| is minimal. Such x 0 is not necessarily unique, and as one easily sees, \(x_1\in {\mathbb{C}}^n\) has the same properties if and only if x 0 − x 1 ∈ Ker (A) (since Ax 0 = y 0 and Ax 1 = y 0, by the uniqueness of y 0). As we already observed, Ax 0 − y is orthogonal to Ran (A), i.e., \({\langle } (Ax_0 -y), \; Au{\rangle }=0\) for all u ∈ ℂn. This means that \({\Big\langle } \big(A^*Ax_0-A^*y\big), \; u{\Big\rangle }=0\) for all u ∈ ℂn and, therefore, x 0 satisfies

$$ A^*Ax_0 \; = \; A^*y\;. \label{eq:m5645gouinOIUyuyguy} $$
(43)

Now, relation (18) shows us that \(x_0= A^+ y\) satisfies (43) because \(A^*AA^+ y \stackrel{\text{(18)}}{=}A^*y\). Therefore, we conclude that the set of all x ∈ ℂn for which ||Ax − y|| is minimal consists of all vectors of the form \(A^+y + x_1\) with x 1 ∈ Ker (A). By Proposition 3.3, x 1 is of the form \(x_1 = \big(\mathbb{1}_n -A^+A\big)z\) for some z ∈ ℂn, completing the proof.□
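
Theorem 6.1 can also be illustrated numerically. The sketch below (Python/NumPy; the rank-deficient matrix and the random trials are our own choices) verifies that every element of the set (42) attains the same, minimal, value of ||Ax − y||.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n))
A[:, -1] = A[:, 0]                    # force a non-trivial kernel
y = rng.standard_normal(m)

Ap = np.linalg.pinv(A)
x0 = Ap @ y
r_min = np.linalg.norm(A @ x0 - y)

# every element A^+ y + (1_n - A^+ A) z of the set (42) attains the same residual norm
for _ in range(5):
    z = rng.standard_normal(n)
    x = x0 + (np.eye(n) - Ap @ A) @ z
    assert np.isclose(np.linalg.norm(A @ x - y), r_min)

# and no randomly chosen vector does better
for _ in range(1000):
    x = rng.standard_normal(n)
    assert np.linalg.norm(A @ x - y) >= r_min - 1e-12
```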